Coverage for /pythoncovmergedfiles/medio/medio/usr/local/lib/python3.11/site-packages/networkx/algorithms/community/louvain.py: 13%

Shortcuts on this page

r m x   toggle line displays

j k   next/prev highlighted chunk

0   (zero) top of page

1   (one) first highlighted chunk

135 statements  

1"""Functions for detecting communities based on Louvain Community Detection 

2Algorithm""" 

3 

4import itertools 

5from collections import defaultdict, deque 

6 

7import networkx as nx 

8from networkx.algorithms.community import modularity 

9from networkx.utils import py_random_state 

10 

11__all__ = ["louvain_communities", "louvain_partitions"] 

12 

13 

14@py_random_state("seed") 

15@nx._dispatchable(edge_attrs="weight") 

16def louvain_communities( 

17 G, weight="weight", resolution=1, threshold=0.0000001, max_level=None, seed=None 

18): 

19 r"""Find the best partition of a graph using the Louvain Community Detection 

20 Algorithm. 

21 

22 Louvain Community Detection Algorithm is a simple method to extract the community 

23 structure of a network. This is a heuristic method based on modularity optimization. [1]_ 

24 

25 The algorithm works in 2 steps. On the first step it assigns every node to be 

26 in its own community and then for each node it tries to find the maximum positive 

27 modularity gain by moving each node to all of its neighbor communities. If no positive 

28 gain is achieved the node remains in its original community. 

29 

30 The modularity gain obtained by moving an isolated node $i$ into a community $C$ can 

31 easily be calculated by the following formula (combining [1]_ [2]_ and some algebra): 

32 

33 .. math:: 

34 \Delta Q = \frac{k_{i,in}}{2m} - \gamma\frac{ \Sigma_{tot} \cdot k_i}{2m^2} 

35 

36 where $m$ is the size of the graph, $k_{i,in}$ is the sum of the weights of the links 

37 from $i$ to nodes in $C$, $k_i$ is the sum of the weights of the links incident to node $i$, 

38 $\Sigma_{tot}$ is the sum of the weights of the links incident to nodes in $C$ and $\gamma$ 

39 is the resolution parameter. 

40 

41 For the directed case the modularity gain can be computed using this formula according to [3]_ 

42 

43 .. math:: 

44 \Delta Q = \frac{k_{i,in}}{m} 

45 - \gamma\frac{k_i^{out} \cdot\Sigma_{tot}^{in} + k_i^{in} \cdot \Sigma_{tot}^{out}}{m^2} 

46 

47 where $k_i^{out}$, $k_i^{in}$ are the outer and inner weighted degrees of node $i$ and 

48 $\Sigma_{tot}^{in}$, $\Sigma_{tot}^{out}$ are the sum of in-going and out-going links incident 

49 to nodes in $C$. 

50 

51 The first phase continues until no individual move can improve the modularity. 

52 

53 The second phase consists in building a new network whose nodes are now the communities 

54 found in the first phase. To do so, the weights of the links between the new nodes are given by 

55 the sum of the weight of the links between nodes in the corresponding two communities. Once this 

56 phase is complete it is possible to reapply the first phase creating bigger communities with 

57 increased modularity. 

58 

59 The above two phases are executed until no modularity gain is achieved (or is less than 

60 the `threshold`, or until `max_levels` is reached). 

61 

62 Be careful with self-loops in the input graph. These are treated as 

63 previously reduced communities -- as if the process had been started 

64 in the middle of the algorithm. Large self-loop edge weights thus 

65 represent strong communities and in practice may be hard to add 

66 other nodes to. If your input graph edge weights for self-loops 

67 do not represent already reduced communities you may want to remove 

68 the self-loops before inputting that graph. 

69 

70 Parameters 

71 ---------- 

72 G : NetworkX graph 

73 weight : string or None, optional (default="weight") 

74 The name of an edge attribute that holds the numerical value 

75 used as a weight. If None then each edge has weight 1. 

76 resolution : float, optional (default=1) 

77 If resolution is less than 1, the algorithm favors larger communities. 

78 Greater than 1 favors smaller communities 

79 threshold : float, optional (default=0.0000001) 

80 Modularity gain threshold for each level. If the gain of modularity 

81 between 2 levels of the algorithm is less than the given threshold 

82 then the algorithm stops and returns the resulting communities. 

83 max_level : int or None, optional (default=None) 

84 The maximum number of levels (steps of the algorithm) to compute. 

85 Must be a positive integer or None. If None, then there is no max 

86 level and the threshold parameter determines the stopping condition. 

87 seed : integer, random_state, or None (default) 

88 Indicator of random number generation state. 

89 See :ref:`Randomness<randomness>`. 

90 

91 Returns 

92 ------- 

93 list 

94 A list of sets (partition of `G`). Each set represents one community and contains 

95 all the nodes that constitute it. 

96 

97 Examples 

98 -------- 

99 >>> import networkx as nx 

100 >>> G = nx.petersen_graph() 

101 >>> nx.community.louvain_communities(G, seed=123) 

102 [{0, 4, 5, 7, 9}, {1, 2, 3, 6, 8}] 

103 

104 Notes 

105 ----- 

106 The order in which the nodes are considered can affect the final output. In the algorithm 

107 the ordering happens using a random shuffle. 

108 

109 References 

110 ---------- 

111 .. [1] Blondel, V.D. et al. Fast unfolding of communities in 

112 large networks. J. Stat. Mech 10008, 1-12(2008). https://doi.org/10.1088/1742-5468/2008/10/P10008 

113 .. [2] Traag, V.A., Waltman, L. & van Eck, N.J. From Louvain to Leiden: guaranteeing 

114 well-connected communities. Sci Rep 9, 5233 (2019). https://doi.org/10.1038/s41598-019-41695-z 

115 .. [3] Nicolas Dugué, Anthony Perez. Directed Louvain : maximizing modularity in directed networks. 

116 [Research Report] Université d’Orléans. 2015. hal-01231784. https://hal.archives-ouvertes.fr/hal-01231784 

117 

118 See Also 

119 -------- 

120 louvain_partitions 

121 :any:`leiden_communities` 

122 """ 

123 

124 partitions = louvain_partitions(G, weight, resolution, threshold, seed) 

125 if max_level is not None: 

126 if max_level <= 0: 

127 raise ValueError("max_level argument must be a positive integer or None") 

128 partitions = itertools.islice(partitions, max_level) 

129 final_partition = deque(partitions, maxlen=1) 

130 return final_partition.pop() 

131 

132 

133@py_random_state("seed") 

134@nx._dispatchable(edge_attrs="weight") 

135def louvain_partitions( 

136 G, weight="weight", resolution=1, threshold=0.0000001, seed=None 

137): 

138 """Yield partitions for each level of the Louvain Community Detection Algorithm 

139 

140 Louvain Community Detection Algorithm is a simple method to extract the community 

141 structure of a network. This is a heuristic method based on modularity optimization. [1]_ 

142 

143 The partitions at each level (step of the algorithm) form a dendrogram of communities. 

144 A dendrogram is a diagram representing a tree and each level represents 

145 a partition of the G graph. The top level contains the smallest communities 

146 and as you traverse to the bottom of the tree the communities get bigger 

147 and the overall modularity increases making the partition better. 

148 

149 Each level is generated by executing the two phases of the Louvain Community 

150 Detection Algorithm. 

151 

152 Be careful with self-loops in the input graph. These are treated as 

153 previously reduced communities -- as if the process had been started 

154 in the middle of the algorithm. Large self-loop edge weights thus 

155 represent strong communities and in practice may be hard to add 

156 other nodes to. If your input graph edge weights for self-loops 

157 do not represent already reduced communities you may want to remove 

158 the self-loops before inputting that graph. 

159 

160 Parameters 

161 ---------- 

162 G : NetworkX graph 

163 weight : string or None, optional (default="weight") 

164 The name of an edge attribute that holds the numerical value 

165 used as a weight. If None then each edge has weight 1. 

166 resolution : float, optional (default=1) 

167 If resolution is less than 1, the algorithm favors larger communities. 

168 Greater than 1 favors smaller communities 

169 threshold : float, optional (default=0.0000001) 

170 Modularity gain threshold for each level. If the gain of modularity 

171 between 2 levels of the algorithm is less than the given threshold 

172 then the algorithm stops and returns the resulting communities. 

173 seed : integer, random_state, or None (default) 

174 Indicator of random number generation state. 

175 See :ref:`Randomness<randomness>`. 

176 

177 Yields 

178 ------ 

179 list 

180 A list of sets (partition of `G`). Each set represents one community and contains 

181 all the nodes that constitute it. 

182 

183 References 

184 ---------- 

185 .. [1] Blondel, V.D. et al. Fast unfolding of communities in 

186 large networks. J. Stat. Mech 10008, 1-12(2008) 

187 

188 See Also 

189 -------- 

190 louvain_communities 

191 :any:`leiden_partitions` 

192 """ 

193 

194 partition = [{u} for u in G.nodes()] 

195 if nx.is_empty(G): 

196 yield partition 

197 return 

198 mod = modularity(G, partition, resolution=resolution, weight=weight) 

199 is_directed = G.is_directed() 

200 if G.is_multigraph(): 

201 graph = _convert_multigraph(G, weight, is_directed) 

202 else: 

203 graph = G.__class__() 

204 graph.add_nodes_from(G) 

205 graph.add_weighted_edges_from(G.edges(data=weight, default=1)) 

206 

207 m = graph.size(weight="weight") 

208 partition, inner_partition, improvement = _one_level( 

209 graph, m, partition, resolution, is_directed, seed 

210 ) 

211 improvement = True 

212 while improvement: 

213 # gh-5901 protect the sets in the yielded list from further manipulation here 

214 yield [s.copy() for s in partition] 

215 new_mod = modularity( 

216 graph, inner_partition, resolution=resolution, weight="weight" 

217 ) 

218 if new_mod - mod <= threshold: 

219 return 

220 mod = new_mod 

221 graph = _gen_graph(graph, inner_partition) 

222 partition, inner_partition, improvement = _one_level( 

223 graph, m, partition, resolution, is_directed, seed 

224 ) 

225 

226 

227def _one_level(G, m, partition, resolution=1, is_directed=False, seed=None): 

228 """Calculate one level of the Louvain partitions tree 

229 

230 Parameters 

231 ---------- 

232 G : NetworkX Graph/DiGraph 

233 The graph from which to detect communities 

234 m : number 

235 The size of the graph `G`. 

236 partition : list of sets of nodes 

237 A valid partition of the graph `G` 

238 resolution : positive number 

239 The resolution parameter for computing the modularity of a partition 

240 is_directed : bool 

241 True if `G` is a directed graph. 

242 seed : integer, random_state, or None (default) 

243 Indicator of random number generation state. 

244 See :ref:`Randomness<randomness>`. 

245 

246 """ 

247 node2com = {u: i for i, u in enumerate(G.nodes())} 

248 inner_partition = [{u} for u in G.nodes()] 

249 if is_directed: 

250 in_degrees = dict(G.in_degree(weight="weight")) 

251 out_degrees = dict(G.out_degree(weight="weight")) 

252 Stot_in = list(in_degrees.values()) 

253 Stot_out = list(out_degrees.values()) 

254 # Calculate weights for both in and out neighbors without considering self-loops 

255 nbrs = {} 

256 for u in G: 

257 nbrs[u] = defaultdict(float) 

258 for _, n, wt in G.out_edges(u, data="weight"): 

259 if u != n: 

260 nbrs[u][n] += wt 

261 for n, _, wt in G.in_edges(u, data="weight"): 

262 if u != n: 

263 nbrs[u][n] += wt 

264 else: 

265 degrees = dict(G.degree(weight="weight")) 

266 Stot = list(degrees.values()) 

267 nbrs = {u: {v: data["weight"] for v, data in G[u].items() if v != u} for u in G} 

268 rand_nodes = list(G.nodes) 

269 seed.shuffle(rand_nodes) 

270 nb_moves = 1 

271 improvement = False 

272 while nb_moves > 0: 

273 nb_moves = 0 

274 for u in rand_nodes: 

275 best_mod = 0 

276 best_com = node2com[u] 

277 weights2com = _neighbor_weights(nbrs[u], node2com) 

278 if is_directed: 

279 in_degree = in_degrees[u] 

280 out_degree = out_degrees[u] 

281 Stot_in[best_com] -= in_degree 

282 Stot_out[best_com] -= out_degree 

283 remove_cost = ( 

284 -weights2com[best_com] / m 

285 + resolution 

286 * (out_degree * Stot_in[best_com] + in_degree * Stot_out[best_com]) 

287 / m**2 

288 ) 

289 else: 

290 degree = degrees[u] 

291 Stot[best_com] -= degree 

292 remove_cost = -weights2com[best_com] / m + resolution * ( 

293 Stot[best_com] * degree 

294 ) / (2 * m**2) 

295 for nbr_com, wt in weights2com.items(): 

296 if is_directed: 

297 gain = ( 

298 remove_cost 

299 + wt / m 

300 - resolution 

301 * ( 

302 out_degree * Stot_in[nbr_com] 

303 + in_degree * Stot_out[nbr_com] 

304 ) 

305 / m**2 

306 ) 

307 else: 

308 gain = ( 

309 remove_cost 

310 + wt / m 

311 - resolution * (Stot[nbr_com] * degree) / (2 * m**2) 

312 ) 

313 if gain > best_mod: 

314 best_mod = gain 

315 best_com = nbr_com 

316 if is_directed: 

317 Stot_in[best_com] += in_degree 

318 Stot_out[best_com] += out_degree 

319 else: 

320 Stot[best_com] += degree 

321 if best_com != node2com[u]: 

322 com = G.nodes[u].get("nodes", {u}) 

323 partition[node2com[u]].difference_update(com) 

324 inner_partition[node2com[u]].remove(u) 

325 partition[best_com].update(com) 

326 inner_partition[best_com].add(u) 

327 improvement = True 

328 nb_moves += 1 

329 node2com[u] = best_com 

330 partition = list(filter(len, partition)) 

331 inner_partition = list(filter(len, inner_partition)) 

332 return partition, inner_partition, improvement 

333 

334 

335def _neighbor_weights(nbrs, node2com): 

336 """Calculate weights between node and its neighbor communities. 

337 

338 Parameters 

339 ---------- 

340 nbrs : dictionary 

341 Dictionary with nodes' neighbors as keys and their edge weight as value. 

342 node2com : dictionary 

343 Dictionary with all graph's nodes as keys and their community index as value. 

344 

345 """ 

346 weights = defaultdict(float) 

347 for nbr, wt in nbrs.items(): 

348 weights[node2com[nbr]] += wt 

349 return weights 

350 

351 

352def _gen_graph(G, partition): 

353 """Generate a new graph based on the partitions of a given graph""" 

354 H = G.__class__() 

355 node2com = {} 

356 for i, part in enumerate(partition): 

357 nodes = set() 

358 for node in part: 

359 node2com[node] = i 

360 nodes.update(G.nodes[node].get("nodes", {node})) 

361 H.add_node(i, nodes=nodes) 

362 

363 for node1, node2, wt in G.edges(data=True): 

364 wt = wt["weight"] 

365 com1 = node2com[node1] 

366 com2 = node2com[node2] 

367 temp = H.get_edge_data(com1, com2, {"weight": 0})["weight"] 

368 H.add_edge(com1, com2, weight=wt + temp) 

369 return H 

370 

371 

372def _convert_multigraph(G, weight, is_directed): 

373 """Convert a Multigraph to normal Graph""" 

374 if is_directed: 

375 H = nx.DiGraph() 

376 else: 

377 H = nx.Graph() 

378 H.add_nodes_from(G) 

379 for u, v, wt in G.edges(data=weight, default=1): 

380 if H.has_edge(u, v): 

381 H[u][v]["weight"] += wt 

382 else: 

383 H.add_edge(u, v, weight=wt) 

384 return H