Coverage for /pythoncovmergedfiles/medio/medio/usr/local/lib/python3.9/dist-packages/networkx/algorithms/d_separation.py: 18%

84 statements  

« prev     ^ index     » next       coverage.py v7.3.2, created at 2023-10-20 07:00 +0000

1""" 

2Algorithm for testing d-separation in DAGs. 

3 

4*d-separation* is a test for conditional independence in probability 

5distributions that can be factorized using DAGs. It is a purely 

6graphical test that uses the underlying graph and makes no reference 

7to the actual distribution parameters. See [1]_ for a formal 

8definition. 

9 

10The implementation is based on the conceptually simple linear time 

11algorithm presented in [2]_. Refer to [3]_, [4]_ for a couple of 

12alternative algorithms. 

13 

14Here, we provide a brief overview of d-separation and related concepts that 

15are relevant for understanding it: 

16 

17Blocking paths 

18-------------- 

19 

20Before we overview, we introduce the following terminology to describe paths: 

21 

22- "open" path: A path between two nodes that can be traversed 

23- "blocked" path: A path between two nodes that cannot be traversed 

24 

25A **collider** is a triplet of nodes along a path that is like the following: 

26``... u -> c <- v ...``), where 'c' is a common successor of ``u`` and ``v``. A path 

27through a collider is considered "blocked". When 

28a node that is a collider, or a descendant of a collider is included in 

29the d-separating set, then the path through that collider node is "open". If the 

30path through the collider node is open, then we will call this node an open collider. 

31 

32The d-separation set blocks the paths between ``u`` and ``v``. If you include colliders, 

33or their descendant nodes in the d-separation set, then those colliders will open up, 

34enabling a path to be traversed if it is not blocked some other way. 

35 

36Illustration of D-separation with examples 

37------------------------------------------ 

38 

39For a pair of two nodes, ``u`` and ``v``, all paths are considered open if 

40there is a path between ``u`` and ``v`` that is not blocked. That means, there is an open 

41path between ``u`` and ``v`` that does not encounter a collider, or a variable in the 

42d-separating set. 

43 

44For example, if the d-separating set is the empty set, then the following paths are 

45unblocked between ``u`` and ``v``: 

46 

47- u <- z -> v 

48- u -> w -> ... -> z -> v 

49 

50If for example, 'z' is in the d-separating set, then 'z' blocks those paths 

51between ``u`` and ``v``. 

52 

53Colliders block a path by default if they and their descendants are not included 

54in the d-separating set. An example of a path that is blocked when the d-separating 

55set is empty is: 

56 

57- u -> w -> ... -> z <- v 

58 

59because 'z' is a collider in this path and 'z' is not in the d-separating set. However, 

60if 'z' or a descendant of 'z' is included in the d-separating set, then the path through 

61the collider at 'z' (... -> z <- ...) is now "open".  

62 

63D-separation is concerned with blocking all paths between u and v. Therefore, a 

64d-separating set between ``u`` and ``v`` is one where all paths are blocked. 

65 

66D-separation and its applications in probability 

67------------------------------------------------ 

68 

69D-separation is commonly used in probabilistic graphical models. D-separation 

70connects the idea of probabilistic "dependence" with separation in a graph. If 

71one assumes the causal Markov condition [5]_, then d-separation implies conditional 

72independence in probability distributions. 

73 

74Examples 

75-------- 

76 

77>>> 

78>>> # HMM graph with five states and observation nodes 

79... g = nx.DiGraph() 

80>>> g.add_edges_from( 

81... [ 

82... ("S1", "S2"), 

83... ("S2", "S3"), 

84... ("S3", "S4"), 

85... ("S4", "S5"), 

86... ("S1", "O1"), 

87... ("S2", "O2"), 

88... ("S3", "O3"), 

89... ("S4", "O4"), 

90... ("S5", "O5"), 

91... ] 

92... ) 

93>>> 

94>>> # states/obs before 'S3' are d-separated from states/obs after 'S3' 

95... nx.d_separated(g, {"S1", "S2", "O1", "O2"}, {"S4", "S5", "O4", "O5"}, {"S3"}) 

96True 

97 

98 

99References 

100---------- 

101 

102.. [1] Pearl, J. (2009). Causality. Cambridge: Cambridge University Press. 

103 

104.. [2] Darwiche, A. (2009). Modeling and reasoning with Bayesian networks.  

105 Cambridge: Cambridge University Press. 

106 

107.. [3] Shachter, R. D. (1998). 

108 Bayes-ball: rational pastime (for determining irrelevance and requisite 

109 information in belief networks and influence diagrams). 

110 In , Proceedings of the Fourteenth Conference on Uncertainty in Artificial 

111 Intelligence (pp. 480–487). 

112 San Francisco, CA, USA: Morgan Kaufmann Publishers Inc. 

113 

114.. [4] Koller, D., & Friedman, N. (2009). 

115 Probabilistic graphical models: principles and techniques. The MIT Press. 

116 

117.. [5] https://en.wikipedia.org/wiki/Causal_Markov_condition 

118 

119""" 

120 

121from collections import deque 

122 

123import networkx as nx 

124from networkx.utils import UnionFind, not_implemented_for 

125 

126__all__ = ["d_separated", "minimal_d_separator", "is_minimal_d_separator"] 

127 

128 

129@not_implemented_for("undirected") 

130@nx._dispatch 

131def d_separated(G, x, y, z): 

132 """ 

133 Return whether node sets ``x`` and ``y`` are d-separated by ``z``. 

134 

135 Parameters 

136 ---------- 

137 G : graph 

138 A NetworkX DAG. 

139 

140 x : set 

141 First set of nodes in ``G``. 

142 

143 y : set 

144 Second set of nodes in ``G``. 

145 

146 z : set 

147 Set of conditioning nodes in ``G``. Can be empty set. 

148 

149 Returns 

150 ------- 

151 b : bool 

152 A boolean that is true if ``x`` is d-separated from ``y`` given ``z`` in ``G``. 

153 

154 Raises 

155 ------ 

156 NetworkXError 

157 The *d-separation* test is commonly used with directed 

158 graphical models which are acyclic. Accordingly, the algorithm 

159 raises a :exc:`NetworkXError` if the input graph is not a DAG. 

160 

161 NodeNotFound 

162 If any of the input nodes are not found in the graph, 

163 a :exc:`NodeNotFound` exception is raised. 

164 

165 Notes 

166 ----- 

167 A d-separating set in a DAG is a set of nodes that 

168 blocks all paths between the two sets. Nodes in `z` 

169 block a path if they are part of the path and are not a collider, 

170 or a descendant of a collider. A collider structure along a path 

171 is ``... -> c <- ...`` where ``c`` is the collider node. 

172 

173 https://en.wikipedia.org/wiki/Bayesian_network#d-separation 

174 """ 

175 

176 if not nx.is_directed_acyclic_graph(G): 

177 raise nx.NetworkXError("graph should be directed acyclic") 

178 

179 union_xyz = x.union(y).union(z) 

180 

181 if any(n not in G.nodes for n in union_xyz): 

182 raise nx.NodeNotFound("one or more specified nodes not found in the graph") 

183 

184 G_copy = G.copy() 

185 

186 # transform the graph by removing leaves that are not in x | y | z 

187 # until no more leaves can be removed. 

188 leaves = deque([n for n in G_copy.nodes if G_copy.out_degree[n] == 0]) 

189 while len(leaves) > 0: 

190 leaf = leaves.popleft() 

191 if leaf not in union_xyz: 

192 for p in G_copy.predecessors(leaf): 

193 if G_copy.out_degree[p] == 1: 

194 leaves.append(p) 

195 G_copy.remove_node(leaf) 

196 

197 # transform the graph by removing outgoing edges from the 

198 # conditioning set. 

199 edges_to_remove = list(G_copy.out_edges(z)) 

200 G_copy.remove_edges_from(edges_to_remove) 

201 

202 # use disjoint-set data structure to check if any node in `x` 

203 # occurs in the same weakly connected component as a node in `y`. 

204 disjoint_set = UnionFind(G_copy.nodes()) 

205 for component in nx.weakly_connected_components(G_copy): 

206 disjoint_set.union(*component) 

207 disjoint_set.union(*x) 

208 disjoint_set.union(*y) 

209 

210 if x and y and disjoint_set[next(iter(x))] == disjoint_set[next(iter(y))]: 

211 return False 

212 else: 

213 return True 

214 

215 

216@not_implemented_for("undirected") 

217@nx._dispatch 

218def minimal_d_separator(G, u, v): 

219 """Compute a minimal d-separating set between 'u' and 'v'. 

220 

221 A d-separating set in a DAG is a set of nodes that blocks all paths 

222 between the two nodes, 'u' and 'v'. This function 

223 constructs a d-separating set that is "minimal", meaning it is the smallest 

224 d-separating set for 'u' and 'v'. This is not necessarily 

225 unique. For more details, see Notes. 

226 

227 Parameters 

228 ---------- 

229 G : graph 

230 A networkx DAG. 

231 u : node 

232 A node in the graph, G. 

233 v : node 

234 A node in the graph, G. 

235 

236 Raises 

237 ------ 

238 NetworkXError 

239 Raises a :exc:`NetworkXError` if the input graph is not a DAG. 

240 

241 NodeNotFound 

242 If any of the input nodes are not found in the graph, 

243 a :exc:`NodeNotFound` exception is raised. 

244 

245 References 

246 ---------- 

247 .. [1] Tian, J., & Paz, A. (1998). Finding Minimal D-separators. 

248 

249 Notes 

250 ----- 

251 This function only finds ``a`` minimal d-separator. It does not guarantee 

252 uniqueness, since in a DAG there may be more than one minimal d-separator 

253 between two nodes. Moreover, this only checks for minimal separators 

254 between two nodes, not two sets. Finding minimal d-separators between 

255 two sets of nodes is not supported. 

256 

257 Uses the algorithm presented in [1]_. The complexity of the algorithm 

258 is :math:`O(|E_{An}^m|)`, where :math:`|E_{An}^m|` stands for the 

259 number of edges in the moralized graph of the sub-graph consisting 

260 of only the ancestors of 'u' and 'v'. For full details, see [1]_. 

261 

262 The algorithm works by constructing the moral graph consisting of just 

263 the ancestors of `u` and `v`. Then it constructs a candidate for 

264 a separating set ``Z'`` from the predecessors of `u` and `v`. 

265 Then BFS is run starting from `u` and marking nodes 

266 found from ``Z'`` and calling those nodes ``Z''``. 

267 Then BFS is run again starting from `v` and marking nodes if they are 

268 present in ``Z''``. Those marked nodes are the returned minimal 

269 d-separating set. 

270 

271 https://en.wikipedia.org/wiki/Bayesian_network#d-separation 

272 """ 

273 if not nx.is_directed_acyclic_graph(G): 

274 raise nx.NetworkXError("graph should be directed acyclic") 

275 

276 union_uv = {u, v} 

277 

278 if any(n not in G.nodes for n in union_uv): 

279 raise nx.NodeNotFound("one or more specified nodes not found in the graph") 

280 

281 # first construct the set of ancestors of X and Y 

282 x_anc = nx.ancestors(G, u) 

283 y_anc = nx.ancestors(G, v) 

284 D_anc_xy = x_anc.union(y_anc) 

285 D_anc_xy.update((u, v)) 

286 

287 # second, construct the moralization of the subgraph of Anc(X,Y) 

288 moral_G = nx.moral_graph(G.subgraph(D_anc_xy)) 

289 

290 # find a separating set Z' in moral_G 

291 Z_prime = set(G.predecessors(u)).union(set(G.predecessors(v))) 

292 

293 # perform BFS on the graph from 'x' to mark 

294 Z_dprime = _bfs_with_marks(moral_G, u, Z_prime) 

295 Z = _bfs_with_marks(moral_G, v, Z_dprime) 

296 return Z 

297 

298 

299@not_implemented_for("undirected") 

300@nx._dispatch 

301def is_minimal_d_separator(G, u, v, z): 

302 """Determine if a d-separating set is minimal. 

303 

304 A d-separating set, `z`, in a DAG is a set of nodes that blocks 

305 all paths between the two nodes, `u` and `v`. This function 

306 verifies that a set is "minimal", meaning there is no smaller 

307 d-separating set between the two nodes. 

308 

309 Note: This function checks whether `z` is a d-separator AND is minimal. 

310 One can use the function `d_separated` to only check if `z` is a d-separator. 

311 See examples below. 

312 

313 Parameters 

314 ---------- 

315 G : nx.DiGraph 

316 The graph. 

317 u : node 

318 A node in the graph. 

319 v : node 

320 A node in the graph. 

321 z : Set of nodes 

322 The set of nodes to check if it is a minimal d-separating set. 

323 The function :func:`d_separated` is called inside this function 

324 to verify that `z` is in fact a d-separator. 

325 

326 Returns 

327 ------- 

328 bool 

329 Whether or not the set `z` is a d-separator and is also minimal. 

330 

331 Examples 

332 -------- 

333 >>> G = nx.path_graph([0, 1, 2, 3], create_using=nx.DiGraph) 

334 >>> G.add_node(4) 

335 >>> nx.is_minimal_d_separator(G, 0, 2, {1}) 

336 True 

337 >>> # since {1} is the minimal d-separator, {1, 3, 4} is not minimal 

338 >>> nx.is_minimal_d_separator(G, 0, 2, {1, 3, 4}) 

339 False 

340 >>> # alternatively, if we only want to check that {1, 3, 4} is a d-separator 

341 >>> nx.d_separated(G, {0}, {4}, {1, 3, 4}) 

342 True 

343 

344 Raises 

345 ------ 

346 NetworkXError 

347 Raises a :exc:`NetworkXError` if the input graph is not a DAG. 

348 

349 NodeNotFound 

350 If any of the input nodes are not found in the graph, 

351 a :exc:`NodeNotFound` exception is raised. 

352 

353 References 

354 ---------- 

355 .. [1] Tian, J., & Paz, A. (1998). Finding Minimal D-separators. 

356 

357 Notes 

358 ----- 

359 This function only works on verifying a d-separating set is minimal 

360 between two nodes. To verify that a d-separating set is minimal between 

361 two sets of nodes is not supported. 

362 

363 Uses algorithm 2 presented in [1]_. The complexity of the algorithm 

364 is :math:`O(|E_{An}^m|)`, where :math:`|E_{An}^m|` stands for the 

365 number of edges in the moralized graph of the sub-graph consisting 

366 of only the ancestors of ``u`` and ``v``. 

367 

368 The algorithm works by constructing the moral graph consisting of just 

369 the ancestors of `u` and `v`. First, it performs BFS on the moral graph 

370 starting from `u` and marking any nodes it encounters that are part of 

371 the separating set, `z`. If a node is marked, then it does not continue 

372 along that path. In the second stage, BFS with markings is repeated on the 

373 moral graph starting from `v`. If at any stage, any node in `z` is 

374 not marked, then `z` is considered not minimal. If the end of the algorithm 

375 is reached, then `z` is minimal. 

376 

377 For full details, see [1]_. 

378 

379 https://en.wikipedia.org/wiki/Bayesian_network#d-separation 

380 """ 

381 if not nx.d_separated(G, {u}, {v}, z): 

382 return False 

383 

384 x_anc = nx.ancestors(G, u) 

385 y_anc = nx.ancestors(G, v) 

386 xy_anc = x_anc.union(y_anc) 

387 

388 # if Z contains any node which is not in ancestors of X or Y 

389 # then it is definitely not minimal 

390 if any(node not in xy_anc for node in z): 

391 return False 

392 

393 D_anc_xy = x_anc.union(y_anc) 

394 D_anc_xy.update((u, v)) 

395 

396 # second, construct the moralization of the subgraph 

397 moral_G = nx.moral_graph(G.subgraph(D_anc_xy)) 

398 

399 # start BFS from X 

400 marks = _bfs_with_marks(moral_G, u, z) 

401 

402 # if not all the Z is marked, then the set is not minimal 

403 if any(node not in marks for node in z): 

404 return False 

405 

406 # similarly, start BFS from Y and check the marks 

407 marks = _bfs_with_marks(moral_G, v, z) 

408 # if not all the Z is marked, then the set is not minimal 

409 if any(node not in marks for node in z): 

410 return False 

411 

412 return True 

413 

414 

415@not_implemented_for("directed") 

416def _bfs_with_marks(G, start_node, check_set): 

417 """Breadth-first-search with markings. 

418 

419 Performs BFS starting from ``start_node`` and whenever a node 

420 inside ``check_set`` is met, it is "marked". Once a node is marked, 

421 BFS does not continue along that path. The resulting marked nodes 

422 are returned. 

423 

424 Parameters 

425 ---------- 

426 G : nx.Graph 

427 An undirected graph. 

428 start_node : node 

429 The start of the BFS. 

430 check_set : set 

431 The set of nodes to check against. 

432 

433 Returns 

434 ------- 

435 marked : set 

436 A set of nodes that were marked. 

437 """ 

438 visited = {} 

439 marked = set() 

440 queue = [] 

441 

442 visited[start_node] = None 

443 queue.append(start_node) 

444 while queue: 

445 m = queue.pop(0) 

446 

447 for nbr in G.neighbors(m): 

448 if nbr not in visited: 

449 # memoize where we visited so far 

450 visited[nbr] = None 

451 

452 # mark the node in Z' and do not continue along that path 

453 if nbr in check_set: 

454 marked.add(nbr) 

455 else: 

456 queue.append(nbr) 

457 return marked