Coverage for /pythoncovmergedfiles/medio/medio/usr/local/lib/python3.9/dist-packages/networkx/algorithms/d_separation.py: 18%
84 statements
« prev ^ index » next coverage.py v7.3.2, created at 2023-10-20 07:00 +0000
« prev ^ index » next coverage.py v7.3.2, created at 2023-10-20 07:00 +0000
1"""
2Algorithm for testing d-separation in DAGs.
4*d-separation* is a test for conditional independence in probability
5distributions that can be factorized using DAGs. It is a purely
6graphical test that uses the underlying graph and makes no reference
7to the actual distribution parameters. See [1]_ for a formal
8definition.
10The implementation is based on the conceptually simple linear time
11algorithm presented in [2]_. Refer to [3]_, [4]_ for a couple of
12alternative algorithms.
14Here, we provide a brief overview of d-separation and related concepts that
15are relevant for understanding it:
17Blocking paths
18--------------
20Before we overview, we introduce the following terminology to describe paths:
22- "open" path: A path between two nodes that can be traversed
23- "blocked" path: A path between two nodes that cannot be traversed
25A **collider** is a triplet of nodes along a path that is like the following:
26``... u -> c <- v ...``), where 'c' is a common successor of ``u`` and ``v``. A path
27through a collider is considered "blocked". When
28a node that is a collider, or a descendant of a collider is included in
29the d-separating set, then the path through that collider node is "open". If the
30path through the collider node is open, then we will call this node an open collider.
32The d-separation set blocks the paths between ``u`` and ``v``. If you include colliders,
33or their descendant nodes in the d-separation set, then those colliders will open up,
34enabling a path to be traversed if it is not blocked some other way.
36Illustration of D-separation with examples
37------------------------------------------
39For a pair of two nodes, ``u`` and ``v``, all paths are considered open if
40there is a path between ``u`` and ``v`` that is not blocked. That means, there is an open
41path between ``u`` and ``v`` that does not encounter a collider, or a variable in the
42d-separating set.
44For example, if the d-separating set is the empty set, then the following paths are
45unblocked between ``u`` and ``v``:
47- u <- z -> v
48- u -> w -> ... -> z -> v
50If for example, 'z' is in the d-separating set, then 'z' blocks those paths
51between ``u`` and ``v``.
53Colliders block a path by default if they and their descendants are not included
54in the d-separating set. An example of a path that is blocked when the d-separating
55set is empty is:
57- u -> w -> ... -> z <- v
59because 'z' is a collider in this path and 'z' is not in the d-separating set. However,
60if 'z' or a descendant of 'z' is included in the d-separating set, then the path through
61the collider at 'z' (... -> z <- ...) is now "open".
63D-separation is concerned with blocking all paths between u and v. Therefore, a
64d-separating set between ``u`` and ``v`` is one where all paths are blocked.
66D-separation and its applications in probability
67------------------------------------------------
69D-separation is commonly used in probabilistic graphical models. D-separation
70connects the idea of probabilistic "dependence" with separation in a graph. If
71one assumes the causal Markov condition [5]_, then d-separation implies conditional
72independence in probability distributions.
74Examples
75--------
77>>>
78>>> # HMM graph with five states and observation nodes
79... g = nx.DiGraph()
80>>> g.add_edges_from(
81... [
82... ("S1", "S2"),
83... ("S2", "S3"),
84... ("S3", "S4"),
85... ("S4", "S5"),
86... ("S1", "O1"),
87... ("S2", "O2"),
88... ("S3", "O3"),
89... ("S4", "O4"),
90... ("S5", "O5"),
91... ]
92... )
93>>>
94>>> # states/obs before 'S3' are d-separated from states/obs after 'S3'
95... nx.d_separated(g, {"S1", "S2", "O1", "O2"}, {"S4", "S5", "O4", "O5"}, {"S3"})
96True
99References
100----------
102.. [1] Pearl, J. (2009). Causality. Cambridge: Cambridge University Press.
104.. [2] Darwiche, A. (2009). Modeling and reasoning with Bayesian networks.
105 Cambridge: Cambridge University Press.
107.. [3] Shachter, R. D. (1998).
108 Bayes-ball: rational pastime (for determining irrelevance and requisite
109 information in belief networks and influence diagrams).
110 In , Proceedings of the Fourteenth Conference on Uncertainty in Artificial
111 Intelligence (pp. 480–487).
112 San Francisco, CA, USA: Morgan Kaufmann Publishers Inc.
114.. [4] Koller, D., & Friedman, N. (2009).
115 Probabilistic graphical models: principles and techniques. The MIT Press.
117.. [5] https://en.wikipedia.org/wiki/Causal_Markov_condition
119"""
121from collections import deque
123import networkx as nx
124from networkx.utils import UnionFind, not_implemented_for
126__all__ = ["d_separated", "minimal_d_separator", "is_minimal_d_separator"]
129@not_implemented_for("undirected")
130@nx._dispatch
131def d_separated(G, x, y, z):
132 """
133 Return whether node sets ``x`` and ``y`` are d-separated by ``z``.
135 Parameters
136 ----------
137 G : graph
138 A NetworkX DAG.
140 x : set
141 First set of nodes in ``G``.
143 y : set
144 Second set of nodes in ``G``.
146 z : set
147 Set of conditioning nodes in ``G``. Can be empty set.
149 Returns
150 -------
151 b : bool
152 A boolean that is true if ``x`` is d-separated from ``y`` given ``z`` in ``G``.
154 Raises
155 ------
156 NetworkXError
157 The *d-separation* test is commonly used with directed
158 graphical models which are acyclic. Accordingly, the algorithm
159 raises a :exc:`NetworkXError` if the input graph is not a DAG.
161 NodeNotFound
162 If any of the input nodes are not found in the graph,
163 a :exc:`NodeNotFound` exception is raised.
165 Notes
166 -----
167 A d-separating set in a DAG is a set of nodes that
168 blocks all paths between the two sets. Nodes in `z`
169 block a path if they are part of the path and are not a collider,
170 or a descendant of a collider. A collider structure along a path
171 is ``... -> c <- ...`` where ``c`` is the collider node.
173 https://en.wikipedia.org/wiki/Bayesian_network#d-separation
174 """
176 if not nx.is_directed_acyclic_graph(G):
177 raise nx.NetworkXError("graph should be directed acyclic")
179 union_xyz = x.union(y).union(z)
181 if any(n not in G.nodes for n in union_xyz):
182 raise nx.NodeNotFound("one or more specified nodes not found in the graph")
184 G_copy = G.copy()
186 # transform the graph by removing leaves that are not in x | y | z
187 # until no more leaves can be removed.
188 leaves = deque([n for n in G_copy.nodes if G_copy.out_degree[n] == 0])
189 while len(leaves) > 0:
190 leaf = leaves.popleft()
191 if leaf not in union_xyz:
192 for p in G_copy.predecessors(leaf):
193 if G_copy.out_degree[p] == 1:
194 leaves.append(p)
195 G_copy.remove_node(leaf)
197 # transform the graph by removing outgoing edges from the
198 # conditioning set.
199 edges_to_remove = list(G_copy.out_edges(z))
200 G_copy.remove_edges_from(edges_to_remove)
202 # use disjoint-set data structure to check if any node in `x`
203 # occurs in the same weakly connected component as a node in `y`.
204 disjoint_set = UnionFind(G_copy.nodes())
205 for component in nx.weakly_connected_components(G_copy):
206 disjoint_set.union(*component)
207 disjoint_set.union(*x)
208 disjoint_set.union(*y)
210 if x and y and disjoint_set[next(iter(x))] == disjoint_set[next(iter(y))]:
211 return False
212 else:
213 return True
216@not_implemented_for("undirected")
217@nx._dispatch
218def minimal_d_separator(G, u, v):
219 """Compute a minimal d-separating set between 'u' and 'v'.
221 A d-separating set in a DAG is a set of nodes that blocks all paths
222 between the two nodes, 'u' and 'v'. This function
223 constructs a d-separating set that is "minimal", meaning it is the smallest
224 d-separating set for 'u' and 'v'. This is not necessarily
225 unique. For more details, see Notes.
227 Parameters
228 ----------
229 G : graph
230 A networkx DAG.
231 u : node
232 A node in the graph, G.
233 v : node
234 A node in the graph, G.
236 Raises
237 ------
238 NetworkXError
239 Raises a :exc:`NetworkXError` if the input graph is not a DAG.
241 NodeNotFound
242 If any of the input nodes are not found in the graph,
243 a :exc:`NodeNotFound` exception is raised.
245 References
246 ----------
247 .. [1] Tian, J., & Paz, A. (1998). Finding Minimal D-separators.
249 Notes
250 -----
251 This function only finds ``a`` minimal d-separator. It does not guarantee
252 uniqueness, since in a DAG there may be more than one minimal d-separator
253 between two nodes. Moreover, this only checks for minimal separators
254 between two nodes, not two sets. Finding minimal d-separators between
255 two sets of nodes is not supported.
257 Uses the algorithm presented in [1]_. The complexity of the algorithm
258 is :math:`O(|E_{An}^m|)`, where :math:`|E_{An}^m|` stands for the
259 number of edges in the moralized graph of the sub-graph consisting
260 of only the ancestors of 'u' and 'v'. For full details, see [1]_.
262 The algorithm works by constructing the moral graph consisting of just
263 the ancestors of `u` and `v`. Then it constructs a candidate for
264 a separating set ``Z'`` from the predecessors of `u` and `v`.
265 Then BFS is run starting from `u` and marking nodes
266 found from ``Z'`` and calling those nodes ``Z''``.
267 Then BFS is run again starting from `v` and marking nodes if they are
268 present in ``Z''``. Those marked nodes are the returned minimal
269 d-separating set.
271 https://en.wikipedia.org/wiki/Bayesian_network#d-separation
272 """
273 if not nx.is_directed_acyclic_graph(G):
274 raise nx.NetworkXError("graph should be directed acyclic")
276 union_uv = {u, v}
278 if any(n not in G.nodes for n in union_uv):
279 raise nx.NodeNotFound("one or more specified nodes not found in the graph")
281 # first construct the set of ancestors of X and Y
282 x_anc = nx.ancestors(G, u)
283 y_anc = nx.ancestors(G, v)
284 D_anc_xy = x_anc.union(y_anc)
285 D_anc_xy.update((u, v))
287 # second, construct the moralization of the subgraph of Anc(X,Y)
288 moral_G = nx.moral_graph(G.subgraph(D_anc_xy))
290 # find a separating set Z' in moral_G
291 Z_prime = set(G.predecessors(u)).union(set(G.predecessors(v)))
293 # perform BFS on the graph from 'x' to mark
294 Z_dprime = _bfs_with_marks(moral_G, u, Z_prime)
295 Z = _bfs_with_marks(moral_G, v, Z_dprime)
296 return Z
299@not_implemented_for("undirected")
300@nx._dispatch
301def is_minimal_d_separator(G, u, v, z):
302 """Determine if a d-separating set is minimal.
304 A d-separating set, `z`, in a DAG is a set of nodes that blocks
305 all paths between the two nodes, `u` and `v`. This function
306 verifies that a set is "minimal", meaning there is no smaller
307 d-separating set between the two nodes.
309 Note: This function checks whether `z` is a d-separator AND is minimal.
310 One can use the function `d_separated` to only check if `z` is a d-separator.
311 See examples below.
313 Parameters
314 ----------
315 G : nx.DiGraph
316 The graph.
317 u : node
318 A node in the graph.
319 v : node
320 A node in the graph.
321 z : Set of nodes
322 The set of nodes to check if it is a minimal d-separating set.
323 The function :func:`d_separated` is called inside this function
324 to verify that `z` is in fact a d-separator.
326 Returns
327 -------
328 bool
329 Whether or not the set `z` is a d-separator and is also minimal.
331 Examples
332 --------
333 >>> G = nx.path_graph([0, 1, 2, 3], create_using=nx.DiGraph)
334 >>> G.add_node(4)
335 >>> nx.is_minimal_d_separator(G, 0, 2, {1})
336 True
337 >>> # since {1} is the minimal d-separator, {1, 3, 4} is not minimal
338 >>> nx.is_minimal_d_separator(G, 0, 2, {1, 3, 4})
339 False
340 >>> # alternatively, if we only want to check that {1, 3, 4} is a d-separator
341 >>> nx.d_separated(G, {0}, {4}, {1, 3, 4})
342 True
344 Raises
345 ------
346 NetworkXError
347 Raises a :exc:`NetworkXError` if the input graph is not a DAG.
349 NodeNotFound
350 If any of the input nodes are not found in the graph,
351 a :exc:`NodeNotFound` exception is raised.
353 References
354 ----------
355 .. [1] Tian, J., & Paz, A. (1998). Finding Minimal D-separators.
357 Notes
358 -----
359 This function only works on verifying a d-separating set is minimal
360 between two nodes. To verify that a d-separating set is minimal between
361 two sets of nodes is not supported.
363 Uses algorithm 2 presented in [1]_. The complexity of the algorithm
364 is :math:`O(|E_{An}^m|)`, where :math:`|E_{An}^m|` stands for the
365 number of edges in the moralized graph of the sub-graph consisting
366 of only the ancestors of ``u`` and ``v``.
368 The algorithm works by constructing the moral graph consisting of just
369 the ancestors of `u` and `v`. First, it performs BFS on the moral graph
370 starting from `u` and marking any nodes it encounters that are part of
371 the separating set, `z`. If a node is marked, then it does not continue
372 along that path. In the second stage, BFS with markings is repeated on the
373 moral graph starting from `v`. If at any stage, any node in `z` is
374 not marked, then `z` is considered not minimal. If the end of the algorithm
375 is reached, then `z` is minimal.
377 For full details, see [1]_.
379 https://en.wikipedia.org/wiki/Bayesian_network#d-separation
380 """
381 if not nx.d_separated(G, {u}, {v}, z):
382 return False
384 x_anc = nx.ancestors(G, u)
385 y_anc = nx.ancestors(G, v)
386 xy_anc = x_anc.union(y_anc)
388 # if Z contains any node which is not in ancestors of X or Y
389 # then it is definitely not minimal
390 if any(node not in xy_anc for node in z):
391 return False
393 D_anc_xy = x_anc.union(y_anc)
394 D_anc_xy.update((u, v))
396 # second, construct the moralization of the subgraph
397 moral_G = nx.moral_graph(G.subgraph(D_anc_xy))
399 # start BFS from X
400 marks = _bfs_with_marks(moral_G, u, z)
402 # if not all the Z is marked, then the set is not minimal
403 if any(node not in marks for node in z):
404 return False
406 # similarly, start BFS from Y and check the marks
407 marks = _bfs_with_marks(moral_G, v, z)
408 # if not all the Z is marked, then the set is not minimal
409 if any(node not in marks for node in z):
410 return False
412 return True
415@not_implemented_for("directed")
416def _bfs_with_marks(G, start_node, check_set):
417 """Breadth-first-search with markings.
419 Performs BFS starting from ``start_node`` and whenever a node
420 inside ``check_set`` is met, it is "marked". Once a node is marked,
421 BFS does not continue along that path. The resulting marked nodes
422 are returned.
424 Parameters
425 ----------
426 G : nx.Graph
427 An undirected graph.
428 start_node : node
429 The start of the BFS.
430 check_set : set
431 The set of nodes to check against.
433 Returns
434 -------
435 marked : set
436 A set of nodes that were marked.
437 """
438 visited = {}
439 marked = set()
440 queue = []
442 visited[start_node] = None
443 queue.append(start_node)
444 while queue:
445 m = queue.pop(0)
447 for nbr in G.neighbors(m):
448 if nbr not in visited:
449 # memoize where we visited so far
450 visited[nbr] = None
452 # mark the node in Z' and do not continue along that path
453 if nbr in check_set:
454 marked.add(nbr)
455 else:
456 queue.append(nbr)
457 return marked