Conceived and designed the experiments: KEB. Performed the experiments: CIDG HK ZT KEB. Analyzed the data: CIDG. Wrote the paper: CIDG ZT KEB. Implemented and tested algorithm: CIDG HK.

The authors have declared that no competing interests exist.

Uniform sampling from graphical realizations of a given degree sequence is a fundamental component in simulation-based measurements of network observables, with applications ranging from epidemics, through social networks, to Internet modeling. Existing graph sampling methods are either link-swap based (Markov-Chain Monte Carlo algorithms) or stub-matching based (the Configuration Model). Both types are ill-controlled, with typically unknown mixing times for link-swap methods and uncontrolled rejections for the Configuration Model. Here we propose an efficient, polynomial-time algorithm that generates statistically independent graph samples with a given, arbitrary degree sequence. The algorithm provides a weight associated with each sample, allowing the observable to be measured either uniformly over the graph ensemble or, alternatively, with a desired distribution. Unlike other algorithms, this method always produces a sample, without back-tracking or rejections. Using central-limit-theorem-based reasoning, we argue that for large

Network representation has become an increasingly widespread methodology of analysis to gain insight into the behavior of complex systems, ranging from gene regulatory networks to human infrastructures such as the Internet, power-grids and airline transportation, through metabolism, epidemics and social sciences

In spite of its practical importance, finding a method to construct degree-based graphs in a way that allows the corresponding graph ensemble to be properly sampled has been a long-standing open problem in the network modeling community (references using various approaches are given below). Here we present a solution to this problem, using a biased sampling approach. We consider degree-based graph ensembles on two levels: 1) sequence-level, where a specific sequence of degrees is given, and 2) distribution level, where the sequences are themselves drawn from a given degree distribution

Recently, a direct, swap-free method to systematically construct all the simple graphs realizing a given graphical sequence

Here, by developing new results from the theorems in Ref.

Before introducing the algorithm, we state some results that will be useful later on. We begin with the Erdős-Gallai (EG) theorem
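For reference, the EG condition in its standard textbook form is reproduced here. The symbols $d_i$, $n$ and $k$ are notational assumptions made for this sketch and may differ from the notation used elsewhere in this paper. A non-increasing sequence of non-negative integers $d_1 \ge d_2 \ge \dots \ge d_n$ is graphical if and only if its sum is even and

```latex
\sum_{i=1}^{k} d_i \;\le\; k(k-1) + \sum_{i=k+1}^{n} \min(k, d_i)
\qquad \text{for all } 1 \le k \le n .
```

For example, the sequence $(3,3,1,1)$ has an even sum but fails the inequality at $k=2$, since $3+3 = 6 > 2 + 1 + 1 = 4$, so it is not graphical.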

A necessary and sufficient condition for the graphicality of a degree sequence in which links between some node and a “forbidden set” of other nodes are prohibited is given by the star-constrained graphicality theorem

A direct consequence

Another result we exploit here is Lemma 3 of Ref.

Finally, using Lemma 1 and Theorem 2, we prove:

Note that if a sequence is non-graphical, then it is not star-constrained graphical either, and thus Theorem 3 is in its strongest form.

The sampling algorithm described below is ergodic in the sense that every possible simple graph with the given finite degree sequence is generated with non-zero probability. However, it does not generate the samples with uniform probability; the sampling is biased. Nevertheless, the algorithm can be used to compute unbiased network observables by appropriately weighting the averages measured from the samples. According to a well-known principle of biased sampling

Let

The sample weights needed to obtain unbiased estimates using Eq. 2 are the inverse relative probabilities of generating the particular samples. If in the course of the construction of the sample
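The weighted averaging described above can be sketched in a few lines of Python. This is a minimal illustration, assuming the estimate takes the usual self-normalised form (sum of weight times observable, divided by the sum of weights). The function name `weighted_mean` and the choice to pass logarithms of the weights are assumptions of this sketch, not part of the original method. Working with log-weights is a numerical precaution, since products of many factors can overflow floating-point range.

```python
import math


def weighted_mean(values, log_weights):
    """Self-normalised weighted average: sum(w * q) / sum(w), with w = exp(log_w).

    values      -- observable measured on each sample
    log_weights -- natural logarithm of each sample weight (hypothetical input format)
    """
    if not values or len(values) != len(log_weights):
        raise ValueError("values and log_weights must be non-empty and of equal length")
    # Subtract the largest log-weight before exponentiating; the common factor
    # cancels in the ratio, and this keeps exp() within floating-point range.
    shift = max(log_weights)
    scaled = [math.exp(lw - shift) for lw in log_weights]
    total = sum(scaled)
    return sum(s * q for s, q in zip(scaled, values)) / total
```

With equal weights the function reduces to the ordinary mean, and a sample whose weight is negligible compared with the others contributes essentially nothing to the estimate.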

The most difficult step in the sampling algorithm is to construct the set of allowed nodes

The first time

If there are non-forbidden nodes in the residual degree sequence whose degree is less than that of every node in the hub's leftmost adjacency set, then the maximum fail-degree can be found with a procedure that exploits Theorem 2. In particular, if the hub is connected to a node with a fail-degree, then, by Theorem 2, even if all the remaining links from the hub were connected to the remaining nodes in the leftmost adjacency set, the residual sequence will not be graphical. Our method to find fail-degrees, given below, is based on this argument.

Begin by constructing a new residual sequence

At this point, in principle, one could find the maximum fail-degree by systematically connecting the last link of the hub with non-forbidden nodes of decreasing degree, and testing the resulting residual sequence each time for graphicality using Theorem 1. If the sequence is not graphical, then the degree of the last node connected to the hub is a fail-degree, and the node with the largest degree for which this is true has the maximum fail-degree. However, this procedure is inefficient because each time a new node is linked with the hub, the residual sequence changes and every new sequence must be tested for graphicality.
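The brute-force idea in the paragraph above can be sketched as follows. This is an illustrative reading of the argument, not the efficient procedure of the paper: for each candidate node, the hub is tentatively linked to it, the hub's remaining stubs are attached to the largest-degree non-forbidden nodes (the leftmost adjacency set), and the residual sequence is tested with a direct Erdős-Gallai check. The names `is_graphical` and `allowed_nodes`, and the representation of the forbidden set as a Python `set` of node indices, are assumptions of this sketch.

```python
def is_graphical(degrees):
    """Direct (quadratic-time) Erdős-Gallai test on a list of integers."""
    d = sorted(degrees, reverse=True)
    if any(x < 0 for x in d) or sum(d) % 2:
        return False
    for k in range(1, len(d) + 1):
        lhs = sum(d[:k])
        rhs = k * (k - 1) + sum(min(x, k) for x in d[k:])
        if lhs > rhs:
            return False
    return True


def allowed_nodes(deg, hub, forbidden):
    """Nodes the hub may link to next without making the sequence non-graphical.

    deg       -- residual degrees, indexed by node
    hub       -- index of the node currently being connected (deg[hub] >= 1)
    forbidden -- set of nodes the hub is already linked to
    """
    allowed = []
    for j in range(len(deg)):
        if j == hub or j in forbidden or deg[j] == 0:
            continue
        res = list(deg)
        res[hub] -= 1
        res[j] -= 1
        blocked = set(forbidden) | {hub, j}
        # Remaining candidates, largest residual degree first.
        cands = sorted(
            (k for k in range(len(res)) if k not in blocked and res[k] > 0),
            key=lambda k: -res[k],
        )
        need = res[hub]
        if len(cands) < need:
            continue  # not enough partners left for the hub's other stubs
        for k in cands[:need]:
            res[k] -= 1
        res[hub] = 0
        if is_graphical(res):
            allowed.append(j)
    return allowed
```

For the sequence `[2, 2, 1, 1]` with node 0 as hub, all three other nodes are initially allowed. Once node 0 has been linked to node 2, the residual sequence is `[1, 2, 0, 1]` and only node 1 remains allowed, because linking to node 3 would leave node 1 with two unmatched stubs. Every candidate triggers a full graphicality test here, which is exactly the cost the more efficient procedure avoids.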

A more efficient procedure to find the maximum fail-degree instead involves only testing the sequence

Considering these conditions that can cause Inequality 1 to fail for

Note that after a link is placed in the sample construction process, the residual degree sequence

Finally,

A recurrence relation for

For non-increasing degree sequences, define the “crossing-index”

Using Eqs. 4 and 6, the mechanism of the calculation of

It should be noted that the usefulness of this method for calculating

As previously stated, the weight

Furthermore, one can consider not just samples of a particular graphical sequence, but of an ensemble of sequences. By a similar argument to that given above for individual sequences, the weight

The ensemble contained

We have also studied the behavior of the mean and the standard deviation of the probability distribution of the logarithm of the weights of such power-law sequences as a function of

The black circles correspond to

In this section we discuss the algorithm's computational complexity. We first provide an upper bound on the worst case complexity, given a degree sequence

To determine an upper bound on the worst case complexity for constructing a sample from a given degree sequence

From Eq. 8, the expected complexity for the algorithm to construct a sample for a degree sequence of random integers chosen from a distribution

Given a particular form of distribution

The leading order of the computational complexity of the algorithm as a power of

We have tested the estimates shown in

We have solved the long-standing problem of how to efficiently and accurately sample the possible graphs of any graphical degree sequence, and of any ensemble of degree sequences. The algorithm we present for this purpose is ergodic and is guaranteed to produce an independent sample in, at most,

It is important to note that the sampling algorithm is guaranteed to successfully and systematically proceed in constructing a graph. This behavior contrasts with that of other algorithms, such as the configuration model (CM), which can run into dead ends that require back-tracking or restarting, leading to considerable losses of time and potentially introducing an uncontrollable bias into the results. While there are classes of sequences for which it is perhaps preferable to use the CM instead of our algorithm, in other cases its performance relative to ours can be remarkably poor. For example, a configuration model code failed to produce even a single sample of a uniformly distributed graphical sequence,
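To make the contrast concrete, the rejection behaviour of stub matching can be sketched as follows. This is a generic textbook form of the configuration model with restarts, written for illustration only; it is not the code used in the comparison mentioned above. The function name `configuration_model_sample` and the `max_tries` cutoff are assumptions of this sketch.

```python
import random


def configuration_model_sample(degrees, rng, max_tries=1000):
    """Stub matching with restarts; returns a set of edges or None.

    degrees   -- target degree of each node
    rng       -- a random.Random instance
    max_tries -- number of restarts before giving up (hypothetical cutoff)
    """
    # One stub per unit of degree.
    stubs = [node for node, d in enumerate(degrees) for _ in range(d)]
    if len(stubs) % 2:
        return None  # an odd number of stubs can never be fully paired
    for _ in range(max_tries):
        rng.shuffle(stubs)
        edges = set()
        simple = True
        # Pair consecutive stubs of the shuffled list.
        for a, b in zip(stubs[0::2], stubs[1::2]):
            edge = (min(a, b), max(a, b))
            if a == b or edge in edges:
                simple = False  # self-loop or multi-link: reject and restart
                break
            edges.add(edge)
        if simple:
            return edges
    return None
```

The number of restarts needed is not known in advance and depends strongly on the degree sequence. For a non-graphical input such as `[3, 3, 1, 1]`, every attempt is rejected and the function returns `None` after `max_tries` attempts.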

One of the features of our algorithm that makes it efficient is a method of calculating the left and right sides of the inequality in the Erdős-Gallai theorem using recurrence relations. Testing a sequence for graphicality can thus be accomplished without requiring repeated computations of long sums, and the method is efficient even when the sequence is nearly non-degenerate. The usefulness of this method is not limited to the graph sampling algorithm presented here; it can be used whenever a fast test of the graphicality of a sequence of integers is needed.
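The idea of avoiding repeated long sums can be illustrated with a short Python sketch. It is not the recurrence derived in this paper; it is a common variant that uses prefix sums and a single moving pointer, so that after sorting, all the Erdős-Gallai inequalities are checked in one linear pass. The function name `is_graphical_fast` and the variable names are assumptions of this sketch.

```python
def is_graphical_fast(degrees):
    """Erdős-Gallai test using prefix sums; linear time after sorting."""
    d = sorted(degrees, reverse=True)
    n = len(d)
    if n == 0:
        return True
    if d[-1] < 0 or sum(d) % 2:
        return False
    # prefix[k] = d[0] + ... + d[k-1]
    prefix = [0]
    for x in d:
        prefix.append(prefix[-1] + x)
    # p = number of entries with degree >= k; it only shrinks as k grows.
    p = n
    for k in range(1, n + 1):
        while p > 0 and d[p - 1] < k:
            p -= 1
        m = max(p, k)
        # Entries k..m-1 contribute k each; entries m..n-1 contribute their degree.
        rhs = k * (k - 1) + k * (m - k) + prefix[n] - prefix[m]
        if prefix[k] > rhs:
            return False
    return True
```

Because the pointer `p` never moves back up, the loop performs a total amount of work proportional to the length of the sequence, and no partial sum is ever recomputed.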

There are now over 6000 publications focusing on complex networks. In many of these publications various processes, such as network growth, flow on networks, epidemics, etc., are studied on toy network models that serve as “graph representatives” simply because it has become customary to study processes on them. These include the Erdős-Rényi random graph model, the Barabási-Albert preferential attachment model, the Watts-Strogatz small-world network model, random geometric graphs, etc. However, these toy models are based on specific processes that constrain their structure beyond their degree distribution. Those processes might not correspond to the ones that actually produced the structure of the networks being investigated, thus potentially introducing dangerous biases in the conclusions of these studies. The algorithm presented here provides a way to study classes of simple graphs constrained solely by their degree sequence, and nothing else. However, additional constraints, such as connectedness, or any functional of the adjacency matrix of the graph being constructed, can in principle be added to the algorithm to further restrict the graph class built.

After this paper was accepted for publication, we became aware of an unpublished work by J. Blitzstein and P. Diaconis that provides another direct construction method for sampling graphs with given degree sequences.

The authors gratefully acknowledge Y. Sun, B. Danila, M. M. Ercsey Ravasz, I. Miklós, E. P. Erdös and L. A. Székely for fruitful comments, discussions and support.