
The authors have declared that no competing interests exist.

Conceived and designed the experiments: PAG JJR BG VME. Performed the experiments: PAG. Analyzed the data: PAG. Contributed reagents/materials/analysis tools: PAG JJR BG VME. Wrote the paper: PAG JJR BG VME.

Daily interactions naturally define social circles. Individuals tend to be friends with the people they spend time with and they choose to spend time with their friends, inextricably entangling physical location and social relationships. As a result, it is possible to predict not only someone’s location from their friends’ locations but also friendship from spatial and temporal co-occurrence. While several models have been developed to separately describe mobility and the evolution of social networks, there is a lack of studies coupling social interactions and mobility. In this work, we introduce a model that bridges this gap by explicitly considering the feedback of mobility on the formation of social ties. Data coming from three online social networks (Twitter, Gowalla and Brightkite) is used for validation. Our model reproduces various topological and physical properties of the networks not captured by models uncoupling mobility and social interactions such as: i) the total size of the connected components, ii) the distance distribution between connected users, iii) the dependence of the reciprocity on the distance, iv) the variation of the social overlap and the clustering with the distance. Besides numerical simulations, a mean-field approach is also used to study analytically the main statistical features of the networks generated by a simplified version of our model. The robustness of the results to changes in the model parameters is explored, finding that a balance between friend visits and long-range random connections is essential to reproduce the geographical features of the empirical networks.

The advent of the big data revolution has opened the door to the analysis of massive datasets on all aspects of society. New technologies have given access to an unprecedented amount of information on human behavior, generated unobtrusively whenever people interact with or through modern technologies such as cell phones, online services and mobile applications. This is facilitating a computational approach to the study of problems traditionally associated with the social sciences

The relation between physical location and social interactions can also be explored with the newly available data. In general, people tend to interact and maintain relations with geographically close peers, a tendency that is reflected in a decay of the social interaction probability with physical distance. This effect has been observed, for example, in phone call records

The availability of geo-localized information has also allowed for a detailed exploration of human mobility

In this work, we lay a bridge between these two worlds by introducing a model coupling social tie formation and spatial mobility. Preceding models considering network structure and geography are uncoupled

We have collected data from online social networks containing both social links and information about the users’ physical positions. The first dataset was obtained from Twitter by means of its API

All values are given in units of 10^{3}.

Dataset | TOTAL N | TOTAL L | US N | US L | UK N | UK L | DE N | DE L |

Twitter | 714 | 15000 | 132 | 1100 | 28 | 117 | 3.8 | 8.5 |

Gowalla | 196 | 950 | 46 | 350 | 5.2 | 20 | 5.2 | 30 |

Brightkite | 58 | 214 | 27 | 167 | 3.1 | 10 | 1.3 | 7.2 |

Social interactions across country borders have particular properties and are affected by political, linguistic or cultural factors. We overcome this difficulty by restricting our analysis to the networks within each country. Intra-country mobility and social contacts account for the large majority of a user's activity

The model structure is illustrated in

The central node is the filled red circle and its neighbors are marked in blue. Directionality of links is neglected in this schematic to maintain simplicity.

Travel

Visit a randomly selected friend at their current location with probability p_{v}.

Otherwise, travel to a new location. The distance of travel is obtained from a distribution of jump lengths, while the direction is chosen proportionally to the population density at the target distance.

Friendship

With probability

With probability p_{c}, create a directed connection to a randomly chosen agent anywhere in the system.

The name of the TF model comes from the initials of these two stages, Travel and Friendship. The model is iterated until the number of created connections equals the number of links measured in the empirical networks. Despite its simplicity, the model incorporates several major features of human behavior.

The model has four input parameters, among them p_{v} and p_{c}. Taking p_{v} and p_{c} as free model parameters, we systematically explore their impact on the model results in the coming sections since, as will be shown, they are essential for generating networks comparable with the empirical ones.
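The two stages described above can be sketched as a small simulation. This is only an illustrative toy version under several assumptions that are not taken from the text: agents live on a periodic square of side `size` with uniform population (so the travel direction is drawn uniformly rather than proportionally to population density), jump lengths follow a power law with a hypothetical exponent `alpha` > 1, "friends" are taken to be the agents a user links to, and agents that share a box of side `box` connect with a hypothetical probability `p_local`. All names and default values are placeholders.

```python
import math
import random


def simulate_tf(n_agents=100, n_links=200, p_v=0.1, p_c=0.01,
                p_local=0.5, box=1.0, size=20.0, alpha=1.5, seed=0):
    """Toy TF loop: all agents travel, then links are created, until n_links exist."""
    rng = random.Random(seed)
    pos = [(rng.uniform(0, size), rng.uniform(0, size)) for _ in range(n_agents)]
    out = [set() for _ in range(n_agents)]  # out[i]: agents that i links to
    n_created = 0

    def add_link(i, j):
        nonlocal n_created
        if i != j and j not in out[i] and n_created < n_links:
            out[i].add(j)
            n_created += 1

    while n_created < n_links:
        # Travel stage: visit a friend with probability p_v, otherwise jump.
        for i in range(n_agents):
            if out[i] and rng.random() < p_v:
                pos[i] = pos[rng.choice(sorted(out[i]))]
            else:
                # Inverse-transform sample of p(d) ~ d**(-alpha) for d >= box.
                u = 1.0 - rng.random()  # uniform in (0, 1]
                d = box * u ** (-1.0 / (alpha - 1.0))
                theta = rng.uniform(0.0, 2.0 * math.pi)  # assumption: uniform direction
                pos[i] = ((pos[i][0] + d * math.cos(theta)) % size,
                          (pos[i][1] + d * math.sin(theta)) % size)

        # Friendship stage, part 1 (assumption): co-located agents may connect.
        boxes = {}
        for i, (x, y) in enumerate(pos):
            boxes.setdefault((int(x // box), int(y // box)), []).append(i)
        for members in boxes.values():
            for i in members:
                for j in members:
                    if rng.random() < p_local:
                        add_link(i, j)

        # Friendship stage, part 2: random connection anywhere, probability p_c.
        for i in range(n_agents):
            if rng.random() < p_c:
                add_link(i, rng.randrange(n_agents))
    return pos, out
```

Calling `simulate_tf(n_agents=50, n_links=60)` returns the final positions and the directed adjacency sets. The loop stops exactly when the target number of links is reached, mirroring the stopping rule stated above.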

We start by establishing a set of metrics to characterize network structure and its relation to geography. First, we measure the probability that two users have a link at a certain distance
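As an illustration of this first metric, the sketch below estimates the link probability per distance bin as the number of linked pairs divided by the number of all pairs in that bin. It is a minimal version under assumptions of our own: positions are planar coordinates with Euclidean distances (real data would need geographic distances), link direction is ignored, and the function name and bin edges are hypothetical. The loop over all pairs is quadratic, so it is only suitable for small samples.

```python
import math
from collections import Counter
from itertools import combinations


def link_probability_by_distance(pos, edges, bin_edges):
    """Per distance bin: (# linked pairs) / (# all pairs); None for empty bins."""
    def bin_of(d):
        for b in range(len(bin_edges) - 1):
            if bin_edges[b] <= d < bin_edges[b + 1]:
                return b
        return None  # distance falls outside the binned range

    linked = {frozenset(e) for e in edges}  # direction is ignored here
    all_pairs, linked_pairs = Counter(), Counter()
    for i, j in combinations(range(len(pos)), 2):
        b = bin_of(math.dist(pos[i], pos[j]))
        if b is None:
            continue
        all_pairs[b] += 1
        if frozenset((i, j)) in linked:
            linked_pairs[b] += 1
    return [linked_pairs[b] / all_pairs[b] if all_pairs[b] else None
            for b in range(len(bin_edges) - 1)]
```

In practice logarithmically spaced bin edges would be a natural choice for distances spanning several orders of magnitude, but that choice is ours and not prescribed by the text.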

Various statistical network properties are plotted for the data obtained from Twitter (red squares), Gowalla (blue diamonds), Brightkite (green triangles) and the null models (dashed lines), for the US (for the UK and Germany, see

A second metric that we consider is the degree distribution of the social networks (see

Connections in Twitter are directed: one user follows the messages emitted by another. Reciprocated connections indicate mutual interest between the two users and a closer type of social relation
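A minimal sketch of how reciprocity could be measured on a directed edge list follows. It uses the common definition, the fraction of directed links whose reverse link also exists, and a distance-binned variant; whether this matches the exact normalization used for the empirical analysis is an assumption, as are the function names and the use of planar Euclidean distances.

```python
import math


def reciprocity(edges):
    """Fraction of directed links (i, j) for which (j, i) is also present."""
    edge_set = set(edges)
    if not edge_set:
        return 0.0
    mutual = sum(1 for (i, j) in edge_set if (j, i) in edge_set)
    return mutual / len(edge_set)


def reciprocity_by_distance(pos, edges, bin_edges):
    """Reciprocity restricted to links whose length falls in each distance bin."""
    edge_set = set(edges)
    n_bins = len(bin_edges) - 1
    totals, mutual = [0] * n_bins, [0] * n_bins
    for (i, j) in edge_set:
        d = math.dist(pos[i], pos[j])
        for b in range(n_bins):
            if bin_edges[b] <= d < bin_edges[b + 1]:
                totals[b] += 1
                mutual[b] += (j, i) in edge_set
                break
    return [m / t if t else None for m, t in zip(mutual, totals)]
```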

With the aim of quantifying social closeness between users, we define the social overlap

Another well-known phenomenon in social networks is triadic closure. When an individual has close relations with two other people, chances are high that these two people end up creating a social relation between themselves. In network analysis, a magnitude that quantifies this effect is the average clustering coefficient
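For concreteness, the average clustering coefficient can be computed as sketched below. The sketch uses the standard local definition (closed pairs of neighbors divided by all pairs of neighbors, averaged over nodes) on the undirected version of the network. Two conventions here are our assumptions: nodes with fewer than two neighbors contribute zero, and only nodes that appear in at least one link are counted.

```python
from itertools import combinations


def average_clustering(edges):
    """Mean over nodes of (# linked neighbor pairs) / (# neighbor pairs)."""
    nbrs = {}
    for i, j in edges:
        if i == j:
            continue  # ignore self-loops
        nbrs.setdefault(i, set()).add(j)  # direction is dropped
        nbrs.setdefault(j, set()).add(i)
    total = 0.0
    for ns in nbrs.values():
        k = len(ns)
        if k < 2:
            continue  # convention: such nodes contribute 0
        closed = sum(1 for a, b in combinations(ns, 2) if b in nbrs[a])
        total += 2.0 * closed / (k * (k - 1))
    return total / len(nbrs) if nbrs else 0.0
```

For a triangle with one extra pendant node attached, the node coefficients are 1/3, 1, 1 and 0, giving an average of 7/12.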

Given a triangle, several configurations are possible if there is diversity in the edge lengths. The triangle can be equilateral if all the edges have the same length, isosceles if two have the same length and the other is smaller, etc. We estimate the dominant shapes of the triangles in the network by measuring the disparity

Summarizing, we have defined the following metrics to characterize network structure and its relation to geographical distance:

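the probability of a link between two users as a function of the distance between them,

the degree distribution,

the reciprocity of connections,

the social overlap,

the clustering coefficient,

the disparity of the triangle shapes.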

We will use these metrics in the coming sections to estimate the ability of the model to produce social networks comparable with those obtained from the empirical datasets.

Next, we will find a compromise between the different metrics and search for the parameter values for which a given model best fits simultaneously the various statistical properties. To do so, we define an overall error Err to quantify the difference between the networks generated with the model and the empirical ones. The parameters of the model are then explored to find the values that minimize Err. We measure the error

The properties

We perform a Latin square sampling of the parameter space of _{v} and _{v} and in a logarithmic one for _{v} in the interval
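The sampling step can be illustrated with a small Latin hypercube sampler, which in two dimensions places exactly one sample in every row and every column of a grid, in the spirit of a Latin square. This is a sketch only: the intervals, the number of samples, and the choice of which parameter is sampled on a logarithmic scale are hypothetical placeholders, since they are not recoverable from the text here.

```python
import math
import random


def latin_hypercube(n_samples, ranges, log_scale, seed=0):
    """One sample per stratum along each axis; axes flagged log are stratified in log10."""
    rng = random.Random(seed)
    columns = []
    for (lo, hi), is_log in zip(ranges, log_scale):
        a, b = (math.log10(lo), math.log10(hi)) if is_log else (lo, hi)
        strata = list(range(n_samples))
        rng.shuffle(strata)  # independent permutation of strata per axis
        column = []
        for s in strata:
            u = (s + rng.random()) / n_samples  # random point inside stratum s
            x = a + u * (b - a)
            column.append(10 ** x if is_log else x)
        columns.append(column)
    return list(zip(*columns))


# Hypothetical usage: first parameter linear in [0, 1], second logarithmic in [1e-4, 1].
samples = latin_hypercube(10, [(0.0, 1.0), (1e-4, 1.0)], [False, True])
```

Each returned tuple is one parameter combination at which the model would be simulated and its error Err evaluated.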

Values of the error Err when _{v} and

An example of the displacements between consecutive locations and of the ego networks for a sample of individuals, as generated by the TF model, is displayed in

Mobility (upper row) and ego networks (lower row) of 20 random users (different colors) for the instances of the TF model yielding the lowest error Err (see

The geo-social properties of the networks generated by the TF model are shown in

Various statistical properties are plotted for the networks obtained from Twitter data (red squares) and from simulation of the TF model (black line) for the US. Corresponding results for the UK and Germany can be found in

In this section we explore two null models that uncouple mobility and social interactions, to help us interpret the mechanisms acting in the TF model. The first null model, the spatial model (S model), is based solely on geography: it randomly connects pairs of users with a probability depending on the distance between them, but does not take network structure into account. The second null model, the linking model (L model), is in contrast based only on random linking and triadic closure, and is equivalent to the TF model without mobility. Comparing the results of these two uncoupled null models with those of the TF model demonstrates the importance of the coupling through a realistic mobility mechanism for reproducing the empirical networks.

The spatial model (S model) consists of randomly connecting pairs of users with a probability that decays as a power law of the distance between them (suggested in
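A minimal sketch of such a purely spatial null model is given below. Several details are assumptions of ours rather than part of the description above: pairs are drawn with weight d^(-alpha) until a target number of directed links is reached, distances below a cutoff `d_min` are clipped to avoid divergence, the direction of each link is chosen at random, and distances are planar Euclidean. The exponent `alpha` and all names are hypothetical.

```python
import math
import random
from itertools import combinations


def spatial_model(pos, n_links, alpha=1.0, d_min=1.0, seed=0):
    """Create n_links directed links, picking pairs with weight distance**(-alpha)."""
    rng = random.Random(seed)
    pairs = list(combinations(range(len(pos)), 2))
    if n_links > 2 * len(pairs):
        raise ValueError("more links requested than ordered pairs available")
    weights = [max(math.dist(pos[i], pos[j]), d_min) ** (-alpha)
               for i, j in pairs]
    edges = set()
    while len(edges) < n_links:
        i, j = rng.choices(pairs, weights=weights, k=1)[0]
        # Assumption: link direction is chosen at random.
        edges.add((i, j) if rng.random() < 0.5 else (j, i))
    return edges
```

Because links are placed independently of the existing network, a generator of this kind carries no mechanism for triadic closure, which is the limitation of the S model noted in the text.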

The minimal values of the error Err for the TF model, the two null models: spatial (S model) or linking (L model), and the TF model with normally or uniformly distributed travel distances.

The linking model (L model) is a simplified version of the TF model, without random mobility and the box size _{v}, whereas with probability

The geography and the structure are coupled in the TF model through the random mobility. Changes in the underlying mobility mechanism affect the quality of the results. The lowest Err values are obtained with the power-law distribution in the jump lengths, while normal or uniformly distributed jumps yield worse results (e.g., for the US the TF model has Err lower by

Simplified models that neglect either geography or network structure perform considerably worse than the TF model in reproducing the properties of real networks. Likewise, unrealistic assumptions about the human mobility mechanism yield worse results than the default TF model. To conclude, the coupling of geography and structure through a realistic mobility mechanism produces networks with significantly more realistic geographic and structural properties.

The results presented so far have been obtained at the optimal values of _{v} and _{v} while _{v}, marking the limit in which random mobility is the main mechanism for the agents’ traveling in detriment of friend visits. In this case, most of the links are created due to encounters occurring in nearby locations or are random connections, and so the distribution of triangles disparity _{v}, the reciprocity

We change the value of p_{v} while keeping

_{v} is fixed to its optimal value. The effect of _{v}: these metrics decrease at all distances with increasing

We change the value of p_{c} while keeping p_{v} fixed to its optimal value. Note that this corresponds to an exploration of the parameter space along the horizontal line crossing the minimum of Err as plotted in

A possible variation of the TF model consists of eliminating friend visits or random connections (_{v} or

In this section, we consider the L model, introduced earlier, to gain some analytical insight into the mechanisms ruling the final network structure. Although this model is a simplified version of the TF model, the results of the simulations yield a relatively low value of Err (

The clustering coefficient is defined as a ratio of all the closed triads to all triads existing in the network,

The reciprocity of connections

To calculate the degree distribution

_{c}, as in the lower plots of _{c}, as in the upper plots of

Predictions of the analysis versus results of the simulation of the L model for the clustering coefficient

The mean-field analysis of the L model shows that the friend visiting mechanism is a direct cause of triangle closure and link reciprocity. _{c}, which controls the mechanism of random connections, is high. Similarly,

We introduce a model that couples human mobility and link creation in social networks. The aim is to characterize the relation between network topology and geography observed in empirical online networks. The model has two free parameters, p_{c} and p_{v}, and, despite its simplicity, it reproduces many of the geo-social features observed in real data at a country level. Comparing the TF model with simplified null models, we find that the coupling of geography and structure through a realistic mobility mechanism produces significantly more realistic social networks than the uncoupled models.

Social links in our model are formed mostly through relational (due to triadic closure) and proximity (through spatio-temporal coincidences) mechanisms

The TF model is generic and functional for different datasets. Human mobility driven by social ties has an impact on the modeling of disease spreading, and accounting for it may improve predictions. The model can also be used in simulations of processes that involve social networks and geography, e.g., simulations of opinion formation, language evolution, or responses of a population to extreme events. Moreover, it can help to design network benchmarks with realistic geo-social properties to test, for instance, the scalability of technical solutions in online social networks in relation to the geography of their physical infrastructure.

The data analyzed are publicly available, as they come from public online social sites or data repositories (Twitter, Gowalla and Brightkite). Since our analysis relies on statistical features and not on single cases, any private information about users was removed and the analysis was performed on anonymized datasets.

Effect of p_{v} on the TF model. We change the value of p_{v} while keeping p_{c} fixed to the optimal value. Note that this corresponds to an exploration of the parameter space along the vertical line crossing the minimum of Err as plotted in


Effect of p_{c} on the TF model. We change the value of p_{c} while keeping p_{v} fixed to its optimal value. Note that this corresponds to an exploration of the parameter space along the horizontal line crossing the minimum of Err as plotted in


We would like to warmly thank Luis F. Lafuerza for helpful discussions on the analytical treatment of the model.