The authors have declared that no competing interests exist.

Analyzed the data: SR VS RPM DJTS. Contributed reagents/materials/analysis tools: SR VS RPM DJTS. Wrote the paper: SR VS RPM DJTS.

Data arising from social systems is often highly complex, involving non-linear relationships between the macro-level variables that characterize these systems. We present a method for analyzing this type of longitudinal or panel data using differential equations. We identify the best non-linear functions that capture interactions between variables, employing the Bayes factor to decide how many interaction terms should be included in the model. This method penalizes overly complicated models and identifies models with the most explanatory power. We illustrate our approach on the classic example of relating democracy and economic growth, identifying non-linear relationships between these two variables. We show how multiple variables and variable lags can be accounted for, and we provide a toolbox in R to implement our approach.

Social science usually aims to explain macro-level phenomena, such as stratification, segregation, democratisation, economic development and changes in values. Among the vast number of examples studied across sociology, politics and economics are questions such as: Does inequality decrease or increase ethnic segregation

While it is widely recognized that understanding at the micro-level is the key to causal mechanisms in sociology

Only a very small subset of social systems are characterized by sigmoidal growth curves. However, non-linear interactions between variables in social systems are common, and using differential equations to give an initial insight into macro-level relationships has a great deal of potential

The approach we take here is inspired by machine-learning and algorithmic modeling

We illustrate our approach on the classic problem of determining an interaction between GDP per capita and democracy (

The Pearson correlation coefficient between the two variables is 0.571 (p<0.01).

Our basic approach to understanding interactions between indicator variables is to model changes in one variable between times
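A minimal Python sketch of this first step follows. It assumes annual observations stored per entity as `(year, x, y)` tuples; the function name `panel_changes` and this data layout are our own illustrative choices and are not taken from the R toolbox. Each observation is paired with the one-year change that follows it, and gaps in the record are skipped rather than treated as single-year changes.

```python
# Illustrative sketch: the function name and the (year, x, y) layout are hypothetical.
def panel_changes(panel):
    """Turn panel data into rows (x_t, y_t, x_{t+1} - x_t, y_{t+1} - y_t).

    panel maps an entity (e.g. a country) to a list of (year, x, y)
    observations. Only consecutive-year pairs are used, so a gap in the
    record never produces a spurious multi-year "change".
    """
    rows = []
    for obs in panel.values():
        obs = sorted(obs)  # order each entity's record by year
        for (t0, x0, y0), (t1, x1, y1) in zip(obs, obs[1:]):
            if t1 - t0 != 1:
                continue  # skip gaps in the record
            rows.append((x0, y0, x1 - x0, y1 - y0))
    return rows


# Toy panel with two entities; entity "A" has a gap between 2001 and 2003.
panel = {
    "A": [(2000, 0.2, 1.0), (2001, 0.25, 1.1), (2003, 0.3, 1.3)],
    "B": [(2000, 0.6, 2.0), (2001, 0.62, 2.1)],
}
rows = panel_changes(panel)
```

Pooling the rows from all entities gives the data against which the right-hand sides of the differential equations are regressed.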

Our aim is to take these time series and fit an ordinary differential equations model to them. A system of differential equations with two variables can be represented as

We can think of the time series of the indicator variables for each entity

In order to model as many non-linearities as possible, we take

In our standard implementation of a two variable model, we study models of the form:

A model is defined by a subset of coefficients
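To make the idea of a model as a subset of coefficients concrete, the sketch below assumes, consistent with the regression-based fitting described in this section, that each right-hand side is linear in its coefficients, $\frac{dx}{dt} \approx \sum_k a_k \phi_k(x, y)$, where each candidate term $\phi_k$ is a product $x^i y^j$. The exponent set is a hypothetical choice for illustration and is not claimed to be the exact term library used in our implementation.

```python
from itertools import product

# Hypothetical term library: every product x**i * y**j with exponents drawn
# from EXPONENTS. Negative exponents admit ratios such as y/x. This set is an
# illustrative assumption, not the paper's exact choice.
EXPONENTS = (-1, 0, 1, 2)
TERMS = list(product(EXPONENTS, repeat=2))


def term_value(term, x, y):
    """Evaluate x**i * y**j; x and y must be non-zero when an exponent is negative."""
    i, j = term
    return x ** i * y ** j


def model_rate(model, x, y):
    """Predicted rate of change for one model.

    A model is a dict {term: coefficient}. Terms absent from the dict have
    coefficient zero, so the dict's keys are exactly the subset of active
    coefficients that defines the model.
    """
    return sum(a * term_value(t, x, y) for t, a in model.items())


# Example: dx/dt = 0.01 + 0.5 * y / x (hypothetical coefficients).
example = {(0, 0): 0.01, (-1, 1): 0.5}
rate = model_rate(example, 2.0, 4.0)
```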

In the first stage, we aim to rapidly narrow our search by finding the maximum-likelihood model for each possible number of terms
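The first stage can be sketched as an exhaustive search. For each number of terms k, the code below fits every k-term subset of the candidate terms by ordinary least squares and keeps the subset with the smallest residual sum of squares (RSS); under independent Gaussian noise with a common variance, this is the maximum-likelihood k-term model. The helper names (`solve`, `least_squares`, `best_model_per_size`) and the column-list data layout are hypothetical, and in practice a numerical library would replace the hand-written elimination.

```python
from itertools import combinations

# Illustrative sketch: helper names and data layout are hypothetical.


def solve(A, b):
    """Solve the square system A z = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if M[pivot][col] == 0.0:
            raise ValueError("singular system")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    z = [0.0] * n
    for r in reversed(range(n)):
        z[r] = (M[r][n] - sum(M[r][c] * z[c] for c in range(r + 1, n))) / M[r][r]
    return z


def least_squares(X, y):
    """Ordinary least squares via the normal equations; returns (coefficients, RSS)."""
    k = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    Xty = [sum(row[i] * v for row, v in zip(X, y)) for i in range(k)]
    coefs = solve(XtX, Xty)
    rss = sum((v - sum(a * x for a, x in zip(coefs, row))) ** 2 for row, v in zip(X, y))
    return coefs, rss


def best_model_per_size(columns, y, max_terms):
    """Return {k: (subset, coefficients, RSS)} for the lowest-RSS k-term model.

    columns[c][n] is candidate term c evaluated at observation n, and y[n] is
    the observed change. Exactly singular subsets (collinear terms) are skipped.
    """
    best = {}
    for k in range(1, max_terms + 1):
        for subset in combinations(range(len(columns)), k):
            X = [[columns[c][n] for c in subset] for n in range(len(y))]
            try:
                coefs, rss = least_squares(X, y)
            except ValueError:
                continue
            if k not in best or rss < best[k][2]:
                best[k] = (subset, coefs, rss)
    return best


# Toy data: the change is exactly 2 + 3x, with candidate terms 1, x and x**2.
xs = [float(v) for v in range(10)]
columns = [[1.0] * 10, xs, [v * v for v in xs]]
y = [2.0 + 3.0 * v for v in xs]
best = best_model_per_size(columns, y, max_terms=3)
```

The number of subsets grows combinatorially with the size of the term library, which is why this exhaustive stage is only practical for a modest number of candidate terms.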

We assume that the noise variance
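If, for this sketch, the errors are taken to be independent and Gaussian with a common variance set to its maximum-likelihood estimate $\hat\sigma^2 = \mathrm{RSS}/n$, the maximized log-likelihood depends on the data only through the residual sum of squares:

$$\log \hat{L} = -\frac{n}{2}\left(\log\frac{2\pi\,\mathrm{RSS}}{n} + 1\right).$$

The function name below is our own.

```python
import math


def gaussian_log_likelihood(rss, n):
    """Maximized log-likelihood of n residuals with residual sum of squares rss.

    Assumes i.i.d. Gaussian errors and plugs in the maximum-likelihood
    variance estimate sigma^2 = rss / n.
    """
    sigma2 = rss / n
    return -0.5 * n * (math.log(2 * math.pi * sigma2) + 1.0)
```

Because this expression decreases as RSS increases for fixed n, ranking models fitted to the same data by RSS or by log-likelihood gives the same order.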

Instead of reporting

In the second stage of our model selection algorithm, we choose the best model among those obtained in the first stage based on their ‘robustness’. Clearly,

To address this problem and evaluate the fit of these models, we adopt a Bayesian approach

The Bayes factor compensates for the increase in the dimensions of the model search space by integrating over all parameter values
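The integral over parameter values can be approximated by simple Monte Carlo: draw coefficient vectors from the prior, evaluate the likelihood at each draw, and average. The sketch below assumes, for illustration only, a uniform prior on [-bound, bound] for every active coefficient and a known noise standard deviation `sigma`; these values and all function names are hypothetical rather than the settings of our implementation. The Bayes factor of two models is the ratio of their marginal likelihoods, computed here on the log scale.

```python
import math
import random

# Illustrative sketch: prior bound, sigma and function names are hypothetical.


def log_likelihood(coefs, columns, y, sigma):
    """Gaussian log-likelihood of y for a model that is linear in its coefficients."""
    const = -0.5 * math.log(2 * math.pi * sigma ** 2)
    total = 0.0
    for n, v in enumerate(y):
        pred = sum(a * col[n] for a, col in zip(coefs, columns))
        total += const - (v - pred) ** 2 / (2 * sigma ** 2)
    return total


def log_marginal_likelihood(columns, y, sigma, bound, samples=20000, seed=0):
    """Monte Carlo estimate of log p(data | model) under a uniform prior.

    Each active coefficient is drawn independently from U(-bound, bound) and
    the likelihood is averaged over the draws. Averaging, rather than
    maximizing, is what penalizes extra terms: every additional coefficient
    spreads the prior mass over one more dimension.
    """
    rng = random.Random(seed)
    logs = [
        log_likelihood([rng.uniform(-bound, bound) for _ in columns], columns, y, sigma)
        for _ in range(samples)
    ]
    top = max(logs)  # log-sum-exp keeps the average from underflowing
    return top + math.log(sum(math.exp(v - top) for v in logs) / samples)


def log_bayes_factor(cols_1, cols_2, y, sigma, bound, **kwargs):
    """Log Bayes factor of model 1 (term columns cols_1) over model 2."""
    return (log_marginal_likelihood(cols_1, y, sigma, bound, **kwargs)
            - log_marginal_likelihood(cols_2, y, sigma, bound, **kwargs))


# Toy data in which the change is exactly 0.5 * x.
xs = [float(v) for v in range(10)]
ones = [1.0] * 10
y = [0.5 * v for v in xs]
lbf_extra = log_bayes_factor([xs], [xs, ones], y, sigma=0.2, bound=1.0)
lbf_wrong = log_bayes_factor([xs], [ones], y, sigma=0.2, bound=1.0)
```

In the toy data both [x] and [x, 1] fit perfectly, yet the unnecessary constant term spreads prior mass over a second dimension, so the log Bayes factor favours the smaller model; a model lacking the x term is rejected overwhelmingly. Plain prior sampling becomes inefficient when the posterior is much narrower than the prior, and importance sampling around the least-squares estimate is a common remedy.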

In our implementation the range of values for

We compute the Bayes factor for the models

We repeat the same process to obtain the Bayes factor plots for the

In our implementation, and the application given below, we choose to use a uniform prior distribution for

Most social systems have many interacting variables, and we can easily extend our methodology to systems with more variables. In this case, we need to find the best possible models

One approach to solve this problem is to use a model pruning algorithm that searches only a fraction of the entire model space, looking only at models with more terms that are extensions of the best models with fewer terms. For instance, suppose
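The pruning idea can be sketched as a beam-style search over term subsets: only the `keep` best k-term models are extended by one term to form the candidate (k+1)-term models. The scoring function, the `keep` parameter and the function name are illustrative assumptions; in practice the score would be the maximized log-likelihood of each subset.

```python
# Illustrative sketch: the function name, `keep` and the toy score are hypothetical.
def pruned_search(n_terms, score, max_terms, keep=3):
    """Search model space by extending only the best smaller models.

    n_terms:   number of candidate terms, indexed 0 .. n_terms - 1.
    score:     maps a sorted tuple of term indices to a number (higher is better),
               for example the maximized log-likelihood of that subset.
    max_terms: largest model size to consider.
    keep:      how many of the best k-term models are extended to k + 1 terms.
    Returns {k: (best_subset, best_score)}.
    """
    frontier = [()]
    best = {}
    for k in range(1, max_terms + 1):
        candidates = {
            tuple(sorted(model + (t,)))
            for model in frontier
            for t in range(n_terms)
            if t not in model
        }
        if not candidates:
            break
        ranked = sorted(candidates, key=score, reverse=True)
        best[k] = (ranked[0], score(ranked[0]))
        frontier = ranked[:keep]
    return best


# Toy score: each term contributes a fixed amount (a stand-in for a likelihood).
weights = [5.0, 1.0, 3.0, 0.2]
best = pruned_search(4, lambda m: sum(weights[t] for t in m), max_terms=3, keep=2)
```

With a library of m candidate terms, an exhaustive search fits every k-term subset at each size, whereas this search fits at most keep × m models per size. The price is that a good model which does not extend any of the retained smaller models can be missed.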

In social systems, there is often a lag or lead effect in variables. A change in one variable at any instant

To handle this issue, we can extend our approach by including as a new variable the time-lagged variable of interest. For instance, in a two variable system, if there is evidence to suggest that
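Constructing the lagged variable only requires re-indexing each entity's record. In the sketch below, which assumes one observation per year, x is the variable whose change is modelled (for example democracy) and y is the variable that may act with a lag (for example GDP per capita); the function name and row layout are hypothetical.

```python
# Illustrative sketch: the function name and row layout are hypothetical.
def lagged_rows(obs, lag):
    """Build rows (x_t, y_t, y_{t-lag}, x_{t+1} - x_t) from one entity's record.

    obs is a list of (year, x, y) with at most one observation per year. Years
    whose lagged or next-year values are missing are skipped, so the lagged
    value enters the model as an ordinary third variable.
    """
    by_year = {t: (x, y) for t, x, y in obs}
    rows = []
    for t, (x, y) in sorted(by_year.items()):
        if t - lag in by_year and t + 1 in by_year:
            y_lag = by_year[t - lag][1]
            x_next = by_year[t + 1][0]
            rows.append((x, y, y_lag, x_next - x))
    return rows


obs = [(2000, 0.1, 1.0), (2001, 0.2, 2.0), (2002, 0.4, 3.0), (2003, 0.5, 4.0)]
rows = lagged_rows(obs, lag=2)
```

Only 2002 has both a value two years earlier and a value one year later, so the toy record yields a single row.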

In the methodology described above, best fit regressions on

Consider, for simplicity, the two variable system. The residual errors for the models

In the case of correlated noise processes,

In the seemingly unrelated regressions approach, we obtain the regression coefficients

Using this estimated covariance matrix, we perform multiple regression using the standard methods for regressions in the presence of correlated noise
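For two equations, the seemingly unrelated regressions estimator can be sketched as feasible generalized least squares in two steps: fit each equation by ordinary least squares, estimate the 2×2 residual covariance matrix Σ from the residuals, and then solve the stacked least-squares problem weighted by Σ⁻¹ ⊗ I. The sketch assumes both equations share the same n observations; the function names are ours, not those of the R toolbox.

```python
# Illustrative sketch: function names are hypothetical.


def solve(A, b):
    """Solve the square system A z = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    z = [0.0] * n
    for r in reversed(range(n)):
        z[r] = (M[r][n] - sum(M[r][c] * z[c] for c in range(r + 1, n))) / M[r][r]
    return z


def cross(A, B):
    """Matrix product A'B for row-major data matrices with the same number of rows."""
    return [[sum(ra[i] * rb[j] for ra, rb in zip(A, B)) for j in range(len(B[0]))]
            for i in range(len(A[0]))]


def xty(X, y):
    """Vector X'y."""
    return [sum(row[i] * v for row, v in zip(X, y)) for i in range(len(X[0]))]


def ols(X, y):
    """Ordinary least squares coefficients via the normal equations."""
    return solve(cross(X, X), xty(X, y))


def residuals(X, y, coef):
    """Observed minus fitted values."""
    return [v - sum(c * x for c, x in zip(coef, row)) for row, v in zip(X, y)]


def sur(X1, y1, X2, y2):
    """Two-equation seemingly unrelated regressions by feasible GLS.

    Returns (coefficients of equation 1, coefficients of equation 2,
    (s11, s12, s22)), where the s's are the estimated residual covariances.
    """
    n = len(y1)
    e1 = residuals(X1, y1, ols(X1, y1))
    e2 = residuals(X2, y2, ols(X2, y2))
    s11 = sum(a * a for a in e1) / n
    s22 = sum(b * b for b in e2) / n
    s12 = sum(a * b for a, b in zip(e1, e2)) / n
    det = s11 * s22 - s12 * s12
    w11, w12, w22 = s22 / det, -s12 / det, s11 / det  # entries of inverse Sigma
    # Stacked normal equations: X'(Sigma^-1 kron I)X beta = X'(Sigma^-1 kron I)y.
    top = [[w11 * a for a in r1] + [w12 * b for b in r2]
           for r1, r2 in zip(cross(X1, X1), cross(X1, X2))]
    bottom = [[w12 * a for a in r1] + [w22 * b for b in r2]
              for r1, r2 in zip(cross(X2, X1), cross(X2, X2))]
    rhs = ([w11 * a + w12 * b for a, b in zip(xty(X1, y1), xty(X1, y2))]
           + [w12 * a + w22 * b for a, b in zip(xty(X2, y1), xty(X2, y2))])
    beta = solve(top + bottom, rhs)
    k1 = len(X1[0])
    return beta[:k1], beta[k1:], (s11, s12, s22)


# Toy data with positively correlated noise in the two equations.
ts = [float(t) for t in range(8)]
X = [[1.0, t] for t in ts]
noise1 = [0.1, -0.2, 0.05, 0.3, -0.1, 0.0, 0.2, -0.15]
noise2 = [0.5 * e + 0.05 * (-1) ** i for i, e in enumerate(noise1)]
y1 = [1.0 + 2.0 * t + e for t, e in zip(ts, noise1)]
y2 = [3.0 - t + e for t, e in zip(ts, noise2)]
b1, b2, cov = sur(X, y1, X, y2)
```

When both equations use identical regressors, as in this toy example, the estimator reduces exactly to equation-by-equation least squares, a standard property of seemingly unrelated regressions; differences appear only when the regressors differ and the residuals are correlated.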

If the underlying noise covariance matrix is almost diagonal, indicating that the error terms are nearly uncorrelated, the parameters estimated by the seemingly unrelated regressions approach will not differ significantly from those obtained assuming uncorrelated errors.

We now investigate a frequently studied macro-level phenomenon, the relation between democracy and GDP per capita, using our proposed methodology

We start our analysis with two variable models, in which changes in democracy (

(a) The Log Likelihoods and (b) Bayes Factors for the same models.

The best two term model for changes in democracy is

In words, this model tells us that democracy grows once GDP per capita has reached a certain threshold, with this threshold being determined by democracy itself. Specifically, democracy grows when

The best five term model, and the best overall model is

As such complex models are usually more difficult to interpret in words, we might prefer the second best model, with just two terms, whose Bayes factor is only slightly smaller. Accepting simpler models makes the interaction between GDP per capita and democracy more straightforward to interpret. On the other hand, the Bayes factor has already taken model complexity into account, so we should look carefully at what the more complex model tells us.

We can investigate the difference between the two and five term models by visualizing the functional form of the two

For (a) the two term model given by

Now that we understand better what impact GDP per capita has on democracy, we turn our attention to changes in GDP per capita,

(a) The Log Likelihoods and (b) Bayes Factors for the same models.

The model shows that GDP is primarily growing at a constant rate, but is additionally positively affected by democracy interacting with GDP. Moreover, the growth is self-limiting at high levels of GDP.

With the interactions between democracy and GDP per capita identified, they can be displayed visually using a phase portrait. We provide phase portraits with example trajectories both from the data (

From (a) country by country data and (b) model predictions made by integrating
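Trajectories such as those in a phase portrait can be obtained by numerically integrating a pair of fitted equations from an initial state. The sketch below uses the forward Euler method; the right-hand sides `f` and `g` and their coefficients are hypothetical stand-ins, not the fitted democracy and GDP per capita models.

```python
# Illustrative sketch: f, g and their coefficients are hypothetical.
def integrate(f, g, x0, y0, t_end, dt=0.1):
    """Forward-Euler solution of dx/dt = f(x, y), dy/dt = g(x, y).

    Returns the trajectory as a list of (t, x, y) tuples starting at t = 0.
    """
    t, x, y = 0.0, x0, y0
    path = [(t, x, y)]
    for _ in range(round(t_end / dt)):
        # Both updates use the current state, as the Euler method requires.
        x, y = x + dt * f(x, y), y + dt * g(x, y)
        t += dt
        path.append((t, x, y))
    return path


def f(x, y):
    """Hypothetical: for positive x, x grows only once y exceeds 1."""
    return 0.02 * x * (y - 1.0)


def g(x, y):
    """Hypothetical: y grows at a rate that falls as y rises, levelling off at 5."""
    return 0.05 - 0.01 * y


path = integrate(f, g, x0=0.2, y0=0.5, t_end=50.0)
```

Repeating the integration from a grid of initial states, or plotting the vector (f(x, y), g(x, y)) on a grid, gives the ingredients of a phase portrait. A higher-order scheme such as a Runge-Kutta method would be more accurate for the same step size.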

We now implement the various extensions presented in the methods section. We start by testing for correlated noise in the models. Computing the error correlations for

The re-estimated coefficients are only slightly different from those originally determined and do not make a significant difference to the phase portrait. Generally, it appears that the original method slightly overestimates both the effect of democracy on GDP per capita and the effect of GDP on democracy.

A natural extension to three variables in this case is the inclusion of a lag. In particular, we might expect earlier levels of GDP to influence future changes in democracy. We therefore look at changes in democracy

The Bayes factor for both three variable lagged and two variable non-lagged models are shown in

(a) Bayes factors for the two-variable

The best model (see

Here, democracy is no longer a predictor for GDP per capita, which is now predicted solely by its own values at different points in time. This is a substantially different model from the one given by

For the two-variable

For completeness, we also checked whether changes in democracy improve with

The approach to studying social systems we present here emphasizes exploratory model fitting. Exploratory approaches help to identify new and unexpected patterns and explanations

Our methodology can be applied to any social system for which reasonable amounts of longitudinal or panel data are available, that is, data with repeated measurements over time for a number of independent entities. At the macro level, the method can be used to study cross-national development dynamics, for instance the relationship between a country’s gross domestic product, child mortality and education levels. If regional or city-district data are available, the method can be used to study, for instance, neighbourhood segregation processes. At the meso level, the entities studied could be organisations, companies or schools, allowing us to study, for instance, the dynamics of female employment in companies.

In studying social systems there is seldom one single unique best fit model that fully explains the data. As we saw when comparing the Bayes factor of

We do not attempt here to give a political or sociological interpretation of the best fit models. However, the dynamic nature of the fitted models provides a starting point for thinking about causal mechanisms

The suggestion that the terms in the fitted models could relate to plausible causal mechanisms is one argument for adopting an approach based on differential equations. If our aim were only to predict future changes in indicator values, then any of a range of statistical or machine learning techniques, such as Gaussian processes or neural networks, could be employed in model fitting. We could further employ Bayesian model averaging, in which we weight models according to their Bayes factor

The authors would like to acknowledge Stamatios C. Nicolis for numerous discussions that helped shape the current paper.