The authors have declared that no competing interests exist.

The advent of social networks revolutionized the way people access information sources. Understanding the complex relationship between these sources and users is crucial. We introduce an algorithm, which we call PopRank, to assess both the

Social media and microblogging platforms have deeply reshaped the way users access content, communicate, and get informed. People can access an unprecedented amount of information: on Facebook alone, more than 3M posts are generated per minute [

As far as we know, misinformation spreading on social media is directly related to the increasing polarization and segregation of users [

The present paper adopts a similar methodology, introducing an algorithmic assessment of the nodes of the bipartite pages-users network by leveraging its structure. Obviously, the quantities of interest and the observed dynamics are different from the original field of application of the Fitness approach, i.e. macroeconomics. This imposes different methodological choices and, in particular, a different mathematical formulation of the problem and a new algorithm that we name PopRank. The output of this algorithm is an assessment of pages’ impact and can be used to predict the users’ activity on such pages.

The rest of the paper is organized as follows. In the Methods section we describe the database we use to build the pages-users network and to quantify the future activities of users; we then introduce the PopRank algorithm to measure the Impact of pages and the Engagement of users. In the Results section we show the predictive power of our Impact measure and its dependence on the algorithm parameters; we then analyze the possible effects of users’ polarization. We conclude with a discussion of the implications of our results and some possible future applications.

Our database is a subset of the US Facebook database of [

The quantities we are interested in are essentially three:

the monthly Activity

the monthly Activity

the number of users commenting on a page, possibly divided into groups on the basis of their polarization level (which can be proxied, for instance, by counting how many comments they leave on the same page).

All these quantities can be organized in matrices and seen as the weights of three bipartite pages-users networks, as illustrated in ^{6} users. Notice that both the number of active (i.e. posting) pages and active (i.e. commenting) users can vary month by month.

The database consists of the history of interactions (likes, comments) of Facebook users with Facebook pages. In this form, it corresponds to a bipartite graph whose edges carry a time tag (i.e. when the interaction happened) and can therefore be multiple (each user can comment on the same page at different times).

In total, we have 61 biadjacency matrices ^{m}, one for each month, where the element ^{th} month. Notice that such matrices can have different dimensions; in particular, we observe that both the number of pages and the number of users increase with time. However, since in the first months of the database the matrices ^{m} have a very limited number of elements, we consider only the months from the 40^{th} onwards. We further divide the remaining months into a ^{th} to the 55^{th}) and a ^{th} to the 61^{st}).

In order to build a reliable algorithmic assessment of pages’ impact, we want to aggregate such information in one global biadjacency matrix _{u,p} indicates the number of comments the users ^{th} to the 55^{th} month; as a further filter, we check that there are no inactive pages (i.e. whose total number of comments in the 15 month period is less than 5).
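The aggregation and filtering step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code used for the analysis: the input format (one `(user, page, month)` tuple per comment), the function name `aggregate_training_window` and the toy data are hypothetical, and the window bounds in the toy call are illustrative values passed as parameters.

```python
from collections import Counter


def aggregate_training_window(interactions, first_month, last_month, min_comments=5):
    """Sum comment counts per (user, page) pair over [first_month, last_month].

    `interactions` is an iterable of (user, page, month) tuples, one per comment
    (a hypothetical input format). Pages that collect fewer than `min_comments`
    comments inside the window are treated as inactive and dropped.
    """
    weights = Counter()      # comments of each user on each page
    page_totals = Counter()  # total comments received by each page
    for user, page, month in interactions:
        if first_month <= month <= last_month:
            weights[(user, page)] += 1
            page_totals[page] += 1
    return {
        (user, page): count
        for (user, page), count in weights.items()
        if page_totals[page] >= min_comments
    }


# Toy data: page "pA" collects 5 comments in the window, page "pB" only 1,
# and the last comment falls outside the window.
toy = (
    [("u1", "pA", 40)] * 3
    + [("u2", "pA", 41)] * 2
    + [("u1", "pB", 42)]
    + [("u3", "pA", 60)]
)
W = aggregate_training_window(toy, first_month=40, last_month=55)
print(W)  # {('u1', 'pA'): 3, ('u2', 'pA'): 2}
```

A sparse dictionary keyed by `(user, page)` is used here because most user-page pairs have no comments at all; a sparse matrix type would serve the same purpose.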

We now resort to an economic analogy: the matrix _{u,_} indicate how user _{_,p} indicate which are the users “investing” their time on page

Originally introduced in an economic context by Balassa [_{up} = 1 only if the share of comments of user
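For illustration, the standard form of Balassa's Revealed Comparative Advantage can be applied to a user-page comment matrix as sketched below. The threshold of 1 and the `>=` comparison follow the usual economic convention and are assumptions here; the function name `rca_binarize` and the toy weights are hypothetical.

```python
def rca_binarize(W, threshold=1.0):
    """Binarize a weighted user-page matrix with the standard Balassa RCA.

    `W` maps (user, page) to a comment count. The RCA of the pair is the share
    of the user's comments that go to the page, divided by the share of all
    comments that go to that page. The entry is set to 1 when the RCA reaches
    `threshold` (assumed to be 1, the usual convention) and to 0 otherwise.
    """
    user_total, page_total, total = {}, {}, 0
    for (user, page), w in W.items():
        user_total[user] = user_total.get(user, 0) + w
        page_total[page] = page_total.get(page, 0) + w
        total += w
    M = {}
    for (user, page), w in W.items():
        user_share = w / user_total[user]       # share of the user's comments on this page
        page_share = page_total[page] / total   # share of all comments on this page
        M[(user, page)] = 1 if user_share / page_share >= threshold else 0
    return M


# Toy weights: each user concentrates on one page.
W_toy = {("u1", "pA"): 3, ("u1", "pB"): 1, ("u2", "pA"): 1, ("u2", "pB"): 3}
M_toy = rca_binarize(W_toy)
print(M_toy)  # {('u1', 'pA'): 1, ('u1', 'pB'): 0, ('u2', 'pA'): 0, ('u2', 'pB'): 1}
```

In the toy example each page receives half of all comments, so a user is linked to a page only when more than half of that user's comments go there.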

As discussed in the Introduction, we would like to extract information from the users’ activity on the page, adopting a philosophy inspired by the approach used by Google’s PageRank [

Using the same spirit of the Fitness and Complexity algorithm [_{p} of page _{u} of user ^{(0)} up to the stationary point ^{(∞)}. The iterative procedure consists in computing ^{(0)}, and then ^{(1)} and so on, until a convergence criterion is reached. Using extensive numerical simulations, it has been shown that the fixed point of the Fitness and Complexity algorithm is unique [
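To make the iterative procedure concrete, the sketch below implements the original Fitness and Complexity map from the economic literature, which is the algorithm referred to above and not PopRank itself. Rows play the role of countries and columns the role of products. The function name, the dense list-of-lists input and the fixed number of iterations are assumptions made for illustration, and every column is assumed to contain at least one nonzero entry.

```python
def fitness_complexity(M, n_iter=20):
    """Iterate the Fitness and Complexity map on a binary matrix M[row][col].

    At each step the row score (fitness) is the sum of the scores of the
    columns the row is linked to, and the column score (complexity) is the
    inverse of the sum of the inverse fitnesses of the rows linked to it.
    Both vectors are divided by their mean after every iteration.
    """
    n_rows, n_cols = len(M), len(M[0])
    F = [1.0] * n_rows  # initial condition F^(0) = 1
    Q = [1.0] * n_cols  # initial condition Q^(0) = 1
    for _ in range(n_iter):
        F_new = [sum(M[r][c] * Q[c] for c in range(n_cols)) for r in range(n_rows)]
        Q_new = [1.0 / sum(M[r][c] / F[r] for r in range(n_rows)) for c in range(n_cols)]
        mean_F = sum(F_new) / n_rows
        mean_Q = sum(Q_new) / n_cols
        F = [f / mean_F for f in F_new]
        Q = [q / mean_Q for q in Q_new]
    return F, Q


# Perfectly nested toy matrix: row 0 is linked to every column,
# column 2 is reached only by row 0.
M_nested = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
]
F, Q = fitness_complexity(M_nested)
print(F, Q)
```

On the nested toy matrix the most diversified row obtains the highest fitness, and the column reached only by that row obtains the highest complexity.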

We now turn our attention both to the explicit mathematical formulation of the PopRank algorithm and to its connection with the users’ behavior. A reasonable assumption about the Impact is that pages with higher Impact attract a lot of users, so the total number of users commenting page

Finally, as in [_{p} is the Impact of page _{u} is the Engagement of user

The algorithm is iterated until convergence in ranking, using the methodology introduced in [^{6}, which means that the next change in rankings is expected to happen not before 10^{6} iterations. We point out that this kind of stopping criterion is necessary for this kind of algorithm, in particular for sparse matrices. In fact, depending on the specific structure of the input matrix _{p} and _{u} may converge to 0 [
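The idea of stopping on the ranking rather than on the values can be illustrated with the simplified check below. It is a stand-in and not the criterion used here: instead of estimating when the next change in ranking is expected, it simply stops once the ranking has stayed unchanged for `patience` consecutive iterations. All names (`ranking`, `iterate_until_stable_ranking`, `patience`) and the toy dynamics are hypothetical.

```python
def ranking(scores):
    """Return item indices ordered from highest to lowest score (ties by index)."""
    return tuple(sorted(range(len(scores)), key=lambda i: (-scores[i], i)))


def iterate_until_stable_ranking(step, state, scores_of, patience=10, max_iter=10_000):
    """Apply `step` until the ranking is unchanged for `patience` iterations.

    `step` maps a state to the next state and `scores_of` extracts the vector
    whose ranking is monitored. Returns the final state and the number of
    iterations performed.
    """
    last = ranking(scores_of(state))
    unchanged = 0
    for n in range(1, max_iter + 1):
        state = step(state)
        current = ranking(scores_of(state))
        unchanged = unchanged + 1 if current == last else 0
        last = current
        if unchanged >= patience:
            return state, n
    return state, max_iter


# Toy dynamics: item 0 grows by a factor 1.5 per step and overtakes item 1
# at the second iteration; afterwards the ranking no longer changes.
final_state, n_done = iterate_until_stable_ranking(
    step=lambda s: [s[0] * 1.5, s[1]],
    state=[1.0, 2.0],
    scores_of=lambda s: s,
    patience=3,
)
print(n_done, ranking(final_state))  # 5 (0, 1)
```

A fixed `patience` can stop too early when two scores approach each other slowly, which is the motivation for criteria based on the expected time of the next ranking change.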

We now compare the output of our algorithm, and in particular the Impact _{p} of the Facebook page

The PopRank algorithm can also predict how many comments will be posted on that page and the number of users who will comment on its posts. In particular, we show the results for

As seen from the pictures, there is a high correlation between the two quantities ^{2} of a linear regression model, reaches a maximum value ^{2} ∼ 0.46 for ^{−12}). We use the Impact ranking, rescaled in such a way that the page with the highest Impact has an Impact ranking equal to 1. The use of the ranking will be fundamental to compare the results of the algorithm as a function of the exponent
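The two ingredients of this comparison, a ranking rescaled so that the top page has value 1 and the coefficient of determination of a linear fit, can be sketched as follows. The rescaling convention (rank divided by the number of pages), the function names and the toy numbers are assumptions; `future_activity` is a placeholder for whichever test-period quantity is being predicted.

```python
def rescaled_ranking(scores):
    """Rank scores so that the highest gets 1 and the lowest gets 1/N.

    Ties are broken by index. The rank/N rescaling is an assumed convention.
    """
    n = len(scores)
    order = sorted(range(n), key=lambda i: (scores[i], i))
    rank = [0.0] * n
    for position, i in enumerate(order, start=1):
        rank[i] = position / n
    return rank


def r_squared(x, y):
    """Coefficient of determination of the least-squares line y = a*x + b."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((b - (slope * a + intercept)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - mean_y) ** 2 for b in y)
    return 1.0 - ss_res / ss_tot


# Toy Impact values for four pages and a placeholder for their future activity.
impact = [0.2, 3.0, 1.1, 0.7]
future_activity = [1.0, 4.2, 2.9, 2.1]
impact_rank = rescaled_ranking(impact)
print(impact_rank)  # [0.25, 1.0, 0.75, 0.5]
print(r_squared(impact_rank, future_activity))
```

Working with the ranking instead of the raw Impact keeps the predictor on the same scale for every value of the exponent, which makes the resulting fits directly comparable.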

As discussed in the Introduction, previous studies have found a substantial symmetry between the polarization dynamics regardless of the specific conveyed information. Our dataset contains both scientific and conspiracy pages, so a natural question is whether belonging to one of these groups affects the results of our analysis. In order to investigate this possible dependence we compute the residuals of the linear fit shown in

Scientific and conspiracy pages show a similar behavior with respect to the residuals. By contrast, the Impact ranking shows a slight discriminative power since, on average, conspiracy pages have a lower ranking than scientific ones.

Now we test how the predictions given by our assessment of pages’ Impact depend on the exponent _{p} is the Activity of or on page _{u}_{up} (dark blue line, obviously independent from the value of the exponent). Notice that the Fitness and Complexity algorithm (which is recovered in the case

Negative exponents give better results.

We now analyze the possible dependence of the algorithm performance on users’ polarization. In order to quantify how much a page engages polarized users, following the results of [^{⋆} = 0.1, 0.2 … 1, where

We divide users according to their polarization and count the number of users, belonging to a given group, that comment on a given page. The PopRank algorithm can predict such values with similar performance across the groups, always outperforming a simpler measure of Popularity.

We point out that, in principle, other measures of polarization could be taken into account, for instance replacing the fraction of comments with the fraction of likes, or considering the distribution of lurkers. However, as noted in [

In this paper we have introduced a novel algorithm, called PopRank, to rank both Facebook pages and users on the basis of their mutual interaction. To do so we have built a bipartite network whose links indicate that a given user comments on the posts of a given page more than a suitable average. The bi-adjacency matrix of the network is the only input of PopRank, whose output is a quantitative assessment of pages’ Impact and users’ Engagement. In particular, we compute each of the two quantities as a function of the other, iterating a system of coupled equations up to convergence. The general idea is that pages with a strong Impact are commented on by many users with a low Engagement, and users have a high Engagement if they comment on many pages with a high Impact. The Impact can be used to successfully predict the activity of and on users on a given page with a six-month time delay. This result is robust with respect to reasonable variations of the algorithm’s only parameter

These results have been obtained by analyzing Facebook pages without any discrimination based on their informational content. This means that, for instance, scientific dissemination and fake news are processed in the same way and show the very same behavior: in particular, the relationship between their Impact and the future activity of their users is practically the same. This finding confirms the substantial symmetry between pages (and users) of different opinion, regardless of the possible veracity, if any, of the conveyed information.

Our approach is, as far as we know, the first attempt to leverage the bipartite pages-users network to simultaneously assess both the

The authors would like to thank Giulio Cimini, Emanuele Pugliese and Andrea Tacchella for early discussions.