Distinctiveness centrality in social networks

Andrea Fronzetti Colladon^{1}* (ORCID: http://orcid.org/0000-0002-5348-9722), Maurizio Naldi^{2,3}

1 Department of Engineering, University of Perugia, Perugia, Italy
2 Department of Law, Economics, Politics and Modern languages, LUMSA University, Rome, Italy
3 Department of Civil Engineering and Computer Science, University of Rome Tor Vergata, Rome, Italy

Editor: Gaoxi Xiao, Nanyang Technological University, Singapore

PLOS ONE (ISSN 1932-6203), Public Library of Science, San Francisco, CA USA. Research Article. Manuscript PONE-D-19-32385. doi: 10.1371/journal.pone.0233276

Subject areas: Centrality; Eigenvectors; Directed graphs; Scale-free networks; Semantics; Social networks; Network reciprocity; Programming languages.

Author contributions (both authors): Conceptualization, Data curation, Formal analysis, Funding acquisition, Investigation, Methodology, Software, Validation, Visualization, Writing – original draft, Writing – review & editing.
The authors have declared that no competing interests exist.
* E-mail: andrea.fronzetticolladon@unipg.it

Received: 22 November 2019; Accepted: 9 April 2020; Published: 22 May 2020. PLOS ONE 15(5): e0233276.

Copyright: © 2020 Fronzetti Colladon, Naldi. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
The determination of node centrality is a fundamental topic in social network studies. As an addition to established metrics, which identify central nodes based on their brokerage power, the number and weight of their connections, and the ability to quickly reach all other nodes, we introduce five new measures of Distinctiveness Centrality. These new metrics attribute a higher score to nodes keeping a connection with the network periphery. They penalize links to highly-connected nodes and serve the identification of social actors with more distinctive network ties. We discuss some possible applications and properties of these newly introduced metrics, such as their upper and lower bounds. Distinctiveness centrality provides a viewpoint of centrality alternative to that of established metrics.
Funding: M.N.: Work partially supported by MIUR, the Italian Ministry of Education, University and Research, under PRIN Project n. 20174LF3T8 AHeAD (Efficient Algorithms for HArnessing Networked Data). A.F.C.: Work partially supported by the University of Perugia, through the program "Fondo Ricerca di Base 2019", project n. RICBA19LTI ("Business Intelligence, Data Analytics e simulazioni a eventi discreti per le Smart Companies nell’era dell’Industria 4.0"). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Data Availability: All relevant data are within the manuscript and its Supporting Information files.

Introduction
The determination of node centrality is a fundamental and popular topic in social network studies [1–3], which has never stopped attracting the interest of scholars, e.g. [4–7]. The concept of centrality has been interpreted in many ways, and several metrics have been proposed to study the positional power of social actors [8, 9]. Similarly, different validation approaches have been used to assess the role of these metrics in the identification of influential nodes [10]. Three of the most famous centrality metrics (degree, closeness and betweenness centrality) were described by Freeman [2]. While degree counts how many direct connections a node has, closeness and betweenness also take indirect connections into account. Closeness is measured as the reciprocal of the sum of the lengths of the shortest paths between a node and all other nodes in the graph; it gives an idea of how quickly a social actor can reach its peers. Betweenness centrality counts how many times a node lies on the shortest paths that interconnect the other nodes, thus serving as a bridge and acquiring brokerage power.
Other studies introduced the idea that centrality not only depends on the social position of a node but also on that of its neighbours—like in the case of eigenvector centrality [11]. This metric attributes higher scores to nodes connected to other important nodes. “A person with few connections could have a very high eigenvector centrality if those few connections were to very well-connected others” [12]. For example, if a lowly graduate student publishes a paper with her/his supervisor (who has published many papers with others), the student becomes important, simply by virtue of her/his connection to the supervisor. Few connections with extremely important nodes can be enough to make a node important.
On the other hand, scholars like Burt [13] noted that there are cases where social actors exert a stronger influence if their peers are not strongly connected to each other. He posed the question of whether having a dense ego-network is beneficial to social capital and showed that individuals might hold positional advantages or disadvantages based on the network they are embedded in (i.e. on the connections among their peers). In particular, missing links among the actors in a node’s neighbourhood (structural holes) are often seen as an advantage, as the node can act as a mediator, use a divide-et-impera strategy, or combine ideas from different sources and come up with the most innovative one [14]. Conversely, a high ego-network closure is often seen as a constraint on the brokerage power of the ego, who cannot mediate among its peers. This effect is measured through network constraint, i.e. the extent to which the neighbours of a node are also connected to each other [14]. An alternative metric, based on the same logic, is effective size [13, 15], which quantifies the non-redundant part of a person’s relationships; a person’s ego-network has redundancy if her/his contacts are also connected to each other.
Several variations of the above-mentioned metrics were proposed [1, 16], as well as different algorithms for their fast computation on large graphs [17]. Indeed, metrics such as weighted betweenness centrality are costly to compute [18]. However, the majority of centrality metrics tend to attribute stronger influence to nodes that are highly connected, or which are connected to other important nodes. Connections to the network periphery, on the other hand, are often regarded as less relevant.
In this paper, we question this last assumption and propose a new set of metrics—which we call Distinctiveness Centrality (DC)—that attribute more importance to nodes which have links to loosely connected others. While we still recognize the pivotal importance of traditional centrality metrics, we also believe that there may be contexts in which connections to peripheral nodes should be valued more. For example, it might be the case that nodes with more peripheral connections keep the network together, avoiding fragmentation. These nodes may be the only ones able to reach certain peers and could be used as a seed for the diffusion of practices that promote health in the population. In other applications, for example when analysing word co-occurrence networks [19] to evaluate brand importance [20], brands with connections to distinctive words may be more important, as they show unique traits that distinguish them from competitors. They convey a different brand image. These are just some examples showing the need for new centrality metrics, which can favour non-redundant connections towards loosely connected nodes. Accordingly, we introduce a new set of indicators that capture the value of distinctive connections and add to the information captured by traditional centrality measures. Distinctiveness centrality is also relatively fast to compute, as it does not require the calculation of shortest network paths, which is necessary for other metrics instead (e.g. closeness and betweenness).
The remainder of the paper is organized as follows: in the next two sections, we define a set of five measures of distinctiveness centrality and compare them with well-known ego-network measures, to show that the information they capture is different. We also derive lower and upper bounds that could be used for normalization, to allow the comparison of scores obtained on different networks. Subsequently, we present the five metrics in the case of directed graphs. In the section named Possible Applications, we provide examples and illustrate some possible use cases. In the last section, we discuss our findings and make proposals for future research.
Definition of metrics
In this section, we present five metrics of distinctiveness centrality, which were all conceived following the same logic: giving more importance to nodes that are strongly connected to loosely connected peers, so that they make the network periphery more reachable. In the computation of network centrality, all our metrics penalize connections to hubs or nodes that are very well connected. The concept of degree centrality is reinterpreted following this logic.
Let us consider a network that we represent through a weighted undirected graph G, described by the triplet G = (V, E, W). Let V be the set of nodes, of cardinality |V| = n, E = {(x, y): x, y ∈ V, x ≠ y} be the set of arcs, and W be the set of weights associated with the arcs, with m = min_{W} w_{ij} and M = max_{W} w_{ij}, ∀i, j. If the nodes i and j are not connected, we assume w_{ij} = 0; otherwise we assume w_{ij} ≥ 1. If m = M, the graph is practically unweighted; in that case we can rescale all the weights and assume w_{ij} = 1.
For the generic node i ∈ V, we introduce five distinctiveness centrality metrics. In the following, g_{j} is the degree of node j and I_{(f)} is the indicator function, which equals 1 if f = TRUE and 0 otherwise (we will often use the indicator function to account for non-existing arcs). An exponent α ≥ 1 is used in the formulas to allow a stronger penalization of connections with highly connected nodes. In order not to clutter the notation, the exponent α will not be included as an argument of any metric, though it is clear that the value of each metric depends on it.
In the following, we do not consider the case of isolates and compute upper and lower bounds for the new metrics. Indeed, established centrality measures, such as degree, closeness, and betweenness, share the property that they can be normalized to take values in the [0, 1] range. This property is desirable since it allows making centrality statements of the low-high kind. It also allows comparing centrality across networks of different sizes. We want this property to hold for our new centrality measures as well. Here we limit ourselves to the case of connected networks, where no node is isolated, so that g_{i} ≥ 1, ∀i.
Weighted distinctiveness centrality
It is defined as
$$D_1(i) = \sum_{\substack{j=1 \\ j \neq i}}^{n} w_{ij} \log_{10} \frac{n-1}{g_j^{\alpha}}. \tag{1}$$
This metric is similar to weighted degree centrality [9], as it sums the weight of all arcs connected to a node. However, weights here are penalized based on the number of connections that a node’s peers have. For each node, the sum providing the metric’s value is made of as many terms as the degree of the node.
If we set α = 1, all the terms are non-negative. However, if a neighbouring node j is connected to all the other nodes as well (i.e. g_{j} = n − 1), its contribution to the sum is zero; the rationale is that node i adds the minimum possible improvement to the reachability of node j by connecting to it, since node j is already connected to all other nodes. Instead, if a neighbouring node j is connected to node i only, the weight of the arc w_{ij} connecting them is multiplied by the maximum possible factor log_{10}(n − 1) (the rationale here is that node j would be unreachable if it were not connected to node i).
Instead, if α > 1, all the neighbouring nodes whose degree is
$$g_j > e^{\frac{\ln(n-1)}{\alpha}} \tag{2}$$
provide a negative contribution to the sum and therefore lower the overall value of the metric.
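As a minimal sketch of Eqs (1) and (2), the Python snippet below computes D_{1} from an adjacency dictionary and the degree threshold above which a neighbour contributes negatively. The function names, the dictionary layout and the 5-node graph are illustrative assumptions, not part of the original study.

```python
import math

def d1(adj, i, alpha=1.0):
    """Weighted distinctiveness centrality D1 of node i, as in Eq (1).

    adj maps every node to a dict {neighbour: arc weight}; the graph is
    undirected, so each arc is stored in both directions.
    """
    n = len(adj)
    return sum(w * math.log10((n - 1) / len(adj[j]) ** alpha)
               for j, w in adj[i].items())

def negative_threshold(n, alpha):
    """Neighbour degree above which a term of D1 turns negative, as in Eq (2)."""
    return math.exp(math.log(n - 1) / alpha)  # same as (n - 1) ** (1 / alpha)

# Hypothetical 5-node graph: x is tied to a leaf and to a node h of degree 3.
adj = {
    "x": {"leaf": 1, "h": 1},
    "leaf": {"x": 1},
    "h": {"x": 1, "p": 1, "q": 1},
    "p": {"h": 1},
    "q": {"h": 1},
}

for alpha in (1, 2):
    print(alpha, round(negative_threshold(len(adj), alpha), 3),
          round(d1(adj, "x", alpha), 3))
```

With α = 2 and n = 5 the threshold is 2, so the tie to h (degree 3) enters the sum with the negative factor log_{10}(4/9), while the tie to the leaf keeps the maximum factor log_{10}(4).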
In order to derive the bounds of the metric, we start by considering that the maximum is achieved when all the following conditions are satisfied: a) the node i has the maximum connectivity (so that the sum has the maximum possible number of terms); b) all the neighbouring nodes of node i have minimum connectivity (g_{j} = 1, ∀j ≠ i, i.e., they are connected to node i only), which in turn guarantees that all contributions are positive; c) the weights of the arcs connected to the node i are maximum. Under these conditions, we have
$$D_1(i) \le M (n-1) \log_{10}(n-1). \tag{3}$$
It is to be noted that conditions a) and b) take place if the node i is the hub of a star topology. In addition, we note that the upper bound implied by Eq (3) does not depend on α and is then valid also when α > 1.
On the other hand, we get the minimum value of the metric when all contributions are negative and take the largest possible absolute values. Namely, the following conditions have to be satisfied: a) the neighbours of node i are connected to all other nodes (g_{j} = n − 1, ∀j ≠ i), which in turn leads to negative contributions to the sum for any α > 1; b) the node i is connected to all other nodes (which leads to the maximum possible number of terms in the sum); c) the weights of the arcs connected to node i are all maximum. We have then
$$D_1(i) \ge M (n-1) \log_{10} \frac{n-1}{(n-1)^{\alpha}} = (1-\alpha) M (n-1) \log_{10}(n-1). \tag{4}$$
It is to be noted that conditions a) and b) are met in a fully meshed network for a node whose arcs all exhibit the maximum weight. Though the lower bound implied by Eq (4) depends on α, the lower bound is valid when α = 1 as well (which leads to D_{1}(i) ≥ 0): for that case, since all contributions are non-negative, the lowest we can get is zero, which is achieved either in a full mesh topology or for a terminal node in a star topology (in that case we would have a sum made of a single zero term).
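The two extreme topologies named above can be checked numerically. The following sketch builds a star and a full mesh with all weights equal to M and compares D_{1} with the bounds of Eqs (3) and (4); the helper names and the values n = 6, M = 5, α = 2 are arbitrary choices made for illustration.

```python
import math

def d1(adj, i, alpha=1.0):
    """D1 of node i as in Eq (1); adj maps node -> {neighbour: weight}."""
    n = len(adj)
    return sum(w * math.log10((n - 1) / len(adj[j]) ** alpha)
               for j, w in adj[i].items())

def star(n, weight):
    """Star with hub 0 and leaves 1..n-1, all arcs with the same weight."""
    adj = {0: {j: weight for j in range(1, n)}}
    for j in range(1, n):
        adj[j] = {0: weight}
    return adj

def full_mesh(n, weight):
    """Complete graph on n nodes, all arcs with the same weight."""
    return {i: {j: weight for j in range(n) if j != i} for i in range(n)}

n, M, alpha = 6, 5, 2
upper = M * (n - 1) * math.log10(n - 1)                # Eq (3)
lower = (1 - alpha) * M * (n - 1) * math.log10(n - 1)  # Eq (4)

hub_score = d1(star(n, M), 0, alpha)        # hub of a star
mesh_score = d1(full_mesh(n, M), 0, alpha)  # any node of a full mesh

print(math.isclose(hub_score, upper), math.isclose(mesh_score, lower))
```

In this setting the hub of the star attains the upper bound and a node of the full mesh attains the lower bound, while a terminal node of the star scores zero when α = 1.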
Distinctiveness centrality
It is defined as
$$D_2(i) = \sum_{\substack{j=1 \\ j \neq i}}^{n} \log_{10} \frac{n-1}{g_j^{\alpha}} \, I_{(w_{ij} > 0)}. \tag{5}$$
This metric can be seen as degree centrality [9] adjusted through the same logarithmic term used in D_{1}. Alternatively, it can be seen as a variant of D_{1} where arc weights are not considered, but just the number of connections a node has. Mathematically, D_{2} is equal to D_{1} with w_{ij} = 1 for every existing arc. We can then derive the upper and lower bounds for D_{2} from those obtained for D_{1}, just by setting M = 1 in Eqs (3) and (4). We get
$$D_2(i) \le (n-1) \log_{10}(n-1), \tag{6}$$
and
$$D_2(i) \ge (1-\alpha) (n-1) \log_{10}(n-1). \tag{7}$$
Global weight distinctiveness centrality
It is defined as
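$$D_3(i) = \sum_{\substack{j=1 \\ j \neq i}}^{n} w_{ij} \log_{10} \frac{\dfrac{1}{2} \sum_{\substack{k,l=1 \\ k \neq l}}^{n} w_{kl}}{\left( \sum_{\substack{k=1 \\ k \neq j}}^{n} w_{jk}^{\alpha} \right) - w_{ij}^{\alpha} + 1}. \tag{8}$$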
Here again, the index is made of a sum of terms, where just the nodes adjacent to the node of interest i are included. Each adjacent node is accounted for through the weight of the arc connecting it to the node of interest. However, that weight is itself weighted by a logarithmic term that introduces a penalization for those nodes that are highly connected and have large arc weights. When α = 1, the denominator in the logarithm argument is the sum of the weights of the arcs connected to the adjacent node j, excluding the arc connecting it to the node of interest, plus one (so that the denominator reduces to 1, rather than 0, when j has no other connections). The numerator of the logarithm argument is just a normalization factor (the sum of all arc weights in the graph), introduced to consider the proportion of the total weights that is attributable to the connections of node j. The major difference with respect to D_{1} is that the arc weight, rather than the degree, is considered in the penalization factor.
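A small sketch of Eq (8) may help to see how numerator and denominator are assembled. The function below is an illustrative implementation under the same adjacency-dictionary assumption used for hypothetical examples; the 3-node path is made up for the purpose.

```python
import math

def d3(adj, i, alpha=1.0):
    """Global weight distinctiveness centrality D3 of node i, as in Eq (8)."""
    # Every undirected arc is stored twice in adj, hence the division by 2.
    total = sum(w for nbrs in adj.values() for w in nbrs.values()) / 2
    score = 0.0
    for j, w_ij in adj[i].items():
        # Weights of j's arcs (to the power alpha), minus the arc i-j, plus 1.
        denom = sum(w ** alpha for w in adj[j].values()) - w_ij ** alpha + 1
        score += w_ij * math.log10(total / denom)
    return score

# Hypothetical path a - b - c with weights 2 and 3 (total weight 5).
adj = {"a": {"b": 2}, "b": {"a": 2, "c": 3}, "c": {"b": 3}}

print(round(d3(adj, "a"), 4), round(d3(adj, "b"), 4))
```

For node a and α = 1 the only term is 2·log_{10}(5/4): the numerator is the total weight 5 and the denominator is (2 + 3) − 2 + 1 = 4. For node b both neighbours have no other arcs, so each denominator collapses to 1.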
As for the other metrics, we look for the lower and upper bound of the metric.
As to the lower bound, the argument of the logarithm in Eq (8) can be lowered through the following sequence of inequalities:
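$$\log_{10} \frac{\dfrac{1}{2} \sum_{\substack{k,l=1 \\ k \neq l}}^{n} w_{kl}}{\left( \sum_{\substack{k=1 \\ k \neq j}}^{n} w_{jk}^{\alpha} \right) - w_{ij}^{\alpha} + 1} \ge \log_{10} \frac{\sum_{\substack{k=1 \\ k \neq i,j}}^{n} w_{jk} + w_{ij}}{\sum_{\substack{k=1 \\ k \neq i,j}}^{n} w_{jk}^{\alpha} + 1} \ge \log_{10} \frac{\sum_{\substack{k=1 \\ k \neq i,j}}^{n} w_{jk} + m}{\sum_{\substack{k=1 \\ k \neq i,j}}^{n} w_{jk}^{\alpha} + 1} \ge \log_{10} \frac{(n-2) M + m}{(n-2) M^{\alpha} + 1} \tag{9}$$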
However, the minimum argument that we get from the last inequality may still be larger than 1 (hence the logarithm would be positive). If that’s the case, i.e. if the following condition holds:
$$\frac{(n-2) M + m}{(n-2) M^{\alpha} + 1} > 1 \;\rightarrow\; (n-2) \left( M^{\alpha} - M \right) < m - 1, \tag{10}$$
a lower bound for D_{3} is obtained for a sum made of just a single term and the minimum factor:
$$D_3(i) > m \log_{10} \frac{(n-2) M + m}{(n-2) M^{\alpha} + 1}. \tag{11}$$
On the other hand, if the condition (10) does not hold, the logarithm turns negative, and a lower bound can be obtained by considering the maximum possible number of terms in the sum, with the maximum factor:
$$D_3(i) > (n-1) M \log_{10} \frac{(n-2) M + m}{(n-2) M^{\alpha} + 1}. \tag{12}$$
It is to be noted that the lower bound in the case of negative terms in the sum has been obtained by acting separately on the logarithm and on the sum itself, though the two actions rely on conditions that may not hold at the same time: the resulting lower bound may then be quite loose.
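The case distinction driven by condition (10) can be written as a small helper that returns the bound of Eq (11) or Eq (12). This is only a sketch; the function name and the sample values of n, m and M are assumptions made for illustration.

```python
import math

def d3_lower_bound(n, m, M, alpha=1.0):
    """Lower bound for D3, choosing between Eq (11) and Eq (12)."""
    ratio = ((n - 2) * M + m) / ((n - 2) * M ** alpha + 1)
    if (n - 2) * (M ** alpha - M) < m - 1:   # condition (10): ratio > 1
        return m * math.log10(ratio)         # Eq (11): one term, factor m
    return (n - 1) * M * math.log10(ratio)   # Eq (12): n - 1 terms, factor M

# Hypothetical setting: 6 nodes, weights between m = 2 and M = 5.
for alpha in (1, 2):
    print(alpha, round(d3_lower_bound(6, 2, 5, alpha), 3))
```

With these values condition (10) holds for α = 1, giving a small positive bound, and fails for α = 2, where the logarithm is negative and the bound is multiplied by (n − 1)M.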
As to the upper bound, again, it does not seem possible to obtain a bound as tight as those of the other metrics. As in the other cases, we may list the conditions that would maximize D_{3}: a) maximizing the number of terms in the sum (8); b) maximizing the weight w_{ij} of the arc connecting the node of interest to its adjacent nodes; c) maximizing the numerator of the argument of the log; d) minimizing the denominator of the argument of the log. Unfortunately, the terms in the sum are not independent of each other, and increasing the number of terms decreases the value of the individual terms: the aims highlighted above typically conflict with each other.
A loose upper bound can be obtained if we satisfy all the conditions reported above, regardless of their interactions. In particular, we consider a sum made of the maximum possible number of terms, with the maximum factor w_{ij} = M. The maximum of the logarithm is considered when the neighbouring nodes have no other connections (to lower the denominator to just the 1 term), and the numerator is computed as for a full mesh with maximum weights (which is the maximum that the sum of all arc weights can get):
$$D_3(i) < (n-1) M \log_{10} \frac{n (n-1) M}{2}. \tag{13}$$
Weighted proportional distinctiveness centrality
This metric is defined as
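$$D_4(i) = \sum_{\substack{j=1 \\ j \neq i}}^{n} w_{ij} \frac{w_{ij}^{\alpha}}{\sum_{\substack{k=1 \\ k \neq j}}^{n} w_{jk}^{\alpha}}. \tag{14}$$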
As in the case of D_{3}, this metric is a weighted sum of arc weights, but it differs for the choice of the penalization factor. This factor is now the ratio of the weight of the arc connecting node i to the neighbouring node (raised to the power of α) to the sum of weights (raised to the power of α) of all the arcs connected to the neighbouring node. The factor penalizes nodes connected to highly connected nodes so that we expect the metric to be large for nodes that are highly connected to nodes that are poorly connected. It is to be noted that this metric is always positive (for non-isolated nodes).
Considering the nodes connected to i (those for which w_{ij} ≥ 1), we can rewrite the expression of D_{4} as
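$$D_4(i) = \sum_{\substack{j=1 \\ j \neq i}}^{n} \frac{w_{ij}^{\alpha+1}}{w_{ji}^{\alpha} + \sum_{\substack{k=1 \\ k \neq j,i}}^{n} w_{jk}^{\alpha}} = \sum_{\substack{j=1 \\ j \neq i}}^{n} \frac{w_{ij}}{1 + \sum_{\substack{k=1 \\ k \neq j,i}}^{n} \left( \dfrac{w_{jk}}{w_{ij}} \right)^{\alpha}}, \tag{15}$$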
since w_{ij} = w_{ji} for an undirected network.
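The equivalence between the definition in Eq (14) and the rewritten form in Eq (15) can be verified numerically. The sketch below implements both on a hypothetical 3-node path; names and data are illustrative assumptions.

```python
import math

def d4(adj, i, alpha=1.0):
    """Weighted proportional distinctiveness D4, direct form of Eq (14)."""
    return sum(w_ij * w_ij ** alpha / sum(w ** alpha for w in adj[j].values())
               for j, w_ij in adj[i].items())

def d4_rewritten(adj, i, alpha=1.0):
    """Same metric in the rewritten form of Eq (15)."""
    total = 0.0
    for j, w_ij in adj[i].items():
        # Ratios of j's other arc weights to w_ij (the arc back to i is excluded).
        others = sum((w_jk / w_ij) ** alpha
                     for k, w_jk in adj[j].items() if k != i)
        total += w_ij / (1 + others)
    return total

# Hypothetical path a - b - c with weights 2 and 3.
adj = {"a": {"b": 2}, "b": {"a": 2, "c": 3}, "c": {"b": 3}}

for alpha in (1, 2, 5):
    print(alpha, math.isclose(d4(adj, "a", alpha), d4_rewritten(adj, "a", alpha)))
```

For node a and α = 1 both forms give 2·2/(2 + 3) = 2/(1 + 3/2) = 0.8.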
The maximum of this metric is achieved when the following conditions hold: a) node i is maximally connected (g_{i} = n − 1); b) the arcs of node i have maximum weight M; c) the neighbouring nodes are not connected to any other node (w_{jk} = 0 for k ≠ i). These conditions are achieved for the hub node in a star network when all the arcs have weight M. We have then
$$D_4(i) \le (n-1) M. \tag{16}$$
The minimum of the metric is achieved similarly with the dual conditions: a) the node i is connected to a single node; b) the arc from node i has minimum weight m; c) the neighbour to node i is connected to all other nodes with maximum weight (w_{jk} = M for k ≠ i), so that
$$D_4(i) \ge \frac{m}{1 + (n-2) \left( \dfrac{M}{m} \right)^{\alpha}}. \tag{17}$$
Proportional distinctiveness centrality
This final metric is defined as
$$D_5(i) = \sum_{\substack{j=1 \\ j \neq i}}^{n} \frac{1}{g_j^{\alpha}} \, I_{(w_{ij} > 0)}. \tag{18}$$
This metric just considers the reciprocals of the degrees of the nodes adjacent to i, raised to the power of α. Again, the rationale is that neighbouring poorly connected nodes count more so that the most influential nodes are those connecting poorly connected nodes.
Here we have again a metric made of positive contributions, as for D_{4}, but differently from what happens for the first three metrics. This metric is maximized if we consider a node that is maximally connected, whose neighbouring nodes instead have minimal connectivity (g_{j} = 1). This is the case of the hub in a star network. The upper bound is then:
$$D_5(i) \le n - 1. \tag{19}$$
As to the lower bound, the same arguments lead us to consider a node that has minimal connectivity (g_{i} = 1), with its only neighbour having maximum degree (g_{j} = n − 1). This is what we have for any terminal node in a star network. The lower bound is therefore obtained by considering a single-term summation:
$$D_5(i) \ge \frac{1}{(n-1)^{\alpha}}. \tag{20}$$
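Both bounds of D_{5} are attained in a star network, which the following sketch illustrates. The function name and the choice n = 6, α = 2 are assumptions made for the example.

```python
import math

def d5(adj, i, alpha=1.0):
    """Proportional distinctiveness centrality D5 of node i, as in Eq (18)."""
    return sum(1 / len(adj[j]) ** alpha for j in adj[i])

# Unweighted star with hub 0 and leaves 1..n-1 (weights are irrelevant for D5).
n, alpha = 6, 2
star_adj = {0: {j: 1 for j in range(1, n)}}
for j in range(1, n):
    star_adj[j] = {0: 1}

hub, leaf = d5(star_adj, 0, alpha), d5(star_adj, 1, alpha)
print(hub, leaf)
```

The hub sums n − 1 terms equal to 1, reaching the upper bound of Eq (19), while each terminal node has a single term 1/(n − 1)^α, the lower bound of Eq (20).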
A toy application example
In this section, for the purpose of illustrating the computation and the specific features of each metric, we employ a 6-node toy network, shown in Fig 1.
Fig 1. Toy network. https://doi.org/10.1371/journal.pone.0233276.g001
Table 1 shows the different values of the distinctiveness metrics for α = 1, 2, 5. We have highlighted in red the highest value of each column and in blue the lowest one. We notice that with α > 1 connections to high-degree nodes have a stronger negative impact on centrality, such that their contribution becomes negative for D_{1}, D_{2} and D_{3}. This does not happen in the case of D_{4} and D_{5}, for which each additional arc makes a positive contribution, however small. In the network of Fig 1, for α = 1, nodes C and D have a higher distinctiveness centrality than node E. However, the centrality ranking changes significantly when we increase α. For D_{1}, D_{2}, and D_{3}, nodes C and D become less central than E already at α = 2, due to their connection with B, which is the network hub. It is also important to notice that with α = 1 the nodes with the maximum and minimum centrality are the same for all metrics (B and F respectively). However, when α > 1, these rankings change and may disagree with each other.
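Table 1. Distinctiveness centrality metrics in the toy network. https://doi.org/10.1371/journal.pone.0233276.t001

(a) α = 1

| Node | D_{1} | D_{2} | D_{3} | D_{4} | D_{5} |
|------|-------|-------|-------|-------|-------|
| A | 3.689 | 0.796 | 7.256 | 5.364 | 1.250 |
| B | 5.882 | 1.893 | 9.876 | 6.714 | 2.500 |
| C | 2.184 | 0.495 | 4.870 | 3.935 | 0.750 |
| D | 2.184 | 0.495 | 4.870 | 3.935 | 0.750 |
| E | 1.990 | 0.398 | 4.225 | 3.571 | 0.500 |
| F | 0.485 | 0.097 | 2.386 | 2.273 | 0.250 |

(b) α = 2

| Node | D_{1} | D_{2} | D_{3} | D_{4} | D_{5} |
|------|-------|-------|-------|-------|-------|
| A | 2.485 | 0.194 | 6.193 | 5.216 | 1.062 |
| B | 4.076 | 0.990 | 6.055 | 5.828 | 1.750 |
| C | -0.526 | -0.408 | 2.698 | 4.527 | 0.312 |
| D | -0.526 | -0.408 | 2.698 | 4.527 | 0.312 |
| E | 0.485 | 0.097 | 3.116 | 4.310 | 0.250 |
| F | -2.526 | -0.505 | 1.041 | 3.378 | 0.062 |

(c) α = 5

| Node | D_{1} | D_{2} | D_{3} | D_{4} | D_{5} |
|------|-------|-------|-------|-------|-------|
| A | -1.128 | -1.612 | 2.248 | 5.020 | 1.001 |
| B | -1.342 | -1.720 | -6.426 | 5.061 | 1.094 |
| C | -8.654 | -3.118 | -5.345 | 4.969 | 0.032 |
| D | -8.654 | -3.118 | -5.345 | 4.969 | 0.032 |
| E | -4.031 | -0.806 | -0.981 | 4.949 | 0.031 |
| F | -11.557 | -2.311 | -3.323 | 4.851 | 0.001 |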
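The values of Table 1 can be recomputed with a few lines of Python. Since Fig 1 is not reproduced here, the arc list in the sketch below is an assumption: it was inferred by back-solving the α = 1 values of the table (degrees from D_{2} and D_{5}, weights from D_{1}), not transcribed from the figure. The function follows Eqs (1), (5), (8), (14) and (18); all names are illustrative.

```python
import math

# ASSUMPTION: arcs and weights inferred from the alpha = 1 values of Table 1,
# not copied from Fig 1.
EDGES = {("A", "B"): 2, ("A", "E"): 5, ("B", "C"): 2,
         ("B", "D"): 2, ("B", "F"): 5, ("C", "D"): 5}

def adjacency(edges):
    """Undirected adjacency dict: node -> {neighbour: weight}."""
    adj = {}
    for (u, v), w in edges.items():
        adj.setdefault(u, {})[v] = w
        adj.setdefault(v, {})[u] = w
    return adj

def distinctiveness(adj, i, alpha=1.0):
    """Return (D1, D2, D3, D4, D5) for node i."""
    n = len(adj)
    total = sum(w for nbrs in adj.values() for w in nbrs.values()) / 2
    d1 = d2 = d3 = d4 = d5 = 0.0
    for j, w in adj[i].items():
        g_j = len(adj[j])                                # degree of neighbour j
        s_j = sum(x ** alpha for x in adj[j].values())   # sum of w_jk^alpha
        penalty = math.log10((n - 1) / g_j ** alpha)
        d1 += w * penalty
        d2 += penalty
        d3 += w * math.log10(total / (s_j - w ** alpha + 1))
        d4 += w * w ** alpha / s_j
        d5 += 1 / g_j ** alpha
    return d1, d2, d3, d4, d5

adj = adjacency(EDGES)
for node in sorted(adj):
    print(node, *(f"{x:.3f}" for x in distinctiveness(adj, node, alpha=1)))
```

Under this inferred network the computed scores agree with Table 1(a) to the third decimal place, and calling the function with alpha=2 or alpha=5 gives the values of the other two panels up to rounding.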
Table 2 additionally shows the values of some of the most popular centrality metrics, i.e. non-normalized betweenness, closeness [2] and eigenvector centrality [11]. The values of degree, Burt’s constraint and effective size metrics [13, 14] are also reported in the table. All metrics were calculated with the Python NetworkX package [21], both for the unweighted network (Table 2a) and the weighted one (Table 2b). In our network, arc weights represent the strength of relationships; inverse weights were used for the computation of network paths where needed.

Table 2. Popular centrality metrics in the toy network. https://doi.org/10.1371/journal.pone.0233276.t002
From a quick comparison of the values reported in the table, we see that the information captured by each metric is different. This is also true for the effective size measure, whose conceptualization is based on the concept of redundancy: an ego has redundancy if its contacts are connected to each other as well. Distinctiveness centrality rankings differ from those obtained through degree, closeness, betweenness, eigenvector centrality, effective size and constraint.
A comparison with established metrics
In order to extend the comparison of distinctiveness centrality with other popular and frequently used network metrics [2, 9, 11, 14], we generated 1000 random scale-free networks according to the Barabási–Albert preferential attachment model (with 50 nodes, where each new node added during network growth brings 2 arcs that are preferentially attached to existing nodes with high degree). We used the NetworkX Python package [21]. Weights of existing arcs were assigned through a uniform selection of random integers in the range [1, 20]. As we did in the previous section, we treated arc weights as the strength of relationships.
For each network, we computed the Spearman’s rank correlation coefficients for all pairs of metrics and several values of α, to see how similar their centrality rankings were. Average correlations are shown in the tables provided as Supporting Information (S2 File), for α = 1, 2 and 5.
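The experiment described above can be approximated with the standard library alone. The sketch below is not the authors' code: it replaces the NetworkX generator with a simplified preferential attachment routine, uses 100 networks instead of 1000 to stay quick, and correlates only D_{1} with degree. All function names are hypothetical.

```python
import math
import random

def barabasi_albert(n, m, rng):
    """Simplified preferential attachment (a stand-in for the NetworkX generator).

    Each new node links to m distinct existing nodes picked with probability
    proportional to degree; arc weights are uniform integers in [1, 20].
    """
    adj = {v: {} for v in range(n)}
    targets = list(range(m))   # the first new node links to the m seed nodes
    repeated = []              # each node appears once per incident arc
    for source in range(m, n):
        for t in targets:
            adj[source][t] = adj[t][source] = rng.randint(1, 20)
        repeated.extend(targets)
        repeated.extend([source] * m)
        chosen = set()
        while len(chosen) < m:
            chosen.add(rng.choice(repeated))
        targets = sorted(chosen)
    return adj

def d1(adj, i, alpha):
    """Weighted distinctiveness centrality, as in Eq (1)."""
    n = len(adj)
    return sum(w * math.log10((n - 1) / len(adj[j]) ** alpha)
               for j, w in adj[i].items())

def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=values.__getitem__)
    result = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        for k in range(pos, end + 1):
            result[order[k]] = (pos + end) / 2 + 1
        pos = end + 1
    return result

def spearman(x, y):
    """Spearman's rho, computed as the Pearson correlation of the ranks."""
    rx, ry = ranks(x), ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)

rng = random.Random(42)
for alpha in (1, 2, 5):
    rhos = []
    for _ in range(100):  # the study uses 1000 networks
        adj = barabasi_albert(50, 2, rng)
        nodes = sorted(adj)
        degree = [len(adj[v]) for v in nodes]
        scores = [d1(adj, v, alpha) for v in nodes]
        rhos.append(spearman(scores, degree))
    print(alpha, round(sum(rhos) / len(rhos), 3))
```

Because the generator and the number of replications differ from the original setup, the averages printed here should not be expected to match the Supporting Information tables exactly; the sketch only shows how such a rank-correlation comparison can be organized.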
We see no perfect overlaps (ρ = 1 or ρ = -1), which means that no two metrics are perfectly interchangeable (i.e. redundant). As expected, rankings produced by our metrics are similar to each other, since they are consistent with the same goal (attributing greater relevance to nodes bridging the network periphery). When α = 1, distinctiveness centrality metrics are most correlated with degree and weighted degree. Increasing the value of α leads to a larger penalization of connections towards high-degree nodes so that correlations with the other indicators drop and sometimes also become negative. For example, if α increases from 1 to 2, the average correlations of D_{1}, D_{2} and D_{3} with closeness and eigenvector centrality are nearly halved.
Figs 2 and 3 are more informative and show the average correlations of DC metrics with the other metrics, for more values of α. In all plots, D_{1}, D_{2} and D_{3} have the correlations that decrease more rapidly, quickly reaching negative scores (for the measure of constraint the effect is of course inverted). On the other hand, rankings obtained through D_{4} and D_{5} are the most stable, i.e. they do not change much when α is increased. There are no cases of perfect ranking overlap if we take α ≥ 1 as in the definition of distinctiveness centrality. In general, all correlations seem to stabilize above specific α thresholds.

Fig 2. Spearman’s correlation plots of DC with degree, closeness and betweenness. https://doi.org/10.1371/journal.pone.0233276.g002

Fig 3. Spearman’s correlation plots of DC with eigenvector centrality, constraint and effective size. https://doi.org/10.1371/journal.pone.0233276.g003

Directed networks
Distinctiveness centrality can be further generalized to consider directed networks, where not every arc is reciprocated, and weights may differ in dyadic relationships. Similarly to the case of in- and out-degree [9], we can calculate distinctiveness centrality on directed graphs, considering the number and weight of arcs pointing to each node. Accordingly, we indicate with g_{i}^{+} the out-degree of the generic node i and with g_{i}^{-} its in-degree. We also notice that in directed networks the arc originating at node i and terminating at node j has a weight w_{ij} that is potentially different from that of its reciprocal w_{ji}.
When conceptualizing DC for directed networks, we want to value incoming arcs more if they originate at nodes with low out-degree. Indeed, a connection from a node sending arcs towards all other nodes is considered of little value. We explain this through an example of love-letter writing. Let us consider the case where student A receives a love letter from student B, who is sending love letters to all people in the school. That letter is much less important to A than it would be if B were sending only one letter (to A). Indeed, B is ‘spamming’ the whole network, sending many outgoing arcs, so each of them gives a low contribution to the receiver’s importance. Similarly, we want to value outgoing arcs more when they reach peers with low in-degree. If the arc sent by a node is the only one, or among the few, to reach another node, that arc will be important. To continue with our example, if student A is receiving a love letter from student B only, this is much more important than the case of A receiving many love letters. Following this logic, we generalize the equations of distinctiveness centrality to the case of directed networks, thus defining in- and out-distinctiveness:
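Weighted Distinctiveness Centrality IN and OUT

$$D_1^{-}(i) = \sum_{\substack{j=1 \\ j \neq i}}^{n} w_{ji} \log_{10} \frac{n-1}{(g_j^{+})^{\alpha}}, \qquad D_1^{+}(i) = \sum_{\substack{j=1 \\ j \neq i}}^{n} w_{ij} \log_{10} \frac{n-1}{(g_j^{-})^{\alpha}}.$$

Distinctiveness Centrality IN and OUT

$$D_2^{-}(i) = \sum_{\substack{j=1 \\ j \neq i}}^{n} \log_{10} \frac{n-1}{(g_j^{+})^{\alpha}} \, I_{(w_{ji} > 0)}, \qquad D_2^{+}(i) = \sum_{\substack{j=1 \\ j \neq i}}^{n} \log_{10} \frac{n-1}{(g_j^{-})^{\alpha}} \, I_{(w_{ij} > 0)}.$$

Global Weight Distinctiveness Centrality IN and OUT

$$D_3^{-}(i) = \sum_{\substack{j=1 \\ j \neq i}}^{n} w_{ji} \log_{10} \frac{\sum_{\substack{k,l=1 \\ k \neq l}}^{n} w_{kl}}{\left( \sum_{\substack{k=1 \\ k \neq j}}^{n} w_{jk}^{\alpha} \right) - w_{ji}^{\alpha} + 1}, \qquad D_3^{+}(i) = \sum_{\substack{j=1 \\ j \neq i}}^{n} w_{ij} \log_{10} \frac{\sum_{\substack{k,l=1 \\ k \neq l}}^{n} w_{kl}}{\left( \sum_{\substack{k=1 \\ k \neq j}}^{n} w_{kj}^{\alpha} \right) - w_{ij}^{\alpha} + 1}.$$

Weighted Proportional Distinctiveness Centrality IN and OUT

$$D_4^{-}(i) = \sum_{\substack{j=1 \\ j \neq i}}^{n} w_{ji} \frac{w_{ji}^{\alpha}}{\sum_{\substack{k=1 \\ k \neq j}}^{n} w_{jk}^{\alpha}}, \qquad D_4^{+}(i) = \sum_{\substack{j=1 \\ j \neq i}}^{n} w_{ij} \frac{w_{ij}^{\alpha}}{\sum_{\substack{k=1 \\ k \neq j}}^{n} w_{kj}^{\alpha}}.$$

Proportional Distinctiveness Centrality IN and OUT

$$D_5^{-}(i) = \sum_{\substack{j=1 \\ j \neq i}}^{n} \frac{1}{(g_j^{+})^{\alpha}} \, I_{(w_{ji} > 0)}, \qquad D_5^{+}(i) = \sum_{\substack{j=1 \\ j \neq i}}^{n} \frac{1}{(g_j^{-})^{\alpha}} \, I_{(w_{ij} > 0)}.$$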
Fig 4 presents a directed toy network to illustrate the use of the metrics for directed networks. Table 3 shows the values of in- and out-distinctiveness centrality for this network when α = 1 and α = 2. We highlighted the highest value of each column in red and the lowest one in blue. Node B is certainly important due to its outgoing arcs, as it reaches all other nodes in the network except node E. However, if we consider weighted out-degree, node B has the same score as A. Both nodes reach others that would otherwise be isolated (E and F). The fact that B is sending arcs towards other nodes with more incoming connections penalizes its out-distinctiveness score, and makes it less important than A, for D_{1}^{+}, D_{3}^{+} and D_{4}^{+}. This effect is amplified for larger values of α. On the other hand, A only has an incoming arc of weight 2, originating at B, which is a node with high out-degree. This makes A less important than all other nodes at α = 1 according to D_{1}^{-}, D_{3}^{-} and D_{4}^{-}, and as important as F according to D_{2}^{-} and D_{5}^{-}. If we consider D_{1}^{-}, D_{2}^{-} and D_{5}^{-} of nodes E and F, we see that F is ranked lower. Both nodes have a single incoming arc of weight equal to 5, but F received this arc from B, which is sending links towards many other nodes, thus giving a less relevant contribution to the in-distinctiveness of F. On the other hand, node E is reached by A, which only has two outgoing arcs.
10.1371/journal.pone.0233276.g004Directed toy network.10.1371/journal.pone.0233276.t003Directed toy network distinctiveness centrality.
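(a) α = 1

| Node | D1- | D2- | D3- | D4- | D5- | D1+ | D2+ | D3+ | D4+ | D5+ |
|------|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|
| A | 0.194 | 0.097 | 0.954 | 0.364 | 0.250 | 7.689 | 1.398 | 16.248 | 11.000 | 2.000 |
| B | 2.388 | 0.398 | 4.194 | 3.273 | 0.500 | 6.485 | 2.194 | 13.488 | 8.371 | 3.000 |
| C | 3.689 | 0.796 | 8.340 | 5.364 | 1.250 | 1.194 | 0.398 | 3.000 | 1.800 | 0.500 |
| D | 2.291 | 0.796 | 5.386 | 3.364 | 1.250 | 1.990 | 0.398 | 5.000 | 3.571 | 0.500 |
| E | 1.990 | 0.398 | 3.160 | 2.273 | 0.500 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 |
| F | 0.485 | 0.097 | 3.160 | 2.273 | 0.250 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 |

(b) α = 2

| Node | D1- | D2- | D3- | D4- | D5- | D1+ | D2+ | D3+ | D4+ | D5+ |
|------|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|
| A | -1.010 | -0.505 | -0.109 | 0.216 | 0.062 | 7.689 | 1.398 | 16.248 | 11.000 | 2.000 |
| B | 0.581 | 0.097 | 0.373 | 3.541 | 0.250 | 5.280 | 1.592 | 11.418 | 7.891 | 2.500 |
| C | 2.485 | 0.194 | 7.277 | 5.216 | 1.062 | 0.291 | 0.097 | 2.334 | 2.077 | 0.250 |
| D | 1.087 | 0.194 | 4.323 | 3.216 | 1.062 | 0.485 | 0.097 | 3.891 | 4.310 | 0.250 |
| E | 0.485 | 0.097 | -0.455 | 2.049 | 0.250 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 |
| F | -2.526 | -0.505 | 1.816 | 3.378 | 0.062 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 |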
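Some of the in-distinctiveness values in Table 3 can be checked by hand for the nodes that receive a single arc. The sketch below assumes the forms D1 = Σ_j w_ij · log10((n − 1) / g_j^α), D2 = Σ_j log10((n − 1) / g_j^α) and D5 = Σ_j 1 / g_j^α from the Definition of metrics section, with g_j taken as the out-degree of the sending node when in-distinctiveness is computed. The helper name `single_arc_in_scores` is hypothetical, and the out-degree of 4 for B is inferred from the statement that B reaches every node except E.

```python
import math

N_NODES = 6  # nodes A to F in the directed toy network


def single_arc_in_scores(weight, sender_out_degree, alpha, n=N_NODES):
    """D1-, D2- and D5- of a node that has exactly one incoming arc.

    Assumed forms (see lead-in): each incoming arc contributes
    log10((n - 1) / g^alpha), weighted by the arc weight for D1,
    and 1 / g^alpha for D5, where g is the sender's out-degree.
    """
    log_term = math.log10((n - 1) / sender_out_degree ** alpha)
    return {
        "D1-": weight * log_term,
        "D2-": log_term,
        "D5-": 1 / sender_out_degree ** alpha,
    }


# E: one arc of weight 5 from A (A has two outgoing arcs)
node_e = single_arc_in_scores(weight=5, sender_out_degree=2, alpha=1)
# F: one arc of weight 5 from B (B has four outgoing arcs)
node_f = single_arc_in_scores(weight=5, sender_out_degree=4, alpha=1)
# A: one arc of weight 2 from B
node_a = single_arc_in_scores(weight=2, sender_out_degree=4, alpha=1)
# Same arc into F, with the stronger penalty alpha = 2
node_f_alpha2 = single_arc_in_scores(weight=5, sender_out_degree=4, alpha=2)

for name, scores in [("E", node_e), ("F", node_f), ("A", node_a)]:
    print(name, {k: round(v, 3) for k, v in scores.items()})
```

Under these assumptions E scores higher than F on D1-, D2- and D5- only because its sender A has fewer outgoing arcs than B. With α = 2 the log term for F becomes log10(5/16), which is negative, in line with the negative D1- and D2- entries in Table 3(b).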
Possible applications
Our metrics could have several applications and offer perspectives for future research, including, for example, the identification of prominent nodes in criminal organizations. These are sometimes organized as groups of semi-independent or entirely separate small cells, without large network hubs [22]. In such a scenario, distinctiveness centrality could effectively serve the identification of nodes that keep the network periphery together. Our metrics could also complement information obtained through other approaches. For example, they could be used to test new network fragmentation strategies meant to contain epidemics [23].
In the field of Semantic Network Analysis, Fronzetti Colladon [20] recently presented the Semantic Brand Score (SBS), a measure of brand importance that is computed from the analysis of potentially large textual data. While it does not fall within the scope of this paper to discuss the construct of brand importance, we maintain that our distinctiveness centrality metric (namely D_{2}) could be considered as an alternative to degree centrality for the measurement of Diversity (one of the components of the SBS). Indeed, we compute the SBS through a network of co-occurring words, where nodes are words appearing in the analysed texts, and links between them are weighted by the frequency of their co-occurrences. For example, if the sentence “it is a beautiful day” appears 7 times, the word nodes “beautiful” and “day” will be connected by an arc of weight 7. In this context, the SBS dimension of Diversity counts how many different textual associations exist for each node, and in particular for those nodes that are considered “brands” in the analysis. Diversity was operationalized through degree centrality [2], without penalizing the connections of the brand node to high-degree nodes. In our view, it could be useful to distinguish brands with common textual associations (shared with many other nodes) from brands that have more exclusive relationships with specific words. For this purpose, distinctiveness centrality (D_{2}) could be considered a reasonable candidate.

The idea of adjusting the SBS Diversity metric is also aligned with the logic behind the term frequency-inverse document frequency (TF-IDF) normalization process that is very often used in text analysis [24, 25]. According to Robertson [26], words within a document can be divided into those with eliteness and those without. TF-IDF helps to understand how important a word is to a document that is part of a corpus.
Specifically, we can consider the DTM (Document Term Matrix) of the corpus, where documents make up the rows and words make up the columns. This matrix is populated by values that reflect the frequency of appearance of each word in each document. However, frequency is not sufficient to understand the importance of a word to a document, just as Prevalence is not sufficient to define the SBS. There might be words, such as “and”, which add little meaning to the discourse and appear with high frequency in all documents. In order to identify distinctive words, we transform frequency values into TF-IDF values, which increase proportionally to the number of times a word appears in a document and are offset by the number of documents in the corpus that contain that word. D_{2} and our other distinctiveness centrality metrics follow the same logic: they attribute more importance to the links that more strongly connect a node with low-degree peers. In the case of a word network, strong links to distinctive words are privileged.
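The TF-IDF transformation described above can be sketched on a tiny, hypothetical corpus. The example assumes one common variant, raw term frequency multiplied by log10(N / df), where N is the number of documents and df is the number of documents containing the word. The text does not state which TF-IDF variant is intended, so the weighting formula, the corpus and the helper name `tf_idf` are illustrative assumptions.

```python
import math
from collections import Counter

# Hypothetical three-document corpus, already tokenised
corpus = [
    ["sun", "and", "sea", "and", "sand"],
    ["rain", "and", "wind"],
    ["sun", "and", "wind", "and", "rain", "and", "hail"],
]

# Document-term matrix: one row (a Counter of word frequencies) per document
dtm = [Counter(doc) for doc in corpus]
n_docs = len(dtm)

# Document frequency: in how many documents each word appears
doc_freq = Counter(word for row in dtm for word in row)


def tf_idf(row):
    """Weight each word count by log10(n_docs / document frequency)."""
    return {w: tf * math.log10(n_docs / doc_freq[w]) for w, tf in row.items()}


weights = [tf_idf(row) for row in dtm]

for i, row in enumerate(weights):
    print(i, {w: round(v, 3) for w, v in sorted(row.items())})
```

In this sketch “and” is the most frequent word but appears in every document, so its weight is zero everywhere, while a word found in a single document (such as “sea”) keeps the largest weight per occurrence. The analogy with distinctiveness centrality is that a link to a node connected to everything contributes little, whereas a link to a low-degree node contributes more.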
In the following, we provide two more examples based on the analysis of two popular real-world networks: the first (Fig 5) is the unweighted network of marital relationships between Florentine families in the 15th century (available in NetworkX); the second (Fig 6) is the weighted network of Zachary’s karate club [27] (downloaded from the accompanying material of the book by Latora and colleagues [28], https://www.complex-networks.net/datasets.html).
Fig 5. Florentine families in the 15th century, with colour and size according to D_{1} (α = 1).
Fig 6. Zachary’s karate club network.
We are interested in comparing the rankings obtained through DC with those of the other metrics considered so far. These are shown in Table 4 for the first network. Here, we use D_{1}, D_{3} and D_{5} as metrics of distinctiveness because in unweighted networks, where w_{ij} = 1 for all existing arcs, there is no difference between D_{1} and D_{2} and between D_{4} and D_{5}. In the table we used, as an example, two values of α (α = 1, 2), omitting D_{3} for α = 2 because this metric (mainly conceived for weighted networks) does not change when α increases if all arc weights are equal to 1. The rankings calculated through DC never coincide with the others in the table, which indicates that our metrics capture different information. The Medici family is ranked first for all metrics (including constraint, for which we have to consider the inverse ranking), and the Guadagni family ranks second (except for closeness and eigenvector centrality). The metrics D_{1} (for both values of α) and D_{3} (for α = 1) rank the Strozzi family third; the only other measure that does so is effective size, which however ranks the Strozzi and Albizzi families equally. If we take the first three families together (as ranked by D_{1} and D_{3} for α = 1), we can reach all other nodes in the network with a direct connection, excluding only the more peripheral Pazzi and Ginori. At α = 2 the D_{1} ranking of the Albizzi family is lower, due to its links with the highly connected Guadagni and Medici families.
Abbreviations used in Table 4: DG = degree; BTW = betweenness; CLO = closeness; EIG = eigenvector centrality; CON = constraint; ES = effective size.
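The D_{1} ranking discussed for the Florentine families can be reproduced with a short standard-library sketch. It assumes the unweighted form D_{1}(i) = Σ_j log10((n − 1) / g_j^α), summed over the neighbours j of node i, with g_j the degree of j. The edge list is transcribed here from the data behind the NetworkX `florentine_families_graph()` generator and should be checked against the library. The function names are hypothetical.

```python
import math

# Marriage ties among 15 Florentine families (assumption: transcribed from
# the NetworkX florentine_families_graph() data; verify against the library).
EDGES = [
    ("Acciaiuoli", "Medici"), ("Castellani", "Peruzzi"),
    ("Castellani", "Strozzi"), ("Castellani", "Barbadori"),
    ("Medici", "Barbadori"), ("Medici", "Ridolfi"),
    ("Medici", "Tornabuoni"), ("Medici", "Albizzi"),
    ("Medici", "Salviati"), ("Salviati", "Pazzi"),
    ("Peruzzi", "Strozzi"), ("Peruzzi", "Bischeri"),
    ("Strozzi", "Ridolfi"), ("Strozzi", "Bischeri"),
    ("Ridolfi", "Tornabuoni"), ("Tornabuoni", "Guadagni"),
    ("Albizzi", "Ginori"), ("Albizzi", "Guadagni"),
    ("Bischeri", "Guadagni"), ("Guadagni", "Lamberteschi"),
]


def adjacency(edges):
    """Build an undirected adjacency map {node: set of neighbours}."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def d1_unweighted(adj, alpha=1):
    """Assumed D1 for unit weights: sum over neighbours of log10((n-1)/g^alpha)."""
    n = len(adj)
    return {
        node: sum(math.log10((n - 1) / len(adj[nbr]) ** alpha) for nbr in nbrs)
        for node, nbrs in adj.items()
    }


adj = adjacency(EDGES)
d1 = d1_unweighted(adj, alpha=1)

# Three highest-scoring families and the nodes they do not reach directly
top3 = sorted(d1, key=d1.get, reverse=True)[:3]
reached = set(top3).union(*(adj[family] for family in top3))
unreached = set(adj) - reached

print(top3)
print(sorted(unreached))
```

With this edge list the three top-ranked families are Medici, Guadagni and Strozzi, and the only families they do not reach directly are Pazzi and Ginori, which agrees with the description in the text. Calling `d1_unweighted(adj, alpha=2)` shows how a stronger penalty changes the scores of families tied to hubs.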
In the second example, we computed the DC of the 34 members of Zachary’s karate club [27]. Fig 6 shows their friendship network, based on a two-year observation of their relationships. Arc weights represent the number of different contexts in which two individuals interacted. Due to a conflict between the instructor (node 0) and the club administrator (node 33), the club split into two (with the two partitions represented in the figure with different colours). Table 5 shows the Spearman’s correlation coefficients of distinctiveness centrality (computed for α = 1, 2) with the weighted version of the other metrics. We highlighted in red the highest value of each row and in blue the lowest one. Again, we find no information overlap, and correlations decrease when α increases. Some correlations drop faster than others as α increases, and this also depends on the network structure. In this case, D_{1} and D_{2} are the measures that exhibit the fastest decrease.
Table 5. Spearman’s correlation coefficients for the Zachary’s karate club network.
In Fig 6, nodes are coloured by partition and their size varies according to D_{2} (α = 1), with bigger nodes indicating higher values of the metric. Thicker arcs indicate stronger relationships.
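Rank correlations such as those reported in Table 5 use Spearman’s coefficient, which is the Pearson correlation computed on ranks, with tied values sharing their average rank. The sketch below is a minimal standard-library version of that computation, not the code used for the paper. For illustration only, it is applied to the D1- and D5- columns of Table 3(a) rather than to the karate club data, and the function names are hypothetical.

```python
import math


def average_ranks(values):
    """Rank values in ascending order; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the block while the next value ties with the current one
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2 + 1  # ranks are 1-based
        for k in range(pos, end + 1):
            ranks[order[k]] = mean_rank
        pos = end + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)


# D1- and D5- at alpha = 1 for nodes A to F, copied from Table 3(a)
d1_in = [0.194, 2.388, 3.689, 2.291, 1.990, 0.485]
d5_in = [0.250, 0.500, 1.250, 1.250, 0.500, 0.250]

rho = spearman(d1_in, d5_in)
print(round(rho, 3))
```

A coefficient of exactly 1 or -1 would mean that two metrics produce identical or fully reversed rankings. Values strictly between those extremes mean that the rankings differ at least in part.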
Discussion and conclusions
The conceptualization of distinctiveness centrality contributes to network theory and introduces a new perspective in network studies. The set of distinctiveness centrality metrics we have presented in this paper could be used in multiple contexts, namely in all cases where it is important to value the role of nodes connecting low-degree peers. Those nodes have more distinctive connections and are often a bridge to reach the network periphery. We have additionally evaluated the upper and lower bounds of each metric.
As shown in the Definition of metrics section, the node influence measured by distinctiveness centrality differs from that measured by degree, weighted degree, closeness, eigenvector and betweenness centrality, and by Burt’s [13, 14] constraint and effective size: our metrics capture different information. We found that Spearman’s correlation coefficients of distinctiveness centrality with popular centrality and ego-network metrics decrease as α increases, even reaching high negative values in some cases for high values of α. This property was tested on random scale-free networks [29, 30], where we found no perfect overlap of rankings with degree, closeness, betweenness, eigenvector centrality, effective size and Burt’s constraint [13] (in both their weighted and unweighted versions). Spearman’s correlation coefficients of these metrics with DC were never equal to 1 or -1. When α was bigger than 1, such correlations could become negative. This happened faster for D_{1}, D_{2} and D_{3}. On the other hand, correlations with D_{4} and D_{5} remained more stable and always positive (except for Burt’s constraint, for which the correlations are inverted).
The goal of this paper is to define distinctiveness centrality and present its main properties. We have also discussed possible applications and provided some preliminary examples based on the analysis of well-known real-world networks. However, dedicated research is needed to dig deeper into the possible applications of DC. Indeed, there might be cases in social network analysis where the most important nodes are those that hold the network periphery together, regardless of strong connections with hubs. Analysts could be interested in assessing how exclusive the connections of some nodes are, as in the case of sending and receiving love messages mentioned earlier in the paper. Even if not within the scope of this paper, we imagine distinctiveness centrality could serve the identification of social actors with many peripheral connections in sparse local communities who nevertheless have no strong relationships with central authorities. Reaching these actors could help strengthen the relationship of local clusters with central authorities, for example for goals of social inclusion or to plan interventions to reduce substance abuse [31, 32]. Similarly, individuals with high distinctiveness could be local leaders in covert networks [33], and our metrics could potentially support their identification. These are just some of the many possible hypotheses that could be tested in future studies.
Future research could further explore the properties of our newly defined centrality indicators on network topologies other than scale-free and using different arc weighting approaches. For example, core-periphery structures [34] could be considered, to see how the nodes in the densely connected core are ranked, taking into account that DC penalizes connections with highly-connected peers. The scores and rankings produced by DC metrics could be compared with those obtained through other centrality measures, also considering directed graphs, for example by comparing them with the measures of hub and authority [35].
To facilitate the calculation of distinctiveness centrality, we have created a Python package that is freely available at https://pypi.org/project/distinctiveness/. Its open-source code is available on GitHub, with examples and tutorials (https://github.com/iandreafc/distinctiveness). In the future, we plan to provide more free resources, for example packages written in other programming languages.
S1 File. Networks generated using the Python NetworkX package [21], according to the procedure presented in the section named “A comparison with established metrics”. Files are in the Gexf format. (ZIP)
S2 File. Correlations of distinctiveness centrality with well-known centrality and ego-network metrics, calculated on the S1 File. (PDF)
References
1. Kivimäki I, Lebichot B, Saramäki J, Saerens M. Two betweenness centrality measures based on Randomized Shortest Paths.
2. Freeman LC. Centrality in social networks conceptual clarification.
3. Oldham S, Fulcher B, Parkes L, Arnatkeviciute A, Suo C, Fornito A. Consistency and differences between centrality measures across distinct classes of networks.
4. Candeloro L, Savini L, Conte A. A New Weighted Degree Centrality Measure: The Application in an Animal Disease Epidemic.
5. Joyce KE, Laurienti PJ, Burdette JH, Hayasaka S. A New Measure of Centrality for Brain Networks.
6. Piraveenan M, Prokopenko M, Hossain L. Percolation Centrality: Quantifying Graph-Theoretic Impact of Nodes during Percolation in Networks.
7. Qi X, Fuller E, Wu Q, Wu Y, Zhang CQ. Laplacian centrality: A new centrality measure for weighted networks.
8. Bonacich P. Power and centrality: A family of measures.
9. Wasserman S, Faust K.
10. Batool K, Niazi MA. Towards a Methodology for Validation of Centrality Measures in Complex Networks.
11. Bonacich P. Some unique properties of eigenvector centrality.
12. Hansen DL, Shneiderman B, Smith MA, Himelboim I. Social network analysis: measuring, mapping, and modeling collections of connections.
13. Burt RS.
14. Burt RS. Structural holes and good ideas.
15. Borgatti SP. Structural holes: Unpacking Burt’s redundancy measures.
16. Devkota P, Danzi MC, Wuchty S. Beyond degree and betweenness centrality: Alternative topological measures to predict viral targets.
17. Yang J, Chen Y. Fast Computing Betweenness Centrality with Virtual Nodes on Large Sparse Networks.
18. Brandes U. A faster algorithm for betweenness centrality.
19. Evert S.
20. Fronzetti Colladon A. The semantic brand score.
21. Hagberg A, Swart P, Schult D. Exploring network structure, dynamics, and function using NetworkX. Los Alamos National Lab. (LANL), Los Alamos, NM (United States); 2008.
22. Jackson BA. Terrorist cells.
23. Chami GF, Ahnert SE, Kabatereine NB, Tukahebwa EM. Social network fragmentation and community health.
24. Spärck Jones K. IDF term weighting and IR research lessons.
25. Spärck Jones K. A statistical interpretation of term specificity and its application in retrieval.
26. Robertson S. Understanding inverse document frequency: On theoretical arguments for IDF.
27. Zachary WW. An information flow model for conflict and fission in small groups.
28. Latora V, Nicosia V, Russo G.
29. Barabási AL, Albert R. Emergence of scaling in random networks.
30. Barabási AL, Bonabeau E. Scale-free networks.
31. Valente TW, Ritt-Olson A, Stacy A, Unger JB, Okamoto J, Sussman S. Peer acceleration: effects of a social network tailored substance abuse prevention program among high-risk adolescents.
32. McCrady BS. To have but one true friend: implications for practice of research on alcohol use disorders and social network.
33. Carley KM. Destabilization of covert networks.
34. Borgatti SP, Everett MG. Models of core/periphery structures.
35. Kleinberg JM. Hubs, authorities, and communities.