The Connectedness to Nature Scale (CNS) has been used in many different countries and settings. However, no one has yet tested the equivalence of these measures. Equivalence of measures has been the subject of much research in recent years, because comparisons between groups require that the instrument measure in the same way in each group. The present work studied the differential item functioning (DIF) of the CNS in a Spanish group and a North American group of respondents, using two different methods of detecting DIF. It also evaluated the overall equivalence of the scale. The results reveal differential functioning in most items, and only configural invariance holds. Thus, we suggest a reappraisal of the scale when comparing results from different countries, since otherwise the conclusions drawn might be incorrect.

In Western culture nature plays an increasingly less important role in people’s existence. More and more people live in cities, and also spend more time in climate-controlled buildings rather than outdoors. However, nature continues to have an intrinsically positive value for people [

Much research has been conducted in recent years on individuals’ connection to nature, which has in turn been linked to pro-environmental concern and behavior. These works can be grouped together under the perspective of environmental connectedness [

Many methods for measuring nature connectedness exist (see [

Equivalence of measures is a topic that has acquired a great deal of significance in research. In research based on multi-group comparisons, it is assumed that the instrument generally works in exactly the same way in different groups and that the construct of interest has the same structure. However, this is rarely demonstrated [

A test or scale presents equivalence (or invariance) of measures in various groups if respondents with the same score on the latent trait have the same expected score on the item, on the total test score, or on both [

The validity of comparisons of scores obtained in different countries or in different cultures is vital in applied and cultural psychology [

Quantitative assessment of equivalence has been approached in a variety of ways. Most research on

The use of polytomous items requires reconsidering some of the psychometric procedures created specifically for dichotomous items [

There are two types of procedures for the detection of differential functioning: those based on the observed score and those based on the latent variable [

The major aim of the present work is to determine whether there exists an equivalence of measures in the CNS between two groups, one from the United States and one from Spain. A second aim is to compare different methods in the detection of differential functioning in empirical samples, since most of the previous comparisons have been conducted in simulated samples (e.g., [

The first of the methods, the Generalized Mantel-Haenszel test [

It is hypothesized that there is no differential item functioning between the American and Spanish groups, and that, therefore, the two versions are equivalent.

As the reference group, we used a sample of 361 American individuals with a mean age of 31.29 years (SD = 17.06). This group was taken from the studies by Mayer and Frantz, authors of the original CNS [

As a comparison group, we used data from different studies in which the Connectedness to Nature scale was administered in Spanish to students of psychology [

To match the sample size of both groups, a random sample of 384 cases from the Spanish group was selected. Therefore, the total sample used in the comparison of the groups consisted of 745 cases.

We used the CNS [

The study design is a cross-sectional survey, which allows information on the variables of interest to be collected through the CNS. The two groups used, Spanish and North American, are incidental samples, meaning they were selected because of their availability.

First, to test the mean equality hypothesis between the Spanish group and the American group, a Mann-Whitney U test for independent samples was performed. Subsequently, the dimensionality of the scale was analyzed, using the FACTOR 9.3 program [

To detect differential functioning in the items and to test the hypothesis of the present work, three procedures were used. The first is the generalized Mantel-Haenszel statistic for ordinal response variables, Q_{MH(2)}, which tests the null hypothesis of no association between variables; the null hypothesis is rejected if p(Q_{MH(2)}) < 0.05. Rejecting the null hypothesis for an item implies that the item exhibits DIF. This analysis was conducted using the GMHDIF program [

Logistic regression was conducted using a proportional odds model for the items that complied with the parallel lines assumption, and a partial proportional odds model for the remaining items. We followed the criterion of Swaminathan and Rogers [

Finally, to determine whether the questionnaire is equivalent in the two languages, the invariance of the test in English and Spanish was analyzed, examining three aspects: 1) dimensionality; 2) measurement pattern and 3) error variance. This analysis was implemented using the LISREL 8 program [

The level of significance used in the analyses was .05. Bonferroni correction was used in the DIF analysis.
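As a rough illustration of the first procedure described above, the sketch below computes a Mantel-type mean-score chi-square with 1 degree of freedom across score strata. It assumes that the generalized Mantel-Haenszel statistic used here corresponds to this mean-score form (which would be consistent with a test on 1 df), and that response categories are scored with consecutive integers. It is not the GMHDIF program; the function name `mantel_mean_score` and the toy counts are hypothetical.

```python
import math

def mantel_mean_score(strata, scores):
    """Mantel-type mean-score chi-square (1 df) for a two-group ordinal item.

    strata: list of (reference_counts, focal_counts) pairs, one per score
            stratum; each element lists the counts per response category.
    scores: numeric score assigned to each response category (an assumption).
    """
    observed = expected = variance = 0.0
    for ref, foc in strata:
        n_ref, n_foc = sum(ref), sum(foc)
        n = n_ref + n_foc
        if n_ref == 0 or n_foc == 0 or n < 2:
            continue  # a stratum with an empty group carries no information
        col = [r + f for r, f in zip(ref, foc)]            # category totals
        s1 = sum(y * c for y, c in zip(scores, col))       # sum of scores
        s2 = sum(y * y * c for y, c in zip(scores, col))   # sum of squared scores
        observed += sum(y * r for y, r in zip(scores, ref))
        expected += n_ref * s1 / n
        variance += n_ref * n_foc * (n * s2 - s1 ** 2) / (n ** 2 * (n - 1))
    if variance == 0:
        raise ValueError("no variability in responses; statistic undefined")
    chi2 = (observed - expected) ** 2 / variance
    p_value = math.erfc(math.sqrt(chi2 / 2))  # chi-square upper tail, 1 df
    return chi2, p_value

# Toy data (hypothetical): two strata, five response categories.
toy = [([5, 8, 10, 6, 3], [9, 10, 7, 4, 2]),
       ([2, 4, 9, 11, 8], [4, 7, 10, 8, 5])]
chi2, p = mantel_mean_score(toy, scores=[1, 2, 3, 4, 5])
print(f"chi2 = {chi2:.3f}, p = {p:.3f}")
```

Stratifying on the total score is what makes this a DIF test rather than a plain group comparison: the two groups are only compared among respondents with similar overall scores.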

Homogeneity of the variances of the total scale score across the two groups (American and Spanish) was tested using the F test. The results lead us to reject the homogeneity hypothesis (F(1, 743) = 12.81, p < .005).

The Mann-Whitney U test for independent samples, used to test the hypothesis of equal total scores on the scale, was not statistically significant (Z = -.392, p > .05). Thus, there is no evidence that the US and Spanish groups differ in their average Connectedness to Nature score.

The CNS was created as a unidimensional measure. However, some authors have indicated that more than one component could be measured (e.g. [

In the differential functioning analysis of the items using the generalized Mantel Haenszel method (MH), we used four strata formed from participants’ total scores on the scale [

Item | Step 1: Q_{GMH(2)} (χ^{2}) | df | Sig. | Step 2: Q_{GMH(2)} (χ^{2}) | df | Sig. |
---|---|---|---|---|---|---|
1 | 0.515 | 1 | 0.473 | 0.001 | 1 | 0.975 |
2 | 3.990 | 1 | 0.046 | 6.116 | 1 | 0.013 |
3 | 3.990 | 1 | 0.046 | 3.065 | 1 | 0.080 |
4 | 22.313 | 1 | 0.000 | 20.382 | 1 | 0.000 |
5 | 17.277 | 1 | 0.000 | 15.013 | 1 | 0.000 |
6 | 39.809 | 1 | 0.000 | 38.992 | 1 | 0.000 |
7 | 0.375 | 1 | 0.540 | 0.855 | 1 | 0.355 |
8 | 69.913 | 1 | 0.000 | 54.464 | 1 | 0.000 |
9 | 43.902 | 1 | 0.000 | 33.733 | 1 | 0.000 |
10 | 4.417 | 1 | 0.036 | 6.776 | 1 | 0.009 |
11 | 8.598 | 1 | 0.003 | 7.994 | 1 | 0.005 |
12 | 12.273 | 1 | 0.001 | 12.874 | 1 | 0.000 |
13 | 1.089 | 1 | 0.297 | 2.285 | 1 | 0.131 |

*p<0.005

Applying the Bonferroni correction, items 4, 5, 6, 8, 9, 11 and 12 present differential functioning between the two languages (see

However, if we took a significance level of .05, the number of items with DIF would increase to 9.
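The two item counts above can be checked from the Step 2 chi-square values in the Mantel-Haenszel table. The sketch below recomputes each p-value for 1 df and flags items at two thresholds. The .005 threshold is taken from the note under that table and is an assumption about the corrected level actually applied, which is not stated explicitly in the text; the helper names are hypothetical.

```python
import math

# Step 2 chi-square values (1 df each) for items 1..13, copied from the
# generalized Mantel-Haenszel results table.
STEP2_CHI2 = [0.001, 6.116, 3.065, 20.382, 15.013, 38.992, 0.855,
              54.464, 33.733, 6.776, 7.994, 12.874, 2.285]

def p_value_1df(chi2):
    """Upper-tail probability of a chi-square variable with 1 df."""
    return math.erfc(math.sqrt(chi2 / 2))

def flagged_items(chi2_values, alpha):
    """Item numbers (1-based) whose p-value falls below alpha."""
    return [item for item, x in enumerate(chi2_values, start=1)
            if p_value_1df(x) < alpha]

strict = flagged_items(STEP2_CHI2, alpha=0.005)   # threshold from the table note
lenient = flagged_items(STEP2_CHI2, alpha=0.05)   # uncorrected level

print(strict)   # [4, 5, 6, 8, 9, 11, 12]
print(lenient)  # [2, 4, 5, 6, 8, 9, 10, 11, 12]
```

Under these assumptions, items 2 and 10 are the ones that move from unflagged to flagged when the uncorrected .05 level is used.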

In the differential item functioning analysis using ordinal logistic regression (OLR), three models were fitted, from the most parsimonious to the most complex. Model 1 includes a single explanatory variable, the total score on the scale. Model 2 includes, in addition to the total score, the group effect (Spanish or American). Finally, Model 3 also includes the interaction between total score and group.

Prior to examining the fit of the models, compliance with the proportionality (parallelism) assumption was analyzed, since, depending on compliance, a proportional or non-proportional odds model should be fitted. Only items 3, 6 and 8 complied with this assumption. Thus, for the remaining items a partial proportional odds model was fitted. After this verification, DIF analysis was performed, comparing the three models for each item.
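The nested comparison of the three models can be sketched as follows for the three items that met the proportional odds assumption, where each step adds a single parameter (1 df). The chi-square values are copied from the OLR results table; the decision rule (a significant Model 2 vs. Model 1 difference read as uniform DIF, a significant Model 3 vs. Model 2 difference as non-uniform DIF) and the illustrative alpha of .05 are assumptions, and `classify_dif` is a hypothetical helper rather than the authors' procedure.

```python
import math

# Likelihood-ratio chi-square of Models 1, 2 and 3 for items 3, 6 and 8
# (values copied from the OLR results table).
LR_CHI2 = {
    3: (264.568, 268.101, 268.891),
    6: (377.345, 426.616, 427.122),
    8: (194.534, 273.047, 275.278),
}

def p_value_1df(chi2):
    """Upper-tail probability of a chi-square variable with 1 df."""
    return math.erfc(math.sqrt(chi2 / 2))

def classify_dif(m1, m2, m3, alpha=0.05):
    """Classify an item from the chi-square gain of each added term (1 df each)."""
    group_p = p_value_1df(m2 - m1)        # gain from adding the group effect
    interaction_p = p_value_1df(m3 - m2)  # gain from adding the interaction
    if interaction_p < alpha:
        return "non-uniform DIF"
    if group_p < alpha:
        return "uniform DIF"
    return "no DIF"

for item, (m1, m2, m3) in LR_CHI2.items():
    print(item, round(m2 - m1, 3), round(m3 - m2, 3), classify_dif(m1, m2, m3))
```

For items fitted with a partial proportional odds model, the differences have more than 1 df, so the p-value helper above would not apply as written.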

Results are shown in

Item | Model | LR (χ^{2}) | df | Difference in LR (χ^{2}) | Dif. df |
---|---|---|---|---|---|
1 | M1 | 407.706 | 4 | | |
1 | M2 | 408.216 | 5 | 0.510 | 1 |
1 | M3 | 415.441 | 9 | 7.224 | 4 |
2 | M1 | 425.968 | 1 | | |
2 | M2 | 446.896 | 5 | 20.927 | 4 |
2 | M3 | 449.566 | 9 | 2.670 | 4 |
3 | M1 | 264.568 | 1 | | |
3 | M2 | 268.101 | 2 | 3.533 | 1 |
3 | M3 | 268.891 | 3 | 0.790 | 1 |
4 | M1 | 224.168 | 4 | | |
4 | M2 | 249.037 | 5 | 24.869 | 3 |
4 | M3 | 254.048 | 9 | 5.011 | 4 |
5 | M1 | 294.666 | 1 | | |
5 | M2 | 339.225 | 5 | 44.559 | 4 |
5 | M3 | 342.922 | 9 | 3.696 | 4 |
6 | M1 | 377.345 | 1 | | |
6 | M2 | 426.616 | 2 | 49.271 | 1 |
6 | M3 | 427.122 | 3 | 0.506 | 1 |
7 | M1 | 319.328 | 1 | | |
7 | M2 | 341.165 | 5 | 21.837 | 4 |
7 | M3 | 363.882 | 9 | 22.717 | 4 |
8 | M1 | 194.534 | 1 | | |
8 | M2 | 273.047 | 2 | 78.513 | 1 |
8 | M3 | 275.278 | 3 | 2.231 | 1 |
9 | M1 | 426.567 | 4 | | |
9 | M2 | 476.046 | 5 | 49.479 | 3 |
9 | M3 | 482.493 | 9 | 6.447 | 4 |
10 | M1 | 389.938 | 4 | | |
10 | M2 | 428.847 | 8 | 38.910 | 4 |
10 | M3 | 440.871 | 12 | 12.023 | 4 |
11 | M1 | 480.971 | 4 | | |
11 | M2 | 491.066 | 5 | 10.095 | 3 |
11 | M3 | 501.552 | 9 | 10.486 | 4 |
12 | M1 | 147.312 | 1 | | |
12 | M2 | 180.487 | 5 | 33.175 | 4 |
12 | M3 | 190.054 | 9 | 9.568 | 4 |
13 | M1 | 162.549 | 1 | | |
13 | M2 | 188.751 | 5 | 26.202 | 4 |
13 | M3 | 191.414 | 9 | 2.663 | 4 |

*p<0.05;

****p<0.001

From the data in

To complete the above analyses, and in order to verify whether measurement equivalence can be established for the scale as a whole, a multi-group CFA was carried out.

The same configuration was found in both groups (χ^{2}(130) = 379.07; RMSEA = 0.05; CFI = 0.89), so we proceeded with the analysis of metric invariance, that is, the equality of factorial weights. When the model constraining the factorial weights to be equal across both groups is fitted, ΔRMSEA and ΔCFI are less than 0.01, and a value of χ^{2}(143) = 436.06 is obtained. However, if the chi-square statistics of both models are compared (the first model with no restrictions on the weights and the second requiring them to be equal), a statistically significant value of Δχ^{2}(13) = 56.99 (p < .05) is obtained, and Δχ^{2}/df is higher than 3, so the invariance of the measures cannot be assumed. In other words, the factorial weights differ between groups.
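The chi-square difference reported above follows directly from the two model fits. The short sketch below reproduces the arithmetic; the critical value for 13 df at the .05 level is taken from standard chi-square tables, and the function name `compare_nested` is hypothetical.

```python
# Upper 5% point of the chi-square distribution with 13 df (standard tables).
CHI2_CRIT_13DF_05 = 22.362

def compare_nested(chi2_free, df_free, chi2_constrained, df_constrained):
    """Return (delta chi-square, delta df, delta chi-square / delta df)."""
    delta_chi2 = chi2_constrained - chi2_free
    delta_df = df_constrained - df_free
    return delta_chi2, delta_df, delta_chi2 / delta_df

# Configural model: chi2(130) = 379.07; metric model: chi2(143) = 436.06.
delta_chi2, delta_df, ratio = compare_nested(379.07, 130, 436.06, 143)

print(round(delta_chi2, 2), delta_df, round(ratio, 2))  # 56.99 13 4.38
print(delta_chi2 > CHI2_CRIT_13DF_05)                   # True
```

The difference far exceeds the critical value, which is why the equality constraint on the weights is rejected even though the changes in RMSEA and CFI are small.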

CNS | Configural invariance | Index | Metric invariance | Index | Scalar invariance |
---|---|---|---|---|---|
RMSEA | .05 | ΔRMSEA | .00 | ΔRMSEA | .00 |
CFI | .89 | ΔCFI | -.01 | ΔCFI | -.01 |
χ^{2} | 379.07 | Δχ^{2} | 56.99* | | |
χ^{2}/df | 2.91 | Δχ^{2}/df | 4.38 | | |

*p < .05

After determining there was no equivalence in the measures, it was impossible to continue the study process imposing other restrictions.

The study of measure equivalence is of great practical utility, since scales and tests are continually administered to very diverse groups of individuals. Indeed, it is often necessary to translate the instruments, as is the case with the CNS by Mayer and Frantz [

Using the generalized Mantel-Haenszel method, we found that 7 of the 13 items comprising the scale show differential functioning. However, this statistic does not allow us to distinguish whether it is uniform or not. This number is high, accounting for more than half the items on the scale. Therefore, according to the results obtained from this statistic, it would be necessary to reduce the scale to 6 items for an adequate comparison between the data obtained on the scale applied to the American sample and that applied to the Spanish sample. However, reliability would decrease from an alpha of .811 to an alpha of .672.

The analyses conducted using the LR method show that 11 of the 13 items present uniform differential functioning between the samples, which means that the probability of answering in a given category is greater for one group than for the other across all trait levels. None of the items present non-uniform differential functioning.

Nonetheless, despite the items exhibiting differential functioning, there might exist some compensation across items at scale level, resulting in equivalence of measures when applying the scale as a whole. To verify this, we analyzed invariance using CFA. The levels of the invariance analysis show that, although both groups present the same configuration, it cannot be assumed that the factorial weights are the same, that is, there is no metric invariance. Therefore, after comparing the equivalence of the measures from the three previous methods, it can be concluded that the results obtained on the CNS in the American (CNS in English) and Spanish (CNS in Spanish) samples are not comparable, since the versions are not equivalent. Consequently, this instrument cannot be used for comparative studies between these two groups, since a particular score in one group is not necessarily equivalent to the same score in the other group. Nevertheless, the scale can still be used, provided groups are not compared with each other. It is possible that invariance of measures could be found between a different pair of groups using the same scales. However, this could only be determined by analyzing the equivalence of measurements.

This approach supports the results obtained by Davidov and De Beuckelaer [

Construct bias is the most common form of bias, denoting that the underlying theoretical concept itself has a different meaning for different groups [

Because connectedness is measured in different contexts, its meaning could differ and, consequently, a reassessment of the scale would be necessary, that is, scores would have to be equated across the different countries. This would require a new study comparing the scores obtained in both groups, for which large samples from different countries would be needed.

Moreover, the characteristics of the samples used should be taken into account. The data of the American group was published in 2004, while the Spanish data was taken from studies published on different dates up to 2014. The conception of nature and connectedness is different in different spatial and/or cultural contexts, as has been shown in this paper, but it may also change over time. Thus, it would be advisable to study whether differences in the concept arose for the same groups at different moments in time.

Therefore, results obtained in different countries should be interpreted with caution whenever they are compared, since it is not possible to determine in this work whether the differences are attributable to the different social contexts, to the different moments of data collection, or to an interaction of both.

Thanks to Pablo Olivos, Maria Luisa Lima, Ana Loureiro, Oscar Navarro, María Amérigo and Stephan Mayer for sending us data files for their studies.

Thanks to Birgitta Gatersleben for her advice.