The authors have declared that no competing interests exist.

Conceived and designed the experiments: DFD SDJP. Performed the experiments: DFD. Analyzed the data: DFD TRM SDJP. Contributed reagents/materials/analysis tools: SPB MHH GSK SDJP. Wrote the paper: DFD SDJP.

The Brazilian population was formed by extensive admixture of three ancestral roots: Amerindians, Europeans and Africans. Our previous work has shown that, at an individual level, ancestry estimated using molecular markers is a poor predictor of color in Brazilians. We now investigate whether SNPs known to be associated with human skin pigmentation can be used to predict color in Brazilians. To that end, we studied the association of fifteen SNPs previously linked with skin color in 243 unrelated Brazilian individuals self-identified as Whites, Browns or Blacks from Rio de Janeiro and 212 unrelated Brazilian individuals self-identified as Whites or Blacks from São Paulo. The significance of association of SNP genotypes with self-assessed color was evaluated using partial regression analysis. After controlling for ancestry estimates as covariates, only four SNPs remained significantly associated with skin pigmentation: rs1426654 and rs2555364 within

Brazilians form one of the most heterogeneous populations in the world, the result of five centuries of interethnic crosses of peoples from three continents: the European colonizers, the African slaves, and the autochthonous Amerindians. The relative proportion of these three ancestral roots in the makeup of the Brazilian population has changed considerably over time. After more than 100 years of heavy European immigration beginning in the second half of the 19th century, all regions of Brazil now show a preponderance of European ancestry, with proportions ranging from 60.6% in the Northeast to 77.7% in the South

In Brazil, color (in Portuguese,

Our previous studies

Our main target population was composed of 243 unrelated Brazilian individuals from the city of Rio de Janeiro, self-evaluated as Whites (n = 82), Browns (n = 80) or Blacks (n = 81), according to the census criteria of IBGE. As mentioned in the Introduction, our previous studies

To achieve that, as previously done

Each point represents a separate individual and the ancestral proportions can be determined by dropping lines parallel to each of the three axes. The graphs were drawn using the

To provide a clearer perspective, we also display in ^{th} percentile – 25^{th} percentile) of 0.336; Browns had a median of 0.803 and IQR of 0.353; Blacks had a median of 0.119 and IQR of 0.210. From inspection of these images it is evident that the color groups had very wide variance and that there was substantial overlap between them, making it impossible to predict color confidently from ancestry at an individual level, with the distinction between Whites and Browns being especially difficult. On the other hand, if we assign the numerical values 0.0, 0.5 and 1.0 to the White, Brown and Black categories, respectively, we obtain a Spearman's rank correlation rho of −0.557, which is significant (P<0.0001).
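The rank correlation described above can be illustrated with a minimal sketch. It assumes the color coding given in the text (0.0, 0.5, 1.0); the function names and the toy ancestry values are hypothetical and are not data from the study. Ties, which are unavoidable when only three color codes exist, receive the mean of the ranks they span.

```python
from statistics import mean

# Color coding taken from the text; everything else below is illustrative.
COLOR_CODE = {"White": 0.0, "Brown": 0.5, "Black": 1.0}

def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        tied_rank = (start + end) / 2 + 1  # mean of positions start..end, 1-based
        for k in range(start, end + 1):
            ranks[order[k]] = tied_rank
        start = end + 1
    return ranks

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

def spearman_rho(x, y):
    """Spearman's rho: the Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical toy individuals (NOT data from the study).
colors = ["White", "White", "Brown", "Brown", "Black", "Black"]
ancestry_proportion = [0.90, 0.70, 0.80, 0.50, 0.30, 0.10]

coded = [COLOR_CODE[c] for c in colors]
rho = spearman_rho(coded, ancestry_proportion)
print(f"Spearman rho = {rho:.3f}")
```

A negative rho here only says that the toy ancestry proportion tends to fall as the color code rises; as in the text, a significant group-level correlation does not imply reliable prediction for any single individual.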

We next genotyped all 243 unrelated individuals from the city of Rio de Janeiro, for the 15 SNPs known to be associated with pigmentation of skin, eyes and/or hair (

RefSNP | Rio de Janeiro NR | Rio de Janeiro CV | São Paulo NR | São Paulo CV |
rs26722 | 1.000E+00 | 1.000E+00 | - | - |
rs642742 | 1.610E-01 | 1.000E+00 | 7.851E-19^{*} | 1.562E-04^{*} |
rs1015362 | 8.446E-03^{*} | 1.000E+00 | 4.773E-08^{*} | 1.657E-02^{*} |
rs1042602 | 3.268E-06^{*} | 3.606E-02^{*} | 4.389E-12^{*} | 5.950E-04^{*} |
rs1126809 | 1.000E+00 | 1.000E+00 | - | - |
rs1408799 | 1.943E-01 | 1.000E+00 | - | - |
rs1426654 | 4.032E-20^{*} | 2.080E-09^{*} | 2.350E-46^{*} | 9.399E-20^{*} |
rs1800401 | 1.000E+00 | 1.000E+00 | - | - |
rs1800407 | 1.000E+00 | 1.000E+00 | - | - |
rs2555364 | 6.124E-13^{*} | 6.660E-07^{*} | 3.024E-22^{*} | 8.123E-08^{*} |
rs2733832 | 3.187E-02^{*} | 8.883E-01 | 1.081E-08^{*} | 1.042E-03^{*} |
rs6058017 | 2.874E-05^{*} | 5.349E-02 | 1.365E-10^{*} | 4.543E-02^{*} |
rs12896399 | 5.239E-02 | 1.000E+00 | 4.712E-02^{*} | 1.000E+00 |
rs12913832 | 1.038E-01 | 1.000E+00 | 1.269E-07^{*} | 4.879E-03^{*} |
rs16891982 | 4.511E-17^{*} | 3.460E-09^{*} | 3.550E-37^{*} | 6.787E-17^{*} |
Amerindian ancestry | 4.612E-04^{*} | - | 1.223E-02^{*} | - |
African ancestry | 2.588E-13^{*} | - | 1.761E-37^{*} | - |
European ancestry | 8.234E-17^{*} | - | 1.757E-41^{*} | - |

To perform a statistical association analysis using numeric regression, we then converted, as above, the self-classified color (White, Brown, and Black) into the following numeric values: 0.0, 0.5 and 1.0 respectively. In this fashion, we could use the Golden Helix SVS7 software to perform a numeric regression of phenotypes on genotypes, which had also been converted to numeric values under three models: co-dominant (additive), dominant and recessive models. The additive model presented the lowest P values (Table S3 in

To eliminate ancestry confounding we used the Golden Helix SVS7 software to perform a partial regression analysis using European, African and Amerindian ancestry estimates as covariates. After applying that ancestry control, only four SNPs remained with significant association: rs1426654 and rs2555364 within locus

To ascertain whether these four loci could be confirmed as the most significant in a different Brazilian population, we evaluated 212 unrelated individuals from the city of São Paulo, which, like Rio de Janeiro, is located in Southeastern Brazil. However, this sample differed from the one from Rio de Janeiro in that it was made up only of individuals self-evaluated as Whites (n = 106) or Blacks (n = 106), thus lacking individuals self-classified as Browns. Moreover, individuals from São Paulo were tested for only ten of the 15 SNPs originally tested in Rio de Janeiro; these included, of course, all seven found to be significantly associated with self-classified color in our full-model numeric regression.

To be able to perform numerical analysis we then converted the self-classified color (White and Black) into the numeric values 0.0 and 1.0 respectively, and used the Golden Helix SVS7 software to regress color phenotypes on genotypes, also converted to numeric values, under the co-dominant (additive) model. After application of the Bonferroni correction for multiple comparisons, we observed that, as shown in

Since the proportions of European, African and Amerindian ancestries also showed significant associations with color, we again proceeded to control for biogeographical ancestry confounding and to estimate the importance of self-identified color alone, using a partial regression analysis with the ancestry estimates as covariates. After that, nine SNPs remained with significant association at the 0.05 level after the Bonferroni correction (

The SNPs rs2555364 and rs1426654 are at positions 48,419,386 and 48,426,484 on chromosome 15, only 7,098 base pairs apart. Hence, they are expected to be in linkage disequilibrium. Indeed, our analysis of the data using the web tool CUBEX

We then used the rs2555364-rs1426654 haplotypes within locus

Each thin vertical line represents one individual (243 in total). Vertical black lines separate the individuals into three self-categorized skin color groups, identified by the labels at the bottom. Ten

The dot-plot was prepared with the MedCalc software

The previous results yield a picture in which neither biogeographical ancestry nor the pigmentation index (PI) calculated from the genotypes of rs1426654, rs16891982, rs2555364 and rs1042602 was capable of predicting well the self-assessed color group of individuals from Rio de Janeiro, although the second appeared to resolve the three groups better. On the other hand, our results using partial regression demonstrated that the association of rs2555364-rs1426654 haplotypes, rs16891982 and rs1042602 with self-assessed color was, at least in part, independent of ancestry estimates. This statistical independence was made evident by the differences between the distribution of ancestry (

We tested this hypothesis experimentally using Principal Component Analysis (PCA) with three variables: Pigmentation Index (PI), African ancestry and Amerindian ancestry. The bi-plot of principal component 1 (PC1) vs. principal component 2 (PC2) is shown in

Each point represents one individual, self-assessed as White (red triangles), Brown (blue dots) or Black (black asterisks). We ran the Principal Component Analysis (PCA) using three variables: Pigmentation Index (PI), African ancestry and Amerindian ancestry. The PCA and the plot were done using the R program, v. 3.0.0
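A minimal sketch of a three-variable PCA of this kind is given here for illustration. The original analysis was run in R; this version is plain Python and uses power iteration with deflation rather than R's routines. It assumes the variables are standardized before analysis (the text does not say whether they were), and the toy values and function names are hypothetical, not data from the study.

```python
from statistics import mean, pstdev

def standardize(column):
    """Center a variable and scale it to unit (population) variance."""
    m, s = mean(column), pstdev(column)
    return [(v - m) / s for v in column]

def covariance_matrix(columns):
    """Covariance matrix of already-centered columns."""
    n = len(columns[0])
    return [[sum(a * b for a, b in zip(ci, cj)) / n for cj in columns]
            for ci in columns]

def mat_vec(m, v):
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]

def top_eigenpair(m, iterations=500):
    """Largest eigenvalue and unit eigenvector of a symmetric matrix."""
    v = [1.0] + [0.5] * (len(m) - 1)  # arbitrary nonzero starting vector
    for _ in range(iterations):
        w = mat_vec(m, v)
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    eigenvalue = sum(a * b for a, b in zip(v, mat_vec(m, v)))
    return eigenvalue, v

def first_two_components(columns):
    """Return per-individual (PC1, PC2) scores and explained-variance fractions."""
    z = [standardize(c) for c in columns]
    m = covariance_matrix(z)
    pairs = []
    for _ in range(2):
        lam, vec = top_eigenpair(m)
        pairs.append((lam, vec))
        # Deflation: remove the component just found before the next search.
        m = [[m[i][j] - lam * vec[i] * vec[j] for j in range(len(m))]
             for i in range(len(m))]
    n = len(z[0])
    scores = [[sum(vec[k] * z[k][i] for k in range(len(z))) for _, vec in pairs]
              for i in range(n)]
    explained = [lam / len(columns) for lam, _ in pairs]
    return scores, explained

# Hypothetical toy individuals (NOT data from the study).
pigmentation_index = [0.1, 0.2, 0.4, 0.5, 0.8, 0.9]
african = [0.05, 0.10, 0.30, 0.35, 0.70, 0.80]
amerindian = [0.10, 0.05, 0.15, 0.20, 0.05, 0.10]

scores, explained = first_two_components([pigmentation_index, african, amerindian])
for (pc1, pc2) in scores:
    print(f"PC1 = {pc1:+.3f}  PC2 = {pc2:+.3f}")
```

The sign of each component is arbitrary, so a plot produced from these scores may be mirrored relative to one produced by another program.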

On the basis of association studies present in the literature, we chose several genetic variants previously associated with differences in skin, eye and hair pigmentation among individuals from different parts of the world

The regression results were consistent with those of Pneuman et al. (2012) who showed that a set of SNPs containing rs12913832, rs16891982, and rs1426654 was associated with skin color in various populations around the world. Similarly, the SNPs rs1426654 and rs16891982 are known to be related to hair, skin, and eye color of North Americans

Earlier this year, Beleza et al

The solute carrier family 45 member 2 (

Tyrosinase, encoded by the

We tried to assess whether the genotypes of the four SNPs that were significantly associated with color in the sample from Rio de Janeiro could be used to predict the self-assessed color group to which an individual belonged. For that, we used the rs2555364-rs1426654 haplotypes, the genotypes at rs16891982 and rs1042602 and the graphical output of Structure software

Thus, neither biogeographical ancestry nor the pigmentation index (PI) calculated from the genotypes of rs1426654, rs16891982, rs2555364 and rs1042602 was capable of predicting well the self-assessed color group of individuals from Rio de Janeiro. Since such genotypes had been shown to be, at least in part, statistically independent of ancestry estimates, we tried to ascertain whether together they could generate a better prediction of self-assessed color, by using Principal Component Analysis (PCA) with three variables: Pigmentation index (PI), African ancestry and Amerindian ancestry (

In spite of the incomplete resolution of the three color categories, inspection of the PC1-PC2 biplot shows interesting features. From left to right in

The self-attribution of color is complex. It is influenced by the skin pigmentation, but also by other characteristics such as hair and eye pigmentation, facial features and family history, as well as extraneous factors that may range from sunlight exposure (Pena et al, 2011) to income level, social class and schooling

Petrucelli (2007) notes that the Brown category has the additional complication of apparently designating a residual category in the racial classification system. Inside the Brown category, he distinguishes at least three groups: first, a group with a phenotype that is perceived to be of African origin; second, a group that can be identified as predominantly of Amerindian descent; and third, a group that expresses an adhesion to a specific historical-geographical condition and does not actually constitute a proper ethnic identification in the sense of physical appearance (Petrucelli, 2007). Thus, the Brown category poses an intrinsic classification problem. On the other hand, it is quite an important category, being chosen by 42% of Brazilians (

In light of the above, it might be argued,

Another possibility might be the use of skin reflectance levels to measure the degree of pigmentation, as was used in the recent paper by Beleza et al

Forensic scientists have been discussing the possibility of using genotypes at “color loci” to predict the pigmentation phenotypic features of perpetrators of felonies using DNA left at crime scenes

In conclusion, in this study we could observe significant association of self-assessed color categories in Brazilians with genotypes at three genes

The Research Ethics Committee of the Instituto Nacional do Câncer (INCA) approved in 2005 the protocol of this study, as part of a pharmacogenetic project, as well as the written informed consent form. In 2008 the same ethics Committee approved the enlargement of the study to its present format and carried forward the approval of the written informed consent form. The samples were anonymized after collection. Some of the DNA samples of the present study were analyzed in previous publication

The use of the samples from the Laboratory of Genetics and Molecular Hematology of the Faculdade de Medicina da Universidade de São Paulo in 1997 and 2004, respectively, received the approvals CAPPesq 173/1997 and CAPPesq 543/2004 including the written Informed Consent form. The samples also were anonymized after collection. The individuals of the present study are a subset of a larger sample described in two previous studies

We studied 455 unrelated Brazilians from two large cities (Rio de Janeiro and São Paulo) in the Southeast of Brazil, as described in detail below. Color assignment was obtained by self-assessment in answer to the closed question “What is your color/race?”, as done in the Brazilian census by the Instituto Brasileiro de Geografia e Estatística (IBGE). All subjects of this study described themselves as White, Brown or Black (in Portuguese, respectively, “Branco”, “Pardo” and “Preto”). These three color categories encompass 99.1% of the Brazilian population. No subjects in the study self-classified as Indian (“Indígena”) or Yellow (“Amarelo”), and none declined to declare a color (“Sem declaração”).

The INCA sample was made up of 243 unrelated, healthy individuals, all recruited from blood donors, personnel and research students at the Instituto Nacional do Câncer (INCA). The enrolled individuals, randomly chosen from within each color category, were self-identified as Whites (n = 82), Browns (n = 80) or Blacks (n = 81).

The São Paulo sample was made up of 212 unrelated healthy volunteer blood donors of the city of São Paulo collected as previously described by Bydlowski et al

After analysis of human genetic variants associated with pigmentation of skin or eyes and tanning response, we selected for this study 15 SNPs, within nine different loci: rs1015362 (

The chosen SNPs were genotyped using the real-time PCR

The classification accuracy of each TaqMan assay was validated by cycle sequencing (forward and reverse) of PCR fragments containing the polymorphisms studied, using the standard procedure of the DYEnamic™ ET Dye Terminator Kit (GE Healthcare) and a MegaBACE™ 1000 sequencer (GE Healthcare). After the run in the MegaBACE sequencer, the electrofluorograms were visualized using

To estimate the relative proportion of Amerindian, European, and Sub-Saharan African ancestry for each sample from São Paulo we genotyped each sample using the following panel of 40 biallelic short insertion/deletion polymorphisms (indels): MID-1 (rs3917), MID-15 (rs4181), MID-17 (rs4183), MID-51 (rs16343), MID-89 (rs16381), MID-107 (rs16394), MID-131 (rs16415), MID-132 (rs16416), MID-150 (rs16430), MID-159 (rs16438), MID-170 (rs16448), MID-258 (rs16695), MID-278 (rs16715), MID-420 (rs140709), MID-444 (rs140733), MID-468 (rs140757), MID-470 (rs140759), MID-663 (rs1305047), MID-788 (rs1610874), MID-857 (rs1610942), MID-914 (rs1610997), MID-918 (rs1611001), MID-1002 (rs1611084), MID-1092 (rs2067180), MID-1100 (rs2067188), MID-1129 (rs2067217), MID-1291 (rs2067373), MID-1352 (rs2307548), MID-1428 (rs2307624), MID-1537 (rs2307733), MID-1549 (rs2307745), MID-1586 (rs2307782), MID-1642 (rs2307838), MID-1654 (rs2307850), MID-1759 (rs2307955), MID-1763 (rs2307959), MID-1847 (rs2308043), MID-1861 (rs2308057), MID-1943 (rs2308135) and MID-1952 (rs2308144). In this list, the MID number relates to the nomenclature of Weber et al.

This set of 40 indels had been previously validated as useful in ancestry estimation through the study of the HGDP-CEPH Diversity Panel, which is composed of 1,064 individuals from 52 different worldwide populations distributed in seven geographical regions

To estimate the ancestry proportions from the indel genotyping results we used the

The relative proportions of Amerindian, European and Sub-Saharan African biogeographical ancestries of the samples from Rio de Janeiro had previously been estimated

Regression analysis was performed to study the association of each polymorphism with self-classified skin color categories: the numerically coded phenotype was the dependent variable, the genotypic data were the independent variables and the ancestry values were covariates. Initially we performed a numerical linear regression using the Golden Helix SNP and Variation Suite software (SVS version 7.2.2; Golden Helix Inc., Bozeman, MT, USA). In addition, a partial regression model from the same package, called the Reduced x Full model, was used with a type I error α of 5%, considering the Bonferroni correction. We introduced individual ancestry as a covariate in this last model in order to control for spurious associations that may result from differences in ancestral proportions (admixture). The results (P values) of the two models were then compared to determine which loci remained associated with the phenotype even after the correction for covariates (ancestry). For all SNPs, numerical regression analyses were performed considering three genetic models: additive, dominant and recessive. Statistical significance was defined as P<0.05, with P values corrected by the Bonferroni adjustment for multiple comparisons.
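The logic of comparing a reduced (covariates only) model with a full (covariates plus genotype) model can be sketched as follows. This is an illustration under stated assumptions, not the SVS implementation: the function names and toy data are hypothetical, the genotype is represented as a minor-allele count (0, 1 or 2), and only two ancestry proportions are used as covariates because the three proportions sum to one and would be collinear with the intercept in an ordinary least-squares fit. The sketch stops at the F statistic; converting it to a P value requires an F-distribution tail function, which is not included here.

```python
def code_genotype(minor_allele_count, model):
    """Convert a minor-allele count (0, 1, 2) to a numeric predictor."""
    if model == "additive":
        return float(minor_allele_count)
    if model == "dominant":
        return 1.0 if minor_allele_count >= 1 else 0.0
    if model == "recessive":
        return 1.0 if minor_allele_count == 2 else 0.0
    raise ValueError(f"unknown model: {model}")

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def residual_ss(predictors, y):
    """Fit least squares with an intercept; return the residual sum of squares."""
    rows = [[1.0] + list(p) for p in predictors]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    beta = solve(xtx, xty)
    return sum((yi - sum(b * v for b, v in zip(beta, r))) ** 2
               for r, yi in zip(rows, y))

def partial_f(genotype, covariates, y):
    """F statistic for adding one genotype term to a covariates-only model."""
    reduced = residual_ss(covariates, y)
    full = residual_ss([list(c) + [g] for c, g in zip(covariates, genotype)], y)
    df_resid = len(y) - (len(covariates[0]) + 2)  # intercept + covariates + genotype
    return (reduced - full) / (full / df_resid)

def bonferroni(p_values):
    """Multiply each P value by the number of tests, capping at 1."""
    return [min(1.0, p * len(p_values)) for p in p_values]

# Hypothetical toy data (NOT from the study): color coded 0.0 / 0.5 / 1.0.
color = [0.0, 0.0, 0.0, 0.5, 0.5, 1.0, 1.0, 1.0]
african = [0.05, 0.10, 0.20, 0.30, 0.45, 0.60, 0.70, 0.85]
amerindian = [0.10, 0.05, 0.10, 0.15, 0.05, 0.10, 0.05, 0.05]
minor_allele_counts = [0, 0, 1, 0, 1, 2, 1, 2]

covariates = [[af, am] for af, am in zip(african, amerindian)]
f_by_model = {
    model: partial_f([code_genotype(g, model) for g in minor_allele_counts],
                     covariates, color)
    for model in ("additive", "dominant", "recessive")
}
for model, f_stat in f_by_model.items():
    print(f"{model:>9}: F = {f_stat:.3f}")
```

A large F statistic means the genotype explains variation in the coded color that the ancestry covariates do not, which is the sense in which an association "remains" after ancestry control. The capping at 1 in `bonferroni` is the conventional form of the adjustment.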


We are grateful to Neusa Antunes Rodrigues, who provided expert technical assistance, and Fabrício Rodrigues dos Santos, whose help in real-time PCR was instrumental in this project. This work was supported by grants from the following Brazilian agencies: Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq), Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (Capes), and Fundação de Amparo à Pesquisa do Estado de Minas Gerais (FAPEMIG).