======================================================
== gnomAD_STR_genotypes__[datestamp].tsv.gz ==
======================================================

This table contains one row per ExpansionHunter genotype for the 77 TR loci included in the March 2025 update of the gnomAD TR dataset.
It includes Population, Sex, and Age metadata columns, as well as a ReadvizFilename column which allows users to derive the url of the REViewer read visualization image for each genotype.

Below are the column names and values from a typical row in the table:

    Id : PABPN1
    LocusId : PABPN1
    ReferenceRegion : chr14:23321472-23321490
    Chrom : chr14
    Start_0based : 23321472
    End : 23321490
    Motif : GCG
    IsAdjacentRepeat : False
    Population : sas
    Sex : XX
    Age : age_not_available
    PcrProtocol : pcr_free
    Genotype : 6/13
    Allele1 : 6
    Allele2 : 13
    GenotypeConfidenceInterval : 6-6/13-13
    Filter : PASS
    Q : 1.0
    ManualReviewGenotypeQuality : medium-Low
    ReadvizFilename : f3fbf261c10b9028628a57290f600605f7f.PABPN1.svg.gz
    PublicProjectId : Human Genome Diversity Project
    PublicSampleId : HGDP00356

Id: This id is unique to each TR locus and repeat, meaning that it distinguishes the main repeat at a locus from any adjacent repeats. For example, the main GAA repeat at the FXN locus has id "FXN" while the adjacent poly-A repeat has id "FXN_A". This id corresponds to the "VariantId" field in the ExpansionHunter variant catalogs @ https://github.com/broadinstitute/str-analysis/tree/main/str_analysis/variant_catalogs

LocusId: This id is unique to each TR locus. It corresponds to the "LocusId" field in the ExpansionHunter variant catalogs @ https://github.com/broadinstitute/str-analysis/tree/main/str_analysis/variant_catalogs
For most loci, the LocusId matches the name of the gene that contains the locus.

ReferenceRegion: Genomic interval delineating the exact boundaries of the TR repeat in the reference genome. The start coordinate is 0-based.

Chrom: The chromosome of the ReferenceRegion.
This is provided as a separate column for convenience.

Start_0based: The 0-based start coordinate of the ReferenceRegion. This is provided as a separate column for convenience.

End: The end coordinate of the ReferenceRegion. This is provided as a separate column for convenience.

Motif: The repeat unit of the TR locus. For example, this would be "GAA" at the FXN locus.

IsAdjacentRepeat: False if this row represents the main repeat at a locus, True if it represents an adjacent repeat. Adjacent repeats are included for some loci either for technical reasons to improve ExpansionHunter accuracy, or due to research interest in the size of these adjacent repeats.

Population: The gnomAD ancestry group of the individual. Possible values are: "afr", "ami", "amr", "asj", "eas", "fin", "mid", "nfe", "oth", "sas"

Sex: The sex karyotype of the genotyped individual. Possible values are "XX" and "XY".

Age: The age of the individual at the time when they enrolled in one of the research studies underlying gnomAD. The values represent 5-year bins such as "20-25", as well as ">80" for individuals over 80 years old and "<20" for individuals younger than 20. For individuals with unknown age, the value is "age_not_available".

PcrProtocol: All samples in the current TR dataset are PCR-free, so this column contains "pcr_free" for all rows.

Genotype: The ExpansionHunter genotype for this individual at this locus, generated using the variant catalog without off-target regions (see https://github.com/broadinstitute/str-analysis/tree/main/str_analysis/variant_catalogs). These are the genotypes used to generate all plots in the gnomAD browser TR pages.

Allele1: The shorter repeat size from the genotype. This is provided as a separate column for convenience.

Allele2: The longer repeat size from the genotype; empty in the special case of hemizygous genotypes (e.g., in male samples at loci on chrX). This is provided as a separate column for convenience.
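
The convenience columns described above (Chrom, Start_0based, End, Allele1, Allele2) can be recomputed from ReferenceRegion and Genotype. The following Python sketch illustrates that relationship using the values from the example row. The function names are hypothetical, and the handling of hemizygous genotypes assumes they are written as a single repeat size with no "/" separator, which this README does not state explicitly.

```python
def split_reference_region(region):
    """Split a ReferenceRegion such as 'chr14:23321472-23321490'
    into (Chrom, Start_0based, End)."""
    chrom, interval = region.split(":")
    start, end = interval.split("-")
    return chrom, int(start), int(end)


def split_genotype(genotype):
    """Split a Genotype such as '6/13' into (Allele1, Allele2).

    Allele1 is the shorter repeat size and Allele2 the longer one.
    Assumption: a hemizygous genotype is a single number without '/',
    in which case Allele2 is returned as None.
    """
    sizes = sorted(int(size) for size in genotype.split("/"))
    allele1 = sizes[0]
    allele2 = sizes[1] if len(sizes) > 1 else None
    return allele1, allele2


# Values taken from the example row at the top of this file.
print(split_reference_region("chr14:23321472-23321490"))
print(split_genotype("6/13"))
```

In practice the table already provides these columns, so the sketch is mainly useful as a consistency check or when working from the Genotype string alone.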

GenotypeConfidenceInterval: The ExpansionHunter confidence intervals associated with the genotype in the "Genotype" column.

Filter: The value of the 'FILTER' column from the ExpansionHunter VCF. Nearly all rows are "PASS", while 442 rows are "LowDepth".

Q: A genotype quality score computed based on the ratio of the ExpansionHunter allele size and the width of the ExpansionHunter confidence interval. The values range from 0 (low quality) to 1 (high quality).

ManualReviewGenotypeQuality: The genotype quality based on manual review of read visualizations. Values include: "high", "medium-high", "medium", "medium-Low", and "low"

ReadvizFilename: The filename of the SVG image generated by REViewer based on the ExpansionHunter call reported in the "Genotype" column. This can be used to compute the public url of this image as follows:
https://storage.googleapis.com/gnomad-str-public/release_2024_07/readviz_v2/${row.LocusId}/${row.ReadvizFilename}

PublicProjectId: This column is populated for samples from the 1000 Genomes Project and the Human Genome Diversity Project.

PublicSampleId: This column is populated for samples from the 1000 Genomes Project and the Human Genome Diversity Project.
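
As an illustration of the ReadvizFilename url template above, here is a minimal Python sketch that reads the table and builds the image url for each row. It assumes the file is a gzipped, tab-delimited table whose header row uses the column names listed in this README. The function names and any local file path passed to iter_rows are hypothetical.

```python
import csv
import gzip

# Base url copied from the template in the ReadvizFilename description.
READVIZ_BASE_URL = (
    "https://storage.googleapis.com/gnomad-str-public/release_2024_07/readviz_v2"
)


def readviz_url(row):
    """Build the public REViewer image url for one table row (a dict)."""
    return f"{READVIZ_BASE_URL}/{row['LocusId']}/{row['ReadvizFilename']}"


def iter_rows(path):
    """Yield each row of the gzipped TSV as a dict keyed by column name.

    Assumption: the first line of the file is a header row.
    """
    with gzip.open(path, "rt", newline="") as handle:
        yield from csv.DictReader(handle, delimiter="\t")


# Values taken from the example row at the top of this file.
example_row = {
    "LocusId": "PABPN1",
    "ReadvizFilename": "f3fbf261c10b9028628a57290f600605f7f.PABPN1.svg.gz",
}
print(readviz_url(example_row))
```

To process a downloaded copy of the table, one could loop over iter_rows(path) and call readviz_url(row) on each row, optionally keeping only rows where row["Filter"] == "PASS".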