The authors have declared that no competing interests exist.

Mendelian randomization (MR) is an established approach to evaluate the effect of an exposure on an outcome. The gene-by-environment (GxE) study design can be used to determine whether the genetic instrument affects the outcome through pathways other than via the exposure of interest (horizontal pleiotropy). MR phenome-wide association studies (MR-pheWAS) search for the effects of an exposure, and can be conducted in UK Biobank using the PHESANT package. In this proof-of-principle study, we introduce the novel GxE MR-pheWAS approach, which combines MR-pheWAS with the use of GxE interactions. This method aims to identify the presence of effects of an exposure while simultaneously investigating horizontal pleiotropy. As an exemplar, we systematically test for the presence of causal effects of smoking heaviness, stratifying on smoking status (ever versus never). If a genetic variant is associated with smoking heaviness (but not smoking initiation), and this variant affects an outcome (at least partially) via tobacco intake, we would expect the effect of the variant on the outcome to differ in ever versus never smokers. We used PHESANT to test for the presence of effects of smoking heaviness, instrumented by genetic variant rs16969968, among never and ever smokers separately, in UK Biobank. We ranked results by the strength of interaction between ever and never smokers. We replicated previously established effects of smoking heaviness, including detrimental effects on lung function. Novel results included a detrimental effect of heavier smoking on facial aging. We have demonstrated how GxE MR-pheWAS can be used to identify potential effects of an exposure, while simultaneously assessing whether results may be biased by horizontal pleiotropy.

Mendelian randomization uses genetic variants associated with an exposure to investigate causality. For instance, a genetic variant that relates to how heavily a person smokes has been used to test whether smoking causally affects health outcomes. Mendelian randomization is biased if the genetic variant also affects the outcome via other pathways. We exploit additional information, namely that the effect of heavy smoking only occurs in people who actually smoke, to overcome this problem. By testing associations in ever and never smokers separately we can assess whether the genetic variant affects an outcome via smoking or another pathway. If the effect is at least partially via smoking heaviness, we would expect the effect to differ in ever versus never smokers, and this would suggest that smoking causally influences the outcome. Previous Mendelian randomization studies of smoking heaviness focused on specific outcomes, whereas here we searched for the presence of causal effects of smoking heaviness across over 18 000 traits. We identified previously established effects (e.g. a detrimental effect on lung function) and novel results including a detrimental effect of heavier smoking on facial aging. Our approach can be used to test for the presence of causal effects of other exposures, where the exposure only occurs in known subsets of the population.

Mendelian randomization (MR) is an established approach that uses genetic variants as proxies for a modifiable exposure, to test for the presence of a causal effect of the exposure (or estimate the magnitude of its effect) on a potential outcome [

A design in which the genetic variant is interacted with another variable can provide evidence for violation of the exclusion restriction assumption. The additional variable can be of many forms; not necessarily environmental. For example, an early study interacted a genetic variant related to alcohol with sex in a population where women drank very little, such that an effect of the variant on the outcome in women would suggest that at least some of the effect of the variant on the outcome is not acting through alcohol consumption [

The MR GxE approach has been used to investigate the effect of smoking heaviness on traits such as body mass index (BMI) and depression/anxiety [

This figure illustrates the three broad types of results of a GxE study on smoking heaviness: 1) no interaction between ever and never smokers, 2) a quantitative interaction, where associations are in a consistent direction but one is stronger than the other, or 3) a qualitative interaction, where the effect only occurs in one group or the estimates are in opposite directions, also known as a cross over effect [

While MR studies have, to date, been largely performed in a hypothesis-driven manner, it is now possible to perform hypothesis-free MR analyses using the MR phenome-wide association study (MR-pheWAS) approach [

Each additional smoking-increasing allele of rs16969968 was associated with higher odds of being in a higher smoking heaviness category (odds ratio 1.21 [95% confidence interval (CI): 1.19, 1.23]), after adjusting for age, sex and the first 10 genetic principal components, and with slightly lower odds of being an ever (vs never) smoker (odds ratio 0.98 [95% CI: 0.97, 0.99]).

We performed a GxE MR-pheWAS analysis, ranking on the strength of the interaction between ever versus never smokers (using the P value of Cochran’s Q-test statistic for heterogeneity). Of the 16 692 interactions tested, we identified 8 results with a P value lower than a stringent Bonferroni corrected threshold of 3.00x10^{-6} (0.05/16 692), where an increase in rs16969968 smoking-heaviness increasing allele dosage was associated with worse lung function (3 phenotypes), lower likelihood of being a morning person, and higher blood assay levels (haematocrit percentage, white blood cell count, neutrophil count and haemoglobin concentration) on average, in ever compared with never smokers. We found a further 4 results at a false discovery rate of 5% (using a P value threshold of 0.05x12/16 692 = 3.59x10^{-5}) (see

Green dashed: Bonferroni corrected threshold; Red dash-dotted: 5% false discovery rate (FDR) threshold; Blue dotted: Expected = Actual; Purple points: results of tests performed: a) P-values of tests of interaction in GxE MR-pheWAS, and b) P-values of SNP-outcome associations in MR-pheWAS among ever smokers. Multiple testing thresholds: a) Bonferroni threshold: 0.05/16692 = 3.00x10^{-6}; 5% FDR threshold: 0.05x12/16692 = 3.59x10^{-5}, and b) Bonferroni threshold: 0.05/18513 = 2.70x10^{-6}; 5% FDR threshold: 0.05x69/18513 = 1.86x10^{-4}.
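The multiple testing thresholds quoted in this caption are simple arithmetic and can be checked directly. The sketch below is illustrative only: the function names are ours and are not part of PHESANT, and the step-up rule in `fdr_rank` is an assumption about how the rank in the 0.05×rank/n formula was obtained.

```python
def bonferroni_threshold(n_tests, alpha=0.05):
    """Bonferroni corrected P value threshold: alpha divided by the number of tests."""
    return alpha / n_tests


def fdr_threshold(rank, n_tests, alpha=0.05):
    """FDR P value threshold of the form alpha * rank / n, as quoted in the caption."""
    return alpha * rank / n_tests


def fdr_rank(p_values, alpha=0.05):
    """Largest k such that the k-th smallest P value is <= alpha * k / n.

    Assumption: a step-up rule of this form was used to obtain the rank.
    """
    n = len(p_values)
    rank = 0
    for k, p in enumerate(sorted(p_values), start=1):
        if p <= alpha * k / n:
            rank = k
    return rank


# a) tests of interaction: 16 692 tests, 12 results identified
print(f"{bonferroni_threshold(16692):.2e}")  # 3.00e-06
print(f"{fdr_threshold(12, 16692):.2e}")     # 3.59e-05

# b) associations among ever smokers: 18 513 tests, 69 results identified
print(f"{bonferroni_threshold(18513):.2e}")  # 2.70e-06
print(f"{fdr_threshold(69, 18513):.2e}")     # 1.86e-04
```

The FDR threshold depends on the number of results identified, which is why it differs so much between panels a) and b) even though the number of tests is similar.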

While our main aim is to identify interactions between the associations in ever versus never smokers (which may suggest a causal effect via smoking heaviness), here we ranked outcomes by the strength of association with rs16969968 in ever smokers to maximize statistical power. This approach relies on the assumption that the combined effect through smoking heaviness and any horizontal pleiotropic effect is not null. For example, if the effects via smoking heaviness and a horizontally pleiotropic pathway are of the same magnitude but in opposite directions then they would cancel out and not be identified in a MR-pheWAS in ever smokers. Under the assumption of no horizontal pleiotropy a 5% FDR threshold on the strength of associations in ever smokers can be used to control the proportion of ‘hits’ that are not due to an effect via smoking heaviness at 5%. However, if the effect of the genetic variant on some outcomes is horizontally pleiotropic then this would inflate the FDR, such that the proportion of results below a 5% FDR P value threshold that are not due to an effect via smoking heaviness may be higher than 5%.

The results of our MR-pheWAS among ever smokers include 18 513 tests ranked by P value of the estimated effect on each outcome. We identified 69 results at a false discovery rate of 5% (P value threshold of 1.86x10^{-4}); the stringent Bonferroni corrected threshold was 2.70x10^{-6} (0.05/18 513). For comparison, we also performed MR-pheWAS among never smokers and in our full sample, and ranked these results by the strength of association. We identified 8 results at a false discovery rate of 5% among never smokers (see

Results shown are those identified after correcting for multiple testing. 33 of the 38 binary results are shown in this figure, namely those that also have a PHESANT binary result for never smokers. Results not shown in this figure are relevant to smoking participants only, e.g. ‘difficulty not smoking for 1 day’, such that they are absent in the never smokers (see

Results shown are those identified after correcting for multiple testing. 14 of the 18 linear results are shown in this figure, namely those that also have a PHESANT linear result for never smokers. Results not shown in this figure are relevant to smoking participants only, e.g. “amount of tobacco currently smoked”, such that they are absent in the never smokers (see

Results shown are those identified after correcting for multiple testing. 6 of the 12 ordered categorical results are shown in this figure, namely those that also have a PHESANT ordered categorical result for never smokers. Results not shown in this figure are predominantly relevant to smoking participants only, e.g. “number of cigarettes smoked previously”, such that they are absent in the never smokers (see

The results identified (below the 5% FDR threshold) when ranking by our P value for interaction were a subset of those identified when ranking by the P value of the main effects among ever smokers, except for a binary outcome describing whether the participant has had a keratinizing squamous cell carcinoma (field ID [FID] = 40011, value 8071). A higher rs16969968 smoking-heaviness increasing allele dosage was associated with a higher risk of keratinizing squamous cell carcinoma among ever smokers, but a lower risk among never smokers (such that the strength of the interaction was stronger than the strength of the association in ever smokers). Overlap of results across all our MR-pheWAS is shown in

Our identified results from our GxE MR-pheWAS (searching for interactions) included a detrimental effect of heavier smoking on facial aging (FID = 1757), where a higher genetic predisposition to heavier smoking was associated with an increased risk of looking older than one's age (as perceived by others). Estimates of the association of rs16969968 with facial aging are shown in the table below (interaction P value = 7.72x10^{-6} in the main analysis). Our sensitivity analyses adjusting for age, sex and the first 40 principal components were consistent with the results of our main analyses.

Analysis | Sample | N | Odds ratio [95% CI] | Interaction P value ^{3} |
---|---|---|---|---|
Direct test of association ^{1} | | | | |
Main analysis | Ever smokers | 137,869 | 1.062 [1.043, 1.081] | 7.72x10^{-6} |
Main analysis | Never smokers | 167,781 | 1.004 [0.988, 1.021] | |
Sensitivity analysis | Ever smokers | 137,869 | 1.062 [1.043, 1.081] | 7.46x10^{-6} |
Sensitivity analysis | Never smokers | 167,781 | 1.004 [0.988, 1.021] | |
Two stage IV probit regression ^{2} | | | | |
Main analysis | Ever and never smokers | 305,662 | 1.293 [1.089, 1.534] | |
Sensitivity analysis | Ever and never smokers | 305,662 | 1.294 [1.090, 1.536] | |

Main analysis: Adjusted for age, sex and first 10 genetic principal components. Sensitivity Analysis: Adjusted for age, sex and first 40 genetic principal components.

^{1} Direct test of association between smoking heaviness SNP and facial aging, in ever and never smokers separately. Estimates are the change of odds of reporting looking ‘older than you are’ versus looking ‘younger than you are’ or ‘about the same’, or looking ‘older than you are’ or ‘about the same’ versus ‘younger than you are’, for each additional smoking-increasing allele of rs16969968.

^{2} Two stage IV probit regression. Estimates are the change of odds of reporting facial aging category ‘older than you are’ for a 1 SD increase in lifetime smoking score. Calculated by taking the exponent of 1.6 times the probit estimate [

^{3} Interaction P value generated using meta regression (
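Footnote 2 states that odds ratios were obtained by taking the exponent of 1.6 times the probit estimate. A minimal sketch of that conversion follows; the function names are ours, and the probit value printed is back-calculated from the reported odds ratio for illustration rather than taken from the paper.

```python
import math

PROBIT_TO_LOGIT = 1.6  # scaling factor stated in footnote 2


def probit_to_odds_ratio(probit_estimate):
    """Approximate odds ratio implied by a probit coefficient: exp(1.6 * beta)."""
    return math.exp(PROBIT_TO_LOGIT * probit_estimate)


def odds_ratio_to_probit(odds_ratio):
    """Inverse of the conversion above: beta = ln(OR) / 1.6."""
    return math.log(odds_ratio) / PROBIT_TO_LOGIT


# Back-calculate the probit coefficient implied by the reported odds ratio of 1.293
# (illustrative only; the paper does not report this coefficient in the table).
beta = odds_ratio_to_probit(1.293)
print(round(beta, 3))                        # 0.161
print(round(probit_to_odds_ratio(beta), 3))  # 1.293
```

The same function can be applied to the lower and upper confidence limits of a probit estimate to obtain the limits on the odds ratio scale.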

We attempted to replicate this effect using a genetic risk score for lifetime smoking exposure, a measure that incorporates duration of smoking and whether (and when) a person stopped smoking, in addition to heaviness of smoking. A 1 standard deviation (SD) higher lifetime smoking genetic instrument was associated with a 0.105 [95% confidence interval (CI): 0.102, 0.109] SD higher lifetime smoking score in our full sample, after adjusting for age, sex and the first 10 genetic principal components. We estimated that a 1SD higher lifetime smoking score caused a 1.293 [95% CI: 1.089, 1.534] higher odds of reporting that people say you look older than you are, after adjusting for age, sex and the first 10 genetic principal components.

Our conclusions about the causal effect of smoking heaviness on our outcomes may be affected by selection-induced collider bias [

The hypothesis-free nature of our analysis means that even when collider bias is small with little impact on individual estimates, across many tests it is possible that this bias inflates the false discovery rate–the proportion of ‘null’ results incorrectly identified as ‘hits’. We extended the above simulation to investigate this assuming a phenome scan restricted to continuous outcomes and found that inflation of the false discovery rate increased as the strength of the colliding relationship increased (see Supplementary section S2 for further details). For example, assuming no effect of the SNP on the outcomes, and given a confounder that has a large effect on smoking status (OR = 100) and explains 20% of the variation in the outcomes, the false discovery rate was 0.071.

In this study, we searched for the presence of causal effects of smoking heaviness, using the PHESANT software package to perform a GxE MR-pheWAS, by estimating the association of genetic variant rs16969968 with each outcome while restricting to ever and never smokers, respectively. We used two approaches to identify potential causal effects of smoking heaviness from our PHESANT results. Our main approach–the GxE MR-pheWAS–ranked results by the strength of interaction between ever and never smokers. This approach is commensurate with our aims, as a genetic variant that affects an outcome (at least in part) through smoking heaviness will exhibit a different effect size on this outcome, among ever and never smokers, respectively. However, identifying interactions has lower statistical power compared with identifying a main effect, especially when combined with the need to correct for the multiple tests performed in an MR-pheWAS. Our secondary approach ranked results based on the strength of the effect of rs16969968 among ever smokers and identifies causal effects under the assumption that the combined effect through smoking heaviness and any horizontal pleiotropic effect is not null. For instance, given a positive effect of a genetic variant on an outcome via smoking heaviness, and an equal but opposite effect via horizontal pleiotropy, then the effect of the variant would appear null in ever smokers and negative in never smokers, and hence this outcome would not be identified when ranking by the effect among ever smokers. Under the assumption of no horizontal pleiotropy a 5% FDR threshold would maintain the FDR of interactions at 5% as well as the FDR of effects among ever smokers. However, the proportion of results identified as a ‘hit’ among ever smokers, but for which there is no effect via smoking heaviness, increases as the prevalence of horizontal pleiotropy increases.

Our GxE MR-pheWAS of smoking heaviness (ranking on interaction strength) identified 12 results, whereas our MR-pheWAS ranking on strength of effect in ever smokers identified 69 results. Of the 12 identified in the former, 11 of these were also identified in the latter. Furthermore, the majority of results identified when ranking on the effect in ever smokers were qualitative, with an effect seen among ever smokers but not among never smokers (type E in

Our MR-pheWAS among ever smokers found only a weak negative effect of rs16969968 on BMI that did not pass the 5% FDR threshold. Previous MR studies have also identified an effect of smoking heaviness on BMI, where additional smoking-increasing alleles were associated with a lower BMI among current smokers [

We identified other novel results, including a detrimental effect of heavier smoking on facial aging. We estimated that a 1SD increase in lifetime smoking causes a 1.29 [95% CI: 1.09, 1.53] higher odds of reporting that others say you look older than you are. A 1SD increase in lifetime smoking is, for example, equivalent to being a current smoker who has smoked 5 cigarettes per day for 12 years, or a former smoker who smoked 5 cigarettes per day for 21 years but stopped smoking 10 years ago, rather than a never smoker. This identified association should be further investigated and replicated in an independent sample, although at present we are not aware of a study with sufficient sample size and data available to do this. Our identified association may reflect a true causal effect of smoking heaviness, may be due to chance, or may arise because the genetic variants have horizontal pleiotropic effects and are thus invalid instruments for smoking heaviness or lifetime smoking. However, we examined the extent to which horizontal pleiotropy might be biasing our smoking heaviness results, by estimating the effect of the smoking heaviness genetic variant among never smokers, and found little evidence of an association, suggesting that smoking status modifies the effect of the genetic variant on facial aging. This is consistent with an effect of the genetic variant via smoking heaviness, but it is also possible that smoking status could modify a pathway from the genetic variant to facial aging that is not via smoking heaviness (see

Our results include examples of the “case-control by proxy” study design, where participants with relatives who are cases for a given phenotype are used as ‘proxy’ cases and those with relatives who are controls are used as ‘proxy’ controls [

Our comparison of results across samples demonstrates the value of stratifying to detect potential causal effects. Of the 12 results identified in our GxE MR-pheWAS (testing interactions), only 5 were identified in our MR-pheWAS using the full sample. While the results of our MR-pheWAS in the full sample included associations that were not identified in our GxE MR-pheWAS in ever smokers, such as risk of operative procedures on the tarsometatarsal joint, and eye problems, these associations may be due to horizontal pleiotropy (or chance) rather than a causal effect of the smoking heaviness variant through smoking status, as these associations were consistent in ever and never smokers.

There are some limitations of this work that should be considered. First, while ranking by interaction strength between ever and never smokers is directly commensurate with our aim of identifying potential causal effects of smoking heaviness (i.e. outcomes where the effect of the genetic variant differs among ever versus never smokers), in the presence of horizontal pleiotropy this has lower statistical power compared with testing for a main effect (e.g. in ever smokers) and identified only a small number of results. Under the assumption of no horizontal pleiotropy ranking using the strength of association among ever smokers is a valid approach to identify interactions and has better statistical power. While this assumption is likely to be violated for some outcomes, in practice we identified many potentially interesting novel results with this method, including all but one of the results based on interaction strength. As GxE MR-pheWAS is a hypothesis-generating approach for discovering potentially interesting associations to be further interrogated in independent data, horizontal pleiotropy can be investigated in follow-up analyses. Follow-up is also important because we rank associations from our MR-pheWAS and GxE MR-pheWAS, such that we should expect the true strength of our strongest estimates to be less than we reported due to the winner’s curse. We also note that we test the strength of the interaction by first testing the effect of the SNP on each outcome in ever and never smokers separately and then following this up with a test for interaction. It is also possible to conduct a GxE MR-pheWAS using a model with an interaction term. This would be essentially equivalent to our stratification approach if the model also includes interactions between the genetic instrument and all covariates. However, this model approach usually assumes that the random error (i.e. variance of the error term) is the same across strata.

Second, our approach is based on the assumption that, if smoking heaviness affects an outcome then we would expect the effect of the genetic variant on the outcome to differ in ever versus never smokers, illustrated in

Third, UK Biobank is a highly selected sample of the UK population, having a response rate of 5.5% [

Fourth, we found an association between rs16969968 and smoking status, where each additional smoking-increasing allele of rs16969968 was associated with a 0.98 [95% CI: 0.97, 0.99] lower odds of being an ever (vs never) smoker. This means that associations may be biased by selection induced collider bias because we stratify our sample on smoking status. While our simulations indicated that any collider bias due to the SNP effect on smoking status would have a negligible impact on the estimated effect of the SNP on a given outcome, it is possible that this has increased our type 1 error rate (the proportion of null results incorrectly identified as ‘hits’) across the large number of tests performed in our MR-pheWAS. Our simulation that assessed this did show some inflation of the false discovery rate among ever smokers as the strength of the colliding relationship increased (all false discovery rates were below 0.165 in all our simulations), but this simulation included strong assumptions. For example, we assumed a particular relationship between the collider and outcomes (see

Fifth, the estimates used to identify interactions are of the direct association of the genetic variant with the outcome, and so are not estimates of the magnitude of effect of smoking heaviness. We cannot follow-up results using a formal IV analysis using these UK Biobank data to estimate a causal effect of smoking heaviness specifically because the association of rs16969968 with smoking heaviness might change across the life-course [

Sixth, it is possible that reporting bias of smoking status may have biased associations. For instance, if some ‘ever’ smokers reported that they have never smoked, then for outcomes affected by rs16969968 via smoking heaviness the estimate in never smokers would be biased towards that in ever smokers. This would bias estimates of interaction between ever and never smokers towards the null. Furthermore, if the effect of smoking heaviness is transient then our interaction estimates may also be biased towards the null, because previous smokers are assigned to the ever smoker group but the effect may no longer be present. In this case, testing the interaction between current (rather than ever) versus never smokers may be the more appropriate strategy.

Seventh, our GxE MR-pheWAS used a sample of unrelated individuals such that it is possible associations may be biased due to family structure, for example, by dynastic effects [

Eighth, due to the hypothesis-free nature of a phenome scan, results generated in this way require careful consideration and follow-up. For example, our MR-pheWAS identified a potentially interesting association of rs16969968 with risk of diagnosis of the International Classification of Diseases (ICD) code ‘Descended testis’ (field 41202 value C621). This association was found in both ever and never smokers, hence we initially considered whether this may reflect an effect of parental genotype on offspring phenotype. However, further inspection revealed that this result is misleading for two reasons. First, this ICD code (C62.1) is a subcategory of ‘Malignant neoplasm of testis’ (C62), hence pertains specifically to cancer of a descended testis. Second, PHESANT deals with ICD fields by generating a binary variable for each ICD code, assigning all participants with the code as TRUE and all participants without the code as FALSE (i.e. assuming no missingness across all participants). This is not appropriate for sex-specific codes, where analyses should be restricted to a particular sex. We further investigated this result by restricting to male participants, and testing the effect of rs16969968 on: 1) ‘malignant neoplasm of testis, descended’ and 2) ‘malignant neoplasm of testis, unspecified’. The latter serves as a replication for the former, under the assumption that the proportion of participants with descended versus undescended testis in the ‘unspecified’ group is the same as for the ICD codes where this is known (C62.1 versus C62.0), i.e. the majority of the unspecified group are descended (the ratio in UK Biobank is 1:16).
While the positive association with the ‘descended’ group remains in both ever and never smokers (N_descended = 26; odds ratio per each additional smoking-increasing allele of rs16969968 of 3.52 [95% CI: 2.03, 6.31] in the full sample), we find little evidence of an association in the unspecified group (N_unspecified = 199; odds ratio per each additional smoking-increasing allele of rs16969968 of 0.99 [95% CI: 0.80, 1.22] in the full sample), suggesting that the association with the particular outcome is due to chance.

We used the freely available PHESANT package to search for the presence of causal effects of smoking heaviness across thousands of outcomes. While we used the rs16969968 SNP as an instrument for smoking heaviness, a recent GWAS (also in UK Biobank) identified 55 smoking heaviness-associated variants which could be used (e.g. combined into an allele score) in future GxE MR-pheWAS of smoking heaviness [

UK Biobank is a prospective cohort of 503 325 men and women in the UK aged 37–73 years (99.5% were between 40 and 69 years) [

Of the 487 406 participants with genetic data, we removed 373 with genetic sex different to reported sex, and 471 with sex chromosome aneuploidy (identified as putatively carrying sex chromosome configurations that are not either XX or XY). We found no outliers in heterozygosity and missing rates, which would indicate poor quality of the genotypes. We removed 78 309 participants not of white British ancestry [

Two SNPs in the

The UK Biobank data showcase allows researchers to identify variables based on the field type (

We excluded 74 fields

This resulted in a set of 2687 UK Biobank fields (347 integer, 1392 continuous, 836 categorical [single] and 112 categorical [multiple]), referred to hereafter as the outcome dataset (because they are tested as an outcome irrespective of whether this is biologically plausible).

Smoking status was self-reported via a questionnaire at the UK Biobank assessment centre. Participants were asked to report whether they smoked previously, currently, or whether they had never smoked. We created a binary variable denoting ever versus never smokers by grouping former and current smokers.

Smoking heaviness was derived from the number of cigarettes smoked per day, which was asked via the same questionnaire, to those who reported being a previous or current smoker. We categorised the number of cigarettes into four bands: 0–10, 11–20, 21–30 and 31+.
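The two derived smoking variables described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the function names, the status strings and the band labels are hypothetical, and only the grouping rules (former and current smokers combined as ever smokers; four bands of cigarettes per day) come from the text.

```python
def ever_never(status):
    """Collapse self-reported smoking status into ever versus never.

    Assumes status is one of 'previous', 'current' or 'never' (hypothetical coding);
    anything else is treated as missing.
    """
    if status in ("previous", "current"):
        return "ever"
    if status == "never":
        return "never"
    return None


def heaviness_band(cigarettes_per_day):
    """Assign cigarettes per day to one of the four bands: 0-10, 11-20, 21-30, 31+."""
    if cigarettes_per_day is None or cigarettes_per_day < 0:
        return None  # missing or invalid value
    if cigarettes_per_day <= 10:
        return "0-10"
    if cigarettes_per_day <= 20:
        return "11-20"
    if cigarettes_per_day <= 30:
        return "21-30"
    return "31+"


print(ever_never("previous"), heaviness_band(15))  # ever 11-20
```

Because the bands are ordered, the resulting variable is suitable as the outcome of an ordered logistic regression.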

We include age and sex as covariates in our models to reduce the variation in our outcomes. Age when participants attended the UK Biobank assessment centre was derived from their date of birth and the date of their assessment centre visit. Sex was self-reported during the touchscreen questionnaire (and validated using the genome-wide data). We adjusted for the first 10 genetic principal components to control for confounding via population stratification. Genetic variants are set at conception, and after conception they cannot be affected by factors that traditionally confound observational associations (such as participant socio-economic position). Also, while it is possible these factors may be on an alternative confounding pathway between the genetic variant and an outcome (e.g. via parental genotype and parental smoking heaviness [

We tested the association of rs16969968 with smoking heaviness using ordered logistic regression (

We searched for the causal effects of smoking heaviness, within three subsamples of UK Biobank participants: 1) ever smokers, 2) never smokers, and 3) our full sample. We conducted our MR-pheWAS in two stages. First, we ran PHESANT (version 0.17) with the ‘save’ option, to derive the PHESANT-processed outcomes for all participants in our full sample. A description of PHESANT’s automated rule-based method is given in detail elsewhere [

In the second stage, for each subsample (ever, never and the full sample), we estimated the univariate association of rs16969968 with each of the outcome variables derived by PHESANT. The rs16969968 SNP and outcome are the independent (exposure) and dependent (outcome) variables in the regression model, respectively. Outcome variables with continuous, binary, ordered categorical and unordered categorical data types, were tested using linear, logistic, ordered logistic, and multinomial logistic regression, respectively. All analyses were adjusted for covariates as described above. The two-stage approach we used ensured that the same data types were assigned to each outcome across subsamples, as we process and assign data types using PHESANT on the whole sample. In each stratum, we only test outcomes with (in that stratum) at least 500 participants and at least 10 participants in each category for binary and unordered categorical variables.
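The per-stratum rules in this paragraph (regression type by outcome data type, at least 500 participants, and at least 10 participants per category for binary and unordered categorical outcomes) can be expressed as a small filter. The sketch below is ours and does not reproduce PHESANT's implementation; the names are hypothetical, and requiring at least two observed categories is an added assumption.

```python
from collections import Counter

# Regression model used for each outcome data type, as described in the text.
REGRESSION_BY_TYPE = {
    "continuous": "linear",
    "binary": "logistic",
    "ordered categorical": "ordered logistic",
    "unordered categorical": "multinomial logistic",
}

MIN_PARTICIPANTS = 500   # minimum non-missing participants in the stratum
MIN_PER_CATEGORY = 10    # minimum participants per category (binary / unordered)


def is_testable(values, data_type):
    """Return True if an outcome meets the sample size rules within one stratum."""
    observed = [v for v in values if v is not None]  # None marks a missing value
    if len(observed) < MIN_PARTICIPANTS:
        return False
    if data_type in ("binary", "unordered categorical"):
        counts = Counter(observed)
        # Assumption: an outcome with a single observed category is not testable.
        if len(counts) < 2:
            return False
        return min(counts.values()) >= MIN_PER_CATEGORY
    return True


outcome = [0] * 495 + [1] * 5  # 500 participants, but only 5 in one category
print(REGRESSION_BY_TYPE["binary"], is_testable(outcome, "binary"))  # logistic False
```

Applying the filter separately in each stratum is what allows the set of outcomes tested to differ between ever and never smokers.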

As described above, we only tested outcomes with at least 500 participants, and with at least 10 participants in each category for binary and unordered categorical variables, such that the outcomes tested in each subsample may vary. For this reason, we first identified the subset of outcomes tested in both ever and never subsamples (and using the same type of regression). For this subset, we determined the strength of interaction between ever versus never smokers using meta regression (metan command in Stata). We accounted for multiple testing using a Bonferroni corrected threshold and a 5% FDR threshold, P_{threshold} = 0.05×rank/n, where P_{threshold} is the P value threshold resulting in a false discovery rate of 5% [
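With only two strata, a heterogeneity test such as Cochran's Q reduces to comparing the two stratum-specific estimates. The sketch below illustrates this using the facial aging odds ratios reported for ever and never smokers; it is not the Stata routine used in the study, the function names are ours, and the standard errors are recovered from the rounded 95% confidence intervals, so exact agreement with the reported interaction P value is not expected.

```python
import math


def log_or_and_se(odds_ratio, ci_low, ci_high, z=1.96):
    """Log odds ratio and its standard error, recovered from a 95% CI."""
    beta = math.log(odds_ratio)
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z)
    return beta, se


def interaction_test(beta_ever, se_ever, beta_never, se_never):
    """Cochran's Q for two strata: Q = (b1 - b2)^2 / (se1^2 + se2^2), 1 df."""
    diff = beta_ever - beta_never
    q = diff ** 2 / (se_ever ** 2 + se_never ** 2)
    p = math.erfc(math.sqrt(q / 2))  # upper tail of a chi-square with 1 df
    return diff, q, p


# Reported estimates for facial aging (main analysis)
b_ever, se_ever = log_or_and_se(1.062, 1.043, 1.081)
b_never, se_never = log_or_and_se(1.004, 0.988, 1.021)

diff, q, p = interaction_test(b_ever, se_ever, b_never, se_never)
print(f"difference in log odds = {diff:.3f}, Q = {q:.1f}, P = {p:.1e}")
```

The resulting P value is of the same order of magnitude as the reported 7.72x10^{-6}; the remaining difference reflects rounding of the inputs and possibly differences in the exact procedure.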

If the genetic variant affects the outcome via smoking heaviness we would expect (with sufficient statistical power) to see an effect in ever smokers, except where a horizontal pleiotropic effect of the same magnitude in the opposite direction exists–these would cancel out to give a zero total effect. Hence, we conducted a secondary analysis, ranking outcomes by P value of the estimated effects of rs16969968 within the ever subsample only. To identify potential causal effects, we used both a Bonferroni and 5% FDR threshold, as described above. We examined the degree to which horizontal pleiotropy may be biasing results by viewing these estimates alongside the estimates among never smokers. We also identified top results in our never smoker and full samples, to compare the sets of results identified using each sample.
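The cancelling-out argument can be written as a short worked decomposition. This is a sketch under two assumptions that are ours for illustration: effects are additive, and any horizontally pleiotropic effect is the same in ever and never smokers. The symbols $\beta_{\text{SH}}$ (effect of the variant via smoking heaviness) and $\beta_{\text{HP}}$ (effect via a horizontally pleiotropic pathway) are not notation from the study.

```latex
\begin{align}
\beta_{\text{ever}}  &= \beta_{\text{SH}} + \beta_{\text{HP}} \\
\beta_{\text{never}} &= \beta_{\text{HP}} \\
\beta_{\text{ever}} - \beta_{\text{never}} &= \beta_{\text{SH}}
\end{align}
```

If $\beta_{\text{HP}} = -\beta_{\text{SH}} \neq 0$, then $\beta_{\text{ever}} = 0$ while $\beta_{\text{never}} = \beta_{\text{HP}} \neq 0$. Ranking by the effect in ever smokers would miss this outcome, whereas the difference between strata still recovers $\beta_{\text{SH}}$, which is why the interaction-based ranking does not rely on the total effect being non-null.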

It may be reasonable to assume that rs16969968 affects most outcomes only via smoking heaviness (i.e. there is no horizontal pleiotropy), such that an association between rs16969968 and the outcome in the whole sample would indicate an effect of smoking heaviness on this outcome. We performed a two-step approach to identify interactions, similar to an approach for identifying gene-environment interactions proposed previously [

We identified an association with a facial aging phenotype. Participants were asked ‘do people say you look:’ and asked to select either ‘younger than you are’, ‘about your age’ or ‘older than you are’. It is possible that the PHESANT automated approach made inappropriate decisions in its analysis, hence we re-examined this association to ensure it is not erroneous. We estimated the effect of rs16969968 on facial aging, using ordered logistic regression (

We derived a measure of lifetime smoking that incorporates smoking heaviness, duration and time since cessation into a single measure [

Analyses were performed in R version 3.3.1 ATLAS, Matlab r2018a or Stata version 15, and code is available at [

Total number of tests with both an ever and a never result, and the same regression type = 16692. Bonferroni threshold = 0.05/16692 = 3.00x10^{-6}. False discovery rate threshold = 0.05x12/16692 = 3.59x10^{-5}. ^{1} Linear: associations are the SD difference in inverse rank normal transformed outcome for each additional smoking-increasing allele of rs16969968. Ordered: associations are the log odds of a higher outcome group for each additional smoking-increasing allele of rs16969968. Binary: associations are the log odds of comparison versus baseline outcome group for each additional smoking-increasing allele of rs16969968. ^{2} Strength of interaction between ever versus never smokers using meta regression (metan command in Stata). ^{3} Information on categories for multinomial, ordinal and binary regression results: reference category for multinomial regression results, baseline category for binary logistic regression results, and category ordering for ordered logistic regression results. ^{4} In addition to the field ID, this column also contains the reference value for multinomial regression results, and the field value for which a binary variable was generated for categorical (multiple) fields.

(PDF)

Results of main analysis, adjusting for age, sex and the first 10 genetic principal components. ^{1} Direction of change of outcome with genetic predisposition to higher smoking heaviness. ^{2} For multinomial logistic regression results a single P value was calculated for each model as a whole, using the likelihood ratio chi-square test. ^{3} Information on categories for multinomial, ordinal and binary regression results: reference category for multinomial regression results, baseline category for binary logistic regression results, and category ordering for ordered logistic regression results. ^{4} In addition to the field ID, this column also contains the reference value for multinomial regression results, and the field value for which a binary variable was generated for categorical (multiple) fields. ^{5} Where the test type differs in never smokers, this is shown in brackets. Bonferroni threshold = 0.05/18513 = 2.70x10^{-6}; false discovery rate threshold = 0.05x69/18513 = 1.86x10^{-4}. Binary, linear and ordered results in this table are shown in Figs

(PDF)

Total number of tests in never smokers = 17975. Bonferroni threshold = 0.05/17975 = 2.78x10^{-6}. False discovery rate threshold = 0.05x8/17975 = 2.23x10^{-5}.

(PDF)

Total number of tests in whole sample = 23009. Bonferroni threshold = 0.05/23009 = 2.17x10^{-6}. False discovery rate threshold = 0.05x48/23009 = 1.04x10^{-4}.

(PDF)

Step 1: Rank results by association strength in the whole sample, which identifies 48 results (^{-3}. False discovery rate threshold = 0.05x9/36 = 1.25x10^{-2}. ^{1} Linear: associations are the SD difference in inverse rank normal transformed outcome for each additional smoking-increasing allele of rs16969968. Ordered: associations are the log odds of a higher outcome group for each additional smoking-increasing allele of rs16969968. Binary: associations are the log odds of comparison versus baseline outcome group for each additional smoking-increasing allele of rs16969968. ^{2} Strength of interaction between ever versus never smokers using meta regression (metan command in Stata). ^{3} Information on categories for multinomial, ordinal and binary regression results: reference category for multinomial regression results, baseline category for binary logistic regression results, and category ordering for ordered logistic regression results. ^{4} In addition to the field ID, this column also contains the reference value for multinomial regression results, and the field value for which a binary variable was generated for categorical (multiple) fields.

(PDF)

In our UK Biobank sample the smoking heaviness genetic variant is estimated to affect both smoking status and the identified outcome phenotypes. If a confounder exists that affects both smoking status and a given outcome, then smoking status is a collider, and stratifying on smoking status may cause collider bias, i.e. bias in the estimated effect of the genetic variant on the outcome phenotype. An exception is when the genetic variant for smoking heaviness does not affect the outcome, and the variant and the confounder act additively on smoking status (i.e. there is no additive interaction on the log-probability scale). In this case, collider bias would not occur.
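The mechanism can be illustrated numerically. The sketch below is not the authors' simulation code. It assumes a standard normal confounder and a logistic model for smoking status, with illustrative odds ratios of 100 per standard deviation of the confounder and 0.8 per allele, and an intercept of 0 (all assumptions made for this example). Because the variant and the confounder are additive on the logit scale rather than the log-probability scale, the exception described above does not apply, and stratifying induces an association between variant and confounder.

```python
import math


def expit(x):
    """Numerically stable inverse logit."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def confounder_mean_by_stratum(alleles, intercept, log_or_snp, log_or_conf,
                               grid=4001, limit=8.0):
    """Return (E[C | dosage, ever smoker], E[C | dosage, never smoker]).

    C is a standard normal confounder, integrated on a uniform grid.
    Smoking status follows logit P(ever) = intercept + log_or_snp * alleles
    + log_or_conf * C.
    """
    step = 2.0 * limit / (grid - 1)
    num_ever = den_ever = num_never = den_never = 0.0
    for i in range(grid):
        c = -limit + i * step
        weight = math.exp(-0.5 * c * c)  # normal density; constants cancel
        linear = intercept + log_or_snp * alleles + log_or_conf * c
        p_ever = expit(linear)
        p_never = expit(-linear)
        num_ever += c * weight * p_ever
        den_ever += weight * p_ever
        num_never += c * weight * p_never
        den_never += weight * p_never
    return num_ever / den_ever, num_never / den_never


# Assumed illustrative values: OR 100 per SD of confounder, OR 0.8 per allele.
for dosage in (0, 1, 2):
    ever_mean, never_mean = confounder_mean_by_stratum(
        dosage, 0.0, math.log(0.8), math.log(100))
    print(f"dosage={dosage}: ever={ever_mean:.4f}, never={never_mean:.4f}")
```

Under these assumed values the mean of the confounder rises with allele count within each stratum, even though variant and confounder are independent in the whole sample. Any outcome affected by the confounder would therefore show a spurious within-stratum association with the variant, even if the variant has no effect on that outcome.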

(PDF)

OR_{conf,si} is the odds ratio of the confounder on smoking status, i.e. the change in odds of being an ever versus never smoker for a 1 standard deviation increase in the confounder. r^{2} is the proportion of the variance of the continuous facial aging phenotype (underlying the categorical facial aging outcome) that is explained by the confounder. a-d: positive effect of confounder on outcome, with OR of the confounder on smoking status of 10 (a), 20 (b), 50 (c) and 100 (d). e-h: negative effect of confounder on outcome, with OR of the confounder on smoking status of 10 (e), 20 (f), 50 (g) and 100 (h).

(PDF)

OR_{conf,si} is the odds ratio of the confounder on smoking status, i.e. the change in odds of being an ever versus never smoker for a 1 standard deviation increase in the confounder. r^{2} is the proportion of the variance of the continuous phenotype that is explained by the confounder. a-d: positive effect of confounder on outcome, with OR of the confounder on smoking status of 10 (a), 20 (b), 50 (c) and 100 (d). e-h: negative effect of confounder on outcome, with OR of the confounder on smoking status of 10 (e), 20 (f), 50 (g) and 100 (h).

(PDF)

These simulations assume a large (and hence unlikely) effect of the confounder on smoking status (odds ratio = 100; i.e. we assume that a 1 standard deviation increase in the confounder causes 100-fold higher odds of being an ever smoker). Furthermore, we assume that a 1 dosage increase in the SNP causes 0.8-fold (i.e. lower) odds of being an ever smoker (we have made this more extreme than we see in UK Biobank for illustration purposes). (a) Categorical facial aging outcome, positive effect of confounder on outcome. (b) Continuous outcome, positive effect of confounder on outcome. (c) Categorical facial aging outcome, negative effect of confounder on outcome. (d) Continuous outcome, negative effect of confounder on outcome. In plots (a) and (c) r^{2} is the proportion of the variance of the continuous facial aging phenotype (underlying the categorical facial aging outcome) that is explained by the confounder. In plots (b) and (d) r^{2} is the proportion of the variance of the continuous phenotype that is explained by the confounder.

(PDF)

Solid black boxes indicate an interaction. A box around the smoking status variable indicates that this variable is conditioned upon (i.e. we stratify on ever versus never smokers). Please see code in the project’s GitHub repository [

(PDF)

In this figure we show two directed acyclic graphs (DAGs), after stratification of the UK Biobank participants by ever versus never smoking. We assume an effect of parental smoking heaviness on the UK Biobank participant's outcome. This alternative pathway between the UK Biobank participant's SNP and the outcome (via parental SNP and parental smoking heaviness) is the same in UK Biobank ever and never smokers. It is not horizontal pleiotropy because there is no causal effect of the SNP (or more precisely the genetic variation that the SNP tags) on the outcome; it is the parental genetic variation that affects the outcome. Note also that this assumes participant smoking status is independent of parental smoking status; otherwise the extent to which this alternative pathway manifests would differ between UK Biobank participants who are ever versus never smokers.

(PDF)

This research has been conducted using the UK Biobank Resource under Application Number 16729. We are grateful to Ruth Mitchell for providing helpful comments on the clarity of the paper.