Obtaining accurate quantitative data on environmental exposures strengthens the applicability of toxicological studies and biomonitoring efforts to estimate actual risks for the development of human health effects. This present study builds upon a small and well-characterized occupational cohort that has been used successfully to assess exposure-dose relationships in HDI exposed workers and to inform exposure assessment models (Fent et al. 2009a; Fent et al. 2008; Fent et al. 2006; Fent et al. 2009b; Flack et al. 2008; Flack et al. 2010a; Flack et al. 2011; Flack et al. 2010b; Gaines et al. 2010a; Gaines et al. 2010b; Gaines et al. 2011; Thomasen et al. 2011a; Thomasen et al. 2011b; Thomasen and Nylander- French 2012). Workers in our study population (n=33) were genotyped and the data were used to discover SNPs that are statistically associated with levels of biomarkers measured in biological media from workers exposed to HDI. We report on the feasibility of determining statistically significant associations between HDA biomarker levels and markers of genetic variability with three types of biomarkers – urine HDA level adjusted for creatinine level, plasma HDA level, and total blood HDA level (plasma + hemoglobin). Furthermore, we found that the associated SNPs are then significant predictors in exposure models built upon repeated measurements (for urine HDA) and cumulative measurements (for blood biomarkers) of biomarker and exposure levels to HDI.
SNP rs169 was significantly associated with the geometric mean of creatinine-adjusted urine biomarker level in an initial linear regression model that included four covariates of exposure (breathing-zone concentration of HDI adjusted by APF and paint time, skin concentration of HDI, ethnicity, and smoking). In a separate LMM conducted with PROC MIXED that incorporated repeated exposure measurements of HDI as covariates, SNP rs169 was again a significant predictor of urine HDA adjusted by creatinine measured over multiple visits per worker. A total of 14 SNPs were found to be significantly associated with urine HDA holding FDR at q<0.20 and each of the top five SNPs were found to be significant
predictors in linear mixed effects models when run individually. The model that contained rs169 was the best-fit model as determined by AIC compared with models that incorporated the other SNPs. Ideally, the exposure assessment model would have incorporated multiple SNPs to better account for inter- individual variability, but in LMMs that included multiple SNPs as covariates for association with urine biomarker levels, no SNPs were observed to be significant predictors.
A total of 7 SNPs were significantly associated with the geometric mean of total blood HDA (calculated as a sum of plasma HDA and hemoglobin HDA) and 4 SNPs were significantly associated with the geometric mean of plasma HDA after FDR correction. Of those, rs10134376 was the most significantly associated SNP with total blood HDA and rs2061660 was the most significantly associated SNP with plasma HDA. Each of these SNPs was also a significant predictor in the linear regression models that assessed the association between cumulative blood biomarkers and cumulative breathing-zone and skin concentrations of HDI. The top SNPs associated with total blood HDA were highly correlated and there is evidence that the markers are in linkage disequilibrium (LD) with each other because their chromosomal positions on chromosome 14 are closely spaced together. Similarly, several of the top SNPs associated with plasma HDA appear to be in LD. Therefore, only linear models that incorporated one SNP at a time could be assessed due to the presence of collinearity between the SNPs.
The present method is sensitive to statistical outliers in the data with high values of the dependent and independent variables. The SNPs that are determined to be significantly associated with biomarker values differ widely under various association criteria, indicating that the GWAS analysis may not identify stable markers of individual susceptibility in our small cohort. Genotypes corresponding to the individuals with particularly high or low biomarker values can produce high residuals and impact the appropriateness of fitting linear models with the data and complicate the estimation of parameters. In addition, variation in the environmental measurements of HDI can also have influence on the
association model as measured by Cook’s distance (e.g., the worker homozygous for the minor allele of the SNPs most associated with total blood and plasma). The categorization of these individuals as influential points is not necessarily a problem for exposure assessment, in general, because such high exposure and biomarker levels can and do occur in the workplace. However, for the purposes of
statistical modeling and determining genetic associations, an outlier in the data impacts the distribution of residuals from the fitted model and may impact the validity of positive associations. Changes in the exact association criteria used for the GWAS analysis, e.g., allele coding and minor allele frequency criteria, can have the impact of altering the number and order of significant SNPs. As such, the utility of the linear exposure assessment models produced in this study should be validated by a separate set of samples to ensure replicability and broad applicability.
The AA genotype of rs169, which is homozygous for the minor allele, is associated in the linear mixed- effects model with a large increase of creatinine-adjusted urine HDA level. The parameter estimate for the recessive genotype dwarfs the biomarker measurements for most workers in the cohort, which may also explain why the environmental measurements, i.e., log-transformed breathing-zone HDI and log- transformed skin HDI concentration, that were found to be predictive of urine and blood biomarker levels in previous studies were not significant in this analysis (Flack et al. 2010a; Flack et al. 2011; Flack et al. 2010b; Gaines et al. 2010b; Gaines et al. 2011). The fixed effects shown in Table 11 are Wald estimates from added-last tests that interrogate the added value of a particular covariate on top of a model with all of the other covariates already in place. Because the estimated effect-size of the SNPs in the linear mixed effects models exceeds the effect-sizes for the breathing-zone concentration of HDI and the skin concentration of HDI, the addition of these estimates in a model that already includes the SNPs may be less likely to have significant predictive effects.
The use of a quantitative dependent variable in the genetic association study may have been a double- edged sword in the case of this dataset. On one hand, the wide range in biomarker levels among the workers provided potentially high effect-sizes that could identify SNPs with significant associations. Dichotomization of disease endpoints as case or control leads to a loss of information which can be critical in assessing association effects, particularly with exposure studies. However, in our specific analysis and with our small cohort, the range in biomarker and exposure measurements could also present a source of bias in the data that skews the least squares regression modeling in a direction that is not representative of each SNP’s actual biological influence on biomarker levels. In addition, GWAS often suffers from the “winner’s curse” in which the effect-sizes of variants may be overestimated in initial, exploratory results because scientists tend to focus only on variants that yield significant evidence for association (Xiao and Boehnke 2011).