Non-parametric linkage methods have had limited success in detecting gene × gene interactions. Using affected sibling-pair (ASP) data from all replicates of the simulated data from Problem 3, we assessed the statistical power of three approaches to identify the gene × gene interaction between two loci on different chromosomes. The first method conditioned on linkage at the primary disease susceptibility locus (DR) to find linkage to a simulated effect modifier at Locus A with a mean allele sharing test. The second approach used a regression-based mean test to identify either the presence of interaction between the two loci or linkage to the A locus in the presence of linkage to DR. The third method applied a conditional logistic model designed to test for the presence of interacting loci. The first approach had lower power than an unconditional linkage analysis, supporting the idea that gene × gene interaction cannot be detected with ASP data. The regression-based mean test and the conditional logistic model had the lowest power to detect gene × gene interaction, possibly because of the complex recoding of the tri-allelic DR locus for use as a covariate. We conclude that the ASP approaches tested have low power to identify the interaction between the DR and A loci despite the large sample size...
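A minimal sketch of the mean allele-sharing test, and of restricting it to sib pairs according to their sharing at DR, may make the first approach more concrete. It assumes fully informative IBD estimates (0, 0.5 or 1 per pair) and uses the full-sibling null mean of 0.5 and variance of 1/8; the function names, the example data and the choice to keep only pairs sharing two alleles IBD at DR are illustrative, not the procedure used in the study.

```python
import math

def mean_test(pi_hat):
    """Z statistic of the ASP mean allele-sharing test.

    pi_hat: per-pair proportion of alleles shared IBD (0, 0.5 or 1).
    Under no linkage, full sibs have E[pi_hat] = 0.5 and Var[pi_hat] = 1/8.
    """
    n = len(pi_hat)
    mean = sum(pi_hat) / n
    return (mean - 0.5) / math.sqrt(1.0 / (8.0 * n))

def conditional_mean_test(pi_hat_a, ibd_count_dr, keep=(2,)):
    """Mean test at locus A using only pairs whose IBD count at DR is in `keep`."""
    subset = [p for p, d in zip(pi_hat_a, ibd_count_dr) if d in keep]
    return mean_test(subset)

# Hypothetical sharing data for six sib pairs.
pi_a = [1.0, 0.5, 1.0, 0.0, 0.5, 1.0]
ibd_dr = [2, 1, 2, 0, 1, 2]
print(mean_test(pi_a), conditional_mean_test(pi_a, ibd_dr))
```

Conditioning shrinks the sample, so any gain in mean sharing has to outweigh the loss of pairs, which is one way to read the reduced power reported above.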
Gene set analysis allows the inclusion of knowledge from established gene sets, such as gene pathways, and can potentially improve the power to detect differentially expressed genes. However, conventional methods of gene set analysis focus on the marginal effects of genes in a gene set and ignore gene interactions, which may contribute to complex human diseases. In this study, we propose a method of gene interaction enrichment analysis that incorporates knowledge of predefined gene sets (e.g. gene pathways) to identify enriched gene interaction effects on a phenotype of interest. We also discuss removing irrelevant genes and extracting, for an identified gene set, a core set of gene interactions that contribute to the statistical variation of the phenotype. The utility of our method is demonstrated through analyses of two publicly available microarray datasets. The results show that our method can identify gene sets with strong gene interaction enrichment. The enriched gene interactions identified by our method may provide clues to new gene regulation mechanisms related to the studied phenotypes. In summary, our method offers a powerful tool for researchers to exhaustively examine the large numbers of gene interactions associated with complex human diseases...
We sought to find significant gene × gene interaction in a genome-wide association analysis of rheumatoid arthritis (RA) by performing pair-wise tests of interaction among collections of single-nucleotide polymorphisms (SNPs) obtained by one of two methods. The first method involved screening the results of the genome-wide association analysis for main-effect p-values < 1 × 10⁻⁴. The second method used biological databases such as the Gene Ontology and Kyoto Encyclopedia of Genes and Genomes to define gene collections that each contained one of four genes with known associations with RA: PTPN22, STAT4, TRAF1, and C5. We used a permutation approach to determine whether any of these SNP sets had empirical enrichment of significant interaction effects. We found that the SNP set obtained by the first method was significantly enriched for significant interaction effects (empirical p = 0.003). Additionally, we found that the "protein complex assembly" collection of genes from the Gene Ontology, which contains the TRAF1 gene, was significantly enriched for interaction effects with p-values < 1 × 10⁻⁸ (empirical p = 0.012).
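The permutation idea can be sketched as follows, assuming that the null distribution is built by drawing random SNP sets of the same size from all SNPs and counting the pairs whose interaction p-value falls below a chosen cutoff. The resampling scheme, the helper names and the input layout are hypothetical; the abstract does not specify how its permutations were performed.

```python
import random

def count_hits(snps, pair_pvalues, alpha):
    """Number of SNP pairs within `snps` whose interaction p-value is < alpha."""
    hits = 0
    for i in range(len(snps)):
        for j in range(i + 1, len(snps)):
            p = pair_pvalues.get(frozenset((snps[i], snps[j])))
            if p is not None and p < alpha:
                hits += 1
    return hits

def enrichment_pvalue(snp_set, all_snps, pair_pvalues, alpha, n_perm=1000, seed=0):
    """Empirical p-value comparing the observed hit count with random SNP sets.

    pair_pvalues maps frozenset({snp1, snp2}) -> interaction p-value;
    all_snps must be a list so that random.sample can draw from it.
    """
    rng = random.Random(seed)
    observed = count_hits(list(snp_set), pair_pvalues, alpha)
    exceed = sum(
        count_hits(rng.sample(all_snps, len(snp_set)), pair_pvalues, alpha) >= observed
        for _ in range(n_perm)
    )
    # The add-one correction keeps the empirical p-value away from exactly zero.
    return observed, (exceed + 1) / (n_perm + 1)
```

Drawing sets of equal size matters because the number of pairs, and hence the expected number of hits, grows quadratically with set size.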
Recent studies have shown that quantitative phenotypes may be influenced not only by multiple single nucleotide polymorphisms (SNPs) within a gene but also by the interaction between SNPs at unlinked genes. We propose a new statistical approach that can detect gene-gene interactions at the allelic level that contribute to phenotypic variation in a quantitative trait. By testing for the association of allelic combinations at multiple unlinked loci with a quantitative trait, we can detect the SNP allelic interaction whether or not it can be detected as a main effect. Our proposed method assigns a score to each unrelated subject according to the allelic combination inferred from the observed genotypes at two or more unlinked SNPs, and then tests for association of the allelic score with a quantitative trait. To investigate the statistical properties of the proposed method, we performed a simulation study to estimate type I error rates and power, and demonstrated that this allelic approach achieves greater power than the more commonly used genotypic approach to testing for gene-gene interaction. As an example, the proposed method was applied to data obtained as part of a candidate gene study of sodium retention by the kidney. We found that this method detects an interaction between the calcium-sensing receptor gene (CaSR)...
It is unclear why disease occurs in only a small proportion of persons carrying common risk alleles of disease susceptibility genes. Here we demonstrate that an interaction between a specific virus infection and a mutation in the Crohn’s disease susceptibility gene Atg16L1 induces intestinal pathologies in mice. This virus-plus-susceptibility gene interaction generated abnormalities in granule packaging and unique patterns of gene expression in Paneth cells. Further, the response to injury induced by the toxic substance dextran sodium sulfate was fundamentally altered to include pathologies resembling aspects of Crohn’s disease. These pathologies triggered by virus-plus-susceptibility gene interaction were dependent on TNFα and IFNγ and were prevented by treatment with broad-spectrum antibiotics. Thus, we provide a specific example of how a virus-plus-susceptibility gene interaction can, in combination with additional environmental factors and commensal bacteria, determine the phenotype of hosts carrying common risk alleles for inflammatory disease.
Total cholesterol, low-density lipoprotein cholesterol, triglyceride, and high-density lipoprotein cholesterol (HDL-C) levels are among the most important risk factors for coronary artery disease. We tested for gene–gene interactions affecting the levels of these four lipids based on prior knowledge of established genome-wide association study (GWAS) hits, protein–protein interactions, and pathway information. Using genotype data from 9,713 European Americans from the Atherosclerosis Risk in Communities (ARIC) study, we identified an interaction between HMGCR and a locus near LIPC in their effect on HDL-C levels (Bonferroni-corrected Pc = 0.002). Using an adaptive locus-based validation procedure, we successfully validated this gene–gene interaction in the European American cohorts from the Framingham Heart Study (Pc = 0.002) and the Multi-Ethnic Study of Atherosclerosis (MESA; Pc = 0.006). The interaction between these two loci is also significant in the African American sample from ARIC (Pc = 0.004) and in the Hispanic American sample from MESA (Pc = 0.04). Both HMGCR and LIPC are involved in the metabolism of lipids, and genome-wide association studies have previously identified LIPC as associated with levels of HDL-C. However...
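A common way to test one SNP pair for interaction on a quantitative trait such as HDL-C is a linear model with a product term, followed by a Bonferroni correction for the number of pairs tested. The sketch below assumes additive allele-count coding and a normal approximation for the p-value; it does not reproduce the adaptive locus-based validation procedure, the study's exact model may differ, and the function names are hypothetical.

```python
import math
import numpy as np

def interaction_test(g1, g2, y):
    """OLS test of a SNP x SNP interaction on a quantitative trait.

    Model: y = b0 + b1*g1 + b2*g2 + b3*g1*g2 + e, with g1 and g2 coded as
    allele counts (0/1/2). Returns (b3, t statistic, two-sided p-value); the
    p-value uses a normal approximation, which is reasonable for large n.
    """
    g1, g2, y = (np.asarray(v, float) for v in (g1, g2, y))
    X = np.column_stack([np.ones_like(y), g1, g2, g1 * g2])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    n, k = X.shape
    sigma2 = resid @ resid / (n - k)
    cov = sigma2 * np.linalg.inv(X.T @ X)
    t = beta[3] / math.sqrt(cov[3, 3])
    p = math.erfc(abs(t) / math.sqrt(2.0))
    return float(beta[3]), float(t), p

def bonferroni(p, n_tests):
    """Bonferroni-corrected p-value, capped at 1."""
    return min(1.0, p * n_tests)
```

For example, a nominal p-value of 1 × 10⁻⁴ across 20 tested pairs gives a corrected Pc of 0.002.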
Angiotensin II type 1 receptor (AGTR1) has been reported to play a fibrogenic role in non-alcoholic fatty liver disease (NAFLD). In this study, five variants of the AGTR1 gene (rs3772622, rs3772627, rs3772630, rs3772633, and rs2276736) were examined for association with susceptibility to NAFLD. A total of 144 biopsy-proven NAFLD patients and 198 controls were genotyped using TaqMan assays. The liver biopsy specimens were histologically graded and scored according to the method of Brunt. Single-locus analysis in pooled subjects revealed no association between any of the five variants and susceptibility to NAFLD. In the Indian ethnic group, the rs2276736, rs3772630 and rs3772627 variants appear to be protective against NAFLD (p = 0.010, p = 0.016 and p = 0.026, respectively). Haplotype ACGCA was also protective against NAFLD in the Indian ethnic subgroup (p = 0.03). Analysis of gene-gene interaction between AGTR1 and the patatin-like phospholipase domain-containing 3 (PNPLA3) gene, which we previously reported as associated with NAFLD in this sample, showed a strong interaction between the AGTR1 (rs3772627), AGTR1 (rs3772630) and PNPLA3 (rs738409) polymorphisms on NAFLD susceptibility (p = 0.007). Further analysis of the NAFLD patients revealed that the G allele of AGTR1 rs3772622 is associated with an increased fibrosis score (p = 0.003). This is the first study that replicates an association between AGTR1 polymorphism and NAFLD...
The spatial conformation of a genome plays an important role in the long-range regulation of genome-wide gene expression and methylation, but it has not been extensively studied because of a lack of genome conformation data. Recently developed chromosome conformation capture techniques, such as the Hi-C method empowered by next-generation sequencing, can generate unbiased, large-scale, high-resolution chromosomal interaction (contact) data, providing an unprecedented opportunity to investigate the spatial structure of a genome and its applications in gene regulation, genomics, epigenetics, and cell biology. In this work, we conducted a comprehensive, large-scale computational analysis of this new stream of genome conformation data generated by the Hi-C technique for three different human leukemia cells or cell lines. We developed and applied a set of bioinformatics methods to reliably generate spatial chromosomal contacts from high-throughput sequencing data and to use them effectively to study the properties of genome structure in one dimension (1D) and two dimensions (2D). Our analysis demonstrates that Hi-C data can be effectively applied to study tissue-specific genome conformation, chromosome-chromosome interaction, chromosomal translocations...
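Before 1D or 2D properties can be studied, mapped read pairs are typically binned into a contact matrix. The sketch below assumes read pairs that have already been mapped and filtered to a single chromosome and uses a fixed bin size; it omits the normalization steps that real pipelines apply, and it is not the authors' method.

```python
import numpy as np

def contact_matrix(read_pairs, chrom_length, bin_size):
    """Bin intra-chromosomal Hi-C read pairs into a symmetric contact matrix.

    read_pairs: iterable of (pos1, pos2) mapped positions (0-based) on one chromosome.
    """
    n_bins = -(-chrom_length // bin_size)  # ceiling division
    mat = np.zeros((n_bins, n_bins), dtype=np.int64)
    for pos1, pos2 in read_pairs:
        i, j = pos1 // bin_size, pos2 // bin_size
        mat[i, j] += 1
        if i != j:
            mat[j, i] += 1  # keep the matrix symmetric
    return mat

print(contact_matrix([(10, 150), (120, 30), (50, 60)], chrom_length=250, bin_size=100))
```

The bin size trades resolution against counts per cell: smaller bins resolve finer structure but leave more cells sparse.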
Detecting gene-gene interaction in complex diseases is a major challenge for common disease genetics. Most interaction detection approaches use disease-marker associations, and such methods have low power and unknown reliability in real data. We developed and tested a powerful linkage-analysis-based gene-gene interaction detection strategy based on conditioning the family data on a known disease-causing allele or disease-associated marker allele. We generated simulated multipoint linkage data for a disease caused by two epistatically interacting loci (A and B). We examined several two-locus epistatic inheritance models: dominant-dominant, dominant-recessive, recessive-dominant, and recessive-recessive. At one of the loci (A), there was a known disease-related allele. We stratified the family data on the presence of this allele, eliminating family members who did not carry it. This elimination step has the effect of raising the “penetrance” at the second locus (B). We then calculated the lod score at the second locus (B) and compared the pre- and post-stratification lod scores at B; a positive difference indicated interaction. We also examined whether it was possible to detect interaction with locus B based on a disease-marker association (instead of an identified disease allele) at locus A. Finally, we tested whether the presence of genetic heterogeneity would generate false-positive evidence of interaction. The power to detect interaction for a known disease allele was 60–90%. The probability of false positives...
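The comparison of pre- and post-stratification lod scores can be illustrated with a two-point calculation. This sketch assumes phase-known, fully informative meioses summarized as recombinant (R) and nonrecombinant (NR) counts, whereas the study used multipoint linkage on simulated pedigrees; `max_lod` and `interaction_signal` are hypothetical helpers that maximize LOD(θ) = log10[θ^R (1 − θ)^NR / 0.5^(R+NR)] over a grid of θ values.

```python
import math

def max_lod(recombinants, nonrecombinants, step=0.001):
    """Maximum two-point LOD over theta in [0, 0.5] for phase-known meioses.

    LOD(theta) = log10(theta^R * (1 - theta)^NR / 0.5^(R + NR)).
    """
    n = recombinants + nonrecombinants
    best = 0.0  # LOD is 0 at theta = 0.5
    for i in range(int(round(0.5 / step))):
        theta = i * step
        if theta == 0.0 and recombinants > 0:
            continue  # likelihood is zero when recombinants exist
        log_like = nonrecombinants * math.log10(1.0 - theta)
        if recombinants:
            log_like += recombinants * math.log10(theta)
        best = max(best, log_like - n * math.log10(0.5))
    return best

def interaction_signal(before, after):
    """Post- minus pre-stratification max LOD; a positive value suggests interaction.

    before, after: (recombinants, nonrecombinants) at locus B.
    """
    return max_lod(*after) - max_lod(*before)

# Hypothetical counts: stratifying removes phenocopy-like recombinants.
print(interaction_signal(before=(5, 15), after=(1, 11)))
```

With these made-up counts the difference is positive (roughly 0.98) even though fewer meioses remain, because the apparent recombination fraction drops after stratification.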
The genome project increased appreciation of the genetic complexity underlying disease phenotypes: many genes contribute to each phenotype, and each gene contributes to multiple phenotypes. The aspiration of predicting common disease in individuals has evolved from seeking primary loci to assigning marginal risks based on many genes. Genetic interaction, defined as contributions to a phenotype that depend on particular digenic allele combinations, could improve prediction of phenotype from complex genotype, but it is difficult to study in human populations. High-throughput, systematic analysis of S. cerevisiae gene knockouts or knockdowns in the context of disease-relevant phenotypic perturbations provides a tractable experimental approach to deriving gene interaction networks, from which cross-species gene homology can be used to deduce how phenotype is buffered against disease-risk genotypes. Yeast gene interaction network analysis to date has revealed biology more complex than previously imagined. This has motivated the development of more powerful yeast cell array phenotyping methods to globally model the role of gene interaction networks in modulating phenotypes (which we call yeast phenomic analysis). The article illustrates yeast phenomic technology...
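One widely used way to turn double-mutant phenotypes into interaction network edges is to score each pair against a multiplicative expectation, ε = W_ab − W_a·W_b. This definition, the cutoff of 0.08 and the fitness values below are assumptions chosen for illustration; the article's own quantitative definition of genetic interaction may differ.

```python
def interaction_score(w_ab, w_a, w_b):
    """epsilon = W_ab - W_a * W_b, with fitness scaled so wild type = 1.

    Negative: aggravating (e.g. synthetic sick/lethal); positive: alleviating.
    """
    return w_ab - w_a * w_b

def interaction_edges(single, double, cutoff=0.08):
    """Gene pairs whose |epsilon| exceeds a chosen (here arbitrary) cutoff."""
    edges = []
    for (a, b), w_ab in double.items():
        eps = interaction_score(w_ab, single[a], single[b])
        if abs(eps) > cutoff:
            edges.append((a, b, round(eps, 3)))
    return edges

single = {"x": 0.8, "y": 0.9, "z": 1.0}  # hypothetical single-mutant fitness
double = {("x", "y"): 0.1, ("x", "z"): 0.79, ("y", "z"): 0.9}
print(interaction_edges(single, double))  # [('x', 'y', -0.62)]
```

Only the x-y pair deviates strongly from the product of its single-mutant fitnesses, so it is the only edge kept.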
Atrial fibrillation (AF) is the most common cardiac arrhythmia in clinical practice. Recent GWAS identified several variants associated with AF, but they account for <10% of heritability. Gene-gene interaction is thought to account for a significant portion of the missing heritability. Among the GWAS loci for AF, only three were replicated in the Chinese Han population: SNP rs2106261 (G/A substitution) in ZFHX3, rs2200733 (C/T substitution) near PITX2c, and rs3807989 (A/G substitution) in CAV1. We therefore analyzed the interaction among these three AF loci. We demonstrated significant interaction between rs2106261 and rs2200733 in three independent populations and in the combined population of 2,020 cases and 5,315 controls. Compared with the non-risk genotype GGCC, the two-locus risk genotype AATT showed the highest odds ratio in all three independent populations and in the combined population (OR = 5.36, 95% CI 3.87-7.43, P = 8.00 × 10⁻²⁴). The OR of 5.36 for AATT was significantly higher than the combined OR of 3.31 for both GGTT and AACC, suggesting a synergistic interaction between rs2106261 and rs2200733. Relative excess risk due to interaction (RERI) analysis also revealed significant interaction between rs2106261 and rs2200733 when two copies of the risk alleles were present (RERI=2.87...
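The additive-scale interaction measures can be computed directly from odds ratios used as approximations of relative risks: RERI = OR11 − OR10 − OR01 + 1, and the synergy index S = (OR11 − 1) / [(OR10 − 1) + (OR01 − 1)]. The odds ratios below are hypothetical; the abstract does not give the single-risk-genotype odds ratios behind its reported RERI, so these numbers are not the study's.

```python
def reri(or11, or10, or01):
    """Relative excess risk due to interaction, using ORs to approximate RRs."""
    return or11 - or10 - or01 + 1.0

def synergy_index(or11, or10, or01):
    """Rothman's synergy index S; S > 1 indicates a more-than-additive joint effect."""
    return (or11 - 1.0) / ((or10 - 1.0) + (or01 - 1.0))

# Hypothetical odds ratios relative to the double non-risk genotype.
print(reri(4.0, 2.0, 1.5), synergy_index(4.0, 2.0, 1.5))  # 1.5 2.0
```

RERI > 0 (equivalently S > 1) means the joint genotype carries more excess risk than the sum of the two single-genotype excesses.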
Patterns of interaction may influence biological systems at various levels and potentially affect evolutionary history. Gene interactions may affect the relationship between genotypes and phenotypes. Polymorphisms in genes can alter interactions among genes and hence affect the fitness of individuals. Certain combinations of polymorphisms among genes can be maintained by selection. The main question of this thesis concerns the effects of interactions in biological systems.
Reproductive isolation arises as a by-product of different combinations of substitutions between divergent populations. The Bateson-Dobzhansky-Muller (BDM) model states that hybrid fitness is reduced by incompatible combinations of alleles at different loci. Nonlinear rates of accumulation of incompatibilities have been proposed when interactions among multiple loci are considered. However, how the topology of gene interaction networks (GINs) alters the rate at which incompatibilities accumulate has not been investigated.
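One simple way to see how network topology could shape the accumulation of incompatibilities is to count candidate incompatible pairs. The sketch assumes that an incompatibility requires one derived allele from each diverging lineage and, when a GIN is supplied, that only pairs joined by an edge can be incompatible; both assumptions and all names are illustrative rather than the model used in this work.

```python
def potential_incompatibilities(lineage1, lineage2, edges=None):
    """Count cross-lineage pairs of substituted genes that could be BDM-incompatible.

    Assumes an incompatibility needs one derived allele from each lineage
    (pairs within a lineage have already coexisted in viable genotypes).
    If `edges` (a gene interaction network) is given, only interacting pairs count.
    """
    pairs = {frozenset((a, b)) for a in lineage1 for b in lineage2 if a != b}
    if edges is not None:
        pairs &= {frozenset(e) for e in edges}
    return len(pairs)

# Without a network, K/2 substitutions per lineage give K^2/4 candidate pairs.
for k in (2, 4, 8):
    half = k // 2
    print(k, potential_incompatibilities(range(half), range(half, k)))
```

Without a network the count grows quadratically with the number of substitutions (1, 4 and 16 here); restricting it to network edges makes the growth depend on which genes are connected.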
The third topic concerns the effects of gene interactions in hybridizing species. Gene flow homogenizes the gene pools of incipient species and impedes divergence. This can occur because incipient species either remain in spatial contact or come into secondary contact through range shifts. Intrinsic reproductive barriers between species are porous, and loci possess various properties that contribute to their success in moving between species.
We used human GINs combined with single nucleotide polymorphisms (SNPs) from the human HapMap to investigate the correlations between interactions and interlocus nonrandom associations of polymorphisms. To investigate the effects of gene interactions between species...
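Interlocus association between polymorphisms at interacting genes can be measured with an LD statistic. The sketch below uses the squared correlation of unphased allele counts as an approximation to r², assumes one representative SNP per gene, and uses hypothetical names; it is not necessarily the statistic used in the thesis.

```python
import numpy as np

def genotype_r2(g1, g2):
    """Squared Pearson correlation of allele-count genotypes (0/1/2).

    Approximates the r^2 LD measure when haplotype phase is unknown.
    """
    r = np.corrcoef(np.asarray(g1, float), np.asarray(g2, float))[0, 1]
    return float(r * r)

def interacting_pair_ld(genotypes, gin_edges):
    """r^2 for each GIN edge; genotypes maps gene -> genotypes of one SNP."""
    return {
        (a, b): genotype_r2(genotypes[a], genotypes[b])
        for a, b in gin_edges
        if a in genotypes and b in genotypes
    }
```

Comparing the distribution of these values with that for randomly chosen, non-interacting gene pairs would indicate whether interacting genes show excess interlocus association.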
Background: Although genome-wide association studies have successfully identified thousands of variants associated with complex traits, these variants explain only a small fraction of the heritability of the trait. Gene-gene interactions have been proposed as a source that could explain a significant percentage of the missing heritability. However, detecting gene-gene interactions has proven very difficult because of computational and statistical challenges. The vast number of possible interactions that can be tested requires very stringent multiple-testing corrections that limit the power of detection. These issues have mostly been highlighted for the identification of pairwise effects and are even more challenging when addressing higher-order interaction effects. In this work we explore the use of local ancestry in recently admixed individuals to find signals of gene-gene interaction in human traits and diseases. Results: We introduce statistical methods that leverage the correlation between local ancestry and the unobserved causal variants to find distant gene-gene interactions. We show that the power of this test increases with the number of causal variants per locus and the degree of differentiation of these variants between the ancestral populations. Overall...
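The scale of the multiple-testing problem follows from simple counting: m markers give m(m − 1)/2 pairwise tests, and a Bonferroni correction divides the significance level by that number. The marker count below is hypothetical.

```python
def n_pairs(m):
    """Number of distinct unordered pairs among m tested units."""
    return m * (m - 1) // 2

def bonferroni_threshold(alpha, n_tests):
    """Per-test significance level that controls the family-wise error rate at alpha."""
    return alpha / n_tests

m = 500_000  # hypothetical number of SNPs
print(n_pairs(m), bonferroni_threshold(0.05, n_pairs(m)))
```

For 500,000 markers this gives 124,999,750,000 pairs and a per-test threshold of about 4 × 10⁻¹³, which illustrates why pairwise scans have so little power.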
When two genes interact to cause a clinically important phenotype, it would seem reasonable to expect that we could leverage genotypic information at one of the loci in order to improve our ability to detect the other. We were therefore interested in extending the posterior probability of linkage (PPL), a class of linkage statistics we have been developing over the past decade, in order to explicitly allow for gene × gene interaction. In this report we use a new implementation of the PPL incorporating liability classes (LCs), which provide a direct parameterization of gene × gene interaction by allowing the penetrances at the locus being evaluated to depend upon measured genotypes at a known locus. With knowledge of the generating model for the simulated rheumatoid arthritis (RA) data, we selected two loci for examination: Locus A, which in interaction with the HLA-DR antigen locus affects risk of the dichotomous RA phenotype; and Locus E, which in interaction with DR affects quantitative levels of the anti-CCP phenotype. The data comprised nuclear families of two parents and an affected sib pair (ASP). Our results confirm theoretical work suggesting that gene × gene interactions CANNOT be leveraged to improve linkage detection for dichotomous traits based on affecteds-only data structures. However...
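A liability-class parameterization can be pictured as a penetrance table indexed by the measured genotype class at the known locus. The values, class labels and two-class simplification below are hypothetical (DR is tri-allelic, and the simulation's generating model is not reproduced); the sketch only contrasts the class-specific penetrance an LC-based analysis would assign with the marginal penetrance averaged over classes.

```python
# Hypothetical penetrances P(affected | DR liability class, genotype at locus A).
PENETRANCE = {
    "DR_low": {"aa": 0.01, "Aa": 0.01, "AA": 0.02},
    "DR_high": {"aa": 0.05, "Aa": 0.20, "AA": 0.40},
}

def marginal_penetrance(penetrance, class_freqs, genotype):
    """Penetrance of an A-locus genotype averaged over DR liability classes."""
    return sum(freq * penetrance[lc][genotype] for lc, freq in class_freqs.items())

def lc_penetrance(penetrance, liability_class, genotype):
    """Penetrance used for an individual whose DR class has been measured."""
    return penetrance[liability_class][genotype]

freqs = {"DR_low": 0.8, "DR_high": 0.2}  # hypothetical class frequencies
print(marginal_penetrance(PENETRANCE, freqs, "AA"), lc_penetrance(PENETRANCE, "DR_high", "AA"))
```

In this made-up table the marginal AA penetrance is about 0.096, while an individual measured in the high-risk DR class is assigned 0.40.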
Epistasis has been suggested to underlie part of the missing heritability in genome-wide association studies. In this study, we first report an analysis of gene-gene interactions affecting HDL cholesterol (HDL-C) levels in a candidate gene study of 2,091 individuals with mixed dyslipidemia from a clinical trial. Two additional studies, the Atherosclerosis Risk in Communities study (ARIC; n = 9,713) and the Multi-Ethnic Study of Atherosclerosis (MESA; n = 2,685), were considered for replication. We identified a gene-gene interaction between rs1532085 and rs12980554 (P = 7.1×10−7) in their effect on HDL-C levels, which is significant after Bonferroni correction (Pc = 0.017) for the number of SNP pairs tested. The interaction successfully replicated in the ARIC study (P = 7.0×10−4; Pc = 0.02). Rs1532085, an expression QTL (eQTL) of LIPC, is one of the two SNPs involved in another, well-replicated gene-gene interaction underlying HDL-C levels. To further investigate the role of this eQTL SNP in gene-gene interactions affecting HDL-C, we tested in the ARIC study for interaction between this SNP and any other SNP genome-wide. We found the eQTL to be involved in a few suggestive interactions, one of which significantly replicated in MESA. Importantly...
Background: While the importance of gene-gene interactions in human diseases
has been well recognized, identifying them has been a great challenge,
especially through association studies with millions of genetic markers and
thousands of individuals. Computationally efficient and powerful tools are
greatly needed for the identification of new gene-gene interactions in
high-dimensional association studies. Result: We develop C++ software for
genome-wide gene-gene interaction analyses (GWGGI). GWGGI uses tree-based
algorithms to search a large number of genetic markers for a joint association
with disease, taking high-order interactions into account, and then uses
non-parametric statistics to test the joint association. The package
includes two functions, likelihood ratio Mann-Whitney (LRMW) and Tree
Assembling Mann-Whitney (TAMW). We optimize the data storage and computational
efficiency of the software, making it feasible to run the genome-wide analysis
on a personal computer. The use of GWGGI was demonstrated using two real
datasets with nearly 500k genetic markers. Conclusion: Through the empirical
study, we demonstrated that the genome-wide gene-gene interaction analysis
using GWGGI could be accomplished within a reasonable time on a personal
computer.
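GWGGI's LRMW and TAMW functions are not described in detail here, so the sketch below only shows the Mann-Whitney statistic that both names refer to, applied to a hypothetical joint-risk score per subject (for instance, one produced by a tree-based model). How GWGGI builds its scores and derives p-values is not reproduced.

```python
def mann_whitney(case_scores, control_scores):
    """Mann-Whitney U for cases vs controls, plus U scaled to an AUC in [0, 1].

    U counts case-control pairs in which the case has the higher score;
    tied pairs count 1/2. O(n_case * n_control), fine for a sketch.
    """
    u = 0.0
    for c in case_scores:
        for d in control_scores:
            if c > d:
                u += 1.0
            elif c == d:
                u += 0.5
    return u, u / (len(case_scores) * len(control_scores))

# Hypothetical joint-risk scores, e.g. from a tree model over several markers.
print(mann_whitney([3, 4, 5], [1, 2, 3]))
```

At genome-wide scale a rank-based computation (sorting the pooled scores once) would replace the double loop, which is one reason efficient storage and computation matter for the package.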
This paper presents a non-parametric classification technique for identifying
a candidate bi-allelic genetic marker set that best describes disease
susceptibility in gene-gene interaction studies. The developed technique
functions by creating a mapping between inferred haplotypes and case/control
status. The technique cycles through all possible marker-combination models
generated from the available marker set, and the best interaction model is
determined from prediction accuracy and two auxiliary criteria: low-to-high
order haplotype propagation capability and model parsimony. Because
variable-length haplotypes are created during best-model identification,
the developed technique is referred to as variable-length haplotype
construction for gene-gene interaction (VarHAP). VarHAP has been
benchmarked against a multifactor dimensionality reduction (MDR) program and a
haplotype interaction technique embedded in the FAMHAP program on various
two-locus interaction problems. The results reveal that VarHAP is suitable for
all interaction situations in the presence of both weak and strong linkage
disequilibrium among genetic markers. Comment: 7 pages, 2 figures
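A simplified sketch of an exhaustive model search is shown below. It is not VarHAP: it labels each observed multilocus genotype cell (rather than inferred haplotypes) with its majority status, ranks models by training accuracy, and breaks ties in favour of fewer markers as a stand-in for parsimony. The haplotype-propagation criterion and any cross-validation are omitted, and all names are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations

def best_marker_set(genotypes, status, max_order=2):
    """Exhaustively score marker combinations by training accuracy.

    genotypes: list of per-subject tuples (one value per marker).
    status: list of 0/1 (control/case).
    Each multilocus genotype cell predicts its majority status.
    """
    n_markers = len(genotypes[0])
    best = None
    for order in range(1, max_order + 1):
        for combo in combinations(range(n_markers), order):
            cells = defaultdict(Counter)
            for g, s in zip(genotypes, status):
                cells[tuple(g[i] for i in combo)][s] += 1
            correct = sum(max(c.values()) for c in cells.values())
            key = (correct / len(status), -order)  # accuracy first, then fewer markers
            if best is None or key > best[0]:
                best = (key, combo)
    return best[1], best[0][0]

# Purely epistatic toy data: status = marker0 XOR marker1, marker2 is noise.
genotypes = [(0, 0, 0), (0, 1, 1), (1, 0, 0), (1, 1, 1),
             (0, 0, 1), (0, 1, 0), (1, 0, 1), (1, 1, 0)]
status = [0, 1, 1, 0, 0, 1, 1, 0]
print(best_marker_set(genotypes, status))  # ((0, 1), 1.0)
```

In the toy data every single marker predicts status at chance level (accuracy 0.5), yet the pair (0, 1) predicts it perfectly, which is the situation exhaustive combination searches are meant to catch.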
Much of the natural variation in a complex trait can be explained by
variation at the DNA sequence level. As part of sequence variation, gene-gene
interaction has been ubiquitously observed in nature, where its role in shaping
the development of an organism has been broadly recognized. The identification
of interactions between genetic factors has been progressively pursued via
statistical or machine learning approaches. Most currently adopted
methods, whether parametric or nonparametric, focus predominantly on
pairwise single-marker interaction analysis. As genes are the functional units
in living organisms, analysis by focusing on a gene as a system could
potentially yield more biologically meaningful results. In this work, we
conceptually propose a gene-centric framework for genome-wide gene-gene
interaction detection. We treat each gene as a testing unit and derive a
model-based kernel machine method for two-dimensional genome-wide scanning of
gene-gene interactions. In addition to the biological advantage, our method is
statistically appealing because it reduces the number of hypotheses tested in a
genome-wide scan. Extensive simulation studies are conducted to evaluate the
performance of the method. The utility of the method is further demonstrated
with applications to two real data sets. Our method provides a conceptual
framework for the identification of gene-gene interactions which could shed
novel light on the etiology of complex diseases. Comment: Published at
http://dx.doi.org/10.1214/12-AOAS545 in the Annals of Applied Statistics
(http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics
(http://www.imstat.org)
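The gene-level idea can be sketched with linear kernels: each gene's SNPs give a similarity matrix K = GGᵀ, and their elementwise (Hadamard) product serves as an interaction kernel. With linear kernels the quadratic form Q = rᵀ(K1 ∘ K2)r equals the sum, over all cross-gene SNP pairs, of the squared score (Σᵢ rᵢ g1ᵢₐ g2ᵢᵦ)², so it aggregates pairwise product-term evidence into one test per gene pair. The kernel choice, the fixed-effect adjustment for main effects and the names are assumptions; the published method may differ, and deriving a p-value (for example by permutation) is left out.

```python
import numpy as np

def kernel_interaction_stat(G1, G2, y):
    """Score-type statistic Q = r^T K12 r for a gene x gene interaction kernel.

    G1, G2: n x p1 and n x p2 allele-count matrices for the SNPs in two genes.
    K12 is the Hadamard product of the two genes' linear kernels. Residuals r
    come from regressing y on an intercept plus both genes' SNPs (main effects
    treated as fixed linear effects, a simplification).
    """
    G1 = np.asarray(G1, float)
    G2 = np.asarray(G2, float)
    y = np.asarray(y, float)
    K12 = (G1 @ G1.T) * (G2 @ G2.T)  # elementwise product = interaction kernel
    X = np.column_stack([np.ones(len(y)), G1, G2])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    r = y - X @ beta
    return float(r @ K12 @ r)
```

Testing one statistic per gene pair, instead of one per SNP pair, is what reduces the number of hypotheses in a genome-wide scan.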
Leukotrienes are arachidonic acid derivatives long known for their inflammatory properties and their involvement in a number of human diseases, most notably asthma. Recently, leukotriene-based inflammation has also been implicated in atherosclerosis: ALOX5AP and LTA4H, two genes in the leukotriene biosynthesis pathway, have been associated with various cardiovascular disease (CVD) phenotypes. To assess the role of the leukotriene pathway in CVD pathogenesis, we performed genetic association studies of ALOX5AP and LTA4H in a non-familial data set of early-onset coronary artery disease. Our results support a modest role for the leukotriene pathway in atherosclerosis pathogenesis, reveal important genomic interactions within the pathway, and suggest the importance of using pathway-based modeling for evaluating the genomics of atherosclerosis susceptibility. Motivated by this need, we investigated the statistical properties of a class of matrix-based statistics to assess epistasis. We simulated multiple two-variant disease models with haplotypes to gain an understanding of pathway interactions in terms of correlation patterns. Our goal was to detect an interaction between multiple disease-causing variants by means of their linkage disequilibrium (LD) patterns with other haplotype markers. The simulated models fall into three categories: 1. No epistasis in the presence of marginal effects and LD; 2. Epistasis in the presence of LD and no marginal effects; and 3. Epistasis in the presence of marginal effects and LD. We then assessed previously introduced single-gene methods that compare whole matrices of Single Nucleotide Polymorphism (SNP) LD between two samples. These methods include comparing two sets of principal components...
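A whole-matrix comparison of LD between two samples can be sketched as the Frobenius norm of the difference between the case and control SNP correlation matrices, with case/control labels permuted for significance. This particular summary is a swapped-in illustration, not the principal-component comparison mentioned in the text; genotype correlation stands in for haplotype-based LD, and the names are hypothetical.

```python
import numpy as np

def ld_matrix(genotypes):
    """SNP-by-SNP correlation (r) matrix from an n x p allele-count matrix."""
    return np.corrcoef(np.asarray(genotypes, float), rowvar=False)

def ld_difference(cases, controls):
    """Frobenius norm of the difference between case and control LD matrices."""
    return float(np.linalg.norm(ld_matrix(cases) - ld_matrix(controls), "fro"))

def ld_permutation_test(cases, controls, n_perm=200, seed=0):
    """Permutation p-value obtained by shuffling case/control labels."""
    rng = np.random.default_rng(seed)
    cases = np.asarray(cases, float)
    controls = np.asarray(controls, float)
    pooled = np.vstack([cases, controls])
    n_case = len(cases)
    observed = ld_difference(cases, controls)
    exceed = 0
    for _ in range(n_perm):
        idx = rng.permutation(len(pooled))
        if ld_difference(pooled[idx[:n_case]], pooled[idx[n_case:]]) >= observed:
            exceed += 1
    return observed, (exceed + 1) / (n_perm + 1)
```

Because interacting disease variants can induce case-specific correlation between markers even without marginal effects, a matrix-level difference can carry signal that single-SNP tests miss.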
The success of genome-wide association studies (GWAS) has been limited by missing heritability and the lack of biological relevance of identified variants. We sought to address these issues by characterizing interactions among genotypes and environment using case-control samples enrolled at Duke University Medical Center. First, we studied the impact of age on coronary artery disease (CAD). Gene-by-age (GxAGE) interactions were tested at genome-wide scale, along with genes' marginal effects in age-stratified groups. Based on the interaction model, age acts as a modifier of the gene-CAD relationship. SNPs associated with CAD in both young and old groups showed consistent effect sizes and directions. Apart from these SNPs, vastly different CAD-associated genes were discovered across age and race groups, suggesting age-dependent mechanisms of CAD onset. Second, we explored gene-by-gene (GxG) interaction using a statistical model and compared the results with biological evidence. Specifically, we investigated GATA2 as a candidate gene transcription factor and modeled its interaction with genome-wide SNPs. The genetic effects at interacting loci were modified by GATA2 genotype. Without taking GATA2 variants into account, no marginal main effects were detected. Open-access ChIP-seq data were available for comparison with the statistical model...
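Comparing a SNP's marginal effect between age strata can be sketched with stratum-specific allelic odds ratios and Woolf's test of homogeneity. This is a simple stand-in for the genome-wide GxAGE interaction model, whose exact form the abstract does not give; the 2 × 2 allele counts are hypothetical, and tables with zero cells would need a continuity correction.

```python
import math

def log_or_and_var(a, b, c, d):
    """Log odds ratio and its Woolf variance from a 2x2 table.

    a, b: risk / other allele counts in cases; c, d: the same in controls.
    """
    return math.log((a * d) / (b * c)), 1 / a + 1 / b + 1 / c + 1 / d

def woolf_heterogeneity(tables):
    """Chi-square statistic (df = k - 1) for equal odds ratios across k strata."""
    stats = [log_or_and_var(*t) for t in tables]
    weights = [1 / v for _, v in stats]
    pooled = sum(w * lor for w, (lor, _) in zip(weights, stats)) / sum(weights)
    return sum(w * (lor - pooled) ** 2 for w, (lor, _) in zip(weights, stats))

# Hypothetical allele counts in young and old strata.
print(woolf_heterogeneity([(60, 40, 45, 55), (50, 50, 48, 52)]))
```

A large statistic relative to a chi-square distribution with k − 1 degrees of freedom suggests that the SNP's effect differs by age group, which is the pattern an age-modified gene-CAD relationship would produce.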