University of Southern California


Description of our approach

In this collaboration, we have initially adopted a candidate gene approach to understanding the genetic causes of breast, prostate, and colorectal cancers in the Multiethnic Cohort (MEC). The genes under study were selected on the basis of a priori biological hypotheses of their involvement in a particular disease pathway. Our focus is on genes involved in steroid hormone metabolism, receptor proteins, and the IGF pathways. Our initial methodology uses a two-pronged systematic approach to comprehensively survey genetic variation in these genes:

  1. To perform deep resequencing of putative functional regions (exons, 5’ upstream regions, and conserved non-coding regions) in DNA from 19 individuals from each of the five major ethnic groups (African-Americans, Hawaiians, Japanese, Latinos, and Whites) with advanced breast and prostate cancers.  This work is currently being conducted at the Broad Institute.  
  2. To perform haplotype analysis across the entire genomic locus of each gene to search for disease-associated variants in coding and non-coding regions.

Haplotype-based studies

Haplotype-based association studies have been proposed as a powerful, comprehensive approach to identifying causal genetic variation underlying complex diseases [Daly et al., 2001; Gabriel et al., 2002]. Studies have shown that the human genome is composed of genomic segments (blocks) that display little evidence of historical recombination and low haplotype diversity [Patil et al., 2001; Gabriel et al., 2002]. Because of the high degree of linkage disequilibrium (LD) observed between SNPs within these blocks, disease variants may be uncovered through evaluation of the underlying haplotypes. This methodology does not require the causal variant to be identified and tested directly; rather, it has the potential to highlight physical regions that harbor putative disease-associated variants.

SNP Selection and Genotyping in a Multiethnic Panel

We survey genetic variation across each gene from 20 kb upstream of the transcription start site to 10 kb downstream of the gene. We attempt to select SNPs every 3-5 kb across the locus to ensure a high density of markers of moderate allele frequency and to provide adequate characterization of haplotype diversity within defined LD blocks. We include all known SNPs in the coding region. SNPs are selected from the National Center for Biotechnology Information SNP database (dbSNP: http://www.ncbi.nlm.nih.gov/SNP), the Celera database (http://www.celera.com), the literature, and our own sequencing effort. SNPs are genotyped in a sample of 349 women in the MEC without a history of cancer: African-American (n=70), Hawaiian (n=69), Japanese (n=70), Latina (n=70) and White (n=70). With this sample size, any haplotype with a frequency of ≥ 5% in a given ethnic group will be represented at least once among that group's chromosomes (140, or 138 for the Hawaiian sample) with probability > 99%. Genotyping is performed by time-of-flight mass spectrometry (MALDI-TOF) using the Sequenom platform at the Broad Institute.
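The coverage claim above can be checked with a short calculation. The sketch below assumes that the chromosomes in each panel are independent draws from the population, so that the chance of missing a haplotype of frequency f among n chromosomes is (1 - f)^n. The function name is illustrative and is not part of any published software.

```python
def prob_haplotype_observed(freq: float, n_chromosomes: int) -> float:
    """Probability that a haplotype with population frequency `freq`
    appears at least once among `n_chromosomes` independent draws."""
    return 1.0 - (1.0 - freq) ** n_chromosomes


# Panel sizes per ethnic group, as given in the text (two chromosomes per woman).
panel = {
    "African-American": 70,
    "Hawaiian": 69,
    "Japanese": 70,
    "Latina": 70,
    "White": 70,
}

for group, n_people in panel.items():
    n_chrom = 2 * n_people
    p = prob_haplotype_observed(0.05, n_chrom)
    print(f"{group}: {n_chrom} chromosomes, P(observed at least once) = {p:.4f}")
```

Under the independence assumption, every group, including the slightly smaller Hawaiian sample, clears the 99% threshold for a haplotype at 5% frequency.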

Haplotype Block Determination

LD block structure is examined in each ethnic group using the criteria of Gabriel et al. [2002], which use the 90% confidence bounds of D' to define sites of historical recombination between SNPs. SNPs are selected iteratively and added until there are 6-8 common SNPs (minor allele frequency ≥ 10%) per LD block and the distance between adjacent blocks is < 10 kb. Block structure is assessed using SNPs with minor allele frequencies ≥ 10%. On occasion, SNPs with minor allele frequencies as low as 5% are needed both to extend block boundaries and to fully describe the diversity of the underlying common haplotypes in each ethnic group.
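For readers unfamiliar with D', the following minimal sketch computes the point estimate of |D'| for a pair of biallelic SNPs from haplotype and allele frequencies. It is an illustration only: the criteria of Gabriel et al. [2002] rest on confidence bounds of D', which this sketch does not compute, and the function name and argument layout are assumptions made for the example.

```python
def d_prime(p_ab: float, p_a: float, p_b: float) -> float:
    """Point estimate of Lewontin's |D'| for two biallelic SNPs.

    p_ab: frequency of the haplotype carrying allele A at SNP 1 and allele B at SNP 2
    p_a:  frequency of allele A at SNP 1
    p_b:  frequency of allele B at SNP 2
    """
    d = p_ab - p_a * p_b  # raw disequilibrium coefficient D
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    if d_max == 0:
        return 0.0  # a monomorphic SNP carries no LD information
    return abs(d) / d_max


# Allele A (30%) is always found together with allele B (50%): complete LD.
print(d_prime(p_ab=0.30, p_a=0.30, p_b=0.50))

# Haplotype frequency equals the product of the allele frequencies: no LD.
print(d_prime(p_ab=0.15, p_a=0.30, p_b=0.50))
```

A block-finding procedure would evaluate such pairwise statistics, together with their confidence bounds, across all SNP pairs in a region.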

Haplotype Reconstruction and htSNP Selection

Haplotype frequency estimates are constructed from genotype data in the multiethnic panel (one ethnicity at a time) within blocks using the expectation-maximization (EM) algorithm of Excoffier and Slatkin [1995]. The squared correlations (R_h^2) between the true haplotypes (h) and their estimates from this calculation are then estimated as described by Stram et al. [2003]. Haplotype-tagging SNPs (htSNPs) for the case-control study are then chosen by finding the minimum set of SNPs within a block that yields R_h^2 ≥ 0.7 for all haplotypes with an estimated frequency of ≥ 5%. R_h^2 determines the sample size inflation: achieving the power of N samples with perfectly tagged haplotypes requires N/R_h^2 samples. A computer program for the calculation of R_h^2 is available at http://www-rcf.usc.edu/~stram . We include all coding-region SNPs as htSNPs before minimizing the number of htSNPs required to predict the common haplotypes. Genotyping of htSNPs in the case-control studies is performed by the 5' nuclease TaqMan allelic discrimination assay using the ABI7900 (Applied Biosystems, Foster City, CA) in the MEC Genotyping Laboratory at USC and the Genomics Core Laboratory at the University of Hawaii, and by MALDI-TOF using the Sequenom platform at the Broad Institute.
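The idea behind EM-based haplotype frequency estimation can be illustrated with a small sketch. This is a simplified illustration written for this description, not the software used in the study: it assumes biallelic SNPs, genotypes coded as per-SNP allele counts (0, 1, or 2), no missing data, and Hardy-Weinberg proportions. All function names are hypothetical, and the sketch does not compute R_h^2.

```python
from collections import Counter
from itertools import product


def compatible_pairs(genotype):
    """All unordered haplotype pairs consistent with an unphased genotype.

    genotype: tuple of allele counts (0, 1, or 2) per SNP.
    Haplotypes are tuples of 0/1 alleles.
    """
    het_sites = [i for i, g in enumerate(genotype) if g == 1]
    base = [g // 2 for g in genotype]  # homozygous sites; het sites are filled in below
    pairs = set()
    for phase in product((0, 1), repeat=len(het_sites)):
        h1, h2 = list(base), list(base)
        for site, allele in zip(het_sites, phase):
            h1[site], h2[site] = allele, 1 - allele
        pairs.add(tuple(sorted((tuple(h1), tuple(h2)))))
    return sorted(pairs)


def em_haplotype_frequencies(genotypes, n_iter=200, tol=1e-10):
    """Estimate haplotype frequencies from unphased genotypes by EM."""
    counts = Counter(genotypes)
    pairs_by_genotype = {g: compatible_pairs(g) for g in counts}
    haplotypes = sorted(
        {h for pairs in pairs_by_genotype.values() for pair in pairs for h in pair}
    )
    freqs = {h: 1.0 / len(haplotypes) for h in haplotypes}  # uniform start
    n_chrom = 2 * len(genotypes)

    for _ in range(n_iter):
        expected = dict.fromkeys(haplotypes, 0.0)
        for g, n in counts.items():
            pairs = pairs_by_genotype[g]
            # E-step: probability of each phase resolution given current frequencies
            weights = [
                freqs[h1] * freqs[h2] * (1 if h1 == h2 else 2) for h1, h2 in pairs
            ]
            total = sum(weights)
            for (h1, h2), w in zip(pairs, weights):
                expected[h1] += n * w / total
                expected[h2] += n * w / total
        # M-step: expected haplotype counts become the new frequency estimates
        new_freqs = {h: c / n_chrom for h, c in expected.items()}
        converged = max(abs(new_freqs[h] - freqs[h]) for h in haplotypes) < tol
        freqs = new_freqs
        if converged:
            break
    return freqs


# Two SNPs: eight unambiguous homozygotes plus two double heterozygotes,
# whose phase is resolved by the frequencies learned from the homozygotes.
sample = [(0, 0)] * 4 + [(2, 2)] * 4 + [(1, 1)] * 2
for hap, freq in em_haplotype_frequencies(sample).items():
    print(hap, round(freq, 4))
```

In this toy sample the double heterozygotes are ambiguous between the haplotype pairs 00/11 and 01/10; because 00 and 11 are common among the homozygotes, the estimates assign almost all of the probability to the 00/11 resolution.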

References

Daly MJ, Rioux JD, Schaffner SF, Hudson TJ, Lander ES (2001) High-resolution haplotype structure in the human genome. Nat Genet 29:229-32.

Excoffier L, Slatkin M (1995) Maximum-likelihood estimation of molecular haplotype frequencies in a diploid population. Mol Biol Evol 12:921-7.

Gabriel SB, Schaffner SF, Nguyen H, Moore JM, Roy J, Blumenstiel B, Higgins J, DeFelice M, Lochner A, Faggart M, Liu-Cordero SN, Rotimi C, Adeyemo A, Cooper R, Ward R, Lander ES, Daly MJ, Altshuler D (2002) The structure of haplotype blocks in the human genome. Science 296:2225-9.

Patil N, Berno AJ, Hinds DA, Barrett WA, Doshi JM, Hacker CR, Kautzer CR, Lee DH, Marjoribanks C, McDonough DP, Nguyen BT, Norris MC, Sheehan JB, Shen N, Stern D, Stokowski RP, Thomas DJ, Trulson MO, Vyas KR, Frazer KA, Fodor SP, Cox DR (2001) Blocks of limited haplotype diversity revealed by high-resolution scanning of human chromosome 21. Science 294:1719-23.

Stram DO, Haiman CA, Hirschhorn J, Altshuler D, Kolonel LN, Henderson BE, Pike MC (2003) Choosing haplotype-tagging SNPs based on unphased genotype data from a preliminary sample of unrelated subjects with an example from the Multiethnic Cohort Study. Hum Hered 55:27-36.