Reference panel for imputation of Indian SNP data
India has one of the most diverse human gene pools in the world, with 4,635 anthropologically well-defined populations that speak languages from four major families, Indo-European (IE), Dravidian (DR), Austro-Asiatic (AA) and Tibeto-Burman (TB), and that have maintained endogamy for thousands of years. We have earlier demonstrated that contemporary Indian populations are an admixture of two ancestral populations: Ancestral North Indians (ANI) and Ancestral South Indians (ASI) [1]. While ANI has genetic affinities with Middle Eastern, West Eurasian and European populations, ASI is not related to any group outside the Indian subcontinent. Hence, genetic data on Indian populations should be analyzed with caution.
To increase the statistical power of genome-wide association studies (GWAS), it is common practice to impute missing genotypes with tools such as Beagle [2]. Since this tool needs a reference panel of haplotypes for imputation, we used existing reference panels: African, European and East Asian HapMap population samples; South Asians of the 1000 Genomes Project [3]; Indian population samples (Indo-Europeans and Dravidians); and combined HapMap and Indian population samples. We found that the Indian reference samples performed better than the other reference samples. Based on this, we generated our own reference panel (8,717,71 SNPs) for imputation, which includes the haplotypes of the founders in 15 Dravidian trios and 13 Indo-European trios [4,5].
We believe that this Indian reference panel will be highly useful for those working on Indian population genetics. Hence, the data and the script (for examining imputation accuracy) are made freely available for research purposes.
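One common way to compare reference panels is to mask genotypes whose true values are known, impute them with each panel, and count how often the imputed value disagrees with the truth. The sketch below illustrates that idea only; it is not the script distributed here, and the function name, the panel names and the toy genotypes are all hypothetical. Genotypes are assumed to be coded as alternate-allele counts (0, 1 or 2).

```python
from typing import Optional, Sequence


def discordance_rate(true_genotypes: Sequence[Optional[int]],
                     imputed_genotypes: Sequence[Optional[int]]) -> float:
    """Return the fraction of compared genotypes that were imputed incorrectly.

    Genotypes are alternate-allele counts (0, 1 or 2). A value of None marks
    a genotype that is unavailable; such positions are skipped.
    """
    if len(true_genotypes) != len(imputed_genotypes):
        raise ValueError("genotype vectors must have the same length")
    compared = 0
    mismatched = 0
    for truth, imputed in zip(true_genotypes, imputed_genotypes):
        if truth is None or imputed is None:
            continue
        compared += 1
        if truth != imputed:
            mismatched += 1
    if compared == 0:
        raise ValueError("no genotypes available for comparison")
    return mismatched / compared


# Toy data (hypothetical): genotypes masked before imputation, and the
# values that two made-up reference panels produced for them.
truth = [0, 1, 2, 1, 0, 2, 1, 0]
imputed_by_panel = {
    "panel_A": [0, 1, 2, 1, 0, 2, 1, 1],
    "panel_B": [0, 2, 2, 0, 0, 2, None, 1],
}

error_rates = {
    name: discordance_rate(truth, calls)
    for name, calls in imputed_by_panel.items()
}
# The panel with the lowest discordance rate performs best on this toy data.
best_panel = min(error_rates, key=error_rates.get)
```

A simple discordance rate treats all errors alike; depending on the study, other measures (for example, accuracy stratified by allele frequency) may be more informative.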
Downloads
CCMB_refrence_panel.zip
Imputation_errors_tools.zip
Please feel free to contact us in case of any difficulty:
thangs@ccmb.res.in
snizam@ccmb.res.in / snizam001@gmail.com
References
- Reich D, Thangaraj K, Patterson N, Price AL, Singh L: Reconstructing Indian population history. Nature 2009; 461: 489-494.
- Browning SR, Browning BL: Rapid and accurate haplotype phasing and missing-data inference for whole-genome association studies by use of localized haplotype clustering. Am J Hum Genet 2007; 81: 1084-1097.
- ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/
- Periyasamy Govindaraj#, Sheikh Nizamuddin#, Anugula Sharath, Vuskamalla Jyothi, Harish Rotti, Ritu Raval, Jayakrishna Nayak, Balakrishna K Bhat, B.V. Prasanna, Pooja Shintre, Mayura Sule, Kalpana S. Joshi, Amrish P. Dedge, Ramachandra Bharadwaj, G.G. Gangadharan, Sreekumaran Nair, Puthiya M Gopinath, Bhushan Patwardhan, Paturu Kondaiah, Kapaettu Satyamoorthy, Marthanda Varma Sankaran Valiathan, Kumarasamy Thangaraj. Genome wide analysis correlates Ayurveda Prakriti. Sci. Rep. 5, 15786. (# equal contribution)
- Nizamuddin S, Thangaraj K: Signatures of natural selection in the drug metabolizing enzyme genes: Opportunity for developing personalized and precision medicine. bioRxiv, doi: 10.1101/113514