Population-scale analysis of human microsatellites reveals novel sources of exonic variation
Using our microsatellite specific genotyping method, we analyzed tandem repeats, which are known to be highly variable with some recognized as biomarkers causative of disease, in over 500 individuals who were exon sequenced in a 1000 Genomes Project pilot study. We were able to genotype over 97% of the microsatellite loci in the targeted regions. A total of 25,115 variations were observed, including repeat length and single nucleotide polymorphisms, corresponding to an average of 45.6 variations per individual and a density of 1.1 variations per kilobase. Standard variant detection did not report 94.2% of the exonic repeat length variations in part because the alignment techniques are not ideal for repetitive regions. Additionally some standard variation detection tools rely on a database of known variations, making them less likely to call repeat length variations as only a small percent of these loci (~ 6000) have been accurately characterized. A subset of the hundreds of non-synonymous variations we identified was experimentally validated, indicating an accuracy of 96.5% for our microsatellite-based genotyping method, with some novel variants identified in genes associated with cancer. We propose that microsatellite-based genotyping be used as a part of large scale sequencing studies to identify novel variants.
► We use microsatellite-based genotyping to identify variations in 551 individuals. ► Over 68% of the exonic repeat length variations we identify are novel. ► Indel-based genotyping only reports 5.8% of the exonic repeat length variations. ► Microsatellite-based genotyping accuracy, from experimental validation, is 96.5%. ► Novel non-synonymous variations we identify are in cancer genes