Scripts for de novo genomic analyses in studies of non-model organisms.
High-throughput sequencing (HTS) is revolutionizing biological research by enabling scientists to quickly and cheaply query variation at a genomic scale. Despite the increasing ease
of obtaining such data, using these data effectively still poses notable challenges, especially for
those working with organisms without a high-quality reference genome. For every stage of
analysis – from assembly to annotation to variant discovery – researchers have to distinguish
technical artifacts from the biological realities of their data before they can make inferences. In
this work, I explore these challenges by generating a large de novo comparative transcriptomic
dataset for a clade of lizards and constructing a pipeline to analyze these data. Then, using
a combination of novel metrics and an externally validated variant data set, I test the efficacy
of my approach, identify areas for improvement, and propose ways to minimize these errors. I
find that with careful data curation, HTS can be a powerful tool for generating genomic data
for non-model organisms.
- General utility scripts (Singhal): https://github.com/singhal/randomPerlScripts