
BioAwk - FASTA, FASTQ, SAM, BED, GFF-aware awk programming

Bioawk, created by Heng Li, is an extension of Brian Kernighan's awk. It adds support for several common biological data formats, including optionally gzipped BED, GFF, SAM, VCF and FASTA/Q, as well as generic TAB-delimited formats with column names.
The source code is available on the bioawk GitHub page. Download it and run make to compile it. The examples below assume that this version of awk is the one invoked as awk.
There is a short manual page in the main distribution and a longer HTML-formatted help page.
Extract unmapped reads without header:
awk -c sam 'and($flag,4)' aln.sam.gz
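Standard awk has no SAM parser, and POSIX awk has no and() function. The same filter therefore needs the FLAG column number (column 2) and an arithmetic test of bit 0x4 ("segment unmapped"). The sketch below is an illustration, not bioawk's code. The helper name sam_unmapped is hypothetical, and the sketch assumes uncompressed, TAB-delimited SAM; for aln.sam.gz, pipe the input through gzip -dc first.

```shell
# Hypothetical helper: plain-awk approximation of
#   awk -c sam 'and($flag,4)' aln.sam
# Lines starting with "@" are SAM header lines and are skipped.
# int(flag / 4) % 2 extracts bit 0x4 (unmapped) without a bitwise and().
sam_unmapped() {
  awk -F'\t' '!/^@/ && int($2 / 4) % 2 == 1' "$@"
}
```

Usage: gzip -dc aln.sam.gz | sam_unmapped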
Extract mapped reads with header:
awk -c sam -H '!and($flag,4)' aln.sam.gz
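The -H option keeps the header lines. A rough plain-awk counterpart passes lines beginning with @ through unchanged and tests bit 0x4 arithmetically. The helper name sam_mapped_with_header is hypothetical, and the sketch again assumes uncompressed, TAB-delimited SAM.

```shell
# Hypothetical helper: plain-awk approximation of
#   awk -c sam -H '!and($flag,4)' aln.sam
# Header lines ("@...") are printed as-is, mimicking -H; alignment lines
# are printed only when FLAG bit 0x4 (unmapped) is clear.
sam_mapped_with_header() {
  awk -F'\t' '/^@/ || int($2 / 4) % 2 == 0' "$@"
}
```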
Reverse complement FASTA:
awk -c fastx '{ print ">"$name;print revcomp($seq) }' seq.fa.gz
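With -c fastx, $seq holds the whole sequence even when a FASTA record is wrapped over several lines. A plain-awk sketch of the same task has to join those lines itself. It assumes uncompressed input and complements only A, C, G, T and N (in either case), copying any other character unchanged. Its output for IUPAC ambiguity codes may therefore differ from bioawk's revcomp(). The helper name fasta_revcomp is hypothetical.

```shell
# Hypothetical helper: plain-awk approximation of
#   awk -c fastx '{ print ">"$name; print revcomp($seq) }' seq.fa
# Assumption: only A/C/G/T/N (upper or lower case) are complemented.
fasta_revcomp() {
  awk '
    # Reverse the string and complement each known base.
    function revcomp(s,    i, c, out) {
      out = ""
      for (i = length(s); i > 0; i--) {
        c = substr(s, i, 1)
        out = out ((c in comp) ? comp[c] : c)
      }
      return out
    }
    # Emit the previous record, if any.
    function flush() {
      if (have) { print ">" name; print revcomp(seq) }
    }
    BEGIN {
      split("A T C G N a t c g n", from, " ")
      split("T A G C N t a g c n", to, " ")
      for (i = 1; i <= 10; i++) comp[from[i]] = to[i]
    }
    # Header: keep only the first word after ">", like $name.
    /^>/ { flush(); have = 1; name = substr($1, 2); seq = ""; next }
    # Sequence line: append, so wrapped records are joined.
    { seq = seq $0 }
    END { flush() }
  ' "$@"
}
```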
Create FASTA from SAM (uses revcomp if FLAG & 16):
samtools view aln.bam | \
    awk -c sam '{ s = $seq; if (and($flag, 16)) s = revcomp($seq); print ">"$qname"\n"s }'
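FLAG bit 16 (0x10) marks a read aligned to the reverse strand, for which SAM stores the reverse complement of the original read. The one-liner flips such reads back. A plain-awk sketch reads QNAME, FLAG and SEQ from columns 1, 2 and 10 and tests the bit arithmetically. The helper name sam_to_fasta is hypothetical, and only A/C/G/T/N are complemented, which may differ from bioawk's revcomp().

```shell
# Hypothetical helper: plain-awk approximation of the SAM-to-FASTA one-liner,
# intended for: samtools view aln.bam | sam_to_fasta
# QNAME = $1, FLAG = $2, SEQ = $10; int(flag / 16) % 2 tests bit 0x10.
sam_to_fasta() {
  awk -F'\t' '
    BEGIN {
      split("A T C G N a t c g n", from, " ")
      split("T A G C N t a g c n", to, " ")
      for (i = 1; i <= 10; i++) comp[from[i]] = to[i]
    }
    # Skip header lines in case the input was produced with samtools view -h.
    !/^@/ {
      s = $10
      if (int($2 / 16) % 2 == 1) {
        r = ""
        for (i = length(s); i > 0; i--) {
          c = substr(s, i, 1)
          r = r ((c in comp) ? comp[c] : c)
        }
        s = r
      }
      print ">" $1 "\n" s
    }
  ' "$@"
}
```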
Get the %GC from FASTA:
awk -c fastx '{ print ">"$name; print gc($seq) }' seq.fa.gz
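As a rough illustration of the kind of value gc() reports, the sketch below computes, for each record, the number of G and C bases (either case) divided by the sequence length. This definition is an assumption: bioawk's handling of N and other symbols, and its output scale, may differ. The result here is a fraction between 0 and 1 (multiply by 100 for a percentage). The helper name fasta_gc is hypothetical, and the sketch assumes uncompressed FASTA.

```shell
# Hypothetical helper: GC fraction per FASTA record in plain awk.
# Assumption: GC = (count of G/C, any case) / (total sequence length).
fasta_gc() {
  awk '
    function flush(    t, n) {
      if (!have) return
      t = seq
      n = gsub(/[GCgc]/, "", t)   # gsub returns the number of matches
      print ">" name
      print (length(seq) ? n / length(seq) : 0)
    }
    /^>/ { flush(); have = 1; name = substr($1, 2); seq = ""; next }
    { seq = seq $0 }
    END { flush() }
  ' "$@"
}
```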
Get the mean Phred quality score from FASTQ:
awk -c fastx '{ print ">"$name; print meanqual($qual) }' seq.fq.gz
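The sketch below averages the per-base Phred scores of each read in plain awk. It makes two assumptions: the qualities use Phred+33 encoding, and the records are unwrapped four-line FASTQ. Standard awk has no ord() function, so a lookup table built with sprintf("%c", i) maps each printable ASCII character to its code. The helper name fastq_meanqual is hypothetical, and its results are not guaranteed to match bioawk's meanqual() in every case.

```shell
# Hypothetical helper: mean Phred quality per FASTQ record in plain awk.
# Assumptions: Phred+33 encoding; 4 lines per record (name, seq, +, qual).
fastq_meanqual() {
  awk '
    # Map each printable ASCII character to its numeric code.
    BEGIN { for (i = 33; i <= 126; i++) ord[sprintf("%c", i)] = i }
    FNR % 4 == 1 { name = substr($1, 2) }   # "@name ..." -> "name"
    FNR % 4 == 0 {
      sum = 0
      for (i = 1; i <= length($0); i++) sum += ord[substr($0, i, 1)] - 33
      print ">" name
      print (length($0) ? sum / length($0) : 0)
    }
  ' "$@"
}
```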
Use the first line as column names (here, the first line of input.txt contains a column named "age"):
awk -c header '{ print $age }' input.txt
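Without -c header, the mapping from column name to column number has to be built by hand. This sketch assumes TAB-delimited input and uses a hypothetical helper name, print_column_by_name. It reads the first line into an array of column numbers and skips that line, then prints the requested column from every following line. Whether bioawk itself passes the header line to the program is not assumed here.

```shell
# Hypothetical helper: print one column, selected by its header name.
# Usage: print_column_by_name age input.txt   (reads stdin if no file given)
print_column_by_name() {
  awk -F'\t' -v want="$1" '
    # First line: remember the column number of each name.
    NR == 1 { for (i = 1; i <= NF; i++) col[$i] = i; next }
    # Later lines: print the requested column, if it exists.
    (want in col) { print $(col[want]) }
  ' "${2:--}"
}
```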

