Tuesday, September 25, 2012

BamUtil: stats

http://genome.sph.umich.edu/wiki/BamUtil:_stats


Basic (--basic)

Prints summary statistics for the file:
  • TotalReads - # of reads that are in the file
  • MappedReads - # of reads marked mapped in the flag
  • PairedReads - # of reads marked paired in the flag
  • ProperPair - # of reads marked paired AND proper paired in the flag
  • DuplicateReads - # of reads marked duplicate in the flag
  • QCFailureReads - # of reads marked QC failure in the flag
  • MappingRate(%) - # of reads marked mapped in the flag / TotalReads
  • PairedReads(%) - # of reads marked paired in the flag / TotalReads
  • ProperPair(%) - # of reads marked paired AND proper paired in the flag / TotalReads
  • DupRate(%) - # of reads marked duplicate in the flag / TotalReads
  • QCFailRate(%) - # of reads marked QC failure in the flag / TotalReads
  • TotalBases - # of bases in all reads
  • BasesInMappedReads - # of bases in reads marked mapped in the flag
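
The counters above can be sketched in a few lines of Python. This is only an illustration, not BamUtil's own code: the function name `basic_stats` and the `(flag, read_length)` input format are hypothetical. The flag bit values are the ones defined by the SAM specification. The sketch assumes that the "(%)" fields are the listed ratios multiplied by 100.

```python
# SAM flag bits (values from the SAM specification).
PAIRED = 0x1
PROPER_PAIR = 0x2
UNMAPPED = 0x4
QC_FAIL = 0x200
DUPLICATE = 0x400


def basic_stats(reads):
    """Summarize an iterable of (flag, read_length) pairs.

    Hypothetical helper: mirrors the field names listed above, but is not
    BamUtil's implementation.
    """
    stats = {
        "TotalReads": 0, "MappedReads": 0, "PairedReads": 0,
        "ProperPair": 0, "DuplicateReads": 0, "QCFailureReads": 0,
        "TotalBases": 0, "BasesInMappedReads": 0,
    }
    for flag, length in reads:
        stats["TotalReads"] += 1
        stats["TotalBases"] += length
        if not flag & UNMAPPED:          # "mapped" means the unmapped bit is clear
            stats["MappedReads"] += 1
            stats["BasesInMappedReads"] += length
        if flag & PAIRED:
            stats["PairedReads"] += 1
            if flag & PROPER_PAIR:       # paired AND proper paired
                stats["ProperPair"] += 1
        if flag & DUPLICATE:
            stats["DuplicateReads"] += 1
        if flag & QC_FAIL:
            stats["QCFailureReads"] += 1

    total = stats["TotalReads"]
    if total:
        # Assumption: each rate is the ratio to TotalReads, expressed in percent.
        stats["MappingRate(%)"] = 100.0 * stats["MappedReads"] / total
        stats["PairedReads(%)"] = 100.0 * stats["PairedReads"] / total
        stats["ProperPair(%)"] = 100.0 * stats["ProperPair"] / total
        stats["DupRate(%)"] = 100.0 * stats["DuplicateReads"] / total
        stats["QCFailRate(%)"] = 100.0 * stats["QCFailureReads"] / total
    return stats


# Two properly paired mapped reads, one unmapped read, one mapped duplicate.
example = basic_stats([(99, 100), (147, 100), (4, 50), (1024, 75)])
print(example["MappedReads"], example["TotalBases"])
```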

Qual/Phred (--phred and --qual)

Prints a count of the number of times each quality value appears in the file to stderr.
  • phred - Displays quality as Phred integers [0-93]
  • qual - Displays quality as non-Phred integers (Phred + 33) [33-126]
By default, these counts include all qualities in the BAM file.
To exclude unmapped reads and soft clips, use --excludeFlags 4.
To only include records that overlap a set of regions, use --regionList and specify a bed file with the regions. If a read overlaps the region, all qualities will be counted even if those bases do not fall in the region. If you only want to count qualities that fall within the region, also specify --withinRegion. Without excluding unmapped reads, it will include soft clips that overlap the region.
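
To make the difference between the two displays concrete, here is a minimal Python sketch of a quality tally. It assumes the qualities are available as SAM-style text strings, where each character's ASCII code is the Phred value plus 33. The function name `count_qualities` and the `as_phred` parameter are made up for illustration. Region and flag filtering are not modelled.

```python
from collections import Counter


def count_qualities(quality_strings, as_phred=True):
    """Count how often each quality value occurs.

    Hypothetical helper, not part of BamUtil. With as_phred=True the keys
    are Phred integers (0-93); otherwise they are the raw ASCII codes
    (Phred + 33, i.e. 33-126).
    """
    counts = Counter()
    for qual in quality_strings:
        for ch in qual:
            value = ord(ch)                       # ASCII code = Phred + 33
            counts[value - 33 if as_phred else value] += 1
    return counts


# 'I' is ASCII 73 (Phred 40), '#' is 35 (Phred 2), '!' is 33 (Phred 0).
print(sorted(count_qualities(["II#", "I!"]).items()))
print(sorted(count_qualities(["II#", "I!"], as_phred=False).items()))
```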

BaseQC (--pBaseQC and --cBaseQC and --baseSum)

The pBaseQC and cBaseQC options generate per-base statistics. Only one of the two can be specified. Each writes the statistics generated for each position to the file named after the option. Both use the same logic for calculating the statistics, but pBaseQC writes them as percentages and cBaseQC writes them as counts. The order of the statistics also differs between the two.
The baseSum option can be used with either pBaseQC or cBaseQC, or on its own. It generates a summary of the per-position statistics and writes it to stderr. The per-position base statistics are calculated even if they will not be written anywhere (that is, when neither pBaseQC nor cBaseQC is specified).

All three options use the same logic for calculating the statistics:
  • A read spans a position if the read starts at or before the position, ends at or after the position and the position is not a clip. CIGAR operations allowed for the position are M/X/=/D/N. If the CIGAR is '*', only numbers for the specified reference position are incremented.
  • Currently there is no special logic to exclude positions/reads where the reference base is 'N' or the read base is 'N'.
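
The spanning rule above can be illustrated with a short Python sketch. It is one reading of the rule, not BamUtil's implementation: the function name `spanned_positions` and the 0-based start coordinate are assumptions. The '*' case is interpreted as counting only the read's stated reference position.

```python
import re

# One CIGAR operation: a length followed by an operation code.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")

# Operations that count toward spanning a reference position (per the rule above).
SPANNING_OPS = set("MX=DN")


def spanned_positions(start, cigar):
    """Return the reference positions a read spans.

    Hypothetical helper. `start` is the read's reference start position.
    Clips (S/H), insertions (I) and padding (P) consume no reference
    positions, so they contribute nothing.
    """
    if cigar == "*":
        # Assumed interpretation: only the specified position is counted.
        return [start]
    positions = []
    ref_pos = start
    for length, op in CIGAR_OP.findall(cigar):
        length = int(length)
        if op in SPANNING_OPS:
            positions.extend(range(ref_pos, ref_pos + length))
            ref_pos += length
    return positions


# The soft clip and the insertion are skipped; the deletion still spans position 105.
print(spanned_positions(100, "2S3M1I2M1D2M"))
```

Note that a deleted or skipped base (D/N) still makes the read span that position, while a soft-clipped base does not.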
