Monday, February 11, 2013

Script to parse fasta headers

http://www.biostars.org/p/62884/


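To cut each header line down to its first word (everything before the first space), you can try sed. The /^>/ address restricts the substitution to header lines, so sequence lines are left alone: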
sed -e '/^>/ s/ .*//' mybigfile
Awk should work too (even if your sequences contain spaces):
awk '{print /^>/ ? $1 : $0}' mybigfile
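And just for fun, a pure bash version. Note that it cuts every line at its first space, not only the headers, so it assumes the sequences contain no spaces:
while read -r l ; do echo "${l%% *}" ; done < mybigfile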
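
To see how the three commands differ, here is a small sketch that runs each one on a made-up two-record FASTA file. The sample contents and the function names headers_sed, headers_awk and headers_bash are assumptions for illustration only; they are not part of the original answer. The awk expression is parenthesized and the loop uses printf, which should behave the same as the one-liners above on ordinary input.

```shell
#!/usr/bin/env bash
# Hypothetical two-record FASTA file; its contents are invented for this demo.
sample=$(mktemp)
cat > "$sample" <<'EOF'
>seq1 Homo sapiens chromosome 1
ACGT ACGT
>seq2 another description
TTGA
EOF

# sed: the /^>/ address limits the substitution to header lines.
headers_sed() { sed -e '/^>/ s/ .*//' "$1"; }

# awk: print the first field of header lines and every other line unchanged.
headers_awk() { awk '{ print (/^>/ ? $1 : $0) }' "$1"; }

# bash: cuts every line at its first space, sequence lines included.
headers_bash() { while read -r l; do printf '%s\n' "${l%% *}"; done < "$1"; }

headers_sed "$sample"    # sequence line stays "ACGT ACGT"
headers_awk "$sample"    # same output as the sed version
headers_bash "$sample"   # sequence line is shortened to "ACGT"
```

On this sample, sed and awk both leave the sequence line containing a space intact, while the bash loop truncates it. That is the practical reason to prefer sed or awk when sequence lines might contain spaces.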
