Merge (join) two files on two columns. Each row is uniquely identified by the pair of values in the two shared columns (columns 1 and 2 here). Rows that have the same key in both files are joined so that all columns from both files end up on one line.
#####################################################################
# file 1
cat test
1 200 T C 1 500
1 539702 C C 1 1501
1 539703 T T 1 1502
1 539704 T T 1 1503
1 539705 A A 1 1504
1 539706 T C 1 1505
1 539707 G A 1 1506
1 539708 A A 1 1507
1 539709 G G 1 1508
1 539710 C - 1 1509
2 99 M T 3 1000
# file 2
head -n20 test2
1 539702 C C 1 1501
1 539703 T T 1 1502
1 539704 T T 1 1503
1 539705 A A 1 1504
1 539706 T C 1 1505
1 539707 G A 1 1506
1 539708 A A 1 1507
1 539709 G G 1 1508
1 539710 C - 1 1509
1 539714 A G 1 1510
1 539715 A A 1 1511
1 539716 A A 1 1512
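# left join on columns 1 and 2: every row of test2 is printed, prefixed with the matching row of test when the same key exists there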
nawk 'FNR==NR{f1[$1,$2]=$0;next}{idx=$1 SUBSEP $2; if(idx in f1) $0=f1[idx] OFS $0}1' test test2
1 539702 C C 1 1501 1 539702 C C 1 1501
1 539703 T T 1 1502 1 539703 T T 1 1502
1 539704 T T 1 1503 1 539704 T T 1 1503
1 539705 A A 1 1504 1 539705 A A 1 1504
1 539706 T C 1 1505 1 539706 T C 1 1505
1 539707 G A 1 1506 1 539707 G A 1 1506
1 539708 A A 1 1507 1 539708 A A 1 1507
1 539709 G G 1 1508 1 539709 G G 1 1508
1 539710 C - 1 1509 1 539710 C - 1 1509
1 539714 A G 1 1510
1 539715 A A 1 1511
1 539716 A A 1 1512
1 539717 G G 1 1513
1 539718 C T 1 1514
1 539719 T C 1 1515
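# keep only the joined rows, i.e. keys present in both files (awk '$7' prints a line only when field 7 is non-empty and non-zero)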
nawk 'FNR==NR{f1[$1,$2]=$0;next}{idx=$1 SUBSEP $2; if(idx in f1) $0=f1[idx] OFS $0}1' test test2 | awk '$7'
1 539702 C C 1 1501 1 539702 C C 1 1501
1 539703 T T 1 1502 1 539703 T T 1 1502
1 539704 T T 1 1503 1 539704 T T 1 1503
1 539705 A A 1 1504 1 539705 A A 1 1504
1 539706 T C 1 1505 1 539706 T C 1 1505
1 539707 G A 1 1506 1 539707 G A 1 1506
1 539708 A A 1 1507 1 539708 A A 1 1507
1 539709 G G 1 1508 1 539709 G G 1 1508
1 539710 C - 1 1509 1 539710 C - 1 1509
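The same inner join can probably be done in a single awk pass, without the second `awk '$7'` filter. The sketch below is only an illustration: it uses plain `awk` instead of `nawk`, and the two sample files are hypothetical, cut down from the listings above and written to a temporary directory so the snippet runs on its own.

```shell
#!/bin/sh
# Hypothetical sample files, cut down from the listings above.
tmpdir=$(mktemp -d)
cat > "$tmpdir/test" <<'EOF'
1 200 T C 1 500
1 539702 C C 1 1501
1 539703 T T 1 1502
2 99 M T 3 1000
EOF
cat > "$tmpdir/test2" <<'EOF'
1 539702 C C 1 1501
1 539703 T T 1 1502
1 539714 A G 1 1510
EOF

# Inner join on columns 1 and 2.
joined=$(awk '
    # First file (FNR == NR is true only while reading it):
    # store each row under its two-column key.
    FNR == NR { f1[$1, $2] = $0; next }
    # Second file: print only rows whose key was stored from the first file,
    # with the first-file row in front.
    ($1, $2) in f1 { print f1[$1, $2], $0 }
' "$tmpdir/test" "$tmpdir/test2")

printf '%s\n' "$joined"
rm -rf "$tmpdir"
```

Testing the key with `($1, $2) in f1` does not depend on what the joined line looks like, so it also works when a field holds 0 or when the rows have a different number of columns.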
Source: http://www.unix.com/unix-dummies-questions-answers/158593-merging-two-files-based-two-columns-make-third-file.html