Monday, March 21, 2011

Access shell variables in awk

[1] http://www.tek-tips.com/faqs.cfm?fid=1281

You can use any of the following three methods to access shell variables inside an awk script:

1. Assign the shell variables to awk variables after the body of the script, but before you specify the input

awk '{print v1, v2}' v1="$VAR1" v2="$VAR2" input_file

Note: This method has two constraints:
- Variables assigned this way are not yet set when the BEGIN section runs
- A variable assigned after a filename is not set while that file is being processed

e.g.

awk '{print v1, v2}' v1="$VAR1" file1 v2="$VAR2" file2

In this case, v2 is not available to awk when processing file1.
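
Both constraints can be checked with a short script. This is only a sketch: it assumes a POSIX-style awk and an available mktemp, and the values "one" and "two" and the throwaway files file1 and file2 are made up for illustration.

```shell
# Illustrative values and throwaway input files (hypothetical names).
VAR1="one"
VAR2="two"
tmpdir=$(mktemp -d)
printf 'a\n' > "$tmpdir/file1"
printf 'b\n' > "$tmpdir/file2"

# Constraint 1: the assignment has not been processed when BEGIN runs,
# so v1 is still empty there; it is set by the time file1 is read.
begin_out=$(awk 'BEGIN {print "BEGIN: [" v1 "]"} {print "body: [" v1 "]"}' \
    v1="$VAR1" "$tmpdir/file1")

# Constraint 2: v2 is assigned only after file1 has been read,
# so it is empty for file1 and set for file2.
order_out=$(awk '{print $0 ": [" v1 "] [" v2 "]"}' \
    v1="$VAR1" "$tmpdir/file1" v2="$VAR2" "$tmpdir/file2")

printf '%s\n' "$begin_out" "$order_out"
# BEGIN: []
# body: [one]
# a: [one] []
# b: [one] [two]

rm -rf "$tmpdir"
```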

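2. Use the -v switch to assign the shell variables to awk variables. Variables assigned this way are already set when the BEGIN section runs. The -v switch is part of POSIX awk and works with nawk, gawk and mawk, but not with some older flavours of awk. On my system (Solaris 2.6) -v cannot be used with /usr/bin/awk but will work with /usr/xpg4/bin/awk.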

nawk -v v1="$VAR1" -v v2="$VAR2" '{print v1, v2}' input_file
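
A quick way to see how -v behaves is to read the variable inside BEGIN. The sketch below assumes the awk on your PATH accepts -v (any POSIX awk should); the values are made up for illustration. It also shows one thing worth knowing: awk interprets backslash escape sequences in the assigned value.

```shell
VAR1="one"
# With -v the assignment happens before the program starts,
# so the value is visible inside BEGIN and no input file is needed.
v_out=$(awk -v v1="$VAR1" 'BEGIN {print "BEGIN: [" v1 "]"}')
printf '%s\n' "$v_out"      # BEGIN: [one]

# Escape sequences in the value are interpreted by awk:
# the two characters \t arrive in the program as a single tab.
VAR2='a\tb'
esc_out=$(awk -v v2="$VAR2" 'BEGIN {print v2}')
printf '%s\n' "$esc_out"
```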

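3. Splice the shell variables into the awk program text by enclosing them with "'" (i.e. double quote - single quote - double quote). In "'"$VAR1"'", the outermost double quotes delimit an awk string literal, the single quotes close and reopen the shell's quoting of the awk program, and the inner double quotes let the shell expand $VAR1 without word splitting. The expanded value becomes part of the awk source code, so a value containing a double quote or a backslash can change the program.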

awk '{print "'"$VAR1"'", "'"$VAR2"'"}' input_file
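
To see what this quoting does, it helps to print the program text that awk actually receives. The following is a sketch with made-up values, not a recommendation; it also shows how a value containing a double quote silently changes the program.

```shell
VAR1="hello"
# Print the exact program text that awk would get as its argument.
# The single-quoted pieces are literal; "$VAR1" is expanded by the shell.
prog_text=$(printf '%s' 'BEGIN {print "'"$VAR1"'"}')
printf '%s\n' "$prog_text"      # BEGIN {print "hello"}

ok_out=$(awk 'BEGIN {print "'"$VAR1"'"}')
printf '%s\n' "$ok_out"         # hello

# A double quote inside the value ends the awk string early.
# awk now receives: BEGIN {print "say "hi""}
# which concatenates "say ", the unset awk variable hi, and "".
VAR1='say "hi"'
broken_out=$(awk 'BEGIN {print "'"$VAR1"'"}')
printf '[%s]\n' "$broken_out"   # [say ]
```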


[2] http://zvfak.blogspot.com/search/label/awk

If you have a shell variable in a bash script, you can't pass it to AWK just by putting a "$" sign in front of it inside the single-quoted AWK code. You can, however, enclose it with "'" in the AWK code, and the shell will substitute its value before AWK runs.

For example, suppose you have a BED file called "example.bed":

$ cat example.bed
chr1 1000 2000 id1
chr1 4000 5000 id2
chr1 5500 6000 id3

Let's say you want to prepend a string (in this case "brain_") to column 4 of this file. You can do this in AWK as follows:

$ awk '{OFS="\t";$4="brain_"$4; print;}' example.bed
chr1 1000 2000 brain_id1
chr1 4000 5000 brain_id2
chr1 5500 6000 brain_id3

However, if you store the string in a variable, as follows, in the terminal or in a bash script:

$ TISSUE="brain_"

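the following will not work, because inside single quotes the shell does not expand $TISSUE and AWK treats TISSUE as one of its own (unset) variables: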
$ awk '{OFS="\t";$4=$TISSUE$4; print;}' example.bed

but this will:

$ awk '{OFS="\t";$4="'"$TISSUE"'"$4; print;}' example.bed
chr1 1000 2000 brain_id1
chr1 4000 5000 brain_id2
chr1 5500 6000 brain_id3
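
The same example can be used to see why the plain $TISSUE version misbehaves, and how the -v switch from [1] handles the same task. This is a sketch under a few assumptions: a POSIX-style awk, an available mktemp, and a two-line copy of the example data in a temporary file; the awk variable name tissue is made up for illustration.

```shell
tmpfile=$(mktemp)
printf 'chr1 1000 2000 id1\nchr1 4000 5000 id2\n' > "$tmpfile"
TISSUE="brain_"

# Inside single quotes the shell never sees $TISSUE. awk reads it as
# "the field whose number is stored in the awk variable TISSUE"; that
# variable is unset (numerically 0), so $TISSUE means $0, the whole line.
wrong_out=$(awk '{print $TISSUE}' "$tmpfile")

# Passing the value with -v keeps it as data rather than awk source.
right_out=$(awk -v tissue="$TISSUE" \
    'BEGIN {OFS="\t"} {$4 = tissue $4; print}' "$tmpfile")

printf '%s\n' "$wrong_out" "$right_out"
rm -f "$tmpfile"
```

The first command simply echoes each input line unchanged; the second prints the tab-separated lines with brain_ prepended to column 4.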