Thanks for the comment, John. I was in a hurry when I posted this query. Yes, you are right: I want to merge the contents of the files into one big file using the first column, which is common to all the small files and holds unique values with no duplicates. The paste command does not work with files of different lengths, e.g. file 1 has 100 rows and file 2 has 50 rows. So while merging, the script should insert NA when an identifier is not present in file 2.
For example:
file 1
X 50
Y 100
file 2
Y 50
Merge file
X 50 NA
Y 100 100
I know it is possible with R, Python, etc., but I want to do this in awk.
Thanks
Hi Rahul,
Sorry, I am not good at awk, but you can achieve the same with the join command. Note that join merges only two files at a time, so for more files you have to join the result with the next file, one file after another.
$ join -a1 -e NA -o 0,1.2,2.2 file1 file2
Note: The files must be sorted on the first column.
Thanks
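To illustrate joining more than two files, here is a rough sketch that joins one file at a time in a loop. It assumes GNU coreutils join (the -o auto option is a GNU extension), whitespace-separated two-column files that are already sorted on the first column, and a hypothetical file3 that is not part of the original question.

```shell
#!/bin/sh
# Sketch only: assumes GNU join (-o auto) and inputs sorted on column 1.
export LC_ALL=C                              # keep sort/join ordering consistent
workdir=$(mktemp -d)
printf 'X 50\nY 100\n' > "$workdir/file1"    # sample data from the question
printf 'Y 50\n'        > "$workdir/file2"
printf 'X 7\nZ 9\n'    > "$workdir/file3"    # hypothetical third file

# Start from the first file and join every further file onto the running result.
cp "$workdir/file1" "$workdir/merged"
for f in "$workdir/file2" "$workdir/file3"; do
    # -a1 -a2 keep unpaired lines from both sides, -e NA fills the missing fields
    join -a1 -a2 -e NA -o auto "$workdir/merged" "$f" > "$workdir/tmp"
    mv "$workdir/tmp" "$workdir/merged"
done

joined=$(cat "$workdir/merged")
printf '%s\n' "$joined"
rm -rf "$workdir"
```

Each pass adds one column, so an identifier missing from a given file gets NA in that file's column.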
Hi Rahul,
I guess you have a typo in your example above. I think you actually want the following output in your merged file.
Merge file
X 50 NA
Y 100 50
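If this is what you want, then the following one-liner gathers the second-column values for each identifier. Note that it assumes comma-separated files, prints the identifiers in no particular order, and does not insert NA for identifiers missing from a file.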
$ cat *.file | awk -F',' 'NF>1{a[$1] = a[$1]","$2}END{for(i in a){print i""a[i]}}'
Thanks
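For the NA padding asked for in the question, something along these lines might work. It is only a sketch: it assumes whitespace-separated files with exactly two columns, and the array names (val, seen, order) are made up for illustration. An empty input file would not be counted, so it would get no column.

```shell
#!/bin/sh
# Sample inputs taken from the example in the question.
workdir=$(mktemp -d)
printf 'X 50\nY 100\n' > "$workdir/file1"
printf 'Y 50\n'        > "$workdir/file2"

merged=$(awk '
    FNR == 1 { ++nfiles }                    # a new input file has started
    {
        if (!($1 in seen)) {                 # remember keys in first-seen order
            seen[$1] = 1
            order[++nkeys] = $1
        }
        val[$1, nfiles] = $2                 # value of this key in this file
    }
    END {
        for (k = 1; k <= nkeys; k++) {
            key = order[k]
            line = key
            # one column per input file, NA where the key was absent
            for (f = 1; f <= nfiles; f++)
                line = line " " (((key, f) in val) ? val[key, f] : "NA")
            print line
        }
    }
' "$workdir/file1" "$workdir/file2")

printf '%s\n' "$merged"
rm -rf "$workdir"
```

With the two sample files this prints "X 50 NA" and "Y 100 50". More files can simply be appended to the awk command line.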
Hi Rahul,
Use this awk script to achieve the desired result.
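# merge.awk: merge files on column 1, padding missing identifiers with NA
BEGIN { FS = "\t" }
FNR == 1 { ++file }                 # count input files as each one starts
{
    a[$1, file] = $2 FS $3          # the two value columns for this key and file
    ++seen[$1]                      # remember every key
}
END {
    for (j in seen) {               # iteration order is unspecified
        s = j
        for (i = 1; i <= file; ++i) {
            s = s FS ((j, i) in a ? a[j, i] : "NA" FS "NA")
        }
        print s
    }
}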
Usage:
$ awk -f merge.awk file1 file2 file3 file4 file5 file6 file7 file8 file9 file10
Note: I assume each file has only three tab-separated columns (the identifier plus two value columns).
Best.
Hi Rahul,
Sorry, your question is not explained clearly. A small example would help.
I guess you would like to read the first column of many tab-delimited files (>10) and merge them side by side in an output file. The easiest way to achieve this is with the paste and cut commands on Linux.
paste file1 file2 file3 ... file10 | cut -f1,2 > Newfile.txt
Thanks