I have a file1 like this:
0 AFFX-SNP-000541 NA
0 AFFX-SNP-002255 NA
1 rs12103 0.6401
1 rs12103_1247494 0.696
1 rs12142199 0.7672
And a file2:
0 AFFX-SNP-000541 1
0 AFFX-SNP-002255 1
1 rs12103 0.5596
1 rs12103_1247494 0.5581
1 rs12142199 0.4931
And I would like a file3 like this:
0 AFFX-SNP-000541 NA 1
0 AFFX-SNP-002255 NA 1
1 rs12103 0.6401 0.5596
1 rs12103_1247494 0.696 0.5581
1 rs12142199 0.7672 0.4931
That is, I want to append the 3rd column of file2 to each line of file1 as a 4th column, matching lines by the 2nd column (the SNP name).
Answers:
Method 1
This should do it:
join -j 2 -o 1.1,1.2,1.3,2.3 file1 file2
Important: this assumes your files are sorted on the SNP name (the join field), as in your example. If they are not, sort them first on field 2 only (the <(...) process substitution needs bash or zsh):
join -j 2 -o 1.1,1.2,1.3,2.3 <(sort -k2,2 file1) <(sort -k2,2 file2)
Output:
0 AFFX-SNP-000541 NA 1
0 AFFX-SNP-002255 NA 1
1 rs12103 0.6401 0.5596
1 rs12103_1247494 0.696 0.5581
1 rs12142199 0.7672 0.4931
Explanation (from info join):
`join' writes to standard output a line for each pair of input lines
that have identical join fields.
`-1 FIELD'
Join on field FIELD (a positive integer) of file 1.
`-2 FIELD'
Join on field FIELD (a positive integer) of file 2.
`-j FIELD'
Equivalent to `-1 FIELD -2 FIELD'.
`-o FIELD-LIST'
Otherwise, construct each output line according to the format in
FIELD-LIST. Each element in FIELD-LIST is either the single
character `0' or has the form M.N where the file number, M, is `1'
or `2' and N is a positive field number.
So the command above joins the files on the second field and prints the 1st, 2nd and 3rd fields of file1, followed by the 3rd field of file2.
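To try Method 1 end to end, here is a small bash sketch that recreates the sample data from the question in a temporary directory and runs the sorted variant of the command. It assumes bash (for process substitution) and GNU coreutils; the scratch directory variable tmp and the use of LC_ALL=C are my own choices, the latter so that sort and join compare keys in the same byte order.

```shell
#!/usr/bin/env bash
# Sketch: reproduce Method 1 on the sample data (assumes bash + GNU coreutils).
set -euo pipefail

# Use the C locale so sort and join compare keys byte by byte, identically.
export LC_ALL=C

tmp=$(mktemp -d)            # hypothetical scratch directory
trap 'rm -rf "$tmp"' EXIT

# Sample data from the question: chromosome, SNP name, value.
printf '%s\n' \
  '0 AFFX-SNP-000541 NA' \
  '0 AFFX-SNP-002255 NA' \
  '1 rs12103 0.6401' \
  '1 rs12103_1247494 0.696' \
  '1 rs12142199 0.7672' > "$tmp/file1"

printf '%s\n' \
  '0 AFFX-SNP-000541 1' \
  '0 AFFX-SNP-002255 1' \
  '1 rs12103 0.5596' \
  '1 rs12103_1247494 0.5581' \
  '1 rs12142199 0.4931' > "$tmp/file2"

# Sort both inputs on field 2 only, then join on field 2 and print
# fields 1-3 of file1 followed by field 3 of file2.
join -j 2 -o 1.1,1.2,1.3,2.3 \
  <(sort -k2,2 "$tmp/file1") <(sort -k2,2 "$tmp/file2") > "$tmp/file3"

cat "$tmp/file3"
```

With the question's data this should print the same five lines as the output shown for Method 1.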
Method 2
You could use awk:
$ awk 'NR==FNR {h[$2] = $3; next} {print $1,$2,$3,h[$2]}' file2 file1 > file3
Output:
$ cat file3
0 AFFX-SNP-000541 NA 1
0 AFFX-SNP-002255 NA 1
1 rs12103 0.6401 0.5596
1 rs12103_1247494 0.696 0.5581
1 rs12142199 0.7672 0.4931
Explanation:
Walk through file2 first (NR==FNR is only true while awk is reading the first file argument). Save column 3 in an associative array, using column 2 as the key: h[$2] = $3. Then walk through file1 and print its three columns $1,$2,$3, followed by the value saved for that SNP, h[$2].
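One thing the awk one-liner does not handle: if a SNP in file1 has no line in file2, h[$2] is empty and that output line silently ends up with only three fields. A minimal variant, assuming you would rather see NA in that case (the NA placeholder is my choice, mirroring the question's data), tests membership with awk's in operator before printing. The sketch runs it on a trimmed copy of the sample data with one SNP deliberately left out of file2.

```shell
#!/usr/bin/env bash
# Sketch: Method 2 with an explicit NA for SNPs missing from file2.
set -euo pipefail

tmp=$(mktemp -d)            # hypothetical scratch directory
trap 'rm -rf "$tmp"' EXIT

printf '%s\n' \
  '0 AFFX-SNP-000541 NA' \
  '1 rs12103 0.6401' \
  '1 rs12142199 0.7672' > "$tmp/file1"

# rs12103 is intentionally absent from file2.
printf '%s\n' \
  '0 AFFX-SNP-000541 1' \
  '1 rs12142199 0.4931' > "$tmp/file2"

# First file (file2): remember column 3 keyed by column 2.
# Second file (file1): print columns 1-3 plus the saved value, or NA.
awk 'NR == FNR { h[$2] = $3; next }
     { print $1, $2, $3, (($2 in h) ? h[$2] : "NA") }' \
  "$tmp/file2" "$tmp/file1" > "$tmp/file3"

cat "$tmp/file3"
```

Using ($2 in h) rather than testing h[$2] directly also avoids creating empty entries in the array as a side effect of the lookup.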
Method 3
If you don't need to match lines by name (that is, both files already list the same SNPs in the same order), then a simple solution would be
paste file{1,2} | awk '{print $1,$2,$3,$6}' > file3
This presumes that all rows have three entries and that columns 1 and 2 of both files are the same on every line (as in your example data).
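Because Method 3 pairs rows purely by position, it may be worth checking that assumption rather than trusting it. A possible sketch, assuming whitespace-separated files with exactly three columns each (as in the sample data): after paste, columns 4 and 5 hold file2's first two columns, so the awk script compares them with columns 1 and 2, reports any mismatching row on standard error (via /dev/stderr, which gawk, mawk and Linux systems provide), and exits with status 1 if it found one. The second file2 row below is deliberately out of step to trigger the check.

```shell
#!/usr/bin/env bash
# Sketch: Method 3 plus a check that both files are row-aligned.
set -uo pipefail   # no -e: we want to inspect the pipeline's exit status

tmp=$(mktemp -d)            # hypothetical scratch directory
trap 'rm -rf "$tmp"' EXIT

printf '%s\n' '0 AFFX-SNP-000541 NA' '1 rs12103 0.6401' > "$tmp/file1"
# Second row is deliberately misaligned with file1.
printf '%s\n' '0 AFFX-SNP-000541 1' '1 rs12142199 0.4931' > "$tmp/file2"

paste "$tmp/file1" "$tmp/file2" |
  awk 'BEGIN { bad = 0 }
       # Columns 4 and 5 come from file2; they should equal columns 1 and 2.
       $1 != $4 || $2 != $5 {
         print "row " NR ": " $2 " vs " $5 > "/dev/stderr"
         bad = 1
       }
       { print $1, $2, $3, $6 }
       END { exit bad }' > "$tmp/file3"
status=$?

echo "exit status: $status"
```

With the question's aligned files the check stays silent, the exit status is 0, and file3 matches the output of the other methods.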
All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.