File1.txt
id No
gi|371443199|gb|JH556661.1| 7907290
gi|371443198|gb|JH556662.1| 7573913
gi|371443197|gb|JH556663.1| 7384412
gi|371440577|gb|JH559283.1| 6931777
File2.txt
id P R S
gi|367088741|gb|AGAJ01056324.1| 5 5 0
gi|371443198|gb|JH556662.1| 2 2 0
gi|367090281|gb|AGAJ01054784.1| 4 4 0
gi|371440577|gb|JH559283.1| 21 19 2
output.txt
id P R S No
gi|371443198|gb|JH556662.1| 2 2 0 7573913
gi|371440577|gb|JH559283.1| 21 19 2 6931777
File1.txt has two columns and File2.txt has four columns. I want to join the two files on the unique id in the first column (the id must match in both File1.txt and File2.txt) and output only the matched ids (see output.txt).
I have tried join -v <(sort file1.txt) <(sort file2.txt). Any help with awk or join commands would be appreciated.
Answers:
Method 1
join works great:
$ join <(sort File1.txt) <(sort File2.txt) | column -t | tac
id                           No       P   R   S
gi|371443198|gb|JH556662.1|  7573913  2   2   0
gi|371440577|gb|JH559283.1|  6931777  21  19  2
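Why the attempt in the question fails: `-v` takes a file number (1 or 2) and prints only the *unpairable* lines of that file, which is the opposite of what is wanted. Plain `join` already keeps just the lines whose first field appears in both files. A minimal sketch contrasting the two, assuming the sample data from the question; it writes sorted copies to scratch files (`File1.sorted`, `File2.sorted`, names chosen here for illustration) instead of using `<(...)`, so it also runs under a plain POSIX sh.

```shell
#!/bin/sh
# Minimal sketch using the question's sample data: compare plain join
# (ids present in both files) with join -v 1 (ids only in the first file).
set -eu
export LC_ALL=C   # sort and join must agree on the collation order
cd "$(mktemp -d)"

cat > File1.txt <<'EOF'
id No
gi|371443199|gb|JH556661.1| 7907290
gi|371443198|gb|JH556662.1| 7573913
gi|371443197|gb|JH556663.1| 7384412
gi|371440577|gb|JH559283.1| 6931777
EOF

cat > File2.txt <<'EOF'
id P R S
gi|367088741|gb|AGAJ01056324.1| 5 5 0
gi|371443198|gb|JH556662.1| 2 2 0
gi|367090281|gb|AGAJ01054784.1| 4 4 0
gi|371440577|gb|JH559283.1| 21 19 2
EOF

# join reads both inputs sequentially, so they must be sorted on the join field.
sort File1.txt > File1.sorted
sort File2.txt > File2.sorted

echo '== join: ids present in both files =='
join File1.sorted File2.sorted

echo '== join -v 1: ids only in File1.txt =='
join -v 1 File1.sorted File2.sorted
```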
P.S. Does the output column order matter?
If so, use:
$ join <(sort File1.txt) <(sort File2.txt) | tac | awk '{print $1,$3,$4,$5,$2}' | column -t
id                           P   R   S  No
gi|371443198|gb|JH556662.1|  2   2   0  7573913
gi|371440577|gb|JH559283.1|  21  19  2  6931777
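Instead of reordering with awk, join can select and order the output fields itself with its standard `-o` option (`0` is the join field, `F.N` is field N of file F). A minimal sketch, assuming the question's two-file layout; the header lines are joined separately (both start with `id`), so no `tac` is needed to bring the header back to the top. The scratch file names `head1`, `head2`, `body1`, `body2` are made up for illustration.

```shell
#!/bin/sh
# Minimal sketch: let join -o choose the output columns, and keep the
# header lines out of the sort so the header stays on the first line.
set -eu
export LC_ALL=C   # sort and join must agree on the collation order
cd "$(mktemp -d)"

cat > File1.txt <<'EOF'
id No
gi|371443199|gb|JH556661.1| 7907290
gi|371443198|gb|JH556662.1| 7573913
gi|371443197|gb|JH556663.1| 7384412
gi|371440577|gb|JH559283.1| 6931777
EOF

cat > File2.txt <<'EOF'
id P R S
gi|367088741|gb|AGAJ01056324.1| 5 5 0
gi|371443198|gb|JH556662.1| 2 2 0
gi|367090281|gb|AGAJ01054784.1| 4 4 0
gi|371440577|gb|JH559283.1| 21 19 2
EOF

# 0 = join field (id), 2.2-2.4 = P R S from File2.txt, 1.2 = No from File1.txt
fields=0,2.2,2.3,2.4,1.2

head -n 1 File1.txt > head1
head -n 1 File2.txt > head2
tail -n +2 File1.txt | sort -k1,1 > body1
tail -n +2 File2.txt | sort -k1,1 > body2

{
  join -o "$fields" head1 head2   # header line: id P R S No
  join -o "$fields" body1 body2   # matched data rows
} > output.txt

cat output.txt
```

Pipe output.txt through `column -t` for aligned columns. The data rows come out in sort order rather than in File2.txt's order; the awk approach below keeps File2.txt's order.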
Method 2
One way using awk:
Content of script.awk:
## Process first file of arguments. Save 'id' as key and 'No' as value
## of a hash.
FNR == NR {
if ( FNR == 1 ) {
header = $2
next
}
hash[ $1 ] = $2
next
}
## Process second file of arguments. Print header in first line and for
## the rest check if first field is found in the hash.
FNR < NR {
if ( $1 in hash || FNR == 1 ) {
printf "%s %s\n", $0, ( FNR == 1 ? header : hash[ $1 ] )
}
}
Run it like:
awk -f script.awk File1.txt File2.txt | column -t
With the following result:
id                           P   R   S  No
gi|371443198|gb|JH556662.1|  2   2   0  7573913
gi|371440577|gb|JH559283.1|  21  19  2  6931777
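Because both header lines start with the same key `id`, the header pairs up like any other row, so the same hash-lookup idea as script.awk can be written inline without a header special case. A minimal sketch, assuming the question's sample files; like script.awk, it preserves File2.txt's row order and needs no sorting.

```shell
#!/bin/sh
# Minimal sketch: inline version of the hash lookup in script.awk.
set -eu
cd "$(mktemp -d)"

cat > File1.txt <<'EOF'
id No
gi|371443199|gb|JH556661.1| 7907290
gi|371443198|gb|JH556662.1| 7573913
gi|371443197|gb|JH556663.1| 7384412
gi|371440577|gb|JH559283.1| 6931777
EOF

cat > File2.txt <<'EOF'
id P R S
gi|367088741|gb|AGAJ01056324.1| 5 5 0
gi|371443198|gb|JH556662.1| 2 2 0
gi|367090281|gb|AGAJ01054784.1| 4 4 0
gi|371440577|gb|JH559283.1| 21 19 2
EOF

awk '
  NR == FNR { no[$1] = $2; next }  # first file (File1.txt): map id to No
  $1 in no  { print $0, no[$1] }   # second file: print matched ids only
' File1.txt File2.txt > output.txt

cat output.txt
```

One caveat of the `NR == FNR` idiom: if File1.txt is empty, every line of File2.txt is treated as "first file" input and nothing is printed.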
All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.