Objective: Merge the contents of two files using a common key present in the files

file1.txt
=========
key1 11
key2 12
key3 13

file2.txt
=========
key2 22
key3 23
key4 24
key5 25

Expected Output :
==================
key1 11
key2 12 22
key3 13 23
key4 24
key5 25
Approaches tried:
1. join command:
   join -a 1 -a 2 file1.txt file2.txt ## full outer join
2. awk:
   awk 'FNR==NR{a[$1]=$2;next;}{ print $0, a[$1]}' 2.txt 1.txt
Approach 2 produces a right outer join and NOT a full outer join:
key1 11
key2 12 22
key3 13 23
What needs to be modified in approach 2 to result in a full outer join?
Answers:
Method 1
My solution using join:
join -a1 -a2 -1 1 -2 1 -o 0,1.2,2.2 -e "NULL" file1 file2
I don’t know much about using awk to join large files, so I always use join. The command above prints:
key1 11 NULL
key2 12 22
key3 13 23
key4 NULL 24
key5 NULL 25
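One thing the join command depends on is that both inputs are already sorted on the join field. The sample files happen to be, but real data may not be. The following is a minimal sketch of sorting first, assuming a scratch directory and deliberately shuffled copies of the sample data (the `tmpdir` and `*.sorted` names are made up for illustration):

```shell
# Scratch directory with the question's sample data, shuffled on purpose.
tmpdir=$(mktemp -d)
printf 'key3 13\nkey1 11\nkey2 12\n' > "$tmpdir/file1"
printf 'key5 25\nkey2 22\nkey4 24\nkey3 23\n' > "$tmpdir/file2"

# join expects both inputs sorted on the join field (field 1 here).
# Using the same locale for sort and join keeps their orderings consistent.
LC_ALL=C sort -k1,1 "$tmpdir/file1" > "$tmpdir/file1.sorted"
LC_ALL=C sort -k1,1 "$tmpdir/file2" > "$tmpdir/file2.sorted"

# Full outer join: -a 1 -a 2 keeps unpaired lines from both files,
# -o picks the output fields, -e fills the missing ones with NULL.
joined=$(LC_ALL=C join -a 1 -a 2 -o 0,1.2,2.2 -e NULL \
    "$tmpdir/file1.sorted" "$tmpdir/file2.sorted")
printf '%s\n' "$joined"

rm -rf "$tmpdir"
```

With the shuffled input this should print the same five lines as shown above.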
Method 2
My solution with awk:
awk '{a[$1]=a[$1]" "$2} END{for(i in a)print i, a[i]}' file1.txt file2.txt
With the key (first field) as the array index, append the second field of each line, preceded by a space, to the corresponding element a[key]. At the end, print every index together with its array element.
Output:
AMD$ awk '{a[$1]=a[$1]" "$2} END{for(i in a)print i, a[i]}' file1.txt file2.txt
key1 11
key2 12 22
key3 13 23
key4 24
key5 25
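Two details of this approach are worth noting. awk does not guarantee the order in which `for (i in a)` visits the keys, so the lines may come out in a different order than shown. Also, every stored value starts with a space, so `print i, a[i]` emits two spaces after the key. A possible variant that addresses both, assuming sorted output is acceptable (the `tmpdir` and `merged` names are only for illustration):

```shell
# Scratch copies of the question's sample files.
tmpdir=$(mktemp -d)
printf 'key1 11\nkey2 12\nkey3 13\n' > "$tmpdir/file1.txt"
printf 'key2 22\nkey3 23\nkey4 24\nkey5 25\n' > "$tmpdir/file2.txt"

# a[key] collects " value" for every line; since each value already begins
# with a space, "print i a[i]" (no comma) avoids a doubled separator.
# sort makes the otherwise unspecified for-in order deterministic.
merged=$(awk '{ a[$1] = a[$1] " " $2 }
              END { for (i in a) print i a[i] }' \
         "$tmpdir/file1.txt" "$tmpdir/file2.txt" | LC_ALL=C sort)
printf '%s\n' "$merged"

rm -rf "$tmpdir"
```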
Method 3
With awk, try:
awk '{a[$1]=($1 in a)?a[$1]" "$2:$2};END{for(i in a)print i,a[i]}' file1 file2
For huge files, you should use join instead of the awk approach, since the awk approach stores the content of all files in memory before printing anything.
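If you would rather stay with awk, one option is to extend approach 2 from the question so that only one file is held in memory while the other is streamed. The sketch below is not from the original answers. It assumes keys are unique within each file and that the first file on the command line is not empty (otherwise the `FNR==NR` test misfires):

```shell
# Scratch copies of the question's sample files.
tmpdir=$(mktemp -d)
printf 'key1 11\nkey2 12\nkey3 13\n' > "$tmpdir/file1.txt"
printf 'key2 22\nkey3 23\nkey4 24\nkey5 25\n' > "$tmpdir/file2.txt"

# Pass 1 (FNR==NR): load file2.txt into a[].
# Pass 2: stream file1.txt, append the match if any, and delete it from a[].
# END: whatever is left in a[] exists only in file2.txt, so print it too.
full=$(awk 'FNR == NR { a[$1] = $2; next }
            ($1 in a) { print $0, a[$1]; delete a[$1]; next }
                      { print }
            END       { for (k in a) print k, a[k] }' \
       "$tmpdir/file2.txt" "$tmpdir/file1.txt" | LC_ALL=C sort)
printf '%s\n' "$full"

rm -rf "$tmpdir"
```

The trailing sort is there only because the END loop visits the leftover keys in an unspecified order.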
Method 4
Your first join command seems to work fine here, although it is misspelled in capital letters:
$>join -a 1 -a 2 file1.txt file2.txt
key1 11
key2 12 22
key3 13 23
key4 24
key5 25
All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.