I need to merge below 2 files:
- file1:
TABLES REF-IO HEAD-IO DIFF-IO
test 200 500 -300
exam 2 3 -1
final 2 1 1
mail 4 2 2
TOTAL 208 506 -298
- file2:
TABLES REF-SELECT HEAD-SELECT DIFF-SELECT
test 5 7 -2
game 3 3 0
exam 0 7 -7
final 12 6 6
TOTAL 20 23 -3
merged file should be as shown below:
TABLES REF-IO HEAD-IO DIFF-IO REF-SELECT HEAD-SELECT DIFF-SELECT
test 200 500 -300 5 7 -2
exam 2 3 -1 0 7 -7
final 2 1 1 12 6 6
mail 4 2 2 0 0 0
TOTAL 208 506 -298 20 23 -3
Answers:
Thank you for visiting the Q&A section on Magenaut. Please note that all the answers may not help you solve the issue immediately. So please treat them as advisements. If you found the post helpful (or not), leave a comment & I’ll get back to you as soon as possible.
Method 1
awk '
NR==FNR {vals[$1] = $2 " " $3 " " $4; next}
!($1 in vals) {vals[$1] = "0 0 0"}
{$(NF+1) = vals[$1]; print}
' file2 file1
TABLES REF-IO HEAD-IO DIFF-IO REF-SELECT HEAD-SELECT DIFF-SELECT test 200 500 -300 5 7 -2 exam 2 3 -1 0 7 -7 final 2 1 1 12 6 6 mail 4 2 2 0 0 0 TOTAL 208 506 -298 20 23 -3
Method 2
Here’s how to do it with join and preserve line order, headers, footers etc…
join -1 2 -2 1 -a 1 -e 0 -o 1.1,1.2,1.3,1.4,1.5,2.2,2.3,2.4 <(sort -k2,2 <(nl -ba -nrz file1)) <(sort -k1,1 file2) | sort -k1,1n | cut -d -f2-
result:
TABLES REF-IO HEAD-IO DIFF-IO REF-SELECT HEAD-SELECT DIFF-SELECT
test 200 500 -300 5 7 -2
exam 2 3 -1 0 7 -7
final 2 1 1 12 6 6
mail 4 2 2 0 0 0
TOTAL 208 506 -298 20 23 -3
How it works:
nl -ba -nrz file1 numbers the lines in file1, the output is then sorted on 2nd field; file2 is also sorted on 1st field and the results are then joined on 2nd field from 1st input and 1st field from 2nd input, replacing missing input fields with 0:
000003 exam 2 3 -1 0 7 -7
000004 final 2 1 1 12 6 6
000005 mail 4 2 2 0 0 0
000001 TABLES REF-IO HEAD-IO DIFF-IO REF-SELECT HEAD-SELECT DIFF-SELECT
000002 test 200 500 -300 5 7 -2
000006 TOTAL 208 506 -298 20 23 -3
this is then sorted on 1st field with sort -k1,1n to restore line order and then cut -d -f2- removes the line numbers. You could further prettify it with column e.g. .... | column -t:
TABLES REF-IO HEAD-IO DIFF-IO REF-SELECT HEAD-SELECT DIFF-SELECT
test 200 500 -300 5 7 -2
exam 2 3 -1 0 7 -7
final 2 1 1 12 6 6
mail 4 2 2 0 0 0
TOTAL 208 506 -298 20 23 -3
Method 3
This script should work:
touch resultFile
while read f; do
header1=$(echo $f | awk '{print $1;}');
values1=$(echo $f | awk -F "$header1 " '{print $NF;}');
while read g; do
header2=$(echo $g | awk '{print $1;}');
values2=$(echo $g | awk -F "$header2 " '{print $NF;}');
if [ $header1 = $header2 ]; then
echo "$header1 $values1 $values2" >> resultFile
fi
done < file2
done < file1
Headers must be unique, otherwise only the last line found will be merged.
PS: I have considered that “mail” header should not be present in the result as it is not present in both files, as “game” (in your example “mail” is present but not “game”).
Method 4
In case you were wondering, it is possible to do this with join, although the order of the lines is not preserved:
(
join <(head -n1 file1) <(head -n1 file2)
join -a 1 -e 0 -o '0 1.2 1.3 1.4 2.2 2.3 2.4'
<(sed -n 2,5p file1 | sort)
<(sed -n 2,5p file2 | sort)
join <(tail -n1 file1) <(tail -n1 file2)
)
Output:
TABLES REF-IO HEAD-IO DIFF-IO REF-SELECT HEAD-SELECT DIFF-SELECT
exam 2 3 -1 0 7 -7
final 2 1 1 12 6 6
mail 4 2 2 0 0 0
test 200 500 -300 5 7 -2
TOTAL 208 506 -298 20 23 -3
All methods was sourced from stackoverflow.com or stackexchange.com, is licensed under cc by-sa 2.5, cc by-sa 3.0 and cc by-sa 4.0