Merge 2 files based on all values of the first column of the first file

I need to merge below 2 files:

file1:

TABLES REF-IO HEAD-IO DIFF-IO
test 200 500 -300
exam 2 3 -1
final 2 1 1
mail 4 2 2
TOTAL 208 506 -298

file2:

TABLES REF-SELECT HEAD-SELECT DIFF-SELECT
test 5 7 -2
game 3 3 0
exam 0 7 -7
final 12 6 6
TOTAL 20 23 -3

merged file should be as shown below:

TABLES REF-IO HEAD-IO DIFF-IO REF-SELECT HEAD-SELECT DIFF-SELECT
test 200 500 -300 5 7 -2
exam 2 3 -1 0 7 -7
final 2 1 1 12 6 6
mail 4 2 2 0 0 0
TOTAL 208 506 -298 20 23 -3

Contents hide

Answers:

Method 1

Method 2

Method 3

Method 4

Answers:

Thank you for visiting the Q&A section on Magenaut. Please note that all the answers may not help you solve the issue immediately. So please treat them as advisements. If you found the post helpful (or not), leave a comment & I’ll get back to you as soon as possible.

Method 1

awk '
    NR==FNR {vals[$1] = $2 " " $3 " " $4; next} 
    !($1 in vals) {vals[$1] = "0 0 0"} 
    {$(NF+1) = vals[$1]; print}
' file2 file1

TABLES REF-IO HEAD-IO DIFF-IO REF-SELECT HEAD-SELECT DIFF-SELECT
test 200 500 -300 5 7 -2
exam 2 3 -1 0 7 -7
final 2 1 1 12 6 6
mail 4 2 2 0 0 0
TOTAL 208 506 -298 20 23 -3

Method 2

Here’s how to do it with join and preserve line order, headers, footers etc…

join -1 2 -2 1 -a 1 -e 0 -o 1.1,1.2,1.3,1.4,1.5,2.2,2.3,2.4 
<(sort -k2,2 <(nl -ba -nrz file1)) <(sort -k1,1 file2) | 
sort -k1,1n | cut -d  -f2-

result:

TABLES REF-IO HEAD-IO DIFF-IO REF-SELECT HEAD-SELECT DIFF-SELECT
test 200 500 -300 5 7 -2
exam 2 3 -1 0 7 -7
final 2 1 1 12 6 6
mail 4 2 2 0 0 0
TOTAL 208 506 -298 20 23 -3

How it works:

nl -ba -nrz file1 numbers the lines in file1, the output is then sorted on 2nd field; file2 is also sorted on 1st field and the results are then joined on 2nd field from 1st input and 1st field from 2nd input, replacing missing input fields with 0:

000003 exam 2 3 -1 0 7 -7
000004 final 2 1 1 12 6 6
000005 mail 4 2 2 0 0 0
000001 TABLES REF-IO HEAD-IO DIFF-IO REF-SELECT HEAD-SELECT DIFF-SELECT
000002 test 200 500 -300 5 7 -2
000006 TOTAL 208 506 -298 20 23 -3

this is then sorted on 1st field with sort -k1,1n to restore line order and then cut -d -f2- removes the line numbers. You could further prettify it with column e.g. .... | column -t:

TABLES  REF-IO  HEAD-IO  DIFF-IO  REF-SELECT  HEAD-SELECT  DIFF-SELECT
test    200     500      -300     5           7            -2
exam    2       3        -1       0           7            -7
final   2       1        1        12          6            6
mail    4       2        2        0           0            0
TOTAL   208     506      -298     20          23           -3

Method 3

This script should work:

touch resultFile
while read f; do
    header1=$(echo $f | awk '{print $1;}');
    values1=$(echo $f | awk -F "$header1 " '{print $NF;}');
    while read g; do
        header2=$(echo $g | awk '{print $1;}');
        values2=$(echo $g | awk -F "$header2 " '{print $NF;}');
        if [ $header1 = $header2 ]; then
            echo "$header1 $values1 $values2" >> resultFile
        fi
    done < file2
done < file1

Headers must be unique, otherwise only the last line found will be merged.

PS: I have considered that “mail” header should not be present in the result as it is not present in both files, as “game” (in your example “mail” is present but not “game”).

Method 4

In case you were wondering, it is possible to do this with join, although the order of the lines is not preserved:

(
  join <(head -n1 file1) <(head -n1 file2)
  join -a 1 -e 0 -o '0 1.2 1.3 1.4 2.2 2.3 2.4' 
                    <(sed -n 2,5p file1 | sort) 
                    <(sed -n 2,5p file2 | sort)
  join <(tail -n1 file1) <(tail -n1 file2)
)

Output:

TABLES REF-IO HEAD-IO DIFF-IO REF-SELECT HEAD-SELECT DIFF-SELECT
exam 2 3 -1 0 7 -7
final 2 1 1 12 6 6
mail 4 2 2 0 0 0
test 200 500 -300 5 7 -2
TOTAL 208 506 -298 20 23 -3

All methods was sourced from stackoverflow.com or stackexchange.com, is licensed under cc by-sa 2.5, cc by-sa 3.0 and cc by-sa 4.0

0 0 votes

Article Rating