Find records in file 1 that are also present in both file 2 and file 3

I have three files, file1.txt, file2.txt and file3.txt, all in the same format.

I want to write to out.txt the records from file1.txt that are also present in both file2.txt and file3.txt, matching on columns 2 and 3.

Also, I need to create another file, out2.txt, with additional columns (column 4 from file2.txt, column 5 from file3.txt).

Sample Input:

file1.txt

1. abc 1 a f11 f13 f14 
2. abd 2 b f12 f14 f13  
3. abe 4 d f13 f16 f12 
4. acf 6 s f14 f15 f19

file2.txt

1. abc 1 a f21 f23 f24
2. abd 1 b f21 f24 f23
3. abe 4 d f24 f26 f22
4. acf 6 s f23 f25 f29

file3.txt

1. abc 1 a f31 f33 f34
2. abd 2 b f31 f34 f33
3. acf 5 s f33 f35 f39
4. abe 4 d f34 f36 f32

Desired output

out.txt

1. abc 1 a f11 f13 f14
3. abe 4 d f13 f16 f12

out2.txt

1. abc 1 a f11 f13 f14 f21 f31
3. abe 4 d f13 f16 f12 f24 f34

Answers:


Method 1

You might want to look at diff3, a program that compares three files. Sample output:

$ diff3 parent.txt your.txt mine.txt
====
1:1,2c
  Hello,
  This is parent file.
2:1,2c
  Hello,
  This is your file.
3:1,2c
  Hello,
  This is my file.

For your files you can use:

diff3 file1.txt file2.txt file3.txt > output.txt

Method 2

To select the lines shared by all files (out.txt) you can use grep:

grep -ho ' [0-9] [a-z] ' file3 | grep -Fof - file2 | grep -Ff - file1

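The first grep extracts the key fields from file3 (cut -d' ' -f3,4 file3 would work as well), the second keeps the keys that also occur in file2, and the last prints the lines of file1 that contain one of those keys.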
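The cut variant mentioned above can be sketched as a self-contained script. It is a sketch, assuming fields are separated by single spaces and the record number (1., 2., ...) is field 1, so the key is fields 3 and 4; it recreates the sample data in a temporary directory. The -w flag is an addition that is not in the original one-liner, meant to stop a key such as "1 a" from matching inside a longer token like "11 a".

```shell
#!/usr/bin/env bash
set -euo pipefail

# Throwaway directory holding the sample data from the question.
dir=$(mktemp -d)
cd "$dir"
printf '%s\n' '1. abc 1 a f11 f13 f14' '2. abd 2 b f12 f14 f13' \
              '3. abe 4 d f13 f16 f12' '4. acf 6 s f14 f15 f19' > file1
printf '%s\n' '1. abc 1 a f21 f23 f24' '2. abd 1 b f21 f24 f23' \
              '3. abe 4 d f24 f26 f22' '4. acf 6 s f23 f25 f29' > file2
printf '%s\n' '1. abc 1 a f31 f33 f34' '2. abd 2 b f31 f34 f33' \
              '3. acf 5 s f33 f35 f39' '4. abe 4 d f34 f36 f32' > file3

# Keys (fields 3 and 4) of file3 -> lines of file2 containing such a key
# -> keys of those file2 lines -> lines of file1 containing such a key.
cut -d' ' -f3,4 file3 \
  | grep -Fwf - file2 \
  | cut -d' ' -f3,4 \
  | grep -Fwf - file1 > out.txt

cat out.txt
```

On the sample data this keeps the abc and abe records of file1. Like the original one-liner, it matches the key anywhere in the line, so it relies on the key not appearing in other fields.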

As usual, to join two files use the join command (surprise!) (out2.txt):

# merge fields 3 and 4 into one key ("1 a" -> "1+a"), sort on it, join
join -j 3 -o '1.1,1.2,1.3,1.4,1.5,1.6,2.1,2.2' \
     <(sed 's/ /+/3' file1 | sort -k3b,3) \
     <(join -j 3 -o '1.4 2.4 1.3' \
            <(sed 's/ /+/3' file2 | sort -k3b,3) \
            <(sed 's/ /+/3' file3 | sort -k3b,3)) \
  | sed 's/+/ /'

join matches on a single field, so to use the 3rd and 4th fields together we concatenate them (here with a + sign). Because join only works on input sorted by the join field, each file is sorted on the 3rd and 4th fields.
First file2 and file3 are joined, then the result is joined with file1, and finally sed removes the + sign.
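To see what each stage produces, the same pipeline can be split into steps with intermediate files. This is a sketch assuming GNU coreutils join and sort and the single-space layout of the sample data; it sorts after merging the key so that sort orders exactly the field join compares. The file names keyed_file1, keyed_file2, keyed_file3 and f23.txt are hypothetical.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Throwaway directory holding the sample data from the question
# (field 1 is the record number, fields 3 and 4 form the key).
dir=$(mktemp -d)
cd "$dir"
printf '%s\n' '1. abc 1 a f11 f13 f14' '2. abd 2 b f12 f14 f13' \
              '3. abe 4 d f13 f16 f12' '4. acf 6 s f14 f15 f19' > file1
printf '%s\n' '1. abc 1 a f21 f23 f24' '2. abd 1 b f21 f24 f23' \
              '3. abe 4 d f24 f26 f22' '4. acf 6 s f23 f25 f29' > file2
printf '%s\n' '1. abc 1 a f31 f33 f34' '2. abd 2 b f31 f34 f33' \
              '3. acf 5 s f33 f35 f39' '4. abe 4 d f34 f36 f32' > file3

# Merge fields 3 and 4 into one field ("1 a" -> "1+a"), then sort on that
# merged field, because join matches on a single, sorted field.
for f in file1 file2 file3; do
  sed 's/ /+/3' "$f" | sort -k3b,3 > "keyed_$f"
done

# Step 1: keys present in both file2 and file3.
# Output: field 4 of file2, field 4 of file3, then the merged key.
join -j 3 -o '1.4 2.4 1.3' keyed_file2 keyed_file3 > f23.txt
cat f23.txt
# f21 f31 1+a
# f24 f34 4+d

# Step 2: join file1 with the step-1 result on the key, print the six
# file1 fields plus the two extra values, and turn "+" back into a space.
join -j 3 -o '1.1,1.2,1.3,1.4,1.5,1.6,2.1,2.2' keyed_file1 f23.txt \
  | sed 's/+/ /' > out2.txt
cat out2.txt
```

The step-1 output already comes out in key order, so it can be fed to the second join without sorting again. Note that -o has to list 1.6 as well, otherwise the last field of file1 is dropped from out2.txt.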

Method 3

A possible solution with awk (I will edit it if needed, because the exact requirements are a little unclear from your question):

awk 'FILENAME == ARGV[1] {          # file3.txt
    m[$2,$3] = 0; z[$2,$3] = $5;    # key seen in file3, value for out2.txt
    next;
}
FILENAME == ARGV[2] {               # file2.txt
    if (($2,$3) in m) {
        m[$2,$3] = 1;               # key seen in file2 as well
        z[$2,$3] = $5 " " z[$2,$3]; # prepend the file2 value
    }
    next;
}
{                                   # file1.txt
    if (($2,$3) in m && m[$2,$3] == 1) {
        print $0 >"out.txt";
        print $0 " " z[$2,$3] >"out2.txt";
    }
}' file3.txt file2.txt file1.txt

We first read file3.txt and create two arrays keyed on columns 2 and 3: the first is filled with zeros, the second holds the value needed for out2.txt. Then we read file2.txt and check whether the key from columns 2 and 3 exists in the first array; if it does, we change its value from zero to one and prepend the file2.txt value to the entry in the second array. Finally we read file1.txt, check that the key exists and is marked with one, and print the matching records to out.txt and out2.txt, so:
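The question asks for column 4 of file2.txt and column 5 of file3.txt, while the sample output takes the same column (the value right after the key) from both files, so it may help to make those columns parameters. This is a sketch assuming the same whitespace-separated layout; c2 and c3 are hypothetical variable names, and the script recreates the sample files in a temporary directory. Instead of the 0/1 flag array it keeps one value array per file, so a key ends up in v2 only if it was already seen in file3.txt.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Throwaway directory holding the sample data from the question.
dir=$(mktemp -d)
cd "$dir"
printf '%s\n' '1. abc 1 a f11 f13 f14' '2. abd 2 b f12 f14 f13' \
              '3. abe 4 d f13 f16 f12' '4. acf 6 s f14 f15 f19' > file1.txt
printf '%s\n' '1. abc 1 a f21 f23 f24' '2. abd 1 b f21 f24 f23' \
              '3. abe 4 d f24 f26 f22' '4. acf 6 s f23 f25 f29' > file2.txt
printf '%s\n' '1. abc 1 a f31 f33 f34' '2. abd 2 b f31 f34 f33' \
              '3. acf 5 s f33 f35 f39' '4. abe 4 d f34 f36 f32' > file3.txt

# c2 / c3 (hypothetical names): column taken from file2.txt / file3.txt
# for out2.txt. c2=5 c3=5 gives the f2x / f3x values of the desired output.
awk -v c2=5 -v c3=5 '
FILENAME == ARGV[1] {      # file3.txt: remember every key and its value
    v3[$2,$3] = $c3
    next
}
FILENAME == ARGV[2] {      # file2.txt: keep only keys also seen in file3.txt
    if (($2,$3) in v3) v2[$2,$3] = $c2
    next
}
($2,$3) in v2 {            # file1.txt: key present in both other files
    print > "out.txt"
    print $0, v2[$2,$3], v3[$2,$3] > "out2.txt"
}' file3.txt file2.txt file1.txt

cat out.txt out2.txt
```

If you really need column 4 of file2.txt, run it with -v c2=4 instead; the -v assignments do not appear in ARGV, so ARGV[1] and ARGV[2] still refer to file3.txt and file2.txt.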

out.txt should contain:

1. abc 1 a f11 f13 f14
3. abe 4 d f13 f16 f12

out2.txt should contain:

1. abc 1 a f11 f13 f14 f21 f31
3. abe 4 d f13 f16 f12 f24 f34


All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under cc by-sa 2.5, cc by-sa 3.0 and cc by-sa 4.0.
