Is it possible to use diff on specific columns of a file?
file1
Something 123 item1
Something 456 item2
Something 768 item3
Something 353 item4
file2
Another 123 stuff1
Another 193 stuff2
Another 783 stuff3
Another 353 stuff4
Expected output
Something 456 item2
Something 768 item3
Another 193 stuff2
Another 783 stuff3
I want to compare the 2nd column of the two files and print every line whose 2nd-column value does not appear in the other file. The result should show the whole line, not just the compared column.
Answers:
Method 1
awk is a better tool for comparing columns of files. See, for example, the answer to "compare two columns of different files and print if it matches"; there are similar answers out there for printing lines with matching columns.
Since you want to print lines that don’t match, we can create an awk command that prints the lines in file2 for which column 2 has not been seen in file1:
$ awk 'NR==FNR{c[$2]++;next};c[$2] == 0' file1 file2
Another 193 stuff2
Another 783 stuff3
As explained similarly by terdon in the above-mentioned question,
NR==FNR: NR is the current input line number across all files and FNR is the line number within the current file. The two are equal only while the 1st file is being read.
c[$2]++; next: while the 1st file is being read, count the 2nd field in the array c. Then skip to the next line, so that the rest of the program is not applied to the 1st file.
c[$2] == 0: this part is reached only for the second file, so we check whether field 2 of this line was never seen in the 1st file (c[$2]==0) and, if so, we print the line. In awk, the default action is to print the line, so whenever c[$2]==0 is true the line is printed.
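To see why NR==FNR singles out the first file, it can help to print both counters for every line. The snippet below is only an illustration: it recreates the sample data from the question in a scratch directory (the variable names tmp and counters are made up for this sketch) and labels each line with the file it came from.

```shell
# Recreate the sample data from the question in a scratch directory.
tmp=$(mktemp -d)
printf '%s\n' 'Something 123 item1' 'Something 456 item2' \
              'Something 768 item3' 'Something 353 item4' > "$tmp/file1"
printf '%s\n' 'Another 123 stuff1' 'Another 193 stuff2' \
              'Another 783 stuff3' 'Another 353 stuff4' > "$tmp/file2"

# NR keeps counting across both files; FNR restarts at 1 for file2.
counters=$(cd "$tmp" && awk '{ print FILENAME, "NR=" NR, "FNR=" FNR }' file1 file2)
printf '%s\n' "$counters"
```

The first four lines show equal counters (file1 NR=1 FNR=1 through file1 NR=4 FNR=4); from the fifth line on they diverge (file2 NR=5 FNR=1), which is the point where the NR==FNR block stops firing.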
But you also want the lines from file1 for which column 2 doesn’t match in file2. This you can get by simply exchanging their position in the same command:
$ awk 'NR==FNR{c[$2]++;next};c[$2] == 0' file2 file1
Something 456 item2
Something 768 item3
So now you can generate the output you want, by using awk twice. Perhaps someone with more awk expertise can get it done in one pass.
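For what it's worth, here is one possible way to do it with a single awk invocation. This is a sketch rather than a polished solution: the function name coldiff_once and all array names are invented for illustration, and it holds both files in memory, so it assumes they are reasonably small.

```shell
# coldiff_once is a hypothetical helper name, not part of the original answer.
# Usage: coldiff_once file1 file2
coldiff_once() {
    awk '
        # First file: remember each line, its key (column 2) and the read order.
        NR == FNR { n1++; key1[n1] = $2; line1[n1] = $0; seen1[$2] = 1; next }
        # Second file: the same bookkeeping in a separate set of arrays.
        { n2++; key2[n2] = $2; line2[n2] = $0; seen2[$2] = 1 }
        END {
            # Lines of the first file whose key never appears in the second...
            for (i = 1; i <= n1; i++) if (!(key1[i] in seen2)) print line1[i]
            # ...then lines of the second file whose key is not in the first.
            for (i = 1; i <= n2; i++) if (!(key2[i] in seen1)) print line2[i]
        }
    ' "$1" "$2"
}
```

Called as coldiff_once file1 file2 on the sample data, this prints the two Something lines followed by the two Another lines. Indexing the stored lines by a counter rather than by key keeps them in their original file order.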
You tagged your question with ksh, so I’ll assume you are using the Korn shell. In ksh you can define a function for your diff, say diffcol2, to make your job easier:
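diffcol2()
{
    # Lines of the first file whose 2nd column is absent from the second file
    awk 'NR==FNR{c[$2]++;next};c[$2] == 0' "$2" "$1"
    # Lines of the second file whose 2nd column is absent from the first file
    awk 'NR==FNR{c[$2]++;next};c[$2] == 0' "$1" "$2"
}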
This has the behavior you desire:
$ diffcol2 file1 file2
Something 456 item2
Something 768 item3
Another 193 stuff2
Another 783 stuff3
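If you later need to compare a column other than the 2nd, the same idea can take the column number as an argument. The following is a sketch under that assumption; diffcol is a made-up name and not something from the original answer. It passes the number to awk with -v and tests membership with in, which, unlike c[$2] == 0, does not create empty array entries as a side effect.

```shell
# diffcol is a hypothetical generalisation of diffcol2.
# Usage: diffcol COLUMN file1 file2
diffcol() {
    # -v col=... hands the shell value to awk; inside awk, $col is
    # the field whose number is stored in the variable col.
    awk -v col="$1" 'NR == FNR { c[$col]++; next } !($col in c)' "$3" "$2"
    awk -v col="$1" 'NR == FNR { c[$col]++; next } !($col in c)' "$2" "$3"
}
```

With the sample files, diffcol 2 file1 file2 gives the same four lines as diffcol2 file1 file2, while diffcol 1 file1 file2 prints all eight lines, because no first-column value is shared between the two files.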
Method 2
I don’t think diff (even in combination with cut) will be flexible enough to handle this. And it seems as though what you really want is keys in file1 that are not in file2 and vice versa – not strictly a line-by-line diff. If the input files are big, I would go with perl, but for small files this awk script works for the input provided:
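% cat a.awk
BEGIN {
    # Read file1, indexing each line by its 2nd column
    while ((getline line < "file1") > 0) {
        split(line, f, " ")
        f1[f[2]] = line
    }
    # Read file2 the same way
    while ((getline line < "file2") > 0) {
        split(line, f, " ")
        f2[f[2]] = line
    }
}
END {
    # Lines whose key is in file1 but not in file2
    for (c in f1) {
        if (!(c in f2)) print f1[c]
    }
    # Lines whose key is in file2 but not in file1
    for (c in f2) {
        if (!(c in f1)) print f2[c]
    }
}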
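And this is how you run it. Note the trick with /dev/null: because the script has an END block, awk would otherwise wait for input on stdin. The order in which awk walks an array with for (c in ...) is unspecified, so the lines within each group may come out in a different order on your system: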
% awk -f a.awk /dev/null
Something 456 item2
Something 768 item3
Another 193 stuff2
Another 783 stuff3
All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.