I have the input below, with a huge number of rows:
11|ABCD|19900101123123445455|555|AAA|50505050|0000009030
11|ABCD|19900101123123445455|555|AAA|50505050|0000000199
13|ABCD|201803010YYY66666666|600|ETC|20180300|0000084099
11|ABCD|19900101123123445455|555|AAA|50505050|0008995001
And I need to get the output below:
11|ABCD|19900101123123445455|555|AAA|50505050|9004230
13|ABCD|201803010YYY66666666|600|ETC|20180300|84099
I have been trying the awk command below, but my knowledge of arrays is too limited.
cat test|awk -F"|" '{ a[$1]++;b[$2]++;c[$3]++;d[$4]++;e[$5]++;f[$6]+=$6 }; END { for (i in a); print i, f[i]}'
I need to sum the last column across rows whose other columns are identical, then print those leading columns, separated by pipes, followed by the sum as the final column.
Answers:
Method 1
With the GNU datamash command:
$ datamash -t'|' -s -g 1,2,3,4,5,6 sum 7 < infile
11|ABCD|19900101123123445455|555|AAA|50505050|9004230
13|ABCD|201803010YYY66666666|600|ETC|20180300|84099
In datamash v1.2+, you can also specify a column range.
$ datamash -t'|' -s -g 1-6 sum 7 < infile
Or the shortest awk alternative, which works for any number of columns without listing them one by one:
awk -F'|' '{x=$NF;NF--; a[$0]+=x} END{for(i in a) print i, a[i]}' OFS='|' infile
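The one-liner above relies on NF-- rebuilding $0, which GNU awk does but which some other awk implementations reportedly do not. A minimal sketch of a variant that avoids touching NF, assuming the same pipe-delimited layout: it removes the last field from $0 with sub() instead. The scratch file and the sort at the end are only there to make the example self-contained and its output order predictable.

```shell
# Sample rows from the question, written to a scratch file (illustrative only).
infile=$(mktemp)
printf '%s\n' \
  '11|ABCD|19900101123123445455|555|AAA|50505050|0000009030' \
  '11|ABCD|19900101123123445455|555|AAA|50505050|0000000199' \
  '13|ABCD|201803010YYY66666666|600|ETC|20180300|0000084099' \
  '11|ABCD|19900101123123445455|555|AAA|50505050|0008995001' > "$infile"

# Save the last field, strip "|<last field>" from $0, and sum per remaining key.
result=$(awk -F'|' '{ x = $NF; sub(/\|[^|]*$/, ""); a[$0] += x }
                    END { for (i in a) print i "|" a[i] }' "$infile" | sort)

printf '%s\n' "$result"
rm -f "$infile"
```

The key is still "everything except the last column", so this should group rows the same way as the NF-- version.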
Method 2
Awk solution:
awk 'BEGIN{ FS=OFS="|" }
{ a[$1 FS $2 FS $3 FS $4 FS $5 FS $6] += $7 }
END{ for (i in a) print i, a[i] }' file
The output:
11|ABCD|19900101123123445455|555|AAA|50505050|9004230
13|ABCD|201803010YYY66666666|600|ETC|20180300|84099
Method 3
Your idea is right, but for this requirement you should build the hash key from every field except the last, then use that key to accumulate the values of the last column. Once all lines have been processed, the END block prints each key with its total.
awk '
BEGIN {FS=OFS="|"} {
hashKey = ""
for(i=1;i<=(NF-1); i++) {
hashKey = ( i == 1 ? $i : (hashKey FS $i) )   # test i, not hashKey: a first field of 0 or empty would be false
}
total[hashKey]+=$NF
}
END { for ( j in total ) print j, total[j] }
' file
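One thing to keep in mind with the for (j in total) loop is that awk does not define the order in which array keys are visited, so the groups may not come out in input order. A hedged sketch, assuming you want each group printed in the order it first appeared: it records new keys in a second array (the names order and n are made up for this example) and walks that array in END.

```shell
# Same grouping idea, but remember the first-seen order of each key.
ordered=$(printf '%s\n' \
  '11|ABCD|19900101123123445455|555|AAA|50505050|0000009030' \
  '11|ABCD|19900101123123445455|555|AAA|50505050|0000000199' \
  '13|ABCD|201803010YYY66666666|600|ETC|20180300|0000084099' \
  '11|ABCD|19900101123123445455|555|AAA|50505050|0008995001' |
awk 'BEGIN { FS = OFS = "|" }
{
    key = $0
    sub(/\|[^|]*$/, "", key)                # key = the line minus its last field
    if (!(key in total)) order[++n] = key   # first time this key is seen
    total[key] += $NF
}
END { for (k = 1; k <= n; k++) print order[k], total[order[k]] }')

printf '%s\n' "$ordered"
```

If the order does not matter, piping the original command through sort is a simpler option.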
Method 4
And a Perl solution:
perl -lne '
$sum{$1} += $2 if /(.*)\|(.*)/
} END {
print "$_|$sum{$_}" for keys %sum
' file
All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0, and CC BY-SA 4.0.