I have a .txt file that looks like this:
NAME | CODE
name1 | 001
name2 | 001
name3 | 002
name4 | 003
name5 | 003
name6 | 003
I need to write a script to split this file according to the CODE column, so in this case I’d get this:
file 1:
NAME | CODE
name1 | 001
name2 | 001

file 2:
NAME | CODE
name3 | 002

file 3:
NAME | CODE
name4 | 003
name5 | 003
name6 | 003
According to some research, using awk would work:
$ awk -F, '{print > $2".txt"}' inputfile
The thing is, I also need the header as the first line of each output file, and I need the file names to be different. Instead of 001.txt, for example, I need the file name to be something like FILE_$FILENAME_IDK.txt.
Answers:
Method 1
You could try like this:
awk 'NR==1{h=$0; next}
!seen[$3]++{f="FILE_"FILENAME"_"$3".txt";print h > f}
{print >> f}' infile
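The above saves the header in a variable h (NR==1{h=$0; next}). Then, if $3 has not been seen yet (!seen[$3]++, i.e. it's the first time the current value of $3 is encountered), it sets the file name (f=...) and writes the header to that file (print h > f). Finally, it appends the entire line to that file (print >> f). Because f only changes when a new code first appears, this assumes that lines sharing a code are adjacent, as in the sample. It uses the default FS (field separator), which is whitespace, so the | counts as field 2 and the code is $3. If you want to use | as FS instead (or a regex), set it with -F and refer to the code as $2.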
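For what it's worth, here is a minimal sketch of that "| as FS" variant. It is not the answer's own code: the regex FS (' *[|] *'), the sample file name inputfile, and the choice to close each output file after every write are my assumptions. Recomputing f on every line means rows with the same code do not have to be adjacent, and closing each file keeps the number of open files small if there are many codes (at the cost of speed on large inputs).

```shell
# Recreate the sample from the question (the name "inputfile" is an assumption).
cat > inputfile <<'EOF'
NAME | CODE
name1 | 001
name2 | 001
name3 | 002
name4 | 003
name5 | 003
name6 | 003
EOF

# FS is "|" with optional surrounding spaces, so the code is $2.
awk -F' *[|] *' '
NR == 1 { h = $0; next }                  # remember the header line
{
    f = "FILE_" FILENAME "_" $2 ".txt"    # output file for this row
    if (!seen[$2]++) {                    # first row with this code:
        print h > f                       #   create/truncate and write header
        close(f)
    }
    print >> f                            # append the unmodified row
    close(f)                              # release the handle before the next row
}' inputfile
```

With the sample above this should leave FILE_inputfile_001.txt, FILE_inputfile_002.txt and FILE_inputfile_003.txt in the current directory, each starting with the header line.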
Method 2
I bet someone is going to come up with a one-liner, but I had to make a script:
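in='inputfile'
# Save the header so it can be written to every output file.
header=$(head -n1 "$in")
# Collect the distinct codes (the digits after "| " at the end of each line).
codes=($(sed -En 's/.*\| ([0-9]+)$/\1/p' "$in" | sort -u))
for code in "${codes[@]}"; do
    out="file_$code.txt"
    echo "$header" > "$out"
    # Append every line whose last field is this code.
    grep "| $code\$" "$in" >> "$out"
done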
All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.