How to split a file according to a column (including header) and rename the generated files?

I have a .txt file that looks like this:

NAME | CODE
name1 | 001
name2 | 001
name3 | 002
name4 | 003
name5 | 003
name6 | 003

I need to write a script to split this file according to the CODE column, so in this case I’d get this:

file 1:
NAME | CODE
name1 | 001
name2 | 001

file 2:
NAME | CODE
name3 | 002

file 3:
NAME | CODE
name4 | 003
name5 | 003
name6 | 003

According to some research, using awk would work:

$ awk -F, '{print > $2".txt"}' inputfile

The thing is, I also need to include the header as the first line of each file, and I need the file names to be different. Instead of 001.txt, for example, I need the file name to be something like FILE_$FILENAME_IDK.txt.

Answers:

Method 1

You could try like this:

awk 'NR==1{h=$0; next}
!seen[$3]++{f="FILE_"FILENAME"_"$3".txt";print h > f} 
{print >> f}' infile

The above saves the header in a variable h (NR==1{h=$0; next}). Then, if $3 has not been seen (!seen[$3]++, i.e. it's the first time it encounters the current value of $3), it sets the filename (f=...) and writes the header to that file (print h > f). Finally it appends the entire line to the file (print >> f). Because f only changes when a new value of $3 first appears, this assumes rows with the same code are adjacent, as in the sample. It uses the default FS (field separator), blank, so a row like "name1 | 001" has the code in $3. If you want to use | as FS (or even a regex with GNU awk) see cas' comment below.
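For what it's worth, here is a rough sketch of the same idea with " | " as the field separator, so the code lands in $2 instead of $3. It also recomputes the file name on every row, which should cope with input where equal codes are not next to each other. The scratch directory and the sample rows are made up for illustration and are not part of the original answer.

```shell
# Work in a scratch directory so the example cannot clobber real files.
cd "$(mktemp -d)" || exit 1

# Hypothetical sample input; rows are deliberately NOT grouped by code.
cat > infile <<'EOF'
NAME | CODE
name1 | 001
name3 | 002
name2 | 001
EOF

# FS is the regex " [|] ", so $1 is the name and $2 is the code.
awk -F' [|] ' '
NR == 1     { h = $0; next }                       # remember the header
            { f = "FILE_" FILENAME "_" $2 ".txt" } # output name for this row
!seen[$2]++ { print h > f }                        # first row of a code: header
            { print > f }                          # file stays open, so this appends
' infile
```

This should leave FILE_infile_001.txt and FILE_infile_002.txt in the scratch directory. With a very large number of distinct codes some awk implementations may run out of open files, in which case switching the row output to >> and calling close(f) after each print is one way around it.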

Method 2

I bet someone is going to come up with a one-liner, but I had to make a script:

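in='inputfile'
# Save the header line so it can be written to every output file.
header=$(head -n1 "$in")
# Collect the distinct codes: the digits that follow the last "| " on each line.
codes=($(sed -n 's/.*| \([0-9][0-9]*\)/\1/p' "$in" | sort -u))
for code in "${codes[@]}"; do
    out="file_$code.txt"
    echo "$header" > "$out"
    # Append every row whose CODE column equals this code.
    grep "|.* $code$" "$in" >> "$out"
done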
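If it is unclear what the sed step in the script collects, a small check like the one below may help. It is only a sketch: the scratch directory and the sample rows are assumptions for illustration, and it writes the pattern with [0-9][0-9]* so it does not rely on GNU-only syntax.

```shell
# Work in a scratch directory with a small, made-up input file.
cd "$(mktemp -d)" || exit 1
printf '%s\n' 'NAME | CODE' 'name1 | 001' 'name2 | 001' 'name3 | 002' > inputfile

# Keep only the digits after the last "| " on each line, then de-duplicate.
# The header has no digits after "| ", so it produces no output.
codes=$(sed -n 's/.*| \([0-9][0-9]*\)$/\1/p' inputfile | sort -u)
echo "$codes"
```

Each distinct code is printed once, one per line, which is the list the for loop iterates over.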

All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
