Is there a command line spell to drop a column in a CSV file?
Having a file of the following contents:
1111,2222,3333,4444
aaaa,bbbb,cccc,dddd
I want to get a file equal to the original but lacking the n-th column, e.g. for n = 2 (or 3, depending on whether you count from 0 or from 1):
1111,2222,4444
aaaa,bbbb,dddd
or, for n = 0 (or 1, in 1-based counting):
2222,3333,4444
bbbb,cccc,dddd
A real file can be gigabytes long and have tens of thousands of columns.
As always in such cases, I suspect command line magicians can offer an elegant solution… 🙂
In my actual case I need to drop the first two columns, which can be done by dropping the first column twice in a row, but I suppose it would be more interesting to generalise a bit.
Answers:
Method 1
I believe this is specific to cut from the GNU coreutils:
$ cut --complement -f 3 -d, inputfile
1111,2222,4444
aaaa,bbbb,dddd
Normally you specify the fields you want via -f, but by adding --complement you reverse the meaning, naturally. From 'man cut':
--complement complement the set of selected bytes, characters or fields
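Since the question actually needs the first two columns dropped, the same flag takes a comma-separated field list. A minimal sketch, assuming GNU cut and the sample file above saved as inputfile:

$ cut --complement -f 1,2 -d, inputfile
3333,4444
cccc,dddd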
One caveat: if any of the columns contain a comma, it will throw cut off, because cut isn’t a CSV parser in the same way that a spreadsheet is. Many
parsers have different ideas about how to handle escaping commas in CSV. For the simple CSV case, on the command line, cut is still the way to go.
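To see the caveat in action, a quoted field containing a comma is split at that inner comma. This is only an illustration; the exact output assumes GNU cut:

$ printf '%s\n' '"Smith, John",1111,2222' | cut --complement -f 2 -d,
"Smith,1111,2222

The quoted name gets torn apart, which is why properly quoted CSV needs a real parser.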
Method 2
If the data is simply made of comma-separated columns:
cut -d , -f 1-2,4-
You can also use awk, but it’s a bit awkward because while clearing a field is easy, removing the separator takes some work. If you have no empty field, it’s not too bad:
awk -F , 'BEGIN {OFS=FS} {$3=""; sub(",,", ","); print}'
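A slightly more general variant takes the column number as a parameter. This is only a sketch: it assumes GNU awk (where decrementing NF rebuilds the record) and that every line has at least n fields, but unlike the sub(",,", ",") trick it is not confused by empty fields:

awk -F , -v n=3 'BEGIN {OFS=FS} {for (i = n; i < NF; i++) $i = $(i + 1); NF--; print}' inputfile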
If you have actual CSV, where commas can appear inside fields if properly quoted, you need a real CSV library.
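As a minimal sketch of that route, Python's standard csv module can be driven from the shell; the file names here are placeholders, and this particular one-liner drops the third column (index 2):

python3 -c '
import csv, sys
w = csv.writer(sys.stdout, lineterminator="\n")
for row in csv.reader(sys.stdin):
    w.writerow(row[:2] + row[3:])
' < input.csv > output.csv

Because csv.reader understands quoting, a field like "last, first" passes through intact.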
Method 3
Try the command below to drop a column by its index.
dropColumnCSV --index=0 --file=file.csv
This works as long as the columns are separated by plain commas, since sed commands inside the function remove the matching part of each line.
dropColumnCSV() {
    # argument check
    while [ $# -gt 0 ]; do
        case "$1" in
            --index=*) index="${1#*=}" ;;
            --file=*)  file="${1#*=}" ;;
            *) printf '* Error: Invalid argument. *\n'; return ;;
        esac
        shift
    done

    # file check
    if [ ! -f "$file" ]; then
        printf '* Error: %s not found. *\n' "$file"
        return
    fi

    # index zero: keep only what follows the first comma
    if [[ $index -eq 0 ]]; then
        sed -i 's/\([^,]*\),\(.*\)/\2/' "$file"

    # index greater than zero: match the first $index fields, then drop the next one
    elif [[ $index -gt 0 ]]; then
        pos_str=$(for i in $(seq "$index"); do echo -n '[^,]*,'; done | sed 's/,$//')
        sed -i 's/^\('"$pos_str"'\),[^,]*/\1/' "$file"
    fi
}
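For example, with the sample file from the question saved as file.csv, dropping the second column in place would look roughly like this (the function edits the file with sed -i):

$ dropColumnCSV --index=1 --file=file.csv
$ cat file.csv
1111,3333,4444
aaaa,cccc,dddd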
Method 4
To remove the 3rd column and save a new file, you can do:
cut -d , -f 1-2,4- input_file.csv > output_file.csv
Note that
1-2,4-
means "keep columns 1 to 2, and 4 to the end".
All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0, and CC BY-SA 4.0.