Is there a command line spell to drop a column in a CSV-file?

Having a file of the following contents:

1111,2222,3333,4444
aaaa,bbbb,cccc,dddd

I want to get a file equal to the original but lacking the n-th column; for example, for n = 2 (or 3, if you count from 1):
1111,2222,4444
aaaa,bbbb,dddd

or, for n = 0 (or 1, if you count from 1):
2222,3333,4444
bbbb,cccc,dddd

A real file can be gigabytes long, with tens of thousands of columns.

As always in such cases, I suspect command line magicians can offer an elegant solution… 🙂

In my actual case I need to drop the first 2 columns, which can be done by dropping the first column twice in sequence, but I suppose it would be more interesting to generalise a bit.

Answers:


Method 1

I believe this is specific to cut from the GNU coreutils:

$ cut --complement -f 3 -d, inputfile
1111,2222,4444
aaaa,bbbb,dddd

Normally you specify the fields you want to keep via -f, but adding --complement reverses the meaning. From `man cut`:
--complement
    complement the set of selected bytes, characters or fields

One caveat: if any of the columns contain a comma, it will throw cut off, because cut isn’t a CSV parser in the same way that a spreadsheet is. Many
parsers have different ideas about how to handle escaping commas in CSV. For the simple CSV case, on the command line, cut is still the way to go.
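As a sketch of the asker's real case, the same flag accepts a range, so the first two columns can be dropped in one pass (the sample data below is illustrative):

```shell
# Drop the first two columns with GNU cut's --complement
printf '1111,2222,3333,4444\naaaa,bbbb,cccc,dddd\n' > input.csv
cut --complement -d, -f 1-2 input.csv
# 3333,4444
# cccc,dddd
```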

Method 2

If the data is simply made of comma-separated columns:

cut -d , -f 1-2,4-

You can also use awk, but it's a bit awkward because, while clearing a field is easy, removing the separator takes some work. If you have no empty fields, it's not too bad:
awk -F , 'BEGIN {OFS=FS}  {$3=""; sub(",,", ","); print}'
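
Generalising to the asker's real case, dropping the first N columns, can also be sketched in awk by printing only the fields after N (again assuming no quoted commas; N and the sample data are illustrative):

```shell
# Drop the first n fields by printing fields n+1..NF with commas in between
printf '1111,2222,3333,4444\naaaa,bbbb,cccc,dddd\n' |
awk -F, -v n=2 '{ for (i = n + 1; i <= NF; i++)
                    printf "%s%s", $i, (i < NF ? "," : "\n") }'
# 3333,4444
# cccc,dddd
```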

If you have actual CSV, where commas can appear inside fields if properly quoted, you need a real CSV library.
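
One minimal, hedged sketch of that, streaming the file through Python's stdlib csv module from the shell (assumes python3 is installed; the filenames are illustrative). Unlike cut, this handles quoted fields that contain commas:

```shell
# Drop the 3rd column (0-based index 2) from real, quoted CSV
printf '1,"a,b",3\n' > sample.csv
python3 -c '
import csv, sys
w = csv.writer(sys.stdout, lineterminator="\n")
for row in csv.reader(open("sample.csv", newline="")):
    del row[2]          # 0-based index of the column to drop
    w.writerow(row)
' > out.csv
cat out.csv
# 1,"a,b"
```

Because it reads and writes row by row, this also stays memory-friendly on gigabyte-sized files.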

Method 3

Try the command below to drop a column by its index.

dropColumnCSV --index=0 --file=file.csv

This works only if the columns are plainly comma-separated, since the function uses sed to strip the field textually.
dropColumnCSV() {
  # argument parsing
  while [ $# -gt 0 ]; do
    case "$1" in
      --index=*)
        index="${1#*=}"
        ;;
      --file=*)
        file="${1#*=}"
        ;;
      *)
        printf '* Error: Invalid argument. *\n'
        return 1
        ;;
    esac
    shift
  done

  # file check
  if [ ! -f "$file" ]; then
    printf '* Error: %s not found. *\n' "$file"
    return 1
  fi

  # index 0: strip everything up to and including the first comma
  if [ "$index" -eq 0 ]; then
    sed -i 's/^[^,]*,//' "$file"

  # index > 0: match the first $index fields, then drop the next one
  elif [ "$index" -gt 0 ]; then
    pos_str=$(for i in $(seq "$index"); do printf '[^,]*,'; done | sed 's/,$//')
    sed -i "s/^\($pos_str\),[^,]*/\1/" "$file"
  fi
}

Method 4

To remove the 3rd column and save a new file, you can do:

cut -d , -f 1-2,4- input_file.csv > output_file.csv

Note that 1-2,4- means “keep columns 1 to 2, and 4 to the end”.
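
For the question's actual case, dropping the first two columns, no field list is needed beyond "keep 3 to the end"; a small sketch with illustrative filenames:

```shell
# Drop the first two columns of plain comma-separated data
printf '1111,2222,3333,4444\naaaa,bbbb,cccc,dddd\n' > in.csv
cut -d, -f 3- in.csv > out.csv
cat out.csv
# 3333,4444
# cccc,dddd
```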


All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0 or CC BY-SA 4.0.
