sed – remove the very last occurrence of a string (a comma) in a file?

I have a very large CSV file. How can I remove the very last comma with sed (or a similar tool)?

...
[11911,0,"BUILDER","2014-10-15","BUILDER",0,0],
[11912,0,"BUILDER","2014-10-15","BUILDER",0,0],
[11913,0,"BUILDER","2014-10-15","BUILDER",0,0],
]

Desired output

...
[11911,0,"BUILDER","2014-10-15","BUILDER",0,0],
[11912,0,"BUILDER","2014-10-15","BUILDER",0,0],
[11913,0,"BUILDER","2014-10-15","BUILDER",0,0]
]

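The following sed command removes the trailing comma from every line, but I want to remove only the last comma in the whole file.

sed -e 's/,$//' foo.csv

This does not work either, because the last line of the file is ] and has no comma on it:

sed '$s/,//' foo.csv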

Answers:

Method 1

Using `awk`

If the comma is always at the end of the second-to-last line:

$ awk 'NR>2{print a;} {a=b; b=$0} END{sub(/,$/, "", a); print a;print b;}'  input
[11911,0,"BUILDER","2014-10-15","BUILDER",0,0],
[11912,0,"BUILDER","2014-10-15","BUILDER",0,0],
[11913,0,"BUILDER","2014-10-15","BUILDER",0,0]
]

Using `awk` and `bash`

$ awk -v "line=$(($(wc -l <input)-1))" 'NR==line{sub(/,$/, "")} 1'  input
[11911,0,"BUILDER","2014-10-15","BUILDER",0,0],
[11912,0,"BUILDER","2014-10-15","BUILDER",0,0],
[11913,0,"BUILDER","2014-10-15","BUILDER",0,0]
]

Using `sed`

$ sed 'x;${s/,$//;p;x;};1d'  input
[11911,0,"BUILDER","2014-10-15","BUILDER",0,0],
[11912,0,"BUILDER","2014-10-15","BUILDER",0,0],
[11913,0,"BUILDER","2014-10-15","BUILDER",0,0]
]

For OSX and other BSD platforms, try:

sed -e x -e '$ {s/,$//;p;x;}' -e 1d  input
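The sed one-liner above is compact, so here is a sketch of the same script spread over several lines with comments. It assumes GNU sed, which accepts comment lines inside braces, and an input of at least two lines. The function name drop_comma_before_last_line and the sample data are hypothetical.

```shell
#!/bin/sh
# Hypothetical wrapper around the Method 1 one-liner:
#   sed 'x;${s/,$//;p;x;};1d' FILE
# The hold space always carries the previous line, so when the last line
# arrives the second-to-last line can still be edited before it is printed.
# Assumes GNU sed and at least two input lines.
drop_comma_before_last_line() {
    sed '
        # Swap: pattern space = previous line, hold space = current line.
        x
        $ {
            # Last input line: strip the trailing comma of the previous line,
            s/,$//
            # print that previous line,
            p
            # and swap back so the last line itself is auto-printed.
            x
        }
        # Cycle 1 swapped in the empty hold space; suppress that blank line.
        1d
    ' "$1"
}

tmp=$(mktemp)
printf '%s\n' '[11912,0,"BUILDER","2014-10-15","BUILDER",0,0],' \
              '[11913,0,"BUILDER","2014-10-15","BUILDER",0,0],' \
              ']' > "$tmp"
drop_comma_before_last_line "$tmp"
rm -f "$tmp"
```

Only one previous line is ever kept in the hold space, so memory use stays constant however large the CSV is.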

Using `bash`

while IFS=  read -r line
do
    [ "$a" ] && printf "%s\n" "$a"
    a=$b
    b=$line
done <input
printf "%s\n" "${a%,}"
printf "%s\n" "$b"

Method 2

You could simply try the Perl one-liner below:

perl -00pe 's/,(?!.*,)//s' file

Explanation:

, matches a comma.
(?!.*,) is a negative lookahead asserting that no comma follows the matched comma, so it matches only the last comma.
s, the most important part, is the DOTALL modifier, which makes . match newline characters as well.

Method 3

lcomma() { sed '
    $x;$G;/\(.*\),/!H;//!{$!d
};  $!x;$s//\1/;s/^\n//'
}

That should remove only the last occurrence of a , in any input, and it still prints input in which no , occurs at all. Basically, it buffers runs of lines that do not contain a comma.

When it encounters a comma, it swaps the current line with the hold buffer; in doing so it prints all lines seen since the previous comma and frees its hold buffer.
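The buffering idea described above can also be sketched in awk. This is not the answer's sed code, only an illustration of the same technique. It assumes that a comma anywhere in a line counts, not only a trailing one, and remove_last_comma is a hypothetical name. Lines are printed as soon as a later comma proves they cannot contain the last one, so only the lines after the most recent comma are held in memory.

```shell
#!/bin/sh
# Awk sketch of the buffer-since-last-comma idea (not the answer's sed code).
# remove_last_comma is a hypothetical helper; reads files or stdin.
remove_last_comma() {
    awk '
        index($0, ",") {
            # A new comma line: the pending comma line and the lines after it
            # cannot hold the last comma any more, so flush them.
            if (have) { print cl; printf "%s", rest }
            cl = $0; rest = ""; have = 1
            next
        }
        {
            # Comma-free line: buffer it if a comma line is pending,
            # otherwise it can be printed straight away.
            if (have) rest = rest $0 "\n"
            else print
        }
        END {
            if (have) {
                # Greedy .*, ends at the last comma, so RLENGTH points at it.
                match(cl, /.*,/)
                print substr(cl, 1, RLENGTH - 1) substr(cl, RLENGTH + 1)
                printf "%s", rest
            }
        }' "$@"
}

printf '%s\n' '[1,0],' '[2,0],' ']' | remove_last_comma
```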

I was just digging through my history file and found this:

lmatch(){ set "USAGE:
        lmatch /BRE [-(((s|-sub) BRE)|(r|-ref)) REPL [-(f|-flag) FLAG]*]*
"       "${1%"${1#?}"}" "$@"
        eval "${ZSH_VERSION:+emulate sh}"; eval '
        sed "   1x;     \$3$2!{1!H;$!d
                };      \$3$2{x;1!p;$!d;x
                };      \$3$2!x;\$3$2!b'"
        $(      unset h;i=3 p=:-:shfr e='33[' m=$(($#+1)) f=OPTERR
                [ -t 2 ] && f=$e2K$e'1;41;17m}r${h-'$f$em
                f='${$m?""${h-'$f':t${$i$en}$1""}\c' e=} _o=
                o(){    IFS= ;getopts  $p a "$1"       &&
                        [ -n "${a#[?:]}" ]              &&
                        o=${a#-}${OPTARG-${1#-?}}       ||
                        ! eval "o=$f;o=${o%%*{$m}*}"
        };      a(){    case ${a#[!-]}$o in (?|-*) a=;;esac; o=
                        set $* "${3-$2$}{$((i+=!${#a}))${a:+#-?}}"
                                ${3+$2 "{$((i+=1))$e"} $2
                        IFS=$;  _o=${_o%"${3+$_o} "*}$*
        };      while   eval "o "${$((i+=(OPTIND=1)))}""
                do      case            ${o#[!$a]}      in
                        (s*|ub)         a s 2 ''        ;;
                        (r*|ef)         a s 2           ;;
                        (f*|lag)        a               ;;
                        (h*|elp)        h= o; break     ;;
                esac;   done;   set -f; printf  "t%bnt" $o $_o
)"";}

It’s actually pretty good. Yes, it uses eval, but it never passes anything to it beyond a numeric reference to its arguments. It builds arbitrary sed scripts for handling a last match. I’ll show you:

printf "%d\" %d' %d\" %d'\n" $(seq 5 5 200) |
    tee /dev/fd/2 |
    lmatch d^.0 -r '&&&&' --sub \' sq --flag 4 -s\" \\dq -fg

The arguments work as follows:
- d^.0: all REs are delimited with d from now on.
- -r '&&&&': -r or --ref, like '...s//$ref/...'.
- --sub \' sq: -s or --sub, like '...s/$arg1/$arg2/...'.
- --flag 4: -f or --flag, appended to the last -r or -s.
- -s\" \\dq: short options can be written '-s $arg1 $arg2' or '-r$arg1'.
- -fg: tacked on, so the last substitution becomes '...s/"/dq/g...'.

The command above prints the following to stderr (through tee /dev/fd/2). This is a copy of lmatch's input:

5" 10' 15" 20'
25" 30' 35" 40'
45" 50' 55" 60'
65" 70' 75" 80'
85" 90' 95" 100'
105" 110' 115" 120'
125" 130' 135" 140'
145" 150' 155" 160'
165" 170' 175" 180'
185" 190' 195" 200'

The function’s eval'd subshell iterates through all of its arguments once. As it walks over them, it increments a counter according to the context of each switch and skips that many arguments on the next iteration. From then on, it does one of a few things per argument:

For each option, the option parser adds $a to $o. $a is assigned based on the value of $i, which is incremented by the argument count for each argument processed. $a is assigned one of the two following values:
- a=$((i+=1)) – this is assigned if either a short option does not have its argument appended to it or the option is a long one.
- a=$i#-? – this is assigned if the option is a short one and does have its argument appended to it.
- a=${$a}${1:+$d${$(($1))}} – regardless of the initial assignment, $a’s value is always wrapped in braces and, in an -s case, sometimes $i is incremented once more and an additional delimited field is appended.

The result is that eval is never passed a string containing any unknowns. Each command-line argument is referred to by its numeric argument number, even the delimiter, which is extracted from the first character of the first argument and is the only place where you should use that character unescaped. Basically, the function is a macro generator: it never interprets the arguments’ values in any special way, because sed can (and of course will) easily handle that when it parses the script. Instead, it just arranges its arguments sensibly into a workable script.

Here’s some debug output of the function at work:

... sed "   1x;\$2$1!{1!H;$!d
        };      \$2$1{x;1!p;$!d;x
        };      \$2$1!x;\$2$1!b
        s$1$1${4}$1
        s$1${6}$1${7}$1${9}
        s$1${10#-?}$1${11}$1${12#-?}
        "
++ sed '        1x;\d^.0d!{1!H;$!d
        };      \d^.0d{x;1!p;$!d;x
        };      \d^.0d!x;\d^.0d!b
        sdd&&&&d
        sd'\''dsqd4
        sd"d\dqdg
        '

And so lmatch can be used to easily apply regexes to data following the last match in a file. The result of the command I ran above is:

5" 10' 15" 20'
25" 30' 35" 40'
45" 50' 55" 60'
65" 70' 75" 80'
85" 90' 95" 100'
101010105dq 110' 115dq 120'
125dq 130' 135dq 140sq
145dq 150' 155dq 160'
165dq 170' 175dq 180'
185dq 190' 195dq 200'

…which, given the subset of the file input that follows the last time /^.0/ is matched, applies the following substitutions:

sdd&&&&d – replaces the match with itself 4 times.
sd'dsqd4 – replaces the fourth single quote following the beginning of the line since the last match with sq.
sd"d\dqdg – ditto, but for double quotes (replaced with dq) and globally.

And so, to demonstrate how one might use lmatch to remove the last comma in a file:

printf "%d, %d %d, %d\n" $(seq 5 5 100) |
lmatch '/\(.*\),' -r\\1

OUTPUT:

5, 10 15, 20
25, 30 35, 40
45, 50 55, 60
65, 70 75, 80
85, 90 95 100

Method 4

If the comma might not be on the second-to-last line

Using `awk` and `tac`:

tac foo.csv | awk '/,$/ && !handled { sub(/,$/, ""); handled++ } {print}' | tac

The awk command is a simple one that does the substitution the first time the pattern is seen. tac reverses the order of the lines in the file, so the awk command ends up removing the last comma.

I’ve been told that

tac foo.csv | awk '/,$/ && !handled { sub(/,$/, ""); handled++ } {print}' > tmp && tac tmp

may be more efficient.
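tac comes from GNU coreutils and may not be installed on BSD-derived systems. As a hedged alternative with the same effect (strip the trailing comma from the last line that ends with one), awk can read the file twice: the first pass records that line's number, and the second pass edits it. strip_last_trailing_comma is a hypothetical helper name. Because the file is read twice, it must be a regular file rather than a pipe.

```shell
#!/bin/sh
# Hypothetical tac-free variant of the approach above.
# Pass 1 (NR == FNR) records the number of the last line ending in a comma;
# pass 2 strips the comma from that line and prints every line.
strip_last_trailing_comma() {
    awk 'NR == FNR { if (/,$/) last = FNR; next }
         FNR == last { sub(/,$/, "") }
         { print }' "$1" "$1"
}

tmp=$(mktemp)
printf '%s\n' '[11912,0,"BUILDER","2014-10-15","BUILDER",0,0],' \
              '[11913,0,"BUILDER","2014-10-15","BUILDER",0,0],' \
              ']' > "$tmp"
strip_last_trailing_comma "$tmp"
rm -f "$tmp"
```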

Method 5

See https://stackoverflow.com/questions/12390134/remove-comma-from-last-line

This worked for me:

$ cat input.txt
{"name": "secondary_ua","type":"STRING"},
{"name": "request_ip","type":"STRING"},
{"name": "cb","type":"STRING"},
$ sed '$s/,$//' < input.txt >output.txt
$ cat output.txt
{"name": "secondary_ua","type":"STRING"},
{"name": "request_ip","type":"STRING"},
{"name": "cb","type":"STRING"}

Maybe the best way is to remove the last line, remove the comma, and then add the ] character back.
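The remove-the-last-line idea can be sketched with standard tools. Instead of hard-coding ], this hypothetical helper re-appends whatever the original last line was. It assumes the comma to remove is at the end of the second-to-last line, and it reads the file twice, so the input must be a regular file.

```shell
#!/bin/sh
# Hypothetical helper: print every line except the last, with a trailing
# comma stripped from the new final line, then re-append the original
# last line unchanged. Assumes the file has at least two lines.
strip_comma_before_last_line() {
    sed '$d' "$1" | sed '$s/,$//'   # drop last line, fix the line before it
    tail -n 1 "$1"                  # put the original last line (e.g. "]") back
}

tmp=$(mktemp)
printf '%s\n' '[11912,0,"BUILDER","2014-10-15","BUILDER",0,0],' \
              '[11913,0,"BUILDER","2014-10-15","BUILDER",0,0],' \
              ']' > "$tmp"
strip_comma_before_last_line "$tmp"
rm -f "$tmp"
```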

Method 6

If you can use tac:

tac file | perl -pe '$_=reverse;!$done && s/,// && $done++;$_=reverse'|tac

Method 7

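Try the following vi (Vim) command:

  vi '+:$-1s/\(,\)\(\_s*]\)/\2/e' '+:x' file

Explanation:

$-1 selects the second-to-last line.
s substitutes.
\(,\)\(\_s*]\) finds a comma followed by ], optionally separated by spaces or newlines (\_s is Vim's whitespace-or-newline class).
\2 replaces the match with the second group, i.e. the spaces or newlines followed by ].
The e flag suppresses the error if nothing matches, and +:x saves and quits. The arguments are single-quoted so that the shell does not expand $- inside them.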

Method 8

Try the sed command below (it only removes a comma at the end of the last line of the file):

sed -i '$s/,$//' foo.csv

All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
