Remove only the commas present within the double quotes

In a text file, I want to remove the commas (,) and also the double quotes ("), but only when the text inside the double quotes is a list of numbers separated by commas.

56,72,"12,34,54",x,y,"foo,a,b,bar"

Expected output:

56,72,123454,x,y,"foo,a,b,bar"

Note: the line above is just an example. My text file contains many lines like it, and the count of comma-separated numbers inside the double quotes varies. For example:

56,72,"12,34,54",x,y,"foo,a,b,bar"
56,92,"12,34",x,y,"foo,a,b,bar"
56,72,"12,34,54,78,76,54,67",x,y,"foo,a,b,bar"
56,72,x,y,"foo,a,b,bar","12,34,54"
56,72,x,y,"foo,a,b,bar","12,34,54","45,57,84,92","bar,foo"

Expected output:

56,72,123454,x,y,"foo,a,b,bar"
56,92,1234,x,y,"foo,a,b,bar"
56,72,12345478765467,x,y,"foo,a,b,bar"
56,72,x,y,"foo,a,b,bar",123454
56,72,x,y,"foo,a,b,bar",123454,45578492,"bar,foo"

There can be any number (n) of comma-separated numbers inside the double quotes. Double-quoted fields that contain other characters should be left as they are.

I love the sed text-processing tool, so I'd be happy to see a sed solution for this.

Answers:


Method 1

If perl is OK, here is a short (and probably fast, if not necessarily simple 🙂 ) way of doing it:

perl -pe 's:"(\d[\d,]+)":$1=~y/,//dr:eg' file

The e flag to the s::: operator (which is just another way of writing s///) causes the replacement to be treated as an expression which is evaluated every time. That expression takes the $1 capture from the regex (which is already missing the quotes) and translates it with y/// (which can also be written as tr///), deleting (/d) all the commas. The r flag to y is necessary in order to get the value to be the translated string, instead of the count of translations; it requires Perl 5.14 or later.
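To make the /e and y///dr mechanics concrete outside Perl, here is a rough Python analogue (a sketch, not part of the original answer): re.sub with a callable runs that callable once per match, much as /e evaluates the replacement each time, and str.translate with a deletion table plays the role of y/,//d. The name unquote_numbers is hypothetical.

```python
import re

# Same pattern as the Perl one-liner: a quote, a digit, more digits/commas, a quote.
QUOTED_NUMBERS = re.compile(r'"(\d[\d,]+)"')

# Deletion table: the Python counterpart of tr/,//d (delete every comma).
DELETE_COMMAS = str.maketrans("", "", ",")


def unquote_numbers(line: str) -> str:
    # The callable is evaluated once per match, like Perl's /e flag.
    # m.group(1) is the text between the quotes, so the quotes are dropped;
    # translate() returns a new string, which is what y///r gives you in Perl.
    return QUOTED_NUMBERS.sub(lambda m: m.group(1).translate(DELETE_COMMAS), line)


if __name__ == "__main__":
    print(unquote_numbers('56,72,x,y,"foo,a,b,bar","12,34,54","45,57,84,92","bar,foo"'))
```

Because the pattern requires a digit right after the opening quote, fields such as "foo,a,b,bar" and "bar,foo" are never matched and keep their quotes.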

For those who somehow feel sullied by perl, here is the python equivalent. Python is really not a shell one-liner tool, but sometimes it can be cajoled into co-operating. The following can be written as one line (unlike for loops, which cannot be), but the horizontal scrolling makes it (even more) unreadable:

python3 -c '
import re;
import sys;
r=re.compile(r"\"(\d+(,\d+)*)\"");
sys.stdout.writelines(r.sub(lambda m:m.group(1).replace(",",""),l)
    for l in sys.stdin)
' < file

Method 2

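This (adapted from here) should do what you need, though @rici's Perl one is much simpler. One caveat: when a quoted number is the last field on the line, this version leaves a trailing comma (see the fourth line of output):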

$ sed -r ':a;s/(("[0-9,]*",?)*"[0-9,]*),/\1/;ta; s/""/","/g;
          s/"([0-9]*)",?/\1,/g ' file
56,72,123454,x,y,"foo,a,b,bar"
56,92,1234,x,y,"foo,a,b,bar"
56,72,12345478765467,x,y,"foo,a,b,bar"
56,72,x,y,"foo,a,b,bar",123454,
56,72,x,y,"foo,a,b,bar",123454,45578492,"bar,foo"

Explanation

  • :a : define a label called a.
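  • s/(("[0-9,]*",?)*"[0-9,]*),/\1/ : match a run of quoted numbers followed by a comma, and replace it with \1, which is the same text without that final comma. This one needs to be broken down:
    • First of all, using this construct : (foo(bar)), \1 will be foobar and \2 will be bar.
    • "[0-9,]*",? : match a double quote, 0 or more of 0-9 or ,, a closing double quote, then 0 or 1 ,.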
    • ("[0-9,]*",?)* : match 0 or more of the above.
    • "[0-9,]* : match 0 or more of 0-9 or , that come right after a "
  • ta; : go back to the label a and run again if the substitution was successful.
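  • s/""/","/g; : post-processing. The loop can also delete the comma that separates two adjacent quoted fields (e.g. "foo,a,b,bar","12,34,54" becomes "foo,a,b,bar""12,34,54"), so replace "" with "," to restore it.
  • s/"([0-9]*)",?/\1,/g : remove all quotes around numbers.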
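The breakdown above can be checked against a single pass of the expression. Below is a small sketch using Python's re module as a stand-in for GNU sed (an assumption for illustration: Python uses a backtracking, leftmost-first engine rather than POSIX leftmost-longest matching, although on this input the result agrees with the sed trace in the next example).

```python
import re

# The ERE from the sed command; Python accepts the same syntax for it.
PATTERN = re.compile(r'(("[0-9,]*",?)*"[0-9,]*),')

m = PATTERN.search('"1,2,3,4"')
print(m.group(0))  # the whole match, which ends with a comma
print(m.group(1))  # \1: the same text without that final comma

# Replacing the match with \1 therefore deletes exactly one comma per pass.
print(PATTERN.sub(r"\1", '"1,2,3,4"', count=1))
```

Here the greedy [0-9,]* runs as far as it can, so the comma that gets consumed (and dropped) is the last one in the quoted run.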

This might be easier to understand with another example:

$ echo '"1,2,3,4"' | sed -nr ':a;s/(("[0-9,]*",?)*"[0-9,]*),/\1/;p;ta;'
"1,2,34"
"1,234"
"1234"
"1234"

So, as long as a number right after a quote is followed by a comma and another number, each pass removes one of those commas (in this example, the last one, because the match is greedy), joining the two numbers together; the process repeats until no substitution is possible.
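The :a ... ta loop can be sketched in Python as "substitute once, and go round again only if something changed". This is an illustrative emulation, not a drop-in replacement for the sed script; the function name join_quoted_numbers is hypothetical, and Python's backtracking regex engine is assumed to behave like GNU sed on this input.

```python
import re
from typing import List

# Same ERE as the sed command.
PATTERN = re.compile(r'(("[0-9,]*",?)*"[0-9,]*),')


def join_quoted_numbers(text: str) -> List[str]:
    r"""Emulate sed's ':a; s/.../\1/; ta' loop.

    Returns the text after each successful substitution.
    """
    steps = []
    while True:
        # count=1 mirrors a single s/// without the g flag.
        text, n = PATTERN.subn(r"\1", text, count=1)
        if n == 0:
            # 'ta' only jumps back to ':a' if the substitution succeeded.
            return steps
        steps.append(text)


for step in join_quoted_numbers('"1,2,3,4"'):
    print(step)
```

This prints three lines rather than four: in the sed version, p runs before ta, so the final, unsuccessful pass is printed as well, while this sketch only records successful passes.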

At this point I believe it is useful to mention a quote from info sed that appears in the section describing advanced functions such as the label used above (thanks for finding it, @Braiam):

In most cases, use of these commands indicates that you are probably
better off programming in something like 'awk' or Perl.

Method 3

For CSV data, I’d use a language with a real CSV parser. For example with Ruby:

ruby -rcsv -pe '
  row = CSV::parse_line($_).map {|e| e.delete!(",") if e =~ /^[\d,]+$/; e}
  $_  = CSV::generate_line(row)
' <<END
56,72,"12,34,54",x,y,"foo,a,b,bar"
56,92,"12,34",x,y,"foo,a,b,bar"
56,72,"12,34,54,78,76,54,67",x,y,"foo,a,b,bar"
56,72,x,y,"foo,a,b,bar","12,34,54"
56,72,x,y,"foo,a,b,bar","12,34,54","45,57,84,92","bar,foo"
END
56,72,123454,x,y,"foo,a,b,bar"
56,92,1234,x,y,"foo,a,b,bar"
56,72,12345478765467,x,y,"foo,a,b,bar"
56,72,x,y,"foo,a,b,bar",123454
56,72,x,y,"foo,a,b,bar",123454,45578492,"bar,foo"
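For comparison, here is a rough Python counterpart of the Ruby script, assuming the standard-library csv module is acceptable; the function name strip_number_commas is hypothetical. Like any parse-and-rewrite approach, the writer re-quotes fields by its own rules (only when needed), so a field that was quoted without needing it, say "x", would come out unquoted.

```python
import csv
import io
import re

# A field made only of digits and commas, e.g. 12,34,54 (after CSV unquoting).
NUMBER_LIST = re.compile(r"[\d,]+")


def strip_number_commas(text: str) -> str:
    out = io.StringIO()
    # lineterminator="\n" avoids the csv module's default "\r\n" line endings.
    writer = csv.writer(out, lineterminator="\n")
    for row in csv.reader(io.StringIO(text)):
        writer.writerow(
            [f.replace(",", "") if NUMBER_LIST.fullmatch(f) else f for f in row]
        )
    return out.getvalue()


sample = '''56,72,"12,34,54",x,y,"foo,a,b,bar"
56,72,x,y,"foo,a,b,bar","12,34,54","45,57,84,92","bar,foo"
'''
print(strip_number_commas(sample), end="")
```

The default QUOTE_MINIMAL quoting is what puts the quotes back around "foo,a,b,bar" and "bar,foo" while leaving the joined numbers bare.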

Method 4


Here is some Python code that replaces the commas enclosed in double quotes, in this case with the pipe (|) character.

For example, given: x,y,z,1,2,"r,e,t,y",h,8,5,6

replacing with a pipe gives: x,y,z,1,2,"r|e|t|y",h,8,5,6

replacing with an empty string gives: x,y,z,1,2,"rety",h,8,5,6

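# Replace every comma inside double quotes with a pipe (|).
# Set REPLACEMENT to '' to delete the commas altogether.
REPLACEMENT = '|'

with open('FileToRead') as f, open('FileToWrite', 'w') as writingFile:
    while True:
        c = f.read(1)
        if not c:  # end of file
            break

        if c == '"':
            writingFile.write(c)  # opening quote
            c = f.read(1)
            # Copy the quoted text, stopping at the closing quote
            # (or at end of file, if the quote is never closed).
            while c and c != '"':
                writingFile.write(REPLACEMENT if c == ',' else c)
                c = f.read(1)

        writingFile.write(c)  # closing quote or ordinary character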
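The approach above changes every quoted field the same way, whereas the question only wants numeric ones changed. A sketch of the same scan-for-quotes idea, deciding per quoted field, might look like this (assumptions: the function name unquote_number_fields is hypothetical, it works on a string rather than a file, and it does not handle doubled "" quotes inside fields):

```python
import re

# Quoted content made only of digits and commas, e.g. 12,34,54.
NUMBER_LIST = re.compile(r"[\d,]+")


def unquote_number_fields(text: str) -> str:
    out = []
    i = 0
    while i < len(text):
        c = text[i]
        if c != '"':
            out.append(c)  # ordinary character outside quotes
            i += 1
            continue
        end = text.find('"', i + 1)  # position of the closing quote
        if end == -1:  # unbalanced quote: copy the rest unchanged
            out.append(text[i:])
            break
        inner = text[i + 1:end]
        if NUMBER_LIST.fullmatch(inner):
            out.append(inner.replace(",", ""))  # numbers: drop commas and quotes
        else:
            out.append(text[i:end + 1])  # anything else: keep it as it is
        i = end + 1
    return "".join(out)
```

To process a file, read it with open('FileToRead').read(), pass the text through unquote_number_fields, and write the result out.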


All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under cc by-sa 2.5, cc by-sa 3.0 and cc by-sa 4.0.
