In a text file, I want to remove the commas (,) and also the double quotes ("), but only when the double quotes contain numbers separated by commas.
56,72,"12,34,54",x,y,"foo,a,b,bar"
Expected output:
56,72,123454,x,y,"foo,a,b,bar"
Note: The line above is just an example. My text file contains many lines like it, and the count of comma-separated numbers inside the double quotes varies. That is,
56,72,"12,34,54",x,y,"foo,a,b,bar"
56,92,"12,34",x,y,"foo,a,b,bar"
56,72,"12,34,54,78,76,54,67",x,y,"foo,a,b,bar"
56,72,x,y,"foo,a,b,bar","12,34,54"
56,72,x,y,"foo,a,b,bar","12,34,54","45,57,84,92","bar,foo"
Expected output:
56,72,123454,x,y,"foo,a,b,bar"
56,92,1234,x,y,"foo,a,b,bar"
56,72,12345478765467,x,y,"foo,a,b,bar"
56,72,x,y,"foo,a,b,bar",123454
56,72,x,y,"foo,a,b,bar",123454,45578492,"bar,foo"
Any number of comma-separated numbers may appear within the double quotes. Double quotes that contain non-numeric characters should be left as they are.
I love the sed text-processing tool, so I'd be happy with any sed solution for this.
Answers:
Method 1
If perl is OK, here is a short (and probably fast, if not necessarily simple 🙂 ) way of doing it:
perl -pe 's:"(\d[\d,]+)":$1=~y/,//dr:eg' file
The e flag to the s::: operator (which is just another way of writing s///) causes the replacement to be treated as an expression which is evaluated every time. That expression takes the $1 capture from the regex (which is already missing the quotes) and translates (y///, which can also be written as tr///) it by deleting (/d) all the commas. The r flag to y is necessary in order to get the value to be the translated string, instead of the count of translations.
For those who somehow feel sullied by perl, here is the python equivalent. Python is really not a shell one-liner tool, but sometimes it can be cajoled into co-operating. The following can be written as one line (unlike for loops, which cannot be), but the horizontal scrolling makes it (even more) unreadable:
python -c '
import re;
import sys;
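r=re.compile(r"\"(\d+(,\d+)*)\"");
# "or True" keeps all() consuming lines whether write() returns None (Python 2) or a count (Python 3)
all(sys.stdout.write(r.sub(lambda m:m.group(1).replace(",",""),l)) or True
    for l in sys.stdin)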
' < file
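For readability, the same regex-with-callback idea can be spelled out as a small function. This is only a sketch: the names QUOTED_NUMBERS and unquote_numbers are made up for illustration, and it assumes a quoted field counts as numeric only when it consists of digit groups separated by single commas.

```python
import re

# A double-quoted field made up solely of digit groups separated by commas.
QUOTED_NUMBERS = re.compile(r'"(\d+(?:,\d+)*)"')


def unquote_numbers(line):
    """Drop the quotes and the commas from every all-numeric quoted field."""
    # The callback gets the match; group(1) is the field without its quotes.
    return QUOTED_NUMBERS.sub(lambda m: m.group(1).replace(",", ""), line)


sample = '56,72,"12,34,54",x,y,"foo,a,b,bar"'
print(unquote_numbers(sample))  # 56,72,123454,x,y,"foo,a,b,bar"
```

Quoted fields that contain anything other than digits and commas never match the pattern, so they pass through unchanged.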
Method 2
This (adapted from here) should do what you need though @rici’s Perl one is much simpler:
$ sed -r ':a;s/(("[0-9,]*",?)*"[0-9,]*),/\1/;ta; s/""/","/g;
s/"([0-9]*)",?/\1,/g ' file
56,72,123454,x,y,"foo,a,b,bar"
56,92,1234,x,y,"foo,a,b,bar"
56,72,12345478765467,x,y,"foo,a,b,bar"
56,72,x,y,"foo,a,b,bar",123454,
56,72,x,y,"foo,a,b,bar",123454,45578492,"bar,foo"
Explanation
- :a : define a label called a.
- s/(("[0-9,]*",?)*"[0-9,]*),/\1/ : This one needs to be broken down.
  - First of all, using this construct: (foo(bar)), \1 will be foobar and \2 will be bar.
  - "[0-9,]*",? : match a ", then 0 or more of 0-9 or , followed by a ", followed by 0 or 1 , characters.
  - ("[0-9,]*",?)* : match 0 or more of the above.
  - "[0-9,]* : match 0 or more of 0-9 or , that come right after a ".
- ta; : go back to the label a and run again if the substitution was successful.
- s/""/","/g; : post-processing. Replace "" with ",".
- s/"([0-9]*)",?/\1,/g : remove all quotes around numbers.
This might be easier to understand with another example:
$ echo '"1,2,3,4"' | sed -nr ':a;s/(("[0-9,]*",?)*"[0-9,]*),/\1/;p;ta;'
"1,2,34"
"1,234"
"1234"
"1234"
So, while you can find a number that is right after a quote and followed by a comma and another number, join the two numbers together and repeat the process until it is no longer possible.
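The "join and repeat until nothing changes" loop can also be sketched in Python. This is not a port of the sed script above: it uses a simpler, stricter pattern (the names JOIN_LAST, QUOTED_NUMBER and join_numbers are hypothetical) and only assumes that a numeric field is a quote, digit groups separated by commas, and a closing quote.

```python
import re

# Greedy first group, so the LAST comma of a numeric quoted field is removed
# on each pass, which mirrors the step-by-step sed trace shown above.
JOIN_LAST = re.compile(r'("\d+(?:,\d+)*),(\d+")')
QUOTED_NUMBER = re.compile(r'"(\d+)"')


def join_numbers(line, trace=None):
    """Remove one comma per pass until no numeric field has commas left."""
    while True:
        line, count = JOIN_LAST.subn(r"\1\2", line, count=1)
        if count == 0:  # like sed's "ta": stop once a pass changes nothing
            break
        if trace is not None:
            trace.append(line)
    # Post-processing: drop the quotes around the now comma-free numbers.
    return QUOTED_NUMBER.sub(r"\1", line)


steps = []
print(join_numbers('"1,2,3,4"', steps))  # 1234
print(steps)  # ['"1,2,34"', '"1,234"', '"1234"']
```

Because the pattern requires digits on both sides of the comma and a closing quote after the last number, fields such as "foo,a,b,bar" are never touched.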
At this point I believe it is useful to mention a quote from info sed that appears in the section describing advanced functions such as the label used above (thanks for finding it, @Braiam):
In most cases, use of these commands indicates that you are probably
better off programming in something like `awk’ or Perl.
Method 3
For CSV data, I’d use a language with a real CSV parser. For example with Ruby:
ruby -rcsv -pe '
row = CSV::parse_line($_).map {|e| e.delete!(",") if e =~ /^[\d,]+$/; e}
$_ = CSV::generate_line(row)
' <<END
56,72,"12,34,54",x,y,"foo,a,b,bar"
56,92,"12,34",x,y,"foo,a,b,bar"
56,72,"12,34,54,78,76,54,67",x,y,"foo,a,b,bar"
56,72,x,y,"foo,a,b,bar","12,34,54"
56,72,x,y,"foo,a,b,bar","12,34,54","45,57,84,92","bar,foo"
END
56,72,123454,x,y,"foo,a,b,bar"
56,92,1234,x,y,"foo,a,b,bar"
56,72,12345478765467,x,y,"foo,a,b,bar"
56,72,x,y,"foo,a,b,bar",123454
56,72,x,y,"foo,a,b,bar",123454,45578492,"bar,foo"
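The same "use a real CSV parser" approach might look like this with Python's standard csv module. It is a sketch under assumptions: strip_numeric_commas and NUMERIC_LIST are illustrative names, the input is taken as a string rather than a file, and a field is treated as numeric only if it is digit groups separated by single commas.

```python
import csv
import io
import re

# A whole field that is nothing but digit groups separated by commas.
NUMERIC_LIST = re.compile(r"\d+(?:,\d+)*")


def strip_numeric_commas(text):
    """Parse CSV text, remove commas inside numeric fields, re-emit CSV."""
    out = io.StringIO()
    # The default QUOTE_MINIMAL re-quotes only fields that still need it.
    writer = csv.writer(out, lineterminator="\n")
    for row in csv.reader(io.StringIO(text)):
        writer.writerow(
            [f.replace(",", "") if NUMERIC_LIST.fullmatch(f) else f for f in row]
        )
    return out.getvalue()


data = '56,72,x,y,"foo,a,b,bar","12,34,54"\n'
print(strip_numeric_commas(data), end="")  # 56,72,x,y,"foo,a,b,bar",123454
```

The parser decides where fields begin and end, so the regex only has to classify a field, not find its quotes.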
Method 4
Here is Python code that replaces the commas enclosed in double quotes. In this version the commas are replaced with a pipe (|) character.
e.g. input: x,y,z,1,2,"r,e,t,y",h,8,5,6
replacing with a pipe gives: x,y,z,1,2,"r|e|t|y",h,8,5,6
replacing with nothing gives: x,y,z,1,2,"rety",h,8,5,6
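# Copy FileToRead to FileToWrite, replacing commas inside double quotes with '|'.
with open('FileToRead') as f, open('FileToWrite', 'w') as writingFile:
    while True:
        c = f.read(1)
        if not c:
            print("End of file")
            break
        print("Read a character:", c)
        if c == '"':
            # Opening quote: copy it, then rewrite commas up to the closing quote.
            writingFile.write(c)
            c = f.read(1)
            # "c and" stops at end of file if the closing quote is missing.
            while c and c != '"':
                if c == ',':
                    c = '|'
                writingFile.write(c)
                c = f.read(1)
        # Write the ordinary character, or the closing quote.
        writingFile.write(c)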
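The description above mentions replacing with either a pipe or nothing, so here is a rough in-memory variant where the replacement is a parameter. The function name replace_quoted_commas is invented for this sketch, and it assumes quotes are never escaped or nested.

```python
def replace_quoted_commas(text, replacement="|"):
    """Replace every comma that sits between a pair of double quotes."""
    out = []
    in_quotes = False
    for ch in text:
        if ch == '"':
            in_quotes = not in_quotes  # toggle on every quote character
        out.append(replacement if in_quotes and ch == "," else ch)
    return "".join(out)


line = 'x,y,z,1,2,"r,e,t,y",h,8,5,6'
print(replace_quoted_commas(line))      # x,y,z,1,2,"r|e|t|y",h,8,5,6
print(replace_quoted_commas(line, ""))  # x,y,z,1,2,"rety",h,8,5,6
```

Unlike the answers above, this touches every quoted field, not only the numeric ones.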
All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.