I have the output of a command in tabular form. I’m parsing this output from a result file and storing it in a string. Each element in one row is separated by one or more whitespace characters, thus I’m using regular expressions to match 1 or more spaces and split it. However, a space is being inserted between every element:
>>> import re
>>> str1 = "a  b   c    d" # spaces are irregular
>>> str1
'a  b   c    d'
>>> str2 = re.split("( )+", str1)
>>> str2
['a', ' ', 'b', ' ', 'c', ' ', 'd'] # 1 space element between!!!
Is there a better way to do this?
After each split str2 is appended to a list.
Answers:
Method 1
By wrapping the space in parentheses, ( ), you create a capturing group, and re.split includes the captured text in its result. If you simply remove the parentheses, you will not have this problem.
>>> str1 = "a b c d"
>>> re.split(" +", str1)
['a', 'b', 'c', 'd']
However, there is no need for regex here: str.split with no delimiter specified splits on whitespace for you. This would be the best way in this case.
>>> str1.split()
['a', 'b', 'c', 'd']
If you really want a regex, you can use this ('\s' matches any whitespace character, which makes the intent clearer):
>>> re.split(r"\s+", str1)
['a', 'b', 'c', 'd']
Or you can find all runs of non-whitespace characters:
>>> re.findall(r'\S+', str1)
['a', 'b', 'c', 'd']
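The three approaches above are not quite interchangeable when a line starts or ends with whitespace. A minimal sketch of the difference, assuming a made-up sample string (`padded` is not from the original question):

```python
import re

# Hypothetical sample line with leading and trailing whitespace.
padded = "  a   b  c d  "

# re.split treats the leading/trailing runs as separators too,
# so it yields an empty string at each end.
by_regex_split = re.split(r"\s+", padded)

# str.split() with no argument ignores leading/trailing whitespace.
by_str_split = padded.split()

# re.findall collects only the runs of non-whitespace characters.
by_findall = re.findall(r"\S+", padded)

print(by_regex_split)
print(by_str_split)
print(by_findall)
```

If the command output may be indented or padded, str.split() or re.findall avoids having to filter out the empty strings afterwards.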
Method 2
The str.split method will automatically remove all white space between items:
>>> str1 = "a b c d"
>>> str1.split()
['a', 'b', 'c', 'd']
Docs are here: http://docs.python.org/library/stdtypes.html#str.split
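Since the question mentions appending each split row to a list, here is a rough sketch of how that might look with str.split. The sample `output` text, its column names, and its values are invented for illustration; the real result file will differ.

```python
# Hypothetical tabular output; column names and values are invented.
output = """name   size  owner
foo    12    alice
bar\t7     bob
"""

rows = []
for line in output.splitlines():
    fields = line.split()  # splits on any run of spaces or tabs
    if fields:             # skip blank lines
        rows.append(fields)

print(rows)
```

Each entry in `rows` is one table row as a list of strings, regardless of how many spaces or tabs separated the columns.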
Method 3
When you use re.split and the split pattern contains capturing groups, the groups are retained in the output. If you don’t want this, use a non-capturing group instead.
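As a small illustration of that point, the sketch below compares a capturing group with the non-capturing form `(?: )`. The variable names are chosen here for the example and do not come from the question.

```python
import re

str1 = "a  b   c d"

# Capturing group: the last captured space of each separator is kept.
with_capture = re.split("( )+", str1)

# Non-capturing group (?: ... ): separators are dropped from the result.
without_capture = re.split("(?: )+", str1)

print(with_capture)
print(without_capture)
```

For a single-character pattern like this one the group is not needed at all, but the non-capturing form is handy when the separator pattern genuinely requires grouping.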
Method 4
It's very simple, actually. Try this:
str1 = "a b c d"
splitStr1 = str1.split()
print(splitStr1)
All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.