I have strings like the following:
1338516 -...pair - 5pk 1409093 -...re Wax 3Pk
1409085 -...dtnr - 5pk 1415090 -...accessories
490663 - 3 pack 1490739 -...2 - 3 pack
What I'm trying to do is split each of these strings in two, so that for the first one the first part is 1338516 -...pair - 5pk and the second part is 1409093 -...re Wax 3Pk.
Currently, I’m able to extract the numbers using the following code:
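import re

# reqText is the list of raw strings; keep only those containing '...'
lst = list(filter(lambda k: '...' in k, reqText))
lst1 = ''.join(lst)
numbers = re.findall(r'\d+', lst1)
# keep only the long numbers (more than 3 digits)
numbers1 = [x for x in numbers if len(x) > 3]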
Any suggestions?
Answers:
Method 1
You could use split with a pattern:
[^\S\n]+(?=\d{5,7}\b)
Explanation
[^\S\n]+ – Match 1 or more whitespace characters other than a newline
(?=\d{5,7}\b) – Positive lookahead, assert 5-7 digits to the right followed by a word boundary
import re

pattern = r"[^\S\n]+(?=\d{5,7}\b)"
lst = [
    "1338516 -...pair - 5pk 1409093 -...re Wax 3Pk",
    "1409085 -...dtnr - 5pk 1415090 -...accessories",
    "490663 - 3 pack 1490739 -...2 - 3 pack"
]
for s in lst:
    # split on the whitespace that precedes each 5-7 digit number
    print(re.split(pattern, s))
Output
['1338516 -...pair - 5pk', '1409093 -...re Wax 3Pk']
['1409085 -...dtnr - 5pk', '1415090 -...accessories']
['490663 - 3 pack', '1490739 -...2 - 3 pack']
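If the input arrives as one multi-line block of text rather than a list of strings, the same split can be applied line by line. This is only a minimal sketch: the helper name split_items and the newline-separated sample are assumptions made for illustration, not part of the original answer.

```python
import re

# Whitespace other than a newline, followed by a 5-7 digit number
PATTERN = re.compile(r"[^\S\n]+(?=\d{5,7}\b)")

def split_items(text):
    """Hypothetical helper: split every non-empty line of `text` into items."""
    items = []
    for line in text.splitlines():
        line = line.strip()
        if line:
            items.extend(PATTERN.split(line))
    return items

# Assumed sample: two of the strings from the question, one per line
sample = (
    "1338516 -...pair - 5pk 1409093 -...re Wax 3Pk\n"
    "490663 - 3 pack 1490739 -...2 - 3 pack\n"
)
print(split_items(sample))
# ['1338516 -...pair - 5pk', '1409093 -...re Wax 3Pk', '490663 - 3 pack', '1490739 -...2 - 3 pack']
```

A line without a 5-7 digit number after whitespace is returned unchanged as a single item.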
Another option could be a matching approach:
\b\d{5,7}\b.*?(?=[^\S\n]+\d{5,7}\b|$)
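A short sketch of how the matching approach might be used with re.findall, reusing the sample strings from the split example. The variable name results is an assumption for illustration.

```python
import re

# Match from a 5-7 digit number up to (but not including) the whitespace
# that precedes the next 5-7 digit number, or up to the end of the string
pattern = r"\b\d{5,7}\b.*?(?=[^\S\n]+\d{5,7}\b|$)"

lst = [
    "1338516 -...pair - 5pk 1409093 -...re Wax 3Pk",
    "1409085 -...dtnr - 5pk 1415090 -...accessories",
    "490663 - 3 pack 1490739 -...2 - 3 pack",
]

results = [re.findall(pattern, s) for s in lst]
for r in results:
    print(r)
# ['1338516 -...pair - 5pk', '1409093 -...re Wax 3Pk']
# ['1409085 -...dtnr - 5pk', '1415090 -...accessories']
# ['490663 - 3 pack', '1490739 -...2 - 3 pack']
```

One difference from splitting: any text that comes before the first 5-7 digit number is dropped by findall, whereas split keeps it as the first item.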
Method 2
You can use
^(.+?)\s*\b(\d{5,7}\b.*)
See the regex demo.
In Python, use a raw string literal to declare this regex:
pattern = r'^(.+?)\s*\b(\d{5,7}\b.*)'
Details:
^ – start of string
(.+?) – Group 1: one or more (but as few as possible) occurrences of any char other than line break chars
\s* – zero or more whitespaces
\b – a word boundary
(\d{5,7}\b.*) – Group 2: a five-to-seven-digit number, a word boundary and the rest of the line.
See a Python demo:
import re

text = "1338516 -...pair - 5pk 1409093 -...re Wax 3Pk"
pattern = r'^(.+?)\s*\b(\d{5,7}\b.*)'
m = re.search(pattern, text)
if m:
    print(m.group(1)) # => 1338516 -...pair - 5pk
    print(m.group(2)) # => 1409093 -...re Wax 3Pk
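To apply this to several strings, the search could be wrapped in a small function with a fallback for strings that contain only one item. This is a sketch under assumptions: the helper name split_in_two and the (text, None) fallback are illustrative choices, not part of the original answer.

```python
import re

PATTERN = re.compile(r"^(.+?)\s*\b(\d{5,7}\b.*)")

def split_in_two(text):
    """Hypothetical helper: return (first, rest), or (text, None) if no second item is found."""
    m = PATTERN.search(text)
    if m:
        return m.group(1), m.group(2)
    return text, None

lst = [
    "1338516 -...pair - 5pk 1409093 -...re Wax 3Pk",
    "1409085 -...dtnr - 5pk 1415090 -...accessories",
    "490663 - 3 pack 1490739 -...2 - 3 pack",
]
pairs = [split_in_two(s) for s in lst]
for first, rest in pairs:
    print(first, "|", rest)
```

Note that Group 2 captures everything from the second number onward, so a string with more than two items would still come back as two parts; the split approach from Method 1 is probably the better fit in that case.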
If you need to use it in a Pandas dataframe, you can use
df[['result_col_1', 'result_col_2']] = df['source'].str.extract(pattern, expand=True)
All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.