Split string based on regex

What is the best way to split a string like "HELLO there HOW are YOU" by upper case words (in Python)?

So I’d end up with a list like this: results = ['HELLO there', 'HOW are', 'YOU']


EDIT:

I have tried:

p = re.compile("\b[A-Z]{2,}\b")
print(p.split(page_text))

It doesn’t seem to work, though.

Answers:

Method 1

I suggest

l = re.compile(r"(?<!^)\s+(?=[A-Z])(?!.\s)").split(s)

Check this demo.
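As a rough sketch of how this pattern behaves, the snippet below runs it on the sample sentence from the question. The variable names and the second test sentence are illustrative assumptions, and the comments give one reading of what each part of the pattern does.

```python
import re

# Sample input taken from the question.
s = "HELLO there HOW are YOU"

# (?<!^)    do not split on whitespace at the very start of the string
# \s+       the whitespace run to split on
# (?=[A-Z]) the next character must be an upper-case letter
# (?!.\s)   ...but not a one-character word (a character followed by whitespace)
pattern = re.compile(r"(?<!^)\s+(?=[A-Z])(?!.\s)")

parts = pattern.split(s)
print(parts)  # ['HELLO there', 'HOW are', 'YOU']

# Hypothetical extra input: the single-letter word "I" does not trigger a split.
single = pattern.split("HELLO there I am")
print(single)  # ['HELLO there I am']
```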

Method 2

You could use a lookahead:

re.split(r'[ ](?=[A-Z]+\b)', input)

This splits at every space that is followed by a run of upper-case letters ending at a word boundary.

Note that the square brackets are only for readability and could as well be omitted.

If it is enough that the first letter of a word is upper case (so that you would also split in front of Hello), it gets even easier:

re.split(r'[ ](?=[A-Z])', input)

Now this splits at every space followed by any upper-case letter.
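To make the difference between the two lookaheads concrete, here is a small sketch. The variable names and the mixed-case sentence are assumptions added for illustration. The name text is used instead of input to avoid shadowing the built-in.

```python
import re

text = "HELLO there HOW are YOU"   # sample from the question
mixed = "HELLO there How are YOU"  # hypothetical variant with a capitalised word

# Split only before words made up entirely of upper-case letters.
all_caps = re.split(r'[ ](?=[A-Z]+\b)', text)
print(all_caps)  # ['HELLO there', 'HOW are', 'YOU']

# "How" is not all upper-case, so the strict pattern does not split before it...
strict = re.split(r'[ ](?=[A-Z]+\b)', mixed)
print(strict)  # ['HELLO there How are', 'YOU']

# ...while the relaxed pattern splits before any capitalised word.
relaxed = re.split(r'[ ](?=[A-Z])', mixed)
print(relaxed)  # ['HELLO there', 'How are', 'YOU']
```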

Method 3

Your question contains the string literal "\b[A-Z]{2,}\b",
but that \b will mean backspace, because there is no r-modifier.

Try: r"\b[A-Z]{2,}\b".
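A short sketch of the raw-string point, using the sample sentence from the question (variable names are illustrative). It also shows that, even with the raw string, splitting on this pattern consumes the upper-case words themselves, so on its own it does not give the list asked for in the question.

```python
import re

text = "HELLO there HOW are YOU"

# Without the r prefix, "\b" is the single backspace character (0x08).
is_backspace = "\b" == "\x08"
print(is_backspace)  # True

# In a raw string it stays as two characters: a backslash and the letter b.
raw_len = len(r"\b")
print(raw_len)  # 2

# With a raw string the pattern matches whole upper-case words...
words = re.findall(r"\b[A-Z]{2,}\b", text)
print(words)  # ['HELLO', 'HOW', 'YOU']

# ...but splitting on it removes those words instead of keeping them.
pieces = re.split(r"\b[A-Z]{2,}\b", text)
print(pieces)  # ['', ' there ', ' are ', '']
```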


All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0, and CC BY-SA 4.0.
