How to split strings into text and number?

I’d like to split strings like these

'foofo21'
'bar432'
'foobar12345'

into

['foofo', '21']
['bar', '432']
['foobar', '12345']

Does somebody know an easy and simple way to do this in python?

Contents hide

Answers:

Method 1

Method 2

Method 3

Method 4

Method 5

Method 6

Method 7

Method 8

Method 9

Method 10

Method 11

Answers:

Thank you for visiting the Q&A section on Magenaut. Please note that all the answers may not help you solve the issue immediately. So please treat them as advisements. If you found the post helpful (or not), leave a comment & I’ll get back to you as soon as possible.

Method 1

I would approach this by using re.match in the following way:

import re
match = re.match(r"([a-z]+)([0-9]+)", 'foofo21', re.I)
if match:
    items = match.groups()
print(items)
>> ("foofo", "21")

Method 2

>>> def mysplit(s):
...     head = s.rstrip('0123456789')
...     tail = s[len(head):]
...     return head, tail
... 
>>> [mysplit(s) for s in ['foofo21', 'bar432', 'foobar12345']]
[('foofo', '21'), ('bar', '432'), ('foobar', '12345')]
>>>

Method 3

Yet Another Option:

>>> [re.split(r'(d+)', s) for s in ('foofo21', 'bar432', 'foobar12345')]
[['foofo', '21', ''], ['bar', '432', ''], ['foobar', '12345', '']]

Method 4

>>> r = re.compile("([a-zA-Z]+)([0-9]+)")
>>> m = r.match("foobar12345")
>>> m.group(1)
'foobar'
>>> m.group(2)
'12345'

So, if you have a list of strings with that format:

import re
r = re.compile("([a-zA-Z]+)([0-9]+)")
strings = ['foofo21', 'bar432', 'foobar12345']
print [r.match(string).groups() for string in strings]

Output:

[('foofo', '21'), ('bar', '432'), ('foobar', '12345')]

Method 5

I’m always the one to bring up findall() =)

>>> strings = ['foofo21', 'bar432', 'foobar12345']
>>> [re.findall(r'(w+?)(d+)', s)[0] for s in strings]
[('foofo', '21'), ('bar', '432'), ('foobar', '12345')]

Note that I’m using a simpler (less to type) regex than most of the previous answers.

Method 6

here is a simple function to seperate multiple words and numbers from a string of any length, the re method only seperates first two words and numbers. I think this will help everyone else in the future,

def seperate_string_number(string):
    previous_character = string[0]
    groups = []
    newword = string[0]
    for x, i in enumerate(string[1:]):
        if i.isalpha() and previous_character.isalpha():
            newword += i
        elif i.isnumeric() and previous_character.isnumeric():
            newword += i
        else:
            groups.append(newword)
            newword = i

        previous_character = i

        if x == len(string) - 2:
            groups.append(newword)
            newword = ''
    return groups

print(seperate_string_number('10in20ft10400bg'))
# outputs : ['10', 'in', '20', 'ft', '10400', 'bg']

Method 7

import re

s = raw_input()
m = re.match(r"([a-zA-Z]+)([0-9]+)",s)
print m.group(0)
print m.group(1)
print m.group(2)

Method 8

without using regex, using isdigit() built-in function, only works if starting part is text and latter part is number

def text_num_split(item):
    for index, letter in enumerate(item, 0):
        if letter.isdigit():
            return [item[:index],item[index:]]

print(text_num_split("foobar12345"))

OUTPUT :

['foobar', '12345']

Method 9

This is a little longer, but more versatile for cases where there are multiple, randomly placed, numbers in the string. Also, it requires no imports.

def getNumbers( input ):
    # Collect Info
    compile = ""
    complete = []

    for letter in input:
        # If compiled string
        if compile:
            # If compiled and letter are same type, append letter
            if compile.isdigit() == letter.isdigit():
                compile += letter
            
            # If compiled and letter are different types, append compiled string, and begin with letter
            else:
                complete.append( compile )
                compile = letter
            
        # If no compiled string, begin with letter
        else:
            compile = letter
        
    # Append leftover compiled string
    if compile:
        complete.append( compile )
    
    # Return numbers only
    numbers = [ word for word in complete if word.isdigit() ]
        
    return numbers

Method 10

Here is simple solution for that problem, no need for regex:

user = input('Input: ') # user = 'foobar12345'
int_list, str_list = [], []

for item in user:
 try:
    item = int(item)  # searching for integers in your string
  except:
    str_list.append(item)
    string = ''.join(str_list)
  else:  # if there are integers i will add it to int_list but as str, because join function only can work with str
    int_list.append(str(item))
    integer = int(''.join(int_list))  # if you want it to be string just do z = ''.join(int_list)

final = [string, integer]  # you can also add it to dictionary d = {string: integer}
print(final)

Method 11

In Addition to the answer of @Evan
If the incoming string is in this pattern 21foofo then the re.match pattern would be like this.

import re
match = re.match(r"([0-9]+)([a-z]+)", '21foofo', re.I)
if match:
    items = match.groups()
print(items)
>> ("21", "foofo")

Otherwise, you’ll get UnboundLocalError: local variable 'items' referenced before assignment error.

All methods was sourced from stackoverflow.com or stackexchange.com, is licensed under cc by-sa 2.5, cc by-sa 3.0 and cc by-sa 4.0

0 0 votes

Article Rating