Python pandas: remove everything after a delimiter in a string

I have data frames which contain e.g.:

"vendor a::ProductA"
"vendor b::ProductA"
"vendor a::Productb"

I need to remove everything after (and including) the two colons (::) so that I end up with:

"vendor a"
"vendor b"
"vendor a"

I tried str.trim (which seems to not exist) and str.split without success.
What would be the easiest way to accomplish this?


Answers:


Method 1

You can use pandas.Series.str.split just like you would use split normally. Just split on the string '::', and index the list that’s created from the split method:

>>> import pandas as pd
>>> df = pd.DataFrame({'text': ["vendor a::ProductA", "vendor b::ProductA", "vendor a::Productb"]})
>>> df
                 text
0  vendor a::ProductA
1  vendor b::ProductA
2  vendor a::Productb
>>> df['text_new'] = df['text'].str.split('::').str[0]
>>> df
                 text  text_new
0  vendor a::ProductA  vendor a
1  vendor b::ProductA  vendor b
2  vendor a::Productb  vendor a
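A minimal sketch of the same idea as a standalone script, assuming a recent pandas version. Passing n=1 makes split stop at the first '::', so each row yields at most two pieces. The third row ("no delimiter here") is a made-up value added only to show that a string without '::' comes back whole from .str[0].

```python
import pandas as pd

# Hypothetical data: the third row has no '::' at all
df = pd.DataFrame({"text": ["vendor a::ProductA", "vendor b::ProductA", "no delimiter here"]})

# n=1 stops splitting after the first '::', so each list has at most two items
parts = df["text"].str.split("::", n=1)

# .str[0] takes the first item of each list; rows without '::' keep the whole string
df["text_new"] = parts.str[0]
print(df)
```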

Here’s a non-pandas solution:

>>> df['text_new1'] = [x.split('::')[0] for x in df['text']]
>>> df
                 text  text_new text_new1
0  vendor a::ProductA  vendor a  vendor a
1  vendor b::ProductA  vendor b  vendor b
2  vendor a::Productb  vendor a  vendor a
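In the plain-Python version, str.partition is another option: it splits at the first '::' only and always returns a 3-tuple (before, separator, after). This is a small sketch using a plain list named values (a hypothetical name) instead of the DataFrame column.

```python
# Plain-Python alternative to x.split('::')[0]
values = ["vendor a::ProductA", "vendor b::ProductA", "vendor a::Productb"]

# partition always returns (before, separator, after); [0] is the part before '::'
vendors = [x.partition("::")[0] for x in values]
print(vendors)

# If '::' is missing, partition returns (x, '', ''), so [0] is the whole string
missing = "no delimiter".partition("::")
print(missing)
```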

Edit: Here’s the step-by-step explanation of what’s happening in pandas above:

# Select the pandas.Series object you want
>>> df['text']
0    vendor a::ProductA
1    vendor b::ProductA
2    vendor a::Productb
Name: text, dtype: object

# the pandas.Series.str accessor lets us apply "normal" string methods
# (like split) to a Series
>>> df['text'].str
<pandas.core.strings.StringMethods object at 0x110af4e48>

# Now we can use the split method to split on our '::' string. You'll see that
# a Series of lists is returned (just like what you'd see outside of pandas)
>>> df['text'].str.split('::')
0    [vendor a, ProductA]
1    [vendor b, ProductA]
2    [vendor a, Productb]
Name: text, dtype: object

# using the .str accessor again lets us index into each of
# the lists returned in the previous step
>>> df['text'].str.split('::').str
<pandas.core.strings.StringMethods object at 0x110b254a8>

# now we can grab the first item in each list above for our desired output
>>> df['text'].str.split('::').str[0]
0    vendor a
1    vendor b
2    vendor a
Name: text, dtype: object
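Instead of indexing the lists with a second .str[0], split can also return the pieces as separate columns via expand=True. A short sketch, assuming a recent pandas version; the column names vendor and product are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({"text": ["vendor a::ProductA", "vendor b::ProductA", "vendor a::Productb"]})

# expand=True returns a DataFrame with one column per piece (labelled 0 and 1)
# instead of a Series of lists, so no second .str[...] step is needed
pieces = df["text"].str.split("::", n=1, expand=True)

df["vendor"] = pieces[0]   # part before '::'
df["product"] = pieces[1]  # part after '::'
print(df)
```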

I would suggest checking out the pandas.Series.str docs, or, better yet, Working with Text Data in pandas.

Method 2

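If the strings are in a column named column of a data frame named dataframe, you can also use a regular-expression replace:

dataframe['new_column'] = dataframe['column'].str.replace("(::).*", "", regex=True)

(regex=True is needed in pandas 2.0 and later, where str.replace treats the pattern as a literal string by default.)

It gives you the following result: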

               column  new_column
0  vendor a::ProductA    vendor a
1  vendor b::ProductA    vendor b
2  vendor a::Productb    vendor a

With this approach you do not need to specify any position, because it removes '::' and everything after it.
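A minimal runnable sketch of the regex replace, using the answer's placeholder names dataframe and column. The pattern "::.*" matches the first '::' and everything after it; the second call deliberately passes regex=False to show what happens when the pattern is treated literally (the default in pandas 2.0+): nothing matches and the strings come back unchanged.

```python
import pandas as pd

dataframe = pd.DataFrame({"column": ["vendor a::ProductA", "vendor b::ProductA", "vendor a::Productb"]})

# Regex mode: "::.*" matches from the first '::' to the end of the string,
# and replacing that match with "" leaves only the vendor part
dataframe["new_column"] = dataframe["column"].str.replace("::.*", "", regex=True)

# Literal mode: the text "::.*" never occurs, so nothing is replaced
literal = dataframe["column"].str.replace("::.*", "", regex=False)
print(dataframe)
```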

I hope this helps. Good luck!

Method 3

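You can use str.replace(":", " ") to replace each colon in "::" with a space.
To split, you need to specify the string you want to split on, e.g. str.split(" "). Note that "vendor a" itself contains a space, so splitting directly on "::" (str.split("::")) keeps the vendor name intact.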

The trim function is called strip in python: str.strip()

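Also, you can do str[:8] to get just "vendor x" from your strings ("vendor a" is 8 characters long), although this only works if every vendor name has exactly that length.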
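A small sketch contrasting the fixed-width slice with split followed by strip. The last two rows ("vendor ab::ProductC" and "vendor c :: ProductD") are hypothetical values added to show where slicing breaks and where strip helps.

```python
import pandas as pd

# Hypothetical data: row 3 has a longer vendor name, row 4 has spaces around '::'
s = pd.Series(["vendor a::ProductA", "vendor b::ProductA",
               "vendor ab::ProductC", "vendor c :: ProductD"])

# Fixed-width slice: keeps the first 8 characters regardless of content,
# so "vendor ab" is silently truncated to "vendor a"
sliced = s.str[:8]

# Split on '::', keep the first piece, then strip() (Python's "trim")
# to drop any whitespace left around the vendor name
split_then_strip = s.str.split("::").str[0].str.strip()

print(sliced.tolist())
print(split_then_strip.tolist())
```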

Good luck

Method 4

Alternatively, you can use str.extract, which returns the part of the string matched inside the parentheses (the capture group):

In [3]: df.assign(result=df['column'].str.extract('(.*)::'))
Out[3]: 
               column    result
0  vendor a::ProductA  vendor a
1  vendor b::ProductA  vendor b
2  vendor a::Productb  vendor a
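One detail worth knowing about the pattern '(.*)::' is that .* is greedy, so if a string contains '::' more than once the capture runs up to the last one. A sketch, assuming a recent pandas version, comparing it with the non-greedy '(.*?)::'. The second and third rows are hypothetical; expand=False makes extract return a Series for a single group, and rows with no match become NaN.

```python
import pandas as pd

# Hypothetical data: row 2 contains '::' twice, row 3 contains none
s = pd.Series(["vendor a::ProductA", "vendor b::Product::B", "no delimiter"])

# Greedy: captures everything up to the LAST '::'
greedy = s.str.extract(r"(.*)::", expand=False)

# Non-greedy: captures everything up to the FIRST '::'
lazy = s.str.extract(r"(.*?)::", expand=False)

print(greedy.tolist())
print(lazy.tolist())
```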

All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under cc by-sa 2.5, cc by-sa 3.0 and cc by-sa 4.0.
