I have data frames which contain e.g.:
"vendor a::ProductA" "vendor b::ProductA" "vendor a::Productb"
I need to remove everything after (and including) the two colons (::) so that I end up with:
"vendor a" "vendor b" "vendor a"
I tried str.trim (which seems to not exist) and str.split without success.
What would be the easiest way to accomplish this?
Answers:
Method 1
You can use pandas.Series.str.split just like you would use split normally. Just split on the string '::', and index the list that’s created from the split method:
>>> df = pd.DataFrame({'text': ["vendor a::ProductA", "vendor b::ProductA", "vendor a::Productb"]})
>>> df
                 text
0  vendor a::ProductA
1  vendor b::ProductA
2  vendor a::Productb
>>> df['text_new'] = df['text'].str.split('::').str[0]
>>> df
                 text  text_new
0  vendor a::ProductA  vendor a
1  vendor b::ProductA  vendor b
2  vendor a::Productb  vendor a
Here’s a solution that uses a plain Python list comprehension instead of the pandas .str accessor:
>>> df['text_new1'] = [x.split('::')[0] for x in df['text']]
>>> df
                 text  text_new text_new1
0  vendor a::ProductA  vendor a  vendor a
1  vendor b::ProductA  vendor b  vendor b
2  vendor a::Productb  vendor a  vendor a
Edit: Here’s the step-by-step explanation of what’s happening in pandas above:
# Select the pandas.Series object you want
>>> df['text']
0 vendor a::ProductA
1 vendor b::ProductA
2 vendor a::Productb
Name: text, dtype: object
# using pandas.Series.str allows us to implement "normal" string methods
# (like split) on a Series
>>> df['text'].str
<pandas.core.strings.StringMethods object at 0x110af4e48>
# Now we can use the split method to split on our '::' string. You'll see that
# a Series of lists is returned (just like what you'd see outside of pandas)
>>> df['text'].str.split('::')
0 [vendor a, ProductA]
1 [vendor b, ProductA]
2 [vendor a, Productb]
Name: text, dtype: object
# using the pandas.Series.str method, again, we will be able to index through
# the lists returned in the previous step
>>> df['text'].str.split('::').str
<pandas.core.strings.StringMethods object at 0x110b254a8>
# now we can grab the first item in each list above for our desired output
>>> df['text'].str.split('::').str[0]
0 vendor a
1 vendor b
2 vendor a
Name: text, dtype: object
I would suggest checking out the pandas.Series.str docs, or, better yet, Working with Text Data in pandas.
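As a small extension of the approach above, here is a minimal sketch of how the same split behaves on less tidy data. The sample values (a row without '::', a missing value, and a row with two separators) are made up for illustration and are not from the question. Passing n=1 limits the split to the first '::'.

```python
import pandas as pd

# Hypothetical sample data: one value without '::', one missing value,
# and one value that contains the separator twice.
s = pd.Series(["vendor a::ProductA", "vendor b", None, "vendor c::X::Y"])

# n=1 stops after the first '::', so each row yields at most two pieces.
# .str[0] then picks the first piece; missing values stay missing.
prefix = s.str.split("::", n=1).str[0]

result = prefix.tolist()
print(result)
```

Rows without the separator come back unchanged, so the same expression should be safe to apply to a column where only some values contain '::'.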
Method 2
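If the values are in a specific column (named column) of a data frame (named dataframe), you can also use a regular expression with str.replace. Pass regex=True explicitly, because since pandas 2.0 the pattern is treated as a literal string by default:
dataframe.column.str.replace("(::).*", "", regex=True)
Assigning the output to new_column gives the following result: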
               column  new_column
0  vendor a::ProductA    vendor a
1  vendor b::ProductA    vendor b
2  vendor a::Productb    vendor a
With this approach you do not need to specify any position, because the pattern removes the '::' and everything after it.
I hope this helps. Good luck!
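To make the role of the regex flag concrete, here is a short sketch that runs the same pattern both ways. The column name new_column mirrors the output shown above; the variable names with_regex and literal are only illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    {"column": ["vendor a::ProductA", "vendor b::ProductA", "vendor a::Productb"]}
)

# regex=True: the pattern matches '::' and everything after it.
with_regex = df["column"].str.replace(r"(::).*", "", regex=True)

# regex=False: the pattern is searched for as literal text, which never
# occurs in these strings, so nothing is replaced.
literal = df["column"].str.replace(r"(::).*", "", regex=False)

df["new_column"] = with_regex
print(df)
```

If the replacement appears to do nothing, a missing regex=True is a likely cause.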
Method 3
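You can use str.replace("::", "") to remove the "::" itself. Note that str.replace(":", " ") does not remove anything; it replaces each colon with a space.
To split, you need to specify the separator you want to split on: str.split("::")
The trim function is called strip in Python: str.strip()
Also, you can do str[:8] to get just "vendor x" from your strings, as long as every vendor name is exactly eight characters long.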
Good luck
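For reference, here is a small sketch of these built-in string methods applied to a single plain Python string taken from the question. The use of str.partition at the end is an addition not mentioned in the answer; it splits on the first occurrence of the separator only.

```python
s = "vendor a::ProductA"

# Replace the separator with a space (this does not drop the product name).
replaced = s.replace("::", " ")       # 'vendor a ProductA'

# Split on the separator and keep the first piece.
first_piece = s.split("::")[0]        # 'vendor a'

# strip() removes leading and trailing whitespace (what other languages call trim).
stripped = "  vendor a  ".strip()     # 'vendor a'

# Slicing only works when every prefix has the same length (8 characters here).
sliced = s[:8]                        # 'vendor a'

# partition() returns (before, separator, after) for the first occurrence.
before = s.partition("::")[0]         # 'vendor a'

print(replaced, first_piece, stripped, sliced, before, sep=" | ")
```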
Method 4
Alternatively, you can use str.extract, which returns the part of each string matched by the capture group (the part of the pattern inside the parentheses):
In [3]: df.assign(result=df['column'].str.extract('(.*)::'))
Out[3]:
               column    result
0  vendor a::ProductA  vendor a
1  vendor b::ProductA  vendor b
2  vendor a::Productb  vendor a
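One detail worth checking with extract is greediness. The sketch below assumes a hypothetical value containing '::' twice, which does not appear in the question, and compares the greedy pattern used above with a non-greedy variant. expand=False is passed so that a Series is returned rather than a one-column DataFrame.

```python
import pandas as pd

# Hypothetical data: the second value contains the separator twice.
s = pd.Series(["vendor a::ProductA", "vendor c::X::Y"])

# Greedy: '.*' grabs as much as possible, so the match ends at the LAST '::'.
greedy = s.str.extract(r"(.*)::", expand=False)

# Non-greedy: '.*?' grabs as little as possible, so the match ends at the FIRST '::'.
lazy = s.str.extract(r"^(.*?)::", expand=False)

print(greedy.tolist())
print(lazy.tolist())
```

For the data in the question both patterns give the same result, since each value contains the separator only once.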
All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0, and CC BY-SA 4.0.