How do I convert a list in a Pandas DF into a string?

I have a pandas data frame. One of the columns contains a list. I want that column to be a single string.

For example, my list ['one','two','three'] should simply become 'one, two, three'.

df['col'] = df['col'].astype(str).apply(lambda x: ', '.join(df['col'].astype(str)))

gives me ['one', 'two', 'three'], ['four', 'five', 'six'] in every row, where the second list comes from the next row. Needless to say, with millions of rows this concatenation across rows is not only incorrect, it also kills my memory.

Answers:


Method 1

You should certainly not convert to string before you transform the list. Try:

df['col'].apply(', '.join)

Also note that apply applies the function to the elements of the series, so using df['col'] in the lambda function is probably not what you want.


Alternatively, there is a native .str.join method, but it is (surprisingly) a bit slower than apply.
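A minimal sketch of the difference, using a two-row toy frame in place of the question's data (the values are made up; only the column name 'col' comes from the question). The lambda in the question never uses its argument x, so every row receives the join of the whole column, while apply(', '.join) and .str.join work one row at a time:

```python
import pandas as pd

# Toy data; only the column name 'col' comes from the question
df = pd.DataFrame({'col': [['one', 'two', 'three'], ['four', 'five', 'six']]})

# Correct: apply passes each row's list to ', '.join, one row at a time
joined = df['col'].apply(', '.join)
print(joined.tolist())  # ['one, two, three', 'four, five, six']

# Native alternative with the same result
native = df['col'].str.join(', ')

# The question's version: the lambda ignores x and joins the string form of the
# whole column, so every row receives the same cross-row concatenation
buggy = df['col'].astype(str).apply(lambda x: ', '.join(df['col'].astype(str)))
print(buggy.iloc[0])  # ['one', 'two', 'three'], ['four', 'five', 'six']
```

With millions of rows, each of those buggy cells holds the text of the entire column, which is why memory runs out.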

Method 2

When you cast col to str with astype, you get the string representation of a Python list, brackets and all. You do not need to do that; just apply join directly:

import pandas as pd

df = pd.DataFrame({
    'A': [['a', 'b', 'c'], ['A', 'B', 'C']]
    })

# df:
#            A
# 0  [a, b, c]
# 1  [A, B, C]

df['Joined'] = df.A.apply(', '.join)

#            A   Joined
# 0  [a, b, c]  a, b, c
# 1  [A, B, C]  A, B, C
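To see the "brackets and all" effect directly, and to cover a case the question does not mention (lists holding non-string items, which is an assumption here), a small sketch follows. ', '.join only accepts str items, so for such a column each item is passed through str first; the nums column is hypothetical.

```python
import pandas as pd

df = pd.DataFrame({'A': [['a', 'b', 'c'], ['A', 'B', 'C']]})

# astype(str) calls str() on each list, so brackets and quotes become part of the text
as_text = df.A.astype(str)
print(as_text[0])  # ['a', 'b', 'c']

# Hypothetical column whose lists hold numbers instead of strings
nums = pd.Series([[1, 2, 3], [4.5, 6]])

# ', '.join only accepts str items, so it fails on this column
try:
    nums.apply(', '.join)
    join_failed = False
except TypeError:
    join_failed = True

# Converting each item with str first works for any item type
joined_nums = nums.apply(lambda x: ', '.join(map(str, x)))
print(joined_nums.tolist())  # ['1, 2, 3', '4.5, 6']
```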

Method 3

You could convert your lists to str with astype(str) and then remove the ', [ and ] characters. Using @Yakim's example:

In [114]: df
Out[114]:
           A
0  [a, b, c]
1  [A, B, C]

In [115]: df.A.astype(str).str.replace(r"[\[\]']", '', regex=True)
Out[115]:
0    a, b, c
1    A, B, C
Name: A, dtype: object
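A sketch of the same idea that runs on current pandas: since pandas 2.0, str.replace treats the pattern as a literal string unless regex=True is passed, and the brackets need escaping inside a character class. The second half uses a made-up row to show the weakness of this approach: quotes and brackets that belong to the items themselves are stripped too (and Python's repr switches to double quotes for an item containing an apostrophe), whereas joining the list leaves the items intact.

```python
import pandas as pd

df = pd.DataFrame({'A': [['a', 'b', 'c'], ['A', 'B', 'C']]})

# Remove every [, ] and ' from the string form of each list.
# regex=True is required on pandas >= 2.0, where the default is a literal match.
pattern = r"[\[\]']"
stripped = df.A.astype(str).str.replace(pattern, '', regex=True)
print(stripped.tolist())  # ['a, b, c', 'A, B, C']

# Made-up row whose items contain an apostrophe and brackets
tricky = pd.Series([["it's", '[x]']])
via_replace = tricky.astype(str).str.replace(pattern, '', regex=True)
via_join = tricky.apply(', '.join)
print(via_replace.tolist())  # ['"its", x']
print(via_join.tolist())     # ["it's, [x]"]
```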

Timing

import pandas as pd
df = pd.DataFrame({'A': [['a', 'b', 'c'], ['A', 'B', 'C']]})
df = pd.concat([df]*1000)


In [2]: timeit df['A'].apply(', '.join)
292 µs ± 10.8 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [3]: timeit df['A'].str.join(', ')
368 µs ± 24.6 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [4]: timeit df['A'].apply(lambda x: ', '.join(x))
505 µs ± 5.74 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [5]: timeit df['A'].astype(str).str.replace(r"[\[\]']", '', regex=True)
2.43 ms ± 62.7 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
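To reproduce a comparison like the one above outside IPython, the standard-library timeit module can be used. This is only a sketch: the absolute numbers (and possibly the ordering) depend on the machine and the pandas version, so no expected output is given. The replace variant strips [, ] and ' with a regex, as in Method 3; the labels in the candidates dict are arbitrary.

```python
import timeit

import pandas as pd

# Same setup as the timing above: 2,000 rows of three-item lists
df = pd.DataFrame({'A': [['a', 'b', 'c'], ['A', 'B', 'C']]})
df = pd.concat([df] * 1000)

candidates = {
    "apply(', '.join)": lambda: df['A'].apply(', '.join),
    "str.join(', ')": lambda: df['A'].str.join(', '),
    "apply(lambda x: ', '.join(x))": lambda: df['A'].apply(lambda x: ', '.join(x)),
    "astype(str) + str.replace": lambda: df['A'].astype(str).str.replace(r"[\[\]']", '', regex=True),
}

results = {}
for name, fn in candidates.items():
    # Best of 5 repeats of 100 calls each, converted to seconds per call
    best = min(timeit.repeat(fn, number=100, repeat=5)) / 100
    results[name] = best
    print(f'{name:32s} {best * 1e6:8.1f} µs per call')
```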

Method 4

Pandas offers a method for this, Series.str.join.
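A short illustration of Series.str.join with made-up data. As the pandas documentation describes, if a list contains any non-string item, the joined result for that row is NaN, and a missing value stays NaN; apply(', '.join) would instead raise a TypeError on both of those rows.

```python
import numpy as np
import pandas as pd

# Made-up column: a normal row, a row with a non-string item, and a missing value
s = pd.Series([['one', 'two', 'three'], ['four', 5, 'six'], np.nan])

joined = s.str.join(', ')
print(joined)
# Row 0 becomes 'one, two, three'; rows 1 and 2 become NaN
```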


All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
