Pandas split column into multiple columns by comma

I am trying to split a column into multiple columns based on comma/space separation.

My dataframe currently looks like

     KEYS                                                  1
0   FIT-4270                                          4000.0439
1   FIT-4269                                          4000.0420, 4000.0471
2   FIT-4268                                          4000.0419
3   FIT-4266                                          4000.0499
4   FIT-4265                                          4000.0490, 4000.0499, 4000.0500, 4000.0504,

I would like

   KEYS                                                  1           2            3        4 
0   FIT-4270                                          4000.0439
1   FIT-4269                                          4000.0420  4000.0471
2   FIT-4268                                          4000.0419
3   FIT-4266                                          4000.0499
4   FIT-4265                                          4000.0490  4000.0499  4000.0500  4000.0504

My code currently removes The KEYS column and I’m not sure why. Could anyone improve or help fix the issue?

v = dfcleancsv[1]

#splits the columns by spaces into new columns but removes KEYS?

dfcleancsv = dfcleancsv[1].str.split(' ').apply(Series, 1)

Contents hide

Answers:

Method 1

Method 2

Method 3

Method 4

Method 5

Method 6

Method 7

Answers:

Thank you for visiting the Q&A section on Magenaut. Please note that all the answers may not help you solve the issue immediately. So please treat them as advisements. If you found the post helpful (or not), leave a comment & I’ll get back to you as soon as possible.

Method 1

In case someone else wants to split a single column (deliminated by a value) into multiple columns – try this:

series.str.split(',', expand=True)

This answered the question I came here looking for.

Credit to EdChum’s code that includes adding the split columns back to the dataframe.

pd.concat([df[[0]], df[1].str.split(', ', expand=True)], axis=1)

Note: The first argument df[[0]] is DataFrame.

The second argument df[1].str.split is the series that you want to split.

split Documentation

concat Documentation

Method 2

Using Edchums answer of

pd.concat([df[[0]], df[1].str.split(', ', expand=True)], axis=1)

I was able to solve it by substituting my variables.

dfcleancsv = pd.concat([dfcleancsv['KEYS'], dfcleancsv[1].str.split(', ', expand=True)], axis=1)

Method 3

The OP had a variable number of output columns.
In the particular case of a fixed number of output columns another elegant solution to name the resulting columns is to use a multiple assignation.

Load a sample dataset and reshape it to long format to obtain a variable
called organ_dimension.

import seaborn
iris = seaborn.load_dataset('iris')
df = iris.melt(id_vars='species', var_name='organ_dimension', value_name='value')

Split the organ_dimension variable in 2 variables organ and dimension based on the _ separator.

df[['organ', 'dimension']] = df['organ_dimension'].str.split('_', expand=True)
df.head()

Out[10]: 
  species organ_dimension  value  organ dimension
0  setosa    sepal_length    5.1  sepal    length
1  setosa    sepal_length    4.9  sepal    length
2  setosa    sepal_length    4.7  sepal    length
3  setosa    sepal_length    4.6  sepal    length
4  setosa    sepal_length    5.0  sepal    length

Based on this answer “How to split a column into two columns?”

Method 4

The simplest way to use is, vectorization

df = df.apply(lambda x:pd.Series(x))

Method 5

maybe this should work:

df = pd.concat([df['KEYS'],df[1].apply(pd.Series)],axis=1)

Method 6

Check this out

Responder_id    LanguagesWorkedWith
0   1   HTML/CSS;Java;JavaScript;Python
1   2   C++;HTML/CSS;Python
2   3   HTML/CSS
3   4   C;C++;C#;Python;SQL
4   5   C++;HTML/CSS;Java;JavaScript;Python;SQL;VBA
... ... ...
87564   88182   HTML/CSS;Java;JavaScript
87565   88212   HTML/CSS;JavaScript;Python
87566   88282   Bash/Shell/PowerShell;Go;HTML/CSS;JavaScript;W...
87567   88377   HTML/CSS;JavaScript;Other(s):
87568   88863   Bash/Shell/PowerShell;HTML/CSS;Java;JavaScript...`
###Split the LanguagesWorkedWith column into  multiple columns  by using` data= data1['LanguagesWorkedWith'].str.split(';').apply(pd.Series)`.###
` data1 = pd.read_csv('data.csv', sep=',')
data1.set_index('Responder_id',inplace=True)
data1
data1.loc[1,:]
data= data1['LanguagesWorkedWith'].str.split(';').apply(pd.Series)
data.head()`

Method 7

You may also want to try datar, a package ports dplyr, tidyr and related R packages to python:

>>> df
         i       j              A
  <object> <int64>       <object>
0       AR       5    Paris,Green
1      For       3  Moscow,Yellow
2      For       4  NewYork,Black
>>> from datar import f
>>> from datar.tidyr import separate
>>> separate(df, f.A, ['City', 'Color'])
         i       j     City    Color
  <object> <int64> <object> <object>
0       AR       5    Paris    Green
1      For       3   Moscow   Yellow
2      For       4  NewYork    Black

All methods was sourced from stackoverflow.com or stackexchange.com, is licensed under cc by-sa 2.5, cc by-sa 3.0 and cc by-sa 4.0

0 0 votes

Article Rating