How to remove square bracket from pandas dataframe

I came up with values in square bracket(more like a list) after applying str.findall() to column of a pandas dataframe. How can I remove the square bracket ?

print df

id     value                 
1      [63]        
2      [65]       
3      [64]        
4      [53]       
5      [13]      
6      [34]

Contents hide

Answers:

Method 1

Method 2

Method 3

Answers:

Thank you for visiting the Q&A section on Magenaut. Please note that all the answers may not help you solve the issue immediately. So please treat them as advisements. If you found the post helpful (or not), leave a comment & I’ll get back to you as soon as possible.

Method 1

If values in column value have type list, use:

df['value'] = df['value'].str[0]

Or:

df['value'] = df['value'].str.get(0)

Docs.

Sample:

df = pd.DataFrame({'value':[[63],[65],[64]]})
print (df)
  value
0  [63]
1  [65]
2  [64]

#check type if index 0 exist
print (type(df.loc[0, 'value']))
<class 'list'>

#check type generally, index can be `DatetimeIndex`, `FloatIndex`...
print (type(df.loc[df.index[0], 'value']))
<class 'list'>

df['value'] = df['value'].str.get(0)
print (df)
   value
0     63
1     65
2     64

If strings use str.strip and then convert to numeric by astype:

df['value'] = df['value'].str.strip('[]').astype(int)

Sample:

df = pd.DataFrame({'value':['[63]','[65]','[64]']})
print (df)
  value
0  [63]
1  [65]
2  [64]

#check type if index 0 exist
print (type(df.loc[0, 'value']))
<class 'str'>

#check type generally, index can be `DatetimeIndex`, `FloatIndex`...
print (type(df.loc[df.index[0], 'value']))
<class 'str'>


df['value'] = df['value'].str.strip('[]').astype(int)
print (df)
  value
0    63
1    65
2    64

Method 2

if string we can also use string.replace method

import pandas as pd

df =pd.DataFrame({'value':['[63]','[65]','[64]']})

print(df)
  value
0  [63]
1  [65]
2  [64]

df['value'] =  df['value'].apply(lambda x: x.replace('[','').replace(']','')) 

#convert the string columns to int
df['value'] = df['value'].astype(int)

#output
print(df)

   value
0     63
1     65
2     64

print(df.dtypes)
value    int32
dtype: object

Method 3

A general solution to remove [ and ] chars from a dataframe string column is

df['value'] = df['value'].str.replace(r'[][]', '', regex=True)  # one by one
df['value'] = df['value'].str.replace(r'[][]+', '', regex=True) # by chunks of one or more [ or ] chars

The [][] is a character class in regex, that matches a ] or [ char. + makes the regex engine match these chars one or more times sequentially.

See the regex demo.

However, in this case, the square brackets mark the list of strings that was a result of Series.str.findall. It is clear that you wanted to extract one, the first match from the column values.

When you need the first match, use Series.str.extract
When you need all matches, use Series.str.findall

So, in this case, to avoid this trouble you found yourself in, you could use

df['value'] = df['source_column'].str.extract(r'my regex with one set of (parentheses)')

Note that str.extract requires at least one set of capturing parentheses to actually work and return a value (str.findall works even without a capturing group).

Note if you were to get multiple matches with findall, and you wanted a single string as output, you could str.join the matches:

df['value'] = df['source_column'].str.findall(pattern).str.join(', ')

All methods was sourced from stackoverflow.com or stackexchange.com, is licensed under cc by-sa 2.5, cc by-sa 3.0 and cc by-sa 4.0

0 0 votes

Article Rating