pandas select from Dataframe using startswith

This works (using Pandas 12 dev)

table2=table[table['SUBDIVISION'] =='INVERNESS']

Then I realized I needed to select the field using “starts with”, since I was missing a bunch of rows.
So, following the pandas docs as closely as I could, I tried:

criteria = table['SUBDIVISION'].map(lambda x: x.startswith('INVERNESS'))
table2 = table[criteria]

And got AttributeError: ‘float’ object has no attribute ‘startswith’

So I tried an alternate syntax with the same result

table[[x.startswith('INVERNESS') for x in table['SUBDIVISION']]]

Reference http://pandas.pydata.org/pandas-docs/stable/indexing.html#boolean-indexing
Section 4: List comprehensions and map method of Series can also be used to produce more complex criteria:

What am I missing?

Answers:


Method 1

You can use the str.startswith Series method to give more consistent results:

In [11]: s = pd.Series(['a', 'ab', 'c', 11, np.nan])

In [12]: s
Out[12]:
0      a
1     ab
2      c
3     11
4    NaN
dtype: object

In [13]: s.str.startswith('a', na=False)
Out[13]:
0     True
1     True
2    False
3    False
4    False
dtype: bool

and the boolean indexing will work just fine (I prefer to use loc, but it works just the same without):

In [14]: s.loc[s.str.startswith('a', na=False)]
Out[14]:
0     a
1    ab
dtype: object


It looks like at least one of the elements in your Series/column is a float, which doesn’t have a startswith method, hence the AttributeError. The list comprehension should raise the same error.
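As a minimal sketch of that diagnosis: a missing value in an object column is stored as NaN, which is a float, so calling a str method on each element fails there. The sample data below is hypothetical; only the column name SUBDIVISION comes from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical data: the last row is missing, so it is stored as NaN (a float).
table = pd.DataFrame(
    {"SUBDIVISION": ["INVERNESS", "INVERNESS POINT", "OAKS", np.nan]},
    dtype=object,
)

# Calling a str method on every element fails when it reaches the NaN entry.
try:
    table["SUBDIVISION"].map(lambda x: x.startswith("INVERNESS"))
    error_message = None
except AttributeError as exc:
    error_message = str(exc)

# The vectorized .str accessor tolerates missing values;
# na=False makes them count as "does not start with".
mask = table["SUBDIVISION"].str.startswith("INVERNESS", na=False)
table2 = table[mask]
print(error_message)
print(table2)
```

Without na=False the mask would contain a missing value in the last position, and boolean indexing with such a mask is likely to fail or misbehave.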

Method 2

To retrieve all the rows which start with the required string (str.match treats 'string' as a regular expression anchored at the start of each value)

dataFrameOut = dataFrame[dataFrame['column name'].str.match('string')]

To retrieve all the rows which contain the required string

dataFrameOut = dataFrame[dataFrame['column name'].str.contains('string')]
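One caveat with this method is that str.match and str.contains interpret the pattern as a regular expression by default. The sketch below, on hypothetical data with a made-up column called name, shows how a literal prefix containing "." behaves with and without escaping; na=False is added so that missing values do not break the boolean mask.

```python
import re
import pandas as pd

# Hypothetical data; the None entry stands in for a missing value.
df = pd.DataFrame({"name": ["a.b", "axb", "x a.b", None]}, dtype=object)
prefix = "a."

# Pattern used as a regex: "." matches any character, so "axb" is selected too.
as_regex = df[df["name"].str.match(prefix, na=False)]

# Escaping the pattern makes the match literal, equivalent to a startswith test.
literal = df[df["name"].str.match(re.escape(prefix), na=False)]

# regex=False makes str.contains look for the literal substring anywhere.
anywhere = df[df["name"].str.contains(prefix, regex=False, na=False)]

print(as_regex["name"].tolist())
print(literal["name"].tolist())
print(anywhere["name"].tolist())
```

For a plain prefix such as 'INVERNESS' the regex and literal readings agree, so the difference only matters when the search string contains regex metacharacters.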

Method 3

Using startswith for a particular column value

df = df.loc[df["SUBDIVISION"].str.startswith('INVERNESS', na=False)]

Method 4

You can use apply to easily apply any string matching function to your column elementwise.

table2=table[table['SUBDIVISION'].apply(lambda x: x.startswith('INVERNESS'))]

This assumes that your “SUBDIVISION” column is of the correct type (string).

Edit: fixed missing parenthesis
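If the column may also hold non-string values (as in the question, where a float triggered the error), one possible variation is to guard the call with isinstance. This is only a sketch on hypothetical data, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Hypothetical data mixing strings, a missing value and a number.
table = pd.DataFrame(
    {"SUBDIVISION": ["INVERNESS EAST", np.nan, 42, "OAKS"]},
    dtype=object,
)

# Only call startswith on real strings; everything else counts as False.
mask = table["SUBDIVISION"].apply(
    lambda x: isinstance(x, str) and x.startswith("INVERNESS")
)
table2 = table[mask]
print(table2)
```

The vectorized str.startswith with na=False is usually the shorter route; the apply form is mainly useful when the per-element test is more involved than a single string method.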

Method 5

This can also be achieved using query:

table.query('SUBDIVISION.str.startswith("INVERNESS").values')
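A hedged sketch of the query approach on hypothetical data follows. It passes engine="python" explicitly, on the assumption that the default numexpr engine (when installed) may not handle .str accessor calls in some pandas versions; the column here has no missing values.

```python
import pandas as pd

# Hypothetical data with no missing values in the column.
table = pd.DataFrame({"SUBDIVISION": ["INVERNESS", "OAKS", "INVERNESS HILLS"]})

# engine="python" evaluates the expression with ordinary Python semantics,
# which allows the .str accessor inside the query string.
table2 = table.query('SUBDIVISION.str.startswith("INVERNESS")', engine="python")
print(table2)
```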


All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under cc by-sa 2.5, cc by-sa 3.0 and cc by-sa 4.0
