How to extract elements from a list in pandas through regex?

I’m looking to extract the code that comes after ‘accession’ in this DataFrame. My DataFrame looks like this:

targets_list = pd.DataFrame(targets_df[['target_components', 'target_chembl_id']])

and each row of the target_components column looks like the following:

[{'accession': 'O43451', 'component_description': 'Maltase-glucoamylase, intestinal', 'component_id': 434, 'component_type': 'PROTEIN', 'relationship': 'SINGLE PROTEIN', 'target_component_synonyms',...}]

I would just like to extract the code after ‘accession’. Since I thought it was the first element of the list, I tried tgt = targets_list['target_components'][0][0], but this returns the first element of that list (the whole dictionary), not the accession number.

I can see that each row holds a list, but I don’t know how to parse that list, get the accession number, and add it to a new column. Maybe it is possible with regex? But I’m not sure how regex works at all.

Answers:

Method 1

You could try:

tgt = targets_list["target_components"].str[0].str["accession"]

Result for

targets_list = pd.DataFrame(
    {"target_components": [
        [{"accession": "O43451", "b": "c", "d": 1}],
        [{"accession": "012345", "b": "e", "d": 2}],
        [{"b": "f", "d": 3}],
        []]}
)
                              target_components
0  [{'accession': 'O43451', 'b': 'c', 'd': 1}]
1  [{'accession': '012345', 'b': 'e', 'd': 2}]
2                         [{'b': 'f', 'd': 3}]
3                                           []

is

0    O43451
1    012345
2      None
3       NaN
Name: target_components, dtype: object

Method 2

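You can use the .str.findall() or .str.extract() methods to get the ID with a regular expression. Both operate on strings, so a column of lists has to be converted to strings first.

Refer to:
Use regular expression to extract elements from a pandas data frame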
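
Method 2 gives no code, so here is a minimal sketch of the regex route. It assumes the column holds Python lists of dictionaries as in the question; the sample data is borrowed from Method 1, and the pattern and the new column name accession are illustrative choices, not part of the original answer.

```python
import pandas as pd

# Sample data borrowed from Method 1.
targets_list = pd.DataFrame(
    {"target_components": [
        [{"accession": "O43451", "b": "c", "d": 1}],
        [{"accession": "012345", "b": "e", "d": 2}],
        [{"b": "f", "d": 3}],
        []]}
)

# Regex methods work on strings, so turn each list into its text form first.
as_text = targets_list["target_components"].map(str)

# Capture the quoted value that follows the 'accession' key.
pattern = r"'accession': '([^']+)'"

# expand=False returns a Series; rows without a match become NaN.
targets_list["accession"] = as_text.str.extract(pattern, expand=False)
print(targets_list["accession"])
```

This depends on how Python prints dictionaries (single-quoted keys and values), so it is fragile. When the data is already a list of dictionaries, indexing it directly as in the other methods is usually the more robust choice.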

Method 3

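First, there is no need to call pd.DataFrame again to create a DataFrame from existing columns:

targets_list = targets_df[['target_components', 'target_chembl_id']]

Then you can use apply to access the element in each row of the column. Note that this raises an error for rows where the list is empty or the dictionary has no 'accession' key: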

tgt = targets_list['target_components'].apply(lambda x: x[0]['accession'])

Method 4

You can try this:

targets_list['target_components'].map(lambda x: x[0].get("accession") if x else '')
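
The question also asks how to add the result to a column. A minimal sketch of that step follows, using a named helper instead of a lambda. The helper name first_accession and the column name accession are hypothetical, and the sample data is again the one from Method 1.

```python
import pandas as pd

def first_accession(components):
    """Return the 'accession' of the first component, or None if unavailable."""
    # Guard against empty lists and non-list values such as NaN.
    if isinstance(components, list) and components:
        # dict.get returns None when the key is missing.
        return components[0].get("accession")
    return None

targets_list = pd.DataFrame(
    {"target_components": [
        [{"accession": "O43451", "b": "c", "d": 1}],
        [{"accession": "012345", "b": "e", "d": 2}],
        [{"b": "f", "d": 3}],
        []]}
)

# Store the extracted codes in a new column next to the original data.
targets_list["accession"] = targets_list["target_components"].map(first_accession)
print(targets_list["accession"])
```

Unlike the lambda above, this returns a missing value rather than an empty string for rows without an accession, so they can be found later with isna().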


All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0, and CC BY-SA 4.0.
