I’m looking to extract the code that comes after ‘accession’ in this DataFrame. My dataframe looks like this:
targets_list = pd.DataFrame(targets_df[['target_components', 'target_chembl_id']])
and each element of the target_components column looks like the following:
[{'accession': 'O43451', 'component_description': 'Maltase-glucoamylase, intestinal', 'component_id': 434, 'component_type': 'PROTEIN', 'relationship': 'SINGLE PROTEIN', 'target_component_synonyms',...}]
I would just like to extract the code after ‘accession’. Since I thought it was the first element of the list, I tried tgt = targets_list['target_components'][0][0], but this returns the whole first element of that list (the dictionary), not the accession number.
I can see that each row holds a list, but what I’m missing is how to parse that list, get the accession, and add it to a column. Maybe it is possible with regex? I’m not sure how regex works at all, though.
Answers:
Method 1
You could try:
tgt = targets_list["target_components"].str[0].str["accession"]
Result for
targets_list = pd.DataFrame(
{"target_components": [
[{"accession": "O43451", "b": "c", "d": 1}],
[{"accession": "012345", "b": "e", "d": 2}],
[{"b": "f", "d": 3}],
[]]}
)
target_components
0 [{'accession': 'O43451', 'b': 'c', 'd': 1}]
1 [{'accession': '012345', 'b': 'e', 'd': 2}]
2 [{'b': 'f', 'd': 3}]
3 []
is
0    O43451
1    012345
2      None
3       NaN
Name: target_components, dtype: object
Method 2
You can use the pandas string methods .str.findall() or .str.extract() to get the ID with a regular expression.
Refer to :
Use regular expression to extract elements from a pandas data frame
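Since the question asks about regex, here is a minimal sketch of how .str.extract() could be applied. It assumes each cell is first converted to its text form with map(str), because the string methods match against text rather than lists, and it reuses the sample data from Method 1. The new column name "accession" and the pattern are illustrative choices, not part of the original answer.

```python
import pandas as pd

# Sample data in the same shape as the Method 1 example.
targets_list = pd.DataFrame(
    {"target_components": [
        [{"accession": "O43451", "b": "c", "d": 1}],
        [{"accession": "012345", "b": "e", "d": 2}],
        [{"b": "f", "d": 3}],
        []]}
)

# Turn each list into its text representation so a regex can be matched against it.
as_text = targets_list["target_components"].map(str)

# Capture whatever sits between the quotes after the 'accession' key.
# Rows without a match end up as missing values.
targets_list["accession"] = as_text.str.extract(
    r"'accession': '([^']+)'", expand=False
)

print(targets_list["accession"])
```

This only picks up the first accession in each row, and it depends on how Python prints dictionaries, so indexing into the list directly (as in the other methods) is usually the more robust route.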
Method 3
First, there is no need to call pd.DataFrame again to create a dataframe from existing columns:
targets_list = targets_df[['target_components', 'target_chembl_id']]
Then you can use apply to access the element in each row of the column:
tgt = targets_list['target_components'].apply(lambda x: x[0]['accession'])
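Note that x[0]['accession'] raises an error if a row holds an empty list or a dictionary without an 'accession' key. If that can happen in your data, a small helper passed to apply is one possible way around it. The sketch below is only an illustration: the helper name first_accession and the new column name "accession" are made up for this example, and the data is the sample from Method 1.

```python
import pandas as pd

def first_accession(components):
    """Return the accession of the first component, or None if there is none.

    Hypothetical helper, not part of pandas or of the original answer.
    """
    if not components:  # empty list
        return None
    return components[0].get("accession")  # None when the key is absent

targets_list = pd.DataFrame(
    {"target_components": [
        [{"accession": "O43451", "b": "c", "d": 1}],
        [{"accession": "012345", "b": "e", "d": 2}],
        [{"b": "f", "d": 3}],
        []]}
)

# Store the extracted value in a new column next to the original data.
targets_list["accession"] = targets_list["target_components"].apply(first_accession)

print(targets_list)
```

Rows with no accession come out as missing values, which you can then drop or fill as needed.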
Method 4
You can try this:
targets_list['target_components'].map(lambda x: x[0].get("accession") if x else '')
All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0, and CC BY-SA 4.0.