I am trying to figure out how Featuretools works, and I am testing it on the Housing Prices dataset from Kaggle. Because the dataset is huge, I’ll work here with only a subset of it.
The dataframe is:
train={'Id': {0: 1, 1: 2, 2: 3, 3: 4, 4: 5}, 'MSSubClass': {0: 60, 1: 20, 2: 60, 3: 70, 4: 60}, 'MSZoning': {0: 'RL', 1: 'RL', 2: 'RL', 3: 'RL', 4: 'RL'}, 'LotFrontage': {0: 65.0, 1: 80.0, 2: 68.0, 3: 60.0, 4: 84.0}, 'LotArea': {0: 8450, 1: 9600, 2: 11250, 3: 9550, 4: 14260}}
I create an EntitySet for this dataframe:
es_train = ft.EntitySet()
I add the dataframe to the created EntitySet:
es_train.add_dataframe(dataframe_name='train', dataframe=train, index='Id')
Then I call the function:
ap, tp = ft.get_valid_primitives(entityset=es_train, target_dataframe_name='train')
And here it all breaks down, because I get the following error message:
KeyError: 'DataFrame train does not exist in entity set'
I tried to study the tutorials on the Featuretools site, but all I could find are tutorials with multiple dataframes, so it didn’t help me at all.
Where is my mistake, and how can I correct it?
Thanks!
Later edit: I am using PyCharm. When I work in script mode, I get the error above. However, when I use the command line, everything works perfectly.
Answers:
Method 1
The only issue I see with your code is that you’re not wrapping your train object in pd.DataFrame.
This code works well for me:
import featuretools as ft
import pandas as pd

# Wrap the nested dict in pd.DataFrame so Featuretools receives a real dataframe
train = pd.DataFrame({
    'Id': {0: 1, 1: 2, 2: 3, 3: 4, 4: 5},
    'MSSubClass': {0: 60, 1: 20, 2: 60, 3: 70, 4: 60},
    'MSZoning': {0: 'RL', 1: 'RL', 2: 'RL', 3: 'RL', 4: 'RL'},
    'LotFrontage': {0: 65.0, 1: 80.0, 2: 68.0, 3: 60.0, 4: 84.0},
    'LotArea': {0: 8450, 1: 9600, 2: 11250, 3: 9550, 4: 14260}
})

# Create the EntitySet and register the dataframe under the name 'train'
es_train = ft.EntitySet()
es_train.add_dataframe(dataframe_name='train', dataframe=train, index='Id')

# Keep only the second returned list and print each primitive's name
_, tp = ft.get_valid_primitives(entityset=es_train, target_dataframe_name='train')
for p in tp:
    print(p.name)
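To make the difference between the question's code and the code above easier to see, here is a minimal sketch that uses only pandas. It shows that the literal from the question is a plain dict until it is wrapped in pd.DataFrame. The helper name `to_dataframe` is hypothetical (it is not part of pandas or Featuretools) and is only there to illustrate a guard you could run before calling add_dataframe.

```python
import pandas as pd

# Same nested dict as in the question: {column: {row_label: value}}
raw = {
    'Id': {0: 1, 1: 2, 2: 3, 3: 4, 4: 5},
    'MSSubClass': {0: 60, 1: 20, 2: 60, 3: 70, 4: 60},
    'MSZoning': {0: 'RL', 1: 'RL', 2: 'RL', 3: 'RL', 4: 'RL'},
    'LotFrontage': {0: 65.0, 1: 80.0, 2: 68.0, 3: 60.0, 4: 84.0},
    'LotArea': {0: 8450, 1: 9600, 2: 11250, 3: 9550, 4: 14260},
}


def to_dataframe(obj):
    """Hypothetical helper: return obj as a pandas DataFrame.

    DataFrames are passed through unchanged, dicts are wrapped,
    and anything else is rejected with a TypeError.
    """
    if isinstance(obj, pd.DataFrame):
        return obj
    if isinstance(obj, dict):
        return pd.DataFrame(obj)
    raise TypeError(f"expected a dict or DataFrame, got {type(obj).__name__}")


train = to_dataframe(raw)

print(type(raw).__name__)    # dict
print(type(train).__name__)  # DataFrame
print(train.shape)           # (5, 5)
```

The resulting train object can then be passed to es_train.add_dataframe(...) exactly as in the code above.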
All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0, and CC BY-SA 4.0.