How can I use get_valid_primitives when I have only one dataframe in Featuretools?

I am trying to figure out how Featuretools works, and I am testing it on the Housing Prices dataset from Kaggle. Because the dataset is huge, I'll work here with only a subset of it.

The dataframe is:

train={'Id': {0: 1, 1: 2, 2: 3, 3: 4, 4: 5}, 'MSSubClass': {0: 60, 1: 20, 2: 60, 3: 70, 4: 60}, 'MSZoning': {0: 'RL', 1: 'RL', 2: 'RL', 3: 'RL', 4: 'RL'}, 'LotFrontage': {0: 65.0, 1: 80.0, 2: 68.0, 3: 60.0, 4: 84.0}, 'LotArea': {0: 8450, 1: 9600, 2: 11250, 3: 9550, 4: 14260}}

I create an EntitySet for this dataframe:

es_train = ft.EntitySet()

I add the dataframe to the created EntitySet:

es_train.add_dataframe(dataframe_name='train', dataframe=train, index='Id')

Then I call the function:

ap, tp = ft.get_valid_primitives(entityset=es_train, target_dataframe_name='train')

And here it all breaks down, because I get the following error message:

KeyError: 'DataFrame train does not exist in entity set'

I tried to study the tutorials on the Featuretools site, but all I could find were tutorials with multiple dataframes, so they didn't help me at all.

Where am I going wrong? How can I correct the mistake(s)?

Thanks!

Later edit: I am using PyCharm. When I work in script mode, I get the error above. However, when I use the command line, everything works perfectly.

Answers:

Method 1

The only issue I see with your code is that you're not wrapping your train object in pd.DataFrame: as posted, train is a plain Python dict, not a pandas DataFrame.

This code works well for me:

import featuretools as ft
import pandas as pd

train=pd.DataFrame({
    'Id': {0: 1, 1: 2, 2: 3, 3: 4, 4: 5}, 
    'MSSubClass': {0: 60, 1: 20, 2: 60, 3: 70, 4: 60}, 
    'MSZoning': {0: 'RL', 1: 'RL', 2: 'RL', 3: 'RL', 4: 'RL'}, 
    'LotFrontage': {0: 65.0, 1: 80.0, 2: 68.0, 3: 60.0, 4: 84.0}, 
    'LotArea': {0: 8450, 1: 9600, 2: 11250, 3: 9550, 4: 14260}
})

es_train = ft.EntitySet()
es_train.add_dataframe(dataframe_name='train', dataframe=train, index='Id')

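# get_valid_primitives returns (aggregation primitives, transform primitives);
# only the transform primitives are kept here.
_, tp = ft.get_valid_primitives(entityset=es_train, target_dataframe_name='train')
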
for p in tp:
    print(p.name)
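
To see what the wrapping changes, here is a minimal sketch that uses only pandas (no Featuretools calls). The name raw is just an illustrative label for the dict from the question. The uniqueness check on Id is an assumption about what a sensible index column should satisfy, not something stated in the question.

```python
import pandas as pd

# The nested dict from the question: column name -> {row label -> value}.
raw = {
    'Id': {0: 1, 1: 2, 2: 3, 3: 4, 4: 5},
    'MSSubClass': {0: 60, 1: 20, 2: 60, 3: 70, 4: 60},
    'MSZoning': {0: 'RL', 1: 'RL', 2: 'RL', 3: 'RL', 4: 'RL'},
    'LotFrontage': {0: 65.0, 1: 80.0, 2: 68.0, 3: 60.0, 4: 84.0},
    'LotArea': {0: 8450, 1: 9600, 2: 11250, 3: 9550, 4: 14260},
}

# A plain dict is not a DataFrame.
print(isinstance(raw, pd.DataFrame))    # False

# Wrapping it turns each outer key into a column and each inner key into a row label.
train = pd.DataFrame(raw)
print(isinstance(train, pd.DataFrame))  # True
print(train.shape)                      # (5, 5)

# Assumption: the column used as index='Id' should identify each row uniquely.
print(train['Id'].is_unique)            # True
```

Running a check like this before calling add_dataframe makes it easier to tell whether a problem comes from the input data or from the EntitySet itself.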


All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0, and CC BY-SA 4.0.
