I’m trying to create a custom transformer that splits a column into multiple columns, and I also want to be able to pass in the delimiter.
Here is the code I wrote for the transformer:
    class StringSplitTransformer(BaseEstimator, TransformerMixin):
        def __init__(self, cols=None):
            self.cols = cols

        def transform(self, df, delim):
            X = df.copy()
            for col in self.cols:
                X = pd.concat([X, X[col].str.split(delim, expand=True)], axis=1)
            return X

        def fit(self, *_):
            return self
When I run fit() and transform() separately, it all works fine:
    split_trans = StringSplitTransformer(cols=['Cabin'])
    split_trans.fit(df)
    split_trans.transform(df, '/')
But when I run fit_transform() it gives me an error:
    split_trans.fit_transform(X_train, '/')
    TypeError: transform() missing 1 required positional argument: 'delim'
If I remove the delim parameter from transform() and hard-code the delimiter instead, fit_transform() works.
I don’t understand why it does that.
Answers:
Method 1
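fit should accept at least two arguments: a positional X and an optional y=None. TransformerMixin.fit_transform(X, y) calls fit(X, y) and then transform(X) with no extra arguments, so in your call '/' is passed to fit as y and transform never receives delim. I would instead make delim a constructor parameter stored on the instance: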
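    import pandas as pd
    from sklearn.base import BaseEstimator, TransformerMixin

    class StringSplitTransformer(BaseEstimator, TransformerMixin):
        def __init__(self, delim, cols=None):
            # Store constructor arguments unchanged so get_params()/clone() work.
            self.delim = delim
            self.cols = cols

        def fit(self, df, y=None):
            # Nothing to learn; return self so calls can be chained.
            return self

        def transform(self, df):
            X = df.copy()
            for col in self.cols:
                # Append the split pieces as new columns named 0, 1, 2, ...
                X = pd.concat([X, X[col].str.split(self.delim, expand=True)],
                              axis=1)
            return X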
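As a quick check, the sketch below repeats the class above and calls fit_transform() on a small DataFrame. The sample Cabin values are made up for illustration and are not data from the question.

```python
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin


class StringSplitTransformer(BaseEstimator, TransformerMixin):
    def __init__(self, delim, cols=None):
        self.delim = delim
        self.cols = cols

    def fit(self, df, y=None):
        return self

    def transform(self, df):
        X = df.copy()
        for col in self.cols:
            X = pd.concat([X, X[col].str.split(self.delim, expand=True)], axis=1)
        return X


# Hypothetical sample data; any column of delimiter-separated strings would do.
df = pd.DataFrame({"Cabin": ["B/0/P", "F/1/S", "A/2/S"]})

# The delimiter is now configuration on the instance, so fit_transform()
# only needs the data.
split_trans = StringSplitTransformer(delim="/", cols=["Cabin"])
result = split_trans.fit_transform(df)
print(result)
```

Because the delimiter is passed to the constructor, the same instance can also be used as a step inside a Pipeline, where transform() is only ever called with the data.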
All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0, and CC BY-SA 4.0.