I have a categorical variable in a series. I want to assign integer ids to each unique value and create a new series with the ids, effectively turning a string variable into an integer variable. What is the most compact/efficient way to do this?
Answers:
Method 1
You could use pandas.factorize:
In [32]: s = pd.Series(['a','b','c'])
In [33]: labels, levels = pd.factorize(s)
In [35]: labels
Out[35]: array([0, 1, 2])
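As a minimal sketch of how this extends to a series with repeated values: pd.factorize returns the integer codes together with the unique values, so the codes can be wrapped in a new series and the original strings recovered later. The sample data and the names ids and restored are illustrative assumptions, not part of the original answer.

```python
import pandas as pd

# Illustrative data with repeated values (an assumption, not from the answer)
s = pd.Series(['b', 'a', 'b', 'c', 'a'])

# codes: one integer id per element; uniques: the distinct values,
# in order of first appearance
codes, uniques = pd.factorize(s)

# New integer series that keeps the original index
ids = pd.Series(codes, index=s.index)

# Map the ids back to the original strings by indexing into uniques
restored = pd.Series(uniques[codes], index=s.index)
```

Ids are assigned in order of first appearance here; passing sort=True to pd.factorize assigns them in sorted order of the values instead.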
Method 2
Example using the categorical type introduced in pandas 0.15:
http://pandas.pydata.org/pandas-docs/version/0.16.2/categorical.html
In [553]: x = pd.Series(['a', 'a', 'a', 'b', 'b', 'c']).astype('category')
In [554]: x
Out[554]:
0 a
1 a
2 a
3 b
4 b
5 c
dtype: category
Categories (3, object): [a, b, c]
In [555]: x.cat.codes
Out[555]:
0 0
1 0
2 0
3 1
4 1
5 2
dtype: int8
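A short sketch of two behaviors worth knowing when using the codes as ids, assuming string data that contains a missing value: the categories are sorted when the values are sortable, so the codes follow sorted order rather than order of appearance, and missing values get the code -1. The sample data and variable names are illustrative assumptions.

```python
import pandas as pd

# Illustrative data with a missing value (an assumption, not from the answer)
s = pd.Series(['b', 'a', None, 'b', 'c'])

cat = s.astype('category')

# Integer codes as a new series; the missing entry is coded as -1
codes = cat.cat.codes

# The code -> value lookup; categories are sorted, so 'a' is 0, 'b' is 1, 'c' is 2
categories = list(cat.cat.categories)
```

If the -1 sentinel is unwanted, one option is to fill or drop the missing values before converting.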
All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.