I have two Series, s1 and s2, with the same (non-consecutive) indices. How do I combine s1 and s2 into two columns of a DataFrame and keep one of the indices as a third column?
Answers:
Method 1
I think concat is a nice way to do this. If they are present it uses the name attributes of the Series as the columns (otherwise it simply numbers them):
In [1]: s1 = pd.Series([1, 2], index=['A', 'B'], name='s1')
In [2]: s2 = pd.Series([3, 4], index=['A', 'B'], name='s2')
In [3]: pd.concat([s1, s2], axis=1)
Out[3]:
   s1  s2
A   1   3
B   2   4
In [4]: pd.concat([s1, s2], axis=1).reset_index()
Out[4]:
  index  s1  s2
0     A   1   3
1     B   2   4
Note: This extends to more than 2 Series.
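To illustrate the note above, here is a minimal sketch with three Series. The third Series (s3) and the column name "key" are made up for the example; they are not part of the original answer.

```python
import pandas as pd

s1 = pd.Series([1, 2], index=["A", "B"], name="s1")
s2 = pd.Series([3, 4], index=["A", "B"], name="s2")
s3 = pd.Series([5, 6], index=["A", "B"], name="s3")  # hypothetical third Series

# concat aligns on the index; reset_index moves it into a column called "index"
df = pd.concat([s1, s2, s3], axis=1).reset_index()

# Optional: give the former index a friendlier (made-up) name
df = df.rename(columns={"index": "key"})
print(df)
```

The rename step is only cosmetic; without it the column keeps the default name "index".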
Method 2
Why don’t you just use .to_frame if both have the same indexes?
>= v0.23
a.to_frame().join(b)
< v0.23
a.to_frame().join(b.to_frame())
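For completeness, a small runnable sketch of the join approach. The data is invented for the example. Note that both Series are given a name here: join needs a named Series to know what to call the new column.

```python
import pandas as pd

# Made-up data with a shared, non-consecutive index
a = pd.Series([1, 2, 3], index=[7, 2, 8], name="a")
b = pd.Series([4, 5, 6], index=[7, 2, 8], name="b")

# to_frame turns a into a one-column DataFrame, join adds b aligned on the index,
# and reset_index moves the shared index into a regular column
df = a.to_frame().join(b).reset_index()
print(df)
```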
Method 3
Pandas will automatically align the Series passed in and create the joint index. The indexes happen to be the same here. reset_index moves the index to a column.
In [2]: s1 = Series(randn(5),index=[1,2,4,5,6])
In [4]: s2 = Series(randn(5),index=[1,2,4,5,6])
In [8]: DataFrame(dict(s1 = s1, s2 = s2)).reset_index()
Out[8]:
   index        s1        s2
0      1 -0.176143  0.128635
1      2 -1.286470  0.908497
2      4 -0.995881  0.528050
3      5  0.402241  0.458870
4      6  0.380457  0.072251
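The session above relies on Series, DataFrame and randn already being imported into the namespace. A self-contained version might look like the sketch below; the use of NumPy's default_rng and the seed are my own choices, so the random values will differ from the output shown.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # seed picked arbitrarily for reproducibility
idx = [1, 2, 4, 5, 6]           # non-consecutive index shared by both Series

s1 = pd.Series(rng.standard_normal(5), index=idx)
s2 = pd.Series(rng.standard_normal(5), index=idx)

# The dict keys become the column names; reset_index moves the index to a column
df = pd.DataFrame({"s1": s1, "s2": s2}).reset_index()
print(df)
```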
Method 4
If I may answer this.
The key to converting Series into a DataFrame is to understand two things:
1. At a conceptual level, every column in a DataFrame is a Series.
2. Every column name is a key that maps to a Series.
If you keep these two concepts in mind, you can think of many ways to convert Series into a DataFrame.
One easy solution looks like this:
Create two series here
import pandas as pd
series_1 = pd.Series(list(range(10)))
series_2 = pd.Series(list(range(20,30)))
Create an empty data frame with just desired column names
df = pd.DataFrame(columns = ['Column_name#1', 'Column_name#2'])
Put series value inside data frame using mapping concept
df['Column_name#1'] = series_1
df['Column_name#2'] = series_2
Check results now
df.head(5)
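One thing worth knowing about the mapping approach: assigning a Series to a column aligns it by index label, not by position. The sketch below uses made-up Series with a non-consecutive index, and the second one is deliberately in a different order (my assumption, to make the alignment visible).

```python
import pandas as pd

series_1 = pd.Series([10, 20, 30], index=[7, 2, 8])
series_2 = pd.Series([1, 2, 3], index=[8, 7, 2])  # same labels, different order

# Start from a frame that already carries the desired index
df = pd.DataFrame(index=series_1.index)
df["Column_name#1"] = series_1
df["Column_name#2"] = series_2  # matched by label: 7 -> 2, 2 -> 3, 8 -> 1

# Move the index into a regular column, as the question asks
df = df.reset_index()
print(df)
```

If positional assignment is what you want instead, pass the raw values (for example series_2.to_numpy()) rather than the Series itself.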
Method 5
Example code:
a = pd.Series([1,2,3,4], index=[7,2,8,9])
b = pd.Series([5,6,7,8], index=[7,2,8,9])
data = pd.DataFrame({'a': a,'b':b, 'idx_col':a.index})
Pandas allows you to create a DataFrame from a dict with Series as the values and the column names as the keys. When it finds a Series as a value, it uses the Series index as part of the DataFrame index. This data alignment is one of the main perks of Pandas. Consequently, unless you have other needs, the freshly created DataFrame holds the same values twice: in the above example, data['idx_col'] has the same data as data.index.
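A quick way to check the duplication, plus an alternative that avoids it by moving the index into a column instead of copying it. The name idx_col is reused from the example above; the rename_axis/reset_index variant is my suggestion, not part of the original answer.

```python
import pandas as pd

a = pd.Series([1, 2, 3, 4], index=[7, 2, 8, 9])
b = pd.Series([5, 6, 7, 8], index=[7, 2, 8, 9])

data = pd.DataFrame({"a": a, "b": b, "idx_col": a.index})

# The extra column and the DataFrame index hold the same labels
duplicated = data["idx_col"].tolist() == data.index.tolist()
print(duplicated)

# Alternative sketch: name the index, then move it into a column
alt = pd.DataFrame({"a": a, "b": b}).rename_axis("idx_col").reset_index()
print(alt)
```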
Method 6
Not sure I fully understand your question, but is this what you want to do?
pd.DataFrame(data=dict(s1=s1, s2=s2), index=s1.index)
(index=s1.index is not even necessary here)
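A small check of the remark above, using made-up Series that share an index: the result should be the same with or without the explicit index argument.

```python
import pandas as pd

s1 = pd.Series([1, 2], index=["A", "B"])
s2 = pd.Series([3, 4], index=["A", "B"])

with_index = pd.DataFrame(data=dict(s1=s1, s2=s2), index=s1.index)
without_index = pd.DataFrame(dict(s1=s1, s2=s2))

# equals compares values, index, columns and dtypes
identical = with_index.equals(without_index)
print(identical)
```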
Method 7
A simplification of the solution based on join():
df = a.to_frame().join(b)
Method 8
If you are trying to join Series of equal length but their indexes don’t match (which is a common scenario), then concatenating them will generate NAs wherever they don’t match.
x = pd.Series({'a':1,'b':2,})
y = pd.Series({'d':4,'e':5})
pd.concat([x,y],axis=1)
#Output (I've added column names for clarity)
Index x y
a 1.0 NaN
b 2.0 NaN
d NaN 4.0
e NaN 5.0
Assuming that you don’t care if the indexes match, the solution is to reset the index of both Series before concatenating them. With drop=False, which is the default, reset_index saves the old index in a column of the result (the indexes are dropped here for simplicity).
pd.concat([x.reset_index(drop=True),y.reset_index(drop=True)],axis=1)
#Output (column names added):
Index x y
0 1 4
1 2 5
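Here is a runnable sketch of both variants side by side. The keys argument is my addition: it supplies the column names that were added by hand in the outputs above.

```python
import pandas as pd

x = pd.Series({"a": 1, "b": 2})
y = pd.Series({"d": 4, "e": 5})

# Label-based alignment: four rows, NaN wherever a label is missing
aligned = pd.concat([x, y], axis=1, keys=["x", "y"])

# Positional pairing: drop the old labels first, then concatenate
positional = pd.concat(
    [x.reset_index(drop=True), y.reset_index(drop=True)],
    axis=1,
    keys=["x", "y"],
)
print(aligned)
print(positional)
```

Keep in mind that the positional version pairs rows purely by order, so it only makes sense when the two Series are already sorted consistently.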
Method 9
I used pandas to convert my numpy array or Series to a DataFrame, then added the additional column under the key ‘prediction’. If you need the DataFrame converted back to a list, use values.tolist().
output = pd.DataFrame(X_test)
output['prediction'] = y_pred
# named rows rather than list, to avoid shadowing the built-in list
rows = output.values.tolist()
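The snippet above assumes X_test and y_pred already exist. A self-contained sketch with small placeholder arrays (the values are invented for the example) could look like this:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for the feature matrix and the model predictions
X_test = np.array([[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]])
y_pred = np.array([0, 1, 0])

output = pd.DataFrame(X_test)
output["prediction"] = y_pred  # array values are assigned by position

# Back to a plain list of rows; mixed int/float columns come out as floats
rows = output.values.tolist()
print(rows)
```

If y_pred were a Series with its own index rather than an array, the assignment would align by label instead of by position, which can introduce NaN where labels do not match.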
All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.