I am working with pandas dataframes that are essentially time series like this:
            level
Date
1976-01-01  409.67
1976-02-01  409.58
1976-03-01  409.66
…
What I want is multiple index levels/headers for the level column, like so:
Station1 #Name of the datasource
43.1977317,-4.6473648,5 #Lat/Lon of the source
Precip #Type of data
Date
1976-01-01 409.67
1976-02-01 409.58
1976-03-01 409.66
…
So essentially I am searching for something like Mydata.columns.level1 = ['Station1'], Mydata.columns.level2 = [Lat,Lon], Mydata.columns.level3 = ['Precip'].
The reason is that a single location can have multiple datasets, and I want to be able to pick either all data from one location, or all data of a certain type from all locations, out of one large dataframe that is merged later.
I can set up an example dataframe from the pandas documentation and test my selection on it, but for my real data I need a different way to set the column levels than the one used in the example.
Example:
Build a small dataframe:
import numpy as np
import pandas as pd
header = [np.array(['location','location','location','location2','location2','location2']),
          np.array(['S1','S2','S3','S1','S2','S3'])]
df = pd.DataFrame(np.random.randn(5, 6), index=['a','b','c','d','e'], columns=header)
df
location location2
S1 S2 S3 S1 S2 S3
a -1.469932 -1.544511 -1.373463 -0.317262 0.024832 -0.641000
b 0.047170 -0.339423 1.351253 0.601172 -1.607339 0.035932
c -0.257479 1.140829 0.188291 -0.242490 1.019315 -1.163429
d 0.832949 0.098170 -0.818513 -0.070383 0.557419 -0.489839
e -0.628549 -0.158419 0.366167 -2.319316 -0.474897 -0.319549
Pick datatype or location:
df.loc(axis=1)[:,'S1']
location location2
S1 S1
a -1.469932 -0.317262
b 0.047170 0.601172
c -0.257479 -0.242490
d 0.832949 -0.070383
e -0.628549 -2.319316
df['location']
S1 S2 S3
a -1.469932 -1.544511 -1.373463
b 0.047170 -0.339423 1.351253
c -0.257479 1.140829 0.188291
d 0.832949 0.098170 -0.818513
e -0.628549 -0.158419 0.366167
Or am I just looking for the wrong terminology? Because 90% of all examples in the documentation, and the questions here only treat the vertical “stuff” (dates or abcde in my case) as index, and a quick df.index.values on my test data also just gets me the vertical array(['a', 'b', 'c', 'd', 'e'], dtype=object).
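On the terminology: pandas calls the vertical labels the index (`df.index`) and the horizontal labels the columns (`df.columns`), and both can be a `MultiIndex`. The short sketch below rebuilds a frame with the same header layout as the test data (zeros instead of random values, which is my substitution) and inspects the column labels rather than the row labels.

```python
import numpy as np
import pandas as pd

# Same two-level header layout as the test frame; values are zeros for brevity.
header = pd.MultiIndex.from_product(
    [["location", "location2"], ["S1", "S2", "S3"]]
)
df = pd.DataFrame(np.zeros((5, 6)), index=list("abcde"), columns=header)

row_labels = df.index      # vertical labels: a, b, c, d, e
col_labels = df.columns    # horizontal labels: a two-level MultiIndex

n_levels = col_labels.nlevels                          # number of header rows
top_level = col_labels.get_level_values(0).unique()    # distinct locations
print(type(col_labels).__name__, n_levels, list(top_level))
```

So the search term for the "horizontal stuff" would be something like "MultiIndex columns" or "hierarchical columns".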
Answers:
Method 1
You can use a MultiIndex to give the columns several levels, each with its own name. Use MultiIndex.from_product() to build a MultiIndex from the Cartesian product of multiple iterables.
header = pd.MultiIndex.from_product([['location1','location2'],
['S1','S2','S3']],
names=['loc','S'])
df = pd.DataFrame(np.random.randn(5, 6),
index=['a','b','c','d','e'],
columns=header)
The two column levels are named loc and S.
df
loc location1                     location2
S          S1        S2        S3        S1        S2        S3
a   -1.245988  0.858071 -1.433669  0.105300 -0.630531 -0.148113
b    1.132016  0.318813  0.949564 -0.349722 -0.904325  0.443206
c   -0.017991  0.032925  0.274248  0.326454 -0.108982  0.567472
d    2.363533 -1.676141  0.562893  0.967338 -1.071719 -0.321113
e    1.921324  0.110705  0.023244 -0.432196  0.172972  -0.50368
Now you can use xs to slice the DataFrame based on its levels.
df.xs('location1',level='loc',axis=1)
S S1 S2 S3
a -1.245988 0.858071 -1.433669
b 1.132016 0.318813 0.949564
c -0.017991 0.032925 0.274248
d 2.363533 -1.676141 0.562893
e 1.921324 0.110705 0.023244
df.xs('S1',level='S',axis=1)
loc location1 location2
a -1.245988 0.105300
b 1.132016 -0.349722
c -0.017991 0.326454
d 2.363533 0.967338
e 1.921324 -0.432196
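For the real data in the question, where each frame arrives with a single `level` column, one possible approach is to overwrite that column label with a three-level tuple and then concatenate the frames side by side. The sketch below is only an illustration under assumptions: the helper `with_header`, the level names `station`, `coords` and `kind`, the choice to store the coordinates as one string, and everything about the `Temp` and `Station2` frames (values and coordinates) are made up for the example. Only the Station1 precipitation values come from the question.

```python
import pandas as pd

LEVEL_NAMES = ["station", "coords", "kind"]  # hypothetical level names


def with_header(frame, station, coords, kind):
    """Return a copy of a one-column frame with a three-level column header."""
    out = frame.copy()
    out.columns = pd.MultiIndex.from_tuples(
        [(station, coords, kind)], names=LEVEL_NAMES
    )
    return out


dates = pd.to_datetime(["1976-01-01", "1976-02-01", "1976-03-01"])
dates.name = "Date"

# Values from the question.
raw_precip = pd.DataFrame({"level": [409.67, 409.58, 409.66]}, index=dates)
# Placeholder values, invented purely to have something to merge with.
raw_temp = pd.DataFrame({"level": [1.0, 2.0, 3.0]}, index=dates)
raw_other = pd.DataFrame({"level": [4.0, 5.0, 6.0]}, index=dates)

s1_coords = "43.1977317,-4.6473648,5"
s2_coords = "0.0,0.0,0"  # placeholder coordinates

merged = pd.concat(
    [
        with_header(raw_precip, "Station1", s1_coords, "Precip"),
        with_header(raw_temp, "Station1", s1_coords, "Temp"),
        with_header(raw_other, "Station2", s2_coords, "Precip"),
    ],
    axis=1,
)

# All data from one location, and all data of one type from every location.
all_station1 = merged.xs("Station1", level="station", axis=1)
all_precip = merged.xs("Precip", level="kind", axis=1)
print(all_station1)
print(all_precip)
```

Keeping latitude and longitude in a single string level mirrors the layout sketched in the question. Splitting them into separate levels would work the same way with longer tuples.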
All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.