I’ve got a Pandas DataFrame and I want to combine the ‘lat’ and ‘long’ columns to form a tuple.
<class 'pandas.core.frame.DataFrame'> Int64Index: 205482 entries, 0 to 209018 Data columns: Month 205482 non-null values Reported by 205482 non-null values Falls within 205482 non-null values Easting 205482 non-null values Northing 205482 non-null values Location 205482 non-null values Crime type 205482 non-null values long 205482 non-null values lat 205482 non-null values dtypes: float64(4), object(5)
The code I tried to use was:
def merge_two_cols(series):
return (series['lat'], series['long'])
sample['lat_long'] = sample.apply(merge_two_cols, axis=1)
However, this returned the following error:
---------------------------------------------------------------------------
AssertionError Traceback (most recent call last)
<ipython-input-261-e752e52a96e6> in <module>()
2 return (series['lat'], series['long'])
3
----> 4 sample['lat_long'] = sample.apply(merge_two_cols, axis=1)
5
…
AssertionError: Block shape incompatible with manager
How can I solve this problem?
Answers:
Thank you for visiting the Q&A section on Magenaut. Please note that all the answers may not help you solve the issue immediately. So please treat them as advisements. If you found the post helpful (or not), leave a comment & I’ll get back to you as soon as possible.
Method 1
Get comfortable with zip. It comes in handy when dealing with column data.
df['new_col'] = list(zip(df.lat, df.long))
It’s less complicated and faster than using apply or map. Something like np.dstack is twice as fast as zip, but wouldn’t give you tuples.
Method 2
In [10]: df
Out[10]:
A B lat long
0 1.428987 0.614405 0.484370 -0.628298
1 -0.485747 0.275096 0.497116 1.047605
2 0.822527 0.340689 2.120676 -2.436831
3 0.384719 -0.042070 1.426703 -0.634355
4 -0.937442 2.520756 -1.662615 -1.377490
5 -0.154816 0.617671 -0.090484 -0.191906
6 -0.705177 -1.086138 -0.629708 1.332853
7 0.637496 -0.643773 -0.492668 -0.777344
8 1.109497 -0.610165 0.260325 2.533383
9 -1.224584 0.117668 1.304369 -0.152561
In [11]: df['lat_long'] = df[['lat', 'long']].apply(tuple, axis=1)
In [12]: df
Out[12]:
A B lat long lat_long
0 1.428987 0.614405 0.484370 -0.628298 (0.484370195967, -0.6282975278)
1 -0.485747 0.275096 0.497116 1.047605 (0.497115615839, 1.04760475074)
2 0.822527 0.340689 2.120676 -2.436831 (2.12067574274, -2.43683074367)
3 0.384719 -0.042070 1.426703 -0.634355 (1.42670326172, -0.63435462504)
4 -0.937442 2.520756 -1.662615 -1.377490 (-1.66261469102, -1.37749004179)
5 -0.154816 0.617671 -0.090484 -0.191906 (-0.0904840623396, -0.191905582481)
6 -0.705177 -1.086138 -0.629708 1.332853 (-0.629707821728, 1.33285348929)
7 0.637496 -0.643773 -0.492668 -0.777344 (-0.492667604075, -0.777344111021)
8 1.109497 -0.610165 0.260325 2.533383 (0.26032456699, 2.5333825651)
9 -1.224584 0.117668 1.304369 -0.152561 (1.30436900612, -0.152560909725)
Method 3
Pandas has the itertuples method to do exactly this:
list(df[['lat', 'long']].itertuples(index=False, name=None))
Method 4
You should try using pd.to_records(index=False):
import pandas as pd
df = pd.DataFrame({'language': ['en', 'ar', 'es'], 'greeting': ['Hi', 'اهلا', 'Hola']})
df
language greeting
0 en Hi
1 ar اهلا
2 es Hola
df['list_of_tuples'] = list(df[['language', 'greeting']].to_records(index=False))
df['list_of_tuples']
0 [en, Hi]
1 [ar, اهلا]
2 [es, Hola]
enjoy!
Method 5
I’d like to add df.values.tolist(). (as long as you don’t mind to get a column of lists rather than tuples)
import pandas as pd
import numpy as np
size = int(1e+07)
df = pd.DataFrame({'a': np.random.rand(size), 'b': np.random.rand(size)})
%timeit df.values.tolist()
1.47 s ± 38.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%timeit list(zip(df.a,df.b))
1.92 s ± 131 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
All methods was sourced from stackoverflow.com or stackexchange.com, is licensed under cc by-sa 2.5, cc by-sa 3.0 and cc by-sa 4.0