DataFrame:
  c_os_family_ss c_os_major_is  l_customer_id_i
0      Windows 7                          90418
1      Windows 7                          90418
2      Windows 7                          90418
Code:
print(df)
for name, group in df.groupby('l_customer_id_i').agg(lambda x: ','.join(x)):
    print(name)
    print(group)
I’m trying to just loop over the aggregated data, but I get the error:
ValueError: too many values to unpack
@EdChum, here’s the expected output:
                                                    c_os_family_ss
l_customer_id_i
131572           Windows 7,Windows 7,Windows 7,Windows 7,Window...
135467           Windows 7,Windows 7,Windows 7,Windows 7,Window...
                                                     c_os_major_is
l_customer_id_i
131572           ,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,...
135467           ,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,...
The output is not the problem; I want to loop over every group.
Answers:
Method 1
df.groupby('l_customer_id_i').agg(lambda x: ','.join(x)) already returns a DataFrame, so you can no longer loop over the groups.
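To see where the ValueError comes from, here is a minimal sketch, assuming a small stand-in for the question's data (the values are illustrative). Iterating over a DataFrame yields its column labels, so the loop tries to unpack a single string label into name and group:

```python
import pandas as pd

# Illustrative stand-in for the question's data, not the real dataset.
df = pd.DataFrame({
    'c_os_family_ss': ['Windows 7', 'Windows 7', 'Windows 7'],
    'c_os_major_is': ['', '', ''],
    'l_customer_id_i': [90418, 90418, 90418],
})

agg = df.groupby('l_customer_id_i').agg(lambda x: ','.join(x))

# Iterating over a DataFrame yields column labels, not (name, group) pairs.
labels = list(agg)
print(labels)

# Unpacking one long string label into two names raises the ValueError.
try:
    name, group = labels[0]
    unpack_failed = False
except ValueError as exc:
    unpack_failed = True
    print(exc)
```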
In general:
- df.groupby(...) returns a GroupBy object (a DataFrameGroupBy or SeriesGroupBy), and with this you can iterate through the groups (as explained in the docs). You can do something like:

    grouped = df.groupby('A')
    for name, group in grouped:
        ...

- When you apply a function on the groupby, in your example df.groupby(...).agg(...) (but this can also be transform, apply, mean, …), you combine the results of applying the function to the different groups into one DataFrame (the apply and combine steps of the 'split-apply-combine' paradigm of groupby). So the result will always be a DataFrame again (or a Series, depending on the applied function).
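The two cases can be put side by side in a short sketch. It assumes made-up data with two customer ids, and the dictionary name joined is only for illustration. The loop runs over the GroupBy object itself, and the aggregation is done afterwards on the same object:

```python
import pandas as pd

# Illustrative data with two customers; values are assumptions.
df = pd.DataFrame({
    'c_os_family_ss': ['Windows 7', 'Windows 7', 'Windows 8'],
    'c_os_major_is': ['', '', ''],
    'l_customer_id_i': [90418, 90418, 90419],
})

# Split step only: a GroupBy object, which can still be iterated by group.
grouped = df.groupby('l_customer_id_i')

joined = {}
for name, group in grouped:
    # name is the group key, group is the sub-DataFrame for that key.
    joined[name] = ','.join(group['c_os_family_ss'])
    print(name, joined[name])

# Apply and combine steps: the result is an ordinary DataFrame.
result = grouped.agg(lambda x: ','.join(x))
print(type(result))
```

If the per-group work fits inside the loop, the aggregated DataFrame may not be needed at all.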
Method 2
Here is an example of iterating over a pd.DataFrame grouped by the column atable. For this sample, “create” statements for an SQL database are generated within the for loop:
import pandas as pd

df1 = pd.DataFrame({
    'atable':     ['Users', 'Users', 'Domains', 'Domains', 'Locks'],
    'column':     ['col_1', 'col_2', 'col_a', 'col_b', 'col'],
    'column_type':['varchar', 'varchar', 'int', 'varchar', 'varchar'],
    'is_null':    ['No', 'No', 'Yes', 'No', 'Yes'],
})

df1_grouped = df1.groupby('atable')

# iterate over each group
for group_name, df_group in df1_grouped:
    print('\nCREATE TABLE {}('.format(group_name))

    for row_index, row in df_group.iterrows():
        col = row['column']
        column_type = row['column_type']
        is_null = 'NOT NULL' if row['is_null'] == 'No' else ''
        print('\t{} {} {},'.format(col, column_type, is_null))

    print(");")
Method 3
You can iterate over the index values if your dataframe has already been created.
df = df.groupby('l_customer_id_i').agg(lambda x: ','.join(x))
for name in df.index:
    print(name)
    print(df.loc[name])
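A closely related variant, sketched here with made-up data, is DataFrame.iterrows(), which yields (index label, row) pairs. On the aggregated frame this gives a loop shaped like the name/group loop in the question, except that each row is a Series of already joined strings rather than a sub-DataFrame:

```python
import pandas as pd

# Illustrative data with two customers; values are assumptions.
df = pd.DataFrame({
    'c_os_family_ss': ['Windows 7', 'Windows 7', 'Windows 8'],
    'c_os_major_is': ['', '', ''],
    'l_customer_id_i': [90418, 90418, 90419],
})

agg = df.groupby('l_customer_id_i').agg(lambda x: ','.join(x))

rows = []
for name, row in agg.iterrows():
    # name is the customer id from the index, row holds the joined strings.
    rows.append((name, row['c_os_family_ss']))
    print(name, row['c_os_family_ss'])
```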
All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.