pandas groupby

Categories

apply
aggregate
transform
filter

diff

df.groupby(pd.cut(df['val'],[0,3000,5000,7000,10000])).size()
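A minimal sketch of the binning line above. The data is made up; only the column name 'val' and the bin edges come from the notes. observed=False is passed explicitly so that empty bins still show up with a count of 0.

```python
import pandas as pd

# Hypothetical data; only the column name 'val' is taken from the notes.
df = pd.DataFrame({'val': [500, 2500, 3500, 4200, 9000]})

bins = [0, 3000, 5000, 7000, 10000]

# pd.cut labels each value with its interval: (0, 3000], (3000, 5000], ...
# Grouping by those labels and calling size() counts the rows per interval.
counts = df.groupby(pd.cut(df['val'], bins), observed=False).size()
print(counts)
```

No value falls in (5000, 7000] here, so that bin appears with a count of 0.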

df.groupby(
        df['sales rep'].apply(lambda x: 'william' in x)
        ).size().sort_values(ascending = False)
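A small sketch of grouping by a boolean condition instead of a column. The names in the data are invented for illustration; the group keys are simply True and False.

```python
import pandas as pd

# Hypothetical rep names; only the column name 'sales rep' is from the notes.
df = pd.DataFrame({'sales rep': ['william taylor', 'anna lee',
                                 'william brown', 'john smith',
                                 'julie william']})

# Boolean Series: True where the name contains 'william'.
is_william = df['sales rep'].apply(lambda x: 'william' in x)

# Rows are split into a True group and a False group, then counted.
counts = df.groupby(is_william).size().sort_values(ascending=False)
print(counts)
```

For plain substring checks like this one, df['sales rep'].str.contains('william') should build the same mask without a lambda.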

Sums each column over all rows (axis = 0 gives one total per column):

df.apply(sum, axis = 0)
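A quick sketch of what the axis argument does, on an invented two-column frame. axis=0 walks down each column, axis=1 walks across each row.

```python
import pandas as pd

# Hypothetical numeric frame, used only for illustration.
df = pd.DataFrame({'val': [10, 20, 30], 'sales': [1, 2, 3]})

col_totals = df.apply(sum, axis=0)  # one total per column
row_totals = df.apply(sum, axis=1)  # one total per row

print(col_totals)
print(row_totals)
```

For a plain sum, df.sum(axis=0) gives the same totals and is the more usual spelling.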

aggregation:
size
sum
mean / median
max / min
idxmax / idxmin ==> index of the maximum / minimum
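idxmax / idxmin differ from the others in the list: they return row labels, not values. A minimal sketch with made-up data, showing how the labels can be fed back into df.loc to pull the full rows:

```python
import pandas as pd

# Hypothetical data; default integer index 0..3.
df = pd.DataFrame({
    'sales rep': ['ann', 'ann', 'bob', 'bob'],
    'val': [100, 300, 250, 50],
})

# For each rep, the index label of the row with the largest 'val'.
best_idx = df.groupby('sales rep')['val'].idxmax()

# Use those labels to fetch the complete rows.
best_rows = df.loc[best_idx]
print(best_rows)
```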

agg({'orderId':'size' , 'val':['sum','mean'] , 'sale':['sum','mean']})
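A sketch of the dict form of agg on a grouped frame. The grouping column 'sales rep' and all values are assumptions made for the example; the notes only give the dict itself.

```python
import pandas as pd

# Hypothetical data; grouping by 'sales rep' is an assumption.
df = pd.DataFrame({
    'sales rep': ['ann', 'ann', 'bob'],
    'orderId': [1, 2, 3],
    'val': [100, 300, 250],
    'sale': [1, 3, 2],
})

# Each key is a column, each value is one function name or a list of them.
summary = df.groupby('sales rep').agg(
    {'orderId': 'size', 'val': ['sum', 'mean'], 'sale': ['sum', 'mean']}
)
print(summary)
```

Because some columns get a list of functions, the result has two-level column names such as ('val', 'sum').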

with names:

df.groupby('Column0').agg(Name1=('Column1','count') , 
                    Name2=('Column2' , 'nunique' ))
gr = {'name':('column1','sum') , 'name2':('column2','size')}
df.groupby('Column0').agg(**gr)
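A minimal sketch of named aggregation, written both as keyword arguments and as an unpacked dict. Each output column is given as name=(input column, function). The data and the names 'total' and 'rows' are invented for illustration.

```python
import pandas as pd

# Hypothetical data using the placeholder column names from the notes.
df = pd.DataFrame({
    'Column0': ['a', 'a', 'b'],
    'Column1': [1, 2, 3],
    'Column2': ['x', 'x', 'y'],
})

# Keyword form: output name = (input column, aggregation).
named = df.groupby('Column0').agg(
    Name1=('Column1', 'count'),
    Name2=('Column2', 'nunique'),
)

# Dict form: same (column, function) tuples, unpacked with **.
spec = {'total': ('Column1', 'sum'), 'rows': ('Column2', 'size')}
unpacked = df.groupby('Column0').agg(**spec)

print(named)
print(unpacked)
```

The dict form is handy when the output names are built at runtime or are not valid Python identifiers.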

Transform:

df.groupby('sales rep')['val'].transform(lambda x:x/sum(x))
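A sketch of the transform line above with made-up numbers. transform returns one value per original row (same length and index as df), so the result can be assigned straight back as a new column. The column name 'share' is an assumption.

```python
import pandas as pd

# Hypothetical data.
df = pd.DataFrame({
    'sales rep': ['ann', 'ann', 'bob', 'bob'],
    'val': [100, 300, 250, 750],
})

# Each row's share of its own rep's total.
df['share'] = df.groupby('sales rep')['val'].transform(lambda x: x / x.sum())
print(df)
```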

Filter:
It works like HAVING in SQL: only the groups that satisfy the condition are kept.

df.groupby('sales rep').filter(lambda x: (x['val'] * x['sales']).sum() > 200000)
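A small sketch of filter on invented data. The function receives each group as a DataFrame and must return a single True or False; the result contains the original rows of the groups that passed, not one row per group.

```python
import pandas as pd

# Hypothetical data.
df = pd.DataFrame({
    'sales rep': ['ann', 'ann', 'bob', 'bob'],
    'val': [1000, 2000, 100, 200],
    'sales': [50, 80, 10, 20],
})

# ann: 1000*50 + 2000*80 = 210000 -> kept
# bob: 100*10 + 200*20 = 5000 -> dropped
kept = df.groupby('sales rep').filter(
    lambda x: (x['val'] * x['sales']).sum() > 200000
)
print(kept)
```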

diff()

df.groupby('UserID')['ShamsiDate'].diff()

Within each group, the second row holds its difference from the first row (and so on for later rows); the first row of each group is NaN.
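A minimal sketch of grouped diff. It assumes 'ShamsiDate' has already been converted to a numeric day count (a hypothetical preprocessing step, not shown in the notes); diff needs values that can be subtracted, so raw date strings would not work.

```python
import pandas as pd

# Hypothetical data: ShamsiDate stored as a numeric day count (assumption).
df = pd.DataFrame({
    'UserID': [1, 1, 1, 2, 2],
    'ShamsiDate': [10, 13, 20, 5, 6],
})

# Difference from the previous row of the same user.
gaps = df.groupby('UserID')['ShamsiDate'].diff()
print(gaps)
```

The first row of each user has no previous row, so it comes out as NaN.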