Answers for "aggregate one variable on a subset of the dataframe and the other on the total dataframe pandas"

Python

Groups the DataFrame using the specified columns

# Groups the DataFrame using the specified columns

df.groupBy().avg().collect()
# [Row(avg(age)=3.5)]
sorted(df.groupBy('name').agg({'age': 'mean'}).collect())
# [Row(name='Alice', avg(age)=2.0), Row(name='Bob', avg(age)=5.0)]
sorted(df.groupBy(df.name).avg().collect())
# [Row(name='Alice', avg(age)=2.0), Row(name='Bob', avg(age)=5.0)]
sorted(df.groupBy(['name', df.age]).count().collect())
# [Row(name='Alice', age=2, count=1), Row(name='Bob', age=5, count=1)]

Posted by: Guest on April-08-2020

Source

Aggregate on the entire DataFrame without group

# Aggregate on the entire DataFrame without group

df.agg({"age": "max"}).collect()
# [Row(max(age)=5)]
from pyspark.sql import functions as F
df.agg(F.min(df.age)).collect()
# [Row(min(age)=2)]

Posted by: Guest on April-20-2020

Source

Code answers related to "aggregate one variable on a subset of the dataframe and the other on the total dataframe pandas"

Code answers related to "Python"

Python Answers by Framework

Django
Flask

Browse Popular Code Answers by Language

Python

Javascript

Whatever

Shell/Bash

CSS

Html

PHP

SQL

Java

Answers for "aggregate one variable on a subset of the dataframe and the other on the total dataframe pandas"

Code answers related to "aggregate one variable on a subset of the dataframe and the other on the total dataframe pandas"

Code answers related to "Python"

Python Answers by Framework

Browse Popular Code Answers by Language

Popular Programming Languages

Advertisements

Company

Compilers

Help

Connect with us