A distributed collection of data grouped into named columns

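# A DataFrame is a distributed collection of data grouped into named columns.

# Create a DataFrame by reading Parquet data (sqlContext is an existing SQLContext)
people = sqlContext.read.parquet("...")

# Select a single column from the DataFrame (attribute access returns a Column)
ageCol = people.age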

# To create DataFrames using SQLContext
people = sqlContext.read.parquet("...")
department = sqlContext.read.parquet("...")

# People older than 30, joined to their department, then the average salary
# and maximum age for each (department name, gender) pair
people.filter(people.age > 30).join(
    department, people.deptId == department.id).groupBy(
    department.name, "gender").agg({"salary": "avg", "age": "max"})
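To make the query above concrete without a Spark cluster, here is a minimal plain-Python sketch of what it computes. It is not Spark code: the rows below are hypothetical in-memory stand-ins for the two Parquet datasets, and the function name avg_salary_max_age is made up for illustration. It mirrors the default inner join (people with no matching department are dropped) and uses the result column names Spark gives dictionary-style aggregations, avg(salary) and max(age).

```python
from collections import defaultdict

# Hypothetical rows standing in for the two Parquet datasets.
people = [
    {"age": 34, "gender": "F", "deptId": 1, "salary": 5000.0},
    {"age": 45, "gender": "M", "deptId": 1, "salary": 7000.0},
    {"age": 29, "gender": "M", "deptId": 2, "salary": 4000.0},
    {"age": 51, "gender": "F", "deptId": 1, "salary": 6000.0},
    {"age": 38, "gender": "F", "deptId": 9, "salary": 9000.0},  # no such department
]
department = [
    {"id": 1, "name": "Engineering"},
    {"id": 2, "name": "Sales"},
]


def avg_salary_max_age(people, department):
    """Plain-Python equivalent (hypothetical helper) of:
    people.filter(people.age > 30)
          .join(department, people.deptId == department.id)
          .groupBy(department.name, "gender")
          .agg({"salary": "avg", "age": "max"})
    """
    # Index departments by id; a list per id keeps duplicate ids behaving like a join.
    depts_by_id = defaultdict(list)
    for d in department:
        depts_by_id[d["id"]].append(d)

    groups = defaultdict(list)
    for p in people:
        if not p["age"] > 30:                  # filter(people.age > 30)
            continue
        for d in depts_by_id.get(p["deptId"], []):  # inner join: no match, no row
            groups[(d["name"], p["gender"])].append(p)

    # One output row per (department name, gender) group.
    return {
        key: {
            "avg(salary)": sum(r["salary"] for r in rows) / len(rows),
            "max(age)": max(r["age"] for r in rows),
        }
        for key, rows in groups.items()
    }


for key, aggs in sorted(avg_salary_max_age(people, department).items()):
    print(key, aggs)
# ('Engineering', 'F') {'avg(salary)': 5500.0, 'max(age)': 51}
# ('Engineering', 'M') {'avg(salary)': 7000.0, 'max(age)': 45}
```

Sales does not appear because its only employee is filtered out by the age condition, and the 38-year-old in department 9 disappears in the inner join. In Spark the same steps run lazily and in parallel across partitions; the sketch only shows the row-level semantics.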

Posted by: Guest on April-19-2020
