Answers for "drop duplicates subset"

Python

Return a new DataFrame with duplicate rows removed

# Return a new DataFrame with duplicate rows removed

from pyspark.sql import Row
df = sc.parallelize([
  Row(name='Alice', age=5, height=80),
  Row(name='Alice', age=5, height=80),
  Row(name='Alice', age=10, height=80)]).toDF()
df.dropDuplicates().show()
# +---+------+-----+
# |age|height| name|
# +---+------+-----+
# |  5|    80|Alice|
# | 10|    80|Alice|
# +---+------+-----+

df.dropDuplicates(['name', 'height']).show()
# +---+------+-----+
# |age|height| name|
# +---+------+-----+
# |  5|    80|Alice|
# +---+------+-----+

Posted by: Guest on April-08-2020

Source

droping Duplicates

# this is based on 2 factors the name of the dog and the breed of the dog so we can
# have 2 dog with the same name but diff breed.

df.drop_duplicates(subset=["name", "breed"])

Posted by: Guest on November-06-2021

Source

Code answers related to "drop duplicates subset"

Code answers related to "Python"

Python Answers by Framework

Django
Flask

Browse Popular Code Answers by Language

Python

Javascript

Whatever

Shell/Bash

CSS

Html

PHP

SQL

Java

Answers for "drop duplicates subset"

Code answers related to "drop duplicates subset"

Code answers related to "Python"

Python Answers by Framework

Browse Popular Code Answers by Language

Popular Programming Languages

Advertisements

Company

Compilers

Help

Connect with us