Answers for "remove duplicate pandas"

drop duplicates pandas first column

import pandas as pd 
  
# making data frame from csv file 
data = pd.read_csv("employees.csv") 
  
# sorting by first name 
data.sort_values("First Name", inplace = True) 
  
# dropping ALL duplicate values (keep=False removes every row whose name repeats)
data.drop_duplicates(subset="First Name", keep=False, inplace=True)
  
# displaying data 
print(data)
Posted by: Guest on June-28-2020
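
To see what `keep=False` does without needing `employees.csv`, here is a minimal sketch on a small made-up DataFrame (the names and teams are illustrative assumptions, not data from the answer above). It contrasts `keep=False`, which drops every row whose first name is repeated, with the default `keep="first"`, which retains one row per name.

```python
import pandas as pd

# Hypothetical sample data standing in for employees.csv
data = pd.DataFrame({
    "First Name": ["Ann", "Bob", "Ann", "Cara"],
    "Team": ["Sales", "IT", "HR", "IT"],
})

# keep=False: every row whose "First Name" occurs more than once is removed
no_repeats = data.drop_duplicates(subset="First Name", keep=False)

# keep="first" (the default): the first row for each name is kept
first_only = data.drop_duplicates(subset="First Name", keep="first")

print(no_repeats["First Name"].tolist())  # ['Bob', 'Cara']
print(first_only["First Name"].tolist())  # ['Ann', 'Bob', 'Cara']
```

If the goal is "one row per name" rather than "only names that never repeat", `keep="first"` or `keep="last"` is probably the option you want.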

pandas remove repeated index

# keep only the first row for each index label
df[~df.index.duplicated()]
Posted by: Guest on August-17-2020
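
The one-liner above works because `Index.duplicated()` returns a boolean mask that is `True` for the second and later occurrences of a label. A minimal sketch, assuming a small hypothetical DataFrame with a repeated index label:

```python
import pandas as pd

# Hypothetical frame with a repeated index label "a"
df = pd.DataFrame({"value": [1, 2, 3, 4]}, index=["a", "b", "a", "c"])

# duplicated() marks repeats as True; keep="first" is the default
mask = df.index.duplicated(keep="first")

# Invert the mask so only the first row for each label survives
deduped = df[~mask]

print(mask.tolist())             # [False, False, True, False]
print(deduped["value"].tolist())  # [1, 2, 4]
```

Passing `keep="last"` to `duplicated()` would keep the last row for each label instead.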

remove duplicate row in df

# drop rows that are identical across all columns
df = df.drop_duplicates()
Posted by: Guest on August-19-2020
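
One detail worth knowing: `drop_duplicates()` keeps the original index labels of the surviving rows, so the index can end up with gaps. A minimal sketch on a hypothetical DataFrame, using `reset_index(drop=True)` to renumber afterwards:

```python
import pandas as pd

# Hypothetical data: rows 0, 1 and 3 are identical
df = pd.DataFrame({"a": [1, 1, 2, 1], "b": ["x", "x", "y", "x"]})

# Rows are compared across all columns; the first occurrence is kept
deduped = df.drop_duplicates()
print(deduped.index.tolist())     # [0, 2]

# Renumber the rows 0..n-1 and discard the old labels
renumbered = deduped.reset_index(drop=True)
print(renumbered.index.tolist())  # [0, 1]
```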

pandas remove repeated index

import pandas as pd

idx = pd.Index(['lama', 'cow', 'lama', 'beetle', 'lama', 'hippo'])
idx.drop_duplicates(keep='first')
# Index(['lama', 'cow', 'beetle', 'hippo'], dtype='object')
idx.drop_duplicates(keep='last')
# Index(['cow', 'beetle', 'lama', 'hippo'], dtype='object')
idx.drop_duplicates(keep=False)  # the boolean False, not the string 'False'
# Index(['cow', 'beetle', 'hippo'], dtype='object')
Posted by: Guest on August-17-2020

Return a new DataFrame with duplicate rows removed

# Return a new DataFrame with duplicate rows removed (PySpark, not pandas).
# Assumes `sc` is an existing SparkContext, as in the pyspark shell.

from pyspark.sql import Row
df = sc.parallelize([
  Row(name='Alice', age=5, height=80),
  Row(name='Alice', age=5, height=80),
  Row(name='Alice', age=10, height=80)]).toDF()
df.dropDuplicates().show()
# +---+------+-----+
# |age|height| name|
# +---+------+-----+
# |  5|    80|Alice|
# | 10|    80|Alice|
# +---+------+-----+

df.dropDuplicates(['name', 'height']).show()
# +---+------+-----+
# |age|height| name|
# +---+------+-----+
# |  5|    80|Alice|
# +---+------+-----+
Posted by: Guest on April-08-2020
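
Since the question is about pandas, here is a rough pandas counterpart of the PySpark answer above, as a sketch rather than an exact equivalent. It reuses the same three rows; `drop_duplicates(subset=...)` plays the role of `dropDuplicates([...])`, and pandas keeps the first matching row by default.

```python
import pandas as pd

# Same three rows as the PySpark example
df = pd.DataFrame({
    "name": ["Alice", "Alice", "Alice"],
    "age": [5, 5, 10],
    "height": [80, 80, 80],
})

# Compare whole rows: the two identical (Alice, 5, 80) rows collapse into one
all_cols = df.drop_duplicates()
print(all_cols["age"].tolist())        # [5, 10]

# Compare only "name" and "height": a single row remains
by_name_height = df.drop_duplicates(subset=["name", "height"])
print(by_name_height["age"].tolist())  # [5]
```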

Pandas drop duplicates

# keep the first row for each distinct value in column "b"
df = df.drop_duplicates(subset=["b"])
Posted by: Guest on August-31-2021
