Answers for "clustering python"


clustering python

# HIERARCHICAL CLUSTERING

# import the linkage, dendrogram and fcluster functions from scipy
from scipy.cluster.hierarchy import dendrogram, linkage, fcluster
import matplotlib.pyplot as plt
import numpy as np
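# NOTE: the original answer assumes X already exists as an (n_samples, 2)
# array of observations. The two lines below are only an assumed example of
# how such toy data could be created (make_blobs is not part of the answer).
from sklearn.datasets import make_blobs
X, _ = make_blobs(n_samples=200, centers=4, n_features=2, random_state=42)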
# Create the linkage object; it contains all the information about the
# merges performed during the hierarchical clustering
# The "ward" argument is the linkage method (others: "single", "average", "complete"...)
Z = linkage(X, "ward")
# plot the dendrogram
plt.figure(figsize=(25, 30))
# color_threshold is the distance cutoff used to color the clusters
dendrogram(Z, leaf_font_size=8, color_threshold=10)
plt.show()
# fcluster returns an array as long as your data with the cluster each point belongs to
# You can cut the tree using different values of "criterion". Some examples:

# You want exactly 4 clusters:
k = 4
clusters = fcluster(Z, k, criterion="maxclust")

# You want the maximum (cophenetic) distance within a cluster to be 20:
max_d = 20
clusters = fcluster(Z, max_d, criterion="distance")

# Visualization of the clustering (2D data)
plt.figure(figsize=(10, 8))
# use the cluster labels to color the scatter plot
# cmap is the color palette used for the clusters
plt.scatter(X[:, 0], X[:, 1], c=clusters, cmap="brg")
plt.show()
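As a quick sanity check (my own addition, not part of the original answer), you can count how many points ended up in each cluster returned by fcluster:

# labels holds the distinct cluster ids, counts how many points fall in each
labels, counts = np.unique(clusters, return_counts=True)
print(dict(zip(labels, counts)))  # maps each cluster label to its size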
Posted by: Guest on August-11-2021

k-means clustering python

from sklearn.cluster import KMeans
kmeans = KMeans(init="random", n_clusters=3, n_init=10, max_iter=300, random_state=42)
kmeans.fit(x_train)  # replace x_train with your training dataset
# The lowest SSE value
print(kmeans.inertia_)
# Final locations of the centroids
print(kmeans.cluster_centers_)
# The number of iterations required to converge
print(kmeans.n_iter_)
# Cluster labels assigned to the first five training points
print(kmeans.labels_[:5])


# init controls the initialization technique. Setting init to "random" gives the standard version of the k-means algorithm; setting it to "k-means++" uses a smarter seeding scheme that usually speeds up convergence.

# n_clusters sets k, the number of clusters to find. This is the most important parameter for k-means.

# n_init sets the number of initializations to perform. This matters because two runs can converge to different cluster assignments. By default, scikit-learn performs ten k-means runs and returns the result of the one with the lowest SSE.

# max_iter sets the maximum number of iterations for each run of the k-means algorithm.
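Building on the notes above, here is a small sketch (my own example, not from the original answer) that combines the "k-means++" initialization with a loop over several values of n_clusters, recording the inertia (SSE) of each run so you can pick k with the elbow method. x_train is assumed to be your training data, as in the snippet above:

import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

sse = {}
for k in range(1, 11):
    km = KMeans(init="k-means++", n_clusters=k, n_init=10,
                max_iter=300, random_state=42)
    km.fit(x_train)        # x_train: your training data (assumption)
    sse[k] = km.inertia_   # lowest SSE reached for this k

# plot SSE against k and look for the "elbow" where the curve flattens
plt.plot(list(sse.keys()), list(sse.values()), marker="o")
plt.xlabel("number of clusters k")
plt.ylabel("SSE (inertia)")
plt.show()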
Posted by: Guest on September-11-2020
