clustering python

# HIERARCHICAL CLUSTERING

# Import the linkage, dendrogram and fcluster functions from SciPy
from scipy.cluster.hierarchy import dendrogram,linkage,fcluster
import matplotlib.pyplot as plt
import numpy as np
# Build the linkage matrix; it records every merge and the distance at
# which it happened. X is your data as a 2-D NumPy array (one row per observation).
# The "ward" argument is the linkage method (others include "single", "average", ...)
Z = linkage(X,"ward")
#plotting the dendrogram
plt.figure(figsize = (25,30))
# Color threshold refers to the distance cutoff for coloring the clusters
dendrogram(Z, leaf_font_size = 8, color_threshold = 10)
plt.show()
# fcluster returns an array with one entry per row of your data: the label of
# the cluster that observation belongs to (labels start at 1).
# You can cut the tree using different "criterion" values. Some examples:

# You want at most 4 clusters:
k = 4
clusters = fcluster(Z,k,criterion="maxclust")

# You want the max distance within a cluster to be max_d
# (note that this overwrites the labels computed above):
max_d = 20
clusters = fcluster(Z,max_d, criterion = "distance")

# Visualization of the clustering (2d clustering)
plt.figure(figsize = (10,8))
# Now we use the array of cluster labels to color the scatter plot
# cmap is the color palette we are using
plt.scatter(X[:,0], X[:,1] ,  c = clusters, cmap = "brg")
plt.show()
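
The snippet above assumes X already exists. As a minimal sketch of the same linkage and fcluster calls on data you can run directly, the example below builds a hypothetical toy dataset (three well-separated 2-D blobs; the centers, seed and thresholds are illustrative assumptions, not part of the original answer) and skips the plotting.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Hypothetical toy data: three well-separated 2-D blobs of 20 points each.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
X = np.vstack([c + rng.normal(scale=0.5, size=(20, 2)) for c in centers])

# Linkage matrix: one row per merge, so it has n_samples - 1 rows and 4 columns.
Z = linkage(X, "ward")

# Cut by cluster count: ask for at most 3 flat clusters.
labels_k = fcluster(Z, 3, criterion="maxclust")

# Cut by distance: stop merging once the linkage distance exceeds 10.
labels_d = fcluster(Z, 10, criterion="distance")

print(Z.shape)
print(np.unique(labels_k))
print(np.unique(labels_d))
```

On data this cleanly separated both criteria should recover the same three groups; on real data the two cuts usually differ, which is why it helps to read the dendrogram before picking a threshold.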

Posted by: Guest on August-11-2021

from sklearn.cluster import KMeans

kmeans = KMeans(
    init="random",
    n_clusters=3,
    n_init=10,
    max_iter=300,
    random_state=42,
)
kmeans.fit(x_train)  # Replace x_train with your own training dataset

# The lowest SSE value
print(kmeans.inertia_)

# Final locations of the centroids
print(kmeans.cluster_centers_)

# The number of iterations required to converge
print(kmeans.n_iter_)

# First five predicted labels
print(kmeans.labels_[:5])

# init controls the initialization technique. Setting init to "random" gives
# the standard version of the k-means algorithm. Setting it to "k-means++"
# uses a smarter seeding scheme that speeds up convergence.
# n_clusters sets k for the clustering step. This is the most important
# parameter for k-means.
# n_init sets the number of initializations to perform. This is important
# because two runs can converge on different cluster assignments. With
# n_init=10, scikit-learn performs ten k-means runs and returns the result
# with the lowest SSE. Ten was the default in older scikit-learn releases;
# recent releases default to "auto", so pass the value explicitly if it matters.
# max_iter sets the maximum number of iterations for each initialization of
# the k-means algorithm.
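
Because x_train is not defined in the answer above, here is a minimal runnable sketch of the same KMeans configuration. The toy data, the seed and the new_points array are hypothetical stand-ins chosen for illustration; swap in your own dataset.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical training data: three well-separated 2-D blobs of 50 points each.
rng = np.random.default_rng(42)
centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
x_train = np.vstack([c + rng.normal(scale=0.5, size=(50, 2)) for c in centers])

kmeans = KMeans(init="random", n_clusters=3, n_init=10, max_iter=300, random_state=42)
kmeans.fit(x_train)

print(kmeans.inertia_)                # SSE of the best of the ten runs
print(kmeans.cluster_centers_.shape)  # one centroid per cluster
print(kmeans.n_iter_)                 # iterations used by the best run

# Assign previously unseen points to the nearest learned centroid.
new_points = np.array([[0.2, -0.1], [9.8, 0.3]])
new_labels = kmeans.predict(new_points)
print(new_labels)
```

Note that k-means labels start at 0 and their order is arbitrary, so compare cluster memberships rather than the label numbers themselves.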
