k-means clustering algorithm

k-means clustering python

from sklearn.cluster import KMeans
kmeans = KMeans(init="random", n_clusters=3, n_init=10, max_iter=300, random_state=42)
kmeans.fit(x_train)  # replace x_train with your own training data
# The lowest SSE value
print(kmeans.inertia_)
# Final locations of the centroid
print(kmeans.cluster_centers_)
# The number of iterations required to converge
print(kmeans.n_iter_)
# The first five predicted labels
print(kmeans.labels_[:5])


# init controls the initialization technique. The standard version of the k-means algorithm is implemented by setting init to "random". Setting this to "k-means++" picks initial centroids that are spread far apart from one another, which typically speeds up convergence.

# n_clusters sets k for the clustering step. This is the most important parameter for k-means.

# n_init sets the number of initializations to perform. This is important because two runs can converge on different cluster assignments. With n_init=10, scikit-learn performs ten k-means runs and returns the results of the one with the lowest SSE. Ten was the default in older releases; from scikit-learn 1.4 the default is "auto", which means ten runs for init="random" and one run for init="k-means++".

# max_iter sets the maximum number of iterations for each initialization of the k-means algorithm.
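
To see how init and n_init interact, here is a minimal sketch that runs the same settings with both initialization techniques and then repeats single runs with different seeds. The make_blobs data is an assumption standing in for x_train, which the snippet above does not define, and the names results and single_run_sse are illustrative only.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic stand-in for x_train (an assumption; the original data is not shown)
x_train, _ = make_blobs(n_samples=300, centers=3, random_state=42)

# Compare the two initialization techniques with otherwise identical settings
results = {}
for init in ("random", "k-means++"):
    km = KMeans(init=init, n_clusters=3, n_init=10, max_iter=300, random_state=42)
    km.fit(x_train)
    results[init] = (km.inertia_, km.n_iter_)
    print(f"init={init!r}: SSE={km.inertia_:.2f}, iterations={km.n_iter_}")

# With n_init=1 each seed gives exactly one run, so the SSE may differ between seeds
single_run_sse = [
    KMeans(init="random", n_clusters=3, n_init=1, random_state=seed).fit(x_train).inertia_
    for seed in range(5)
]
print(single_run_sse)
```

On well-separated data the runs will often agree. Differences between seeds tend to show up when clusters overlap or k is chosen poorly, which is when a larger n_init helps most.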

Posted by: Guest on September-11-2020

# K-means clustering with a k-means++ like initialization mode
# Assumes an active SparkSession named spark and a writable directory temp_path
from pyspark.ml.clustering import KMeans, KMeansModel
from pyspark.ml.linalg import Vectors

data = [(Vectors.dense([0.0, 0.0]),), (Vectors.dense([1.0, 1.0]),),
        (Vectors.dense([9.0, 8.0]),), (Vectors.dense([8.0, 9.0]),)]
df = spark.createDataFrame(data, ["features"])
kmeans = KMeans(k=2, seed=1)
model = kmeans.fit(df)
centers = model.clusterCenters()
len(centers)  # 2
model.computeCost(df)  # 2.000... (computeCost was deprecated in Spark 2.4 and removed in 3.0)

# Points 0 and 1 share a cluster, as do points 2 and 3
transformed = model.transform(df).select("features", "prediction")
rows = transformed.collect()
rows[0].prediction == rows[1].prediction  # True
rows[2].prediction == rows[3].prediction  # True

# Training summary
model.hasSummary
summary = model.summary
summary.k  # 2
summary.clusterSizes  # [2, 2]

# Save and reload the estimator
kmeans_path = temp_path + "/kmeans"
kmeans.save(kmeans_path)
kmeans2 = KMeans.load(kmeans_path)
kmeans2.getK()  # 2

# Save and reload the fitted model
model_path = temp_path + "/kmeans_model"
model.save(model_path)
model2 = KMeansModel.load(model_path)
model2.hasSummary  # False
model.clusterCenters()[0] == model2.clusterCenters()[0]  # array([ True, True], dtype=bool)
model.clusterCenters()[1] == model2.clusterCenters()[1]  # array([ True, True], dtype=bool)
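
To check where the cost of 2.0 in the PySpark example comes from without a Spark session, here is a rough NumPy sketch of the basic k-means iteration (assign each point to its nearest centroid, then move each centroid to the mean of its points) on the same four points. The function lloyd_kmeans is a hypothetical helper written for illustration. It uses plain random initialization and is not how scikit-learn or Spark implement the algorithm internally.

```python
import numpy as np

def lloyd_kmeans(points, k, max_iter=300, seed=1):
    """Illustrative k-means loop with random initialization (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    # Pick k distinct data points as the starting centroids
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(max_iter):
        # Assignment step: squared distance from every point to every centroid
        dists = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Update step: mean of each cluster; an empty cluster keeps its old centroid
        new_centers = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        converged = np.allclose(new_centers, centers)
        centers = new_centers
        if converged:
            break
    # Final assignment and sum of squared errors (SSE) for the final centroids
    dists = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    labels = dists.argmin(axis=1)
    sse = dists[np.arange(len(points)), labels].sum()
    return centers, labels, sse

points = np.array([[0.0, 0.0], [1.0, 1.0], [9.0, 8.0], [8.0, 9.0]])
centers, labels, sse = lloyd_kmeans(points, k=2)
print(sse)  # 2.0
```

Each point ends up 0.5 squared units from its centroid, at (0.5, 0.5) or (8.5, 8.5), so the four contributions add up to 2.0. The order in which the two centroids are returned depends on the random initialization.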
