Class KMeansPlusPlusClusterer<T extends Clusterable>
- java.lang.Object
-
- org.apache.commons.math3.ml.clustering.Clusterer<T>
-
- org.apache.commons.math3.ml.clustering.KMeansPlusPlusClusterer<T>
-
- Type Parameters:
T
- type of the points to cluster
public class KMeansPlusPlusClusterer<T extends Clusterable> extends Clusterer<T>
Clustering algorithm based on David Arthur and Sergei Vassilvitski k-means++ algorithm.- Since:
- 3.2
- See Also:
- K-means++ (wikipedia)
-
-
Nested Class Summary
Nested Classes Modifier and Type Class Description static class
KMeansPlusPlusClusterer.EmptyClusterStrategy
Strategies to use for replacing an empty cluster.
-
Field Summary
Fields Modifier and Type Field Description private KMeansPlusPlusClusterer.EmptyClusterStrategy
emptyStrategy
Selected strategy for empty clusters.private int
k
The number of clusters.private int
maxIterations
The maximum number of iterations.private RandomGenerator
random
Random generator for choosing initial centers.
-
Constructor Summary
Constructors Constructor Description KMeansPlusPlusClusterer(int k)
Build a clusterer.KMeansPlusPlusClusterer(int k, int maxIterations)
Build a clusterer.KMeansPlusPlusClusterer(int k, int maxIterations, DistanceMeasure measure)
Build a clusterer.KMeansPlusPlusClusterer(int k, int maxIterations, DistanceMeasure measure, RandomGenerator random)
Build a clusterer.KMeansPlusPlusClusterer(int k, int maxIterations, DistanceMeasure measure, RandomGenerator random, KMeansPlusPlusClusterer.EmptyClusterStrategy emptyStrategy)
Build a clusterer.
-
Method Summary
All Methods Instance Methods Concrete Methods Modifier and Type Method Description private int
assignPointsToClusters(java.util.List<CentroidCluster<T>> clusters, java.util.Collection<T> points, int[] assignments)
Adds the given points to the closestCluster
.private Clusterable
centroidOf(java.util.Collection<T> points, int dimension)
Computes the centroid for a set of points.private java.util.List<CentroidCluster<T>>
chooseInitialCenters(java.util.Collection<T> points)
Use K-means++ to choose the initial centers.java.util.List<CentroidCluster<T>>
cluster(java.util.Collection<T> points)
Runs the K-means++ clustering algorithm.KMeansPlusPlusClusterer.EmptyClusterStrategy
getEmptyClusterStrategy()
Returns theKMeansPlusPlusClusterer.EmptyClusterStrategy
used by this instance.private T
getFarthestPoint(java.util.Collection<CentroidCluster<T>> clusters)
Get the point farthest to its cluster centerint
getK()
Return the number of clusters this instance will use.int
getMaxIterations()
Returns the maximum number of iterations this instance will use.private int
getNearestCluster(java.util.Collection<CentroidCluster<T>> clusters, T point)
Returns the nearestCluster
to the given pointprivate T
getPointFromLargestNumberCluster(java.util.Collection<? extends Cluster<T>> clusters)
Get a random point from theCluster
with the largest number of pointsprivate T
getPointFromLargestVarianceCluster(java.util.Collection<CentroidCluster<T>> clusters)
Get a random point from theCluster
with the largest distance variance.RandomGenerator
getRandomGenerator()
Returns the random generator this instance will use.-
Methods inherited from class org.apache.commons.math3.ml.clustering.Clusterer
distance, getDistanceMeasure
-
-
-
-
Field Detail
-
k
private final int k
The number of clusters.
-
maxIterations
private final int maxIterations
The maximum number of iterations.
-
random
private final RandomGenerator random
Random generator for choosing initial centers.
-
emptyStrategy
private final KMeansPlusPlusClusterer.EmptyClusterStrategy emptyStrategy
Selected strategy for empty clusters.
-
-
Constructor Detail
-
KMeansPlusPlusClusterer
public KMeansPlusPlusClusterer(int k)
Build a clusterer.The default strategy for handling empty clusters that may appear during algorithm iterations is to split the cluster with largest distance variance.
The euclidean distance will be used as default distance measure.
- Parameters:
k
- the number of clusters to split the data into
-
KMeansPlusPlusClusterer
public KMeansPlusPlusClusterer(int k, int maxIterations)
Build a clusterer.The default strategy for handling empty clusters that may appear during algorithm iterations is to split the cluster with largest distance variance.
The euclidean distance will be used as default distance measure.
- Parameters:
k
- the number of clusters to split the data intomaxIterations
- the maximum number of iterations to run the algorithm for. If negative, no maximum will be used.
-
KMeansPlusPlusClusterer
public KMeansPlusPlusClusterer(int k, int maxIterations, DistanceMeasure measure)
Build a clusterer.The default strategy for handling empty clusters that may appear during algorithm iterations is to split the cluster with largest distance variance.
- Parameters:
k
- the number of clusters to split the data intomaxIterations
- the maximum number of iterations to run the algorithm for. If negative, no maximum will be used.measure
- the distance measure to use
-
KMeansPlusPlusClusterer
public KMeansPlusPlusClusterer(int k, int maxIterations, DistanceMeasure measure, RandomGenerator random)
Build a clusterer.The default strategy for handling empty clusters that may appear during algorithm iterations is to split the cluster with largest distance variance.
- Parameters:
k
- the number of clusters to split the data intomaxIterations
- the maximum number of iterations to run the algorithm for. If negative, no maximum will be used.measure
- the distance measure to userandom
- random generator to use for choosing initial centers
-
KMeansPlusPlusClusterer
public KMeansPlusPlusClusterer(int k, int maxIterations, DistanceMeasure measure, RandomGenerator random, KMeansPlusPlusClusterer.EmptyClusterStrategy emptyStrategy)
Build a clusterer.- Parameters:
k
- the number of clusters to split the data intomaxIterations
- the maximum number of iterations to run the algorithm for. If negative, no maximum will be used.measure
- the distance measure to userandom
- random generator to use for choosing initial centersemptyStrategy
- strategy to use for handling empty clusters that may appear during algorithm iterations
-
-
Method Detail
-
getK
public int getK()
Return the number of clusters this instance will use.- Returns:
- the number of clusters
-
getMaxIterations
public int getMaxIterations()
Returns the maximum number of iterations this instance will use.- Returns:
- the maximum number of iterations, or -1 if no maximum is set
-
getRandomGenerator
public RandomGenerator getRandomGenerator()
Returns the random generator this instance will use.- Returns:
- the random generator
-
getEmptyClusterStrategy
public KMeansPlusPlusClusterer.EmptyClusterStrategy getEmptyClusterStrategy()
Returns theKMeansPlusPlusClusterer.EmptyClusterStrategy
used by this instance.- Returns:
- the
KMeansPlusPlusClusterer.EmptyClusterStrategy
-
cluster
public java.util.List<CentroidCluster<T>> cluster(java.util.Collection<T> points) throws MathIllegalArgumentException, ConvergenceException
Runs the K-means++ clustering algorithm.- Specified by:
cluster
in classClusterer<T extends Clusterable>
- Parameters:
points
- the points to cluster- Returns:
- a list of clusters containing the points
- Throws:
MathIllegalArgumentException
- if the data points are null or the number of clusters is larger than the number of data pointsConvergenceException
- if an empty cluster is encountered and theemptyStrategy
is set toERROR
-
assignPointsToClusters
private int assignPointsToClusters(java.util.List<CentroidCluster<T>> clusters, java.util.Collection<T> points, int[] assignments)
Adds the given points to the closestCluster
.
-
chooseInitialCenters
private java.util.List<CentroidCluster<T>> chooseInitialCenters(java.util.Collection<T> points)
Use K-means++ to choose the initial centers.- Parameters:
points
- the points to choose the initial centers from- Returns:
- the initial centers
-
getPointFromLargestVarianceCluster
private T getPointFromLargestVarianceCluster(java.util.Collection<CentroidCluster<T>> clusters) throws ConvergenceException
Get a random point from theCluster
with the largest distance variance.- Parameters:
clusters
- theCluster
s to search- Returns:
- a random point from the selected cluster
- Throws:
ConvergenceException
- if clusters are all empty
-
getPointFromLargestNumberCluster
private T getPointFromLargestNumberCluster(java.util.Collection<? extends Cluster<T>> clusters) throws ConvergenceException
Get a random point from theCluster
with the largest number of points- Parameters:
clusters
- theCluster
s to search- Returns:
- a random point from the selected cluster
- Throws:
ConvergenceException
- if clusters are all empty
-
getFarthestPoint
private T getFarthestPoint(java.util.Collection<CentroidCluster<T>> clusters) throws ConvergenceException
Get the point farthest to its cluster center- Parameters:
clusters
- theCluster
s to search- Returns:
- point farthest to its cluster center
- Throws:
ConvergenceException
- if clusters are all empty
-
getNearestCluster
private int getNearestCluster(java.util.Collection<CentroidCluster<T>> clusters, T point)
Returns the nearestCluster
to the given point
-
centroidOf
private Clusterable centroidOf(java.util.Collection<T> points, int dimension)
Computes the centroid for a set of points.- Parameters:
points
- the set of pointsdimension
- the point dimension- Returns:
- the computed centroid for the set of points
-
-