Clusters data using the KMeans algorithm.
References: KMeans algorithm: KMeans++ initialization:
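The summary names the KMeans algorithm and KMeans++ initialization. The following is a rough Python sketch of both, not the library's code. The parameters `create_default`, `distance`, `returns_squared`, `sum_up` and `divide` are hypothetical stand-ins for the constructor's createDefaultFunction, distanceFunction, isDistanceFunctionReturningSquareOfDistance, sumUpFunction and divideFunction. `max_iterations` and `seed` are also hypothetical, because the library's actual iteration limit is not stated here.

```python
import random
from typing import Callable, List, Sequence, Tuple, TypeVar

TData = TypeVar("TData")
TSum = TypeVar("TSum")


def kmeans(
    data: Sequence[TData],
    k: int,
    create_default: Callable[[], TSum],
    distance: Callable[[TSum, TData], float],
    returns_squared: bool,
    sum_up: Callable[[TSum, TData], TSum],
    divide: Callable[[TSum, int], TSum],
    max_iterations: int = 100,  # hypothetical limit
    seed: int = 0,  # hypothetical, for reproducible seeding
) -> Tuple[List[TSum], List[int]]:
    """Lloyd's k-means with k-means++ seeding (illustrative sketch)."""
    rng = random.Random(seed)

    def as_mean(x: TData) -> TSum:
        # A single point becomes a cluster mean via the documented invariant
        # divide(sum_up(default, x), 1) == x.
        return divide(sum_up(create_default(), x), 1)

    def sq_dist(m: TSum, x: TData) -> float:
        d = distance(m, x)
        return d if returns_squared else d * d

    # k-means++: first centroid uniformly at random, each further centroid
    # with probability proportional to its squared distance D(x)^2 to the
    # nearest centroid chosen so far.
    means = [as_mean(rng.choice(data))]
    while len(means) < k:
        weights = [min(sq_dist(m, x) for m in means) for x in data]
        if sum(weights) == 0:
            raise ValueError("fewer distinct data points than clusters")
        means.append(as_mean(rng.choices(data, weights=weights)[0]))

    # Lloyd iterations: assign each point to its nearest mean, then recompute
    # the means with sum_up/divide, until no assignment changes.
    assignment = [-1] * len(data)
    for _ in range(max_iterations):
        new_assignment = [
            min(range(k), key=lambda i: sq_dist(means[i], x)) for x in data
        ]
        if new_assignment == assignment:
            break  # converged
        assignment = new_assignment
        sums = [create_default() for _ in range(k)]
        counts = [0] * k
        for x, c in zip(data, assignment):
            sums[c] = sum_up(sums[c], x)
            counts[c] += 1
        # An empty cluster simply keeps its previous mean in this sketch.
        means = [divide(s, n) if n else m for s, n, m in zip(sums, counts, means)]
    return means, assignment
```

In this sketch, the assignment step uses the argmin of the distance, so returning squared distances does not change which cluster a point joins. The flag only matters where actual distance values are used, such as the k-means++ weights.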
Constructor
Full Usage:
KMeans(createDefaultFunction, distanceFunction, isDistanceFunctionReturningSquareOfDistance, sumUpFunction, divideFunction)
Parameters:
createDefaultFunction : Func<'TDataSum>
-
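Creates the default (initial) value of the sum. It must act as the neutral element of summation: summing up the default value and a data point must yield a value that represents that data point (see the condition on sumUpFunction and divideFunction below).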
distanceFunction : Func<'TDataSum, 'TData, float>
-
A function that evaluates the distance between the mean value of a cluster and a data point.
It may return either the Euclidean distance or the square of the distance. In the latter case,
isDistanceFunctionReturningSquareOfDistance must be set to true.
isDistanceFunctionReturningSquareOfDistance : bool
-
True if the distanceFunction returns the square of the distance.
sumUpFunction : Func<'TDataSum, 'TData, 'TDataSum>
-
A function that adds the first and the second argument and returns the result.
divideFunction : Func<'TDataSum, int, 'TDataSum>
-
A function that divides the first argument by the second argument. Calling this function also
signals that summing up is finished. Because of this extra role, the divide function is called even if
the divisor is 1.
The functions sumUpFunction and divideFunction must work in such a way that divideFunction(sumUpFunction(default, x), 1) == x, where default is the value returned by createDefaultFunction.
Example
For taking the linear mean, sumUpFunction can be defined as sumUpFunction = (sum, x) => sum + x, and divideFunction as divideFunction = (nom, denom) => nom/denom. For taking the geometric mean (the mean on a logarithmic scale), sumUpFunction can be defined as sumUpFunction = (sum, x) => sum + Math.Log(x), and divideFunction as divideFunction = (nom, denom) => Math.Exp(nom/denom). In both cases the default value is 0. The latter example works only because divideFunction is always called after summing up, even if the denominator is 1.
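The linear and geometric mean callbacks from the example above can be checked with a small Python sketch. The dictionary keys and the helpers `mean_of` and `satisfies_invariant` are hypothetical. They only mirror createDefaultFunction, sumUpFunction and divideFunction and test the condition divideFunction(sumUpFunction(default, x), 1) == x.

```python
import math
from functools import reduce

# Linear (arithmetic) mean: accumulate values, divide at the end.
linear = dict(
    create_default=lambda: 0.0,
    sum_up=lambda s, x: s + x,
    divide=lambda nom, denom: nom / denom,
)

# Geometric mean: accumulate logarithms, and let divide() map back with exp().
# This only works because divide() is called even when the divisor is 1.
geometric = dict(
    create_default=lambda: 0.0,
    sum_up=lambda s, x: s + math.log(x),
    divide=lambda nom, denom: math.exp(nom / denom),
)


def mean_of(values, f):
    """Sum up all values starting from the default, then divide once."""
    total = reduce(f["sum_up"], values, f["create_default"]())
    return f["divide"](total, len(values))


def satisfies_invariant(f, x):
    """Check divide(sum_up(default, x), 1) == x up to floating-point rounding."""
    return math.isclose(f["divide"](f["sum_up"](f["create_default"](), x), 1), x)
```

For example, `mean_of([1.0, 10.0, 100.0], linear)` is 37 and `mean_of([1.0, 10.0, 100.0], geometric)` is approximately 10.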
Instance members
Gets a list which contains the number of values in each of the clusters (length is numberOfClusters).
Gets a list with the same length as the number of data points, in which each element is the index of the cluster this data point is assigned to.
Gets a list which contains the mean values of the clusters (length is numberOfClusters).
Gets a list which contains the data points provided.
Full Usage:
this.Evaluate
Parameters:
IEnumerable<'TData>
-
The data points.
numberOfClusters : int
-
The number of clusters to create.
Clusters the provided data. Note that this method does not normalize the data. For multidimensional data, normalize the data (for example, per dimension) before calling it.
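Normalization matters because, with a Euclidean distance, a dimension measured in large units dominates the distance and therefore the clustering. A minimal Python sketch of one common choice, z-score scaling per dimension, is shown below. The function name `zscore_normalize` is hypothetical, and other scalings (for example, min-max) may suit a given data set better.

```python
import statistics
from typing import List, Sequence


def zscore_normalize(points: Sequence[Sequence[float]]) -> List[List[float]]:
    """Scale every dimension to zero mean and unit (population) standard deviation."""
    columns = list(zip(*points))
    means = [statistics.fmean(c) for c in columns]
    stdevs = [statistics.pstdev(c) for c in columns]
    # A constant dimension carries no information; map it to 0.0 instead of
    # dividing by zero.
    return [
        [(v - m) / s if s > 0 else 0.0 for v, m, s in zip(p, means, stdevs)]
        for p in points
    ]
```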
Full Usage:
this.EvaluateClustersStandardDeviation
Returns: IReadOnlyList<float>
The standard deviation of each cluster, i.e. the square root of (the sum of squared distances of the points to their centroid, divided by N-1), where N is the number of points in the cluster.
Evaluates the standard deviation of each cluster, using N-1 in the denominator (sample standard deviation).
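The per-cluster standard deviation described above can be sketched in Python as follows. The name `cluster_standard_deviations` is hypothetical, and `sq_distance` is assumed to return the squared distance between a centroid and a point. Returning NaN for clusters with fewer than two points is also an assumption of this sketch.

```python
import math
from typing import Callable, List, Sequence


def cluster_standard_deviations(
    data: Sequence[float],
    assignment: Sequence[int],
    means: Sequence[float],
    sq_distance: Callable[[float, float], float],
) -> List[float]:
    sums = [0.0] * len(means)
    counts = [0] * len(means)
    for x, c in zip(data, assignment):
        sums[c] += sq_distance(means[c], x)  # squared distance to own centroid
        counts[c] += 1
    # sqrt(sum of squared distances / (N - 1)); undefined for N < 2
    return [math.sqrt(s / (n - 1)) if n > 1 else math.nan for s, n in zip(sums, counts)]
```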
Full Usage:
this.EvaluateDaviesBouldinIndex
Parameters:
distanceFunctionOfClusterMeans : Func<'TDataSum, 'TDataSum, float>
-
Function to calculate the Euclidean distance between two cluster centroids.
Returns: float
The Davies-Bouldin index using q = 1 (and p = 2 if distanceFunctionOfClusterMeans returns the Euclidean distance between the cluster centroids).
Evaluates the Davies-Bouldin index. The exponent q (see the overload of KMeans.EvaluateDaviesBouldinIndex that takes q) is set to 1, meaning that the mean Euclidean distance of the points to their respective centroid is used in the numerator.
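For reference, a common textbook statement of the Davies-Bouldin index is given below. The symbols are chosen here for illustration: k clusters, centroid A_i of cluster i, and the T_i points X_j belonging to cluster i, with q and p the exponents mentioned above.

```latex
S_i = \left( \frac{1}{T_i} \sum_{j=1}^{T_i} \lVert X_j - A_i \rVert^{q} \right)^{1/q},
\qquad
M_{ij} = \lVert A_i - A_j \rVert_p,
\qquad
R_{ij} = \frac{S_i + S_j}{M_{ij}},
\qquad
\mathrm{DB} = \frac{1}{k} \sum_{i=1}^{k} \max_{j \neq i} R_{ij}
```

Lower values indicate compact, well-separated clusters. With q = 1, S_i is the mean distance of the points of cluster i to their centroid, as described for this overload.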
Full Usage:
this.EvaluateDaviesBouldinIndex
Parameters:
distanceFunctionOfClusterMeans : Func<'TDataSum, 'TDataSum, float>
-
Function to calculate the Euclidean distance between two cluster centroids.
q : int
-
Order of the moment used to calculate the average distance of the cluster points to their centroid. A value of 1 results in
the average (Euclidean) distance, a value of 2 results in the square root of the mean squared distance, and so on.
Returns: float
The Davies-Bouldin index using the exponent q (p is implicitly 2 if distanceFunctionOfClusterMeans returns the Euclidean distance between the cluster centroids).
Evaluates the Davies-Bouldin index. The exponent q (used to calculate the average distance of the cluster points to their centroid) can be set as a parameter.
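A minimal Python sketch of the Davies-Bouldin index with a selectable exponent q follows. It is not the library's implementation. `davies_bouldin` and its parameters are hypothetical. `distance` is assumed to return the Euclidean (not squared) point-to-centroid distance, and `centroid_distance` plays the role of distanceFunctionOfClusterMeans.

```python
from typing import Callable, Sequence


def davies_bouldin(
    data: Sequence[float],
    assignment: Sequence[int],
    means: Sequence[float],
    distance: Callable[[float, float], float],
    centroid_distance: Callable[[float, float], float],
    q: int = 1,
) -> float:
    k = len(means)
    # S_i: q-th root of the mean of the q-th powers of the point-to-centroid
    # distances (q = 1 gives the plain mean distance). Assumes no empty cluster.
    powers, counts = [0.0] * k, [0] * k
    for x, c in zip(data, assignment):
        powers[c] += distance(means[c], x) ** q
        counts[c] += 1
    s = [(p / n) ** (1.0 / q) for p, n in zip(powers, counts)]
    # For each cluster take the worst (largest) ratio R_ij, then average.
    worst = [
        max((s[i] + s[j]) / centroid_distance(means[i], means[j]) for j in range(k) if j != i)
        for i in range(k)
    ]
    return sum(worst) / k
```

For one-dimensional points, `abs(m - x)` serves as both distance functions. Clusters {0, 1, 5} and {10, 12}, with centroids 2 and 11, give an index of 1/3 for q = 1.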
Full Usage:
this.EvaluateMean2ndMomentOfDistances
Returns: IReadOnlyList<float>
The 2nd moment of the distances in each cluster, i.e. the square root of the average squared Euclidean distance (or whichever distance the distance function computes) of the points to their respective centroid.
Evaluates the 2nd moment of the distances for each cluster, i.e. the square root of the average squared Euclidean distance (or whichever distance the distance function computes) of the points to their respective centroid.
Full Usage:
this.EvaluateMeanDistances
Returns: IReadOnlyList<float>
The mean distance in each cluster, i.e. the average Euclidean distance (or whichever distance the distance function computes) of the points to their respective centroid.
Evaluates the mean distance for each cluster, i.e. the average Euclidean distance (or whichever distance the distance function computes) of the points to their respective centroid.
Full Usage:
this.EvaluateMeanNthMomentOfDistances
Parameters:
int
Returns: IReadOnlyList<float>
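The n-th moment of the distances in each cluster, where n is the int parameter: the n-th root of the average of the n-th powers of the distances of the points to their centroid. For n = 2, this is the square root of (the sum of squared distances divided by N).
Evaluates the n-th moment of the distances for each cluster. For n = 2, this is the square root of (the sum of squared distances divided by N).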
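The following Python sketch assumes that the int parameter is the moment order n. The description above only spells out the n = 2 case. Under that assumption, n = 1 reproduces EvaluateMeanDistances and n = 2 reproduces EvaluateMean2ndMomentOfDistances. The function name is hypothetical, `distance` is assumed to return the plain (not squared) distance, and NaN for empty clusters is a choice made for this sketch.

```python
import math
from typing import Callable, List, Sequence


def mean_nth_moment_of_distances(
    data: Sequence[float],
    assignment: Sequence[int],
    means: Sequence[float],
    distance: Callable[[float, float], float],
    n: int,
) -> List[float]:
    sums = [0.0] * len(means)
    counts = [0] * len(means)
    for x, c in zip(data, assignment):
        sums[c] += distance(means[c], x) ** n  # n-th power of the distance
        counts[c] += 1
    # n-th root of the average n-th power, per cluster
    return [(s / cnt) ** (1.0 / n) if cnt else math.nan for s, cnt in zip(sums, counts)]
```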
Full Usage:
this.EvaluateSumOfSquaredDistancesToClusterMean
Returns: float
The sum, over all data points, of the squared distance of each point to its cluster center.
Evaluates the sum of the squared distances of all points to their respective cluster centers.
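The following Python sketch computes this sum. It shows how the isDistanceFunctionReturningSquareOfDistance flag affects the result: a plain distance must be squared, and a squared distance is used as-is. `sum_of_squared_distances` and `returns_squared` are hypothetical names.

```python
from typing import Callable, Sequence


def sum_of_squared_distances(
    data: Sequence[float],
    assignment: Sequence[int],
    means: Sequence[float],
    distance: Callable[[float, float], float],
    returns_squared: bool,
) -> float:
    total = 0.0
    for x, c in zip(data, assignment):
        d = distance(means[c], x)
        # Square only if the distance function did not already do so.
        total += d if returns_squared else d * d
    return total
```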
Full Usage:
this.HasPatchingEmptyClustersFailed
If true, empty clusters appeared during evaluation that could not be patched with other data points.
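This page does not say how empty clusters are patched. The Python sketch below shows one common strategy, which is an assumption and not necessarily the library's method. It moves the point farthest from its centroid into the empty cluster, taking the point only from a cluster that keeps at least one member. The function reports failure when no such point exists, which corresponds to the situation flagged by this property. The function name and parameters are hypothetical.

```python
from typing import Callable, List, Sequence


def patch_empty_clusters(
    data: Sequence[float],
    assignment: List[int],
    means: List[float],
    sq_distance: Callable[[float, float], float],
) -> bool:
    """Give every empty cluster one point; return False if that is impossible.

    `assignment` and `means` are modified in place.
    """
    counts = [0] * len(means)
    for c in assignment:
        counts[c] += 1
    for empty in [i for i, n in enumerate(counts) if n == 0]:
        # Only take points from clusters that would not become empty.
        donors = [j for j, c in enumerate(assignment) if counts[c] > 1]
        if not donors:
            return False  # patching failed
        far = max(donors, key=lambda j: sq_distance(means[assignment[j]], data[j]))
        counts[assignment[far]] -= 1
        assignment[far] = empty
        counts[empty] = 1
        means[empty] = data[far]  # the moved point becomes the new centroid
    return True
```

The donor cluster's centroid is not updated here. In a Lloyd-style loop, the next recomputation of the means takes care of it.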
Full Usage:
this.HasReachedMaximumNumberOfIterations
If true, the evaluation reached the maximum number of iterations without converging.
Full Usage:
this.SortingOfClusterValues
Gets or sets the sorting of the cluster values after evaluation. This presumes that the generic type TDataSum implements the IComparable interface.
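This page does not list the type or options of this property. The Python sketch below shows what sorting the cluster values after evaluation could involve, assuming a plain ascending or descending order of the cluster means. The means and the per-cluster counts are reordered, and every data point's cluster index is relabeled to match. The function name and the `descending` flag are hypothetical.

```python
from typing import List, Sequence, Tuple


def sort_clusters(
    means: Sequence[float],
    assignment: Sequence[int],
    counts: Sequence[int],
    descending: bool = False,
) -> Tuple[List[float], List[int], List[int]]:
    # order[new] = old cluster index, sorted by the cluster mean
    order = sorted(range(len(means)), key=lambda i: means[i], reverse=descending)
    new_index = {old: new for new, old in enumerate(order)}
    return (
        [means[i] for i in order],
        [new_index[c] for c in assignment],  # relabel every data point
        [counts[i] for i in order],
    )
```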
Full Usage:
this.TryEvaluate
Parameters:
IEnumerable<'TData>
-
The data points.
numberOfClusters : int
-
The number of clusters to create.
Returns: bool
True if successful; otherwise, false.
Clusters the provided data. Note that this method does not normalize the data. For multidimensional data, normalize the data (for example, per dimension) before calling it.