KMeans<'TData, 'TDataSum> Type

Clusters data using the KMeans algorithm.

References: KMeans algorithm; KMeans++ initialization.

Constructors

Constructor Description

KMeans(createDefaultFunction, distanceFunction, isDistanceFunctionReturningSquareOfDistance, sumUpFunction, divideFunction)

Full Usage: KMeans(createDefaultFunction, distanceFunction, isDistanceFunctionReturningSquareOfDistance, sumUpFunction, divideFunction)

Parameters:
    createDefaultFunction : Func<'TDataSum> - Creates the default (neutral) sum value. Summing up this default value and a data point must yield a result that represents that data point alone.
    distanceFunction : Func<'TDataSum, 'TData, float> - A function that evaluates the distance between the mean value of a cluster and a data point. The return value is either the Euclidean distance or the square of that distance. In the latter case, isDistanceFunctionReturningSquareOfDistance must be set to true.
    isDistanceFunctionReturningSquareOfDistance : bool - True if distanceFunction returns the square of the distance.
    sumUpFunction : Func<'TDataSum, 'TData, 'TDataSum> - A function that adds the first and the second argument and returns the result.
    divideFunction : Func<'TDataSum, int, 'TDataSum> - A function that divides the first argument by the second argument. Calling this function signals that summing up is finished. Because of this additional role, the divide function is called even if the divisor is 1.

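Initializes a new instance of KMeans.

The functions sumUpFunction and divideFunction have to work together in such a way that

divideFunction(sumUpFunction(default, x), 1) == x

holds for every data point x, where default is the value returned by createDefaultFunction.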


Example

For taking the linear (arithmetic) mean, the sumUpFunction can be defined as

sumUpFunction = (sum, x) => sum + x

and the divideFunction as

divideFunction = (nom, denom) => nom / denom

For taking the logarithmic mean (the exponential of the mean of the logarithms, i.e. the geometric mean), the sumUpFunction can be defined as

sumUpFunction = (sum, x) => sum + Math.Log(x)

and the divideFunction as

divideFunction = (nom, denom) => Math.Exp(nom / denom)

The latter example works because the divideFunction is called in every case after summing up, even if the denominator is 1.
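The two pairs of functions from the example can be tried out in a small Python sketch. It is not the library's code: the helper cluster_mean and the lower-case names are hypothetical and only mimic how a sum is folded up and then finished by the divide step.

```python
import math

def cluster_mean(points, create_default, sum_up, divide):
    """Fold all points into a sum, then finish with divide (always called)."""
    total = create_default()
    for x in points:
        total = sum_up(total, x)
    return divide(total, len(points))

# Linear (arithmetic) mean.
linear = dict(
    create_default=lambda: 0.0,
    sum_up=lambda s, x: s + x,
    divide=lambda nom, denom: nom / denom,
)

# Logarithmic (geometric) mean: logarithms are summed, divide exponentiates.
geometric = dict(
    create_default=lambda: 0.0,
    sum_up=lambda s, x: s + math.log(x),
    divide=lambda nom, denom: math.exp(nom / denom),
)

linear_mean = cluster_mean([1.0, 2.0, 6.0], **linear)
geometric_mean = cluster_mean([1.0, 10.0, 100.0], **geometric)

# Contract check: a single point must be reproduced, which only works
# because divide is called even when the divisor is 1.
single_point = cluster_mean([5.0], **geometric)
```

With a single point, skipping the divide step would return log(5) instead of 5, which is why the divide function has to run even for a divisor of 1.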

Instance members

Instance member Description

this.ClusterCounts

Full Usage: this.ClusterCounts

Returns: IReadOnlyList<int>

Gets a list which contains the number of values in each of the clusters (length is numberOfClusters).


this.ClusterIndices

Full Usage: this.ClusterIndices

Returns: IReadOnlyList<int>

Gets a list with the same length as the number of data points, in which each element is the index of the cluster this data point is assigned to.


this.ClusterMeans

Full Usage: this.ClusterMeans

Returns: IReadOnlyList<'TDataSum>

Gets a list which contains the mean values of the clusters (length is numberOfClusters).
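How the three result lists relate to each other can be illustrated with a short Python sketch. The data and the cluster assignment are made up, and the variable names are hypothetical stand-ins for ClusterIndices, ClusterCounts and ClusterMeans; a plain arithmetic mean is assumed.

```python
from collections import Counter

data = [1.0, 1.2, 0.8, 10.0, 10.4]
cluster_indices = [0, 0, 0, 1, 1]   # one entry per data point (made-up assignment)
number_of_clusters = 2

# Counts: how many points were assigned to each cluster.
tally = Counter(cluster_indices)
cluster_counts = [tally[k] for k in range(number_of_clusters)]

# Means: sum the points of each cluster, then divide by its count.
cluster_sums = [0.0] * number_of_clusters
for x, k in zip(data, cluster_indices):
    cluster_sums[k] += x
cluster_means = [s / n for s, n in zip(cluster_sums, cluster_counts)]
```

Both cluster_counts and cluster_means have one entry per cluster, while cluster_indices has one entry per data point.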


this.Data

Full Usage: this.Data

Returns: IReadOnlyList<'TData>

Gets a list which contains the data points provided.


this.Evaluate

Full Usage: this.Evaluate

Parameters:
    data : IEnumerable<'TData> - The data points.
    numberOfClusters : int - The number of clusters to create.

Clusters the provided data. Note that the data are not normalized. For multidimensional data, normalize the data before calling this method.

InvalidOperationException: Thrown if the maximum number of iterations was reached without convergence, or if empty clusters were created during evaluation that could not be filled.
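One common way to normalize multidimensional data before clustering is to scale every dimension to zero mean and unit standard deviation. The Python sketch below shows this under the assumption that each data point is a tuple of numbers; normalize_columns is a hypothetical helper, not part of the library, and other normalizations may suit a given data set better.

```python
import statistics

def normalize_columns(points):
    """Scale every dimension to zero mean and unit (population) standard deviation."""
    columns = list(zip(*points))
    means = [statistics.mean(c) for c in columns]
    # A constant column has standard deviation 0; fall back to 1 to avoid dividing by zero.
    stdevs = [statistics.pstdev(c) or 1.0 for c in columns]
    return [
        tuple((v - m) / s for v, m, s in zip(p, means, stdevs))
        for p in points
    ]

# The second dimension is about a thousand times larger than the first and
# would dominate a Euclidean distance if left unscaled.
raw = [(1.0, 1000.0), (2.0, 3000.0), (3.0, 2000.0)]
normalized = normalize_columns(raw)
```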

this.EvaluateClustersStandardDeviation

Full Usage: this.EvaluateClustersStandardDeviation

Returns: IReadOnlyList<float> The standard deviation of each cluster, i.e. the square root of (the sum of squared distances divided by N-1).

Evaluates the standard deviation of each cluster, i.e. the square root of (the sum of squared distances of the cluster's points to the cluster mean, divided by N-1), where N is the number of points in the cluster.
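The formula can be written out as a short Python sketch. The function and argument names are hypothetical, and returning NaN for clusters with fewer than two points is an assumption of this sketch, not documented behavior of the library.

```python
import math

def clusters_standard_deviation(data, cluster_indices, cluster_means, squared_distance):
    """Per cluster: sqrt(sum of squared distances / (N - 1))."""
    k = len(cluster_means)
    sums = [0.0] * k
    counts = [0] * k
    for x, i in zip(data, cluster_indices):
        sums[i] += squared_distance(cluster_means[i], x)
        counts[i] += 1
    # Assumption of this sketch: clusters with fewer than two points yield NaN.
    return [
        math.sqrt(s / (n - 1)) if n > 1 else float("nan")
        for s, n in zip(sums, counts)
    ]

data = [1.0, 2.0, 3.0, 10.0, 14.0]
cluster_indices = [0, 0, 0, 1, 1]
cluster_means = [2.0, 12.0]
stds = clusters_standard_deviation(
    data, cluster_indices, cluster_means, lambda mean, x: (mean - x) ** 2
)
```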

this.EvaluateDaviesBouldinIndex

Full Usage: this.EvaluateDaviesBouldinIndex

Parameters:
    distanceFunctionOfClusterMeans : Func<'TDataSum, 'TDataSum, float> - Function to calculate the Euclidean distance between two cluster centroids.

Returns: float The Davies-Bouldin index using q=1 (p=2 if distanceFunctionOfClusterMeans returns the Euclidean distance between the cluster centroids).

Evaluates the Davies-Bouldin index. The exponent q (see the overload of EvaluateDaviesBouldinIndex that takes q as a parameter) is set to 1, meaning that the mean Euclidean distance of the points to their respective centroid is used in the numerator.

this.EvaluateDaviesBouldinIndex

Full Usage: this.EvaluateDaviesBouldinIndex

Parameters:
    distanceFunctionOfClusterMeans : Func<'TDataSum, 'TDataSum, float> - Function to calculate the Euclidean distance between two cluster centroids.
    q : int - Order of the moment used to calculate the average distance of the cluster points to their centroid. A value of 1 yields the average (Euclidean) distance, a value of 2 yields the square root of the mean squared distance, etc.

Returns: float The Davies-Bouldin index using the exponent q (p is implicitly 2 if distanceFunctionOfClusterMeans returns the Euclidean distance between the cluster centroids).

Evaluates the Davies-Bouldin index. The exponent q (used to calculate the average distance of the cluster points to their centroid) can be set as a parameter.
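For orientation, the usual textbook definition of the Davies-Bouldin index is sketched below in Python. For every cluster, the worst ratio of (scatter of cluster i + scatter of cluster j) to the distance between the two centroids is taken, and these worst ratios are averaged. All names are hypothetical, and the sketch assumes at least two non-empty clusters with distinct centroids; it is not the library's implementation.

```python
import math

def davies_bouldin_index(data, cluster_indices, cluster_means,
                         distance, centroid_distance, q=1):
    k = len(cluster_means)
    per_cluster = [[] for _ in range(k)]
    for x, i in zip(data, cluster_indices):
        per_cluster[i].append(distance(cluster_means[i], x))
    # Scatter of a cluster: q-th root of the mean of the q-th powers of the distances.
    scatter = [(sum(d ** q for d in ds) / len(ds)) ** (1.0 / q) for ds in per_cluster]
    total = 0.0
    for i in range(k):
        total += max(
            (scatter[i] + scatter[j])
            / centroid_distance(cluster_means[i], cluster_means[j])
            for j in range(k) if j != i
        )
    return total / k

data = [1.0, 2.0, 3.0, 10.0, 14.0]
cluster_indices = [0, 0, 0, 1, 1]
cluster_means = [2.0, 12.0]
absolute_difference = lambda a, b: abs(a - b)   # Euclidean distance in one dimension

db_q1 = davies_bouldin_index(data, cluster_indices, cluster_means,
                             absolute_difference, absolute_difference, q=1)
db_q2 = davies_bouldin_index(data, cluster_indices, cluster_means,
                             absolute_difference, absolute_difference, q=2)
```

Smaller values indicate clusters that are compact relative to the distance between their centroids.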

this.EvaluateMean2ndMomentOfDistances

Full Usage: this.EvaluateMean2ndMomentOfDistances

Returns: IReadOnlyList<float> The 2nd moment of the distances in each cluster, i.e. the square root of the average of the squared distances of the points to their respective centroid.

Evaluates for each cluster the 2nd moment of the distances, i.e. the square root of the average of the squared distances of the points to their respective centroid. The distances are Euclidean distances, or whatever else the distance function returns.

this.EvaluateMeanDistances

Full Usage: this.EvaluateMeanDistances

Returns: IReadOnlyList<float> The mean distance in each cluster, i.e. the average of the distances of the points to their respective centroid.

Evaluates for each cluster the mean distance, i.e. the average of the distances of the points to their respective centroid. The distances are Euclidean distances, or whatever else the distance function returns.

this.EvaluateMeanNthMomentOfDistances

Full Usage: this.EvaluateMeanNthMomentOfDistances

Parameters:
    q : int - Order of the moment (compare the parameter q of EvaluateDaviesBouldinIndex).

Returns: IReadOnlyList<float> The mean q-th moment of the distances in each cluster. For q = 2 this is the square root of (the sum of squared distances divided by N).

Evaluates for each cluster the mean q-th moment of the distances of the points to their respective centroid. For q = 2 this is the square root of (the sum of squared distances divided by N).
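The role of q can be illustrated with a Python sketch that takes the q-th root of the mean of the q-th powers of the distances. This reading follows the description of q given for EvaluateDaviesBouldinIndex; the function name is hypothetical, and every cluster is assumed to be non-empty.

```python
import math

def mean_nth_moment_of_distances(data, cluster_indices, cluster_means, distance, q):
    """Per cluster: (mean of distance**q) ** (1/q)."""
    k = len(cluster_means)
    power_sums = [0.0] * k
    counts = [0] * k
    for x, i in zip(data, cluster_indices):
        power_sums[i] += distance(cluster_means[i], x) ** q
        counts[i] += 1
    return [(s / n) ** (1.0 / q) for s, n in zip(power_sums, counts)]

data = [1.0, 2.0, 3.0, 10.0, 14.0]
cluster_indices = [0, 0, 0, 1, 1]
cluster_means = [2.0, 12.0]
absolute_difference = lambda mean, x: abs(mean - x)

# q = 1: plain mean distance; q = 2: root of the mean squared distance.
moments_q1 = mean_nth_moment_of_distances(data, cluster_indices, cluster_means,
                                          absolute_difference, 1)
moments_q2 = mean_nth_moment_of_distances(data, cluster_indices, cluster_means,
                                          absolute_difference, 2)
```

In the second cluster both points are equally far from the mean, so the moment is the same for every q; in the first cluster the larger q weights the outer points more strongly.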

this.EvaluateSumOfSquaredDistancesToClusterMean

Full Usage: this.EvaluateSumOfSquaredDistancesToClusterMean

Returns: float The sum, over all data points, of the squared distance of each point to its cluster center.

Evaluates the sum, over all data points, of the squared distance of each point to its cluster center.
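A Python sketch of this sum is given below. It also shows why the constructor needs to know whether the distance function already returns the squared distance: a plain distance has to be squared first. The function and the returns_square flag are hypothetical illustrations, not the library's code.

```python
def sum_of_squared_distances(data, cluster_indices, cluster_means,
                             distance, returns_square):
    total = 0.0
    for x, i in zip(data, cluster_indices):
        d = distance(cluster_means[i], x)
        # Square only if the distance function returns the plain distance.
        total += d if returns_square else d * d
    return total

data = [1.0, 2.0, 3.0, 10.0, 14.0]
cluster_indices = [0, 0, 0, 1, 1]
cluster_means = [2.0, 12.0]

from_plain = sum_of_squared_distances(
    data, cluster_indices, cluster_means,
    lambda mean, x: abs(mean - x), returns_square=False)
from_squared = sum_of_squared_distances(
    data, cluster_indices, cluster_means,
    lambda mean, x: (mean - x) ** 2, returns_square=True)
```

Both variants describe the same quantity, so they must agree.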

this.HasPatchingEmptyClustersFailed

Full Usage: this.HasPatchingEmptyClustersFailed

If true, empty clusters appeared during evaluation that could not be patched with other data.

this.HasReachedMaximumNumberOfIterations

Full Usage: this.HasReachedMaximumNumberOfIterations

If true, the evaluation reached the maximum number of iterations without converging.

this.SortingOfClusterValues

Full Usage: this.SortingOfClusterValues

Gets or sets the sorting of cluster values after evaluation. Sorting presumes that the generic type TDataSum implements the IComparable interface.

this.TryEvaluate

Full Usage: this.TryEvaluate

Parameters:
    data : IEnumerable<'TData> - The data points.
    numberOfClusters : int - The number of clusters to create.

Returns: bool True if successful; otherwise, false.

Clusters the provided data. Note that the data are not normalized. For multidimensional data, normalize the data before calling this method.
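The difference between a throwing and a non-throwing variant can be sketched with a minimal k-means loop in Python. This is an illustration only: it seeds the clusters with the first points instead of KMeans++, it does not try to patch empty clusters, and all names, including the default of 100 iterations, are assumptions of this sketch rather than the library's behavior.

```python
def try_cluster(data, number_of_clusters, create_default, distance,
                sum_up, divide, max_iterations=100):
    """Returns (success, cluster_means, cluster_indices)."""
    data = list(data)
    k = number_of_clusters
    # Seed the means from the first k points (assumption; not KMeans++).
    means = [divide(sum_up(create_default(), x), 1) for x in data[:k]]
    indices = None
    for _ in range(max_iterations):
        # Assign every point to the nearest mean.
        new_indices = [min(range(k), key=lambda c: distance(means[c], x))
                       for x in data]
        if new_indices == indices:
            return True, means, indices        # assignment is stable: converged
        indices = new_indices
        sums = [create_default() for _ in range(k)]
        counts = [0] * k
        for x, c in zip(data, indices):
            sums[c] = sum_up(sums[c], x)
            counts[c] += 1
        if 0 in counts:
            return False, means, indices       # empty cluster, not patched here
        means = [divide(s, n) for s, n in zip(sums, counts)]
    return False, means, indices               # iteration limit reached

def cluster(*args, **kwargs):
    """Throwing variant built on top of the non-throwing one."""
    ok, means, indices = try_cluster(*args, **kwargs)
    if not ok:
        raise RuntimeError("clustering did not succeed")
    return means, indices

functions = dict(
    create_default=lambda: 0.0,
    distance=lambda mean, x: abs(mean - x),
    sum_up=lambda s, x: s + x,
    divide=lambda nom, denom: nom / denom,
)
points = [1.0, 1.2, 0.8, 10.0, 10.4]
ok, means, indices = try_cluster(points, 2, **functions)
ok_limited, _, _ = try_cluster(points, 2, max_iterations=1, **functions)
```

The non-throwing variant suits callers that expect failure as a normal outcome and want to check a flag; the throwing variant keeps the calling code shorter when failure is exceptional.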