Click or drag to resize

Quality Threshold (QT) Clustering

This topic contains the following sections:

QT-Clustering is the partitioning method invented initially for gene clustering. Although it is more expensive than the K-Means, it overcomes K-Means weaknesses: it does not require specifying the number of clusters and is predictable, i.e. guaranties the same partition over multiple runs.

QT-Clustering Algorithm

The QT-Clustering is guided by a quality threshold which in standard specification determines the maximum radius of clusters. The cluster radius is defined as the maximal distance existing among the central element and any member of the cluster. The central element is that has minimal summary distance to other cluster members.

General clustering logic is presented by two-step iterative procedure:

  1. Generation of candidate clusters. The algorithm starts with a global set that includes all the objects of the data set and each object is used as a base to generate a candidate cluster.

    For the first object, it builds the candidate cluster by including the closest object, the next closest, and so on, until the radius of the cluster surpasses the threshold. The second candidate cluster is formed by repeating the procedure with the second object. That is, the objects from the first candidate cluster are not removed from consideration and can be assigned to the second (and any other) candidate cluster which results in overlapping candidate clusters. The process continues for all the objects.

  2. Selection of the true candidate and establishing QT-cluster. At the conclusion of Generation step, there is a set of candidate clusters and many of them overlap. At this point, the largest (by number of members) candidate cluster is selected as true and established as a QT-cluster. The objects within this cluster are now removed from consideration. All remaining objects outside the QT-cluster will be used for the next round of candidate clusters Generation (step 1).

The entire process is repeated until each element is assigned to some cluster. The result is a set of non-overlapping QT-clusters that meet quality threshold with respect to maximum allowable radius.
Note Note

The QT-Clustering algorithm is computationally intensive as much as O (N 3) , so it is not recommended for clustering of more than 1000 objects in real-time or simulation applications.

Implementation and Usage

Caution note Caution

Hereinafter working with raw observations should remember that the objects are assumed to be arranged in columns and presented by their features in rows.

The QT-Clustering algorithm is implemented by the QTClustering class.

To initialize an instance of the class, the only constructor QTClustering(Double) is provided which requires the quality threshold value, cluster radius, as input parameter.

The class provides all the Basic Methods and Properties.

Code Sample

QT Clustering can be used as follows:

C#
 1using System;
 2using FinMath.LinearAlgebra;
 3using FinMath.ClusterAnalysis;
 4using FinMath.DataStructures;
 5
 6namespace FinMath.Samples
 7{
 8    class QTClusteringSample
 9    {
10        static void Main()
11        {
12            // Input parameters
13            const int objectsCount = 9;
14            const int factorsCount = 5;
15            const MetricType metricType = MetricType.EuclideanDistance;
16            const Double radius = 0.75;
17
18            // Generate input data matrix.
19            Matrix dataMatrix = Matrix.Random(factorsCount, objectsCount);
20
21            Console.WriteLine("Input DataMatrix: ");
22            Console.WriteLine(dataMatrix.ToString("0.000"));
23
24            // Initialize reusable clusterization performer.
25            QTClustering clustering = new QTClustering(radius);
26
27            // Perform KMeans clustering.
28            // We use Euclidean distance to construct distance matrix.
29            clustering.ComputeClustering(dataMatrix, metricType);
30
31            // Get resulting numbers of clusters.
32            Int32 clusterCount = clustering.ClustersCount;
33            // Get centroids data matrix. Each line describe centroid of one cluster.
34            Matrix centroids = clustering.Centroids();
35            // Get central elements list.
36            IntegerArray centralElements = clustering.CentralElements();
37
38            if (clustering.Status == FMStatus.MethodSucceeded)
39            {
40                Console.WriteLine();
41                Console.WriteLine("Clustering computed successfully. Test Result:");
42                Console.WriteLine("  Number of clusters: " + clusterCount);
43                // Output clusters assignment. i-th element contains number of cluster assigned to i-th object
44                Console.WriteLine("  Cluster assignment: " + clustering.ClustersAssignment());
45                Console.WriteLine();
46
47                for (Int32 i = 0; i < clusterCount; ++i)
48                {
49                    Console.WriteLine("Cluster #" + (i + 1));
50                    // Output elements of i-th cluster.
51                    Console.WriteLine("  Cluster elements: " + clustering.GetCluster(i + 1));
52                    // Output central element of i-th cluster.
53                    Console.WriteLine("  Cluster central element: " + centralElements[i]);
54                    // Output centroid of i-th cluster.
55                    Console.WriteLine("  Cluster centroid " + centroids.GetRow(i).ToString("0.000"));
56                }
57            }
58            else
59                Console.WriteLine("Clustering computation failed.");
60        }
61    }
62}

See Also