Click or drag to resize

MergingDigest Class

Original parer can be found https://github.com/tdunning/t-digest/blob/master/docs/t-digest-paper/histo.pdf Maintains a t-digest by collecting new points in a buffer that is then sorted occasionally and merged into a sorted array that contains previously computed centroids. This can be very fast because the cost of sorting and merging is amortized over several insertion. If we keep N centroids total and have the input array is k long, then the amortized cost is something like N/k + log k These costs even out when N/k = log k. Balancing costs is often a good place to start in optimizing an algorithm. For different values of compression factor, the following table shows estimated asymptotic values of N and suggested values of k: Sizing considerations for t-digest
CompressionNk
507825
10015742
20031473
The virtues of this kind of t-digest implementation include:
  • No allocation is required after initialization
  • The data structure automatically compresses existing centroids when possible
  • No object overhead is incurred for centroids since data is kept in primitive arrays
The current implementation takes the liberty of using ping-pong buffers for implementing the merge resulting in a substantial memory penalty, but the complexity of an in place merge was not considered as worthwhile since even with the overhead, the memory cost is less than 40 bytes per centroid which is much less than half what the AVLTreeDigest uses. Speed tests are still not complete so it is uncertain whether the merge strategy is faster than the tree strategy.
Inheritance Hierarchy
SystemObject
  FinAnalysis.HistogramMergingDigest

Namespace:  FinAnalysis.Histogram
Assembly:  FinAnalysis (in FinAnalysis.dll) Version: 2.1.13-cc97e13414b71fde928e8f3a546ac1daf26f295f
Syntax
C#
[SerializableAttribute]
public class MergingDigest

The MergingDigest type exposes the following members.

Constructors
  NameDescription
Public methodMergingDigest(Double)
Allocates a buffer merging t-digest. This is the normally used constructor that allocates default sized internal arrays. Other versions are available, but should only be used for special cases.
Public methodMergingDigest(Double, Int32)
If you know the size of the temporary buffer for incoming points, you can use this entry point.
Public methodMergingDigest(Double, Int32, Int32)
Fully specified constructor. Normally only used for deserializing a buffer t-digest.
Top
Properties
  NameDescription
Public propertyCentroidCount
The number of centroids in the collection.
Public propertyCentroids
The CDF approximation centroids.
Public propertyCompression
Returns the current compression factor.
Public propertyMax
The observed data maximum value.
Public propertyMin
The observed data minimum value.
Public propertySize
Returns the number of points that have been added to this TDigest. (The sum of the weights on all centroids.)
Top
Methods
  NameDescription
Public methodAdd(Double)
Add a sample to this TDigest.
Public methodAdd(ListMergingDigest)
Public methodAdd(Double, Int32)
Adds a sample to a histogram.
Public methodCdf
Returns the fraction of all points added which are less than x.
Public methodCompress
Re-examines a t-digest to determine whether some centroids are redundant.If your data are perversely ordered, this may be a good idea.Even if not, this may save 20% or so in space. The cost is roughly the same as adding as many data points as there are centroids.This is typically < 10 * compression, but could be as high as 100 * compression. This is a destructive operation that is not thread-safe.
Public methodEquals
Determines whether the specified object is equal to the current object.
(Inherited from Object.)
Public methodGetHashCode
Serves as the default hash function.
(Inherited from Object.)
Public methodGetType
Gets the Type of the current instance.
(Inherited from Object.)
Public methodQuantile
Returns an estimate of the cutoff such that a specified fraction of the data added to this TDigest would be less than or equal to the cutoff.
Public methodToString
Returns a string that represents the current object.
(Inherited from Object.)
Top
See Also