The original paper can be found at
https://github.com/tdunning/t-digest/blob/master/docs/t-digest-paper/histo.pdf
This implementation maintains a t-digest by collecting new points in a buffer that is occasionally sorted and merged
into a sorted array containing the previously computed centroids.
This can be very fast because the cost of sorting and merging is amortized over several insertions. If
we keep N centroids in total and the input buffer holds k points, then the amortized cost per insertion is roughly
N/k + log k
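
As an illustration of this buffer-and-merge pattern, here is a minimal, hypothetical sketch (not the actual MergingDigest code). It uses a uniform per-centroid weight limit of total/compression in place of the real quantile-dependent bound, and it allocates a fresh output array on each merge for brevity, unlike the actual implementation discussed below.

```java
import java.util.Arrays;

// Minimal sketch of the buffer-then-merge strategy; names are illustrative.
class MergeSketch {
    private double[] mean = new double[0];    // sorted centroid means
    private double[] weight = new double[0];  // matching centroid weights
    private double total = 0;                 // total weight seen so far

    private final double[] buffer;            // unsorted incoming points
    private int bufferCount = 0;
    private final double compression;

    MergeSketch(double compression, int bufferSize) {
        this.compression = compression;
        this.buffer = new double[bufferSize];
    }

    void add(double x) {
        buffer[bufferCount++] = x;
        if (bufferCount == buffer.length) {
            mergeBuffer();                    // cost amortized over k insertions
        }
    }

    void mergeBuffer() {
        if (bufferCount == 0) {
            return;
        }
        Arrays.sort(buffer, 0, bufferCount);  // O(k log k)
        total += bufferCount;
        double sizeLimit = total / compression;

        // Merge the sorted buffer with the sorted centroids, greedily folding
        // neighbors together while a centroid stays under sizeLimit: O(N + k).
        double[] newMean = new double[mean.length + bufferCount];
        double[] newWeight = new double[mean.length + bufferCount];
        int n = 0, i = 0, j = 0;
        while (i < mean.length || j < bufferCount) {
            double m, w;
            if (j >= bufferCount || (i < mean.length && mean[i] <= buffer[j])) {
                m = mean[i];
                w = weight[i];
                i++;
            } else {
                m = buffer[j];
                w = 1;
                j++;
            }
            if (n > 0 && newWeight[n - 1] + w <= sizeLimit) {
                // fold into the previous centroid: weighted average of means
                newMean[n - 1] = (newMean[n - 1] * newWeight[n - 1] + m * w)
                        / (newWeight[n - 1] + w);
                newWeight[n - 1] += w;
            } else {
                newMean[n] = m;
                newWeight[n] = w;
                n++;
            }
        }
        mean = Arrays.copyOf(newMean, n);
        weight = Arrays.copyOf(newWeight, n);
        bufferCount = 0;
    }
}
```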
The two terms even out when N/k = log k, and balancing costs like this is often a good place to start when optimizing an
algorithm. For different values of the compression factor, the following table shows the estimated asymptotic
number of centroids N and the suggested buffer size k:
| Compression | N   | k  |
|-------------|-----|----|
| 50          | 78  | 25 |
| 100         | 157 | 42 |
| 200         | 314 | 73 |
Sizing considerations for t-digest
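
As a rough check on these numbers, the balance point N/k = log k can be solved numerically for k. The snippet below is a hypothetical sketch that assumes the log is a natural log; with that assumption it approximately reproduces the suggested k values in the table.

```java
// Solve N/k = ln k for k with a simple fixed-point iteration.
public class BalanceK {
    public static void main(String[] args) {
        int[] centroidCounts = {78, 157, 314};   // N values from the table
        for (int n : centroidCounts) {
            double k = 10;                       // any positive starting guess
            for (int iter = 0; iter < 100; iter++) {
                k = n / Math.log(k);             // iterate k = N / ln k
            }
            System.out.printf("N = %d -> k ~ %.0f%n", n, k);
        }
    }
}
```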
The virtues of this kind of t-digest implementation include:
- No allocation is required after initialization
- The data structure automatically compresses existing centroids when possible
- No object overhead is incurred for centroids since data is kept in primitive arrays
The current implementation takes the liberty of using ping-pong buffers to implement the merge, which incurs
a substantial memory penalty. The complexity of an in-place merge was not considered worthwhile, however,
because even with this overhead the memory cost is less than 40 bytes per centroid, which is much less than half
of what the AVLTreeDigest uses. Speed tests are not yet complete, so it is uncertain whether the merge
strategy is faster than the tree strategy.
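
Since an in-place merge was judged not worthwhile, each merge can simply write into a second, preallocated pair of arrays and swap them afterwards. The sketch below is hypothetical (illustrative names, centroid folding omitted); it only shows the ping-pong swap and how the per-centroid memory roughly adds up.

```java
// Two preallocated pairs of arrays trade roles on every merge, so nothing is
// allocated after construction and no in-place merge is needed.
class PingPongCentroids {
    private double[] mean, weight;            // active, sorted centroids
    private double[] spareMean, spareWeight;  // written during the next merge
    private int count = 0;

    PingPongCentroids(int maxCentroids) {
        // Four double arrays of length N come to 4 * 8 = 32 bytes per centroid
        // slot before counting the input buffer, consistent with the
        // under-40-bytes figure quoted above.
        mean = new double[maxCentroids];
        weight = new double[maxCentroids];
        spareMean = new double[maxCentroids];
        spareWeight = new double[maxCentroids];
    }

    // Merge a sorted buffer of unit-weight points into the spare arrays and
    // swap roles. A real merge would also fold neighboring points into
    // centroids so the result always fits in maxCentroids; this sketch
    // assumes it fits.
    void merge(double[] sortedBuffer, int bufferCount) {
        int i = 0, j = 0, n = 0;
        while (i < count || j < bufferCount) {
            if (j >= bufferCount || (i < count && mean[i] <= sortedBuffer[j])) {
                spareMean[n] = mean[i];
                spareWeight[n] = weight[i];
                i++;
            } else {
                spareMean[n] = sortedBuffer[j];
                spareWeight[n] = 1;
                j++;
            }
            n++;
        }
        double[] tm = mean; mean = spareMean; spareMean = tm;         // swap means
        double[] tw = weight; weight = spareWeight; spareWeight = tw; // swap weights
        count = n;
    }
}
```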