
FinAnalysis.Histogram Namespace

Public class MergingDigest
The original t-digest paper (Dunning and Ertl, "Computing Extremely Accurate Quantiles Using t-Digests") describes the underlying algorithm.

Maintains a t-digest by collecting new points in a buffer that is occasionally sorted and merged into a sorted array containing the previously computed centroids. This can be very fast because the cost of sorting and merging is amortized over many insertions. If we keep N centroids total and the input buffer is k long, then the amortized cost per point is something like N/k + log k. These costs even out when N/k = log k; balancing costs in this way is often a good place to start when optimizing an algorithm. For different values of the compression factor, the table "Sizing considerations for t-digest" gives estimated asymptotic values of N and suggested values of k.
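The cost balance above can be made concrete with a small sketch. This is an illustration only, not part of the FinAnalysis API; the function names `amortized_cost` and `balanced_buffer` are invented for the example, and the cost model is the rough N/k + log k estimate from the text.

```python
# Hypothetical illustration of the N/k + log k cost model (not part of
# FinAnalysis.Histogram): estimate the amortized per-point cost and find
# the buffer length k at which the merge term and the sort term balance.
import math

def amortized_cost(n_centroids, buffer_len):
    """Rough amortized work per inserted point: the N-long merge spread
    over k buffered points, plus the per-element share of sorting k items."""
    return n_centroids / buffer_len + math.log2(buffer_len)

def balanced_buffer(n_centroids):
    """Smallest k with N/k <= log2 k, i.e. where the two costs even out."""
    k = 2
    while n_centroids / k > math.log2(k):
        k += 1
    return k
```

For example, with N = 1000 centroids the balance point lands at a buffer of roughly a hundred-odd entries, where each term contributes about 7 units of work per point.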
The virtues of this kind of t-digest implementation include:
  • No allocation is required after initialization
  • The data structure automatically compresses existing centroids when possible
  • No object overhead is incurred for centroids since data is kept in primitive arrays
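The buffer-then-merge idea behind these properties can be sketched in a few lines. This is a simplified toy, not the FinAnalysis `MergingDigest` implementation: the class name and all members are invented, centroids are kept in flat parallel lists standing in for primitive arrays, and a fixed per-centroid weight cap stands in for the real t-digest scale-function bound.

```python
# Toy sketch of buffer-and-merge (assumed names; NOT the actual
# FinAnalysis.Histogram.MergingDigest). New points wait in a buffer;
# when it fills, the buffer is sorted and merged with the existing
# sorted centroids in one pass, compressing adjacent points whenever
# a simple weight cap allows.
import heapq

class TinyMergingDigest:
    def __init__(self, buffer_len=32, max_centroid_weight=4):
        self.means = []              # sorted centroid means
        self.weights = []            # matching centroid weights
        self.buffer = []             # unsorted incoming points
        self.buffer_len = buffer_len
        self.cap = max_centroid_weight   # stand-in for the scale-function bound

    def add(self, x):
        self.buffer.append(x)
        if len(self.buffer) >= self.buffer_len:
            self._flush()

    def _flush(self):
        # Sort cost ~ k log k, merge cost ~ N + k, amortized over k points.
        incoming = sorted((x, 1) for x in self.buffer)
        merged_m, merged_w = [], []
        for m, w in heapq.merge(zip(self.means, self.weights), incoming):
            if merged_w and merged_w[-1] + w <= self.cap:
                # Compress: fold this point into the previous centroid.
                total = merged_w[-1] + w
                merged_m[-1] += (m - merged_m[-1]) * w / total
                merged_w[-1] = total
            else:
                merged_m.append(m)
                merged_w.append(w)
        self.means, self.weights = merged_m, merged_w
        self.buffer.clear()
```

In a real implementation the merge target would be a preallocated primitive array rather than a growing list, which is what makes the "no allocation after initialization" property possible.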
The current implementation takes the liberty of using ping-pong buffers to implement the merge, at the cost of a substantial memory penalty; the complexity of an in-place merge was not considered worthwhile because, even with this overhead, the memory cost is less than 40 bytes per centroid, which is much less than half of what the AVLTreeDigest uses. Speed tests are not yet complete, so it is uncertain whether the merge strategy is faster than the tree strategy.
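The ping-pong arrangement mentioned above can be sketched as follows. The structure and names here are assumptions for illustration, not the actual implementation: two preallocated arrays of equal capacity, where each merge reads from the active array and writes into the spare, after which the roles are swapped, so no allocation happens per merge.

```python
# Hypothetical sketch of ping-pong merge buffers (assumed names, not the
# real implementation). Memory is doubled, but every merge is a simple
# forward pass into the spare array, avoiding an in-place merge.

class PingPongBuffers:
    def __init__(self, capacity):
        # Capacity is assumed large enough for any merged result.
        self.active = [0.0] * capacity   # holds the current sorted values
        self.spare = [0.0] * capacity    # scratch target for the next merge
        self.size = 0

    def merge_in(self, sorted_points):
        """Merge sorted_points with the active array into the spare,
        then swap the two arrays' roles."""
        i = j = out = 0
        a, n = self.active, self.size
        while i < n and j < len(sorted_points):
            if a[i] <= sorted_points[j]:
                self.spare[out] = a[i]; i += 1
            else:
                self.spare[out] = sorted_points[j]; j += 1
            out += 1
        while i < n:
            self.spare[out] = a[i]; i += 1; out += 1
        while j < len(sorted_points):
            self.spare[out] = sorted_points[j]; j += 1; out += 1
        # Swap: the freshly written array becomes the active one.
        self.active, self.spare = self.spare, self.active
        self.size = out
```

The 40-bytes-per-centroid figure quoted above reflects exactly this doubling: both copies of the primitive arrays are counted against each centroid.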