Least Squares Mutual Information (LSMI)
The mutual information coefficient covers all kinds of statistical dependencies, including nonlinear ones. The mutual information of two random variables equals zero if and only if they are completely statistically independent.
The LSMI method uses a density-ratio approximation to estimate an alternative (squared-loss) version of mutual information between two scalar series, and uses cross validation to select the model parameters.
The mutual information of two continuous random variables X, Y can be defined as:

MI(X, Y) = ∬ p(x, y) log( p(x, y) / (p(x) p(y)) ) dx dy.
Computing continuous MI directly from this definition on a finite data set is impossible, because it requires knowing the exact probability density functions, so an approximation must be used. The approximation of mutual information between two data series is calculated with the Least Squares Mutual Information method. LSMI receives two data series and approximates the mutual information between them. The method does not involve density estimation; instead, it directly models the density ratio:

w(x, y) = p(x, y) / (p(x) p(y))
and approximates it by the following linear model:

wα(x, y) = α⊤ φ(x, y) = α1 φ1(x, y) + α2 φ2(x, y) + . . . + αb φb(x, y),
where α = (α1, α2, . . . , αb)⊤ are parameters to be learned from samples, and φ(x, y) = (φ1(x, y), φ2(x, y), . . . , φb(x, y))⊤ are basis functions such that φ(x, y) ≥ 0b for all (x, y). 0b denotes the b-dimensional vector of zeros, and b is the basis size of the model. The choice of the basis functions φ(x, y) depends on the type of the input series and is described in the algorithm steps below.
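To make the linear model concrete, here is a minimal numpy sketch (the function name and array layout are ours for illustration, not the library's): the modeled ratio at the n sample pairs is a single matrix-vector product.

```python
import numpy as np

def w_model(alpha, Phi):
    """Linear density-ratio model w_alpha(x, y) = alpha' phi(x, y).

    Phi is an (n, b) matrix with Phi[i, l] = phi_l(x_i, y_i) >= 0;
    the result is the modeled ratio at each of the n sample pairs.
    """
    return Phi @ alpha
```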
The steps of the algorithm are the following (the input parameters of the constructors are marked in bold):
The algorithm randomly chooses **basisSize** centers {cl = (ul, vl)}, l = 1, . . . , b, from the input data pairs.
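A minimal sketch of this step in numpy, using two illustrative paired scalar series (all names are ours, not the library's):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(500)            # illustrative paired scalar series
y = x + 0.5 * rng.standard_normal(500)

basis_size = min(200, len(x))           # the default basisSize rule
# Choose basisSize centers c_l = (u_l, v_l) at random from the data pairs.
pick = rng.choice(len(x), size=basis_size, replace=False)
u, v = x[pick], y[pick]
```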
Depending on the input series, one of the following cases is selected (a sketch of both basis types follows this list):
In case both series are continuous (the **useDelta** flag should be false), we propose a Gaussian kernel model as the basis candidates:

φl(x, y) = exp( −((x − ul)² + (y − vl)²) / (2σ²) ),
where sigma is chosen optimally from the candidate list **sigmaList**.
In case one of the series (we assume it is the second one) is discrete (the **useDelta** flag should be true), the y-part of the basis function is replaced by the Kronecker delta δ(y, vl):

φl(x, y) = exp( −(x − ul)² / (2σ²) ) δ(y, vl),

where δ(y, vl) equals 1 if y = vl and 0 otherwise.
If both series are discrete, this approximation is unnecessary, because in that case the mutual information can be computed directly from the formula:

MI(X, Y) = Σx Σy p(x, y) log( p(x, y) / (p(x) p(y)) ),

with the probabilities estimated by empirical frequencies.
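As referenced above, here is a sketch of both basis types in numpy, assuming scalar series and b centers (u, v); the function names are illustrative, not the library's:

```python
import numpy as np

def phi_gaussian(x, y, u, v, sigma):
    """Continuous case: phi_l(x, y) = exp(-((x-u_l)^2 + (y-v_l)^2) / (2 sigma^2)).

    Returns the (n, b) design matrix for n samples and b centers.
    """
    return np.exp(-((x[:, None] - u[None, :]) ** 2
                    + (y[:, None] - v[None, :]) ** 2) / (2.0 * sigma ** 2))

def phi_delta(x, y, u, v, sigma):
    """Mixed case: Gaussian kernel in x, Kronecker delta in the discrete y."""
    return (np.exp(-(x[:, None] - u[None, :]) ** 2 / (2.0 * sigma ** 2))
            * (y[:, None] == v[None, :]))
```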
In order to estimate the model parameter vector α, we minimize the squared error between the model wα(x, y) and the true density ratio w(x, y). Replacing the non-negativity constraint αi ≥ 0 with an ℓ2 penalty, we obtain the following optimization problem:

α̂ = argminα [ (1/2) α⊤Ĥα − ĥ⊤α + (λ/2) α⊤α ],

where lambda is the Lagrange multiplier of the penalized formulation (a regularization parameter), n is the number of sample pairs, and

Ĥ = (1/n²) Σi Σj φ(xi, yj) φ(xi, yj)⊤,    ĥ = (1/n) Σi φ(xi, yi).

Differentiating the objective function with respect to α and equating the result to zero, we obtain an analytic-form solution:

α̂ = (Ĥ + λIb)⁻¹ ĥ,

where Ib is the b-dimensional identity matrix. The value of lambda is chosen optimally from the candidate list **lambdaList**. The analytic form allows the solution to be found very quickly using least squares instead of complex constrained numerical optimization.
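Both basis types above factorize as φl(x, y) = φl(x) · φl(y), which makes Ĥ cheap to form. A minimal numpy sketch of the analytic solution under that separability assumption (names are ours, not the library's):

```python
import numpy as np

def lsmi_alpha(Phi_x, Phi_y, lam):
    """Analytic LSMI solution for one (sigma, lambda) candidate.

    Phi_x, Phi_y are (n, b) factors of a separable basis, so that
    phi_l(x_i, y_j) = Phi_x[i, l] * Phi_y[j, l].
    """
    n, b = Phi_x.shape
    # H[l, k] = (1/n^2) sum_i sum_j phi_l(x_i, y_j) phi_k(x_i, y_j);
    # for a separable basis this factorizes into an elementwise
    # (Hadamard) product of the two Gram matrices.
    H = (Phi_x.T @ Phi_x) * (Phi_y.T @ Phi_y) / n ** 2
    # h[l] = (1/n) sum_i phi_l(x_i, y_i), on the paired samples only.
    h = (Phi_x * Phi_y).mean(axis=0)
    # alpha = (H + lam * I_b)^(-1) h, plain regularized least squares.
    return np.linalg.solve(H + lam * np.eye(b), h)
```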
The algorithm splits all data into **basketCount** baskets and performs a cross validation procedure: it estimates the model parameters on the data set with one basket excluded and validates the fitness of the solution on the excluded basket. The model with the best CV score is chosen to calculate the final MI value.
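A sketch of the basket-wise cross validation score under the squared-loss criterion, reusing the separable-basis layout from the previous sketch (a minimal illustration, not the library's implementation):

```python
import numpy as np

def cv_score(Phi_x, Phi_y, lam, basket_count=5, seed=0):
    """Mean held-out squared-loss criterion J = 0.5*a'Ha - h'a (lower is better)."""
    n, b = Phi_x.shape
    folds = np.array_split(np.random.default_rng(seed).permutation(n), basket_count)

    def Hh(rows):
        H = (Phi_x[rows].T @ Phi_x[rows]) * (Phi_y[rows].T @ Phi_y[rows]) / len(rows) ** 2
        h = (Phi_x[rows] * Phi_y[rows]).mean(axis=0)
        return H, h

    scores = []
    for k in range(basket_count):
        # Fit alpha with basket k excluded, then score it on basket k.
        train = np.concatenate([f for j, f in enumerate(folds) if j != k])
        H_tr, h_tr = Hh(train)
        alpha = np.linalg.solve(H_tr + lam * np.eye(b), h_tr)
        H_te, h_te = Hh(folds[k])
        scores.append(0.5 * alpha @ H_te @ alpha - h_te @ alpha)
    return float(np.mean(scores))
```

The (sigma, lambda) pair with the lowest mean score is refit on the full data set; one common LSMI estimator of the resulting (squared-loss) mutual information is ŜMI = (1/2) ĥ⊤α̂ − 1/2, though the exact estimator used by the class is not documented here.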
The Least Squares Mutual Information method is implemented by the LSMI class.
The following constructors create an instance of the class and calculate the Mutual Information between series X and Y:
Constructor | Description | Performance |
---|---|---|
set series and other parameters | Calculates the Mutual Information between series X and Y; sets the list of possible values for the Lagrange multiplier and the Gaussian standard deviation, whether the Kronecker delta function should be used, the basket count, and the number of basis functions used to approximate the density ratio. **lambdaList** - candidate list from which the value of lambda is chosen optimally; **sigmaList** - candidate list from which the value of sigma is chosen optimally; **useDelta** - flag which should be set to true if the second series is discrete; **basketCount** - number of baskets for the cross validation procedure; **basisSize** - number of basis functions for the model. | |
set series and use default parameters | Calculates the Mutual Information between series X and Y using the default values: **lambdaList** = vector of 9 logarithmically spaced points between 10 and 10⁹; **sigmaList** = vector of 9 logarithmically spaced points between 10⁻² and 10²; **useDelta** = false; **basketCount** = 5; **basisSize** = min(200, length of the input vectors). | |
The class provides two methods:
Method | Description | Performance |
---|---|---|
out sample validate | Estimates the goodness of the model fit on new samples. | |
log space generate | Generates logarithmically spaced vectors: n points between the decades 10ᵃ and 10ᵇ. Especially useful for creating frequency vectors. | |
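For reference, numpy ships an equivalent routine, numpy.logspace; the default candidate lists described in the constructor table above can be reproduced as follows (variable names are ours):

```python
import numpy as np

# n points logarithmically spaced between the decades 10^a and 10^b.
sigma_list = np.logspace(-2, 2, num=9)   # default sigmaList: 0.01 ... 100
lambda_list = np.logspace(1, 9, num=9)   # default lambdaList: 10 ... 1e9
```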
The class provides one property, MutualInformation, which is the approximation of the Mutual Information between the two series.