
Introduction

This tutorial accompanies the computer code for information discriminant analysis (IDA). The details of the method can be found in [1], and the computer code is written in MATLAB™. The code consists of three basic functions:
  1. negative_mu.m
  2. ida_feature_extraction_matrix.m
  3. orthonormalize.m
negative_mu.m returns the value of the $\mu$-measure whose maximization yields the feature extraction matrix T* (more precisely, it returns $-\mu$; see the naming remark below). The function can return additional output arguments, namely the gradient and the Hessian of $\mu$ with respect to the current feature extraction matrix T. The optimization is iterative in T, and knowledge of the gradient and the Hessian of $\mu$ keeps the computations feasible. The initial value of T is chosen by the user. For more information, type help negative_mu at the MATLAB™ command prompt.
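
As a quick orientation, a minimal usage sketch follows. The argument list (a data matrix and class labels) and the output order are assumptions for illustration only; consult help negative_mu for the actual interface.

   % Hypothetical usage sketch; the actual signature may differ (see help negative_mu)
   X  = randn(10, 200);                 % toy data: 10 variables, 200 samples (assumed layout)
   y  = [ones(1,100), 2*ones(1,100)];   % class labels for two classes (assumed format)
   T0 = randn(2, 10);                   % user-chosen initial 2 x 10 feature extraction matrix
   [f, g, H] = negative_mu(T0, X, y);   % assumed outputs: -mu, its gradient, and its Hessian at T0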

ida_feature_extraction_matrix.m returns the optimal feature extraction matrix, T*, as well as the value of the $\mu$-measure over the optimal feature subspace. ida_feature_extraction_matrix.m uses the built-in MATLAB™ optimization function fminunc.m, so the Optimization Toolbox may be necessary to run it (see below for exceptions). Since all MATLAB™ optimization routines are written as minimizations, the maximization of the $\mu$-measure is achieved by minimizing $-\mu$; hence the name of the function above (negative_mu). The input arguments of ida_feature_extraction_matrix.m allow for various choices of the initial condition, optimization tolerances, size of the feature space, optimization method, etc. Type ida_feature_extraction_matrix at the MATLAB™ command prompt to learn more about this function.
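
For orientation, a hypothetical call might look as follows; the argument names and order are assumptions, so run the function as suggested above to see the real interface.

   % Hypothetical usage sketch; argument names and order are assumed
   X = randn(10, 200);                                    % toy data: 10 variables, 200 samples
   y = [ones(1,100), 2*ones(1,100)];                      % class labels for two classes
   [Tstar, mu] = ida_feature_extraction_matrix(2, X, y);  % assumed inputs: subspace size, data, labels
   Z = Tstar * X;                                         % project the data onto the 2-D feature subspace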

orthonormalize.m is a function I wrote before becoming aware of the MATLAB™ function orth.m. It turns out that orth(A')=orthonormalize(A)', where A is an arbitrary matrix. In addition, orthonormalize.m returns the largest singular value of A. It is an auxiliary function, used to orthonormalize the feature extraction matrix T.
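
The stated relation can be checked numerically. In the sketch below, only the call to orthonormalize.m rests on an assumption (its two outputs, as described above); the rest uses standard MATLAB™ functions. The row spaces are compared through their projectors, since an orthonormal basis is unique only up to the ordering and signs of its vectors.

   % Numerical check of the relation orth(A')' = orthonormalize(A)
   A  = randn(3, 8);                    % an arbitrary matrix, e.g., a feature extraction matrix T
   B1 = orth(A')';                      % orthonormal rows spanning the row space of A
   [B2, smax] = orthonormalize(A);      % assumed outputs: orthonormal rows and largest singular value
   disp(norm(B1'*B1 - B2'*B2))          % ~0: both bases span the same row space
   disp(abs(smax - max(svd(A))))        % ~0: smax equals the largest singular value of A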

The optimization in ida_feature_extraction_matrix.m can also be implemented with the conjugate-gradient method. It runs very efficiently and is generally faster than the trust-region method used by fminunc.m. This is especially true for large-scale problems, where the feature extraction matrix, T, has many elements. For this purpose two additional functions are needed:

  1. conjugate_gradient.m
  2. linesearch.m
These functions were written by Hans Bruun Nielsen, and the links above point to his web page. As far as I can tell, the functions are bug-free, except for one minor issue: I had to rename the variable alpha in conjugate_gradient.m to Alpha. The code is several years old, and in the meantime alpha.m has become a legitimate MATLAB™ function, so using the name alpha causes MATLAB™ to call that function and, consequently, to report an error.

Using conjugate_gradient.m places some constraints on how the objective function (in this case negative_mu.m) is called. In particular, the parameters of negative_mu.m must be passed as a single argument. Moreover, the conjugate-gradient method does not use the Hessian, so I wrote a version of negative_mu.m that works with conjugate_gradient.m. The function is called:

  1. negative_mu_cg.m

This function differs only marginally from the original version, but it requires some manipulation of the feature extraction matrix, T, so I decided to keep it separate. With these six functions, one should be able to implement IDA as a feature extraction technique (see the sketch below). A final remark: running the conjugate-gradient method does not require the MATLAB™ Optimization Toolbox.
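
To make the calling convention concrete, here is a hedged end-to-end sketch. The structure fields, the option vector, and the exact call patterns of conjugate_gradient.m and negative_mu_cg.m are all assumptions for illustration; only the two constraints documented above are essential: the objective's parameters travel in a single argument, and T is reshaped between matrix and vector form, since the optimizer works on a vector of unknowns.

   % Hypothetical end-to-end sketch; all names and the option vector are assumed, not documented here
   X   = randn(10, 200);                            % toy data: 10 variables, 200 samples
   y   = [ones(1,100), 2*ones(1,100)];              % class labels for two classes
   par = struct('data', X, 'labels', y, 'dim', 2);  % all objective parameters packed into ONE argument
   T0  = randn(2, 10);                              % initial 2 x 10 feature extraction matrix
   opts = [1e-6, 1e-8, 200];                        % assumed tolerances and iteration limit
   [t, info] = conjugate_gradient('negative_mu_cg', par, T0(:), opts);  % assumed call pattern
   Tstar = reshape(t, 2, 10);                       % undo the vectorization of T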


Zoran Nenadic 2007-10-04