Nonlinear Dimensionality Reduction for Regression

The task of dimensionality reduction for regression (DRR) is to find a low-dimensional representation z (q-dim) of the input covariates x (p-dim), with q << p, for regressing the output y (d-dim), given n i.i.d. data {(xi, yi)}. DRR is mainly useful for: (1) visualization of high-dimensional data, (2) efficient regressor design with a reduced input dimension, and (3) elimination of noise in x by uncovering the essential information z for predicting y. Note that DRR is not tied to a particular regression estimation method; it is better seen as a task prior to regressor design that provides a deeper understanding of the data.
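To make the setting concrete, here is a minimal synthetic sketch (the data-generating process and all names are illustrative assumptions, not from the paper): y depends on the p-dim input x only through a q-dim projection, and DRR aims to recover that projection before any regressor is trained.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, q = 500, 20, 2  # samples, input dim, reduced dim (q << p)

# Synthetic DRR setting: y depends on x only through z = B^T x.
B = rng.standard_normal((p, q))
X = rng.standard_normal((n, p))
Z = X @ B                                   # the low-dim representation
Y = np.sin(Z[:, :1]) + 0.5 * Z[:, 1:] ** 2 + 0.05 * rng.standard_normal((n, 1))

# A regressor trained on (Z, Y) loses nothing relative to one trained on
# (X, Y); recovering B (or a nonlinear analogue) from data is the DRR task.
```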

DRR differs from other well-known dimensionality reduction settings. To clarify, one can categorize DRR as a supervised technique with a real multivariate label y. Most supervised techniques, on the other hand, are devoted to the classification setting (i.e., discrete y); these include Linear Discriminant Analysis (LDA), kernel LDA, general graph embedding, and metric learning. The unsupervised dimensionality reduction framework further assumes that y is unknown, subsuming the principal subspace methods (PCA and kernel PCA), nonlinear locality-preserving manifold learning (LLE, ISOMAP, and Laplacian Eigenmap), and probabilistic methods such as GPLVM.

The crucial notion related to DRR is sufficient dimension reduction (SDR), which states that one has to find the linear subspace bases B = [b1, …, bq], where bi is a p-dim vector (in the nonlinear case, B = {b1(·), …, bq(·)}, where bi(·) is a nonlinear basis function), such that y and x are conditionally independent given BTx. As this condition implies that the conditional distribution of y given x equals that of y given z = BTx, the dimension reduction entails no loss of information for the purpose of regression. It is known that such a B always exists, but it is not unique. Hence we are naturally interested in the minimal subspace, or the intersection of all such subspaces, often called the central subspace (although "subspace" usually refers to the linear case, we abuse the term for both linear and nonlinear cases).
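A toy numerical check of the SDR condition (an illustrative sketch with an assumed quadratic link, not from the paper): when y depends on x only through one direction b, regressing on that direction preserves all predictive information, while a random projection does not.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 2000, 10

# Central subspace is span(e1): y depends on x only through z = x[0].
X = rng.standard_normal((n, p))
b = np.zeros(p); b[0] = 1.0
y = (X @ b) ** 2 + 0.1 * rng.standard_normal(n)

def quad_fit_mse(z, y):
    # Least-squares fit of y ~ a*z^2 + b*z + c, returning the training MSE.
    A = np.column_stack([z ** 2, z, np.ones_like(z)])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return np.mean((y - A @ coef) ** 2)

mse_central = quad_fit_mse(X @ b, y)                      # ~ noise level
mse_random = quad_fit_mse(X @ rng.standard_normal(p), y)  # much worse
```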

Two main families of approaches have been suggested for finding the central subspace: inverse regression (IR) [1,3] and kernel dimension reduction (KDR) [2,4]. KDR in [2] directly reduces the task of imposing conditional independence to an optimization problem that minimizes the conditional covariance operator in a reproducing kernel Hilbert space (RKHS). This is achieved by quantifying the conditional dependency (between y and x given BTx) via the positive definite ordering of the expected covariance operators in what is called a probability-determining RKHS (e.g., the RBF-kernel-induced Hilbert space).
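The empirical KDR criterion of [2] can be sketched numerically: with centered RBF Gram matrices, one minimizes a regularized trace proxy for the conditional covariance operator over B. The kernel widths, regularizer eps, and toy data below are illustrative assumptions.

```python
import numpy as np

def rbf_gram(Z, sigma=1.0):
    sq = np.sum(Z ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * Z @ Z.T
    return np.exp(-d2 / (2.0 * sigma ** 2))

def center(G):
    n = len(G)
    H = np.eye(n) - np.ones((n, n)) / n
    return H @ G @ H

def kdr_objective(B, X, Y, eps=1e-3):
    # Empirical KDR criterion: Tr[ Gy (Gz + n*eps*I)^{-1} ], a trace proxy
    # for the conditional covariance of y given z = B^T x in the RKHS.
    n = len(X)
    Gz = center(rbf_gram(X @ B))
    Gy = center(rbf_gram(Y))
    return np.trace(Gy @ np.linalg.inv(Gz + n * eps * np.eye(n)))

# Sanity check: the true 1-dim subspace should score lower than a random one.
rng = np.random.default_rng(0)
n, p = 200, 5
X = rng.standard_normal((n, p))
Y = np.sin(X[:, :1]) + 0.1 * rng.standard_normal((n, 1))
B_true = np.eye(p)[:, :1]
B_rand = rng.standard_normal((p, 1)); B_rand /= np.linalg.norm(B_rand)
obj_true, obj_rand = kdr_objective(B_true, X, Y), kdr_objective(B_rand, X, Y)
```

In [2] this objective is minimized over B by gradient descent; the sketch only evaluates it at candidate subspaces.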

Although KDR formulates the problem in an RKHS, the final projection is linear in the original space. For a nonlinear extension, [4] proposed manifold KDR, which first maps the original input space to a nonlinear manifold (e.g., by a Laplacian Eigenmap learned from x only) and then applies KDR to find a linear subspace of that manifold. However, this introduces a tight coupling between the central subspace and the separately learned manifold, which restricts the method to a transductive setting: for a new input point, one has to rebuild the manifold from scratch on the data including that point. Moreover, neither method has a closed-form solution; both resort to gradient-based optimization.

Inverse regression (IR) is another interesting framework for DRR. IR is based on the fact that the inverse regression E[x|y] lies in the subspace spanned by B (the bases of the central subspace), provided that the marginal distribution of x is elliptically symmetric (e.g., Gaussian). Thus B coincides with the principal directions of the variance of the inverse regression, namely V(E[x|y]). In [1], this variance was estimated by slicing the output space (i.e., clustering on y), hence the name sliced inverse regression (SIR).
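A compact SIR implementation for scalar y, following the standard recipe (standardize x, slice the sorted outputs, eigendecompose the between-slice covariance); the slice count and toy data are illustrative choices.

```python
import numpy as np

def sir(X, y, n_slices=10, q=2):
    # Sliced inverse regression (Li, 1991) for scalar y.
    n, p = X.shape
    mu = X.mean(axis=0)
    cov = np.cov(X, rowvar=False)
    L = np.linalg.cholesky(np.linalg.inv(cov))
    Xs = (X - mu) @ L                       # whitened inputs
    # Slice by sorted y and accumulate the weighted slice-mean covariance,
    # an estimate of V(E[x|y]) in the whitened coordinates.
    M = np.zeros((p, p))
    for idx in np.array_split(np.argsort(y), n_slices):
        m = Xs[idx].mean(axis=0)
        M += (len(idx) / n) * np.outer(m, m)
    _, V = np.linalg.eigh(M)
    return L @ V[:, ::-1][:, :q]            # directions in the original scale

# Toy check: recover the single-index direction under a cubic link.
rng = np.random.default_rng(0)
n, p = 2000, 6
beta = np.zeros(p); beta[0] = 1.0
X = rng.standard_normal((n, p))
y = (X @ beta) ** 3 + 0.5 * rng.standard_normal(n)
d = sir(X, y, q=1)[:, 0]
cosine = abs(d @ beta) / np.linalg.norm(d)  # close to 1
```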

Despite its simplicity and closed-form solution, SIR assumes a linear central subspace and places a strong restriction on the marginal distribution of x. To cope with these limitations, a natural kernel extension (KSIR) was proposed in [3]. It discovers a nonlinear central subspace and, moreover, places only mild restrictions on the distribution of x. However, KSIR still resorts to slicing on y, which can yield unreliable variance estimates when y is high dimensional.

In this work we propose a novel nonlinear method for DRR that exploits the kernel Gram matrices of both input and output. It estimates the variance of the inverse regression within the IR framework, but avoids slicing through the effective use of covariance operators in RKHS. In fact, we show that KSIR is a special case of our method, in that KSIR can be instantiated by a particular choice of output kernel matrix. Our approach can be reliably applied to high-dimensional outputs while suppressing potential noise in the output data.
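To convey the slicing-free idea, here is an illustrative variant in the spirit of the approach (not the paper's exact COIR estimator): replace SIR's hard slice means with a kernel-smoothed estimate of E[x|y] built from the output Gram matrix, then take principal directions of its sample variance.

```python
import numpy as np

def smoothed_inverse_regression(X, Y, sigma_y=0.5, q=2):
    # Kernel-smoothed inverse regression: weight each sample's neighbors by
    # output similarity instead of hard-slicing the output space.
    n = len(X)
    sq = np.sum(Y ** 2, axis=1)
    Gy = np.exp(-(sq[:, None] + sq[None, :] - 2.0 * Y @ Y.T) / (2 * sigma_y ** 2))
    W = Gy / Gy.sum(axis=1, keepdims=True)   # row-stochastic smoothing weights
    Xc = X - X.mean(axis=0)
    E = W @ Xc                               # smoothed E[x | y = y_i]
    M = E.T @ E / n                          # estimate of V(E[x|y])
    _, V = np.linalg.eigh(M)
    return V[:, ::-1][:, :q]

# Toy check: y is driven by x[0] only, so the top direction should be ~e1.
rng = np.random.default_rng(2)
n, p = 500, 8
X = rng.standard_normal((n, p))
Y = X[:, :1] + 0.1 * rng.standard_normal((n, 1))
d = smoothed_inverse_regression(X, Y, q=1)[:, 0]
cosine = abs(d[0]) / np.linalg.norm(d)
```

Because every sample contributes to every smoothed mean, the estimate degrades more gracefully than hard slicing when y is high dimensional.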


We demonstrate the benefits of the proposed method in a comprehensive set of evaluations on several important regression problems that often arise in computer vision:

a) Estimation of head pose from images with varying illumination conditions

  • input x: (64 x 64) face images with diverse lighting directions
  • output y: 2D head pose (left/right and up/down rotation angles)
  • central subspace dim: 2
[Figures: central subspaces obtained from the proposed "COIR" and from KSIR]

b) Human body pose estimation from a silhouette image

  • input x: (160 x 100) silhouette image at a side view
  • output y: 59-dim 3D joint angles at articulation points
  • central subspace dim: 2

[Figures: selected frames from a half walking cycle; central subspaces, with the proposed method denoted by "COIR"]


[1] K.-C. Li, Sliced inverse regression for dimension reduction, Journal of the American Statistical Association, 1991.
[2] K. Fukumizu, F. R. Bach, and M. I. Jordan, Dimensionality reduction for supervised learning with reproducing kernel Hilbert spaces, Journal of Machine Learning Research, 2004.
[3] H. M. Wu, Kernel sliced inverse regression with applications on classification, ICSA Applied Statistics Symposium, 2006.
[4] J. Nilsson, F. Sha, and M. I. Jordan, Regression on manifolds using kernel dimension reduction, International Conference on Machine Learning (ICML), 2007.


  • M. Kim and V. Pavlovic, Dimensionality reduction using covariance operator inverse regression, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2008.
