Sequence Grouping

We consider the problem of learning density mixture models for classification. Traditional learning of mixtures for density estimation focuses on models that correctly represent the density at all points in the sample space. Discriminative learning, on the other hand, aims at representing the density at the decision boundary. We introduce novel discriminative learning methods for mixtures of generative models.

Generative probabilistic models such as Bayesian networks (BNs) are an attractive choice in a number of data-driven modeling tasks. While such models are implicitly employed for joint density estimation, they have recently been shown to also yield performance comparable to sophisticated discriminative classifiers such as SVMs and C4.5 [1,2]. In classification settings, maximizing the conditional likelihood (CML) is known to achieve better classification performance than traditional maximum likelihood (ML) fitting [3]. Unfortunately, the CML optimization problem is, in general, complex with non-unique solutions. Typical CML solutions resort to gradient-based numerical optimization methods. Despite the improved classification performance, the gradient search makes standard approaches computationally demanding.

In this work, we focus on the class of density mixture models. A mixture model has the potential to yield classification performance superior to that of a single BN model, as well as to serve as a rich density estimator. Again, typical CML learning relies on the same gradient search (e.g., [4]) and suffers from its computational overhead. We formulate an efficient and theoretically sound approach to discriminative mixture learning that avoids the parametric gradient optimization.

The proposed method exploits the properties of mixtures to simplify the complex learning task. Mixture components are added recursively, in a greedy fashion, while maximizing the conditional likelihood. More specifically, at each iteration the method finds a new mixture component f that, when added to the current mixture F, maximally decreases the conditional loss. Functional gradient boosting yields the data weights with which the new component f is learned. Interestingly, our weighting scheme places high weight on the data points at the decision boundary, which is a desirable property for successful classification. In contrast, the generative (non-discriminative) recursive mixture model of [5] assigns higher weights to the data at the class centers, which is promising for data fitting but less so for classification.
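The role of the functional gradient can be sketched in symbols. The notation below (labels c_i, inputs x_i, a joint mixture F(c, x), and the induced posterior P_F) is assumed here for illustration, and the exact weights used in the cited papers may differ from this simplified derivation.

```latex
% Conditional log-likelihood of a joint model F(c, x) on data {(c_i, x_i)}
\mathrm{CLL}(F) = \sum_{i=1}^{n} \Big[ \log F(c_i, x_i) - \log \sum_{c} F(c, x_i) \Big]

% Derivative with respect to the value of F at the observed pair (c_i, x_i)
\frac{\partial\, \mathrm{CLL}}{\partial F(c_i, x_i)}
  = \frac{1}{F(c_i, x_i)} - \frac{1}{\sum_{c} F(c, x_i)}
  = \frac{1 - P_F(c_i \mid x_i)}{F(c_i, x_i)}
```

The factor 1 - P_F(c_i | x_i) is small for points that the current mixture already classifies confidently and large for points near or across the decision boundary, which is consistent with the weighting behavior described above.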

A crucial benefit of this method is efficiency: finding a new f requires ML learning on weighted data, which is relatively easy to do (e.g., by computing sufficient statistics if f is in the exponential family). Thus this approach is particularly suited to domains with complex component models (e.g., hidden Markov models (HMMs) in time-series classification) that are usually too complex for effective gradient search. In addition, the recursive approach can benefit from optimal order estimation and insensitivity to the initial parameters.
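As a minimal sketch of why weighted ML learning is cheap for exponential-family components, the example below fits a one-dimensional Gaussian from weighted sufficient statistics. The function name and the restriction to non-negative weights are assumptions made for illustration; the components used in the cited work (e.g., HMMs) are richer.

```python
def weighted_gaussian_ml(xs, ws):
    """Weighted ML estimate (mean, variance) of a 1-D Gaussian.

    Assumes non-negative weights with a positive sum (an illustrative
    simplification, not a statement about the published method).
    """
    # Weighted sufficient statistics: sum of w, w*x, and w*x^2.
    s0 = sum(ws)
    s1 = sum(w * x for x, w in zip(xs, ws))
    s2 = sum(w * x * x for x, w in zip(xs, ws))
    mean = s1 / s0
    var = s2 / s0 - mean ** 2
    return mean, var


# The point x = 2.0 carries twice the weight of the others.
mean, var = weighted_gaussian_ml([0.0, 1.0, 2.0], [1.0, 1.0, 2.0])
```

No iterative search is involved: one pass over the data gives the statistics, and the estimate follows in closed form.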

We demonstrate the benefits of the proposed methods in an extensive set of evaluations on time-series sequence classification problems. Compared with state-of-the-art non-generative discriminative approaches such as the kernel-based classifiers of [6], the newly proposed approaches yield performance comparable to or better than that of many standard methods [1] [2].

Algorithms [1][2]



  • [1] M. Kim and V. Pavlovic. “A Recursive Method for Discriminative Mixture Learning”. Int’l Conf. Machine Learning (ICML). 2007.
  • [2] M. Kim and V. Pavlovic. “Discriminative Learning of Mixture of Bayesian Network Classifiers for Sequence Classification”. IEEE Conf. Computer Vision and Pattern Recognition. 2006. pp. 268-275.


  • M. Kim, V. Pavlovic. Efficient Discriminative Learning of Mixture of Bayesian Network Classifiers for Sequence Classification – The Learning Workshop at Snowbird, Utah, April 4-7 2006.
  • M. Kim, V. Pavlovic. Discriminative Mixture Models – New York Academy of Sciences (NYAS) Machine Learning Symposium, NY, Oct. 27, 2006.


  1. Bayesian Network Classifiers, N. Friedman, D. Geiger, and M. Goldszmidt, Machine Learning, 1997.
  2. Efficient Discriminative Learning of Bayesian Network Classifier via Boosted Augmented Naive Bayes, Y. Jing, V. Pavlovic, and J. M. Rehg, ICML, 2005.
  3. Structural Extension to Logistic Regression: Discriminative Parameter Learning of Belief Net Classifiers, R. Greiner and W. Zhou, AAAI, 2002.
  4. Discriminative mixture weight estimation for large Gaussian mixture models, F. Beaufays, M. Weintraub, and Y. Konig, Proc. ICASSP, 337-340, 1999.
  5. Model-Based Motion Clustering Using Boosted Mixture Modeling, V. Pavlovic, CVPR 2004.
  6. Exploiting generative models in discriminative classifiers, T. Jaakkola and D. Haussler, NIPS 1998.

Human Motion

Marginal Nonlinear Dynamic System (MNDS) [1]

The human figure exhibits complex and rich dynamic behavior that is both nonlinear and time-varying. To automate the process of motion modeling we consider a class of learned dynamic models cast in the framework of dynamic Bayesian networks (DBNs) applied to analysis and tracking of the human figure. We are especially interested in learning dynamic models from motion capture data using DBN formalisms and dimensionality reduction methods. We have explored the role of dynamics in dimensionality reduction problems and developed a statistical approach to human motion modeling that exploits this important factor.

We propose a new family of marginal auto-regressive (MAR) models that describe the space of all stable auto-regressive sequences, regardless of their specific dynamics. We apply the MAR class of models as sequence priors in probabilistic sequence subspace embedding problems. In particular, we consider a Gaussian process latent variable approach to dimensionality reduction.

Marginal Auto-Regressive (MAR) Model

Marginal Nonlinear Dynamic System (MNDS) Model

Experiment of Synthetic Sequence Data

Experiment on Human Motion Data

Human Motion Modeling using MNDS

Results on Synthetic Sequences

Tracking results for real sequence (animated GIF)


    [1] K. Moon and V. Pavlovic. “Impact of Dynamics on Subspace Embedding and Tracking of Sequences”. IEEE Conf. Computer Vision and Pattern Recognition. 2006. pp. 198-205.


  • GPLVM software package is provided by N. Lawrence –
  • The 3D human model and the Maya binaries provided by the authors of “ Discriminative Density Propagation for 3D Human Motion Estimation ” (C. Sminchisescu, A. Kanaujia, Z. Li, D. Metaxas), IEEE CVPR ’05.

Face Recognition – Videos




trellis 70

Al Pacino

Angelina Jolie

Bill Gates

Jodie Foster

Julia Roberts

Steven Spielberg

Tony Blair

Tracking and Recognition – Bill Clinton

Tracking and Recognition – Jodie Foster

Tracking and Recognition – Tony Blair

Face Recognition

Video-based Face Tracking and Recognition with Visual Constraints [1]

We address the problem of tracking and recognition of faces in real-world, noisy videos. We identify faces using a tracker that adaptively builds the target model reflecting changes in appearance, typical of a video setting. However, adaptive appearance trackers often suffer from drifting, a gradual adaptation of the tracker to non-targets. To alleviate this problem, our tracker introduces visual constraints using a combination of local generative and discriminative models in a particle filtering framework. The generative term conforms the particles to the space of generic face poses while the discriminative one ensures rejection of poorly aligned targets. This leads to a tracker that significantly improves robustness against abrupt appearance changes and occlusions, critical for the subsequent recognition phase. Identity of the tracked subject is established by fusing pose-discriminant and person-discriminant features over the duration of a video sequence. This leads to a robust video-based face recognizer with state-of-the-art recognition performance. We test the quality of tracking and face recognition on real-world noisy videos from YouTube as well as a standard Honda/UCSD database. Our approach produces successful face tracking results on over 80% of all videos without video or person-specific parameter tuning. The good tracking performance induces similarly high recognition rates: 100% on Honda/UCSD and over 70% on the new YouTube set with 35 celebrities. 

1. Face Tracking with Visual Constraints

We consider challenging cases that include significant changes in facial pose and illumination, as well as occlusion. The tracking problem can be seen as online temporal filtering, where the tracking states are represented by the affine transformation parameters.

Under the particle filtering framework, the likelihood potential plays a crucial role.

The well-known first-frame (two-frame) tracker often fails under appearance change (occlusion). Recently, the Incremental Visual Tracker (IVT) has been introduced to adapt to appearance change. It incrementally updates the appearance model based on the previous estimates. However, it still suffers under abrupt pose change or occlusion.

In addition to the adaptive term of the IVT, our proposed tracker introduces two likelihood terms that serve as visual constraints. The first term is the distance to the pose-specific subspace. The intuition is to restrict the candidate track to conform to the predefined facial prototypes. The second term is the SVM face crop discrimination, which can quickly discard ill-cropped and ill-aligned candidates.
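The way several likelihood terms can shape the particle weights may be sketched as follows. The additive combination of energies, the coefficients `lam_pose` and `lam_svm`, and all function names are assumptions made for illustration; the source does not specify how the terms are weighted.

```python
import math
import random


def particle_weights(e_adapt, e_pose, e_svm, lam_pose=1.0, lam_svm=1.0):
    """Turn per-particle energies into normalized weights, w_i proportional to exp(-E_i).

    E_i is assumed to be the adaptive term plus weighted pose and SVM terms.
    """
    energies = [a + lam_pose * p + lam_svm * s
                for a, p, s in zip(e_adapt, e_pose, e_svm)]
    e_min = min(energies)  # subtract the minimum for numerical stability
    unnorm = [math.exp(-(e - e_min)) for e in energies]
    total = sum(unnorm)
    return [u / total for u in unnorm]


def resample(particles, weights, rng=random):
    """Multinomial resampling: draw particles in proportion to their weights."""
    return rng.choices(particles, weights=weights, k=len(particles))


# Three hypothetical particles: the second fits the adaptive model well
# but violates both visual constraints, so it receives a low weight.
weights = particle_weights(e_adapt=[0.1, 0.1, 2.0],
                           e_pose=[0.2, 3.0, 0.1],
                           e_svm=[0.0, 2.0, 0.1])
survivors = resample(["p0", "p1", "p2"], weights)
```

A candidate that the adaptive term alone would accept can still be suppressed when the constraint terms assign it a high energy, which is the intended effect of the visual constraints.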

The following is an illustrating example that compares the proposed tracker with the IVT. (Top) Undergoing the occlusion at t = 104 ~ 106, IVT (green box) is adapted to the non-target images, while the proposed tracker (red box) survives due to the two constraint terms (pose + svm). (Bottom) Tracked data compatibility (data log likelihood) of the two trackers. Lines in red (green) are the values of -E(It) evaluated on the red (green) boxes by the proposed tracker (solid) and IVT (dashed). During the occlusion, IVT strongly adapted to the wrong target (e.g., t = 106), leading to a highly peaked data score. Consequently, at t = 108, the green particle is incorrectly chosen as the best estimate. Visual constraints restrict the adaptation to the occluding non-target, producing more balanced hypotheses that get resolved in the subsequent frames.

2. Video-based Face Recognition

In video-based face recognition, we assume that both training and test data are sequences of frames, where each frame is a well-cropped and well-aligned face image that may be obtained from the output of face tracking. One may apply static frame-by-frame face recognizers; however, doing so has certain drawbacks. Instead, treating the task as a sequence classification problem, we use a probabilistic sequence model such as an HMM.

For the observation feature of the HMM, we project the image onto the offline-trained pose-discriminant LDA subspace. The pose space presents an appealing choice for the latent space. Unlike arbitrary PCA-based subspaces, the pose space may allow the use of well-defined discriminative pose features in the face recognition HMM. This indeed gives better results than PCA-based features (see the table below). Moreover, our recognizer can easily be enriched with additional observation features that may further improve the recognition accuracy. We introduce the so-called LMT (i.e., LandMark Template) features. The LMT features consist of multi-scale Gabor features (at 6 scales and 12 orientations) applied to 13 landmark facial points that are locally searched within the bounding box starting from the tracked state. Since the LMT features are high-dimensional (~1000), we use PCA to extract only the 10 major factors. We concatenate the LMT features with the pose-discriminating LDA features to form an observation feature vector in our recognizer. For the Honda/UCSD dataset [2], the table below shows the recognition accuracies of the proposed model (LDA+LMT and LDA-Only), manifold-based approaches [1,2], and the static frame-by-frame approaches.

The following illustrates how the pose change in the video can affect the subject prediction (as well as pose prediction). The top row shows an example face sequence, the second row gives the pose prediction, P(st|x1, …, xt), and the bottom two rows depict the subject prediction, P(y|x1, …, xt), in historical and histogram views. The pose is predicted correctly, changing from frontal to R-profile. The true class is Danny. The subject is initially misclassified as Ming (blue/dashed curve in the third row), but as time goes by the red/solid curve overtakes Ming, and the subject is finally predicted correctly.
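The running subject posterior P(y|x1, …, xt) can be sketched with one HMM per subject and a scaled forward pass. This is a simplified illustration: it uses discrete observation symbols rather than the continuous LDA/LMT features, and the two toy models, their parameters, and the uniform prior are all hypothetical.

```python
import math


def forward_loglik_prefixes(obs, pi, A, B):
    """Return log P(x_1..x_t) for every t under a discrete HMM (scaled forward pass)."""
    n_states = len(pi)
    alpha = [pi[s] * B[s][obs[0]] for s in range(n_states)]
    logliks, running = [], 0.0
    for t in range(len(obs)):
        if t > 0:
            # Propagate through the transition matrix, then apply the emission.
            alpha = [sum(alpha[r] * A[r][s] for r in range(n_states)) * B[s][obs[t]]
                     for s in range(n_states)]
        scale = sum(alpha)            # equals P(x_t | x_1..x_{t-1})
        running += math.log(scale)
        alpha = [a / scale for a in alpha]
        logliks.append(running)
    return logliks


def subject_posteriors(obs, models, priors):
    """P(y | x_1..x_t) for each t, given one HMM (pi, A, B) per subject y."""
    per_class = {y: forward_loglik_prefixes(obs, *m) for y, m in models.items()}
    out = []
    for t in range(len(obs)):
        logs = {y: math.log(priors[y]) + per_class[y][t] for y in models}
        mx = max(logs.values())
        z = sum(math.exp(v - mx) for v in logs.values())
        out.append({y: math.exp(v - mx) / z for y, v in logs.items()})
    return out


# Hypothetical two-state models: "Danny" mostly emits symbol 0, "Ming" symbol 1.
A = [[0.9, 0.1], [0.1, 0.9]]
models = {
    "Danny": ([0.5, 0.5], A, [[0.8, 0.2], [0.7, 0.3]]),
    "Ming": ([0.5, 0.5], A, [[0.2, 0.8], [0.3, 0.7]]),
}
post = subject_posteriors([1, 0, 0, 0, 0], models, {"Danny": 0.5, "Ming": 0.5})
```

In this toy run the first frame favors Ming, and the posterior shifts to Danny as evidence accumulates, mirroring the qualitative behavior described above.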





  • K.-C. Lee, J. Ho, M.H. Yang, and D. Kriegman, Video-based face recognition using probabilistic appearance manifolds, Computer Vision and Pattern Recognition (CVPR), 2003
  • K.-C. Lee, J. Ho, M.H. Yang, and D. Kriegman, Visual tracking and recognition using probabilistic appearance manifolds, Computer Vision and Image Understanding, 2005



  • [1] M. Kim, S. Kumar, V. Pavlovic and H. Rowley. “Face Tracking and Recognition with Visual Constraints in Real-World Videos”. IEEE Conf. Computer Vision and Pattern Recognition. 2008.

See Software & Data for details on how to obtain the dataset used in this work.


Image Segmentation

Image segmentation is one of the most important steps leading to the analysis of image data. The goal is to divide the image into parts that have homogeneous attributes and correlate strongly with objects or areas of the real world contained in the image. Region-based segmentation methods, e.g., Markov Random Fields (MRFs), are usually robust to noise and capture contextual dependencies easily, but often generate rough boundaries and make it hard to incorporate shape and topology constraints. On the other hand, edge-based segmentation methods, e.g., deformable models, usually incorporate shape priors and object topology easily, but are sensitive to noise (false edges and weak edges) and, sometimes, to initialization.

In this project, we combine deformable models and Markov random fields in a graphical model framework for better image segmentation. The integrated framework takes advantage of both models and generates better segmentation results in many cases.

The tightly coupled model:

The exact (yet intractable) Inference

The variational inference (to decouple the original intractable model inference)

The extended MRF model (solved by the belief propagation algorithm)

The probabilistic deformable model

The optimal variational parameters

The whole segmentation algorithm is an EM algorithm solving the above equations iteratively:


More results in our paper  [1]

  • A “more” tightly-coupled model (belief propagation inference can be performed in the whole model instead of using the variational inference) [2]
  • An extension to 3D segmentation [3]


  • [1] R. Huang, V. Pavlovic and D. N. Metaxas. “A graphical model framework for coupling MRFs and deformable models”. Proc. CVPR. 2004.
  • [2] R. Huang, V. Pavlovic and D. N. Metaxas. “A Hybrid Framework for Image Segmentation Using Probabilistic Integration of Heterogeneous Constraints”. Computer Vision for Biomedical Image Application: Current Techniques and Future Trends. 2005.
  • [3] R. Huang, V. Pavlovic and D. N. Metaxas. “A tightly coupled region-shape framework for 3D medical image segmentation”. Int’l Symposium Biomedical Imaging. 2006.



Shape Modeling

Shape analysis is an important process for many computer vision applications, including image classification, recognition, retrieval, registration, segmentation, etc. An ideal shape model should be both invariant to global transformations and robust to local distortions. In this work we developed a new shape modeling framework that achieves both efficiently. A shape instance is described by a curvature-based shape descriptor. A Profile Hidden Markov Model (PHMM) is then built on such descriptors to represent a class of similar shapes. PHMMs are a particular type of Hidden Markov Models (HMMs) with special states and architecture that can tolerate considerable shape contour perturbations, including rigid and non-rigid deformations, occlusions, and missing parts. The sparseness of the PHMM structure provides efficient inference and learning algorithms for shape modeling and analysis. To capture the global characteristics of a class of shapes, the PHMM parameters are further embedded into a subspace that models long term spatial dependencies. The new framework can be applied to a wide range of problems, such as shape matching/registration, classification/recognition, etc. Our experimental results demonstrate the effectiveness and robustness of this new model in these different settings. [1]


[1] R. Huang, V. Pavlovic and D. N. Metaxas. “Embedded Profile Hidden Markov Models for Shape Analysis”. IEEE Int’l Conf. Computer Vision. 2007.

Nonlinear Dimensionality Reduction for Regression

Nonlinear Dimensionality Reduction for Regression [1]

The task of dimensionality reduction for regression (DRR) is to find a low dimensional representation, z (q-dim), of the input covariates, x (p-dim), with q << p, for regressing the output, y (d-dim), given n i.i.d. data {(xi, yi)}. DRR is mainly useful for: (1) visualization of high dimensional data, (2) efficient regressor design with a reduced input dimension, and (3) elimination of noise in data x by uncovering the essential information z for predicting y. It should be noted that DRR is not tied to a particular regression estimation method, and can be rather seen as a prior task to the regressor design for a better understanding of data.

DRR is different from other well-known dimensionality reduction algorithms. To clarify, one can categorize DRR as a supervised technique with a real multivariate label y. On the other hand, most supervised techniques are devoted to the classification setting (i.e., discrete y), which includes Linear Discriminant Analysis (LDA), kernel LDA, general graph embedding, and metric learning. The unsupervised dimension reduction framework even assumes that y is unknown, subsuming the principal subspace methods (PCA and kernel PCA), the nonlinear locality-preserving manifold learning methods (LLE, ISOMAP, and Laplacian Eigenmap), and probabilistic methods like GPLVM.

The crucial notion related to DRR is sufficiency in dimension reduction (SDR), which states that one has to find the linear subspace bases B = [b1, …, bq], where bi is a p-dim vector (in the nonlinear case, B = {b1(), …, bq()}, where bi() is a nonlinear basis function), such that y and x are conditionally independent given B^T x. As this condition implies that the conditional distribution of y given x equals that of y given z = B^T x, the dimension reduction entails no loss of information for the purpose of regression. It is known that such a B always exists, with non-unique solutions. Hence we are naturally interested in the minimal subspace, or the intersection of all such subspaces, often called the central subspace (although the term subspace usually refers to the linear case, we abuse it here for both linear and nonlinear cases).
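For reference, the sufficiency condition for the linear case can be written compactly as follows, using the same symbols as the paragraph above.

```latex
y \perp\!\!\!\perp x \mid B^{\top} x
\quad\Longleftrightarrow\quad
p(y \mid x) = p(y \mid B^{\top} x),
\qquad B = [b_1, \dots, b_q] \in \mathbb{R}^{p \times q},\quad q \ll p
```

The trivial choice B = I (with q = p) always satisfies the condition, which is why a sufficient B always exists and why the interest lies in the smallest such subspace.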

Typically, two schools of approaches have been suggested to find the central subspace: the inverse regression (IR) [1,3] and the kernel dimension reduction (KDR) [2,4]. KDR in [2] directly reduces the task of imposing conditional independence to the optimization problem that minimizes the conditional covariance operator in RKHS (reproducing kernel Hilbert space). This is achieved by quantifying the notion of conditional dependency (between y and x given B^T x) by the positive definite ordering of the expected covariance operators in what is called the probability-determining RKHS (e.g., the RBF kernel-induced Hilbert space).

Although KDR formulates the problem in RKHS, the final projection is linear in the original space. For the nonlinear extension, [4] proposed the manifold KDR, which first maps the original input space to a nonlinear manifold (e.g., by a Laplacian Eigenmap learned from x only) and then applies KDR to find a linear subspace in the manifold. However, this introduces a tight coupling between the central space and the separately learned manifold, which restricts the method to a transduction setting only. That is, for a new input point, one has to rebuild the manifold entirely with data including the new point. Moreover, neither method has a closed-form solution, so both resort to gradient-based optimization.

The inverse regression (IR) is another interesting framework for DRR. IR is based on the fact that the inverse regression, E[x|y], lies on the subspace spanned by B (the bases of the central subspace), provided that the marginal distribution of x is elliptically symmetric (e.g., a Gaussian). Thus B coincides with the principal directions in the variance of the inverse regression, namely, V(E[x|y]). In [1], this variance was estimated by slicing the output space (i.e., clustering on y), hence the name sliced IR (or SIR).
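The slicing estimate can be sketched in a few lines for a scalar output. This is a minimal illustration of textbook SIR using NumPy, not the proposed method: equal-count slices on sorted y, whitening of x, and the function name and defaults are choices made here for the example.

```python
import numpy as np


def sliced_inverse_regression(X, y, n_slices=10, n_dirs=1):
    """Estimate central-subspace directions by SIR for scalar y (X has p >= 2 columns)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    # Whiten x so that its sample covariance is the identity.
    mu = X.mean(axis=0)
    s, U = np.linalg.eigh(np.cov(X, rowvar=False))
    inv_sqrt = U @ np.diag(1.0 / np.sqrt(s)) @ U.T
    Z = (X - mu) @ inv_sqrt
    # Slice the output space and accumulate the covariance of the slice means,
    # which estimates V(E[z|y]).
    M = np.zeros((p, p))
    for idx in np.array_split(np.argsort(y), n_slices):
        m = Z[idx].mean(axis=0)
        M += (len(idx) / n) * np.outer(m, m)
    # Leading eigenvectors of M, mapped back to the original coordinates.
    _, vecs = np.linalg.eigh(M)
    B = inv_sqrt @ vecs[:, ::-1][:, :n_dirs]
    return B / np.linalg.norm(B, axis=0)


# Synthetic check: y depends on x only through its first coordinate.
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 3))
y = X[:, 0] + 0.1 * rng.normal(size=2000)
B = sliced_inverse_regression(X, y)
```

With a multivariate y the slices would have to partition a higher-dimensional output space, and each slice would then contain few points, which makes the slice means unreliable.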

Despite its simplicity and closed-form solution, SIR assumes a linear central subspace, with a strong restriction on the marginal distribution of x. To cope with this limitation, a natural kernel extension (KSIR) was proposed in [3]. It discovers a nonlinear central subspace and, moreover, places few restrictions on the distribution of x. However, KSIR still resorts to slicing on y, which can result in unreliable variance estimation for high dimensional y.

In this work we propose a novel nonlinear method for DRR that exploits the kernel Gram matrices of input and output. It estimates the variance of the inverse regression under the IR framework but avoids slicing through the effective use of covariance operators in RKHS. In fact, we show that KSIR is a special case of our method, in that KSIR can be instantiated by a particular choice of output kernel matrix. Our approach can be reliably applied to the cases of high dimensional output, while suppressing potential noise in the output data.


We demonstrate the benefits of the proposed method in a comprehensive set of evaluations on several important regression problems that often arise in the computer vision areas:

a) Estimation of head pose from images with varying illumination conditions

  • input x: (64 x 64) face images with diverse lighting directions
  • output y: 2D head pose (left/right and up/down rotation angles)
  • central subspace dim: 2
Central subspace obtained from the proposed “COIR”

Central subspace from KSIR

b) Human body pose estimation from a silhouette image

  • input x: (160 x 100) silhouette image at a side view
  • output y: 59-dim 3D joint angles at articulation points
  • central subspace dim: 2

Selected frames from a half walking cycle

Central subspaces; The proposed method denoted by “COIR”


[1] K.-C. Li, Sliced inverse regression for dimension reduction, Journal of the American Statistical Association, 1991 
[2] K. Fukumizu, F. R. Bach, and M. I. Jordan, Dimensionality reduction for supervised Learning with reproducing kernel Hilbert spaces, Journal of Machine Learning Research, 2004 
[3] H. M. Wu, Kernel sliced inverse regression with applications on classification, ICSA Applied Statistics Symposium, 2006 
[4] J. Nilsson, F. Sha, and M. I. Jordan, Regression on manifolds using kernel dimension reduction, International Conference on Machine Learning (ICML), 2007


  • [1] M. Kim and V. Pavlovic. “Dimensionality Reduction using Covariance Operator Inverse Regression”. IEEE Conf. Computer Vision and Pattern Recognition. 2008.


Gaussian Process Manifold Kernel Dimensionality Reduction (GPMKDR) [1]

Constructing optimal regressors is an important task in computer vision. Problems such as tracking, 3D human and general object pose estimation, image and signal denoising, illumination direction estimation are but some of the problems that can be formulated in the regression setting. 
In this work we consider the task of discovering low-dimensional manifolds that preserve information relevant for a general nonlinear regression. We have proposed a novel dimension reduction approach called Gaussian Process Manifold Kernel Dimensional Reduction (GPMKDR), induced by reformulating the manifold kernel dimensional reduction (mKDR) in the Gaussian Process (GP) framework. In this framework, a closed-form solution for mKDR is given by the maximum eigenvalue-eigenvector solution to a kernelized problem. 

Sufficient Dimensionality Reduction (SDR)

Gaussian Process Manifold KDR

Results:

Experiments on two digit datasets

Result of Illumination Estimation

Result of Human Motion Estimation


  • [1] K. Moon and V. Pavlovic. “Visual inference using Gaussian process manifold kernel dimensionality reduction”. 2008.

Boosted Bayesian Network Classifiers

Discriminative Graphical Models

In Collaboration with Dr. James M. Rehg and Yushi Jing, College of Computing, Georgia Institute of Technology.

Discriminative learning, or learning for classification, is a common learning task that has been addressed in a number of different frameworks. One may design a complex classifier, such as a support vector machine, that explicitly minimizes classification error. Alternatively, a set of weak learners can be trained using the boosting algorithm [Schapire97]. However, one may be explicitly interested in constructing a generative model for classification, such as a Bayesian network. The option in that case is to discriminatively train this generative model. Unfortunately, discriminative training of generative models is computationally complex [Friedman97,Greiner02,Grossman04]. On the other hand, if the model is trained in a generative ML fashion, its strong reliance on correct structure and independence assumptions often undermines its classification ability.

In this project we study a new framework for discriminative training of generative models. Similar to traditional boosting, we recursively learn a set of classifiers, this time constructed from generative models. Unlike boosting, where weak classifiers are trained discriminatively, the ‘weak classifiers’ in our method are trained generatively, to maximize the likelihood of the weighted data. This approach has two distinct benefits. First, our classifiers are constructed from generative models. This is important in many practical cases when generative models, such as Bayesian networks or HMMs, are desired or appropriate (e.g., sequence modeling). Second, the ML training of generative models is computationally more efficient than discriminative training of the same models. Therefore, by discriminatively setting the weights on the data and by generatively training intermediate models, we achieve a computationally efficient way of training generative classifiers.
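The division of labor described above can be sketched with standard discrete AdaBoost wrapped around a Gaussian naive Bayes model that is fit by weighted ML. This is an illustrative stand-in, not the published algorithm: the choice of AdaBoost reweighting, the Gaussian naive Bayes weak learner, and all names are assumptions, and labels are taken to be -1 or +1.

```python
import numpy as np


def fit_weighted_nb(X, y, w):
    """Weighted-ML Gaussian naive Bayes: per-class weighted prior, means, variances."""
    params = {}
    for c in (-1, 1):
        wc, Xc = w[y == c], X[y == c]
        total = wc.sum()
        mean = (wc[:, None] * Xc).sum(axis=0) / total
        var = (wc[:, None] * (Xc - mean) ** 2).sum(axis=0) / total + 1e-6
        params[c] = (np.log(total / w.sum()), mean, var)
    return params


def predict_nb(params, X):
    """Classify by comparing the class-conditional log joint probabilities."""
    def log_joint(c):
        log_prior, mean, var = params[c]
        return log_prior - 0.5 * (np.log(2 * np.pi * var) + (X - mean) ** 2 / var).sum(axis=1)
    return np.where(log_joint(1) >= log_joint(-1), 1, -1)


def boost(X, y, rounds=10):
    """Weights are set discriminatively (AdaBoost); each model is trained generatively."""
    w = np.full(len(y), 1.0 / len(y))
    ensemble = []
    for _ in range(rounds):
        model = fit_weighted_nb(X, y, w)       # generative step: weighted ML
        h = predict_nb(model, X)
        err = max(w[h != y].sum(), 1e-10)      # weighted training error
        if err >= 0.5:
            break
        alpha = 0.5 * np.log((1 - err) / err)
        ensemble.append((alpha, model))
        w = w * np.exp(-alpha * y * h)         # discriminative step: reweight the data
        w /= w.sum()
    return ensemble


def predict(ensemble, X):
    score = sum(alpha * predict_nb(model, X) for alpha, model in ensemble)
    return np.where(score >= 0, 1, -1)


# Two synthetic Gaussian blobs as a smoke test.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2, 1, (100, 2)), rng.normal(2, 1, (100, 2))])
y = np.concatenate([-np.ones(100), np.ones(100)]).astype(int)
ensemble = boost(X, y)
accuracy = (predict(ensemble, X) == y).mean()
```

Each round costs one pass of weighted sufficient statistics, so no gradient-based discriminative optimization of the generative parameters is needed.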




  1. Jing, Y., Pavlovic, V. & Rehg, J.M. (2008), “Boosted Bayesian network classifiers”, Machine Learning Journal.
  2. Jing, Y., Pavlovic, V. & Rehg, J.M. (2005) *Tech-report version (GIT-GVU-05-23)*  (Includes a preliminary analysis of boosted Dynamic Bayesian Network Classifiers, as an alternative to discriminative training methods like Conditional Random Fields )
  3. Jing, Y., Pavlovic, V. & Rehg, J.M. (2005) Efficient discriminative learning of Bayesian network classifiers via Boosted Augmented Naive Bayes – Proceedings of International Conference on Machine Learning (ICML 2005), Distinguished Student Paper Award.
  4. Jing, Y., Pavlovic, V. & Rehg, J.M. (2005) Discriminative Learning Using Boosted Generative Models – The Learning Workshop at Snowbird, Utah, April 5-8.

Software and Data

We have developed a C++ library for structure and parameter learning in boosted augmented Bayesian networks.  Please see our Software page for more details.

Classifying Brain Signals


The brain mechanisms underlying the ability of humans to process faces have been studied extensively in the last two decades. Brain imaging techniques, particularly fMRI (functional Magnetic Resonance Imaging), which possesses high spatial resolution but limited temporal resolution, are advancing our knowledge of the spatial and anatomical organization of face-sensitive brain areas. At the other end are EEG recording techniques with high temporal resolution but poor spatial resolution. They reveal event-related potentials (ERPs) that serve as correlates for various aspects of facial processing. The best-established ERP marker of face perception is N170, a negative component that occurs roughly 170 ms after stimulus onset. Other markers, such as N250, N400 and P600, which occur later than N170, are considered to contribute to face recognition and identification. In a typical fMRI or ERP study, signals are recorded while the subject is exposed to a face or a non-face stimulus to isolate the brain areas that respond differentially to faces (fMRI), and the temporal intervals that exhibit major differences between face and non-face responses in the data (ERP). Early studies in both fMRI and ERP employed simplistic signal processing techniques, involving multiple instance averaging and then a manual examination to detect differentiating components. More advanced statistical signal analysis techniques were first applied to fMRI signals. Systematic analyses generally lagged behind in the ERP domain. A recent principled approach is that of Moulson et al., in which the authors applied statistical classification in the form of Linear Discriminant Analysis (LDA) to face/non-face ERPs and obtained classification accuracies of about 70%.

There are two major goals for the current ERP study: First, to emphasize higher brain areas of visual processing at the expense of early visual areas by using a strategically designed control non-face stimulus. Second, and more importantly, this study seeks to apply systematic machine learning and pattern recognition techniques for classifying face and non-face responses in both the spatial and temporal domains. Toward the first major goal, we use a face and a non-face stimulus derived from the well-known vase/faces illusion.

The key feature of the face and non-face stimuli is that they share the same defining contours, differing only slightly in stimulus space. The defining contours  are attributed to whatever forms the figure; as the percept alternates spontaneously, these contours are attributed alternately to the faces or to the vase.

This attribution is biased, using minimal image manipulations, toward the vase or the faces, but the same contours are used in both cases. These shared contours result in relatively similar responses for the faces and vase stimuli and produce a difficult classification task for the ensuing ERP signals. With this choice of face and non-face stimuli, we bias the analysis towards detecting signal differences that are elicited more by the high-level percepts (face or non-face) than by low-level image differences.

With respect to the second major goal, we note that several types of classifiers have been used in previous EEG classification studies: kNN, logistic regression, multilayer perceptrons, support vector machines, and LDA. Prior work has also used a group penalty for lasso regression in the frequency domain for EEG signal classification and showed the utility of grouping in that domain. The present study extends this systematic use of classifiers by using the ERP signals obtained with the faces/vase stimuli to test three major classification schemes: kNN, L1-norm logistic regression, and group lasso logistic regression. We perform the classification analysis between the two classes of face and vase responses agnostically, based on purely statistical estimates, without favoring any sensors or temporal intervals. Our goal is to use the data to point to salient spatio-temporal ERP signatures most indicative of the stimulus classes. ERP classification also has important applications in neuroscience, such as in brain-computer interfaces (BCI) and in detecting pathologies such as epilepsy.

The main results of our tests with the three major classifier schemes are: First, kNN produced the worst performance, having to rely on all, potentially noisy, features. Second, the other two schemes were able to classify the signals with an overall accuracy of roughly 80%. Finally, the learned weights of L1-norm logistic regression and group lasso were able to locate the salient features in space (electrode position) and time in close agreement with the accepted wisdom of previous studies, confirming various markers of face perception such as N170, N400 and P600.
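How the two sparse schemes select features can be sketched with proximal gradient descent on the logistic loss. This is a generic illustration rather than the solver used in the study: the step size rule, the unweighted group penalty, the 0/1 label coding, and the idea of grouping features (for example by channel) are assumptions made here.

```python
import numpy as np


def soft_threshold(v, t):
    """Proximal operator of t * ||v||_1: shrinks each entry toward zero."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def group_soft_threshold(v, groups, t):
    """Proximal operator of t * sum_g ||v_g||_2: zeroes out whole groups."""
    out = np.zeros_like(v, dtype=float)
    for idx in groups:
        norm = np.linalg.norm(v[idx])
        if norm > t:
            out[idx] = (1.0 - t / norm) * v[idx]
    return out


def sparse_logistic(X, y, lam, groups=None, iters=500):
    """Proximal gradient for logistic regression with an L1 or group penalty (y in {0, 1})."""
    n, d = X.shape
    step = 4.0 * n / (np.linalg.norm(X, 2) ** 2)   # 1 / Lipschitz constant of the gradient
    w = np.zeros(d)
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ w))
        w = w - step * (X.T @ (p - y) / n)          # gradient step on the logistic loss
        if groups is None:
            w = soft_threshold(w, step * lam)        # L1-norm logistic regression
        else:
            w = group_soft_threshold(w, groups, step * lam)  # group lasso
    return w


# Synthetic check: only feature 0 carries information about the label.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = (X[:, 0] > 0).astype(float)
w = sparse_logistic(X, y, lam=0.05)
```

The learned weight vector is what gets mapped back onto electrodes and time points: features or groups whose weights are driven to zero are treated as uninformative, while the surviving ones mark the salient regions.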


Applying sparse dictionary methods not only improved the classification performance but also pointed out the spatial and temporal regions of high interest. That is, the most discriminant features in time and space (channels) correspond to the ERPs and active brain regions that are responsible for distinguishing between a face and a non-face. The following figures illustrate them.

Mapping of L1 regression coefficients to channels (first figure) and time (second figure)


Related Publications

  • [1] S. Shariat, V. Pavlovic, T. Papathomas, A. Braun and P. Sinha. “Sparse dictionary methods for EEG signal classification in face perception”. Machine Learning for Signal Processing (MLSP), 2010 IEEE International Workshop on. 2010. pp. 331 – 336.