Special Permissions / Overrides for Fall 17 Courses

Many students have contacted me asking about Special Permissions or Prerequisite Overrides for 206 & 535 courses for Fall 2017. As of Thursday, August 31, I am told that our administrative staff is still processing requests. I have not yet received any lists of students who submitted requests.

Once I have more information, I will post it here.

Update (9/5/2017):  This afternoon I finally received access to SPN request lists.  I will be processing those requests in the next few days.

CVPR 2017 – Wrap Up

It was quite exciting to attend the largest CVPR ever – almost 5000 attendees.  Having it in a beautiful location made it even more appealing.

Thanks to my students and colleagues who made the work we presented at CVPR possible.

Joint work with Imperial College and MIT on using copula models for joint facial AU estimation.
Joint NTU Singapore – Rutgers work on generative models for robust 3D face pose estimation
Break time at Waikiki beach
Hai presenting his work at the 1st Int’l Workshop on Deep Affective Learning and Context Modeling

ICCV 2017

Our paper about unsupervised probabilistic domain adaptation [1] for deep models (and other models too) has been accepted for ICCV’17:

[1] B. Gholami, O. Rudovic, and V. Pavlovic, “PUnDA: Probabilistic Unsupervised Domain Adaptation,” in Proc. IEEE International Conference Computer Vision, 2017.

Congratulations to Behnam and Ognjen!

Depth Recovery Paper

Our article on depth recovery using deformable object priors was accepted for publication in Journal of Visual Communication and Image Representation [1]:

[1] C. Chen, H. X. Pham, V. Pavlovic, J. Cai, G. Shi, and Y. Gao, “Using 3D Face Priors for Depth Recovery,” J. Visual Commun. Image Represent., 2017.



CVPR 2017

We are excited to have three CVPR 2017 main conference papers accepted [1, 2, 3], as well as one workshop paper [4]:

[1] B. Babagholami and V. Pavlovic, “Probabilistic Temporal Subspace Clustering,” in IEEE Int’l Conf. Computer Vision and Pattern Recognition, 2017.
[2] R. Walecki, O. Rudovic, V. Pavlovic, B. Schuller, and M. Pantic, “Deep Structured Learning for Facial Expression Intensity Estimation,” in IEEE Int’l Conf. Computer Vision and Pattern Recognition, 2017.
[3] L. Sheng, J. Cai, T. Cham, V. Pavlovic, and K. N. Ngan, “A Generative Model for Depth-based Robust 3D Facial Pose Tracking,” in IEEE Int’l Conf. Computer Vision and Pattern Recognition, 2017.
[4] H. Pham and V. Pavlovic, “Speech-driven 3D Facial Animation with Implicit Emotional Awareness: A Deep Learning Approach,” in IEEE Int’l Conf. Computer Vision and Pattern Recognition – Workshop on Deep Affective Learning and Context Modeling, 2017.


Go to our Research page to find out more.

Congratulations to all!

Ancient Coin Recognition Paper

Our article on ancient coin recognition was recently accepted for publication [1]:

[1] J. Kim and V. Pavlovic, “Discovering Characteristic Landmarks on Ancient Coins Using Convolutional Networks,” SPIE Journal of Electronic Imaging, 2017. doi: 10.1117/1.JEI.26.1.011018

Congratulations to Jongpil!

Distributed Probabilistic Learning

1. Abstract

Traditional computer vision algorithms, particularly those that exploit various probabilistic and learning-based approaches, are often formulated in centralized settings. However, modern computational settings are increasingly characterized by networks of peer-to-peer connected devices with local data processing abilities. A number of distributed algorithms have been proposed to address problems such as calibration, pose estimation, tracking, and object and activity recognition in large camera networks [1],[2].

One critical challenge in distributed data analysis is dealing with missing data. In camera networks, different nodes will only have access to a partial set of data features because of varying camera views or object movement. For instance, object points used for structure from motion (SfM) may be visible only in some cameras and only in particular object poses. As a consequence, different nodes will frequently be exposed to missing data. However, most current distributed data analysis methods are algebraic in nature and cannot seamlessly handle such missing data.

In this work we present an approach to estimation and learning of generative probabilistic models in a distributed context where certain sensor data can be missing. In particular, we show how traditional centralized models, such as probabilistic PCA (PPCA) [3], missing-data PPCA [4], and Bayesian PCA (BPCA) [4], can be learned when the data is distributed across a network of sensors. We demonstrate the utility of this approach on the problem of distributed affine structure from motion. Our experiments suggest that the accuracy of the learned probabilistic structure and motion models rivals that of traditional centralized factorization methods, while being able to handle challenging situations such as missing or noisy observations.

2. Distributed Probabilistic Learning

We propose a distributed consensus learning approach for parametric probabilistic models with latent variables that can effectively deal with missing data. The goal of the network of sensors is to learn a single consensus probabilistic model (e.g., 3D object structure) without ever resorting to centralized data pooling or centralized computation. Let \( \mathbf{X} = \{ \mathbf{x}_{n} | \mathbf{x}_{n} \in \mathcal{R}^{D} \} \) be a set of i.i.d. multivariate data points with the corresponding latent variables \( \mathbf{Z} = \{ \mathbf{z}_{n} | \mathbf{z}_{n} \in \mathcal{R}^{M} \} \), \(n = 1, \ldots, N\). Our model is a joint density defined on \( (\mathbf{x}_{n}, \mathbf{z}_{n}) \) with a global parameter \( \theta \),

\[ (\mathbf{x}_{n}, \mathbf{z}_{n}) \sim p(\mathbf{x}_{n}, \mathbf{z}_{n} | \theta), \]

with \( p(\mathbf{X}, \mathbf{Z} | \theta) = \prod_n p(\mathbf{x}_{n}, \mathbf{z}_{n} | \theta) \), as depicted in Figure 1-a. In this general model, we can find an optimal global parameter \( \hat{\theta} \) (in a MAP sense) by applying standard EM learning. It is important to point out that each posterior density estimate at point \( n \) depends solely on the corresponding measurement \( \mathbf{x}_{n} \) and does not depend on any other \( \mathbf{x}_{k}, k \neq n \); hence, it is decentralized. To consider the distributed counterpart of this model, let \( G = (V, E) \) be an undirected connected graph with vertices \( i, j \in V \) and edges \( e_{ij} = (i, j) \in E \) connecting pairs of vertices. Node \( i \) is directly connected to its 1-hop neighbors in \( \mathcal{B}_{i} = \{ j; e_{ij} \in E \} \). Suppose the set of data samples at the \( i \)-th node is \( \mathbf{X}_{i} = \{ \mathbf{x}_{in}; n = 1, \ldots , N_{i} \} \), where \( \mathbf{x}_{in} \in \mathcal{R}^{D} \) is the \( n \)-th measurement vector and \( N_{i} \) is the number of samples collected at the \( i \)-th node. Likewise, we define the latent variable set for node \( i \) as \( \mathbf{Z}_{i} = \{ \mathbf{z}_{in}; n = 1, \ldots , N_{i} \} \).

Learning the model parameter would be decentralized if each node had its own independent parameter \(\theta_i\). Still, the centralized model can be equivalently defined using the set of local parameters, with an additional constraint on their consensus, \( \theta_1 = \theta_2 = \cdots = \theta_{|V|} \). This is illustrated in Figure 1-b where the local node models are constrained using ties defined on the underlying graph. The simple consensus tying can be more conveniently defined using a set of auxiliary variables \( \rho_{ij} \), one for each edge \( e_{ij} \) (Figure 1-c). This now leads to the final distributed consensus learning formulation, similar to [5]:

\begin{align*} \label{dpm_opt1} \hat{\mathbf{\theta}} = \arg\min_{ \{ \theta_{i} : i \in V \} } & -\log p( \mathbf{X} | \mathbf{\theta}, G) \\ \text{s.t.} &\quad \theta_{i} = \rho_{ij}, \rho_{ij} = \theta_{j}, i \in V, j \in \mathcal{B}_{i}. \end{align*}

This is a constrained optimization task that can be solved in a principled manner using the Alternating Direction Method of Multipliers (ADMM) [6]. ADMM iteratively, in a block-coordinate fashion, solves \( \max_{\lambda} \min_{\theta} \mathcal{L}(\cdot) \) on the augmented Lagrangian

\begin{align*} \label{dpm_opt2} \mathcal{L}( \mathbf{\theta}, \rho, \lambda ) &= -\log p( \mathbf{X} | \theta_{1}, \theta_{2}, \ldots , \theta_{|V|}, G) \\ &\quad + \sum_{i \in V} \sum_{j \in \mathcal{B}_{i}} \left\{ \lambda_{ij1}^{\text{T}} ( \theta_{i} - \rho_{ij} ) + \lambda_{ij2}^{\text{T}} ( \rho_{ij} - \theta_{j} ) \right\} \\ &\quad + \frac{ \eta }{ 2 } \sum_{i \in V} \sum_{j \in \mathcal{B}_{i}} \left\{ || \theta_{i} - \rho_{ij} ||^{2} + || \rho_{ij} - \theta_{j} ||^{2} \right\} \end{align*}

where \( \lambda_{ij1}, \lambda_{ij2}, i,j \in V \) are the Lagrange multipliers, \( \eta \) is a positive scalar parameter and \( ||\cdot|| \) is the induced norm. The last term (modulated by \( \eta \)) is not strictly necessary for consensus but introduces additional regularization.
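To make the consensus mechanics concrete, here is a minimal sketch on a toy problem that is not taken from this page: every node holds a few scalar samples, the local negative log-likelihood is that of a unit-variance Gaussian with unknown mean, and the network must agree on one mean without pooling data. The sketch assumes a commonly used simplification of the formulation above, in which the auxiliary variables \( \rho_{ij} \) are eliminated and each node keeps a single aggregated multiplier. The function name, the ring topology, and the data are hypothetical.

```python
def consensus_admm_mean(node_data, neighbors, eta=1.0, iters=500):
    """Estimate one shared Gaussian mean across nodes without pooling data.

    node_data[i] -- list of scalar samples held by node i
    neighbors[i] -- list of 1-hop neighbours B_i of node i (undirected graph)
    """
    n_nodes = len(node_data)
    theta = [0.0] * n_nodes  # local parameter copies theta_i
    lam = [0.0] * n_nodes    # one aggregated Lagrange multiplier per node

    for _ in range(iters):
        old = theta[:]
        # Primal step: closed-form minimiser of the local augmented objective
        #   0.5 * sum_n (x_in - theta)^2 + 2 * lam_i * theta
        #   + eta * sum_{j in B_i} (theta - (old_i + old_j) / 2)^2
        for i in range(n_nodes):
            pull = sum(old[i] + old[j] for j in neighbors[i])
            theta[i] = (sum(node_data[i]) - 2.0 * lam[i] + eta * pull) / (
                len(node_data[i]) + 2.0 * eta * len(neighbors[i]))
        # Dual step: accumulate the remaining disagreement with each neighbour
        for i in range(n_nodes):
            lam[i] += 0.5 * eta * sum(theta[i] - theta[j] for j in neighbors[i])
    return theta


# Hypothetical toy network: 4 nodes on a ring with unequal sample counts.
data = [[1.0, 2.0], [3.0], [4.0, 5.0, 6.0], [10.0]]
ring = [[1, 3], [0, 2], [1, 3], [2, 0]]
estimates = consensus_admm_mean(data, ring)
print(estimates)  # every entry should be close to the pooled mean 31/7
```

Each node only ever reads its own samples and its neighbours' current estimates, yet the fixed point is the mean of the pooled data rather than an average of the local means.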


3. Distributed Probabilistic Principal Component Analysis (D-PPCA)

Distributed versions of PPCA and missing-data PPCA can be derived straightforwardly from the model above. Details, including the derivation of the iterative update formulas for distributed EM [5], can be found in Yoon and Pavlovic (2012).

We tested D-PPCA on synthetic Gaussian data, including cases where some of the values are missing. As Figure 2 shows, D-PPCA performed comparably to its centralized counterpart regardless of whether data were missing, either missing-at-random (MAR) or missing-not-at-random (MNAR). We also report an empirical convergence analysis in the supplementary material.

We also applied D-PPCA to the problem of distributed affine SfM. We conducted experiments in both synthetic (multiple cameras observing a rotating cube) and real settings. For the real settings, we used videos obtained from Caltech [7] and Johns Hopkins [8]. We simulated a multiple-camera setting by dividing the frames sequentially among the cameras (5 in our case): for example, with 5 cameras and 30 frames in total, frames 1 to 6 are assigned to camera 1, frames 7 to 12 to camera 2, and so on. We compared the structure reconstructed by D-PPCA with the centralized, SVD-based reconstruction using the subspace angle. Table 1 below shows the results obtained on the Caltech turntable dataset. They show that the accuracy of D-PPCA rivals that of SVD-based methods even when some observed values are missing. For detailed and additional results and explanations, please refer to the attached manuscript and supplementary materials.
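The sequential frame assignment used to simulate multiple cameras can be sketched as follows. The function name is hypothetical, and the sketch assumes the number of frames divides evenly among the cameras, as in the 30-frame, 5-camera example.

```python
def assign_frames(n_frames, n_cameras):
    """Split frames 1..n_frames into consecutive, equal-sized blocks per camera."""
    if n_frames % n_cameras != 0:
        raise ValueError("this sketch assumes the frames divide evenly")
    block = n_frames // n_cameras
    # Camera numbering starts at 1 to match the text.
    return {cam + 1: list(range(cam * block + 1, (cam + 1) * block + 1))
            for cam in range(n_cameras)}


assignment = assign_frames(30, 5)
print(assignment[1])  # [1, 2, 3, 4, 5, 6]
print(assignment[2])  # [7, 8, 9, 10, 11, 12]
```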

4. Distributed Bayesian Principal Component Analysis (D-BPCA)

We can apply a similar framework to obtain a distributed extension of mean field variational inference (MFVI). It can be shown that the following distributed formulation is equivalent to the centralized MFVI optimization problem:

\begin{align*}\label{ooo} [\hat{\lambda}_Z, \hat{\lambda}_W] = &\underset{\lambda_{Z_i}, \lambda_{W_i}: i \in V}{\arg\min}\; - \mathbb{E}_Q\big[ \log P(X,Z,W|\Omega_z,\Omega_w) \big] + \mathbb{E}_Q[\log Q] \\ &\text{s.t.} \;\; \lambda_{W_i} = \rho_{ij},\;\;\rho_{ij} = \lambda_{W_j},\;\; i\in V, j\in \mathcal{B}_i \end{align*}

where \( Z=\{z_i \in \mathbb{R}^{M}\}_{i=1}^{N} \) denotes a set of local latent variables, \( W \) denotes a global latent variable, and \( \Omega=[\Omega_z, \Omega_w] \) denotes a set of fixed parameters. The forms of \( Q(z_n;\lambda_{z_n}) \) and \( Q(W;\lambda_{W}) \) are set to be in the same exponential family as the conditional distributions \( P(W|X,Z,\Omega_w) \). Using conjugate exponential families for the prior and likelihood distributions, each coordinate descent update in MFVI can be done in closed form. However, the penalty terms would be quadratic in the norm of the difference \( (\lambda_{W_i} - \rho_{ij}) \), which may result in non-analytic updates for \( \{\lambda_{W_i}\}_{i=1}^{|V|} \).

To solve the above problem efficiently, we propose to use Bregman ADMM (B-ADMM) [9], which generalizes ADMM by replacing the quadratic penalty term with a Bregman divergence chosen to exploit the structure of the problem. We propose to use the log partition function of the global parameter as the Bregman function. With this choice, the B-ADMM updates can be obtained analytically, in closed form. The figure below compares performance on the distributed affine structure from motion experiment described above, using the centralized versions of SVD, PPCA and BPCA (with variational inference) and the proposed distributed versions of PPCA and BPCA, at varying noise levels.
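For reference, the Bregman divergence that B-ADMM substitutes for the quadratic penalty has the following standard definition for a differentiable, strictly convex function \( \phi \). This is textbook material rather than a formula reproduced from this page, and the symbols \( x, y \) are generic:

\[ B_{\phi}(x, y) = \phi(x) - \phi(y) - \langle \nabla \phi(y), x - y \rangle . \]

Choosing \( \phi(x) = \tfrac{1}{2} ||x||^{2} \) gives \( B_{\phi}(x, y) = \tfrac{1}{2} ||x - y||^{2} \), which recovers the quadratic penalty of standard ADMM. When \( \phi \) is the log partition function of an exponential family and \( x, y \) are natural parameters, \( B_{\phi}(x, y) \) equals the KL divergence \( \mathrm{KL}(p_{y} \,||\, p_{x}) \), which is one reason this choice can keep conjugate updates in closed form.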

Related Publications

  • S. Yoon and V. Pavlovic, “Decentralized Probabilistic Learning For Sensor Networks,” in IEEE Global Conference on Signal and Information Processing, Dec. 2016.

  • C. Song, S. Yoon, and V. Pavlovic, “Fast ADMM Algorithm for Distributed Optimization with Adaptive Penalty,” in Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence, Phoenix, Arizona, USA, Feb. 2016, pp. 753–759. http://arxiv.org/abs/1506.08928

  • B. Babagholami, S. Yoon, and V. Pavlovic, “D-MFVI: Distributed Mean Field Variational Inference using Bregman ADMM,” in Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence, Phoenix, Arizona, USA, Feb. 2016, pp. 1582–158. http://arxiv.org/abs/1507.00824

  • S. Yoon and V. Pavlovic, “Distributed Probabilistic Learning for Camera Networks with Missing Data,” in Advances in Neural Information Processing Systems 25: 26th Annual Conference on Neural Information Processing Systems 2012, December 3-6, 2012, Lake Tahoe, Nevada, United States, 2012, pp. 2933–2941. http://papers.nips.cc/paper/4629-distributed-probabilistic-learning-for-camera-networks-with-missing-data


  1. R. J. Radke. “A Survey of Distributed Computer Vision Algorithms”. H. Nakashima, H. Aghajan, and J. C. Augusto, eds. Springer Science+Business Media, LLC. 2010.
  2. R. Tron and R. Vidal. “Distributed Computer Vision Algorithms”, IEEE Signal Processing Magazine, Vol. 28. 2011, pp. 32-45.
  3. M. E. Tipping and C. M. Bishop. “Probabilistic Principal Component Analysis”, Journal of the Royal Statistical Society, Vol. Series B. 1999, pp. 611-622.
  4. A. Ilin and T. Raiko. “Practical Approaches to Principal Component Analysis in the Presence of Missing Values”, Journal of Machine Learning Research, Vol. 11. 2010, pp. 1957-2000.
  5. P. A. Forero, A. Cano and G. B. Giannakis. “Distributed clustering using wireless sensor networks”, IEEE Journal of Selected Topics in Signal Processing, Vol. 5, August, 2011, pp. 707-724.
  6. S. Boyd, N. Parikh, E. Chu, B. Peleato, and J. Eckstein. Distributed Optimization and Statistical Learning via the Alternating Direction Method of Multipliers, vol. 3, Now Publishers, 2011.
  7. P. Moreels and P. Perona. “Evaluation of Features Detectors and Descriptors based on 3D Objects”, International Journal of Computer Vision, Vol. 73. 2007, pp. 263-284.
  8. R. Tron and R. Vidal. “A Benchmark for the Comparison of 3-D Motion Segmentation Algorithms”, IEEE International Conference on Computer Vision and Pattern Recognition. 2007.
  9. H. Wang and A. Banerjee, “Bregman Alternating Direction Method of Multipliers”, Advances in Neural Information Processing Systems 27, 2014.

Recognition of Ancient Roman Coins using Spatial Information

Recognition of Ancient Roman Coins

1. Problem Formulation

For a given Roman coin image, the goal is to recognize who is on the coin


There are thousands of ways to categorize Roman coins. For example, we can classify the coins by attributes such as symbols, sizes, materials and legends. Note that these attributes are correlated: one attribute may help reveal the others. In this project, we focus on a face recognition problem: for a given Roman coin image, the goal is to recognize who is on the coin. For the images above, for example, we want to determine that the Roman emperor Caligula is engraved on the first coin, Maximus on the second, and the famous emperors Nero and Tiberius on the others.


2. Motivation

Understanding the ancient Roman coins could serve as references to understand the Roman empire

  • Roman coins are always connected to Roman historical events and Roman imperial propaganda
  • The Roman empire knew how to use coins effectively as political propaganda
  • Roman coins were widely used to convey the achievements of Roman emperors to the public
  • Roman coins served to spread messages about changing policies or merits throughout the empire
  • Roman emperors could also show themselves to the entire empire by engraving their portraits on the coins
  • Roman coins were the newspaper of the Roman empire

3. Practical Application

A reliable and automatic method to recognize the coins is necessary

  • The coin market is very active, as many people collect coins as a hobby. The coins were mass-produced and new Roman coins are excavated daily, which makes them affordable to collect.
  • Ancient coins are becoming subject to a very large illicit trade. Recognizing ancient Roman coins is not easy for novices; it requires expert knowledge.
  • The traditional approach is for the authorities to periodically and manually search catalogues, dealers, or the Internet.

4. Challenges

  • Inter-class similarity, due to the engravers’ limited knowledge of the emperor’s actual appearance and to abstraction in the portraits
  • Intra-class dissimilarity: the coins were made manually in different factories
  • Recognizing a face on a coin is different from recognizing a real face

5. Coin Data Collection

  • Coin images are collected from a numismatic website [1, 2]
  • 2815 coin images with 15 Roman emperors
    – Small part of a much larger dataset
    – Annotated for visual analysis (the original dataset only has numismatic annotation)
    – Each emperor has at least 10 coin images
  • High resolution images: 350-by-350 pixels

6. Coin Recognition Methods using Spatial Information

  1. Deformable Part Model (DPM) based method
    – Encodes spatial information more precisely than a spatial pyramid, by means of alignment
    – DPM is used to align the coin image by locating the face of the emperor
    – Training and test of DPM

  2. Fisher Vector based method
    – Each point is represented as a combination of visual features and location, (x, l)
    – A Gaussian mixture model describes the probability of (x, l)
    \[ \begin{aligned} p(\mathbf{x}, \mathbf{l}) & = \sum_k \pi_k \cdot p(\mathbf{x}, \mathbf{l}; {\Sigma}_k^V, {\Sigma}_k^L, {\mu}_k^V, {\mu}_k^L) \\ & = \sum_k \pi_k \cdot p(\mathbf{x}; {\Sigma}_k^V, {\mu}_k^V) \cdot p(\mathbf{l}; {\Sigma}_k^L, {\mu}_k^L), \end{aligned} \] where \(\pi_k\) is the prior probability of the \(k\)th component, \({\Sigma}_k^V, {\mu}_k^V\) are the covariance and mean for the visual descriptors, \({\Sigma}_k^L, {\mu}_k^L\) are the covariance and mean for the location, and \[ \begin{aligned} p(\mathbf{x}; {\Sigma}_k^V, {\mu}_k^V) & = \mathcal{N} (\mathbf{x}; {\Sigma}_k^V, {\mu}_k^V), \\ p(\mathbf{l}; {\Sigma}_k^L, {\mu}_k^L) & = \mathcal{N} (\mathbf{l}; {\Sigma}_k^L, {\mu}_k^L). \end{aligned} \] The gradient of the log-likelihood with respect to the means \(\mu\) and covariances \(\Sigma\) defines the Fisher vector.
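The factorized mixture density above can be evaluated with a few lines of code. The sketch below assumes diagonal covariances (stored as per-dimension variances), which the page does not state. The function names and the dictionary layout of a mixture component are hypothetical.

```python
import math


def diag_gaussian(v, mean, var):
    """Density of a Gaussian with diagonal covariance (per-dimension variances)."""
    log_p = 0.0
    for vi, mi, si2 in zip(v, mean, var):
        log_p += -0.5 * (math.log(2.0 * math.pi * si2) + (vi - mi) ** 2 / si2)
    return math.exp(log_p)


def joint_density(x, l, components):
    """p(x, l) = sum_k pi_k * N(x; visual params of k) * N(l; location params of k)."""
    return sum(c["pi"]
               * diag_gaussian(x, c["mu_v"], c["var_v"])
               * diag_gaussian(l, c["mu_l"], c["var_l"])
               for c in components)


# Toy example: a 2-D "descriptor" x and a 2-D image location l, two components.
mixture = [
    {"pi": 0.5, "mu_v": [0.0, 0.0], "var_v": [1.0, 1.0],
     "mu_l": [0.2, 0.2], "var_l": [0.1, 0.1]},
    {"pi": 0.5, "mu_v": [3.0, 3.0], "var_v": [1.0, 1.0],
     "mu_l": [0.8, 0.8], "var_l": [0.1, 0.1]},
]
print(joint_density([0.1, -0.1], [0.25, 0.2], mixture))
```

Because the visual and location terms multiply within each component, a descriptor only scores highly under a component if it both looks right and appears in the right place on the coin.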

7. Experimental Results

  • Experimental settings
    – 2815 coin images with 15 emperors
    – For evaluation, the coin dataset is divided into 5 folds, training on 4 folds and testing on the remaining one
    – SIFT as the visual feature
    – Multi-class SVM for training and prediction

  • Recognition accuracies for various methods
  • Confusion matrices
  • Discriminative regions
  • Outlier detection
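The 5-fold evaluation protocol listed under the experimental settings can be sketched as below. This is a plain shuffled split. Whether the original experiments stratified the folds by emperor is not stated here, so the sketch does not assume it. The function name and seed are hypothetical.

```python
import random


def five_fold_splits(n_items, n_folds=5, seed=0):
    """Return a list of (train_indices, test_indices) pairs, one per fold."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices round-robin into n_folds disjoint folds.
    folds = [indices[k::n_folds] for k in range(n_folds)]
    splits = []
    for k in range(n_folds):
        test = folds[k]
        train = [i for j, fold in enumerate(folds) if j != k for i in fold]
        splits.append((train, test))
    return splits


splits = five_fold_splits(2815)
for train, test in splits:
    print(len(train), len(test))  # 2252 563 for every fold
```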

8. Conclusion

We proposed two automatic methods to recognize ancient Roman coins. The first method employs the deformable part model to align the coin images and thereby improve recognition accuracy. The second method exploits the spatial information of the coin by directly encoding location information. Because the first method takes the face location into account, it performs slightly better than the second. The experiments show that both methods outperform alternatives such as the standard spatial pyramid model and a human face recognition method.

In this project, we collected a new ancient Roman coin dataset and investigated an automatic framework for recognizing the coins, in which we employ a state-of-the-art face recognition system and exploit the spatial information of the coin to improve recognition accuracy. The coin images are high-resolution (350-by-350 pixels) and the face locations are annotated. While the proposed coin recognition framework is based on standard methods such as bag-of-words with spatial pyramids, Fisher vectors and DPM, we believe that their use in the context of ancient coin recognition represents an interesting contribution.


  • [1] J. Kim and V. Pavlovic. “Ancient Coin Recognition Based on Spatial Coding”. Proc. International Conference on Pattern Recognition (ICPR). 2014.
  • [2] J. Kim and V. Pavlovic. “Improving Ancient Roman Coin Recognition with Alignment and Spatial Encoding”. ECCV Workshop VISART. 2014.

Hybrid On-line 3D Face and Facial Actions Tracking in RGBD Video Sequences

1. Abstract

Tracking human faces has long been an active research area in the computer vision community because of its usefulness in a number of applications, such as video surveillance, expression analysis and human-computer interaction. An automatic vision-based tracking system is desirable, and such a system should be capable of recovering the head pose and facial features, or facial actions. This is a non-trivial task because of the highly deformable nature of faces and their rich variability in appearance.

A popular approach for face modeling and alignment is to use statistical models such as Active Shape Models and Active Appearance Models. These techniques have been refined over a long period of time and have proven to be robust. However, they were originally developed to work on 2D texture and require intensive preparation of training data. Another approach is to use a 3D morphable model, in which a 3D facial shape model is deformed to fit the input data. These trackers rely on either texture or depth, and either do not take advantage of both sources of information or use them only sparsely. In addition, sophisticated trackers use specially designed 3D face models which are not freely available. Lastly, they often require prior training or manual initial alignment of the face model by human operators.

In this work, we propose a hybrid on-line 3D face tracker that takes advantage of both texture and depth information and is capable of tracking 3D head pose and facial actions simultaneously. First, we incorporate a generic deformable model, Candide-3, into our Iterative Closest Point (ICP) fitting framework. Second, we introduce a strategy to automatically initialize the tracker using the depth information. Lastly, we propose a hybrid tracking framework that combines ICP and an On-line Appearance Model (OAM) to utilize the strengths of both techniques. The ICP algorithm, aided by optical flow so that it correctly follows large head movements, robustly tracks the head pose across frames using depth information and provides a good initialization for OAM. In return, the OAM algorithm maintains the texture model of the face, corrects any drift incurred by ICP and brings the 3D shape closer to the correct deformation, which then provides ICP with a good initialization in the next frame.

2. Parameterized Face Model

We use an off-the-shelf 3D deformable model, Candide-3, which was developed by J. Ahlberg [1]. The deformation of the face model is controlled by Shape Units (SUs), which represent face biometry specific to a person, and Action Units (AUs), which control facial expressions and are user-invariant. Since every vertex can be transformed independently, each vertex of the model is reshaped according to: \[g = p_0 + S\sigma + A\alpha \] where $p_0$ is the base coordinates of a vertex $p$, and $S$ and $A$ are the shape and action deformation matrices associated with vertex $p$, respectively. $\sigma$ is the vector of shape deformation parameters and $\alpha$ is the vector of action deformation parameters. In general, the transformation of a vertex given global motion, consisting of a rotation $R$ and a translation $t$, is defined as: \[p' = R(p_0 + S\sigma + A\alpha ) + t \]
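The two vertex equations above translate directly into code. The following is a minimal sketch for a single vertex using plain lists. The function names are hypothetical, and the toy matrices are illustrative values, not Candide-3 data.

```python
def matvec(matrix, vec):
    """Multiply a matrix (given as a list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vec)) for row in matrix]


def deform_vertex(p0, S, sigma, A, alpha):
    """g = p0 + S*sigma + A*alpha, where S is 3 x n_s and A is 3 x n_a."""
    s_part = matvec(S, sigma)
    a_part = matvec(A, alpha)
    return [p + s + a for p, s, a in zip(p0, s_part, a_part)]


def transform_vertex(p0, S, sigma, A, alpha, R, t):
    """p' = R * (p0 + S*sigma + A*alpha) + t."""
    g = deform_vertex(p0, S, sigma, A, alpha)
    return [r + ti for r, ti in zip(matvec(R, g), t)]


# Toy vertex with one shape unit and one action unit.
p0 = [1.0, 0.0, 0.0]
S = [[1.0], [0.0], [0.0]]   # the shape unit moves the vertex along x
A = [[0.0], [1.0], [0.0]]   # the action unit moves the vertex along y
R = [[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]  # 90 degrees about z
t = [0.0, 0.0, 1.0]
print(transform_vertex(p0, S, [0.5], A, [2.0], R, t))  # [-2.0, 1.5, 1.0]
```

Keeping `sigma` fixed after the first frame and varying only `alpha`, `R` and `t` mirrors the split between person-specific shape and per-frame expression and pose.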

We use the first frame to estimate the SU parameters corresponding to the test subject in a neutral expression, together with the initial head pose. From the second frame onwards, we keep the shape unit parameters $\sigma$ unchanged and track the action unit parameters $\alpha$, along with the head pose $R$ and $t$. Seven action units are tracked in our framework, as depicted below.

3. Initialization

The initialization pipeline is described in the following figure:

First, using a general 2D face alignment algorithm, we can reliably detect 6 feature points (eye/mouth corners), as shown below.

These 2D points are back-projected to world coordinates using the depth map to form a set of 3D correspondences. Then, using the registration technique in [2], we recover the initial head pose. We use some heuristics to guess the initial shape parameters by searching for facial parts (nose, chin). Lastly, we jointly optimize the pose and shape unit parameters by minimizing the following ICP energy:

\[\hat{R}, \hat{t}, \hat{\sigma} = \mathop {\arg \min }\limits_{R,t,\sigma } \sum\limits_{i = 1}^N {{{\left\| {R({p_0} + {S_i}\sigma ) + t - {d_i}} \right\|}^2}} \]

The Levenberg-Marquardt algorithm is used to solve the above non-linear least squares problem [3].
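For the initial head pose, the closed-form registration of two 3D point sets described in [2] can be sketched as below. This is a minimal NumPy version of the SVD-based least-squares solution, not the tracker's own code. The function name and the six toy points are assumptions, and the subsequent Levenberg-Marquardt refinement of pose and shape is not shown.

```python
import numpy as np

def rigid_registration(src, dst):
    """Least-squares R, t such that R @ src[i] + t ~= dst[i].

    src, dst: (N, 3) arrays of corresponding 3D points.
    """
    src_c = src.mean(axis=0)
    dst_c = dst.mean(axis=0)
    # 3x3 cross-covariance of the centered point sets
    H = (src - src_c).T @ (dst - dst_c)
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:
        # Reflection case: flip the last singular direction
        Vt[-1, :] *= -1
        R = Vt.T @ U.T
    t = dst_c - R @ src_c
    return R, t

# Toy check with made-up "eye/mouth corner" positions on the model.
theta = np.deg2rad(30.0)
R_true = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                   [np.sin(theta),  np.cos(theta), 0.0],
                   [0.0,            0.0,           1.0]])
t_true = np.array([0.1, -0.2, 0.8])
model_pts = np.array([[-0.045,  0.03, 0.000],
                      [-0.015,  0.03, 0.010],
                      [ 0.015,  0.03, 0.010],
                      [ 0.045,  0.03, 0.000],
                      [-0.025, -0.04, 0.005],
                      [ 0.025, -0.04, 0.005]])
depth_pts = model_pts @ R_true.T + t_true   # simulated back-projected points

R_est, t_est = rigid_registration(model_pts, depth_pts)
```

With noise-free correspondences the estimate matches the true pose up to floating-point error. With real depth data it only serves as a starting point for the joint optimization.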

4. Tracking

The overall tracking process is shown in the diagram below:

The tracking process starts with minimizing the ICP energy to recover the head pose and action unit parameters. The procedure is similar to Algorithm 1, with only one change: in the first iteration, the correspondences are formed by optical flow tracking of the 2D-projected vertex features from the previous color frame to the current color frame. From the second iteration, correspondences are found by searching for closest points.
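The closest-point search used from the second iteration onwards can be illustrated with a brute-force NumPy sketch. The function name and the toy points are assumptions made for the example, and the source does not say which search structure the tracker uses. A practical implementation would more likely rely on a spatial index or on the depth-map grid.

```python
import numpy as np

def closest_points(model_pts, depth_pts):
    """For each model vertex, return the index of and the nearest depth point.

    model_pts: (M, 3) transformed model vertices.
    depth_pts: (N, 3) points back-projected from the depth map.
    """
    diff = model_pts[:, None, :] - depth_pts[None, :, :]   # (M, N, 3)
    d2 = np.einsum("mnk,mnk->mn", diff, diff)              # squared distances
    idx = d2.argmin(axis=1)
    return idx, depth_pts[idx]

# Toy example with made-up coordinates.
model_pts = np.array([[0.0, 0.0, 0.0],
                      [1.0, 0.0, 0.0]])
depth_pts = np.array([[0.9, 0.1, 0.0],
                      [0.1, 0.0, 0.0],
                      [5.0, 5.0, 5.0]])
idx, matches = closest_points(model_pts, depth_pts)
```

The matched points play the role of $d_i$ in the ICP energy for the next minimization step.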

Optical flow inherently introduces drift into tracking, and the error accumulated over time reduces tracking performance. We therefore incorporate an On-line Appearance Model (OAM) as a refinement step in our tracker, using the full facial texture information while maintaining the no-training requirement.

The On-line Appearance Model in our tracker is similar to that of [4], in which:
- The appearance model is represented in a fixed-size template.
- The mean appearance is built on-line for the current user after the first frame.
- Each pixel in the template is modeled by an independent Gaussian distribution, and thus the appearance vector is a multivariate Gaussian distribution which is updated over time:
\[{\mu _{{i_{t + 1}}}} = \left( {1 - \alpha } \right){\mu _{{i_t}}} + \alpha {\chi _{{i_t}}} \]
\[\sigma _{{i_{t + 1}}}^2 = \left( {1 - \alpha } \right)\sigma _{{i_t}}^2 + \alpha {\left( {{\chi _{{i_t}}} - {\mu _{{i_t}}}} \right)^2} \]
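The per-pixel update above can be sketched as follows. The function name, the variable `rate` (standing for the update weight written as $\alpha$ in the two update equations) and the toy values are assumptions made for the example.

```python
import numpy as np

def update_appearance(mu, var, chi, rate):
    """One on-line update of the per-pixel Gaussian appearance model.

    mu, var: current mean and variance of each template pixel.
    chi:     template pixels observed in the current frame.
    rate:    update weight in [0, 1].
    """
    new_mu = (1.0 - rate) * mu + rate * chi
    # The variance update uses the mean from before this update.
    new_var = (1.0 - rate) * var + rate * (chi - mu) ** 2
    return new_mu, new_var

# Toy two-pixel template with made-up intensities.
mu = np.array([0.5, 0.2])
var = np.array([0.01, 0.04])
chi = np.array([0.7, 0.2])
mu_next, var_next = update_appearance(mu, var, chi, rate=0.1)
```

A small `rate` makes the template adapt slowly, which limits how quickly a tracking error can contaminate the appearance model.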

The final transformation parameters are found by minimizing the Mahalanobis distance, where $u$ is the vector of $(R, t, \alpha)$ parameters:
\[{\hat{u}_t} = \mathop {\arg \min }\limits_{{u_t}} {\sum\limits_{i = 1}^n {\left( {\frac{{\chi {{({u_t})}_i} - {\mu _{{i_t}}}}}{{{\sigma _{{i_t}}}}}} \right)} ^2} \]
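The cost being minimized can be illustrated with a short NumPy sketch. The function name and the `eps` guard against zero standard deviations are assumptions. Computing the warped template $\chi(u_t)$ from the image for a candidate $u_t$, and the optimizer itself, are not shown.

```python
import numpy as np

def appearance_cost(chi, mu, sigma, eps=1e-6):
    """Sum of squared, variance-normalized residuals over template pixels.

    chi:   template pixels warped from the image under candidate parameters.
    mu:    per-pixel means of the appearance model.
    sigma: per-pixel standard deviations of the appearance model.
    eps:   hypothetical lower bound on sigma to avoid division by zero.
    """
    z = (chi - mu) / np.maximum(sigma, eps)
    return float(np.sum(z ** 2))

# Toy two-pixel template with made-up values.
chi = np.array([0.7, 0.2])
mu = np.array([0.5, 0.2])
sigma = np.array([0.1, 0.2])
cost = appearance_cost(chi, mu, sigma)
```

Pixels with a large variance contribute less to the cost, so unstable regions of the face have less influence on the refined pose.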

5. Experiments

5.1. Synthetic Data

Our single-threaded C++ implementation runs at up to 16 fps on a 2.3 GHz Intel Xeon CPU, which is not yet fast enough to process a live stream. We generate 446 synthetic RGBD sequences from the BU-4DFE dataset [5], in which the initial frames contain a neutral expression, with white noise applied to the depth maps. The size of the rendered face is about 200×250 pixels.

We compare the results of our tracker to a pure ICP-based tracker whose resulting parameters are clamped within predefined boundaries to prevent drifting. The errors shown in Table 1 do not truly reflect the superior performance of the hybrid tracker over the ICP tracker as seen in the figure.

5.2. Real RGB-D sequences

We capture sequences from a Kinect and a Senz3D camera. In the Kinect sequence, the depth map is aligned to the color image, and our tracker performs well.

In the sequence captured from the Senz3D camera, due to the disparity between the texture and the depth map resolutions, we map the texture to the depth map instead. The generated texture thus becomes very noisy, but the tracker still works reasonably well.


  • H.X. Pham and V. Pavlovic, “Hybrid On-line 3D Face and Facial Actions Tracking in RGBD Video Sequences” In: Proc. International Conference on Pattern Recognition (ICPR). (2014)


  • [1] J. Ahlberg, “An updated parameterized face,” Image Coding Group, Dept. of Electrical Engineering, Linkoping University, Tech. Rep.
  • [2] K. S. Arun, T. S. Huang, and S. D. Blostein, “Least-squares fitting of two 3d point sets,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 9, no. 5, pp. 698–700, 1987.
  • [3] A. W. Fitzgibbon, “Robust registration of 2d and 3d point sets,” Image and Vis. Comput., vol. 21, no. 13-14, pp. 1145–1153, 2003.
  • [4] F. Dornaika and J. Orozco, “Real-time 3d face and facial feature tracking,” J. Real-time Image Proc., pp. 35–44, 2007.
  • [5] L. Yin, X. Chen, Y. Sun, T. Worm and M. Reale, “A High-Resolution 3D Dynamic Expression Database”, in IEEE FG’08, 2008.