A result sample of our 3D face tracker, where (a) shows the 3D landmarks projected onto the image plane, (b, c) show the 3D blendshape model and the input point cloud, and (d) shows the skinned 3D shape.
1. Abstract.
We introduce a novel, robust hybrid 3D face tracking framework for RGBD video streams, capable of tracking head pose and facial actions without pre-calibration or intervention from the user. In particular, we focus on improving tracking performance when the tracked subject is far from the camera and the quality of the point cloud deteriorates severely. This is accomplished by combining a flexible 3D shape regressor with a joint 2D+3D optimization over the shape parameters. Our approach fits facial blendshapes to the point cloud of the human head, driven by an efficient and rapid 3D shape regressor trained on generic RGB datasets. As an online tracking system, it adapts the identity of the unknown user on the fly, resulting in improved 3D model reconstruction and consequently better tracking performance. The result is a robust RGBD face tracker capable of handling a wide range of scene depths, beyond those afforded by traditional depth or RGB face trackers. Lastly, since the blendshape model cannot accurately recover the real facial shape, we use the tracked 3D face model as a prior in a novel filtering process that further refines the depth map for use in other tasks, such as 3D reconstruction.
2. The tracking framework.
In this work, we use the blendshape model from the FaceWarehouse database.
The figure shows the pipeline of the proposed face tracking framework, which follows a coarse-to-fine, multi-stage optimization design. Our framework consists of two major stages: shape regression and shape refinement. The shape regressor, learned from training data, performs the first optimization stage and quickly estimates the shape parameters from the RGB frame. In the second stage, a carefully designed optimization over both the 2D image and the available 3D point cloud refines the shape parameters, and finally the identity parameters are updated to improve the shape fit to the input RGBD data.
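As background for the parameterization discussed below, here is a minimal sketch of how a posed face is synthesized from blendshape parameters. It assumes the common delta-blendshape form (a neutral shape plus weighted expression offsets, followed by a rigid transform); the function name and array layout are illustrative, not the actual FaceWarehouse data format.

```python
import numpy as np

def evaluate_blendshape(b0, B, e, R, T):
    """Synthesize a posed 3D face from blendshape parameters (sketch).

    b0 : (N, 3) neutral face vertices for the current identity
    B  : (K, N, 3) expression offsets from the neutral shape
    e  : (K,) expression weights, typically in [0, 1]
    R  : (3, 3) head rotation; T : (3,) head translation
    """
    shape = b0 + np.tensordot(e, B, axes=1)  # (N, 3) deformed face
    return shape @ R.T + T                   # apply the rigid head pose

# Toy usage with random data, just to show the shapes involved
# (K = 46 expression offsets, as in FaceWarehouse).
N, K = 1000, 46
b0 = np.random.rand(N, 3)
B = 0.1 * np.random.rand(K, N, 3)
e = np.zeros(K)
e[0] = 0.5                            # half-activate one expression
R, T = np.eye(3), np.array([0.0, 0.0, 1.5])
posed = evaluate_blendshape(b0, B, e, R, T)
```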
The 3D shape regressor is the key component for achieving our goal of 3D tracking at large distances, where the quality of the depth map is often poor. Unlike existing RGBD-based face trackers, which either rely heavily on an accurate input point cloud (available only at close range) to model the shape transformation via ICP, or use an off-the-shelf 2D face tracker to guide it, we predict the 3D shape parameters directly from the RGB frame with our 3D regressor. This is motivated by the success of 3D shape regression from RGB images, and it is especially meaningful in the large-distance scenarios we consider, where the depth quality is poor. We therefore do not use depth information in the 3D shape regression, avoiding the propagation of inaccuracies from the depth map.
Initially, a color frame I is passed through the regressor to recover the shape parameters θ. The projection of the N_l landmark vertices of the 3D shape onto the image plane typically does not accurately match the 2D landmarks annotated in the training data. We therefore include 2D displacements D in the parameter set and define a new global shape parameter set P = (θ, D) = (R, T, e, D). Including D in P has two advantages. First, it helps the trained regressor reproduce landmarks in a test image similar to those in the training set. Second, it prepares the regressor to work with unseen identities that do not appear in the training set; in such cases the displacement D may grow large to compensate for the difference in identity. The regression process can be expressed as \[P^{out} = f_r(I, P^{in}),\] where f_r is the regression function, I is the current frame, and P^{in} and P^{out} are the input (carried over from the regression on the previous frame) and output shape parameter sets, respectively. The coarse estimate P^{out} is refined further in the next stage by a more precise energy optimization that incorporates the depth information. Specifically, \[\theta = (R, T, e)\] is optimized w.r.t. both the 2D prior constraints provided by the 2D landmarks estimated by the shape regressor and the 3D point cloud. Lastly, the identity vector w_id is re-estimated given the current transformation. (For more details, please refer to our manuscript on arXiv.)
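To make the second-stage objective concrete, the sketch below evaluates a joint 2D+3D energy of the kind described above: a 2D term tying the projected landmark vertices to the regressor's displaced landmarks, plus a 3D term against the point cloud. The pinhole projection, the point-to-point closest-vertex correspondence, and the weight lam are our assumptions for illustration; the actual energy in the paper may use different terms (e.g. point-to-plane distances).

```python
import numpy as np

def project(V, f, cx, cy):
    """Pinhole projection of 3D points V (N, 3) onto the image plane."""
    return np.stack([f * V[:, 0] / V[:, 2] + cx,
                     f * V[:, 1] / V[:, 2] + cy], axis=1)

def joint_energy(posed_verts, landmark_idx, targets_2d, cloud,
                 f, cx, cy, lam=1.0):
    """Evaluate E(theta) for the face vertices generated by theta = (R, T, e).

    posed_verts : (N, 3) blendshape vertices after applying R and T
    landmark_idx: indices of the N_l landmark vertices
    targets_2d  : (N_l, 2) regressor landmarks u_l plus displacements D_l
    cloud       : (M, 3) input point cloud of the head region
    """
    # 2D prior term: projected landmark vertices should agree with the
    # displaced 2D landmarks predicted by the shape regressor.
    e2d = np.sum((project(posed_verts[landmark_idx], f, cx, cy)
                  - targets_2d) ** 2)

    # 3D term: each cloud point is pulled toward its closest mesh vertex
    # (a point-to-point, ICP-style correspondence; in practice a kd-tree
    # would replace this brute-force search).
    d = np.linalg.norm(cloud[:, None, :] - posed_verts[None, :, :], axis=2)
    e3d = np.sum(np.min(d, axis=1) ** 2)
    return e2d + lam * e3d
```

An optimizer (e.g. Gauss-Newton) would minimize this energy over θ, re-evaluating it as the pose and expression estimates improve.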
The effect of using depth data for regularization: (a, b) without depth data; (c, d) with depth data.
Identity adaptation:
3. Tracking results.
– On the BU4DFE dataset
– On real RGBD sequences
– With occlusion
4. Depth recovery using dense shape priors.
Building on our previous work, we replace the sparse Candide face model with the blendshape model and develop the depth recovery process as a filter on the depth map, with the tracked dense face shape serving as the prior.
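A minimal sketch of such a prior-guided filter is shown below: a joint bilateral-style filter whose range weight is computed against the depth rendered from the tracked blendshape model (the prior) rather than against the noisy raw depth, so holes and outliers in the input do not corrupt the kernel. The kernel form and all parameter names here are our assumptions, not the paper's exact formulation.

```python
import numpy as np

def prior_guided_filter(raw, prior, radius=3, sigma_s=2.0, sigma_r=0.02):
    """Filter a raw depth map using a rendered model depth as the prior (sketch).

    raw   : (H, W) noisy input depth in meters, 0 where invalid
    prior : (H, W) depth rendered from the tracked 3D face model
    """
    raw = raw.astype(float)
    H, W = raw.shape
    out = np.zeros_like(raw)
    ys, xs = np.mgrid[-radius:radius + 1, -radius:radius + 1]
    w_spatial = np.exp(-(xs**2 + ys**2) / (2 * sigma_s**2))
    pad = np.pad(raw, radius, mode="edge")
    for i in range(H):
        for j in range(W):
            patch = pad[i:i + 2 * radius + 1, j:j + 2 * radius + 1]
            valid = patch > 0
            # Range weight measured against the prior depth, so noise in
            # the raw measurements does not drive the kernel.
            w = w_spatial * np.exp(-(patch - prior[i, j]) ** 2
                                   / (2 * sigma_r**2))
            w = np.where(valid, w, 0.0)
            s = w.sum()
            out[i, j] = (w * patch).sum() / s if s > 0 else prior[i, j]
    return out
```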
A result sample on real data at 2 m: (a) the prior; (b) the raw depth data; (c) filtered without the prior; (d) filtered with the prior.
Publication:
- H. X. Pham, C. Chen, L. N. Dao, V. Pavlovic, J. Cai, and T.-J. Cham. “Robust Performance-driven 3D Face Tracking in Long Range Depth Scenes”. arXiv preprint, 2015.