**1. Overview**

Modeling and classifying human actions are important problems that have received significant attention in pattern recognition. Motion capture (mocap) data is widely available and can serve as a good proxy for assessing action models before they are applied to video data. In this paper, we present a human action classification framework that extends video analysis based on Granger causality graphs to represent the densely sampled human actions embodied in mocap data. We accomplish this by defining sparse events detected in the movements of human body parts. The events are taken as the nodes of a graph, and edge weights are calculated from the Granger causality between pairs of events. The graph describes a human action in terms of causal relationships among body part movements.

Fig 1. Framework overview. The walking action is shown in the left column and jumping in the right. The top row depicts example mocap sequences \(\bf{d_k}\). Two point processes, on the events of the right leg \(N_{k1}\) and the left leg \(N_{k4}\), are shown for each action. The two actions exhibit different temporal patterns. From the point processes, a Granger causality graph \(G_k\) is constructed to represent the motion by causal relations between events. For the walking sequence, the event in the left leg causes the event in the right leg (\(G_{N_{k1}\to N_{k4}}\)), but the same causal relationship is not observed for jumping. Finally, a model that classifies causal graphs is learned for each action class.

**2. Prior work**

Granger causality is a statistical test for detecting a predictive relationship between two time series [1]. When predicting a time series \(X\), another time series \(Y\) is said to cause \(X\) if adding \(Y\) improves the prediction of \(X\). Given two autoregressive (AR) models of \(X\)

\[ X_t=\sum_{j=1}^{\infty}a_{1j}X_{t-j}+\epsilon_{1t},\>\>\epsilon_{1t} \sim \mathcal{N}(0, \Sigma_1), \tag{1} \]

\[ X_t=\sum_{j=1}^{\infty}a_{2j}X_{t-j}+\sum_{j=1}^{\infty}b_{2j}Y_{t-j}+\epsilon_{2t}, \>\>\epsilon_{2t} \sim \mathcal{N}(0, \Sigma_2), \tag{2} \]

the causal power \(G_{Y \to X}\) is high if adding \(Y\) reduces prediction error of \(X\). Thus, Granger causality is defined as: \(G_{Y \to X}=\ln(\Sigma_1/\Sigma_2)\).
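The time-domain definition above can be sketched numerically. The example below is a minimal illustration, not the authors' implementation: it truncates the infinite sums of Eqs. (1) and (2) to a finite order (the `order` parameter is an assumption), fits both AR models by ordinary least squares, and uses the residual variances as \(\Sigma_1\) and \(\Sigma_2\). The helper names and the synthetic series are hypothetical.

```python
import numpy as np

def lagged(series, order):
    """Stack `order` past values of `series` as columns, aligned with series[order:]."""
    n = len(series)
    return np.column_stack([series[order - j:n - j] for j in range(1, order + 1)])

def granger_causality(x, y, order=2):
    """Return ln(Sigma_1 / Sigma_2) for the hypothesis 'y causes x'.

    Assumption: a finite AR order replaces the infinite sums in Eqs. (1)-(2).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    target = x[order:]
    past_x = lagged(x, order)
    past_xy = np.hstack([past_x, lagged(y, order)])
    # Eq. (1): predict x from its own past only.
    coef1, *_ = np.linalg.lstsq(past_x, target, rcond=None)
    sigma1 = np.mean((target - past_x @ coef1) ** 2)
    # Eq. (2): predict x from the past of both x and y.
    coef2, *_ = np.linalg.lstsq(past_xy, target, rcond=None)
    sigma2 = np.mean((target - past_xy @ coef2) ** 2)
    return np.log(sigma1 / sigma2)

# Synthetic data in which y drives x with a one-step delay (illustration only).
rng = np.random.default_rng(0)
y = rng.standard_normal(2000)
x = np.zeros(2000)
for t in range(1, 2000):
    x[t] = 0.3 * x[t - 1] + 0.8 * y[t - 1] + 0.1 * rng.standard_normal()

g_y_to_x = granger_causality(x, y)   # large: the past of y helps predict x
g_x_to_y = granger_causality(y, x)   # near zero: the past of x does not help predict y
```

Because the two models are nested, the second residual variance can never exceed the first, so the measure is non-negative and is close to zero when \(Y\) adds no predictive information.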

Non-parametric pairwise Granger causality is calculated as follows. Given two point processes \(n_X\) and \(n_Y\), the power spectral matrix \(S_{XY}\) is defined as the Fourier transform of their covariance, which is estimated using the multitaper functions \(h_k(t_j)\) [2]:

\[ S_{XY}(f)=\frac{1}{2\pi KT}\sum_{k=1}^{K}\widetilde{n_X}(f,k)\widetilde{n_Y}(f,k)^*,\tag{3} \]

\[ \widetilde{n_i}(f,k)=\sum_{j}h_k(t_j)\exp(-i2 \pi f t_j),\tag{4}\]

and \(S_{XY}\) is factorized by Wilson’s algorithm as follows:

\[ S_{XY}(f)=H_{XY}(f) \Sigma_{XY} H^{*}_{XY}(f),\tag{5}\]

where \(H_{XY}\) is the transfer function, which corresponds to the coefficients of the AR model, \(\Sigma_{XY}\) corresponds to the covariance matrix of the error term of the AR model, and \(\ast\) denotes the conjugate transpose.
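As a rough illustration of Eqs. (3) and (4), the sketch below estimates the cross-spectrum of two event trains directly from their event times. It makes several assumptions that are not in the text: sine tapers stand in for the multitaper functions \(h_k\) (reference [2] should be consulted for the tapers actually used), the taper scaling is chosen so that each taper has unit mean square over the recording, and all function names are hypothetical. The factorization of Eq. (5) is not shown.

```python
import numpy as np

def sine_taper(k, t, duration):
    """k-th sine taper (k = 1, 2, ...) at times t in [0, duration].

    Assumption: sine tapers are a simple stand-in for the tapers h_k of Eq. (4).
    """
    return np.sqrt(2.0) * np.sin(np.pi * k * t / duration)

def tapered_transform(event_times, freqs, n_tapers, duration):
    """Eq. (4): tapered Fourier transform, returned with shape (n_tapers, n_freqs)."""
    t = np.asarray(event_times, dtype=float)
    phase = np.exp(-2j * np.pi * np.outer(freqs, t))   # exp(-i 2 pi f t_j)
    rows = [phase @ sine_taper(k, t, duration) for k in range(1, n_tapers + 1)]
    return np.array(rows)

def cross_spectrum(events_x, events_y, freqs, n_tapers, duration):
    """Eq. (3): average over tapers of n_X(f, k) * conj(n_Y(f, k))."""
    nx = tapered_transform(events_x, freqs, n_tapers, duration)
    ny = tapered_transform(events_y, freqs, n_tapers, duration)
    return (nx * np.conj(ny)).sum(axis=0) / (2 * np.pi * n_tapers * duration)

# Two synthetic event trains with a 2 Hz rhythm (illustration only).
duration = 10.0
events_x = np.arange(0.25, duration, 0.5)   # one event every 0.5 s
events_y = events_x + 0.1                   # same rhythm, delayed by 0.1 s
freqs = np.arange(0.25, 4.0, 0.25)
s_xx = cross_spectrum(events_x, events_x, freqs, n_tapers=3, duration=duration)
s_xy = cross_spectrum(events_x, events_y, freqs, n_tapers=3, duration=duration)
s_yx = cross_spectrum(events_y, events_x, freqs, n_tapers=3, duration=duration)
peak_frequency = freqs[np.argmax(s_xx.real)]
```

In this toy example the auto-spectrum is real and peaks at the 2 Hz event rate, while the off-diagonal entries satisfy \(S_{XY} = S_{YX}^*\), so the assembled \(2 \times 2\) matrix is Hermitian, as the factorization in Eq. (5) requires.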

Finally, the nonparametric pairwise Granger causality \(G_{n_Y \to n_X}\) at frequency \(f\) is calculated as:

\[ G_{n_{Y} \to n_{X}}(f)=\ln\frac{S_{XX}(f)}{S_{XX}(f)-(\Sigma_{YY}-\frac{\Sigma_{XY}^2}{\Sigma_{XX}}) |H_{XY}(f)|^2},\tag{6} \]
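Once the factorization of Eq. (5) is available, Eq. (6) is a direct computation at each frequency. The snippet below is a minimal sketch that assumes the quantities \(S_{XX}(f)\), \(H_{XY}(f)\) and the entries of \(\Sigma\) have already been obtained elsewhere; the function name and the numbers in the example are hypothetical.

```python
import math

def granger_at_frequency(s_xx, h_xy, sigma_xx, sigma_yy, sigma_xy):
    """Evaluate Eq. (6) at a single frequency.

    s_xx is the (real) auto-spectrum of X, h_xy the complex transfer-function
    entry from Y to X, and sigma_* the entries of the noise covariance.
    """
    # Part of the Y noise variance that is not shared with X.
    residual_y = sigma_yy - sigma_xy ** 2 / sigma_xx
    # Power in X that arrives from Y through the transfer function.
    causal_power = residual_y * abs(h_xy) ** 2
    return math.log(s_xx / (s_xx - causal_power))

# Made-up values: half of the power of X at this frequency comes from Y.
g_half = granger_at_frequency(2.0, 0.6 + 0.8j, 1.0, 1.0, 0.0)
# With a zero transfer function, Y contributes nothing to X.
g_none = granger_at_frequency(2.0, 0.0, 1.0, 1.0, 0.0)
```

The first call evaluates to approximately \(\ln 2\), since the denominator is half the total power, and the second evaluates to zero.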

We will use this notion of causality in analyzing the mocap data, where many motions exhibit natural (quasi-)periodic behavior.

**3. Sparse Granger Causality Graph Model**

**Algorithm 1.** Sparse Granger Causality Graph Model

Input: mocap dataset \( D = \{(\mathbf{d}_1, y_1), \ldots, (\mathbf{d}_n, y_n)\}\)

1. Generate a set of point processes \(N_k\) from \(\mathbf{d}_k\)

\({\bf for } \textrm{joint } i\)

\(N_{ki} \leftarrow \{1(t)|d_{ki}^t = \textrm{peak or valley}\}\)

\({\bf end for}\)

2. Estimate Granger causality graph \(G_k\) from \(N_k\)

\({\bf for}\) joint pair \(X,Y\)

Estimate spectrum \(S_{XY}\) from Eq. (3)

Factorize \(S_{XY}\) from Eq. (5)

Estimate Granger causality \(G_{n_Y \to n_X}\) from Eq. (6)

\({\bf end for}\)

3. Learn a sparse model

\({\bf for}\) action class \(c_i\)

Learn L1 regularized log. reg. model from \(\{G_k|y_k = c_i\}\)

\({\bf end for}\)

Our approach to building the sparse causality graph models of human actions is summarized in Algorithm 1. The approach, denoted by SGCGM, has three major steps:

1. From each mocap sequence, events are detected and point processes are generated on the events. A mocap sequence consists of multiple densely recorded time series, one per joint angle. From each joint-angle time series, two kinds of events, peaks and valleys, are extracted by an extreme point detector. As a result, we define \(M\) events over all joints, and the \(M\) point processes on these events form the set of point processes \(N_k\) for a mocap sequence \(\bf{d_k}\). We assume that representing joint angle trajectories by their extreme points conveys enough information to construct the causal structure among joints.

2. From the set of point processes \(N_k\) for a mocap sequence \(\bf{d_k}\), a Granger causality graph \(G_k\) is constructed. For each pair of point processes in \(N_k\), a power spectrum \(S\) is estimated by the multitaper method.

Pairwise non-parametric Granger causality is calculated over \(F\) frequencies from Eq. (6). As a result, a Granger causality graph is represented by \(F\) adjacency matrices of size \(M \times M\), one for each frequency band. Each node represents an event in the point process \(n_X\), and the pairwise causality power \(G_{n_Y \to n_X}\) is the weight of the directed edge from node \(Y\) to node \(X\).

3. After each mocap sequence is converted into a causal graph, we learn a model that classifies the causal graph into one of the action classes. To capture the sparse graph structure shared across the samples of each class, we use an L1-regularized logistic regression model. To represent the graph, we take the Granger causality of all edges and frequencies as the feature vector.
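Step 1 of the list above, turning each joint-angle series into peak and valley event trains, can be sketched as follows. The text does not specify the extreme point detector, so this example assumes the simplest possible rule (a frame is an event if it is a strict local maximum or minimum) with no smoothing or thresholding. The function names and the toy series are hypothetical.

```python
def extreme_points(angles):
    """Return (peaks, valleys): frame indices of strict local maxima and minima.

    Assumption: a three-frame comparison; the paper's detector may differ.
    """
    peaks, valleys = [], []
    for t in range(1, len(angles) - 1):
        prev, cur, nxt = angles[t - 1], angles[t], angles[t + 1]
        if cur > prev and cur > nxt:
            peaks.append(t)
        elif cur < prev and cur < nxt:
            valleys.append(t)
    return peaks, valleys

def point_processes(joint_angles):
    """Map each joint-angle series to two event trains (peak and valley)."""
    processes = {}
    for name, series in joint_angles.items():
        peaks, valleys = extreme_points(series)
        processes[name + ":peak"] = peaks
        processes[name + ":valley"] = valleys
    return processes

# Toy joint-angle series (made-up values, not mocap data).
demo = {
    "knee": [0, 2, 5, 3, 1, 4, 6, 2],
    "hip": [3, 1, 0, 2, 2, 1, 5, 4],
}
events = point_processes(demo)
```

Each input series yields two point processes, so the number of event trains \(M\) is twice the number of joint-angle series.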

A sparse regression model learns regression coefficients between the input features and the class label. A classification model is learned for each action class from the training data, and each test sample is assigned to the action whose model gives the highest confidence on the logistic function.
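The classification step can be sketched as one L1-regularized logistic regression per class, followed by picking the most confident class. The example below is an assumption-laden illustration: it uses plain proximal gradient descent (soft-thresholding after each gradient step) rather than whatever solver the authors used, the regularization weight and step size are arbitrary, and the features are synthetic stand-ins for flattened causality graphs. All names are hypothetical.

```python
import numpy as np

def soft_threshold(w, amount):
    """Proximal operator of the L1 norm: shrink toward zero, clipping small values to 0."""
    return np.sign(w) * np.maximum(np.abs(w) - amount, 0.0)

def fit_l1_logistic(features, labels, l1=0.1, step=0.1, n_iter=2000):
    """Binary L1-regularized logistic regression; labels are 0.0 or 1.0."""
    n, d = features.shape
    w, b = np.zeros(d), 0.0
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(features @ w + b)))     # logistic confidence
        w = soft_threshold(w - step * features.T @ (p - labels) / n, step * l1)
        b -= step * np.mean(p - labels)                    # bias is not penalized
    return w, b

def fit_one_vs_rest(features, labels):
    """Learn one sparse model per action class."""
    return {c: fit_l1_logistic(features, (labels == c).astype(float))
            for c in np.unique(labels)}

def predict(models, features):
    """Pick the class whose model scores highest (the logistic function is monotone)."""
    classes = np.array(list(models))
    scores = np.column_stack([features @ w + b for w, b in models.values()])
    return classes[np.argmax(scores, axis=1)]

# Synthetic stand-in for flattened causality graphs (NOT HDM05 data):
# 3 classes, 20 features, one informative feature per class.
rng = np.random.default_rng(0)
labels = np.repeat([0, 1, 2], 40)
features = rng.standard_normal((120, 20))
features[np.arange(120), labels] += 4.0
models = fit_one_vs_rest(features, labels)
accuracy = np.mean(predict(models, features) == labels)
zero_fraction = np.mean(models[0][0] == 0.0)   # share of weights driven exactly to zero
```

The soft-thresholding step is what sets uninformative weights exactly to zero, which in this setting corresponds to selecting a small set of edges and frequencies for each class.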

**4. Experimental Evaluation**

We performed experiments on the HDM05 dataset [3]. HDM05 contains mocap data in the form of 29 skeletal joints, each with 2-3 rotation angles, resulting in 62 joint angle time series. From each time series, two kinds of events, peaks and valleys, are detected. As a result, the total number of point processes M is 124. We set the number of frequencies F in a power spectrum to 128. After computing the Granger causality graph for all 128 frequencies, we summarize them into 4 bands (high, mid-high, mid-low and low frequency) by applying Hanning windows on a log scale.
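The band summarization can be sketched as a weighted average over the frequency axis. The text does not give the window centers or widths, so the example below assumes four Hann windows equally spaced on the logarithm of the frequency index, each overlapping its neighbor by half, with weights normalized to sum to one. The function names are hypothetical.

```python
import numpy as np

def log_hann_bands(n_freqs=128, n_bands=4):
    """Return weights of shape (n_bands, n_freqs), ordered from low to high frequency.

    Assumption: window centers are equally spaced in log(frequency index).
    """
    log_f = np.log(np.arange(1, n_freqs + 1))
    centers = np.linspace(log_f[0], log_f[-1], n_bands + 2)[1:-1]
    half_width = centers[1] - centers[0]          # adjacent windows overlap by half
    weights = np.zeros((n_bands, n_freqs))
    for band, center in enumerate(centers):
        u = (log_f - center) / half_width         # runs from -1 to 1 inside the window
        inside = np.abs(u) < 1
        weights[band, inside] = 0.5 * (1 + np.cos(np.pi * u[inside]))
    return weights / weights.sum(axis=1, keepdims=True)

def pool_bands(graph, weights):
    """Reduce a (F, M, M) stack of causality matrices to (n_bands, M, M)."""
    return np.tensordot(weights, graph, axes=(1, 0))

weights = log_hann_bands()
# A constant stack of 3 x 3 matrices stands in for a real causality graph.
pooled = pool_bands(np.ones((128, 3, 3)), weights)
```

If every ordered pair of the 124 events is kept in each of the 4 bands, the flattened feature vector would have 4 × 124 × 124 = 61,504 entries; whether self-pairs are included is not stated in the text.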

SGCGM depends on the model learned for each class, which requires a sufficient number of samples. The HDM05 dataset suits this requirement, with more than 100 classes and multiple trials performed by 5 subjects. We select 8 action classes that have a sufficient number of samples across subjects. The chosen classes are listed in Table 2.

We perform 5-fold cross validation in two different settings. In the first, the data is randomly sampled across subjects, so that both the test and training data contain samples of motions performed by the same person. In the second, we split the data so that data from the test subject is not used during training. This is typically a more challenging setting.
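The two evaluation settings differ only in how the folds are formed. The sketch below illustrates both, assuming that the second setting uses one fold per subject (which gives 5 folds for 5 subjects; the text does not spell this out). The function names and subject labels are hypothetical.

```python
import random

def random_folds(n_samples, n_folds=5, seed=0):
    """First setting: shuffle sample indices and deal them into folds, ignoring subjects."""
    order = list(range(n_samples))
    random.Random(seed).shuffle(order)
    return [sorted(order[i::n_folds]) for i in range(n_folds)]

def subject_folds(subjects):
    """Second setting: one fold per subject, so a test subject never appears in training."""
    by_subject = {}
    for index, subject in enumerate(subjects):
        by_subject.setdefault(subject, []).append(index)
    return [by_subject[s] for s in sorted(by_subject)]

def train_test_pairs(folds):
    """Yield (train_indices, test_indices), holding out each fold in turn."""
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test

# Hypothetical subject label for each of 20 samples.
subjects = ["s1", "s2", "s3", "s4", "s5"] * 4
mixed = list(train_test_pairs(random_folds(len(subjects))))
held_out = list(train_test_pairs(subject_folds(subjects)))
```

In the subject-wise split, the set of subjects seen in training is disjoint from the test subject in every fold, which is what makes the second setting harder.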

**Events detected from CMU motion capture data**

- Events of left and right knee extracted from a **walking** sequence

- Events of left and right knee extracted from a **jumping** sequence

**Granger causal graph**

Fig 2. A Granger causality graph of the class *DepositFloorR*. Edges with the top 5% of the weights are drawn. Edges among the femurs, tibias and feet describe bending of the legs. Directed edges from the right hand and thumb to the right and left tibia represent the deposit motion with the right hand.

**5. Results**

| Setting | LDS | DTW | CTW | IsoCCA | SGCGM |
|---|---|---|---|---|---|
| Cut1 | 79.31±8.1 | 69.5±8.49 | 62.8±6.8 | 74.6±9.9 | 87.4±4.7 |
| Cut2 | 45.9±13.0 | 51.8±10.5 | 50.9±12.0 | 57.5±11.4 | 69.3±9.5 |

Table 2: Confusion matrix of SGCGM result for Cut1

- [1] C. W. J. Granger. “Investigating causal relations by econometric models and cross-spectral methods”, Econometrica, 37(3):424–438, 1969.
- [2] A. Nedungadi, G. Rangarajan, N. Jain, and M. Ding. “Analyzing multiple spike trains with nonparametric Granger causality”, Journal of Computational Neuroscience, 27:55–64, 2009.
- [3] M. Müller, T. Röder, M. Clausen, B. Eberhardt, B. Krüger, and A. Weber. “Documentation Mocap Database HDM05”, Technical report, No. CG-2007-2, ISSN 1610-8892, Universität Bonn, June 2007.

# References

**Full text:[pdf]**

**Presentation slide:[pptx]**