**1. Abstract**

Smartphone-based indoor localization has attracted considerable interest from the localization community in recent years. Combining pedestrian dead reckoning obtained from the phone’s inertial sensors with the GraphSLAM (Simultaneous Localization and Mapping) algorithm is one of the most effective approaches to reconstruct the entire pedestrian trajectory from a set of landmarks visited during movement. A key to GraphSLAM-based localization is the detection of reliable landmarks, which are typically identified using visual cues or via NFC tags or QR codes. Alternatively, human activity can be classified to detect organic landmarks, such as visits to stairs and elevators, during movement. We provide a novel human activity classification framework that is invariant to the pose of the smartphone. Pose invariant features allow robust observations no matter how a user puts the phone in the pocket. In addition, activity classification obtained by an SVM (Support Vector Machine) is used in a Bayesian framework with an HMM (Hidden Markov Model) that improves activity inference through temporal smoothness. Furthermore, the HMM jointly infers activity and floor information, thus providing multi-floor indoor localization. Our experiments show that the proposed framework detects landmarks accurately and enables multi-floor indoor localization from the pocket using GraphSLAM.

**2. Motivation**

- We extended the design of pose invariant features for an activity classification task. Pose is defined by how a person puts a smartphone in the pocket. We show that pose invariant features can be used to successfully classify activities.
- We designed a Hidden Markov Model that enables the integration of activity classification and floor inference.
- We applied the GraphSLAM algorithm with our activity and floor detection framework to provide multi-floor localization in a building.

**3. Framework**

The overview of the framework is depicted in the following figure.

**a) Feature Extraction**

**Pose Invariant Feature for IMU Sensors**

IMU sensor readings depend on the pose of the smartphone, i.e., the orientation of the phone in the pocket. A pose invariant system is strongly desirable because it frees the user from keeping the smartphone in a particular orientation. Kobayashi et al. [1] showed that the autocorrelation of acceleration data is invariant to rotations of the accelerometer:

\begin{align}
f(\omega) &= \int \exp(-i \omega t)\,s(t)\,dt \in \mathbb{C}^3, \\
F &= [f(\omega_1), f(\omega_2), \dots, f(\omega_n)] \in \mathbb{C}^{3\times n},\\
A &= F^*F.
\end{align}

The pose invariant property is inherited from the fact that the rotation matrix \(R\) is an orthogonal matrix, \(R^TR=I\).

\begin{align}
\hat{s}(t) &= Rs(t), \nonumber \\
\hat{f}(\omega) &= \int \exp(-i \omega t)\,\hat{s}(t)\,dt, \nonumber\\
&= R\int \exp(-i \omega t)\,s(t)\,dt, \nonumber\\
&= Rf(\omega). \\
\hat{A} &= \hat{F}^*\hat{F} = F^*R^TRF = F^*F = A.
\end{align}

We extend this idea and apply pose invariant features to both accelerometer and gyroscope data to classify pedestrian activity. The following figure illustrates pose invariant features.
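The autocorrelation feature above can be sketched in a few lines of NumPy. The window length and the number of retained frequency bins below are illustrative choices, not values from the paper:

```python
import numpy as np

def pose_invariant_features(window, n_freqs=16):
    """Rotation-invariant feature from a window of 3-axis readings.

    window  : (T, 3) array, one row per timestep s(t) in R^3
    n_freqs : how many DFT bins to keep (illustrative choice)
    """
    # DFT along time: each kept column of F is f(omega_k) in C^3
    spectrum = np.fft.rfft(window, axis=0)   # (T//2 + 1, 3)
    F = spectrum[1:1 + n_freqs].T            # (3, n_freqs), skip DC bin
    A = F.conj().T @ F                       # (n_freqs, n_freqs), Hermitian
    # Invariance: for a rotated window Rs(t), F becomes RF, and
    # (RF)*(RF) = F* R^T R F = F* F = A, since R^T R = I.
    iu = np.triu_indices(n_freqs)            # A is Hermitian; keep upper triangle
    return np.abs(A[iu])

# Features from accelerometer and gyroscope windows would be concatenated:
# feat = np.concatenate([pose_invariant_features(acc_win),
#                        pose_invariant_features(gyr_win)])
```

The same function serves both sensors, since the derivation only relies on the sensor frame being rotated by an orthogonal matrix.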

**Statistical Features from a Barometer**

Barometer readings fluctuate constantly even when the sensor stays at the same level, so we compute statistical features over each window to obtain robust observations, as listed in the following table.
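A minimal sketch of such windowed statistics is shown below. The exact feature set (mean, standard deviation, range, linear slope) is an assumption for illustration; the paper lists its features in a table not reproduced here:

```python
import numpy as np

def barometer_features(pressure):
    """Statistical features over one sliding window of barometer readings.

    pressure : 1-D array of pressure samples (e.g. in hPa).
    The feature choice here is illustrative, not the paper's exact table.
    """
    t = np.arange(len(pressure))
    slope = np.polyfit(t, pressure, 1)[0]   # linear trend: ascending/descending/level
    return np.array([
        pressure.mean(),                    # level within the window
        pressure.std(),                     # sensor noise magnitude
        pressure.max() - pressure.min(),    # range over the window
        slope,
    ])
```

The slope is the most informative component for stairs and elevators, since pressure changes monotonically during a floor change.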

**b) SVM Activity Classification**

For each sliding window of an input sequence, rotation invariant features are extracted from the accelerometer and gyroscope, and statistical features from the barometer. A linear SVM classifies each window and produces class probabilities via Platt’s scaling algorithm. Because the SVM sees only a single window, it cannot exploit the activities observed in previous windows, so sporadic misclassifications can arise. Classification results can be improved by promoting temporal smoothness of the activity sequence.
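This step can be sketched with scikit-learn, whose `CalibratedClassifierCV` with `method="sigmoid"` implements Platt scaling on top of a linear SVM. The data below is a random placeholder standing in for the per-window feature vectors:

```python
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.calibration import CalibratedClassifierCV

# Placeholder training data: one feature vector per sliding window,
# labels 0..5 for the six indoor activities.
rng = np.random.default_rng(1)
X = rng.standard_normal((600, 20))
y = rng.integers(0, 6, size=600)

# Linear SVM whose decision values are mapped to class probabilities
# via Platt's sigmoid scaling.
clf = CalibratedClassifierCV(LinearSVC(), method="sigmoid", cv=3)
clf.fit(X, y)

proba = clf.predict_proba(X[:5])   # per-window activity posteriors
```

These per-window posteriors are exactly what the HMM consumes as activity observations in the next step.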

**c) HMM Activity and Floor Inference**

Activity classification results obtained from the SVM can be refined by an HMM if we define activities as states and suppress the unlikely state transitions. Furthermore, by extending the definition of a state as a joint identification of the activity and the floor, state inference can integrate activity with floor inferences. Such a combined state will help constrain the state transition.

**Transition probability**

We manually design the transition probabilities as shown in the following figure. The design reflects the fact that activity transitions occur sparsely over time, so the probability of switching states is much lower than that of staying in the same state. Moreover, transitions between certain activities are impossible.
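A sketch of such a hand-designed matrix over joint (activity, floor) states is given below. The self-transition probability, the number of floors, and the set of allowed transitions are illustrative assumptions, since the paper's transition figure is not reproduced here:

```python
import numpy as np

ACTIVITIES = ["walk", "stair_down", "stair_up", "stand", "elev_down", "elev_up"]
N_FLOORS = 7                        # illustrative building size
STATES = [(a, f) for f in range(N_FLOORS) for a in ACTIVITIES]

def build_transition_matrix(p_stay=0.99):
    """Joint (activity, floor) transition matrix, hand-designed.

    Rules assumed here: activities may switch freely on the same floor;
    the floor may only change by one while on stairs or in an elevator,
    and only in the direction the activity implies.
    """
    n = len(STATES)
    T = np.zeros((n, n))
    for i, (a, f) in enumerate(STATES):
        T[i, i] = p_stay                       # staying is far more likely
        allowed = []
        for j, (a2, f2) in enumerate(STATES):
            if i == j:
                continue
            if f2 == f and a2 != a:
                allowed.append(j)              # change activity on one floor
            elif a2 == a and a in ("stair_up", "elev_up") and f2 == f + 1:
                allowed.append(j)              # ascend one floor
            elif a2 == a and a in ("stair_down", "elev_down") and f2 == f - 1:
                allowed.append(j)              # descend one floor
        for j in allowed:                      # split the leftover mass
            T[i, j] = (1.0 - p_stay) / len(allowed)
    return T
```

Forbidden transitions (e.g. jumping two floors in one step) simply get probability zero, which is what constrains the Viterbi decoding later.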

**Observation probability**

Observation probabilities are obtained jointly from activity and floor likelihood. Air pressure observation from a barometer \(y_{\textrm{floor}}\) is modeled by a mixture of Gaussians, where each floor forms a Gaussian distribution with \((\mu_{\textrm{floor}}, \sigma_{\textrm{floor}})\). Activity class posterior \(p(s_{\textrm{act}_i}|y_\textrm{act})\) is estimated from Platt’s scaling on SVM decision values.

\begin{align}
p(y|s_i) &= \frac{p(s_i | y)p(y)}{p(s_i)},\\
p(s_i|y) &= p(s_{\textrm{floor}_i}|y_{\textrm{floor}})\, p(s_{\textrm{act}_i}|y_{\textrm{act}}),\\
p(y) &= \frac{1}{|T|}, \quad p(s_i) = \frac{1}{|S|}.
\end{align}
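Since \(p(y)\) and \(p(s_i)\) are uniform, \(p(y|s_i)\) is proportional to the product of the floor and activity posteriors. A minimal sketch, assuming floor-major ordering of the joint states and per-floor Gaussian parameters fitted elsewhere:

```python
import numpy as np

def observation_loglik(baro, act_proba, floor_mu, floor_sigma):
    """Log observation probability for every joint (floor, activity) state.

    baro       : one barometer reading
    act_proba  : SVM/Platt posterior over activities, sums to 1
    floor_mu   : per-floor Gaussian means of air pressure
    floor_sigma: per-floor Gaussian standard deviations
    """
    # Gaussian floor log-likelihoods, normalized into a posterior over floors
    log_floor = -0.5 * ((baro - floor_mu) / floor_sigma) ** 2 - np.log(floor_sigma)
    log_floor -= np.log(np.exp(log_floor).sum())
    # Joint posterior factorizes: outer sum of the two log-posteriors,
    # flattened in floor-major order (floor index varies slowest).
    return (log_floor[:, None] + np.log(act_proba)[None, :]).ravel()
```

The resulting vector plugs directly into a standard Viterbi recursion together with the transition matrix.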

**d) Post-Process Rectification**

The HMM smooths the state transition because the probability of state change is much smaller than that of staying in the same state. Thus, the number of sporadic misclassifications from the SVM may be reduced. In addition, activity inference of the HMM can be further improved by rectifying activities of *stairs* that involve no floor change to *walk* and, likewise, *elevators* to *stand still*.
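The rectification rule can be sketched directly on a decoded sequence of (activity, floor) states; the activity label strings are illustrative:

```python
def rectify(states):
    """Rectify stairs/elevator segments that incur no floor change.

    states : list of (activity, floor) tuples from Viterbi decoding.
    Stairs segments with no net floor change become 'walk';
    elevator segments with no net floor change become 'stand'.
    """
    out = list(states)
    i = 0
    while i < len(out):
        a, f = out[i]
        if a in ("stair_up", "stair_down", "elev_up", "elev_down"):
            j = i                                   # find the segment end
            while j + 1 < len(out) and out[j + 1][0] == a:
                j += 1
            if out[j][1] == f:                      # no floor change occurred
                fix = "walk" if a.startswith("stair") else "stand"
                for k in range(i, j + 1):
                    out[k] = (fix, f)
            i = j + 1
        else:
            i += 1
    return out
```

Segments that do change the floor are left untouched, so genuine stair and elevator landmarks survive the rectification.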

**e) Multi-floor GraphSLAM with Organic Landmarks**

GraphSLAM is an approach that optimizes a trajectory by representing it as a graph of constraints between consecutive positions and by minimizing an error to satisfy the constraints specified by the graph. In order to obtain an accurate trajectory, GraphSLAM requires a good number of landmarks visited more than once. Detailed explanation and formulation can be found in the tutorial [2].

In this paper, we focus on providing organic landmarks, namely stairs and elevators detected while a pedestrian moves inside a building. The identity of a landmark can be determined by comparing WiFi visibility signatures, such as the MAC addresses of visible WiFi access points. During training, the WiFi visibility and physical location of all landmarks are collected into a reference landmark list. Then, when a landmark is detected at test time, we compare the current WiFi visibility against all entries and adopt the physical location of the closest landmark in the reference list.
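The matching step can be sketched as follows. Jaccard similarity over sets of visible MAC addresses is one plausible signature comparison, stated here as an assumption, since the paper only says that visibility signatures are compared:

```python
def match_landmark(current_aps, reference):
    """Identify a detected landmark by its WiFi visibility signature.

    current_aps : set of MAC addresses visible right now
    reference   : list of (landmark_location, set_of_macs) pairs
                  collected during training
    Returns the location of the best-matching reference landmark.
    """
    def jaccard(a, b):
        # Overlap of visible access points, 0.0 if both sets are empty
        return len(a & b) / len(a | b) if (a | b) else 0.0

    best = max(reference, key=lambda lm: jaccard(current_aps, lm[1]))
    return best[0]
```

Each matched landmark contributes a loop-closure constraint to the GraphSLAM pose graph, which is what ultimately corrects the drifting PDR trajectory.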

**4. Experiments and Results**

We evaluated the proposed method in a large, multi-floor office building with many stairs and elevators. Accelerometer, gyroscope and barometer data were recorded from Android smartphones at 50 Hz. Training data totaling 10271 seconds were recorded by three subjects; to ease annotation, the same action was performed repeatedly. We defined 6 indoor activities: *walking, taking stairs down, taking stairs up, standing still, taking elevator down* and *taking elevator up*. For the test data, subjects walked inside the building naturally; the test trajectories comprise 12 sequences totaling 6160 seconds.

**a) Quantitative analysis**

Activity classification results for the various models are shown in the following table. The columns show per-class accuracies for the SVM, the HMM and the rectification step, respectively. HMM inference obtained with the Viterbi algorithm improves over SVM classification for all activities. Figure 4 shows that the HMM reduces confusion among the locomotive activities *walk, stair down* and *stair up*, and corrects misclassifications of *stand still* as *elevator down* and *elevator up*; such sporadic errors are suppressed by the HMM’s temporal smoothing. Finally, post-processing the HMM inference further rectifies *walk* activity that was misclassified as *stairs*.

**b) Qualitative analysis**

The following figure shows an example of the inference result. The activity labels are *walking (WA), stair down (SD), stair up (SU), stand still (SS), elevator down (ED)* and *elevator up (EU)*, from bottom to top. In the given sequence, the user visited 7 floors, including the basement \(F0\).

We observe that the SVM misclassifies among the locomotive activities *walking, stair down* and *stair up*. These misclassifications are corrected by the HMM Viterbi algorithm, which also infers the floor correctly. Post-processing further corrects *stair down* and *stair up* activities that did not incur a floor change to *walk*.

**c) Multi-floor GraphSLAM**

This example shows a trajectory that included visits to 4 floors.

The initial trajectory obtained from smartphone PDR contains drift.

The following video shows how GraphSLAM improves the trajectory using organic landmarks obtained from the proposed framework.

https://www.youtube.com/watch?v=JK9ABxnTRMw

**5. Conclusion**

In this paper, we propose a novel framework that jointly infers activity and floor landmarks. Pose invariant features from inertial sensors are adopted for SVM-based activity classification. We design an HMM where an activity on each floor defines a state. State transitions are designed to provide temporal smoothness of a state sequence. The probability of an observation is estimated from an activity class probability provided by the SVM classification, which is multiplied by the floor likelihood from a mixture-of-Gaussians model. We further rectify activity inferences when we observe activities of stairs and elevators which do not incur floor changes. Our experiments show that the proposed framework accurately classifies activities and infers floors. Finally, we showed that the organic landmarks obtained from our framework can be applied effectively to enable multi-floor GraphSLAM.

**References**

- [1] T. Kobayashi, K. Hasida, and N. Otsu, “Rotation invariant feature extraction from 3D acceleration signals,” in *ICASSP*, 2011.
- [2] G. Grisetti, R. Kummerle, C. Stachniss, and W. Burgard, “A tutorial on graph-based SLAM,” *IEEE Intelligent Transportation Systems Magazine*, 2010.

**Full text: [pdf]**

**Presentation slide: [pptx]**