The Morning Paper is an old friend of this blog, with several reposts here in The Intelligence of Information (and the former The Information Age…). Its author covers a paper relevant to computer science every weekday, and that breadth is a rarity on an Internet so often filled with vacuous mass-media content.
Today Adrian Colyer got his mojo back on deep learning with a better than business-as-usual post about a new framework: DeepSense. It is a must-read (both the post and the paper of the same title), and I will briefly sketch my own main takeaways:
DeepSense is a deep learning framework that runs on mobile devices, and can be used for regression and classification tasks based on data coming from mobile sensors (e.g., motion sensors). An example of a classification task is heterogeneous human activity recognition (HHAR) – detecting which activity someone might be engaged in (walking, biking, standing, and so on) based on motion sensor measurements. Another example is biometric motion analysis where a user must be identified from their gait. An example of a regression task is tracking the location of a car using acceleration measurements to infer position.
Compared to the state of the art, DeepSense provides an estimator with far smaller tracking error on the car tracking problem, and outperforms state-of-the-art algorithms on the HHAR and biometric user identification tasks by a large margin.
Despite a general shift towards remote cloud processing for a range of mobile applications, we argue that it is intrinsically desirable that heavy sensing tasks be carried out locally on-device, due to the usually tight latency requirements, and the prohibitively large data transmission requirement as dictated by the high sensor sampling frequency (e.g., accelerometer, gyroscope). Therefore we also demonstrate the feasibility of implementing and deploying DeepSense on mobile devices by showing its moderate energy consumption and low overhead for all three tasks on two different types of smart device.
With introductions such as these, we are well on our way to a great read and to the wealth of possibilities that this new deep learning framework might bring to the literature and practice of Internet of Things (IoT) sensors.
The notebook sketches Adrian made helped him better understand the paper, a nice suggestion for anyone struggling with the high abstraction of some of the concepts in deep learning frameworks:
In working through this paper, I ended up with quite a few sketches in my notebook before I reached a proper understanding of how DeepSense works. In this write-up I’m going to focus on taking you through the core network design, and if that piques your interest, the rest of the evaluation details etcetera should then be easy to pick up from the paper itself.
Let’s start off by considering a single sensor (ultimately we want to build applications that combine data from multiple sensors). The sensor may provide multi-dimensional measurements, for example a motion sensor that reports motion along the x, y, and z axes. We collect sensor readings in each of these dimensions at regular intervals (i.e., a time series), which we can represent in matrix form.
Finding patterns in the time series data works better in the frequency dimension than in the time dimension, so the next step is to take each of the T time windows of the series and pass it through a Fourier transform, resulting in f frequency components, each with a magnitude and phase. This gives us a d × 2f matrix for each window, where d is the number of sensor dimensions.
We’ve got T of these, and we can pack all of that data into a d × 2f × T tensor.
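To make the windowing and Fourier step concrete, here is a minimal sketch of my own (not code from the paper or from the post). It assumes non-overlapping windows, a plain discrete Fourier transform, and magnitude and phase interleaved along each row. The names `dft` and `windows_to_tensor` are hypothetical, and the result is stored as a list of per-window slices rather than as a single array.

```python
import cmath
import math


def dft(samples, n_freq):
    """Return the first n_freq DFT coefficients of one real-valued window."""
    n = len(samples)
    return [
        sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(samples))
        for k in range(n_freq)
    ]


def windows_to_tensor(series, window_width, n_freq):
    """series: d lists (one per sensor dimension) of equal length.

    Returns T slices, each a d x (2 * n_freq) matrix whose rows are
    [mag_0, phase_0, mag_1, phase_1, ...] (interleaving is an assumption).
    """
    n_windows = len(series[0]) // window_width  # T; a trailing partial window is dropped
    tensor = []
    for w in range(n_windows):
        start = w * window_width
        window_slice = []
        for dim in series:
            row = []
            for coeff in dft(dim[start:start + window_width], n_freq):
                row.extend([abs(coeff), cmath.phase(coeff)])
            window_slice.append(row)
        tensor.append(window_slice)
    return tensor


# Toy 3-axis signal: 32 readings split into windows of 8 readings.
n, width = 32, 8
series = [
    [math.sin(2 * math.pi * t / width) for t in range(n)],  # x: one cycle per window
    [1.0] * n,                                               # y: constant
    [0.0] * n,                                               # z: silent
]
tensor = windows_to_tensor(series, window_width=width, n_freq=4)
print(len(tensor), len(tensor[0]), len(tensor[0][0]))  # 4 3 8
```

Keeping one slice per window mirrors the fact that the network consumes the tensor one window at a time.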
It’s handy for the implementation to have everything nicely wrapped up in a single tensor at this point, but actually we’re going to process it slice by slice along the window dimension (one window at a time). Each window slice is passed through a convolutional neural network component comprising three stages:
First we use 2D convolutional filters to capture interactions among dimensions, and in the local frequency domain. The output is then passed through 1D convolutional filter layers to capture high-level relationships. The output of the last filter layer is flattened to yield a sensor feature vector.
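The three stages can be sketched in a few lines. This is a simplified illustration under my own assumptions: one filter per layer instead of many, a 2D kernel that spans every sensor dimension, "valid" convolution with no padding, and hand-picked weights rather than learned ones. All function names are hypothetical.

```python
def relu(values):
    """Rectified Linear Unit applied element-wise."""
    return [max(0.0, v) for v in values]


def conv2d_full_height(matrix, kernel):
    """Slide a kernel that covers every row of `matrix` along the columns.

    Because the kernel spans all sensor dimensions, the output is one row.
    """
    if len(kernel) != len(matrix):
        raise ValueError("kernel must have one row per sensor dimension")
    width = len(kernel[0])
    n_out = len(matrix[0]) - width + 1
    return [
        sum(kernel[r][c] * matrix[r][j + c]
            for r in range(len(matrix)) for c in range(width))
        for j in range(n_out)
    ]


def conv1d(vector, kernel):
    """Plain 'valid' 1D convolution (cross-correlation, as in most DL libraries)."""
    n_out = len(vector) - len(kernel) + 1
    return [
        sum(kernel[c] * vector[j + c] for c in range(len(kernel)))
        for j in range(n_out)
    ]


def sensor_feature_vector(window_slice, kernel2d, kernel1d_a, kernel1d_b):
    """2D conv across dimensions and local frequencies, then two 1D convs."""
    h = relu(conv2d_full_height(window_slice, kernel2d))
    h = relu(conv1d(h, kernel1d_a))
    h = relu(conv1d(h, kernel1d_b))
    return h  # already flat, since this sketch uses a single filter per layer


# One window slice with 3 sensor dimensions and 8 frequency-domain values each.
window_slice = [[1.0] * 8 for _ in range(3)]
features = sensor_feature_vector(
    window_slice,
    kernel2d=[[1.0] * 3 for _ in range(3)],
    kernel1d_a=[0.5, 0.5],
    kernel1d_b=[1.0, 1.0, 1.0],
)
print(features)  # [27.0, 27.0, 27.0]
```

With many filters per layer, each stage would produce one such row per filter, and the flattening step would concatenate them.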
Next, the post moves on to combining the inputs from multiple sensors. A second convolutional neural network is used at this stage, with a Rectified Linear Unit (ReLU) as the activation function:
The sensor feature matrix is then fed through a second convolutional neural network component with the same structure as the one we just looked at. That is, a 2D convolutional filter layer followed by two 1D layers. Again, we take the output of the last filter layer and flatten it into a combined sensors feature vector. The window width is tacked onto the end of this vector.
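The data flow of this merge stage can be sketched as follows. This is only an illustration of how the pieces fit together: `combine_sensors` is a hypothetical name, and `merge_net` is a stand-in callable for the second convolutional component (here replaced by a simple column-wise mean, which is not what DeepSense learns).

```python
def combine_sensors(sensor_vectors, window_width, merge_net):
    """Stack per-sensor feature vectors, merge them, flatten, append the width.

    sensor_vectors: one feature vector per sensor, all of the same length.
    merge_net:      stand-in for the second convolutional component; it takes
                    the stacked matrix and returns a list of rows.
    """
    if len({len(v) for v in sensor_vectors}) != 1:
        raise ValueError("all sensor feature vectors must have the same length")
    matrix = [list(v) for v in sensor_vectors]        # one row per sensor
    merged = merge_net(matrix)
    flat = [value for row in merged for value in row]  # flatten the last layer's output
    return flat + [float(window_width)]                # window width tacked onto the end


def column_mean(matrix):
    """Toy replacement for the learned merge network (an assumption)."""
    return [[sum(col) / len(col) for col in zip(*matrix)]]


accelerometer = [1.0, 2.0, 3.0]
gyroscope = [3.0, 4.0, 5.0]
combined = combine_sensors([accelerometer, gyroscope], 0.25, column_mean)
print(combined)  # [2.0, 3.0, 4.0, 0.25]
```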
For each convolutional layer, DeepSense learns 64 filters, and uses ReLU as the activation function. In addition, batch normalization is applied at each layer to reduce internal covariate shift.
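For readers who have not met batch normalization before, here is a minimal sketch of the training-time computation for a single feature, followed by ReLU. The scale `gamma`, shift `beta` and `eps` values are illustrative defaults of my own, and the ordering (normalize, then activate) is one common choice rather than something the post specifies.

```python
import math


def batch_norm(batch, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize one feature across a mini-batch, then scale and shift it."""
    mean = sum(batch) / len(batch)
    var = sum((x - mean) ** 2 for x in batch) / len(batch)
    return [gamma * (x - mean) / math.sqrt(var + eps) + beta for x in batch]


def relu(values):
    """Rectified Linear Unit applied element-wise."""
    return [max(0.0, v) for v in values]


# The same filter output observed for four examples in a mini-batch.
activations = [2.0, 4.0, 6.0, 8.0]
normalised = batch_norm(activations)
print(relu(normalised))
```

After normalization the values have roughly zero mean and unit variance whatever the scale of the input, which is what keeps the distribution seen by the next layer stable during training.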
From here the need arises to learn inter-window relationships across time windows, and a Recurrent Neural Network (RNN) is the choice. In addition, the type of RNN isn’t the now usual LSTM (long short-term memory) unit; the design uses a Gated Recurrent Unit (GRU) instead:
So now we have combined sensor feature vectors, each learning intra-window interactions. But of course it’s also important to learn inter-window relationships across time windows. To do this the feature vectors are fed into an RNN.
At this point I think we’re ready for the big picture.
Instead of using LSTMs, the authors choose to use Gated Recurrent Units (GRUs) for the RNN layer.
… GRUs show similar performance to LSTMs on various tasks, while having a more concise expression, which reduces network complexity for mobile applications.
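To show why a GRU is the more concise unit (two gates and no separate cell state), here is a minimal single-layer sketch using the textbook GRU equations. The parameter names in the dictionary `p` are hypothetical, the weights are hand-picked rather than learned, and the update convention `h = (1 - z) * h + z * candidate` is one of the two conventions in common use; the paper's exact configuration is not reproduced here.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def affine(weights, x, recurrent, h, bias):
    """Compute W x + U h + b, one output row at a time."""
    return [
        sum(w * xi for w, xi in zip(w_row, x))
        + sum(u * hi for u, hi in zip(u_row, h))
        + b
        for w_row, u_row, b in zip(weights, recurrent, bias)
    ]


def gru_step(x, h, p):
    """One GRU update: update gate z, reset gate r, candidate state, blend."""
    z = [sigmoid(v) for v in affine(p["Wz"], x, p["Uz"], h, p["bz"])]
    r = [sigmoid(v) for v in affine(p["Wr"], x, p["Ur"], h, p["br"])]
    gated_h = [ri * hi for ri, hi in zip(r, h)]
    cand = [math.tanh(v) for v in affine(p["Wh"], x, p["Uh"], gated_h, p["bh"])]
    return [(1.0 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, cand)]


def run_gru(feature_vectors, hidden_size, p):
    """Feed one combined feature vector per time window through the GRU."""
    h = [0.0] * hidden_size
    outputs = []
    for x in feature_vectors:
        h = gru_step(x, h, p)
        outputs.append(h)
    return outputs


# Tiny example: 1-dimensional input, 1 hidden unit, two time windows.
p = {
    "Wz": [[0.0]], "Uz": [[0.0]], "bz": [0.0],
    "Wr": [[0.0]], "Ur": [[0.0]], "br": [0.0],
    "Wh": [[1.0]], "Uh": [[0.0]], "bh": [0.0],
}
outputs = run_gru([[1.0], [1.0]], hidden_size=1, p=p)
print(outputs)
```

An LSTM would need a third gate and a separate cell state carried alongside `h`, which is the extra complexity the authors chose to avoid on mobile hardware.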
To wrap up:
From the discussion/conclusion section of the paper:
We evaluated DeepSense via three representative mobile sensing tasks, where DeepSense outperformed state of the art baselines by significant margins while still claiming its mobile-feasibility through moderate energy consumption and low latency on both mobile and embedded platforms.
From The Morning Paper post:
The evaluation tasks focused mostly on motion sensors, but the approach can be applied to many other sensor types including microphone, Wi-Fi signal, Barometer, and light-sensors.
The second suggestion: sensors for IoT with a convolutional neural network framework
Whilst I was eagerly reading the reference list of the paper so brilliantly commented on by The Morning Paper, I came across another recent paper on implementing convolutional neural networks for applications on smartphones with embedded sensors. In this case it concerns wearable device-based gait recognition. I found it to be an open access paper published on the MDPI open access journals platform. The paper with its abstract:
Wearable Device-Based Gait Recognition Using Angle Embedded Gait Dynamic Images and a Convolutional Neural Network
The widespread installation of inertial sensors in smartphones and other wearable devices provides a valuable opportunity to identify people by analyzing their gait patterns, for either cooperative or non-cooperative circumstances. However, it is still a challenging task to reliably extract discriminative features for gait recognition with noisy and complex data sequences collected from casually worn wearable devices like smartphones. To cope with this problem, we propose a novel image-based gait recognition approach using the Convolutional Neural Network (CNN) without the need to manually extract discriminative features. The CNN’s input image, which is encoded straightforwardly from the inertial sensor data sequences, is called Angle Embedded Gait Dynamic Image (AE-GDI). AE-GDI is a new two-dimensional representation of gait dynamics, which is invariant to rotation and translation. The performance of the proposed approach in gait authentication and gait labeling is evaluated using two datasets: (1) the McGill University dataset, which is collected under realistic conditions; and (2) the Osaka University dataset with the largest number of subjects. Experimental results show that the proposed approach achieves competitive recognition accuracy over existing approaches and provides an effective parametric solution for identification among a large number of subjects by gait patterns.
This is a more technically involved paper and I will not fully review it here. But it is interesting reading and another confirmation of the feasibility of implementing convolutional neural networks within embedded systems in Internet of Things (IoT) settings.
Based on the gait cycle starting point candidate set, the initial grid defined by its initial starting index and spacing will be evolved toward them in two main stages until the grid indexes match the corresponding peaks, as shown in Algorithm 1. In the first stage, equally spaced grid optimization by iteration is performed (line 3 to line 11), aiming at estimating the starting index and spacing of the grid while the grid indexes do not have to correspond to the peaks. In the second stage, each grid index is locally optimized without the equally spaced constraint (line 13 to 16) to move to the nearest peak with higher score. The greedy searching functions in Algorithm 1 are explained as follows:
- GreedySearch 1: greedily search the neighborhood of a grid index, within a bounded range, and return the nearest starting position candidate with the highest score.
- GreedySearch 2: greedily search the neighborhood of a grid index, within a bounded range, and return the nearest starting position candidate with the highest score. In this step, the score function is modified by adding a distance penalty.
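The local search step can be sketched roughly as below. This is my own reading of the description, not the paper's Algorithm 1: the linear form of the distance penalty, the tie-breaking rule, and the names `greedy_search` and `refine_grid` are all assumptions made for illustration.

```python
def greedy_search(grid_index, candidates, radius, penalty=0.0):
    """Return the best-scoring candidate index near `grid_index`.

    candidates: dict mapping a sample index to its peak score.
    penalty:    assumed linear score penalty per sample of distance
                (0.0 disables it).
    Returns grid_index unchanged when no candidate lies within `radius`.
    """
    nearby = [c for c in candidates if abs(c - grid_index) <= radius]
    if not nearby:
        return grid_index
    # Highest (penalized) score wins; ties go to the nearest candidate.
    return max(
        nearby,
        key=lambda c: (candidates[c] - penalty * abs(c - grid_index),
                       -abs(c - grid_index)),
    )


def refine_grid(grid, candidates, radius, penalty=0.0):
    """Move every grid index independently, dropping the equal-spacing constraint."""
    return [greedy_search(g, candidates, radius, penalty) for g in grid]


# Candidate gait-cycle starting points (sample index -> score) and a rough grid.
candidates = {95: 0.9, 101: 0.8, 205: 0.7, 330: 0.95}
grid = [100, 200, 300]
print(refine_grid(grid, candidates, radius=10))                # [95, 205, 300]
print(refine_grid(grid, candidates, radius=10, penalty=0.05))  # [101, 205, 300]
```

The second call shows the effect of the penalty: the slightly weaker peak at 101 now beats the stronger but more distant peak at 95.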
This paper takes a more technical, computational-physics-intense approach than the DeepSense paper covered by The Morning Paper and commented on above. Below are the main concluding remarks (as always, the full reading is recommended):
In this paper, we proposed an effective parametric gait recognition approach using AE-GDI and CNN. AE-GDI is a novel 2D representation of gait patterns defined based on the linear transformation invariant feature of inertial sensor data sequence. To generate AE-GDIs, a grid-based greedy algorithm is also introduced to achieve robust gait cycle segmentation. The proposed approach is evaluated using the MCGILL dataset with long period inertial sensor data sequence and the OU-ISIR dataset with the largest number of subjects in two main applications of gait recognition respectively: gait authentication and gait labeling. Experiment results show our method is competitive against state of the art approaches or the designed baselines and outperforms them in recognition accuracy. Unlike non-parametric classification, samples in the training set do not have to be stored, and hand-craft selection and extraction of features are not needed with the proposed approach. In addition, the proposed AE-GDI representation allows for some image-based mechanisms like CNN to be applied directly. We believe the proposed approach moves a step toward accurate and reliable wearable-based gait recognition.
Although AE-GDI is orientation and translation invariant for inertial sensor data time series, it is still sensitive to sensor placement. In our future work, we will carry out more experiments for testing and evaluation of our approach on more practical applications to investigate how to improve recognition accuracy based on noisy and complex data from casually and loosely installed sensors. As the convolutional recurrent neural network has proven powerful for both automatic feature extraction and handling temporal correlation, we will investigate how it can be integrated with the AE-GDIs.