Talking Hands

We investigate the utility of mobile accelerometer data for identifying human hand gestures and recognizing sign language using Recurrent Neural Networks (RNNs). A set of unsupervised features is learned with Restricted Boltzmann Machines (RBMs) to recognize phrases from American Sign Language (ASL). We validate the efficacy of our method by comparing it with the best-performing supervised feature set; the proposed unsupervised features outperform the traditional handcrafted features. We use a labelled dataset of 600 accelerometer readings collected from 50 users to validate our approach.

The recent surge in the popularity of wearable devices is changing the way we communicate with our surroundings. From healthcare devices to smart assistants, sensing devices have seamlessly permeated our daily lives. The majority of these devices are specific to physical-activity tracking: they track and record steps, sleep patterns, calories burned, etc. to assist people. These devices record human activities using various on-board sensors such as the gyroscope, GPS, IMU, and accelerometer. In our proposed work, we explore the use of the accelerometer for fine-grained activity recognition, specifically sign language recognition.

A variety of algorithms have been applied to accelerometer data harnessed from mobile devices and other wearables to perform human activity recognition. Setups with multiple accelerometer sensors attached to various body parts achieve higher prediction accuracy, but they often fall short in 'out of lab' experiments because of the discomfort of wearing multiple devices. The widespread availability of accelerometers in wearables and mobile devices sets them apart from other sensors; their compact size, low cost, and low power requirements make them most relevant to our work. Motivated by such works, we propose the use of the accelerometer for 'phrase recognition'. Our solution can come in handy in a use case where a deaf person responds to someone who does not know ASL, or wishes to interact in a foreign country. To the best of our knowledge, mobile accelerometer data has not been used for ASL recognition in the past. Prior work has applied vision- and signal-processing-based algorithms to recognize hand gestures for interaction, typically dealing with the recognition of simple 'words' and 'gestures'. We instead use the accelerometer sensor data to generate a description of the actions performed by a person wearing the sensor.
We use RBMs to learn an unsupervised feature representation of the accelerometer data. Accelerometer-specific features are learnt by finding a compact, low-dimensional representation with RBMs. We use a stacked variant of the RBM, the Deep Belief Network (DBN), to learn a hierarchical organization of explanatory factors in the data (first-level training). DBNs are theoretically more expressive, and, being compositions of simple RBM networks, they yield better representative features. The hierarchical representation is learned by stacking RBMs with a compressive architecture. The RBM, being an unsupervised feature learner with an expressive representation, is well suited to our needs.
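The stacking of RBMs with a compressive architecture can be sketched with scikit-learn's `BernoulliRBM`. This is a minimal illustration, not the paper's implementation: the window length, layer widths, and hyperparameters below are assumptions chosen for demonstration, and random data stands in for real accelerometer windows.

```python
import numpy as np
from sklearn.neural_network import BernoulliRBM

rng = np.random.default_rng(0)

# Synthetic stand-in for windowed 3-axis accelerometer readings:
# 600 windows of 50 samples x 3 axes, flattened to 150-dim vectors.
X = rng.random((600, 150))

# Scale inputs to [0, 1], as BernoulliRBM expects.
X = (X - X.min()) / (X.max() - X.min())

# First-level RBM compresses 150 -> 64 hidden units.
rbm1 = BernoulliRBM(n_components=64, learning_rate=0.05,
                    n_iter=10, random_state=0)
H1 = rbm1.fit_transform(X)

# Second-level RBM is stacked on the first level's hidden
# activations, compressing further: 64 -> 32.
rbm2 = BernoulliRBM(n_components=32, learning_rate=0.05,
                    n_iter=10, random_state=0)
H2 = rbm2.fit_transform(H1)

print(H2.shape)  # (600, 32): compact unsupervised features
```

The resulting 32-dimensional activations play the role of the unsupervised feature set, which can then be fed to a downstream classifier such as an RNN for phrase recognition.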

The user wears the device on the wrist and performs the desired action; the solution identifies the phrase linked to that action.