Long short-term memory (LSTM) is a relatively recent technique in artificial neural networks, yet it has already become a fundamental component of new products from major technology companies such as Google, Apple, and Baidu.
This paper review covers the application of LSTM to two distinct fields that nonetheless share the trait that makes LSTM suitable: very long time lags of unknown size between important events. The authors call their technique contextual text classification and apply it to customer support (business) and to the analysis of voter opinion during elections (political studies).
In this work, we apply word embeddings and neural networks with Long Short-Term Memory (LSTM) to text classification problems, where the classification criteria are decided by the context of the application. We examine two applications in particular. The first is that of Actionability, where we build models to classify social media messages from customers of service providers as Actionable or Non-Actionable. We build models for over 30 different languages for actionability, and most of the models achieve accuracy around 85%, with some reaching over 90% accuracy. We also show that using LSTM neural networks with word embeddings vastly outperform traditional techniques. Second, we explore classification of messages with respect to political leaning, where social media messages are classified as Democratic or Republican. The model is able to classify messages with a high accuracy of 87.57%. As part of our experiments, we vary different hyperparameters of the neural networks, and report the effect of such variation on the accuracy. These actionability models have been deployed to production and help company agents provide customer support by prioritizing which messages to respond to. The model for political leaning has been opened and made available for wider use.
As is rightly mentioned in the opening paragraphs, the benefits of applying these models and techniques are faster response times (reduced latency) in the case of customer support, and better extraction of opinion from long sequences of text (voter opinion in the context of electoral political studies):
Public interactions on social networks provide an ideal platform for customers to interact with brands, companies and service providers. According to Okeleke, 50% of users prefer reaching service providers on social media over contacting a call center. A user may seek customer support from a service provider on social media with a complaint or a call for assistance. In most cases where a customer may reach out in this manner, the message with the complaint expresses a negative sentiment. A service provider seeking to resolve such customer complaints may find little value in identifying that the sentiment of such messages is negative.
Instead, a better contextual classification is to classify the messages as Actionable or Non-Actionable. For such conversations, social media messages that include a clear call to action, or raise a specific issue can be categorized as Actionable. Alternatively, agents of the service provider may not be able to respond to Non-Actionable messages that are too broad, general or not related to any specific issue. The service providers could then prioritize responding to Actionable messages over non-actionable ones, saving money and resources in the process. The ability to sift through interactions and classify them as actionable can help reduce company response latency and improve efficiency, thereby leading to better customer service. A good solution to identify actionable messages can lead to large cost savings for companies by reducing call center load.
Another example where such contextual classification becomes important, is with respect to voter opinions during elections. During the election year in the United States, people on social media platforms such as Twitter may talk about various political issues, some leaning left with Democratic views and some leaning right with Republican views. While expressing such views, a particular individual may speak positively about certain issues and negatively about others. In this scenario, merely understanding the sentiment of messages is not sufficient to provide enough insight into voter opinions. Instead messages classified as Democratic or Republican may provide greater insight into the political preferences of different groups of people, and such classification may prove to be more valuable to a political candidate trying to understand his or her voters.
The main contribution of this work is described below, with a well-structured explanation of the contexts in which the kind of neural networks the authors work with is thought to be of most utility (as indeed supported by plenty of other studies cited in the paper):
In recent studies, neural networks have shown great promise in performing sentiment and text classification tasks. Further, word embeddings have proven to be useful semantic feature extractors, and long short-term memory (LSTM) networks have been shown to be quite effective in tasks involving text sequences. Here we apply word embeddings and LSTM networks to the problem of contextual text classification, and present experiments and results for the two applications described above.
Our contributions in this study are as follows:
- Actionability: We build models to classify messages as actionable or non-actionable, to help company agents provide customer support by prioritizing which messages to respond to. These models have been deployed to production.
- Political Leaning: We build a second set of models to predict political leaning from social media messages, and classify them as Democrat or Republican. We open this model and make it available for wider use.
- RNN experiments: We vary different hyperparameters of the RNN models used for classifying text in the applications above, and provide a report of the effect of each hyperparameter on the accuracy.
- Multi-Lingual models: We build models for over 30 different languages for actionability using the LSTM neural networks, and also compare these results with traditional machine learning techniques.
Related Work and Methodology
The particular kind of neural network featured in this work is the recurrent neural network (RNN), of which the LSTM is one variant; RNNs have proven remarkably successful at text classification tasks that require sequential learning. Below I will sketch the motivation, the related research in this area, and the methodology employed in the paper:
Deep learning techniques have shown remarkable success in text processing problems. Several authors have applied Convolutional Neural Networks (CNNs) to a variety of NLP and text processing tasks with great results. Recurrent Neural Networks (RNNs) have also been shown to be very effective for text classification, and LSTM networks in particular have been shown to perform well for sequence-based learning tasks. Furthermore, word embeddings such as those in word2vec or GloVe map words and phrases to a lower-dimensional space, and serve as semantic feature extractors that can be effectively used for training. Given the inherently sequential nature of messages and sentences, we employ LSTM units in our neural networks in this work, with word embeddings for semantic feature extraction.
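To make the idea of embeddings as semantic feature extractors concrete, here is a toy sketch in plain Python. The three-dimensional vectors below are invented for illustration only; real word2vec or GloVe embeddings have hundreds of dimensions learned from large corpora, but the principle is the same: semantically related words end up close together in the vector space.

```python
import math

# Toy 3-dimensional "embeddings" -- invented for illustration only;
# real word2vec/GloVe vectors are learned from large text corpora.
embeddings = {
    "refund": [0.9, 0.1, 0.0],
    "charge": [0.8, 0.2, 0.1],
    "sunset": [0.0, 0.2, 0.9],
}

def cosine(u, v):
    """Cosine similarity: close to 1.0 for similar directions, 0.0 for orthogonal."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

# Words with related meanings sit closer in the embedding space:
print(cosine(embeddings["refund"], embeddings["charge"]))  # high similarity
print(cosine(embeddings["refund"], embeddings["sunset"]))  # low similarity
```

In the paper's setup these vectors are produced by an embedding layer and fed, one word at a time, into the LSTM layer.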
Here is a nice pair of paragraphs outlining the work in this paper from the perspective of motivation and goals:
The first problem we consider is that of actionability of messages with respect to customer support on social media platforms. Munro studied this problem of actionability in the context of disaster relief, where subword and spatiotemporal models were used to classify short-form messages. Jin et al. used KNN and SVM techniques to identify customer complaints on a dataset containing 5,500 messages from an online Chinese hotel community. Although the study showed promising results, the dataset was limited and constrained to a very specific domain. Our work here builds on our previous work, where manually extracted features and traditional supervised learning techniques were applied to this problem.
Another problem we consider is that of identifying the political leaning of messages, by analyzing social media posts by users of such platforms. A similar study was carried out on datasets derived from debate transcripts and articles by authors with well-known political leanings. Here we instead analyze this problem for a dataset derived from social media messages, which are far noisier in nature, and include colloquial language, abbreviations and slang. As noted in prior work, for politics in the United States, a voter from either party may have a mixture of conservative or liberal views. While that work classified documents as liberal or conservative, here we classify messages along party lines, as Democratic or Republican.
The methodology for this work had to be carried out in ways that overcome known shortcomings of RNNs, such as the problem of exploding and vanishing gradients:
Long Short Term Memory (LSTM) units explicitly avoid this problem by regulating the information in a cell state using input, output and forget gates. Such long-term dependencies become an important consideration when learning to classify messages. The neural network used here employs word embeddings with LSTM units to perform contextual text classification. We feed the pre-processed input to the neural network, and use the labels to perform supervised learning on the messages. Figure 1 shows these layers pictorially:
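To make the gating mechanism concrete, here is a minimal single-unit LSTM step in plain Python. The weights are arbitrary placeholders, not the paper's trained parameters; the point is only to show how the forget gate decides how much of the previous cell state survives, which is what lets information (and gradients) flow across long time lags:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_step(x, h_prev, c_prev, w):
    """One LSTM time step for a single scalar unit.

    w maps each gate to a (input-weight, recurrent-weight, bias) triple;
    the values used below are placeholders, not learned parameters.
    """
    f = sigmoid(w["f"][0] * x + w["f"][1] * h_prev + w["f"][2])    # forget gate
    i = sigmoid(w["i"][0] * x + w["i"][1] * h_prev + w["i"][2])    # input gate
    o = sigmoid(w["o"][0] * x + w["o"][1] * h_prev + w["o"][2])    # output gate
    g = math.tanh(w["g"][0] * x + w["g"][1] * h_prev + w["g"][2])  # candidate

    c = f * c_prev + i * g  # cell state: gated mix of old state and new input
    h = o * math.tanh(c)    # hidden state exposed to the next layer
    return h, c

# Placeholder weights; run a short input sequence through the cell.
w = {k: (0.5, 0.5, 0.0) for k in ("f", "i", "o", "g")}
h, c = 0.0, 0.0
for x in [1.0, -0.5, 0.25]:
    h, c = lstm_step(x, h, c, w)
```

Because the cell state is updated additively (`f * c_prev + i * g`) rather than repeatedly squashed through a nonlinearity, its gradient does not shrink multiplicatively at every step, which is why LSTMs sidestep the vanishing-gradient problem of plain RNNs.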
Following is the training and evaluation of the model, accompanied by a set of graphs detailing accuracy as a function of the number of LSTM and embedding-layer units:
For training the model, we use a similar architecture as the one used for the actionability problem above, with 128 units each in the embedding and LSTM layers. The model achieves an 88.82% accuracy on the training set, and an 87.57% accuracy on the test set. We examine the effects of changing various parameters on the accuracy of the model.
- Embedding Layer Units: For a fixed number of units in the LSTM layer (64), we observe that changing the number of units in the Embedding layer between 64 and 256 units shows very little change in accuracy. When the units in the Embedding layer are lower (32) or higher (512), the accuracy is also higher, but if the number of units is too low (16) the accuracy drops. Figure 2a shows this graphically.
- LSTM Layer Units: When the Embedding layer is held fixed at 128 units, the highest accuracy is seen when the LSTM layer has 32 units. Increasing the number of units beyond 32 lowers the accuracy, which then remains more or less constant. Figure 2b shows the change in accuracy as a function of the LSTM layer units.
- Embedding and LSTM Layer Units: When the Embedding and LSTM layers have the same number of units, larger sizes show higher accuracies, but the difference is nevertheless small. Figure 2c shows a distinct increasing trend in accuracy as the number of units is increased.
- Optimizers: Among the tested optimizers, the 'Adam' optimizer performs the best with an accuracy of 87.57%, followed by 'Adagrad' with 87.12% and 'RMSprop' with 87.06%.
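The sweeps above can be organized as a simple grid over the three hyperparameters the authors vary. The sketch below is a schematic driver, with `train_and_eval` a hypothetical stand-in for a full training run of the embedding + LSTM model (which in the paper takes real data and produces the accuracies reported in Figure 2):

```python
from itertools import product  # available for full cross-products if needed

# Hypothetical stand-in for training the embedding+LSTM model and
# returning test accuracy; in the paper this is a full training run.
def train_and_eval(embedding_units, lstm_units, optimizer):
    return 0.0  # placeholder accuracy

# Grids mirroring the sweeps reported in the paper.
embedding_grid = [16, 32, 64, 128, 256, 512]
lstm_grid = [32, 64, 128, 256]
optimizers = ["adam", "adagrad", "rmsprop"]

results = {}
# Sweep embedding units with the LSTM layer fixed at 64 units (Figure 2a):
for e in embedding_grid:
    results[("embedding", e)] = train_and_eval(e, 64, "adam")
# Sweep LSTM units with the Embedding layer fixed at 128 units (Figure 2b):
for l in lstm_grid:
    results[("lstm", l)] = train_and_eval(128, l, "adam")
# Compare optimizers at the base 128/128 configuration:
for opt in optimizers:
    results[("optimizer", opt)] = train_and_eval(128, 128, opt)
```

Varying one hyperparameter at a time, as here, is cheaper than a full cross-product and matches how the paper reports its results, though it can miss interactions between parameters.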
The authors conclude the paper with the somewhat surprising claim that sentiment analysis alone is of quite limited value for accurately classifying customer-support messages on social media and for determining the political leaning of social media messages or posts:
In this study, we explore two useful applications of text classification that go beyond sentiment analysis. For the purpose of such classification, we employ neural networks with word embedding and LSTM layers, and examine various parameters that affect the model.
The first application we consider is that in the context of customer support on social media, where company agents respond to customer complaints. For this problem, we classify social media messages from customers as actionable or non-actionable, so that company agents can prioritize which messages to respond to. We break down this problem and solve it for each language separately, and find that the models perform very well for most languages, with over 90% accuracy for some of them. We also find that the LSTM networks outperform traditional learning techniques by a large margin. Further, a relatively small vocabulary size of 20,000 and a training set of size 330,000 are able to capture the features required to predict actionability.
The second application we investigate is that of political leaning, where we classify messages as Democratic or Republican. We try variations of different parameters of the neural networks and observe their effects on the accuracy. Overall we see an accuracy of around 87% in the best performing models for political leaning, and provide several illustrative examples of their effectiveness.
In the case of actionability most customer complaints carry a negative sentiment, whereas for political leaning both sides display a mixture of positive and negative sentiments. As such, sentiment proves to be less valuable for such applications, since it is not a strong indicator of actionability or political leaning. We therefore make the case here that going beyond sentiment analysis to classify text based on other contextual criteria can lead to large benefits in various applications.
Featured Image: Wikipedia page of LSTM