PACIS 2018 Proceedings

Short Text Classification Research Based on TW-CNN

Kefeng Pei, Nanjing University of Aeronautics and Astronautics
Yongzhou Chen, Nanjing University of Aeronautics and Astronautics
Jing Ma, Nanjing University of Aeronautics and Astronautics
Weimin Nie, Nanjing University of Aeronautics and Astronautics

Abstract

Short texts are characterized by short length and sparse features, so existing methods are less effective at classifying them. Motivated by this, this paper extracts features at both the “topic” and “word” levels and proposes a convolutional neural network (CNN) based on topic and word, named TW-CNN. It uses Latent Dirichlet Allocation (LDA), a topic model, and word2vec to obtain two distinct word vector matrices, which serve as the inputs of two separate CNNs. After convolution and pooling, the two CNNs produce two different vector representations of the text. These representations are concatenated with the text-topic vector obtained by LDA to form the final representation vector of the text, which is then classified with a softmax layer. Experiments on short news texts show that the TW-CNN model improves on traditional CNNs.
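The pipeline described in the abstract can be sketched as a single forward pass. This is a minimal NumPy sketch under stated assumptions, not the authors' implementation: random arrays stand in for trained word2vec embeddings and LDA outputs; each word's topic-based vector is assumed to be its column of the LDA topic-word matrix; one filter height per CNN, ReLU, and max-over-time pooling are assumed; and all weights are untrained. The names `conv_maxpool`, `tw_cnn_forward`, and the `params` keys are hypothetical.

```python
import numpy as np

def conv_maxpool(X, filters, bias):
    """Convolve filters over consecutive word windows of X, then max-pool.

    X:       (n_words, dim) word-vector matrix for one text
    filters: (n_filters, h, dim); each filter spans h consecutive words
    bias:    (n_filters,)
    Returns a (n_filters,) vector: the max ReLU activation of each filter.
    """
    n_words = X.shape[0]
    n_filters, h, _ = filters.shape
    n_windows = n_words - h + 1
    if n_windows < 1:
        raise ValueError("text is shorter than the filter height")
    fmap = np.empty((n_filters, n_windows))
    for i in range(n_windows):
        window = X[i:i + h]  # (h, dim) slice of h consecutive words
        fmap[:, i] = np.tensordot(filters, window, axes=([1, 2], [0, 1])) + bias
    fmap = np.maximum(fmap, 0.0)  # ReLU non-linearity
    return fmap.max(axis=1)       # max-over-time pooling

def softmax(z):
    e = np.exp(z - z.max())  # shift by the max for numerical stability
    return e / e.sum()

def tw_cnn_forward(word_ids, w2v, topic_word, doc_topic, p):
    """Forward pass of a TW-CNN-style classifier for one short text (sketch)."""
    X_word = w2v[word_ids]            # (n, d_w) word2vec input matrix
    X_topic = topic_word.T[word_ids]  # (n, K) assumed LDA topic-based word vectors
    f_word = conv_maxpool(X_word, p["Fw"], p["bw"])
    f_topic = conv_maxpool(X_topic, p["Ft"], p["bt"])
    # final text representation: both CNN outputs plus the LDA text-topic vector
    z = np.concatenate([f_word, f_topic, doc_topic])
    return softmax(p["W"] @ z + p["b"])

# --- toy setup with random stand-ins (hypothetical sizes, untrained weights) ---
rng = np.random.default_rng(0)
V, d_w, K, n_classes, n_filters, h = 50, 8, 5, 3, 4, 3
w2v = rng.normal(size=(V, d_w))                 # stand-in for trained word2vec
topic_word = rng.dirichlet(np.ones(V), size=K)  # stand-in LDA topic-word matrix, (K, V)
doc_topic = rng.dirichlet(np.ones(K))           # stand-in LDA text-topic vector
params = {
    "Fw": rng.normal(scale=0.1, size=(n_filters, h, d_w)), "bw": np.zeros(n_filters),
    "Ft": rng.normal(scale=0.1, size=(n_filters, h, K)), "bt": np.zeros(n_filters),
    "W": rng.normal(scale=0.1, size=(n_classes, 2 * n_filters + K)),
    "b": np.zeros(n_classes),
}
word_ids = np.array([3, 17, 42, 8, 25, 11])     # a six-word "short text"
probs = tw_cnn_forward(word_ids, w2v, topic_word, doc_topic, params)
print(probs.round(3), probs.argmax())
```

In a real setting the stand-ins would be replaced by embeddings and topic distributions trained on the corpus, and the filter and output weights would be learned, for example by minimizing cross-entropy over labeled texts.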

Recommended Citation

Pei, Kefeng; Chen, Yongzhou; Ma, Jing; and Nie, Weimin, "Short Text Classification Research Based on TW-CNN" (2018). PACIS 2018 Proceedings. 41.
https://aisel.aisnet.org/pacis2018/41
