Abstract
In the context of TV and social media surveillance, constructing models to automate topic identification of short texts is a key task. This paper formalizes topic classification as a top-K multinomial classification problem and constructs models worth considering for practical use. We describe the full data processing pipeline, covering dataset selection, text preprocessing, feature extraction, and model selection and learning, including hyperparameter optimization. When computing time and resources are limited, we show that a classical model such as an SVM performs as well as an advanced deep neural network, but with a shorter training time.
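To make the top-K framing concrete, the following is a minimal sketch of how a top-K evaluation could be computed. It assumes that a prediction counts as correct when the true topic is among the K highest-scoring classes. That reading, the function names `top_k_labels` and `top_k_accuracy`, and the toy scores are illustrative assumptions, not taken from the paper.

```python
from typing import List, Sequence


def top_k_labels(scores: Sequence[float], k: int) -> List[int]:
    """Return the indices of the k highest-scoring classes, best first."""
    if k <= 0:
        raise ValueError("k must be positive")
    # Sort class indices by descending score; ties keep the lower index first.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:k]


def top_k_accuracy(
    score_rows: Sequence[Sequence[float]],
    true_labels: Sequence[int],
    k: int,
) -> float:
    """Fraction of texts whose true topic is among the k best-scored topics.

    Assumption (not from the paper): each row holds one score per topic,
    and a higher score means the model considers that topic more likely.
    """
    if len(score_rows) != len(true_labels):
        raise ValueError("score_rows and true_labels must have equal length")
    if not score_rows:
        raise ValueError("at least one example is required")
    hits = sum(
        1
        for scores, label in zip(score_rows, true_labels)
        if label in top_k_labels(scores, k)
    )
    return hits / len(score_rows)


# Hypothetical scores for three short texts over three topics.
toy_scores = [
    [0.1, 0.7, 0.2],
    [0.5, 0.3, 0.2],
    [0.2, 0.3, 0.5],
]
toy_labels = [1, 1, 0]

top1 = top_k_accuracy(toy_scores, toy_labels, k=1)
top2 = top_k_accuracy(toy_scores, toy_labels, k=2)
```

On this toy input, only the first text is correct at K = 1, while the second also becomes correct at K = 2, which illustrates why a top-K criterion is more forgiving for short, ambiguous texts than strict single-label accuracy.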
Paper Type
Full Paper
DOI
10.62036/ISD.2022.50
Topic Classification for Short Texts
Recommended Citation
Neagu, D. C., Rus, A. B., Grec, M., Boroianu, M. A., & Silaghi, G. C. (2022). Topic Classification for Short Texts. In R. A. Buchmann, G. C. Silaghi, D. Bufnea, V. Niculescu, G. Czibula, C. Barry, M. Lang, H. Linger, & C. Schneider (Eds.), Information Systems Development: Artificial Intelligence for Information Systems Development and Operations (ISD2022 Proceedings). Cluj-Napoca, Romania: Risoprint. ISBN: 978-973-53-2917-4. https://doi.org/10.62036/ISD.2022.50