Track 5: Data Science & Business Analytics

Topic Embeddings – A New Approach to Classify Very Short Documents Based on Predefined Topics

Lasse Lommel, Leuphana University Lüneburg, Institute of Information Systems, Lüneburg, Germany
Meike Riebling, Leuphana University Lüneburg, Institute of Information Systems, Lüneburg, Germany
Burkhardt Funk, Leuphana University Lüneburg, Institute of Information Systems, Lüneburg, Germany
Christian Junginger, Otto GmbH & Co KG, Data Science, Hamburg, Germany

Description

Traditional unsupervised topic modeling approaches such as Latent Dirichlet Allocation (LDA) cannot classify documents into a predefined set of topics. Supervised methods, on the other hand, require significant amounts of labeled data to perform well on such tasks. We develop a new unsupervised method based on word embeddings to classify documents into predefined topics. We evaluate the predictive performance of this novel approach and compare it to seeded LDA, using a real-world dataset from online advertising that consists of markedly short documents. Our results indicate that the two methods may complement one another well: ensemble learners trained on their outputs achieve high sensitivity and precision scores.
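The description does not spell out how the embedding-based classifier works, so the following is only a minimal sketch of one common way such an approach can be set up, not the authors' method. It assumes each predefined topic is described by a few seed words, represents a document as the mean of its word vectors, and assigns the topic whose seed centroid has the highest cosine similarity. The vectors in `EMBEDDINGS`, the seed lists in `TOPIC_SEEDS`, and the function names are all hypothetical and chosen for illustration; a real setup would load pretrained word embeddings instead.

```python
import numpy as np

# Toy 3-dimensional word vectors, invented for illustration only (assumption).
EMBEDDINGS = {
    "shoes":   np.array([0.9, 0.1, 0.0]),
    "sneaker": np.array([0.8, 0.2, 0.1]),
    "dress":   np.array([0.7, 0.3, 0.0]),
    "laptop":  np.array([0.1, 0.9, 0.1]),
    "tablet":  np.array([0.0, 0.8, 0.2]),
    "phone":   np.array([0.1, 0.9, 0.2]),
    "sofa":    np.array([0.1, 0.1, 0.9]),
    "lamp":    np.array([0.0, 0.2, 0.8]),
    "table":   np.array([0.2, 0.1, 0.9]),
}

# Hypothetical predefined topics, each described by a few seed words.
TOPIC_SEEDS = {
    "fashion": ["shoes", "dress"],
    "electronics": ["laptop", "phone"],
    "furniture": ["sofa", "table"],
}


def mean_vector(words):
    """Average the vectors of all known words; return None if none are known."""
    vectors = [EMBEDDINGS[w] for w in words if w in EMBEDDINGS]
    if not vectors:
        return None
    return np.mean(vectors, axis=0)


def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def classify(document, min_similarity=0.0):
    """Return the best-matching topic, or None if no word is known
    or the best similarity falls below min_similarity."""
    doc_vec = mean_vector(document.lower().split())
    if doc_vec is None:
        return None
    scores = {
        topic: cosine(doc_vec, mean_vector(seeds))
        for topic, seeds in TOPIC_SEEDS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] >= min_similarity else None


if __name__ == "__main__":
    for text in ["cheap sneaker sale", "tablet", "lamp", "xyz"]:
        print(text, "->", classify(text))
```

Very short documents often contain only one or two words covered by the vocabulary, which is why the sketch returns `None` instead of guessing when no word is known. The `min_similarity` threshold is likewise an assumption that lets a caller leave weak matches unclassified.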


Feb 28th, 8:00 AM
