Abstract

Texts within the same low-level category of the Chinese Library Classification are highly correlated and difficult to distinguish, which makes manual classification costly; the classification of such similar texts therefore deserves more attention from researchers. To address the difficulty of classifying highly similar texts in the lower-level classes of the Chinese Library Classification, this paper proposes a Bidirectional Encoder Representations from Transformers (BERT) model with Multi-Layers Dynamic Fusion based on Attention (BERT-MLDFA). The model dynamically integrates the representations from different BERT layers through a multi-level attention mechanism, which allows it to capture subtle semantic information in the text and thus better distinguish texts from similar categories. Its superiority is verified by comparison with baseline models such as Long Short-Term Memory (LSTM), Convolutional Neural Network (CNN), and BERT. Taking classes E271 and E712.51 of the Chinese Library Classification as the classification targets, the results show that baseline models such as LSTM, CNN, and BERT outperform traditional machine learning methods such as k-nearest neighbors (KNN), naive Bayes (NB), and support vector machines (SVM), and that the proposed BERT-MLDFA model performs best, reaching a Macro-F1 score of 0.983. The proposed model can therefore effectively improve classification efficiency, reduce misclassification, and substantially lower the cost of large-scale classification tasks.
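To make the layer-fusion idea concrete, the sketch below shows one plausible way to implement attention-based fusion over BERT layer outputs in PyTorch with the Hugging Face transformers library. It is only an illustration of the mechanism described above, not the authors' implementation: the class name, the single-linear-layer attention scorer over per-layer [CLS] vectors, and the use of bert-base-chinese are assumptions.

```python
# Illustrative sketch of attention-based fusion over BERT layer outputs,
# in the spirit of BERT-MLDFA. Names and design details are assumptions.
import torch
import torch.nn as nn
from transformers import BertModel, BertTokenizer


class BertLayerFusionClassifier(nn.Module):
    def __init__(self, pretrained="bert-base-chinese", num_classes=2):
        super().__init__()
        # Request hidden states from every encoder layer, not only the last one.
        self.bert = BertModel.from_pretrained(pretrained, output_hidden_states=True)
        hidden = self.bert.config.hidden_size
        # Scores each layer's [CLS] vector; softmax over layers gives dynamic,
        # input-dependent fusion weights.
        self.attn = nn.Linear(hidden, 1)
        self.classifier = nn.Linear(hidden, num_classes)

    def forward(self, input_ids, attention_mask):
        outputs = self.bert(input_ids=input_ids, attention_mask=attention_mask)
        # hidden_states: tuple of (batch, seq_len, hidden), one per layer
        # (embedding layer + each transformer layer).
        cls_per_layer = torch.stack(
            [h[:, 0, :] for h in outputs.hidden_states], dim=1
        )  # (batch, num_layers, hidden)
        scores = self.attn(cls_per_layer).squeeze(-1)         # (batch, num_layers)
        weights = torch.softmax(scores, dim=1).unsqueeze(-1)  # (batch, num_layers, 1)
        fused = (weights * cls_per_layer).sum(dim=1)          # (batch, hidden)
        return self.classifier(fused)


if __name__ == "__main__":
    tok = BertTokenizer.from_pretrained("bert-base-chinese")
    model = BertLayerFusionClassifier()
    batch = tok(["示例文本"], return_tensors="pt", padding=True, truncation=True)
    logits = model(batch["input_ids"], batch["attention_mask"])
    print(logits.shape)  # torch.Size([1, 2])
```

The key design choice this sketch illustrates is that the fusion weights are computed from the input itself rather than fixed, so shallow or deep layers can dominate depending on which level of semantics best separates two near-identical classes.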
