Abstract

Texts within the same low-level category of the Chinese Library Classification are highly correlated and difficult to distinguish, which makes manual classification costly; the classification of such similar texts therefore deserves more attention from researchers. To address the difficulty of classifying highly similar texts in the lower-level classes of the Chinese Library Classification, this paper proposes a Bidirectional Encoder Representations from Transformers (BERT) model with Multi-Layers Dynamic Fusion based on Attention (BERT-MLDFA). The model dynamically integrates the representations from different BERT layers through a multi-level attention mechanism, which allows it to capture subtle semantic information in the text and thus better distinguish texts from similar categories. Its superiority is verified by comparison with baseline models such as Long Short-Term Memory (LSTM), Convolutional Neural Network (CNN), and BERT. Taking classes E271 and E712.51 of the Chinese Library Classification as the classification targets, the results show that baseline models such as LSTM, CNN, and BERT outperform traditional machine learning methods such as k-nearest neighbors (KNN), naive Bayes (NB), and support vector machines (SVM), and that the proposed BERT-MLDFA model performs best, reaching a Macro-F1 score of 0.983. The proposed model can therefore effectively improve classification efficiency, reduce misclassification, and substantially lower the cost of large-scale classification tasks.
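To make the layer-fusion idea concrete, the sketch below shows one plausible way to implement attention-based fusion over BERT layer outputs in PyTorch with the Hugging Face transformers library. It is only an illustration of the mechanism described above, not the authors' implementation: the class name, the single-linear-layer attention scorer over per-layer [CLS] vectors, and the use of bert-base-chinese are assumptions.

```python
# Illustrative sketch of attention-based fusion over BERT layer outputs,
# in the spirit of BERT-MLDFA. Names and design details are assumptions.
import torch
import torch.nn as nn
from transformers import BertModel, BertTokenizer


class BertLayerFusionClassifier(nn.Module):
    def __init__(self, pretrained="bert-base-chinese", num_classes=2):
        super().__init__()
        # Request hidden states from every encoder layer, not only the last one.
        self.bert = BertModel.from_pretrained(pretrained, output_hidden_states=True)
        hidden = self.bert.config.hidden_size
        # Scores each layer's [CLS] vector; softmax over layers gives dynamic,
        # input-dependent fusion weights.
        self.attn = nn.Linear(hidden, 1)
        self.classifier = nn.Linear(hidden, num_classes)

    def forward(self, input_ids, attention_mask):
        outputs = self.bert(input_ids=input_ids, attention_mask=attention_mask)
        # hidden_states: tuple of (batch, seq_len, hidden), one per layer
        # (embedding layer + each transformer layer).
        cls_per_layer = torch.stack(
            [h[:, 0, :] for h in outputs.hidden_states], dim=1
        )  # (batch, num_layers, hidden)
        scores = self.attn(cls_per_layer).squeeze(-1)         # (batch, num_layers)
        weights = torch.softmax(scores, dim=1).unsqueeze(-1)  # (batch, num_layers, 1)
        fused = (weights * cls_per_layer).sum(dim=1)          # (batch, hidden)
        return self.classifier(fused)


if __name__ == "__main__":
    tok = BertTokenizer.from_pretrained("bert-base-chinese")
    model = BertLayerFusionClassifier()
    batch = tok(["示例文本"], return_tensors="pt", padding=True, truncation=True)
    logits = model(batch["input_ids"], batch["attention_mask"])
    print(logits.shape)  # torch.Size([1, 2])
```

The key design choice this sketch illustrates is that the fusion weights are computed from the input itself rather than fixed, so shallow or deep layers can dominate depending on which level of semantics best separates two near-identical classes.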
