Abstract

Product reviews are short texts that often contain nonstandard wording. To address these characteristics, this research explores a method for automatically categorizing product reviews by product category and analyzes why the method works. A core word set is constructed from the training set using TF-IDF and LDA, and the short texts are extended using Word2Vec similarity calculation. The extended product reviews are then categorized by product category with a BERT model. The method is compared with two alternatives: BERT classification without extension, and BERT classification after extension based on HowNet similarity calculation. To address the nonstandard wording, a corresponding experiment is designed to test the effectiveness of the proposed method. With BERT classification of the extended reviews, the F1 value obtained by the proposed method is 2.1 percent higher than without extension and 0.9 percent higher than with HowNet similarity calculation. The reasons for the method's effectiveness are analyzed in terms of its basic principles, the different word similarity calculation methods, and the way words are used. The proposed method can effectively improve the classification performance of product reviews when information is organized by product category.
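The extension step described above can be illustrated with a minimal sketch. It assumes that core words are chosen by summed TF-IDF alone (the LDA component is omitted), that a short review is extended by appending any core word whose cosine similarity to one of the review's tokens reaches a fixed threshold, and that the small hand-written vectors stand in for trained Word2Vec embeddings. The function names, the 0.8 threshold, and the toy data are hypothetical; the abstract does not specify them, and the BERT classification stage is not shown.

```python
import math
from collections import Counter


def tfidf_core_words(docs, top_k=3):
    """Return the top_k words by summed TF-IDF over tokenized training docs (hypothetical helper)."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))  # document frequency: count each word once per doc
    scores = Counter()
    for doc in docs:
        tf = Counter(doc)
        for word, count in tf.items():
            idf = math.log(n / df[word]) + 1.0  # +1 keeps words found in every doc from scoring zero
            scores[word] += (count / len(doc)) * idf
    return [word for word, _ in scores.most_common(top_k)]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def extend_short_text(tokens, core_words, vectors, threshold=0.8):
    """Append core words that are similar enough to some token of a short review (assumed rule)."""
    extra = []
    for core in core_words:
        if core in tokens or core not in vectors:
            continue  # already present, or no embedding available
        best = max(
            (cosine(vectors[t], vectors[core]) for t in tokens if t in vectors),
            default=0.0,
        )
        if best >= threshold:
            extra.append(core)
    return tokens + extra


# Toy embeddings standing in for a trained Word2Vec model (hypothetical values).
vectors = {
    "phone": [1.0, 0.1, 0.0],
    "mobile": [0.9, 0.2, 0.0],
    "battery": [0.8, 0.0, 0.3],
    "shirt": [0.0, 1.0, 0.1],
    "cotton": [0.1, 0.9, 0.0],
}
train_docs = [["phone", "battery", "phone"], ["mobile", "battery"], ["phone", "screen"]]

core = tfidf_core_words(train_docs, top_k=2)
phone_review = extend_short_text(["mobile", "great"], core, vectors)
shirt_review = extend_short_text(["cotton", "shirt"], core, vectors)
print(core, phone_review, shirt_review)
```

In this toy run the electronics review gains the core words "phone" and "battery", while the clothing review, whose tokens are not similar to either core word, is left unchanged. Swapping in a different similarity source (for example, a HowNet-based measure, as in the comparison method) would only change how `cosine` and `vectors` are supplied.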
