In this work, we studied Machine Learning algorithms combined with several word embedding vector representations of patent documents, and analyzed how well the resulting classifiers support automatic search and retrieval of information in the patent domain. The data were obtained from WIPO and filtered to keep the most discriminative documents, using a selection methodology based on the class centroids that reduced the data set by 78%. The classifiers were built with the HyperOpt automated learning tool, which was used to tune their hyperparameters. We compared eight classifiers, each combined with four distinct document vector representations. The best configuration reached 83.36% accuracy on the test set, which is competitive with other works that used the same data set and language.
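The abstract does not spell out the exact centroid-based selection criterion. One plausible reading, shown below purely as a minimal sketch, is to compute each class centroid from the document embeddings and keep, per class, only the documents most similar (by cosine similarity) to their own centroid. The function names, the cosine criterion, and the `keep_fraction=0.22` default (chosen only to mirror a 78% reduction) are all assumptions, not the authors' method.

```python
import math
from collections import defaultdict


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def centroid(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def select_by_centroid(vectors, labels, keep_fraction=0.22):
    """Hypothetical selection: per class, keep the documents closest to the class centroid.

    Returns the sorted indices of the retained documents. keep_fraction=0.22
    is an assumed value that keeps roughly 22% of each class (a ~78% reduction).
    """
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    kept = []
    for idxs in by_class.values():
        c = centroid([vectors[i] for i in idxs])
        # Rank this class's documents by similarity to its centroid, most similar first.
        ranked = sorted(idxs, key=lambda i: cosine(vectors[i], c), reverse=True)
        k = max(1, round(len(idxs) * keep_fraction))
        kept.extend(ranked[:k])
    return sorted(kept)
```

For example, with two toy classes of 2-D embeddings and `keep_fraction=0.5`, each class keeps its two documents nearest the centroid and drops the outlier:

```python
vectors = [[1.0, 0.0], [0.9, 0.1], [0.2, 0.8], [0.0, 1.0], [0.1, 0.9], [0.8, 0.2]]
labels = ["A", "A", "A", "B", "B", "B"]
select_by_centroid(vectors, labels, keep_fraction=0.5)  # indices 0, 1, 3, 4
```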
Farias, Henrique C.; Martins, Claudia A.; and Francisco, Rafaela S., "Algoritmos de Classificação e Representação Word Embedding em Dados de Patentes" [Classification Algorithms and Word Embedding Representation in Patent Data] (2021). ISLA 2021 Proceedings. 9.