ISLA 2021 Proceedings

Algoritmos de Classificação e Representação Word Embedding em Dados de Patentes [Classification Algorithms and Word Embedding Representation in Patent Data]

Henrique C. Farias, Universidade Federal de Mato Grosso
Claudia A. Martins, Universidade Federal de Mato Grosso
Rafaela S. Francisco, Universidade Federal de Mato Grosso

Abstract

This work studies Machine Learning algorithms combined with several word embedding vector representations of patent documents, analyzing classifier performance for automatic information search and retrieval in the patent domain. The data were obtained from WIPO and filtered to retain the most discriminative documents using a selection methodology based on class centroids, which reduced the data set by 78%. The classifiers were built with HyperOpt, which was used to tune their hyperparameters automatically. Eight classifiers were compared in combination with four distinct vector representations of the documents. The best combination achieved 83.36% accuracy on the test set, a result considered competitive with other works that used the same data set and language.

Recommended Citation

Farias, Henrique C.; Martins, Claudia A.; and Francisco, Rafaela S., "Algoritmos de Classificação e Representação Word Embedding em Dados de Patentes" (2021). ISLA 2021 Proceedings. 9.
https://aisel.aisnet.org/isla2021/9
