Abstract
Natural language processing (NLP), like other machine learning (ML), deep learning (DL), and data processing tasks, requires a large amount of data to be effective. Consequently, one of the most significant challenges facing ML/DL tasks, including NLP, is a lack of data. This is especially noticeable for text data in niche languages such as Polish. This paper presents the development of a mobile application that supports collecting and labelling data for natural language processing. The application was released publicly and tested by several users, who assisted in labelling the data. Using the collected data, a machine learning model for emotion classification was created. It predicts which of the six basic emotions (anger, fear, joy, love, surprise, or sadness) is expressed most strongly in a Polish text. The model is a feed-forward neural network built with TensorFlow and Keras. Experiments were carried out to evaluate the model's performance, and the results are discussed. Finally, directions for further development of the solution are proposed to increase its usability.
Recommended Citation
Gumińska, Urszula; Miłosz, Artur; Kowalski, Karol; and Poniszewska-Marańda, Aneta, "Process of collecting and labelling the data in natural language processing in Polish" (2023). Proceedings of the 2023 Pre-ICIS SIGDSA Symposium. 18.
https://aisel.aisnet.org/sigdsa2023/18