Abstract
Analyzing the annual reports of Thai public companies is challenging due to the volume and complexity of their unstructured text. To address this, we propose a Retrieval-Augmented Generation (RAG) framework that combines dense retrieval with large language models (LLMs) to support natural language querying over financial documents. The framework uses the GTE-Large embedding model to semantically encode document segments and retrieve the most relevant ones from a vector store; these segments then ground the Llama 3-8B model as it generates context-specific responses. A qualitative comparison between the RAG system and a naive LLM baseline shows that the RAG approach substantially reduces hallucinations and produces outputs that are more faithful to the source documents. These results underscore the importance of grounding language model outputs in external knowledge when handling domain-specific queries. The lightweight, modular framework shows practical potential for scalable applications in financial analytics and other domains that require accurate information extraction. Future work will incorporate advanced optimization techniques, apply quantitative evaluation metrics to benchmark performance, and explore real-world deployment strategies for interactive financial question-answering systems.
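The retrieve-then-generate pipeline described in the abstract can be illustrated with a minimal sketch. Note that the actual system uses GTE-Large embeddings, a vector store, and Llama 3-8B; the bag-of-words embedder and template "generator" below are stand-ins chosen only so the control flow is self-contained and runnable, and the sample segments are hypothetical.

```python
# Minimal retrieve-then-generate sketch (illustrative only).
# The paper's system uses GTE-Large embeddings and Llama 3-8B; here a toy
# bag-of-words embedder and a prompt-building "generator" stand in for them.
import math
from collections import Counter


def embed(text: str) -> Counter:
    # Stand-in for a dense embedding model: bag-of-words term counts.
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse term-count vectors.
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def retrieve(query: str, segments: list[str], k: int = 2) -> list[str]:
    # Rank document segments by similarity to the query; return the top k.
    q = embed(query)
    ranked = sorted(segments, key=lambda s: cosine(q, embed(s)), reverse=True)
    return ranked[:k]


def answer(query: str, segments: list[str]) -> str:
    # Stand-in for LLM generation: prepend retrieved context to the prompt,
    # so the model's response is grounded in the source documents.
    context = " | ".join(retrieve(query, segments))
    return f"Context: {context}\nQuestion: {query}"


# Hypothetical annual-report segments for illustration.
segments = [
    "Total revenue for fiscal year 2023 was 4.2 billion baht.",
    "The board approved a dividend of 0.5 baht per share.",
    "Employee headcount grew to 1,200 staff.",
]
print(answer("What was total revenue in 2023?", segments))
```

Grounding the prompt in the retrieved segments, rather than relying on the model's parametric knowledge alone, is what curbs hallucination in the full system.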
Recommended Citation
Deesiri, Burin and Chotisarn, Noptanit, "Enhancing Thai Annual Report Queries with Retrieval‑Augmented Generation" (2025). ICEB 2025 Proceedings (Hanoi, Vietnam). 51.
https://aisel.aisnet.org/iceb2025/51