Start Date

14-12-2012 12:00 AM

Description

In this paper, we propose a novel problem of summarizing textual corporate risk factor disclosure, which aims to simultaneously infer the risk types across corpus and assign each risk factor to its most probable risk type. To solve the problem, we develop a variation of LDA topic model called Sent-LDA. The variational EM learning algorithm, which guarantees fast convergence, is derived and implemented for our model. Experiments show that our model is much more efficient and effective than LDA for solving our proposed problem. Specifically, our model is 50 times faster than LDA in the same conditions, and generates better topics for summarization than LDA. Our model is visualized in a publicly available system.

Share

COinS
 
Dec 14th, 12:00 AM

Summarization of Corporate Risk Factor Disclosure through Topic Modeling

In this paper, we propose a novel problem of summarizing textual corporate risk factor disclosure, which aims to simultaneously infer the risk types across corpus and assign each risk factor to its most probable risk type. To solve the problem, we develop a variation of LDA topic model called Sent-LDA. The variational EM learning algorithm, which guarantees fast convergence, is derived and implemented for our model. Experiments show that our model is much more efficient and effective than LDA for solving our proposed problem. Specifically, our model is 50 times faster than LDA in the same conditions, and generates better topics for summarization than LDA. Our model is visualized in a publicly available system.