Extracting visual words to represent images is useful in many applications. Existing approaches usually use traditional methods such as SIFT to extract features and clustering algorithms such as k-means to extract visual words. However, this kind of approaches has some drawbacks such as focusing on the local information without considering high-level semantics of images and being hard to determine the number of clusters. Things can get even worse in medical images because medical images are usually very similar to each other. In this paper, we propose a new approach based on deep learning model to extract visual words, which are then used to train classification model and topic model from images. To show the effectiveness of our proposed methods, we conducted experiments on real retinal images. Experimental results show that our proposed methods can achieve better accuracy in glaucoma diagnosis and can find meaningful topics in topic modeling.