This paper aims to improve the accuracy of sales forecasting so that enterprises selling fashion products can make better inventory management and operational decisions. Deep neural networks are used to construct multimodal features that account for the internal structure of each modality (historical sales features, image features, and basic product attribute features), and a sales forecasting model for fashion products based on multimodal feature fusion is built on these features. Using actual enterprise data, the proposed model is compared with an exponential regression model and a shallow neural network model. The results show that multimodal features combined with deep learning representations outperform these traditional methods (exponential regression and shallow neural networks) in predicting fashion product sales. The findings can help enterprises use deep learning methods and multimodal data to make accurate sales forecasts.
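To make the idea of multimodal feature fusion concrete, the following is a minimal sketch and not the model used in this paper. It assumes each modality (historical sales, image features, basic attributes) is already a numeric vector, encodes each one with its own small layer, concatenates the encodings, and maps the fused vector to one sales forecast. All names (`dense`, `init_layer`, `encode_and_fuse`, `predict_sales`), the layer sizes, and the input values are hypothetical, and the weights are random and untrained.

```python
import math
import random


def dense(inputs, weights, biases):
    """One fully connected layer with tanh activation."""
    return [
        math.tanh(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def init_layer(n_in, n_out, rng):
    """Random, untrained weights and zero biases (illustrative only)."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def encode_and_fuse(sales_history, image_features, attributes, params):
    """Encode each modality separately, then concatenate the encodings."""
    h_sales = dense(sales_history, *params["sales"])
    h_image = dense(image_features, *params["image"])
    h_attr = dense(attributes, *params["attr"])
    return h_sales + h_image + h_attr  # list concatenation = feature fusion


def predict_sales(fused, head):
    """Linear output layer mapping the fused vector to one forecast value."""
    weights, bias = head
    return sum(w * x for w, x in zip(weights, fused)) + bias


rng = random.Random(0)
params = {
    "sales": init_layer(4, 3, rng),  # 4 past sales values -> 3 hidden units
    "image": init_layer(6, 3, rng),  # 6 image descriptors -> 3 hidden units
    "attr": init_layer(2, 3, rng),   # 2 attribute flags   -> 3 hidden units
}
head = ([rng.uniform(-0.5, 0.5) for _ in range(9)], 0.0)

# Hypothetical inputs for a single product.
fused = encode_and_fuse([0.2, 0.4, 0.3, 0.5], [0.1] * 6, [1.0, 0.0], params)
forecast = predict_sales(fused, head)
print(len(fused), forecast)
```

In practice the per-modality encoders would be deeper networks (for example, a sequence model for sales history and a convolutional network for images) trained jointly with the output layer; the sketch only shows how separately encoded modalities are joined into one feature vector.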