Existing work on document-level sentiment classification treats the review document as a single text unit and performs feature extraction with various sophisticated model architectures. In this paper, we draw inspiration from fine-grained sentiment analysis and propose to first learn the latent target-opinion distribution behind the documents, and then incorporate this fine-grained prior knowledge into the classification process. We model the latent target-opinion distribution as hierarchical variables, where a global-level variable captures the overall target and opinion, and local-level variables retrieve detailed opinion clues at the word level. The proposed method consists of two main parts: a variational module and a classification module. In the variational module, we employ a conditional variational autoencoder to reconstruct the document, a process into which user and product information can be integrated. In the classification module, we build a hierarchical model based on Transformer encoders, where the local-level and global-level prior distribution representations induced by the variational module are injected into the word-level and sentence-level Transformers, respectively. Experimental results on benchmark datasets show that the proposed method significantly outperforms strong baselines, achieving state-of-the-art performance. Further analysis shows that our model is capable of capturing latent fine-grained target and opinion prior information, which is highly effective for improving task performance.
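As a point of reference for the variational module, the generic training objective of a conditional variational autoencoder can be written as the evidence lower bound below. This is a sketch of the standard formulation, not necessarily the exact loss used in this work. The symbols are assumptions introduced here for illustration: x denotes the review document, c the conditioning information (for example, user and product), and z the latent variables, which in the hierarchical setting described above would comprise both the global-level and the local-level variables.

\[
\log p_\theta(x \mid c) \;\ge\; \mathbb{E}_{q_\phi(z \mid x, c)}\big[\log p_\theta(x \mid z, c)\big] \;-\; \mathrm{KL}\big(q_\phi(z \mid x, c)\,\|\,p_\theta(z \mid c)\big)
\]

The first term rewards accurate reconstruction of the document from the latent variables, while the KL term keeps the approximate posterior q_\phi(z \mid x, c) close to the conditional prior p_\theta(z \mid c).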