Recommendation models are usually trained on observational interaction data. However, observational interaction data could result from users’ conformity towards popular items, which entangles users’ real interest. Existing methods tracks this problem as eliminating popularity bias, e.g., by re-weighting training samples or leveraging a small fraction of unbiased data. However, the variety of user conformity is ignored by these approaches, and different causes of an interaction are bundled together as unified representations, hence robustness and interpretability are not guaranteed when underlying causes are changing. In this paper, we present DICE, a general framework that learns representations where interest and conformity are structurally disentangled, and various backbone recommendation models could be smoothly integrated. We assign user and items with separate embeddings for interest and conformity, and make each embedding capture only one cause by training with cause-specific data and imposing direct disentanglement supervision on the embedding distribution. Our proposed methodology outperforms state-of-the-art baselines with remarkable improvements on two real-world datasets on top of various backbone models. We further demonstrate that the learned embeddings successfully capture the desired causes, and show that DICE guarantees the robustness and interpretability of recommendation.