The evolution of mobile apps has greatly changed the way we live, and it has become increasingly important to understand and model users of mobile apps. Rather than focusing on a single specific app, it has become a popular paradigm to study user behavior across various mobile apps in a symbiotic environment. In this paper, we study the task of user representation learning with both macro and micro interaction data on mobile apps. Specifically, macro interactions refer to user-app interactions, while micro interactions refer to user-item interactions within a specific app. By combining the two kinds of user data, we expect to derive a more comprehensive and robust user representation model for mobile apps. To effectively fuse the information across the two views, we propose a novel macro-micro fusion network for user representation learning on mobile apps. With a Transformer architecture as the base model, we design a representation fusion component that captures category-based semantic alignment at the user level. After this semantic alignment, our approach adaptively fuses the information across the two views. Furthermore, we adopt mutual information maximization to derive a self-supervised loss that enhances the learning of our fusion network. Extensive experiments on three downstream tasks over two real-world datasets demonstrate the effectiveness of our approach.
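The abstract does not spell out the mutual-information objective, but a common way to maximize a lower bound on the mutual information between two view representations is an InfoNCE-style contrastive loss, where matching macro/micro representations of the same user form positive pairs and other users in the batch serve as negatives. The following NumPy sketch is only an illustration of that general technique; the function name `info_nce_loss`, the temperature value, and the in-batch negative sampling scheme are assumptions, not the paper's actual formulation.

```python
import numpy as np

def info_nce_loss(macro, micro, temperature=0.1):
    """InfoNCE-style estimator of a mutual-information lower bound.

    macro, micro: (batch, dim) L2-normalized user representations from the
    macro (user-app) and micro (user-item) views. Row i of each matrix is
    assumed to describe the same user, so matching rows are positive pairs
    and all other rows in the batch act as negatives. (Illustrative only.)
    """
    # Cosine similarities between every macro/micro pair, scaled by temperature.
    logits = macro @ micro.T / temperature          # shape (batch, batch)
    # Row-wise log-softmax; subtract the max first for numerical stability.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Minimizing cross-entropy on the diagonal (positive pairs) maximizes
    # the InfoNCE lower bound on mutual information between the two views.
    return -np.mean(np.diag(log_prob))

# Toy example: 4 users with 8-dimensional representations per view.
rng = np.random.default_rng(0)
macro = rng.normal(size=(4, 8))
micro = macro + 0.05 * rng.normal(size=(4, 8))      # well-aligned views
macro /= np.linalg.norm(macro, axis=1, keepdims=True)
micro /= np.linalg.norm(micro, axis=1, keepdims=True)
print(info_nce_loss(macro, micro))                  # small: views agree
print(info_nce_loss(macro, micro[::-1]))            # larger: pairs mismatched
```

Under this kind of objective, the loss falls as the two views of the same user become easier to match against in-batch alternatives, which is one way a self-supervised signal can encourage the fused representation to stay consistent across views.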