Multivariate Hawkes Processes (MHPs) are an important class of temporal point processes that have enabled key advances in understanding and predicting social information systems. However, due to their complex modeling of temporal dependencies, MHPs have proven to be notoriously difficult to scale, what has limited their applications to relatively small domains. In this work, we propose a novel model and computational approach to overcome this important limitation. By exploiting a characteristic sparsity pattern in real-world diffusion processes, we show that our approach allows to compute the exact likelihood and gradients of an MHP — independently of the ambient dimensions of the underlying network. We show on synthetic and real-world datasets that our model does not only achieve state-of-the-art predictive results, but also improves runtime performance by multiple orders of magnitude compared to standard methods on sparse event sequences. In combination with easily interpretable latent variables and influence structures, this allows us to analyze diffusion processes at previously unattainable scale.