Script event prediction (SEP) aims to choose a correct subsequent event from a candidate list, given a chain of ordered context events. Event representation learning has been proposed and successfully applied to this task. Most previous methods learning representations mainly focus on coarse-grained connections at event or chain level, while ignoring more fine-grained connections between events. Here we propose a novel framework which can enhance the representation learning of events by mining their connections at multiple granularity levels, including argument level, event level and chain level. In our method, we first employ a masked self-attention mechanism to model the relations between the components of events (i.e. arguments). Then, a directed graph convolutional network is further utilized to model the temporal or causal relations between events in the chain. Finally, we introduce an attention module to the context event chain, so as to dynamically aggregate context events with respect to the current candidate event. By fusing threefold connections in a unified framework, our approach can learn more accurate argument/event/chain representations, and thus leads to better prediction performance. Comprehensive experiment results on public New York Times corpus demonstrate that our model outperforms other state-of-the-art baselines.