Existing research on cross-lingual retrieval cannot take full advantage of large-scale pretrained language models such as multilingual BERT and XLM. In this paper, we hypothesize that the absence of cross-lingual passage-level relevance data for finetuning and the lack of query-document style pretraining are key factors behind this gap. We propose to finetune language models directly on the evaluation collection by making Transformers capable of accepting longer sequences. We introduce two novel retrieval-oriented pretraining tasks to further pretrain cross-lingual language models for downstream retrieval tasks such as cross-lingual ad-hoc retrieval (CLIR) and cross-lingual question answering (CLQA). To support retrieval-oriented language model pretraining, we construct distant supervision data from multilingual Wikipedia using section alignment. Experiments on multiple benchmark datasets show that our proposed model significantly improves upon general multilingual language models in both the cross-lingual retrieval setting and the cross-lingual transfer setting. We make our pretraining implementation and checkpoints publicly available for future research.