Question Answering (QA) systems have been extensively studied in both academia and the research community due to their wide real-world applications. When building such industrial-scale QA applications, we are facing two prominent challenges, i.e., i) lacking a sufficient amount of training data to learn an accurate model and ii) requiring high inference speed for online model serving. There are generally two ways to mitigate the above-mentioned problems. One is to adopt transfer learning to leverage information from other domains; the other is to distill the “dark knowledge” from a large teacher model to small student models. The former usually employs parameter sharing mechanisms for knowledge transfer, but does not utilize the “dark knowledge” of pre-trained large models. The latter usually does not consider the cross-domain information from other domains. We argue that these two types of methods can be complementary to each other. Hence in this work, we provide a new perspective on the potential of the teacher-student paradigm facilitating cross-domain transfer learning, where the teacher and student tasks belong to heterogeneous domains, with the goal to improve the student model’s performance in the target domain. Our framework considers the “dark knowledge” learned from large teacher models and also leverages the adaptive hints to alleviate the domain differences between teacher and student models. Extensive experiments have been conducted on two text matching tasks for retrieval-based QA systems. Results show the proposed method has better performance than the competing methods including the existing state-of-the-art transfer learning methods. We have also deployed our method in an online production system and observed significant improvements compared to the existing approaches in terms of both accuracy and cross-domain robustness.