Data fusion in information retrieval has been investigated by many researchers, and quite a few data fusion methods have been proposed. However, how these methods affect retrieval effectiveness is not well understood. In this paper, we apply statistical principles to data fusion and obtain some useful conclusions, which can serve as guidelines for data fusion methods. Based on these conclusions, CombSum, the linear combination method, and the correlation methods can be justified under certain conditions. We also investigate how to improve the effectiveness of some existing data fusion methods, such as CombSum and the linear combination method. Experimental results on TREC data are reported to support the conclusions.
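For readers unfamiliar with the two methods named above, the following is a minimal sketch of score-based fusion as it is commonly described in the literature, not the specific procedure evaluated in this paper. CombSum adds up each document's scores across the component systems, and the linear combination method does the same with a weight per system. The function names, the min-max score normalization, and the treatment of a document missing from a run as contributing zero are all assumptions made for illustration.

```python
from collections import defaultdict


def min_max_normalize(scores):
    """Rescale one system's scores into [0, 1] (an assumed normalization choice)."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All scores equal: assign the same normalized value to every document.
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def linear_combination(runs, weights=None):
    """Fuse runs by a weighted sum of normalized scores.

    runs: list of dicts mapping document id -> raw score, one dict per system.
    weights: one weight per system; defaults to equal weights of 1.0.
    A document absent from a run contributes 0 for that run (assumption).
    Returns (doc, fused_score) pairs sorted by descending score, ties by id.
    """
    if weights is None:
        weights = [1.0] * len(runs)
    if len(weights) != len(runs):
        raise ValueError("need exactly one weight per run")
    fused = defaultdict(float)
    for run, weight in zip(runs, weights):
        for doc, score in min_max_normalize(run).items():
            fused[doc] += weight * score
    return sorted(fused.items(), key=lambda item: (-item[1], item[0]))


def comb_sum(runs):
    """CombSum: the equal-weight special case of linear combination."""
    return linear_combination(runs)


# Hypothetical scores from two retrieval systems for one query.
run_a = {"d1": 3.0, "d2": 1.0, "d3": 2.0}
run_b = {"d2": 10.0, "d3": 30.0}

combsum_ranking = comb_sum([run_a, run_b])
weighted_ranking = linear_combination([run_a, run_b], weights=[3.0, 1.0])
```

With equal weights, `d3` ranks first because both systems score it well. Weighting the first system three times as heavily moves `d1` to the top, which illustrates why the choice of weights matters for the linear combination method.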