Objectively evaluating the impact of papers, researchers, and venues is of great significance to academia and beyond. It can help researchers, research organizations, and government agencies in various ways; for example, researchers can find valuable papers and authoritative venues, and research organizations can identify good researchers. A few studies have found that differentiating citations, rather than treating them equally, is a promising approach to evaluating the impact of academic entities. However, most of those methods rely on metadata only and do not consider the contents of the citing and cited papers, while the few content-based methods are not sophisticated and leave room for improvement. In this paper, we study the citation relationships between entities using content-based approaches. Specifically, an ensemble learning method is used to classify citations into different strength types, and a word-embedding-based method is used to estimate the topical similarity of the citing and cited papers. A heterogeneous network is constructed from the weighted citation links and several other features. On this heterogeneous network, which consists of three types of entities, we apply an iterative PageRank-like method to rank the impact of papers, authors, and venues simultaneously through mutual reinforcement. Experiments are conducted on an ACL dataset, and the results demonstrate that our method greatly outperforms state-of-the-art competitors in ranking effectiveness for papers, authors, and venues, as well as in robustness against malicious manipulation of citations.
- Scientific impact evaluation · Heterogeneous network · Content-based citation analysis · Citation strength · Topical similarity