Abstract
The field of information security suffers from a lack of labelled entities. To address this issue, this study proposes a zero-shot hybrid approach to fine-grained entity typing based on the Unified Cybersecurity Ontology (UCO), combining a clustering algorithm with a method for representing category labels. However, certain category labels in UCO do not have distinct domain features, and the word embeddings of certain abbreviations cannot be obtained directly using Word2vec. We therefore propose a new method, referred to as mixed entities and hierarchy of UCO (MEHC), to represent the category labels. Moreover, to further improve the performance of fine-grained entity typing, we propose the triClustering algorithm, which re-clusters coarse-grained classification results or determines the corresponding types for new entities. It is based on the triangle inequality: the sum of the lengths of any two sides of a triangle is greater than the length of the third side. The experimental results show that the triClustering algorithm effectively shortens the computation time and that the proposed hybrid method outperforms the other baselines in information security applications.
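The abstract does not spell out how triClustering applies the triangle inequality, so the following is only a generic sketch of how that inequality can cut distance computations when assigning a point to its nearest cluster centre. It is not the authors' algorithm. The function names (`centroid_distances`, `assign`), the Euclidean metric, and the toy 2-D points are all assumptions made for illustration.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def centroid_distances(centroids):
    """Pairwise distances between centroids, computed once and reused."""
    k = len(centroids)
    return [[euclidean(centroids[i], centroids[j]) for j in range(k)]
            for i in range(k)]


def assign(point, centroids, cc):
    """Return (nearest index, distance, number of point-centroid distances computed).

    Hypothetical helper: skips a centroid c_j whenever
    d(c_best, c_j) >= 2 * d(point, c_best). By the triangle inequality,
    d(point, c_j) >= d(c_best, c_j) - d(point, c_best) >= d(point, c_best),
    so c_j cannot be strictly closer than the current best.
    """
    best = 0
    best_d = euclidean(point, centroids[0])
    computed = 1
    for j in range(1, len(centroids)):
        if cc[best][j] >= 2 * best_d:
            continue  # pruned without computing d(point, c_j)
        d = euclidean(point, centroids[j])
        computed += 1
        if d < best_d:
            best, best_d = j, d
    return best, best_d, computed


# Toy data (assumed): three well-separated centres in 2-D.
centroids = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
cc = centroid_distances(centroids)
index, distance, computed = assign((1.0, 1.0), centroids, cc)
```

In this toy case the point lies close to the first centre, so both remaining centres are pruned and only one point-to-centre distance is evaluated. The saving grows with the number of centres, at the cost of precomputing the centre-to-centre distances.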
Original language | English |
---|---|
Article number | 107472 |
Pages (from-to) | 1-12 |
Number of pages | 12 |
Journal | Knowledge-Based Systems |
Volume | 232 |
Early online date | 15 Sept 2021 |
Publication status | Published (in print/issue) - 28 Nov 2021 |
Bibliographical note
Funding Information: This work was supported by the Major Science and Technology Project in Henan Province, China (grant No. 201300210500).
Publisher Copyright:
© 2021 Elsevier B.V.
Keywords
- Clustering algorithm
- Fine-grained entity typing
- Information security
- Representation method for categories
- Unified cybersecurity ontology