The field of information security suffers from a lack of labelled entities. To address this issue, this study proposes a zero-shot hybrid approach that combines a clustering algorithm with a method for representing category labels, and uses it to perform fine-grained entity typing based on the unified cybersecurity ontology (UCO). However, certain category labels in UCO do not have distinct domain features, and embeddings for certain abbreviations cannot be obtained directly from Word2vec. We therefore propose a new method, referred to as mixed entities and hierarchy of UCO (MEHC), to represent the category labels. Moreover, to further improve the performance of fine-grained entity typing, we propose the triClustering algorithm, which re-clusters coarse-grained classification results or determines the corresponding types for new entities. It is based on the triangle inequality, that is, the theorem that the sum of two sides of a triangle is greater than the third. The experimental results show that our triClustering algorithm can effectively shorten the computation time and that the proposed hybrid method is superior to other baselines for information security applications.
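The abstract does not spell out how triClustering applies the triangle theorem, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It shows one well-known way the triangle inequality can save distance computations when assigning a point to its nearest cluster centre: if the distance between the current best centre and another centre is at least twice the distance from the point to the current best centre, the other centre cannot be closer and can be skipped. The function names (`euclidean`, `pairwise_distances`, `assign_with_pruning`) and the toy 2-D vectors are assumptions made for illustration.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def pairwise_distances(centroids):
    """Precompute the distance between every pair of centroids."""
    return [[euclidean(c1, c2) for c2 in centroids] for c1 in centroids]


def assign_with_pruning(point, centroids, centroid_dists):
    """Return (index of nearest centroid, number of distances computed).

    Hypothetical helper: uses the triangle inequality to skip centroids
    that provably cannot be closer than the current best one.
    """
    best = 0
    best_dist = euclidean(point, centroids[0])
    computed = 1
    for j in range(1, len(centroids)):
        # Triangle inequality: d(c_best, c_j) <= d(p, c_best) + d(p, c_j).
        # So if d(c_best, c_j) >= 2 * d(p, c_best), then
        # d(p, c_j) >= d(p, c_best), and c_j can be skipped.
        if centroid_dists[best][j] >= 2 * best_dist:
            continue
        d = euclidean(point, centroids[j])
        computed += 1
        if d < best_dist:
            best, best_dist = j, d
    return best, computed


# Toy example: three cluster centres and two points to assign.
centroids = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
dists = pairwise_distances(centroids)
near_origin = assign_with_pruning((1.0, 1.0), centroids, dists)
near_second = assign_with_pruning((9.0, 1.0), centroids, dists)
```

In this toy case the first point is assigned after a single distance computation and the second after two, whereas a brute-force search would compute three distances for each. The saving grows with the number of cluster centres, which is consistent with the kind of speed-up the abstract reports, although the actual mechanism in the paper may differ.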
|Number of pages||12|
|Early online date||15 Sep 2021|
|Publication status||Published (in print/issue) - 28 Nov 2021|
Bibliographical note
Funding Information:
This work was supported by the Major Science and Technology Project in Henan Province, China (grant No. 201300210500).
© 2021 Elsevier B.V.
- Clustering algorithm
- Fine-grained entity typing
- Information security
- Representation method for categories
- Unified cybersecurity ontology