Semantification of text through summarisation

  • Monika Joshi

Student thesis: Doctoral Thesis

Abstract

The research topic of this thesis is the semantic representation of text documents and abstractive summarisation. Designing a semantic representation of text documents is an important research topic because of the growing volume of unstructured textual information on the web. To process this information automatically, it must first be represented in a standard way. In addition, the abundance of information has increased the demand for shortening lengthy online text documents from different genres, e.g. patent documents and news articles, into useful summaries.

In this thesis, we present a systematic analysis of different semantic representations of text data. We have analysed two ways of constructing semantic graphs from the semantic relations between words. One graph is based on logical triples of subject-predicate-object, and the other is based on dependencies other than logical triples. Our experiments on benchmark datasets for text summarisation confirmed the effectiveness of the newly proposed graph in text summarisation.
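To make the triple-based representation concrete, the following is a minimal sketch of a graph built from subject-predicate-object triples. It is not the thesis's implementation: the triples are hand-written for illustration (a real pipeline would extract them with an NLP parser), the names `build_triple_graph` and `degree` are hypothetical, and node degree stands in as one simple, assumed way of ranking graph elements.

```python
from collections import defaultdict

# Hypothetical, hand-written triples used only for illustration.
triples = [
    ("company", "filed", "patent"),
    ("patent", "describes", "engine"),
    ("company", "sells", "engine"),
    ("engine", "reduces", "emissions"),
]

def build_triple_graph(triples):
    """Return an adjacency map: subject -> list of (predicate, object)."""
    graph = defaultdict(list)
    for subj, pred, obj in triples:
        graph[subj].append((pred, obj))
        graph.setdefault(obj, [])  # make sure objects also appear as nodes
    return graph

def degree(graph):
    """Count incoming plus outgoing edges for every node."""
    counts = defaultdict(int)
    for subj, edges in graph.items():
        counts[subj] += len(edges)      # outgoing edges
        for _, obj in edges:
            counts[obj] += 1            # incoming edges
    return dict(counts)

graph = build_triple_graph(triples)
node_degree = degree(graph)
most_connected = max(node_degree, key=node_degree.get)
```

In this toy graph the most connected node is the entity mentioned in the most triples, which is the kind of signal a graph-based summariser might use when choosing what to keep.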

We have also looked beyond traditional representations and proposed including object-oriented principles in semantic graph design. This resulted in an object-oriented semantic graph of a text document, in which important entities of the text are projected as objects and the properties of those objects are extracted from the text using different natural language processing (NLP) processes. Further methodologies were developed to generate an abstractive summary directly from this graph instead of from the original document. We have analysed the abstractive summaries generated from the object-oriented semantic graph with the automated evaluation tool ROUGE and by manual evaluation. Although the ROUGE results achieved by the object-oriented semantic graph could not surpass the state of the art achieved by extractive summarisers, they were better than previous semantic graph based summarisation results.
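The idea of projecting entities as objects can be sketched as follows. This is an illustrative assumption rather than the design used in the thesis: the class `EntityNode`, its fields `attributes` and `actions`, and the template-based `describe` method are all hypothetical, and the example data is hand-written instead of being extracted by an NLP pipeline.

```python
from dataclasses import dataclass, field

@dataclass
class EntityNode:
    """Hypothetical object for one important entity in the text."""
    name: str
    attributes: list = field(default_factory=list)  # e.g. adjectives
    actions: list = field(default_factory=list)     # (verb, target) pairs

    def describe(self, include_attributes=True):
        """Generate one sentence from the object using a fixed template."""
        subject = self.name
        if include_attributes and self.attributes:
            subject = " ".join(self.attributes + [self.name])
        if not self.actions:
            return f"The {subject}."
        clauses = [f"{verb} {target}" for verb, target in self.actions]
        return f"The {subject} " + " and ".join(clauses) + "."

# Hand-written example data (an assumption for illustration).
engine = EntityNode(
    name="engine",
    attributes=["hybrid"],
    actions=[("reduces", "emissions"), ("lowers", "fuel costs")],
)

with_adjectives = engine.describe()
without_adjectives = engine.describe(include_attributes=False)
```

Here `with_adjectives` is "The hybrid engine reduces emissions and lowers fuel costs." and `without_adjectives` drops "hybrid". A summary sentence is produced from the object alone, without consulting the original text.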

An analysis of including various syntactic units in the summary concluded that including adjectives improves the informativeness of the summary, whereas including adverbs does not affect it. Overall, this thesis presents a theory and methodology for generating efficient semantic graphs from text documents and gives strategies for using such a graph as a replacement for the original document in NLP processes such as text summarisation.

The research presented in this thesis can be extended by improving the graph generation capabilities to handle more complex texts and by improving the ranking methodologies for different graph elements. The quality of abstractive summaries generated from the object-oriented semantic graph can be improved by incorporating better natural language generation techniques.
Date of Award: Mar 2019
Original language: English
Sponsors: VCRS
Supervisor: H. Wang (Supervisor) & Sally McClean (Supervisor)

Keywords

  • summarisation
  • extractive summaries
  • text graph
  • dense graph
