Sentence representation learning and generation for neural machine translation

  • Isaac Kojo Essel Ampomah

Student thesis: Doctoral Thesis


Machine translation (MT) systems have become indispensable tools allowing for the automatic translation of texts from one language to another. Earlier MT models ranged from rule-based systems to statistical MT (SMT) models. However, in recent years, neural machine translation (NMT) has attracted greater attention within the MT research community. The strength and success of NMT models over prior systems can be attributed to their ability to automatically learn the linguistic features required for the translation task without explicit feature engineering. Typically, NMT systems employ an encoder-decoder architecture to model and learn the target translation. The encoder learns the semantic representation of the source sentence, from which the decoder generates the target translation. Therefore, the translation performance of an NMT model relies heavily on the representation and generation ability of both the encoder and decoder subnetworks. In addition, the overall performance of any MT system is affected by the linguistic structures and properties of the language pairs under consideration. This implies that the architectural design of the NMT system is of significant importance for efficiently learning the linguistic information needed to achieve higher translation performance across different language pairs. Accordingly, the overall aim of this thesis is to design and build architectures that perform the translation task more efficiently. The contributions of the thesis are twofold. (1) To ensure minimal loss of source information during the target translation, two joint attention strategies are presented in Chapter 3 that allow the decoder access to source information captured by different encoding layers. (2) The sentence representation ability of the encoder and decoder subnetworks is enhanced by (a) exploiting the strengths of multitask learning and auxiliary training approaches to design an encoder-based multi-level supervision framework in Chapter 4, and (b) improving the ability of the self-attention mechanism to capture local and global contextual information and dependencies efficiently in Chapter 5. Evaluations on multiple language translation tasks show that the approaches proposed in this work significantly enhance the sentence representation and generation ability of the encoder-decoder architecture, consequently improving the overall translation performance of the NMT model.
Date of Award: May 2021
Original language: English
Supervisor: Sally McClean (Supervisor), Glenn Hawe (Supervisor) & Zhiwei Lin (Supervisor)


  • Local Contexts
  • Attention Neural Networks
  • Multi-Level Supervision
  • Contextual Modelling
