Abstract
The neural architectures employed for neural machine translation (NMT) usually consist of a stack of multiple encoder and decoder layers. However, only the source feature representation from the top-level encoder layer is used by the decoder subnetwork when generating the target sequence. These models therefore do not fully exploit the useful source representations learned by the lower-level encoder layers, and there is no guarantee that the top-level encoder layer encodes all the source information the decoder needs for target generation. Inspired by recent advances in deep representation learning, this paper proposes a Multi-Layer Multi-Head Attention (MLMHA) module to exploit the different source representations produced by the multi-layer encoder subnetwork. Specifically, the decoder is given more direct access to multiple encoder layers during target generation, which further improves the translation performance of the model. Exposing multiple encoder layers also enhances the flow of gradient information between the two subnetworks. Experimental results on two IWSLT translation tasks (Spanish-English and English-Vietnamese) and on WMT’14 English-German demonstrate the effectiveness of allowing the decoder access to representations from multiple encoder layers. Specifically, the MLMHA approaches explored in this paper achieve improvements of up to +0.71, +0.75, and +0.49 BLEU points over the Transformer baseline on the English-German, Spanish-English, and English-Vietnamese translation tasks, respectively.
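To make the core idea concrete, the sketch below shows one plausible way a decoder block could attend over several encoder layers and pool the resulting contexts. The layer selection, the mean pooling, and the `MultiLayerCrossAttention` name are illustrative assumptions, not the paper's exact MLMHA formulation.

```python
# Minimal sketch: a decoder-side cross-attention block that attends to the
# outputs of several encoder layers instead of only the top one.
# Assumptions (not from the paper): one attention block per attended encoder
# layer, contexts combined by a simple mean, residual connection + LayerNorm.
import torch
import torch.nn as nn


class MultiLayerCrossAttention(nn.Module):
    def __init__(self, d_model: int, n_heads: int, n_encoder_layers: int):
        super().__init__()
        # One multi-head attention block per attended encoder layer.
        self.attn = nn.ModuleList(
            nn.MultiheadAttention(d_model, n_heads, batch_first=True)
            for _ in range(n_encoder_layers)
        )
        self.norm = nn.LayerNorm(d_model)

    def forward(self, dec_states, enc_layer_outputs):
        # dec_states:        (batch, tgt_len, d_model)
        # enc_layer_outputs: list of (batch, src_len, d_model), one per encoder layer
        contexts = []
        for attn, enc_out in zip(self.attn, enc_layer_outputs):
            ctx, _ = attn(query=dec_states, key=enc_out, value=enc_out)
            contexts.append(ctx)
        # Combine the per-layer contexts; averaging is one simple choice.
        combined = torch.stack(contexts, dim=0).mean(dim=0)
        return self.norm(dec_states + combined)


if __name__ == "__main__":
    batch, src_len, tgt_len, d_model = 2, 7, 5, 64
    enc_outputs = [torch.randn(batch, src_len, d_model) for _ in range(3)]
    dec_states = torch.randn(batch, tgt_len, d_model)
    module = MultiLayerCrossAttention(d_model=64, n_heads=8, n_encoder_layers=3)
    print(module(dec_states, enc_outputs).shape)  # torch.Size([2, 5, 64])
```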
| Original language | English |
| --- | --- |
| Pages (from-to) | 51-82 |
| Number of pages | 32 |
| Journal | The Prague Bulletin of Mathematical Linguistics |
| Volume | 115 |
| DOIs | |
| Publication status | Published (in print/issue) - 30 Oct 2020 |