Abstract
Self-attention-based encoder-decoder frameworks have drawn increasing attention in recent years. The self-attention mechanism generates contextual representations by attending to all tokens in the sentence. Despite improvements in performance, recent research argues that the self-attention mechanism tends to concentrate on the global context while placing less emphasis on the contextual information available within the local neighbourhood of tokens. This work presents the Dual Contextual (DC) module, an extension of the conventional self-attention unit, designed to leverage both local and global contextual information. The goal is to further improve the sentence representation ability of the encoder and decoder subnetworks, thus enhancing the overall performance of the translation model. Experimental results on WMT’14 English-German (En→De) and eight IWSLT translation tasks show that the DC module can further improve the translation performance of the Transformer model.
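The abstract does not spell out the DC module's exact formulation. As a rough illustrative sketch only, one plausible reading is a dual-branch attention layer in which one branch attends globally over the whole sentence and a second branch is restricted to a fixed window around each token, with the two outputs fused by a learned gate; the class name `DualContextAttention`, the `window_size` parameter, and the gating scheme below are all hypothetical, not the authors' method.

```python
# Hedged sketch: one global and one locally-windowed attention branch, gated fusion.
# This is NOT the paper's actual DC module; it only illustrates the idea of
# combining local and global contextual information in a self-attention unit.
import torch
import torch.nn as nn


class DualContextAttention(nn.Module):
    def __init__(self, d_model: int, n_heads: int, window_size: int = 3):
        super().__init__()
        self.global_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.local_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.gate = nn.Linear(2 * d_model, d_model)
        self.window_size = window_size

    def _local_mask(self, seq_len: int, device) -> torch.Tensor:
        # Boolean mask where True entries are blocked, so each token only
        # attends to positions within +/- window_size of itself.
        idx = torch.arange(seq_len, device=device)
        dist = (idx[None, :] - idx[:, None]).abs()
        return dist > self.window_size

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        mask = self._local_mask(x.size(1), x.device)
        g, _ = self.global_attn(x, x, x)                  # global context branch
        l, _ = self.local_attn(x, x, x, attn_mask=mask)   # local neighbourhood branch
        gate = torch.sigmoid(self.gate(torch.cat([g, l], dim=-1)))
        return gate * g + (1.0 - gate) * l                # gated fusion of both contexts


if __name__ == "__main__":
    layer = DualContextAttention(d_model=64, n_heads=4, window_size=2)
    out = layer(torch.randn(2, 10, 64))
    print(out.shape)  # torch.Size([2, 10, 64])
```

In a Transformer-style encoder or decoder, such a layer would simply replace the standard self-attention sublayer; the residual connection and layer normalization around it would stay unchanged.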
| Original language | English |
| --- | --- |
| Journal | Machine Translation |
| Early online date | 12 Oct 2021 |
| Publication status | Published online - 12 Oct 2021 |
Bibliographical note
Publisher Copyright: © 2021, The Author(s).
Keywords
- Deep neural representation learning
- Self-attention networks
- Local contexts
- Global contexts