Distilling Task Specific Models from Ernie for Chinese Medical Intention Classification

Zumin Wang, Chang Xu, Liming Chen, Jing Qin, Changqing Ji

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Improving intention classification for medical queries can effectively improve the performance of search engines and question-answering systems, and thereby improve medical services. Pre-trained language models have achieved good performance on various NLP tasks, but their slow inference speed and high storage and computing resource requirements make it difficult for medical institutions to deploy them. In this paper, we propose a knowledge distillation model based on ERNIE for the Chinese medical intention classification task, providing a small-scale and more efficient intention classification model. We propose a two-stage framework that distills both domain-specific knowledge and task-specific knowledge, and we use data augmentation methods to minimize the loss of accuracy caused by model compression. Specifically, in the domain-knowledge-specific distillation stage we distill the embedding layer and Transformer layers of the teacher model, so that the student model better captures both the general-domain and medical-domain knowledge of the teacher. In the intent-classification task-specific distillation stage, we also distill the prediction layer, combining soft labels and hard labels in the prediction-layer loss to ensure that the student model acquires task-specific knowledge. Considering the lack of labeled data available for training in the medical field and for the intention classification task, we use word exchange and whole-entity masking to augment the labeled data and improve the generalization ability of the student model. We conducted experiments on the publicly available CBLUE dataset, and the results show that our proposed 4-layer student model retains more than 98.6% of the language comprehension capability of the original 12-layer teacher model, ERNIE, while reducing the required computational resources and significantly accelerating inference.
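For illustration, the sketch below shows how the two distillation objectives described in the abstract are commonly computed in PyTorch. It is a minimal, hypothetical example, not the paper's implementation: the TinyBERT-style uniform layer mapping, the MSE hidden-state loss, the linear projection `proj`, the temperature `T`, and the weight `alpha` are assumptions made for the sketch and are not specified in the abstract.

```python
import torch
import torch.nn.functional as F

def layer_distillation_loss(student_hidden, teacher_hidden, layer_map, proj):
    # Stage 1 (domain-knowledge-specific distillation, assumed form): MSE between
    # student hidden states (embedding output at index 0, then one entry per
    # Transformer layer) and the teacher hidden states they are mapped to.
    # `proj` maps the student hidden size to the teacher hidden size when they differ.
    loss = 0.0
    for s_idx, t_idx in layer_map:
        loss = loss + F.mse_loss(proj(student_hidden[s_idx]), teacher_hidden[t_idx])
    return loss

def prediction_distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    # Stage 2 (task-specific distillation, assumed form): soft-label KL divergence
    # on temperature-scaled logits plus hard-label cross-entropy on gold intents.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard

if __name__ == "__main__":
    # Illustrative dimensions only: a 4-layer student distilled from a 12-layer teacher.
    batch, seq, d_student, d_teacher, n_intents = 8, 32, 312, 768, 5
    proj = torch.nn.Linear(d_student, d_teacher)
    student_hidden = [torch.randn(batch, seq, d_student) for _ in range(5)]   # embeddings + 4 layers
    teacher_hidden = [torch.randn(batch, seq, d_teacher) for _ in range(13)]  # embeddings + 12 layers
    layer_map = [(0, 0), (1, 3), (2, 6), (3, 9), (4, 12)]  # uniform student-to-teacher mapping
    stage1 = layer_distillation_loss(student_hidden, teacher_hidden, layer_map, proj)

    student_logits = torch.randn(batch, n_intents)
    teacher_logits = torch.randn(batch, n_intents)
    labels = torch.randint(0, n_intents, (batch,))
    stage2 = prediction_distillation_loss(student_logits, teacher_logits, labels)
    print(float(stage1), float(stage2))
```

Scaling the soft-label term by T*T is the usual convention that keeps its gradient magnitude comparable to the hard-label cross-entropy as the temperature changes.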
Original language: English
Title of host publication: 2023 IEEE Smart World Congress (SWC)
Publisher: IEEE
Pages: 1-6
Number of pages: 6
ISBN (Electronic): 979-8-3503-1980-4
ISBN (Print): 979-8-3503-1981-1
DOIs
Publication status: Published online - 1 Mar 2024

Publication series

Name: 2023 IEEE Smart World Congress (SWC)
Publisher: IEEE Control Society

Bibliographical note

Publisher Copyright:
© 2023 IEEE.

Keywords

  • Intent Classification
  • Knowledge Distillation
  • Medical NLP
  • Pre-Trained Language Models
