CN111950302A - Knowledge distillation-based machine translation model training method, device, equipment and medium - Google Patents

Knowledge distillation-based machine translation model training method, device, equipment and medium

Info

Publication number
CN111950302A
Authority
CN
China
Prior art keywords
model
module
training
student
loss function
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010843014.5A
Other languages
Chinese (zh)
Other versions
CN111950302B (en)
Inventor
袁秋龙
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Zhilv Information Technology Co ltd
Original Assignee
Shanghai Zhilv Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Zhilv Information Technology Co ltd filed Critical Shanghai Zhilv Information Technology Co ltd
Priority to CN202010843014.5A priority Critical patent/CN111950302B/en
Publication of CN111950302A publication Critical patent/CN111950302A/en
Application granted granted Critical
Publication of CN111950302B publication Critical patent/CN111950302B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/40Processing or translation of natural language
    • G06F40/58Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/10Text processing
    • G06F40/12Use of codes for handling textual entities
    • G06F40/126Character encoding
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/40Processing or translation of natural language
    • G06F40/42Data-driven translation
    • G06F40/47Machine-assisted translation, e.g. using translation memory
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00Machine learning
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Abstract

The invention provides a knowledge distillation-based machine translation model training method, device, equipment and medium, wherein the method comprises the following steps: acquiring a teacher model and a student model; acquiring a sample data set containing training corpora; inputting the training corpora into the teacher model to obtain the intermediate content output by the simplified module in the teacher model and the final result output by the teacher model; inputting the training corpora into the student model to obtain the intermediate content output by the simplified module in the student model and the final result output by the student model; determining a model loss function according to the labeled translation labels of the training corpora, the intermediate content output by the simplified module in the teacher model, the final result output by the teacher model, the intermediate content output by the simplified module in the student model and the final result output by the student model; and performing iterative training on the student model according to the model loss function. The teacher model is used to train the student model, so that the performance of the model is preserved even though its structure is simplified.

Description

Knowledge distillation-based machine translation model training method, device, equipment and medium
Technical Field
The invention relates to the field of artificial intelligence, and in particular to a knowledge distillation-based machine translation model training method, device, equipment and medium.
Background
Machine translation, also known as automatic translation, is the process of using a computer to convert one natural source language into another natural target language, and generally refers to the translation of sentences and full texts between natural languages. Machine translation is a branch of Natural Language Processing and is closely related to Computational Linguistics and Natural Language Understanding. The idea of using machines for translation was first proposed by Warren Weaver in 1949. For a long time (from the 1950s to the 1980s), machine translation was accomplished by studying linguistic information in both the source and target languages, i.e., generating translations based on dictionaries and grammars; this is known as rule-based machine translation (RBMT). As statistical methods developed, researchers began to apply statistical models to machine translation, generating translation results based on the analysis of bilingual text corpora. This approach, known as Statistical Machine Translation (SMT), performed better than RBMT and dominated the field from the 1980s to the 2000s. In 1997, Ramon Neco and Mikel Forcada proposed the idea of using an Encoder-Decoder architecture for machine translation. In 2003, a research team led by Yoshua Bengio at the University of Montreal developed a language model based on a neural network, which alleviated the data sparsity problem of traditional SMT models. This research laid the foundation for the later application of neural networks to machine translation.
In 2017, Google proposed the Transformer model in the paper "Attention Is All You Need". This self-attention-based model handles sequence modeling well; applied to machine translation tasks, it greatly improves the translation effect. However, on the one hand, as the Transformer family has developed from BERT to GPT-2 to XLNet, the capacity of translation models has grown; although this can improve the translation effect to some extent, the online inference performance (latency and throughput) of the translation model becomes worse and worse, and how to improve the online inference performance of the translation model is a key factor in whether it can be deployed well and provide user-friendly service. On the other hand, as the number of supported foreign languages increases sharply, how to effectively compress the model without losing its translation effect, so that the model is convenient to store and release, is an important problem for the engineering deployment of the algorithm model.
Disclosure of Invention
In view of the above deficiencies of the prior art, an object of the present invention is to provide a machine translation model training method, apparatus, device and medium based on knowledge distillation, so as to train a simplified student model from a teacher model while affecting the model's effect as little as possible, improve the throughput of the model when deployed online, reduce its latency, and thereby improve the user experience.
In order to achieve the above object, the present invention provides a knowledge distillation-based machine translation model training method, including:
acquiring a trained teacher model and an untrained student model, wherein the student model is obtained by simplifying some of the modules in the teacher model;
acquiring a sample data set, wherein the sample data set comprises a plurality of training corpora and labeled translation labels corresponding to the training corpora;
inputting the training corpus into the teacher model for processing to obtain intermediate content output by the simplified module in the teacher model and a final result output by the teacher model;
inputting the training corpus into the student model for processing to obtain intermediate content output by a simplified module in the student model and a final result output by the student model;
determining a model loss function according to a label translation label corresponding to the training corpus, the intermediate content output by the simplified module in the teacher model, the final result output by the teacher model, the intermediate content output by the simplified module in the student model and the final result output by the student model;
and performing iterative training on the student model according to the model loss function.
In a preferred embodiment of the present invention, the determining a model loss function according to the labeled translation label corresponding to the corpus, the intermediate content output by the simplified module in the teacher model, the final result output by the teacher model, the intermediate content output by the simplified module in the student model, and the final result output by the student model, includes:
determining a first loss function according to the intermediate content output by the simplified module in the teacher model and the intermediate content output by the simplified module in the student model;
determining a second loss function according to the labeled translation label corresponding to the training corpus and the final result output by the student model;
determining a third loss function according to the final result output by the teacher model and the final result output by the student model;
and determining the model loss function according to the first loss function, the second loss function and the third loss function.
In a preferred embodiment of the present invention, the teacher model and the student model respectively comprise an embedding module, an encoding module, a decoding module and an output module.
In a preferred embodiment of the present invention, the embedding module, the encoding module and the output module of the student model have the same structure as those of the teacher model, the decoding module of the student model is obtained by simplifying the decoding module of the teacher model, and a full connection layer is provided between the decoding module of the student model and the decoding module of the teacher model.
In a preferred embodiment of the present invention, after the sample data set is obtained, the method further includes: and preprocessing the training corpus.
In a preferred embodiment of the present invention, the preprocessing the corpus includes:
converting characters in the training corpus into corresponding numerical values;
and dividing the training corpuses into different batches, and adjusting the training corpuses of each batch to be the same in length in a zero value filling mode.
In order to achieve the above object, the present invention further provides a knowledge distillation-based machine translation model training apparatus, including:
the model acquisition module is used for acquiring a trained teacher model and an untrained student model, the student model being obtained by simplifying some of the modules in the teacher model;
the sample acquisition module is used for acquiring a sample data set, wherein the sample data set comprises a plurality of training corpora and labeled translation labels corresponding to the training corpora;
the teacher model processing module is used for inputting the training corpora into the teacher model for processing to obtain intermediate contents output by the simplified module in the teacher model and a final result output by the teacher model;
the student model processing module is used for inputting the training corpus into the student model for processing to obtain intermediate content output by the simplified module in the student model and a final result output by the student model;
the model loss function determining module is used for determining a model loss function according to the labeled translation label corresponding to the training corpus, the intermediate content output by the simplified module in the teacher model, the final result output by the teacher model, the intermediate content output by the simplified module in the student model and the final result output by the student model;
and the model training module is used for carrying out iterative training on the student model according to the model loss function.
In a preferred embodiment of the present invention, the model loss function determining module includes:
a first loss function determination unit configured to determine a first loss function based on the intermediate content output by the simplified module in the teacher model and the intermediate content output by the simplified module in the student model;
a second loss function determining unit, configured to determine a second loss function according to the labeled translation label corresponding to the training corpus and the final result output by the student model;
a third loss function determination unit configured to determine a third loss function according to a final result output by the teacher model and a final result output by the student model;
a model loss function determining unit, configured to determine the model loss function according to the first loss function, the second loss function, and the third loss function.
In a preferred embodiment of the present invention, the teacher model and the student model respectively comprise an embedding module, an encoding module, a decoding module and an output module.
In a preferred embodiment of the present invention, the embedding module, the encoding module and the output module of the student model have the same structure as those of the teacher model, the decoding module of the student model is obtained by simplifying the decoding module of the teacher model, and a full connection layer is provided between the decoding module of the student model and the decoding module of the teacher model.
In a preferred embodiment of the present invention, the apparatus further comprises: and the preprocessing module is used for preprocessing the training corpus after the sample data set is acquired.
In a preferred embodiment of the present invention, the preprocessing module includes:
the numerical value conversion unit is used for converting the characters in the training corpus into corresponding numerical values;
and the length adjusting unit is used for dividing the training corpuses into different batches and adjusting the training corpuses of each batch to be the same length in a zero value filling mode.
In order to achieve the above object, the present invention further provides a computer device, which includes a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor implements the steps of the aforementioned machine translation model training method when executing the computer program.
To achieve the above object, the present invention also provides a computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the aforementioned machine translation model training method.
By adopting the technical scheme, the invention has the following beneficial effects:
according to the label translation label corresponding to the training corpus, the intermediate content output by the simplified module in the teacher model, the final result output by the teacher model, the intermediate content output by the simplified module in the student model and the final result output by the student model, a model loss function is determined, and iterative training is carried out on the student model according to the model loss function. Compared with a teacher model, the student model obtained by training simplifies the model structure, and the intermediate content and the final result output by the teacher model are utilized to supervise in the training process, so that the performance and the effect of the model can be ensured as far as possible under the condition that the parameters of the student model are reduced.
Drawings
FIG. 1 is a flowchart of a knowledge-based distillation machine translation model training method according to embodiment 1 of the present invention;
FIG. 2 is a schematic diagram of a training method of a knowledge-based distillation machine translation model in embodiment 1 of the present invention;
FIG. 3 is a block diagram of a knowledge-based distillation machine translation model training apparatus according to embodiment 2 of the present invention;
fig. 4 is a hardware architecture diagram of an electronic device in embodiment 3 of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the disclosure. As used in this disclosure and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items.
Example 1
The embodiment provides a knowledge distillation-based machine translation model training method, as shown in fig. 1, the method specifically includes the following steps:
and S1, acquiring a trained teacher model and untrained student models, wherein the student models are obtained by simplifying partial modules in the teacher model.
Knowledge distillation is a network model compression method: a teacher-student framework is constructed in which the teacher model guides the training of the student model, the knowledge about feature representation learned by the teacher model (which has a complex structure and a large number of parameters) is distilled out, and this knowledge is transferred to the student model (which has a simple structure, fewer parameters and weaker learning ability). In this way, knowledge distillation can improve the performance of the model without increasing the complexity of the student model.
In the present embodiment, a machine translation model that has already been trained is prepared in advance as the teacher model, and a student model is obtained by simplifying some of the modules in the teacher model. The teacher model is in prediction mode, which means that its model parameters are frozen, i.e., they cannot be modified during the subsequent training process; the student model is in training mode, and its model parameters can be modified during training.
For example, the teacher model and the student model in the present embodiment may be translation models with the Transformer as the basic structure. As shown in fig. 2, the teacher model and the student model each include an embedding module, an encoding module, a decoding module and an output module cascaded in sequence, where the embedding module may include a corpus embedding layer and a language type embedding layer. Because the embedding module, the encoding module and the output module account for little of the inference cost, the embedding module, the encoding module and the output module of the student model are kept consistent in structure with those of the teacher model, are not reduced, and their parameters can be shared. That is, the present embodiment compresses only the decoding module of the teacher model (by reducing the number of decoding layers in the decoding module) to obtain the decoding module of the student model. In order to ensure the translation effect of the student model, the number of neurons in the embedding module and the output module of the student model is kept consistent with that in the embedding module and the output module of the teacher model.
In addition, in order to ensure that the dimension of the intermediate content output by the decoding module in the student model is consistent with the dimension of the intermediate content output by the decoding module in the teacher model, so as to perform the subsequent loss function calculation, a full connection layer is arranged between the decoding module of the student model and the decoding module of the teacher model.
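As a rough illustration of this construction, the following PyTorch-style sketch builds such a student from a trained teacher; the attribute names (embedding, encoder, decoder.layers, output), the hidden size d_model and the helper name build_student are assumptions for illustration only, not names taken from the patent.

```python
import copy
import torch.nn as nn

def build_student(teacher, num_student_layers, d_model=512):
    """Sketch: derive a simplified student model from a trained teacher."""
    # Start from a copy so the embedding, encoding and output modules keep
    # exactly the teacher's structure (their parameters may also be shared).
    student = copy.deepcopy(teacher)
    # Compress only the decoding module by keeping fewer decoding layers,
    # initialised from the teacher's first `num_student_layers` layers.
    student.decoder.layers = nn.ModuleList(
        [copy.deepcopy(layer)
         for layer in list(teacher.decoder.layers)[:num_student_layers]]
    )
    # Full connection layer between the two decoding modules, so that the
    # teacher's intermediate content matches the student's dimension (eq. 1).
    teacher_to_student_proj = nn.Linear(d_model, d_model)
    return student, teacher_to_student_proj
```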
S2, obtaining a sample data set, wherein the sample data set comprises a plurality of training corpora and label translation labels corresponding to the training corpora, and the training corpora can also carry corresponding language types.
S3, preprocessing the sample data set. This specifically comprises the following steps: first, the characters in the training corpora are converted into corresponding numerical values, and the training corpora are divided into different batches. Because the training corpora differ in length, the training corpora in each batch can be adjusted to the same length by zero-value filling. Zero-value filling takes the longest sentence in the same batch of training corpora as the reference and fills the missing positions of the other sentences with 0, so that their lengths match that of the longest sentence. In this way, input data of size Batch_size × Sequence_length is obtained, where Batch_size is the number of corpora in the same batch and Sequence_length is the length of the longest corpus in the same batch.
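A minimal sketch of this padding step over already-numericalised corpora (the helper name pad_batch is illustrative; the pad value 0 follows the description above):

```python
def pad_batch(batch_token_ids, pad_value=0):
    """Pad every corpus in one batch with zeros up to the longest corpus,
    giving a Batch_size x Sequence_length matrix of token ids."""
    sequence_length = max(len(ids) for ids in batch_token_ids)
    return [ids + [pad_value] * (sequence_length - len(ids))
            for ids in batch_token_ids]

# Example: pad_batch([[12, 7, 3], [5, 9]]) -> [[12, 7, 3], [5, 9, 0]]
```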
And S4, inputting the preprocessed corpus into the teacher model for processing to obtain the intermediate content output by the simplified module in the teacher model and the final result output by the teacher model.
For example, when the teacher model has the structure shown in fig. 2, the training corpus is first input to the embedding module of the teacher model, the training corpus and the language type thereof are mapped through the corpus layer and the language type layer of the embedding module, the corpus embedding result and the language type embedding result are merged and then input to the encoding module for feature encoding, then feature decoding is performed through the decoding module, the intermediate content output by the decoding module is collected, and finally the decoding result is processed through the output module to obtain the final result output by the teacher model.
And S5, inputting the training corpus into the student model for processing to obtain the intermediate content output by the simplified module in the student model and the final result output by the student model.
For example, when the student model has the structure shown in fig. 2, the training corpus is first input to the embedding module of the student model, the training corpus and the language type thereof are mapped through the corpus layer and the language type layer of the embedding module, the corpus embedding result and the language type embedding result are merged and then input to the encoding module for feature encoding, then feature decoding is performed through the decoding module, the intermediate content output by the decoding module is collected, and finally the decoding result is processed through the output module to obtain the final result output by the student model.
And S6, determining a model loss function according to the label translation label corresponding to the training corpus, the intermediate content output by the simplified module in the teacher model, the final result output by the teacher model, the intermediate content output by the simplified module in the student model and the final result output by the student model. The specific implementation process of the step is as follows:
and S61, determining a first loss function according to the intermediate content output by the simplified module in the teacher model and the intermediate content output by the simplified module in the student model.
For example, when the student model has the structure shown in fig. 2, the first loss function L_AT_FMT is calculated according to the following equation (1):
L_{AT\_FMT} = \sum_{c=1}^{C} D_{kl}\big(\mathrm{FC}(H_c^{T}),\ H_c^{S}\big)    (1)
where C is the number of decoding layers of the decoding module in the student model, D_kl is the function for calculating the KL divergence, FC(H_c^T) denotes the content output by the c-th decoding layer in the teacher model after being processed by the full connection layer, and H_c^S denotes the processing result output by the c-th decoding layer in the student model.
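A sketch of equation (1) in PyTorch is given below. The patent does not spell out how the KL divergence is taken over the decoder outputs; here it is assumed, purely for illustration, that each layer output is turned into a distribution with a softmax over the last dimension before D_kl is applied, and that proj is the full connection layer described above.

```python
import torch.nn.functional as F

def at_fmt_loss(teacher_layer_outputs, student_layer_outputs, proj):
    """Equation (1): KL divergence, summed over the C decoding layers of the
    student, between the projected teacher layer output and the corresponding
    student layer output."""
    loss = 0.0
    for h_teacher, h_student in zip(
            teacher_layer_outputs[:len(student_layer_outputs)],
            student_layer_outputs):
        target = F.softmax(proj(h_teacher), dim=-1)     # teacher distribution
        log_pred = F.log_softmax(h_student, dim=-1)     # student, in log space
        loss = loss + F.kl_div(log_pred, target, reduction="batchmean")
    return loss
```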
And S62, determining a second loss function according to the label translation label corresponding to the training corpus and the final result output by the student model.
For example, when the student model has the structure shown in fig. 2, the second loss function L_hard is calculated according to the following equation (2):
L_{hard} = \big\{ -p'_{ij}\log(p_{ij}) - (1-p'_{ij})\log(1-p_{ij}) \big\}    (2)
where log(·) denotes the logarithmic function, p_ij denotes the probability, output by the student model, that the i-th word corresponds to the j-th translation tag, and p'_ij denotes the ground-truth probability that the i-th word corresponds to the j-th translation tag (p'_ij is obtained from the labeled translation label corresponding to the training corpus).
And S63, determining a third loss function according to the final result output by the teacher model and the final result output by the student model.
For example, when the student model has the structure shown in fig. 2, the third loss function L_soft is calculated according to the following equation (3):
L_{soft} = \big\{ -q_{ij}\log(p_{ij}) - (1-q_{ij})\log(1-p_{ij}) \big\}    (3)
where log(·) denotes the logarithmic function, p_ij denotes the probability, output by the student model, that the i-th word corresponds to the j-th translation tag, and q_ij denotes the probability, output by the teacher model, that the i-th word corresponds to the j-th translation tag.
S64, determining the model loss function according to the first loss function L_AT_FMT, the second loss function L_hard and the third loss function L_soft.
For example, the model loss function Loss_all is calculated according to the following equation (4):
\mathrm{Loss}_{all} = \alpha L_{hard} + (1-\alpha) L_{soft} + \beta L_{AT\_FMT}    (4)
where α and β are the weight coefficients of the corresponding loss terms, with α ∈ (0, 1) and β ∈ R; their specific values can be preset empirically.
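The three terms can then be combined as in equation (4). The sketch below assumes the student and teacher outputs are already probability tensors over translation tags and that the hard labels p'_ij are one-hot; the plain-sum reduction and the epsilon for numerical stability are illustrative choices, not taken from the patent.

```python
import torch

def model_loss(p_student, p_teacher, p_true, l_at_fmt,
               alpha=0.5, beta=1.0, eps=1e-8):
    """Equations (2)-(4): hard loss against the labeled translation tags,
    soft loss against the teacher's probabilities, plus the layer loss."""
    # Equation (2): p_true holds the p'_ij derived from the labeled translation labels.
    l_hard = -(p_true * torch.log(p_student + eps)
               + (1 - p_true) * torch.log(1 - p_student + eps)).sum()
    # Equation (3): same form, with the teacher's probabilities q_ij as soft targets.
    l_soft = -(p_teacher * torch.log(p_student + eps)
               + (1 - p_teacher) * torch.log(1 - p_student + eps)).sum()
    # Equation (4): alpha in (0, 1), beta in R, both preset empirically.
    return alpha * l_hard + (1 - alpha) * l_soft + beta * l_at_fmt
```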
And S7, training the student model according to the model loss function, namely updating the parameters of the student model according to the loss function.
The process of training the model according to the loss function is iterative, and after each round of training it is judged whether a preset training termination condition is met. If the training termination condition is not satisfied, training continues according to steps S4 to S7 until the training termination condition is satisfied.
In one possible implementation, satisfying the training termination condition includes, but is not limited to, the following three cases. First, the number of iterative trainings reaches a count threshold; the count threshold may be set empirically, or may be flexibly adjusted according to the application scenario, which is not limited in the embodiments of the present application. Second, the model loss function is less than a loss threshold; the loss threshold may likewise be set empirically or adjusted freely according to the application scenario. Third, the model loss function converges. Convergence of the model loss function means that, as the number of iterative trainings increases, the fluctuation range of the model loss function over a reference number of training results stays within a reference range. For example, assume that the reference range is -10^{-3} to 10^{-3} and the reference number is 10. If the fluctuation range of the model loss function over the last 10 iterative training results stays within -10^{-3} to 10^{-3}, the model loss function is considered to have converged. When any one of these conditions is met, the training termination condition is satisfied and the training of the student model is completed.
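The three termination conditions might be checked as follows; the threshold, window and band values are placeholders chosen only to mirror the example above.

```python
def training_finished(step, loss_history, max_steps=100_000,
                      loss_threshold=0.01, window=10, band=1e-3):
    """True if any of the three termination conditions above is met."""
    if step >= max_steps:                                    # condition 1: count threshold
        return True
    if loss_history and loss_history[-1] < loss_threshold:   # condition 2: loss threshold
        return True
    if len(loss_history) >= window:                          # condition 3: convergence
        recent = loss_history[-window:]
        if max(recent) - min(recent) <= 2 * band:            # stays within +/- band
            return True
    return False
```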
In the process of updating the model parameters with the model loss function, optimization can be performed with the Adaptive Moment Estimation (Adam) optimization algorithm. During training, the learning rate lr_eb of the encoding module of the student model is less than or equal to the learning rate lr_db of the decoding module.
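In PyTorch this constraint can be expressed with Adam parameter groups, reusing the student object from the earlier construction sketch; the concrete learning-rate values below are placeholders, with only the relation lr_eb ≤ lr_db taken from the description.

```python
import torch

optimizer = torch.optim.Adam([
    {"params": student.encoder.parameters(), "lr": 1e-5},  # lr_eb for the encoding module
    {"params": student.decoder.parameters(), "lr": 1e-4},  # lr_db for the decoding module
])
```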
In addition, a hierarchical training mode can be used to reduce the decoding module of the student model step by step during training. As shown in fig. 2, after a student model (containing M decoding layers) is obtained by training with the teacher model (containing K decoding layers), the trained student model is used as a new teacher model to train a student model with an even smaller number of decoding layers, and so on, until a student model containing only a predetermined number N of decoding layers is obtained, where K > M > N. In this embodiment, the compression ratio of the student model is chosen as a compromise between the theoretical performance improvement and the translation effect of the model. After the student model training is finished, the teacher model is removed.
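A sketch of this hierarchical mode is shown below, where train_one_student stands for the full distillation procedure of steps S1 to S7; its name and signature are assumptions for illustration.

```python
def hierarchical_distillation(teacher, decoder_layer_schedule, train_one_student):
    """Shrink the decoding module progressively, e.g. K -> M -> N layers."""
    current_teacher = teacher
    for num_layers in decoder_layer_schedule:   # e.g. [M, N] with K > M > N
        student = train_one_student(current_teacher, num_layers)
        current_teacher = student               # the trained student becomes the new teacher
    return current_teacher                      # intermediate teacher models are then removed
```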
The student model obtained by training in this embodiment has a simplified model structure, and because the intermediate content and the final result output by the teacher model are used for supervision during training, the performance and effect of the model are preserved as far as possible even though the student model has fewer parameters. Because the model structure of the student model is simplified, the throughput when the model is deployed online is improved and the latency of the model is reduced, which in turn improves the user experience.
It should be noted that the foregoing embodiments are described as a series of acts or combinations for simplicity in explanation, but it should be understood by those skilled in the art that the present invention is not limited by the order of acts or acts described, as some steps may occur in other orders or concurrently in accordance with the invention. Further, those skilled in the art will appreciate that the embodiments described in the specification are presently preferred and that no particular act is required to implement the invention.
Example 2
The embodiment provides a knowledge-based distillation machine translation model training device, as shown in fig. 3, the device 1 specifically includes: the model training system comprises a model acquisition module 11, a sample acquisition module 12, a preprocessing module 13, a teacher model processing module 14, a student model processing module 15, a model loss function determination module 16 and a model training module 17.
Each module is described in detail below:
the model obtaining module 11 is configured to obtain a trained teacher model and untrained student models, where the student models are obtained by simplifying some modules in the teacher model.
Knowledge distillation is a network model compression method: a teacher-student framework is constructed in which the teacher model guides the training of the student model, the knowledge about feature representation learned by the teacher model (which has a complex structure and a large number of parameters) is distilled out, and this knowledge is transferred to the student model (which has a simple structure, fewer parameters and weaker learning ability). In this way, knowledge distillation can improve the performance of the model without increasing the complexity of the student model.
In the present embodiment, a machine translation model that has already been trained is prepared in advance as the teacher model, and a student model is obtained by simplifying some of the modules in the teacher model. The teacher model is in prediction mode, which means that its model parameters are frozen, i.e., they cannot be modified during the subsequent training process; the student model is in training mode, and its model parameters can be modified during training.
For example, the teacher model and the student model in the present embodiment may be translation models with the Transformer as the basic structure. As shown in fig. 2, the teacher model and the student model each include an embedding module, an encoding module, a decoding module and an output module cascaded in sequence, where the embedding module may include a corpus embedding layer and a language type embedding layer. Because the embedding module, the encoding module and the output module account for little of the inference cost, the embedding module, the encoding module and the output module of the student model are kept consistent in structure with those of the teacher model, are not reduced, and their parameters can be shared. That is, the present embodiment compresses only the decoding module of the teacher model (by reducing the number of decoding layers in the decoding module) to obtain the decoding module of the student model. In order to ensure the translation effect of the student model, the number of neurons in the embedding module and the output module of the student model is kept consistent with that in the embedding module and the output module of the teacher model.
In addition, in order to ensure that the dimension of the intermediate content output by the decoding module in the student model is consistent with the dimension of the intermediate content output by the decoding module in the teacher model, so as to perform the subsequent loss function calculation, a full connection layer is arranged between the decoding module of the student model and the decoding module of the teacher model.
The sample obtaining module 12 is configured to obtain a sample data set, where the sample data set includes a plurality of training corpora and labeled translation tags corresponding to the training corpora, and the training corpora may also carry corresponding language types.
The preprocessing module 13 is configured to preprocess the sample data set. It specifically comprises: a numerical value conversion unit 131, configured to convert the characters in the training corpora into corresponding numerical values; and a length adjustment unit 132, configured to divide the training corpora into different batches and, because the training corpora differ in length, adjust each batch of training corpora to the same length by zero-value filling. Zero-value filling takes the longest sentence in the same batch of training corpora as the reference and fills the missing positions of the other sentences with 0, so that their lengths match that of the longest sentence. In this way, input data of size Batch_size × Sequence_length is obtained, where Batch_size is the number of corpora in the same batch and Sequence_length is the length of the longest corpus in the same batch.
The teacher model processing module 14 is configured to input the preprocessed corpus into the teacher model for processing, so as to obtain intermediate content output by the simplified module in the teacher model and a final result output by the teacher model.
For example, when the teacher model has the structure shown in fig. 2, the training corpus is first input to the embedding module of the teacher model, the training corpus and the language type thereof are mapped through the corpus layer and the language type layer of the embedding module, the corpus embedding result and the language type embedding result are merged and then input to the encoding module for feature encoding, then feature decoding is performed through the decoding module, the intermediate content output by the decoding module is collected, and finally the decoding result is processed through the output module to obtain the final result output by the teacher model.
The student model processing module 15 is configured to input the corpus into the student model for processing, so as to obtain intermediate content output by the simplified module in the student model and a final result output by the student model.
For example, when the student model has the structure shown in fig. 2, the training corpus is first input to the embedding module of the student model, the training corpus and the language type thereof are mapped through the corpus layer and the language type layer of the embedding module, the corpus embedding result and the language type embedding result are merged and then input to the encoding module for feature encoding, then feature decoding is performed through the decoding module, the intermediate content output by the decoding module is collected, and finally the decoding result is processed through the output module to obtain the final result output by the student model.
The model loss function determining module 16 is configured to determine a model loss function according to the labeled translation label corresponding to the corpus, the intermediate content output by the simplified module in the teacher model, the final result output by the teacher model, the intermediate content output by the simplified module in the student model, and the final result output by the student model. The specific implementation process of the step is as follows:
the first loss function determining unit 161 is configured to determine a first loss function according to the intermediate content output by the simplified module in the teacher model and the intermediate content output by the simplified module in the student model.
For example, when the student model has the structure shown in fig. 2, the first loss function L_AT_FMT is calculated according to the following equation (1):
L_{AT\_FMT} = \sum_{c=1}^{C} D_{kl}\big(\mathrm{FC}(H_c^{T}),\ H_c^{S}\big)    (1)
where C is the number of decoding layers of the decoding module in the student model, D_kl is the function for calculating the KL divergence, FC(H_c^T) denotes the content output by the c-th decoding layer in the teacher model after being processed by the full connection layer, and H_c^S denotes the processing result output by the c-th decoding layer in the student model.
The second loss function determining unit 162 is configured to determine a second loss function according to the labeled translation label corresponding to the corpus and the final result output by the student model.
For example, when the student model has the structure shown in fig. 2, the second loss function L_hard is calculated according to the following equation (2):
L_{hard} = \big\{ -p'_{ij}\log(p_{ij}) - (1-p'_{ij})\log(1-p_{ij}) \big\}    (2)
The third loss function determining unit 163 is configured to determine a third loss function based on the final result output by the teacher model and the final result output by the student model.
For example, when the student model has the structure shown in fig. 2, the third loss function L_soft is calculated according to the following equation (3):
L_{soft} = \big\{ -q_{ij}\log(p_{ij}) - (1-q_{ij})\log(1-p_{ij}) \big\}    (3)
The model loss function determination unit 164 is configured to determine the model loss function according to the first loss function L_AT_FMT, the second loss function L_hard and the third loss function L_soft.
For example, the model loss function Loss_all is calculated according to the following equation (4):
\mathrm{Loss}_{all} = \alpha L_{hard} + (1-\alpha) L_{soft} + \beta L_{AT\_FMT}    (4)
where α and β are the weight coefficients of the corresponding loss terms, with α ∈ (0, 1) and β ∈ R; their specific values can be preset empirically.
The model training module 17 is configured to train the student model according to the model loss function, that is, update parameters of the student model according to the loss function.
The process of training the model according to the loss function is iterative, and after each round of training it is judged whether a preset training termination condition is met. If the training termination condition is not satisfied, training continues until the training termination condition is satisfied.
In one possible implementation, satisfying the training termination condition includes, but is not limited to, the following three cases. First, the number of iterative trainings reaches a count threshold; the count threshold may be set empirically, or may be flexibly adjusted according to the application scenario, which is not limited in the embodiments of the present application. Second, the model loss function is less than a loss threshold; the loss threshold may likewise be set empirically or adjusted freely according to the application scenario. Third, the model loss function converges. Convergence of the model loss function means that, as the number of iterative trainings increases, the fluctuation range of the model loss function over a reference number of training results stays within a reference range. For example, assume that the reference range is -10^{-3} to 10^{-3} and the reference number is 10. If the fluctuation range of the model loss function over the last 10 iterative training results stays within -10^{-3} to 10^{-3}, the model loss function is considered to have converged. When any one of these conditions is met, the training termination condition is satisfied and the training of the student model is completed.
In the process of updating the model parameters with the model loss function, optimization can be performed with the Adaptive Moment Estimation (Adam) optimization algorithm. During training, the learning rate lr_eb of the encoding module of the student model is less than or equal to the learning rate lr_db of the decoding module.
In addition, a hierarchical training mode can be used to reduce the decoding module of the student model step by step during training. As shown in fig. 2, after a student model (containing M decoding layers) is obtained by training with the teacher model (containing K decoding layers), the trained student model is used as a new teacher model to train a student model with an even smaller number of decoding layers, and so on, until a student model containing only a predetermined number N of decoding layers is obtained, where K > M > N. In this embodiment, the compression ratio of the student model is chosen as a compromise between the theoretical performance improvement and the translation effect of the model. After the student model training is finished, the teacher model is removed.
The student model obtained by training in this embodiment has a simplified model structure, and because the intermediate content and the final result output by the teacher model are used for supervision during training, the performance and effect of the model are preserved as far as possible even though the student model has fewer parameters. Because the model structure of the student model is simplified, the throughput when the model is deployed online is improved and the latency of the model is reduced, which in turn improves the user experience.
Example 3
The present embodiment provides an electronic device, which may be represented in the form of a computing device (for example, may be a server device), including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor executes the computer program to implement the knowledge-based distillation machine translation model training method provided in embodiment 1.
Fig. 4 shows a schematic diagram of a hardware structure of the present embodiment, and as shown in fig. 4, the electronic device 9 specifically includes:
at least one processor 91, at least one memory 92, and a bus 93 for connecting the various system components (including the processor 91 and the memory 92), wherein:
the bus 93 includes a data bus, an address bus, and a control bus.
Memory 92 includes volatile memory, such as Random Access Memory (RAM)921 and/or cache memory 922, and can further include Read Only Memory (ROM) 923.
Memory 92 also includes a program/utility 925 having a set (at least one) of program modules 924, such program modules 924 including, but not limited to: an operating system, one or more application programs, other program modules, and program data, each of which, or some combination thereof, may comprise an implementation of a network environment.
The processor 91 executes a computer program stored in the memory 92 to execute various functional applications and data processing, such as the machine translation model training method based on knowledge distillation provided in embodiment 1 of the present invention.
The electronic device 9 may further communicate with one or more external devices 94 (e.g., a keyboard, a pointing device, etc.). Such communication may be through an input/output (I/O) interface 95. Also, the electronic device 9 may communicate with one or more networks (e.g., a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network, such as the Internet) via the network adapter 96. The network adapter 96 communicates with the other modules of the electronic device 9 via the bus 93. It should be understood that although not shown in the figures, other hardware and/or software modules may be used in conjunction with the electronic device 9, including but not limited to: microcode, device drivers, redundant processors, external disk drive arrays, RAID (disk array) systems, tape drives, and data backup storage systems, etc.
It should be noted that although in the above detailed description several units/modules or sub-units/modules of the electronic device are mentioned, such a division is merely exemplary and not mandatory. Indeed, the features and functionality of two or more of the units/modules described above may be embodied in one unit/module, according to embodiments of the application. Conversely, the features and functions of one unit/module described above may be further divided into embodiments by a plurality of units/modules.
Example 4
The present embodiment provides a computer-readable storage medium, on which a computer program is stored, which when executed by a processor, implements the steps of the knowledge-based distillation machine translation model training method of embodiment 1.
More specific examples, among others, that the readable storage medium may employ may include, but are not limited to: a portable disk, a hard disk, random access memory, read only memory, erasable programmable read only memory, optical storage device, magnetic storage device, or any suitable combination of the foregoing.
In a possible implementation, the invention may also be implemented in the form of a program product comprising program code for causing a terminal device to perform the steps of implementing the knowledge-based distillation machine translation model training method of example 1, when the program product is run on the terminal device.
Where program code for carrying out the invention is written in any combination of one or more programming languages, the program code may be executed entirely on the user device, partly on the user device, as a stand-alone software package, partly on the user device and partly on a remote device or entirely on the remote device.
While specific embodiments of the invention have been described above, it will be appreciated by those skilled in the art that this is by way of example only, and that the scope of the invention is defined by the appended claims. Various changes and modifications to these embodiments may be made by those skilled in the art without departing from the spirit and scope of the invention, and these changes and modifications are within the scope of the invention.

Claims (14)

1. A knowledge distillation-based machine translation model training method is characterized by comprising the following steps:
acquiring a trained teacher model and an untrained student model, wherein the student model is obtained by simplifying some of the modules in the teacher model;
acquiring a sample data set, wherein the sample data set comprises a plurality of training corpora and labeled translation labels corresponding to the training corpora;
inputting the training corpus into the teacher model for processing to obtain intermediate content output by the simplified module in the teacher model and a final result output by the teacher model;
inputting the training corpus into the student model for processing to obtain intermediate content output by a simplified module in the student model and a final result output by the student model;
determining a model loss function according to a label translation label corresponding to the training corpus, the intermediate content output by the simplified module in the teacher model, the final result output by the teacher model, the intermediate content output by the simplified module in the student model and the final result output by the student model;
and performing iterative training on the student model according to the model loss function.
2. The knowledge distillation-based machine translation model training method according to claim 1, wherein the determining a model loss function according to the labeled translation tags corresponding to the training corpus, the intermediate content output by the simplified module in the teacher model, the final result output by the teacher model, the intermediate content output by the simplified module in the student model, and the final result output by the student model comprises:
determining a first loss function according to the intermediate content output by the simplified module in the teacher model and the intermediate content output by the simplified module in the student model;
determining a second loss function according to the labeled translation label corresponding to the training corpus and the final result output by the student model;
determining a third loss function according to the final result output by the teacher model and the final result output by the student model;
and determining the model loss function according to the first loss function, the second loss function and the third loss function.
3. The knowledge distillation-based machine translation model training method of claim 1, wherein the teacher model and the student model respectively comprise an embedding module, an encoding module, a decoding module and an output module.
4. The knowledge distillation-based machine translation model training method according to claim 3, wherein the embedding module, the encoding module and the output module of the student model have the same structure as those of the teacher model, the decoding module of the student model is obtained by simplifying the decoding module of the teacher model, and a full connection layer is arranged between the decoding module of the student model and the decoding module of the teacher model.
5. The knowledge-based distillation machine translation model training method of claim 1, wherein after acquiring the sample data set, the method further comprises: and preprocessing the training corpus.
6. The knowledge-based distillation machine translation model training method according to claim 5, wherein the preprocessing the corpus comprises:
converting characters in the training corpus into corresponding numerical values;
and dividing the training corpuses into different batches, and adjusting the training corpuses of each batch to be the same in length in a zero value filling mode.
7. A knowledge distillation-based machine translation model training device is characterized by comprising:
the model acquisition module is used for acquiring a trained teacher model and an untrained student model, the student model being obtained by simplifying some of the modules in the teacher model;
the sample acquisition module is used for acquiring a sample data set, wherein the sample data set comprises a plurality of training corpora and labeled translation labels corresponding to the training corpora;
the teacher model processing module is used for inputting the training corpora into the teacher model for processing to obtain intermediate contents output by the simplified module in the teacher model and a final result output by the teacher model;
the student model processing module is used for inputting the training corpus into the student model for processing to obtain intermediate content output by the simplified module in the student model and a final result output by the student model;
the model loss function determining module is used for determining a model loss function according to the labeled translation label corresponding to the training corpus, the intermediate content output by the simplified module in the teacher model, the final result output by the teacher model, the intermediate content output by the simplified module in the student model and the final result output by the student model;
and the model training module is used for carrying out iterative training on the student model according to the model loss function.
8. The knowledge distillation-based machine translation model training apparatus of claim 7, wherein the model loss function determining module comprises:
a first loss function determining unit, configured to determine a first loss function according to the intermediate content output by the simplified module in the teacher model and the intermediate content output by the simplified module in the student model;
a second loss function determining unit, configured to determine a second loss function according to the labeled translation label corresponding to the training corpus and the final result output by the student model;
a third loss function determining unit, configured to determine a third loss function according to the final result output by the teacher model and the final result output by the student model;
and a model loss function determining unit, configured to determine the model loss function according to the first loss function, the second loss function and the third loss function.
9. The knowledge distillation-based machine translation model training apparatus of claim 7, wherein the teacher model and the student model each comprise an embedding module, an encoding module, a decoding module and an output module.
10. The knowledge distillation-based machine translation model training apparatus according to claim 9, wherein the embedding module, the encoding module and the output module of the student model are identical in structure to those of the teacher model, the decoding module of the student model is obtained by simplifying the decoding module of the teacher model, and a fully connected layer is arranged between the decoding module of the student model and the decoding module of the teacher model.
11. The knowledge distillation-based machine translation model training apparatus of claim 7, further comprising: a preprocessing module, configured to preprocess the training corpus after the sample data set is acquired.
12. The knowledge distillation-based machine translation model training apparatus of claim 11, wherein the preprocessing module comprises:
a numerical value conversion unit, configured to convert the characters in the training corpus into corresponding numerical values;
and a length adjusting unit, configured to divide the training corpora into different batches and adjust the training corpora in each batch to the same length by zero-value padding.
13. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the steps of the method according to any of claims 1 to 6 are implemented when the computer program is executed by the processor.
14. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 6.
CN202010843014.5A 2020-08-20 2020-08-20 Knowledge distillation-based machine translation model training method, device, equipment and medium Active CN111950302B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010843014.5A CN111950302B (en) 2020-08-20 2020-08-20 Knowledge distillation-based machine translation model training method, device, equipment and medium

Publications (2)

Publication Number Publication Date
CN111950302A true CN111950302A (en) 2020-11-17
CN111950302B CN111950302B (en) 2023-11-10

Family

ID=73358463

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010843014.5A Active CN111950302B (en) 2020-08-20 2020-08-20 Knowledge distillation-based machine translation model training method, device, equipment and medium

Country Status (1)

Country Link
CN (1) CN111950302B (en)

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170031901A1 (en) * 2015-07-30 2017-02-02 Alibaba Group Holding Limited Method and Device for Machine Translation
WO2018126213A1 (en) * 2016-12-30 2018-07-05 Google Llc Multi-task learning using knowledge distillation
CN110506279A (en) * 2017-04-14 2019-11-26 易享信息技术有限公司 Using the neural machine translation of hidden tree attention
CN108090050A (en) * 2017-11-08 2018-05-29 江苏名通信息科技有限公司 Game translation system based on deep neural network
CN110059744A (en) * 2019-04-16 2019-07-26 腾讯科技(深圳)有限公司 Method, the method for image procossing, equipment and the storage medium of training neural network
CN110765966A (en) * 2019-10-30 2020-02-07 哈尔滨工业大学 One-stage automatic recognition and translation method for handwritten characters
CN110765791A (en) * 2019-11-01 2020-02-07 清华大学 Automatic post-editing method and device for machine translation
CN111382582A (en) * 2020-01-21 2020-07-07 沈阳雅译网络技术有限公司 Neural machine translation decoding acceleration method based on non-autoregressive

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
GEOFFREY HINTON et al.: "Distilling the Knowledge in a Neural Network", published online at HTTPS://ARXIV.ORG/ABS/1503.02531, pages 1 - 9 *
ZHIQING SUN et al.: "Mobilebert: Task-agnostic compression of bert by progressive knowledge transfer", ICLR 2020 Conference, pages 1 - 26 *
LIAO Shenglan et al.: "Intent Classification Method Based on BERT Model and Knowledge Distillation", Computer Engineering, vol. 47, no. 5, pages 73 - 79 *
LI Xiang et al.: "Improving the Translation Quality of Neural Machine Translation Compression Models Using Monolingual Data", Journal of Chinese Information Processing, vol. 33, no. 7, pages 46 - 55 *

Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112597778A (en) * 2020-12-14 2021-04-02 华为技术有限公司 Training method of translation model, translation method and translation equipment
CN112541122A (en) * 2020-12-23 2021-03-23 北京百度网讯科技有限公司 Recommendation model training method and device, electronic equipment and storage medium
CN112784999A (en) * 2021-01-28 2021-05-11 开放智能机器(上海)有限公司 Mobile-v 1 knowledge distillation method based on attention mechanism, memory and terminal equipment
CN113011202B (en) * 2021-03-23 2023-07-25 中国科学院自动化研究所 End-to-end image text translation method, system and device based on multitasking training
CN113011202A (en) * 2021-03-23 2021-06-22 中国科学院自动化研究所 End-to-end image text translation method, system and device based on multi-task training
CN113160041A (en) * 2021-05-07 2021-07-23 深圳追一科技有限公司 Model training method and model training device
CN113160041B (en) * 2021-05-07 2024-02-23 深圳追一科技有限公司 Model training method and model training device
CN113435208A (en) * 2021-06-15 2021-09-24 北京百度网讯科技有限公司 Student model training method and device and electronic equipment
CN113435208B (en) * 2021-06-15 2023-08-25 北京百度网讯科技有限公司 Training method and device for student model and electronic equipment
CN113642605A (en) * 2021-07-09 2021-11-12 北京百度网讯科技有限公司 Model distillation method, device, electronic equipment and storage medium
CN113505614A (en) * 2021-07-29 2021-10-15 沈阳雅译网络技术有限公司 Small model training method for small CPU equipment
CN113505615A (en) * 2021-07-29 2021-10-15 沈阳雅译网络技术有限公司 Decoding acceleration method of small CPU (central processing unit) equipment-oriented neural machine translation system
WO2023212997A1 (en) * 2022-05-05 2023-11-09 五邑大学 Knowledge distillation based neural network training method, device, and storage medium
CN115438678A (en) * 2022-11-08 2022-12-06 苏州浪潮智能科技有限公司 Machine translation method, device, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN111950302B (en) 2023-11-10

Similar Documents

Publication Publication Date Title
CN111950302B (en) Knowledge distillation-based machine translation model training method, device, equipment and medium
CN112487182B (en) Training method of text processing model, text processing method and device
CN110069790B (en) Machine translation system and method for contrasting original text through translated text retranslation
CN106484682A (en) Based on the machine translation method of statistics, device and electronic equipment
WO2023160472A1 (en) Model training method and related device
CN109582952B (en) Poetry generation method, poetry generation device, computer equipment and medium
CN112417092B (en) Intelligent text automatic generation system based on deep learning and implementation method thereof
CN113204633B (en) Semantic matching distillation method and device
CN112560456B (en) Method and system for generating generated abstract based on improved neural network
CN115268868B (en) Intelligent source code conversion method based on supervised learning
CN113505193A (en) Data processing method and related equipment
CN114398899A (en) Training method and device for pre-training language model, computer equipment and medium
CN113821635A (en) Text abstract generation method and system for financial field
CN116129902A (en) Cross-modal alignment-based voice translation method and system
CN112528598B (en) Automatic text abstract evaluation method based on pre-training language model and information theory
CN112765996B (en) Middle-heading machine translation method based on reinforcement learning and machine translation quality evaluation
CN116432637A (en) Multi-granularity extraction-generation hybrid abstract method based on reinforcement learning
CN115730590A (en) Intention recognition method and related equipment
CN110888944A (en) Attention convolution neural network entity relation extraction method based on multiple convolution window sizes
CN114239575B (en) Statement analysis model construction method, statement analysis method, device, medium and computing equipment
CN112287697A (en) Method for accelerating running speed of translation software in small intelligent mobile equipment
CN114238579B (en) Text analysis method, text analysis device, text analysis medium and computing equipment
CN117172232B (en) Audit report generation method, audit report generation device, audit report generation equipment and audit report storage medium
CN112685543B (en) Method and device for answering questions based on text
Peng Design and Construction of Machine Translation System Based on RNN Model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant