CN113033218B - Machine translation quality evaluation method based on neural network structure search - Google Patents


Info

Publication number
CN113033218B
Authority
CN
China
Prior art keywords
predictor
model
machine translation
search
fitness
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110414498.6A
Other languages
Chinese (zh)
Other versions
CN113033218A (en)
Inventor
杜权
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenyang Yayi Network Technology Co ltd
Original Assignee
Shenyang Yayi Network Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenyang Yayi Network Technology Co ltd filed Critical Shenyang Yayi Network Technology Co ltd
Priority to CN202110414498.6A priority Critical patent/CN113033218B/en
Publication of CN113033218A publication Critical patent/CN113033218A/en
Application granted granted Critical
Publication of CN113033218B publication Critical patent/CN113033218B/en


Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 - Handling natural language data
    • G06F40/40 - Processing or translation of natural language
    • G06F40/51 - Translation evaluation
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 - Handling natural language data
    • G06F40/40 - Processing or translation of natural language
    • G06F40/58 - Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods
    • G06N3/086 - Learning methods using evolutionary algorithms, e.g. genetic algorithms or genetic programming

Abstract

The invention discloses a machine translation quality evaluation method based on neural network structure search, which comprises the following steps: acquiring training data of a WMT quality assessment task and training data of a WMT machine translation task; determining the predictor component of a predictor-evaluator model as the part to which structure search is applied, and choosing a search strategy based on an evolutionary algorithm for pre-searching; constructing a classical predictor-evaluator model, using a Transformer neural machine translation model to warm-start the initial population of the evolutionary search strategy; pre-searching with the evolutionary search strategy; fine-tuning, training, and optimizing the network structure of the predictor part; and performing the word-level quality assessment task with the complete model, using its accuracy on the test set to characterize model performance. The invention uses network structure search techniques to tailor the network structure of the predictor component to the quality assessment task and its data characteristics.

Description

Machine translation quality evaluation method based on neural network structure search
Technical Field
The invention relates to a machine translation quality evaluation technology, in particular to a machine translation quality evaluation method based on neural network structure search.
Background
In recent years, with the widespread adoption of deep learning techniques, neural-network-based approaches have achieved remarkable success in many fields. The performance of such approaches on a specific task often depends on the structure of the neural network, so most research effort has focused on designing better network structures. As research in each field progresses, ever better neural network structures are proposed and the structures applied to each task grow increasingly complex, which means the trial-and-error and time costs of designing network structures by hand are becoming harder to bear; structure search technology arose in response.
Structure search technology automatically obtains a neural network structure with better performance and stronger generalization by designing an economical and efficient search method over a given search space, aiming to relieve researchers of a large amount of mental labor. The mainstream approaches to structure search currently include gradient-based, evolutionary-algorithm-based, reinforcement-learning-based, and Bayesian-optimization-based methods.
Translation quality assessment is an important area in machine translation: it judges translation quality without relying on reference translations, covering tasks such as detecting word errors and scoring sentences or documents. The most classical structure for this task is the predictor-evaluator model, in which the predictor network responsible for feature extraction is often complex. Because quality-assessment data are scarce, researchers often use a trained translation model or a pre-trained model directly as the predictor. The evaluator's network structure is very simple, often just a bidirectional RNN.
Since it cannot be guaranteed that a translation model or pre-trained model is well suited to quality assessment, the present invention tailors the network structure of the predictor by means of neural network structure search. Current network structure search methods are mostly applied to lighter-weight tasks such as image classification and language modeling, because structure search places extreme demands on computing power, and lightweight searches are more feasible on existing equipment. For a task like quality assessment, where the neural network structure itself is relatively complex, applying structure search is correspondingly difficult.
Disclosure of Invention
Aiming at the situation that the network structure of the predictor component in the classical predictor-evaluator model is not fully suited to the quality evaluation task, the invention provides a machine translation quality evaluation method based on neural network structure search: the network structure of the predictor component is found with a network structure search technique, further improving model performance.
To this end, the technical scheme adopted by the invention is as follows:
The invention provides a machine translation quality evaluation method based on neural network structure search, which comprises the following steps:
1) Acquiring training data of a WMT quality assessment task and training data of a WMT machine translation task;
2) Determining the predictor component of a predictor-evaluator model as the part to which the network structure search technique is applied, determining a search space from the component's structure and functional characteristics, and determining that a search strategy based on an evolutionary algorithm will be used for pre-searching;
3) Constructing a classical predictor-evaluator model, in which the evaluator part directly uses the bidirectional GRU of the traditional model and the predictor part is constructed according to the search space and search strategy, with a Transformer neural machine translation model used to warm-start the initial population of the evolutionary search strategy;
4) Taking neural machine translation as the target task and machine translation bilingual data as training data, performing a pre-search with the evolutionary search strategy;
5) Fine-tuning the network structure of the predictor part with the data of the WMT quality assessment task;
6) Training and optimizing the searched predictor with machine translation bilingual training data until convergence, then continuing to train and optimize the overall predictor-evaluator model with the WMT quality evaluation data until convergence;
7) Performing the word-level quality assessment task with the fully trained model, using its accuracy on the test set to characterize model performance.
Step 2) selects the structural space near the Transformer model as the search space, modified on the basis of the NASNet search space. The search space consists of two groups of identical stackable computing units representing the encoder and the decoder respectively; the computing units of the different parts are cascades of different numbers of NASNet-style blocks, and each block comprises a left branch and a right branch, which respectively receive two hidden-state inputs and combine them into a new hidden state as the block's output. What the structure search actually explores is the combination of operations in the left and right branches, covering the input, normalization, layer structure, output dimension, activation function, combination function, and number of computing units. A search strategy based on an evolutionary algorithm then finds the predictor's network structure within this search space: all candidate structures are regarded as a population in the biological sense, each candidate structure is an individual, "survival of the fittest" during population evolution is the process of selecting candidate structures, and how "good" an individual is, is measured by its fitness.
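The block-based search space described above can be sketched as a small data structure. The field names and option lists below are illustrative assumptions for demonstration, not the patent's exact vocabulary of operations:

```python
import random

# Illustrative option lists for one branch of a NASNet-style block; the
# concrete choices (layer types, dimensions, activations) are assumptions.
BRANCH_OPTIONS = {
    "input":      ["hidden_0", "hidden_1"],               # which hidden state feeds the branch
    "norm":       ["layer_norm", "none"],
    "layer":      ["self_attention", "ffn", "conv", "identity"],
    "output_dim": [256, 512, 1024],
    "activation": ["relu", "gelu", "none"],
}
COMBINER_OPTIONS = ["add", "concat", "multiply"]          # how the two branches are merged

def random_block(rng: random.Random) -> dict:
    """Sample one block: a left branch, a right branch, and a combiner."""
    def branch():
        return {k: rng.choice(v) for k, v in BRANCH_OPTIONS.items()}
    return {"left": branch(), "right": branch(), "combiner": rng.choice(COMBINER_OPTIONS)}

def random_candidate(rng: random.Random, enc_blocks: int = 3, dec_blocks: int = 3) -> dict:
    """A candidate structure: two groups of stacked units (encoder and decoder)."""
    return {
        "encoder": [random_block(rng) for _ in range(enc_blocks)],
        "decoder": [random_block(rng) for _ in range(dec_blocks)],
    }

rng = random.Random(0)
candidate = random_candidate(rng)
```

Encoding each candidate as plain data like this is what lets the evolutionary strategy sample, compare, and mutate structures as "individuals" of a population.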
Step 3) builds the predictor-evaluator model, searches the internal structure of the predictor with the network structure search technique, and keeps the internal structure of the evaluator part as a classical bidirectional recurrent neural network, specifically a bidirectional GRU network. In the search over predictor structures, the population is initially warm-started from a Transformer neural machine translation model, on the basis of which predictor structures better than the existing Transformer model are sought.
In step 4), inspired by pre-training methods, the neural machine translation task is taken as the target task and the network structure of the predictor component is pre-searched while making full use of machine translation bilingual data. The evolutionary algorithm adopted in the pre-search is a progressive dynamic hurdles algorithm built on tournament-selection evolution, as follows:
401) Randomly sample N individuals from the original population (obtained by initializing from the Transformer model) as a sub-population, evaluate each individual's loss on the validation set as its fitness, and select the individual with the highest fitness to mutate, i.e., change some components of the network model into other components, generating m sub-models;
402) After training the m sub-models generated in 401) for s₀ steps, evaluate their fitness and compute the mean fitness h₀ of the whole population at that time;
403) Again randomly sample N individuals from the current population as a sub-population, evaluate each individual's fitness, and select the fittest individual to mutate, generating m sub-models; after training these m sub-models for s₀ steps, evaluate their fitness; those with fitness greater than h₀ continue training for s₁ steps, after which their fitness is evaluated again and the mean fitness h₁ of the whole population at that time is computed;
404) Again randomly sample N individuals from the current population as a sub-population, evaluate each individual's fitness, and select the fittest individual to mutate, generating m sub-models; after training these m sub-models for s₀ steps, evaluate their fitness; those with fitness greater than h₀ continue training for s₁ steps and are evaluated again; those with fitness greater than h₁ continue training for s₂ steps, after which their fitness is evaluated and the mean fitness h₂ of the whole population at that time is computed;
405) Continue in the same way until the training steps of all individuals in the population reach a specified value.
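Steps 401)-405) can be sketched as the following loop. This is a simplified sketch, not the patent's implementation: the population representation, the mutation operator, and the fitness function (supplied by the caller, with higher values taken as better) are all illustrative assumptions.

```python
import random

def mutate(genome, rng):
    """Toy mutation for the sketch: flip one element of a list-of-bits genome."""
    out = list(genome)
    out[rng.randrange(len(out))] ^= 1
    return out

def progressive_dynamic_hurdles(population, train_and_eval, rng,
                                N=5, m=3, steps=(100, 200, 400), max_rounds=3):
    """Tournament-selection evolution with progressive dynamic hurdles.

    population     : list of (genome, fitness) pairs; higher fitness is better
    train_and_eval : fn(genome, extra_steps) -> fitness after further training
    steps          : per-stage training budgets (s0, s1, s2, ...)
    """
    hurdles = []  # h0, h1, ... : mean population fitness after each round
    for _ in range(max_rounds):
        # 401) sample a sub-population, pick its fittest individual, mutate it m times
        sample = rng.sample(population, min(N, len(population)))
        parent = max(sample, key=lambda ind: ind[1])[0]
        children = [mutate(parent, rng) for _ in range(m)]
        # 402)-404) each child must clear every existing hurdle to earn more training
        for child in children:
            fitness = train_and_eval(child, steps[0])
            for stage, hurdle in enumerate(hurdles, start=1):
                if fitness <= hurdle:      # unpromising: stop training early
                    break
                fitness = train_and_eval(child, steps[stage])
            population.append((child, fitness))
        hurdles.append(sum(f for _, f in population) / len(population))
    return max(population, key=lambda ind: ind[1])
```

The early `break` is the resource-allocation mechanism the method relies on: a sub-model below the current hurdle stops training immediately, so its remaining budget effectively goes to better-performing sub-models.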
In step 5), the structure found in step 4) is used as the predictor, and the training data of the quality assessment task is used, with the evolutionary algorithm of step 4) applied to the whole predictor-evaluator model, to fine-tune the structure of the predictor.
In step 6), after the model parameters of the predictor and evaluator components are re-initialized, the predictor's parameters are trained on bilingual data from the machine translation task, with neural machine translation as the target task, until convergence; the model parameters of the predictor and evaluator are then trained and optimized together on the quality evaluation data until convergence. Specifically:
601) Build the predictor component with the network structure obtained after the fine-tuning of step 5);
602) After re-initializing the parameters of the predictor component, train them conventionally on bilingual data from the machine translation task, with neural machine translation as the target task, until the model's loss converges;
603) Build the evaluator component with a bidirectional GRU network and combine it with the pre-trained predictor component;
604) Re-initialize the parameters of the evaluator, then train the model parameters of the predictor and evaluator together on the quality assessment training data until the model's loss on the training set converges.
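Steps 601)-604) amount to a two-stage training schedule: pre-train the predictor on MT data, then train the combined model on quality-assessment data. The sketch below shows only the control flow; `ToyComponent` and its decaying loss are placeholder assumptions standing in for real neural modules:

```python
class ToyComponent:
    """Stand-in for the predictor / evaluator networks. In the real system
    these are neural modules; here the loss simply decays with training steps
    (an assumption so the control flow can run end to end)."""
    def __init__(self):
        self.steps = 0
    def init_params(self):
        self.steps = 0
    def step(self, _batch):
        self.steps += 1
        return 1.0 / self.steps  # toy monotonically decreasing loss

def train_until_converged(step_fn, patience=3, tol=1e-4, max_steps=10_000):
    """Run step_fn() (which returns a loss) until it stops improving by tol."""
    best, stale = float("inf"), 0
    for _ in range(max_steps):
        loss = step_fn()
        if loss < best - tol:
            best, stale = loss, 0
        else:
            stale += 1
            if stale >= patience:
                break
    return best

def two_stage_training(predictor, evaluator):
    """601)-604): pre-train the predictor on MT data, then train the full
    predictor-evaluator model on quality-assessment data."""
    predictor.init_params()                               # 602) re-initialize
    mt_loss = train_until_converged(lambda: predictor.step("mt_batch"))
    evaluator.init_params()                               # 604) re-initialize evaluator only
    qe_loss = train_until_converged(
        lambda: (predictor.step("qe_batch") + evaluator.step("qe_batch")) / 2)
    return mt_loss, qe_loss
```

Note that the second stage updates both components together, while only the evaluator's parameters are re-initialized before it, matching the ordering of 602) and 604).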
The invention has the following beneficial effects and advantages:
1. The method moves beyond the traditional practice of directly using a translation model or pre-trained model as the predictor, and uses network structure search to tailor a network structure for the predictor component to the task and data characteristics of quality assessment.
2. The invention overcomes the excessive training time of the network structure search process by adopting the progressive dynamic hurdles method, stopping the training of unpromising models early so that more computing resources are allocated to the sub-models that currently perform best.
3. The invention improves performance on the quality evaluation task through automatic design of the model structure.
Drawings
FIG. 1 is a diagram of a search space involved in a network structure search process in accordance with the present invention;
FIG. 2 is a schematic diagram of an overall model of a predictor-evaluator in accordance with the present invention;
fig. 3 is a schematic diagram of a network structure pre-search and fine tuning process according to the present invention.
Detailed Description
The invention is further elucidated below in connection with the drawings of the specification.
The invention relates to a machine translation quality evaluation method based on neural network structure search. It mainly searches the network structure of the predictor component of the classical predictor-evaluator model; during the search, the evaluator part uses a bidirectional recurrent network. Because quality assessment training data are scarce, neural machine translation is used as the target task: the network structure of the predictor component is pre-searched on bilingual machine translation data, and the pre-searched structure is then fine-tuned on the quality assessment training data. The entire search uses a search strategy based on an evolutionary algorithm.
The invention discloses a machine translation quality evaluation method based on neural network structure search, which comprises the following steps:
1) Acquiring the English-German training data of the WMT2020 quality assessment task and the English-German training data of the WMT2014 machine translation task;
2) Determining the predictor component of the predictor-evaluator model as the part to which the network structure search technique is applied, determining a search space from the component's structure and functional characteristics, and determining that a search strategy based on an evolutionary algorithm will be used for pre-searching;
3) Constructing a classical predictor-evaluator model, in which the evaluator part directly uses the bidirectional GRU of the traditional model and the predictor part is constructed according to the search space and search strategy, with a Transformer neural machine translation model used to warm-start the initial population of the evolutionary search strategy;
4) Taking neural machine translation as a target task, taking machine translation bilingual data as training data, and performing pre-search by using a search strategy based on an evolutionary algorithm;
5) Fine-tuning the network structure of the predictor part using the labeled quality evaluation data;
6) Training and optimizing the searched predictor with machine translation bilingual training data until convergence, then continuing to train and optimize the overall predictor-evaluator model with the quality evaluation training data until convergence;
7) Performing the word-level quality assessment task with the fully trained model, using its accuracy on the test set to characterize model performance.
In step 2), the structural space around the Transformer model is selected empirically as the search space; more specifically, a slight modification is made on the basis of the NASNet search space. As shown in fig. 1, the search space consists of two identical stackable computing units representing the encoder and the decoder respectively; the computing units of the different parts are cascades of different numbers of NASNet-style blocks, and each block comprises a left branch and a right branch, which respectively receive two hidden-state inputs and combine them into a new hidden state as the block's output. What the structure search actually explores is the combination of operations in the left and right branches, covering the input, normalization, layer structure, output dimension, activation function, combination function, and number of computing units. A search strategy based on an evolutionary algorithm then finds the predictor's network structure within this search space: all candidate structures are regarded as a population in the biological sense, each candidate structure is an individual, "survival of the fittest" during population evolution is the process of selecting candidate structures, and how "good" an individual is, is measured by its fitness (the candidate structure's loss on the validation set after it has been trained for a certain number of steps).
In step 3), the predictor-evaluator model is built; its framework is shown in fig. 2. Since the predictor responsible for feature extraction has the greatest influence on the performance of the whole model, the network structure search is focused on the internal structure of the predictor, while the internal structure of the evaluator part is kept as a classical bidirectional recurrent neural network, specifically a bidirectional GRU network. In the search over predictor structures, the population is initially warm-started from a Transformer neural machine translation model, on the basis of which predictor structures better than the existing Transformer model are sought.
In step 4), since the scarce quality evaluation data are insufficient to complete the network structure search, and inspired by pre-training methods, the neural machine translation task is taken as the target task and the network structure of the predictor component is pre-searched while making full use of machine translation bilingual data; this corresponds to the pre-search process in the left part of fig. 3. The evolutionary algorithm adopted in the pre-search dynamically allocates resources to the more promising network structures according to their fitness, on the basis of tournament-selection evolution; this is the progressive dynamic hurdles algorithm, which works as follows:
401) Randomly sample N individuals from the original population initialized from the Transformer model as a sub-population, evaluate each individual's loss on the validation set as its fitness, and select the individual with the highest fitness to mutate, i.e., change some components of the network model into other components, generating m sub-models.
402) After training the m sub-models generated in 401) for s₀ steps, evaluate their fitness and compute the mean fitness h₀ of the whole population at that time.
403) Again randomly sample N individuals from the current population as a sub-population, evaluate each individual's fitness, and select the fittest individual to mutate, generating m sub-models. After training these m sub-models for s₀ steps, evaluate their fitness; those with fitness greater than h₀ continue training for s₁ steps, after which their fitness is evaluated again and the mean fitness h₁ of the whole population at that time is computed.
404) Again randomly sample N individuals from the current population as a sub-population, evaluate each individual's fitness, and select the fittest individual to mutate, generating m sub-models. After training these m sub-models for s₀ steps, evaluate their fitness; those with fitness greater than h₀ continue training for s₁ steps and are evaluated again; those with fitness greater than h₁ continue training for s₂ steps, after which their fitness is evaluated and the mean fitness h₂ of the whole population at that time is computed.
405) Continue in the same way until the training steps of all individuals in the population reach a specified value.
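The mutation used in step 401), changing one component of a candidate structure into another, can be sketched over a dictionary-encoded branch. The field names and option lists here are illustrative assumptions, not the patent's exact operation set:

```python
import random

# Illustrative per-field options for one branch of a block (assumption).
OPTIONS = {
    "norm":       ["layer_norm", "none"],
    "layer":      ["self_attention", "ffn", "conv", "identity"],
    "output_dim": [256, 512, 1024],
    "activation": ["relu", "gelu", "none"],
}

def mutate_branch(branch: dict, rng: random.Random) -> dict:
    """Return a copy of `branch` with exactly one field changed to a
    different allowed value (the 'change one component' mutation)."""
    field = rng.choice(list(OPTIONS))
    alternatives = [v for v in OPTIONS[field] if v != branch[field]]
    return {**branch, field: rng.choice(alternatives)}

rng = random.Random(0)
parent = {"norm": "layer_norm", "layer": "self_attention",
          "output_dim": 512, "activation": "relu"}
child = mutate_branch(parent, rng)
```

Keeping the mutation to a single component per child keeps offspring close to a fit parent, which is what makes the warm-started, tournament-selected population converge instead of degenerating into random search.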
In step 5), the structure found in step 4) is used as the predictor, and the evolutionary algorithm of step 4) continues to be applied to the whole predictor-evaluator model, using the training data of the quality evaluation task, to fine-tune the structure of the predictor; this corresponds to the fine-tuning process in the right part of fig. 3.
In step 6), after the model parameters of the predictor and evaluator components are re-initialized, the predictor's parameters are trained on bilingual data from the machine translation task, with neural machine translation as the target task, until convergence; the model parameters of the predictor and evaluator are then trained and optimized together on the quality assessment data until convergence. Specifically:
601) Build the predictor component with the network structure obtained after the fine-tuning of step 5);
602) After re-initializing the parameters of the predictor component, train them conventionally on bilingual data from the machine translation task, with neural machine translation as the target task, until the model's loss converges;
603) Build the evaluator component with a bidirectional GRU network and combine it with the pre-trained predictor component;
604) Re-initialize the parameters of the evaluator, then train the model parameters of the predictor and evaluator together on the quality assessment training data until the model's loss on the training set converges.
The word-level quality assessment task is presented here as an example. Suppose the test data is {"Draw or select a line", "zeichen oder … Sie tein link aus"}, where "Draw or select a line" is the English source-language sentence and "zeichen oder … Sie tein link aus" is the German translation provided by the translation system. The word-level quality assessment task must identify which words in the source-language sentence caused translation errors, which words in the translation are mistranslated, and where words were omitted in translating the sentence. The two sentences in the test data are fed to the source-language end and the translation end of the model in fig. 2, respectively. The predictor component extracts highly abstract quality features from the source language and the translation, features that reflect the relation between them and the quality of the translation, and passes these features to the evaluator, which makes predictions from them. The output of this part comprises three kinds of tags: Source tags (reflecting whether each word in the source-language sentence is correctly translated), MT tags (reflecting whether each translated word in the translation is correct), and Gap tags (reflecting whether a word is omitted at the corresponding position in the translation). In this example, the Source tags are "BAD BAD OK BAD BAD OK", the MT tags are "OK OK OK OK OK BAD OK OK", and the Gap tags are "OK BAD OK OK OK OK OK OK OK".
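Given gold and predicted tag sequences like those in the example above, the word-level accuracy used in step 7) is simply the fraction of matching OK/BAD tags. A minimal sketch, with a hypothetical model output for illustration:

```python
def tag_accuracy(gold: list, pred: list) -> float:
    """Word-level QE accuracy: fraction of positions where the predicted
    OK/BAD tag matches the gold tag."""
    if len(gold) != len(pred):
        raise ValueError("tag sequences must be the same length")
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)

gold = "OK OK OK OK OK BAD OK OK".split()   # MT tags from the example
pred = "OK OK BAD OK OK BAD OK OK".split()  # a hypothetical model output
acc = tag_accuracy(gold, pred)              # 7 of 8 tags match: 0.875
```

In practice this accuracy would be computed separately over the Source, MT, and Gap tag sequences of every sentence pair in the test set and then aggregated to characterize model performance.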

Claims (6)

1. A machine translation quality evaluation method based on neural network structure search is characterized by comprising the following steps:
1) Acquiring training data of a WMT quality assessment task and training data of a WMT machine translation task;
2) Determining the predictor component of a predictor-evaluator model as the part to which the network structure search technique is applied, determining a search space from the component's structure and functional characteristics, and determining that a search strategy based on an evolutionary algorithm will be used for pre-searching;
3) Constructing a classical predictor-evaluator model, in which the evaluator part directly uses the bidirectional GRU of the traditional model and the predictor part is constructed according to the search space and search strategy, with a Transformer neural machine translation model used to warm-start the initial population of the evolutionary search strategy;
4) Taking neural machine translation as the target task and machine translation bilingual data as training data, pre-searching the network structure of the predictor part with the evolutionary search strategy;
5) Fine-tuning the network structure of the predictor part with the data of the WMT quality assessment task;
6) Training and optimizing the searched predictor with machine translation bilingual training data until convergence, then continuing to train and optimize the overall predictor-evaluator model with the WMT quality evaluation data until convergence;
7) Performing the word-level quality assessment task with the fully trained model, using its accuracy on the test set to characterize model performance.
2. The machine translation quality evaluation method based on neural network structure search according to claim 1, wherein: step 2) selects the structural space near the Transformer model as the search space, modified on the basis of the NASNet search space; the search space consists of two groups of identical stackable computing units representing the encoder and the decoder respectively; the computing units of the different parts are cascades of different numbers of NASNet-style blocks, and each block comprises a left branch and a right branch, which respectively receive two hidden-state inputs and combine them into a new hidden state as the block's output; what the structure search actually explores is the combination of operations in the left and right branches, covering the input, normalization, layer structure, output dimension, activation function, combination function, and number of computing units; a search strategy based on an evolutionary algorithm then finds the predictor's network structure within this search space, that is, all candidate structures are regarded as a population in the biological sense, each candidate structure is an individual, "survival of the fittest" during population evolution is the process of selecting candidate structures, and how "good" an individual is, is measured by its fitness.
3. The machine translation quality evaluation method based on neural network structure search according to claim 1, wherein: step 3) builds the predictor-evaluator model, searches the internal structure of the predictor with the network structure search technique, and keeps the internal structure of the evaluator part as a classical bidirectional recurrent neural network, specifically a bidirectional GRU network; in the search over predictor structures, the population is initially warm-started from a Transformer neural machine translation model, on the basis of which predictor structures better than the existing Transformer model are sought.
4. The machine translation quality evaluation method based on neural network structure search according to claim 1, wherein: in step 4), inspired by the pre-training method, the neural machine translation task is taken as the target task and the network structure of the predictor component is pre-searched, making full use of machine translation bilingual data; the evolutionary algorithm adopted in the pre-search is a progressive dynamic hurdles algorithm built on a tournament-selection evolutionary algorithm, specifically:
401) Randomly sample N individuals from the original population obtained by initialization from the Transformer model to serve as a sub-population, evaluate each individual's loss on the check set as its fitness, and select the individual with the highest fitness for mutation, i.e. changing some components of the network model into other components, generating m child models;
402) Train the m child models generated in 401) for s0 steps, then evaluate the fitness of the m child models and compute the mean fitness h0 of the whole population at this point;
403) Again randomly sample N individuals from the current population as a sub-population, evaluate the fitness of each individual, and select the individual with the highest fitness for mutation, generating m child models; train these m child models for s0 steps and evaluate their fitness; those child models whose fitness exceeds h0 continue training for s1 further steps, after which their fitness is evaluated again and the mean fitness h1 of the whole population at this point is computed;
404) Again randomly sample N individuals from the current population as a sub-population, evaluate the fitness of each individual, and select the individual with the highest fitness for mutation, generating m child models; train these m child models for s0 steps and evaluate their fitness; those whose fitness exceeds h0 continue training for s1 further steps and are evaluated again; those whose fitness then exceeds h1 continue training for s2 further steps, after which their fitness is evaluated and the mean fitness h2 of the whole population at this point is computed;
405) And so on, until the number of training steps of every individual in the population reaches the specified value.
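A runnable toy version of steps 401)-405) is sketched below; the training dynamics, fitness function, and the concrete step budgets s0, s1, s2 are illustrative assumptions, with only the hurdle logic taken from the claim:

```python
import random

def evolve_with_hurdles(population, train, fitness, step_budgets,
                        n_sample, m_children, mutate, rng):
    """Progressive dynamic hurdles: a child earns extra training steps only by
    beating the mean population fitness recorded at each earlier hurdle."""
    hurdles = []  # h0, h1, ... : mean population fitness after each round
    for _round in range(len(step_budgets)):
        # Tournament selection: mutate the fittest of a random sub-population.
        sub = rng.sample(population, n_sample)
        parent = max(sub, key=fitness)
        children = [mutate(parent, rng) for _ in range(m_children)]
        for child in children:
            train(child, step_budgets[0])          # every child gets s0 steps
            # Keep training only while the child clears each existing hurdle.
            for h, s in zip(hurdles, step_budgets[1:]):
                if fitness(child) <= h:
                    break
                train(child, s)
        population.extend(children)
        hurdles.append(sum(fitness(ind) for ind in population) / len(population))
    return population, hurdles

rng = random.Random(1)

def make_ind():
    return {"quality": rng.random(), "steps": 0}

def train(ind, steps):
    """Toy stand-in for gradient training: more steps, higher fitness."""
    ind["steps"] += steps
    ind["quality"] += 0.01 * steps

fitness = lambda ind: ind["quality"]
mutate = lambda parent, r: {"quality": parent["quality"] + r.uniform(-0.1, 0.1),
                            "steps": 0}

population = [make_ind() for _ in range(8)]
population, hurdles = evolve_with_hurdles(
    population, train, fitness, step_budgets=[10, 20, 40],
    n_sample=4, m_children=2, mutate=mutate, rng=rng)
```

The design intent is that weak children are discarded cheaply after s0 steps, while promising ones receive the larger budgets s1 and s2 before the next hurdle is set.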
5. The machine translation quality evaluation method based on neural network structure search according to claim 1, wherein: in step 5), the structure searched in step 4) serves as the predictor, and the training data of the quality evaluation task is used to fine-tune the predictor structure by applying the evolutionary algorithm of step 4) to the whole predictor-evaluator model.
6. The machine translation quality evaluation method based on neural network structure search according to claim 1, wherein: in step 6), after the model parameters of the predictor and evaluator components are re-initialized, the model parameters of the predictor are first trained to convergence on bilingual data from the machine translation task, with neural machine translation as the target task; the model parameters of the predictor and the evaluator are then trained and optimized jointly on the quality evaluation data until convergence, specifically:
601) Build the predictor component with the network structure obtained after the fine-tuning in step 5);
602) After re-initializing the parameters of the predictor component, perform conventional training of the predictor parameters on bilingual data from the machine translation task, with neural machine translation as the target task, until the model loss converges;
603) Build the evaluator component with a bidirectional GRU network and combine it with the pre-trained predictor component;
604) Re-initialize the parameters of the evaluator, then train the model parameters of the predictor and the evaluator simultaneously on the quality evaluation training data until the model loss on the training set converges.
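The two-stage schedule of 601)-604) can be sketched as a training skeleton; the component classes, task labels, and fixed-epoch "convergence" below are illustrative stand-ins, not the patent's implementation:

```python
class Component:
    """Stand-in for a neural network component with re-initializable parameters."""
    def __init__(self, name):
        self.name = name
        self.initialized = False
        self.training_log = []   # records (task, data, epoch) updates

    def init_params(self):
        self.initialized = True

def train_until_converged(components, data, task, max_epochs=3):
    """Toy loop: record which components were updated on which data/task.
    A fixed epoch count stands in for 'until the loss converges'."""
    for epoch in range(max_epochs):
        for c in components:
            c.training_log.append((task, data, epoch))

# 601) Build the predictor from the fine-tuned searched structure.
predictor = Component("searched_predictor")
# 602) Re-initialize and pre-train the predictor alone on bilingual MT data.
predictor.init_params()
train_until_converged([predictor], data="bilingual", task="machine_translation")
# 603) Build the bidirectional-GRU evaluator on top of the predictor.
evaluator = Component("bigru_evaluator")
# 604) Re-initialize the evaluator, then train both jointly on QE data.
evaluator.init_params()
train_until_converged([predictor, evaluator], data="quality_estimation", task="qe")
```

Note the asymmetry the claim specifies: the predictor sees both the machine translation pre-training stage and the joint quality-estimation stage, while the evaluator is only ever updated in the joint stage.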
CN202110414498.6A 2021-04-16 2021-04-16 Machine translation quality evaluation method based on neural network structure search Active CN113033218B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110414498.6A CN113033218B (en) 2021-04-16 2021-04-16 Machine translation quality evaluation method based on neural network structure search


Publications (2)

Publication Number Publication Date
CN113033218A CN113033218A (en) 2021-06-25
CN113033218B true CN113033218B (en) 2023-08-15

Family

ID=76457394

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110414498.6A Active CN113033218B (en) 2021-04-16 2021-04-16 Machine translation quality evaluation method based on neural network structure search

Country Status (1)

Country Link
CN (1) CN113033218B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113515960B (en) * 2021-07-14 2024-04-02 厦门大学 Automatic translation quality assessment method integrating syntax information

Citations (4)

Publication number Priority date Publication date Assignee Title
CN108829684A * 2018-05-07 2018-11-16 内蒙古工业大学 A Mongolian-Chinese neural machine translation method based on a transfer learning strategy
CN109074242A * 2016-05-06 2018-12-21 电子湾有限公司 Using meta-information in neural machine translation
CN110598224A (en) * 2019-09-23 2019-12-20 腾讯科技(深圳)有限公司 Translation model training method, text processing device and storage medium
CN111078833A (en) * 2019-12-03 2020-04-28 哈尔滨工程大学 Text classification method based on neural network

Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
US11138392B2 (en) * 2018-07-26 2021-10-05 Google Llc Machine translation using neural network models


Non-Patent Citations (1)

Title
Translation quality estimation method based on multilingual pre-trained language models; Lu Jinliang; Zhang Jiajun; Journal of Xiamen University (Natural Science Edition), No. 2; full text *

Also Published As

Publication number Publication date
CN113033218A (en) 2021-06-25


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant