CN112016271A - Language style conversion model training method, text processing method and device - Google Patents
Language style conversion model training method, text processing method and device
- Publication number
- CN112016271A (application CN201910465744.3A)
- Authority
- CN
- China
- Prior art keywords
- text
- training
- model
- style
- target
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Machine Translation (AREA)
Abstract
The application provides a training method for a language style conversion model, a text processing method, and corresponding devices, wherein the training method comprises the following steps: acquiring training sample data, wherein the training sample data comprises first training texts and second training samples, each first training text comprises a source text in an original language style and a target text in a target language style corresponding to the source text, and each second training sample comprises a source text in the original language style and a target text in a non-target language style corresponding to the source text; and training the language style conversion model based on the training sample data until the total loss function of the style conversion model converges. Based on the scheme provided by the embodiments of the application, the honorific rate in the translation result can be effectively improved.
Description
Technical Field
The present application relates to the field of computer technologies, and in particular, to a method for training a language style conversion model, a method for processing a text, and an apparatus for processing a text.
Background
Neural network model architectures based on encoding (Encode) and decoding (Decode) are widely applied to tasks such as machine translation, automatic text summarization and automatic question answering by robots. The current mainstream architectures are mainly realized by a recurrent neural network plus an attention mechanism, or simply by multi-head self-attention. However, existing models perform poorly at generating text with a specific language style (such as honorific text).
Disclosure of Invention
The purpose of the present application is to provide a scheme capable of generating a text in a desired language style, and to achieve the purpose, the technical scheme provided by the present application is specifically as follows:
in a first aspect, an embodiment of the present application provides a method for training a language style conversion model, where the method includes:
acquiring training sample data, wherein the training sample data comprises first training texts and second training samples, each first training text comprises a source text in an original language style and a target text in a target language style corresponding to the source text, and each second training sample comprises a source text in the original language style and a target text in a non-target language style corresponding to the source text;
training the language style conversion model based on training sample data until a total loss function of the style conversion model converges, wherein the total loss function comprises a text processing loss function, and the text processing loss function is used for representing the difference between the text output by the style conversion model and the corresponding target text.
In a second aspect, an embodiment of the present application provides a text processing method, where the method includes:
acquiring a text to be processed;
inputting the text to be processed into a language style conversion model to obtain a target text with a target language style corresponding to the text to be processed, wherein the language style conversion model is obtained by training based on the method provided by the first aspect of the application.
In a third aspect, an embodiment of the present application provides an apparatus for training a language style conversion model, where the apparatus includes:
the training sample acquisition module is used for acquiring training sample data, the training sample data comprises first training texts and second training samples, each first training text comprises a source text in an original language style and a target text in a target language style corresponding to the source text, and each second training sample comprises a source text in the original language style and a target text in a non-target language style corresponding to the source text;
and the model training module is used for training the language style conversion model based on training sample data until the total loss function of the style conversion model is converged, wherein the total loss function comprises a text processing loss function, and the text processing loss function is used for representing the difference between the text output by the style conversion model and the corresponding target text.
In a fourth aspect, an embodiment of the present application provides a text processing apparatus, including:
the text to be processed acquisition module is used for acquiring a text to be processed;
the target text obtaining model is used for inputting the text to be processed into the language style conversion model to obtain a target text with a target language style corresponding to the text to be processed, wherein the language style conversion model is obtained by training based on the method provided by the first aspect of the application.
In a fifth aspect, an embodiment of the present application provides an electronic device, which includes a memory and a processor; wherein the memory has stored therein a computer program; the processor is configured to invoke the computer program to perform the method provided in the first aspect or the second aspect of the present application.
In a sixth aspect, the present application provides a computer-readable storage medium, in which a computer program is stored, and the computer program, when executed by a processor, implements the method provided in the first or second aspect of the present application.
The advantages of the technical solutions provided in the present application will be described in detail with reference to the following embodiments and accompanying drawings, and are not detailed here.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings used in the description of the embodiments of the present application will be briefly described below.
FIG. 1 is a diagram illustrating the structure of a translation model in the prior art;
FIG. 2 is a flow chart illustrating a method for training a model according to an embodiment of the present disclosure;
FIG. 3 illustrates a schematic diagram of a training method in an example of the present application;
FIG. 4 illustrates a schematic diagram of a pre-training principle of a translation model in an example of the present application;
FIG. 5 is a schematic diagram illustrating the principle of pre-training a discriminant model in an example of the present application;
FIG. 6 illustrates a schematic diagram of a retraining principle of a translation model in an example of the present application;
FIG. 7 illustrates a schematic diagram of computing a first discriminant loss function in an example of the present application;
FIG. 8 is a schematic structural diagram of a training apparatus for a model according to an embodiment of the present disclosure;
fig. 9 is a schematic structural diagram of an electronic device provided in an embodiment of the present application;
fig. 10 shows a schematic diagram of a principle of calculating a loss function provided by an embodiment of the present application.
Detailed Description
Reference will now be made in detail to embodiments of the present application, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the drawings are exemplary, are only for the purpose of explaining the present application, and are not to be construed as limiting the present application.
As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. It will be understood that when an element is referred to as being "connected" or "coupled" to another element, it can be directly connected or coupled to the other element, or intervening elements may also be present. Further, "connected" or "coupled" as used herein may include wirelessly connected or wirelessly coupled. As used herein, the term "and/or" includes all or any element and all combinations of one or more of the associated listed items.
In order to make the objects, technical solutions and advantages of the present application clearer, the following takes a translation model as an example to briefly introduce the related technical solutions in the prior art.
A schematic structural diagram of an existing machine translation model is shown in fig. 1. As shown in the figure, the model mainly includes two parts, an encoder and a decoder. For a text to be translated, such as the Chinese text "I am a student" shown in the figure, the text is input into the encoder of the model, and the decoder can output a translation in the target language (Korean in this example), such as the Korean text shown in the output part of the figure.
Although existing machine translation models can translate text reasonably well, the applicant of the present application has found that, for languages with honorific expressions, existing machine translation models cannot ensure that the translation result is expressed with honorifics. Specifically, honorifics in such languages are very complex: for example, nouns and verbs in languages such as Japanese and Korean have special honorific forms, and sentence endings are also transformed accordingly, so simple word replacement (a common translation scheme) makes the generated sentences rigid. Moreover, when the target language is a language with honorific expressions, different contexts are likely to require different honorific expressions, and how to naturally generate honorifics under different task frameworks (translation or question answering) is still an urgent problem to be solved.
Similarly, other neural network models commonly used in the prior art, in addition to the translation model, are also basically unable to meet the application requirements of generating target text with a specific language style.
Additionally, it should be understood that a language style refers to a particular manner of expressing text with a given meaning, e.g., honorific, non-honorific, colloquial, written, etc.
In order to solve the above problems in the prior art, the present application provides a method for training a language style conversion model, a method for processing a text, and a device thereof. The following describes embodiments of the present application with reference to specific examples.
Fig. 2 is a schematic flowchart illustrating a method for training a language style conversion model according to an embodiment of the present application, and as shown in the diagram, the method for training the language style conversion model mainly includes the following steps:
step S110: the method comprises the steps of obtaining training sample data, wherein the training sample data comprise first training texts and second training samples, each first training text comprises a source text in an original language style and a target text in a target language style corresponding to the source text, and each second training sample comprises a source text in the original language style and a target text in a non-target language style corresponding to the source text.
That is, each training sample in the training sample data is a pair of texts, namely a source text and a target text corresponding to the source text, and the training data includes both target texts in the target language style and target texts in non-target language styles. For example, if the target language style is honorific, the target texts include both honorific texts and non-honorific texts.
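As an illustration, a minimal sketch (in Python, with an assumed Korean honorific example) of what such paired training data could look like:

```python
# Hypothetical training pairs (source text, target text); whether the target text
# is in the target (honorific) style is what distinguishes the two sample sets.
first_training_samples = [    # target text IS in the target language style (honorific)
    ("I am a student", "저는 학생입니다"),
]
second_training_samples = [   # target text is NOT in the target language style
    ("I am a student", "나는 학생이다"),
]
training_sample_data = first_training_samples + second_training_samples
```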
The language style conversion model refers to a model with a text language style conversion function, but the specific type of the model is not limited in the present application, and the model may be a functional model corresponding to actual application requirements for different application requirements. For example, the style conversion model may include, but is not limited to, a translation model or a question and answer model, etc.
It is understood that, for different style conversion models, the training sample data may be different, for example, for a translation model, the language types of the source text and the target text in the training sample data are different, where the source text is a text corresponding to the source language (may be simply referred to as a source language text), the target text is a text corresponding to the target language (may be simply referred to as a target language text), and if the translation model to be trained is a model for translating english into korean, english is the source language, and korean is the target language, each training text in the training sample data includes an english text and a korean text corresponding to the english text; for another example, for the question-answer model, the source text is a question text, and the target text is a corresponding answer text.
As can be seen, in the embodiment of the present application, the source text and the target text may be texts in the same language, or texts in different languages. In addition, the source text and the target text may also be two texts with the same substantial content, that is, the target text is another language style expression of the source text, that is, the language style conversion model in the embodiment of the present application may be a model with a language style conversion function and other functions (such as translation, question answering, etc.), or may be only a model with a language style conversion function. In addition, the original language style and the target language style may be the same or different for a model having functions other than the language style conversion.
Step S120: training the language style conversion model based on the training sample data until the total loss function of the style conversion model converges.
In the training stage, the input of the language style conversion model is a source text, and the output is a predicted target text. For example, for a translation model, the input is source language text, and the output is the translation result of the source language text, that is, the predicted target text.
In the embodiment of the present application, the total loss function includes a text processing loss function, and the text processing loss function is used for representing the difference between the text output by the style conversion model and the corresponding target text (the target text in the training sample). Since the target texts in the training sample data include both target-language-style texts and non-target-language-style texts, the output result of the model can be biased toward the expression of the target language style through model training.
The embodiment of the present application is not limited to the specific form of the text processing loss function, and may be selected according to the requirement. As an alternative, in each training, for one sample data, the value of the corresponding text processing loss function may be obtained by calculating the difference between the text output by the model and the target text in the sample data.
In an optional embodiment of the present application, training the language style conversion model based on training sample data includes:
marking a first label of a source text in training sample data, wherein the first label is used for representing whether the language style of the source text is a target language style;
the style conversion model is trained based on the tagged source text and the corresponding target text.
Optionally, in the training stage of the style conversion model, a label is marked on each source text in the sample data, either according to a set standard or manually, and the label indicates whether the target text corresponding to the source text is a text with the target language style. For example, for the translation model, if the target language style is honorific, the label is used to indicate whether the target language text contains honorifics.
The specific form of the label may be set as required. As an optional manner, the end of the source text may be labeled: for example, if the target text corresponding to the source text is a target-language-style text, a tag "p" may be added at the end of the source text, and if the target text corresponding to the source text is not a target-language-style text, a tag "np" may be added at the end of the source text. In addition, in practical applications, in order to make the feature of the label more prominent when the model processes the text, some special processing may be performed on the text and the label, for example, a space or an underscore (e.g., "_") may be added between the text and the label. Taking the translation model with the honorific style as the target language style, if the target language text contains honorifics, the tag "_p" may be added to the end of the source language text before training, and if the target language text does not contain honorifics, the tag "_np" may be added to the end of the corresponding source language text.
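A minimal sketch of this tagging step (the helper name is hypothetical; the "_p"/"_np" tags follow the convention described above):

```python
def tag_source_text(source_text: str, target_is_target_style: bool) -> str:
    """Append the style tag to the end of the source text, as described above."""
    return source_text + (" _p" if target_is_target_style else " _np")

# Example: a source text whose target text is honorific vs. one whose target is not.
print(tag_source_text("I am a student", True))   # "I am a student _p"
print(tag_source_text("I am a student", False))  # "I am a student _np"
```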
The training mode based on the labeled data can make the output result of the model biased to the result with the target language style when the model is used for processing the text to be processed after the training of the model is completed.
As an example, fig. 3 is a schematic flowchart illustrating a training method with tagged data according to an embodiment of the present application. In this example, the language style conversion model is a translation model and the target language style is the honorific style. As shown in the figure, the part shown by the dotted-line box is the translation model (in the examples of the present application, for convenience of describing the different parts of the model, the word embedding model of the language style conversion model is illustrated separately from the other parts of the model, but the other parts are still referred to by the name of the whole model; the "translation model" in this example is thus actually the part of the translation model other than the input-side word embedding part). data_p in the figure indicates that the target language text in the sample data contains honorifics, and data_np indicates that the target language text does not contain honorifics. When the model is trained, the source language text in each training sample is input into the word embedding model, the translation model performs encoding and decoding processing according to the word vectors output by the word embedding model and outputs the corresponding translation result, and in each training pass it is judged whether training is finished based on whether the total loss function has converged.
Based on this training mode with labeled data, after the training of the model is completed and the model is used for translation, a label can be marked on the text to be translated before it is input into the model, for example, the label "_p" is marked at the end of the text to be translated, and the labeled text to be translated is input into the trained model. In this way, the translation result output by the model can be biased toward a result containing honorifics.
In an optional embodiment of the present application, training the language style conversion model based on training sample data until a total loss function of the style conversion model converges includes:
setting a language style discrimination model for determining a probability that a text output from the style conversion model is a text having a target language style, wherein the total loss function further includes a first discrimination loss function corresponding to the discrimination model;
and training the style conversion model based on the training sample data and the discrimination model until the total loss function is converged.
In this scheme, the discrimination model is added to the training process of the style conversion model to constrain the processing result of the conversion model, so that when the trained style conversion model is applied, the language style of the result output by the model is biased toward the target language style, such as an expression containing honorifics.
Correspondingly, in order to ensure the output effect of the style conversion model obtained by training, when the discriminant model is added, the corresponding first discriminant loss function is also added correspondingly, so that the discriminant effect of the discriminant model and the processing effect of the style conversion model are improved through the loss function.
It will be appreciated that the essence of the discriminant model may be a classification network that functions to determine whether the text corresponding to its input is text having a target language style. As an alternative, the discriminant model may be a Convolutional Neural Networks (CNN) classification model (classifier).
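A minimal sketch of such a CNN text classifier over word vectors (hyper-parameters and layer choices are illustrative assumptions, not taken from the patent):

```python
import torch
import torch.nn as nn

class StyleDiscriminator(nn.Module):
    """Binary classifier: given the word vectors of a generated sentence,
    predict the probability that the sentence has the target language style
    (e.g. contains honorifics)."""
    def __init__(self, emb_dim=512, num_filters=128, kernel_sizes=(3, 4, 5)):
        super().__init__()
        self.convs = nn.ModuleList(
            [nn.Conv1d(emb_dim, num_filters, k) for k in kernel_sizes]
        )
        self.fc = nn.Linear(num_filters * len(kernel_sizes), 1)

    def forward(self, word_vectors):           # (batch, seq_len, emb_dim)
        x = word_vectors.transpose(1, 2)       # Conv1d expects (batch, emb_dim, seq_len)
        pooled = [conv(x).relu().max(dim=2).values for conv in self.convs]
        logits = self.fc(torch.cat(pooled, dim=1))
        return torch.sigmoid(logits).squeeze(-1)   # probability of the target style
```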
The specific functional form of the first discrimination loss function can be selected according to actual requirements; for example, a discrimination loss function commonly used in neural network training, i.e., a classification loss function, can be selected. As an alternative, the first discrimination loss function may be determined from the result of the discrimination model and the text corresponding to the input that produced that result, i.e., the value of the function may be calculated from the probability output by the discrimination model that its input is a text having the target language style, and from whether the text corresponding to that input actually has the target language style.
In an alternative embodiment of the present application, the value of the first discriminant loss function is determined based on the score of each candidate output of the style conversion model and the probability that the discriminant result corresponding to each candidate output is text having the target language style.
For example, for the translation model, if the target language style is the honorific style, the value of the first discrimination loss function may be determined based on the translation score of each candidate translation result of the translation model and the probability, given by the corresponding discrimination result, that each candidate translation result is honorific.
That is, in the actual training process, a plurality of candidate output results of the style conversion model may be selected, the discriminant model may discriminate each of the plurality of candidate output results, and the value of the first discriminant loss function may be calculated from scores of the plurality of candidate output results and corresponding discrimination results.
As an alternative, the total loss function Training Loss may be expressed as:

Training Loss = γ · Loss(N) + (1 − γ) · Σ_{i=1}^{K} score(f_g_i) · (−log(D(f_g_i)))   (1)

wherein γ and (1 − γ) are adjustment coefficients that can be configured and adjusted according to actual requirements, with 0 ≤ γ ≤ 1; N is the number of samples, i.e. the number of source texts input into the model in each training pass; Loss(N) can be a commonly used sample processing loss function, and γ · Loss(N) represents the text processing loss function in this optional manner; the second term represents the first discrimination loss function corresponding to a sample; K represents the number of candidate output results; score(f_g_i) represents the score of the i-th candidate output result; D(f_g_i) represents the discrimination result corresponding to the i-th candidate output result, i.e. the probability that the text corresponding to the i-th candidate output result is a text having the target language style; and log(·) denotes the logarithm operation applied to D(f_g_i).
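A minimal sketch of computing this total loss for one sample, assuming the K candidate scores and the discriminator probabilities are already available as tensors (the value of γ is an arbitrary assumption):

```python
import torch

def first_discrimination_loss(candidate_scores: torch.Tensor,
                              style_probs: torch.Tensor) -> torch.Tensor:
    """Per-sample first discrimination loss in the reconstructed form above:
    each of the K candidate scores weighted by the negative log of the
    discriminator's probability that the candidate has the target style."""
    return (candidate_scores * (-torch.log(style_probs + 1e-12))).sum()

def total_training_loss(text_processing_loss: torch.Tensor,
                        candidate_scores: torch.Tensor,
                        style_probs: torch.Tensor,
                        gamma: float = 0.5) -> torch.Tensor:
    """Training Loss = gamma * Loss(N) + (1 - gamma) * first discrimination loss;
    gamma = 0.5 is an arbitrary choice for the adjustment coefficient."""
    return gamma * text_processing_loss + (1 - gamma) * first_discrimination_loss(
        candidate_scores, style_probs)
```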
In an optional embodiment of the present application, training the language style conversion model based on training sample data until a total loss function of the style conversion model converges includes:
pre-training the style conversion model based on training sample data until the text processing loss function is converged;
and training the style conversion model after pre-training based on the training sample data until the total loss function is converged.
That is, before training the style conversion model, the initial style conversion model may be pre-trained based on the text processing loss function, and then the pre-trained style conversion model may be trained based on the total loss function. Through the pre-training step, the text processing accuracy of the model can be improved, namely the accuracy of the content of the target text output by the model can be improved, model training is performed on the basis of relatively good accuracy after pre-training, the training time can be shortened, and the training speed of the model is improved.
In an optional embodiment of the present application, training the language style conversion model based on training sample data until a total loss function of the style conversion model converges includes:
pre-training the discrimination model based on the output of the pre-trained style conversion model until a second discrimination loss function is converged;
and training the style conversion model after pre-training based on the training sample data and the discrimination model after pre-training until the total loss function is converged.
That is to say, after the pre-training of the style conversion model is completed, before the training of the style conversion model is performed again, the discriminant model can be pre-trained, so that the pre-trained discriminant model can better discriminate whether the text corresponding to the input of the pre-trained discriminant model is the target language style or the non-target language style.
In an alternative embodiment of the present application, the value of the second discrimination loss function is determined based on the discrimination result of the discrimination model and a second label of the output of the style conversion model corresponding to the discrimination result, where the second label is used to characterize whether the output of the style conversion model is the output with the target language style.
Alternatively, the second discrimination loss function may be a loss function commonly used in neural network training; that is, its value may be calculated according to the discrimination result of the model and whether its input (i.e., the output of the style conversion model) actually has the target language style.
As an alternative, the second discrimination loss function Loss(C) may be expressed as:

Loss(C) = −(1/M) · Σ_{j=1}^{M} [ y_j · log(y_p) + (1 − y_j) · log(1 − y_p) ]   (2)

where M represents the number of inputs to the discrimination model in each training pass; y_j is the label of the j-th input of the discrimination model, which characterizes whether that input has the target language style and takes a value in {0, 1}; specifically, y_j = 0 indicates that the input does not have the target language style (i.e. has a non-target language style), and y_j = 1 indicates that it has the target language style; and y_p denotes the discrimination result of the discrimination model, i.e. the probability with which the model determines that the corresponding output of the style conversion model is an output having the target language style.
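A minimal sketch of this second discrimination loss, assuming the reconstructed binary cross-entropy form above:

```python
import torch
import torch.nn.functional as F

def second_discrimination_loss(pred_probs: torch.Tensor,
                               labels: torch.Tensor) -> torch.Tensor:
    """Binary cross-entropy over the M discriminator inputs of one training pass:
    labels are 1 for target-style (e.g. honorific) outputs and 0 otherwise, and
    pred_probs are the discriminator's predicted probabilities y_p."""
    return F.binary_cross_entropy(pred_probs, labels.float())

# Example: M = 4 discriminator inputs, two labeled honorific and two not.
loss_c = second_discrimination_loss(torch.tensor([0.9, 0.2, 0.7, 0.1]),
                                    torch.tensor([1, 0, 1, 0]))
```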
In an optional embodiment of the present application, pre-training the discriminant model based on an output of the pre-trained style conversion model until the second discriminant loss function converges includes:
fixing model parameters of the style conversion model after pre-training, inputting source texts in training sample data into the style conversion model after pre-training, and training the discrimination model based on word vectors corresponding to the output of the style conversion model after pre-training until a second discrimination loss function is converged.
That is, after the pre-training of the style conversion model is completed, when the discrimination model is pre-trained, since the style conversion model is already the model that has been pre-trained, the model parameters of the style conversion model may be fixed, the discrimination model is trained based on the word vector corresponding to the output text of the model, and the parameters of the discrimination model are continuously adjusted in the training process until the pre-training of the discrimination model is completed when the second discrimination loss function converges.
It is understood that, in practical applications, in addition to the word vector being used as an input of the discriminant model to pre-train the discriminant model, the output result of the style conversion model may also be used to train the discriminant model.
As an alternative, after the pre-training of the style conversion model is completed, the discrimination model may be pre-trained using part or all of the source samples in the training samples. That is, part or all of the source corpora (i.e., the source texts in the samples) used in the training of the style conversion model may be extracted; for each source corpus, the source corpus is input into the pre-trained style conversion model, and the N1 highest-scoring outputs of the style conversion model (the best-N1 outputs) are selected to train the discrimination model. Specifically, the best-N1 outputs of the style conversion model for each source corpus are labeled with a second label, such as labels 0 and 1, indicating that the output is a non-target-language-style output or a target-language-style output, respectively, and the discrimination model is trained based on the labeled outputs until the second discrimination loss function converges. In this example, M in the above formula is the product of the number of source corpora extracted when training the discrimination model and N1.
For example, for the translation model, taking the target language style as the honorific style, part of the source corpora used for pre-training the Neural Machine Translation (NMT) model can be extracted, and the best-N1 outputs of the pre-trained NMT model for each source corpus can be taken (one source corpus is input and multiple candidate outputs are generated by the NMT model). The translation results are then labeled with honorific marks, where 0 and 1 represent non-honorific and honorific respectively, and are used to train the honorific discrimination model. The value of N1 can be set according to the actual configuration; for example, it can be set to 5.
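A minimal sketch of assembling such a discriminator pre-training set; `translate_n_best` and `is_honorific` are hypothetical helpers standing in for the pre-trained NMT model's best-N1 decoding and for the honorific labeling step:

```python
def build_discriminator_corpus(nmt_model, source_corpora, n_best=5, is_honorific=None):
    """For each source corpus, take the pre-trained model's n_best highest-scoring
    candidate outputs and label each one 1 (honorific) or 0 (non-honorific).
    The interfaces used here are assumptions, not APIs defined by the patent."""
    examples = []
    for src in source_corpora:
        for candidate in nmt_model.translate_n_best(src, n_best):
            label = 1 if is_honorific(candidate) else 0
            examples.append((candidate, label))
    return examples  # M = len(source_corpora) * n_best labeled examples
```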
In an optional embodiment of the present application, training the style conversion model after pre-training based on the training sample data and the pre-trained discrimination model includes:
fixing model parameters of the pre-trained discrimination model, inputting source texts in training sample data into the pre-trained style conversion model, and obtaining output of the pre-trained style conversion model;
and inputting the word vector corresponding to the output of the style conversion model after pre-training into the discrimination model after pre-training to obtain a corresponding discrimination result.
Because the discriminant model is a model which is pre-trained, when the style conversion model is subsequently trained again, the model parameters of the discriminant model can be fixed, retraining of the style conversion model is realized based on the text processing loss function and the first discriminant loss function, the processing effect of the finally obtained style conversion model is improved, and the possibility that the model outputs the text with the target language style is improved on the basis of ensuring the text content processing accuracy.
In an alternative embodiment of the present application, the style conversion model includes a word embedding model, and the model parameters of the pre-trained discrimination model are fixed, including:
and fixing the model parameters of the pre-trained discrimination model and the parameters of the word embedding model.
In practical applications, the word embedding model generally includes a word embedding model at the input end of the model (referred to as input embedding) and a word embedding model at the output end (referred to as output embedding); the word embedding shown in fig. 3 is the input embedding. The input embedding is used to map each word or sub-word in the input text (the source text when the model is trained, the text to be processed when the model is used) to a word vector of fixed dimensions; the output embedding is used to map the word or sub-word previously predicted by the decoder to a word vector of fixed dimensions, and the decoder can predict the current word or sub-word based on this word vector and the output of the encoder.
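As an illustration, the fixed-dimension mapping performed by such a word embedding layer (vocabulary size, embedding dimension and token ids are arbitrary assumptions):

```python
import torch
import torch.nn as nn

vocab_size, emb_dim = 32000, 512
input_embedding = nn.Embedding(vocab_size, emb_dim)

token_ids = torch.tensor([[15, 271, 903]])     # ids of the tokens in one input sentence
word_vectors = input_embedding(token_ids)      # shape: (1, 3, 512), one vector per token
```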
When the style conversion model is pre-trained, the word embedding model is pre-trained as a part of the style conversion model, so its parameters are already pre-trained parameters. Therefore, when the style conversion model is trained again, the parameters of the word embedding model can be fixed rather than retrained, and only the parameters of the other network structure parts of the style conversion model are trained again, which improves the processing effect of the model. In addition, the discrimination model is pre-trained based on the word embedding model obtained after pre-training, i.e. it is pre-trained on word vectors produced by that pre-trained word embedding model; fixing the parameters of the word embedding model during retraining therefore avoids disturbing the discrimination model and improves the training speed of the model.
For better illustration and understanding of the solutions provided by the embodiments of the present application, further description is provided below with reference to examples.
In this example, the scheme provided by the embodiment of the present application is described by taking an example in which the style conversion model is a translation model and the target language style is toast.
Fig. 4 is a schematic diagram of the translation model in this example, fig. 5 is a schematic diagram of the discrimination model in this example, and fig. 6 is a schematic diagram of the principle of training the translation model based on the discrimination model in this example. As shown in fig. 4, the translation model in this example includes a word embedding model (only the input word embedding is shown in the drawing) and the remaining translation model structure. The honorific discrimination model (D in the drawings) shown in fig. 5 and fig. 6 is the discrimination model in this example, and as can be seen from fig. 6, the input of the discrimination model is the output of the output word embedding layer of the translation model (outputs labeled with honorific or non-honorific tags).
The translation loss function in this example adopts a general translation loss function, denoted as Loss(N); the second discrimination loss function, denoted as Loss(C), may adopt the foregoing formula (2); and the first discrimination loss function may adopt the form contained in formula (1) above.
the training procedure in this example is described in detail below:
the method comprises the following steps: as shown in fig. 4, the translation model is pre-trained, i.e., the translation model structure shown in fig. 3 is pre-trained.
Specifically, the loss function used in the pre-training step may directly adopt a translation loss function loss (n), the source language sample in the training sample data (the training data shown in the figure) is input into the translation model, the model is trained until loss (n) converges, and the translation model which preliminarily satisfies the translation quality can be obtained through the pre-training step.
In the pre-training step, the model parameters to be trained include parameters of the word embedding model of the translation model and model parameters of other parts (the translation model shown in the figure). And taking the parameters of the word embedding model in the translation model after the pre-training as the parameters of the word embedding model in the subsequent training process, namely, after the pre-training step is completed, the parameters of the word embedding model in the subsequent training process are fixed.
Step two: as shown in fig. 5, in this step the honorific discrimination model is pre-trained. The discrimination model is trained using the style conversion model obtained in step one with the translation model fixed (i.e. with the model parameters of the word embedding part and the other parts of the translation model fixed), so that the discrimination model can correctly distinguish honorific from non-honorific text.
Specifically, when the discrimination model is pre-trained based on the second discrimination loss function Loss(C), the input of the discrimination model is the output of the output embedding layer of the pre-trained translation model, i.e., the word embedding shown in fig. 5 is the output embedding of the translation model. As shown in fig. 5, when training the discrimination model (i.e., the classification network), each output of the translation model is first labeled with an honorific or non-honorific label, i.e., the second label: if the output of the translation model is honorific text, the label may be set to "1", and if the output of the translation model is non-honorific text, the label may be set to "0". During pre-training, part or all of the extracted source language texts are input into the pre-trained translation model, the word vectors at the target end of the pre-trained translation model (namely the output of the output embedding layer) are input into the discrimination model, and the discrimination model outputs the probability that the text corresponding to the input data is honorific text. This training process is repeated until Loss(C) converges, at which point the pre-training of the discrimination model is finished.
Step three: this step retrains the translation model, as shown in FIG. 6.
Specifically, on the basis of the translation model pre-trained in step one, the honorific discrimination model uses the discriminator pre-trained in step two. The parameters of the discrimination model are fixed during retraining, and the parameters of the word embedding model of the translation model are also fixed (i.e. a fixed word embedding mode is adopted). Finally, the loss of the discrimination model (corresponding to the first discrimination loss function) and the loss of the translation model (corresponding to the text processing loss function) are combined to update the parameters of the translation model; the total loss function Training Loss in formula (1) may be used for the retraining in this step.
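A minimal sketch of the parameter fixing in this step (module and attribute names such as `input_embedding` are illustrative assumptions):

```python
import torch

def setup_retraining(translation_model, discriminator, lr: float = 1e-4):
    """Hold the pre-trained discriminator and the translation model's word
    embedding layers fixed, and optimize only the remaining translation-model
    parameters."""
    for module in (discriminator,
                   translation_model.input_embedding,
                   translation_model.output_embedding):
        for p in module.parameters():
            p.requires_grad_(False)
    trainable = [p for p in translation_model.parameters() if p.requires_grad]
    return torch.optim.Adam(trainable, lr=lr)
```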
Fig. 7 is a schematic diagram showing how, after the decoder of the translation model in this example obtains the vector corresponding to the translation result, the corresponding word or sub-word (i.e. the token at the target end) is obtained from that vector. In practical applications, the vector output by the decoder may first be mapped to a longer vector through a Linear layer (i.e., a fully-connected neural network); for example, if the dictionary size at the target end is V, the dimension of the vector output by the Linear layer is V. This vector then passes through the softmax layer shown at the bottom of the figure, which converts it into a probability vector (i.e. the Output Probabilities are obtained by softmax); the element value of the i-th dimension of the probability vector represents the probability that the predicted word or sub-word is the word or sub-word corresponding to that dimension in the dictionary.
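A minimal sketch of this Linear + softmax mapping from a decoder output vector to the Output Probabilities (hidden size and dictionary size V are assumed values):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

hidden_dim, V = 512, 32000
output_projection = nn.Linear(hidden_dim, V)

decoder_output = torch.randn(1, hidden_dim)        # output of one decoding step
logits = output_projection(decoder_output)         # (1, V) vector from the Linear layer
output_probabilities = F.softmax(logits, dim=-1)   # element i = P(i-th dictionary entry)
```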
In this example, assume that the sentence length is T (that is, the text is composed of T words and/or sub-words) and the number of candidate translation results of the translation model is K. As shown in fig. 7, the translation model performs a cyclic decoding process, continuously generating tokens (words or sub-words of the target text) at the target end (i.e., the decoding end). During the i-th decoding step, the i-th word or sub-word needs to be generated: a probability vector of dimension V is obtained after softmax, and K samples are drawn from this probability vector by multinomial sampling with replacement, giving K sampling results and the token corresponding to each result (the K tokens shown in the figure). In this way, after T rounds of K samples each, K candidate tokens are obtained for each of the T token positions of the target text. As shown in the figure, for the 1st token position the K candidate tokens may be denoted tok1_1, tok1_2, ..., tok1_K, and for the T-th token position they may be denoted tokT_1, tokT_2, ..., tokT_K. After the candidate tokens are obtained, K candidate translation results, i.e., K candidate sentences with scores score1 to scoreK shown in the figure, can be obtained; it can be understood that the i-th candidate sentence is the sentence formed from tok1_i, tok2_i, ..., tokT_i.
After the candidate sentences are determined, the translation score of each sentence can be calculated based on the probability values corresponding to its T candidate tokens. For example, for the first candidate sentence, the translation score score1 indicated in the figure is obtained as Ptok1_1 × Ptok2_1 × ... × PtokT_1, where Ptoki_j represents the probability value corresponding to the result of the j-th sampling when the i-th token is generated; for example, Ptok1_1 represents the probability value corresponding to the result of the first sampling when the first token is generated. That is, the translation score of each candidate sentence can be obtained by multiplying the probabilities corresponding to the tokens in that sentence. As an optional mode, after the score of each candidate sentence is obtained, the relative scores between the candidate sentences, i.e. the normalized score of each candidate sentence, can also be obtained through softmax.
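A minimal sketch of the sampling and scoring described above, assuming the per-step probability vectors are available as a (T, V) tensor:

```python
import torch
import torch.nn.functional as F

def sample_candidates(step_probs: torch.Tensor, k: int):
    """step_probs: (T, V) tensor, one probability vector per decoding step.
    At each step, draw k token ids by multinomial sampling with replacement;
    candidate sentence i is (tok1_i, ..., tokT_i), and its translation score is
    the product of the probabilities of its sampled tokens."""
    T, _ = step_probs.shape
    tokens = torch.stack([
        torch.multinomial(step_probs[t], k, replacement=True) for t in range(T)
    ])                                                 # (T, k) sampled token ids
    token_probs = torch.gather(step_probs, 1, tokens)  # (T, k) their probabilities
    scores = token_probs.prod(dim=0)                   # (k,) product over the T steps
    normalized_scores = F.softmax(scores, dim=0)       # relative (normalized) scores
    return tokens.t(), scores, normalized_scores       # tokens.t(): one row per candidate

# Example with random probabilities over a toy vocabulary of size 10 and T = 4 steps:
probs = F.softmax(torch.randn(4, 10), dim=-1)
candidates, scores, norm_scores = sample_candidates(probs, k=5)
```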
For each candidate sentence, the discriminator is used to judge the probability that the candidate sentence is honorific; the probability value shown in the figure indicates the probability that the i-th candidate sentence contains honorifics. Based on this probability value and the normalized score of each sentence, the value of the first discrimination loss function (Loss(C) shown in fig. 7) is calculated.
As an alternative, when calculating the first discrimination loss function, either the raw scores of the candidate sentences (the products of the token probabilities) or the normalized scores described above may be used; using the normalized scores makes the scores involved in the calculation relatively more objective.
It is clear to those skilled in the art that, in practical application, as an alternative, the token corresponding to the maximum element value in the Output Probabilities obtained at each step of the cyclic decoding process may be taken, giving the T tokens with the maximum probability; the sentence corresponding to these T tokens may be used as the final translation result of the translation model, and Loss(N) may be calculated based on these T tokens and the target language text corresponding to the sample data.
As shown in fig. 6, in the process of retraining, further Training of the translation model may be guided based on the total Loss function Training Loss (the total Loss function shown in formula (1)) determined by the first discriminant Loss function and the translation Loss function until the Training Loss converges, and the translation model obtained by this Training may be used as the final translation model.
Compared with the prior art, the scheme for training the style conversion model based on the discrimination model introduces, into the training process of the style conversion model, a minimum-risk training mode based on the discrimination model in addition to the maximum-likelihood training mode of the style conversion model, and adds a pre-trained discriminator (capable of judging whether a translation result is honorific or non-honorific) into the training of the normal style conversion model; that is, the loss of the discrimination model is fused into the loss of the style conversion model. If, during training, the output result of the style conversion model is a result with the target language style (e.g. the translation result is honorific), the loss of the discrimination model is small, the overall loss of the model is small, and the update of the model parameters is small; if the output result has a non-target language style, the loss of the discrimination model is large, the overall loss of the model is large during training, and the update of the model parameters is large. With the style conversion model obtained by this training mode, the output result of the model is biased toward the target language style when the model is used at the online stage.
The scheme provided by the embodiments of the application is suitable for various general application scenarios that require outputting text in a target language style. For example, the translation model can be applied to text translation scenarios that require the output target language to have a specific language style, such as cases where the output target language has honorifics, e.g., Korean and Japanese. The scheme of the embodiments of the application is suitable for converting the style of text in general, and similar schemes can be adopted for different styles.
Fig. 10 is a schematic diagram of calculating the loss functions provided in this embodiment of the application. Input tokens are first converted into the corresponding word embeddings; the corresponding text processing loss function is obtained through the NMT model, and the NMT output is obtained through prediction. K rounds of sampling are performed on the NMT output to obtain K sampled sentences, and the scores of the K sentences are calculated; the K sentences are also input into the discrimination model, which predicts an honorific probability value for each sentence. Finally, the negative logarithms of these probabilities for the K sentences are multiplied by the scores of the K sentences to obtain the first discrimination loss function.
For the scheme provided by the embodiment of the present application, take a translation model that needs to output an honorific style as an example, with Chinese as the source language and Korean as the target language. A Chinese text of 2133 sentences was randomly extracted to test the existing translation model and the translation models provided by the embodiments of the present application; the test results are shown in the following table:
| Model | BLEU | Honorific rate |
| --- | --- | --- |
| Existing translation model | 34.21 | 44.77% |
| Scheme one of the present application | 33.57 | 79.23% |
| Scheme two of the present application | 34.17 | 99.40% |
Scheme one refers to the translation model obtained by training based on sample data labeled with honorific and non-honorific tags as provided in the embodiments of the application, and scheme two refers to the translation model obtained by training the translation model based on the training sample data and the set discrimination model.
As shown in the table, BLEU (bilingual evaluation understudy) is a translation quality evaluation criterion; a higher score indicates higher translation quality. The BLEU and honorific rate shown in the table are calculated as the mean over the top-5 BLEU test results obtained in the test. As can be seen from the table, based on the scheme provided by the embodiments of the present application, the honorific rate in the translation result can be greatly increased while the BLEU remains basically unchanged.
Based on the same principle as the method shown in fig. 2, the embodiment of the present application further provides a training apparatus for a language-style conversion model, as shown in fig. 8, the training apparatus 100 for a language-style conversion model may include a training sample obtaining module 110 and a model training module 120. Wherein:
the training sample acquisition module 110 is configured to acquire training sample data, where the training sample data includes first training texts and second training samples, each first training text includes a source text in an original language style and a target text in a target language style corresponding to the source text, and each second training sample includes a source text in an original language style and a target text in a non-target language style corresponding to the source text;
the model training module 120 is configured to train the language style conversion model based on training sample data until a total loss function of the style conversion model converges, where the total loss function includes a text processing loss function, and the text processing loss function is used to represent a difference between a text output by the style conversion model and a corresponding target text.
Optionally, the language style conversion model includes a translation model or a question-and-answer model.
Optionally, the target language style comprises an honorific style.
Optionally, when the model training module 120 trains the language style conversion model based on the training sample data, it may specifically be configured to:
marking a first label of a source text in training sample data, wherein the first label is used for representing whether the language style of the source text is a target language style;
the style conversion model is trained based on the tagged source text and the corresponding target text.
Optionally, the model training module 120 is specifically configured to train the language style conversion model based on the training sample data until the total loss function of the style conversion model converges:
setting a language style discrimination model, wherein the discrimination model is used for judging the probability that the output text of the style conversion model is the text with the target language style, and the total loss function further comprises a first discrimination loss function corresponding to the discrimination model;
and training the style conversion model based on the training sample data and the discrimination model until the total loss function is converged.
Optionally, the value of the first discriminant loss function is determined based on the score of each candidate output of the style conversion model and the probability that the discrimination result corresponding to each candidate output is the text with the target language style.
Optionally, the model training module 120 is specifically configured to train the language style conversion model based on the training sample data until the total loss function of the style conversion model converges:
pre-training the style conversion model based on training sample data until the text processing loss function is converged;
and training the style conversion model after pre-training based on the training sample data until the total loss function is converged.
Optionally, the model training module 120 is specifically configured to train the language style conversion model based on the training sample data until the total loss function of the style conversion model converges:
pre-training the discrimination model based on the output of the pre-trained style conversion model until a second discrimination loss function is converged;
and training the style conversion model after pre-training based on the training sample data and the discrimination model after pre-training until the total loss function is converged.
Optionally, the value of the second discrimination loss function is determined based on the discrimination result of the discrimination model and a second label of the output of the style conversion model corresponding to the discrimination result, where the second label is used to represent whether the output of the style conversion model is the output with the target language style.
Optionally, the model training module 120 is specifically configured to pre-train the discriminant model based on the output of the pre-trained style conversion model until the second discriminant loss function converges:
fixing model parameters of the style conversion model after pre-training, inputting source texts in training sample data into the style conversion model after pre-training, and training the discrimination model based on word vectors corresponding to the output of the style conversion model after pre-training until a second discrimination loss function is converged.
Optionally, the model training module 120 is specifically configured to, when training the style conversion model after the pre-training based on the training sample data and the pre-trained discrimination model:
fixing model parameters of the pre-trained discrimination model, inputting source texts in training sample data into the pre-trained style conversion model, and obtaining output of the pre-trained style conversion model;
and inputting the word vector corresponding to the output of the style conversion model after pre-training into the discrimination model after pre-training to obtain a corresponding discrimination result.
Optionally, the style conversion model includes a word embedding model, and the model training module 120 may specifically be configured to, when fixing the model parameters of the pre-trained discrimination model:
and fixing the model parameters of the pre-trained discrimination model and the parameters of the word embedding model.
It is understood that each module of the training device provided in the embodiments of the present application may have a function of implementing the corresponding step in the training method provided in the embodiments of the present application. The functions may be implemented by hardware, or may be implemented by hardware executing corresponding software. The modules can be software and/or hardware, and can be implemented individually or by integrating a plurality of modules. For the functional description of each module of the training apparatus, reference may be made to the corresponding description in the training method in each embodiment described above, and details are not repeated here.
Based on the principle of the training method provided by the embodiment of the present application, the embodiment of the present application further provides a text processing method, which may include:
acquiring a text to be processed;
inputting the text to be processed into a language style conversion model to obtain a target text with a target language style corresponding to the text to be processed, wherein the language style conversion model is trained using the training method for the language style conversion model provided in any embodiment of the present application.
Optionally, if the language style conversion model is trained using the training method corresponding to the example shown in Fig. 3, before the text to be processed is input into the style conversion model, the method further includes:
labeling the text to be processed with a label, wherein the label is used for representing that the language style of the target text corresponding to the text to be processed is the target language style.
That is, if the style conversion model was trained with training sample data whose source texts are labeled to indicate whether the corresponding target text has the target language style, then when a text to be processed is handled by the trained model, the text to be processed may likewise be labeled with the label corresponding to texts having the target language style, so that the output tends to have the target language style; for example, the label "_p" may be appended to the end of the text to be processed, as illustrated in the sketch below.
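The "_p" suffix above is the example given by the description itself; everything else in the following sketch (the tokenizer interface and the generate method) is a hypothetical illustration of how the label might be attached at inference time, not the application's prescribed implementation.

```python
def convert_to_target_style(text_to_process: str, style_model, tokenizer) -> str:
    # Append the label indicating that the output should have the target
    # language style; "_p" follows the example in the description above.
    labeled_text = text_to_process + " _p"
    input_ids = tokenizer.encode(labeled_text)     # hypothetical tokenizer API
    output_ids = style_model.generate(input_ids)   # hypothetical seq2seq interface
    return tokenizer.decode(output_ids)

# Illustrative usage (all names are placeholders):
# polite_text = convert_to_target_style("give me the report", style_model, tokenizer)
```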
Based on the principle of the training method provided by the embodiments of the present application, the embodiments of the present application further provide a text processing apparatus, which may include a to-be-processed text acquisition module and a target text acquisition module, wherein:
the text to be processed acquisition module is used for acquiring a text to be processed;
the target text acquisition module is used for inputting the text to be processed into the language style conversion model to obtain a target text with the target language style corresponding to the text to be processed, wherein the language style conversion model is trained using the training method for the language style conversion model provided in any embodiment of the present application.
Optionally, if the language style conversion model is trained using the training method corresponding to the example shown in Fig. 3, the target text acquisition module is further configured to:
label the text to be processed before it is input into the style conversion model, wherein the label is used for representing that the language style of the target text corresponding to the text to be processed is the target language style.
Based on the same principle as the scheme described in the foregoing, an embodiment of the present application further provides a text processing method, which may include:
acquiring a text to be processed;
labeling the text to be processed with a label, wherein the label is used for representing that the language style of a target text corresponding to the text to be processed is a target language style;
and inputting the text to be processed into a language style conversion model to obtain a target text corresponding to the text to be processed.
Optionally, the target language style includes, but is not limited to, a toast style or a non-toast style.
Correspondingly, an embodiment of the present application further provides a text processing apparatus, which may include:
the text to be processed acquisition module is used for acquiring a text to be processed;
the text labeling module is used for labeling the text to be processed with a label, wherein the label is used for representing that the language style of a target text corresponding to the text to be processed is a target language style;
and the target text acquisition module is used for inputting the text to be processed into the language style conversion model to obtain a target text corresponding to the text to be processed.
Based on the same principle as the methods and apparatuses provided by the present application, the embodiments of the present application further provide an electronic device, which may include a memory and a processor, wherein the memory has a computer program stored therein, and the processor is used for calling the computer program to execute the steps of any method provided in any embodiment of the present application, or the steps performed by any apparatus provided in any embodiment of the present application.
The embodiments of the present application further provide a computer-readable storage medium in which a computer program is stored; when executed by a processor, the computer program implements the steps of any method, or the steps performed by any apparatus, provided in any embodiment of the present application.
Optionally, Fig. 9 shows a schematic structural diagram of an electronic device to which the embodiments of the present application are applicable. As shown in Fig. 9, the electronic device 4000 may include a processor 4001 and a memory 4003. The processor 4001 is coupled to the memory 4003, for example via a bus 4002. Optionally, the electronic device 4000 may further include a transceiver 4004. In practical applications, the number of transceivers 4004 is not limited to one, and the structure of the electronic device 4000 does not constitute a limitation on the embodiments of the present application.
The memory 4003 is used for storing the computer program for executing the solution of the present application, and execution is controlled by the processor 4001. The processor 4001 is configured to execute the computer program stored in the memory 4003 to implement the content shown in any of the foregoing method embodiments.
It should be understood that, although the steps in the flowcharts of the figures are shown in an order indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated herein, the execution of these steps is not strictly limited to the order shown, and they may be performed in other orders. Moreover, at least a portion of the steps in the flowcharts may include multiple sub-steps or multiple stages, which are not necessarily performed at the same moment but may be performed at different moments, and which are not necessarily performed in sequence but may be performed in turn or alternately with other steps or with at least a portion of the sub-steps or stages of other steps.
The foregoing describes only some of the embodiments of the present invention. It should be noted that those skilled in the art can make various modifications and improvements without departing from the principle of the present invention, and these modifications and improvements shall also fall within the protection scope of the present invention.
Claims (10)
1. A method of text processing, comprising:
acquiring a text to be processed;
labeling the text to be processed with a label, wherein the label is used for representing that the language style of a target text corresponding to the text to be processed is a target language style;
and inputting the text to be processed into a language style conversion model to obtain a target text corresponding to the text to be processed.
2. The method of claim 1, wherein the target language style is a toast style or a non-toast style.
3. A method for training a language style conversion model, comprising:
acquiring training sample data, wherein the training sample data comprises first training texts and second training samples, each first training text comprises a source text in an original language style and a target text in a target language style corresponding to the source text, and each second training sample comprises a source text in the original language style and a target text in a non-target language style corresponding to the source text;
training a language style conversion model based on the training sample data until a total loss function of the style conversion model converges, wherein the total loss function comprises a text processing loss function, and the text processing loss function is used for representing the difference between the text output by the style conversion model and the corresponding target text.
4. The method according to claim 3, wherein training a language style conversion model based on the training sample data comprises any one of:
labeling a source text in the training sample data with a first label, wherein the first label is used for representing whether the language style of the source text is a target language style; and training the style conversion model based on the labeled source text and the corresponding target text;
setting a language style discrimination model for determining a probability that a text output from the style conversion model is a text having the target language style, wherein the total loss function further comprises a first discrimination loss function corresponding to the discrimination model; and training the style conversion model based on the training sample data and the discrimination model.
5. The method of claim 4, wherein training a language style conversion model based on the training sample data until a total loss function of the style conversion model converges comprises:
pre-training the translation model based on the training sample data until the text processing loss function converges;
and training the pre-trained style conversion model based on the training sample data until the total loss function converges.
6. The method of claim 5, wherein training a language style conversion model based on the training sample data until a total loss function of the style conversion model converges comprises:
pre-training the discrimination model based on the output of the pre-trained style conversion model until a second discrimination loss function converges;
and training the pre-trained style conversion model based on the training sample data and the pre-trained discrimination model until the total loss function converges.
7. A text processing apparatus, comprising:
the text to be processed acquisition module is used for acquiring a text to be processed;
the text labeling module is used for labeling a label of the text to be processed, and the label is used for representing that the language style of a target text corresponding to the text to be processed is a target language style;
and the target text acquisition module is used for inputting the text to be processed into the language style conversion model to obtain a target text corresponding to the text to be processed.
8. An apparatus for training a language-style conversion model, comprising:
the training sample acquisition module is used for acquiring training sample data, wherein the training sample data comprises first training texts and second training samples, each first training text comprises a source text in an original language style and a target text in a target language style corresponding to the source text, and each second training sample comprises a source text in the original language style and a target text in a non-target language style corresponding to the source text;
and the model training module is used for training a language style conversion model based on the training sample data until the total loss function of the style conversion model converges, wherein the total loss function comprises a text processing loss function, and the text processing loss function is used for representing the difference between the text output by the style conversion model and the corresponding target text.
9. An electronic device comprising a memory and a processor;
the memory has stored therein a computer program;
the processor is used for invoking the computer program to perform the method of any one of claims 1 to 6.
10. A computer-readable storage medium, characterized in that a computer program is stored in the storage medium, which computer program, when being executed by a processor, carries out the method of any one of claims 1 to 6.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910465744.3A CN112016271A (en) | 2019-05-30 | 2019-05-30 | Language style conversion model training method, text processing method and device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910465744.3A CN112016271A (en) | 2019-05-30 | 2019-05-30 | Language style conversion model training method, text processing method and device |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112016271A true CN112016271A (en) | 2020-12-01 |
Family
ID=73500920
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910465744.3A Pending CN112016271A (en) | 2019-05-30 | 2019-05-30 | Language style conversion model training method, text processing method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112016271A (en) |
Cited By (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112528679A (en) * | 2020-12-17 | 2021-03-19 | 科大讯飞股份有限公司 | Intention understanding model training method and device and intention understanding method and device |
CN112633113A (en) * | 2020-12-17 | 2021-04-09 | 厦门大学 | Cross-camera human face living body detection method and system |
CN112528679B (en) * | 2020-12-17 | 2024-02-13 | 科大讯飞股份有限公司 | Method and device for training intention understanding model, and method and device for intention understanding |
CN113111640A (en) * | 2021-04-14 | 2021-07-13 | 清华大学 | Language style conversion method and device |
CN113627162A (en) * | 2021-06-30 | 2021-11-09 | 北京海纳数聚科技有限公司 | Character beautifying method based on text style migration technology |
CN113468857A (en) * | 2021-07-13 | 2021-10-01 | 北京百度网讯科技有限公司 | Method and device for training style conversion model, electronic equipment and storage medium |
CN113468857B (en) * | 2021-07-13 | 2024-03-29 | 北京百度网讯科技有限公司 | Training method and device for style conversion model, electronic equipment and storage medium |
CN114818728A (en) * | 2022-04-24 | 2022-07-29 | 北京金山数字娱乐科技有限公司 | Text style migration model training and text style migration method and device |
CN115879469A (en) * | 2022-12-30 | 2023-03-31 | 北京百度网讯科技有限公司 | Text data processing method, model training method, device and medium |
CN115879469B (en) * | 2022-12-30 | 2023-10-03 | 北京百度网讯科技有限公司 | Text data processing method, model training method, device and medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN112016271A (en) | Language style conversion model training method, text processing method and device | |
JP6909832B2 (en) | Methods, devices, equipment and media for recognizing important words in audio | |
JP5901001B1 (en) | Method and device for acoustic language model training | |
Yoshikawa | Joint transition-based dependency parsing and disfluency detection for automatic speech recognition texts | |
CN110555213B (en) | Training method of text translation model, and text translation method and device | |
US20170308526A1 (en) | Compcuter Implemented machine translation apparatus and machine translation method | |
CN111739520B (en) | Speech recognition model training method, speech recognition method and device | |
CN112818089B (en) | Text phonetic notation method, electronic equipment and storage medium | |
CN115599901B (en) | Machine question-answering method, device, equipment and storage medium based on semantic prompt | |
CN113297366B (en) | Emotion recognition model training method, device, equipment and medium for multi-round dialogue | |
JP6778655B2 (en) | Word concatenation discriminative model learning device, word concatenation detection device, method, and program | |
CN111414745A (en) | Text punctuation determination method and device, storage medium and electronic equipment | |
CN112199952B (en) | Word segmentation method, multi-mode word segmentation model and system | |
CN111160026B (en) | Model training method and device, and text processing method and device | |
CN113362815A (en) | Voice interaction method, system, electronic equipment and storage medium | |
CN116910251A (en) | Text classification method, device, equipment and medium based on BERT model | |
CN114428852B (en) | Chinese text abstract extraction method and device based on BERT pre-training model | |
CN113012685A (en) | Audio recognition method and device, electronic equipment and storage medium | |
CN117493548A (en) | Text classification method, training method and training device for model | |
CN111401069A (en) | Intention recognition method and intention recognition device for conversation text and terminal | |
CN114519358A (en) | Translation quality evaluation method and device, electronic equipment and storage medium | |
CN114330375A (en) | Term translation method and system based on fixed paradigm | |
CN114372467A (en) | Named entity extraction method and device, electronic equipment and storage medium | |
CN115129856A (en) | Emotion information fused intention identification method and device, storage medium and computer equipment | |
CN113555006B (en) | Voice information identification method and device, electronic equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |