CN114896993B - Translation model generation method and device, electronic equipment and storage medium - Google Patents

Translation model generation method and device, electronic equipment and storage medium

Info

Publication number
CN114896993B
CN114896993B (application CN202210490580.1A)
Authority
CN
China
Prior art keywords
statement
sentence
vector
model
difference
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210490580.1A
Other languages
Chinese (zh)
Other versions
CN114896993A (en)
Inventor
张传强
张睿卿
何中军
李芝
吴华
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202210490580.1A priority Critical patent/CN114896993B/en
Publication of CN114896993A publication Critical patent/CN114896993A/en
Application granted granted Critical
Publication of CN114896993B publication Critical patent/CN114896993B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00: Handling natural language data
    • G06F40/40: Processing or translation of natural language
    • G06F40/58: Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/24: Classification techniques
    • G06F18/241: Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415: Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00: Handling natural language data
    • G06F40/20: Natural language analysis
    • G06F40/205: Parsing
    • G06F40/211: Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/045: Combinations of networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/08: Learning methods

Abstract

The present disclosure provides a translation model generation method and apparatus, an electronic device, and a storage medium, relating to the field of computer technologies, and in particular to artificial intelligence technologies such as natural language processing and deep learning. The method includes: acquiring a first sample pair data set; inputting a first spliced sentence, obtained by splicing a first source sentence and an associated sentence, into a first encoder of an initial translation model to obtain a predicted sentence output by a first sub-model and a predicted label output by a second sub-model; and correcting the initial translation model according to a first difference between the predicted sentence and the associated sentence corresponding to the first category label and a second difference between the predicted label and the label tag corresponding to the associated sentence in the first spliced sentence, to obtain a corrected translation model. The classification task and the translation task are thus combined to train the translation model, which improves the translation accuracy of the translation model and reduces the probability of translation missing.

Description

Translation model generation method and device, electronic equipment and storage medium
Technical Field
The present disclosure relates to the field of computer technologies, and in particular, to a method and an apparatus for generating a translation model, an electronic device, and a storage medium.
Background
As artificial intelligence technology has continuously developed and matured, it has come to play an extremely important role in many fields related to daily life; for example, artificial intelligence has made significant progress in machine translation scenarios. However, translation missing (omission of content) during translation can lead to inaccurate translation results. Therefore, how to reduce the translation missing rate of machine translation, and thereby improve its accuracy, is an important research direction.
Disclosure of Invention
The disclosure provides a translation model generation method and device, electronic equipment and a storage medium.
According to a first aspect of the present disclosure, there is provided a method for generating a translation model, including:
acquiring a first sample pair data set, wherein each first sample pair comprises a first source statement, at least one associated statement and a label corresponding to each associated statement;
inputting a first spliced sentence obtained by splicing the first source sentence and a related sentence into a first encoder of an initial translation model to obtain a predicted sentence output by a first sub-model of the initial translation model and a predicted label output by a second sub-model;
and respectively correcting the first sub-model, the second sub-model and the first encoder according to a first difference between the prediction statement and the associated statement corresponding to the first class label and a second difference between the prediction label and the label corresponding to the associated statement in the first spliced statement to obtain a corrected translation model.
According to a second aspect of the present disclosure, there is provided a generation apparatus of a translation model, including:
the system comprises a first acquisition module, a second acquisition module and a third acquisition module, wherein the first acquisition module is used for acquiring a data set of a first sample pair, and each first sample pair comprises a first source statement, at least one associated statement and a label corresponding to each associated statement;
the second obtaining module is used for inputting the first spliced statement obtained by splicing the first source statement and one associated statement into a first encoder of an initial translation model so as to obtain a predicted statement output by a first sub-model of the initial translation model and a predicted label output by a second sub-model;
and the first correction module is used for correcting the first sub-model, the second sub-model and the first encoder respectively according to a first difference between the prediction statement and the associated statement corresponding to the first class label and a second difference between the prediction label and the label corresponding to the associated statement in the first spliced statement so as to obtain a corrected translation model.
According to a third aspect of the present disclosure, there is provided an electronic device comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of generating a translation model according to the first aspect.
According to a fourth aspect of the present disclosure, there is provided a non-transitory computer-readable storage medium storing computer instructions for causing the computer to perform the method of generating a translation model according to the first aspect.
According to a fifth aspect of the present disclosure, there is provided a computer program product comprising computer instructions which, when executed by a processor, implement the steps of the method of generation of a translation model according to the first aspect.
The generation method and device of the translation model, the electronic device and the storage medium have the following beneficial effects:
in the embodiment of the disclosure, a first sample pair data set is first obtained, where each first sample pair includes a first source sentence, at least one associated sentence, and a label tag corresponding to each associated sentence, then the first concatenated sentence obtained by concatenating the first source sentence and one associated sentence is input into a first encoder of an initial translation model to obtain a predicted sentence output by a first submodel of the initial translation model and a predicted label output by a second submodel, and finally, according to a first difference between the predicted sentence and the associated sentence corresponding to the first category label and a second difference between the predicted label and the label tag corresponding to the associated sentence in the first concatenated sentence, the first submodel, the second submodel, and the first encoder are respectively corrected to obtain a corrected translation model. Therefore, the translation model is trained according to the first difference between the predicted sentence corresponding to the first source sentence and the correctly translated sentence and the second difference between the predicted label corresponding to the correctly translated sentence or the translation missing sentence and the label, so that the translation accuracy of the translation model is improved, and the translation missing probability is reduced.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
The drawings are included to provide a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
fig. 1 is a schematic flow chart diagram of a method for generating a translation model according to an embodiment of the present disclosure;
FIG. 2 is a flowchart illustrating a method for generating a translation model according to another embodiment of the present disclosure;
FIG. 3 is a flowchart illustrating a method for generating a translation model according to another embodiment of the present disclosure;
fig. 4 is a schematic structural diagram of an apparatus for generating a translation model according to an embodiment of the present disclosure;
fig. 5 is a block diagram of an electronic device used to implement the generation method of the translation model of the embodiment of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings, in which various details of the embodiments of the disclosure are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
The embodiments of the present disclosure relate to the field of artificial intelligence technologies such as natural language processing and deep learning.
Artificial Intelligence (AI) is a new technical science that studies and develops theories, methods, technologies, and application systems for simulating, extending, and expanding human intelligence.
Deep learning learns the intrinsic laws and representation levels of sample data, and the information obtained during learning is very helpful for interpreting data such as text, images, and sound. The ultimate goal of deep learning is to enable machines to perform human-like analysis and learning, and to recognize data such as text, images, and sound.
Natural language processing is the use of computers to process, understand, and apply human languages (such as Chinese and English). It is a cross-discipline between computer science and linguistics and is also commonly referred to as computational linguistics. Natural language is a fundamental mark that distinguishes humans from other animals; without language, human thinking could hardly be discussed. Natural language processing therefore embodies one of the highest goals of artificial intelligence: only when a computer can process natural language can a machine be said to have achieved real intelligence.
A method, an apparatus, an electronic device, and a storage medium for generating a translation model according to embodiments of the present disclosure are described below with reference to the drawings.
It should be noted that an execution subject of the method for generating a translation model according to this embodiment is a device for generating a translation model, which may be implemented in a software and/or hardware manner, and the device may be configured in an electronic device, and the electronic device may include, but is not limited to, a terminal, a server, and the like.
Fig. 1 is a schematic flow chart of a method for generating a translation model according to an embodiment of the present disclosure.
As shown in fig. 1, the method for generating the translation model includes:
s101: and acquiring a first sample pair data set, wherein each first sample pair comprises a first source statement, at least one associated statement and a label tag corresponding to each associated statement.
The first source sentence can be the sentence to be translated in the first sample pair. The first source sentence may be in English, Chinese, French, etc., which is not limited by this disclosure.
The associated sentence can be a correct translation sentence corresponding to the first source sentence, i.e., a positive sample; alternatively, it can be a translation result with missing content (a translation-missing sentence) corresponding to the first source sentence, i.e., a negative sample. The associated sentences may be in English, Chinese, French, etc., which is not limited by this disclosure.
It should be noted that the number of the association sentences of the positive example and the number of the association sentences of the negative example are not limited in the present disclosure. That is, the first sample pair may include at least one associated statement corresponding to a positive sample and at least one associated statement corresponding to a negative sample.
Optionally, the label tag corresponding to the associated statement of the positive sample may be "1", and the label tag corresponding to the associated statement of the negative sample may be "0". Or, the label corresponding to the related statement of the positive sample is "0", and the label corresponding to the related statement of the negative sample is "1". The present disclosure is not limited thereto.
Optionally, word-level pruning or sentence-level pruning may be performed on the correct translation result to obtain a translation-missing result corresponding to the first source sentence.
For example, if the first source sentence is "I didn't know if you'd have your baggage with you. I have no baggage.", the correct translation result, i.e. the associated sentence corresponding to the positive sample, may be "I don't know whether you carry baggage or not. I do not have luggage.", and the label tag corresponding to this associated sentence may be "1". A translation-missing result corresponding to the first source sentence, i.e. an associated sentence corresponding to the negative sample, may be "I know if you carry luggage. I do not have luggage." (with words missing) or "I don't know whether you carry luggage or not." (with the second sentence missing), and the label tag corresponding to such an associated sentence may be "0". The present disclosure is not limited thereto.
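As a rough illustration of how such a first sample pair data set might be assembled, the sketch below (in Python; the function and field names are hypothetical, not taken from the disclosure) builds one positive associated sentence and derives negative associated sentences by word-level or sentence-level pruning:

```python
import random

def make_first_sample_pair(first_source_sentence, correct_translation, num_negatives=1):
    """Build one first sample pair: a source sentence plus associated sentences
    with label tags ("1" for positive samples, "0" for negative samples).
    The field names and pruning heuristics here are illustrative assumptions."""
    associated = [(correct_translation, "1")]  # positive sample: correct translation
    sentences = [s for s in correct_translation.split(". ") if s]
    for _ in range(num_negatives):
        if len(sentences) > 1 and random.random() < 0.5:
            # sentence-level pruning: drop one whole sentence
            drop = random.randrange(len(sentences))
            negative = ". ".join(s for i, s in enumerate(sentences) if i != drop)
        else:
            # word-level pruning: drop roughly one word in five
            words = correct_translation.split()
            drop = set(random.sample(range(len(words)), max(1, len(words) // 5)))
            negative = " ".join(w for i, w in enumerate(words) if i not in drop)
        associated.append((negative, "0"))  # negative sample: translation-missing sentence
    return {"first_source_sentence": first_source_sentence, "associated": associated}

pair = make_first_sample_pair(
    "I didn't know if you'd have your baggage with you. I have no baggage.",
    "I don't know whether you carry baggage or not. I do not have luggage.")
```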
S102: and inputting a first spliced sentence obtained by splicing the first source sentence and one associated sentence into a first encoder of the initial translation model so as to obtain a predicted sentence output by a first sub-model of the initial translation model and a predicted label output by a second sub-model.
The first spliced sentence can be a sentence obtained by splicing the source sentence and one associated sentence.
Optionally, in the first sample pair, under the condition that the first sample pair includes a first associated statement corresponding to at least one first category tag and a second associated statement corresponding to at least one second category tag, the first source statement and the second associated statement are spliced to generate a first spliced statement.
The first category label may be the label corresponding to the positive sample, that is, to a correct translation result of the first source sentence. The second category label may be the label corresponding to the negative sample, that is, to a translation-missing result of the first source sentence.
The first associated sentence may be a correct translation sentence corresponding to the first source sentence. The second associated sentence may be a translation-missing sentence corresponding to the first source sentence.
It is understood that, in the case that the first sample pair includes at least one first associated sentence and at least one second associated sentence, the first source sentence and each second associated sentence may be spliced to obtain at least one first spliced sentence, and each first spliced sentence is sequentially input into the first encoder.
Or, under the condition that the first sample pair comprises one associated sentence and the label tag corresponding to the associated sentence is the first class tag, splicing the first source sentence with the associated sentence to generate a first spliced sentence.
It can be understood that, in the case that the first sample pair only includes one associated sentence corresponding to the first category tag, that is, a correctly translated sentence corresponding to the first source sentence, the first source sentence and the associated sentence are spliced to obtain the first spliced sentence.
In the embodiment of the disclosure, the associated sentence spliced with the first source sentence can be determined according to the category of the label corresponding to the associated sentence contained in the first sample pair, so that the associated sentence spliced with the first source sentence contains both a positive sample and a negative sample, and then the classification task can be trained on the basis of the translation task.
Wherein the initial translation model may be a model that has not been trained for the translation task. The initial translation model may include a first encoder, a first submodel, and a second submodel.
Wherein the first encoder is configured to encode the first concatenation statement. The first sub-model may be a decoder for training a translation task to obtain a predicted translation result corresponding to the first source sentence, i.e. the predicted sentence. The second sub-model may be a fully-connected network, and is used for training a classification task to obtain a prediction label corresponding to the associated sentence in the first concatenation sentence.
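The following PyTorch sketch shows one possible arrangement consistent with this description: a shared first encoder, a decoder as the first sub-model, and a fully-connected classification head as the second sub-model. Layer sizes, pooling choices, and class or method names are assumptions for illustration, not the patented implementation.

```python
import torch
import torch.nn as nn

class InitialTranslationModel(nn.Module):
    """Illustrative sketch: shared first encoder, translation decoder (first
    sub-model) and fully-connected classifier (second sub-model)."""

    def __init__(self, vocab_size, d_model=512, nhead=8, num_layers=6, num_labels=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.first_encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, nhead, batch_first=True), num_layers)
        # first sub-model: decoder for the translation task
        self.first_submodel = nn.TransformerDecoder(
            nn.TransformerDecoderLayer(d_model, nhead, batch_first=True), num_layers)
        self.generator = nn.Linear(d_model, vocab_size)
        # second sub-model: fully-connected network for the classification task
        self.second_submodel = nn.Sequential(
            nn.Linear(d_model, d_model), nn.ReLU(), nn.Linear(d_model, num_labels))

    def forward(self, spliced_ids, source_len, target_ids):
        hidden = self.first_encoder(self.embed(spliced_ids))
        first_vector = hidden[:, :source_len]    # encoding of the first source sentence
        second_vector = hidden.mean(dim=1)       # pooled encoding of the first spliced sentence
        decoded = self.first_submodel(self.embed(target_ids), first_vector)
        predicted_sentence_logits = self.generator(decoded)           # predicted sentence (token logits)
        predicted_label_logits = self.second_submodel(second_vector)  # predicted label (softmax applied in the loss)
        return predicted_sentence_logits, predicted_label_logits
```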
S103: and respectively correcting the first sub-model, the second sub-model and the first encoder according to a first difference between the prediction statement and the associated statement corresponding to the first class label and a second difference between the prediction label and the label corresponding to the associated statement in the first spliced statement to obtain a corrected translation model.
It can be understood that after the predicted sentence output by the first sub-model and the predicted tag output by the second sub-model are obtained, the initial translation model can be corrected according to the difference between the predicted sentence and the correct translation sentence corresponding to the first source sentence, that is, the first difference between the predicted sentence and the associated sentence corresponding to the first class tag, and the second difference between the predicted tag and the tagging tag corresponding to the associated sentence in the first spliced sentence, so as to obtain a corrected translation model. Therefore, the convergence and robustness of the initial translation model can be improved, the performance of the translation model is further improved, and the probability of translation missing is reduced.
In the embodiment of the disclosure, a first sample pair data set is first obtained, where each first sample pair includes a first source sentence, at least one associated sentence, and a label tag corresponding to each associated sentence, then the first concatenated sentence obtained by concatenating the first source sentence and one associated sentence is input into a first encoder of an initial translation model to obtain a predicted sentence output by a first submodel of the initial translation model and a predicted label output by a second submodel, and finally, according to a first difference between the predicted sentence and the associated sentence corresponding to the first category label and a second difference between the predicted label and the label tag corresponding to the associated sentence in the first concatenated sentence, the first submodel, the second submodel, and the first encoder are respectively corrected to obtain a corrected translation model. Therefore, the translation model is trained according to the first difference between the predicted sentence corresponding to the first source sentence and the correctly translated sentence and the second difference between the predicted label corresponding to the correctly translated sentence or the translation missing sentence and the label, so that the translation accuracy of the translation model is improved, and the translation missing probability is reduced.
Fig. 2 is a flowchart illustrating a method for generating a translation model according to another embodiment of the present disclosure.
As shown in fig. 2, the method for generating the translation model includes:
s201: and acquiring a first sample pair data set, wherein each first sample pair comprises a first source statement, at least one associated statement and a label tag corresponding to each associated statement.
S202: and inputting the first spliced sentence into a first encoder of the initial translation model to obtain a first vector corresponding to the first source sentence output by the first encoder and a second vector corresponding to the first spliced sentence.
Optionally, the first source statement, the separator, and an associated statement are sequentially spliced to obtain a first spliced statement. That is, when the first source sentence and the associated sentence are spliced, a separator [ sep ] may be added between the first source sentence and the associated sentence. Therefore, when the first encoder performs encoding, the position of the first source sentence in the first spliced sentence and the position of the associated sentence can be determined according to the separator. And then, the first source sentence can be coded, a corresponding first vector is output, the first spliced sentence is coded, and a second vector corresponding to the first spliced sentence is output.
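A minimal sketch of how the first spliced sentence might be assembled with a separator so that the encoder can locate the first source sentence and the associated sentence (the separator token and function name are assumptions):

```python
SEP = "[sep]"

def build_first_spliced_sentence(first_source_tokens, associated_tokens):
    """Splice the first source sentence, the separator, and one associated
    sentence in order, and record the spans used to read out the vectors."""
    spliced = first_source_tokens + [SEP] + associated_tokens
    source_span = (0, len(first_source_tokens))   # positions encoded into the first vector
    spliced_span = (0, len(spliced))              # positions encoded into the second vector
    return spliced, source_span, spliced_span

spliced, source_span, spliced_span = build_first_spliced_sentence(
    ["I", "have", "no", "baggage", "."],
    ["I", "do", "not", "have", "luggage", "."])
```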
S203: and inputting the first vector into the first submodel to obtain a prediction statement corresponding to the first vector.
The first sub-model may be a decoder, configured to decode the first vector to obtain a predicted sentence corresponding to the first source sentence, that is, a predicted translation result corresponding to the first source sentence.
For example, for the first source sentence "I didn't know if you'd have your baggage with you. I have no baggage.", the predicted sentence may be "I don't know whether you carry luggage. I do not have luggage.", or "I don't know if you are carrying luggage. I carry luggage.", etc.
S204: and inputting the second vector into a second submodel to obtain a prediction label corresponding to the second vector.
The prediction label may indicate either a positive sample or a negative sample. The present disclosure is not limited thereto.
The second sub-model may be composed of a fully-connected layer and a softmax layer, and is used to classify the first source sentence and the associated sentence in the first spliced sentence, that is, to predict whether the associated sentence in the first spliced sentence is a correct translation result of the first source sentence (a positive sample) or not (a negative sample).
S205: and correcting the first submodel according to a first difference between the prediction statement and the associated statement corresponding to the first class label.
Optionally, calculating the loss function of the first difference may be:
$$L_{MT} = -\sum_{(x_{ori1},\, y_{ori1}) \in D_1} \sum_{j} \log P\left(y_j \mid y_{<j},\, x_{ori1}\right)$$
wherein $L_{MT}$ is the first difference, $x_{ori1}$ is the first source sentence, $y_{ori1}$ is the associated sentence corresponding to the first category label, $y_j$ is the j-th word of the prediction sentence, and $D_1$ is the first sample pair data set.
It will be appreciated that after the first difference is determined, the first submodel may be modified to improve the performance of the first submodel based on the first difference.
S206: and correcting the second submodel according to a second difference between the predicted label and the label corresponding to the associated statement in the first spliced statement.
Optionally, a cross entropy loss function may be used to determine a second difference between the prediction tag and the label tag corresponding to the associated statement in the first concatenation statement.
It will be appreciated that after the second difference is determined, the second submodel may be modified based on the second difference to improve the performance of the second submodel.
S207: and correcting the first encoder according to the first difference, the first preset weight, the second difference and the second preset weight.
It should be noted that, since the classification task and the translation task both use the first encoder to perform encoding, the first encoder may be modified according to the first difference and the second difference, so as to improve the performance of the first encoder.
The first preset weight may be a preset weight value corresponding to the first difference. The second preset weight may be a preset weight value corresponding to the second difference.
Note that, the sum of the first preset weight and the second preset weight is 1. For example, if the first preset weight is 0.5, the second preset weight is 0.5; or, the first predetermined weight is 0.4, and the second predetermined weight is 0.6. The present disclosure is not limited thereto.
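A sketch of one possible correction step under the assumptions above: the first sub-model is updated with the first difference, the second sub-model with the second difference, and the shared first encoder with their weighted sum. The attribute names follow the hypothetical model sketched earlier; the plain gradient-descent update and learning rate are illustrative only.

```python
import torch

def correction_step(model, first_difference, second_difference,
                    first_weight=0.5, second_weight=0.5, lr=1e-4):
    """first_difference / second_difference are scalar loss tensors from one
    forward pass; the two preset weights are assumed to sum to 1."""
    assert abs(first_weight + second_weight - 1.0) < 1e-6
    encoder_loss = first_weight * first_difference + second_weight * second_difference

    dec_params = list(model.first_submodel.parameters())
    cls_params = list(model.second_submodel.parameters())
    enc_params = list(model.first_encoder.parameters())

    dec_grads = torch.autograd.grad(first_difference, dec_params, retain_graph=True)
    cls_grads = torch.autograd.grad(second_difference, cls_params, retain_graph=True)
    enc_grads = torch.autograd.grad(encoder_loss, enc_params)

    with torch.no_grad():  # plain SGD-style update for illustration
        for params, grads in ((dec_params, dec_grads), (cls_params, cls_grads), (enc_params, enc_grads)):
            for p, g in zip(params, grads):
                p -= lr * g
```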
It can be understood that the classification task and the translation task are utilized to train the translation model at the same time, and the coding accuracy of the first encoder can be improved, so that the first encoder can enable the associated sentences corresponding to the first category labels to be more similar to the coding of the first source sentences, and the associated sentences corresponding to the second category labels have larger difference with the coding of the first source sentences, thereby improving the translation accuracy of the translation model and reducing the translation missing probability.
In the embodiment of the disclosure, a first sample pair data set is first obtained, where each first sample pair includes a first source sentence, at least one associated sentence, and a label tag corresponding to each associated sentence, then the first spliced sentence is input into a first encoder of an initial translation model to obtain a first vector corresponding to the first source sentence output by the first encoder and a second vector corresponding to the first spliced sentence, the first vector is input into a first sub-model to obtain a predicted sentence corresponding to the first vector, the second vector is input into a second sub-model to obtain a predicted label corresponding to the second vector, finally the first sub-model is corrected according to a first difference between the predicted sentence and the associated sentence corresponding to the first type label, the second sub-model is corrected according to a second difference between the predicted label and the label tag corresponding to the associated sentence in the first spliced sentence, and the first encoder is corrected according to the first difference, a first preset weight, a second difference, and a second preset weight. Therefore, the first vector and the second vector are respectively input into the first sub-model and the second sub-model, the predicted sentence corresponding to the first source sentence and the predicted label corresponding to the associated sentence are obtained, and then the first encoder, the first sub-network and the second sub-network are corrected according to the first difference and the second difference, so that the capability of the first encoder for encoding the translation missing sentence and the first source sentence with large difference is further improved, the translation accuracy of the translation model is further improved, and the translation missing probability is further reduced.
From the above analysis, the present disclosure may train the initial translation model with the first sample to obtain a trained translation model. Before the initial translation model is trained, a pre-training mode can be adopted to pre-train the first encoder and the first sub-model in the initial translation model, so that the training efficiency of the translation model is improved. The following describes in detail the training process for obtaining the first encoder and the first sub-model in the initial translation model with reference to fig. 3.
FIG. 3 is a flowchart illustrating a method for generating a translation model according to another embodiment of the present disclosure;
as shown in fig. 3, the method for generating the translation model includes:
s301: and acquiring a second sample pair data set, wherein the second sample pair data set comprises a second source statement, a third associated statement corresponding to the first class label, a fourth associated statement corresponding to the second class label, a mask statement corresponding to the third associated statement and a masked word.
The second source sentence may be the sentence to be translated in the second sample pair. The second source sentence may be in English, Chinese, French, etc., which is not limited by this disclosure.
The third associated sentence corresponding to the first category label may be a correct translation sentence of the second source sentence. The fourth associated sentence corresponding to the second category label may be a translation-missing sentence of the second source sentence.
It should be noted that the second sample pair may include at least one third related sentence corresponding to the second source sentence and at least one fourth related sentence. The present disclosure is not limited thereto.
The mask sentence is obtained by masking some of the words in the third associated sentence. For example, 15% or 20% of the words in the third associated sentence may be randomly masked to obtain the mask sentence.
The masked words may be words corresponding to masked positions in the mask statement.
For example, if the second source sentence is "What weather is today? Today is sunny" and the third associated sentence is "What's the weather today? It's sunny today", then the corresponding mask sentence may be "What's the weather XXX? It's XXX today", where "XXX" denotes a masked position. The masked words are "today" and "sunny".
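A brief sketch of one way the mask sentence and masked words could be produced from the third associated sentence (the "XXX" placeholder follows the example above; the masking ratio and function name are assumptions):

```python
import random

MASK = "XXX"  # placeholder for a masked position

def make_mask_sentence(third_associated_tokens, mask_ratio=0.15, seed=None):
    """Randomly mask about 15% (or 20%) of the words in the third associated
    sentence; return the mask sentence and the (position, word) pairs masked."""
    rng = random.Random(seed)
    n = len(third_associated_tokens)
    k = max(1, int(n * mask_ratio))
    positions = set(rng.sample(range(n), k))
    mask_sentence = [MASK if i in positions else w
                     for i, w in enumerate(third_associated_tokens)]
    masked_words = [(i, third_associated_tokens[i]) for i in sorted(positions)]
    return mask_sentence, masked_words

tokens = "What's the weather today ? It's sunny today".split()
mask_sentence, masked_words = make_mask_sentence(tokens, mask_ratio=0.2, seed=0)
```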
S302: and inputting a second spliced statement, a second source statement, a third associated statement and a fourth associated statement after the second source statement and the mask statement are spliced into a second encoder of the initial model in sequence to obtain a third vector corresponding to the second spliced statement, a fourth vector corresponding to the second source statement, a fifth vector corresponding to the third associated statement and a sixth vector corresponding to the fourth associated statement.
The second spliced statement may be a statement obtained by splicing the second source statement and the mask statement.
The third vector is output after the second encoder encodes the second spliced sentence; the fourth vector is output after the second encoder encodes the second source sentence; the fifth vector is output after the second encoder encodes the third associated sentence; and the sixth vector is output after the second encoder encodes the fourth associated sentence.
It should be noted that the input data of the second encoder may include a plurality of fourth association statements, and therefore, the second encoder may output a plurality of sixth vectors.
S303: a first similarity between the fourth vector and the fifth vector and a second similarity between the fourth vector and the sixth vector are determined.
It should be noted that, in the embodiment of the present disclosure, any desirable manner may be adopted to obtain the first similarity between the fourth vector and the fifth vector, and the second similarity between the fourth vector and the sixth vector, which is not limited in the present disclosure. For example, a first similarity between the fourth vector and the fifth vector and a second similarity between the fourth vector and the sixth vector may be calculated using a euclidean distance formula or a manhattan distance formula.
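For instance, the two similarities could be computed from the encoder outputs as negative Euclidean or Manhattan distances; the sign convention (a larger value means more similar) is an assumption made here for illustration:

```python
import torch

def first_and_second_similarity(fourth_vector, fifth_vector, sixth_vector, metric="euclidean"):
    """fourth_vector: encoding of the second source sentence;
    fifth_vector: encoding of the third associated sentence (positive);
    sixth_vector: encoding of a fourth associated sentence (negative)."""
    p = 2 if metric == "euclidean" else 1  # p = 1 gives the Manhattan distance
    first_similarity = -torch.dist(fourth_vector, fifth_vector, p=p)
    second_similarity = -torch.dist(fourth_vector, sixth_vector, p=p)
    return first_similarity, second_similarity

first_sim, second_sim = first_and_second_similarity(
    torch.randn(512), torch.randn(512), torch.randn(512), metric="manhattan")
```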
S304: and inputting the third vector into a third sub-model of the initial model to obtain a predicted word corresponding to the masked position.
The third sub-model may be a decoder, configured to decode the third vector to predict a predicted word corresponding to a masked position in the mask statement.
S305: and respectively correcting the second encoder and the third submodel according to a third difference between the prediction words and the masked words and a fourth difference between the first similarity and the second similarity so as to generate a first encoder and a first submodel.
Optionally, the third sub-model may be modified according to a third difference between the prediction word and the masked word, and then the second encoder may be modified according to the third difference, the third preset weight, the fourth difference, and the fourth preset weight, that is, the target loss value is determined according to the third difference, the third preset weight, the fourth difference, and the fourth preset weight, and the second encoder is modified according to the target loss value.
Optionally, the loss function determining the third difference may be:
$$L_{3} = -\sum_{(x_{ori2},\, y_{ori3}) \in D_2} \sum_{y_m} \log P\left(y_t = y_m \mid x_{ori2},\, y_o\right)$$
wherein $L_{3}$ is the third difference, $x_{ori2}$ is the second source sentence, $y_{ori3}$ is the third associated sentence, $y_m$ denotes the masked words in the third associated sentence $y_{ori3}$, $y_o$ denotes the words in $y_{ori3}$ that are not masked, $y_t$ is the predicted word, and $D_2$ is the second sample pair data set.
Optionally, the loss function for determining the fourth difference may be:
$$L_{4} = -\sum_{(x_{ori2},\, y_{ori3}) \in D_2} \log \frac{\exp\left(f(x_{ori2},\, y_{ori3})\right)}{\exp\left(f(x_{ori2},\, y_{ori3})\right) + \sum_{y_{ant4}} \exp\left(f(x_{ori2},\, y_{ant4})\right)}$$
wherein $L_{4}$ is the fourth difference, $x_{ori2}$ is the second source sentence, $y_{ori3}$ is the third associated sentence, $y_{ant4}$ is a fourth associated sentence, $f(x_{ori2}, y_{ori3})$ is the first similarity, and $f(x_{ori2}, y_{ant4})$ is the second similarity.
Alternatively, the target loss value may be calculated by the formula:
$$L_{total} = \lambda_1 L_{3} + \lambda_2 L_{4}$$
wherein $L_{total}$ is the target loss value, $\lambda_1$ is the third preset weight, and $\lambda_2$ is the fourth preset weight.
The third preset weight may be a preset weight value corresponding to the third difference. The fourth preset weight may be a preset weight value corresponding to the fourth difference.
Note that the sum of the third preset weight and the fourth preset weight is 1. For example, if the third preset weight is 0.7, the fourth preset weight is 0.3; alternatively, if the third predetermined weight is 0.6, the fourth predetermined weight is 0.4. The present disclosure is not limited thereto.
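A minimal sketch of combining the third and fourth differences into the target loss value used to correct the second encoder (the example weights follow the text above and are otherwise arbitrary):

```python
def pretraining_target_loss(third_difference, fourth_difference,
                            third_weight=0.7, fourth_weight=0.3):
    """Weighted sum of the masked-word prediction loss (third difference) and
    the contrastive loss (fourth difference); the two weights must sum to 1."""
    assert abs(third_weight + fourth_weight - 1.0) < 1e-6
    return third_weight * third_difference + fourth_weight * fourth_difference
```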
In the embodiment of the present disclosure, the second encoder and the third sub-model are trained in combination with contrastive learning to generate the first encoder and the first sub-model, so that the vectors output by the first encoder for sentences with the same meaning are closer to each other, while the vectors for sentences with different meanings are farther apart. The encoding accuracy of the first encoder is thereby improved.
In the embodiment of the disclosure, a second sample pair data set is first obtained, then a second spliced statement, a second source statement, a third related statement and a fourth related statement in the second sample pair data set after splicing a second source statement and a mask statement are sequentially input into a second encoder of an initial model to obtain a third vector corresponding to the second spliced statement, a fourth vector corresponding to the second source statement, a fifth vector corresponding to the third related statement and a sixth vector corresponding to the fourth related statement, then a first similarity between the fourth vector and the fifth vector and a second similarity between the fourth vector and the sixth vector are determined, the third vector is input into a third sub-model of the initial model to obtain a predicted term corresponding to a masked position, and finally the second encoder and the third sub-model are respectively corrected according to a third difference between the predicted term and the masked term and a fourth difference between the first similarity and the second similarity to generate a first encoder and a first sub-model. Therefore, before the translation model is trained, the first encoder and the first sub-model are pre-trained through comparison and learning, so that the translation model has better model parameters before being trained, and conditions are provided for improving the training efficiency of the translation model.
Fig. 4 is a schematic structural diagram of an apparatus for generating a translation model according to an embodiment of the present disclosure;
as shown in fig. 4, the apparatus 400 for generating a translation model includes:
a first obtaining module 410, configured to obtain a data set of first sample pairs, where each first sample pair includes a first source statement, at least one associated statement, and a label tag corresponding to each associated statement;
a second obtaining module 420, configured to input the first spliced sentence obtained by splicing the first source sentence and one associated sentence into the first encoder of the initial translation model, so as to obtain a predicted sentence output by the first sub-model of the initial translation model and a predicted tag output by the second sub-model;
the first modification module 430 is configured to modify the first sub-model, the second sub-model, and the first encoder according to a first difference between the prediction statement and the associated statement corresponding to the first class tag and a second difference between the prediction tag and the label tag corresponding to the associated statement in the first spliced statement, so as to obtain a modified translation model.
In some embodiments of the present disclosure, the apparatus further includes a first generating module, specifically configured to:
responding to a first sample pair comprising a first associated statement corresponding to at least one first category label and a second associated statement corresponding to at least one second category label, and splicing the first source statement and the second associated statement to generate a first spliced statement; or,
and in response to that the first sample pair comprises one associated sentence and the label tag corresponding to the associated sentence is the first class tag, splicing the first source sentence with the associated sentence to generate a first spliced sentence.
In some embodiments of the present disclosure, the second obtaining module 420 is specifically configured to:
inputting the first spliced sentence into a first encoder of the initial translation model to obtain a first vector corresponding to a first source sentence output by the first encoder and a second vector corresponding to the first spliced sentence;
inputting the first vector into a first submodel to obtain a prediction statement corresponding to the first vector;
and inputting the second vector into a second submodel to obtain a prediction label corresponding to the second vector.
In some embodiments of the present disclosure, the first modification module 430 is specifically configured to:
correcting the first submodel according to a first difference between the prediction statement and the associated statement corresponding to the first class label;
correcting the second submodel according to a second difference between the predicted label and a label corresponding to the associated statement in the first spliced statement;
and correcting the first encoder according to the first difference, the first preset weight, the second difference and the second preset weight.
In some embodiments of the present disclosure, further comprising:
a third obtaining module, configured to obtain a second sample pair data set, where the second sample pair data set includes a second source statement, a third associated statement corresponding to the first category label, a fourth associated statement corresponding to the second category label, a mask statement corresponding to the third associated statement, and a masked word;
a fourth obtaining module, configured to sequentially input a second spliced statement, a second source statement, a third related statement, and a fourth related statement, which are obtained by splicing the second source statement and the mask statement, into a second encoder of the initial model, so as to obtain a third vector corresponding to the second spliced statement, a fourth vector corresponding to the second source statement, a fifth vector corresponding to the third related statement, and a sixth vector corresponding to the fourth related statement;
the first determining module is used for determining a first similarity between the fourth vector and the fifth vector and a second similarity between the fourth vector and the sixth vector;
the fifth obtaining module is used for inputting the third vector into a third sub-model of the initial model so as to obtain a predicted word corresponding to the masked position;
and the second correction module is used for respectively correcting the second encoder and the third submodel according to the third difference between the prediction words and the masked words and the fourth difference between the first similarity and the second similarity so as to generate the first encoder and the first submodel.
In some embodiments of the present disclosure, the second modification module is specifically configured to:
correcting the third submodel according to a third difference between the predicted words and the masked words;
and correcting the second encoder according to the third difference, the third preset weight, the fourth difference and the fourth preset weight.
It should be noted that the explanation of the generation method of the translation model described above is also applicable to the generation device of the translation model of the present embodiment, and details are not described here.
In the embodiment of the disclosure, a first sample pair data set is first obtained, where each first sample pair includes a first source sentence, at least one associated sentence, and a label tag corresponding to each associated sentence, then the first concatenated sentence obtained by concatenating the first source sentence and one associated sentence is input into a first encoder of an initial translation model to obtain a predicted sentence output by a first submodel of the initial translation model and a predicted label output by a second submodel, and finally, according to a first difference between the predicted sentence and the associated sentence corresponding to the first category label and a second difference between the predicted label and the label tag corresponding to the associated sentence in the first concatenated sentence, the first submodel, the second submodel, and the first encoder are respectively corrected to obtain a corrected translation model. Therefore, the translation model is trained according to the first difference between the predicted sentence corresponding to the first source sentence and the correctly translated sentence and the second difference between the predicted label corresponding to the correctly translated sentence or the translation missing sentence and the label, so that the translation accuracy of the translation model is improved, and the translation missing probability is reduced.
The present disclosure also provides an electronic device, a readable storage medium, and a computer program product according to embodiments of the present disclosure.
FIG. 5 illustrates a schematic block diagram of an example electronic device 500 that can be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital processing, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 5, the apparatus 500 comprises a computing unit 501 which may perform various appropriate actions and processes in accordance with a computer program stored in a Read Only Memory (ROM) 502 or a computer program loaded from a storage unit 508 into a Random Access Memory (RAM) 503. In the RAM 503, various programs and data required for the operation of the device 500 can also be stored. The calculation unit 501, the ROM 502, and the RAM 503 are connected to each other by a bus 504. An input/output (I/O) interface 505 is also connected to bus 504.
A number of components in the device 500 are connected to the I/O interface 505, including: an input unit 506 such as a keyboard, a mouse, or the like; an output unit 507 such as various types of displays, speakers, and the like; a storage unit 508, such as a magnetic disk, optical disk, or the like; and a communication unit 509 such as a network card, modem, wireless communication transceiver, etc. The communication unit 509 allows the device 500 to exchange information/data with other devices via a computer network such as the internet and/or various telecommunication networks.
The computing unit 501 may be a variety of general-purpose and/or special-purpose processing components having processing and computing capabilities. Some examples of the computing unit 501 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various dedicated Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, and so forth. The calculation unit 501 performs the respective methods and processes described above, such as generation of a translation model. For example, in some embodiments, the generation of the translation model may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as storage unit 508. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 500 via the ROM 502 and/or the communication unit 509. When the computer program is loaded into the RAM 503 and executed by the computing unit 501, one or more steps of the generation of the translation model described above may be performed. Alternatively, in other embodiments, the computing unit 501 may be configured to perform the generation of the translation model in any other suitable manner (e.g., by means of firmware).
Various implementations of the systems and techniques described above may be implemented in digital electronic circuitry, integrated circuitry, field-programmable gate arrays (FPGAs), application-specific integrated circuits (ASICs), application-specific standard products (ASSPs), systems on a chip (SOCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special-purpose or general-purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. These program codes may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program codes, when executed by the processor or controller, cause the functions/operations specified in the flowchart and/or block diagram to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), wide Area Networks (WANs), the Internet, and blockchain networks.
The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The Server can be a cloud Server, also called a cloud computing Server or a cloud host, and is a host product in a cloud computing service system, so as to solve the defects of high management difficulty and weak service expansibility in the traditional physical host and VPS service ("Virtual Private Server", or simply "VPS"). The server may also be a server of a distributed system, or a server incorporating a blockchain.
In this embodiment, first a data set of a first sample pair is obtained, where each first sample pair includes a first source sentence, at least one associated sentence, and a label tag corresponding to each associated sentence, then the first concatenated sentence obtained by concatenating the first source sentence and one associated sentence is input into a first encoder of an initial translation model to obtain a predicted sentence output by a first submodel of the initial translation model and a predicted label output by a second submodel, and finally, according to a first difference between the predicted sentence and the associated sentence corresponding to the first category label and a second difference between the predicted label and the label tag corresponding to the associated sentence in the first concatenated sentence, the first submodel, the second submodel, and the first encoder are corrected, respectively, to obtain a corrected translation model. Therefore, the translation model is trained according to the first difference between the predicted sentence corresponding to the first source sentence and the correctly translated sentence and the second difference between the predicted label corresponding to the correctly translated sentence or the translation missing sentence and the label, so that the translation accuracy of the translation model is improved, and the translation missing probability is reduced.
It should be understood that the various forms of flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present disclosure may be executed in parallel, sequentially, or in a different order, and are not limited herein as long as the desired results of the technical solutions disclosed in the present disclosure can be achieved.
Furthermore, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present disclosure, "a plurality" means at least two, e.g., two, three, etc., unless explicitly and specifically limited otherwise. In the description of the present disclosure, the word "if" may be interpreted as "when", "upon", "in response to determining", or "in the case of".
The above detailed description should not be construed as limiting the scope of the disclosure. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present disclosure should be included in the scope of protection of the present disclosure.

Claims (13)

1. A method of generating a translation model, comprising:
acquiring a first sample pair data set, wherein each first sample pair comprises a first source statement, at least one associated statement and a label corresponding to each associated statement;
inputting a first spliced sentence obtained by splicing the first source sentence with an associated sentence into a first encoder of an initial translation model to obtain a predicted sentence output by a first sub-model of the initial translation model and a predicted label output by a second sub-model;
correcting the first submodel according to a first difference between the prediction statement and an associated statement corresponding to a first class label;
correcting the second submodel according to a second difference between the predicted label and a label corresponding to an associated statement in the first spliced statement;
and correcting the first encoder according to the first difference, the first preset weight, the second difference and the second preset weight.
2. The method of claim 1, wherein before the inputting of the first spliced sentence, obtained by splicing the first source sentence with an associated sentence, into the first encoder of the initial translation model, the method further comprises:
in response to the first sample pair including a first associated statement corresponding to at least one first category label and a second associated statement corresponding to at least one second category label, splicing the first source statement and the second associated statement to generate the first spliced statement; or,
and in response to the first sample pair comprising one associated sentence whose label tag is the first category label, splicing the first source sentence and the associated sentence to generate the first spliced sentence.
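Purely as an illustrative reading of claim 2, a helper such as the following (hypothetical names, label strings, and separator token) selects which associated sentence is spliced with the first source sentence:

    from typing import List, Optional, Tuple

    FIRST_CATEGORY = "first_category_label"      # hypothetical label tag values
    SECOND_CATEGORY = "second_category_label"

    def build_first_spliced_sentence(source: str,
                                     associated: List[Tuple[str, str]],
                                     sep: str = " [SEP] ") -> Optional[str]:
        """associated: (sentence, label_tag) pairs belonging to one first sample pair."""
        first = [s for s, tag in associated if tag == FIRST_CATEGORY]
        second = [s for s, tag in associated if tag == SECOND_CATEGORY]
        if first and second:
            # Both categories present: splice the source with a second-category sentence.
            return source + sep + second[0]
        if len(associated) == 1 and associated[0][1] == FIRST_CATEGORY:
            # A single associated sentence carrying the first-category label.
            return source + sep + associated[0][0]
        return None   # no first spliced sentence is generated in other cases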
3. The method of claim 1, wherein the inputting of the first spliced sentence, obtained by splicing the first source sentence with an associated sentence, into the first encoder of the initial translation model to obtain the predicted sentence output by the first sub-model of the initial translation model and the predicted label output by the second sub-model comprises:
inputting the first spliced statement into a first encoder of the initial translation model to obtain a first vector corresponding to the first source statement output by the first encoder and a second vector corresponding to the first spliced statement;
inputting the first vector into the first submodel to obtain a prediction statement corresponding to the first vector;
and inputting the second vector into the second submodel to obtain a prediction label corresponding to the second vector.
4. The method of claim 1, further comprising:
acquiring a second sample pair data set, wherein the second sample pair data set comprises a second source statement, a third associated statement corresponding to the first class label, a fourth associated statement corresponding to the second class label, a mask statement corresponding to the third associated statement and a masked word;
sequentially inputting a second spliced sentence obtained by splicing the second source sentence with the mask sentence, the second source sentence, the third associated sentence, and the fourth associated sentence into a second encoder of an initial model, to obtain a third vector corresponding to the second spliced sentence, a fourth vector corresponding to the second source sentence, a fifth vector corresponding to the third associated sentence, and a sixth vector corresponding to the fourth associated sentence;
determining a first similarity between the fourth vector and the fifth vector and a second similarity between the fourth vector and the sixth vector;
inputting the third vector into a third sub-model of the initial model to obtain a predicted word corresponding to the masked position;
and respectively correcting the second encoder and the third submodel according to a third difference between the prediction words and the masked words and a fourth difference between the first similarity and the second similarity so as to generate the first encoder and the first submodel.
5. The method of claim 4, wherein the correcting of the second encoder and the third submodel according to a third difference between the predicted word and the masked word and a fourth difference between the first similarity and the second similarity, respectively, comprises:
correcting the third submodel according to a third difference between the predicted word and the masked word;
and correcting the second encoder according to the third difference, the third preset weight, the fourth difference and the fourth preset weight.
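As an illustrative sketch of the pre-training in claims 4 and 5 (not the exact formulation of the disclosure), the following uses toy modules, cosine similarity for the two similarities, a margin form for the fourth difference, and illustrative preset weights:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    VOCAB, HIDDEN = 1000, 64

    embed = nn.Embedding(VOCAB, HIDDEN)
    gru = nn.GRU(HIDDEN, HIDDEN, batch_first=True)           # stand-in for the second encoder
    third_submodel = nn.Linear(HIDDEN, VOCAB)                 # predicts the masked word

    def encode(token_ids):
        states, _ = gru(embed(token_ids))
        return states                                         # (batch, seq_len, hidden)

    # Toy inputs: the second spliced sentence (second source sentence + mask sentence),
    # the second source sentence, the third and fourth associated sentences.
    spliced_ids = torch.randint(0, VOCAB, (1, 12))
    source_ids = torch.randint(0, VOCAB, (1, 6))
    third_ids = torch.randint(0, VOCAB, (1, 6))
    fourth_ids = torch.randint(0, VOCAB, (1, 6))
    masked_pos, masked_word = 8, torch.tensor([42])           # position and id of the masked word

    third_vector = encode(spliced_ids)                        # vectors of the second spliced sentence
    fourth_vector = encode(source_ids).mean(dim=1)            # pooled second source sentence
    fifth_vector = encode(third_ids).mean(dim=1)              # pooled third associated sentence
    sixth_vector = encode(fourth_ids).mean(dim=1)             # pooled fourth associated sentence

    first_similarity = F.cosine_similarity(fourth_vector, fifth_vector)
    second_similarity = F.cosine_similarity(fourth_vector, sixth_vector)

    predicted_word_logits = third_submodel(third_vector[:, masked_pos, :])
    third_difference = F.cross_entropy(predicted_word_logits, masked_word)
    # One possible reading of the fourth difference: a margin pushing the source
    # closer to the third associated sentence than to the fourth one.
    fourth_difference = torch.relu(second_similarity - first_similarity + 0.2).mean()

    w3, w4 = 0.5, 0.5                                         # third and fourth preset weights (illustrative)
    submodel_loss = third_difference                          # corrects the third sub-model
    encoder_loss = w3 * third_difference + w4 * fourth_difference   # corrects the second encoder

    optimizer = torch.optim.Adam(list(embed.parameters()) + list(gru.parameters())
                                 + list(third_submodel.parameters()), lr=1e-4)
    optimizer.zero_grad()
    # For brevity, one combined backward; the claim corrects the third sub-model from
    # the third difference and the second encoder from the weighted sum of both.
    (submodel_loss + encoder_loss).backward()
    optimizer.step()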
6. An apparatus for generating a translation model, comprising:
a first obtaining module, configured to obtain a first sample pair data set, wherein each first sample pair comprises a first source statement, at least one associated statement, and a label tag corresponding to each associated statement;
a second obtaining module, configured to input a first spliced statement obtained by splicing the first source statement with one associated statement into a first encoder of an initial translation model, so as to obtain a predicted statement output by a first sub-model of the initial translation model and a predicted label output by a second sub-model;
a first correcting module, configured to correct the first sub-model according to a first difference between the predicted statement and the associated statement corresponding to the first class label; correct the second sub-model according to a second difference between the predicted label and the label tag corresponding to the associated statement in the first spliced statement; and correct the first encoder according to the first difference, the first preset weight, the second difference, and the second preset weight.
7. The apparatus according to claim 6, further comprising a first generating module, specifically configured to:
in response to the first sample pair including a first associated statement corresponding to at least one first category label and a second associated statement corresponding to at least one second category label, splicing the first source statement and the second associated statement to generate the first spliced statement; or,
and in response to the first sample pair comprising one associated sentence whose label tag is the first category label, splicing the first source sentence and the associated sentence to generate the first spliced sentence.
8. The apparatus according to claim 6, wherein the second obtaining module is specifically configured to:
inputting the first spliced statement into a first encoder of the initial translation model to obtain a first vector corresponding to the first source statement output by the first encoder and a second vector corresponding to the first spliced statement;
inputting the first vector into the first submodel to obtain a prediction statement corresponding to the first vector;
and inputting the second vector into the second submodel to obtain a prediction label corresponding to the second vector.
9. The apparatus of claim 6, further comprising:
a third obtaining module, configured to obtain a second sample pair data set, where the second sample pair data set includes a second source statement, a third associated statement corresponding to the first class label, a fourth associated statement corresponding to the second class label, a mask statement corresponding to the third associated statement, and a masked word;
a fourth obtaining module, configured to sequentially input a second spliced statement obtained by splicing the second source statement with the mask statement, the second source statement, the third associated statement, and the fourth associated statement into a second encoder of an initial model, so as to obtain a third vector corresponding to the second spliced statement, a fourth vector corresponding to the second source statement, a fifth vector corresponding to the third associated statement, and a sixth vector corresponding to the fourth associated statement;
a first determining module, configured to determine a first similarity between the fourth vector and the fifth vector, and a second similarity between the fourth vector and the sixth vector;
a fifth obtaining module, configured to input the third vector into a third sub-model of the initial model to obtain a predicted word corresponding to the masked position;
a second correcting module, configured to correct the second encoder and the third sub-model according to a third difference between the prediction word and the masked word and a fourth difference between the first similarity and the second similarity, so as to generate the first encoder and the first sub-model.
10. The apparatus according to claim 9, wherein the second correcting module is specifically configured to:
correcting the third submodel according to a third difference between the predicted word and the masked word;
and correcting the second encoder according to the third difference, the third preset weight, the fourth difference and the fourth preset weight.
11. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-5.
12. A non-transitory computer readable storage medium having stored thereon computer instructions for causing a computer to perform the method of any one of claims 1-5.
13. A computer program product comprising computer instructions which, when executed by a processor, carry out the steps of the method of any one of claims 1 to 5.
CN202210490580.1A 2022-05-06 2022-05-06 Translation model generation method and device, electronic equipment and storage medium Active CN114896993B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210490580.1A CN114896993B (en) 2022-05-06 2022-05-06 Translation model generation method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210490580.1A CN114896993B (en) 2022-05-06 2022-05-06 Translation model generation method and device, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN114896993A CN114896993A (en) 2022-08-12
CN114896993B true CN114896993B (en) 2023-03-24

Family

ID=82719295

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210490580.1A Active CN114896993B (en) 2022-05-06 2022-05-06 Translation model generation method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN114896993B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111666416A (en) * 2019-03-08 2020-09-15 百度在线网络技术(北京)有限公司 Method and apparatus for generating semantic matching model
CN113763937A (en) * 2021-10-27 2021-12-07 北京百度网讯科技有限公司 Method, device and equipment for generating voice processing model and storage medium

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110263349A (en) * 2019-03-08 2019-09-20 腾讯科技(深圳)有限公司 Corpus assessment models training method, device, storage medium and computer equipment
CN113297841A (en) * 2021-05-24 2021-08-24 哈尔滨工业大学 Neural machine translation method based on pre-training double-word vectors
CN113822084A (en) * 2021-07-15 2021-12-21 腾讯科技(深圳)有限公司 Statement translation method and device, computer equipment and storage medium
CN113807106B (en) * 2021-08-31 2023-03-07 北京百度网讯科技有限公司 Translation model training method and device, electronic equipment and storage medium

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111666416A (en) * 2019-03-08 2020-09-15 百度在线网络技术(北京)有限公司 Method and apparatus for generating semantic matching model
CN113763937A (en) * 2021-10-27 2021-12-07 北京百度网讯科技有限公司 Method, device and equipment for generating voice processing model and storage medium

Also Published As

Publication number Publication date
CN114896993A (en) 2022-08-12

Similar Documents

Publication Publication Date Title
CN112560501A (en) Semantic feature generation method, model training method, device, equipment and medium
CN112528655B (en) Keyword generation method, device, equipment and storage medium
CN113553412B (en) Question-answering processing method, question-answering processing device, electronic equipment and storage medium
CN115640520B (en) Pre-training method, device and storage medium of cross-language cross-modal model
CN113887627A (en) Noise sample identification method and device, electronic equipment and storage medium
US20230114673A1 (en) Method for recognizing token, electronic device and storage medium
EP4170542A2 (en) Method for sample augmentation
CN114417879B (en) Method and device for generating cross-language text semantic model and electronic equipment
CN115358243A (en) Training method, device, equipment and storage medium for multi-round dialogue recognition model
CN113743101B (en) Text error correction method, apparatus, electronic device and computer storage medium
CN113407610B (en) Information extraction method, information extraction device, electronic equipment and readable storage medium
CN113407698B (en) Method and device for training and recognizing intention of intention recognition model
CN112507705B (en) Position code generation method and device and electronic equipment
CN112560846B (en) Error correction corpus generation method and device and electronic equipment
CN112949818A (en) Model distillation method, device, equipment and storage medium
CN114896993B (en) Translation model generation method and device, electronic equipment and storage medium
CN113408303B (en) Training and translation method and device for translation model
CN113807091B (en) Word mining method and device, electronic equipment and readable storage medium
CN115565186A (en) Method and device for training character recognition model, electronic equipment and storage medium
CN112905917B (en) Inner chain generation method, model training method, related device and electronic equipment
CN114817476A (en) Language model training method and device, electronic equipment and storage medium
CN114792097A (en) Method and device for determining prompt vector of pre-training model and electronic equipment
CN113204616A (en) Method and device for training text extraction model and extracting text
CN114973279B (en) Training method and device for handwritten text image generation model and storage medium
CN113553863B (en) Text generation method, device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant