CN113822084A - Statement translation method and device, computer equipment and storage medium - Google Patents

Statement translation method and device, computer equipment and storage medium

Info

Publication number
CN113822084A
Authority
CN
China
Prior art keywords
sample
sentence
statement
word
translation model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110801955.7A
Other languages
Chinese (zh)
Inventor
周楚伦
孟凡东
苏劲松
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN202110801955.7A
Publication of CN113822084A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/40 Processing or translation of natural language
    • G06F40/58 Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/12 Use of codes for handling textual entities
    • G06F40/126 Character encoding

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Machine Translation (AREA)

Abstract

The embodiment of the application discloses a sentence translation method and device, a computer device and a storage medium, belonging to the field of computer technology. The method includes: obtaining a first prediction result based on a translation model, a first sample sentence and a second sample sentence; obtaining a third sample sentence, a fourth sample sentence and a first sample relation label; obtaining a first prediction relation label based on the translation model, the third sample sentence and the fourth sample sentence; and adjusting the translation model based on the first prediction result, the first prediction relation label and the first sample relation label. With the method provided by the embodiment of the application, when the translation model is trained, its ability to analyze sentences that have an association relation is improved while its translation ability is preserved, so that this analysis ability can be drawn on when sentences are subsequently translated based on the translation model, which improves the accuracy of the translation model.

Description

Statement translation method and device, computer equipment and storage medium
Technical Field
The embodiment of the application relates to the technical field of computers, in particular to a sentence translation method, a sentence translation device, computer equipment and a storage medium.
Background
Against the backdrop of globalization and the information age, translation work is highly significant. As the translation workload keeps growing, translation models have emerged. Compared with manual translation, a translation model is more efficient and therefore more widely used. However, as people's requirements for translation accuracy keep rising, how to improve the accuracy of translation models has become a problem that urgently needs to be solved.
Disclosure of Invention
The embodiment of the application provides a sentence translation method and device, computer equipment and a storage medium, and can improve the accuracy of a translation model. The technical scheme is as follows:
in one aspect, a statement translation method is provided, and the method includes:
obtaining a first prediction result based on a translation model, a first sample sentence, and a second sample sentence, the first prediction result indicating a possibility of translating the first sample sentence into the second sample sentence based on the translation model, the first sample sentence having the same meaning as the second sample sentence, and the first sample sentence belonging to a source language, the second sample sentence belonging to a target language;
obtaining a third sample sentence, a fourth sample sentence and a first sample relation label, wherein the first sample relation label indicates whether the third sample sentence and the fourth sample sentence have an association relation, and the third sample sentence and the fourth sample sentence both belong to the source language;
obtaining a first prediction relationship label based on the translation model, the third sample statement and the fourth sample statement, the first prediction relationship label indicating a prediction association between the third sample statement and the fourth sample statement;
adjusting the translation model based on the first prediction result, the first prediction relationship label, and the first sample relationship label.
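As a hedged illustration of the four steps above, the sketch below pairs a translation objective (scoring the first prediction result) with a relation-classification objective (comparing the first prediction relation label against the first sample relation label) and adjusts the model on both. The toy GRU-based model, the relation head, and all names are illustrative stand-ins under assumed interfaces, not the patent's architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyTranslationModel(nn.Module):
    """Stand-in translation model: encode() returns one feature per token,
    forward() returns vocabulary logits for each position of the target sentence."""
    def __init__(self, vocab=100, dim=32):
        super().__init__()
        self.embed = nn.Embedding(vocab, dim)
        self.encoder = nn.GRU(dim, dim, batch_first=True)
        self.out = nn.Linear(dim, vocab)

    def encode(self, ids):
        return self.encoder(self.embed(ids))[0]

    def forward(self, src_ids, tgt_ids):
        context = self.encode(src_ids)[:, -1:, :]        # crude source summary
        return self.out(self.encode(tgt_ids) + context)

model = ToyTranslationModel()
relation_head = nn.Linear(32, 1)                         # classifies the start-token feature
optimizer = torch.optim.Adam(list(model.parameters()) + list(relation_head.parameters()))

def training_step(src_ids, tgt_ids, third_ids, fourth_ids, relation_label):
    # First prediction result: how likely the model translates src into tgt.
    logits = model(src_ids, tgt_ids)
    translation_loss = F.cross_entropy(logits.reshape(-1, logits.size(-1)), tgt_ids.reshape(-1))
    # First prediction relation label: classify the first feature vector of the
    # spliced third + fourth sample sentences (start character assumed to have id 0).
    spliced = torch.cat([torch.zeros_like(third_ids[:, :1]), third_ids, fourth_ids], dim=1)
    relation_logit = relation_head(model.encode(spliced)[:, 0]).squeeze(-1)
    relation_loss = F.binary_cross_entropy_with_logits(relation_logit, relation_label)
    # Adjust the translation model with both objectives.
    loss = translation_loss + relation_loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

print(training_step(torch.tensor([[5, 6, 7]]), torch.tensor([[8, 9]]),
                    torch.tensor([[10, 11]]), torch.tensor([[12, 13]]),
                    torch.tensor([1.0])))
```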
In one possible implementation manner, the obtaining a second loss value based on the first prediction probability and the first sample relationship label includes:
in response to the first sample relationship label being a first positive sample relationship label, obtaining the second loss value based on the first prediction probability, the first positive sample relationship label indicating that the third sample statement and the fourth sample statement belong to the same group of dialogs, the first prediction probability and the second loss value having a negative correlation; or,
in response to the first sample relation label being a first negative sample relation label, obtaining the second loss value based on a difference between a target value and the first prediction probability, wherein the first negative sample relation label indicates that the third sample statement and the fourth sample statement do not belong to the same group of conversations, and there is a negative correlation between the difference and the second loss value.
In another possible implementation manner, the obtaining a third loss value based on the second prediction probability and the first sample relationship label includes:
in response to the first sample relationship label being a second positive sample relationship label, obtaining the third loss value based on the second prediction probability, where the second positive sample relationship label indicates that the third sample statement and the fourth sample statement are issued by the same interlocutor, and the second prediction probability and the third loss value are in a negative correlation; or,
in response to the first sample relation label being a second negative sample relation label, obtaining the third loss value based on a difference between a target value and the second prediction probability, wherein the second negative sample relation label indicates that the third sample statement and the fourth sample statement are not issued by the same interlocutor, and the difference and the third loss value are in a negative correlation.
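Both loss definitions above share the same shape: a positive pair is penalized when its predicted probability is low, and a negative pair is penalized when the difference between the target value and the predicted probability is small. A minimal sketch, assuming the target value is 1 and a negative-log form (the patent does not fix the exact function):

```python
import math

def relation_loss(prediction_probability: float, is_positive_label: bool,
                  target_value: float = 1.0, eps: float = 1e-12) -> float:
    if is_positive_label:
        # Higher predicted probability -> lower loss (negative correlation).
        return -math.log(prediction_probability + eps)
    # Larger difference (target value - probability) -> lower loss, so confident
    # predictions for pairs labeled as unrelated are penalized.
    return -math.log(target_value - prediction_probability + eps)

print(relation_loss(0.9, True))    # small loss: confident and correct on a positive pair
print(relation_loss(0.9, False))   # larger loss: confident but wrong on a negative pair
```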
In another possible implementation manner, the obtaining a first translation term based on the translation model and the seventh coding feature includes:
coding an initial character based on the translation model to obtain an eighth coding feature corresponding to the initial character;
fusing the seventh coding feature and the eighth coding feature based on the translation model to obtain a fused feature;
and decoding the fusion characteristics based on the translation model to obtain a first translation word.
In another possible implementation manner, the obtaining a next translation word based on the translation model, the seventh coding feature, and the first translation word includes:
based on the translation model, encoding the initial character and the translation words obtained currently to obtain a ninth encoding characteristic;
fusing the seventh coding feature and the ninth coding feature based on the translation model to obtain a fused feature;
and decoding the fusion characteristics based on the translation model to obtain the next translation word.
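Taken together, the two implementations above describe a word-by-word decoding loop in which the source-side seventh coding feature stays fixed while the start character plus the translation words produced so far are re-encoded at each step. A hedged sketch, with a toy stand-in for the model's encode/fuse/decode operations (names are illustrative, not from the patent):

```python
def greedy_translate(model, source_feature, start_id, end_id, max_len=50):
    produced = [start_id]                                   # start character
    for _ in range(max_len):
        target_feature = model.encode_target(produced)      # eighth / ninth coding feature
        fused = model.fuse(source_feature, target_feature)  # fusion feature
        next_word = model.decode(fused)                     # next translation word
        if next_word == end_id:
            break
        produced.append(next_word)
    return produced[1:]                                     # drop the start character

# Toy stand-in whose "decoding" just emits a fixed word sequence, to show the loop shape.
class ToyModel:
    def __init__(self, words):
        self.words = list(words)
    def encode_target(self, produced):
        return len(produced)
    def fuse(self, source_feature, target_feature):
        return target_feature
    def decode(self, fused):
        return self.words[fused - 1] if fused - 1 < len(self.words) else -1

print(greedy_translate(ToyModel([11, 12, 13]), source_feature=None, start_id=0, end_id=-1))
# -> [11, 12, 13]
```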
In another aspect, there is provided a sentence translating apparatus, the apparatus including:
an obtaining module configured to obtain a first prediction result based on a translation model, a first sample sentence, and a second sample sentence, the first prediction result indicating a possibility of translating the first sample sentence into the second sample sentence based on the translation model, the first sample sentence having the same meaning as the second sample sentence, and the first sample sentence belonging to a source language, the second sample sentence belonging to a target language;
the obtaining module is further configured to obtain a third sample sentence, a fourth sample sentence, and a first sample relationship tag, where the first sample relationship tag indicates whether there is an association relationship between the third sample sentence and the fourth sample sentence, and the third sample sentence and the fourth sample sentence both belong to the source language;
the obtaining module is further configured to obtain a first prediction relationship label based on the translation model, the third sample statement, and the fourth sample statement, where the first prediction relationship label indicates a prediction association relationship between the third sample statement and the fourth sample statement;
an adjusting module, configured to adjust the translation model based on the first prediction result, the first prediction relationship label, and the first sample relationship label.
In one possible implementation manner, the obtaining module includes:
the splicing unit is used for splicing the third sample statement and the fourth sample statement to obtain a first spliced statement;
a coding unit, configured to code the first spliced sentence based on the translation model to obtain a first coding feature, where the first coding feature includes a plurality of first feature vectors, a first one of the first feature vectors corresponds to a start character located before the first spliced sentence, each of the first feature vectors except for the first one of the first feature vectors corresponds to a first word, the first word is a word in the first spliced sentence, and each of the first feature vectors is obtained by weighted fusion of a word vector of the start character and word vectors of the first words;
and the classification unit is used for classifying the first feature vector to obtain the first prediction relation label.
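A minimal sketch of the splicing unit and classification unit above, assuming token-id lists, a start character with id 0, and a sigmoid classifier over the first of the first feature vectors; the weights are random placeholders rather than trained parameters.

```python
import numpy as np

START_ID = 0                                          # assumed id of the start character

def splice(third_ids, fourth_ids):
    return [START_ID] + third_ids + fourth_ids        # first spliced sentence

def classify_first_vector(first_coding_feature, w, b):
    cls_vector = first_coding_feature[0]                 # first of the first feature vectors
    prob = 1.0 / (1.0 + np.exp(-(cls_vector @ w + b)))   # first prediction probability
    return 1 if prob >= 0.5 else 0                       # first prediction relation label

rng = np.random.default_rng(3)
feature = rng.normal(size=(6, 8))                     # one feature vector per spliced token
print(splice([11, 12], [13, 14, 15]))
print(classify_first_vector(feature, rng.normal(size=8), 0.0))
```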
In another possible implementation, the first sample relationship label indicates whether the third sample statement and the fourth sample statement belong to the same group of dialogs; the classification unit is configured to classify a first one of the first feature vectors based on a first classification model to obtain a first prediction probability, where the first prediction probability indicates a possibility that the third sample sentence and the fourth sample sentence belong to the same group of dialogues;
the adjustment module includes:
a first obtaining unit configured to obtain a first loss value based on the first prediction result;
the first obtaining unit is further configured to obtain a second loss value based on the first prediction probability and the first sample relationship label;
a first adjusting unit, configured to adjust the translation model and the first classification model based on the first loss value and the second loss value.
In another possible implementation manner, the first obtaining unit is configured to obtain the second loss value based on the first prediction probability in response to that the first sample relation label is a first positive sample relation label, where the first positive sample relation label indicates that the third sample statement and the fourth sample statement belong to the same group of dialogs, and the first prediction probability and the second loss value are in a negative correlation; or, in response to that the first sample relation label is a first negative sample relation label, obtaining the second loss value based on a difference between a target value and the first prediction probability, where the first negative sample relation label indicates that the third sample statement and the fourth sample statement do not belong to the same group of conversations, and a negative correlation exists between the difference and the second loss value.
In another possible implementation manner, the third sample statement and the fourth sample statement belong to the same group of dialogues, and the first sample relationship tag indicates whether the third sample statement and the fourth sample statement are issued by the same interlocutor; the classification unit is configured to classify the first feature vector based on a second classification model to obtain a second prediction probability, where the second prediction probability indicates a possibility that the third sample statement and the fourth sample statement are issued by the same interlocutor;
the adjustment module includes:
a first obtaining unit configured to obtain a first loss value based on the first prediction result;
the first obtaining unit is further configured to obtain a third loss value based on the second prediction probability and the first sample relationship label;
a first adjusting unit, configured to adjust the translation model and the second classification model based on the first loss value and the third loss value.
In another possible implementation manner, the first obtaining unit is configured to obtain the third loss value based on the second prediction probability in response to that the first sample relationship label is a second positive sample relationship label, where the second positive sample relationship label indicates that the third sample statement and the fourth sample statement are issued by the same interlocutor, and the second prediction probability and the third loss value have a negative correlation; or, in response to that the first sample relationship label is a second negative sample relationship label, obtaining the third loss value based on a difference between a target value and the second prediction probability, where the second negative sample relationship label indicates that the third sample statement and the fourth sample statement are not issued by the same interlocutor, and a negative correlation exists between the difference and the third loss value.
In another possible implementation manner, the encoding unit is configured to perform feature extraction on the first spliced statement based on a feature extraction submodel in the translation model to obtain a second encoding feature, where the second encoding feature includes a plurality of word vectors, and the word vectors include a word vector of the start character and a plurality of word vectors of the first word; for each of the word vectors: based on a coding sub-model in the translation model, carrying out weighted fusion on a plurality of word vectors, and fusing the vector subjected to weighted fusion with the word vectors to obtain first feature vectors corresponding to the word vectors; and forming the first coding feature by using the obtained plurality of first feature vectors.
In another possible implementation manner, the obtaining module is configured to encode the first sample statement and the second sample statement respectively based on the translation model to obtain a third encoding feature corresponding to the first sample statement and a fourth encoding feature corresponding to the second sample statement, where the fourth encoding feature includes a second feature vector corresponding to each second word, the second word is a word in the second sample statement, and each second feature vector is obtained by weighted fusion of a corresponding second word and a word vector of a previous second word; fusing the third coding features and the fourth coding features based on the translation model to obtain fusion features, wherein the fusion features comprise fusion feature vectors corresponding to each second word; based on the translation model and the fusion features, obtaining a third prediction probability corresponding to each second word, wherein the third prediction probability indicates the possibility of translating each fusion feature vector into the corresponding second word based on the translation model.
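As a hedged illustration of the third prediction probability above, the sketch below reads off, for each second word, the probability the model assigns to that word at its position, assuming the fusion feature vectors have already been projected to vocabulary logits (the projection itself is not shown).

```python
import numpy as np

def per_word_probabilities(logits, second_word_ids):
    # logits: (number of second words, vocabulary size), one row per fusion feature vector.
    exp = np.exp(logits - logits.max(axis=-1, keepdims=True))
    probs = exp / exp.sum(axis=-1, keepdims=True)                   # softmax per position
    return probs[np.arange(len(second_word_ids)), second_word_ids]  # third prediction probabilities

logits = np.array([[2.0, 0.5, 0.1],
                   [0.2, 1.5, 0.3]])
print(per_word_probabilities(logits, [0, 1]))
```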
In another possible implementation manner, the apparatus further includes:
the obtaining module is further configured to obtain a first sample data set, where the first sample data set includes first dialogue data and second dialogue data having the same meaning, the first dialogue data belongs to the source language, the second dialogue data belongs to the target language, the first dialogue data and the second dialogue data are both translated based on third dialogue data, and the third dialogue data is obtained by at least two interlocutors performing dialogues in the source language and the target language respectively;
and the training module is used for carrying out iterative training on the translation model again based on the first sample data set.
In another possible implementation manner, the training module includes:
a second obtaining unit configured to obtain a fifth sample sentence and a first associated sentence associated with the fifth sample sentence from the first dialogue data, and obtain a sixth sample sentence having the same meaning as the fifth sample sentence from the second dialogue data;
a determining unit, configured to obtain a seventh sample statement and an eighth sample statement from at least one piece of the first dialogue data, and determine a second sample relationship label, where the second sample relationship label indicates whether there is an association relationship between the seventh sample statement and the eighth sample statement;
the second obtaining unit is further configured to obtain a second prediction result based on the translation model, the fifth sample sentence, the first related sentence, and the sixth sample sentence, the second prediction result indicating a possibility of translating the fifth sample sentence into the sixth sample sentence based on the translation model;
the second obtaining unit is further configured to obtain a second prediction relationship label based on the translation model, the seventh sample statement, and the eighth sample statement, where the second prediction relationship label indicates a prediction association relationship between the seventh sample statement and the eighth sample statement;
a second adjusting unit, configured to adjust the translation model based on the second prediction result, the second prediction relation label and the second sample relation label.
In another possible implementation manner, the second obtaining unit is configured to splice the fifth sample statement and the first associated statement to obtain a second spliced statement; respectively coding the second spliced statement and the sixth sample statement based on the translation model to obtain a fifth coding feature corresponding to the second spliced statement and a sixth coding feature corresponding to the sixth sample statement, wherein the sixth coding feature comprises a third feature vector corresponding to each third word, the third words refer to words in the sixth sample statement, and each third feature vector is obtained by weighted fusion of the corresponding third word and word vectors of previous third words; fusing the fifth coding feature and the sixth coding feature based on the translation model to obtain a fusion feature, wherein the fusion feature comprises a fusion feature vector corresponding to each third word; based on the translation model and the fused features, obtaining a fourth prediction probability corresponding to each third word, wherein the fourth prediction probability indicates the possibility of translating each fused feature vector into the corresponding third word based on the translation model.
In another possible implementation manner, the apparatus further includes:
the obtaining module is further configured to obtain a second sample data set, where the second sample data set includes a ninth sample sentence and a tenth sample sentence having the same meaning, the ninth sample sentence belongs to the source language, and the tenth sample sentence belongs to the target language;
and the training module is used for carrying out iterative training on the translation model based on the second sample data set.
In another possible implementation manner, the apparatus further includes:
the obtaining module is further configured to obtain, based on the translation model, a target sentence, and a second associated sentence associated with the target sentence, a translated sentence corresponding to the target sentence, where the target sentence and the second associated sentence both belong to the source language, and the translated sentence belongs to the target language.
In another possible implementation manner, the obtaining module includes:
the coding unit is used for coding the target statement and the second associated statement based on the translation model to obtain a seventh coding characteristic;
a third obtaining unit, configured to obtain a first translation word based on the translation model and the seventh coding feature;
the third obtaining unit is further configured to obtain a next translation word based on the translation model, the seventh coding feature, and the first translation word, and repeat the above steps until a last translation word is obtained;
and a construction unit configured to construct the translation sentence from the obtained plurality of translation words.
In another possible implementation manner, the third obtaining unit is configured to encode a start character based on the translation model to obtain an eighth encoding characteristic corresponding to the start character; fusing the seventh coding feature and the eighth coding feature based on the translation model to obtain a fused feature; and decoding the fusion characteristics based on the translation model to obtain a first translation word.
In another possible implementation manner, the third obtaining unit is configured to encode the starting character and the currently obtained translation word based on the translation model to obtain a ninth encoding characteristic; fusing the seventh coding feature and the ninth coding feature based on the translation model to obtain a fused feature; and decoding the fusion characteristics based on the translation model to obtain the next translation word.
In another aspect, a computer device is provided, which includes a processor and a memory, wherein at least one computer program is stored in the memory, and the at least one computer program is loaded by the processor and executed to implement the operations performed in the sentence translation method according to the above aspect.
In another aspect, a computer-readable storage medium is provided, in which at least one computer program is stored, the at least one computer program being loaded and executed by a processor to implement the operations performed in the sentence translation method according to the above aspect.
In yet another aspect, a computer program product or a computer program is provided, the computer program product or the computer program comprising computer program code, the computer program code being stored in a computer readable storage medium. The processor of the computer device reads the computer program code from the computer-readable storage medium, and executes the computer program code, so that the computer device implements the operations performed in the sentence translation method according to the above aspect.
The beneficial effects brought by the technical scheme provided by the embodiment of the application at least comprise:
In the method, apparatus, computer device, and storage medium provided by the embodiments of the present application, when the translation model is trained, a first prediction result is obtained based on a first sample sentence and a second sample sentence that have the same meaning but belong to different languages. The first prediction result indicates the possibility of translating the first sample sentence into the second sample sentence based on the translation model, that is, it can reflect the accuracy of the translation model. A first prediction relation label is obtained based on a third sample sentence and a fourth sample sentence that both belong to the source language; the first prediction relation label indicates the predicted association between the third sample sentence and the fourth sample sentence, while the first sample relation label indicates the true association between them, so the translation model's ability to analyze sentences that have an association relation can be determined from the first prediction relation label and the first sample relation label. The translation model is then adjusted based on the first prediction result, the first prediction relation label and the first sample relation label; in other words, while the translation model's translation ability is preserved, its ability to analyze associated sentences is improved, so that this ability can be drawn on when sentences are subsequently translated based on the translation model, which improves the accuracy of the translation model.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present application, and those skilled in the art can obtain other drawings based on these drawings without creative effort.
Fig. 1 is a schematic structural diagram of a data sharing system according to an embodiment of the present application;
FIG. 2 is a flowchart of a sentence translation method provided by an embodiment of the present application;
FIG. 3 is a flowchart of a sentence translation method provided by an embodiment of the present application;
FIG. 4 is a flowchart of a sentence translation method provided by an embodiment of the present application;
FIG. 5 is a flowchart of a sentence translation method provided by an embodiment of the present application;
FIG. 6 is a flowchart of a sentence translation method provided by an embodiment of the present application;
fig. 7 is a schematic structural diagram of a sentence translating apparatus according to an embodiment of the present application;
fig. 8 is a schematic structural diagram of a sentence translating apparatus according to an embodiment of the present application;
fig. 9 is a schematic structural diagram of a terminal according to an embodiment of the present application;
fig. 10 is a schematic structural diagram of a server according to an embodiment of the present application.
Detailed Description
To make the objects, technical solutions and advantages of the embodiments of the present application more clear, the embodiments of the present application will be further described in detail with reference to the accompanying drawings.
The terms "first," "second," "third," "fourth," "fifth," "sixth," and the like used herein may describe various concepts, but these concepts are not limited by these terms unless otherwise specified. These terms are only used to distinguish one concept from another. For example, a first sample sentence can be referred to as a second sample sentence, and similarly, a second sample sentence can be referred to as a first sample sentence, without departing from the scope of the present application.
As used herein, "at least one" includes one, two, or more than two; "a plurality" includes two or more than two; "each" refers to each one of the corresponding plurality; and "any" refers to any one of the plurality. For example, if a plurality of words includes 3 words, "each" refers to each of the 3 words, and "any" refers to any one of the 3 words, which can be the first word, the second word, or the third word.
Artificial Intelligence (AI) is a theory, method, technique and application system that uses a digital computer or a machine controlled by a digital computer to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge, and use the knowledge to obtain the best results. In other words, artificial intelligence is a comprehensive technique of computer science that attempts to understand the essence of intelligence and to produce a new kind of intelligent machine that can react in a manner similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that the machines have the functions of perception, reasoning and decision making.
Artificial intelligence technology is a comprehensive subject that covers a wide range of fields, including both hardware-level and software-level technologies. Basic artificial intelligence technologies generally include technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technologies, operation/interaction systems, and mechatronics. Artificial intelligence software technologies mainly include computer vision technology, speech processing technology, natural language processing technology, machine learning/deep learning, automatic driving, intelligent transportation, and the like.
Natural Language Processing (NLP) is an important direction in the fields of computer science and artificial intelligence. It studies various theories and methods that enable effective communication between humans and computers using natural language. Natural language processing is a science that integrates linguistics, computer science and mathematics. Research in this field involves natural language, that is, the language people use every day, so it is closely related to the study of linguistics. Natural language processing techniques typically include text processing, semantic understanding, machine translation, question answering, knowledge graphs, and the like.
Machine Learning (ML) is a multi-domain interdisciplinary subject involving probability theory, statistics, approximation theory, convex analysis, algorithm complexity theory and other disciplines. It specializes in studying how a computer can simulate or realize human learning behavior to acquire new knowledge or skills and reorganize existing knowledge structures so as to continuously improve its own performance. Machine learning is the core of artificial intelligence and the fundamental way to make computers intelligent, and it is applied in all fields of artificial intelligence. Machine learning and deep learning generally include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning, and demonstration learning.
The scheme provided by the embodiment of the application can train the translation model based on the technologies of artificial intelligence, natural language processing, machine learning and the like, and realizes the sentence translation method by utilizing the trained translation model.
The sentence translation method provided by the embodiment of the application can be applied to computer equipment. Optionally, the computer device is a terminal or a server. Optionally, the server is an independent physical server, or a server cluster or a distributed system formed by a plurality of physical servers, or a cloud server providing basic cloud computing services such as a cloud service, a cloud database, cloud computing, a cloud function, cloud storage, a web service, cloud communication, a middleware service, a domain name service, a security service, a CDN (Content Delivery Network), a big data and artificial intelligence platform, and the like. Optionally, the terminal is a smart phone, a tablet computer, a notebook computer, a desktop computer, a smart speaker, a smart watch, etc., but is not limited thereto.
In some embodiments, the computer program according to the embodiments of the present application may be deployed and executed on one computer device, on multiple computer devices located at one site, or on multiple computer devices distributed at multiple sites and interconnected by a communication network, and the multiple computer devices distributed at multiple sites and interconnected by the communication network may constitute a blockchain system.
Optionally, sample statements, dialogue data, and sample relationship labels used for training a translation model are all stored in the blockchain system, the translation model is deployed in the blockchain system, and any device in the blockchain system can complete a translation task based on the translation model.
Referring to the data sharing system shown in fig. 1, the data sharing system 100 refers to a system for performing data sharing between nodes, the data sharing system may include a plurality of nodes 101, and the plurality of nodes 101 may refer to respective clients in the data sharing system. Each node 101 may receive input information while operating normally and maintain shared data within the data sharing system based on the received input information. In order to ensure information intercommunication in the data sharing system, information connection can exist between each node in the data sharing system, and information transmission can be carried out between the nodes through the information connection. For example, when an arbitrary node in the data sharing system receives input information, other nodes in the data sharing system acquire the input information according to a consensus algorithm, and store the input information as data in shared data, so that the data stored on all the nodes in the data sharing system are consistent.
Each node in the data sharing system has a corresponding node identifier, and each node may store the node identifiers of the other nodes in the data sharing system, so that a generated block can later be broadcast to the other nodes according to their node identifiers. Each node may maintain a node identifier list as shown in the following table, in which the node name and the node identifier are stored correspondingly. The node identifier may be an IP (Internet Protocol) address or any other information that can be used to identify the node; Table 1 only takes the IP address as an example.
TABLE 1
Node name    Node identifier
Node 1       117.114.151.174
Node 2       117.116.189.145
Node N       119.123.789.258
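For illustration only, the node identifier list of Table 1 can be kept as a simple name-to-identifier mapping, and a generated block broadcast to every other node by its identifier, as the paragraph above describes; the helper below is a sketch, not part of the patent.

```python
node_identifiers = {          # node identifier list, values taken from Table 1
    "Node 1": "117.114.151.174",
    "Node 2": "117.116.189.145",
    "Node N": "119.123.789.258",
}

def broadcast(block, sender, nodes=node_identifiers):
    # A generated block is sent to every node other than the sender, by its identifier.
    return [(name, identifier, block) for name, identifier in nodes.items() if name != sender]

print(broadcast("block#1", "Node 1"))
```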
The method provided by the embodiment of the application can be used for various scenes.
For example, in an online translation scenario:
the method includes the steps that a server trains a translation model, the trained translation model is deployed in the server, a terminal logs in a translation application based on a user identifier, the server provides service for the translation application, the terminal sends a target sentence to be translated and an associated sentence associated with the target sentence to the server based on the translation application, the server receives the target sentence and the associated sentence, the translation sentence corresponding to the target sentence is translated based on the translation model, the target sentence and the associated sentence, the translation sentence is sent to the terminal, and the terminal receives and displays the translation sentence based on the translation application.
As another example, in a face-to-face conversation scenario:
the method includes the steps that a server trains a translation model, the trained translation model is deployed in the server, a terminal logs in a translation application based on user identification, the server provides service for the translation application, the terminal collects voice data which are sent by any conversation person and belong to a source language based on the translation application, the voice data are converted into first text information which belongs to the source language, the first text information to be translated is sent to the server based on the translation application, the server receives the first text information, a plurality of target sentences and associated sentences of the target sentences are obtained from the first text information, for each target sentence, the translated sentences which have the same meaning as the target sentence and belong to the target language are translated based on the translation model, the target sentences and the corresponding associated sentences, and the obtained translated sentences form translated sentences which have the same meaning as the first text information, And the second text information belonging to the target language is sent to the terminal, the terminal receives the second text information based on the translation application, converts the second text information into voice data belonging to the target language, and plays the converted voice data so that an interlocutor corresponding to the terminal can listen to the played voice data, thereby realizing the simultaneous interpretation effect and ensuring that two interlocutors communicating in different languages can have a conversation.
Fig. 2 is a flowchart of a sentence translation method provided in an embodiment of the present application, where the method is executed by a computer device, and as shown in fig. 2, the method includes:
201. obtaining a first prediction result based on the translation model, the first sample sentence, and the second sample sentence, the first prediction result indicating a possibility of translating the first sample sentence into the second sample sentence based on the translation model, the first sample sentence having the same meaning as the second sample sentence, and the first sample sentence belonging to the source language and the second sample sentence belonging to the target language.
In the embodiment of the present application, the source language and the target language are any two different languages; for example, the source language is Chinese and the target language is English. The translation model is used for translating a sentence belonging to the source language into a sentence belonging to the target language, and the second sample sentence is the sentence obtained when the first sample sentence belonging to the source language is translated into the target language. The first prediction result can reflect the accuracy of the translation model, and the translation model can subsequently be adjusted based on the obtained first prediction result so as to improve the accuracy of the translation model.
202. And acquiring a third sample sentence, a fourth sample sentence and a first sample relation label, wherein the first sample relation label indicates whether the third sample sentence and the fourth sample sentence have an association relation or not, and the third sample sentence and the fourth sample sentence both belong to the source language.
In this embodiment, the third sample sentence and the fourth sample sentence may or may not have an association relationship, and the first sample relationship label is manually set based on the relationship between the third sample sentence and the fourth sample sentence, or determined in another manner.
203. And acquiring a first prediction relation label based on the translation model, the third sample statement and the fourth sample statement, wherein the first prediction relation label indicates a prediction incidence relation between the third sample statement and the fourth sample statement.
And the first prediction relation label is a relation label obtained by processing the third sample statement and the fourth sample statement based on the translation model.
204. The translation model is adjusted based on the first prediction result, the first prediction relationship label, and the first sample relationship label.
The first prediction result and the first prediction relation label are obtained based on the translation model, the first prediction result can reflect the accuracy of the translation model, the analysis capability of the translation model on the sentences with the association relation can be determined based on the first prediction relation label and the first sample relation label, and the model parameters in the translation model are adjusted based on the first prediction result, the first prediction relation label and the first sample relation label, namely the analysis capability of the translation model on the sentences with the association relation is improved under the condition that the translation capability of the translation model is ensured.
In the method provided by the embodiment of the present application, when the translation model is trained, a first prediction result is obtained based on a first sample sentence and a second sample sentence that have the same meaning but belong to different languages. The first prediction result indicates the possibility of translating the first sample sentence into the second sample sentence based on the translation model, that is, it can reflect the accuracy of the translation model. A first prediction relation label is obtained based on a third sample sentence and a fourth sample sentence that belong to the source language; the first prediction relation label indicates the predicted association between the third sample sentence and the fourth sample sentence, and the first sample relation label indicates the true association between them, so the translation model's ability to analyze sentences that have an association relation can be determined from the first prediction relation label and the first sample relation label. The translation model is adjusted based on the first prediction result, the first prediction relation label and the first sample relation label; that is, while the translation model's translation ability is preserved, its ability to analyze associated sentences is improved, so that this ability can be drawn on when sentences are subsequently translated based on the translation model, which improves the accuracy of the translation model.
On the basis of the embodiment shown in fig. 2, the following embodiment is provided in the present application to explain the process of adjusting the translation model in detail.
Fig. 3 is a flowchart of a sentence translation method provided in an embodiment of the present application, where the method is executed by a computer device, and as shown in fig. 3, the method includes:
301. and coding the first sample sentence based on the translation model to obtain a third coding characteristic corresponding to the first sample sentence.
In the embodiment of the present application, the first sample sentence has the same meaning as the second sample sentence, the first sample sentence belongs to the source language, and the second sample sentence belongs to the target language. Based on the translation model, the first sample sentence and the second sample sentence, the likelihood of translating the first sample sentence into the second sample sentence with the translation model is determined. The translation model is an arbitrary network model; for example, the translation model is a Transformer (a network model) or an RNN (Recurrent Neural Network).
The first sample sentence is an arbitrary sentence belonging to the source language; for example, the first sample sentence is a Chinese sentence such as "it rained today". The third coding feature is a feature obtained by encoding the first sample sentence and is used for representing the meaning of the first sample sentence.
In one possible implementation manner, the third encoding feature includes a plurality of feature vectors, a first feature vector of the plurality of feature vectors corresponds to a starting character before the first sample sentence, each feature vector of the plurality of feature vectors except the first feature vector corresponds to a fourth word, the fourth word refers to a word in the first sample sentence, and each feature vector is obtained by weighted fusion of a word vector of the starting character and word vectors of the plurality of fourth words.
When the first sample sentence is processed, a start character is added before the first sample sentence. The start character indicates that the words after it are the words contained in the sentence to be processed, that is, the start character marks the start position of the sentence. In the embodiment of the present application, a word in the first sample sentence is referred to as a fourth word. When the first sample sentence is encoded based on the translation model, the start character is added before the first sample sentence, and then the start character and the first sample sentence are encoded together, so that a feature vector corresponding to the start character and a feature vector corresponding to each fourth word can be obtained.
In one possible implementation, this step 301 includes: performing feature extraction on the first sample sentence based on a feature extraction submodel in the translation model to obtain the coding features corresponding to the first sample sentence, and encoding these coding features based on a coding submodel in the translation model to obtain the third coding feature.
The feature extraction submodel is used to convert any word into a corresponding word vector; for example, the feature extraction submodel is a word embedding layer, and based on the word embedding layer, any word can be mapped to a corresponding word vector. The coding submodel is used for encoding the plurality of word vectors contained in the coding features again. The coding features corresponding to the first sample sentence are, in other words, the word vector sequence corresponding to the first sample sentence, which includes the word vector of the start character and the word vector of each fourth word contained in the first sample sentence.
And based on the feature extraction submodel, firstly obtaining the word vector of each fourth word in the first sample sentence, and then coding the coding features corresponding to the first sample sentence again based on the coding submodel, so that the word vector of the initial character and the word vector of each fourth word are merged into each obtained feature vector, the relation between the initial character and the plurality of fourth words is enhanced, and the accuracy of the third coding features is improved.
Optionally, the coding submodel includes a plurality of coding layers, and the process of obtaining the third coding feature based on the coding submodel includes: encoding the coding features corresponding to the first sample sentence based on the first coding layer to output an encoding feature, encoding the encoding feature output by the previous coding layer again based on the second coding layer to output another encoding feature, and so on, until the third coding feature output by the last coding layer is obtained.
In this embodiment of the present application, the input of the first coding layer is a coding feature corresponding to the first sample sentence output by the feature extraction submodel, that is, the input of the first coding layer is a word vector sequence, the output of the first coding layer is a coding feature, starting from the second coding layer, the input of each coding layer is a coding feature output by the previous coding layer, the output is a new coding feature, and the output of the last coding layer is the third coding feature. For example, the coding feature output by each coding layer is a hidden state sequence, the hidden state sequence includes a feature vector corresponding to the initial character and a feature vector corresponding to each fourth word, and the hidden state sequence output by the last coding layer is the third coding feature.
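A minimal sketch of this layer-by-layer encoding, with the coding layers treated as opaque callables (the internals of one layer are sketched after the formulas below):

```python
def encode(word_vectors, coding_layers):
    h = word_vectors                      # h(0): word vectors, including the start character
    for layer in coding_layers:           # first to last coding layer
        h = layer(h)                      # each layer re-encodes the previous layer's output
    return h                              # third coding feature: the last layer's output

# Toy demo with two "layers" that simply scale the features.
print(encode([1.0, 2.0, 3.0], [lambda h: [2 * x for x in h]] * 2))   # [4.0, 8.0, 12.0]
```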
Optionally, each coding layer includes a self-attention sublayer and a feedforward network sublayer, the self-attention sublayer is used for performing weighted fusion on vectors in the input coding features, and the feedforward network sublayer is used for performing feature transformation on the coding features after weighted fusion.
For example, for any one of a plurality of coding layers, the coding features input to the coding layer include a plurality of first vectors, and the processing procedure of the coding layer includes: for each first vector, performing weighted fusion on a plurality of first vectors in the coding features based on a self-attention sublayer in the coding layer, performing fusion on the weighted fusion vectors and the first vectors to obtain vectors after updating of the first vectors, and forming the coding features output by the self-attention sublayer by the vectors after updating of the plurality of first vectors; and based on the feedforward network sublayer, performing feature transformation on the coding features output from the attention sublayer, and fusing the features after feature transformation and the coding features output from the attention sublayer to obtain the coding features output by the coding layer.
Optionally, for a self-attention sublayer in any one of the coding layers, the coding features input in the self-attention sublayer and the coding features output from the attention sublayer satisfy the following relationship:
d^{(l)} = LN(SelfAtt(h^{(l-1)}, h^{(l-1)}, h^{(l-1)})) + h^{(l-1)}
where l denotes the index of a coding layer in the coding submodel and is an integer greater than 0; d^{(l)} denotes the coding features output by the self-attention sublayer in the l-th coding layer; LN(·) denotes the regularization (layer normalization) function; SelfAtt(·) denotes the self-attention mechanism; and h^{(l-1)} denotes the coding features input to the l-th coding layer. If l = 1, h^{(0)} denotes the coding features obtained by the feature extraction submodel performing feature extraction on the first sample sentence, that is, h^{(0)} includes the word vector of the start character and the word vector of each word contained in the first sample sentence; if l is greater than 1, h^{(l-1)} is the coding features output by the (l-1)-th coding layer.
Optionally, for a feedforward network sublayer in any coding layer, the coding features input in the feedforward network sublayer and the coding features output by the feedforward network sublayer satisfy the following relationship:
h^{(l)} = LN(FFN(d^{(l)})) + d^{(l)}
where l denotes the index of a coding layer in the coding submodel and is an integer greater than 0; h^{(l)} denotes the coding features output by the l-th coding layer, that is, the hidden state sequence output by the l-th coding layer; LN(·) denotes the regularization (layer normalization) function; FFN(·) denotes the feed-forward network; and d^{(l)} denotes the coding features output by the self-attention sublayer in the l-th coding layer.
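The two relations above amount to a residual self-attention sublayer followed by a residual feed-forward sublayer, each wrapped in LN. Below is a hedged numpy sketch of one such coding layer; single-head dot-product attention and a ReLU feed-forward are assumptions the patent does not specify, and the weight matrices are random placeholders rather than trained parameters.

```python
import numpy as np

def layer_norm(x, eps=1e-6):
    return (x - x.mean(axis=-1, keepdims=True)) / (x.std(axis=-1, keepdims=True) + eps)

def self_attention(h):
    # SelfAtt(h, h, h): every position becomes a weighted fusion of all positions.
    scores = h @ h.T / np.sqrt(h.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ h

def coding_layer(h_prev, w1, w2):
    d = layer_norm(self_attention(h_prev)) + h_prev   # d(l) = LN(SelfAtt(h, h, h)) + h(l-1)
    ffn = np.maximum(0.0, d @ w1) @ w2                # FFN(d(l)), assuming a ReLU feed-forward
    return layer_norm(ffn) + d                        # h(l) = LN(FFN(d(l))) + d(l)

rng = np.random.default_rng(0)
h0 = rng.normal(size=(5, 8))                          # start character + four word vectors
print(coding_layer(h0, rng.normal(size=(8, 16)), rng.normal(size=(16, 8))).shape)
```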
302. And coding the second sample sentence based on the translation model to obtain a fourth coding feature corresponding to the second sample sentence.
The second sample sentence belongs to the target language and is the sentence obtained when the first sample sentence is translated from the source language into the target language. The fourth coding feature is a feature obtained by encoding the second sample sentence and is used for representing the meaning of the second sample sentence. The fourth coding feature includes a second feature vector corresponding to each second word, where a second word is a word in the second sample sentence, and each second feature vector is obtained by weighted fusion of the word vectors of the corresponding second word and the previous second words. For example, if the second sample sentence includes 4 second words, the second feature vector corresponding to the 3rd second word is obtained by weighted fusion of the word vectors of the 1st to 3rd second words, and the second feature vector corresponding to the 4th second word is obtained by weighted fusion of the word vectors of the 1st to 4th second words. In the embodiment of the present application, for any second word, the second words before it are its preceding words; for example, if the second sample sentence includes 4 second words, the preceding word of the 2nd second word includes only the 1st second word, and the preceding words of the 3rd second word include the 1st second word and the 2nd second word. That is, each second feature vector is obtained by weighted fusion of the word vectors of the corresponding second word and its preceding words.
When the feature vector corresponding to any second word is obtained, the feature vector is obtained only based on the second word and the word vector of the second word before the second word, but the word vector of the second word after the second word is not involved, so that the effect of masking the words after the second word is realized.
In a possible implementation manner, when the second sample sentence is encoded based on the translation model, a start character is added before the second sample sentence, and a second feature vector corresponding to the start character and a second feature vector corresponding to each second word are obtained, that is, the fourth encoding feature includes a plurality of second feature vectors, a first one of the plurality of second feature vectors corresponds to the start character, and each second feature vector except the first feature vector in the plurality of second feature vectors corresponds to one second word.
In one possible implementation, this step 302 includes: and based on a decoding sub-model in the translation model, coding features corresponding to the second sample sentence to obtain the fourth coding feature.
The coding features corresponding to the second sample sentence obtained based on the feature extraction submodel are also the word vector sequence corresponding to the second sample sentence, and the word vector sequence comprises the word vector of the initial character and the word vector of each second word. The decoding submodel is used for coding the coding features of the sentences again, and in the coding process, only the word vectors of any word and the words before the word are subjected to weighted fusion according to the sequence of the words contained in the sentences.
And through the feature extraction submodel, firstly obtaining the word vector of each second word in the second sample sentence, and then coding the coding features corresponding to the second sample sentence again based on the decoding submodel, so that the word vector of the corresponding second word and the word vector of the corresponding preamble word are merged into the feature vector corresponding to each second word, the relation between each second word and the corresponding preamble word is enhanced, and the accuracy of the fourth coding feature is improved.
It should be noted that, in the embodiment of the present application, the third coding feature and the fourth coding feature are obtained by processing the first sample sentence and the second sample sentence having the same meaning based on the translation model, and in another embodiment, the first sample sentence and the second sample sentence are obtained before the step 301 is executed.
In one possible implementation, the manner of obtaining the first sample statement and the second sample statement includes: and acquiring a third sample data set, wherein the third sample data set comprises a plurality of pairs of sample sentences, each pair of sample sentences has the same meaning, each pair of sample sentences comprises a sentence belonging to a source language and a sentence belonging to a target language, and any pair of sample sentences is selected from the third sample data set to obtain the first sample sentence and the second sample sentence.
And the third sample data set is a statement level data set, and comprises statement pairs belonging to the source language and the target language.
303. And fusing the third coding features and the fourth coding features based on the translation model to obtain fusion features, wherein the fusion features comprise fusion feature vectors corresponding to each second word.
Wherein the fused feature comprises a plurality of fused feature vectors, each fused feature vector corresponding to a second term. The third coding feature and the fourth coding feature are blended into the fusion feature vector, that is, the fusion feature can embody the relationship between the first sample sentence and the second sample sentence.
In one possible implementation, this step 303 includes: and fusing the third coding characteristic and the fourth coding characteristic based on a decoding sub-model in the translation model to obtain a fused characteristic. Wherein the decoding submodel is used for fusing the coding features of the sample sentences belonging to different languages.
It should be noted that, in the embodiment of the present application, the fourth coding feature is obtained based on the translation model, and the fusion feature is obtained based on the fourth coding feature and the third coding feature, and in another embodiment, the translation model further includes a decoding sub-model, and the decoding sub-model includes a plurality of decoding layers, and the fusion feature is obtained based on the plurality of decoding layers, the second sample sentence, and the third coding feature.
In one possible implementation, the process of obtaining the fusion feature based on the plurality of decoding layers, the second sample statement and the third coding feature includes: extracting features of the second sample sentence based on a feature extraction submodel in the translation model to obtain coding features corresponding to the second sample sentence, and coding the coding features corresponding to the second sample sentence based on the first decoding layer to obtain a first coding feature; based on the first decoding layer, fusing the first coding feature and the third coding feature to obtain a first fused feature; and based on the second decoding layer, encoding the fusion feature output by the previous decoding layer to obtain a second encoding feature, based on the second decoding layer, fusing the second encoding feature and the third encoding feature to obtain a next fusion feature, and repeating the steps until the last fusion feature is output by the last decoding layer.
The encoding features output by each decoding layer comprise feature vectors corresponding to each second term, and each feature vector is obtained by weighting and fusing vectors corresponding to the corresponding second term and the previous second term. In the process of acquiring the fusion characteristics based on the plurality of decoding layers, the process of acquiring the encoding characteristics by the plurality of decoding layers is different.
For the first decoding layer, the input of the first decoding layer is the coding feature corresponding to the second sample sentence, and the coding feature corresponding to the second sample sentence comprises the word vector of the starting character and the word vectors of the plurality of second words. When the coding feature corresponding to the second sample sentence is coded based on the first decoding layer, the weighted feature vector corresponding to each word vector is acquired in turn according to the order of the plurality of word vectors in the input coding feature. For any second word, the feature vector corresponding to the second word is obtained only based on the word vectors of the second word, of the starting character before it and of the second words before it; after the feature vectors corresponding to the plurality of second words are obtained, the obtained feature vectors form the first coding feature.
For each decoding layer other than the first decoding layer, the input of the decoding layer is the fusion feature output by the previous decoding layer, and the fusion feature comprises a plurality of feature vectors. When the input fusion feature is coded based on the decoding layer, the weighted feature vector corresponding to each feature vector is acquired in turn according to the order of the plurality of feature vectors in the input fusion feature. For any second word, the weighted feature vector corresponding to the second word is obtained based on the feature vectors corresponding to the second word, to the starting character before it and to the second words before it; after the weighted feature vectors corresponding to the plurality of second words are obtained, the obtained feature vectors form the coding feature output by the decoding layer.
Optionally, the fusion feature output by each decoding layer is a hidden state sequence, the hidden state sequence includes a fusion feature vector corresponding to the starting character and a fusion feature vector corresponding to each second word, and the hidden state sequence output by the last decoding layer is the fusion feature obtained in step 303.
Optionally, each decoding layer includes a masked self-attention sublayer, a cross attention sublayer and a feedforward network sublayer, where the masked self-attention sublayer is configured to perform weighted fusion on the vectors in the input feature in sequence according to the order of the vectors in the input feature; the cross attention sublayer is used for fusing the coding feature output by the masked self-attention sublayer with the third coding feature; and the feedforward network sublayer is used for performing feature transformation on the fusion feature output by the cross attention sublayer and outputting the transformed fusion feature.
For example, for any one of the plurality of decoding layers, the feature input into the decoding layer includes a plurality of second vectors; the feature is input into the masked self-attention sublayer, processed in turn by the masked self-attention sublayer, the cross attention sublayer and the feedforward network sublayer in the decoding layer, and the fusion feature output by the decoding layer is obtained. The processing procedure of the decoding layer comprises the following steps: for any second vector, based on the masked self-attention sublayer in the decoding layer, performing weighted fusion on the second vector and the second vectors before it in the input feature, fusing the weighted vector with the second vector to obtain an updated vector, and forming the plurality of updated vectors into the coding feature output by the masked self-attention sublayer; based on the cross attention sublayer, performing weighted fusion on the coding feature output by the masked self-attention sublayer and the third coding feature, fusing the weighted feature with the coding feature output by the masked self-attention sublayer, and outputting the fusion feature; and based on the feedforward network sublayer, performing feature transformation on the fusion feature output by the cross attention sublayer, fusing the transformed feature with the fusion feature output by the cross attention sublayer, and outputting the obtained fusion feature.
Optionally, for a masked self-attention sublayer in any decoding layer, features input in the masked self-attention sublayer and encoding features output by the masked self-attention sublayer satisfy the following relationship:
a^{(l)} = LN(MaskedSelfAtt(s^{(l-1)}, s^{(l-1)}, s^{(l-1)})) + s^{(l-1)}
wherein l is used for representing the serial number of a decoding layer among the plurality of decoding layers in the decoding submodel, and l is an integer greater than 0; a^{(l)} is used for representing the coding feature output by the masked self-attention sublayer in the l-th decoding layer; LN(·) is used for representing the regularization function; MaskedSelfAtt(·) is used for representing the masked self-attention mechanism; s^{(l-1)} is used for representing the feature input into the l-th decoding layer; if l is 1, s^{(0)} represents the coding feature obtained by the feature extraction submodel performing feature extraction on the second sample sentence, that is, s^{(0)} includes the word vector of the starting character and the word vector of each word contained in the second sample sentence; if l is greater than 1, s^{(l-1)} is the fusion feature output by the (l-1)-th decoding layer.
Optionally, for a cross attention sublayer in any decoding layer, the coding features input in the cross attention sublayer and the fusion features output by the cross attention sublayer satisfy the following relationship:
z^{(l)} = LN(CrossAtt(a^{(l)}, h^{(L_e)}, h^{(L_e)})) + a^{(l)}

wherein l is used for representing the serial number of a decoding layer among the plurality of decoding layers in the decoding submodel, and l is an integer greater than 0; z^{(l)} is used for representing the fusion feature output by the cross attention sublayer in the l-th decoding layer; LN(·) is used for representing the regularization function; CrossAtt(·) is used for representing the cross attention mechanism; a^{(l)} is used for representing the coding feature output by the masked self-attention sublayer in the l-th decoding layer; h^{(L_e)} is used for representing the coding feature output by the L_e-th coding layer in the coding submodel of the translation model, that is, the coding feature output by the last coding layer in the coding submodel, where L_e is used for representing the total number of coding layers in the coding submodel and is an integer greater than 1.
Optionally, for a feedforward network sublayer in any decoding layer, the fusion feature input in the feedforward network sublayer and the converted fusion feature output by the feedforward network sublayer satisfy the following relationship:
s^{(l)} = LN(FFN(z^{(l)})) + z^{(l)}
wherein l is used for representing the serial number of a decoding layer among the plurality of decoding layers in the decoding submodel, and l is an integer greater than 0; s^{(l)} is used for representing the feature output by the l-th decoding layer, that is, the hidden state sequence output by the l-th decoding layer; LN(·) is used for representing the regularization function; FFN(·) is used for representing the feedforward network; z^{(l)} is used for representing the fusion feature output by the cross attention sublayer in the l-th decoding layer.
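For illustration only, the three relationships above can be sketched as one decoding layer in PyTorch; the causal mask reproduces the masking of the words after each second word, and the dimension sizes, head count and module names are assumptions of the sketch:

```python
import torch
import torch.nn as nn

class DecodingLayer(nn.Module):
    """a = LN(MaskedSelfAtt(s,s,s)) + s ; z = LN(CrossAtt(a,h,h)) + a ; s' = LN(FFN(z)) + z."""
    def __init__(self, dim=512, heads=8, ffn_dim=2048):
        super().__init__()
        self.masked_att = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.cross_att = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.ln1, self.ln2, self.ln3 = nn.LayerNorm(dim), nn.LayerNorm(dim), nn.LayerNorm(dim)
        self.ffn = nn.Sequential(nn.Linear(dim, ffn_dim), nn.ReLU(), nn.Linear(ffn_dim, dim))

    def forward(self, s, h_enc):           # s: decoder input, h_enc: last coding layer output
        t = s.size(1)
        # each position may attend only to itself and the preceding positions
        causal = torch.triu(torch.ones(t, t, dtype=torch.bool, device=s.device), diagonal=1)
        att, _ = self.masked_att(s, s, s, attn_mask=causal)
        a = self.ln1(att) + s
        cross, _ = self.cross_att(a, h_enc, h_enc)   # query a, key/value h^(L_e)
        z = self.ln2(cross) + a
        return self.ln3(self.ffn(z)) + z
```

Feeding the word vector sequence of the second sample sentence into the first such layer, together with the output of the last coding layer, and stacking several layers yields the fusion feature.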
304. Based on the translation model and the fusion features, a third prediction probability corresponding to each second word is obtained, and the third prediction probability indicates the possibility of translating each fusion feature vector in the fusion features into the corresponding second word based on the translation model.
And the fusion features comprise fusion feature vectors corresponding to a plurality of second words, and each fusion feature vector is translated into a word belonging to the target language based on the translation model and the fusion feature vectors in the fusion features, so that a third prediction probability corresponding to each second word can be obtained. The greater the third prediction probability, the greater the likelihood of representing a translation of the fused feature vector in the fused feature into a corresponding second word, and the smaller the third prediction probability, the less likelihood of representing a translation of the fused feature vector in the fused feature into a corresponding second word. The obtained third prediction probability corresponding to each second word is the first prediction result obtained based on the translation model, the first sample sentence and the second sample sentence, and the third prediction probabilities corresponding to the plurality of second words can embody the accuracy of the translation model.
In one possible implementation, this step 304 includes: and comparing the fusion feature vector with word vectors of a plurality of words in a word library based on the translation model to obtain prediction probabilities corresponding to the words, and selecting a third prediction probability corresponding to a second word corresponding to the fusion feature vector from the prediction probabilities corresponding to the words.
In the embodiment of the present application, the translation model corresponds to a word library, a plurality of words included in the word library all belong to a target language, and a word vector of each word is used for representing the corresponding word. Optionally, the term library is manually preconfigured. For any fusion feature vector in the fusion features, the fusion feature vector is compared with the word vector of each word in the word library to determine the prediction probability of translating the fusion feature vector into each word, and the third prediction probability corresponding to the corresponding second word is selected from the obtained multiple probabilities.
Optionally, the process of obtaining the prediction probabilities corresponding to the plurality of words includes: based on the translation model, determining the similarity between the fusion feature vector and the word vector of each word in the word library, and respectively determining the prediction probability corresponding to each word based on the similarity corresponding to each word.
And the prediction probability corresponding to each word and the corresponding similarity are in positive correlation. The similarity between the fused feature vector and the word vector of the word can represent the similarity between the translated word corresponding to the fused feature vector and the word, and the greater the similarity between any fused feature vector and any word, the greater the possibility of translating the fused feature vector into the word, that is, the greater the prediction probability corresponding to the word, and the smaller the similarity between any fused feature vector and any word, the smaller the possibility of translating the fused feature vector into the word, that is, the smaller the prediction probability corresponding to the word. And determining the prediction probability corresponding to the words based on the similarity between the fused feature vector and the word vector so as to ensure the accuracy of the prediction probability.
In one possible implementation, the translation model includes a mapping sub-model, and step 304 includes: and mapping any one fusion feature vector in the fusion features based on a mapping sub-model in a translation model to obtain a probability sequence, wherein the probability sequence comprises the probabilities of multiple dimensions, each dimension corresponds to one word in a word library, and the probability of the dimension corresponding to a second word corresponding to the fusion feature vector is determined as a third prediction probability corresponding to the second word corresponding to the fusion feature vector.
In the embodiment of the present application, the translation model corresponds to a word library, a plurality of words included in the word library all belong to a target language, the mapping sub-model is used to map the fusion feature vector to each word in the word library, and the obtained probability sequence is probability distribution of prediction probabilities corresponding to the plurality of words in the word library.
Optionally, the process of obtaining the probability sequence based on the mapping sub-model satisfies the following relationship:
P(y_t | y_{<t}, x) = softmax(W s_t)
wherein y_t is used for representing the t-th second word in the second sample sentence; t is used for representing the serial number of the second word, that is, the position of the second word among the plurality of second words contained in the second sample sentence, and t is an integer greater than 0; y_{<t} is used for representing the second words before the t-th second word in the second sample sentence, that is, the preceding words of the t-th second word; P(y_t | y_{<t}, x) is used for representing the probability sequence corresponding to the t-th fusion feature vector in the fusion features; x is used for representing the first sample sentence; softmax(·) is used for representing the logistic regression function; W is used for representing the linear transformation matrix in the mapping submodel; and s_t is used for representing the t-th fusion feature vector in the fusion features.
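A minimal sketch of such a mapping submodel, assuming a PyTorch linear layer as the linear transformation matrix W and an illustrative word-library size, is:

```python
import torch
import torch.nn as nn

class MappingSubmodel(nn.Module):
    """Maps each fused feature vector s_t to a probability sequence over the word library."""
    def __init__(self, dim=512, vocab_size=32000):       # illustrative sizes
        super().__init__()
        self.w = nn.Linear(dim, vocab_size, bias=False)  # linear transformation matrix W

    def forward(self, fused):                        # fused: (batch, seq_len, dim)
        return torch.softmax(self.w(fused), dim=-1)  # P(y_t | y_<t, x) = softmax(W s_t)
```

The third prediction probability of the t-th second word is then the entry of the t-th probability sequence at the dimension corresponding to that word.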
It should be noted that, in the embodiment of the present application, the first sample statement and the second sample statement are processed in a manner of encoding and fusing first based on the translation model to obtain the first prediction result, and in another embodiment, the step 301 and the step 304 need not be executed, and other manners can be adopted to obtain the first prediction result based on the translation model, the first sample statement and the second sample statement, where the first prediction result indicates a possibility of translating the first sample statement into the second sample statement based on the translation model.
305. And acquiring a third sample sentence, a fourth sample sentence and a first sample relation label, wherein the first sample relation label indicates whether the third sample sentence and the fourth sample sentence have an association relation or not, and the third sample sentence and the fourth sample sentence both belong to the source language.
The first sample relationship label can be represented in any form. For example, the first sample relationship label indicates in numerical form whether there is an association relationship between the third sample statement and the fourth sample statement: a first sample relationship label of 1 indicates that there is an association relationship between the third sample statement and the fourth sample statement, and a first sample relationship label of 0 indicates that there is no association relationship between the third sample statement and the fourth sample statement.
In one possible implementation, this step 305 includes: and acquiring a fourth sample data set, wherein the fourth sample data set comprises at least one piece of dialogue data, the dialogue data belongs to the source language, a third sample statement and a fourth sample statement are acquired from the at least one piece of dialogue data, and a first sample relation label is determined.
In the embodiment of the present application, the fourth sample data set only includes dialogue data belonging to the source language, that is, the fourth sample data set is a monolingual sample data set. Each dialogue data is obtained by at least two interlocutors in the source language. The third sample sentence belongs to any dialogue data, the fourth sample sentence belongs to any dialogue data, and the dialogue data of the third sample sentence and the dialogue data of the fourth sample sentence may be the same or different.
For example, that the third sample statement and the fourth sample statement have an association relationship indicates that the third sample statement and the fourth sample statement belong to the same dialogue data; or, that the third sample statement and the fourth sample statement are uttered by the same interlocutor and belong to the same dialogue data. That the third sample statement and the fourth sample statement do not have an association relationship indicates that the third sample statement and the fourth sample statement do not belong to the same dialogue data; or that the third sample statement and the fourth sample statement belong to the same dialogue data but are not uttered by the same interlocutor.
306. And splicing the third sample statement and the fourth sample statement to obtain a first spliced statement.
That is, the third sample statement and the fourth sample statement are spliced into one long statement.
In one possible implementation, the process of obtaining the first concatenation statement includes: and splicing the third sample statement after the fourth sample statement to obtain the first spliced statement.
Optionally, in the concatenation process, a separator is added between the fourth sample statement and the third sample statement, that is, the obtained first concatenation statement includes the fourth sample statement, the separator, and the third sample statement.
By adding a separator to the first spliced sentence, the fourth sample sentence and the third sample sentence included in the first spliced sentence can be subsequently distinguished based on the separator.
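A minimal sketch of this splicing step follows; the token names [CLS] (the starting character added during encoding) and [SEP] (the separator) are illustrative placeholders rather than symbols defined by the embodiments:

```python
def build_first_spliced_statement(third_tokens, fourth_tokens,
                                  start_token="[CLS]", separator="[SEP]"):
    """Splice the third sample statement after the fourth sample statement,
    with a separator in between and the starting character in front."""
    return [start_token] + fourth_tokens + [separator] + third_tokens

# build_first_spliced_statement(["how", "are", "you"], ["hi", "there"])
# -> ['[CLS]', 'hi', 'there', '[SEP]', 'how', 'are', 'you']
```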
307. And coding the first spliced statement based on the translation model to obtain first coding features, wherein the first coding features comprise a plurality of first feature vectors, the first feature vector corresponds to the initial character positioned before the first spliced statement, each first feature vector except the first feature vector in the plurality of first feature vectors corresponds to a first word, the first word refers to a word in the first spliced statement, and each first feature vector is obtained by weighted fusion of the word vector of the initial character and the word vectors of the plurality of first words.
In this embodiment of the present application, when the translation model encodes the first spliced sentence, a start character is added before the first spliced sentence to indicate a start position of the first spliced sentence, and a feature vector corresponding to the start character and a feature vector corresponding to each word in the first spliced sentence are obtained. In the first coding features, a word vector of a start character and word vectors of a plurality of first words are merged into a first feature vector corresponding to each first word, so that the relevance between the plurality of first words is enhanced, the word vectors of the plurality of first words are also merged into the first feature vector corresponding to the start character, namely, the word vector of a word contained in a third sample sentence and the word vector of a word contained in a fourth sample sentence are merged into the first feature vector corresponding to the start character, and the relation between the third sample sentence and the fourth sample sentence can be embodied in the feature vector corresponding to the start character.
In one possible implementation, this step 307 includes: extracting features of the first spliced statement based on a feature extraction submodel in the translation model to obtain second coding features, wherein the second coding features comprise a plurality of word vectors, and the word vectors comprise word vectors of initial characters and word vectors of a plurality of first words; for each word vector: based on a coding sub-model in the translation model, carrying out weighted fusion on a plurality of word vectors, and fusing the vector subjected to weighted fusion with the word vectors to obtain first feature vectors corresponding to the word vectors; and forming a first coding feature by using the obtained plurality of first feature vectors.
The word vectors of each first word in the first spliced statement are subjected to weighted fusion so as to enhance the relevance among a plurality of first words, and therefore the accuracy of the first coding features is improved.
Optionally, when a first feature vector corresponding to each word vector is obtained, determining similarity between the word vector and a plurality of word vectors, performing normalization processing on the obtained plurality of similarity to obtain a plurality of weights, where each weight corresponds to one word vector, and the sum of the plurality of weights is 1, performing weighted fusion on the plurality of word vectors according to the plurality of weights, and performing fusion on the fused vector and the word vector to obtain a first feature vector corresponding to the word vector.
In one possible implementation, the translation model includes an encoding sub-model, the encoding sub-model includes at least one encoding layer, and after the feature extraction sub-model outputs the second encoding feature, for each word vector: based on the first coding layer, carrying out weighted fusion on the word vectors, fusing the vectors subjected to weighted fusion with the word vectors to obtain first feature vectors corresponding to the word vectors, and forming the obtained first feature vectors into first coding features.
After the second coding feature is obtained, the process of obtaining the first coding feature based on the first coding layer in the coding sub-model is the same as the processing process of the first coding layer included in the coding sub-model in step 301, and is not described herein again.
308. And classifying a first feature vector in the first coding features to obtain a first prediction relation label.
The first feature vector in the first encoding features is the feature vector corresponding to the starting character. The first prediction relation label indicates a prediction incidence relation between the third sample statement and the fourth sample statement, that is, an incidence relation between the third sample statement and the fourth sample statement predicted based on the translation model. Because the first feature vector in the first coding feature can embody the relation between the third sample statement and the fourth sample statement, the first prediction relation label is obtained in a classification mode so as to predict whether the third sample statement and the fourth sample statement have the association relation.
It should be noted that, in the embodiment of the present disclosure, the first splicing statement is processed based on the translation model, and then the first feature vector in the obtained first coding feature is classified to obtain the first prediction relationship label, and in another embodiment, the step 306 and the step 308 need not be executed, and the first prediction relationship label can be obtained based on the translation model, the third sample statement and the fourth sample statement in other manners.
309. And adjusting the translation model based on the third prediction probability corresponding to each second word included in the second sample sentence, the first prediction relation label and the first sample relation label.
In this embodiment of the application, the third prediction probability corresponding to each second word included in the second sample sentence is the first prediction result obtained based on the translation model, and the third prediction probabilities corresponding to the plurality of second words can embody the accuracy of the translation model. The first prediction relation label indicates the association relation between the third sample sentence and the fourth sample sentence predicted based on the translation model, and the first sample relation label indicates the real association relation between the third sample sentence and the fourth sample sentence, so the analysis capability of the translation model on sentences having an association relation can be determined based on the first prediction relation label and the first sample relation label. The model parameters in the translation model are adjusted based on the first prediction result, the first prediction relation label and the first sample relation label, that is, the analysis capability of the translation model on sentences having an association relation is improved while it is ensured that the translation model has the translation capability.
In the embodiment of the present application, the first prediction result is taken as the third prediction probability corresponding to the second word included in the second sample sentence, and the translation model is adjusted based on the third prediction probability corresponding to the second word included in the second sample sentence, the first prediction relationship label, and the first sample relationship label.
It should be noted that, in the embodiment of the present application, only one iteration of the translation model is described, and in another embodiment, the translation model is iterated multiple times according to the above steps 301 to 309, so as to improve the accuracy of the translation model.
In the method provided by the embodiment of the application, when the translation model is trained, the first prediction result is obtained based on the first sample sentence and the second sample sentence, which have the same meaning and belong to different languages. The first prediction result indicates the possibility of translating the first sample sentence into the second sample sentence based on the translation model, that is, the first prediction result can reflect the accuracy of the translation model. The first prediction relation label is obtained based on the third sample sentence and the fourth sample sentence, which both belong to the source language; the first prediction relation label indicates the predicted association relation between the third sample sentence and the fourth sample sentence, and the first sample relation label indicates the real association relation between the third sample sentence and the fourth sample sentence, so the analysis capability of the translation model on sentences having an association relation can be determined based on the first prediction relation label and the first sample relation label. The translation model is adjusted based on the first prediction result, the first prediction relation label and the first sample relation label, that is, the analysis capability of the translation model on sentences having an association relation is improved while ensuring that the translation model has the translation capability, so that this analysis capability can be combined when sentences are subsequently translated based on the translation model, thereby improving the accuracy of the translation model.
Moreover, in the process of obtaining the first prediction result, the third coding feature for characterizing the first sample sentence belonging to the source language and the fourth coding feature for characterizing the second sample sentence belonging to the target language are obtained, and the third coding feature and the fourth coding feature are fused. The obtained fusion feature embodies the relation between the first sample sentence and the second sample sentence, and the prediction probability corresponding to each second word in the second sample sentence is obtained based on the fusion feature. The prediction probability corresponding to each second word can represent the possibility that the translation model translates the corresponding second word, and therefore can reflect the accuracy of the translation model, so the translation model can be adjusted based on the determined prediction probability corresponding to each second word, thereby improving the accuracy of the translation model.
In the fourth coding features corresponding to the second sample sentence, the feature vector corresponding to each second word is obtained only based on the second word and the word vector of the second word before the second word, and the word vector of the second word after the second word is not involved, so that the effect of masking the words after the second word is realized, and the next word can be translated by combining the translated words in the process of word-by-word translation of the subsequent translation model, so that the translated words have relevance, and the accuracy of the translation model is improved.
And the third sample sentence and the fourth sample sentence are spliced, the first coding features corresponding to the spliced sentences are obtained, each feature vector in the first coding features is obtained by fusing a start character and a word vector of each word in the first spliced sentence, namely the word vector of each word in the first spliced sentence is fused into the first feature vector corresponding to the start character, and the relation between the third sample sentence and the fourth sample sentence can be embodied.
On the basis of the embodiment shown in fig. 3, after obtaining the third prediction probability corresponding to each second word included in the second sample sentence and the first coding feature of the first spliced sentence, the computer device may further obtain the first prediction relation label in combination with the classification model, and adjust the translation model by determining the loss value, that is, after step 307, the computer device may further adjust the model by using the following two methods:
the first way comprises the following steps 310-313:
310. and classifying a first feature vector in the first coding features based on the first classification model to obtain a first prediction probability, wherein the first prediction probability indicates the possibility that the third sample sentence and the fourth sample sentence belong to the same group of conversations.
In the embodiment of the present application, the first classification model is used to determine the possibility that two sample sentences belong to the same group of dialogs, the first sample relationship label indicates whether a third sample sentence and a fourth sample sentence belong to the same group of dialogs, and the third sample sentence and the fourth sample sentence may or may not belong to the same group of dialogs.
For example, the third sample sentence and the fourth sample sentence are sentences included in the same fourth dialogue data obtained by the dialogue between at least two interlocutors in the source language, and the third sample sentence and the fourth sample sentence belong to the same group of dialocus. For example, the third sample sentence and the fourth sample sentence both belong to fourth dialogue data, and in the fourth dialogue data, the fourth sample sentence is at least one sentence before the third sample sentence, that is, the fourth sample sentence is a context sentence of the third sample sentence.
For another example, the third sample sentence is a sentence included in one dialogue data, the fourth sample sentence is a sentence included in another dialogue data, and the two dialogue data are obtained by dialogues of at least two interlocutors in the source language, and the third sample sentence and the fourth sample sentence do not belong to the same group of dialogues.
Because the first feature vector in the first coding feature can embody the connection between the third sample statement and the fourth sample statement, the first feature vector in the first coding feature is classified based on the first classification model to determine the possibility that the third sample statement and the fourth sample statement belong to the same group of conversations, namely the first prediction probability.
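For illustration, such a first classification model can be sketched as a linear layer over the first feature vector followed by a two-class softmax; the two-class head and the PyTorch implementation are assumptions of the sketch:

```python
import torch
import torch.nn as nn

class FirstClassificationModel(nn.Module):
    """Classifies the first feature vector (starting character position) of the first coding feature."""
    def __init__(self, dim=512):
        super().__init__()
        self.w = nn.Linear(dim, 2)                    # linear transformation matrix parameters

    def forward(self, first_coding_feature):          # (batch, seq_len, dim)
        h_start = first_coding_feature[:, 0]           # feature vector of the starting character
        probs = torch.softmax(self.w(h_start), dim=-1)
        return probs[:, 1]   # first prediction probability: same group of dialogs
```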
311. And acquiring a first loss value based on the third prediction probabilities corresponding to the second words.
In this embodiment of the application, the third prediction probability corresponding to each second word in the second sample sentence is the first prediction result obtained based on the translation model, and each third prediction probability represents a possibility that the translation model translates the corresponding second word. The first loss value can embody the accuracy of the translation model, and the translation model is adjusted through the first loss value by obtaining the first loss value.
In one possible implementation, this step 311 includes: and determining a negative value of the sum of the third prediction probabilities corresponding to the plurality of second words as the first loss value.
In one possible implementation manner, the third prediction probabilities corresponding to the plurality of second words and the first loss value satisfy the following relationship:

L_1 = -\sum_{t=1}^{|y|} \log P(y_t | x, y_{<t})

wherein L_1 is used for representing the first loss value; t represents the order of the second words in the second sample sentence; y represents the second sample sentence; |y| represents the total number of second words contained in the second sample sentence; y_t represents the t-th second word in the second sample sentence; y_{<t} represents the second words before the t-th second word in the second sample sentence; P(y_t | x, y_{<t}) is the third prediction probability corresponding to the t-th second word; and x represents the first sample sentence.
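A minimal sketch of this loss, assuming the log-likelihood formulation shown above and a PyTorch tensor of the third prediction probabilities, is:

```python
import torch

def first_loss_value(third_prediction_probs: torch.Tensor) -> torch.Tensor:
    """third_prediction_probs holds P(y_t | x, y_<t) for every second word, shape (|y|,)."""
    # L1 = -sum_t log P(y_t | x, y_<t)
    return -torch.log(third_prediction_probs).sum()
```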
312. And acquiring a second loss value based on the first prediction probability and the first sample relation label.
Since the first sample relationship label indicates a true associative relationship between the third sample statement and the fourth sample statement, i.e., indicates whether the third sample statement and the fourth sample statement belong to the same group of dialogs, the first prediction probability indicates a predicted possibility that the third sample statement and the fourth sample statement belong to the same group of dialogs. And acquiring a second loss value based on the first prediction probability and the first sample relation label, wherein the acquired second loss value represents the accuracy of the prediction result.
In one possible implementation, the first sample relationship label includes a first positive sample relationship label indicating that the third sample statement and the fourth sample statement belong to the same group of dialogs, or a first negative sample relationship label indicating that the third sample statement and the fourth sample statement do not belong to the same group of dialogs. When the first sample relationship labels are different, the manner of obtaining the second loss value is different, that is, the process of obtaining the second loss value includes the following two manners:
the first mode is as follows: and acquiring a second loss value based on the first prediction probability in response to the first sample relation label being a first positive sample relation label, wherein the first prediction probability and the second loss value are in a negative correlation relationship.
The larger the first prediction probability is, the smaller the second loss value is, and the smaller the first prediction probability is, the larger the second loss value is.
Optionally, in response to the first sample relationship label being the first positive sample relationship label, the first prediction probability and the second loss value satisfy the following relationship:

L_2 = -\log P(l^{+} | x, c_x)

P(l^{+} | x, c_x) = softmax(W_1 h_1)

wherein L_2 is used for representing the second loss value; P(l^{+} | x, c_x) is used for representing the first prediction probability; l^{+} is used for representing the first positive sample relationship label; x is used for representing the third sample statement and c_x is used for representing the fourth sample statement; softmax(·) is used for representing the logistic regression function; W_1 is used for representing the linear transformation matrix parameters in the first classification model; and h_1 is used for representing the first feature vector in the first coding features, namely the feature vector corresponding to the starting character.
The second mode is as follows: and responding to the first sample relation label as a first negative sample relation label, and acquiring a second loss value based on the difference value between the target value and the first prediction probability, wherein the difference value and the second loss value have a negative correlation relation.
The target value is an arbitrary value, and for example, the target value is 1. The larger the difference, the smaller the second loss value, and the smaller the difference, the larger the second loss value.
Optionally, in response to the first sample relationship label being the first negative sample relationship label, the target value, the first prediction probability and the second loss value satisfy the following relationship:

L_2 = -\log P(l^{-} | x, c_x)

P(l^{-} | x, c_x) = 1 - P(l^{+} | x, c_x)

P(l^{+} | x, c_x) = softmax(W_1 h_1)

wherein L_2 is used for representing the second loss value; l^{-} is used for representing the first negative sample relationship label; x is used for representing the third sample statement and c_x is used for representing the fourth sample statement; P(l^{-} | x, c_x) represents the difference between the target value and the first prediction probability, the target value being 1; P(l^{+} | x, c_x) is used for representing the first prediction probability; softmax(·) is used for representing the logistic regression function; W_1 is used for representing the linear transformation matrix parameters in the first classification model; and h_1 is used for representing the first feature vector in the first coding features, namely the feature vector corresponding to the starting character.
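A minimal sketch combining the two cases above, assuming the first prediction probability is given as a PyTorch scalar tensor:

```python
import torch

def second_loss_value(first_prediction_prob: torch.Tensor,
                      is_positive: bool, target_value: float = 1.0) -> torch.Tensor:
    """first_prediction_prob: predicted probability that the third and fourth
    sample statements belong to the same group of dialogs."""
    if is_positive:                               # first positive sample relationship label
        return -torch.log(first_prediction_prob)  # larger probability -> smaller loss
    # first negative sample relationship label: use the difference (target value - probability)
    return -torch.log(target_value - first_prediction_prob)
```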
313. Based on the first loss value and the second loss value, the translation model and the first classification model are adjusted.
And adjusting model parameters in the translation model and model parameters in the first classification model through the first loss value and the second loss value so as to improve the accuracy of the translation model and the classification capability of the first classification model.
In one possible implementation, this step 313 comprises: a sum of the first loss value and the second loss value is determined, and the translation model and the first classification model are adjusted based on the determined sum of the loss values.
In this embodiment, the third sample sentence and the fourth sample sentence may belong to the same group of dialogs, or may not belong to the same group of dialogs, and there is coherence between the sample sentences belonging to the same group of dialogs, and the translation model is trained based on the third sample sentence and the fourth sample sentence, so as to improve the analysis capability of the translation model on the sentences having an association relationship in terms of the continuity of the dialogs.
The second way comprises the following steps 314-317:
314. and classifying the first feature vector based on a second classification model to obtain a second prediction probability, wherein the second prediction probability indicates the possibility that the third sample statement and the fourth sample statement are sent by the same interlocutor.
In this embodiment, the second classification model is used to determine the possibility that two sample sentences are issued by the same interlocutor, and the first sample relationship label indicates whether the third sample sentence and the fourth sample sentence are issued by the same interlocutor. The third sample sentence and the fourth sample sentence belong to the same group of conversations, but the third sample sentence and the fourth sample sentence may or may not be sent by the same interlocutor.
For example, the third sample sentence and the fourth sample sentence are sentences included in the same fifth dialogue data, and the fifth dialogue data is obtained by dialogue in the source language by at least two interlocutors, that is, the third sample sentence and the fourth sample sentence belong to the same group of dialogue. The fifth dialogue data comprises statements sent by the interlocutor A and the interlocutor B, and if the third sample statement and the fourth sample statement are both sent by the interlocutor A or the interlocutor B, the third sample statement and the fourth sample statement are sent by the same interlocutor; if the third sample sentence is uttered by the interlocutor a and the fourth sample sentence is uttered by the interlocutor B, the third sample sentence and the fourth sample sentence are not uttered by the same interlocutor.
Optionally, in a case where the third sample statement and the fourth sample statement are issued by the same interlocutor, the fourth sample statement is a context statement of the third sample statement. For example, the third sample sentence and the fourth sample sentence both belong to fifth dialogue data, and in the fifth dialogue data, the fourth sample sentence is at least one sentence before the third sample sentence, that is, the fourth sample sentence is a context sentence of the third sample sentence.
Because the first feature vector in the first coding feature can embody the connection between the third sample statement and the fourth sample statement, the first feature vector in the first coding feature is classified based on the second classification model to obtain the second prediction probability, that is, the possibility that the third sample statement and the fourth sample statement are uttered by the same interlocutor.
315. Based on the first prediction result, a first loss value is obtained.
This step is similar to the step 311, and will not be described herein again.
316. And acquiring a third loss value based on the second prediction probability and the first sample relation label.
Since the first sample relationship label indicates a true association relationship between the third sample statement and the fourth sample statement, that is, whether the third sample statement and the fourth sample statement are issued by the same interlocutor, the second prediction probability indicates a predicted possibility that the third sample statement and the fourth sample statement are issued by the same interlocutor. And acquiring a third loss value based on the second prediction probability and the first sample relation label, wherein the acquired third loss value represents the accuracy of the predicted result.
In one possible implementation manner, the first sample relationship label includes a second positive sample relationship label or a second negative sample relationship label, the second positive sample relationship label indicates that the third sample statement and the fourth sample statement are issued by the same interlocutor, the second negative sample relationship label indicates that the third sample statement and the fourth sample statement are not issued by the same interlocutor, when the first sample relationship labels are different, the manner of obtaining the third loss value is different, that is, the process of obtaining the third loss value includes the following two manners:
the first mode is as follows: and acquiring a third loss value based on the second prediction probability in response to the first sample relation label being a second positive sample relation label, wherein the second prediction probability and the third loss value are in a negative correlation relationship.
The larger the second prediction probability is, the smaller the third loss value is, and the smaller the second prediction probability is, the larger the third loss value is.
The second mode is as follows: and responding to the first sample relation label as a second negative sample relation label, and acquiring a third loss value based on the difference value between the target value and the second prediction probability, wherein the difference value and the third loss value are in a negative correlation relation.
The target value is an arbitrary value, and for example, the target value is 1. The larger the difference, the smaller the third loss value, and the smaller the difference, the larger the third loss value.

It should be noted that the process of obtaining the third loss value is the same as the process of obtaining the second loss value in step 312, and is not described herein again.
317. Based on the first loss value and the third loss value, the translation model and the second classification model are adjusted.
This step is similar to step 313 described above and will not be described further herein.
In this embodiment of the application, the third sample sentence and the fourth sample sentence belong to the same group of conversations, and may or may not be sent by the same interlocutor, and the sentences sent by the same interlocutor can embody the characteristics expressed by the interlocutor, and the translation model is trained based on the third sample sentence and the fourth sample sentence, so as to improve the analysis capability of the translation model on the sentences having an association relationship in terms of the characteristics expressed by the interlocutor.
It should be noted that, in the embodiment of the present application, the second loss value or the third loss value is obtained through the third sample statement, the fourth sample statement and the first sample relationship label, and after the second loss value or the third loss value is obtained, the translation model and the classification model are adjusted in combination with the first loss value. In another embodiment, the above step 305 is not performed, and two groups of sample data are obtained instead. The first group of sample data includes two sample statements and a corresponding sample relationship label, the two sample statements in the first group of sample data both belong to the source language, and the sample relationship label in the first group of sample data indicates whether the two sample statements in the first group of sample data belong to the same group of dialogs. The second group of sample data includes two sample statements and a corresponding sample relationship label, the two sample statements in the second group of sample data belong to the same group of dialogs and both belong to the source language, and the sample relationship label in the second group of sample data indicates whether the two sample statements in the second group of sample data are uttered by the same interlocutor. Then, the two groups of sample data are processed according to the above steps 306 to 317: a fourth loss value is obtained based on the first group of sample data and the first classification model, and a fifth loss value is obtained based on the second group of sample data and the second classification model.
In the embodiment of the application, sample sentences are obtained from two angles of conversation continuity and interlocutor expression characteristics, the translation model is trained based on the obtained sample data, and the multitask training method for the translation model is realized, so that the trained model can improve the analysis capability of the sentences with the incidence relation in multiple angles, and the analysis capability of the sentences with the incidence relation can be combined with the translation model when the sentences are translated based on the translation model in the following process, thereby improving the accuracy of the translation model.
In one possible implementation, after the first loss value, the fourth loss value, and the fifth loss value are obtained, a sum of the first loss value, the fourth loss value, and the fifth loss value is determined, and the translation model, the first classification model, and the second classification model are adjusted based on the determined sum of the loss values.
Optionally, the first loss value, the fourth loss value, the fifth loss value and the determined sum of the loss values satisfy the following relationship:

L(θ_{nct}) = L_1 + L_4 + L_5

wherein L(θ_{nct}) is used for representing the determined sum of the loss values; θ_{nct} is used for representing the parameters in the translation model; L_1 is used for representing the first loss value; L_4 is used for representing the fourth loss value; and L_5 is used for representing the fifth loss value.
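For illustration, one way to apply the summed loss to all three models is sketched below; the optimizer and its hyperparameters are assumptions of the sketch, not part of the embodiments:

```python
import itertools
import torch

def build_optimizer(translation_model, first_cls_model, second_cls_model, lr=1e-4):
    """One optimizer over the parameters of the translation model and both classification models."""
    params = itertools.chain(translation_model.parameters(),
                             first_cls_model.parameters(),
                             second_cls_model.parameters())
    return torch.optim.Adam(params, lr=lr)

def adjust_models(optimizer, first_loss, fourth_loss, fifth_loss):
    loss = first_loss + fourth_loss + fifth_loss   # L(theta_nct) = L1 + L4 + L5
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```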
It should be noted that the above is only described by taking an iteration of the translation model, the first classification model and the second classification model as an example, and in another embodiment, according to the above steps, the translation model, the first classification model and the second classification model are iterated for multiple times to improve the accuracy of the translation model.
In one possible implementation, a fourth sample data set is obtained, where the fourth sample data set includes at least one piece of dialogue data and the dialogue data belongs to the source language, and a first sample data subset and a second sample data subset are obtained based on the fourth sample data set. The first sample data subset includes multiple groups of sample data, each group of sample data includes two sample statements and a corresponding sample relationship label, the two sample statements in each group of sample data both belong to the source language, and the sample relationship label in each group of sample data indicates whether the two sample statements in the group of sample data belong to the same group of dialogs. The second sample data subset includes multiple groups of sample data, each group of sample data includes two sample statements and a corresponding sample relationship label, the two sample statements in each group of sample data belong to the same group of dialogs and both belong to the source language, and the sample relationship label in each group of sample data indicates whether the two sample statements in the group of sample data are uttered by the same interlocutor.
Optionally, the multiple sets of sample data in the first sample data subset include positive sample data and negative sample data, the positive sample data indicates that the included sample relationship label indicates that the two corresponding sample statements belong to the same group of conversations, and the negative sample data indicates that the included sample relationship label indicates that the two corresponding sample statements do not belong to the same group of conversations.
Optionally, the multiple groups of sample data in the second sample data subset include positive sample data and negative sample data, the sample relationship label included in the positive sample data indicates that the two corresponding sample statements are uttered by the same interlocutor, and the sample relationship label included in the negative sample data indicates that the two corresponding sample statements are not uttered by the same interlocutor.
When the translation model, the first classification model and the second classification model are iteratively trained in the above manner, in each iteration one group of sample data is selected from the first sample data subset and one group of sample data is selected from the second sample data subset, and the two selected groups of sample data are used to perform one iteration of training on the translation model, the first classification model and the second classification model in the above manner.
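A minimal sketch of this per-iteration selection, assuming each subset is simply a Python list of sample-data groups; the tuple structure is an assumption for illustration only.

```python
import random

def sample_iteration_data(first_subset, second_subset):
    """Pick one group of sample data from each subset for a single training iteration.

    Each group is assumed to be a tuple (statement_a, statement_b, relation_label);
    the exact structure is an illustrative assumption.
    """
    dialogue_group = random.choice(first_subset)   # same-dialogue relation task
    speaker_group = random.choice(second_subset)   # same-interlocutor relation task
    return dialogue_group, speaker_group
```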
Based on the embodiment shown in fig. 3, after the translation model is trained, the translation model can be iteratively trained again in combination with the sample data set with bilingual dialogue data, and the iterative training process is described in detail in the following embodiment.
Fig. 4 is a flowchart of a sentence translation method provided in an embodiment of the present application, where the method is executed by a computer device, and as shown in fig. 4, the method includes:
401. the method comprises the steps of obtaining a first sample data set, wherein the first sample data set comprises first dialogue data and second dialogue data with the same meaning, the first dialogue data belong to a source language, the second dialogue data belong to a target language, the first dialogue data and the second dialogue data are obtained through translation based on third dialogue data, and the third dialogue data are obtained through dialogue conducted by at least two interlocutors in the source language and the target language respectively.
The first sample data set comprises first dialogue data belonging to the source language and second dialogue data belonging to the target language, that is, the first sample data set is a bilingual sample data set. The third dialogue data is obtained by at least two interlocutors conducting a dialogue in the source language and the target language respectively, that is, the third dialogue data contains dialogue sentences belonging to the source language and dialogue sentences belonging to the target language. After the third dialogue data is obtained, the sentences belonging to the target language in the third dialogue data are translated, the translated sentences belong to the source language, and the translated sentences and the sentences belonging to the source language in the third dialogue data form the first dialogue data; the sentences belonging to the source language in the third dialogue data are translated, the translated sentences belong to the target language, and the translated sentences and the sentences belonging to the target language in the third dialogue data form the second dialogue data.
For example, if the interlocutor a dialogues in the source language and the interlocutor B dialogues in the target language, the interlocutor a dialogues with the interlocutor B to obtain the third dialogue data, such as the sentence sequence of the third dialogue data expressed as (X)1,Y2,X3,Y4,…,Xu-1,Yu) Wherein (X)1,X3,…,Xu-1) Is a sentence by the interlocutor A, (Y)2,Y4,…,Yu) Is a statement made by interlocutor B. Translating the sentence sent by the interlocutor B in the third dialogue data, and forming the first dialogue data by the translated sentence belonging to the source language and the sentence sent by the interlocutor A in the third dialogue data, namely, expressing the sentence sequence of the first dialogue data as (X)1,X2,X3,X4,…,Xu-1,Xu),(X2,X4,…,Xu) Is pair (Y)2,Y4,…,Yu) And translating to obtain the target product. The sentence uttered by the speaker A in the third dialogue data is translated, and the translated sentence belonging to the target language and the sentence uttered by the speaker B in the third dialogue data constitute the second dialogue data, that is, the sentence sequence of the second dialogue data is expressed as (Y)1,Y2,Y3,Y4,…,Yu-1,Yu),(Y1,Y3,…,Yu-1) Is pair (X)1,X3,…,Xu-1) And translating to obtain the target product.
In one possible implementation, the first sample data set includes multiple groups of dialogue data, each group of dialogue data includes one piece of first dialogue data and one piece of second dialogue data having the same meaning, and the first dialogue data is different in different groups of dialogue data.
402. And acquiring a fifth sample sentence and a first associated sentence associated with the fifth sample sentence from the first dialogue data, and acquiring a sixth sample sentence having the same meaning as the fifth sample sentence from the second dialogue data.
In the embodiment of the present application, the fifth sample sentence and the first associated sentence have an association relationship, the fifth sample sentence and the sixth sample sentence have the same meaning, and both the fifth sample sentence and the first associated sentence belong to the source language, and the sixth sample sentence belongs to the target language.
In one possible implementation, the fifth sample statement belongs to the same first dialogue data as the first association statement. Optionally, the first associated statement is a context statement of a fifth sample statement. For example, in the same first dialogue data, the first association statement is at least one statement before the fifth sample statement, that is, the first association statement is a context statement of the fifth sample statement.
In one possible implementation, this step 402 includes: acquiring a third sample data subset based on the first sample data set, wherein the third sample data subset comprises a plurality of groups of dialogue data, each group of dialogue data comprises a fifth sample statement, a first associated statement associated with the fifth sample statement, and a sixth sample statement with the same meaning as the fifth sample statement; and selecting any group of dialogue data from the third sample data subset.
In this embodiment of the application, the third sample data subset is obtained based on the first dialogue data and the second dialogue data in the first sample data set. By acquiring the third sample data subset in advance, any group of dialogue data can be selected directly from the third sample data subset when the translation model is trained, without extracting sample statements from the first dialogue data and the second dialogue data each time, so that the efficiency of acquiring sample data is improved.
403. And obtaining a seventh sample statement and an eighth sample statement from the at least one first dialogue data, and determining a second sample relation label, wherein the second sample relation label indicates whether the seventh sample statement and the eighth sample statement have an association relation.
In an embodiment of the present application, the first sample data set includes a plurality of pieces of first dialogue data, each piece of first dialogue data belongs to the source language, the seventh sample sentence and the eighth sample sentence are both obtained from the first dialogue data, and the first dialogue data to which the seventh sample sentence belongs and the first dialogue data to which the eighth sample sentence belongs may be the same or different.
The seventh sample statement and the eighth sample statement may or may not have an association relationship, and the second sample relationship label may be represented in any form. For example, the second sample relationship label indicates in numerical form whether there is an association relationship between the seventh sample statement and the eighth sample statement: a second sample relationship label of 1 indicates that there is an association relationship between the seventh sample statement and the eighth sample statement, and a second sample relationship label of 0 indicates that there is no association relationship between them. Optionally, the second sample relationship label is set manually.
For example, the seventh sample statement and the eighth sample statement having an association relationship indicates that the seventh sample statement and the eighth sample statement belong to the same first dialogue data; or that the seventh sample statement and the eighth sample statement are issued by the same interlocutor and belong to the same first dialogue data. The seventh sample statement and the eighth sample statement not having an association relationship indicates that they do not belong to the same first dialogue data; or that they belong to the same first dialogue data but are not issued by the same interlocutor.
In one possible implementation, this step 403 includes: obtaining a fourth sample data subset and a fifth sample data subset based on the first sample data set. The fourth sample data subset comprises a plurality of groups of sample data, each group of sample data comprises two sample statements and a corresponding sample relationship label, the two sample statements in each group of sample data both belong to the source language, and the sample relationship label in each group of sample data indicates whether the two sample statements in the group belong to the same group of dialogues. The fifth sample data subset comprises a plurality of groups of sample data, each group of sample data comprises two sample statements and a corresponding sample relationship label, the two sample statements in each group of sample data belong to the same group of dialogues, and the sample relationship label in each group of sample data indicates whether the two sample statements in the group are sent by the same interlocutor. Any group of sample data is selected from the fourth sample data subset or the fifth sample data subset.
In an embodiment of the present application, the fourth sample data subset and the fifth sample data subset are both obtained based on at least one piece of first dialogue data in the first sample data set. By acquiring the fourth sample data subset and the fifth sample data subset in advance, any group of sample data is selected from the fourth sample data subset or the fifth sample data subset when the translation model is trained, without extracting sample data from the first dialogue data each time, so that the efficiency of acquiring sample data is improved.
Optionally, the multiple groups of sample data in the fourth subset of sample data include positive sample data and negative sample data, the positive sample data indicates that the included sample relationship label indicates that the two corresponding sample statements belong to the same group of dialogues, and the negative sample data indicates that the included sample relationship label indicates that the two corresponding sample statements do not belong to the same group of dialogues.
For example, any positive sample data in the fourth sample data subset includes a sample statement A and a sample statement B that belong to the source language, and sample statement A and sample statement B both belong to the same dialogue data. For example, the sentence sequence of the dialogue data is expressed as (X_1, X_2, X_3, X_4, …, X_{u-1}, X_u), sample statement A is X_u, and sample statement B is (X_1, X_2, X_3, X_4, …, X_{u-1}), that is, sample statement B is at least one statement preceding sample statement A.
Optionally, the multiple groups of sample data in the fifth sample data subset include positive sample data and negative sample data, where the sample relationship label included in the positive sample data indicates that the two corresponding sample statements are sent by the same interlocutor, and the sample relationship label included in the negative sample data indicates that the two corresponding sample statements are not sent by the same interlocutor.
For example, any positive sample data in the fifth sample data subset includes a sample statement A and a sample statement B that belong to the source language, and sample statement A and sample statement B belong to the same dialogue data and are sent by the same interlocutor. For example, the sentence sequence of the dialogue data is expressed as (X_1, X_2, X_3, X_4, …, X_{u-1}, X_u), where (X_1, X_3, …, X_{u-1}) are sent by interlocutor 1 and (X_2, X_4, …, X_u) are sent by interlocutor 2. Then sample statement A is X_u and sample statement B is (X_2, X_4, …, X_{u-2}), that is, sample statement B is at least one statement preceding sample statement A; or sample statement A is X_{u-1} and sample statement B is (X_1, X_3, …, X_{u-3}), that is, sample statement B is at least one statement preceding sample statement A.
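A rough sketch of how positive and negative sample data for the same-interlocutor task might be assembled from one dialogue, assuming the dialogue is a list of (speaker, sentence) pairs; the structure and the labels 1/0 are illustrative assumptions.

```python
def make_speaker_samples(dialogue):
    """Build (context, sentence, label) groups for the same-interlocutor task.

    `dialogue` is assumed to be a list of (speaker_id, sentence) pairs from one
    conversation. Label 1: the context sentences and the last sentence come from
    the same interlocutor; label 0: they come from a different interlocutor.
    """
    last_speaker, last_sentence = dialogue[-1]
    same = [s for spk, s in dialogue[:-1] if spk == last_speaker]
    other = [s for spk, s in dialogue[:-1] if spk != last_speaker]
    samples = []
    if same:
        samples.append((same, last_sentence, 1))   # positive sample data
    if other:
        samples.append((other, last_sentence, 0))  # negative sample data
    return samples
```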
404. Based on the translation model, the fifth sample sentence, the first associated sentence, and the sixth sample sentence, a second prediction result is obtained, the second prediction result indicating a possibility of translating the fifth sample sentence into the sixth sample sentence based on the translation model.
Wherein the second prediction result can reflect the accuracy of the translation model. Because the first association statement and the fifth sample statement have an association relationship, the meaning expressed by the first association statement and the meaning expressed by the fifth sample statement may have an association, and the translation model can be helped to translate the fifth sample statement into the sixth sample statement based on the meaning expressed by the first association statement, so that the fifth sample statement, the first association statement and the sixth sample statement are combined to obtain the second prediction result, and the accuracy of the obtained prediction result is improved.
In one possible implementation, this step 404 includes:
4041. and splicing the fifth sample statement and the first associated statement to obtain a second spliced statement.
The fifth sample statement and the first associated statement are spliced into one long statement.
In one possible implementation, the process of obtaining the second concatenation statement includes: and splicing the fifth sample statement after the first association statement to obtain the second spliced statement.
Optionally, in the concatenation process, a separator is added between the first associated statement and the fifth sample statement, that is, the obtained second concatenated statement includes the first associated statement, the separator, and the fifth sample statement. By adding the separator to the second spliced sentence, the first associated sentence and the fifth sample sentence contained in the second spliced sentence can be distinguished based on the separator in the following.
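A minimal sketch of this splicing step, assuming placeholder token strings for the start character and the separator; the token names are assumptions, not the application's actual symbols.

```python
SEP_TOKEN = "[SEP]"    # hypothetical separator between the two spliced sentences
START_TOKEN = "[CLS]"  # hypothetical start character placed before the spliced sentence

def splice_sentences(first_associated: str, fifth_sample: str) -> str:
    """Splice the fifth sample statement after the first associated statement,
    with a separator in between, as described above."""
    return f"{START_TOKEN} {first_associated} {SEP_TOKEN} {fifth_sample}"
```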
4042. And coding the second spliced statement based on the translation model to obtain a fifth coding characteristic corresponding to the second spliced statement.
In a possible implementation manner, the fifth encoding feature includes a plurality of feature vectors, a first feature vector of the plurality of feature vectors corresponds to a starting character before the second concatenated sentence, each feature vector of the plurality of feature vectors except the first feature vector corresponds to a fifth word, the fifth word refers to a word in the second concatenated sentence, and each feature vector is obtained by weighted fusion of a word vector of the starting character and word vectors of the plurality of fifth words.
In one possible implementation, step 4042 includes: and based on the coding submodel in the translation model, coding features corresponding to the second spliced sentence are coded to obtain the fifth coding feature.
Optionally, the encoding sub-model includes a plurality of encoding layers, and the process of obtaining the fifth encoding characteristic based on the encoding sub-model includes: and coding the coding features corresponding to the second splicing statement based on the first coding layer to obtain a first coding feature, coding the coding features output by the previous coding layer again based on the second coding layer to output a coding feature, and repeating the steps until the fifth coding feature is output by the last coding layer.
Optionally, in the process of obtaining the fifth coding feature based on multiple coding layers, only the first coding layer performs weighted fusion on the vector corresponding to each word in the second concatenation statement, and starting from the second coding layer, only the vector corresponding to the word in the fifth sample statement in the second concatenation statement is performed weighted fusion, and the vector corresponding to the word in the first association statement is not changed any more.
Optionally, each coding layer includes a self-attention sublayer and a feedforward network sublayer, the self-attention sublayer is used for performing weighted fusion on vectors in the input coding features, and the feedforward network sublayer is used for performing feature transformation on the coding features after weighted fusion.
For example, for the first coding layer of the multiple coding layers, the coding feature input to the first coding layer includes multiple word vectors, where the multiple word vectors include the word vector of the starting character and the word vector of each fifth word, and the processing procedure of the first coding layer includes: for each word vector, performing, based on the self-attention sublayer in the coding layer, weighted fusion on the multiple word vectors in the coding feature, fusing the weighted-fused vector with the word vector to obtain the updated feature vector of the word vector, and forming the updated feature vectors of the multiple word vectors into the coding feature output by the self-attention sublayer; and based on the feedforward network sublayer, performing feature transformation on the coding feature output by the self-attention sublayer, and fusing the transformed feature with the coding feature output by the self-attention sublayer to obtain the coding feature output by the coding layer.
For any coding layer of the plurality of coding layers except the first coding layer, the coding features input into the coding layer include a plurality of first vectors, and then the processing procedure of the coding layer includes: determining a first vector corresponding to a word in a fifth sample statement in the plurality of first vectors, performing weighted fusion on the first vector corresponding to any word in the fifth sample statement based on a self-attention sublayer in the coding layer, fusing the vector subjected to weighted fusion with the first vector corresponding to the word to obtain an updated feature vector corresponding to the word, and forming the updated feature vectors corresponding to the plurality of words in the fifth sample statement and the first vector corresponding to each word in the first associated statement into a coding feature output by the self-attention sublayer; and based on the feedforward network sublayer, performing feature transformation on the coding features output from the attention sublayer, and fusing the features after feature transformation and the coding features output from the attention sublayer to obtain the coding features output by the coding layer.
From the second coding layer onward, only the vectors corresponding to the words in the fifth sample sentence within the second spliced sentence are weighted and fused, which ensures that the vectors corresponding to the words in the first associated sentence are no longer changed, and the last coding layer accordingly outputs the fifth coding feature.
Optionally, the first associated statement in the second spliced statement precedes the fifth sample statement, and a separator is provided between the first associated statement and the fifth sample statement; the coding layer determines the vectors located after the separator in the input feature as the vectors corresponding to the words in the fifth sample statement.
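The selective re-encoding described above (only the fifth-sample-statement vectors are updated from the second coding layer onward, while the context vectors stay fixed) could be sketched as a single-head attention pass like the following; shapes, names and the omission of the feed-forward sublayer are all simplifying assumptions.

```python
import torch
import torch.nn.functional as F

def selective_self_attention(hidden, sample_start, w_q, w_k, w_v):
    """Single-head attention sketch in which only the vectors of the fifth sample
    statement (positions from `sample_start` onward, i.e. after the separator)
    are re-computed, while the vectors of the first associated statement pass
    through unchanged. `hidden` is (seq_len, d); the weight matrices are (d, d)."""
    q = hidden[sample_start:] @ w_q            # queries: sample-statement words only
    k = hidden @ w_k                           # keys/values: every position, so the
    v = hidden @ w_v                           # sample words can still see the context
    scores = (q @ k.T) / (k.shape[-1] ** 0.5)
    attended = F.softmax(scores, dim=-1) @ v
    out = hidden.clone()
    out[sample_start:] = hidden[sample_start:] + attended   # residual update
    return out
```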
It should be noted that step 4042 is the same as step 301, and is not described herein again.
4043. And coding the sixth sample sentence based on the translation model to obtain sixth coding features corresponding to the sixth sample sentence, wherein the sixth coding features comprise third feature vectors corresponding to each third word, the third words refer to the words in the sixth sample sentence, and each third feature vector is obtained by weighting and fusing the corresponding third word and the word vectors of the previous third words.
Step 4043 is similar to step 302, and will not be described herein.
4044. And fusing the fifth coding feature and the sixth coding feature based on the translation model to obtain fusion features, wherein the fusion features comprise fusion feature vectors corresponding to each third word.
The process is the same as step 303, and will not be described herein.
It should be noted that, in the embodiment of the present application, the sixth coding feature is obtained based on the translation model, and the fusion feature is obtained based on the fifth coding feature and the sixth coding feature, while in another embodiment, the translation model further includes a decoding sub-model, and the decoding sub-model includes a plurality of decoding layers, and the fusion feature is obtained based on the plurality of decoding layers, the sixth sample sentence, and the fifth coding feature.
In one possible implementation, the process of obtaining the fusion feature based on the plurality of decoding layers, the sixth sample statement and the fifth coding feature includes: extracting features of a sixth sample sentence based on a feature extraction submodel in the translation model to obtain a coding feature corresponding to the sixth sample sentence, and coding the coding feature corresponding to the sixth sample sentence based on the first decoding layer to obtain a first coding feature; fusing the first coding feature and the fifth coding feature based on the first decoding layer to obtain a first fused feature; and based on the second decoding layer, encoding the fusion feature output by the previous decoding layer to obtain a second encoding feature, based on the second decoding layer, fusing the second encoding feature and the fifth encoding feature to obtain a next fusion feature, and repeating the steps until the last fusion feature is output by the last decoding layer.
Optionally, each decoding layer includes a mask self-attention sublayer, a cross-attention sublayer and a feedforward network sublayer, where the mask self-attention sublayer is configured to perform weighted fusion on vectors in the input features in sequence according to an order of the vectors in the input features; the cross attention sublayer is used for fusing the coding features output by the mask from the attention sublayer with the fifth coding features; the feedforward network sublayer is used for performing feature transformation on the fusion features output by the cross attention sublayer and outputting the transformed fusion features.
Optionally, the processing procedure of the cross attention sublayer includes: determining, in the fifth coding feature, a third vector corresponding to each word in the fifth sample statement; for any third vector, performing, based on the cross attention sublayer, weighted fusion on the plurality of third vectors and the coding feature output by the masked self-attention sublayer, and fusing the weighted-fused feature with the third vector to obtain a vector fused with the third vector; forming an initial fused feature from the fused vectors corresponding to the plurality of third vectors and the fourth vectors in the fifth coding feature other than the third vectors; and fusing the initial fused feature with the coding feature output by the masked self-attention sublayer to obtain the fusion feature.
In the embodiment of the present application, when the cross attention sublayer fuses the coding features masked and output from the attention sublayer and the fifth coding features, only the feature vector corresponding to each word in the fifth sample sentence in the fifth coding features is subjected to weighted fusion, and the feature vector of each word in the first related sentence in the fifth coding features is not subjected to weighted fusion.
4045. Based on the translation model and the fusion features, a fourth prediction probability corresponding to each third word is obtained, wherein the fourth prediction probability indicates the possibility of translating each fusion feature vector into the corresponding third word based on the translation model.
This step is similar to step 304, and will not be described herein again.
405. And acquiring a second prediction relation label based on the translation model, the seventh sample statement and the eighth sample statement, wherein the second prediction relation label indicates a prediction incidence relation between the seventh sample statement and the eighth sample statement.
The second prediction relation label is a relation label predicted from the processing result obtained after the seventh sample statement and the eighth sample statement are processed by the translation model.
In one possible implementation, this step 405 includes: and splicing the seventh sample sentence and the eighth sample sentence to obtain a third spliced sentence, coding the third spliced sentence based on the translation model to obtain the coding characteristics of the third spliced sentence, wherein the coding characteristics comprise a plurality of characteristic vectors, the first characteristic vector corresponds to a starting character positioned in front of the third spliced sentence, each characteristic vector except the first characteristic vector in the plurality of characteristic vectors corresponds to a word, the word refers to a word in the third spliced sentence, each characteristic vector is obtained by weighted fusion of a word vector of the starting character and word vectors of a plurality of words, and the first characteristic vector in the coding characteristics of the third spliced sentence is classified to obtain a second prediction relation label.
The steps are similar to the steps 306-308 described above, and are not described herein again.
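A minimal sketch of the classification of the first feature vector, assuming a simple linear-plus-sigmoid head on top of the encoding; this is an illustrative stand-in, not the application's actual classifier.

```python
import torch
import torch.nn as nn

class RelationHead(nn.Module):
    """Hypothetical binary classifier applied to the first feature vector (the one
    aligned with the start character) of the spliced sentence's encoding; it
    outputs the predicted probability that the two sentences are associated."""
    def __init__(self, hidden_size: int):
        super().__init__()
        self.linear = nn.Linear(hidden_size, 1)

    def forward(self, encoding: torch.Tensor) -> torch.Tensor:
        # encoding: (seq_len, hidden_size); position 0 is the start character
        first_vector = encoding[0]
        return torch.sigmoid(self.linear(first_vector))
```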
406. And adjusting the translation model based on the second prediction result, the second prediction relation label and the second sample relation label.
This step is similar to step 309, and will not be described herein again.
In this embodiment of the application, the second prediction result is the prediction probability corresponding to each word included in the sixth sample sentence, that is, the prediction probabilities corresponding to the words can show the accuracy of the translation model; and the second prediction result is obtained by combining the fifth sample sentence and the first associated sentence associated with the fifth sample sentence, so the second prediction result can also show the analysis capability of the translation model on sentences having an association relationship. The analysis capability of the translation model on sentences having an association relationship can be determined based on the second prediction relation label and the second sample relation label, and the model parameters in the translation model are adjusted based on the second prediction result, the second prediction relation label and the second sample relation label, that is, the analysis capability of the translation model on sentences having an association relationship is improved under the condition that the translation model has the translation capability.
It should be noted that, in the embodiment of the present application, the sample statements are obtained from the first sample data set and the translation model is trained by using the obtained sample statements; in another embodiment, steps 402 to 406 need not be executed, and the translation model can be iteratively trained again based on the first sample data set in other ways.
It should be noted that the above description only takes one iteration of the translation model as an example; in another embodiment, the translation model is iterated multiple times according to the above steps 401 to 406 to improve the accuracy of the translation model.
According to the method provided by the embodiment of the application, the second prediction result can embody the accuracy of the translation model, and the second prediction result is obtained by combining the fifth sample sentence and the first associated sentence associated with the fifth sample sentence, so the analysis capability of the translation model on sentences having an association relationship can also be embodied in the second prediction result. The analysis capability of the translation model on sentences having an association relationship can be determined based on the second prediction relation label and the second sample relation label, and the model parameters in the translation model are adjusted based on the second prediction result, the second prediction relation label and the second sample relation label, that is, the analysis capability of the translation model on sentences having an association relationship is improved under the condition that the translation model has the translation capability.
It should be noted that, on the basis of the embodiment shown in fig. 3, the second prediction relationship label can be obtained based on the classification model, for example, if the seventh sample statement, the eighth sample statement and the second sample relationship label are obtained from the fourth sample data subset, the second prediction relationship label is obtained based on the third classification model; and if the seventh sample statement, the eighth sample statement and the second sample relational tag are acquired from the fifth sample data subset, the second prediction relational tag is obtained based on the fourth classification model. Then in step 406, the translation model is adjusted and the third classification model or the fourth classification model is adjusted based on the second prediction result, the second prediction relationship label, and the second sample relationship label.
Based on the embodiment shown in fig. 3, a group of dialogue data is selected from the third sample data subset, and a group of sample data is selected from each of the fourth sample data subset and the fifth sample data subset. Using the selected dialogue data and sample data, a sixth loss value is determined based on the second prediction result according to step 311; a seventh loss value is obtained, according to steps 310 to 312, based on the sample data selected from the fourth sample data subset and the third classification model; an eighth loss value is obtained, according to steps 314 to 317, based on the sample data selected from the fifth sample data subset and the fourth classification model; and the translation model, the third classification model and the fourth classification model are adjusted based on the sixth loss value, the seventh loss value and the eighth loss value.
In one possible implementation, after the sixth loss value, the seventh loss value, and the eighth loss value are obtained, a sum of the sixth loss value, the seventh loss value, and the eighth loss value is determined, and the translation model, the third classification model, and the fourth classification model are adjusted based on the determined sum of the loss values.
Optionally, the sixth loss value, the seventh loss value, the eighth loss value and the determined sum of loss values satisfy the following relationship:

$$\mathcal{L}'\left(\theta_{nct},\theta_{cls3},\theta_{cls4}\right) = L'_{1} + L'_{4} + L'_{5}$$

$$L'_{1} = -\sum_{t=1}^{|y|} \log P\!\left(y_{t} \mid X_{u}, X_{<u}, y_{<t}\right)$$

wherein $\mathcal{L}'\left(\theta_{nct},\theta_{cls3},\theta_{cls4}\right)$ is used for indicating the determined sum of loss values; $\theta_{nct}$ is used for representing the parameters of the translation model, and $\theta_{cls3}$ and $\theta_{cls4}$ are used for representing the parameters of the third classification model and the fourth classification model; $L'_{1}$ is used for representing the sixth loss value, $L'_{4}$ is used for representing the seventh loss value, and $L'_{5}$ is used for representing the eighth loss value; $t$ represents the position of a word in the sixth sample sentence, $y$ represents the sixth sample sentence, $|y|$ represents the total number of words contained in the sixth sample sentence, $y_{t}$ denotes the $t$-th word in the sixth sample sentence, $X_{u}$ represents the fifth sample sentence, $X_{<u}$ represents the first associated sample sentence, $y_{<t}$ represents the words preceding the $t$-th word in the sixth sample sentence, and $P\left(y_{t} \mid X_{u}, X_{<u}, y_{<t}\right)$ represents the prediction probability corresponding to the $t$-th word in the sixth sample sentence.
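Under the reconstruction above, the sixth loss value is a sentence-level negative log-likelihood; a minimal sketch follows, assuming the per-word prediction probabilities are already available as plain floats.

```python
import math

def sentence_negative_log_likelihood(token_probabilities):
    """Sixth loss value as a sentence-level negative log-likelihood: the sum of
    -log p(y_t | X_u, context, y_<t) over the words of the sixth sample sentence.
    `token_probabilities` is assumed to hold the model's predicted probability
    of each reference word, in order."""
    return -sum(math.log(p) for p in token_probabilities)
```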
Based on the embodiment shown in fig. 3, before the translation model is trained according to the above steps 301 to 309, the translation model can be iteratively trained based on the second sample data set, so that the iteratively trained translation model has a preliminary translation capability. The process of iteratively training the translation model based on the second sample data set is described in detail in the following embodiment.
Fig. 5 is a flowchart of a sentence translation method provided in an embodiment of the present application, where the method is executed by a computer device, and as shown in fig. 5, the method includes:
501. and acquiring a second sample data set, wherein the second sample data set comprises a ninth sample sentence and a tenth sample sentence which have the same meanings, the ninth sample sentence belongs to the source language, and the tenth sample sentence belongs to the target language.
In the embodiment of the present application, the second sample data set is a sample data set at a sentence level, and the second sample data set includes a ninth sample sentence and a tenth sample sentence which have the same meaning and belong to the source language and the target language, respectively.
502. And performing iterative training on the translation model based on the second sample data set.
And performing iterative training on the translation model through the sample sentences belonging to the source language and the target language in the second sample data set so as to improve the translation capability of the translation model.
In one possible implementation, this step 502 includes: encoding the ninth sample sentence based on the translation model to obtain the encoding feature corresponding to the ninth sample sentence; encoding the tenth sample sentence based on the translation model to obtain the encoding feature corresponding to the tenth sample sentence; fusing, based on the translation model, the encoding feature corresponding to the ninth sample sentence and the encoding feature corresponding to the tenth sample sentence to obtain a fused feature, the fused feature including a fused feature vector corresponding to each word included in the tenth sample sentence; obtaining, based on the translation model and the fused feature, a prediction probability corresponding to each word included in the tenth sample sentence, the prediction probability indicating the possibility of translating each fused feature vector in the fused feature into the corresponding word based on the translation model; obtaining a loss value based on the prediction probability corresponding to each word included in the tenth sample sentence; and adjusting the translation model based on the loss value.
The steps are similar to the steps 301, 304 and 311, and are not described herein again.
In the method provided by the embodiment of the application, the second sample data set is used as a statement level sample data set and comprises a ninth sample statement belonging to a source language and a tenth sample statement belonging to a target language, and the ninth sample statement and the tenth sample statement have the same meaning, so that the translation model is trained based on the sample statements in the second sample data set, the trained translation model has translation capability, and the accuracy of the translation model is improved.
It should be noted that the above embodiments are described separately, but in another embodiment the above embodiments can be combined arbitrarily, for example, the embodiments shown in fig. 3, fig. 4 and fig. 5 are combined. According to the embodiment shown in fig. 5, the translation model is trained based on a sentence-level sample data set so that the translation model has a translation capability. According to the embodiment shown in fig. 3, the translation model is trained based on the sentence-level sample data set and the monolingual sample data set, and the analysis capability of the translation model for sentences having an association relationship is improved under the condition that the translation model has the translation capability, so that this analysis capability can be combined when the translation model translates sentences, thereby improving the accuracy of the translation model. According to the embodiment shown in fig. 4, the translation model is trained based on the bilingual sample data set, so that the accuracy of the translation model is further improved.
The method provided by the embodiment of the application realizes a continuous learning method: different sample data sets are adopted to train the translation model in multiple training stages, so that the transition of the translation model between training stages is smoother and the accuracy of the translation model can be gradually improved, and therefore the translation quality of the translation model can be improved when sentences are subsequently translated based on the translation model.
Based on the embodiments shown in fig. 2 to fig. 5, after the translation model is trained, the translation task can be completed based on the trained translation model, and the translation process is described in detail in the following embodiments.
Fig. 6 is a flowchart of a sentence translation method provided in an embodiment of the present application, where the method is executed by a computer device, and as shown in fig. 6, the method includes:
601. and coding the target sentence and a second associated sentence associated with the target sentence based on the translation model to obtain a seventh coding characteristic.
The target sentence and the second associated sentence both belong to the source language, and the target sentence and the second associated sentence have an association relationship, for example, the second associated sentence is a context sentence of the target sentence.
In one possible implementation, step 601 includes: and splicing the target sentence and the second associated sentence to obtain a spliced sentence, and coding the spliced sentence based on the translation model to obtain a seventh coding characteristic.
This step is similar to the above-mentioned step 4041-4042, and will not be described herein again.
602. And coding the starting character based on the translation model to obtain the eighth coding feature corresponding to the starting character.
In the embodiment of the present application, the start character is an arbitrary character, and indicates the start position of the translated sentence. When the target sentence is translated based on the translation model, words belonging to the target language are sequentially translated, and the translated words form the sentence which has the same meaning as the target sentence and belongs to the target language. When the target sentence is translated, when the first word belonging to the target language needs to be translated currently, only the starting character is encoded, so that the first word belonging to the target language is translated based on the eighth encoding characteristic corresponding to the starting character subsequently.
603. And fusing the seventh coding feature and the eighth coding feature based on the translation model to obtain a fused feature.
The steps 602-604 are similar to the steps 302-303, and will not be described herein again.
604. And decoding the fusion characteristics based on the translation model to obtain a first translation word.
Wherein the fused feature is used to represent the first translated term. The fused features are decoded based on a translation model to determine the most likely word belonging to the target language as the translated word.
In one possible implementation, this step 604 includes: based on the translation model, comparing the fusion characteristics with word vectors of a plurality of words in a word library to obtain prediction probabilities corresponding to the words, and selecting the word corresponding to the maximum prediction probability from the prediction probabilities corresponding to the words as the first translation word.
In the embodiment of the present application, the translation model corresponds to a word library, a plurality of words included in the word library all belong to a target language, and a word vector of each word is used for representing the corresponding word. And comparing the fusion characteristics with the word vector of each word in the word library to determine the prediction probability of translating the fusion characteristics into each word, namely determining the possibility of translating the fusion characteristics into each word, and selecting the word with the highest possibility from the plurality of words as the translation word.
In one possible implementation, the translation model includes a mapping sub-model, and this step 604 includes: mapping the fusion feature based on the mapping sub-model in the translation model to obtain a probability sequence, wherein the probability sequence includes probabilities of multiple dimensions, each dimension corresponds to one word in the word library, and the word corresponding to the maximum prediction probability is used as the first translation word.
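A minimal sketch of this word-selection step, assuming the word library is represented as a matrix of word vectors plus a parallel list of word strings; shapes and names are assumptions for illustration.

```python
import torch
import torch.nn.functional as F

def pick_translation_word(fused_vector, vocab_embeddings, vocab_words):
    """Compare the fused feature vector with the word vector of every word in the
    word library, turn the scores into prediction probabilities, and return the
    word with the largest probability. Shapes: fused_vector (d,),
    vocab_embeddings (vocab_size, d); vocab_words is a list of strings."""
    scores = vocab_embeddings @ fused_vector          # (vocab_size,)
    probabilities = F.softmax(scores, dim=-1)
    best_index = int(torch.argmax(probabilities))
    return vocab_words[best_index], probabilities[best_index].item()
```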
It should be noted that, in the embodiment of the present application, the first translation word is obtained by using the start character and a coding-then-fusion method; in another embodiment, steps 602 to 604 need not be executed, and the first translation word can be obtained in other ways based on the translation model and the seventh coding feature.
605. And coding the initial character and the currently obtained translation word based on the translation model to obtain a ninth coding characteristic.
In the embodiment of the application, a translated sentence corresponding to a target sentence is obtained by a word-by-word translation mode, and in the translation process, when a next translated word is translated, a start character and a currently obtained translated word need to be utilized for translation, so that the start character and the currently obtained translated word are encoded, that is, the obtained ninth encoding characteristic can represent the meaning represented by the start character and the currently obtained translated word.
606. And fusing the seventh coding feature and the ninth coding feature based on the translation model to obtain a fused feature.
607. And decoding the fusion characteristics based on the translation model to obtain the next translation word.
In the embodiment of the application, in the process of translating the target sentence into the sentence belonging to the target language based on the translation model, each translation word belonging to the target language is sequentially translated, and in the process of acquiring each translation word, the next word is translated by using the currently obtained translation word.
The steps 605-607 are the same as the steps 602-604, and will not be described herein again.
It should be noted that, in the embodiment of the present application, the start character is utilized and the next translation word is obtained by a coding-then-fusion method; in another embodiment, steps 605 to 607 need not be executed, and the next translation word can be obtained in other ways based on the translation model, the seventh coding feature and the first translation word.
608. The above steps 605-607 are repeated until the last translation word is obtained.
In the embodiment of the present application, each time the above steps 605-607 are executed, the next translation word is obtained, and the steps are repeated until the last translation word is obtained.
609. And forming the obtained translation words into a translation sentence.
Wherein the translated sentence belongs to the target language, and the translated sentence has the same meaning as the target sentence.
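A rough sketch of the word-by-word translation loop of steps 602-608 follows, assuming a placeholder callable that produces the next translation word from the seventh coding feature and the words translated so far, and a hypothetical end-of-sentence token to signal the last translation word.

```python
def translate_greedy(context_encoding, decode_next_word, max_len=128, end_token="[EOS]"):
    """Word-by-word greedy decoding loop corresponding to steps 602-608.

    `context_encoding` stands for the seventh encoding feature of the target
    sentence and its associated sentence; `decode_next_word` maps (context,
    words translated so far) to the next target-language word. Both callables
    and the end-of-sentence token are hypothetical stand-ins."""
    translated_words = []
    while len(translated_words) < max_len:
        next_word = decode_next_word(context_encoding, translated_words)
        if next_word == end_token:          # the last translation word has been produced
            break
        translated_words.append(next_word)
    return " ".join(translated_words)       # the translated sentence
```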
It should be noted that, in the embodiment of the present application, the translated sentence is obtained in a word-by-word translation manner; in another embodiment, steps 601 to 609 need not be executed, and the translated sentence corresponding to the target sentence can be obtained in other ways based on the translation model, the target sentence and the second associated sentence associated with the target sentence, where the target sentence and the second associated sentence both belong to the source language and the translated sentence belongs to the target language.
According to the method provided by the embodiment of the application, the target sentence is translated by using the second associated sentence associated with the target sentence based on the translation model, and the accuracy of the obtained translated sentence can be improved by combining the analysis capability of the translation model on the sentences with the associated relation. And moreover, a translation sentence corresponding to the target sentence is obtained by adopting a word-by-word translation mode, and the currently translated words are combined in the translation process to ensure that the translated words have relevance, so that the accuracy of the obtained translation sentence is ensured.
Fig. 7 is a schematic structural diagram of a sentence translating apparatus according to an embodiment of the present application, and as shown in fig. 7, the apparatus includes:
an obtaining module 701, configured to obtain a first prediction result based on the translation model, the first sample sentence, and the second sample sentence, where the first prediction result indicates a possibility of translating the first sample sentence into the second sample sentence based on the translation model, the first sample sentence and the second sample sentence have the same meaning, and the first sample sentence belongs to a source language and the second sample sentence belongs to a target language;
the obtaining module 701 is further configured to obtain a third sample sentence, a fourth sample sentence, and a first sample relationship tag, where the first sample relationship tag indicates whether the third sample sentence and the fourth sample sentence have an association relationship, and the third sample sentence and the fourth sample sentence both belong to a source language;
the obtaining module 701 is further configured to obtain a first prediction relationship label based on the translation model, the third sample statement and the fourth sample statement, where the first prediction relationship label indicates a prediction association relationship between the third sample statement and the fourth sample statement;
an adjusting module 702, configured to adjust the translation model based on the first prediction result, the first prediction relationship label, and the first sample relationship label.
In one possible implementation, as shown in fig. 8, the obtaining module 701 includes:
the splicing unit 7011 is configured to splice the third sample statement and the fourth sample statement to obtain a first spliced statement;
the encoding unit 7012 is configured to encode the first spliced sentence based on the translation model to obtain a first encoding feature, where the first encoding feature includes a plurality of first feature vectors, a first one of the first feature vectors corresponds to a start character located before the first spliced sentence, each of the first feature vectors except for the first one of the first feature vectors corresponds to a first word, the first word refers to a word in the first spliced sentence, and each of the first feature vectors is obtained by weighted fusion of a word vector of the start character and word vectors of the first words;
a classifying unit 7013, configured to classify the first feature vector to obtain a first prediction relationship label.
In another possible implementation, the first sample relationship label indicates whether the third sample statement and the fourth sample statement belong to the same group of dialogs; a classifying unit 7013, configured to classify the first feature vector based on the first classification model to obtain a first prediction probability, where the first prediction probability indicates a possibility that the third sample statement and the fourth sample statement belong to the same group of conversations;
as shown in fig. 8, the adjusting module 702 includes:
a first obtaining unit 7021, configured to obtain a first loss value based on the first prediction result;
a first obtaining unit 7021, further configured to obtain a second loss value based on the first prediction probability and the first sample relationship label;
a first adjusting unit 7022 is configured to adjust the translation model and the first classification model based on the first loss value and the second loss value.
In another possible implementation manner, the first obtaining unit 7021 is configured to, in response to that the first sample relationship label is a first positive sample relationship label, obtain a second loss value based on a first prediction probability, where the first positive sample relationship label indicates that the third sample statement and the fourth sample statement belong to the same group of dialogues, and the first prediction probability and the second loss value are in a negative correlation relationship; or, in response to the first sample relation label being the first negative sample relation label, obtaining a second loss value based on a difference between the target value and the first prediction probability, where the first negative sample relation label indicates that the third sample statement and the fourth sample statement do not belong to the same group of conversations, and a negative correlation exists between the difference and the second loss value.
In another possible implementation manner, the third sample statement and the fourth sample statement belong to the same group of conversations, and the first sample relationship tag indicates whether the third sample statement and the fourth sample statement are issued by the same interlocutor; a classifying unit 7013, configured to classify the first feature vector based on a second classification model to obtain a second prediction probability, where the second prediction probability indicates a possibility that the third sample statement and the fourth sample statement are issued by the same interlocutor;
as shown in fig. 8, the adjusting module 702 includes:
a first obtaining unit 7021, configured to obtain a first loss value based on the first prediction result;
the first obtaining unit 7021 is further configured to obtain a third loss value based on the second prediction probability and the first sample relationship label;
a first adjusting unit 7022 is configured to adjust the translation model and the second classification model based on the first loss value and the third loss value.
In another possible implementation manner, the first obtaining unit 7021 is configured to, in response to that the first sample relationship label is a second positive sample relationship label, obtain a third loss value based on a second prediction probability, where the second positive sample relationship label indicates that the third sample statement and the fourth sample statement are issued by the same interlocutor, and the second prediction probability and the third loss value form a negative correlation; or, in response to the first sample relation label being the second negative sample relation label, obtaining a third loss value based on a difference between the target value and the second prediction probability, where the second negative sample relation label indicates that the third sample statement and the fourth sample statement are not issued by the same interlocutor, and the difference and the third loss value are in a negative correlation relationship.
In another possible implementation manner, the encoding unit 7012 is configured to perform feature extraction on the first spliced statement based on a feature extraction sub-model in the translation model to obtain a second encoding feature, where the second encoding feature includes a plurality of word vectors, and the word vectors include a word vector of a start character and a word vector of a plurality of first words; for each word vector: based on a coding sub-model in the translation model, carrying out weighted fusion on a plurality of word vectors, and fusing the vector subjected to weighted fusion with the word vectors to obtain first feature vectors corresponding to the word vectors; and forming a first coding feature by using the obtained plurality of first feature vectors.
In another possible implementation manner, the obtaining module 701 is configured to encode the first sample statement and the second sample statement respectively based on the translation model to obtain a third encoding feature corresponding to the first sample statement and a fourth encoding feature corresponding to the second sample statement, where the fourth encoding feature includes a second feature vector corresponding to each second word, the second word refers to a word in the second sample statement, and each second feature vector is obtained by weighted fusion of the corresponding second word and a word vector of a previous second word; fusing the third coding features and the fourth coding features based on the translation model to obtain fusion features, wherein the fusion features comprise fusion feature vectors corresponding to each second word; based on the translation model and the fusion features, a third prediction probability corresponding to each second word is obtained, wherein the third prediction probability indicates the possibility of translating each fusion feature vector into the corresponding second word based on the translation model.
In another possible implementation manner, as shown in fig. 8, the apparatus further includes:
the obtaining module 701 is further configured to obtain a first sample data set, where the first sample data set includes first dialog data and second dialog data having the same meaning, the first dialog data belongs to a source language, the second dialog data belongs to a target language, the first dialog data and the second dialog data are both obtained by translating based on third dialog data, and the third dialog data is obtained by at least two interlocutors performing dialogues in the source language and the target language respectively;
a training module 703, configured to perform iterative training on the translation model again based on the first sample data set.
In another possible implementation, as shown in fig. 8, the training module 703 includes:
a second obtaining unit 7031, configured to obtain a fifth sample statement and a first associated statement associated with the fifth sample statement from the first dialogue data, and obtain a sixth sample statement having the same meaning as the fifth sample statement from the second dialogue data;
a determining unit 7032, configured to obtain a seventh sample statement and an eighth sample statement from at least one piece of first dialogue data, and determine a second sample relationship label, where the second sample relationship label indicates whether there is an association relationship between the seventh sample statement and the eighth sample statement;
a second obtaining unit 7031, configured to obtain a second prediction result based on the translation model, the fifth sample sentence, the first association sentence, and the sixth sample sentence, where the second prediction result indicates a possibility of translating the fifth sample sentence into the sixth sample sentence based on the translation model;
the second obtaining unit 7031 is further configured to obtain a second prediction relationship label based on the translation model, the seventh sample statement and the eighth sample statement, where the second prediction relationship label indicates a prediction association relationship between the seventh sample statement and the eighth sample statement;
a second adjusting unit 7033, configured to adjust the translation model based on the second prediction result, the second prediction relation label, and the second sample relation label (an illustrative sketch of this joint adjustment step is given below).
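The joint adjustment performed by the second adjusting unit 7033 may be understood through the following hedged sketch of a training step combining a translation loss with a relation-classification loss; translation_model, relation_classifier, their encode interface and the tensor shapes are placeholders assumed for illustration only and are not the applicant's code.

```python
import torch
import torch.nn.functional as F

def training_step(translation_model, relation_classifier, optimizer,
                  spliced_src_ids, tgt_ids, pair_ids, relation_label):
    # Second prediction result: per-token probabilities (here, logits) of the sixth
    # sample sentence given the fifth sample sentence spliced with its first
    # associated sentence; translation_model is assumed to return (B, T, vocab).
    token_logits = translation_model(spliced_src_ids, tgt_ids)
    translation_loss = F.cross_entropy(token_logits.transpose(1, 2), tgt_ids)

    # Second prediction relation label: classify the encoded seventh/eighth
    # statement pair; the start-character vector is used as the pair representation.
    pair_encoding = translation_model.encode(pair_ids)         # assumed (B, T, d_model)
    relation_logits = relation_classifier(pair_encoding[:, 0])
    relation_loss = F.cross_entropy(relation_logits, relation_label)  # (B,) long labels

    # Adjust the translation model (and classifier) with both losses.
    loss = translation_loss + relation_loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```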
In another possible implementation manner, the second obtaining unit 7031 is configured to splice the fifth sample statement and the first associated statement to obtain a second spliced statement; respectively coding the second spliced statement and the sixth sample statement based on the translation model to obtain a fifth coding feature corresponding to the second spliced statement and a sixth coding feature corresponding to the sixth sample statement, wherein the sixth coding feature comprises a third feature vector corresponding to each third word, the third words refer to words in the sixth sample statement, and each third feature vector is obtained by weighted fusion of the corresponding third word and word vectors of previous third words; fusing the fifth coding feature and the sixth coding feature based on the translation model to obtain a fusion feature, wherein the fusion feature comprises a fusion feature vector corresponding to each third word; based on the translation model and the fusion features, a fourth prediction probability corresponding to each third word is obtained, wherein the fourth prediction probability indicates the possibility of translating each fusion feature vector into the corresponding third word based on the translation model.
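The second spliced statement mentioned above may be built, for example, by concatenating the first associated statement and the fifth sample statement on the source side. The following sketch is illustrative only; the ordering and the special tokens BOS_ID and SEP_ID are assumptions, not a requirement of this application.

```python
BOS_ID, SEP_ID = 1, 3  # assumed start-character and separator token ids

def splice(associated_ids: list[int], fifth_ids: list[int]) -> list[int]:
    # Start character, then the associated (context) statement, a separator,
    # and finally the fifth sample statement to be translated.
    return [BOS_ID] + associated_ids + [SEP_ID] + fifth_ids

print(splice([17, 42, 5], [88, 9, 23, 4]))  # [1, 17, 42, 5, 3, 88, 9, 23, 4]
```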
In another possible implementation manner, as shown in fig. 8, the apparatus further includes:
the obtaining module 701 is further configured to obtain a second sample data set, where the second sample data set includes a ninth sample sentence and a tenth sample sentence having the same meaning, the ninth sample sentence belongs to the source language, and the tenth sample sentence belongs to the target language;
a training module 703, configured to perform iterative training on the translation model based on the second sample data set.
In another possible implementation manner, as shown in fig. 8, the apparatus further includes:
the obtaining module 701 is further configured to obtain, based on the translation model, the target sentence, and a second associated sentence associated with the target sentence, a translated sentence corresponding to the target sentence, where the target sentence and the second associated sentence both belong to the source language, and the translated sentence belongs to the target language.
In another possible implementation manner, as shown in fig. 8, the obtaining module 701 includes:
a coding unit 7012, configured to code the target sentence and the second associated sentence based on the translation model to obtain a seventh coding feature;
a third obtaining unit 7014, configured to obtain a first translation word based on the translation model and the seventh coding feature;
the third obtaining unit 7014 is further configured to obtain a next translation word based on the translation model, the seventh coding feature, and the first translation word, and repeat the above steps until a last translation word is obtained;
a forming unit 7015 is configured to form the obtained plurality of translation words into a translation sentence.
In another possible implementation manner, the third obtaining unit 7014 is configured to encode the start character based on the translation model to obtain an eighth encoding characteristic corresponding to the start character; fusing the seventh coding feature and the eighth coding feature based on the translation model to obtain a fused feature; and decoding the fusion characteristics based on the translation model to obtain a first translation word.
In another possible implementation manner, the third obtaining unit 7014 is configured to encode the starting character and the currently obtained translation word based on the translation model to obtain a ninth encoding characteristic; fusing the seventh coding feature and the ninth coding feature based on the translation model to obtain a fused feature; and decoding the fusion characteristics based on the translation model to obtain the next translation word.
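The inference flow carried out by the coding unit 7012, the third obtaining unit 7014 and the forming unit 7015 described above can be illustrated by the following greedy-decoding sketch; the encode/decode_step interface, BOS_ID, EOS_ID and MAX_LEN are assumptions made for illustration rather than the applicant's implementation.

```python
import torch

BOS_ID, EOS_ID, MAX_LEN = 1, 2, 128

@torch.no_grad()
def translate(model, target_sentence_ids, associated_sentence_ids):
    # Seventh coding feature: encode the target sentence together with its
    # second associated sentence (spliced on the source side).
    spliced = torch.cat([associated_sentence_ids, target_sentence_ids], dim=1)
    seventh_feat = model.encode(spliced)

    # Start from the start character and repeatedly obtain the next translation word.
    translated = [BOS_ID]
    for _ in range(MAX_LEN):
        prev = torch.tensor([translated])
        logits = model.decode_step(seventh_feat, prev)  # fuse with the coding feature and decode
        next_word = int(logits[0, -1].argmax())
        if next_word == EOS_ID:                         # last translation word reached
            break
        translated.append(next_word)
    return translated[1:]                               # the translation words form the translation sentence
```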
It should be noted that the sentence translation apparatus provided in the above embodiment is illustrated only by way of the division into the above functional modules. In practical applications, the functions may be allocated to different functional modules as needed; that is, the internal structure of the computer device may be divided into different functional modules to complete all or part of the functions described above. In addition, the sentence translation apparatus and the sentence translation method provided by the above embodiments belong to the same concept, and the specific implementation process thereof is detailed in the method embodiments and is not described herein again.
The embodiment of the present application further provides a computer device, where the computer device includes a processor and a memory, and the memory stores at least one computer program, and the at least one computer program is loaded and executed by the processor to implement the operations performed in the sentence translation method according to the above embodiment.
Optionally, the computer device is provided as a terminal. Fig. 9 shows a block diagram of a terminal 900 according to an exemplary embodiment of the present application. The terminal 900 may be a portable mobile terminal such as a smart phone, a tablet computer, an MP3 (Moving Picture Experts Group Audio Layer III) player, an MP4 (Moving Picture Experts Group Audio Layer IV) player, a notebook computer, or a desktop computer. The terminal 900 may also be referred to by other names such as user equipment, portable terminal, laptop terminal, or desktop terminal.
The terminal 900 includes: a processor 901 and a memory 902.
Processor 901 may include one or more processing cores, such as a 4-core processor or an 8-core processor. The processor 901 may be implemented in at least one hardware form of a DSP (Digital Signal Processing), an FPGA (Field-Programmable Gate Array), or a PLA (Programmable Logic Array). The processor 901 may also include a main processor and a coprocessor, where the main processor is a processor for processing data in an awake state, also called a Central Processing Unit (CPU), and the coprocessor is a low-power processor for processing data in a standby state. In some embodiments, the processor 901 may be integrated with a GPU (Graphics Processing Unit), which is responsible for rendering and drawing the content to be displayed on the display screen. In some embodiments, the processor 901 may further include an AI (Artificial Intelligence) processor for processing computing operations related to machine learning.
Memory 902 may include one or more computer-readable storage media, which may be non-transitory. The memory 902 may also include high-speed random access memory, as well as non-volatile memory, such as one or more magnetic disk storage devices, flash memory storage devices. In some embodiments, a non-transitory computer readable storage medium in the memory 902 is used to store at least one computer program for execution by the processor 901 to implement the sentence translation methods provided by the method embodiments herein.
In some embodiments, terminal 900 can also optionally include: a peripheral interface 903 and at least one peripheral. The processor 901, memory 902, and peripheral interface 903 may be connected by buses or signal lines. Various peripheral devices may be connected to the peripheral interface 903 via a bus, signal line, or circuit board. Specifically, the peripheral device includes: at least one of a radio frequency circuit 904, a display screen 905, a camera assembly 906, an audio circuit 907, a positioning assembly 908, and a power supply 909.
The peripheral interface 903 may be used to connect at least one peripheral related to I/O (Input/Output) to the processor 901 and the memory 902. In some embodiments, the processor 901, memory 902, and peripheral interface 903 are integrated on the same chip or circuit board; in some other embodiments, any one or two of the processor 901, the memory 902 and the peripheral interface 903 may be implemented on a separate chip or circuit board, which is not limited by this embodiment.
The Radio Frequency circuit 904 is used for receiving and transmitting RF (Radio Frequency) signals, also called electromagnetic signals. The radio frequency circuitry 904 communicates with communication networks and other communication devices via electromagnetic signals. The radio frequency circuit 904 converts an electrical signal into an electromagnetic signal to transmit, or converts a received electromagnetic signal into an electrical signal. Optionally, the radio frequency circuit 904 comprises: an antenna system, an RF transceiver, one or more amplifiers, a tuner, an oscillator, a digital signal processor, a codec chipset, a subscriber identity module card, and so forth. The radio frequency circuit 904 may communicate with other terminals via at least one wireless communication protocol. The wireless communication protocols include, but are not limited to: the world wide web, metropolitan area networks, intranets, generations of mobile communication networks (2G, 3G, 4G, and 5G), Wireless local area networks, and/or WiFi (Wireless Fidelity) networks. In some embodiments, the radio frequency circuit 904 may also include NFC (Near Field Communication) related circuits, which are not limited in this application.
The display screen 905 is used to display a UI (User Interface). The UI may include graphics, text, icons, video, and any combination thereof. When the display screen 905 is a touch display screen, the display screen 905 also has the ability to capture touch signals on or over the surface of the display screen 905. The touch signal may be input to the processor 901 as a control signal for processing. At this point, the display screen 905 may also be used to provide virtual buttons and/or a virtual keyboard, also referred to as soft buttons and/or a soft keyboard. In some embodiments, there may be one display screen 905, disposed on the front panel of the terminal 900; in other embodiments, there may be at least two display screens 905, each disposed on a different surface of the terminal 900 or in a foldable design; in other embodiments, the display screen 905 may be a flexible display disposed on a curved surface or a folded surface of the terminal 900. Further, the display screen 905 may even be arranged in a non-rectangular irregular figure, i.e., a shaped screen. The display screen 905 may be made of materials such as an LCD (Liquid Crystal Display) or an OLED (Organic Light-Emitting Diode).
The camera assembly 906 is used to capture images or video. Optionally, camera assembly 906 includes a front camera and a rear camera. The front camera is arranged on the front panel of the terminal, and the rear camera is arranged on the back of the terminal. In some embodiments, the number of the rear cameras is at least two, and each rear camera is any one of a main camera, a depth-of-field camera, a wide-angle camera and a telephoto camera, so that the main camera and the depth-of-field camera are fused to realize a background blurring function, and the main camera and the wide-angle camera are fused to realize panoramic shooting and VR (Virtual Reality) shooting functions or other fusion shooting functions. In some embodiments, camera assembly 906 may also include a flash. The flash lamp can be a monochrome temperature flash lamp or a bicolor temperature flash lamp. The double-color-temperature flash lamp is a combination of a warm-light flash lamp and a cold-light flash lamp, and can be used for light compensation at different color temperatures.
Audio circuit 907 may include a microphone and a speaker. The microphone is used for collecting sound waves of a user and the environment, converting the sound waves into electric signals, and inputting the electric signals to the processor 901 for processing, or inputting the electric signals to the radio frequency circuit 904 for realizing voice communication. For stereo sound acquisition or noise reduction purposes, the microphones may be multiple and disposed at different locations of the terminal 900. The microphone may also be an array microphone or an omni-directional pick-up microphone. The speaker is used to convert electrical signals from the processor 901 or the radio frequency circuit 904 into sound waves. The loudspeaker can be a traditional film loudspeaker or a piezoelectric ceramic loudspeaker. When the speaker is a piezoelectric ceramic speaker, the speaker can be used for purposes such as converting an electric signal into a sound wave audible to a human being, or converting an electric signal into a sound wave inaudible to a human being to measure a distance. In some embodiments, audio circuit 907 may also include a headphone jack.
The positioning component 908 is used to locate the current geographic location of the terminal 900 for navigation or LBS (Location Based Service). The positioning component 908 may be a positioning component based on the Global Positioning System (GPS) of the United States, the BeiDou system of China, or the Galileo system of the European Union.
Power supply 909 is used to provide power to the various components in terminal 900. The power source 909 may be alternating current, direct current, disposable or rechargeable. When the power source 909 includes a rechargeable battery, the rechargeable battery may be a wired rechargeable battery or a wireless rechargeable battery. The wired rechargeable battery is a battery charged through a wired line, and the wireless rechargeable battery is a battery charged through a wireless coil. The rechargeable battery may also be used to support fast charge technology.
In some embodiments, terminal 900 can also include one or more sensors 910. The one or more sensors 910 include, but are not limited to: acceleration sensor 911, gyro sensor 912, pressure sensor 913, fingerprint sensor 914, optical sensor 915, and proximity sensor 916.
The acceleration sensor 911 can detect the magnitude of acceleration in three coordinate axes of the coordinate system established with the terminal 900. For example, the acceleration sensor 911 may be used to detect the components of the gravitational acceleration in three coordinate axes. The processor 901 can control the display screen 905 to display the user interface in a landscape view or a portrait view according to the gravitational acceleration signal collected by the acceleration sensor 911. The acceleration sensor 911 may also be used for acquisition of motion data of a game or a user.
The gyro sensor 912 may detect a body direction and a rotation angle of the terminal 900, and the gyro sensor 912 may cooperate with the acceleration sensor 911 to acquire a 3D motion of the user on the terminal 900. The processor 901 can implement the following functions according to the data collected by the gyro sensor 912: motion sensing (such as changing the UI according to a user's tilting operation), image stabilization at the time of photographing, game control, and inertial navigation.
The pressure sensor 913 may be disposed on a side bezel of the terminal 900 and/or underneath the display 905. When the pressure sensor 913 is disposed on the side frame of the terminal 900, the user's holding signal of the terminal 900 may be detected, and the processor 901 performs left-right hand recognition or shortcut operation according to the holding signal collected by the pressure sensor 913. When the pressure sensor 913 is disposed at a lower layer of the display screen 905, the processor 901 controls the operability control on the UI interface according to the pressure operation of the user on the display screen 905. The operability control comprises at least one of a button control, a scroll bar control, an icon control and a menu control.
The fingerprint sensor 914 is used for collecting a fingerprint of the user, and the processor 901 identifies the user according to the fingerprint collected by the fingerprint sensor 914, or the fingerprint sensor 914 identifies the user according to the collected fingerprint. Upon recognizing that the user's identity is a trusted identity, processor 901 authorizes the user to perform relevant sensitive operations including unlocking the screen, viewing encrypted information, downloading software, paying, and changing settings, etc. The fingerprint sensor 914 may be disposed on the front, back, or side of the terminal 900. When a physical key or vendor Logo is provided on the terminal 900, the fingerprint sensor 914 may be integrated with the physical key or vendor Logo.
The optical sensor 915 is used to collect ambient light intensity. In one embodiment, the processor 901 may control the display brightness of the display screen 905 based on the ambient light intensity collected by the optical sensor 915. Specifically, when the ambient light intensity is high, the display brightness of the display screen 905 is increased; when the ambient light intensity is low, the display brightness of the display screen 905 is reduced. In another embodiment, the processor 901 can also dynamically adjust the shooting parameters of the camera assembly 906 according to the ambient light intensity collected by the optical sensor 915.
A proximity sensor 916, also referred to as a distance sensor, is provided on the front panel of the terminal 900. The proximity sensor 916 is used to collect the distance between the user and the front face of the terminal 900. In one embodiment, when the proximity sensor 916 detects that the distance between the user and the front face of the terminal 900 gradually decreases, the processor 901 controls the display screen 905 to switch from the bright screen state to the dark screen state; when the proximity sensor 916 detects that the distance between the user and the front face of the terminal 900 gradually increases, the processor 901 controls the display screen 905 to switch from the dark screen state to the bright screen state.
Those skilled in the art will appreciate that the configuration shown in fig. 9 does not constitute a limitation of terminal 900, and may include more or fewer components than those shown, or may combine certain components, or may employ a different arrangement of components.
Optionally, the computer device is provided as a server. Fig. 10 is a schematic structural diagram of a server according to an embodiment of the present application. The server 1000 may vary considerably in configuration or performance, and may include one or more processors (CPUs) 1001 and one or more memories 1002, where the memory 1002 stores at least one computer program, and the at least one computer program is loaded and executed by the processors 1001 to implement the methods provided by the foregoing method embodiments. Of course, the server may also have components such as a wired or wireless network interface, a keyboard, and an input/output interface for input and output, and the server may also include other components for implementing the functions of the device, which are not described herein again.
The embodiment of the present application further provides a computer-readable storage medium, in which at least one computer program is stored, and the at least one computer program is loaded and executed by a processor to implement the operations performed in the sentence translation method of the foregoing embodiment.
Embodiments of the present application also provide a computer program product or a computer program comprising computer program code stored in a computer-readable storage medium. The processor of the computer device reads the computer program code from the computer-readable storage medium and executes the computer program code, so that the computer device performs the operations performed in the sentence translation method according to the above embodiment.
It will be understood by those skilled in the art that all or part of the steps for implementing the above embodiments may be implemented by hardware, or may be implemented by a program instructing relevant hardware, where the program may be stored in a computer-readable storage medium, and the above-mentioned storage medium may be a read-only memory, a magnetic disk or an optical disk, etc.
The above description is only an alternative embodiment of the present application and should not be construed as limiting the present application, and any modification, equivalent replacement, or improvement made within the spirit and principle of the present application should be included in the protection scope of the present application.

Claims (15)

1. A sentence translation method, the method comprising:
obtaining a first prediction result based on a translation model, a first sample sentence, and a second sample sentence, the first prediction result indicating a possibility of translating the first sample sentence into the second sample sentence based on the translation model, the first sample sentence having the same meaning as the second sample sentence, and the first sample sentence belonging to a source language, the second sample sentence belonging to a target language;
obtaining a third sample sentence, a fourth sample sentence and a first sample relation label, wherein the first sample relation label indicates whether the third sample sentence and the fourth sample sentence have an association relation, and the third sample sentence and the fourth sample sentence both belong to the source language;
obtaining a first prediction relationship label based on the translation model, the third sample statement and the fourth sample statement, the first prediction relationship label indicating a prediction association between the third sample statement and the fourth sample statement;
adjusting the translation model based on the first prediction result, the first prediction relationship label, and the first sample relationship label.
2. The method of claim 1, wherein the obtaining a first prediction relationship label based on the translation model, the third sample statement, and the fourth sample statement comprises:
splicing the third sample statement and the fourth sample statement to obtain a first spliced statement;
based on the translation model, encoding the first spliced sentence to obtain first encoding features, wherein the first encoding features comprise a plurality of first feature vectors, a first one of the first feature vectors corresponds to a starting character located before the first spliced sentence, each first feature vector except the first one of the first feature vectors corresponds to a first word, the first word refers to a word in the first spliced sentence, and each first feature vector is obtained by weighted fusion of a word vector of the starting character and word vectors of a plurality of the first words;
and classifying the first feature vector to obtain the first prediction relation label.
3. The method of claim 2, wherein the first sample relationship label indicates whether the third sample statement and the fourth sample statement belong to the same group of dialogs; the classifying the first feature vector to obtain the first prediction relationship label includes:
classifying a first one of the first feature vectors based on a first classification model to obtain a first prediction probability, wherein the first prediction probability indicates a possibility that the third sample sentence and the fourth sample sentence belong to the same group of conversations;
the adjusting the translation model based on the first prediction result, the first prediction relationship label, and the first sample relationship label includes:
acquiring a first loss value based on the first prediction result;
acquiring a second loss value based on the first prediction probability and the first sample relation label;
adjusting the translation model and the first classification model based on the first penalty value and the second penalty value.
4. The method of claim 2, wherein the third sample statement and the fourth sample statement belong to the same group of dialogs, and wherein the first sample relationship label indicates whether the third sample statement and the fourth sample statement are issued by the same interlocutor; the classifying the first feature vector to obtain the first prediction relationship label includes:
classifying the first feature vector based on a second classification model to obtain a second prediction probability, wherein the second prediction probability indicates the possibility that the third sample statement and the fourth sample statement are sent by the same interlocutor;
the adjusting the translation model based on the first prediction result, the first prediction relationship label, and the first sample relationship label includes:
acquiring a first loss value based on the first prediction result;
acquiring a third loss value based on the second prediction probability and the first sample relation label;
adjusting the translation model and the second classification model based on the first penalty value and the third penalty value.
5. The method of claim 2, wherein said encoding the first spliced sentence based on the translation model to obtain a first encoding characteristic comprises:
extracting features of the first spliced statement based on a feature extraction submodel in the translation model to obtain second coding features, wherein the second coding features comprise a plurality of word vectors, and the word vectors comprise the word vector of the initial character and the word vectors of the first words;
for each of the word vectors: based on a coding sub-model in the translation model, carrying out weighted fusion on a plurality of word vectors, and fusing the vector subjected to weighted fusion with the word vectors to obtain first feature vectors corresponding to the word vectors;
and forming the first coding feature by using the obtained plurality of first feature vectors.
6. The method of claim 1, wherein obtaining the first prediction result based on the translation model, the first sample statement, and the second sample statement comprises:
respectively coding the first sample sentence and the second sample sentence based on the translation model to obtain a third coding feature corresponding to the first sample sentence and a fourth coding feature corresponding to the second sample sentence, wherein the fourth coding feature comprises a second feature vector corresponding to each second word, the second words refer to words in the second sample sentence, and each second feature vector is obtained by weighted fusion of the corresponding second word and word vectors of previous second words;
fusing the third coding features and the fourth coding features based on the translation model to obtain fusion features, wherein the fusion features comprise fusion feature vectors corresponding to each second word;
based on the translation model and the fusion features, obtaining a third prediction probability corresponding to each second word, wherein the third prediction probability indicates the possibility of translating each fusion feature vector into the corresponding second word based on the translation model.
7. The method of claim 1, wherein after adjusting the translation model based on the first prediction result, the first prediction relationship label, and the first sample relationship label, the method further comprises:
obtaining a first sample data set, wherein the first sample data set comprises first dialogue data and second dialogue data with the same meaning, the first dialogue data belongs to the source language, the second dialogue data belongs to the target language, the first dialogue data and the second dialogue data are obtained through translation based on third dialogue data, and the third dialogue data are obtained through dialogue of at least two interlocutors in the source language and the target language respectively;
iteratively training the translation model again based on the first sample dataset.
8. The method of claim 7, wherein iteratively training the translation model again based on the first sample dataset comprises:
acquiring a fifth sample sentence and a first associated sentence associated with the fifth sample sentence from the first dialogue data, and acquiring a sixth sample sentence having the same meaning as the fifth sample sentence from the second dialogue data;
obtaining a seventh sample statement and an eighth sample statement from at least one piece of the first dialogue data, and determining a second sample relation label, wherein the second sample relation label indicates whether the seventh sample statement and the eighth sample statement have an association relation or not;
obtaining a second prediction result based on the translation model, the fifth sample sentence, the first associated sentence, and the sixth sample sentence, the second prediction result indicating a likelihood of translating the fifth sample sentence into the sixth sample sentence based on the translation model;
obtaining a second prediction relationship label based on the translation model, the seventh sample statement and the eighth sample statement, the second prediction relationship label indicating a prediction association between the seventh sample statement and the eighth sample statement;
adjusting the translation model based on the second prediction result, the second prediction relationship label, and the second sample relationship label.
9. The method of claim 8, wherein obtaining a second prediction result based on the translation model, the fifth sample statement, the first associated statement, and the sixth sample statement comprises:
splicing the fifth sample statement and the first associated statement to obtain a second spliced statement;
respectively coding the second spliced statement and the sixth sample statement based on the translation model to obtain a fifth coding feature corresponding to the second spliced statement and a sixth coding feature corresponding to the sixth sample statement, wherein the sixth coding feature comprises a third feature vector corresponding to each third word, the third words refer to words in the sixth sample statement, and each third feature vector is obtained by weighted fusion of the corresponding third word and word vectors of previous third words;
fusing the fifth coding feature and the sixth coding feature based on the translation model to obtain a fusion feature, wherein the fusion feature comprises a fusion feature vector corresponding to each third word;
based on the translation model and the fused features, obtaining a fourth prediction probability corresponding to each third word, wherein the fourth prediction probability indicates the possibility of translating each fused feature vector into the corresponding third word based on the translation model.
10. The method of claim 1, wherein prior to adjusting the translation model based on the first prediction result, the first prediction relationship label, and the first sample relationship label, the method further comprises:
acquiring a second sample data set, wherein the second sample data set comprises a ninth sample sentence and a tenth sample sentence which have the same meaning, the ninth sample sentence belongs to a source language, and the tenth sample sentence belongs to a target language;
iteratively training the translation model based on the second set of sample data.
11. The method of any of claims 1-10, wherein after adjusting the translation model based on the first prediction result, the first prediction relationship label, and the first sample relationship label, the method further comprises:
and acquiring a translated sentence corresponding to the target sentence based on the translation model, the target sentence and a second associated sentence associated with the target sentence, wherein the target sentence and the second associated sentence both belong to the source language, and the translated sentence belongs to the target language.
12. The method of claim 11, wherein obtaining the translation statement corresponding to the target statement based on the translation model, the target statement, and a second related statement related to the target statement comprises:
coding the target statement and the second associated statement based on the translation model to obtain a seventh coding feature;
acquiring a first translation word based on the translation model and the seventh coding feature;
obtaining the next translation word based on the translation model, the seventh coding feature and the first translation word, and repeating the steps until the last translation word is obtained;
and forming the translation sentence by the obtained translation words.
13. A sentence translation apparatus, the apparatus comprising:
an obtaining module configured to obtain a first prediction result based on a translation model, a first sample sentence, and a second sample sentence, the first prediction result indicating a possibility of translating the first sample sentence into the second sample sentence based on the translation model, the first sample sentence having the same meaning as the second sample sentence, and the first sample sentence belonging to a source language, the second sample sentence belonging to a target language;
the obtaining module is further configured to obtain a third sample sentence, a fourth sample sentence, and a first sample relationship tag, where the first sample relationship tag indicates whether there is an association relationship between the third sample sentence and the fourth sample sentence, and the third sample sentence and the fourth sample sentence both belong to the source language;
the obtaining module is further configured to obtain a first prediction relationship label based on the translation model, the third sample statement, and the fourth sample statement, where the first prediction relationship label indicates a prediction association relationship between the third sample statement and the fourth sample statement;
an adjusting module, configured to adjust the translation model based on the first prediction result, the first prediction relationship label, and the first sample relationship label.
14. A computer device comprising a processor and a memory, wherein at least one computer program is stored in the memory, and wherein the at least one computer program is loaded and executed by the processor to perform the operations performed in the sentence translation method of any of claims 1 to 12.
15. A computer-readable storage medium, having at least one computer program stored therein, the at least one computer program being loaded and executed by a processor to perform the operations performed in the sentence translation method of any of claims 1 to 12.
CN202110801955.7A 2021-07-15 2021-07-15 Statement translation method and device, computer equipment and storage medium Pending CN113822084A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110801955.7A CN113822084A (en) 2021-07-15 2021-07-15 Statement translation method and device, computer equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110801955.7A CN113822084A (en) 2021-07-15 2021-07-15 Statement translation method and device, computer equipment and storage medium

Publications (1)

Publication Number Publication Date
CN113822084A 2021-12-21

Family

ID=78912652

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110801955.7A Pending CN113822084A (en) 2021-07-15 2021-07-15 Statement translation method and device, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN113822084A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114896993A (en) * 2022-05-06 2022-08-12 北京百度网讯科技有限公司 Translation model generation method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination