CN116956950A - Machine translation method, apparatus, device, medium, and program product

Info

Publication number
CN116956950A
CN116956950A (application CN202310304742.2A)
Authority
CN
China
Prior art keywords: sentence, translation, sentences, natural language, sample
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310304742.2A
Other languages
Chinese (zh)
Inventor
刘乐茂
郝宏坤
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN202310304742.2A
Publication of CN116956950A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/40 Processing or translation of natural language
    • G06F40/58 Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/40 Processing or translation of natural language
    • G06F40/42 Data-driven translation
    • G06F40/49 Data-driven translation using very large corpora, e.g. the web
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Machine Translation (AREA)

Abstract

The application discloses a machine translation method, a device, equipment, a medium and a program product, and relates to the technical field of machine translation. The method comprises the following steps: respectively extracting the characteristics of the first sentence and the ith second sentence to obtain a first characteristic representation corresponding to the first sentence and a second characteristic representation corresponding to the ith second sentence; performing joint decoding on the first feature representation and a second feature representation corresponding to an ith second sentence to obtain an ith candidate translation result corresponding to the first sentence; and carrying out fusion analysis on candidate translation results corresponding to the plurality of second sentences to obtain target translation sentences corresponding to the first sentences. By fusing the translation results, the influence of the TM-enhanced machine translation model variance on the quality of the translation results is reduced, and the influence of the deviation possibly generated by the machine translation model on the quality of the translation results is reduced, so that the quality of translation sentences obtained by translation of the machine translation model is improved.

Description

Machine translation method, apparatus, device, medium, and program product
Technical Field
Embodiments of the present application relate to the field of machine translation technologies, and in particular, to a machine translation method, apparatus, device, medium, and program product.
Background
Translation memory (Translation Memory, TM) refers to sentences retrieved from training data or an external database that are similar to the current sentence to be translated. In recent years, translation-memory-enhanced neural machine translation (Neural Machine Translation, NMT) methods have received a great deal of attention in the field of translation research since they were proposed.
In the related art, the TM-enhanced NMT model mainly works in two steps: the first step is retrieval, in which similar sentence pairs are retrieved from the TM according to the current source-language sentence to be translated; the second step is generation, in which a neural network model produces the target-language sentence from the source-language sentence and the retrieved sentence pairs.
However, in scenarios where the training set is low-resource, the TM-enhanced NMT model in the related art performs poorly, sometimes even worse than the normal NMT model.
Disclosure of Invention
The embodiment of the application provides a machine translation method, a device, equipment, a medium and a program product, which can improve the quality of a translation sentence obtained by machine translation, and the technical scheme is as follows:
in one aspect, a machine translation method is provided, the method comprising:
acquiring a first sentence of a first natural language;
Acquiring a plurality of second sentences conforming to a first semantic association relation with the first sentences from a first translation memory library, wherein the plurality of second sentences are sentences in a second natural language, and the translation memory library comprises sentences in the second natural language;
respectively extracting features of the first sentence and the ith second sentence to obtain a first feature representation corresponding to the first sentence and a second feature representation corresponding to the ith second sentence, wherein i is a positive integer;
performing joint decoding on the first characteristic representation and a second characteristic representation corresponding to an ith second sentence to obtain an ith candidate translation result corresponding to the first sentence, wherein the candidate translation result comprises the probability that a word in a dictionary belongs to a translation of a second natural language corresponding to the first sentence;
and carrying out fusion analysis on candidate translation results corresponding to the plurality of second sentences to obtain target translation sentences corresponding to the first sentences, wherein the target translation sentences are translations of the second natural language corresponding to the first sentences.
In another aspect, there is provided a data translation apparatus, the apparatus comprising:
the acquisition module is used for acquiring a first sentence of a first natural language;
The obtaining module is further configured to obtain a plurality of second sentences conforming to the first semantic association relationship with the first sentence from a first translation memory, where the plurality of second sentences are sentences in a second natural language, and the translation memory includes sentences in the second natural language;
the extraction module is used for extracting the characteristics of the first sentence and the ith second sentence respectively to obtain a first characteristic representation corresponding to the first sentence and a second characteristic representation corresponding to the ith second sentence, wherein i is a positive integer;
the decoding module is used for carrying out joint decoding on the first characteristic representation and the second characteristic representation corresponding to the ith second sentence to obtain an ith candidate translation result corresponding to the first sentence, wherein the candidate translation result comprises the probability that the words in the dictionary belong to the translation of the second natural language corresponding to the first sentence;
and the analysis module is used for carrying out fusion analysis on candidate translation results corresponding to the plurality of second sentences respectively to obtain target translation sentences corresponding to the first sentences, wherein the target translation sentences are translations of the second natural language corresponding to the first sentences.
In another aspect, a computer device is provided, the computer device comprising a processor and a memory having stored therein at least one instruction, at least one program, a set of codes, or a set of instructions, the at least one instruction, the at least one program, the set of codes, or the set of instructions being loaded and executed by the processor to implement the machine translation method as in any of the above embodiments.
In another aspect, a computer readable storage medium having stored therein at least one instruction, at least one program, code set, or instruction set loaded and executed by a processor to implement a machine translation method as described in any of the above embodiments is provided.
In another aspect, a computer program product or computer program is provided, the computer program product or computer program comprising computer instructions stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions to cause the computer device to perform the machine translation method according to any of the above embodiments.
The technical scheme provided by the embodiment of the application has the beneficial effects that at least:
and carrying out joint coding on different pairs of source language-translation memory (namely second sentence) through recalling a plurality of translation memories of the source sentence to obtain different translation results, and finally fusing all the translation results to obtain a target translation sentence of the source language. By fusing the translation results, the influence of the TM-enhanced machine translation model variance on the quality of the translation results is reduced, and the influence of the deviation possibly generated by the machine translation model on the quality of the translation results is reduced, so that the quality of translation sentences obtained by translation of the machine translation model is improved.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings required for the description of the embodiments will be briefly described below, and it is apparent that the drawings in the following description are only some embodiments of the present application, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow diagram of a machine translation process provided by an exemplary embodiment of the present application;
FIG. 2 is a schematic diagram of an implementation environment provided by an exemplary embodiment of the present application;
FIG. 3 is a flow chart of a machine translation method provided by an exemplary embodiment of the present application;
FIG. 4 is a flow chart of a machine translation method provided by another exemplary embodiment of the present application;
FIG. 5 is a flow chart of a machine translation method provided by an exemplary embodiment of the present application;
FIG. 6 is a flow chart of a machine translation method provided by yet another exemplary embodiment of the present application;
FIG. 7 is a flow chart of a machine translation method provided by yet another exemplary embodiment of the present application;
FIG. 8 is a schematic diagram of BLEU results on four JRC-Acquis translation tasks for different methods in a plug-and-play scenario provided by an exemplary embodiment of the present application;
FIG. 9 is a schematic diagram of BLEU results on a multi-domain data set for different methods in a low resource scenario provided by an exemplary embodiment of the present application;
FIG. 10 is a block diagram of a machine translation device according to an exemplary embodiment of the present application;
FIG. 11 is a block diagram of a machine translation device according to another exemplary embodiment of the present application;
FIG. 12 is a block diagram of a computer device according to an exemplary embodiment of the present application.
Detailed Description
For the purpose of promoting an understanding of the principles and advantages of the application, reference will now be made in detail to the embodiments of the application, some but not all of which are illustrated in the accompanying drawings. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
The terms "first," "second," and the like in this disclosure are used for distinguishing between similar elements or items having substantially the same function and function, and it should be understood that there is no logical or chronological dependency between the terms "first," "second," and no limitation on the amount or order of execution.
First, a brief description will be given of terms involved in the embodiments of the present application.
Neural machine translation (Neural Machine Translation, NMT): refers to a translation mode in which a sentence in one natural language is translated into a sentence in another natural language through a trained machine translation model. Schematically, the machine translation model is trained through a large number of translation corpus samples; the translation corpus samples comprise a plurality of groups of first natural language corpus and second natural language corpus, a correspondence exists between the first natural language corpus and the second natural language corpus, and each first natural language corpus corresponds to at least one second natural language corpus as a translation result. After training is completed, a user inputs a source sentence in the first natural language into the machine translation model, and the model outputs a target sentence in the second natural language. Schematically, when a Chinese sentence is translated into an English sentence, the Chinese sentence is the source sentence and the English sentence is the target sentence.
Translation memory (Translation Memory, TM) library: a database of translations of natural language sentences. The translation memory may store bilingual sentence pairs having an inter-translation relationship, or may store only bilingual sentences corresponding to translations.
Bilingual evaluation understudy (Bilingual Evaluation Understudy, BLEU): an evaluation index of the machine translation model; the higher the BLEU value, the better the translation effect of the machine translation model.
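Illustratively, a minimal Python sketch of a sentence-level BLEU computation (not part of the patent; it assumes the NLTK library and uses made-up example sentences):

    from nltk.translate.bleu_score import sentence_bleu

    reference = [["I", "have", "hiked", "through", "this", "city"]]   # gold translation(s), tokenized
    hypothesis = ["I", "walked", "through", "this", "city"]           # model output, tokenized

    # Bigram BLEU for this short example; a higher value means closer n-gram overlap
    # with the reference, i.e. a better translation effect.
    score = sentence_bleu(reference, hypothesis, weights=(0.5, 0.5))
    print(f"sentence-level BLEU: {score:.4f}")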
Transformer model: a model comprising an encoder and a decoder, in which the structures of the encoding layers and the decoding layers are similar or identical. The encoder typically comprises multiple layers of attention mechanism units and feedforward neural network units, such as Self-Attention units and Multi-head Attention units; the decoder typically includes a self-attention mechanism and a Cross-attention mechanism. The Transformer model typically splits the input of the attention mechanism units into N "heads" (N being the number of multi-head attention mechanism units) according to the model dimension, and then performs the attention computation on each "head" separately. The machine translation model according to the embodiments of the present application is constructed based on a Transformer model.
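Illustratively, a minimal NumPy sketch of the multi-head self-attention computation described above (not part of the patent; a real Transformer additionally applies learned Q/K/V and output projections per head):

    import numpy as np

    def scaled_dot_product_attention(q, k, v):
        # q, k, v: (seq_len, d_head); softmax over the key dimension.
        scores = q @ k.T / np.sqrt(q.shape[-1])
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)
        return weights @ v

    def multi_head_self_attention(x, num_heads):
        # Split the model dimension into num_heads "heads", attend per head, then concatenate.
        seq_len, d_model = x.shape
        d_head = d_model // num_heads
        heads = []
        for h in range(num_heads):
            part = x[:, h * d_head:(h + 1) * d_head]
            heads.append(scaled_dot_product_attention(part, part, part))
        return np.concatenate(heads, axis=-1)

    x = np.random.randn(6, 8)                                   # 6 tokens, model dimension 8
    print(multi_head_self_attention(x, num_heads=2).shape)      # (6, 8)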
High-resource scenario: a machine translation task scenario in which bilingual sentence pairs in the training set are abundant.
Low-resource scenario: a machine translation task scenario in which bilingual sentence pairs in the training set are scarce.
TM enhanced NMT methods have received extensive attention in the field of translation research since being proposed. Translation memory is a sentence similar to the current source sentence to be translated that is retrieved from training data or an external database, which can provide a certain knowledge for the translation process of the source sentence. Many recent studies have proposed methods for enhancing neural machine translation models by using translation memories.
In the related art, the TM-enhanced NMT model is mainly divided into two steps: the first step is searching, namely searching similar sentence pairs back in TM according to the current source language sentence to be translated; and the second step is to obtain sentences in the target language by using a neural network model according to the sentence pairs in the source language and the retrieved sentences. However, compared with the common NMT model without using translation memory, the TM enhanced NMT model achieves larger translation performance improvement in a conventional high-resource scene; in scenarios where the training set is low-resource, using the TM-enhanced NMT model performs poorly, even worse than the translation of the normal NMT model.
Taking the JRC-Acquis training data set as an example: for the same translation task, the high-resource scenario trains the models with the full training set, while the low-resource scenario trains the models with a randomly selected quarter of the training set data. The performance difference between the TM-enhanced NMT model and the normal NMT model, compared by BLEU values, is shown in Table 1 below:
TABLE 1
Model      High-resource scenario    Low-resource scenario
NMT        60.83                     54.54
TM-NMT     63.76                     53.92
From the data shown in table 1, it can be seen that: in a high resource scene, the BLEU value of the NMT model is smaller than that of the TM enhanced NMT model, namely the translation performance of the TM enhanced NMT model is better; in a low resource scenario, the BLEU value of the NMT model is greater than that of the TM enhanced NMT model, i.e. the translation performance of the TM enhanced NMT model is poor.
The reason why the TM enhanced NMT model performs poorly compared to the NMT model for low resource scenarios is because the TM enhanced NMT model has a larger variance and smaller bias than the NMT model, which means that the TM enhanced NMT model is good at fitting the training set data and is more sensitive to minor fluctuations of the training set. Therefore, in order to balance the variance and deviation of the TM enhanced NMT model, the embodiment of the application considers the influence of different translation memories on the translation process of the current source sentence during translation, thereby improving the expression capacity of the TM enhanced NMT model, and finally improving the translation performance of the model, namely improving the quality of the translation sentence obtained by machine translation.
Referring to fig. 1, a schematic flow chart of a machine translation process is shown, the flow chart includes the following steps:
1. A plurality of translation memory recalls.
As shown in FIG. 1, a source sentence X to be translated is input into a translation memory library 110, and a plurality of translation memories TM are retrieved, which can be implemented as bilingual sentence pairs (X^TM, Y^TM). For example: the source sentence is "I walk through this city", and the retrieved bilingual sentence pairs include (X_1^TM, Y_1^TM): (I walk through this city - I have hiked through this city) and (X_2^TM, Y_2^TM): (I walk around this city - I explored the city on foot).
2. Decoding based on different source sentence-translation memory combinations.
After obtaining a plurality of different translation memories (X_1^TM, Y_1^TM) and (X_2^TM, Y_2^TM), the memory translation corresponding to the source sentence in each translation memory (e.g., Y_1^TM, namely "I have hiked through this city" in the bilingual sentence pair) and the source sentence X are input into the trained machine translation model 120; the source sentence X and the memory translation are respectively encoded by the encoder 121 in the machine translation model 120, and the encoded source sentence X and memory translation are input into the decoder 122 for joint decoding, thereby obtaining a decoding result.
The decoding result includes a probability distribution corresponding to the source sentence, the probability distribution indicating the probability that each word in the dictionary belongs to a translation of the source sentence. As shown in FIG. 1, the recalled translation memories Y_1^TM and Y_2^TM are respectively combined with the source sentence X to obtain a plurality of decoding results.
3. Fusing the plurality of decoding results.
After obtaining the decoding results corresponding to Y_1^TM and Y_2^TM respectively, the decoding results may be input into the fusion layer 123 to obtain the weight k_1 of the probability distribution P_1 in the decoding result for Y_1^TM and the weight k_2 of the probability distribution P_2 in the decoding result for Y_2^TM (k_1 + k_2 = 1); P_1 × k_1 + P_2 × k_2 is then calculated to obtain the fused probability distribution P corresponding to the plurality of decoding results.
4. A target translation statement is determined.
After the fusion probability distribution P is determined, the word with the highest probability value in each row in the fusion probability distribution P can be output, and the obtained output sequence is the target translation sentence Y corresponding to the source sentence.
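Illustratively, a minimal NumPy sketch of steps 3 and 4 above, fusing two decoding results with weights k_1 + k_2 = 1 and reading off the target translation sentence (the dictionary, distributions, and weights are made-up illustration values):

    import numpy as np

    # P_1, P_2: per-position probability distributions over the dictionary (one row per output
    # position), taken from the decoding results for Y_1^TM and Y_2^TM; k_1 + k_2 = 1.
    P_1 = np.array([[0.1, 0.6, 0.3],
                    [0.7, 0.2, 0.1]])
    P_2 = np.array([[0.2, 0.2, 0.6],
                    [0.5, 0.3, 0.2]])
    k_1, k_2 = 0.6, 0.4

    P = k_1 * P_1 + k_2 * P_2                            # fused probability distribution
    dictionary = ["I", "city", "walked"]                 # hypothetical 3-word dictionary
    target = [dictionary[j] for j in P.argmax(axis=1)]   # highest-probability word per row
    print(target)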
The machine translation method provided by the embodiment of the application can be applied to various application scenes such as document data translation, article information reading, foreign language website translation, foreign language learning and inquiring, spoken language dialogue assistance, foreign travel service and the like. The following describes an example of the application of the embodiment of the present application to a document translation scenario:
in the text translation software, a target translation model is applied, and document contents such as contracts, files, materials, papers, mails and the like can be used as target translation model input data; firstly, carrying out serialization processing on input data by a target translation model to obtain a corresponding input sequence; secondly, recalling n translation memories from a translation memory library according to the input sequence, and respectively encoding the input sequence and the n translation memories through an encoder; then, the coding features of the input sequence and the coding features of the ith translation memory are decoded in a combined mode through a decoder to obtain an ith decoding result; and finally, carrying out fusion analysis on the obtained n decoding results to obtain a translation result of the input data.
It should be noted that the above application scenario is merely an illustrative example, and the machine translation method provided in the embodiment of the present application may also be applied to other scenarios, which is not limited in this embodiment of the present application.
Next, an implementation environment according to an embodiment of the present application will be described, schematically, with reference to fig. 2, where a terminal 210 and a server 220 are involved, and the terminal 210 and the server 220 are connected through a communication network 230. The communication network 230 may be a wired network or a wireless network, which is not limited in the embodiments of the present application.
In some alternative embodiments, the terminal 210 has installed and running therein a target application program having a data translation function. The target application may be implemented as a translation application, a document reading application, an instant messaging application, a video application, a news information application, a comprehensive search engine application, a social application, a game application, a shopping application, a map navigation application, etc., which is not limited in this embodiment of the present application.
In some optional embodiments, the server 220 is configured to provide a background service for a target application installed in the terminal 210, and a translation system is set in the server 220, where the translation system at least includes a first translation memory library and a target translation model, and after the server 220 receives a first sentence of a first natural language uploaded by the terminal, first, a plurality of second sentences conforming to a first semantic association relationship with the first sentence are obtained from the first translation memory library; secondly, respectively extracting features of the first sentence and the ith second sentence through a target translation model to obtain a first feature representation corresponding to the first sentence and a second feature representation corresponding to the ith second sentence, wherein i is a positive integer; then, the target translation model is used for carrying out joint decoding on the first characteristic representation and the second characteristic representation corresponding to the ith second sentence, so as to obtain an ith candidate translation result corresponding to the first sentence; and finally, carrying out fusion analysis on candidate translation results respectively corresponding to the plurality of second sentences through a target translation model to obtain target translation sentences corresponding to the first sentences, wherein the target translation sentences are translations of the second natural language corresponding to the first sentences. Alternatively, the server 220 transmits the obtained target translation sentence to the terminal 210.
In some alternative embodiments, the translation system is provided in the terminal 210, that is, the terminal 210 may obtain the target translation sentence after analyzing the first sentence in the first natural language by the translation system; i.e., the terminal 210 may complete the translation process of the data offline.
The terminal 210 includes at least one of a smart phone, a tablet computer, a portable laptop, a desktop computer, an intelligent sound box, an intelligent wearable device, an intelligent voice interaction device, an intelligent home appliance, a vehicle-mounted terminal, and the like.
It should be noted that the server 220 can be an independent physical server, a server cluster or a distributed system formed by a plurality of physical servers, and can also be a cloud server that provides cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, CDNs (Content Delivery Network, content delivery networks), and basic cloud computing services such as big data and artificial intelligence platforms.
Cloud Technology refers to a hosting technology that unifies hardware, software, network and other resources in a wide area network or a local area network to realize computation, storage, processing and sharing of data. Cloud technology is a general term for the network technology, information technology, integration technology, management platform technology, application technology and the like applied on the basis of the cloud computing business model; it can form a resource pool and be used on demand in a flexible and convenient manner. Cloud computing technology will become an important support. Background services of technical network systems, such as video websites, picture websites and other portals, require a large amount of computing and storage resources. With the rapid development and application of the internet industry, each item may have its own identification mark in the future, which needs to be transmitted to a background system for logical processing; data of different levels will be processed separately, and all kinds of industry data require strong system backing support, which can only be realized through cloud computing. Optionally, server 220 may also be implemented as a node in a blockchain system.
It should be noted that, before collecting relevant data (for example, the first sentence and the translation memory library) of the user and during the process of collecting relevant data of the user, the application can display a prompt interface, a popup window or output voice prompt information, where the prompt interface, the popup window or the voice prompt information is used to prompt the user to collect relevant data currently, so that the application only starts to execute the relevant step of obtaining relevant data of the user after obtaining the confirmation operation of the user to the prompt interface or the popup window, otherwise (i.e., when the confirmation operation of the user to the prompt interface or the popup window is not obtained), ends the relevant step of obtaining relevant data of the user, i.e., does not obtain relevant data of the user. In other words, all user data collected by the present application is collected with the consent and authorization of the user, and the collection, use and processing of relevant user data requires compliance with relevant laws and regulations and standards of the relevant country and region.
In connection with the above description and the implementation environment, fig. 3 is a flowchart of a machine translation method according to an embodiment of the present application, where the method may be executed by a server or a terminal, or may be executed by the server and the terminal together, and in the embodiment of the present application, the method is described by taking the execution of the method by the server as an example, as shown in fig. 3, and the method includes:
Step 301, a first sentence of a first natural language is obtained.
Natural language generally refers to languages that evolve naturally with culture, such as: Chinese, English, German, Spanish, etc. The first natural language may be implemented as any natural language, which is not limited in the embodiments of the present application. Taking the case where the first natural language is implemented as Chinese as an example, the first sentence may be "Hello!".
Wherein the first sentence is a sentence to be translated. Illustratively, a terminal is provided with and runs a target application program with a translation function, and the target application program may be implemented as a translation application, a document reading application, an instant messaging application, a video application, a news information application, a comprehensive search engine application, a social application, a game application, a shopping application, a map navigation application, and the like.
Optionally, the method for acquiring the first sentence further includes: in response to receiving the target data, a first statement is obtained.
The form of the target data can be at least one of various forms such as characters, pictures, audios and videos.
Optionally, the target data may be input data received by the current terminal, or data sent to the current terminal by other computer devices (other terminals or servers), or data read from a universal serial bus (Universal Serial Bus, USB) interface, and the method for acquiring the target data is not limited in this embodiment of the present application.
Illustratively, taking the translation application as an example, a user can input an original text in an input box in the translation application, and the original text is a first sentence; or, the user scans the picture containing the characters through the scanning function provided by the translation application program, and the characters obtained through recognition are the first sentences; or the user uploads a section of audio or video through the audio/video uploading control provided by the translation application program, and the text information in the identified audio or video is the first sentence.
Step 302, a plurality of second sentences conforming to the first semantic association relation with the first sentences are obtained from the first translation memory.
The plurality of second sentences are sentences in the second natural language, and the first translation memory comprises the sentences in the second natural language.
The second natural language and the first natural language are different natural languages. For example: the first natural language is German, and the second natural language is English.
The first translation memory is a database mainly storing sentences in the second natural language and is used for assisting in translating the sentences from the first natural language to the second natural language. Optionally, the first translation memory includes at least one of:
1. The first translation memory comprises a plurality of bilingual sentence pairs, wherein the bilingual sentence pairs comprise sentence pairs with inter-translation relations, which are formed by sentences in a first natural language and sentences in a second natural language.
Illustratively, the first natural language is implemented as chinese, the second natural language is implemented as english, and the bilingual sentence pairs are implemented as chinese-english sentence pairs, for example: the Chinese sentence is "I need to solve this problem", and the English sentence corresponding to the sentence pair is "I need to solve this problem".
2. The first translation memory includes a plurality of single-language sentences of the second natural language.
Illustratively, when the second natural language is implemented as english, the first translation memory includes a plurality of english single-language sentences, where the single-language sentences have no corresponding translations.
In some embodiments, illustrated by way of example in which the first translation memory is implemented as a memory including bilingual sentence pairs, the method of determining a plurality of second sentences further comprises:
determining semantic similarity between the first sentence and the sentence of the first natural language in the first translation memory; determining the sentence of the second natural language in the bilingual sentence pair with the semantic similarity reaching the similarity threshold as a second sentence; or determining the sentences of the second natural language in the n bilingual sentences with the highest semantic similarity as n second sentences; wherein n is an integer greater than 1, and i is less than or equal to n.
Alternatively, in the case where the natural language is the same between sentences, the method of calculating similarity between sentences may use a statistics-based method, for example: a Best Matching 25 (BM25) based method, a Term Frequency-Inverse Document Frequency (TF-IDF) based method, and the like; deep learning based methods may also be used, such as: a text similarity calculation method based on the Word2vec model, etc. The embodiment of the application is not limited to this.
Illustratively, let Chinese sentence 1 be the first sentence to be translated, let n be 2, and let the first translation memory include Chinese-English sentence pair a, Chinese-English sentence pair b, and Chinese-English sentence pair c.
And if the semantic similarity of the Chinese sentence 1 and the Chinese sentence a in the Chinese-English sentence pair a is larger than a similarity threshold value and the semantic similarity of the Chinese sentence 1 and the Chinese sentence b in the Chinese-English sentence pair b is larger than a similarity threshold value, taking the English sentence a in the Chinese-English sentence pair a and the English sentence b in the Chinese-English sentence pair b as n second sentences.
Or, the semantic similarity between the Chinese sentence 1 and the Chinese sentence a in the Chinese-English sentence pair a is greater than the semantic similarity between the Chinese sentence 1 and the Chinese sentence b in the Chinese-English sentence pair b, and greater than the semantic similarity between the Chinese sentence 1 and the Chinese sentence c in the Chinese-English sentence pair c, and the English sentence a in the Chinese-English sentence pair a and the English sentence b in the Chinese-English sentence pair b are taken as n second sentences.
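Illustratively, a minimal sketch of the recall step above using a TF-IDF similarity (not the patent's implementation; it assumes scikit-learn and uses made-up memory contents):

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity

    # Hypothetical bilingual memory: (first-natural-language sentence, second-natural-language sentence) pairs.
    memory = [
        ("I need to solve this problem", "target-side sentence a"),
        ("I walked through this city",   "target-side sentence b"),
        ("The weather is nice today",    "target-side sentence c"),
    ]
    query = "I walk through this city"      # the first sentence to be translated
    n = 2

    vectorizer = TfidfVectorizer()
    matrix = vectorizer.fit_transform([query] + [src for src, _ in memory])
    similarity = cosine_similarity(matrix[0], matrix[1:]).ravel()

    top_n = similarity.argsort()[::-1][:n]            # the n most similar source-side sentences
    second_sentences = [memory[j][1] for j in top_n]  # their target-side sentences are the n second sentences
    print(second_sentences)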
In some embodiments, describing the first translation memory implemented as a memory including single-language sentences of a second natural language, the method of determining a plurality of second sentences further includes:
determining a first feature vector corresponding to the first sentence, and determining second feature vectors corresponding to the monolingual sentences of the second natural language in the first translation memory; after the first feature vector and the second feature vector are aligned, determining a vector distance between the aligned first feature vector and second feature vector; determining the single-language sentences of the second natural language corresponding to the second feature vectors whose vector distances reach the distance threshold as m second sentences; or determining the single-language sentences of the second natural language corresponding to the m second feature vectors with the smallest vector distances as m second sentences; wherein m is an integer greater than 1, and i is less than or equal to m.
Alternatively, in the case where natural languages are different between sentences, the correlation between sentences is calculated using a maximum inner product search (Maximum Inner Product Search, MIPS) algorithm, each sentence may be encoded as a dense vector, and a dot product between the dense vector of the first sentence and the dense vector of the second sentence is calculated, the larger the dot product, the higher the similarity between the vectors, and the smaller the distance between the vectors.
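Illustratively, a minimal NumPy sketch of recall by maximum inner product over dense sentence vectors (the encoder that produces the vectors is assumed and replaced here by random vectors):

    import numpy as np

    def recall_by_inner_product(query_vec, memory_vecs, m):
        # A larger dot product means higher similarity between the dense vectors
        # (and, for normalized vectors, a smaller distance).
        scores = memory_vecs @ query_vec
        return np.argsort(scores)[::-1][:m]     # indices of the m closest monolingual sentences

    d = 128
    query_vec = np.random.randn(d)              # dense vector of the first sentence
    memory_vecs = np.random.randn(1000, d)      # dense vectors of the monolingual memory
    print(recall_by_inner_product(query_vec, memory_vecs, m=3))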
In some embodiments, to improve the accuracy and recall efficiency of the recalled second sentences, the first translation memory may be categorized. Optionally, the first translation memory includes a plurality of sub first translation memories, where each sub first translation memory belongs to a different domain. For example: the first translation memory comprises a first sub first translation memory and a second sub first translation memory, and the first sub first translation memory comprises English sentences related to the artificial intelligence field; if the user inputs a Chinese sentence related to the artificial intelligence field, the Chinese sentence is located to the first sub first translation memory to obtain English sentences that conform to the first semantic association relation with the input Chinese sentence.
And step 303, respectively extracting the characteristics of the first sentence and the ith second sentence to obtain a first characteristic representation corresponding to the first sentence and a second characteristic representation corresponding to the ith second sentence.
Wherein i is a positive integer.
Alternatively, the target translation model is implemented as a sequence-to-sequence translation model, which is primarily an Encoder-Decoder (Encoder-Decoder) based architecture.
Optionally, the first sentence and the ith second sentence are input into an encoder of the target translation model, and a first coding vector corresponding to the first sentence and a second coding vector corresponding to the ith second sentence are respectively output.
Wherein the encoder is used for encoding an input sentence to obtain an encoded vector, and when encoding, the encoder splits the sentence into words or terms. Illustratively, it is assumed that the first sentence is split into a token-level sequence x = {x_1, x_2, ..., x_M} (where M is the sentence length), and the n second sentences are respectively split into token-level sequences Z_k = {Z_k1, Z_k2, ..., Z_kW} (where W is the sentence length); the resulting sequences x and Z_k are input into the encoder of the target translation model for encoding.
Or, if the sentence stored in the first translation memory is a token-level sequence, the first sentence acquired in step 301 is implemented as a token-level sequence representation corresponding to the first sentence.
In some embodiments, the encoder architecture in the target translation model is implemented as a dual encoder architecture; inputting the first sentence into a first encoder of the target translation model, and outputting a first encoding vector corresponding to the first sentence; inputting the ith second sentence into a second encoder of the target translation model, and outputting a second coding vector corresponding to the ith second sentence.
In other embodiments, the encoder architecture in the target translation model is implemented as a single encoder architecture; the first sentence and the ith second sentence are input into a target encoder of a target translation model, and a first coding vector corresponding to the first sentence and a second coding vector corresponding to the ith second sentence are respectively output.
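Illustratively, a minimal sketch contrasting the dual-encoder and single-encoder variants described above (toy_encoder is a hypothetical stand-in for the trained encoder):

    def encode_dual(enc_src, enc_tm, first_tokens, second_tokens):
        # Dual-encoder architecture: separate encoders for the first sentence and the i-th second sentence.
        return enc_src(first_tokens), enc_tm(second_tokens)

    def encode_single(enc, first_tokens, second_tokens):
        # Single-encoder architecture: one shared encoder produces both feature representations.
        return enc(first_tokens), enc(second_tokens)

    toy_encoder = lambda tokens: [hash(t) % 7 for t in tokens]   # stand-in for a trained encoder
    h_x, h_z = encode_single(toy_encoder, ["I", "walk", "through"], ["I", "have", "hiked"])
    print(h_x, h_z)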
Alternatively, the target translation model may be implemented as an autoregressive model or as a non-autoregressive model, as embodiments of the application are not limited in this respect.
In the field of machine translation, autoregressive refers to a mode in which the translation result of the current word is determined according to the previous translation results; that is, the prediction result of a variable at moment t is obtained from the variable values at the first k moments t-1, t-2, ..., t-k, and this characteristic is called autoregression. In contrast, non-autoregressive refers to a mode in which the translation result of each word is determined independently.
Illustratively, the target translation model includes at least one of the following models: a Long Short-Term Memory (LSTM) based model, a Bi-directional Long Short-Term Memory (Bi-LSTM) based model, a Gated Recurrent Unit (GRU) based model, a Disentangled Context transformer (DisCo) model, etc.
And step 304, performing joint decoding on the first characteristic representation and the second characteristic representation corresponding to the ith second sentence to obtain an ith candidate translation result corresponding to the first sentence.
The candidate translation result comprises the probability that the words in the dictionary belong to the translations of the second natural language corresponding to the first sentence.
Alternatively, joint decoding refers to decoding the first feature representation and the second feature representation simultaneously. Through joint decoding, in the process of decoding to obtain a translation result, the input source language sentences and recalled translation memories are considered at the same time, and the quality of the translation result obtained by decoding is improved.
In some embodiments, the first feature representation and the second feature representation are jointly decoded by a single decoder. Optionally, the first feature representation and the ith second feature representation are spliced and then input into a single decoder for joint decoding, and the ith candidate translation result is output.
In other embodiments, the first feature representation and the second feature representation are jointly decoded by a plurality of decoders. Optionally, decoding the ith second feature representation by the first decoder to obtain context information of the ith second sentence, inputting the context information of the ith second sentence and the first feature representation into the second decoder, guiding the decoding process of the first feature representation by the context information of the ith second sentence, and outputting the ith candidate translation result.
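Illustratively, a minimal NumPy sketch of the single-decoder variant of joint decoding, in which the two feature representations are concatenated before decoding (the projection weights are random stand-ins for a trained decoder):

    import numpy as np

    def joint_decode(first_repr, second_repr, vocab_size, out_len, seed=0):
        # Concatenate the first feature representation and the i-th second feature representation
        # along the sequence axis, then decode them together into per-position probabilities.
        rng = np.random.default_rng(seed)
        memory = np.concatenate([first_repr, second_repr], axis=0)       # (len_x + len_z, d)
        w = rng.standard_normal((memory.shape[1], vocab_size))
        hidden = memory.mean(axis=0, keepdims=True) + rng.standard_normal((out_len, memory.shape[1]))
        logits = hidden @ w
        probs = np.exp(logits - logits.max(axis=-1, keepdims=True))
        return probs / probs.sum(axis=-1, keepdims=True)                 # (out_len, vocab_size)

    first_repr = np.random.randn(5, 16)     # feature representation of the first sentence
    second_repr = np.random.randn(7, 16)    # feature representation of the i-th second sentence
    candidate = joint_decode(first_repr, second_repr, vocab_size=100, out_len=4)
    print(candidate.shape)                  # (4, 100): probability over the dictionary per output position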
Optionally, if the ith candidate translation result includes the ith target probability distribution, inputting the first coding vector and the ith second coding vector into a decoder of the target translation model, and outputting the ith target probability distribution, where the ith target probability distribution is used to characterize the probability that each word in the dictionary belongs to a translation of the second natural language corresponding to the first sentence.
Wherein each word in the dictionary is a plurality of words in the second natural language.
Illustratively, the coding vectors corresponding to the sentence sequences x and Z_k are input into the decoder of the target translation model for decoding:
Taking the target translation model as an autoregressive model as an example: the decoder in an autoregressive target translation model produces its output serially. At the t-th decoding moment (each decoding moment corresponding to one position in the output sequence), the decoder outputs the probability that each word in the dictionary is the word for that decoding moment, and the input of the decoder at the (t+1)-th decoding moment is the combination of the outputs of the first t decoding moments; the probabilities output at all decoding moments are combined in decoding order, and the resulting probability distribution is the target probability distribution. The output target probability distribution is assumed to be as shown in probability distribution a:
Probability distribution a:
The number of rows of probability distribution a, which is 2, indicates that the length of the output sentence sequence is 2; the probability that the word at the 1st position in the sentence sequence is the 1st word in the dictionary is 0.6.
Taking the target translation model as a non-autoregressive model for illustration, a decoder in the non-autoregressive type target translation model carries out parallel prediction on each word of the output sequence, namely decoding to obtain a complete probability distribution as the target probability distribution.
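Illustratively, a minimal sketch of how an autoregressive decoder builds the target probability distribution row by row (toy_step is a hypothetical stand-in for the trained decoder step):

    import numpy as np

    def autoregressive_decode(step_fn, max_len, eos_id):
        # step_fn(prefix) returns the probability over the dictionary for the next decoding moment;
        # the rows emitted at each moment are stacked into the target probability distribution.
        prefix, rows = [], []
        for _ in range(max_len):
            p = step_fn(prefix)
            rows.append(p)
            next_id = int(np.argmax(p))       # greedy choice; the prefix feeds the next moment
            prefix.append(next_id)
            if next_id == eos_id:
                break
        return np.vstack(rows)                # one row per decoded position

    # Toy stand-in: the next-word probabilities depend only on the prefix length.
    toy_step = lambda prefix: np.roll(np.array([0.6, 0.2, 0.1, 0.1]), len(prefix))
    print(autoregressive_decode(toy_step, max_len=3, eos_id=3))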
Step 305, performing fusion analysis on the candidate translation results corresponding to the second sentences to obtain the target translation sentence corresponding to the first sentence.
The target translation sentence is a translation of a second natural language corresponding to the first sentence.
Optionally, performing fusion analysis on candidate translation results corresponding to the plurality of second sentences respectively, including: and carrying out weighted summation on target probability distribution in the candidate translation results corresponding to the second sentences respectively.
In some embodiments, an average of a plurality of target probability distributions is calculated, where the average is an average probability distribution, and a word with the highest probability value in each row of the average probability distribution is output, and the obtained output sequence is the target translation sentence corresponding to the first sentence.
In other embodiments, a weighted average corresponding to the multiple target probability distributions is calculated, where the weighted average is a weighted probability distribution, and the word with the highest probability value in each row of the weighted probability distribution is output, and the obtained output sequence is the target translation sentence corresponding to the first sentence.
The weight coefficients corresponding to the multiple target probability distributions can be obtained through a weight layer in the target translation model, namely, the ith candidate translation result is input into the weight layer, and the weight coefficient corresponding to the ith target probability distribution is output. Or, under the condition that the first translation memory library is realized as a bilingual sentence library, determining a weight coefficient corresponding to the ith target probability distribution based on the similarity between the sentences of the first natural language in the first translation memory library corresponding to the ith second sentence and the first sentence, wherein the similarity and the weight coefficient are in positive correlation; and under the condition that the first translation memory library is realized as a single-language sentence library, determining a weight coefficient corresponding to the ith target probability distribution based on the distance between the vector corresponding to the ith second sentence and the vector corresponding to the first sentence, wherein the distance and the weight coefficient are in a negative correlation.
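Illustratively, one simple way to turn retrieval similarities into fusion weights that sum to 1 and increase with similarity is a softmax; the patent only requires the positive correlation, not this exact form:

    import numpy as np

    def weights_from_similarity(similarities, temperature=1.0):
        # Softmax over similarities: higher similarity -> larger weight, and the weights sum to 1.
        s = np.asarray(similarities, dtype=float) / temperature
        e = np.exp(s - s.max())
        return e / e.sum()

    print(weights_from_similarity([0.9, 0.7, 0.4]))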
Illustratively, if the number of second sentences is determined to be 2, namely second sentence 1 and second sentence 2, a first probability distribution is obtained for the first sentence-second sentence 1 pair, and a second probability distribution is obtained for the first sentence-second sentence 2 pair; the first probability distribution and the second probability distribution are then fused to obtain a new probability distribution, and the word with the highest probability value in each row of the new probability distribution is output; the obtained output sequence is the target translation sentence corresponding to the first sentence.
If the target translation model is implemented as an autoregressive model, decoding is performed synchronously for the first sentence-second sentence 1 pair and for the first sentence-second sentence 2 pair, so that the number of rows of the probability distributions obtained by decoding the two pairs is consistent (that is, the sequence lengths obtained by decoding are consistent).
If the target translation model is implemented as a non-autoregressive model, decoding may first be performed for either one of the first sentence-second sentence 1 pair and the first sentence-second sentence 2 pair. Assuming that probability distribution a is obtained by decoding the first sentence-second sentence 1 pair, the number of rows of probability distribution a (i.e., its sequence length) is taken as the length of the finally output target translation sentence; probability distribution b is then obtained by decoding the first sentence-second sentence 2 pair with the decoding sequence length limited to the number of rows of probability distribution a. In this way, the number of rows of the probability distributions obtained by decoding the two pairs (i.e., the sequence lengths obtained by decoding) is guaranteed to be consistent.
In some embodiments, performing fusion analysis on the candidate translation results corresponding to the second sentences means calculating the average value of the target probability distributions. Illustratively, assuming that the probability distribution of the 1st position in the output sequence corresponding to second sentence a is [0.1, 0.6, 0.3] and the probability distribution of the 1st position in the output sequence corresponding to second sentence b is [0.2, 0.2, 0.6], the probability distribution of the 1st position in the final output sequence is [0.15, 0.4, 0.45]; that is, the 3rd word in the dictionary is determined to be the word at the 1st position in the output sequence.
After a plurality of sequence words are obtained, the sequence words are spliced according to the sequence, and the obtained sentences are target translation sentences.
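Illustratively, the worked example above as a few lines of NumPy (averaging the position-1 distributions of the two candidate results and picking the highest-probability dictionary entry):

    import numpy as np

    p_a = np.array([0.1, 0.6, 0.3])   # position-1 distribution from the result for second sentence a
    p_b = np.array([0.2, 0.2, 0.6])   # position-1 distribution from the result for second sentence b

    p_avg = (p_a + p_b) / 2           # -> [0.15, 0.4, 0.45]
    print(p_avg, int(p_avg.argmax()) + 1)   # the 3rd word in the dictionary is the position-1 word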
In summary, the embodiment of the application provides a machine translation method, which recalls a plurality of translation memories for a source sentence, jointly encodes and decodes each source sentence-translation memory (i.e., second sentence) pair to obtain different translation results, and finally fuses the translation results to obtain the target translation sentence corresponding to the source sentence. By fusing the translation results, the influence of the TM-enhanced machine translation model variance on the quality of the translation results is reduced, and the influence of the deviation possibly generated by the machine translation model on the quality of the translation results is reduced, so that the quality of translation sentences obtained by translation of the machine translation model is improved.
According to the method provided by the embodiment of the application, the plurality of second sentences which are semantically similar to the first sentences are recalled by adopting the semantic similarity in the bilingual sentence library, so that the quality of the recalled second sentences is improved, and the influence of model deviation on a translation result is reduced.
According to the method provided by the embodiment of the application, the machine translation process of enhancing the translation memory by utilizing the single-language sentence library is realized by recalling a plurality of second sentences which are semantically similar to the first sentences by adopting the vector distance in the single-language sentence library, so that the use scene of the machine translation method provided by the application is increased.
In some optional embodiments, fusion analysis is performed on the candidate translation results corresponding to the plurality of second sentences, where a weight parameter corresponding to each candidate translation result needs to be determined, and a weighted average is calculated according to the weight parameters. Taking the above-mentioned first translation memory implemented as a memory including bilingual sentence pairs as an example, fig. 4 is a flowchart of a machine translation method provided in an embodiment of the present application. The method may be executed by a server or a terminal, or may be executed by the server and the terminal together; in the embodiment of the present application, the method is described by taking execution by the server as an example. As shown in fig. 4, the method includes:
Step 401, a first sentence of a first natural language is obtained.
Wherein the first sentence is a sentence to be translated. The first natural language may be implemented as any natural language, which is not limited in the embodiments of the present application. Taking the case where the first natural language is implemented as Chinese as an example, the first sentence may be "Hello!".
Step 402, determining semantic similarity between the first sentence and the sentence in the first natural language in the first translation memory.
The first translation memory is a database mainly storing sentences in the second natural language and is used for assisting in translating the sentences from the first natural language to the second natural language. The first translation memory comprises a plurality of bilingual sentence pairs, wherein the bilingual sentence pairs comprise sentence pairs with inter-translation relations, which are formed by sentences in a first natural language and sentences in a second natural language. The second natural language and the first natural language are different natural languages. Illustratively, the first natural language is implemented as chinese, the second natural language is implemented as english, and the bilingual sentence pairs are implemented as chinese-english sentence pairs, for example: the Chinese sentence is "I need to solve this problem", and the English sentence corresponding to the sentence pair is "I need to solve this problem".
Alternatively, in the case where the sentences are in the same natural language, the similarity between sentences may be calculated using a statistics-based method, for example: a BM25 (Best Matching 25) based method, a TF-IDF (Term Frequency-Inverse Document Frequency) based method, and the like; a deep-learning-based method may also be used, for example: a text similarity calculation method based on a Word2vec model, and the like, which is not limited in the embodiments of the present application.
Step 403, based on the semantic similarity, obtaining n second sentences conforming to the first semantic association relationship with the first sentence from the first translation memory.
Optionally, determining the sentences of the second natural language in the n bilingual sentences with the highest semantic similarity as n second sentences; or determining the sentences of the second natural language in the n bilingual sentences of which the semantic similarity reaches the similarity threshold as n second sentences.
n is a positive integer greater than 1, and i is less than or equal to n.
Wherein the plurality of second sentences are sentences of a second natural language.
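For ease of understanding only, the following sketch illustrates steps 402 and 403; the use of scikit-learn's TF-IDF vectorizer, the function name and the in-memory list of bilingual pairs are assumptions made for illustration, and a BM25 or Word2vec-based similarity could be substituted:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def recall_second_sentences(first_sentence, bilingual_pairs, n=2):
    """Recall the n second sentences whose first-natural-language side is most
    similar to the first sentence (TF-IDF cosine similarity as the sim measure)."""
    src_side = [src for src, _ in bilingual_pairs]          # first-natural-language sides
    matrix = TfidfVectorizer().fit_transform(src_side + [first_sentence])
    sims = cosine_similarity(matrix[-1], matrix[:-1])[0]    # similarity to each source side
    top = sims.argsort()[::-1][:n]                          # n most similar bilingual pairs
    return [bilingual_pairs[i][1] for i in top]             # their second-natural-language sentences
```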
Step 404, inputting the first sentence and the ith second sentence into an encoder of the target translation model, and outputting a first encoding vector corresponding to the first sentence and a second encoding vector corresponding to the ith second sentence respectively.
In some embodiments, the encoder architecture in the target translation model is implemented as a dual encoder architecture; inputting the first sentence into a first encoder of the target translation model, and outputting a first encoding vector corresponding to the first sentence; inputting the ith second sentence into a second encoder of the target translation model, and outputting a second coding vector corresponding to the ith second sentence.
In other embodiments, the encoder architecture in the target translation model is implemented as a single encoder architecture; the first sentence and the ith second sentence are input into a target encoder of a target translation model, and a first coding vector corresponding to the first sentence and a second coding vector corresponding to the ith second sentence are respectively output.
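Purely as an illustrative, non-limiting sketch of the dual-encoder variant of step 404 (the layer sizes, the Transformer building blocks and the module names are assumptions; the single-encoder variant would instead concatenate the two sentences and feed them to one encoder):

```python
import torch
import torch.nn as nn

class DualEncoder(nn.Module):
    """The first sentence and the ith second sentence are encoded by two separate encoders."""
    def __init__(self, vocab_size, dim=512, heads=8, layers=3):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.src_encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(dim, heads, batch_first=True), layers)
        self.tm_encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(dim, heads, batch_first=True), layers)

    def forward(self, first_ids, second_ids):
        first_vec = self.src_encoder(self.embed(first_ids))    # first encoding vector
        second_vec = self.tm_encoder(self.embed(second_ids))   # second encoding vector (ith TM)
        return first_vec, second_vec
```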
Step 405, inputting the first encoding vector and the ith second encoding vector into a decoder of the target translation model, and outputting the ith candidate translation result.
The ith candidate translation result comprises an ith target probability distribution, wherein the ith target probability distribution is used for representing the probability that each word in the dictionary belongs to a translation of a second natural language corresponding to the first sentence.
The ith candidate translation result further comprises a first context vector and an ith second context vector, wherein the first context vector is used for representing the context information corresponding to the first sentence, and the ith second context vector is used for representing the context information of the combined sentence formed by the first sentence and the ith second sentence.
In some embodiments, each of the plurality of second sentences in the first translation memory also corresponds to a quality score, which is a score given to the second sentence according to a preset scoring standard and is used for representing the reliability degree of the second sentence. The method for outputting the ith candidate translation result further comprises the following steps:
acquiring a quality score corresponding to the ith second sentence from the first translation memory; inputting the first coding vector, the ith second coding vector and the quality scores corresponding to the ith second sentence into a decoder of the target translation model, and outputting the ith candidate translation result.
When the translation of the first sentence is generated, the recalled translation memory (namely the second sentence) is taken as a reference, and the quality of the referenced translation memory is considered: if the quality score of the translation memory is low, the weight corresponding to the translation memory can be reduced, namely the influence of the translation memory on the generation of the translation is reduced; if the quality score of the translation memory is high, the weight corresponding to the translation memory is increased, namely the influence of the translation memory on the generation of the translation is increased.
And step 406, performing linear mapping on the first context vector and the ith second context vector to obtain a weight parameter corresponding to the ith target probability distribution.
Schematically, if the target translation model is implemented as an autoregressive model, the context vector is a state vector output by the decoder at the last decoding moment, where the first context vector is a state vector output by the decoder at the last decoding moment after the first feature representation is decoded by the decoder alone; the ith second context vector is a state vector output by the decoder at the last decoding moment by jointly decoding the first feature representation and the ith second feature representation by the decoder.
Optionally, the first context vector and the ith second context vector are input to a linear layer of the target translation model, and a weight parameter corresponding to the ith candidate translation result is output.
Step 407, performing weighted average processing on the multiple target probability distributions based on the weight parameters to obtain a target translation sentence corresponding to the first sentence.
Schematically, assuming that a second sentence a and a second sentence b are determined, the probability distribution of the 1st position in the output sequence corresponding to the second sentence a is [0.1, 0.6, 0.3] with weight 0.6, and the probability distribution of the 1st position in the output sequence corresponding to the second sentence b is [0.2, 0.2, 0.6] with weight 0.4; the probability distribution of the 1st position in the final output sequence is therefore [0.14, 0.44, 0.42], that is, the 2nd word in the dictionary is determined as the 1st word in the output sequence.
After a plurality of sequence words are obtained, the sequence words are spliced according to the sequence, and the obtained sentences are target translation sentences.
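A minimal numeric sketch of steps 406 and 407 is given below; the dictionary words are hypothetical and the weights are hard-coded to reproduce the worked example above, whereas in the embodiment they come from the linear mapping of the context vectors:

```python
import numpy as np

dist_a = np.array([[0.1, 0.6, 0.3]])   # step-1 target probability distribution for second sentence a
dist_b = np.array([[0.2, 0.2, 0.6]])   # step-1 target probability distribution for second sentence b
weights = np.array([0.6, 0.4])          # weight parameters for the two candidate translation results

fused = weights[0] * dist_a + weights[1] * dist_b   # [[0.14, 0.44, 0.42]]
token_ids = fused.argmax(axis=-1)                    # index 1 -> the 2nd word in the dictionary
dictionary = ["<eos>", "hello", "world"]             # hypothetical dictionary
target_sentence = " ".join(dictionary[i] for i in token_ids)
```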
Illustratively, taking an example of implementing the encoder architecture in the target translation model as a dual encoder architecture, please refer to fig. 5, which shows a flow chart of a machine translation method:
As shown in fig. 5, a source sentence is input into the translation memory 501 to recall a memory sentence 1 and a memory sentence 2; the source sentence is encoded by the encoder 502 to obtain a feature vector X, and the memory sentence 1 and the memory sentence 2 are encoded by the encoder 503 to obtain a feature vector Z1 and a feature vector Z2 respectively.
The feature vector X of the source sentence is connected with the feature vector Z1 of the memory sentence 1 and with the feature vector Z2 of the memory sentence 2 respectively, and the connected representations are input into the decoder 504 to obtain two decoding results. The decoding result corresponding to the memory sentence 1 is: a target probability distribution P1, a first context vector H and a second context vector H1; the decoding result corresponding to the memory sentence 2 is: a target probability distribution P2, the first context vector H and a second context vector H2.
Then {H, H1} and {H, H2} are input into the linear layer 505 to obtain a weight coefficient k corresponding to the target probability distribution P1 and a weight coefficient 1-k corresponding to the target probability distribution P2. Finally, P1×k+P2×(1-k) is calculated to obtain a fused probability distribution P; after the fused probability distribution P is determined, the word with the highest probability value in each row of the fused probability distribution P is output, and the resulting output sequence is the target translation sentence corresponding to the source sentence.
In some embodiments, after performing fusion analysis on candidate translation results corresponding to the plurality of second sentences to obtain the target translation sentence corresponding to the first sentence, the method further includes: and updating the first translation memory according to the target translation statement.
Schematically, if the first translation memory is implemented as a memory including bilingual sentence pairs, a target sentence pair composed of the first sentence and the target translation sentence is constructed, and the target sentence pair is stored in the first translation memory. Optionally, before the target sentence pair is stored in the first translation memory, duplicate detection may be performed on the target sentence pair, that is, whether a sentence pair identical to the target sentence pair already exists in the first translation memory is checked; if so, the target sentence pair is not stored, and if not, the target sentence pair is stored in the first translation memory.
If the first translation memory is implemented as a memory including single-language sentences of the second natural language, the target translation sentence is directly stored in the first translation memory. Optionally, before the target translation sentence is stored in the first translation memory, duplicate detection may be performed on the target translation sentence, that is, whether a sentence identical to the target translation sentence already exists in the first translation memory is checked; if so, the target translation sentence is not stored, and if not, the target translation sentence is stored in the first translation memory.
Optionally, before updating the first translation memory according to the target translation sentence, a quality score of the target translation sentence needs to be obtained, where the quality score may be a score of a manual evaluation or a score obtained through a scoring model, which is not limited in the embodiment of the present application; after the quality scores of the target translation sentences are obtained, the target translation sentences and the corresponding quality scores are correspondingly stored in a first translation memory bank.
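For ease of understanding only, the following sketch illustrates such an update step; the dictionary-based memory structure, the function name and the example data are assumptions for illustration and do not limit how the first translation memory is actually stored:

```python
def update_translation_memory(tm, first_sentence, target_sentence, quality_score=None):
    """Hypothetical sketch: store the new (source, translation) pair in the first
    translation memory only if an identical pair is not already present, and
    optionally keep its quality score alongside it."""
    entry = (first_sentence, target_sentence)
    if entry not in tm["pairs"]:                 # duplicate detection
        tm["pairs"].append(entry)
        if quality_score is not None:            # store the quality score correspondingly
            tm["scores"][entry] = quality_score
    return tm

# Usage example with assumed data.
tm = {"pairs": [], "scores": {}}
tm = update_translation_memory(tm, "source sentence", "target translation", quality_score=0.9)
```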
In summary, the embodiment of the application provides a machine translation method, which performs joint coding on different pairs of source language-translation memory (i.e. second sentence) through recalling a plurality of translation memories of a source sentence to obtain different translation results, and finally fuses each translation result to obtain a target translation sentence of the source language. By fusing the translation results, the influence of the TM-enhanced machine translation model variance on the quality of the translation results is reduced, and the influence of the deviation possibly generated by the machine translation model on the quality of the translation results is reduced, so that the quality of translation sentences obtained by translation of the machine translation model is improved.
In the method provided by the embodiment of the application, the quality of the referenced translation memory is considered in the process of generating the translation result: if the quality score of the translation memory is low, the influence of the translation memory on the generation of the translation can be reduced; if the quality score of the translation memory is high, the influence of the translation memory on the generation of the translation is increased, so that the quality of the finally obtained target translation sentence is improved.
According to the method provided by the application, by mapping the context vectors output by the model, namely the context information of the first sentence and of the combined sentences formed by the first sentence and the second sentences, the weight parameters corresponding to the target probability distributions are determined, and the target probability distributions are fused according to the weight parameters, so that the influence of the variance and the deviation of the model on the translation result is reduced and the quality of the obtained translation sentence is improved.
In some alternative embodiments, the target translation model is obtained by training a first candidate translation model. Before the training method of the target translation model is introduced, the variance characteristics of the TM-enhanced NMT model are first analyzed from the perspective of the probability theory of retrieval and of variance-bias decomposition.
Illustratively, assume that the input source language sequence is x = {x_1, x_2, ..., x_M} with length M, and the output target language sequence is y = {y_1, y_2, ..., y_N} with length N.
The translation system provided by the embodiment of the application comprises a retrieval module and the target translation model. The retrieval module uses the BM25 method to retrieve K TM sentences Z = {z_1, z_2, ..., z_K} from the translation memory database according to the input source language sequence, wherein each z_k is a retrieved bilingual sentence pair consisting of a sentence in the source language and a sentence in the target language. The target translation model models the target language based on the source language and the retrieved TMs.
For a given source language sentence, the retrieval module may be regarded as a probabilistic model P(Z|x) from which Z is sampled, and P(Z|x) is defined as the following formula one:
Formula one: P(z_k|x) = exp(sim(x, z_k)/T) / Σ_j exp(sim(x, z_j)/T)
wherein the sim function is defined by the BM25 retrieval method and T is a temperature coefficient; if T is small enough, the probability mass of P(Z|x) concentrates on the most similar TM sentences, and P(z_k|x) refers to the probability that the k-th TM sentence is recalled given the source language sequence x. P(y|x) establishes a hidden-variable model relationship with the variable Z through the following formula two:
Formula two: P(y|x) = Σ_Z P(Z|x) · P(y|x, Z) = E_Z P(y|x, Z)
wherein P(y|x) refers to the probability of translating the source language sequence x into the target language sequence y when the source language sequence is x; P(y|x, Z) refers to the probability of translating the source language sequence x into the target language sequence y when the source language sequence is x and the recalled TM is Z; and E_Z P(y|x, Z) is the expectation of P(y|x, Z) over Z.
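Purely as an illustrative sketch of formulas one and two (the similarity scores, the per-TM probabilities and the softmax form with temperature are assumed values used for demonstration, not data from the embodiment):

```python
import numpy as np

def retrieval_distribution(sims, T):
    """Formula one as a softmax over sim(x, z_k) / T: a small temperature T
    concentrates the probability mass on the most similar TM sentences."""
    logits = np.asarray(sims, dtype=float) / T
    logits -= logits.max()                      # numerical stability
    p = np.exp(logits)
    return p / p.sum()

sims = [12.0, 10.5, 3.0]                        # hypothetical BM25 similarities for K = 3 TMs
p_z = retrieval_distribution(sims, T=1.0)       # P(z_k | x)
p_y_given_xz = np.array([0.42, 0.35, 0.58])     # hypothetical P(y | x, z_k) for each TM
p_y_given_x = np.sum(p_z * p_y_given_xz)        # formula two: E_Z P(y | x, Z)
print(p_z, p_y_given_x)
```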
Since the above summation operation cannot be performed over all Z in practice, an approximate estimate can be made by Monte Carlo sampling, as shown in the following formula three:
Formula three: P(y|x) ≈ P(y|x, Z), wherein Z ~ P(Z|x)
From formula three, it can be seen that the TM-enhanced NMT model P(y|x, Z) can be regarded as a Monte Carlo estimate of P(y|x), and the quality of the estimate depends on the expectation of the estimation error: E_Z (P(y|x, Z) − P(y|x))^2. This expectation can in fact be derived to be the variance of P(y|x, Z) with respect to the variable Z, as shown in the following formula four:
Formula four: E_Z (P(y|x, Z) − P(y|x))^2 = E_Z (P(y|x, Z) − E_Z P(y|x, Z))^2 = V_Z P(y|x, Z)
From the above derivation, it can be seen that the variance V_Z P(y|x, Z) with respect to the variable Z actually controls the quality of the estimate; therefore, if the variance V_Z P(y|x, Z) is too large, fluctuations of the variable Z will have a large negative impact on the TM-enhanced NMT model.
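The following minimal numeric sketch (all probability values are assumptions chosen for illustration) checks the identity in formula four: the expected squared estimation error equals the variance of P(y|x, Z) over Z:

```python
import numpy as np

p_given_z = np.array([0.42, 0.35, 0.58, 0.40])  # hypothetical P(y|x, z_k) for K = 4 recalled TMs
p_z = np.array([0.4, 0.3, 0.2, 0.1])            # hypothetical retrieval distribution P(z_k|x)

p_y_given_x = np.sum(p_z * p_given_z)                        # E_Z P(y|x, Z), formula two
error = np.sum(p_z * (p_given_z - p_y_given_x) ** 2)         # E_Z (P(y|x, Z) - P(y|x))^2
variance = np.sum(p_z * p_given_z ** 2) - p_y_given_x ** 2   # V_Z P(y|x, Z)
assert np.isclose(error, variance)                           # formula four holds
```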
The above analysis only concerns the variance with respect to the variable Z and is independent of x, y and the model structure. To verify the conclusion of the theoretical analysis, the variance of the model can be estimated using the variance-bias decomposition commonly used in machine learning theory. The variance and the bias of the standard NMT model and of the TM-enhanced NMT model were estimated on the JRC-Acquis German-English translation dataset respectively. Furthermore, to demonstrate that the conclusion of the theoretical analysis is independent of the specific model architecture, single-encoder and dual-encoder architectures may be used respectively. For illustration, please refer to table 2, which compares the variance and the bias of the standard NMT model and of the TM-enhanced NMT model under different encoder architectures (lower variance and lower bias are both better; low variance is particularly important in a low-resource scenario):
TABLE 2
From table 2, the following conclusions can be drawn. First, under the same model architecture, the TM-enhanced NMT model has a higher variance, which verifies the above theoretical analysis based on the hidden-variable model. Second, under the same model architecture, the TM-enhanced NMT model has a lower bias, which means that the model has a stronger ability to fit the training data; this explains why the TM-enhanced NMT model performs better than the standard NMT model in a high-resource scenario. Third, the effect of model variance is related to the size of the training data: when the training data is sufficient, the negative effect of model variance on performance is alleviated, whereas in a low-resource scenario the larger variance of the TM-enhanced NMT model shown in table 2 degrades performance more severely.
Based on the above analysis, fig. 6 is a flowchart of a machine translation method provided in an embodiment of the present application, where the method may be performed by a server or a terminal, or may be performed by both the server and the terminal, and in an embodiment of the present application, the method is described by taking the embodiment of the method performed by the server as an example, and steps 601 to 604 described below may be performed before step 301, as shown in fig. 6, the method includes:
step 601, a first sample sentence of a first natural language labeled with a first reference label is obtained.
The first reference label is used for indicating a reference translation of the second natural language corresponding to the first sample sentence.
Step 602, obtaining a second sample sentence which accords with the second semantic association relation with the first sample sentence from a second translation memory library.
The second sample sentence is a sentence of a second natural language, and the second translation memory comprises the sentence of the second natural language.
The second translation memory may be the same translation memory as the first translation memory, or may be a different translation memory.
The second translation memory is a database mainly storing sentences in the second natural language and is used for assisting in translating the sentences from the first natural language to the second natural language. Optionally, the second translation memory includes at least one of:
1. the second translation memory comprises a plurality of bilingual sentence pairs, wherein the bilingual sentence pairs comprise sentence pairs with inter-translation relations formed by sentences of the first natural language and sentences of the second natural language.
2. The second translation memory includes a plurality of single-language sentences of the second natural language.
Optionally, taking the second translation memory including a plurality of bilingual sentence pairs as an example, the determining the second sample sentence further includes:
Determining semantic similarity between the first sample sentence and a sentence of a first natural language in an h bilingual sentence pair in a second translation memory, wherein h is a positive integer; and determining the sentence of the second natural language in the bilingual sentence pair with the highest semantic similarity as a second sample sentence.
Alternatively, the similarity between sentences may be calculated using a statistics-based method, for example: a BM25 (Best Matching 25) based method, a TF-IDF (Term Frequency-Inverse Document Frequency) based method, and the like; a deep-learning-based method may also be used, for example: a text similarity calculation method based on a Word2vec model, and the like, which is not limited in the embodiments of the present application.
And step 603, encoding the first sample sentence and the second sample sentence through an encoder in the first candidate translation model respectively to obtain a first sample vector corresponding to the first sample sentence and a second sample vector corresponding to the second sample sentence.
Alternatively, the first candidate translation model is implemented as a sequence-to-sequence translation model, which is mainly based on an Encoder-Decoder architecture.
In some embodiments, the encoder architecture in the first candidate translation model is implemented as a dual encoder architecture; inputting the first sample sentence into a first encoder of a first candidate translation model, and outputting a first sample vector corresponding to the first sample sentence; and inputting the second sample sentence into a second encoder of the first candidate translation model, and outputting a second sample vector corresponding to the second sample sentence.
In other embodiments, the encoder architecture in the first candidate translation model is implemented as a single encoder architecture; the first sample sentence and the second sample sentence are input into a target encoder of the first candidate translation model, and a first sample vector corresponding to the first sample sentence and a second sample vector corresponding to the second sample sentence are output respectively.
Illustratively, the first candidate translation model includes at least one of the following models: a Long Short-Term Memory (LSTM) based model, a Bi-directional Long Short-Term Memory (Bi-LSTM) based model, and a Gated Recurrent Unit (GRU) based model.
Step 604, performing joint decoding on the first sample vector and the second sample vector through a decoder in the first candidate translation model to obtain a first prediction translation result corresponding to the first sample sentence, and training the first candidate translation model based on the difference between the first prediction translation result and the first reference label to obtain a first translation model.
The first predictive translation result comprises the probability that the words in the dictionary belong to the translations of the second natural language corresponding to the first sample sentence. The first translation model is a trained translation model, and can translate the input sentence in the first natural language into the sentence in the second natural language.
Optionally, the first predictive translation result comprises a first predictive probability distribution. The first sample vector and the second sample vector are input into a decoder of the target translation model, and a first prediction probability distribution is output, wherein the first prediction probability distribution is used for representing the probability that each word in the dictionary belongs to a translation of a second natural language corresponding to the first sample sentence.
Wherein each word in the dictionary is a plurality of words in the second natural language.
After the first prediction probability distribution is obtained through calculation, the word with the highest probability value in each row in the first prediction probability distribution can be output to obtain a first prediction translation statement, and then the first candidate translation model is trained by calculating the difference between the first prediction translation statement and the first reference label to obtain a first translation model.
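As a hedged sketch only (the model signature, the optimizer and the tensor shapes are assumptions; the embodiment does not prescribe a specific deep-learning framework), steps 603 and 604 can be written as one training step:

```python
import torch
import torch.nn as nn

def pretrain_step(model, optimizer, first_ids, second_ids, reference_ids):
    """Hypothetical sketch: the model is assumed to encode the first and second sample
    sentences, jointly decode them, and return per-step logits over the dictionary
    (the first prediction probability distribution before softmax)."""
    optimizer.zero_grad()
    logits = model(first_ids, second_ids)              # [batch, tgt_len, vocab_size]
    loss = nn.functional.cross_entropy(                # difference between the prediction
        logits.reshape(-1, logits.size(-1)),           # and the first reference label
        reference_ids.reshape(-1),
    )
    loss.backward()
    optimizer.step()
    return loss.item()
```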
Step 605, using the first translation model as a target translation model.
The steps 601 to 604 can be actually regarded as a pre-training stage of the target translation model, that is, the first candidate translation model is pre-trained by retrieving a second sample sentence that is semantically closest to the first sample sentence, so as to obtain a first translation model with relatively low variance, and then the first translation model can be directly used as the target translation model.
Then, in the application stage of the target translation model, for a source sentence to be translated, a plurality of translation memories of the source sentence can be recalled, and the different "source language-translation memory" pairs are input into the target translation model to obtain the prediction probability distributions corresponding to the different translation memories. These probability distributions are averaged to obtain an average probability distribution, and the word with the highest probability value in each row of the average probability distribution is taken as the output, so that the translation result of the source sentence is obtained. Illustratively, taking the target translation model implemented as an LSTM model as an example, the average probability distribution is expressed by the following formula five:
Formula five: P(y|x) ≈ Π_{t=1}^{N} (1/K) Σ_{k=1}^{K} P(y_t | x, y_{<t}, z_k)
wherein, P (y|x) refers to the probability of translating the source language sequence x into the target language sequence y when the source language sequence is x; k is the number of recalled translation memories; t is the time step in the LSTM model, N is the total number of time steps, i.e. the length of the decoded sequence y, y <t The sequence combination output in the first t-1 time steps is referred; p (y) t |x,y <t ,z k ) Refers to recalling the kth translation memory when the source language sequence is x and the output sequence is y <t In the case of a target language sequence y, the probability of the source language sequence x translating into the target language sequence y.
In summary, according to the machine translation method provided by the embodiment of the application, the sentence in the first natural language is translated into the sentence in the second natural language through the target translation model, in the process of training to obtain the target translation model, the translation memory most relevant to the sample sentence in the first natural language is recalled, then the candidate translation model is trained based on the training sample pair and the recalled translation memory to obtain the target translation model, the variance of the target translation model obtained through training is minimized, and the quality of the result obtained through translating by the target translation model is improved.
In some embodiments, in order to reduce the variance of the target translation model and simultaneously keep the advantage of lower deviation of the TM-enhanced NMT model, weight parameters can be configured for prediction probability distributions corresponding to different translation memories; the weight coefficient is obtained through model training, so that a stronger relation can be established between the source language sentence and each translation memory retrieved. Fig. 7 is a flowchart of a machine translation method provided in an embodiment of the present application, where the method may be performed by a server or a terminal, or may be performed by both the server and the terminal, and in an embodiment of the present application, the method is described by taking the embodiment of the method performed by the server as an example, and steps 701 to 707 described below may be performed after step 604, as shown in fig. 7, where the method includes:
And 701, adding a weight network into the first translation model to obtain a second candidate translation model to be trained.
The weight network is used for determining weight parameters corresponding to the prediction translation result of the decoding output. The weight network is a network layer to be subjected to network parameter adjustment. The first translation model is a trained translation model, and can translate the input sentence in the first natural language into the sentence in the second natural language.
Alternatively, the weight network may be implemented as a linear layer, namely a fully connected layer, for linear mapping. That is, n linear layers are connected after the decoder of the first translation model, and the network parameters in the n linear layers are randomly initialized.
Step 702, a third sample sentence of the first natural language labeled with a second reference label is obtained.
The second reference tag is used for indicating a reference translation of a second natural language corresponding to the third sample sentence.
Step 703, obtaining a plurality of fourth sample sentences conforming to the first semantic association relationship with the third sample sentences from the third translation memory.
Wherein the third translation memory includes sentences in the second natural language.
The third translation memory may be the same translation memory as the first translation memory and the second translation memory, or may be different translation memories.
The third translation memory is a database mainly storing sentences in the second natural language and is used for assisting in translating the sentences from the first natural language to the second natural language. Optionally, the third translation memory comprises at least one of:
1. the third translation memory includes a plurality of bilingual sentence pairs, wherein the bilingual sentence pairs include sentence pairs having an inter-translation relationship formed by sentences of the first natural language and sentences of the second natural language.
2. The third translation memory includes a plurality of single-language sentences of the second natural language.
Taking the third translation memory including bilingual sentence pairs as an example: optionally, semantic similarity between the third sample sentence and the sentences of the first natural language in the third translation memory is determined; the sentences of the second natural language in the n bilingual sentence pairs with the highest semantic similarity are determined as n fourth sample sentences; or the sentences of the second natural language in the bilingual sentence pairs whose semantic similarity reaches the similarity threshold are determined as fourth sample sentences.
Step 704, respectively encoding the third sample sentence and the r-th fourth sample sentence through an encoder in the second candidate translation model to obtain a third sample vector corresponding to the third sample sentence and a fourth sample vector corresponding to the r-th fourth sample sentence.
Wherein r is a positive integer.
Step 705, performing joint decoding on the third sample vector and the fourth sample vector corresponding to the r-th fourth sample sentence through a decoder in the second candidate translation model, to obtain the r-th sub-translation result corresponding to the third sample sentence.
The r-th sub-translation result comprises the probability that the word in the dictionary belongs to the translation of the second natural language corresponding to the third sample sentence.
Optionally, the r-th sub-translation result includes an r-th candidate probability distribution, a first sample context vector and an r-th second sample context vector. The r-th candidate probability distribution is used for representing the probability that each word in the dictionary belongs to the translation of the second natural language corresponding to the third sample sentence; the first sample context vector is used for representing the context information corresponding to the third sample sentence; and the r-th second sample context vector is used for representing the context information of the combined sentence formed by the third sample sentence and the r-th fourth sample sentence.
And step 706, obtaining a weight parameter corresponding to the r-th sub-translation result through a weight network in the second candidate translation model.
And linearly mapping the first sample context vector and the r second sample context vector through a weight network in the second candidate translation model to obtain a weight parameter corresponding to the r candidate probability distribution.
And step 707, performing fusion analysis on the plurality of sub-translation results based on the weight parameters to obtain a second prediction translation result corresponding to the third sample sentence, and adjusting network parameters of the weight network based on the difference between the second prediction translation result and the second reference label to obtain the target translation model.
Optionally, weighted average processing is performed on candidate probability distributions respectively included in the plurality of sub-translation results based on the weight parameters, so as to obtain second prediction probability distributions corresponding to the third sample sentences.
After the second prediction probability distribution is obtained through calculation, the word with the highest probability value in each row of the second prediction probability distribution can be output to obtain a second prediction translation sentence; then the network parameters of the weight network are adjusted by calculating the difference between the second prediction translation sentence and the second reference label, so that the second candidate translation model is fine-tuned to obtain the target translation model. Illustratively, taking the first translation model implemented as an LSTM model as an example, the second prediction probability distribution is expressed by the following formula six:
Formula six: P(y|x) ≈ Π_{t=1}^{N} Σ_{k=1}^{K} w(x, y_{<t}, z_k) · P(y_t | x, y_{<t}, z_k)
wherein P(y|x) refers to the probability of translating the source language sequence x into the target language sequence y when the source language sequence is x; K is the number of recalled translation memories; t is the time step in the LSTM model; N is the total number of time steps, namely the length of the decoded sequence y; y_{<t} refers to the sequence output in the first t-1 time steps; P(y_t | x, y_{<t}, z_k) refers to the probability of outputting the word y_t at time step t when the source language sequence is x, the already output sequence is y_{<t}, and the k-th translation memory z_k is recalled; and w(x, y_{<t}, z_k) is a weight coefficient.
Optionally, the weight coefficient w(x, y_{<t}, z_k) is determined according to the decoding state of the neural network at the current time step, and the weight coefficients w corresponding to the K translation memories sum to 1. The weight coefficient can be expressed as the following formula seven:
Formula seven: w(x, y_{<t}, z_k) ≈ Softmax(f(H_t, H_{t,k}))[k]
wherein f is two linear layers, Softmax is the activation function, and H_t and H_{t,k} are the decoding state of the standard NMT model (namely the first sample context vector) and the decoding state of the TM-enhanced NMT model (namely the second sample context vector) respectively.
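A hedged sketch of such a weight network is given below; the hidden size, the use of a ReLU between the two linear layers and the tensor layout are assumptions, not details fixed by the embodiment:

```python
import torch
import torch.nn as nn

class WeightNetwork(nn.Module):
    """Formula seven: f is realized as two linear layers applied to the pair of decoding
    states (H_t, H_{t,k}); a softmax over the K translation memories yields w(x, y_<t, z_k)."""
    def __init__(self, dim=512, hidden=256):
        super().__init__()
        self.f = nn.Sequential(nn.Linear(2 * dim, hidden), nn.ReLU(), nn.Linear(hidden, 1))

    def forward(self, h_t, h_tk):
        # h_t:  [batch, dim]     decoding state of the standard NMT branch (first sample context vector)
        # h_tk: [batch, K, dim]  decoding states of the K TM-enhanced branches (second sample context vectors)
        h_t = h_t.unsqueeze(1).expand_as(h_tk)
        scores = self.f(torch.cat([h_t, h_tk], dim=-1)).squeeze(-1)  # [batch, K]
        return torch.softmax(scores, dim=-1)                         # the K weights sum to 1
```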
In summary, in the machine translation method provided by the embodiment of the present application, in the process of training to obtain the target translation model, the translation memory most relevant to the sample sentence of the first natural language is recalled, and the candidate translation model is trained based on the training sample pair and the recalled translation memory, so as to obtain the first translation model; and then adding a weight layer into the first translation model, and performing fine adjustment on the translation model added with the weight layer by using a training sample, thereby obtaining a target translation model. The weight layer is used for configuring weight parameters for prediction probability distribution corresponding to different translation memories respectively, and can establish stronger connection between a source language sentence and each retrieved translation memory, so that the variance of a target translation model is reduced, the deviation of the target translation model is reduced, and finally the quality of a result obtained by translating the target translation model is improved.
The beneficial effects of the embodiments of the present application are described below with reference to the data of tables 3 to 6 and fig. 8 and 9:
Schematically, the target model in table 3 is the model proposed by the embodiment of the present application. It can be seen that the variance of the target model is reduced to a certain extent compared with the reference model, while the deviation of the target model remains relatively low, so that an effective variance-deviation trade-off is achieved.
TABLE 3
Model      Reference model   Target model
Variance   0.2168            0.1814
Deviation  1.8460            1.9137
Tables 4 and 5 show the effectiveness of the machine translation method performed by the target model according to the embodiment of the present application in a low-resource scenario. In particular, table 4 shows the results (BLEU) on four translation tasks of the JRC-Acquis dataset, where the target model outperforms the reference model by up to 1.84 BLEU. Table 5 shows the results on the multi-domain dataset, which also demonstrates the effectiveness of the machine translation method performed by the target model proposed in the embodiment of the application.
TABLE 4
Model            Es-En   En-Es   De-En   En-De
Reference model  57.31   55.06   53.92   48.76
Target model     59.14   56.53   55.36   50.51
TABLE 5
Model            Medical   Law     IT      Classic   Captions   Average
Reference model  43.53     49.36   32.76   14.43     20.03      32.02
Target model     47.97     52.28   35.84   16.59     22.58      35.05
For illustration, please refer to fig. 8 and fig. 9. In the plug-and-play scenario, model training is performed using one-fourth of the training set data; then, during testing, translation memory databases are constructed using two-fourths, three-fourths and four-fourths of the training data respectively, and the translation results are tested in each case.
FIG. 8 illustrates the effectiveness of the machine translation method proposed by embodiments of the present application in the plug-and-play scenario of JRC-Acquis; statistical chart 801 shows the BLEU value of the german-western language model, statistical chart 802 shows the BLEU value of the german-german model, statistical chart 803 shows the BLEU value of the english-chinese model, and statistical chart 804 shows the BLEU value of the chinese-english model.
FIG. 9 illustrates the effectiveness of the machine translation method proposed by embodiments of the present application in the context of plug-and-play of multi-domain datasets; statistical plot 901 shows the BLEU value of the model trained on the medical dataset, statistical plot 902 shows the BLEU value of the model trained on the legal dataset, statistical plot 903 shows the BLEU value of the model trained on the IT dataset, statistical plot 904 shows the BLEU value of the model trained on the subtitle dataset, and statistical plot 905 shows the average BLEU value of the first 4 models.
The statistical charts in fig. 8 and fig. 9 show that, in the plug-and-play scenario, the target model provided by the embodiment of the application achieves a clear improvement in translation quality compared with the reference model.
Table 6 shows the effectiveness of the machine translation method proposed in the embodiment of the present application in a high resource scenario. Specifically, model training is performed using all training set data in this scenario, and the translation result is tested. The results in table 6 show that the target model is at most 0.88 BLEU values above the reference model.
TABLE 6
Model            Es-En   En-Es   De-En   En-De
Reference model  66.42   62.81   63.76   57.79
Target model     66.89   63.61   64.29   58.67
Referring to FIG. 10, a block diagram of a machine translation device according to an exemplary embodiment of the present application is shown, the device includes the following modules:
an obtaining module 1000, configured to obtain a first sentence of a first natural language;
the obtaining module 1000 is further configured to obtain, from a first translation memory, a plurality of second sentences conforming to a first semantic association relationship with the first sentence, where the plurality of second sentences are sentences in a second natural language, and the translation memory includes sentences in the second natural language;
the extracting module 1010 is configured to perform feature extraction on the first sentence and the i-th second sentence, so as to obtain a first feature representation corresponding to the first sentence and a second feature representation corresponding to the i-th second sentence, where i is a positive integer;
a decoding module 1020, configured to jointly decode the first feature representation and a second feature representation corresponding to an i second sentence to obtain an i candidate translation result corresponding to the first sentence, where the candidate translation result includes a probability that a word in a dictionary belongs to a translation of a second natural language corresponding to the first sentence;
And the analysis module 1030 is configured to perform fusion analysis on candidate translation results corresponding to the multiple second sentences to obtain a target translation sentence corresponding to the first sentence, where the target translation sentence is a translation of the second natural language corresponding to the first sentence.
Referring to fig. 10, in some embodiments, the first translation memory includes a plurality of bilingual sentence pairs, where the plurality of bilingual sentence pairs includes sentence pairs that are formed by sentences in a first natural language and sentences in a second natural language and have a mutual translation relationship; the acquiring module 1000 further includes:
a determining unit 1001, configured to determine a semantic similarity between the first sentence and a sentence in a first natural language in the first translation memory;
the obtaining module 1000 is further configured to determine, as a second sentence, a sentence in a second natural language in the bilingual sentence pair whose semantic similarity reaches a similarity threshold; or determining the sentence of the second natural language in the n bilingual sentences with the highest semantic similarity as a second sentence, wherein n is an integer greater than 1, and i is less than or equal to n.
In some embodiments, the first translation memory includes a plurality of monolingual statements in a second natural language; the acquiring module 1000 further includes:
An extracting unit 1002, configured to extract a first feature vector corresponding to the first sentence, and extract a second feature vector corresponding to a single-language sentence of the second natural language in the first translation memory;
the determining unit 1001 is configured to determine a vector distance between the aligned first feature vector and the aligned second feature vector after performing an alignment process on the first feature vector and the second feature vector;
the obtaining module 1000 is further configured to determine, as a second sentence, a monolingual sentence of a second natural language in which the vector distance is less than or equal to the distance threshold; or, determining the single-language sentences of the second natural language corresponding to the m second feature vectors with the smallest vector distance as m second sentences, wherein m is an integer larger than 1, and i is less than or equal to m.
In some embodiments, the extracting module 1010 is configured to input the first sentence and the i-th second sentence into an encoder of a target translation model, and output a first encoding vector corresponding to the first sentence and a second encoding vector corresponding to the i-th second sentence respectively; a decoding module 1020, configured to input the first encoding vector and the ith second encoding vector into a decoder of the target translation model, and output the ith candidate translation result, where the ith candidate translation result includes an ith target probability distribution.
In some embodiments, the plurality of second sentences in the first translation memory also corresponds to a quality score, the quality score being used to characterize the degree of reliability of the second sentences;
the obtaining model 1000 is further configured to obtain a quality score corresponding to the ith second sentence from the first translation memory; the extracting module 1010 is configured to input the first encoding vector, the ith second encoding vector, and the quality score corresponding to the ith second sentence into a decoder of the target translation model, and output the ith candidate translation result.
In some embodiments, the i-th candidate translation result further includes a first context vector and an i-th second context vector, where the first context vector is used to represent context information corresponding to the first sentence, and the i-th second context vector is used to represent context information of a joint sentence formed by the first sentence and the i-th second sentence; the analysis module 1030 includes:
a mapping unit 1031, configured to linearly map the first context vector and the ith second context vector to obtain a weight parameter corresponding to the ith target probability distribution;
And a weighting unit 1032, configured to perform weighted average processing on the multiple target probability distributions based on the weight parameter, so as to obtain the target translation sentence corresponding to the first sentence.
In some embodiments, the apparatus further comprises:
the training module 1040 is configured to obtain a first sample sentence of a first natural language labeled with a first reference tag, where the first reference tag is used to indicate a reference translation of a second natural language corresponding to the first sample sentence;
the training module 1040 is configured to obtain, from a second translation memory, a second sample sentence that conforms to a second semantic association relationship with the first sample sentence, where the second sample sentence is a sentence in a second natural language, and the second translation memory includes a sentence in the second natural language;
the training module 1040 is configured to encode the first sample sentence and the second sample sentence by using an encoder in a first candidate translation model, so as to obtain a first sample vector corresponding to the first sample sentence and a second sample vector corresponding to the second sample sentence;
the training module 1040 performs joint decoding on the first sample vector and the second sample vector by using a decoder in the first candidate translation model to obtain a first predicted translation result corresponding to the first sample sentence, and trains the first candidate translation model based on a difference between the first predicted translation result and the first reference label to obtain the first translation model, where the first predicted translation result includes a probability that a word in the dictionary belongs to a translation of a second natural language corresponding to the first sample sentence;
The training module 1040 uses the first translation model as the target translation model.
In some embodiments, the training module 1040 is further configured to add a weight network to the first translation model to obtain a second candidate translation model to be trained, where the weight network is a network to be subjected to network parameter adjustment; the training module 1040 is further configured to obtain a third sample sentence of the first natural language labeled with a second reference tag, where the second reference tag is used to indicate a reference translation of the second natural language corresponding to the third sample sentence; the training module 1040 is further configured to obtain a plurality of fourth sample sentences that conform to the first semantic association relationship with the third sample sentences from a third translation memory, where the third translation memory includes sentences in a second natural language; the training module 1040 is further configured to encode the third sample sentence and the r fourth sample sentence by using an encoder in the second candidate translation model, so as to obtain a third sample vector corresponding to the third sample sentence and a fourth sample vector corresponding to the r fourth sample sentence, where r is a positive integer; the training module 1040 is further configured to jointly decode, by using a decoder in the second candidate translation model, the third sample vector and a fourth sample vector corresponding to the r fourth sample sentence, to obtain an r-th sub-translation result corresponding to the third sample sentence, where the r-th sub-translation result includes a probability that a word in the dictionary belongs to a translation of a second natural language corresponding to the third sample sentence; the training module 1040 is further configured to obtain a weight parameter corresponding to the r-th sub-translation result through the weight network in the second candidate translation model; the training module 1040 is further configured to perform fusion analysis on the plurality of sub-translation results based on the weight parameter, obtain a second predicted translation result corresponding to the third sample sentence, and adjust a network parameter of the weight network based on a difference between the second predicted translation result and the second reference label, so as to obtain the target translation model.
In some embodiments, the second translation memory includes a plurality of bilingual sentence pairs, where the plurality of bilingual sentence pairs includes sentence pairs that have an inter-translation relationship between a sentence in a first natural language and a sentence in a second natural language; the training module 1040 is further configured to determine semantic similarity between the first sample sentence and a sentence in the first natural language in the second translation memory; the training module 1040 is further configured to determine, as the second sample sentence, a sentence in a second natural language in the bilingual sentence pair with the highest semantic similarity.
In some embodiments, the apparatus further comprises:
an update module 1050 for adding the target translation statement to the first translation memory.
In summary, the embodiment of the present application provides a machine translation device, through recalling a plurality of translation memories of a source sentence, performing joint encoding on different pairs of "source language-translation memory (i.e., a second sentence)" to obtain different translation results, and finally fusing each translation result to obtain a target translation sentence of a source language. By fusing the translation results, the influence of the TM-enhanced machine translation model variance on the quality of the translation results is reduced, and the influence of the deviation possibly generated by the machine translation model on the quality of the translation results is reduced, so that the quality of translation sentences obtained by translation of the machine translation model is improved.
It should be noted that: the machine translation device provided in the above embodiment is only exemplified by the division of the above functional modules, and in practical application, the above functional allocation may be performed by different functional modules according to needs, that is, the internal structure of the device is divided into different functional modules, so as to perform all or part of the functions described above. In addition, the embodiments of the machine translation device and the machine translation method provided in the foregoing embodiments belong to the same concept, and specific implementation processes of the machine translation device and the machine translation method are detailed in the method embodiments, which are not repeated herein.
Fig. 12 shows a block diagram of a computer device 1200 provided in accordance with an exemplary embodiment of the present application. The computer device 1200 may be: a smart phone, a tablet computer, a Moving Picture Experts Group Audio Layer III (MP3) player, a Moving Picture Experts Group Audio Layer IV (MP4) player, a notebook computer, or a desktop computer. The computer device 1200 may also be referred to by other names such as user device, portable computer device, laptop computer device, desktop computer device, and the like.
In general, the computer device 1200 includes: a processor 1201 and a memory 1202.
Processor 1201 may include one or more processing cores, such as a 4-core processor, an 8-core processor, or the like. The processor 1201 may be implemented in hardware in at least one of digital signal processing (Digital Signal Processing, DSP), field programmable gate array (Field-Programmable Gate Array, FPGA), programmable logic array (Programmable Logic Array, PLA). The processor 1201 may also include a main processor, which is a processor for processing data in an awake state, also called a central processor (Central Processing Unit, CPU), and a coprocessor; a coprocessor is a low-power processor for processing data in a standby state. In some embodiments, the processor 1201 may integrate with an image processor (Graphics Processing Unit, GPU) for rendering and rendering of content required for display by the display screen. In some embodiments, the processor 1201 may also include an artificial intelligence (Artificial Intelligence, AI) processor for processing computing operations related to machine learning.
Memory 1202 may include one or more computer-readable storage media, which may be non-transitory. Memory 1202 may also include high-speed random access memory, as well as non-volatile memory, such as one or more magnetic disk storage devices, flash memory storage devices. In some embodiments, a non-transitory computer readable storage medium in memory 1202 is used to store at least one instruction for execution by processor 1201 to implement the machine translation method provided by the method embodiments of the present application.
Illustratively, the computer device 1200 also includes other components, and those skilled in the art will appreciate that the structure illustrated in FIG. 12 is not limiting of the computer device 1200, and may include more or fewer components than illustrated, or may combine certain components, or employ a different arrangement of components.
Those of ordinary skill in the art will appreciate that all or part of the steps in the various methods of the above embodiments may be implemented by a program for instructing related hardware, and the program may be stored in a computer readable storage medium, which may be a computer readable storage medium included in the memory of the above embodiments; or may be a computer-readable storage medium, alone, that is not assembled into a computer device. The computer readable storage medium has stored therein at least one instruction, at least one program, a set of codes, or a set of instructions, which are loaded and executed by the processor to implement the machine translation method of any of the above embodiments.
Alternatively, the computer-readable storage medium may include: read Only Memory (ROM), random access Memory (RAM, random Access Memory), solid state disk (SSD, solid State Drives), or optical disk, etc. The random access memory may include resistive random access memory (ReRAM, resistance Random Access Memory) and dynamic random access memory (DRAM, dynamic Random Access Memory), among others. The foregoing embodiment numbers of the present application are merely for the purpose of description, and do not represent the advantages or disadvantages of the embodiments.
It will be understood by those skilled in the art that all or part of the steps for implementing the above embodiments may be implemented by hardware, or may be implemented by a program for instructing relevant hardware, where the program may be stored in a computer readable storage medium, and the storage medium may be a read-only memory, a magnetic disk or an optical disk, etc.
The foregoing describes only preferred embodiments of the present application and is not intended to limit the application; the scope of protection of the application is defined by the appended claims.

Claims (14)

1. A machine translation method, the method comprising:
acquiring a first sentence of a first natural language;
acquiring a plurality of second sentences conforming to a first semantic association relation with the first sentence from a first translation memory library, wherein the plurality of second sentences are sentences in a second natural language, and the first translation memory library comprises sentences in the second natural language;
respectively extracting features of the first sentence and the ith second sentence to obtain a first feature representation corresponding to the first sentence and a second feature representation corresponding to the ith second sentence, wherein i is a positive integer;
performing joint decoding on the first feature representation and the second feature representation corresponding to the ith second sentence to obtain an ith candidate translation result corresponding to the first sentence, wherein the candidate translation result comprises the probability that a word in a dictionary belongs to a translation of the second natural language corresponding to the first sentence;
and carrying out fusion analysis on the candidate translation results respectively corresponding to the plurality of second sentences to obtain a target translation sentence corresponding to the first sentence, wherein the target translation sentence is a translation of the second natural language corresponding to the first sentence.
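For readers who want to see the claimed control flow end to end, the following Python sketch walks through the four steps of claim 1. The retrieval, feature-extraction, decoding and fusion functions are deliberately toy stand-ins (word overlap, character histograms, near-uniform scores), not the model described in this application; only the ordering of the steps mirrors the claim.

```python
# Minimal, self-contained sketch of the claim-1 pipeline; all helpers are toy stand-ins.
from collections import Counter
from typing import Dict, List


def retrieve_second_sentences(first_sentence: str, memory: List[str], top_n: int = 2) -> List[str]:
    # Toy "semantic association": rank memory sentences by word overlap with the query.
    query = Counter(first_sentence.lower().split())
    scored = sorted(memory, key=lambda s: -sum((query & Counter(s.lower().split())).values()))
    return scored[:top_n]


def extract_features(sentence: str) -> List[float]:
    # Toy feature representation: normalised character histogram over a-z.
    counts = Counter(sentence.lower())
    total = sum(counts.values()) or 1
    return [counts.get(c, 0) / total for c in "abcdefghijklmnopqrstuvwxyz"]


def joint_decode(first_repr: List[float], second_repr: List[float], vocab: List[str]) -> Dict[str, float]:
    # Toy candidate translation result: a probability for every word in the dictionary.
    scores = {w: 1.0 + 0.01 * i * sum(first_repr) + 0.01 * sum(second_repr) for i, w in enumerate(vocab)}
    total = sum(scores.values())
    return {w: s / total for w, s in scores.items()}


def fuse(candidates: List[Dict[str, float]]) -> str:
    # Toy fusion analysis: average the candidate distributions and emit the best word.
    averaged = {w: sum(c[w] for c in candidates) / len(candidates) for w in candidates[0]}
    return max(averaged, key=averaged.get)


memory = ["guten morgen", "hallo welt", "wie geht es dir"]   # target-language memory sentences
first_sentence = "hello world"                               # first sentence (source language)
second_sentences = retrieve_second_sentences(first_sentence, memory)
first_repr = extract_features(first_sentence)
candidates = [joint_decode(first_repr, extract_features(s), vocab=["hallo", "welt", "morgen"])
              for s in second_sentences]
print(fuse(candidates))
```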
2. The method of claim 1, wherein the first translation memory comprises a plurality of bilingual sentence pairs, the plurality of bilingual sentence pairs comprising sentence pairs having a mutual translation relationship between a sentence in a first natural language and a sentence in a second natural language;
the obtaining the plurality of second sentences conforming to the first semantic association relation with the first sentence from the first translation memory library comprises the following steps:
determining semantic similarity between the first sentence and a sentence of a first natural language in the first translation memory;
determining the sentence of the second natural language in the bilingual sentence pair whose semantic similarity reaches a similarity threshold as a second sentence; or, determining the sentences of the second natural language in the n bilingual sentence pairs with the highest semantic similarity as the second sentences, wherein n is an integer greater than 1, and i is less than or equal to n.
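The two selection rules of claim 2 (a similarity threshold, or the top-n most similar pairs) can be illustrated as follows. Using cosine similarity over precomputed sentence embeddings is an assumption made for the example, since the claim does not fix a particular similarity measure.

```python
import numpy as np


def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))


def select_second_sentences(query_emb, memory, threshold=None, top_n=None):
    """memory: iterable of (source_sentence, target_sentence, source_embedding) triples."""
    scored = [(cosine(query_emb, emb), tgt) for _, tgt, emb in memory]
    if threshold is not None:
        # Rule 1: every pair whose semantic similarity reaches the similarity threshold.
        return [tgt for sim, tgt in scored if sim >= threshold]
    # Rule 2: the n pairs with the highest semantic similarity (n > 1).
    scored.sort(key=lambda x: -x[0])
    return [tgt for _, tgt in scored[:top_n]]
```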
3. The method of claim 1, wherein the first translation memory includes a plurality of single-language sentences in a second natural language;
the obtaining the plurality of second sentences conforming to the first semantic association relation with the first sentence from the first translation memory library comprises the following steps:
extracting a first feature vector corresponding to the first sentence, and extracting a second feature vector corresponding to a single-language sentence of the second natural language in the first translation memory;
after the first feature vector and the second feature vector are aligned, determining a vector distance between the aligned first feature vector and the aligned second feature vector;
determining a single-language sentence of the second natural language corresponding to a second feature vector whose vector distance is smaller than or equal to a distance threshold as a second sentence; or, determining the single-language sentences of the second natural language corresponding to the m second feature vectors with the smallest vector distances as m second sentences, wherein m is an integer greater than 1, and i is less than or equal to m.
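A sketch of the monolingual-memory retrieval of claim 3 is given below. Representing the "alignment" of the two feature vectors as linear projections into a shared space (proj_query, proj_memory) is an illustrative assumption; the claim only requires that the vectors be aligned before the distance is computed.

```python
import numpy as np


def retrieve_from_monolingual_memory(first_vec, memory_vecs, memory_sents,
                                     proj_query, proj_memory,
                                     dist_threshold=None, top_m=None):
    """proj_query / proj_memory: matrices mapping both vector spaces into one shared space."""
    q = proj_query @ first_vec
    dists = np.array([np.linalg.norm(q - proj_memory @ v) for v in memory_vecs])
    if dist_threshold is not None:
        # Rule 1: sentences whose vector distance is at most the distance threshold.
        return [s for s, d in zip(memory_sents, dists) if d <= dist_threshold]
    # Rule 2: the m sentences with the smallest vector distance (m > 1).
    order = np.argsort(dists)[:top_m]
    return [memory_sents[i] for i in order]
```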
4. The method according to any one of claims 1 to 3, wherein the respectively extracting features of the first sentence and the ith second sentence to obtain the first feature representation corresponding to the first sentence and the second feature representation corresponding to the ith second sentence comprises:
inputting the first sentence and the ith second sentence into an encoder of a target translation model, and respectively outputting a first coding vector corresponding to the first sentence and a second coding vector corresponding to the ith second sentence;
the step of performing joint decoding on the first feature representation and the second feature representation corresponding to the ith second sentence to obtain an ith candidate translation result corresponding to the first sentence, including:
inputting the first coding vector and the ith second coding vector into a decoder of the target translation model, and outputting the ith candidate translation result, wherein the ith candidate translation result comprises an ith target probability distribution.
5. The method of claim 4, wherein a plurality of second sentences in the first translation memory also correspond to quality scores, the quality scores being used to characterize the degree of reliability of the second sentences;
the inputting the first coding vector and the ith second coding vector into the decoder of the target translation model, and outputting the ith candidate translation result, including:
obtaining a quality score corresponding to the ith second sentence from the first translation memory;
and inputting the first coding vector, the ith second coding vector and the quality score corresponding to the ith second sentence into a decoder of the target translation model, and outputting the ith candidate translation result.
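One plausible way to realize the extra decoder input of claim 5 is sketched below: the quality score scales the memory-side encoding before joint decoding, so that less reliable second sentences contribute less. The claim itself only states that the score is fed to the decoder together with the two coding vectors, so this particular weighting is an assumption made for illustration.

```python
import numpy as np


def decode_with_quality(first_encoding: np.ndarray,
                        second_encoding: np.ndarray,
                        quality_score: float,
                        decode_fn):
    """decode_fn: any joint decoder taking (source_encoding, memory_encoding)."""
    weighted_memory = quality_score * second_encoding  # less reliable memories contribute less
    return decode_fn(first_encoding, weighted_memory)
```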
6. The method of claim 4, wherein the i-th candidate translation result further comprises a first context vector and an i-th second context vector, the first context vector being used for representing context information corresponding to the first sentence, the i-th second context vector being used for representing context information of a joint sentence composed of the first sentence and the i-th second sentence;
the performing fusion analysis on the candidate translation results respectively corresponding to the plurality of second sentences to obtain the target translation sentence corresponding to the first sentence comprises the following steps:
linearly mapping the first context vector and the ith second context vector to obtain a weight parameter corresponding to the ith target probability distribution;
and carrying out weighted average processing on a plurality of target probability distributions based on the weight parameters to obtain the target translation statement corresponding to the first statement.
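The fusion of claim 6 can be written compactly: a linear mapping turns each pair of context vectors into a scalar, the scalars are normalised across candidates, and the target probability distributions are averaged with those weights. The softmax normalisation below is an illustrative choice; the claim only requires a linear mapping followed by a weighted average.

```python
import numpy as np


def softmax(x: np.ndarray) -> np.ndarray:
    e = np.exp(x - x.max())
    return e / e.sum()


def fuse_candidates(first_ctx, second_ctxs, distributions, w, b=0.0):
    """first_ctx: (d,); second_ctxs: list of (d,); distributions: list of (V,); w: (2d,)."""
    # Linearly map [first context; i-th second context] to one scalar per candidate.
    scores = np.array([w @ np.concatenate([first_ctx, ctx]) + b for ctx in second_ctxs])
    weights = softmax(scores)  # one weight parameter per candidate translation result
    # Weighted average of the target probability distributions over the dictionary.
    return np.sum([a * p for a, p in zip(weights, distributions)], axis=0)
```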
7. The method of claim 4, wherein before inputting the first sentence and the ith second sentence into the encoder of the target translation model and respectively outputting the first coding vector corresponding to the first sentence and the second coding vector corresponding to the ith second sentence, the method further comprises:
acquiring a first sample sentence of a first natural language marked with a first reference label, wherein the first reference label is used for indicating a reference translation of a second natural language corresponding to the first sample sentence;
obtaining a second sample sentence which accords with a second semantic association relation with the first sample sentence from a second translation memory library, wherein the second sample sentence is a sentence of a second natural language, and the second translation memory library comprises the sentence of the second natural language;
encoding the first sample sentence and the second sample sentence through an encoder in a first candidate translation model respectively to obtain a first sample vector corresponding to the first sample sentence and a second sample vector corresponding to the second sample sentence;
performing joint decoding on the first sample vector and the second sample vector through a decoder in the first candidate translation model to obtain a first prediction translation result corresponding to the first sample sentence, and training the first candidate translation model based on the difference between the first prediction translation result and the first reference label to obtain a first translation model, wherein the first prediction translation result comprises the probability that a word in the dictionary belongs to a translation of the second natural language corresponding to the first sample sentence;
and taking the first translation model as the target translation model.
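A single training step consistent with claim 7 might look like the following. The model interface (encode / joint_decode) and the use of token-level cross-entropy as the "difference" between the first prediction translation result and the first reference label are assumptions made to keep the example concrete.

```python
import torch
import torch.nn.functional as F


def train_step(model, optimizer, first_sample_ids, second_sample_ids, reference_ids):
    """reference_ids: (seq_len,) dictionary indices of the reference translation."""
    optimizer.zero_grad()
    first_vec = model.encode(first_sample_ids)           # first sample vector
    second_vec = model.encode(second_sample_ids)          # second sample vector
    logits = model.joint_decode(first_vec, second_vec)    # (seq_len, vocab_size) prediction result
    loss = F.cross_entropy(logits, reference_ids)         # difference to the first reference label
    loss.backward()
    optimizer.step()
    return loss.item()
```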
8. The method of claim 7, wherein the method further comprises:
adding a weight network into the first translation model to obtain a second candidate translation model to be trained, wherein the weight network is a network to be subjected to network parameter adjustment;
acquiring a third sample sentence of the first natural language marked with a second reference label, wherein the second reference label is used for indicating a reference translation of the second natural language corresponding to the third sample sentence;
obtaining a plurality of fourth sample sentences conforming to the first semantic association relation with the third sample sentences from a third translation memory library, wherein the third translation memory library comprises sentences of a second natural language;
encoding the third sample sentence and the r-th fourth sample sentence respectively through an encoder in the second candidate translation model to obtain a third sample vector corresponding to the third sample sentence and a fourth sample vector corresponding to the r-th fourth sample sentence, wherein r is a positive integer;
performing joint decoding on the third sample vector and the fourth sample vector corresponding to the r-th fourth sample sentence through a decoder in the second candidate translation model to obtain an r-th sub-translation result corresponding to the third sample sentence, wherein the r-th sub-translation result comprises the probability that a word in the dictionary belongs to a translation of the second natural language corresponding to the third sample sentence;
acquiring weight parameters corresponding to the r-th sub-translation result through the weight network in the second candidate translation model;
and carrying out fusion analysis on the plurality of sub-translation results based on the weight parameters to obtain a second prediction translation result corresponding to the third sample sentence, and adjusting network parameters of the weight network based on the difference between the second prediction translation result and the second reference label to obtain the target translation model.
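Claim 8 adjusts only the network parameters of the added weight network. A minimal sketch of that setup is shown below; the attribute names (encoder, decoder, weight_net) are hypothetical, and freezing the encoder and decoder is the natural reading of the claim rather than something it states explicitly.

```python
import torch


def build_weight_net_optimizer(second_candidate_model, lr: float = 1e-4):
    # Keep the parameters inherited from the first translation model fixed (assumed).
    for p in second_candidate_model.encoder.parameters():
        p.requires_grad = False
    for p in second_candidate_model.decoder.parameters():
        p.requires_grad = False
    # Network-parameter adjustment is restricted to the weight network.
    return torch.optim.Adam(second_candidate_model.weight_net.parameters(), lr=lr)
```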
9. The method according to claim 7 or 8, wherein the second translation memory includes a plurality of bilingual sentence pairs, the plurality of bilingual sentence pairs including sentence pairs having an inter-translation relationship between a sentence in a first natural language and a sentence in a second natural language;
the obtaining a second sample sentence which accords with a second semantic association relation with the first sample sentence from a second translation memory library comprises the following steps:
determining semantic similarity between the first sample sentence and a sentence of a first natural language in the second translation memory;
and determining the sentence of the second natural language in the bilingual sentence pair with the highest semantic similarity as the second sample sentence.
10. The method according to any one of claims 1 to 3, wherein after performing fusion analysis on candidate translation results corresponding to the plurality of second sentences to obtain a target translation sentence corresponding to the first sentence, the method further comprises:
and adding the target translation sentence into the first translation memory.
11. A data translation apparatus, the apparatus comprising:
the acquisition module is used for acquiring a first sentence of a first natural language;
the obtaining module is further configured to obtain a plurality of second sentences conforming to the first semantic association relationship with the first sentence from a first translation memory, wherein the plurality of second sentences are sentences in a second natural language, and the first translation memory includes sentences in the second natural language;
the extraction module is used for extracting the characteristics of the first sentence and the ith second sentence respectively to obtain a first characteristic representation corresponding to the first sentence and a second characteristic representation corresponding to the ith second sentence, wherein i is a positive integer;
the decoding module is used for carrying out joint decoding on the first characteristic representation and the second characteristic representation corresponding to the ith second sentence to obtain an ith candidate translation result corresponding to the first sentence, wherein the candidate translation result comprises the probability that the words in the dictionary belong to the translation of the second natural language corresponding to the first sentence;
and the analysis module is used for carrying out fusion analysis on the candidate translation results respectively corresponding to the plurality of second sentences to obtain a target translation sentence corresponding to the first sentence, wherein the target translation sentence is a translation of the second natural language corresponding to the first sentence.
12. A computer device comprising a processor and a memory, wherein the memory has stored therein at least one program that is loaded and executed by the processor to implement the machine translation method of any of claims 1 to 10.
13. A computer readable storage medium having stored therein at least one program code loaded and executed by a processor to implement the machine translation method of any of claims 1 to 10.
14. A computer program product comprising a computer program which, when executed by a processor, implements the machine translation method of any one of claims 1 to 10.
CN202310304742.2A 2023-03-24 2023-03-24 Machine translation method, apparatus, device, medium, and program product Pending CN116956950A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310304742.2A CN116956950A (en) 2023-03-24 2023-03-24 Machine translation method, apparatus, device, medium, and program product

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310304742.2A CN116956950A (en) 2023-03-24 2023-03-24 Machine translation method, apparatus, device, medium, and program product

Publications (1)

Publication Number Publication Date
CN116956950A true CN116956950A (en) 2023-10-27

Family

ID=88455422

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310304742.2A Pending CN116956950A (en) 2023-03-24 2023-03-24 Machine translation method, apparatus, device, medium, and program product

Country Status (1)

Country Link
CN (1) CN116956950A (en)

Similar Documents

Publication Publication Date Title
KR102565274B1 (en) Automatic interpretation method and apparatus, and machine translation method and apparatus
US11593556B2 (en) Methods and systems for generating domain-specific text summarizations
US20190005946A1 (en) Method and apparatus for correcting speech recognition result, device and computer-readable storage medium
CN110807332A (en) Training method of semantic understanding model, semantic processing method, semantic processing device and storage medium
CN111027331A (en) Method and apparatus for evaluating translation quality
US20180329894A1 (en) Language conversion method and device based on artificial intelligence and terminal
CN110598224A (en) Translation model training method, text processing device and storage medium
EP3707622A1 (en) Generation of text from structured data
CN110175336B (en) Translation method and device and electronic equipment
CN112528637B (en) Text processing model training method, device, computer equipment and storage medium
CN110795945A (en) Semantic understanding model training method, semantic understanding device and storage medium
CN111401084A (en) Method and device for machine translation and computer readable storage medium
CN114676234A (en) Model training method and related equipment
CN113505198B (en) Keyword-driven generation type dialogue reply method and device and electronic equipment
CN114254660A (en) Multi-modal translation method and device, electronic equipment and computer-readable storage medium
CN113392265A (en) Multimedia processing method, device and equipment
CN114757210A (en) Translation model training method, sentence translation method, device, equipment and program
CN115269828A (en) Method, apparatus, and medium for generating comment reply
CN111126084A (en) Data processing method and device, electronic equipment and storage medium
Wang et al. Data augmentation for internet of things dialog system
CN113849623A (en) Text visual question answering method and device
CN114330372A (en) Model training method, related device and equipment
CN113609873A (en) Translation model training method, device and medium
CN115983294B (en) Translation model training method, translation method and translation equipment
CN115879480A (en) Semantic constraint machine translation method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication