CN110442880B - Translation method, device and storage medium for machine translation - Google Patents

Translation method, device and storage medium for machine translation

Info

Publication number
CN110442880B
CN110442880B (application CN201910721252.6A)
Authority
CN
China
Prior art keywords
translation
word
evaluation function
beam search
penalty
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910721252.6A
Other languages
Chinese (zh)
Other versions
CN110442880A (en)
Inventor
林芯玥 (Lin Xinyue)
刘晋 (Liu Jin)
宋俊杰 (Song Junjie)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Maritime University
Original Assignee
Shanghai Maritime University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Maritime University filed Critical Shanghai Maritime University
Priority to CN201910721252.6A
Publication of CN110442880A
Application granted granted Critical
Publication of CN110442880B

Abstract

The invention discloses a translation method, device and storage medium for machine translation, comprising the following steps: receiving a source sentence to be translated; performing word segmentation on the source sentence; obtaining the part of speech of each segmented word; fusing the part of speech into the word vector corresponding to each word according to a word vector model, obtaining a fused word vector sequence; inputting the word vector sequence into an encoder-decoder model to obtain an encoding-decoding result; evaluating that result with a beam search evaluation function that includes a penalty term based on the length ratio and a penalty term based on duplicate detection; and obtaining the translation from the evaluation result. Applying the embodiments of the invention alleviates repeated fragments in the translation and omission of source-sentence content, and offers a wide application range, strong pertinence and higher translation quality.

Description

Translation method, device and storage medium for machine translation
Technical Field
The invention relates to the technical field of machine translation, and in particular to a translation method, device and storage medium for machine translation.
Background
Language is the most important carrier of everyday human information exchange and has a profound influence on the development of society as a whole, so automated machine translation has become an urgent need. Automated translation between different languages supports a huge number of applications.
At present, rule-based machine translation requires professional linguists to formulate large numbers of rules, making it labor-intensive and poorly extensible. Interlingua-based machine translation requires the design of a universal intermediate language, which is difficult and lacks robustness. Statistical machine translation lowers the labor cost and improves extensibility, but its translation quality is still poor. Neural machine translation is currently the most advanced approach, yet its translation quality still leaves room for improvement.
Disclosure of Invention
The invention aims to provide a translation method, device and storage medium for machine translation, in order to solve the problem of the poor translation quality produced by existing machine translation models.
In order to achieve the above object, the present invention provides a translation method for machine translation, the method comprising:
receiving a source sentence to be translated;
performing word segmentation processing on the source sentence;
acquiring the part of speech of each segmented word;
fusing the part of speech into the word vector corresponding to each word according to a word vector model, to obtain a fused word vector sequence;
inputting the word vector sequence into an encoder-decoder model to obtain an encoding-decoding result;
evaluating the encoding-decoding result with a beam search evaluation function, wherein the beam search evaluation function comprises a penalty term based on the length ratio and a penalty term based on duplicate detection;
and obtaining a translation according to the evaluation result.
Further, the beam search evaluation function is specifically expressed as:
s(Y,X)=log(P(Y|X))+d(x)+l(x)
wherein s(Y, X) is the beam search evaluation function, log(P(Y|X)) is the probability function of Y given X, d(x) is the penalty term based on duplicate detection, l(x) is the penalty term based on the length ratio, and P is a distribution function;
the penalty term based on the length ratio is added to the beam search evaluation function to address partial omission in the translation;
and the penalty term based on duplicate detection is added to the beam search evaluation function to address repeated content in the translation.
Further, the duplicate-detection penalty term d(x) is given by a specific formula (presented as an image in the original publication), wherein c is the index of the current translated word, δ is the duplicate-detection range, ε is a penalty coefficient, y is the matrix corresponding to the candidate translation, y_{c-j} and y_{c-i-j} are the two matrices compared for duplication, and i and j are traversal variables.
Further, the step of evaluating the encoding-decoding result based on the beam search evaluation function includes:
counting the ratio between the length of the source sentence and the length of the target translation;
fitting the length ratios by linear regression to obtain a cumulative distribution function;
when the sentence end marker and ordinary words appear among the beam search candidate words at the same time, adding the probability F_X(x) that the translation has ended and the probability 1-F_X(x) that it has not ended to the respective evaluation functions, l(x) = θ·F_X(x) for not_EOS and l(x) = θ·(1-F_X(x)) for EOS, where EOS is the sentence end marker and θ is a parameter;
when the candidate word is the sentence end marker, multiplying the probability that the translation is unfinished by a penalty factor to serve as the penalty term;
when the candidate word is not the sentence end marker, multiplying the probability that the translation is finished by a penalty factor to serve as the penalty term;
adding the obtained length-ratio penalty term into the beam search evaluation function;
and evaluating the result based on the beam search evaluation function.
Further, in the encoder-decoder model, a bidirectional recurrent neural network is used for both the encoder part and the decoder part.
Further, the step of inputting the word vector sequence into an encoder-decoder model to obtain an encoding-decoding result includes:
inputting the word vector sequence into the encoder-decoder model;
at the encoder of the encoder-decoder deep learning framework, converting the word vector sequence into a sentence vector;
at the decoder, converting the sentence vector back into a word vector sequence.
In addition, the invention also discloses a machine translation device, which comprises a processor and a memory connected to the processor through a communication bus; wherein:
the memory is used for storing a translation program of the machine translation;
the processor is used for executing the translation program for machine translation, so as to implement the steps of any of the above translation methods for machine translation.
The invention further provides a computer storage medium storing one or more programs executable by one or more processors, so as to cause the one or more processors to perform the steps of any of the above translation methods for machine translation.
By applying the translation method, device and storage medium for machine translation provided by the embodiments of the invention, the constructed vectors establish semantic associations between different words while capturing the meanings of words under different parts of speech; correcting the beam search evaluation function alleviates repeated fragments in the translation and omission of source-sentence content; and the method offers a wide application range, strong pertinence and higher translation quality.
Drawings
FIG. 1 is a schematic flow chart of an embodiment of the present invention.
FIG. 2 is a schematic structural diagram of an embodiment of the present invention.
FIG. 3 is another schematic structural diagram of an embodiment of the present invention.
FIG. 4 is a schematic diagram illustrating a penalty term algorithm for duplicate detection according to an embodiment of the present invention.
FIG. 5 is a schematic diagram illustrating a penalty term algorithm for length ratio according to an embodiment of the present invention.
FIG. 6 is a diagram illustrating the translation effect according to an embodiment of the present invention.
Detailed Description
The following embodiments of the present invention are provided by way of specific examples, and other advantages and effects of the present invention will be readily apparent to those skilled in the art from the disclosure herein. The invention is capable of other and different embodiments and of being practiced or of being carried out in various ways, and its several details are capable of modification in various respects, all without departing from the spirit and scope of the present invention.
A language model (Language Model) is a simple, unified, abstract formal system. Once objective linguistic facts are described by a language model, they can be processed automatically by a computer; language models are therefore of great significance for natural language information processing and play an important role in research on part-of-speech tagging, syntactic analysis, speech recognition and related tasks.
In machine translation, both the input source-language sentence and the output target-language translation can be regarded as sequences, so machine translation can be treated as a sequence-to-sequence problem. The mainstream approach to sequence-to-sequence problems is the encoder-decoder model, in which an encoder encodes the source-language sentence into a sentence vector and a decoder decodes that sentence vector to produce the target-language translation.
A recurrent neural network (RNN) is generally used as both the encoder and the decoder. RNNs are a classical neural network structure containing recurrent units, which lets them process serializable data while persisting information across steps: the current input is processed together with the state carried over from previous inputs to produce the output. A bidirectional recurrent neural network (Bi-RNN) is a network structure improved upon the RNN. In some tasks, the output of the network depends not only on past inputs but also on subsequent ones; accordingly, the reverse sequence is fed in alongside the forward sequence. A Bi-RNN consists of two layers of recurrent neural networks, supports simultaneous input of the forward and reverse sequences, and effectively improves network performance.
Word vectors (word embeddings) are the collective term for a set of language modeling and feature-learning techniques in natural language processing (NLP) in which words or phrases from a vocabulary are mapped to vectors of real numbers. Conceptually, this involves a mathematical embedding from a one-dimensional space per word into a continuous vector space of much lower dimension. The Skip-gram model is a model structure used to produce distributed representations of words when a neural network trains a language model: it takes the word vector of the current word as input and predicts the word's context. Beam search is a heuristic search algorithm that explores a graph by expanding the most promising nodes within a limited set.
Beam search is an optimization of best-first search that reduces its memory requirements. Best-first search is a graph search that orders all partial solutions (states) according to a heuristic that estimates how close each partial solution is to a complete solution (the goal state); beam search, by contrast, retains only a predetermined number of the best partial solutions as candidates.
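For concreteness, the following is a minimal beam search sketch in Python; it is illustrative only, and `step_fn`, the token conventions and the default beam width are assumptions rather than anything specified by the patent:

```python
# Minimal beam search sketch (illustrative, not the patent's code): at each
# step, every one-token extension of the surviving hypotheses is scored by
# total log-probability, and only the best `beam_width` are retained.
import math

def beam_search(step_fn, start_token, eos_token, beam_width=3, max_len=50):
    """step_fn(prefix) returns a list of (next_token, probability) pairs."""
    beams = [([start_token], 0.0)]        # (token sequence, log-prob score)
    finished = []
    for _ in range(max_len):
        candidates = []
        for seq, score in beams:
            for token, prob in step_fn(seq):
                candidates.append((seq + [token], score + math.log(prob)))
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for seq, score in candidates[:beam_width]:
            # hypotheses that emit the end marker move to the finished pool
            (finished if seq[-1] == eos_token else beams).append((seq, score))
        if not beams:
            break
    return max(finished or beams, key=lambda c: c[1])
```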
Please refer to FIGS. 1-6. It should be noted that the drawings provided in this embodiment only illustrate the basic idea of the invention: they show only the components related to the invention rather than the number, shape and size of components in an actual implementation, where the type, quantity and proportion of each component may vary freely and the layout may be more complicated.
The invention provides a translation method for machine translation of a translated text as shown in fig. 1, wherein the method comprises the following steps:
s110, receiving a source sentence to be translated.
S120, performing word segmentation processing on the source sentence;
It will be appreciated that word segmentation is performed on each sentence of the received source text to be translated.
S130, acquiring the part of speech of each segmented word;
It should be noted that each sentence of the source text is segmented; a part-of-speech tagging tool then tags each word to obtain its part of speech, and the abbreviation corresponding to that part of speech is looked up in a part-of-speech abbreviation list. Finally, the original word is joined to its part-of-speech abbreviation with an '_' symbol to form a word/part-of-speech string, which replaces the original word in the source sentence.
S140, fusing the part of speech into the word vector corresponding to each word according to the word vector model, to obtain a fused word vector sequence;
word vector (WordVector) a collective term for a set of language modeling and feature learning techniques in embedded Natural Language Processing (NLP) where words or phrases from a vocabulary are mapped to vectors of real numbers. Conceptually, it involves mathematical embedding from a one-dimensional space of each word to a continuous vector space with lower dimensions.
In one implementation of the present invention, all the word/part-of-speech strings in the source sentences obtained in steps S120 and S130 are counted to construct a dictionary, and every word/part-of-speech string in the dictionary is indexed and stored. The word/part-of-speech strings in each sentence are then converted to index values, and the index sequence representing each sentence is fed into a skip-gram model for training, yielding trained word vectors fused with part-of-speech features and hence the fused word vector sequence.
Illustratively, as shown in FIG. 2, the input of the skip-gram model is w(t), and after training the outputs are w(t-2), w(t-1), w(t+1) and w(t+2).
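A hedged sketch of the skip-gram training in step S140 follows, using gensim's Word2Vec (sg=1 selects the skip-gram architecture); the toy corpus, vector size and window are illustrative assumptions:

```python
# Sketch of step S140: train skip-gram vectors over word/part-of-speech
# strings so that POS information is fused into each embedding. gensim is
# an assumed stand-in for whatever implementation was actually used.
from gensim.models import Word2Vec

corpus = [
    ["机器翻译_n", "的_uj", "译文_n", "质量_n", "有待_v", "提高_v"],
    ["译文_n", "质量_n", "是_v", "机器翻译_n", "的_uj", "核心_n", "指标_n"],
]
model = Word2Vec(corpus, vector_size=128, window=5, sg=1, min_count=1)
vec = model.wv["译文_n"]        # embedding of the word fused with its POS
index = model.wv.key_to_index   # the dictionary of indexed word/POS strings
```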
S150, inputting the word vector sequence into an encoder-decoder model to obtain an encoding-decoding result.
It should be noted that the trained word vectors replace the words of each sentence in the original corpus, converting the corpus sentences into word vector sequences; each word vector sequence is then fed as input into the encoder-decoder model to obtain the encoding-decoding result. The encoder-decoder model structure is shown in FIG. 3.
S160, evaluating the encoding-decoding result based on a beam search evaluation function, wherein the beam search evaluation function comprises a penalty term based on the length ratio and a penalty term based on duplicate detection;
it will be appreciated that beam searching is a heuristic search algorithm that explores the graph by expanding the most promising nodes in a finite set. The beam search is an optimization of the best-first search, which can reduce its memory requirements. The best search is a graph search that ranks all partial solutions (states) according to some heuristic commands that attempt to predict how close the partial solutions are to the complete solution (target state). But only a predetermined number of best local solutions are retained as candidates in the beam search. According to the embodiment of the invention, the evaluation function of beam search is improved, and a penalty term based on repeated detection and a penalty term based on a length ratio are added.
S170, obtaining the translation according to the evaluation result;
the final translation is obtained through the encoder-decoder model and beam search.
In an implementation manner of the present invention, the beam search evaluation function is specifically expressed as:
s(Y,X)=log(P(Y|X))+d(x)+l(x)
wherein s(Y, X) is the beam search evaluation function, log(P(Y|X)) is the probability function of Y given X, d(x) is the penalty term based on duplicate detection, l(x) is the penalty term based on the length ratio, and P is a distribution function;
the penalty term based on the length ratio is added to the beam search evaluation function to address partial omission in the translation;
and the penalty term based on duplicate detection is added to the beam search evaluation function to address repeated content in the translation.
It should be noted that the embodiment of the invention improves the beam search evaluation function by adding a penalty term based on the length ratio and a penalty term based on duplicate detection. To address translations that are too long or too short, the length-ratio penalty term is obtained from statistics over the ratio of source sentence length to translation length and is used in the evaluation function that scores the candidate words during beam search. The duplicate-detection penalty term divides the translation into segments of different sizes for comparison and also takes into account the distance between the position where a repeated word appears and the position currently being translated; the resulting penalty term is likewise used in the beam search evaluation function for candidate words. Together they alleviate repeated fragments and source-sentence omission in the translation, with a wide application range, strong pertinence and high translation quality.
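As a concrete illustration, a minimal sketch of the modified evaluation function s(Y, X) = log P(Y|X) + d(x) + l(x) follows; `duplicate_penalty` and `length_penalty` are hypothetical helpers sketched after the corresponding formulas below, and the default value of θ is an assumption:

```python
# Sketch of the modified beam search evaluation function
# s(Y, X) = log P(Y|X) + d(x) + l(x). The two penalty helpers are
# sketched below, after the corresponding formulas.
def evaluate(log_prob, candidate_ids, src_len, is_eos, theta=-1.0):
    return (log_prob
            + duplicate_penalty(candidate_ids)
            + length_penalty(src_len, len(candidate_ids), is_eos, theta))
```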
Further, the duplicate-detection penalty term d(x) is given by a specific formula (presented as an image in the original publication), wherein c is the index of the current translated word, δ is the duplicate-detection range, ε is a penalty coefficient, y is the matrix corresponding to the candidate translation, y_{c-j} and y_{c-i-j} are the two matrices compared for duplication, and i and j are traversal variables.
Referring to FIG. 4, the candidate sentences from the whole beam search and the parameters δ and ε of the formula are taken as the input of the algorithm; each candidate sentence is divided into several segments of different sizes for comparison, penalty terms are computed for each, and finally a weighted accumulation is performed. Referring to FIG. 5, the cumulative distribution function F_X(x), computed from the current candidate word and the current length, and the parameter θ of the formula are taken as the input of the algorithm; a vector operation first determines whether the candidate word is EOS (1 if so, 0 otherwise), and the value of l(x) in the formula is then obtained by a dot product.
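The exact form of d(x) appears only as an image in the original publication, so the sketch below implements one plausible reading of FIG. 4: within a detection window δ before the current position c, the token at position c-j is compared with the token at c-i-j for every lag i and offset j, and each match adds a penalty weighted by ε. The windowing and the 1/i weighting are assumptions:

```python
# Plausible sketch of the duplicate-detection penalty d(x); the patent gives
# the exact formula only as an image. For each lag i and offset j inside the
# window delta, compare y[c-j] with y[c-i-j] and charge epsilon per match,
# so that more (and nearer) repetitions yield a larger penalty.
def duplicate_penalty(candidate_ids, delta=6, epsilon=0.3):
    c = len(candidate_ids) - 1          # index of the current translated word
    penalty = 0.0
    for i in range(1, delta + 1):       # lag between the two compared segments
        for j in range(0, delta - i + 1):
            if c - i - j >= 0 and candidate_ids[c - j] == candidate_ids[c - i - j]:
                penalty += epsilon / i  # nearer repetitions weigh more (assumed)
    return -penalty                     # subtracts from the beam score
```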
In one implementation of the present invention, the step of evaluating the encoding-decoding result based on the beam search evaluation function includes:
counting the ratio between the length of the source sentence and the length of the target translation;
fitting the length ratios by linear regression to obtain a cumulative distribution function;
when the sentence end marker and ordinary words appear among the beam search candidate words at the same time, adding the probability F_X(x) that the translation has ended and the probability 1-F_X(x) that it has not ended to the respective evaluation functions, l(x) = θ·F_X(x) for not_EOS and l(x) = θ·(1-F_X(x)) for EOS, where EOS is the sentence end marker and θ is a parameter;
when the candidate word is the sentence end marker, multiplying the probability that the translation is unfinished by a penalty factor to serve as the penalty term;
when the candidate word is not the sentence end marker, multiplying the probability that the translation is finished by a penalty factor to serve as the penalty term;
adding the obtained length-ratio penalty term into the beam search evaluation function;
and evaluating the result based on the beam search evaluation function.
It can be understood that the lengths of the source sentences and the target translations are counted separately and their length ratios computed; fitting the length ratios by linear regression then yields the cumulative distribution function F_X(x) = P(X ≤ x), where x is the ratio of the length of the target translation to the length of the source sentence. When EOS (the sentence end marker) and ordinary words appear among the beam search candidate words at the same time, the probability F_X(x) that the translation has ended and the probability 1-F_X(x) that it has not ended are added to the respective evaluation functions: l(x) = θ·F_X(x) for not_EOS and l(x) = θ·(1-F_X(x)) for EOS. When the candidate word is the EOS marker, the probability that the translation is unfinished is multiplied by a penalty factor to serve as the penalty term; when it is not, the probability that the translation is finished is multiplied by the penalty factor to serve as the penalty term. Finally, the obtained length-ratio penalty term is added into the beam search evaluation function, as shown in FIG. 5.
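A hedged sketch of the length-ratio penalty l(x) follows. F_X is the cumulative distribution fitted offline from the corpus length ratios; here a normal CDF with assumed mean and standard deviation stands in for the patent's linear-regression fit, and the sign of θ is an assumption:

```python
# Sketch of the length-ratio penalty l(x). F_X(x) = P(X <= x) is the fitted
# cumulative distribution of x = len(translation) / len(source sentence);
# a normal CDF with assumed parameters stands in for the regression fit.
import math

def length_ratio_cdf(x, mu=1.2, sigma=0.25):      # assumed fit parameters
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def length_penalty(src_len, tgt_len, is_eos, theta=-1.0):
    f = length_ratio_cdf(tgt_len / src_len)
    # candidate is EOS: penalize by the probability the translation is NOT
    # finished; ordinary candidate: penalize by the probability it IS finished.
    return theta * (1.0 - f) if is_eos else theta * f
```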
The final optimal translation is obtained through the encoder-decoder model and beam search, as shown in FIG. 6.
It should be noted that the encoder-decoder is a deep learning framework; combined with the beam search evaluation function, it yields the final optimal translation, alleviating repeated fragments and source-sentence omission, with a wide application range, strong pertinence and high translation quality.
Further, in the encoder-decoder model, a bidirectional recurrent neural network is used for both the encoder part and the decoder part.
It should be noted that a bidirectional recurrent neural network (Bi-RNN) is a network structure improved upon the recurrent neural network. In some tasks, the output of the network depends not only on past inputs but also on subsequent ones; accordingly, the reverse sequence is fed in alongside the forward sequence. The Bi-RNN consists of two layers of recurrent neural networks, supports simultaneous input of the forward and reverse sequences, and effectively improves network performance.
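A minimal PyTorch sketch of a bidirectional recurrent encoder follows; the GRU cell, layer sizes and state concatenation are assumptions, since the patent only specifies that bidirectional RNNs are used:

```python
# Minimal sketch of a bidirectional recurrent encoder: forward and backward
# layers read the word-vector sequence in both directions, and their final
# hidden states are concatenated into a sentence vector. The GRU cell and
# sizes are assumptions; the patent only requires bidirectional RNNs.
import torch
import torch.nn as nn

class BiRNNEncoder(nn.Module):
    def __init__(self, emb_dim=128, hidden_dim=256):
        super().__init__()
        self.rnn = nn.GRU(emb_dim, hidden_dim,
                          bidirectional=True, batch_first=True)

    def forward(self, word_vectors):           # (batch, seq_len, emb_dim)
        outputs, h_n = self.rnn(word_vectors)  # h_n: (2, batch, hidden_dim)
        sentence_vector = torch.cat([h_n[0], h_n[1]], dim=-1)
        return outputs, sentence_vector        # (batch, 2*hidden_dim)

encoder = BiRNNEncoder()
outputs, sent_vec = encoder(torch.randn(1, 6, 128))   # a 6-word sentence
```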
In one implementation of the present invention, the step of inputting the word vector sequence into an encoder-decoder model to obtain an encoding-decoding result includes:
inputting the word vector sequence into the encoder-decoder model;
at the encoder of the encoder-decoder deep learning framework, converting the word vector sequence into a sentence vector;
at the decoder, converting the sentence vector back into a word vector sequence.
It will be appreciated that the encoder-decoder is a deep learning framework in which the encoder converts a word vector sequence into a sentence vector and the decoder converts the sentence vector back into a word vector sequence.
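A complementary decoder sketch: the encoder's sentence vector initializes the decoder state, and one target word is emitted per step until EOS. Greedy selection is shown for brevity; in the patent the candidates are ranked by the modified beam search evaluation function instead. All names, sizes and token ids are assumptions:

```python
# Sketch of the decoder side of the encoder-decoder framework. The sentence
# vector initializes the hidden state; each step updates the state and emits
# one target token. Greedy argmax stands in for the modified beam search.
import torch
import torch.nn as nn

class RNNDecoder(nn.Module):
    def __init__(self, emb_dim=128, hidden_dim=512, vocab_size=30000):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.cell = nn.GRUCell(emb_dim, hidden_dim)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, sentence_vector, bos_id=1, eos_id=2, max_len=50):
        h, token = sentence_vector, torch.tensor([bos_id])
        result = []
        for _ in range(max_len):
            h = self.cell(self.emb(token), h)   # update decoder state
            token = self.out(h).argmax(dim=-1)  # greedy pick (sketch only)
            if token.item() == eos_id:
                break
            result.append(token.item())
        return result
```

Wired to the encoder sketch above (whose sentence vector is 2 × 256 = 512-dimensional), `RNNDecoder()(sent_vec)` returns the greedy token ids; replacing the argmax with the beam search of step S160 recovers the pipeline described here.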
The invention also provides a translation device for machine translation, the device comprising a processor and a memory connected to the processor through a communication bus; wherein:
the memory is used for storing a translation program of the machine translation;
the processor is used for executing the translation program for machine translation, so as to implement the steps of any of the above translation methods for machine translation.
The present invention also provides a computer storage medium storing one or more programs for execution by one or more processors, causing the one or more processors to perform the steps of any of the above translation methods for machine translation.
The foregoing embodiments merely illustrate the principles and utilities of the present invention and are not intended to limit it. Any person skilled in the art may modify or change the above embodiments without departing from the spirit and scope of the invention. Accordingly, all equivalent modifications or changes made by those of ordinary skill in the art without departing from the spirit and technical ideas disclosed herein shall be covered by the claims of the present invention.

Claims (8)

1. A method for translating a machine-translated translation, the method comprising:
receiving a source sentence to be translated;
performing word segmentation processing on the source sentence;
acquiring the part of speech of each segmented word;
fusing the part of speech into the word vector corresponding to each word according to a word vector model, to obtain a fused word vector sequence;
inputting the word vector sequence into an encoder-decoder model to obtain an encoding-decoding result;
evaluating the encoding-decoding result with a beam search evaluation function, wherein the beam search evaluation function comprises a penalty term based on the length ratio and a penalty term based on duplicate detection;
and obtaining a translation according to the evaluation result.
2. The method for translating machine-translated translations according to claim 1, wherein said beam search evaluation function is embodied as:
s(Y,X)=log(P(Y|X))+d(x)+l(x)
wherein s(Y, X) is the beam search evaluation function, log(P(Y|X)) is the probability function of Y given X, d(x) is the penalty term based on duplicate detection, l(x) is the penalty term based on the length ratio, and P is a distribution function;
the penalty term based on the length ratio is added to the beam search evaluation function to address partial omission in the translation;
and the penalty term based on duplicate detection is added to the beam search evaluation function to address repeated content in the translation.
3. The method for translating machine-translated translations according to claim 2, wherein the duplicate-detection penalty term d(x) is given by a specific formula (presented as an image in the original publication), wherein c is the index of the current translated word, δ is the duplicate-detection range, ε is a penalty coefficient, y is the matrix corresponding to the candidate translation, y_{c-j} and y_{c-i-j} are the two matrices compared for duplication, and i and j are traversal variables.
4. The method according to claim 2 or 3, wherein the step of evaluating the encoding-decoding result based on the beam search evaluation function comprises:
counting the ratio between the length of the source sentence and the length of the target translation;
fitting the length ratios by linear regression to obtain a cumulative distribution function;
when the sentence end marker and ordinary words appear among the beam search candidate words at the same time, adding the probability F_X(x) that the translation has ended and the probability 1-F_X(x) that it has not ended to the respective evaluation functions, l(x) = θ·F_X(x) for not_EOS and l(x) = θ·(1-F_X(x)) for EOS, where EOS is the sentence end marker and θ is a parameter;
when the candidate word is the sentence end marker, multiplying the probability that the translation is unfinished by a penalty factor to serve as the penalty term;
when the candidate word is not the sentence end marker, multiplying the probability that the translation is finished by a penalty factor to serve as the penalty term;
adding the obtained length-ratio penalty term into the beam search evaluation function;
and evaluating the result based on the beam search evaluation function.
5. The method of claim 1, wherein the encoder-decoder model uses a bidirectional recurrent neural network for both the encoder part and the decoder part.
6. The method of claim 1, wherein the step of inputting the word vector sequence into an encoder-decoder model to obtain an encoding-decoding result comprises:
inputting the word vector sequence into the encoder-decoder model;
at the encoder of the encoder-decoder deep learning framework, converting the word vector sequence into a sentence vector;
at the decoder, converting the sentence vector back into a word vector sequence.
7. A translation device for machine translation, the device comprising a processor and a memory coupled to the processor via a communication bus; wherein:
the memory is used for storing a translation program of the machine translation;
the processor is used for executing the translation program for machine translation, so as to implement the steps of the translation method for machine translation according to any one of claims 1 to 6.
8. A computer storage medium, characterized in that the computer storage medium stores one or more programs executable by one or more processors to cause the one or more processors to perform the steps of the translation method of machine-translating a translation according to any one of claims 1 to 6.
CN201910721252.6A 2019-08-06 2019-08-06 Translation method, device and storage medium for machine translation Active CN110442880B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910721252.6A CN110442880B (en) 2019-08-06 2019-08-06 Translation method, device and storage medium for machine translation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910721252.6A CN110442880B (en) 2019-08-06 2019-08-06 Translation method, device and storage medium for machine translation

Publications (2)

Publication Number Publication Date
CN110442880A CN110442880A (en) 2019-11-12
CN110442880B 2022-09-30

Family

ID=68433418

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910721252.6A Active CN110442880B (en) 2019-08-06 2019-08-06 Translation method, device and storage medium for machine translation

Country Status (1)

Country Link
CN (1) CN110442880B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112541364A (en) * 2020-12-03 2021-03-23 昆明理工大学 A Chinese-Vietnamese neural machine translation method fusing multilevel language feature knowledge
CN112632996A (en) * 2020-12-08 2021-04-09 浙江大学 Entity relation triple extraction method based on comparative learning
CN113435215A (en) * 2021-06-22 2021-09-24 北京捷通华声科技股份有限公司 Machine translation method and device
CN113191165B (en) * 2021-07-01 2021-09-24 南京新一代人工智能研究院有限公司 Method for avoiding duplication of machine translation fragments
CN113836950B (en) * 2021-09-22 2024-04-02 广州华多网络科技有限公司 Commodity title text translation method and device, equipment and medium thereof

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018058046A1 (en) * 2016-09-26 2018-03-29 Google Llc Neural machine translation systems
CN107967262A (en) * 2017-11-02 2018-04-27 内蒙古工业大学 A neural network based Mongolian-Chinese machine translation method

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018058046A1 (en) * 2016-09-26 2018-03-29 Google Llc Neural machine translation systems
CN107967262A (en) * 2017-11-02 2018-04-27 内蒙古工业大学 A neural network based Mongolian-Chinese machine translation method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Mongolian-Chinese neural network machine translation model incorporating prior information; Fan Wenting et al.; Journal of Chinese Information Processing (《中文信息学报》); 2018-06-15 (No. 06); full text *

Also Published As

Publication number Publication date
CN110442880A (en) 2019-11-12

Similar Documents

Publication Publication Date Title
US11501182B2 (en) Method and apparatus for generating model
CN110442880B (en) Translation method, device and storage medium for machine translation
CN108363790B (en) Method, device, equipment and storage medium for evaluating comments
CN111177394B (en) Knowledge map relation data classification method based on syntactic attention neural network
CN109493977B (en) Text data processing method and device, electronic equipment and computer readable medium
CN107798140B (en) Dialog system construction method, semantic controlled response method and device
CN110929030A (en) Text abstract and emotion classification combined training method
CN113642330A (en) Rail transit standard entity identification method based on catalog topic classification
CN110795552A (en) Training sample generation method and device, electronic equipment and storage medium
CN110489750A (en) Burmese participle and part-of-speech tagging method and device based on two-way LSTM-CRF
CN112883193A (en) Training method, device and equipment of text classification model and readable medium
CN116596347B (en) Multi-disciplinary interaction teaching system and teaching method based on cloud platform
CN113657123A (en) Mongolian aspect level emotion analysis method based on target template guidance and relation head coding
CN116661805B (en) Code representation generation method and device, storage medium and electronic equipment
CN112541337A (en) Document template automatic generation method and system based on recurrent neural network language model
CN113971394A (en) Text repeat rewriting system
CN112528654A (en) Natural language processing method and device and electronic equipment
CN115759119A (en) Financial text emotion analysis method, system, medium and equipment
CN114757184A (en) Method and system for realizing knowledge question answering in aviation field
CN114372454A (en) Text information extraction method, model training method, device and storage medium
CN112148879B (en) Computer readable storage medium for automatically labeling code with data structure
Lin et al. Chinese story generation of sentence format control based on multi-channel word embedding and novel data format
CN116483314A (en) Automatic intelligent activity diagram generation method
CN116561251A (en) Natural language processing method
CN115840815A (en) Automatic abstract generation method based on pointer key information

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB03 Change of inventor or designer information
Inventor after: Lin Xinyue, Liu Jin, Song Junjie
Inventor before: Lin Xinyue, Liu Jin
GR01 Patent grant