CN110442880A - Translation method, device and storage medium for machine translation - Google Patents

Translation method, device and storage medium for machine translation Download PDF

Info

Publication number
CN110442880A
CN110442880A CN201910721252.6A CN201910721252A CN110442880A
Authority
CN
China
Prior art keywords
translation
beam search
evaluation function
term
word
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910721252.6A
Other languages
Chinese (zh)
Other versions
CN110442880B (en)
Inventor
林芯玥
刘晋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Maritime University
Original Assignee
Shanghai Maritime University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Maritime University filed Critical Shanghai Maritime University
Priority to CN201910721252.6A priority Critical patent/CN110442880B/en
Publication of CN110442880A publication Critical patent/CN110442880A/en
Application granted granted Critical
Publication of CN110442880B publication Critical patent/CN110442880B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Machine Translation (AREA)

Abstract

The invention discloses a translation method, device and storage medium for machine translation, comprising: receiving a source sentence to be translated; performing word segmentation on the source sentence; obtaining the part of speech of each segmented word; fusing each part of speech into the word vector of the corresponding word according to a word vector model, to obtain a fused word vector sequence; inputting the word vector sequence into an encoder-decoder model to obtain an encoding-decoding result; evaluating the encoding-decoding result with a beam search evaluation function, wherein the beam search evaluation function includes a penalty term based on the length ratio and a penalty term based on repetition detection; and obtaining the translation according to the evaluation result. Embodiments of the invention alleviate the problems of repeated fragments in the translation and omission of parts of the source sentence; the method is widely applicable and well targeted, and the translation quality is higher.

Description

Translation method, device and storage medium for machine translation
Technical field
The present invention relates to the technical field of machine translation improvement, and in particular to a translation method, device and storage medium for machine translation.
Background technique
Language is the most important carrier of everyday human information exchange and has had a profound influence on the development of society as a whole; automated machine translation has therefore become an urgent current demand. Realizing automated translation between different languages has enormous application value.
At present, rule-based machine translation methods require professional linguists to formulate large numbers of rules, with high labor cost and poor extensibility. Interlingua-based methods require the design of a general-purpose intermediate language, which is too difficult and lacks robustness. Statistical machine translation lowers the labor cost and improves scalability, but its translation quality remains poor. Neural machine translation is currently the state of the art, yet its translation quality still leaves room for improvement.
Summary of the invention
The purpose of the present invention is to provide a translation method, device and storage medium for machine translation, aiming to solve the problem that existing machine translation models produce translations of poor quality.
To achieve the above goal, the present invention provides a translation method for machine translation, the method comprising:
receiving a source sentence to be translated;
performing word segmentation on the source sentence;
obtaining the part of speech of each word in the segmentation;
fusing the part of speech into the word vector corresponding to each word according to a word vector model, to obtain a fused word vector sequence;
inputting the word vector sequence into an encoder-decoder model to obtain an encoding-decoding result;
evaluating the encoding-decoding result with a beam search evaluation function, wherein the beam search evaluation function includes a penalty term based on the length ratio and a penalty term based on repetition detection;
and obtaining the translation according to the evaluation result.
Further, the beam search evaluation function is embodied as:
S(Y, X) = log(P(Y|X)) + d(x) + l(x)
where S(Y, X) is the beam search evaluation function, log(P(Y|X)) is the log-probability of Y occurring given X, d(x) is the penalty term based on repetition detection, l(x) is the penalty term based on the length ratio, and P is the distribution function;
the penalty term based on the length ratio is added to the beam search evaluation function to address the problem of the translation omitting part of the source sentence;
the penalty term based on repetition detection is added to the beam search evaluation function to address the problem of the translation repeating content.
Further, the repetition-detection penalty term d(x) is expressed by a specific formula as follows:
where c is the index of the current translated word, δ is the repetition-detection window, ε is the penalty coefficient, y is the matrix corresponding to the candidate translation, y_{c-j} and y_{c-i-j} are the two matrices compared for repetition, and i, j are traversal variables.
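The description above (a repetition-detection window δ, a penalty coefficient ε, and distance-aware comparison of segments) can be sketched in code. Since the patent gives the exact formula only as a figure, the segment sizes and the 1/j distance weighting below are illustrative assumptions, not the patented formula:

```python
def repetition_penalty(tokens, delta=6, epsilon=0.5):
    """Toy sketch of d(x): look back over a window of `delta` tokens and
    penalise a candidate translation whose most recent n-grams re-occur
    earlier in the window; nearer repeats are punished harder."""
    c = len(tokens)                                      # current word index
    penalty = 0.0
    for size in range(1, min(delta, c // 2) + 1):        # segment sizes compared
        current = tokens[c - size:c]                     # segment ending at c
        for j in range(1, delta - size + 1):             # look-back offset
            start = c - size - j
            if start < 0:
                break
            if tokens[start:start + size] == current:
                penalty -= epsilon * size / j            # assumed 1/j weighting
    return penalty

# A repeated bigram is penalised; a repetition-free sentence is not.
assert repetition_penalty(["the", "cat", "the", "cat"]) < 0
assert repetition_penalty(["a", "b", "c", "d"]) == 0.0
```

The returned value is negative, so adding it to log(P(Y|X)) lowers the beam score of repetitive candidates.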
Further, the step of evaluating the encoding-decoding result with the beam search evaluation function comprises:
computing the length ratio of the source sentence to the target translation;
fitting the length ratios by linear regression to obtain a cumulative distribution function;
when an end-of-sentence tag and an ordinary word both appear among the candidate words of the beam search, adding the probability F_X(x) that the translation has finished and the probability 1−F_X(x) that it has not finished to the respective evaluation functions: l(x) = θF_X(x) for a non-EOS candidate and l(x) = θ(1−F_X(x)) for an EOS candidate, where EOS is the end-of-sentence tag and θ is a parameter;
when the candidate word is the end-of-sentence tag, multiplying the probability that the translation is not yet finished by the penalty factor as the penalty term;
and when the candidate word is not the end-of-sentence tag, multiplying the probability that the translation is finished by the penalty factor as the penalty term;
adding the resulting length-ratio penalty term to the beam search evaluation function;
and evaluating the result with the beam search evaluation function.
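The length-ratio penalty steps above can be sketched as follows. The empirical CDF used here is a simplifying stand-in for the linear-regression fit the patent describes, and the sign convention for θ is an assumption:

```python
def make_length_penalty(train_ratios, theta=-1.0):
    """Toy sketch of l(x).  F_X is approximated by the empirical CDF of
    target/source length ratios observed in training data; theta is a
    negative weight so the returned value acts as a penalty."""
    ratios = sorted(train_ratios)

    def cdf(x):
        # fraction of training ratios <= x
        return sum(r <= x for r in ratios) / len(ratios)

    def l(src_len, tgt_len, candidate_is_eos):
        f = cdf(tgt_len / src_len)
        # EOS proposed while the translation is probably unfinished
        # punishes 1 - F; an ordinary word proposed while the translation
        # is probably finished punishes F.
        return theta * (1.0 - f) if candidate_is_eos else theta * f

    return l

l = make_length_penalty([0.8, 0.9, 1.0, 1.1, 1.2])
# Ending at 2/10 of the source length is punished harder than at 12/10.
assert l(10, 2, True) < l(10, 12, True)
```

The penalty discourages both premature EOS (under-translation) and refusing to stop (over-translation), matching the two cases in the step list above.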
Further, in the encoder-decoder model, both the encoder and the decoder use a bidirectional recurrent neural network.
Further, the step of inputting the word vector sequence into the encoder-decoder model to obtain the encoding-decoding result comprises:
inputting the word vector sequence into the encoder-decoder model;
converting the word vector sequence into a sentence vector with the encoder of the deep learning framework;
and converting the sentence vector back into a word vector sequence with the decoder.
In addition, the invention also discloses a machine translation device; the device includes a processor and a memory connected to the processor through a communication bus, wherein:
the memory is configured to store a machine translation program;
the processor is configured to execute the machine translation program, so as to implement the translation steps of any of the above methods.
The invention further discloses a computer storage medium storing one or more programs executable by one or more processors, so that the one or more processors perform the translation steps of any of the above methods.
With the translation method, device and storage medium provided by embodiments of the present invention, the constructed word vectors effectively establish semantic associations between different words while also capturing their meanings under different parts of speech; and the modified beam search evaluation function alleviates the problems of repeated fragments in the translation and omission of parts of the source sentence. The method is widely applicable and well targeted, and the translation quality is higher.
Detailed description of the invention
Fig. 1 is a flow diagram of an embodiment of the present invention.
Fig. 2 is a structural diagram of an embodiment of the present invention.
Fig. 3 is another structural diagram of an embodiment of the present invention.
Fig. 4 is a schematic description of the repetition-detection penalty algorithm of an embodiment of the present invention.
Fig. 5 is a schematic description of the length-ratio penalty algorithm of an embodiment of the present invention.
Fig. 6 is a diagram of the English translation effect of an embodiment of the present invention.
Specific embodiment
The embodiments of the present invention are illustrated below by specific examples; those skilled in the art can easily understand other advantages and effects of the invention from the content disclosed in this specification. The invention may also be implemented or applied through other specific embodiments, and the details in this specification may be modified or changed from different viewpoints and for different applications without departing from the spirit of the invention.
A language model is a simple, unified and abstract formal system: once the objective facts of a language have been described by a language model, they can be processed automatically by a computer. Language models are therefore of great significance to natural language information processing and play an important role in research such as part-of-speech tagging, syntactic analysis and speech recognition.
In machine translation, the input source-language sentence and the output target-language translation can both be viewed as sequences, so machine translation can be regarded as a sequence-to-sequence problem. The mainstream approach to sequence-to-sequence problems is currently the encoder-decoder model: the encoder encodes the source-language sentence into a sentence vector, and the decoder decodes the sentence vector into the target-language translation.
It should be noted that recurrent neural networks are usually used as the encoder and decoder. The recurrent neural network (RNN, Recurrent Neural Network) is a classic neural network structure containing recurrent neural units; it can therefore process serialized data and allows data information to persist. An RNN takes the current input together with previous inputs as parameters for training and output. The bidirectional recurrent neural network (Bi-RNN, Bidirectional Recurrent Neural Network) is an improved network structure based on the RNN. In some tasks, the input of the network is related not only to past inputs but also to subsequent inputs; besides the forward sequence, the reverse sequence therefore also needs to be fed in. A Bi-RNN consists of two layers of recurrent neural networks and supports feeding in the forward and reverse sequences simultaneously, which effectively improves network performance.
Word vectors (word embeddings) are the collective name for a set of language modeling and feature learning techniques in natural language processing (NLP), in which words or phrases from a vocabulary are mapped to vectors of real numbers. Conceptually, this involves a mathematical embedding from a one-dimensional space per word into a continuous vector space of much lower dimension. The skip-gram model is a model structure for generating distributed representations of words when training a language model with a neural network: it takes the word vector of the current word as input and predicts the context of that word. Beam search is a heuristic search algorithm that explores a graph by expanding the most promising nodes in a limited set.
Beam search is an optimization of best-first search that reduces its memory requirements. Best-first search is a graph search that orders all partial solutions (states) according to a heuristic estimating how close a partial solution is to a complete solution (the goal state). In beam search, however, only a predetermined number of best partial solutions are kept as candidates.
Please refer to Figs. 1-6. It should be noted that the diagrams provided in this embodiment only illustrate the basic concept of the invention in a schematic way; the drawings show only the components related to the invention, rather than the actual number, shape and size of components in implementation. In actual implementation the type, quantity and proportion of each component may change arbitrarily, and the component layout may be more complex.
As shown in Fig. 1, the present invention provides a translation method for machine translation, the method comprising:
S110: receiving a source sentence to be translated.
S120: performing word segmentation on the source sentence.
It can be understood that word segmentation is performed on every sentence received in the source text to be translated.
S130: obtaining the part of speech of each word in the segmentation.
It should be noted that each sentence in the source text is segmented, and a part-of-speech tagging tool then tags each word to obtain its part of speech; the abbreviation symbol of the corresponding part of speech is obtained by looking it up in a part-of-speech abbreviation table. Finally, the original word is joined to its part-of-speech abbreviation with an "_" symbol to form a word/POS string, which replaces the original word in the source sentence.
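The word/POS fusion step above can be sketched as follows. This is a toy illustration with hard-coded tagger output; in practice any segmenter and part-of-speech tagger producing (word, tag) pairs would supply the input:

```python
def fuse_pos(tagged_sentence):
    """Join each segmented word to its POS abbreviation with "_";
    the fused token replaces the original word in the source sentence."""
    return [f"{word}_{pos}" for word, pos in tagged_sentence]

# Hypothetical tagger output for a short Chinese sentence.
tagged = [("我", "r"), ("爱", "v"), ("机器", "n"), ("翻译", "v")]
assert fuse_pos(tagged) == ["我_r", "爱_v", "机器_n", "翻译_v"]
```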
S140: fusing the part of speech into the word vector corresponding to each word according to the word vector model, to obtain a fused word vector sequence.
Word vectors (word embeddings) are the collective name for a set of language modeling and feature learning techniques in natural language processing (NLP), in which words or phrases from a vocabulary are mapped to vectors of real numbers; conceptually, this is a mathematical embedding from a one-dimensional space per word into a vector space of much lower dimension.
In one implementation of the invention, all word/POS strings obtained from the source text in steps S120 and S130 are counted to construct a dictionary, and each word/POS string in the dictionary is indexed and saved. The word/POS strings in each sentence are then converted into index values, and the index-value sequences representing the sentences are fed into the skip-gram model for training, yielding trained word vectors that fuse part-of-speech features, i.e. the fused word vector sequence.
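The dictionary-indexing and skip-gram data-preparation steps above can be sketched as follows. This toy code only builds the vocabulary and the (centre, context) training pairs that a skip-gram model would be trained on; it does not train actual embeddings:

```python
def build_vocab(sentences):
    """Index every word/POS token in order of first appearance."""
    vocab = {}
    for sent in sentences:
        for tok in sent:
            vocab.setdefault(tok, len(vocab))
    return vocab

def skipgram_pairs(indexed_sentence, window=2):
    """(centre, context) pairs: skip-gram predicts the context
    w(t-2)..w(t+2) from the centre word w(t), as in Fig. 2."""
    pairs = []
    for t, centre in enumerate(indexed_sentence):
        lo, hi = max(0, t - window), min(len(indexed_sentence), t + window + 1)
        for k in range(lo, hi):
            if k != t:
                pairs.append((centre, indexed_sentence[k]))
    return pairs

sents = [["我_r", "爱_v", "机器_n", "翻译_v"]]
vocab = build_vocab(sents)
idx = [vocab[t] for t in sents[0]]
assert (vocab["我_r"], vocab["爱_v"]) in skipgram_pairs(idx)
```

In the embodiment these pairs would feed an actual skip-gram trainer (e.g. a word2vec implementation), whose learned vectors then carry the fused part-of-speech information.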
Illustratively, as shown in Fig. 2, the input w(t) yields w(t-2), w(t-1), w(t+1) and w(t+2) respectively after skip-gram model training.
S150: inputting the word vector sequence into the encoder-decoder model to obtain the encoding-decoding result.
It should be noted that the trained word vectors replace the words of each sentence in the original corpus, converting the sentences in the corpus into word vector sequences. The word vector sequences are then fed as input into the encoder-decoder model, and the encoding-decoding result is obtained. The structure of the encoder-decoder model is shown in Fig. 3.
S160: evaluating the encoding-decoding result with the beam search evaluation function, wherein the beam search evaluation function includes a penalty term based on the length ratio and a penalty term based on repetition detection.
It can be understood that beam search is a heuristic search algorithm that explores a graph by expanding the most promising nodes in a limited set. Beam search is an optimization of best-first search that reduces its memory requirements: best-first search orders all partial solutions (states) by a heuristic estimating how close a partial solution is to a complete solution (the goal state), whereas beam search keeps only a predetermined number of best partial solutions as candidates. The embodiment of the present invention improves the evaluation function of beam search by adding a penalty term based on repetition detection and a penalty term based on the length ratio.
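A minimal beam search with an additive penalty hook can illustrate how the modified evaluation function plugs in. The decoder step function, its probabilities, and the penalty below are illustrative assumptions standing in for the patent's model and its d(x) + l(x) terms:

```python
import math

def beam_search(step_fn, start, beam_width=3, max_len=10, eos="</s>",
                penalty_fn=lambda seq: 0.0):
    """Keep only the `beam_width` best partial hypotheses, scored by
    accumulated log-probability plus an additive penalty term.
    `step_fn(seq)` returns a list of (token, prob) continuations."""
    beams = [([start], 0.0)]
    for _ in range(max_len):
        candidates = []
        for seq, score in beams:
            if seq[-1] == eos:                 # finished hypotheses carry over
                candidates.append((seq, score))
                continue
            for tok, p in step_fn(seq):
                new_seq = seq + [tok]
                candidates.append(
                    (new_seq, score + math.log(p) + penalty_fn(new_seq)))
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_width]        # prune to the beam width
        if all(s[-1] == eos for s, _ in beams):
            break
    return beams[0][0]

# Toy decoder always offering "hello" (p=0.6) and "</s>" (p=0.4);
# the hypothetical penalty discourages repeating "hello".
best = beam_search(lambda seq: [("hello", 0.6), ("</s>", 0.4)],
                   "<s>", max_len=3,
                   penalty_fn=lambda s: -0.5 * s.count("hello"))
assert best[-1] == "</s>"
```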
S170: obtaining the translation according to the evaluation result.
The final translation is obtained through the encoder-decoder model and beam search.
In one implementation of the invention, the beam search evaluation function is embodied as:
S(Y, X) = log(P(Y|X)) + d(x) + l(x)
where S(Y, X) is the beam search evaluation function, log(P(Y|X)) is the log-probability of Y occurring given X, d(x) is the penalty term based on repetition detection, l(x) is the penalty term based on the length ratio, and P is the distribution function;
the penalty term based on the length ratio is added to the beam search evaluation function to address the problem of the translation omitting part of the source sentence;
the penalty term based on repetition detection is added to the beam search evaluation function to address the problem of the translation repeating content.
It should be noted that the embodiment of the present invention improves the evaluation function of beam search by adding a penalty term based on the length ratio and a penalty term based on repetition detection. The length-ratio penalty term targets translations that are too long or too short: the penalty is obtained from statistics of the ratio of source-sentence length to translation length and is used in the beam search evaluation of candidate words. The repetition-detection penalty term divides the translation into segments of different sizes for comparison, taking into account the distance between the position of a repeated word and the position currently being translated; the resulting penalty is likewise used in the beam search evaluation of candidate words. This alleviates the problems of repeated fragments in the translation and omission of parts of the source sentence; the method is widely applicable and well targeted, and the translation quality is higher.
Further, the repetition-detection penalty term d(x) is expressed by a specific formula as follows:
where c is the index of the current translated word, δ is the repetition-detection window, ε is the penalty coefficient, y is the matrix corresponding to the candidate translation, y_{c-j} and y_{c-i-j} are the two matrices compared for repetition, and i, j are traversal variables.
As shown in Fig. 4, the candidate sentences of the entire beam search and the parameters δ and ε of the formula serve as inputs to the algorithm; segments of multiple different sizes are compared, their respective penalty terms are computed separately, and the results are finally accumulated with weights. In Fig. 5, the value F_X(x) of the cumulative distribution function computed for the current candidate word and current length, together with the parameter θ of the formula, serve as inputs to the algorithm: a vector operation first determines whether each candidate word is EOS (1 if so, 0 otherwise), and the value of l(x) in the formula is then obtained by dot product.
In one implementation of the invention, the step of evaluating the encoding-decoding result with the beam search evaluation function comprises:
computing the length ratio of the source sentence to the target translation;
fitting the length ratios by linear regression to obtain a cumulative distribution function;
when an end-of-sentence tag and an ordinary word both appear among the candidate words of the beam search, adding the probability F_X(x) that the translation has finished and the probability 1−F_X(x) that it has not finished to the respective evaluation functions: l(x) = θF_X(x) for a non-EOS candidate and l(x) = θ(1−F_X(x)) for an EOS candidate, where EOS is the end-of-sentence tag and θ is a parameter;
when the candidate word is the end-of-sentence tag, multiplying the probability that the translation is not yet finished by the penalty factor as the penalty term;
and when the candidate word is not the end-of-sentence tag, multiplying the probability that the translation is finished by the penalty factor as the penalty term;
adding the resulting length-ratio penalty term to the beam search evaluation function;
and evaluating the result with the beam search evaluation function.
It can be understood that the lengths of the source sentences and of the target translations are first counted separately, and the length ratio of source sentence to target translation is computed; the length ratios obtained are then fitted by linear regression to obtain the cumulative distribution function F_X(x) = P(X < x), where X is the ratio of the target translation's length to the source sentence's length. When an EOS (end-of-sentence tag) and an ordinary word both appear among the candidate words of the beam search, the probability F_X(x) that the translation has finished and the probability 1−F_X(x) that it has not finished are added to the respective evaluation functions: l(x) = θF_X(x) for a non-EOS candidate and l(x) = θ(1−F_X(x)) for an EOS candidate. When the candidate word is EOS, the probability that the translation is not yet finished is multiplied by the penalty factor as the penalty term; when the candidate word is not EOS, the probability that the translation is finished is multiplied by the penalty factor as the penalty term. Finally, the resulting length-ratio penalty term is added to the beam search evaluation function, as in Fig. 5.
The final optimal translation is obtained through the encoder-decoder model and beam search, as shown in Fig. 6.
It should be noted that combining the encoder-decoder deep learning framework with the beam search evaluation function yields the final optimal translation and solves the problems of repeated fragments and omission of parts of the source sentence; the method is widely applicable and well targeted, and the translation quality is higher.
Further, in the encoder-decoder model, both the encoder and the decoder use a bidirectional recurrent neural network.
It should be noted that the bidirectional recurrent neural network (Bi-RNN, Bidirectional Recurrent Neural Network) is an improved network structure based on the RNN. In some tasks, the input of the network is related not only to past inputs but also to subsequent inputs, so besides the forward sequence, the reverse sequence also needs to be fed in. A Bi-RNN consists of two layers of recurrent neural networks and supports feeding in the forward and reverse sequences simultaneously, which effectively improves network performance.
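The forward/backward pairing a Bi-RNN performs can be illustrated with a toy recurrent cell. This is a conceptual sketch only (the "cell" is a cumulative sum, not a trainable network):

```python
def simple_rnn(inputs, step):
    """Run a single recurrent pass; `step(h, x)` returns the next
    hidden state from the previous state and the current input."""
    h, states = 0, []
    for x in inputs:
        h = step(h, x)
        states.append(h)
    return states

def bidirectional(inputs, step):
    """Bi-RNN idea: one pass over the forward sequence, one over the
    reversed sequence, then pair the states position by position so
    every output sees both past and future context."""
    fwd = simple_rnn(inputs, step)
    bwd = simple_rnn(inputs[::-1], step)[::-1]
    return list(zip(fwd, bwd))

# With a cumulative-sum cell: at position 0 the forward state has seen
# only x0, while the backward state has already seen the whole future.
states = bidirectional([1, 2, 3], lambda h, x: h + x)
assert states == [(1, 6), (3, 5), (6, 3)]
```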
In one implementation of the invention, the step of inputting the word vector sequence into the encoder-decoder model to obtain the encoding-decoding result comprises:
inputting the word vector sequence into the encoder-decoder model;
converting the word vector sequence into a sentence vector with the encoder of the deep learning framework;
and converting the sentence vector back into a word vector sequence with the decoder.
It can be understood that the encoder-decoder is a deep learning framework in which the encoder converts word vector sequences into sentence vectors, and the decoder converts sentence vectors back into word vector sequences.
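A toy illustration of this division of labor, with a component-wise mean standing in for the bidirectional-RNN encoder and a trivial decoder; purely conceptual, not the patent's model:

```python
def encode(word_vectors):
    """Toy encoder: collapse the word-vector sequence into one
    fixed-size sentence vector (component-wise mean)."""
    dim, n = len(word_vectors[0]), len(word_vectors)
    return [sum(v[d] for v in word_vectors) / n for d in range(dim)]

def decode(sentence_vector, length):
    """Toy decoder: emit `length` vectors conditioned on the sentence
    vector.  A real decoder would generate target-language word
    vectors step by step."""
    return [list(sentence_vector) for _ in range(length)]

src = [[1.0, 0.0], [0.0, 1.0]]     # two 2-dimensional word vectors
sv = encode(src)
assert sv == [0.5, 0.5]            # one fixed-size sentence vector
assert len(decode(sv, 3)) == 3     # decoded back into a sequence
```

The key property the sketch preserves is that the sentence vector has a fixed size regardless of input length, which is what lets the decoder produce an output sequence of a different length.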
The present invention also provides a translation device for machine translation; the device includes a processor and a memory connected to the processor through a communication bus, wherein:
the memory is configured to store a machine translation program;
the processor is configured to execute the machine translation program, so as to implement the translation steps of any of the above methods.
The present invention also provides a computer storage medium storing one or more programs executable by one or more processors, so that the one or more processors perform the translation steps of any of the above methods.
The above embodiments merely illustrate the principles and effects of the present invention and are not intended to limit it. Anyone skilled in the art may modify or change the above embodiments without departing from the spirit and scope of the invention. Accordingly, all equivalent modifications or changes completed by those of ordinary skill in the art without departing from the spirit and technical ideas disclosed by the invention shall be covered by the claims of the present invention.

Claims (8)

1. A translation method for machine translation, characterized in that the method comprises:
receiving a source sentence to be translated;
performing word segmentation on the source sentence;
obtaining the part of speech of each word in the segmentation;
fusing the part of speech into the word vector corresponding to each word according to a word vector model, to obtain a fused word vector sequence;
inputting the word vector sequence into an encoder-decoder model to obtain an encoding-decoding result;
evaluating the encoding-decoding result with a beam search evaluation function, wherein the beam search evaluation function includes a penalty term based on the length ratio and a penalty term based on repetition detection;
and obtaining the translation according to the evaluation result.
2. The translation method for machine translation according to claim 1, characterized in that the beam search evaluation function is embodied as:
S(Y, X) = log(P(Y|X)) + d(x) + l(x)
where S(Y, X) is the beam search evaluation function, log(P(Y|X)) is the log-probability of Y occurring given X, d(x) is the penalty term based on repetition detection, l(x) is the penalty term based on the length ratio, and P is the distribution function;
the penalty term based on the length ratio is added to the beam search evaluation function to address the problem of the translation omitting part of the source sentence;
the penalty term based on repetition detection is added to the beam search evaluation function to address the problem of the translation repeating content.
3. The translation method for machine translation according to claim 2, characterized in that the repetition-detection penalty term d(x) is expressed by a specific formula as follows:
where c is the index of the current translated word, δ is the repetition-detection window, ε is the penalty coefficient, y is the matrix corresponding to the candidate translation, y_{c-j} and y_{c-i-j} are the two matrices compared for repetition, and i, j are traversal variables.
4. The translation method for machine translation according to claim 2 or 3, characterized in that the step of evaluating the encoding-decoding result with the beam search evaluation function comprises:
computing the length ratio of the source sentence to the target translation;
fitting the length ratios by linear regression to obtain a cumulative distribution function;
when an end-of-sentence tag and an ordinary word both appear among the candidate words of the beam search, adding the probability F_X(x) that the translation has finished and the probability 1−F_X(x) that it has not finished to the respective evaluation functions: l(x) = θF_X(x) for a non-EOS candidate and l(x) = θ(1−F_X(x)) for an EOS candidate, where EOS is the end-of-sentence tag and θ is a parameter;
when the candidate word is the end-of-sentence tag, multiplying the probability that the translation is not yet finished by the penalty factor as the penalty term;
and when the candidate word is not the end-of-sentence tag, multiplying the probability that the translation is finished by the penalty factor as the penalty term;
adding the resulting length-ratio penalty term to the beam search evaluation function;
and evaluating the result with the beam search evaluation function.
5. The translation method for machine translation according to claim 1, characterized in that in the encoder-decoder model, both the encoder and the decoder use a bidirectional recurrent neural network.
6. The translation method for machine translation according to claim 1, characterized in that the step of inputting the word vector sequence into the encoder-decoder model to obtain the encoding-decoding result comprises:
inputting the word vector sequence into the encoder-decoder model;
converting the word vector sequence into a sentence vector with the encoder of the deep learning framework;
and converting the sentence vector back into a word vector sequence with the decoder.
7. A translation device for machine translation, characterized in that the device includes a processor and a memory connected to the processor through a communication bus, wherein:
the memory is configured to store a machine translation program;
the processor is configured to execute the machine translation program, so as to implement the translation steps of the machine translation method according to any one of claims 1 to 6.
8. A computer storage medium, characterized in that the computer storage medium stores one or more programs, and the one or more programs are executable by one or more processors, so that the one or more processors perform the translation steps for machine translation output according to any one of claims 1 to 6.
CN201910721252.6A 2019-08-06 2019-08-06 Translation method, device and storage medium for machine translation Active CN110442880B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910721252.6A CN110442880B (en) 2019-08-06 2019-08-06 Translation method, device and storage medium for machine translation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910721252.6A CN110442880B (en) 2019-08-06 2019-08-06 Translation method, device and storage medium for machine translation

Publications (2)

Publication Number Publication Date
CN110442880A true CN110442880A (en) 2019-11-12
CN110442880B CN110442880B (en) 2022-09-30

Family

ID=68433418

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910721252.6A Active CN110442880B (en) 2019-08-06 2019-08-06 Translation method, device and storage medium for machine translation

Country Status (1)

Country Link
CN (1) CN110442880B (en)


Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018058046A1 (en) * 2016-09-26 2018-03-29 Google Llc Neural machine translation systems
CN107967262A (en) * 2017-11-02 2018-04-27 Inner Mongolia University of Technology Neural network Mongolian-Chinese machine translation method


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Fan Wenting et al., "Mongolian-Chinese Neural Network Machine Translation Model Fusing Prior Information," Journal of Chinese Information Processing *

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112541364A (en) * 2020-12-03 2021-03-23 Kunming University of Science and Technology Chinese-Vietnamese neural machine translation method fusing multi-level linguistic feature knowledge
CN112632996A (en) * 2020-12-08 2021-04-09 Zhejiang University Entity-relation triple extraction method based on contrastive learning
CN113435215A (en) * 2021-06-22 2021-09-24 Beijing Jietong Huasheng Technology Co., Ltd. Machine translation method and device
CN113191165A (en) * 2021-07-01 2021-07-30 Nanjing New Generation Artificial Intelligence Research Institute Co., Ltd. Method for avoiding duplication of machine translation fragments
CN113191165B (en) * 2021-07-01 2021-09-24 Nanjing New Generation Artificial Intelligence Research Institute Co., Ltd. Method for avoiding duplication of machine translation fragments
CN113836950A (en) * 2021-09-22 2021-12-24 Guangzhou Huaduo Network Technology Co., Ltd. Commodity title text translation method and device, equipment and medium thereof
CN113836950B (en) * 2021-09-22 2024-04-02 Guangzhou Huaduo Network Technology Co., Ltd. Commodity title text translation method and device, equipment and medium thereof

Also Published As

Publication number Publication date
CN110442880B (en) 2022-09-30

Similar Documents

Publication Publication Date Title
CN110442880A (en) A kind of interpretation method, device and the storage medium of machine translation translation
Munkhdalai et al. Neural semantic encoders
CN107967262B (en) A kind of neural network illiteracy Chinese machine translation method
CN106484682B (en) Machine translation method, device and electronic equipment based on statistics
Yin et al. Neural enquirer: Learning to query tables with natural language
CN104331449B (en) Query statement and determination method, device, terminal and the server of webpage similarity
CN111191002B (en) Neural code searching method and device based on hierarchical embedding
CN106202153A (en) The spelling error correction method of a kind of ES search engine and system
CN106484681A (en) A kind of method generating candidate&#39;s translation, device and electronic equipment
CN110287482B (en) Semi-automatic participle corpus labeling training device
CN115048944B (en) Open domain dialogue reply method and system based on theme enhancement
CN113641830B (en) Model pre-training method, device, electronic equipment and storage medium
CN114528898A (en) Scene graph modification based on natural language commands
CN104572631A (en) Training method and system for language model
CN112818091A (en) Object query method, device, medium and equipment based on keyword extraction
CN113971394A (en) Text repeat rewriting system
Li et al. Auto completion of user interface layout design using transformer-based tree decoders
CN111125323A (en) Chat corpus labeling method and device, electronic equipment and storage medium
CN109992785A (en) Content calculation method, device and equipment based on machine learning
Park et al. Natural language generation using dependency tree decoding for spoken dialog systems
CN111325015B (en) Document duplicate checking method and system based on semantic analysis
Wang Short Sequence Chinese‐English Machine Translation Based on Generative Adversarial Networks of Emotion
CN116432637A (en) Multi-granularity extraction-generation hybrid abstract method based on reinforcement learning
CN115757694A (en) Recruitment industry text recall method, system, device and medium
Wang et al. Knowledge base question answering system based on knowledge graph representation learning

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
CB03 Change of inventor or designer information

Inventor after: Lin Xinyue

Inventor after: Liu Jin

Inventor after: Song Junjie

Inventor before: Lin Xinyue

Inventor before: Liu Jin

CB03 Change of inventor or designer information
GR01 Patent grant
GR01 Patent grant