CN109670191A - Calibration optimization method and device for machine translation, and electronic equipment - Google Patents
Calibration optimization method and device for machine translation, and electronic equipment
- Publication number: CN109670191A (application number CN201910066709.4A)
- Authority
- CN
- China
- Prior art keywords
- sample
- translation
- postedit
- training
- machine translation
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/40—Processing or translation of natural language
- G06F40/58—Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/40—Document-oriented image-based pattern recognition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/40—Document-oriented image-based pattern recognition
- G06V30/41—Analysis of document content
- G06V30/418—Document matching, e.g. of document images
Abstract
An embodiment of the present invention provides a calibration optimization method and device for machine translation, and electronic equipment. The method comprises: based on the source text and the machine translation of a target document, using a trained multi-task learning neural network model to perform machine translation quality estimation and to perform automatic post-editing on the machine translation of the target document; wherein the trained multi-task learning neural network model is obtained in advance by iteratively training and updating a basic multi-task learning neural network model with a certain number of training samples, each training sample comprising a sample source text, a sample machine translation, a sample machine translation quality label and a sample post-edited text. Embodiments of the present invention can tightly combine machine translation quality estimation and automatic post-editing, two tasks that are mutually independent yet closely related, thereby improving post-editing efficiency and translation quality more effectively.
Description
Technical field
Embodiments of the present invention relate to the field of machine translation technology, and in particular to a calibration optimization method and device for machine translation, and electronic equipment.
Background art
Machine translation is the process of using a computer to convert one natural language (the source language) into another natural language (the target language). Although the overall translation quality of current machine translation systems keeps improving, the stability of that quality cannot be guaranteed. For example, for the input of certain special translation tasks, the quality of the output sometimes fails to meet a given standard. In such cases, when post-editing these translations of fluctuating quality, translators generally need to spend a great deal of energy and time on review and revision, which inevitably affects their working efficiency.
Machine translation quality estimation (Quality Estimation for Machine Translation, QE) can intelligently predict the quality of sentences output by a machine translation engine. With explicit quality annotations, a translator can easily choose whether to post-edit on the basis of the machine translation result or to translate from scratch.
In practical applications, however, it is often desirable for intelligent translation assistance to go further. This task is automatic post-editing (Automatic Post-Editing, APE). The automatic post-editing task does not take place inside the machine translation engine; instead, the translation system is treated as a black box, and the output of this black box is automatically corrected externally, so as to obtain a translation of better quality.
Therefore, how to effectively combine machine translation quality estimation and automatic post-editing, two mutually independent tasks, so as to improve translation quality more effectively, has become a problem that the industry urgently needs to solve.
Summary of the invention
To overcome the above problem, or at least partially solve it, embodiments of the present invention provide a calibration optimization method and device for machine translation, and electronic equipment, so as to effectively combine machine translation quality estimation and automatic post-editing, thereby improving post-editing efficiency and translation quality more effectively.
In a first aspect, an embodiment of the present invention provides a calibration optimization method for machine translation, comprising: based on the source text and the machine translation of a target document, using a trained multi-task learning neural network model to perform machine translation quality estimation and to perform automatic post-editing on the machine translation of the target document; wherein the trained multi-task learning neural network model is obtained in advance by iteratively training and updating a basic multi-task learning neural network model with a certain number of training samples, each training sample comprising a sample source text, a sample machine translation, a sample machine translation quality label and a sample post-edited text.
In a second aspect, an embodiment of the present invention provides a calibration optimization device for machine translation, comprising: a data acquisition module, for obtaining the source text and the machine translation of a target document; and an estimation and post-editing output module, for, based on the source text and the machine translation of the target document, using a trained multi-task learning neural network model to perform machine translation quality estimation and to perform automatic post-editing on the machine translation of the target document; wherein the trained multi-task learning neural network model is obtained in advance by iteratively training and updating a basic multi-task learning neural network model with a certain number of training samples, each training sample comprising a sample source text, a sample machine translation, a sample machine translation quality label and a sample post-edited text.
In a third aspect, an embodiment of the present invention provides electronic equipment, comprising: at least one memory, at least one processor, a communication interface and a bus; the memory, the processor and the communication interface communicate with one another via the bus, and the communication interface is used for information transmission between the electronic equipment and a machine translation device; the memory stores a computer program that can run on the processor, and when the processor executes the computer program, the calibration optimization method for machine translation described in the first aspect above is implemented.
In a fourth aspect, an embodiment of the present invention provides a non-transitory computer-readable storage medium storing computer instructions that cause a computer to execute the calibration optimization method for machine translation described in the first aspect above.
With the calibration optimization method and device for machine translation and the electronic equipment provided by embodiments of the present invention, by using a multi-task learning neural network model and training it in advance with training samples containing sample machine translation quality labels and sample post-edited texts, machine translation quality estimation and automatic post-editing, two mutually independent yet closely related tasks, can be tightly combined, thereby improving post-editing efficiency and translation quality more effectively.
Brief description of the drawings
To explain the technical solutions in the embodiments of the present invention or the prior art more clearly, the accompanying drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below illustrate some embodiments of the present invention; for those of ordinary skill in the art, other drawings can be obtained from these drawings without creative effort.
Fig. 1 is a flow diagram of a calibration optimization method for machine translation provided by an embodiment of the present invention;
Fig. 2 is a structural schematic diagram of the multi-task learning neural network model in the calibration optimization method for machine translation provided by an embodiment of the present invention;
Fig. 3 is a structural schematic diagram of a calibration optimization device for machine translation provided by an embodiment of the present invention;
Fig. 4 is a schematic diagram of the physical structure of electronic equipment provided by an embodiment of the present invention.
Detailed description of the embodiments
To make the objectives, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. Based on the embodiments of the present invention, all other embodiments obtained by those of ordinary skill in the art without creative effort fall within the protection scope of the embodiments of the present invention.
Aiming at the problems in the prior art that the post-editing task is completed manually and is therefore time-consuming, labor-intensive and inefficient, embodiments of the present invention use a multi-task learning neural network model and train it in advance with training samples containing sample machine translation quality labels and sample post-edited texts, so that machine translation quality estimation and automatic post-editing, two mutually independent yet closely related tasks, can be tightly combined, thereby improving post-editing efficiency and translation quality more effectively. Embodiments of the present invention are expanded upon and introduced below through multiple embodiments.
Fig. 1 is a flow diagram of a calibration optimization method for machine translation provided by an embodiment of the present invention. As shown in Fig. 1, the method comprises:
S101: obtaining the source text and the machine translation of a target document.
It can be understood that a target document to be processed is usually expressed in a certain language, which serves as the source text. Translating the target document means converting it into an expression with the same semantics in another language, which serves as the translation. When a document is translated by machine, the source text of the target document is input, and the machine translator outputs the corresponding machine translation according to that source text.
Due to the limitations of machine translation itself, the quality of the machine translation output directly by a machine translation engine usually cannot be guaranteed. For example, for the source text "Little John found for the toy box in the pen", machine translation may produce output glossed as "small John has found toy box in pen", in which "pen" has been rendered as the writing instrument; clearly this machine translation result differs from the real scene. Post-editing refers to further calibrating and editing the machine translation that the machine translator outputs directly, so that the semantics of the edited translation better match the source text. Therefore, before post-editing a machine translation, the source text and the machine translation must first be obtained; for example, they can be obtained from a machine translation engine, or from a database.
S102: based on the source text and the machine translation of the target document, using a trained multi-task learning neural network model to perform machine translation quality estimation and to perform automatic post-editing on the machine translation of the target document.
Here, the trained multi-task learning neural network model is obtained in advance by iteratively training and updating a basic multi-task learning neural network model with a certain number of training samples, each training sample comprising a sample source text, a sample machine translation, a sample machine translation quality label and a sample post-edited text.
It can be understood that, although machine translation quality estimation and automatic post-editing are two independent tasks, the degree of correlation between them is very large. First, the two tasks are trained with similar data sets and realized with supervised machine learning models, i.e., the source text to be translated, the machine-translated text and the translator's post-edited text; second, combining the two through certain strategies tends to give better output results.
Therefore, embodiments of the present invention use a multi-task learning (Multi-Task Learning, MTL) neural network model to perform simultaneously the two highly correlated tasks of (sentence-level) machine translation quality estimation and automatic post-editing. Specifically, on the basis of obtaining the source text and the machine translation of the target document according to the above step, a series of transformations are performed with the source text and the machine translation as the data basis, to obtain a data type that the multi-task learning neural network model can handle. The transformed results are then input into the multi-task learning neural network model; through the neural operations and propagation inside the model, machine translation quality estimation can be performed and automatic post-editing applied to the machine translation at the same time.
For example, for the machine translation glossed above as "small John has found toy box in pen", after automatic post-editing the model may output the post-edited text glossed as "small John has found toy box in fence", while simultaneously outputting the quality estimation result "average quality" for that machine translation.
It can be understood that, to accurately perform the machine translation quality estimation operation and the post-editing operation, a basic multi-task learning neural network model needs to be initialized and built in advance, and this basic model needs to be trained with a certain number of training samples. When establishing the training sample set, each training sample contains at least the corresponding source text, machine translation, machine translation quality label and post-edited text, i.e., the sample source text, sample machine translation, sample machine translation quality label and sample post-edited text. The sample machine translation quality label characterizes the translation quality assessment of the sample machine translation.
With the calibration optimization method for machine translation provided by embodiments of the present invention, by using a multi-task learning neural network model and training it in advance with training samples containing sample machine translation quality labels and sample post-edited texts, machine translation quality estimation and automatic post-editing, two mutually independent yet closely related tasks, can be tightly combined, thereby improving post-editing efficiency and translation quality more effectively.
Optionally, on the basis of the above embodiments, the step of training the trained multi-task learning neural network model with the training samples specifically comprises: for each training sample, based on the training sample, outputting the predicted machine translation quality and the predicted post-edited text of the training sample by using the basic multi-task learning neural network model; comparing the predicted machine translation quality with the sample machine translation quality label, and the predicted post-edited text with the sample post-edited text, respectively, to obtain prediction errors; and, based on the prediction errors, updating the parameters of the basic multi-task learning neural network model with a back-propagation algorithm and a gradient descent algorithm, and using the updated basic multi-task learning neural network model as the basic multi-task learning neural network model for the next training sample, until the trained multi-task learning neural network model is obtained.
It can be understood that, before training the multi-task learning neural network model, embodiments of the present invention first collect data and perform some preprocessing, to obtain a data set of a certain scale. From this data set, a certain amount of data can be selected as training samples; these training samples are used one by one to train and update the built basic multi-task learning neural network model, and the model finally obtained that meets a certain precision requirement is the trained multi-task learning neural network model.
When collecting and processing the data, the following need to be collected and processed for each datum: the source text of a document to be translated is collected, and can be segmented into words; the machine-translated text of the above document to be translated is collected, and can be segmented; a translator's post-edited text of the above machine-translated text is collected, and can be segmented; and a quality label obtained by assessing the quality of the above machine-translated text is acquired.
Afterwards, t data can be randomly selected from the whole data set to form a set M used for the training and testing of the model, where M may be expressed as: M = {(m11, m12, m13, m14), …, (mt1, mt2, mt3, mt4)}, in which (mi1, mi2, mi3, mi4) represents (the segmented source text, the segmented machine-translated text, the segmented post-edited text, the quality label) of the i-th datum. The original sequence of M then needs to be reshuffled; 80% of the data can be chosen to form the training set Mtrain, and the remaining 20% form the validation set Mtest.
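As an illustrative sketch only (not part of the patent's disclosure), the random reshuffling and 80/20 split of the set M described above could look like the following; the function name, the synthetic tuples and the seed are assumptions for the example:

```python
import random

def build_train_test_split(data, train_ratio=0.8, seed=42):
    """Shuffle the (src, mt, pe, label) tuples and split them 80/20."""
    samples = list(data)
    random.Random(seed).shuffle(samples)  # reshuffle the original sequence of M
    cut = int(len(samples) * train_ratio)
    return samples[:cut], samples[cut:]

# Each element mirrors (mi1, mi2, mi3, mi4): segmented source text,
# segmented machine translation, segmented post-edited text, quality label.
M = [(f"src{i}", f"mt{i}", f"pe{i}", i % 4 + 1) for i in range(10)]
M_train, M_test = build_train_test_split(M)
```

Splitting after a seeded shuffle keeps the experiment reproducible while still removing any ordering bias in the collected data.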
In the process of training the model one by one with the above training samples, the data of a training sample are first transformed into a data type that the multi-task learning neural network model can handle; the transformed data are then input into the basic multi-task learning neural network model to be trained, and a forward operation is performed with this model to obtain the predicted machine translation quality and the predicted post-edited text of the training sample.
Afterwards, the predicted machine translation quality is compared with the sample machine translation quality label, and one prediction error is computed as the first prediction error. Meanwhile, the predicted post-edited text is compared with the sample post-edited text, and another prediction error is computed as the second prediction error. According to these two prediction errors, it is then judged whether the current multi-task learning neural network model has reached the required prediction precision; if so, training is completed, and the current multi-task learning neural network model serves as the trained multi-task learning neural network model. Otherwise, the two prediction errors are back-propagated through the multi-task learning neural network model to be trained, and the parameters of the basic multi-task learning neural network model are updated with the gradient descent method.
Subsequently, the next training sample is taken, and the basic multi-task learning neural network model with the above updated parameters serves as the training object for that next training sample; the above training and updating process is repeated until it is judged from the prediction errors that the multi-task learning neural network model after a certain round of training has reached the required prediction precision, whereupon training is confirmed as completed, and the multi-task learning neural network model at that point serves as the trained multi-task learning neural network model.
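The loop described above (forward pass producing two predictions, two prediction errors, back-propagation through the shared model, gradient-descent update) can be sketched with a toy NumPy model. The layer sizes, the single synthetic sample, the learning rate and the squared-error losses are all illustrative assumptions, not the patent's actual architecture:

```python
import numpy as np

# Toy multi-task model: one shared layer feeds a QE head (a scalar quality
# score) and an APE head (a vector standing in for the post-edited text).
rng = np.random.default_rng(0)
d_in, d_h, d_ape = 8, 6, 4
W_shared = rng.normal(0.0, 0.1, (d_in, d_h))
W_qe = rng.normal(0.0, 0.1, (d_h, 1))
W_ape = rng.normal(0.0, 0.1, (d_h, d_ape))

def forward(x):
    h = np.tanh(x @ W_shared)          # shared representation
    return h, h @ W_qe, h @ W_ape      # hidden state, QE and APE predictions

# One synthetic "training sample": features plus the two supervision targets.
x = rng.normal(size=(1, d_in))
y_qe = np.array([[3.0]])               # sample quality label, as a score
y_ape = rng.normal(size=(1, d_ape))    # stand-in for the sample post-edited text

lr, losses = 0.05, []
for _ in range(500):
    h, p_qe, p_ape = forward(x)
    e_qe, e_ape = p_qe - y_qe, p_ape - y_ape        # the two prediction errors
    losses.append(float((e_qe ** 2).sum() + (e_ape ** 2).sum()))
    # Back-propagate both errors through the shared layer, then descend.
    g_h = e_qe @ W_qe.T + e_ape @ W_ape.T
    g_pre = g_h * (1.0 - h ** 2)                    # derivative of tanh
    W_qe -= lr * (h.T @ e_qe)
    W_ape -= lr * (h.T @ e_ape)
    W_shared -= lr * (x.T @ g_pre)
```

The key point the sketch illustrates is that the gradient flowing into the shared layer is the sum of the contributions of both task heads, which is what couples quality estimation and post-editing during training.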
It can be understood that, on the basis of obtaining the above trained multi-task learning neural network model, the data in the above validation set are used to verify the model, in order to further check its generality. If verification shows that its precision meets the requirement, the model is confirmed to be reliable and can be used in practical calibration optimization applications for machine translation.
It can be understood that, before the step of training the trained multi-task learning neural network model with the training samples, the method of embodiments of the present invention may further comprise: obtaining the post-editing cost of the sample machine translation, and obtaining the sample machine translation quality label through normalization and segmentation based on the post-editing cost and the sample source text. The post-editing cost indicates the cost spent on post-editing the sample machine translation to obtain the sample post-edited text.
For any training sample, embodiments of the present invention may first obtain the post-editing cost of the process from the sample machine translation to the sample post-edited text; the post-editing cost characterizes the cost spent on post-editing the sample machine translation to obtain the sample post-edited text. For example, in embodiments of the present invention the step of obtaining the post-editing cost of the sample machine translation may specifically comprise: during the post-editing of the sample machine translation to obtain the sample post-edited text, counting the total number of keyboard strokes made while post-editing, as the post-editing cost.
Afterwards, the post-editing cost needs to be converted into a machine translation quality label that the machine can recognize. It can be understood that the larger the post-editing cost, the poorer the quality of the machine-translated text, and conversely the better. Therefore, in the conversion process, a normalization calculation is first performed according to the post-editing cost and the sample source text, to eliminate the differences between different sample source texts. In addition, to evaluate the quality of the machine translation according to the post-editing cost, the result of the above normalization calculation is divided into regions, i.e., segmentation is performed, and different labels are defined for the different segment intervals, thereby obtaining the sample machine translation quality label.
By defining different quality labels for different machine translation qualities, embodiments of the present invention can use the multi-task learning neural network model to classify the different quality levels more accurately during quality estimation training, and thus better perform automatic post-editing according to the quality of the machine translation, making the post-editing result more accurate.
Optionally, the step of obtaining the sample machine translation quality label through normalization and segmentation based on the post-editing cost and the sample source text specifically comprises: dividing the post-editing cost by the length of the sample source text, and normalizing the result of the division; and, based on the value of the normalized result, converting the normalized result into sample machine translation quality labels of different levels.
Embodiments of the present invention first divide the post-editing cost by the length of the source text of the target document (the length can be determined from the total number of words in the source text), to remove the influence of different document lengths. The above division result is then normalized, e.g., converted into a value between 0 and 1, which is the normalized result. Finally, the interval from 0 to 1 is divided into several continuous subintervals, and according to the subinterval in which the normalized result falls, the normalized result is correspondingly converted into one of several different quality labels. Taking the division of the interval from 0 to 1 into four continuous subintervals as an example, the correspondence between normalized results and quality labels shown in Table 1 can be obtained.
Table 1: Correspondence between normalized results and quality labels
Normalized result | Quality label
0 ≤ x < 0.25 | 4 (high quality)
0.25 ≤ x < 0.5 | 3 (good quality)
0.5 ≤ x < 0.75 | 2 (average quality)
0.75 ≤ x ≤ 1.0 | 1 (poor quality)
As shown in Table 1, through the above normalization and conversion, the post-editing costs, which carry document-specific information, are converted into quality labels that eliminate the differences between documents, so that the training and optimization of the model can be completed more quickly and accurately.
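For illustration, the Table 1 mapping from normalized post-editing cost to a quality label can be written as a small function; the clamping of values above 1.0 is an added assumption here, since the text only defines the 0-to-1 interval:

```python
def quality_label(postedit_cost, source_length):
    """Map a post-editing cost (e.g. a keystroke count) and the source-text
    length (word count) to a 1-4 quality label per the Table 1 intervals."""
    x = min(postedit_cost / source_length, 1.0)  # normalize into [0, 1]
    if x < 0.25:
        return 4  # high quality
    if x < 0.5:
        return 3  # good quality
    if x < 0.75:
        return 2  # average quality
    return 1      # poor quality
```

Because the cost is divided by the source length before bucketing, a 10-keystroke edit counts against a short sentence far more than against a long document, which is exactly the document-length effect the normalization is meant to remove.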
Optionally, on the basis of the above embodiments, the step of performing machine translation quality estimation and performing automatic post-editing on the machine translation of the target document specifically comprises: performing word segmentation on the source text and the machine translation of the target document respectively, and inputting the segmentation results into trained source-text and translation word vector models, to extract source-text word vectors and machine-translation word vectors; and inputting the source-text word vectors and the machine-translation word vectors into the trained multi-task learning neural network model, to output the machine translation quality estimation result and the automatically post-edited text of the machine translation.
From the above embodiments it follows that, before the source text and the machine translation of the target document are input into the multi-task learning neural network model, a series of transformations must first be performed with the source text and the machine translation as the data basis, to obtain a data type that the multi-task learning neural network model can handle. In embodiments of the present invention, this conversion process can be realized with trained source-text and translation word vector models.
Specifically, word segmentation must first be performed on the source text and the machine translation of the target document respectively; the segmented source text and the segmented machine translation are the results of the word segmentation. Afterwards, the segmentation results are respectively input into the trained source-text and translation word vector models, to extract the word vectors of the source text and the word vectors of the machine translation; these word vectors can clearly characterize the content of the source text or the machine translation, and can be recognized by the multi-task learning neural network model. Finally, the source-text word vectors and the machine-translation word vectors are input into the trained multi-task learning neural network model; through the forward operation of the model, the machine translation quality estimation result and the automatically post-edited text of the machine translation are obtained and output.
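The segment, embed, forward-pass pipeline just described can be sketched as follows; the whitespace tokenizer, the zero-vector handling of unknown words, and the dummy model are placeholders standing in for the patent's actual components:

```python
def tokenize(text):
    # Stand-in word segmentation; a real system would use a
    # language-specific segmenter, especially for Chinese.
    return text.split()

def embed(tokens, vectors, dim=4):
    # Look up each token's word vector; unknown tokens fall back to a
    # zero vector (an assumption made for this sketch).
    return [vectors.get(t, [0.0] * dim) for t in tokens]

def calibrate(source, mt, vectors, model):
    """Run QE and APE jointly: segment, embed, then one forward pass."""
    src_vecs = embed(tokenize(source), vectors)
    mt_vecs = embed(tokenize(mt), vectors)
    return model(src_vecs, mt_vecs)  # -> (quality_label, postedited_text)

# A dummy "model" that only illustrates the joint output shape.
dummy = lambda s, m: (2, "post-edited translation")
label, ape = calibrate("Little John found the toy box", "machine output", {}, dummy)
```

The point of the sketch is the interface: a single call consumes the two embedded sequences and returns both a quality label and a post-edited text, matching the joint output the embodiment describes.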
Before the step of inputting the segmentation results into the trained source-text and translation word vector models, the method of embodiments of the present invention may further comprise: acquiring standard monolingual corpora of the source-text language and the translation language respectively, and performing word segmentation on the standard monolingual corpora of the source-text language and the translation language respectively; and, based on the segmented standard monolingual corpora, training basic source-text and translation word vector models with the Skip-Gram algorithm and setting the model hyperparameters, to obtain the trained source-text and translation word vector models; wherein the source-text language is the language corresponding to the source text of the target document, and the translation language is the language corresponding to the machine translation of the target document.
Before extracting word vectors according to the above embodiment, embodiments of the present invention first need to train the word vector models with word vector model training samples. A pair of basic source-text and translation word vector models first need to be initialized and built, and standard monolingual corpora with a corresponding relationship need to be acquired from the source-text language and translation language corpora. The corresponding relationship indicates the correspondence, according to standard translations, between monolingual corpora with the same semantics in the source-text corpus and the translation corpus. Afterwards, word segmentation is performed respectively on these standard monolingual corpora with the corresponding relationship. For example, the latest Wikipedia monolingual corpora of the source-text language and the translation language can be downloaded and segmented.
Next, the basic source-text and translation word vector models are trained respectively with the Skip-Gram algorithm. Some important hyperparameters also need to be configured individually; for example, the dimension of the word vectors is set to 300, and the context window is set to 5.
Because the embodiment of the present invention trains the base original-text and translation word vector models on standard monolingual corpora, the models thus established are more accurate.
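As an illustrative, non-limiting sketch of the Skip-Gram setup described above, the following code extracts (center word, context word) training pairs from a segmented monolingual corpus. The toy corpus and function name are hypothetical; a production system would train full word vector models (e.g., with a word2vec implementation) on the segmented Wikipedia corpora, with the context window set as described above.

```python
# Minimal sketch of Skip-Gram training-pair extraction from a segmented
# monolingual corpus. The default window of 5 matches the context window
# mentioned above; the tiny corpus below is purely illustrative.

def skipgram_pairs(segmented_sentences, window=5):
    """Yield (center_word, context_word) pairs for Skip-Gram training."""
    pairs = []
    for sentence in segmented_sentences:
        for i, center in enumerate(sentence):
            lo = max(0, i - window)
            hi = min(len(sentence), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    pairs.append((center, sentence[j]))
    return pairs

corpus = [["little", "john", "found", "the", "toy", "box"]]
pairs = skipgram_pairs(corpus, window=2)
print(pairs[:3])
```

Each pair then serves as one positive example for the Skip-Gram objective, which learns word vectors that predict context words from center words.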
Fig. 2 is a structural schematic diagram of the multi-task learning neural network model in the calibration optimization method of machine translation provided by an embodiment according to the present invention, in which the original text language is English and the translation language is Chinese. The inputs of the model are the original text "Little John found for the toy box in the pen." and a machine translation in which "pen" is rendered as a writing pen. Through the two algorithms inside the model, namely the machine translation quality assessment algorithm and the automatic post-editing algorithm, the model finally outputs the machine translation quality assessment result (a quality label) and an automatically post-edited translation. Table 2 shows the input and output data of the process shown in Fig. 2.
Table 2. Input and output data of the process shown in Fig. 2
As shown in Table 2, after the original text to be translated, "Little John found for the toy box in the pen.", and its machine translation (in which "pen" is rendered as a writing pen) are input to the model, the model assesses the quality of the machine translation, producing the quality label "average" (grade 2), and at the same time performs automatic post-editing on the machine translation, outputting a post-edited translation in which "pen" is corrected to a fence (an enclosure). The post-edited result clearly fits the actual scene better and is semantically more accurate.
As another aspect of the embodiments of the present invention, and on the basis of the above embodiments, an embodiment of the present invention provides a calibration optimization device for machine translation, which is used to realize the calibration optimization of machine translation in the above embodiments. Therefore, the descriptions and definitions in the calibration optimization methods of the above embodiments can be used for the understanding of each execution module in the embodiment of the present invention; for details, refer to the above embodiments, which are not repeated here.
According to an embodiment of the present invention, the structure of the calibration optimization device for machine translation is shown in Fig. 3, which is a structural schematic diagram of the calibration optimization device for machine translation provided by an embodiment of the present invention. The device can be used to realize the calibration optimization of machine translation in each of the above method embodiments, and comprises: a data acquisition module 301 and an assessment and post-editing output module 302.
The data acquisition module 301 is configured to acquire the original text and the machine translation of the target document. The assessment and post-editing output module 302 is configured to, based on the original text and the machine translation of the target document, perform machine translation quality assessment using the trained multi-task learning neural network model, and to perform automatic post-editing on the machine translation of the target document. The trained multi-task learning neural network model is obtained in advance by iteratively training and updating a base multi-task learning neural network model with a certain number of training samples, where any training sample comprises a sample original text, a sample machine translation, a sample machine translation quality label, and a sample post-edited text.
Specifically, due to the limitations of machine translation itself, the quality of the translation directly output by a machine translation system usually cannot be guaranteed. Post-editing refers to further calibrating and editing the translation directly output by the machine translator, so that the semantics of the translated text better match the original. Therefore, before post-editing the machine translation, the data acquisition module 301 first acquires the original text and the machine translation; for example, they may be obtained from a machine translation engine or from a database.
The assessment and post-editing output module 302 then takes the original text and the machine translation as the data basis and performs a series of transformations to obtain data types that the multi-task learning neural network model can process. The module 302 inputs the transformed result into the multi-task learning neural network model; through the neural operations and propagation inside the model, machine translation quality assessment and automatic post-editing of the machine translation can be performed simultaneously.
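The data flow through the two modules can be sketched as follows. The class names, the stand-in transformation, and the stubbed multi-task model are hypothetical illustrations of the structure described above, not the patent's implementation; a real system would wrap the trained neural network.

```python
# Hypothetical sketch of modules 301 and 302. The "model" is a stub that
# stands in for the trained multi-task learning neural network.

class DataAcquisitionModule:
    """Module 301: obtains the original text and its machine translation,
    e.g. from an MT engine or a database."""
    def acquire(self, source):
        return source["original"], source["machine_translation"]

class AssessmentPostEditModule:
    """Module 302: transforms the texts into model inputs, then runs
    quality assessment and automatic post-editing in one pass."""
    def __init__(self, model):
        self.model = model

    def run(self, original, translation):
        features = (original.split(), translation.split())  # stand-in transform
        return self.model(features)

def stub_model(features):
    # Placeholder for the multi-task network: returns a quality label
    # and a "post-edited" translation (here, returned unchanged).
    src_tokens, mt_tokens = features
    return {"quality_label": 2, "post_edit": " ".join(mt_tokens)}

doc = {"original": "Little John found the toy box in the pen.",
       "machine_translation": "Little John found the toy box in the ink pen."}
original, mt = DataAcquisitionModule().acquire(doc)
result = AssessmentPostEditModule(stub_model).run(original, mt)
print(result["quality_label"], result["post_edit"])
```

The key design point is that one forward pass through the shared model produces both outputs, which is what couples the two tasks.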
It can be understood that, in order to accurately perform the machine translation quality assessment operation and the post-editing operation, a base multi-task learning neural network model needs to be initialized and constructed in advance, and trained with a certain number of training samples. When building the training sample set, any training sample includes at least the original text, the machine translation, the machine translation quality label, and the post-edited text corresponding to the sample, i.e., the sample original text, the sample machine translation, the sample machine translation quality label, and the sample post-edited text. The machine translation quality label characterizes the translation quality assessment of the sample machine translation.
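The iterative training of the base model (also recited in claim 2) can be sketched with a drastically simplified toy: a single shared linear "encoder" parameter feeds a quality-assessment head and a post-editing head, and the combined prediction error of both tasks drives gradient-descent updates. The scalar encoding of the post-edit target and all numeric choices are illustrative assumptions, not the patent's network.

```python
# Toy multi-task training loop: one shared parameter feeds two task heads;
# the prediction errors of both tasks are back-propagated into all
# parameters by gradient descent, sample by sample.

def train(samples, lr=0.05, epochs=500):
    w_shared, w_quality, w_edit = 1.0, 0.5, 0.5
    for _ in range(epochs):
        for x, quality_label, edit_label in samples:
            h = w_shared * x                      # shared representation
            q_err = w_quality * h - quality_label # quality-head error
            e_err = w_edit * h - edit_label       # post-edit-head error
            # gradient of the combined loss w.r.t. the shared parameter,
            # computed before the heads are updated
            g_shared = (q_err * w_quality + e_err * w_edit) * x
            w_quality -= lr * q_err * h
            w_edit    -= lr * e_err * h
            w_shared  -= lr * g_shared
    return w_shared, w_quality, w_edit

def predict(params, x):
    w_shared, w_quality, w_edit = params
    h = w_shared * x
    return w_quality * h, w_edit * h

params = train([(1.0, 2.0, 3.0)])  # (input, quality label, edit target)
print(predict(params, 1.0))
```

After training, both heads fit their respective labels through the shared representation, which is the essence of the multi-task coupling.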
The calibration optimization device for machine translation provided by the embodiment of the present invention, by setting up the corresponding execution modules, uses a multi-task learning neural network model trained in advance with training samples containing sample machine translation quality labels and sample post-edited texts. It can thereby effectively and closely combine the two mutually independent yet closely related tasks of automatic machine translation quality assessment and automatic post-editing, so as to more effectively improve post-editing efficiency and translation quality.
It can be understood that, in the embodiments of the present invention, each relevant program module in the devices of the above embodiments may be implemented by a hardware processor. Moreover, using the above program modules, the calibration optimization device for machine translation of the embodiment of the present invention can realize the calibration optimization procedures of the above method embodiments. When it is used to realize the calibration optimization of machine translation in the above method embodiments, the beneficial effects produced by the device of the embodiment of the present invention are the same as those of the corresponding method embodiments; refer to the above method embodiments, and details are not repeated here.
As yet another aspect of the embodiments of the present invention, this embodiment provides an electronic device according to the above embodiments. With reference to Fig. 4, which is a schematic diagram of the physical structure of the electronic device provided by an embodiment of the present invention, the electronic device comprises: at least one memory 401, at least one processor 402, a communication interface 403, and a bus 404.
The memory 401, the processor 402, and the communication interface 403 communicate with each other through the bus 404, and the communication interface 403 is used for information transmission between the electronic device and a machine translation device. The memory 401 stores a computer program that can run on the processor 402; when the processor 402 executes the computer program, the calibration optimization method of machine translation described in the above embodiments is realized.
It is to be understood that the electronic device includes at least the memory 401, the processor 402, the communication interface 403, and the bus 404, and that the memory 401, the processor 402, and the communication interface 403 form mutual communication connections through the bus 404 and can complete mutual communication, for example, the processor 402 reading the program instructions of the calibration optimization method of machine translation from the memory 401. In addition, the communication interface 403 can also realize the communication connection between the electronic device and a machine translation device and can complete mutual information transmission, for example, realizing the calibration optimization of machine translation through the communication interface 403.
When the electronic device runs, the processor 402 calls the program instructions in the memory 401 to execute the methods provided by the above method embodiments, for example: performing machine translation quality assessment based on the original text and the machine translation of the target document using the trained multi-task learning neural network model, and performing automatic post-editing on the machine translation of the target document.
The program instructions in the memory 401 may be implemented in the form of software functional units and, when sold or used as an independent product, may be stored in a computer-readable storage medium. Alternatively, all or part of the steps of the above method embodiments may be completed by hardware related to the program instructions. The aforementioned program may be stored in a computer-readable storage medium; when executed, the program performs the steps of the above method embodiments. The aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
According to the above embodiments, an embodiment of the present invention also provides a non-transitory computer-readable storage medium. The non-transitory computer-readable storage medium stores computer instructions, and the computer instructions cause a computer to execute the calibration optimization method of machine translation described in the above embodiments, for example: performing machine translation quality assessment based on the original text and the machine translation of the target document using the trained multi-task learning neural network model, and performing automatic post-editing on the machine translation of the target document.
By executing the calibration optimization method of machine translation described in the above embodiments, the electronic device and the non-transitory computer-readable storage medium provided by the embodiments of the present invention use a multi-task learning neural network model trained in advance with training samples containing sample machine translation quality labels and sample post-edited texts. They can thereby effectively and closely combine the two mutually independent yet closely related tasks of automatic machine translation quality assessment and automatic post-editing, so as to more effectively improve post-editing efficiency and translation quality.
It can be understood that the embodiments of the device, the electronic device, and the storage medium described above are merely schematic, in which the units described as separate components may or may not be physically separated; they may be located in one place or distributed over different network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement them without creative labor.
From the description of the above embodiments, those skilled in the art can clearly understand that each embodiment can be realized by software plus the necessary general hardware platform, or of course by hardware. Based on this understanding, the above technical solutions, or the parts thereof that contribute to the prior art, can be embodied in the form of a software product. The computer software product may be stored in a computer-readable storage medium, such as a USB flash drive, a removable hard disk, a ROM, a RAM, a magnetic disk, or an optical disk, and includes instructions for causing a computer device (such as a personal computer, a server, or a network device) to execute the methods described in each method embodiment or in certain parts of the method embodiments.
In addition, those skilled in the art should understand that, in the application documents of the embodiments of the present invention, the terms "include", "comprise", or any other variant thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or device that includes a series of elements not only includes those elements but also includes other elements not explicitly listed, or also includes elements inherent to such a process, method, article, or device. In the absence of further restrictions, an element defined by the phrase "including a ..." does not exclude the presence of other identical elements in the process, method, article, or device that includes the element.
In the specification of the embodiments of the present invention, numerous specific details are set forth. It should be understood, however, that the embodiments may be practiced without these specific details. In some instances, well-known methods, structures, and techniques have not been shown in detail so as not to obscure the understanding of this specification. Similarly, it should be understood that, in order to streamline the disclosure and aid the understanding of one or more of the various inventive aspects, in the above description of exemplary embodiments, the features of the embodiments are sometimes grouped together into a single embodiment, figure, or description thereof.
However, the disclosed method should not be interpreted as reflecting the intention that the claimed embodiments require more features than are expressly recited in each claim. Rather, as the claims reflect, inventive aspects lie in less than all features of a single disclosed embodiment. Therefore, the claims following the detailed description are hereby expressly incorporated into the detailed description, with each claim standing on its own as a separate embodiment of the present invention.
Finally, it should be noted that the above embodiments are only intended to illustrate, rather than limit, the technical solutions of the embodiments of the present invention. Although the embodiments of the present invention have been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that the technical solutions described in the foregoing embodiments can still be modified, or some of their technical features can be equivalently replaced; such modifications or replacements do not depart, in essence, from the spirit and scope of the technical solutions of the embodiments of the present invention.
Claims (10)
1. A calibration optimization method of machine translation, characterized by comprising:
performing machine translation quality assessment based on an original text and a machine translation of a target document by using a trained multi-task learning neural network model, and performing automatic post-editing on the machine translation of the target document;
wherein the trained multi-task learning neural network model is obtained in advance by iteratively training and updating a base multi-task learning neural network model with a certain number of training samples, and any training sample comprises a sample original text, a sample machine translation, a sample machine translation quality label, and a sample post-edited text.
2. The method according to claim 1, characterized in that the step of obtaining the trained multi-task learning neural network model by training with the training samples specifically comprises:
for any training sample, outputting a predicted machine translation quality and a predicted post-edited text of the training sample based on the training sample by using the base multi-task learning neural network model;
comparing the predicted machine translation quality with the sample machine translation quality label, and the predicted post-edited text with the sample post-edited text, respectively, to obtain a prediction error;
based on the prediction error, updating the parameters of the base multi-task learning neural network model by using a back-propagation algorithm and a gradient descent algorithm, and taking the updated base multi-task learning neural network model as the base multi-task learning neural network model for the next training sample, until the trained multi-task learning neural network model is obtained.
3. The method according to claim 2, characterized by, before the step of obtaining the trained multi-task learning neural network model by training with the training samples, further comprising:
acquiring a post-editing cost of the sample machine translation, and obtaining the sample machine translation quality label based on the post-editing cost and the sample original text through normalization and grading;
wherein the post-editing cost indicates the cost spent on post-editing the sample machine translation to obtain the sample post-edited text.
4. The method according to claim 3, characterized in that the step of obtaining the sample machine translation quality label based on the post-editing cost and the sample original text through normalization and grading specifically comprises:
dividing the post-editing cost by the length of the sample original text, and normalizing the result of the division;
converting the normalized result into one of different grades of the sample machine translation quality label based on the value of the normalized result.
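An illustrative, non-limiting sketch of this label-construction step follows: divide the post-editing cost by the source length, squash the ratio into [0, 1), and bucket it into discrete quality grades. The squashing function and the grade thresholds are hypothetical choices for illustration, not values taken from the patent.

```python
# Sketch of claims 3-4: post-editing cost -> normalized ratio -> grade.
# Thresholds and the squashing function are illustrative assumptions.

def quality_label(postedit_cost, source_length):
    ratio = postedit_cost / source_length  # division by source-text length
    normalized = ratio / (1.0 + ratio)     # maps [0, inf) into [0, 1)
    if normalized < 0.2:
        return 4   # excellent: little post-editing needed
    if normalized < 0.4:
        return 3   # good
    if normalized < 0.6:
        return 2   # average
    return 1       # poor: heavy post-editing needed

print(quality_label(10, 10))  # cost equal to source length
```

Higher post-editing cost relative to the source length thus yields a lower quality grade, matching the intuition that a translation needing more editing is worse.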
5. The method according to claim 1, characterized in that the step of performing machine translation quality assessment and performing automatic post-editing on the machine translation of the target document specifically comprises:
performing word segmentation on the original text and the machine translation of the target document respectively, and inputting the word segmentation results into trained original-text and translation word vector models to extract original text word vectors and machine translation word vectors;
inputting the original text word vectors and the machine translation word vectors into the trained multi-task learning neural network model, to output a machine translation quality assessment result and an automatically post-edited text of the machine translation.
6. The method according to claim 5, characterized by, before the step of inputting the word segmentation results into the trained original-text and translation word vector models, further comprising:
acquiring standard monolingual corpora of an original text language and a translation language respectively, and performing word segmentation on the standard monolingual corpora of the original text language and the translation language respectively;
training base original-text and translation word vector models on the segmented standard monolingual corpora by using the Skip-Gram algorithm, and setting model hyperparameters, to obtain the trained original-text and translation word vector models;
wherein the original text language is the language corresponding to the original text of the target document, and the translation language is the language corresponding to the machine translation of the target document.
7. The method according to claim 3, characterized in that the step of acquiring the post-editing cost of the sample machine translation specifically comprises:
in the process of post-editing the sample machine translation to obtain the sample post-edited text, counting the total number of keystrokes made during post-editing, and calculating the post-editing cost from that count.
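Claim 7 computes the post-editing cost by counting keystrokes during post-editing. When keystroke logs are unavailable, a common proxy (an assumption here, not the patent's method) is the character-level Levenshtein distance between the machine translation and the post-edited text, since each single-character edit roughly corresponds to at least one keystroke.

```python
# Character-level Levenshtein distance as a keystroke-count proxy for the
# post-editing cost. This proxy is an illustrative assumption.

def levenshtein(a, b):
    """Minimum number of single-character edits turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

mt = "in the ink pen"
post_edited = "in the playpen"
print(levenshtein(mt, post_edited))
```

The resulting edit count can then be divided by the source-text length and graded as in claim 4.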
8. A calibration optimization device of machine translation, characterized by comprising:
a data acquisition module, configured to acquire an original text and a machine translation of a target document;
an assessment and post-editing output module, configured to perform machine translation quality assessment based on the original text and the machine translation of the target document by using a trained multi-task learning neural network model, and to perform automatic post-editing on the machine translation of the target document;
wherein the trained multi-task learning neural network model is obtained in advance by iteratively training and updating a base multi-task learning neural network model with a certain number of training samples, and any training sample comprises a sample original text, a sample machine translation, a sample machine translation quality label, and a sample post-edited text.
9. An electronic device, characterized by comprising: at least one memory, at least one processor, a communication interface, and a bus;
wherein the memory, the processor, and the communication interface communicate with each other through the bus, and the communication interface is also used for information transmission between the electronic device and a machine translation device;
the memory stores a computer program that can run on the processor, and when the processor executes the computer program, the method according to any one of claims 1 to 7 is realized.
10. A non-transitory computer-readable storage medium, characterized in that the non-transitory computer-readable storage medium stores computer instructions, and the computer instructions cause a computer to execute the method according to any one of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910066709.4A CN109670191B (en) | 2019-01-24 | 2019-01-24 | Calibration optimization method and device for machine translation and electronic equipment |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109670191A true CN109670191A (en) | 2019-04-23 |
CN109670191B CN109670191B (en) | 2023-03-07 |
Family
ID=66149728
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910066709.4A Active CN109670191B (en) | 2019-01-24 | 2019-01-24 | Calibration optimization method and device for machine translation and electronic equipment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109670191B (en) |
Patent Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20090326913A1 (en) * | 2007-01-10 | 2009-12-31 | Michel Simard | Means and method for automatic post-editing of translations |
US8326598B1 (en) * | 2007-03-26 | 2012-12-04 | Google Inc. | Consensus translations from multiple machine translation systems |
CN106484682A (en) * | 2015-08-25 | 2017-03-08 | 阿里巴巴集团控股有限公司 | Based on the machine translation method of statistics, device and electronic equipment |
CN106649282A (en) * | 2015-10-30 | 2017-05-10 | 阿里巴巴集团控股有限公司 | Machine translation method and device based on statistics, and electronic equipment |
CN105701089A (en) * | 2015-12-31 | 2016-06-22 | 成都数联铭品科技有限公司 | Post-editing processing method for correction of wrong words in machine translation |
US20170323203A1 (en) * | 2016-05-06 | 2017-11-09 | Ebay Inc. | Using meta-information in neural machine translation |
CN109074242A (en) * | 2016-05-06 | 2018-12-21 | 电子湾有限公司 | Metamessage is used in neural machine translation |
CN107301174A (en) * | 2017-06-22 | 2017-10-27 | 北京理工大学 | A kind of automatic post-editing system and method for integrated form based on splicing |
Cited By (20)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110175335A (en) * | 2019-05-08 | 2019-08-27 | 北京百度网讯科技有限公司 | The training method and device of translation model |
CN110175335B (en) * | 2019-05-08 | 2023-05-09 | 北京百度网讯科技有限公司 | Translation model training method and device |
CN110532575A (en) * | 2019-08-21 | 2019-12-03 | 语联网(武汉)信息技术有限公司 | Text interpretation method and device |
CN110543643A (en) * | 2019-08-21 | 2019-12-06 | 语联网(武汉)信息技术有限公司 | Training method and device of text translation model |
CN110555213A (en) * | 2019-08-21 | 2019-12-10 | 语联网(武汉)信息技术有限公司 | training method of text translation model, and text translation method and device |
CN110543643B (en) * | 2019-08-21 | 2022-11-11 | 语联网(武汉)信息技术有限公司 | Training method and device of text translation model |
CN110765791A (en) * | 2019-11-01 | 2020-02-07 | 清华大学 | Automatic post-editing method and device for machine translation |
CN110765791B (en) * | 2019-11-01 | 2021-04-06 | 清华大学 | Automatic post-editing method and device for machine translation |
CN111144137A (en) * | 2019-12-17 | 2020-05-12 | 语联网(武汉)信息技术有限公司 | Method and device for generating edited model corpus after machine translation |
CN111144137B (en) * | 2019-12-17 | 2023-09-05 | 语联网(武汉)信息技术有限公司 | Method and device for generating corpus of machine post-translation editing model |
WO2021114625A1 (en) * | 2020-05-28 | 2021-06-17 | 平安科技(深圳)有限公司 | Network structure construction method and apparatus for use in multi-task scenario |
WO2022088570A1 (en) * | 2020-10-29 | 2022-05-05 | 语联网(武汉)信息技术有限公司 | Method and apparatus for post-editing of translation, electronic device, and storage medium |
CN112364990B (en) * | 2020-10-29 | 2021-06-04 | 北京语言大学 | Method and system for realizing grammar error correction and less sample field adaptation through meta-learning |
CN112364990A (en) * | 2020-10-29 | 2021-02-12 | 北京语言大学 | Method and system for realizing grammar error correction and less sample field adaptation through meta-learning |
CN112287696A (en) * | 2020-10-29 | 2021-01-29 | 语联网(武汉)信息技术有限公司 | Post-translation editing method and device, electronic equipment and storage medium |
CN112287696B (en) * | 2020-10-29 | 2024-02-23 | 语联网(武汉)信息技术有限公司 | Post-translation editing method and device, electronic equipment and storage medium |
WO2022166267A1 (en) * | 2021-02-07 | 2022-08-11 | 语联网(武汉)信息技术有限公司 | Machine translation post-editing method and system |
CN114444523A (en) * | 2022-02-10 | 2022-05-06 | 北京间微科技有限责任公司 | Portable off-line machine translation intelligent box |
CN115328871A (en) * | 2022-10-12 | 2022-11-11 | 南通中泓网络科技有限公司 | Evaluation method for format data stream file conversion based on machine learning model |
CN116992892A (en) * | 2023-07-06 | 2023-11-03 | 四川语言桥信息技术有限公司 | Method, system and readable storage medium for improving APE model based on data enhancement and multitasking training |
Also Published As
Publication number | Publication date |
---|---|
CN109670191B (en) | 2023-03-07 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109670191A (en) | Calibration optimization method, device and the electronic equipment of machine translation | |
CN110633409B (en) | Automobile news event extraction method integrating rules and deep learning | |
CN109145294B (en) | Text entity identification method and device, electronic equipment and storage medium | |
CN110287481A (en) | Name entity corpus labeling training system | |
CN108959246A (en) | Answer selection method, device and electronic equipment based on improved attention mechanism | |
CN111783993A (en) | Intelligent labeling method and device, intelligent platform and storage medium | |
Zhang et al. | A tree-BLSTM-based recognition system for online handwritten mathematical expressions | |
Patnaik et al. | Intelligent and adaptive web data extraction system using convolutional and long short-term memory deep learning networks | |
CN110334186A (en) | Data query method, apparatus, computer equipment and computer readable storage medium | |
CN111027600B (en) | Image category prediction method and device | |
CN116541911B (en) | Packaging design system based on artificial intelligence | |
US20220100772A1 (en) | Context-sensitive linking of entities to private databases | |
Ding et al. | An encoder-decoder approach to handwritten mathematical expression recognition with multi-head attention and stacked decoder | |
WO2023236977A1 (en) | Data processing method and related device | |
CN110852089A (en) | Operation and maintenance project management method based on intelligent word segmentation and deep learning | |
CN113705733A (en) | Medical bill image processing method and device, electronic device and storage medium | |
CN116029273A (en) | Text processing method, device, computer equipment and storage medium | |
CN114240672B (en) | Method for identifying duty ratio of green asset and related product | |
Malode | Benchmarking public large language model | |
CN113902569A (en) | Method for identifying the proportion of green assets in digital assets and related products | |
EP4222635A1 (en) | Lifecycle management for customized natural language processing | |
US20220100967A1 (en) | Lifecycle management for customized natural language processing | |
CN107958061A (en) | The computational methods and computer-readable recording medium of a kind of text similarity | |
CN116721713A (en) | Data set construction method and device oriented to chemical structural formula identification | |
CN116432611A (en) | Manuscript writing auxiliary method, system, terminal and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||