CN109670191A - Calibration optimization method, device and electronic equipment for machine translation

Info

Publication number
CN109670191A
Authority
CN
China
Prior art keywords
sample
translation
postedit
training
machine translation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910066709.4A
Other languages
Chinese (zh)
Other versions
CN109670191B (en)
Inventor
张睦
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Language Network (wuhan) Information Technology Co Ltd
Original Assignee
Language Network (wuhan) Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Language Network (wuhan) Information Technology Co Ltd
Priority to CN201910066709.4A
Publication of CN109670191A
Application granted
Publication of CN109670191B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/40 Processing or translation of natural language
    • G06F40/58 Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40 Document-oriented image-based pattern recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40 Document-oriented image-based pattern recognition
    • G06V30/41 Analysis of document content
    • G06V30/418 Document matching, e.g. of document images

Abstract

Embodiments of the present invention provide a calibration optimization method, device and electronic equipment for machine translation. The method comprises: based on the original text and the machine translation of a target document, using a trained multi-task learning neural network model to assess the quality of the machine translation and to automatically post-edit the machine translation of the target document; wherein the trained multi-task learning neural network model is obtained in advance by iteratively training and updating a basic multi-task learning neural network model with a certain number of training samples, each training sample comprising a sample original text, a sample machine translation, a sample machine translation quality label and a sample post-edited text. The embodiments effectively and tightly combine automatic quality estimation of machine translation and automatic post-editing, two tasks that are independent of each other yet closely related, and can thereby improve post-editing efficiency and translation quality more effectively.

Description

Calibration optimization method, device and electronic equipment for machine translation
Technical field
Embodiments of the present invention relate to the technical field of machine translation, and more particularly to a calibration optimization method, device and electronic equipment for machine translation.
Background art
Machine translation is the process of using a computer to convert one natural language (the source language) into another natural language (the target language). Although the overall translation quality of current machine translation systems keeps improving, the stability of that quality cannot be guaranteed. For example, for some specialized translation tasks the output quality sometimes fails to meet a required standard. In such cases, translators who post-edit these translations of fluctuating quality often have to spend a great deal of time and effort reviewing and revising them, which inevitably reduces their working efficiency.
Quality Estimation for Machine Translation (QE) can automatically predict the quality of the sentences output by a machine translation engine. With explicit quality annotations, a translator can easily choose between post-editing the machine translation result and translating from scratch.
In practice, however, intelligent assistance is often expected to help with the translation itself. This task is Automatic Post-Editing (APE). Automatic post-editing does not take place inside the machine translation engine; instead, it treats the engine as a black box and automatically corrects the output of that black box from the outside to obtain a translation of higher quality.
How to effectively combine automatic quality estimation of machine translation and automatic post-editing, two mutually independent tasks, so as to improve translation quality more effectively has therefore become a problem that the industry urgently needs to solve.
Summary of the invention
In order to overcome the above problem, or at least partially solve it, embodiments of the present invention provide a calibration optimization method, device and electronic equipment for machine translation that effectively combine automatic quality estimation of machine translation with automatic post-editing, so that post-editing efficiency and translation quality can be improved more effectively.
In a first aspect, an embodiment of the present invention provides a calibration optimization method for machine translation, comprising:
based on the original text and the machine translation of a target document, using a trained multi-task learning neural network model to assess the quality of the machine translation and to automatically post-edit the machine translation of the target document;
wherein the trained multi-task learning neural network model is obtained in advance by iteratively training and updating a basic multi-task learning neural network model with a certain number of training samples, and each training sample comprises a sample original text, a sample machine translation, a sample machine translation quality label and a sample post-edited text.
In a second aspect, an embodiment of the present invention provides a calibration optimization device for machine translation, comprising:
a data acquisition module, configured to obtain the original text and the machine translation of a target document;
an assessment and post-editing output module, configured to use a trained multi-task learning neural network model, based on the original text and the machine translation of the target document, to assess the quality of the machine translation and to automatically post-edit the machine translation of the target document;
wherein the trained multi-task learning neural network model is obtained in advance by iteratively training and updating a basic multi-task learning neural network model with a certain number of training samples, and each training sample comprises a sample original text, a sample machine translation, a sample machine translation quality label and a sample post-edited text.
In a third aspect, an embodiment of the present invention provides electronic equipment, comprising: at least one memory, at least one processor, a communication interface and a bus; the memory, the processor and the communication interface communicate with one another through the bus, and the communication interface is used for information transmission between the electronic equipment and a machine translation device; the memory stores a computer program executable on the processor, and when the processor executes the computer program, the calibration optimization method for machine translation described in the first aspect is implemented.
In a fourth aspect, an embodiment of the present invention provides a non-transitory computer-readable storage medium storing computer instructions that cause a computer to execute the calibration optimization method for machine translation described in the first aspect.
With the calibration optimization method, device and electronic equipment for machine translation provided by the embodiments of the present invention, a multi-task learning neural network model is used and is trained in advance on training samples containing sample machine translation quality labels and sample post-edited texts. This effectively and tightly combines automatic quality estimation of machine translation and automatic post-editing, two tasks that are independent of each other yet closely related, so that post-editing efficiency and translation quality can be improved more effectively.
Brief description of the drawings
In order to explain the technical solutions of the embodiments of the present invention or of the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show some embodiments of the present invention, and a person of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flow diagram of the calibration optimization method for machine translation provided by an embodiment of the present invention;
Fig. 2 is a structural diagram of the multi-task learning neural network model in the calibration optimization method for machine translation provided by an embodiment of the present invention;
Fig. 3 is a structural diagram of the calibration optimization device for machine translation provided by an embodiment of the present invention;
Fig. 4 is a schematic diagram of the physical structure of the electronic equipment provided by an embodiment of the present invention.
Detailed description of the embodiments
In order to make the objectives, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the drawings. Obviously, the described embodiments are some, but not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art from these embodiments without creative effort fall within the scope of protection of the embodiments of the present invention.
In the prior art, post-editing is done manually, which is time-consuming, laborious and inefficient. To address these problems, the embodiments of the present invention use a multi-task learning neural network model, trained in advance on training samples containing sample machine translation quality labels and sample post-edited texts, to effectively and tightly combine automatic quality estimation of machine translation and automatic post-editing, two tasks that are independent of each other yet closely related, so that post-editing efficiency and translation quality can be improved more effectively. The embodiments of the present invention are explained and introduced below through several embodiments.
Fig. 1 is a flow diagram of the calibration optimization method for machine translation provided by an embodiment of the present invention. As shown in Fig. 1, the method comprises:
S101: obtain the original text and the machine translation of a target document.
It can be understood that a target document to be processed is usually expressed in one language; this is the original text. Translating the target document converts it into an expression with the same meaning in another language; this is the translation. When a document is translated by machine, the original text of the target document is input and the machine translator outputs the corresponding machine translation.
Owing to the limitations of machine translation itself, the quality of the machine translation that it outputs directly usually cannot be guaranteed. For example, for the original text "Little John found for the toy box in the pen", machine translation may produce "small John has found toy box in pen"; the result clearly differs from the real scene. Post-editing refers to further calibrating and editing the machine translation output directly by the machine translator so that the meaning of the translation agrees more closely with the original text. Therefore, before the machine translation is post-edited, the original text and the machine translation must first be obtained. For example, they can be obtained from a machine translation engine or from a database.
S102: based on the original text and the machine translation of the target document, use the trained multi-task learning neural network model to assess the quality of the machine translation and to automatically post-edit the machine translation of the target document.
Here, the trained multi-task learning neural network model is obtained in advance by iteratively training and updating a basic multi-task learning neural network model with a certain number of training samples, and each training sample comprises a sample original text, a sample machine translation, a sample machine translation quality label and a sample post-edited text.
It can be understood that although automatic quality estimation of machine translation and automatic post-editing are two independent tasks, they are strongly correlated. First, both tasks are implemented with supervised machine learning models trained on similar data sets, namely the original text to be translated, the machine translation and the translator's post-edited translation. Second, combining the two through suitable strategies tends to give better output.
Therefore, the embodiments of the present invention use a neural network model based on multi-task learning (Multi-Task Learning, MTL) to perform sentence-level machine translation quality estimation and automatic post-editing, two strongly correlated tasks, at the same time. Specifically, after the original text and the machine translation of the target document have been obtained in the above step, they are taken as the underlying data and passed through a series of transformations to obtain a data type that the multi-task learning neural network model can process. The transformed result is then input into the multi-task learning neural network model, and through the computation and propagation of the neurons inside the model, the quality of the machine translation is assessed and the machine translation is automatically post-edited at the same time.
For example, for the above machine translation "small John has found toy box in pen", automatic post-editing may output the automatically post-edited text "small John has found toy box in fence". At the same time, the quality assessment result output for this machine translation may be "average quality".
It can be understood that, in order to perform quality assessment and post-editing accurately, a basic multi-task learning neural network model needs to be built and initialized in advance and then trained with a certain number of training samples. When the training sample set is built, each training sample includes at least the corresponding original text, machine translation, machine translation quality label and post-edited text, i.e. the sample original text, sample machine translation, sample machine translation quality label and sample post-edited text. The machine translation quality label represents an assessment of the translation quality of the sample machine translation produced by machine translation.
With the calibration optimization method for machine translation provided by the embodiments of the present invention, a multi-task learning neural network model is used and is trained in advance on training samples containing sample machine translation quality labels and sample post-edited texts. This effectively and tightly combines automatic quality estimation of machine translation and automatic post-editing, two tasks that are independent of each other yet closely related, so that post-editing efficiency and translation quality can be improved more effectively.
Optionally, on the basis of the above embodiment, the step of training with the training samples to obtain the trained multi-task learning neural network model specifically comprises: for any training sample, using the basic multi-task learning neural network model to output, based on the training sample, a predicted machine translation quality and a predicted post-edited text for that sample; comparing the predicted machine translation quality with the sample machine translation quality label, and the predicted post-edited text with the sample post-edited text, to obtain prediction errors; and, based on the prediction errors, updating the parameters of the basic multi-task learning neural network model with the back-propagation algorithm and the gradient descent algorithm, and using the updated model as the basic multi-task learning neural network model for the next training sample, until the trained multi-task learning neural network model is obtained.
It can be understood that, before training the multi-task learning neural network model, the embodiments first collect data and pre-process it to obtain a data set of a certain size. A certain amount of data can be selected from this data set as training samples, and the basic multi-task learning neural network model that has been built is trained and updated with these samples one by one; the model that finally meets a given accuracy requirement is the trained multi-task learning neural network model.
When data is collected and processed, the following needs to be collected and processed for each record: the original text of a document to be translated, which can be segmented into words; the machine translation of that document, which can be segmented; the translator's post-edited version of that machine translation, which can be segmented; and the quality label obtained by assessing the quality of that machine translation.
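The four items collected for each record can be pictured as a small data structure. The following is a minimal sketch only: the class name, the field names and the whitespace-based segmentation are assumptions made for illustration and are not specified in the patent, and the example sentences are made up.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class TrainingSample:
    """One training record; class and field names are illustrative assumptions."""
    source_tokens: List[str]     # segmented original text
    mt_tokens: List[str]         # segmented machine translation
    post_edit_tokens: List[str]  # segmented human post-edited translation
    quality_label: int           # quality label of the machine translation


def make_sample(source: str, mt: str, post_edit: str, quality_label: int) -> TrainingSample:
    # Whitespace splitting stands in for a real word segmenter (an assumption).
    return TrainingSample(source.split(), mt.split(), post_edit.split(), quality_label)


# Made-up, pre-segmented example record.
sample = make_sample("the box is in the pen", "盒子 在 钢笔 里", "盒子 在 围栏 里", 2)
print(sample.mt_tokens, sample.quality_label)
```

A real pipeline would replace `str.split` with a proper word segmenter for each language, especially for Chinese text that is not already space-separated.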
Afterwards, t records can be randomly selected from the whole data set to form a set M used for training and testing the model. M can be expressed as M = {(m_11, m_12, m_13, m_14), …, (m_t1, m_t2, m_t3, m_t4)}, where (m_i1, m_i2, m_i3, m_i4) denotes the (segmented original text, segmented machine translation, segmented post-edited translation, quality label) of the i-th record. The original order of M is then shuffled; 80% of the data can be chosen to form the training set M_train, and the remaining 20% forms the validation set M_test.
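The shuffle and 80/20 split described above can be sketched as follows. The function name, the fixed random seed and the toy records are assumptions added for illustration; only the shuffle and the 80%/20% proportions come from the text.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def shuffle_and_split(records: Sequence[T], train_fraction: float = 0.8,
                      seed: int = 0) -> Tuple[List[T], List[T]]:
    """Shuffle a copy of the records, then split it into (training, validation)."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # seeded only to make the sketch repeatable
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Toy stand-in for M: (original text, machine translation, post-edit, quality label).
M = [(f"src{i}", f"mt{i}", f"pe{i}", 1 + i % 4) for i in range(10)]
m_train, m_test = shuffle_and_split(M)
print(len(m_train), len(m_test))  # 8 2
```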
When the model is trained with the above training samples one by one, the data contained in a training sample is first transformed into a data type that the multi-task learning neural network model can process. The transformed data is then input into the basic multi-task learning neural network model to be trained, and a forward pass through that model yields the predicted machine translation quality and the predicted post-edited text for the training sample.
Next, the predicted machine translation quality is compared with the sample machine translation quality label to obtain one prediction error, the first prediction error. At the same time, the predicted post-edited text is compared with the sample post-edited text to obtain another, the second prediction error. These two prediction errors are then used to judge whether the current multi-task learning neural network model has reached the required prediction accuracy. If so, training is complete and the current model is taken as the trained multi-task learning neural network model. Otherwise, the two prediction errors are back-propagated through the multi-task learning neural network model being trained, and the parameters of the basic multi-task learning neural network model are updated by gradient descent.
The next training sample is then taken, with the basic multi-task learning neural network model whose parameters have just been updated as its training object, and the above training and update process is repeated until the prediction errors show that the model has reached the required prediction accuracy after some round of training. Training is then confirmed to be complete, and the multi-task learning neural network model at that point is taken as the trained multi-task learning neural network model.
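The patent does not state how the two prediction errors are computed or how the accuracy check is made. As one plausible reading, the sketch below uses a negative log-likelihood for each task and a simple threshold as the stopping rule; the function names, the loss choice, the tolerance and the example probabilities are all assumptions, and the actual back-propagation step is left to whatever neural network framework is used.

```python
import math
from typing import Sequence


def quality_error(label_probs: Sequence[float], true_label: int) -> float:
    """First prediction error: negative log-probability of the true label (1-based)."""
    return -math.log(label_probs[true_label - 1])


def post_edit_error(reference_token_probs: Sequence[float]) -> float:
    """Second prediction error: mean negative log-probability of the reference tokens."""
    total = sum(math.log(p) for p in reference_token_probs)
    return -total / len(reference_token_probs)


def accurate_enough(first_error: float, second_error: float, tolerance: float) -> bool:
    """Hypothetical stopping rule: both errors must fall below the tolerance."""
    return first_error < tolerance and second_error < tolerance


# Made-up model outputs for one training sample.
first = quality_error([0.1, 0.7, 0.1, 0.1], true_label=2)
second = post_edit_error([0.9, 0.8, 0.95])
if not accurate_enough(first, second, tolerance=0.05):
    joint = first + second  # the quantity a framework would back-propagate
```

Summing the two errors with equal weight is the simplest way to train both tasks through shared parameters; a weighted sum is a common alternative when one task dominates.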
It can be understood that, once the trained multi-task learning neural network model has been obtained, the model is validated with the data in the above validation set in order to further verify how well it generalizes. If validation shows that its accuracy meets the requirement, the model is confirmed to be reliable and can be used for calibration optimization of actual machine translation.
It can be understood that, before the step of training with the training samples to obtain the trained multi-task learning neural network model, the method of the embodiments may further comprise: obtaining the post-editing cost of the sample machine translation and, based on the post-editing cost and the sample original text, obtaining the sample machine translation quality label through normalization and division into intervals. The post-editing cost denotes the cost of post-editing the sample machine translation to obtain the sample post-edited text.
For any training sample, the embodiments can first obtain the post-editing cost of the process that turns the sample machine translation into the sample post-edited text. For example, the step of obtaining the post-editing cost of the sample machine translation may specifically comprise: while the sample machine translation is being post-edited to obtain the sample post-edited text, counting the total number of keystrokes made during post-editing and taking this count as the post-editing cost.
The post-editing cost then needs to be converted into a machine translation quality label that the machine can recognize. It can be understood that the larger the post-editing cost, the poorer the quality of the machine translation, and vice versa. In the conversion, a normalization calculation is therefore first performed on the post-editing cost and the sample original text to eliminate differences between sample original texts. In addition, so that the quality of the machine translation can be graded according to the post-editing cost, the range of the normalized result is divided into intervals, and a different label is defined for each interval, which gives the sample machine translation quality label.
By defining different quality labels for different levels of machine translation quality, the embodiments allow the multi-task learning neural network model to classify quality levels more accurately during quality-estimation training, so that automatic post-editing is better adapted to the quality of the machine translation and the post-editing result is more accurate.
Optionally, the step of obtaining the sample machine translation quality label from the post-editing cost and the sample original text through normalization and division into intervals specifically comprises: dividing the post-editing cost by the length of the sample original text and normalizing the quotient; and converting the normalized result into a sample machine translation quality label of the corresponding grade according to its value.
The embodiments first divide the post-editing cost by the length of the original text (which can be determined from the total number of words in the original text) to remove the effect of differing document lengths. The quotient is then normalized, for example converted into a value between 0 and 1, which is the normalized result. Finally, the interval from 0 to 1 is divided into several contiguous sub-intervals, and the normalized result is converted into the quality label of the sub-interval in which it lies. Taking a division of the interval from 0 to 1 into four contiguous sub-intervals as an example, the correspondence between normalized results and quality labels shown in Table 1 is obtained.
Table 1. Correspondence between normalized results and quality labels
Normalized result x    Quality label
0 ≤ x < 0.25           4 (high quality)
0.25 ≤ x < 0.5         3 (good quality)
0.5 ≤ x < 0.75         2 (average quality)
0.75 ≤ x ≤ 1.0         1 (poor quality)
As Table 1 shows, the above normalization and conversion turn a post-editing cost that carries document-specific differences into a quality label from which those differences have been removed, so that training and optimization of the model can be completed faster and more accurately.
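The conversion from post-editing cost to the quality labels of Table 1 can be sketched as follows. The text says the quotient is normalized into a value between 0 and 1 but does not say how; scaling by a hypothetical `max_ratio` parameter and clipping at 1.0 is an assumption made here, as is the function name.

```python
def quality_label(post_edit_cost: int, source_length: int, max_ratio: float = 1.0) -> int:
    """Map a post-editing cost to the 1-4 quality label of Table 1."""
    if source_length <= 0:
        raise ValueError("source_length must be positive")
    ratio = post_edit_cost / source_length  # remove the effect of document length
    x = min(ratio / max_ratio, 1.0)         # normalize into [0, 1]; clipping is an assumption
    if x < 0.25:
        return 4  # high quality
    if x < 0.5:
        return 3  # good quality
    if x < 0.75:
        return 2  # average quality
    return 1      # poor quality


print(quality_label(3, 10))  # 3
```

Because the cost is counted in keystrokes and the length in words, the raw quotient can easily exceed 1; in practice `max_ratio` would have to be chosen from the data, for example as the largest quotient observed in the training set.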
Optionally, on the basis of the above embodiment, the step of assessing the quality of the machine translation and automatically post-editing the machine translation of the target document specifically comprises: segmenting the original text and the machine translation of the target document into words separately, inputting the segmentation results into trained original-text and translation word vector models, and extracting original-text word vectors and machine-translation word vectors; and inputting the original-text word vectors and the machine-translation word vectors into the trained multi-task learning neural network model to output the machine translation quality assessment result and the automatically post-edited text of the machine translation.
As the above embodiment shows, before the original text and the machine translation of the target document are input into the multi-task learning neural network model, they are first taken as the underlying data and passed through a series of transformations to obtain a data type that the model can process. In the embodiments, this conversion can be implemented with trained original-text and translation word vector models.
Specifically, the original text and the machine translation of the target document first need to be segmented into words separately, giving the segmented original text and the segmented machine translation, which are the segmentation results. The segmentation results are then input into the trained original-text and translation word vector models respectively to extract the word vectors of the original text and of the machine translation; these word vectors clearly represent the content of the original text or the machine translation and can be recognized by the multi-task learning neural network model. Finally, the original-text word vectors and machine-translation word vectors are input into the trained multi-task learning neural network model, whose forward pass produces and outputs the machine translation quality assessment result and the automatically post-edited text of the machine translation.
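The segmentation and word vector lookup that precede the model can be sketched as below. The embedding tables are tiny made-up dictionaries with two dimensions, the zero vector for unknown words is an assumption, and the multi-task model itself is not shown because the patent does not give its layers.

```python
from typing import Dict, List

Vector = List[float]


def embed(tokens: List[str], table: Dict[str, Vector], dim: int) -> List[Vector]:
    """Look up one vector per token; unknown tokens get a zero vector (an assumption)."""
    zero = [0.0] * dim
    return [table.get(token, zero) for token in tokens]


# Tiny made-up word vector tables, one per language.
source_table = {"the": [0.1, 0.2], "pen": [0.3, 0.4]}
target_table = {"钢笔": [0.5, 0.6]}

source_vectors = embed("the box in the pen".split(), source_table, dim=2)
mt_vectors = embed("盒子 在 钢笔 里".split(), target_table, dim=2)

# A trained multi-task model would now take (source_vectors, mt_vectors)
# and return a quality label together with the post-edited tokens.
print(len(source_vectors), len(mt_vectors))
```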
Wherein, before the step of result of word segmentation processing is inputted trained original text and translation term vector model, this The method of inventive embodiments can also include: the standard list language corpus of acquisition original text languages and translation languages respectively, and respectively Word segmentation processing is carried out to the standard list language corpus of original text languages and translation languages;Standard list language corpus based on word segmentation processing, is adopted With Skip-Gram algorithm, training basis original text and translation term vector model, and model hyper parameter is set, obtain trained original Text and translation term vector model;Wherein, original text languages are languages corresponding with the original text of destination document, and translation languages are and target The corresponding languages of machine translation of document.
Before word vectors are extracted as described in the above embodiment, the word vector models must first be trained with word vector model training samples. A pair of base source and target word vector models is first initialized, and standard monolingual corpora that correspond to each other are obtained from the source-language and target-language corpora. This correspondence is the relationship between monolingual corpora in the source corpus and in the target corpus that, according to the standard translation, have the same semantics. The corresponding standard monolingual corpora are then each segmented into words. For example, the latest Wikipedia monolingual corpora of the source language and of the target language can be downloaded and segmented.
Next, the base source and target word vector models are each trained with the Skip-Gram algorithm. Some important hyperparameters must also be set individually; for example, the word vector dimension is set to 300 and the context window is set to 5.
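As a rough illustration of what the context window means in Skip-Gram training, the sketch below generates the (center word, context word) pairs that such training would learn from. It shows pair generation only, not the training of the vectors themselves; the names HYPERPARAMS and skipgram_pairs and the example sentence are assumptions introduced for illustration.

```python
from typing import Iterator, List, Tuple

# Hyperparameter values named in the text; the dictionary keys are illustrative.
HYPERPARAMS = {"vector_size": 300, "window": 5}


def skipgram_pairs(tokens: List[str], window: int) -> Iterator[Tuple[str, str]]:
    """Yield (center, context) pairs for every word within `window` positions."""
    for i, center in enumerate(tokens):
        start = max(0, i - window)
        stop = min(len(tokens), i + window + 1)
        for j in range(start, stop):
            if j != i:
                yield center, tokens[j]


sentence = ["little", "john", "found", "the", "toy", "box"]
# A small window keeps the printed example short.
pairs = list(skipgram_pairs(sentence, window=2))
# With the window of 5 from the text, every word in this short sentence
# is paired with every other word.
full_pairs = list(skipgram_pairs(sentence, HYPERPARAMS["window"]))
```

A Skip-Gram model is then trained so that each center word predicts its context words; the vector dimension (300 here) is the size of the learned representation for each word.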
By training the base source and target word vector models on standard monolingual corpora, the embodiment of the present invention obtains more accurate models.
Fig. 2 is a schematic structural diagram of the multi-task learning neural network model in the calibration optimization method for machine translation provided by an embodiment of the present invention, where the source language is English and the target language is Chinese. The inputs of the model are the source text "Little John found the toy box in the pen." and its Chinese machine translation, whose meaning in English is "Little John found the toy box in the pen", with "pen" rendered as the writing instrument. The two algorithms inside the model, namely the machine translation quality assessment algorithm and the automatic post-editing algorithm, process these inputs, and the model finally outputs the machine translation quality assessment result (a quality label) and the automatically post-edited translation. Table 2 shows the input and output data of the process shown in Fig. 2.
Table 2. Input and output data of the process shown in Fig. 2
As shown in Table 2, after the source text "Little John found the toy box in the pen." and its machine translation ("pen" rendered as the writing instrument) are input to the model, the model assesses the quality of the machine translation, giving the assessment result "fair" (quality label 2), and at the same time automatically post-edits the machine translation, outputting a post-edited result whose meaning in English is "Little John found the toy box in the fence", that is, in an enclosure. The post-edited result clearly fits the actual scene better and is semantically more accurate.
As another aspect of the embodiments of the present invention, an embodiment provides, according to the above embodiments, a calibration optimization device for machine translation, which implements the calibration optimization of machine translation described in the above embodiments. The descriptions and definitions given for the calibration optimization method in the above embodiments therefore also apply to the execution modules of this embodiment; refer to the above embodiments for details, which are not repeated here.
According to one embodiment, the structure of the calibration optimization device for machine translation is shown in Fig. 3, a schematic structural diagram of the calibration optimization device for machine translation provided by an embodiment of the present invention. The device can implement the calibration optimization of machine translation of the above method embodiments and includes a data acquisition module 301 and an assessment and post-editing output module 302, wherein:
The data acquisition module 301 is used to obtain the source text and the machine translation of the target document. The assessment and post-editing output module 302 is used to perform machine translation quality assessment, based on the source text and the machine translation of the target document and using the trained multi-task learning neural network model, and to perform automatic post-editing on the machine translation of the target document. The trained multi-task learning neural network model is obtained in advance by iteratively training and updating a base multi-task learning neural network model with a certain number of training samples, and each training sample includes a sample source text, a sample machine translation, a sample machine translation quality label and a sample post-edited text.
Specifically, owing to the limitations of machine translation itself, the quality of the translation that a machine translator outputs directly usually cannot be guaranteed. Post-editing refers to further calibrating and editing the machine translation output directly by the machine translator, so that the semantics of the translation match the source text more closely. Therefore, before the machine translation is post-edited, the data acquisition module 301 first obtains the source text and the machine translation, for example from a machine translation engine or from a database.
The assessment and post-editing output module 302 then takes the source text and the machine translation as its data basis and applies a series of transformations to obtain data of a type that the multi-task learning neural network model can process. The module then inputs the transformed result to the multi-task learning neural network model; through the neuron operations and propagation inside the model, the machine translation quality assessment and the automatic post-editing of the machine translation are carried out at the same time.
It can be understood that, in order to perform the machine translation quality assessment and the post-editing accurately, a base multi-task learning neural network model must be initialized in advance and trained with a certain number of training samples. When the training sample set is built, each training sample includes at least the source text, the machine translation, the machine translation quality label and the post-edited text of the sample, that is, the sample source text, the sample machine translation, the sample machine translation quality label and the sample post-edited text. The machine translation quality label characterizes the assessed translation quality of the sample machine translation produced by machine translation.
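A training sample of this kind, together with the derivation of the quality label recited in claims 3, 4 and 7 (post-editing cost divided by source length, normalized, then divided into grades), might be sketched as follows. The class and function names, the normalization by a maximum ratio of 2.0, the use of three equal-width grades and the convention that grade 1 is the best are all assumptions made for illustration; the document does not specify them.

```python
from dataclasses import dataclass


@dataclass
class TrainingSample:
    source: str               # sample source text
    machine_translation: str  # sample machine translation
    post_edit: str            # sample post-edited text
    quality_label: int        # sample machine translation quality label


def quality_label(post_edit_cost: float, source_length: int,
                  max_ratio: float = 2.0, grades: int = 3) -> int:
    """Map a post-editing cost to a grade; 1 is best and `grades` is worst (assumed)."""
    if source_length <= 0:
        raise ValueError("source_length must be positive")
    # Division step: cost (for example, keystroke count) per unit of source length.
    ratio = post_edit_cost / source_length
    # Normalization step (assumed form): scale into [0, 1] and clip.
    normalized = min(ratio / max_ratio, 1.0)
    # Interval step: split [0, 1] into `grades` equal-width bins.
    return min(int(normalized * grades) + 1, grades)


sample = TrainingSample(
    source="source sentence",
    machine_translation="raw machine output",
    post_edit="corrected output",
    quality_label=quality_label(post_edit_cost=15, source_length=15),
)
```

Deriving the label from the recorded editing effort means that no separate manual quality annotation is needed once post-edited texts and their costs are available.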
The calibration optimization device for machine translation provided by the embodiment of the present invention is configured with the corresponding execution modules, uses a multi-task learning neural network model, and trains the model in advance with training samples that include sample machine translation quality labels and sample post-edited texts. It can thereby tightly combine automatic machine translation quality assessment and automatic post-editing, two tasks that are independent of each other yet closely related, and so improve post-editing efficiency and translation quality more effectively.
It can be understood that, in the embodiment of the present invention, the program modules of the devices of the above embodiments can be implemented by a hardware processor. Using these program modules, the calibration optimization device for machine translation of the embodiment of the present invention can carry out the calibration optimization flow of the above method embodiments. When it does so, the beneficial effects produced by the device are the same as those of the corresponding method embodiments; refer to the above method embodiments for details, which are not repeated here.
As yet another aspect of the embodiments of the present invention, this embodiment provides an electronic device according to the above embodiments. Fig. 4 is a schematic diagram of the physical structure of the electronic device provided by the embodiment of the present invention, which includes at least one memory 401, at least one processor 402, a communication interface 403 and a bus 404.
The memory 401, the processor 402 and the communication interface 403 communicate with one another through the bus 404, and the communication interface 403 is used for information transmission between the electronic device and a machine translation device. The memory 401 stores a computer program that can run on the processor 402; when the processor 402 executes the computer program, the calibration optimization method for machine translation described in the above embodiments is implemented.
It can be understood that the electronic device includes at least the memory 401, the processor 402, the communication interface 403 and the bus 404, and that the memory 401, the processor 402 and the communication interface 403 are connected to one another through the bus 404 and can communicate with one another; for example, the processor 402 reads the program instructions of the calibration optimization method for machine translation from the memory 401. In addition, the communication interface 403 can provide the communication connection between the electronic device and the machine translation device and carry the information transmitted between them; for example, the calibration optimization of machine translation is carried out through the communication interface 403.
When the electronic device runs, the processor 402 calls the program instructions in the memory 401 to execute the methods provided by the above method embodiments, for example: based on the source text and the machine translation of the target document, performing machine translation quality assessment using the trained multi-task learning neural network model, and performing automatic post-editing on the machine translation of the target document.
The program instructions in the memory 401 may be implemented in the form of software functional units and, when sold or used as an independent product, may be stored in a computer-readable storage medium. Alternatively, all or part of the steps of the above method embodiments may be carried out by hardware controlled by program instructions; the program may be stored in a computer-readable storage medium and, when executed, performs the steps of the above method embodiments. The storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk or an optical disc.
According to the above embodiments, an embodiment of the present invention also provides a non-transitory computer-readable storage medium that stores computer instructions. The computer instructions cause a computer to execute the calibration optimization method for machine translation described in the above embodiments, for example: based on the source text and the machine translation of the target document, performing machine translation quality assessment using the trained multi-task learning neural network model, and performing automatic post-editing on the machine translation of the target document.
The electronic device and the non-transitory computer-readable storage medium provided by the embodiments of the present invention execute the calibration optimization method for machine translation described in the above embodiments, use a multi-task learning neural network model, and train the model in advance with training samples that include sample machine translation quality labels and sample post-edited texts. They can thereby tightly combine automatic machine translation quality assessment and automatic post-editing, two tasks that are independent of each other yet closely related, and so improve post-editing efficiency and translation quality more effectively.
It can be understood that the embodiments of the device, the electronic device and the storage medium described above are merely illustrative. Units described as separate components may or may not be physically separate; they may be located in one place or distributed over different network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement the embodiments without creative effort.
From the description of the above embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus the necessary general-purpose hardware platform, and of course also by hardware. On this understanding, the essence of the above technical solution, or the part of it that contributes over the prior art, can be embodied in the form of a software product. The computer software product may be stored in a computer-readable storage medium, such as a USB flash drive, a removable hard disk, a ROM, a RAM, a magnetic disk or an optical disc, and includes instructions that cause a computer device (such as a personal computer, a server or a network device) to execute the methods described in the above method embodiments or in certain parts of them.
In addition, those skilled in the art should understand that, in the application documents of the embodiments of the present invention, the terms "include", "comprise" and any of their variants are intended to cover non-exclusive inclusion, so that a process, method, article or device that includes a series of elements includes not only those elements but also other elements that are not explicitly listed, or elements inherent to the process, method, article or device. Without further limitation, an element introduced by the phrase "including a ..." does not exclude the presence of additional identical elements in the process, method, article or device that includes the element.
Numerous specific details are set forth in the specification of the embodiments of the present invention. It should be understood, however, that the embodiments can be practiced without these specific details. In some instances, well-known methods, structures and techniques are not shown in detail, so as not to obscure the understanding of this specification. Similarly, it should be understood that, in order to streamline the disclosure and aid the understanding of one or more of the various inventive aspects, the features of the embodiments are sometimes grouped together into a single embodiment, figure or description thereof in the above description of exemplary embodiments.
However, this manner of disclosure should not be interpreted as reflecting an intention that the claimed embodiments require more features than are expressly recited in each claim. Rather, as the claims reflect, the inventive aspects lie in fewer than all the features of a single embodiment disclosed above. The claims following the detailed description are therefore expressly incorporated into the detailed description, with each claim standing on its own as a separate embodiment of the present invention.
Finally, it should be noted that the above embodiments only illustrate the technical solutions of the embodiments of the present invention and do not limit them. Although the embodiments of the present invention have been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that the technical solutions described in the foregoing embodiments can still be modified, or some of their technical features replaced by equivalents, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (10)

1. A calibration optimization method for machine translation, characterized by comprising:
based on the source text and the machine translation of a target document, performing machine translation quality assessment using a trained multi-task learning neural network model, and performing automatic post-editing on the machine translation of the target document;
wherein the trained multi-task learning neural network model is obtained in advance by iteratively training and updating a base multi-task learning neural network model with a certain number of training samples, and each training sample includes a sample source text, a sample machine translation, a sample machine translation quality label and a sample post-edited text.
2. The method according to claim 1, characterized in that the step of training with the training samples to obtain the trained multi-task learning neural network model specifically comprises:
for any training sample, outputting, based on the training sample and using the base multi-task learning neural network model, a predicted machine translation quality and a predicted post-edited text of the training sample;
comparing the predicted machine translation quality with the sample machine translation quality label, and the predicted post-edited text with the sample post-edited text, to obtain a prediction error;
based on the prediction error, updating the parameters of the base multi-task learning neural network model using a back-propagation algorithm and a gradient descent algorithm, and using the updated base multi-task learning neural network model as the base multi-task learning neural network model for the next training sample, until the trained multi-task learning neural network model is obtained.
3. The method according to claim 2, characterized in that, before the step of training with the training samples to obtain the trained multi-task learning neural network model, the method further comprises:
obtaining the post-editing cost of the sample machine translation, and obtaining the sample machine translation quality label from the post-editing cost and the sample source text through normalization and interval segmentation;
wherein the post-editing cost represents the cost incurred in post-editing the sample machine translation to obtain the sample post-edited text.
4. The method according to claim 3, characterized in that the step of obtaining the sample machine translation quality label from the post-editing cost and the sample source text through normalization and interval segmentation specifically comprises:
dividing the post-editing cost by the length of the sample source text, and normalizing the result of the division;
based on the value of the normalized result, converting the normalized result into sample machine translation quality labels of different grades.
5. The method according to claim 1, characterized in that the step of performing machine translation quality assessment and performing automatic post-editing on the machine translation of the target document specifically comprises:
segmenting the source text and the machine translation of the target document into words respectively, and inputting the segmentation results into trained source and target word vector models to extract source word vectors and machine translation word vectors;
inputting the source word vectors and the machine translation word vectors into the trained multi-task learning neural network model, to output the machine translation quality assessment result and the automatically post-edited text of the machine translation.
6. The method according to claim 5, characterized in that, before the step of inputting the segmentation results into the trained source and target word vector models, the method further comprises:
obtaining standard monolingual corpora of the source language and of the target language respectively, and segmenting the standard monolingual corpora of the source language and of the target language into words respectively;
based on the segmented standard monolingual corpora, training base source and target word vector models using the Skip-Gram algorithm, with the model hyperparameters set, to obtain the trained source and target word vector models;
wherein the source language is the language of the source text of the target document, and the target language is the language of the machine translation of the target document.
7. The method according to claim 3, characterized in that the step of obtaining the post-editing cost of the sample machine translation specifically comprises:
in the process of post-editing the sample machine translation to obtain the sample post-edited text, calculating the post-editing cost by counting the total number of keystrokes made during post-editing.
8. A calibration optimization device for machine translation, characterized by comprising:
a data acquisition module, used to obtain the source text and the machine translation of a target document;
an assessment and post-editing output module, used to perform machine translation quality assessment, based on the source text and the machine translation of the target document and using a trained multi-task learning neural network model, and to perform automatic post-editing on the machine translation of the target document;
wherein the trained multi-task learning neural network model is obtained in advance by iteratively training and updating a base multi-task learning neural network model with a certain number of training samples, and each training sample includes a sample source text, a sample machine translation, a sample machine translation quality label and a sample post-edited text.
9. An electronic device, characterized by comprising: at least one memory, at least one processor, a communication interface and a bus;
the memory, the processor and the communication interface communicate with one another through the bus, and the communication interface is also used for information transmission between the electronic device and a machine translation device;
the memory stores a computer program that can run on the processor, and when the processor executes the computer program, the method according to any one of claims 1 to 7 is implemented.
10. A non-transitory computer-readable storage medium, characterized in that the non-transitory computer-readable storage medium stores computer instructions, and the computer instructions cause a computer to execute the method according to any one of claims 1 to 7.
CN201910066709.4A 2019-01-24 2019-01-24 Calibration optimization method and device for machine translation and electronic equipment Active CN109670191B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910066709.4A CN109670191B (en) 2019-01-24 2019-01-24 Calibration optimization method and device for machine translation and electronic equipment

Publications (2)

Publication Number Publication Date
CN109670191A true CN109670191A (en) 2019-04-23
CN109670191B CN109670191B (en) 2023-03-07

Family

ID=66149728

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910066709.4A Active CN109670191B (en) 2019-01-24 2019-01-24 Calibration optimization method and device for machine translation and electronic equipment

Country Status (1)

Country Link
CN (1) CN109670191B (en)

Also Published As

Publication number Publication date
CN109670191B (en) 2023-03-07

Similar Documents

Publication Publication Date Title
CN109670191A (en) Calibration optimization method, device and the electronic equipment of machine translation
CN110633409B (en) Automobile news event extraction method integrating rules and deep learning
CN109145294B (en) Text entity identification method and device, electronic equipment and storage medium
CN110287481A (en) Named entity corpus labeling training system
CN108959246A (en) Answer selection method, device and electronic equipment based on improved attention mechanism
CN111783993A (en) Intelligent labeling method and device, intelligent platform and storage medium
Zhang et al. A tree-BLSTM-based recognition system for online handwritten mathematical expressions
Patnaik et al. Intelligent and adaptive web data extraction system using convolutional and long short-term memory deep learning networks
CN110334186A (en) Data query method, apparatus, computer equipment and computer readable storage medium
CN111027600B (en) Image category prediction method and device
CN116541911B (en) Packaging design system based on artificial intelligence
US20220100772A1 (en) Context-sensitive linking of entities to private databases
Ding et al. An encoder-decoder approach to handwritten mathematical expression recognition with multi-head attention and stacked decoder
WO2023236977A1 (en) Data processing method and related device
CN110852089A (en) Operation and maintenance project management method based on intelligent word segmentation and deep learning
CN113705733A (en) Medical bill image processing method and device, electronic device and storage medium
CN116029273A (en) Text processing method, device, computer equipment and storage medium
CN114240672B (en) Method for identifying duty ratio of green asset and related product
Malode Benchmarking public large language model
CN113902569A (en) Method for identifying the proportion of green assets in digital assets and related products
EP4222635A1 (en) Lifecycle management for customized natural language processing
US20220100967A1 (en) Lifecycle management for customized natural language processing
CN107958061A (en) Text similarity calculation method and computer-readable recording medium
CN116721713A (en) Data set construction method and device oriented to chemical structural formula identification
CN116432611A (en) Manuscript writing auxiliary method, system, terminal and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant