CN109977426A - Translation model training method and apparatus, and machine-readable medium - Google Patents
Translation model training method and apparatus, and machine-readable medium
- Publication number
- CN109977426A CN109977426A CN201711448599.5A CN201711448599A CN109977426A CN 109977426 A CN109977426 A CN 109977426A CN 201711448599 A CN201711448599 A CN 201711448599A CN 109977426 A CN109977426 A CN 109977426A
- Authority
- CN
- China
- Prior art keywords
- language
- text
- deformed text
- standard text
- translation model
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Links
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/205—Parsing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/40—Processing or translation of natural language
- G06F40/55—Rule-based translation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/40—Processing or translation of natural language
- G06F40/58—Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Machine Translation (AREA)
Abstract
Embodiments of the present invention provide a translation model training method, apparatus, and machine-readable medium. The method includes: converting standard text in a first language into deformed text in the first language; and using the standard text in the first language, the deformed text in the first language, and the second-language text corresponding to the standard text in the first language as training data for a translation model, and training the translation model on the training data. A translation model trained with the scheme provided by the embodiments of the present invention can accurately perform simultaneous interpretation of colloquial speech input by the user, improving the model's translation performance.
Description
Technical field
The present invention relates to the technical field of bilingual translation, and in particular to a translation model training method, apparatus, and machine-readable medium.
Background art
With the growth of international exchange, communication across different languages has become increasingly frequent. To overcome language barriers, translation models installed on client devices are widely used for online speech translation, i.e., simultaneous interpretation.
Online speech translation mainly involves two steps. The first is speech recognition: the speech signal in the first language input by the user is converted into text. The second is translation: the translation model translates that text and produces text in the second language as the translation result, which is finally presented to the user as second-language text or speech.
A translation model is generated by training on bilingual sentence pairs, and the bilingual sentence pairs used for training consist of formal written language, i.e., standard text. As a result, the translation model can only accurately interpret speech that corresponds to standard text. In actual use, however, the speech to be translated is often highly colloquial owing to users' speaking habits, and in that case the translation model cannot accurately interpret the user's speech, which degrades its translation performance.
Summary of the invention
The present invention provides a translation model training method, apparatus, and machine-readable medium that can accurately perform simultaneous interpretation of colloquial speech input by the user, improving the model's translation performance.
To solve the above problems, the invention discloses a translation model training method, the method including: converting standard text in a first language into deformed text in the first language; and using the standard text in the first language, the deformed text in the first language, and the second-language text corresponding to the standard text in the first language as training data for a translation model, and training the translation model on the training data.
To solve the above problems, the invention also discloses a translation model training apparatus, the apparatus including: a conversion module configured to convert standard text in a first language into deformed text in the first language; and a training module configured to use the standard text in the first language, the deformed text in the first language, and the second-language text corresponding to the standard text in the first language as training data for a translation model and to train the translation model on the training data.
To solve the above problems, the invention further discloses a device for translation model training, including a memory and one or more programs, where the one or more programs are stored in the memory and configured to be executed by one or more processors, the one or more programs containing instructions for: converting standard text in a first language into deformed text in the first language; and using the standard text in the first language, the deformed text in the first language, and the second-language text corresponding to the standard text in the first language as training data for a translation model, and training the translation model on the training data.
To solve the above problems, the invention also discloses a machine-readable medium storing instructions that, when executed by one or more processors, cause a device to perform any of the translation model training methods described herein.
Compared with the prior art, the present invention has the following advantages.
The translation model training method, apparatus, and machine-readable medium provided by embodiments of the present invention convert the standard text in the first language used for model training into deformed text in the first language, and use the standard text in the first language, the deformed text, and the second-language text corresponding to the standard text as training data to train the translation model. Because the trained model contains correspondences between deformed text and standard text, once the deformed text corresponding to the user's colloquial speech has been recognized, the corresponding translated text can be determined and output as text or speech. The model can therefore accurately interpret colloquial speech input by the user, improving its translation performance.
Brief description of the drawings
Fig. 1 is a flowchart of the steps of a translation model training method according to Embodiment one of the present invention;
Fig. 2 is a flowchart of the steps of a translation model training method according to Embodiment two of the present invention;
Fig. 3 is a structural block diagram of a translation model training apparatus according to Embodiment three of the present invention;
Fig. 4 is a structural block diagram of a translation model training apparatus according to Embodiment four of the present invention;
Fig. 5 is a structural block diagram of a device for translation model training according to Embodiment five of the present invention; and
Fig. 6 is a structural block diagram of the server in Embodiment five of the present invention.
Detailed description of the embodiments
In order to make the above objects, features, and advantages of the present invention clearer and easier to understand, the present invention is described in further detail below with reference to the accompanying drawings and specific embodiments.
In embodiments of the present invention, the standard text in the first language used for training is converted into deformed text in the first language, and the translation model is trained on the standard text in the first language, the deformed text in the first language, and the second-language text that is the translation of the standard text in the first language. The trained translation model can determine the standard text in the first language that corresponds to the user's colloquial first-language speech, obtain the second-language text corresponding to that standard text, and output the second-language text or the corresponding speech, thereby achieving accurate translation of colloquial speech.
Embodiments of the present invention can be applied to any scenario that requires online translation of speech, such as speech translation and simultaneous interpretation. The first language and the second language denote the two languages of a bilingual pair. They may be preset by the user or obtained by analyzing the user's historical behavior; optionally, the language the user uses most often may be taken as the first language, and another language as the second language. For example, for a user whose mother tongue is Chinese, the first language may be Chinese, and the second language may be one of, or a combination of, English, Japanese, Korean, German, French, minority languages, and so on.
Embodiment one
Referring to Fig. 1, a flowchart of the steps of a translation model training method according to Embodiment one of the present invention is shown. The translation model training method of this embodiment includes the following steps.
Step 101: convert the standard text in the first language into deformed text in the first language.
In actual training, the translation model needs to be trained on multiple groups of bilingual sentence pairs; the embodiments of the present invention are illustrated with the training of one bilingual sentence pair. Each bilingual sentence pair comprises standard text in the first language and the second-language text corresponding to that standard text, where the second-language text is also standard text. The trained translation model can translate first-language text or speech into second-language text or speech.
The deformed text in the first language is text that is close to spoken expression; it is obtained by applying appropriate transformations to the standard text so that the converted text more closely matches how people actually speak. Colloquial speech input typically exhibits several problems: repeated words, redundant modal (filler) particles, incomplete sentences, and inverted word order. To ensure that the trained translation model can accurately interpret colloquial speech, deformed text that approximates spoken expression must therefore be introduced into the training of the translation model.
Accordingly, the standard text in the first language used for training can be converted into deformed text in the first language by any one, or a combination, of the following: repeating word segments in the standard text of the first language with a certain probability, to accommodate word repetition in colloquial speech; inserting preset insertion words into the standard text of the first language with a certain probability, to accommodate redundant filler particles in colloquial speech; deleting word segments from the standard text of the first language with a certain probability, to accommodate incomplete sentences in colloquial speech; and swapping the positions of word segments in the standard text of the first language with a certain probability, to accommodate inverted word order in colloquial speech. In a specific implementation, those skilled in the art may select any one of the above methods, or a combination of them, as needed to convert the standard text in the first language into deformed text in the first language.
Step 102: use the standard text in the first language, the deformed text in the first language, and the second-language text corresponding to the standard text in the first language as training data for the translation model, and train the translation model on the training data.
During training, the standard text in the first language and the deformed text in the first language can be treated as a whole and combined with the second-language text to form one bilingual sentence pair, and the translation model is trained on the bilingual sentence pairs so formed. For the specific way in which the model is trained on bilingual sentence pairs, refer to the existing related art; this is not restricted in the embodiments of the present invention.
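As an illustration only, and not as a format prescribed by the patent, the following minimal Python sketch shows one way the training triples described above could be assembled into bilingual sentence pairs; the function name `build_training_pairs` and the data layout are assumptions.

```python
from typing import List, Tuple

def build_training_pairs(
    standard_src: List[str],   # standard text in the first language
    deformed_src: List[str],   # deformed text derived from the standard text
    target_text: List[str],    # corresponding second-language (standard) text
) -> List[Tuple[str, str]]:
    """Pair both the standard and the deformed source sentences with the
    same target sentence, so the model learns that either form maps to it."""
    pairs: List[Tuple[str, str]] = []
    for std, dfm, tgt in zip(standard_src, deformed_src, target_text):
        pairs.append((std, tgt))   # standard source -> target
        pairs.append((dfm, tgt))   # deformed (colloquial-like) source -> target
    return pairs

# Example usage with toy data (Chinese first language, English second language):
pairs = build_training_pairs(
    ["我们 下午 开会"],            # "we have a meeting this afternoon"
    ["我们 我们 下午 开会"],       # deformed: first segment repeated once
    ["We have a meeting this afternoon."],
)
# `pairs` can then be fed to any sequence-to-sequence training loop.
```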
After training, the translation model contains the standard text in the first language, the deformed text in the first language, and the second-language text corresponding to the standard text in the first language. Therefore, in actual use, if the user inputs colloquial speech in the first language, the translation model can recognize that speech as deformed text in the first language, determine the standard text in the first language corresponding to the deformed text, obtain the second-language text corresponding to that standard text, and output the second-language text or the corresponding speech, thereby accurately translating the colloquial speech.
In summary, the translation model training method provided by embodiments of the present invention converts the standard text in the first language used for model training into deformed text in the first language, and uses the standard text in the first language, the deformed text in the first language, and the second-language text corresponding to the standard text as training data to train the translation model. Because the trained model contains correspondences between standard text and deformed text that approximates spoken expression, once the deformed text corresponding to the user's colloquial speech has been recognized, the corresponding translated text can be determined and output as text or speech. The model can therefore accurately interpret colloquial speech input by the user, improving its translation performance.
Embodiment two
Referring to Fig. 2, a flowchart of the steps of a translation model training method according to Embodiment two of the present invention is shown. The translation model training method of this embodiment specifically includes the following steps.
Step 201: perform word segmentation on the standard text in the first language.
The trained translation model supports simultaneous interpretation between first-language speech and second-language speech. In actual training, the translation model needs to be trained on multiple groups of bilingual sentence pairs; this embodiment is illustrated with the training of one bilingual sentence pair. Each bilingual sentence pair comprises standard text in the first language and the second-language text corresponding to that standard text. So that the trained translation model can accurately translate colloquial first-language speech, colloquialized text corresponding to the standard text in the first language is introduced when training the model, and the standard text in the first language therefore needs to be converted into corresponding deformed text. Steps 201 to 202 describe one feasible conversion method.
The standard text in the first language may be a short sentence or a longer passage, so it contains multiple word segments. The standard text can be segmented according to a segmentation table preset in the system. For example, if the standard text in the first language is the sentence "we have a meeting this afternoon", word segmentation yields the three segments "we", "afternoon", and "meeting".
Step 202: for each word segment, generate a first random probability value; determine the repetition count corresponding to the first random probability value according to a preset repetition-count probability distribution, and repeat the segment that repetition count of times.
A repetition-count probability distribution is preset in the translation model training system. It can be set in advance by those skilled in the art or derived by analyzing users' everyday speech-input habits. The repetition-count probability distribution is essentially a correspondence between repetition probabilities and repetition counts. For example: a repetition probability of 0.6 corresponds to a repetition count of 1, a probability of 0.2 corresponds to a count of 2, a probability of 0.12 corresponds to a count of 3, and a probability of 0.08 corresponds to a count of 4.
Because the standard text in the first language contains multiple word segments, each segment is examined in turn when converting to deformed text to decide whether it should be repeated. A first random probability value is generated for each decision, and the repetition count corresponding to that value is looked up: if the count is greater than or equal to 1, the segment is repeated; if the count is 0, the segment is not repeated. The first random probability values generated for different segments may or may not be the same. For example, when deciding whether the segment "we" should be repeated, the training system generates a first random probability value of 0.6; the repetition-count probability distribution maps a probability of 0.6 to a repetition count of 1, so it is determined that the segment "we" should be repeated once.
Processing each word segment in the standard text of the first language one by one with the method shown in step 202 yields the deformed text in the first language. A translation model trained on colloquialized text converted in this way performs well when translating colloquial speech containing word repetition.
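Purely as an illustration, and not as the patent's reference implementation, the following Python sketch shows one way the repetition perturbation of steps 201 to 202 could be realized. The distribution values are placeholders: the text above quotes only the non-zero counts (0.6 → 1, 0.2 → 2, 0.12 → 3, 0.08 → 4), so assigning most of the mass to "no repetition" here is an assumption made to obtain a usable sketch.

```python
import random
from typing import Dict, List

# Assumed repetition-count distribution; 0 means the segment is left unrepeated.
REPETITION_DISTRIBUTION: Dict[int, float] = {
    0: 0.80,    # assumption: most segments are not repeated
    1: 0.12,    # remaining mass split roughly like the 0.6/0.2/0.12/0.08 example
    2: 0.04,
    3: 0.024,
    4: 0.016,
}

def repetition_count(p: float, distribution: Dict[int, float]) -> int:
    """Map a first random probability value to a repetition count by walking
    the cumulative distribution."""
    cumulative = 0.0
    for count, prob in distribution.items():
        cumulative += prob
        if p < cumulative:
            return count
    return 0

def repeat_segments(segments: List[str]) -> List[str]:
    """Step 202: for every segment, draw a random value, look up its
    repetition count, and repeat the segment that many extra times."""
    deformed: List[str] = []
    for seg in segments:
        n = repetition_count(random.random(), REPETITION_DISTRIBUTION)
        deformed.extend([seg] * (1 + n))  # original occurrence plus n repeats
    return deformed

# Example: repeat_segments(["we", "afternoon", "meeting"]) might return
# ["we", "we", "afternoon", "meeting"] when "we" is repeated once.
```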
Steps 201 to 202 are a specific implementation in which one or more word segments in the standard text of the first language are repeated to obtain the deformed text in the first language. In a specific implementation, converting the standard text in the first language into deformed text is not limited to repeating segments; it can also be achieved in the following ways.
Mode one: add insertion words at one or more insertion positions in the standard text of the first language to obtain the deformed text in the first language. The sentence-initial position and sentence-final position of the standard text in the first language and every position between two word segments are insertion positions. A translation model trained on deformed text converted in this way performs well when translating colloquial speech containing redundant filler particles.
One feasible way to add insertion words at one or more insertion positions in the standard text of the first language and obtain the deformed text in the first language is as follows.
First, determine each insertion position in the standard text of the first language. For example, if the standard text in the first language is "it was originally like this", word segmentation yields the three segments "originally", "is", and "like this"; the insertion positions are then before "originally", between "originally" and "is", between "is" and "like this", and after "like this".
Second, for each insertion position, generate a second random probability value; determine the insertion count corresponding to the second random probability value according to a preset insertion-count probability distribution, select that number of insertion words matched to the insertion position from an insertion word list, and insert each selected word. For example, if the insertion count corresponding to the second random probability value is three, three insertion words matched to the position are selected from the insertion word list and inserted at that position in order. The insertion-count probability distribution can be preset by those skilled in the art or derived by analyzing users' everyday speech-input habits.
The insertion word list may contain multiple modal (filler) particles. It can be preset in the translation model training system by those skilled in the art or obtained by the system through analysis of the speech users commonly input; the specific words included in the insertion word list are not limited in the embodiments of the present invention. Processing each insertion position in the standard text of the first language one by one with this method yields the deformed text in the first language.
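As a minimal, hypothetical sketch of mode one (not taken from the patent), the following Python code inserts filler words at randomly chosen insertion positions; the filler list, the insertion rate, and the way the insertion count is sampled are all placeholders.

```python
import random
from typing import List

# Placeholder insertion word list (filler particles); the patent leaves its
# contents open, so these entries are assumptions.
INSERTION_WORDS = ["um", "uh", "like", "you know"]

def insert_fillers(segments: List[str], insert_rate: float = 0.15,
                   max_insertions: int = 3) -> List[str]:
    """Mode one: at each insertion position (sentence start, sentence end, and
    between adjacent segments), decide with probability `insert_rate` whether
    to insert, and if so sample how many filler words to add."""
    deformed: List[str] = []
    positions = len(segments) + 1  # n + 1 insertion positions for n segments
    for i in range(positions):
        if random.random() < insert_rate:              # second random probability value
            count = random.randint(1, max_insertions)  # assumed insertion-count sampling
            deformed.extend(random.choices(INSERTION_WORDS, k=count))
        if i < len(segments):
            deformed.append(segments[i])
    return deformed

# Example: insert_fillers(["originally", "is", "like this"]) might return
# ["um", "originally", "is", "like this"]
```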
Mode two: delete one or more word segments from the standard text of the first language to obtain the deformed text in the first language. A translation model trained on deformed text converted in this way performs well when translating colloquial speech containing incomplete sentences.
One feasible way to delete one or more word segments from the standard text of the first language and obtain the deformed text in the first language is as follows.
First, perform word segmentation on the standard text in the first language.
Second, for each word segment, generate a third random probability value; judge from a preset deletion probability distribution whether the third random probability value indicates deletion, and if so, delete the segment from the standard text of the first language.
The deletion probability distribution can be preset by those skilled in the art or derived by analyzing users' everyday speech-input habits. It may contain only two probability intervals, one corresponding to deletion and the other to retention.
Because the standard text in the first language contains multiple word segments, each segment is examined in turn when converting to deformed text to decide whether it should be deleted. A third random probability value is generated for each decision; if it indicates deletion, the segment is deleted, otherwise the segment is retained. The third random probability values generated for different segments may or may not be the same. Processing each word segment in the standard text of the first language one by one with this method yields the deformed text in the first language.
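The sketch below, again only an assumed illustration in Python, realizes mode two with a single deletion threshold standing in for the two-interval deletion probability distribution described above; the threshold value and the safeguard against deleting every segment are assumptions.

```python
import random
from typing import List

def delete_segments(segments: List[str], delete_rate: float = 0.1) -> List[str]:
    """Mode two: for each segment draw a third random probability value and
    drop the segment when the value falls in the deletion interval
    (here simply `value < delete_rate`)."""
    deformed = [seg for seg in segments if random.random() >= delete_rate]
    # Guard against deleting everything from a very short sentence
    # (an assumption, not part of the patent text).
    return deformed if deformed else segments[:1]

# Example: delete_segments(["we", "afternoon", "meeting"]) might return
# ["we", "meeting"]
```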
Mode three: swap the positions of one or more word segments in the standard text of the first language to obtain the deformed text in the first language. A translation model trained on deformed text converted in this way performs well when translating colloquial speech with inverted word order.
One feasible way to swap the positions of one or more word segments in the standard text of the first language and obtain the deformed text in the first language is as follows.
First, perform word segmentation on the standard text in the first language.
Second, for each word segment, generate a fourth random probability value; judge from a preset swap probability distribution whether the fourth random probability value indicates a swap, and if so, swap the segment with an adjacent segment. The swap probability distribution may contain only two probability intervals, one corresponding to swapping and the other to no swap. It can be preset by those skilled in the art or derived by analyzing users' everyday speech-input habits.
Because the standard text in the first language contains multiple word segments, each segment is examined in turn when converting to deformed text to decide whether its position should be swapped. A fourth random probability value is generated for each decision; if it indicates a swap, the segment is swapped with an adjacent segment, otherwise no swap is performed. The fourth random probability values generated for different segments may or may not be the same. Processing each word segment in the standard text of the first language one by one with this method yields the deformed text in the first language.
A probability threshold can be preset in the system, and whether the fourth random probability value indicates a swap is judged against that threshold. Specifically, the fourth random probability value is compared with the probability threshold, and the comparison result determines whether a swap is indicated; for example, if the fourth random probability value is greater than the threshold, a swap is indicated, and if it is less than or equal to the threshold, no swap is performed.
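For completeness, here is an assumed Python sketch of mode three using the threshold comparison just described; the threshold value is a placeholder and the choice of always swapping with the following segment is an assumption.

```python
import random
from typing import List

def swap_adjacent_segments(segments: List[str], threshold: float = 0.9) -> List[str]:
    """Mode three: for each segment draw a fourth random probability value and,
    when it exceeds the threshold, swap the segment with the next one."""
    deformed = list(segments)
    i = 0
    while i < len(deformed) - 1:
        if random.random() > threshold:  # a value above the threshold indicates a swap
            deformed[i], deformed[i + 1] = deformed[i + 1], deformed[i]
            i += 2                       # skip past the swapped pair
        else:
            i += 1
    return deformed

# Example: swap_adjacent_segments(["we", "afternoon", "meeting"]) might return
# ["afternoon", "we", "meeting"]
```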
Four ways of converting the standard text of the first language into deformed text of the first language have been enumerated in the embodiments of the present invention. In a specific implementation, the text conversion can be performed using any one of the above methods alone, or using any two or more of the four methods in combination, as sketched below.
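The following short sketch, again an assumption reusing the hypothetical helper functions above, shows how such a combination of perturbations might be chained.

```python
from typing import Callable, List

def deform(segments: List[str],
           perturbations: List[Callable[[List[str]], List[str]]]) -> str:
    """Apply a chosen combination of perturbation functions in sequence to
    turn segmented standard text into deformed text."""
    for perturb in perturbations:
        segments = perturb(segments)
    return " ".join(segments)

# Example: combine repetition and filler insertion only.
# deformed_text = deform(["we", "afternoon", "meeting"],
#                        [repeat_segments, insert_fillers])
```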
Step 203: use the standard text in the first language, the deformed text in the first language, and the second-language text corresponding to the standard text in the first language as training data for the translation model, and train the translation model on the training data.
In addition to the beneficial effects of the training method shown in Embodiment one, the method provided by this embodiment of the present invention offers several specific ways of converting the standard text of the first language into deformed text of the first language; those skilled in the art can select any one or more of the conversion methods as needed, which gives strong flexibility.
Embodiment three
Referring to Fig. 3, a structural block diagram of a translation model training apparatus according to Embodiment three of the present invention is shown. The translation model training apparatus of this embodiment may include: a conversion module 301 configured to convert standard text in a first language into deformed text in the first language; and a training module 302 configured to use the standard text in the first language, the deformed text in the first language, and the second-language text corresponding to the standard text in the first language as training data for a translation model and to train the translation model on the training data.
The translation model training apparatus provided by this embodiment of the present invention converts the standard text in the first language used for model training into deformed text in the first language, and uses the standard text in the first language, the deformed text in the first language, and the second-language text corresponding to the standard text as training data to train the translation model. Because the trained model contains correspondences between deformed text and standard text, once the deformed text corresponding to the user's colloquial speech has been recognized, the corresponding translated text can be determined and output as text or speech. The apparatus can therefore accurately interpret colloquial speech input by the user, improving the model's translation performance.
Embodiment four
Referring to Fig. 4, a structural block diagram of a translation model training apparatus according to Embodiment four of the present invention is shown. This embodiment further optimizes the translation model training apparatus of Embodiment three. The optimized apparatus may include: a conversion module 401 configured to convert standard text in a first language into deformed text in the first language; and a training module 402 configured to use the standard text in the first language, the deformed text in the first language, and the second-language text corresponding to the standard text in the first language as training data for a translation model and to train the translation model on the training data.
Optionally, the conversion module 401 may include a repetition submodule 4011 configured to repeat one or more word segments in the standard text of the first language to obtain the deformed text in the first language.
Optionally, the repetition submodule 4011 may include: a first segmentation unit configured to perform word segmentation on the standard text of the first language; and a first processing unit configured to determine, according to a preset repetition-count probability distribution, the repetition count corresponding to the first random probability value.
Optionally, the conversion module 401 may include an insertion submodule 4012 configured to add insertion words at one or more insertion positions in the standard text of the first language to obtain the deformed text in the first language, wherein the sentence-initial position and sentence-final position of the standard text in the first language and every position between two word segments in the standard text of the first language are insertion positions.
Optionally, the insertion submodule 4012 may include: a second determination unit configured to determine each insertion position in the standard text of the first language; a third determination unit configured to determine the insertion probability corresponding to each insertion position; and a second processing unit configured to generate, for each insertion position, a second random probability value, determine the insertion count corresponding to the second random probability value according to a preset insertion-count probability distribution, select that number of insertion words matched to the insertion position from an insertion word list, and insert each selected insertion word.
Optionally, the conversion module 401 may include a deletion submodule 4013 configured to delete one or more word segments from the standard text of the first language to obtain the colloquialized (deformed) text of the first language.
Optionally, the deletion submodule 4013 may include: a second segmentation unit configured to perform word segmentation on the standard text of the first language; a fourth determination unit configured to determine the deletion probability corresponding to each word segment; and a third processing unit configured to generate, for each word segment, a third random probability value, judge from a preset deletion probability distribution whether the third random probability value indicates deletion, and if so, delete the segment from the standard text of the first language.
Optionally, the conversion module 401 may include a swap submodule 4014 configured to swap the positions of one or more word segments in the standard text of the first language to obtain the deformed text in the first language.
Optionally, the swap submodule 4014 may include: a third segmentation unit configured to perform word segmentation on the standard text of the first language; and a fourth processing unit configured to generate, for each word segment, a fourth random probability value, judge from a preset swap probability distribution whether the fourth random probability value indicates a swap, and if so, swap the segment with an adjacent segment.
The translation model training apparatus of this embodiment of the present invention is used to implement the corresponding translation model training methods of Embodiments one and two and has the beneficial effects of the corresponding method embodiments, which are not repeated here. With regard to the apparatus in the above embodiments, the specific manner in which each module performs its operations has been described in detail in the method embodiments and is not elaborated here.
Embodiment five
An embodiment of the present invention also provides a device for translation model training, including a memory and one or more programs, where the one or more programs are stored in the memory and configured to be executed by one or more processors, the one or more programs containing instructions for: converting standard text in a first language into deformed text in the first language; and using the standard text in the first language, the deformed text in the first language, and the second-language text corresponding to the standard text in the first language as training data for a translation model, and training the translation model on the training data.
Referring to Fig. 5, a structural block diagram of a device for translation model training according to Embodiment five of the present invention is shown. Fig. 5 is a block diagram of a device 600 for translation model training according to an exemplary embodiment. For example, the device 600 may be a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, fitness equipment, a personal digital assistant, or the like.
Referring to Fig. 5, the device 600 may include one or more of the following components: a processing component 602, a memory 604, a power component 606, a multimedia component 608, an audio component 610, an input/output (I/O) interface 612, a sensor component 614, and a communication component 616.
The processing component 602 typically controls the overall operation of the device 600, such as operations associated with display, telephone calls, data communication, camera operation, and recording. The processing component 602 may include one or more processors 620 to execute instructions so as to perform all or part of the steps of the methods described above. In addition, the processing component 602 may include one or more modules to facilitate interaction between the processing component 602 and other components; for example, it may include a multimedia module to facilitate interaction between the multimedia component 608 and the processing component 602.
The memory 604 is configured to store various types of data to support operation on the device 600. Examples of such data include instructions for any application or method operating on the device 600, contact data, phonebook data, messages, pictures, videos, and so on. The memory 604 may be implemented by any type of volatile or non-volatile storage device or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic disk, or optical disk.
The power component 606 provides power to the various components of the device 600. The power component 606 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the device 600.
The multimedia component 608 includes a screen that provides an output interface between the device 600 and the user. In some embodiments, the screen may include a liquid crystal display (LCD) and a touch panel (TP). If the screen includes a touch panel, it may be implemented as a touch screen to receive input signals from the user. The touch panel includes one or more touch sensors to sense touches, swipes, and gestures on the touch panel. The touch sensors may not only sense the boundary of a touch or swipe action but also detect the duration and pressure associated with the touch or swipe operation. In some embodiments, the multimedia component 608 includes a front camera and/or a rear camera. When the device 600 is in an operating mode, such as a shooting mode or a video mode, the front camera and/or the rear camera can receive external multimedia data. Each of the front and rear cameras may be a fixed optical lens system or have focusing and optical zoom capability.
The audio component 610 is configured to output and/or input audio signals. For example, the audio component 610 includes a microphone (MIC) that is configured to receive external audio signals when the device 600 is in an operating mode, such as a call mode, a recording mode, or a speech recognition mode. The received audio signals may be further stored in the memory 604 or transmitted via the communication component 616. In some embodiments, the audio component 610 also includes a speaker for outputting audio signals.
The I/O interface 612 provides an interface between the processing component 602 and peripheral interface modules, which may be a keyboard, a click wheel, buttons, and the like. These buttons may include, but are not limited to, a home button, volume buttons, a start button, and a lock button.
The sensor component 614 includes one or more sensors for providing status assessments of various aspects of the device 600. For example, the sensor component 614 may detect the open/closed state of the device 600 and the relative positioning of components (e.g., the display and keypad of the device 600), and may also detect a change in position of the device 600 or of a component of the device 600, the presence or absence of user contact with the device 600, the orientation or acceleration/deceleration of the device 600, and changes in the temperature of the device 600. The sensor component 614 may include a proximity sensor configured to detect the presence of nearby objects without any physical contact, and may also include an optical sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor component 614 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 616 is configured to facilitate wired or wireless communication between the device 600 and other devices. The device 600 can access a wireless network based on a communication standard, such as WiFi, 2G, or 3G, or a combination thereof. In an exemplary embodiment, the communication component 616 receives a broadcast signal or broadcast-related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 616 further includes a near-field communication (NFC) module to facilitate short-range communication. For example, the NFC module may be implemented based on radio frequency identification (RFID) technology, Infrared Data Association (IrDA) technology, ultra-wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the device 600 may be implemented by one or more application-specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field-programmable gate arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic components for performing the above methods.
In an exemplary embodiment, there is also provided a machine-readable storage medium including instructions, such as the memory 604 including instructions, which can be executed by the processor 620 of the device 600 to complete the above methods. For example, the machine-readable storage medium may be a ROM, a random access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, or the like.
Fig. 6 is a schematic structural diagram of the server in an embodiment of the present invention. The server 1900 may vary considerably depending on configuration or performance, and may include one or more central processing units (CPUs) 1922 (e.g., one or more processors), a memory 1932, and one or more storage media 1930 (e.g., one or more mass storage devices) storing application programs 1942 or data 1944. The memory 1932 and the storage media 1930 may provide transient or persistent storage. The programs stored in the storage media 1930 may include one or more modules (not shown), each of which may include a series of instruction operations on the server. Further, the central processing unit 1922 may be configured to communicate with the storage media 1930 and execute on the server 1900 the series of instruction operations stored in the storage media 1930.
The server 1900 may also include one or more power supplies 1926, one or more wired or wireless network interfaces 1950, one or more input/output interfaces 1958, one or more keyboards 1956, and/or one or more operating systems 1941, such as Windows ServerTM, Mac OS XTM, UnixTM, LinuxTM, FreeBSDTM, and so on.
A machine-readable storage medium is also provided: when the instructions in the storage medium are executed by a processor of a device (a terminal or a server), the device is enabled to perform a translation model training method, the method comprising: converting standard text in a first language into deformed text in the first language; and using the standard text in the first language, the deformed text in the first language, and the second-language text corresponding to the standard text in the first language as training data for a translation model, and training the translation model on the training data.
Those skilled in the art will readily conceive of other embodiments of the invention after considering the specification and practicing the invention disclosed herein. The present invention is intended to cover any variations, uses, or adaptations of the invention that follow its general principles and include common knowledge or conventional techniques in the art not disclosed herein. The specification and examples are to be regarded as illustrative only, with the true scope and spirit of the invention being indicated by the following claims.
It should be understood that the present invention is not limited to the precise constructions described above and shown in the accompanying drawings, and that various modifications and changes may be made without departing from its scope. The scope of the invention is limited only by the appended claims.
The above are merely preferred embodiments of the present invention and are not intended to limit it; any modification, equivalent replacement, improvement, and the like made within the spirit and principles of the invention shall fall within its scope of protection.
The embodiments in this specification are described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and the same or similar parts of the embodiments can be referred to one another. Since the system embodiments are substantially similar to the method embodiments, their description is relatively brief, and the relevant parts can be found in the description of the method embodiments.
The translation model training method, apparatus, and machine-readable medium provided by the present invention have been described in detail above. Specific examples have been used herein to explain the principles and implementation of the invention; the above description of the embodiments is only intended to help understand the method of the invention and its core idea. At the same time, those skilled in the art may, based on the idea of the invention, make changes to the specific implementation and application scope. In summary, the contents of this specification should not be construed as limiting the invention.
Claims (12)
1. A translation model training method, characterized by comprising:
converting standard text in a first language into deformed text in the first language; and
using the standard text in the first language, the deformed text in the first language, and the second-language text corresponding to the standard text in the first language as training data for a translation model, and training the translation model according to the training data.
2. The method according to claim 1, characterized in that the step of converting the standard text in the first language into deformed text in the first language comprises:
repeating one or more word segments in the standard text of the first language to obtain the deformed text in the first language.
3. The method according to claim 2, characterized in that the step of repeating one or more word segments in the standard text of the first language to obtain the deformed text in the first language comprises:
performing word segmentation on the standard text in the first language; and
for each word segment, generating a first random probability value, determining the repetition count corresponding to the first random probability value according to a preset repetition-count probability distribution, and repeating the segment that repetition count of times.
4. The method according to claim 1, characterized in that the step of converting the standard text in the first language into deformed text in the first language comprises:
adding insertion words at one or more insertion positions in the standard text of the first language to obtain the deformed text in the first language, wherein the sentence-initial position and sentence-final position of the standard text in the first language and every position between two word segments in the standard text of the first language are insertion positions.
5. The method according to claim 4, characterized in that the step of adding insertion words at one or more insertion positions in the standard text of the first language to obtain the deformed text in the first language comprises:
determining each insertion position in the standard text of the first language; and
for each insertion position, generating a second random probability value, determining the insertion count corresponding to the second random probability value according to a preset insertion-count probability distribution, selecting from an insertion word list that number of insertion words matched to the insertion position, and inserting each selected insertion word.
6. The method according to claim 1, characterized in that the step of converting the standard text in the first language into deformed text in the first language comprises:
deleting one or more word segments from the standard text of the first language to obtain the deformed text in the first language.
7. The method according to claim 6, characterized in that the step of deleting one or more word segments from the standard text of the first language to obtain the deformed text in the first language comprises:
performing word segmentation on the standard text in the first language; and
for each word segment, generating a third random probability value, judging according to a preset deletion probability distribution whether the third random probability value indicates deletion, and if so, deleting the segment from the standard text of the first language.
8. The method according to claim 1, characterized in that the step of converting the standard text in the first language into deformed text in the first language comprises:
swapping the positions of one or more word segments in the standard text of the first language to obtain the deformed text in the first language.
9. The method according to claim 8, characterized in that the step of swapping the positions of one or more word segments in the standard text of the first language to obtain the deformed text in the first language comprises:
performing word segmentation on the standard text in the first language; and
for each word segment, generating a fourth random probability value, judging according to a preset swap probability distribution whether the fourth random probability value indicates a swap, and if so, swapping the segment with an adjacent segment.
10. A translation model training apparatus, characterized by comprising:
a conversion module configured to convert standard text in a first language into deformed text in the first language; and
a training module configured to use the standard text in the first language, the deformed text in the first language, and the second-language text corresponding to the standard text in the first language as training data for a translation model, and to train the translation model according to the training data.
11. A device for translation model training, characterized by comprising a memory and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by one or more processors, the one or more programs containing instructions for:
converting standard text in a first language into deformed text in the first language; and
using the standard text in the first language, the deformed text in the first language, and the second-language text corresponding to the standard text in the first language as training data for a translation model, and training the translation model according to the training data.
12. A machine-readable medium having instructions stored thereon that, when executed by one or more processors, cause a device to perform the translation model training method according to any one of claims 1 to 9.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711448599.5A CN109977426A (en) | 2017-12-27 | 2017-12-27 | Translation model training method and apparatus, and machine-readable medium
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711448599.5A CN109977426A (en) | 2017-12-27 | 2017-12-27 | Translation model training method and apparatus, and machine-readable medium
Publications (1)
Publication Number | Publication Date |
---|---|
CN109977426A true CN109977426A (en) | 2019-07-05 |
Family
ID=67071176
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201711448599.5A Pending CN109977426A (en) | 2017-12-27 | 2017-12-27 | Translation model training method and apparatus, and machine-readable medium
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109977426A (en) |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111027332A (en) * | 2019-12-11 | 2020-04-17 | 北京百度网讯科技有限公司 | Method and device for generating translation model |
CN111291560A (en) * | 2020-03-06 | 2020-06-16 | 深圳前海微众银行股份有限公司 | Sample expansion method, terminal, device and readable storage medium |
CN112487833A (en) * | 2020-12-01 | 2021-03-12 | 中译语通科技(青岛)有限公司 | Machine translation method and translation system thereof |
CN112597779A (en) * | 2020-12-24 | 2021-04-02 | 语联网(武汉)信息技术有限公司 | Document translation method and device |
CN112784612A (en) * | 2021-01-26 | 2021-05-11 | 浙江香侬慧语科技有限责任公司 | Method, apparatus, medium, and device for synchronous machine translation based on iterative modification |
CN113345422A (en) * | 2021-04-23 | 2021-09-03 | 北京巅峰科技有限公司 | Voice data processing method, device, equipment and storage medium |
CN111258991B (en) * | 2020-01-08 | 2023-11-07 | 北京小米松果电子有限公司 | Data processing method, device and storage medium |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1591415A (en) * | 2003-09-01 | 2005-03-09 | 株式会社国际电气通信基础技术研究所 | Machine translation apparatus and machine translation computer program |
TW200805091A (en) * | 2005-10-28 | 2008-01-16 | Rozetta Corp | Apparatus, method, and program for determining naturalness of array of words |
CN103956162A (en) * | 2014-04-04 | 2014-07-30 | 上海元趣信息技术有限公司 | Voice recognition method and device oriented towards child |
CN106547743A (en) * | 2015-09-23 | 2017-03-29 | 阿里巴巴集团控股有限公司 | A kind of method translated and its system |
CN106708812A (en) * | 2016-12-19 | 2017-05-24 | 新译信息科技(深圳)有限公司 | Machine translation model obtaining method and device |
CN106782502A (en) * | 2016-12-29 | 2017-05-31 | 昆山库尔卡人工智能科技有限公司 | A kind of speech recognition equipment of children robot |
-
2017
- 2017-12-27 CN CN201711448599.5A patent/CN109977426A/en active Pending
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1591415A (en) * | 2003-09-01 | 2005-03-09 | 株式会社国际电气通信基础技术研究所 | Machine translation apparatus and machine translation computer program |
TW200805091A (en) * | 2005-10-28 | 2008-01-16 | Rozetta Corp | Apparatus, method, and program for determining naturalness of array of words |
CN103956162A (en) * | 2014-04-04 | 2014-07-30 | 上海元趣信息技术有限公司 | Voice recognition method and device oriented towards child |
CN106547743A (en) * | 2015-09-23 | 2017-03-29 | 阿里巴巴集团控股有限公司 | A kind of method translated and its system |
CN106708812A (en) * | 2016-12-19 | 2017-05-24 | 新译信息科技(深圳)有限公司 | Machine translation model obtaining method and device |
CN106782502A (en) * | 2016-12-29 | 2017-05-31 | 昆山库尔卡人工智能科技有限公司 | A kind of speech recognition equipment of children robot |
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111027332A (en) * | 2019-12-11 | 2020-04-17 | 北京百度网讯科技有限公司 | Method and device for generating translation model |
CN111258991B (en) * | 2020-01-08 | 2023-11-07 | 北京小米松果电子有限公司 | Data processing method, device and storage medium |
CN111291560A (en) * | 2020-03-06 | 2020-06-16 | 深圳前海微众银行股份有限公司 | Sample expansion method, terminal, device and readable storage medium |
CN112487833A (en) * | 2020-12-01 | 2021-03-12 | 中译语通科技(青岛)有限公司 | Machine translation method and translation system thereof |
CN112597779A (en) * | 2020-12-24 | 2021-04-02 | 语联网(武汉)信息技术有限公司 | Document translation method and device |
CN112784612A (en) * | 2021-01-26 | 2021-05-11 | 浙江香侬慧语科技有限责任公司 | Method, apparatus, medium, and device for synchronous machine translation based on iterative modification |
CN112784612B (en) * | 2021-01-26 | 2023-12-22 | 浙江香侬慧语科技有限责任公司 | Method, device, medium and equipment for synchronous machine translation based on iterative modification |
CN113345422A (en) * | 2021-04-23 | 2021-09-03 | 北京巅峰科技有限公司 | Voice data processing method, device, equipment and storage medium |
CN113345422B (en) * | 2021-04-23 | 2024-02-20 | 北京巅峰科技有限公司 | Voice data processing method, device, equipment and storage medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109977426A (en) | Translation model training method and apparatus, and machine-readable medium | |
WO2021077529A1 (en) | Neural network model compressing method, corpus translation method and device thereof | |
CN106202150B (en) | Information display method and device | |
CN107992812A (en) | A kind of lip reading recognition methods and device | |
CN106251869B (en) | Voice processing method and device | |
CN111524521A (en) | Voiceprint extraction model training method, voiceprint recognition method, voiceprint extraction model training device, voiceprint recognition device and voiceprint recognition medium | |
CN105335754A (en) | Character recognition method and device | |
CN107870904A (en) | A kind of interpretation method, device and the device for translation | |
CN110210310A (en) | A kind of method for processing video frequency, device and the device for video processing | |
CN107564526B (en) | Processing method, apparatus and machine-readable medium | |
CN108073572B (en) | Information processing method and device, simultaneous interpretation system | |
CN109002184A (en) | A kind of association method and device of input method candidate word | |
CN104361896B (en) | Voice quality assessment equipment, method and system | |
CN107274903A (en) | Text handling method and device, the device for text-processing | |
CN109961791A (en) | A kind of voice information processing method, device and electronic equipment | |
WO2022037600A1 (en) | Abstract recording method and apparatus, and computer device and storage medium | |
CN105139848B (en) | Data transfer device and device | |
KR20210032875A (en) | Voice information processing method, apparatus, program and storage medium | |
CN108628813A (en) | Treating method and apparatus, the device for processing | |
CN107291704A (en) | Treating method and apparatus, the device for processing | |
CN109471919A (en) | Empty anaphora resolution method and device | |
CN108650543A (en) | The caption editing method and device of video | |
CN111739535A (en) | Voice recognition method and device and electronic equipment | |
CN108628819A (en) | Treating method and apparatus, the device for processing | |
CN111147914A (en) | Video processing method, storage medium and electronic equipment |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||