CN109977426A - Translation model training method, apparatus, and machine-readable medium - Google Patents

Translation model training method, apparatus, and machine-readable medium Download PDF

Info

Publication number
CN109977426A
CN109977426A CN201711448599.5A CN201711448599A CN109977426A
Authority
CN
China
Prior art keywords
language
text
transformed text
standard text
translation model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201711448599.5A
Other languages
Chinese (zh)
Inventor
施亮亮
王宇光
姜里羊
阳家俊
李响
卫林钰
陈伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Sogou Technology Development Co Ltd
Original Assignee
Beijing Sogou Technology Development Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Sogou Technology Development Co Ltd filed Critical Beijing Sogou Technology Development Co Ltd
Priority to CN201711448599.5A priority Critical patent/CN109977426A/en
Publication of CN109977426A publication Critical patent/CN109977426A/en
Pending legal-status Critical Current

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 - Handling natural language data
    • G06F 40/20 - Natural language analysis
    • G06F 40/205 - Parsing
    • G06F 40/40 - Processing or translation of natural language
    • G06F 40/55 - Rule-based translation
    • G06F 40/58 - Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Machine Translation (AREA)

Abstract

Embodiments of the present invention provide a translation model training method, apparatus, and machine-readable medium. The method includes: converting a standard text of a first language into a transformed text of the first language; using the standard text of the first language, the transformed text of the first language, and a second-language text corresponding to the standard text of the first language as training data for a translation model, and training the translation model according to the training data. A translation model trained with the training scheme provided by the embodiments of the present invention can accurately perform simultaneous interpretation of colloquial speech input by a user, improving the translation performance of the translation model.

Description

Translation model training method, apparatus, and machine-readable medium
Technical field
The present invention relates to the technical field of bilingual translation, and in particular to a translation model training method, apparatus, and machine-readable medium.
Background art
With the growth of international exchange, communication across different languages has become increasingly frequent. To overcome language barriers, translation models installed on client devices that perform online speech translation, i.e. simultaneous interpretation, have been widely adopted.
Online speech translation mainly involves two steps: first, speech recognition, in which the speech signal of a first language input by a user is converted into text; second, translation of that text by a translation model to obtain a text of a second language as the translation result, which is finally presented to the user as the second-language text or as speech.
A translation model is generated by training on bilingual sentence pairs, and the bilingual sentence pairs used for training consist of formal written language, i.e. standard text, so the translation model can only accurately interpret speech input that corresponds to standard text. In practice, however, the speech a user inputs is often highly colloquial because of the user's speaking habits, and the translation model then cannot accurately perform simultaneous interpretation of the user's speech, which degrades its translation performance.
Summary of the invention
The present invention provides a translation model training method, apparatus, and machine-readable medium that can accurately perform simultaneous interpretation of colloquial speech input by a user, improving the interpretation performance of the translation model.
To solve the above problems, the invention discloses a translation model training method, the method comprising: converting a standard text of a first language into a transformed text of the first language; using the standard text of the first language, the transformed text of the first language, and a second-language text corresponding to the standard text of the first language as training data for a translation model, and training the translation model according to the training data.
To solve the above problems, the invention also discloses a translation model training apparatus, the apparatus comprising: a conversion module configured to convert a standard text of a first language into a transformed text of the first language; and a training module configured to use the standard text of the first language, the transformed text of the first language, and the second-language text corresponding to the standard text of the first language as training data for a translation model, and to train the translation model according to the training data.
To solve the above problems, the invention further discloses an apparatus for translation model training, comprising a memory and one or more programs, wherein the one or more programs are stored in the memory and are configured to be executed by one or more processors, the one or more programs including instructions for: converting a standard text of a first language into a transformed text of the first language; using the standard text of the first language, the transformed text of the first language, and the second-language text corresponding to the standard text of the first language as training data for a translation model, and training the translation model according to the training data.
To solve the above problems, the invention also discloses a machine-readable medium having instructions stored thereon which, when executed by one or more processors, cause an apparatus to perform any of the translation model training methods described herein.
Compared with the prior art, the present invention has the following advantages:
The translation model training method, apparatus, and machine-readable medium provided by the embodiments of the present invention convert the standard text of the first language used to train the model into a transformed text of the first language, and use the standard text of the first language, the transformed text of the first language, and the second-language text corresponding to the standard text of the first language as training data to train the translation model. Because the trained translation model contains the correspondence between transformed texts and standard texts, once the colloquial speech input by a user is recognized as its corresponding transformed text, the translated text corresponding to that transformed text, or a speech output of the translated text, can be determined. Colloquial speech input by the user can therefore be interpreted accurately, improving the translation performance of the translation model.
Brief description of the drawings
Fig. 1 is a flow chart of the steps of a translation model training method according to Embodiment 1 of the present invention;
Fig. 2 is a flow chart of the steps of a translation model training method according to Embodiment 2 of the present invention;
Fig. 3 is a structural block diagram of a translation model training apparatus according to Embodiment 3 of the present invention;
Fig. 4 is a structural block diagram of a translation model training apparatus according to Embodiment 4 of the present invention;
Fig. 5 is a structural block diagram of an apparatus for translation model training according to Embodiment 5 of the present invention; and
Fig. 6 is a structural block diagram of the server in Embodiment 5 of the present invention.
Detailed description of the embodiments
In order to make the above objects, features, and advantages of the present invention clearer and easier to understand, the present invention is described in further detail below with reference to the accompanying drawings and specific embodiments.
In the embodiments of the present invention, the standard text of the first language used for training is converted into a transformed text of the first language, and the translation model is trained on the standard text of the first language, the transformed text of the first language, and the second-language text translated from the standard text of the first language. The trained translation model can determine the standard text of the first language corresponding to colloquial first-language speech input by a user, obtain the second-language text corresponding to that standard text, and output the second-language text or the speech corresponding to the second-language text, thereby achieving accurate translation of colloquial speech.
The embodiments of the present invention can be applied to any scenario that requires online translation of speech, such as speech translation and simultaneous interpretation. The first language and the second language denote the two languages of a bilingual pair; they may be preset by the user or determined by analyzing the user's historical behavior. Optionally, the language the user uses most often may be taken as the first language, and a language other than the first language as the second language. For example, for a user whose mother tongue is Chinese, the first language may be Chinese, and the second language may be one or a combination of English, Japanese, Korean, German, French, or other less common languages.
Embodiment 1
Referring to Fig. 1, a flow chart of the steps of a translation model training method according to Embodiment 1 of the present invention is shown.
The translation model training method of this embodiment includes the following steps:
Step 101: convert a standard text of a first language into a transformed text of the first language.
In actual training, the translation model needs to be trained on multiple groups of bilingual sentence pairs; the embodiments of the present invention are illustrated with the training of one group of bilingual sentence pairs. Each group of bilingual sentence pairs includes a standard text of the first language and the second-language text corresponding to that standard text, where the second-language text is also a standard text. The trained translation model can translate first-language text or speech into second-language text or speech.
The transformed text of the first language is text that is close to spoken expression; it is obtained by processing the standard text so that the converted text more closely matches how people actually speak. Colloquial speech input typically exhibits several problems: repeated words, redundant modal particles (filler words), incomplete sentences, and inverted word order. To ensure that the trained translation model can accurately interpret colloquial speech, transformed texts that are close to spoken expression must therefore be introduced into the training of the translation model.
Accordingly, the standard text of the first language used for training can be converted into the transformed text of the first language by any one, or a combination, of the following methods: repeating segmented words in the standard text of the first language with a certain probability, to model word repetition in colloquial speech; inserting preset insertion words into the standard text of the first language with a certain probability, to model redundant modal particles in colloquial speech; deleting segmented words from the standard text of the first language with a certain probability, to model incomplete sentences in colloquial speech; and swapping the positions of segmented words in the standard text of the first language with a certain probability, to model inverted word order in colloquial speech. In a specific implementation, a person skilled in the art can select any one of the above methods, or a combination of several of them, as needed to convert the standard text of the first language into the transformed text of the first language.
Step 102: use the standard text of the first language, the transformed text of the first language, and the second-language text corresponding to the standard text of the first language as training data for the translation model, and train the translation model according to the training data.
During training, the standard text of the first language and the transformed text of the first language can be taken together and paired with the second-language text to form a group of bilingual sentence pairs, and the translation model is trained on the resulting bilingual sentence pairs. For the specific way of training a model on bilingual sentence pairs, reference may be made to existing related techniques, which are not specifically limited in the embodiments of the present invention.
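By way of illustration only, the following Python sketch shows one way such training data could be assembled; the function and variable names (build_training_pairs, transform_fns, and so on) are hypothetical, and the transformation functions are assumed to implement the conversion methods described above.

    def build_training_pairs(standard_src, target_text, transform_fns, variants_per_fn=1):
        """Pair a first-language standard text and its transformed variants with the
        same second-language text, producing bilingual sentence pairs for training."""
        pairs = [(standard_src, target_text)]  # the original bilingual sentence pair
        for transform in transform_fns:
            for _ in range(variants_per_fn):
                # each transformed (colloquial-style) variant shares the same target text
                pairs.append((transform(standard_src), target_text))
        return pairs

The union of the original and augmented sentence pairs can then be fed to any ordinary training procedure for the translation model; as noted above, the embodiments do not restrict the specific training technique.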
After training, the translation model contains the standard texts of the first language, the transformed texts of the first language, and the second-language texts corresponding to the standard texts of the first language. In a specific application, when a user inputs colloquial first-language speech, the translation model can recognize the colloquial first-language speech as a transformed text of the first language, determine the standard text of the first language corresponding to that transformed text, obtain the second-language text corresponding to the standard text of the first language, and output the second-language text or the speech corresponding to the second-language text, thereby achieving accurate translation of colloquial speech.
In summary, the translation model training method provided by this embodiment of the present invention converts the standard text of the first language used to train the model into a transformed text of the first language, and trains the translation model using the standard text of the first language, the transformed text of the first language, and the second-language text corresponding to the standard text of the first language as training data. Because the trained translation model contains the correspondence between standard texts and transformed texts that are close to spoken expression, once the colloquial speech input by a user is recognized as its corresponding transformed text, the translated text corresponding to that transformed text, or a speech output of the translated text, can be determined. Colloquial speech input by the user can therefore be interpreted accurately, improving the translation performance of the translation model.
Embodiment 2
Referring to Fig. 2, a flow chart of the steps of a translation model training method according to Embodiment 2 of the present invention is shown.
The translation model training method of this embodiment specifically includes the following steps:
Step 201: perform word segmentation on the standard text of the first language.
The trained translation model supports simultaneous interpretation between first-language speech and second-language speech. In actual training, the translation model needs to be trained on multiple groups of bilingual sentence pairs; the embodiments of the present invention are illustrated with the training of one group of bilingual sentence pairs. Each group of bilingual sentence pairs includes a standard text of the first language and the second-language text corresponding to that standard text. To enable the trained translation model to accurately translate colloquial first-language speech input, colloquial-style text corresponding to the standard text of the first language is introduced when training the translation model, so the standard text of the first language needs to be converted into a corresponding transformed text; steps 201 and 202 describe one feasible conversion method.
The standard text of the first language may be a short sentence or a passage, so a standard text contains multiple segmented words. When segmenting a standard text, the segmentation can follow a word segmentation table preset in the system. For example, if the standard text of the first language is "we have a meeting this afternoon", word segmentation yields the three segments "we", "afternoon", and "meeting".
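As a minimal sketch, and assuming the first language is Chinese, word segmentation could be performed with an off-the-shelf segmenter such as the third-party jieba package; the embodiments only require a preset word segmentation table and do not prescribe any particular tool.

    import jieba  # assumed third-party Chinese segmenter; any preset segmentation table would do

    def segment(standard_text):
        """Split a first-language standard text into a list of segmented words."""
        return jieba.lcut(standard_text)

    # segment("我们下午开会") typically yields ["我们", "下午", "开会"],
    # i.e. the three segments "we", "afternoon", "meeting" from the example above.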
Step 202: for each segmented word, generate a first random probability value; determine the repetition count corresponding to the first random probability value according to a preset repetition-count probability distribution, and repeat the segmented word that many times.
The repetition-count probability distribution is preset in the translation model training system; it can be set in advance by a person skilled in the art, or obtained by analyzing users' everyday speech input habits. The repetition-count probability distribution is essentially a correspondence between repetition probabilities and repetition counts. For example: a repetition probability of 0.6 corresponds to a repetition count of 1, a repetition probability of 0.2 corresponds to a repetition count of 2, a repetition probability of 0.12 corresponds to a repetition count of 3, and a repetition probability of 0.08 corresponds to a repetition count of 4.
Since the standard text of the first language contains multiple segmented words, each segmented word is examined in turn during conversion to the transformed text to decide whether it should be repeated. A first random probability value is generated for each decision, and the repetition count corresponding to that value is looked up: if the repetition count is greater than or equal to 1, the segmented word is repeated; if it is 0, the segmented word is not repeated. The first random probability values generated on different occasions may be the same or different.
For example, when deciding whether the segment "we" should be repeated, the translation model training system generates a first random probability value of 0.6; from the repetition-count probability distribution it can be determined that a repetition probability of 0.6 corresponds to a repetition count of 1, so the segment "we" is repeated once.
Processing each segmented word in the standard text of the first language one by one with the method of step 202 yields the transformed text of the first language. A translation model trained on colloquial-style text converted in this way performs well when translating colloquial speech that contains word repetition.
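A minimal Python sketch of this repetition step is given below. Sampling a repetition count directly from a categorical distribution is equivalent to generating a first random probability value and looking up its corresponding count; the particular counts and weights here, including a count of 0 for words that are not repeated, are illustrative assumptions rather than values fixed by the method.

    import random

    # Assumed repetition-count distribution; in practice it would be preset by the
    # practitioner or derived from analysis of users' everyday speech input.
    REPEAT_COUNTS = (0, 1, 2, 3, 4)
    REPEAT_WEIGHTS = (0.80, 0.12, 0.04, 0.024, 0.016)

    def repeat_words(segments):
        """Randomly repeat segmented words to mimic word repetition in colloquial speech."""
        out = []
        for word in segments:
            out.append(word)
            count = random.choices(REPEAT_COUNTS, weights=REPEAT_WEIGHTS, k=1)[0]
            out.extend([word] * count)  # a count of 0 leaves the word unrepeated
        return out

    # repeat_words(["we", "afternoon", "meeting"]) might yield ["we", "we", "afternoon", "meeting"]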
Steps 201 and 202 are a specific implementation of repeating one or more segmented words in the standard text of the first language to obtain the transformed text of the first language. In a specific implementation, converting the standard text of the first language into the transformed text of the first language is not limited to repeating segmented words; it can also be implemented in the following ways:
Method 1: add insertion words at one or more insertion positions in the standard text of the first language to obtain the transformed text of the first language.
Here, the sentence-initial position and sentence-final position of the standard text of the first language, and the position between any two segmented words in the standard text of the first language, are insertion positions. A translation model trained on transformed text converted in this way performs well when translating colloquial speech that contains redundant modal particles.
One feasible way of adding insertion words at one or more insertion positions in the standard text of the first language to obtain the transformed text of the first language is as follows:
First, determine each insertion position in the standard text of the first language.
For example, if the standard text of the first language is "it is originally like this", segmented into the three segments "originally", "is", and "like this", then there is an insertion position before "originally", an insertion position between "originally" and "is", an insertion position between "is" and "like this", and an insertion position after "like this".
Second, for each insertion position, generate a second random probability value; determine the insertion count corresponding to the second random probability value according to a preset insertion-count probability distribution, determine that many insertion words matching the insertion position from an insertion word list, and insert each determined insertion word.
For example, if the insertion count corresponding to the second random probability value is three, three insertion words matching the insertion position are determined from the insertion word list and inserted in order at that insertion position. The insertion-count probability distribution can be preset by a person skilled in the art, or obtained by analyzing users' everyday speech input habits.
The insertion word list may contain multiple modal particles; it can be preset in the translation model training system by a person skilled in the art, or obtained by the translation model training system through analysis of speech commonly input by users. The specific words included in the insertion word list are not particularly limited in the embodiments of the present invention. Processing each insertion position in the standard text of the first language one by one with the method of this step yields the transformed text of the first language.
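A corresponding sketch of this insertion method follows; the insertion word list and the insertion-count distribution below are assumptions chosen only to make the example concrete.

    import random

    INSERTION_WORDS = ["uh", "um", "well", "you know"]  # assumed filler words / modal particles
    INSERT_COUNTS = (0, 1, 2, 3)
    INSERT_WEIGHTS = (0.85, 0.10, 0.03, 0.02)  # most insertion positions receive no filler

    def insert_filler_words(segments):
        """Insert filler words at sentence-initial, between-word, and sentence-final positions."""
        out = []
        for i in range(len(segments) + 1):  # one insertion position before each word plus one at the end
            count = random.choices(INSERT_COUNTS, weights=INSERT_WEIGHTS, k=1)[0]
            out.extend(random.choices(INSERTION_WORDS, k=count))  # insert the sampled number of fillers
            if i < len(segments):
                out.append(segments[i])
        return out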
Method 2: delete one or more segmented words from the standard text of the first language to obtain the transformed text of the first language. A translation model trained on transformed text converted in this way performs well when translating colloquial speech that contains incomplete sentences.
One feasible way of deleting one or more segmented words from the standard text of the first language to obtain the transformed text of the first language is as follows:
First, perform word segmentation on the standard text of the first language.
Second, for each segmented word, generate a third random probability value; judge, according to a preset deletion probability distribution, whether the third random probability value indicates deletion, and if so delete the segmented word from the standard text of the first language.
The deletion probability distribution can be preset by a person skilled in the art, or obtained by analyzing users' everyday speech input habits. The deletion probability distribution may contain only two probability intervals, one corresponding to deletion and the other to retention.
Since the standard text of the first language contains multiple segmented words, each segmented word is examined in turn during conversion to the transformed text to decide whether it should be deleted. A third random probability value is generated for each decision, and whether that value indicates deletion is judged: if so, the segmented word is deleted; if not, the segmented word is retained. The third random probability values generated on different occasions may be the same or different. Processing each segmented word in the standard text of the first language one by one with the method of this step yields the transformed text of the first language.
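A sketch of this deletion method is given below, with the two probability intervals of the deletion probability distribution represented by a single assumed threshold.

    import random

    DELETE_PROBABILITY = 0.10  # assumed size of the probability interval that indicates deletion

    def delete_words(segments):
        """Randomly drop segmented words to mimic incomplete sentences in colloquial speech."""
        out = []
        for word in segments:
            if random.random() < DELETE_PROBABILITY:
                continue  # the random value falls in the deletion interval: drop this word
            out.append(word)  # otherwise the word is retained
        return out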
Method 3: swap the positions of one or more segmented words in the standard text of the first language to obtain the transformed text of the first language. A translation model trained on transformed text converted in this way performs well when translating colloquial speech with inverted word order.
One feasible way of swapping the positions of one or more segmented words in the standard text of the first language to obtain the transformed text of the first language is as follows:
First, perform word segmentation on the standard text of the first language.
Second, for each segmented word, generate a fourth random probability value; judge, according to a preset swap probability distribution, whether the fourth random probability value indicates a swap, and if so swap the positions of the segmented word and an adjacent segmented word. The swap probability distribution may contain only two probability intervals, one corresponding to swapping and the other to no swap.
The swap probability distribution can be preset by a person skilled in the art, or obtained by analyzing users' everyday speech input habits.
Since the standard text of the first language contains multiple segmented words, each segmented word is examined in turn during conversion to the transformed text to decide whether its position should be swapped. A fourth random probability value is generated for each decision, and whether that value indicates a swap is judged: if so, the positions of the segmented word and an adjacent segmented word are swapped; if not, no swap is performed. The fourth random probability values generated on different occasions may be the same or different. Processing each segmented word in the standard text of the first language one by one with the method of this step yields the transformed text of the first language.
A probability threshold can be preset in the system, and whether the fourth random probability value indicates a swap is judged against this threshold. Specifically, the fourth random probability value is compared with the probability threshold, and the comparison result determines whether a swap is indicated: for example, if the fourth random probability value is greater than the probability threshold, a swap is indicated, and if the fourth random probability value is less than or equal to the probability threshold, no swap is performed.
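A sketch of this swap method using such a preset probability threshold follows; the threshold value is an assumption.

    import random

    SWAP_THRESHOLD = 0.90  # assumed: a fourth random value above this threshold indicates a swap

    def swap_adjacent_words(segments):
        """Randomly swap adjacent segmented words to mimic inverted word order in colloquial speech."""
        out = list(segments)
        i = 0
        while i < len(out) - 1:
            if random.random() > SWAP_THRESHOLD:
                out[i], out[i + 1] = out[i + 1], out[i]  # swap this word with its neighbour
                i += 2  # skip over the swapped pair
            else:
                i += 1
        return out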
The embodiments of the present invention have listed four methods of converting the standard text of the first language into the transformed text of the first language. In a specific implementation, text conversion may be performed using any one of the listed methods alone, or any two or more of the four methods may be used in combination.
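As shown in the sketch below, the four operations can also be chained; it reuses the hypothetical helper functions from the preceding sketches and represents only one possible combination.

    def to_transformed_text(standard_text, joiner=" "):
        """Convert a first-language standard text into a colloquial-style transformed text
        by combining the four conversion methods (repetition, insertion, deletion, swap)."""
        segments = segment(standard_text)         # word segmentation (step 201)
        segments = repeat_words(segments)         # word repetition
        segments = insert_filler_words(segments)  # redundant modal particles
        segments = delete_words(segments)         # incomplete sentences
        segments = swap_adjacent_words(segments)  # inverted word order
        # joiner="" would suit languages written without spaces, such as Chinese
        return joiner.join(segments)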
Step 203: use the standard text of the first language, the transformed text of the first language, and the second-language text corresponding to the standard text of the first language as training data for the translation model, and train the translation model according to the training data.
Besides the beneficial effects of the translation model training method of Embodiment 1, the method provided by this embodiment of the present invention offers several concrete ways of converting the standard text of the first language into the transformed text of the first language; a person skilled in the art can select any one or more of the conversion methods as needed, which provides strong flexibility.
Embodiment 3
Referring to Fig. 3, a structural block diagram of a translation model training apparatus according to Embodiment 3 of the present invention is shown.
The translation model training apparatus of this embodiment may include: a conversion module 301 configured to convert a standard text of a first language into a transformed text of the first language; and a training module 302 configured to use the standard text of the first language, the transformed text of the first language, and the second-language text corresponding to the standard text of the first language as training data for a translation model, and to train the translation model according to the training data.
The translation model training apparatus provided by this embodiment of the present invention converts the standard text of the first language used to train the model into a transformed text of the first language, and trains the translation model on the standard text of the first language, the transformed text of the first language, and the second-language text corresponding to the standard text of the first language as training data. Because the trained translation model contains the correspondence between transformed texts and standard texts, once colloquial speech input by a user is recognized as its corresponding transformed text, the translated text corresponding to that transformed text, or a speech output of the translated text, can be determined. Colloquial speech input by the user can therefore be interpreted accurately, improving the translation performance of the translation model.
Embodiment 4
Referring to Fig. 4, a structural block diagram of a translation model training apparatus according to Embodiment 4 of the present invention is shown.
This embodiment further optimizes the translation model training apparatus of Embodiment 3. The optimized translation model training apparatus may include: a conversion module 401 configured to convert a standard text of a first language into a transformed text of the first language; and a training module 402 configured to use the standard text of the first language, the transformed text of the first language, and the second-language text corresponding to the standard text of the first language as training data for a translation model, and to train the translation model according to the training data.
Optionally, the conversion module 401 may include a repetition sub-module 4011 configured to repeat one or more segmented words in the standard text of the first language to obtain the transformed text of the first language.
Optionally, the repetition sub-module 4011 may include: a first segmentation unit configured to perform word segmentation on the standard text of the first language; and a first processing unit configured to determine, according to a preset repetition-count probability distribution, the repetition count corresponding to the first random probability value.
Optionally, the conversion module 401 may include an insertion sub-module 4012 configured to add insertion words at one or more insertion positions in the standard text of the first language to obtain the transformed text of the first language, wherein the sentence-initial position and sentence-final position of the standard text of the first language and the position between any two segmented words in the standard text of the first language are insertion positions.
Optionally, the insertion sub-module 4012 may include: a second determination unit configured to determine each insertion position in the standard text of the first language; a third determination unit configured to determine the insertion probability corresponding to each insertion position; and a second processing unit configured to generate a second random probability value for each insertion position, determine the insertion count corresponding to the second random probability value according to a preset insertion-count probability distribution, determine that many insertion words matching the insertion position from an insertion word list, and insert each determined insertion word.
Optionally, the conversion module 401 may include a deletion sub-module 4013 configured to delete one or more segmented words from the standard text of the first language to obtain the transformed text of the first language.
Optionally, the deletion sub-module 4013 may include: a second segmentation unit configured to perform word segmentation on the standard text of the first language; a fourth determination unit configured to determine the deletion probability corresponding to each segmented word; and a third processing unit configured to generate a third random probability value for each segmented word, judge, according to a preset deletion probability distribution, whether the third random probability value indicates deletion, and if so delete the segmented word from the standard text of the first language.
Optionally, the conversion module 401 may include a swap sub-module 4014 configured to swap the positions of one or more segmented words in the standard text of the first language to obtain the transformed text of the first language.
Optionally, the swap sub-module 4014 may include: a third segmentation unit configured to perform word segmentation on the standard text of the first language; and a fourth processing unit configured to generate a fourth random probability value for each segmented word, judge, according to a preset swap probability distribution, whether the fourth random probability value indicates a swap, and if so swap the positions of the segmented word and an adjacent segmented word.
The translation model training apparatus of this embodiment of the present invention implements the corresponding translation model training methods of Embodiments 1 and 2 and has the beneficial effects of the corresponding method embodiments, which are not repeated here. For the apparatus of the above embodiments, the specific way in which each module performs its operations has been described in detail in the method embodiments and is not elaborated here.
Embodiment 5
This embodiment of the present invention also provides an apparatus for translation model training, comprising a memory and one or more programs, wherein the one or more programs are stored in the memory and are configured to be executed by one or more processors, the one or more programs including instructions for: converting a standard text of a first language into a transformed text of the first language; using the standard text of the first language, the transformed text of the first language, and the second-language text corresponding to the standard text of the first language as training data for a translation model, and training the translation model according to the training data.
Referring to Fig. 5, a structural block diagram of an apparatus for translation model training according to Embodiment 5 of the present invention is shown.
Fig. 5 is a block diagram of an apparatus 600 for translation model training according to an exemplary embodiment. For example, the apparatus 600 may be a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, fitness equipment, a personal digital assistant, or the like.
Referring to Fig. 5, the apparatus 600 may include one or more of the following components: a processing component 602, a memory 604, a power component 606, a multimedia component 608, an audio component 610, an input/output (I/O) interface 612, a sensor component 614, and a communication component 616.
The processing component 602 generally controls the overall operation of the apparatus 600, such as operations associated with display, telephone calls, data communication, camera operation, and recording. The processing component 602 may include one or more processors 620 to execute instructions so as to perform all or part of the steps of the methods described above. In addition, the processing component 602 may include one or more modules to facilitate interaction between the processing component 602 and other components; for example, the processing component 602 may include a multimedia module to facilitate interaction between the multimedia component 608 and the processing component 602.
The memory 604 is configured to store various types of data to support the operation of the apparatus 600. Examples of such data include instructions of any application or method operated on the apparatus 600, contact data, phone book data, messages, pictures, videos, and so on. The memory 604 may be implemented by any type of volatile or non-volatile storage device or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic disk, or optical disk.
The power component 606 supplies power to the various components of the apparatus 600. The power component 606 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the apparatus 600.
The multimedia component 608 includes a screen that provides an output interface between the apparatus 600 and the user. In some embodiments, the screen may include a liquid crystal display (LCD) and a touch panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive input signals from the user. The touch panel includes one or more touch sensors to sense touches, swipes, and gestures on the touch panel. The touch sensors may not only sense the boundary of a touch or swipe action but also detect the duration and pressure associated with the touch or swipe operation. In some embodiments, the multimedia component 608 includes a front camera and/or a rear camera. When the apparatus 600 is in an operation mode such as a shooting mode or a video mode, the front camera and/or the rear camera can receive external multimedia data. Each front camera and rear camera may be a fixed optical lens system or have focusing and optical zoom capability.
The audio component 610 is configured to output and/or input audio signals. For example, the audio component 610 includes a microphone (MIC) configured to receive external audio signals when the apparatus 600 is in an operation mode such as a call mode, a recording mode, or a speech recognition mode. The received audio signal may be further stored in the memory 604 or transmitted via the communication component 616. In some embodiments, the audio component 610 also includes a speaker for outputting audio signals.
The I/O interface 612 provides an interface between the processing component 602 and peripheral interface modules, which may be a keyboard, a click wheel, buttons, and the like. These buttons may include, but are not limited to, a home button, a volume button, a start button, and a lock button.
The sensor component 614 includes one or more sensors for providing status assessments of various aspects of the apparatus 600. For example, the sensor component 614 can detect the open/closed state of the apparatus 600 and the relative positioning of components, for example the display and keypad of the apparatus 600; the sensor component 614 can also detect a change in position of the apparatus 600 or a component of the apparatus 600, the presence or absence of user contact with the apparatus 600, the orientation or acceleration/deceleration of the apparatus 600, and a change in temperature of the apparatus 600. The sensor component 614 may include a proximity sensor configured to detect the presence of nearby objects without any physical contact. The sensor component 614 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor component 614 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 616 is configured to facilitate wired or wireless communication between the apparatus 600 and other devices. The apparatus 600 can access a wireless network based on a communication standard, such as WiFi, 2G, or 3G, or a combination thereof. In an exemplary embodiment, the communication component 616 receives a broadcast signal or broadcast-related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 616 also includes a near-field communication (NFC) module to facilitate short-range communication. For example, the NFC module may be implemented based on radio frequency identification (RFID) technology, Infrared Data Association (IrDA) technology, ultra-wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the apparatus 600 may be implemented by one or more application-specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field-programmable gate arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic components, for performing the above method.
In an exemplary embodiment, a machine-readable storage medium including instructions is also provided, such as the memory 604 including instructions, which can be executed by the processor 620 of the apparatus 600 to complete the above method. For example, the machine-readable storage medium may be a ROM, a random access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, or the like.
Fig. 6 is a structural schematic diagram of a server in an embodiment of the present invention. The server 1900 may vary considerably depending on configuration or performance and may include one or more central processing units (CPUs) 1922 (e.g., one or more processors), a memory 1932, and one or more storage media 1930 (e.g., one or more mass storage devices) storing application programs 1942 or data 1944. The memory 1932 and the storage medium 1930 may provide transient or persistent storage. The programs stored in the storage medium 1930 may include one or more modules (not shown), each of which may include a series of instruction operations on the server. Further, the central processing unit 1922 may be configured to communicate with the storage medium 1930 and execute, on the server 1900, the series of instruction operations in the storage medium 1930.
The server 1900 may also include one or more power supplies 1926, one or more wired or wireless network interfaces 1950, one or more input/output interfaces 1958, one or more keyboards 1956, and/or one or more operating systems 1941 such as Windows ServerTM, Mac OS XTM, UnixTM, LinuxTM, FreeBSDTM, and the like.
A machine-readable storage medium is also provided; when the instructions in the storage medium are executed by a processor of an apparatus (a terminal or a server), the apparatus is enabled to perform a translation model training method, the method comprising: converting a standard text of a first language into a transformed text of the first language; using the standard text of the first language, the transformed text of the first language, and the second-language text corresponding to the standard text of the first language as training data for a translation model, and training the translation model according to the training data.
Those skilled in the art will readily conceive of other embodiments of the invention after considering the specification and practicing the invention disclosed herein. The present application is intended to cover any variations, uses, or adaptations of the invention that follow its general principles and include common knowledge or customary technical means in the art not disclosed herein. The specification and embodiments are to be regarded as exemplary only, with the true scope and spirit of the invention being indicated by the following claims.
It should be understood that the present invention is not limited to the precise structure described above and shown in the accompanying drawings, and that various modifications and changes may be made without departing from its scope. The scope of the invention is limited only by the appended claims.
The above are only preferred embodiments of the present invention and are not intended to limit the invention; any modification, equivalent replacement, improvement, and the like made within the spirit and principles of the invention shall fall within the protection scope of the invention.
The embodiments in this specification are described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and the same or similar parts of the embodiments may be referred to one another. Since the system embodiments are basically similar to the method embodiments, their description is relatively brief, and for relevant details reference may be made to the description of the method embodiments.
The translation model training method, apparatus, and machine-readable medium provided by the present invention have been described in detail above. Specific examples have been used herein to explain the principles and implementation of the invention, and the description of the above embodiments is only intended to help understand the method of the invention and its core idea. Meanwhile, for those of ordinary skill in the art, the specific implementation and scope of application may change according to the idea of the invention. In summary, the content of this specification should not be construed as limiting the invention.

Claims (12)

1. A translation model training method, characterized by comprising:
converting a standard text of a first language into a transformed text of the first language;
using the standard text of the first language, the transformed text of the first language, and a second-language text corresponding to the standard text of the first language as training data for a translation model, and training the translation model according to the training data.
2. The method according to claim 1, characterized in that the step of converting the standard text of the first language into the transformed text of the first language comprises:
repeating one or more segmented words in the standard text of the first language to obtain the transformed text of the first language.
3. The method according to claim 2, characterized in that the step of repeating one or more segmented words in the standard text of the first language to obtain the transformed text of the first language comprises:
performing word segmentation on the standard text of the first language;
for each segmented word, generating a first random probability value; determining, according to a preset repetition-count probability distribution, the repetition count corresponding to the first random probability value, and repeating the segmented word that many times.
4. The method according to claim 1, characterized in that the step of converting the standard text of the first language into the transformed text of the first language comprises:
adding insertion words at one or more insertion positions in the standard text of the first language to obtain the transformed text of the first language; wherein the sentence-initial position and sentence-final position of the standard text of the first language and the position between any two segmented words in the standard text of the first language are insertion positions.
5. The method according to claim 4, characterized in that the step of adding insertion words at one or more insertion positions in the standard text of the first language to obtain the transformed text of the first language comprises:
determining each insertion position in the standard text of the first language;
for each insertion position, generating a second random probability value; determining, according to a preset insertion-count probability distribution, the insertion count corresponding to the second random probability value; determining, from an insertion word list, that many insertion words matching the insertion position; and inserting each determined insertion word.
6. The method according to claim 1, characterized in that the step of converting the standard text of the first language into the transformed text of the first language comprises:
deleting one or more segmented words from the standard text of the first language to obtain the transformed text of the first language.
7. The method according to claim 6, characterized in that the step of deleting one or more segmented words from the standard text of the first language to obtain the transformed text of the first language comprises:
performing word segmentation on the standard text of the first language;
for each segmented word, generating a third random probability value; judging, according to a preset deletion probability distribution, whether the third random probability value indicates deletion, and if so deleting the segmented word from the standard text of the first language.
8. The method according to claim 1, characterized in that the step of converting the standard text of the first language into the transformed text of the first language comprises:
swapping the positions of one or more segmented words in the standard text of the first language to obtain the transformed text of the first language.
9. The method according to claim 8, characterized in that the step of swapping the positions of one or more segmented words in the standard text of the first language to obtain the transformed text of the first language comprises:
performing word segmentation on the standard text of the first language;
for each segmented word, generating a fourth random probability value; judging, according to a preset swap probability distribution, whether the fourth random probability value indicates a swap, and if so swapping the positions of the segmented word and an adjacent segmented word.
10. A translation model training apparatus, characterized by comprising:
a conversion module configured to convert a standard text of a first language into a transformed text of the first language;
a training module configured to use the standard text of the first language, the transformed text of the first language, and a second-language text corresponding to the standard text of the first language as training data for a translation model, and to train the translation model according to the training data.
11. An apparatus for translation model training, characterized by comprising a memory and one or more programs, wherein the one or more programs are stored in the memory and are configured to be executed by one or more processors, the one or more programs including instructions for:
converting a standard text of a first language into a transformed text of the first language;
using the standard text of the first language, the transformed text of the first language, and a second-language text corresponding to the standard text of the first language as training data for a translation model, and training the translation model according to the training data.
12. A machine-readable medium having instructions stored thereon which, when executed by one or more processors, cause an apparatus to perform the translation model training method according to any one of claims 1 to 9.
CN201711448599.5A 2017-12-27 2017-12-27 Translation model training method, apparatus, and machine-readable medium Pending CN109977426A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711448599.5A CN109977426A (en) Translation model training method, apparatus, and machine-readable medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201711448599.5A CN109977426A (en) Translation model training method, apparatus, and machine-readable medium

Publications (1)

Publication Number Publication Date
CN109977426A true CN109977426A (en) 2019-07-05

Family

ID=67071176

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711448599.5A Pending CN109977426A (en) Translation model training method, apparatus, and machine-readable medium

Country Status (1)

Country Link
CN (1) CN109977426A (en)

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1591415A (en) * 2003-09-01 2005-03-09 株式会社国际电气通信基础技术研究所 Machine translation apparatus and machine translation computer program
TW200805091A (en) * 2005-10-28 2008-01-16 Rozetta Corp Apparatus, method, and program for determining naturalness of array of words
CN103956162A (en) * 2014-04-04 2014-07-30 上海元趣信息技术有限公司 Voice recognition method and device oriented towards child
CN106547743A (en) * 2015-09-23 2017-03-29 阿里巴巴集团控股有限公司 A kind of method translated and its system
CN106708812A (en) * 2016-12-19 2017-05-24 新译信息科技(深圳)有限公司 Machine translation model obtaining method and device
CN106782502A (en) * 2016-12-29 2017-05-31 昆山库尔卡人工智能科技有限公司 A kind of speech recognition equipment of children robot

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111027332A (en) * 2019-12-11 2020-04-17 北京百度网讯科技有限公司 Method and device for generating translation model
CN111258991B (en) * 2020-01-08 2023-11-07 北京小米松果电子有限公司 Data processing method, device and storage medium
CN111291560A (en) * 2020-03-06 2020-06-16 深圳前海微众银行股份有限公司 Sample expansion method, terminal, device and readable storage medium
CN112487833A (en) * 2020-12-01 2021-03-12 中译语通科技(青岛)有限公司 Machine translation method and translation system thereof
CN112597779A (en) * 2020-12-24 2021-04-02 语联网(武汉)信息技术有限公司 Document translation method and device
CN112784612A (en) * 2021-01-26 2021-05-11 浙江香侬慧语科技有限责任公司 Method, apparatus, medium, and device for synchronous machine translation based on iterative modification
CN112784612B (en) * 2021-01-26 2023-12-22 浙江香侬慧语科技有限责任公司 Method, device, medium and equipment for synchronous machine translation based on iterative modification
CN113345422A (en) * 2021-04-23 2021-09-03 北京巅峰科技有限公司 Voice data processing method, device, equipment and storage medium
CN113345422B (en) * 2021-04-23 2024-02-20 北京巅峰科技有限公司 Voice data processing method, device, equipment and storage medium

Similar Documents

Publication Publication Date Title
CN109977426A (en) Translation model training method, apparatus, and machine-readable medium
WO2021077529A1 (en) Neural network model compressing method, corpus translation method and device thereof
CN107992812A (en) A kind of lip reading recognition methods and device
CN106202150B (en) Information display method and device
CN106251869B (en) Voice processing method and device
CN107632980A (en) Voice translation method and device, the device for voiced translation
CN111524521A (en) Voiceprint extraction model training method, voiceprint recognition method, voiceprint extraction model training device, voiceprint recognition device and voiceprint recognition medium
CN105335754A (en) Character recognition method and device
CN107992485A (en) A kind of simultaneous interpretation method and device
CN110210310A (en) A kind of method for processing video frequency, device and the device for video processing
CN107564526B (en) Processing method, apparatus and machine-readable medium
CN108073572B (en) Information processing method and device, simultaneous interpretation system
CN105139848B (en) Data transfer device and device
WO2021082637A1 (en) Audio information processing method, apparatus, electronic equipment and storage medium
CN108628813A (en) Treating method and apparatus, the device for processing
CN108538284A (en) Simultaneous interpretation result shows method and device, simultaneous interpreting method and device
CN109002184A (en) A kind of association method and device of input method candidate word
CN107274903A (en) Text handling method and device, the device for text-processing
WO2022037600A1 (en) Abstract recording method and apparatus, and computer device and storage medium
CN107291704A (en) Treating method and apparatus, the device for processing
CN109471919A (en) Empty anaphora resolution method and device
CN109961791A (en) A kind of voice information processing method, device and electronic equipment
KR20210032875A (en) Voice information processing method, apparatus, program and storage medium
CN108650543A (en) The caption editing method and device of video
CN108628819A (en) Treating method and apparatus, the device for processing

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination