CN104050160A - Spoken language translation method and device combining machine translation and human translation - Google Patents


Publication number
CN104050160A
CN104050160A CN201410090457.6A CN201410090457A
Authority
CN
China
Prior art keywords
translation
input text
target sentence
human
voice
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201410090457.6A
Other languages
Chinese (zh)
Other versions
CN104050160B (en)
Inventor
高鹏 (Gao Peng)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Purple Winter Of Beijing Is Voice Technology Co Ltd With Keen Determination
Original Assignee
Purple Winter Of Beijing Is Voice Technology Co Ltd With Keen Determination
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Purple Winter Of Beijing Is Voice Technology Co Ltd With Keen Determination
Priority to CN201410090457.6A priority Critical patent/CN104050160B/en
Publication of CN104050160A publication Critical patent/CN104050160A/en
Application granted granted Critical
Publication of CN104050160B publication Critical patent/CN104050160B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Machine Translation (AREA)

Abstract

The invention provides a spoken language translation method and a spoken language translation device that combine machine translation and human translation. The method comprises the following steps: recognizing a continuous speech passage and segmenting it to obtain input text in sentence units; searching a database for a target sentence corresponding to the input text and, if one exists, outputting that target sentence directly as speech; translating the input text by machine translation to obtain a target sentence and scoring the confidence of that target sentence; translating the input text by human translation to obtain a target sentence; and assessing the confidence of the machine translation and the quality of the human translation, then using speech synthesis to generate prosody-adjustable speech output from the best-assessed translated target sentence.

Description

Spoken language translation method and apparatus combining machine translation and human translation
Technical field
The present invention relates to the field of automatic computer information processing, and in particular to a spoken language translation method and apparatus that combine machine translation and human translation.
Background technology
Research and development of automatic translation systems has been under way for some fifty years; in fact, exploration of applying computers to speech translation began soon after the birth of the electronic computer in the 1940s. Since the late 1980s, researchers have devoted themselves to speech-to-speech translation technology. Speech translation is the process of having a computer translate speech in one language into speech in another language. Its basic vision is to let the computer, like a human interpreter, mediate between speakers of different languages. Because speakers generally use the everyday spoken language, people hope that a machine translation system can accept and translate arbitrary spoken utterances, and with the rapid development of speech recognition and spoken language analysis technology, this hope is no longer a remote fantasy. Speech translation is therefore also known as spoken language translation (SLT).
Spoken language translation involves multiple disciplines and technologies, including automatic speech recognition, machine translation, and speech synthesis, and is therefore of great research significance. In the early 1990s, accompanying the steady improvement of speech recognition technology, spoken language translation gradually flourished. As its core technologies developed, spoken language translation came to be defined as high-quality translation within a given domain, with the goal of enabling communication across languages. Research in spoken language translation has focused on the following aspects: 1) mining dialogue structure in interactive speech translation; 2) studying the differences between written and spoken language; and 3) system performance and robustness. In recent years, with the establishment of statistical frameworks in speech recognition, more and more systems in machine translation and spoken language translation have adopted statistical modeling. Traditional spoken translation systems, limited by the technology of their time, could only be applied in constrained speech environments, and the main challenge for current technology is solving real-life spoken communication problems. Concrete application scenarios range from information integration for international conferences (including sports events) to travel information services, so from both a social and an economic perspective, spoken language translation (SLT) is highly relevant to our globalized world.
Spoken communication is the main channel of people's daily exchanges, and the conciseness of spoken expression and the breadth of its applicability are receiving more and more attention. Developing machine translation systems for spoken content, deployed on easy-to-use portable and embedded platforms, is an important part of building practical machine translation systems. However, certain characteristics of spoken language make the development of a practical spoken translation system difficult. The main difficulties of spoken language translation are: 1) in real scenarios such as spoken dialogue and Internet chat, input sentences often lack standardization, making it hard to capture their underlying syntactic structure, so statistical translation results tend to be stiff and unstable; 2) statistical machine translation is data-driven and depends for its existence on bilingual data resources, yet in current data holdings, spoken-language bilingual corpora (Chinese-English) are quite scarce, so at present a spoken translation system relying purely on statistical methods cannot fully meet the broad demands of people's daily life; 3) unlike text translation, spoken language translation primarily serves communication between speakers of different languages, so its real-time requirements are high, and in particular optimizing the translation workflow in a networked environment is key to improving the user experience.
Summary of the invention
The present invention aims to provide a spoken language translation method that combines machine translation and human translation, which can solve the problems of purely machine-based spoken translation, such as stiff results and poor stability, and which, by continuously accumulating a large-scale sentence-level spoken translation database, improves the degree of automation of translation and accelerates the commercialization of spoken translation systems.
According to one aspect of the present invention, there is provided a spoken language translation method combining machine translation and human translation, comprising:
Step 1: recognizing a continuous speech passage and segmenting it into sentences to obtain input text in sentence units;
Step 2: searching a database according to the input text for a corresponding target sentence; if one exists, outputting the target sentence directly as speech, otherwise proceeding to step 3;
Step 3: translating the input text by machine translation to obtain a target sentence, and scoring the confidence of the target sentence;
Step 4: translating the input text by human translation to obtain a target sentence;
Step 5: assessing the translation confidence of the machine translation and the quality of the human translation, and using speech synthesis to generate prosody-adjustable speech output from the best-assessed translated target sentence.
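The five steps above can be sketched as a minimal pipeline. This is an illustrative sketch only: the recognizer, database, machine translation engine, human translation channel, and synthesizer are stand-in stubs, and all function names and the confidence threshold are assumptions, not details from the patent.

```python
# Minimal sketch of steps 1-5; every component here is a stub standing in
# for the real recognizer, database, MT engine, crowd channel, and TTS.

def segment(speech: str):
    """Step 1: split recognized speech into sentence-level input text."""
    return [s.strip() for s in speech.split(".") if s.strip()]

DATABASE = {"hello": "bonjour"}  # human-curated sentence pairs (step 2)

def machine_translate(text):
    """Step 3: stub MT returning (target sentence, confidence score)."""
    return f"<mt:{text}>", 0.4

def human_translate(text):
    """Step 4: stub crowdsourced human translation."""
    return f"<human:{text}>"

def translate(speech, human_quality=0.9):
    outputs = []
    for sent in segment(speech):                 # step 1
        hit = DATABASE.get(sent)                 # step 2: database lookup
        if hit is not None:
            outputs.append(hit)
            continue
        mt, conf = machine_translate(sent)       # step 3
        human = human_translate(sent)            # step 4
        # step 5: keep the better-assessed result; would then go to TTS
        outputs.append(mt if conf > human_quality else human)
    return outputs

print(translate("hello. how are you."))
```

In a real deployment the returned sentences would be fed to the prosody-adjustable speech synthesizer rather than printed.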
According to another aspect of the present invention, there is disclosed a spoken language translation device combining machine translation and human translation, comprising:
a speech recognition and segmentation module, which recognizes a continuous speech passage and segments it into sentences to obtain input text in sentence units;
a template retrieval and replacement module, which searches a database according to the input text for a corresponding target sentence; if one exists, the target sentence is output directly as speech, otherwise the first translation module is entered;
a hierarchical-phrase-based machine translation module, which translates the input text by machine translation to obtain a target sentence and scores the confidence of the target sentence;
an intelligent human crowdsourcing translation module, which translates the input text by human translation to obtain a target sentence;
a quality assessment module, which performs assessment according to the translation confidence of the machine translation and the quality of the human translation, and decides the final translation result;
a speech synthesis output module, which uses speech synthesis to generate prosody-adjustable speech output from the final translation result determined by the quality assessment module.
In the scheme proposed by the present invention, the results of the different translation methods are delivered to the user uniformly through speech synthesis, meeting the real-time demands of spoken communication. To convey the source-language user's emotion vividly, the method recognizes the user's emotion and presents it to the target-language user graphically or through synthesis, improving the experience in the user's application scenario.
By segmenting passages into sentences, the present invention enables high-quality human translation corpora to be stored and effectively reused in sentence units, overcoming the problem that traditional free-form human translation data are difficult for statistical translation methods to exploit; by rendering the results of the different translation methods as stable, emotion-bearing synthesized speech, it can well solve the real-time communication problem in speech translation; and it can solve the spoken translation problems of massive numbers of mobile users in scenarios such as daily life, tourism, study, and simple dialogue, accelerating the commercialization of automatic spoken translation technology, which is of great practical significance.
Brief description of the drawings
Fig. 1 is a flowchart of the spoken language translation method combining machine translation and human translation proposed by the present invention;
Fig. 2 is a schematic diagram of sentence segmentation performed by the present invention on the character string "ABCDEFGH";
Fig. 3 is a user growth curve predicted by the present invention for the next two years;
Fig. 4 is a flow diagram of the translation confidence assessment method in the present invention;
Fig. 5 is a schematic diagram of the spoken language translation device combining machine translation and human translation proposed by the present invention.
Embodiment
To make the objects, technical solutions, and advantages of the present invention clearer, the present invention is described in further detail below in conjunction with specific embodiments and with reference to the accompanying drawings.
Referring to Fig. 1, according to the present invention, the flow of the spoken language translation method in which machine translation and human crowdsourced translation are combined is as follows:
S11: speech in the source language is input through a terminal device; the continuous input speech is segmented into sentences, centered on automatic recognition of the speech and automatic insertion of punctuation marks, to obtain the sentence-level input text required for translation;
S12: after each text segment enters the system, it is first retrieved from the translation database; if the match succeeds, the retrieved translated text is output and the flow jumps to step S17;
S13: if no corresponding data can be retrieved, an intelligent management module is started, which computes the complexity of the input sentence and, together with the user's tier, decides whether to enable human translation; the data are simultaneously sent to the machine translation module;
S14: a hierarchical-phrase-based statistical machine translation method is applied, and translation confidence is assessed by fusing the parameter training results of forced alignment;
S15: if human translation is enabled, crowdsourced human translation is used, with a visual terminal allowing translators to complete the text translation sentence by sentence in the background;
S16: according to the translation confidence, the machine translation result and the human translation result are comprehensively assessed, and the better translated text is output;
S17: the output translated text is rendered as speech output of consistent quality, which can vividly convey the user's emotion and well satisfy the practical demand for face-to-face real-time translation.
The above method steps are described in detail below through a second embodiment.
Sentence segmentation of continuous speech in step S11
In online continuous speech recognition, speech flows into the recognition module without interruption. Although the feature transformation and the per-frame posterior probabilities can be computed in real time, the decoder and memory cannot cope with overly long speech data. Segmentation of the speech stream is therefore crucial in online continuous speech recognition: it not only guarantees stable system operation but also has a vital effect on recognition performance. Traditional online continuous speech recognition segments according to the length of silences in the speech. Such segmentation works reasonably well for read or news speech, but its effectiveness drops sharply for natural speech. In natural speech, pauses within a single sentence are ubiquitous, and silence-based segmentation will cut wherever there is a pause, causing recognition errors in both of the speech fragments around the pause. Moreover, in natural speech, emotional fluctuation often causes many sentences to be spoken in a continuous stream with not even a pause between them; silence-based segmentation has no way to detect such boundaries, which in turn degrades recognition.
Our online continuous speech recognition adopts a segmentation method centered on recognition accuracy: segmentation is performed under the precondition that recognition accuracy is not sacrificed. Such a segmentation method can automatically decide sentence boundaries in natural speech.
Fig. 2 shows a schematic diagram of sentence segmentation performed by the present invention on the character string "ABCDEFGH". As shown in Fig. 2, suppose that for a segment of speech, searching according to the acoustic model and language model scores yields the result ABCDEFG. The next character is H, and there are two cases to consider for whether to place a segmentation boundary, as shown in the figure below.
In the first case, a boundary is placed here, i.e. the upper path in the figure, and the language model probability is:

P_1 = P_h × P(w_{n+1} = </s> | w_{n−x}^{n}) × P(w_1 = H | <s>)   (1)

where w = w_1 w_2 … w_n denotes a character string of length n, and x denotes an x-gram language model. In this example the string is ABCDEFG, with length 7. P_h is the history probability of the recognized string ABCDEFG, which is the common initial state of the two cases. P(w_{n+1} = </s> | w_{n−x}^{n}) is the probability that the last character G of the recognized string is the end of a sentence. P(w_1 = H | <s>) is the probability of the first unknown character H being the beginning of a new sentence.
In the second case, no boundary is placed here, i.e. the lower path in the figure, and the language model probability is:

P_2 = P_h × P(w_{n+1} = H | w_{n−x}^{n})   (2)

where P_h is the same as in formula (1), and P(w_{n+1} = H | w_{n−x}^{n}) is the probability that character H follows character G.
Comparing the language model probabilities of the two cases: if P_1 > P_2, we conclude that a boundary should be placed here and a new sentence starts from H; otherwise the sentence has not yet ended, H is appended after G to form the new string ABCDEFGH, and the following character I is then processed in the same way. This method not only eliminates the false alarms caused by pauses within a sentence but also avoids misses in continuous speech without pauses, so recognition accuracy is maximized. In addition, this segmentation method does not rely on silence, thereby also avoiding the erroneous cuts caused by inaccurate silence detection.
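The comparison of formulas (1) and (2) can be sketched directly in code. The log-probabilities below are toy values assumed purely for illustration, not trained n-gram parameters; only the decision rule (segment iff P_1 > P_2) reflects the method described above.

```python
import math

# Toy log-probabilities standing in for a real n-gram language model.
# All values here are assumptions for illustration, not trained parameters.
LOGPROB = {
    ("G", "</s>"): math.log(0.30),  # P(sentence ends after G)
    ("<s>", "H"): math.log(0.10),   # P(H starts a new sentence)
    ("G", "H"): math.log(0.02),     # P(H directly follows G)
    ("A", "</s>"): math.log(0.10),
    ("<s>", "B"): math.log(0.10),
    ("A", "B"): math.log(0.50),     # B strongly continues after A
}

def should_segment(history_logprob: float, prev: str, nxt: str) -> bool:
    """Decide whether to place a sentence boundary between prev and nxt.

    Case 1 (segment):   P1 = P_h * P(</s> | prev) * P(nxt | <s>)
    Case 2 (continue):  P2 = P_h * P(nxt | prev)
    Segment iff P1 > P2; P_h cancels, but it is kept to mirror (1) and (2).
    """
    p1 = history_logprob + LOGPROB[(prev, "</s>")] + LOGPROB[("<s>", nxt)]
    p2 = history_logprob + LOGPROB[(prev, nxt)]
    return p1 > p2

# With the toy numbers: 0.30 * 0.10 = 0.03 > 0.02, so segment before H;
# but 0.10 * 0.10 = 0.01 < 0.50, so B continues the sentence after A.
print(should_segment(math.log(0.5), "G", "H"))
print(should_segment(math.log(0.5), "A", "B"))
```

Working in log space mirrors how decoders avoid underflow when multiplying many small probabilities.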
Large-scale translation database retrieval method in step S12
This translation database stores high-quality translation results produced by human translation or human proofreading. The database performs two tasks in the system:
After a text segment has been produced by sentence segmentation, it is first retrieved from the translation database; if the match succeeds, the result is returned immediately. Matching here is realized by computing the similarity between sentences. Sentence similarity refers to the degree of semantic match between two sentences, a real number in [0, 1]: the larger the value, the more similar the two sentences. A value of 1 indicates that the two sentences are semantically identical; smaller values indicate lower similarity, and a value of 0 indicates that the two sentences are semantically entirely different. Here we mainly adopt a TF-IDF method based on the vector space model, fused with semantic information, to compute sentence similarity. Because spoken language lacks well-formed information such as syntactic structure, it is difficult to capture the syntactic-semantic structure of the input text. The semantic information here therefore mainly comprises shallow semantic information such as time expressions, numbers, and named entities, on the basis of which replacement generates translation templates against which the input text is matched.
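The vector-space TF-IDF similarity described above can be sketched as follows. This is a minimal illustration with an assumed smoothed IDF formula and a tiny toy corpus; a real system would add the shallow semantic features (times, numbers, named entities) as extra dimensions and would normalize them via the translation templates.

```python
import math
from collections import Counter

def cosine(u, v):
    """Cosine similarity between two sparse term-weight vectors."""
    dot = sum(u[t] * v.get(t, 0.0) for t in u)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def tfidf(sentence, corpus):
    """TF-IDF vector of a tokenized sentence against a corpus of sentences.
    The add-one smoothing in the IDF term is an illustrative assumption."""
    n = len(corpus)
    tf = Counter(sentence)
    return {
        t: tf[t] * math.log((1 + n) / (1 + sum(t in s for s in corpus)))
        for t in tf
    }

corpus = [
    ["where", "is", "the", "train", "station"],
    ["where", "is", "the", "bus", "stop"],
    ["how", "much", "does", "this", "cost"],
]
query = ["where", "is", "the", "train", "station"]
scores = [cosine(tfidf(query, corpus), tfidf(s, corpus)) for s in corpus]
best = max(range(len(corpus)), key=lambda i: scores[i])
print(best, round(scores[best], 3))
```

An exact database hit scores 1.0 and would be returned immediately, as in step S12; a score of 0 (no shared terms) falls through to machine and human translation.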
Because a statistical machine translation system is corpus-driven, its performance is tightly interwoven with the corpora it is fed. In particular, our translation database collects real-time, accurate on-line corpora, and for the machine translation system, as this data grows in scale and completeness, system performance will be continuously optimized.
Intelligent coordination method in step S13
If the input sentence is not retrieved from the translation database, the system starts the intelligent coordination module, which, according to the actual user's tier and the computed complexity of the input text, decides whether a human translator should complete the translation.
Users are divided by the fees they pay into three tiers: free users, ordinary users (who pay a small fee), and advanced users (who place higher demands on the system's accuracy).
Fig. 3 shows the user growth curve that this system can predict for the next two years. As can be seen from Fig. 3, in the initial stage of development the system mainly takes in free users, with building and refining the intelligent transcription platform as the priority; as the number of users grows, combined with the platform's capacity for self-optimization and updating, more and more paying users will be attracted. In addition, the intelligent management module controls the addition of the better translation results to the database, ensuring that the database is continuously updated. Once the database has grown to a certain scale, this corpus can be added to the statistical translation system to update the translation model and the language model, so that the whole translation system keeps improving as real-time corpora are injected. Therefore, as the scale of data handled by the system grows, the proportion of translation tasks that the automatic modules, such as database retrieval and machine translation, can complete increases step by step, which can greatly reduce the cost of human translation and improve system efficiency.
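The coordination decision in step S13 can be sketched as a simple policy over sentence complexity and user tier. The threshold values and tier names below are illustrative assumptions; the patent only states that complexity and user grade jointly determine whether human translation is enabled.

```python
def dispatch(complexity: float, user_tier: str) -> str:
    """Decide whether to enable human translation for one input sentence.

    Sketch of the intelligent coordination step: thresholds and tier
    names are illustrative assumptions, not values from the patent.
    """
    thresholds = {"free": 1.0, "ordinary": 0.6, "advanced": 0.3}
    # Higher-paying tiers get human translation at lower complexity.
    if complexity >= thresholds[user_tier]:
        return "machine+human"   # run both, keep the better result
    return "machine"             # machine translation alone suffices

print(dispatch(0.8, "free"))
print(dispatch(0.8, "advanced"))
```

Under this policy the same sentence (complexity 0.8) is handled by machine translation alone for a free user but routed to both channels for an advanced user.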
In step S14, a hierarchical-phrase-based statistical machine translation method is applied, and translation confidence is assessed by fusing the parameter training results of forced alignment.
For spoken language translation, a hierarchical phrase statistical machine translation method based on synchronous context-free grammar (SCFG) is used. The basic idea is to extract a massive number of aligned phrase fragments from a large-scale bilingual corpus and store them on the medium as a knowledge-source "memory". An efficient search algorithm matches phrase fragments against the input text produced by speech recognition and combines them into the sentence to be translated. A confidence computation method is also established that fuses parameter training based on the forced-alignment model, focusing on the translation uncertainty introduced by the translation probability features, and finally a confidence score is generated for each target-language translation result.
The hierarchical phrase statistical machine translation model is built on synchronous context-free grammar (SCFG) and belongs to the category of formal grammars. A hierarchical-phrase-based statistical translation system has the general characteristics of statistical machine translation based on formal grammar: it uses the structure of the formal grammar, so the translation process is hierarchical and structured. It can therefore turn local reordering into long-distance reordering within the sentence, and can introduce variables into the tree structure to address problems such as generalization. Meanwhile, compared with statistical translation systems that introduce a linguistic grammar, formal grammar has the advantages of requiring no extra syntactic analysis resources and of being fully compatible with phrase-based systems. In particular, in real scenarios such as spoken dialogue and Internet chat, input sentences are often non-standard and far removed from genuine written sentences, which greatly degrades the performance of syntactic analysis. If a statistical machine translation system based on a linguistic grammar were built in such an environment, its performance would suffer badly from the poor syntactic analysis. A statistical machine translation system based on formal grammar, however, is not greatly affected in such a language environment, because the formal grammar model retains the advantages of the phrase model in this respect. The hierarchical phrase translation method with confidence-scored translation output is introduced below:
1) Hierarchical phrase translation
Under the hierarchical-phrase-based translation paradigm, the task of the decoder is to find the optimal target-language string satisfying

ê = e( argmax_{D s.t. f(D) = f} P(D) )

where e denotes the target-language string and D denotes the derivation process that generates the target string through a series of rule applications. That is, a synchronous grammar derivation D generating the source-language string is found, and the target-language string generated at the target side of the synchronous grammar in D is the final translation. Note that what is found here is the derivation of maximum probability rather than the target-language string of maximum probability, because multiple derivations may generate the same target-language string, so finding the target-language string of maximum probability is computationally more expensive.
According to the log-linear model, the probability P(D) can be regarded as a log-linear combination of multiple features:

P(D) ∝ ∏_i Φ_i(D)^{λ_i}

where Φ_i and λ_i are the i-th feature function and its corresponding weight. Apart from the language model feature P_LM, the other features used can be written in the form of a product over rules:

Φ_i(D) = ∏_{(X → ⟨γ, α⟩) ∈ D} φ_i(X → ⟨γ, α⟩)

where X → ⟨γ, α⟩ denotes the context-free rules used in the derivation of the hierarchical phrase system, so P(D) can be rewritten as:

P(D) ∝ P_LM(e)^{λ_LM} × ∏_{i ≠ LM} ∏_{(X → ⟨γ, α⟩) ∈ D} Φ_i(X → ⟨γ, α⟩)^{λ_i}
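In practice the product above is evaluated in log space, where the weighted product becomes a weighted sum. The sketch below scores a toy derivation this way; the feature names (p_f2e, p_e2f) and all weight values are illustrative assumptions, not parameters from the patent.

```python
import math

def loglinear_score(lm_logprob, rule_features, weights):
    """Score a derivation D under the log-linear model, in log space:

        log P(D) = lambda_LM * log P_LM(e)
                 + sum_{i != LM} lambda_i * sum_{rules in D} log phi_i(rule)

    Feature names and weight values are illustrative assumptions.
    """
    score = weights["lm"] * lm_logprob
    for name, w in weights.items():
        if name == "lm":
            continue
        score += w * sum(math.log(rule[name]) for rule in rule_features)
    return score

# Two rules used in a toy derivation, each with two translation features.
rules = [
    {"p_f2e": 0.5, "p_e2f": 0.4},
    {"p_f2e": 0.8, "p_e2f": 0.6},
]
weights = {"lm": 0.5, "p_f2e": 1.0, "p_e2f": 0.3}
score = loglinear_score(math.log(0.01), rules, weights)
print(round(score, 4))
```

Because the normalization constant of P(D) is shared by all derivations of the same input, comparing these unnormalized log scores is sufficient for the argmax search.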
Traditional statistical translation systems such as Pharaoh adopt linear decoding algorithms such as beam search and A*; although they can incorporate a simple reordering model and an N-gram language model, the word order of their translation results is always less than satisfactory. A translation system based on hierarchical phrases, however, integrates tree structural information at the front end, so the back-end decoding algorithm must correspondingly use tree-based parsing algorithms. The present invention therefore draws on chart-based syntactic analysis methods to implement a CYK-style decoder.
The CYK algorithm is an improved bottom-up shift-reduce algorithm. A large number of hypotheses are produced during decoding; to avoid searching all possibilities, we adopt a stack structure and apply different strategies for the necessary pruning during the search. An addEdge function is established for each edge that is to enter the chart: it first checks whether the edge survives threshold pruning; if so, it checks whether the edge can be recombined (recombination) with an existing one; finally it checks whether the edge passes histogram pruning. Only an edge that has successfully passed all three kinds of pruning is finally stored in the chart.
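The three-stage addEdge check can be sketched as follows. The function signature, the score representation, and the specific threshold and beam values are assumptions for illustration; only the order of the checks (threshold pruning, recombination, histogram pruning) follows the description above.

```python
def make_add_edge(beam_size=3, threshold=0.5):
    """Sketch of the addEdge chart-insertion check: threshold pruning,
    hypothesis recombination, then histogram pruning."""
    chart = {}  # state signature -> (score, edge)

    def add_edge(signature, score, best_score, edge):
        # 1) threshold pruning: drop edges far below the best in the cell
        if score < best_score * threshold:
            return False
        # 2) recombination: keep only the best edge per state signature
        if signature in chart and chart[signature][0] >= score:
            return False
        chart[signature] = (score, edge)
        # 3) histogram pruning: cap the number of edges kept in the cell
        if len(chart) > beam_size:
            worst = min(chart, key=lambda k: chart[k][0])
            del chart[worst]
        return signature in chart

    return add_edge, chart

add_edge, chart = make_add_edge(beam_size=2)
add_edge("A", 1.0, 1.0, "edge-a")
add_edge("B", 0.9, 1.0, "edge-b")
add_edge("C", 0.2, 1.0, "edge-c")   # fails threshold pruning
add_edge("A", 0.8, 1.0, "edge-a2")  # recombined away (worse than 1.0)
print(sorted(chart))
```

Keeping only one edge per state signature is what makes recombination safe: any continuation of the worse edge is dominated by the same continuation of the better one.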
2) Translation confidence assessment fused with parameter training based on forced alignment
In the ordinary sense, confidence is a measure for evaluating the probability that something is correct; it reflects the degree of reliability with which a particular event occurs. In machine translation, it expresses an assessment of the translation result of a given input without a reference translation available in advance. In this system, we not only use information external to the translation system, such as the perplexity and length of the source and target languages, but also fuse in the various kinds of information produced during translation, to better simulate a human's scoring of the translation result.
Fig. 4 shows the flow of the translation confidence assessment method in the present invention. As shown in Fig. 4, in the translation confidence assessment method based on forced-alignment parameter training, forced alignment is first used to complete the parameter training of the model, and the probabilities and other information from the alignment process are retained; meanwhile, knowledge such as the perplexity and length of the source string S and the target string T is incorporated into the confidence assessment as features. Through machine learning, the various kinds of information can be fused into a unified framework that approximates the objective function as closely as possible. In this system we use support vector regression (SVR) as the machine learning tool, because the objective function of the SVR we propose can well approximate human scoring of the translation result.
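The feature-fusion step can be sketched without the learning machinery. The patent trains an SVR to map the fused features to a score; in the sketch below a fixed linear model squashed by a sigmoid stands in for the trained regressor, and the feature names and all weight values are illustrative assumptions.

```python
import math

def confidence(features, weights, bias=0.0):
    """Fuse heterogeneous features into a single confidence score in (0, 1).

    A trained SVR plays this role in the patent; here a fixed linear model
    plus sigmoid stands in, with all weight values assumed for illustration.
    """
    z = bias + sum(weights[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))

features = {
    "align_logprob": -2.0,   # from forced-alignment parameter training
    "src_perplexity": 40.0,  # perplexity of the source string S
    "len_ratio": 1.1,        # |T| / |S| length ratio
}
weights = {"align_logprob": 0.8, "src_perplexity": -0.02, "len_ratio": 0.5}
score = confidence(features, weights, bias=2.0)
print(0.0 < score < 1.0)
```

In the full system this score is what step S16 compares against the human translation's assessed quality when choosing the final output.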
Human crowdsourced translation method in step S15:
1) Crowdsourced human translation
Crowdsourcing refers to the practice of a company outsourcing tasks, in a free and voluntary form, to an unspecified crowd on the network; it is an Internet-based mode of social collaboration that can be embedded in every link of business operations, forming a new organizational form. The concept was proposed by the well-known American IT magazine Wired, and has been called by the industry another important business concept after "The Long Tail". In the link of organizing human translation, this invention adopts the crowdsourcing model, organizing individuals with some spoken translation ability and harnessing collective wisdom to create something socially new that benefits everyone.
Although the concept of crowdsourced translation has already been proposed, and websites such as Yeeyan, Hupu, and Guokr run article and book translation communities in the crowdsourcing model, the main translation objects of these community-type online media are real-time textual content on the network, such as current political news and sports events; suitable source texts are usually found by senior forum members and website staff, who also manage the "translation team" members' levels and division of labor. The crowdsourced translation in this invention, by contrast, is oriented to the spoken content exchanged in daily life, which has already undergone sentence segmentation. Compared with the crowdsourced translation of written texts such as news, the spoken crowdsourced translation proposed by this invention therefore has the following advantages:
The content that spoken message is expressed, taking communication exchange in actual life as object, therefore needs more the translation result of " hommization ".Mass-rent translation can utilize the strength of colony, not only reduced the cost of research and development, operation, manpower, also allows the translation result demand of being more close to the users.
The content that spoken message is expressed relates to the many aspects such as people's clothing, food, shelter, row, uses mass-rent to translate here, can converge the more amateur translation talent, fully demonstrates the advantage of colony's wisdom.
In visual terminal, what interpreter faced is the text message after fragmentation.Conventionally, interpreter is good to good problem is set and offers suggestions, and for problem targetedly provides translation answer, and therefore such setting is more conducive to that interpreter carries out in real time, translation efficiently.
The fragmented corpus is presented on the mobile terminal (iPad, mobile phone) for the interpreter to translate and edit; finally, the interpreter can enter the chosen translation result into the terminal by voice, achieving the goal of real-time translation.
2) interpreter terminal
By using a visual terminal so that interpreters can complete sentence-by-sentence text translation in the background, this invention lowers the difficulty of interpreting, reduces the cost of human translation, and still meets the demand for real-time translation. This is mainly reflected in the following aspects:
In the background, the fragmented text is presented directly on the visual terminal; what the interpreter faces is no longer long, continuous speech, but fragmented, visualized text. On the one hand, the fragmented text takes the sentence as its unit, so the interpreter no longer needs to memorize several sentences or even a whole paragraph, which reduces the demand on the interpreter's ability to comprehend and summarize. On the other hand, since the input is converted into visual text, the interpreter can quickly and conveniently translate the text into the target language. This working mode relaxes the requirements on the interpreter's abilities, so that "amateur" translators can also be brought in quickly, greatly widening the applicable range of the system.
The words needing human translation are highlighted by setting different states for the current task; for example, different colors indicate on the terminal whether the data has been retrieved, whether it has been machine translated, and whether it needs human translation. When human translation is needed, highlighting distinguishes the item from the other states. This method of highlighting particular states allows the interpreter to complete sentence-by-sentence translation quickly and conveniently, lowering the difficulty of interpreting and reducing the cost of human translation.
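The state-dependent highlighting described above can be modeled as a small state enum mapped to display colours, with sentences awaiting human translation sorted to the front of the interpreter's queue. This is only a sketch: the state names, colours and ordering rule below are illustrative assumptions, not taken from the patent.

```python
from enum import Enum

class SentenceState(Enum):
    RETRIEVED = "green"            # found in the translation database
    MACHINE_TRANSLATED = "blue"    # already machine translated
    NEEDS_HUMAN = "yellow"         # highlighted for the interpreter

def display_colour(state):
    """Colour used to render a sentence on the interpreter terminal."""
    return state.value

def worklist(items):
    """Order (sentence, state) pairs so highlighted NEEDS_HUMAN items
    appear first in the interpreter's queue (stable for the rest)."""
    return sorted(items, key=lambda pair: pair[1] is not SentenceState.NEEDS_HUMAN)
```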
Finally, thanks to our high-quality speech recognition system, the interpreter can enter the selected translation result into the terminal by voice, where the embedded speech recognition module automatically converts it into text. There is no longer any need to type the translation result word by word, which saves manual labor time and improves working efficiency.
In step S16, the method for comprehensively assessing automatic translation and human translation is as follows:
After automatic translation and human translation are completed, all results are fed into this module for quality evaluation and combined output. Based on the confidence score of the automatic translation, the grade of each interpreter, and a computation method modeled on the automatic evaluation metric BLEU, the module comprehensively assesses the results of the different translation methods and returns the best result to the speech synthesis module.
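One way to read the S16 assessment is: each candidate gets a score that blends its own confidence (the MT confidence, or a proxy derived from the interpreter's grade) with a BLEU-inspired n-gram agreement against the other candidates. The sketch below is an illustrative reconstruction under that reading; the 0.6/0.4 weights and the unigram agreement are invented, and real BLEU also applies a brevity penalty, which is omitted here.

```python
from collections import Counter

def ngram_precision(hyp, refs, n=1):
    """Modified n-gram precision of hyp against reference strings,
    in the spirit of BLEU (brevity penalty omitted)."""
    toks = hyp.split()
    grams = Counter(tuple(toks[i:i + n]) for i in range(len(toks) - n + 1))
    if not grams:
        return 0.0
    max_ref = Counter()
    for ref in refs:
        rt = ref.split()
        rc = Counter(tuple(rt[i:i + n]) for i in range(len(rt) - n + 1))
        for g, c in rc.items():
            max_ref[g] = max(max_ref[g], c)
    clipped = sum(min(c, max_ref[g]) for g, c in grams.items())
    return clipped / sum(grams.values())

def pick_best(candidates):
    """candidates: (text, confidence) pairs from MT and human translators.
    Score blends own confidence with agreement with the other candidates."""
    best_text, best_score = None, -1.0
    for i, (text, conf) in enumerate(candidates):
        peers = [t for j, (t, _) in enumerate(candidates) if j != i]
        score = 0.6 * conf + 0.4 * ngram_precision(text, peers)
        if score > best_score:
            best_text, best_score = text, score
    return best_text
```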
Fig. 5 shows a simulation diagram of the spoken translation unit proposed by the present invention, in which machine translation and human translation are fused. As shown in Fig. 5:
S11: The source-language speech is input through a terminal device (such as a mobile phone); the continuous speech is segmented into sentences, with automatic speech recognition and automatic insertion of punctuation marks at the core, to obtain the sentence-level input text required for translation;
S12: After each text corpus entry enters the system, the translation database is searched first. If the match succeeds, the retrieved translated text is output; after quality evaluation, the speech synthesis result of the translated text is output directly;
S13: If no corresponding data can be retrieved, the intelligent management module is started. It calculates the complexity of the input sentence and, combined with the user's grade, decides whether to enable human translation; at the same time the data is sent to the machine translation module;
S14: The statistical machine translation method based on hierarchical phrases is applied, and the parameter training results of forced alignment are fused to assess translation confidence;
S15: If human translation is enabled, the crowdsourced human translation method is used, with a visual terminal (iPad or another computer) on which interpreters complete sentence-by-sentence text translation in the background;
S16: According to the translation confidence, the machine translation result and the human translation result are comprehensively assessed, and the better translated text is output;
S17: The output translated text is converted into stable-quality speech output, whose expressiveness can well meet the practical demand for face-to-face real-time translation.
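The routing in steps S11–S17 can be sketched as a single control-flow function. This is a minimal illustration, not the patent's implementation: `estimate_complexity`, `machine_translate` and `crowdsource_translate` are toy stand-ins, and the 0.5 complexity threshold and 0.8 confidence cutoff are invented.

```python
def estimate_complexity(sentence):
    """Toy complexity proxy: longer sentences count as more complex (S13)."""
    return min(len(sentence.split()) / 20.0, 1.0)

def machine_translate(sentence):
    """Placeholder MT engine returning a translation and a confidence (S14)."""
    return "MT(%s)" % sentence, 0.6

def crowdsource_translate(sentence):
    """Placeholder for the crowdsourced human translation of S15."""
    return "HUMAN(%s)" % sentence

def translate_utterance(sentence, database, complexity_threshold=0.5):
    """Route one fragmented sentence through retrieval, machine translation
    and, when warranted, human translation (S12-S16)."""
    # S12: exact retrieval from the translation database comes first
    hit = database.get(sentence)
    if hit is not None:
        return hit, "retrieval"

    # S13: enable human translation only for sufficiently complex input
    use_human = estimate_complexity(sentence) > complexity_threshold

    # S14: machine translation always runs, with a confidence score
    mt_result, confidence = machine_translate(sentence)

    # S15/S16: prefer the human result when MT confidence is low
    if use_human and confidence < 0.8:
        return crowdsource_translate(sentence), "human"
    return mt_result, "machine"
```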
The invention also proposes a speech recognition and spoken translation conversion device, which comprises:
A speech recognition and fragmentation module, which performs automatic speech recognition and automatic punctuation insertion on continuous speech to form sentence-level text to be translated;
A template retrieval and replacement module over a large-scale translation database, which adopts a TF-IDF method based on the vector space model fused with semantic information to search the large-scale database of translation example sentences, and uses shallow syntactic information to perform partial replacement of the translation data, strengthening the generalization capability of the database;
An intelligent management module, whose main function is to calculate the complexity of the input sentence and, combined with the user's grade, decide whether to enable human translation, while the data is simultaneously sent to the machine translation module. It also continuously collects and organizes the results of crowdsourced human translation, forming a high-quality translation database that provides efficient retrieval services and supplies updated, optimized high-quality bilingual corpora for the machine translation model. As the translation database accumulates and the machine translation system improves, the proportion of data retrieval and machine translation is progressively increased, realizing dynamic updating and optimization of the whole translation system;
A machine translation module based on hierarchical phrases, comprising three parts: a translation model, a language model and a decoder. Its basic idea is to extract massive aligned phrase fragments from a large-scale bilingual corpus and store them on a storage medium as a knowledge-source "memory". For the Chinese sentence entered by the user, an efficient search algorithm matches phrase fragments and combines them into the English sentence to be translated. On this basis, the present invention also establishes a confidence calculation method that fuses parameter training based on forced alignment, mainly studying the translation uncertainty brought by translation probability characteristics and finally generating a confidence score for the translation result of each target sentence;
An intelligent crowdsourced human translation module, which adopts the crowdsourcing model to attract translators who have spoken-language ability and are willing to turn it into market value, forming a service-oriented community of such talent, and which subcontracts tasks through an intelligent platform to provide instant human interpretation services. The fragmented corpus is presented on mobile terminals (iPad, mobile phone) for interpreters to translate, edit, evaluate and return. The visual terminal, which allows interpreters to complete sentence-by-sentence text translation in real time, provides the following: in the background, fragmented text is presented directly on the visual terminal, so the interpreter no longer directly faces continuous speech and text, which to a great extent reduces the specific demands on the interpreter's listening, memory and summarizing abilities; the words needing human translation are highlighted by setting different states of the current task, so interpreters can complete sentence-by-sentence translation quickly and conveniently, lowering the difficulty and cost of human translation; finally, after finishing the translation, the interpreter no longer needs to type the text word by word, but can complete voice input through the terminal device, where the embedded speech recognition module automatically converts it into text and submits it to the system, greatly improving the interpreter's working efficiency;
A quality assessment module, which analyzes the database retrieval results and the confidence scores provided by the machine translation device, comprehensively judges each result under multi-faceted information such as the translation results fed back by crowdsourced human translation, the translation duration, the interpreter's grade and the user's grade, and outputs the final translation result;
A speech synthesis module, which outputs the translation result by means of speech synthesis. To vividly convey the source-language user's emotion, the method recognizes the user's emotion and presents it to the target-language user in graphic or synthesized form, improving the experience in the user's application scenario.
The specific embodiments described above further illustrate the objects, technical solutions and beneficial effects of the present invention. It should be understood that the foregoing are only specific embodiments of the invention and do not limit it; any modification, equivalent replacement or improvement made within the spirit and principles of the present invention shall be included within the protection scope of the present invention.

Claims (10)

1. A spoken language translation method in which machine translation and human translation are fused, characterized in that it comprises:
Step 1: recognizing a continuous speech paragraph and segmenting it into sentences to obtain sentence-level input text;
Step 2: performing a database search according to said input text to find whether a corresponding target sentence exists; if so, directly outputting the target sentence as speech; otherwise, going to step 3;
Step 3: translating said input text by machine translation to obtain a target sentence, and assigning said target sentence a confidence score;
Step 4: obtaining a target sentence of said input text by human translation;
Step 5: assessing quality according to the translation confidence of said machine translation and of the human translation, and using speech synthesis to convert the better-assessed target sentence into prosody-adjustable speech output.
2. The method according to claim 1, characterized in that the sentence segmentation in step 1, obtaining sentence-level input text, specifically comprises:
segmenting and punctuating the continuous speech, input in units of speech paragraphs, with prosody as the principal feature, combining automatic speech recognition and automatic punctuation insertion, under the precondition of not losing recognition accuracy.
3. The method according to claim 1, characterized in that step 2 specifically comprises:
for each input text, using a TF-IDF method based on the vector space model fused with semantic information to calculate the similarity between the input text and the sentences in the database, thereby obtaining the target sentence.
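The vector-space retrieval of claim 3 can be sketched with a plain TF-IDF cosine similarity over the example-sentence database. The semantic-information fusion and the exact weighting the patent uses are not specified here; the smoothing of the IDF term and the 0.8 acceptance threshold below are illustrative assumptions.

```python
import math
from collections import Counter

class TfidfRetriever:
    """Toy TF-IDF retriever over a bilingual example-sentence database."""

    def __init__(self, pairs):
        # pairs: list of (source_sentence, target_sentence)
        self.pairs = pairs
        docs = [src.split() for src, _ in pairs]
        self.n = len(docs)
        df = Counter()
        for d in docs:
            df.update(set(d))
        # smoothed IDF; unseen query terms fall back to the maximum IDF
        self.idf = {t: math.log((self.n + 1) / (c + 1)) + 1 for t, c in df.items()}
        self.vecs = [self._vec(d) for d in docs]

    def _vec(self, tokens):
        tf = Counter(tokens)
        fallback = math.log(self.n + 1) + 1
        return {t: c * self.idf.get(t, fallback) for t, c in tf.items()}

    @staticmethod
    def _cos(a, b):
        dot = sum(v * b.get(t, 0.0) for t, v in a.items())
        na = math.sqrt(sum(v * v for v in a.values()))
        nb = math.sqrt(sum(v * v for v in b.values()))
        return dot / (na * nb) if na and nb else 0.0

    def lookup(self, query, threshold=0.8):
        """Return the target sentence of the best match, or None when no
        database entry is similar enough (the system then falls back
        to machine/human translation)."""
        q = self._vec(query.split())
        scores = [self._cos(q, v) for v in self.vecs]
        best = max(range(self.n), key=scores.__getitem__)
        return self.pairs[best][1] if scores[best] >= threshold else None
```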
4. The method according to claim 1, characterized in that step 3 adopts a statistical machine translation system based on hierarchical phrases and outputs a target sentence with a confidence measure, the specific process comprising:
extracting massive aligned phrase fragments from a bilingual corpus and storing them on a storage medium as a knowledge-source "memory"; for the input text produced by speech recognition, using a search algorithm to match phrase fragments and combine them into the target sentence to be translated; and fusing a confidence calculation method based on forced-alignment model parameter training to generate a confidence score for each target sentence.
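The phrase-fragment matching in claim 4 can be illustrated with a greedy longest-match over a tiny invented phrase table. A real hierarchical phrase decoder also handles reordering, gapped rules and language-model scoring of competing derivations, all of which this sketch deliberately omits; the table entries are invented.

```python
def greedy_phrase_translate(sentence, table):
    """Cover the token sequence left-to-right with the longest phrase
    found in the table and concatenate the translations; words with no
    table entry pass through unchanged (toy OOV handling)."""
    tokens = sentence.split()
    out, i = [], 0
    max_len = max(len(k) for k in table)
    while i < len(tokens):
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            key = tuple(tokens[i:i + n])
            if key in table:
                out.append(table[key])
                i += n
                break
        else:
            out.append(tokens[i])  # out-of-vocabulary word passes through
            i += 1
    return " ".join(out)
```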
5. The method according to claim 1, characterized in that before step 4 is executed, a further judgment is needed: calculating the complexity of the input text and, combined with the user's grade, determining whether human translation is needed.
6. The method according to any one of claims 1-4, characterized in that in step 1 whether to insert a sentence break is judged according to the recognition probabilities of the language model, wherein the language-model recognition probability is the product of the probability that the recognized history string ends a sentence with its last character and the probability that the first not-yet-recognized string starts a sentence.
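The probability product in claim 6 can be sketched with a toy bigram language model: break between two words when P(history ends a sentence) × P(next word starts a sentence) beats the probability of simply continuing. The training sentences, add-one smoothing and decision rule below are illustrative assumptions, not the patent's model.

```python
from collections import Counter

class BigramLM:
    """Toy bigram language model with add-one smoothing."""

    def __init__(self, sentences):
        self.counts = {}
        for s in sentences:
            tokens = ["<s>"] + s.split() + ["</s>"]
            for a, b in zip(tokens, tokens[1:]):
                self.counts.setdefault(a, Counter())[b] += 1

    def prob(self, a, b):
        follows = self.counts.get(a, Counter())
        vocab = len(self.counts) + 1
        return (follows.get(b, 0) + 1) / (sum(follows.values()) + vocab)

def should_break(lm, prev_word, next_word):
    """Insert a sentence break when ending the history and starting a new
    sentence is more probable than continuing the current sentence."""
    p_break = lm.prob(prev_word, "</s>") * lm.prob("<s>", next_word)
    p_continue = lm.prob(prev_word, next_word)
    return p_break > p_continue
```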
7. The method according to claim 4, characterized in that the statistical translation based on hierarchical phrases specifically means finding, among the multiple derivations from the source-language string to the target string, the derivation with the maximum probability, and outputting the target string corresponding to this derivation as the machine translation result.
8. The method according to any one of claims 1 and 4, characterized in that in step 3 a support vector machine fuses and learns the perplexity of the source-language string and the length of the target string during translation, and then assigns a confidence score to the final target sentence.
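Claim 8 trains an SVM over perplexity and length features. The sketch below does not train an SVM; it evaluates a hand-set linear decision function w·x + b, which is the form a trained linear SVM applies at prediction time, and squashes it to (0, 1) to serve as a confidence score. All weights and the length-ratio feature are invented for illustration.

```python
import math

def confidence_score(perplexity, length_ratio,
                     w_ppl=-0.01, w_len=-1.0, bias=1.2):
    """Linear decision function over source-side perplexity and the
    target/source length ratio, squashed to (0, 1). High perplexity and
    an unusual length ratio both lower the confidence."""
    z = w_ppl * perplexity + w_len * abs(length_ratio - 1.0) + bias
    return 1.0 / (1.0 + math.exp(-z))
```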
9. A spoken language translation device in which machine translation and human translation are fused, characterized in that it comprises:
a speech recognition and fragmentation module, which recognizes continuous speech paragraphs and segments them into sentences to obtain sentence-level input text;
a template retrieval and replacement module, which performs a database search according to said input text to find whether a corresponding target sentence exists; if so, the target sentence is directly output as speech; otherwise the first translation module is entered;
a machine translation module based on hierarchical phrases, which translates said input text by machine translation to obtain a target sentence and assigns said target sentence a confidence score;
an intelligent crowdsourced human translation module, which obtains a target sentence of said input text by human translation;
a quality assessment module, which assesses quality according to the translation confidence of said machine translation and of the human translation, and outputs the final translation result;
a speech synthesis output module, which uses speech synthesis to convert the final translation result determined by the quality assessment module into prosody-adjustable speech output.
10. The device according to claim 9, characterized in that it further comprises:
an intelligent management module, which calculates the complexity of the input sentence and, combined with the user's grade, decides whether to enable human translation, while the data is simultaneously sent to the machine translation module based on hierarchical phrases.
CN201410090457.6A 2014-03-12 2014-03-12 Machine and human translation combined spoken language translation method and device Active CN104050160B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410090457.6A CN104050160B (en) 2014-03-12 2014-03-12 Machine and human translation combined spoken language translation method and device

Publications (2)

Publication Number Publication Date
CN104050160A true CN104050160A (en) 2014-09-17
CN104050160B CN104050160B (en) 2017-04-05

Family

ID=51503013

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410090457.6A Active CN104050160B (en) Machine and human translation combined spoken language translation method and device

Country Status (1)

Country Link
CN (1) CN104050160B (en)

Cited By (21)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104601834A (en) * 2014-12-19 2015-05-06 国家电网公司 Multilingual automatic speech calling and answering device and using method thereof
CN104731775A (en) * 2015-02-26 2015-06-24 北京捷通华声语音技术有限公司 Method and device for converting spoken languages to written languages
CN105389305A (en) * 2015-10-30 2016-03-09 北京奇艺世纪科技有限公司 Text recognition method and apparatus
CN105761201A (en) * 2016-02-02 2016-07-13 山东大学 Method for translation of characters in picture
CN106649283A (en) * 2015-11-02 2017-05-10 姚珍强 Interpretation device and method applied to mobile equipment and based on natural dialogue mode
CN107632980A (en) * 2017-08-03 2018-01-26 北京搜狗科技发展有限公司 Voice translation method and device, the device for voiced translation
CN108009138A (en) * 2017-12-25 2018-05-08 中译语通科技(青岛)有限公司 A kind of interactive system of corpus crowdsourcing alignment
CN108255857A (en) * 2016-12-29 2018-07-06 北京国双科技有限公司 A kind of sentence detection method and device
CN108573694A (en) * 2018-02-01 2018-09-25 北京百度网讯科技有限公司 Language material expansion and speech synthesis system construction method based on artificial intelligence and device
CN108962228A (en) * 2018-07-16 2018-12-07 北京百度网讯科技有限公司 model training method and device
CN109460558A (en) * 2018-12-06 2019-03-12 云知声(上海)智能科技有限公司 A kind of effect evaluation method of speech translation system
CN109670727A (en) * 2018-12-30 2019-04-23 湖南网数科技有限公司 A kind of participle mark quality evaluation system and appraisal procedure based on crowdsourcing
CN109754808A (en) * 2018-12-13 2019-05-14 平安科技(深圳)有限公司 Method, apparatus, computer equipment and the storage medium of voice conversion text
CN110175337A (en) * 2019-05-29 2019-08-27 科大讯飞股份有限公司 A kind of textual presentation method and device
CN110516063A (en) * 2019-07-11 2019-11-29 网宿科技股份有限公司 A kind of update method of service system, electronic equipment and readable storage medium storing program for executing
CN110720104A (en) * 2017-10-09 2020-01-21 华为技术有限公司 Voice information processing method and device and terminal
CN111178098A (en) * 2019-12-31 2020-05-19 苏州大学 Text translation method, device and equipment and computer readable storage medium
CN111444730A (en) * 2020-03-27 2020-07-24 新疆大学 Data enhancement Weihan machine translation system training method and device based on Transformer model
CN111581373A (en) * 2020-05-11 2020-08-25 武林强 Language self-help learning method and system based on conversation
CN111711853A (en) * 2020-06-09 2020-09-25 北京字节跳动网络技术有限公司 Information processing method, system, device, electronic equipment and storage medium
WO2021138898A1 (en) * 2020-01-10 2021-07-15 深圳市欢太科技有限公司 Speech recognition result detection method and apparatus, and storage medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101739867A (en) * 2008-11-19 2010-06-16 中国科学院自动化研究所 Method for scoring interpretation quality by using computer
WO2011094090A1 (en) * 2010-01-18 2011-08-04 Mobile Technologies, Llc Enhanced speech-to-speech translation system and methods
CN102214166A (en) * 2010-04-06 2011-10-12 三星电子(中国)研发中心 Machine translation system and machine translation method based on syntactic analysis and hierarchical model


Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
LIU PENG: "Interactive spoken language translation method with human-machine cooperation", Journal of Chinese Information Processing *
CHENG JIE: "Design, implementation and system evaluation of a speech translation dictionary", China Master's Theses Full-text Database, Information Science and Technology *
HU ZENGJIAN: "Research on the object-oriented classification pattern library of the Interactive Hybrid Strategies Machine Translation System (IHSMTS)", China Master's Theses Full-text Database, Information Science and Technology *


Also Published As

Publication number Publication date
CN104050160B (en) 2017-04-05


Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant