CN109408833A

CN109408833A - Translation method, apparatus, device, and readable storage medium

Info

Publication number: CN109408833A
Application number: CN201811276866.XA
Authority: CN
Inventors: 孔常青; 高建清; 刘俊华; 胡国平
Original assignee: iFlytek Co Ltd
Current assignee: iFlytek Co Ltd
Priority date: 2018-10-30
Filing date: 2018-10-30
Publication date: 2019-03-01
Also published as: WO2020087655A1

Abstract

This application discloses a translation method, apparatus, device, and readable storage medium. The method includes: after the source language text to be translated is obtained, further segmenting the source language text into sentences according to the current translation scenario, so that the segmented source language text better fits the current translation scenario. Compared with existing translation methods, this application adds a sentence-segmentation optimization step for the obtained source language text, re-segmenting it with the current translation scenario taken into account, so that its segmentation is better optimized. When the segmented source language text is then translated, the quality of the resulting target language text is correspondingly higher.

Description

Translation method, apparatus, device, and readable storage medium

Technical field

This application relates to the field of translation technology, and more specifically to a translation method, apparatus, device, and readable storage medium.

Background

Text translation is the process of translating a source language text to be translated into a target language text. The sentence segmentation of a source language text to be translated is often not standardized and depends on where the text comes from. For example, a source language text obtained by speech recognition is segmented mainly according to pauses in the speech, which are often shaped by the speaker's habits.

When the prior art performs machine translation on a source language text whose segmentation has not been optimized in this way, the quality of the machine translation suffers considerably.

Summary of the invention

In view of this, this application provides a translation method, apparatus, device, and readable storage medium, to solve the problem that the sentence segmentation of existing source language texts to be translated is not optimized, which leads to low machine translation quality.

To achieve the above objective, the following solutions are proposed:

A translation method, comprising:

Obtaining a source language text to be translated;

Segmenting the source language text into sentences according to the current translation scenario, to obtain a segmented source language text;

Translating the segmented source language text, to obtain a target language text.

Preferably, segmenting the source language text into sentences according to the current translation scenario, to obtain the segmented source language text, comprises:

Inputting the source language text into a preset text segmentation model, to obtain the segmented source language text output by the text segmentation model;

Wherein the text segmentation model is trained with source language training texts as training data and with the segmentation results of those source language training texts that fit the current translation scenario as training labels.

Preferably, the process of determining the text segmentation model comprises:

Obtaining a source language training text;

Determining the segmentation result of the source language training text that fits the current translation scenario, as a target segmentation result;

Training the text segmentation model with the source language training text as training data and the target segmentation result as the training label.

Preferably, determining the segmentation result of the source language training text that fits the current translation scenario, as the target segmentation result, comprises:

Obtaining a target language training text, which is the translation of the source language training text under the current translation scenario;

Modifying the segmentation of the source language training text according to a set segmentation modification scheme, to obtain modified source language training texts, the modified source language training texts and the source language training text together forming the candidate source language training texts;

Translating each candidate source language training text with a preset machine translation model, to obtain a machine translation result for each candidate source language training text;

Determining the similarity between the machine translation result of each candidate source language training text and the target language training text, and taking the candidate source language training text with the highest similarity as the target segmentation result.

Preferably, modifying the segmentation of the source language training text according to the set segmentation modification scheme, to obtain the modified source language training texts, comprises:

Determining the non-terminating punctuation marks contained in the source language training text;

Replacing each non-terminating punctuation mark contained in the source language training text with a terminating punctuation mark, to obtain the modified source language training texts.

Preferably, translating each candidate source language training text with the preset machine translation model, to obtain the machine translation result of each candidate source language training text, comprises:

Dividing each candidate source language training text into clauses at the terminating punctuation marks it contains, to obtain a clause sequence;

Translating each clause in the clause sequence of the candidate source language training text separately with the preset machine translation model, to obtain a machine translation result for each clause;

Merging the machine translation results of the clauses in the order of the clauses in the clause sequence, to obtain the machine translation result of the candidate source language training text.

Preferably, before training the text segmentation model with the source language training text as training data and the target segmentation result as the training label, the method further comprises:

Obtaining the result of manually annotating the segmentation of the source language training text, to obtain a manually annotated source language training text;

Training a text segmentation model with the source language training text as training data and the manually annotated source language training text as the training label, to obtain a preliminary text segmentation model;

In that case, training the text segmentation model with the source language training text as training data and the target segmentation result as the training label comprises:

Training the preliminary text segmentation model with the source language training text as training data and the target segmentation result as the training label.

Preferably, translating the segmented source language text, to obtain the target language text, comprises:

Dividing the segmented source language text into clauses at the terminating punctuation marks it contains, to obtain a clause sequence;

Translating each clause in the clause sequence of the segmented source language text separately with a preset machine translation model, to obtain a machine translation result for each clause;

Merging the machine translation results of the clauses in the order of the clauses in the clause sequence, to obtain the target language text.
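The clause-by-clause translation just described can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions: the set of terminating marks, the function names, and the use of a plain callable in place of the preset machine translation model are all hypothetical and are not specified by this application.

```python
from typing import Callable, List

# Assumed set of terminating punctuation marks (Chinese and ASCII forms).
# Note: treating "." as a terminator would also split decimals such as "3.14".
TERMINATORS = set("。？！.?!")

def split_clauses(text: str) -> List[str]:
    """Divide text into clauses, cutting after each terminating mark."""
    clauses, current = [], []
    for ch in text:
        current.append(ch)
        if ch in TERMINATORS:
            clauses.append("".join(current).strip())
            current = []
    tail = "".join(current).strip()
    if tail:  # text after the last terminating mark, if any
        clauses.append(tail)
    return [c for c in clauses if c]

def translate_by_clause(text: str, translate: Callable[[str], str]) -> str:
    """Translate each clause separately and merge the results in clause order."""
    return " ".join(translate(clause) for clause in split_clauses(text))
```

Here `translate` stands in for the machine translation model; any function mapping a clause to its translation can be passed in, and the merged output preserves the original clause order.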

A translation apparatus, comprising:

A source language text acquiring unit, configured to obtain a source language text to be translated;

A text segmentation unit, configured to segment the source language text into sentences according to the current translation scenario, to obtain a segmented source language text;

A source language text translation unit, configured to translate the segmented source language text, to obtain a target language text.

Preferably, the text segmentation unit comprises:

A model reference unit, configured to input the source language text into a preset text segmentation model, to obtain the segmented source language text output by the text segmentation model.

Preferably, the apparatus further comprises a text segmentation model determination unit, configured to determine the text segmentation model; the text segmentation model determination unit comprises:

A source language training text acquiring unit, configured to obtain a source language training text;

A segmentation result determination unit, configured to determine the segmentation result of the source language training text that fits the current translation scenario, as a target segmentation result;

A first model training unit, configured to train the text segmentation model with the source language training text as training data and the target segmentation result as the training label.

Preferably, the segmentation result determination unit comprises:

A target language training text acquiring unit, configured to obtain a target language training text, which is the translation of the source language training text under the current translation scenario;

A segmentation modification unit, configured to modify the segmentation of the source language training text according to a set segmentation modification scheme, to obtain modified source language training texts, the modified source language training texts and the source language training text together forming the candidate source language training texts;

A source language training text translation unit, configured to translate each candidate source language training text with a preset machine translation model, to obtain a machine translation result for each candidate source language training text;

A similarity determination unit, configured to determine the similarity between the machine translation result of each candidate source language training text and the target language training text, and to take the candidate source language training text with the highest similarity as the target segmentation result.

Preferably, the segmentation modification unit comprises:

A non-terminating punctuation determination unit, configured to determine the non-terminating punctuation marks contained in the source language training text;

A non-terminating punctuation replacement unit, configured to replace each non-terminating punctuation mark contained in the source language training text with a terminating punctuation mark, to obtain the modified source language training texts.

Preferably, the source language training text translation unit comprises:

A first clause division unit, configured to divide each candidate source language training text into clauses at the terminating punctuation marks it contains, to obtain a clause sequence;

A first clause translation unit, configured to translate each clause in the clause sequence of the candidate source language training text separately with the preset machine translation model, to obtain a machine translation result for each clause;

A first translation result merging unit, configured to merge the machine translation results of the clauses in the order of the clauses in the clause sequence, to obtain the machine translation result of the candidate source language training text.

Preferably, the text segmentation model determination unit further comprises:

A manual annotation result acquiring unit, configured to obtain the result of manually annotating the segmentation of the source language training text, to obtain a manually annotated source language training text;

A second model training unit, configured to train a text segmentation model with the source language training text as training data and the manually annotated source language training text as the training label, to obtain a preliminary text segmentation model;

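In that case, the first model training unit is specifically configured to train the preliminary text segmentation model with the source language training text as training data and the target segmentation result as the training label.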

Preferably, the source language text translation unit comprises:

A second clause division unit, configured to divide the segmented source language text into clauses at the terminating punctuation marks it contains, to obtain a clause sequence;

A second clause translation unit, configured to translate each clause in the clause sequence of the segmented source language text separately with a preset machine translation model, to obtain a machine translation result for each clause;

A second translation result merging unit, configured to merge the machine translation results of the clauses in the order of the clauses in the clause sequence, to obtain the target language text.

A translation device, comprising a memory and a processor;

The memory is configured to store a program;

The processor is configured to execute the program to implement each step of the translation method described above.

A readable storage medium storing a computer program which, when executed by a processor, implements each step of the translation method described above.

As the above technical solutions show, in the translation method provided by the embodiments of this application, after the source language text to be translated is obtained, it is further segmented into sentences according to the current translation scenario, so that the segmented source language text better fits the current translation scenario. Compared with existing translation methods, this application adds a sentence-segmentation optimization step for the obtained source language text, re-segmenting it with the current translation scenario taken into account, so that its segmentation is better optimized. When the segmented source language text is then translated, the quality of the resulting target language text is correspondingly higher.

Brief description of the drawings

To explain the technical solutions in the embodiments of this application or in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below show only embodiments of this application; a person of ordinary skill in the art can derive other drawings from them without creative effort.

Fig. 1 is a flowchart of a translation method disclosed in an embodiment of this application;

Fig. 2 is a schematic structural diagram of a translation apparatus disclosed in an embodiment of this application;

Fig. 3 is a hardware block diagram of a translation device disclosed in an embodiment of this application.

Detailed description of the embodiments

The technical solutions in the embodiments of this application are described clearly and completely below with reference to the drawings in the embodiments of this application. The described embodiments are only some of the embodiments of this application, not all of them. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments in this application without creative effort fall within the protection scope of this application.

Text translation is the process of translating a source language text to be translated into a target language text. Depending on where the source language text comes from, its sentence segmentation is not unique; the description here takes as an example a source language text obtained by recognizing the speech to be translated. Under different translation scenarios, different segmentations of the source language text affect the quality of the target language text obtained by translating it. For example, the translation result of a source language text may differ across contexts. As another example, the translation result may also differ across translation occasions: at a meeting, the translation is expected to be rigorous and formal, whereas in casual chat it may be more informal and colloquial.

In the prior art, the source language text to be translated is fed directly into a machine translation model. However, the segmentation of the source language text is not standardized: a source language text obtained by speech recognition, for example, may be shaped by the speaker's speaking habits, its segmentation is not optimized, and the current translation scenario is not taken into account, so the quality of the translation based on it is not high. To address this, this application provides an optimized translation method. The translation method of this application can be applied to electronic devices with data processing capability.

The translation method of this application is introduced next with reference to Fig. 1. The method may include:

Step S100, obtain a source language text to be translated.

Specifically, the source language text to be translated can be obtained in various ways, for example as a text uploaded by the user, or as the text obtained by performing speech recognition on voice data received from the user. Taking speech translation as an example, voice endpoint detection can be used to process the speech captured in real time and obtain speech segments. The speech segments are then recognized, and the recognized text serves as the source language text to be translated.

Here, the source language is the language of the text to be translated. Correspondingly, the language after translation is defined as the target language. The purpose of this application is to translate the source language text and obtain the target language text.

Step S110, segment the source language text into sentences according to the current translation scenario, to obtain a segmented source language text.

It can be understood that the segmentation of the source language text obtained in the previous step (that is, the punctuation in the source language text) may be shaped by the speaker's speaking habits; it is not standardized and does not take the current translation scenario into account. If the obtained source language text were translated directly, the quality of the translation result would not be high.

For this reason, this step adds a process that segments the source language text, and this process takes the current translation scenario into account, so that the segmentation of the resulting source language text better fits the current translation scenario. The detailed process of segmenting the source language text is described below.

Step S120, translate the segmented source language text, to obtain a target language text.

In general, a machine translation model can be used to translate the segmented source language text obtained in the previous step and obtain the translated target language text.

On this basis, the embodiments of this application can also, according to the user's needs, synthesize the target language text into speech and play it back, thereby converting source language speech into target language speech.

In the translation method provided by the embodiments of this application, after the source language text to be translated is obtained, it is further segmented into sentences according to the current translation scenario, so that the segmented source language text better fits the current translation scenario. Compared with existing translation methods, this application adds a sentence-segmentation optimization step for the obtained source language text, re-segmenting it with the current translation scenario taken into account, so that its segmentation is better optimized. When the segmented source language text is then translated, the quality of the resulting target language text is correspondingly higher.
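The flow of steps S100 to S120 can be sketched in a few lines of Python. This is a minimal sketch, not the application's implementation: the function names, the scenario label "meeting", and the toy stand-ins for the segmentation step and the machine translation model are all assumptions made for illustration.

```python
from typing import Callable

def translate_with_resegmentation(
    source_text: str,                       # S100: the obtained source language text
    scenario: str,                          # label of the current translation scenario
    resegment: Callable[[str, str], str],   # S110: scenario-aware segmentation
    translate: Callable[[str], str],        # S120: machine translation model
) -> str:
    """Re-segment the source text for the scenario, then translate the result."""
    segmented = resegment(source_text, scenario)
    return translate(segmented)

def toy_resegment(text: str, scenario: str) -> str:
    # Hypothetical stand-in: in a "meeting" scenario prefer short sentences
    # by turning commas into periods; leave other scenarios unchanged.
    return text.replace(",", ".") if scenario == "meeting" else text

def toy_translate(text: str) -> str:
    # Hypothetical stand-in for a machine translation model.
    return text.upper()
```

The point of the sketch is only the ordering: segmentation is applied to the source language text before it reaches the translation model, and the scenario is an explicit input to the segmentation step.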

Another embodiment of this application describes step S110 above, the process of segmenting the source language text according to the current translation scenario to obtain the segmented source language text.

First, it can be understood that segmentation has certain characteristics under different translation scenarios, so this application can preset a segmentation rule corresponding to each translation scenario. For example, at a meeting it may be desirable to use short sentences as much as possible, that is, to use as many terminating punctuation marks as possible. The segmentation rule for the meeting scenario can then specify that the number of terminating punctuation marks used is greater than the number of non-terminating punctuation marks.

Here, punctuation marks are divided into two classes, terminating and non-terminating, according to whether they indicate that the meaning of a sentence has been fully expressed. A terminating punctuation mark, such as a period, question mark, or exclamation mark, indicates that the sentence meaning is complete. A non-terminating punctuation mark, such as a comma or enumeration comma, indicates that it is not.

Based on this, when performing the current translation, the preset correspondence can be queried to determine the segmentation rule for the current translation scenario. Then, after the source language text to be translated is obtained, it is segmented according to the determined rule to obtain the segmented source language text. The segmented source language text therefore meets the needs of the current translation scenario.
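A preset correspondence between scenarios and rules could be represented as a lookup table. The sketch below is hypothetical: the punctuation sets, the scenario names, and the "chat" rule are assumptions; only the meeting rule (more terminating than non-terminating marks) comes from the description above. The sketch checks whether a text satisfies the rule for a scenario; how a text is rewritten to satisfy it is left open.

```python
from typing import Callable, Dict, Tuple

# Assumed membership of the two punctuation classes.
TERMINATING = set("。？！.?!")
NON_TERMINATING = set("，、；,;")

def count_marks(text: str) -> Tuple[int, int]:
    """Return (number of terminating marks, number of non-terminating marks)."""
    term = sum(ch in TERMINATING for ch in text)
    non_term = sum(ch in NON_TERMINATING for ch in text)
    return term, non_term

# Hypothetical scenario-to-rule table; each rule is a predicate on the counts.
SCENARIO_RULES: Dict[str, Callable[[int, int], bool]] = {
    "meeting": lambda term, non_term: term > non_term,  # rule from the text
    "chat": lambda term, non_term: True,                # assumed: no constraint
}

def satisfies_rule(text: str, scenario: str) -> bool:
    """Look up the rule for the scenario and check the text against it."""
    rule = SCENARIO_RULES[scenario]
    return rule(*count_marks(text))
```

For instance, "We begin. Please sit. Thank you." satisfies the meeting rule, while "We begin, please sit, thank you." does not, because it has one terminating mark and two non-terminating marks.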

Further, the embodiments of this application provide another way of segmenting the source language text: a machine learning model can be used to perform the segmentation. The detailed process is as follows:

The machine learning model that performs segmentation in this embodiment is defined as the text segmentation model. It can use any of various existing model structures, such as a BLSTM model or a Self-Attention model under the sequence labeling framework, or a sequence generation model under the encoder-decoder (Encode-Decode) framework; a combination of existing model structures can of course also be used.

If a model under the sequence labeling framework is used, the input of the model is each word in the text sequence and the output is the punctuation class corresponding to each word. The class can be null, comma, period, question mark, and so on, where null means that no punctuation mark is added after the word.
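To make the sequence labeling output concrete, the following minimal sketch turns a word sequence and its per-word punctuation classes back into segmented text. The function name and the use of `None` for the null class are assumptions made for illustration; the model that would produce the labels is not shown.

```python
from typing import List, Optional

def apply_labels(words: List[str], labels: List[Optional[str]], sep: str = " ") -> str:
    """Attach each word's predicted punctuation class to the word.

    A label of None is the null class: nothing is added after the word.
    Use sep="" for languages written without spaces, such as Chinese.
    """
    if len(words) != len(labels):
        raise ValueError("words and labels must have the same length")
    return sep.join(word + (label or "") for word, label in zip(words, labels))
```

For the words `["hello", "everyone", "shall", "we", "start"]` with labels `[None, ".", None, None, "?"]`, the function produces the text "hello everyone. shall we start?".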

If a model under the Encode-Decode framework is used, the input of the model can be a text sequence without punctuation and the output is a text sequence containing punctuation, that is, the result of the model adding punctuation to the input text sequence. Which form of machine learning model to use can be chosen according to the needs of the application; this application does not strictly limit it.

After the structure of the text segmentation model is determined, training data must be obtained to train it. In the embodiments of this application, a large number of source language training texts can be collected as training data. The set of source language training texts is defined as T1. It is further necessary to determine, for each source language training text in T1, the segmentation result that fits the current translation scenario, as the training label of that source language training text; the training labels are used together with the training data to train the text segmentation model. It can be understood that the training data in this embodiment can be extracted from source language texts to be translated. Training data can also be obtained in other ways, for example by selecting portions of existing corpus texts.

A text segmentation model trained with the above training data and training labels is able to segment an input sample according to the needs of the current translation scenario and to output a segmentation result that fits that scenario. On this basis, the obtained source language text can be input into the text segmentation model to obtain the segmented source language text it outputs, that is, the source language text after segmentation optimization.

Another embodiment of this application expands on the process of determining the text segmentation model described above. The process may include:

A1, obtain source language training texts.

As introduced above, the set of source language training texts is defined as T1.

A2, determine the segmentation result of each source language training text that fits the current translation scenario, as the target segmentation result.

The set of target segmentation results of the source language training texts that fit the current translation scenario is defined as T2. T2 is thus the result of segmenting each source language training text in T1.

A3, train the text segmentation model with the source language training texts as training data and the target segmentation results as training labels.
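For a model under the sequence labeling framework, each target segmentation result in T2 has to be turned into a word sequence plus one label per word before step A3 can run. The sketch below shows one way to do this; the whitespace tokenization, the punctuation set, and the function name are assumptions suited to English-like text and are not taken from this application.

```python
from typing import List, Optional, Tuple

# Assumed set of punctuation marks that can serve as labels.
PUNCTUATION = set(",.?!;，。？！、；")

def to_training_pair(segmented_text: str) -> Tuple[List[str], List[Optional[str]]]:
    """Convert a segmented text into (words, labels) for sequence labeling.

    Each word's label is the punctuation mark that follows it, or None
    (the null class) when no mark follows. Tokens are split on whitespace.
    """
    words: List[str] = []
    labels: List[Optional[str]] = []
    for token in segmented_text.split():
        label = None
        if token[-1] in PUNCTUATION:
            token, label = token[:-1], token[-1]
        if token:  # skip tokens that consisted of a punctuation mark only
            words.append(token)
            labels.append(label)
    return words, labels
```

Applied to "hello everyone. shall we start?", this yields the words `hello, everyone, shall, we, start` with labels `None, ".", None, None, "?"`, which is the input and label format described for the sequence labeling framework.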

Building on the process of determining the text segmentation model illustrated above, the embodiments of this application provide another way of determining the text segmentation model, in which the following steps are added before A3 above:

A4, obtain the result of manually annotating the segmentation of the source language training texts, to obtain manually annotated source language training texts.

Specifically, after the source language training texts are obtained, their segmentation can be annotated manually to obtain the manually annotated source language training texts.

A5, train a text segmentation model with the source language training texts as training data and the manually annotated source language training texts as training labels, to obtain a preliminary text segmentation model.

That is, on the basis of step A4, a text segmentation model can be trained with the source language training texts as training data and the manually annotated source language training texts as training labels, yielding the preliminary text segmentation model.

On this basis, step A3 above may specifically include:

Specifically, a model adaptation method can be used: with the source language training texts as training data and the target segmentation results as training labels, the parameters of the preliminary text segmentation model are updated.

This model update approach increases the amount of model training data, so that the trained text segmentation model performs better.
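The two-stage flow (A5 on manual annotations, then A3 as an update of the preliminary model) can be illustrated with a deliberately tiny stand-in model. The counting tagger below is an assumption chosen so that the example runs without any machine learning library; it is not the BLSTM or Self-Attention model mentioned earlier, and its "parameter update" is simply adding counts.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Optional, Sequence, Tuple

Pair = Tuple[Sequence[str], Sequence[Optional[str]]]

class CountTagger:
    """Toy tagger: for each word, predict the label seen most often with it."""

    def __init__(self) -> None:
        self.counts: Dict[str, Counter] = defaultdict(Counter)

    def update(self, pairs: Sequence[Pair]) -> None:
        """Update the model parameters (label counts) with more training pairs."""
        for words, labels in pairs:
            for word, label in zip(words, labels):
                self.counts[word][label] += 1

    def predict(self, words: Sequence[str]) -> List[Optional[str]]:
        result: List[Optional[str]] = []
        for word in words:
            seen = self.counts.get(word)
            result.append(seen.most_common(1)[0][0] if seen else None)
        return result

def train_two_stage(manual_pairs: Sequence[Pair], target_pairs: Sequence[Pair]) -> CountTagger:
    model = CountTagger()
    model.update(manual_pairs)   # A5: preliminary model from manual annotations
    model.update(target_pairs)   # A3: adapt it with the target segmentation results
    return model
```

In this toy setting, a word annotated manually with a comma once but labeled with a period twice in the scenario-specific targets ends up predicted with a period after adaptation, which mirrors how the second stage shifts the preliminary model toward the current translation scenario.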

Another embodiment of this application describes step A2 above, the process of determining the segmentation result of the source language training text that fits the current translation scenario, as the target segmentation result.

It can be understood that, as described above, a segmentation rule corresponding to each translation scenario can be preset. In this embodiment, the source language training texts can therefore be segmented according to the segmentation rule corresponding to the current translation scenario, to obtain the segmentation result of each source language training text.

In addition, this embodiment provides another optional implementation, which may specifically include:

A21, obtain a target language training text, which is the translation of the source language training text under the current translation scenario.

Specifically, the target language training text can be determined by human translation. That is, a human translator translates the source language training text according to the current translation scenario to obtain the target language training text.

A22, modify the segmentation of the source language training text according to a set segmentation modification scheme, to obtain modified source language training texts; the modified source language training texts and the source language training text together form the candidate source language training texts.

Specifically, the embodiments of this application can preset a segmentation modification scheme and then modify the segmentation of the source language training text according to that scheme.

It can be understood that, with a reasonably chosen segmentation modification scheme, multiple modified source language training texts can be generated. The segmentation of the source language training text that fits the current translation scenario is then either the segmentation of the source language training text itself or the segmentation of one of the modified source language training texts.

In other words, the purpose of this step is to expand the set of candidate source language training texts so that it contains the segmentation of the source language training text that fits the current translation scenario.

A23. Translating each candidate source-language training text using a preset machine translation model to obtain a machine translation result for each candidate source-language training text.

A24. Determining the similarity between the machine translation result of each candidate source-language training text and the target-language training text, and taking the candidate source-language training text with the highest similarity as the target segmentation result.

Specifically, the target-language training text is the translation of the source-language training text under the current translation scenario. On this basis, this step takes the target-language training text as the reference and determines the similarity between the machine translation result of each candidate source-language training text and the target-language training text. It can be understood that the higher the similarity of a candidate source-language training text, the better it matches the current translation scenario. The candidate source-language training text with the highest similarity can therefore be selected as the target segmentation result of the source-language training text that fits the current translation scenario.

Optionally, the similarity in this step can be calculated by BLEU scoring: taking the target-language training text as the reference, the machine translation result of each candidate source-language training text is scored, and a higher score indicates that the machine translation result of that candidate is more similar to the target-language training text.
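The scoring in step A24 can be sketched as follows. This is a minimal illustration, not the scoring used in the application: `simple_bleu` and `score_candidates` are hypothetical names, the score is a simplified BLEU-style measure (add-one-smoothed unigram and bigram precision with a brevity penalty), and whitespace tokenization is an assumption that would need to be replaced for languages written without spaces.

```python
import math
from collections import Counter


def ngrams(tokens: list[str], n: int) -> Counter:
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def simple_bleu(hypothesis: str, reference: str, max_n: int = 2) -> float:
    """Simplified sentence-level BLEU-style score in (0, 1].

    Geometric mean of add-one-smoothed modified n-gram precisions
    (n = 1..max_n), multiplied by the usual brevity penalty.
    """
    hyp, ref = hypothesis.split(), reference.split()
    if not hyp:
        return 0.0
    log_precision = 0.0
    for n in range(1, max_n + 1):
        hyp_counts, ref_counts = ngrams(hyp, n), ngrams(ref, n)
        # Clip each hypothesis n-gram count by its count in the reference.
        overlap = sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
        total = sum(hyp_counts.values())
        log_precision += math.log((overlap + 1) / (total + 1)) / max_n
    # Penalize hypotheses that are shorter than the reference.
    brevity = 1.0 if len(hyp) >= len(ref) else math.exp(1 - len(ref) / len(hyp))
    return brevity * math.exp(log_precision)


def score_candidates(translations: list[str], reference: str) -> list[float]:
    """Score the machine translation result of every candidate text."""
    return [simple_bleu(t, reference) for t in translations]
```

In practice an established BLEU implementation would normally be preferred over a hand-written one; the sketch only shows how the target-language training text serves as the single reference for every candidate.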

Further, this embodiment describes an optional way of modifying the segmentation of the source-language training text to obtain the modified source-language training texts, which may specifically include:

A221. Determining the non-terminating punctuation marks contained in the source-language training text.

Specifically, for each source-language training text T1_j (j = 1, ..., n) in the source-language training text set T1, where n is the number of source-language training texts, the number M of non-terminating punctuation marks contained in T1_j is determined.

A222. Replacing each non-terminating punctuation mark contained in the source-language training text with a terminating punctuation mark to obtain the modified source-language training texts.

It can be understood that any non-terminating punctuation mark in T1_j can be replaced with a terminating punctuation mark.

According to the replacement approach of this step, each non-terminating punctuation mark is independently either kept or replaced. If T1_j contains M non-terminating punctuation marks, the source-language training text before replacement and the modified source-language training texts obtained by replacement together form a set of candidate source-language training texts, which contains 2^M (2 to the power of M) candidate source-language training texts in total.

This is illustrated below with an example:

Specifically, consider the source-language training text: "The weather today is nice, I want to go mountain climbing, are you going?" It contains two commas, which are non-terminating punctuation marks, so each comma can be replaced with a terminating punctuation mark such as a full stop. This finally yields 2^2 = 4 candidate source-language training texts, as follows:

1. The weather today is nice, I want to go mountain climbing, are you going?

2. The weather today is nice. I want to go mountain climbing, are you going?

3. The weather today is nice, I want to go mountain climbing. Are you going?

4. The weather today is nice. I want to go mountain climbing. Are you going?

It can be understood that, of the 4 candidate source-language training texts obtained, item 1 is the source-language training text itself, and items 2 to 4 are modified source-language training texts obtained by punctuation replacement.
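The enumeration in steps A221 and A222 can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `generate_candidates`, the set of non-terminating marks and the choice of a full stop as the replacement are all hypothetical, since the application does not enumerate them, and recapitalizing the word after a new full stop is not handled.

```python
from itertools import product

# Assumed non-terminating marks (ASCII and full-width forms).
NON_TERMINATING = {",", ";", ":", "，", "；", "："}
# Assumed terminating mark used as the replacement.
TERMINATING_REPLACEMENT = "."


def generate_candidates(text: str) -> list[str]:
    """Return all 2^M variants of `text`, where M is the number of
    non-terminating marks; each mark is either kept or replaced."""
    positions = [i for i, ch in enumerate(text) if ch in NON_TERMINATING]
    candidates = []
    # Each tuple decides, per mark, whether it is replaced (True) or kept.
    for choice in product((False, True), repeat=len(positions)):
        chars = list(text)
        for pos, replace in zip(positions, choice):
            if replace:
                chars[pos] = TERMINATING_REPLACEMENT
        candidates.append("".join(chars))
    return candidates
```

The first element returned is always the unmodified text, matching item 1 of the example. Because the set grows as 2^M, a real pipeline might need to cap M for long sentences; the application does not discuss such a cap.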

Based on the implementation of A22 described in the above embodiment, the embodiments of the present application further describe an optional implementation of step A23 above, translating each candidate source-language training text using a preset machine translation model, which may specifically include:

A231. Dividing each candidate source-language training text into clauses at the terminating punctuation marks it contains, to obtain a clause sequence.

Specifically, for each candidate source-language training text, the terminating punctuation marks it contains are traversed from the beginning, each terminating punctuation mark is taken as a division point, and the candidate source-language training text is divided into several clauses. The clauses are arranged in the order in which they appear in the candidate source-language training text to form the clause sequence.

A232. Translating each clause in the clause sequence of the candidate source-language training text separately using the preset machine translation model, to obtain a machine translation result for each clause.

A233. Merging the machine translation results of the clauses in the order of the clauses in the clause sequence, to obtain the machine translation result of the candidate source-language training text.

It can be understood that there are 2^M candidate source-language training texts, and each of them is translated in the manner described above, so 2^M machine translation results are finally obtained.

With the processing described above, some of the non-terminating punctuation marks can be converted into terminating punctuation marks, so terminating punctuation marks occur more often. In machine translation, the content preceding each terminating punctuation mark is translated as one unit, so the scheme of the present application can shorten the wait for a terminating punctuation mark, thereby speeding up the output of translation results, reducing the time the user perceives waiting for a translation result, and improving the user experience.
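Steps A231 to A233 can be sketched as follows. This is a minimal illustration, not the application's implementation: `split_into_clauses` and `translate_by_clause` are hypothetical names, the set of terminating marks is an assumption, and `translate` stands in for the preset machine translation model, which is passed in as an ordinary callable.

```python
from typing import Callable

# Assumed terminating marks (ASCII and full-width forms).
TERMINATING = ".?!。？！"


def split_into_clauses(text: str) -> list[str]:
    """A231: traverse the text and cut after every terminating mark."""
    clauses, start = [], 0
    for i, ch in enumerate(text):
        if ch in TERMINATING:
            clause = text[start:i + 1].strip()
            if clause:
                clauses.append(clause)
            start = i + 1
    # Keep any trailing text that has no terminating mark.
    tail = text[start:].strip()
    if tail:
        clauses.append(tail)
    return clauses


def translate_by_clause(text: str, translate: Callable[[str], str]) -> str:
    """A232 and A233: translate each clause, then merge in original order."""
    return " ".join(translate(clause) for clause in split_into_clauses(text))
```

For instance, `translate_by_clause(candidate, model.translate)` would call a hypothetical `model.translate` once per clause. Joining the results with a space is an assumption that suits space-delimited target languages; a plain character scan like this would also split at decimal points, which a production splitter would need to guard against.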

The implementation of A23 is illustrated below with the example above:

For ease of description, the 4 candidate source-language training texts of the above example are referred to as candidate texts 1 to 4.

For candidate text 1: the sentence has a terminating punctuation mark only at its end and none inside it, so it cannot be split further; in other words, the only clause after splitting is candidate text 1 itself. Candidate text 1 can therefore be fed to the machine translation model as a single sentence for translation.

For candidate text 2: the sentence has a full stop after "nice", so it can be split. Splitting candidate text 2 yields two clauses:

Clause 21: The weather today is nice.

Clause 22: I want to go mountain climbing, are you going?

The two clauses are fed to the machine translation model separately, and the machine translation results are merged to obtain the machine translation result of candidate text 2.

For candidate text 3: the sentence has a full stop after "climbing", so it can be split. Splitting candidate text 3 yields two clauses:

Clause 31: The weather today is nice, I want to go mountain climbing.

Clause 32: Are you going?

The two clauses are fed to the machine translation model separately, and the machine translation results are merged to obtain the machine translation result of candidate text 3.

For candidate text 4: the sentence has a full stop after "nice" and another after "climbing", so it can be split. Splitting candidate text 4 yields three clauses:

Clause 41: The weather today is nice.

Clause 42: I want to go mountain climbing.

Clause 43: Are you going?

The three clauses are fed to the machine translation model separately, and the machine translation results are merged to obtain the machine translation result of candidate text 4.

Further, suppose that candidate texts 1 to 4 above are scored using the BLEU method and receive the scores 0.1, 0.2, 0.3 and 0.4, respectively. Candidate text 4, which has the highest score, can then be selected as the target segmentation result of the source-language training text that fits the current translation scenario.

Then, source-language training text: "The weather today is nice, I want to go mountain climbing, are you going?"

Target segmentation result: "The weather today is nice. I want to go mountain climbing. Are you going?"

This source-language training text and its target segmentation result can be used as the training data and the training label, respectively, to train the text segmentation model.
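The selection and pairing just described can be sketched as follows. This is a minimal illustration; `build_training_pair` is a hypothetical helper, and the scores are taken as given rather than computed.

```python
def build_training_pair(
    source_text: str, candidates: list[str], scores: list[float]
) -> tuple[str, str]:
    """Pair the source text (training data) with the highest-scoring
    candidate (training label, i.e. the target segmentation result)."""
    if not candidates or len(candidates) != len(scores):
        raise ValueError("candidates and scores must be non-empty and equal in length")
    best_index = max(range(len(scores)), key=lambda i: scores[i])
    return source_text, candidates[best_index]
```

With the scores 0.1, 0.2, 0.3 and 0.4 from the example, the function returns the original sentence paired with candidate text 4. If several candidates tie for the highest score, this sketch keeps the first one; the application does not specify a tie-breaking rule.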

In another embodiment of the present application, the process of step S120 above, translating the segmented source language text to obtain the target language text, is described.

From the description of the above embodiments, a machine translation model can be used to translate the segmented source language text. In the translation process, the segmented source language text is first divided into clauses at the terminating punctuation marks it contains, to obtain a clause sequence. Each clause in the clause sequence of the segmented source language text is then translated separately using the preset machine translation model, to obtain a machine translation result for each clause. The machine translation results of the clauses are merged in the order of the clauses in the clause sequence to obtain the machine translation result of the segmented source language text, that is, the target language text.

From the description of the above embodiments, the present application takes the current translation scenario into account when segmenting the source language text, so the segmented source language text better fits the current translation scenario; translating the segmented source language text on this basis also yields a higher-quality target language text.

Further, through punctuation replacement during segmentation, the present application can convert some of the non-terminating punctuation marks into terminating punctuation marks, so terminating punctuation marks occur more often. In machine translation, the content preceding each terminating punctuation mark is translated as one unit, so the scheme of the present application can shorten the wait for a terminating punctuation mark, thereby improving the response speed of translation results, reducing the time the user perceives waiting for a translation result, and improving the user experience.
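The latency argument above can be sketched with a streaming variant, in which a clause is translated as soon as its terminating punctuation mark arrives. This is an illustration under assumptions: `translate_stream` is a hypothetical name, the set of terminating marks is assumed, the input is assumed to arrive in already-segmented chunks (for example from speech recognition followed by the text segmentation model), and `translate` stands in for the machine translation model.

```python
from typing import Callable, Iterable, Iterator

# Assumed terminating marks (ASCII and full-width forms).
TERMINATING = ".?!。？！"


def translate_stream(
    chunks: Iterable[str], translate: Callable[[str], str]
) -> Iterator[str]:
    """Yield one translation per clause, as soon as the clause is closed
    by a terminating mark, instead of waiting for the whole text."""
    buffer = ""
    for chunk in chunks:
        for ch in chunk:
            buffer += ch
            if ch in TERMINATING:
                clause = buffer.strip()
                if clause:
                    yield translate(clause)
                buffer = ""
    # Flush whatever remains when the input ends without a terminating mark.
    tail = buffer.strip()
    if tail:
        yield translate(tail)
```

The more often terminating marks occur in the segmented text, the sooner each `yield` fires, which is the effect the paragraph above describes.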

The translation apparatus provided by the embodiments of the present application is described below. The translation apparatus described below and the translation method described above may be referred to in correspondence with each other.

Referring to Fig. 2, Fig. 2 is a schematic structural diagram of a translation apparatus disclosed in an embodiment of the present application.

As shown in Fig. 2, the apparatus may include:

Source language text acquiring unit 11, configured to obtain the source language text to be translated;

Text segmentation unit 12, configured to segment the source language text according to the current translation scenario to obtain the segmented source language text;

Source language text translation unit 13, configured to translate the segmented source language text to obtain the target language text.

Optionally, the above text segmentation unit may include:

Optionally, the translation apparatus of the present application may further include a text segmentation model determination unit, configured to determine the text segmentation model. The text segmentation model determination unit may include:

Optionally, the above segmentation result determination unit may include:

Optionally, the above segmentation modification unit may include:

Optionally, the above source-language training text translation unit may include:

Optionally, the above text segmentation model determination unit may further include:

A second model training unit, configured to train a text segmentation model with the source-language training text as training data and the manually annotated source-language training text as the training label, to obtain a preliminary text segmentation model. On this basis, the above first model training unit may specifically be configured to:

Optionally, the above source language text translation unit may include:

The translation apparatus provided by the embodiments of the present application can be applied to a translation device, such as a PC terminal, a cloud platform, a server or a server cluster. Optionally, Fig. 3 shows a block diagram of the hardware structure of the translation device. Referring to Fig. 3, the hardware structure of the translation device may include: at least one processor 1, at least one communication interface 2, at least one memory 3 and at least one communication bus 4;

In the embodiments of the present application, there is at least one each of the processor 1, the communication interface 2, the memory 3 and the communication bus 4, and the processor 1, the communication interface 2 and the memory 3 communicate with one another through the communication bus 4;

The processor 1 may be a central processing unit (CPU), an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to implement the embodiments of the present invention, etc.;

The memory 3 may include high-speed RAM, and may further include non-volatile memory, for example at least one magnetic disk memory;

The memory stores a program, and the processor can call the program stored in the memory, the program being used for:

Obtaining the source language text to be translated;

Optionally, for the detailed and extended functions of the program, reference may be made to the description above.

The embodiments of the present application also provide a readable storage medium, which can store a program suitable for execution by a processor, the program being used for:

Obtaining the source language text to be translated;

Finally, it should be noted that, herein, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between these entities or operations. Moreover, the terms "include", "comprise" and any variants thereof are intended to cover a non-exclusive inclusion, so that a process, method, article or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the presence of other identical elements in the process, method, article or device that includes the element.

The embodiments in this specification are described in a progressive manner. Each embodiment focuses on its differences from the other embodiments, and for the same or similar parts the embodiments may be referred to one another.

The above description of the disclosed embodiments enables those skilled in the art to implement or use the present application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the general principles defined herein can be implemented in other embodiments without departing from the spirit or scope of the present application. Therefore, the present application is not limited to the embodiments shown herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims

1. A translation method, characterized by comprising:

obtaining a source language text to be translated;

2. The method according to claim 1, characterized in that segmenting the source language text according to the translation scenario to obtain the segmented source language text comprises:

inputting the source language text into a preset text segmentation model to obtain the segmented source language text output by the text segmentation model;

wherein the text segmentation model is trained with source-language training text as training data and with the segmentation result of the source-language training text that fits the current translation scenario as the training label.

3. The method according to claim 2, characterized in that the process of determining the text segmentation model comprises:

obtaining a source-language training text;

determining the segmentation result of the source-language training text that fits the current translation scenario, as a target segmentation result;

4. The method according to claim 3, characterized in that determining the segmentation result of the source-language training text that fits the current translation scenario, as the target segmentation result, comprises:

modifying the segmentation of the source-language training text with reference to a preset segmentation modification scheme to obtain modified source-language training texts, the modified source-language training texts and the source-language training text together forming candidate source-language training texts;

translating each candidate source-language training text using a preset machine translation model to obtain a machine translation result for each candidate source-language training text;

determining the similarity between the machine translation result of each candidate source-language training text and the target-language training text, and taking the candidate source-language training text with the highest similarity as the target segmentation result.

5. The method according to claim 4, characterized in that modifying the segmentation of the source-language training text with reference to the preset segmentation modification scheme to obtain the modified source-language training texts comprises:

replacing each non-terminating punctuation mark contained in the source-language training text with a terminating punctuation mark to obtain the modified source-language training texts.

6. The method according to claim 5, characterized in that translating each candidate source-language training text using the preset machine translation model to obtain the machine translation result of each candidate source-language training text comprises:

dividing each candidate source-language training text into clauses at the terminating punctuation marks it contains, to obtain a clause sequence;

translating each clause in the clause sequence of the candidate source-language training text separately using the preset machine translation model, to obtain a machine translation result for each clause;

merging the machine translation results of the clauses in the order of the clauses in the clause sequence, to obtain the machine translation result of the candidate source-language training text.

7. The method according to claim 3, characterized in that, before training the text segmentation model with the source-language training text as training data and the target segmentation result as the training label, the method further comprises:

obtaining the result of manually annotating the segmentation of the source-language training text, to obtain a manually annotated source-language training text;

then, training the text segmentation model with the source-language training text as training data and the target segmentation result as the training label comprises:

training the preliminary text segmentation model with the source-language training text as training data and the target segmentation result as the training label.

8. The method according to claim 1, characterized in that translating the segmented source language text to obtain the target language text comprises:

dividing the segmented source language text into clauses at the terminating punctuation marks it contains, to obtain a clause sequence;

translating each clause in the clause sequence of the segmented source language text separately using a preset machine translation model, to obtain a machine translation result for each clause;

merging the machine translation results of the clauses in the order of the clauses in the clause sequence, to obtain the target language text.

9. A translation apparatus, characterized by comprising:

a text segmentation unit, configured to segment the source language text according to the current translation scenario to obtain the segmented source language text;

a source language text translation unit, configured to translate the segmented source language text to obtain the target language text.

10. The apparatus according to claim 9, characterized in that the text segmentation unit comprises:

a model reference unit, configured to input the source language text into a preset text segmentation model to obtain the segmented source language text output by the text segmentation model;

11. The apparatus according to claim 10, characterized by further comprising a text segmentation model determination unit, configured to determine the text segmentation model; the text segmentation model determination unit comprises:

a segmentation result determination unit, configured to determine the segmentation result of the source-language training text that fits the current translation scenario, as a target segmentation result;

a first model training unit, configured to train the text segmentation model with the source-language training text as training data and the target segmentation result as the training label.

12. The apparatus according to claim 11, characterized in that the segmentation result determination unit comprises:

a target-language training text acquiring unit, configured to obtain the target-language training text, which is the translation of the source-language training text under the current translation scenario;

a segmentation modification unit, configured to modify the segmentation of the source-language training text with reference to a preset segmentation modification scheme to obtain modified source-language training texts, the modified source-language training texts and the source-language training text together forming candidate source-language training texts;

a source-language training text translation unit, configured to translate each candidate source-language training text using a preset machine translation model to obtain a machine translation result for each candidate source-language training text;

a similarity determining unit, configured to determine the similarity between the machine translation result of each candidate source-language training text and the target-language training text, and to take the candidate source-language training text with the highest similarity as the target segmentation result.

13. The apparatus according to claim 12, characterized in that the segmentation modification unit comprises:

a non-terminating punctuation replacement unit, configured to replace each non-terminating punctuation mark contained in the source-language training text with a terminating punctuation mark to obtain the modified source-language training texts.

14. The apparatus according to claim 13, characterized in that the source-language training text translation unit comprises:

a first clause division unit, configured to divide each candidate source-language training text into clauses at the terminating punctuation marks it contains, to obtain a clause sequence;

a first clause translation unit, configured to translate each clause in the clause sequence of the candidate source-language training text separately using the preset machine translation model, to obtain a machine translation result for each clause;

a first translation result merging unit, configured to merge the machine translation results of the clauses in the order of the clauses in the clause sequence, to obtain the machine translation result of the candidate source-language training text.

15. The apparatus according to claim 11, characterized in that the text segmentation model determination unit further comprises:

a manual annotation result acquiring unit, configured to obtain the result of manually annotating the segmentation of the source-language training text, to obtain a manually annotated source-language training text;

a second model training unit, configured to train a text segmentation model with the source-language training text as training data and the manually annotated source-language training text as the training label, to obtain a preliminary text segmentation model;

the first model training unit is then specifically configured to:

16. The apparatus according to claim 9, characterized in that the source language text translation unit comprises:

a second clause division unit, configured to divide the segmented source language text into clauses at the terminating punctuation marks it contains, to obtain a clause sequence;

a second clause translation unit, configured to translate each clause in the clause sequence of the segmented source language text separately using a preset machine translation model, to obtain a machine translation result for each clause;

a second translation result merging unit, configured to merge the machine translation results of the clauses in the order of the clauses in the clause sequence, to obtain the target language text.

17. A translation device, characterized by comprising a memory and a processor;

the memory being configured to store a program;

the processor being configured to execute the program to implement each step of the translation method according to any one of claims 1-8.

18. A readable storage medium having a computer program stored thereon, characterized in that, when the computer program is executed by a processor, each step of the translation method according to any one of claims 1-8 is implemented.