CN108628819A - Processing method and apparatus, and device for processing
- Publication number
- CN108628819A, CN201710157267.5A
- Authority
- CN
- China
- Prior art keywords
- optimal
- punctuate
- point
- cut
- result
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/40—Processing or translation of natural language
- G06F40/58—Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation
Abstract
Embodiments of the present invention provide a processing method and apparatus and a device for processing. The method specifically includes: obtaining text to be processed; obtaining an optimal sentence segmentation result for the text to be processed according to split points derived from the preset punctuation marks the text contains, where the overall translation quality of the optimal segmentation result is optimal, the optimal segmentation result includes at least one sentence, and the overall translation quality is a combination of the translation qualities of all sentences in a segmentation result; and outputting the optimal segmentation result for the text to be processed. Embodiments of the present invention can improve the translation quality of the segmentation result for the text to be processed.
Description
Technical field
The present invention relates to the technical field of natural language processing, and in particular to a processing method and apparatus and a device for processing.
Background
Sentence segmentation is an important basic technology in the field of natural language processing. Sentence segmentation means cutting text into semantically complete sentences. Because cutting text into semantically complete sentences is the first step toward machine recognition of human language, sentence segmentation is widely used in natural language processing applications such as machine translation, speech recognition, and information services.
Machine translation refers to the process of using a computer to convert one natural language (the source language) into another natural language (the target language). Before translating, a conventional machine translation technique usually performs sentence segmentation on source text entered by a user or obtained through speech recognition, and then translates according to the segmentation result. The accuracy of the segmentation result therefore has a vital influence on machine translation quality: how accurate the segmentation is directly determines how good the translation is.
Existing schemes usually segment text by setting thresholds. For example, if the number of commas in the text exceeds a first threshold, or the number of characters in the text exceeds a second threshold, the text is segmented.
However, the segmentation results obtained by existing schemes are prone to contain semantically incomplete sentences, and such sentences degrade the translation quality of machine translation. The segmentation results of existing schemes therefore lead to relatively low machine translation quality.
Summary of the invention
In view of the above problems, embodiments of the present invention are proposed to provide a processing method, a processing apparatus, and a device for processing that overcome the above problems or at least partly solve them. Embodiments of the present invention can improve the translation quality of the segmentation result for the text to be processed.
To solve the above problems, the present invention discloses a processing method, including:
obtaining text to be processed;
obtaining an optimal sentence segmentation result for the text to be processed according to split points derived from the preset punctuation marks the text contains, where the overall translation quality of the optimal segmentation result is optimal, the optimal segmentation result includes at least one sentence, and the overall translation quality is a combination of the translation qualities of all sentences in a segmentation result;
outputting the optimal segmentation result for the text to be processed.
Optionally, obtaining the optimal segmentation result for the text to be processed according to the split points derived from the preset punctuation marks the text contains includes:
using a dynamic programming algorithm to obtain the optimal segmentation result for the text to be processed according to the split points derived from the preset punctuation marks the text contains.
Optionally, using the dynamic programming algorithm to obtain the optimal segmentation result according to the split points derived from the preset punctuation marks the text contains includes:
determining the clause sequence set of the text to be processed according to the preset punctuation marks the text contains;
in order of the subsets of the clause sequence set from smallest to largest, determining by recurrence the backtracking split point of the optimal subset segmentation result of each subset, where the overall translation quality of the optimal subset segmentation result is optimal;
obtaining the optimal segmentation result for the text to be processed according to the backtracking split points of the optimal subset segmentation results of the subsets of the clause sequence set.
Optionally, the subsets of the clause sequence set include the first i clauses of the text to be processed, and the optimal subset overall translation quality score of the first i clauses is denoted F(i), where 0 ≤ i ≤ M and M is the number of clauses in the text to be processed. Determining by recurrence, in order of the subsets from smallest to largest, the backtracking split point of the optimal subset segmentation result of each subset then includes:
segmenting the first i clauses at split point k, to obtain, for the first i clauses and split point k, the optimal subset overall translation quality score F(k) of the first semantic unit and the translation quality score of the second semantic unit, where the first semantic unit includes the clauses among the first i clauses that precede split point k, the second semantic unit includes the clauses among the first i clauses that follow split point k, and 0 ≤ k < i;
combining F(k) with the translation quality score of the second semantic unit, to obtain the overall translation quality score corresponding to the first i clauses and split point k;
according to the overall translation quality scores corresponding to the first i clauses and each split point k, obtaining, from the at least one split point k of the first i clauses, the target split point with the optimal overall translation quality score;
taking the target split point as the backtracking split point of the optimal subset segmentation result of the first i clauses, and taking the overall translation quality score of the target split point as the optimal subset overall translation quality score F(i) of the first i clauses.
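The recurrence described above can be written compactly. The notation below is an assumption introduced for illustration: Q(k+1, i) stands for the translation quality score of the second semantic unit (clauses k+1 through i taken as one sentence), ⊕ stands for the unspecified combining operation, P(i) stands for the backtracking split point, the base case F(0) = 0 is assumed, and "optimal" is read as "maximal".

\[
F(0) = 0, \qquad
F(i) = \max_{0 \le k < i} \bigl[\, F(k) \oplus Q(k+1,\, i) \,\bigr], \qquad 1 \le i \le M
\]

\[
P(i) = \operatorname*{arg\,max}_{0 \le k < i} \bigl[\, F(k) \oplus Q(k+1,\, i) \,\bigr]
\]

With k = 0 the first semantic unit is empty, so the first i clauses are kept together as a single sentence.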
Optionally, obtaining the optimal segmentation result for the text to be processed according to the backtracking split points of the optimal subset segmentation results of the subsets of the clause sequence set includes:
backtracking through the backtracking split points of the optimal subset segmentation results of the subsets of the clause sequence set, to obtain the backtracking split points of the optimal subset segmentation result of the largest subset of the clause sequence set;
segmenting the text to be processed according to the backtracking split points of the optimal subset segmentation result of the largest subset of the clause sequence set, to obtain the optimal segmentation result for the text to be processed.
Optionally, backtracking through the backtracking split points of the optimal subset segmentation results of the subsets of the clause sequence set includes:
obtaining a first backtracking split point P1 corresponding to the first i clauses;
obtaining a second backtracking split point P2 corresponding to the clauses of the text to be processed that precede the first backtracking split point P1.
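The recurrence and backtracking steps above can be sketched together in Python. This is a hedged illustration, not the patent's implementation: the function name `best_segmentation`, the caller-supplied `quality` function (a stand-in for a per-sentence translation quality score), the use of addition to combine scores, and the base case F(0) = 0 are all assumptions.

```python
from typing import Callable, List, Sequence, Tuple


def best_segmentation(
    clauses: Sequence[str],
    quality: Callable[[str], float],
) -> Tuple[float, List[str]]:
    """Return (best overall score, sentences) for the given clause sequence.

    F[i] is the best overall score for the first i clauses; back[i] is the
    backtracking split point k that achieves it.
    """
    m = len(clauses)
    F = [0.0] + [float("-inf")] * m   # assumed base case: F(0) = 0
    back = [0] * (m + 1)
    for i in range(1, m + 1):         # subsets from smallest to largest
        for k in range(i):            # candidate split points, 0 <= k < i
            # Clauses k+1..i (1-based) form the second semantic unit.
            score = F[k] + quality("".join(clauses[k:i]))  # assumed: sum
            if score > F[i]:
                F[i], back[i] = score, k
    # Backtrack from the largest subset: P1 = back[m], P2 = back[P1], ...
    sentences: List[str] = []
    i = m
    while i > 0:
        k = back[i]
        sentences.append("".join(clauses[k:i]))
        i = k
    sentences.reverse()
    return F[m], sentences
```

The double loop evaluates on the order of M² candidate sentences for M clauses, whereas enumerating every segmentation would require 2^(M-1) results.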
Optionally, obtaining the optimal segmentation result for the text to be processed according to the split points derived from the preset punctuation marks the text contains includes:
performing sentence segmentation on the text to be processed according to the split points derived from the preset punctuation marks the text contains, to obtain multiple segmentation results for the text to be processed;
determining the overall translation quality of each segmentation result;
selecting, from the multiple segmentation results for the text to be processed, the segmentation result with the optimal overall translation quality as the optimal segmentation result for the text to be processed.
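The enumerate-and-select variant above can be sketched in Python as follows. The names `all_segmentations` and `best_by_enumeration`, the `quality` callback (a stand-in for a per-sentence translation quality score), and the use of a sum as the combining operation are assumptions for illustration; the input is assumed to be a non-empty clause list.

```python
from itertools import combinations
from typing import Callable, Iterator, List, Sequence


def all_segmentations(clauses: Sequence[str]) -> Iterator[List[str]]:
    """Yield every segmentation: each of the m-1 clause boundaries is
    either used as a split point or not, giving 2**(m-1) results."""
    m = len(clauses)
    for r in range(m):
        for cuts in combinations(range(1, m), r):
            bounds = [0, *cuts, m]
            yield ["".join(clauses[a:b]) for a, b in zip(bounds, bounds[1:])]


def best_by_enumeration(
    clauses: Sequence[str],
    quality: Callable[[str], float],
) -> List[str]:
    """Pick the segmentation whose summed sentence scores are highest."""
    return max(
        all_segmentations(clauses),
        key=lambda seg: sum(quality(s) for s in seg),
    )
```

Because the number of results doubles with every additional clause, this approach is practical only for short texts.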
Optionally, the preset punctuation marks include: a comma and/or a semicolon.
In another aspect, the present invention discloses a processing apparatus, including:
a to-be-processed text obtaining module, configured to obtain text to be processed;
an optimal segmentation result obtaining module, configured to obtain an optimal sentence segmentation result for the text to be processed according to split points derived from the preset punctuation marks the text contains, where the overall translation quality of the optimal segmentation result is optimal, the optimal segmentation result includes at least one sentence, and the overall translation quality is a combination of the translation qualities of all sentences in a segmentation result; and
an optimal segmentation result output module, configured to output the optimal segmentation result for the text to be processed.
Optionally, the optimal segmentation result obtaining module includes:
a dynamic programming obtaining submodule, configured to use a dynamic programming algorithm to obtain the optimal segmentation result for the text to be processed according to the split points derived from the preset punctuation marks the text contains.
Optionally, the dynamic programming obtaining submodule includes:
a clause sequence set determining unit, configured to determine the clause sequence set of the text to be processed according to the preset punctuation marks the text contains;
a recurrence unit, configured to determine by recurrence, in order of the subsets of the clause sequence set from smallest to largest, the backtracking split point of the optimal subset segmentation result of each subset; and
an optimal segmentation result obtaining unit, configured to obtain the optimal segmentation result for the text to be processed according to the backtracking split points of the optimal subset segmentation results of the subsets of the clause sequence set.
Optionally, the subsets of the clause sequence set include the first i clauses of the text to be processed, and the optimal subset overall translation quality score of the first i clauses is denoted F(i), where 0 ≤ i ≤ M and M is the number of clauses in the text to be processed. The recurrence unit then includes:
a subset segmentation subunit, configured to segment the first i clauses at split point k, to obtain, for the first i clauses and split point k, the optimal subset overall translation quality score F(k) of the first semantic unit and the translation quality score of the second semantic unit, where the first semantic unit includes the clauses among the first i clauses that precede split point k, the second semantic unit includes the clauses among the first i clauses that follow split point k, and 0 ≤ k < i;
a quality combining subunit, configured to combine F(k) with the translation quality score of the second semantic unit, to obtain the overall translation quality score corresponding to the first i clauses and split point k;
a target split point obtaining subunit, configured to obtain, according to the overall translation quality scores corresponding to the first i clauses and each split point k, the target split point with the optimal overall translation quality score from the at least one split point k of the first i clauses;
a backtracking split point obtaining subunit, configured to take the target split point as the backtracking split point of the optimal subset segmentation result of the first i clauses, and to take the overall translation quality score of the target split point as the optimal subset overall translation quality score F(i) of the first i clauses.
Optionally, the optimal segmentation result obtaining unit includes:
a backtracking subunit, configured to backtrack through the backtracking split points of the optimal subset segmentation results of the subsets of the clause sequence set, to obtain the backtracking split points of the optimal subset segmentation result of the largest subset of the clause sequence set;
a backtracking segmentation subunit, configured to segment the text to be processed according to the backtracking split points of the optimal subset segmentation result of the largest subset of the clause sequence set, to obtain the optimal segmentation result for the text to be processed.
Optionally, the backtracking subunit includes:
a first backtracking unit, configured to obtain a first backtracking split point P1 corresponding to the first i clauses;
a second backtracking unit, configured to obtain a second backtracking split point P2 corresponding to the clauses of the text to be processed that precede the first backtracking split point P1.
Optionally, the optimal segmentation result obtaining module includes:
an exhaustive submodule, configured to perform sentence segmentation on the text to be processed according to the split points derived from the preset punctuation marks the text contains, to obtain multiple segmentation results for the text to be processed;
an overall quality determining submodule, configured to determine the overall translation quality of each segmentation result;
a result selecting submodule, configured to select, from the multiple segmentation results for the text to be processed, the segmentation result with the optimal overall translation quality as the optimal segmentation result for the text to be processed.
Optionally, the preset punctuation marks include: a comma and/or a semicolon.
In yet another aspect, the present invention discloses a device for processing, including a memory and one or more programs, where the one or more programs are stored in the memory and configured to be executed by one or more processors, and the one or more programs include instructions for performing the following operations:
obtaining text to be processed;
obtaining an optimal sentence segmentation result for the text to be processed according to split points derived from the preset punctuation marks the text contains, where the overall translation quality of the optimal segmentation result is optimal, the optimal segmentation result includes at least one sentence, and the overall translation quality is a combination of the translation qualities of all sentences in a segmentation result;
outputting the optimal segmentation result for the text to be processed.
Embodiments of the present invention have the following advantages:
Embodiments of the present invention obtain the optimal segmentation result for the text to be processed according to split points derived from the preset punctuation marks the text contains. The overall translation quality of this optimal segmentation result is optimal, where the optimal segmentation result may include at least one sentence and the overall translation quality may be a combination of the translation qualities of all sentences in a segmentation result. The optimal segmentation result therefore achieves a global optimum of overall translation quality, and can improve the translation quality of the segmentation result for the text to be processed.
Brief description of the drawings
Fig. 1 is a schematic diagram of an example structure of a processing system according to an embodiment of the present invention;
Fig. 2 is a flowchart of an embodiment of a processing method of the present invention;
Fig. 3 is a schematic diagram of path planning for text to be processed according to an embodiment of the present invention;
Fig. 4 is a structural block diagram of an embodiment of a processing apparatus of the present invention;
Fig. 5 is a block diagram of a device for processing implemented as a terminal, according to an exemplary embodiment; and
Fig. 6 is a block diagram of a device for processing implemented as a server, according to an exemplary embodiment.
Detailed description
To make the above objects, features, and advantages of the present invention clearer and easier to understand, the present invention is described in further detail below with reference to the accompanying drawings and specific embodiments.
Embodiments of the present invention provide a processing scheme that can obtain the optimal segmentation result for the text to be processed according to split points derived from the preset punctuation marks the text contains. The overall translation quality of this optimal segmentation result is optimal, where the optimal segmentation result may include at least one sentence and the overall translation quality may be a combination of the translation qualities of all sentences in a segmentation result. The optimal segmentation result therefore achieves a global optimum of overall translation quality; here, "global" refers to the segmentation result of the text to be processed taken as a whole. The optimal segmentation result can thus improve the translation quality of the segmentation result for the text to be processed.
Embodiments of the present invention can be applied to any scenario that requires sentence segmentation and machine translation, such as machine translation, speech recognition, and information services. It will be understood that embodiments of the present invention do not limit the specific application scenario.
For example, Fig. 1 shows a schematic diagram of an example structure of a processing system according to an embodiment of the present invention, which may specifically include: processing unit 101, machine translation apparatus 102, and translation result output device 103. Processing unit 101, machine translation apparatus 102, and translation result output device 103 may each be a separate server, or may be deployed together on the same server; that is, embodiments of the present invention do not limit where processing unit 101, machine translation apparatus 102, and translation result output device 103 are located.
Processing unit 101 can obtain text to be processed; perform sentence segmentation on the text to be processed according to split points derived from the preset punctuation marks the text contains, to obtain the optimal segmentation result for the text; and output that optimal segmentation result to machine translation apparatus 102.
Optionally, processing unit 101 can obtain the text to be processed from the voice signal of a speaking user. In this case, processing unit 101 can convert the voice signal of the speaking user into text information and obtain the text to be processed from that text information. In practical applications, speaking users may include a user who speaks and produces a voice signal in a simultaneous interpretation scenario, and/or a user who produces a voice signal through a terminal, and so on. The voice signal of the speaking user can be received through a microphone or another voice collecting device.
Optionally, processing unit 101 may use speech recognition technology to convert the voice signal of the speaking user into text information. Let the voice signal of the speaking user be denoted S. After a series of processing steps, S yields the corresponding speech feature sequence O, written O = {O_1, O_2, ..., O_i, ..., O_T}, where O_i is the i-th speech feature and T is the total number of speech features. The sentence corresponding to voice signal S can be regarded as a word string made up of many words, written W = {w_1, w_2, ..., w_n}. Speech recognition is the process of finding the most probable word string W given the known speech feature sequence O.
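Stated as a formula, the recognition goal described above is to pick the word string with the highest posterior probability given the features. The source states only the goal; the Bayes decomposition on the right is the standard statistical formulation and is added here as an assumption, with P(O | W) playing the role of an acoustic model and P(W) the role of a language model.

\[
\hat{W} = \operatorname*{arg\,max}_{W} P(W \mid O)
        = \operatorname*{arg\,max}_{W} P(O \mid W)\, P(W)
\]

The denominator P(O) is dropped because it does not depend on W.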
Specifically, speech recognition is a model matching process. In this process, a speech model can first be built from the characteristics of human speech; the input voice signal is analyzed and the required features are extracted to build the templates needed for speech recognition. Recognizing the speech input by a user is then the process of comparing the features of that speech against the templates and finally determining the template that best matches it, which yields the speech recognition result. The specific speech recognition algorithm may be a training and recognition algorithm based on statistical hidden Markov models, a training and recognition algorithm based on neural networks, a recognition algorithm based on dynamic time warping matching, or another algorithm. Embodiments of the present invention do not limit the specific speech recognition process.
Alternatively, and optionally, processing unit 101 can obtain the text to be processed from text entered by a user. For example, text that a user enters in scenarios such as instant messaging or office documents can serve as the source of the text to be processed.
In practical applications, processing unit 101 can obtain the text to be processed from the text corresponding to a voice signal or from text entered by a user, according to actual application requirements. Optionally, the text to be processed can be obtained from the text corresponding to voice signal S according to the pause intervals in S. For example, when a pause interval in voice signal S exceeds a time threshold, a first boundary point can be determined from that point in time; the text corresponding to voice signal S before the first boundary point is taken as the text to be processed, and the text corresponding to voice signal S after the first boundary point is processed further so that more text to be processed can be obtained from it. Optionally, the text to be processed can be obtained according to the number of characters in the text corresponding to the voice signal or in the text entered by the user. For example, when that number of characters exceeds a character-count threshold, a second boundary point can be determined from the threshold; the text before the second boundary point is taken as the text to be processed, and the text after the second boundary point is processed further so that more text to be processed can be obtained from it.
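The pause-based way of obtaining text to be processed can be sketched in Python. This is only an illustration under assumptions: the function name `chunk_by_pause`, the input format (recognized tokens with start and end times in seconds), and the parameter `max_gap` (the time threshold) are hypothetical and do not come from the source.

```python
from typing import Iterable, List, Optional, Tuple


def chunk_by_pause(
    tokens: Iterable[Tuple[str, float, float]],
    max_gap: float,
) -> List[str]:
    """Group recognized tokens into texts to be processed.

    Each token is (text, start_time, end_time). A silence longer than
    max_gap between two tokens is treated as a boundary point.
    """
    chunks: List[str] = []
    current: List[str] = []
    prev_end: Optional[float] = None
    for text, start, end in tokens:
        if prev_end is not None and current and start - prev_end > max_gap:
            # Pause exceeds the threshold: close the text before the boundary.
            chunks.append("".join(current))
            current = []
        current.append(text)
        prev_end = end
    if current:
        chunks.append("".join(current))
    return chunks
```

Each returned string would then be handed to the segmentation step as one text to be processed; a character-count threshold could be applied in the same loop.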
In embodiments of the present invention, a sentence is a grammatical unit that is composed of words or phrases according to certain grammatical rules, expresses a relatively complete meaning, and has a definite tone and intonation. Optionally, sentences may include simple sentences and/or complex sentences. A simple sentence is a sentence composed of a phrase or a single word that independently expresses a relatively complete meaning and has a definite tone and intonation, such as "The classmates have returned to school" or "He is in very good health". The relatively independent simple-sentence forms within a complex sentence are called clauses. There is generally a pause between clauses, indicated in writing by a comma or a semicolon. The clauses are related in meaning and are commonly connected by connective words (conjunctions, or adverbs or phrases that serve a connective function), as in "China must become prosperous and strong; this is the hope of more than one billion Chinese people".
Optionally, processing unit 101 can insert the corresponding preset punctuation marks into the text information corresponding to the voice signal of the speaking user, according to the pause intervals in voice signal S and a language model. Optionally, the inserted preset punctuation marks can be used to mark the pauses between the clauses of a sentence, and may include but are not limited to: commas, pause marks, semicolons, and so on.
Processing unit 101 obtains the optimal segmentation result for the text to be processed according to split points derived from the preset punctuation marks the text contains. Specifically, in embodiments of the present invention, each preset punctuation mark in the text to be processed may or may not serve as a split point for sentence segmentation. The text can therefore be segmented according to each combination of preset punctuation marks that do or do not serve as split points, so that the text to be processed corresponds to multiple segmentation schemes and their segmentation results. What embodiments of the present invention ultimately obtain is the segmentation result with the optimal overall translation quality.
In one application example of the present invention, suppose that each of the 2 commas in the text to be processed [A, B, C] may or may not serve as a split point for sentence segmentation, so that the possible segmentation results include {(A, B, C)}, {(A), (B, C)}, {(A), (B), (C)}, and {(A, B), (C)}. Embodiments of the present invention can then obtain the segmentation result with the optimal overall translation quality. Here, [ ] denotes the text to be processed, ( ) denotes a sentence obtained by segmentation, and { } denotes a segmentation result.
Machine translation apparatus 102 can receive the optimal segmentation result for the text to be processed from processing unit 101 and translate it into text in the target language. Machine translation apparatus 102 may use machine translation technology to translate the optimal segmentation result; machine translation uses a computer to convert the target clauses in one natural language (the source language) into text in another natural language (the target language). For example, the source language and the target language may be Chinese and English respectively, or English and Chinese respectively. Embodiments of the present invention do not limit the specific source language, target language, or machine translation technology. Optionally, the type of machine translation apparatus 102 may include a statistical type and/or a neural network type. It will be understood that embodiments of the present invention do not limit the specific type of machine translation apparatus 102.
Translation result output device 103 can receive the text in the target language from machine translation apparatus 102 and output it. The output mode may include a voice mode and/or an on-screen mode, and so on. For example, in a simultaneous interpretation scenario, the text in the target language can be converted into speech in the target language and output. Optionally, a text-to-speech technique (such as speech synthesis) can be used to convert the text in the target language into speech in the target language, which is then played through a playback device such as earphones or a loudspeaker. It will be understood that embodiments of the present invention do not limit the specific process of converting the text in the target language into speech and outputting it. As another example, in an information service scenario (such as a translation website or translation app), the text in the target language obtained by machine translation apparatus 102 can be output directly, for example by showing it on a display device such as a screen for the user to view.
It will be understood that the processing system shown in Fig. 1 is only an example. In practice, processing unit 101 can output the optimal segmentation result for the text to be processed to devices other than machine translation apparatus 102. Embodiments of the present invention do not limit the specific processing system.
Method embodiment
With reference to Fig. 2, a flow chart of a processing method embodiment of the present invention is shown, which may specifically include the following steps:
Step 201: obtain a pending text;
Step 202: according to the cut-points obtained from the preset punctuation marks included in the pending text, obtain the optimal punctuate result corresponding to the pending text; wherein the synthesized translation quality of the optimal punctuate result is optimal, the optimal punctuate result may include at least one sentence, and the synthesized translation quality may be the combination of the translation qualities of all sentences included in a punctuate result;
Step 203: output the optimal punctuate result corresponding to the pending text.
The processing method provided by the embodiments of the present invention can be applied in the application environment of a computing device such as a terminal or a server. Optionally, the terminal may include but is not limited to: a smartphone, tablet computer, laptop computer, in-vehicle computer, desktop computer, smart TV, wearable device, etc. The server may be a cloud server or an ordinary server, used to provide the client with a processing service for the pending text.
The processing method provided by the embodiments of the present invention is applicable to languages such as Chinese, Japanese, and Korean, and is used to improve the translation quality of the punctuate result corresponding to the pending text. It can be understood that any language that requires sentence breaking falls within the scope of application of the processing method of the embodiments of the present invention.
In the embodiments of the present invention, the pending text denotes the text that needs to be processed. It may come from text or speech that a user inputs through the computing device, or from another computing device. It should be noted that the pending text may include one language or more than one language; for example, it may include Chinese, or Chinese mixed with another language such as English. The embodiments of the present invention do not limit the specific pending text.
In practical applications, the computing device of the embodiments of the present invention can execute the processing flow through a client APP (application). The client application may run on the computing device; for example, it may be any APP running on a terminal, in which case it can obtain the pending text from other applications on the computing device. Alternatively, the computing device can execute the processing flow through a functional device of a client application, in which case that functional device can obtain the pending text from other functional devices. Alternatively, the computing device can execute the processing method of the embodiments of the present invention as a server.
In an optional embodiment of the present invention, the method may further include: writing the at least one pending text obtained in step 201 into a buffer. Step 202 can then first read a pending text from the buffer and, according to the cut-points obtained from the preset punctuation marks included in the text read, obtain the corresponding optimal punctuate result. Optionally, a data structure such as a queue, array, or linked list can be established in the memory of the computing device as the buffer; the embodiments of the present invention do not limit the specific buffer. Storing the pending text in a buffer can improve processing efficiency. It can be understood that storing the pending text on disk is also feasible, and the embodiments of the present invention do not limit the specific storage mode of the pending text.
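As a minimal sketch of the buffering idea, assuming a simple first-in, first-out queue held in memory: the class name `PendingTextBuffer` and its `write` and `read` methods are hypothetical and are not named in the source, which only states that a queue, array, or linked list may serve as the buffer.

```python
from collections import deque


class PendingTextBuffer:
    """Hypothetical FIFO buffer holding pending texts between steps 201 and 202."""

    def __init__(self):
        self._queue = deque()

    def write(self, text):
        # Step 201 side: store an acquired pending text.
        self._queue.append(text)

    def read(self):
        # Step 202 side: return the oldest pending text, or None if the buffer is empty.
        return self._queue.popleft() if self._queue else None
```

A deque is used here only because it gives constant-time appends and pops at both ends; an array or linked list would serve equally well for this purpose.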
In the embodiments of the present invention, each preset punctuation mark included in the pending text may or may not serve as a cut-point for sentence breaking. That is, the pending text can be broken into sentences according to whether each preset punctuation mark it includes is or is not used as a cut-point. In this way, one pending text corresponds to multiple punctuate schemes and their corresponding punctuate results, and what the embodiments of the present invention finally obtain is the punctuate result with the optimal synthesized translation quality.
The embodiments of the present invention can provide the following optimal-result acquisition schemes for obtaining the optimal punctuate result corresponding to the pending text according to the cut-points obtained from the preset punctuation marks included in the pending text:
Optimal-result acquisition scheme 1
Optimal-result acquisition scheme 1 may include: performing punctuate processing on the pending text according to the cut-points obtained from the preset punctuation marks included in the pending text, to obtain multiple punctuate results corresponding to the pending text; determining the synthesized translation quality corresponding to each punctuate result; and selecting, from the multiple punctuate results, the one with the optimal synthesized translation quality as the optimal punctuate result corresponding to the pending text.
In practical applications, a path planning algorithm may be used to perform punctuate processing on the pending text according to the cut-points obtained from the preset punctuation marks it includes, so as to obtain multiple paths corresponding to the pending text and the punctuate result corresponding to each path. The principle of a path planning algorithm is to find, in an environment with obstacles and according to some evaluation criterion, a collision-free path from an initial state to a target state. In the embodiments of the present invention, the obstacles can represent the cut-points of the pending text, and the initial state and target state represent the first and last subordinate sentences of the pending text, respectively.
With reference to Fig. 3, a schematic diagram of path planning for a pending text of an embodiment of the present invention is shown, where the pending text is [A, B, C]. Assume that each of the 2 commas in the pending text [A, B, C] may or may not serve as a cut-point for punctuate processing. In Fig. 3, subordinate sentences A, B, and C are represented by rectangles and the commas by circles; when a comma is used as a cut-point, a hexagon is drawn around the corresponding circle. The punctuate results of [A, B, C] may then include: {(A, B, C)} for 0 cut-points; {(A), (B, C)} when the 1st comma is a cut-point; {(A), (B), (C)} when both the 1st and the 2nd commas are cut-points; and {(A, B), (C)} when the 2nd comma is a cut-point.
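The enumeration of punctuate results can be sketched as follows, assuming each preset punctuation mark between adjacent subordinate sentences is independently either a cut-point or not. This is an illustrative exhaustive enumeration, not the path planning algorithm of Fig. 3; the function name `enumerate_punctuate_results` is hypothetical.

```python
from itertools import product


def enumerate_punctuate_results(clauses):
    """Yield every grouping of consecutive subordinate sentences into sentences."""
    if not clauses:
        return
    # One boolean per punctuation mark between adjacent subordinate sentences:
    # True means the mark is used as a cut-point.
    for cuts in product([False, True], repeat=len(clauses) - 1):
        sentences = []
        current = [clauses[0]]
        for clause, is_cut in zip(clauses[1:], cuts):
            if is_cut:
                sentences.append(tuple(current))
                current = [clause]
            else:
                current.append(clause)
        sentences.append(tuple(current))
        yield sentences
```

For the three subordinate sentences A, B, C this yields the four punctuate results listed above, one for each combination of the two commas.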
It can be understood that the path planning algorithm is merely an optional embodiment. In practice, those skilled in the art can use other algorithms to obtain the multiple punctuate results corresponding to the pending text according to practical application requirements; the embodiments of the present invention do not limit the specific algorithm used to obtain them.
In an optional embodiment of the present invention, determining the synthesized translation quality corresponding to a punctuate result may include: for each sentence included in each punctuate result, determining the corresponding translation quality score; and merging the translation quality scores of all sentences included in each punctuate result, to obtain the corresponding synthesized translation quality score. The punctuate result with the highest synthesized translation quality score can then be selected from all punctuate results as the optimal punctuate result corresponding to the pending text.
Optionally, determining the translation quality score for each sentence included in each punctuate result may include: using a machine translation evaluation method to determine the translation quality score of the sentence. The machine translation evaluation method may include an automatic evaluation method and/or a manual evaluation method. An automatic evaluation method can obtain an evaluation set in advance (including source-language input sentences and reference translations) and then compute the translation quality score of a sentence from the N-grams that the sentence's machine translation result shares with the reference translation (for example, "having deep love for home" is given as a bigram and "liking eating apple" as a trigram). It can be understood that any machine translation evaluation method is feasible; the embodiments of the present invention do not limit the specific process of determining the translation quality score for each sentence included in each punctuate result.
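One way to picture N-gram overlap with a reference translation is a clipped N-gram precision, sketched below. This is a simplified illustration in the spirit of common automatic metrics, not the specific evaluation method of the source; the function names `ngrams` and `ngram_precision` and the whitespace tokenization in the usage note are assumptions.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all runs of n consecutive tokens."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_precision(candidate, reference, n):
    """Fraction of candidate n-grams that also occur in the reference.

    Each n-gram is credited at most as many times as it appears in the reference.
    """
    cand_counts = Counter(ngrams(candidate, n))
    ref_counts = Counter(ngrams(reference, n))
    total = sum(cand_counts.values())
    if total == 0:
        return 0.0
    overlap = sum(min(count, ref_counts[gram]) for gram, count in cand_counts.items())
    return overlap / total
```

For instance, with the candidate "the cat sat on the mat" and the reference "the cat is on the mat", both split on spaces, three of the five candidate bigrams appear in the reference, so the bigram precision is 0.6.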
Optionally, merging the translation quality scores of all sentences included in each punctuate result may include: summing, multiplying, or taking a weighted average of the translation quality scores of all sentences included in each punctuate result, etc. It can be understood that the embodiments of the present invention do not limit the specific process of merging the translation quality scores of all sentences included in each punctuate result.
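The merging options and the final selection in scheme 1 can be sketched as follows. The function names `merge_scores` and `select_optimal`, the `mode` strings, and the `sentence_score` callback are hypothetical; the source names only the three merging operations and the choice of the highest-scoring punctuate result.

```python
import math


def merge_scores(scores, mode="sum", weights=None):
    """Merge per-sentence translation quality scores into one synthesized score."""
    if mode == "sum":
        return sum(scores)
    if mode == "product":
        return math.prod(scores)
    if mode == "weighted_average":
        if weights is None:
            weights = [1.0] * len(scores)  # equal weights give the plain mean
        return sum(w * s for w, s in zip(weights, scores)) / sum(weights)
    raise ValueError(f"unknown merge mode: {mode}")


def select_optimal(punctuate_results, sentence_score, mode="sum"):
    """Return the punctuate result whose merged score is highest."""
    return max(
        punctuate_results,
        key=lambda result: merge_scores([sentence_score(s) for s in result], mode),
    )
```

Which merging mode is appropriate depends on the scale of the per-sentence scores; for example, summation suits log-probability-like scores, while a product suits scores in the range 0 to 1.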
Optimal-result acquisition scheme 2
Optimal-result acquisition scheme 2 may include: using a dynamic programming algorithm to obtain, according to the cut-points obtained from the preset punctuation marks included in the pending text, the optimal punctuate result corresponding to the pending text.
The principle of the dynamic programming algorithm is to split the problem and define the states and the relations between states, so that the problem can be solved by recursion (or divide and conquer). In the embodiments of the present invention, the problem is the optimal synthesized translation quality of the punctuate result of the pending text, and a state is the optimal synthesized translation quality of the punctuate result of each subset of the subordinate sentence sequence set of the pending text. Whereas optimal-result acquisition scheme 1 exhaustively enumerates the multiple punctuate results of the pending text and determines the synthesized translation quality of each, the dynamic programming algorithm used by optimal-result acquisition scheme 2 can reduce the amount of computation, and the reduction grows as the number of preset punctuation marks included in the pending text increases.
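To make the difference in computation concrete, the sketch below counts the candidate punctuate results that exhaustive enumeration must compare against the number of (k, i) pairs examined by a recursion that, for each prefix of i subordinate sentences, tries every cut-point k with 0 ≤ k < i. Both function names are hypothetical, and the counts ignore the cost of scoring each sentence.

```python
def count_enumerated_results(num_marks):
    """Each preset punctuation mark is either a cut-point or not."""
    return 2 ** num_marks


def count_dp_transitions(num_marks):
    """Number of (k, i) pairs with 0 <= k < i <= M, where M = num_marks + 1."""
    m = num_marks + 1  # number of subordinate sentences
    return m * (m + 1) // 2
```

With 2 punctuation marks the two counts are 4 and 6, so the recursion offers no saving on very short texts; with 20 marks they are 1,048,576 and 231, since the first count grows exponentially and the second only quadratically.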
Optionally, using the dynamic programming algorithm to obtain the optimal punctuate result corresponding to the pending text according to the cut-points obtained from the preset punctuation marks included in the pending text may specifically include: determining the subordinate sentence sequence set corresponding to the pending text according to the preset punctuation marks it includes; determining, recursively and in order of the subsets of the subordinate sentence sequence set from small to large, the backtracking cut-point of the optimal subset punctuate result for each subset; and obtaining the optimal punctuate result corresponding to the pending text according to the backtracking cut-points of the optimal subset punctuate results of the subsets of the subordinate sentence sequence set.
The subordinate sentence sequence set denotes the set of sequences formed by consecutive subordinate sentences included in the pending text. Optionally, each subordinate sentence sequence in the set can consist of the first i consecutive subordinate sentences of the pending text. For example, the subordinate sentence sequence set corresponding to the pending text [C1C2…CM] may include {C1, C1C2, C1C2C3, …, C1C2…CM}. Ordered by sequence length (that is, the number of subordinate sentences in the sequence) from small to large, the subsets of this set can be expressed as {C1}, {C1C2}, {C1C2C3}, …, {C1C2…CM}, where adjacent subordinate sentences in a subset's sequence can be connected by a preset punctuation mark. Optionally, a subset in the embodiments of the present invention may include one subordinate sentence sequence. Here Ci denotes the i-th subordinate sentence of the pending text, i is a positive integer not exceeding M, M denotes the number of subordinate sentences of the pending text, and M is a positive integer.
For each subset of the subordinate sentence sequence set, the corresponding subset punctuate results likewise have synthesized translation qualities, so the embodiments of the present invention can determine the backtracking cut-point of the optimal subset punctuate result for each subset. The backtracking cut-point of the optimal subset punctuate result indicates at which preset punctuation mark the subset is split when its subset punctuate result is optimal. Suppose the optimal subset punctuate result of the subset {C1C2C3} is {(C1), (C2C3)}; the subset {C1C2C3} is then split after "C1", and the corresponding backtracking cut-point can be expressed as 1, the number of "C1". It can be understood that the embodiments of the present invention do not limit the specific representation of the backtracking cut-point.
The embodiments of the present invention can determine the backtracking cut-point of the optimal subset punctuate result for each subset recursively, in order of the subsets of the subordinate sentence sequence set from small to large. Suppose the subsets, ordered from small to large, are denoted G1, G2, G3, …, Gu, where u is a positive integer. The backtracking cut-points of the optimal subset punctuate results of G1, G2, G3, …, Gu can then be obtained in turn; and for Go (1 ≤ o ≤ u), the optimal subset punctuate results of the subsets before Go (such as Go-1, Go-2, etc.) are needed to determine the backtracking cut-point of the optimal subset punctuate result of Go.
In an optional embodiment of the present invention, the subsets of the subordinate sentence sequence set may include the first i subordinate sentences of the pending text, and the optimal subset synthesized translation quality score of the first i subordinate sentences is denoted F(i), where 0 ≤ i ≤ M and M is the number of subordinate sentences of the pending text. Determining, recursively and in order of the subsets from small to large, the backtracking cut-point of the optimal subset punctuate result for each subset may then specifically include:
Breaking the first i subordinate sentences at cut-point k, to obtain the optimal subset synthesized translation quality score F(k) of the first semantic unit and the translation quality score of the second semantic unit corresponding to the first i subordinate sentences and the cut-point k; wherein the first semantic unit may include the subordinate sentences of the first i subordinate sentences located before cut-point k, the second semantic unit may include the subordinate sentences of the first i subordinate sentences located after cut-point k, and 0 ≤ k < i;
Combining F(k) with the translation quality score of the second semantic unit, to obtain the synthesized translation quality score corresponding to the first i subordinate sentences and cut-point k;
According to the synthesized translation quality scores corresponding to the first i subordinate sentences and each cut-point k, obtaining, from the at least one cut-point k of the first i subordinate sentences, the target cut-point k′ that corresponds to the optimal synthesized translation quality score. In practical applications, there may be one or more cut-points k and one or more target cut-points k′, but the set of target cut-points k′ is contained in the set of cut-points k. Suppose the set of cut-points k is {0, 1, 2, …, i−1}; the set of target cut-points k′ can then be a subset of it, for example {0, 1}.
Taking the target cut-point k′ as the backtracking cut-point of the optimal subset punctuate result of the first i subordinate sentences, and taking the synthesized translation quality score corresponding to the target cut-point k′ as the optimal subset synthesized translation quality score F(i) of the first i subordinate sentences.
In the embodiments of the present invention, a semantic unit denotes a unit that expresses one meaning. The first semantic unit and the second semantic unit denote the two semantic units obtained by breaking the first i subordinate sentences at cut-point k: the first semantic unit consists of the subordinate sentences of the first i subordinate sentences located before cut-point k, and the second semantic unit consists of those located after cut-point k. It can be understood that the embodiments of the present invention do not limit the number of subordinate sentences included in the first and second semantic units; for example, each may include one or more subordinate sentences.
F(k) denotes the optimal synthesized translation quality score of the first k subordinate sentences. In practical applications, an initial value can be preset for F(k); for example, the initial value of F[0] for k = 0 is 0, and the initial value of F[k] for k greater than 0 is -INF (negative infinity). It can be understood that the embodiments of the present invention do not limit the initial values of F(k). It can be seen that the value of F(0) is obtained by presetting; when k is greater than 0, the initial value of F(k) is obtained by presetting and its final value is obtained by iteration, for example through formula (1) below.
Suppose the optimal subset synthesized translation quality score of the first semantic unit is F(k) and the translation quality score of the second semantic unit is NMT_score(k, i). Combining F(k) with the translation quality score of the second semantic unit may then include: summing, multiplying, or taking a weighted average of F(k) and NMT_score(k, i), etc. It can be understood that the embodiments of the present invention do not limit the specific process of combining F(k) with the translation quality score of the second semantic unit.
In practical applications, for the first i subordinate sentences, the cut-point k can be located at any position among them; for example, with 0 ≤ k < i, the cut-point numbers k for the subset {C1C2C3} can be 0, 1, and 2. Correspondingly, according to the synthesized translation quality score F(i, k) corresponding to the first i subordinate sentences and cut-point k, the target cut-point corresponding to the optimal synthesized translation quality score can be obtained from the at least one cut-point k of the first i subordinate sentences.
In the embodiments of the present invention, the optimal synthesized translation quality score can be judged by the magnitude of the synthesized translation quality score. Suppose F(i, k) = F[k] + NMT_score(k, i); the optimal synthesized translation quality score of the first i subordinate sentences and the target cut-point corresponding to it can then be expressed as follows, with k ranging over 0 ≤ k < i:
F[i] = max(F[k] + NMT_score(k, i))    (1)
Index[i] = argmax(F[k] + NMT_score(k, i))    (2)
Index[i] denotes the value of k that maximizes (F[k] + NMT_score(k, i)). In practical applications, the optimal subset synthesized translation quality score F(i) of the first i subordinate sentences and the corresponding backtracking cut-point can be solved recursively, in order of i from small to large.
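Formulas (1) and (2) can be sketched as a forward recursion over i, assuming summation is the chosen way of combining F[k] with NMT_score(k, i). The function name `forward_recursion` and the `nmt_score` callback signature are hypothetical; `nmt_score(k, i)` is assumed to return the translation quality score of the sentence formed by subordinate sentences k+1 through i.

```python
import math


def forward_recursion(num_clauses, nmt_score):
    """Compute F[i] and Index[i] for i = 0..num_clauses using formulas (1) and (2)."""
    # F[0] is preset to 0; every other entry starts at negative infinity.
    F = [0.0] + [-math.inf] * num_clauses
    index = [0] * (num_clauses + 1)
    for i in range(1, num_clauses + 1):
        for k in range(i):  # candidate cut-points, 0 <= k < i
            candidate = F[k] + nmt_score(k, i)
            if candidate > F[i]:
                F[i] = candidate
                index[i] = k
    return F, index
```

When several cut-points tie for the maximum, this sketch keeps the smallest k; the source allows more than one target cut-point and does not prescribe a tie-breaking rule.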
Optionally, the method of the embodiments of the present invention may further include: recording the backtracking cut-point of the optimal subset punctuate result of each subset of the subordinate sentence sequence set; or recording the mapping between the information of each subset of the subordinate sentence sequence set and the backtracking cut-point of its optimal subset punctuate result, to obtain corresponding record content. The information of a subset may include the number of the last subordinate sentence of the subset and/or the number of the subset, etc. For example, for the first i subordinate sentences, the number can be i, and the last subordinate sentence is the i-th subordinate sentence. It can be understood that the embodiments of the present invention do not limit the specific information of a subset.
In an optional embodiment of the present invention, obtaining the optimal punctuate result corresponding to the pending text according to the backtracking cut-points of the optimal subset punctuate results of the subsets of the subordinate sentence sequence set may specifically include:
Backtracking through the backtracking cut-points of the optimal subset punctuate results of the subsets of the subordinate sentence sequence set, to obtain the backtracking cut-points of the optimal subset punctuate result of the largest subset of the subordinate sentence sequence set;
Breaking the pending text according to the backtracking cut-points of the optimal subset punctuate result of the largest subset of the subordinate sentence sequence set, to obtain the optimal punctuate result corresponding to the pending text.
Optionally, backtracking through the backtracking cut-points of the optimal subset punctuate results of the subsets of the subordinate sentence sequence set may specifically include:
Obtaining the first backtracking cut-point P1 corresponding to the first i subordinate sentences;
Obtaining the second backtracking cut-point P2 corresponding to the subordinate sentences of the pending text located before the first backtracking cut-point P1.
In practical applications, backtracking can proceed in order of i from large to small. Taking the backtracking cut-points of the first M subordinate sentences as an example, the first backtracking cut-point P1 of the first M subordinate sentences can be determined first, for example by looking it up in the aforementioned record content; P1 yields the optimal subset punctuate result of the first M subordinate sentences. Then the second backtracking cut-point P2 of the first P1 subordinate sentences can be obtained, for example by looking it up in the aforementioned record content; P2 yields the optimal subset punctuate result of the first P1 subordinate sentences. If the backtracking cut-point just obtained (P1 or P2) equals 0, backtracking can end; otherwise, backtracking can continue.
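The backtracking procedure can be sketched as a loop that follows the recorded cut-points from i = M down to 0. The function name `backtrack_sentences` is hypothetical, and the record content is assumed to be a list in which entry i holds the backtracking cut-point Index[i] of the first i subordinate sentences.

```python
def backtrack_sentences(clauses, index):
    """Follow Index[] from i = M down to 0 and return the sentences in text order."""
    sentences = []
    i = len(clauses)
    while i > 0:
        k = index[i]  # backtracking cut-point of the first i subordinate sentences
        if not 0 <= k < i:
            raise ValueError(f"invalid backtracking cut-point {k} for i = {i}")
        sentences.append(clauses[k:i])  # subordinate sentences k+1 .. i form one sentence
        i = k  # continue with the first k subordinate sentences
    sentences.reverse()  # sentences were collected from last to first
    return sentences
```

The loop stops once a backtracking cut-point of 0 is reached, which matches the termination condition described above.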
To help those skilled in the art better understand the segmentation process of the embodiments of the present invention, the process is illustrated here with an example that processes the pending text [A, B, C]; the corresponding process may specifically include the following steps:
Step S1: obtain the subordinate sentence sequence set {[A], [A, B], [A, B, C]} corresponding to the pending text [A, B, C];
Suppose S(i, j) denotes the subordinate sentence sequence from the i-th preset punctuation mark to the j-th preset punctuation mark (counting the start of the text as position 0); then S(0, 1) = A, S(1, 2) = B, S(2, 3) = C, S(0, 2) = A, B, S(1, 3) = B, C, and S(0, 3) = A, B, C.
It is further assumed that the translation quality scores of the sentences corresponding to S(i, j) are:
NMT_score(0, 1) = -10
NMT_score(1, 2) = -15
NMT_score(2, 3) = -20
NMT_score(0, 2) = -2
NMT_score(1, 3) = -5
NMT_score(0, 3) = -30
Step S2: let F(i) denote the optimal subset synthesized translation quality score of the first i subordinate sentences, with the initial value of F[0] equal to 0 and the initial value of F[i] for i greater than 0 equal to -INF (negative infinity);
Step S3: when i = 0, the optimal subset synthesized translation quality score of the first 0 consecutive subordinate sentences is F(0) = 0;
Step S4: when i = 1, the corresponding cut-point is k = 0, so
F[1] = max(F[0] + NMT_score(0, 1)) = -10
Index[1] = 0;
Step S5: when i = 2, the corresponding cut-points are k = 0, 1, so
F[2] = max(F[0] + NMT_score(0, 2), F[1] + NMT_score(1, 2)) = max(-2, -25) = F[0] + NMT_score(0, 2) = -2
Index[2] = 0;
Step S6: when i = 3, the corresponding cut-points are k = 0, 1, 2, so
F[3] = max(F[0] + NMT_score(0, 3), F[1] + NMT_score(1, 3), F[2] + NMT_score(2, 3)) = max(-30, -15, -22) = F[1] + NMT_score(1, 3) = -15
Index[3] = 1;
Step S7: backtrack from the backtracking cut-point corresponding to F(3);
First, the backtracking cut-point P1 = 1 corresponding to F(3) is obtained, and then the backtracking cut-point P2 = 0 corresponding to F(1) is obtained. That is, the pending text [A, B, C] can be broken into 2 sentences whose backtracking cut-points are P = 0 and P = 1; the 2 sentences obtained begin after the 0th subordinate sentence and after the 1st subordinate sentence, respectively, so the corresponding optimal punctuate result is "A" and "B, C".
It can be understood that the pending text [A, B, C] is merely an example; those skilled in the art can process any pending text according to practical application requirements to obtain the corresponding optimal punctuate result. For example, for the pending text [A, B, C, D, E, F] "Saunders said that Donald Trump promised during the general election that after taking office he would not abolish the social security system, the medical insurance system for the elderly, or Medicaid, but the people he is now appointing are precisely those who advocate abolishing these systems", the corresponding punctuate result may include "A, B, C, D" and "E, F".
To sum up, the processing method of the embodiments of the present invention obtains the optimal punctuate result corresponding to the pending text according to the cut-points obtained from the preset punctuation marks included in the pending text. The synthesized translation quality of this optimal punctuate result is optimal; the optimal punctuate result may include at least one sentence, and the synthesized translation quality may be the combination of the translation qualities of all sentences included in a punctuate result. The optimal punctuate result of the embodiments of the present invention can therefore achieve a global optimum of the synthesized translation quality, and can thus improve the translation quality of the punctuate result corresponding to the pending text.
It should be noted that, for simplicity of description, the method embodiments are all expressed as a series of action combinations, but those skilled in the art should understand that the embodiments of the present invention are not limited by the described order of actions, because according to the embodiments of the present invention some steps can be performed in other orders or simultaneously. Furthermore, those skilled in the art should also know that the embodiments described in this specification are preferred embodiments, and the actions involved are not necessarily required by the embodiments of the present invention.
Device embodiment
With reference to Fig. 4, a structural block diagram of a processing device embodiment of the present invention is shown, which may specifically include:
A pending text acquisition module 401, for obtaining a pending text;
An optimal punctuate result acquisition module 402, for obtaining the optimal punctuate result corresponding to the pending text according to the cut-points obtained from the preset punctuation marks included in the pending text; wherein the synthesized translation quality of the optimal punctuate result is optimal, the optimal punctuate result may include at least one sentence, and the synthesized translation quality is the combination of the translation qualities of all sentences included in a punctuate result; and
An optimal punctuate result output module 403, for outputting the optimal punctuate result corresponding to the pending text.
Optionally, the optimal sentence-segmentation result acquisition module 402 may include:
A dynamic programming acquisition submodule, configured to use a dynamic programming algorithm to obtain the optimal sentence-segmentation result corresponding to the text to be processed according to segmentation points derived from the preset punctuation marks included in the text to be processed.
Optionally, the dynamic programming acquisition submodule may include:
A clause sequence set determination unit, configured to determine the clause sequence set corresponding to the text to be processed according to the preset punctuation marks included in the text to be processed;
A recursion unit, configured to determine, by recursion and in order of the subsets of the clause sequence set from smallest to largest, the backtracking segmentation point of the optimal subset sentence-segmentation result corresponding to each subset; and
An optimal sentence-segmentation result acquisition unit, configured to obtain the optimal sentence-segmentation result corresponding to the text to be processed according to the backtracking segmentation points of the optimal subset sentence-segmentation results corresponding to the subsets of the clause sequence set.
Optionally, the subsets of the clause sequence set may include the first i clauses of the text to be processed, where the optimal subset overall translation quality score corresponding to the first i clauses is denoted F(i), 0 ≤ i ≤ M, and M is the number of clauses in the text to be processed; the recursion unit may then include:
A subset segmentation subunit, configured to segment the first i clauses at a segmentation point k, so as to obtain, for the first i clauses and the segmentation point k, the optimal subset overall translation quality score F(k) of a first semantic unit and the translation quality score of a second semantic unit; wherein the first semantic unit may include those of the first i clauses that are located before the segmentation point k, the second semantic unit may include those of the first i clauses that are located after the segmentation point k, and 0 ≤ k < i;
A quality synthesis subunit, configured to synthesize F(k) and the translation quality score of the second semantic unit, so as to obtain the overall translation quality score corresponding to the first i clauses and the segmentation point k;
A target segmentation point acquisition subunit, configured to obtain, according to the overall translation quality scores corresponding to the first i clauses and the segmentation points k, the target segmentation point corresponding to the optimal overall translation quality score from among the at least one segmentation point k corresponding to the first i clauses;
A backtracking segmentation point acquisition subunit, configured to take the target segmentation point as the backtracking segmentation point of the optimal subset sentence-segmentation result corresponding to the first i clauses, and to take the overall translation quality score corresponding to the target segmentation point as the optimal subset overall translation quality score F(i) corresponding to the first i clauses.
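The recursion described above can be sketched as follows. This is an illustrative sketch rather than the patented implementation: the function names, the convention F(0) = 0, the use of addition as the "synthesis", and the `toy_score` stand-in for a real translation quality scorer are all assumptions made here.

```python
def best_prefix_scores(clauses, score):
    """Return (F, back) for a list of clauses.

    F[i]    : best overall score for the first i clauses (F[0] = 0 by assumption).
    back[i] : the segmentation point k (0 <= k < i) that achieves F[i].
    score   : hypothetical callable rating the translation quality of one sentence.
    """
    m = len(clauses)
    F = [0.0] * (m + 1)
    back = [0] * (m + 1)
    for i in range(1, m + 1):              # subsets from smallest to largest
        best_k, best_val = 0, None
        for k in range(i):                 # candidate segmentation points, 0 <= k < i
            second_unit = " ".join(clauses[k:i])   # clauses after point k
            val = F[k] + score(second_unit)        # assumed synthesis: a sum
            if best_val is None or val > best_val:
                best_k, best_val = k, val
        F[i], back[i] = best_val, best_k
    return F, back


def toy_score(sentence):
    """Toy stand-in scorer: prefers sentences of exactly four words."""
    return -(len(sentence.split()) - 4) ** 2


clauses = ["a b", "c d", "e f", "g h"]
F, back = best_prefix_scores(clauses, toy_score)
print(F[len(clauses)], back)
```

With this toy scorer the best split groups the clauses two by two, so `back[4]` points at 2 and `back[2]` at 0. The double loop evaluates each (k, i) pair once, instead of enumerating every possible segmentation.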
Optionally, the optimal sentence-segmentation result acquisition unit may include:
A backtracking subunit, configured to backtrack through the backtracking segmentation points of the optimal subset sentence-segmentation results corresponding to the subsets of the clause sequence set, so as to obtain the backtracking segmentation points of the optimal subset sentence-segmentation result corresponding to the largest subset of the clause sequence set;
A backtracking segmentation subunit, configured to segment the text to be processed according to the backtracking segmentation points of the optimal subset sentence-segmentation result corresponding to the largest subset of the clause sequence set, so as to obtain the optimal sentence-segmentation result corresponding to the text to be processed.
Optionally, the backtracking subunit may include:
A first backtracking unit, configured to obtain a first backtracking segmentation point P1 corresponding to the first i clauses;
A second backtracking unit, configured to obtain a second backtracking segmentation point P2 corresponding to the clauses of the text to be processed that are located before the first backtracking segmentation point P1.
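The backtracking step can be illustrated with a short sketch. The function name and the layout of the `back` table (entry i holds the backtracking segmentation point for the first i clauses, with `back[i] < i`) are assumptions made for illustration; the table below is hand-written example data, not output from the original method.

```python
def recover_sentences(clauses, back):
    """Rebuild sentences by following backtracking points from the largest subset.

    back[i] is assumed to hold the backtracking segmentation point for the
    first i clauses and must satisfy back[i] < i so that the loop terminates.
    """
    sentences = []
    i = len(clauses)
    while i > 0:
        k = back[i]                      # P1 on the first pass, P2 on the next, ...
        sentences.append(" ".join(clauses[k:i]))
        i = k                            # continue with the clauses before that point
    sentences.reverse()                  # points were visited from last to first
    return sentences


clauses = ["a b", "c d", "e f", "g h"]
back = [0, 0, 0, 0, 2]                   # hypothetical table: P1 = 2, then P2 = 0
print(recover_sentences(clauses, back))
```

Here the first lookup yields P1 = 2 (the last sentence covers clauses 3 and 4), and the second lookup, on the clauses before P1, yields P2 = 0.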
Optionally, the optimal sentence-segmentation result acquisition module 402 may include:
An exhaustive enumeration submodule, configured to segment the text to be processed according to segmentation points derived from the preset punctuation marks included in the text to be processed, so as to obtain multiple sentence-segmentation results corresponding to the text to be processed;
An overall quality determination submodule, configured to determine the overall translation quality corresponding to each sentence-segmentation result;
A result selection submodule, configured to select, from the multiple sentence-segmentation results corresponding to the text to be processed, the sentence-segmentation result with the optimal overall translation quality as the optimal sentence-segmentation result corresponding to the text to be processed.
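The exhaustive alternative can be sketched as below. As before, this is an assumption-laden illustration: the function names, the summation used as the "synthesis", and the `toy_score` stand-in for a translation quality scorer are not taken from the original text.

```python
from itertools import combinations


def exhaustive_best(clauses, score):
    """Enumerate every way of grouping consecutive clauses into sentences.

    Returns (best_sentences, best_value); (None, None) for an empty input.
    score is a hypothetical callable rating one sentence.
    """
    m = len(clauses)
    best, best_val = None, None
    for r in range(m):                                # number of boundaries kept
        for cuts in combinations(range(1, m), r):     # which internal boundaries
            bounds = [0, *cuts, m]
            sentences = [" ".join(clauses[a:b])
                         for a, b in zip(bounds, bounds[1:])]
            val = sum(score(s) for s in sentences)    # assumed synthesis: a sum
            if best_val is None or val > best_val:
                best, best_val = sentences, val
    return best, best_val


def toy_score(sentence):
    """Toy stand-in scorer: prefers sentences of exactly four words."""
    return -(len(sentence.split()) - 4) ** 2


result, value = exhaustive_best(["a b", "c d", "e f", "g h"], toy_score)
print(result, value)
```

For M clauses this examines 2^(M-1) candidate results, which is practical only for short texts.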
Optionally, the preset punctuation marks may include a comma and/or a semicolon.
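As a minimal sketch of how clauses might be derived from preset punctuation marks, the snippet below splits a text after each comma or semicolon. The function name, the constant, and the decision to cover both ASCII and full-width marks are assumptions made for illustration.

```python
import re

# Assumed preset punctuation: ASCII and full-width comma and semicolon.
PRESET_PUNCTUATION = ",;，；"


def split_into_clauses(text, marks=PRESET_PUNCTUATION):
    """Split text into clauses, keeping each preset mark with the clause it ends."""
    cls = re.escape(marks)
    # One clause = a run of non-mark characters, optionally followed by one mark.
    pattern = "[^" + cls + "]+[" + cls + "]?"
    return [m.strip() for m in re.findall(pattern, text) if m.strip()]


print(split_into_clauses("Hello, world; this is a test."))
```

The boundaries between consecutive clauses are then the candidate segmentation points.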
Since the apparatus embodiment is substantially similar to the method embodiment, its description is relatively brief; for relevant details, refer to the corresponding description of the method embodiment.
The embodiments in this specification are described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and the same or similar parts of the embodiments may be referred to one another.
With regard to the apparatus in the above embodiment, the specific manner in which each module performs its operations has been described in detail in the embodiments of the method, and will not be elaborated here.
Fig. 5 is a block diagram of a device for processing serving as a terminal, according to an exemplary embodiment. For example, the terminal 900 may be a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, fitness equipment, a personal digital assistant, or the like.
Referring to Fig. 5, the terminal 900 may include one or more of the following components: a processing component 902, a memory 904, a power component 906, a multimedia component 908, an audio component 910, an input/output (I/O) interface 912, a sensor component 914, and a communication component 916.
The processing component 902 generally controls the overall operation of the terminal 900, such as operations associated with display, telephone calls, data communication, camera operation, and recording. The processing component 902 may include one or more processors 920 to execute instructions, so as to perform all or part of the steps of the method described above. In addition, the processing component 902 may include one or more modules that facilitate interaction between the processing component 902 and other components. For example, the processing component 902 may include a multimedia module to facilitate interaction between the multimedia component 908 and the processing component 902.
The memory 904 is configured to store various types of data to support operation of the terminal 900. Examples of such data include instructions for any application or method operated on the terminal 900, contact data, phone book data, messages, pictures, videos, and so on. The memory 904 may be implemented by any type of volatile or non-volatile storage device, or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, a magnetic disk, or an optical disc.
The power component 906 supplies power to the various components of the terminal 900. The power component 906 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the terminal 900.
The multimedia component 908 includes a screen that provides an output interface between the terminal 900 and the user. In some embodiments, the screen may include a liquid crystal display (LCD) and a touch panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive input signals from the user. The touch panel includes one or more touch sensors to sense touches, slides, and gestures on the touch panel. The touch sensors may sense not only the boundary of a touch or slide action, but also the duration and pressure associated with the touch or slide operation. In some embodiments, the multimedia component 908 includes a front camera and/or a rear camera. When the terminal 900 is in an operating mode, such as a shooting mode or a video mode, the front camera and/or the rear camera can receive external multimedia data. Each front camera and rear camera may be a fixed optical lens system or may have focusing and optical zoom capability.
The audio component 910 is configured to output and/or input audio signals. For example, the audio component 910 includes a microphone (MIC), which is configured to receive external audio signals when the terminal 900 is in an operating mode, such as a call mode, a recording mode, or a speech recognition mode. The received audio signal may be further stored in the memory 904 or sent via the communication component 916. In some embodiments, the audio component 910 further includes a speaker for outputting audio signals.
The I/O interface 912 provides an interface between the processing component 902 and peripheral interface modules, which may be a keyboard, a click wheel, buttons, and the like. These buttons may include, but are not limited to, a home button, volume buttons, a start button, and a lock button.
The sensor component 914 includes one or more sensors for providing status assessments of various aspects of the terminal 900. For example, the sensor component 914 can detect the open/closed state of the terminal 900 and the relative positioning of components, such as the display and keypad of the terminal 900; the sensor component 914 can also detect a change in position of the terminal 900 or of a component of the terminal 900, the presence or absence of user contact with the terminal 900, the orientation or acceleration/deceleration of the terminal 900, and a change in temperature of the terminal 900. The sensor component 914 may include a proximity sensor configured to detect the presence of nearby objects without any physical contact. The sensor component 914 may also include an optical sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor component 914 may further include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 916 is configured to facilitate wired or wireless communication between the terminal 900 and other devices. The terminal 900 can access a wireless network based on a communication standard, such as WiFi, 2G, or 3G, or a combination thereof. In one exemplary embodiment, the communication component 916 receives a broadcast signal or broadcast-related information from an external broadcast management system via a broadcast channel. In one exemplary embodiment, the communication component 916 further includes a near-field communication (NFC) module to facilitate short-range communication. For example, the NFC module may be implemented based on radio frequency identification (RFID) technology, Infrared Data Association (IrDA) technology, ultra-wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the terminal 900 may be implemented by one or more application-specific integrated circuits (ASIC), digital signal processors (DSP), digital signal processing devices (DSPD), programmable logic devices (PLD), field-programmable gate arrays (FPGA), controllers, microcontrollers, microprocessors, or other electronic components, for performing the above method.
In an exemplary embodiment, a non-transitory computer-readable storage medium including instructions is also provided, for example the memory 904 including instructions, where the instructions can be executed by the processor 920 of the terminal 900 to complete the above method. For example, the non-transitory computer-readable storage medium may be a ROM, a random access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, or the like.
Fig. 6 is a block diagram of a device for processing serving as a server, according to an exemplary embodiment. The server 1900 may vary considerably depending on its configuration or performance, and may include one or more central processing units (CPU) 1922 (for example, one or more processors), a memory 1932, and one or more storage media 1930 (for example, one or more mass storage devices) storing application programs 1942 or data 1944. The memory 1932 and the storage medium 1930 may provide transient or persistent storage. The program stored in the storage medium 1930 may include one or more modules (not marked in the figure), and each module may include a series of instruction operations for the server. Further, the central processing unit 1922 may be configured to communicate with the storage medium 1930 and to execute, on the server 1900, the series of instruction operations in the storage medium 1930.
The server 1900 may also include one or more power supplies 1926, one or more wired or wireless network interfaces 1950, one or more input/output interfaces 1958, one or more keyboards 1956, and/or one or more operating systems 1941, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, and the like.
In an exemplary embodiment, a non-transitory computer-readable storage medium including instructions is also provided, for example the memory 1932 including instructions, where the instructions can be executed by the processor 1922 of the server 1900 to complete the above method. For example, the non-transitory computer-readable storage medium may be a ROM, a random access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, or the like.
A non-transitory computer-readable storage medium, wherein, when the instructions in the storage medium are executed by a processor of a device (a server or a terminal), the device is enabled to perform a processing method, the method including: obtaining the text to be processed; obtaining the optimal sentence-segmentation result corresponding to the text to be processed according to segmentation points derived from the preset punctuation marks included in the text to be processed, wherein the overall translation quality of the optimal sentence-segmentation result is optimal, the optimal sentence-segmentation result includes at least one sentence, and the overall translation quality is a synthesis of the translation quality of all the sentences included in the optimal sentence-segmentation result; and outputting the optimal sentence-segmentation result corresponding to the text to be processed.
Optionally, obtaining the optimal sentence-segmentation result corresponding to the text to be processed according to segmentation points derived from the preset punctuation marks included in the text to be processed includes: using a dynamic programming algorithm to obtain the optimal sentence-segmentation result corresponding to the text to be processed according to segmentation points derived from the preset punctuation marks included in the text to be processed.
Optionally, using a dynamic programming algorithm to obtain the optimal sentence-segmentation result corresponding to the text to be processed according to segmentation points derived from the preset punctuation marks included in the text to be processed includes:
Determining the clause sequence set corresponding to the text to be processed according to the preset punctuation marks included in the text to be processed;
Determining, by recursion and in order of the subsets of the clause sequence set from smallest to largest, the backtracking segmentation point of the optimal subset sentence-segmentation result corresponding to each subset, wherein the overall translation quality corresponding to the optimal subset sentence-segmentation result is optimal;
Obtaining the optimal sentence-segmentation result corresponding to the text to be processed according to the backtracking segmentation points of the optimal subset sentence-segmentation results corresponding to the subsets of the clause sequence set.
Optionally, the subsets of the clause sequence set include the first i clauses of the text to be processed, where the optimal subset overall translation quality score corresponding to the first i clauses is denoted F(i), 0 ≤ i ≤ M, and M is the number of clauses in the text to be processed; determining, by recursion and in order of the subsets of the clause sequence set from smallest to largest, the backtracking segmentation point of the optimal subset sentence-segmentation result corresponding to each subset then includes:
Segmenting the first i clauses at a segmentation point k, so as to obtain, for the first i clauses and the segmentation point k, the optimal subset overall translation quality score F(k) of a first semantic unit and the translation quality score of a second semantic unit; wherein the first semantic unit includes those of the first i clauses that are located before the segmentation point k, the second semantic unit includes those of the first i clauses that are located after the segmentation point k, and 0 ≤ k < i;
Synthesizing F(k) and the translation quality score of the second semantic unit, so as to obtain the overall translation quality score corresponding to the first i clauses and the segmentation point k;
Obtaining, according to the overall translation quality scores corresponding to the first i clauses and the segmentation points k, the target segmentation point corresponding to the optimal overall translation quality score from among the at least one segmentation point k corresponding to the first i clauses;
Taking the target segmentation point as the backtracking segmentation point of the optimal subset sentence-segmentation result corresponding to the first i clauses, and taking the overall translation quality score corresponding to the target segmentation point as the optimal subset overall translation quality score F(i) corresponding to the first i clauses.
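The steps above amount to a recurrence over F, which can be written as follows. This is a sketch under stated assumptions: the base case $F(0) = 0$, the symbol $\oplus$ for the unspecified synthesis, the sentence-level score $q$, the clause names $c_1, \dots, c_M$, and the table $B(i)$ of backtracking segmentation points are notation introduced here and are not given in the original text.

```latex
% c_1 \dots c_M: clauses of the text to be processed
% q(\cdot): translation quality score of the second semantic unit c_{k+1} \dots c_i
F(0) = 0, \qquad
F(i) = \max_{0 \le k < i} \bigl[\, F(k) \oplus q(c_{k+1} \dots c_{i}) \,\bigr],
\qquad 1 \le i \le M
% B(i): backtracking segmentation point (the target segmentation point for the first i clauses)
B(i) = \operatorname*{arg\,max}_{0 \le k < i} \bigl[\, F(k) \oplus q(c_{k+1} \dots c_{i}) \,\bigr]
```

Under these assumptions, $F(M)$ is the optimal overall translation quality score of the whole text, and following $B(M)$, then $B(B(M))$, and so on, recovers the sentence boundaries.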
Optionally, obtaining the optimal sentence-segmentation result corresponding to the text to be processed according to the backtracking segmentation points of the optimal subset sentence-segmentation results corresponding to the subsets of the clause sequence set includes: backtracking through the backtracking segmentation points of the optimal subset sentence-segmentation results corresponding to the subsets of the clause sequence set, so as to obtain the backtracking segmentation points of the optimal subset sentence-segmentation result corresponding to the largest subset of the clause sequence set; and segmenting the text to be processed according to the backtracking segmentation points of the optimal subset sentence-segmentation result corresponding to the largest subset of the clause sequence set, so as to obtain the optimal sentence-segmentation result corresponding to the text to be processed.
Optionally, backtracking through the backtracking segmentation points of the optimal subset sentence-segmentation results corresponding to the subsets of the clause sequence set includes: obtaining a first backtracking segmentation point P1 corresponding to the first i clauses; and obtaining a second backtracking segmentation point P2 corresponding to the clauses of the text to be processed that are located before the first backtracking segmentation point P1.
Optionally, obtaining the optimal sentence-segmentation result corresponding to the text to be processed according to segmentation points derived from the preset punctuation marks included in the text to be processed includes: segmenting the text to be processed according to segmentation points derived from the preset punctuation marks included in the text to be processed, so as to obtain multiple sentence-segmentation results corresponding to the text to be processed; determining the overall translation quality corresponding to each sentence-segmentation result; and selecting, from the multiple sentence-segmentation results corresponding to the text to be processed, the sentence-segmentation result with the optimal overall translation quality as the optimal sentence-segmentation result corresponding to the text to be processed.
Optionally, the preset punctuation marks include a comma and/or a semicolon.
Other embodiments of the present invention will readily occur to those skilled in the art after considering the specification and practicing the invention disclosed herein. The present invention is intended to cover any variations, uses, or adaptations of the present invention that follow its general principles and include common knowledge or conventional technical means in the art not disclosed in this disclosure. The specification and examples are to be regarded as illustrative only, with the true scope and spirit of the present invention being indicated by the following claims.
It should be understood that the present invention is not limited to the precise structures described above and shown in the accompanying drawings, and that various modifications and changes may be made without departing from its scope. The scope of the present invention is limited only by the appended claims.
The foregoing describes merely preferred embodiments of the present invention and is not intended to limit the present invention; any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within the protection scope of the present invention.
A processing method, a processing apparatus, and a device for processing provided by the present invention have been described in detail above. Specific examples are used herein to explain the principles and implementations of the present invention, and the description of the above embodiments is intended only to help in understanding the method of the present invention and its core idea. Meanwhile, for those of ordinary skill in the art, changes may be made to the specific implementations and the scope of application in accordance with the idea of the present invention. In conclusion, the content of this specification should not be construed as limiting the present invention.
Claims (10)
1. A processing method, characterized by including:
Obtaining the text to be processed;
Obtaining the optimal sentence-segmentation result corresponding to the text to be processed according to segmentation points derived from the preset punctuation marks included in the text to be processed; wherein the overall translation quality of the optimal sentence-segmentation result is optimal, the optimal sentence-segmentation result includes at least one sentence, and the overall translation quality is a synthesis of the translation quality of all the sentences included in a sentence-segmentation result;
Outputting the optimal sentence-segmentation result corresponding to the text to be processed.
2. The method according to claim 1, characterized in that obtaining the optimal sentence-segmentation result corresponding to the text to be processed according to segmentation points derived from the preset punctuation marks included in the text to be processed includes:
Using a dynamic programming algorithm to obtain the optimal sentence-segmentation result corresponding to the text to be processed according to segmentation points derived from the preset punctuation marks included in the text to be processed.
3. The method according to claim 2, characterized in that using a dynamic programming algorithm to obtain the optimal sentence-segmentation result corresponding to the text to be processed according to segmentation points derived from the preset punctuation marks included in the text to be processed includes:
Determining the clause sequence set corresponding to the text to be processed according to the preset punctuation marks included in the text to be processed;
Determining, by recursion and in order of the subsets of the clause sequence set from smallest to largest, the backtracking segmentation point of the optimal subset sentence-segmentation result corresponding to each subset, wherein the overall translation quality corresponding to the optimal subset sentence-segmentation result is optimal;
Obtaining the optimal sentence-segmentation result corresponding to the text to be processed according to the backtracking segmentation points of the optimal subset sentence-segmentation results corresponding to the subsets of the clause sequence set.
4. The method according to claim 3, characterized in that the subsets of the clause sequence set include the first i clauses of the text to be processed, where the optimal subset overall translation quality score corresponding to the first i clauses is denoted F(i), 0 ≤ i ≤ M, and M is the number of clauses in the text to be processed; and determining, by recursion and in order of the subsets of the clause sequence set from smallest to largest, the backtracking segmentation point of the optimal subset sentence-segmentation result corresponding to each subset includes:
Segmenting the first i clauses at a segmentation point k, so as to obtain, for the first i clauses and the segmentation point k, the optimal subset overall translation quality score F(k) of a first semantic unit and the translation quality score of a second semantic unit; wherein the first semantic unit includes those of the first i clauses that are located before the segmentation point k, the second semantic unit includes those of the first i clauses that are located after the segmentation point k, and 0 ≤ k < i;
Synthesizing F(k) and the translation quality score of the second semantic unit, so as to obtain the overall translation quality score corresponding to the first i clauses and the segmentation point k;
Obtaining, according to the overall translation quality scores corresponding to the first i clauses and the segmentation points k, the target segmentation point corresponding to the optimal overall translation quality score from among the at least one segmentation point k corresponding to the first i clauses;
Taking the target segmentation point as the backtracking segmentation point of the optimal subset sentence-segmentation result corresponding to the first i clauses, and taking the overall translation quality score corresponding to the target segmentation point as the optimal subset overall translation quality score F(i) corresponding to the first i clauses.
5. The method according to claim 3 or 4, characterized in that obtaining the optimal sentence-segmentation result corresponding to the text to be processed according to the backtracking segmentation points of the optimal subset sentence-segmentation results corresponding to the subsets of the clause sequence set includes:
Backtracking through the backtracking segmentation points of the optimal subset sentence-segmentation results corresponding to the subsets of the clause sequence set, so as to obtain the backtracking segmentation points of the optimal subset sentence-segmentation result corresponding to the largest subset of the clause sequence set;
Segmenting the text to be processed according to the backtracking segmentation points of the optimal subset sentence-segmentation result corresponding to the largest subset of the clause sequence set, so as to obtain the optimal sentence-segmentation result corresponding to the text to be processed.
6. The method according to claim 5, characterized in that backtracking through the backtracking segmentation points of the optimal subset sentence-segmentation results corresponding to the subsets of the clause sequence set includes:
Obtaining a first backtracking segmentation point P1 corresponding to the first i clauses;
Obtaining a second backtracking segmentation point P2 corresponding to the clauses of the text to be processed that are located before the first backtracking segmentation point P1.
7. The method according to claim 1, characterized in that obtaining the optimal sentence-segmentation result corresponding to the text to be processed according to segmentation points derived from the preset punctuation marks included in the text to be processed includes:
Segmenting the text to be processed according to segmentation points derived from the preset punctuation marks included in the text to be processed, so as to obtain multiple sentence-segmentation results corresponding to the text to be processed;
Determining the overall translation quality corresponding to each sentence-segmentation result;
Selecting, from the multiple sentence-segmentation results corresponding to the text to be processed, the sentence-segmentation result with the optimal overall translation quality as the optimal sentence-segmentation result corresponding to the text to be processed.
8. The method according to claim 1, 2, 3, 4, or 7, characterized in that the preset punctuation marks include a comma and/or a semicolon.
9. A processing apparatus, characterized by including:
A to-be-processed text acquisition module, configured to obtain the text to be processed;
An optimal sentence-segmentation result acquisition module, configured to obtain the optimal sentence-segmentation result corresponding to the text to be processed according to segmentation points derived from the preset punctuation marks included in the text to be processed; wherein the overall translation quality of the optimal sentence-segmentation result is optimal, the optimal sentence-segmentation result includes at least one sentence, and the overall translation quality is a synthesis of the translation quality of all the sentences included in a sentence-segmentation result; and
An optimal sentence-segmentation result output module, configured to output the optimal sentence-segmentation result corresponding to the text to be processed.
10. A device for processing, characterized by including a memory and one or more programs, wherein the one or more programs are stored in the memory and are configured to be executed by one or more processors, the one or more programs including instructions for performing the following operations:
Obtaining the text to be processed;
Obtaining the optimal sentence-segmentation result corresponding to the text to be processed according to segmentation points derived from the preset punctuation marks included in the text to be processed; wherein the overall translation quality of the optimal sentence-segmentation result is optimal, the optimal sentence-segmentation result includes at least one sentence, and the overall translation quality is a synthesis of the translation quality of all the sentences included in a sentence-segmentation result;
Outputting the optimal sentence-segmentation result corresponding to the text to be processed.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710157267.5A CN108628819B (en) | 2017-03-16 | 2017-03-16 | Processing method and device for processing |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108628819A (en) | 2018-10-09 |
CN108628819B (en) | 2022-09-20 |
Family
ID=63687489
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710157267.5A, Active, CN108628819B (en) | 2017-03-16 | 2017-03-16 | Processing method and device for processing |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108628819B (en) |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109408833A (en) * | 2018-10-30 | 2019-03-01 | 科大讯飞股份有限公司 | Translation method, apparatus and device, and readable storage medium |
CN109920406A (en) * | 2019-03-28 | 2019-06-21 | 国家计算机网络与信息安全管理中心 | Dynamic voice recognition method and system based on variable initial position |
CN110321532A (en) * | 2019-06-06 | 2019-10-11 | 数译(成都)信息技术有限公司 | Language preprocessing sentence-segmentation method, computer device, and computer-readable storage medium |
CN111046649A (en) * | 2019-11-22 | 2020-04-21 | 北京捷通华声科技股份有限公司 | Text segmentation method and device |
CN114420102A (en) * | 2022-01-04 | 2022-04-29 | 广州小鹏汽车科技有限公司 | Method and device for speech sentence-breaking, electronic equipment and storage medium |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20090150139A1 (en) * | 2007-12-10 | 2009-06-11 | Kabushiki Kaisha Toshiba | Method and apparatus for translating a speech |
CN104915264A (en) * | 2015-05-29 | 2015-09-16 | 北京搜狗科技发展有限公司 | Input error-correction method and device |
CN105912522A (en) * | 2016-03-31 | 2016-08-31 | 长安大学 | Automatic extraction method and extractor of English corpora based on constituent analyses |
CN106484681A (en) * | 2015-08-25 | 2017-03-08 | 阿里巴巴集团控股有限公司 | Method, device, and electronic equipment for generating candidate translations |
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109408833A (en) * | 2018-10-30 | 2019-03-01 | 科大讯飞股份有限公司 | Translation method, apparatus and device, and readable storage medium |
WO2020087655A1 (en) * | 2018-10-30 | 2020-05-07 | 科大讯飞股份有限公司 | Translation method, apparatus and device, and readable storage medium |
CN109920406A (en) * | 2019-03-28 | 2019-06-21 | 国家计算机网络与信息安全管理中心 | Dynamic voice recognition method and system based on variable initial position |
CN109920406B (en) * | 2019-03-28 | 2021-12-03 | 国家计算机网络与信息安全管理中心 | Dynamic voice recognition method and system based on variable initial position |
CN110321532A (en) * | 2019-06-06 | 2019-10-11 | 数译(成都)信息技术有限公司 | Language preprocessing sentence-segmentation method, computer device, and computer-readable storage medium |
CN111046649A (en) * | 2019-11-22 | 2020-04-21 | 北京捷通华声科技股份有限公司 | Text segmentation method and device |
CN114420102A (en) * | 2022-01-04 | 2022-04-29 | 广州小鹏汽车科技有限公司 | Method and device for speech sentence-breaking, electronic equipment and storage medium |
CN114420102B (en) * | 2022-01-04 | 2022-10-14 | 广州小鹏汽车科技有限公司 | Method and device for speech sentence-breaking, electronic equipment and storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN108628819B (en) | 2022-09-20 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
CN110288077B (en) | Method and related device for synthesizing speaking expression based on artificial intelligence | |
CN107291690A (en) | Punctuation adding method and device, and device for adding punctuation | |
CN108628819A (en) | Processing method and apparatus, and device for processing | |
CN107632980A (en) | Voice translation method and device, and device for voice translation | |
CN107221330A (en) | Punctuation adding method and device, and device for adding punctuation | |
CN108628813A (en) | Processing method and apparatus, and device for processing | |
CN108399914B (en) | Voice recognition method and device | |
CN107291704B (en) | Processing method and device for processing | |
CN106202150B (en) | Information display method and device | |
CN107274903A (en) | Text processing method and device, and device for text processing | |
CN108073572A (en) | Information processing method and device, and simultaneous interpretation system | |
CN110322760A (en) | Voice data generation method, device, terminal and storage medium | |
CN108345612A (en) | Question processing method and device, and device for question processing | |
CN111583919A (en) | Information processing method, device and storage medium | |
CN108304412A (en) | Cross-language search method and apparatus, and device for cross-language search | |
CN108255940A (en) | Cross-language search method and apparatus, and device for cross-language search | |
CN110069624A (en) | Text processing method and device | |
CN111149172B (en) | Emotion management method, device and computer-readable storage medium | |
CN107564526A (en) | Processing method, device, and machine-readable medium | |
CN109471919B (en) | Zero pronoun resolution method and device | |
CN107424612A (en) | Processing method, device, and machine-readable medium | |
CN109002184A (en) | Association method and device for input method candidate words | |
WO2018214663A1 (en) | Voice-based data processing method and apparatus, and electronic device | |
CN108628461A (en) | Input method and device, and method and device for updating a dictionary | |
CN113936697A (en) | Voice processing method and device for voice processing | |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||