CN108628819A - Processing method and apparatus, and apparatus for processing - Google Patents


Info

Publication number
CN108628819A
Authority
CN
China
Legal status
Granted
Application number
CN201710157267.5A
Other languages
Chinese (zh)
Other versions
CN108628819B (en)
Inventor
姜里羊
王宇光
陈伟
程善伯
Current Assignee
Beijing Sogou Technology Development Co Ltd
Original Assignee
Beijing Sogou Technology Development Co Ltd
Application filed by Beijing Sogou Technology Development Co Ltd
Priority to CN201710157267.5A
Publication of CN108628819A
Application granted
Publication of CN108628819B
Legal status: Active


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G06F40/40 Processing or translation of natural language
    • G06F40/58 Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation

Abstract

An embodiment of the present invention provides a processing method and apparatus, and an apparatus for processing. The method specifically includes: obtaining a text to be processed; obtaining an optimal sentence-segmentation result for the text according to cut points derived from the preset punctuation marks contained in the text, where the optimal segmentation result has the best combined translation quality and includes at least one sentence, and the combined translation quality is the combination of the translation qualities of all sentences in a segmentation result; and outputting the optimal segmentation result for the text to be processed. Embodiments of the present invention can improve the translation quality of the segmentation result for the text to be processed.

Description

Processing method and apparatus, and apparatus for processing
Technical field
The present invention relates to the field of natural language processing, and in particular to a processing method and apparatus, and an apparatus for processing.
Background art
Sentence segmentation is an important basic technology in natural language processing. Segmentation means cutting text into semantically complete sentences. Because splitting text into semantically complete sentences is the first step toward machine understanding of human language, sentence segmentation is widely used in natural language processing applications such as machine translation, speech recognition, and information services.
Machine translation is the process of using a computer to convert one natural language (the source language) into another natural language (the target language). Before translating, conventional machine translation usually segments the source text, whether entered by the user or obtained through speech recognition, and then translates according to the segmentation result. The accuracy of the segmentation result therefore has a crucial influence on machine translation quality: higher segmentation accuracy directly yields higher translation quality.
Existing schemes usually segment text by setting thresholds. For example, the text is segmented when the number of commas it contains exceeds a first threshold, or when the number of characters it contains exceeds a second threshold.
However, the segmentation results of existing schemes easily contain semantically incomplete sentences, and such sentences degrade machine translation. The segmentation results of existing schemes therefore lead to relatively low translation quality.
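The prior-art threshold rule can be sketched as follows to make its weakness concrete. The threshold values, the choice to cut at the comma where a limit is first exceeded, and the name `threshold_split` are assumptions for illustration; the source does not say exactly where the cut is placed.

```python
def threshold_split(text: str, max_commas: int = 3, max_chars: int = 50) -> list[str]:
    """Hypothetical prior-art baseline: cut at a comma once either limit is exceeded."""
    sentences: list[str] = []
    current = ""
    commas = 0
    for ch in text:
        current += ch
        if ch in ",，":  # ASCII or full-width comma
            commas += 1
            # Cut only at a comma, and only when a counter has passed its threshold.
            if commas > max_commas or len(current) > max_chars:
                sentences.append(current)
                current, commas = "", 0
    if current:  # whatever remains after the last cut
        sentences.append(current)
    return sentences


print(threshold_split("a,b,c,d,e", max_commas=2, max_chars=100))  # ['a,b,c,', 'd,e']
```

The rule only counts characters and commas and never looks at meaning, so a cut can land between a clause and the clause that completes it.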
Summary of the invention
In view of the above problems, embodiments of the present invention are proposed to provide a processing method, a processing apparatus, and an apparatus for processing that overcome, or at least partially solve, the above problems. Embodiments of the present invention can improve the translation quality of the segmentation result for a text to be processed.
To solve the above problems, the present invention discloses a processing method, including:
obtaining a text to be processed;
obtaining an optimal segmentation result for the text to be processed according to cut points derived from the preset punctuation marks contained in the text, where the optimal segmentation result has the best combined translation quality and includes at least one sentence, and the combined translation quality is the combination of the translation qualities of all sentences in a segmentation result; and
outputting the optimal segmentation result for the text to be processed.
Optionally, obtaining the optimal segmentation result for the text to be processed according to the cut points derived from its preset punctuation marks includes:
using a dynamic programming algorithm to obtain the optimal segmentation result for the text to be processed according to the cut points derived from the preset punctuation marks contained in the text.
Optionally, using the dynamic programming algorithm to obtain the optimal segmentation result according to the cut points includes:
determining a clause sequence set for the text to be processed according to the preset punctuation marks it contains;
processing the subsets of the clause sequence set in order from smallest to largest and determining, by recursion, the backtracking cut point of the optimal subset segmentation result for each subset, the optimal subset segmentation result being the one with the best combined translation quality; and
obtaining the optimal segmentation result for the text to be processed from the backtracking cut points of the optimal subset segmentation results of the subsets of the clause sequence set.
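The recursion can be written compactly. Following the notation of the next claim, F(i) is the optimal combined translation quality score of the first i clauses and M is the number of clauses. In addition, Q(k, i) (a symbol introduced here) is the translation quality score of the semantic unit formed by clauses k+1 to i, ⊕ is the combining operation the text leaves unspecified, and B(i) is the backtracking cut point. Reading "optimal" as "maximum" and setting F(0) = 0 assume that higher scores are better and that ⊕ is addition.

```latex
F(0) = 0, \qquad
F(i) = \max_{0 \le k < i} \bigl( F(k) \oplus Q(k, i) \bigr), \quad 1 \le i \le M,
\\
B(i) = \operatorname*{arg\,max}_{0 \le k < i} \bigl( F(k) \oplus Q(k, i) \bigr)
```

Each prefix reuses the stored F(k) of shorter prefixes, so only M(M+1)/2 semantic units are scored, rather than the 2^(M-1) segmentations an exhaustive search would visit.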
Optionally, the subsets of the clause sequence set are the first i clauses of the text to be processed; the optimal subset combined translation quality score of the first i clauses is denoted F(i), where 0 ≤ i ≤ M and M is the number of clauses in the text to be processed. Determining, by recursion and in order from the smallest subset to the largest, the backtracking cut point of the optimal subset segmentation result for each subset then includes:
segmenting the first i clauses at a cut point k to obtain, for the first i clauses and cut point k, the optimal subset combined translation quality score F(k) of the first semantic unit and the translation quality score of the second semantic unit, where the first semantic unit consists of those of the first i clauses that lie before cut point k, the second semantic unit consists of those of the first i clauses that lie after cut point k, and 0 ≤ k < i;
combining F(k) with the translation quality score of the second semantic unit, to obtain the combined translation quality score for the first i clauses and cut point k;
selecting, from the at least one cut point k available for the first i clauses and according to their combined translation quality scores, the target cut point with the optimal combined translation quality score; and
taking the target cut point as the backtracking cut point of the optimal subset segmentation result for the first i clauses, and taking the combined translation quality score of the target cut point as the optimal subset combined translation quality score F(i) of the first i clauses.
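A minimal Python sketch of this forward recursion follows. It assumes that a higher score means better translation quality, that "combining" F(k) with the second unit's score is plain addition, and that the caller supplies a `quality` function (hypothetical here; in practice it might wrap a translation-quality estimator). Clauses are 0-indexed, so `clauses[k:i]` is the second semantic unit formed by clauses k+1 to i.

```python
from typing import Callable, Sequence


def forward_pass(
    clauses: Sequence[str], quality: Callable[[str], float]
) -> tuple[list[float], list[int]]:
    """Fill F(i) and the backtracking cut point for every prefix of the clauses.

    Assumptions: higher scores are better, and combining F(k) with the score of
    the second semantic unit is plain addition, so F(0) = 0.
    """
    m = len(clauses)
    F = [0.0] + [float("-inf")] * m  # F[i]: best combined score of the first i clauses
    back = [0] * (m + 1)             # back[i]: target cut point k for the first i clauses
    for i in range(1, m + 1):        # subsets from smallest to largest
        for k in range(i):           # every cut point 0 <= k < i
            second_unit = "".join(clauses[k:i])  # clauses k+1..i (1-based)
            score = F[k] + quality(second_unit)
            if score > F[i]:         # keep the first k that attains the maximum
                F[i], back[i] = score, k
    return F, back


toy_scores = {"A": 1, "B": 1, "C": 4, "AB": 5, "BC": 2, "ABC": 3}  # hypothetical
F, back = forward_pass(["A", "B", "C"], lambda unit: toy_scores.get(unit, 0.0))
print(F, back)  # [0.0, 1.0, 5.0, 9.0] [0, 0, 0, 2]
```

Keeping the first maximizing k on ties is an arbitrary choice; the claims only require some cut point with the optimal score.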
Optionally, obtaining the optimal segmentation result for the text to be processed from the backtracking cut points of the optimal subset segmentation results of the subsets of the clause sequence set includes:
tracing back through the backtracking cut points of the optimal subset segmentation results of the subsets of the clause sequence set, to obtain the backtracking cut points of the optimal subset segmentation result for the largest subset of the clause sequence set; and
segmenting the text to be processed according to the backtracking cut points of the optimal subset segmentation result for the largest subset, to obtain the optimal segmentation result for the text to be processed.
Optionally, tracing back through the backtracking cut points of the optimal subset segmentation results of the subsets of the clause sequence set includes:
obtaining the first backtracking cut point P1 corresponding to the first i clauses; and
obtaining the second backtracking cut point P2 corresponding to the clauses of the text to be processed that lie before the first backtracking cut point P1.
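Backtracking can be sketched as a loop that reads P1 = back[M], then P2 = back[P1], and so on until the start of the text. The sketch assumes a list `back` in which `back[i]` holds the backtracking cut point for the first i clauses (index 0 unused) and `back[i] < i`; the function name and data layout are hypothetical.

```python
from typing import Sequence


def backtrack(clauses: Sequence[str], back: Sequence[int]) -> tuple[list[str], list[int]]:
    """Recover sentences from backtracking cut points.

    Starting from the largest subset (all M clauses): P1 = back[M], P2 = back[P1], ...
    """
    sentences: list[str] = []
    cut_points: list[int] = []
    end = len(clauses)
    while end > 0:
        start = back[end]            # next backtracking cut point
        cut_points.append(start)
        sentences.append("".join(clauses[start:end]))  # sentence between the two cuts
        end = start
    sentences.reverse()              # cuts were found right to left
    return sentences, cut_points


print(backtrack(["A", "B", "C"], [0, 0, 0, 2]))  # (['AB', 'C'], [2, 0])
```

Here P1 = 2 and P2 = 0; reaching cut point 0 means the start of the text has been reached.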
Optionally, obtaining the optimal segmentation result for the text to be processed according to the cut points derived from its preset punctuation marks includes:
segmenting the text to be processed according to the cut points derived from its preset punctuation marks, to obtain multiple segmentation results for the text;
determining the combined translation quality of each segmentation result; and
selecting, from the multiple segmentation results for the text, the one with the best combined translation quality as the optimal segmentation result for the text to be processed.
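The exhaustive variant can be sketched by treating each gap between adjacent clauses as either a cut or not. As in the other sketches, summing per-sentence scores as the combined quality, the toy scores, and the function name are assumptions.

```python
from itertools import product
from typing import Callable, Sequence


def exhaustive_best(
    clauses: Sequence[str], quality: Callable[[str], float]
) -> tuple[list[str], float]:
    """Try every segmentation and keep the one with the best combined score (sum assumed).

    clauses must be non-empty.
    """
    best_split: list[str] = []
    best_score = float("-inf")
    # Each of the M-1 gaps between adjacent clauses is either a cut (True) or not.
    for mask in product((False, True), repeat=len(clauses) - 1):
        sentences, current = [], clauses[0]
        for clause, cut in zip(clauses[1:], mask):
            if cut:
                sentences.append(current)
                current = clause
            else:
                current += clause
        sentences.append(current)
        score = sum(quality(s) for s in sentences)
        if score > best_score:
            best_split, best_score = sentences, score
    return best_split, best_score


toy_scores = {"A": 1, "B": 1, "C": 4, "AB": 5, "BC": 2, "ABC": 3}  # hypothetical
print(exhaustive_best(["A", "B", "C"], lambda s: toy_scores.get(s, 0.0)))  # (['AB', 'C'], 9)
```

With M clauses the loop visits 2^(M-1) segmentations (the four results for three clauses), which is why the dynamic-programming variant, scoring only M(M+1)/2 units, scales better to long texts.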
Optionally, the preset punctuation marks include commas and/or semicolons.
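The clause sequence set can be derived by splitting the text after each preset mark; each clause boundary is then a candidate cut point. A minimal sketch with Python's standard `re` module follows. The mark set (ASCII and full-width commas and semicolons, plus the enumeration comma 、) and keeping each mark attached to the clause it closes are assumptions.

```python
import re

# Assumed preset marks: ASCII/full-width comma and semicolon, plus the enumeration comma.
PRESET_MARKS = ",，;；、"


def split_clauses(text: str, marks: str = PRESET_MARKS) -> list[str]:
    """Split text into clauses, keeping each preset mark attached to the clause it ends."""
    cls = re.escape(marks)
    # One or more non-mark characters, optionally followed by a single mark.
    return re.findall(f"[^{cls}]+[{cls}]?", text)


clauses = split_clauses("今天天气很好，我们去公园；然后回家")
print(clauses)  # ['今天天气很好，', '我们去公园；', '然后回家']
```

A run of consecutive marks keeps only the first one; a production splitter would need an explicit policy for that case.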
In another aspect, the present invention discloses a processing apparatus, including:
a text acquisition module, configured to obtain a text to be processed;
an optimal segmentation result acquisition module, configured to obtain an optimal segmentation result for the text to be processed according to cut points derived from the preset punctuation marks contained in the text, where the optimal segmentation result has the best combined translation quality and includes at least one sentence, and the combined translation quality is the combination of the translation qualities of all sentences in a segmentation result; and
an optimal segmentation result output module, configured to output the optimal segmentation result for the text to be processed.
Optionally, the optimal segmentation result acquisition module includes:
a dynamic programming acquisition submodule, configured to use a dynamic programming algorithm to obtain the optimal segmentation result for the text to be processed according to the cut points derived from the preset punctuation marks contained in the text.
Optionally, the dynamic programming acquisition submodule includes:
a clause sequence set determination unit, configured to determine the clause sequence set for the text to be processed according to the preset punctuation marks it contains;
a recursion unit, configured to process the subsets of the clause sequence set in order from smallest to largest and determine, by recursion, the backtracking cut point of the optimal subset segmentation result for each subset; and
an optimal segmentation result acquisition unit, configured to obtain the optimal segmentation result for the text to be processed from the backtracking cut points of the optimal subset segmentation results of the subsets of the clause sequence set.
Optionally, the subsets of the clause sequence set are the first i clauses of the text to be processed; the optimal subset combined translation quality score of the first i clauses is denoted F(i), where 0 ≤ i ≤ M and M is the number of clauses in the text to be processed. The recursion unit then includes:
a subset segmentation subunit, configured to segment the first i clauses at a cut point k to obtain, for the first i clauses and cut point k, the optimal subset combined translation quality score F(k) of the first semantic unit and the translation quality score of the second semantic unit, where the first semantic unit consists of those of the first i clauses that lie before cut point k, the second semantic unit consists of those of the first i clauses that lie after cut point k, and 0 ≤ k < i;
a quality combination subunit, configured to combine F(k) with the translation quality score of the second semantic unit, to obtain the combined translation quality score for the first i clauses and cut point k;
a target cut point acquisition subunit, configured to select, from the at least one cut point k available for the first i clauses and according to their combined translation quality scores, the target cut point with the optimal combined translation quality score; and
a backtracking cut point acquisition subunit, configured to take the target cut point as the backtracking cut point of the optimal subset segmentation result for the first i clauses, and to take the combined translation quality score of the target cut point as the optimal subset combined translation quality score F(i) of the first i clauses.
Optionally, the optimal segmentation result acquisition unit includes:
a backtracking subunit, configured to trace back through the backtracking cut points of the optimal subset segmentation results of the subsets of the clause sequence set, to obtain the backtracking cut points of the optimal subset segmentation result for the largest subset of the clause sequence set; and
a backtracking segmentation subunit, configured to segment the text to be processed according to the backtracking cut points of the optimal subset segmentation result for the largest subset, to obtain the optimal segmentation result for the text to be processed.
Optionally, the backtracking subunit includes:
a first backtracking subunit, configured to obtain the first backtracking cut point P1 corresponding to the first i clauses; and
a second backtracking subunit, configured to obtain the second backtracking cut point P2 corresponding to the clauses of the text to be processed that lie before the first backtracking cut point P1.
Optionally, the optimal segmentation result acquisition module includes:
an exhaustive submodule, configured to segment the text to be processed according to the cut points derived from its preset punctuation marks, to obtain multiple segmentation results for the text;
a combined quality determination submodule, configured to determine the combined translation quality of each segmentation result; and
a result selection submodule, configured to select, from the multiple segmentation results for the text, the one with the best combined translation quality as the optimal segmentation result for the text to be processed.
Optionally, the preset punctuation marks include commas and/or semicolons.
In yet another aspect, the present invention discloses an apparatus for processing, including a memory and one or more programs, where the one or more programs are stored in the memory and configured to be executed by one or more processors, and the one or more programs include instructions for performing the following operations:
obtaining a text to be processed;
obtaining an optimal segmentation result for the text to be processed according to cut points derived from the preset punctuation marks contained in the text, where the optimal segmentation result has the best combined translation quality and includes at least one sentence, and the combined translation quality is the combination of the translation qualities of all sentences in a segmentation result; and
outputting the optimal segmentation result for the text to be processed.
Embodiments of the present invention have the following advantages:
Embodiments of the present invention obtain the optimal segmentation result for a text to be processed according to cut points derived from the preset punctuation marks contained in the text. This optimal segmentation result has the best combined translation quality, where the result may include at least one sentence and the combined translation quality may be the combination of the translation qualities of all sentences in a segmentation result. The optimal segmentation result therefore achieves a global optimum of combined translation quality, and so improves the translation quality of the segmentation result for the text to be processed.
Description of the drawings
Fig. 1 is a schematic diagram of an example structure of a processing system according to an embodiment of the present invention;
Fig. 2 is a flowchart of an embodiment of a processing method of the present invention;
Fig. 3 is a schematic diagram of path planning for a text to be processed according to an embodiment of the present invention;
Fig. 4 is a structural block diagram of an embodiment of a processing apparatus of the present invention;
Fig. 5 is a block diagram of an apparatus for processing, implemented as a terminal, according to an exemplary embodiment; and
Fig. 6 is a block diagram of an apparatus for processing, implemented as a server, according to an exemplary embodiment.
Detailed description of the embodiments
To make the above objectives, features, and advantages of the present invention clearer and easier to understand, the present invention is described in further detail below with reference to the accompanying drawings and specific embodiments.
Embodiments of the present invention provide a processing scheme that obtains the optimal segmentation result for a text to be processed according to cut points derived from the preset punctuation marks contained in the text. The optimal segmentation result has the best combined translation quality: it may include at least one sentence, and the combined translation quality may be the combination of the translation qualities of all sentences in a segmentation result. The optimal segmentation result therefore achieves a global optimum of combined translation quality, where "global" refers to the optimal segmentation result of the text to be processed taken as a whole, and so improves the translation quality of the segmentation result for the text to be processed.
Embodiments of the present invention can be applied to any scenario that requires sentence segmentation followed by machine translation, such as machine translation, speech recognition, and information services; embodiments of the present invention do not limit the specific application scenario.
For example, referring to Fig. 1, which shows a schematic diagram of an example structure of a processing system according to an embodiment of the present invention, the system may specifically include a processing apparatus 101, a machine translation apparatus 102, and a translation result output apparatus 103. The processing apparatus 101, the machine translation apparatus 102, and the translation result output apparatus 103 may each be a separate server, or may be deployed together in the same server; embodiments of the present invention do not limit where they are located.
The processing apparatus 101 may obtain a text to be processed; segment it according to the cut points derived from the preset punctuation marks it contains, to obtain the optimal segmentation result for the text; and output that optimal segmentation result to the machine translation apparatus 102.
Optionally, the processing apparatus 101 may obtain the text to be processed from the speech signal of a speaking user. In this case, the processing apparatus 101 may convert the speaking user's speech signal into text information and obtain the text to be processed from that text information. In practical applications, speaking users may include users who speak and produce speech signals in a simultaneous interpretation scenario and/or users who produce speech signals through a terminal; the speaking user's speech signal may be captured by a microphone or another voice capture device.
Optionally, the processing apparatus 101 may use speech recognition technology to convert the speaking user's speech signal into text information. Denote the speaking user's speech signal as S. After a series of processing steps on S, the corresponding speech feature sequence $O = \{O_1, O_2, \ldots, O_i, \ldots, O_T\}$ is obtained, where $O_i$ is the i-th speech feature and T is the total number of speech features. The sentence corresponding to the speech signal S can be regarded as a word string composed of many words, denoted $W = \{w_1, w_2, \ldots, w_n\}$. Speech recognition is the process of finding the most probable word string W given the known speech feature sequence O.
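Written as a formula, this is the standard Bayes decision rule of statistical speech recognition, where P(O | W) is given by an acoustic model and P(W) by a language model; P(O) is dropped because it does not depend on W:

```latex
\hat{W} = \arg\max_{W} P(W \mid O)
        = \arg\max_{W} \frac{P(O \mid W)\,P(W)}{P(O)}
        = \arg\max_{W} P(O \mid W)\,P(W)
```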
Specifically, speech recognition is a model-matching process. A speech model may first be built from the characteristics of human speech, and the features needed for recognition are extracted by analyzing the input speech signal, to build the templates that speech recognition requires. Recognizing a user's voice input means comparing the features of that voice with the templates and determining the template that best matches it, which yields the speech recognition result. Specific speech recognition algorithms may include training and recognition algorithms based on statistical hidden Markov models, training and recognition algorithms based on neural networks, recognition algorithms based on dynamic time warping, and others; embodiments of the present invention do not limit the specific speech recognition process.
Alternatively, the processing apparatus 101 may obtain the text to be processed from text entered by the user. For example, text the user enters in scenarios such as instant messaging or office documents can serve as the source of the text to be processed.
In practical applications, the processing apparatus 101 may obtain the text to be processed from the text corresponding to a speech signal, or from user-entered text, according to the needs of the application. Optionally, the text to be processed may be obtained from the text corresponding to the speech signal S according to the pause intervals in S. For example, when a pause in S exceeds a time threshold, a corresponding first boundary point may be determined from that time point; the text corresponding to the portion of S before the first boundary point is taken as the text to be processed, and the text corresponding to the portion of S after the first boundary point is processed further to obtain the next text to be processed. Optionally, the text to be processed may also be obtained according to the number of characters in the text corresponding to the speech signal or in the user-entered text. For example, when that text contains more characters than a character-count threshold, a corresponding second boundary point may be determined from the threshold; the text before the second boundary point is taken as the text to be processed, and the text after it is processed further to obtain the next text to be processed.
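Both boundary rules can be sketched as one pass over recognized words. The `Word` record with per-word timestamps, the threshold values, and the function name are assumptions (the text does not say how timing is represented); Chinese text is assumed, so words are concatenated without spaces.

```python
from dataclasses import dataclass


@dataclass
class Word:
    """A recognized word with (hypothetical) start/end timestamps in seconds."""
    text: str
    start: float
    end: float


def split_pending_texts(words: list[Word], max_gap: float = 0.8, max_chars: int = 50) -> list[str]:
    """Cut recognized speech into texts to be processed at long pauses or a character budget."""
    pending: list[str] = []
    current = ""
    prev_end = None
    for w in words:
        long_pause = prev_end is not None and w.start - prev_end > max_gap  # first boundary
        over_budget = len(current) + len(w.text) > max_chars               # second boundary
        if current and (long_pause or over_budget):
            pending.append(current)  # everything before the boundary is one text
            current = ""
        current += w.text
        prev_end = w.end
    if current:
        pending.append(current)
    return pending


words = [Word("你好", 0.0, 0.5), Word("世界", 0.6, 1.0), Word("再见", 2.5, 3.0)]
print(split_pending_texts(words))  # ['你好世界', '再见']
```

A single word longer than `max_chars` is kept whole rather than cut mid-word.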
In embodiments of the present invention, a sentence is a grammatical unit composed of words or phrases according to certain syntactic rules, expressing a relatively complete meaning and carrying a clear mood and intonation. Optionally, sentences may include simple sentences and/or complex sentences. A simple sentence is made of a phrase or a single word, independently expresses a relatively complete meaning, and has a certain mood and intonation, for example "The classmates have returned to school" or "He is in very good health." The relatively independent simple-sentence units within a complex sentence are called clauses. Clauses are generally separated by a pause, written as a comma or semicolon, and are connected in meaning; they are often joined by correlative words such as adverbs, phrases, or conjunctions, for example "China wants to become prosperous and strong, and this is the hope of more than a billion Chinese people."
Optionally, the processing apparatus 101 may insert the corresponding preset punctuation marks into the text information corresponding to the speaking user's speech signal according to the pause intervals in the speech signal S and a language model. Optionally, the inserted preset punctuation marks may be used to mark the pauses between clauses within a sentence, and may include, but are not limited to, commas, enumeration commas (、), and semicolons.
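The pause-interval half of this step can be sketched as below; the language-model half, which would decide which mark to insert or veto a pause-based comma, is omitted. The tuple layout, the 0.3-second threshold, and the function name are hypothetical.

```python
def insert_pause_commas(words: list[tuple[str, float, float]], min_gap: float = 0.3,
                        mark: str = "，") -> str:
    """Insert a preset mark wherever the pause between consecutive words exceeds min_gap.

    words: (text, start_seconds, end_seconds) tuples in time order.
    """
    out: list[str] = []
    for i, (text, start, _end) in enumerate(words):
        if i > 0 and start - words[i - 1][2] > min_gap:  # pause after the previous word
            out.append(mark)
        out.append(text)
    return "".join(out)


print(insert_pause_commas([("今天", 0.0, 0.4), ("天气好", 0.5, 1.0), ("我们走", 1.6, 2.0)]))
# 今天天气好，我们走
```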
The cut-point that processing unit 101 is obtained according to the preset punctuation mark for including based on the pending text obtains The corresponding optimal punctuate result of the pending text;Specifically, in the embodiment of the present invention, the pending text includes pre- Set punctuation mark possible as or not as punctuate processing cut-point, that is, can be according to the pending text packet The preset punctuation mark contained as or not as punctuate processing cut-point situation, make pauses in reading unpunctuated ancient writings to the pending text Processing, in this way, pending text will it is corresponding there are many punctuate scheme and its corresponding punctuate as a result, the embodiment of the present invention most What is obtained eventually is the punctuate result of synthesized translation optimal quality.
In a kind of application example of the present invention, it is assumed that 2 comma punctuates that pending text [A, B, C] includes have can Can or can not possibly be as the cut-point of punctuate processing, and assume that corresponding punctuate result may include:{ (A, B, C) }, (A), (B, C) }, { (A), (B), (C) } and { (A, B), (C) } etc., then the embodiment of the present invention can obtain synthesized translation optimal quality Punctuate result;Wherein, [] indicates that pending text, () indicate that the sentence that punctuate obtains, { } indicate punctuate result.
Machine translation apparatus 102, can be received from processing unit 101 the processing text it is corresponding it is optimal punctuate as a result, And the corresponding optimal punctuate result of the processing text is translated as to the word of object language, wherein machine translation apparatus 102 may be used the translation that machine translation mothod carries out optimal punctuate result, and machine translation mothod can utilize computer by one The target subordinate sentence of kind natural language (original language) is converted to the process of the word of another natural language (object language), for example, Source language and the target language can be respectively Chinese and English, alternatively, source language and the target language can be respectively English in Text etc., the embodiment of the present invention mention specific machine translation mothod for specific original language, target language and do not limit.It is optional The type on ground, above-mentioned machine translation apparatus 102 may include:Measurement type and/or neural network type etc., it will be understood that this Inventive embodiments do not limit the concrete type of machine translation apparatus 102.
Translation result output device 103 can receive the word of object language from machine translation apparatus 102, and to the target The word of language is exported, and the corresponding way of output may include:Voice mode and/or interface manner etc..For example, in unison Under the scene of translation, the text conversion of the object language can be the voice of object language, and export.It is alternatively possible to It is object language by the text conversion of the object language using the switch technology (such as speech synthesis technique) of Text To Speech Voice, and by the speech plays such as earphone, loud speaker device by the voice output of object language.It is appreciated that the present invention is implemented Example is not for limiting the detailed process of voice and output that the text conversion of the object language is object language.Again Such as, under the scene of information service (such as translation web site or translation APP), directly machine translation apparatus 102 can be obtained The word of object language exports, for example, the text importing of object language is looked into the display device of such as screen for user It sees.
It will be understood that the processing system shown in Fig. 1 is only an example. In practice, the processing apparatus 101 may output the optimal segmentation result of the text to be processed to apparatuses other than the machine translation apparatus 102. The embodiments of the present invention do not limit the specific processing system.
Method Embodiment
Fig. 2 shows a flowchart of an embodiment of a processing method of the present invention. The method may specifically include the following steps:
Step 201: obtain a text to be processed;
Step 202: obtain the optimal segmentation result of the text to be processed from the split points derived from the preset punctuation marks the text contains. The optimal segmentation result has the best combined translation quality and may include at least one sentence. The combined translation quality of a segmentation result may combine the translation qualities of all sentences that result contains;
Step 203: output the optimal segmentation result of the text to be processed.
The processing method provided by the embodiments of the present invention may run on computing devices such as terminals or servers. Optionally, the terminal may include, but is not limited to, a smartphone, a tablet computer, a laptop computer, an in-vehicle computer, a desktop computer, a smart TV or a wearable device. The server may be a cloud server or an ordinary server that provides clients with a service for processing texts.
The processing method is applicable to languages such as Chinese, Japanese and Korean, and improves the translation quality of the segmentation result of the text to be processed. Any language in which sentence segmentation is performed falls within the scope of application of the processing method.
In the embodiments of the present invention, the text to be processed is the text on which processing is performed. It may come from text or speech that a user enters on the computing device, or it may come from another computing device. The text to be processed may contain one language or several. For example, it may contain only Chinese, or Chinese mixed with another language such as English. The embodiments of the present invention do not limit the specific text to be processed.
In practical applications, the computing device may execute the processing flow of the embodiments through a client application (APP) running on it. For example, the client application may be any APP running on a terminal, and it may obtain the text to be processed from other applications on the device. Alternatively, a functional component of the client application may execute the processing flow and obtain the text from other functional components. Alternatively, the computing device may act as a server and execute the processing method.
In a kind of alternative embodiment of the present invention, the method for the embodiment of the present invention can also include:Step 201 is obtained At least one pending text write-in buffer area taken;Then step 202 can read pending text from the buffer area first, And the cut-point obtained according to the preset punctuation mark for including based on read pending text, obtain the pending text Corresponding optimal punctuate result.It is alternatively possible to establish such as queue, array or chained list in the memory field of computing device Data structure does not limit specific buffer area as above-mentioned buffer area, the embodiment of the present invention.It is above-mentioned to use buffer area The treatment effeciency of pending text can be improved by storing the mode of pending text, it will be understood that pending using disk storage The mode of text is also feasible, and the embodiment of the present invention does not limit the specific storage mode of pending text.
In the embodiments of the present invention, each preset punctuation mark in the text to be processed may or may not serve as a split point for segmentation. Segmenting the text under each choice of which preset marks act as split points gives a single text several segmentation schemes and several corresponding segmentation results. The embodiments ultimately return the segmentation result with the best combined translation quality.
The embodiments may provide the following schemes for obtaining the optimal segmentation result of the text to be processed from the split points derived from the preset punctuation marks it contains:
Optimal result acquisition scheme 1
Optimal result acquisition scheme 1: segment the text to be processed at the split points derived from its preset punctuation marks, producing multiple segmentation results for the text. Determine the combined translation quality of each segmentation result. Then select the result with the best combined translation quality as the optimal segmentation result of the text.
In practical applications, a path planning algorithm may segment the text at the split points derived from its preset punctuation marks. This yields multiple paths for the text and one segmentation result per path. A path planning algorithm works in an environment containing obstacles. Using some evaluation criterion, it finds a collision-free path from an initial state to a goal state. In the embodiments, the obstacles may represent the split points of the text, and the initial and goal states represent the first and last clauses of the text, respectively.
Fig. 3 shows a schematic diagram of path planning for a text to be processed. The text is [A, B, C], and each of its two commas may or may not serve as a split point. In Fig. 3, clauses A, B and C are drawn as rectangles and the commas as circles. A comma that is used as a split point has a hexagon drawn around its circle. The segmentation results of [A, B, C] are:
- no split point: {(A, B, C)};
- the 1st comma as the split point: {(A), (B, C)};
- both commas as split points: {(A), (B), (C)};
- the 2nd comma as the split point: {(A, B), (C)}.
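The enumeration behind Fig. 3 can be sketched by making one cut-or-no-cut choice at each separator between adjacent clauses. The helper `enumerate_segmentations` is hypothetical and assumes the text has at least one clause. It stands in for whatever candidate generator, path planning or otherwise, an implementation actually uses.

```python
from itertools import product
from typing import List


def enumerate_segmentations(clauses: List[str]) -> List[List[List[str]]]:
    """Return every segmentation of `clauses` (assumes at least one clause)."""
    results = []
    # One boolean per separator between adjacent clauses: True = use it as a split point.
    for choices in product([False, True], repeat=len(clauses) - 1):
        segmentation, current = [], [clauses[0]]
        for cut, clause in zip(choices, clauses[1:]):
            if cut:
                segmentation.append(current)  # close the current sentence
                current = [clause]
            else:
                current.append(clause)        # keep extending the current sentence
        segmentation.append(current)
        results.append(segmentation)
    return results


segs = enumerate_segmentations(["A", "B", "C"])
```

For M clauses this produces 2^(M-1) segmentations, which is the four results listed for [A, B, C].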
The path planning algorithm is only one optional embodiment. In practice, those skilled in the art may use other algorithms to obtain the segmentation results of the text, depending on application requirements. The embodiments of the present invention do not limit the specific algorithm.
In an optional embodiment of the present invention, determining the combined translation quality of the segmentation results may include two steps. First, determine a translation quality score for each sentence in each segmentation result. Second, combine the scores of all sentences in a result to obtain its combined translation quality score. The result with the highest combined score may then be taken as the optimal segmentation result of the text.
Optionally, the translation quality score of each sentence may be determined with a machine translation evaluation method, which may be automatic and/or manual. An automatic method may use a pre-built evaluation set that contains source-language input sentences and reference translations. It then scores a sentence by the n-grams that the sentence's machine translation shares with the reference translation. An n-gram is a sequence of n consecutive words; for example, "love / hometown" is a bigram and "like / eating / apples" is a trigram. Any machine translation evaluation method is feasible. The embodiments do not limit how the translation quality score of each sentence is determined.
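The following sketch shows the n-gram overlap idea above as clipped n-gram precision. It is a simplified, single-order version of what BLEU-style metrics compute. Full BLEU combines several n-gram orders with a geometric mean and applies a brevity penalty, so a real system would more likely call an established implementation such as sacreBLEU. The function names here are hypothetical.

```python
from collections import Counter
from typing import List


def ngrams(tokens: List[str], n: int) -> Counter:
    # Count every run of n consecutive tokens (empty when len(tokens) < n).
    return Counter(tuple(tokens[j:j + n]) for j in range(len(tokens) - n + 1))


def clipped_ngram_precision(hypothesis: List[str], reference: List[str], n: int) -> float:
    """Share of hypothesis n-grams that also occur in the reference, clipped by reference counts."""
    hyp, ref = ngrams(hypothesis, n), ngrams(reference, n)
    total = sum(hyp.values())
    if total == 0:
        return 0.0
    # Clipping: a repeated n-gram earns credit at most as often as the reference contains it.
    overlap = sum(min(count, ref[gram]) for gram, count in hyp.items())
    return overlap / total
```

Clipping prevents a hypothesis such as "the the the" from scoring full unigram precision against "the cat".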
Optionally, the scores of all sentences in a segmentation result may be combined by summing them, multiplying them or taking a weighted average. The embodiments do not limit the specific combination process.
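The three combination rules named above can be sketched as follows. Two points are assumptions rather than statements from the source. First, summation suits log-domain scores, while the product is only meaningful for scores in (0, 1] such as probabilities, because multiplying negative scores flips signs. Second, weighting the average by sentence length is only one possible choice.

```python
import math
from typing import Optional, Sequence


def merge_sum(scores: Sequence[float]) -> float:
    # Natural for log-domain scores such as log-probabilities.
    return sum(scores)


def merge_product(scores: Sequence[float]) -> float:
    # Assumes scores lie in (0, 1], e.g. probabilities.
    return math.prod(scores)


def merge_weighted_average(scores: Sequence[float],
                           weights: Optional[Sequence[float]] = None) -> float:
    # Equal weights by default; a caller might weight by sentence length instead.
    if weights is None:
        weights = [1.0] * len(scores)
    return sum(s * w for s, w in zip(scores, weights)) / sum(weights)
```

The rule can change which segmentation wins. Under summation of negative scores, every additional sentence contributes another negative term. An average does not penalize the number of sentences in the same way.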
Optimal result acquisition scheme 2
Optimal result acquisition scheme 2: use a dynamic programming algorithm to obtain the optimal segmentation result of the text to be processed from the split points derived from its preset punctuation marks.
Dynamic programming decomposes a problem and defines states and the relations between them, so that the problem can be solved recursively (in a divide-and-conquer manner). In the embodiments, the problem is to find the best combined translation quality over the segmentation results of the whole text. A state is the best combined translation quality over the segmentation results of one subset in the clause-sequence set of the text. Scheme 1 enumerates every segmentation result exhaustively and evaluates the combined quality of each. The dynamic programming algorithm of scheme 2 needs less computation, and the saving grows as the number of preset punctuation marks in the text increases.
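The size of that saving can be estimated with a rough count. The sketch below counts only how many times a sentence score is consumed. Scheme 1 consumes one score per sentence in each of the 2^(M-1) segmentations. The recursion in scheme 2 evaluates one transition per pair 0 ≤ k < i ≤ M. The counting model is an assumption. It ignores the cost of producing each score. Both schemes also touch the same M(M+1)/2 distinct clause spans, so a brute-force implementation that caches span scores would not translate more spans than dynamic programming does.

```python
from itertools import product


def brute_force_lookups(m: int) -> int:
    """Sentence scores consumed by scheme 1: one per sentence, over all 2**(m-1) segmentations."""
    total = 0
    for choices in product([False, True], repeat=m - 1):
        total += sum(choices) + 1  # c cuts yield c + 1 sentences
    return total


def dp_transitions(m: int) -> int:
    """Transitions F[k] + NMT_score(k, i) evaluated by the recursion: pairs 0 <= k < i <= m."""
    return sum(i for i in range(1, m + 1))
```

For M = 3 the counts are 8 and 6. For M = 10 they are 2816 and 55, and the gap keeps widening as M grows.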
Optionally, using dynamic programming to obtain the optimal segmentation result may specifically include three steps. First, determine the clause-sequence set of the text from the preset punctuation marks it contains. Second, process the subsets of that set in increasing order of size, and recursively determine for each subset the backtracking split point of its optimal segmentation result. Third, obtain the optimal segmentation result of the text from the backtracking split points of the subsets.
The clause-sequence set is the set of sequences of consecutive clauses contained in the text to be processed. Optionally, each clause sequence in the set consists of the first i consecutive clauses of the text. For example, the clause-sequence set of a text [C1C2…CM] may be {C1, C1C2, C1C2C3, …, C1C2…CM}. Ordered by increasing sequence length, meaning the number of clauses a sequence contains, its subsets are {C1}, {C1C2}, {C1C2C3}, …, {C1C2…CM}. Adjacent clauses within a subset may be joined by preset punctuation marks. Optionally, each subset contains one clause sequence. Here Ci denotes the i-th clause of the text, i is an integer with 1 ≤ i ≤ M, and M, a positive integer, is the number of clauses in the text.
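The clause-sequence set can be built in two steps. First, split the text at the preset punctuation marks. Then take the prefixes C1, C1C2, …, C1…CM. The source does not list the preset marks, so `PRESET_MARKS` below is a hypothetical choice covering full-width and ASCII commas and semicolons. The function names are also hypothetical.

```python
import re
from typing import List

# Hypothetical preset punctuation marks; the source does not enumerate them.
PRESET_MARKS = "，,；;"
_SPLITTER = re.compile("[" + re.escape(PRESET_MARKS) + "]")


def split_clauses(text: str) -> List[str]:
    # Split at every preset mark and drop empty pieces (e.g. after a trailing comma).
    return [piece.strip() for piece in _SPLITTER.split(text) if piece.strip()]


def prefix_sequences(clauses: List[str]) -> List[List[str]]:
    # Subsets in increasing length: [C1], [C1, C2], ..., [C1, ..., CM].
    return [clauses[:i] for i in range(1, len(clauses) + 1)]
```

For the text "A，B，C" this gives the clauses A, B and C and the prefixes [A], [A, B] and [A, B, C].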
Each subset of the clause-sequence set also has segmentation results, each with a combined translation quality. The embodiments can therefore determine, for each subset, the backtracking split point of its optimal segmentation result. This point indicates the preset punctuation mark at which the subset is divided when its segmentation is optimal. For example, suppose the optimal segmentation result of subset {C1C2C3} is {(C1), (C2C3)}. The subset is then divided after C1, and the backtracking split point may be written as 1, the index of C1. The embodiments do not limit how backtracking split points are represented.
The embodiments may process the subsets of the clause-sequence set in increasing order of size and recursively determine the backtracking split point of each subset's optimal segmentation result. Denote the subsets in this order by G1, G2, G3, …, Gu, where u is a positive integer. The backtracking split points of G1, G2, G3, …, Gu are then obtained in turn. For Go (1 ≤ o ≤ u), the optimal segmentation results of the preceding subsets, such as Go-1 and Go-2, are used to determine the backtracking split point of Go.
In an optional embodiment of the present invention, the subsets of the clause-sequence set may be the first i clauses of the text. The best combined translation quality score of the first i clauses is denoted F(i), with 0 ≤ i ≤ M, where M is the number of clauses in the text. Processing the subsets in increasing order and recursively determining each subset's backtracking split point may then specifically include:
Segmenting the first i clauses at a split point k. This gives the best combined translation quality score F(k) of the first semantic unit and the translation quality score of the second semantic unit. The first semantic unit consists of the clauses among the first i that lie before split point k, the second consists of those that lie after it, and 0 ≤ k < i;
Combining F(k) with the translation quality score of the second semantic unit, to obtain the combined translation quality score of the first i clauses for split point k;
Selecting, from the candidate split points k of the first i clauses, the target split point k' whose combined translation quality score is best. In practice there may be one or more candidate split points k and one or more target split points k', but the target split points always form a subset of the candidates. For example, if the candidate split points are {0, 1, …, i − 1}, the target split points may be {0, 1};
Taking the target split point k' as the backtracking split point of the optimal segmentation result of the first i clauses, and taking the combined translation quality score at k' as their best combined score F(i).
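The selection of target split points k' described in these steps can be sketched as an argmax that keeps ties. The input maps each candidate k to its combined score F(i, k). The function name is hypothetical, and the exact float comparison assumes the scores are directly comparable. With real-valued model scores an implementation might compare within a tolerance instead.

```python
from typing import Dict, List


def target_split_points(candidates: Dict[int, float]) -> List[int]:
    """Return every split point k' whose combined score F(i, k) is maximal (ties kept)."""
    best = max(candidates.values())
    return sorted(k for k, score in candidates.items() if score == best)
```

The three candidates for i = 3 in the worked example later in this section are {0: -30, 1: -15, 2: -22}. For them the function returns [1]. A tie such as {0: -5, 1: -5, 2: -7} returns [0, 1]. A single Index[i] value, as in formula (2), would keep just one of the tied points.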
In the embodiments of the present invention, a semantic unit is a unit that expresses one meaning. The first and second semantic units are the two units obtained by segmenting the first i clauses at split point k. The first semantic unit consists of the clauses before k, and the second consists of the clauses after k. The embodiments do not limit the number of clauses in either unit, and each may contain one or more clauses.
F(k) denotes the best combined translation quality score of the first k clauses. In practice, F may be given preset initial values, for example F[0] = 0 and F[k] = -INF (negative infinity) for k > 0. The embodiments do not limit these initial values. F(0) therefore takes its preset value directly. For k > 0 the preset value only initializes F(k), and its final value is obtained iteratively, for example with formula (1) below.
Suppose the best combined score of the first semantic unit is F(k) and the translation quality score of the second semantic unit is NMT_score(k, i). F(k) and NMT_score(k, i) may then be combined by summing them, multiplying them or taking a weighted average. The embodiments do not limit the specific combination process.
In practice, the split point k for the first i clauses may lie at any position among them. For subset {C1C2C3}, for example, k may be 0, 1 or 2. Accordingly, the target split point with the best combined translation quality score is selected from the candidate split points k by comparing the combined score F(i, k) of the first i clauses at each k.
In the embodiments, the best combined translation quality score is the largest combined score. Assuming F(i, k) = F[k] + NMT_score(k, i), the best combined score of the first i clauses and the corresponding target split point can be expressed as:
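$$F[i] = \max_{0 \le k < i} \big( F[k] + \mathrm{NMT\_score}(k, i) \big) \qquad (1)$$
$$\mathrm{Index}[i] = \operatorname*{argmax}_{0 \le k < i} \big( F[k] + \mathrm{NMT\_score}(k, i) \big) \qquad (2)$$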
Index[i] denotes the value of k that maximizes F[k] + NMT_score(k, i). In practice, F(i) and its backtracking split point are solved recursively for the first i clauses in increasing order of i.
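Formulas (1) and (2) translate into a compact forward pass. The sketch below assumes the summation rule F(i, k) = F[k] + NMT_score(k, i) and uses the initial values F[0] = 0 and F[i] = -INF. `nmt_score` is a hypothetical caller-supplied function that scores clauses k+1 through i as one sentence. The `scores` dictionary simply copies the example values given later in this section.

```python
import math
from typing import Callable, List, Tuple


def forward_pass(m: int, nmt_score: Callable[[int, int], float]) -> Tuple[List[float], List[int]]:
    """Fill F[0..m] and Index[0..m] following formulas (1) and (2)."""
    F = [0.0] + [-math.inf] * m   # F[0] = 0, F[i] = -INF for i > 0
    index = [-1] * (m + 1)        # Index[0] is unused; -1 means "no split point"
    for i in range(1, m + 1):     # increasing i, so every F[k] with k < i is already final
        for k in range(i):        # candidate split points 0 <= k < i
            candidate = F[k] + nmt_score(k, i)
            if candidate > F[i]:  # strict '>' keeps the smallest k on ties
                F[i], index[i] = candidate, k
    return F, index


scores = {(0, 1): -10, (1, 2): -15, (2, 3): -20, (0, 2): -2, (1, 3): -5, (0, 3): -30}
F, index = forward_pass(3, lambda k, i: scores[(k, i)])
```

Processing i in increasing order guarantees that each F[k] is final before it is read. This is the ordering requirement stated above.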
Optionally, the method for the embodiment of the present invention can also include:Each subset of the subordinate sentence arrangement set is corresponded to most The backtracking cut-point of excellent subset punctuate result is recorded;Alternatively, information to each subset of the subordinate sentence arrangement set and its Mapping relations between the backtracking cut-point of corresponding optimal subset punctuate result are recorded, to obtain corresponding record content. Wherein, the information of the subset of above-mentioned subordinate sentence arrangement set may include:The number information of the corresponding end subordinate sentence of subset, and/or, Corresponding number information of subset etc..For example, for preceding i subordinate sentence, corresponding number information can be i, correspond to end point The information etc. of sentence namely i-th of subordinate sentence.It is appreciated that the embodiment of the present invention does not limit the specifying information of subset.
In an optional embodiment of the present invention, obtaining the optimal segmentation result of the text from the backtracking split points of the subsets may specifically include:
Backtracking through the recorded backtracking split points of the subsets of the clause-sequence set, to obtain the backtracking split points of the optimal segmentation result of the largest subset;
Segmenting the text to be processed at those backtracking split points, to obtain its optimal segmentation result.
Optionally, backtracking through the recorded backtracking split points may specifically include:
Obtaining the first backtracking split point P1 of the first i clauses;
Obtaining the second backtracking split point P2 of the clauses that lie before P1 in the text.
In practice, backtracking proceeds in decreasing order of i, starting from the first M clauses. First, the first backtracking split point P1 of the first M clauses is determined, for example by looking it up in the record content described above. P1 locates the last split in the optimal segmentation of the first M clauses. Next, the second backtracking split point P2 of the first P1 clauses is looked up in the record content in the same way. P2 locates the last split in the optimal segmentation of the first P1 clauses. Backtracking ends when P1 or P2 equals 0 and otherwise continues.
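The backtracking loop above can be sketched as follows, assuming the recorded split points are stored in an `index` list indexed by i. The function name is hypothetical. The `index` values used here are the Index[] entries from the worked example that follows, copied by hand.

```python
from typing import List, Sequence, Tuple


def backtrack(index: Sequence[int], clauses: Sequence[str]) -> Tuple[List[int], List[List[str]]]:
    """Follow Index[] from i = M down to 0; return split points P1, P2, ... and the sentences."""
    points: List[int] = []
    sentences: List[List[str]] = []
    i = len(clauses)
    while i > 0:
        k = index[i]                              # backtracking split point of the first i clauses
        points.append(k)
        sentences.insert(0, list(clauses[k:i]))   # clauses k+1..i form the last sentence
        i = k                                     # continue with the first k clauses; stop at 0
    return points, sentences


points, sentences = backtrack([-1, 0, 0, 1], ["A", "B", "C"])
```

Each step peels off the last sentence of the current prefix. The loop therefore runs at most M times, once per sentence in the result.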
To help those skilled in the art understand the segmentation process of the embodiments, the following example processes the text [A, B, C]. The process may specifically include the following steps:
Step S1: obtain the clause-sequence set {[A], [A, B], [A, B, C]} of the text [A, B, C];
Let S(i, j) denote the clause sequence formed by the (i+1)-th through the j-th clauses. Then S(0, 1) = A, S(1, 2) = B, S(2, 3) = C, S(0, 2) = A, B, S(1, 3) = B, C, and S(0, 3) = A, B, C.
Further assume that the translation quality scores of the sentences S(i, j) are:
NMT_score(0, 1) = -10
NMT_score(1, 2) = -15
NMT_score(2, 3) = -20
NMT_score(0, 2) = -2
NMT_score(1, 3) = -5
NMT_score(0, 3) = -30
Step S2: let F(i) denote the best combined translation quality score of the first i clauses, with initial values F[0] = 0 and F[i] = -INF (negative infinity) for i > 0;
Step S3: for i = 0, the best combined score of the first 0 clauses is F(0) = 0;
Step S4: for i = 1, the only candidate split point is k = 0, so
F[1] = max(F[0] + NMT_score(0, 1)) = -10
Index[1] = 0;
Step S5: for i = 2, the candidate split points are k = 0, 1, so
F[2] = max(F[0] + NMT_score(0, 2), F[1] + NMT_score(1, 2)) = max(-2, -25) = F[0] + NMT_score(0, 2) = -2
Index[2] = 0;
Step S6: for i = 3, the candidate split points are k = 0, 1, 2, so
F[3] = max(F[0] + NMT_score(0, 3), F[1] + NMT_score(1, 3), F[2] + NMT_score(2, 3)) = max(-30, -15, -22) = F[1] + NMT_score(1, 3) = -15
Index[3] = 1;
Step S7: backtrack from the backtracking split point of F(3);
First, the backtracking split point of F(3) is obtained as P1 = Index[3] = 1. Then the backtracking split point of F(1) is obtained as P2 = Index[1] = 0, and backtracking ends. The text [A, B, C] is therefore segmented into 2 sentences at split points P = 0 and P = 1. One sentence begins at the start of the text and the other after the 1st clause, giving the optimal segmentation result "A" and "B, C".
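As a cross-check that is not part of the original example, the combined score of each of the four segmentations of [A, B, C] can be computed directly with the summation rule, as scheme 1 would:

$$
\begin{aligned}
\{(A, B, C)\}:&\quad \mathrm{NMT\_score}(0,3) = -30\\
\{(A), (B, C)\}:&\quad \mathrm{NMT\_score}(0,1) + \mathrm{NMT\_score}(1,3) = -10 - 5 = -15\\
\{(A, B), (C)\}:&\quad \mathrm{NMT\_score}(0,2) + \mathrm{NMT\_score}(2,3) = -2 - 20 = -22\\
\{(A), (B), (C)\}:&\quad \mathrm{NMT\_score}(0,1) + \mathrm{NMT\_score}(1,2) + \mathrm{NMT\_score}(2,3) = -10 - 15 - 20 = -45
\end{aligned}
$$

The maximum, -15, belongs to {(A), (B, C)}. This agrees with F[3] = -15 and with the backtracked result "A" and "B, C", so on this example the dynamic programming recursion reaches the same global optimum as exhaustive enumeration.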
The text [A, B, C] is only an illustrative example. Those skilled in the art may process any text according to application requirements to obtain its optimal segmentation result. Consider the six-clause text [A, B, C, D, E, F]: "Sanders said that during the general election Trump had promised that after taking office he would not abolish Social Security, Medicare or Medicaid; however, the people he has now appointed are precisely those who advocate abolishing these programs". Its segmentation result may be "A, B, C, D" and "E, F".
In summary, the processing method of the embodiments obtains the optimal segmentation result of the text to be processed from the split points derived from the preset punctuation marks it contains. This result has the best combined translation quality, where the result may include at least one sentence and the combined quality combines the translation qualities of all its sentences. The optimal segmentation result therefore reaches a global optimum of combined translation quality. As a result, it can improve the translation quality of the segmentation result of the text to be processed.
It should be noted that, for simplicity of description, the method embodiments are expressed as a series of combined actions. Those skilled in the art should understand, however, that the embodiments are not limited by the described order of actions, since some steps may be performed in other orders or simultaneously. Those skilled in the art should also understand that the embodiments described in this specification are preferred embodiments, and the actions involved are not necessarily required by the embodiments of the present invention.
Apparatus Embodiment
Fig. 4 shows a structural block diagram of an embodiment of a processing apparatus of the present invention. The apparatus may specifically include:
a text acquisition module 401, configured to obtain a text to be processed;
an optimal segmentation result acquisition module 402, configured to obtain the optimal segmentation result of the text to be processed from the split points derived from the preset punctuation marks the text contains. The optimal segmentation result has the best combined translation quality and may include at least one sentence, and the combined translation quality combines the translation qualities of all sentences in a segmentation result; and
an optimal segmentation result output module 403, configured to output the optimal segmentation result of the text to be processed.
Optionally, the optimal segmentation result acquisition module 402 may include:
A dynamic programming acquisition submodule, configured to use a dynamic programming algorithm to obtain the optimal segmentation result corresponding to the text to be processed according to cut points derived from the preset punctuation marks contained in the text.
Optionally, the dynamic programming acquisition submodule may include:
A clause sequence set determination unit, configured to determine, according to the preset punctuation marks contained in the text to be processed, the clause sequence set corresponding to the text;
A recurrence unit, configured to determine, recursively and in ascending order of the subsets of the clause sequence set, the backtrack cut point of the optimal subset segmentation result for each subset; and
An optimal segmentation result acquisition unit, configured to obtain the optimal segmentation result corresponding to the text to be processed from the backtrack cut points of the optimal subset segmentation results of the subsets of the clause sequence set.
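As a minimal sketch of how the clause sequence set might be built, the Python below splits the text after every preset punctuation mark and keeps each mark with the clause it ends. The function name `split_into_clauses` and the `PRESET_MARKS` string are hypothetical; including both ASCII and full-width (Chinese) forms of the comma and semicolon is an assumption, not something the text specifies.

```python
import re

# Hypothetical preset marks: ASCII and full-width comma and semicolon.
# The exact set of marks is an assumption made for this illustration.
PRESET_MARKS = ",;，；"


def split_into_clauses(text: str, marks: str = PRESET_MARKS) -> list[str]:
    """Split text after every preset punctuation mark.

    Each clause keeps its trailing mark, so "".join(result) == text.
    The boundaries between consecutive clauses are the candidate cut points.
    """
    pattern = "[" + re.escape(marks) + "]"
    clauses = []
    start = 0
    for match in re.finditer(pattern, text):
        clauses.append(text[start:match.end()])  # clause ends at the mark
        start = match.end()
    if start < len(text):                        # trailing text without a mark
        clauses.append(text[start:])
    return clauses
```

With M clauses, cut point k (0 ≤ k ≤ M) falls after the k-th clause, so k = 0 is the start of the text and k = M is its end.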
Optionally, a subset of the clause sequence set may consist of the first i clauses of the text to be processed, and the optimal subset overall translation quality score of the first i clauses is denoted F(i), where 0 ≤ i ≤ M and M is the number of clauses in the text. The recurrence unit may then include:
A subset segmentation subunit, configured to segment the first i clauses at cut point k, obtaining, for the first i clauses and cut point k, the optimal subset overall translation quality score F(k) of the first semantic unit and the translation quality score of the second semantic unit; where the first semantic unit may include the clauses of the first i clauses located before cut point k, the second semantic unit may include the clauses of the first i clauses located after cut point k, and 0 ≤ k < i;
A quality combination subunit, configured to combine F(k) with the translation quality score of the second semantic unit to obtain the overall translation quality score for the first i clauses and cut point k;
A target cut point acquisition subunit, configured to obtain, according to the overall translation quality scores for the first i clauses and each cut point k, the target cut point with the optimal overall translation quality score from the at least one cut point k available for the first i clauses;
A backtrack cut point acquisition subunit, configured to take the target cut point as the backtrack cut point of the optimal subset segmentation result for the first i clauses, and to take the overall translation quality score of the target cut point as the optimal subset overall translation quality score F(i) of the first i clauses.
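The recurrence carried out by these subunits can be sketched as a forward dynamic-programming pass. This is an illustration under two assumptions the text leaves open: `quality` is a hypothetical callable returning a translation quality score for one candidate sentence (in practice it might wrap a translation system and a quality estimator), and "combining" F(k) with the second semantic unit's score means adding them. In the sketch, `best[i]` plays the role of F(i) and `back[i]` stores the backtrack cut point of the first i clauses.

```python
from typing import Callable, List, Sequence, Tuple


def dp_forward(clauses: Sequence[str],
               quality: Callable[[str], float]) -> Tuple[List[float], List[int]]:
    """Compute F(i) and the backtrack cut point B(i) for every prefix i = 0..M.

    Assumption: the overall quality of a segmentation is the SUM of its
    sentences' scores, so F(i) = max over k of F(k) + quality(clauses k+1..i).
    """
    m = len(clauses)
    best = [0.0] + [float("-inf")] * m  # F(0) = 0 for the empty prefix
    back = [0] * (m + 1)                # B(i): chosen cut point for prefix i
    for i in range(1, m + 1):           # subsets in ascending order
        for k in range(i):              # candidate cut points 0 <= k < i
            # First semantic unit: clauses[:k], already scored as F(k).
            # Second semantic unit: clauses[k:i], scored as one sentence.
            score = best[k] + quality("".join(clauses[k:i]))
            if score > best[i]:         # keep the target (best) cut point
                best[i], back[i] = score, k
    return best, back
```

Because F(k) is final for every k < i by the time prefix i is processed, each prefix is solved once, and the pass makes M(M+1)/2 calls to `quality`.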
Optionally, the optimal segmentation result acquisition unit may include:
A backtracking subunit, configured to trace back through the backtrack cut points of the optimal subset segmentation results of the subsets of the clause sequence set, thereby obtaining the backtrack cut points of the optimal subset segmentation result for the largest subset of the clause sequence set;
A backtrack segmentation subunit, configured to segment the text to be processed at the backtrack cut points of the optimal subset segmentation result for the largest subset of the clause sequence set, thereby obtaining the optimal segmentation result corresponding to the text.
Optionally, the backtracking subunit may include:
A first backtracking unit, configured to obtain the first backtrack cut point P1 corresponding to the first i clauses;
A second backtracking unit, configured to obtain the second backtrack cut point P2 corresponding to the clauses of the text to be processed located before the first backtrack cut point P1.
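The backtracking step can be sketched as follows: start from the whole text (i = M), read the first backtrack cut point P1 = back[M], emit clauses P1+1 through M as the last sentence, then continue from P2 = back[P1], and so on until cut point 0 is reached. `backtrack` is a hypothetical helper; it assumes a `back` list from a forward pass of the kind described above, in which every entry satisfies 0 ≤ back[i] < i.

```python
def backtrack(back: list[int], clauses: list[str]) -> list[str]:
    """Recover the optimal sentences by following backtrack cut points.

    P1 = back[M] is the cut before the last sentence, P2 = back[P1], ...,
    until the chain reaches cut point 0 (the start of the text).
    """
    sentences = []
    i = len(clauses)
    while i > 0:
        k = back[i]                    # backtrack cut point for prefix i
        if not 0 <= k < i:             # guard against a malformed table
            raise ValueError(f"invalid backtrack cut point {k} for prefix {i}")
        sentences.append("".join(clauses[k:i]))
        i = k
    sentences.reverse()                # sentences were collected last-to-first
    return sentences
```

Only the M + 1 stored cut points are needed here, so recovering the segmentation takes time linear in the number of clauses.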
Optionally, the optimal segmentation result acquisition module 402 may include:
An exhaustive submodule, configured to segment the text to be processed at cut points derived from the preset punctuation marks contained in the text, obtaining multiple segmentation results corresponding to the text;
An overall quality determination submodule, configured to determine the overall translation quality of each segmentation result;
A result selection submodule, configured to select, from the multiple segmentation results corresponding to the text to be processed, the segmentation result with the best overall translation quality as the optimal segmentation result corresponding to the text.
Optionally, the preset punctuation marks may include a comma and/or a semicolon.
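The exhaustive alternative can be sketched by enumerating every subset of the M − 1 interior cut points, which yields all 2^(M−1) segmentation results of M clauses. Here `quality` is a hypothetical per-sentence scorer, and summing the sentence scores is an assumed way of combining them into the overall translation quality; the text does not fix either.

```python
from itertools import combinations
from typing import Callable, List, Sequence, Tuple


def best_segmentation_exhaustive(
        clauses: Sequence[str],
        quality: Callable[[str], float]) -> Tuple[List[str], float]:
    """Try all 2^(M-1) segmentations and return the best one and its score.

    Assumption: overall quality = sum of per-sentence quality scores.
    """
    m = len(clauses)
    if m == 0:
        return [], 0.0
    best_sents: List[str] = []
    best_score = float("-inf")
    interior = range(1, m)                     # cut points strictly inside the text
    for r in range(m):                         # number of interior cuts: 0..M-1
        for cuts in combinations(interior, r):
            bounds = [0, *cuts, m]
            sents = ["".join(clauses[a:b]) for a, b in zip(bounds, bounds[1:])]
            score = sum(quality(s) for s in sents)
            if score > best_score:             # keep the best result seen so far
                best_sents, best_score = sents, score
    return best_sents, best_score
```

This route is simple but its cost doubles with every extra clause, which is why the dynamic-programming embodiment is the more practical choice for long texts.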
The apparatus embodiment is basically similar to the method embodiment, so it is described relatively briefly; for relevant details, refer to the description of the method embodiment.
The embodiments in this specification are described progressively: each embodiment focuses on its differences from the other embodiments, and for the parts that are the same or similar, the embodiments may refer to one another.
For the apparatus in the above embodiment, the specific manner in which each module performs its operations has been described in detail in the corresponding method embodiment and will not be elaborated here.
Fig. 5 is a block diagram of a device for processing, implemented as a terminal, according to an exemplary embodiment. For example, the terminal 900 may be a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, a fitness device, a personal digital assistant, or the like.
Referring to Fig. 5, the terminal 900 may include one or more of the following components: a processing component 902, a memory 904, a power component 906, a multimedia component 908, an audio component 910, an input/output (I/O) interface 912, a sensor component 914, and a communication component 916.
The processing component 902 typically controls the overall operation of the terminal 900, such as operations associated with display, telephone calls, data communication, camera operation, and recording. The processing component 902 may include one or more processors 920 that execute instructions to perform all or some of the steps of the above method. In addition, the processing component 902 may include one or more modules that facilitate interaction between the processing component 902 and other components. For example, the processing component 902 may include a multimedia module to facilitate interaction between the multimedia component 908 and the processing component 902.
The memory 904 is configured to store various types of data to support operation of the terminal 900. Examples of such data include instructions for any application or method operating on the terminal 900, contact data, phonebook data, messages, pictures, videos, and so on. The memory 904 may be implemented by any type of volatile or non-volatile storage device or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, a magnetic disk, or an optical disc.
The power component 906 supplies power to the various components of the terminal 900. The power component 906 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the terminal 900.
The multimedia component 908 includes a screen that provides an output interface between the terminal 900 and the user. In some embodiments, the screen may include a liquid crystal display (LCD) and a touch panel (TP). If the screen includes a touch panel, it may be implemented as a touch screen to receive input signals from the user. The touch panel includes one or more touch sensors to sense touches, swipes, and gestures on the touch panel. The touch sensors may sense not only the boundary of a touch or swipe action but also the duration and pressure associated with it. In some embodiments, the multimedia component 908 includes a front camera and/or a rear camera. When the terminal 900 is in an operating mode, such as a shooting mode or a video mode, the front camera and/or the rear camera may receive external multimedia data. Each of the front and rear cameras may be a fixed optical lens system or may have focusing and optical zoom capabilities.
The audio component 910 is configured to output and/or input audio signals. For example, the audio component 910 includes a microphone (MIC) configured to receive external audio signals when the terminal 900 is in an operating mode, such as a call mode, a recording mode, or a speech recognition mode. The received audio signals may be further stored in the memory 904 or sent via the communication component 916. In some embodiments, the audio component 910 further includes a speaker for outputting audio signals.
The I/O interface 912 provides an interface between the processing component 902 and peripheral interface modules, which may be a keyboard, a click wheel, buttons, and the like. These buttons may include, but are not limited to, a home button, volume buttons, a power button, and a lock button.
The sensor component 914 includes one or more sensors for providing status assessments of various aspects of the terminal 900. For example, the sensor component 914 may detect the open/closed state of the terminal 900 and the relative positioning of components, such as the display and keypad of the terminal 900; it may also detect a change in position of the terminal 900 or of one of its components, the presence or absence of user contact with the terminal 900, the orientation or acceleration/deceleration of the terminal 900, and a change in its temperature. The sensor component 914 may include a proximity sensor configured to detect the presence of nearby objects without any physical contact. The sensor component 914 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor component 914 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 916 is configured to facilitate wired or wireless communication between the terminal 900 and other devices. The terminal 900 may access a wireless network based on a communication standard, such as WiFi, 2G, 3G, or a combination thereof. In an exemplary embodiment, the communication component 916 receives a broadcast signal or broadcast-related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 916 further includes a near-field communication (NFC) module to facilitate short-range communication. For example, the NFC module may be implemented based on radio frequency identification (RFID) technology, Infrared Data Association (IrDA) technology, ultra-wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the terminal 900 may be implemented by one or more application-specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field-programmable gate arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic components for performing the above method.
In an exemplary embodiment, a non-transitory computer-readable storage medium including instructions is also provided, such as the memory 904 including instructions, which can be executed by the processor 920 of the terminal 900 to perform the above method. For example, the non-transitory computer-readable storage medium may be a ROM, a random access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, or the like.
Fig. 6 is a block diagram of a device for processing, implemented as a server, according to an exemplary embodiment. The server 1900 may vary considerably in configuration or performance and may include one or more central processing units (CPUs) 1922 (for example, one or more processors), a memory 1932, and one or more storage media 1930 (for example, one or more mass storage devices) storing application programs 1942 or data 1944. The memory 1932 and the storage medium 1930 may provide transient or persistent storage. A program stored on the storage medium 1930 may include one or more modules (not shown), each of which may include a series of instruction operations for the server. Further, the central processing unit 1922 may be configured to communicate with the storage medium 1930 and to execute, on the server 1900, the series of instruction operations in the storage medium 1930.
The server 1900 may also include one or more power supplies 1926, one or more wired or wireless network interfaces 1950, one or more input/output interfaces 1958, one or more keyboards 1956, and/or one or more operating systems 1941, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, and the like.
In an exemplary embodiment, a non-transitory computer-readable storage medium including instructions is also provided, such as the memory 1932 including instructions, which can be executed by the processor 1922 of the server 1900 to perform the above method. For example, the non-transitory computer-readable storage medium may be a ROM, a random access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, or the like.
A non-transitory computer-readable storage medium is provided such that, when the instructions in the storage medium are executed by a processor of a server, a device (server or terminal) is enabled to perform a processing method comprising: acquiring text to be processed; obtaining the optimal segmentation result corresponding to the text according to cut points derived from the preset punctuation marks contained in the text, where the optimal segmentation result has the best overall translation quality and includes at least one sentence, and the overall translation quality is the combination of the translation qualities of all sentences contained in the optimal segmentation result; and outputting the optimal segmentation result corresponding to the text to be processed.
Optionally, obtaining the optimal segmentation result corresponding to the text to be processed according to the cut points derived from the preset punctuation marks contained in the text includes: using a dynamic programming algorithm to obtain the optimal segmentation result corresponding to the text according to those cut points.
Optionally, using a dynamic programming algorithm to obtain the optimal segmentation result corresponding to the text to be processed according to the cut points derived from the preset punctuation marks contained in the text includes:
Determining, according to the preset punctuation marks contained in the text to be processed, the clause sequence set corresponding to the text;
Determining, recursively and in ascending order of the subsets of the clause sequence set, the backtrack cut point of the optimal subset segmentation result for each subset, where the optimal subset segmentation result has the best overall translation quality;
Obtaining the optimal segmentation result corresponding to the text to be processed from the backtrack cut points of the optimal subset segmentation results of the subsets of the clause sequence set.
Optionally, a subset of the clause sequence set consists of the first i clauses of the text to be processed, and the optimal subset overall translation quality score of the first i clauses is denoted F(i), where 0 ≤ i ≤ M and M is the number of clauses in the text. Determining, recursively and in ascending order of the subsets of the clause sequence set, the backtrack cut point of the optimal subset segmentation result for each subset then includes:
Segmenting the first i clauses at cut point k to obtain, for the first i clauses and cut point k, the optimal subset overall translation quality score F(k) of the first semantic unit and the translation quality score of the second semantic unit; where the first semantic unit includes the clauses of the first i clauses located before cut point k, the second semantic unit includes the clauses of the first i clauses located after cut point k, and 0 ≤ k < i;
Combining F(k) with the translation quality score of the second semantic unit to obtain the overall translation quality score for the first i clauses and cut point k;
Obtaining, according to the overall translation quality scores for the first i clauses and each cut point k, the target cut point with the optimal overall translation quality score from the at least one cut point k available for the first i clauses;
Taking the target cut point as the backtrack cut point of the optimal subset segmentation result for the first i clauses, and taking the overall translation quality score of the target cut point as the optimal subset overall translation quality score F(i) of the first i clauses.
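Assuming that "combining" F(k) with the score of the second semantic unit means adding the two (the text does not fix the combination rule), the steps above can be written as the recurrence below. Here s_j is the j-th clause, Q(·) is a hypothetical translation quality score of a single sentence, and B(i) is the backtrack cut point of the first i clauses.

```latex
\begin{aligned}
F(0) &= 0,\\
F(i) &= \max_{0 \le k < i}\bigl[\,F(k) + Q(s_{k+1} \cdots s_i)\,\bigr], \qquad 1 \le i \le M,\\
B(i) &= \operatorname*{arg\,max}_{0 \le k < i}\bigl[\,F(k) + Q(s_{k+1} \cdots s_i)\,\bigr],\\
P_1 &= B(M), \quad P_2 = B(P_1), \quad \ldots \quad \text{until } P_r = 0.
\end{aligned}
```

The forward pass evaluates Q once per pair (k, i), that is M(M+1)/2 times, whereas the exhaustive alternative scores all 2^(M−1) possible segmentations of M clauses.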
Optionally, obtaining the optimal segmentation result corresponding to the text to be processed from the backtrack cut points of the optimal subset segmentation results of the subsets of the clause sequence set includes: tracing back through the backtrack cut points of the optimal subset segmentation results of the subsets of the clause sequence set to obtain the backtrack cut points of the optimal subset segmentation result for the largest subset of the clause sequence set; and segmenting the text to be processed at those backtrack cut points to obtain the optimal segmentation result corresponding to the text.
Optionally, tracing back through the backtrack cut points of the optimal subset segmentation results of the subsets of the clause sequence set includes: obtaining the first backtrack cut point P1 corresponding to the first i clauses; and obtaining the second backtrack cut point P2 corresponding to the clauses of the text to be processed located before the first backtrack cut point P1.
Optionally, obtaining the optimal segmentation result corresponding to the text to be processed according to the cut points derived from the preset punctuation marks contained in the text includes: segmenting the text at those cut points to obtain multiple segmentation results corresponding to the text; determining the overall translation quality of each segmentation result; and selecting, from the multiple segmentation results, the segmentation result with the best overall translation quality as the optimal segmentation result corresponding to the text.
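If the overall translation quality is the sum of the per-sentence scores (an assumption, since the text leaves the combination open), the dynamic-programming embodiment and the exhaustive embodiment must reach the same optimal score: every segmentation of the first i clauses ends with a last sentence starting after some cut point k, preceded by a segmentation of the first k clauses, and F(k) already covers the best of those. The self-contained check below compares both routes on random per-sentence scores; `random_scorer`, `dp_best`, `brute_best`, and the placeholder clause strings are all hypothetical.

```python
import random
from itertools import product


def dp_best(clauses, quality):
    """Dynamic-programming route: optimal summed score of the whole text."""
    m = len(clauses)
    best = [0.0] + [float("-inf")] * m
    for i in range(1, m + 1):
        for k in range(i):
            best[i] = max(best[i], best[k] + quality("".join(clauses[k:i])))
    return best[m]


def brute_best(clauses, quality):
    """Exhaustive route: switch each of the M-1 interior cut points on or off."""
    m = len(clauses)
    if m == 0:
        return 0.0
    top = float("-inf")
    for flags in product((False, True), repeat=m - 1):
        bounds = [0] + [j + 1 for j, cut in enumerate(flags) if cut] + [m]
        total = sum(quality("".join(clauses[a:b])) for a, b in zip(bounds, bounds[1:]))
        top = max(top, total)
    return top


def random_scorer(seed):
    """Hypothetical scorer: a fixed random score for each distinct sentence."""
    rng, cache = random.Random(seed), {}

    def quality(sentence):
        if sentence not in cache:          # same sentence, same score
            cache[sentence] = rng.uniform(-1.0, 1.0)
        return cache[sentence]
    return quality


def agree(seed, m=6):
    """True if both routes find the same optimum for one random scorer."""
    clauses = [f"c{j}," for j in range(m)]
    q = random_scorer(seed)
    return abs(dp_best(clauses, q) - brute_best(clauses, q)) < 1e-9
```

The cache inside `random_scorer` matters: both routes query sentences in different orders, and caching guarantees each sentence string receives one fixed score.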
Optionally, the preset punctuation marks include a comma and/or a semicolon.
Other embodiments of the invention will readily occur to those skilled in the art after considering the specification and practicing the invention disclosed here. The present invention is intended to cover any variations, uses, or adaptations of the invention that follow its general principles and include common knowledge or customary technical means in the art not disclosed herein. The specification and examples are to be regarded as exemplary only, with the true scope and spirit of the invention indicated by the following claims.
It should be understood that the present invention is not limited to the precise structures described above and shown in the accompanying drawings, and that various modifications and changes may be made without departing from its scope. The scope of the present invention is limited only by the appended claims.
The above are merely preferred embodiments of the present invention and are not intended to limit it. Any modification, equivalent replacement, improvement, or the like made within the spirit and principles of the present invention shall fall within its scope of protection.
The processing method, processing apparatus, and device for processing provided by the present invention have been described in detail above. Specific examples have been used herein to explain the principles and implementations of the invention; the description of the above embodiments is only intended to help understand the method of the invention and its core idea. Meanwhile, those of ordinary skill in the art may, following the idea of the invention, make changes to the specific implementations and scope of application. In summary, the content of this specification should not be construed as limiting the present invention.

Claims (10)

1. A processing method, characterized by comprising:
Acquiring text to be processed;
Obtaining, according to cut points derived from the preset punctuation marks contained in the text to be processed, the optimal segmentation result corresponding to the text; wherein the optimal segmentation result has the best overall translation quality and comprises at least one sentence, and the overall translation quality is the combination of the translation qualities of all sentences contained in the segmentation result; and
Outputting the optimal segmentation result corresponding to the text to be processed.
2. The method according to claim 1, wherein obtaining the optimal segmentation result corresponding to the text to be processed according to the cut points derived from the preset punctuation marks contained in the text comprises:
Using a dynamic programming algorithm to obtain the optimal segmentation result corresponding to the text to be processed according to the cut points derived from the preset punctuation marks contained in the text.
3. The method according to claim 2, wherein using the dynamic programming algorithm to obtain the optimal segmentation result corresponding to the text to be processed according to the cut points derived from the preset punctuation marks contained in the text comprises:
Determining, according to the preset punctuation marks contained in the text to be processed, a clause sequence set corresponding to the text;
Determining, recursively and in ascending order of the subsets of the clause sequence set, the backtrack cut point of the optimal subset segmentation result for each subset, wherein the optimal subset segmentation result has the best overall translation quality; and
Obtaining the optimal segmentation result corresponding to the text to be processed according to the backtrack cut points of the optimal subset segmentation results of the subsets of the clause sequence set.
4. The method according to claim 3, wherein a subset of the clause sequence set comprises the first i clauses of the text to be processed, the optimal subset overall translation quality score of the first i clauses is denoted F(i), and 0 ≤ i ≤ M, where M is the number of clauses in the text; and determining, recursively and in ascending order of the subsets of the clause sequence set, the backtrack cut point of the optimal subset segmentation result for each subset comprises:
Segmenting the first i clauses at a cut point k to obtain, for the first i clauses and the cut point k, the optimal subset overall translation quality score F(k) of a first semantic unit and the translation quality score of a second semantic unit; wherein the first semantic unit comprises the clauses of the first i clauses located before the cut point k, the second semantic unit comprises the clauses of the first i clauses located after the cut point k, and 0 ≤ k < i;
Combining F(k) with the translation quality score of the second semantic unit to obtain the overall translation quality score for the first i clauses and the cut point k;
Obtaining, according to the overall translation quality scores for the first i clauses and each cut point k, the target cut point with the optimal overall translation quality score from the at least one cut point k corresponding to the first i clauses; and
Taking the target cut point as the backtrack cut point of the optimal subset segmentation result for the first i clauses, and taking the overall translation quality score corresponding to the target cut point as the optimal subset overall translation quality score F(i) of the first i clauses.
5. The method according to claim 3 or 4, wherein obtaining the optimal segmentation result corresponding to the text to be processed according to the backtrack cut points of the optimal subset segmentation results of the subsets of the clause sequence set comprises:
Tracing back through the backtrack cut points of the optimal subset segmentation results of the subsets of the clause sequence set to obtain the backtrack cut points of the optimal subset segmentation result for the largest subset of the clause sequence set; and
Segmenting the text to be processed at the backtrack cut points of the optimal subset segmentation result for the largest subset of the clause sequence set to obtain the optimal segmentation result corresponding to the text.
6. The method according to claim 5, wherein tracing back through the backtrack cut points of the optimal subset segmentation results of the subsets of the clause sequence set comprises:
Obtaining a first backtrack cut point P1 corresponding to the first i clauses; and
Obtaining a second backtrack cut point P2 corresponding to the clauses of the text to be processed located before the first backtrack cut point P1.
7. The method according to claim 1, wherein obtaining the optimal segmentation result corresponding to the text to be processed according to the cut points derived from the preset punctuation marks contained in the text comprises:
Segmenting the text to be processed at the cut points derived from the preset punctuation marks contained in the text to obtain multiple segmentation results corresponding to the text;
Determining the overall translation quality corresponding to each segmentation result; and
Selecting, from the multiple segmentation results corresponding to the text to be processed, the segmentation result with the best overall translation quality as the optimal segmentation result corresponding to the text.
8. The method according to claim 1, 2, 3, 4, or 7, wherein the preset punctuation marks comprise a comma and/or a semicolon.
9. A processing apparatus, characterized by comprising:
A to-be-processed text acquisition module, configured to acquire text to be processed;
An optimal segmentation result acquisition module, configured to obtain, according to cut points derived from the preset punctuation marks contained in the text to be processed, the optimal segmentation result corresponding to the text; wherein the optimal segmentation result has the best overall translation quality and comprises at least one sentence, and the overall translation quality is the combination of the translation qualities of all sentences contained in the segmentation result; and
An optimal segmentation result output module, configured to output the optimal segmentation result corresponding to the text to be processed.
10. A device for processing, characterized by comprising a memory and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by one or more processors, and the one or more programs comprise instructions for performing the following operations:
Acquiring text to be processed;
Obtaining, according to cut points derived from the preset punctuation marks contained in the text to be processed, the optimal segmentation result corresponding to the text; wherein the optimal segmentation result has the best overall translation quality and comprises at least one sentence, and the overall translation quality is the combination of the translation qualities of all sentences contained in the segmentation result; and
Outputting the optimal segmentation result corresponding to the text to be processed.
CN201710157267.5A 2017-03-16 2017-03-16 Processing method and device for processing Active CN108628819B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710157267.5A CN108628819B (en) 2017-03-16 2017-03-16 Processing method and device for processing

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710157267.5A CN108628819B (en) 2017-03-16 2017-03-16 Processing method and device for processing

Publications (2)

Publication Number Publication Date
CN108628819A 2018-10-09
CN108628819B 2022-09-20

Family

ID=63687489

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710157267.5A Active CN108628819B (en) 2017-03-16 2017-03-16 Processing method and device for processing

Country Status (1)

Country Link
CN (1) CN108628819B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090150139A1 (en) * 2007-12-10 2009-06-11 Kabushiki Kaisha Toshiba Method and apparatus for translating a speech
CN104915264A (en) * 2015-05-29 2015-09-16 北京搜狗科技发展有限公司 Input error-correction method and device
CN105912522A (en) * 2016-03-31 2016-08-31 长安大学 Automatic extraction method and extractor of English corpora based on constituent analyses
CN106484681A (en) * 2015-08-25 2017-03-08 阿里巴巴集团控股有限公司 A kind of method generating candidate's translation, device and electronic equipment

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109408833A (en) * 2018-10-30 2019-03-01 科大讯飞股份有限公司 A kind of interpretation method, device, equipment and readable storage medium storing program for executing
WO2020087655A1 (en) * 2018-10-30 2020-05-07 科大讯飞股份有限公司 Translation method, apparatus and device, and readable storage medium
CN109920406A (en) * 2019-03-28 2019-06-21 国家计算机网络与信息安全管理中心 A kind of dynamic voice recognition methods and system based on variable initial position
CN109920406B (en) * 2019-03-28 2021-12-03 国家计算机网络与信息安全管理中心 Dynamic voice recognition method and system based on variable initial position
CN110321532A (en) * 2019-06-06 2019-10-11 数译(成都)信息技术有限公司 Language pre-processes punctuate method, computer equipment and computer readable storage medium
CN111046649A (en) * 2019-11-22 2020-04-21 北京捷通华声科技股份有限公司 Text segmentation method and device
CN114420102A (en) * 2022-01-04 2022-04-29 广州小鹏汽车科技有限公司 Method and device for speech sentence-breaking, electronic equipment and storage medium
CN114420102B (en) * 2022-01-04 2022-10-14 广州小鹏汽车科技有限公司 Method and device for speech sentence-breaking, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN108628819B (en) 2022-09-20

Similar Documents

Publication Publication Date Title
CN110288077B (en) Method and related device for synthesizing speaking expression based on artificial intelligence
CN107291690A (en) Punctuate adding method and device, the device added for punctuate
CN108628819A (en) Treating method and apparatus, the device for processing
CN107632980A (en) Voice translation method and device, the device for voiced translation
CN107221330A (en) Punctuate adding method and device, the device added for punctuate
CN108628813A (en) Treating method and apparatus, the device for processing
CN108399914B (en) Voice recognition method and device
CN107291704B (en) Processing method and device for processing
CN106202150B (en) Information display method and device
CN107274903A (en) Text handling method and device, the device for text-processing
CN108073572A (en) Information processing method and its device, simultaneous interpretation system
CN110322760A (en) Voice data generation method, device, terminal and storage medium
CN108345612A (en) A kind of question processing method and device, a kind of device for issue handling
CN111583919A (en) Information processing method, device and storage medium
CN108304412A (en) A kind of cross-language search method and apparatus, a kind of device for cross-language search
CN108255940A (en) A kind of cross-language search method and apparatus, a kind of device for cross-language search
CN110069624A (en) Text handling method and device
CN111149172B (en) Emotion management method, device and computer-readable storage medium
CN107564526A (en) Processing method, device and machine readable media
CN109471919B (en) Zero pronoun resolution method and device
CN107424612A (en) Processing method, device and machine readable media
CN109002184A (en) A kind of association method and device of input method candidate word
WO2018214663A1 (en) Voice-based data processing method and apparatus, and electronic device
CN108628461A (en) A kind of input method and device, a kind of method and apparatus of update dictionary
CN113936697A (en) Voice processing method and device for voice processing

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant