CN108628813A - Processing method and apparatus, and device for processing - Google Patents
- Publication number
- CN108628813A CN108628813A CN201710162165.2A CN201710162165A CN108628813A CN 108628813 A CN108628813 A CN 108628813A CN 201710162165 A CN201710162165 A CN 201710162165A CN 108628813 A CN108628813 A CN 108628813A
- Authority
- CN
- China
- Prior art keywords
- punctuate
- optimal
- result
- punctuation mark
- subset
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/10—Text processing
- G06F40/166—Editing, e.g. inserting or deleting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/205—Parsing
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Machine Translation (AREA)
Abstract
An embodiment of the present invention provides a processing method and apparatus, and a device for processing. The method specifically includes: obtaining a pending text; segmenting the pending text to obtain a global word sequence corresponding to the pending text; and performing punctuation-addition processing on the global word sequence to obtain an optimal punctuation-addition result corresponding to the pending text; where the punctuation-addition processing adds target punctuation marks between adjacent words in the global word sequence, the language-model probability corresponding to the optimal punctuation-addition result is optimal, and the optimal punctuation-addition result includes at least one semantic segment, the semantic segment including consecutive words of the global word sequence and/or consecutive words with added punctuation marks; and outputting the optimal punctuation-addition result. Embodiments of the present invention can improve the accuracy of punctuation addition.
Description
Technical field
The present invention relates to the technical field of information processing, and in particular to a processing method and apparatus and a device for processing.
Background technology
In information-processing fields such as communications and the Internet, certain application scenarios require adding punctuation to text that lacks it, for example adding punctuation to the text corresponding to a speech-recognition result.
An existing scheme adds punctuation to the text corresponding to a speech-recognition result according to the silent intervals in the speech signal. Specifically, a threshold on silence length is set first; if the length of a silent interval in the speaking user's speech signal exceeds the threshold, a punctuation mark is added at the corresponding position, whereas if the length of the silent interval does not exceed the threshold, no punctuation is added.
However, the inventors found, in the course of developing the embodiments of the present invention, that different speaking users often have different speech rates, so adding punctuation to the speech-recognition text according to silent intervals, as in the existing scheme, harms the accuracy of punctuation addition. For example, if the user speaks too fast, there may be no pauses between sentences, or the pauses may be too short to exceed the threshold, so that no punctuation is added to the text at all; conversely, if the user speaks so slowly that nearly every word is followed by a pause, the text ends up with far too many punctuation marks. Both situations cause punctuation errors, i.e. low accuracy of punctuation addition.
Summary of the invention
In view of the above problems, embodiments of the present invention are proposed to provide a processing method, a processing apparatus and a device for processing that overcome the above problems or at least partly solve them; the embodiments of the present invention can improve the accuracy of punctuation addition.
To solve the above problems, the invention discloses a processing method, including:
obtaining a pending text;
segmenting the pending text to obtain a global word sequence corresponding to the pending text;
performing punctuation-addition processing on the global word sequence to obtain an optimal punctuation-addition result corresponding to the pending text; where the punctuation-addition processing adds target punctuation marks between adjacent words in the global word sequence, the language-model probability corresponding to the optimal punctuation-addition result is optimal, and the optimal punctuation-addition result includes at least one semantic segment, the semantic segment including consecutive words of the global word sequence and/or consecutive words with added punctuation marks; and
outputting the optimal punctuation-addition result.
In another aspect, the invention discloses a processing apparatus, including:
a pending-text obtaining module, configured to obtain a pending text;
a word segmentation module, configured to segment the pending text to obtain a global word sequence corresponding to the pending text;
a punctuation-addition processing module, configured to perform punctuation-addition processing on the global word sequence to obtain an optimal punctuation-addition result corresponding to the pending text; where the punctuation-addition processing adds target punctuation marks between adjacent words in the global word sequence, the language-model probability corresponding to the optimal punctuation-addition result is optimal, and the optimal punctuation-addition result includes at least one semantic segment, the semantic segment including consecutive words of the global word sequence and/or consecutive words with added punctuation marks; and
a result output module, configured to output the optimal punctuation-addition result.
Optionally, the punctuation-addition processing module includes:
a dynamic-programming processing submodule, configured to perform punctuation-addition processing on the global word sequence using a dynamic-programming algorithm, to obtain the optimal punctuation-addition result corresponding to the pending text.
Optionally, the dynamic-programming processing submodule includes:
a set obtaining unit, configured to obtain a word-sequence set corresponding to the global word sequence;
a first recursion unit, configured to determine, for the subsets of the word-sequence set in order from small to large, the target punctuation marks of each subset's optimal subset punctuation-addition result by recursion; where the language-model probability corresponding to the optimal subset punctuation-addition result is optimal;
a first optimal-result obtaining unit, configured to obtain the optimal punctuation-addition result corresponding to the pending text according to the optimal subset punctuation-addition results corresponding to the subsets of the word-sequence set.
Optionally, a subset of the word-sequence set includes the first i consecutive words of the pending text, 0 < i ≤ M, where M is the number of words the pending text includes; the first recursion unit then includes:
an adding subunit, configured to add punctuation marks between adjacent words in the first i consecutive words according to the target punctuation marks of the optimal subset punctuation-addition result corresponding to the first k consecutive words, to obtain at least one subset punctuation-addition path corresponding to the first i consecutive words; where 0 < k < i and k is a positive integer;
a first language-model-probability determining subunit, configured to determine, using a neural-network language model, the language-model probability of the first semantic segment corresponding to each subset punctuation-addition path;
a first selection subunit, configured to select, according to the language-model probabilities of the first semantic segments, the optimal subset punctuation-addition path with the optimal language-model probability from the at least one subset punctuation-addition path;
a target-punctuation obtaining subunit, configured to obtain the target punctuation marks of the optimal subset punctuation-addition result corresponding to the first i consecutive words according to the punctuation marks included in the optimal subset punctuation-addition path.
Optionally, the dynamic-programming processing submodule includes:
a global-path obtaining unit, configured to add punctuation marks between adjacent words in the global word sequence, to obtain global punctuation-addition paths corresponding to the global word sequence;
a sliding obtaining unit, configured to obtain, in order from front to back and in a sliding manner, local punctuation-addition paths and their corresponding second semantic segments from the global punctuation-addition paths; where different second semantic segments include the same number of character units, adjacent second semantic segments have overlapping character units, and a character unit includes a word and/or a punctuation mark;
a second recursion unit, configured to determine, in order from front to back and by recursion, the target punctuation marks corresponding to the optimal second semantic segments; where the language-model probability corresponding to an optimal second semantic segment is optimal;
a second optimal-result obtaining unit, configured to obtain the optimal punctuation-addition result corresponding to the pending text according to the target punctuation marks corresponding to each optimal second semantic segment.
Optionally, the second recursion unit includes:
a second language-model-probability determining subunit, configured to determine the language-model probability corresponding to the current second semantic segment using an N-gram language model and/or a neural-network language model;
a second selection subunit, configured to select the optimal current second semantic segment from multiple current second semantic segments according to the language-model probabilities corresponding to the current second semantic segments;
a target-punctuation determining subunit, configured to take the punctuation marks included in the optimal current second semantic segment as the target punctuation marks corresponding to the optimal current second semantic segment;
a second-semantic-segment determining subunit, configured to obtain the next second semantic segment according to the target punctuation marks corresponding to the optimal current second semantic segment.
Optionally, the second optimal-result obtaining unit includes:
an adding subunit, configured to add punctuation marks to the global word sequence, in order from back to front or from front to back, according to the target punctuation marks corresponding to each optimal second semantic segment, to obtain the optimal punctuation-addition result corresponding to the pending text.
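The prefix recursion described above can be sketched as a small dynamic programme. This is a minimal, hypothetical sketch: `segment_logprob` is an invented stub standing in for the neural-network language model, and the candidate punctuation set and all scores are illustrative only, not the patented implementation.

```python
import math

# Toy "language model": log-probability of a semantic segment (a run of
# words plus an optional trailing punctuation mark). A real system would
# query an N-gram or neural LM here; this stub merely prefers short
# segments that end in punctuation. All scores are hypothetical.
def segment_logprob(words, punct):
    score = -0.5 * len(words)   # longer segments score lower
    if punct:                   # small reward for an explicit break
        score += 0.2
    return score

def best_punctuation(words, candidates=("", ",", ".")):
    """DP over prefixes: best[i] holds the highest-scoring punctuation
    assignment for the first i words (the 'optimal subset
    punctuation-addition result' for the first i consecutive words)."""
    n = len(words)
    best = [(-math.inf, None)] * (n + 1)
    best[0] = (0.0, [])
    for i in range(1, n + 1):
        for k in range(i):              # last segment spans words[k:i]
            for p in candidates:
                prev_score, prev_path = best[k]
                s = prev_score + segment_logprob(words[k:i], p)
                if s > best[i][0]:
                    best[i] = (s, prev_path + [(tuple(words[k:i]), p)])
    return best[n][1]

result = best_punctuation(["hello", "I", "am", "Xiao", "Ming"])
# result is a list of (segment_words, punctuation) pairs covering the input
```

The recursion reuses each prefix's optimum, so the search is polynomial rather than exponential in the number of punctuation-addition paths.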
Optionally, the punctuation-addition processing module includes:
a result exhaustion submodule, configured to obtain multiple punctuation-addition results corresponding to the global word sequence;
a language-model-probability determining submodule, configured to determine the language-model probabilities corresponding to the punctuation-addition results; and
a result selection submodule, configured to select, from the multiple punctuation-addition results corresponding to the global word sequence, the punctuation-addition result with the optimal language-model probability as the optimal punctuation-addition result corresponding to the pending text.
In another aspect, the invention discloses a device for processing, including a memory and one or more programs, where the one or more programs are stored in the memory and are configured to be executed by one or more processors, the one or more programs including instructions for the following operations:
obtaining a pending text;
segmenting the pending text to obtain a global word sequence corresponding to the pending text;
performing punctuation-addition processing on the global word sequence to obtain an optimal punctuation-addition result corresponding to the pending text; where the punctuation-addition processing adds target punctuation marks between adjacent words in the global word sequence, the language-model probability corresponding to the optimal punctuation-addition result is optimal, and the optimal punctuation-addition result includes at least one semantic segment, the semantic segment including consecutive words of the global word sequence and/or consecutive words with added punctuation marks; and
outputting the optimal punctuation-addition result.
Embodiments of the present invention include the following advantages:
The punctuation-addition processing of the embodiments adds target punctuation marks between adjacent words of the global word sequence corresponding to the pending text, and the language-model probability corresponding to the optimal punctuation-addition result obtained by this processing is optimal; the optimal punctuation-addition result may include at least one semantic segment, which may include consecutive words of the global word sequence and/or consecutive words with added punctuation marks. Since the optimal punctuation-addition result of the embodiments achieves a global optimum of the language-model probability, where "global" refers to the whole punctuation-addition result corresponding to the pending text, the optimal punctuation-addition result of the embodiments can improve the accuracy of punctuation addition.
Description of the drawings
Fig. 1 is a flow chart of the steps of a processing method embodiment of the present invention;
Fig. 2 is a schematic diagram of path planning over the global word sequence corresponding to a pending text according to an embodiment of the present invention;
Fig. 3 is a structural block diagram of a processing apparatus embodiment of the present invention;
Fig. 4 is a block diagram of a device for information processing implemented as a terminal, according to an exemplary embodiment; and
Fig. 5 is a block diagram of a device for information processing implemented as a server, according to an exemplary embodiment.
Detailed description of the embodiments
To make the above objects, features and advantages of the present invention clearer and easier to understand, the present invention is described in further detail below with reference to the accompanying drawings and specific embodiments.
An embodiment of the present invention provides a processing scheme that can segment a pending text to obtain the global word sequence corresponding to the pending text, perform punctuation-addition processing on the global word sequence to obtain the optimal punctuation-addition result corresponding to the pending text, and output the optimal punctuation-addition result. The punctuation-addition processing of the embodiment adds target punctuation marks between adjacent words in the global word sequence, where a target punctuation mark denotes the best candidate punctuation mark to be added between adjacent words; the language-model probability corresponding to the optimal punctuation-addition result obtained by this processing is optimal. The optimal punctuation-addition result may include at least one semantic segment, and a semantic segment may include consecutive words of the global word sequence and/or consecutive words with added punctuation marks; the language-model probability may be the combination of the language-model probabilities of all semantic segments included in a punctuation-addition result. Since the optimal punctuation-addition result of the embodiment achieves a global optimum of the language-model probability, where "global" refers to the whole punctuation-addition result corresponding to the pending text, the optimal punctuation-addition result of the embodiment can improve the accuracy of punctuation addition.
Method embodiments
Referring to Fig. 1, which shows a flow chart of the steps of a processing method embodiment of the present invention, the method may specifically include the following steps:
Step 101: obtain a pending text;
Step 102: segment the pending text to obtain a global word sequence corresponding to the pending text;
Step 103: perform punctuation-addition processing on the global word sequence to obtain an optimal punctuation-addition result corresponding to the pending text; where the punctuation-addition processing adds target punctuation marks between adjacent words in the global word sequence, the language-model probability corresponding to the optimal punctuation-addition result is optimal, and the optimal punctuation-addition result may include at least one semantic segment, the semantic segment possibly including consecutive words of the global word sequence and/or consecutive words with added punctuation marks;
Step 104: output the optimal punctuation-addition result.
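Steps 101 to 104 can be sketched as the following minimal pipeline. Every function body here is an illustrative stub, not the patented implementation: a real system would plug in a trained word segmenter at step 102 and the language-model-driven punctuation search at step 103.

```python
def obtain_pending_text():
    """Step 101: in practice the pending text comes from a speech
    recogniser or from user input; here it is a fixed example."""
    return "hello I am Xiao Ming nice to meet you"

def segment(text):
    """Step 102: produce the global word sequence. Stubbed with a
    whitespace split; Chinese text would need a real segmenter."""
    return text.split()

def add_punctuation(words):
    """Step 103: stand-in for the LM-driven search over punctuation-
    addition paths; this toy version only appends a full stop."""
    return " ".join(words) + "."

def process():
    words = segment(obtain_pending_text())
    return add_punctuation(words)   # step 104: output the result
```

Calling `process()` on the example text returns the same words with the stub's trailing full stop.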
The embodiments of the present invention can be applied to any application scenario requiring punctuation addition, such as speech recognition or machine translation; it will be understood that the embodiments do not limit the specific application scenario. For example, the method may be applied in the application scenario of speech recognition.
The processing method provided by the embodiments can be applied in the environment of computing devices such as terminals or servers. Optionally, the terminals may include, but are not limited to: smartphones, tablet computers, laptop computers, in-vehicle computers, desktop computers, smart televisions, wearable devices, and the like. The server may be a cloud server or an ordinary server, used to provide the pending-text processing service to clients.
The processing method provided by the embodiments is applicable to the processing of languages such as Chinese, Japanese and Korean, to improve the accuracy of punctuation addition. It will be appreciated that any language requiring punctuation addition falls within the scope of application of the processing method of the embodiments.
In the embodiments of the present invention, the pending text denotes the text to be processed; it may derive from text or speech input by a user through a computing device, or from other computing devices. It should be noted that the pending text may include one language or more than one language; for example, it may include Chinese, or Chinese mixed with another language such as English; the embodiments do not limit the specific pending text.
In practical applications, the computing device of the embodiments may execute the processing flow of the embodiments through a client APP (application) running on the computing device; for example, the client APP may be any APP running on a terminal, and the client APP may obtain the pending text from other applications on the computing device. Alternatively, the computing device may execute the processing flow through a functional component of the client APP, which may obtain the pending text from other functional components. Alternatively, the computing device may, as a server, execute the processing method of the embodiments.
In an alternative embodiment of the present invention, step 101 may obtain the pending text according to a speech signal of a speaking user; in this case, step 101 may convert the speaking user's speech signal into text information and obtain the pending text from that text information. Alternatively, step 101 may directly receive, from a speech recognition apparatus, the text information corresponding to the user's speech signal, and obtain the pending text from that text information. In practical applications, the speaking user may include a user who utters speech in a simultaneous-interpretation scenario and/or a user who produces a speech signal through a terminal; the speaking user's speech signal may then be captured by a microphone or another audio-capture device.
Optionally, the speaking user's speech signal may be converted into text information using speech-recognition technology. Denote the speaking user's speech signal as S; after a series of processing steps on S, the corresponding acoustic feature sequence O is obtained, denoted O = {O1, O2, ..., Oi, ..., OT}, where Oi is the i-th acoustic feature and T is the total number of acoustic features. The sentence corresponding to the speech signal S can be regarded as a word string composed of many words, denoted W = {w1, w2, ..., wn}. The process of speech recognition is to find the most probable word string W given the known acoustic feature sequence O.
Specifically, speech recognition is a model-matching process. In this process, a speech model is first established according to the characteristics of human speech; the input speech signal is analysed and the required features extracted, to build the templates needed for speech recognition. Recognising the user's input speech is then the process of comparing the features of the input speech against these templates and finally determining the template that best matches the input, thereby obtaining the speech-recognition result. The concrete speech-recognition algorithm may be training and recognition based on statistical hidden Markov models, training and recognition based on neural networks, recognition based on dynamic time warping, or other algorithms; the embodiments do not limit the specific speech-recognition process.
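The search for the most probable word string described above can be written in the usual noisy-channel form. This standard formulation is added here for clarity; it is implied by, but not spelled out in, the text:

```latex
W^{*} \;=\; \arg\max_{W} P(W \mid O)
      \;=\; \arg\max_{W} \frac{P(O \mid W)\,P(W)}{P(O)}
      \;=\; \arg\max_{W} P(O \mid W)\,P(W)
```

Here \(P(O \mid W)\) is the acoustic model score and \(P(W)\) the language-model probability; \(P(O)\) is constant over candidate word strings and can be dropped from the maximisation.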
In another alternative embodiment of the present invention, step 101 may obtain the pending text according to text input by the user. For example, text entered by a user in scenarios such as instant messaging or office documents may contain no punctuation marks, or only few of them, and can therefore serve as a source of the pending text.
In practical applications, step 101 may obtain the pending text from the text corresponding to a speech signal or from user-input text, according to practical application requirements. Optionally, the pending text may be obtained from the text corresponding to the speech signal S according to the pauses in S; for example, when a pause in the speech signal S exceeds a time threshold, a corresponding first separation point can be determined from that time point, the text corresponding to S before the first separation point is taken as a pending text, and the text corresponding to S after the first separation point is processed further, to continue obtaining pending texts from it. Alternatively and optionally, the pending text may be obtained according to the number of words included in the speech-derived text or the user-input text; for example, when that number of words exceeds a word-count threshold, a corresponding second separation point can be determined according to the threshold, the text corresponding to S before the second separation point is taken as a pending text, and the text corresponding to S after the second separation point is processed further, to continue obtaining pending texts from it. It will be appreciated that the embodiments do not limit the specific process of obtaining pending texts from the speech-derived or user-input text.
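The word-count criterion for carving out pending texts can be sketched as follows. The chunking policy and the `max_words` value are illustrative assumptions; the patent deliberately leaves the separation procedure open.

```python
def split_pending_texts(words, max_words=50):
    """Cut a long recognised word stream into pending texts whenever the
    accumulated word count reaches a threshold (each cut corresponds to a
    'second separation point'). max_words = 50 is an invented value."""
    chunks = []
    for i in range(0, len(words), max_words):
        chunks.append(words[i:i + max_words])
    return chunks
```

Each returned chunk would then be handed to the segmentation and punctuation-addition steps as one pending text.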
Since the corpus used for training a language model is usually segmented, in order to obtain the language-model probabilities of the semantic segments included in the optimal punctuation-addition result, the embodiments may segment the above pending text in step 102, to obtain the global word sequence corresponding to the pending text.
So-called word segmentation is the division of text into individual words: the process of recombining continuous text into a global word sequence according to certain rules or models. Taking Chinese word segmentation as an example, the goal of segmentation is to divide the text into individual Chinese words. Dividing sentences into individual words is the first step towards machine understanding of human language, so segmentation is widely used in natural-language-processing applications such as text processing, machine translation, speech recognition, text summarisation and text retrieval.
In the embodiments of the present invention, the segmentation methods that step 102 may use to segment the pending text specifically include: string-matching-based segmentation, understanding-based segmentation, statistics-based segmentation, and so on; it will be appreciated that the embodiments do not limit the specific process of segmenting the pending text. In one application example of the present invention, the pending text is "hello I am Xiao Ming nice to meet you", and its corresponding global word sequence may be: "hello / I am / Xiao Ming / very glad / to meet you".
It should be noted that the segmentation of the pending text in step 102 and the speech-recognition process can be mutually independent; the segmentation of step 102 need not be affected by the speech-recognition process. For example, step 102 may perform word segmentation on the sentence W corresponding to the speech signal S.
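As one concrete instance of the string-matching segmentation methods mentioned above, a forward maximum-matching segmenter can be sketched as follows. The vocabulary here is a toy, and a real segmenter would use a full dictionary or a statistical model; the segmentation it produces is illustrative only.

```python
def forward_max_match(text, vocab, max_len=4):
    """Greedy forward maximum matching: at each position take the longest
    dictionary word (up to max_len characters), falling back to a single
    character when nothing matches."""
    words, i = [], 0
    while i < len(text):
        for L in range(min(max_len, len(text) - i), 0, -1):
            cand = text[i:i + L]
            if L == 1 or cand in vocab:
                words.append(cand)
                i += L
                break
    return words

vocab = {"你好", "小明", "很高兴", "认识"}
segmented = forward_max_match("你好我是小明很高兴认识你", vocab)
# characters absent from the toy vocabulary come out as single-character words
```

Statistics-based segmenters address the known weakness of this greedy method, namely that a wrong longest match early on cannot be revised later.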
In an alternative embodiment of the present invention, the method of the embodiments may further include: writing at least one pending text obtained by step 101 into a buffer; step 102 may then first read a pending text from the buffer and segment the pending text that was read. Optionally, a data structure such as a queue, an array or a linked list may be established in the memory area of the computing device as the buffer; the embodiments do not limit the specific buffer. Storing pending texts in a buffer in this way can improve the processing efficiency for pending texts; it will be appreciated that storing pending texts on disk is also feasible, and the embodiments do not limit the specific storage of pending texts.
In the embodiments of the present invention, corresponding candidate punctuation marks may be added between adjacent words in the global word sequence corresponding to the pending text; that is, the punctuation-addition processing of the pending text can proceed according to the possibilities of adding various candidate punctuation marks between adjacent words of the global word sequence. In this way, the global word sequence of one pending text corresponds to multiple punctuation-addition schemes and their corresponding punctuation-addition results, and the embodiments finally obtain the optimal punctuation-addition result with the optimal language-model probability. The language-model probability may be the combination of the language-model probabilities of all semantic segments included in an (arbitrary) punctuation-addition result.
In the field of natural language processing, a language model is a probabilistic model established for one language or multiple languages, whose purpose is to describe the probability distribution of a given global word sequence appearing in the language. Specifically to the embodiments of the present invention, the probability of a given global word sequence appearing in the language, as described by the language model, is called the language-model probability; moreover, the given global word sequence described by the language model can carry punctuation marks. Optionally, corpus sentences may be obtained from a corpus and segmented, and the language model trained on the resulting global word sequences that include punctuation marks. For example, "I love dogs, dogs play ball." may correspond to the global word sequence "I / love / dogs / , / dogs / play / ball / ."; it will be appreciated that the embodiments do not limit the specific global word sequences used for training the language model.
In the embodiments of the present invention, the language model may include: an N-gram language model and/or a neural-network language model, where the neural-network language model may further include: an RNNLM (Recurrent Neural Network Language Model), a CNNLM (Convolutional Neural Network Language Model), a DNNLM (Deep Neural Network Language Model), and the like.
The N-gram language model is based on the assumption that the occurrence of the N-th word depends only on the preceding N-1 words and is independent of any other word, so the probability of a whole sentence is the product of the occurrence probabilities of its words. Since an N-gram language model predicts the N-th word from a limited context of N-1 preceding words, it can describe the language model probability of semantic segments of length N, where N may be a relatively fixed positive integer smaller than a first length threshold, such as 3 or 5. Compared with the N-gram language model, one advantage of a neural network language model such as an RNNLM is that it can truly use the entire preceding context to predict the next word; an RNNLM can therefore describe the language model probability of semantic segments of variable length, that is, an RNNLM is applicable to semantic segments over a wider range of lengths. For example, the length range of the semantic segments corresponding to an RNNLM may be 1 to a second length threshold, where the second length threshold is greater than the first length threshold.
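The N-gram assumption above — the probability of a whole sentence as the product of per-word probabilities, each conditioned on at most N-1 preceding words — can be sketched as follows; the per-word scoring function here is an assumed black box for illustration, not the trained model of this embodiment:

```python
import math

def ngram_sentence_logprob(tokens, logprob_fn, n=3):
    """Score a tokenized sentence under the N-gram assumption: each
    token depends only on the preceding n-1 tokens, so the sentence
    probability is the product of per-token probabilities (a sum in
    log space)."""
    total = 0.0
    for i, token in enumerate(tokens):
        context = tuple(tokens[max(0, i - (n - 1)):i])  # at most n-1 tokens
        total += logprob_fn(context, token)
    return total

# A toy uniform model over a 10-word vocabulary, for illustration only.
uniform = lambda context, token: math.log(1 / 10)
tokens = ["I", "like", "dogs", ","]
print(ngram_sentence_logprob(tokens, uniform, n=3))  # 4 * log(0.1)
```

Note that the punctuation mark "," is treated as an ordinary token, matching the training setup above where global word sequences include punctuation marks.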
In the embodiment of the present invention, a semantic segment may indicate a part of the global word sequence to which punctuation marks have been added. The semantic segment may include: consecutive words of the global word sequence (that is, without punctuation marks), and/or consecutive words to which punctuation marks have been added. Optionally, a part may be intercepted from the global word sequence to obtain the consecutive words. For example, for the global word sequence "hello/I am/Xiao Ming/very glad/to meet you", the corresponding semantic segments may include: "hello/,/I am", "I am/Xiao Ming/very glad", and the like, where "/" is a symbol introduced merely for convenience of explanation in this application document; "/" indicates the boundary between words and/or between a word and a punctuation mark, and in practical applications "/" may carry no meaning.
It should be noted that those skilled in the art may determine the candidate punctuation marks to be added according to practical application requirements. Optionally, the candidate punctuation marks may include: comma, question mark, full stop, exclamation mark, space, and the like, where the space may either serve to separate words or serve no function; for example, in English a space may be used to separate different words, while in Chinese a space may be a punctuation mark that serves no function. It will be understood that the embodiment of the present invention does not limit the specific candidate punctuation marks.
The embodiment of the present invention may provide the following technical solutions for performing punctuation addition processing on the global word sequence to obtain the optimal punctuation addition result corresponding to the text to be processed:
Technical solution 1,
Technical solution 1 may include: obtaining multiple punctuation addition results corresponding to the global word sequence; determining the language model probability corresponding to each punctuation addition result; and selecting, from the multiple punctuation addition results corresponding to the global word sequence, the punctuation addition result with the optimal language model probability as the optimal punctuation addition result corresponding to the text to be processed.
In practical applications, a path planning algorithm may be used to obtain the multiple punctuation addition results corresponding to the global word sequence. The principle of the path planning algorithm may be to find, in an environment with obstacles and according to a certain evaluation criterion, a collision-free path from an initial state to a target state. Applied to the embodiment of the present invention, an obstacle may represent a candidate punctuation mark added between adjacent words of the global word sequence corresponding to the text to be processed, and the initial state and the target state may respectively represent the first word of the global word sequence and the punctuation mark after its last word.
Referring to Fig. 2, a schematic diagram of path planning for the global word sequence corresponding to a text to be processed according to an embodiment of the present invention is shown. The global word sequence corresponding to the text to be processed is "hello/I am/Xiao Ming/very glad/to meet you", so candidate punctuation marks may be added between adjacent words of "hello/I am/Xiao Ming/very glad/to meet you". In Fig. 2, the words "hello", "I am", "Xiao Ming", "very glad", and "to meet you" are each represented by a rectangle, while punctuation marks such as comma, space, exclamation mark, question mark, and full stop are each represented by a circle; multiple paths may then exist between the first word "hello" of the global word sequence and the punctuation mark after its last word "to meet you".
It will be appreciated that the path planning algorithm is merely an optional embodiment of the present invention; in fact, those skilled in the art may use other algorithms to obtain the multiple punctuation addition results corresponding to the text to be processed according to practical application requirements, and it will be understood that the embodiment of the present invention does not limit the specific algorithm for obtaining the multiple punctuation addition results.
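Under the simplifying assumption that each gap between adjacent words, plus the position after the last word, independently receives one candidate mark or none, the enumeration of multiple punctuation addition results that technical solution 1 starts from might be sketched as:

```python
from itertools import product

def all_punctuation_results(words, candidates=("", ",", "。")):
    """Enumerate every way of placing one candidate mark (or nothing;
    the empty string stands in for a functionless space) into each of
    the len(words)-1 gaps between adjacent words, plus one position
    after the last word."""
    gaps = len(words)  # len(words)-1 inner gaps + 1 final position
    results = []
    for choice in product(candidates, repeat=gaps):
        pieces = []
        for word, mark in zip(words, choice):
            pieces.append(word)
            if mark:
                pieces.append(mark)
        results.append(pieces)
    return results

results = all_punctuation_results(["你好", "我是", "小明"])
print(len(results))  # 3 candidates ** 3 positions = 27
```

The exponential growth of this enumeration with sequence length is exactly what motivates the dynamic programming of technical solution 2 below.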
In practical applications, a language model may be used to determine the language model probability corresponding to each punctuation addition result; accordingly, the language model may include: an N-gram language model, and/or a neural network language model, and the like.
In an optional embodiment of the present invention, the process of determining the language model probability corresponding to a punctuation addition result may include: determining, for each third semantic segment included in each punctuation addition result, the corresponding language model probability; and fusing the language model probabilities corresponding to all third semantic segments included in each punctuation addition result, to obtain the corresponding language model probability. The punctuation addition result with the highest language model probability may then be obtained from all punctuation addition results and taken as the optimal punctuation addition result corresponding to the text to be processed.
Optionally, the corresponding third semantic segments may be obtained from a punctuation addition result in front-to-back order in a sliding manner. Different third semantic segments may include the same quantity of character units, adjacent third semantic segments may share repeated character units, and a character unit may include: a word and/or a punctuation mark. In this case, the language model probability corresponding to a third semantic segment may be determined by an N-gram language model and/or a neural network language model. Assuming N = 5 and the initial character unit is numbered 1, third semantic segments of length 5 may be obtained from the punctuation addition result according to the numbering sequence 1-5, 2-6, 3-7, 4-8, and so on, and the language model probability corresponding to each third semantic segment may be determined by the N-gram language model; for example, each third semantic segment may be input into the N-gram model, which may then output the corresponding language model probability.
Optionally, the above process of fusing the language model probabilities corresponding to all third semantic segments included in each punctuation addition result may include: performing summation, multiplication, weighted averaging, or similar processing on the language model probabilities corresponding to all third semantic segments included in each punctuation addition result. It will be understood that the embodiment of the present invention does not limit the specific process of fusing the language model probabilities corresponding to all third semantic segments included in each punctuation addition result.
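The sliding extraction of fixed-length third semantic segments and the fusion of their probabilities might be sketched as follows; summing log-probabilities corresponds to the multiplicative fusion mentioned above, and the per-segment scorer is an assumed placeholder rather than a trained model:

```python
def windows(units, n=5, stride=1):
    """Slide a length-n window over the character units (words and
    punctuation marks) of one punctuation addition result, giving
    segments 1-5, 2-6, 3-7, ... when stride is 1."""
    if len(units) <= n:
        return [list(units)]
    return [list(units[i:i + n]) for i in range(0, len(units) - n + 1, stride)]

def fused_logprob(units, segment_logprob, n=5):
    """Fuse the per-segment scores by summation in log space."""
    return sum(segment_logprob(seg) for seg in windows(units, n))

segs = windows(["hello", ",", "I am", "Xiao Ming", ".", "end"], n=5)
print(len(segs))  # two overlapping length-5 segments
```

Because adjacent windows overlap, a punctuation mark is scored in several contexts, which is what lets the fused score reflect more than a purely local decision.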
In another optional embodiment of the present invention, the process of determining the language model probability corresponding to a punctuation addition result may include: determining, by a neural network language model, the language model probabilities corresponding to all semantic segments of each punctuation addition result; the punctuation addition result with the highest language model probability may then be obtained from all punctuation addition results and taken as the optimal punctuation addition result corresponding to the text to be processed. Since an RNNLM is applicable to semantic segments over a wider range of lengths, all semantic segments of each punctuation addition result may be treated as a whole, and the language model probabilities corresponding to all semantic segments of the punctuation addition result may be determined by the RNNLM; for example, all character units included in the punctuation addition result may be input into the RNNLM, which may then output the corresponding language model probability.
Technical solution 2,
Technical solution 2 may include: performing punctuation addition processing on the global word sequence by a dynamic programming algorithm, to obtain the optimal punctuation addition result corresponding to the text to be processed.
The principle of the dynamic programming algorithm may be to split a problem and define states and the relations between states, so that the problem can be solved in a recursive (or divide-and-conquer) manner. Applied to the embodiment of the present invention, the problem may be: finding the optimal punctuation addition result corresponding to the text to be processed; a state may be a decomposition of the punctuation addition processing of the global word sequence into parts and their corresponding target punctuation marks, from which the optimal punctuation addition result corresponding to the text to be processed is obtained; a target punctuation mark may indicate the best candidate punctuation mark added between adjacent words. Compared with technical solution 1, which exhaustively enumerates the multiple punctuation addition results corresponding to the global word sequence and selects the one with the optimal language model probability, the dynamic programming algorithm used by technical solution 2 can reduce the amount of computation, and this reduction grows as the length of the global word sequence corresponding to the text to be processed increases.
The embodiment of the present invention may provide the following dynamic programming schemes for performing punctuation addition processing on the global word sequence by a dynamic programming algorithm, to obtain the optimal punctuation addition result corresponding to the text to be processed:
Dynamic Programming scheme 1,
In dynamic programming scheme 1, performing punctuation addition processing on the global word sequence by a dynamic programming algorithm, to obtain the optimal punctuation addition result corresponding to the text to be processed, may specifically include:

obtaining the word sequence set corresponding to the global word sequence;

determining, in order of the subsets of the word sequence set from small to large and in a recursive manner, the target punctuation marks of the optimal subset punctuation addition result corresponding to each subset; the language model probability corresponding to the optimal subset punctuation addition result is optimal;

obtaining, according to the optimal subset punctuation addition results corresponding to the subsets of the word sequence set, the optimal punctuation addition result corresponding to the text to be processed.
The word sequence set may indicate the set of word sequences formed by consecutive words included in the global word sequence. Optionally, a subset of the word sequence set may consist of the first i consecutive words of the global word sequence; for example, the word sequence set corresponding to the global word sequence [C1 C2 … CM] may include: {C1}, {C1 C2}, {C1 C2 C3}, …, {C1 C2 … CM}. The subsets included in the word sequence set may be expressed, in order of subset length (that is, the quantity of words a subset includes) from small to large, as: {C1}, {C1 C2}, {C1 C2 C3} … {C1 C2 … CM}, where Ci denotes the i-th word included in the text to be processed, i is a positive integer greater than 0, and M denotes the word quantity of the text to be processed (that is, the length of the global word sequence), M being a positive integer. It will be appreciated that a length difference of 1 between adjacent subsets of the word sequence set corresponding to the global word sequence [C1 C2 … CM] is merely an optional embodiment; in fact, the length difference between adjacent subsets of that word sequence set may also be greater than 1.
For each subset of the word sequence set, the corresponding subset punctuation addition results have corresponding language model probabilities, so the embodiment of the present invention may determine, for each subset, the target punctuation marks of its optimal subset punctuation addition result. The target punctuation marks of the optimal subset punctuation addition result may indicate which punctuation marks separate adjacent words when the subset punctuation addition result corresponding to the subset is optimal. Assuming the optimal subset punctuation addition result corresponding to the subset {C1 C2 C3} is {(C1), (C2 C3)}, the adjacent words "C1" and "C2" in the subset {C1 C2 C3} are separated by ",", and the adjacent words "C2" and "C3" are separated by a space; the corresponding target punctuation mark may be expressed as the number 1 of "C1" together with the comma. It will be understood that the embodiment of the present invention does not limit the specific representation of target punctuation marks.
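One possible in-memory representation of such target punctuation marks — purely illustrative, since the embodiment does not fix a representation — is a mapping from a word's number to the mark chosen for the gap after it, with the functionless space rendered as no visible mark:

```python
# Map each word number i to the mark after word i; None stands for a
# space / no visible mark. The names here are illustrative assumptions.
optimal_marks = {1: ",", 2: None}  # {(C1), (C2 C3)}: comma after C1

def apply_marks(words, marks):
    """Render a subset punctuation addition result from its target
    punctuation marks."""
    out = []
    for i, word in enumerate(words, start=1):
        out.append(word)
        mark = marks.get(i)
        if mark:
            out.append(mark)
    return "".join(out)

print(apply_marks(["C1", "C2", "C3"], optimal_marks))  # C1,C2C3
```

A mapping keyed by word number also makes the reuse described below cheap: a larger subset can copy the smaller subset's entries and only add keys for its new gaps.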
The embodiment of the present invention may determine, in order of the subsets of the word sequence set from small to large and in a recursive manner, the target punctuation marks of the optimal subset punctuation addition result corresponding to each subset. Assuming the subsets of the word sequence set in order from small to large are expressed as: G1, G2, G3 … Gu, where u is a positive integer, the target punctuation marks of the optimal subset punctuation addition results corresponding to G1, G2, G3 … Gu may be obtained in turn. Moreover, for Go (1 ≤ o ≤ u), the optimal subset punctuation addition results of the subsets before Go (such as Go-1, Go-2, etc.) are needed to determine the target punctuation marks of the optimal subset punctuation addition result corresponding to Go; specifically, Go may reuse the optimal subset punctuation addition results of the subsets before Go. For example, the punctuation addition processing between the first 3 consecutive words of the subset {C1 C2 C3 C4} may reuse the optimal subset punctuation addition result of the subset {C1 C2 C3}.
In an optional embodiment of the present invention, a subset of the word sequence set may include: the first i consecutive words of the text to be processed, where 0 < i ≤ M, M being the word quantity included in the text to be processed. Then determining, in order of the subsets of the word sequence set from small to large and in a recursive manner, the target punctuation marks of the optimal subset punctuation addition result corresponding to each subset may specifically include:

adding punctuation marks between adjacent words of the first i consecutive words according to the target punctuation marks of the optimal subset punctuation addition result corresponding to the first k consecutive words, to obtain at least one subset punctuation addition path corresponding to the first i consecutive words, where 0 < k < i and k is a positive integer;

determining, by a neural network language model, the language model probability of the first semantic segment corresponding to each subset punctuation addition path;

selecting, according to the language model probabilities of the first semantic segments, the optimal subset punctuation addition path with the optimal language model probability from the at least one subset punctuation addition path;

obtaining, according to the punctuation marks included in the optimal subset punctuation addition path, the target punctuation marks of the optimal subset punctuation addition result corresponding to the first i consecutive words.
A subset punctuation addition path may indicate a path whose initial state is the first word of the subset and whose target state is the punctuation mark after the last word of the subset. Optionally, punctuation marks may be added between the first i consecutive words according to the target punctuation marks of the optimal subset punctuation addition result corresponding to the first k consecutive words, together with the adjacent punctuation marks included between the k-th word and the i-th word, to obtain the at least one subset punctuation addition path corresponding to the first i consecutive words. Each subset punctuation addition path may correspond to a first semantic segment, and the first semantic segment may indicate a punctuation addition result corresponding to the first i consecutive words.
Since an RNNLM is applicable to semantic segments over a wider range of lengths — for example, the length range of the semantic segments corresponding to an RNNLM may be 1 to the second length threshold — for 0 < i ≤ M, the embodiment of the present invention may determine, by a neural network language model, the language model probability of the first semantic segment corresponding to each subset punctuation addition path.
Since multiple punctuation marks may be added between a pair of adjacent words of the first i consecutive words, the number of subset punctuation addition paths corresponding to the first i consecutive words is usually greater than 1. Therefore, the embodiment of the present invention may select, according to the language model probabilities of the first semantic segments, the optimal subset punctuation addition path with the optimal language model probability from the at least one subset punctuation addition path, and obtain, according to the punctuation marks included in the optimal subset punctuation addition path, the target punctuation marks of the optimal subset punctuation addition result corresponding to the first i consecutive words. Optionally, punctuation marks may further be added between adjacent words of the first j consecutive words according to the target punctuation marks of the optimal subset punctuation addition result corresponding to the first i consecutive words, to obtain at least one subset punctuation addition path corresponding to the first j consecutive words, where j > i and j is a positive integer.
Optionally, obtaining, according to the punctuation marks included in the optimal subset punctuation addition path, the target punctuation marks of the optimal subset punctuation addition result corresponding to the first i consecutive words may include: recording the target punctuation marks of the optimal subset punctuation addition result corresponding to each subset; or recording the mapping relations between the information of each subset and the target punctuation marks of its corresponding optimal subset punctuation addition result, to obtain the corresponding record content. The information of a subset may include: the number information of the last word corresponding to the subset, and/or the number information corresponding to the subset, and the like. For example, for the first i consecutive words, the corresponding number information may be i, or information such as its last word, namely the i-th word. It will be appreciated that the embodiment of the present invention does not limit the specific information of a subset. In the process of recording the target punctuation marks of the optimal subset punctuation addition result corresponding to each subset, either all target punctuation marks of each subset's optimal subset punctuation addition result may be recorded, or only the part of each subset's optimal subset punctuation addition result that differs from the adjacent previous subset may be recorded.
In an optional embodiment of the present invention, obtaining, according to the optimal subset punctuation addition results corresponding to the subsets of the word sequence set, the optimal punctuation addition result corresponding to the text to be processed may specifically include:

taking the optimal subset punctuation addition result corresponding to the maximal subset of the word sequence set as the optimal punctuation addition result corresponding to the text to be processed; and/or

segmenting the word sequence according to all target punctuation marks of the optimal subset punctuation addition result corresponding to the maximal subset of the word sequence set, to obtain the optimal punctuation addition result corresponding to the text to be processed; and/or

segmenting the word sequence according to the partial target punctuation marks of the optimal subset punctuation addition results corresponding to the subsets of the word sequence set, to obtain the optimal punctuation addition result corresponding to the text to be processed.
In summary, dynamic programming scheme 1 determines, in order of the subsets of the word sequence set from small to large and in a recursive manner, the target punctuation marks of the optimal subset punctuation addition result corresponding to each subset, and obtains the optimal punctuation addition result corresponding to the text to be processed according to the optimal subset punctuation addition results corresponding to the subsets of the word sequence set, where the language model probability corresponding to the optimal subset punctuation addition result is optimal. Since a later subset covers the earlier subsets, the later subset can reuse the target punctuation marks of the optimal subset punctuation addition results corresponding to the earlier subsets, so the recursion can reduce the amount of computation needed to obtain the optimal punctuation addition result. Moreover, in order from small to large, the subsets gradually cover the semantic segments included in the word sequence, so the subsets can gradually achieve the optimal language model probabilities of the semantic segments included in the word sequence.
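A minimal sketch of this recursion over prefixes (subsets {C1}, {C1 C2}, …): only the best result for each prefix is kept and extended by one word, trying each candidate mark in the newly exposed gap. The scorer stands in for the language model probability and is an assumption, not the trained model of this embodiment:

```python
def prefix_dp(words, score, marks=("", ",", "。")):
    """best holds the optimal punctuation addition result for the
    current prefix; extending it by one word reuses the earlier
    prefix's target punctuation marks instead of re-enumerating
    the whole prefix."""
    best = [words[0]]  # optimal result for the length-1 prefix {C1}
    for word in words[1:]:
        candidates = []
        for mark in marks:  # "" stands in for a space / no mark
            candidates.append(best + ([mark] if mark else []) + [word])
        best = max(candidates, key=score)
    return best

toy = lambda units: units.count(",")  # illustrative scorer only
print(prefix_dp(["你好", "我是", "小明"], toy))  # every gap gets a comma
```

With a gap count of M-1 and a fixed mark inventory, this visits a number of candidates linear in M, versus the exponential enumeration of technical solution 1 — the computation saving the scheme claims.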
Dynamic Programming scheme 2,
In dynamic programming scheme 2, performing punctuation addition processing on the global word sequence by a dynamic programming algorithm, to obtain the optimal punctuation addition result corresponding to the text to be processed, may specifically include:

adding punctuation marks between adjacent words of the global word sequence, to obtain the global punctuation addition paths corresponding to the global word sequence;

obtaining, in front-to-back order and in a sliding manner, local punctuation addition paths and their corresponding second semantic segments from the global punctuation addition paths; different second semantic segments include the same quantity of character units, adjacent second semantic segments share repeated character units, and a character unit includes: a word and/or a punctuation mark;

determining, in front-to-back order and in a recursive manner, the target punctuation marks corresponding to the optimal second semantic segments; the language model probability corresponding to the optimal second semantic segment is optimal;

obtaining, according to the target punctuation marks corresponding to the optimal second semantic segments, the optimal punctuation addition result corresponding to the text to be processed.
Dynamic programming scheme 2 obtains, in front-to-back order and in a sliding manner, second semantic segments of identical length (that is, identical character unit quantity) with repeated character units from the global punctuation addition paths, and determines, in front-to-back order and in a recursive manner, the target punctuation marks corresponding to the optimal second semantic segments. The process of obtaining the global punctuation addition paths may refer to Fig. 2; the embodiment of the present invention does not limit the specific process of obtaining the global punctuation addition paths. A local punctuation addition path may indicate a part of a global punctuation addition path, and each local punctuation addition path may correspond to a second semantic segment.
In practical applications, the language model probability corresponding to a second semantic segment may be determined by an N-gram language model. Assuming N = 5, the length of a second semantic segment may be 5; assuming the initial character unit of the word sequence is numbered 1, second semantic segments of length 5 may be obtained from the punctuation addition result according to the numbering sequence 1-5, 2-6, 3-7, 4-8, and so on, and the language model probability corresponding to each second semantic segment may be determined by the N-gram language model; for example, each second semantic segment may be input into the N-gram model, which may then output the corresponding language model probability. Of course, the language model probability corresponding to a second semantic segment may also be determined by a neural network language model (such as a recurrent neural network language model); the embodiment of the present invention does not limit the specific process of determining the language model probability corresponding to a second semantic segment. It will be appreciated that a sliding distance of 1 between adjacent second semantic segments is merely an example; in fact, those skilled in the art may determine the sliding distance between adjacent second semantic segments according to practical application requirements, and the sliding distance may also be 2, 3, or the like.
In an optional embodiment of the present invention, determining, in front-to-back order and in a recursive manner, the target punctuation marks corresponding to the optimal second semantic segments may specifically include:

determining, by an N-gram language model and/or a neural network language model, the language model probability corresponding to the current second semantic segment;

selecting, according to the language model probabilities corresponding to the current second semantic segments, the optimal current second semantic segment from the multiple current second semantic segments;

taking the punctuation marks included in the optimal current second semantic segment as the target punctuation marks corresponding to the optimal current second semantic segment;

obtaining the next second semantic segment according to the target punctuation marks corresponding to the optimal current second semantic segment.
The current second semantic segment may indicate the second semantic segment corresponding to a local punctuation addition path in the recursive process. Assuming the current second semantic segment is numbered k, k being a positive integer, the language model probability corresponding to the k-th second semantic segment may be determined by an N-gram language model and/or a neural network language model, the optimal k-th second semantic segment with the optimal language model probability may be selected from the multiple k-th second semantic segments, and the punctuation marks included in the optimal k-th second semantic segment may be taken as the corresponding target punctuation marks; the (k+1)-th second semantic segment may then be obtained according to the target punctuation marks corresponding to the optimal k-th second semantic segment, where the (k+1)-th second semantic segment may reuse the target punctuation marks corresponding to the optimal k-th second semantic segment. Taking Fig. 2 as an example, assuming the length of a second semantic segment is 5 and the optimal 1st second semantic segment is "hello/,/I am/space/Xiao Ming", the 2nd second semantic segment "punctuation mark/I am/punctuation mark/Xiao Ming/punctuation mark" may reuse the target punctuation marks corresponding to the optimal 1st second semantic segment; in this way, the 2nd second semantic segment may add a punctuation mark on the basis of ",/I am/space/Xiao Ming/punctuation mark", so that the optimal punctuation mark may be selected from the multiple punctuation marks after "Xiao Ming".
In practical applications, obtaining, according to the target punctuation marks corresponding to the optimal second semantic segments, the optimal punctuation addition result corresponding to the text to be processed may specifically include: adding punctuation marks to the global word sequence, in back-to-front or front-to-back order, according to the target punctuation marks corresponding to the optimal second semantic segments, to obtain the optimal punctuation addition result corresponding to the text to be processed. That is, the target punctuation marks corresponding to the global punctuation addition path (at the punctuation positions between adjacent words) may be determined in a certain order, and the optimal punctuation addition result corresponding to the text to be processed may be obtained according to these target punctuation marks.
In summary, dynamic programming scheme 2 obtains, in front-to-back order and in a sliding manner, local punctuation addition paths and their corresponding second semantic segments from the global punctuation addition paths, and determines, in front-to-back order and in a recursive manner, the target punctuation marks corresponding to the optimal second semantic segments. Because adjacent second semantic segments share repeated character units, the next second semantic segment can reuse the target punctuation marks corresponding to the optimal current second semantic segment, so the recursion can reduce the amount of computation needed to obtain the optimal punctuation addition result. Moreover, since there is a sliding distance between adjacent second semantic segments, the embodiment of the present invention can, through the optimal language model probability of each second semantic segment, achieve the optimum of the language model probabilities corresponding to all second semantic segments.
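A minimal sketch of this sliding recursion, under the assumptions that each window ends at the newest gap and that the segment scorer is a black box standing in for the N-gram or RNNLM probability: marks fixed by earlier windows are reused, and only the newly exposed gap is decided within each window:

```python
def sliding_window_dp(words, score, marks=("", ","), n=3):
    """fixed maps a gap index (the gap after words[g]) to its mark;
    each step scores only the n-word window ending at the new gap,
    reusing the marks already fixed by the previous windows — the
    reuse of target punctuation marks described above."""
    fixed = {}
    for g in range(len(words) - 1):    # decide gaps front to back
        scored = []
        for mark in marks:             # "" stands in for a space
            trial = dict(fixed)
            trial[g] = mark
            lo = max(0, g - n + 2)     # first word of the window
            units = []
            for i in range(lo, min(len(words), lo + n)):
                units.append(words[i])
                if trial.get(i):
                    units.append(trial[i])
            scored.append((score(units), mark))
        fixed[g] = max(scored, key=lambda s: s[0])[1]
    return fixed
```

Here n = 3 and a stride of one gap are chosen for brevity; the N = 5, stride-1 numbering example above would correspond to a wider window over character units rather than words.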
In step 104, the optimal punctuation addition result obtained in step 103 may be output. It will be appreciated that those skilled in the art may output the optimal punctuation addition result obtained in step 103 according to practical application requirements. For example, the optimal punctuation addition result obtained in step 103 may be displayed on the display device of the current computing device; for another example, the current computing device may forward the optimal punctuation addition result obtained in step 103 to other computing devices. For instance, when the current computing device is a server, the other computing devices may be clients, other servers, or the like.
To sum up, the processing method of the embodiment of the present invention adds target punctuation marks between adjacent words in the global word sequence corresponding to the pending text through punctuation addition processing, and the optimal punctuation addition result obtained by the punctuation addition processing has an optimal language model probability. The optimal punctuation addition result may include at least one semantic segment, and each semantic segment may include continuous words of the global word sequence and/or continuous words to which punctuation marks have been added. Because the language model probability can be a synthesis of the language model probabilities of all semantic segments included in the optimal punctuation addition result, the optimal punctuation addition result of the embodiment of the present invention can achieve a global optimum of language model probability, where "global" denotes the whole of the punctuation addition result corresponding to the pending text. The optimal punctuation addition result of the embodiment of the present invention can therefore improve the accuracy of punctuation addition.
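The "synthesis" of segment probabilities described above can be sketched minimally: treat the score of a complete punctuation addition result as the sum of the log-probabilities of its semantic segments, and select the candidate whose combined score is optimal. The segment strings and scores below are invented for illustration only, not taken from the patent.

```python
# Combined score of a punctuation-addition result as the sum of the
# log-probabilities of its semantic segments (a synthesis of per-segment
# language model probabilities).

def combined_log_prob(segment_scores):
    return sum(segment_scores)

# Two hypothetical candidate results for the same word sequence, each
# described by the log-probs of its semantic segments.
candidate_a = [-2.1, -3.4]        # e.g. "w1 w2, w3 w4"
candidate_b = [-1.8, -2.2, -4.9]  # e.g. "w1, w2 w3, w4"

# candidate_a scores about -5.5, candidate_b about -8.9, so candidate_a
# is the globally optimal result.
best = max([candidate_a, candidate_b], key=combined_log_prob)
```

Selecting by this combined score, rather than by any single segment's score, is what makes the optimum "global" in the sense used above.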
It should be noted that, for simplicity of description, the method embodiments are each expressed as a series of action combinations. However, those skilled in the art should understand that the embodiments of the present invention are not limited by the described order of actions, because according to the embodiments of the present invention, certain steps may be performed in other orders or simultaneously. Secondly, those skilled in the art should also understand that the embodiments described in this specification are preferred embodiments, and the actions involved are not necessarily required by the embodiments of the present invention.
Device embodiment
With reference to Fig. 3, a structural block diagram of an embodiment of a processing device of the present invention is shown, which may specifically include: a pending text acquisition module 301, a word segmentation module 302, a punctuation addition processing module 303, and a result output module 304.
The pending text acquisition module 301 is configured to obtain pending text.
The word segmentation module 302 is configured to segment the pending text to obtain a global word sequence corresponding to the pending text.
The punctuation addition processing module 303 is configured to perform punctuation addition processing on the global word sequence to obtain an optimal punctuation addition result corresponding to the pending text; the punctuation addition processing adds target punctuation marks between adjacent words in the global word sequence, and the language model probability corresponding to the optimal punctuation addition result is optimal. The optimal punctuation addition result may include at least one semantic segment, and the semantic segment may include continuous words of the global word sequence and/or continuous words to which punctuation marks have been added.
The result output module 304 is configured to output the optimal punctuation addition result.
Optionally, the punctuation addition processing module 303 may include:
a dynamic programming processing submodule, configured to perform punctuation addition processing on the global word sequence using a dynamic programming algorithm, to obtain the optimal punctuation addition result corresponding to the pending text.
Optionally, the dynamic programming processing submodule may include:
a set acquiring unit, configured to obtain a word sequence set corresponding to the global word sequence;
a first recursion unit, configured to determine, in order of subset size from small to large over the word sequence set, the target punctuation marks of the optimal subset punctuation addition result corresponding to each subset by recursion, where the language model probability corresponding to the optimal subset punctuation addition result is optimal; and
a first optimal result acquiring unit, configured to obtain the optimal punctuation addition result corresponding to the pending text according to the optimal subset punctuation addition results corresponding to the subsets of the word sequence set.
Optionally, a subset of the word sequence set may include the first i continuous words of the pending text, where 0 < i ≤ M and M is the number of words included in the pending text. The first recursion unit may then include:
an adding subelement, configured to add punctuation marks between adjacent words in the first i continuous words according to the target punctuation marks of the optimal subset punctuation addition result corresponding to the first k continuous words, to obtain at least one subset punctuation addition path corresponding to the first i continuous words, where 0 < k < i and k is a positive integer;
a first language model probability determining subelement, configured to determine, using a neural network language model, the language model probability of the first semantic segment corresponding to each subset punctuation addition path;
a first selection subelement, configured to select, according to the language model probability of the first semantic segment, the optimal subset punctuation addition path with the optimal language model probability from the at least one subset punctuation addition path; and
a target punctuation mark obtaining subelement, configured to obtain, according to the punctuation marks included in the optimal subset punctuation addition path, the target punctuation marks of the optimal subset punctuation addition result corresponding to the first i continuous words.
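The prefix recursion above resembles a classic segmentation dynamic program: the optimal result for the first i words is built from the already-computed optimal result for some first k words plus the score of the newly added segment. The sketch below is an illustrative stand-in only — the toy `segment_log_prob` scorer replaces the patent's neural network language model, and only a single comma mark is considered.

```python
# Prefix dynamic program: best[i] holds the best (score, punctuated text)
# over the first i words, reusing best[k] for every smaller prefix k < i.

def segment_log_prob(words):
    # Toy stand-in scorer: prefer segments of 2-3 words. A real system
    # would query a language model for the segment's probability here.
    return -abs(len(words) - 2.5)

def best_punctuation(words, mark=","):
    n = len(words)
    best = [(0.0, "")] + [(float("-inf"), "")] * n
    for i in range(1, n + 1):
        for k in range(i):  # reuse the optimal result for the first k words
            seg = words[k:i]
            score = best[k][0] + segment_log_prob(seg)
            if score > best[i][0]:
                text = (best[k][1] + mark + " " if k else "") + " ".join(seg)
                best[i] = (score, text)
    return best[n][1]

result = best_punctuation(["w1", "w2", "w3", "w4", "w5"])
```

Because each prefix's optimum is computed once and reused, the search over all segmentations costs O(n²) segment evaluations instead of enumerating every punctuation assignment.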
Optionally, the dynamic programming processing submodule may include:
a global path acquiring unit, configured to add punctuation marks between adjacent words in the global word sequence, to obtain a global punctuation addition path corresponding to the global word sequence;
a movement acquiring unit, configured to obtain, in front-to-back order and by movement, local punctuation addition paths and their corresponding second semantic segments from the global punctuation addition path, where different second semantic segments include the same number of character units, adjacent second semantic segments share repeated character units, and a character unit may include a word and/or a punctuation mark;
a second recursion unit, configured to determine, in front-to-back order and by recursion, the target punctuation mark corresponding to the optimal second semantic segment, where the language model probability corresponding to the optimal second semantic segment is optimal; and
a second optimal result acquiring unit, configured to obtain the optimal punctuation addition result corresponding to the pending text according to the target punctuation mark corresponding to each optimal second semantic segment.
Optionally, the second recursion unit may include:
a second language model probability determining subelement, configured to determine the language model probability corresponding to the current second semantic segment using an N-gram language model and/or a neural network language model;
a second selection subelement, configured to select the optimal current second semantic segment from multiple current second semantic segments according to the language model probabilities corresponding to the current second semantic segments;
a target punctuation mark determining subelement, configured to use the punctuation mark included in the optimal current second semantic segment as the target punctuation mark corresponding to the optimal current second semantic segment; and
a second semantic segment determining subelement, configured to obtain the next second semantic segment according to the target punctuation mark corresponding to the optimal current second semantic segment.
Optionally, the second optimal result acquiring unit may include:
an adding subelement, configured to add punctuation marks to the global word sequence in back-to-front or front-to-back order according to the target punctuation mark corresponding to each optimal second semantic segment, to obtain the optimal punctuation addition result corresponding to the pending text.
Optionally, the punctuation addition processing module 303 may include:
a result enumeration submodule, configured to obtain multiple punctuation addition results corresponding to the global word sequence;
a language model probability determining submodule, configured to determine the language model probability corresponding to each punctuation addition result; and
a result selection submodule, configured to select, from the multiple punctuation addition results corresponding to the global word sequence, the punctuation addition result with the optimal language model probability as the optimal punctuation addition result corresponding to the pending text.
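The exhaustive alternative above can be sketched directly: enumerate every assignment of punctuation to the n−1 gaps between adjacent words, score each complete result, and keep the best. The toy `result_log_prob` scorer below is an invented stand-in for a real language model; the brute-force search examines 2^(n−1) candidates for a comma-only punctuation set, which is what motivates the dynamic programming variants described earlier for longer texts.

```python
from itertools import product

# Exhaustive baseline: try a comma / no comma in each of the n-1 gaps
# between adjacent words and keep the highest-scoring complete result.

def result_log_prob(text):
    # Toy stand-in scorer: prefer results containing exactly one comma.
    return -abs(text.count(",") - 1)

def best_by_enumeration(words, mark=","):
    best_text, best_score = None, float("-inf")
    for gaps in product([False, True], repeat=len(words) - 1):
        pieces = [words[0]]
        for w, use_mark in zip(words[1:], gaps):
            pieces.append(mark + " " + w if use_mark else " " + w)
        text = "".join(pieces)
        score = result_log_prob(text)
        if score > best_score:
            best_text, best_score = text, score
    return best_text

result = best_by_enumeration(["w1", "w2", "w3"])
```

For short word sequences the exhaustive search is simple and exact; its cost doubles with every additional word, so it serves as a baseline rather than a scalable method.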
As for the device embodiments, since they are basically similar to the method embodiments, the description is relatively simple; for relevant details, refer to the description of the method embodiments.
Each embodiment in this specification is described in a progressive manner; each embodiment highlights its differences from the other embodiments, and the same or similar parts of the embodiments may be referred to each other.
With regard to the devices in the above embodiments, the specific manners in which the modules perform operations have been described in detail in the embodiments of the related methods, and will not be elaborated here.
Fig. 4 is a block diagram of a device for information processing, shown according to an exemplary embodiment, when implemented as a terminal. For example, the terminal 900 may be a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, a fitness device, a personal digital assistant, or the like.
With reference to Fig. 4, the terminal 900 may include one or more of the following components: a processing component 902, a memory 904, a power supply component 906, a multimedia component 908, an audio component 910, an input/output (I/O) interface 912, a sensor component 914, and a communication component 916.
The processing component 902 generally controls the overall operations of the terminal 900, such as operations associated with display, telephone calls, data communication, camera operations, and recording operations. The processing component 902 may include one or more processors 920 to execute instructions, so as to perform all or part of the steps of the methods described above. In addition, the processing component 902 may include one or more modules to facilitate interaction between the processing component 902 and other components. For example, the processing component 902 may include a multimedia module to facilitate interaction between the multimedia component 908 and the processing component 902.
The memory 904 is configured to store various types of data to support operation at the terminal 900. Examples of such data include instructions for any application or method operated on the terminal 900, contact data, phonebook data, messages, pictures, videos, and the like. The memory 904 may be implemented by any type of volatile or non-volatile storage device, or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic disk, or optical disc.
The power supply component 906 provides power to the various components of the terminal 900. The power supply component 906 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the terminal 900.
The multimedia component 908 includes a screen providing an output interface between the terminal 900 and the user. In some embodiments, the screen may include a liquid crystal display (LCD) and a touch panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive input signals from the user. The touch panel includes one or more touch sensors to sense touches, swipes, and gestures on the touch panel. The touch sensors may not only sense the boundary of a touch or swipe action, but also detect the duration and pressure associated with the touch or swipe operation. In some embodiments, the multimedia component 908 includes a front camera and/or a rear camera. When the terminal 900 is in an operating mode, such as a shooting mode or a video mode, the front camera and/or the rear camera can receive external multimedia data. Each front camera and rear camera may be a fixed optical lens system or have focusing and optical zoom capability.
The audio component 910 is configured to output and/or input audio signals. For example, the audio component 910 includes a microphone (MIC), which is configured to receive external audio signals when the terminal 900 is in an operating mode, such as a call mode, a recording mode, or a speech recognition mode. The received audio signals may be further stored in the memory 904 or transmitted via the communication component 916. In some embodiments, the audio component 910 further includes a speaker for outputting audio signals.
The I/O interface 912 provides an interface between the processing component 902 and peripheral interface modules, which may be keyboards, click wheels, buttons, and the like. These buttons may include, but are not limited to: a home button, volume buttons, a start button, and a lock button.
The sensor component 914 includes one or more sensors for providing status assessments of various aspects of the terminal 900. For example, the sensor component 914 may detect the open/closed status of the terminal 900 and the relative positioning of components, e.g., the display and keypad of the terminal 900; the sensor component 914 may also detect a change in position of the terminal 900 or a component of the terminal 900, the presence or absence of user contact with the terminal 900, the orientation or acceleration/deceleration of the terminal 900, and a change in temperature of the terminal 900. The sensor component 914 may include a proximity sensor configured to detect the presence of nearby objects without any physical contact. The sensor component 914 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor component 914 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 916 is configured to facilitate wired or wireless communication between the terminal 900 and other devices. The terminal 900 can access a wireless network based on a communication standard, such as WiFi, 2G, or 3G, or a combination thereof. In an exemplary embodiment, the communication component 916 receives a broadcast signal or broadcast related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 916 further includes a near field communication (NFC) module to facilitate short-range communication. For example, the NFC module may be implemented based on radio frequency identification (RFID) technology, Infrared Data Association (IrDA) technology, ultra-wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the terminal 900 may be implemented by one or more application-specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field programmable gate arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic components, for performing the above methods.
In an exemplary embodiment, a non-transitory computer-readable storage medium including instructions is also provided, such as the memory 904 including instructions, which can be executed by the processor 920 of the terminal 900 to complete the above methods. For example, the non-transitory computer-readable storage medium may be a ROM, a random access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, or the like.
Fig. 5 is a block diagram of a device for information processing, shown according to an exemplary embodiment, when implemented as a server. The server 1900 may vary considerably depending on configuration or performance, and may include one or more central processing units (CPUs) 1922 (e.g., one or more processors), a memory 1932, and one or more storage media 1930 (e.g., one or more mass storage devices) storing application programs 1942 or data 1944. The memory 1932 and the storage medium 1930 may be transient storage or persistent storage. The programs stored in the storage medium 1930 may include one or more modules (not shown in the figure), and each module may include a series of instruction operations for the server. Further, the central processing unit 1922 may be configured to communicate with the storage medium 1930 and execute, on the server 1900, the series of instruction operations in the storage medium 1930.
The server 1900 may also include one or more power supplies 1926, one or more wired or wireless network interfaces 1950, one or more input/output interfaces 1958, one or more keyboards 1956, and/or one or more operating systems 1941, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, and the like.
In an exemplary embodiment, a non-transitory computer-readable storage medium including instructions is also provided, such as the memory 1932 including instructions, which can be executed by the processor 1922 of the server 1900 to complete the above methods. For example, the non-transitory computer-readable storage medium may be a ROM, a random access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, or the like.
A non-transitory computer-readable storage medium is provided; when the instructions in the storage medium are executed by a processor of a device (a terminal or a server), the device is enabled to perform a processing method, the method including: obtaining pending text; segmenting the pending text to obtain a global word sequence corresponding to the pending text; performing punctuation addition processing on the global word sequence to obtain an optimal punctuation addition result corresponding to the pending text, where the punctuation addition processing adds target punctuation marks between adjacent words in the global word sequence, the language model probability corresponding to the optimal punctuation addition result is optimal, the optimal punctuation addition result includes at least one semantic segment, and the semantic segment includes continuous words of the global word sequence and/or continuous words to which punctuation marks have been added; and outputting the optimal punctuation addition result.
Optionally, performing punctuation addition processing on the global word sequence to obtain the optimal punctuation addition result corresponding to the pending text includes: performing punctuation addition processing on the global word sequence using a dynamic programming algorithm, to obtain the optimal punctuation addition result corresponding to the pending text.
Optionally, performing punctuation addition processing on the global word sequence using a dynamic programming algorithm, to obtain the optimal punctuation addition result corresponding to the pending text, includes: obtaining a word sequence set corresponding to the global word sequence; determining, in order of subset size from small to large over the word sequence set, the target punctuation marks of the optimal subset punctuation addition result corresponding to each subset by recursion, where the language model probability corresponding to the optimal subset punctuation addition result is optimal; and obtaining the optimal punctuation addition result corresponding to the pending text according to the optimal subset punctuation addition results corresponding to the subsets of the word sequence set.
Optionally, a subset of the word sequence set includes the first i continuous words of the pending text, where 0 < i ≤ M and M is the number of words included in the pending text; determining, in order of subset size from small to large over the word sequence set, the target punctuation marks of the optimal subset punctuation addition result corresponding to each subset by recursion then includes: adding punctuation marks between adjacent words in the first i continuous words according to the target punctuation marks of the optimal subset punctuation addition result corresponding to the first k continuous words, to obtain at least one subset punctuation addition path corresponding to the first i continuous words, where 0 < k < i and k is a positive integer; determining, using a neural network language model, the language model probability of the first semantic segment corresponding to each subset punctuation addition path; selecting, according to the language model probability of the first semantic segment, the optimal subset punctuation addition path with the optimal language model probability from the at least one subset punctuation addition path; and obtaining, according to the punctuation marks included in the optimal subset punctuation addition path, the target punctuation marks of the optimal subset punctuation addition result corresponding to the first i continuous words.
Optionally, performing punctuation addition processing on the global word sequence using a dynamic programming algorithm, to obtain the optimal punctuation addition result corresponding to the pending text, includes: adding punctuation marks between adjacent words in the global word sequence, to obtain a global punctuation addition path corresponding to the global word sequence; obtaining, in front-to-back order and by movement, local punctuation addition paths and their corresponding second semantic segments from the global punctuation addition path, where different second semantic segments include the same number of character units, adjacent second semantic segments share repeated character units, and a character unit includes a word and/or a punctuation mark; determining, in front-to-back order and by recursion, the target punctuation mark corresponding to the optimal second semantic segment, where the language model probability corresponding to the optimal second semantic segment is optimal; and obtaining the optimal punctuation addition result corresponding to the pending text according to the target punctuation mark corresponding to each optimal second semantic segment.
Optionally, determining, in front-to-back order and by recursion, the target punctuation mark corresponding to the optimal second semantic segment includes: determining the language model probability corresponding to the current second semantic segment using an N-gram language model and/or a neural network language model; selecting the optimal current second semantic segment from multiple current second semantic segments according to the language model probabilities corresponding to the current second semantic segments; using the punctuation mark included in the optimal current second semantic segment as the target punctuation mark corresponding to the optimal current second semantic segment; and obtaining the next second semantic segment according to the target punctuation mark corresponding to the optimal current second semantic segment.
Optionally, obtaining the optimal punctuation addition result corresponding to the pending text according to the target punctuation mark corresponding to each optimal second semantic segment includes: adding punctuation marks to the global word sequence in back-to-front or front-to-back order according to the target punctuation mark corresponding to each optimal second semantic segment, to obtain the optimal punctuation addition result corresponding to the pending text.
Optionally, performing punctuation addition processing on the global word sequence to obtain the optimal punctuation addition result corresponding to the pending text includes: obtaining multiple punctuation addition results corresponding to the global word sequence; determining the language model probability corresponding to each punctuation addition result; and selecting, from the multiple punctuation addition results corresponding to the global word sequence, the punctuation addition result with the optimal language model probability as the optimal punctuation addition result corresponding to the pending text.
Those skilled in the art will readily conceive of other embodiments of the present invention after considering the specification and practicing the invention disclosed herein. The present invention is intended to cover any variations, uses, or adaptations of the present invention; these variations, uses, or adaptations follow the general principles of the present invention and include common knowledge or conventional techniques in the art not disclosed in this disclosure. The specification and embodiments are to be regarded as illustrative only, and the true scope and spirit of the present invention are indicated by the following claims.
It should be understood that the present invention is not limited to the precise structures described above and shown in the accompanying drawings, and that various modifications and changes may be made without departing from its scope. The scope of the present invention is limited only by the appended claims.
The above are only preferred embodiments of the present invention and are not intended to limit the present invention; any modification, equivalent replacement, improvement, and the like made within the spirit and principles of the present invention shall be included within the protection scope of the present invention.
A processing method, a processing device, and a device for processing provided by the present invention have been described in detail above. Specific examples are used herein to explain the principles and implementations of the present invention, and the description of the above embodiments is only intended to help understand the method of the present invention and its core idea. Meanwhile, for those of ordinary skill in the art, there will be changes in the specific implementation and application scope according to the idea of the present invention. In summary, the content of this specification should not be construed as limiting the present invention.
Claims (10)
1. A processing method, characterized by comprising:
obtaining pending text;
segmenting the pending text to obtain a global word sequence corresponding to the pending text;
performing punctuation addition processing on the global word sequence to obtain an optimal punctuation addition result corresponding to the pending text; wherein the punctuation addition processing adds target punctuation marks between adjacent words in the global word sequence, the language model probability corresponding to the optimal punctuation addition result is optimal, the optimal punctuation addition result comprises at least one semantic segment, and the semantic segment comprises continuous words of the global word sequence and/or continuous words to which punctuation marks have been added; and
outputting the optimal punctuation addition result.
2. The method according to claim 1, characterized in that performing punctuation addition processing on the global word sequence to obtain the optimal punctuation addition result corresponding to the pending text comprises:
performing punctuation addition processing on the global word sequence using a dynamic programming algorithm, to obtain the optimal punctuation addition result corresponding to the pending text.
3. The method according to claim 2, characterized in that performing punctuation addition processing on the global word sequence using a dynamic programming algorithm, to obtain the optimal punctuation addition result corresponding to the pending text, comprises:
obtaining a word sequence set corresponding to the global word sequence;
determining, in order of subset size from small to large over the word sequence set, the target punctuation marks of the optimal subset punctuation addition result corresponding to each subset by recursion; wherein the language model probability corresponding to the optimal subset punctuation addition result is optimal; and
obtaining the optimal punctuation addition result corresponding to the pending text according to the optimal subset punctuation addition results corresponding to the subsets of the word sequence set.
4. The method according to claim 3, characterized in that a subset of the word sequence set comprises the first i continuous words of the pending text, where 0 < i ≤ M and M is the number of words included in the pending text; determining, in order of subset size from small to large over the word sequence set, the target punctuation marks of the optimal subset punctuation addition result corresponding to each subset by recursion then comprises:
adding punctuation marks between adjacent words in the first i continuous words according to the target punctuation marks of the optimal subset punctuation addition result corresponding to the first k continuous words, to obtain at least one subset punctuation addition path corresponding to the first i continuous words; wherein 0 < k < i and k is a positive integer;
determining, using a neural network language model, the language model probability of the first semantic segment corresponding to each subset punctuation addition path;
selecting, according to the language model probability of the first semantic segment, the optimal subset punctuation addition path with the optimal language model probability from the at least one subset punctuation addition path; and
obtaining, according to the punctuation marks included in the optimal subset punctuation addition path, the target punctuation marks of the optimal subset punctuation addition result corresponding to the first i continuous words.
5. The method according to claim 2, characterized in that performing punctuation addition processing on the global word sequence using a dynamic programming algorithm, to obtain the optimal punctuation addition result corresponding to the pending text, comprises:
adding punctuation marks between adjacent words in the global word sequence, to obtain a global punctuation addition path corresponding to the global word sequence;
obtaining, in front-to-back order and by movement, local punctuation addition paths and their corresponding second semantic segments from the global punctuation addition path; wherein different second semantic segments include the same number of character units, adjacent second semantic segments share repeated character units, and a character unit comprises a word and/or a punctuation mark;
determining, in front-to-back order and by recursion, the target punctuation mark corresponding to the optimal second semantic segment; wherein the language model probability corresponding to the optimal second semantic segment is optimal; and
obtaining the optimal punctuation addition result corresponding to the pending text according to the target punctuation mark corresponding to each optimal second semantic segment.
6. The method according to claim 5, wherein the determining, in front-to-back order and in a recursive manner, the target punctuation marks corresponding to optimal second semantic segments comprises:
determining, by using an N-gram language model and/or a neural network language model, the language model probability corresponding to each current second semantic segment;
selecting, according to the language model probabilities corresponding to the current second semantic segments, an optimal current second semantic segment from the multiple current second semantic segments;
taking the punctuation marks included in the optimal current second semantic segment as the target punctuation marks corresponding to the optimal current second semantic segment;
obtaining, according to the target punctuation marks corresponding to the optimal current second semantic segment, a next second semantic segment.
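A minimal sketch of the front-to-back recursion of claim 6, with a toy scoring function standing in for a real N-gram or neural network language model (the candidate mark set, the scorer, and all names are assumptions for illustration only):

```python
CANDIDATES = ["", ",", "."]  # "" means: no punctuation mark in this gap

def toy_lm_score(segment):
    # Stand-in for an N-gram / neural LM probability: rewards a comma
    # after "yes" and penalises punctuation everywhere else.
    score = 1.0
    for a, b in zip(segment, segment[1:]):
        if a == "yes" and b == ",":
            score *= 2.0
        elif b in (",", "."):
            score *= 0.1
    return score

def best_punctuation(words, scorer=toy_lm_score):
    """At each gap between adjacent words, keep the candidate segment
    whose score is highest (the 'optimal current second semantic
    segment'), then extend it with the next word."""
    segment = [words[0]]
    for word in words[1:]:
        segment = max(
            (segment + ([mark] if mark else []) + [word] for mark in CANDIDATES),
            key=scorer,
        )
    return segment
```

With this toy scorer, `best_punctuation(["yes", "it", "rained"])` keeps the comma after "yes" and adds no other marks.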
7. The method according to claim 5, wherein the obtaining, according to the target punctuation marks corresponding to the optimal second semantic segments, the optimal punctuation addition result corresponding to the to-be-processed text comprises:
adding punctuation marks to the global word sequence, in back-to-front order or in front-to-back order, according to the target punctuation marks corresponding to the optimal second semantic segments, to obtain the optimal punctuation addition result corresponding to the to-be-processed text.
8. The method according to claim 1, wherein the performing punctuation addition processing on the global word sequence, to obtain the optimal punctuation addition result corresponding to the to-be-processed text, comprises:
obtaining multiple punctuation addition results corresponding to the global word sequence;
determining the language model probability corresponding to each punctuation addition result;
selecting, from the multiple punctuation addition results corresponding to the global word sequence, the punctuation addition result with the optimal language model probability as the optimal punctuation addition result corresponding to the to-be-processed text.
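The exhaustive variant of claim 8 can be sketched as follows (the mark set and names are assumptions for illustration; enumerating every result is exponential in the number of gaps, which is why the other claims use dynamic programming instead):

```python
from itertools import product

PUNCT_OPTIONS = ["", ",", "."]  # "" = no mark in this gap

def all_punctuation_results(words):
    """Yield every punctuation addition result: one independent choice
    per gap between adjacent words in the global word sequence."""
    gaps = len(words) - 1
    for marks in product(PUNCT_OPTIONS, repeat=gaps):
        result = [words[0]]
        for mark, word in zip(marks, words[1:]):
            if mark:
                result.append(mark)
            result.append(word)
        yield result

def optimal_result(words, scorer):
    # Score every candidate with the language model and keep the best.
    return max(all_punctuation_results(words), key=scorer)
```

For three words there are 3^2 = 9 candidate results; `optimal_result` returns the one the scorer ranks highest.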
9. A processing apparatus, comprising:
a to-be-processed text acquisition module, configured to obtain to-be-processed text;
a word segmentation module, configured to segment the to-be-processed text, to obtain a global word sequence corresponding to the to-be-processed text;
a punctuation addition processing module, configured to perform punctuation addition processing on the global word sequence, to obtain an optimal punctuation addition result corresponding to the to-be-processed text; wherein the punctuation addition processing adds target punctuation marks between adjacent words in the global word sequence, the language model probability corresponding to the optimal punctuation addition result is optimal, and the optimal punctuation addition result includes at least one semantic segment, each semantic segment including consecutive words of the global word sequence and/or consecutive words with punctuation marks added; and
a result output module, configured to output the optimal punctuation addition result.
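The module layout of claim 9 maps naturally onto a small pipeline class. In this sketch the whitespace tokenizer and the one-rule punctuation step are placeholders (the patent's actual word segmenter and language-model-driven punctuation search would slot in behind the same interfaces):

```python
class PunctuationPipeline:
    """Toy mirror of the claim-9 modules: text acquisition, word
    segmentation, punctuation addition, and result output."""

    def acquire(self, text):
        return text.strip()

    def segment(self, text):
        # Placeholder for a real word segmentation module.
        return text.split()

    def add_punctuation(self, words):
        # Placeholder rule: add a comma after "yes". A real module would
        # run the language-model search of claims 2-8 here.
        out = []
        for i, word in enumerate(words):
            out.append(word)
            if word == "yes" and i < len(words) - 1:
                out.append(",")
        return out

    def run(self, text):
        return " ".join(self.add_punctuation(self.segment(self.acquire(text))))
```

`PunctuationPipeline().run(" yes it rained ")` returns `"yes , it rained"` under the placeholder rule.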
10. An apparatus for processing, comprising a memory and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by one or more processors, the one or more programs including instructions for performing the following operations:
obtaining to-be-processed text;
segmenting the to-be-processed text, to obtain a global word sequence corresponding to the to-be-processed text;
performing punctuation addition processing on the global word sequence, to obtain an optimal punctuation addition result corresponding to the to-be-processed text; wherein the punctuation addition processing adds target punctuation marks between adjacent words in the global word sequence, the language model probability corresponding to the optimal punctuation addition result is optimal, and the optimal punctuation addition result includes at least one semantic segment, each semantic segment including consecutive words of the global word sequence and/or consecutive words with punctuation marks added;
outputting the optimal punctuation addition result.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710162165.2A CN108628813B (en) | 2017-03-17 | 2017-03-17 | Processing method and device for processing |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710162165.2A CN108628813B (en) | 2017-03-17 | 2017-03-17 | Processing method and device for processing |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108628813A true CN108628813A (en) | 2018-10-09 |
CN108628813B CN108628813B (en) | 2022-09-23 |
Family
ID=63686639
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710162165.2A Active CN108628813B (en) | 2017-03-17 | 2017-03-17 | Processing method and device for processing |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108628813B (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20070162272A1 (en) * | 2004-01-16 | 2007-07-12 | Nec Corporation | Text-processing method, program, program recording medium, and device thereof |
CN105609107A (en) * | 2015-12-23 | 2016-05-25 | 北京奇虎科技有限公司 | Text processing method and device based on voice recognition |
CN105718586A (en) * | 2016-01-26 | 2016-06-29 | 中国人民解放军国防科学技术大学 | Word division method and device |
CN105786782A (en) * | 2016-03-25 | 2016-07-20 | 北京搜狗科技发展有限公司 | Word vector training method and device |
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109410949A (en) * | 2018-10-11 | 2019-03-01 | 厦门大学 | Text content punctuation adding method based on weighted finite state converter |
CN109410949B (en) * | 2018-10-11 | 2021-11-16 | 厦门大学 | Text content punctuation adding method based on weighted finite state converter |
CN111046649A (en) * | 2019-11-22 | 2020-04-21 | 北京捷通华声科技股份有限公司 | Text segmentation method and device |
CN110908583A (en) * | 2019-11-29 | 2020-03-24 | 维沃移动通信有限公司 | Symbol display method and electronic equipment |
CN111241810A (en) * | 2020-01-16 | 2020-06-05 | 百度在线网络技术(北京)有限公司 | Punctuation prediction method and device |
CN112685996A (en) * | 2020-12-23 | 2021-04-20 | 北京有竹居网络技术有限公司 | Text punctuation prediction method and device, readable medium and electronic equipment |
CN112685996B (en) * | 2020-12-23 | 2024-03-22 | 北京有竹居网络技术有限公司 | Text punctuation prediction method and device, readable medium and electronic equipment |
CN113053390A (en) * | 2021-03-22 | 2021-06-29 | 北京儒博科技有限公司 | Text processing method and device based on voice recognition, electronic equipment and medium |
Also Published As
Publication number | Publication date |
---|---|
CN108628813B (en) | 2022-09-23 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107291690A (en) | | Punctuation adding method and device, and device for punctuation adding |
CN108628813A (en) | | Processing method and apparatus, and device for processing |
CN107221330A (en) | | Punctuation adding method and device, and device for punctuation adding |
CN110503942A (en) | | Artificial-intelligence-based voice-driven animation method and device |
CN107632980A (en) | | Voice translation method and device, and device for voice translation |
CN108363706A (en) | | Human-computer dialogue interaction method and device, and device for human-computer dialogue interaction |
CN110853617B (en) | | Model training method, language recognition method, device and equipment |
CN107992812A (en) | | Lip reading recognition method and device |
EP3852044A1 (en) | | Method and device for commenting on multimedia resource |
CN111368541B (en) | | Named entity recognition method and device |
CN107291704B (en) | | Processing method and device for processing |
CN110781305A (en) | | Text classification method and device based on classification model, and model training method |
CN107992485A (en) | | Simultaneous interpretation method and device |
CN108628819A (en) | | Processing method and apparatus, and device for processing |
CN107274903A (en) | | Text processing method and device, and device for text processing |
CN110097890A (en) | | Voice processing method and device, and device for voice processing |
CN113362812A (en) | | Voice recognition method and device, and electronic device |
CN107564526A (en) | | Processing method and device, and machine-readable medium |
CN108073572A (en) | | Information processing method and device, and simultaneous interpretation system |
CN109002184A (en) | | Association method and device for input method candidate words |
CN109977426A (en) | | Translation model training method and device, and machine-readable medium |
CN108345612A (en) | | Question processing method and device, and device for question processing |
CN108241690A (en) | | Data processing method and device, and device for data processing |
CN111583919A (en) | | Information processing method and device, and storage medium |
WO2019101099A1 (en) | | Video program identification method and device, terminal, system, and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||