CN107291690A - Punctuation addition method and apparatus, and apparatus for punctuation addition - Google Patents
Punctuation addition method and apparatus, and apparatus for punctuation addition
- Publication number
- CN107291690A (application number CN201710396130.5A)
- Authority
- CN
- China
- Prior art keywords
- punctuation
- text
- target text
- target
- pending
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/086—Learning methods using evolutionary algorithms, e.g. genetic algorithms or genetic programming
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/26—Speech to text systems
Abstract
Embodiments of the invention provide a punctuation addition method and apparatus, and an apparatus for punctuation addition. The method specifically includes: obtaining a text to be processed; adding punctuation to the text to be processed, to obtain a first punctuation addition result corresponding to that text; and, if the first punctuation addition result contains a target text whose word count exceeds a word-count threshold and which contains no preset punctuation, adding punctuation to the target text by a neural network model, to obtain a second punctuation addition result corresponding to the target text. Embodiments of the invention can improve the accuracy of punctuation addition.
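As a minimal sketch of the triggering condition described in the abstract — the punctuation set and threshold below are hypothetical values, and character count is used as a stand-in for word count:

```python
import re

PRESET_PUNCTUATION = "，。？！,.?!"  # hypothetical preset punctuation set

def find_target_texts(first_result, threshold=15):
    """Split the first punctuation addition result on preset punctuation
    and return the segments whose length exceeds the threshold; these
    "target texts" would be handed to the neural network model for a
    second punctuation pass."""
    segments = re.split("[" + re.escape(PRESET_PUNCTUATION) + "]", first_result)
    return [s for s in segments if len(s) > threshold]
```

A short, already-punctuated result yields no target texts, while a long unpunctuated stretch is returned for re-punctuation.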
Description
Technical field
The present invention relates to the technical field of information processing, and in particular to a punctuation addition method, a punctuation addition apparatus, and an apparatus for punctuation addition.
Background technology
In information-processing fields such as communications and the Internet, some application scenarios require adding punctuation to texts that lack it, for example adding punctuation to the text corresponding to a speech recognition result so that it is easier to read.
Existing schemes can add punctuation to the text corresponding to a speech recognition result according to the silence intervals in the speech signal. Specifically, a threshold on silence length is set first; if the length of a silence interval while the speaker is talking exceeds the threshold, punctuation is added at the corresponding position; conversely, if the length of the silence interval does not exceed the threshold, no punctuation is added.
However, the inventors found, in the course of realizing the embodiments of the present invention, that different speakers often speak at different speeds, so adding punctuation to the text corresponding to a speech recognition result according to the silence intervals of the speech signal affects the accuracy of punctuation addition. For example, if a speaker talks too fast, with no pause between sentences or with pauses much shorter than the threshold, no punctuation at all will be added to the text.
Summary of the invention
In view of the above problems, embodiments of the present invention are proposed to provide a punctuation addition method, a punctuation addition apparatus, and an apparatus for punctuation addition that overcome the above problems or at least partially solve them; embodiments of the present invention can improve the accuracy of punctuation addition.
To solve the above problems, the invention discloses a punctuation addition method, including: obtaining a text to be processed; adding punctuation to the text to be processed, to obtain a corresponding first punctuation addition result; and, if the first punctuation addition result contains a target text whose word count exceeds a word-count threshold and which contains no preset punctuation, adding punctuation to the target text by a neural network model, to obtain a second punctuation addition result corresponding to the target text.
Optionally, adding punctuation to the target text by a neural network model includes: segmenting the target text into words, to obtain a corresponding second word sequence; obtaining multiple candidate punctuation addition results corresponding to the second word sequence; determining, with a neural network language model, the language model score corresponding to each candidate punctuation addition result; and selecting, from the multiple candidate punctuation addition results corresponding to the second word sequence, the candidate with the best language model score as the second punctuation addition result corresponding to the target text.
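This selection step can be sketched as follows; the scoring function is an invented stand-in for a trained neural network language model, and the candidates and scores are illustrative only:

```python
def pick_best_candidate(candidates, score_fn):
    """Return the candidate punctuation addition result with the best
    (here: highest) language model score; `score_fn` stands in for a
    neural network language model such as an RNNLM."""
    return max(candidates, key=score_fn)

# Invented example: score each candidate with a toy lookup table.
toy_scores = {"今天天气很好。": 0.9, "今天，天气很好": 0.4, "今天天气，很好": 0.3}
best = pick_best_candidate(list(toy_scores), toy_scores.get)
```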
Optionally, adding punctuation to the target text by a neural network model includes: adding punctuation to the target text by a neural network transformation model, to obtain the second punctuation addition result corresponding to the target text; the neural network transformation model is trained on a parallel corpus, the parallel corpus including a source-side corpus and a target-side corpus, where the target-side corpus is the punctuation corresponding to each word in the source-side corpus.
Optionally, adding punctuation to the target text by a neural network transformation model includes: encoding the target text, to obtain the source-side hidden states corresponding to the target text; decoding the source-side hidden states corresponding to the target text according to the model parameters of the neural network transformation model, to obtain the probability that each word in the target text belongs to each candidate punctuation mark; and obtaining the second punctuation addition result corresponding to the target text according to the probability that each word in the target text belongs to each candidate punctuation mark.
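The decoding step can be sketched as below, where the per-word probability distributions stand in for the decoder output of the transformation model (the words and probabilities are invented, and 'O' is used as a conventional label for "no punctuation"):

```python
def decode_punctuation(words, punct_probs):
    """Given, for each word, a probability distribution over candidate
    punctuation marks ('O' meaning no punctuation), append after each
    word the most probable mark.  `punct_probs` stands in for the
    decoder output of the neural network transformation model."""
    out = []
    for word, probs in zip(words, punct_probs):
        out.append(word)
        mark = max(probs, key=probs.get)
        if mark != "O":
            out.append(mark)
    return "".join(out)

# Invented probabilities for a three-word target text.
probs = [{"O": 0.9, "，": 0.1}, {"，": 0.6, "O": 0.4}, {"。": 0.8, "O": 0.2}]
result = decode_punctuation(["今天", "天气", "很好"], probs)
```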
Optionally, adding punctuation to the text to be processed includes: adding punctuation to the text to be processed by an N-gram language model.
Optionally, adding punctuation to the text to be processed by an N-gram language model includes: segmenting the text to be processed into words, to obtain a first word sequence corresponding to the text to be processed; adding punctuation between adjacent words in the first word sequence, to obtain a global punctuation addition path corresponding to the first word sequence; obtaining, in front-to-back order and in a sliding-window manner, local punctuation addition paths and their corresponding first semantic segments from the global punctuation addition path, where different first semantic segments contain the same number of character units, adjacent first semantic segments share overlapping character units, and a character unit is a word and/or a punctuation mark; determining, in front-to-back order and by recursion, the target punctuation corresponding to the optimal first semantic segment, where the optimal first semantic segment is the one whose language model score is best and the language model score of a first semantic segment is determined by the N-gram language model; and obtaining the first punctuation addition result corresponding to the text to be processed according to the target punctuation corresponding to each optimal first semantic segment.
Optionally, determining, in front-to-back order and by recursion, the target punctuation corresponding to the optimal first semantic segment includes: determining, with the N-gram language model, the language model score corresponding to the current first semantic segment; selecting, according to the language model scores of the current first semantic segments, the optimal current first semantic segment from among the multiple current first semantic segments; taking the punctuation contained in the optimal current first semantic segment as the target punctuation corresponding to that segment; and obtaining the next first semantic segment according to the target punctuation corresponding to the optimal current first semantic segment.
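The recursion above can be sketched as a greedy front-to-back pass in which the chosen punctuation is carried into the next segment; the scoring function is an invented stand-in for the N-gram language model:

```python
def greedy_punctuate(words, candidate_marks, score_fn):
    """After each word, try every candidate mark ('' meaning no
    punctuation) and keep the choice whose current segment scores best
    under the language-model stand-in; the chosen target punctuation
    becomes part of the next segment."""
    result = []
    for word in words:
        result.append(word)
        best = max(candidate_marks,
                   key=lambda m: score_fn("".join(result) + m))
        if best:
            result.append(best)
    return "".join(result)

# Invented toy scorer that prefers a full stop at the end of "ab".
toy_scores = {"ab。": 2.0}
result = greedy_punctuate(["a", "b"], ["", "。"],
                          lambda s: toy_scores.get(s, 1.0))
```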
In another aspect, the invention discloses a punctuation addition apparatus, including:
a text acquisition module, configured to obtain a text to be processed;
a first punctuation addition module, configured to add punctuation to the text to be processed, to obtain a corresponding first punctuation addition result; and
a second punctuation addition module, configured to, when the first punctuation addition result contains a target text whose word count exceeds a word-count threshold and which contains no preset punctuation, add punctuation to the target text by a neural network model, to obtain a second punctuation addition result corresponding to the target text.
Optionally, the second punctuation addition module includes:
a second word segmentation submodule, configured to segment the target text into words, to obtain a corresponding second word sequence;
a candidate result acquisition submodule, configured to obtain multiple candidate punctuation addition results corresponding to the second word sequence;
a second model score determining unit, configured to determine, with a neural network language model, the language model scores corresponding to the candidate punctuation addition results; and
a second selection unit, configured to select, from the multiple candidate punctuation addition results corresponding to the second word sequence, the candidate with the best language model score as the second punctuation addition result corresponding to the target text.
Optionally, the second punctuation addition module includes:
a model processing submodule, configured to add punctuation to the target text by a neural network transformation model, to obtain the second punctuation addition result corresponding to the target text; the neural network transformation model is trained on a parallel corpus, the parallel corpus including a source-side corpus and a target-side corpus, where the target-side corpus is the punctuation corresponding to each word in the source-side corpus.
Optionally, the model processing submodule includes:
an encoding unit, configured to encode the target text, to obtain the source-side hidden states corresponding to the target text;
a decoding unit, configured to decode the source-side hidden states corresponding to the target text according to the model parameters of the neural network transformation model, to obtain the probability that each word in the target text belongs to each candidate punctuation mark; and
a result determining unit, configured to obtain the second punctuation addition result corresponding to the target text according to the probability that each word in the target text belongs to each candidate punctuation mark.
Optionally, the first punctuation addition module adds punctuation to the text to be processed by an N-gram language model, and includes:
a first word segmentation submodule, configured to segment the text to be processed into words, to obtain a first word sequence corresponding to the text to be processed;
a first addition submodule, configured to add punctuation between adjacent words in the first word sequence, to obtain a global punctuation addition path corresponding to the first word sequence;
a local information acquisition submodule, configured to obtain, in front-to-back order and in a sliding-window manner, local punctuation addition paths and their corresponding first semantic segments from the global punctuation addition path, where different first semantic segments contain the same number of character units, adjacent first semantic segments share overlapping character units, and a character unit is a word and/or a punctuation mark;
a recursion submodule, configured to determine, in front-to-back order and by recursion, the target punctuation corresponding to the optimal first semantic segment, where the optimal first semantic segment is the one whose language model score is best and the language model score of a first semantic segment is determined by the N-gram language model; and
a result acquisition submodule, configured to obtain the first punctuation addition result corresponding to the text to be processed according to the target punctuation corresponding to each optimal first semantic segment.
Optionally, the recursion submodule includes:
a first model score determining unit, configured to determine, with the N-gram language model, the language model score corresponding to the current first semantic segment;
a first selection unit, configured to select, according to the language model scores of the current first semantic segments, the optimal current first semantic segment from among the multiple current first semantic segments;
a target punctuation determining unit, configured to take the punctuation contained in the optimal current first semantic segment as the target punctuation corresponding to that segment; and
a semantic segment update module, configured to obtain the next first semantic segment according to the target punctuation corresponding to the optimal current first semantic segment.
Optionally, the result acquisition submodule includes:
a target punctuation addition unit, configured to add punctuation to the first word sequence, in back-to-front or front-to-back order, according to the target punctuation corresponding to each optimal first semantic segment, to obtain the first punctuation addition result corresponding to the text to be processed.
In yet another aspect, the invention discloses an apparatus for punctuation addition, including a memory and one or more programs, where the one or more programs are stored in the memory and configured to be executed by one or more processors, and contain instructions for the following operations: obtaining a text to be processed; adding punctuation to the text to be processed, to obtain a corresponding first punctuation addition result; and, if the first punctuation addition result contains a target text whose word count exceeds a word-count threshold and which contains no preset punctuation, adding punctuation to the target text by a neural network model, to obtain a second punctuation addition result corresponding to the target text.
In still another aspect, the invention discloses a machine-readable medium on which instructions are stored; when executed by one or more processors, the instructions cause an apparatus to perform the punctuation addition method described above.
Embodiments of the present invention include the following advantages:
When the first punctuation addition result contains a target text whose word count exceeds the word-count threshold and which contains no preset punctuation, embodiments of the present invention can add punctuation to the target text by a neural network model, to obtain a second punctuation addition result corresponding to the target text. Because a neural network model can represent a word by a word vector, and the semantic distance between words can be characterized by the distance between their word vectors, embodiments of the present invention allow the numerous contexts corresponding to a word to take part in training the neural network model, giving the model accurate punctuation addition ability. Therefore, adding punctuation by the neural network model can, to some extent, solve the problem that a very long stretch of text in the first punctuation addition result receives no punctuation, and can thus improve the accuracy of punctuation addition.
Brief description of the drawings
Fig. 1 is a schematic structural diagram of an example of a speech recognition system of the present invention;
Fig. 2 is a flow chart of the steps of an embodiment of a punctuation addition method of the present invention;
Fig. 3 is a schematic diagram of a punctuation addition process for a word sequence according to an embodiment of the present invention;
Fig. 4 is a structural block diagram of an embodiment of a punctuation addition apparatus of the present invention;
Fig. 5 is a block diagram of an apparatus for punctuation addition when implemented as a terminal, according to an exemplary embodiment; and
Fig. 6 is a block diagram of an apparatus for punctuation addition when implemented as a server, according to an exemplary embodiment.
Detailed description of the embodiments
To make the above objects, features and advantages of the present invention easier to understand, the present invention is described in further detail below with reference to the accompanying drawings and specific embodiments.
Embodiments of the present invention provide a punctuation addition scheme. The scheme can first add punctuation to a text to be processed, to obtain a corresponding first punctuation addition result; then, when the first punctuation addition result contains a target text whose word count exceeds a word-count threshold and which contains no preset punctuation, it adds punctuation to the target text by a neural network model, to obtain a second punctuation addition result corresponding to the target text.
This addresses the problem that a very long stretch of text in the first punctuation addition result receives no punctuation. Because a neural network model can represent a word by a word vector, and the semantic distance between words can be characterized by the distance between their word vectors, embodiments of the present invention allow the numerous contexts corresponding to a word to take part in training the neural network model, giving the model accurate punctuation addition ability; adding punctuation by the neural network model can therefore, to some extent, solve the problem that very long stretches of text receive no punctuation, and can thus improve the accuracy of punctuation addition.
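The word-vector intuition above can be made concrete with cosine similarity; the vectors here are invented toy embeddings, not trained ones:

```python
def cosine_similarity(u, v):
    """Characterize the semantic closeness of two words by the cosine
    of the angle between their word vectors: parallel vectors score 1,
    orthogonal (semantically unrelated) vectors score 0."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = lambda x: sum(a * a for a in x) ** 0.5
    return dot / (norm(u) * norm(v))

sim_close = cosine_similarity([1.0, 2.0, 0.0], [2.0, 4.0, 0.0])  # parallel
sim_far = cosine_similarity([1.0, 0.0, 0.0], [0.0, 1.0, 0.0])    # orthogonal
```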
Embodiments of the present invention can be applied to any scenario that requires punctuation addition, such as speech recognition and speech translation; it will be understood that the embodiments place no limitation on the specific application scenario.
The punctuation addition method provided by the embodiments can run in the application environment of devices such as a terminal or a server. Optionally, the terminal may include, but is not limited to: a smartphone, a tablet computer, a laptop computer, an in-vehicle computer, a desktop computer, a smart television, a wearable device, and so on. The server may be a cloud server or an ordinary server, used to provide a punctuation addition service to clients.
The punctuation addition method provided by the embodiments is applicable to the processing of languages such as Chinese, Japanese and Korean, to improve the accuracy of punctuation addition. It will be appreciated that any language that needs punctuation added falls within the scope of application of the punctuation addition method of the embodiments of the present invention.
Referring to Fig. 1, a schematic structural diagram of an example speech recognition system of the present invention is shown. It may specifically include: a speech recognition apparatus 101 and a punctuation addition apparatus 102. The speech recognition apparatus 101 and the punctuation addition apparatus 102 may each be a separate device (a server or a terminal), or may be arranged together in the same device; it will be appreciated that the embodiments place no limitation on the specific arrangement of the speech recognition apparatus 101 and the punctuation addition apparatus 102.
The speech recognition apparatus 101 can be used to convert the speech signal of a speaker into text information; specifically, it can output a speech recognition result. In practice, the speaker may be a user who talks and produces a speech signal, for example in a speech translation scenario; the speaker's speech signal can be captured through a microphone or another speech acquisition device and sent to the speech recognition apparatus 101; alternatively, the speech recognition apparatus 101 may itself have the function of receiving the speaker's speech signal.
Optionally, the speech recognition apparatus 101 can convert the speaker's speech signal into text information using speech recognition technology. Denote the speaker's speech signal by S; after a series of processing steps on S, a corresponding speech feature sequence O is obtained, denoted O = {O1, O2, ..., Oi, ..., OT}, where Oi is the i-th speech feature and T is the total number of speech features. The sentence corresponding to the speech signal S can be regarded as a word string composed of many words, denoted W = {w1, w2, ..., wn}. The process of speech recognition is to find the most probable word string W given the known speech feature sequence O, where T, i and n are positive integers.
Specifically, speech recognition is a process of model matching. In this process, a speech model can first be built according to the characteristics of human speech; by analyzing the input speech signal, the required features are extracted to build the templates needed for speech recognition. Recognizing the speech a user inputs is then a process of comparing the features of the input speech against those templates, finally determining the optimal template matching the input speech and thereby obtaining the speech recognition result. As for the specific speech recognition algorithm, training and recognition algorithms based on statistical hidden Markov models may be used, as may training and recognition algorithms based on neural networks, recognition algorithms based on dynamic time warping, and other algorithms; the embodiments place no limitation on the specific speech recognition process.
The punctuation addition apparatus 102 can be connected to the speech recognition apparatus 101, receive the speech recognition result it sends, and add punctuation to the received speech recognition result. Specifically, it can take the received speech recognition result as the text to be processed: first add punctuation to the text to be processed, to obtain a corresponding first punctuation addition result; then, when the first punctuation addition result contains a target text whose word count exceeds the word-count threshold and which contains no preset punctuation, add punctuation to the target text by a neural network model, to obtain a second punctuation addition result corresponding to the target text.
Optionally, the first punctuation addition result corresponding to the text to be processed can be edited according to the second punctuation addition result corresponding to the target text. For example, the editing may replace the target text within the first punctuation addition result with the second punctuation addition result corresponding to that target text, to obtain the final punctuation addition result corresponding to the text to be processed. Of course, the above editing of the first punctuation addition result is only an optional embodiment; in fact, the second punctuation addition result could instead be edited according to the first punctuation addition result to obtain the final punctuation addition result; or, when the first punctuation addition result consists only of the target text, the second punctuation addition result can directly serve as the final punctuation addition result corresponding to the text to be processed.
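The replacement-based editing strategy described above can be sketched as a single substitution (the example texts are invented):

```python
def merge_results(first_result, target_text, second_result):
    """Edit the first punctuation addition result by substituting the
    over-long, unpunctuated target text with the second punctuation
    addition result produced by the neural network model."""
    return first_result.replace(target_text, second_result, 1)

final = merge_results("你好，今天天气很好我们出去玩",
                      "今天天气很好我们出去玩",
                      "今天天气很好，我们出去玩。")
```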
In practice, the final punctuation addition result corresponding to the text to be processed can be output. Optionally, in a speech recognition scenario, the punctuation addition apparatus 102 can output the final punctuation addition result to the user or to the user's client; in a speech translation scenario, the punctuation addition apparatus 102 can output the final punctuation addition result to a machine translation apparatus. It will be appreciated that a person skilled in the art can determine the output mode of the final punctuation addition result according to the actual application scenario; the embodiments place no limitation on the specific output mode of the final punctuation addition result corresponding to the text to be processed.
Method embodiment
Referring to Fig. 2, a flow chart of the steps of an embodiment of a punctuation addition method of the present invention is shown; it may specifically include the following steps:
Step 201: obtain a text to be processed;
Step 202: add punctuation to the text to be processed, to obtain a corresponding first punctuation addition result;
Step 203: if the first punctuation addition result contains a target text whose word count exceeds a word-count threshold and which contains no preset punctuation, add punctuation to the target text by a neural network model, to obtain a second punctuation addition result corresponding to the target text.
In the embodiments of the present invention, the text to be processed represents a text that needs punctuation added; it may come from text or speech input by a user through a device, or from other devices. It should be noted that the text to be processed may contain one language or more than one language; for example, it may contain Chinese, or a mixture of Chinese with another language such as English; the embodiments place no limitation on the specific text to be processed.
In practice, the punctuation addition method flow of the embodiments can be executed by a client APP (application); the client application may run in a terminal, for example as any APP running in the terminal, and can obtain the text to be processed from another application of the terminal. Alternatively, the punctuation addition method flow can be executed by a functional component of a client application, which obtains the text to be processed from other functional components. Or, the punctuation addition method of the embodiments can be executed by a server.
In an optional embodiment of the present invention, step 201 can obtain the text to be processed according to a speaker's speech signal; in that case, step 201 can convert the speaker's speech signal into text information, and obtain the text to be processed from that text information. Alternatively, step 201 can directly receive from a speech recognition apparatus the text information corresponding to the user's speech signal, and obtain the text to be processed from that text information.
In practice, step 201 can obtain the text to be processed, according to the actual application requirements, from the text corresponding to a speech signal or from text input by a user. Optionally, the text to be processed can be obtained from the text corresponding to a speech signal S according to the interval times in S; for example, when an interval time in S exceeds a time threshold, a corresponding split point can be determined from that time point, the text corresponding to the portion of S before the split point is taken as a text to be processed, and the text corresponding to the portion of S after the split point is processed in turn to continue obtaining texts to be processed. It will be appreciated that the embodiments place no limitation on the specific process of obtaining the text to be processed from the text corresponding to a speech signal or from text input by a user.
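One way to realize the split-point logic is sketched below; the (text, trailing-silence) chunk format and the threshold value are assumptions, not details from the patent:

```python
def split_by_silence(chunks, time_threshold):
    """`chunks` is a list of (recognized_text, trailing_silence_seconds)
    pairs.  A silence interval exceeding the time threshold marks a
    split point; each accumulated piece becomes one text to be
    processed."""
    pending_texts, buffer = [], []
    for text, silence in chunks:
        buffer.append(text)
        if silence > time_threshold:
            pending_texts.append("".join(buffer))
            buffer = []
    if buffer:
        pending_texts.append("".join(buffer))
    return pending_texts

texts = split_by_silence([("今天", 0.1), ("天气很好", 0.8), ("出去玩", 0.2)], 0.5)
```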
In practice, step 202 can add punctuation to the text to be processed in any punctuation addition manner. For example, the silence-interval-based scheme of the existing art can be used to add punctuation to the text corresponding to the speech signal.
In an optional embodiment of the present invention, punctuation can be added to the text to be processed by a language model. In the field of natural language processing, a language model is a probabilistic model built for one or more languages, whose goal is to describe the probability distribution of a given word sequence appearing in the language. In the embodiments of the present invention, the probability of a given word sequence appearing in the language, as described by a specific language model, can be called the language model score. Optionally, corpus sentences can be obtained from a corpus and segmented into words, and the resulting word sequences can be used to train the above language model. Optionally, the given word sequences described by the language model can carry punctuation, to realize punctuation addition processing for speech recognition results.
In the embodiments of the present invention, the language model may include an N-gram language model and/or a neural network language model, where the neural network language model may further include: RNNLM (Recurrent Neural Network Language Model), CNNLM (Convolutional Neural Network Language Model), DNNLM (Deep Neural Network Language Model), and so on.
The N-gram language model is based on the assumption that the occurrence of the N-th word depends only on the preceding N-1 words and is unrelated to any other word, so the probability of a whole sentence is the product of the occurrence probabilities of its words. The embodiments of the invention add punctuation to the pending text through an N-gram language model; because the N-gram language model can give a comparatively reasonable first punctuation-addition result according to the language model scores, the accuracy of punctuation addition can be improved.
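The N-gram assumption above can be illustrated with a minimal sketch: a maximum-likelihood bigram model (N=2), in which the sentence probability is the product of per-token conditional probabilities estimated from counts. The toy corpus and tokens are invented for illustration; a real model would use smoothing.

```python
from collections import Counter

def bigram_probability(sentence, corpus):
    """Score a token sequence under a maximum-likelihood bigram model.

    Illustrates the N-gram assumption with N=2: each token depends only
    on the single preceding token, and the sentence probability is the
    product of the per-token conditional probabilities.
    """
    # Collect unigram and bigram counts from the toy corpus.
    unigrams = Counter(tok for sent in corpus for tok in sent)
    bigrams = Counter(
        (sent[i], sent[i + 1]) for sent in corpus for i in range(len(sent) - 1)
    )
    prob = 1.0
    for i in range(1, len(sentence)):
        prev, cur = sentence[i - 1], sentence[i]
        if unigrams[prev] == 0:  # unseen history: probability collapses to 0
            return 0.0
        prob *= bigrams[(prev, cur)] / unigrams[prev]
    return prob

corpus = [["hello", ",", "world"], ["hello", ",", "friend"]]
print(bigram_probability(["hello", ",", "world"], corpus))  # 0.5
```

Punctuation marks are treated as ordinary tokens here, which is what lets the same score compare alternative punctuation placements.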
In an alternative embodiment of the invention, adding punctuation to the pending text through the N-gram language model may specifically include: segmenting the pending text to obtain a first word sequence corresponding to the pending text; and adding punctuation to the first word sequence through the N-gram language model to obtain a corresponding first punctuation-addition result.
In the embodiments of the invention, various candidate punctuation marks may be added between adjacent words of the first word sequence; that is, punctuation-addition processing may be performed on the first word sequence according to the various candidate punctuation marks that can appear between its adjacent words. The first word sequence therefore corresponds to multiple punctuation-addition schemes and their corresponding first punctuation-addition results. Optionally, the language model scores of the multiple first punctuation-addition results may be determined through the N-gram language model, so that the first punctuation-addition result with the optimal language model score is finally obtained.
It should be noted that a person skilled in the art may determine the candidate punctuation marks to be added according to practical requirements. Optionally, the candidate punctuation marks may include a comma, a question mark, a full stop, an exclamation mark, a space, and the like, where a space may either serve to split words or have no effect: for English, a space may be used to split different words, while for Chinese a space may be a punctuation mark that has no effect. It will be understood that the embodiments of the invention place no limitation on the specific candidate punctuation marks.
Referring to Figure 3, a schematic diagram of a punctuation-addition process for a word sequence according to an embodiment of the invention is shown. The word sequence is "hello / I am / Xiao Ming / very glad / to meet you", so candidate punctuation may be added between each pair of adjacent words of "hello / I am / Xiao Ming / very glad / to meet you". In Figure 3, the words "hello", "I am", "Xiao Ming", "very glad" and "to meet you" are each represented by a rectangle, and the punctuation marks comma, space, exclamation mark, question mark and full stop are each represented by a circle; there can thus be multiple paths between the first word "hello" of the word sequence and the punctuation following the last word "to meet you".
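The lattice of Figure 3 can be sketched as follows: each slot between adjacent words holds one candidate mark, and every assignment of marks to slots is one path. The word tokens and the "_" symbol standing for the space/no-op mark are invented stand-ins for illustration.

```python
from itertools import product

# Candidate marks per slot; "_" stands for the space / no-op mark.
CANDIDATES = [",", "?", ".", "!", "_"]

def punctuation_paths(words):
    """Yield every candidate punctuation-addition result for a word sequence,
    i.e. every path through the Figure-3-style lattice."""
    slots = len(words) - 1
    for marks in product(CANDIDATES, repeat=slots):
        seq = []
        for i, w in enumerate(words):
            seq.append(w)
            if i < slots:
                seq.append(marks[i])
        yield seq

words = ["hello", "I-am", "Xiao-Ming"]
paths = list(punctuation_paths(words))
print(len(paths))  # 25  (5 candidates ** 2 slots)
```

The path count grows exponentially in the number of slots, which is why the dynamic-programming selection described below matters.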
In another alternative embodiment of the invention, a dynamic programming algorithm may be used to select, from the multiple global punctuation-addition paths of the pending text, the optimal global punctuation-addition path and its corresponding optimal first punctuation-addition result, where the optimal first punctuation-addition result achieves the global optimum of the language model score. Here "global" may denote the whole of the first punctuation-addition result corresponding to the pending text, so the optimal first punctuation-addition result of the embodiments of the invention can improve the accuracy of the added punctuation. Correspondingly, the process in which step 202 adds punctuation to the pending text through the N-gram language model may include:
Step A1: segmenting the pending text to obtain a first word sequence corresponding to the pending text;
Step A2: adding punctuation between adjacent words of the first word sequence to obtain global punctuation-addition paths corresponding to the first word sequence;
Step A3: obtaining, in front-to-back order and in a sliding manner, local punctuation-addition paths and their corresponding first semantic fragments from the global punctuation-addition paths, where different first semantic fragments contain the same number of character units, adjacent first semantic fragments have repeated character units, and a character unit may include a word and/or a punctuation mark;
Step A4: determining, in front-to-back order and in a recursive manner, the target punctuation corresponding to the optimal first semantic fragment, where the optimal first semantic fragment has the optimal language model score, and the language model score of a first semantic fragment is determined through the N-gram language model;
Step A5: obtaining, according to the target punctuation corresponding to each optimal first semantic fragment, the first punctuation-addition result corresponding to the pending text.
Steps A1 to A5 obtain, in front-to-back order and in a sliding manner from the global punctuation-addition paths, first semantic fragments of identical length (that is, containing the same number of character units) that overlap one another, and determine, in front-to-back order and in a recursive manner, the target punctuation corresponding to the optimal first semantic fragment. The process of obtaining the global punctuation-addition paths may refer to Figure 3, and the embodiments of the invention place no limitation on that specific process. A local punctuation-addition path may represent part of a global punctuation-addition path, and each global punctuation-addition path may correspond to first semantic fragments.
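The dynamic-programming idea of steps A1 to A5 can be sketched as follows: for each slot, only the best-scoring prefix path per ending mark is kept, so the globally optimal path is found without enumerating every combination. The local `score` function here is an invented stand-in for the N-gram score of a local fragment, not the patent's actual scoring.

```python
from itertools import product

def best_path(words, candidates, score):
    """Dynamic programming over the punctuation slots of a word sequence.

    score(prev_mark, mark, slot) stands in for the local language-model
    score of the fragment around `slot` (prev_mark is None at slot 0).
    Keeping one best prefix per ending mark avoids enumerating all
    len(candidates) ** slots combinations.
    """
    slots = len(words) - 1
    best = {m: (score(None, m, 0), [m]) for m in candidates}
    for i in range(1, slots):
        best = {
            m: max((best[pm][0] + score(pm, m, i), best[pm][1] + [m])
                   for pm in candidates)
            for m in candidates
        }
    return max(best.values())[1]

# Toy score: prefer no mark ("_") and penalise two commas in a row.
def toy_score(prev_mark, mark, slot):
    s = 0.5 if mark == "_" else 0.4
    if prev_mark == "," and mark == ",":
        s -= 1.0
    return s

print(best_path(["a", "b", "c", "d"], ["_", ","], toy_score))  # ['_', '_', '_']
```

Because the score depends only on the previous mark, the per-slot maximisation provably matches a brute-force search over all paths.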
In practical applications, the language model score of a first semantic fragment may be determined through the N-gram language model. Suppose N=5; the length of a first semantic fragment may then be 5. Supposing the initial character unit of the word sequence is numbered 1, first semantic fragments of length 5 may be obtained from the first punctuation-addition result in the numbered order 1-5, 2-6, 3-7, 4-8, and so on, and the language model score of each first semantic fragment is determined with the N-gram language model; for example, each first semantic fragment is input to the N-gram language model, which outputs the corresponding language model score. It will be understood that the shift of 1 between adjacent first semantic fragments above is only an example; in fact, a person skilled in the art may determine the shift between adjacent first semantic fragments according to practical requirements, and the shift may also be 2, 3, and so on.
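The fragment numbering above (1-5, 2-6, 3-7, ...) is a sliding window over the punctuated unit sequence, which can be sketched directly; the unit tokens below are invented stand-ins.

```python
def semantic_fragments(units, length=5, shift=1):
    """Slide a fixed-length window over a sequence of character units.

    With length 5 and shift 1 this mirrors the numbering in the text:
    units numbered from 1 yield fragments 1-5, 2-6, 3-7, ...  Adjacent
    fragments overlap, which is what lets a later fragment reuse the
    punctuation already fixed for the previous one.
    """
    return [units[i:i + length] for i in range(0, len(units) - length + 1, shift)]

units = ["hello", ",", "I-am", "_", "Xiao-Ming", ",", "glad"]
frags = semantic_fragments(units)
print(len(frags))  # 3
```

Setting `shift=2` or `shift=3` reproduces the alternative shifts mentioned in the text.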
In an alternative embodiment of the invention, the determination in step A4, in front-to-back order and in a recursive manner, of the target punctuation corresponding to the optimal first semantic fragment may specifically include:
Step A41: determining, with the N-gram language model, the language model score of the current first semantic fragment;
Step A42: selecting, according to the language model scores of the current first semantic fragments, the optimal current first semantic fragment from the multiple current first semantic fragments;
Step A43: taking the punctuation contained in the optimal current first semantic fragment as the target punctuation corresponding to the optimal current first semantic fragment;
Step A44: obtaining the next first semantic fragment according to the target punctuation corresponding to the optimal current first semantic fragment.
The current first semantic fragment may represent, in the recursive process, the first semantic fragment corresponding to a local punctuation-addition path. Suppose the current first semantic fragment is numbered k, k being a positive integer; the N-gram language model may then be used to determine the language model scores of the k-th first semantic fragments, the optimal k-th first semantic fragment with the optimal language model score is selected from the multiple k-th first semantic fragments, and the punctuation it contains is taken as the corresponding target punctuation. The (k+1)-th first semantic fragment is then obtained according to the target punctuation of the optimal k-th first semantic fragment, and the (k+1)-th first semantic fragment can reuse that target punctuation. Taking Figure 3 as an example, suppose the length of a first semantic fragment is 5 and the optimal 1st first semantic fragment is "hello / , / I am / space / Xiao Ming"; the 2nd first semantic fragment "punctuation / I am / punctuation / Xiao Ming / punctuation" can then reuse the target punctuation of the optimal 1st first semantic fragment, so that the 2nd first semantic fragment adds punctuation on the basis of ", / I am / space / Xiao Ming / punctuation"; in this way, the optimal punctuation mark following "Xiao Ming" can be selected from the multiple candidates.
In practical applications, obtaining the first punctuation-addition result corresponding to the pending text according to the target punctuation corresponding to each optimal first semantic fragment may specifically include: adding punctuation to the first word sequence, in back-to-front or front-to-back order, according to the target punctuation corresponding to each optimal first semantic fragment, to obtain the first punctuation-addition result corresponding to the pending text. That is, the target punctuation corresponding to each punctuation position (between adjacent words) of the global punctuation-addition path may be determined in a certain order, and the first punctuation-addition result corresponding to the pending text is obtained according to that target punctuation.
In summary, in the punctuation-addition process of steps A1 to A5, because adjacent first semantic fragments share repeated character units, the next first semantic fragment can reuse the target punctuation of the optimal current first semantic fragment; the recursion can therefore reduce the computation needed to obtain the optimal punctuation-addition result. Moreover, because there is a shift between adjacent first semantic fragments, the embodiments of the invention can achieve, through the optimal language model score of each first semantic fragment, the optimum of the language model scores of all first semantic fragments taken together.
Although the N-gram language model has the advantage of fast processing, it can only see the preceding N-1 words when adding punctuation and cannot know the punctuation-addition situation of the whole pending text, so a very long stretch of text without any added punctuation may occur in the first punctuation-addition result. In translation application scenarios, translation often relies on punctuation to improve translation quality; that is, a machine translation apparatus generally translates text that carries punctuation, and translating text without punctuation is prone to low translation quality. The first punctuation-addition result obtained through the N-gram language model may therefore fail to meet the demands of machine translation.
In practical applications, the first punctuation-addition result obtained in step 202 may be judged; specifically, it may be judged whether the first punctuation-addition result includes a target text whose character count exceeds a character-count threshold and which contains no preset punctuation. The preset punctuation may be determined by a person skilled in the art according to practical requirements; for example, it may be determined according to translation demands. Examples of the preset punctuation may include a comma, a question mark, a full stop, an exclamation mark, and the like; the embodiments of the invention place no limitation on the specific preset punctuation.
The character-count threshold may refer to the number of single character units included in the first punctuation-addition result. For languages such as English and German, whose words are composed of alphabetic characters, a single character unit may be a word; for languages such as Chinese, Japanese and Korean, whose words are composed of non-alphabetic characters, a single character unit may be an individual character.
The character-count threshold may be determined by a person skilled in the art according to practical requirements. For example, initially it may be a default empirical value, and later the default empirical value may be adjusted according to user feedback and/or the translation quality corresponding to the threshold. For example, if the translation quality corresponding to the current threshold TH is below a preset condition, TH may be turned down on the basis of its current value, for example to (TH-1). Optionally, TH may range from 15 to 20; it will be understood that the embodiments of the invention place no limitation on the specific character-count threshold.
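The threshold test and the feedback adjustment described above can be sketched as follows. The preset-punctuation set, the "_" no-op mark and the threshold value 18 (a value inside the 15-20 range mentioned) are illustrative assumptions.

```python
PRESET_PUNCTUATION = {",", "?", ".", "!"}

def needs_neural_pass(result_units, threshold=18):
    """Return True when some stretch of the first punctuation-addition
    result runs past `threshold` character units without any preset mark.

    `result_units` is the unit sequence produced by the N-gram pass; the
    space mark "_" does not count as preset punctuation, so it does not
    reset the run length.
    """
    run = 0
    for unit in result_units:
        if unit in PRESET_PUNCTUATION:
            run = 0
        elif unit != "_":
            run += 1
            if run > threshold:
                return True
    return False

def lower_threshold(th):
    """Feedback adjustment: poor translation quality turns TH down to TH-1."""
    return th - 1

print(needs_neural_pass(["w"] * 19, threshold=18))  # True
```

Only target texts for which `needs_neural_pass` returns True would be handed to the neural network model of step 203 below.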
Step 203 may include: in the case where the first punctuation-addition result includes a target text whose character count exceeds the character-count threshold and which contains no preset punctuation, adding punctuation to the target text through a neural network model, to obtain a second punctuation-addition result corresponding to the target text. Because a neural network model can represent a word by a word vector and characterize the semantic distance between words by the distance between word vectors, the embodiments of the invention can let the numerous contexts corresponding to a word take part in the training of the neural network model, so that the neural network model possesses accurate punctuation-addition ability. Adding punctuation to the target text through the neural network model can therefore, to a certain extent, solve the problem that a very long stretch of text in the first punctuation-addition result has no punctuation, and can thus improve the accuracy of punctuation addition.
The embodiments of the invention may provide the following technical schemes for adding punctuation to the target text through a neural network model:
Technical scheme 1
In technical scheme 1, the neural network model may be a neural network language model, and adding punctuation to the target text through the neural network model may include: segmenting the target text to obtain a corresponding second word sequence; obtaining multiple candidate punctuation-addition results corresponding to the second word sequence; determining, with the neural network language model, the language model score of each candidate punctuation-addition result; and selecting, from the multiple candidate punctuation-addition results corresponding to the second word sequence, the candidate punctuation-addition result with the optimal language model score as the second punctuation-addition result corresponding to the target text.
Relative to the N-gram language model, an advantage of a neural network language model such as an RNNLM is that it can truly and fully use the entire preceding context to predict the next word, so the RNNLM possesses the ability to describe the language model score of semantic fragments of adjustable length; that is, the RNNLM is applicable to semantic fragments of a wider length range. For example, the length range of the semantic fragments corresponding to the RNNLM may be from 1 to a second length threshold, where the second length threshold may be greater than the first length threshold. In technical scheme 1, because the RNNLM is applicable to semantic fragments of a wider length range, all semantic fragments of each candidate punctuation-addition result may be taken as a whole, and the language model score of all semantic fragments of a candidate punctuation-addition result is determined through the RNNLM; for example, all character units included in a candidate punctuation-addition result are input to the RNNLM, which outputs the corresponding language model score.
Technical scheme 2
In technical scheme 2, the neural network model may be a neural network conversion model. Technical scheme 2 converts the punctuation-addition problem into a word-to-punctuation conversion problem, in which each word of the source-end corpus is converted into a corresponding target-end punctuation mark, and handles this conversion problem through a neural network conversion model trained on a parallel corpus.
Correspondingly, the process of adding punctuation to the target text through a neural network model may include: adding punctuation to the target text through a neural network conversion model, to obtain a second punctuation-addition result corresponding to the target text; the neural network conversion model may be obtained by training on a parallel corpus, and the parallel corpus may include a source-end corpus and a target-end corpus, the target-end corpus being the punctuation corresponding to each word of the source-end corpus.
In practical applications, the parallel corpus may include a source-end corpus and a target-end corpus, where the target-end corpus may be the punctuation corresponding to each word of the source-end corpus; generally, the punctuation corresponding to a word may be the punctuation added after that word.
In practical applications, the source-end corpus may include several source-end sentences, and the target-end corpus may be the punctuation corresponding to each word of those sentences. For example, for the source-end sentence "today weather how we go out play", the target-end punctuation corresponding to its words may be "_ _ _ _ _ !", where "_" indicates that no punctuation follows the corresponding word.
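One training pair of such a parallel corpus can be sketched as two aligned sequences: the source side is the segmented word sequence and the target side is one punctuation label per word. The English tokens below are invented stand-ins for the segmented example sentence above.

```python
# Source side: one token per segmented word; target side: one label per
# word, where "_" means no punctuation follows that word.
source = ["today", "weather", "how", "we", "go-out", "play"]
target = ["_", "_", "_", "_", "_", "!"]
assert len(source) == len(target)  # the two sides are word-aligned

def apply_labels(words, labels):
    """Render a labelled pair back into punctuated text."""
    out = []
    for w, l in zip(words, labels):
        out.append(w)
        if l != "_":
            out.append(l)
    return " ".join(out)

print(apply_labels(source, target))  # today weather how we go-out play !
```

Because every word has exactly one label, the conversion problem becomes a sequence-labelling task of the same length as the input.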
In an alternative embodiment of the invention, the process of training the neural network conversion model on the parallel corpus may include: building, according to a neural network structure, a neural network conversion model from source-end words to target-end punctuation; and training on the parallel corpus with a neural network learning algorithm to obtain the model parameters of the neural network conversion model.
In an alternative embodiment of the invention, the neural network structure may include an RNN (Recurrent Neural Network), an LSTM (Long Short-Term Memory) network, a GRU (Gated Recurrent Unit), or the like. It will be appreciated that a person skilled in the art may adopt whatever neural network structure is required according to practical needs, and the embodiments of the invention place no limitation on the specific neural network structure.
Optionally, the neural network conversion model may include a mapping function from source-end words to target-end punctuation. The mapping function may be expressed in the form of a conditional probability, such as P(y|x) or p(y_j | y_{<j}, x), where x denotes the source-end information (such as the information of the target text) and y denotes the target-end information (such as the punctuation corresponding to each word of the target text); generally, the higher the accuracy of the added punctuation, the larger the conditional probability.
In practical applications, the neural network structure may include multiple neuron layers; specifically, the neuron layers may include an input layer, a hidden layer and an output layer, where the input layer receives the source-end information and distributes it to the hidden layer, the hidden layer performs the required computation and outputs the computation result to the output layer, and the output layer outputs the target-end information, namely the computation result. In an alternative embodiment of the invention, the model parameters of the neural network conversion model may include at least one of: a first connection weight W between the input layer and the hidden layer, a second connection weight U between the hidden layer and the output layer, and offset parameters of the output layer and the hidden layer. It will be understood that the embodiments of the invention place no limitation on the specific network conversion model and its corresponding model parameters.
Training on the parallel corpus takes as the maximization objective of the neural network conversion model the probability of outputting the correct punctuation information y given the source-end information x. In practical applications, a neural network learning algorithm may be used to train on the parallel corpus, and the model parameters may be optimized with an optimization method such as stochastic gradient descent; for example, the optimization may compute the gradient of the model parameters from the error of the output layer and update the model parameters according to the optimization method, thereby achieving the maximization objective of the neural network conversion model. Optionally, the neural network learning algorithm may include the BP (error back-propagation) algorithm, a genetic algorithm, and the like; it will be understood that the embodiments of the invention place no limitation on the specific neural network learning algorithm or on the specific process of training on the parallel corpus with it.
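A minimal sketch of this objective, under heavy simplification: a single softmax layer mapping a per-word feature vector to a punctuation label, trained by gradient descent on the cross-entropy (equivalently, maximizing the probability of the correct label y given x). The features, labels and data are invented; a real conversion model would use a recurrent encoder rather than hand-made features.

```python
import math

LABELS = ["_", ",", "."]

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def train(data, dim, epochs=200, lr=0.5):
    """Gradient descent on cross-entropy for a one-layer softmax model."""
    W = [[0.0] * dim for _ in LABELS]  # one weight row per label
    for _ in range(epochs):
        for x, y in data:
            p = softmax([sum(w * v for w, v in zip(row, x)) for row in W])
            for k in range(len(LABELS)):
                grad = p[k] - (1.0 if k == y else 0.0)  # dCE/dz_k
                for d in range(dim):
                    W[k][d] -= lr * grad * x[d]
    return W

def predict(W, x):
    p = softmax([sum(w * v for w, v in zip(row, x)) for row in W])
    return LABELS[max(range(len(p)), key=p.__getitem__)]

# Invented feature vector per word: [is_clause_end, is_sentence_end].
data = [([0, 0], 0), ([1, 0], 1), ([0, 1], 2), ([0, 0], 0)]
W = train(data, dim=2)
print(predict(W, [1, 0]))  # ,
```

The gradient `p_k - 1{k=y}` is exactly the output-layer error mentioned above; back-propagation extends the same update through deeper layers.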
In practical applications, the target text may be input to the trained neural network conversion model, which processes the target text and outputs the second punctuation-addition result corresponding to the target text.
In an alternative embodiment of the invention, in the above addition of punctuation to the target text through the neural network conversion model, the processing of the target text by the neural network conversion model may include:
Step S1: encoding the target text to obtain the source-end hidden states corresponding to the target text;
Step S2: decoding, according to the model parameters of the neural network conversion model, the source-end hidden states corresponding to the target text, to obtain the probability that each word of the target text belongs to each candidate punctuation mark;
Step S3: obtaining, according to the probability that each word of the target text belongs to each candidate punctuation mark, the second punctuation-addition result corresponding to the target text.
In practical applications, step S1 may first convert each word of the target text into a corresponding vocabulary vector, whose dimensionality may equal the size of the vocabulary; but because the size of the vocabulary makes the dimensionality of such vectors rather large, the vocabulary vector may be mapped into a low-dimensional semantic space in order to avoid the curse of dimensionality and to better express the semantic relations between words. Each word is then represented by a dense vector of fixed dimensionality, called a word vector, and the distance between word vectors can, to a certain extent, measure the similarity between words. Furthermore, the neural network structure may be used to compress the word sequence corresponding to the target text, to obtain a compressed representation of the whole target text, namely the source-end hidden state corresponding to the target text. Optionally, the activation function of the hidden layer of the neural network structure (such as sigmoid or tanh, the hyperbolic tangent function) may be used to compress the word sequence corresponding to the target text into the source-end hidden state; the embodiments of the invention place no limitation on the specific compression manner of the source-end hidden state corresponding to the target text.
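Step S1 can be sketched as follows: each word is looked up as a (here, hand-fixed) word vector and the sequence is compressed left to right with a tanh recurrence, so that each hidden state summarises the prefix ending at that word. Embeddings, weights and dimensions are toy values, not a trained model.

```python
import math

# Invented 2-dimensional word vectors standing in for learned embeddings.
EMBED = {"hello": [1.0, 0.0], "world": [0.0, 1.0]}

def encode(words, W=((0.5, -0.3), (0.2, 0.4)), U=((0.1, 0.0), (0.0, 0.1))):
    """Compress a word sequence with h_t = tanh(W x_t + U h_{t-1}).

    Returns one source-end hidden state per word; the last state is the
    compressed representation of the whole sequence.
    """
    h = [0.0, 0.0]
    states = []
    for w in words:
        x = EMBED[w]
        h = [
            math.tanh(sum(W[i][d] * x[d] for d in range(2))
                      + sum(U[i][d] * h[d] for d in range(2)))
            for i in range(2)
        ]
        states.append(h)
    return states

states = encode(["hello", "world"])
print(len(states))  # 2
```

Running the same recurrence over the reversed sequence and concatenating the two states per word would give the forward-plus-backward variant described next.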
In an alternative embodiment of the invention, the source-end hidden states may include forward source-end hidden states, in which case the hidden state of each word of the target text compresses only the words before it. Alternatively, the source-end hidden states may include forward source-end hidden states and backward source-end hidden states, in which case the hidden state of each word of the target text compresses not only the words before it but also the words after it; the numerous contexts corresponding to a word can thus take part in the training of the conversion model, so that the conversion model possesses accurate punctuation-addition ability.
In an embodiment of the invention, step S2 may obtain the context vector corresponding to the source end according to the source-end hidden states corresponding to the target text, determine the target-end hidden states according to the context vector, and determine, according to the target-end hidden states and the model parameters of the neural network conversion model, the probability that each word of the target text belongs to each candidate punctuation mark.
It should be noted that a person skilled in the art may determine, according to practical requirements, the candidate punctuation marks to be added between adjacent words. Optionally, the candidate punctuation marks may include a comma, a question mark, a full stop, an exclamation mark, a space, and the like, where the space "_" may either serve to split words or have no effect: for English, a space may be used to split different words, while for Chinese a space may be a punctuation mark that has no effect. It will be understood that the embodiments of the invention place no limitation on the specific candidate punctuation marks.
In an alternative embodiment of the invention, the context vector corresponding to the source end may be a fixed vector; specifically, it may be a combination of all source-end hidden states. When the context vector is a fixed vector, every source-end word contributes equally to every target-end position, which is somewhat unreasonable; for example, the source-end position consistent with a given target-end position should contribute significantly more to it. This unreasonableness matters little when the source-end sentence is comparatively short, but if the source-end sentence is comparatively long the drawback becomes obvious: it reduces the accuracy of punctuation addition and tends to increase the amount of computation.
To address the decline in accuracy brought about by a fixed source-end context vector, in an alternative embodiment of the invention a variable context vector may be used. Accordingly, adding punctuation to the target text through the neural network conversion model may further include: Step S3, determining the alignment probability between the source-end positions corresponding to the target text and the target-end positions corresponding to the punctuation-addition result.
The decoding in step S2 of the source-end hidden states corresponding to the target text, according to the model parameters of the neural network conversion model, may then include: obtaining the context vector corresponding to the source end according to the alignment probability and the source-end hidden states corresponding to the target text; determining the target-end hidden states according to the context vector; and determining, according to the target-end hidden states and the model parameters of the neural network conversion model, the probability that each word of the target text belongs to each candidate punctuation mark.
The alignment probability may characterize the degree of matching between the i-th source-end position and the j-th target-end position. Obtaining the context vector corresponding to the source end according to the alignment probability and the source-end hidden states corresponding to the target text lets the context vector increasingly focus on part of the source-end words, which can reduce the amount of computation to a certain extent and can improve the accuracy of punctuation addition.
The embodiments of the invention may provide the following manners of determining the alignment probability between the source-end positions corresponding to the target text and the target-end positions corresponding to the punctuation-addition result:
Determination manner 1: obtaining the alignment probability between the source-end positions corresponding to the target text and the target-end positions corresponding to the punctuation-addition result according to the model parameters of the neural network conversion model and the target-end hidden states; or
Determination manner 2: obtaining the alignment probability between the source-end positions corresponding to the target text and the target-end positions corresponding to the punctuation-addition result by comparing the source-end hidden states and the target-end hidden states; or
Determination manner 3: determining the aligned source-end position corresponding to each target-end position, and determining the alignment probability between each target-end position and its corresponding aligned source-end position.
Determination manner 1 may obtain the alignment probability according to the model parameters of the neural network conversion model and the target-end hidden states; specifically, the product of the first connection weight and the target-end hidden state may be input to a softmax function, which outputs the alignment probability. The softmax function is a normalization function that can map a collection of real values into the interval [0, 1] so that they sum to 1.
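The softmax normalization and the resulting attention-style context vector can be sketched together: alignment scores are normalized into probabilities that weight the source-end hidden states. The scores and states below are toy values; in the scheme above the scores would come from the model parameters and hidden states.

```python
import math

def softmax(scores):
    """Map arbitrary reals to [0, 1] so that they sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    e = [math.exp(s - m) for s in scores]
    t = sum(e)
    return [v / t for v in e]

def context_vector(source_states, align_scores):
    """Weight the source-end hidden states by the alignment probabilities,
    so the context vector focuses on the best-matching source words."""
    a = softmax(align_scores)
    dim = len(source_states[0])
    return [sum(a[i] * source_states[i][d] for i in range(len(a)))
            for d in range(dim)]

weights = softmax([2.0, 1.0, 0.1])
print(round(sum(weights), 6))  # 1.0
```

With equal scores the context vector degenerates to the plain average of the source states, i.e. the fixed-vector case criticised above.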
Determination manner 2 may compare the source-end hidden states and the target-end hidden states through an alignment function. An example of the alignment function may be the ratio between the exponential of a score function and the sum of the exponentials of the score function over the hidden states, where the score function may be a function of the source-end hidden state and the target-end hidden state; it will be understood that the embodiments of the invention place no limitation on the specific alignment function.
Determination mode 3 may generate, for the j-th target position, a corresponding aligned source position p_j, and take a window [p_j - D, p_j + D] on the source side, where D is a positive integer. The context vector can then be obtained as the weighted average of the source-side hidden states within the window; if the window extends beyond the boundary of the source sentence, it is clipped to the sentence boundary. Here p_j may be a preset value or a value estimated online; embodiments of the present invention place no limitation on the specific process for determining the aligned source position p_j.
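A sketch of the windowed context vector, under the assumption that "weighted average" means the alignment-weighted sum normalized by the total weight inside the clipped window:

```python
def context_vector(p_j, D, source_states, weights):
    # Clip the window [p_j - D, p_j + D] to the sentence boundary, then
    # take the weighted average of the source-side hidden states in it.
    lo = max(0, p_j - D)
    hi = min(len(source_states) - 1, p_j + D)
    window = source_states[lo:hi + 1]
    w = weights[lo:hi + 1]
    total = sum(w)
    dim = len(window[0])
    return [sum(wi * h[d] for wi, h in zip(w, window)) / total
            for d in range(dim)]

states = [[1.0], [2.0], [3.0], [4.0]]   # toy 1-d hidden states
uniform = [1.0, 1.0, 1.0, 1.0]
# Window clipped at the left boundary: positions 0..1 only.
assert context_vector(0, 1, states, uniform) == [1.5]
```

With an interior p_j the full window [p_j - D, p_j + D] is used unchanged.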
The process of determining the alignment probability has been described in detail above through determination modes 1 to 3. It will be understood that a person skilled in the art may, according to practical requirements, adopt any of determination modes 1 to 3, or adopt other determination modes; embodiments of the present invention place no limitation on the specific process for determining the alignment probability.
In step S3, the second punctuation addition result corresponding to the target text may be obtained from the probability, obtained in step S2, that each word in the target text belongs to each candidate punctuation mark. Specifically, for each word, the candidate punctuation mark with the highest probability may be taken as its target punctuation mark, and the second punctuation addition result corresponding to the target text can then be obtained from the target punctuation marks of the words in the target text. The punctuation addition result may be the target text after punctuation processing; for example, the punctuation addition result corresponding to the target text "hello I am Xiao Ming nice to meet you" may be "hello, I am Xiao Ming, nice to meet you." Alternatively, the punctuation addition result may simply be the target punctuation mark corresponding to each word in the target text. It will be understood that embodiments of the present invention place no limitation on the specific form of the punctuation addition result.
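The per-word argmax just described can be sketched as follows. The token list and probability tables are invented for illustration; an empty-string mark stands for "no punctuation after this word":

```python
def add_punctuation(tokens, candidate_probs):
    # For each word, take the highest-probability candidate punctuation
    # mark as its target punctuation and append it to the word.
    out = []
    for token, probs in zip(tokens, candidate_probs):
        best = max(probs, key=probs.get)
        out.append(token + best)
    return " ".join(out)

tokens = ["hello", "I", "am", "Xiao", "Ming", "nice", "to", "meet", "you"]
probs = [{",": 0.8, "": 0.2}] + [{"": 0.9, ",": 0.1}] * 3 + \
        [{",": 0.7, "": 0.3}] + [{"": 0.9, ",": 0.1}] * 3 + \
        [{".": 0.9, "": 0.1}]
print(add_punctuation(tokens, probs))
# hello, I am Xiao Ming, nice to meet you.
```

The joined string is then the second punctuation addition result for the target text.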
The process of adding punctuation to the target text by means of a neural network model has been described in detail above through technical schemes 1 and 2. It will be understood that a person skilled in the art may, according to practical requirements, adopt either of technical schemes 1 and 2, or adopt other processes for adding punctuation to the target text by means of a neural network model; for example, the source side of the neural network model may be the text to be processed, and the target side may be the text after punctuation processing. Embodiments of the present invention place no limitation on the specific process by which the neural network model adds punctuation to the target text.
In an optional embodiment of the present invention, the first punctuation addition result corresponding to the text to be processed, obtained in step 202, may be edited according to the second punctuation addition result corresponding to the target text, obtained in step 203. For example, the editing may replace the target text within the first punctuation addition result with the second punctuation addition result corresponding to that target text, so as to obtain the final punctuation addition result corresponding to the text to be processed.
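The splice step above amounts to a single in-place replacement; the strings below are invented examples:

```python
def splice_final_result(first_result, target_text, second_result):
    # Replace the over-long, unpunctuated target text inside the first
    # punctuation addition result with its repunctuated second result,
    # yielding the final punctuation addition result.
    return first_result.replace(target_text, second_result, 1)

first = "Good morning. hello I am Xiao Ming nice to meet you"
target = "hello I am Xiao Ming nice to meet you"
second = "hello, I am Xiao Ming, nice to meet you."
print(splice_final_result(first, target, second))
# Good morning. hello, I am Xiao Ming, nice to meet you.
```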
In practical applications, the final punctuation addition result corresponding to the text to be processed may be output. Optionally, in a speech recognition scenario, the final punctuation addition result may be output to a user or to a client corresponding to the user; in a speech translation scenario, it may be output to a machine translation apparatus. It will be understood that a person skilled in the art may determine the output mode of the final punctuation addition result according to the actual application scenario; embodiments of the present invention place no limitation on the specific output mode.
In summary, in the punctuation adding method of the embodiments of the present invention, when the first punctuation addition result contains a target text whose word count exceeds a word-count threshold and which contains no preset punctuation, punctuation is added to that target text by means of a neural network model, so as to obtain the corresponding second punctuation addition result. Because a neural network model can represent each word by a word vector and characterize the semantic distance between words by the distance between their word vectors, the embodiments of the present invention allow the many contexts of a word to participate in the training of the neural network model, giving the model an accurate punctuation-adding capability. Adding punctuation to the text to be processed by means of the neural network model can therefore, to a certain extent, solve the problem that a very long stretch of text in the first punctuation addition result receives no punctuation, and can thereby improve the accuracy of punctuation addition.
It should be noted that, for brevity, the method embodiments are described as a series of combined actions; however, a person skilled in the art will appreciate that embodiments of the present invention are not limited by the order of the actions described, since according to the embodiments some steps may be performed in another order or simultaneously. Furthermore, a person skilled in the art will also appreciate that the embodiments described in this specification are preferred embodiments, and that the actions involved are not necessarily required by embodiments of the present invention.
Device embodiment
Referring to Fig. 4, a structural block diagram of an embodiment of a punctuation adding apparatus of the present invention is shown; the apparatus may specifically include:
a text acquisition module 401, configured to obtain a text to be processed;
a first punctuation adding module 402, configured to add punctuation to the text to be processed, so as to obtain a first punctuation addition result corresponding to the text to be processed; and
a second punctuation adding module 403, configured to, when the first punctuation addition result contains a target text whose word count exceeds a word-count threshold and which contains no preset punctuation, add punctuation to the target text by means of a neural network model, so as to obtain a second punctuation addition result corresponding to the target text.
Optionally, the first punctuation adding module 402 adds punctuation to the text to be processed by means of an N-gram language model, and may include:
a first word segmentation submodule, configured to segment the text to be processed into words, so as to obtain a first word sequence corresponding to the text to be processed;
a first adding submodule, configured to add punctuation between adjacent words in the first word sequence, so as to obtain global punctuation-adding paths corresponding to the first word sequence;
a local information acquisition submodule, configured to obtain, in front-to-back order and in a sliding-window manner, local punctuation-adding paths and their corresponding first semantic segments from the global punctuation-adding paths; wherein different first semantic segments contain the same number of character units, adjacent first semantic segments share one repeated character unit, and a character unit may include a word and/or a punctuation mark;
a recurrence submodule, configured to determine, in front-to-back order and by recurrence, the target punctuation corresponding to the optimal first semantic segment; the optimal first semantic segment is the one whose language model score is optimal, the language model score of a first semantic segment being determined by the N-gram language model; and
a result acquisition submodule, configured to obtain the first punctuation addition result corresponding to the text to be processed according to the target punctuation corresponding to each optimal first semantic segment.
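The sliding-window carving of first semantic segments can be sketched as follows, under the assumption that each window advances by one less than its size so that adjacent segments share exactly one character unit:

```python
def sliding_segments(units, size):
    # Slide a fixed-size window over the character units (words and/or
    # punctuation marks): each segment holds `size` units, and adjacent
    # segments share exactly one repeated unit.
    segments = []
    i = 0
    while i + size <= len(units):
        segments.append(units[i:i + size])
        i += size - 1
    if i < len(units) - 1:
        segments.append(units[i:])  # shorter tail segment, if any
    return segments

print(sliding_segments(list("abcde"), 3))
# [['a', 'b', 'c'], ['c', 'd', 'e']]
```

Each segment would then be scored by the N-gram language model, with only the optimal segment's punctuation carried forward to the next window.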
Optionally, the recurrence submodule may include:
a first model score determining unit, configured to determine, using the N-gram language model, the language model score corresponding to a current first semantic segment;
a first selecting unit, configured to select the optimal current first semantic segment from multiple current first semantic segments according to their language model scores;
a target punctuation determining unit, configured to take the punctuation contained in the optimal current first semantic segment as the target punctuation corresponding to the optimal current first semantic segment; and
a semantic segment updating module, configured to obtain the next first semantic segment according to the target punctuation corresponding to the optimal current first semantic segment.
Optionally, the result acquisition submodule may include:
a target punctuation adding unit, configured to add punctuation to the first word sequence, in back-to-front or front-to-back order, according to the target punctuation corresponding to each optimal first semantic segment, so as to obtain the first punctuation addition result corresponding to the text to be processed.
Optionally, the second punctuation adding module 403 may include:
a second word segmentation submodule, configured to segment the target text into words, so as to obtain a corresponding second word sequence;
a candidate result acquisition submodule, configured to obtain multiple candidate punctuation addition results corresponding to the second word sequence;
a second model score determining unit, configured to determine, using a neural network language model, the language model score corresponding to each candidate punctuation addition result; and
a second selecting unit, configured to select, from the multiple candidate punctuation addition results corresponding to the second word sequence, the candidate punctuation addition result with the optimal language model score as the second punctuation addition result corresponding to the target text.
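A sketch of candidate enumeration and selection: one of a small set of candidate marks (or none) is placed after each word of the second word sequence, and a language-model scoring function — here a stand-in lambda, since the real neural network language model is not specified at this level — picks the best result:

```python
from itertools import product

def candidate_results(words, marks=("", ",", ".")):
    # Enumerate candidate punctuation addition results: one candidate
    # mark (possibly none) after every word of the second word sequence.
    for choice in product(marks, repeat=len(words)):
        yield " ".join(w + m for w, m in zip(words, choice))

def best_result(words, lm_score):
    # Select the candidate whose language model score is optimal.
    return max(candidate_results(words), key=lm_score)

# Toy scorer standing in for the neural network language model.
toy_lm = lambda s: 1.0 if s == "hello, there." else 0.0
print(best_result(["hello", "there"], toy_lm))  # hello, there.
```

Exhaustive enumeration grows exponentially with sequence length; a practical system would prune with beam search, but the selection criterion is the same.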
Optionally, the second punctuation adding module 403 may include:
a model processing submodule, configured to add punctuation to the target text by means of a neural network translation model, so as to obtain the second punctuation addition result corresponding to the target text; wherein the neural network translation model is trained on a parallel corpus, and the parallel corpus may include a source-side corpus and a target-side corpus, the target-side corpus being the punctuation corresponding to each word in the source-side corpus.
Optionally, the model processing submodule may include:
an encoding unit, configured to encode the target text, so as to obtain source-side hidden states corresponding to the target text;
a decoding unit, configured to decode the source-side hidden states corresponding to the target text according to the model parameters of the neural network translation model, so as to obtain the probability that each word in the target text belongs to each candidate punctuation mark; and
a result determining unit, configured to obtain the second punctuation addition result corresponding to the target text according to the probability that each word in the target text belongs to each candidate punctuation mark.
Since the device embodiments are substantially similar to the method embodiments, their description is relatively brief; for relevant details, refer to the corresponding parts of the method embodiments.
The embodiments in this specification are described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and the identical or similar parts of the embodiments may be referred to one another.
The specific manner in which the modules of the apparatus in the above embodiments perform operations has been described in detail in the embodiments of the method, and will not be elaborated here.
An embodiment of the present invention further provides a punctuation adding apparatus, comprising a memory and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by one or more processors, the one or more programs containing instructions for: obtaining a text to be processed; adding punctuation to the text to be processed, so as to obtain a first punctuation addition result corresponding to the text to be processed; and, if the first punctuation addition result contains a target text whose word count exceeds a word-count threshold and which contains no preset punctuation, adding punctuation to the target text by means of a neural network model, so as to obtain a second punctuation addition result corresponding to the target text.
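The two-pass pipeline these instructions describe can be sketched end to end. The threshold value, punctuation set, and the two model stand-ins are assumptions for illustration only — the patent leaves all of them open:

```python
import re

PUNCTUATION = ",."   # assumed candidate punctuation marks
WORD_THRESHOLD = 10  # assumed word-count threshold

def add_punctuation_pipeline(pending_text, first_pass, second_pass):
    # First pass (e.g. an N-gram language model) punctuates the whole text.
    first_result = first_pass(pending_text)
    pieces = []
    # Any punctuation-free stretch longer than the threshold is a target
    # text and is repunctuated by the second pass (the neural model).
    for piece in re.split("([" + re.escape(PUNCTUATION) + "])", first_result):
        if len(piece.split()) > WORD_THRESHOLD:
            piece = second_pass(piece)
        pieces.append(piece)
    return "".join(pieces)

# Toy model stand-ins: the first pass adds nothing, the second inserts
# one comma, so only over-long unpunctuated stretches are touched.
first = lambda t: t
second = lambda s: s.replace(" nice", ", nice")
text = "hello I am Xiao Ming nice to meet you and happy to talk today"
print(add_punctuation_pipeline(text, first, second))
# hello I am Xiao Ming, nice to meet you and happy to talk today
```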
Fig. 5 is a block diagram of an apparatus for punctuation addition implemented as a terminal, according to an exemplary embodiment. For example, the terminal 900 may be a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, a fitness device, a personal digital assistant, and the like.
Referring to Fig. 5, the terminal 900 may include one or more of the following components: a processing component 902, a memory 904, a power component 906, a multimedia component 908, an audio component 910, an input/output (I/O) interface 912, a sensor component 914, and a communication component 916.
The processing component 902 typically controls the overall operation of the terminal 900, such as operations associated with display, telephone calls, data communication, camera operation, and recording. The processing component 902 may include one or more processors 920 to execute instructions so as to complete all or part of the steps of the above method. In addition, the processing component 902 may include one or more modules to facilitate interaction between the processing component 902 and the other components; for example, the processing component 902 may include a multimedia module to facilitate interaction between the multimedia component 908 and the processing component 902.
The memory 904 is configured to store various types of data to support operation at the terminal 900. Examples of such data include instructions for any application or method operated on the terminal 900, contact data, phonebook data, messages, pictures, videos, and the like. The memory 904 may be implemented by any type of volatile or non-volatile storage device, or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, a magnetic disk, or an optical disk.
The power component 906 provides power to the various components of the terminal 900. The power component 906 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the terminal 900.
The multimedia component 908 includes a screen providing an output interface between the terminal 900 and the user. In some embodiments, the screen may include a liquid crystal display (LCD) and a touch panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive input signals from the user. The touch panel includes one or more touch sensors to sense touches, swipes, and gestures on the touch panel. The touch sensors may sense not only the boundary of a touch or swipe action, but also the duration and pressure associated with the touch or swipe. In some embodiments, the multimedia component 908 includes a front camera and/or a rear camera. When the terminal 900 is in an operating mode, such as a photographing mode or a video mode, the front camera and/or rear camera can receive external multimedia data. Each front or rear camera may be a fixed optical lens system or have focusing and optical zoom capability.
The audio component 910 is configured to output and/or input audio signals. For example, the audio component 910 includes a microphone (MIC); when the terminal 900 is in an operating mode, such as a call mode, a recording mode, or a speech recognition mode, the microphone is configured to receive external audio signals. The received audio signals may be further stored in the memory 904 or transmitted via the communication component 916. In some embodiments, the audio component 910 also includes a speaker for outputting audio signals.
The I/O interface 912 provides an interface between the processing component 902 and peripheral interface modules, which may be keyboards, click wheels, buttons, and the like. These buttons may include, but are not limited to, a home button, volume buttons, a start button, and a lock button.
The sensor component 914 includes one or more sensors to provide status assessments of various aspects of the terminal 900. For example, the sensor component 914 can detect the open/closed state of the terminal 900 and the relative positioning of components (for example, the display and keypad of the terminal 900); the sensor component 914 can also detect a change in position of the terminal 900 or of a component of the terminal 900, the presence or absence of user contact with the terminal 900, the orientation or acceleration/deceleration of the terminal 900, and a change in temperature of the terminal 900. The sensor component 914 may include a proximity sensor configured to detect the presence of nearby objects without any physical contact. The sensor component 914 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor component 914 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 916 is configured to facilitate wired or wireless communication between the terminal 900 and other devices. The terminal 900 can access a wireless network based on a communication standard, such as WiFi, 2G, or 3G, or a combination thereof. In an exemplary embodiment, the communication component 916 receives broadcast signals or broadcast-related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 916 also includes a near field communication (NFC) module to facilitate short-range communication. For example, the NFC module may be implemented based on radio frequency identification (RFID) technology, Infrared Data Association (IrDA) technology, ultra-wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the terminal 900 may be implemented by one or more application-specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field-programmable gate arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic components, for performing the above method.
In an exemplary embodiment, there is also provided a non-transitory computer-readable storage medium including instructions, such as the memory 904 including instructions executable by the processor 920 of the terminal 900 to perform the above method. For example, the non-transitory computer-readable storage medium may be a ROM, a random access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.
Fig. 6 is a block diagram of an apparatus for punctuation addition implemented as a server, according to an exemplary embodiment. The server 1900 may vary considerably in configuration or performance, and may include one or more central processing units (CPUs) 1922 (for example, one or more processors), a memory 1932, and one or more storage media 1930 (for example, one or more mass storage devices) storing application programs 1942 or data 1944. The memory 1932 and the storage medium 1930 may provide transient or persistent storage. The programs stored on the storage medium 1930 may include one or more modules (not shown in the figure), each of which may include a series of instruction operations on the server. Further, the central processing unit 1922 may be configured to communicate with the storage medium 1930 and to execute, on the server 1900, the series of instruction operations in the storage medium 1930.
The server 1900 may also include one or more power supplies 1926, one or more wired or wireless network interfaces 1950, one or more input/output interfaces 1958, one or more keyboards 1956, and/or one or more operating systems 1941, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, and the like.
In an exemplary embodiment, there is also provided a non-transitory computer-readable storage medium including instructions, such as the memory 1932 including instructions executable by the processor of the server 1900 to perform the above method. For example, the non-transitory computer-readable storage medium may be a ROM, a random access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.
A non-transitory computer-readable storage medium is provided, wherein, when the instructions in the storage medium are executed by the processor of a device (a terminal or a server), the device is enabled to perform a punctuation adding method, the method comprising: obtaining a text to be processed; adding punctuation to the text to be processed, so as to obtain a first punctuation addition result corresponding to the text to be processed; and, if the first punctuation addition result contains a target text whose word count exceeds a word-count threshold and which contains no preset punctuation, adding punctuation to the target text by means of a neural network model, so as to obtain a second punctuation addition result corresponding to the target text.
Optionally, adding punctuation to the target text by means of a neural network model includes: segmenting the target text into words, so as to obtain a corresponding second word sequence; obtaining multiple candidate punctuation addition results corresponding to the second word sequence; determining, using a neural network language model, the language model score corresponding to each candidate punctuation addition result; and selecting, from the multiple candidate punctuation addition results corresponding to the second word sequence, the candidate punctuation addition result with the optimal language model score as the second punctuation addition result corresponding to the target text.
Optionally, adding punctuation to the target text by means of a neural network model includes: adding punctuation to the target text by means of a neural network translation model, so as to obtain the second punctuation addition result corresponding to the target text; wherein the neural network translation model is trained on a parallel corpus, the parallel corpus including a source-side corpus and a target-side corpus, the target-side corpus being the punctuation corresponding to each word in the source-side corpus.
Optionally, adding punctuation to the target text by means of a neural network translation model includes: encoding the target text, so as to obtain source-side hidden states corresponding to the target text; decoding the source-side hidden states corresponding to the target text according to the model parameters of the neural network translation model, so as to obtain the probability that each word in the target text belongs to each candidate punctuation mark; and obtaining the second punctuation addition result corresponding to the target text according to the probability that each word in the target text belongs to each candidate punctuation mark.
Optionally, adding punctuation to the text to be processed includes: adding punctuation to the text to be processed by means of an N-gram language model.
Optionally, adding punctuation to the text to be processed by means of an N-gram language model includes: segmenting the text to be processed into words, so as to obtain a first word sequence corresponding to the text to be processed; adding punctuation between adjacent words in the first word sequence, so as to obtain global punctuation-adding paths corresponding to the first word sequence; obtaining, in front-to-back order and in a sliding-window manner, local punctuation-adding paths and their corresponding first semantic segments from the global punctuation-adding paths, wherein different first semantic segments contain the same number of character units, adjacent first semantic segments share a repeated character unit, and a character unit includes a word and/or a punctuation mark; determining, in front-to-back order and by recurrence, the target punctuation corresponding to the optimal first semantic segment, the optimal first semantic segment being the one whose language model score, as determined by the N-gram language model, is optimal; and obtaining the first punctuation addition result corresponding to the text to be processed according to the target punctuation corresponding to each optimal first semantic segment.
Optionally, determining, in front-to-back order and by recurrence, the target punctuation corresponding to the optimal first semantic segment includes: determining, using the N-gram language model, the language model score corresponding to a current first semantic segment; selecting the optimal current first semantic segment from multiple current first semantic segments according to their language model scores; taking the punctuation contained in the optimal current first semantic segment as the target punctuation corresponding to the optimal current first semantic segment; and obtaining the next first semantic segment according to the target punctuation corresponding to the optimal current first semantic segment.
Other embodiments of the invention will readily occur to those skilled in the art upon consideration of the specification and practice of the invention disclosed herein. This application is intended to cover any variations, uses, or adaptations of the invention that follow its general principles and include such departures from the present disclosure as come within known or customary practice in the art. The specification and examples are to be considered exemplary only, with the true scope and spirit of the invention being indicated by the following claims.
It should be understood that the invention is not limited to the precise constructions described above and shown in the accompanying drawings, and that various modifications and changes may be made without departing from its scope. The scope of the invention is limited only by the appended claims. The foregoing are merely preferred embodiments of the present invention and are not intended to limit it; any modification, equivalent replacement, improvement, and the like made within the spirit and principles of the present invention shall fall within its scope of protection.
A punctuation adding method, a punctuation adding apparatus, and a device for punctuation addition provided by the present invention have been described in detail above. Specific examples have been used herein to illustrate the principles and implementations of the invention, and the descriptions of the above embodiments are only intended to help in understanding the method of the invention and its core idea. Meanwhile, a person of ordinary skill in the art may, according to the idea of the invention, make changes to the specific implementations and the scope of application. In summary, the content of this specification should not be construed as limiting the invention.
Claims (10)
1. A punctuation adding method, characterized by comprising:
obtaining a text to be processed;
adding punctuation to the text to be processed, so as to obtain a first punctuation addition result corresponding to the text to be processed; and
if the first punctuation addition result contains a target text whose word count exceeds a word-count threshold and which contains no preset punctuation, adding punctuation to the target text by means of a neural network model, so as to obtain a second punctuation addition result corresponding to the target text.
2. The method according to claim 1, characterized in that adding punctuation to the target text by means of a neural network model comprises:
segmenting the target text into words, so as to obtain a corresponding second word sequence;
obtaining multiple candidate punctuation addition results corresponding to the second word sequence;
determining, using a neural network language model, the language model score corresponding to each candidate punctuation addition result; and
selecting, from the multiple candidate punctuation addition results corresponding to the second word sequence, the candidate punctuation addition result with the optimal language model score as the second punctuation addition result corresponding to the target text.
3. The method according to claim 1, characterized in that adding punctuation to the target text by means of a neural network model comprises:
adding punctuation to the target text by means of a neural network translation model, so as to obtain the second punctuation addition result corresponding to the target text; wherein the neural network translation model is trained on a parallel corpus, the parallel corpus comprising a source-side corpus and a target-side corpus, the target-side corpus being the punctuation corresponding to each word in the source-side corpus.
4. The method according to claim 3, characterized in that adding punctuation to the target text by means of a neural network translation model comprises:
encoding the target text, so as to obtain source-side hidden states corresponding to the target text;
decoding the source-side hidden states corresponding to the target text according to the model parameters of the neural network translation model, so as to obtain the probability that each word in the target text belongs to each candidate punctuation mark; and
obtaining the second punctuation addition result corresponding to the target text according to the probability that each word in the target text belongs to each candidate punctuation mark.
5. The method according to any one of claims 1 to 4, characterized in that adding punctuation to the text to be processed comprises: adding punctuation to the text to be processed by means of an N-gram language model.
6. The method according to claim 5, wherein adding punctuation to the text to be processed by the N-gram language model comprises:
performing word segmentation on the text to be processed, to obtain a first word sequence corresponding to the text to be processed;
adding punctuation between adjacent words in the first word sequence, to obtain global punctuation addition paths corresponding to the first word sequence;
obtaining, in sequential order and in a sliding-window manner, local punctuation addition paths and their corresponding first semantic segments from the global punctuation addition paths; wherein different first semantic segments contain the same number of character units, adjacent first semantic segments share repeated character units, and a character unit comprises a word and/or a punctuation mark;
determining, in sequential order and by recursion, the target punctuation corresponding to the optimal first semantic segment; wherein the optimal first semantic segment is the first semantic segment whose language model score, determined by the N-gram language model, is optimal;
obtaining a first punctuation addition result corresponding to the text to be processed according to the target punctuation corresponding to each optimal first semantic segment.
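The global punctuation addition paths of claim 6 can be illustrated with a toy bigram model. This sketch enumerates every global path exhaustively for clarity, whereas the claim scores only local segments inside a sliding window; the bigram scores and candidate set are invented for illustration, not from the patent.

```python
from itertools import product

# Toy bigram log-scores, invented for illustration; a real system would
# estimate the N-gram model from a punctuated training corpus.
BIGRAM = {("today", ","): -0.5, (",", "rain"): -0.7, ("today", "rain"): -2.0}
UNSEEN = -3.0

def lm_score(units):
    # Bigram language-model score of a sequence of character units,
    # i.e. words and punctuation marks.
    return sum(BIGRAM.get(pair, UNSEEN) for pair in zip(units, units[1:]))

def global_paths(words, candidates=("", ",", ".")):
    # Add each candidate punctuation mark (or none) between every pair of
    # adjacent words, yielding each global punctuation addition path as a
    # sequence of character units.
    for marks in product(candidates, repeat=len(words) - 1):
        path = [words[0]]
        for mark, word in zip(marks, words[1:]):
            if mark:
                path.append(mark)
            path.append(word)
        yield path

def best_path(words):
    # Exhaustive search over all global paths; the claimed method instead
    # slides a fixed-width window so only local segments are compared.
    return max(global_paths(words), key=lm_score)

print(best_path(["today", "rain"]))  # ['today', ',', 'rain']
```

Exhaustive enumeration grows exponentially in the number of word gaps, which is precisely why the claim restricts scoring to fixed-size overlapping semantic segments.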
7. The method according to claim 6, wherein determining, in sequential order and by recursion, the target punctuation corresponding to the optimal first semantic segment comprises:
determining, by the N-gram language model, the language model score corresponding to each current first semantic segment;
selecting the optimal current first semantic segment from the multiple current first semantic segments according to their language model scores;
taking the punctuation contained in the optimal current first semantic segment as the target punctuation corresponding to the optimal current first semantic segment;
obtaining the next first semantic segment according to the target punctuation corresponding to the optimal current first semantic segment.
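The recursion of claim 7 — score the current candidate segments, commit the punctuation of the best one, then extend it to form the next segments — can be sketched greedily. Everything here is an invented stand-in: the scores, the `"O"` empty-slot label (used so that all candidate sequences have the same length and comparable scores), and the `</s>` end marker are illustration devices, not part of the patent.

```python
# Toy bigram log-scores, invented for illustration. "O" marks an empty
# slot (no punctuation between two words); "</s>" ends the sequence.
SCORES = {("i", "O"): -0.2, ("O", "think"): -0.1, ("think", ","): -0.4,
          (",", "so"): -0.3, ("think", "O"): -1.5, ("O", "so"): -0.1,
          ("so", "."): -0.2, ("so", "O"): -2.0, ("O", "</s>"): -0.5,
          (".", "</s>"): -0.1}
UNSEEN = -3.0

def lm_score(units):
    return sum(SCORES.get(p, UNSEEN) for p in zip(units, units[1:]))

def greedy_punctuate(words, labels=("O", ",", ".")):
    # Recursion step: build the candidate extensions of the committed
    # sequence, keep the one with the best language-model score (thereby
    # committing its punctuation), then repeat with the next word.
    units = [words[0]]
    for word in words[1:] + ["</s>"]:
        units = max((units + [label, word] for label in labels), key=lm_score)
    return units

def render(units):
    # Drop the bookkeeping labels and attach punctuation to the text.
    out = ""
    for u in units:
        if u in ("O", "</s>"):
            continue
        out += u if u in (",", ".") else ((" " if out else "") + u)
    return out

print(render(greedy_punctuate(["i", "think", "so"])))  # i think, so.
```

A greedy commit at each step keeps the search linear in the sentence length; the claimed sliding-window recursion is the same idea with window-sized segments rather than single-slot extensions.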
8. A punctuation adding apparatus, comprising:
a text acquisition module, configured to obtain text to be processed;
a first punctuation add module, configured to add punctuation to the text to be processed, to obtain a first punctuation addition result corresponding to the text to be processed; and
a second punctuation add module, configured to, when the first punctuation addition result contains a target text whose word count exceeds a word-count threshold and which contains no preset punctuation, add punctuation to the target text by a neural network model, to obtain a second punctuation addition result corresponding to the target text.
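The two-stage control flow of the apparatus — a first pass adds punctuation, then any remaining long, punctuation-free "target text" is handed to the neural model — can be sketched as below. `first_pass` and `neural_pass` are trivial stand-ins for the two add modules, and `WORD_THRESHOLD` is an invented value; the claims only require that some word-count threshold exist.

```python
import re

WORD_THRESHOLD = 4   # invented value; the claims name only "a word-count threshold"
PUNCTUATION = ",.?!;"

def first_pass(text):
    # Stand-in for the first punctuation add module (per claim 5, e.g. an
    # N-gram language model); here it simply returns its input unchanged.
    return text

def neural_pass(segment):
    # Stand-in for the neural network model of the second add module; it
    # inserts a comma at the midpoint only so the control flow is visible.
    words = segment.split()
    mid = len(words) // 2
    return " ".join(words[:mid]) + ", " + " ".join(words[mid:])

def add_punctuation(text):
    result = first_pass(text)
    out = []
    # Split the first-pass result at existing punctuation (keeping each
    # mark attached to its segment); a segment is then punctuation-free
    # by construction except for its trailing mark.
    for piece in re.split(r"(?<=[%s])\s*" % re.escape(PUNCTUATION), result):
        body = piece.rstrip(PUNCTUATION)
        # A "target text": over the threshold and containing no punctuation.
        if len(body.split()) > WORD_THRESHOLD:
            piece = neural_pass(body) + piece[len(body):]
        out.append(piece)
    return " ".join(out).strip()

print(add_punctuation("we met at noon. the weather was fine so we walked home"))
```

The design point the claims make is cost control: the cheap first pass handles most of the text, and the more expensive neural model runs only on the segments the first pass left both long and unpunctuated.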
9. An apparatus for adding punctuation, comprising a memory and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by one or more processors, the one or more programs containing instructions for performing the following operations:
obtaining text to be processed;
adding punctuation to the text to be processed, to obtain a first punctuation addition result corresponding to the text to be processed;
if the first punctuation addition result contains a target text whose word count exceeds a word-count threshold and which contains no preset punctuation, adding punctuation to the target text by a neural network model, to obtain a second punctuation addition result corresponding to the target text.
10. A machine-readable medium having instructions stored thereon which, when executed by one or more processors, cause an apparatus to perform the punctuation adding method according to any one of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710396130.5A CN107291690B (en) | 2017-05-26 | 2017-05-26 | Punctuation adding method and device and punctuation adding device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN107291690A true CN107291690A (en) | 2017-10-24 |
CN107291690B CN107291690B (en) | 2020-10-27 |
Family
ID=60094233
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710396130.5A Active CN107291690B (en) | 2017-05-26 | 2017-05-26 | Punctuation adding method and device and punctuation adding device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107291690B (en) |
Patent Citations (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101201818A (en) * | 2006-12-13 | 2008-06-18 | 李萍 | Method for calculating language structure and performing word segmentation, machine translation and speech recognition using an HMM |
CN101593518A (en) * | 2008-05-28 | 2009-12-02 | 中国科学院自动化研究所 | Method for balancing a real-scene corpus and a finite-state network corpus |
WO2014140541A3 (en) * | 2013-03-15 | 2015-03-19 | Google Inc. | Signal processing systems |
CN103544406A (en) * | 2013-11-08 | 2014-01-29 | 电子科技大学 | Method for detecting DNA sequence similarity using a one-dimensional cellular neural network |
CN104022978A (en) * | 2014-06-18 | 2014-09-03 | 中国联合网络通信集团有限公司 | Semi-blind channel estimation method and system |
CN104391963A (en) * | 2014-12-01 | 2015-03-04 | 北京中科创益科技有限公司 | Method for constructing correlation networks of keywords of natural language texts |
CN104765769A (en) * | 2015-03-06 | 2015-07-08 | 大连理工大学 | Short-text query expansion and indexing method based on word vectors |
US20170032280A1 (en) * | 2015-07-27 | 2017-02-02 | Salesforce.Com, Inc. | Engagement estimator |
CN105512692A (en) * | 2015-11-30 | 2016-04-20 | 华南理工大学 | BLSTM-based online handwritten mathematical expression symbol recognition method |
CN106257441A (en) * | 2016-06-30 | 2016-12-28 | 电子科技大学 | Training method for a word-frequency-based skip language model |
Cited By (28)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107766325B (en) * | 2017-09-27 | 2021-05-28 | 百度在线网络技术(北京)有限公司 | Text splicing method and device |
CN107766325A (en) * | 2017-09-27 | 2018-03-06 | 百度在线网络技术(北京)有限公司 | Text splicing method and device |
CN109979435B (en) * | 2017-12-28 | 2021-10-22 | 北京搜狗科技发展有限公司 | Data processing method and device for data processing |
CN109979435A (en) * | 2017-12-28 | 2019-07-05 | 北京搜狗科技发展有限公司 | Data processing method and device, and device for data processing |
CN108597517A (en) * | 2018-03-08 | 2018-09-28 | 深圳市声扬科技有限公司 | Punctuation mark adding method, device, computer equipment and storage medium |
CN108597517B (en) * | 2018-03-08 | 2020-06-05 | 深圳市声扬科技有限公司 | Punctuation mark adding method and device, computer equipment and storage medium |
CN108564953A (en) * | 2018-04-20 | 2018-09-21 | 科大讯飞股份有限公司 | Punctuation processing method and device for speech recognition text |
CN109410949A (en) * | 2018-10-11 | 2019-03-01 | 厦门大学 | Method for adding punctuation to text content based on a weighted finite-state transducer |
CN109410949B (en) * | 2018-10-11 | 2021-11-16 | 厦门大学 | Text content punctuation adding method based on a weighted finite-state transducer |
CN109255115B (en) * | 2018-10-19 | 2023-04-07 | 科大讯飞股份有限公司 | Text punctuation adjustment method and device |
CN109255115A (en) * | 2018-10-19 | 2019-01-22 | 科大讯飞股份有限公司 | Text punctuation adjustment method and device |
CN109614627B (en) * | 2019-01-04 | 2023-01-20 | 平安科技(深圳)有限公司 | Text punctuation prediction method and device, computer equipment and storage medium |
CN109614627A (en) * | 2019-01-04 | 2019-04-12 | 平安科技(深圳)有限公司 | Text punctuation prediction method, device, computer equipment and storage medium |
CN109817210A (en) * | 2019-02-12 | 2019-05-28 | 百度在线网络技术(北京)有限公司 | Voice writing method, device, terminal and storage medium |
CN109918666A (en) * | 2019-03-06 | 2019-06-21 | 北京工商大学 | Neural-network-based Chinese punctuation mark adding method |
CN109918666B (en) * | 2019-03-06 | 2024-03-15 | 北京工商大学 | Chinese punctuation mark adding method based on neural network |
CN111797632A (en) * | 2019-04-04 | 2020-10-20 | 北京猎户星空科技有限公司 | Information processing method and device and electronic equipment |
CN111785259A (en) * | 2019-04-04 | 2020-10-16 | 北京猎户星空科技有限公司 | Information processing method and device and electronic equipment |
CN111797632B (en) * | 2019-04-04 | 2023-10-27 | 北京猎户星空科技有限公司 | Information processing method and device and electronic equipment |
CN112036174B (en) * | 2019-05-15 | 2023-11-07 | 南京大学 | Punctuation marking method and device |
CN112036174A (en) * | 2019-05-15 | 2020-12-04 | 南京大学 | Punctuation marking method and device |
CN110445922A (en) * | 2019-07-30 | 2019-11-12 | 惠州Tcl移动通信有限公司 | Mobile terminal contact sharing method, device and storage medium |
CN111261162A (en) * | 2020-03-09 | 2020-06-09 | 北京达佳互联信息技术有限公司 | Speech recognition method, speech recognition apparatus, and storage medium |
CN111261162B (en) * | 2020-03-09 | 2023-04-18 | 北京达佳互联信息技术有限公司 | Speech recognition method, speech recognition apparatus, and storage medium |
CN111581911B (en) * | 2020-04-23 | 2022-02-15 | 北京中科智加科技有限公司 | Method for automatically adding punctuation to real-time text, model construction method and device |
CN111581911A (en) * | 2020-04-23 | 2020-08-25 | 北京中科智加科技有限公司 | Method for automatically adding punctuation to real-time text, model construction method and device |
CN113378541B (en) * | 2021-05-21 | 2023-07-07 | 标贝(北京)科技有限公司 | Text punctuation prediction method, device, system and storage medium |
CN113378541A (en) * | 2021-05-21 | 2021-09-10 | 标贝(北京)科技有限公司 | Text punctuation prediction method, device, system and storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN107291690B (en) | 2020-10-27 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107291690A (en) | Punctuation adding method and device, and device for adding punctuation | |
CN110288077B (en) | Method and related device for synthesizing speaking expression based on artificial intelligence | |
CN107221330A (en) | Punctuation adding method and device, and device for adding punctuation | |
CN107578771B (en) | Voice recognition method and device, storage medium and electronic equipment | |
WO2021077529A1 (en) | Neural network model compressing method, corpus translation method and device thereof | |
CN107632980A (en) | Speech translation method and device, and device for speech translation | |
CN111261144B (en) | Voice recognition method, device, terminal and storage medium | |
CN107301865A (en) | Method and apparatus for determining interaction text in voice input | |
CN107291704A (en) | Processing method and apparatus, and device for processing | |
CN111583944A (en) | Sound changing method and device | |
CN111508511A (en) | Real-time sound changing method and device | |
CN108628813A (en) | Processing method and apparatus, and device for processing | |
CN107274903A (en) | Text processing method and device, and device for text processing | |
CN109145213A (en) | Query recommendation method and device based on historical information | |
CN107564526A (en) | Processing method, device and machine-readable medium | |
CN108399914A (en) | Speech recognition method and apparatus | |
CN110210310A (en) | Video processing method and device, and device for video processing | |
CN105531758A (en) | Speech recognition using foreign word grammar | |
CN107316635B (en) | Voice recognition method and device, storage medium and electronic equipment | |
CN108073573A (en) | Machine translation method and device, and machine translation system training method and device | |
CN108345612A (en) | Question processing method and device, and device for question processing | |
CN108008832A (en) | Input method and device, and device for input | |
CN108628819A (en) | Processing method and apparatus, and device for processing | |
CN108073572A (en) | Information processing method and device, and simultaneous interpretation system | |
CN109144285A (en) | Input method and device | |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||