CN107291690A - Punctuation adding method and device, and device for punctuation addition - Google Patents

Punctuation adding method and device, and device for punctuation addition Download PDF

Info

Publication number
CN107291690A
CN107291690A
Authority
CN
China
Prior art keywords
punctuation
text
target text
target
to-be-processed
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201710396130.5A
Other languages
Chinese (zh)
Other versions
CN107291690B (en)
Inventor
姜里羊
王宇光
陈伟
郑宏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Sogou Technology Development Co Ltd
Original Assignee
Beijing Sogou Technology Development Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Sogou Technology Development Co Ltd filed Critical Beijing Sogou Technology Development Co Ltd
Priority to CN201710396130.5A priority Critical patent/CN107291690B/en
Publication of CN107291690A publication Critical patent/CN107291690A/en
Application granted granted Critical
Publication of CN107291690B publication Critical patent/CN107291690B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 - Handling natural language data
    • G06F40/20 - Natural language analysis
    • G06F40/279 - Recognition of textual entities
    • G06F40/289 - Phrasal analysis, e.g. finite state techniques or chunking
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 - Handling natural language data
    • G06F40/30 - Semantic analysis
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods
    • G06N3/084 - Backpropagation, e.g. using gradient descent
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods
    • G06N3/086 - Learning methods using evolutionary algorithms, e.g. genetic algorithms or genetic programming
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 - Speech recognition
    • G10L15/26 - Speech to text systems

Abstract

Embodiments of the invention provide a punctuation adding method and device, and a device for punctuation addition. The method specifically includes: obtaining a to-be-processed text; adding punctuation to the to-be-processed text, to obtain a first punctuation addition result corresponding to the to-be-processed text; and, if the first punctuation addition result includes a target text whose word count exceeds a word-count threshold and which contains no preset punctuation, adding punctuation to the target text through a neural network model, to obtain a second punctuation addition result corresponding to the target text. Embodiments of the present invention can improve the accuracy of punctuation addition.

Description

Punctuation adding method and device, and device for punctuation addition
Technical field
The present invention relates to the technical field of information processing, and in particular to a punctuation adding method and device, and a device for punctuation addition.
Background
In information-processing fields such as communications and the Internet, some application scenarios require adding punctuation to text that lacks it; for example, for ease of reading, punctuation is added to the text corresponding to a speech recognition result.
Existing schemes can add punctuation to the text corresponding to a speech recognition result according to the silent intervals in the speech signal. Specifically, a threshold on silence length can be set first; if the length of a silent interval while the speaker is talking exceeds the threshold, punctuation is added at the corresponding position; conversely, if the length of the silent interval does not exceed the threshold, no punctuation is added.
However, the inventors found, in the course of realizing the embodiments of the present invention, that different speakers often have different speaking rates. Thus, adding punctuation to the text of a speech recognition result according to the silent intervals of the speech signal, as in existing schemes, affects the accuracy of punctuation addition. For example, if a speaker talks too fast, there are no pauses between sentences, or the pauses are too short to exceed the threshold, and no punctuation at all will be added to the text.
Summary of the Invention
In view of the above problems, embodiments of the present invention are proposed to provide a punctuation adding method, a punctuation adding device, and a device for punctuation addition that overcome, or at least partially solve, the above problems. Embodiments of the present invention can improve the accuracy of punctuation addition.
To solve the above problems, the invention discloses a punctuation adding method, including: obtaining a to-be-processed text; adding punctuation to the to-be-processed text, to obtain a first punctuation addition result corresponding to the to-be-processed text; and, if the first punctuation addition result includes a target text whose word count exceeds a word-count threshold and which contains no preset punctuation, adding punctuation to the target text through a neural network model, to obtain a second punctuation addition result corresponding to the target text.
Optionally, adding punctuation to the target text through a neural network model includes: segmenting the target text into words, to obtain a corresponding second word sequence; obtaining multiple candidate punctuation addition results corresponding to the second word sequence; determining, with a neural network language model, the language model score corresponding to each candidate punctuation addition result; and selecting, from the multiple candidate punctuation addition results corresponding to the second word sequence, the candidate punctuation addition result with the best language model score as the second punctuation addition result corresponding to the target text.
Optionally, adding punctuation to the target text through a neural network model includes: adding punctuation to the target text through a neural network transduction model, to obtain the second punctuation addition result corresponding to the target text; the neural network transduction model is obtained by training on a parallel corpus, and the parallel corpus includes a source-side corpus and a target-side corpus, where the target-side corpus is the punctuation corresponding to each word in the source-side corpus.
Optionally, adding punctuation to the target text through the neural network transduction model includes: encoding the target text, to obtain source-side hidden states corresponding to the target text; decoding the source-side hidden states corresponding to the target text according to the model parameters of the neural network transduction model, to obtain the probability that each word in the target text corresponds to each candidate punctuation mark; and obtaining the second punctuation addition result corresponding to the target text according to the probability that each word in the target text corresponds to each candidate punctuation mark.
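As a concrete illustration of the decoding step above, the sketch below assumes the decoder has already produced, for each word of the target text, a probability distribution over a small hypothetical candidate set (no mark, comma, full stop); taking the most probable mark per word then assembles the second punctuation addition result. The candidate set, function name, and probabilities are illustrative assumptions of this sketch, not part of the patent.

```python
CANDIDATE_PUNCT = ["", "，", "。"]   # "" = no punctuation after this word (assumed set)

def decode_punctuation(words, punct_probs):
    """Sketch of the decoding step: punct_probs[i] is the probability
    distribution (over CANDIDATE_PUNCT) that the i-th word of the target
    text corresponds to each candidate mark, as produced by the decoder
    from the source-side hidden states. The most probable mark per word
    yields the second punctuation addition result."""
    out = []
    for word, probs in zip(words, punct_probs):
        out.append(word)
        # pick the index of the highest-probability candidate mark
        best = max(range(len(CANDIDATE_PUNCT)), key=lambda j: probs[j])
        out.append(CANDIDATE_PUNCT[best])
    return "".join(out)
```

In a real transduction model the distributions would come from a softmax over the decoder outputs; here they are simply supplied as input.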
Optionally, adding punctuation to the to-be-processed text includes: adding punctuation to the to-be-processed text through an N-gram language model.
Optionally, adding punctuation to the to-be-processed text through an N-gram language model includes: segmenting the to-be-processed text into words, to obtain a first word sequence corresponding to the to-be-processed text; adding punctuation between adjacent words in the first word sequence, to obtain global punctuation addition paths corresponding to the first word sequence; obtaining, in front-to-back order and in a sliding-window manner, local punctuation addition paths and their corresponding first semantic segments from the global punctuation addition paths, where different first semantic segments contain the same number of character units, adjacent first semantic segments share repeated character units, and a character unit includes a word and/or a punctuation mark; determining, in front-to-back order and by recursion, the target punctuation corresponding to the optimal first semantic segment, where the optimal first semantic segment is the one with the best language model score, and the language model score corresponding to a first semantic segment is determined through the N-gram language model; and obtaining the first punctuation addition result corresponding to the to-be-processed text according to the target punctuation corresponding to each optimal first semantic segment.
Optionally, determining, in front-to-back order and by recursion, the target punctuation corresponding to the optimal first semantic segment includes: determining, with the N-gram language model, the language model score corresponding to each current first semantic segment; selecting, according to the language model scores corresponding to the current first semantic segments, the optimal current first semantic segment from the multiple current first semantic segments; taking the punctuation contained in the optimal current first semantic segment as the target punctuation corresponding to the optimal current first semantic segment; and obtaining the next first semantic segments according to the target punctuation corresponding to the optimal current first semantic segment.
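A minimal greedy sketch of the front-to-back recursion described above: at each step, the window-sized tail of the committed output is extended with each candidate mark plus the next word to form the current first semantic segments, the best-scoring segment is kept, and its punctuation is carried forward into the next segment. The toy scorer, window size, and candidate set are assumptions standing in for the N-gram language model score.

```python
def add_punct_recursive(words, window=3, candidates=("", "，", "。"), score=None):
    """Greedy front-to-back recursion over sliding-window segments:
    commit the punctuation of the best-scoring current segment, then
    build the next segment from the committed output."""
    if score is None:
        # toy stand-in for a language model score: prefer fewer marks
        score = lambda seg: -seg.count("，") - seg.count("。")
    out = [words[0]]
    for word in words[1:]:
        # current first semantic segments: the window-sized tail of the
        # committed output, extended with each candidate mark + next word
        tail = "".join(out)[-window:]
        best = max(candidates, key=lambda m: score(tail + m + word))
        if best:                 # "" means no punctuation at this slot
            out.append(best)
        out.append(word)
    return "".join(out)
```

Because adjacent segments share the committed tail, each punctuation decision is conditioned on the previous ones, which is the recursion the claim describes.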
In another aspect, the invention discloses a punctuation adding device, including:
a text obtaining module, configured to obtain a to-be-processed text;
a first punctuation adding module, configured to add punctuation to the to-be-processed text, to obtain a first punctuation addition result corresponding to the to-be-processed text; and
a second punctuation adding module, configured to add punctuation to a target text through a neural network model when the first punctuation addition result includes a target text whose word count exceeds a word-count threshold and which contains no preset punctuation, to obtain a second punctuation addition result corresponding to the target text.
Optionally, the second punctuation adding module includes:
a second word segmentation submodule, configured to segment the target text into words, to obtain a corresponding second word sequence;
a candidate result obtaining submodule, configured to obtain multiple candidate punctuation addition results corresponding to the second word sequence;
a second model score determining unit, configured to determine, with a neural network language model, the language model score corresponding to each candidate punctuation addition result; and
a second selecting unit, configured to select, from the multiple candidate punctuation addition results corresponding to the second word sequence, the candidate punctuation addition result with the best language model score as the second punctuation addition result corresponding to the target text.
Optionally, the second punctuation adding module includes:
a model processing submodule, configured to add punctuation to the target text through a neural network transduction model, to obtain the second punctuation addition result corresponding to the target text; the neural network transduction model is obtained by training on a parallel corpus, and the parallel corpus includes a source-side corpus and a target-side corpus, where the target-side corpus is the punctuation corresponding to each word in the source-side corpus.
Optionally, the model processing submodule includes:
an encoding unit, configured to encode the target text, to obtain source-side hidden states corresponding to the target text;
a decoding unit, configured to decode the source-side hidden states corresponding to the target text according to the model parameters of the neural network transduction model, to obtain the probability that each word in the target text corresponds to each candidate punctuation mark; and
a result determining unit, configured to obtain the second punctuation addition result corresponding to the target text according to the probability that each word in the target text corresponds to each candidate punctuation mark.
Optionally, the first punctuation adding module adds punctuation to the to-be-processed text through an N-gram language model, and the first punctuation adding module includes:
a first word segmentation submodule, configured to segment the to-be-processed text into words, to obtain a first word sequence corresponding to the to-be-processed text;
a first adding submodule, configured to add punctuation between adjacent words in the first word sequence, to obtain global punctuation addition paths corresponding to the first word sequence;
a local information obtaining submodule, configured to obtain, in front-to-back order and in a sliding-window manner, local punctuation addition paths and their corresponding first semantic segments from the global punctuation addition paths; where different first semantic segments contain the same number of character units, adjacent first semantic segments share repeated character units, and a character unit includes a word and/or a punctuation mark;
a recursion submodule, configured to determine, in front-to-back order and by recursion, the target punctuation corresponding to the optimal first semantic segment; the optimal first semantic segment is the one with the best language model score, determined through the N-gram language model; and
a result obtaining submodule, configured to obtain the first punctuation addition result corresponding to the to-be-processed text according to the target punctuation corresponding to each optimal first semantic segment.
Optionally, the recursion submodule includes:
a first model score determining unit, configured to determine, with the N-gram language model, the language model score corresponding to each current first semantic segment;
a first selecting unit, configured to select, according to the language model scores corresponding to the current first semantic segments, the optimal current first semantic segment from the multiple current first semantic segments;
a target punctuation determining unit, configured to take the punctuation contained in the optimal current first semantic segment as the target punctuation corresponding to the optimal current first semantic segment; and
a semantic segment updating module, configured to obtain the next first semantic segments according to the target punctuation corresponding to the optimal current first semantic segment.
Optionally, the result obtaining submodule includes:
a target punctuation adding unit, configured to add punctuation to the first word sequence, in back-to-front or front-to-back order, according to the target punctuation corresponding to each optimal first semantic segment, to obtain the first punctuation addition result corresponding to the to-be-processed text.
In yet another aspect, the invention discloses a device for punctuation addition, which includes a memory and one or more programs, where the one or more programs are stored in the memory and configured to be executed by one or more processors, and the one or more programs contain instructions for performing the following operations: obtaining a to-be-processed text; adding punctuation to the to-be-processed text, to obtain a first punctuation addition result corresponding to the to-be-processed text; and, if the first punctuation addition result includes a target text whose word count exceeds a word-count threshold and which contains no preset punctuation, adding punctuation to the target text through a neural network model, to obtain a second punctuation addition result corresponding to the target text.
In still another aspect, the invention discloses a machine-readable medium having instructions stored thereon which, when executed by one or more processors, cause a device to perform the foregoing punctuation adding method.
Embodiments of the present invention have the following advantages:
When the first punctuation addition result includes a target text whose word count exceeds a word-count threshold and which contains no preset punctuation, embodiments of the present invention can add punctuation to the target text through a neural network model, to obtain a second punctuation addition result corresponding to the target text. Because a neural network model can represent a word by a word vector and characterize the semantic distance between words by the distance between their word vectors, embodiments of the present invention can let the numerous contexts corresponding to a word participate in the training of the neural network model, so that the neural network model possesses an accurate punctuation-adding ability. Therefore, adding punctuation to the to-be-processed text through a neural network model can, to a certain extent, solve the problem that a very long stretch of text in the first punctuation addition result receives no punctuation, and can thereby improve the accuracy of punctuation addition.
Brief description of the drawings
Fig. 1 is a schematic structural diagram of an example of a speech recognition system of the present invention;
Fig. 2 is a flow chart of the steps of an embodiment of a punctuation adding method of the present invention;
Fig. 3 is a schematic diagram of a punctuation addition process for a word sequence according to an embodiment of the present invention;
Fig. 4 is a structural block diagram of an embodiment of a punctuation adding device of the present invention;
Fig. 5 is a block diagram of a device for punctuation addition implemented as a terminal, according to an exemplary embodiment; and
Fig. 6 is a block diagram of a device for punctuation addition implemented as a server, according to an exemplary embodiment.
Detailed Description
To make the above objects, features and advantages of the present invention clearer and easier to understand, the present invention is described in further detail below with reference to the accompanying drawings and specific embodiments.
Embodiments of the present invention provide a punctuation adding scheme. The scheme can first add punctuation to a to-be-processed text, to obtain a first punctuation addition result corresponding to the to-be-processed text; then, when the first punctuation addition result includes a target text whose word count exceeds a word-count threshold and which contains no preset punctuation, the scheme adds punctuation to the target text through a neural network model, to obtain a second punctuation addition result corresponding to the target text.
Regarding the problem that a very long stretch of text in the first punctuation addition result receives no punctuation, embodiments of the present invention can, when the first punctuation addition result includes a target text whose word count exceeds a word-count threshold and which contains no preset punctuation, add punctuation to the target text through a neural network model, to obtain a second punctuation addition result corresponding to the target text. Because a neural network model can represent a word by a word vector and characterize the semantic distance between words by the distance between their word vectors, embodiments of the present invention can let the numerous contexts corresponding to a word participate in the training of the neural network model, so that the neural network model possesses an accurate punctuation-adding ability. Therefore, adding punctuation to the to-be-processed text through a neural network model can, to a certain extent, solve the problem that a very long stretch of text in the first punctuation addition result receives no punctuation, and can thereby improve the accuracy of punctuation addition.
Embodiments of the present invention can be applied to any application scenario that requires adding punctuation, such as speech recognition and speech translation; it should be understood that embodiments of the present invention do not limit the specific application scenario.
The punctuation adding method provided by embodiments of the present invention can be applied in the application environment of devices such as terminals or servers. Optionally, the terminal may include, but is not limited to: a smart phone, a tablet computer, a laptop computer, an in-vehicle computer, a desktop computer, a smart TV, a wearable device, and the like. The server may be a cloud server or an ordinary server, used to provide a punctuation addition service to clients.
The punctuation adding method provided by embodiments of the present invention is applicable to the processing of languages such as Chinese, Japanese and Korean, and can improve the accuracy of punctuation addition. It should be understood that any language that needs punctuation added falls within the scope of application of the punctuation adding method of the embodiments of the present invention.
Referring to Fig. 1, a schematic structural diagram of an example of a speech recognition system of the present invention is shown, which may specifically include: a speech recognition device 101 and a punctuation adding device 102. The speech recognition device 101 and the punctuation adding device 102 may be separate devices (including servers or terminals), or may be arranged together in the same device; it should be understood that embodiments of the present invention do not limit the specific arrangement of the speech recognition device 101 and the punctuation adding device 102.
The speech recognition device 101 can be used to convert the speech signal of a speaker into text information; specifically, the speech recognition device 101 can output a speech recognition result. In practical applications, the speaker may be a user who talks and produces a speech signal in a speech translation scenario; the speaker's speech signal can then be received through a microphone or another speech collecting device and sent to the speech recognition device 101; alternatively, the speech recognition device 101 may itself have the function of receiving the speaker's speech signal.
Optionally, the speech recognition device 101 can use speech recognition technology to convert the speaker's speech signal into text information. Denote the speaker's speech signal as S; after a series of processing steps on S, a corresponding speech feature sequence O is obtained, denoted O = {O1, O2, ..., Oi, ..., OT}, where Oi is the i-th speech feature and T is the total number of speech features. The sentence corresponding to the speech signal S can be regarded as a word string composed of many words, denoted W = {w1, w2, ..., wn}. The process of speech recognition is to obtain the most probable word string W according to the known speech feature sequence O, where T, i and n are positive integers.
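The search for the most probable word string described above is conventionally formulated via Bayes' rule (a standard derivation added here for clarity; the original text states the goal but not the formula):

$$
\hat{W} = \arg\max_{W} P(W \mid O) = \arg\max_{W} \frac{P(O \mid W)\, P(W)}{P(O)} = \arg\max_{W} P(O \mid W)\, P(W),
$$

where $P(O)$ can be dropped because it does not depend on $W$; $P(O \mid W)$ is given by an acoustic model and $P(W)$ by a language model such as the ones discussed below.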
Specifically, speech recognition is a process of model matching. In this process, a speech model can first be established according to the characteristics of human speech; by analyzing the input speech signal, the required features are extracted to establish the templates required for speech recognition. The process of recognizing the input speech is the process of comparing the features of the input speech with the templates, and finally determining the template that best matches the input speech, so as to obtain the speech recognition result. As for the specific speech recognition algorithm, training and recognition algorithms based on statistical hidden Markov models can be used; training and recognition algorithms based on neural networks, recognition algorithms based on dynamic time warping, and other algorithms can also be used. Embodiments of the present invention do not limit the specific speech recognition process.
The punctuation adding device 102 can be connected to the speech recognition device 101 and can receive the speech recognition result sent by the speech recognition device 101, in order to add punctuation to the received speech recognition result. Specifically, it can take the received speech recognition result as the to-be-processed text; it first adds punctuation to the to-be-processed text, to obtain a first punctuation addition result corresponding to the to-be-processed text; then, when the first punctuation addition result includes a target text whose word count exceeds a word-count threshold and which contains no preset punctuation, it adds punctuation to the target text through a neural network model, to obtain a second punctuation addition result corresponding to the target text.
Optionally, the first punctuation addition result corresponding to the to-be-processed text can be edited according to the second punctuation addition result corresponding to the target text. For example, the editing can replace the target text in the first punctuation addition result corresponding to the to-be-processed text with the second punctuation addition result corresponding to the target text, to obtain a final punctuation addition result corresponding to the to-be-processed text. Of course, the above editing of the first punctuation addition result is only an optional embodiment; in fact, the second punctuation addition result can also be edited according to the first punctuation addition result corresponding to the to-be-processed text, to obtain the final punctuation addition result corresponding to the to-be-processed text; or, when the first punctuation addition result contains only the target text, the second punctuation addition result can directly serve as the final punctuation addition result corresponding to the to-be-processed text.
In practical applications, the final punctuation addition result corresponding to the to-be-processed text can be output. Optionally, in a speech recognition scenario, the punctuation adding device 102 can output the final punctuation addition result to the user or to the client corresponding to the user; in a speech translation scenario, the punctuation adding device 102 can output the final punctuation addition result to a machine translation device. It should be understood that those skilled in the art can determine the output mode of the final punctuation addition result according to the actual application scenario; embodiments of the present invention do not limit the specific output mode of the final punctuation addition result corresponding to the to-be-processed text.
Method Embodiments
Referring to Fig. 2, a flow chart of the steps of an embodiment of a punctuation adding method of the present invention is shown, which may specifically include the following steps:
Step 201: obtain a to-be-processed text;
Step 202: add punctuation to the to-be-processed text, to obtain a first punctuation addition result corresponding to the to-be-processed text;
Step 203: if the first punctuation addition result includes a target text whose word count exceeds a word-count threshold and which contains no preset punctuation, add punctuation to the target text through a neural network model, to obtain a second punctuation addition result corresponding to the target text.
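The three steps above can be sketched as a small pipeline. Everything here is an illustrative assumption: `first_pass` and `neural_pass` are stand-in placeholders for the language model and the neural network model, and the word count is approximated by character length.

```python
import re

PUNCT = set("，。？！,.?!")   # assumed punctuation inventory for this sketch

def first_pass(text):
    # Placeholder for the first punctuation pass (step 202); here it
    # returns the text unchanged, as if no punctuation could be decided.
    return text

def neural_pass(segment):
    # Placeholder for the neural network model (step 203); a hypothetical
    # stand-in that inserts a comma midway through the segment.
    mid = len(segment) // 2
    return segment[:mid] + "，" + segment[mid:]

def add_punctuation(text, word_count_threshold=10):
    """Sketch of steps 201-203: run the first pass, then re-punctuate
    any stretch that is too long and still contains no punctuation."""
    result = first_pass(text)
    # split the first-pass result into stretches between punctuation marks,
    # keeping the marks themselves (capturing group in re.split)
    parts = re.split(r"([，。？！,.?!])", result)
    out = []
    for part in parts:
        if part and part not in PUNCT and len(part) > word_count_threshold:
            part = neural_pass(part)   # the "target text" case of step 203
        out.append(part)
    return "".join(out)
```

In the patented method the second pass would of course be a trained model rather than a midpoint heuristic; the sketch only shows the control flow between the two passes.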
In embodiments of the present invention, the to-be-processed text can represent text to which punctuation needs to be added; the to-be-processed text can come from text or speech input by a user through a device, or can come from other devices. It should be noted that the to-be-processed text can include one language or more than one language; for example, the to-be-processed text can include Chinese, or can include Chinese mixed with another language such as English. Embodiments of the present invention do not limit the specific to-be-processed text.
In practical applications, the punctuation adding method flow of the embodiments of the present invention can be executed through a client APP (application). The client application can run in a terminal; for example, the client application can be any APP running in the terminal, and the client application can then obtain the to-be-processed text from another application of the terminal. Alternatively, the punctuation adding method flow of the embodiments of the present invention can be executed by a functional component of the client application, and that functional component can obtain the to-be-processed text from other functional components. Alternatively, the punctuation adding method of the embodiments of the present invention can be executed by a server.
In an optional embodiment of the present invention, step 201 can obtain the to-be-processed text according to the speech signal of a speaker. In this case, step 201 can convert the speaker's speech signal into text information and obtain the to-be-processed text from the text information. Alternatively, step 201 can directly receive, from a speech recognition device, the text information corresponding to the user's speech signal and obtain the to-be-processed text from the text information.
In practical applications, step 201 can, according to actual application requirements, obtain the to-be-processed text from the text corresponding to a speech signal or from text input by a user. Optionally, the to-be-processed text can be obtained from the text corresponding to the speech signal S according to the interval times of S. For example, when an interval time of the speech signal S exceeds a time threshold, a corresponding split point can be determined according to that time point; the text corresponding to the portion of S before the split point is taken as the to-be-processed text, and the text corresponding to the portion of S after the split point is processed further, so as to continue obtaining to-be-processed text from it. It should be understood that embodiments of the present invention do not limit the specific process of obtaining the to-be-processed text from the text corresponding to a speech signal or from text input by a user.
In practical applications, step 202 can use any punctuation addition manner to add punctuation to the to-be-processed text. For example, the approach of existing schemes based on the silent intervals of the speech signal can be used to add punctuation to the to-be-processed text corresponding to the speech signal.
Can be the pending text addition punctuate by language model in a kind of alternative embodiment of the present invention. In natural language processing field, language model is the probabilistic model set up for a kind of language or multilingual, it is therefore an objective to built The distribution of probability of given appearance of the word sequence in language can be described by standing one., can be by specific to the embodiment of the present invention The distribution of the probability of appearance of the given word sequence of language model description in language is referred to as language model scores.Alternatively, may be used To obtain language material sentence from corpus, participle, and the word sequence obtained according to participle are carried out to the language material sentence, training is obtained Above-mentioned language model.Alternatively, the given word sequence of language model description can carry punctuate, to realize for speech recognition knot The punctuate addition processing of fruit.
In the embodiment of the present invention, the language model may include an N-gram language model and/or a neural network language model, where the neural network language model may further include: an RNNLM (Recurrent Neural Network Language Model), a CNNLM (Convolutional Neural Network Language Model), a DNNLM (Deep Neural Network Language Model), and the like.
The N-gram language model is based on the assumption that the appearance of the n-th word is related only to the preceding N-1 words and is unrelated to any other word, so the probability of a whole sentence is the product of the appearance probabilities of its words. The embodiment of the present invention adds punctuation to the pending text through the N-gram language model; because the N-gram language model can give a comparatively reasonable first punctuation-addition result according to the language model score, the accuracy of punctuation addition can be improved.
In an optional embodiment of the present invention, adding punctuation to the pending text through the N-gram language model may specifically include: performing word segmentation on the pending text to obtain a first word sequence corresponding to the pending text; and adding punctuation to the first word sequence through the N-gram language model to obtain a corresponding first punctuation-addition result.
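As a toy illustration of this step (not the patent's implementation), the sketch below enumerates every punctuation-addition path for a three-word sequence and scores each path with a hand-set bigram table; all token names and log-probabilities are invented, and a real N-gram model would be trained on corpus data:

```python
from itertools import product

# Invented bigram log-probabilities standing in for a trained N-gram model.
BIGRAM_LOGPROB = {
    ("hello", ","): -1.0, ("hello", "_"): -3.0,
    (",", "i-am"): -1.0, ("_", "i-am"): -2.0,
    ("i-am", "_"): -0.5, ("i-am", ","): -2.5,
    ("_", "xiao-ming"): -0.5, (",", "xiao-ming"): -2.0,
}

def lm_score(units):
    """Sum bigram log-probabilities over a sequence of character units."""
    return sum(BIGRAM_LOGPROB.get(p, -10.0) for p in zip(units, units[1:]))

def best_first_result(words, candidates=(",", "_")):
    """Enumerate every punctuation-addition path (one candidate mark per gap
    between adjacent words) and keep the path with the best score."""
    best, best_s = None, float("-inf")
    for marks in product(candidates, repeat=len(words) - 1):
        units = [words[0]]
        for mark, word in zip(marks, words[1:]):
            units += [mark, word]
        s = lm_score(units)
        if s > best_s:
            best, best_s = units, s
    return best

print(best_first_result(["hello", "i-am", "xiao-ming"]))
# -> ['hello', ',', 'i-am', '_', 'xiao-ming']  (score -3.0, best of 4 paths)
```

Exhaustive enumeration is exponential in the number of gaps, which is why the dynamic programming described later matters for longer texts.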
In the embodiment of the present invention, multiple candidate punctuation marks may be added between adjacent words in the first word sequence; that is, punctuation-addition processing may be performed on the first word sequence according to the various ways in which candidate punctuation marks can be added between its adjacent words, so that the first word sequence corresponds to multiple punctuation-addition schemes and their corresponding first punctuation-addition results. Optionally, the language model scores of the multiple first punctuation-addition results may be determined through the N-gram language model, so that the first punctuation-addition result with the optimal language model score can finally be obtained.
It should be noted that those skilled in the art may determine the candidate punctuation marks to be added according to practical application requirements. Optionally, the candidate punctuation marks may include: the comma, question mark, full stop, exclamation mark, space, and the like, where the space may play the role of word segmentation or may play no role at all; for example, for English, the space may be used to split different words, while for Chinese the space may be a punctuation mark that plays no role. It can be understood that the embodiment of the present invention does not limit the specific candidate punctuation marks.
Referring to Fig. 3, a schematic diagram of the punctuation-addition process for a word sequence according to an embodiment of the present invention is shown, where the word sequence is "hello / I am / Xiao Ming / very glad / to meet you"; candidate punctuation may be added between any adjacent words of "hello / I am / Xiao Ming / very glad / to meet you". In Fig. 3, the words "hello", "I am", "Xiao Ming", "very glad" and "to meet you" are each represented by a rectangle, and punctuation marks such as the comma, space, exclamation mark, question mark and full stop are each represented by a circle, so that multiple paths exist between the first word "hello" of the word sequence and the punctuation after the final words "to meet you".
In another optional embodiment of the present invention, a dynamic programming algorithm may be used to select, from the multiple global punctuation-addition paths of the pending text, the optimal global punctuation-addition path and its corresponding optimal first punctuation-addition result, where the optimal first punctuation-addition result can realize a global optimum of the language model score; here, "global" may be used to denote the whole first punctuation-addition result corresponding to the pending text, so the optimal first punctuation-addition result of the embodiment of the present invention can improve the accuracy of the added punctuation. Correspondingly, the process in which step 202 adds punctuation to the pending text through the N-gram language model may include:
Step A1: performing word segmentation on the pending text to obtain a first word sequence corresponding to the pending text;
Step A2: adding punctuation between adjacent words in the first word sequence to obtain the global punctuation-addition paths corresponding to the first word sequence;
Step A3: in order from front to back, obtaining, in a sliding manner, local punctuation-addition paths and their corresponding first semantic fragments from the global punctuation-addition paths; where different first semantic fragments contain the same number of character units, adjacent first semantic fragments have repeated character units, and a character unit may include a word and/or a punctuation mark;
Step A4: in order from front to back, determining, in a recursive manner, the target punctuation corresponding to the optimal first semantic fragment; where the language model score corresponding to the optimal first semantic fragment is optimal, and the language model score corresponding to a first semantic fragment is determined through the N-gram language model;
Step A5: obtaining, according to the target punctuation corresponding to each optimal first semantic fragment, the first punctuation-addition result corresponding to the pending text.
Steps A1 to A5 obtain, in order from front to back and in a sliding manner, first semantic fragments of identical length (that is, containing identical numbers of character units) with repeated character units from the global punctuation-addition paths, and determine, in order from front to back and in a recursive manner, the target punctuation corresponding to the optimal first semantic fragment. The process of obtaining the global punctuation-addition paths may refer to Fig. 3, and the embodiment of the present invention does not limit the specific process of obtaining the global punctuation-addition paths. A local punctuation-addition path may be used to denote a part of a global punctuation-addition path, and each global punctuation-addition path may correspond to first semantic fragments.
In practical applications, the language model score corresponding to a first semantic fragment may be determined through the N-gram language model. Assuming N = 5, the length of a first semantic fragment may be 5; assuming the initial character unit of the word sequence is numbered 1, first semantic fragments of length 5 may be obtained from the first punctuation-addition result in the number order 1-5, 2-6, 3-7, 4-8 and so on, and the language model score corresponding to each first semantic fragment may be determined using the N-gram language model; for example, each first semantic fragment may be input into the N-gram language model, and the N-gram language model may then output the corresponding language model score. It can be understood that the displacement of 1 between adjacent first semantic fragments above is merely an example; in fact, those skilled in the art may determine the displacement between adjacent first semantic fragments according to practical application requirements, and the displacement may, for example, also be 2, 3, and the like.
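The sliding acquisition of fixed-length, overlapping fragments described above can be sketched as a simple window function; the unit names are placeholders:

```python
def first_semantic_fragments(units, n=5, step=1):
    """Slice a unit sequence (words plus punctuation) into fixed-length
    fragments in front-to-back order; with step < n, adjacent fragments
    share n - step repeated character units."""
    return [units[i:i + n] for i in range(0, len(units) - n + 1, step)]

units = ["u1", "u2", "u3", "u4", "u5", "u6", "u7", "u8"]
print(first_semantic_fragments(units))
# -> four fragments covering units 1-5, 2-6, 3-7, 4-8
```

With `step=2` or `step=3` the same function yields the larger displacements mentioned above.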
In an optional embodiment of the present invention, step A4 of determining, in order from front to back and in a recursive manner, the target punctuation corresponding to the optimal first semantic fragment may specifically include:
Step A41: determining, using the N-gram language model, the language model score corresponding to the current first semantic fragment;
Step A42: selecting, according to the language model scores corresponding to the current first semantic fragments, the optimal current first semantic fragment from among the multiple current first semantic fragments;
Step A43: taking the punctuation contained in the optimal current first semantic fragment as the target punctuation corresponding to the optimal current first semantic fragment;
Step A44: obtaining, according to the target punctuation corresponding to the optimal current first semantic fragment, the next first semantic fragment.
A current first semantic fragment may be used to denote the first semantic fragment corresponding to a local punctuation-addition path during the recursion. Assuming the number of the current first semantic fragment is k, with k a positive integer, the N-gram language model may be used to determine the language model score corresponding to the k-th first semantic fragment; the optimal k-th first semantic fragment, whose language model score is optimal, may be selected from among the multiple k-th first semantic fragments, and the punctuation contained in the optimal k-th first semantic fragment may be taken as the corresponding target punctuation; then the (k+1)-th first semantic fragment may be obtained according to the target punctuation corresponding to the optimal k-th first semantic fragment, where the (k+1)-th first semantic fragment can reuse the target punctuation corresponding to the optimal k-th first semantic fragment. Taking Fig. 3 as an example, assume that the length of a first semantic fragment is 5 and the optimal 1st first semantic fragment is "hello / , / I am / space / Xiao Ming"; the 2nd first semantic fragment "punctuation / I am / punctuation / Xiao Ming / punctuation" can then reuse the target punctuation corresponding to the optimal 1st first semantic fragment, so the 2nd first semantic fragment can add punctuation on the basis of ", / I am / space / Xiao Ming / punctuation", and the optimal punctuation mark after "Xiao Ming" can thus be selected from among the multiple candidates.
In practical applications, obtaining the first punctuation-addition result corresponding to the pending text according to the target punctuation corresponding to each optimal first semantic fragment may specifically include: adding punctuation to the first word sequence, in order from back to front or from front to back, according to the target punctuation corresponding to each optimal first semantic fragment, so as to obtain the first punctuation-addition result corresponding to the pending text. That is, the target punctuation corresponding to each punctuation position (between adjacent words) of a global punctuation-addition path may be determined in a certain order, and the first punctuation-addition result corresponding to the pending text may be obtained according to the above target punctuation.
In summary, in the punctuation-addition process of steps A1 to A5, because adjacent first semantic fragments have repeated character units, the next first semantic fragment can reuse the target punctuation corresponding to the optimal current first semantic fragment, so the recursive manner can reduce the amount of computation required to obtain the optimal punctuation-addition result; moreover, because a displacement exists between adjacent first semantic fragments, the embodiment of the present invention can realize, through the optimal language model score of each first semantic fragment, the optimum of the language model scores corresponding to all first semantic fragments.
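Under the simplifying assumption of a bigram scorer (so each gap's contribution depends only on the units adjacent to it), the reuse described in steps A1 to A5 collapses to a single left-to-right pass; the table of log-probabilities below is invented for illustration, and a higher-order model would carry the whole fragment as the recursion state:

```python
PAIR_LOGPROB = {  # invented bigram log-probabilities for illustration
    ("hello", ","): -1.0, ("hello", "_"): -3.0,
    (",", "i-am"): -1.0, ("_", "i-am"): -2.0,
    ("i-am", "_"): -0.5, ("i-am", ","): -2.5,
    ("_", "xiao-ming"): -0.5, (",", "xiao-ming"): -2.0,
}

def punctuate_recursively(words, candidates=(",", "_")):
    """Left-to-right recursion: each step reuses the punctuation already
    fixed for the previous fragment and only chooses the mark for the new
    gap, instead of re-enumerating every earlier choice."""
    units, total = [words[0]], 0.0
    for word in words[1:]:
        mark, inc = max(
            ((m, PAIR_LOGPROB.get((units[-1], m), -10.0)
                  + PAIR_LOGPROB.get((m, word), -10.0)) for m in candidates),
            key=lambda t: t[1])
        units += [mark, word]
        total += inc
    return units, total

print(punctuate_recursively(["hello", "i-am", "xiao-ming"]))
# -> (['hello', ',', 'i-am', '_', 'xiao-ming'], -3.0)
```

Note that with a bigram scorer this local choice per gap is also the global optimum, which is what makes the sketch this short; the savings over full enumeration grow exponentially with the number of gaps.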
Although the N-gram language model has the advantage of fast processing speed, it can only see the preceding N-1 words when adding punctuation and cannot know the punctuation-addition situation in the whole pending text, so a very long stretch of text with no punctuation added may appear in the first punctuation-addition result. In application scenarios of translation, punctuation is often relied upon in order to improve translation quality; that is, a machine translation apparatus generally translates text that carries punctuation, and conversely, translating text without punctuation easily leads to the problem of low translation quality. Therefore, the first punctuation-addition result obtained through the N-gram language model may not meet the demands of machine translation.
In practical applications, the first punctuation-addition result obtained in step 202 may be judged; specifically, it may be judged whether the first punctuation-addition result includes a target text whose word count exceeds a word-count threshold and which contains no preset punctuation. The above preset punctuation may be determined by those skilled in the art according to practical application requirements; for example, the preset punctuation may be determined according to translation demands. Examples of the preset punctuation may include: the comma, question mark, full stop, exclamation mark, and the like; the embodiment of the present invention does not limit the specific preset punctuation.
Above-mentioned number of words threshold value can add the quantity for the individual character that result includes for first punctuate, for English, German Deng the word being made up of alphabetic character, above-mentioned individual character can be equal to word;For Chinese, Japanese, Korean etc. by non-alphabetic word The word of composition is accorded with, above-mentioned individual character can be single word.
Above-mentioned number of words threshold value can determine according to practical application request by those skilled in the art, for example, under initial situation, Above-mentioned number of words threshold value can be default empirical value.Later stage, can be corresponding according to user feedback, and/or above-mentioned number of words threshold value Translation quality, is adjusted to above-mentioned default empirical value.If for example, the corresponding translation quality of current number of words threshold value TH is less than pre- Condition is put, then can be turned down on the basis of current number of words threshold value TH, for example, being adjusted to (TH-1).Alternatively, TH model Enclosing to include:15 to 20, it will be understood that the embodiment of the present invention is not any limitation as specific number of words threshold value.
Step 203 may, in the case where the first punctuation-addition result includes a target text whose word count exceeds the word-count threshold and which contains no preset punctuation, add punctuation to the target text through a neural network model, so as to obtain a second punctuation-addition result corresponding to the target text. Because a neural network model can represent a word by a word vector and characterize the semantic distance between words by the distance between word vectors, the embodiment of the present invention can make the numerous contexts corresponding to a word participate in the training of the neural network model, so that the neural network model possesses an accurate punctuation-addition ability; therefore, adding punctuation to the target text through the neural network model can, to a certain extent, solve the problem of a very long stretch of text in the first punctuation-addition result having no punctuation added, and can in turn improve the accuracy of punctuation addition.
The embodiment of the present invention may provide the following technical schemes for adding punctuation to the target text through a neural network model:
Technical scheme 1
In technical scheme 1, the neural network model may be a neural network language model, and the process of adding punctuation to the target text through the neural network model may include: performing word segmentation on the target text to obtain a corresponding second word sequence; obtaining multiple candidate punctuation-addition results corresponding to the second word sequence; determining, using the neural network language model, the language model scores corresponding to the candidate punctuation-addition results; and selecting, from the multiple candidate punctuation-addition results corresponding to the second word sequence, the candidate punctuation-addition result with the optimal language model score as the second punctuation-addition result corresponding to the target text.
Relative to the N-gram language model, one advantage of a neural network language model such as the RNNLM is that it can truly and fully use the entire preceding context to predict the next word; the RNNLM can therefore possess the ability to describe language model scores for semantic fragments of adjustable length, that is, the RNNLM is applicable to semantic fragments of a wider length range. For example, the length range of the semantic fragments corresponding to the RNNLM may be 1 to a second length threshold, where the second length threshold may be greater than the first length threshold.
In technical scheme 1, because the RNNLM is applicable to semantic fragments of a wider length range, all the semantic fragments of each candidate punctuation-addition result may be taken as a whole, and the language model score corresponding to all the semantic fragments of a candidate punctuation-addition result may be determined through the RNNLM; for example, all the character units included in a candidate punctuation-addition result may be input into the RNNLM, and the RNNLM may then output the corresponding language model score.
Technical scheme 2
In technical scheme 2, the neural network model may be a neural network conversion model. Technical scheme 2 may convert the problem of punctuation addition into a problem of word-to-punctuation conversion, where the word-to-punctuation conversion specifically converts each word at the source end into the corresponding punctuation at the target end, and this word-to-punctuation conversion problem is handled through a neural network conversion model trained on a parallel corpus.
Correspondingly, the above process of adding punctuation to the target text through the neural network model may include: adding punctuation to the target text through the neural network conversion model, so as to obtain the second punctuation-addition result corresponding to the target text; where the neural network conversion model may be obtained by training on a parallel corpus, and the parallel corpus may include source-end corpus and target-end corpus, the target-end corpus being the punctuation corresponding to each word in the source-end corpus.
In practical applications, the parallel corpus may include source-end corpus and target-end corpus, and the target-end corpus may be the punctuation corresponding to each word in the source-end corpus. Generally, the punctuation corresponding to a word may be the punctuation added after that word.
In practical applications, the source-end corpus may include several source-end sentences, and the target-end corpus may be the punctuation corresponding to each word in the above source-end sentences. For example, for the source-end sentence "today weather how we go out play", the target-end punctuation corresponding to each word may be "_ _ _ _ _ !", where "_" indicates that no punctuation follows the corresponding word.
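The construction of such a (source, target) training pair from a punctuated sentence can be sketched as follows; the tokenization and the label set are assumptions for illustration:

```python
def to_parallel_pair(punctuated_tokens, marks=",.?!"):
    """Turn a punctuated token sequence into one (source, target) training
    pair: the source side keeps the words, the target side holds one label
    per word - the punctuation that follows it, or "_" for none."""
    source, target = [], []
    for tok in punctuated_tokens:
        if tok in marks and target:
            target[-1] = tok       # the mark labels the preceding word
        else:
            source.append(tok)
            target.append("_")
    return source, target

src, tgt = to_parallel_pair(["today", "weather", "how", "we", "go", "out", "play", "!"])
print(src, tgt)
# -> ['today', ..., 'play'] ['_', '_', '_', '_', '_', '_', '!']
```

Each source word thus aligns one-to-one with a target-end punctuation label, which is the parallel form the conversion model is trained on.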
In an optional embodiment of the present invention, the process of training on a parallel corpus to obtain the neural network conversion model may include: establishing, according to a neural network structure, a neural network conversion model from the words of the source end to the punctuation of the target end; and training on the parallel corpus using a neural network learning algorithm to obtain the model parameters of the neural network conversion model.
In an optional embodiment of the present invention, the neural network structure may include: an RNN (Recurrent Neural Network), an LSTM (Long Short-Term Memory) network, a GRU (Gated Recurrent Unit), and the like. It can be understood that those skilled in the art may adopt the required neural network structure according to practical application requirements, and the embodiment of the present invention does not limit the specific neural network structure.
Optionally, the above neural network conversion model may include a mapping function from the words of the source end to the punctuation of the target end, and the mapping function may be expressed in the form of a conditional probability, such as P(y|x) or p(y_j | y_{<j}, x), where x denotes the source-end information (such as the information of the target text) and y denotes the target-end information (such as the punctuation corresponding to each word in the target text); generally, the higher the accuracy of the added punctuation, the larger the conditional probability.
In practical applications, a neural network structure may include multiple neuron layers; specifically, the neuron layers may include an input layer, a hidden layer and an output layer, where the input layer is responsible for receiving the source-end information and distributing it to the hidden layer, the hidden layer is responsible for the required calculations and for outputting the calculation results to the output layer, and the output layer is responsible for outputting the target-end information, namely the calculation results. In an optional embodiment of the present invention, the model parameters of the neural network conversion model may include at least one of: a first connection weight W between the input layer and the hidden layer, a second connection weight U between the output layer and the hidden layer, and the bias parameters of the output layer and the hidden layer. It can be understood that the embodiment of the present invention does not limit the specific network conversion model and its corresponding model parameters.
In training on the parallel corpus, the maximization objective of the neural network conversion model is the probability of outputting the correct punctuation information y given the source-end information x. In practical applications, a neural network learning algorithm may be used to train on the parallel corpus, and the model parameters may be optimized using an optimization method such as stochastic gradient descent; for example, the above optimization may compute the gradients of the model parameters according to the error of the output layer and update the model parameters according to the optimization method, so that the maximization objective of the neural network conversion model can be realized. Optionally, the neural network learning algorithm may include: the BP (error back-propagation) algorithm, genetic algorithms, and the like. It can be understood that the embodiment of the present invention does not limit the specific neural network learning algorithm or the specific process of training on the parallel corpus using the neural network learning algorithm.
In practical applications, the target text may be input into the trained neural network conversion model, the target text may be processed by the neural network conversion model, and the second punctuation-addition result corresponding to the target text may be output. In an optional embodiment of the present invention, the above adding of punctuation to the target text through the neural network conversion model involves processing the target text with the neural network conversion model, and the process may include:
Step S1: encoding the target text to obtain the source-end hidden-layer states corresponding to the target text;
Step S2: decoding, according to the model parameters of the neural network conversion model, the source-end hidden-layer states corresponding to the target text, so as to obtain the probability of each word in the target text belonging to each candidate punctuation mark;
Step S3: obtaining, according to the probability of each word in the target text belonging to each candidate punctuation mark, the second punctuation-addition result corresponding to the target text.
In practical applications, step S1 may first convert each word in the target text into a corresponding vocabulary vector whose dimensionality may be identical to the size of the vocabulary; however, because the size of the vocabulary makes the dimensionality of the vocabulary vector rather large, the vocabulary vector may be mapped to a low-dimensional semantic space in order to avoid the curse of dimensionality and to better express the semantic relations between words, so that each word is represented by a dense vector of fixed dimensionality; this dense vector is called a word vector, and the distance between word vectors can, to a certain extent, measure the similarity between words. Further, the word sequence corresponding to the target text may be compressed using the neural network structure to obtain a compressed representation of the whole target text, namely the source-end hidden-layer states corresponding to the target text. Optionally, the word sequence corresponding to the target text may be compressed using the activation function of the hidden layer of the neural network structure (such as sigmoid (the S-shaped function), tanh (the hyperbolic tangent function), and the like) to obtain the source-end hidden-layer states corresponding to the target text; the embodiment of the present invention does not limit the specific compression manner of the source-end hidden-layer states corresponding to the target text.
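A minimal forward-pass sketch of this encoding step, under heavy simplifying assumptions (one-dimensional word vectors and scalar weights, all hand-set; real models use learned matrices):

```python
import math

EMBEDDING = {"hello": 1.0, "i-am": -1.0, "xiao-ming": 0.5}  # toy 1-d word vectors
W_X, W_H = 0.5, 0.5  # hand-set scalar weights standing in for weight matrices

def encode(words):
    """Look up each word's (toy) word vector and compress the prefix into a
    hidden state with a tanh recurrence, yielding one source-end
    hidden-layer state per word."""
    h, states = 0.0, []
    for w in words:
        h = math.tanh(W_X * EMBEDDING[w] + W_H * h)
        states.append(h)
    return states

print(encode(["hello", "i-am", "xiao-ming"]))
```

Each state compresses the words before it, matching the forward hidden-layer states described next; running a second pass over the reversed sequence would give the backward states.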
In an optional embodiment of the present invention, the source-end hidden-layer states may include forward source-end hidden-layer states, in which case the hidden-layer state of each word in the target text compresses only the words before it. Alternatively, the source-end hidden-layer states may include forward source-end hidden-layer states and backward source-end hidden-layer states, in which case the hidden-layer state of each word in the target text compresses not only the words before it but also the words after it; in this way, the numerous contexts corresponding to a word can participate in the training of the network conversion model, so that the network conversion model possesses an accurate punctuation-addition ability.
In an embodiment of the present invention, step S2 may obtain the context vector corresponding to the source end according to the source-end hidden-layer states corresponding to the target text; determine the target-end hidden-layer states according to the context vector; and determine, according to the hidden-layer states and the model parameters of the neural network conversion model, the probability of each word in the target text belonging to each candidate punctuation mark.
It should be noted that those skilled in the art may determine, according to practical application requirements, the candidate punctuation marks to be added between adjacent words. Optionally, the candidate punctuation marks may include: the comma, question mark, full stop, exclamation mark, space, and the like, where the space "_" may play the role of word segmentation or may play no role at all; for example, for English, the space may be used to split different words, while for Chinese the space may be a punctuation mark that plays no role. It can be understood that the embodiment of the present invention does not limit the specific candidate punctuation marks.
In an optional embodiment of the present invention, the context vector corresponding to the source end may be a fixed vector; specifically, the context vector corresponding to the source end may be a combination of all the source-end hidden-layer states. When the context vector corresponding to the source end is a fixed vector, each source-end word contributes identically to every target-end position, which has a certain unreasonableness; for example, the source-end position consistent with a target-end position contributes significantly more to that target-end position. The above unreasonableness is a minor problem when the source-end sentence is comparatively short, but if the source-end sentence is comparatively long the shortcoming becomes obvious, which will reduce the accuracy of punctuation addition and easily increase the amount of computation.
To address the decline in accuracy brought about by the context vector corresponding to the source end being a fixed vector, in an optional embodiment of the present invention a variable context vector may be used. Correspondingly, adding punctuation to the target text through the neural network conversion model may further include: determining the alignment probability between the source-end positions corresponding to the target text and the target-end positions corresponding to the punctuation-addition result.
The process in which step S2 decodes, according to the model parameters of the neural network conversion model, the source-end hidden-layer states corresponding to the target text may then include: obtaining the context vector corresponding to the source end according to the alignment probability and the source-end hidden-layer states corresponding to the target text; determining the target-end hidden-layer states according to the context vector; and determining, according to the hidden-layer states and the model parameters of the neural network conversion model, the probability of each word in the target text belonging to each candidate punctuation mark.
The above alignment probability may be used to characterize the matching degree between the i-th source-end position and the j-th target-end position. Obtaining the context vector corresponding to the source end according to the alignment probability and the source-end hidden-layer states corresponding to the target text allows the context vector to increasingly focus on part of the source-end words, so the amount of computation can be reduced to a certain extent and the accuracy of punctuation addition can be improved.
The embodiment of the present invention may provide the following manners of determining the alignment probability between the source-end positions corresponding to the target text and the target-end positions corresponding to the punctuation-addition result:
Determination manner 1: obtaining the alignment probability between the source-end positions corresponding to the target text and the target-end positions corresponding to the punctuation-addition result according to the model parameters of the neural network conversion model and the target-end hidden-layer states; or
Determination manner 2: obtaining the alignment probability between the source-end positions corresponding to the target text and the target-end positions corresponding to the punctuation-addition result by comparing the source-end hidden-layer states and the target-end hidden-layer states; or
Determination manner 3: determining the aligned source-end position corresponding to each target-end position, and determining the alignment probability between each target-end position and its corresponding aligned source-end position.
Determination manner 1 may obtain the alignment probability according to the model parameters of the neural network conversion model and the target-end hidden-layer states; specifically, the product of the first connection weight and a target-end hidden-layer state may be input into a softmax function, and the softmax function outputs the alignment probability. The softmax function is a normalization function that can map a set of real values into the interval [0, 1] and make them sum to 1.
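The normalization just described can be written out directly; subtracting the maximum before exponentiating is a standard stability measure not mentioned above:

```python
import math

def softmax(scores):
    """Map a set of real values into the interval [0, 1] so that they sum
    to 1; subtracting the max keeps exp() numerically stable."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

print(softmax([2.0, 1.0, 0.1]))  # the largest score gets the largest probability
```

Applied to the products of the first connection weight and the target-end hidden-layer states, each output value can serve as an alignment probability.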
Determination manner 2 may compare the source-end hidden-layer states and the target-end hidden-layer states through an alignment function. An example of the alignment function may be the ratio between the exponential of a scoring function and the sum of the exponentials of the scoring function over the hidden-layer states, where the scoring function may be a function related to the source-end hidden-layer states and the target-end hidden-layer states. It can be understood that the embodiment of the present invention does not limit the specific alignment function.
In determination mode 3, for the j-th target-side position, a corresponding aligned source-side position p_j may be generated, and a window [p_j - D, p_j + D] may be taken on the source side, where D is a positive integer; the context vector may then be obtained as a weighted average of the source-side hidden states within the window. If the window extends beyond the boundary of the source sentence, the sentence boundary prevails. p_j may be a preset value or a value estimated online; embodiments of the present invention place no limitation on the specific process of determining the aligned source-side position p_j.
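The windowed context vector of determination mode 3 might look like the following sketch. Clipping at the sentence boundary follows the text; the uniform weights are a placeholder assumption, since the patent does not fix how the averaging weights are chosen.

```python
def local_context(source_hiddens, p_j, D, weights):
    """Determination mode 3: weighted average of the source-side hidden states
    inside the window [p_j - D, p_j + D], clipped at the sentence boundary."""
    lo = max(0, p_j - D)
    hi = min(len(source_hiddens) - 1, p_j + D)
    window = source_hiddens[lo:hi + 1]
    w = weights[:len(window)]
    total = sum(w)
    dim = len(window[0])
    return [sum(w_k * h[d] for w_k, h in zip(w, window)) / total
            for d in range(dim)]

# Toy 1-dimensional hidden states for a 5-token source sentence.
hiddens = [[1.0], [2.0], [3.0], [4.0], [5.0]]
# The window around position 4 with D=2 would run to index 6,
# so it is clipped to [2, 4] by the sentence boundary.
ctx = local_context(hiddens, p_j=4, D=2, weights=[1.0, 1.0, 1.0])
assert ctx == [4.0]   # average of hidden states 3.0, 4.0, 5.0
```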
The process of determining the alignment probability has been described in detail above through determination modes 1 to 3. It can be understood that those skilled in the art may, according to practical application requirements, adopt any of determination modes 1 to 3, or adopt other determination modes; embodiments of the present invention place no limitation on the specific process of determining the alignment probability.
In step S3, the second punctuation addition result corresponding to the target text may be obtained according to the probability, obtained in step S2, that each word in the target text belongs to each candidate punctuation mark. Specifically, for each word, the candidate punctuation mark with the highest probability may be taken as its corresponding target punctuation. The second punctuation addition result corresponding to the target text may then be obtained according to the target punctuation corresponding to each word in the target text; the punctuation addition result may be the target text after punctuation processing. For example, the punctuation addition result corresponding to the target text "hello I am Xiao Ming nice to meet you" may be "Hello, I am Xiao Ming, nice to meet you." Alternatively, the punctuation addition result may be the target punctuation corresponding to each word in the target text. It can be understood that embodiments of the present invention place no limitation on the specific form of the punctuation addition result.
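Picking, for each word, the candidate punctuation mark with the highest probability and splicing the marks back into the text can be sketched as follows; the candidate set and the probability table are invented toy values, not model output.

```python
CANDIDATES = ["", ",", ".", "?"]   # "" means no punctuation after the word

def add_punctuation(words, probs):
    """Step S3 sketch: for each word take the candidate punctuation mark with
    the highest probability as its target punctuation, then rebuild the text."""
    out = []
    for word, dist in zip(words, probs):
        best = max(range(len(CANDIDATES)), key=lambda i: dist[i])
        out.append(word + CANDIDATES[best])
    return " ".join(out)

words = ["hello", "I", "am", "Xiao-Ming", "nice", "to", "meet", "you"]
probs = [
    [0.10, 0.80, 0.05, 0.05],  # "hello" -> ","
    [0.90, 0.05, 0.03, 0.02],
    [0.90, 0.05, 0.03, 0.02],
    [0.10, 0.70, 0.10, 0.10],  # "Xiao-Ming" -> ","
    [0.90, 0.05, 0.03, 0.02],
    [0.90, 0.05, 0.03, 0.02],
    [0.90, 0.05, 0.03, 0.02],
    [0.05, 0.05, 0.85, 0.05],  # "you" -> "."
]
result = add_punctuation(words, probs)
assert result == "hello, I am Xiao-Ming, nice to meet you."
```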
The process of adding punctuation to the target text through a neural network model has been described in detail above through technical solutions 1 and 2. It can be understood that those skilled in the art may, according to practical application requirements, adopt either of technical solutions 1 and 2, or adopt other processes of adding punctuation to the target text through a neural network model; for example, the source side of the neural network model used may be the to-be-processed text, and the target side may be the text after punctuation processing, and so on. Embodiments of the present invention place no limitation on the specific process of adding punctuation to the target text through a neural network model.
In an optional embodiment of the present invention, the first punctuation addition result corresponding to the to-be-processed text obtained in step 202 may be edited according to the second punctuation addition result corresponding to the target text obtained in step 203. For example, the editing may replace the target text within the first punctuation addition result of the to-be-processed text with the second punctuation addition result corresponding to that target text, so as to obtain the final punctuation addition result corresponding to the to-be-processed text.
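The editing step, replacing the target text inside the first punctuation addition result with its second punctuation addition result, amounts to a substring substitution; the example strings below are hypothetical.

```python
def edit_result(first_result, target_text, second_result):
    """Editing sketch: replace the unpunctuated target text inside the first
    punctuation addition result with its second punctuation addition result,
    yielding the final punctuation addition result."""
    return first_result.replace(target_text, second_result, 1)

# Hypothetical texts: the long unpunctuated span is the "target text".
first = "Good morning. hello I am Xiao-Ming nice to meet you"
target = "hello I am Xiao-Ming nice to meet you"
second = "hello, I am Xiao-Ming, nice to meet you."
final = edit_result(first, target, second)
assert final == "Good morning. hello, I am Xiao-Ming, nice to meet you."
```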
In practical applications, the final punctuation addition result corresponding to the to-be-processed text may be output. Optionally, in a speech recognition scenario, the final punctuation addition result may be output to the user or to a client corresponding to the user; in a speech translation scenario, the final punctuation addition result may be output to a machine translation apparatus. It can be understood that those skilled in the art may determine, according to the actual application scenario, how the final punctuation addition result corresponding to the to-be-processed text is output; embodiments of the present invention place no limitation on the specific output manner.
In summary, in the punctuation addition method of the embodiments of the present invention, when the first punctuation addition result contains a target text whose word count exceeds a word-count threshold and which contains no preset punctuation, punctuation may be added to the target text through a neural network model, so as to obtain the second punctuation addition result corresponding to the target text. Because a neural network model can represent a word by a word vector and can thereby characterize the semantic distance between words through the distance between word vectors, the embodiments of the present invention allow the many contexts corresponding to a word to participate in the training of the neural network model, giving the model accurate punctuation-adding ability. Therefore, adding punctuation to the target text through the neural network model can, to some extent, solve the problem that a very long span of text in the first punctuation addition result receives no punctuation, and can thus improve the accuracy of punctuation addition.
It should be noted that, for brevity, the method embodiments are described as a series of action combinations; however, those skilled in the art should appreciate that the embodiments of the present invention are not limited by the order of the described actions, because according to the embodiments of the present invention some steps may be performed in another order or simultaneously. Furthermore, those skilled in the art should also appreciate that the embodiments described in this specification are preferred embodiments, and the actions involved are not necessarily required by the embodiments of the present invention.
Device embodiment
Referring to FIG. 4, a structural block diagram of an embodiment of a punctuation adding apparatus of the present invention is shown, which may specifically include:
a text acquisition module 401, configured to acquire a to-be-processed text;
a first punctuation adding module 402, configured to add punctuation to the to-be-processed text, so as to obtain a first punctuation addition result corresponding to the to-be-processed text; and
a second punctuation adding module 403, configured to, when the first punctuation addition result contains a target text whose word count exceeds a word-count threshold and which contains no preset punctuation, add punctuation to the target text through a neural network model, so as to obtain a second punctuation addition result corresponding to the target text.
Optionally, the first punctuation adding module 402 adds punctuation to the to-be-processed text through an N-gram language model, and the first punctuation adding module 402 may include:
a first word segmentation submodule, configured to segment the to-be-processed text into words, so as to obtain a first word sequence corresponding to the to-be-processed text;
a first adding submodule, configured to add punctuation between adjacent words in the first word sequence, so as to obtain global punctuation addition paths corresponding to the first word sequence;
a local information acquisition submodule, configured to obtain, in front-to-back order and in a sliding-window manner, local punctuation addition paths and their corresponding first semantic segments from the global punctuation addition paths; wherein different first semantic segments contain the same number of character units, adjacent first semantic segments share repeated character units, and a character unit may include a word and/or a punctuation mark;
a recursion submodule, configured to determine, in front-to-back order and in a recursive manner, the target punctuation corresponding to an optimal first semantic segment; wherein the optimal first semantic segment has the best language model score, the language model score of a first semantic segment being determined through the N-gram language model; and
a result acquisition submodule, configured to obtain the first punctuation addition result corresponding to the to-be-processed text according to the target punctuation corresponding to each optimal first semantic segment.
Optionally, the recursion submodule may include:
a first model score determining unit, configured to determine, using the N-gram language model, the language model score corresponding to a current first semantic segment;
a first selecting unit, configured to select an optimal current first semantic segment from multiple current first semantic segments according to their language model scores;
a target punctuation determining unit, configured to take the punctuation contained in the optimal current first semantic segment as the target punctuation corresponding to the optimal current first semantic segment; and
a semantic segment updating module, configured to obtain the next first semantic segment according to the target punctuation corresponding to the optimal current first semantic segment.
Optionally, the result acquisition submodule may include:
a target punctuation adding unit, configured to add punctuation to the first word sequence, in back-to-front or front-to-back order, according to the target punctuation corresponding to each optimal first semantic segment, so as to obtain the first punctuation addition result corresponding to the to-be-processed text.
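The sliding-window recursion performed by the first punctuation adding module might be sketched as below. The stub scoring function merely stands in for a real N-gram language model, and the one-mark-at-a-time window handling is a simplified assumption; the patent's segments may span several decisions at once.

```python
PUNCT = ["", ","]   # candidate marks between adjacent words ("" = no mark)

def lm_score(segment):
    """Stub language-model score standing in for an N-gram model:
    it simply rewards a comma immediately before the word 'but'."""
    return sum(1.0 for i in range(1, len(segment))
               if segment[i] == "but" and segment[i - 1] == ",")

def add_punct_recursively(words, window=3):
    """Sliding-window recursion sketch: before each next word, score every
    candidate mark together with the last few decided units (a local
    'semantic segment'), keep the best-scoring mark, then slide forward."""
    result = [words[0]]
    for w in words[1:]:
        best = max(PUNCT,
                   key=lambda p: lm_score(result[-(window - 1):] + [p, w]))
        if best:
            result.append(best)
        result.append(w)
    return result

out = add_punct_recursively(["it", "rains", "but", "we", "go"])
assert out == ["it", "rains", ",", "but", "we", "go"]
```

Because ties in `max` fall back to the first candidate, the empty mark is preferred whenever the stub model sees no benefit in adding punctuation.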
Optionally, the second punctuation adding module 403 may include:
a second word segmentation submodule, configured to segment the target text into words, so as to obtain a corresponding second word sequence;
a candidate result acquisition submodule, configured to obtain multiple candidate punctuation addition results corresponding to the second word sequence;
a second model score determining unit, configured to determine, using a neural network language model, the language model score corresponding to each candidate punctuation addition result; and
a second selecting unit, configured to select, from the multiple candidate punctuation addition results corresponding to the second word sequence, the candidate punctuation addition result with the best language model score as the second punctuation addition result corresponding to the target text.
Optionally, the second punctuation adding module 403 may include:
a model processing submodule, configured to add punctuation to the target text through a neural network translation model, so as to obtain the second punctuation addition result corresponding to the target text; wherein the neural network translation model is trained on a parallel corpus, and the parallel corpus may include a source-side corpus and a target-side corpus, the target-side corpus being the punctuation corresponding to each word in the source-side corpus.
Optionally, the model processing submodule may include:
an encoding unit, configured to encode the target text, so as to obtain the source-side hidden states corresponding to the target text;
a decoding unit, configured to decode, according to the model parameters of the neural network translation model, the source-side hidden states corresponding to the target text, so as to obtain the probability that each word in the target text belongs to each candidate punctuation mark; and
a result determining unit, configured to obtain the second punctuation addition result corresponding to the target text according to the probability that each word in the target text belongs to each candidate punctuation mark.
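A toy encode/decode pipeline in the spirit of the model processing submodule could look like this; the averaging "encoder", the scalar hidden states, and the projection weights are all invented stand-ins for the real neural network translation model.

```python
import math

def softmax(xs):
    """Normalize real scores into probabilities that sum to 1."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

CANDIDATES = ["", ",", "."]   # hypothetical candidate punctuation marks

def encode(words, embed):
    """Encoding unit sketch: a toy recurrence that averages the embeddings
    seen so far, yielding one source-side hidden state per word."""
    states, acc = [], 0.0
    for i, w in enumerate(words, 1):
        acc += embed.get(w, 0.0)
        states.append(acc / i)
    return states

def decode(states, weights):
    """Decoding unit sketch: project each hidden state with the model
    parameters and softmax into a distribution over candidate punctuation."""
    return [softmax([w * s for w in weights]) for s in states]

words = ["hello", "world"]
embed = {"hello": 1.0, "world": -1.0}   # hypothetical scalar embeddings
dists = decode(encode(words, embed), weights=[0.0, 1.0, -1.0])
assert all(abs(sum(d) - 1.0) < 1e-9 for d in dists)
assert all(len(d) == len(CANDIDATES) for d in dists)
```

The result determining unit would then take the argmax of each distribution, as in the step S3 description of the method embodiment.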
Since the apparatus embodiments are substantially similar to the method embodiments, they are described relatively simply; for relevant details, reference may be made to the corresponding parts of the method embodiments.
The embodiments in this specification are described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and the identical or similar parts of the embodiments may be referred to one another.
As for the apparatus in the above embodiments, the specific manner in which each module performs its operations has been described in detail in the embodiments of the related method, and will not be elaborated here.
An embodiment of the present invention further provides a punctuation adding apparatus, including a memory and one or more programs, wherein the one or more programs are stored in the memory and are configured to be executed by one or more processors, the one or more programs containing instructions for performing the following operations: acquiring a to-be-processed text; adding punctuation to the to-be-processed text, so as to obtain a first punctuation addition result corresponding to the to-be-processed text; and if the first punctuation addition result contains a target text whose word count exceeds a word-count threshold and which contains no preset punctuation, adding punctuation to the target text through a neural network model, so as to obtain a second punctuation addition result corresponding to the target text.
FIG. 5 is a block diagram of a device for punctuation addition implemented as a terminal, according to an exemplary embodiment. For example, the terminal 900 may be a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, a fitness device, a personal digital assistant, or the like.
Referring to FIG. 5, the terminal 900 may include one or more of the following components: a processing component 902, a memory 904, a power component 906, a multimedia component 908, an audio component 910, an input/output (I/O) interface 912, a sensor component 914, and a communication component 916.
The processing component 902 typically controls the overall operation of the terminal 900, such as operations associated with display, telephone calls, data communication, camera operation, and recording operation. The processing component 902 may include one or more processors 920 to execute instructions so as to complete all or part of the steps of the above method. In addition, the processing component 902 may include one or more modules to facilitate interaction between the processing component 902 and other components; for example, the processing component 902 may include a multimedia module to facilitate interaction between the multimedia component 908 and the processing component 902.
The memory 904 is configured to store various types of data to support the operation of the terminal 900. Examples of such data include instructions for any application or method operating on the terminal 900, contact data, phonebook data, messages, pictures, videos, and the like. The memory 904 may be implemented by any type of volatile or non-volatile storage device, or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, a magnetic disk, or an optical disk.
The power component 906 supplies power to the various components of the terminal 900. The power component 906 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the terminal 900.
The multimedia component 908 includes a screen that provides an output interface between the terminal 900 and the user. In some embodiments, the screen may include a liquid crystal display (LCD) and a touch panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive input signals from the user. The touch panel includes one or more touch sensors to sense touches, swipes, and gestures on the touch panel. The touch sensors may not only sense the boundary of a touch or swipe action, but also detect the duration and pressure associated with the touch or swipe. In some embodiments, the multimedia component 908 includes a front camera and/or a rear camera. When the terminal 900 is in an operating mode, such as a shooting mode or a video mode, the front camera and/or the rear camera may receive external multimedia data. Each front or rear camera may be a fixed optical lens system or may have focusing and optical zoom capability.
The audio component 910 is configured to output and/or input audio signals. For example, the audio component 910 includes a microphone (MIC); when the terminal 900 is in an operating mode, such as a call mode, a recording mode, or a speech recognition mode, the microphone is configured to receive external audio signals. The received audio signals may be further stored in the memory 904 or sent via the communication component 916. In some embodiments, the audio component 910 also includes a speaker for outputting audio signals.
The I/O interface 912 provides an interface between the processing component 902 and peripheral interface modules, which may be a keyboard, a click wheel, buttons, or the like. These buttons may include, but are not limited to, a home button, a volume button, a start button, and a lock button.
The sensor component 914 includes one or more sensors for providing status assessments of various aspects of the terminal 900. For example, the sensor component 914 may detect the open/closed state of the terminal 900 and the relative positioning of components, for example of the display and keypad of the terminal 900; the sensor component 914 may also detect a change in position of the terminal 900 or of a component of the terminal 900, the presence or absence of contact between the user and the terminal 900, the orientation or acceleration/deceleration of the terminal 900, and a change in the temperature of the terminal 900. The sensor component 914 may include a proximity sensor configured to detect the presence of nearby objects without any physical contact. The sensor component 914 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor component 914 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 916 is configured to facilitate wired or wireless communication between the terminal 900 and other devices. The terminal 900 may access a wireless network based on a communication standard, such as WiFi, 2G, or 3G, or a combination thereof. In an exemplary embodiment, the communication component 916 receives broadcast signals or broadcast-related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 916 also includes a near field communication (NFC) module to facilitate short-range communication. For example, the NFC module may be implemented based on radio frequency identification (RFID) technology, Infrared Data Association (IrDA) technology, ultra-wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the terminal 900 may be implemented by one or more application-specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field-programmable gate arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic components, for performing the above method.
In an exemplary embodiment, a non-transitory computer-readable storage medium including instructions is also provided, for example the memory 904 including instructions, the instructions being executable by the processor 920 of the terminal 900 to complete the above method. For example, the non-transitory computer-readable storage medium may be a ROM, a random access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, or the like.
FIG. 6 is a block diagram of a device for punctuation addition implemented as a server, according to an exemplary embodiment. The server 1900 may vary greatly depending on configuration or performance, and may include one or more central processing units (CPUs) 1922 (for example, one or more processors), a memory 1932, and one or more storage media 1930 (for example, one or more mass storage devices) storing application programs 1942 or data 1944. The memory 1932 and the storage medium 1930 may provide transient or persistent storage. A program stored in the storage medium 1930 may include one or more modules (not shown), each of which may include a series of instruction operations on the server. Further, the central processing unit 1922 may be configured to communicate with the storage medium 1930 and to execute, on the server 1900, the series of instruction operations in the storage medium 1930.
The server 1900 may also include one or more power supplies 1926, one or more wired or wireless network interfaces 1950, one or more input/output interfaces 1958, one or more keyboards 1956, and/or one or more operating systems 1941, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, and the like.
In an exemplary embodiment, a non-transitory computer-readable storage medium including instructions is also provided, for example the memory 1932 including instructions, the instructions being executable by the processor of the server 1900 to complete the above method. For example, the non-transitory computer-readable storage medium may be a ROM, a random access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, or the like.
A non-transitory computer-readable storage medium is provided, wherein when the instructions in the storage medium are executed by the processor of a device (a terminal or a server), the device is enabled to perform a punctuation adding method, the method including: acquiring a to-be-processed text; adding punctuation to the to-be-processed text, so as to obtain a first punctuation addition result corresponding to the to-be-processed text; and if the first punctuation addition result contains a target text whose word count exceeds a word-count threshold and which contains no preset punctuation, adding punctuation to the target text through a neural network model, so as to obtain a second punctuation addition result corresponding to the target text.
Optionally, the adding punctuation to the target text through a neural network model includes: segmenting the target text into words, so as to obtain a corresponding second word sequence; obtaining multiple candidate punctuation addition results corresponding to the second word sequence; determining, using a neural network language model, the language model score corresponding to each candidate punctuation addition result; and selecting, from the multiple candidate punctuation addition results corresponding to the second word sequence, the candidate punctuation addition result with the best language model score as the second punctuation addition result corresponding to the target text.
Optionally, the adding punctuation to the target text through a neural network model includes: adding punctuation to the target text through a neural network translation model, so as to obtain the second punctuation addition result corresponding to the target text; wherein the neural network translation model is trained on a parallel corpus, the parallel corpus including a source-side corpus and a target-side corpus, the target-side corpus being the punctuation corresponding to each word in the source-side corpus.
Optionally, the adding punctuation to the target text through a neural network translation model includes: encoding the target text, so as to obtain the source-side hidden states corresponding to the target text; decoding, according to the model parameters of the neural network translation model, the source-side hidden states corresponding to the target text, so as to obtain the probability that each word in the target text belongs to each candidate punctuation mark; and obtaining the second punctuation addition result corresponding to the target text according to the probability that each word in the target text belongs to each candidate punctuation mark.
Optionally, the adding punctuation to the to-be-processed text includes: adding punctuation to the to-be-processed text through an N-gram language model.
Optionally, the adding punctuation to the to-be-processed text through an N-gram language model includes: segmenting the to-be-processed text into words, so as to obtain a first word sequence corresponding to the to-be-processed text; adding punctuation between adjacent words in the first word sequence, so as to obtain global punctuation addition paths corresponding to the first word sequence; obtaining, in front-to-back order and in a sliding-window manner, local punctuation addition paths and their corresponding first semantic segments from the global punctuation addition paths, wherein different first semantic segments contain the same number of character units, adjacent first semantic segments share repeated character units, and a character unit includes a word and/or a punctuation mark; determining, in front-to-back order and in a recursive manner, the target punctuation corresponding to an optimal first semantic segment, wherein the optimal first semantic segment has the best language model score, the language model score of a first semantic segment being determined through the N-gram language model; and obtaining the first punctuation addition result corresponding to the to-be-processed text according to the target punctuation corresponding to each optimal first semantic segment.
Optionally, the determining, in front-to-back order and in a recursive manner, the target punctuation corresponding to an optimal first semantic segment includes: determining, using the N-gram language model, the language model score corresponding to a current first semantic segment; selecting an optimal current first semantic segment from multiple current first semantic segments according to their language model scores; taking the punctuation contained in the optimal current first semantic segment as the target punctuation corresponding to the optimal current first semantic segment; and obtaining the next first semantic segment according to the target punctuation corresponding to the optimal current first semantic segment.
Other embodiments of the present invention will readily occur to those skilled in the art after considering the specification and practicing the invention disclosed herein. The present application is intended to cover any variations, uses, or adaptations of the present invention that follow its general principles and include common knowledge or conventional techniques in the art not disclosed in this disclosure. The specification and embodiments are to be regarded as exemplary only, with the true scope and spirit of the invention being indicated by the following claims.
It should be understood that the present invention is not limited to the precise structures described above and shown in the drawings, and that various modifications and changes may be made without departing from its scope. The scope of the present invention is limited only by the appended claims.
The above are merely preferred embodiments of the present invention and are not intended to limit the present invention; any modification, equivalent substitution, improvement, or the like made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.
The punctuation adding method, the punctuation adding apparatus, and the device for punctuation addition provided by the present invention have been described in detail above. Specific examples have been used herein to illustrate the principles and embodiments of the present invention, and the descriptions of the above embodiments are intended only to help in understanding the method of the present invention and its core idea. Meanwhile, for those of ordinary skill in the art, the specific embodiments and the scope of application may change according to the idea of the present invention. In summary, the content of this specification should not be construed as limiting the present invention.

Claims (10)

1. A punctuation adding method, comprising:
acquiring a to-be-processed text;
adding punctuation to the to-be-processed text, so as to obtain a first punctuation addition result corresponding to the to-be-processed text; and
if the first punctuation addition result contains a target text whose word count exceeds a word-count threshold and which contains no preset punctuation, adding punctuation to the target text through a neural network model, so as to obtain a second punctuation addition result corresponding to the target text.
2. The method according to claim 1, wherein the adding punctuation to the target text through a neural network model comprises:
segmenting the target text into words, so as to obtain a corresponding second word sequence;
obtaining multiple candidate punctuation addition results corresponding to the second word sequence;
determining, using a neural network language model, the language model score corresponding to each candidate punctuation addition result; and
selecting, from the multiple candidate punctuation addition results corresponding to the second word sequence, the candidate punctuation addition result with the best language model score as the second punctuation addition result corresponding to the target text.
3. The method according to claim 1, wherein the adding punctuation to the target text through a neural network model comprises:
adding punctuation to the target text through a neural network translation model, so as to obtain the second punctuation addition result corresponding to the target text; wherein the neural network translation model is trained on a parallel corpus, the parallel corpus comprising a source-side corpus and a target-side corpus, the target-side corpus being the punctuation corresponding to each word in the source-side corpus.
4. The method according to claim 3, wherein the adding punctuation to the target text through a neural network translation model comprises:
encoding the target text, so as to obtain source-side hidden states corresponding to the target text;
decoding, according to model parameters of the neural network translation model, the source-side hidden states corresponding to the target text, so as to obtain the probability that each word in the target text belongs to each candidate punctuation mark; and
obtaining the second punctuation addition result corresponding to the target text according to the probability that each word in the target text belongs to each candidate punctuation mark.
5. The method according to any one of claims 1 to 4, wherein the adding punctuation to the to-be-processed text comprises: adding punctuation to the to-be-processed text through an N-gram language model.
6. The method according to claim 5, wherein adding punctuation to the text to be processed through the N-gram language model comprises:
performing word segmentation on the text to be processed to obtain a first word sequence corresponding to the text to be processed;
adding punctuation between adjacent words in the first word sequence to obtain a global punctuation addition path corresponding to the first word sequence;
obtaining, in front-to-back order and in a sliding-window manner, local punctuation addition paths and their corresponding first semantic segments from the global punctuation addition path; wherein different first semantic segments contain the same number of character units, adjacent first semantic segments share repeated character units, and a character unit comprises a word and/or a punctuation mark;
determining, in front-to-back order and in a recursive manner, the target punctuation corresponding to an optimal first semantic segment; the optimal first semantic segment being the first semantic segment with the best language model score, the language model score of a first semantic segment being determined through the N-gram language model;
obtaining the first punctuation addition result corresponding to the text to be processed according to the target punctuation corresponding to each optimal first semantic segment.
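The sliding-window extraction of first semantic segments (fixed segment size, with repeated character units shared between adjacent segments) can be sketched as follows; the window size and step are illustrative values, not taken from the patent:

```python
def local_segments(units, size=4, step=2):
    """Slide a fixed-size window over the global punctuation addition
    path; consecutive windows share size - step character units."""
    return [units[start:start + size]
            for start in range(0, max(1, len(units) - size + 1), step)]

units = ["w1", "w2", "w3", "w4", "w5", "w6"]   # words and/or punctuation
print(local_segments(units))
# → [['w1', 'w2', 'w3', 'w4'], ['w3', 'w4', 'w5', 'w6']]
```

The overlap is what lets the recursion of claim 7 carry the punctuation chosen in one segment into the next.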
7. The method according to claim 6, wherein determining, in front-to-back order and in a recursive manner, the target punctuation corresponding to an optimal first semantic segment comprises:
determining a language model score for each current first semantic segment using the N-gram language model;
selecting an optimal current first semantic segment from the multiple candidate current first semantic segments according to their language model scores;
taking the punctuation contained in the optimal current first semantic segment as the target punctuation corresponding to the optimal current first semantic segment;
obtaining the next first semantic segment according to the target punctuation corresponding to the optimal current first semantic segment.
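The recursive selection above can be illustrated as a greedy front-to-back search over candidate punctuation, scored by a toy bigram language model. The bigram table, the floor probability for unseen bigrams, and the candidate set are all invented for the example:

```python
import math

# Toy bigram "language model": probability-like scores, invented values.
BIGRAM = {("today", ","): 0.3, (",", "we"): 0.4, ("today", "we"): 0.01,
          ("we", "win"): 0.5, ("win", "."): 0.6}
FLOOR = 1e-3  # score assigned to unseen bigrams

def lm_score(segment):
    """Sum of log bigram scores over a candidate semantic segment."""
    return sum(math.log(BIGRAM.get(pair, FLOOR))
               for pair in zip(segment, segment[1:]))

def best_punct(words, candidates=("", ",", ".")):
    """Greedy front-to-back search: after each word, keep the candidate
    punctuation whose segment scores best under the language model."""
    out = []
    for i, w in enumerate(words[:-1]):
        scored = []
        for p in candidates:
            seg = out + [w] + ([p] if p else []) + [words[i + 1]]
            scored.append((lm_score(seg), p))
        best = max(scored)[1]
        out += [w] + ([best] if best else [])
    return out + [words[-1]]

print(best_punct(["today", "we", "win"]))
# → ['today', ',', 'we', 'win']
```

Each chosen punctuation mark is fixed before the search moves to the next segment, which is the "recursive" behavior the claim describes.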
8. A punctuation adding apparatus, comprising:
a text acquisition module, configured to obtain text to be processed;
a first punctuation adding module, configured to add punctuation to the text to be processed to obtain a first punctuation addition result corresponding to the text to be processed; and
a second punctuation adding module, configured to, when the first punctuation addition result contains a target text whose word count exceeds a word-count threshold and which contains no preset punctuation, add punctuation to the target text through a neural network model to obtain a second punctuation addition result corresponding to the target text.
9. An apparatus for punctuation addition, comprising a memory and one or more programs, wherein the one or more programs are stored in the memory and are configured to be executed by one or more processors, the one or more programs containing instructions for performing the following operations:
obtaining text to be processed;
adding punctuation to the text to be processed to obtain a first punctuation addition result corresponding to the text to be processed;
if the first punctuation addition result contains a target text whose word count exceeds a word-count threshold and which contains no preset punctuation, adding punctuation to the target text through a neural network model to obtain a second punctuation addition result corresponding to the target text.
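The routing logic above (run a first punctuation pass, then hand any over-length, punctuation-free target text to a neural model) can be sketched with stand-in functions for the two models; the threshold, punctuation set, and both model stubs are illustrative, not the patent's implementations:

```python
WORD_THRESHOLD = 5                     # illustrative word-count threshold
PRESET_PUNCT = {",", ".", "?", "!"}    # illustrative preset punctuation

def first_pass(text):
    """Stand-in for the first punctuation pass (e.g. the N-gram method)."""
    return text  # pretend it found nothing to add

def neural_pass(text):
    """Stand-in for the neural-network model; here it just inserts a
    comma at the midpoint to show where the real model would act."""
    words = text.split()
    mid = len(words) // 2
    return " ".join(words[:mid] + [","] + words[mid:])

def add_punctuation(text):
    result = first_pass(text)
    words = result.split()
    # Route to the neural model only when the segment is long and
    # contains none of the preset punctuation marks.
    if len(words) > WORD_THRESHOLD and not any(w in PRESET_PUNCT for w in words):
        result = neural_pass(result)
    return result

print(add_punctuation("this long segment has no punctuation at all"))
# → this long segment has , no punctuation at all
```

Short or already-punctuated input passes through unchanged, so the more expensive neural model runs only on the failure cases of the first pass.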
10. A machine-readable medium having instructions stored thereon which, when executed by one or more processors, cause an apparatus to perform the punctuation adding method according to one or more of claims 1 to 7.
CN201710396130.5A 2017-05-26 2017-05-26 Punctuation adding method and device and punctuation adding device Active CN107291690B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710396130.5A CN107291690B (en) 2017-05-26 2017-05-26 Punctuation adding method and device and punctuation adding device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710396130.5A CN107291690B (en) 2017-05-26 2017-05-26 Punctuation adding method and device and punctuation adding device

Publications (2)

Publication Number Publication Date
CN107291690A true CN107291690A (en) 2017-10-24
CN107291690B CN107291690B (en) 2020-10-27

Family

ID=60094233

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710396130.5A Active CN107291690B (en) 2017-05-26 2017-05-26 Punctuation adding method and device and punctuation adding device

Country Status (1)

Country Link
CN (1) CN107291690B (en)

Cited By (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107766325A * 2017-09-27 2018-03-06 百度在线网络技术(北京)有限公司 Text splicing method and device
CN108564953A * 2018-04-20 2018-09-21 科大讯飞股份有限公司 Punctuation processing method and device for speech recognition text
CN108597517A * 2018-03-08 2018-09-28 深圳市声扬科技有限公司 Punctuation mark adding method and device, computer device and storage medium
CN109255115A * 2018-10-19 2019-01-22 科大讯飞股份有限公司 Text punctuation adjustment method and device
CN109410949A * 2018-10-11 2019-03-01 厦门大学 Punctuation addition method for text content based on a weighted finite-state transducer
CN109614627A * 2019-01-04 2019-04-12 平安科技(深圳)有限公司 Text punctuation prediction method and device, computer device and storage medium
CN109817210A * 2019-02-12 2019-05-28 百度在线网络技术(北京)有限公司 Voice writing method, device, terminal and storage medium
CN109918666A * 2019-03-06 2019-06-21 北京工商大学 Neural-network-based Chinese punctuation mark adding method
CN109979435A * 2017-12-28 2019-07-05 北京搜狗科技发展有限公司 Data processing method and device, and device for data processing
CN110445922A * 2019-07-30 2019-11-12 惠州Tcl移动通信有限公司 Mobile terminal contact sharing method, device and storage medium
CN111261162A (en) * 2020-03-09 2020-06-09 北京达佳互联信息技术有限公司 Speech recognition method, speech recognition apparatus, and storage medium
CN111581911A (en) * 2020-04-23 2020-08-25 北京中科智加科技有限公司 Method for automatically adding punctuation to real-time text, model construction method and device
CN111785259A (en) * 2019-04-04 2020-10-16 北京猎户星空科技有限公司 Information processing method and device and electronic equipment
CN111797632A (en) * 2019-04-04 2020-10-20 北京猎户星空科技有限公司 Information processing method and device and electronic equipment
CN112036174A (en) * 2019-05-15 2020-12-04 南京大学 Punctuation marking method and device
CN113378541A (en) * 2021-05-21 2021-09-10 标贝(北京)科技有限公司 Text punctuation prediction method, device, system and storage medium

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101201818A * 2006-12-13 2008-06-18 李萍 Method for computing language structure, performing word segmentation, machine translation and speech recognition using an HMM
CN101593518A * 2008-05-28 2009-12-02 中国科学院自动化研究所 Balancing method for real-scene corpora and finite-state-network corpora
CN103544406A * 2013-11-08 2014-01-29 电子科技大学 Method for detecting DNA sequence similarity using a one-dimensional cellular neural network
CN104022978A * 2014-06-18 2014-09-03 中国联合网络通信集团有限公司 Semi-blind channel estimation method and system
CN104391963A * 2014-12-01 2015-03-04 北京中科创益科技有限公司 Method for constructing correlation networks of keywords of natural language texts
WO2014140541A3 * 2013-03-15 2015-03-19 Google Inc. Signal processing systems
CN104765769A * 2015-03-06 2015-07-08 大连理工大学 Short text query expansion and indexing method based on word vectors
CN105512692A * 2015-11-30 2016-04-20 华南理工大学 BLSTM-based online handwritten mathematical expression symbol recognition method
CN106257441A * 2016-06-30 2016-12-28 电子科技大学 Training method for a word-frequency-based skip language model
US20170032280A1 (en) * 2015-07-27 2017-02-02 Salesforce.Com, Inc. Engagement estimator


Cited By (28)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107766325B (en) * 2017-09-27 2021-05-28 百度在线网络技术(北京)有限公司 Text splicing method and device
CN107766325A * 2017-09-27 2018-03-06 百度在线网络技术(北京)有限公司 Text splicing method and device
CN109979435B (en) * 2017-12-28 2021-10-22 北京搜狗科技发展有限公司 Data processing method and device for data processing
CN109979435A * 2017-12-28 2019-07-05 北京搜狗科技发展有限公司 Data processing method and device, and device for data processing
CN108597517A (en) * 2018-03-08 2018-09-28 深圳市声扬科技有限公司 Punctuation mark adding method, device, computer equipment and storage medium
CN108597517B (en) * 2018-03-08 2020-06-05 深圳市声扬科技有限公司 Punctuation mark adding method and device, computer equipment and storage medium
CN108564953A * 2018-04-20 2018-09-21 科大讯飞股份有限公司 Punctuation processing method and device for speech recognition text
CN109410949A * 2018-10-11 2019-03-01 厦门大学 Punctuation addition method for text content based on a weighted finite-state transducer
CN109410949B (en) * 2018-10-11 2021-11-16 厦门大学 Text content punctuation adding method based on weighted finite state converter
CN109255115B (en) * 2018-10-19 2023-04-07 科大讯飞股份有限公司 Text punctuation adjustment method and device
CN109255115A * 2018-10-19 2019-01-22 科大讯飞股份有限公司 Text punctuation adjustment method and device
CN109614627B (en) * 2019-01-04 2023-01-20 平安科技(深圳)有限公司 Text punctuation prediction method and device, computer equipment and storage medium
CN109614627A * 2019-01-04 2019-04-12 平安科技(深圳)有限公司 Text punctuation prediction method and device, computer device and storage medium
CN109817210A (en) * 2019-02-12 2019-05-28 百度在线网络技术(北京)有限公司 Voice writing method, device, terminal and storage medium
CN109918666A * 2019-03-06 2019-06-21 北京工商大学 Neural-network-based Chinese punctuation mark adding method
CN109918666B (en) * 2019-03-06 2024-03-15 北京工商大学 Chinese punctuation mark adding method based on neural network
CN111797632A (en) * 2019-04-04 2020-10-20 北京猎户星空科技有限公司 Information processing method and device and electronic equipment
CN111785259A (en) * 2019-04-04 2020-10-16 北京猎户星空科技有限公司 Information processing method and device and electronic equipment
CN111797632B (en) * 2019-04-04 2023-10-27 北京猎户星空科技有限公司 Information processing method and device and electronic equipment
CN112036174B (en) * 2019-05-15 2023-11-07 南京大学 Punctuation marking method and device
CN112036174A (en) * 2019-05-15 2020-12-04 南京大学 Punctuation marking method and device
CN110445922A * 2019-07-30 2019-11-12 惠州Tcl移动通信有限公司 Mobile terminal contact sharing method, device and storage medium
CN111261162A (en) * 2020-03-09 2020-06-09 北京达佳互联信息技术有限公司 Speech recognition method, speech recognition apparatus, and storage medium
CN111261162B (en) * 2020-03-09 2023-04-18 北京达佳互联信息技术有限公司 Speech recognition method, speech recognition apparatus, and storage medium
CN111581911B (en) * 2020-04-23 2022-02-15 北京中科智加科技有限公司 Method for automatically adding punctuation to real-time text, model construction method and device
CN111581911A (en) * 2020-04-23 2020-08-25 北京中科智加科技有限公司 Method for automatically adding punctuation to real-time text, model construction method and device
CN113378541B (en) * 2021-05-21 2023-07-07 标贝(北京)科技有限公司 Text punctuation prediction method, device, system and storage medium
CN113378541A (en) * 2021-05-21 2021-09-10 标贝(北京)科技有限公司 Text punctuation prediction method, device, system and storage medium

Also Published As

Publication number Publication date
CN107291690B (en) 2020-10-27

Similar Documents

Publication Publication Date Title
CN107291690A (en) Punctuation adding method and device, and device for punctuation addition
CN110288077B (en) Method and related device for synthesizing speaking expression based on artificial intelligence
CN107221330A (en) Punctuation adding method and device, and device for punctuation addition
WO2021077529A1 (en) Neural network model compression method, corpus translation method, and devices thereof
CN107578771B (en) Voice recognition method and device, storage medium and electronic equipment
CN107632980A (en) Speech translation method and device, and device for speech translation
CN111261144B (en) Voice recognition method, device, terminal and storage medium
CN107301865A (en) Method and device for determining interactive text in speech input
CN107291704A (en) Processing method and device, and device for processing
CN111583944A (en) Voice changing method and device
CN111508511A (en) Real-time voice changing method and device
CN108628813A (en) Processing method and device, and device for processing
CN107274903A (en) Text processing method and device, and device for text processing
CN109145213A (en) Query recommendation method and device based on historical information
CN107564526A (en) Processing method and device, and machine-readable medium
CN108399914A (en) Speech recognition method and device
CN110210310A (en) Video processing method and device, and device for video processing
CN105531758A (en) Speech recognition using foreign word grammar
CN107316635B (en) Voice recognition method and device, storage medium and electronic equipment
CN108073573A (en) Machine translation method and device, and machine translation system training method and device
CN108345612A (en) Question processing method and device, and device for question processing
CN108008832A (en) Input method and device, and device for input
CN108628819A (en) Processing method and device, and device for processing
CN108073572A (en) Information processing method and device, and simultaneous interpretation system
CN109144285A (en) Input method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant