CN105609107A - Text processing method and device based on voice identification - Google Patents


Info

Publication number
CN105609107A
CN105609107A (application CN201510982716.0A)
Authority
CN
China
Prior art keywords
word
text
lexeme
probability
mark
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201510982716.0A
Other languages
Chinese (zh)
Inventor
曹松军
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Qihoo Technology Co Ltd
Qizhi Software Beijing Co Ltd
Original Assignee
Beijing Qihoo Technology Co Ltd
Qizhi Software Beijing Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Qihoo Technology Co Ltd, Qizhi Software Beijing Co Ltd filed Critical Beijing Qihoo Technology Co Ltd
Priority to CN201510982716.0A priority Critical patent/CN105609107A/en
Publication of CN105609107A publication Critical patent/CN105609107A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/26 Speech to text systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/903 Querying
    • G06F16/9032 Query formulation
    • G06F16/90332 Natural language query formulation or dialogue systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/903 Querying
    • G06F16/90335 Query processing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/211 Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/221 Parsing markup language streams
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities

Abstract

An embodiment of the invention provides a text processing method and device based on speech recognition. The method comprises: obtaining a first text produced by performing speech recognition on speech data; segmenting the first text to obtain one or more text fragments; and adding punctuation marks to the one or more text fragments, which are combined into a second text. The embodiment of the invention adds punctuation automatically, spares the user from manually positioning the cursor and inserting punctuation, and greatly improves the convenience of speech input.

Description

Text processing method and device based on speech recognition
Technical field
The present invention relates to the field of speech recognition technology, and in particular to a text processing method based on speech recognition and a text processing apparatus based on speech recognition.
Background technology
At present, the rapid development of the mobile Internet has driven the widespread adoption of mobile devices such as mobile phones and tablet computers, and speech input, as one of the most convenient natural modes of human-computer interaction on a mobile device, is gradually being accepted by users.
Speech recognition today generally means large-vocabulary, speaker-independent continuous speech recognition, whose purpose is to convert the input speech into text output; the recognition result is generally output as a continuous run of text.
In application scenarios such as input methods, the user has to position the cursor in the recognized text manually and insert punctuation marks. Especially for long utterances and on mobile devices, where the screen is small and the display area smaller still, both positioning the break points and inserting the punctuation marks are very cumbersome operations, which makes speech input tedious to use.
Summary of the invention
In view of the above problems, the present invention is proposed in order to provide a text processing method based on speech recognition, and a corresponding text processing apparatus based on speech recognition, that overcome the above problems or at least partially solve them.
According to one aspect of the present invention, a text processing method based on speech recognition is provided, comprising:
obtaining a first text produced by performing speech recognition on speech data;
segmenting the first text to obtain one or more text fragments;
adding punctuation marks to the one or more text fragments, which are combined into a second text.
Optionally, the step of segmenting the first text to obtain one or more text fragments comprises:
performing character segmentation on the first text to obtain one or more characters;
identifying the word positions of the one or more characters;
segmenting the first text at specified word positions to obtain one or more text fragments.
Optionally, the step of identifying the word positions of the one or more characters comprises:
in forward order over the characters, calculating the probabilities of the word positions of each following character from the probabilities of the word positions of the preceding character;
in reverse order over the characters, labeling the word position of each preceding character according to the word position labeled, on the basis of the probabilities, for the following character.
Optionally, the step of calculating, in forward order over the characters, the probabilities of the word positions of the following character from the probabilities of the word positions of the preceding character comprises:
calculating, by a preset sequence labeling model, the probability of each word position of the 1st character;
calculating, by the preset sequence labeling model and based on the probabilities of the word positions of the (i-1)-th character, the probability of each word position of the i-th character, i being a positive integer greater than 1;
for each word position of the i-th character, taking the highest-valued probability as the probability of that word position.
Optionally, the sequence labeling model is a conditional random field model, generated by training on a training text and on the word positions annotated in the training text, the punctuation marks of the training text having been replaced.
Optionally, the step of labeling, in reverse order over the characters, the word position of each preceding character according to the word position labeled, on the basis of the probabilities, for the following character comprises:
for the last character, labeling the word position to which the highest-valued probability belongs;
when the word position of the i-th character has been determined, querying for the probability of the (i-1)-th character from which the probability of the word position of the i-th character was calculated, i being a positive integer greater than 1;
for the (i-1)-th character, labeling the word position to which that probability of the (i-1)-th character belongs.
Optionally, the word positions comprise one or more of word-begin, word-end, word-middle, and single-character word;
and the step of segmenting the first text at the specified word positions comprises:
segmenting before a word-begin character and/or a single-character word of the first text;
and/or,
segmenting after a word-end character and/or a single-character word of the first text.
Optionally, the step of adding punctuation marks to the one or more text fragments comprises:
for each text fragment, identifying a keyword;
looking up the punctuation mark corresponding to the keyword;
adding the punctuation mark after the text fragment.
According to a further aspect of the invention, a text processing apparatus based on speech recognition is provided, comprising:
a first text obtaining module, adapted to obtain a first text produced by performing speech recognition on speech data;
a segmentation module, adapted to segment the first text to obtain one or more text fragments;
a punctuation adding module, adapted to add punctuation marks to the one or more text fragments, which are combined into a second text.
Optionally, the segmentation module is further adapted to:
perform character segmentation on the first text to obtain one or more characters;
identify the word positions of the one or more characters;
segment the first text at specified word positions to obtain one or more text fragments.
Optionally, the segmentation module is further adapted to:
in forward order over the characters, calculate the probabilities of the word positions of each following character from the probabilities of the word positions of the preceding character;
in reverse order over the characters, label the word position of each preceding character according to the word position labeled, on the basis of the probabilities, for the following character.
Optionally, the segmentation module is further adapted to:
calculate, by a preset sequence labeling model, the probability of each word position of the 1st character;
calculate, by the preset sequence labeling model and based on the probabilities of the word positions of the (i-1)-th character, the probability of each word position of the i-th character, i being a positive integer greater than 1;
for each word position of the i-th character, take the highest-valued probability as the probability of that word position.
Optionally, the sequence labeling model is a conditional random field model, generated by training on a training text and on the word positions annotated in the training text, the punctuation marks of the training text having been replaced.
Optionally, the segmentation module is further adapted to:
for the last character, label the word position to which the highest-valued probability belongs;
when the word position of the i-th character has been determined, query for the probability of the (i-1)-th character from which the probability of the word position of the i-th character was calculated, i being a positive integer greater than 1;
for the (i-1)-th character, label the word position to which that probability of the (i-1)-th character belongs.
Optionally, the word positions comprise one or more of word-begin, word-end, word-middle, and single-character word;
and the segmentation module is further adapted to:
segment before a word-begin character and/or a single-character word of the first text;
and/or,
segment after a word-end character and/or a single-character word of the first text.
Optionally, the punctuation adding module is further adapted to:
for each text fragment, identify a keyword;
look up the punctuation mark corresponding to the keyword;
add the punctuation mark after the text fragment.
In embodiments of the present invention, the result of performing speech recognition on speech data, i.e. the first text, is segmented; punctuation marks are added to the segmented text fragments, which are combined into a second text. Punctuation marks are thus added automatically, the user is spared from manually positioning the cursor and inserting punctuation, and the convenience of speech input is greatly improved.
The above description is only an overview of the technical solution of the present invention. In order that the technical means of the present invention may be understood more clearly, that it may be implemented according to the content of the specification, and that the above and other objects, features, and advantages of the present invention may become more apparent, specific embodiments of the present invention are set forth below.
Brief description of the drawings
By reading the following detailed description of the preferred embodiments, various other advantages and benefits will become clear to those of ordinary skill in the art. The accompanying drawings are only for the purpose of illustrating the preferred embodiments and are not to be considered a limitation of the present invention. Throughout the drawings, identical reference symbols denote identical parts. In the drawings:
Fig. 1 shows a flow chart of the steps of an embodiment of a text processing method based on speech recognition according to an embodiment of the invention; and
Fig. 2 shows a structural block diagram of an embodiment of a text processing apparatus based on speech recognition according to an embodiment of the invention.
Detailed description of the invention
Exemplary embodiments of the present disclosure are described below in more detail with reference to the accompanying drawings. Although the drawings show exemplary embodiments of the present disclosure, it should be appreciated that the present disclosure may be realized in various forms and should not be limited by the embodiments set forth here. On the contrary, these embodiments are provided so that the present disclosure can be understood more thoroughly and its scope conveyed completely to those skilled in the art.
With reference to Fig. 1, a flow chart of the steps of an embodiment of a text processing method based on speech recognition according to an embodiment of the invention is shown; it may specifically comprise the following steps:
Step 101, obtaining a first text produced by performing speech recognition on speech data.
In a specific implementation, the user may input speech data through an electronic device equipped with sound-capture hardware such as a microphone.
The electronic device may be a mobile device, such as a mobile phone, tablet computer, personal digital assistant, or wearable device (such as glasses or a watch), or a fixed device, such as a personal computer, smart TV, or smart home appliance (such as an air conditioner or rice cooker); the embodiment of the present invention places no limitation on this.
When the electronic device receives the speech data converted by the sound-capture hardware, it may perform the speech recognition and add the punctuation marks locally, or send the data to a server to perform the speech recognition and add the punctuation marks; the embodiment of the present invention places no limitation on this.
In a specific implementation, the speech recognition system performing the speech recognition may build its network based on a WFST (Weighted Finite-State Transducer) and is usually composed of the following basic modules:
1. Signal processing and feature extraction module. The main task of this module is to extract features from the speech data for processing by the acoustic model. It generally also includes some signal processing techniques to reduce, as far as possible, the impact on the features of factors such as ambient noise, channel, and speaker.
2. Acoustic model. Speech recognition systems mostly model speech using first-order HMMs.
3. Pronunciation dictionary. The pronunciation dictionary contains the vocabulary that the speech recognition system can process, together with its pronunciations. In effect, the pronunciation dictionary provides the mapping between the acoustic model and the language model.
4. Language model. The language model models the language targeted by the speech recognition system. In theory, various language models, including regular languages and context-free grammars, can serve as the language model, but current systems generally adopt the statistics-based N-gram model and its variants.
5. Decoder. The decoder is one of the cores of a speech recognition system; its task is, for the input signal, to search, according to the acoustic model, language model, and dictionary, for the word string that can output this signal with the maximum probability. The relationship between the above modules can be understood more clearly from a mathematical point of view.
In an embodiment of the present invention, the decoder may use an acoustic model trained with a GMM (Gaussian Mixture Model) and a DNN (Deep Neural Network).
Since the HMM (Hidden Markov Model) can describe well the time-varying nature and short-term stationarity of speech, it is widely used for acoustic modeling in large-vocabulary continuous speech recognition systems.
In an embodiment of the present invention, the half-syllable is used as the basic pronunciation unit, also called the phone, and the context-dependent triphone is used as the acoustic modeling unit.
Each triphone unit is represented by a 5-state HMM; the 1st and 5th states are non-emitting states and occupy no speech frames during training and recognition, while the 2nd, 3rd, and 4th states are emitting states and each occupy at least one speech frame during training and recognition.
In the initial stage, GMMs are adopted to model the states, and the Baum-Welch algorithm, based on the maximum likelihood criterion, is iterated to optimize the HMM and GMM parameters. Training is stopped when the model parameters reach the convergence condition or a predefined number of iterations is reached. The speech is then segmented by the HMM-GMM system at the time boundaries of the HMM states.
When training the DNN, the features of a speech frame are taken as the input and the state corresponding to the current speech frame (obtained from the segmentation by the HMM-GMM system) as the output; with cross entropy as the objective function, a gradient descent algorithm based on mini-batches is adopted to train the DNN parameters.
Step 102, segmenting the first text to obtain one or more text fragments.
In an embodiment of the present invention, the result of the speech recognition (i.e. the first text) can be mined semantically, and breaks made at the positions where the semantics are interrupted.
In an optional embodiment of the present invention, step 102 may comprise the following sub-steps:
Sub-step S11, performing character segmentation on the first text to obtain one or more characters.
In an embodiment of the present invention, the first text may be preprocessed by cutting it into single characters.
For example, if the first text from speech recognition is 你好北京 ("hello Beijing"), it can be cut into the four characters 你, 好, 北, and 京.
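The preprocessing that cuts the first text into single characters can be sketched as follows; the patent does not prescribe an implementation, so the function name and the whitespace handling are illustrative:

```python
def cut_into_chars(first_text: str) -> list[str]:
    """Cut the recognized first text into single characters,
    skipping any whitespace the recognizer may have emitted."""
    return [ch for ch in first_text if not ch.isspace()]

print(cut_into_chars("你好北京"))  # the four characters 你, 好, 北, 京
```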
Sub-step S12, identifying the word positions of the one or more characters.
A word position denotes the position of a character within a word, and may specifically comprise one or more of the following:
word-begin (B), indicating that the character occupies the first position of a word;
word-end (E), indicating that the character occupies the last position of a word;
word-middle (M), indicating that the character occupies a middle position of a word, neither the first nor the last;
single-character word (S), indicating that the character is a word on its own.
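The four word positions can be represented as a small tag set; below is an illustrative hand-labeling of 你好北京 under these definitions (你好 and 北京 are each two-character words, so each receives B followed by E):

```python
# BEMS tag set: position of a character within a word
TAGS = {
    "B": "word-begin",
    "E": "word-end",
    "M": "word-middle",
    "S": "single-character word",
}

# Hand-labeled example: 你好 = B E, 北京 = B E
example = list(zip("你好北京", ["B", "E", "B", "E"]))
print(example)
```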
In an embodiment of the present invention, a Conditional Random Field (CRF) may be applied to label the word positions of the one or more characters obtained by the cutting.
A CRF is a discriminative probabilistic model and a kind of random field, and can be used for labeling.
Suppose P(Y|X) is a linear-chain conditional random field. Under the condition that the random variable X takes the value x, the conditional probability that the random variable Y takes the value y has the following form:
P(y|x) = \frac{1}{Z(x)} \exp\left( \sum_{i,k} \lambda_k t_k(y_{i-1}, y_i, x, i) + \sum_{i,l} \mu_l s_l(y_i, x, i) \right)

where

Z(x) = \sum_y \exp\left( \sum_{i,k} \lambda_k t_k(y_{i-1}, y_i, x, i) + \sum_{i,l} \mu_l s_l(y_i, x, i) \right)
In the above formulas, t_k and s_l are feature functions, \lambda_k and \mu_l are their corresponding weights, and Z(x) is the normalization factor, in which the summation runs over all possible output sequences.
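To make the formulas concrete, the following sketch evaluates P(y|x) for a toy linear-chain CRF by brute-force enumeration of all label sequences. The feature functions t1 and s1 and the weights are invented for illustration; a trained model would learn many such features and weights:

```python
from itertools import product
from math import exp

LABELS = ["B", "E", "M", "S"]

# Toy transition feature t_k(y_prev, y_cur, x, i): reward E following B.
def t1(y_prev, y_cur, x, i):
    return 1.0 if (y_prev, y_cur) == ("B", "E") else 0.0

# Toy state feature s_l(y_cur, x, i): reward B on the first character.
def s1(y_cur, x, i):
    return 1.0 if (i == 0 and y_cur == "B") else 0.0

LAMBDA = [2.0]  # weights for transition features
MU = [1.5]      # weights for state features

def score(y, x):
    """Unnormalized log-score: sum of weighted feature functions over positions."""
    total = 0.0
    for i in range(len(x)):
        if i > 0:
            total += LAMBDA[0] * t1(y[i - 1], y[i], x, i)
        total += MU[0] * s1(y[i], x, i)
    return total

def probability(y, x):
    """P(y|x) = exp(score(y, x)) / Z(x), Z(x) summed over all label sequences."""
    z = sum(exp(score(y2, x)) for y2 in product(LABELS, repeat=len(x)))
    return exp(score(y, x)) / z

x = "你好"
print(round(probability(("B", "E"), x), 4))
```

By construction, the sequence ("B", "E") collects both feature rewards here, so it receives the highest probability of the 16 possible label sequences.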
For CRF-based segmentation, the 4-tag label set, i.e. BEMS, may be adopted; the designed feature templates are as follows:
#Unigram
U00:%x[-2,0]
U01:%x[-1,0]
U02:%x[0,0]
U03:%x[1,0]
U04:%x[2,0]
U05:%x[-2,0]/%x[-1,0]/%x[0,0]
U06:%x[-1,0]/%x[0,0]/%x[1,0]
U07:%x[0,0]/%x[1,0]/%x[2,0]
U08:%x[-1,0]/%x[0,0]
U09:%x[0,0]/%x[1,0]
#Bigram
B
Here the Unigram templates correspond to the state features s_l of the current character x, whose feature functions need to consider the two characters before and after; the Bigram template corresponds to the transition features t_k, and for convenience of calculation the transition probabilities are set here to be the same for all input sequences x.
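A unigram template such as U06:%x[-1,0]/%x[0,0]/%x[1,0] expands, at each position, into a feature string built from the characters at the given relative row offsets (column 0 is the character itself). The sketch below follows CRF++-style conventions, including boundary markers for out-of-range offsets; the marker spelling is an assumption:

```python
# Simplified CRF++-style unigram template expansion (illustrative).
TEMPLATES = {
    "U00": [(-2, 0)],
    "U01": [(-1, 0)],
    "U02": [(0, 0)],
    "U03": [(1, 0)],
    "U04": [(2, 0)],
    "U05": [(-2, 0), (-1, 0), (0, 0)],
    "U06": [(-1, 0), (0, 0), (1, 0)],
    "U07": [(0, 0), (1, 0), (2, 0)],
    "U08": [(-1, 0), (0, 0)],
    "U09": [(0, 0), (1, 0)],
}

def expand(template_id, chars, i):
    """Build the feature string for one template at position i.
    Out-of-range offsets use boundary markers."""
    parts = []
    for row, _col in TEMPLATES[template_id]:
        j = i + row
        if j < 0:
            parts.append(f"_B-{-j}")                   # before-sentence marker
        elif j >= len(chars):
            parts.append(f"_B+{j - len(chars) + 1}")   # after-sentence marker
        else:
            parts.append(chars[j])
    return template_id + ":" + "/".join(parts)

chars = list("你好北京")
print(expand("U06", chars, 1))  # U06:你/好/北
```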
According to the above feature design, CRF training can be carried out with the improved iterative scaling method to train the sequence labeling model; the sequence labeling model is a conditional random field model, generated by training on the training text and on the word positions annotated in the training text.
To match the usage scenarios online, the corpus may adopt the annotation data of the speech recognition training corpus; the data volume can reach a total of 800,000 sentences.
During training, the training text may first be preprocessed by replacing the punctuation marks occurring in it with spaces, so that the punctuation marks of the training text are replaced.
For example, replacing the punctuation marks in the training text 嗯！好的，下次再说。 ("Mm! All right, talk about it next time.") yields 嗯 好的 下次再说.
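The punctuation-replacement preprocessing could be sketched as a simple regex substitution; the punctuation set below is illustrative, and a real system would cover the full inventory of Chinese and Western marks:

```python
import re

# Illustrative punctuation set; extend as needed.
PUNCT_RE = re.compile(r"[！？。，、；：!?.,;:]+")

def strip_punctuation(text: str) -> str:
    """Replace each run of punctuation marks with a single space."""
    return PUNCT_RE.sub(" ", text).strip()

print(strip_punctuation("嗯！好的，下次再说。"))  # 嗯 好的 下次再说
```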
Then each character in the training text is labeled with BEMS; for example, the labeling of the training text 嗯 好的 下次再说 is as follows:
嗯 S
好 B
的 E
下 B
次 M
再 M
说 E
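Generating BEMS labels from the space-separated training text is mechanical: a one-character word gets S, and a longer word gets B, then M for its interior characters, then E. A sketch:

```python
def bems_label(segmented_text: str) -> list[tuple[str, str]]:
    """Label each character of a space-segmented text with B/E/M/S."""
    labeled = []
    for word in segmented_text.split():
        if len(word) == 1:
            labeled.append((word, "S"))
        else:
            labeled.append((word[0], "B"))
            for ch in word[1:-1]:
                labeled.append((ch, "M"))
            labeled.append((word[-1], "E"))
    return labeled

for ch, tag in bems_label("嗯 好的 下次再说"):
    print(ch, tag)
```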
With the annotated training text, training of the CRF model can be carried out.
The model finally obtained mainly comprises two aspects of content: on the one hand, the character sequences corresponding to the feature functions F; on the other hand, the weights w corresponding to the feature functions.
In an optional embodiment of the present invention, sub-step S12 may comprise the following sub-steps:
Sub-step S121, in forward order over the characters, calculating the probabilities of the word positions of each following character from the probabilities of the word positions of the preceding character.
In an embodiment of the present invention, the probabilities of the word positions in the input character sequence can be calculated by recursion.
When the sequence labeling model is applied, the model's feature vector F(y, x) and weight vector w may be input, along with the character sequence (i.e. the sequence formed by the characters after the cutting) x = (x_1, x_2, \ldots, x_n).
When calculating the probabilities, initialization may be performed first:
\delta_1(j) = w \cdot F_1(y_0 = \mathrm{start}, y_1 = j, x), \quad j = 1, 2, \ldots, m
where x is the input character sequence, y is the label sequence corresponding to the first text, w is the weight vector, m is the number of labels (for example, the m corresponding to BEMS is 4), i denotes the i-th character in the character sequence, and j denotes the j-th label.
Specifically, for the first character of the character sequence, the probability \delta_1(j) of each word position of the 1st character is calculated by the preset sequence labeling model according to the feature functions F and their corresponding weights w.
Then the probabilities of the word positions of the i-th character are recursed from the probabilities of the word positions of the (i-1)-th character, while the history record \Psi_i of the optimal path is kept; i is a positive integer greater than 1, i.e. i = 2, 3, \ldots, n.
\delta_i(l) = \max_{1 \le j \le m} \left\{ \delta_{i-1}(j) + w \cdot F_i(y_{i-1} = j, y_i = l, x) \right\}, \quad l = 1, 2, \ldots, m

\Psi_i(l) = \operatorname{arg\,max}_{1 \le j \le m} \left\{ \delta_{i-1}(j) + w \cdot F_i(y_{i-1} = j, y_i = l, x) \right\}, \quad l = 1, 2, \ldots, m
where x is the input character sequence, y is the label sequence corresponding to the first text, w is the weight vector, m is the number of labels (for example, the m corresponding to BEMS is 4), i denotes the i-th character in the character sequence, and j denotes the j-th label.
Specifically, by the preset sequence labeling model, according to the features F and weights w and based on the probabilities \delta_{i-1} of the word positions of the (i-1)-th character, the probability \delta_i(l) of each word position of the i-th character is calculated.
For each word position of the i-th character, the highest-valued probability is taken as the probability of that word position.
For example, for " you ", " good ", " north ", " capital " these four words, can first calculate " you " and belong toIn the probability of BEMS.
The probability recursion " good " that belongs to B based on " you " belongs to the probability of BEMS;
The probability recursion " good " that belongs to E based on " you " belongs to the probability of BEMS;
The probability recursion " good " that belongs to M based on " you " belongs to the probability of BEMS;
The probability recursion " good " that belongs to S based on " you " belongs to the probability of BEMS;
For " good ", its probability that has that belongs to B has four, is respectively δB-B、δE-B、δM-B、δS-B,The subscript of probability represents the path of its recursion, and the highest probability of selected value from these four probability, as" good " belongs to the probability of B eventually, and its probability that belongs to EMS is also processed equally.
The probability recursion " north " that belongs to BEMS based on " good " equally, respectively belongs to the general of BEMSRate, the probability recursion " capital " that belongs to BEMS based on " north " belongs to the probability of BEMS, until passPush away.
Sub-step S122, in reverse order over the characters, labeling the word position of each preceding character according to the word position labeled, on the basis of the probabilities, for the following character.
After the probabilities of all characters of the first text have been calculated, the optimal value among all labels corresponding to the last character is taken as the final optimal value:
\max_y \left( w \cdot F(y, x) \right) = \max_{1 \le j \le m} \delta_n(j)

y_n^* = \operatorname{arg\,max}_{1 \le j \le m} \delta_n(j)
where x is the input character sequence, y is the label sequence corresponding to the first text, w is the weight vector, m is the number of labels (for example, the m corresponding to BEMS is 4), i denotes the i-th character in the character sequence, and j denotes the j-th label.
Specifically, for the last character, the word position to which the highest-valued probability belongs is labeled.
For example, for the four characters 你, 好, 北, and 京, the last character is 京; from the four probabilities that 京 belongs to B, E, M, and S, the highest-valued one is selected and its word position labeled, say E.
Backtracking from the final optimal value yields the final labeling path, completing the remaining labels and obtaining the optimal path y^* = (y_1^*, y_2^*, \ldots, y_n^*):

y_i^* = \Psi_{i+1}(y_{i+1}^*)

where x is the input character sequence, y is the label sequence corresponding to the first text, and i = n-1, n-2, \ldots, 1.
Specifically, when the word position of the i-th character has been determined, the probability of the (i-1)-th character from which the probability of the word position of the i-th character was calculated is queried;
for the (i-1)-th character, the word position to which that probability of the (i-1)-th character belongs is labeled.
For example, for the four characters 你, 好, 北, and 京, the character 京 has been labeled with the word position E.
Looking up the previously recorded path history \Psi_i, which here is \delta_{B-E}, the probability that 京 belongs to E was recursed from the probability that the preceding character 北 belongs to B, so the word position of 北 can be labeled B; 好 and 你 are handled in the same way, finally completing the labeling of 你好北京:
你 B
好 E
北 B
京 E
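The forward recursion of sub-step S121 and the backtrace of sub-step S122 together are the standard Viterbi algorithm. The sketch below runs it with toy score functions standing in for w·F_i (in the patent's scheme these scores would come from the trained CRF; the scores here are invented purely so the example decodes 你好北京 to B E B E):

```python
LABELS = ["B", "E", "M", "S"]

def viterbi(chars, start_score, step_score):
    """Viterbi decoding: forward recursion over delta/psi, then a backtrace.

    start_score(label, chars)       -> score of label for the 1st character
    step_score(prev, cur, chars, i) -> transition + state score at position i
    """
    n = len(chars)
    delta = [{l: start_score(l, chars) for l in LABELS}]  # delta_1
    psi = [{}]  # psi_1 unused
    for i in range(1, n):
        d, p = {}, {}
        for l in LABELS:
            best = max(LABELS,
                       key=lambda j: delta[i - 1][j] + step_score(j, l, chars, i))
            d[l] = delta[i - 1][best] + step_score(best, l, chars, i)
            p[l] = best  # remember which previous label produced this maximum
        delta.append(d)
        psi.append(p)
    # Backtrace from the best label of the last character.
    y = [max(LABELS, key=lambda j: delta[n - 1][j])]
    for i in range(n - 1, 0, -1):
        y.append(psi[i][y[-1]])
    return list(reversed(y))

# Toy scores that favor alternating two-character B-E words (illustration only).
def start_score(label, chars):
    return 1.0 if label == "B" else 0.0

def step_score(prev, cur, chars, i):
    good = {("B", "E"): 2.0, ("E", "B"): 2.0}
    return good.get((prev, cur), 0.0)

print(viterbi(list("你好北京"), start_score, step_score))  # ['B', 'E', 'B', 'E']
```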
Sub-step S13, segmenting the first text at the specified word positions to obtain one or more text fragments.
In a specific implementation, the word positions comprise one or more of word-begin, word-end, word-middle, and single-character word.
Therefore, when segmenting, a break may be made before a word-begin character and/or single-character word of the first text, and/or after a word-end character and/or single-character word of the first text.
It should be noted that when a word-begin character and/or single-character word is the first character of the first text, a break before it is meaningless, and when a word-end character and/or single-character word is the last character of the first text, a break after it is meaningless; such breaks can be ignored.
In addition, if a break made after a word-end character and/or single-character word coincides with a break made before a word-begin character and/or single-character word, a single break may be made.
For example, for the first text 你好北京, a break can be made after the word-end character 好, obtaining the text fragments 你好 ("hello") and 北京 ("Beijing").
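Given the BEMS labeling, segmentation reduces to breaking after every E or S tag (equivalently, before every B or S), with coincident breaks merged. A sketch under that reading:

```python
def segment(chars, tags):
    """Split a labeled character sequence into fragments,
    breaking after each word-end (E) or single-character word (S)."""
    fragments, current = [], []
    for ch, tag in zip(chars, tags):
        current.append(ch)
        if tag in ("E", "S"):
            fragments.append("".join(current))
            current = []
    if current:  # trailing characters without a closing E/S tag
        fragments.append("".join(current))
    return fragments

print(segment(list("你好北京"), ["B", "E", "B", "E"]))  # ['你好', '北京']
```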
Step 103, adding punctuation marks to the one or more text fragments, which are combined into a second text.
When applying an embodiment of the present invention, a mapping between punctuation marks and one or more keywords may be established in advance.
An example of this mapping is shown in the following table:
Punctuation mark    Keywords
？                  what, how, where, why, who
。                  oh, eh
，                  other (i.e. words matching no keyword in the mapping)
Therefore, for each text fragment, a keyword can be identified, the punctuation mark corresponding to the keyword looked up, and that punctuation mark added after the text fragment.
For example, for the text fragments 你好 and 北京, a comma can be added after 你好, and the text fragments and punctuation marks combined into the second text 你好，北京 ("Hello, Beijing").
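The keyword lookup could be sketched as a dictionary scan; the keyword lists below are illustrative stand-ins for the mapping table above (using common Chinese question words), and following the example, no mark is appended after the final fragment:

```python
# Illustrative keyword -> punctuation mapping.
KEYWORD_PUNCT = {
    "什么": "？", "怎么": "？", "哪里": "？", "为什么": "？", "谁": "？",
    "嗯": "。", "哦": "。",
}
DEFAULT_PUNCT = "，"  # fragments matching no keyword

def punctuate(fragments):
    """Append the punctuation mark looked up for each fragment's keyword,
    then join the fragments into the second text."""
    out = []
    for i, frag in enumerate(fragments):
        out.append(frag)
        if i < len(fragments) - 1:  # the example adds no mark after the last fragment
            mark = next((p for k, p in KEYWORD_PUNCT.items() if k in frag),
                        DEFAULT_PUNCT)
            out.append(mark)
    return "".join(out)

print(punctuate(["你好", "北京"]))  # 你好，北京
```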
If the segmentation and punctuation are performed locally on the electronic device, the combined second text can be displayed directly to the user.
If the segmentation and punctuation are performed on a server, the combined second text can be returned to the electronic device for display.
In embodiments of the present invention, the result of performing speech recognition on speech data, i.e. the first text, is segmented; punctuation marks are added to the segmented text fragments, which are combined into a second text. Punctuation marks are thus added automatically, the user is spared from manually positioning the cursor and inserting punctuation, and the convenience of speech input is greatly improved.
For simplicity of description, the method embodiments are expressed as a series of combined actions, but those skilled in the art should appreciate that the embodiments of the present invention are not limited by the described order of actions, because according to the embodiments of the present invention some steps may be performed in other orders or simultaneously. Secondly, those skilled in the art should also appreciate that the embodiments described in this specification are preferred embodiments, and the actions involved are not necessarily required by the embodiments of the present invention.
Referring to Fig. 2, a structural block diagram of an embodiment of a text processing apparatus based on speech recognition according to one embodiment of the present invention is shown; it may specifically comprise the following modules:
a first text acquisition module 201, adapted to obtain a first text obtained by performing speech recognition on speech data;
a segmentation module 202, adapted to break the first text into segments, obtaining one or more text fragments; and
a punctuation adding module 203, adapted to add punctuation marks to the one or more text fragments and combine them into a second text.
In an optional embodiment of the present invention, the segmentation module 202 may further be adapted to:
split the first text into individual characters, obtaining one or more characters;
identify the positions of the one or more characters; and
make breaks at specified character positions in the first text, obtaining one or more text fragments.
In an optional embodiment of the present invention, the segmentation module 202 may further be adapted to:
in the order of the characters, calculate the probabilities of the positions of a following character from the probabilities of the positions of its preceding character; and
in the reverse order of the characters, mark the position of the preceding character according to the probability-based position marked for the following character.
In an optional embodiment of the present invention, the segmentation module 202 may further be adapted to:
calculate, by a preset sequence labeling model, the probability of each position of the 1st character;
calculate, by the preset sequence labeling model, the probability of each position of the i-th character based on the probabilities of the positions of the (i-1)-th character, i being a positive integer greater than 1; and
for each position of the i-th character, take the highest probability value as the probability of that position.
In a specific implementation, the sequence labeling model is a conditional random field model, generated by training on training text whose characters are labeled with their positions and whose punctuation marks have been replaced.
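Preparing such training data might look as follows. This is a sketch under the assumption that the training text is already segmented into words and that punctuation tokens are simply dropped at their (word-boundary) locations; the B/M/E/S tag names are illustrative, not from the patent:

```python
import re

# Build (character, position-tag) training pairs from a segmented,
# punctuated sentence: punctuation is replaced (dropped), and each
# remaining character is tagged with its position within its word.
def training_pairs(words):
    """words: list of tokens; punctuation tokens are dropped."""
    pairs = []
    for w in words:
        if re.fullmatch(r"[，。？！、]+", w):   # punctuation is replaced, not tagged
            continue
        if len(w) == 1:
            pairs.append((w, "S"))               # single-character word
        else:
            tags = ["B"] + ["M"] * (len(w) - 2) + ["E"]
            pairs.extend(zip(w, tags))           # initial/internal/final
    return pairs

print(training_pairs(["你好", "，", "北京"]))
# -> [('你', 'B'), ('好', 'E'), ('北', 'B'), ('京', 'E')]
```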
In an optional embodiment of the present invention, the segmentation module 202 may further be adapted to:
for the last character, mark the position to which the highest probability value belongs;
when the position of the i-th character has been determined, look up the probability of the (i-1)-th character from which the probability of the i-th character's position was calculated, i being a positive integer greater than 1; and
for the (i-1)-th character, mark the position to which that probability of the (i-1)-th character belongs.
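Together, the forward probability computation and the reverse marking amount to a standard Viterbi-style decode. The following sketch uses made-up emission and transition scores as stand-ins for a trained conditional random field's values; it is an illustration of the two passes, not the patent's implementation:

```python
TAGS = ["B", "M", "E", "S"]              # word-initial, -internal, -final, single

def viterbi(emit, trans):
    """emit: per-character {tag: score}; trans: {(prev_tag, tag): score}."""
    n = len(emit)
    best = [dict(emit[0])]               # forward pass: scores for 1st character
    back = [{}]
    for i in range(1, n):                # i-th scores from (i-1)-th scores
        best.append({})
        back.append({})
        for t in TAGS:
            prev = max(TAGS, key=lambda p: best[i - 1][p] * trans[(p, t)])
            best[i][t] = best[i - 1][prev] * trans[(prev, t)] * emit[i][t]
            back[i][t] = prev            # remember the winning (i-1)-th tag
    tag = max(TAGS, key=lambda t: best[n - 1][t])   # mark the last character
    path = [tag]
    for i in range(n - 1, 0, -1):        # reverse pass: mark preceding characters
        tag = back[i][tag]
        path.append(tag)
    return path[::-1]

# Toy scores for a two-character text; uniform transitions.
emit = [{"B": 0.6, "M": 0.1, "E": 0.1, "S": 0.2},
        {"B": 0.1, "M": 0.1, "E": 0.6, "S": 0.2}]
trans = {(p, t): 0.25 for p in TAGS for t in TAGS}
print(viterbi(emit, trans))              # -> ['B', 'E']
```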
In an optional embodiment of the present invention, the character positions include one or more of word-initial, word-final, word-internal, and single-character word;
and the segmentation module 202 may further be adapted to:
make a break before a word-initial character and/or a single-character word of the first text;
and/or,
make a break after a word-final character and/or a single-character word of the first text.
In an optional embodiment of the present invention, the punctuation adding module 203 may further be adapted to:
for each text fragment, identify a keyword;
look up the punctuation mark corresponding to the keyword; and
add the punctuation mark after the text fragment.
As the device embodiments are substantially similar to the method embodiments, their description is relatively brief; for relevant details, refer to the corresponding parts of the description of the method embodiments.
The algorithms and displays provided herein are not inherently related to any particular computer, virtual system, or other equipment. Various general-purpose systems may also be used with the teaching herein. From the description above, the structure required to construct such systems is apparent. Moreover, the present invention is not directed to any particular programming language. It should be understood that the content of the present invention described herein may be implemented using a variety of programming languages, and the description of a specific language above is intended to disclose the best mode of carrying out the invention.
In the description provided herein, numerous specific details are set forth. It will be understood, however, that embodiments of the invention may be practiced without these specific details. In some instances, well-known methods, structures, and techniques have not been shown in detail so as not to obscure the understanding of this description.
Similarly, it should be appreciated that, in order to streamline the disclosure and aid the understanding of one or more of the various inventive aspects, various features of the invention are sometimes grouped together in a single embodiment, figure, or description thereof in the foregoing description of exemplary embodiments of the invention. This method of disclosure, however, is not to be interpreted as reflecting an intention that the claimed invention requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single foregoing disclosed embodiment. Thus, the claims following the detailed description are hereby expressly incorporated into this detailed description, with each claim standing on its own as a separate embodiment of the invention.
Those skilled in the art will appreciate that the modules in the devices of the embodiments may be adaptively changed and arranged in one or more devices different from the embodiment. The modules, units, or components in the embodiments may be combined into one module, unit, or component, and furthermore they may be divided into a plurality of sub-modules, sub-units, or sub-components. Except where at least some of such features and/or processes or units are mutually exclusive, all features disclosed in this specification (including the accompanying claims, abstract, and drawings) and all processes or units of any method or device so disclosed may be combined in any combination. Unless expressly stated otherwise, each feature disclosed in this specification (including the accompanying claims, abstract, and drawings) may be replaced by alternative features serving the same, equivalent, or similar purpose.
Furthermore, those skilled in the art will appreciate that, although some embodiments described herein include certain features included in other embodiments rather than other features, combinations of features of different embodiments are meant to be within the scope of the invention and to form different embodiments. For example, in the following claims, any of the claimed embodiments may be used in any combination.
The various component embodiments of the present invention may be implemented in hardware, or in software modules running on one or more processors, or in a combination thereof. Those skilled in the art should understand that a microprocessor or a digital signal processor (DSP) may be used in practice to implement some or all of the functions of some or all of the components of the text processing device based on speech recognition according to the embodiments of the present invention. The present invention may also be implemented as a device or apparatus program (for example, a computer program and a computer program product) for carrying out part or all of the methods described herein. Such a program implementing the present invention may be stored on a computer-readable medium, or may take the form of one or more signals. Such signals may be downloaded from an Internet website, provided on a carrier signal, or provided in any other form.
It should be noted that the above-described embodiments illustrate rather than limit the invention, and that those skilled in the art may design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprises" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention may be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. In a unit claim enumerating several devices, several of these devices may be embodied by one and the same item of hardware. The use of the words first, second, third, etc. does not indicate any ordering; these words may be interpreted as names.
The embodiments of the invention disclose A1, a text processing method based on speech recognition, comprising:
obtaining a first text obtained by performing speech recognition on speech data;
breaking the first text into segments, obtaining one or more text fragments; and
adding punctuation marks to the one or more text fragments and combining them into a second text.
A2. The method of A1, wherein the step of breaking the first text to obtain one or more text fragments comprises:
splitting the first text into individual characters, obtaining one or more characters;
identifying the positions of the one or more characters; and
making breaks at specified character positions in the first text, obtaining one or more text fragments.
A3. The method of A2, wherein the step of identifying the positions of the one or more characters comprises:
in the order of the characters, calculating the probabilities of the positions of a following character from the probabilities of the positions of its preceding character; and
in the reverse order of the characters, marking the position of the preceding character according to the probability-based position marked for the following character.
A4. The method of A3, wherein the step of calculating, in the order of the characters, the probabilities of the positions of a following character from the probabilities of the positions of its preceding character comprises:
calculating, by a preset sequence labeling model, the probability of each position of the 1st character;
calculating, by the preset sequence labeling model, the probability of each position of the i-th character based on the probabilities of the positions of the (i-1)-th character, i being a positive integer greater than 1; and
for each position of the i-th character, taking the highest probability value as the probability of that position.
A5. The method of A4, wherein the sequence labeling model is a conditional random field model, generated by training on training text whose characters are labeled with their positions and whose punctuation marks have been replaced.
A6. The method of A3, A4, or A5, wherein the step of marking, in the reverse order of the characters, the position of the preceding character according to the probability-based position of the following character comprises:
for the last character, marking the position to which the highest probability value belongs;
when the position of the i-th character has been determined, looking up the probability of the (i-1)-th character from which the probability of the i-th character's position was calculated, i being a positive integer greater than 1; and
for the (i-1)-th character, marking the position to which that probability of the (i-1)-th character belongs.
A7. The method of A2, A3, A4, or A5, wherein the character positions include one or more of word-initial, word-final, word-internal, and single-character word;
and the step of making breaks at specified character positions in the first text comprises:
making a break before a word-initial character and/or a single-character word of the first text;
and/or,
making a break after a word-final character and/or a single-character word of the first text.
A8. The method of A1, A2, A3, A4, or A5, wherein the step of adding punctuation marks to the one or more text fragments comprises:
for each text fragment, identifying a keyword;
looking up the punctuation mark corresponding to the keyword; and
adding the punctuation mark after the text fragment.
The embodiments of the invention also disclose B9, a text processing apparatus based on speech recognition, comprising:
a first text acquisition module, adapted to obtain a first text obtained by performing speech recognition on speech data;
a segmentation module, adapted to break the first text into segments, obtaining one or more text fragments; and
a punctuation adding module, adapted to add punctuation marks to the one or more text fragments and combine them into a second text.
B10. The apparatus of B9, wherein the segmentation module is further adapted to:
split the first text into individual characters, obtaining one or more characters;
identify the positions of the one or more characters; and
make breaks at specified character positions in the first text, obtaining one or more text fragments.
B11. The apparatus of B10, wherein the segmentation module is further adapted to:
in the order of the characters, calculate the probabilities of the positions of a following character from the probabilities of the positions of its preceding character; and
in the reverse order of the characters, mark the position of the preceding character according to the probability-based position marked for the following character.
B12. The apparatus of B11, wherein the segmentation module is further adapted to:
calculate, by a preset sequence labeling model, the probability of each position of the 1st character;
calculate, by the preset sequence labeling model, the probability of each position of the i-th character based on the probabilities of the positions of the (i-1)-th character, i being a positive integer greater than 1; and
for each position of the i-th character, take the highest probability value as the probability of that position.
B13. The apparatus of B11, wherein the sequence labeling model is a conditional random field model, generated by training on training text whose characters are labeled with their positions and whose punctuation marks have been replaced.
B14. The apparatus of B11, B12, or B13, wherein the segmentation module is further adapted to:
for the last character, mark the position to which the highest probability value belongs;
when the position of the i-th character has been determined, look up the probability of the (i-1)-th character from which the probability of the i-th character's position was calculated, i being a positive integer greater than 1; and
for the (i-1)-th character, mark the position to which that probability of the (i-1)-th character belongs.
B15. The apparatus of B10, B11, B12, or B13, wherein the character positions include one or more of word-initial, word-final, word-internal, and single-character word;
and the segmentation module is further adapted to:
make a break before a word-initial character and/or a single-character word of the first text;
and/or,
make a break after a word-final character and/or a single-character word of the first text.
B16. The apparatus of B9, B10, B11, B12, or B13, wherein the punctuation adding module is further adapted to:
for each text fragment, identify a keyword;
look up the punctuation mark corresponding to the keyword; and
add the punctuation mark after the text fragment.

Claims (10)

1. A text processing method based on speech recognition, comprising:
obtaining a first text obtained by performing speech recognition on speech data;
breaking the first text into segments, obtaining one or more text fragments; and
adding punctuation marks to the one or more text fragments and combining them into a second text.
2. the method for claim 1, is characterized in that, described described the first text is carried outPunctuate, the step that obtains one or more text fragments comprises:
Described the first text is cut to word processing, obtain one or more words;
Identifying the lexeme of described one or more words puts;
Making pauses in reading unpunctuated ancient writings in word position in the appointment of described the first text, obtains one or more text fragments.
3. The method of claim 2, characterized in that the step of identifying the positions of the one or more characters comprises:
in the order of the characters, calculating the probabilities of the positions of a following character from the probabilities of the positions of its preceding character; and
in the reverse order of the characters, marking the position of the preceding character according to the probability-based position marked for the following character.
4. The method of claim 3, characterized in that the step of calculating, in the order of the characters, the probabilities of the positions of a following character from the probabilities of the positions of its preceding character comprises:
calculating, by a preset sequence labeling model, the probability of each position of the 1st character;
calculating, by the preset sequence labeling model, the probability of each position of the i-th character based on the probabilities of the positions of the (i-1)-th character, i being a positive integer greater than 1; and
for each position of the i-th character, taking the highest probability value as the probability of that position.
5. The method of claim 4, characterized in that the sequence labeling model is a conditional random field model, generated by training on training text whose characters are labeled with their positions and whose punctuation marks have been replaced.
6. The method of claim 3, 4, or 5, characterized in that the step of marking, in the reverse order of the characters, the position of the preceding character according to the probability-based position of the following character comprises:
for the last character, marking the position to which the highest probability value belongs;
when the position of the i-th character has been determined, looking up the probability of the (i-1)-th character from which the probability of the i-th character's position was calculated, i being a positive integer greater than 1; and
for the (i-1)-th character, marking the position to which that probability of the (i-1)-th character belongs.
7. The method of claim 2, 3, 4, or 5, characterized in that the character positions include one or more of word-initial, word-final, word-internal, and single-character word;
and the step of making breaks at specified character positions in the first text comprises:
making a break before a word-initial character and/or a single-character word of the first text;
and/or,
making a break after a word-final character and/or a single-character word of the first text.
8. The method of claim 1, 2, 3, 4, or 5, characterized in that the step of adding punctuation marks to the one or more text fragments comprises:
for each text fragment, identifying a keyword;
looking up the punctuation mark corresponding to the keyword; and
adding the punctuation mark after the text fragment.
9. A text processing apparatus based on speech recognition, comprising:
a first text acquisition module, adapted to obtain a first text obtained by performing speech recognition on speech data;
a segmentation module, adapted to break the first text into segments, obtaining one or more text fragments; and
a punctuation adding module, adapted to add punctuation marks to the one or more text fragments and combine them into a second text.
10. The apparatus of claim 9, characterized in that the segmentation module is further adapted to:
split the first text into individual characters, obtaining one or more characters;
identify the positions of the one or more characters; and
make breaks at specified character positions in the first text, obtaining one or more text fragments.
CN201510982716.0A 2015-12-23 2015-12-23 Text processing method and device based on voice identification Pending CN105609107A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510982716.0A CN105609107A (en) 2015-12-23 2015-12-23 Text processing method and device based on voice identification


Publications (1)

Publication Number Publication Date
CN105609107A true CN105609107A (en) 2016-05-25

Family ID: 55988991


Cited By (31)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106356054A (en) * 2016-11-23 2017-01-25 广西大学 Method and system for collecting information of agricultural products based on voice recognition
CN106886364A (en) * 2017-02-14 2017-06-23 深圳市金立通信设备有限公司 A kind of text handling method and terminal based on speech recognition
CN107066477A (en) * 2016-12-13 2017-08-18 合网络技术(北京)有限公司 A kind of method and device of intelligent recommendation video
CN107221330A (en) * 2017-05-26 2017-09-29 北京搜狗科技发展有限公司 Punctuate adding method and device, the device added for punctuate
CN107247706A (en) * 2017-06-16 2017-10-13 中国电子技术标准化研究院 Text punctuate method for establishing model, punctuate method, device and computer equipment
CN107247700A (en) * 2017-04-27 2017-10-13 北京捷通华声科技股份有限公司 A kind of method and device for adding text marking
CN107424612A (en) * 2017-07-28 2017-12-01 北京搜狗科技发展有限公司 Processing method, device and machine readable media
CN107679033A (en) * 2017-09-11 2018-02-09 百度在线网络技术(北京)有限公司 Text punctuate location recognition method and device
CN107767870A (en) * 2017-09-29 2018-03-06 百度在线网络技术(北京)有限公司 Adding method, device and the computer equipment of punctuation mark
CN108090038A (en) * 2016-11-11 2018-05-29 科大讯飞股份有限公司 Text punctuate method and system
CN108268429A (en) * 2017-06-15 2018-07-10 广东神马搜索科技有限公司 The determining method and apparatus of online literature chapters and sections
CN108597517A (en) * 2018-03-08 2018-09-28 深圳市声扬科技有限公司 Punctuation mark adding method, device, computer equipment and storage medium
CN108628813A (en) * 2017-03-17 2018-10-09 北京搜狗科技发展有限公司 Treating method and apparatus, the device for processing
CN108694939A (en) * 2018-05-23 2018-10-23 广州视源电子科技股份有限公司 Phonetic search optimization method, device and system
CN109145282A (en) * 2017-06-16 2019-01-04 贵州小爱机器人科技有限公司 Punctuate model training method, punctuate method, apparatus and computer equipment
CN109274845A (en) * 2018-08-31 2019-01-25 平安科技(深圳)有限公司 Intelligent sound pays a return visit method, apparatus, computer equipment and storage medium automatically
CN109558576A (en) * 2018-11-05 2019-04-02 中山大学 A kind of punctuation mark prediction technique based on from attention mechanism
CN109614627A (en) * 2019-01-04 2019-04-12 平安科技(深圳)有限公司 A kind of text punctuate prediction technique, device, computer equipment and storage medium
CN109708256A (en) * 2018-12-06 2019-05-03 珠海格力电器股份有限公司 A kind of voice determines method, apparatus, storage medium and air-conditioning
CN109754808A (en) * 2018-12-13 2019-05-14 平安科技(深圳)有限公司 Method, apparatus, computer equipment and the storage medium of voice conversion text
CN109783648A (en) * 2018-12-28 2019-05-21 北京声智科技有限公司 A method of ASR language model is improved using ASR recognition result
CN109979435A (en) * 2017-12-28 2019-07-05 北京搜狗科技发展有限公司 Data processing method and device, the device for data processing
CN110399878A (en) * 2019-06-14 2019-11-01 南京火眼锐视信息科技有限公司 Table format restoration methods, computer-readable medium and computer
CN110599028A (en) * 2019-09-09 2019-12-20 深圳前海微众银行股份有限公司 Text positioning method, device, equipment and storage medium
CN111326160A (en) * 2020-03-11 2020-06-23 南京奥拓电子科技有限公司 Speech recognition method, system and storage medium for correcting noise text
CN111785259A (en) * 2019-04-04 2020-10-16 北京猎户星空科技有限公司 Information processing method and device and electronic equipment
CN111985208A (en) * 2020-08-18 2020-11-24 沈阳东软智能医疗科技研究院有限公司 Method, device and equipment for realizing punctuation mark filling
CN112036174A (en) * 2019-05-15 2020-12-04 南京大学 Punctuation marking method and device
CN112151019A (en) * 2019-06-26 2020-12-29 阿里巴巴集团控股有限公司 Text processing method and device and computing equipment
CN112634876A (en) * 2021-01-04 2021-04-09 北京有竹居网络技术有限公司 Voice recognition method, voice recognition device, storage medium and electronic equipment
CN112712794A (en) * 2020-12-25 2021-04-27 苏州思必驰信息科技有限公司 Speech recognition marking training combined system and device

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6067514A (en) * 1998-06-23 2000-05-23 International Business Machines Corporation Method for automatically punctuating a speech utterance in a continuous speech recognition system
CN101876965A (en) * 2009-04-30 2010-11-03 国际商业机器公司 Method and system used for processing text
CN102231278A (en) * 2011-06-10 2011-11-02 安徽科大讯飞信息科技股份有限公司 Method and system for realizing automatic addition of punctuation marks in speech recognition
CN103020034A (en) * 2011-09-26 2013-04-03 北京大学 Chinese words segmentation method and device
CN103077164A (en) * 2012-12-27 2013-05-01 新浪网技术(中国)有限公司 Text analysis method and text analyzer
CN103971684A (en) * 2013-01-29 2014-08-06 腾讯科技(深圳)有限公司 Method and system for adding punctuations and method and device for establishing language model for adding punctuations
CN104142915A (en) * 2013-05-24 2014-11-12 腾讯科技(深圳)有限公司 Punctuation adding method and system

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6067514A (en) * 1998-06-23 2000-05-23 International Business Machines Corporation Method for automatically punctuating a speech utterance in a continuous speech recognition system
CN101876965A (en) * 2009-04-30 2010-11-03 国际商业机器公司 Method and system used for processing text
CN102231278A (en) * 2011-06-10 2011-11-02 安徽科大讯飞信息科技股份有限公司 Method and system for realizing automatic addition of punctuation marks in speech recognition
CN103020034A (en) * 2011-09-26 2013-04-03 北京大学 Chinese words segmentation method and device
CN103077164A (en) * 2012-12-27 2013-05-01 新浪网技术(中国)有限公司 Text analysis method and text analyzer
CN103971684A (en) * 2013-01-29 2014-08-06 腾讯科技(深圳)有限公司 Method and system for adding punctuations and method and device for establishing language model for adding punctuations
CN104142915A (en) * 2013-05-24 2014-11-12 腾讯科技(深圳)有限公司 Punctuation adding method and system

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
张合 等: ""一种基于层叠CRF的古文断句与句读标记方法"", 《计算机应用研究》 *
李航: "《统计学习方法》", 31 March 2012, 清华大学出版社 *
赵海 等: ""基于有效子串标注的中文分词"", 《中文信息学报》 *

Cited By (50)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108090038A (en) * 2016-11-11 2018-05-29 科大讯飞股份有限公司 Text punctuate method and system
CN108090038B (en) * 2016-11-11 2022-01-14 科大讯飞股份有限公司 Text sentence-breaking method and system
CN106356054A (en) * 2016-11-23 2017-01-25 广西大学 Method and system for collecting information of agricultural products based on voice recognition
CN107066477A (en) * 2016-12-13 2017-08-18 合网络技术(北京)有限公司 A kind of method and device of intelligent recommendation video
CN106886364A (en) * 2017-02-14 2017-06-23 深圳市金立通信设备有限公司 A kind of text handling method and terminal based on speech recognition
CN108628813B (en) * 2017-03-17 2022-09-23 北京搜狗科技发展有限公司 Processing method and device for processing
CN108628813A (en) * 2017-03-17 2018-10-09 北京搜狗科技发展有限公司 Treating method and apparatus, the device for processing
CN107247700A (en) * 2017-04-27 2017-10-13 北京捷通华声科技股份有限公司 A kind of method and device for adding text marking
CN107221330A (en) * 2017-05-26 2017-09-29 北京搜狗科技发展有限公司 Punctuate adding method and device, the device added for punctuate
CN108268429A (en) * 2017-06-15 2018-07-10 广东神马搜索科技有限公司 The determining method and apparatus of online literature chapters and sections
CN108268429B (en) * 2017-06-15 2021-08-06 阿里巴巴(中国)有限公司 Method and device for determining network literature chapters
CN107247706A (en) * 2017-06-16 2017-10-13 中国电子技术标准化研究院 Text punctuate method for establishing model, punctuate method, device and computer equipment
CN107247706B (en) * 2017-06-16 2021-06-25 中国电子技术标准化研究院 Text sentence-breaking model establishing method, sentence-breaking method, device and computer equipment
CN109145282B (en) * 2017-06-16 2023-11-07 贵州小爱机器人科技有限公司 Sentence-breaking model training method, sentence-breaking device and computer equipment
CN109145282A (en) * 2017-06-16 2019-01-04 贵州小爱机器人科技有限公司 Punctuate model training method, punctuate method, apparatus and computer equipment
CN107424612A (en) * 2017-07-28 2017-12-01 北京搜狗科技发展有限公司 Processing method, device and machine readable media
US11004448B2 (en) 2017-09-11 2021-05-11 Baidu Online Network Technology (Beijing) Co., Ltd. Method and device for recognizing text segmentation position
CN107679033B (en) * 2017-09-11 2021-12-14 百度在线网络技术(北京)有限公司 Text sentence break position identification method and device
CN107679033A (en) * 2017-09-11 2018-02-09 百度在线网络技术(北京)有限公司 Text punctuate location recognition method and device
CN107767870A (en) * 2017-09-29 2018-03-06 百度在线网络技术(北京)有限公司 Adding method, device and the computer equipment of punctuation mark
CN109979435B (en) * 2017-12-28 2021-10-22 北京搜狗科技发展有限公司 Data processing method and device for data processing
CN109979435A (en) * 2017-12-28 2019-07-05 北京搜狗科技发展有限公司 Data processing method and device, the device for data processing
CN108597517A (en) * 2018-03-08 2018-09-28 深圳市声扬科技有限公司 Punctuation mark adding method, device, computer equipment and storage medium
CN108597517B (en) * 2018-03-08 2020-06-05 深圳市声扬科技有限公司 Punctuation mark adding method and device, computer equipment and storage medium
CN108694939A (en) * 2018-05-23 2018-10-23 广州视源电子科技股份有限公司 Phonetic search optimization method, device and system
CN109274845A (en) * 2018-08-31 2019-01-25 平安科技(深圳)有限公司 Intelligent voice automatic callback method, device, computer equipment and storage medium
CN109558576B (en) * 2018-11-05 2023-05-23 中山大学 Punctuation mark prediction method based on self-attention mechanism
CN109558576A (en) * 2018-11-05 2019-04-02 中山大学 Punctuation mark prediction method based on self-attention mechanism
CN109708256B (en) * 2018-12-06 2020-07-03 珠海格力电器股份有限公司 Voice determination method and device, storage medium and air conditioner
CN109708256A (en) * 2018-12-06 2019-05-03 珠海格力电器股份有限公司 Voice determination method and device, storage medium and air conditioner
CN109754808A (en) * 2018-12-13 2019-05-14 平安科技(深圳)有限公司 Method, device, computer equipment and storage medium for converting voice into text
CN109754808B (en) * 2018-12-13 2024-02-13 平安科技(深圳)有限公司 Method, device, computer equipment and storage medium for converting voice into text
CN109783648A (en) * 2018-12-28 2019-05-21 北京声智科技有限公司 Method for improving an ASR language model using ASR recognition results
CN109783648B (en) * 2018-12-28 2020-12-29 北京声智科技有限公司 Method for improving ASR language model by using ASR recognition result
CN109614627B (en) * 2019-01-04 2023-01-20 平安科技(深圳)有限公司 Text punctuation prediction method and device, computer equipment and storage medium
CN109614627A (en) * 2019-01-04 2019-04-12 平安科技(深圳)有限公司 Text punctuation prediction method, device, computer equipment and storage medium
CN111785259A (en) * 2019-04-04 2020-10-16 北京猎户星空科技有限公司 Information processing method and device and electronic equipment
CN112036174B (en) * 2019-05-15 2023-11-07 南京大学 Punctuation marking method and device
CN112036174A (en) * 2019-05-15 2020-12-04 南京大学 Punctuation marking method and device
CN110399878A (en) * 2019-06-14 2019-11-01 南京火眼锐视信息科技有限公司 Table format restoration method, computer-readable medium and computer
CN110399878B (en) * 2019-06-14 2023-05-26 南京火眼锐视信息科技有限公司 Form format recovery method, computer readable medium and computer
CN112151019A (en) * 2019-06-26 2020-12-29 阿里巴巴集团控股有限公司 Text processing method and device and computing equipment
CN110599028B (en) * 2019-09-09 2022-05-17 深圳前海微众银行股份有限公司 Text positioning method, device, equipment and storage medium
CN110599028A (en) * 2019-09-09 2019-12-20 深圳前海微众银行股份有限公司 Text positioning method, device, equipment and storage medium
CN111326160A (en) * 2020-03-11 2020-06-23 南京奥拓电子科技有限公司 Speech recognition method, system and storage medium for correcting noise text
CN111985208A (en) * 2020-08-18 2020-11-24 沈阳东软智能医疗科技研究院有限公司 Method, device and equipment for realizing punctuation mark filling
CN111985208B (en) * 2020-08-18 2024-03-26 沈阳东软智能医疗科技研究院有限公司 Method, device and equipment for realizing punctuation mark filling
CN112712794A (en) * 2020-12-25 2021-04-27 苏州思必驰信息科技有限公司 Speech recognition marking training combined system and device
CN112634876A (en) * 2021-01-04 2021-04-09 北京有竹居网络技术有限公司 Voice recognition method, voice recognition device, storage medium and electronic equipment
CN112634876B (en) * 2021-01-04 2023-11-10 北京有竹居网络技术有限公司 Speech recognition method, device, storage medium and electronic equipment

Similar Documents

Publication Publication Date Title
CN105609107A (en) Text processing method and device based on voice identification
CN108417210B (en) Word embedding language model training method, word recognition method and system
CN110210029B (en) Method, system, device and medium for correcting error of voice text based on vertical field
CN106598939B (en) Text error correction method and device, server, and storage medium
Jung et al. An english to korean transliteration model of extended markov window
EP2486470B1 (en) System and method for inputting text into electronic devices
CN110797016B (en) Voice recognition method and device, electronic equipment and storage medium
CN108140019B (en) Language model generation device, language model generation method, and recording medium
WO2020001458A1 (en) Speech recognition method, device, and system
CN103869998B (en) Method and device for ranking candidate items generated by an input method
CN111739514B (en) Voice recognition method, device, equipment and medium
CN106935239A (en) Pronunciation dictionary construction method and device
CN103903619A (en) Method and system for improving accuracy of speech recognition
CN100592385C (en) Method and system for performing speech recognition on multi-language name
Kadyan et al. Refinement of HMM model parameters for punjabi automatic speech recognition (PASR) system
CN114580382A (en) Text error correction method and device
CN102369567B (en) Adaptation for statistical language model
CN102246169A (en) Assigning an indexing weight to a search term
CN110808032A (en) Voice recognition method and device, computer equipment and storage medium
CN1901041B (en) Speech dictionary forming method, and speech recognition system and method
JPWO2012165529A1 (en) Language model construction support apparatus, method and program
CN111435592A (en) Voice recognition method and device and terminal equipment
CN111354343A (en) Voice wake-up model generation method and device and electronic equipment
CN106713111A (en) Processing method for adding friends, terminal and server
CN111428487A (en) Model training method, lyric generation method, device, electronic equipment and medium

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20160525