CN1161703C - Integrated prediction searching method for Chinese continuous speech recognition - Google Patents

Publication number
CN1161703C
Authority
CN
China
Prior art keywords
speech
node
path
search
buffer zone
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Lifetime
Application number
CNB001249711A
Other languages
Chinese (zh)
Other versions
CN1346112A (en)
Inventor
波 徐
徐波
黄泰翼
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Automation of Chinese Academy of Science
Original Assignee
Institute of Automation of Chinese Academy of Science
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Automation of Chinese Academy of Science
Priority to CNB001249711A
Publication of CN1346112A
Application granted
Publication of CN1161703C
Anticipated expiration
Expired - Lifetime

Landscapes

  • Machine Translation (AREA)

Abstract

The present invention relates to an integrated prediction search method for Chinese continuous speech recognition and belongs to the field of automatic speech recognition. Its basic feature is that a tonal triphone model and a word trigram statistical language model are searched together in a single pass, and the language model is predicted during decoding. The invention addresses the integrated search method, the organization of the lexicon, the lookup of the predicted language-model score and the pruning of local search paths.

Description

Integrated prediction searching method for Chinese continuous speech recognition
Technical field
The invention belongs to the technical field of Chinese continuous speech recognition and relates in particular to an integrated prediction search method for Chinese continuous speech recognition. It is also applicable to any pattern-recognition or artificial-intelligence search problem in which knowledge sources must be integrated and combined.
Background art
The more successful approaches to speech recognition at present are based on statistical models. Their basic characteristic is a set of adjustable parameters that can be inferred directly from observed data. Let A denote the acoustic observation data that the recognizer is to decode, W a possible word sequence, and P(W|A) the probability that word sequence W was spoken given observation A. By statistical decision theory, the recognizer should decide according to:
$$\tilde{W} = \arg\max_{W} P(W \mid A)$$ (formula 1)
By Bayes' rule, P(W|A) = P(A|W) P(W) / P(A), and P(A) does not depend on W, so formula 1 can be rewritten as:
$$\tilde{W} = \arg\max_{W} P(A \mid W)\, P(W)$$ (formula 2)
where P(W) is the probability that word string W is spoken and P(A|W) is the probability of observing data A given that the spoken word string is W. The recognition system is illustrated in Fig. 1: the recognizer comprises front-end processing, the acoustic model P(A|W), the language model P(W) and the search algorithm. Given the acoustic model, the language model and the acoustic feature sequence, the search algorithm finds the word sequence $\tilde{W}$ with maximum probability.
The basic search algorithms are chiefly the time-synchronous Viterbi beam search and the depth-first A* search. After years of research aimed at reducing the huge computational cost of search, continuous speech recognition frameworks typified by multi-pass search have appeared. Their basic idea is to add higher-level acoustic-model and language-model knowledge gradually, using the result of the previous pass to guide and speed up the next pass. Multi-pass frameworks fall into two classes according to their intermediate output: the first directly outputs the N sentences with the highest probability scores (N-best); the second produces an intermediate word graph that serves as the grammar for the next pass. N-best sentences can in fact also be generated from a word graph, so the word graph can be regarded as an intermediate result of the N-best algorithm; their relationship is shown in Fig. 2.
Among the basic search algorithms, time-frame-synchronous search is widely adopted in speech recognition. It is essentially a dynamic programming technique based on Viterbi beam search and can be viewed as advancing through time on the grid of model states. Suppose the frame-synchronous search has processed time t, so that the partial observation sequence is $Y_1 Y_2 \ldots Y_t$, and that the current partial path $W_l$ is in unit model $\lambda_t$ and in state $s_t$ within that unit. The score of the partial path can then be defined as:
$$\Pr(W_l) = \prod_{i=1}^{t} \mathrm{Prob}(Y_i \mid \lambda_t, s_t, W_l)$$ (formula 3)
In formula 3, the states that can be expanded at time t+1 are constrained by the grammar: inside an HMM, $s_t$ is constrained by the HMM topology; the expansion of $\lambda_t$ is constrained by the lexicon; and the expansion of $W_l$ is constrained by the inter-word language model. In Chinese continuous speech recognition the most basic constraint is the lexicon, since both the search space and the relations between words, that is, the language model, depend on it. The lexicon is generally organized as a tree, as shown in Fig. 4. In that figure, each path from the root node to a leaf node represents one pronunciation, and the leaf node of the path corresponds to a group of homophones (such as "pressing" and "secretly" in Fig. 4). This representation fully shares the common initial portions of lexicon entries, reducing the number of search paths and improving search efficiency.
The lexicon data are organized as shown in Fig. 4. Let a partial path i in the search be represented by Path(i) = {$W_1$, $W_2$, n, s, ...}, where $W_1$ and $W_2$ are the two preceding history words of the path, n is the number of the tree node the path currently occupies, and s is the HMM state the path occupies within node n. When s is the last state of the HMM, the path jumps from node n to the successor nodes of n, as shown in Fig. 7. Suppose m is one of the successor nodes of n. When the path expands from n to m, path i becomes path j. Because the path has not yet reached a leaf node, the identity of its word is still undetermined, so its language probability is unchanged; and because the transition consumes no acoustic time, its acoustic score is unchanged as well. The overall score therefore does not change, i.e. Prob(Path_j) = Prob(Path_i). With this tree-structured expansion, the search can determine the word number only on reaching a leaf node of the tree, so linguistic knowledge is applied with a long delay, which can cause unrecoverable errors. Moreover, because the score differences between paths are small, and some scores are even identical, pruning becomes difficult.
In addition, most paths in the search have very low scores. Retaining them is impractical in both space and time, and also unnecessary. The paths can therefore be pruned dynamically during frame-synchronous search, discarding those with little promise. Among the N expanded paths, the current best path is:
$$W_M = \arg\max_{l} \Pr(W_l), \quad l \le N$$ (formula 4)
A threshold BEAM can be set so that all paths whose scores lie between $\Pr(W_M)$ and $\mathrm{BEAM} \cdot \Pr(W_M)$ are retained for the next expansion step and the remaining paths are deleted. Beam search thus greatly reduces the amount of search (to roughly the order of Beam times the length of the input observation vector sequence). Traditional pruning generally uses a single threshold: a partial path whose probability satisfies $P \le \mathrm{BEAM} \cdot \Pr(W_M)$ is pruned. However, because path scores change dynamically during the search, pruning too much introduces many search errors, while pruning too little slows recognition. The most direct remedy is to control the number of paths, but that requires operations such as sorting all paths, which adds computation.
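The single-threshold rule described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the function name `beam_prune`, the dictionary of path scores and the example values are hypothetical, and scores are treated as plain probabilities as in formula 4.

```python
def beam_prune(path_scores, beam):
    """Keep paths whose score exceeds beam * best score.

    path_scores: dict mapping a (hypothetical) path id to its probability.
    beam: threshold factor in (0, 1); smaller values keep more paths.
    """
    if not path_scores:
        return {}
    best = max(path_scores.values())      # Pr(W_M) in formula 4
    cutoff = beam * best
    # A path with P <= BEAM * Pr(W_M) is pruned, as stated in the text.
    return {pid: p for pid, p in path_scores.items() if p > cutoff}


# Hypothetical scores for four partial paths at one frame.
scores = {"a": 0.020, "b": 0.011, "c": 0.0004, "d": 0.000009}
kept = beam_prune(scores, beam=0.01)
print(sorted(kept))
```

The number of survivors depends entirely on how the scores happen to be distributed at each frame, which is the weakness the multi-threshold strategy of the invention is meant to address.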
Summary of the invention
The object of the invention is to make full use of powerful tonal triphone acoustic models and a word trigram language model and to find an optimal result in a single pass, overcoming problems common to multi-pass search frameworks: 1) multi-pass search cannot bring all knowledge sources together for decoding, so the algorithm is not optimal and errors propagate and grow; 2) multi-pass search uses relatively simple acoustic and language models in the early passes, which makes errors more likely.
A further object of the invention is to make language-model prediction possible through the design of the lexicon organization, so that the language probability can be added without first reaching a leaf node, which speeds up the search. The integrated prediction search method for Chinese continuous speech recognition of the invention is characterized in that a tonal triphone model and a word trigram statistical language model are searched in a single integrated pass, and language-model prediction is performed during decoding; the core search algorithm is a time-frame-synchronous search with multi-threshold pruning, and the special structure of the lexicon and the trigram statistical model are used to look up the predicted language-model score during the search;
The search lexicon is organized as a tree and has the following structural features:
1) the words in the lexicon are numbered, on the principle that the word numbers agree with the order of the words at the leaf nodes after tree organization;
2) in the tree-organized lexicon, each node, which represents a model, holds two word numbers Wx and Wy that give the range of words reachable from the node, i.e. the words that can be expanded from the node lie between Wx and Wy;
3) if node m is obtained by expanding node n, then necessarily:
$W_{nx} \le W_{mx} \le W_{my} \le W_{ny}$, where $W_{mx}$, $W_{my}$ bound the word range reachable from node m and $W_{nx}$, $W_{ny}$ bound the word range reachable from node n;
The acoustic model is a tonal triphone model. The model of a final in the tonal triphone set depends not only on the initials to its left and right but also on the tones of the second final to the left and the second final to the right and on its own tone. Tone information is therefore added to the tree organization of the lexicon: by attaching the tone information of a final to the corresponding initial of the same syllable, only one layer of nodes needs to be pre-expanded during the search;
A multi-threshold beam search is used. Set n probability thresholds $P_0, P_1, P_2, \ldots, P_n$, where $P_i$ is the i-th threshold; the algorithm is as follows:
A) determine the score interval into which a path with score P falls: if $P_i \ge P \ge P_{i+1}$, the path is regarded as falling in the i-th interval, and the path counter $C_i$ of that interval is incremented by 1;
B) for i = 1, ..., N, compute the cumulative number of paths $S_i$ in intervals 1 through i:
$$S_i = \sum_{j=1}^{i} C_j$$
where $C_j$ is the path counter of the j-th interval; find the smallest i satisfying $S_i \ge$ CountThread; the pruning threshold is then $P_i$, where CountThread is the number of active paths, controlled according to system requirements;
C) prune the paths according to threshold $P_i$;
Trigram language-model prediction is used. When node n is expanded to node m, the prediction is computed as:
$$\mathrm{Prob}(Path_j) = [\mathrm{Prob}(Path_i) - \mathrm{ProbLm}(W_1, W_2, n)] + \mathrm{ProbLm}(W_1, W_2, m)$$
In this formula, $\mathrm{ProbLm}(W_1, W_2, n)$ and $\mathrm{ProbLm}(W_1, W_2, m)$ denote the maximum trigram probability of $W_1$, $W_2$ followed by any word reachable from node n or m respectively, i.e. $\mathrm{ProbLm}(W_1, W_2, n) = \max \mathrm{Prob}(W_1, W_2, W_3)$ or $\mathrm{ProbLm}(W_1, W_2, m) = \max \mathrm{Prob}(W_1, W_2, W_3)$, where $W_3$ ranges over all words reachable from node n or m; $\mathrm{Prob}(Path_j)$ and $\mathrm{Prob}(Path_i)$ are the probability scores of paths j and i;
A probability lookup buffer is set up. Each entry of this buffer, ProbBuffer, consists of four elements: {$W_1$, $W_2$, $W_3$, l, MaxLm}, where $W_1$, $W_2$, $W_3$ is a word triple, l is a node number, and MaxLm is the cached probability to be retrieved. During the search, when the maximum trigram probability of words $W_1$, $W_2$ with all words reachable from node m is needed, the buffer is consulted first:
1) if $W_1$, $W_2$, m is found in the buffer, MaxLm is output directly;
2) if $W_1$, $W_2$, m is not found in the buffer but $W_1$, $W_2$, n is found, where n is the parent node of m, and $W_{mx} \le W_3 \le W_{my}$ holds, MaxLm is output directly; the meaning of $W_{mx}$ and $W_{my}$ has been explained in item 2) above;
3) otherwise the probability is retrieved directly.
A probability lookup buffer is also set up in which each entry consists of four elements: ProbBuffer = {$W_1$, $W_2$, $W_3$, n, MaxLm}. During the search, when the function ProbBLm($W_1$, $W_2$, m) needs to be called, the buffer is consulted first:
1) if $W_1$, $W_2$, m is found in the buffer, MaxLm is output directly;
2) if $W_1$, $W_2$, m is not found in the buffer but $W_1$, $W_2$, n is found, where n is the parent node of m, and Wmx<=Wmy holds, MaxLm is output directly;
3) otherwise ProbBLm is retrieved directly from the language model.
Description of drawings
To further explain the technical content of the invention, it is described in detail below with reference to an embodiment and the accompanying drawings, in which:
Fig. 1 is a block diagram of the speech recognition system;
Fig. 2 shows the relationship between N-best and the word graph;
Fig. 3 shows the integrated prediction search framework;
Fig. 4 is a schematic of the tree representation of the lexicon;
Fig. 5 shows how tone information is marked in the tree;
Fig. 6 is a schematic of multi-threshold pruning;
Fig. 7 is a schematic of the lexicon tree representation used for prediction;
Fig. 8 is a schematic of the probability lookup.
Embodiment
The concept of the invention is illustrated in Fig. 3. The core of this search framework is still a time-synchronous frame search algorithm, whose inputs are the search lexicon, the statistical trigram language model, the tonal triphone model, word prediction and the stream of speech recognition features. Compared with the framework of Fig. 2, Fig. 3 produces no intermediate results. Fig. 2 also needs several sets of acoustic and language models, simple ones in the early passes and more complex ones later, whereas Fig. 3 uses the most advanced acoustic and language models directly. In this framework the acoustic model input is the tonal triphone model itself. The technical essentials of the invention are as follows:
1. Multi-threshold pruning strategy for search
Multi-threshold pruning is illustrated in Fig. 6. Set n thresholds $P_0, P_1, P_2, \ldots, P_n$, where $P_0$ is the highest path probability at the current time. The algorithm is as follows:
A) determine the score interval into which a path with score P falls: if $P_i \ge P \ge P_{i+1}$, the path is regarded as falling in the i-th interval, and the path counter $C_i$ of that interval is incremented by 1;
B) for i = 1, ..., N, compute the cumulative number of paths $S_i$ in intervals 1 through i:
$$S_i = \sum_{j=1}^{i} C_j$$
where $C_j$ is the path counter of the j-th interval; find the smallest i satisfying $S_i \ge$ CountThread; the pruning threshold is then $P_i$. CountThread is the number of active paths, controlled according to system requirements.
C) prune the paths according to threshold $P_i$.
This procedure controls the number of paths to be expanded fairly accurately. The empirical threshold values can be obtained from statistics.
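Steps A) to C) can be sketched in a few lines. The sketch below is an illustration under assumptions, not the patented implementation: the thresholds are taken to be fixed fractions of the best score (`ratios`, a hypothetical parameter), interval i is taken to be the band between $P_{i-1}$ and $P_i$ so that cutting at $P_i$ keeps exactly $S_i$ paths, and the function name and example values are invented for the example.

```python
def multi_threshold_prune(scores, ratios, count_thread):
    """Pick a pruning threshold from a fixed ladder and apply it.

    scores: list of path probabilities at the current frame.
    ratios: descending factors with ratios[0] == 1.0, so that
            P_i = ratios[i] * best score (assumed form of the thresholds).
    count_thread: target number of active paths (CountThread in the text).
    Returns (chosen threshold, surviving scores).
    """
    best = max(scores)                           # P_0
    thresholds = [r * best for r in ratios]      # P_0 .. P_n
    n = len(thresholds) - 1
    counts = [0] * (n + 1)                       # counts[i] = C_i for i = 1..n

    # Step A: one pass over the paths; no sorting is needed.
    for p in scores:
        for i in range(1, n + 1):
            if p >= thresholds[i]:               # p lies between P_i and P_{i-1}
                counts[i] += 1
                break                            # paths below P_n are never counted

    # Step B: smallest i whose cumulative count S_i reaches CountThread.
    cut = thresholds[n]                          # fall back to the loosest threshold
    cumulative = 0
    for i in range(1, n + 1):
        cumulative += counts[i]
        if cumulative >= count_thread:
            cut = thresholds[i]
            break

    # Step C: prune with the chosen threshold.
    return cut, [p for p in scores if p >= cut]


# Hypothetical frame with seven active paths and a four-step threshold ladder.
frame_scores = [0.9, 0.8, 0.5, 0.3, 0.05, 0.002, 0.00001]
ladder = [1.0, 0.5, 0.1, 0.01, 0.001]
cut, survivors = multi_threshold_prune(frame_scores, ladder, count_thread=4)
print(cut, survivors)
```

Because the paths are only counted into a handful of bins, the cost is linear in the number of paths, which matches the motivation given earlier for avoiding a full sort.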
2. Lexicon organization
The knowledge in the language model must be fully used to predict path scores. The invention therefore adds one further piece of information: the set of leaf nodes, i.e. the set of words, reachable by expansion from a given node n. For nodes in the first few layers near the root, the set of reachable leaf nodes and words is very large, and recording the set directly is impractical. In the invention, the words of the original lexicon are renumbered so that the numbering agrees with the order of the words at the leaf nodes of the lexicon tree. With this ordering the set can be described very compactly: it suffices to record the numbers of the first and last words in the set reachable from the node. Each node therefore has a Wx and a Wy, as shown in Fig. 7. Clearly, if node m is expanded from node n, then necessarily:
$$W_{nx} \le W_{mx} \le W_{my} \le W_{ny}$$ (formula 5)
where $W_{mx}$, $W_{my}$ bound the word range reachable from node m and $W_{nx}$, $W_{ny}$ bound the word range reachable from node n.
As mentioned above, the model number of a final depends on the initials to its left and right, the tones of the second finals to the left and right, and its own tone. When a final is expanded during the search, the left context is known but the right context is not, so pre-expansion is necessary. By the above, the following two layers of nodes would actually need to be pre-expanded, which makes the number of paths grow sharply. In this word tree structure, the tone information of a final is attached to the corresponding initial of the same syllable, so that only one layer of nodes needs to be pre-expanded during the search, as shown in Fig. 5.
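The renumbering and the per-node word range can be sketched as below. This is only an illustration: the `Node` class, the toy lexicon, and the use of plain initial and tonal-final strings as tree units are assumptions made for the example, and tone attachment to initials (Fig. 5) is not modeled.

```python
class Node:
    """One node of the lexicon tree (hypothetical layout)."""

    def __init__(self, unit):
        self.unit = unit        # acoustic unit label, None for the root
        self.children = {}      # unit label -> child Node
        self.words = []         # homophones that end at this node
        self.wx = None          # number of the first reachable word
        self.wy = None          # number of the last reachable word


def build_tree(lexicon):
    """lexicon: iterable of (word, [unit, unit, ...]) pairs."""
    root = Node(None)
    for word, units in lexicon:
        node = root
        for u in units:
            if u not in node.children:
                node.children[u] = Node(u)
            node = node.children[u]
        node.words.append(word)
    return root


def number_words(node, word_ids, next_id=0):
    """Depth-first numbering: words get consecutive ids in leaf order,
    so the words below any node form one contiguous range [wx, wy]."""
    node.wx = next_id
    for w in node.words:
        word_ids[w] = next_id
        next_id += 1
    for key in sorted(node.children):
        next_id = number_words(node.children[key], word_ids, next_id)
    node.wy = next_id - 1
    return next_id


# Toy lexicon (an assumption for the example): two homophones share one leaf.
lexicon = [("是", ["sh", "i4"]), ("事", ["sh", "i4"]),
           ("山", ["sh", "an1"]), ("马", ["m", "a3"])]
root = build_tree(lexicon)
word_ids = {}
number_words(root, word_ids)
sh = root.children["sh"]
print(word_ids, (sh.wx, sh.wy), (sh.children["i4"].wx, sh.children["i4"].wy))
```

Storing two integers per node is what later lets the decoder ask for the best trigram score over "all words below this node" without materializing the word set.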
3. Language model prediction
The improved algorithm can add effective linguistic knowledge at any point of tree-node expansion during the search, thereby improving the recognition rate and minimizing the large amount of time spent on search. In this case the probability score when path i is expanded into path j becomes:
$$\mathrm{Prob}(Path_j) = [\mathrm{Prob}(Path_i) - \mathrm{ProbLm}(W_1, W_2, n)] + \mathrm{ProbLm}(W_1, W_2, m)$$ (formula 6)
In formula 6, $\mathrm{ProbLm}(W_1, W_2, n)$ and $\mathrm{ProbLm}(W_1, W_2, m)$ denote the maximum trigram probability of $W_1$, $W_2$ followed by any word reachable from node n or m respectively, i.e.
$$\mathrm{ProbLm}(W_1, W_2, n) = \max \mathrm{Prob}(W_1, W_2, W_3) \quad \text{or} \quad \mathrm{ProbLm}(W_1, W_2, m) = \max \mathrm{Prob}(W_1, W_2, W_3)$$ (formula 7)
where $W_3$ ranges over all words reachable from node n or m, and $\mathrm{Prob}(Path_j)$ and $\mathrm{Prob}(Path_i)$ are the probability scores of paths j and i. The point of formula 7 is that each time a new node is expanded, the closest available language probability is added to the path, so that language probability information enters the search early, improving both search speed and recognition accuracy.
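A small sketch of the score update in formula 6 follows. It assumes that scores are log probabilities (which is what the subtraction and addition in the formula suggest), that each node carries a contiguous word range [wx, wy], and that the trigram scores for one fixed history are available as a list indexed by word number; the function names, the example values, and the choice to add the root lookahead on entering the tree are all hypothetical.

```python
import math


def prob_lm(trigram_row, wx, wy):
    """ProbLm(W1, W2, node): best log P(w3 | W1, W2) over word numbers wx..wy."""
    return max(trigram_row[wx:wy + 1])


def expand(path_score, trigram_row, parent_range, child_range):
    """Formula 6 in the log domain: replace the parent's lookahead by the child's."""
    return (path_score
            - prob_lm(trigram_row, *parent_range)
            + prob_lm(trigram_row, *child_range))


# Hypothetical log P(w3 | W1, W2) for word numbers 0..3 under one fixed history.
row = [math.log(p) for p in (0.1, 0.4, 0.3, 0.2)]

score = 0.0 + prob_lm(row, 0, 3)              # enter the root: best over all words
score = expand(score, row, (0, 3), (2, 3))    # child node covers words 2..3
score = expand(score, row, (2, 3), (3, 3))    # leaf: word 3 is now certain
print(math.isclose(score, math.log(0.2)))
```

The successive corrections telescope, so by the time the path reaches a leaf the language contribution equals the exact trigram score of the word, while earlier frames already benefit from an optimistic estimate.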
4. Prediction probability lookup
Retrieving $\mathrm{ProbLm}(W_1, W_2, m)$ as above can take roughly 20% of the time in continuous speech recognition. Extensive observation shows that within a given period the function is very likely to be called repeatedly with the same parameters $W_1$, $W_2$, m. Fig. 8 explains this repetition: suppose that at some time t there are 5 paths at the same word-tree node n with the same history words w1, w2, differing only in HMM state, namely states 0, 1, 2, 3 and 4. At time t the path in state 4 expands to the next node m, which requires retrieving ProbLm(w1, w2, m). At time t+1 the path that was in state s=3 moves to state 4 and then expands to node m, which requires the same probability. At time t+2 the path originally in state 2 likewise expands to node m.
It can further be observed that every path expands from a parent node, and the word with the maximum trigram probability reachable from the parent node n may fall within the range reachable from node m. In that case:
$$\mathrm{ProbLm}(W_1, W_2, m) = \mathrm{ProbLm}(W_1, W_2, n) \quad \text{if } W_{nx} \le W_{mx} \le W_3 \le W_{my} \le W_{ny}$$
Based on these observations, a probability buffer is set up. Each entry of this buffer, ProbBuffer, consists of four elements: {$W_1$, $W_2$, $W_3$, l, MaxLm}, where $W_1$, $W_2$, $W_3$ is a word triple, l is a node number, and MaxLm is the cached probability to be retrieved. During the search, when the maximum trigram probability of words $W_1$, $W_2$ with all words reachable from node m is needed, the buffer is consulted first:
1) if $W_1$, $W_2$, m is found in the buffer, MaxLm is output directly;
2) if $W_1$, $W_2$, m is not found in the buffer but $W_1$, $W_2$, n is found, where n is the parent node of m, and $W_{mx} \le W_3 \le W_{my}$ holds, MaxLm is output directly; the meaning of $W_{mx}$ and $W_{my}$ has been explained in section 2 above.
3) otherwise the probability is retrieved directly.
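The three lookup cases can be sketched as a small cache. This is an illustration under assumptions rather than the patent's data structure: entries are kept in a Python dictionary keyed by (W1, W2, node), the trigram table is a dictionary of per-history lists of log probabilities, and the class name, method signature and `direct_lookups` counter are invented for the example.

```python
import math


class ProbBuffer:
    """Cache of best-trigram lookups, keyed by (w1, w2, node id)."""

    def __init__(self, trigram_rows):
        self.rows = trigram_rows    # (w1, w2) -> list of log P(w3 | w1, w2)
        self.entries = {}           # (w1, w2, node id) -> (best w3, MaxLm)
        self.direct_lookups = 0     # how often case 3 was needed

    def lookup(self, w1, w2, node_id, wx, wy, parent_id=None):
        key = (w1, w2, node_id)
        hit = self.entries.get(key)
        if hit is not None:                          # case 1: exact entry found
            return hit[1]
        if parent_id is not None:
            parent = self.entries.get((w1, w2, parent_id))
            # Case 2: the parent's best word lies inside this node's range,
            # so the parent's maximum is also this node's maximum.
            if parent is not None and wx <= parent[0] <= wy:
                self.entries[key] = parent
                return parent[1]
        self.direct_lookups += 1                     # case 3: search the model
        row = self.rows[(w1, w2)]
        best_w3 = max(range(wx, wy + 1), key=row.__getitem__)
        self.entries[key] = (best_w3, row[best_w3])
        return row[best_w3]


# Hypothetical history (word numbers 7, 9) and four candidate next words.
rows = {(7, 9): [math.log(p) for p in (0.1, 0.4, 0.3, 0.2)]}
buf = ProbBuffer(rows)
a = buf.lookup(7, 9, "n", 0, 3)                      # direct lookup
b = buf.lookup(7, 9, "m1", 0, 1, parent_id="n")      # reuses the parent's entry
c = buf.lookup(7, 9, "m2", 2, 3, parent_id="n")      # best word outside range
d = buf.lookup(7, 9, "m2", 2, 3, parent_id="n")      # exact hit on second call
print(buf.direct_lookups)
```

Storing the best word $W_3$ alongside MaxLm is what makes case 2 possible: without it the cache could not tell whether the parent's maximum is attained inside the child's range.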
The advantage of the invention is that, in view of the shortcomings of the search algorithms described above, and in particular the need in Chinese to integrate the additional suprasegmental information of tone, the various inputs required for continuous speech recognition, such as the acoustic feature sequence, the lexicon, the acoustic model and the language model, are processed in a single pass. The resulting recognition method, which yields the word sequence that is optimal in the probabilistic sense, makes full and effective use of all available knowledge sources, thereby minimizing search errors and improving search efficiency.
It should be emphasized that, although the invention is described above as implemented in a single-pass search framework for Chinese continuous speech recognition, its principles and algorithms apply to the search problem of any speech recognition task.
Application examples of the invention:
1. Effect of integrated prediction decoding in continuous speech recognition
This decoding method and the associated algorithms were first implemented and tested in a large-vocabulary continuous speech recognition system. The complete system comprises training speech, corpus collection and processing, acoustic model training, language model training, integrated prediction decoding and a speech test set. The whole apparatus is implemented on a PC platform with a standard sound card and an external microphone.
The test set is the national "863" standard test database, which consists of speech from 6 men and 6 women, 40 sentences each, 480 sentences in total, with sentences selected from the People's Daily. With this search algorithm the recognition rate improved by more than 6%, while recognition speed remained essentially the same.
2. Application in interactive systems
Completed applications include the travel information consultation system LoadStar, a hotel reservation system and a restaurant translation assistance system. By simply replacing the vocabulary and the language model, the invention can be ported very easily to systems in different task domains, which also shows that the invention is independent of the particular application, vocabulary and language model. Each system consists of 5 modules: speech recognition, language understanding, dialogue management, language response generation and speech synthesis. The speech recognition module includes the algorithm implemented according to the invention.

Claims (2)

1. An integrated prediction search method for Chinese continuous speech recognition, characterized in that a tonal triphone model and a word trigram statistical language model are searched in a single integrated pass, and language-model prediction is performed during decoding; the core search algorithm is a time-frame-synchronous search with multi-threshold pruning, and the special structure of the lexicon and the trigram statistical model are used to look up the predicted language-model score during the search;
the search lexicon is organized as a tree and has the following structural features:
1) the words in the lexicon are numbered, on the principle that the word numbers agree with the order of the words at the leaf nodes after tree organization;
2) in the tree-organized lexicon, each node, which represents a model, holds two word numbers Wx and Wy that give the range of words reachable from the node, i.e. the words that can be expanded from the node lie between Wx and Wy;
3) if node m is obtained by expanding node n, then necessarily:
$W_{nx} \le W_{mx} \le W_{my} \le W_{ny}$, where $W_{mx}$, $W_{my}$ bound the word range reachable from node m and $W_{nx}$, $W_{ny}$ bound the word range reachable from node n;
the acoustic model is a tonal triphone model, in which the model of a final depends not only on the initials to its left and right but also on the tones of the second final to the left and the second final to the right and on its own tone; tone information is therefore added to the tree organization of the lexicon, and by attaching the tone information of a final to the corresponding initial of the same syllable, only one layer of nodes needs to be pre-expanded during the search;
a multi-threshold beam search is used; n probability thresholds $P_0, P_1, P_2, \ldots, P_n$ are set, where $P_i$ is the i-th threshold, and the algorithm is as follows:
A) determine the score interval into which a path with score P falls: if $P_i \ge P \ge P_{i+1}$, the path is regarded as falling in the i-th interval, and the path counter $C_i$ of that interval is incremented by 1;
B) for i = 1, ..., N, compute the cumulative number of paths $S_i$ in intervals 1 through i:
$$S_i = \sum_{j=1}^{i} C_j$$
where $C_j$ is the path counter of the j-th interval; find the smallest i satisfying $S_i \ge$ CountThread; the pruning threshold is then $P_i$, where CountThread is the number of active paths, controlled according to system requirements;
C) prune the paths according to threshold $P_i$;
trigram language-model prediction is used; when node n is expanded to node m, the prediction is computed as:
$$\mathrm{Prob}(Path_j) = [\mathrm{Prob}(Path_i) - \mathrm{ProbLm}(W_1, W_2, n)] + \mathrm{ProbLm}(W_1, W_2, m)$$
in this formula, $\mathrm{ProbLm}(W_1, W_2, n)$ and $\mathrm{ProbLm}(W_1, W_2, m)$ denote the maximum trigram probability of $W_1$, $W_2$ followed by any word reachable from node n or m respectively, i.e. $\mathrm{ProbLm}(W_1, W_2, n) = \max \mathrm{Prob}(W_1, W_2, W_3)$ or $\mathrm{ProbLm}(W_1, W_2, m) = \max \mathrm{Prob}(W_1, W_2, W_3)$, where $W_3$ ranges over all words reachable from node n or m, and $\mathrm{Prob}(Path_j)$ and $\mathrm{Prob}(Path_i)$ are the probability scores of paths j and i;
a probability lookup buffer is set up, each entry of this buffer, ProbBuffer, consisting of four elements: {$W_1$, $W_2$, $W_3$, l, MaxLm}, where $W_1$, $W_2$, $W_3$ is a word triple, l is a node number, and MaxLm is the cached probability to be retrieved; during the search, when the maximum trigram probability of words $W_1$, $W_2$ with all words reachable from node m is needed, the buffer is consulted first:
1) if $W_1$, $W_2$, m is found in the buffer, MaxLm is output directly;
2) if $W_1$, $W_2$, m is not found in the buffer but $W_1$, $W_2$, n is found, where n is the parent node of m, and $W_{mx} \le W_3 \le W_{my}$ holds, MaxLm is output directly; the meaning of $W_{mx}$ and $W_{my}$ has been explained in item 2) above;
3) otherwise the probability is retrieved directly.
2. The integrated prediction search method for Chinese continuous speech recognition according to claim 1, characterized in that a probability lookup buffer is also set up in which each entry consists of four elements: ProbBuffer = {$W_1$, $W_2$, $W_3$, n, MaxLm}; during the search, when the function ProbBLm($W_1$, $W_2$, m) needs to be called, the buffer is consulted first:
1) if $W_1$, $W_2$, m is found in the buffer, MaxLm is output directly;
2) if $W_1$, $W_2$, m is not found in the buffer but $W_1$, $W_2$, n is found, where n is the parent node of m, and Wmx<=Wmy holds, MaxLm is output directly;
3) otherwise ProbBLm is retrieved directly from the language model.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CNB001249711A CN1161703C (en) 2000-09-27 2000-09-27 Integrated prediction searching method for Chinese continuous speech recognition

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CNB001249711A CN1161703C (en) 2000-09-27 2000-09-27 Integrated prediction searching method for Chinese continuous speech recognition

Publications (2)

Publication Number Publication Date
CN1346112A CN1346112A (en) 2002-04-24
CN1161703C true CN1161703C (en) 2004-08-11

Family

ID=4590774

Family Applications (1)

Application Number Title Priority Date Filing Date
CNB001249711A Expired - Lifetime CN1161703C (en) 2000-09-27 2000-09-27 Integrated prediction searching method for Chinese continuous speech recognition

Country Status (1)

Country Link
CN (1) CN1161703C (en)

Legal Events

Date Code Title Description
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C06 Publication
PB01 Publication
C14 Grant of patent or utility model
GR01 Patent grant
EE01 Entry into force of recordation of patent licensing contract

Application publication date: 20020424

Assignee: Beijing Zidong Ruiyi Speech Technology Co., Ltd.

Assignor: Institute of Automation, Chinese Academy of Sciences

Contract record no.: 2015110000014

Denomination of invention: Integrated prediction searching method for Chinese continuous speech recognition

Granted publication date: 20040811

License type: Common License

Record date: 20150519

LICC Enforcement, change and cancellation of record of contracts on the licence for exploitation of a patent or utility model
EE01 Entry into force of recordation of patent licensing contract

Application publication date: 20020424

Assignee: Taro Technology (Hangzhou) Co., Ltd.

Assignor: Beijing Zidong Ruiyi Speech Technology Co., Ltd.

Contract record no.: 2015110000050

Denomination of invention: Integrated prediction searching method for Chinese continuous speech recognition

Granted publication date: 20040811

License type: Common License

Record date: 20151130

LICC Enforcement, change and cancellation of record of contracts on the licence for exploitation of a patent or utility model
CX01 Expiry of patent term

Granted publication date: 20040811
