CN102592595B

CN102592595B - Voice recognition method and system

Info

Publication number: CN102592595B
Application number: CN2012100734129A
Authority: CN
Inventors: 潘青华; 鹿晓亮; 何婷婷; 王智国; 胡国平; 胡郁; 刘庆峰
Original assignee: iFlytek Co Ltd
Current assignee: iFlytek Co Ltd
Priority date: 2012-03-19
Filing date: 2012-03-19
Publication date: 2013-05-29
Anticipated expiration: 2032-03-19
Also published as: CN102592595A

Abstract

The invention relates to the technical field of voice recognition and discloses a voice recognition method and system. The method comprises the following steps: building a decoding recognition network; decoding each frame of a received voice signal according to the decoding recognition network and, during decoding, exciting the historical paths of active nodes according to hot words so as to raise the cumulative probability of the historical paths that contain hot words; after the last voice signal frame has been decoded, selecting the active node with the maximum cumulative probability as the optimal node; and tracing back from the optimal node through the decoding states to obtain the optimal path and the corresponding word sequence. The invention avoids re-estimating system parameters and recognizes hot words and user-personalized words quickly and accurately, thereby improving hot-word recognition.

Description

Speech recognition method and system

Technical field

The present invention relates to the field of speech recognition technology, and in particular to a speech recognition method and system.

Background technology

Achieving natural, intelligent, and effective human-machine interaction, and building an efficient and natural environment for human-machine communication, has become a pressing need in the application and development of information technology. In recent years, with the rapid development of speech recognition technology, online speech recognition applications such as voice input and voice search have attracted increasing attention. A system trained on massive data can meet the needs of common voice input, and its recognition accuracy is often high, especially when the spoken content matches the probability distribution of the original language model. In practice, however, the rapid development of the mobile Internet and social networks continually produces new hot topics and corresponding hot vocabulary, and different users also need different personalized vocabulary recognized, such as contact names. Because such hot or personalized vocabulary is timely and specific, it tends to occur infrequently in the originally collected corpus. The original language model therefore often covers it poorly, and the recognition system cannot accurately recognize such hot words.

To address this, the prior art often re-estimates system parameters: newly collected hot-word corpus material is added to the original corpus, and a new language model is retrained to improve recognition accuracy for the newly added hot words. In practice, however, hot words are updated frequently, and the system cannot collect enough corpus material in time for the re-estimation, which degrades hot-word recognition. Moreover, retraining the language model and rebuilding the recognition system resources (such as a decoding recognition network based on WFST (Weighted Finite-State Transducers)) is often time-consuming and costly, so a quick response to hot-word recognition cannot be achieved.

Summary of the invention

The embodiments of the invention provide a speech recognition method and system to solve the technical problem that the prior art cannot quickly and accurately recognize hot vocabulary and user-personalized vocabulary.

To this end, the embodiments of the invention provide the following technical solution:

A speech recognition method, comprising:

building a decoding recognition network;

for a received voice signal, decoding each voice signal frame according to the decoding recognition network and, during decoding, exciting the historical paths of active nodes according to hot words, so as to raise the cumulative historical path probability of the paths containing hot words;

after the last voice signal frame has been decoded, selecting the active node with the maximum cumulative probability as the optimal node;

tracing back from the optimal node through the decoding states to obtain the optimal path and the corresponding word sequence.

A speech recognition system, comprising:

a network building unit, configured to build a decoding recognition network;

a decoding unit, configured to decode, for a received voice signal, each voice signal frame according to the decoding recognition network;

an excitation unit, configured to excite the historical paths of active nodes according to hot words during decoding by the decoding unit, so as to raise the cumulative historical path probability of the paths containing hot words;

an optimal node determination unit, configured to select, after the decoding unit has decoded the last voice signal frame, the active node with the maximum cumulative probability as the optimal node;

a traceback unit, configured to trace back from the optimal node through the decoding states to obtain the optimal path and the corresponding word sequence.

The speech recognition method and system of the embodiments excite the historical paths of active nodes based on hot-word matching, raising the cumulative historical path probability of the paths containing hot words; this effectively boosts hot-word recognition and improves its results. No re-estimation of system parameters is needed, and hot vocabulary and user-personalized vocabulary can be recognized quickly and accurately. The approach also offers a feasible way for the system to support a customized personalized dictionary or personalized language model: a user can obtain recognition support for personalized vocabulary simply by updating the hot-word entries supported by the system.

Description of drawings

To describe the technical solutions of the embodiments more clearly, the drawings used in the embodiments are briefly introduced below. Obviously, the drawings described below show only some embodiments of the invention; a person of ordinary skill in the art can derive other drawings from them without creative effort.

Fig. 1 is a flowchart of the speech recognition method of an embodiment of the invention;

Fig. 2 is a schematic diagram of a WFST-based decoding recognition network in an embodiment of the invention;

Fig. 3 is a schematic diagram of the tree-structured hot word dictionary in an embodiment of the invention;

Fig. 4 is a flowchart of exciting the decoded historical paths according to the hot word set in an embodiment of the invention;

Fig. 5 is a flowchart of exciting the decoded historical paths according to the hot word dictionary in an embodiment of the invention;

Fig. 6 is a flowchart of one implementation of updating the cumulative historical path probability of a historical path according to its extension result in an embodiment of the invention;

Fig. 7 is a flowchart of another implementation of updating the cumulative historical path probability of a historical path according to its extension result in an embodiment of the invention;

Fig. 8 is a schematic diagram of a specific hot word dictionary in an embodiment of the invention;

Fig. 9 is a structural diagram of the speech recognition system of an embodiment of the invention;

Fig. 10 is a structural diagram of one specific application of the speech recognition system of an embodiment of the invention;

Fig. 11 is a structural diagram of another specific application of the speech recognition system of an embodiment of the invention;

Fig. 12 is a structural diagram of one form of the excitation subunit in an embodiment of the invention;

Fig. 13 is a structural diagram of another form of the excitation subunit in an embodiment of the invention.

Embodiment

The technical solutions in the embodiments of the invention are described clearly and completely below with reference to the drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the invention. All other embodiments obtained by a person of ordinary skill in the art from these embodiments without creative effort fall within the scope of protection of the invention.

To make the solution of the embodiments easier to understand, and to better show how it differs from existing speech recognition schemes, the basic speech recognition method of the prior art is briefly described first.

In the prior art, the semantic network of the language model is usually expanded, by means of the acoustic model, the dictionary, and so on, into a search network at the model-state level; that is, a decoding recognition network is built. When an input voice signal is decoded, new valid extension paths are obtained by computing, for each input voice signal frame, the cumulative historical path probability with respect to the acoustic models and the language model on the currently valid extension paths. Once the last voice signal frame has been searched, the optimal decoding path is obtained by tracing back through the states from the optimal node, that is, the node with the maximum cumulative historical path probability, and the corresponding word sequence is obtained.

The prior art improves recognition accuracy for newly added hot words by re-estimating system parameters and therefore cannot quickly and accurately recognize hot vocabulary and user-personalized vocabulary. To address this technical problem, the speech recognition method and system of the embodiments excite the current historical paths based on hot words, which raises the cumulative historical path probability of the paths containing hot words and improves hot-word recognition. No re-estimation of system parameters is needed, and hot vocabulary and user-personalized vocabulary can be recognized quickly and accurately.

Fig. 1 is a flowchart of the speech recognition method of an embodiment of the invention, which comprises the following steps:

Step 101: build the decoding recognition network.

In the embodiments, the decoding recognition network may be built online by the system, or built offline and loaded directly at system start-up, which reduces the system's computation and memory requirements and further improves decoding efficiency.

Step 102: for a received voice signal, decode each voice signal frame according to the decoding recognition network and, during decoding, excite the historical paths of active nodes according to hot words, so as to raise the cumulative historical path probability of the paths containing hot words.

Decoding the user's input voice signal with the decoding recognition network is a search for the optimal path in that network, which converts speech to text.

Specifically, the received continuous voice signal may first be sampled into a series of discrete energy values and stored in a data buffer.

To further improve the robustness of the system, the received continuous voice signal may also first undergo noise reduction. The continuous voice signal is first split into separate speech segments and non-speech segments by analysing its short-time energy and short-time zero-crossing rate. Speech enhancement is then applied to the resulting speech segments; methods such as Wiener filtering can further remove environmental noise from the voice signal, improving the ability of subsequent stages to process it.
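The speech and non-speech split described above can be illustrated with a minimal Python sketch. The two measures (short-time energy and short-time zero-crossing rate) come from the text; the decision rule in `is_speech` and both threshold values are hypothetical and are not taken from the patent.

```python
def short_time_energy(frame):
    """Sum of squared sample values in one frame."""
    return sum(s * s for s in frame)


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    if len(frame) < 2:
        return 0.0
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def is_speech(frame, energy_threshold=0.1, zcr_threshold=0.25):
    """Hypothetical rule: loud frames are speech; weaker frames count as
    speech only if they also show many zero crossings (e.g. fricatives)."""
    energy = short_time_energy(frame) / max(len(frame), 1)  # mean energy
    if energy >= energy_threshold:
        return True
    return energy >= energy_threshold / 10 and zero_crossing_rate(frame) >= zcr_threshold


def label_frames(frames):
    """Label every frame as speech (True) or non-speech (False)."""
    return [is_speech(f) for f in frames]
```

Consecutive frames with the same label would then be merged into the speech and non-speech segments that the enhancement stage works on.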

The voice signal after noise reduction may still contain a large amount of redundant information that is irrelevant to speech recognition, and recognizing it directly would hurt both computational cost and recognition accuracy. For this reason, speech features that are effective for recognition can be extracted from the noise-reduced speech energy signal and stored in a feature buffer. Specifically, MFCC (Mel Frequency Cepstrum Coefficient) features can be extracted: short-time analysis is performed on each frame of speech data, using a 25 ms window and a 10 ms frame shift, to obtain the MFCC parameters together with their first-order and second-order differences, 39 dimensions in total. In other words, each voice signal frame is quantized into a 39-dimensional feature vector.
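The framing arithmetic and the 39-dimensional layout can be sketched as follows. The 25 ms window, the 10 ms shift, and the total of 39 dimensions come from the text; the 16 kHz sample rate, the split into 13 static coefficients (39 / 3), and the simple central-difference formula are assumptions made for illustration, and the MFCC computation itself is not shown.

```python
def frame_count(num_samples, sample_rate, win_ms=25, shift_ms=10):
    """Number of full analysis frames that fit into the signal."""
    win = sample_rate * win_ms // 1000      # samples per window
    shift = sample_rate * shift_ms // 1000  # samples per frame shift
    if num_samples < win:
        return 0
    return 1 + (num_samples - win) // shift


def deltas(seq):
    """Central difference between neighbouring frames (edges are clamped)."""
    n = len(seq)
    out = []
    for t in range(n):
        prev, nxt = seq[max(t - 1, 0)], seq[min(t + 1, n - 1)]
        out.append([(b - a) / 2 for a, b in zip(prev, nxt)])
    return out


def stack_features(static):
    """Append first- and second-order differences to each static vector."""
    d1 = deltas(static)
    d2 = deltas(d1)
    return [s + a + b for s, a, b in zip(static, d1, d2)]
```

With 13 static coefficients per frame (an assumed value), `stack_features` yields 13 + 13 + 13 = 39 values per frame, matching the dimension stated above.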

Then each voice signal frame is decoded according to the decoding recognition network to obtain the optimal path, which completes the decoding and recognition process.

In the prior art, the optimal path is searched as follows: in time order, from left to right, the cumulative historical path probability with which each voice signal frame reaches each active node in the decoding recognition network is computed.

Specifically, for each voice signal frame under consideration, the historical paths and cumulative historical path probabilities of all active nodes in the current decoding recognition network with respect to that frame can be computed first.

For example, let the speech feature sequence up to the current voice signal frame be $\{O_1, O_2, \ldots, O_t\}$, and let the path probability with which the feature $O_t$ at time $t$ enters active node $j$ be denoted here by $\delta_t(j)$.

This is the maximum probability over all possible historical paths from an active node $i$ into node $j$, computed as follows:

$$\delta_t(j) = \max_i \left[ \delta_{t-1}(i)\, a_{ij} \right] b_j(O_t)$$

Here $i$ ranges over all active nodes connected to active node $j$ in the decoding recognition network; $\delta_{t-1}(i)$ is the historical path probability with which the feature $O_{t-1}$ at time $(t-1)$ falls on active node $i$; $a_{ij}$ is the transition probability from node $i$ to node $j$; and $b_j(O_t)$ is the likelihood of the feature $O_t$ at time $t$ for node $j$.

The cumulative historical path probability of active node $j$ is the score of the path with the maximum cumulative path probability among all paths through nodes connected to $j$. In other words, computing the cumulative path probability of active node $j$ also identifies the predecessor of $j$, and hence the historical path of $j$.

Then the next voice signal frame is obtained, and decoding extends onward from the historical paths that satisfy the system's preset conditions. After the last voice signal frame has been decoded, the active node with the maximum cumulative historical path probability is the optimal node; the historical path obtained by tracing back from this optimal node through the decoding states is the optimal path, and the word sequence on the optimal path is the decoding result.
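The frame-by-frame search and the final traceback described above can be sketched as a small Viterbi-style routine. This is a sketch under stated assumptions, not the patent's decoder: the function name `viterbi`, the dictionary-based network representation, and the `init` start distribution are hypothetical, plain probabilities are multiplied (a real decoder would normally work with log scores), and no pruning is applied.

```python
def viterbi(observations, nodes, init, trans, emit):
    """Return (best_probability, best_node_sequence).

    init[j]     : probability of starting in node j (missing means 0)
    trans[i][j] : transition probability a_ij (missing means no arc)
    emit(j, o)  : likelihood b_j(o) of observation o at node j
    """
    # Cumulative path probability of every node after the first frame.
    delta = {j: init.get(j, 0.0) * emit(j, observations[0]) for j in nodes}
    backptrs = []
    for o in observations[1:]:
        new_delta, ptr = {}, {}
        for j in nodes:
            # Best predecessor i of node j: maximise delta[i] * a_ij.
            best_i = max(nodes, key=lambda i: delta[i] * trans.get(i, {}).get(j, 0.0))
            new_delta[j] = delta[best_i] * trans.get(best_i, {}).get(j, 0.0) * emit(j, o)
            ptr[j] = best_i
        backptrs.append(ptr)
        delta = new_delta
    # Optimal node after the last frame, then trace back through the pointers.
    best = max(nodes, key=lambda j: delta[j])
    path = [best]
    for ptr in reversed(backptrs):
        path.append(ptr[path[-1]])
    path.reverse()
    return delta[best], path
```

Storing one back-pointer per node and frame is what lets the optimal path be recovered from the optimal node once the last frame has been processed.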

A language model trained on massive data reflects the vocabulary distribution of the original corpus well and therefore recognizes most common vocabulary well. Hot vocabulary and user-personalized vocabulary, however, are specific to particular topics or users and have low probability in the original language model, so the scores of their decoding paths are often low and they fail to be recognized correctly.

For this reason, the embodiments excite the historical paths of active nodes based on hot words, so that hot words survive longer during search path extension. Paths in the decoding recognition network that may match a hot word are thereby favoured, the success rate of hot-word matching rises, and hot-word recognition accuracy improves accordingly.

Specifically, different excitation methods can be applied to the paths in the decoding recognition network that may match a hot word; these are described in detail later.

Step 103: after the last voice signal frame has been decoded, select the active node with the maximum cumulative probability as the optimal node.

Step 104: trace back from the optimal node through the decoding states to obtain the optimal path and the corresponding word sequence.

The historical path obtained by tracing back from the optimal node through the decoding states is the optimal path.

It can be seen that the speech recognition method of the embodiments excites historical paths based on hot-word matching, which raises the cumulative historical path probability of the paths containing hot words and improves hot-word recognition. The method needs no re-estimation of system parameters and can recognize hot vocabulary and user-personalized vocabulary quickly and accurately. It also offers a feasible way for the system to support a customized personalized dictionary or personalized language model: a user can obtain recognition support for personalized vocabulary simply by updating the hot word dictionary supported by the system.

As mentioned above, in the embodiments the decoding recognition network may be built online by the system, or built offline and loaded directly at system start-up, which reduces the system's computation and memory requirements and further improves decoding efficiency.

Specifically, the decoding recognition network can be built from a preset acoustic model, a language model, and so on.

The acoustic model is mainly used to model the pronunciation characteristics of characters. It may be the HMM (Hidden Markov Model), based on transition probabilities and emission probabilities, that is commonly used in speech recognition. In large-vocabulary continuous speech recognition the vocabulary is huge; building one HMM per character would produce too many models, which is unfavourable for storage and computation. In practice, therefore, HMMs may be built only for basic pronunciation units such as syllables or phonemes. Obviously, the acoustic model may also use other technical means, such as neural networks; the embodiments place no restriction on this.

The language model characterizes knowledge such as grammar and semantics more effectively, compensating for the shortcomings of the acoustic model and improving the recognition rate. It may be the statistical language model commonly used in speech recognition, which describes the relations between words by statistical probabilities; that is, the probability of a word $w_k$ occurring is assumed to depend only on the $n-1$ words before it, written as $P(w_k \mid w_{k-n+1}, \ldots, w_{k-1})$.

Obviously language model also can adopt the other technologies means, such as the words equity, this embodiment of the invention is not done restriction.
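As an illustration of the statistical language model assumption (a word depends only on the n-1 words before it), the following sketch estimates n-gram probabilities from counts. It is a minimal maximum-likelihood estimate without smoothing; the function names and the `<s>` start-padding symbol are assumptions introduced here, not details from the patent.

```python
from collections import Counter


def train_ngram(sentences, n=2):
    """Count n-grams and their (n-1)-word histories from tokenized sentences."""
    ngrams, histories = Counter(), Counter()
    for words in sentences:
        padded = ["<s>"] * (n - 1) + list(words)  # hypothetical start padding
        for k in range(n - 1, len(padded)):
            history = tuple(padded[k - n + 1:k])
            ngrams[history + (padded[k],)] += 1
            histories[history] += 1
    return ngrams, histories


def ngram_prob(word, history, ngrams, histories):
    """Maximum-likelihood P(word | history); 0.0 if the history was never seen."""
    history = tuple(history)
    if histories[history] == 0:
        return 0.0
    return ngrams[history + (word,)] / histories[history]
```

A word that never followed a given history receives probability 0 here, which mirrors the coverage problem the text attributes to hot words that are rare in the training corpus.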

The decoding recognition network can be built with existing construction methods, using the acoustic model to expand the language model into a search network at the model level. Fig. 2 shows a schematic diagram of a WFST-based decoding recognition network. Other kinds of decoding recognition network may of course be used, such as a dynamic decoding recognition network based on history word tree copies.

In the embodiments, the user may define hot words at any time, which widens the recognition range of the system and adapts it to constantly changing vocabulary. Specifically, after the user enters a new hot word, the hot word can be saved to a hot word set. In other words, the hot word set can be a dynamically updated library of hot-word entries.

In addition, when the hot word dictionary is updated, a word segmentation algorithm may be applied: the text of the user's newly added custom hot words is segmented with the system's preset dictionary, so that each hot-word entry is expressed as a sequence of the system's existing basic word units, and the segmentation result is saved to the hot word dictionary. The segmentation result can be managed in the tree structure shown in Fig. 3, in which each branch corresponds to one hot-word entry, as follows:

Hot word A: Word_i1 Word_i2 Word_i3;

Hot word B: Word_i1 Word_i4;

Hot word C: Word_i5;

...;

Hot word N: Word_ij ... Word_ik.

The hot word dictionary can be a dynamically updated hot-word segmentation dictionary.
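The tree-structured hot word dictionary can be sketched as a nested-dictionary trie whose branches are sequences of word units. The text does not say which segmentation algorithm is used, so the greedy forward maximum matching in `segment` is an assumption, as are the function names and the use of `None` as an end-of-entry marker.

```python
def segment(text, base_dictionary):
    """Greedy forward maximum matching against the preset dictionary.
    Characters not covered by the dictionary become single-character units."""
    max_len = max(map(len, base_dictionary), default=1)
    units, pos = [], 0
    while pos < len(text):
        for size in range(min(max_len, len(text) - pos), 0, -1):
            piece = text[pos:pos + size]
            if size == 1 or piece in base_dictionary:
                units.append(piece)
                pos += size
                break
    return units


def add_hot_word(tree, units):
    """Insert one hot-word entry (a sequence of word units) as a branch."""
    node = tree
    for unit in units:
        node = node.setdefault(unit, {})
    node[None] = True  # marks the end of a complete hot-word entry


def is_prefix(tree, units):
    """True if `units` is the beginning of at least one hot-word entry."""
    node = tree
    for unit in units:
        if unit not in node:
            return False
        node = node[unit]
    return True
```

Because entries that share leading units share a branch (hot words A and B above both start with the same unit), a partial match can be checked in one walk down the tree, and new entries can be added at run time without rebuilding anything else.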

Based on the hot word set or the hot word dictionary described above, different excitation methods can be applied to the paths in the decoding recognition network that may match a hot word, as detailed below.

1. Optimizing the decoded historical paths according to the hot word set

Fig. 4 is a flowchart of exciting the decoded historical paths according to the hot word set in an embodiment of the invention.

In this flow, when the word sequence on the historical path of an active node forms a hot-word entry, a suitable score bonus is given to raise the priority of that historical path.

The flow comprises the following steps:

Step 401: obtain the historical paths and cumulative historical path probabilities of all active nodes produced by decoding the current voice signal frame.

Step 402: judge, according to the hot word set, whether adjacent words on the historical path form a hot word; if so, go to step 403; otherwise, go to step 404.

Step 403: raise the cumulative historical path probability of the historical path.

For example, a fixed excitation bonus can be added to raise the priority of the path containing the hot word, so that the path is more likely to be retained. The value of the fixed bonus can be preset as required; in general, the higher the bonus, the more readily hot words are matched and the higher the hot-word recognition accuracy.

Balancing the recognition rate of hot words against that of other, non-hot words, a reference value of 300 can be set for the fixed excitation bonus; other values may of course be used, and the embodiments place no restriction on this.

Step 404: keep the cumulative historical path probability of the historical path unchanged.

Note that steps 402 to 404 must be carried out for each historical path obtained in step 401: among the historical paths of all active nodes obtained in step 401, a path that contains no hot word keeps its path probability, and a path that contains a hot word has its path probability excited.
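Steps 402 to 404 can be sketched as follows. The bonus of 300 is the reference value given in the text; everything else is an assumption made for illustration: scores are treated as additive (higher is better), each path is a `(word_sequence, score)` pair, hot words are non-empty tuples of words, and only the most recent words of a path are checked against the hot word set.

```python
FIXED_BONUS = 300.0  # reference value given in the text


def ends_with_hot_word(words, hot_word_set):
    """True if the most recent words of the path form a hot-word entry."""
    return any(
        tuple(words[-len(hw):]) == hw
        for hw in hot_word_set
        if 0 < len(hw) <= len(words)
    )


def excite_paths(paths, hot_word_set, bonus=FIXED_BONUS):
    """Apply steps 402 to 404 to every (word_sequence, score) pair."""
    excited = []
    for words, score in paths:
        if ends_with_hot_word(words, hot_word_set):  # step 402
            score += bonus                           # step 403
        excited.append((words, score))               # step 404: otherwise unchanged
    return excited
```

Checking only the end of the path suits a decoder that calls this when a word has just been completed; a decoder that calls it on every frame would also need to make sure the same hot word is not rewarded twice, a detail the text leaves open.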

Note also that, in practice, the historical paths of the active nodes may be pruned before the next voice signal frame is decoded, deleting unlikely paths to make the subsequent search more efficient. Specifically, a probability-based pruning threshold can be used: first find the highest cumulative historical path probability among the current active nodes, then compute the difference between each active node's cumulative historical path probability and this maximum; any active node whose difference exceeds the preset pruning threshold is marked inactive and removed from the subsequent search paths, which ends the search that would continue from that node.

In the embodiments, this pruning is performed after the historical paths of all active nodes obtained by decoding the current voice signal frame have been excited according to the flow of Fig. 4. The new active nodes are determined from the cumulative historical path probabilities of all current active nodes (both excited and unexcited), and the subsequent paths are extended from them.
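The threshold pruning, and the reason it is applied after excitation, can be shown with a short sketch. The function name, the dictionary representation, and the numeric scores are hypothetical; scores are again assumed to be additive, with higher values being better.

```python
def prune(active, threshold):
    """Keep only nodes whose score is within `threshold` of the best score.

    active: dict mapping node -> cumulative score (higher is better).
    """
    if not active:
        return {}
    best = max(active.values())
    return {node: s for node, s in active.items() if best - s <= threshold}


# Hypothetical scores: the hot-word path trails the best path by 250.
scores = {"hot_path": -350.0, "other_path": -100.0}

# Pruning first would drop the hot-word path (250 > 200).
pruned_first = prune(scores, threshold=200.0)

# Exciting first (+300 bonus) keeps it alive, and it becomes the best path.
excited = dict(scores, hot_path=scores["hot_path"] + 300.0)
pruned_after = prune(excited, threshold=200.0)
```

In this example `pruned_first` contains only `other_path`, while `pruned_after` keeps both paths, which is the survival effect the excitation is meant to achieve.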

In the flow of Fig. 4, the priority of the paths containing hot words is raised with a fixed excitation bonus, which optimizes those paths.

In practice, a hot word often consists of two or more segments. If excitation is applied only when a hot-word entry has been matched completely, the path containing the entry may be pruned too early, before the excitation can take effect, which lowers hot-word recognition accuracy. For this reason, the embodiments may also use an excitation method based on path prediction, which raises the priority of the path containing a hot word step by step.

2. Optimizing the decoded historical paths according to the hot word dictionary

Fig. 5 is a flowchart of exciting the decoded historical paths according to the hot word dictionary in an embodiment of the invention. It comprises the following steps:

Step 501: obtain the historical paths and cumulative historical path probabilities of all active nodes produced by decoding the current voice signal frame.

Step 502: judge whether a new word has appeared on the historical path of the active node; if so, go to step 503; otherwise, go to step 504.

Step 503: update the cumulative historical path probability of the historical path according to how the new word extends the hot-word segments of the hot word dictionary already matched on the historical path.

Step 504: keep the cumulative historical path probability of the historical path unchanged.

Note that steps 502 to 504 must be carried out for the historical paths of all active nodes at every voice signal frame. In addition, the historical paths of the active nodes may be pruned before the next voice signal frame is decoded, deleting unlikely paths to make the subsequent search more efficient. Specifically, a probability-based pruning threshold can be used: first find the highest cumulative historical path probability among the current active nodes, then compute the difference between each active node's cumulative historical path probability and this maximum; any active node whose difference exceeds the preset pruning threshold is marked inactive and removed from the subsequent search paths, which ends the search that would continue from that node.

Unlike the method above, which excites a historical path only when a hot word has been matched completely, step 503 excites the historical path during decoding as soon as a hot word is partially matched; that is, the cumulative historical path probability is updated according to how the historical path is extended. Raising the cumulative path probability in advance better ensures that the path containing the hot word survives.

Step 503 can be implemented in several different ways, for example:

Fig. 6 is a flowchart of one implementation of updating the cumulative historical path probability of a historical path according to its extension result in an embodiment of the invention. It comprises the following steps:

Step 601: judge whether the new word is the subsequent segment of the hot-word segment matched on the historical path; if so, go to step 605; otherwise, go to step 602.

Step 602: judge whether the new word is an initial segment in the hot word dictionary; if so, go to step 603; otherwise, go to step 604.

Step 603: add the bonus corresponding to the initial segment to the cumulative historical path probability of the historical path.

Step 604: keep the cumulative historical path probability of the historical path unchanged.

Step 605: add the bonus corresponding to the subsequent segment to the cumulative historical path probability of the historical path.
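The flow of steps 601 to 605 can be sketched as a single update function. The step numbers follow the text; the representation is an assumption: each path carries a tuple `matched` of the hot-word segments matched so far, hot words are tuples of segments, scores are additive, and both bonus values are hypothetical.

```python
def build_prefixes(hot_words):
    """All non-empty prefixes of every hot word (a hot word is a tuple of segments)."""
    return {hw[:k] for hw in hot_words for k in range(1, len(hw) + 1)}


def excite_fig6(score, matched, new_word, prefixes,
                initial_bonus=100.0, follow_bonus=100.0):
    """Return (new_score, new_matched) after `new_word` appears on the path."""
    # Step 601: does new_word continue the segment sequence matched so far?
    if matched and matched + (new_word,) in prefixes:
        return score + follow_bonus, matched + (new_word,)   # step 605
    # Step 602: is new_word the initial segment of some hot word?
    if (new_word,) in prefixes:
        return score + initial_bonus, (new_word,)            # step 603
    return score, ()                                         # step 604
```

Note that this flow never takes a bonus back: a path that matched the first two segments of a three-segment hot word keeps both bonuses even if the third segment never appears.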

Fig. 7 is a flowchart of another implementation of updating the cumulative historical path probability of a historical path according to its extension result in an embodiment of the invention. It comprises the following steps:

Step 701: judge whether the word sequence preceding the new word, on the historical path containing the new word, is a complete hot word in the hot word dictionary; if so, go to step 705; otherwise, go to step 702.

Step 702: judge whether the new word is the subsequent segment of the hot-word segment matched on the historical path; if so, go to step 703; otherwise, go to step 704.

Step 703: add the bonus corresponding to the subsequent segment to the cumulative historical path probability of the historical path.

Step 704: revoke the bonus previously added to the cumulative historical path probability of the historical path.

Step 705: judge whether the new word is an initial segment in the hot word dictionary; if so, go to step 706; otherwise, go to step 707.

Step 706: add the bonus corresponding to the initial segment to the cumulative historical path probability of the historical path.

Step 707: keep the cumulative historical path probability of the historical path unchanged.

The flow of Fig. 7 further avoids wrongly exciting historical paths that do not contain hot words.
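The flow of steps 701 to 707 can be sketched in the same style, with one extra piece of state: `pending`, the bonus granted so far to the match in progress, so that step 704 can revoke it. The representation, names, and bonus values are hypothetical. One further assumption is made and marked in the code: a path with no match in progress is routed to step 705 like a completed hot word, since the text does not say how the very first segment of a hot word enters the flow.

```python
def build_prefixes(hot_words):
    """All non-empty prefixes of every hot word (a hot word is a tuple of segments)."""
    return {hw[:k] for hw in hot_words for k in range(1, len(hw) + 1)}


def excite_fig7(score, matched, pending, new_word, hot_words, prefixes,
                initial_bonus=100.0, follow_bonus=100.0):
    """Return (score, matched, pending) after `new_word` appears on the path.

    matched : tuple of hot-word segments matched so far on this path
    pending : total bonus granted to the match in progress
    """
    # Step 701. ASSUMPTION: an empty match is treated like a completed hot word.
    if not matched or matched in hot_words:
        if (new_word,) in prefixes:                            # step 705
            return score + initial_bonus, (new_word,), initial_bonus  # step 706
        return score, (), 0.0                                  # step 707
    if matched + (new_word,) in prefixes:                      # step 702
        return (score + follow_bonus, matched + (new_word,),
                pending + follow_bonus)                        # step 703
    return score - pending, (), 0.0                            # step 704: revoke
```

Compared with the Fig. 6 flow, a partial match that fails to complete leaves no trace in the score, while the bonus of a fully matched hot word is kept.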

In addition, in practice, if the historical path of each active node kept only one hot-word segment match record, decoding might not find every path that contains a hot word. For example, suppose the word sequence of hot word A is Word1 Word2 Word3 and that of hot word B is Word2 Word4. When the input word sequence is Word1 Word2 Word4, decoding "Word2" preferentially matches the Word2 segment of hot word A and ignores the Word2 segment of hot word B, so the word sequence Word1 Word2 Word4 ultimately fails to match hot word B (Word2 Word4). To address this, the embodiments may keep multiple hot-word segment match histories for the historical path of a single active node, which makes hot-word matching more sound. That is, when "Word2" is decoded, both the partial match "Word1 Word2" of hot word A and the partial match "Word2" of hot word B are kept, and the corresponding excitation values are stored for the same historical path. Later in decoding, once it is determined that a hot-word segment match cannot be extended further, the bonus that this match added to the cumulative historical path probability of the historical path is revoked.
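Keeping several match histories per path can be sketched with a dictionary that maps each partial match to the bonus it has earned. This is an illustrative sketch: the function names, the uniform bonus of 100 per segment, and the additive scores are assumptions; the Word1 Word2 Word4 scenario is the one described in the text.

```python
def build_prefixes(hot_words):
    """All non-empty prefixes of every hot word (a hot word is a tuple of segments)."""
    return {hw[:k] for hw in hot_words for k in range(1, len(hw) + 1)}


def advance_matches(matches, new_word, hot_words, prefixes, bonus=100.0):
    """Return (new_matches, score_change) for one path when `new_word` is decoded.

    matches: dict mapping partial match (tuple of segments) -> bonus it earned.
    """
    new_matches, change = {}, 0.0
    for partial, earned in matches.items():
        extended = partial + (new_word,)
        if extended in prefixes:            # this match can keep growing
            new_matches[extended] = earned + bonus
            change += bonus
        elif partial not in hot_words:      # dead partial match: revoke its bonus
            change -= earned
        # a completed hot word keeps its bonus and is no longer tracked
    if (new_word,) in prefixes:             # new_word may also start a hot word
        new_matches[(new_word,)] = bonus
        change += bonus
    return new_matches, change


hot_words = {("Word1", "Word2", "Word3"), ("Word2", "Word4")}
prefixes = build_prefixes(hot_words)
matches, score = {}, 0.0
for word in ["Word1", "Word2", "Word4"]:
    matches, change = advance_matches(matches, word, hot_words, prefixes)
    score += change
```

After "Word2" both partial matches are alive; after "Word4" the bonus earned by the failed match on hot word A is revoked, and the path keeps only the bonus for the complete hot word B.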

In practical applications, the path probability may be excited according to how well the decoded word matches a hot-word segment. For example, when decoding yields a new word, the preset weight of the matching hot-word segment may be obtained and used to excite the path. In particular, the excitation process can be simplified by setting weights only for the head and tail characters of the segments in the hot word dictionary and setting the weights of all other segments to 0. For example, suppose the current hot word dictionary, as shown in Figure 8, consists of "中国" (China), "中华" (China), "人民" (the people) and "人们" (people). During decoding, if the character "中" or "人" appears on the history path of an active node, that is, a character matching a head character in the hot word dictionary, the history path is given the score bonus corresponding to the hot-word segment "中" or "人"; in other words, the bonus corresponding to that head character is added to the cumulative history-path probability of the history path. Subsequently, when the path containing the hot-word segment is extended, if "国" or "华", or "民" or "们", appears on the extended path so that the hot word or partial hot word "中国", "中华", "人民" or "人们" is formed, the path is further given the score bonus corresponding to "国", "华", "民" or "们". Otherwise, if the path containing the hot-word segment is extended with a word that does not belong to a hot word, the path is not excited, or the bonus added earlier is removed.
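By way of non-limiting illustration, the simplified weighting scheme described above (non-zero weights only for head and tail characters) might be set up as in the following sketch. The function names, the weight values and the sample hot words are hypothetical.

```python
def build_weights(hot_words, head_weight=2.0, tail_weight=1.0):
    """Map each (hot_word, position) pair to an excitation weight.

    Only the first and last characters receive a non-zero weight; every
    other position is set to 0, as in the simplified scheme. The default
    weight values are assumptions for illustration.
    """
    weights = {}
    for word in hot_words:
        last = len(word) - 1
        for i in range(len(word)):
            if i == 0:
                weights[(word, i)] = head_weight
            elif i == last:
                weights[(word, i)] = tail_weight
            else:
                weights[(word, i)] = 0.0
    return weights


def total_bonus(word, weights):
    """Total bonus a path receives once the whole hot word is matched."""
    return sum(weights[(word, i)] for i in range(len(word)))
```

With this table, a decoder only needs one lookup per newly decoded character, and characters in the middle of a long hot word contribute nothing.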

As can be seen, the speech recognition method of this embodiment uses an excitation method based on path prediction: by exciting step by step, it keeps raising the priority of the path containing the hot word, thereby optimizing that path and improving hot word recognition.

Correspondingly, an embodiment of the present invention also provides a speech recognition system; Figure 9 is a schematic structural diagram of the system.

In this embodiment, the system comprises:

A network construction unit 901, configured to build the decoding recognition network;

A decoding unit 902, configured to decode each speech signal frame of the received speech signal according to the decoding recognition network;

An excitation unit 903, configured to excite the history paths of active nodes according to hot words during decoding by the decoding unit 902, so as to raise the cumulative history-path probability of paths containing hot words;

An optimal node determination unit 904, configured to select the active node with the largest cumulative probability as the optimal node after the decoding unit 902 has decoded the last speech signal frame;

A traceback unit 905, configured to trace back from the optimal node through the decoding states to obtain the optimal path and the corresponding word sequence.
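By way of non-limiting illustration, the roles of the optimal node determination unit and the traceback unit might be sketched as follows. The back-pointer representation of decoding states and all names in the sketch are assumptions; the embodiment does not specify a data structure.

```python
def best_word_sequence(active, backpointer, word_of):
    """Pick the best active node and trace back to recover the word sequence.

    active      -- dict: node id -> cumulative history-path score
    backpointer -- dict: node id -> predecessor node id, or None at the start
    word_of     -- dict: node id -> word emitted at that node (absent if none)
    """
    # Optimal node: the active node with the largest cumulative score.
    node = max(active, key=active.get)
    words = []
    # Follow predecessor links back to the start, collecting emitted words.
    while node is not None:
        if node in word_of:
            words.append(word_of[node])
        node = backpointer.get(node)
    words.reverse()  # words were collected from last to first
    return words
```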

In embodiments of the present invention, the network construction unit 901 may build the decoding recognition network online, or may build it offline so that the pre-built network is loaded directly at system start-up, which reduces computation and memory requirements and further improves decoding efficiency. The network construction unit 901 may build the network from a preset acoustic model, a language model and the like, which is not described in detail here.

In embodiments of the present invention, the process by which the decoding unit 902 decodes the user's speech signal with the decoding recognition network is a process of computing, for each speech signal frame, the cumulative history-path probability of reaching each active node in the decoding recognition network. After the decoding unit 902 decodes each speech signal frame, the history paths and cumulative history-path probabilities of all current active nodes are available. The excitation unit 903 may excite the history paths of active nodes according to hot words in several different ways so as to raise the cumulative history-path probability of paths containing hot words, as described in detail below.

It should be noted that, in practical applications, before the decoding unit 902 decodes the next speech signal frame, the history paths of the active nodes may also be pruned by deleting unlikely paths, to improve subsequent search efficiency. Specifically, a probability-based pruning threshold may be used: first find the maximum cumulative history-path probability among the current active nodes, then compute the difference between that maximum and the cumulative history-path probability of each active node; any active node whose difference exceeds a preset pruning threshold is marked inactive and removed from the subsequent search paths, which stops any further search from that node.
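By way of non-limiting illustration, the threshold pruning described above might be sketched as follows. The function name `prune`, the dictionary representation of active nodes and the use of log-domain scores are assumptions made for this sketch.

```python
def prune(active, threshold):
    """Drop active nodes that fall too far behind the best one.

    active    -- dict: node id -> cumulative history-path score (log domain)
    threshold -- preset pruning threshold (largest allowed gap to the best)
    """
    if not active:
        return {}
    best = max(active.values())
    # A node stays active only if its gap to the best score does not
    # exceed the threshold; all others are cut from the search.
    return {n: s for n, s in active.items() if best - s <= threshold}
```

A smaller threshold speeds up the search but raises the risk that a path containing a hot word is cut before it has been excited, which is the motivation for the step-by-step excitation described elsewhere in this specification.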

It can be seen that the speech recognition system of the embodiments of the present invention applies history-path excitation based on hot word matching, optimizing the cumulative history-path probability of paths containing hot words and thereby improving hot word recognition. With this system, no re-estimation of system parameters is needed, and hot vocabulary and user-specific vocabulary can be recognized quickly and accurately. This offers a feasible way for the system to support personalized dictionaries or customized personalized language models: the user obtains recognition support for personalized vocabulary simply by updating the hot word dictionary supported by the system.

In embodiments of the present invention, the user may define hot words at any time, which broadens the system's recognition coverage and accommodates constantly changing vocabulary.

Figure 10 is a schematic diagram of one specific application structure of the speech recognition system of the embodiments of the present invention.

In this embodiment, the system further comprises a hot word set updating unit 911, configured to obtain hot words entered by the user and save them to a hot word set. The hot word set may be a dynamically updated store of hot word entries.

Correspondingly, the excitation unit 913 excites the history paths of active nodes according to the hot word set during decoding.

In this embodiment, the excitation unit 913 specifically comprises an obtaining subunit 9131, a judging subunit 9132 and an excitation subunit 9133, where:

The obtaining subunit 9131 is configured to obtain the history paths and cumulative history-path probabilities of all active nodes produced when the decoding unit 902 decodes the current speech signal frame;

The judging subunit 9132 is configured to judge, according to the hot word set, whether adjacent words on the history path form a hot word;

The excitation subunit 9133 is configured to raise the cumulative history-path probability of the history path after the judging subunit 9132 judges that adjacent words on the history path form a hot word, and to keep that probability unchanged after the judging subunit 9132 judges that they do not. For the excitation, a fixed bonus may be used to raise the priority of the path containing the hot word, so that this path is more likely to be retained. The value of the fixed bonus may be preset as needed; in general, the higher the bonus, the more strongly hot words are matched and the higher the hot word recognition accuracy.
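By way of non-limiting illustration, the complete-match excitation performed by the judging subunit and the excitation subunit might be sketched as follows. The hot word set contents, the bonus value and the function name are hypothetical, and the path score is assumed to be additive (log domain).

```python
# Hypothetical hot word set; each hot word is a tuple of adjacent words.
HOT_WORD_SET = {("Word2", "Word4"), ("Word5",)}
FIXED_BONUS = 2.0  # assumed preset value of the fixed bonus


def excite_on_full_match(history, score,
                         hot_words=HOT_WORD_SET, bonus=FIXED_BONUS):
    """Add a fixed bonus if the latest adjacent words form a hot word.

    history -- list of words on the history path, oldest first
    score   -- cumulative history-path score of that path
    """
    for hot_word in hot_words:
        n = len(hot_word)
        # Compare the last n words of the path with the hot word.
        if n <= len(history) and tuple(history[-n:]) == hot_word:
            return score + bonus
    return score  # no hot word formed: keep the score unchanged
```

The sketch assumes the function is called once each time a new word is appended to a history path, so that a given hot word occurrence is rewarded only once.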

The speech recognition system of this embodiment gives an appropriate score bonus when the word sequence on the history path of an active node forms a hot word entry, thereby optimizing the path containing the hot word and improving hot word recognition.

In practical applications, because a hot word often consists of two or more segments, exciting only once the hot word entry is completely matched may miss its purpose: the path containing the hot word entry may be pruned too early, which hurts hot word recognition accuracy. Therefore, in another embodiment of the speech recognition system of the present invention, the excitation unit may instead use an excitation method based on path prediction, raising the priority of the path containing the hot word step by step.

Figure 11 is a schematic diagram of another specific application structure of the speech recognition system of the embodiments of the present invention.

In this embodiment, the system further comprises a hot word obtaining unit 921 and a segmentation unit 922. The hot word obtaining unit 921 is configured to obtain hot words entered by the user; the segmentation unit 922 is configured to perform text segmentation on the hot words obtained by the hot word obtaining unit 921 and save the resulting segments to a hot word dictionary.

Correspondingly, the excitation unit 923 excites the history paths of active nodes according to the hot word dictionary during decoding.

In this embodiment, the excitation unit 923 specifically comprises an obtaining subunit 9231, a new word judging subunit 9232 and an excitation subunit 9233, where:

The obtaining subunit 9231 is configured to obtain the history paths and cumulative history-path probabilities of all active nodes produced by decoding the current speech signal frame;

The new word judging subunit 9232 is configured to judge whether a new word has appeared on the history path of the active node;

The excitation subunit 9233 is configured to update the cumulative history-path probability of the history path, after the new word judging subunit 9232 judges that a new word has appeared on the history path of the active node, according to how the new word extends the corresponding hot-word segment in the hot word dictionary on that history path; and otherwise to keep the cumulative history-path probability of the history path unchanged.

In this embodiment, unlike the excitation subunit 9133 of Figure 10, which excites a history path only when a hot word is completely matched, the excitation subunit 9233 excites the history path during decoding as soon as a hot word is partially matched; that is, it updates the cumulative history-path probability according to how the history path is extended. Raising the cumulative probability of the path in advance better ensures that the path containing the hot word survives. The excitation subunit 9233 may be implemented in several ways, which are illustrated below.

Figure 12 is a schematic diagram of one specific structure of the excitation subunit in the embodiments of the present invention.

In this embodiment, the excitation subunit comprises:

A first judging subunit 121, configured to judge whether the new word is the follow-up segment of the corresponding hot-word segment on the history path;

A first processing subunit 122, configured to add the bonus corresponding to the follow-up segment to the cumulative history-path probability of the history path after the first judging subunit 121 judges that the new word is the follow-up segment of the corresponding hot-word segment on the history path;

A second judging subunit 123, configured to judge whether the new word is an initial segment in the hot word dictionary after the first judging subunit 121 judges that the new word is not the follow-up segment of the corresponding hot-word segment on the history path;

A second processing subunit 124, configured to add the bonus corresponding to the initial segment to the cumulative history-path probability of the history path after the second judging subunit 123 judges that the new word is an initial segment in the hot word dictionary; and otherwise to keep the cumulative history-path probability of the history path unchanged.
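By way of non-limiting illustration, the two-stage judgment of Figure 12 might be sketched as follows. The dictionary contents, the bonus values, the `pending` state representation and the function name are assumptions made for this sketch; the path score is assumed to be additive (log domain).

```python
# Hypothetical hot word dictionary: each hot word as a tuple of segments.
HOT_WORD_DICT = [("Word1", "Word2", "Word3"), ("Word2", "Word4")]
FOLLOW_BONUS = 1.0   # assumed bonus for a follow-up segment
INITIAL_BONUS = 1.0  # assumed bonus for an initial segment


def excite_fig12(score, pending, new_word):
    """Update one history path when a new word appears on it.

    pending -- (segments, next_index) of the hot word being matched on
               this path, or None if no match is in progress
    Returns the new score and the new pending state.
    """
    # First judgment: is new_word the follow-up segment of the pending match?
    if pending is not None:
        segments, nxt = pending
        if segments[nxt] == new_word:
            nxt += 1
            still_open = (segments, nxt) if nxt < len(segments) else None
            return score + FOLLOW_BONUS, still_open
    # Second judgment: is new_word an initial segment in the dictionary?
    for segments in HOT_WORD_DICT:
        if segments[0] == new_word:
            still_open = (segments, 1) if len(segments) > 1 else None
            return score + INITIAL_BONUS, still_open
    # Neither: keep the cumulative score unchanged.
    return score, None
```

Note that in this variant a bonus already granted is never revoked: when the match breaks, the score is simply left as it is.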

Figure 13 is a schematic diagram of another specific structure of the excitation subunit in the embodiments of the present invention.

In this embodiment, the excitation subunit comprises:

A hot word judging subunit 131, configured to judge whether the word sequence preceding the new word on the history path containing the new word is a complete hot word in the hot word dictionary;

A third judging subunit 132, configured to judge whether the new word is an initial segment in the hot word dictionary when the hot word judging subunit 131 judges that the word sequence preceding the new word on the history path containing the new word is a complete hot word in the hot word dictionary;

A third processing subunit 133, configured to add the bonus corresponding to the initial segment to the cumulative history-path probability of the history path after the third judging subunit 132 judges that the new word is an initial segment in the hot word dictionary; and otherwise to keep the cumulative history-path probability of the history path unchanged;

A fourth judging subunit 134, configured to judge whether the new word is the follow-up segment of the corresponding hot-word segment on the history path when the hot word judging subunit 131 judges that the word sequence preceding the new word on the history path containing the new word is not a complete hot word in the hot word dictionary;

A fourth processing subunit 135, configured to add the bonus corresponding to the follow-up segment to the cumulative history-path probability of the history path after the fourth judging subunit 134 judges that the new word is the follow-up segment of the corresponding hot-word segment on the history path; and otherwise to revoke the bonus previously added to the cumulative history-path probability of the history path.
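By way of non-limiting illustration, the judgment structure of Figure 13, including revocation of an earlier bonus, might be sketched as follows. All names and values are hypothetical, and the path score is assumed to be additive (log domain). One simplification is an assumption of this sketch rather than part of the embodiment: a path with no partial match in progress (for example, at the very start of decoding) is handled in the same branch as a path whose preceding word sequence is a complete hot word.

```python
# Hypothetical hot word dictionary: each hot word as a tuple of segments.
HOT_WORD_DICT = [("Word1", "Word2", "Word3"), ("Word2", "Word4")]
BONUS = 1.0  # assumed bonus per matched segment


def excite_fig13(score, state, new_word):
    """Update one history path when a new word appears on it.

    state -- None when no partial match is open (start of the path, or the
             preceding words form a complete hot word); otherwise a tuple
             (segments, next_index, bonus_given_so_far)
    Returns the new score and the new state.
    """
    if state is None:
        # Third judgment: is new_word an initial segment in the dictionary?
        for segments in HOT_WORD_DICT:
            if segments[0] == new_word:
                opened = (segments, 1, BONUS) if len(segments) > 1 else None
                return score + BONUS, opened
        return score, None  # not an initial segment: keep the score
    segments, nxt, given = state
    # Fourth judgment: is new_word the follow-up segment of the open match?
    if segments[nxt] == new_word:
        nxt += 1
        opened = (segments, nxt, given + BONUS) if nxt < len(segments) else None
        return score + BONUS, opened
    # The partial match cannot be extended: revoke the bonus granted so far.
    return score - given, None
```

Compared with the Figure 12 variant, a path that began to match a hot word but then diverged ends up with no net advantage over paths that never matched.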

Of course, in practical applications the excitation subunit may be implemented in other ways, which the embodiments of the present invention do not limit. For example, multiple hot-word segment match histories may be kept for the history path of a single active node; when, later in decoding, a given hot-word segment match is found to be no longer extendable, the bonus it added to the cumulative history-path probability of the history path is revoked, which further improves the robustness of hot word matching.

The speech recognition system of this embodiment uses an excitation method based on path prediction: by exciting step by step, it keeps raising the priority of the path containing the hot word, thereby optimizing that path and improving hot word recognition.

The embodiments in this specification are described progressively; for parts that are the same or similar across embodiments, the embodiments may be referred to one another, and each embodiment focuses on its differences from the others. In particular, because the system embodiments are substantially similar to the method embodiments, they are described briefly; for relevant details, see the corresponding description of the method embodiments. The system embodiments described above are merely illustrative: the units and modules described as separate components may or may not be physically separate. In addition, some or all of the units and modules may be selected as needed to achieve the purpose of the solution of an embodiment. A person of ordinary skill in the art can understand and implement this without creative effort.

The above discloses only preferred embodiments of the present invention, but the present invention is not limited thereto. Any non-inventive variation that a person skilled in the art can conceive, and any improvement or refinement made without departing from the principle of the present invention, shall fall within the protection scope of the present invention.

Claims

1. A speech recognition method, characterized by comprising:

building a decoding recognition network;

2. the method for claim 1 is characterized in that, described method also comprises: obtain the hot word of user's input, and described hot word is saved in hot set of words;

Described according to hot word the historical path of live-vertex the excitation in decode procedure comprises: according to described hot set of words the historical path of live-vertex is encouraged in decode procedure.

3. The method of claim 2, characterized in that exciting the history paths of active nodes according to the hot word set during decoding comprises:

obtaining the history paths and cumulative history-path probabilities of all active nodes produced by decoding the current speech signal frame;

judging, according to the hot word set, whether adjacent words on the history path form a hot word;

if so, raising the cumulative history-path probability of the history path;

if not, keeping the cumulative history-path probability of the history path unchanged.

4. the method for claim 1 is characterized in that, described method also comprises: obtain the hot word of user's input, described hot word is carried out the text participle, and the participle that obtains is saved in hot word dictionary;

Described according to hot word the historical path of live-vertex the excitation in decode procedure comprises: according to described hot word dictionary the historical path of live-vertex is encouraged in decode procedure.

5. The method of claim 4, characterized in that exciting the history paths of active nodes according to the hot word dictionary during decoding comprises:

judging whether a new word has appeared on the history path of the active node;

if so, updating the cumulative history-path probability of the history path according to how the new word extends the corresponding hot-word segment in the hot word dictionary on the history path;

6. The method of claim 5, characterized in that updating the cumulative history-path probability of the history path according to how the new word extends the corresponding hot-word segment in the hot word dictionary on the history path comprises:

judging whether the new word is the follow-up segment of the corresponding hot-word segment on the history path;

if so, adding the bonus corresponding to the follow-up segment to the cumulative history-path probability of the history path;

if not, judging whether the new word is an initial segment in the hot word dictionary;

if so, adding the bonus corresponding to the initial segment to the cumulative history-path probability of the history path;

7. The method of claim 5, characterized in that updating the cumulative history-path probability of the history path according to how the new word extends the corresponding hot-word segment in the hot word dictionary on the history path comprises:

if the word sequence preceding the new word on the history path containing the new word is a complete hot word in the hot word dictionary, judging whether the new word is an initial segment in the hot word dictionary;

if not, keeping the cumulative history-path probability of the history path unchanged;

if the word sequence preceding the new word on the history path containing the new word is not a complete hot word in the hot word dictionary, judging whether the new word is the follow-up segment of the corresponding hot-word segment on the history path;

if not, revoking the bonus previously added to the cumulative history-path probability of the history path.

8. A speech recognition system, characterized by comprising:

9. The system of claim 8, characterized in that the system further comprises:

a hot word set updating unit, configured to obtain hot words entered by the user and save the hot words to a hot word set;

wherein the excitation unit excites the history paths of active nodes according to the hot word set during decoding by the decoding unit.

10. The system of claim 9, characterized in that the excitation unit comprises:

an obtaining subunit, configured to obtain the history paths and cumulative history-path probabilities of all active nodes produced by decoding the current speech signal frame;

a judging subunit, configured to judge, according to the hot word set, whether adjacent words on the history path form a hot word;

an excitation subunit, configured to raise the cumulative history-path probability of the history path after the judging subunit judges that adjacent words on the history path form a hot word, and to keep the cumulative history-path probability of the history path unchanged after the judging subunit judges that adjacent words on the history path do not form a hot word.

11. The system of claim 8, characterized in that the system further comprises:

a hot word obtaining unit, configured to obtain hot words entered by the user;

a segmentation unit, configured to perform text segmentation on the hot words obtained by the hot word obtaining unit and save the resulting segments to a hot word dictionary;

wherein the excitation unit excites the history paths of active nodes according to the hot word dictionary during decoding by the decoding unit.

12. The system of claim 11, characterized in that the excitation unit comprises:

a new word judging subunit, configured to judge whether a new word has appeared on the history path of the active node;

an excitation subunit, configured to update the cumulative history-path probability of the history path, after the new word judging subunit judges that a new word has appeared on the history path of the active node, according to how the new word extends the corresponding hot-word segment in the hot word dictionary on the history path; and otherwise to keep the cumulative history-path probability of the history path unchanged.

13. The system of claim 12, characterized in that the excitation subunit comprises:

a first judging subunit, configured to judge whether the new word is the follow-up segment of the corresponding hot-word segment on the history path;

a first processing subunit, configured to add the bonus corresponding to the follow-up segment to the cumulative history-path probability of the history path after the first judging subunit judges that the new word is the follow-up segment of the corresponding hot-word segment on the history path;

a second judging subunit, configured to judge whether the new word is an initial segment in the hot word dictionary after the first judging subunit judges that the new word is not the follow-up segment of the corresponding hot-word segment on the history path;

a second processing subunit, configured to add the bonus corresponding to the initial segment to the cumulative history-path probability of the history path after the second judging subunit judges that the new word is an initial segment in the hot word dictionary; and otherwise to keep the cumulative history-path probability of the history path unchanged.

14. The system of claim 12, characterized in that the excitation subunit comprises:

a hot word judging subunit, configured to judge whether the word sequence preceding the new word on the history path containing the new word is a complete hot word in the hot word dictionary;

a third judging subunit, configured to judge whether the new word is an initial segment in the hot word dictionary when the hot word judging subunit judges that the word sequence preceding the new word on the history path containing the new word is a complete hot word in the hot word dictionary;

a third processing subunit, configured to add the bonus corresponding to the initial segment to the cumulative history-path probability of the history path after the third judging subunit judges that the new word is an initial segment in the hot word dictionary; and otherwise to keep the cumulative history-path probability of the history path unchanged;

a fourth judging subunit, configured to judge whether the new word is the follow-up segment of the corresponding hot-word segment on the history path when the hot word judging subunit judges that the word sequence preceding the new word on the history path containing the new word is not a complete hot word in the hot word dictionary;

a fourth processing subunit, configured to add the bonus corresponding to the follow-up segment to the cumulative history-path probability of the history path after the fourth judging subunit judges that the new word is the follow-up segment of the corresponding hot-word segment on the history path; and otherwise to revoke the bonus previously added to the cumulative history-path probability of the history path.