CN102592595B - Voice recognition method and system - Google Patents

Info

Publication number
CN102592595B
Authority
CN
China
Prior art keywords
path
hot word
historical path
participle
neologisms
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN2012100734129A
Other languages
Chinese (zh)
Other versions
CN102592595A (en
Inventor
潘青华
鹿晓亮
何婷婷
王智国
胡国平
胡郁
刘庆峰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
iFlytek Co Ltd
Original Assignee
iFlytek Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by iFlytek Co Ltd
Priority to CN2012100734129A
Publication of CN102592595A
Application granted
Publication of CN102592595B
Legal status: Active
Anticipated expiration

Abstract

The invention relates to the technical field of speech recognition and discloses a speech recognition method and system. The method comprises: building a decoding recognition network; decoding each speech signal frame of a received speech signal according to the decoding recognition network and, during decoding, exciting the historical paths of active nodes according to hot words so as to raise the accumulated historical path probability of the paths on which the hot words lie; after the last speech signal frame has been decoded, selecting the active node with the maximum accumulated probability as the optimal node; and tracing back through the decoding states from the optimal node to obtain the optimal path and the corresponding word sequence. The invention avoids re-estimating system parameters and recognizes hot words and user-personalized words quickly and accurately, thereby improving hot word recognition.

Description

Speech recognition method and system
Technical field
The present invention relates to the field of speech recognition technology, and in particular to a speech recognition method and system.
Background
Achieving humane, intelligent and effective human-machine interaction and building an efficient, natural human-machine communication environment has become an urgent need in the application and development of information technology. In recent years, with the rapid development of speech recognition technology, online speech recognition applications such as voice input and voice search have attracted growing attention. Systems trained on massive data can meet the needs of common voice input, and recognition accuracy is usually high when the input content matches the probability distribution of the original language model. In practice, however, the rapid growth of the mobile Internet and social networks constantly produces new hot topics and corresponding hot vocabulary, and different users also need different personalized vocabulary recognized, such as contact names. Because such hot or personalized vocabulary is time-sensitive and specific, it tends to occur rarely in the originally collected corpus, so the original language model covers it poorly and the recognition system cannot recognize such hot words accurately.
To address this, the prior art usually re-estimates the system parameters: newly collected hot word material is added to the original corpus and a new language model is retrained to improve recognition accuracy for the newly added hot words. In practice, however, hot words are updated frequently, and the system cannot collect enough material in time for parameter re-estimation, which degrades hot word recognition. Moreover, retraining the language model and rebuilding the recognition resources (such as a decoding recognition network based on WFST, Weighted Finite-State Transducers) is time-consuming and costly, so a quick response to hot word recognition cannot be achieved.
Summary of the invention
Embodiments of the invention provide a speech recognition method and system to solve the technical problem that the prior art cannot quickly and accurately recognize hot vocabulary and user-personalized vocabulary.
To this end, embodiments of the invention provide the following technical solutions:
A speech recognition method, comprising:
Building a decoding recognition network;
Decoding each speech signal frame of a received speech signal according to the decoding recognition network and, during decoding, exciting the historical paths of active nodes according to hot words so as to raise the accumulated historical path probability of the paths containing the hot words;
After the last speech signal frame has been decoded, selecting the active node with the maximum accumulated probability as the optimal node;
Tracing back through the decoding states from the optimal node to obtain the optimal path and the corresponding word sequence.
A speech recognition system, comprising:
A network building unit, configured to build a decoding recognition network;
A decoding unit, configured to decode each speech signal frame of a received speech signal according to the decoding recognition network;
An excitation unit, configured to excite the historical paths of active nodes according to hot words during the decoding performed by the decoding unit, so as to raise the accumulated historical path probability of the paths containing the hot words;
An optimal node determination unit, configured to select the active node with the maximum accumulated probability as the optimal node after the decoding unit has decoded the last speech signal frame;
A traceback unit, configured to trace back through the decoding states from the optimal node to obtain the optimal path and the corresponding word sequence.
The speech recognition method and system of the embodiments excite the historical paths of active nodes based on hot word matching so as to raise the accumulated historical path probability of the paths containing hot words. This provides effective excitation for hot word recognition and improves its results. No system-parameter re-estimation is needed, and hot vocabulary and user-personalized vocabulary can be recognized quickly and accurately. This offers a feasible way for the system to support a personalized dictionary or a customized personalized language model: the user obtains recognition support for personalized vocabulary simply by updating the hot word entries that the system supports.
Brief description of the drawings
To illustrate the technical solutions of the embodiments more clearly, the drawings used in the embodiments are briefly introduced below. Obviously, the drawings described below show only some embodiments of the invention; those of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a flowchart of the speech recognition method of an embodiment of the invention;
Fig. 2 is a schematic diagram of a WFST-based decoding recognition network in an embodiment of the invention;
Fig. 3 is a schematic diagram of the tree-structured hot word dictionary in an embodiment of the invention;
Fig. 4 is a flowchart of exciting the decoded historical paths according to the hot word set in an embodiment of the invention;
Fig. 5 is a flowchart of exciting the decoded historical paths according to the hot word dictionary in an embodiment of the invention;
Fig. 6 is a flowchart of one implementation of updating the accumulated historical path probability of a historical path according to its extension result in an embodiment of the invention;
Fig. 7 is a flowchart of another implementation of updating the accumulated historical path probability of a historical path according to its extension result in an embodiment of the invention;
Fig. 8 is a schematic diagram of a concrete hot word dictionary in an embodiment of the invention;
Fig. 9 is a structural schematic diagram of the speech recognition system of an embodiment of the invention;
Fig. 10 is a schematic diagram of one concrete application structure of the speech recognition system of an embodiment of the invention;
Fig. 11 is a schematic diagram of another concrete application structure of the speech recognition system of an embodiment of the invention;
Fig. 12 is a structural schematic diagram of one excitation subunit in an embodiment of the invention;
Fig. 13 is a structural schematic diagram of another excitation subunit in an embodiment of the invention.
Detailed description
The technical solutions in the embodiments of the invention are described clearly and completely below with reference to the drawings. Obviously, the described embodiments are only some of the embodiments of the invention, not all of them. All other embodiments obtained by those of ordinary skill in the art on the basis of these embodiments without creative effort fall within the scope of protection of the invention.
To make the solution of the embodiments easier to understand and to show more clearly how it differs from existing speech recognition schemes, the basic prior-art speech recognition method is briefly explained first.
In the prior art, the semantic network of the language model is usually expanded, by means of the acoustic model, the dictionary and so on, into a search network at the model-state level; that is, a decoding recognition network is built. When the input speech signal is decoded, new valid extension paths are obtained by computing, for each input speech frame, the accumulated historical path probability with respect to the acoustic model and language model on each currently valid extension path. Once the last speech frame has been searched, the optimal decoding path is obtained by tracing back the states from the optimal node, i.e. the node with the maximum accumulated historical path probability, which yields the corresponding word sequence.
The speech recognition method and system of the embodiments address the technical problem that the prior art, which relies on system-parameter re-estimation to improve recognition accuracy for newly added hot words, cannot recognize hot vocabulary and user-personalized vocabulary quickly and accurately. The embodiments excite the current historical paths based on hot words, thereby raising the accumulated historical path probability of the paths containing hot words and improving hot word recognition. No system-parameter re-estimation is needed, and hot vocabulary and user-personalized vocabulary can be recognized quickly and accurately.
Fig. 1 is a flowchart of the speech recognition method of an embodiment of the invention, which comprises the following steps:
Step 101: build a decoding recognition network.
In the embodiments of the invention, the decoding recognition network can be built online by the system, or built offline and loaded directly at system start-up, which reduces computation and memory requirements and further improves decoding efficiency.
Step 102: decode each speech signal frame of the received speech signal according to the decoding recognition network and, during decoding, excite the historical paths of active nodes according to hot words so as to raise the accumulated historical path probability of the paths containing hot words.
Decoding the user's input speech signal with the decoding recognition network is a process of searching for the optimal path in that network, which realizes the conversion from speech to text.
Specifically, the received continuous speech signal can first be sampled into a series of discrete energy values, which are stored in a data buffer.
To further improve the robustness of the system, the received continuous speech signal can also be denoised first. Short-time energy and short-time zero-crossing rate analysis first divides the continuous speech signal into independent speech segments and non-speech segments. Speech enhancement is then applied to the resulting speech segments; methods such as Wiener filtering can be used to further remove environmental noise from the speech signal and make it easier for the subsequent stages to process.
The denoised speech signal still contains a large amount of redundant information that is irrelevant to speech recognition, and recognizing it directly would hurt both computational cost and recognition accuracy. Therefore, speech features that are effective for recognition can be extracted from the denoised speech energy signal and stored in a feature buffer. Specifically, MFCC (Mel Frequency Cepstrum Coefficient) features can be extracted: short-time analysis is performed on each frame of speech data with a 25 ms window length and a 10 ms frame shift, yielding the MFCC parameters and their first-order and second-order differences, 39 dimensions in total. In other words, each speech signal frame is quantized into a 39-dimensional feature vector.
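The framing arithmetic and the 39-dimensional stacking can be sketched as follows. This is a minimal illustration only: the 16 kHz sample rate, the split of 39 dimensions into 13 static coefficients plus two sets of differences, the function names, and the simple clamped central difference are all assumptions, not details given in the text. The MFCC computation itself is omitted.

```python
def frame_signal(samples, sample_rate, win_ms=25, shift_ms=10):
    """Split samples into overlapping frames (25 ms window, 10 ms shift)."""
    win = sample_rate * win_ms // 1000
    shift = sample_rate * shift_ms // 1000
    return [samples[s:s + win] for s in range(0, len(samples) - win + 1, shift)]


def add_deltas(static):
    """Append first- and second-order differences to each static vector.

    Assumption: a central difference with clamped edges stands in for
    whatever difference formula the real front end uses.
    """
    def delta(seq):
        n = len(seq)
        out = []
        for t in range(n):
            prev, nxt = seq[max(t - 1, 0)], seq[min(t + 1, n - 1)]
            out.append([(b - a) / 2.0 for a, b in zip(prev, nxt)])
        return out

    d1 = delta(static)
    d2 = delta(d1)
    return [s + a + b for s, a, b in zip(static, d1, d2)]


# One second of audio at an assumed 16 kHz: 400-sample windows, 160-sample shift.
frames = frame_signal([0.0] * 16000, 16000)
# Hypothetical 13-dimensional static vectors stand in for real MFCCs.
static = [[float(t)] * 13 for t in range(5)]
feats = add_deltas(static)
```

With 13 static coefficients, each stacked vector has 13 x 3 = 39 entries, which matches the dimension stated above.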
Each speech signal frame is then decoded according to the decoding recognition network to obtain the optimal path, which completes the decoding and recognition process.
In the prior art, the optimal path is searched as follows: proceeding from left to right in time order, the accumulated historical path probability with which each speech signal frame reaches each active node in the decoding recognition network is computed.
Specifically, for each speech signal frame under consideration, the historical path and accumulated historical path probability of every active node in the current decoding recognition network are first computed with respect to that frame.
For example, let the speech feature sequence of the current speech signal be $\{O_1, O_2, \ldots, O_t\}$. The path probability $\delta_t(j)$ with which the speech feature $O_t$ at time $t$ enters active node $j$, i.e. the maximum probability over all possible historical paths from an active node $i$ to node $j$, is computed as follows:

$$\delta_t(j) = \max_i \left[ \delta_{t-1}(i)\, a_{ij} \right] b_j(O_t)$$

where $i$ ranges over all active nodes connected to active node $j$ in the decoding recognition network; $\delta_{t-1}(i)$ is the historical path probability with which the feature $O_{t-1}$ at time $t-1$ falls on active node $i$; $a_{ij}$ is the transition probability from node $i$ to node $j$; and $b_j(O_t)$ is the likelihood of the feature $O_t$ at time $t$ on node $j$.
The accumulated historical path probability of active node j is the score of the path with the maximum accumulated path probability among all paths through nodes connected to active node j. In other words, computing the accumulated path probability of active node j also reveals its predecessor node, and hence the historical path of active node j.
The next speech signal frame is then obtained, and decoding is extended forward from the historical paths that satisfy the preset system conditions. After the last speech signal frame has been decoded, the active node with the maximum accumulated historical path probability is the optimal node, the historical path obtained by tracing back through the decoding states from this optimal node is the optimal path, and the word sequence on this optimal path is the decoding result.
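The frame-by-frame search and the final traceback described above can be sketched as a small Viterbi routine. This is an illustrative sketch only, not the patented decoder: it works on a toy fully connected network in the log domain, and the function name, argument layout, and toy probabilities are assumptions. Each step takes the best predecessor score plus the log of the transition probability a_ij, then adds the log likelihood b_j(o_t).

```python
import math


def viterbi(obs_loglik, log_trans, log_init):
    """Return (best state path, best accumulated log score).

    obs_loglik[t][j] = log b_j(o_t), log_trans[i][j] = log a_ij.
    """
    n = len(log_init)
    score = [log_init[j] + obs_loglik[0][j] for j in range(n)]
    back = []  # back[t-1][j] = best predecessor of node j at frame t
    for t in range(1, len(obs_loglik)):
        new_score, ptr = [], []
        for j in range(n):
            best_i = max(range(n), key=lambda i: score[i] + log_trans[i][j])
            new_score.append(score[best_i] + log_trans[best_i][j] + obs_loglik[t][j])
            ptr.append(best_i)
        back.append(ptr)
        score = new_score
    # The optimal node is the one with the maximum accumulated score.
    best = max(range(n), key=lambda j: score[j])
    path = [best]
    for ptr in reversed(back):  # trace back through the stored predecessors
        path.append(ptr[path[-1]])
    path.reverse()
    return path, score[best]


def _log_rows(rows):
    return [[math.log(p) for p in row] for row in rows]


# Toy two-node example with made-up probabilities.
log_init = [math.log(0.6), math.log(0.4)]
log_trans = _log_rows([[0.7, 0.3], [0.4, 0.6]])
obs_loglik = _log_rows([[0.5, 0.1], [0.4, 0.3], [0.1, 0.6]])
path, log_score = viterbi(obs_loglik, log_trans, log_init)
```

Working in the log domain turns the products into sums, which is also why an additive bonus is a natural way to raise a path score.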
A language model trained on massive data reflects the vocabulary distribution of the original corpus well, so most conventional vocabulary is recognized well. Hot vocabulary and user-personalized vocabulary, however, are individual in nature and have low probability in the original language model, so the scores of the corresponding decoding paths tend to be low and such words are not recognized correctly.
For this reason, in the embodiments of the invention, the historical paths of active nodes are excited based on hot words, prolonging the survival of hot words during search-path expansion. Paths in the decoding recognition network that may match hot words are thereby favored, the success rate of hot word matching improves, and hot word recognition accuracy improves accordingly.
Specifically, different excitation modes can be applied to the paths in the decoding recognition network that may match hot words; these are described in detail later.
Step 103: after the last speech signal frame has been decoded, select the active node with the maximum accumulated probability as the optimal node.
Step 104: trace back through the decoding states from the optimal node to obtain the optimal path and the corresponding word sequence.
The historical path obtained by tracing back through the decoding states from the optimal node is the optimal path.
It can be seen that the speech recognition method of the embodiments applies excitation to historical paths based on hot word matching, raising the accumulated historical path probability of the paths containing hot words and improving hot word recognition. With this method, no system-parameter re-estimation is needed, and hot vocabulary and user-personalized vocabulary can be recognized quickly and accurately. This offers a feasible way for the system to support a personalized dictionary or a customized personalized language model: the user obtains recognition support for personalized vocabulary simply by updating the hot word dictionary that the system supports.
As mentioned above, in the embodiments of the invention the decoding recognition network can be built online by the system, or built offline and loaded directly at system start-up, which reduces computation and memory requirements and further improves decoding efficiency.
Specifically, the decoding recognition network can be built from a preset acoustic model, a language model and the like.
The acoustic model mainly models the acoustic characteristics of speech. The HMM (Hidden Markov Model), based on transition probabilities and emission probabilities and commonly used in speech recognition, can be adopted. In large-vocabulary continuous speech recognition the vocabulary is very large; building an HMM for every word would produce too many models, which is unfavorable for storage and computation. In practice, therefore, HMMs can be built only for basic pronunciation units such as syllables or phonemes. The acoustic model can of course use other techniques as well, such as neural networks; the embodiments of the invention place no restriction on this.
The language model characterizes knowledge such as grammar and semantics more effectively, compensating for the weaknesses of the acoustic model and improving the recognition rate. The statistical language model commonly used in speech recognition can be adopted. It describes the relationship between words in terms of statistical probability, assuming that the probability of a word $w_k$ depends only on the $n-1$ words preceding it, written as
$$P(w_k \mid w_{k-n+1}, \ldots, w_{k-1})$$
The language model can of course use other techniques as well, such as a word-pair grammar; the embodiments of the invention place no restriction on this.
The decoding recognition network can be built with existing construction methods, using the acoustic model to expand the language model into a search network at the model level. Fig. 2 shows a schematic diagram of a WFST-based decoding recognition network. Other kinds of decoding recognition network can also be used, such as a dynamic decoding network based on copies of a historical word tree.
In the embodiments of the invention, the user may define hot words at any time to extend the recognition range of the system and keep up with constantly changing vocabulary. Specifically, after the user inputs a new hot word, the hot word can be saved in a hot word set. In other words, the hot word set can be a dynamically updated store of hot word entries.
In addition, when the hot word dictionary is updated, a word segmentation algorithm can be applied: the text of a user-defined hot word is segmented using the existing dictionary preset in the system, so that each hot word entry is expressed as a sequence of basic word units already known to the system, and the segmentation result is saved in the hot word dictionary. The segmentation results can be managed in the tree structure shown in Fig. 3, where each branch corresponds to one hot word entry, as follows:
Hot word A: Wordi1 Wordi2 Wordi3;
Hot word B: Wordi1 Wordi4;
Hot word C: Wordi5;
...;
Hot word N: Wordij...Wordik.
This hot word dictionary can be a dynamically updated hot word segmentation dictionary.
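One way to picture the tree-structured dictionary is a prefix tree built from the segmented entries, so that hot words A and B share the branch for their common first segment. The sketch below is a hypothetical illustration using nested Python dicts; the function name and the "<end>" marker key are assumptions, not part of the source.

```python
def build_hot_word_trie(hot_words):
    """Build a prefix tree from a mapping of hot word name -> word segments."""
    root = {}
    for name, segments in hot_words.items():
        node = root
        for seg in segments:
            node = node.setdefault(seg, {})
        node["<end>"] = name  # hypothetical marker for a complete hot word
    return root


hot_words = {
    "A": ["Wordi1", "Wordi2", "Wordi3"],
    "B": ["Wordi1", "Wordi4"],
    "C": ["Wordi5"],
}
trie = build_hot_word_trie(hot_words)
```

Adding or removing a user-defined hot word then only touches one branch, which fits the idea of a dictionary that is updated dynamically.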
Based on the hot word set or hot word dictionary described above, different excitation modes can be applied to the paths in the decoding recognition network that may match hot words, as detailed below.
1. Optimizing the decoded historical paths according to the hot word set
Fig. 4 is a flowchart of exciting the decoded historical paths according to the hot word set in an embodiment of the invention.
In this flow, when the word sequence on the historical path of an active node forms a hot word entry, a suitable score excitation is given to raise the priority of that historical path.
The flow comprises the following steps:
Step 401: obtain the historical paths and accumulated historical path probabilities of all active nodes produced by decoding the current speech signal frame.
Step 402: determine, according to the hot word set, whether adjacent words on the historical path form a hot word; if so, go to step 403; otherwise, go to step 404.
Step 403: raise the accumulated historical path probability of the historical path.
For example, a fixed excitation bonus can be added to raise the priority of the path containing the hot word, so that this path is more likely to be retained. The value of the fixed bonus can be preset as required. In general, the higher the bonus, the higher the hot word match rate and, correspondingly, the hot word recognition accuracy.
Balancing the recognition rate of hot words against that of other, non-hot words, a reference value of 300 can be set for the fixed excitation bonus. Other values are of course possible; the embodiments of the invention place no restriction on this.
Step 404: keep the accumulated historical path probability of the historical path unchanged.
Note that steps 402 to 404 must be performed for each historical path obtained in step 401 in turn. That is, among the historical paths of all active nodes obtained in step 401, those that do not contain a hot word keep their path probability, and those that contain a hot word have their path probability excited.
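Steps 401 to 404 can be sketched as a pass over the active paths that adds the fixed bonus whenever the most recent words on a path form a hot word. This is a hedged sketch under several assumptions: path scores are additive scores in which larger is better (the text does not state the scale), the function is called only when a new word has just been appended so that a hot word is rewarded once, and the names and sample paths are hypothetical.

```python
HOT_WORD_BONUS = 300.0  # reference value mentioned in the text


def excite_paths(paths, hot_words, bonus=HOT_WORD_BONUS):
    """paths: list of (word_sequence, score); hot_words: set of word tuples."""
    max_len = max((len(h) for h in hot_words), default=0)
    result = []
    for words, score in paths:                        # step 401
        tail_match = any(                             # step 402
            tuple(words[-k:]) in hot_words
            for k in range(1, min(max_len, len(words)) + 1)
        )
        if tail_match:
            result.append((words, score + bonus))     # step 403
        else:
            result.append((words, score))             # step 404
    return result


hot_word_set = {("Wordi1", "Wordi4")}
paths = [
    (["call", "Wordi1", "Wordi4"], -1200.0),
    (["call", "Wordi1"], -1100.0),
]
excited = excite_paths(paths, hot_word_set)
```

In this toy case the first path ends with a complete hot word and overtakes the second one after the bonus, which is the intended effect of the excitation.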
Note also that, in practice, before the next speech signal frame is decoded, the historical paths of the active nodes can be pruned to delete unlikely paths and improve the efficiency of the subsequent search. Specifically, a probability-based pruning threshold can be used: first find the maximum accumulated historical path probability among the current active nodes, then compute the difference between this maximum and the accumulated historical path probability of each active node. Any active node whose difference exceeds the preset pruning threshold is marked inactive and cut from the subsequent search paths, which stops any further search starting from that node.
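The threshold pruning just described amounts to keeping only the nodes whose score lies within a fixed distance of the current best. A minimal sketch, assuming additive scores where larger is better and hypothetical node names and threshold:

```python
def prune(active, beam):
    """Keep nodes whose score is within `beam` of the best score.

    active: dict mapping node id -> accumulated path score.
    """
    if not active:
        return {}
    best = max(active.values())
    return {node: s for node, s in active.items() if best - s <= beam}


survivors = prune({"a": -100.0, "b": -150.0, "c": -400.0}, beam=200.0)
```

Because a hot word path that has not yet received any bonus can fall outside this window, the order of excitation and pruning matters for hot word recall.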
In the embodiments of the invention, this pruning must take place after the historical paths of all active nodes produced by decoding the current speech signal frame have been excited according to the flow of Fig. 4. New active nodes are then determined from the accumulated historical path probabilities of all current active nodes (both excited and unexcited), and the subsequent paths are extended from them.
In the flow of Fig. 4, the priority of the path containing a hot word is raised by a fixed excitation bonus, which optimizes that path.
In practice, a hot word often consists of two or more word segments. Exciting only once a hot word entry is completely matched may fail to achieve its purpose, because the path containing the hot word entry may be pruned too early, which harms hot word recognition accuracy. For this reason, the embodiments of the invention can also use an excitation method based on path prediction, which raises the priority of the path containing a hot word step by step through progressive excitation.
2. Optimizing the decoded historical paths according to the hot word dictionary
Fig. 5 is a flowchart of exciting the decoded historical paths according to the hot word dictionary in an embodiment of the invention, comprising the following steps:
Step 501: obtain the historical paths and accumulated historical path probabilities of all active nodes produced by decoding the current speech signal frame.
Step 502: determine whether a new word has appeared on the historical path of the active node; if so, go to step 503; otherwise, go to step 504.
Step 503: update the accumulated historical path probability of the historical path according to how the new word extends the hot word segment of the hot word dictionary that is currently matched on the historical path.
Step 504: keep the accumulated historical path probability of the historical path unchanged.
Note that steps 502 to 504 must be performed for the historical paths of all active nodes at every speech signal frame. In addition, before the next speech signal frame is decoded, the historical paths of the active nodes can be pruned to delete unlikely paths and improve the efficiency of the subsequent search. Specifically, a probability-based pruning threshold can be used: first find the maximum accumulated historical path probability among the current active nodes, then compute the difference between this maximum and the accumulated historical path probability of each active node. Any active node whose difference exceeds the preset pruning threshold is marked inactive and cut from the subsequent search paths, which stops any further search starting from that node.
Unlike the approach above, which excites a historical path only once a hot word is fully matched, step 503 excites the historical path during decoding as soon as a hot word is partially matched; that is, the accumulated historical path probability is updated according to the extension result of the historical path. Raising the accumulated path probability in advance better protects the survival of the path containing the hot word.
Step 503 can be implemented in several different ways, for example:
Fig. 6 is a flowchart of one implementation of updating the accumulated historical path probability of a historical path according to its extension result in an embodiment of the invention, comprising the following steps:
Step 601: determine whether the new word is a follow-up segment of the hot word segment matched on the historical path; if so, go to step 605; otherwise, go to step 602.
Step 602: determine whether the new word is an initial segment in the hot word dictionary; if so, go to step 603; otherwise, go to step 604.
Step 603: add the bonus corresponding to the initial segment to the accumulated historical path probability of the historical path.
Step 604: keep the accumulated historical path probability of the historical path unchanged.
Step 605: add the bonus corresponding to the follow-up segment to the accumulated historical path probability of the historical path.
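The decision flow of steps 601 to 605 can be sketched as a single update function over a prefix tree of hot word segments. This is an illustrative sketch only: the nested-dict tree, the "<end>" marker, the per-segment weight table and all names are assumptions introduced for the example, and scores are assumed to be additive with larger meaning better.

```python
def update_on_new_word(score, match_node, new_word, trie, weights):
    """Return (new_score, new_match_node) after a new word appears on a path.

    match_node is the tree node reached by the hot word segments already
    matched on this path, or None when no partial match is pending.
    """
    if match_node is not None and new_word in match_node:      # step 601
        bonus = weights.get(new_word, 0.0)
        return score + bonus, match_node[new_word]             # step 605
    if new_word in trie:                                       # step 602
        bonus = weights.get(new_word, 0.0)
        return score + bonus, trie[new_word]                   # step 603
    return score, None                                         # step 604


# Hypothetical dictionary: A = Wordi1 Wordi2 Wordi3, B = Wordi1 Wordi4.
trie = {
    "Wordi1": {
        "Wordi2": {"Wordi3": {"<end>": "A"}},
        "Wordi4": {"<end>": "B"},
    }
}
weights = {"Wordi1": 100.0, "Wordi4": 200.0}  # made-up segment bonuses

s1, n1 = update_on_new_word(-1000.0, None, "Wordi1", trie, weights)
s2, n2 = update_on_new_word(s1, n1, "Wordi4", trie, weights)
s3, n3 = update_on_new_word(s2, n2, "other", trie, weights)
```

Here the path is rewarded as soon as the first segment of a hot word appears, so it has a better chance of surviving pruning until the hot word is complete.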
Figure 7 shows another flow, in an embodiment of the invention, for updating the cumulative historical path probability of a historical path according to how the path has been extended. It comprises the following steps:
Step 701: determine whether the word sequence preceding the new word on the historical path containing the new word is a complete hot word in the hot word dictionary; if so, perform step 705; otherwise, perform step 702.
Step 702: determine whether the new word is the follow-up segment of the hot word segment matched on the historical path; if so, perform step 703; otherwise, perform step 704.
Step 703: add the bonus corresponding to the follow-up segment to the cumulative historical path probability of the historical path.
Step 704: revoke the bonus previously added to the cumulative historical path probability of the historical path.
Step 705: determine whether the new word is an initial segment in the hot word dictionary; if so, perform step 706; otherwise, perform step 707.
Step 706: add the bonus corresponding to the initial segment to the cumulative historical path probability of the historical path.
Step 707: keep the cumulative historical path probability of the historical path unchanged.
The flow shown in Figure 7 further avoids mistakenly exciting historical paths that do not contain a hot word.
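The Figure 7 flow, including the revocation in step 704, might be sketched as below. The hot word, the bonus values and the `pending_bonus` bookkeeping are hypothetical. The sketch also assumes that a path with no pending partial match is routed to step 705 in the same way as a path whose preceding words form a complete hot word; the text does not spell out that case.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

HOT_WORDS = [("speech", "recognition", "system")]  # hypothetical, pre-segmented
INITIAL_BONUS = 1.0    # assumed value
FOLLOW_UP_BONUS = 2.0  # assumed value


@dataclass
class PathState:
    score: float = 0.0                       # cumulative path score (log domain assumed)
    match: Optional[Tuple[int, int]] = None  # (hot word index, segments matched so far)
    pending_bonus: float = 0.0               # bonus granted to the current partial match


def excite_with_rollback(state: PathState, new_word: str) -> PathState:
    # Step 701: do the words before the new word form a complete hot word?
    complete = (state.match is not None
                and state.match[1] == len(HOT_WORDS[state.match[0]]))
    if state.match is None or complete:  # "no pending match" handled here by assumption
        # Step 705: is the new word an initial segment?
        for idx, hot in enumerate(HOT_WORDS):
            if hot[0] == new_word:
                # Step 706: bonus for the initial segment.
                return PathState(state.score + INITIAL_BONUS, (idx, 1), INITIAL_BONUS)
        # Step 707: keep the score (a completed hot word keeps its bonus).
        return PathState(state.score, None, 0.0)
    # Step 702: is the new word the follow-up segment of the pending match?
    idx, matched = state.match
    if HOT_WORDS[idx][matched] == new_word:
        # Step 703: bonus for the follow-up segment.
        return PathState(state.score + FOLLOW_UP_BONUS, (idx, matched + 1),
                         state.pending_bonus + FOLLOW_UP_BONUS)
    # Step 704: the partial match failed, so revoke everything it was granted.
    return PathState(state.score - state.pending_bonus, None, 0.0)
```

With these values, a path that decodes "speech recognition" and then an unrelated word ends with a net bonus of zero, whereas a path that completes the hot word keeps the full bonus.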
In addition, in practical applications, if the historical path of each active node keeps only one hot word segment match record, decoding may fail to find every path that contains a hot word. For example, suppose the word sequence of hot word A is Word1 Word2 Word3, the word sequence of hot word B is Word2 Word4, and the input word sequence is Word1 Word2 Word4. When "Word2" is decoded, it is matched preferentially to the Word2 segment of hot word A and the Word2 segment of hot word B is ignored, so the word sequence Word1 Word2 Word4 never matches hot word B (Word2 Word4). To address this, embodiments of the invention may keep multiple hot word segment match histories for the historical path of a single active node, which makes hot word matching more reasonable. That is, when "Word2" is decoded, the partial match history "Word1 Word2" of hot word A is kept together with the partial match history "Word2" of hot word B, and a separate excitation bonus is stored for each on the same historical path. Later in decoding, once it is determined that a given partial hot word match cannot be extended further, the bonus previously added to the path's cumulative historical path probability for that match is revoked.
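The Word1/Word2/Word4 example can be traced with a small sketch that keeps several partial matches per path, each with its own accumulated bonus. The data structures and the single per-segment bonus value are assumptions made for illustration only.

```python
from dataclasses import dataclass, field
from typing import List

# Hot words A and B from the example above, pre-segmented.
HOT_WORDS = {"A": ("Word1", "Word2", "Word3"), "B": ("Word2", "Word4")}
BONUS = 1.0  # assumed bonus per matched segment


@dataclass
class PartialMatch:
    name: str      # which hot word is being matched
    matched: int   # number of segments matched so far
    bonus: float   # bonus this match has contributed to the path


@dataclass
class PathState:
    score: float = 0.0
    matches: List[PartialMatch] = field(default_factory=list)
    completed: List[str] = field(default_factory=list)


def extend(state: PathState, new_word: str) -> PathState:
    score = state.score
    survivors: List[PartialMatch] = []
    completed = list(state.completed)
    # Try to continue every pending partial match.
    for m in state.matches:
        hot = HOT_WORDS[m.name]
        if hot[m.matched] == new_word:
            score += BONUS
            if m.matched + 1 == len(hot):
                completed.append(m.name)  # fully matched: its bonus is kept
            else:
                survivors.append(PartialMatch(m.name, m.matched + 1, m.bonus + BONUS))
        else:
            score -= m.bonus  # cannot be extended: revoke only this match's bonus
    # Start a new partial match for every hot word that begins with new_word.
    for name, hot in HOT_WORDS.items():
        if hot[0] == new_word:
            score += BONUS
            if len(hot) == 1:
                completed.append(name)
            else:
                survivors.append(PartialMatch(name, 1, BONUS))
    return PathState(score, survivors, completed)
```

Feeding Word1, Word2, Word4 through `extend` leaves hot word B completed with its two segment bonuses intact, while the two bonuses granted to the abandoned partial match of hot word A are revoked.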
In practical applications, the path probability may be excited according to how well the decoded word matches a hot word segment. For example, when decoding yields a new word, the preset weight of the matching hot word segment can be looked up and used to excite the path. In particular, weights may be set only for the head and tail characters of the segments in the hot word dictionary, with all other weights simply set to 0, which simplifies the excitation process. For example, suppose the current hot word dictionary, shown in Figure 8, consists of "中国" (China), "中华" (China), "人民" (the people) and "人们" (people). During decoding, if "中" or "人" appears on the historical path of an active node, that is, a character matching the initial character of an entry in the hot word dictionary, the path is given the score excitation corresponding to "中" or "人"; in other words, the bonus corresponding to that initial character is added to the path's cumulative historical path probability. Then, as the path containing the hot word segment is extended, if "国" or "华", or "民" or "们", appears on the extension so as to form all or part of the hot words "中国", "中华", "人民" or "人们", the path is further given the score excitation corresponding to "国", "华", "民" or "们". Otherwise, if the path is extended with a word that does not belong to any hot word, the path is not excited, or the excitation added earlier is removed.
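The simplification of weighting only the head and tail of a hot word could look like the sketch below. The function name, the default weights and the example hot word are hypothetical, and the units may be characters or segments depending on how the dictionary is built.

```python
from typing import List, Sequence, Tuple


def head_tail_weights(units: Sequence[str],
                      head_weight: float = 1.0,
                      tail_weight: float = 2.0) -> List[Tuple[str, float]]:
    """Assign a weight to the first and last unit of a hot word and 0 to the rest.

    head_weight and tail_weight are assumed values, not taken from the source.
    """
    if not units:
        return []
    if len(units) == 1:
        # A one-unit hot word only has a head; this choice is an assumption.
        return [(units[0], head_weight)]
    weights = [0.0] * len(units)
    weights[0] = head_weight
    weights[-1] = tail_weight
    return list(zip(units, weights))


# Example with a hypothetical four-unit hot word.
table = head_tail_weights(("w1", "w2", "w3", "w4"))
```

Here `table` pairs the first unit with the head weight, the last unit with the tail weight, and the two middle units with 0, so a path is excited only when a hot word starts and when it completes.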
As can be seen, the speech recognition method of this embodiment uses an excitation method based on path prediction: it raises the priority of the path containing a hot word step by step, thereby optimizing that path and improving hot word recognition.
Correspondingly, an embodiment of the invention also provides a speech recognition system; Figure 9 is a schematic structural diagram of the system.
In this embodiment, the system comprises:
Network construction unit 901, configured to build the decoding recognition network;
Decoding unit 902, configured to decode, for a received speech signal, each speech signal frame therein according to the decoding recognition network;
Excitation unit 903, configured to excite the historical paths of active nodes according to hot words during decoding by the decoding unit 902, so as to raise the cumulative historical path probability of paths containing hot words;
Optimal node determination unit 904, configured to select the active node with the maximum cumulative probability as the optimal node after the decoding unit 902 has decoded the last speech signal frame;
Backtracking unit 905, configured to backtrack from the optimal node through the decoding states to obtain the optimal path and the corresponding word sequence.
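How the decoding, excitation, optimal node selection and backtracking units cooperate can be illustrated with a heavily simplified toy decoder. Everything here is an assumption made for illustration: each "frame" directly offers candidate words with log-scores, there is no acoustic or language model, the network construction unit is omitted, and the names `Node`, `decode`, `HOT_WORDS` and `BONUS` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

HOT_WORDS = {("hot", "word")}  # hypothetical hot word, pre-segmented
BONUS = 1.5                    # assumed excitation bonus (log domain)


@dataclass
class Node:
    word: str
    score: float               # cumulative historical path score
    parent: Optional["Node"]   # back-pointer used for backtracking


def backtrack(node: Optional[Node]) -> List[str]:
    """Backtracking unit: follow back-pointers to recover the word sequence."""
    words: List[str] = []
    while node is not None:
        words.append(node.word)
        node = node.parent
    return words[::-1]


def excite(node: Node) -> None:
    """Excitation unit: boost the path if its latest words form a hot word."""
    words = backtrack(node)
    for hot in HOT_WORDS:
        if tuple(words[-len(hot):]) == hot:
            node.score += BONUS


def decode(frames: List[Dict[str, float]]) -> Tuple[List[str], float]:
    """Decoding unit plus optimal node selection over toy 'frames'."""
    active: List[Optional[Node]] = [None]  # None stands for the start state
    for frame in frames:
        extended: List[Optional[Node]] = []
        for prev in active:
            base = prev.score if prev is not None else 0.0
            for word, log_score in frame.items():
                node = Node(word, base + log_score, prev)
                excite(node)
                extended.append(node)
        active = extended
    best = max((n for n in active if n is not None), key=lambda n: n.score)
    return backtrack(best), best.score


words, score = decode([{"hot": -1.0, "hat": -0.8},
                       {"word": -1.0, "ward": -0.9}])
```

Without the bonus the path "hat ward" would score highest; with it, the path containing the hot word wins, which is the effect the excitation is meant to achieve.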
In embodiments of the invention, the network construction unit 901 may build the decoding recognition network online, or may build it offline. In the offline case the pre-built network is loaded directly when the system starts, which reduces the system's computation and memory requirements and further improves decoding efficiency. The network construction unit 901 may build the network from preset acoustic and language models, among others; the details are not described here.
In embodiments of the invention, when the decoding unit 902 decodes the user's input speech signal with the decoding recognition network, it computes, for each speech signal frame, the cumulative historical path probability of reaching each active node in the network. After the decoding unit 902 has decoded a frame, the historical paths and cumulative historical path probabilities of all current active nodes are available. The excitation unit 903 can then excite the historical paths of the active nodes according to the hot words in a number of different ways, raising the cumulative historical path probability of paths containing hot words; this is described in detail below.
Note that, in practical applications, before the decoding unit 902 decodes the next speech signal frame, the historical paths of the active nodes may also be pruned: impossible paths are deleted to make the subsequent search more efficient. Specifically, a probability-based pruning threshold may be used. First, find the highest cumulative historical path probability among the current active nodes. Then compute the difference between that maximum and the cumulative historical path probability of each active node. Any active node whose difference exceeds the preset pruning threshold is marked inactive and removed from the search, so that no further search proceeds from that node.
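The threshold-based pruning described here can be sketched in a few lines. The function name, the dictionary representation of active nodes and the use of log-domain scores are assumptions made for the example.

```python
from typing import Dict


def prune(active: Dict[str, float], threshold: float) -> Dict[str, float]:
    """Keep only nodes whose score is within `threshold` of the best score.

    `active` maps a node identifier to its cumulative historical path score
    (log domain assumed); nodes that fall further behind become inactive.
    """
    if not active:
        return {}
    best = max(active.values())
    return {node: score for node, score in active.items()
            if best - score <= threshold}


# Hypothetical scores: n3 trails the best node by 8.0 and is pruned.
survivors = prune({"n1": -1.0, "n2": -3.5, "n3": -9.0}, threshold=5.0)
```

Because excitation raises the score of a path containing a hot word before this step runs, such a path is less likely to fall outside the threshold.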
It can thus be seen that the speech recognition system of the embodiments of the invention applies excitation to historical paths based on hot word matching, optimizing the cumulative historical path probability of paths containing hot words and improving hot word recognition. With this system no re-estimation of system parameters is required, and trending vocabulary and user-specific vocabulary can be recognized quickly and accurately. It offers a feasible way for a system to support a personalized dictionary or a customized, personalized language model: the user obtains recognition support for personalized vocabulary simply by updating the hot word dictionary that the system supports.
Embodiments of the invention allow users to define hot words at any time, which widens the system's recognition range and accommodates a constantly changing vocabulary.
Figure 10 is a schematic diagram of one specific application structure of the speech recognition system of an embodiment of the invention.
In this embodiment, the system further comprises a hot word set update unit 911, configured to obtain hot words input by the user and save them to a hot word set. The hot word set may be a dynamically updated library of hot word entries.
Correspondingly, the excitation unit 913 excites the historical paths of active nodes according to the hot word set during decoding.
In this embodiment, the excitation unit 913 comprises an obtaining subunit 9131, a judgment subunit 9132 and an excitation subunit 9133, where:
Obtaining subunit 9131, configured to obtain the historical paths and cumulative historical path probabilities of all active nodes produced when the decoding unit 902 decodes the current speech signal frame;
Judgment subunit 9132, configured to determine, according to the hot word set, whether adjacent words on the historical path form a hot word;
Excitation subunit 9133, configured to raise the cumulative historical path probability of the historical path after the judgment subunit 9132 determines that adjacent words on the path form a hot word, and to keep the cumulative historical path probability unchanged after the judgment subunit 9132 determines that they do not. The excitation may add a fixed bonus, which raises the priority of the path containing the hot word so that the path is more likely to be retained. The value of the fixed bonus can be preset as needed; in general, the higher the bonus, the more readily hot words are matched and the higher the hot word recognition accuracy.
The speech recognition system of this embodiment applies a suitable score excitation when the word sequence on an active node's historical path forms a hot word entry, thereby optimizing the path containing the hot word and improving hot word recognition.
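The fixed-bonus excitation on a complete match can be sketched as follows. The hot word set contents, the bonus value and the function name are hypothetical, and the function is meant to be called once each time a new word is appended to a path.

```python
from typing import List

# Hypothetical hot word set; each entry is a tuple of adjacent words.
HOT_WORD_SET = {("speech", "recognition"), ("hot", "word")}
FIXED_BONUS = 3.0  # assumed preset bonus (log domain)


def excite_full_match(path_words: List[str], score: float) -> float:
    """Raise the score if the most recent adjacent words form a hot word."""
    for hot in HOT_WORD_SET:
        n = len(hot)
        if n <= len(path_words) and tuple(path_words[-n:]) == hot:
            return score + FIXED_BONUS  # complete hot word: raise the path score
    return score                        # no hot word formed: keep the score


boosted = excite_full_match(["use", "speech", "recognition"], -10.0)
unchanged = excite_full_match(["use", "speech"], -10.0)
```

In this sketch `boosted` is raised by the fixed bonus because the last two words form a hot word entry, while `unchanged` keeps its original score because the hot word is only partially present.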
In practical applications, because a hot word usually consists of two or more segments, exciting a path only once a hot word entry is fully matched may come too late: the path containing the hot word entry may already have been pruned, so the excitation never takes effect and hot word recognition accuracy suffers. Therefore, in another embodiment of the speech recognition system of the invention, the excitation unit may instead use an excitation method based on path prediction, raising the priority of the path containing a hot word step by step.
Figure 11 is a schematic diagram of another specific application structure of the speech recognition system of an embodiment of the invention.
In this embodiment, the system further comprises a hot word obtaining unit 921 and a segmentation unit 922. The hot word obtaining unit 921 is configured to obtain hot words input by the user; the segmentation unit 922 is configured to perform text word segmentation on the hot words obtained by the hot word obtaining unit and to save the resulting segments to a hot word dictionary.
Correspondingly, the excitation unit 923 excites the historical paths of active nodes according to the hot word dictionary during decoding.
In this embodiment, the excitation unit 923 comprises an obtaining subunit 9231, a new word judgment subunit 9232 and an excitation subunit 9233, where:
Obtaining subunit 9231, configured to obtain the historical paths and cumulative historical path probabilities of all active nodes produced by decoding the current speech signal frame;
New word judgment subunit 9232, configured to determine whether a new word has appeared on the historical path of an active node;
Excitation subunit 9233, configured to update the cumulative historical path probability of the historical path, after the new word judgment subunit 9232 determines that a new word has appeared on the path, according to how the new word extends the hot word dictionary segment matched on the path; and otherwise to keep the cumulative historical path probability of the path unchanged.
Unlike the excitation subunit 9133 of Figure 10, which excites a historical path only when a hot word is fully matched, the excitation subunit 9233 of this embodiment excites the path during decoding as soon as a hot word is partially matched; that is, it updates the path's cumulative historical path probability according to how the path has been extended. Raising the path's cumulative probability in advance better ensures that a path containing a hot word survives. The excitation subunit 9233 can be implemented in several ways, which are illustrated below.
Figure 12 is a schematic diagram of one specific structure of the excitation subunit in an embodiment of the invention.
In this embodiment, the excitation subunit comprises:
First judgment subunit 121, configured to determine whether the new word is the follow-up segment of the hot word segment matched on the historical path;
First processing subunit 122, configured to add the bonus corresponding to the follow-up segment to the cumulative historical path probability of the historical path after the first judgment subunit 121 determines that the new word is the follow-up segment of the hot word segment matched on the path;
Second judgment subunit 123, configured to determine whether the new word is an initial segment in the hot word dictionary after the first judgment subunit 121 determines that the new word is not the follow-up segment of the hot word segment matched on the path;
Second processing subunit 124, configured to add the bonus corresponding to the initial segment to the cumulative historical path probability of the historical path after the second judgment subunit 123 determines that the new word is an initial segment in the hot word dictionary; and otherwise to keep the cumulative historical path probability of the path unchanged.
Figure 13 is a schematic diagram of another specific structure of the excitation subunit in an embodiment of the invention.
In this embodiment, the excitation subunit comprises:
Hot word judgment subunit 131, configured to determine whether the word sequence preceding the new word on the historical path containing the new word is a complete hot word in the hot word dictionary;
Third judgment subunit 132, configured to determine whether the new word is an initial segment in the hot word dictionary when the hot word judgment subunit 131 determines that the word sequence preceding the new word on the historical path is a complete hot word in the hot word dictionary;
Third processing subunit 133, configured to add the bonus corresponding to the initial segment to the cumulative historical path probability of the historical path after the third judgment subunit 132 determines that the new word is an initial segment in the hot word dictionary; and otherwise to keep the cumulative historical path probability of the path unchanged;
Fourth judgment subunit 134, configured to determine whether the new word is the follow-up segment of the hot word segment matched on the historical path when the hot word judgment subunit 131 determines that the word sequence preceding the new word on the historical path is not a complete hot word in the hot word dictionary;
Fourth processing subunit 135, configured to add the bonus corresponding to the follow-up segment to the cumulative historical path probability of the historical path after the fourth judgment subunit 134 determines that the new word is the follow-up segment of the hot word segment matched on the path; and otherwise to revoke the bonus previously added to the cumulative historical path probability of the path.
Of course, in practical applications the excitation subunit may be implemented in other ways as well; the embodiments of the invention place no restriction on this. For example, multiple hot word segment match histories may be kept for the historical path of a single active node; later in decoding, once it is determined that a given partial hot word match cannot be extended further, the bonus previously added to the path's cumulative historical path probability for that match is revoked, which makes hot word matching more reasonable.
The speech recognition system of this embodiment uses an excitation method based on path prediction: it raises the priority of the path containing a hot word step by step, thereby optimizing that path and improving hot word recognition.
The embodiments in this specification are described progressively; for parts that are the same or similar, the embodiments may be referred to one another, and each embodiment focuses on its differences from the others. In particular, because the system embodiments are substantially similar to the method embodiments, they are described more briefly, and the relevant parts of the method embodiments may be consulted. The system embodiments described above are merely illustrative: units and modules described as separate components may or may not be physically separate. Some or all of the units and modules may be selected as needed to achieve the purpose of a given embodiment. Those of ordinary skill in the art can understand and implement this without creative effort.
The above discloses only preferred embodiments of the invention, but the invention is not limited to them. Any non-inventive variation that a person skilled in the art could conceive, and any improvement or refinement made without departing from the principle of the invention, shall fall within the scope of protection of the invention.

Claims (14)

1. A speech recognition method, characterized by comprising:
building a decoding recognition network;
for a received speech signal, decoding each speech signal frame therein according to the decoding recognition network, and, during decoding, exciting the historical paths of active nodes according to hot words so as to raise the cumulative historical path probability of paths containing hot words;
after the last speech signal frame has been decoded, selecting the active node with the maximum cumulative probability as the optimal node;
backtracking from the optimal node through the decoding states to obtain the optimal path and the corresponding word sequence.
2. the method for claim 1 is characterized in that, described method also comprises: obtain the hot word of user's input, and described hot word is saved in hot set of words;
Described according to hot word the historical path of live-vertex the excitation in decode procedure comprises: according to described hot set of words the historical path of live-vertex is encouraged in decode procedure.
3. The method of claim 2, characterized in that exciting the historical paths of active nodes according to the hot word set during decoding comprises:
obtaining the historical paths and cumulative historical path probabilities of all active nodes produced by decoding the current speech signal frame;
determining, according to the hot word set, whether adjacent words on the historical path form a hot word;
if so, raising the cumulative historical path probability of the historical path;
if not, keeping the cumulative historical path probability of the historical path unchanged.
4. the method for claim 1 is characterized in that, described method also comprises: obtain the hot word of user's input, described hot word is carried out the text participle, and the participle that obtains is saved in hot word dictionary;
Described according to hot word the historical path of live-vertex the excitation in decode procedure comprises: according to described hot word dictionary the historical path of live-vertex is encouraged in decode procedure.
5. The method of claim 4, characterized in that exciting the historical paths of active nodes according to the hot word dictionary during decoding comprises:
obtaining the historical paths and cumulative historical path probabilities of all active nodes produced by decoding the current speech signal frame;
determining whether a new word has appeared on the historical path of an active node;
if so, updating the cumulative historical path probability of the historical path according to how the new word extends the hot word dictionary segment matched on the historical path;
if not, keeping the cumulative historical path probability of the historical path unchanged.
6. The method of claim 5, characterized in that updating the cumulative historical path probability of the historical path according to how the new word extends the hot word dictionary segment matched on the historical path comprises:
determining whether the new word is the follow-up segment of the hot word segment matched on the historical path;
if so, adding the bonus corresponding to the follow-up segment to the cumulative historical path probability of the historical path;
if not, determining whether the new word is an initial segment in the hot word dictionary;
if so, adding the bonus corresponding to the initial segment to the cumulative historical path probability of the historical path;
if not, keeping the cumulative historical path probability of the historical path unchanged.
7. The method of claim 5, characterized in that updating the cumulative historical path probability of the historical path according to how the new word extends the hot word dictionary segment matched on the historical path comprises:
if the word sequence preceding the new word on the historical path containing the new word is a complete hot word in the hot word dictionary, determining whether the new word is an initial segment in the hot word dictionary;
if so, adding the bonus corresponding to the initial segment to the cumulative historical path probability of the historical path;
if not, keeping the cumulative historical path probability of the historical path unchanged;
if the word sequence preceding the new word on the historical path containing the new word is not a complete hot word in the hot word dictionary, determining whether the new word is the follow-up segment of the hot word segment matched on the historical path;
if so, adding the bonus corresponding to the follow-up segment to the cumulative historical path probability of the historical path;
if not, revoking the bonus previously added to the cumulative historical path probability of the historical path.
8. A speech recognition system, characterized by comprising:
a network construction unit, configured to build a decoding recognition network;
a decoding unit, configured to decode, for a received speech signal, each speech signal frame therein according to the decoding recognition network;
an excitation unit, configured to excite the historical paths of active nodes according to hot words during decoding by the decoding unit, so as to raise the cumulative historical path probability of paths containing hot words;
an optimal node determination unit, configured to select the active node with the maximum cumulative probability as the optimal node after the decoding unit has decoded the last speech signal frame;
a backtracking unit, configured to backtrack from the optimal node through the decoding states to obtain the optimal path and the corresponding word sequence.
9. The system of claim 8, characterized in that the system further comprises:
a hot word set update unit, configured to obtain hot words input by the user and save the hot words to a hot word set;
wherein the excitation unit excites the historical paths of active nodes according to the hot word set during decoding by the decoding unit.
10. The system of claim 9, characterized in that the excitation unit comprises:
an obtaining subunit, configured to obtain the historical paths and cumulative historical path probabilities of all active nodes produced by decoding the current speech signal frame;
a judgment subunit, configured to determine, according to the hot word set, whether adjacent words on the historical path form a hot word;
an excitation subunit, configured to raise the cumulative historical path probability of the historical path after the judgment subunit determines that adjacent words on the historical path form a hot word, and to keep the cumulative historical path probability of the historical path unchanged after the judgment subunit determines that adjacent words on the historical path do not form a hot word.
11. The system of claim 8, characterized in that the system further comprises:
a hot word obtaining unit, configured to obtain hot words input by the user;
a segmentation unit, configured to perform text word segmentation on the hot words obtained by the hot word obtaining unit and save the resulting segments to a hot word dictionary;
wherein the excitation unit excites the historical paths of active nodes according to the hot word dictionary during decoding by the decoding unit.
12. The system of claim 11, characterized in that the excitation unit comprises:
an obtaining subunit, configured to obtain the historical paths and cumulative historical path probabilities of all active nodes produced by decoding the current speech signal frame;
a new word judgment subunit, configured to determine whether a new word has appeared on the historical path of an active node;
an excitation subunit, configured to update the cumulative historical path probability of the historical path, after the new word judgment subunit determines that a new word has appeared on the historical path of the active node, according to how the new word extends the hot word dictionary segment matched on the historical path; and otherwise to keep the cumulative historical path probability of the historical path unchanged.
13. The system of claim 12, characterized in that the excitation subunit comprises:
a first judgment subunit, configured to determine whether the new word is the follow-up segment of the hot word segment matched on the historical path;
a first processing subunit, configured to add the bonus corresponding to the follow-up segment to the cumulative historical path probability of the historical path after the first judgment subunit determines that the new word is the follow-up segment of the hot word segment matched on the historical path;
a second judgment subunit, configured to determine whether the new word is an initial segment in the hot word dictionary after the first judgment subunit determines that the new word is not the follow-up segment of the hot word segment matched on the historical path;
a second processing subunit, configured to add the bonus corresponding to the initial segment to the cumulative historical path probability of the historical path after the second judgment subunit determines that the new word is an initial segment in the hot word dictionary; and otherwise to keep the cumulative historical path probability of the historical path unchanged.
14. The system of claim 12, characterized in that the excitation subunit comprises:
a hot word judgment subunit, configured to determine whether the word sequence preceding the new word on the historical path containing the new word is a complete hot word in the hot word dictionary;
a third judgment subunit, configured to determine whether the new word is an initial segment in the hot word dictionary when the hot word judgment subunit determines that the word sequence preceding the new word on the historical path containing the new word is a complete hot word in the hot word dictionary;
a third processing subunit, configured to add the bonus corresponding to the initial segment to the cumulative historical path probability of the historical path after the third judgment subunit determines that the new word is an initial segment in the hot word dictionary; and otherwise to keep the cumulative historical path probability of the historical path unchanged;
a fourth judgment subunit, configured to determine whether the new word is the follow-up segment of the hot word segment matched on the historical path when the hot word judgment subunit determines that the word sequence preceding the new word on the historical path containing the new word is not a complete hot word in the hot word dictionary;
a fourth processing subunit, configured to add the bonus corresponding to the follow-up segment to the cumulative historical path probability of the historical path after the fourth judgment subunit determines that the new word is the follow-up segment of the hot word segment matched on the historical path; and otherwise to revoke the bonus previously added to the cumulative historical path probability of the historical path.
CN2012100734129A 2012-03-19 2012-03-19 Voice recognition method and system Active CN102592595B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2012100734129A CN102592595B (en) 2012-03-19 2012-03-19 Voice recognition method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN2012100734129A CN102592595B (en) 2012-03-19 2012-03-19 Voice recognition method and system

Publications (2)

Publication Number Publication Date
CN102592595A CN102592595A (en) 2012-07-18
CN102592595B true CN102592595B (en) 2013-05-29

Family

ID=46481136

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2012100734129A Active CN102592595B (en) 2012-03-19 2012-03-19 Voice recognition method and system

Country Status (1)

Country Link
CN (1) CN102592595B (en)

Families Citing this family (34)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8589164B1 (en) * 2012-10-18 2013-11-19 Google Inc. Methods and systems for speech recognition processing using search query information
CN103065630B (en) 2012-12-28 2015-01-07 科大讯飞股份有限公司 User personalized information voice recognition method and user personalized information voice recognition system
CN103903619B (en) * 2012-12-28 2016-12-28 科大讯飞股份有限公司 Method and system for improving speech recognition accuracy
CN103971686B (en) * 2013-01-30 2015-06-10 腾讯科技(深圳)有限公司 Method and system for automatically recognizing voice
JP6585022B2 (en) * 2016-11-11 2019-10-02 株式会社東芝 Speech recognition apparatus, speech recognition method and program
CN107146610B (en) * 2017-04-10 2021-06-15 易视星空科技无锡有限公司 Method and device for determining user intention
CN109213777A (en) * 2017-06-29 2019-01-15 杭州九阳小家电有限公司 Voice-based recipe processing method and system
CN108682415B (en) * 2018-05-23 2020-09-29 广州视源电子科技股份有限公司 Voice search method, device and system
CN108735201B (en) * 2018-06-29 2020-11-17 广州视源电子科技股份有限公司 Continuous speech recognition method, device, equipment and storage medium
CN111081226B (en) * 2018-10-18 2024-02-13 北京搜狗科技发展有限公司 Speech recognition decoding optimization method and device
CN109524017A (en) * 2018-11-27 2019-03-26 北京分音塔科技有限公司 Speech recognition enhancement method and device for user-defined words
CN110164416B (en) * 2018-12-07 2023-05-09 腾讯科技(深圳)有限公司 Voice recognition method and device, equipment and storage medium thereof
CN111354348B (en) * 2018-12-21 2024-04-26 北京搜狗科技发展有限公司 Data processing method and device for data processing
CN111354347B (en) * 2018-12-21 2023-08-15 中国科学院声学研究所 Speech recognition method and system based on self-adaptive hotword weight
CN109389970A (en) * 2018-12-28 2019-02-26 合肥凯捷技术有限公司 Speech analysis and recognition method
CN109902306B (en) * 2019-03-12 2021-02-02 珠海格力电器股份有限公司 Voice recognition method, device, storage medium and voice equipment
CN110110294B (en) * 2019-03-26 2021-02-02 北京捷通华声科技股份有限公司 Dynamic reverse decoding method, device and readable storage medium
CN110349569B (en) * 2019-07-02 2022-04-15 思必驰科技股份有限公司 Method and device for training and identifying customized product language model
CN110808032B (en) * 2019-09-20 2023-12-22 平安科技(深圳)有限公司 Voice recognition method, device, computer equipment and storage medium
CN110956959B (en) * 2019-11-25 2023-07-25 科大讯飞股份有限公司 Speech recognition error correction method, related device and readable storage medium
CN111028830B (en) * 2019-12-26 2022-07-15 大众问问(北京)信息科技有限公司 Local hot word bank updating method, device and equipment
CN111063353B (en) * 2019-12-31 2022-11-11 思必驰科技股份有限公司 Client processing method allowing user-defined voice interactive content and user terminal
CN111462751B (en) * 2020-03-27 2023-11-03 京东科技控股股份有限公司 Method, apparatus, computer device and storage medium for decoding voice data
CN111508478B (en) * 2020-04-08 2023-04-11 北京字节跳动网络技术有限公司 Speech recognition method and device
CN111583909B (en) * 2020-05-18 2024-04-12 科大讯飞股份有限公司 Voice recognition method, device, equipment and storage medium
CN111402895B (en) * 2020-06-08 2020-10-02 腾讯科技(深圳)有限公司 Voice processing method, voice evaluating method, voice processing device, voice evaluating device, computer equipment and storage medium
CN112634904A (en) * 2020-12-22 2021-04-09 北京有竹居网络技术有限公司 Hot word recognition method, device, medium and electronic equipment
CN113096648A (en) * 2021-03-20 2021-07-09 杭州知存智能科技有限公司 Real-time decoding method and device for speech recognition
CN113223504B (en) * 2021-04-30 2023-12-26 平安科技(深圳)有限公司 Training method, device, equipment and storage medium of acoustic model
CN113450803B (en) * 2021-06-09 2024-03-19 上海明略人工智能(集团)有限公司 Conference recording transfer method, system, computer device and readable storage medium
CN113436614B (en) * 2021-07-02 2024-02-13 中国科学技术大学 Speech recognition method, device, equipment, system and storage medium
CN113516967A (en) * 2021-08-04 2021-10-19 青岛信芯微电子科技股份有限公司 Voice recognition method and device
CN117351944B (en) * 2023-12-06 2024-04-12 科大讯飞股份有限公司 Speech recognition method, device, equipment and readable storage medium
CN117437909B (en) * 2023-12-20 2024-03-05 慧言科技(天津)有限公司 Speech recognition model construction method based on hotword feature vector self-attention mechanism

Citations (3)

Publication number Priority date Publication date Assignee Title
EP0535929A2 (en) * 1991-09-30 1993-04-07 Kurzweil Applied Intelligence, Inc. Speech recognition system
US5345537A (en) * 1990-12-19 1994-09-06 Fujitsu Limited Network reformer and creator
CN101739437A (en) * 2009-11-26 2010-06-16 杭州鑫方软件有限公司 Implementation method for network sound-searching unit and specific device thereof

Family Cites Families (2)

Publication number Priority date Publication date Assignee Title
US7058575B2 (en) * 2001-06-27 2006-06-06 Intel Corporation Integrating keyword spotting with graph decoder to improve the robustness of speech recognition
US20080154600A1 (en) * 2006-12-21 2008-06-26 Nokia Corporation System, Method, Apparatus and Computer Program Product for Providing Dynamic Vocabulary Prediction for Speech Recognition

Non-Patent Citations (2)

Title
Wang Xiaorui et al. A language model adaptation framework for broadcast speech recognition. Journal of Chinese Information Processing (《中文信息学报》). 2007-07-31, Vol. 21 (No. 4), 73-79. *

Also Published As

Publication number Publication date
CN102592595A (en) 2012-07-18

Similar Documents

Publication Publication Date Title
CN102592595B (en) Voice recognition method and system
JP7417634B2 (en) Using context information in end-to-end models for speech recognition
JP6550068B2 (en) Pronunciation prediction in speech recognition
CN104157285B (en) Audio recognition method, device and electronic equipment
CN108320733B (en) Voice data processing method and device, storage medium and electronic equipment
US9070367B1 (en) Local speech recognition of frequent utterances
CN103903619B (en) Method and system for improving speech recognition accuracy
WO2017076222A1 (en) Speech recognition method and apparatus
CN103065630B (en) User personalized information voice recognition method and user personalized information voice recognition system
JP5218052B2 (en) Language model generation system, language model generation method, and language model generation program
US11132509B1 (en) Utilization of natural language understanding (NLU) models
CN103971685B (en) Method and system for recognizing voice commands
KR20210150497A (en) Context biasing for speech recognition
CN102280106A (en) VWS method and apparatus used for mobile communication terminal
US11043214B1 (en) Speech recognition using dialog history
US20210193116A1 (en) Data driven dialog management
CN111862942B (en) Method and system for training mixed speech recognition model of Mandarin and Sichuan
CN108630200B (en) Voice keyword detection device and voice keyword detection method
CN106157953A (en) continuous speech recognition method and system
CN103035243A (en) Real-time feedback method and system of long voice continuous recognition and recognition result
KR20110128229A (en) Improving the robustness to environmental changes of a context dependent speech recognizer
CN108735201A (en) Continuous speech recognition method, apparatus, equipment and storage medium
CN108899013A (en) Voice search method, device and speech recognition system
CN102982811A (en) Voice endpoint detection method based on real-time decoding
CN107403619A (en) Sound control method and system applied to bicycle environment

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
C56 Change in the name or address of the patentee

Owner name: IFLYTEK CO., LTD.

Free format text: FORMER NAME: ANHUI USTC IFLYTEK CO., LTD.

CP03 Change of name, title or address

Address after: No. 666 Wangjiang Road, High-tech Development Zone, Hefei, Anhui Province, 230088

Patentee after: Iflytek Co., Ltd.

Address before: No. 616 Huangshan Road, High-tech Development Zone, Hefei, Anhui Province, 230088

Patentee before: Anhui USTC iFLYTEK Co., Ltd.