CN103177721B - Audio recognition method and system - Google Patents


Info

Publication number
CN103177721B
Authority
CN
China
Prior art keywords
related term
voice
speech recognition
keyword
voice messaging
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201110440273.4A
Other languages
Chinese (zh)
Other versions
CN103177721A (en
Inventor
冯克威
赵江
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Telecom Corp Ltd
Original Assignee
China Telecom Corp Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Telecom Corp Ltd
Priority to CN201110440273.4A
Publication of CN103177721A
Application granted
Publication of CN103177721B
Legal status: Active
Anticipated expiration

Abstract

The present invention discloses a speech recognition method and system. In the method, speech quality evaluation is performed on first voice information and second voice information; the voice information with better voice quality is selected as reference voice information, and the voice information with poorer voice quality is taken as auxiliary voice information. Speech recognition is performed on the reference voice information to obtain reference recognition information. The n words with the highest confidence in the reference recognition information are selected as keywords. For each keyword, m levels of related-word sets are generated according to a predetermined vocabulary. The weight values of the keywords and related words in the speech recognition model dictionary are increased. Using the updated speech recognition model dictionary, speech recognition is performed on the reference voice information and the auxiliary voice information respectively. Because the weight values of the related words are modified according to the conversation content, the speech recognition model describes the current conversation more accurately, which improves the accuracy of speech recognition.

Description

Speech recognition method and system
Technical field
The present invention relates to the field of information processing, and in particular to a speech recognition method and system.
Background art
Language is the most natural and most commonly used means of human communication. Speech recognition, or automatic speech recognition (ASR), is a discipline that has developed over roughly the past half century. Its goal is to make machines "understand" natural human speech; the recognized information can serve as control signals in many fields, and speech recognition has broad application prospects in industry, the military, transportation, medicine, civilian use and other areas. According to the required speaking style, speech recognition systems can be divided into isolated-word, connected-word and continuous speech recognition systems; according to the degree of dependence on the speaker, into speaker-dependent and speaker-independent systems; and according to vocabulary size, into small-, medium-, large- and unlimited-vocabulary systems. Although different speech recognition systems differ in implementation details, their basic frameworks are similar.
Existing mainstream speech recognition systems are mainly based on the hidden Markov model (HMM). A typical recognition system uses an acoustic model (AM) and a language model (LM) and obtains the recognition result through a decoding operation. The most widely used form of language model is the statistical language model, which uses probability and statistics to reveal the statistical regularities inherent in linguistic units; among such models, the N-gram model is simple, effective and widely used.
For telephone speech recognition, and particularly for speech recognition and speech retrieval in call centers, voice quality is relatively poor compared with ordinary speech recognition scenarios such as an office environment, so recognition performance is limited. The causes of poor voice quality include background noise; differences among customers' voice capture devices; noise from the calling equipment; noise and interference on the communication line; differences introduced by different communication lines or switches, for example different speech coding schemes used by different types of customer terminals during a call; and the customer's own speech, which may carry an accent, use a dialect, or be slurred or unclear. All of these factors can degrade speech recognition.
On the other hand, the content of each conversation differs greatly, and each call is usually short, generally only a few minutes, with content ranging from a few hundred to one or two thousand words. For a single conversation, and especially for the customer's speech, it is difficult to obtain satisfactory results from acoustic model adaptation or language model adaptation with so little data.
Summary of the invention
The technical problem to be solved by the present invention is to provide a speech recognition method and system that modify the weight values (also called probability values) of related words according to the conversation content, so that the speech recognition model describes the current conversation more accurately and the accuracy of speech recognition improves.
According to one aspect of the present invention, a speech recognition method is provided, comprising:
Obtaining first voice information of a first speaker and second voice information of a second speaker, respectively, from conversation voice information;
Performing speech quality evaluation on the first voice information and the second voice information respectively, selecting the voice information with better voice quality as reference voice information, and taking the voice information with poorer voice quality as auxiliary voice information;
Performing speech recognition on the reference voice information to obtain reference recognition information;
Selecting, from the reference recognition information, the n words with the highest confidence as keywords, where n is a positive integer;
For each keyword, generating m levels of related-word sets according to a predetermined vocabulary, wherein each related word in the first-level related-word set is associated with a keyword, each related word in the level-L related-word set is associated with a related word in the level-(L-1) related-word set, m and L are positive integers, 2≤L≤m, the m levels of related-word sets contain no keyword, and no related word is repeated within them;
Increasing the weight values, in the speech recognition model dictionary, of the keywords and of the related words in the m levels of related-word sets, wherein each weight value is multiplied by a multiple, the weight-increase multiple of a keyword is greater than that of a related word in the m levels of related-word sets, and the weight-increase multiple of a related word in the level-(L-1) related-word set is greater than that of a related word in the level-L related-word set; and normalizing the weight values of all words in the speech recognition model dictionary to obtain an updated speech recognition model;
Using the updated speech recognition model, performing speech recognition on the reference voice information and the auxiliary voice information respectively to obtain first recognition information and second recognition information.
According to another aspect of the present invention, a speech recognition system is provided, comprising:
An acquiring unit, configured to obtain first voice information of a first speaker and second voice information of a second speaker, respectively, from conversation voice information;
An evaluation unit, configured to perform speech quality evaluation on the first voice information and the second voice information respectively, select the voice information with better voice quality as reference voice information, and take the voice information with poorer voice quality as auxiliary voice information;
A first speech recognition unit, configured to perform speech recognition on the reference voice information to obtain reference recognition information;
A keyword generation unit, configured to select, from the reference recognition information, the n words with the highest confidence as keywords, where n is a positive integer;
A related-word generation unit, configured to generate, for each keyword, m levels of related-word sets according to a predetermined vocabulary, wherein each related word in the first-level related-word set is associated with a keyword, each related word in the level-L related-word set is associated with a related word in the level-(L-1) related-word set, m and L are positive integers, 2≤L≤m, the m levels of related-word sets contain no keyword, and no related word is repeated within them;
A weight adjustment unit, configured to increase the weight values, in the speech recognition model dictionary, of the keywords and of the related words in the m levels of related-word sets, wherein each weight value is multiplied by a multiple, the weight-increase multiple of a keyword is greater than that of a related word in the m levels of related-word sets, and the weight-increase multiple of a related word in the level-(L-1) related-word set is greater than that of a related word in the level-L related-word set, and to normalize the weight values of all words in the speech recognition model dictionary to obtain an updated speech recognition model;
A second speech recognition unit, configured to use the updated speech recognition model to perform speech recognition on the reference voice information and the auxiliary voice information respectively, obtaining first recognition information and second recognition information.
The present invention performs speech recognition on the better-quality reference voice information in the conversation to obtain reference recognition information. From the reference recognition information, the n words with the highest confidence are selected as keywords, where n is a positive integer. For each keyword, m levels of related-word sets are generated according to a predetermined vocabulary, wherein each related word in the first-level related-word set is associated with a keyword, each related word in the level-L related-word set is associated with a related word in the level-(L-1) related-word set, m and L are positive integers, and 2≤L≤m. The weight values, in the speech recognition model dictionary, of the keywords and of the related words in the m levels of related-word sets are increased, and the weight values of all words in the dictionary are normalized to obtain an updated speech recognition model dictionary. Using the updated dictionary, speech recognition is performed on the reference voice information and the auxiliary voice information respectively to obtain first recognition information and second recognition information. Because the weight values of the related words are modified according to the conversation content, the speech recognition model describes the current conversation more accurately, which improves the accuracy of speech recognition.
The description of the present invention is given for purposes of illustration and description and is not intended to be exhaustive or to limit the invention to the form disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art. The embodiments were chosen and described to better explain the principles and practical application of the invention, and to enable those of ordinary skill in the art to understand the invention and design various embodiments, with various modifications, suited to particular uses.
Brief description of the drawings
Fig. 1 is a schematic diagram of one embodiment of the speech recognition method of the present invention.
Fig. 2 is a schematic diagram of another embodiment of the speech recognition method of the present invention.
Fig. 3 is a schematic diagram of one embodiment of the speech recognition system of the present invention.
Fig. 4 is a schematic diagram of another embodiment of the speech recognition system of the present invention.
Detailed description of the embodiments
The present invention is described more fully below with reference to the accompanying drawings, in which exemplary embodiments of the invention are shown.
Fig. 1 is a schematic diagram of one embodiment of the speech recognition method of the present invention. As shown in Fig. 1, the speech recognition method of this embodiment is as follows:
Step 101, obtain first voice information of a first speaker and second voice information of a second speaker, respectively, from conversation voice information.
Step 102, perform speech quality evaluation on the first voice information and the second voice information respectively, select the voice information with better voice quality as reference voice information, and take the voice information with poorer voice quality as auxiliary voice information.
Step 103, perform speech recognition on the reference voice information to obtain reference recognition information.
Step 104, from the reference recognition information, select the n words with the highest confidence as keywords, where n is a positive integer.
Step 105, for each keyword, generate m levels of related-word sets according to a predetermined vocabulary, wherein each related word in the first-level related-word set is associated with a keyword, each related word in the level-L related-word set is associated with a related word in the level-(L-1) related-word set, m and L are positive integers, 2≤L≤m, the m levels of related-word sets contain no keyword, and no related word is repeated within them.
Step 106, increase the weight values, in the speech recognition model dictionary, of the keywords and of the related words in the m levels of related-word sets, wherein each weight value is multiplied by a multiple, the weight-increase multiple of a keyword is greater than that of a related word in the m levels of related-word sets, and the weight-increase multiple of a related word in the level-(L-1) related-word set is greater than that of a related word in the level-L related-word set; normalize the weight values of all words in the speech recognition model dictionary to obtain an updated speech recognition model.
Step 107, using the updated speech recognition model, perform speech recognition on the reference voice information and the auxiliary voice information respectively to obtain first recognition information and second recognition information.
The speech recognition method provided by the above embodiment of the present invention performs speech recognition on the better-quality reference voice information in the conversation to obtain reference recognition information. From the reference recognition information, the n words with the highest confidence are selected as keywords, where n is a positive integer. For each keyword, m levels of related-word sets are generated according to a predetermined vocabulary, wherein each related word in the first-level related-word set is associated with a keyword, each related word in the level-L related-word set is associated with a related word in the level-(L-1) related-word set, m and L are positive integers, and 2≤L≤m. The weight values, in the speech recognition model dictionary, of the keywords and of the related words in the m levels of related-word sets are increased, and the weight values of all words in the dictionary are normalized to obtain an updated speech recognition model dictionary. Using the updated dictionary, speech recognition is performed on the reference voice information and the auxiliary voice information respectively to obtain first recognition information and second recognition information. Because the weight values of the related words are modified according to the conversation content, the speech recognition model describes the current conversation more accurately, which improves the accuracy of speech recognition.
Fig. 2 is a schematic diagram of another embodiment of the speech recognition method of the present invention. As shown in Fig. 2, the speech recognition method of this embodiment is as follows:
Step 201, obtain first voice information of a first speaker and second voice information of a second speaker, respectively, from conversation voice information.
According to another specific embodiment of the present invention, within the conversation voice information, the first voice information is obtained from a first-channel signal corresponding to the first speaker, and the second voice information is obtained from a second-channel signal corresponding to the second speaker.
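As an illustrative sketch only, assuming the call recording is available as interleaved two-channel samples with the first speaker on the first channel (a storage layout the text does not specify), the two signals could be separated as follows. The function name `split_channels` is hypothetical.

```python
def split_channels(interleaved):
    """Split interleaved two-channel samples [a0, b0, a1, b1, ...] into two lists.

    Assumption of this sketch: channel 0 carries the first speaker and
    channel 1 carries the second speaker.
    """
    if len(interleaved) % 2 != 0:
        raise ValueError("expected an even number of interleaved samples")
    first_voice = interleaved[0::2]   # every even-indexed sample
    second_voice = interleaved[1::2]  # every odd-indexed sample
    return first_voice, second_voice


first_voice, second_voice = split_channels([1, 10, 2, 20, 3, 30])
```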
Step 202, perform speech quality evaluation on the first voice information and the second voice information respectively, select the voice information with better voice quality as reference voice information, and take the voice information with poorer voice quality as auxiliary voice information.
In call-center speech recognition, one favorable factor is that customer service staff all speak fairly standard Mandarin, and their acoustic environment is relatively stable and uniform, so their voice quality is higher and their speech is recognized more accurately than the customer's. The recognition result for the customer service staff's speech can therefore be used to improve recognition of the customer's speech. Of course, the case where the customer's voice quality is higher than that of the customer service staff is not excluded.
Those skilled in the art will understand that speech quality evaluation of voice information is known in the art, for example one or a combination of methods such as signal-to-noise ratio estimation, objective speech quality assessment, and pronunciation-standard evaluation.
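The text leaves the evaluation method open. The following is a minimal sketch of one option it mentions, signal-to-noise ratio estimation, using a crude energy-percentile heuristic that is an assumption of this example rather than part of the disclosure. All function names and the 160-sample frame length are hypothetical.

```python
import math


def frame_energies(samples, frame_len=160):
    """Mean squared amplitude of each full, non-overlapping frame."""
    return [
        sum(x * x for x in samples[i:i + frame_len]) / frame_len
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]


def estimate_snr_db(samples, frame_len=160):
    """Crude SNR: loudest 10% of frames count as speech, quietest 10% as noise."""
    energies = sorted(frame_energies(samples, frame_len))
    if not energies:
        raise ValueError("signal is shorter than one frame")
    k = max(1, len(energies) // 10)
    noise = sum(energies[:k]) / k
    speech = sum(energies[-k:]) / k
    eps = 1e-12  # avoids division by zero and log of zero for silent frames
    return 10.0 * math.log10((speech + eps) / (noise + eps))


def choose_reference(first_voice, second_voice):
    """Return (reference, auxiliary), taking the higher-SNR signal as reference."""
    if estimate_snr_db(first_voice) >= estimate_snr_db(second_voice):
        return first_voice, second_voice
    return second_voice, first_voice


# Toy signals: five quiet frames followed by five loud frames.
clean = [0] * 800 + [1000] * 800
noisy = [100] * 800 + [1000] * 800
reference, auxiliary = choose_reference(noisy, clean)
```

In practice an objective speech quality measure or a pronunciation score could replace `estimate_snr_db` without changing `choose_reference`.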
Step 203, perform speech recognition on the reference voice information to obtain reference recognition information.
Existing speech recognition technology can be used to perform speech recognition on the reference voice information.
Step 204, from the reference recognition information, select the n words with the highest confidence as keywords, where n is a positive integer.
For example, in many cases the voice quality of the customer service staff is higher, so a preliminary recognition of their speech yields relatively reliable results.
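Step 204 might be sketched as below, using the words and confidences from the worked example given later in this description (the one unnamed token in that example is omitted). The later example filters out common words with a TF-IDF rule; this sketch substitutes a simpler inverse-document-frequency threshold, and the `idf` values, the `min_idf` cutoff and the function name are all hypothetical.

```python
def select_keywords(recognized, n, idf, min_idf):
    """Pick the n highest-confidence words, then drop low-information ones.

    recognized: list of (word, confidence) pairs in recognition order.
    idf: hypothetical inverse-document-frequency table; words missing from
         it are treated as uninformative in this sketch.
    """
    # sorted() is stable, so words with equal confidence keep recognition order.
    ranked = sorted(recognized, key=lambda pair: pair[1], reverse=True)
    candidates = []
    for word, _confidence in ranked:
        if word not in candidates:
            candidates.append(word)
        if len(candidates) == n:
            break
    return [word for word in candidates if idf.get(word, 0.0) >= min_idf]


recognized = [
    ("to", 0.9), ("railway station", 0.9), ("how", 0.7), ("walk", 0.8),
    ("from", 0.33), ("emperor", 0.55), ("mansion", 0.8),
]
idf = {"to": 0.1, "walk": 0.8, "railway station": 3.0, "mansion": 2.5}  # made-up values
keywords = select_keywords(recognized, n=4, idf=idf, min_idf=1.0)
```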
Step 205, for each keyword, generate m levels of related-word sets according to a predetermined vocabulary, wherein each related word in the first-level related-word set is associated with a keyword, each related word in the level-L related-word set is associated with a related word in the level-(L-1) related-word set, m and L are positive integers, 2≤L≤m, the m levels of related-word sets contain no keyword, and no related word is repeated within them.
Determining the keywords and the m levels of related-word sets yields a sequence of words whose relation to the keywords ranges from close to loose.
Step 206, increase the weight values, in the speech recognition model dictionary, of the keywords and of the related words in the m levels of related-word sets, wherein each weight value is multiplied by a multiple, the weight-increase multiple of a keyword is greater than that of a related word in the m levels of related-word sets, and the weight-increase multiple of a related word in the level-(L-1) related-word set is greater than that of a related word in the level-L related-word set; normalize the weight values of all words in the speech recognition model dictionary to obtain an updated speech recognition model.
According to another specific embodiment of the present invention, in order to use the conversation content to improve recognition accuracy, after the weight values are updated, the weight value of a keyword is greater than that of a related word in the m levels of related-word sets, and the weight value of a related word in the level-(L-1) related-word set is greater than that of a related word in the level-L related-word set.
Because increasing the weight values of some words may cause the sum of the weights of all words to exceed 1, the weights of all words need to be scaled proportionally so that they sum to 1, which keeps the language model complete and properly normalized.
Step 207, using the updated speech recognition model, perform speech recognition on the reference voice information and the auxiliary voice information respectively to obtain first recognition information and second recognition information.
Step 208, according to a preset condition, determine whether iterative processing needs to be performed on the reference voice information and the auxiliary voice information. If iterative processing is needed, return to step 204; if not, end this procedure.
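The loop formed by steps 203 to 208 can be sketched as follows. The text does not state the preset condition or which recognition result feeds the next round, so this sketch assumes that iteration stops when the reference transcript no longer changes or after a fixed number of rounds, and that the latest reference transcript is reused for keyword selection. The callables `recognize` and `adapt` are hypothetical placeholders for a real recognizer and for steps 204 to 206.

```python
def recognize_with_iteration(reference, auxiliary, recognize, adapt, model, max_rounds=3):
    """Iterate adapt-then-recognize until the reference transcript stabilizes.

    recognize(audio, model) -> transcript   (placeholder for a real recognizer)
    adapt(model, transcript) -> new model   (placeholder for steps 204 to 206)
    """
    ref_result = recognize(reference, model)        # step 203
    aux_result = None
    for _ in range(max_rounds):
        model = adapt(model, ref_result)            # steps 204 to 206
        new_ref = recognize(reference, model)       # step 207
        aux_result = recognize(auxiliary, model)    # step 207
        converged = new_ref == ref_result           # step 208 (assumed condition)
        ref_result = new_ref
        if converged:
            break
    return ref_result, aux_result


# Stub recognizer and adapter, only to exercise the control flow.
def stub_recognize(audio, model):
    return f"{audio}@{model}"


def stub_adapt(model, transcript):
    return min(model + 1, 2)  # the stub "model" stops changing at version 2


result = recognize_with_iteration("ref", "aux", stub_recognize, stub_adapt, model=0, max_rounds=10)
```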
The scheme is illustrated below with a concrete example. Suppose that recognizing the customer service staff's speech yields the following result:
"to (0.9) railway station (0.9) how (0.7) walk (0.8)", "from (0.33) emperor (0.55) mansion (0.8) (0.7)".
The values in parentheses are confidences. The words with the highest confidence are selected as keywords, for example:
to (0.9), railway station (0.9), walk (0.8), mansion (0.8).
These high-confidence words are then filtered to remove words that are common in a general sense, that is, words that carry little information. For example, applying a term frequency-inverse document frequency (TF-IDF) rule removes "to" (0.9) and "walk" (0.8), so the final keywords are
"railway station", "mansion".
According to a predetermined vocabulary, several "related words" are selected for each keyword. The word-to-word relation list is computed in advance from a large amount of text data; for each word it gives a sequence of associated words ordered from closely related to loosely related. For example,
words closely related to "railway station" include "train number", "traffic" and "bus station", and words closely related to "mansion" include "floor", "office building" and "business". Suppose two related words are selected for each keyword to form the first-level related-word set:
"railway station": the related words are "train number" and "traffic".
"mansion": the related words are "floor" and "office building".
The first-level related-word set thus contains "train number", "traffic", "floor" and "office building".
Similarly, a second-level related-word set can be generated from the first-level related-word set, in which:
"train number": the related words are train and time.
"traffic": the related words are automobile and railway.
"floor": the related words are elevator and first floor.
"office building": the related words are mansion and rent.
The second-level related-word set thus contains train, time, automobile, railway, elevator, first floor, mansion and rent. Note that the second-level related-word set may contain a keyword, which would cause that word to be adjusted twice during weight adjustment. Keywords therefore need to be removed from the related-word sets at every level, and the final second-level related-word set is:
train, time, automobile, railway, elevator, first floor, rent.
Multiple levels of related-word sets can be set up as needed.
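The level-by-level expansion in this example can be sketched as a breadth-first walk over the relation list. The dictionary `related` below stands in for the predetermined vocabulary, with entries taken from the example (using "time" and "first floor" as English glosses); the function name and data layout are assumptions of this sketch.

```python
def build_related_levels(keywords, related, m):
    """Return a list of m related-word sets (as ordered lists), one per level.

    Keywords are excluded from every level and no related word appears twice.
    """
    seen = set(keywords)
    levels = []
    frontier = list(keywords)
    for _ in range(m):
        level = []
        for word in frontier:
            for candidate in related.get(word, []):
                if candidate not in seen:  # skips keywords and repeats
                    seen.add(candidate)
                    level.append(candidate)
        levels.append(level)
        frontier = level  # the next level expands the words just added
    return levels


related = {
    "railway station": ["train number", "traffic"],
    "mansion": ["floor", "office building"],
    "train number": ["train", "time"],
    "traffic": ["automobile", "railway"],
    "floor": ["elevator", "first floor"],
    "office building": ["mansion", "rent"],
}
levels = build_related_levels(["railway station", "mansion"], related, m=2)
```

Here "mansion" is dropped from the second level because it is already a keyword, which matches the final second-level set in the example.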
In the speech recognition model dictionary, the weight values of the keywords and of the related words in each level's related-word set are increased. For example, the weight of a keyword becomes 3 times its original value; the weight of a related word in the first-level related-word set becomes 2.5 times its original value; and the weight of a related word in the second-level related-word set becomes 1.5 times its original value. The weights of all other words remain unchanged. This weight adjustment is applied to the unigram part of the N-gram model.
Because increasing the weight values of some words may cause the sum of the weights of all words to exceed 1, the weights of all words need to be scaled proportionally so that they sum to 1, which keeps the language model complete and properly normalized.
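A minimal sketch of this adjustment, assuming the unigram weights are held in a plain dictionary that sums to 1 (real language-model toolkits store them differently, often as log probabilities with back-off weights, which this sketch ignores). The multiples 3, 2.5 and 1.5 come from the example; the toy vocabulary and the function name are hypothetical.

```python
def update_unigram_weights(weights, keywords, levels, keyword_mult, level_mults):
    """Boost keyword and related-word weights, then rescale so all weights sum to 1."""
    if len(level_mults) != len(levels):
        raise ValueError("need one multiple per related-word level")
    ordered = [keyword_mult] + list(level_mults)
    if any(a <= b for a, b in zip(ordered, ordered[1:])):
        raise ValueError("multiples must strictly decrease from keywords outward")

    multipliers = {}
    for depth, level in enumerate(levels):
        for word in level:
            multipliers[word] = level_mults[depth]
    for word in keywords:
        multipliers[word] = keyword_mult  # keywords always get the largest multiple

    boosted = {w: p * multipliers.get(w, 1.0) for w, p in weights.items()}
    total = sum(boosted.values())
    return {w: p / total for w, p in boosted.items()}


# Toy four-word vocabulary; "hello" stands for every unaffected word.
weights = {"railway station": 0.1, "train number": 0.1, "train": 0.1, "hello": 0.7}
updated = update_unigram_weights(
    weights,
    keywords=["railway station"],
    levels=[["train number"], ["train"]],
    keyword_mult=3.0,
    level_mults=[2.5, 1.5],
)
```

Before rescaling, the boosted weights sum to 1.4, so every weight is divided by 1.4; the unaffected word ends up at 0.5 while the relative ordering of keyword, first-level and second-level words is preserved.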
The speech of the customer and of the customer service staff is then recognized with the updated speech recognition model dictionary. Because the dictionary has been adjusted dynamically according to the conversation content, the accuracy of speech recognition can be improved.
Fig. 3 is a schematic diagram of one embodiment of the speech recognition system of the present invention. As shown in Fig. 3, the system comprises an acquiring unit 301, an evaluation unit 302, a first speech recognition unit 303, a keyword generation unit 304, a related-word generation unit 305, a weight adjustment unit 306 and a second speech recognition unit 307, wherein:
The acquiring unit 301 is configured to obtain first voice information of a first speaker and second voice information of a second speaker, respectively, from conversation voice information.
The evaluation unit 302 is configured to perform speech quality evaluation on the first voice information and the second voice information respectively, select the voice information with better voice quality as reference voice information, and take the voice information with poorer voice quality as auxiliary voice information.
The first speech recognition unit 303 is configured to perform speech recognition on the reference voice information to obtain reference recognition information.
The keyword generation unit 304 is configured to select, from the reference recognition information, the n words with the highest confidence as keywords, where n is a positive integer.
The related-word generation unit 305 is configured to generate, for each keyword, m levels of related-word sets according to a predetermined vocabulary, wherein each related word in the first-level related-word set is associated with a keyword, each related word in the level-L related-word set is associated with a related word in the level-(L-1) related-word set, m and L are positive integers, 2≤L≤m, the m levels of related-word sets contain no keyword, and no related word is repeated within them.
The weight adjustment unit 306 is configured to increase the weight values, in the speech recognition model dictionary, of the keywords and of the related words in the m levels of related-word sets, wherein each weight value is multiplied by a multiple, the weight-increase multiple of a keyword is greater than that of a related word in the m levels of related-word sets, and the weight-increase multiple of a related word in the level-(L-1) related-word set is greater than that of a related word in the level-L related-word set, and to normalize the weight values of all words in the speech recognition model dictionary to obtain an updated speech recognition model.
The second speech recognition unit 307 is configured to use the updated speech recognition model to perform speech recognition on the reference voice information and the auxiliary voice information respectively, obtaining first recognition information and second recognition information.
Based on the speech recognition system that the above embodiment of the present invention improves, by utilizing the good reference voice information of voice quality in dialogue to carry out speech recognition, obtain with reference to identifying information.With reference in identifying information, select n word that degree of confidence is the highest as keyword, n be greater than 0 positive integer; For each keyword, the set of m level related term is generated according to predetermined vocabulary, each related term wherein in the set of first order related term is associated with a keyword respectively, each related term in the set of L level related term is associated with a related term in the set of L-1 level related term respectively, m, L be greater than 0 positive integer, 2≤L≤m; Improve the weighted value of related term in speech recognition modeling dictionary in keyword and the set of m level related term, the weighted value of word whole in speech recognition modeling dictionary is normalized, obtain the speech recognition modeling dictionary upgraded; Utilize the speech recognition modeling dictionary upgraded, respectively speech recognition is carried out to reference voice information and assistant voice information, obtain the first identifying information and the second identifying information.Owing to have modified the weighted value of related term according to conversation content, thus the accuracy that raising speech recognition modeling describes current session content, improve the accuracy rate of speech recognition.
According to another specific embodiment of the present invention, acquiring unit 301 is specifically specifically in dialogic voice information, from correspond to the first teller first via signal obtain the first voice messaging, from correspond to the second teller the second road signal obtain the second voice messaging.
Fig. 4 is the schematic diagram of another embodiment of audio recognition method of the present invention.Compared with embodiment illustrated in fig. 3, in the embodiment shown in fig. 4, also comprise judging unit 401, after speech recognition modeling dictionary for upgrading in the second voice recognition unit 307 utilization carries out speech recognition to the first voice messaging and the second voice messaging respectively, judge whether to need to carry out iterative processing to reference voice information and assistant voice information, if desired iterative processing is carried out to reference voice information and assistant voice information, then indicate selection unit 304 to perform with reference to the operation selecting in identifying information n word that degree of confidence is the highest as keyword.
According to another specific embodiment of the present invention, the weight value of a keyword is greater than the weight value of any related term in the m levels of related-term sets, and the weight value of a related term in the (L-1)-th-level related-term set is greater than the weight value of a related term in the L-th-level related-term set.
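One way to picture the weight adjustment is to multiply each affected weight by a factor that shrinks with the level and then to normalize all weights. In the Python sketch below, the concrete factors (`keyword_factor`, `decay`) and the function name are assumptions chosen for illustration; the claims require only that the multiple for keywords exceeds the multiple for related terms, and that the multiple for level L-1 exceeds the multiple for level L.

```python
from typing import Dict, List


def boost_and_normalize(
    weights: Dict[str, float],
    keywords: List[str],
    levels: List[List[str]],
    keyword_factor: float = 3.0,  # assumed multiple for keywords
    decay: float = 0.5,           # assumed shrink rate per level
) -> Dict[str, float]:
    """Multiply weights of keywords and related terms, then normalize."""
    if keyword_factor <= 1.0 or not 0.0 < decay < 1.0:
        raise ValueError("need keyword_factor > 1 and 0 < decay < 1")
    factors: Dict[str, float] = {}
    for depth, level in enumerate(levels, start=1):
        # Strictly decreasing with depth, but always above 1.
        factor = 1.0 + (keyword_factor - 1.0) * decay ** depth
        for word in level:
            factors.setdefault(word, factor)
    for word in keywords:
        factors[word] = keyword_factor
    boosted = {w: v * factors.get(w, 1.0) for w, v in weights.items()}
    total = sum(boosted.values())
    if total <= 0.0:
        raise ValueError("weights must have a positive sum")
    # Normalize so that all weights in the dictionary sum to 1.
    return {w: v / total for w, v in boosted.items()}
```

With the default factors, a keyword is multiplied by 3, a first-level term by 2 and a second-level term by 1.5. If all words start from equal weights, the resulting order is keyword, then level 1, then level 2, then unrelated words, which matches the ordering described in this embodiment.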

Claims (6)

1. A speech recognition method, characterized by comprising:
Obtaining, from dialogue voice information, first voice information of a first speaker and second voice information of a second speaker respectively;
Performing speech quality evaluation on the first voice information and the second voice information respectively, selecting the voice information with the better voice quality as reference voice information, and taking the voice information with the poorer voice quality as auxiliary voice information;
Performing speech recognition on the reference voice information to obtain reference recognition information;
Selecting, from the reference recognition information, the n words with the highest confidence as keywords, where n is a positive integer greater than 0;
For each keyword, generating m levels of related-term sets according to a predetermined vocabulary, wherein each related term in the first-level related-term set is associated with a keyword, each related term in the L-th-level related-term set is associated with a related term in the (L-1)-th-level related-term set, m and L are positive integers greater than 0, 2≤L≤m, the m levels of related-term sets contain no keyword, and no related term is repeated within the m levels of related-term sets;
Increasing the weight values, in a speech recognition model dictionary, of the keywords and of the related terms in the m levels of related-term sets, wherein each weight value is multiplied by a multiple, the weight-increase multiple of a keyword is greater than the weight-increase multiple of the related terms in the m levels of related-term sets, and the weight-increase multiple of the related terms in the (L-1)-th-level related-term set is greater than the weight-increase multiple of the related terms in the L-th-level related-term set; and normalizing the weight values of all words in the speech recognition model dictionary to obtain an updated speech recognition model, wherein the weight value of a keyword is greater than the weight value of the related terms in the m levels of related-term sets, and the weight value of a related term in the (L-1)-th-level related-term set is greater than the weight value of a related term in the L-th-level related-term set;
Performing speech recognition on the reference voice information and the auxiliary voice information respectively using the updated speech recognition model, to obtain first recognition information and second recognition information.
2. The method according to claim 1, characterized in that,
After speech recognition is performed on the first voice information and the second voice information respectively using the updated speech recognition model dictionary, the method further comprises:
Judging whether iterative processing of the reference voice information and the auxiliary voice information is needed;
If iterative processing of the reference voice information and the auxiliary voice information is needed, performing the step of selecting, from the reference recognition information, the n words with the highest confidence as keywords.
3. method according to claim 1 and 2, is characterized in that,
Second voice messaging of described the first voice messaging and the second teller that obtain the first teller from corresponding voice messaging respectively comprises:
In dialogic voice information, from correspond to the first teller first via signal obtain the first voice messaging, from correspond to the second teller the second road signal obtain the second voice messaging.
4. A speech recognition system, characterized by comprising:
An acquiring unit, configured to obtain, from dialogue voice information, first voice information of a first speaker and second voice information of a second speaker respectively;
An assessment unit, configured to perform speech quality evaluation on the first voice information and the second voice information respectively, select the voice information with the better voice quality as reference voice information, and take the voice information with the poorer voice quality as auxiliary voice information;
A first speech recognition unit, configured to perform speech recognition on the reference voice information to obtain reference recognition information;
A keyword generation unit, configured to select, from the reference recognition information, the n words with the highest confidence as keywords, where n is a positive integer greater than 0;
A related-term generation unit, configured to generate, for each keyword, m levels of related-term sets according to a predetermined vocabulary, wherein each related term in the first-level related-term set is associated with a keyword, each related term in the L-th-level related-term set is associated with a related term in the (L-1)-th-level related-term set, m and L are positive integers greater than 0, 2≤L≤m, the m levels of related-term sets contain no keyword, and no related term is repeated within the m levels of related-term sets;
A weight adjustment unit, configured to increase the weight values, in a speech recognition model dictionary, of the keywords and of the related terms in the m levels of related-term sets, wherein each weight value is multiplied by a multiple, the weight-increase multiple of a keyword is greater than the weight-increase multiple of the related terms in the m levels of related-term sets, and the weight-increase multiple of the related terms in the (L-1)-th-level related-term set is greater than the weight-increase multiple of the related terms in the L-th-level related-term set; and to normalize the weight values of all words in the speech recognition model dictionary to obtain an updated speech recognition model, wherein the weight value of a keyword is greater than the weight value of the related terms in the m levels of related-term sets, and the weight value of a related term in the (L-1)-th-level related-term set is greater than the weight value of a related term in the L-th-level related-term set;
A second speech recognition unit, configured to perform speech recognition on the reference voice information and the auxiliary voice information respectively using the updated speech recognition model, to obtain first recognition information and second recognition information.
5. The system according to claim 4, characterized by further comprising:
A judging unit, configured to judge, after the second speech recognition unit has performed speech recognition on the first voice information and the second voice information respectively using the updated speech recognition model dictionary, whether iterative processing of the reference voice information and the auxiliary voice information is needed, and, if iterative processing is needed, to instruct the selection unit to perform the operation of selecting, from the reference recognition information, the n words with the highest confidence as keywords.
6. The system according to claim 4 or 5, characterized in that,
The acquiring unit is specifically configured to obtain, in the dialogue voice information, the first voice information from a first channel signal corresponding to the first speaker, and the second voice information from a second channel signal corresponding to the second speaker.
CN201110440273.4A 2011-12-26 2011-12-26 Audio recognition method and system Active CN103177721B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201110440273.4A CN103177721B (en) 2011-12-26 2011-12-26 Audio recognition method and system

Publications (2)

Publication Number Publication Date
CN103177721A CN103177721A (en) 2013-06-26
CN103177721B true CN103177721B (en) 2015-08-19

Family

ID=48637528

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201110440273.4A Active CN103177721B (en) 2011-12-26 2011-12-26 Audio recognition method and system

Country Status (1)

Country Link
CN (1) CN103177721B (en)

Families Citing this family (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
DE102014109122A1 (en) * 2013-07-12 2015-01-15 Gm Global Technology Operations, Llc Systems and methods for result-based arbitration in speech dialogue systems
US9715878B2 (en) 2013-07-12 2017-07-25 GM Global Technology Operations LLC Systems and methods for result arbitration in spoken dialog systems
CN103700369B (en) * 2013-11-26 2016-08-31 科大讯飞股份有限公司 Phonetic navigation method and system
TWI506458B (en) 2013-12-24 2015-11-01 Ind Tech Res Inst Apparatus and method for generating recognition network
CN103700368B (en) * 2014-01-13 2017-01-18 联想(北京)有限公司 Speech recognition method, speech recognition device and electronic equipment
DE102015205044A1 (en) * 2015-03-20 2016-09-22 Bayerische Motoren Werke Aktiengesellschaft Enter navigation target data in a navigation system
CN106971741B (en) * 2016-01-14 2020-12-01 芋头科技(杭州)有限公司 Method and system for voice noise reduction for separating voice in real time
US10217458B2 (en) * 2016-09-23 2019-02-26 Intel Corporation Technologies for improved keyword spotting
CN107742517A (en) * 2017-10-10 2018-02-27 广东中星电子有限公司 A kind of detection method and device to abnormal sound
CN110444193B (en) * 2018-01-31 2021-12-14 腾讯科技(深圳)有限公司 Method and device for recognizing voice keywords
JP6790003B2 (en) * 2018-02-05 2020-11-25 株式会社東芝 Editing support device, editing support method and program
CN110837758B (en) * 2018-08-17 2023-06-02 杭州海康威视数字技术股份有限公司 Keyword input method and device and electronic equipment
CN111147673A (en) * 2019-12-20 2020-05-12 北京淇瑀信息科技有限公司 Method, device and system for cooperatively judging line state by operator signaling and voice

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101324806A (en) * 2007-06-14 2008-12-17 台达电子工业股份有限公司 Input system and method for mobile search
CN101329868A (en) * 2008-07-31 2008-12-24 林超 Speech recognition optimizing system aiming at locale language use preference and method thereof
CN101609672A (en) * 2009-07-21 2009-12-23 北京邮电大学 A kind of speech recognition semantic confidence feature extracting methods and device

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8301449B2 (en) * 2006-10-16 2012-10-30 Microsoft Corporation Minimum classification error training with growth transformation optimization
TWI311311B (en) * 2006-11-16 2009-06-21 Inst Information Industr Speech recognition device, method, application program, and computer readable medium for adjusting speech models with selected speech data

Also Published As

Publication number Publication date
CN103177721A (en) 2013-06-26

Similar Documents

Publication Publication Date Title
CN103177721B (en) Audio recognition method and system
US9552815B2 (en) Speech understanding method and system
CN101548313B (en) Voice activity detection system and method
US9117450B2 (en) Combining re-speaking, partial agent transcription and ASR for improved accuracy / human guided ASR
US8639508B2 (en) User-specific confidence thresholds for speech recognition
US20200020320A1 (en) Dialect phoneme adaptive training system and method
CN101118745B (en) Confidence degree quick acquiring method in speech identification system
KR20180087942A (en) Method and apparatus for speech recognition
CN109036412A (en) voice awakening method and system
US11488587B2 (en) Regional features based speech recognition method and system
GB2489489A (en) An integrated auto-diarization system which identifies a plurality of speakers in audio data and decodes the speech to create a transcript
JP2006079079A (en) Distributed speech recognition system and its method
US8688447B1 (en) Method and system for domain-specific noisy channel natural language processing (NLP)
US20200013391A1 (en) Acoustic information based language modeling system and method
CN110299150A (en) A kind of real-time voice speaker separation method and system
CN112581938A (en) Voice breakpoint detection method, device and equipment based on artificial intelligence
CN107886940B (en) Voice translation processing method and device
Graciarena et al. The SRI System for the NIST OpenSAD 2015 Speech Activity Detection Evaluation.
CN110809796B (en) Speech recognition system and method with decoupled wake phrases
US7340398B2 (en) Selective sampling for sound signal classification
Tsiakoulis et al. Statistical methods for building robust spoken dialogue systems in an automobile
CN114328867A (en) Intelligent interruption method and device in man-machine conversation
Kwon et al. A method for on-line speaker indexing using generic reference models.
Gada et al. Confidence measures for detecting speech recognition errors
Barakat et al. An improved template-based approach to keyword spotting applied to the spoken content of user generated video blogs

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant