CN101217035A - A vocabulary database construction method and the corresponding hunting and comparison method for voice identification system - Google Patents
- Publication number
- CN101217035A
- Authority
- CN
- China
- Prior art keywords
- vocabulary
- data base
- acoustic model
- polyphone
- identification system
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Abstract
The invention relates to a method of building a vocabulary database for a speech recognition system and a corresponding search-and-comparison method, which avoid computing the same characters repeatedly and so reduce the overall amount of computation. The building method comprises the following steps: 1) providing polyphonic character data; 2) inputting a vocabulary entry; 3) building acoustic models; 4) storing the vocabulary entry and its corresponding acoustic models in the vocabulary database. The vocabulary database of the invention supports recognition of polyphonic characters, which makes the speech recognition system more user-friendly and closer to the pronunciation habits of ordinary users, so that users can speak with their habitual pronunciation and still receive the correct recognition result.
Description
Technical field
The present invention relates to a method of building a vocabulary database for a speech recognition system and a search-and-comparison method, in particular to a vocabulary database building method that supports polyphonic characters and a more efficient search-and-comparison method.
Background technology
Known speech recognition systems do not handle polyphonic characters (characters with more than one pronunciation), so during speech input the user must use one particular pronunciation of a polyphonic character for recognition to succeed. For example, the character 行 in the personal name 陈立行 (Chen Lihang) must be pronounced háng (ㄏㄤˊ) to be recognized; if the user pronounces it xíng (ㄒㄧㄥˊ), it is not recognized correctly. Likewise, the character 乐 in 爱乐乐团 (a philharmonic orchestra) must be pronounced lè (ㄌㄜˋ) to be recognized; if it is pronounced yuè (ㄩㄝˋ), it is not recognized correctly either. Such an input mode differs greatly from the pronunciation habits of ordinary users. In addition, during recognition a speech recognition system usually uses the Viterbi algorithm to compute, for each character of each vocabulary entry, the probability of its corresponding acoustic models, and this computation is where the system spends the most effort. Frequently recomputing the same characters therefore adds unnecessary computation and slows recognition, which led us to consider how to avoid recomputing identical characters and thus reduce the overall amount of computation.
Summary of the invention
The purpose of the invention is to provide a method of building a vocabulary database for a speech recognition system and a search-and-comparison method, in particular a vocabulary database building method that supports polyphonic characters and a more efficient search-and-comparison method, thereby solving the technical problem of avoiding repeated computation of identical characters in order to reduce the overall amount of computation.
The technical solution of the invention is a method of building a vocabulary database for a speech recognition system, characterized in that the method comprises the following steps:
1) Provide polyphonic character data: the polyphonic character data comprise a plurality of polyphonic characters and their pronunciations;
2) Input a vocabulary entry;
3) Build acoustic models: compare the vocabulary entry with the polyphonic character data to judge whether the entry contains at least one polyphonic character; if so, build a plurality of corresponding acoustic models, one for each of the pronunciations of the polyphonic character contained in the entry; if not, build a single corresponding acoustic model for the entry;
4) Store the vocabulary entry and its corresponding acoustic models in the vocabulary database.
A search-and-comparison method using the above vocabulary database of a speech recognition system, characterized in that the method comprises the following steps:
1) Provide a vocabulary database: the database comprises a plurality of vocabulary entries, in which entries with the same prefix are sorted so as to be adjacent, and the entries correspond one-to-one to a plurality of acoustic models;
2) Input a speech signal;
3) Obtain the feature parameters of the speech signal, the feature parameters being Mel-frequency cepstral coefficients;
4) Compare the feature parameters obtained in step 3) with the acoustic models of the vocabulary entries one by one: each acoustic model produces a probability value for the feature parameters, and each entry inherits the probability values already produced for the identically pronounced characters in the preceding adjacent entry;
5) Recognize the speech signal from the probability values of the vocabulary entries.
The acoustic models are hidden Markov models (HMMs).
The probability values are computed with the Viterbi algorithm.
With the vocabulary database building method and search-and-comparison method of the invention, a vocabulary database that supports polyphonic characters can be built. Each required vocabulary entry is compared with the polyphonic character data to judge whether it contains at least one polyphonic character, and one or more corresponding acoustic models are then built for the one or more pronunciations of the polyphonic characters it contains. The resulting vocabulary database can therefore recognize polyphonic characters, which makes the speech recognition system more user-friendly and closer to the pronunciation habits of ordinary users, so that users can speak with their habitual pronunciation and obtain the correct recognition result. Furthermore, because the feature parameters of the input speech signal are compared one by one with the acoustic models of the vocabulary entries and recognition is based on the resulting acoustic-model probability values, repeated computation of identical characters is avoided and the overall amount of computation is reduced.
Description of drawings
Fig. 1 is a flowchart of the vocabulary database building method of the speech recognition system of the invention;
Fig. 2 is a flowchart of a specific embodiment of the vocabulary database building method of the speech recognition system of the invention;
Fig. 3 is a flowchart of the vocabulary database search-and-comparison method of the speech recognition system of the invention;
Fig. 4 is a flowchart of a specific embodiment of the vocabulary database search-and-comparison method of the speech recognition system of the invention.
Embodiment
The speech recognition system of the invention mainly uses the hidden Markov model (HMM) method for recognition. An HMM describes pronunciation with a probabilistic model: the production of a short segment of speech is regarded as a sequence of state transitions in a Markov model. The speech feature parameters used in recognition are Mel-frequency cepstral coefficients (MFCC). Besides reflecting the different sensitivity of the human ear to different frequencies, MFCCs separate the vocal-tract model from the excitation signal, so recognition is not affected by the speaker's volume or by the five tones of Mandarin Chinese (the first, second, third and fourth tones and the neutral tone).
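MFCCs are computed from a bank of triangular filters spaced evenly on the mel scale, which approximates the ear's frequency resolution mentioned above. A minimal sketch of that frequency warping, assuming the widely used mel formula 2595 * log10(1 + f/700) (the patent does not specify which variant it uses); the function names are hypothetical:

```python
import math


def hz_to_mel(f_hz):
    """Convert a frequency in Hz to mels (2595 * log10(1 + f / 700) form)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_center_frequencies(n_filters, f_min, f_max):
    """Center frequencies (Hz) of n_filters filters spaced evenly in mels.

    The filters' edges sit at f_min and f_max, so n_filters + 1 equal
    mel steps separate them.
    """
    lo, hi = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (hi - lo) / (n_filters + 1)
    return [mel_to_hz(lo + step * (i + 1)) for i in range(n_filters)]


if __name__ == "__main__":
    centers = mel_center_frequencies(20, 0.0, 8000.0)
    print([round(c) for c in centers])
```

Equal steps in mels become progressively wider steps in Hz, so the filterbank resolves low frequencies more finely than high ones.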
Based on the above characteristics, we select, from 245 Chinese polyphonic characters, those suited to the recognition system of the invention. Because the feature parameters used during recognition are MFCCs, polyphonic characters whose pronunciations differ only in tone are not included in the set to be processed. For example, the polyphonic character 少 ("lacking") has two pronunciations, shǎo (ㄕㄠˇ) and shào (ㄕㄠˋ), which differ only in tone, so it is discarded. The remaining characters form our polyphonic character data, which roughly include: row, young, happy, and, weigh, says, does, grow, greatly, once, Shen, emit, do not have, school, from, all, fall, towards, pass, single, walk back and forth, call together, just, fall, contain, strong, accent, join, stick, province, fill in, poor, cover, be close to, as, bullet, screen, luxuriant, more, sudden and violent, ripe, mould, give, approach, accuse, frighten, hide, go back, Zhai, know, ride, be, feel, reveal, belong to, stir, and so on.
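The selection rule above can be expressed over numbered-pinyin readings: strip the tone digit and keep a character only if at least two distinct toneless readings remain. A minimal sketch with a hypothetical three-character candidate table (the patent's 245-character source list is not given); `select_polyphones` is an illustrative name:

```python
# Hypothetical candidate readings in numbered pinyin (tone digit at the end).
CANDIDATES = {
    "少": ["shao3", "shao4"],  # differs only in tone -> discarded
    "行": ["xing2", "hang2"],  # differs in initial and final -> kept
    "乐": ["le4", "yue4"],     # differs in initial and final -> kept
}


def strip_tone(syllable):
    """Remove a trailing tone digit (0-5) from a numbered-pinyin syllable."""
    return syllable.rstrip("012345")


def select_polyphones(candidates):
    """Keep characters whose readings still differ once tones are ignored.

    MFCC features are insensitive to tone, so readings that differ only in
    tone are merged; the kept readings are returned toneless and sorted.
    """
    selected = {}
    for char, char_readings in candidates.items():
        toneless = {strip_tone(r) for r in char_readings}
        if len(toneless) > 1:
            selected[char] = sorted(toneless)
    return selected


if __name__ == "__main__":
    print(select_polyphones(CANDIDATES))
```

Merging tone-only variants also shrinks the variant lists of the characters that are kept, since the recognizer could not tell those variants apart anyway.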
Referring to Fig. 1, the vocabulary database building method of the speech recognition system of the invention comprises the following steps:
Step S11: provide the polyphonic character data;
Step S12: input a vocabulary entry;
Step S13: compare the entry with the polyphonic character data to judge whether it contains at least one polyphonic character; if so, build a plurality of corresponding acoustic models, one for each pronunciation of the polyphonic character contained in the entry; if not, build a single corresponding acoustic model for the entry;
Step S14: store the entry and its acoustic models in the vocabulary database.
Here, the polyphonic character data comprise a plurality of polyphonic characters and their pronunciations, and the acoustic models are hidden Markov models.
Referring to Fig. 2, taking singer names as a specific embodiment, the method of the invention for building a vocabulary database of singer names comprises the following steps:
Step S21: read in a singer name;
Step S22: compare the input singer name with the polyphonic character data to judge whether the name contains at least one polyphonic character; if so, go to step S23; if not, go to step S24;
Step S23: add a group of name variants in which the polyphonic character takes its other pronunciations;
Step S24: convert the characters of the name into their respective hidden Markov model representations;
Step S25: judge whether the last singer name has been read; if so, go to step S26; if not, return to step S21;
Step S26: finish initialization and enter the recognition flow.
A vocabulary database built by the invention supports polyphonic character recognition, allowing users to speak with their habitual pronunciation and obtain the correct recognition result.
In addition, in speech recognition each Chinese character can be decomposed into its initial and its final: the initial appears at the front of the syllable and the final at the end. Each Chinese character can therefore be represented by two acoustic models, one for its initial and one for its final, and speech recognition makes its decision by computing the probability values of these initial and final acoustic models. If the vocabulary entries in the database are sorted so that entries with the same prefix are grouped together, and the probability values of the previous entry's characters are recorded, then only the characters of the current entry that are not homophonous with the previous entry need to be computed. The probability values of homophonous characters need not be recomputed, which saves computation during search and comparison.
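A sketch of the initial/final decomposition, assuming toneless pinyin input and the 21 standard Mandarin initials; zero-initial syllables such as "an" yield only a final model, and finals that are written alike but sound different (for example the "i" in "li" and in "zhi") are not distinguished in this simplification. The function names are hypothetical. This gives "Chen Lihang" the six models ch, en, l, i, h, ang, matching the six acoustic models in the Fig. 4 example.

```python
# The 21 Mandarin initials; two-letter ones first so "zh" wins over "z".
INITIALS = ["zh", "ch", "sh", "b", "p", "m", "f", "d", "t", "n", "l",
            "g", "k", "h", "j", "q", "x", "r", "z", "c", "s"]


def split_syllable(syllable):
    """Split a toneless pinyin syllable into (initial, final).

    The initial is "" for zero-initial syllables such as "an".
    """
    for initial in INITIALS:
        if syllable.startswith(initial):
            return initial, syllable[len(initial):]
    return "", syllable


def to_models(syllables):
    """Map a name's syllables to its sequence of initial/final model labels."""
    models = []
    for syllable in syllables:
        initial, final = split_syllable(syllable)
        if initial:
            models.append(initial)
        models.append(final)
    return models


if __name__ == "__main__":
    print(to_models(["chen", "li", "hang"]))
```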
Referring to Fig. 3, the search-and-comparison method for the vocabulary database of the speech recognition system of the invention comprises the following steps:
Step S31: provide a vocabulary database: the database comprises a plurality of vocabulary entries, sorted so that entries with the same prefix are adjacent, and the entries correspond one-to-one to a plurality of acoustic models;
Step S32: input a speech signal;
Step S33: obtain the feature parameters of the speech signal, the feature parameters being Mel-frequency cepstral coefficients (MFCC);
Step S34: compare the feature parameters with the acoustic models of the vocabulary entries one by one: each acoustic model produces a probability value for the feature parameters, and each entry inherits the probability values already produced for the identically pronounced characters in the preceding adjacent entry (the entries are sorted so that entries with the same prefix are grouped together and the probability values of the previous entry's characters are recorded, so only the characters of the current entry that are not homophonous with the previous entry are computed and the probabilities of homophonous characters are not recomputed);
Step S35: recognize the speech signal from the probability values of the vocabulary entries.
The acoustic models are hidden Markov models, and the probability values are computed with the Viterbi algorithm.
Take a vocabulary database of singer names as an example. If there are 692 singer names in total, containing 2233 characters, then when computing probabilities with the Viterbi algorithm each utterance must be compared against 4466 acoustic models of the system (two per character: the initial and the final), and part of this work is repeated computation. The invention therefore sorts the singer names so that singers with the same surname are adjacent, and records the probabilities of the previous name's characters; when the current singer name is computed, only the probabilities of the characters that are not homophonous with the previous name need to be computed.
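The saving from sorting can be counted directly; the 4466 figure is 2 × 2233, two models per character. A minimal sketch, assuming every character has exactly two models and that each name reuses only the leading characters it shares with the immediately preceding name after lexicographic sorting; `count_evaluations` and `shared_prefix` are hypothetical helpers:

```python
MODELS_PER_CHAR = 2  # one initial model and one final model per character


def shared_prefix(a, b):
    """Number of leading positions where sequences a and b agree."""
    n = 0
    while n < min(len(a), len(b)) and a[n] == b[n]:
        n += 1
    return n


def count_evaluations(names):
    """Model evaluations needed (without reuse, with reuse).

    names: iterable of tuples of character pronunciations,
    e.g. ("chen", "li", "hang").
    """
    names = [tuple(n) for n in names]
    without_reuse = MODELS_PER_CHAR * sum(len(n) for n in names)
    with_reuse, previous = 0, ()
    for name in sorted(names):  # same-surname (same-prefix) names become adjacent
        new_chars = len(name) - shared_prefix(name, previous)
        with_reuse += MODELS_PER_CHAR * new_chars
        previous = name
    return without_reuse, with_reuse


if __name__ == "__main__":
    names = [("chen", "li", "hang"), ("wang", "fei"), ("chen", "li", "hong")]
    print(count_evaluations(names))
```

In a lexicographically sorted list the adjacent predecessor always shares the longest prefix with the current name among all earlier names, so comparing only with the previous entry loses no reuse. For the 692-name database the count without reuse is 2 × 2233 = 4466; the count with reuse depends on how many names share leading characters, which the patent does not report.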
Referring to Fig. 4, the steps of a preferred embodiment of the vocabulary database search-and-comparison method of the speech recognition system of the invention are as follows:
Step S41: input the MFCCs of the input speech (the feature parameters obtained from the speech signal);
Step S42: read in a singer name model;
Step S43: judge whether the pronunciation of the current singer name repeats that of the previous singer name; if so, go to step S44; if not, go to step S45;
Step S44: use the probabilities recorded for the previous name in place of the identically pronounced characters, then continue with the next step for the differently pronounced characters;
Step S45: compute the probabilities with the Viterbi algorithm;
Step S46: store the probability of each character of the current singer name;
Step S47: judge whether probabilities have been computed for all singer names; if so, go to step S48; if not, repeat from step S42;
Step S48: output the five singer names with the highest probabilities.
Take the singer name "Chen Lihang" as an example. It is adjacent to the singer "Chen Lihong", and the first two characters of the two names have the same pronunciation. In the Viterbi computation, the MFCCs of the input speech are first scored against the 6 acoustic models representing "Chen Lihang", and the probability value of each character is stored. When the input speech is next scored against "Chen Lihong", the previously computed probabilities of the two characters "Chen Li" are simply reused, and adding the probability values computed from the 2 acoustic models of "hong" yields the complete probability of "Chen Lihong".
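The reuse in step S44 can be made exact at the level of the Viterbi trellis: in a left-to-right HMM, a state's Viterbi scores over all frames depend only on the states before it, so the columns already computed for a shared prefix can be copied from the previous name. A minimal sketch, assuming one emitting state per initial/final model, fixed self-loop and forward transition probabilities, and precomputed per-frame emission log-probabilities (hypothetical simplifications; real HMMs have several states per model and Gaussian-mixture emissions). It reuses at the granularity of individual models, so "Chen Lihong" also inherits the shared initial h from "Chen Lihang", slightly finer than the character-level reuse described above. `extend_column` and `rank_names` are illustrative names.

```python
import math

NEG_INF = float("-inf")


def extend_column(emissions, label, prev_col, log_self, log_fwd):
    """Viterbi log-scores over all frames for one more left-to-right state.

    emissions[t][label] is log p(frame t | model label); prev_col is the
    column of the preceding state, or None for the first state.
    """
    col = [NEG_INF] * len(emissions)
    for t in range(len(emissions)):
        if t == 0:
            best = 0.0 if prev_col is None else NEG_INF  # paths start in state 0
        elif prev_col is None:
            best = col[t - 1] + log_self
        else:
            best = max(col[t - 1] + log_self, prev_col[t - 1] + log_fwd)
        col[t] = best + emissions[t][label]
    return col


def rank_names(emissions, names, top_n=5,
               log_self=math.log(0.5), log_fwd=math.log(0.5)):
    """Score each name, reusing the columns shared with the previous name.

    names: dict name -> sequence of model labels.
    Returns (top_n names by score, number of columns actually computed).
    """
    ordered = sorted(names.items(), key=lambda item: tuple(item[1]))
    prev_models, prev_cols = (), []
    scores, evaluations = {}, 0
    for name, models in ordered:
        models = tuple(models)
        shared = 0
        while (shared < min(len(models), len(prev_models))
               and models[shared] == prev_models[shared]):
            shared += 1
        cols = prev_cols[:shared]  # inherited from the previous (adjacent) name
        for label in models[shared:]:
            prev = cols[-1] if cols else None
            cols.append(extend_column(emissions, label, prev, log_self, log_fwd))
            evaluations += 1
        scores[name] = cols[-1][-1]  # end in the last state at the last frame
        prev_models, prev_cols = models, cols
    ranking = sorted(scores, key=scores.get, reverse=True)[:top_n]
    return ranking, evaluations
```

With the names sorted, "Chen Lihang" costs six column computations and "Chen Lihong" only one, while the final scores are identical to computing every name from scratch.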
Claims (5)
1. A method of building a vocabulary database for a speech recognition system, characterized in that the method comprises the following steps:
1) Provide polyphonic character data: the polyphonic character data comprise a plurality of polyphonic characters and their pronunciations;
2) Input a vocabulary entry;
3) Build acoustic models: compare the vocabulary entry with the polyphonic character data to judge whether the entry contains at least one polyphonic character; if so, build a plurality of corresponding acoustic models, one for each of the pronunciations of the polyphonic character contained in the entry; if not, build a single corresponding acoustic model for the entry;
4) Store the vocabulary entry and its corresponding acoustic models in the vocabulary database.
2. The method of building a vocabulary database for a speech recognition system according to claim 1, characterized in that the acoustic models are hidden Markov models.
3. A search-and-comparison method using the vocabulary database of the speech recognition system according to claim 1, characterized in that it comprises the following steps:
1) Provide a vocabulary database: the database comprises a plurality of vocabulary entries, in which entries with the same prefix are sorted so as to be adjacent, and the entries correspond one-to-one to a plurality of acoustic models;
2) Input a speech signal;
3) Obtain the feature parameters of the speech signal, the feature parameters being Mel-frequency cepstral coefficients;
4) Compare the feature parameters obtained in step 3) with the acoustic models of the vocabulary entries one by one: each acoustic model produces a probability value for the feature parameters, and each entry inherits the probability values already produced for the identically pronounced characters in the preceding adjacent entry;
5) Recognize the speech signal from the probability values of the vocabulary entries.
4. The vocabulary database search-and-comparison method of a speech recognition system according to claim 3, characterized in that the acoustic models are hidden Markov models.
5. The vocabulary database search-and-comparison method of a speech recognition system according to claim 3, characterized in that the probability values are computed with the Viterbi algorithm.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CNA2007101857093A CN101217035A (en) | 2007-12-29 | 2007-12-29 | A vocabulary database construction method and the corresponding hunting and comparison method for voice identification system |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CNA2007101857093A CN101217035A (en) | 2007-12-29 | 2007-12-29 | A vocabulary database construction method and the corresponding hunting and comparison method for voice identification system |
Publications (1)
Publication Number | Publication Date |
---|---|
CN101217035A true CN101217035A (en) | 2008-07-09 |
Family
ID=39623457
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CNA2007101857093A Pending CN101217035A (en) | 2007-12-29 | 2007-12-29 | A vocabulary database construction method and the corresponding hunting and comparison method for voice identification system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN101217035A (en) |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102135812A (en) * | 2010-11-30 | 2011-07-27 | 华为终端有限公司 | method and device for inputting polyphonic Chinese characters |
CN103365925A (en) * | 2012-04-09 | 2013-10-23 | 高德软件有限公司 | Method for acquiring polyphone spelling, method for retrieving based on spelling, and corresponding devices |
CN103514236A (en) * | 2012-06-30 | 2014-01-15 | 重庆新媒农信科技有限公司 | Retrieval condition error correction prompt processing method based on Pinyin in retrieval application |
CN103514236B (en) * | 2012-06-30 | 2017-06-09 | 重庆新媒农信科技有限公司 | Search condition error correcting prompt processing method based on phonetic in retrieval application |
CN103578467A (en) * | 2013-10-18 | 2014-02-12 | 威盛电子股份有限公司 | Acoustic model building method, voice recognition method and electronic device |
WO2016101577A1 (en) * | 2014-12-24 | 2016-06-30 | 中兴通讯股份有限公司 | Voice recognition method, client and terminal device |
CN106128457A (en) * | 2016-08-29 | 2016-11-16 | 昆山邦泰汽车零部件制造有限公司 | A kind of control method talking with robot |
Legal Events
Code | Title | Description |
---|---|---|
C06 | Publication | |
PB01 | Publication | |
C10 | Entry into substantive examination | |
SE01 | Entry into force of request for substantive examination | |
C12 | Rejection of a patent application after its publication | |
RJ01 | Rejection of invention patent application after publication | Open date: 20080709 |