CN101217035A - A vocabulary database construction method and the corresponding hunting and comparison method for voice identification system - Google Patents


Info

Publication number
CN101217035A
Authority
CN
China
Prior art keywords
vocabulary
data base
acoustic model
polyphone
identification system
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CNA2007101857093A
Other languages
Chinese (zh)
Inventor
廖崇伯
陈淮琰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Inventec Besta Xian Co Ltd
Original Assignee
Inventec Besta Xian Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Inventec Besta Xian Co Ltd
Priority to CNA2007101857093A
Publication of CN101217035A

Abstract

The invention relates to a method for building a vocabulary database for a speech recognition system, together with a search-and-comparison method for that database, which avoids repeatedly computing identical characters and thereby reduces the overall amount of computation. The construction method comprises the following steps: 1) providing polyphonic-character data; 2) inputting vocabulary; 3) building acoustic models; 4) storing the vocabulary and its corresponding acoustic models in the vocabulary database. A vocabulary database built in this way can recognize polyphonic characters, which makes the speech recognition system more user-friendly and closer to the pronunciation habits of ordinary users: users can speak with their customary pronunciation and still obtain the correct recognition result.

Description

Method for building a vocabulary database of a speech recognition system, and search-and-comparison method thereof
Technical field
The present invention relates to a method for building a vocabulary database of a speech recognition system and to a search-and-comparison method for that database, and in particular to a construction method that supports polyphonic characters and a search-and-comparison method that is more efficient.
Background art
Known speech recognition systems do not handle polyphonic characters (characters with more than one reading). When giving voice input, the user must therefore pronounce such a character with the one reading the system stores, or recognition fails. For example, the character 行 in a personal name is recognized only when it is read as "ㄏ[Zhuyin symbol image not reproduced]ˊ"; if the user reads it as "[Zhuyin symbol images not reproduced]ˊ", it is not recognized correctly. Likewise, the character 乐 in the word for "philharmonic society" must be read as "[Zhuyin symbol images not reproduced]ˋ" to be recognized; if it is read as "ㄩ[Zhuyin symbol image not reproduced]ˋ", it is also not recognized correctly. This style of voice input differs greatly from the pronunciation habits of ordinary users.
In addition, when performing recognition, a speech recognition system normally uses the Viterbi algorithm to compute, for every character of every vocabulary entry, the probability value of the corresponding acoustic model. This computation is where the system spends most of its processing effort. If identical characters are frequently recomputed, the system carries an unnecessary computational load and recognition slows down. This motivates finding a way to avoid recomputing identical characters and so reduce the overall amount of computation.
Summary of the invention
The purpose of the present invention is to provide a method for building a vocabulary database of a speech recognition system and a search-and-comparison method for that database, in particular a construction method that supports polyphonic characters and a more efficient search-and-comparison method, so as to solve the technical problem of avoiding the recomputation of identical characters and reducing the overall amount of computation.
The technical solution of the present invention is a method for building a vocabulary database of a speech recognition system, characterized in that the method comprises the following steps:
1) providing polyphonic-character data: the polyphonic-character data comprise a plurality of polyphonic characters and their pronunciations;
2) inputting a vocabulary entry;
3) building acoustic models: the vocabulary entry is compared with the polyphonic-character data to determine whether it contains at least one polyphonic character; if so, a separate acoustic model is built for each of the pronunciations of the polyphonic character it contains; if not, a single corresponding acoustic model is built for the entry;
4) storing the vocabulary entry and its corresponding acoustic model(s) in the vocabulary database.
A search-and-comparison method using the vocabulary database of the above speech recognition system, characterized in that the method comprises the following steps:
1) providing a vocabulary database: the vocabulary database comprises a plurality of vocabulary entries, entries with the same prefix are sorted so that they are adjacent, and the entries correspond one-to-one to a plurality of acoustic models;
2) inputting a speech signal;
3) obtaining a feature parameter of the speech signal, the feature parameter being Mel-frequency cepstral coefficients;
4) comparing the feature parameter obtained in step 3) with the acoustic models of the vocabulary entries one by one: each acoustic model produces a probability value for the feature parameter, and each entry inherits the probability values already produced for the characters it shares, with the same pronunciation, with the preceding adjacent entry;
5) recognizing the speech signal from the probability values of the vocabulary entries.
The above acoustic model is a hidden Markov model.
The above probability values are computed using the Viterbi algorithm.
The construction method and the search-and-comparison method of the present invention make it possible to build a vocabulary database that supports polyphonic characters. Each required vocabulary entry is compared with the polyphonic-character data to determine whether it contains at least one polyphonic character, and one or more acoustic models are built according to the one or more pronunciations of the polyphonic character it contains. The resulting vocabulary database can recognize polyphonic characters, which makes the speech recognition system more user-friendly and closer to the pronunciation habits of ordinary users, who can speak with their customary pronunciation and still obtain the correct recognition result. During search, the feature parameter of the input speech signal is compared one by one with the acoustic models of the vocabulary entries and the speech signal is recognized from the acoustic-model probability values; because identical characters are not recomputed, the overall amount of computation is reduced.
Brief description of the drawings
Fig. 1 is a flowchart of the vocabulary database construction method of the speech recognition system of the present invention;
Fig. 2 is a flowchart of a specific embodiment of the vocabulary database construction method of the speech recognition system of the present invention;
Fig. 3 is a flowchart of the vocabulary database search-and-comparison method of the speech recognition system of the present invention;
Fig. 4 is a flowchart of a specific embodiment of the vocabulary database search-and-comparison method of the speech recognition system of the present invention.
Detailed description of the embodiments
The speech recognition system of the present invention performs recognition mainly with the hidden Markov model (HMM) method, which describes pronunciation with a probabilistic model and treats the production of a short segment of speech as a sequence of state transitions in a Markov model. The speech feature parameter used during recognition is the Mel-frequency cepstral coefficients (MFCC). Besides taking into account the sensitivity of the human ear to different frequencies, MFCCs separate the vocal-tract model from the excitation signal, so that recognition is not affected by the speaker's volume or by the five tones of Chinese speech (the first, second, third and fourth tones and the neutral tone).
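The perceptual frequency warping behind MFCCs can be illustrated with the mel scale. The sketch below uses one common convention for the conversion, mel = 2595·log10(1 + f/700); the patent does not say which variant or which filterbank settings its system uses, so the function names and constants here are assumptions for illustration only.

```python
import math

def hz_to_mel(frequency_hz: float) -> float:
    """Convert a frequency in Hz to mels (one common convention, assumed here)."""
    return 2595.0 * math.log10(1.0 + frequency_hz / 700.0)

def mel_to_hz(mel: float) -> float:
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

# Equal steps in mels cover progressively wider bands in Hz, mirroring the
# ear's coarser resolution at high frequencies.
band_edges_hz = [mel_to_hz(m) for m in (0.0, 500.0, 1000.0, 1500.0, 2000.0)]
band_widths_hz = [hi - lo for lo, hi in zip(band_edges_hz, band_edges_hz[1:])]
```

A full MFCC front end would additionally frame the signal, apply a mel filterbank to the power spectrum, take logarithms and apply a discrete cosine transform; those stages are omitted here.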
Based on the above characteristics, the polyphonic characters suitable for the recognition system of the present invention are selected from 245 Chinese polyphonic characters. Because the feature parameter used during recognition is the MFCC, polyphonic characters whose pronunciations differ only in tone are not included in the polyphonic characters to be processed. For example, the polyphonic character glossed "lacking" has two pronunciations, one in the third tone (ˇ) and one in the fourth tone (ˋ) [the Zhuyin symbols are images in the source and are not reproduced]; since they differ only in tone, the character is discarded. What finally remains is the polyphonic-character data, which roughly comprise the following characters: row, young, happy and, weigh, says, does, grow, greatly, once, Shen, emit, do not have, the school, from, all, fall, towards, pass, single, walk back and forth, call together, just, fall, contain, strong, accent, join, stick, province, fill in, poor, cover, be close to, as, bullet, screen, luxuriant, more, sudden and violent, ripe, mould, give, approach, accuse, frighten, hide, go back, Zhai, know, ride, be, feel, reveal, belong to, stir, and so on.
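The selection rule above (discard characters whose readings differ only in tone) can be sketched as follows. This is a minimal illustration, not the patent's implementation: it assumes readings are written as pinyin with a trailing tone digit rather than in Zhuyin, and the three sample characters and the helper names are chosen here for illustration only.

```python
from __future__ import annotations

def strip_tone(reading: str) -> str:
    """Drop a trailing tone digit: 'hang2' -> 'hang'."""
    return reading.rstrip("012345")

def select_polyphones(readings_by_char: dict[str, list[str]]) -> dict[str, list[str]]:
    """Keep only characters that still have more than one reading once tone is ignored."""
    selected = {}
    for char, readings in readings_by_char.items():
        toneless = sorted({strip_tone(r) for r in readings})
        if len(toneless) > 1:
            selected[char] = toneless
    return selected

# Hypothetical candidate data (the patent starts from 245 characters).
candidates = {
    "行": ["xing2", "hang2"],  # syllables differ: kept
    "乐": ["le4", "yue4"],     # syllables differ: kept
    "好": ["hao3", "hao4"],    # differs only in tone: discarded
}
polyphones = select_polyphones(candidates)
```

Because tone is removed before comparison, a tone-insensitive feature such as the MFCC never has to distinguish two entries that sound identical to it.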
Referring to Fig. 1, the steps of the vocabulary database construction method of the speech recognition system of the present invention are as follows:
Step S11: provide the polyphonic-character data;
Step S12: input a vocabulary entry;
Step S13: compare the entry with the polyphonic-character data to determine whether it contains at least one polyphonic character; if so, build a separate acoustic model for each of the pronunciations of the polyphonic character it contains; if not, build a single corresponding acoustic model for the entry;
Step S14: store the vocabulary entry and its acoustic model(s) in the vocabulary database.
Here, the polyphonic-character data comprise a plurality of polyphonic characters and their pronunciations, and the acoustic model is a hidden Markov model.
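Steps S11 to S14 can be sketched as below. This is a hedged illustration under stated assumptions: pronunciations are toneless pinyin strings, an "acoustic model" is stood in for by the syllable sequence it would be built from, and the tables `POLYPHONES` and `DEFAULT_READING`, the function names and the sample entries are all hypothetical.

```python
from itertools import product

# S11: polyphonic-character data (illustrative subset).
POLYPHONES = {"行": ["hang", "xing"], "乐": ["le", "yue"]}
# Single readings for the remaining characters used in this example.
DEFAULT_READING = {"陈": "chen", "立": "li", "爱": "ai"}

def pronunciations(word):
    """S13: every syllable sequence for `word`, one per combination of readings."""
    options = [POLYPHONES.get(ch) or [DEFAULT_READING[ch]] for ch in word]
    return [tuple(combo) for combo in product(*options)]

def build_lexicon(words):
    """S12 to S14: store one (word, syllables) entry per pronunciation variant."""
    lexicon = []
    for word in words:
        for syllables in pronunciations(word):
            lexicon.append((word, syllables))
    return lexicon

lexicon = build_lexicon(["陈立行", "爱乐"])
```

Storing each variant as its own entry keeps the one-to-one mapping between entries and acoustic models that the search method relies on, while several entries can still report the same written word.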
Referring to Fig. 2, taking singer names as a specific embodiment, the steps for building a vocabulary database of singer names according to the present invention are as follows:
Step S21: read in a singer name;
Step S22: compare the singer name with the polyphonic-character data to determine whether the name contains at least one polyphonic character; if so, go to step S23; if not, go to step S24;
Step S23: add a group of name entries in which the polyphonic character is replaced by its other pronunciations;
Step S24: convert each character of the name into its hidden Markov model representation;
Step S25: determine whether the last singer name has been read; if so, go to step S26; if not, return to step S21;
Step S26: finish initialization and enter the recognition flow.
A vocabulary database built according to the present invention can recognize polyphonic characters, so users can speak with their customary pronunciation and still obtain the correct recognition result.
In addition, in speech recognition each Chinese character can be decomposed into an initial and a final: the initial occurs at the start of the syllable and the final at the end. Every Chinese character can therefore be represented by two acoustic models, one for the initial and one for the final, and recognition amounts to a decision based on the acoustic-model probability values computed for initials and finals. Consequently, if the entries in the vocabulary database are sorted so that entries with the same prefix are adjacent, and the probability values of the characters of the previous entry are recorded, then for the current entry only the characters whose pronunciation differs from the previous entry need to be computed. The probability values of characters with the same pronunciation are not recomputed, which saves computation during search and comparison.
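The initial/final decomposition can be sketched as a simple string split. This is an assumption-laden illustration: syllables are toneless pinyin, the glides "y" and "w" are treated as initials, and a syllable without an initial gets an empty placeholder so that every character still maps to two units. The patent does not specify its unit inventory.

```python
# Two-letter initials come first so that "zh", "ch", "sh" win over "z", "c", "s".
INITIALS = ("zh", "ch", "sh", "b", "p", "m", "f", "d", "t", "n", "l",
            "g", "k", "h", "j", "q", "x", "r", "z", "c", "s", "y", "w")

def split_syllable(syllable):
    """Return (initial, final) for one toneless pinyin syllable."""
    for initial in INITIALS:
        if syllable.startswith(initial):
            return initial, syllable[len(initial):]
    return "", syllable  # zero-initial syllable such as "ai"

def to_units(syllables):
    """Flatten a syllable sequence into the initial/final units to be modelled."""
    units = []
    for syllable in syllables:
        units.extend(split_syllable(syllable))
    return units

units = to_units(("chen", "li", "hang"))
```

A three-character name therefore maps to six units, which matches the count of six acoustic models given for a three-character name in the worked example later in the text.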
Referring to Fig. 3, the steps of the vocabulary database search-and-comparison method of the speech recognition system of the present invention are as follows:
Step S31: provide a vocabulary database: the vocabulary database comprises a plurality of vocabulary entries, the entries are sorted so that those with the same prefix are adjacent, and the entries correspond one-to-one to a plurality of acoustic models;
Step S32: input a speech signal;
Step S33: obtain a feature parameter of the speech signal: the feature parameter is the Mel-frequency cepstral coefficients (MFCC);
Step S34: compare the feature parameter with the acoustic models of the vocabulary entries one by one: each acoustic model produces a probability value for the feature parameter, and each entry inherits the probability values produced for the characters it shares, with the same pronunciation, with the preceding adjacent entry (the entries are sorted so that those with the same prefix are adjacent and the probability values of the previous entry's characters are recorded, so only the characters whose pronunciation differs from the previous entry are computed and same-pronunciation characters are not recomputed);
Step S35: recognize the speech signal from the probability values of the vocabulary entries.
The above acoustic model is a hidden Markov model, and the above probability values are computed using the Viterbi algorithm.
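As a rough illustration of the Viterbi computation mentioned above, the sketch below scores a frame sequence against a left-to-right HMM in the log domain. It is a generic textbook form under simplifying assumptions (a single shared stay/advance transition pair, per-frame state log-likelihoods supplied by the caller, a path forced to start in the first state and end in the last); the patent gives no model topology or parameters, so every name and number here is hypothetical.

```python
import math

def viterbi_log_score(log_emit, log_self, log_next):
    """Best-path log-probability through a left-to-right HMM.

    log_emit[t][s] is the log-likelihood of frame t under state s;
    log_self / log_next are the log-probabilities of staying in a state
    or advancing to the next one.
    """
    n_states = len(log_emit[0])
    neg_inf = float("-inf")
    best = [neg_inf] * n_states
    best[0] = log_emit[0][0]          # the path must start in state 0
    for frame in log_emit[1:]:
        updated = [neg_inf] * n_states
        for s in range(n_states):
            stay = best[s] + log_self
            move = best[s - 1] + log_next if s > 0 else neg_inf
            updated[s] = max(stay, move) + frame[s]
        best = updated
    return best[-1]                   # the path must end in the last state

# Two frames, two states, with made-up emission likelihoods.
log_emit = [[math.log(0.9), math.log(0.1)],
            [math.log(0.2), math.log(0.8)]]
score = viterbi_log_score(log_emit, math.log(0.5), math.log(0.5))
```

In this toy case the only admissible path visits state 0 and then state 1, so the score is the log of the product of the two emissions on that path and one advance transition.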
Take a vocabulary database of singer names as an example. Suppose it holds 692 singer names containing 2233 characters in total. When probabilities are computed with the Viterbi algorithm, each speech segment must be matched against the system's acoustic models 4466 times (two models per character), and part of this work is repeated computation. The present invention therefore sorts the singer names so that singers with the same surname are adjacent, and records the probabilities of the characters of the previous name, so that when the current singer name is computed only the probabilities of the characters that differ in pronunciation need to be calculated.
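The arithmetic above, and the saving from sorting, can be checked with a small counting sketch. It assumes two acoustic models per character and that only a shared leading run of syllables is reused between adjacent entries; the three sample names are hypothetical and the helper names are introduced here for illustration.

```python
UNITS_PER_CHARACTER = 2  # one initial model plus one final model

def naive_evaluations(names):
    """Model evaluations when every character of every name is scored."""
    return sum(len(name) for name in names) * UNITS_PER_CHARACTER

def evaluations_with_reuse(names):
    """Model evaluations when a sorted list reuses the prefix shared with the previous name."""
    total, previous = 0, ()
    for name in sorted(names):
        shared = 0
        for a, b in zip(previous, name):
            if a != b:
                break
            shared += 1
        total += (len(name) - shared) * UNITS_PER_CHARACTER
        previous = name
    return total

# The figure quoted in the text: 2233 characters, two models each.
quoted_total = 2233 * UNITS_PER_CHARACTER

names = [("chen", "li", "hang"), ("lin", "yi"), ("chen", "li", "hong")]
naive = naive_evaluations(names)
reused = evaluations_with_reuse(names)
```

The actual saving on the 692-name database depends on how many prefixes its names share, which the source does not report.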
Referring to Fig. 4, the steps of the preferred embodiment of the vocabulary database search-and-comparison method of the speech recognition system of the present invention are as follows:
Step S41: input the MFCCs of the speech (the feature parameter obtained from the speech signal);
Step S42: read in a singer-name model;
Step S43: determine whether the pronunciation of the current singer name repeats that of the previous singer name; if so, go to step S44; if not, go to step S45;
Step S44: use the probabilities recorded for the previous name in place of the characters with the same pronunciation, and continue with the next step starting from the first character whose pronunciation differs;
Step S45: compute the probabilities using the Viterbi algorithm;
Step S46: store the probability of each character of the current singer name;
Step S47: determine whether the probabilities of all singer names have been computed; if so, go to step S48; if not, return to step S42; and
Step S48: list the five singer names with the highest probabilities.
Take the singer name "Chen Lihang" as an example. It is adjacent to the singer name "Chen Lihong", and the first two characters of the two names have the same pronunciation. During the Viterbi computation, the MFCCs of the input speech are first scored against the 6 acoustic models representing "Chen Lihang", and the probability value of each character is stored. When the input speech is next scored against "Chen Lihong", the probabilities of the two characters "Chen Li" already computed for the previous name are reused, and only the probability values computed from the 2 acoustic models of "hong" are added to obtain the complete probability of "Chen Lihong".
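The flow of steps S42 to S48 and the "Chen Lihang" / "Chen Lihong" example can be sketched as follows. This is a simplified illustration, not the patent's implementation: it assumes each syllable's log-probability can be cached by position and reused whenever the preceding syllables are identical, and `score_syllable` is a hypothetical stand-in for the Viterbi scoring of one character's initial and final models against the input MFCCs.

```python
import heapq

def recognize(lexicon, score_syllable, top_n=5):
    """Rank lexicon entries, reusing scores for the prefix shared with the previous entry.

    lexicon: list of (name, syllables) pairs.
    score_syllable(position, syllable): log-probability for one character.
    """
    results = []
    prev_syllables, prev_scores = (), []
    for name, syllables in sorted(lexicon, key=lambda entry: entry[1]):
        shared = 0                                   # S43: how much repeats?
        for a, b in zip(prev_syllables, syllables):
            if a != b:
                break
            shared += 1
        scores = prev_scores[:shared]                # S44: reuse stored values
        for pos in range(shared, len(syllables)):    # S45: score the rest
            scores.append(score_syllable(pos, syllables[pos]))
        results.append((sum(scores), name))
        prev_syllables, prev_scores = syllables, scores   # S46: store
    return heapq.nlargest(top_n, results)            # S48: best candidates

calls = []
def fake_score(position, syllable):
    """Made-up scorer that records each call so the saving can be counted."""
    calls.append((position, syllable))
    return -1.0 if syllable == "hong" else -2.0

lexicon = [("Chen Lihong", ("chen", "li", "hong")),
           ("Chen Lihang", ("chen", "li", "hang"))]
ranked = recognize(lexicon, fake_score)
```

With the two adjacent names the scorer is called four times rather than six, since "chen" and "li" are scored once and inherited by the second entry.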

Claims (5)

1. A method for building a vocabulary database of a speech recognition system, characterized in that the method comprises the following steps:
1) providing polyphonic-character data: the polyphonic-character data comprise a plurality of polyphonic characters and their pronunciations;
2) inputting a vocabulary entry;
3) building acoustic models: the vocabulary entry is compared with the polyphonic-character data to determine whether it contains at least one polyphonic character; if so, a separate acoustic model is built for each of the pronunciations of the polyphonic character it contains; if not, a single corresponding acoustic model is built for the entry;
4) storing the vocabulary entry and its corresponding acoustic model(s) in the vocabulary database.
2. The method for building a vocabulary database of a speech recognition system according to claim 1, characterized in that the acoustic model is a hidden Markov model.
3. A search-and-comparison method using the vocabulary database of the speech recognition system of claim 1, characterized in that it comprises the following steps:
1) providing a vocabulary database: the vocabulary database comprises a plurality of vocabulary entries, entries with the same prefix are sorted so that they are adjacent, and the entries correspond one-to-one to a plurality of acoustic models;
2) inputting a speech signal;
3) obtaining a feature parameter of the speech signal, the feature parameter being Mel-frequency cepstral coefficients;
4) comparing the feature parameter obtained in step 3) with the acoustic models of the vocabulary entries one by one: each acoustic model produces a probability value for the feature parameter, and each entry inherits the probability values already produced for the characters it shares, with the same pronunciation, with the preceding adjacent entry;
5) recognizing the speech signal from the probability values of the vocabulary entries.
4. The vocabulary database search-and-comparison method of a speech recognition system according to claim 3, characterized in that the acoustic model is a hidden Markov model.
5. The vocabulary database search-and-comparison method of a speech recognition system according to claim 3, characterized in that the probability values are computed using the Viterbi algorithm.
CNA2007101857093A 2007-12-29 2007-12-29 A vocabulary database construction method and the corresponding hunting and comparison method for voice identification system Pending CN101217035A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CNA2007101857093A CN101217035A (en) 2007-12-29 2007-12-29 A vocabulary database construction method and the corresponding hunting and comparison method for voice identification system

Publications (1)

Publication Number Publication Date
CN101217035A true CN101217035A (en) 2008-07-09

Family

ID=39623457

Family Applications (1)

Application Number Title Priority Date Filing Date
CNA2007101857093A Pending CN101217035A (en) 2007-12-29 2007-12-29 A vocabulary database construction method and the corresponding hunting and comparison method for voice identification system

Country Status (1)

Country Link
CN (1) CN101217035A (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102135812A (en) * 2010-11-30 2011-07-27 华为终端有限公司 method and device for inputting polyphonic Chinese characters
CN103365925A (en) * 2012-04-09 2013-10-23 高德软件有限公司 Method for acquiring polyphone spelling, method for retrieving based on spelling, and corresponding devices
CN103514236A (en) * 2012-06-30 2014-01-15 重庆新媒农信科技有限公司 Retrieval condition error correction prompt processing method based on Pinyin in retrieval application
CN103514236B (en) * 2012-06-30 2017-06-09 重庆新媒农信科技有限公司 Search condition error correcting prompt processing method based on phonetic in retrieval application
CN103578467A (en) * 2013-10-18 2014-02-12 威盛电子股份有限公司 Acoustic model building method, voice recognition method and electronic device
WO2016101577A1 (en) * 2014-12-24 2016-06-30 中兴通讯股份有限公司 Voice recognition method, client and terminal device
CN106128457A (en) * 2016-08-29 2016-11-16 昆山邦泰汽车零部件制造有限公司 A kind of control method talking with robot

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C12 Rejection of a patent application after its publication
RJ01 Rejection of invention patent application after publication

Open date: 20080709