CN101217035A

CN101217035A - Lexical database construction method for a speech recognition system and corresponding search-and-comparison method

Info

Publication number: CN101217035A
Application number: CNA2007101857093A
Authority: CN
Inventors: 廖崇伯; 陈淮琰
Original assignee: Inventec Besta Xian Co Ltd
Current assignee: Inventec Besta Xian Co Ltd
Priority date: 2007-12-29
Filing date: 2007-12-29
Publication date: 2008-07-09

Abstract

The invention relates to a method of building a lexical database for a speech recognition system and a method of searching and comparing against it, which avoids repeatedly computing the same character and thereby reduces the overall amount of computation. The building method comprises the following steps: 1) providing polyphonic-character data; 2) inputting vocabulary; 3) building acoustic models; 4) storing the vocabulary and its corresponding acoustic models in the lexical database. The lexical database of the invention supports recognition of polyphonic characters, so that the speech recognition system is more user-friendly and closer to the pronunciation habits of ordinary users, allowing users to speak with their customary pronunciation and still obtain the correct recognition result.

Description

Lexical database construction method for a speech recognition system and search-and-comparison method thereof

Technical field

The present invention relates to a lexical database construction method for a speech recognition system and a search-and-comparison method thereof, and in particular to a lexical database construction method that supports polyphonic-character processing and a more efficient search-and-comparison method.

Background technology

Known speech recognition systems have no capability for handling polyphonic characters, so that when giving voice input the user must use one particular reading of a polyphonic character for recognition to succeed. For example, the character "行" in the personal name Chen Lihang must be pronounced "ㄏㄤˊ" for recognition to succeed; if the user pronounces it "ㄒㄧㄥˊ", it is not recognized correctly. Likewise, the character "乐" in "philharmonic society" must be pronounced "ㄌㄜˋ" to be recognized; if it is pronounced "ㄩㄝˋ", it is not recognized correctly either. This mode of voice input differs greatly from the pronunciation habits of ordinary users.

In addition, when performing recognition, a speech recognition system normally uses the Viterbi algorithm to compute, for each word in the vocabulary, the probability value of its corresponding acoustic model, and this computation is where the system spends most of its computing effort. If identical characters are frequently recomputed, the system is burdened with unnecessary computation and recognition speed drops. This prompted us to consider how to avoid recomputing identical characters so as to reduce the overall amount of computation.

Summary of the invention

The purpose of the present invention is to provide a lexical database construction method for a speech recognition system and a search-and-comparison method thereof, in particular a lexical database construction method that supports polyphonic-character processing and a more efficient search-and-comparison method, thereby solving the technical problem of how to avoid recomputing identical characters so as to reduce the overall amount of computation.

The technical solution of the present invention is a method of constructing a lexical database for a speech recognition system, characterized in that the method comprises the following steps:

1) providing polyphonic-character data: the polyphonic-character data comprise a plurality of polyphonic characters and their pronunciations;

2) inputting a vocabulary item;

3) building acoustic models: the vocabulary item is compared with the polyphonic-character data to determine whether it contains at least one polyphonic character; if so, a corresponding plurality of acoustic models is built, one for each of the pronunciations of the polyphonic character contained in the vocabulary item; if not, a single corresponding acoustic model is built for the vocabulary item;

4) storing the vocabulary item and its corresponding acoustic model or models in the lexical database.

A search-and-comparison method using the lexical database of the above speech recognition system, characterized in that the method comprises the following steps:

1) providing a lexical database: the lexical database comprises a plurality of vocabulary items, in which items with the same leading characters are sorted so as to be adjacent, and the vocabulary items correspond one-to-one to a plurality of acoustic models;

2) inputting a speech signal;

3) obtaining a characteristic parameter of the speech signal, the characteristic parameter being Mel-frequency cepstral coefficients;

4) comparing the characteristic parameter obtained in step 3) with the acoustic models of the vocabulary items one by one: each acoustic model produces a probability value for the characteristic parameter, and each vocabulary item inherits the probability values already produced for characters of the same pronunciation in the preceding adjacent vocabulary item;

5) recognizing the speech signal by means of the probability values of the vocabulary items.

The above acoustic model is a hidden Markov model.

The above probability value is computed using the Viterbi algorithm.

The lexical database construction method for a speech recognition system and the search-and-comparison method of the present invention make it possible to build a lexical database that supports polyphonic-character processing. Each required vocabulary item is compared with the polyphonic-character data to determine whether it contains at least one polyphonic character, and one or more acoustic models are built accordingly, one for each pronunciation of the polyphonic character it contains. The lexical database thus built can recognize polyphonic characters, which makes the speech recognition system more user-friendly and closer to the pronunciation habits of ordinary users, who can speak with their customary pronunciation and still obtain the correct recognition result. By comparing the characteristic parameter of the input speech signal with the acoustic models of the vocabulary items one by one and recognizing the speech signal from the resulting acoustic-model probability values, recomputation of identical characters is avoided and the overall amount of computation is reduced.

Description of drawings

Fig. 1 is a flowchart of the lexical database construction method for a speech recognition system of the present invention;

Fig. 2 is a flowchart of a specific embodiment of the lexical database construction method for a speech recognition system of the present invention;

Fig. 3 is a flowchart of the lexical database search-and-comparison method for a speech recognition system of the present invention;

Fig. 4 is a flowchart of a specific embodiment of the lexical database search-and-comparison method for a speech recognition system of the present invention.

Embodiment

The speech recognition system of the present invention performs recognition mainly by the hidden Markov model (HMM) method, which describes pronunciation with a probabilistic model and regards the production of a short segment of speech as a succession of state transitions in a Markov model. The speech characteristic parameter used in recognition is the Mel-frequency cepstral coefficients (MFCC). Besides taking into account the sensitivity of the human ear to different frequencies, MFCC has the property of separating the vocal-tract model from the excitation signal, so that speech recognition is not affected by the speaker's volume or by the five tones of Chinese speech (the first, second, third and fourth tones and the neutral tone).
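As a rough illustration of how the mel scale reflects the ear's sensitivity to different frequencies, the sketch below converts frequencies in hertz to mels using one commonly used formula. The constants and the function name `hz_to_mel` are assumptions for illustration; the patent does not state which mel mapping its MFCC front end uses.

```python
import math

def hz_to_mel(frequency_hz: float) -> float:
    """Convert a frequency in Hz to mels with a common log-based formula."""
    return 2595.0 * math.log10(1.0 + frequency_hz / 700.0)

# The same 500 Hz step covers fewer mels at high frequencies than at low
# ones, mirroring the ear's coarser resolution at high frequencies.
low_step = hz_to_mel(1000) - hz_to_mel(500)
high_step = hz_to_mel(4500) - hz_to_mel(4000)
print(low_step > high_step)
```

A full MFCC front end would additionally apply a mel-spaced filter bank, a logarithm and a discrete cosine transform; those stages are omitted here.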

Based on the above characteristics, polyphonic characters suitable for the recognition system of the present invention are selected from 245 Chinese polyphonic characters. Because the characteristic parameter used in recognition is the Mel-frequency cepstral coefficients, polyphonic characters whose pronunciations differ only in tone are not included among the polyphonic characters to be processed. For example, the polyphonic character rendered "lacking" has two pronunciations, one carrying the third tone ("ˇ") and the other the fourth tone ("ˋ"); since they differ only in tone, this character is discarded. What finally remains is our polyphonic-character data, which contains roughly the following characters: row, young, happy, and, weigh, says, does, grow, greatly, once, Shen, emit, do not have, the school, from, all, fall, towards, pass, single, walk back and forth, call together, just, fall, contain, strong, accent, join, stick, province, fill in, poor, cover, be close to, as, bullet, screen, luxuriant, more, sudden and violent, ripe, mould, give, approach, accuse, frighten, hide, go back, Zhai, know, ride, be, feel, reveal, belong to, stir, and so on.
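The selection rule described above (discard polyphonic characters whose readings differ only in tone) can be sketched as follows. This is a minimal sketch, not the patent's implementation: the function names, the pinyin-with-tone-digit notation and the three sample entries are all assumptions chosen for illustration.

```python
import re

def toneless(reading: str) -> str:
    """Drop a trailing tone digit (1-5) from a pinyin reading such as 'hang2'."""
    return re.sub(r"[1-5]$", "", reading)

def select_polyphones(readings_by_char):
    """Keep only characters whose readings still differ once tone is ignored."""
    selected = {}
    for char, readings in readings_by_char.items():
        # Collapse readings that differ only in tone, keeping first-seen order.
        distinct = list(dict.fromkeys(toneless(r) for r in readings))
        if len(distinct) > 1:
            selected[char] = distinct
    return selected

# Hypothetical sample entries, for illustration only.
sample = {
    "行": ["hang2", "xing2"],  # syllables differ: kept
    "乐": ["le4", "yue4"],     # syllables differ: kept
    "少": ["shao3", "shao4"],  # differs only in tone: discarded
}
polyphones = select_polyphones(sample)
print(polyphones)
```

Storing the surviving readings without tone matches the premise that the MFCC-based recognizer does not distinguish tones.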

Referring to Fig. 1, the steps of the lexical database construction method for a speech recognition system of the present invention are as follows:

Step S11: provide polyphonic-character data;

Step S12: input a vocabulary item;

Step S13: compare the vocabulary item with the polyphonic-character data to determine whether it contains at least one polyphonic character; if so, build a corresponding plurality of acoustic models, one for each pronunciation of the polyphonic character contained in the vocabulary item; if not, build a single corresponding acoustic model for the vocabulary item;

Step S14: store the vocabulary item and its acoustic model or models in the lexical database.

The above polyphonic-character data comprise a plurality of polyphonic characters and their pronunciations, and the above acoustic model is a hidden Markov model.

Referring to Fig. 2, taking singer names as a specific embodiment, the steps by which the present invention builds a lexical database of singer names are as follows:

Step S21: read in a singer name;

Step S22: compare the singer name read in with the polyphonic-character data to determine whether the singer name contains at least one polyphonic character; if so, execute step S23; if not, execute step S24;

Step S23: add a further set of name entries in which the polyphonic character is replaced by its other pronunciations;

Step S24: convert each character of the name into its hidden Markov model representation;

Step S25: determine whether the last singer name has been read; if so, execute step S26; if not, execute step S21;

Step S26: finish initialization and enter the recognition flow.
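The construction loop of steps S21 to S26 can be sketched as below. This is an illustrative sketch under stated assumptions: the tables `POLYPHONES` and `DEFAULT_READING`, the sample names and both function names are hypothetical, and a tuple of toneless pinyin syllables stands in for the sequence of hidden Markov models that a real system would store.

```python
from itertools import product

# Hypothetical polyphonic-character data: character -> pronunciations.
POLYPHONES = {"行": ["hang", "xing"], "乐": ["le", "yue"]}
# Hypothetical single readings for the non-polyphonic characters used below.
DEFAULT_READING = {"陈": "chen", "立": "li", "宏": "hong"}

def pronunciation_variants(name):
    """Return every pronunciation sequence of a name (steps S22 and S23)."""
    options = [POLYPHONES.get(ch) or [DEFAULT_READING[ch]] for ch in name]
    return [tuple(seq) for seq in product(*options)]

def build_database(names):
    """Create one database entry per pronunciation variant of each name."""
    entries = []
    for name in names:                                  # S21 and S25: loop over names
        for variant in pronunciation_variants(name):
            # S24: the syllable tuple is a stand-in for the name's HMM sequence.
            entries.append((name, variant))
    return entries

database = build_database(["陈立行", "陈立宏"])
for entry in database:
    print(entry)
```

A name containing one polyphonic character with two readings yields two entries, so either pronunciation spoken by the user can match the same name.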

The lexical database built by the present invention can recognize polyphonic characters, allowing users to speak with their customary pronunciation and still obtain the correct recognition result.

In addition, in speech recognition technology each Chinese character can be decomposed into an initial and a final: the initial appears at the front of the syllable and the final at its end. Each Chinese character can therefore be represented by two acoustic models, one for the initial and one for the final, and speech recognition makes its decision by computing the acoustic-model probability values of the initials and finals. Consequently, if the vocabulary items in the lexical database are sorted so that items with the same leading characters are grouped together, and the probability values of the characters of the preceding vocabulary item are recorded, then only the probability values of those characters of the current vocabulary item whose pronunciation differs from the preceding item need to be computed. The probability values of characters with the same pronunciation are not recomputed, which saves computation during search and comparison.
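The decomposition of each character into an initial and a final can be illustrated with toneless pinyin syllables. The sketch below is an assumption-laden illustration: the patent does not give its unit inventory, the names `INITIALS`, `split_syllable` and `to_units` are hypothetical, and treating "y" and "w" as initials is a simplification.

```python
# Pinyin initials; two-letter initials come first so that "zh", "ch" and "sh"
# are matched before "z", "c" and "s". Treating "y"/"w" as initials is a
# simplification made for this sketch.
INITIALS = ["zh", "ch", "sh", "b", "p", "m", "f", "d", "t", "n", "l",
            "g", "k", "h", "j", "q", "x", "r", "z", "c", "s", "y", "w"]

def split_syllable(syllable):
    """Split a toneless pinyin syllable into (initial, final)."""
    for initial in INITIALS:
        if syllable.startswith(initial) and len(syllable) > len(initial):
            return initial, syllable[len(initial):]
    return "", syllable  # syllable without an initial, such as "ai"

def to_units(syllables):
    """Flatten a name into its sequence of initial and final units."""
    units = []
    for syllable in syllables:
        units.extend(split_syllable(syllable))
    return units

print(to_units(["chen", "li", "hang"]))  # six units for a three-character name
```

Each character contributes two units, which is why a three-character name is scored against six acoustic models.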

Referring to Fig. 3, the steps of the lexical database search-and-comparison method for a speech recognition system of the present invention are as follows:

Step S31: provide a lexical database: the lexical database comprises a plurality of vocabulary items, which are sorted so that items with the same leading characters are adjacent, and which correspond one-to-one to a plurality of acoustic models;

Step S32: input a speech signal;

Step S33: obtain a characteristic parameter of the speech signal: the characteristic parameter is the Mel-frequency cepstral coefficients (MFCC);

Step S34: compare the characteristic parameter with the acoustic models of the vocabulary items one by one: each acoustic model produces a probability value for the characteristic parameter, and each vocabulary item inherits the probability values produced for characters of the same pronunciation in the preceding adjacent vocabulary item (the vocabulary items in the lexical database are sorted so that items with the same leading characters are grouped together and the probability values of the characters of the preceding item are recorded, so that only the characters of the current item whose pronunciation differs from the preceding item need to be computed, and probability values for characters of the same pronunciation are not recomputed);

Step S35: recognize the speech signal by means of the probability values of the vocabulary items.

The above acoustic model is a hidden Markov model, and the above probability value is computed using the Viterbi algorithm.
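For readers unfamiliar with the Viterbi algorithm, the sketch below computes the log-probability of the best state path through a small hidden Markov model. It is a simplified illustration, not the patent's implementation: it assumes discrete observation symbols rather than continuous MFCC vectors, and the function name and the toy parameters are hypothetical.

```python
import math

def viterbi_log_score(obs, start_p, trans_p, emit_p):
    """Log-probability of the best state path for an observation sequence.

    start_p[i], trans_p[i][j] and emit_p[i][o] are ordinary probabilities;
    zero entries are treated as log-probability -inf.
    """
    def log(p):
        return math.log(p) if p > 0 else float("-inf")

    n = len(start_p)
    # Initialisation: best score of being in each state after the first frame.
    delta = [log(start_p[i]) + log(emit_p[i][obs[0]]) for i in range(n)]
    # Recursion: extend the best path into each state, one frame at a time.
    for o in obs[1:]:
        delta = [
            max(delta[i] + log(trans_p[i][j]) for i in range(n)) + log(emit_p[j][o])
            for j in range(n)
        ]
    return max(delta)

# Toy two-state left-to-right model (hypothetical numbers).
start = [1.0, 0.0]
trans = [[0.5, 0.5], [0.0, 1.0]]
emit = [[0.9, 0.1], [0.2, 0.8]]
score = viterbi_log_score([0, 0, 1], start, trans, emit)
print(math.exp(score))
```

Working in the log domain turns products of probabilities into sums, which avoids numerical underflow on long utterances.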

Take a lexical database of singer names as an example. If there are 692 singer names in total, comprising 2233 characters, then when probabilities are computed with the Viterbi algorithm each segment of speech requires 4466 comparisons against the system's acoustic models (two acoustic models per character), and some of these comparisons are repeated computation. The present invention therefore sorts the singer names so that singers with the same surname are grouped together, and records the probabilities of the characters of the preceding name, so that when the current singer name is processed only the probabilities of the characters whose pronunciation differs need to be computed.
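The saving can be estimated by counting acoustic-model evaluations with and without reuse of the leading characters shared with the preceding name. The sketch below is illustrative only: the function name and the two sample names (given as toneless pinyin syllables) are assumptions, and each character is charged two evaluations, one initial model and one final model.

```python
MODELS_PER_CHARACTER = 2  # one initial model and one final model

def count_evaluations(names):
    """Return (without_reuse, with_reuse) acoustic-model evaluation counts."""
    ordered = sorted(names)  # names sharing leading characters become adjacent
    without_reuse = sum(len(name) for name in ordered) * MODELS_PER_CHARACTER
    with_reuse = 0
    previous = ()
    for name in ordered:
        # Count leading characters pronounced the same as in the previous name.
        shared = 0
        while (shared < min(len(name), len(previous))
               and name[shared] == previous[shared]):
            shared += 1
        with_reuse += (len(name) - shared) * MODELS_PER_CHARACTER
        previous = name
    return without_reuse, with_reuse

# Two hypothetical names that share their first two characters.
names = [("chen", "li", "hong"), ("chen", "li", "hang")]
print(count_evaluations(names))
print(2233 * MODELS_PER_CHARACTER)  # the 4466 comparisons quoted above
```

The actual saving for the 692-name database depends on how many leading characters adjacent names share, which the text does not report.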

Referring to Fig. 4, the steps of a preferred embodiment of the lexical database search-and-comparison method for a speech recognition system of the present invention are as follows:

Step S41: input the Mel-frequency cepstral coefficients of the speech (the characteristic parameter obtained from the speech signal);

Step S42: read in a singer-name model;

Step S43: determine whether the pronunciation of the current singer name repeats that of the previous singer name; if so, execute step S44; if not, execute step S45;

Step S44: use the probabilities recorded for the previous name in place of those of the characters with the same pronunciation, then continue to the next step starting from the first character whose pronunciation differs;

Step S45: compute the probabilities using the Viterbi algorithm;

Step S46: store the probability of each character of the current singer name;

Step S47: determine whether probabilities have been computed for all singer names; if so, execute step S48; if not, repeat from step S42; and

Step S48: list the five singer names with the highest probabilities.

Take the singer name "Chen Lihang" as an example. It is adjacent to the singer name "Chen Lihong", and the first two characters of the two names are pronounced identically. In the Viterbi computation, the Mel-frequency cepstral coefficients of the input speech are therefore first scored against the 6 acoustic models that represent "Chen Lihang", and the probability value of each of its characters is stored. When the input speech is next scored against "Chen Lihong", the probabilities of the two characters "Chen Li" already computed for the previous name are reused, and only the probability values computed from the 2 acoustic models of "Hong" are added to obtain the complete probability of "Chen Lihong".
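The flow of steps S41 to S48 and the "Chen Lihang" / "Chen Lihong" example can be sketched as follows. This is a sketch under strong assumptions rather than the patent's implementation: `rank_names`, `toy_score`, `HEARD` and the third sample name are hypothetical, and the scorer is a stand-in for the Viterbi score of one character's initial and final models. Treating per-character log scores as independent and additive follows the description above but simplifies real Viterbi decoding, where a character's score depends on its alignment with the utterance.

```python
import heapq

def rank_names(names, score_character, top_n=5):
    """Rank names by total log score, reusing the scores of leading
    characters shared with the previous name in sorted order."""
    ranked = []
    previous_name, previous_scores = (), []
    for name in sorted(names):                        # S42: read the next name
        shared = 0
        while (shared < min(len(name), len(previous_name))
               and name[shared] == previous_name[shared]):
            shared += 1                               # S43: repeated pronunciation
        scores = previous_scores[:shared]             # S44: reuse stored scores
        for position in range(shared, len(name)):     # S45: score the remainder
            scores.append(score_character(name[position], position))
        ranked.append((sum(scores), name))
        previous_name, previous_scores = name, scores  # S46: store per-character scores
    return heapq.nlargest(top_n, ranked)              # S48: best candidates first

# Hypothetical stand-in scorer: pretends the user said "chen li hong".
HEARD = ("chen", "li", "hong")
calls = []

def toy_score(syllable, position):
    calls.append(syllable)  # record each evaluation to show the saving
    matches = position < len(HEARD) and HEARD[position] == syllable
    return 0.0 if matches else -5.0

best = rank_names(
    [("chen", "li", "hong"), ("chen", "li", "hang"), ("zhang", "xue", "you")],
    toy_score,
)
print(best[0][1], len(calls))
```

In this toy run the scorer is called 7 times instead of 9, because "chen" and "li" are scored once and reused for the second name.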

Claims

1. A method of constructing a lexical database for a speech recognition system, characterized in that the method comprises the following steps:

2) inputting a vocabulary item;

2. The method of constructing a lexical database for a speech recognition system according to claim 1, characterized in that the acoustic model is a hidden Markov model.

3. A search-and-comparison method using the lexical database of the speech recognition system of claim 1, characterized in that it comprises the following steps:

2) inputting a speech signal;

4. The lexical database search-and-comparison method for a speech recognition system according to claim 3, characterized in that the acoustic model is a hidden Markov model.

5. The lexical database search-and-comparison method for a speech recognition system according to claim 3, characterized in that the probability value is computed using the Viterbi algorithm.