JPH06230795A - Voice recognizing method using basic hidden markov model - Google Patents

Voice recognizing method using basic hidden markov model

Info

Publication number
JPH06230795A
JPH06230795A JP5013442A JP1344293A
Authority
JP
Japan
Prior art keywords
probability
learning
duration
time
output
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
JP5013442A
Other languages
Japanese (ja)
Inventor
Shigeru Honma
茂 本間
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nippon Telegraph and Telephone Corp
Original Assignee
Nippon Telegraph and Telephone Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nippon Telegraph and Telephone Corp filed Critical Nippon Telegraph and Telephone Corp
Priority to JP5013442A priority Critical patent/JPH06230795A/en
Publication of JPH06230795A publication Critical patent/JPH06230795A/en
Pending legal-status Critical Current

Abstract

PURPOSE: To provide a speech recognition method using a basic hidden Markov model (HMM) that is fast and has high recognition accuracy.

CONSTITUTION: The method comprises a learning processing unit 6 that learns the output probabilities and transition probabilities of the basic HMM from the output of a speech analysis unit and from learning data, and analyzes the duration distribution of each state; a parameter storage unit 7 that stores the output probabilities and transition probabilities used as initial values when learning of the basic HMM starts, the output probabilities and transition probabilities produced by the learning processing unit 6, the duration distribution of each state, and the minimum and maximum durations; a recognition processing unit 9 that performs duration control with a transition probability that differs for every stay time, using the per-stay-time transition probabilities and the output probabilities taken out of the storage unit 7; and a recognition result output unit 10 that outputs the recognition result according to the occurrence probability of each candidate. The per-stay-time transition probabilities are estimated so that the duration distribution of each state computed from them coincides with the duration distribution of each state at learning time taken out of the storage unit 7.

Description

Detailed Description of the Invention

[0001]

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates to a speech recognition method using a basic hidden Markov model and, more particularly, to a speech recognition method using a basic hidden Markov model in which recognition accuracy is improved by performing duration control.

[0002]

2. Description of the Related Art

In the field of speech synthesis and recognition, when a speech signal is considered, attention is paid to how long a given sound persists; this interval is referred to as its duration. In conventional speech recognition methods using the basic hidden Markov model (HMM), this duration is controlled for various purposes. Known duration-control methods include (1) using an HMM whose structure reflects the duration distribution of each state of the Markov process, (2) estimating, during learning, a transition probability for every duration (stay time) of each state, and (3) introducing into the calculation of the occurrence probability, in addition to the transition probabilities and output probabilities, a penalty that gives a higher score when the duration is close to a desirable length, among other approaches.

[0003]

Problems to be Solved by the Invention

The above methods (1) and (2) suffer from an increase in the amount of computation at learning time, caused by the number of transition probabilities to be estimated becoming several times larger, and from lowered estimation accuracy, caused by fewer data being available per parameter; alternatively, a large amount of learning data is required to keep the estimation accuracy from degrading. That is, in the basic HMM only as many transition probabilities as there are transitions leaving each state need to be estimated, whereas in methods (1) and (2) a number of transition probabilities proportional to the stay times under consideration must be estimated, so the number of transition probabilities to be estimated at learning time becomes several times that of a basic HMM without duration control.

[0004]

In a basic HMM without duration control, the computation at recognition time proceeds by accumulating the sum of the occurrence probability of the preceding state multiplied by the output probability and the transition probability, whereas method (3) must accumulate the sum of the output probability, the transition probability, and the penalty multiplied together. Therefore, in addition to the transition-probability and output-probability calculations performed in the basic HMM without duration control, method (3) must compute a penalty that gives a higher score the closer the stay time in the state currently being evaluated is to the desirable length, so the amount of computation at recognition time increases. A further problem is that, because the way the penalty is incorporated into the occurrence probability is based on experience, there is no guarantee that the recognition rate always improves.
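
For illustration only, the per-frame computation described above for the basic HMM — accumulating, for each state, the sum over predecessor states of the previous occurrence probability multiplied by the transition probability and the output probability — can be sketched as a plain forward recursion. This is a minimal Python sketch, not code from the patent; the array names, shapes, and log-domain arithmetic are assumptions.

```python
import numpy as np

def forward_log_likelihood(obs, log_A, log_B, log_pi):
    """Plain forward algorithm for a discrete-output HMM (no duration control).

    obs    : sequence of VQ code indices, length T
    log_A  : (S, S) log transition probabilities
    log_B  : (S, K) log output probabilities per state and code
    log_pi : (S,)   log initial state probabilities
    """
    alpha = log_pi + log_B[:, obs[0]]                 # occurrence probabilities at t = 0
    for t in range(1, len(obs)):
        # sum over predecessor states of (previous probability * transition probability),
        # then multiply by the output probability of the observed code
        alpha = np.logaddexp.reduce(alpha[:, None] + log_A, axis=0) + log_B[:, obs[t]]
    return np.logaddexp.reduce(alpha)                 # total likelihood of the code sequence
```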

[0005]

The present invention provides a speech recognition method using the basic hidden Markov model that solves the problems described above.

[0006]

Means for Solving the Problems

To solve the above problems, the invention is configured as a speech recognition method using a basic hidden Markov model which comprises: a speech analysis unit that analyzes an input speech signal and computes its feature parameters; a learning processing unit 6 that learns the output probabilities and transition probabilities of the basic hidden Markov model from learning data and, based on the result, analyzes the distribution of the duration of each state; a parameter storage unit 7 that stores the output probabilities and transition probabilities used as initial values when learning of the basic hidden Markov model starts, the output probabilities and transition probabilities output from the learning processing unit, the duration distribution of each state, and the minimum and maximum durations; a recognition processing unit 9 that performs recognition with a basic hidden Markov model in which duration control is carried out with a transition probability that differs for every stay time, using the per-stay-time transition probabilities and the output probabilities obtained from the parameter storage unit; and a recognition result output unit 10 that outputs the recognition result based on the occurrence probability of each candidate computed by the recognition processing unit 9; and in which the transition probability for each stay time is estimated so that the duration distribution of each state computed from the per-stay-time transition probabilities coincides with the learning-time duration distribution of each state taken out of the parameter storage unit.

[0007]

BEST MODE FOR CARRYING OUT THE INVENTION

The speech recognition method using the basic HMM of this invention is summarized as follows. The transition probability estimated by learning the basic HMM can be regarded as the average of the per-stay-time transition probabilities obtained by the conventional method described above. In addition, the minimum and maximum durations can be determined from the analysis of the duration distribution under a fixed risk rate. Accordingly, if the self-transition probability is set to 1.0 up to the minimum duration and to 0.0 after the maximum duration, and the transition probabilities between these two are adjusted so that the resulting duration distribution equals the distribution obtained at learning time, duration control can be performed in a simple manner.
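
As one concrete way of fixing the minimum and maximum durations from the analyzed duration distribution under a given risk rate, the sketch below cuts off half of the allowed risk mass at each tail. The patent does not prescribe this particular rule, so the quantile choice, the function name, and the dictionary representation of the distribution are assumptions.

```python
import numpy as np

def duration_bounds(duration_pmf, risk=0.05):
    """Pick minimum/maximum durations from a normalized duration distribution.

    duration_pmf : dict {duration n: relative frequency}, frequencies summing to 1.0
    risk         : total probability mass allowed to fall outside [d_min, d_max]
    """
    durations = np.array(sorted(duration_pmf))
    probs = np.array([duration_pmf[d] for d in durations])
    cdf = np.cumsum(probs)
    d_min = durations[np.searchsorted(cdf, risk / 2)]        # lower tail cut at risk/2
    d_max = durations[np.searchsorted(cdf, 1.0 - risk / 2)]  # upper tail cut at risk/2
    return int(d_min), int(d_max)
```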

[0008]

An embodiment of the speech recognition method using the basic HMM of this invention will now be described concretely with reference to FIG. 1. In FIG. 1, 1 is an A/D conversion unit, 2 is a speech section detection unit, 3 is a control unit, 4 is a feature parameter calculation unit, 5 is a vector quantization unit, 6 is a learning processing unit, 7 is a parameter storage unit, 8 is a transition probability estimation unit, 9 is a recognition processing unit, and 10 is a recognition result output unit.

[0009]

First, the common processing will be described. When speech is input to the A/D conversion unit 1, the speech input is band-limited and converted into digital data there. At the same time, the speech section detection unit 2 distinguishes speech sections from noise sections and detects the end of the utterance; the end of the utterance is reported to the control unit 3. When the control unit 3 receives the end-of-utterance notification, the feature parameter calculation unit 4 computes the digital Fourier transform, the LPC cepstrum, and other feature parameters for the input speech that has been A/D-converted by the A/D conversion unit 1. The vector quantization unit 5 vector-quantizes the feature parameters sent from the feature parameter calculation unit 4 and outputs the resulting code sequence.
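
The front end described here ends with vector quantization of the frame-wise feature parameters into a code sequence. A minimal sketch of that last step is given below, assuming a pre-trained codebook of feature centroids; the actual feature set (digital Fourier transform, LPC cepstrum, etc.) and the codebook training are outside the snippet, and all names are assumptions.

```python
import numpy as np

def vector_quantize(features, codebook):
    """Map each feature frame to the index of its nearest codebook centroid.

    features : (T, D) array of feature parameters, one row per analysis frame
    codebook : (K, D) array of centroids produced beforehand (e.g. by k-means)
    returns  : length-T code sequence fed to learning / recognition
    """
    # squared Euclidean distance from every frame to every centroid
    dists = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return dists.argmin(axis=1)
```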

[0010]

Next, the learning processing will be described. Before starting learning, the learning processing unit 6 takes the initial values of the output probabilities and transition probabilities of each HMM out of the parameter storage unit 7 and initializes the HMMs. Once learning starts, the learning processing unit 6 trains the HMM corresponding to the output of the vector quantization unit 5 according to instructions from the control unit 3. The control unit 3 monitors the progress of learning and terminates the learning of the HMM output probabilities and transition probabilities at the point at which it judges that learning has converged. Next, using the learning result, the path on the trellis is computed, the duration of each state is determined, the duration distribution is analyzed, and the frequencies are normalized so that their sum is 1.0. When this processing is finished, the learning processing unit 6 stores the resulting output probabilities, transition probabilities, and duration distributions in the parameter storage unit 7.
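
One way the learning-time duration distribution could be collected from the trellis paths is sketched below: the run length of each state occupation is counted over the aligned state sequences and the counts are normalized so that they sum to 1.0 per state. The availability of the alignments (e.g. Viterbi paths) and the data structures used are assumptions of the sketch.

```python
from collections import defaultdict
from itertools import groupby

def duration_distributions(state_paths):
    """Estimate per-state duration distributions from aligned state sequences.

    state_paths : iterable of state-index sequences, one per training utterance
    returns     : dict {state: {duration: normalized frequency}}
    """
    counts = defaultdict(lambda: defaultdict(float))
    for path in state_paths:
        for state, run in groupby(path):              # consecutive occupations of one state
            counts[state][len(list(run))] += 1.0
    dists = {}
    for state, hist in counts.items():
        total = sum(hist.values())
        dists[state] = {d: c / total for d, c in hist.items()}  # frequencies sum to 1.0
    return dists
```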

[0011]

The recognition processing will now be described. Before recognition starts, the transition probabilities of each HMM and the per-state duration distribution information are taken out of the parameter storage unit 7 on a command from the control unit 3 and sent to the transition probability estimation unit 8. The transition probability estimation unit 8 adjusts and estimates the transition probability for each stay time so that the duration distribution of each state computed from the per-stay-time transition probabilities coincides with the learning-time duration distribution of each state taken out of the parameter storage unit 7.
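
The adjustment performed by the transition probability estimation unit 8 can be realized in closed form: writing the self-transition probability at stay time k as a(k), the duration distribution implied by these probabilities (see the formula in the next paragraph) matches a target distribution exactly when each a(k) is the ratio of consecutive tail sums of that distribution. The Python sketch below uses this observation; the closed-form rule is an inference from the stated matching condition, not a procedure spelled out in the patent.

```python
def fit_stay_time_self_transitions(duration_pmf, d_min, d_max):
    """Choose per-stay-time self-transition probabilities a(k) so that the duration
    distribution they imply equals duration_pmf on [d_min, d_max].

    duration_pmf : dict {duration n: probability}, normalized over [d_min, d_max]
    returns      : dict {stay time k: a(k)} for k = 1 .. d_max + 1
    Implied distribution (cf. paragraph [0012]): P(n) = a(1)*...*a(n) * (1 - a(n+1))
    """
    # tail[n] = probability that the duration is at least n
    tail = {d_max + 1: 0.0}
    for n in range(d_max, d_min - 1, -1):
        tail[n] = tail[n + 1] + duration_pmf.get(n, 0.0)

    a = {}
    for k in range(1, d_max + 2):
        if k <= d_min:
            a[k] = 1.0                      # never leave before the minimum duration
        elif k > d_max:
            a[k] = 0.0                      # always leave after the maximum duration
        else:
            a[k] = tail[k] / tail[k - 1] if tail[k - 1] > 0.0 else 0.0
    return a
```

Note that this choice automatically reproduces the behavior described in paragraph [0007]: the self-transition probability is 1.0 up to the minimum duration, 0.0 beyond the maximum duration, and the intermediate values are fixed so that the implied distribution equals the learning-time one.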

[0012]

The calculated duration distribution used here is obtained as follows. Since the self-transition probability is the probability of remaining in the same state without leaving it, the probability of transiting to the state itself and the probability of transiting to another state sum to 1.0. Subtracting the self-transition probability from 1.0 therefore gives the probability of leaving the current state. From this, the probability of making n self-transitions and then transiting to another state is given by the expression below. That is, the frequency of duration n (the case in which a state is reached at time 0 and a transition to another state occurs at time n+1) is the product of the self-transition probabilities for stay times 1 through n multiplied by (1 − the self-transition probability at stay time n+1). In general, when a distribution is determined, the number of occurrences of each stay time is expressed as a frequency. In this case the actual number cannot be counted, but the probability of each stay time can be known, so this probability is used in place of a count when computing the frequency distribution.

  Frequency of duration n
    = (self-transition probability at stay time 1)
    × (self-transition probability at stay time 2)
    × …
    × (self-transition probability at stay time n)
    × (1 − self-transition probability at stay time n+1)

The frequencies are computed for durations n in the range from the minimum duration to the maximum duration and normalized so that their sum is 1.0. When the transition probabilities for each stay time have been calculated, the transition probability estimation unit 8 notifies the control unit 3. The control unit 3 then sends the output probabilities from the parameter storage unit 7 and the per-stay-time transition probabilities from the transition probability estimation unit 8 to the recognition processing unit 9.
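
The computation just described — the frequency of duration n as the product of the self-transition probabilities for stay times 1 through n with (1 − the self-transition probability at stay time n+1), restricted to [minimum duration, maximum duration] and renormalized — can be written down directly. The sketch below (names assumed) does so; applying it to the probabilities produced by the earlier fitting sketch should reproduce the learning-time distribution.

```python
def implied_duration_distribution(a, d_min, d_max):
    """Duration distribution implied by per-stay-time self-transition probabilities.

    a : dict {stay time k: self-transition probability}, defined for k = 1 .. d_max + 1
    Frequency of duration n = a[1] * a[2] * ... * a[n] * (1 - a[n + 1]),
    computed for n in [d_min, d_max] and normalized so that the frequencies sum to 1.0.
    """
    freq = {}
    for n in range(d_min, d_max + 1):
        survive = 1.0
        for k in range(1, n + 1):
            survive *= a[k]                    # make n self-transitions in a row
        freq[n] = survive * (1.0 - a[n + 1])   # then leave at stay time n + 1
    total = sum(freq.values())
    return {n: f / total for n, f in freq.items()}
```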

[0013]

At speech recognition time, when a code sequence is sent from the vector quantization unit 5 to the recognition processing unit 9 on a command from the control unit 3, the recognition processing unit 9 computes the likelihood of the input code sequence for each HMM and sends these likelihoods to the recognition result output unit 10 in order of magnitude. The control unit 3 outputs candidates having a likelihood equal to or greater than a fixed value to the recognition result output unit 10.
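
How the recognition processing unit 9 might score a code sequence against a single left-to-right HMM while applying the per-stay-time transition probabilities is sketched below: the dynamic-programming state is extended with the current stay time so that the self-transition probability can differ at every stay time. The left-to-right topology, the Viterbi (max) scoring, and all variable names are assumptions made for the sketch.

```python
import numpy as np

def safe_log(p):
    """Log that maps 0 to -inf without warnings."""
    return np.log(p) if p > 0.0 else -np.inf

def viterbi_duration_controlled(obs, B, a_self, d_max):
    """Viterbi log-score of a VQ code sequence for one left-to-right HMM whose
    self-transition probability depends on the stay time (duration control).

    obs    : sequence of VQ code indices, length T
    B      : (S, K) output probabilities
    a_self : (S, d_max + 2) a_self[i, k] = self-transition probability of state i at
             stay time k (k = 1 .. d_max + 1; a_self[i, d_max + 1] is 0.0)
    """
    S = B.shape[0]
    # delta[i, k]: best log score of paths ending in state i with current stay time k
    delta = np.full((S, d_max + 2), -np.inf)
    delta[0, 1] = safe_log(B[0, obs[0]])                 # start in state 0, stay time 1
    for t in range(1, len(obs)):
        new = np.full_like(delta, -np.inf)
        for i in range(S):
            for k in range(1, d_max + 2):
                if delta[i, k] == -np.inf:
                    continue
                if k <= d_max:                           # stay in state i: stay time k -> k + 1
                    cand = delta[i, k] + safe_log(a_self[i, k])
                    new[i, k + 1] = max(new[i, k + 1], cand)
                if i + 1 < S:                            # leave for state i + 1, stay time resets to 1
                    cand = delta[i, k] + safe_log(1.0 - a_self[i, k])
                    new[i + 1, 1] = max(new[i + 1, 1], cand)
        for i in range(S):                               # emit the observation at time t
            new[i, :] += safe_log(B[i, obs[t]])
        delta = new
    return delta[S - 1, :].max()                         # best score ending in the final state
```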

[0014]

With respect to learning, since the present invention uses a basic HMM without duration control, only one set of transition probabilities needs to be estimated per state. With method (1), which uses an HMM whose structure reflects the state duration distribution, the number of states grows by the number of durations considered, so the number of transition probabilities to be estimated increases in proportion. With method (2), which estimates a transition probability for every stay time of each state during learning, transition probabilities must be estimated for as many durations as are considered. As the number of transition probabilities to be estimated increases, the amount of learning data available for estimating each one decreases correspondingly, and the estimation accuracy falls accordingly. In general, the more data used to estimate a quantity, the higher the estimation accuracy, and the higher the estimation accuracy, the higher the recognition rate obtained. Likewise, the smaller the amount of computation, the faster the same computer can naturally carry it out.

[0015]

With respect to speech recognition, the amount of computation in the present invention is the same as in method (2), which estimates a transition probability for every stay time of each state during learning. Compared with method (1), which uses an HMM whose structure reflects the state duration distribution, the amount of computation is smaller because there are fewer states. Compared with method (3), which introduces into the calculation of the occurrence probability a penalty that gives a higher score when the duration is close to a desirable length, the computation is smaller by the amount that would otherwise be needed to evaluate the penalty, which this invention does not require.

[0016]

Effects of the Invention

As described above, compared with the conventional methods (1), (2), and (3), the present invention requires less computation and achieves higher parameter estimation accuracy, and can therefore perform speech recognition that is both fast and highly accurate.

Brief Description of the Drawings

FIG. 1 is a diagram illustrating an embodiment of the present invention.

Explanation of Symbols

1: A/D conversion unit, 2: speech section detection unit, 3: control unit, 4: feature parameter calculation unit, 5: vector quantization unit, 6: learning processing unit, 7: parameter storage unit, 8: transition probability estimation unit, 9: recognition processing unit, 10: recognition result output unit.

Claims (1)

[Claims]

[Claim 1] A speech recognition method using a basic hidden Markov model, comprising: a speech analysis unit that analyzes an input speech signal and computes its feature parameters; a learning processing unit that learns output probabilities and transition probabilities of a basic hidden Markov model from learning data and, based on the result, analyzes the distribution of the duration of each state; a parameter storage unit that stores the output probabilities and transition probabilities used as initial values when learning of the basic hidden Markov model starts, the output probabilities and transition probabilities output from the learning processing unit, the duration distribution of each state, and the minimum and maximum durations; a recognition processing unit that performs recognition with a basic hidden Markov model in which duration control is carried out with a transition probability that differs for every stay time, using the per-stay-time transition probabilities and the output probabilities obtained from the parameter storage unit; and a recognition result output unit that outputs a recognition result based on the occurrence probability of each candidate computed by the recognition processing unit, wherein the transition probability for each stay time is estimated so that the duration distribution of each state computed from the per-stay-time transition probabilities coincides with the learning-time duration distribution of each state taken out of the parameter storage unit.
JP5013442A 1993-01-29 1993-01-29 Voice recognizing method using basic hidden markov model Pending JPH06230795A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP5013442A JPH06230795A (en) 1993-01-29 1993-01-29 Voice recognizing method using basic hidden markov model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP5013442A JPH06230795A (en) 1993-01-29 1993-01-29 Voice recognizing method using basic hidden markov model

Publications (1)

Publication Number Publication Date
JPH06230795A true JPH06230795A (en) 1994-08-19

Family

ID=11833257

Family Applications (1)

Application Number Title Priority Date Filing Date
JP5013442A Pending JPH06230795A (en) 1993-01-29 1993-01-29 Voice recognizing method using basic hidden markov model

Country Status (1)

Country Link
JP (1) JPH06230795A (en)

Similar Documents

Publication Publication Date Title
US5787396A (en) Speech recognition method
CN107615376B (en) Voice recognition device and computer program recording medium
KR100651957B1 (en) System for using silence in speech recognition
US6801892B2 (en) Method and system for the reduction of processing time in a speech recognition system using the hidden markov model
EP1241661B1 (en) Speech recognition apparatus
EP1701337B1 (en) Method of speech recognition
JPH0372998B2 (en)
WO2018066436A1 (en) Learning device for acoustic model and computer program for same
EP1465154A2 (en) Method of speech recognition using variational inference with switching state space models
KR101014086B1 (en) Voice processing device and method, and recording medium
JPH0962291A (en) Pattern adaptive method using describing length minimum reference
JP2004109590A (en) Sound model generating method and speech recognition device
JP5532880B2 (en) Voice recognition device
JP4666129B2 (en) Speech recognition system using speech normalization analysis
JP4239479B2 (en) Speech recognition apparatus, speech recognition method, and speech recognition program
JPH06230795A (en) Voice recognizing method using basic hidden markov model
JP4048473B2 (en) Audio processing apparatus, audio processing method, program, and recording medium
JP2005156593A (en) Method for creating acoustic model, device for creating the acoustic model, program for creating acoustic model, and voice-recognition device
JP3583930B2 (en) Speech recognition apparatus and method
JP2005091504A (en) Voice recognition device
JP2976795B2 (en) Speaker adaptation method
JPH10143190A (en) Speech recognition device
EP1369847A1 (en) Speech recognition method and system
JPH10149189A (en) Word model generator for voice recognition and voice recognizing device
JP3105708B2 (en) Voice recognition device