JP2864821B2

JP2864821B2 - Voice recognition device

Info

Publication number: JP2864821B2
Application number: JP33028191A
Authority: JP
Inventors: 武志則松; 良久中藤
Original assignee: Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Holdings Corp
Priority date: 1991-12-13
Filing date: 1991-12-13
Publication date: 1999-03-08
Anticipated expiration: 2014-03-08
Also published as: JPH05165493A

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Field of Industrial Application] The present invention relates to a speech recognition device that derives a recognition result by time-normalized matching between the feature pattern of an input utterance and the feature patterns of pre-registered utterances to be recognized.

[0002]

[Prior Art] Conventional word speech recognition devices generally obtain the similarity between two speech patterns by time-normalized matching of their spectral parameter time series. Because this method ignores energy information, however, the similarity can be high even between patterns whose energy contours differ. To address this, a method has been proposed that improves the recognition rate by using a distance measure that incorporates an energy distance in addition to the spectral distance, so that differences in energy contour are reflected in the distance (see, for example, JP-B-2-41760).

[0003]

[Problems to be Solved by the Invention] In the above speech recognition device, however, using the energy distance makes it possible to constrain the range of the time-normalized matching path to some extent, but it does not guarantee that pattern matching accurately follows the changes in energy. As a result, the similarity can still be high between patterns with different energy contours, causing misrecognition.

[0004] Furthermore, when discriminating between phonetically similar words, it is important to compare accurately the steady parts of the speech with each other and the transient parts with each other. The steady and transient parts of speech can be regarded as roughly coinciding with the steady and transient parts of the energy contour. Because steady parts make up a large proportion of an utterance, conventional methods can distinguish differences in them; transient parts make up only a small proportion, so the two patterns must be compared through accurate matching. For example, for the personal names "SATO" and "KATO", the difference lies in the word-initial consonants "S" and "K", which roughly correspond to a transient part of the speech, and recognition performance cannot be expected to improve unless these are matched accurately. In conventional speech recognition devices, however, it is difficult to match steady and transient parts accurately, and differences in transient parts are not readily reflected in the final similarity, which has been a cause of misrecognition, particularly between similar words.

[0005] Moreover, for speech that has low-energy syllables at the beginning or end of a word, an error in speech-interval detection makes the energy pattern differ from that of the registered speech, causing misrecognition.

[0006] The present invention solves the above conventional problems. Its object is to provide a speech recognition device that, with simple control, greatly reduces misrecognition between words with different energy contours and between phonetically similar words, and that also recognizes accurately speech whose word beginning or ending is prone to being dropped.

[0007]

[Means for Solving the Problems] To solve the above problems, the speech recognition device of the present invention comprises: a speech analysis unit that extracts time series of spectral parameters and of normalized log-energy parameters of the speech; a smoothing unit that smooths small fluctuations in the normalized energy sequence; an energy change degree determination unit that, except at the start and end of the speech, determines an energy change degree according to the transition pattern of the energy change between adjacent frames, namely rising, falling, or no change; an initial setting unit that always sets the energy change degree at the start and end of the speech to no change; and a pattern matching unit that restricts the matching path by arbitrarily defined correspondence rules based on the energy change degrees of the input pattern and the standard pattern, and performs time-normalized matching in which the path is weighted according to the value of the energy change degree.

[0008] The speech recognition device of the present invention further comprises an initial setting unit that sets the energy change degree at the start and end of the speech as indicating no energy change.

[0009]

[Operation] With the above configuration, the present invention obtains the rough shape of the energy contour through smoothing that absorbs small fluctuations in the energy sequence, and can perform time-normalized matching that follows the state of energy change in the speech pattern. The steady parts and the transient parts of the speech can therefore be aligned with each other accurately, without being affected by small changes in energy, which provides a speech recognition device that greatly reduces misrecognition between words with different energy contours and between similar words.

[0010] In addition, by performing time-normalized matching between the two patterns on the assumption that there is no energy change at the start and end of the speech, the present invention provides a speech recognition device that, in addition to the above, accurately recognizes speech in which a small energy peak at the beginning or end of a word makes speech-interval detection error-prone.

[0011]

[Embodiment] A speech recognition device according to one embodiment of the present invention is described below with reference to the drawings. FIG. 1 is a block diagram of the speech recognition device of this embodiment. In FIG. 1, reference numeral 1 denotes a speech analysis unit that extracts feature parameters of the speech. It consists of a spectral parameter extraction unit 11, which extracts a time series of spectral feature parameters from the input speech signal at fixed time intervals, and an energy parameter extraction unit 12, which extracts the normalized log-energy value of the speech signal. Reference numeral 2 denotes a smoothing unit that smooths small changes in energy. Reference numeral 3 denotes an energy change degree determination unit, consisting of an energy difference sign calculation unit 31, which judges the state of energy change from the energy difference value, and an energy change degree calculation unit 32, which calculates the degree of energy change from the transitions of the energy difference sign values over time. Reference numeral 4 denotes an initial setting unit that sets the energy change degree at the start and end of the speech to no energy change. Reference numeral 5 denotes an input pattern memory that stores the time series of feature parameters of the input speech, and 6 denotes a standard pattern memory that stores the time series of feature parameters of the speech to be recognized. Reference numeral 7 denotes a pattern matching unit that calculates the similarity between the input pattern and a standard pattern. It consists of a correspondence unit 71, which restricts the correspondence between the input pattern and the standard pattern according to the state of energy change, and a control unit 72, which performs time-normalized matching weighted according to the energy change degree while limiting the matching range in accordance with the restrictions of the correspondence unit 71. Reference numeral 8 denotes a similarity comparison unit that selects, from the similarities between the input pattern and each standard pattern, the standard pattern with the maximum similarity.

[0012] Next, the operation of the speech recognition device of this embodiment is described in detail with reference to FIG. 1. A speech signal input through a microphone or the like is converted by the spectral parameter extraction unit 11, at fixed time intervals, into spectral parameters such as linear prediction coefficients, for example by linear predictive analysis. The speech signal is also sent to the energy parameter extraction unit 12, where a log-energy value is calculated at fixed time intervals. This log-energy value is then converted into a value normalized between the maximum and minimum energy values within the speech interval. This normalization is performed to absorb variation in energy magnitude between utterances and between words. The normalized log-energy sequence is smoothed by the smoothing unit 2 to reduce the adverse effect of small energy changes on later processing. As the smoothing, for example, a moving average over each frame and the one frame before and after it is calculated, as shown in (Equation 1).

[0013]

(Equation 1)

$$E_i = \frac{P_{i-1} + P_i + P_{i+1}}{3}$$

[0014] Here, P_i denotes the normalized log-energy value at frame i, and E_i the energy value after smoothing. This smoothing absorbs small bumps in the energy sequence and yields a rough outline of the speech energy. The same effect can be obtained with so-called median smoothing, which takes the median of the energy values over several neighboring frames. The time series of spectral parameters and energy parameters calculated by the spectral parameter extraction unit 11 and the smoothing unit 2 are stored in the input pattern memory 5.
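The normalization and smoothing steps described above can be sketched in Python as follows. This is only an illustrative sketch under stated assumptions: the function names are hypothetical, scaling to the range [0, 1] is one reading of "normalized between the maximum and minimum", and averaging over whichever neighbors exist at the first and last frames is an assumption, since the text does not say how the end frames are handled.

```python
from statistics import median


def normalize_log_energy(log_energy):
    """Scale log-energy values to [0, 1] using the min and max over the utterance."""
    lo, hi = min(log_energy), max(log_energy)
    if hi == lo:  # flat contour: avoid division by zero
        return [0.0 for _ in log_energy]
    return [(e - lo) / (hi - lo) for e in log_energy]


def moving_average(p):
    """Three-point moving average over each frame and its immediate neighbors.

    Assumption: at the first and last frames, only the frames that exist are averaged.
    """
    n = len(p)
    out = []
    for i in range(n):
        window = p[max(0, i - 1):min(n, i + 2)]
        out.append(sum(window) / len(window))
    return out


def median_smooth(p, half_width=1):
    """Median smoothing over a window of 2 * half_width + 1 frames (truncated at the ends)."""
    n = len(p)
    return [median(p[max(0, i - half_width):min(n, i + half_width + 1)])
            for i in range(n)]
```

For example, `moving_average(normalize_log_energy(frame_log_energies))` would give a smoothed contour corresponding to E_i, where `frame_log_energies` is a hypothetical list of per-frame log-energy values.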

[0015] Using the time series of energy values calculated by the smoothing unit 2, the energy difference sign calculation unit 31 determines the energy change pattern of each frame, namely rising, falling, or no change. Here, let i (i = 1, ..., I) denote an arbitrary frame of the input speech pattern, and let E_i be the smoothed energy value of frame i.

[0016] First, the difference ΔE_i between the energy values of frame i and the preceding frame (i-1) is obtained by the following equation.

[0017]

(Equation 2)

$$\Delta E_i = E_i - E_{i-1}$$

[0018] The sign of the difference ΔE_i obtained by (Equation 2), that is, + (plus) or - (minus), is determined; the value 1 is assigned to + and the value -1 to -. If the absolute value of the difference is smaller than an arbitrarily chosen constant a, that is, if the following condition is satisfied, the energy change is regarded as small and the value 0 is assigned.

[0019]

(Equation 3)

$$|\Delta E_i| < a$$
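The sign assignment described above (difference between adjacent frames, mapped to 1, -1, or 0 against the threshold a) can be sketched as follows. The function name is hypothetical, and assigning 0 to the first frame, which has no preceding frame, is an assumption made only for this illustration.

```python
def energy_difference_signs(e, a):
    """Map smoothed energies to a sign pattern: +1 (rise), -1 (fall), 0 (small change).

    e: smoothed energy value per frame.
    a: threshold; a difference with absolute value below a counts as no change.
    Assumption: the first frame has no predecessor and is given 0.
    """
    signs = [0]
    for prev, cur in zip(e, e[1:]):
        delta = cur - prev  # difference between a frame and the preceding frame
        if abs(delta) < a:
            signs.append(0)
        elif delta > 0:
            signs.append(1)
        else:
            signs.append(-1)
    return signs
```

The threshold a is described only as an arbitrarily chosen constant, so any value used with this sketch, such as 0.05 on a [0, 1] scale, is a tuning choice and not one given in the text.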

[0020] Let F_i denote the energy difference pattern (-1, 0, or 1) obtained in this way for an arbitrary frame i. Using the time series of F_i, the energy change degree calculation unit 32 calculates the energy change degree S_i of each frame by the following equation.

[0021]

(Equation 4)

[0022] This value is defined as a quantity expressing the tendency of the energy change between an arbitrary frame i and the one frame before and after it. However, the energy change degrees at the start and end of the speech, that is, the values of S_1 and S_I, are forcibly set to 0 by the initial setting unit 4. The time series of energy change degrees S_i calculated in this way is stored in the input pattern memory 5.
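The body of (Equation 4) is not reproduced in this text, so the following sketch relies on a loudly labeled assumption: the energy change degree of a frame is taken to be the sum of that frame's sign value and the next frame's sign value. That reading spans the frame and its two neighbors and yields values from -2 to 2, which is consistent with the values 1 and 2 mentioned later in the description, but the actual formula in the patent may differ. The forced zeros at both ends follow the description of the initial setting unit 4. The function name is hypothetical.

```python
def energy_change_degree(signs):
    """Compute a per-frame energy change degree from the sign pattern.

    ASSUMPTION (not confirmed by the source text): S[i] = F[i] + F[i + 1],
    i.e. the change into frame i plus the change out of frame i.
    The first and last frames are forced to 0, as done by the initial setting unit.
    """
    n = len(signs)
    degree = [0] * n  # endpoints stay 0 ("no change")
    for i in range(1, n - 1):
        degree[i] = signs[i] + signs[i + 1]
    return degree
```

Under this assumption a frame in the middle of a steady rise gets 2, a frame at the top of a peak gets 0, and a frame in a steady fall gets -2.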

[0023] The time series of spectral parameters and energy change degrees S for each standard pattern to be recognized are assumed to have been obtained beforehand in the same manner and stored in the standard pattern memory 6.

[0024] Next, the operation of the pattern matching unit 7 is described. Let j denote an arbitrary frame of the standard pattern being processed, and S_j the energy change degree at frame j. Pattern matching between the input pattern and the standard pattern is performed by time-normalized matching based on dynamic programming (referred to as DP matching) under a slope constraint, for example as given by the recurrence formula of (Equation 5).

[0025]

(Equation 5)

[0026] Here, d_ij is the vector distance between the spectral feature patterns of frame i of the input pattern and frame j of the standard pattern, and g_ij is the corresponding cumulative distance. Furthermore, to make differences in the transient parts of the speech more distinct, the following recurrence formula is introduced, in which the slope constraint of (Equation 5) is weighted according to the magnitude of the energy change degree at each frame (see FIG. 2).

[0027]

(Equation 6)

[0028] (Equation 6) adds to (Equation 5) a weight of S_i when the matching path advances in the input pattern direction, S_j when it advances in the standard pattern direction, and S_i + S_j when it advances diagonally, so that frames with larger energy changes receive larger weights according to the energy change degrees of the input and standard patterns. As a result, differences between the transient parts of the speech are strongly reflected in the cumulative distance. To avoid weighting portions where the energy changes in the negative direction, the value 0 is substituted for S in (Equation 6) whenever the energy change degree S is negative.
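Because the bodies of (Equation 5) and (Equation 6) are not reproduced in this text, the weighting idea can only be illustrated with a stand-in recurrence. The sketch below assumes a basic symmetric DP with step weights 1 (input direction), 1 (standard direction), and 2 (diagonal), and adds the energy change degrees to those weights; both the base recurrence and the way the weights are combined are assumptions, not the patent's formula. Clamping negative degrees to 0 follows the description. The function name and argument layout are hypothetical.

```python
import math


def weighted_dp_distance(d, s_in, s_ref):
    """Cumulative distance of an energy-weighted DP match (illustrative stand-in).

    d:     local distance matrix, d[i][j] for input frame i and standard frame j.
    s_in:  energy change degree per input frame.
    s_ref: energy change degree per standard-pattern frame.
    ASSUMPTION: base step weights 1 / 1 / 2 with the degrees added on top.
    """
    n_in, n_ref = len(d), len(d[0])
    wi = [max(0, s) for s in s_in]    # negative degrees carry no weight
    wj = [max(0, s) for s in s_ref]
    g = [[math.inf] * n_ref for _ in range(n_in)]
    g[0][0] = (2 + wi[0] + wj[0]) * d[0][0]
    for i in range(n_in):
        for j in range(n_ref):
            if i == 0 and j == 0:
                continue
            best = math.inf
            if i > 0:            # advance along the input pattern
                best = min(best, g[i - 1][j] + (1 + wi[i]) * d[i][j])
            if j > 0:            # advance along the standard pattern
                best = min(best, g[i][j - 1] + (1 + wj[j]) * d[i][j])
            if i > 0 and j > 0:  # advance diagonally
                best = min(best, g[i - 1][j - 1] + (2 + wi[i] + wj[j]) * d[i][j])
            g[i][j] = best
    return g[n_in - 1][n_ref - 1]
```

With this stand-in, a spectral mismatch at a frame with a large positive energy change degree costs more than the same mismatch at a steady frame, which is the effect the paragraph above attributes to the weighting.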

[0029] Suppose now that the matching calculation is being performed at frame i of the input pattern and frame j of the standard pattern. The control unit 72 first decides, for the value of the energy change degree S_i of frame i of the input pattern, whether to calculate the recurrence formula of (Equation 6), in accordance with the restrictions of the correspondence unit 71, which defines correspondence restrictions such as those shown in (Equation 7).

[0030]

(Equation 7)

[0031] For example, when S_i is 2, the restriction of (Equation 7) means that the matching calculation (Equation 6) is executed if S_j is 1 or 2 and is not executed otherwise. In this way, the pattern matching calculation is performed for each standard pattern under the restrictions of the correspondence unit 71, and the similarity is calculated. FIG. 3 shows an example of the result of a matching calculation by the pattern matching unit 7 between two patterns. The hatched areas in FIG. 3 indicate the range in which the matching calculation can be executed. In the areas enclosed by solid lines, the path calculation can proceed to the end point and a cumulative distance can be obtained; in the other areas, the path calculation is cut off partway. In FIG. 3, path (1) is the path obtained when peaks A1 and A2 of the input pattern energy are treated as a single peak and matched to peak B1 of the standard pattern energy, and A2 and B2 are matched in the same way; the path breaks off partway because no peak of the input pattern remains to be matched to peak B3. Path (2) is the path obtained when A1 is matched to B1, A2 to B2, and A3 to B3. Paths (3) and (4) arise when peak B1 of the standard pattern, whose energy change degree is small, is matched to the start point of the input pattern. This results from the initial setting unit 4 setting the energy change degree at the start point to 0, and gives matching paths that allow for a dropped word beginning, added noise, and the like. The same effect is obtained at the end point. In this way, stable time-normalized matching of the transient parts with each other and the steady parts with each other is performed while possible paths are taken into account.

[0032] In addition, when the pattern matching unit 7 performs pattern matching between two patterns whose energy contours clearly differ, the matching path cannot reach the end point, and the standard pattern can be excluded from the recognition candidates at that stage.
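The correspondence restriction and the resulting early exclusion can be illustrated with a small sketch. The actual table of (Equation 7) is not reproduced in this text, so the rule used here, that two frames may be matched only when their energy change degrees differ by at most 1, is a hypothetical stand-in; it happens to agree with the single example given in the description (S_i = 2 pairs with S_j = 1 or 2) but is otherwise an assumption. Function names are also hypothetical.

```python
def allowed(s_i, s_j, tolerance=1):
    """Hypothetical correspondence rule: degrees may differ by at most `tolerance`."""
    return abs(s_i - s_j) <= tolerance


def end_reachable(s_in, s_ref):
    """Return True if some matching path reaches the end point under the rule.

    A cell (i, j) is usable only if allowed(s_in[i], s_ref[j]) holds and it can be
    entered from the left, from below, or diagonally from a reachable cell.
    """
    n_in, n_ref = len(s_in), len(s_ref)
    reach = [[False] * n_ref for _ in range(n_in)]
    for i in range(n_in):
        for j in range(n_ref):
            if not allowed(s_in[i], s_ref[j]):
                continue  # calculation is skipped for this cell
            if i == 0 and j == 0:
                reach[i][j] = True
            else:
                reach[i][j] = bool(
                    (i > 0 and reach[i - 1][j])
                    or (j > 0 and reach[i][j - 1])
                    or (i > 0 and j > 0 and reach[i - 1][j - 1])
                )
    return reach[n_in - 1][n_ref - 1]
```

A standard pattern for which `end_reachable` returns False would be dropped from the candidates without computing a cumulative distance, which mirrors the exclusion described above.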

[0033] Finally, the similarity comparison unit 8 outputs as the recognition candidate the standard pattern that gives the maximum similarity among the similarities obtained for the standard patterns.
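The final selection step can be sketched as follows, assuming that similarity is represented by a cumulative distance (smaller distance meaning higher similarity) and that standard patterns whose matching path never reached the end point carry an infinite distance. Both conventions, and the function name, are assumptions made for illustration.

```python
import math


def pick_best(distances):
    """Return the name of the standard pattern with the smallest finite distance.

    distances: dict mapping a standard-pattern name to its cumulative distance,
               with math.inf for patterns whose path did not reach the end point.
    Returns None when every pattern was excluded.
    """
    candidates = {name: dist for name, dist in distances.items()
                  if math.isfinite(dist)}
    if not candidates:
        return None
    return min(candidates, key=candidates.get)
```

For instance, with hypothetical distances of 3.2 for "SATO", 5.0 for "KATO", and an unreachable third pattern, the sketch would return "SATO".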

[0034] As described above, according to this embodiment, the smoothing unit 2 performs smoothing to remove small fluctuations in energy; the energy change degree determination unit 3 expresses the degree of energy change numerically from the tendency of the energy change between the frames before and after each frame; and the control unit 72 performs time-normalized matching in accordance with the correspondence restrictions that the correspondence unit 71 predefines in terms of these values, and with a slope constraint weighted according to the energy change degree. As a result, optimal matching of the transient parts with each other and the steady parts with each other is achieved without being affected by small energy fluctuations, and misrecognition between words with different energy contours and between phonetically similar words is greatly reduced.

[0035] Furthermore, because the initial setting unit 4 sets the energy change degree at the start and end of the speech to 0, giving the matching freedom at both ends, the matching path calculation allows for dropped portions even when part of the speech is prone to being dropped, and recognition accuracy is improved.

[0036]

[Effects of the Invention] As described above, the present invention provides: a smoothing unit that smooths the energy contour; an energy change degree determination unit that expresses numerically the degree of energy change, such as rising or falling, from the tendency of the energy change before and after each frame of the speech pattern; a pattern matching unit that limits the matching range by restrictions defined in advance so that portions with similar change tendencies are matched to each other according to the degree of energy change, and that performs time-normalized matching using a recurrence formula weighted according to the energy change degree; and an initial setting unit that sets the energy change degree on the assumption that there is no energy change at the start and end of the speech. This enables accurate matching of the transient parts with each other and the steady parts with each other without being affected by small energy changes, and greatly improves recognition performance. In addition, weighting according to the degree of energy change emphasizes differences in the transient parts of the speech, which reduces misrecognition particularly between phonetically similar words. Furthermore, because the energy change at the start and end is initialized to no change, giving the correspondence some freedom, the matching calculation allows for a dropped word beginning or ending even when speech prone to interval detection errors is input, and a high-performance speech recognition device can be provided.

[0037] Also, because an initial setting unit is provided that sets the energy change degree on the assumption that there is no energy change at the start and end of the speech, the matching calculation allows for a dropped word beginning or ending even when speech prone to interval detection errors is input, and a high-performance speech recognition device can be provided.

[Brief description of the drawings]

FIG. 1 is a block diagram of a speech recognition device according to an embodiment of the present invention.

FIG. 2 is an explanatory diagram showing the slope constraint of the time-normalized matching of this embodiment.

FIG. 3 is an explanatory diagram showing an example of a pattern matching result of this embodiment.

[Explanation of symbols]

1 Speech analysis unit
2 Smoothing unit
3 Energy change degree determination unit
4 Initial setting unit
5 Input pattern memory
6 Standard pattern memory
7 Pattern matching unit
8 Similarity comparison unit
11 Spectral parameter extraction unit
12 Energy parameter extraction unit
31 Energy difference sign calculation unit
32 Energy change degree calculation unit
71 Correspondence unit
72 Control unit

Continuation of front page

(56) References
JP-A-1-283599 (JP, A)
JP-A-58-211199 (JP, A)
JP-A-59-97200 (JP, A)
JP-A-60-125899 (JP, A)
JP-A-58-139200 (JP, A)
JP-A-4-295895 (JP, A)
JP-B-61-52478 (JP, B2)
JP-B-2-23876 (JP, B2)

(58) Fields searched (Int. Cl.⁶, DB name)
G10L 3/00 533
G10L 3/00 531
JICST file (JOIS)

Claims

(57) [Claims]

1. A speech recognition device comprising: a speech analysis unit that extracts, from a speech signal, time series of a spectrum characterizing the speech and of normalized log-energy values; a smoothing unit that smooths the energy value sequence obtained from the speech analysis unit; an energy change degree determination unit that determines an energy change degree for each frame, at points other than the start and end of the speech, using the energy difference between adjacent frames in the output sequence of the smoothing unit; an initial setting unit that always sets the energy change degree at the start and end of the speech to no change; and a pattern matching unit that limits the calculation range of matching, in accordance with predetermined restrictions on correspondence with a standard pattern and according to the energy change degree of the input speech pattern set by the energy change degree determination unit and the initial setting unit, and that performs time-normalized matching between the two patterns by a recurrence formula in which the matching path is weighted according to the values of the energy change degree of the standard pattern and the input pattern.