JP2001249681A

JP2001249681A - Device and method for adapting model, recording medium, and pattern recognition device

Info

Publication number: JP2001249681A
Application number: JP2000276856A
Authority: JP
Inventors: Kouchiyo Nakatsuka; 洪長中塚
Original assignee: Sony Corp
Current assignee: Sony Corp
Priority date: 1999-12-28
Filing date: 2000-09-12
Publication date: 2001-09-14

Abstract

PROBLEM TO BE SOLVED: To improve recognition performance. SOLUTION: A silent acoustic model correction unit 7 adapts a silent acoustic model, that is, an acoustic model representing the silent state, based on the audio data observed in the section immediately preceding the speech recognition section to be recognized and on a freshness value representing how recent that audio data is.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

BACKGROUND OF THE INVENTION 1. Field of the Invention The present invention relates to a model adaptation apparatus, a model adaptation method, a recording medium, and a pattern recognition apparatus, and more particularly to a model adaptation apparatus, a model adaptation method, a recording medium, and a pattern recognition apparatus suitable for use in, for example, speech recognition.

[0002]

2. Description of the Related Art Methods for recognizing words and the like uttered in a noisy environment have long been known. Typical examples are the PMC (Parallel Model Combination) method, the SS/NSS (Spectral Subtraction/Nonlinear Spectral Subtraction) method, and the SFE (Stochastic Feature Extraction) method.

[0003] The PMC method incorporates environmental noise information directly into the acoustic models, so its recognition performance is good, but its computational cost is high. That is, because the PMC method requires heavy computation, the apparatus becomes larger and processing takes longer. In the SS/NSS method, environmental noise is removed at the stage where the feature quantity of the audio data is extracted, so the computational cost is lower than that of the PMC method, and the method is therefore widely used at present. In the SFE method, as in the SS/NSS method, environmental noise is removed at the stage where the feature quantity of the noisy speech signal is extracted, but the feature quantity that is extracted is represented as a probability distribution. The SFE method thus extracts the speech feature as a distribution in the feature space, and in this respect differs from the SS/NSS and PMC methods, which extract the speech feature as a point in the feature space.

[0004] In any of the methods described above, once the speech feature quantity has been extracted, it is determined which of the acoustic models, each corresponding to one of a plurality of pre-registered words or the like, the feature quantity matches best, and the word corresponding to the best-matching acoustic model is output as the recognition result.

[0005] The details of the SFE method are described in, for example, Japanese Patent Application Laid-Open No. H11-133992 (Japanese Patent Application No. H9-300979), previously filed by the present applicant. Performance comparisons of the PMC, SS/NSS, and SFE methods are described in detail in, for example, "H. Pao, H. Honda, K. Minamino, M. Omote, H. Ogawa and N. Iwahashi, Stochastic Feature Extraction for Improving Noise Robustness in Speech Recognition, Proceedings of the 8th Sony Research Forum, SRF98-234, pp. 9-14, October 1998", "N. Iwahashi, H. Pao, H. Honda, K. Minamino and M. Omote, Stochastic Features for Noise Robust in Speech Recognition, ICASSP'98 Proceedings, pp. 633-636, May 1998", "N. Iwahashi, H. Pao (presented), H. Honda, K. Minamino and M. Omote, Noise Robust Speech Recognition Using Stochastic Representation of Features, ASJ'98-Spring Proceedings, pp. 91-92, March 1998", and "N. Iwahashi, H. Pao, H. Honda, K. Minamino and M. Omote, Stochastic Representation of Feature for Noise Robust Speech Recognition, Technical Report of IEICE, pp. 19-24, SP97-97 (1998-01)".

[0006]

[Problems to be Solved by the Invention] In the SFE method and similar methods described above, however, environmental noise is not directly reflected at the speech recognition stage; that is, environmental noise information is not directly incorporated into the silent acoustic model. Consequently, when a silent section exists within the section subject to speech recognition, recognition performance deteriorates because of that silent section.

[0007] Specifically, because environmental noise information is not directly incorporated into the silent acoustic model, recognition performance deteriorates as the time from the start of speech recognition to the start of the utterance becomes longer.

[0008] The present invention has been made in view of this situation. By updating (correcting) the silent acoustic model using environmental noise information, it suppresses the deterioration in recognition performance that accompanies a longer interval between the time speech recognition is started and the time the utterance begins.

[0009]

[Means for Solving the Problems] The model adaptation apparatus of the present invention is characterized by comprising model adaptation means for adapting a predetermined model used for pattern recognition, based on data extracted in a predetermined section and on a freshness representing how recent the extracted data is.

[0010] The model adaptation method of the present invention is characterized by comprising a model adaptation step of adapting a predetermined model based on data extracted in a predetermined section and on a freshness representing how recent the extracted data is.

[0011] The recording medium of the present invention is characterized in that a program is recorded thereon, the program comprising a model adaptation step of adapting a predetermined model based on data extracted in a predetermined section and on a freshness representing how recent the extracted data is.

[0012] The pattern recognition apparatus of the present invention is characterized by comprising model adaptation means for adapting a predetermined model based on data extracted in a predetermined section and on a freshness representing how recent the extracted data is.

[0013] In the model adaptation apparatus, the model adaptation method, the recording medium, and the pattern recognition apparatus of the present invention, a predetermined model is adapted based on data extracted in a predetermined section and on a freshness representing how recent the extracted data is.

[0014]

[Embodiments of the Invention] FIG. 1 shows a configuration example of an embodiment of a speech recognition apparatus to which the present invention is applied. In this speech recognition apparatus, a microphone 1 picks up the uttered speech to be recognized together with environmental noise and outputs it to a framing unit 2. The framing unit 2 extracts the audio data input from the microphone 1 at predetermined time intervals (for example, 10 ms) and outputs the extracted data as one frame of data. The frame-by-frame audio data output by the framing unit 2 is supplied to a noise observation section extraction unit 3 and a feature extraction unit 5 as an observation vector a whose components are the time-series audio samples constituting the frame.
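The framing step can be illustrated with a minimal Python sketch. The sample rate, the use of non-overlapping frames, and the name `frame_signal` are assumptions made for illustration; the text specifies only the 10 ms interval, and only as an example.

```python
def frame_signal(samples, sample_rate_hz, frame_ms=10):
    """Split a sample sequence into consecutive, non-overlapping frames.

    Each returned frame plays the role of one observation vector a(t).
    """
    frame_len = sample_rate_hz * frame_ms // 1000  # samples per frame
    if frame_len <= 0:
        raise ValueError("frame length must be at least one sample")
    n_frames = len(samples) // frame_len  # a trailing partial frame is dropped
    return [list(samples[t * frame_len:(t + 1) * frame_len])
            for t in range(n_frames)]


# 35 samples at an assumed 1 kHz rate give three complete 10-sample frames.
frames = frame_signal(list(range(35)), sample_rate_hz=1000, frame_ms=10)
```

Dropping the trailing partial frame keeps every observation vector the same length, which the later per-component statistics rely on.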

[0015] Hereinafter, the observation vector that is the audio data of the t-th frame is denoted a(t) where appropriate.

[0016] The noise observation section extraction unit 3 buffers the frame-by-frame audio data input from the framing unit 2 for a predetermined time (2M frames or more). As shown in FIG. 2, it takes the period from timing t_a, which is M frames before the timing t_b at which the speech switch 4 is turned on, up to timing t_b as a noise observation section Tn, extracts the 2M frames of observation vectors a for that noise observation section Tn, and outputs them to the feature extraction unit 5 and the silent acoustic model correction unit 7. In this embodiment, the noise observation section is divided into two parts: a noise observation section Tm used for extracting the feature distribution described later, and a noise observation section Tn used for adapting the acoustic model. Both Tm and Tn are M frames long here, although their frame counts need not be the same.
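As a rough illustration of the buffering in the noise observation section extraction unit 3, the following sketch keeps the most recent 2M frames and, when the speech switch is turned on, splits them into the earlier half (Tm) and the later half (Tn). The class name `NoiseSectionBuffer` and its methods are hypothetical, and equal lengths for Tm and Tn are assumed, as in this embodiment.

```python
from collections import deque


class NoiseSectionBuffer:
    """Ring buffer holding the latest 2M frames (hypothetical helper)."""

    def __init__(self, m_frames):
        self.m = m_frames
        self.buf = deque(maxlen=2 * m_frames)  # older frames fall off the left

    def push(self, frame):
        self.buf.append(frame)

    def on_switch_on(self):
        """Return (Tm frames, Tn frames) at the switch-on timing t_b."""
        if len(self.buf) < 2 * self.m:
            raise ValueError("fewer than 2M frames buffered")
        frames = list(self.buf)
        return frames[:self.m], frames[self.m:]


buf = NoiseSectionBuffer(m_frames=3)
for t in range(10):          # frames 0..9 arrive before the switch is pressed
    buf.push([t])
tm, tn = buf.on_switch_on()  # Tm: frames 4..6, Tn: frames 7..9
```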

[0017] The speech switch 4 is turned on by the user when the user starts speaking and is turned off when the user finishes speaking. The audio data before the timing t_b at which the speech switch 4 is turned on (the noise observation section Tn) therefore contains no uttered speech, only environmental noise. The period from the timing t_b at which the speech switch 4 is turned on to the timing t_d at which it is turned off is the speech recognition section, and the audio data in that section is the target of speech recognition.

[0018] Based on the audio data of the noise observation section Tm, the first half of the noise observation sections Tm and Tn input from the noise observation section extraction unit 3, in which only environmental noise is present, the feature extraction unit 5 removes the environmental noise component from the observation vectors a of the speech recognition section from timing t_b onward, input from the framing unit 2, and extracts their feature quantity. That is, the feature extraction unit 5, for example, Fourier-transforms the audio data as the observation vector a, obtains its power spectrum, and calculates a feature vector y whose components are the frequency components of that power spectrum. The method of calculating the power spectrum is not limited to the Fourier transform; the power spectrum can also be obtained by, for example, the so-called filter bank method.
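A minimal sketch of the power spectrum computation follows. It uses a naive discrete Fourier transform in place of an FFT for brevity, and it assumes that the D components are the bins from 0 to N/2 of an N-sample frame; neither choice is stated in the text.

```python
import cmath


def power_spectrum(frame):
    """Return |DFT|^2 for bins 0..N/2 of one frame (naive O(N^2) DFT)."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(frame[j] * cmath.exp(-2j * cmath.pi * k * j / n)
                  for j in range(n))
        spectrum.append(abs(acc) ** 2)
    return spectrum


# One period of a sampled cosine: all power falls into bin 1.
y = power_spectrum([1.0, 0.0, -1.0, 0.0])
```

For real frame lengths an FFT routine would replace the double loop; the result per bin is the same.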

[0019] Further, the feature extraction unit 5 calculates, from the feature vector y and the environmental noise of the noise observation section Tm, a parameter Z (hereinafter referred to as a feature distribution parameter) representing the distribution, in the feature vector space, that is obtained when the speech contained in the audio data as the observation vector a is mapped into the space of its feature quantity (the feature vector space), and supplies Z to the speech recognition unit 6.

[0020] FIG. 3 shows a detailed configuration example of the feature extraction unit 5 of FIG. 1. In the feature extraction unit 5, the observation vector a input from the framing unit 2 is supplied to a power spectrum analysis unit 11. The power spectrum analysis unit 11 Fourier-transforms the observation vector a by, for example, an FFT (fast Fourier transform), thereby extracting the power spectrum of the speech as a feature vector. Here, the observation vector a as one frame of audio data is assumed to be converted into a feature vector consisting of D components (a D-dimensional feature vector).

[0021] Here, the feature vector obtained from the observation vector a(t) of the t-th frame is denoted y(t). Of the feature vector y(t), the spectral component of the true speech is denoted x(t) and the spectral component of the environmental noise is denoted u(t). The spectral component x(t) of the true speech is then given by equation (1).

$$x(t) = y(t) - u(t) \qquad (1)$$

Here it is assumed that the environmental noise has irregular characteristics and that the audio data as the observation vector a(t) is the true speech component with the environmental noise added to it.

[0022] Meanwhile, the environmental noise in the noise observation section Tm, input as audio data from the noise observation section extraction unit 3, is supplied to a noise characteristic calculation unit 13 in the feature extraction unit 5. The noise characteristic calculation unit 13 obtains the characteristics of the environmental noise in the noise observation section Tm.

[0023] That is, assuming that the distribution of the power spectrum u(t) of the environmental noise in the speech recognition section is the same as that of the environmental noise in the noise observation section Tm immediately preceding the speech recognition section, and that this distribution is a normal distribution, the noise characteristic calculation unit 13 obtains the mean (mean vector) and the variance (variance matrix) of the environmental noise that define that normal distribution.

[0024] The mean vector μ' and the variance matrix Σ' can be obtained according to equation (2), where the sums run over the M frames of the noise observation section Tm.

$$\mu'(i) = \frac{1}{M}\sum_{t=1}^{M} y(t)(i), \qquad \Sigma'(i,j) = \frac{1}{M}\sum_{t=1}^{M}\bigl(y(t)(i)-\mu'(i)\bigr)\bigl(y(t)(j)-\mu'(j)\bigr) \qquad (2)$$

Here μ'(i) denotes the i-th component of the mean vector μ' (i = 1, 2, ..., D), y(t)(i) denotes the i-th component of the feature vector of the t-th frame, and Σ'(i, j) denotes the component in the i-th row and j-th column of the variance matrix Σ' (j = 1, 2, ..., D).

[0025] To reduce the amount of computation, the components of the feature vector y are assumed to be mutually uncorrelated for environmental noise. In this case, as shown in equation (3), all components of the variance matrix Σ' other than the diagonal components are 0.

$$\Sigma'(i,j) = 0, \quad i \neq j \qquad (3)$$
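The noise statistics of paragraphs [0023] to [0025] might be computed as in the sketch below. Only the diagonal of the variance matrix is kept, following the uncorrelated-component assumption; the 1/M normalization and the function name `noise_statistics` are assumptions of this sketch.

```python
def noise_statistics(noise_frames):
    """Per-component mean and diagonal variance over the M frames of Tm.

    noise_frames is a list of M feature vectors (power spectra), each of
    length D. Returns (mean, var), both lists of length D.
    """
    m = len(noise_frames)
    d = len(noise_frames[0])
    mean = [sum(f[i] for f in noise_frames) / m for i in range(d)]
    var = [sum((f[i] - mean[i]) ** 2 for f in noise_frames) / m
           for i in range(d)]
    return mean, var


# Two noise frames with D = 2 components (made-up values).
mu, sigma2 = noise_statistics([[1.0, 4.0], [3.0, 4.0]])
```

Storing only D variances instead of a D-by-D matrix is what makes the uncorrelated-component assumption cheap.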

[0026] In this way, the noise characteristic calculation unit 13 obtains the mean vector μ' and the variance matrix Σ' that define the normal distribution, as the characteristics of the environmental noise, and supplies them to the feature distribution parameter calculation unit 12.

[0027] Meanwhile, the output of the power spectrum analysis unit 11, that is, the feature vector y of the uttered speech containing environmental noise, is supplied to the feature distribution parameter calculation unit 12. Based on the feature vector y from the power spectrum analysis unit 11 and the environmental noise characteristics from the noise characteristic calculation unit 13, the feature distribution parameter calculation unit 12 calculates a feature distribution parameter representing the distribution of the power spectrum of the true speech (the distribution of its estimated value).

[0028] That is, assuming that the distribution of the power spectrum of the true speech is a normal distribution, the feature distribution parameter calculation unit 12 calculates its mean vector ξ and variance matrix Ψ as the feature distribution parameters according to equations (4) to (7).

(Equation 4)

(Equation 5)

(Equation 6)

$$P(u(t)(i)) = \frac{1}{\sqrt{2\pi\,\Sigma'(i,i)}} \exp\!\left(-\frac{\bigl(u(t)(i)-\mu'(i)\bigr)^{2}}{2\,\Sigma'(i,i)}\right) \qquad (7)$$

[0029] Here, ξ(t)(i) denotes the i-th component of the mean vector ξ(t) in the t-th frame, and E[ ] denotes the mean of the quantity inside the brackets. x(t)(i) denotes the i-th component of the power spectrum x(t) of the true speech in the t-th frame. Further, u(t)(i) denotes the i-th component of the power spectrum of the environmental noise in the t-th frame, and P(u(t)(i)) denotes the probability that this component takes the value u(t)(i). Since a normal distribution is assumed for the environmental noise, P(u(t)(i)) is expressed as shown in equation (7).

[0030] Ψ(t)(i, j) denotes the component in the i-th row and j-th column of the variance Ψ(t) in the t-th frame, and V[ ] denotes the variance of the quantity inside the brackets.

[0031] In this way, the feature distribution parameter calculation unit 12 obtains, for each frame, the mean vector ξ and the variance matrix Ψ as feature distribution parameters representing the distribution of the true speech in the feature vector space (here, the distribution obtained on the assumption that the true speech is normally distributed in the feature vector space).
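One way to picture the calculation of the feature distribution parameters for a single component is the sketch below. It assumes that the noise power u lies between 0 and the observed power y, so that the normal distribution of the noise is restricted to that interval, and it uses midpoint-rule numerical integration rather than any closed-form expression. Both points are assumptions of this sketch and are not taken from equations (4) to (6); `feature_distribution` is a hypothetical name.

```python
import math


def feature_distribution(y_i, mu_i, var_i, steps=2000):
    """Mean and variance of x = y - u for one spectral component.

    Assumes u ~ N(mu_i, var_i) restricted to the interval [0, y_i].
    Returns (xi, psi): the estimated mean and variance of the true speech.
    """
    if y_i <= 0.0 or var_i <= 0.0:
        return max(y_i, 0.0), 0.0
    h = y_i / steps
    z0 = z1 = z2 = 0.0  # unnormalized moments of u over [0, y_i]
    for n in range(steps):
        u = (n + 0.5) * h  # midpoint of the n-th sub-interval
        p = math.exp(-((u - mu_i) ** 2) / (2.0 * var_i))
        z0 += p
        z1 += u * p
        z2 += u * u * p
    if z0 == 0.0:  # density underflowed: treat the noise as a point value
        u_hat = min(max(mu_i, 0.0), y_i)
        return y_i - u_hat, 0.0
    mean_u = z1 / z0
    var_u = z2 / z0 - mean_u ** 2
    return y_i - mean_u, max(var_u, 0.0)  # Var[x] equals Var[u] since y is fixed


# Observed power 10 with noise mean 5 and noise variance 4 (made-up values).
xi, psi = feature_distribution(y_i=10.0, mu_i=5.0, var_i=4.0)
```

The normalizing constant of the normal density cancels in the ratios, so it is omitted inside the loop.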

[0032] The feature distribution parameters obtained for each frame of the speech recognition section are then output to the speech recognition unit 6. That is, if the speech recognition section consists of T frames and the feature distribution parameter obtained for each of those frames is written z(t) = {ξ(t), Ψ(t)} (t = 1, 2, ..., T), the feature distribution parameter calculation unit 12 supplies the feature distribution parameter (sequence) Z = {z(1), z(2), ..., z(T)} to the speech recognition unit 6.

[0033] Returning to FIG. 1, the speech recognition unit 6 classifies the feature distribution parameter Z input from the feature extraction unit 5 into one of a predetermined number K of acoustic models and one silent acoustic model, and outputs the classification result as the recognition result of the input speech. That is, the speech recognition unit 6 stores, for example, a discriminant function corresponding to the silent section (a function for determining whether the feature distribution parameter Z is classified into the silent acoustic model) and discriminant functions corresponding to each of the predetermined number K of words (functions for determining into which acoustic model the feature distribution parameter Z is classified), and calculates the value of the discriminant function of each acoustic model with the feature distribution parameter Z from the feature extraction unit 5 as the argument. The acoustic model (a word, or silence (noise)) whose function value (the so-called score) is largest is output as the recognition result.

[0034] FIG. 4 shows a detailed configuration example of the speech recognition unit 6 of FIG. 1. The feature distribution parameter Z input from the feature distribution parameter calculation unit 12 of the feature extraction unit 5 is supplied to discriminant function calculation units 21-1 to 21-K and a discriminant function calculation unit 21-s. The discriminant function calculation unit 21-k (k = 1, 2, ..., K) stores a discriminant function G_k(Z) for identifying the word corresponding to the k-th of the K acoustic models, and evaluates G_k(Z) with the feature distribution parameter Z from the feature extraction unit 5 as the argument. The discriminant function calculation unit 21-s stores a discriminant function G_s(Z) for identifying the silent section corresponding to the silent acoustic model, and evaluates G_s(Z) with the feature distribution parameter Z from the feature extraction unit 5 as the argument.

[0035] The speech recognition unit 6 identifies (recognizes) a word or silence as a class using, for example, the HMM (Hidden Markov Model) method.

[0036] The HMM method will be described with reference to FIG. 5. In the figure, the HMM has H states q_1 to q_H, and the only state transitions permitted are a transition to the state itself and a transition to the state immediately to its right. The initial state is the leftmost state q_1, the final state is the rightmost state q_H, and state transitions from the final state q_H are prohibited. A model with no transitions to states to the left of the current state is called a left-to-right model, and left-to-right models are generally used in speech recognition.

[0037] If the model for identifying class k of the HMM is called the k-class model, the k-class model is defined by, for example, the probability of initially being in state q_h (initial state probability) π_k(q_h), the probability of being in state q_i at a given time (frame) t and transitioning to state q_j at the next time t + 1 (transition probability) a_k(q_i, q_j), and the probability that state q_i outputs a feature vector O when a state transition occurs from q_i (output probability) b_k(q_i)(O) (h = 1, 2, ..., H).

[0038] When a feature vector sequence O_1, O_2, ... is given, the class of the model with the highest probability of observing that feature vector sequence (observation probability) is, for example, taken as the recognition result for the sequence.

[0039] Here, this observation probability is obtained by the discriminant function G_k(Z). That is, the discriminant function G_k(Z) is given by equation (8) as the probability that the feature distribution parameter (sequence) Z = {z_1, z_2, ..., z_T} is observed along the optimal state sequence (the optimal way of making state transitions) for Z.

$$G_k(Z) = \max_{q_1, q_2, \ldots, q_T} \pi_k(q_1)\, b_k'(q_1)(z_1)\, a_k(q_1, q_2)\, b_k'(q_2)(z_2) \cdots a_k(q_{T-1}, q_T)\, b_k'(q_T)(z_T) \qquad (8)$$
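The maximization over state sequences in the discriminant function is commonly carried out with the Viterbi recursion. The sketch below works in the log domain to avoid underflow and takes the output probability as a caller-supplied function; the names, the log-domain formulation, and the absence of a constraint on the last state are assumptions of this sketch.

```python
import math


def viterbi_log_score(log_init, log_trans, log_emit, n_frames):
    """Log of the best-path probability over all state sequences.

    log_init[s]     : log initial state probability of state s
    log_trans[r][s] : log transition probability from state r to state s
    log_emit(s, t)  : log output probability of frame t in state s
    """
    n_states = len(log_init)
    score = [log_init[s] + log_emit(s, 0) for s in range(n_states)]
    for t in range(1, n_frames):
        score = [
            max(score[r] + log_trans[r][s] for r in range(n_states))
            + log_emit(s, t)
            for s in range(n_states)
        ]
    return max(score)


def safe_log(p):
    """log(p) with log(0) mapped to minus infinity."""
    return math.log(p) if p > 0.0 else -math.inf


# Two-state left-to-right toy model with made-up probabilities.
init = [safe_log(p) for p in (1.0, 0.0)]
trans = [[safe_log(p) for p in row] for row in ((0.5, 0.5), (0.0, 1.0))]
emit = ((0.9, 0.2), (0.1, 0.8))  # emit[state][frame]
g = viterbi_log_score(init, trans, lambda s, t: safe_log(emit[s][t]), n_frames=2)
```

In the toy model the best path stays in the first state for frame 0 and moves to the second state for frame 1.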

[0040] Here, b_k'(q_i)(z_j) denotes the output probability when the output is the distribution represented by z_j. For the output probability b_k(s)(O_t), the probability of outputting each feature vector at a state transition, a normal distribution function is used here, on the assumption, for example, that the components in the feature vector space are uncorrelated. In this case, when the input is the distribution represented by z_t, the output probability b_k'(s)(z_t) can be obtained by equation (9), using the probability density function P_k^m(s)(x) defined by the mean vector μ_k(s) and the variance matrix Σ_k(s), and the probability density function P^f(t)(x) representing the distribution of the feature vector (here, the power spectrum) x of the t-th frame.

$$b_k'(s)(z_t) = \int P^f(t)(x)\, P_k^m(s)(x)\, dx = \prod_{i=1}^{D} P(s)(i)\bigl(\xi(t)(i), \Psi(t)(i,i)\bigr) \qquad (9)$$

The integral in equation (9) is taken over the entire D-dimensional feature vector space (here, the power spectrum space).

[0041] In equation (9), P(s)(i)(ξ(t)(i), Ψ(t)(i, i)) is given by equation (10).

$$P(s)(i)\bigl(\xi(t)(i), \Psi(t)(i,i)\bigr) = \frac{1}{\sqrt{2\pi\bigl(\Sigma_k(s)(i,i)+\Psi(t)(i,i)\bigr)}} \exp\!\left(-\frac{\bigl(\xi(t)(i)-\mu_k(s)(i)\bigr)^{2}}{2\bigl(\Sigma_k(s)(i,i)+\Psi(t)(i,i)\bigr)}\right) \qquad (10)$$

Here μ_k(s)(i) denotes the i-th component of the mean vector μ_k(s), and Σ_k(s)(i, i) denotes the component in the i-th row and i-th column of the variance matrix Σ_k(s). The output probability of the k-class model is defined by these.
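When both densities are normal with diagonal variance matrices, the integral of their product has a well-known closed form in which the two variances add per component. The sketch below evaluates the logarithm of the output probability under that assumption; the function name and the use of logarithms are illustrative choices.

```python
import math


def log_output_probability(xi, psi_diag, mu, sigma_diag):
    """Log output probability of a frame distribution in one HMM state.

    xi, psi_diag   : mean and diagonal variance of the frame's feature
                     distribution z_t
    mu, sigma_diag : mean and diagonal variance of the state's density
    """
    total = 0.0
    for x, p, m, s in zip(xi, psi_diag, mu, sigma_diag):
        v = s + p  # variances add when two Gaussians are integrated together
        total += -0.5 * math.log(2.0 * math.pi * v) - (x - m) ** 2 / (2.0 * v)
    return total


# With zero feature variance this is an ordinary Gaussian log density.
base = log_output_probability([1.0], [0.0], [1.0], [1.0])
# A nonzero feature variance broadens the density and lowers its peak.
wide = log_output_probability([1.0], [1.0], [1.0], [1.0])
```

Setting every entry of `psi_diag` to 0 reduces the function to the output probability of a conventional continuous HMM.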

[0042] As described above, the HMM is defined by the initial state probability π_k(q_h), the transition probability a_k(q_i, q_j), and the output probability b_k(q_i)(O); these are obtained in advance by calculating feature vectors from speech data for training and using those feature vectors.

[0043] When the HMM shown in FIG. 5 is used, transitions always start from the leftmost state q_1, so only the initial state probability corresponding to state q_1 is set to 1, and the initial state probabilities corresponding to all other states are set to 0. As is clear from equations (9) and (10), when Ψ(t)(i, i) is set to 0, the output probability coincides with the output probability of a continuous HMM in which the variance of the feature vector is not taken into account.

[0044] The Baum-Welch re-estimation method, for example, is known as a method for training HMMs.

[0045] Returning to FIG. 4, the discriminant function calculation unit 21-k (k = 1, 2, ..., K) stores, for the k-class model, the discriminant function G_k(Z) of equation (8) defined by the initial state probability π_k(q_h), the transition probability a_k(q_i, q_j), and the output probability b_k(q_i)(O) obtained in advance by training. It evaluates G_k(Z) with the feature distribution parameter Z from the feature extraction unit 5 as the argument and outputs the function value (the observation probability described above) G_k(Z) to a decision unit 22. The discriminant function calculation unit 21-s stores a discriminant function G_s(Z), similar to the discriminant function G_k(Z) of equation (8), defined by the initial state probability π_s(q_h), the transition probability a_s(q_i, q_j), and the output probability b_s(q_i)(O) supplied from the silent acoustic model correction unit 7. It evaluates G_s(Z) with the feature distribution parameter Z from the feature extraction unit 5 as the argument and outputs the function value (the observation probability described above) G_s(Z) to the decision unit 22.

【００４６】決定部２２では、識別関数演算部２１−１
乃至２１−ｋ、および識別関数演算部２１−ｓそれぞれ
からの関数値Ｇ_k（Ｚ）（ここでは、関数値Ｇ_s（Ｚ）を
含むものとする）に対して、例えば、次式（１１）に示
す決定規則を用いて、特徴分布パラメータＺ、すなわ
ち、入力された音声が属するクラス（音響モデル）が識
別される。In the determination unit 22, the class (acoustic model) to which the feature distribution parameter Z, that is, the input speech, belongs is identified by applying, for example, the decision rule of the following equation (11) to the function values G_k(Z) (here taken to include the function value G_s(Z)) from the discriminant function operation units 21-1 to 21-k and the discriminant function operation unit 21-s.

【数１１】ただし、Ｃ（Ｚ）は、特徴分布パラメータＺが属するク
ラスを識別する識別操作（処理）を行う関数を表す。ま
た、式（１１）の第２式の右辺におけるmaxは、それに
続く関数値Ｇ_i（Ｚ）（ただし、ここでは、ｉ＝ｓ，
１，２，・・・，Ｋ）の最大値を表す。[Equation 11] Here, C(Z) denotes a function that performs the identification operation (processing) of identifying the class to which the feature distribution parameter Z belongs. Further, max on the right side of the second expression of equation (11) denotes the maximum of the function values G_i(Z) that follow it (here, i = s, 1, 2, ..., K).
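
The decision rule described for equation (11) amounts to choosing the class whose discriminant value is largest. A minimal Python sketch of that idea follows. The function name `decide_class`, the dictionary layout, and the numeric scores are illustrative assumptions, not part of the original description.

```python
from typing import Dict, Hashable


def decide_class(scores: Dict[Hashable, float]) -> Hashable:
    """Return the class label whose discriminant value G_i(Z) is largest.

    `scores` maps each class label (the silence model "s" and the word
    models 1..K) to its already computed function value G_i(Z).
    """
    if not scores:
        raise ValueError("at least one discriminant value is required")
    # max() compares only the values, so mixed label types are fine.
    return max(scores, key=scores.get)


# Hypothetical values for the silence model "s" and K = 3 word models.
example_scores = {"s": -42.0, 1: -17.5, 2: -30.1, 3: -25.8}
recognized = decide_class(example_scores)
```

Here `recognized` is the label of the best-scoring model. If log-probabilities are used as scores, as in this hypothetical input, the rule is unchanged because the logarithm preserves ordering.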

【００４７】決定部２２は、式（１１）にしたがって、
クラスを決定すると、それを、入力された音声の認識結
果として出力する。When the determination unit 22 has determined the class according to equation (11), it outputs that class as the recognition result of the input speech.

【００４８】図１に戻り、無音音響モデル補正部７は、
ノイズ観測区間抽出部３から入力されるノイズ観測区間
ＴｍとＴｎのうちの後半のノイズ観測区間Ｔｎにおける
音声データとしての環境ノイズに基づいて、音声認識部
６に記憶されている無音音響モデルに対応する識別関数
Ｇ_s（Ｚ）を生成し、この識別関数Ｇ_s（Ｚ）によって、
音声認識部６に記憶されている無音音響モデルの適応を
行う。Returning to FIG. 1, the silent acoustic model correction unit 7 generates the discriminant function G_s(Z) corresponding to the silent acoustic model stored in the speech recognition unit 6, based on the environmental noise, as speech data, in the noise observation section Tn, which is the latter of the noise observation sections Tm and Tn input from the noise observation section extraction unit 3. Using this discriminant function G_s(Z), it adapts the silent acoustic model stored in the speech recognition unit 6.

【００４９】具体的には、無音音響モデル補正部７で
は、ノイズ観測区間抽出部３から入力されるノイズ観測
区間Ｔｎの音声データ（環境ノイズ）のＭ個のフレーム
の各フレームについて、特徴ベクトルｙが観測され、さ
らに、特徴抽出部５における場合と同様にして、次式で
示す特徴分布の系列が生成される。More specifically, in the silent acoustic model correction unit 7, a feature vector y is observed for each of the M frames of the speech data (environmental noise) in the noise observation section Tn input from the noise observation section extraction unit 3, and a sequence of feature distributions represented by the following equation is generated in the same manner as in the feature extraction unit 5.

【数１２】なお、特徴分布｛Ｆ_i（ｙ），ｉ＝１，２，・・・，
Ｍ｝は、確率密度関数(Probabilistic Density Functio
n)であり、以下、無音特徴分布PDFとも記述する。ま
た、無音特徴分布Ｆ_i（ｙ）におけるサフィックスｉ
は、ノイズ観測区間Ｔｎの先頭フレームからのフレーム
数を表す。(Equation 12) Note that the feature distributions {F_i(y), i = 1, 2, ..., M} are probability density functions (PDFs) and are hereinafter also referred to as the silent feature distribution PDF. The suffix i of the silent feature distribution F_i(y) denotes the number of frames counted from the first frame of the noise observation section Tn.

【００５０】次に、無音特徴分布PDFを、次式（１３）
に従い、無音音響モデルに対応する確率分布Ｆ_s（ｙ）
に写像する。Next, the silent feature distribution PDF is mapped to the probability distribution F_s(y) corresponding to the silent acoustic model according to the following equation (13).

【数１３】ただし、Ｖは無音特徴分布PDF｛Ｆ_i（ｙ），ｉ＝１，
２，・・・，Ｍ｝を無音音響モデルＦ_s（ｙ）に写像す
る補正関数（写像関数）である。(Equation 13) Here, V is a correction function (mapping function) that maps the silent feature distribution PDF {F_i(y), i = 1, 2, ..., M} to the silent acoustic model F_s(y).

【００５１】この写像は、無音特徴分布PDFの記述によ
って様々な方法が考えられるが、例えば、次式を採用す
ることができる。Various methods can be used for this mapping depending on the description of the silence feature distribution PDF. For example, the following equation can be adopted.

【数１４】ただし、β_i（Ｆ₁（ｙ），Ｆ₂（ｙ），・・・，Ｆ
_M（ｙ），Ｍ）は、ノイズ観測区間Ｔｎの第１フレーム
から得られる無音特徴分布に対する重み関数であり、以
下、β_iと記述する。なお、重み関数β_iは、次式（１
６）の条件を満足するものである。[Equation 14] Here, β_i(F₁(y), F₂(y), ..., F_M(y), M) is a weight function for the silent feature distribution obtained from the first frame of the noise observation section Tn, and is hereinafter written as β_i. The weight function β_i satisfies the condition of the following equation (16).

【数１５】 (Equation 15)

【００５２】ここで、無音音響モデルの確率分布Ｆ
_s（ｙ）が正規分布であると仮定し、また、各フレーム
の特徴ベクトルを構成するコンポーネントが無相関であ
ると仮定すれば、無音特徴分布PDF｛Ｆ_i（ｙ），ｉ＝
１，２，・・・，Ｍ｝の共分散行列Σ_iは対角線行列と
なる。ただし、この仮定の前提条件として、無音音響モ
デルの共分散行列も対角線行列であることが必要であ
る。Here, if the probability distribution F_s(y) of the silent acoustic model is assumed to be a normal distribution, and the components constituting the feature vector of each frame are assumed to be uncorrelated, the covariance matrix Σ_i of the silent feature distribution PDF {F_i(y), i = 1, 2, ..., M} is a diagonal matrix. As a precondition for this assumption, however, the covariance matrix of the silent acoustic model must also be a diagonal matrix.

【００５３】ノイズ観測区間Ｔｎにおける各フレームの
特徴ベクトルｙを構成するコンポーネントが無相関であ
れば、無音特徴分布PDF｛Ｆ_i（ｙ），ｉ＝１，２，・・
・，Ｍ｝は、各コンポーネントに対応する平均と分散を
持つ正規分布Ｇ（Ｅ_i，Σ_i）となる。但し、Ｅ_iはＦ
_i（ｙ）の平均値（期待値）であり、Σ_iはＦ_i（ｙ）の
共分散行列である。即ち、ノイズ観測区間Ｔｎの各フレ
ームから得られる無音特徴分布の平均をμ_i、分散をσ_i
²と表すことにすれば、無音特徴分布の確率密度関数
は、正規分布Ｇ（μ_i，σ_i ²）（ｉ＝１，２，・・・，
Ｍ）で表すことができる。If the components constituting the feature vector y of each frame in the noise observation section Tn are uncorrelated, the silent feature distribution PDF {F_i(y), i = 1, 2, ..., M} is a normal distribution G(E_i, Σ_i) having a mean and a variance corresponding to each component. Here, E_i is the mean (expected value) of F_i(y), and Σ_i is the covariance matrix of F_i(y). That is, if the mean of the silent feature distribution obtained from each frame of the noise observation section Tn is written as μ_i and its variance as σ_i², the probability density function of the silent feature distribution can be represented by the normal distribution G(μ_i, σ_i²) (i = 1, 2, ..., M).

【００５４】以上の仮定により、各フレームに対応する
平均μ_i、および分散σ_i ²を用い、以下に示す様々な方
法によって、無音音響モデルＦ_s（Ｘ）を近似する正規
分布Ｇ（μ_s，σ_s ²）（上述したＧ_s（Ｚ）に相当する）
を演算することができる。Under the above assumptions, a normal distribution G(μ_s, σ_s²) that approximates the silent acoustic model F_s(X) (and corresponds to G_s(Z) described above) can be computed from the mean μ_i and the variance σ_i² of each frame by the various methods described below.

【００５５】無音音響モデルの正規分布Ｇ（μ_s，
σ_s ²）を演算する第１の方法は、無音特徴分布｛Ｇ（μ
_i，σ_i ²），ｉ＝１，２，・・・，Ｍ｝を用い、式（１
７）に示すように、全てのμ_iの平均を無音音響モデル
の平均値μ_sとするとともに、式（１８）に示すよう
に、全てのσ_i ²の平均を無音音響モデルの分散σ_i ²とす
る方法である。The first method of computing the normal distribution G(μ_s, σ_s²) of the silent acoustic model uses the silent feature distributions {G(μ_i, σ_i²), i = 1, 2, ..., M}: as shown in equation (17), the average of all μ_i is taken as the mean μ_s of the silent acoustic model, and, as shown in equation (18), the average of all σ_i² is taken as the variance σ_s² of the silent acoustic model.

【数１６】ここで、ａおよびｂは、シミュレーションにより最適な
値が決定される係数である。(Equation 16) Here, a and b are coefficients for which an optimal value is determined by simulation.
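
The first method can be sketched for a single feature component as follows. Equations (17) and (18) are not reproduced in this text, so the sketch assumes that the coefficient a multiplies the averaged mean and the coefficient b multiplies the averaged variance. The function name `adapt_by_averaging` is hypothetical, and the default values are the ones reported for this method in the experiment described later (a = 1.0, b = 0.1).

```python
from statistics import fmean
from typing import Sequence, Tuple


def adapt_by_averaging(means: Sequence[float],
                       variances: Sequence[float],
                       a: float = 1.0,
                       b: float = 0.1) -> Tuple[float, float]:
    """Sketch of the first method for one feature component.

    means[i], variances[i] describe the silent feature distribution
    G(mu_i, sigma_i^2) of frame i. Assumption: a scales the averaged
    mean and b scales the averaged variance.
    """
    if len(means) != len(variances) or not means:
        raise ValueError("means and variances must be non-empty and equally long")
    mu_s = a * fmean(means)        # average of all mu_i
    var_s = b * fmean(variances)   # average of all sigma_i^2
    return mu_s, var_s
```

With uncorrelated components (diagonal covariance, as assumed in the description), the same function would simply be applied to each component of the feature vector in turn.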

【００５６】無音音響モデルの正規分布Ｇ（μ_s，
σ_s ²）を演算する第２の方法は、無音特徴分布｛Ｇ（μ
_i，σ_i ²），ｉ＝１，２，・・・，Ｍ｝の期待値μ_iだけ
を用い、次式（１９），（２０）に従って、無音音響モ
デルの平均値μ_sと、分散σ_i ²を演算する方法である。The second method of computing the normal distribution G(μ_s, σ_s²) of the silent acoustic model uses only the expected values μ_i of the silent feature distributions {G(μ_i, σ_i²), i = 1, 2, ..., M} and computes the mean μ_s and the variance σ_s² of the silent acoustic model according to the following equations (19) and (20).

【数１７】ここで、ａおよびｂは、シミュレーションにより最適な
値が決定される係数である。[Equation 17] Here, a and b are coefficients for which an optimal value is determined by simulation.

【００５７】無音音響モデルの正規分布Ｇ（μ_s，
σ_s ²）を演算する第３の方法は、無音特徴分布｛Ｇ（μ
_i，σ_i ²），ｉ＝１，２，・・・，Ｍ｝の組み合わせに
よって、無音音響モデルの平均値μ_sと、分散σ_s ²を演
算する方法である。The third method of computing the normal distribution G(μ_s, σ_s²) of the silent acoustic model computes the mean μ_s and the variance σ_s² of the silent acoustic model from a combination of the silent feature distributions {G(μ_i, σ_i²), i = 1, 2, ..., M}.

【００５８】この方法においては、各無音特徴分布Ｇ
（μ_i，σ_i ²）の確率統計量をＸ_iとする。In this method, the probability statistic of each silent feature distribution G(μ_i, σ_i²) is denoted X_i.

【数１８】 (Equation 18)

【００５９】ここで、無音音響モデルの正規分布Ｇ（μ
_s，σ_s ²）の確率統計量をＸ_sとすれば、確率統計量Ｘ_s
は、次式（２２）に示すように、確率統計量Ｘ_iと重み
関数β _iの線形結合で表すことができる。なお、重み関
数β_iは式（１６）の条件を満足している。Here, if the probability statistic of the normal distribution G(μ_s, σ_s²) of the silent acoustic model is denoted X_s, then X_s can be expressed as a linear combination of the probability statistics X_i and the weight functions β_i, as shown in the following equation (22). The weight functions β_i satisfy the condition of equation (16).

【数１９】 [Equation 19]

【００６０】そして、無音音響モデルの正規分布Ｇ（μ
_s，σ_s ²）は、次式（２３）に示すように表される。The normal distribution G(μ_s, σ_s²) of the silent acoustic model is then expressed as shown in the following equation (23).

【数２０】 (Equation 20)

【００６１】なお、式（２３）において、重み関数β_i
は、一般には、例えば、１／Ｍとすることができ、この
場合、式（２３）の平均値μ_sと分散σ_s ²は、例えば、
次式で示すように、所定の係数を用いて求められる。In equation (23), the weight function β_i can in general be set to, for example, 1/M. In that case, the mean μ_s and the variance σ_s² of equation (23) are obtained using predetermined coefficients, for example as shown in the following equations.

【数２１】ここで、ａおよびｂは、シミュレーションにより最適な
値が決定される係数である。(Equation 21) Here, a and b are coefficients for which an optimal value is determined by simulation.

【００６２】無音音響モデルの正規分布Ｇ（μ_s，
σ_s ²）を演算する第４の方法では、無音特徴分布｛Ｇ
（μ_i，σ_i ²），ｉ＝１，２，・・・，Ｍ｝の確率統計
量Ｘ_iに対応する統計母集団Ω_i＝｛ｆ_i,j｝を仮定す
る。ここで、In the fourth method of computing the normal distribution G(μ_s, σ_s²) of the silent acoustic model, a statistical population Ω_i = {f_i,j} corresponding to the probability statistic X_i of each silent feature distribution {G(μ_i, σ_i²), i = 1, 2, ..., M} is assumed. Here,

【数２２】とすれば、平均値μ_iは、次式（２６）によって得るこ
とができ、分散σ_i ²は、次式（２８）によって得ること
ができる。(Equation 22) Then, the average value μ _i can be obtained by the following equation (26), and the variance σ _i ² can be obtained by the following equation (28).

【数２３】 (Equation 23)

【００６３】式（２８）を変形すれば、次式（２９）の
関係が成立する。By modifying equation (28), the following equation (29) holds.

【数２４】 (Equation 24)

【００６４】ここで、統計母集団の和ΩHere, the sum Ω of the statistical population

【数２５】を考慮すれば、式（２６）から次式（３０），（３１）
が導かれ、式（２９）から次式（３２）乃至（３４）が
導かれる。(Equation 25) Taking this into account, the following equations (30) and (31) are derived from equation (26), and the following equations (32) to (34) are derived from equation (29).

【数２６】 (Equation 26)

【００６５】なお、実際には、式（３１）と式（３４）
は、次式に示すように、係数が乗算されて用いられる。In practice, equations (31) and (34) are used after being multiplied by coefficients, as shown in the following equations.

【数２７】ここで、ａおよびｂは、シミュレーションにより最適な
値が決定される係数である。[Equation 27] Here, a and b are coefficients for which an optimal value is determined by simulation.
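
The idea behind the fourth method, pooling the per-frame populations into one population and taking its mean and variance, can be sketched as below for a single feature component. Equations (26) to (36) are not reproduced in this text, so the sketch relies on assumptions: every population Ω_i has the same number of elements, the variance is the population variance, and the coefficients a and b multiply the pooled mean and the pooled variance respectively. The function name `pooled_moments` is hypothetical.

```python
from typing import Sequence, Tuple


def pooled_moments(means: Sequence[float],
                   variances: Sequence[float],
                   a: float = 1.0,
                   b: float = 1.0) -> Tuple[float, float]:
    """Mean and variance of the union of M equally sized populations.

    Population i has mean means[i] and (population) variance variances[i].
    Uses the identity E[f^2] = sigma_i^2 + mu_i^2 within each population.
    """
    if len(means) != len(variances) or not means:
        raise ValueError("means and variances must be non-empty and equally long")
    m = len(means)
    mu = sum(means) / m
    # Second moment of the union: average of the per-population second moments.
    second_moment = sum(v + mu_i * mu_i for mu_i, v in zip(means, variances)) / m
    pooled_variance = second_moment - mu * mu
    return a * mu, b * pooled_variance
```

Unlike a plain average of the variances, the pooled variance also grows when the frame means μ_i differ from one another, which is why this method can give a different σ_s² from the first method for the same input.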

【００６６】また、次式（３７）を採用することも可能
である。なお、式（３７）では、分散σ_i ²に対してだ
け、係数ｂが乗算されている。The following equation (37) can also be employed. In the equation (37), only the variance σ _i ² is multiplied by the coefficient b.

【数２８】 [Equation 28]

【００６７】次に、図１の音声認識装置の動作について
説明する。Next, the operation of the speech recognition apparatus of FIG. 1 will be described.

【００６８】フレーム化部２には、マイクロフォン１で
集音された音声データ（環境ノイズを含む認識対象の発
話音声）が入力され、そこでは、音声データがフレーム
化され、各フレームの音声データは、観測ベクトルａと
して、ノイズ観測区間抽出部３、および特徴抽出部５に
順次供給される。ノイズ観測区間抽出部３では、発話ス
イッチ４がオンとされたタイミングｔ_b以前のノイズ観
測区間ＴｍとＴｎの音声データ（環境ノイズ）が抽出さ
れて、特徴抽出部５および無音音響モデル補正部７に供
給される。The speech data collected by the microphone 1 (the uttered speech to be recognized, including environmental noise) is input to the framing unit 2, where it is divided into frames, and the speech data of each frame is sequentially supplied, as an observation vector a, to the noise observation section extraction unit 3 and the feature extraction unit 5. The noise observation section extraction unit 3 extracts the speech data (environmental noise) of the noise observation sections Tm and Tn preceding the timing t_b at which the utterance switch 4 is turned on, and supplies it to the feature extraction unit 5 and the silent acoustic model correction unit 7.

【００６９】無音音響モデル補正部７では、ノイズ観測
区間Ｔｍの音声データとしての環境ノイズに基づいて、
ノイズ観測区間Ｔｎの各フレームから無音特徴分布PDF
が求められる。さらに、無音音響モデル補正部７では、
特徴分布PDFに基づいて、上述した第１乃至第４の方法
のうちのいずれかによって、無音音響モデルの更新（適
応）が行われ、音声認識部６に供給される。音声認識部
６では、無音音響モデル補正部７から供給される無音音
響モデルとしての識別関数によって、それまで記憶され
ていた無音音響モデルに対応する識別関数が更新され
る。即ち、無音音響モデルの適応が行われる。In the silent acoustic model correction unit 7, the silent feature distribution PDF is obtained from each frame of the noise observation section Tn, based on the environmental noise, as speech data, in the noise observation section Tm. Further, the silent acoustic model correction unit 7 updates (adapts) the silent acoustic model by any one of the first to fourth methods described above, based on the feature distribution PDF, and supplies the result to the speech recognition unit 6. In the speech recognition unit 6, the discriminant function corresponding to the silent acoustic model stored until then is updated with the discriminant function, as the silent acoustic model, supplied from the silent acoustic model correction unit 7. That is, the silent acoustic model is adapted.

【００７０】一方、特徴抽出部５では、フレーム化部２
からの観測ベクトルａとしての音声データが音響分析さ
れ、その特徴ベクトルｙが求められる。さらに、特徴抽
出部５では、求められた特徴ベクトルｙに基づいて、特
徴ベクトル空間における分布を表す特徴分布パラメータ
Ｚが算出され、音声認識部６に供給される。音声認識部
６では、特徴抽出部５からの特徴分布パラメータを用い
て、無音および所定数Ｋの単語それぞれに対応する音響
モデルの識別関数の値が演算され、その関数値が最大と
なる音響モデルが、音声の認識結果として出力される。Meanwhile, the feature extraction unit 5 performs acoustic analysis on the speech data, as the observation vector a, from the framing unit 2 and obtains its feature vector y. Based on the obtained feature vector y, the feature extraction unit 5 further calculates a feature distribution parameter Z representing the distribution in the feature vector space and supplies it to the speech recognition unit 6. The speech recognition unit 6 uses the feature distribution parameter from the feature extraction unit 5 to compute the values of the discriminant functions of the acoustic models corresponding to silence and to each of the predetermined number K of words, and outputs the acoustic model with the largest function value as the speech recognition result.

【００７１】以上のように、観測ベクトルａとしての音
声データが、その特徴量の空間である特徴ベクトル空間
における分布を表す特徴分布パラメータＺに変換される
ので、その特徴分布パラメータは、音声データに含まれ
るノイズの分布特性を考慮したものとなっており、ま
た、無音を識別（認識）するための無音音響モデルに対
応する識別関数が、発話直前のノイズ観測区間Ｔｎの音
声データに基づいて更新されているので、音声認識率を
大きく向上させることが可能となる。As described above, the speech data as the observation vector a is converted into the feature distribution parameter Z, which represents a distribution in the feature vector space, the space of its feature quantities, so the feature distribution parameter takes into account the distribution characteristics of the noise contained in the speech data. In addition, the discriminant function corresponding to the silent acoustic model for identifying (recognizing) silence is updated based on the speech data of the noise observation section Tn immediately before the utterance. The speech recognition rate can therefore be greatly improved.

【００７２】次に、図６は、発話スイッチ４がオンとさ
れてから発話が開始されるまでの無音区間Ｔｓ（図２）
を変化させたときの音声認識率の変化を測定した実験
（シミュレーション）の結果を示している。Next, FIG. 6 shows the results of an experiment (simulation) that measured how the speech recognition rate changes when the silent section Ts (FIG. 2), from when the utterance switch 4 is turned on until the utterance starts, is varied.

【００７３】なお、図６において、曲線ａは無音音響モ
デルを補正しない（無音音響モデルの適応を行わない）
従来の方法による結果を、曲線ｂは第１の方法による結
果を、曲線ｃは第２の方法による結果を、曲線ｄは第３
の方法による結果を、曲線ｅは、第４の方法による結果
を、それぞれ示している。In FIG. 6, curve a shows the result of the conventional method, in which the silent acoustic model is not corrected (the silent acoustic model is not adapted); curve b shows the result of the first method, curve c the result of the second method, curve d the result of the third method, and curve e the result of the fourth method.

【００７４】実験の条件は、以下の通りである。即ち、
認識に用いた音声データは、高速道路を走行中の車内で
集音されたものである。ノイズ観測区間Ｔｎは、２０フ
レームで約０．２秒である。無音区間Ｔｓは、０．０５
秒、０．１秒、０．２秒、０．３秒、０．５秒とした。
音声データの特徴抽出においては、MFCC(Mel-Frequency
Cepstral Coefficients)ドメインで分析を実施した
（ＭＦＣＣ分析により、特徴量を得た）。認識の対象と
する音声の発話者は、男女４人ずつ計８人であり、一人
当たり３０３個の単語を、個別に発話してもらった。認
識を行った単語数は、日本語の５０００単語である。音
響モデルは、HMMであり、学習用に用意した音声データ
を用いて予め学習を行った。音声認識においては、Vite
rbiサーチ法を用い、そのビーム幅は３０００とした。The experimental conditions were as follows. The speech data used for recognition was collected in a car traveling on a highway. The noise observation section Tn was 20 frames, about 0.2 seconds. The silent section Ts was set to 0.05, 0.1, 0.2, 0.3, and 0.5 seconds. For feature extraction from the speech data, analysis was performed in the MFCC (Mel-Frequency Cepstral Coefficients) domain (features were obtained by MFCC analysis). The speech to be recognized came from eight speakers, four men and four women, each of whom uttered 303 words individually. The recognition vocabulary was 5000 Japanese words. The acoustic models were HMMs trained in advance on speech data prepared for learning. In speech recognition, the Viterbi search method was used with a beam width of 3000.

【００７５】なお、第１、第２、および第４の方法にお
いては、係数ａを１．０とし、係数ｂを０．１とした。
第３の方法においては、係数ａおよびｂのいずれも、
１．０とした。In the first, second, and fourth methods, the coefficient a was set to 1.0 and the coefficient b to 0.1. In the third method, both coefficients a and b were set to 1.0.

【００７６】図６から明らかなように、従来の方法（曲
線ａ）では、無音区間Ｔｓが長くなるのに伴って音声認
識率が著しく低下しているが、第１乃至４の方法（曲線
ｂ乃至ｅ）では、無音区間Ｔｓが長くなっても、音声認
識率は、わずかしか低下しない。従って、無音音響モデ
ルの適応を行うことにより、無音区間Ｔｓが変化して
も、認識性能を維持することが可能である。As is clear from FIG. 6, with the conventional method (curve a), the speech recognition rate drops markedly as the silent section Ts becomes longer, whereas with the first to fourth methods (curves b to e), the speech recognition rate drops only slightly even when the silent section Ts becomes longer. Accordingly, by adapting the silent acoustic model, recognition performance can be maintained even when the silent section Ts changes.

【００７７】なお、上述の第１乃至第４のいずれの方法
においても、無音音響モデルの正規分布Ｇ（μ_s，
σ_s ²）を規定する平均値μ_sは、無音特徴分布Ｇ（μ_i，
σ_i ²）の平均値μ_iの平均値となる。従って、例えば、
いま、無音特徴分布Ｇ（μ_i，σ_i ²）の平均値μ_iの平均
値を、μと表すとともに、第１乃至第４の方法によって
求められる無音音響モデルの正規分布を、それぞれ、Ｇ
_s1（μ，σ_s1 ²），Ｇ_s2（μ，σ_s2 ²），Ｇ_s3（μ，σ_s3
²），Ｇ_s4（μ，σ_s4 ²）と表すと、これらは、図７に示
すように、特徴空間において、平均値μを中心（重心）
とする分布となる。In any of the first to fourth methods described above, the mean μ_s defining the normal distribution G(μ_s, σ_s²) of the silent acoustic model is the average of the means μ_i of the silent feature distributions G(μ_i, σ_i²). Therefore, if, for example, the average of the means μ_i of the silent feature distributions G(μ_i, σ_i²) is written as μ, and the normal distributions of the silent acoustic model obtained by the first to fourth methods are written as G_s1(μ, σ_s1²), G_s2(μ, σ_s2²), G_s3(μ, σ_s3²), and G_s4(μ, σ_s4²), respectively, then, as shown in FIG. 7, these are distributions centered (with their centroid) on the mean μ in the feature space.

【００７８】ところで、無音特徴分布Ｇ（μ_i，σ_i ²）
に基づく、上述の第１乃至第４の方法による無音音響モ
デルの適応は、写像Ｖを用いて、次の式（３８）で定義
することができる。なお、以下、適宜、Ｇ（μ_i，
σ_i ²）をＧ_iと、Ｇ（μ_s，σ_s ²）をＧ_sと、それぞれ記
述する。Incidentally, the adaptation of the silent acoustic model by the first to fourth methods described above, based on the silent feature distributions G(μ_i, σ_i²), can be defined by the following equation (38) using the mapping V. Hereinafter, G(μ_i, σ_i²) is written as G_i and G(μ_s, σ_s²) as G_s, as appropriate.

【００７９】[0079]

【数２９】 (Equation 29)

【００８０】また、ここでは、無音音響モデルＧ_sとし
て、正規分布を仮定しており、正規分布は、平均値と分
散で規定されるから、無音音響モデルＧ_sの正規分布を
規定する平均値と分散を、上述のように、μ_sとσ_s ²で
表せば、式（３８）の定義は、平均値と分散の写像Ｖ_μ
とＶ_σ ₂とをそれぞれ用いて、式（３９）および（４
０）で表すこともできる。[0080] Further, a normal distribution is assumed here for the silent acoustic model G_s, and a normal distribution is defined by its mean and variance. Therefore, if the mean and variance defining the normal distribution of the silent acoustic model G_s are written as μ_s and σ_s², as described above, the definition of equation (38) can also be expressed by equations (39) and (40) using mappings V_μ and V_σ² of the mean and the variance, respectively.

【００８１】[0081]

【数３０】 [Equation 30]

【００８２】上述の写像Ｖ（Ｖ_μおよびＶ_σ ₂）で表さ
れる第１乃至第４の方法では、ノイズ観測区間Ｔｎ（図
２）におけるＭフレームそれぞれから得られる時系列の
無音特徴分布Ｇ₁，Ｇ₂，・・・，Ｇ_Mを平等に取り扱っ
ている。In the first to fourth methods represented by the mapping V (V_μ and V_σ²) described above, the time-series silent feature distributions G₁, G₂, ..., G_M obtained from each of the M frames in the noise observation section Tn (FIG. 2) are treated equally.

【００８３】しかしながら、音声認識区間における環境
ノイズは、厳密には、音声認識区間の直前のノイズ観測
区間Ｔｎにおける環境ノイズと同一ではなく、さらに、
一般には、ノイズ観測区間Ｔｎにおける環境ノイズは、
音声認識区間（の開始時刻ｔ _c）から離れるほど、音声
認識区間における環境ノイズとは異なるものとなると推
測される。However, strictly speaking, the environmental noise in the speech recognition section is not identical to the environmental noise in the noise observation section Tn immediately before the speech recognition section. Moreover, it is generally presumed that the farther the environmental noise in the noise observation section Tn is from (the start time t_c of) the speech recognition section, the more it differs from the environmental noise in the speech recognition section.

【００８４】従って、ノイズ観測区間Ｔｎ（図２）にお
けるＭフレームそれぞれから得られる時系列の無音特徴
分布Ｇ₁，Ｇ₂，・・・，Ｇ_Mは、平等に扱うのではな
く、音声認識区間に近いものほど重みをおいて扱うべき
であり（音声認識区間から遠いものほど重みをおかずに
扱うべきであり）、そのようにすることで、音声認識精
度をより向上させる無音音響モデルの適応（補正および
更新）が可能となる。Therefore, the time-series silent feature distributions G₁, G₂, ..., G_M obtained from each of the M frames in the noise observation section Tn (FIG. 2) should not be treated equally: those closer to the speech recognition section should be given more weight (and those farther from it less weight). Doing so enables an adaptation (correction and update) of the silent acoustic model that further improves speech recognition accuracy.

【００８５】そこで、ノイズ観測区間Ｔｎにおいて得ら
れる無音特徴分布Ｇ₁，Ｇ₂，・・・，Ｇ_Mについて、そ
の新しさ（ここでは、音声認識区間への近さに相当す
る）を表す新鮮度を導入することとし、この新鮮度を考
慮して、無音音響モデルの適応を行う方法について説明
する。[0085] Therefore, a freshness is introduced for the silent feature distributions G₁, G₂, ..., G_M obtained in the noise observation section Tn, representing how recent each one is (here, how close it is to the speech recognition section), and a method of adapting the silent acoustic model in consideration of this freshness is described below.

【００８６】図８は、新鮮度を考慮して、無音音響モデ
ルの適応を行う、図１の無音音響モデル補正部７の構成
例を示している。FIG. 8 shows an example of the structure of the silent acoustic model correction unit 7 shown in FIG. 1 for adapting a silent acoustic model in consideration of freshness.

【００８７】新鮮度関数記憶部３１は、上述したような
新鮮度を表す関数である新鮮度関数（を規定するパラメ
ータ）を記憶している。The freshness function storage unit 31 stores a freshness function (or the parameters that define it), which is a function representing the freshness described above.

【００８８】補正部３２には、ノイズ観測区間抽出部３
が出力する、ノイズ観測区間ＴｍとＴｎにおける音声デ
ータ（ノイズ）としての観測ベクトルの系列（ここで
は、２Ｍフレームの音声データ）が入力されるようにな
っており、補正部３２は、この観測ベクトルから、無音
特徴分布Ｇ₁，Ｇ₂，・・・，Ｇ_Mを得て、これらと、新
鮮度関数記憶部３１に記憶されている新鮮度関数に基づ
いて、無音音響モデルの適応を行う。The correction section 32 includes a noise observation section extraction section 3
Is output as a series of observation vectors (here, 2M-frame audio data) as audio data (noise) in the noise observation sections Tm and Tn, and the correction unit 32 from silence feature distribution G _1, G _2, · · ·, to obtain G _M, and these, on the basis of the freshness function stored in freshness function storage unit 31 performs adaptive silence acoustic model.

【００８９】ここで、無音特徴分布Ｇ₁，Ｇ₂，・・・，
Ｇ_Mは、ノイズ観測区間ＴｎにおけるＭフレームそれぞ
れで観測される離散値であり、無音音響モデル補正部７
が、離散値を処理するシステムであれば、離散値である
無音特徴分布Ｇ₁，Ｇ₂，・・・，Ｇ_Mをそのまま用いる
ことができる。しかしながら、無音音響モデル補正部７
が、連続値を処理するシステムである場合には、例え
ば、図９に示すように、離散値である無音特徴分布
Ｇ₁，Ｇ₂，・・・，Ｇ_Mを、連続変換器で連続値に変換
してから、無音音響モデル補正部７で処理する必要があ
る。離散値を連続値に変換する方法としては、例えば、
スプライン関数(Spline Function)によって近似を行う
方法がある。Here, the silent feature distributions G₁, G₂, ..., G_M are discrete values observed in each of the M frames in the noise observation section Tn. If the silent acoustic model correction unit 7 is a system that processes discrete values, the discrete-valued silent feature distributions G₁, G₂, ..., G_M can be used as they are. If, however, the silent acoustic model correction unit 7 is a system that processes continuous values, the discrete-valued silent feature distributions G₁, G₂, ..., G_M must be converted into continuous values by a continuous converter, for example as shown in FIG. 9, before being processed by the silent acoustic model correction unit 7. One way to convert discrete values into continuous values is, for example, approximation by a spline function.

【００９０】なお、離散値とは、ある有限の観測区間に
おいて、離散的な時刻で観測される有限個の観測値であ
り、連続値とは、ある有限（または無限）の観測区間の
任意の時刻で観測される無限個の観測値であり、ある関
数によって表現される。Note that a discrete value is a finite number of observation values observed at discrete times in a certain finite observation section, and a continuous value is an arbitrary value in a certain finite (or infinite) observation section. It is an infinite number of observations observed at time, and is represented by a certain function.

【００９１】無音音響モデルの適応に用いる無音特徴分
布が離散値である場合には、新鮮度関数も離散値の関数
となり、無音特徴分布が連続値である場合には、新鮮度
関数も連続値の関数となる。When the silent feature distribution used for adapting the silent acoustic model is a discrete value, the freshness function is also a discrete value function. When the silent feature distribution is a continuous value, the freshness function is also a continuous value. Is a function of

【００９２】次に、新鮮度関数、およびそれを用いた無
音音響モデルの適応について、新鮮度関数が離散値であ
る場合と、連続値である場合とに分けて説明する。Next, the freshness function and the adaptation of the silent acoustic model using the function will be described separately for a case where the freshness function is a discrete value and a case where the freshness function is a continuous value.

【００９３】まず、新鮮度関数ｇ（ｔ）は、例えば、式
（４１）乃至（４３）に示すように定義することができ
る。First, the freshness function g (t) can be defined, for example, as shown in equations (41) to (43).

【００９４】[0094]

【数３１】但し、Ω_obsは、無音特徴分布の観測区間を表し、本実
施の形態では、ノイズ観測区間Ｔｎに相当する。(Equation 31) Here, Ω _obs represents an observation section of the silent feature distribution, and corresponds to a noise observation section Tn in the present embodiment.

【００９５】式（４１）により、新鮮度関数ｇ（ｔ）
は、観測区間Ω_obs以外では０となる。また、式（４
２）により、新鮮度関数ｇ（ｔ）は、観測区間Ω_obsに
おいて、時間の経過とともに増加するか、または変化し
ない関数（本明細書において、単調増加関数という）で
あり、従って、新鮮度関数ｇ（ｔ）は、基本的に、音声
認識区間（図２）に近づくほど、大きな値となる。さら
に、式（４３）により、新鮮度関数ｇ（ｔ）は、観測区
間Ω_obsに亘って積分した場合に、その積分値が１とな
る関数である。式（４１）乃至（４３）から、新鮮度関
数ｇ（ｔ）は、例えば、図１０に示すようになる。According to equation (41), the freshness function g(t) is 0 outside the observation section Ω_obs. According to equation (42), the freshness function g(t) is a function that, within the observation section Ω_obs, either increases or stays unchanged with the passage of time (referred to in this specification as a monotonically increasing function); the freshness function g(t) therefore basically takes a larger value the closer it is to the speech recognition section (FIG. 2). Further, according to equation (43), the freshness function g(t) is a function whose integral over the observation section Ω_obs is 1. From equations (41) to (43), the freshness function g(t) is, for example, as shown in FIG. 10.
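
The three properties stated for the freshness function can be checked mechanically for a discrete-valued candidate. The sketch below is illustrative only: the function name `is_valid_freshness` and the tolerance are assumptions, and for discrete samples the integral of equation (43) is replaced by a sum over the sample times.

```python
from typing import Sequence


def is_valid_freshness(weights: Sequence[float], tol: float = 1e-9) -> bool:
    """Check a discrete freshness function sampled inside the observation section.

    weights[j] is g(t) at the j-th sample time, oldest first. Values outside
    the observation section are 0 by definition and are therefore not listed.
    """
    # Non-decreasing in time: each sample is at least as large as the previous one.
    non_decreasing = all(later >= earlier - tol
                         for earlier, later in zip(weights, weights[1:]))
    # Normalisation: the samples sum to 1 (discrete counterpart of the integral).
    sums_to_one = abs(sum(weights) - 1.0) <= tol
    return non_decreasing and sums_to_one
```

A constant function such as four samples of 0.25 passes the check, which matches the text: a function that does not change over time still counts as monotonically increasing in the sense used here, and it reproduces the equal treatment of all frames.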

【００９６】ここで、本実施の形態では、新鮮度関数ｇ
（ｔ）は、後述するように、無音特徴分布に乗算する乗
数として用いられる。従って、新鮮度関数ｇ（ｔ）は、
その値が正または負のときには、それが乗数として乗算
される無音特徴分布に対する重みとして作用する。ま
た、新鮮度関数ｇ（ｔ）は、その値が０のときは、それ
が乗数として乗算される無音特徴分布を無効とし、無音
音響モデルの適応に影響を与えないように作用する。Here, in the present embodiment, the freshness function g(t) is used as a multiplier applied to the silent feature distributions, as described later. Accordingly, when its value is positive or negative, the freshness function g(t) acts as a weight on the silent feature distribution it multiplies. When its value is 0, the freshness function g(t) invalidates the silent feature distribution it multiplies, so that the distribution has no effect on the adaptation of the silent acoustic model.

【００９７】図８の補正部３２では、以上のような新鮮
度関数ｇ（ｔ）と、無音特徴分布Ｇ ₁，Ｇ₂，・・・，Ｇ
_Mとを用いて、基本的には、式（４４）にしたがって、
適応後の無音音響モデルＧ_sが求められる。In the correction unit 32 of FIG. 8, the adapted silent acoustic model G_s is basically obtained according to equation (44), using the freshness function g(t) described above and the silent feature distributions G₁, G₂, ..., G_M.

【００９８】[0098]

【数３２】 (Equation 32)

【００９９】式（４４）によれば、無音特徴分布が、音
声認識区間に近いものほど重みをおいて扱われて、無音
音響モデルの適応が行われ、その結果、音声認識精度を
より向上させることが可能となる。According to equation (44), the silent acoustic model is adapted with more weight given to the silent feature distributions closer to the speech recognition section, and as a result speech recognition accuracy can be further improved.

【０１００】次に、新鮮度関数Ｆ（ｘ）の具体例と、そ
れを用いた無音音響モデルの適応について説明する。な
お、以下では、無音特徴分布の観測区間Ω_obs（本実施
の形態では、ノイズ観測区間Ｔｎ）を、ｔが０からｔ_M
までの区間とする。また、新鮮度関数ｇ（ｔ）の関数値
としては、観測区間Ω_obsのみの値を考えることとする
（式（４１）に示したように、新鮮度関数ｇ（ｔ）の関
数値は、観測区間Ω_ob _s以外では０であるので、以下で
は、その点については言及しない）。Next, specific examples of the freshness function g(t) and the adaptation of the silent acoustic model using them are described. In the following, the observation section Ω_obs of the silent feature distributions (the noise observation section Tn in the present embodiment) is taken to be the section where t runs from 0 to t_M. Also, only values within the observation section Ω_obs are considered as function values of the freshness function g(t) (as shown in equation (41), the freshness function g(t) is 0 outside the observation section Ω_obs, so this point is not mentioned again below).

【０１０１】新鮮度関数ｇ（ｔ）としては、例えば、線
形の関数を用いることができ、関数値として連続値をと
る場合には、新鮮度関数ｇ（ｔ）は、例えば、式（４
５）で表される。A linear function, for example, can be used as the freshness function g(t). When it takes continuous values, the freshness function g(t) is expressed, for example, by equation (45).

【０１０２】[0102]

【数３３】 [Equation 33]

【０１０３】式（４５）におけるαは、所定の定数であ
り、この定数αは、式（４３）の新鮮度関数の定義か
ら、２／ｔ_M ²となる。従って、式（４５）の新鮮度関数
ｇ（ｔ）は、式（４６）で表されることになる。In equation (45), α is a predetermined constant, and from the definition of the freshness function in equation (43), this constant α is 2/t_M². The freshness function g(t) of equation (45) is therefore expressed by equation (46).
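
The stated value of α can be checked with a short calculation. The equation images are not reproduced in this text, so the linear form g(t) = αt on 0 ≤ t ≤ t_M is an assumption inferred from the word "linear" and from the stated constant; under that assumption, the normalization condition of equation (43) gives:

```latex
\int_{0}^{t_M} \alpha t \, dt = \frac{\alpha\, t_M^{2}}{2} = 1
\quad\Longrightarrow\quad
\alpha = \frac{2}{t_M^{2}},
\qquad
g(t) = \frac{2t}{t_M^{2}} \quad (0 \le t \le t_M).
```

This agrees with the constant 2/t_M² given in the text, and g(t) is 0 at the start of the observation section and largest at t = t_M, the instant closest to the speech recognition section.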

【０１０４】[0104]

【数３４】 (Equation 34)

【０１０５】ここで、式（４６）で表される新鮮度関数
ｇ（ｔ）を、図１１に示す。Here, the freshness function g (t) represented by the equation (46) is shown in FIG.

【０１０６】この場合、適応後の無音音響モデルＧ
_sは、式（４７）にしたがって求められる。In this case, the adapted silent acoustic model G_s is obtained according to equation (47).

【０１０７】[0107]

【数３５】なお、Ｇ_x（μ_x，σ_x ²）は、時刻ｘにおける無音特徴分
布を表し、μ_xとσ_x ²は、それぞれ、その無音特徴分布
を表す正規分布を規定する平均値と分散である。(Equation 35) Note that G _x (μ _x , σ _x ² ) represents a silent feature distribution at time x, and μ _x and σ _x ² are an average value and a variance defining a normal distribution representing the silent feature distribution, respectively. .

【０１０８】次に、新鮮度関数Ｆ（ｘ）としては、例え
ば、線形の、離散値をとる関数を用いることができ、こ
の場合、新鮮度関数Ｆ（ｘ）は、例えば、式（４８）で
表される。Next, a linear function taking discrete values, for example, can be used as the freshness function g(t); in this case, the freshness function g(t) is expressed, for example, by equation (48).

【０１０９】[0109]

【数３６】 [Equation 36]

【０１１０】式（４８）におけるαは、所定の定数であ
り、この定数αは、式（４３）の新鮮度関数の定義か
ら、２／（ｔ_M（ｔ_M＋１））となる。従って、式（４
８）の新鮮度関数ｇ（ｔ）は、式（４９）で表されるこ
とになる。Α in Expression (48) is a predetermined constant, and this constant α is 2 / (t _M (t _M +1)) from the definition of the freshness function in Expression (43). Therefore, equation (4)
The freshness function g (t) of 8) is expressed by Expression (49).

【０１１１】[0111]

【数３７】 (37)

[0112] FIG. 12 shows the freshness function g(t) expressed by Equation (49).
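The discrete linear case can be illustrated with a short sketch. It assumes that the samples are indexed t = 1, ..., t_M, that the function has the form g(t) = αt, and that Equation (43) requires the weights to sum to 1; the function name `linear_freshness` is hypothetical and not taken from the patent.

```python
from fractions import Fraction

def linear_freshness(t_M):
    """Return [g(1), ..., g(t_M)] for g(t) = alpha * t.

    ASSUMPTION: alpha = 2 / (t_M * (t_M + 1)), the constant quoted for the
    discrete linear case, with samples indexed from 1 to t_M.
    """
    alpha = Fraction(2, t_M * (t_M + 1))
    return [alpha * t for t in range(1, t_M + 1)]

weights = linear_freshness(4)
print(weights)       # weights grow linearly toward the most recent sample
print(sum(weights))  # 1
```

Exact fractions are used so that the normalization can be checked without rounding error.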

[0113] In this case, the adapted silent acoustic model G_s is obtained according to Equation (50).

【０１１４】[0114]

[Equation 38]

Here, G_t denotes the silent feature distribution at sample point (sample time) t.
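How freshness weights might combine the per-sample distributions G_t into one adapted model can be sketched as follows. Equation (50) is not reproduced in this text, so the combination rule below (moment matching of a freshness-weighted mixture of one-dimensional Gaussians) is an assumption chosen for illustration and may differ from the patent's formula. The function name `adapt_silent_model` is hypothetical.

```python
from fractions import Fraction

def adapt_silent_model(means, variances, weights):
    """Collapse per-sample Gaussians G_t into one Gaussian (mean, variance).

    ASSUMPTION: weighted moment matching stands in for Equation (50),
    which is not available in this text. Weights are expected to sum to 1.
    """
    if not (len(means) == len(variances) == len(weights)):
        raise ValueError("means, variances, and weights must have equal length")
    mean_s = sum(w * m for w, m in zip(weights, means))
    # Second moment of the weighted mixture: sum of w * (variance + mean^2).
    second_moment = sum(w * (v + m * m) for w, m, v in zip(weights, means, variances))
    return mean_s, second_moment - mean_s * mean_s

# Two samples with discrete linear freshness for t_M = 2: g(1) = 1/3, g(2) = 2/3.
weights = [Fraction(1, 3), Fraction(2, 3)]
mean_s, var_s = adapt_silent_model(
    [Fraction(0), Fraction(1)], [Fraction(1), Fraction(1)], weights
)
print(mean_s, var_s)  # 2/3 11/9
```

The newer sample (t = 2) pulls the adapted mean toward its own mean, which is the qualitative behavior the freshness weighting is meant to produce.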

[0115] Next, a nonlinear function such as an exponential function, a higher-order binomial function, or a logarithmic function can also be used as the freshness function g(t). When, for example, a quadratic function that takes continuous values is used as the higher-order function, the freshness function g(t) is expressed, for example, by Equation (51).

【０１１６】[0116]

【数３９】 [Equation 39]

[0117] In Equation (51), α is a predetermined constant; from the definition of the freshness function in Equation (43), α = 3/t_M³. The freshness function g(t) of Equation (51) is therefore expressed by Equation (52).

【０１１８】[0118]

【数４０】 (Equation 40)

[0119] FIG. 13 shows the freshness function g(t) expressed by Equation (52).

[0120] In this case, the adapted silent acoustic model G_s is obtained according to Equation (53).

【０１２１】[0121]

【数４１】 [Equation 41]

[0122] Next, a quadratic function that takes discrete values can also be used as the higher-order freshness function g(t); in this case, the freshness function g(t) is expressed, for example, by Equation (54).

【０１２３】[0123]

【数４２】 (Equation 42)

[0124] In Equation (54), α is a predetermined constant; from the definition of the freshness function in Equation (43), α = 6/(t_M(t_M + 1)(2t_M + 1)). The freshness function g(t) of Equation (54) is therefore expressed by Equation (55).

【０１２５】[0125]

【数４３】 [Equation 43]

[0126] FIG. 14 shows the freshness function g(t) expressed by Equation (55).

[0127] In this case, the adapted silent acoustic model G_s is obtained according to Equation (56).

【０１２８】[0128]

【数４４】 [Equation 44]

[0129] Next, when a logarithmic function that takes continuous values is used as the freshness function g(t), the freshness function g(t) is expressed, for example, by Equation (57).

【０１３０】[0130]

【数４５】 [Equation 45]

[0131] In Equation (57), α is a predetermined constant; from the definition of the freshness function in Equation (43), α = 1/((t_M + 1) log(t_M + 1) − t_M). The freshness function g(t) of Equation (57) is therefore expressed by Equation (58).

【０１３２】[0132]

【数４６】 [Equation 46]

[0133] FIG. 15 shows the freshness function g(t) expressed by Equation (58).
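The constant quoted for the continuous logarithmic case can be checked numerically. The sketch below assumes that the function has the form g(t) = α log(t + 1) with the natural logarithm on 0 ≤ t ≤ t_M, and that Equation (43) requires the integral over the observation section to equal 1; these forms are inferred from the stated value of α, not copied from the equation images. All function names are hypothetical.

```python
import math

def log_alpha(t_M):
    """Constant quoted for the continuous logarithmic case (natural log assumed)."""
    return 1.0 / ((t_M + 1.0) * math.log(t_M + 1.0) - t_M)

def log_freshness(t, t_M):
    """ASSUMED form g(t) = alpha * log(t + 1) inside [0, t_M], and 0 outside."""
    if t < 0.0 or t > t_M:
        return 0.0
    return log_alpha(t_M) * math.log(t + 1.0)

def integrate(f, a, b, n=10000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

t_M = 10.0
area = integrate(lambda t: log_freshness(t, t_M), 0.0, t_M)
print(abs(area - 1.0) < 1e-6)  # True
```

The check relies on the antiderivative of log(t + 1), which is (t + 1) log(t + 1) − t, so the stated α normalizes the area to 1 under these assumptions.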

[0134] In this case, the adapted silent acoustic model G_s is obtained according to Equation (59).

【０１３５】[0135]

【数４７】 [Equation 47]

[0136] Next, a logarithmic function that takes discrete values can also be used as the freshness function g(t); in this case, the freshness function g(t) is expressed, for example, by Equation (60).

【０１３７】[0137]

【数４８】 [Equation 48]

[0138] In Equation (60), α is a predetermined constant that is obtained from the definition of the freshness function in Equation (43). The freshness function g(t) of Equation (60) is therefore expressed by Equation (61).

【０１３９】[0139]

【数４９】 [Equation 49]

[0140] FIG. 16 shows the freshness function g(t) expressed by Equation (61).
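The text does not state the constant for the discrete logarithmic case, so the following is only a sketch under assumptions: the function is taken to be g(t) = α log(t + 1) with the natural logarithm for t = 1, ..., t_M, and Equation (43) is taken to require that the weights sum to 1. Under those assumptions α = 1 / log((t_M + 1)!). The name `discrete_log_freshness` is hypothetical.

```python
import math

def discrete_log_freshness(t_M):
    """Return [g(1), ..., g(t_M)] for the ASSUMED form g(t) = alpha * log(t + 1).

    The sum of log(t + 1) for t = 1..t_M equals log((t_M + 1)!),
    which math.lgamma(t_M + 2) computes directly.
    """
    alpha = 1.0 / math.lgamma(t_M + 2)
    return [alpha * math.log(t + 1) for t in range(1, t_M + 1)]

weights = discrete_log_freshness(10)
print(abs(sum(weights) - 1.0) < 1e-12)  # True
```

Compared with the linear case, these weights rise quickly for the oldest samples and then flatten, so recent samples are favored less sharply.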

[0141] In this case, the adapted silent acoustic model G_s is obtained according to Equation (62).

【０１４２】[0142]

【数５０】 [Equation 50]

[0143] Next, when a general higher-order function that takes continuous values is used as the freshness function g(t), the freshness function g(t) is expressed, for example, by Equation (63).

【０１４４】[0144]

【数５１】 (Equation 51)

[0145] In Equation (63), α is a predetermined constant, and p determines the order of the freshness function g(t).

[0146] The constant α can be obtained from the definition of the freshness function in Equation (43). The freshness function g(t) of Equation (63) is therefore expressed by Equation (64).
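The general continuous case can be sketched as follows, assuming that Equation (63) has the form g(t) = α t^p on 0 ≤ t ≤ t_M and that Equation (43) requires the integral to equal 1. Under those assumptions α = (p + 1)/t_M^(p+1), which reproduces the constants 2/t_M² and 3/t_M³ stated for the linear and quadratic cases. The function names are hypothetical, and t_M is restricted to an integer only so that exact fractions can be used.

```python
from fractions import Fraction

def alpha_continuous(p, t_M):
    """alpha such that the integral of alpha * t**p over [0, t_M] equals 1 (ASSUMED form)."""
    return Fraction(p + 1, t_M ** (p + 1))

def area(p, t_M):
    """Exact integral of alpha * t**p from 0 to t_M, using the antiderivative t**(p+1)/(p+1)."""
    return alpha_continuous(p, t_M) * Fraction(t_M ** (p + 1), p + 1)

t_M = 5
for p in (1, 2, 3, 4):
    # Prints the order, the constant alpha, and the area under g(t).
    print(p, alpha_continuous(p, t_M), area(p, t_M))
```

For every order the area evaluates to 1, so a larger p only concentrates more of the weight near t_M.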

【０１４７】[0147]

【数５２】 (Equation 52)

[0148] In this case, the adapted silent acoustic model G_s is obtained according to Equation (65).

【０１４９】[0149]

【数５３】 (Equation 53)

[0150] In Equation (64), when p is 1 or 2, for example, the freshness function g(t) is a linear or quadratic function that takes continuous values, and is expressed as shown in Equation (46) or (52), respectively.

[0151] In Equation (64), when p is 3, for example, the freshness function g(t) is a cubic function that takes continuous values, and is expressed as shown in Equation (66).

【０１５２】[0152]

【数５４】 (Equation 54)

[0153] Furthermore, in Equation (64), when p is 4, for example, the freshness function g(t) is a quartic function that takes continuous values, and is expressed as shown in Equation (67).

【０１５４】[0154]

【数５５】 [Equation 55]

[0155] Next, when a general higher-order function that takes discrete values is used as the freshness function g(t), the freshness function g(t) is expressed, for example, by Equation (68).

【０１５６】[0156]

【数５６】 [Equation 56]

[0157] In Equation (68), α is a predetermined constant, and p determines the order of the freshness function F(x).

[0158] The constant α can be obtained from the definition of the freshness function in Equation (43). The freshness function g(t) of Equation (68) is therefore expressed by Equation (69).
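For the general discrete case, a minimal sketch follows. It assumes that Equation (68) has the form g(t) = α t^p for t = 1, ..., t_M and that Equation (43) requires the weights to sum to 1, which gives α as the reciprocal of the sum of t^p. The function names are hypothetical. For p = 1 and p = 2 this reproduces the constants 2/(t_M(t_M + 1)) and 6/(t_M(t_M + 1)(2t_M + 1)) stated for the linear and quadratic cases.

```python
from fractions import Fraction

def alpha_discrete(p, t_M):
    """alpha such that the sum of alpha * t**p over t = 1..t_M equals 1 (ASSUMED form)."""
    return Fraction(1, sum(t ** p for t in range(1, t_M + 1)))

def discrete_freshness(p, t_M):
    """Return [g(1), ..., g(t_M)] for g(t) = alpha * t**p."""
    alpha = alpha_discrete(p, t_M)
    return [alpha * t ** p for t in range(1, t_M + 1)]

t_M = 5
print(alpha_discrete(1, t_M) == Fraction(2, t_M * (t_M + 1)))                    # True
print(alpha_discrete(2, t_M) == Fraction(6, t_M * (t_M + 1) * (2 * t_M + 1)))    # True
print(sum(discrete_freshness(3, t_M)))                                           # 1
```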

【０１５９】[0159]

【数５７】 [Equation 57]

[0160] In this case, the adapted silent acoustic model G_s is obtained according to Equation (70).

【０１６１】[0161]

【数５８】 [Equation 58]

[0162] In Equation (69), when p is 1 or 2, for example, the freshness function g(t) is a linear or quadratic function that takes discrete values, and is expressed as shown in Equation (49) or (55), respectively.

[0163] In Equation (69), when p is 3, for example, the freshness function g(t) is a cubic function that takes discrete values, and is expressed as shown in Equation (71).

【０１６４】[0164]

【数５９】 [Equation 59]

[0165] Furthermore, in Equation (69), when p is 4, for example, the freshness function g(t) is a quartic function that takes discrete values, and is expressed as shown in Equation (72).

【０１６６】[0166]

【数６０】 [Equation 60]

[0167] The concept of the freshness function g(t) can be applied not only to the adaptation of the silent acoustic model but also to speaker adaptation in a noisy environment and to the adaptation of acoustic models other than the silent acoustic model. It can further be applied to speech detection and to non-stationary noise detection. In the fields of acoustic signal processing, image signal processing, and communications as well, using the concept of the freshness function F(x) can improve robustness against environmental noise and thereby improve system performance.

[0168] The speech recognition apparatus to which the present invention is applied has been described above. Such a speech recognition apparatus can be applied to, for example, a car navigation device that accepts voice input, and to various other devices.

[0169] In the present embodiment, feature distribution parameters are obtained in consideration of the distribution characteristics of noise. This noise includes not only external noise in the environment where the utterance is made but also, for example, the characteristics of the communication line when recognizing speech transmitted over a telephone line or another communication line.

[0170] The present invention is also applicable to image recognition and other kinds of pattern recognition, in addition to speech recognition.

[0171] Furthermore, in the present embodiment the silent acoustic model is adapted using the silent feature distribution, which is represented as a distribution in the feature space; however, the silent acoustic model can also be adapted using noise feature amounts represented as points in the feature space.
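The point-based variant can be illustrated with a very small sketch. The patent text gives no formula for this case, so the rule below (a freshness-weighted average of the noise feature vectors) is purely an assumption for illustration, and the name `weighted_noise_mean` is hypothetical.

```python
def weighted_noise_mean(features, weights):
    """Freshness-weighted average of noise feature vectors (points in feature space).

    ASSUMPTION: a weighted mean is used here only as an illustrative rule;
    the text does not specify how point features are combined.
    """
    if len(features) != len(weights):
        raise ValueError("features and weights must have equal length")
    dim = len(features[0])
    return [sum(w * f[d] for w, f in zip(weights, features)) for d in range(dim)]

# Two 2-dimensional noise features; the newer one (second) gets the larger weight.
features = [[0.0, 2.0], [1.0, 4.0]]
weights = [0.25, 0.75]
print(weighted_noise_mean(features, weights))  # [0.75, 3.5]
```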

[0172] The present invention can also be used to adapt acoustic models other than the silent acoustic model.

[0173] The series of processes described above can be performed by hardware or by software. When the series of processes is performed by software, the program constituting that software is installed on a general-purpose computer or the like.

[0174] FIG. 17 shows a configuration example of one embodiment of a computer on which a program that executes the series of processes described above is installed.

[0175] The program can be recorded in advance on the hard disk 105 or in the ROM 103, which serve as recording media built into the computer.

[0176] Alternatively, the program can be stored (recorded) temporarily or permanently on a removable recording medium 111 such as a floppy (registered trademark) disk, a CD-ROM (Compact Disc Read Only Memory), an MO (Magneto Optical) disk, a DVD (Digital Versatile Disc), a magnetic disk, or a semiconductor memory. Such a removable recording medium 111 can be provided as so-called packaged software.

[0177] Besides being installed on the computer from the removable recording medium 111 described above, the program can be transferred to the computer wirelessly from a download site via an artificial satellite for digital satellite broadcasting, or transferred to the computer by wire via a network such as a LAN (Local Area Network) or the Internet. The computer can receive the program transferred in this way at the communication unit 108 and install it on the built-in hard disk 105.

[0178] The computer incorporates a CPU (Central Processing Unit) 102. An input/output interface 110 is connected to the CPU 102 via a bus 101. When the user enters a command through the input/output interface 110 by operating an input unit 107 consisting of a keyboard, a mouse, and the like, the CPU 102 executes the program stored in the ROM (Read Only Memory) 103 accordingly. Alternatively, the CPU 102 loads into the RAM (Random Access Memory) 104, and executes, one of the following: a program stored on the hard disk 105; a program transferred from a satellite or a network, received by the communication unit 108, and installed on the hard disk 105; or a program read from the removable recording medium 111 mounted in the drive 109 and installed on the hard disk 105. The CPU 102 thereby performs the processing carried out by the configuration of the block diagram described above. The CPU 102 then, as necessary, outputs the processing result from an output unit 106 consisting of an LCD (Liquid Crystal Display), a speaker, and the like via the input/output interface 110, transmits it from the communication unit 108, or records it on the hard disk 105.

[0179] In this specification, the processing steps that describe the program for causing the computer to perform the various processes need not be executed in time series in the order given in a flowchart; they also include processes executed in parallel or individually (for example, parallel processing or object-based processing).

[0180] The program may be processed by a single computer or processed in a distributed manner by a plurality of computers. The program may also be transferred to a remote computer and executed there.

【０１８１】[0181]

[Effects of the Invention] According to the model adaptation apparatus, model adaptation method, recording medium, and pattern recognition apparatus of the present invention, a predetermined model is adapted on the basis of extracted data in a predetermined section and a freshness indicating how recent the extracted data is. Performing pattern recognition with that model therefore makes it possible to improve recognition performance.

[Brief description of the drawings]

FIG. 1 is a block diagram showing a configuration example of a speech recognition apparatus to which the present invention is applied.

FIG. 2 is a diagram for explaining the operation of the noise observation section extraction unit 3 of FIG. 1.

FIG. 3 is a block diagram showing a detailed configuration example of the feature extraction unit 5 of FIG. 1.

FIG. 4 is a block diagram showing a detailed configuration example of the speech recognition unit 6 of FIG. 1.

FIG. 5 is a diagram showing an HMM.

FIG. 6 is a diagram showing simulation results.

FIG. 7 is a diagram showing the normal distribution of the silent acoustic model.

FIG. 8 is a block diagram showing a configuration example of the silent acoustic model correction unit 7 of FIG. 1.

FIG. 9 is a diagram showing how discrete values are converted into continuous values.

FIG. 10 is a diagram showing a general freshness function g(t).

FIG. 11 is a diagram showing a first example of the freshness function g(t).

FIG. 12 is a diagram showing a second example of the freshness function g(t).

FIG. 13 is a diagram showing a third example of the freshness function g(t).

FIG. 14 is a diagram showing a fourth example of the freshness function g(t).

FIG. 15 is a diagram showing a fifth example of the freshness function g(t).

FIG. 16 is a diagram showing a sixth example of the freshness function g(t).

FIG. 17 is a block diagram showing a configuration example of one embodiment of a computer to which the present invention is applied.

[Explanation of symbols]

1 microphone, 2 framing unit, 3 noise observation section extraction unit, 4 utterance switch, 5 feature extraction unit, 6 speech recognition unit, 7 silent acoustic model correction unit, 31 freshness function storage unit, 32 correction unit, 101 bus, 102 CPU, 103 ROM, 104 RAM, 105 hard disk, 106 output unit, 107 input unit, 108 communication unit, 109 drive, 110 input/output interface, 111 removable recording medium

Claims

[Claims]

1. A model adaptation apparatus for adapting a model used in pattern recognition that classifies time-series input data into one of a predetermined number of models, comprising: data extraction means for extracting the input data observed in a predetermined section corresponding to a predetermined model and outputting it as extracted data; and model adaptation means for adapting the predetermined model on the basis of the extracted data in the predetermined section and a freshness representing how recent the extracted data is.

2. The model adaptation apparatus according to claim 1, wherein the pattern recognition is performed based on a feature distribution of the input data in a feature space.

3. The model adaptation apparatus according to claim 1, wherein the model adaptation means adapts the predetermined model using, as the freshness, a function whose value changes in accordance with the temporal position of the extracted data in the predetermined section.

4. The model adaptation apparatus according to claim 3, wherein the function is a monotonically increasing function that increases with time.

5. The model adaptation apparatus according to claim 4, wherein the function is a linear or non-linear function.

6. The model adaptation apparatus according to claim 4, wherein said function takes a discrete value or a continuous value.

7. The model adaptation apparatus according to claim 4, wherein the function is a quadratic function or a higher-order function of third or higher order.

8. The model adaptation apparatus according to claim 4, wherein the function is a logarithmic function.

9. The model adaptation apparatus according to claim 1, wherein the input data is audio data.

10. The model adaptation apparatus according to claim 9, wherein the predetermined model is an acoustic model representing noise in a section other than a speech section.

11. A model adaptation method for adapting a model used in pattern recognition that classifies time-series input data into one of a predetermined number of models, the method comprising: a data extraction step of extracting the input data observed in a predetermined section corresponding to a predetermined model and outputting it as extracted data; and a model adaptation step of adapting the predetermined model on the basis of the extracted data in the predetermined section and a freshness representing how recent the extracted data is.

12. A recording medium on which is recorded a program that causes a computer to adapt a model used in pattern recognition that classifies time-series input data into one of a predetermined number of models, the program comprising: a data extraction step of extracting the input data observed in a predetermined section corresponding to a predetermined model and outputting it as extracted data; and a model adaptation step of adapting the predetermined model on the basis of the extracted data in the predetermined section and a freshness representing how recent the extracted data is.

13. A pattern recognition apparatus for classifying time-series input data into one of a predetermined number of models, comprising: feature extraction means for extracting a feature amount of the input data; classification means for classifying the feature amount of the input data into one of the predetermined number of models; data extraction means for extracting the input data observed in a predetermined section corresponding to a predetermined model and outputting it as extracted data; and model adaptation means for adapting the predetermined model on the basis of the extracted data in the predetermined section and a freshness representing how recent the extracted data is.