JP2014013324A

JP2014013324A - Sound model adaptation device, sound model adaptation method, and sound model adaptation program

Info

Publication number: JP2014013324A
Application number: JP2012150743A
Authority: JP
Inventors: Hideji Komeichi; 秀治古明地; Takayuki Arakawa; 隆行荒川; Takafumi Koshinaka; 孝文越仲
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 2012-07-04
Filing date: 2012-07-04
Publication date: 2014-01-23
Anticipated expiration: 2032-07-04
Also published as: JP5966689B2

Abstract

PROBLEM TO BE SOLVED: To provide a sound model adaptation device, a sound model adaptation method, and a sound model adaptation program capable of adapting a sound model to noise with a lower amount of computation without degrading adaptation accuracy. SOLUTION: A first noise adaptation part 20-1 performs noise adaptation by increasing the number of sound models to be adapted to noise. A second noise adaptation part 20-2 performs noise adaptation using linear approximation. An adaptation system selection part 10 selects the first noise adaptation part 20-1 or the second noise adaptation part 20-2 based on the sound model and the statistics of the noise to which the sound model is adapted.

Description

The present invention relates to an acoustic model adaptation device, an acoustic model adaptation method, and an acoustic model adaptation program for adapting an acoustic model to noise.

The performance of a speech recognition device degrades significantly under the influence of noise in actual operation, so noise-robust techniques are required. The degradation is caused by a mismatch between the acoustic model and the test data, which arises because the speech signal used to train the acoustic model (hereinafter, training data) differs from the speech signal to be recognized in actual operation (hereinafter, test data). Model adaptation is a noise-robust technique for speech recognition that aims to suppress this mismatch.

Model adaptation reflects the statistics of the noise contained in the test data (hereinafter, noise statistics) in the acoustic model, thereby bringing the Gaussian mixture distributions that make up the acoustic model closer to the distribution of the test data. The noise statistics are, for example, the mean and variance of the noise features. One model adaptation method is the VTS (Vector Taylor Series) adaptation method (see, for example, Non-Patent Document 1). The VTS adaptation method applies a first-order Taylor approximation to the nonlinear function that defines the relationship among speech, noise, and noise-added speech in an acoustic feature space such as MFCC (Mel-Frequency Cepstral Coefficient), and adapts a clean acoustic model (an acoustic model trained on clean speech) to the noise. In this way, the VTS adaptation method removes the complexity arising from the nonlinear function and performs low-computation noise adaptation using only linear operations.

However, in the VTS adaptation method, the error of the Taylor approximation becomes large when adapting a Gaussian distribution that has a large variance or whose mean lies in a region strongly affected by the nonlinearity, and adaptation accuracy degrades. One countermeasure is to train the acoustic model in advance with more Gaussian distributions than are needed at recognition time. The variance of each Gaussian distribution then becomes smaller, which reduces the error due to the linear approximation. The drawback is that preparing an acoustic model with more Gaussian distributions increases the amount of computation.

When an acoustic model with more Gaussian distributions cannot be prepared, the UT (Unscented Transform) adaptation method can be used to suppress the degradation of adaptation accuracy (see, for example, Patent Document 1). The UT adaptation method generates a set of samples called "sigma points" for each Gaussian distribution, applies noise adaptation to each sample point, and generates a noise-adapted Gaussian distribution. Generating the sample points is comparable to increasing the number of Gaussian distributions. As a result, when an acoustic model with more Gaussian distributions is not available, the UT adaptation method can adapt the acoustic model to noise more accurately than the VTS adaptation method.

Patent Document 1: JP 2010-078650 A

Non-Patent Document 1: A. Acero, L. Deng, T. Kristjansson, and J. Zhang, "HMM Adaptation Using Vector Taylor Series for Noisy Speech Recognition," in Proc. ICSLP, Vol. 3, pp. 869-872, 2000.

However, some of the Gaussian distributions that make up the clean acoustic model have a small adaptation error even when they are adapted with a linear approximation. Applying an accurate but computationally expensive method to every Gaussian distribution of the clean acoustic model, such as VTS adaptation with an increased number of Gaussian distributions or UT adaptation with sigma-point generation, therefore wastes computation.

Accordingly, an object of the present invention is to provide an acoustic model adaptation device, an acoustic model adaptation method, and an acoustic model adaptation program that can adapt an acoustic model to noise with a lower amount of computation without degrading adaptation accuracy.

An acoustic model adaptation device according to the present invention adapts an acoustic model to noise to generate a noise acoustic model, and comprises: a first noise adaptation unit that performs noise adaptation by increasing the number of acoustic models to be adapted to the noise; a second noise adaptation unit that performs noise adaptation using a linear approximation; and an adaptation method selection unit that selects the first noise adaptation unit or the second noise adaptation unit based on the acoustic model and the statistics of the noise to which the acoustic model is adapted.

An acoustic model adaptation method according to the present invention adapts an acoustic model to noise to generate a noise acoustic model, and comprises: selecting, based on the acoustic model and the statistics of the noise to which the acoustic model is adapted, whether to perform noise adaptation by increasing the number of acoustic models to be adapted to the noise or to perform noise adaptation using a linear approximation; and performing noise adaptation according to the selection.

An acoustic model adaptation program according to the present invention is a program for an acoustic model adaptation device that adapts an acoustic model to noise to generate a noise acoustic model. The program causes a computer to select, based on the acoustic model and the statistics of the noise to which the acoustic model is adapted, whether to perform noise adaptation by increasing the number of acoustic models to be adapted to the noise or to perform noise adaptation using a linear approximation, and to perform noise adaptation according to the selection.

According to the present invention, an acoustic model can be adapted to noise with a lower amount of computation while achieving adaptation accuracy comparable to that of a computationally expensive, high-accuracy method, that is, without degrading adaptation accuracy.

A block diagram showing the configuration of the first embodiment of the acoustic model adaptation device according to the present invention.
A flowchart showing an example of the operation of the acoustic model adaptation device in the first embodiment.
A block diagram showing the configuration of the second embodiment of the acoustic model adaptation device according to the present invention.
A flowchart showing an example of the operation of the first noise adaptation unit in the second embodiment.
An explanatory diagram showing an example of the structure of a tree-structured acoustic model that represents the relationship between the set of Gaussian distributions used for recognition and the set of Gaussian distributions used by the first noise adaptation unit.
A block diagram showing the configuration of the third embodiment of the acoustic model adaptation device according to the present invention.
A flowchart showing an example of the operation of the adaptation method selection unit 303 in the third embodiment.
A block diagram showing the minimum configuration of the acoustic model adaptation device according to the present invention.
A block diagram showing another minimum configuration of the acoustic model adaptation device according to the present invention.

Embodiment 1.
A first embodiment of the present invention will be described below with reference to the drawings.

FIG. 1 is a block diagram showing the configuration of the first embodiment of the acoustic model adaptation device according to the present invention. As shown in FIG. 1, the acoustic model adaptation device 100 includes a noise statistics acquisition unit 101, a Gaussian distribution acquisition unit 102, an adaptation method selection unit 103, a first noise adaptation unit 104, a second noise adaptation unit 105, and a Gaussian distribution storage unit 106.

As also shown in FIG. 1, the acoustic model adaptation device 100 is connected to a clean acoustic model storage device 1 and a noise statistics storage device 2, which store the information input to the acoustic model adaptation device 100. The acoustic model adaptation device 100 is also connected to a noise-adapted acoustic model storage device 3, which stores the information output by the acoustic model adaptation device 100.

The noise statistics acquisition unit 101, the Gaussian distribution acquisition unit 102, the adaptation method selection unit 103, the first noise adaptation unit 104, the second noise adaptation unit 105, and the Gaussian distribution storage unit 106 are realized by a CPU or the like provided in the acoustic model adaptation device 100.

FIG. 2 is a flowchart showing an example of the operation of the acoustic model adaptation device 100 in the first embodiment.

As shown in FIG. 2, the noise statistics acquisition unit 101 acquires the noise statistics from the noise statistics storage device 2 (step S101). The Gaussian distribution acquisition unit 102 acquires the Gaussian distribution parameters that make up the clean acoustic model one at a time from the clean acoustic model storage device 1 (step S102). Based on the noise statistics acquired in step S101 and the Gaussian distribution parameters acquired in step S102, the adaptation method selection unit 103 selects whether to use the method of the first noise adaptation unit 104 or the method of the second noise adaptation unit 105 (step S103). In other words, it selects which of the first noise adaptation unit 104 and the second noise adaptation unit 105 performs the noise adaptation.

When the adaptation method selection unit 103 selects the method of the first noise adaptation unit 104 (Yes in step S103), the first noise adaptation unit 104 adapts the Gaussian distribution parameters to the noise (step S104). When the adaptation method selection unit 103 selects the method of the second noise adaptation unit 105 (No in step S103), the second noise adaptation unit 105 adapts the Gaussian distribution parameters to the noise (step S105).

The Gaussian distribution storage unit 106 stores the noise-adapted Gaussian distribution parameters (hereinafter, the noise-adapted acoustic model) in the noise-adapted acoustic model storage device 3 (step S106).
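The flow of FIG. 2 can be sketched as a per-Gaussian dispatch loop. The following Python is a minimal sketch under stated assumptions, not the patented implementation: the names `adapt_model`, `use_first`, `first_adapt`, and `second_adapt` are hypothetical, the storage devices are modeled as plain dictionaries, and each Gaussian is simplified to a (mean, variance) pair of lists.

```python
from typing import Callable, Dict, List, Tuple

# Simplification for illustration: a Gaussian is (mean, diagonal variance).
Gaussian = Tuple[List[float], List[float]]


def adapt_model(
    clean_model: Dict[int, Gaussian],
    noise_stats: Gaussian,
    use_first: Callable[[Gaussian, Gaussian], bool],
    first_adapt: Callable[[Gaussian, Gaussian], Gaussian],
    second_adapt: Callable[[Gaussian, Gaussian], Gaussian],
) -> Dict[int, Gaussian]:
    """Adapt every Gaussian of the clean model, choosing a method per Gaussian."""
    adapted: Dict[int, Gaussian] = {}
    # Step S102: take the Gaussians of the clean model one at a time.
    for dist_id, gaussian in clean_model.items():
        # Step S103: choose between the two adaptation units.
        if use_first(gaussian, noise_stats):
            # Step S104: the more accurate, more expensive method.
            adapted[dist_id] = first_adapt(gaussian, noise_stats)
        else:
            # Step S105: the cheaper linear-approximation method.
            adapted[dist_id] = second_adapt(gaussian, noise_stats)
    # Step S106: the caller stores the returned noise-adapted model.
    return adapted
```

Passing the selection rule and both adaptation methods as callables keeps the loop independent of which concrete methods fill the two roles.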

Next, each component of the acoustic model adaptation device 100 in this embodiment is described in detail.

First, the clean acoustic model storage device 1 and the noise statistics storage device 2, which store the information input to the acoustic model adaptation device 100, and the noise-adapted acoustic model storage device 3, which stores the information output by the acoustic model adaptation device 100, are described. Then the components of the acoustic model adaptation device 100 are described: the noise statistics acquisition unit 101, the Gaussian distribution acquisition unit 102, the adaptation method selection unit 103, the first noise adaptation unit 104, the second noise adaptation unit 105, and the Gaussian distribution storage unit 106.

The clean acoustic model storage device 1 stores a clean acoustic model trained using clean speech as training data. In the following, the features used for training and recognition are 13-dimensional MFCCs including the C0 feature, which corresponds to power. The C0 feature is the zeroth-order element of the 13-dimensional MFCC vector. A 39-dimensional vector consisting of the 13-dimensional MFCCs, their first-order dynamic components (13 dimensions), and their second-order dynamic components (13 dimensions) may be used instead. Any features, not only those given here as examples, may be used as long as they include a feature corresponding to power. In the following description, the mean and variance of each Gaussian distribution of the clean acoustic model are written as follows.

$\mu_{x,i},\ \Sigma_{x,i}\quad (i = 1, \dots, N)$

Here, the subscript x indicates a parameter of the clean acoustic model, the subscript i indicates the distribution ID number of the Gaussian distribution, and N is the total number of Gaussian distributions in the clean acoustic model.

The noise statistics storage device 2 stores the statistics of the noise used for adaptation. In this embodiment, the noise statistics storage device 2 stores, as the noise statistics, the mean and variance of the noise in the same feature domain as the one used for recognition. A feature domain is the set or space of features produced by a given processing step. In the following description, the mean and variance of the noise are written as follows.

$\mu_n,\ \Sigma_n$
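As an illustration of what such statistics look like in practice, the sketch below estimates a mean vector and covariance matrix from a batch of noise feature frames. The text does not state how the stored statistics are estimated, so the function name `noise_statistics`, the frame layout, and the use of the biased (maximum-likelihood) covariance are all assumptions.

```python
import numpy as np


def noise_statistics(noise_frames: np.ndarray):
    """Estimate (mu_n, Sigma_n) from noise-only feature frames.

    noise_frames has shape (num_frames, dim); each row is one feature
    vector (for example a 13-dimensional MFCC vector) of a noise-only frame.
    """
    mu_n = noise_frames.mean(axis=0)
    # bias=True divides by num_frames (maximum-likelihood estimate).
    sigma_n = np.cov(noise_frames, rowvar=False, bias=True)
    return mu_n, sigma_n
```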

The noise-adapted acoustic model storage device 3 stores the noise-adapted acoustic model produced by the acoustic model adaptation device 100. In the following description, the mean and variance that are the parameters of each Gaussian distribution of the noise-adapted acoustic model are written as follows.

$\mu_{\bar{y},i},\ \Sigma_{\bar{y},i}\quad (i = 1, \dots, N)$

Here, the subscript $\bar{y}$ indicates a parameter of the noise-adapted acoustic model, and the subscript i indicates the distribution ID number of the Gaussian distribution.

The noise statistics acquisition unit 101 acquires the noise statistics $\mu_n, \Sigma_n$ stored in the noise statistics storage device 2 and passes them to the adaptation method selection unit 103, the first noise adaptation unit 104, and the second noise adaptation unit 105.

The Gaussian distribution acquisition unit 102 acquires the N sets of Gaussian distribution parameters $\mu_{x,i}, \Sigma_{x,i}$ $(i = 1, \dots, N)$ of the clean acoustic model stored in the clean acoustic model storage device 1 one at a time and passes them to the adaptation method selection unit 103.

The adaptation method selection unit 103 compares the Gaussian distribution parameters $\{\mu_{x,i}, \Sigma_{x,i}\}$ of the clean acoustic model, passed from the Gaussian distribution acquisition unit 102, with the noise statistics $\{\mu_n, \Sigma_n\}$, passed from the noise statistics acquisition unit 101. According to the result of the comparison, the adaptation method selection unit 103 selects whether the Gaussian distribution parameters $\{\mu_{x,i}, \Sigma_{x,i}\}$ should be adapted by the first noise adaptation unit 104 or by the second noise adaptation unit 105. As shown below, the comparison introduces a scalar function $\mathrm{Comp}(\mu_{x,i}, \Sigma_{x,i}, \mu_n, \Sigma_n)$ and checks whether its value is at least the threshold Th or below it.

When Equation 1 is satisfied, the adaptation method selection unit 103 passes the Gaussian distribution parameters $\{\mu_{x,i}, \Sigma_{x,i}\}$ to the second noise adaptation unit 105.

When Equation 2 is satisfied, the adaptation method selection unit 103 passes the Gaussian distribution parameters $\{\mu_{x,i}, \Sigma_{x,i}\}$ to the first noise adaptation unit 104.

Next, specific examples of $\mathrm{Comp}(\mu_{x,i}, \Sigma_{x,i}, \mu_n, \Sigma_n)$ are described.

To select the adaptation method, it suffices, for example, to examine the difference between the C0 features of the Gaussian mean $\mu_{x,i}$ and the noise mean $\mu_n$. Writing the C0 features of the Gaussian mean $\mu_{x,i}$ and of the noise mean $\mu_n$ as $(\mu_{x,i})_0$ and $(\mu_n)_0$, respectively, the comparison function is given by Equation 3.

This exploits two facts: the C0 feature is a power-related feature, and the magnitude of the difference between the speech power and the noise power affects how nonlinear the function describing noise-added speech is.
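A C0-based selection rule of this kind can be sketched as follows. Equations 1 to 3 are not reproduced in this text, so the concrete form here is an assumption: the comparison value is taken to be the absolute difference of the two C0 components, and the direction of the test follows the summary at the end of Embodiment 1, which states that the UT adaptation method (first noise adaptation unit) is applied when the difference is at least a fixed value. The function names are hypothetical.

```python
from typing import Sequence


def comp_c0_difference(mu_x: Sequence[float], mu_n: Sequence[float]) -> float:
    """Absolute difference between the C0 (index 0) components of the means.

    Assumed form of the comparison function; the original equation may differ.
    """
    return abs(mu_x[0] - mu_n[0])


def select_unit(mu_x: Sequence[float], mu_n: Sequence[float], threshold: float) -> str:
    """Return "first" (e.g. UT) or "second" (e.g. VTS) for one Gaussian."""
    if comp_c0_difference(mu_x, mu_n) >= threshold:
        return "first"
    return "second"
```

The threshold is a tuning parameter; the text notes that it may be determined experimentally.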

The C0 feature of the variance of the Gaussian distribution may also be used to select the adaptation method, because the adaptation error caused by the linear approximation also depends on the magnitude of the variance of the Gaussian distribution being adapted. Let $(f(x,n))_0$ be the nonlinear function giving the C0 feature of noise-added speech, where x and n are the speech and noise features, respectively. Specifically, $f(x,n)$ is given by Equation 4, in which $D$ denotes the DCT matrix and $D^{-1}$ denotes the inverse DCT matrix.

The C0 feature of the Taylor approximation of $(f(x,n))_0$ at $x = \mu_{x,i}$, $n = \mu_n$ is written $(\bar{f}_{\mu_{x,i},\mu_n}(x,n))_0$, where $\bar{f}_{\mu_{x,i},\mu_n}$ is given by Equation 5.

$F_i$ in Equation 5 denotes the Jacobian of $f(x,n)$ with respect to x at $x = \mu_{x,i}$, $n = \mu_n$. The two sigma points for the C0 feature derived from the variance $\Sigma_{x,i}$ of the Gaussian distribution are $\sigma_{1,0} = +\sqrt{(\Sigma_{x,i})_0}$ and $\sigma_{2,0} = -\sqrt{(\Sigma_{x,i})_0}$. The comparison function is then given by Equation 6. Here, $(\Sigma_{x,i})_0$ is the vector in column 0.
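The quantities named above can be illustrated numerically. Equations 4 to 6 are not reproduced in this text, so the sketch below rests on assumptions: it uses the mismatch function commonly used in VTS adaptation, f(x, n) = D log(exp(D⁻¹x) + exp(D⁻¹n)), a square orthonormal DCT matrix (so that D⁻¹ is the transpose), a diagonal covariance, and, as one plausible reading of the comparison, the summed absolute difference between f and its first-order Taylor approximation in the C0 component at the two sigma points. All function names are hypothetical.

```python
import numpy as np


def dct_matrix(dim):
    """Orthonormal DCT-II matrix; its inverse is its transpose (assumption)."""
    k = np.arange(dim)[:, None]
    m = np.arange(dim)[None, :]
    mat = np.sqrt(2.0 / dim) * np.cos(np.pi * k * (2 * m + 1) / (2 * dim))
    mat[0, :] = np.sqrt(1.0 / dim)
    return mat


def mismatch(x, n, D, D_inv):
    """f(x, n) = D log(exp(D^-1 x) + exp(D^-1 n)), computed stably."""
    return D @ np.logaddexp(D_inv @ x, D_inv @ n)


def jacobian_x(mu_x, mu_n, D, D_inv):
    """Jacobian F of f with respect to x, evaluated at (mu_x, mu_n)."""
    gain = 1.0 / (1.0 + np.exp(D_inv @ (mu_n - mu_x)))
    return D @ np.diag(gain) @ D_inv


def linearized(x, n, mu_x, mu_n, D, D_inv):
    """First-order Taylor approximation of f around (mu_x, mu_n)."""
    F = jacobian_x(mu_x, mu_n, D, D_inv)
    G = np.eye(len(mu_x)) - F  # Jacobian with respect to n
    return mismatch(mu_x, mu_n, D, D_inv) + F @ (x - mu_x) + G @ (n - mu_n)


def c0_linearization_error(mu_x, var_x, mu_n, D, D_inv):
    """Assumed comparison value: summed |f - f_bar| in C0 at two sigma points."""
    mu_x = np.asarray(mu_x, dtype=float)
    offset = np.zeros_like(mu_x)
    offset[0] = np.sqrt(var_x[0])  # +/- sqrt of the C0 variance
    total = 0.0
    for sign in (1.0, -1.0):
        x = mu_x + sign * offset
        exact = mismatch(x, mu_n, D, D_inv)[0]
        approx = linearized(x, mu_n, mu_x, mu_n, D, D_inv)[0]
        total += abs(exact - approx)
    return total
```

A larger value indicates that the linear approximation is less trustworthy for that Gaussian, which is the situation in which the more accurate adaptation method is worth its cost.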

The functions $\mathrm{Comp}(\mu_{x,i}, \Sigma_{x,i}, \mu_n, \Sigma_n)$ of Equations 3 and 6 may be denoted $\mathrm{Comp}_1(\mu_{x,i}, \Sigma_{x,i}, \mu_n, \Sigma_n)$ and $\mathrm{Comp}_2(\mu_{x,i}, \Sigma_{x,i}, \mu_n, \Sigma_n)$, respectively, and their linear sum may be used as the comparison function. Equation 7 shows the comparison function in that case, where $w_1$ and $w_2$ are weights.
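Written out from the description above, the weighted combination takes the following form. This is a reconstruction from the prose, and the layout of the original Equation 7 may differ.

```latex
\mathrm{Comp}(\mu_{x,i}, \Sigma_{x,i}, \mu_n, \Sigma_n)
  = w_1\,\mathrm{Comp}_1(\mu_{x,i}, \Sigma_{x,i}, \mu_n, \Sigma_n)
  + w_2\,\mathrm{Comp}_2(\mu_{x,i}, \Sigma_{x,i}, \mu_n, \Sigma_n)
```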

In Equation 6, features other than the C0 feature may also be used. For example, if all features are used, the comparison function can be expressed as Equation 8.

Here, J denotes the number of feature dimensions, and $\sigma_{1,j} = +\sqrt{(\Sigma_{x,i})_j}$, $\sigma_{2,j} = -\sqrt{(\Sigma_{x,i})_j}$, where $(\Sigma_{x,i})_j$ denotes the j-th column vector of the matrix $\Sigma_{x,i}$. The optimal threshold Th in Equations 1 and 2 and the combination weights in Equation 7 may be determined experimentally.

The second noise adaptation unit 105 outputs $\mu_{\bar{y},i}, \Sigma_{\bar{y},i}$ using the VTS adaptation method. When the features are 13-dimensional MFCCs including the C0 feature corresponding to power, the conversion equations of the VTS adaptation method are expressed as follows.

The second noise adaptation unit 105 passes $\{\mu_{\bar{y},i}, \Sigma_{\bar{y},i}\}$ to the Gaussian distribution storage unit 106. The conversion equations of the VTS adaptation method for the parameters $\{\Delta\mu_{x,i}, \Delta\Sigma_{x,i}\}$ of the first-order dynamic features and the parameters $\{\Delta\Delta\mu_{x,i}, \Delta\Delta\Sigma_{x,i}\}$ of the second-order dynamic features are expressed as follows.

Here, $\Delta\Sigma_n$ and $\Delta\Delta\Sigma_n$ denote the variances of the first-order and second-order dynamic features of the noise.
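The conversion equations themselves are not reproduced in this text. As a hedged illustration, the sketch below uses the first-order VTS update commonly given in the literature: the adapted mean is f evaluated at the two means, and the adapted covariance is F Σ_x Fᵀ + (I − F) Σ_n (I − F)ᵀ, with the same Jacobian applied to the dynamic parameters. The mismatch function f(x, n) = D log(exp(D⁻¹x) + exp(D⁻¹n)) and the function names are assumptions, so the patent's own equations may differ in detail.

```python
import numpy as np


def vts_adapt(mu_x, sigma_x, mu_n, sigma_n, D, D_inv):
    """First-order VTS adaptation of one Gaussian (static features).

    Returns the adapted mean, the adapted covariance, and the Jacobian F.
    """
    gain = 1.0 / (1.0 + np.exp(D_inv @ (mu_n - mu_x)))
    F = D @ np.diag(gain) @ D_inv          # Jacobian with respect to speech
    G = np.eye(len(mu_x)) - F              # Jacobian with respect to noise
    mu_y = D @ np.logaddexp(D_inv @ mu_x, D_inv @ mu_n)
    sigma_y = F @ sigma_x @ F.T + G @ sigma_n @ G.T
    return mu_y, sigma_y, F


def vts_adapt_delta(d_mu_x, d_sigma_x, d_sigma_n, F):
    """Adapt first- or second-order dynamic parameters with the same Jacobian."""
    G = np.eye(F.shape[0]) - F
    d_mu_y = F @ d_mu_x
    d_sigma_y = F @ d_sigma_x @ F.T + G @ d_sigma_n @ G.T
    return d_mu_y, d_sigma_y
```

Only one evaluation of the nonlinear function and a few matrix products are needed per Gaussian, which is why this method is the low-computation option.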

The first noise adaptation unit 104 outputs $\mu_{\bar{y},i}, \Sigma_{\bar{y},i}$ using the UT adaptation method. When the features are 13-dimensional MFCCs including the C0 feature corresponding to power, the conversion equations of the UT adaptation method are expressed as follows.

Here, $S_k$ denotes a sigma point and is given by Equation 17.

$\mu_{s,i}, \Sigma_{s,i}$ are given by Equation 18.

Here, $D = 13$, $(\sqrt{\Sigma})_k$ denotes the vector in the k-th column of the matrix $\sqrt{\Sigma}$, and $w_k = 1/(4D)$. The first noise adaptation unit 104 then passes $\{\mu_{\bar{y},i}, \Sigma_{\bar{y},i}\}$ to the Gaussian distribution storage unit 106. The conversion equations of the UT adaptation method for the parameters $\{\Delta\mu_{x,i}, \Delta\Sigma_{x,i}\}$ of the first-order dynamic features and the parameters $\{\Delta\Delta\mu_{x,i}, \Delta\Delta\Sigma_{x,i}\}$ of the second-order dynamic features are expressed as follows.

$F'_i$ denotes the Jacobian of $\mu_{\bar{y},i}$ in Equation 15 with respect to $\mu_{x,i}$.
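Equations 15 to 18 are not reproduced in this text. The sketch below shows one common way to realize an unscented-transform adaptation that is consistent with the stated weight $w_k = 1/(4D)$: speech and noise are stacked into a 2D-dimensional vector, 4D symmetric sigma points are drawn from its block-diagonal covariance, each point is passed through the nonlinear function, and the weighted sample mean and covariance are taken. The mismatch function, the Cholesky square root, and all names are assumptions. In the code, `dim` plays the role of the feature dimension D, and `D` is the DCT matrix.

```python
import numpy as np


def ut_adapt(mu_x, sigma_x, mu_n, sigma_n, D, D_inv):
    """Unscented-transform adaptation of one Gaussian (static features)."""
    dim = len(mu_x)
    # Stack speech and noise into one augmented Gaussian.
    mu_s = np.concatenate([mu_x, mu_n])
    sigma_s = np.zeros((2 * dim, 2 * dim))
    sigma_s[:dim, :dim] = sigma_x
    sigma_s[dim:, dim:] = sigma_n
    # Columns of the square root give the sigma-point offsets.
    root = np.linalg.cholesky(2 * dim * sigma_s)
    points = [mu_s + sign * root[:, k]
              for k in range(2 * dim) for sign in (1.0, -1.0)]
    weight = 1.0 / (4 * dim)  # 4*dim points with equal weights
    # Pass every sigma point through the nonlinear mismatch function.
    ys = np.array([D @ np.logaddexp(D_inv @ p[:dim], D_inv @ p[dim:])
                   for p in points])
    mu_y = weight * ys.sum(axis=0)
    diff = ys - mu_y
    sigma_y = weight * (diff.T @ diff)
    return mu_y, sigma_y
```

Each Gaussian requires 4D evaluations of the nonlinear function instead of one, which is the extra cost that the selection step tries to avoid where the linear approximation is already accurate.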

The Gaussian distribution storage unit 106 stores the noise-adapted Gaussian distribution parameters $\{\mu_{\bar{y},i}, \Sigma_{\bar{y},i}\}$ in the noise-adapted acoustic model storage device 3.

As described above, in this embodiment the UT adaptation method is applied when the difference between the C0 feature of the mean of a Gaussian distribution and the C0 feature of the mean of the noise is at least a fixed value. According to this embodiment, the acoustic model can therefore be adapted to noise with comparable adaptation accuracy and with less computation than when the UT adaptation method is applied to all Gaussian distributions.

Embodiment 2.
A second embodiment of the present invention will be described below with reference to the drawings.

FIG. 3 is a block diagram showing the configuration of the second embodiment of the acoustic model adaptation device according to the present invention.

The configuration of the acoustic model adaptation device 200 is the same as that of the acoustic model adaptation device 100 in the first embodiment, except that, as shown in FIG. 3, the acoustic model adaptation device 200 additionally includes a detailed Gaussian distribution acquisition unit 207.

The acoustic model adaptation device 200 also includes an adaptation method selection unit 203 and a first noise adaptation unit 204 in place of the adaptation method selection unit 103 and the first noise adaptation unit 104.

In addition to the clean acoustic model storage device 1 and the noise statistics storage device 2, the acoustic model adaptation device 200 is connected to a detailed clean acoustic model storage device 7, which stores information input to the acoustic model adaptation device 200.

The adaptation method selection unit 203, the first noise adaptation unit 204, and the detailed Gaussian distribution acquisition unit 207 are realized by a CPU or the like provided in the acoustic model adaptation device 200.

The outline of the operation of the acoustic model adaptation device 200 in this embodiment is the same as that of the acoustic model adaptation device 100 shown in FIG. 2, so its description is omitted.

Next, each component of the acoustic model adaptation device 200 in this embodiment is described in detail.

The detailed clean acoustic model storage device 7 stores a detailed clean acoustic model, which is trained with more parameters than the clean acoustic model stored in the clean acoustic model storage device 1. In the following description, the mean and variance of each Gaussian distribution of the detailed clean acoustic model are written as follows.

$\mu_{x,ij},\ \Sigma_{x,ij}\quad (i = 1, \dots, N,\ j = 1, \dots, N_i)$

Here, $\mu_{x,ij}, \Sigma_{x,ij}$ are the parameters of the j-th Gaussian distribution derived from the Gaussian distribution whose distribution ID in the clean acoustic model is i.

The adaptation method selection unit 203 compares the Gaussian distribution parameters $\{\mu_{x,i}, \Sigma_{x,i}\}$ of the clean acoustic model, passed from the Gaussian distribution acquisition unit 102, with the noise statistics $\{\mu_n, \Sigma_n\}$, passed from the noise statistic acquisition unit 101. The comparison may use the same method as the adaptation method selection unit 103 in the first embodiment. Based on the result, the adaptation method selection unit 203 selects whether the Gaussian distribution parameters $\{\mu_{x,i}, \Sigma_{x,i}\}$ should be noise-adapted by the first noise adaptation unit 204 or by the second noise adaptation unit 105. When it selects the first noise adaptation unit 204, it passes the Gaussian distribution parameters $\{\mu_{x,i}, \Sigma_{x,i}\}$ of the clean acoustic model to the detailed Gaussian distribution acquisition unit 207.

Based on the ID number $i$ of the Gaussian distribution parameters $\{\mu_{x,i}, \Sigma_{x,i}\}$ received from the adaptation method selection unit 203, the detailed Gaussian distribution acquisition unit 207 acquires the $N_i$ Gaussian distribution parameters $\{\mu_{x,ij}, \Sigma_{x,ij}\}$ $(j = 1, \ldots, N_i)$ from the detailed clean acoustic model storage device 7. It then passes these $N_i$ Gaussian distribution parameters to the first noise adaptation unit 204.

The first noise adaptation unit 204 performs noise adaptation on the $N_i$ Gaussian distribution parameters $\{\mu_{x,ij}, \Sigma_{x,ij}\}$ $(j = 1, \ldots, N_i)$ and outputs the noise-adapted Gaussian distribution parameters $\{\mu_{\bar{y},i}, \Sigma_{\bar{y},i}\}$.

FIG. 4 is a flowchart showing an example of the operation of the first noise adaptation unit 204 in the second embodiment.

As shown in FIG. 4, the first noise adaptation unit 204 acquires the $N_i$ Gaussian distribution parameters $\{\mu_{x,ij}, \Sigma_{x,ij}\}$ $(j = 1, \ldots, N_i)$ that correspond to the ID number $i$ of the Gaussian distribution parameters $\{\mu_{x,i}, \Sigma_{x,i}\}$ received from the adaptation method selection unit 203 (step S2041).

FIG. 5 is an explanatory diagram showing an example of a tree-structured acoustic model, which represents the relationship between the set of Gaussian distributions used for recognition and the set of Gaussian distributions used by the first noise adaptation unit 204.

For each of these Gaussian distributions, the first noise adaptation unit 204 applies the VTS adaptation method shown in Equations 9 and 10 and obtains $N_i$ Gaussian distribution parameters $\{\mu_{\bar{y},ij}, \Sigma_{\bar{y},ij}\}$ $(j = 1, \ldots, N_i)$ adapted to the noise statistics $\{\mu_n, \Sigma_n\}$ (step S2042).

The first noise adaptation unit 204 integrates the $N_i$ noise-adapted Gaussian distribution parameters $\{\mu_{\bar{y},ij}, \Sigma_{\bar{y},ij}\}$ $(j = 1, \ldots, N_i)$ into a single set of noise-adapted Gaussian distribution parameters $\{\mu_{\bar{y},i}, \Sigma_{\bar{y},i}\}$ (step S2042).

Here, $w'_j$ is a mixing weight satisfying $\sum_{j=1}^{N_i} w'_j = 1$, where the sum runs from $j = 1$ to $j = N_i$. The mixing weights may be determined experimentally, or may be set to equal probability, that is, $1/N_i$. The first noise adaptation unit 204 then passes $\{\mu_{\bar{y},i}, \Sigma_{\bar{y},i}\}$ to the Gaussian distribution storage unit 106.
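The flow described above (fetch the derived Gaussians, adapt each one, then integrate them with the mixing weights) can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: covariances are taken to be diagonal, the integration uses moment matching because the integration formula is not reproduced in this text, and the per-Gaussian adaptation is supplied by the caller. The names `merge_gaussians`, `adapt_with_detailed_model`, `detailed_model`, and `adapt_one` are hypothetical.

```python
from typing import Callable, Dict, List, Optional, Sequence, Tuple

# A Gaussian is (mean vector, diagonal of the covariance); an assumption for brevity.
Gaussian = Tuple[List[float], List[float]]


def merge_gaussians(components: Sequence[Gaussian],
                    weights: Optional[Sequence[float]] = None) -> Gaussian:
    """Merge weighted Gaussians into one by moment matching (assumed rule)."""
    n = len(components)
    if weights is None:
        weights = [1.0 / n] * n  # equal probability, 1 / N_i
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("mixing weights must sum to 1")
    dim = len(components[0][0])
    mean = [sum(w * m[d] for w, (m, _) in zip(weights, components))
            for d in range(dim)]
    # Second moment of the mixture minus the squared merged mean.
    var = [sum(w * (v[d] + m[d] ** 2) for w, (m, v) in zip(weights, components))
           - mean[d] ** 2
           for d in range(dim)]
    return mean, var


def adapt_with_detailed_model(i: int,
                              detailed_model: Dict[int, List[Gaussian]],
                              adapt_one: Callable[[Gaussian], Gaussian],
                              weights: Optional[Sequence[float]] = None) -> Gaussian:
    """Adapt the N_i Gaussians derived from distribution ID i and merge them."""
    derived = detailed_model[i]                 # acquire the derived Gaussians
    adapted = [adapt_one(g) for g in derived]   # adapt each one (e.g. by VTS)
    return merge_gaussians(adapted, weights)    # integrate into one Gaussian
```

Passing the adaptation step as a function keeps the sketch independent of how Equations 9 and 10 are implemented.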

As described above, in this embodiment, the number of Gaussian mixture components is increased before VTS adaptation when the difference between the C0 feature of the Gaussian mean and the C0 feature of the noise mean is equal to or greater than a certain value. This prevents the amount of computation from growing across the adaptation of all Gaussian distributions. In other words, the acoustic model can be noise-adapted with a smaller amount of computation, without degrading adaptation accuracy.

Embodiment 3.
A third embodiment of the present invention is described below with reference to the drawings.

FIG. 6 is a block diagram showing the configuration of the acoustic model adaptation device according to the third embodiment of the present invention.

The configuration of the acoustic model adaptation device 300 is the same as that of the acoustic model adaptation device 100 in the first embodiment.

However, as shown in FIG. 6, the acoustic model adaptation device 300 includes an adaptation method selection unit 303 in place of the adaptation method selection unit 103. It also includes a first noise adaptation unit 3041, a second noise adaptation unit 3042, a third noise adaptation unit 3043 (not shown), and a fourth noise adaptation unit 3044 in place of the first noise adaptation unit 104 and the second noise adaptation unit 105.

The outline of the operation of the acoustic model adaptation device 300 in this embodiment is the same as that of the acoustic model adaptation device 100 shown in FIG. 2, except that the branches corresponding to steps S104 and S105 in FIG. 2 increase with the number of noise adaptation units.

The adaptation method selection unit 303, the first noise adaptation unit 3041, the second noise adaptation unit 3042, the third noise adaptation unit 3043, and the fourth noise adaptation unit 3044 are realized by a CPU or the like included in the acoustic model adaptation device 300.

Next, each component of the acoustic model adaptation device 300 in this embodiment is described in detail.

The adaptation method selection unit 303 compares the Gaussian distribution parameters $\{\mu_{x,i}, \Sigma_{x,i}\}$ of the clean acoustic model with the noise statistics $\{\mu_n, \Sigma_n\}$. Based on the result, it decides which of the first to fourth noise adaptation units should noise-adapt the Gaussian distribution parameters $\{\mu_{x,i}, \Sigma_{x,i}\}$. For this decision, it uses the scalar function $\mathrm{Comp}(\mu_{x,i}, \Sigma_{x,i}, \mu_n, \Sigma_n)$ described in the first embodiment, together with two thresholds $Th_1$ and $Th_2$ (where $Th_1 < Th_2$). $Th_1$ and $Th_2$ may be determined experimentally.

FIG. 7 is a flowchart showing an example of the operation of the adaptation method selection unit 303 in the third embodiment.

As shown in FIG. 7, the adaptation method selection unit 303 acquires the Gaussian distribution parameters $\{\mu_{x,i}, \Sigma_{x,i}\}$ of the clean acoustic model from the Gaussian distribution acquisition unit 102 (step S3031) and acquires $\{\mu_n, \Sigma_n\}$ from the noise statistic acquisition unit 101 (step S3032).

First, the adaptation method selection unit 303 compares the first threshold $Th_1$ with $\mathrm{Comp}(\mu_{x,i}, \Sigma_{x,i}, \mu_n, \Sigma_n)$ (step S3033).

If Equation 25 is satisfied (Yes in step S3033), the adaptation method selection unit 303 passes the Gaussian distribution parameters $\{\mu_{x,i}, \Sigma_{x,i}\}$ to the first noise adaptation unit 3041 (step S3036), which then performs noise adaptation on them.

Otherwise (No in step S3033), the adaptation method selection unit 303 compares the second threshold $Th_2$ with $\mathrm{Comp}(\mu_{x,i}, \Sigma_{x,i}, \mu_n, \Sigma_n)$ (step S3034).

If Equation 26 is satisfied (Yes in step S3034), the adaptation method selection unit 303 passes the Gaussian distribution parameters $\{\mu_{x,i}, \Sigma_{x,i}\}$ to the second noise adaptation unit 3042 (step S3037), which then performs noise adaptation on them.

Otherwise (No in step S3034), the power difference between the parameters of the clean acoustic model and the noise statistics is considered to be large. Specifically, there are two cases: the noise is large, or the parameters of the clean acoustic model are large. To distinguish the two, $(\mu_{x,i})_0$ and $(\mu_n)_0$ are compared (step S3035).

If Equation 27 is satisfied (Yes in step S3035), the adaptation method selection unit 303 passes the Gaussian distribution parameters $\{\mu_{x,i}, \Sigma_{x,i}\}$ to the third noise adaptation unit 3043 (step S3038), which then performs noise adaptation on them.

Otherwise (No in step S3035), the adaptation method selection unit 303 passes the Gaussian distribution parameters $\{\mu_{x,i}, \Sigma_{x,i}\}$ to the fourth noise adaptation unit 3044 (step S3039), which then performs noise adaptation on them.
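The branching in steps S3033 to S3039 can be sketched as a small function. Equations 25 to 27 are not reproduced in this text, so the inequalities below are assumptions: Equation 25 is taken as $\mathrm{Comp} < Th_1$, Equation 26 as $\mathrm{Comp} < Th_2$, and Equation 27 as $(\mu_{x,i})_0 > (\mu_n)_0$. The function name and parameter names are hypothetical, and the value of Comp is assumed to be computed elsewhere.

```python
def select_adaptation_unit(comp_value: float, th1: float, th2: float,
                           mu_x_c0: float, mu_n_c0: float) -> int:
    """Return 1, 2, 3, or 4 for the first to fourth noise adaptation unit.

    All three inequalities are assumed forms of Equations 25, 26, and 27.
    """
    if not th1 < th2:
        raise ValueError("thresholds must satisfy Th_1 < Th_2")
    if comp_value < th1:      # assumed Equation 25 (step S3033)
        return 1              # first unit: UT adaptation
    if comp_value < th2:      # assumed Equation 26 (step S3034)
        return 2              # second unit: VTS adaptation
    # Large power difference: decide which side dominates (step S3035).
    if mu_x_c0 > mu_n_c0:     # assumed Equation 27
        return 3              # third unit: output the clean parameters
    return 4                  # fourth unit: output the noise statistics


if __name__ == "__main__":
    print(select_adaptation_unit(0.5, th1=1.0, th2=2.0, mu_x_c0=10.0, mu_n_c0=5.0))
```

Returning an index rather than calling the unit directly keeps the selection logic separate from the adaptation methods themselves.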

The first noise adaptation unit 3041 applies the UT adaptation method shown in Equations 15 and 16 and outputs $\{\mu_{\bar{y},i}, \Sigma_{\bar{y},i}\}$.

The second noise adaptation unit 3042 applies the VTS adaptation method shown in Equations 9 and 10 and outputs $\{\mu_{\bar{y},i}, \Sigma_{\bar{y},i}\}$.

The third noise adaptation unit 3043 outputs $\{\mu_{x,i}, \Sigma_{x,i}\}$ as $\{\mu_{\bar{y},i}, \Sigma_{\bar{y},i}\}$.

The fourth noise adaptation unit 3044 outputs $\{\mu_n, \Sigma_n\}$ as $\{\mu_{\bar{y},i}, \Sigma_{\bar{y},i}\}$.

The $\{\mu_{\bar{y},i}, \Sigma_{\bar{y},i}\}$ output from each noise adaptation unit is stored in the Gaussian distribution storage unit 106.

As described above, by including the third and fourth noise adaptation units, which require less computation than the VTS adaptation method, this embodiment can reduce the amount of computation further than the acoustic model adaptation device 100 of the first embodiment while maintaining accuracy.

Although this embodiment uses a model adaptation device with four noise adaptation units as an example, the number of noise adaptation units is not limited to four. That is, the model adaptation device 300 may include any number of noise adaptation units that differ in amount of computation and adaptation accuracy. For example, it may include as many noise adaptation units as there are levels of approximation granularity in the adaptation.
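One way to generalize the selection to an arbitrary number of noise adaptation units is to keep the thresholds in a sorted list and look up the interval that contains the comparison value. This is only a sketch of that idea, assuming the units are ordered by increasing threshold; the function name `select_unit_index` is hypothetical and the text does not prescribe this scheme.

```python
from bisect import bisect_right
from typing import Sequence


def select_unit_index(comp_value: float, thresholds: Sequence[float]) -> int:
    """Return the index of the unit whose threshold interval contains comp_value.

    With k thresholds there are k + 1 units: index 0 covers values below the
    first threshold, and index k covers values at or above the last one.
    """
    if list(thresholds) != sorted(thresholds):
        raise ValueError("thresholds must be in ascending order")
    return bisect_right(thresholds, comp_value)
```

With `thresholds=[Th_1, Th_2]` this yields three intervals; the further split of the last interval by the C0 comparison would still be applied separately.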

FIG. 8 is a block diagram showing a minimum configuration of the acoustic model adaptation device according to the present invention. FIG. 9 is a block diagram showing another minimum configuration of the acoustic model adaptation device according to the present invention.

As shown in FIG. 8, the acoustic model adaptation device (corresponding to the acoustic model adaptation device 100 shown in FIG. 1) generates a noise acoustic model by adapting an acoustic model to noise. It includes a first noise adaptation unit 20-1 (corresponding to the first noise adaptation unit 104 in the acoustic model adaptation device 100 shown in FIG. 1) that performs noise adaptation by increasing the number of acoustic models adapted to the noise, a second noise adaptation unit 20-2 (corresponding to the second noise adaptation unit 105 in the acoustic model adaptation device 100 shown in FIG. 1) that performs noise adaptation using linear approximation, and an adaptation method selection unit 10 (corresponding to the adaptation method selection unit 103 in the acoustic model adaptation device 100 shown in FIG. 1) that selects the first noise adaptation unit 20-1 or the second noise adaptation unit 20-2 based on the acoustic model and a statistic of the noise to which the acoustic model is adapted.

The above embodiments also disclose the following acoustic model adaptation devices.

(1) An acoustic model adaptation device in which the adaptation method selection unit 10 determines, based on the acoustic model and the statistic of the noise to which the acoustic model is adapted, the power difference between the speech used when training the acoustic model and the noise, selects the first noise adaptation unit 20-1 when the power difference is larger than a predetermined threshold, and selects the second noise adaptation unit 20-2 when it is equal to or less than the threshold.

With such a configuration, a Gaussian distribution whose mean lies in a region strongly affected by nonlinearity can be identified accurately. This is because the magnitude of the power difference between speech and noise affects the degree of nonlinearity of the nonlinear function that represents noise-added speech.

(2) An acoustic model adaptation device in which the acoustic model includes Gaussian distributions, and the first noise adaptation unit 20-1 generates a plurality of sigma points for each Gaussian distribution and performs noise adaptation for each of the sigma points.

With such a configuration, the UT adaptation method can be applied selectively, depending on the Gaussian distribution being adapted. For example, the UT adaptation method can be applied to a Gaussian distribution whose mean lies in a region strongly affected by nonlinearity, and the VTS adaptation method, which requires less computation, can be applied otherwise. This prevents the amount of computation from growing across the adaptation of all Gaussian distributions.
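To illustrate the difference between sigma-point adaptation and linear approximation, the following sketch adapts a single one-dimensional Gaussian both ways. It is not the patent's Equations 9, 10, 15, and 16, which are not reproduced here. It assumes a scalar log-power feature, the common mismatch function $y = \log(e^x + e^n)$, independent speech and noise, and a basic unscented transform with a hypothetical scaling parameter `kappa`. All function names are illustrative.

```python
import math
from typing import Tuple


def mismatch(x: float, n: float) -> float:
    """Assumed mismatch function log(exp(x) + exp(n)), computed stably."""
    m = max(x, n)
    return m + math.log1p(math.exp(-abs(x - n)))


def vts_adapt(mu_x: float, var_x: float,
              mu_n: float, var_n: float) -> Tuple[float, float]:
    """First-order (linear) approximation around the means."""
    g = 1.0 / (1.0 + math.exp(mu_n - mu_x))  # derivative of mismatch w.r.t. x
    mean = mismatch(mu_x, mu_n)
    var = g * g * var_x + (1.0 - g) ** 2 * var_n
    return mean, var


def ut_adapt(mu_x: float, var_x: float, mu_n: float, var_n: float,
             kappa: float = 1.0) -> Tuple[float, float]:
    """Unscented transform over the joint (speech, noise) variable."""
    dim = 2
    scale = math.sqrt(dim + kappa)
    dx, dn = scale * math.sqrt(var_x), scale * math.sqrt(var_n)
    # Five sigma points: the joint mean plus symmetric offsets on each axis.
    points = [(mu_x, mu_n),
              (mu_x + dx, mu_n), (mu_x - dx, mu_n),
              (mu_x, mu_n + dn), (mu_x, mu_n - dn)]
    weights = [kappa / (dim + kappa)] + [1.0 / (2.0 * (dim + kappa))] * 4
    ys = [mismatch(x, n) for x, n in points]  # adapt every sigma point
    mean = sum(w * y for w, y in zip(weights, ys))
    var = sum(w * (y - mean) ** 2 for w, y in zip(weights, ys))
    return mean, var
```

When the speech mean is far above the noise mean the two functions return nearly the same result, while for comparable means they differ, which is the situation in which the extra sigma-point evaluations pay off.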

(3) An acoustic model adaptation device in which the acoustic model includes Gaussian distributions, and the first noise adaptation unit 20-1 (corresponding to the first noise adaptation unit 204 in the acoustic model adaptation device 200 shown in FIG. 2) adapts a plurality of Gaussian distributions derived from a Gaussian distribution to the noise.

With such a configuration, the number of Gaussian distribution parameters can be increased selectively, depending on the Gaussian distribution being adapted. This prevents the amount of computation from growing across the adaptation of all Gaussian distributions.

(4) An acoustic model adaptation device that, as shown in FIG. 9, includes a third noise adaptation unit 20-3 (corresponding to the third noise adaptation unit 3043 (not shown) in the acoustic model adaptation device 300 shown in FIG. 6) that outputs the acoustic model as the noise acoustic model, and a fourth noise adaptation unit 20-4 (corresponding to the fourth noise adaptation unit 3044 in the acoustic model adaptation device 300 shown in FIG. 6) that outputs the statistic of the noise as the noise acoustic model, and in which, when the power difference between the speech used when training the acoustic model and the noise is equal to or greater than a predetermined second threshold (corresponding to the threshold $Th_2$), the adaptation method selection unit 10 (corresponding to the adaptation method selection unit 303 in the acoustic model adaptation device 300 shown in FIG. 6) selects the third noise adaptation unit 20-3 if the power of the speech is larger and selects the fourth noise adaptation unit 20-4 if it is smaller.

With such a configuration, the amount of computation can be prevented from growing across the adaptation of all Gaussian distributions. This is because noise adaptation can be performed with a smaller amount of computation when the power difference between the parameters of the clean acoustic model and the noise statistics is large.
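A short numerical check suggests why passing one input through unchanged is reasonable when the power difference is large. The sketch assumes the common log-domain mismatch function $y = \log(e^x + e^n)$ for a scalar log-power feature, which this section does not restate; `noisy_log_power` and `passthrough_error` are hypothetical names.

```python
import math


def noisy_log_power(x: float, n: float) -> float:
    """Assumed mismatch function log(exp(x) + exp(n)), computed stably."""
    m = max(x, n)
    return m + math.log1p(math.exp(-abs(x - n)))


def passthrough_error(x: float, n: float) -> float:
    """Error made by outputting the larger of speech and noise unchanged."""
    return noisy_log_power(x, n) - max(x, n)


if __name__ == "__main__":
    # The error shrinks quickly as the gap between speech and noise grows.
    for gap in (0.0, 2.0, 5.0, 10.0):
        print(f"gap={gap:4.1f}  error={passthrough_error(gap, 0.0):.6f}")
```

Under this assumption the error depends only on the absolute gap, so the same argument covers both the speech-dominant case (third unit) and the noise-dominant case (fourth unit).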

(5) An acoustic model adaptation device that generates a noise acoustic model by adapting an acoustic model to noise, including a plurality of noise adaptation units that differ in amount of computation and adaptation accuracy (for example, corresponding to the first noise adaptation unit 3041, the second noise adaptation unit 3042, the third noise adaptation unit 3043, and the fourth noise adaptation unit 3044 of the model adaptation device 300 shown in FIG. 6), and an adaptation method selection unit that selects one of the plurality of noise adaptation units based on the acoustic model and a statistic of the noise to which the acoustic model is adapted.

With such a configuration, the acoustic model can be noise-adapted with a smaller amount of computation than a computationally expensive, high-accuracy method, without degrading adaptation accuracy.

DESCRIPTION OF SYMBOLS
1 Clean acoustic model storage device
2 Noise statistic storage device
3 Noise-adapted acoustic model storage device
7 Detailed clean acoustic model storage device
10, 103, 203, 303 Adaptation method selection unit
20-1, 104, 204, 3041 First noise adaptation unit
20-2, 105, 3042 Second noise adaptation unit
20-3 Third noise adaptation unit
20-4, 3044 Fourth noise adaptation unit
100, 200, 300 Model adaptation device
101 Noise statistic acquisition unit
102 Gaussian distribution acquisition unit
106 Gaussian distribution storage unit
207 Detailed Gaussian distribution acquisition unit

Claims

An acoustic model adaptation device for generating a noise acoustic model by adapting an acoustic model to noise, the device comprising:
a first noise adaptation unit that performs noise adaptation by increasing the number of acoustic models adapted to the noise;
a second noise adaptation unit that performs noise adaptation using linear approximation; and
an adaptation method selection unit that selects the first noise adaptation unit or the second noise adaptation unit based on the acoustic model and a statistic of the noise to which the acoustic model is adapted.

The acoustic model adaptation device according to claim 1, wherein the adaptation method selection unit determines, based on the acoustic model and the statistic of the noise to which the acoustic model is adapted, a power difference between the speech used when training the acoustic model and the noise, selects the first noise adaptation unit when the power difference is larger than a predetermined threshold, and selects the second noise adaptation unit when the power difference is equal to or less than the threshold.

The acoustic model adaptation device according to claim 1, wherein the acoustic model includes Gaussian distributions, and
the first noise adaptation unit generates a plurality of sigma points for each of the Gaussian distributions and performs noise adaptation for each of the sigma points.

The acoustic model adaptation device according to claim 1, wherein the acoustic model includes a Gaussian distribution, and
the first noise adaptation unit adapts a plurality of Gaussian distributions derived from the Gaussian distribution to the noise.

The acoustic model adaptation device according to any one of claims 1 to 4, further comprising:
a third noise adaptation unit that outputs the acoustic model as the noise acoustic model; and
a fourth noise adaptation unit that outputs the statistic of the noise as the noise acoustic model,
wherein, when the power difference between the speech used when training the acoustic model and the noise is equal to or greater than a predetermined second threshold, the adaptation method selection unit selects the third noise adaptation unit if the power of the speech is larger and selects the fourth noise adaptation unit if it is smaller.

An acoustic model adaptation device for generating a noise acoustic model by adapting an acoustic model to noise, the device comprising:
a plurality of noise adaptation units that differ in amount of computation and adaptation accuracy; and
an adaptation method selection unit that selects one of the plurality of noise adaptation units based on the acoustic model and a statistic of the noise to which the acoustic model is adapted.

An acoustic model adaptation method for generating a noise acoustic model by adapting an acoustic model to noise, the method comprising:
selecting, based on the acoustic model and a statistic of the noise to which the acoustic model is adapted, whether to perform noise adaptation by increasing the number of acoustic models adapted to the noise or to perform noise adaptation using linear approximation; and
performing noise adaptation based on the selection.

An acoustic model adaptation program for an acoustic model adaptation device that generates a noise acoustic model by adapting an acoustic model to noise, the program causing a computer to execute:
a process of selecting, based on the acoustic model and a statistic of the noise to which the acoustic model is adapted, whether to perform noise adaptation by increasing the number of acoustic models adapted to the noise or to perform noise adaptation using linear approximation; and
a process of performing noise adaptation based on the selection.