JP4542399B2

JP4542399B2 - Speech spectrum estimation apparatus and speech spectrum estimation program

Info

Publication number: JP4542399B2
Application number: JP2004268028A
Authority: JP
Inventors: 健小早川; 寛之世木
Original assignee: Japan Broadcasting Corp
Current assignee: Japan Broadcasting Corp
Priority date: 2004-09-15
Filing date: 2004-09-15
Publication date: 2010-09-15
Anticipated expiration: 2024-09-15
Also published as: JP2006084639A

Description

The present invention relates to a speech spectrum estimation apparatus and a speech spectrum estimation program for estimating a speech spectrum from a noise-superimposed speech spectrum, that is, a speech spectrum on which a noise spectrum is superimposed.

Conventionally, in the field of speech (speech signal) processing, spectral subtraction has been used to reduce noise in noise-superimposed speech (a noise-superimposed speech signal), that is, speech on which noise (a noise signal) has already been superimposed (mixed). Spectral subtraction estimates a speech spectrum from the noise-superimposed speech spectrum and the noise spectrum obtained by spectrum analysis of the noise-superimposed speech and the noise (see, for example, Patent Documents 1 to 5 and Non-Patent Documents 1 and 2). FIG. 5 shows a conventional speech spectrum estimation apparatus that embodies this method.

As shown in FIG. 5, the speech spectrum estimation apparatus 101 estimates a speech spectrum r_S from a noise-superimposed speech spectrum r_X, a noise spectrum r_N, and a signal-to-noise ratio (S/N ratio) x. It comprises a subtraction coefficient calculation unit 103, a subtraction spectrum calculation unit 105, and a spectrum subtraction unit 107. The apparatus 101 estimates the speech spectrum r_S from r_X, r_N, and x using formula (1).
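Formula (1) itself does not survive in this text. Judging from the description of the subtraction coefficient 1/(1 + γ·x) as the coefficient of the noise spectrum term, and of r_X as the first term from which that term is subtracted, it presumably reads as follows (a reconstruction, not a copy of the original figure):

```latex
r_S = r_X - \frac{1}{1 + \gamma x}\, r_N \tag{1}
```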

The subtraction coefficient calculation unit 103 calculates, from the input signal-to-noise ratio (S/N ratio) x, the subtraction coefficient 1/(1 + γ·x), which is the coefficient of the second term of formula (1) (the noise spectrum r_N term, or subtraction spectrum). Here γ is a freely adjustable parameter.

The subtraction spectrum calculation unit 105 calculates the second term of formula (1) (the subtraction spectrum) from the coefficient 1/(1 + γ·x) calculated by the subtraction coefficient calculation unit 103 and the input noise spectrum r_N.

The spectrum subtraction unit 107 estimates and outputs the speech spectrum r_S by subtracting the second term (the subtraction spectrum) calculated by the subtraction spectrum calculation unit 105 from the input noise-superimposed speech spectrum r_X (the first term).

Patent Document 1: Japanese Patent No. 2836271 (paragraphs 0032 to 0038, FIGS. 5 and 8)
Patent Document 2: Japanese Patent No. 2863214 (Detailed Description of the Invention, FIG. 3)
Patent Document 3: Japanese Patent No. 3118023 (paragraphs 0003 and 0004, FIG. 1)
Patent Document 4: Japanese Patent No. 3451146 (paragraphs 0013 to 0024, FIG. 1)
Patent Document 5: Japanese Patent No. 3454206 (paragraphs 0033 to 0073, FIG. 1)
Non-Patent Document 1: P. Lockwood and J. Boudy, "Experiments with a nonlinear spectral subtractor (NSS), Hidden Markov Models and the projection, for robust speech recognition in cars," Speech Communication, Vol. 11, pp. 215-228, 1992.
Non-Patent Document 2: Numerical Recipes in C, chapter 6. Cambridge, 2nd edition, 1992.
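The conventional apparatus 101 described above (units 103, 105, and 107) can be sketched in a few lines of Python. This is only an illustration under assumptions: the spectra are taken to be lists of per-bin magnitudes, x is taken to be a plain (non-decibel) ratio, and the function name and the default value of gamma are hypothetical.

```python
def conventional_subtraction(r_x, r_n, x, gamma=1.0):
    """Sketch of the conventional method: r_S = r_X - r_N / (1 + gamma * x).

    r_x, r_n: per-bin magnitudes of the noisy-speech and noise spectra.
    x: signal-to-noise ratio (assumed here to be a linear ratio).
    gamma: adjustable parameter (default value is an assumption).
    """
    if len(r_x) != len(r_n):
        raise ValueError("spectra must have the same number of bins")
    coeff = 1.0 / (1.0 + gamma * x)            # unit 103: subtraction coefficient
    subtracted = [coeff * n for n in r_n]      # unit 105: subtraction spectrum
    # unit 107: subtract the second term from the first term, bin by bin
    return [xv - sv for xv, sv in zip(r_x, subtracted)]
```

Note that nothing in this sketch prevents a bin from going negative; the text does not say how the apparatus handles that case.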

However, the conventional speech spectrum estimation apparatus 101 only calculates a subtraction coefficient for the noise spectrum r_N and varies, according to the signal-to-noise ratio (S/N ratio) x, the proportion of the noise spectrum r_N contained in the noise-superimposed speech spectrum r_X (the first term). It does not consider how the noise-superimposed speech spectrum r_X (the first term) itself changes with the signal-to-noise ratio x, so the noise spectrum is not necessarily removed well.

Accordingly, an object of the present invention is to solve this problem and to provide a speech spectrum estimation apparatus and a speech spectrum estimation program that can satisfactorily remove the noise spectrum from a noise-superimposed speech spectrum.

To solve the above problem, the speech spectrum estimation apparatus according to claim 1 estimates a speech spectrum from a noise-superimposed speech spectrum in which a noise spectrum has been superimposed on the speech spectrum. It comprises coefficient calculation means and noise spectrum subtraction means, and the coefficient calculation means comprises first-term coefficient calculation means and second-term coefficient calculation means.

With this configuration, the coefficient calculation means of the apparatus calculates, based on the signal-to-noise ratio x, the noise-superimposed speech spectrum coefficient α(x), which indicates the proportion of the noise-superimposed speech spectrum r_X in formula (2), and the noise spectrum coefficient 1/(1 + β·x), which indicates the proportion of the noise spectrum r_N. The noise-superimposed speech spectrum is obtained by picking up speech uttered by a speaker in a place where some noise is present and analyzing its spectrum. The noise spectrum is obtained by picking up sound other than the speaker's speech (some noise, for example speech uttered by another speaker, or non-speech noise from office equipment, air conditioners, and the like) and converting it to the frequency domain.
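Formula (2) does not survive in this text. From the stated first term α(x)·r_X and second term r_N/(1 + β·x), it can be reconstructed as follows (a reconstruction from the description, not a copy of the original figure):

```latex
r_S = \alpha(x)\, r_X - \frac{1}{1 + \beta x}\, r_N \tag{2}
```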

Next, the noise spectrum removal means of the apparatus removes the noise spectrum from the noise-superimposed speech spectrum by subtracting the second term r_N/(1 + β·x) of formula (2) from the first term α(x)·r_X, using the noise-superimposed speech spectrum coefficient α(x) and the noise spectrum coefficient 1/(1 + β·x) calculated by the coefficient calculation means. In other words, both the noise-superimposed speech spectrum and the noise spectrum are multiplied by coefficients that reflect the signal-to-noise ratio before the noise spectrum is subtracted from the noise-superimposed speech spectrum, which yields a more accurate speech spectrum.
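The two-coefficient subtraction described here can be sketched as follows. Because formula (3) for α(x) is not recoverable from this text, the sketch takes α as a caller-supplied function; that parameter, the function name, the list-of-magnitudes representation, and the default β are all assumptions made for illustration.

```python
from typing import Callable, List, Sequence


def estimate_speech_spectrum(
    r_x: Sequence[float],
    r_n: Sequence[float],
    x: float,
    alpha: Callable[[float], float],
    beta: float = 1.0,
) -> List[float]:
    """Sketch of r_S = alpha(x) * r_X - r_N / (1 + beta * x), per bin.

    alpha: stand-in for the coefficient function of formula (3),
           which is NOT defined here and must be supplied by the caller.
    beta:  adjustable parameter (default value is an assumption).
    """
    if len(r_x) != len(r_n):
        raise ValueError("spectra must have the same number of bins")
    first_coeff = alpha(x)                  # coefficient of the first term
    second_coeff = 1.0 / (1.0 + beta * x)   # coefficient of the second term
    return [first_coeff * xv - second_coeff * nv for xv, nv in zip(r_x, r_n)]
```

If `alpha` returns 1.0 for every x, the sketch reduces to the conventional single-coefficient subtraction, which makes the contribution of the first-term coefficient easy to isolate.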

The first-term coefficient calculation means of the apparatus calculates the noise-superimposed speech spectrum coefficient α(x) using formula (3) and formula (4), the latter being the complete elliptic integral of the second kind. The second-term coefficient calculation means calculates the noise spectrum coefficient 1/(1 + β·x), which contains a predetermined parameter β.
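Formulas (3) and (4) do not survive in this text, and formula (3) cannot be reconstructed from the description alone. Formula (4) is identified as the complete elliptic integral of the second kind, whose standard definition is given below for reference; how the argument k depends on x in formula (3) is not stated here.

```latex
E(k) = \int_0^{\pi/2} \sqrt{1 - k^2 \sin^2\theta}\; d\theta \tag{4}
```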

In addition, because the coefficient calculation means uses a function that includes an elliptic integral parameterized by the signal-to-noise ratio, the apparatus can obtain the optimal noise-superimposed speech spectrum as the signal-to-noise ratio changes. Estimating the most probable noise-superimposed speech spectrum yields the most probable speech and enables effective noise removal. Because the estimate of the most probable spectrum uses a function including an elliptic integral rather than the spectral magnitude (real spectrum) alone, the spectral phase is also taken into account, so the speech spectrum can be estimated more accurately than with the conventional method.

The speech spectrum estimation apparatus according to claim 2 is the apparatus according to claim 1, characterized in that, when calculating the noise-superimposed speech spectrum coefficient, the coefficient calculation means approximates the function including the elliptic integral by a polynomial based on a series expansion of that function.

With this configuration, the coefficient calculation means uses a function that includes an elliptic integral parameterized by the signal-to-noise ratio, so the apparatus can obtain the optimal noise-superimposed speech spectrum as the signal-to-noise ratio changes. Expanding the function including the elliptic integral as a series permits fast approximate calculation, and the optimal noise-superimposed speech spectrum for the changing signal-to-noise ratio is obtained as a result.
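To illustrate the kind of polynomial approximation that a series expansion provides, the sketch below truncates the standard power series of the complete elliptic integral of the second kind, E(k) = (π/2)·[1 − k²/4 − 3k⁴/64 − 5k⁶/256 − …]. This covers only E(k) as a building block; the text does not give the polynomial actually used for α(x), and the function name and the default number of terms are assumptions.

```python
import math


def ellipe_series(k: float, terms: int = 8) -> float:
    """Truncated power series for the complete elliptic integral E(k).

    E(k) = (pi/2) * (1 - sum_{n>=1} ((2n-1)!!/(2n)!!)^2 * k^(2n) / (2n-1))
    Converges quickly for small |k| and slowly as |k| approaches 1.
    """
    if not 0.0 <= abs(k) <= 1.0:
        raise ValueError("k must satisfy |k| <= 1")
    total = 1.0
    ratio = 1.0  # running value of (2n-1)!! / (2n)!!
    for n in range(1, terms + 1):
        ratio *= (2 * n - 1) / (2 * n)
        total -= ratio * ratio * k ** (2 * n) / (2 * n - 1)
    return 0.5 * math.pi * total
```

A fixed, small number of terms turns the evaluation into a short polynomial in k², which is the source of the speed advantage; the price is reduced accuracy near k = 1.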

The speech spectrum estimation program according to claim 3 causes a computer to function as coefficient calculation means and noise spectrum removal means in order to estimate a speech spectrum from a noise-superimposed speech spectrum in which a noise spectrum has been superimposed on the speech spectrum, and causes the coefficient calculation means to function as first-term coefficient calculation means and second-term coefficient calculation means.

With this configuration, the coefficient calculation means of the program calculates, based on the signal-to-noise ratio x, the noise-superimposed speech spectrum coefficient α(x), which indicates the proportion of the noise-superimposed speech spectrum r_X in formula (2), and the noise spectrum coefficient 1/(1 + β·x), which indicates the proportion of the noise spectrum r_N. Next, the noise spectrum removal means removes the noise spectrum from the noise-superimposed speech spectrum by subtracting the second term r_N/(1 + β·x) of formula (2) from the first term α(x)·r_X, using the coefficients α(x) and 1/(1 + β·x) calculated by the coefficient calculation means. The first-term coefficient calculation means calculates α(x) using formula (3) and formula (4), the latter being the complete elliptic integral of the second kind. The second-term coefficient calculation means calculates the noise spectrum coefficient 1/(1 + β·x), which contains a predetermined parameter β.

According to the invention of claim 1 or 3, a noise-superimposed speech spectrum coefficient that depends on the signal-to-noise ratio is applied to the noise-superimposed speech spectrum, so the signal-to-noise ratio is reflected in both the noise-superimposed speech spectrum and the noise spectrum. The noise spectrum can therefore be removed well from the noise-superimposed speech spectrum, and the speech spectrum can be estimated more accurately.

According to the invention of claim 1 or 3, the noise-superimposed speech spectrum coefficient is calculated with a function of the signal-to-noise ratio that uses the elliptic integral obtained when the spectrum level is averaged over the phase difference between the complex spectra of the speech and the noise. The optimal noise-superimposed speech spectrum can therefore be obtained as the signal-to-noise ratio changes.
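The link between phase averaging and the elliptic integral can be made explicit with a standard identity. As an illustrative assumption (the symbols s, n, and φ do not appear in the original), let s and n be the magnitudes of the speech and noise components in one frequency bin and φ their phase difference, taken to be uniformly distributed. The mean magnitude of their sum is then:

```latex
\frac{1}{2\pi}\int_0^{2\pi}\left| s + n\,e^{i\varphi} \right| d\varphi
= \frac{1}{2\pi}\int_0^{2\pi}\sqrt{(s+n)^2 - 4sn\,\sin^2(\varphi/2)}\; d\varphi
= \frac{2\,(s+n)}{\pi}\, E(k),
\qquad k = \frac{2\sqrt{sn}}{s+n}
```

Here E(k) is the complete elliptic integral of the second kind. How formula (3) builds α(x) from an average of this kind is not recoverable from this text.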

According to the invention of claim 2, the noise-superimposed speech spectrum coefficient is calculated from the signal-to-noise ratio using a polynomial based on a series expansion, so the optimal noise-superimposed speech spectrum can be obtained as the signal-to-noise ratio changes.

Next, embodiments of the present invention will be described in detail with reference to the drawings as appropriate.
<Configuration of speech spectrum estimation and utilization system>
FIG. 1 is a block diagram of the speech spectrum estimation and utilization system. As shown in FIG. 1, the speech spectrum estimation and utilization system S receives noise-superimposed speech (a noise-superimposed speech signal) and noise (a noise signal), estimates a speech spectrum from the noise-superimposed speech spectrum and the noise spectrum obtained by spectrum analysis of these inputs, and makes use of the estimated speech spectrum. It comprises a speech spectrum estimation apparatus 1, a spectrum output unit 2, and a speech spectrum utilization unit 4.
Before the speech spectrum estimation apparatus 1 is described, the spectrum output unit 2 and the speech spectrum utilization unit 4 are described.

[Configuration of spectrum output unit]
The spectrum output unit 2 receives (acquires) noise-superimposed speech and noise, and outputs to the speech spectrum estimation apparatus 1 the noise-superimposed speech spectrum and noise spectrum obtained by spectrum analysis of these inputs, together with the signal-to-noise ratio (S/N ratio). It comprises a noise-superimposed speech recording microphone 6a (6), a noise recording microphone 6b (6), a microphone amplifier 8a (8), a microphone amplifier 8b (8), a spectrum analysis unit 10a (10), a spectrum analysis unit 10b (10), a correction device 12, and an S/N ratio estimation unit 14.

Where both members of a pair are meant, or where no distinction between them is needed, the noise-superimposed speech recording microphone 6a (6) and the noise recording microphone 6b (6), the microphone amplifiers 8a (8) and 8b (8), and the spectrum analysis units 10a (10) and 10b (10) are referred to simply as the recording microphone 6, the microphone amplifier 8, and the spectrum analysis unit 10.

The noise-superimposed speech recording microphone 6a, the microphone amplifier 8a, and the spectrum analysis unit 10a are connected to one another by an electric line having predetermined line characteristics, as are the noise recording microphone 6b, the microphone amplifier 8b, and the spectrum analysis unit 10b.

The noise-superimposed speech recording microphone 6a (the main microphone) is worn by, or placed near, the speaker whose speech (speech signal) is the source of the speech spectrum to be estimated by the speech spectrum estimation apparatus 1, and it records the speech uttered by that speaker. When this microphone records (picks up) the speaker's speech, noise is superimposed on the recording. This noise includes speech uttered by speakers other than the target speaker; when recording indoors, incidental sounds and non-speech noise from office equipment, air conditioners, and the like; and when recording outdoors, noise from cars, motorcycles, and the like. It comes from many kinds of sound sources and covers a wide variety of volumes and frequencies.

The noise-superimposed speech recording microphone 6a is preferably placed, as far as possible, where the speaker's speech can be recorded well (for example, near the speaker's mouth or on the chest). A directional microphone is preferable to an omnidirectional one, so that as far as possible only the speaker's speech is recorded.

The noise recording microphone 6b is placed near a sound source that emits noise (a noise signal) and records the noise emitted by that source. It is preferably placed where it records (detects) as little of the speaker's speech as possible. The noise recording microphone 6b preferably has the same microphone characteristics as the noise-superimposed speech recording microphone 6a.

This embodiment uses two microphones, the noise-superimposed speech recording microphone 6a and the noise recording microphone 6b, but a single microphone may be used if the noise-superimposed speech and the noise can be separated. In spectral subtraction it is generally considered ideal to record the noise-superimposed speech and the noise at the same location, but in practice this is impossible because the two are difficult to separate. The spectrum output unit 2 of the speech spectrum estimation and utilization system S therefore records them at different locations. A correction device 12 (transfer characteristic correction unit 12c) is provided to correct the difference (in transfer characteristics) that arises from recording the noise-superimposed speech and the noise at different locations.

The microphone amplifier 8a amplifies the voltage of the noise-superimposed speech (noise-superimposed speech signal) that has been recorded (picked up) by the noise-superimposed speech recording microphone 6a and converted into an electric signal, and outputs it to the spectrum analysis unit 10a.
The microphone amplifier 8b amplifies the voltage of the noise (noise signal) that has been recorded (picked up) by the noise recording microphone 6b and converted into an electric signal, and outputs it to the spectrum analysis unit 10b.

The spectrum analysis unit 10a converts the noise-superimposed speech whose voltage has been amplified by the microphone amplifier 8a (the amplified noise-superimposed speech signal) to the frequency domain, and outputs the resulting spectrum signal, the noise-superimposed speech spectrum, to the S/N ratio estimation unit 14 and the speech spectrum estimation apparatus 1.

The spectrum analysis unit 10b converts the noise whose voltage has been amplified by the microphone amplifier 8b (the amplified noise signal) to the frequency domain, and outputs the resulting spectrum signal, the noise spectrum, to the correction device 12.
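The text says only that the spectrum analysis units perform a frequency conversion; it does not name the transform, frame length, or window. As one possible reading, the sketch below computes the magnitude spectrum of a single frame with a plain discrete Fourier transform. The function name and the choice of an unwindowed DFT are assumptions, and a practical system would normally use an FFT library instead of this O(N²) loop.

```python
import cmath
from typing import List, Sequence


def magnitude_spectrum(frame: Sequence[float]) -> List[float]:
    """Magnitude of the DFT of one real-valued frame, bins 0 .. N/2."""
    n = len(frame)
    if n == 0:
        raise ValueError("frame must not be empty")
    spectrum = []
    for k in range(n // 2 + 1):
        # X[k] = sum_t frame[t] * exp(-2*pi*i*k*t/N)
        acc = sum(frame[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        spectrum.append(abs(acc))
    return spectrum
```

Applying the same function to the amplified noisy-speech frame and to the amplified noise frame would give per-bin magnitudes playing the roles of r_X and r_N.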

The correction device 12 corrects the noise spectrum produced by the spectrum analysis unit 10b and outputs it to the S/N ratio estimation unit 14 and the speech spectrum estimation apparatus 1. It comprises a microphone characteristic correction unit 12a, a line characteristic correction unit 12b, and a transfer characteristic correction unit 12c. The correction device 12 corrects the noise spectrum based on the microphone, line, and transfer characteristics as corrected by these three units, and outputs the result to the S/N ratio estimation unit 14 and the speech spectrum estimation apparatus 1.

The following are entered into the correction device 12 through operating means (not shown) operated by a user of the speech spectrum estimation and utilization system S: the microphone characteristics of the noise-superimposed speech recording microphone 6a and the noise recording microphone 6b; the line characteristics of the electric line that carries the noise-superimposed speech (the line connecting the microphone 6a, the microphone amplifier 8a, and the spectrum analysis unit 10a) and of the electric line that carries the noise (the line connecting the microphone 6b, the microphone amplifier 8b, and the spectrum analysis unit 10b); and the transfer characteristics (transfer functions) of the space in which the microphone 6a is installed and of the space in which the microphone 6b is installed.

The microphone characteristic correction unit 12a corrects the microphone characteristics of the noise recording microphone 6b so that they become substantially the same as those of the noise-superimposed speech recording microphone 6a.

The line characteristic correction unit 12b corrects the line characteristics of the electric line that carries the noise so that they become substantially the same as those of the electric line that carries the noise-superimposed speech.

The transfer characteristic correction unit 12c corrects the transfer characteristics of the space in which the noise recording microphone 6b is installed so that they become substantially the same as the transfer characteristics (transfer function) of the space in which the noise-superimposed speech recording microphone 6a is installed.
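The text does not say how the three corrections are applied to the noise spectrum. One simple possibility, shown purely as an assumption, is that each characteristic is expressed as a per-bin gain that maps the noise channel onto the speech channel, and that the three gains are multiplied into the noise magnitude spectrum. All names below are hypothetical.

```python
from typing import List, Sequence


def correct_noise_spectrum(
    r_n: Sequence[float],
    mic_gain: Sequence[float],
    line_gain: Sequence[float],
    transfer_gain: Sequence[float],
) -> List[float]:
    """Multiply a noise magnitude spectrum by three per-bin correction gains.

    mic_gain, line_gain, transfer_gain: hypothetical per-bin ratios that make
    the noise channel's microphone, line, and room characteristics match
    those of the noisy-speech channel (roles of units 12a, 12b, 12c).
    """
    if not (len(r_n) == len(mic_gain) == len(line_gain) == len(transfer_gain)):
        raise ValueError("all inputs must have one value per frequency bin")
    return [n * m * l * t for n, m, l, t in zip(r_n, mic_gain, line_gain, transfer_gain)]
```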

The S/N ratio estimation unit 14 estimates the signal-to-noise ratio (S/N ratio) from the noise-superimposed speech spectrum output by the spectrum analysis unit 10a and the noise spectrum output by the correction device 12, and outputs it to the speech spectrum estimation apparatus 1.

The signal-to-noise ratio is a measure of the proportion (ratio) between the useful signal and the noise. It is expressed as the number of decibels by which the signal power, the power of the useful signal, exceeds the noise power, the power of the noise.
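The decibel definition above corresponds to 10·log10(signal power / noise power). The sketch below shows that conversion, plus one possible way of estimating the ratio from the two spectra. The estimation step is an assumption: the text does not describe the estimator, and here the signal power is simply taken as the noisy power minus the noise power, which presumes that speech and noise are uncorrelated and that the spectra are per-bin magnitudes. Function names are hypothetical.

```python
import math
from typing import Sequence


def snr_db(signal_power: float, noise_power: float) -> float:
    """Signal-to-noise ratio in decibels: 10 * log10(Ps / Pn)."""
    if signal_power <= 0.0 or noise_power <= 0.0:
        raise ValueError("powers must be positive")
    return 10.0 * math.log10(signal_power / noise_power)


def estimate_snr_db(r_x: Sequence[float], r_n: Sequence[float]) -> float:
    """Assumed estimator: signal power ~= noisy power - noise power."""
    noisy_power = sum(v * v for v in r_x)
    noise_power = sum(v * v for v in r_n)
    # Clamp to a tiny positive value so the logarithm stays defined.
    signal_power = max(noisy_power - noise_power, 1e-12)
    return snr_db(signal_power, noise_power)
```

Whether the quantity x in the coefficient 1/(1 + β·x) is this decibel value or the underlying linear ratio is not stated in the text.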

When the noise-superimposed speech input to the spectrum output unit 2 is artificially generated (that is, another sound [noise] has been artificially superimposed on speech), the signal-to-noise ratio is already known, so the S/N ratio estimation unit 14 can be omitted.

[Configuration of speech spectrum utilization unit]
The speech spectrum utilization unit 4 makes use of the speech spectrum output by the speech spectrum estimation apparatus 1 and comprises a speech recognition device 4a, a speaker recognition device 4b, and a speech synthesis device 4c. This embodiment includes all three devices to make use of the speech spectrum, but any one of them is sufficient.

The speech recognition device 4a performs speech recognition on the speech spectrum output by the speech spectrum estimation apparatus 1 and outputs the recognition result as text data. In other words, it converts the speech spectrum into text data. Although not shown, it includes dividing means for dividing the speech spectrum into predetermined search units (phonemes and the like), a speech database that associates speech spectra with text data, and so on.

The speaker recognition device 4b recognizes (identifies), from the speech spectrum output by the speech spectrum estimation apparatus 1, the speaker who uttered the noise-superimposed speech from which that spectrum was derived. Although not shown, it includes a speaker speech database containing recorded speech of a plurality of speakers, and so on.

The speech synthesis device 4c synthesizes speech from text data and outputs it as synthesized speech. It includes a speech synthesis database (not shown). Accumulating the speech spectra output by the speech spectrum estimation apparatus 1 in this database makes it possible to synthesize a wide variety of synthesized speech.

[Configuration of speech spectrum estimation apparatus]
The speech spectrum estimation apparatus 1 estimates the speech spectrum of the speech (speech signal) uttered by the speaker from the noise-superimposed speech spectrum, the noise spectrum, and the signal-to-noise ratio output by the spectrum output unit 2. It comprises coefficient calculation means 3, first-term spectrum calculation means 5, second-term spectrum calculation means 7, and spectrum subtraction means 9. Each of these means is a program loaded into a main control unit (not shown) of the speech spectrum estimation apparatus 1. The first-term spectrum calculation means 5, the second-term spectrum calculation means 7, and the spectrum subtraction means 9 together correspond to the noise spectrum removal means.

Based on the signal-to-noise ratio output by the spectrum output unit 2, the coefficient calculation means 3 calculates the coefficient of the first term and the coefficient of the second term of the following formula (2). It comprises first-term coefficient calculation means 3a and second-term coefficient calculation means 3b.

In formula (2), r_S is the speech spectrum, r_X is the noise-superimposed speech spectrum, r_N is the noise spectrum, x is the signal-to-noise ratio (S/N ratio), and β is an adjustable parameter. α(x), the coefficient of the first term, is a function of x defined by the following formulas (3) and (4).
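Formula (2) itself is not reproduced in this text. From the description of its two terms (the first term α(x) r_X, the second term r_N/(1 + β·x), and the subtraction of the second from the first), it can be reconstructed as follows. Formulas (3) and (4) cannot be recovered from the text in the same way and are not reconstructed here.

```latex
r_S = \alpha(x)\, r_X - \frac{r_N}{1 + \beta x} \tag{2}
```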

Here, α(x) is shown in FIG. 2 (see also FIG. 1 as appropriate). As shown in FIG. 2, with the signal-to-noise ratio x on the horizontal axis, α(x) (thick line) increases sharply until x reaches 1 and decreases gradually once x exceeds 1.
Returning to FIG. 1, the description of the configuration of the speech spectrum estimation apparatus 1 continues.

The first-term coefficient calculation means 3a calculates the coefficient of the first term of formula (2) (the term in the noise-superimposed speech spectrum r_X), that is, α(x) in formula (3). E(k) (formula (4)), which appears in the calculation of α(x), is the complete elliptic integral of the second kind and can be obtained by the numerical method described in Non-Patent Document 2 cited in the background art. By calculating the first-term coefficient with this complete elliptic integral of the second kind, an accurate speech spectrum can be estimated through a spectral subtraction method that estimates the most probable speech spectrum by estimating the most probable noise-superimposed speech spectrum.
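As an illustration of how E(k) might be evaluated numerically, the sketch below applies composite Simpson's rule to the standard definition E(k) = ∫₀^{π/2} √(1 − k² sin²θ) dθ. This is not the method of Non-Patent Document 2, which is not reproduced here; the function name `ellipe_simpson`, the step count `n`, and the assumption that formula (4) uses the modulus k (rather than the parameter m = k²) are all illustrative.

```python
import math


def ellipe_simpson(k: float, n: int = 200) -> float:
    """Approximate the complete elliptic integral of the second kind.

    Uses composite Simpson's rule on
    E(k) = integral from 0 to pi/2 of sqrt(1 - k^2 sin^2(t)) dt, for 0 <= k <= 1.
    The modulus convention (k, not m = k^2) is an assumption.
    """
    if not 0.0 <= k <= 1.0:
        raise ValueError("k must lie in [0, 1]")
    if n % 2:
        n += 1  # Simpson's rule needs an even number of subintervals
    h = (math.pi / 2) / n

    def integrand(t: float) -> float:
        # max() guards against tiny negative values caused by rounding
        return math.sqrt(max(0.0, 1.0 - (k * math.sin(t)) ** 2))

    total = integrand(0.0) + integrand(math.pi / 2)
    for i in range(1, n):
        weight = 4 if i % 2 else 2
        total += weight * integrand(i * h)
    return total * h / 3
```

Two easy sanity checks are E(0) = π/2 and E(1) = 1, which follow directly from the integral.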

The first-term coefficient calculation means 3a can also calculate the first-term coefficient using a polynomial based on a series expansion. Innumerable such expansions can be constructed as Taylor expansions around an arbitrary x. For example, the polynomial based on the series expansion around x = 0 is α(x) = 1 + x − x²/4 − x³/4 ... formula (5), which can be used to approximate, near x = 0, the value of the function containing the elliptic integral.

An unlimited variety of polynomials can be created from this series expansion; for example, expanding α(x) around x = 0 gives formula (5). The expansion can also be centered elsewhere: the series expansion around x = 1 is α(x) = π/2 + (0.196351 log(x − 1) − 0.310123)(x − 1)² + (−0.19635 log(x − 1) + 0.211948)(x − 1)³ + O(x⁴).

Thus, different polynomials are obtained depending on the expansion point. Polynomials based on this series expansion are shown concretely as (1) to (3) in FIG. 2: graph (1) shows the expansion around x = 0, graph (2) the expansion around x = 1, and graph (3) the expansion around x = ∞. Each of graphs (1) to (3) is obtained by truncating the expansion at the third-order term.

Using a polynomial based on a series expansion is a generally known way to compute a function that cannot readily be computed directly, and the polynomial is used within the range (radius of convergence) in which it approximates the function effectively. In the example of FIG. 2, α(x) can be calculated with the polynomial expanded around x = 0 when x ≤ 0.5, with the polynomial expanded around x = 1 when 0.5 ≤ x ≤ 1.5, and with the polynomial expanded around x = ∞ when 1.5 ≤ x.
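A minimal sketch of the x ≤ 0.5 branch, using only the third-order polynomial of formula (5) as stated in the text. The function name and the lower bound x ≥ 0 (x treated as a linear ratio, not decibels) are assumptions; the polynomials for the other two ranges are not implemented because the expansion around x = ∞ is not given in the text.

```python
def alpha_series_at_zero(x: float) -> float:
    """Evaluate formula (5), 1 + x - x^2/4 - x^3/4, by Horner's scheme.

    Valid only in the range where the text uses this expansion (x <= 0.5).
    The lower bound 0 is an assumption about the signal-to-noise ratio.
    """
    if not 0.0 <= x <= 0.5:
        raise ValueError("formula (5) is used only for 0 <= x <= 0.5")
    # 1 + x*(1 + x*(-1/4 + x*(-1/4))) expands to 1 + x - x^2/4 - x^3/4
    return 1.0 + x * (1.0 + x * (-0.25 + x * (-0.25)))
```

Rejecting inputs outside the stated range, rather than extrapolating, keeps the polynomial from being used where the text assigns a different expansion.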

Furthermore, the first-term coefficient calculation means 3a may store a numerical table relating to the noise-superimposed speech spectrum, calculated in advance, in storage means (not shown), and calculate the first-term coefficient by referring to this table and selecting the value that corresponds to the signal-to-noise ratio x (the numerical table is FIG. 2 expressed in tabular form).
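The table-based approach could look like the following sketch. The table contents are not given in the text, so the table is built from a caller-supplied function `alpha_fn` (hypothetical). The grid spacing, the upper limit `x_max`, clamping outside the grid, and linear interpolation between entries are also assumptions; the text only says that a value is selected from the table according to x.

```python
from bisect import bisect_right


def build_alpha_table(alpha_fn, x_max=4.0, step=0.05):
    """Precompute (xs, ys) on a uniform grid from 0 to x_max."""
    n = int(round(x_max / step))
    xs = [i * step for i in range(n + 1)]
    ys = [alpha_fn(x) for x in xs]
    return xs, ys


def lookup_alpha(table, x):
    """Return the table value for x, interpolating linearly between entries."""
    xs, ys = table
    if x <= xs[0]:
        return ys[0]   # clamp below the grid
    if x >= xs[-1]:
        return ys[-1]  # clamp above the grid
    i = bisect_right(xs, x)  # now xs[i - 1] <= x < xs[i]
    x0, x1 = xs[i - 1], xs[i]
    w = (x - x0) / (x1 - x0)
    return ys[i - 1] + w * (ys[i] - ys[i - 1])
```

The table trades memory for speed: the elliptic integral is evaluated once per grid point in advance, and each later lookup costs only a binary search.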

The second-term coefficient calculation means 3b calculates the coefficient of the second term of formula (2) (the term in the noise spectrum r_N). Calculating this coefficient, 1/(1 + β·x), requires fixing the adjustable parameter β that multiplies the signal-to-noise ratio x. The user of the system S can adjust β freely; for example, β can be determined through preliminary experiments so as to optimize the performance of the system S.
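One simple way to run such a preliminary experiment is a search over candidate values of β. The sketch below assumes a caller-supplied `score` function (hypothetical), for example word accuracy measured on held-out recordings; the text does not specify how the experiment is conducted.

```python
def choose_beta(candidates, score):
    """Return (best_beta, best_score) over the candidate values.

    `score` is a hypothetical callable mapping beta to a performance
    figure where higher is better.
    """
    candidates = list(candidates)
    if not candidates:
        raise ValueError("at least one candidate beta is required")
    scored = [(score(beta), beta) for beta in candidates]
    best_score, best_beta = max(scored)
    return best_beta, best_score
```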

The first-term spectrum calculation means 5 multiplies the noise-superimposed speech spectrum r_X output by the spectrum output unit 2 by the first-term coefficient α(x) calculated by the first-term coefficient calculation means 3a of the coefficient calculation means 3, thereby calculating the first term α(x) r_X of formula (2), and outputs it to the spectrum subtraction means 9.

The second-term spectrum calculation means 7 multiplies the noise spectrum r_N output by the spectrum output unit 2 by the second-term coefficient calculated by the second-term coefficient calculation means 3b of the coefficient calculation means 3, thereby calculating the second term r_N/(1 + β·x) of formula (2), and outputs it to the spectrum subtraction means 9.

The spectrum subtraction means 9 subtracts the second term r_N/(1 + β·x) calculated by the second-term spectrum calculation means 7 from the first term α(x) r_X calculated by the first-term spectrum calculation means 5 to obtain the speech spectrum r_S, and outputs it to the speech spectrum utilization unit 4.
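The chain of means 3a, 3b, 5, 7, and 9 can be sketched as a single function. Several points are assumptions made for illustration: spectra are plain lists of per-bin values, the signal-to-noise ratio x is a single scalar for the whole spectrum, and α is passed in as a callable because formula (3) is not reproduced in this text. No flooring of negative results is applied, since the text describes none.

```python
from typing import Callable, List, Sequence


def estimate_speech_spectrum(
    r_x: Sequence[float],
    r_n: Sequence[float],
    x: float,
    alpha: Callable[[float], float],
    beta: float,
) -> List[float]:
    """Compute r_S = alpha(x) * r_X - r_N / (1 + beta * x) bin by bin."""
    if len(r_x) != len(r_n):
        raise ValueError("r_x and r_n must have the same number of bins")
    first_coeff = alpha(x)                          # means 3a
    second_coeff = 1.0 / (1.0 + beta * x)           # means 3b
    first_term = [first_coeff * v for v in r_x]     # means 5
    second_term = [second_coeff * v for v in r_n]   # means 7
    return [a - b for a, b in zip(first_term, second_term)]  # means 9
```

For example, with a placeholder α(x) = 1.5, β = 1, and x = 1, the inputs r_X = [4.0, 2.0] and r_N = [1.0, 2.0] give [5.5, 2.0].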

Next, with reference to FIG. 3, the word accuracy obtained when the speech spectrum r_S output by the spectrum subtraction means 9 is passed to the speech recognition device 4a of the speech spectrum utilization unit 4 is described. FIG. 3 shows the relationship between the signal-to-noise ratio (S/N ratio) and word accuracy for three cases: no spectral subtraction (a speech spectrum estimated without the spectral subtraction method), the conventional method (a speech spectrum obtained by conventional spectral subtraction), and the proposed method (a speech spectrum estimated by the speech spectrum estimation apparatus 1).

FIG. 3 shows that, at every signal-to-noise ratio, the proposed method achieves higher word accuracy than both no spectral subtraction and the conventional method.

According to the speech spectrum estimation apparatus 1, the signal-to-noise ratio (S/N ratio) of the speech spectrum r_S can be improved, where r_S is estimated from the noise-superimposed speech spectrum r_X (the frequency transform of noise-superimposed speech acquired in a noisy environment) and the noise spectrum r_N (the frequency transform of the noise). As a result, the speech spectrum utilization unit 4 can improve the speech recognition rate, the speaker recognition rate, and the quality of the synthesized speech.

<Operation of the speech spectrum estimation and utilization system (speech spectrum estimation apparatus)>
Next, the operation of the speech spectrum estimation and utilization system S (speech spectrum estimation apparatus 1) is described with reference to the flowchart shown in FIG. 4 (see also FIG. 1 as appropriate).
First, the speech spectrum estimation and utilization system S records (picks up) the noise-superimposed speech and the noise with the noise-superimposed speech recording microphone 6a and the noise recording microphone 6b of the spectrum output unit 2 (step S1). Next, the system S amplifies the voltages of the noise-superimposed speech and the noise, now electrical signals, with the microphone amplifiers 8a and 8b of the spectrum output unit 2, and frequency-transforms them (spectrum analysis) with the spectrum analysis units 10a and 10b (step S2).

The speech spectrum estimation and utilization system S then corrects the noise spectrum with the correction device 12 of the spectrum output unit 2 and estimates the signal-to-noise ratio (S/N ratio) with the S/N ratio estimation unit 14 (step S3). The spectrum output unit 2 then outputs the noise-superimposed speech spectrum, the noise spectrum, and the signal-to-noise ratio to the speech spectrum estimation apparatus 1.

The speech spectrum estimation apparatus 1 then calculates the first-term coefficient and the second-term coefficient with the first-term coefficient calculation means 3a and the second-term coefficient calculation means 3b of the coefficient calculation means 3 (step S4). The calculated first-term coefficient is output to the first-term spectrum calculation means 5, and the calculated second-term coefficient is output to the second-term spectrum calculation means 7. The apparatus 1 then calculates the first-term spectrum and the second-term spectrum with the first-term spectrum calculation means 5 and the second-term spectrum calculation means 7 (step S5).

Next, using the spectrum subtraction means 9, the speech spectrum estimation apparatus 1 subtracts the second-term spectrum calculated by the second-term spectrum calculation means 7 from the first-term spectrum calculated by the first-term spectrum calculation means 5, thereby estimating the speech spectrum, and outputs it to the speech spectrum utilization unit 4 (step S6). The speech spectrum, now with an improved signal-to-noise ratio, is then used by the speech recognition device 4a, the speaker recognition device 4b, and the speech synthesizer 4c of the speech spectrum utilization unit 4.

An embodiment of the present invention has been described above, but the present invention is not limited to that embodiment. For example, although the embodiment was described as the speech spectrum estimation apparatus 1, it can also be regarded as a speech spectrum estimation method in which the processing of each component of the apparatus 1 is treated as one step, or as a speech spectrum estimation program in which each such process is written in a general-purpose or special-purpose computer language. In these cases, the same effects as those of the speech spectrum estimation apparatus 1 can be obtained.

FIG. 1 is a block diagram of a speech spectrum estimation and utilization system (including a speech spectrum estimation apparatus) according to an embodiment of the present invention.
FIG. 2 is a diagram showing the function that is the coefficient of the first term in the estimation formula for estimating the speech spectrum.
FIG. 3 is a diagram comparing the effect of the embodiment of the present invention with the effect of the conventional method.
FIG. 4 is a flowchart explaining the operation of the speech spectrum estimation and utilization system (including the speech spectrum estimation apparatus) shown in FIG. 1.
FIG. 5 is a block diagram of a conventional speech spectrum estimation apparatus.

Explanation of symbols

1 Speech spectrum estimation apparatus
3 Coefficient calculation means
3a First-term coefficient calculation means
3b Second-term coefficient calculation means
5 First-term spectrum calculation means (noise spectrum removing means)
7 Second-term spectrum calculation means (noise spectrum removing means)
9 Spectrum subtraction means (noise spectrum removing means)

Claims

A speech spectrum estimation apparatus that estimates a speech spectrum from a noise-superimposed speech spectrum in which a noise spectrum has been superimposed in advance on the speech spectrum, the apparatus comprising:
coefficient calculation means for calculating, based on a signal-to-noise ratio x, a noise-superimposed speech spectrum coefficient α(x) indicating the proportion of the noise-superimposed speech spectrum r_X and a noise spectrum coefficient (1/(1 + β·x)) indicating the proportion of the noise spectrum r_N in the following formula (2); and
noise spectrum removing means for removing the noise spectrum from the noise-superimposed speech spectrum by subtracting the second term r_N/(1 + β·x) from the first term α(x) r_X of formula (2), based on the noise-superimposed speech spectrum coefficient α(x) and the noise spectrum coefficient (1/(1 + β·x)) calculated by the coefficient calculation means,
wherein the coefficient calculation means comprises:
first-term coefficient calculation means for calculating the noise-superimposed speech spectrum coefficient α(x) using the following formula (3) and the following formula (4), which is the complete elliptic integral of the second kind; and
second-term coefficient calculation means for calculating the noise spectrum coefficient (1/(1 + β·x)), which includes a predetermined parameter β.

The speech spectrum estimation apparatus according to claim 1, wherein, in calculating the noise-superimposed speech spectrum coefficient, the coefficient calculation means approximates the function containing the elliptic integral by a polynomial based on a series expansion of that function.

A speech spectrum estimation program for estimating a speech spectrum from a noise-superimposed speech spectrum in which a noise spectrum has been superimposed in advance on the speech spectrum, the program functioning as:
coefficient calculation means for calculating, based on a signal-to-noise ratio x, a noise-superimposed speech spectrum coefficient α(x) indicating the proportion of the noise-superimposed speech spectrum r_X and a noise spectrum coefficient (1/(1 + β·x)) indicating the proportion of the noise spectrum r_N in the following formula (2); and
noise spectrum removing means for removing the noise spectrum from the noise-superimposed speech spectrum by subtracting the second term r_N/(1 + β·x) from the first term α(x) r_X of formula (2), based on the noise-superimposed speech spectrum coefficient α(x) and the noise spectrum coefficient (1/(1 + β·x)) calculated by the coefficient calculation means,
wherein the coefficient calculation means functions as:
first-term coefficient calculation means for calculating the noise-superimposed speech spectrum coefficient α(x) using the following formula (3) and the following formula (4), which is the complete elliptic integral of the second kind; and
second-term coefficient calculation means for calculating the noise spectrum coefficient (1/(1 + β·x)), which includes a predetermined parameter β.