JP4173462B2

JP4173462B2 - Microphone position determination method, microphone position determination device, microphone position determination program

Info

Publication number: JP4173462B2
Application number: JP2004120377A
Authority: JP
Inventors: 哲小橋川; 敏高橋
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2004-04-15
Filing date: 2004-04-15
Publication date: 2008-10-29
Anticipated expiration: 2024-04-15
Also published as: JP2005303898A

Description

The present invention relates to a microphone position determination method, a microphone position determination apparatus, and a microphone position determination program for determining the optimum position of a microphone used for speech recognition.

Conventionally, the optimum microphone position has been determined as follows. The spatial transfer characteristic from a loudspeaker placed at the talker position to the microphone, that is, the impulse response, and the background noise picked up by the microphone in the usage environment are recorded. Close-talking clean speech, recorded in a noise-free environment with the microphone close to the talker's mouth, is convolved with the impulse response measured at each microphone position, and the recorded noise is superimposed on it. This yields speech that simulates what would be recorded at each microphone position. A speech recognition experiment is then actually run on the simulated speech, and the microphone position with the highest recognition performance is taken as the optimum microphone position.
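The simulation step of this conventional method can be sketched as follows. This is a minimal illustration, not code from the patent: the function names, the plain-list signal representation, and the choice to cycle the noise when it is shorter than the speech are all assumptions made for the example.

```python
def convolve(signal, impulse_response):
    """Direct-form linear convolution of two sample lists."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += s * h
    return out


def simulate_recording(clean_speech, impulse_response, noise):
    """Convolve clean speech with a room impulse response, then add noise.

    Assumption: `noise` is non-empty and is repeated cyclically if it is
    shorter than the reverberant signal.
    """
    reverberant = convolve(clean_speech, impulse_response)
    return [r + noise[i % len(noise)] for i, r in enumerate(reverberant)]


# Toy data: a unit impulse as "speech", one echo, constant noise.
simulated = simulate_recording([1.0, 0.0, 0.0], [1.0, 0.5], [0.1])
```

In the conventional method, a signal like `simulated` would be produced for every microphone position and every evaluation utterance and then passed through a recognizer, which is the cost the invention aims to avoid.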

The method of creating simulated speech and determining the optimum microphone position through actual recognition experiments requires time, cost, and data storage to prepare close-talking clean speech for evaluation, to create speech data by convolving impulse responses and superimposing recorded noise, and to run the speech recognition experiments themselves.
The present invention has been made in view of the above. Its object is to provide an apparatus that can determine the optimum microphone position directly from the impulse response and background noise data, rather than by creating simulated speech for each microphone position from the measured impulse response and recorded background noise and actually running recognition experiments.

To achieve the above object, the invention proposed in claim 1 is a microphone position determination method for determining the optimum microphone position for speech recognition. Its gist is to move the microphone position sequentially, or switch among a plurality of microphones, compare the ratio of the direct wave level to the reflected wave level of the impulse response at each microphone position, and take the microphone position with the largest ratio as the optimum microphone position.
In the invention proposed in claim 1, the impulse response at each microphone position is measured. As shown in FIG. 1, the maximum absolute amplitude of the measured impulse response, or the sum of the absolute amplitudes at times around the time of that maximum, is taken as D (direct wave level). As also shown in FIG. 1, the sum of the absolute amplitudes at all times other than those of the direct wave is taken as R (reflected wave level). The microphone position with the largest ratio D/R is taken as the optimum microphone position.
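The calculation of D and R described for claim 1 can be sketched as below. This is a hedged illustration under stated assumptions: the `half_window` parameter (how many samples on each side of the peak count as "around" the maximum) and the function names are hypothetical, since the text does not fix the width of the direct wave region.

```python
import math


def direct_and_reflected_levels(impulse_response, half_window=0):
    """Return (D, R) for one measured impulse response.

    D: sum of absolute amplitudes within `half_window` samples of the peak
       (just the peak itself when half_window is 0).
    R: sum of absolute amplitudes at all other times.
    """
    magnitudes = [abs(x) for x in impulse_response]
    peak = max(range(len(magnitudes)), key=magnitudes.__getitem__)
    start = max(0, peak - half_window)
    stop = min(len(magnitudes), peak + half_window + 1)
    direct = sum(magnitudes[start:stop])
    reflected = sum(magnitudes[:start]) + sum(magnitudes[stop:])
    return direct, reflected


def direct_to_reflected_ratio(impulse_response, half_window=0):
    """Evaluation value D/R; infinite if no reflected energy is present."""
    direct, reflected = direct_and_reflected_levels(impulse_response, half_window)
    return math.inf if reflected == 0 else direct / reflected


# Toy impulse responses for two hypothetical microphone positions.
responses = {
    "A": [0.0, 1.0, 0.5, 0.25, 0.25],
    "B": [0.0, 1.0, 0.25, 0.125, 0.125],
}
best = max(responses, key=lambda pos: direct_to_reflected_ratio(responses[pos]))
```

With these toy values, position "B" has the weaker reflections and therefore the larger D/R, so it is selected.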

The invention proposed in claim 2 is a microphone position determination method for determining the optimum microphone position for speech recognition. Its gist is to move the microphone position sequentially, or switch among a plurality of microphones, compare the ratio of the maximum absolute amplitude of the impulse response at each microphone position to the level of the section of the impulse response lying beyond the width of the analysis frame used for speech recognition, and take the microphone position with the largest ratio as the optimum microphone position.
In the invention of claim 2, the impulse response at each microphone position is measured. As shown in FIG. 2, the maximum absolute amplitude of the measured impulse response, or the sum of the absolute amplitudes at times around the time of that maximum, is taken as D (direct wave level). As also shown in FIG. 2, the sum of the absolute amplitudes in the section of the impulse response that lies more than one speech recognition analysis frame after the time of the direct wave is taken as R (reflected wave level). The microphone position with the largest ratio D/R is taken as the optimum microphone position. For example, when recognition uses feature parameters related to the log spectrum, such as the cepstrum or MFCC commonly used in speech recognition, subtracting their long-time average reduces the influence of the part of the transfer characteristic (impulse response) that falls within the frame, which is the part ignored in claim 2.
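The remark about subtracting the long-time average can be illustrated with a small sketch of mean normalization on log-spectral features. It assumes the idealized case in which a transfer characteristic shorter than the analysis frame appears as a constant additive offset on every frame's log-domain features; the function name and the toy feature values are made up for the example.

```python
def cepstral_mean_normalize(frames):
    """Subtract the per-coefficient mean over all frames.

    `frames` is a list of equal-length feature vectors (for example
    cepstra), one vector per analysis frame.
    """
    count = len(frames)
    dims = len(frames[0])
    means = [sum(frame[d] for frame in frames) / count for d in range(dims)]
    return [[frame[d] - means[d] for d in range(dims)] for frame in frames]


# Clean features, and the same features shifted by a constant "channel"
# offset, as a short impulse response would do in the log domain.
clean = [[1.0, 2.0], [3.0, 6.0]]
channel = [0.5, -1.0]
distorted = [[c + h for c, h in zip(frame, channel)] for frame in clean]

normalized_clean = cepstral_mean_normalize(clean)
normalized_distorted = cepstral_mean_normalize(distorted)
```

After normalization the two feature sequences coincide, which is why only the reflections extending beyond the frame need to be counted in R for this variant.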

The invention proposed in claim 3 is a microphone position determination method for determining the optimum microphone position for speech recognition. Its gist is that, in addition to the judgment of claims 1 and 2 that a position is better the larger the ratio of the direct wave level to the reflected wave level of the impulse response, a judgment is added that a position is better the lower the level of background noise picked up at that microphone position, and the position for which the two judgments together are best is taken as the optimum microphone position.
In the invention proposed in claim 3, the impulse response at each microphone position is measured. The maximum absolute amplitude of the measured impulse response, or the sum of the absolute amplitudes at times around the time of that maximum, is taken as D (direct wave level). The reflected wave level determined by the method of claim 1 or claim 2 is taken as R. The average absolute amplitude of the noise recorded at each microphone position is taken as N (background noise level). The microphone position that maximizes the sum of the ratio D/R of direct wave level to reflected wave level and the ratio D/N of direct wave level to background noise level is taken as the optimum microphone position. The direct wave level indicates the power level of the speech entering the microphone, and D/N correlates strongly with the S/N, the ratio of speech power level to background noise power level. Because the S/N strongly affects speech recognition performance, knowing D/N makes it possible to estimate recognition performance.
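A sketch of the combined criterion follows. The weighting factor `k` anticipates the correction coefficient that appears in Equation (2) of the embodiment; its default value of 1.0, the function names, and the toy numbers are assumptions for illustration only.

```python
def mean_abs_level(samples):
    """Background noise level N: average absolute amplitude."""
    return sum(abs(x) for x in samples) / len(samples)


def evaluation_value(direct, reflected, noise, k=1.0):
    """Q = D/R + k * D/N. Assumes reflected and noise are nonzero."""
    return direct / reflected + k * direct / noise


# (D, R, N) for two hypothetical microphone positions.
positions = {
    "A": (1.0, 0.5, 0.25),   # more reverberant, but quiet
    "B": (1.0, 0.25, 1.0),   # less reverberant, but noisy
}
scores = {pos: evaluation_value(*levels) for pos, levels in positions.items()}
best = max(scores, key=scores.get)
```

Here position "B" would win on D/R alone (4 against 2), but its higher noise level lowers D/N enough that "A" obtains the larger total, showing how the two judgments interact.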

According to the present invention, the optimum microphone position can be determined merely by recording the background noise and measuring the impulse response at each microphone position, without creating simulated speech by convolving the measured impulse response and superimposing the background noise, and without running speech recognition experiments. Processing time and computational cost can therefore be reduced.

Embodiments of the present invention are described below with reference to the drawings.
FIG. 3 is a block diagram of the optimum microphone position determination apparatus proposed in claims 1 and 2 of the present invention. The apparatus shown in FIG. 3 determines the optimum microphone position by comparing the reflected wave level R with the direct wave level D, both obtained from the impulse response measured at each microphone position. It evaluates each microphone position with an evaluation function (for example, Equation (1)) and determines the optimum microphone position.
Q = D/R   (1)
Specifically, as shown in FIG. 3, the optimum microphone position determination apparatus of this embodiment comprises a recording gain adjustment module 100, an impulse response measurement module 200, an evaluation function calculation module 300, and an optimum microphone position determination module 400.

In the optimum microphone position determination apparatus configured as above, the recording gain is first adjusted as an initial setting. In the recording gain adjustment module 100 of FIG. 3, as shown in FIG. 4, a reference signal is read from the reference signal memory 101, its gain is adjusted by the recording gain adjustment unit 102, and it is converted to an analog signal by the D/A converter 103 and played from the talker-simulating loudspeaker SP. The played signal is recorded by the speech recognition microphone M and converted to a digital signal by the A/D converter 104. The recorded signal power level calculation unit 105 calculates the power level of the recorded signal, and the recorded signal power level determination unit 106 determines whether that level is within the appropriate range. If it is, the level determination switch 107 is set to the appropriate-level terminal 108 and the power level of the recorded signal is stored in the reference signal power level memory 110. If it is not, the level determination switch 107 is set to the inappropriate-level terminal 109, the recording gain is adjusted by the recording gain adjustment unit 102, and the reference signal is played and recorded again.
The reference signal used here may be a single spoken sentence or several sentences, and the playback level from the talker-simulating loudspeaker SP is set equal to the volume level of an ordinary talker. This completes the initial setting of the recording gain.
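The play, record, check, and readjust loop of the recording gain adjustment module can be sketched as follows. The text does not say how the gain is changed when the level is out of range, so the update rule here (scaling toward the geometric center of the range) is an assumption, as are the mean-square definition of power, the `play_and_record` callback standing in for the loudspeaker and microphone path, and the iteration limit.

```python
import math


def power_level(samples):
    """Mean-square power of a recorded signal (assumed definition)."""
    return sum(x * x for x in samples) / len(samples)


def adjust_recording_gain(play_and_record, low, high, gain=1.0, max_rounds=20):
    """Repeat playback and recording until the power lies in [low, high].

    `play_and_record(gain)` is a hypothetical callback that plays the
    reference signal with the given gain and returns the recorded samples.
    Returns the final gain and the accepted reference power level.
    """
    target = math.sqrt(low * high)
    for _ in range(max_rounds):
        level = power_level(play_and_record(gain))
        if low <= level <= high:
            return gain, level          # level would go to memory 110
        if level == 0.0:
            raise ValueError("recorded signal is silent")
        gain *= math.sqrt(target / level)   # assumed update rule
    raise RuntimeError("gain adjustment did not converge")


# Stand-in for the acoustic path: the recording is the scaled reference.
reference = [0.1, -0.1, 0.1, -0.1]
gain, level = adjust_recording_gain(
    lambda g: [g * x for x in reference], low=0.04, high=0.16
)
```

In a real setup the callback would drive the D/A converter and read the A/D converter; the linear stand-in above converges after a single readjustment.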

After the initial setting of the recording gain adjustment unit 102 is complete, the impulse response is measured by the impulse response measurement module 200 of FIG. 3. In the impulse response measurement module 200, as shown in FIG. 5, the impulse response measurement signal stored in the impulse response measurement signal memory 201 is adjusted by the initialized recording gain adjustment unit 102, converted to an analog signal by the D/A converter 103, and played from the talker-simulating loudspeaker SP. The gain of the impulse response measurement signal stored in the impulse response measurement signal memory 201 is the same as the gain of the reference signal stored in the reference signal memory 101.

Therefore, the sound pressure of the impulse response measurement signal emitted from the talker-simulating loudspeaker SP is the same as the sound pressure of the reference signal read from the reference signal memory 101.
The impulse response measurement signal played from the talker-simulating loudspeaker SP is recorded by the speech recognition microphone M, converted to a digital signal by the A/D converter 202, converted to an impulse response by the impulse response calculation unit 203, and stored in the impulse response memory 204.
The evaluation function calculation module 300 of FIG. 3 then calculates, from the measured impulse response, an evaluation function value indicating how suitable the microphone position is. In the evaluation function calculation module 300, as shown in FIG. 6, the direct wave level calculation means 301 calculates the direct wave level D from the measured impulse response read from the impulse response memory 204, and the reflected wave level calculation means 302 calculates the reflected wave level R.

In the microphone position determination method and apparatus proposed in claims 1 and 4 of the present invention, the direct wave level is calculated by treating the wave at the time of the maximum amplitude of the impulse response shown in FIG. 1, or the waves at times around it, as the direct wave, and the reflected wave level R is calculated by treating the waves at all other times as the reflected wave.
In contrast, the microphone position determination method and apparatus proposed in claims 2 and 5 use the same calculation as claims 1 and 4 for the direct wave level D, but for the reflected wave level R, as shown in FIG. 2, only the part of the impulse response from the time that lies more than one speech recognition analysis frame width (several tens of ms) after the direct wave onward is treated as the reflected wave.
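The claim 2 variant of R can be sketched as below. The sampling rate and frame width are passed in explicitly because the text only gives "several tens of ms"; the function name, the integer millisecond parameter, and the convention that the late section starts at the first sample strictly more than one frame after the peak are assumptions.

```python
def late_reflected_level(impulse_response, sample_rate_hz, frame_width_ms):
    """Sum of absolute amplitudes later than one analysis frame after the peak."""
    magnitudes = [abs(x) for x in impulse_response]
    peak = max(range(len(magnitudes)), key=magnitudes.__getitem__)
    frame_samples = sample_rate_hz * frame_width_ms // 1000
    first_late = peak + frame_samples + 1
    return sum(magnitudes[first_late:])


# Toy response sampled at 1 kHz with a 2 ms "frame" to keep it readable.
response = [0.0, 1.0, 0.5, 0.25, 0.125, 0.125]
late = late_reflected_level(response, sample_rate_hz=1000, frame_width_ms=2)
```

With a realistic setup, for example 16 kHz sampling and a 25 ms frame, the first 400 samples after the peak would be excluded from R in the same way.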

From the calculated direct wave level D and reflected wave level R, the evaluation function calculation unit 303 calculates the evaluation function, for example by Equation (1), and inputs the result to the optimum microphone position determination module 400.
As shown in FIG. 6, the optimum microphone position determination module 400 comprises a maximum value determination switch 401, an optimum microphone position candidate memory 404, an evaluation function maximum value memory 405, and a non-optimum microphone position candidate memory 406.
The maximum value determination switch 401 compares the obtained evaluation function value with the value recorded in the evaluation function maximum value memory 405 to determine whether it is larger than the evaluation function values measured at the other microphone positions. If it is larger, the switch is set to the maximum value terminal 402, the maximum of the evaluation function is updated by rewriting the value stored in the evaluation function maximum value memory 405, and the optimum microphone position is updated by rewriting the value in the optimum microphone position candidate memory 404 with the ID of the current microphone position (a number identifying the microphone position).

If the evaluation function value is smaller than the values measured at other microphone positions, the switch 401 is set to the non-maximum value terminal 403 and the ID of the current microphone position is added to the non-optimum microphone position candidate memory 406. Once all microphone positions have been evaluated, the optimum microphone position is determined from the microphone position ID stored in the optimum microphone position candidate memory 404.
FIG. 7 shows the overall configuration of a microphone position determination apparatus for carrying out the microphone position determination method proposed in claims 3 and 6 of the present invention. It differs from the apparatus shown in FIG. 3 in that a noise level ratio adjustment module 50 and a noise recording module 60 are added between the recording gain adjustment module 100 and the impulse response measurement module 200.

With this added configuration, in addition to the judgment described with reference to FIGS. 1 to 6 that a position is better the larger the ratio of the direct wave level D to the reflected wave level R, a judgment is added that a position is better the lower the level N of background noise picked up at that microphone position, so that the result takes the influence of background noise into account.
The evaluation function in this case is calculated by the following equation.
Q = D/R + k·D/N   (2)
k: correction coefficient for the gain difference between the impulse response and the recording level
The recording gain adjustment module 100 and the impulse response measurement module 200 shown in FIG. 7 are the same as in FIGS. 4 and 5, so only the added parts and the parts related to them are described here.

In the noise level ratio adjustment module 50 shown in FIG. 7, as shown in FIG. 8, nothing is played from the talker-simulating loudspeaker SP. The background noise is recorded by the speech recognition microphone M, the recorded signal is converted to a digital signal by an A/D converter, and the background noise level measuring means 52 calculates the noise power level, which is stored in the background noise power level memory 53. It is then stored in the reference/noise level ratio memory 55 together with the reference signal power level that the recording gain adjustment module 100 of FIG. 4 stored in the reference signal power level memory 110.
After the initial setting is complete, noise is recorded at each microphone position by the noise recording module 60 shown in FIG. 9. In the noise recording module 60, as shown in FIG. 9, the background noise signal recorded by the speech recognition microphone M is converted to a digital signal by the A/D converter 61, and the noise power level calculation unit 62 calculates the noise power level and stores it in the noise power level memory 63.
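One way the reference level and the background noise level could be combined into a single reference/noise level ratio is sketched below. The text does not define "power level" or the form of the stored ratio, so the mean-square definition and the use of decibels are assumptions, and both function names are hypothetical.

```python
import math


def power_level(samples):
    """Mean-square power of a sampled signal (assumed definition)."""
    return sum(x * x for x in samples) / len(samples)


def reference_to_noise_ratio_db(reference_samples, noise_samples):
    """Ratio of reference power to noise power in dB.

    Assumes the noise recording is not completely silent.
    """
    ratio = power_level(reference_samples) / power_level(noise_samples)
    return 10.0 * math.log10(ratio)


# Toy recordings: reference at amplitude 1.0, noise at amplitude 0.1.
ratio_db = reference_to_noise_ratio_db([1.0, -1.0], [0.1, -0.1])
```

A tenfold amplitude difference corresponds to about 20 dB, which is the kind of value that would be kept alongside the per-position noise power levels.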

In this embodiment, as shown in FIG. 10, the evaluation function calculation module 300 reads the measured impulse response from the impulse response memory 204, and the direct wave level calculation means 301 and the reflected wave level calculation means 302 calculate the direct wave level D and the reflected wave level R. In addition, the background noise power level is read from the noise power level memory 63 and adjusted by the noise gain adjustment unit 304, based on the reference level to noise level ratio stored in the reference/noise level ratio memory 55, so that its range matches that of the direct wave level and the reflected wave level. In other words, the correction coefficient k of Equation (2) is determined. From the resulting direct wave level D, reflected wave level R, and noise level N, the evaluation function calculation unit 303 calculates an evaluation function such as Equation (2). The optimum microphone position determination module 400 finds the maximum among the results of the evaluation function calculation unit 303, and the optimum microphone position is determined.

An outline of the microphone position determination program of the present invention is described below with reference to the flowchart shown in FIG. 11.
First, it is determined whether the initial setting of the system has been performed (step S91).
If the initial setting has not been performed, the recording gain is adjusted (step S92) and the noise level ratio is adjusted (step S93).
Once the initial setting is complete, the following procedure is repeated until all microphone positions have been examined (step S94).

First, noise is recorded and its power level is obtained (step S95).
Next, the impulse response is measured (step S96).
From the obtained impulse response and the power level of the recorded noise, the direct wave level, reflected wave level, and noise level are obtained, and the evaluation function that rates the suitability of the microphone position is calculated (step S97).
It is determined whether the evaluation function value is the largest compared with the evaluation function values of the other microphone positions (step S98).

If it is not the largest, the microphone position is registered in the list of non-optimum microphone positions (step S99), and processing returns to the check for remaining microphone positions (step S94).
If it is the largest, the maximum evaluation function value is updated (step S100), the optimum microphone position candidate is replaced with the current microphone position (step S101), and processing returns to the check for remaining microphone positions (step S94).
When all microphone positions have been judged, the optimum microphone position is output from the ID stored in the optimum microphone position candidate memory (step S102).

The program executes the above steps and then ends.
The microphone position determination apparatus of the present invention described above can be realized by having a computer read and execute a microphone position determination program. The microphone position determination program proposed in this invention is written in a computer-readable programming language and recorded on a recording medium such as a magnetic disk or CD-ROM. It is installed on a computer from such a recording medium or through a communication line, and is interpreted by the central processing unit of the computer so that the computer functions as a microphone position determination apparatus.
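The selection loop of the FIG. 11 flowchart (steps S94 and S98 to S102) can be sketched as follows. The `evaluate` callback stands in for steps S95 to S97 and is hypothetical, as are the function name and the toy scores; the three local variables mirror the roles of memories 404, 405, and 406.

```python
def find_best_position(position_ids, evaluate):
    """Return (best_id, best_value, rejected_ids) over all positions.

    `evaluate(position_id)` is a hypothetical callback returning the
    evaluation function value Q for that microphone position.
    """
    best_id = None       # role of optimum position candidate memory 404
    best_value = None    # role of evaluation function maximum memory 405
    rejected = []        # role of non-optimum position candidate memory 406
    for position_id in position_ids:              # step S94
        value = evaluate(position_id)             # steps S95 to S97
        if best_value is None or value > best_value:   # step S98
            best_value = value                    # step S100
            best_id = position_id                 # step S101
        else:
            rejected.append(position_id)          # step S99
    return best_id, best_value, rejected          # step S102


# Toy evaluation values for three hypothetical microphone positions.
toy_scores = {1: 2.0, 2: 5.0, 3: 3.0}
best_id, best_value, rejected = find_best_position([1, 2, 3], toy_scores.get)
```

As in the description, only positions that lose the comparison at the time they are evaluated are added to the rejected list; a candidate that is later displaced is simply overwritten.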

The microphone position determination apparatus according to the present invention is used to decide where to place the microphone that captures speech for recognition when installing a device with a speech recognition function, such as an automatic voice guidance device.

FIG. 1: Diagram for explaining how the direct wave level D and reflected wave level R are calculated from the impulse response according to claim 1.
FIG. 2: Diagram for explaining how the direct wave level D and reflected wave level R are calculated from the impulse response according to claim 2.
FIG. 3: Block diagram for explaining an embodiment of the microphone position determination apparatus according to the present invention.
FIG. 4: Block diagram for explaining the recording gain adjustment module of the apparatus shown in FIG. 3.
FIG. 5: Block diagram for explaining the impulse response measurement module of the apparatus shown in FIG. 3.
FIG. 6: Block diagram for explaining the evaluation function calculation module and optimum microphone position determination module of the apparatus shown in FIG. 3.
FIG. 7: Block diagram for explaining another embodiment of the microphone position determination apparatus according to the present invention.
FIG. 8: Block diagram for explaining the noise level ratio adjustment module of the apparatus shown in FIG. 7.
FIG. 9: Block diagram for explaining the noise recording module of the apparatus shown in FIG. 7.
FIG. 10: Block diagram for explaining the evaluation function calculation module and optimum microphone position determination module of the apparatus shown in FIG. 7.
FIG. 11: Flowchart for explaining the outline of the microphone position determination program that causes a computer to function as the apparatus shown in FIG. 7.

Explanation of symbols

50: Noise level ratio adjustment module
60: Noise recording module
100: Recording gain adjustment module
200: Impulse response measurement module
300: Evaluation function calculation module
400: Optimum microphone position determination module
SP: Talker-simulating loudspeaker
M: Speech recognition microphone
101: Reference signal memory
102: Recording gain adjustment unit
103: D/A converter
104: A/D converter
105: Recorded signal power level calculation unit
106: Recorded signal power level determination unit
107: Level determination switch
110: Reference signal power level memory
201: Impulse response measurement signal memory
202: A/D converter
203: Impulse response calculation unit
204: Impulse response memory
301: Direct wave level calculation means
302: Reflected wave level calculation means
303: Evaluation function calculation unit
401: Maximum value determination switch
404: Optimum microphone position candidate memory
405: Evaluation function maximum value memory
406: Non-optimum microphone position candidate memory
51: A/D converter
52: Background noise level measuring means
53: Background noise power level memory
54: Reference/noise level ratio calculation unit
55: Reference/noise level ratio memory
61: A/D converter
62: Noise power level calculation unit
63: Noise power level memory

Claims

1. A microphone position determination method comprising: measuring the impulse response from a speaker simulation speaker to a voice recognition microphone at each of a plurality of positions by changing the position of the voice recognition microphone; calculating a direct wave level by taking, as the direct wave, the wave at the time of the maximum amplitude of the impulse response measured at each position, or the waves at times around that time; calculating a reflected wave level by taking the waves at times other than the direct wave as the reflected wave; and setting, as the optimal microphone position, the microphone position at which the ratio of the direct wave level to the reflected wave level is maximum.

2. A microphone position determination method comprising: measuring the impulse response from a speaker simulation speaker to a voice recognition microphone at each of a plurality of positions by changing the position of the voice recognition microphone; calculating a direct wave level by taking, as the direct wave, the wave at the time of the maximum amplitude of the impulse response measured at each position, or the waves at times around that time; calculating a reflected wave level by taking, as the reflected wave, the waves that arrive later than the time of the direct wave by more than the analysis frame width used for speech recognition; and setting, as the optimal microphone position, the microphone position at which the ratio of the direct wave level to the reflected wave level is maximum.

3. The microphone position determination method according to claim 1, wherein the level of the background noise mixed into the voice recognition microphone is measured at each of the plurality of positions by changing the position of the voice recognition microphone, the ratio of the direct wave level to the background noise level is obtained, and the microphone position at which the sum of this ratio and the ratio of the direct wave level to the reflected wave level is maximum is set as the optimal microphone position.
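The two ways of splitting an impulse response into direct and reflected parts in the method claims above can be illustrated with a minimal Python sketch. It is not the patented implementation. The function names, the `direct_half_width` and `frame_width` parameters (both in samples), the use of energy (sum of squares) as the "level", and the expression of the ratio in dB are all assumptions made for illustration.

```python
import math


def level_ratio_db(numerator, denominator):
    """Ratio of two non-negative energies in dB (dB is an assumed unit)."""
    if numerator == 0:
        return -math.inf
    if denominator == 0:
        return math.inf
    return 10.0 * math.log10(numerator / denominator)


def split_levels(h, direct_half_width=0, frame_width=0):
    """Return (direct_level, reflected_level) for a non-empty impulse response h.

    The direct wave is the sample of maximum absolute amplitude plus
    `direct_half_width` samples on each side (hypothetical parameter).
    If frame_width > 0, only samples later than peak + frame_width count as
    reflections (the analysis-frame variant); otherwise every sample outside
    the direct window counts.
    """
    peak = max(range(len(h)), key=lambda i: abs(h[i]))
    lo = max(0, peak - direct_half_width)
    hi = min(len(h), peak + direct_half_width + 1)
    direct = sum(x * x for x in h[lo:hi])
    if frame_width > 0:
        start = max(hi, peak + frame_width + 1)
        reflected = sum(x * x for x in h[start:])
    else:
        reflected = sum(x * x for x in h[:lo]) + sum(x * x for x in h[hi:])
    return direct, reflected


def best_position(responses, **kwargs):
    """Pick the position name whose direct-to-reflected ratio is largest."""
    def score(name):
        direct, reflected = split_levels(responses[name], **kwargs)
        return level_ratio_db(direct, reflected)
    return max(responses, key=score)


# Toy impulse responses for two candidate positions (made-up numbers).
candidates = {
    "near": [0.0, 0.0, 1.0, 0.1, 0.05, 0.0, 0.2, 0.1],
    "far": [0.0, 0.0, 0.5, 0.1, 0.3, 0.3, 0.2, 0.2],
}
print(best_position(candidates))
```

Passing `frame_width` as the number of samples in one recognition analysis frame switches the sketch from the first variant to the second, so that early reflections falling inside the same frame as the direct wave are not counted against a position.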

4. A device for determining the optimal microphone position used for speech recognition,
the device comprising: a speaker simulation speaker that simulates a speaker and is placed at the assumed speaking position of the speaker; a voice recognition microphone; impulse response measuring means for measuring the impulse response from the speaker simulation speaker to the voice recognition microphone at each of a plurality of positions by changing the position of the voice recognition microphone; direct wave level calculating means for calculating a direct wave level by taking, as the direct wave, the wave at the time of the maximum amplitude of the impulse response measured at each position, or the waves at times around that time; reflected wave level calculating means for calculating a reflected wave level by taking the waves at times other than the direct wave as the reflected wave; and optimum microphone position determining means for setting, as the optimal microphone position, the microphone position at which the ratio of the direct wave level to the reflected wave level is maximum.

5. A device for determining the optimal microphone position used for speech recognition,
the device comprising: a speaker simulation speaker that simulates a speaker and is placed at the assumed speaking position of the speaker; a voice recognition microphone; impulse response measuring means for measuring the impulse response from the speaker simulation speaker to the voice recognition microphone at each of a plurality of positions by changing the position of the voice recognition microphone; direct wave level calculating means for calculating a direct wave level by taking, as the direct wave, the wave at the time of the maximum amplitude of the impulse response measured at each position, or the waves at times around that time; reflected wave level calculating means for calculating a reflected wave level by taking, as the reflected wave, the waves that arrive later than the time of the direct wave by more than the analysis frame width used for speech recognition; and optimum microphone position determining means for setting, as the optimal microphone position, the microphone position at which the ratio of the direct wave level to the reflected wave level is maximum.

6. The microphone position determining apparatus according to claim 4, further comprising: background noise level measuring means for recording the background noise mixed into the voice recognition microphone at each of the plurality of positions by changing the position of the voice recognition microphone, and for measuring the background noise level; and optimum microphone position determining means for obtaining the ratio of the direct wave level to the background noise level and for determining, as the optimal microphone position, the microphone position at which the sum of this ratio and the ratio of the direct wave level to the reflected wave level is maximum.

7. A microphone position determination program written in a computer-readable programming language, the program causing a computer to execute the functions of the microphone position determination apparatus according to any one of claims 4 to 6.
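As a rough illustration of the selection step that such a program would perform when background noise is also taken into account, the Python sketch below scores each candidate position by the sum of the direct-to-reflected ratio and the direct-to-noise ratio and keeps a running maximum. It is a sketch under stated assumptions, not the claimed program: the function names and input layout are hypothetical, the levels are assumed to be already-measured energies, and both ratios are assumed to be expressed in dB before they are added.

```python
import math


def level_ratio_db(numerator, denominator):
    """Ratio of two non-negative energies in dB (dB is an assumed unit)."""
    if numerator == 0:
        return -math.inf
    if denominator == 0:
        return math.inf
    return 10.0 * math.log10(numerator / denominator)


def evaluate(direct, reflected, noise):
    """Evaluation value: direct/reflected ratio plus direct/noise ratio (in dB)."""
    return level_ratio_db(direct, reflected) + level_ratio_db(direct, noise)


def choose_position(measurements):
    """Scan candidate positions and keep the one with the largest evaluation value.

    `measurements` maps a position name to (direct, reflected, noise) levels.
    The running pair (best_name, best_score) is updated whenever a new maximum
    is found. Returns (None, -inf) if no candidate has a finite or positive
    infinite score.
    """
    best_name, best_score = None, -math.inf
    for name, (direct, reflected, noise) in measurements.items():
        score = evaluate(direct, reflected, noise)
        if score > best_score:
            best_name, best_score = name, score
    return best_name, best_score


# Made-up levels for three candidate positions.
levels = {
    "A": (1.0, 0.1, 0.01),
    "B": (1.0, 0.5, 0.1),
    "C": (1.0, 0.1, 0.001),
}
print(choose_position(levels))
```

Adding the two ratios in dB is equivalent to multiplying them in the linear domain, so a position is favoured only when it is good on both counts. Whether the patent weights the two terms equally is not stated in the claims, and equal weighting is assumed here.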