JP6096437B2

JP6096437B2 - Audio processing device

Info

Publication number: JP6096437B2
Application number: JP2012186269A
Authority: JP
Inventors: 吉田 昌弘
Original assignee: Xacti Corp
Current assignee: Xacti Corp
Priority date: 2012-08-27
Filing date: 2012-08-27
Publication date: 2017-03-15
Anticipated expiration: 2032-08-27
Also published as: JP2014045317A

Description

The present invention relates to an audio processing device, and more particularly to an audio processing device that adjusts the gains of a plurality of audio signals acquired in parallel.

An example of this type of device is disclosed in Patent Document 1. According to this background art, the audio signal captured by one microphone is input to a first detector via an amplifier and a first LPF. The audio signal captured by the other microphone is input to a second detector via a variable gain amplifier and a second LPF. The outputs of the first and second detectors are compared with each other by a comparator, and the amplification factor of the variable gain amplifier is adjusted based on the comparison result. Variation in microphone sensitivity can thereby be suppressed.

Patent Document 1: JP 2005-136628 A

However, in the background art, the amplification factor of the variable gain amplifier is not adjusted according to the angle at which the audio signal is incident on the microphones, nor is the phase variation between the audio signals captured by the respective microphones suppressed. The quality of the adjusted audio signal is therefore limited.

Therefore, a main object of the present invention is to provide an audio processing device capable of improving the quality of an audio signal.

An audio processing device according to the present invention (10: reference numeral used in the embodiment; the same applies hereinafter) comprises: classification means (54, S1 to S5, S13 to S15) for classifying each of M (M: an integer of 2 or more) audio signals acquired in parallel into N signal components respectively corresponding to N (N: an integer of 2 or more) frequencies; detection means (S7 to S11) for detecting, with reference to the output of the classification means, the phase difference between the M signal components corresponding to each of the N frequencies; first specifying means (S19) for specifying, from among the N phase differences detected by the detection means, a phase difference below a first threshold (TH1); and first adjustment means (50, S23 to S25, S33 to S37) for adjusting the amplitudes of the M audio signals so that the level difference between the M signal components defining the phase difference specified by the first specifying means is suppressed.

Preferably, the first threshold is a value based on the distance between the M microphones (34L, 34R) that respectively acquire the M audio signals and on the upper limit of the allowable incident angle of the M audio signals.

Preferably, the device further comprises second specifying means (S21, S27) for specifying, from among the N phase differences detected by the detection means, a phase difference equal to or greater than a second threshold (TH2), and second adjustment means (52, S39 to S41) for adjusting the delay amounts of the M audio signals so that the phase difference specified by the second specifying means is suppressed.

More preferably, the second threshold is a value based on the distance between the M microphones (34L, 34R) that respectively acquire the M audio signals.

Preferably, the classification means includes conversion means (54) for applying a Fourier transform to each of the M (M: an integer of 2 or more) audio signals.

An audio processing device (10) according to the present invention comprises: classification means (54, S1 to S5, S13 to S15) for classifying each of M (M: an integer of 2 or more) audio signals acquired in parallel into N signal components respectively corresponding to N (N: an integer of 2 or more) frequencies; detection means (S7 to S11) for detecting, with reference to the output of the classification means, the phase difference between the M signal components corresponding to each of the N frequencies; specifying means (S21, S27) for specifying, from among the N phase differences detected by the detection means, a phase difference equal to or greater than a threshold (TH2); and adjustment means (52, S39 to S41) for adjusting the delay amounts of the M audio signals so that the phase difference specified by the specifying means is suppressed.

An audio processing device (10) according to the present invention comprises: detection means (S1 to S15) for detecting relative phase difference information of a plurality of audio signals acquired in parallel; discrimination means (S17 to S21, S29 to S31) for discriminating, based on the relative phase difference information detected by the detection means, amplitude and phase deviations between the plurality of audio signals caused by component variations; correction means (50, 52) for correcting the amplitude and phase of the plurality of audio signals; and adjustment means (S23 to S27, S33 to S41) for adjusting the correction amount of the correction means based on the discrimination result of the discrimination means.

An audio processing program according to the present invention causes a processor (56) of an audio processing device (10) to execute: a classification step (S1 to S5, S13 to S15) of classifying each of M (M: an integer of 2 or more) audio signals acquired in parallel into N signal components respectively corresponding to N (N: an integer of 2 or more) frequencies; a detection step (S7 to S11) of detecting, with reference to the output of the classification step, the phase difference between the M signal components corresponding to each of the N frequencies; a specifying step (S19) of specifying, from among the N phase differences detected in the detection step, a phase difference below a threshold (TH1); and an adjustment step (50, S23 to S25, S33 to S37) of adjusting the amplitudes of the M audio signals so that the level difference between the M signal components defining the phase difference specified in the specifying step is suppressed.

An audio processing method according to the present invention is executed by a processor (56) of an audio processing device (10) and comprises: a classification step (S1 to S5, S13 to S15) of classifying each of M (M: an integer of 2 or more) audio signals acquired in parallel into N signal components respectively corresponding to N (N: an integer of 2 or more) frequencies; a detection step (S7 to S11) of detecting, with reference to the output of the classification step, the phase difference between the M signal components corresponding to each of the N frequencies; a specifying step (S19) of specifying, from among the N phase differences detected in the detection step, a phase difference below a threshold (TH1); and an adjustment step (50, S23 to S25, S33 to S37) of adjusting the amplitudes of the M audio signals so that the level difference between the M signal components defining the phase difference specified in the specifying step is suppressed.

An audio processing program according to the present invention causes a processor (56) of an audio processing device (10) to execute: a classification step (54, S1 to S5, S13 to S15) of classifying each of M (M: an integer of 2 or more) audio signals acquired in parallel into N signal components respectively corresponding to N (N: an integer of 2 or more) frequencies; a detection step (S7 to S11) of detecting, with reference to the output of the classification step, the phase difference between the M signal components corresponding to each of the N frequencies; a specifying step (S21, S27) of specifying, from among the N phase differences detected in the detection step, a phase difference equal to or greater than a threshold (TH2); and an adjustment step (52, S39 to S41) of adjusting the delay amounts of the M audio signals so that the phase difference specified in the specifying step is suppressed.

An audio processing method according to the present invention is executed by a processor (56) of an audio processing device (10) and comprises: a classification step (54, S1 to S5, S13 to S15) of classifying each of M (M: an integer of 2 or more) audio signals acquired in parallel into N signal components respectively corresponding to N (N: an integer of 2 or more) frequencies; a detection step (S7 to S11) of detecting, with reference to the output of the classification step, the phase difference between the M signal components corresponding to each of the N frequencies; a specifying step (S21, S27) of specifying, from among the N phase differences detected in the detection step, a phase difference equal to or greater than a threshold (TH2); and an adjustment step (52, S39 to S41) of adjusting the delay amounts of the M audio signals so that the phase difference specified in the specifying step is suppressed.

The amplitudes of the M audio signals are adjusted so that the level difference between the M signal components defining a phase difference below the first threshold is suppressed. In other words, the level difference between audio components incident at an angle smaller than the angle corresponding to the first threshold is suppressed. This improves the quality of the audio signal.

The phases of the M audio signals are adjusted so that any phase difference equal to or greater than the threshold is suppressed. Specifically, by setting the threshold to the theoretical maximum phase difference determined by the microphone spacing, phase differences that exceed this maximum because of quality variation are suppressed. By repeating this suppression processing, the phase difference comes to fall within the maximum threshold for sound arriving from any direction. As a result, the delay caused by quality variation is corrected, and the quality of the audio signal is improved.

The above object, other objects, features, and advantages of the present invention will become more apparent from the following detailed description of embodiments given with reference to the drawings.

(A) is a block diagram showing the basic configuration of one embodiment of the present invention, and (B) is a block diagram showing the basic configuration of another embodiment of the present invention.

A block diagram showing the configuration of one embodiment of the present invention.

A block diagram showing an example of the configuration of the audio processing circuit applied to the embodiment of FIG. 2.

A flowchart showing part of the operation of the control circuit provided in the audio processing circuit shown in FIG. 3.

A flowchart showing another part of the operation of the control circuit provided in the audio processing circuit shown in FIG. 3.

A flowchart showing yet another part of the operation of the control circuit provided in the audio processing circuit shown in FIG. 3.

An illustrative view showing an example of an audio signal incident on the microphones.

(A) is an illustrative view showing an example of the waveform of an L-channel frequency component, and (B) is an illustrative view showing an example of the waveform of an R-channel frequency component.

An illustrative view showing another example of an audio signal incident on the microphones.

(A) is an illustrative view showing another example of the waveform of an L-channel frequency component, and (B) is an illustrative view showing another example of the waveform of an R-channel frequency component.

An illustrative view showing yet another example of an audio signal incident on the microphones.

(A) is an illustrative view showing yet another example of the waveform of an L-channel frequency component, and (B) is an illustrative view showing yet another example of the waveform of an R-channel frequency component.

A block diagram showing another example of the configuration of the audio processing circuit applied to the embodiment of FIG. 2.

A block diagram showing yet another example of the configuration of the audio processing circuit applied to the embodiment of FIG. 2.

Embodiments of the present invention will be described below with reference to the drawings.

[Basic configuration 1]

Referring to FIG. 1(A), the audio processing device of this embodiment is basically configured as follows. The classification means 1a classifies each of M (M: an integer of 2 or more) audio signals acquired in parallel into N signal components respectively corresponding to N (N: an integer of 2 or more) frequencies. The detection means 2a refers to the output of the classification means 1a and detects the phase difference between the M signal components corresponding to each of the N frequencies. The first specifying means 3a specifies, from among the N phase differences detected by the detection means 2a, a phase difference below the first threshold. The first adjustment means 4a adjusts the amplitudes of the M audio signals so that the level difference between the M signal components defining the phase difference specified by the first specifying means 3a is suppressed.

The amplitudes of the M audio signals are adjusted so that the level difference between the M signal components defining a phase difference below the first threshold is suppressed. In other words, the amplitude of the M audio signals is adjusted over their entire band so that the level difference between audio components incident at an angle smaller than the angle corresponding to the first threshold is suppressed. As a result, variation in microphone sensitivity is corrected, and the quality of the audio signal is improved.

[Basic configuration 2]

Referring to FIG. 1(B), the audio processing device of another embodiment is basically configured as follows. The classification means 1b classifies each of M (M: an integer of 2 or more) audio signals acquired in parallel into N signal components respectively corresponding to N (N: an integer of 2 or more) frequencies. The detection means 2b refers to the output of the classification means 1b and detects the phase difference between the M signal components corresponding to each of the N frequencies. The specifying means 3b specifies, from among the N phase differences detected by the detection means 2b, a phase difference equal to or greater than a threshold. The adjustment means 4b adjusts the delay amounts of the M audio signals so that the phase difference specified by the specifying means 3b is suppressed.

The phases of the M audio signals are adjusted so that any phase difference equal to or greater than the threshold is suppressed. Specifically, by setting the threshold to the theoretical maximum phase difference determined by the microphone spacing, phase differences that exceed this maximum because of quality variation are suppressed. By repeating this suppression processing, the phase difference comes to fall within the maximum threshold for sound arriving from any direction. As a result, the delay caused by quality variation is corrected, and the quality of the audio signal is improved.

[Example]

Referring to FIG. 2, the digital camera 10 of this embodiment includes a focus lens 12 and an aperture unit 14 driven by drivers 18a and 18b, respectively. The optical image that has passed through these members strikes the imaging surface of the imager 16 and undergoes photoelectric conversion.

When the power is turned on, the CPU 30 instructs the driver 18c to repeat the exposure operation and the charge readout operation in order to execute moving image capture processing. In response to a periodically generated vertical synchronization signal Vsync, the driver 18c exposes the imaging surface of the imager 16 and reads out the charges generated on the imaging surface in raster scan order. The imager 16 periodically outputs raw image data based on the read charges.

The camera processing circuit 20 applies processing such as white balance adjustment, color separation, and YUV conversion to the raw image data output from the imager 16. The resulting YUV-format image data is written into the YUV image area 24a of the SDRAM 24 through the memory control circuit 22. The LCD driver 26 repeatedly reads out the image data stored in the YUV image area 24a through the memory control circuit 22 and drives the LCD monitor 28 based on the read image data. As a result, a real-time moving image (through image) representing the scene captured on the imaging surface is displayed on the monitor screen.

The camera processing circuit 20 also supplies the Y data generated by YUV conversion to the CPU 30. The CPU 30 performs AE processing on the supplied Y data to calculate an appropriate EV value, and sets the aperture amount and exposure time that define the calculated EV value in the drivers 18b and 18c, respectively. This secures the brightness of the through image. The CPU 30 also continuously executes AF processing with reference to the high-frequency component of the Y data supplied from the camera processing circuit 20. The focus lens 12 is thereby kept near the in-focus point, and the sharpness of the through image is secured.

When the movie button 32mv provided on the key input device 32 is operated, the CPU 30 activates the audio processing circuit 36 and the memory I/F 38. The audio processing circuit 36 applies the audio processing described later to the L-channel audio data and the R-channel audio data output from the microphones 34L and 34R, respectively. The processed L-channel and R-channel audio data are written into the audio area 24b of the SDRAM 24 via the memory control circuit 22.

The memory I/F 38 creates a new image file on the removable recording medium 40 (the created image file is opened), repeatedly reads out the image data stored in the YUV image area 24a and the two-channel audio data stored in the audio area 24b through the memory control circuit 22, and stores the read image data and audio data in the open image file.

When the movie button 32mv is operated again, the CPU 30 stops the audio processing circuit 36 and the memory I/F 38. The memory I/F 38 finishes reading data from the YUV image area 24a and the audio area 24b, and closes the open image file. As a result, a moving image that continuously represents the imaged scene and the sound around the imaged scene are recorded on the recording medium 40 in file form.

The audio processing circuit 36 is configured as shown in FIG. 3. The L-channel audio data and the R-channel audio data are input to the amplitude correction circuits 50L and 50R, respectively, which form the amplitude correction system 50. Each of the amplitude correction circuits 50L and 50R corrects the amplitude of the input audio data according to the setting made by the control circuit 56 and supplies the corrected audio data to the delay correction system 52. The L-channel audio data is input to the delay correction circuit 52L, and the R-channel audio data is input to the delay correction circuit 52R. Each of the delay correction circuits 52L and 52R delays the input audio data according to the setting made by the control circuit 56 and outputs the delayed audio data to the memory control circuit 22.
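The per-channel signal path described above (amplitude correction followed by delay correction) could be sketched as follows. This is only an illustration under stated assumptions: the function name `correct_channel`, the use of a plain scalar gain, and the whole-sample delay are all hypothetical, since the text does not specify how circuits 50L/50R and 52L/52R are realized.

```python
def correct_channel(samples, gain, delay_samples):
    """Apply a gain, then delay by a whole number of samples (zero-padded).

    Hypothetical stand-in for one amplitude correction circuit (50L or 50R)
    followed by one delay correction circuit (52L or 52R).
    """
    scaled = [gain * s for s in samples]
    # Clamp so the output always has the same length as the input.
    delay = max(0, min(delay_samples, len(scaled)))
    return [0.0] * delay + scaled[:len(scaled) - delay]


left_out = correct_channel([1.0, 2.0, 3.0, 4.0], gain=0.5, delay_samples=1)
```

A real implementation would likely need fractional-sample delays, because the delays caused by component variation may be much smaller than one sample period.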

The delay-corrected L-channel and R-channel audio data are also input to the FFT analysis circuits 54L and 54R, respectively, which form the FFT (Fast Fourier Transform) analysis system 54. Each of the FFT analysis circuits 54L and 54R applies a Fourier transform to the input audio data and supplies the analysis result, namely Nmax (Nmax: an integer of 2 or more) frequency components, to the control circuit 56.

The phase difference between channels cannot be determined accurately at frequencies where the phase difference between the L-channel frequency component and the R-channel frequency component reaches half a period (= π) or more. For this reason, the frequency of each of the Nmax frequency components must satisfy Equation 1.
[Equation 1]
$$\frac{D}{V} \cdot 2\pi f < \pi$$
D: distance between microphones 34L and 34R
V: speed of sound
f: frequency

If the distance D is 20 mm and the speed of sound is 340 m/s, all of the Nmax frequency components correspond to data components at frequencies below 8.5 kHz.
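The 8.5 kHz figure follows from rearranging Equation 1 to f < V / (2D). A small sketch of that arithmetic is given below; the function name is hypothetical, and the values are the ones quoted in the text.

```python
def max_unambiguous_frequency(mic_distance_m, sound_speed_mps):
    """Upper bound on f implied by Equation 1: (D / V) * 2*pi*f < pi."""
    return sound_speed_mps / (2.0 * mic_distance_m)


# D = 20 mm and V = 340 m/s, as stated in the description.
f_limit_hz = max_unambiguous_frequency(0.020, 340.0)
print(f"frequency components must lie below {f_limit_hz:.0f} Hz")
```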

The control circuit 56 controls the settings of the amplitude correction system 50 and the delay correction system 52 based on the frequency components thus supplied. Specifically, the control circuit 56 is a DSP (Digital Signal Processor) and executes the processing shown in the flowcharts of FIGS. 4 to 6 every 1024 samples. The settings of the amplitude correction system 50 and the delay correction system 52 are initialized when the power is turned on. Both the L-channel audio data and the R-channel audio data are sampled at a clock frequency of 48 kHz.

Referring to FIG. 4, in step S1 the FFT analysis result of the L-channel audio data is acquired from the FFT analysis circuit 54L, and in step S3 the FFT analysis result of the R-channel audio data is acquired from the FFT analysis circuit 54R. When acquisition is complete, the variable N is set to "1" in step S5.

In step S7, the phase of the Nth frequency component belonging to the L channel is calculated as "Ph_L(N)", and in step S9, the phase of the Nth frequency component belonging to the R channel is calculated as "Ph_R(N)". The phase Ph_L(N) is calculated according to Equation 2, and the phase Ph_R(N) according to Equation 3.
[Equation 2]
$$\mathrm{Ph\_L}(N) = \operatorname{atan}\left(\frac{\operatorname{real}(f\_N\_L)}{\operatorname{imag}(f\_N\_L)}\right)$$
atan: arctangent
real(f_N_L): real part of the Nth frequency component belonging to the L channel
imag(f_N_L): imaginary part of the Nth frequency component belonging to the L channel
[Equation 3]
$$\mathrm{Ph\_R}(N) = \operatorname{atan}\left(\frac{\operatorname{real}(f\_N\_R)}{\operatorname{imag}(f\_N\_R)}\right)$$
real(f_N_R): real part of the Nth frequency component belonging to the R channel
imag(f_N_R): imaginary part of the Nth frequency component belonging to the R channel

In step S11, the absolute difference between the phases Ph_L(N) and Ph_R(N) thus calculated is calculated as "ΔPh(N)". In step S13, it is determined whether the variable N has reached the maximum value Nmax. If the result is NO, the variable N is incremented in step S15 and the process returns to step S7; if the result is YES, the process proceeds to step S17.
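The loop over steps S7 to S15 can be sketched as below. Note two assumptions that depart from the text: the phase is taken with the conventional four-quadrant angle from Python's `cmath.phase` rather than the atan(real/imag) form of Equations 2 and 3, and the difference is wrapped into (−π, π] before taking the absolute value so that phases on either side of ±π do not produce a spurious large difference. The function name and sample values are hypothetical.

```python
import cmath


def phase_differences(bins_l, bins_r):
    """Return the absolute inter-channel phase difference for each bin pair."""
    deltas = []
    for f_l, f_r in zip(bins_l, bins_r):
        # The angle of f_l * conj(f_r) equals Ph_L - Ph_R wrapped into (-pi, pi].
        deltas.append(abs(cmath.phase(f_l * f_r.conjugate())))
    return deltas


# Two illustrative bins per channel, built from (magnitude, phase) pairs.
bins_l = [cmath.rect(1.0, 0.5), cmath.rect(1.0, 3.0)]
bins_r = [cmath.rect(2.0, 0.2), cmath.rect(1.0, -3.0)]
delta_ph = phase_differences(bins_l, bins_r)
```

The magnitudes do not affect the result, which is why the first pair yields a difference of about 0.3 rad even though the R-channel bin is twice as large.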

In step S17, the variable N is set to “1” again. In step S19, it is determined whether or not the absolute difference ΔPh(N) is below the threshold TH1, and in step S21, it is determined whether or not the absolute difference ΔPh(N) is equal to or greater than the threshold TH2. Here, the threshold TH1 is calculated according to Equation 4, and the threshold TH2 is calculated according to Equation 5. Note that “85°” in Equation 4 corresponds to the limit of the angle at which an audio signal can be regarded as arriving from the front direction, where it can be detected with the same amplitude. Equation 5 represents the phase difference when the sound arrives from a direction on the extension of the straight line connecting the microphones, and indicates the theoretical maximum phase difference.
[Equation 4]
TH1 = (D * cos 85° / V) * 2πf
[Equation 5]
TH2 = (D * cos 0° / V) * 2πf
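As a worked illustration of Equations 4 and 5, the sketch below evaluates both thresholds for one frequency. It assumes that D is the microphone spacing in meters, V is the speed of sound in meters per second, and f is the frequency of the Nth component in hertz; these readings, the function name `phase_threshold`, and the numeric values (20 mm spacing, 340 m/s, 1 kHz) are assumptions chosen for illustration.

```python
import math

def phase_threshold(d_m, angle_deg, f_hz, v_mps=340.0):
    """Phase difference in radians: (D * cos(angle) / V) * 2*pi*f."""
    return d_m * math.cos(math.radians(angle_deg)) / v_mps * 2.0 * math.pi * f_hz

# Illustrative values only (assumed, not taken from the description).
D = 0.02      # microphone spacing in meters
F = 1000.0    # frequency of the component in hertz

th1 = phase_threshold(D, 85.0, F)  # Equation 4: limit of the "front" direction
th2 = phase_threshold(D, 0.0, F)   # Equation 5: theoretical maximum phase difference
print(f"TH1 = {th1:.4f} rad, TH2 = {th2:.4f} rad")
```

Because both thresholds scale with f, a separate pair of thresholds would be evaluated for each frequency component under these assumptions.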

If the determination result in step S19 is YES, the level of the Nth frequency component belonging to the L channel is saved in step S23, and the level of the Nth frequency component belonging to the R channel is saved in step S25. If the determination result in step S21 is YES, the absolute difference ΔPh(N) is saved in step S27.

When the processing of step S25 or S27 is completed, or when the determination results of steps S19 and S21 are both NO, it is determined in step S29 whether or not the variable N has reached the maximum value Nmax. If the determination result is NO, the variable N is incremented in step S31 and the process returns to step S19; if the determination result is YES, the process proceeds to step S33.
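The sorting loop of steps S17 to S31 can be sketched as follows. This is a hedged illustration: the function name `classify_components` and the list-based inputs are hypothetical, and the thresholds are passed as one value per frequency component on the assumption that TH1 and TH2 depend on the frequency.

```python
def classify_components(delta_ph, level_l, level_r, th1, th2):
    """Sort frequency components by phase difference (steps S17 to S31).

    delta_ph: absolute phase difference per component
    level_l, level_r: level of each component in the L and R channel
    th1, th2: per-component thresholds (assumed frequency dependent)
    """
    saved_l, saved_r, saved_dph = [], [], []
    for n in range(len(delta_ph)):
        if delta_ph[n] < th1[n]:            # step S19: regarded as frontal sound
            saved_l.append(level_l[n])      # step S23
            saved_r.append(level_r[n])      # step S25
        elif delta_ph[n] >= th2[n]:         # step S21: at or above the maximum
            saved_dph.append(delta_ph[n])   # step S27
        # otherwise both determinations are NO and nothing is saved
    return saved_l, saved_r, saved_dph
```

Components whose phase difference lies between the two thresholds contribute to neither the amplitude nor the delay adjustment.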

In step S33, the average of the levels saved in step S23 is calculated as “LVav_L”. In step S35, the average of the levels saved in step S25 is calculated as “LVav_R”. In step S37, the settings of the amplitude correction circuits 50L and 50R are adjusted so that the absolute difference between the calculated averages LVav_L and LVav_R is suppressed.

In step S39, the average of the absolute differences saved in step S27 is calculated as “ΔPhav”. In step S41, the settings of the delay correction circuits 52L and 52R are adjusted so that the calculated average ΔPhav is suppressed. When the adjustment is completed, the processing for the 1024 samples of interest ends.
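The averaging in steps S33, S35, and S39 can be sketched as below. The function names `average` and `frame_averages` are hypothetical, and returning `None` when no component was saved for a frame is an assumption, since the description does not state what happens in that case. How the correction circuits use these values is not specified here, so the sketch only computes the quantities that steps S37 and S41 act on.

```python
def average(values):
    """Arithmetic mean, or None when nothing was saved for this frame (assumption)."""
    return sum(values) / len(values) if values else None

def frame_averages(saved_l, saved_r, saved_dph):
    """Compute LVav_L, LVav_R, their absolute difference, and dPhav for one frame."""
    lvav_l = average(saved_l)      # step S33
    lvav_r = average(saved_r)      # step S35
    dph_av = average(saved_dph)    # step S39
    if lvav_l is None or lvav_r is None:
        level_gap = None           # no frontal component: nothing for step S37 to use
    else:
        level_gap = abs(lvav_l - lvav_r)  # quantity to be suppressed in step S37
    return lvav_l, lvav_r, level_gap, dph_av
```

The value `dph_av` is the quantity that step S41 seeks to suppress through the delay correction circuits.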

When an audio signal is incident from the front as shown in FIG. 7, the L-channel data component and the R-channel data component belonging to a certain frequency trace the waveforms shown in FIG. 8A and FIG. 8B, respectively. When the audio signal is incident obliquely from the front right as shown in FIG. 9, the L-channel data component and the R-channel data component belonging to a certain frequency trace the waveforms shown in FIG. 10A and FIG. 10B, respectively. Further, when the audio signal is incident from the right side as shown in FIG. 11, the L-channel data component and the R-channel data component belonging to a certain frequency trace the waveforms shown in FIG. 12A and FIG. 12B, respectively.

Here, the waveform shown by the solid line in FIG. 8B, FIG. 10B, or FIG. 12B represents the change in the R-channel data component when the characteristic of the amplitude correction circuit 50R matches that of the amplitude correction circuit 50L and the characteristic of the delay correction circuit 52R matches that of the delay correction circuit 52L.

The waveform shown by the dash-dot line in FIG. 8B, FIG. 10B, or FIG. 12B represents the change in the R-channel data component when the characteristic of the amplitude correction circuit 50R differs from that of the amplitude correction circuit 50L and the characteristic of the delay correction circuit 52R matches that of the delay correction circuit 52L.

Further, the waveform shown by the broken line in FIG. 8B, FIG. 10B, or FIG. 12B represents the change in the R-channel data component when the characteristic of the amplitude correction circuit 50R matches that of the amplitude correction circuit 50L and the characteristic of the delay correction circuit 52R differs from that of the delay correction circuit 52L.

The difference in characteristics between the amplitude correction circuit 50L and the amplitude correction circuit 50R arises from variations in component performance. The difference in characteristics between the delay correction circuit 52L and the delay correction circuit 52R likewise arises from variations in component performance.

Further, since the incident angle of the audio signal differs among FIGS. 7, 9, and 11, the phase of the waveform shown in FIG. 10B leads that of the waveform shown in FIG. 8B, and the phase of the waveform shown in FIG. 12B leads that of the waveform shown in FIG. 10B.

Accordingly, the determination result of step S19 shown in FIG. 5 is YES for an audio signal incident in the manner shown in FIG. 7 or FIG. 9, and NO for an audio signal incident in the manner shown in FIG. 11. In contrast, the determination result of step S21 shown in FIG. 5 is NO for an audio signal incident in the manner shown in FIG. 7 or FIG. 9, and YES for an audio signal incident in the manner shown in FIG. 11.

Therefore, the setting of the amplitude correction system 50 is adjusted so that the difference between the level of the waveform shown in FIG. 8A and the level of the waveform shown in FIG. 8B is suppressed, or so that the difference between the level of the waveform shown in FIG. 10A and the level of the waveform shown in FIG. 10B is suppressed. In contrast, the setting of the delay correction system 52 is adjusted so that the difference between the phase of the waveform shown in FIG. 12A and the phase of the waveform shown in FIG. 12B is suppressed.

As can be seen from the above description, the control circuit 56 classifies each of the two channels of audio data acquired in parallel into Nmax frequency components respectively corresponding to Nmax frequencies (Nmax: an integer of 2 or more) (S1 to S5, S13 to S15), and detects the phase difference between the two frequency components corresponding to each of the Nmax frequencies as the absolute differences ΔPh(1) to ΔPh(Nmax) (S7 to S11). The control circuit 56 also identifies, from among the detected absolute differences ΔPh(1) to ΔPh(Nmax), those below the threshold TH1 (S19), and adjusts the setting of the amplitude correction system 50 so that the level difference between the two frequency components defining each identified absolute difference is suppressed (S23 to S25, S33 to S37). Here, the threshold TH1 is a value based on the distance between the microphones 34L and 34R and the upper limit of the allowable incident angle of sound.

The control circuit 56 also identifies, from among the Nmax absolute differences ΔPh(1) to ΔPh(Nmax), those equal to or greater than the threshold TH2 (S21, S27), and adjusts the setting of the delay correction system 52 so that the phase difference corresponding to the identified absolute differences is suppressed (S39 to S41). Here, the threshold TH2 is also a value based on the distance between the microphones 34L and 34R.

In this way, the amplitude of the audio data is adjusted so that the level difference between the two frequency components defining an absolute difference below the threshold TH1 is suppressed. In other words, the amplitudes over the entire band of the M audio signals are adjusted so that the level difference between audio components incident at an angle smaller than the angle corresponding to the threshold TH1 is suppressed. The delay amount of the audio data is adjusted so that the phase difference corresponding to an absolute difference equal to or greater than the threshold TH2 is suppressed. In other words, by setting the threshold TH2 to the theoretical maximum determined by the microphone spacing, phase differences that exceed this maximum because of quality variation are suppressed. By repeating this suppression processing, the phase difference comes to fall within the maximum threshold for sound arriving from any direction. As a result, the delay caused by quality variation is corrected, and the quality of the audio signal improves.

Although the audio processing circuit 36 of this embodiment is configured as shown in FIG. 3, the audio processing circuit 36 may instead be configured as shown in FIG. 13 or FIG. 14.

According to FIG. 13, the FFT analysis system 54 is provided upstream of the amplitude correction system 50, and the inverse FFT system 58 is provided downstream of the delay correction system 52. The L-channel audio data is supplied to the amplitude correction circuit 50L via the FFT analysis circuit 54L, and the R-channel audio data is supplied to the amplitude correction circuit 50R via the FFT analysis circuit 54R. The control circuit 56 executes the processing shown in FIGS. 4 to 6 based on the output of the delay correction system 52. Further, the output of the delay correction circuit 52L is converted back to audio data by the inverse FFT circuit 58L and then output to the memory control circuit 22, and the output of the delay correction circuit 52R is converted back to audio data by the inverse FFT circuit 58R and then output to the memory control circuit 22.

According to FIG. 14, a phase/amplitude correction filter 60L is provided in place of the amplitude correction circuit 50L and the delay correction circuit 52L, and a phase/amplitude correction filter 60R is provided in place of the amplitude correction circuit 50R and the delay correction circuit 52R. Each of the phase/amplitude correction filters 60L and 60R corresponds to a weighting filter for controlling directivity or enhancing the stereo effect. In this case, the settings of the weighting filters 60L and 60R are adjusted in steps S37 and S41 shown in FIG. 6.

In this embodiment, a DSP is used as the control circuit 56 shown in FIG. 3, but a CPU may be used in place of the DSP. In that case, a control program corresponding to the processing shown in FIGS. 4 to 6 is stored in a flash memory (not shown).

10 ... Digital camera
16 ... Imager
24 ... SDRAM
30 ... CPU
36 ... Audio processing circuit
50 ... Amplitude correction system
52 ... Delay correction system
54 ... FFT analysis system
56 ... Control circuit

Claims

Classifying means for classifying each of M (M: integer greater than or equal to 2) audio signals acquired in parallel into N signal components respectively corresponding to N (N: integer greater than or equal to 2) frequencies;
Detecting means for detecting a phase difference between M signal components corresponding to each of the N frequencies with reference to an output of the classifying means;
A specifying means for specifying a phase difference indicating a value equal to or greater than a threshold value from among the N phase differences detected by the detecting means; and
Adjusting means for adjusting a delay amount of the M audio signals so that the phase difference specified by the specifying means is suppressed;
The threshold value indicates a value based on a distance between M microphones that respectively acquire the M audio signals.
Audio processing device.

The specifying means specifies L (L: integer) phase differences indicating a value equal to or greater than the threshold value from among the N phase differences detected by the detecting means,
Further comprising calculating means for calculating an average value of the L phase differences specified by the specifying means;
The adjusting means adjusts the delay amount of the M audio signals so that the average value of the phase differences calculated by the calculating means is suppressed.
The audio processing device according to claim 1.

The threshold value represents a phase difference in a case where the M audio signals arrive from a direction on an extension line of a straight line connecting the M microphones.
The audio processing device according to claim 1.