JP5063489B2

JP5063489B2 - Judgment device, electronic apparatus including the same, and judgment method

Info

Publication number: JP5063489B2
Application number: JP2008146840A
Authority: JP
Inventors: 昌弘吉田; 誠山中; 智岐奥; 一眞原
Original assignee: Sanyo Electric Co Ltd
Current assignee: Sanyo Electric Co Ltd
Priority date: 2008-06-04
Filing date: 2008-06-04
Publication date: 2012-10-31
Anticipated expiration: 2028-06-04
Also published as: JP2009296219A

Abstract

PROBLEM TO BE SOLVED: To provide a determination device capable of accurately determining the direction of a broadband sound signal even if the intervals between a plurality of microphones are small.

SOLUTION: This determination device includes: an FFT part 23 that subjects an output signal of a first microphone to time-frequency conversion; an FFT part 24 that subjects an output signal of a second microphone, whose directional characteristic differs from that of the first microphone, to time-frequency conversion; and a power comparison spectrum determination part 25. The power comparison spectrum determination part 25 has a power comparison part that compares the power of a frequency-domain signal S1[F] output from the FFT part 23 with the power of a frequency-domain signal S2[F] output from the FFT part 24, frequency by frequency in a predetermined frequency band, and a determination part that determines a sound from a specific direction or a sound source direction by using the comparison result of the power comparison part.

COPYRIGHT: (C)2010,JPO&INPIT

Description

The present invention relates to a determination device that determines a sound from a specific direction or a sound source direction, to an electronic apparatus including the determination device, and to a determination method that determines a sound from a specific direction or a sound source direction.

One conventional technique for enhancing a specific sound source is disclosed in Patent Document 1. The audio processing unit that implements this technique is configured as shown in FIG. 14. In the audio processing unit of FIG. 14, an FFT (Fast Fourier Transform) unit 101 converts the output signal of a first microphone into a digital signal and then into a frequency-domain signal, and an FFT unit 102 does the same for the output signal of a second microphone. A spectrum determination unit 103 determines the necessary spectrum from a relative parameter calculated from the frequency-domain signals output from the FFT units 101 and 102, and controls unnecessary spectrum units 104 and 105 based on the determination result. The unnecessary spectrum unit 104 attenuates the unnecessary spectrum of the frequency-domain signal output from the FFT unit 101, and the unnecessary spectrum unit 105 attenuates the unnecessary spectrum of the frequency-domain signal output from the FFT unit 102. The frequency-domain signal output from the unnecessary spectrum unit 104 is converted into time-series data by an IFFT (Inverse Fast Fourier Transform) unit 106, and the frequency-domain signal output from the unnecessary spectrum unit 105 is converted into time-series data by an IFFT unit 107.

Patent Document 1: Japanese Patent No. 3435357

When phase information is used as the relative parameter, the controllable upper limit frequency is determined by the interval between the first microphone and the second microphone. The closer the two microphones are placed, the higher the controllable upper limit frequency becomes. Considering the size of typical microphones, however, a center-to-center distance of about 2 cm is the practical limit, which puts the upper limit frequency at around 8 kHz.
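The figure of roughly 8 kHz for a 2 cm spacing is consistent with the usual half-wavelength condition for unambiguous phase comparison. The following is a minimal sketch under that assumption; the formula f = c / (2d), the speed of sound of 340 m/s, and the function name `phase_upper_limit_hz` are illustrative choices and are not taken from the patent text.

```python
SPEED_OF_SOUND_M_S = 340.0  # assumed speed of sound in air


def phase_upper_limit_hz(spacing_m: float, c: float = SPEED_OF_SOUND_M_S) -> float:
    """Frequency at which half a wavelength equals the microphone spacing.

    Above this frequency the inter-microphone phase difference can wrap,
    so a phase-based direction cue becomes ambiguous (assumed model).
    """
    if spacing_m <= 0:
        raise ValueError("spacing must be positive")
    return c / (2.0 * spacing_m)


# A 2 cm center-to-center spacing gives a limit of roughly 8.5 kHz.
print(phase_upper_limit_hz(0.02))
```

Halving the spacing would double the limit, which matches the statement that closer microphones raise the controllable upper limit frequency.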

On the other hand, when power information is used as the relative parameter, there is no restriction on the lower or upper limit frequency. However, when sound arrives at a first microphone 108 and a second microphone 109 from a specific direction as shown in FIG. 15 and the interval between the two microphones is short, the difference d between the path from the sound source to the first microphone 108 and the path from the sound source to the second microphone 109 becomes minute. The sound attenuation corresponding to this path difference d is then also minute, which makes the relative parameter difficult to discriminate.
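To give a sense of how small this attenuation is, the sketch below estimates the level difference caused by a path difference d. It assumes a point source in a free field with inverse-distance (1/r) amplitude decay; that model, the 1 m source distance, and the function name `path_level_difference_db` are assumptions made for illustration, not values from the patent.

```python
import math


def path_level_difference_db(distance_m: float, path_diff_m: float) -> float:
    """Level difference between two microphones under a 1/r amplitude law.

    distance_m  : distance from the source to the nearer microphone
    path_diff_m : extra path length d to the farther microphone
    """
    if distance_m <= 0 or path_diff_m < 0:
        raise ValueError("distance must be positive and path difference non-negative")
    return 20.0 * math.log10((distance_m + path_diff_m) / distance_m)


# Source 1 m away, path difference equal to a 2 cm microphone spacing:
# the level difference is well under 1 dB.
print(round(path_level_difference_db(1.0, 0.02), 3))
```

Under these assumptions the difference is a small fraction of a decibel, which illustrates why closely spaced microphones with identical directivity give a relative power parameter that is hard to discriminate.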

Therefore, when the interval between the first microphone and the second microphone is short, the conventional specific sound source enhancement technique disclosed in Patent Document 1 cannot accurately determine the direction of a wideband audio signal.

In view of the above, an object of the present invention is to provide a determination device that can accurately determine the direction of a wideband audio signal even when the interval between a plurality of microphones is short, an electronic apparatus including the determination device, and a determination method.

To achieve the above object, a determination device according to the present invention includes: a first time-frequency conversion unit that performs time-frequency conversion on an output signal of a first microphone; a second time-frequency conversion unit that performs time-frequency conversion on an output signal of a second microphone whose directional characteristic differs from that of the first microphone; a power comparison unit that compares, frequency by frequency in a predetermined frequency band, the power of the frequency-domain signal output from the first time-frequency conversion unit with the power of the frequency-domain signal output from the second time-frequency conversion unit; and a determination unit that determines a sound from a specific direction or a sound source direction using the comparison result of the power comparison unit. The power of a frequency-domain signal at a given frequency can be expressed, for example, as the square root of the square of the amplitude of the frequency-domain signal at that frequency.

With this configuration, direction determination uses a relative power parameter, which is the result of comparing the power of the frequency-domain signal output from the first time-frequency conversion unit with the power of the frequency-domain signal output from the second time-frequency conversion unit. There is therefore no restriction on the lower or upper limit frequency, and the direction of a wideband audio signal can be determined accurately. In addition, because the directional characteristics of the first microphone and the second microphone differ from each other, the change in the relative power parameter caused by a difference in sound source direction can be made large. Consequently, the direction of an audio signal can be determined accurately even when the interval between the first microphone and the second microphone is short.

The determination device may further include a storage unit that stores a determination condition based on the amount of difference in directional characteristics between the first microphone and the second microphone, and the determination unit may determine a sound from a specific direction from the comparison result of the power comparison unit and the determination condition stored in the storage unit.

Alternatively, the predetermined frequency band may be a first frequency band, and the determination device may include: a third time-frequency conversion unit that performs time-frequency conversion on an output signal of a third microphone whose directional characteristic is the same as that of the second microphone; a phase comparison unit that compares, frequency by frequency in a second frequency band lower than the first frequency band, the phase of the frequency-domain signal output from the second time-frequency conversion unit with the phase of the frequency-domain signal output from the third time-frequency conversion unit; a first storage unit that stores a first determination condition based on the amount of difference in directional characteristics between the first microphone and the second microphone; and a second storage unit that stores a second determination condition based on the positional relationship between the second microphone and the third microphone. In this case, the determination unit determines a sound from a specific direction in the first frequency band from the comparison result of the power comparison unit and the first determination condition stored in the first storage unit, and determines a sound from a specific direction in the second frequency band from the comparison result of the phase comparison unit and the second determination condition stored in the second storage unit.
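The band split in this variant can be sketched as a per-bin rule that picks the cue by frequency. This is a minimal illustration, not the patented implementation: the crossover frequency, both thresholds, and the function name `judge_bin` are hypothetical, and the concrete conditions (power difference above a threshold, phase difference near zero) are assumptions standing in for the stored first and second determination conditions.

```python
def judge_bin(freq_hz, power_diff_db, phase_diff_rad,
              crossover_hz=8000.0, power_th_db=-1.0, phase_tol_rad=0.2):
    """Return True if this frequency bin is judged to come from the target direction.

    power_diff_db  : power of microphone 1 minus power of microphone 2 (dB)
    phase_diff_rad : phase of microphone 2 minus phase of microphone 3 (rad)
    All default values are hypothetical placeholders.
    """
    if freq_hz >= crossover_hz:
        # First (higher) band: power comparison, first determination condition.
        return power_diff_db > power_th_db
    # Second (lower) band: phase comparison, second determination condition.
    return abs(phase_diff_rad) <= phase_tol_rad


# A high-frequency bin is judged by power only; its phase value is ignored.
print(judge_bin(12000.0, power_diff_db=-0.5, phase_diff_rad=1.0))
# A low-frequency bin is judged by phase only; its power value is ignored.
print(judge_bin(1000.0, power_diff_db=-20.0, phase_diff_rad=0.05))
```

Using phase only in the lower band keeps the phase comparison below the frequency where microphone spacing makes it ambiguous, while the power comparison covers the band above it.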

Alternatively, so that sounds from two directions can be distinguished, the determination device may include: a phase comparison unit that compares, frequency by frequency in the predetermined frequency band, the phase of the frequency-domain signal output from the first time-frequency conversion unit with the phase of the frequency-domain signal output from the second time-frequency conversion unit; a first storage unit that stores a first determination condition based on the amount of difference in directional characteristics between the first microphone and the second microphone; and a second storage unit that stores a second determination condition based on the positional relationship between the first microphone and the second microphone. In this case, the determination unit has a primary determination unit and a secondary determination unit. The primary determination unit determines, from the comparison result of the power comparison unit and the first determination condition stored in the first storage unit, whether the sound is either a sound from a first direction or a sound from a second direction. When the primary determination unit determines that the sound is one of these two, the secondary determination unit determines, from the comparison result of the phase comparison unit and the second determination condition stored in the second storage unit, whether the sound is the sound from the first direction.
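The two-stage determination can be pictured as a power test followed by a phase test. The sketch below is only one possible reading: the threshold values, the use of the sign of the phase difference to separate the two candidate directions, and the function name `judge_two_stage` are all hypothetical and are not specified in this passage.

```python
def judge_two_stage(power_diff_db, phase_diff_rad,
                    power_th_db=-1.0, phase_th_rad=0.0):
    """Classify one frequency bin as 'first', 'second', or 'other'.

    power_diff_db  : power of microphone 1 minus power of microphone 2 (dB)
    phase_diff_rad : phase of microphone 1 minus phase of microphone 2 (rad)
    Threshold defaults are hypothetical placeholders.
    """
    # Primary determination: does the power cue match either candidate direction?
    if power_diff_db <= power_th_db:
        return "other"
    # Secondary determination: the phase cue separates the two candidates.
    return "first" if phase_diff_rad > phase_th_rad else "second"


print(judge_two_stage(0.0, 0.3))    # passes the power test, positive phase
print(judge_two_stage(0.0, -0.3))   # passes the power test, negative phase
print(judge_two_stage(-10.0, 0.3))  # fails the power test
```

The phase comparison runs only for bins that pass the power test, which mirrors the order of the primary and secondary determination units.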

Alternatively, the directivity pattern of the first microphone and the directivity pattern of the second microphone may be left-right symmetric, and the determination unit may determine that a sound comes from the front direction when the power comparison unit obtains a comparison result indicating that the power of the frequency-domain signal output from the first time-frequency conversion unit is equal to the power of the frequency-domain signal output from the second time-frequency conversion unit.
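As a small numerical illustration of the symmetric case, the sketch below models the two microphones as ideal cardioids whose main axes are mirrored about the front direction. The cardioid model, the 45 degree axis angles, the 0.5 dB tolerance used for "equal", and the function names are all assumptions made for this example.

```python
import math


def cardioid_gain(theta_deg, axis_deg):
    """Sensitivity of an ideal cardioid whose main axis points at axis_deg."""
    return 0.5 * (1.0 + math.cos(math.radians(theta_deg - axis_deg)))


def is_front(power1, power2, tol_db=0.5, floor=1e-12):
    """Judge 'front' when the two powers are equal within tol_db (assumed tolerance)."""
    diff_db = 20.0 * math.log10(max(power1, floor) / max(power2, floor))
    return abs(diff_db) <= tol_db


for theta in (0, 30, -30):
    left = cardioid_gain(theta, +45)   # first microphone, axis rotated to one side
    right = cardioid_gain(theta, -45)  # second microphone, mirrored axis
    print(theta, is_front(left, right))
```

With mirrored patterns, only a source on the axis of symmetry produces equal powers in both channels, so the equality test singles out the front direction.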

To achieve the above object, an electronic apparatus according to the present invention includes at least one determination device configured as described above and performs audio processing on a collected audio signal based on the determination result of the determination device.

The electronic apparatus configured as described above may have a function for recording and reproducing the collected audio signal, and the determination device may perform the determination processing either when the collected audio signal is recorded or when the recorded audio signal is reproduced.

One example of an electronic apparatus having any of the above configurations is an imaging apparatus including a camera that captures video.

To achieve the above object, a determination method according to the present invention includes: a first time-frequency conversion step of performing time-frequency conversion on an output signal of a first microphone; a second time-frequency conversion step of performing time-frequency conversion on an output signal of a second microphone whose directional characteristic differs from that of the first microphone; a power comparison step of comparing, frequency by frequency in a predetermined frequency band, the power of the frequency-domain signal obtained by the first time-frequency conversion step with the power of the frequency-domain signal obtained by the second time-frequency conversion step; and a determination step of determining a sound from a specific direction or a sound source direction using the comparison result obtained by the power comparison step.

According to the present invention, direction determination uses a relative power parameter, so there is no restriction on the lower or upper limit frequency, and the direction of a wideband audio signal can be determined accurately. In addition, because the output signals of a plurality of mutually different microphones are used, the change in the relative power parameter caused by a difference in sound source direction can be made large. Consequently, the direction of an audio signal can be determined accurately even when the interval between the plurality of microphones is short.

Embodiments of the present invention will be described below with reference to the drawings.

Because the determination method according to the present invention makes its determination using a collected audio signal, it can be applied not only when the collected audio signal is recorded but also when an audio signal that has already been collected and recorded is reproduced.

The following description takes as an example an imaging apparatus (for example, a video camera or a digital camera) equipped with a determination device in which the determination method according to the present invention is applied when a collected audio signal is recorded.

FIG. 1 is a block diagram showing an example of the internal configuration of an imaging apparatus equipped with a determination device according to the present invention.

The imaging apparatus shown in FIG. 1 includes: a solid-state image sensor (image sensor) 1, such as a CCD (Charge Coupled Device) or CMOS (Complementary Metal Oxide Semiconductor) sensor, that converts incident light into an electric signal; a lens unit 2 having a zoom lens that forms an optical image of a subject on the image sensor 1, a motor that changes the focal length of the zoom lens (that is, the optical zoom magnification), and a motor that focuses the zoom lens on the subject; an AFE (Analog Front End) 3 that converts the analog image signal output from the image sensor 1 into a digital signal; a microphone unit 4 having a plurality of microphones with different directional characteristics; an image processing unit 5 that applies various kinds of image processing, such as gradation correction, to the digital image signal from the AFE 3; an audio processing unit 6 that converts the analog audio signal from the microphone unit 4 into a digital signal and applies audio correction processing to it; a compression processing unit 7 that applies compression encoding, such as MPEG (Moving Picture Experts Group) compression, to the image signal from the image processing unit 5 and the audio signal from the audio processing unit 6; a driver unit 8 that records the compression-encoded signal produced by the compression processing unit 7 in an external memory 22 such as an SD card; a decompression processing unit 9 that decompresses and decodes the compression-encoded signal read from the external memory 22 by the driver unit 8; a video output circuit unit 10 that converts the image signal decoded by the decompression processing unit 9 into an analog signal; a video output terminal 11 that outputs the signal converted by the video output circuit unit 10; a display unit 12 having an LCD or the like that displays an image based on the signal from the video output circuit unit 10; an audio output circuit unit 13 that converts the audio signal from the decompression processing unit 9 into an analog signal; an audio output terminal 14 that outputs the signal converted by the audio output circuit unit 13; a speaker unit 15 that reproduces and outputs sound based on the audio signal from the audio output circuit unit 13; a timing generator (TG) 16 that outputs a timing control signal for synchronizing the operation timing of the blocks; a CPU (Central Processing Unit) 17 that controls the overall driving operation of the imaging apparatus; a memory 18 that stores the programs for each operation and temporarily holds data during program execution; an operation unit 19 through which instructions from the user are input; a bus line 20 for exchanging data between the CPU 17 and each block; and a bus line 21 for exchanging data between the memory 18 and each block. The CPU 17 drives the motors in the lens unit 2 in accordance with the image signal detected by the image processing unit 5 to control the focus and the aperture.

The audio processing unit 6 includes the determination device according to the present invention and performs audio processing in accordance with the determination result of that device. From the viewpoint of downsizing and cost reduction, it is desirable that the audio processing unit 6 alone, or the audio processing unit 6 together with the audio compression encoder in the compression processing unit 7, be implemented as a single LSI package.

Next, the basic operation of the imaging apparatus shown in FIG. 1 during moving image shooting will be described with reference to the flowchart of FIG. 2. First, when the user operates the operation unit 19 to set the imaging apparatus for moving image shooting and turns on the power (STEP 1), the driving mode of the imaging apparatus, that is, the driving mode of the image sensor 1, is set to the preview mode (STEP 2). The apparatus then waits for a shooting mode to be input. If no shooting mode is input, the normal shooting mode is assumed to have been selected (STEP 3). In the preview mode, the analog image signal obtained by the photoelectric conversion operation of the image sensor 1 is converted into a digital signal by the AFE 3, subjected to image processing by the image processing unit 5, and compressed by the compression processing unit 7, and the resulting image signal for the current image is temporarily recorded in the external memory 22. This compressed signal passes through the driver unit 8 and is decompressed by the decompression processing unit 9, and an image with the angle of view corresponding to the currently set zoom magnification of the lens unit 2 is displayed on the display unit 12.

Next, when the user operates the operation unit 19 to obtain the desired angle of view for the subject to be shot, the optical zoom magnification corresponding to that operation is set (STEP 4). At this time, the CPU 17 controls the lens unit 2 based on the image signal input to the image processing unit 5, and optimal exposure control (Automatic Exposure; AE) and focusing control (Auto Focus; AF) are performed (STEP 5).

After that, when the recording start button of the operation unit 19 (which may also serve as the shutter button for still image shooting) is fully pressed and the start of recording is instructed (Y in STEP 6), the recording operation starts, and the analog image signal obtained by the photoelectric conversion operation of the image sensor 1 is output to the AFE 3. At this time, the image sensor 1 receives the timing control signal from the TG 16, performs horizontal and vertical scanning, and outputs an image signal consisting of data for each pixel. The AFE 3 then converts the analog image signal (raw data) into a digital signal, which is written into the frame memory in the image processing unit 5 (STEP 7).

The image processing unit 5 applies various kinds of image processing, such as signal conversion processing that generates a luminance signal and color difference signals, and the processed image signal is supplied to the compression processing unit 7. Meanwhile, the audio processing unit 6 applies A/D conversion to the analog audio signal obtained from sound input to the microphone unit 4 and applies audio processing in accordance with the determination result of the determination device according to the present invention, and the processed audio signal is supplied to the compression processing unit 7 (STEP 8). This audio processing is described later.

The compression processing unit 7 compression-encodes the digital image signal and audio signal based on the MPEG compression encoding method (STEP 9) and supplies the result to the driver unit 8, which records it in the external memory 22 (STEP 10). At this time, the compressed data recorded in the external memory 22 is also read out by the driver unit 8 and supplied to the decompression processing unit 9, where it is decompressed to obtain an image signal. This image signal is supplied to the display unit 12, which displays the subject image currently being shot through the image sensor 1. After that, when the recording start button of the operation unit 19 is fully pressed again and the end of recording is instructed (Y in STEP 11), the apparatus returns to the preview mode (STEP 2).

During this imaging operation, the TG 16 supplies timing control signals to the AFE 3, the image processing unit 5, the audio processing unit 6, the compression processing unit 7, and the decompression processing unit 9, so that they operate in synchronization with the frame-by-frame imaging operation of the image sensor 1.

When reproduction of the compressed moving image data recorded in the external memory 22 is instructed through the operation unit 19, the compressed moving image data is read out by the driver unit 8 and supplied to the decompression processing unit 9. The decompression processing unit 9 decompresses and decodes the data based on the MPEG compression encoding method to obtain an image signal and an audio signal. The image signal is supplied to the display unit 12 to reproduce the image, and the audio signal is supplied to the speaker unit 15 via the audio output circuit unit 13 to reproduce the sound. In this way, the image based on the compressed moving image data recorded in the external memory 22 is reproduced together with the sound.

Next, four embodiments (first through fourth embodiments) are presented as specific examples of the audio processing unit 6, and the audio processing that the audio processing unit 6 of each embodiment performs in STEP 8 of FIG. 2 is described.

<First Embodiment>
When the audio processing unit 6 of the first embodiment is used, the microphone unit 4 consists of a directional microphone 4A and an omnidirectional microphone 4B arranged close to each other as shown in FIG. 3. For example, the center-to-center distance between the directional microphone 4A and the omnidirectional microphone 4B is 2 cm. The two microphones are arranged so that the directional microphone 4A has the unidirectional pattern P1 shown in FIG. 4 and the omnidirectional microphone 4B has the omnidirectional pattern P2 shown in FIG. 4. The directional characteristics shown in FIG. 4 (the unidirectional pattern P1, the omnidirectional pattern P2, and a new directivity pattern P3 obtained by audio processing) represent microphone sensitivity by sound arrival direction: the farther a point on a pattern is from the center O, the higher the microphone sensitivity to sound arriving from the direction that runs from that point toward the center O.

As shown in FIG. 5, the audio processing unit 6 of the first embodiment includes FFT units 23 and 24, a power comparison spectrum determination unit 25, a memory unit 26, an unnecessary spectrum removal unit 27, and an IFFT unit 28.

The FFT unit 23 samples the output signal of the directional microphone 4A at 48 kHz to convert it into a digital signal, converts the digital signal into a frequency-domain signal S1[F] by FFT processing every 2048 samples, and outputs the frequency-domain signal S1[F] to the power comparison spectrum determination unit 25 and the unnecessary spectrum removal unit 27. Likewise, the FFT unit 24 samples the output signal of the omnidirectional microphone 4B at 48 kHz to convert it into a digital signal, converts the digital signal into a frequency-domain signal S2[F] by FFT processing every 2048 samples, and outputs the frequency-domain signal S2[F] to the power comparison spectrum determination unit 25.
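The sampling rate and frame length stated here fix the frequency grid on which S1[F] and S2[F] are compared. The short sketch below works out that grid; the constant and function names are illustrative, and standard DFT bin spacing (sampling rate divided by frame length) is assumed.

```python
SAMPLE_RATE_HZ = 48_000  # sampling rate stated for FFT units 23 and 24
FRAME_SIZE = 2_048       # samples per FFT frame


def bin_frequency_hz(k, sample_rate_hz=SAMPLE_RATE_HZ, frame_size=FRAME_SIZE):
    """Center frequency of DFT bin k, for bins from 0 up to the Nyquist bin."""
    if not 0 <= k <= frame_size // 2:
        raise ValueError("bin index out of range")
    return k * sample_rate_hz / frame_size


frame_ms = 1000.0 * FRAME_SIZE / SAMPLE_RATE_HZ
print(round(frame_ms, 2))      # duration of one frame in milliseconds
print(bin_frequency_hz(1))     # 23.4375, the spacing between adjacent bins
print(bin_frequency_hz(1024))  # 24000.0, the Nyquist frequency
```

Each frame therefore covers about 43 ms of audio, and the per-frequency comparison runs on bins spaced 23.4375 Hz apart up to 24 kHz.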

The power comparison spectrum determination unit 25 calculates the power of the frequency-domain signal S1[F] and the power of the frequency-domain signal S2[F] for each frequency in a predetermined frequency band (for example, the frequency range covered by the FFT processing in the FFT units 23 and 24), compares the two powers, and obtains for each frequency a relative power parameter (here, the power of S1[F] minus the power of S2[F]). The memory unit 26 stores in advance a threshold TH, a value determined from the unidirectional pattern P1 and the omnidirectional pattern P2 shown in FIG. 4 and corresponding to the vector v1 shown in FIG. 4. The power comparison spectrum determination unit 25 determines for each frequency whether the relative power parameter is greater than the threshold TH, and judges the component to be a sound component from the front direction when it is.
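The per-frequency comparison can be sketched as follows. This is an illustration under assumptions, not the patented implementation: power is expressed in dB (the text uses dB only for the third embodiment's tolerance), the small `floor` guards against log(0), and the threshold value in the example is hypothetical because the text does not give a numeric TH.

```python
import math


def power_db(x: complex, floor: float = 1e-12) -> float:
    """Power of one frequency bin in dB; `floor` is an assumed guard against log(0)."""
    return 10.0 * math.log10(max(abs(x) ** 2, floor))


def front_mask(s1, s2, th_db):
    """Per bin: True if the relative power parameter (S1 power minus S2 power) exceeds TH."""
    return [power_db(a) - power_db(b) > th_db for a, b in zip(s1, s2)]


# Three example bins from the directional (s1) and omnidirectional (s2) microphones.
s1 = [2.0 + 0j, 0.5 + 0j, 1.0 + 1.0j]
s2 = [1.0 + 0j, 1.0 + 0j, 1.0 + 1.0j]
mask = front_mask(s1, s2, th_db=-1.0)  # hypothetical threshold
print(mask)
```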

Based on the determination result of the power comparison spectrum determination unit 25, the unnecessary spectrum removal unit 27 removes, in the frequency domain, the unnecessary components of the frequency-domain signal S1[F] that are not sound components from the front direction, and outputs the resulting frequency-domain signal to the IFFT unit 28. The IFFT unit 28 converts the output signal of the unnecessary spectrum removal unit 27 into a time-domain signal by IFFT processing and outputs it to the compression processing unit 7 (see FIG. 1) as the output signal of the audio processing unit 6 of the first embodiment.
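One simple reading of "removing" a component in the frequency domain is to zero the bins not judged to be frontal, which matches how the third embodiment sets unwanted bins to 0. A minimal sketch under that assumption follows; the function name is illustrative.

```python
def remove_unwanted(s1, is_front):
    """Keep bins flagged as frontal and zero every other bin (assumed form of removal)."""
    return [x if keep else 0j for x, keep in zip(s1, is_front)]


cleaned = remove_unwanted([1 + 1j, 2 + 0j, 3j], [True, False, True])
print(cleaned)
```

When working on a full two-sided spectrum, the same decision would also be applied to the mirrored negative-frequency bin so that the IFFT output stays real.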

The audio processing described above yields the new directional pattern P3. Because the audio processing unit 6 of the first embodiment determines direction using the relative power parameter, there is no constraint on the lower or upper limit frequency, and direction can be determined accurately for a wideband audio signal. In addition, using a microphone unit 4 with directional characteristics such as those shown in FIG. 4 enlarges the change in the relative power parameter caused by a difference in sound source direction, so direction can be determined accurately for an audio signal even when the spacing between the microphones is small.

In an imaging apparatus, the positional relationship between the main sound source and the microphone unit is expected to change with the shooting situation. However the relationship changes, no problem arises, because directional characteristics such as those shown in FIG. 4 enlarge the change in the relative power parameter caused by a difference in sound source direction, as described above. In this embodiment, sound from the front direction is the determination target, but sound from another specific direction can be made the target by changing the value of the threshold TH stored in advance in the memory unit 26. Furthermore, the specific direction can be switched by preparing a plurality of values for the threshold TH and selecting from them the value used for the determination.

<Second Embodiment>
When the audio processing unit 6 of the second embodiment is used, the microphone unit 4 consists of a directional microphone 4A and omnidirectional microphones 4B and 4C arranged close to one another, as shown in FIG. 6. For example, the center-to-center distance between the directional microphone 4A and the omnidirectional microphone 4B and that between the omnidirectional microphones 4B and 4C are each 2 cm. The microphones are arranged so that the directional microphone 4A has the unidirectional pattern P1 shown in FIG. 4 and the omnidirectional microphones 4B and 4C each have the omnidirectional pattern P2 shown in FIG. 4.

As shown in FIG. 7, the audio processing unit 6 of the second embodiment includes FFT units 29 to 31, HPFs (High Pass Filters) 32 and 33, LPFs (Low Pass Filters) 34 and 35, a power comparison spectrum determination unit 36, a phase comparison spectrum determination unit 37, a memory unit 38, an unnecessary spectrum removal unit 39, and an IFFT unit 40. The cutoff frequencies of the HPFs 32 and 33 and of the LPFs 34 and 35 are all 8 kHz.

The FFT unit 29 samples the output signal of the directional microphone 4A at 48 kHz to convert it into a digital signal, converts it by FFT processing every 2048 samples into a frequency-domain signal S1[F], and outputs S1[F] to the unnecessary spectrum removal unit 39 and, via the HPF 32, to the power comparison spectrum determination unit 36. The FFT unit 30 samples the output signal of the omnidirectional microphone 4B at 48 kHz to convert it into a digital signal, converts it by FFT processing every 2048 samples into a frequency-domain signal S2[F], and outputs S2[F] to the power comparison spectrum determination unit 36 via the HPF 33 and to the phase comparison spectrum determination unit 37 via the LPF 34. The FFT unit 31 samples the output signal of the omnidirectional microphone 4C at 48 kHz to convert it into a digital signal, converts it by FFT processing every 2048 samples into a frequency-domain signal S3[F], and outputs S3[F] to the phase comparison spectrum determination unit 37 via the LPF 35.

The power comparison spectrum determination unit 36 calculates, for each frequency, the power of the frequency components at or above 8 kHz in the frequency-domain signal S1[F] and in the frequency-domain signal S2[F], compares the two powers, and obtains for each frequency a relative power parameter (here, the power of the components at or above 8 kHz in S1[F] minus the power of the components at or above 8 kHz in S2[F]).

The phase comparison spectrum determination unit 37 calculates, for each frequency, the phase of the frequency components at or below 8 kHz in the frequency-domain signal S2[F] and in the frequency-domain signal S3[F], compares the two phases, and obtains for each frequency a relative phase parameter (here, the phase of the components at or below 8 kHz in S2[F] minus the phase of the components at or below 8 kHz in S3[F]).

The memory unit 38 stores in advance a threshold TH, a value determined from the unidirectional pattern P1 and the omnidirectional pattern P2 shown in FIG. 4 and corresponding to the vector v1 shown in FIG. 4, and a threshold θ, a value corresponding to the angle α° shown in FIG. 4. The threshold θ is given by the following expression, where F is the frequency and V is the speed of sound; the factor 0.02 corresponds to the 2 cm microphone spacing expressed in meters.
θ = 2π × F × 0.02 × cos(90° - α) / V
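The threshold expression can be evaluated directly. The sketch below assumes a speed of sound of 340 m/s (the text gives no value) and uses an arbitrary example angle; the function name is illustrative.

```python
import math


def phase_threshold(freq_hz: float, alpha_deg: float,
                    spacing_m: float = 0.02, sound_speed: float = 340.0) -> float:
    """Evaluate theta = 2*pi*F*d*cos(90 deg - alpha)/V in radians.

    spacing_m is the 2 cm microphone spacing from the text;
    sound_speed is an assumed value, not taken from the text.
    """
    return (2.0 * math.pi * freq_hz * spacing_m
            * math.cos(math.radians(90.0 - alpha_deg)) / sound_speed)


theta_1k = phase_threshold(1000.0, 30.0)  # example: 1 kHz, alpha = 30 degrees
print(round(theta_1k, 4))
```

Because θ is proportional to F, the threshold has to be evaluated, or stored, separately for each frequency bin.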

For frequency components at or above 8 kHz, the power comparison spectrum determination unit 36 determines for each frequency whether the relative power parameter is greater than the threshold TH, and judges the component to be a sound component from the front direction when it is. For frequency components at or below 8 kHz, the phase comparison spectrum determination unit 37 determines for each frequency whether the absolute value of the relative phase parameter is smaller than the threshold θ, and judges the component to be a sound component from the front direction when it is.
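The two-band decision can be sketched per frequency bin as below. The assumptions are labeled: power is taken in dB, the phase difference is wrapped into [-pi, pi], and a bin at exactly 8 kHz is routed to the power test (the text lets both tests cover that frequency). All names are illustrative.

```python
import cmath
import math

CROSSOVER_HZ = 8000.0  # HPF/LPF cutoff stated in the text


def power_db(x: complex, floor: float = 1e-12) -> float:
    """Bin power in dB; `floor` is an assumed guard against log(0)."""
    return 10.0 * math.log10(max(abs(x) ** 2, floor))


def wrap_phase(p: float) -> float:
    """Wrap an angle into [-pi, pi] (an assumption; the text does not discuss wrapping)."""
    return math.atan2(math.sin(p), math.cos(p))


def is_front(freq_hz, s1, s2, s3, th_db, theta):
    """Decide whether one bin is frontal using power (high band) or phase (low band)."""
    if freq_hz >= CROSSOVER_HZ:
        return power_db(s1) - power_db(s2) > th_db       # S1 vs S2 power
    rel_phase = wrap_phase(cmath.phase(s2) - cmath.phase(s3))
    return abs(rel_phase) < theta                        # S2 vs S3 phase


high = is_front(10000.0, 2 + 0j, 1 + 0j, 1 + 0j, th_db=3.0, theta=0.1)
low_near = is_front(1000.0, 1 + 0j, cmath.rect(1, 0.05), 1 + 0j, th_db=3.0, theta=0.1)
low_far = is_front(1000.0, 1 + 0j, cmath.rect(1, 0.5), 1 + 0j, th_db=3.0, theta=0.1)
print(high, low_near, low_far)
```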

Based on the determination results of the power comparison spectrum determination unit 36 and the phase comparison spectrum determination unit 37, the unnecessary spectrum removal unit 39 removes, in the frequency domain, the unnecessary components of the frequency-domain signal S1[F] that are not sound components from the front direction, and outputs the resulting frequency-domain signal to the IFFT unit 40. The IFFT unit 40 converts the output signal of the unnecessary spectrum removal unit 39 into a time-domain signal by IFFT processing and outputs it to the compression processing unit 7 (see FIG. 1) as the output signal of the audio processing unit 6 of the second embodiment.

In the low frequency band, where direction can be determined accurately from the relative phase parameter, the audio processing unit 6 of the second embodiment uses the relative phase parameter. In the high frequency band, where it cannot, the unit performs the same audio processing as the audio processing unit 6 of the first embodiment. It therefore provides the same effects as the audio processing unit 6 of the first embodiment.
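The text does not say why phase-based determination degrades at high frequencies. One plausible reading, offered here as an assumption, is the standard half-wavelength limit: the phase difference between two microphones spaced d apart is unambiguous only while d is less than half a wavelength, that is, for frequencies below V / (2d). The sketch assumes a speed of sound of 340 m/s.

```python
def max_unambiguous_frequency(spacing_m: float, sound_speed: float = 340.0) -> float:
    """Frequency in Hz at which the microphone spacing equals half a wavelength."""
    return sound_speed / (2.0 * spacing_m)


limit_hz = max_unambiguous_frequency(0.02)  # 2 cm spacing from the text
print(round(limit_hz))  # roughly 8500 Hz with the assumed speed of sound
```

Under these assumptions the limit sits just above the 8 kHz crossover used by the filters, which would be consistent with that choice of cutoff.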

<Third Embodiment>
The audio processing units 6 of the first and second embodiments determine sound from the front direction, whereas the audio processing unit 6 of the third embodiment determines sound from two directions (Lch and Rch). When the audio processing unit 6 of the third embodiment is used, the configuration of the microphone unit 4 and the arrangement of the microphones are the same as when the audio processing unit 6 of the first embodiment is used (see FIGS. 3 and 4).

As shown in FIG. 8, the audio processing unit 6 of the third embodiment includes FFT units 41 and 42, a power comparison spectrum determination unit 43, a phase comparison spectrum determination unit 44, a memory unit 45, unnecessary spectrum removal units 46 and 47, and IFFT units 48 and 49.

Next, the operation of the audio processing unit 6 of the third embodiment is described with reference to the flowchart shown in FIG. 9.

In step #101, the FFT unit 41 samples the output signal of the directional microphone 4A at 48 kHz to convert it into a digital signal, converts it by FFT processing every 2048 samples into a frequency-domain signal S1[F], and outputs S1[F] to the power comparison spectrum determination unit 43 and the phase comparison spectrum determination unit 44. Also in step #101, the FFT unit 42 samples the output signal of the omnidirectional microphone 4B at 48 kHz to convert it into a digital signal, converts it by FFT processing every 2048 samples into a frequency-domain signal S2[F], and outputs S2[F] to the power comparison spectrum determination unit 43, the phase comparison spectrum determination unit 44, and the unnecessary spectrum removal units 46 and 47.

In the following step #102, the power comparison spectrum determination unit 43 and the phase comparison spectrum determination unit 44 set the processing target frequency f to its minimum value.

In the following step #103, the power comparison spectrum determination unit 43 calculates the power PW1[f] of the frequency-domain signal S1[F] at the processing target frequency f and the power PW2[f] of the frequency-domain signal S2[F] at the processing target frequency f. Also in step #103, the phase comparison spectrum determination unit 44 calculates the phase PH1[f] of S1[F] at the processing target frequency f and the phase PH2[f] of S2[F] at the processing target frequency f.

In the following step #104, the power comparison spectrum determination unit 43 compares the power PW1[f] with the power PW2[f] to obtain the relative power parameter (PW1[f] - PW2[f]), and determines whether the relative power parameter is within ±0.5 dB of the threshold TH_{L,R} (a value corresponding to the vectors v_L and v_R shown in FIG. 4) stored in advance in the memory unit 45.

If the relative power parameter (PW1[f] - PW2[f]) is at least (TH_{L,R} - 0.5 dB) and at most (TH_{L,R} + 0.5 dB) (YES in step #104), the component is judged to be either a sound component from the L direction or a sound component from the R direction (step #105), and the process proceeds to step #107. If the relative power parameter is outside that range (NO in step #104), the component is judged to be an unnecessary sound that is neither a sound component from the L direction nor one from the R direction (step #106), and the process proceeds to step #112.

In step #107, the phase comparison spectrum determination unit 44 compares the phase PH1[f] with the phase PH2[f] to obtain the relative phase parameter (PH1[f] - PH2[f]). It then compares the absolute value of the difference between the relative phase parameter and the threshold θ_L (a value corresponding to the angle β_L° shown in FIG. 4) with the absolute value of the difference between the relative phase parameter and the threshold θ_R (a value corresponding to the angle β_R° shown in FIG. 4); both thresholds are stored in advance in the memory unit 45. The thresholds θ_L and θ_R are given by the following expressions, where F is the frequency and V is the speed of sound.
θ_L = 2π × F × 0.02 × cos(90° - β_L) / V
θ_R = 2π × F × 0.02 × cos(90° + β_R) / V
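Evaluating the two expressions shows that they yield thresholds of opposite sign, which is what lets a single phase difference be assigned to the nearer of the two. The sketch assumes a speed of sound of 340 m/s and example angles of 45°; neither value comes from the text.

```python
import math


def theta_left(freq_hz, beta_l_deg, spacing_m=0.02, sound_speed=340.0):
    """theta_L = 2*pi*F*d*cos(90 deg - beta_L)/V, in radians."""
    return (2.0 * math.pi * freq_hz * spacing_m
            * math.cos(math.radians(90.0 - beta_l_deg)) / sound_speed)


def theta_right(freq_hz, beta_r_deg, spacing_m=0.02, sound_speed=340.0):
    """theta_R = 2*pi*F*d*cos(90 deg + beta_R)/V, in radians."""
    return (2.0 * math.pi * freq_hz * spacing_m
            * math.cos(math.radians(90.0 + beta_r_deg)) / sound_speed)


t_l = theta_left(1000.0, 45.0)   # positive
t_r = theta_right(1000.0, 45.0)  # negative, same magnitude when beta_L == beta_R
print(t_l > 0 > t_r)
```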

If the absolute value of the difference between the relative phase parameter and the threshold θ_L is less than or equal to the absolute value of the difference between the relative phase parameter and the threshold θ_R (YES in step #107), the component is judged to be a sound component from the L direction (step #108), and the process proceeds to step #110. If the former is greater than the latter (NO in step #107), the component is judged to be a sound component from the R direction (step #109), and the process proceeds to step #111.

In step #110, the unnecessary spectrum removal unit 46 sets SL[f] = S2[f], the unnecessary spectrum removal unit 47 sets SR[f] = 0, and the process proceeds to step #113.

In step #111, the unnecessary spectrum removal unit 46 sets SL[f] = 0, the unnecessary spectrum removal unit 47 sets SR[f] = S2[f], and the process proceeds to step #113.

In step #112, the unnecessary spectrum removal unit 46 sets SL[f] = 0, the unnecessary spectrum removal unit 47 sets SR[f] = 0, and the process proceeds to step #113.

In step #113, the power comparison spectrum determination unit 43 and the phase comparison spectrum determination unit 44 determine whether the processing target frequency f is set to its maximum value (for example, 24 kHz). If it is not (NO in step #113), the processing target frequency f is updated to the next larger value (step #114), and the process returns to step #103. If it is (YES in step #113), SL[F], the collection of SL[f] over all processing target frequencies, is converted into a time-domain signal by IFFT processing in the IFFT unit 48 and output to the compression processing unit 7 (see FIG. 1) as the Lch output signal of the audio processing unit 6 of the third embodiment, and SR[F], the collection of SR[f] over all processing target frequencies, is converted into a time-domain signal by IFFT processing in the IFFT unit 49 and output to the compression processing unit 7 (see FIG. 1) as the Rch output signal of the audio processing unit 6 of the third embodiment (step #115).
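The loop of steps #102 to #114 can be sketched as a single pass over the frequency bins. This is an illustration under assumptions: power is in dB, thresholds are passed in as per-bin lists, and the raw phase difference is used without wrapping because the text does not address wrapping. The names are illustrative.

```python
import cmath
import math


def power_db(x: complex, floor: float = 1e-12) -> float:
    """Bin power in dB; `floor` is an assumed guard against log(0)."""
    return 10.0 * math.log10(max(abs(x) ** 2, floor))


def split_left_right(s1, s2, th_lr_db, theta_l, theta_r, tol_db=0.5):
    """Return (SL, SR) spectra built from S2 according to steps #103 to #112."""
    sl, sr = [], []
    for k, (a, b) in enumerate(zip(s1, s2)):
        rel_power = power_db(a) - power_db(b)            # steps #103 and #104
        if abs(rel_power - th_lr_db) > tol_db:           # NO in #104: steps #106, #112
            sl.append(0j)
            sr.append(0j)
            continue
        rel_phase = cmath.phase(a) - cmath.phase(b)      # step #107
        if abs(rel_phase - theta_l[k]) <= abs(rel_phase - theta_r[k]):
            sl.append(b)                                 # steps #108, #110
            sr.append(0j)
        else:
            sl.append(0j)                                # steps #109, #111
            sr.append(b)
    return sl, sr


s1 = [cmath.rect(1, 0.1), cmath.rect(1, -0.1), cmath.rect(4, 0.1)]
s2 = [1 + 0j, 1 + 0j, 1 + 0j]
sl, sr = split_left_right(s1, s2, th_lr_db=0.0,
                          theta_l=[0.1] * 3, theta_r=[-0.1] * 3)
print(sl, sr)
```

In the example, the first bin is assigned to L, the second to R, and the third is discarded because its relative power lies outside the ±0.5 dB window.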

In addition to providing the same effects as the audio processing unit 6 of the first embodiment, the audio processing unit 6 of the third embodiment can determine sound from two directions. Sound from three or more directions can be determined by storing in the memory unit 45 more thresholds for the relative power parameter and for the relative phase parameter than in this embodiment.

<Fourth Embodiment>
When the audio processing unit 6 of the fourth embodiment is used, the microphone unit 4 consists of a directional microphone 4D and a directional microphone 4E arranged close to each other, as shown in FIG. 10. For example, the center-to-center distance between the directional microphones 4D and 4E is 2 cm. The directional microphones 4D and 4E are arranged so that they have mutually different directional characteristics; more specifically, the directional microphone 4D has the unidirectional pattern P4 shown in FIG. 11 and the directional microphone 4E has the unidirectional pattern P5 shown in FIG. 11. The directional characteristics shown in FIG. 11 (the unidirectional patterns P4 and P5) represent microphone sensitivity by sound arrival direction: the farther a point on a pattern lies from the center O, the higher the microphone sensitivity to sound arriving from the direction running from that point toward the center O.

As shown in FIG. 12, the audio processing unit 6 of the fourth embodiment includes FFT units 50 and 51, a power comparison spectrum determination unit 52, an unnecessary spectrum removal unit 53, an IFFT unit 54, a gain adjuster 55 with gain α, gain adjusters 56 and 57 with gain (1 - α), and adders 58 and 59.

The FFT unit 50 samples the output signal of the directional microphone 4D at 48 kHz to convert it into a digital signal, converts it by FFT processing every 2048 samples into a frequency-domain signal S1[F], and outputs S1[F] to the power comparison spectrum determination unit 52. The FFT unit 51 samples the output signal of the directional microphone 4E at 48 kHz to convert it into a digital signal, converts it by FFT processing every 2048 samples into a frequency-domain signal S2[F], and outputs S2[F] to the power comparison spectrum determination unit 52 and the unnecessary spectrum removal unit 53.

The power comparison spectrum determination unit 52 calculates the power of the frequency-domain signal S1[F] and the power of the frequency-domain signal S2[F] for each frequency, determines for each frequency whether the two powers match, and judges the component to be a sound component from the front direction when they do.
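With measured signals, two powers rarely match exactly, so a practical check would probably allow a tolerance. The sketch below assumes a dB comparison with a 0.5 dB window borrowed from the third embodiment; the text itself specifies no tolerance for this embodiment.

```python
import math


def power_db(x: complex, floor: float = 1e-12) -> float:
    """Bin power in dB; `floor` is an assumed guard against log(0)."""
    return 10.0 * math.log10(max(abs(x) ** 2, floor))


def powers_match(a: complex, b: complex, tol_db: float = 0.5) -> bool:
    """True if the two bin powers agree within tol_db (an assumed tolerance)."""
    return abs(power_db(a) - power_db(b)) <= tol_db


print(powers_match(1 + 0j, 0 + 1j))  # equal magnitude, different phase
print(powers_match(2 + 0j, 1 + 0j))  # about 6 dB apart
```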

Based on the determination result of the power comparison spectrum determination unit 52, the unnecessary spectrum removal unit 53 removes, in the frequency domain, the unnecessary components of the frequency-domain signal S2[F] that are not sound components from the front direction, and outputs the resulting frequency-domain signal to the IFFT unit 54. The IFFT unit 54 converts the output signal of the unnecessary spectrum removal unit 53 into a time-domain signal by IFFT processing.

The output signal of the IFFT unit 54 is gain-adjusted by the gain adjuster 55, mixed in the adder 58 with the output signal of the directional microphone 4D that has been gain-adjusted by the gain adjuster 56, and then output to the compression processing unit 7 (see FIG. 1) as the Lch output signal of the audio processing unit 6 of the fourth embodiment.

Similarly, the output signal of the IFFT unit 54, gain-adjusted by the gain adjuster 55, is mixed in the adder 59 with the output signal of the directional microphone 4E that has been gain-adjusted by the gain adjuster 57, and then output to the compression processing unit 7 (see FIG. 1) as the Rch output signal of the audio processing unit 6 of the fourth embodiment.

The gain adjusters 55 to 57 change the value of α in conjunction with, for example, camera zoom information output from the CPU 17 (see FIG. 1), setting α to 1.0 at maximum zoom and to 0.0 when no zoom is applied.
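The mixing stage can be sketched as a crossfade between the extracted front sound and each raw microphone signal. Only the endpoints (α = 0.0 without zoom, α = 1.0 at maximum zoom) come from the text; the linear mapping in between and the 1x to 10x zoom range are assumptions for illustration.

```python
def zoom_to_alpha(zoom: float, zoom_min: float = 1.0, zoom_max: float = 10.0) -> float:
    """Map a zoom factor to alpha in [0, 1]; the linear shape and range are assumed."""
    t = (zoom - zoom_min) / (zoom_max - zoom_min)
    return min(1.0, max(0.0, t))


def mix_stereo(front, mic_d, mic_e, alpha):
    """Lch = alpha*front + (1-alpha)*mic 4D; Rch = alpha*front + (1-alpha)*mic 4E."""
    left = [alpha * f + (1.0 - alpha) * d for f, d in zip(front, mic_d)]
    right = [alpha * f + (1.0 - alpha) * e for f, e in zip(front, mic_e)]
    return left, right


front = [1.0, 1.0]   # time-domain samples of the extracted front sound
mic_d = [0.0, 2.0]   # samples from directional microphone 4D
mic_e = [4.0, 0.0]   # samples from directional microphone 4E
wide = mix_stereo(front, mic_d, mic_e, zoom_to_alpha(1.0))   # no zoom
tele = mix_stereo(front, mic_d, mic_e, zoom_to_alpha(10.0))  # maximum zoom
print(wide, tele)
```

Without zoom the outputs are the two raw microphone signals; at maximum zoom both channels carry only the extracted front sound.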

In addition to providing the same effects as the audio processing unit 6 of the first embodiment, the audio processing unit 6 of the fourth embodiment can vary the degree to which sound from the front direction is emphasized.

The imaging apparatus shown in FIG. 1 described above carries a determination device that applies the determination method according to the present invention when the collected audio signal is recorded. However, because the determination method determines direction from the collected audio signal itself, the determination need not be made at recording time. It can also be made at playback time; that is, the collected audio signal can be recorded and reproduced, and the determination made on the reproduced signal. Since the timing of the direction determination is not limited to recording, the determination can be made either when the collected audio signal is recorded or when the recorded audio signal is reproduced, depending on other processing that uses the video and audio information.

The following describes an imaging apparatus equipped with a determination device that applies the determination method according to the present invention at playback time.

FIG. 13 is a block diagram showing another example of the internal configuration of an imaging apparatus equipped with the determination device according to the present invention. In FIG. 13, parts that are substantially the same as those in FIG. 1 are given the same reference numerals.

The imaging apparatus shown in FIG. 13 differs from the imaging apparatus shown in FIG. 1 in that an audio processing unit 6a is provided in place of the audio processing unit 6 and an audio processing unit 6b is provided between the expansion processing unit 9 and the audio output circuit unit 13.

Unlike the audio processing unit 6, the audio processing unit 6a performs A/D conversion on the analog audio signal from the microphone unit 4 but performs neither the determination method according to the present invention nor the audio processing that depends on its result.

The audio processing unit 6b has the same configuration as the audio processing unit 6, except that its FFT units do not perform A/D conversion. The audio processing performed in the audio processing unit 6b is basically the same as that performed in the audio processing unit 6, so its description is omitted here.

Because the present invention concerns direction determination for audio signals, the video-related blocks are not essential. The present invention can therefore also be applied to electronic devices other than imaging apparatuses, such as audio recording devices, audio playback devices, and audio recording/playback devices (for example, IC recorders).

In the first to fourth embodiments described above, the thresholds are stored in the memory unit within the audio processing unit 6. Alternatively, they may be stored in the memory 18, and the memory unit within the audio processing unit 6 may be eliminated.

The first to fourth embodiments described above can also be combined as appropriate. For example, the determination method of the first embodiment can be combined with that of the third embodiment. An Lch output signal is generated by mixing the front-direction sound obtained by the determination method of the first embodiment, gain-adjusted by a gain α, with the L-direction sound obtained by the determination method of the third embodiment, gain-adjusted by a gain (1 − α). Likewise, an Rch output signal is generated by mixing the front-direction sound obtained by the determination method of the first embodiment, gain-adjusted by the gain α, with the R-direction sound obtained by the determination method of the third embodiment, gain-adjusted by the gain (1 − α).
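
As a rough illustration of the mixing described above, the following Python sketch forms the Lch and Rch output signals from three already-separated time-domain signals. The function name `mix_channels`, the list-based sample representation, and the restriction of α to the range 0 to 1 are assumptions made for this illustration and are not taken from the embodiments.

```python
def mix_channels(front, left, right, alpha):
    """Blend the front-direction sound into both output channels.

    front, left, right: equal-length sequences of time-domain samples
        (hypothetical inputs standing in for the outputs of the first
        and third embodiments' determination methods).
    alpha: gain applied to the front-direction sound; the L- and
        R-direction sounds receive the gain (1 - alpha).
    """
    # Assumption of this sketch: alpha is a blend factor in [0, 1].
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if not (len(front) == len(left) == len(right)):
        raise ValueError("all inputs must have the same length")

    # Lch = alpha * front + (1 - alpha) * L-direction sound
    lch = [alpha * f + (1.0 - alpha) * l for f, l in zip(front, left)]
    # Rch = alpha * front + (1 - alpha) * R-direction sound
    rch = [alpha * f + (1.0 - alpha) * r for f, r in zip(front, right)]
    return lch, rch
```

With α close to 1 both channels carry mostly the front-direction sound, and with α close to 0 the output approaches the separated L-direction and R-direction sounds.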

In the first to fourth embodiments described above, the direction determination for the audio signal consists of determining the sound component from a specific direction. The present invention is not limited to this, and the sound source direction may instead be determined based on the relative power parameter obtained for each frequency. One possible use of the sound source direction determination result is in a video conference system, where the camera is controlled according to the determination result so that it faces the sound source direction (the direction of the speaker).
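
The following Python sketch illustrates one way a sound source direction could be derived from a per-frequency relative power parameter. It is not the method of the embodiments. It assumes, purely for illustration, that the first microphone has an ideal cardioid pattern with amplitude gain (1 + cos θ)/2, that the second microphone is ideally omnidirectional, that the relative power parameter is the ratio |S1[F]|² / |S2[F]|², and that the per-frequency estimates are combined by taking their median. The function names are hypothetical.

```python
import math
import statistics


def relative_power(s1, s2, eps=1e-12):
    """Per-bin power ratio |S1[F]|^2 / |S2[F]|^2 of two complex spectra."""
    return [abs(a) ** 2 / (abs(b) ** 2 + eps) for a, b in zip(s1, s2)]


def estimate_direction_deg(s1, s2):
    """Estimate the angle (degrees) off the directional microphone's axis.

    ASSUMPTION: microphone 1 is an ideal cardioid with amplitude gain
    (1 + cos(theta)) / 2 and microphone 2 is an ideal omnidirectional
    microphone with gain 1, so sqrt(power ratio) = (1 + cos(theta)) / 2.
    """
    angles = []
    for ratio in relative_power(s1, s2):
        gain = min(1.0, math.sqrt(ratio))        # clamp to the cardioid's range
        cos_theta = max(-1.0, 2.0 * gain - 1.0)  # invert the cardioid formula
        angles.append(math.degrees(math.acos(cos_theta)))
    # Combine the per-frequency estimates; the median resists outlier bins.
    return statistics.median(angles)
```

Because a power ratio alone depends only on the angle from the microphone axis, this sketch cannot tell two directions that are symmetric about that axis apart; a further cue such as phase would be needed for that.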

The present invention is applicable to electronic devices that store and/or reproduce audio signals (for example, imaging apparatuses, IC recorders, portable devices equipped with those functions, or computers operated by software that causes the computer to function as means for storing and/or reproducing audio signals).

FIG. 1 is a block diagram showing an example of the internal configuration of an imaging apparatus equipped with the sound collection environment determination device according to the present invention.
FIG. 2 is a flowchart for explaining the basic operation of the imaging apparatus shown in FIG. 1 during video shooting.
FIG. 3 is a diagram showing the directivity characteristics of the microphone unit in the first embodiment.
FIG. 4 is a diagram showing the arrangement of the microphones of the microphone unit in the first embodiment.
FIG. 5 is a block diagram showing the configuration of the audio processing unit of the first embodiment.
FIG. 6 is a diagram showing the arrangement of the microphones of the microphone unit in the second embodiment.
FIG. 7 is a block diagram showing the configuration of the audio processing unit of the second embodiment.
FIG. 8 is a block diagram showing the configuration of the audio processing unit of the third embodiment.
FIG. 9 is an operation flowchart of the audio processing unit of the third embodiment.
FIG. 10 is a diagram showing the directivity characteristics of the microphone unit in the fourth embodiment.
FIG. 11 is a diagram showing the arrangement of the microphones of the microphone unit in the fourth embodiment.
FIG. 12 is a block diagram showing the configuration of the audio processing unit of the fourth embodiment.
FIG. 13 is a block diagram showing another example of the internal configuration of an imaging apparatus equipped with the determination device according to the present invention.
FIG. 14 is a block diagram showing the configuration of an audio processing unit for realizing the conventional specific sound source enhancement technique.
FIG. 15 is a diagram showing the positional relationship between the microphones and a sound source.

Explanation of symbols

1 Solid-state image sensor (image sensor)
2 Lens unit
3 AFE
4 Microphone unit
4A Directional microphone
4B Omnidirectional microphone
5 Image processing unit
6, 6a, 6b Audio processing unit
7 Compression processing unit
8 Driver unit
9 Decompression processing unit
10 Video output circuit unit
11 Video output terminal
12 Display unit
13 Audio output circuit unit
14 Audio output terminal
15 Speaker unit
16 Timing generator (TG)
17 CPU
18 Memory
19 Operation unit
20, 21 Bus line
22 External memory
23, 24, 29-31, 41, 42, 50, 51 FFT unit
25, 36, 43, 52 Power comparison spectrum determination unit
26, 38, 45 Memory unit
27, 39, 46, 47, 53 Unnecessary spectrum removal unit
28, 40, 48, 49, 54 IFFT unit
32, 33 HPF
34, 35 LPF
37, 44 Phase comparison spectrum determination unit
55-57 Gain adjuster
58, 59 Adder
P1, P4, P5 Unidirectional pattern
P2 Omnidirectional pattern
P3 New directivity pattern obtained by audio processing

Claims

1. A determination device comprising:
a first time-frequency conversion unit that performs time-frequency conversion on an output signal of a first microphone;
a second time-frequency conversion unit that performs time-frequency conversion on an output signal of a second microphone having a directivity characteristic different from that of the first microphone;
a power comparison unit that compares, for each frequency in a predetermined frequency band, the power of the frequency-domain signal output from the first time-frequency conversion unit with the power of the frequency-domain signal output from the second time-frequency conversion unit; and
a determination unit that determines a sound from a specific direction, or a sound source direction, using the comparison result of the power comparison unit.

2. The determination device according to claim 1, further comprising a storage unit that stores a determination condition based on the difference in directivity characteristics between the first microphone and the second microphone,
wherein the determination unit determines a sound from a specific direction from the comparison result of the power comparison unit and the determination condition stored in the storage unit.

3. The determination device, wherein the predetermined frequency band is a first frequency band, the determination device further comprising:
a third time-frequency conversion unit that performs time-frequency conversion on an output signal of a third microphone having the same directivity characteristic as the second microphone;
a phase comparison unit that compares, for each frequency in a second frequency band lower than the first frequency band, the phase of the frequency-domain signal output from the second time-frequency conversion unit with the phase of the frequency-domain signal output from the third time-frequency conversion unit;
a first storage unit that stores a first determination condition based on the difference in directivity characteristics between the first microphone and the second microphone; and
a second storage unit that stores a second determination condition based on the positional relationship between the second microphone and the third microphone,
wherein the determination unit determines a sound from a specific direction in the first frequency band from the comparison result of the power comparison unit and the first determination condition stored in the first storage unit, and determines a sound from a specific direction in the second frequency band from the comparison result of the phase comparison unit and the second determination condition stored in the second storage unit.

4. The determination device according to claim 1, further comprising:
a phase comparison unit that compares, for each frequency in the predetermined frequency band, the phase of the frequency-domain signal output from the first time-frequency conversion unit with the phase of the frequency-domain signal output from the second time-frequency conversion unit;
a first storage unit that stores a first determination condition based on the difference in directivity characteristics between the first microphone and the second microphone; and
a second storage unit that stores a second determination condition based on the positional relationship between the first microphone and the second microphone,
wherein the determination unit includes a primary determination unit that determines, from the comparison result of the power comparison unit and the first determination condition stored in the first storage unit, whether the sound is a sound from either a first direction or a second direction, and a secondary determination unit that, when the primary determination unit determines that the sound is from either the first direction or the second direction, determines whether the sound is from the first direction from the comparison result of the phase comparison unit and the second determination condition stored in the second storage unit.

5. The determination device according to claim 1, wherein the directivity pattern of the first microphone and the directivity pattern of the second microphone are symmetrical, and
the determination unit determines that the sound is from the front direction when the power comparison unit obtains a comparison result indicating that the power of the frequency-domain signal output from the first time-frequency conversion unit is equal to the power of the frequency-domain signal output from the second time-frequency conversion unit.

6. An electronic apparatus comprising at least one determination device according to claim 1,
wherein the electronic apparatus performs audio processing on a collected audio signal based on the determination result of the determination device.

7. The electronic apparatus according to claim 6, having a recording/playback function for collected audio signals,
wherein the determination device performs the determination processing either when the collected audio signal is recorded or when the recorded audio signal is reproduced.

8. The electronic apparatus according to claim 6, wherein the electronic apparatus is an imaging apparatus including a camera that captures an image.

9. A determination method comprising:
a first time-frequency conversion step of performing time-frequency conversion on an output signal of a first microphone;
a second time-frequency conversion step of performing time-frequency conversion on an output signal of a second microphone having a directivity characteristic different from that of the first microphone;
a power comparison step of comparing, for each frequency in a predetermined frequency band, the power of the frequency-domain signal obtained in the first time-frequency conversion step with the power of the frequency-domain signal obtained in the second time-frequency conversion step; and
a determination step of determining a sound from a specific direction, or a sound source direction, using the comparison result obtained in the power comparison step.