JP2010283676A

JP2010283676A - Sound detection apparatus, sound detection method and imaging system

Info

Publication number: JP2010283676A
Application number: JP2009136442A
Authority: JP
Inventors: Nobuyuki Kihara; 信之木原; Yohei Sakuraba; 洋平櫻庭; Takeshi Yamaguchi; 健山口
Original assignee: Sony Corp
Current assignee: Sony Corp
Priority date: 2009-06-05
Filing date: 2009-06-05
Publication date: 2010-12-16

Abstract

PROBLEM TO BE SOLVED: To detect only sound sources inside a room by reliably excluding sounds from outside the room.

SOLUTION: A sound detection apparatus includes: a plurality of microphones 112A, 112B spaced apart from each other; a sound direction information calculation section 132 for calculating an incident angle θ of sound on the microphones based on a phase difference between sound information collected by the microphones 112A, 112B; a sound directionality discrimination section 134 for discriminating the directionality of sound generated from a sound source based on the incident angle θ; and a sound detecting section 120 for detecting, from the sound information collected by the microphones 112A, 112B, only sound information whose directionality is determined to be fixed.

COPYRIGHT: (C)2011, JPO&INPIT

Description

The present invention relates to a sound detection apparatus, a sound detection method, and an imaging system.

In recent years, monitoring devices for security and similar applications have been expected to detect suspicious persons and intruders through moving-object detection using surveillance camera images and abnormal-sound detection using sound from microphones and the like.

JP 2007-47088 A
JP 2003-189139 A
Japanese Patent Laid-Open No. 11-64090

In monitoring devices for security and similar applications, methods for detecting suspicious persons by sound include sound pressure detection, which uses only the sound pressure of the collected sound, and sound detection based on feature quantities of the sound. These methods use sound collected with a microphone or the like. However, when suspicious persons are to be detected based on indoor sounds, outdoor sounds are also picked up, and false detections can result. Proposed countermeasures include attenuating outdoor sound by providing directivity with an array microphone or the like, removing stationary noise, and learning the feature quantities of outdoor sounds in order to exclude them.

However, each of these methods is effective to some extent only when information about the outdoor sound is known in advance. They have no effect on non-stationary or unknown sounds from outside the room, so outdoor noise that should not be detected is likely to be falsely detected.

For example, the method of attenuating outdoor sound by providing directivity with an array microphone or the like points the directivity toward the indoor sound to be detected and thereby suppresses sound from other directions. However, noise from outside the room is also likely to arrive from the direction in which the directivity is pointed, so false detection remains likely with this method.

The method of removing stationary noise, such as spectral subtraction, learns the stationary noise level and removes it, but it has almost no effect on non-stationary noise.

Furthermore, the method of learning and excluding the feature quantities of outdoor sounds learns those feature quantities in advance and treats collected sound as outdoor sound when its feature quantities match. Because prior learning is required, sounds that have not been learned cannot be removed. There is also the problem that an outdoor sound cannot be removed when its feature quantities resemble those of an indoor sound.

The present invention has been made in view of the above problems, and an object of the present invention is to provide a new and improved sound detection apparatus, sound detection method, and imaging system capable of detecting only the necessary sound among sounds from a plurality of sound sources.

To solve the above problems, according to one aspect of the present invention, there is provided a sound detection apparatus including: a plurality of microphones spaced apart from each other; a sound direction information calculation unit that calculates an incident angle of sound on the microphones based on a phase difference between sound information collected by the microphones; a sound directionality determination unit that determines the directionality of sound emitted from a sound source based on the incident angle; and a sound detection unit that detects, from the sound information collected by the microphones, only sound information whose directionality is determined to be fixed.

The apparatus may further include a Fourier transform unit that applies a time-frequency transform, such as a Fourier transform, to the sound information collected by each of the plurality of microphones. In that case, the sound direction information calculation unit calculates the incident angle for each frequency based on the phase difference at each frequency obtained by the Fourier transform, the sound directionality determination unit determines the directionality for each frequency, and the sound detection unit detects, from the sound information collected by the microphones, only sound information at frequencies whose directionality is determined to be fixed.

The sound directionality determination unit may determine, for sound at a specific frequency, whether the directionality is fixed based on the change of the incident angle over time.

The sound directionality determination unit may determine, for sound at a plurality of frequencies, whether the directionality is fixed based on the degree of agreement of the incident angles.

To solve the above problems, according to another aspect of the present invention, there is provided a sound detection method including: a sound direction information calculation step of calculating an incident angle of sound on a plurality of microphones spaced apart from each other, based on a phase difference between sound information collected by the microphones; a sound directionality determination step of determining the directionality of sound emitted from a sound source based on the incident angle; and a sound detection step of detecting, from the sound information collected by the microphones, only sound information whose directionality is determined to be fixed.

The method may further include a step of applying a Fourier transform to the sound information collected by each of the plurality of microphones. In that case, the sound direction information calculation step calculates the incident angle for each frequency based on the phase difference at each frequency obtained by the Fourier transform, the sound directionality determination step determines the directionality for each frequency, and the sound detection step detects, from the sound information collected by the microphones, only sound information at frequencies whose directionality is determined to be fixed.

In the sound directionality determination step, whether the directionality is fixed may be determined, for sound at a specific frequency, based on the change of the incident angle over time.

In the sound directionality determination step, whether the directionality is fixed may be determined, for sound at a plurality of frequencies, based on the degree of agreement of the incident angles.

To solve the above problems, according to another aspect of the present invention, there is provided an imaging system including: a plurality of microphones spaced apart from each other; a sound direction information calculation unit that calculates an incident angle of sound on the microphones based on a phase difference between sound information collected by the microphones; a sound directionality determination unit that determines the directionality of sound emitted from a sound source based on the incident angle; a sound detection unit that detects, from the sound information collected by the microphones, only sound information whose directionality is determined to be fixed; and a tracking camera that changes its shooting direction based on the detection result of the sound detection unit.

According to the present invention, it is possible to provide a sound detection apparatus, a sound detection method, and an imaging system capable of detecting only the necessary sound among sounds from a plurality of sound sources.

A schematic diagram showing the configuration of the sound monitoring apparatus according to the first embodiment of the present invention.
A schematic diagram showing a method of detecting the direction of a sound source.
A schematic diagram showing the configuration of FIG. 2 in more detail.
A flowchart showing the processing procedure performed by the sound monitoring apparatus of the first embodiment.
A schematic diagram showing the tracking camera system according to the second embodiment.

Preferred embodiments of the present invention will be described in detail below with reference to the accompanying drawings. In this specification and the drawings, components having substantially the same functional configuration are denoted by the same reference numerals, and redundant description is omitted. The description proceeds in the following order.
1. First embodiment
(1) Configuration example of the sound monitoring apparatus
(2) Processing performed by the sound monitoring apparatus
2. Second embodiment
(1) Configuration example of the imaging system

<1. First Embodiment>
(1) Configuration example of the sound monitoring apparatus
First, a schematic configuration of the sound monitoring apparatus 100 according to the first embodiment of the present invention will be described with reference to FIG. 1. As shown in FIG. 1, the sound monitoring apparatus 100 according to the present embodiment includes a microphone module 110, a sound detector 120, and a sound directionality discriminator 130.

As shown in FIG. 1, the sound monitoring apparatus 100 is installed indoors. There is an indoor sound source inside the room and an outdoor sound source outside the room, separated from the room by a wall. Here, the indoor sound source is, for example, a sound made by a suspicious person who has entered the room. The outdoor sound source is, for example, a sound originating from a person, vehicle, or piece of equipment located or passing outside the room, separated from it by a wall, ceiling, floor, or the like. The microphone module 110 includes a plurality of microphones and can collect sound from both the indoor and the outdoor sound source.

One difference between sound generated by an indoor source and sound generated by an outdoor source is the presence or absence of direct sound. Sound generated by a source in the room where the sound monitoring apparatus 100 is installed is direct sound and in most cases enters the microphone module 110 almost directly. Sound from an outdoor source, in contrast, always passes through a wall, ceiling, floor, or the like, so it is almost entirely indirect sound with no component that enters the microphone module 110 directly. Monaural sound collected with a commonly used omnidirectional microphone, or with a directional microphone including an array microphone, does not by itself provide direction information about the sound source, which makes it very difficult to identify sound from outside the room.

Indoor and outdoor sound sources both emit sound at a plurality of different frequencies, but the incident angle at which the sound of each frequency reaches the microphone module 110 behaves differently depending on whether the sound comes from outside or inside the room. The first difference is the change of the incident angle over time at a single frequency: for an indoor source the angle settles at a constant value, whereas for an outdoor source the angle is not stable. The second difference appears when the incident angles are compared across frequencies: for an indoor source the incident angles are likely to agree at every frequency, whereas for an outdoor source the incident angles tend to scatter from frequency to frequency.

In the example of FIG. 1, sound from the indoor source travels directly, in one direction, from the source to the microphone module 110. Sound from the outdoor source, on the other hand, first reaches the wall and propagates along it, so it reaches the microphone module 110 from a wide area of the wall. Sound from the outdoor source has therefore lost its directionality and reaches the microphone module 110 indirectly from many directions.

In the present embodiment, the directionality of the collected sound is obtained from the features corresponding to the first and second differences described above, and it is determined whether the sound comes from outside or inside the room. FIG. 2 is a schematic diagram showing a method of detecting the direction of a sound source. As shown in FIG. 2, the microphone module includes a microphone A and a microphone B. Microphone A, microphone B, and the sound source are arranged in space, and θ denotes the incident angle of the sound arriving at microphones A and B from the source, measured relative to the straight line connecting microphone A and microphone B.

Letting D be the spacing between microphone A and microphone B, the difference d in sound path length between the two microphones A and B is given by equation (1):
d = D cos θ   (1)
The phase difference φ(f) at each frequency f is given by equation (2), where S is the speed of sound:
φ(f) = 2πfd / S   (2)
Accordingly, for the sound of each frequency f acquired by microphone A and microphone B, the incident angle θ can be obtained for each frequency f from the phase difference at that frequency.
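As an illustration of equations (1) and (2), the following minimal sketch inverts them to recover θ from a measured phase difference. The function name, the clamping step, and the speed-of-sound value of 343 m/s are assumptions made for this example and do not come from the specification. Note also that the phase difference is only unambiguous while 2πfD/S does not exceed π, that is, while the microphone spacing D is at most half a wavelength.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s; assumed value for air at roughly 20 degrees C


def incident_angle(phase_diff, freq_hz, mic_spacing_m, speed=SPEED_OF_SOUND):
    """Invert equations (1) and (2): theta = arccos(phi * S / (2 * pi * f * D)).

    Returns the incident angle in degrees, in the range [0, 180].
    """
    cos_theta = phase_diff * speed / (2.0 * math.pi * freq_hz * mic_spacing_m)
    # Measurement noise can push the ratio slightly outside [-1, 1]; clamp it.
    cos_theta = max(-1.0, min(1.0, cos_theta))
    return math.degrees(math.acos(cos_theta))
```

For instance, a source at θ = 60° with D = 0.05 m and f = 1000 Hz gives d = 0.025 m, and feeding the resulting phase difference back into `incident_angle` should return approximately 60°.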

The incident angle θ is calculated for the sound information at each frequency emitted from the sound source. In the sound monitoring apparatus 100, the microphone module 110 applies a Fourier transform (FFT) to the sound information collected by microphones A and B and detects the sound pressure (power) and phase at each frequency. The sound directionality discriminator 130 then calculates, for each frequency, the phase difference φ(f) between the phase of the sound at microphone A and the phase of the sound at microphone B, and obtains the incident angle θ from equations (1) and (2).

The sound directionality discriminator 130 determines the directionality of the sound source from the incident angle θ obtained for each frequency. Specifically, the sound directionality discriminator 130 determines whether the collected sound comes from an indoor source or an outdoor source and outputs the result (a likelihood) to the sound detector 120.

The sound detector 120 receives from the microphone module 110 the sound information obtained by Fourier-transforming the collected sound. Based on the sound directionality determination result sent from the sound directionality discriminator 130, the sound detector 120 outputs as its detection result the sound at frequencies whose source is determined to be indoors and discards the sound at frequencies whose source is determined to be outdoors.

Next, the processing performed by the sound monitoring apparatus 100 will be described in more detail with reference to FIG. 3. FIG. 3 is a schematic diagram showing the configuration of the sound monitoring apparatus 100 of FIG. 2 in more detail. As shown in FIG. 3, the microphone module 110 includes a microphone (A) 112A, a microphone (B) 112B, a Fourier transformer (FFT) 114A, a Fourier transformer (FFT) 114B, and a sound pressure / phase information separation unit 116. The microphone (A) 112A and the microphone (B) 112B collect sound from indoor and outdoor sound sources.

The output of the microphone (A) 112A is sent to the Fourier transformer (FFT) 114A, which transforms the input sound information into the frequency domain and outputs X_A(f) for each frequency f. Likewise, the output of the microphone (B) 112B is sent to the Fourier transformer (FFT) 114B, which transforms the input sound information into the frequency domain and outputs X_B(f) for each frequency f. The values X_A(f) and X_B(f) calculated by the Fourier transform are expressed as X_A(f) = P_A(f) + φ_A(f)·i and X_B(f) = P_B(f) + φ_B(f)·i, where i is the imaginary unit, P_A(f) and P_B(f) are the sound pressures (power), and φ_A(f) and φ_B(f) are the phases.

The outputs of the Fourier transformers 114A and 114B are input to the sound pressure / phase information separation unit 116, which separates the sound pressure from the phase. The sound pressures P_A(f) and P_B(f) are output to the sound detector 120, and the phases φ_A(f) and φ_B(f) are input to the sound directionality discriminator 130.

The sound directionality discriminator 130 comprises a sound direction information calculation unit 132 and a sound directionality determination unit 134. For each frequency, the sound direction information calculation unit 132 computes φ(f) = φ_A(f) − φ_B(f) to obtain the phase difference φ(f) between the sound collected by the microphone (A) 112A and the sound collected by the microphone (B) 112B. The sound direction information calculation unit 132 then calculates the incident angle θ(f) for each frequency from equations (1) and (2).
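The separation of sound pressure and phase and the per-frequency phase difference can be sketched as follows. This is an illustrative sketch only: it assumes the usual polar reading of a complex Fourier coefficient (magnitude as sound pressure, argument as phase), and the function names and the naive single-bin DFT are hypothetical stand-ins for units 114A/114B, 116, and 132 rather than the specification's implementation.

```python
import cmath
import math


def dft_bin(samples, k):
    """Bin k of the DFT of a real sequence (naive direct sum, for illustration)."""
    n = len(samples)
    return sum(x * cmath.exp(-2j * math.pi * k * i / n)
               for i, x in enumerate(samples))


def split_power_phase(x):
    """Separate one complex bin into (magnitude, phase in radians)."""
    return abs(x), cmath.phase(x)


def phase_difference(phase_a, phase_b):
    """phi(f) = phi_A(f) - phi_B(f), wrapped into the range [-pi, pi]."""
    diff = phase_a - phase_b
    return math.atan2(math.sin(diff), math.cos(diff))
```

Wrapping the difference keeps φ(f) in a single period, which matters because the raw phases returned for the two microphones may straddle the ±π boundary.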

The incident angle θ(f) for each frequency is input to the sound directionality determination unit 134. Based on the incident angle θ(f) for each frequency, the sound directionality determination unit 134 determines the directionality of the sound emitted from the sound source and determines whether the sound at each frequency originates from an indoor source.

Specifically, the sound directionality determination unit 134 obtains the variation of the incident angle θ(f) of each individual frequency over a fixed period T. When the variation is large, it determines that the sound at that frequency originates from an outdoor source. When the variation of the incident angle θ(f) is small, it determines that the sound at that frequency originates from an indoor source.
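A minimal sketch of this temporal stability test is shown below. The specification does not say how the variation is measured or what threshold is used, so the population standard deviation and the 5° limit are assumptions chosen for illustration, as is the function name.

```python
from statistics import pstdev


def is_stable_over_time(angles_deg, max_std_deg=5.0):
    """Return True if one frequency's incident angle stays nearly constant.

    angles_deg holds the incident angles observed for a single frequency
    during the period T; max_std_deg is a hypothetical threshold.
    """
    if len(angles_deg) < 2:
        return False  # not enough observations to judge stability
    return pstdev(angles_deg) <= max_std_deg
```

A frequency whose angle hovers around 30° would pass this test and be treated as indoor, while one whose angle jumps widely between frames would fail and be treated as outdoor.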

The sound directionality determination unit 134 also checks whether the incident angles θ(f) of the individual frequencies agree. Sound at frequencies whose incident angles θ(f) agree, or lie within a predetermined range of one another, is determined to originate from an indoor source. When the incident angles θ(f) of the frequencies do not agree, or do not lie within the predetermined range, the sound at those frequencies is determined to originate from an outdoor source. For example, suppose that among the detected frequencies f1, f2, f3, f4, and f5, the incident angles of f1, f2, and f3 agree at 30°, the incident angle of f4 is 40°, and the incident angle of f5 is 50°. In this case, the sound at frequencies f1, f2, and f3 is determined to originate from an indoor source, and the sound at frequencies f4 and f5 is determined to originate from an outdoor source.
As a further example, suppose that among frequencies f1 to f8, the incident angles of f1, f2, and f3 agree at 30°, the incident angle of f4 is 40°, the incident angle of f5 is 50°, and the incident angles of f6, f7, and f8 agree at 60°. In this case, frequencies f1, f2, and f3 are determined to be an indoor source, frequencies f6, f7, and f8 are also determined to be an indoor source, and frequencies f4 and f5 are determined to be an outdoor source. It is then determined that there are two sound sources in the room, one at an incident angle of 30° and one at an incident angle of 60°.
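The f1 to f8 example can be reproduced with a short sketch. The grouping rule used here (a frequency counts as indoor when at least one other frequency lies within a tolerance of its angle) and the 5° tolerance are assumptions for illustration; the specification only speaks of angles that agree or fall within a predetermined range.

```python
def classify_by_agreement(angles_deg, tol_deg=5.0, min_group=2):
    """Map each frequency label to True (indoor) or False (outdoor).

    A frequency is treated as indoor when at least min_group frequencies,
    itself included, have incident angles within tol_deg of its own.
    """
    result = {}
    for name, angle in angles_deg.items():
        group = [a for a in angles_deg.values() if abs(a - angle) <= tol_deg]
        result[name] = len(group) >= min_group
    return result


angles = {"f1": 30.0, "f2": 30.0, "f3": 30.0, "f4": 40.0,
          "f5": 50.0, "f6": 60.0, "f7": 60.0, "f8": 60.0}
indoor = classify_by_agreement(angles)
# Distinct directions among the frequencies judged to be indoor.
indoor_directions = sorted({angles[n] for n, ok in indoor.items() if ok})
```

With these inputs, f4 and f5 are the only frequencies without a partner inside the tolerance, and the indoor frequencies cluster at two directions, matching the two indoor sources described in the text.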

In this way, the sound directionality determination unit 134 determines that the sound source is outdoors when either the change of the incident angle θ over time at a single frequency or the variation of the incident angle θ across frequencies is larger than a reference value.

The determination result of the sound directionality discriminator 130 is output to the sound detector 120 as sound directionality information. The sound directionality information expresses, for the sound at each frequency, the probability that it originates from an indoor source, as a likelihood from 0 to 1. The closer the likelihood for a frequency is to 1, the more probable it is that the sound at that frequency originates from an indoor source. In the example above, where the incident angles of f1, f2, and f3 among the five frequencies f1 to f5 agree at 30°, the incident angle of f4 is 40°, and the incident angle of f5 is 50°, the likelihood for the sound at f1, f2, and f3 is close to 1, and the likelihood for the sound at f4 and f5 is close to 0.
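The specification does not give a formula for the likelihood, so the following is only one hypothetical way to map the two criteria onto a value between 0 and 1. The product form, the 10° limit, and the group size of three are all assumptions made for this sketch.

```python
def indoor_likelihood(std_deg, n_agreeing, std_limit_deg=10.0, full_agreement=3):
    """Combine temporal stability and cross-frequency agreement into [0, 1].

    std_deg:     spread of this frequency's incident angle over the period T
    n_agreeing:  number of frequencies (itself included) sharing its angle
    Both scaling parameters are hypothetical.
    """
    stability = max(0.0, 1.0 - std_deg / std_limit_deg)
    agreement = max(0.0, min(1.0, (n_agreeing - 1) / (full_agreement - 1)))
    # A product is low when either criterion is poor.
    return stability * agreement
```

Using a product means that a frequency failing either criterion receives a low likelihood, which mirrors the rule that exceeding the reference value on either measure marks the source as outdoor.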

The sound detector 120 performs sound detection on the sound pressure information input from the microphone module 110, based on the likelihood input from the sound directionality discriminator 130. For frequencies with high likelihood, the sound pressures P_A(f) and P_B(f) are output unchanged. For frequencies with low likelihood, the sound pressures P_A(f) and P_B(f) are reduced according to the likelihood or excluded. As a result, the sound detector 120 outputs only sound originating from indoor sources. The detection result output from the sound detector 120 may also include the sound direction information (incident angle θ) and the sound directionality determination result sent from the sound directionality discriminator 130.
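The weighting performed by the sound detector can be sketched as below. The two thresholds and the linear attenuation between them are assumptions for illustration; the text states only that high-likelihood frequencies pass unchanged and low-likelihood ones are reduced or excluded.

```python
def apply_likelihood(pressures, likelihoods, keep_at=0.8, drop_below=0.2):
    """Weight per-frequency sound pressure by the indoor likelihood.

    keep_at and drop_below are hypothetical thresholds.
    """
    out = []
    for p, w in zip(pressures, likelihoods):
        if w >= keep_at:
            out.append(p)        # high likelihood: pass through unchanged
        elif w < drop_below:
            out.append(0.0)      # very low likelihood: exclude
        else:
            out.append(p * w)    # in between: attenuate by the likelihood
    return out
```

For example, pressures of 1.0, 2.0, and 4.0 with likelihoods of 0.9, 0.5, and 0.1 would be passed, halved, and removed respectively.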

(2) Processing performed by the sound monitoring apparatus
Next, the processing procedure performed by the sound monitoring apparatus 100 of the present embodiment will be described with reference to the flowchart of FIG. 4. First, in step S10, the microphone module 110 collects outdoor and indoor sound. In the next step S12, sound direction information (incident angle θ) for each frequency is extracted from the sound collected in step S10 and output to the sound directionality determination unit 134.

In the next step S14, sound information (sound pressure) is extracted from the sound collected in step S10 and output to the sound detector 120. In the next step S16, the sound directionality determination unit 134 determines the sound directionality (likelihood) from the sound direction information (incident angle θ) obtained in step S12 and outputs the sound directionality information to the sound detector 120.

In the next step S18, the sound detector 120 performs sound detection using the sound information obtained in step S14 and the sound directionality information obtained in step S16. Here, sound information judged on the basis of the sound directionality information to come from an outdoor sound source is excluded, and only sound from indoor sound sources is detected. In step S20, the detection result of the sound detector 120 is output.
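The flow of steps S12 to S20 for one frame might be organized as in the sketch below. It assumes the collected sound of step S10 has already been Fourier transformed into one complex spectrum per microphone, uses the inter-microphone phase difference as a stand-in for the incident angle θ, and takes the directionality rule as a caller-supplied function. The names `process_frame`, `directionality`, and `min_likelihood` are hypothetical.

```python
import cmath
from typing import Callable, List, Sequence, Tuple

# (frequency bin index, P_A(f), P_B(f))
DetectedBin = Tuple[int, float, float]


def process_frame(
    spectrum_a: Sequence[complex],
    spectrum_b: Sequence[complex],
    directionality: Callable[[List[float]], List[float]],
    min_likelihood: float = 0.5,  # hypothetical decision threshold
) -> List[DetectedBin]:
    """Run steps S12 to S20 on one frame of two-microphone spectra."""
    if len(spectrum_a) != len(spectrum_b):
        raise ValueError("both spectra must have the same number of bins")

    # S12: direction information per frequency (phase difference between A and B).
    phase_diffs = [cmath.phase(a * b.conjugate())
                   for a, b in zip(spectrum_a, spectrum_b)]

    # S14: sound information (sound pressure) per frequency.
    pressures = [(abs(a), abs(b)) for a, b in zip(spectrum_a, spectrum_b)]

    # S16: sound directionality (likelihood) per frequency.
    likelihoods = directionality(phase_diffs)

    # S18: keep only the bins judged to come from an indoor source.
    detected = [
        (k, p_a, p_b)
        for k, ((p_a, p_b), likelihood) in enumerate(zip(pressures, likelihoods))
        if likelihood >= min_likelihood
    ]

    # S20: output the detection result.
    return detected
```

Passing the directionality rule in as a function keeps the skeleton independent of how stability and agreement of θ are actually scored.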

As described above, the sound direction information is extracted by obtaining, for each frequency, the incident angle θ of the sound from the sound source, using the sound collected by the microphones A and B of the microphone module 110. The sound directionality is determined from the sound direction information by comprehensively evaluating factors such as the stability of the incident angle θ at each frequency and the agreement of the incident angles θ across multiple frequencies.
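As an illustration of obtaining θ for one frequency, the sketch below uses the standard far-field (plane-wave) relation θ = arcsin(c·Δφ / (2π·f·d)), where Δφ is the phase difference between microphones A and B, f the frequency, d the microphone spacing, and c the speed of sound. This relation, the value 343 m/s, the broadside angle convention, and the function name `incident_angle` are assumptions for the example and may differ from the formula used elsewhere in this document.

```python
import math
from typing import Optional

SPEED_OF_SOUND_M_S = 343.0  # assumed value for air at about 20 degrees C


def incident_angle(
    phase_diff_rad: float,
    frequency_hz: float,
    mic_spacing_m: float,
    speed_of_sound: float = SPEED_OF_SOUND_M_S,
) -> Optional[float]:
    """Return the incident angle in radians (0 = broadside), or None if the
    phase difference is not physically consistent with a single plane wave."""
    if frequency_hz <= 0 or mic_spacing_m <= 0:
        raise ValueError("frequency and spacing must be positive")

    # Wrap the phase difference into the principal interval [-pi, pi].
    wrapped = math.atan2(math.sin(phase_diff_rad), math.cos(phase_diff_rad))

    sin_theta = speed_of_sound * wrapped / (2.0 * math.pi * frequency_hz * mic_spacing_m)
    if abs(sin_theta) > 1.0:
        return None  # no real angle satisfies the plane-wave model
    return math.asin(sin_theta)
```

Note that when the spacing d exceeds half a wavelength the phase difference wraps and the angle becomes ambiguous; the sketch does not attempt to resolve that case.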

As described above, according to the first embodiment, the incident angle θ is obtained for each frequency, and whether a sound originates inside the room can be determined from the agreement of the incident angles θ. Consequently, even when sound from outside is relatively loud, or when outdoor and indoor sounds have similar feature quantities, the two can be distinguished easily and reliably, and erroneous detection of outdoor sound is reliably suppressed. It therefore becomes possible to reliably detect, for example, the intrusion of a suspicious person into the room on the basis of indoor sound alone.
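One possible way to turn the two criteria (stability of θ over time at each frequency, and agreement of θ across frequencies) into a likelihood is sketched below. The scoring rule is an assumption made for illustration: the text does not specify how the criteria are combined. The function name `directionality_likelihood` and the 10-degree tolerance are hypothetical.

```python
import math
from statistics import median, pstdev
from typing import List, Sequence


def directionality_likelihood(
    angle_history: Sequence[Sequence[float]],
    tolerance_rad: float = math.radians(10.0),  # hypothetical tolerance
) -> List[float]:
    """angle_history[t][k] is the incident angle (rad) of bin k at frame t.

    Returns one likelihood in [0, 1] per frequency bin."""
    if not angle_history or not angle_history[0]:
        raise ValueError("at least one frame with one bin is required")

    latest = angle_history[-1]
    consensus = median(latest)  # direction most bins agree on in the latest frame

    likelihoods = []
    for k in range(len(latest)):
        track = [frame[k] for frame in angle_history]
        # Stability: small spread of the angle over time gives a score near 1.
        stability = max(0.0, 1.0 - pstdev(track) / tolerance_rad)
        # Agreement: small deviation from the consensus angle gives a score near 1.
        agreement = max(0.0, 1.0 - abs(latest[k] - consensus) / tolerance_rad)
        likelihoods.append(stability * agreement)
    return likelihoods
```

A bin whose angle jumps between frames, or whose angle disagrees with the other bins, receives a low likelihood and would then be attenuated or excluded by the sound detector.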

<2. Second Embodiment>
(1) Configuration Example of Imaging System
Next, a second embodiment of the present invention is described. FIG. 5 is a schematic diagram of an imaging system 200 according to the second embodiment. The imaging system 200 includes a tracking camera 210 that changes its shooting direction based on the sound detection result. Like the sound monitoring device 100 of the first embodiment, the imaging system 200 also includes the sound detector 120 and the sound directionality discriminator 130.

The tracking camera 210 includes an imaging optical system, an image sensor that photoelectrically converts the subject image formed by the imaging optical system, and a drive unit that changes the shooting direction by changing the orientation of the optical axis of the imaging optical system. Like the microphone module 110 of the first embodiment, the tracking camera 210 also has a plurality of microphones. As in the first embodiment, the sound information collected by the microphones is separated into sound pressure information and phase information and input to the sound detector 120 and the sound directionality discriminator 130. The sound detector 120 then outputs a detection result by the same method as in the first embodiment.

In the second embodiment, the detection result of the sound detector 120 is input to the tracking camera 210. This detection result includes the sound direction information (incident angle θ) and the sound directionality determination result sent from the sound directionality discriminator 130. Based on the input detection result, the tracking camera 210 drives the drive unit to point the taking lens at the person speaking in the room and captures an image of that speaker.

In the second embodiment as well, the voices of people outside the room are excluded by the sound detector 120. The tracking camera 210 determines the shooting direction based on the sound information (sound pressure) of sound originating in the room, the sound direction information (incident angle θ) of the indoor sound source, and the sound directionality. This prevents the tracking camera 210 from turning toward an outdoor sound source, so that it can be directed only at speakers inside the room.
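How the three inputs (sound pressure, incident angle θ, and directionality) could be combined into a single shooting direction is not detailed in the text. The sketch below shows one plausible choice, a weighted average of θ over the frequencies judged to be indoor, with weight equal to pressure times likelihood. The function name `choose_pan_angle`, the threshold, and the weighting are all assumptions.

```python
from typing import Optional, Sequence


def choose_pan_angle(
    angles_rad: Sequence[float],
    pressures: Sequence[float],
    likelihoods: Sequence[float],
    min_likelihood: float = 0.5,  # hypothetical: below this a bin is treated as outdoor
) -> Optional[float]:
    """Return a target pan angle in radians, or None if no indoor sound is present."""
    if not (len(angles_rad) == len(pressures) == len(likelihoods)):
        raise ValueError("all inputs need one value per frequency bin")

    weighted_sum = 0.0
    total_weight = 0.0
    for angle, pressure, likelihood in zip(angles_rad, pressures, likelihoods):
        if likelihood < min_likelihood:
            continue  # excluded: not judged to be an indoor source
        weight = pressure * likelihood
        weighted_sum += weight * angle
        total_weight += weight

    if total_weight == 0.0:
        return None  # nothing to track; the camera keeps its current direction
    return weighted_sum / total_weight
```

Returning None when every bin is rejected reflects the behavior described above: a loud outdoor source alone should not move the camera.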

Therefore, in a video conference, for example, the tracking camera 210 can be directed only at the indoor sound source (the speaker) even when there is a sound source outside the conference room.

As described above, according to the second embodiment, in the imaging system 200 including the tracking camera 210, the shooting direction of the tracking camera 210 can be determined based only on indoor sound sources. Consequently, in a video conference or similar setting, the tracking camera 210 is reliably prevented from turning toward an outdoor sound source.

Preferred embodiments of the present invention have been described in detail above with reference to the accompanying drawings, but the present invention is not limited to these examples. It is clear that a person with ordinary knowledge in the technical field to which the present invention belongs could conceive of various changes or modifications within the scope of the technical idea described in the claims, and these are naturally understood to fall within the technical scope of the present invention. For example, although each embodiment above applies the present invention to discriminating between outdoor and indoor sound sources, the present invention is broadly applicable to cases in which the directionality of a sound source is determined and only the required sound is extracted.

Description of Symbols
100 Sound monitoring device
110 Microphone module
112A, 112B Microphones
120 Sound detector
132 Sound direction information calculation unit
134 Sound directionality determination unit
114A, 114B Fourier transformers (FFT)
200 Imaging system
210 Tracking camera

Claims

A voice detection device comprising:
a plurality of microphones spaced apart from each other;
a sound direction information calculation unit that calculates an incident angle of sound on the microphones based on a phase difference between the voice information collected by the microphones;
a sound directionality determination unit that determines the directionality of the sound emitted from a sound source based on the incident angle; and
a voice detection unit that detects, from the voice information collected by the microphones, only voice information whose directionality is determined to be fixed.

The voice detection device according to claim 1, further comprising a Fourier transform unit that Fourier transforms the voice information collected by each of the plurality of microphones, wherein
the sound direction information calculation unit calculates the incident angle for each frequency based on the phase difference at each frequency obtained by the Fourier transform,
the sound directionality determination unit determines the directionality for each frequency, and
the voice detection unit detects, from the voice information collected by the microphones, only voice information at frequencies whose directionality is determined to be fixed.

The voice detection device according to claim 2, wherein the sound directionality determination unit determines whether or not the directionality is fixed based on a temporal change in the incident angle of sound at a specific frequency.

The voice detection device according to claim 2, wherein the sound directionality determination unit determines whether or not the directionality is fixed based on the degree of agreement of the incident angles of sound at a plurality of frequencies.

A voice detection method comprising:
a sound direction information calculation step of calculating an incident angle of sound on a plurality of microphones spaced apart from each other, based on a phase difference between the voice information collected by the microphones;
a sound directionality determination step of determining the directionality of the sound emitted from a sound source based on the incident angle; and
a voice detection step of detecting, from the voice information collected by the microphones, only voice information whose directionality is determined to be fixed.

The voice detection method according to claim 5, further comprising a step of Fourier transforming the voice information collected by each of the plurality of microphones, wherein
in the sound direction information calculation step, the incident angle is calculated for each frequency based on the phase difference at each frequency obtained by the Fourier transform,
in the sound directionality determination step, the directionality is determined for each frequency, and
in the voice detection step, only voice information at frequencies whose directionality is determined to be fixed is detected from the voice information collected by the microphones.

The voice detection method according to claim 6, wherein in the sound directionality determination step, whether or not the directionality is fixed is determined based on a temporal change in the incident angle of sound at a specific frequency.

The voice detection method according to claim 6, wherein in the sound directionality determination step, whether or not the directionality is fixed is determined based on the degree of agreement of the incident angles of sound at a plurality of frequencies.

An imaging system comprising:
a plurality of microphones spaced apart from each other;
a sound direction information calculation unit that calculates an incident angle of sound on the microphones based on a phase difference between the voice information collected by the microphones;
a sound directionality determination unit that determines the directionality of the sound emitted from a sound source based on the incident angle;
a voice detection unit that detects, from the voice information collected by the microphones, only voice information whose directionality is determined to be fixed; and
a tracking camera that changes its shooting direction based on the detection result of the voice detection unit.