JP2011232238A

JP2011232238A - Sound source direction estimation device

Info

Publication number: JP2011232238A
Application number: JP2010103971A
Authority: JP
Inventors: Toshiharu Kamijo; 利治上條; Nobuhiro Nishikawa; 信宏西川
Original assignee: Nidec Copal Corp
Current assignee: Nidec Copal Corp
Priority date: 2010-04-28
Filing date: 2010-04-28
Publication date: 2011-11-17

Abstract

PROBLEM TO BE SOLVED: To provide a sound source direction estimation device which can estimate a direction of a sound source even when a non-directional microphone is used.

SOLUTION: In a sound source direction estimation device 20, a sound generated by a sound source located outside a first housing 10a is collected in the housing 10a through each of apertures 11 to 14. Each of microphones 1 to 4 embedded in the first housing 10a converts the sound propagated into the first housing 10a to an electric signal. A high-pass filter (HPF) 21 passes a frequency component of 2.5 kHz or higher out of the electric signal outputted by each of the microphones 1 to 4. The frequency component of a bandwidth having a high correlation with an angle of the position of the sound source is extracted. A sound source direction calculation unit 23 estimates the direction of the sound source based on the frequency component having passed through the HPF 21. Accordingly, even when the microphones 1 to 4 are non-directional, the sound source direction estimation device 20 can estimate the direction of the sound source.

Description

The present invention relates to a sound source direction estimation device.

Conventionally, as disclosed in Patent Document 1 below, an electronic device is known that incorporates a directional microphone capable of collecting sound on both its front side and its back side. In this device, the microphone is held so that its front surface abuts the inner surface of a front cabinet (housing) in which a first aperture is formed. On the back side of the microphone, the front cabinet and a microphone cover form a space, and a second aperture is formed in the part of the front cabinet that faces this space.

With this configuration, the first aperture connects the outside of the housing with the front side of the microphone, and the second aperture connects the outside of the housing with the space on the back side of the microphone. Sound generated outside the front cabinet is collected on the front side of the microphone through the first aperture and on the back side through the second aperture, which increases the directivity of the microphone.

Patent Document 1: Japanese Patent Laid-Open No. 10-56681

However, because the above device, which uses a directional microphone, collects sound on both the front and back sides of the microphone, a space must be formed on the back side of the microphone, which imposes a significant structural constraint. To secure that space, either the housing must be enlarged or the microphone must be installed at a position protruding from the housing. When the device is applied, for example, to a camera that turns toward a sound source, as used in a video conference system, it is therefore difficult to make the camera compact. To ease this structural constraint, a technique has been desired that can estimate the direction of a sound source even when an omnidirectional microphone is used.

An object of the present invention is to provide a sound source direction estimation device that can estimate the direction of a sound source even when an omnidirectional microphone is used.

A sound source direction estimation device according to the present invention comprises: a housing in which an aperture is formed for taking in sound generated by an external sound source; a microphone that is embedded inside the aperture of the housing and that converts sound into an electric signal as its diaphragm is vibrated by sound propagating into the housing through the aperture; a high-pass filter that passes those frequency components of the electric signal output from the microphone that are at or above a predetermined frequency within the audible range; and a sound source direction estimation unit that estimates the direction of the sound source based on the frequency components that have passed through the high-pass filter.

With the sound source direction estimation device according to the present invention, sound generated by a sound source outside the housing is taken into the housing through the aperture, and the microphone embedded in the housing converts the sound that has propagated into the housing into an electric signal. Sound generated at a position offset by some angle from the line connecting the aperture and the microphone is diffracted near the aperture before it reaches the microphone. Diffraction attenuates the sound, and the relationship between the angle of the sound source and the amount of attenuation depends on the frequency of the sound. For low-frequency sound, the attenuation stays small even as the angle increases; for high-frequency sound, the attenuation grows as the angle increases. Passing the components of the microphone's electric signal at or above a predetermined frequency through the high-pass filter therefore extracts the band whose level correlates strongly with the angle of the sound source.
Because the sound source direction estimation unit estimates the direction of the sound source from the frequency components that have passed through the high-pass filter, the direction can be estimated even if the microphone itself is omnidirectional. Furthermore, the microphone only needs to collect sound on its front side, which faces the aperture, so no sound-collecting space is needed behind the microphone. The space inside the housing can be used for wiring and other components, and the housing can be made compact as a result. In addition, because the microphone is embedded in the housing, the appearance is better than when the microphone is installed at a position protruding from the housing.

Preferably, the housing is provided with a tubular body that forms a propagation path extending from the aperture into the housing, the microphone is arranged at the end of the tubular body, and the cross section of the propagation path is larger than the diaphragm of the microphone.

In this case, because the tubular body is provided, sound travels from the aperture to the microphone through a propagation path whose cross section is larger than the diaphragm of the microphone. The sound is therefore reliably transmitted to the diaphragm, and the direction of the sound source can be estimated with even higher accuracy.

It is also preferable that the shortest distance from the aperture to the microphone be larger than the width of the aperture.

If the shortest distance from the aperture to the microphone is smaller than the width of the aperture, high-frequency sound may reach the microphone directly without being diffracted. If the shortest distance is larger than the width of the aperture, high-frequency sound is prevented from reaching the microphone directly, and sound attenuated by diffraction near the aperture reaches the microphone instead, so the direction estimation effect described above is obtained more fully.

According to the present invention, the direction of a sound source can be estimated even when an omnidirectional microphone is used.

FIG. 1 is a perspective view of a video conference camera to which a sound source direction estimation device according to an embodiment of the present invention is applied.

FIG. 2(a) is a cross-sectional view of the video conference camera of FIG. 1 at a position that includes the apertures, and FIG. 2(b) is a side view seen from the axial direction of the tubular body in FIG. 2(a).

FIG. 3 is a block diagram showing the functional configuration of the video conference camera of FIG. 1.

FIG. 4 is a schematic diagram showing the angle that the direction of the sound source makes with the axis of the tubular body.

FIG. 5(a) is a diagram showing the position of the sound source relative to the video conference camera, and FIG. 5(b) is a diagram showing the output characteristics of the microphones in the case of FIG. 5(a).

A sound source direction estimation device according to an embodiment of the present invention is described below with reference to the drawings. The description covers the case where the device is applied to a video conference camera.

As shown in FIGS. 1 to 3, video conference camera A is installed, for example, at a position surrounded by multiple participants in a video conference. It estimates the direction of the participant who is speaking (hereinafter, the "speaker") and turns camera 35 in that direction to capture video of the speaker. Video conference camera A can communicate with the other party's video conference system over a network.

Video conference camera A has a resin first housing 10a, which is roughly cylindrical with a diameter of about 10 cm and is placed on a table or the like, and a resin second housing 10b, which is roughly a rectangular parallelepiped. The second housing 10b is attached to the top of the first housing 10a via a shaft (not shown) arranged vertically along the axis of the first housing 10a, and can rotate about that shaft relative to the first housing 10a. Camera 35 is housed in the second housing 10b.

Four circular apertures 11 to 14 are formed in the side surface of the first housing 10a at the same height, spaced 90° apart in the circumferential direction. Each of the apertures 11 to 14 is a hole for taking the voice of a speaker near video conference camera A into the first housing 10a, and has an inner diameter of about 8 mm.

Four cylindrical tubular bodies 16 to 19 are fixed to the inner surface of the first housing 10a, each extending in the radial direction of the first housing 10a from one of the apertures 11 to 14 into the first housing 10a. The inner diameter of each tubular body 16 to 19 is about 8 mm, and the inner peripheral surface of each aperture 11 to 14 is substantially flush with the inner wall surface of the corresponding tubular body. Each tubular body 16 to 19 forms inside it a cylindrical propagation path B1 to B4 with a diameter of about 8 mm. That is, each propagation path B1 to B4 extends horizontally in the radial direction of the first housing 10a from the corresponding aperture 11 to 14 into the first housing 10a. Note that the apertures 11 to 14 refer to the holes formed in the outer wall of the first housing 10a; they do not include the tubular bodies 16 to 19 or the propagation paths B1 to B4.

Furthermore, four cylindrical microphones 1 to 4 are embedded at the ends of the tubular bodies 16 to 19 on the center side of the first housing 10a, each with its front surface parallel to the corresponding aperture 11 to 14. Each of the microphones 1 to 4 is, for example, an omnidirectional electret condenser microphone about 7 mm in diameter and about 4 mm thick that collects sound only on its front side. Each microphone contains a diaphragm 1a to 4a, which is circular and about 5 mm in diameter. The shortest distance from each aperture 11 to 14 to the front surface of the corresponding microphone is about 10 mm, which is longer than the inner diameter of the aperture. In addition, the inner diameter (cross section) of the propagation paths B1 to B4 is larger than the diaphragms 1a to 4a (see FIG. 2), so sound passing through the propagation paths B1 to B4 is reliably transmitted to the diaphragms 1a to 4a. A spacer (not shown) is preferably placed between each microphone 1 to 4 and the inner wall surface of the corresponding tubular body 16 to 19.

With this configuration, in video conference camera A, the voice of a speaker in the conference enters the microphones 1 to 4 through the apertures 11 to 14 and the propagation paths B1 to B4. The incoming sound vibrates the diaphragms 1a to 4a, the sound is converted into an electric signal according to this vibration, and the electric signal is output from the microphones 1 to 4 to amplifiers 5 to 8 (see FIG. 3). The structure is such that sound from outside the first housing 10a does not propagate into the internal space S behind the microphones 1 to 4.

As shown in FIG. 3, video conference camera A has a function of estimating the direction of the speaker by applying predetermined processing to the sound input to the microphones 1 to 4. Specifically, video conference camera A includes: four amplifiers 5 to 8 that receive and amplify the electric signals output from the microphones 1 to 4; a high-pass filter (hereinafter, "HPF") 21 that passes the frequency components of the amplified electric signals that are at or above a predetermined frequency; an AD conversion unit 22 that performs analog-to-digital conversion on the frequency components that have passed through the HPF 21; and a sound source direction calculation unit (sound source direction estimation unit) 23, a histogram registration unit 24, and a sound source direction recalculation unit 25 that estimate the direction of the sound source based on the frequency components digitized by the AD conversion unit 22.

The amplifiers 5 to 8 are connected to the microphones 1 to 4, respectively, and amplify the electric signals output from the microphones with an amplifier circuit. Each amplifier 5 to 8 may have an adjustment knob for adjusting the gain. Each amplifier outputs the amplified electric signal to the HPF 21.

The HPF 21 receives the electric signals output from the amplifiers 5 to 8 and, for example, blocks frequency components below 2.5 kHz and passes frequency components at 2.5 kHz or higher. The cutoff frequency of the HPF 21 is a predetermined frequency within the audible range. The HPF 21 may have a function of identifying which of the amplifiers 5 to 8 a frequency component came from, or a separate HPF 21 may be provided for each of the amplifiers 5 to 8.
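
The effect of a 2.5 kHz cutoff can be illustrated with a minimal digital sketch. In the embodiment the HPF 21 sits before the AD conversion unit 22, so it operates on the analog signal; the first-order discrete-time filter below is only an illustration, and the function names, the 48 kHz sample rate, and the first-order design are assumptions, not details from the patent.

```python
import math


def high_pass(samples, sample_rate_hz, cutoff_hz=2500.0):
    """First-order RC-style high-pass filter (discrete-time approximation).

    cutoff_hz defaults to the 2.5 kHz example value; the filter order and
    the digital form are assumptions made for illustration.
    """
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    dt = 1.0 / sample_rate_hz
    alpha = rc / (rc + dt)
    filtered = []
    prev_in = 0.0
    prev_out = 0.0
    for x in samples:
        # y[n] = alpha * (y[n-1] + x[n] - x[n-1])
        y = alpha * (prev_out + x - prev_in)
        filtered.append(y)
        prev_in, prev_out = x, y
    return filtered


def rms(samples):
    """Root-mean-square level, used here as the 'output value' of a channel."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def tone(freq_hz, sample_rate_hz, n):
    """Unit-amplitude sine test tone with n samples."""
    return [math.sin(2.0 * math.pi * freq_hz * i / sample_rate_hz) for i in range(n)]
```

Feeding a 500 Hz tone and an 8 kHz tone through `high_pass` at the assumed 48 kHz sample rate leaves the 8 kHz tone largely intact and strongly reduces the 500 Hz tone, which is the behaviour the cutoff is chosen for.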

The AD conversion unit 22 converts the frequency components that have passed through the HPF 21 from analog signals to digital signals and outputs the digitized frequency components to the sound source direction calculation unit 23. The AD conversion unit 22 may have a function of identifying which of the amplifiers 5 to 8 a frequency component came from. Alternatively, when a separate HPF 21 is provided for each of the amplifiers 5 to 8, a separate AD conversion unit 22 may be provided for each amplifier as well.

The sound source direction calculation unit 23, the histogram registration unit 24, and the sound source direction recalculation unit 25 are implemented by, for example, a CPU (Central Processing Unit), ROM (Read Only Memory), and RAM (Random Access Memory). Together they estimate the direction of the speaker by calculating it from the frequency components output by the AD conversion unit 22.

Specifically, the sound source direction calculation unit 23 calculates the direction of the sound source that produced the voice by comparing the output values (loudness) of the frequency components corresponding to the microphones 1 to 4, based on the frequency components output from the AD conversion unit 22. A known technique can be used for this calculation. The direction of the sound source relative to video conference camera A is expressed as an angle (0° to 360°) measured from a predetermined circumferential position on the first housing 10a (for example, the center of the aperture 11), with clockwise as viewed from above the first housing 10a taken as positive. The sound source direction calculation unit 23 performs this calculation at predetermined intervals, for example every 40 milliseconds.
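
The patent leaves the comparison method to known techniques. One simple possibility, shown here purely as a sketch, is a level-weighted vector sum over the four aperture bearings. The bearings assigned to the microphones 1 to 4 (0°, 90°, 180°, 270° clockwise from the aperture 11), the function name, and the choice of a vector sum are all assumptions for illustration.

```python
import math

# Assumed bearings of the apertures for microphones 1-4, in degrees,
# clockwise from aperture 11 as seen from above (hypothetical mapping).
MIC_BEARINGS_DEG = (0.0, 90.0, 180.0, 270.0)


def estimate_bearing(levels):
    """Estimate the source bearing from four high-passed output levels.

    Each microphone contributes a vector along its aperture bearing,
    scaled by its level; the bearing of the vector sum is returned in
    the range 0-360 degrees. Returns None when the levels are balanced
    and no direction stands out.
    """
    x = sum(lv * math.cos(math.radians(b)) for lv, b in zip(levels, MIC_BEARINGS_DEG))
    y = sum(lv * math.sin(math.radians(b)) for lv, b in zip(levels, MIC_BEARINGS_DEG))
    if math.hypot(x, y) < 1e-9 * max(sum(levels), 1e-12):
        return None
    return math.degrees(math.atan2(y, x)) % 360.0
```

With equal levels on microphones 1 and 2 and lower, equal levels on microphones 3 and 4, the vector sum points midway between the first two apertures, which matches the geometry of the 45° case described for FIG. 5.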

In video conference camera A of this embodiment, the microphones 1 to 4 are embedded at positions set back a predetermined length (for example, about 10 mm) from the apertures 11 to 14 into the first housing 10a, so the higher the frequency of a sound, the more it is attenuated by diffraction before reaching the microphones 1 to 4. For example, as shown in FIG. 4, if the direction of the sound source deviates by an angle θ from the line connecting the center of the microphone 1 and the center of the aperture 11 (that is, the axis of the tubular body 16), high-frequency sound reaches the microphone 1 attenuated by an amount that corresponds to θ. Because the HPF 21 passes the frequency components at 2.5 kHz or higher, output values that reflect this angle dependence are input to the sound source direction calculation unit 23, and comparing the output values for the microphones 1 to 4 allows the unit to calculate the direction of the sound source accurately. The sound source direction calculation unit 23 outputs the direction calculated at each interval to the histogram registration unit 24.

The histogram registration unit 24 receives the sound source directions output from the sound source direction calculation unit 23 and stores them sequentially. When a direction is stored, the newest data always overwrites the oldest data. The histogram registration unit 24 then classifies the stored directions into 18 angle ranges of 20° each and generates a histogram from the result. The histogram registration unit 24 outputs the generated histogram to the sound source direction recalculation unit 25.
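
The storage and classification described above can be sketched with a fixed-length buffer and 18 bins. The class name, the method names, and the buffer capacity are assumptions; the patent does not state how many past directions are kept.

```python
from collections import deque

BIN_WIDTH_DEG = 20
NUM_BINS = 360 // BIN_WIDTH_DEG  # 18 angle ranges


class DirectionHistogram:
    """Keeps the most recent bearings and bins them in 20-degree steps."""

    def __init__(self, capacity=50):
        # capacity is a hypothetical value. A deque with maxlen drops the
        # oldest entry when a new one is appended to a full buffer.
        self._recent = deque(maxlen=capacity)

    def register(self, bearing_deg):
        """Store one bearing, normalised to the 0-360 degree range."""
        self._recent.append(bearing_deg % 360.0)

    def counts(self):
        """Return the 18-bin histogram of the stored bearings."""
        counts = [0] * NUM_BINS
        for bearing in self._recent:
            counts[int(bearing // BIN_WIDTH_DEG) % NUM_BINS] += 1
        return counts
```

Using `deque(maxlen=...)` gives the overwrite-oldest behaviour without explicit index bookkeeping; a hand-written ring buffer would work equally well on an embedded CPU.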

The sound source direction recalculation unit 25 receives the histogram output from the histogram registration unit 24 and recalculates the direction of the sound source by obtaining an average over the angle ranges, based on the count in the angle range with the highest count and the counts in the neighboring angle ranges. The sound source direction recalculation unit 25 outputs the recalculated direction to the motor control unit 31 as the direction of the speaker.
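
One way to read this recalculation is as a count-weighted average of bin centers around the peak bin. The sketch below follows that reading; the use of bin centers, the number of neighboring bins, and the function name are assumptions, since the patent does not give the exact formula.

```python
def recompute_bearing(counts, bin_width_deg=20.0, neighbours=1):
    """Count-weighted mean bearing around the fullest histogram bin.

    counts: histogram over equal angle ranges covering 0-360 degrees.
    neighbours: how many bins on each side of the peak to include
    (an assumed parameter). Returns None for an empty histogram.
    """
    n = len(counts)
    peak = max(range(n), key=lambda i: counts[i])
    if counts[peak] == 0:
        return None
    total = 0
    weighted = 0.0
    for offset in range(-neighbours, neighbours + 1):
        count = counts[(peak + offset) % n]
        # The bin centre is left unwrapped (it may be negative or above
        # 360) so that averaging across the 0/360 boundary stays correct.
        centre = (peak + offset + 0.5) * bin_width_deg
        weighted += count * centre
        total += count
    return (weighted / total) % 360.0
```

For a histogram with counts 2, 6, and 2 in the ranges 20° to 40°, 40° to 60°, and 60° to 80°, the weighted mean of the centers 30°, 50°, and 70° is 50°.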

In this way, the sound source direction calculation unit 23, the histogram registration unit 24, and the sound source direction recalculation unit 25 calculate the direction of the sound source, and the direction of the speaker is estimated. As shown in FIGS. 1 and 3, in video conference camera A the sound source direction estimation device 20 is made up of the first housing 10a, the apertures 11 to 14, the tubular bodies 16 to 19, the microphones 1 to 4, the amplifiers 5 to 8, the HPF 21, the AD conversion unit 22, the sound source direction calculation unit 23, the histogram registration unit 24, and the sound source direction recalculation unit 25.

Furthermore, as shown in FIG. 3, video conference camera A has a function of turning the camera 35 toward the direction of the sound source estimated by the sound source direction estimation device 20 and capturing video of the speaker with the camera 35. Specifically, video conference camera A includes: a motor control unit 31 that controls a motor drive unit 32 based on the direction output from the sound source direction recalculation unit 25; the motor drive unit 32, which drives a motor 33 under the control of the motor control unit 31; the motor 33, which is driven by the motor drive unit 32 and applies torque to a camera rotation mechanism 34; the camera rotation mechanism 34, which is coupled to the motor 33 and, by rotating, turns the camera 35 together with the second housing 10b; the camera 35, which captures video; a video processing unit 36 that processes the video data captured by the camera 35; a data transfer unit 37 that transfers the video data processed by the video processing unit 36; and a network communication unit 38 that sends the video data transferred by the data transfer unit 37 to the network.

Next, the operation of video conference camera A is described. The description uses as an example the case shown in FIG. 5(a), in which a voice comes from a speaker located at 45°, equidistant from the aperture 11 and the aperture 12 and on the side opposite the apertures 13 and 14. As shown in FIG. 5(b), the output values of the frequency components for the microphones 1 to 4 differ in peak value between 2.5 kHz and 10 kHz. More specifically, the output values between 2.5 kHz and 10 kHz are about equal for the microphones 1 and 2, while those for the microphones 3 and 4 are attenuated relative to the microphones 1 and 2. In contrast, the output values below 2.5 kHz are roughly equal across the microphones 1 to 4.

In the sound source direction estimation device 20, the HPF 21 blocks frequency components below 2.5 kHz and passes frequency components at 2.5 kHz or higher. In the example of FIG. 5(b), the direction of the sound source is calculated from the output values between 2.5 kHz and 10 kHz, which are strongly affected by diffraction, and the sound source is estimated to be at 45°. The sound source direction estimation device 20 outputs a signal indicating 45° to the motor control unit 31.

The motor control unit 31 receives the signal indicating 45° output from the sound source direction estimation device 20, controls the motor drive unit 32 to drive the motor 33, and has the camera rotation mechanism 34 turn the camera 35 together with the second housing 10b so that the camera 35 faces the 45° direction. The turning control of the camera 35 by the motor control unit 31 may instead be performed only when sound source directions within a certain range have been input continuously for a predetermined time, using the average of those directions.
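
The optional gating described in the last sentence can be sketched as follows. The window length, the tolerance, the choice of the latest bearing as the reference, and the function name are assumptions; the patent states only that directions within a certain range must persist for a predetermined time and that their average is used.

```python
def stable_target(bearings, window=25, tolerance_deg=20.0):
    """Return a turn target only when recent bearings agree.

    bearings: sequence of estimated bearings in degrees, oldest first.
    window: number of consecutive estimates that must agree (25 samples
    at one estimate per 40 ms is about one second; an assumed value).
    tolerance_deg: allowed deviation from the latest bearing (assumed).
    Returns the mean bearing of the window, or None if the window is
    not yet full or the bearings are too scattered.
    """
    if len(bearings) < window:
        return None
    recent = list(bearings)[-window:]
    reference = recent[-1]
    # Signed deviation from the reference in the range [-180, 180),
    # so that bearings near 0 and 360 degrees are treated as neighbours.
    deltas = [((b - reference + 180.0) % 360.0) - 180.0 for b in recent]
    if max(abs(d) for d in deltas) > tolerance_deg:
        return None
    return (reference + sum(deltas) / len(deltas)) % 360.0
```

Gating of this kind keeps the camera from swinging toward brief noises, at the cost of a short delay before it follows a new speaker.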

The camera 35 captures video of the scene in front of it, generates video data representing that video, and outputs the video data to the video processing unit 36. The video processing unit 36 receives the video data output from the camera 35, applies video processing to it, and outputs the processed video data to the data transfer unit 37. The data transfer unit 37 receives the video data output from the video processing unit 36 and transfers it to the network communication unit 38. The network communication unit 38 then sends the video data transferred from the data transfer unit 37 to the other party's video conference system over the network.

According to the sound source direction estimating device 20 of the present embodiment, sound generated by a sound source outside the first housing 10a is taken into the first housing 10a through the openings 11 to 14, and the microphones 1 to 4 embedded in the first housing 10a convert the sound propagated into the first housing 10a into electric signals. Of the electric signals output from the microphones 1 to 4, the frequency components of 2.5 kHz or higher pass through the HPF 21, which extracts the frequency components of a band that correlates strongly with the angle at which the sound source is located. The sound source direction calculation unit 23 then estimates the direction of the sound source based on the frequency components that have passed through the HPF 21, so the direction of the sound source can be estimated even if the microphones 1 to 4 are omnidirectional. Furthermore, since each of the microphones 1 to 4 only needs to collect sound on its front side, which faces the corresponding opening 11 to 14, there is no need to provide a sound-collecting space on the back side of each microphone, and the internal space S of the first housing 10a can be used for wiring and the placement of other components. As a result, the first housing 10a can be made compact. Moreover, since the microphones 1 to 4 are embedded in the first housing 10a, the external appearance is improved compared with installing microphones at positions protruding from the housing.
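
The signal flow described above (microphones 1 to 4, HPF 21, sound source direction calculation unit 23) can be illustrated with a minimal sketch. The 2.5 kHz cutoff comes from the description; everything else is an assumption made for illustration only: the first-order filter, the 48 kHz sample rate, the opening azimuths passed in by the caller, and in particular the level-weighted vector sum used to combine the channels, which is a stand-in and not the calculation the patent specifies for unit 23. The function names are hypothetical.

```python
import math


def high_pass(samples, cutoff_hz, sample_rate_hz):
    """First-order RC high-pass filter (an assumed stand-in for HPF 21)."""
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    dt = 1.0 / sample_rate_hz
    alpha = rc / (rc + dt)
    out = []
    prev_in = 0.0
    prev_out = 0.0
    for x in samples:
        y = alpha * (prev_out + x - prev_in)
        out.append(y)
        prev_in, prev_out = x, y
    return out


def rms(samples):
    """Root-mean-square level of a block of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def estimate_direction_deg(channels, mic_azimuths_deg,
                           cutoff_hz=2500.0, sample_rate_hz=48000.0):
    """Estimate a source azimuth from the high-band level of each channel.

    Assumption: each microphone's high-band level is treated as a weight on
    the azimuth of its opening, and the weighted unit vectors are summed.
    """
    levels = [rms(high_pass(ch, cutoff_hz, sample_rate_hz)) for ch in channels]
    x = sum(lv * math.cos(math.radians(az))
            for lv, az in zip(levels, mic_azimuths_deg))
    y = sum(lv * math.sin(math.radians(az))
            for lv, az in zip(levels, mic_azimuths_deg))
    return math.degrees(math.atan2(y, x)) % 360.0
```

With four channels whose openings are assumed to face 0, 90, 180 and 270 degrees, a source that is loudest in the high band at the second microphone yields an estimate near 90 degrees. Components well below the cutoff contribute little to the levels, which is the role the description assigns to the HPF 21.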

Furthermore, because the cylindrical bodies 16 to 19 are provided, sound travels from the openings 11 to 14 to the microphones 1 to 4 through the propagation paths B1 to B4, each of which has a larger cross section than the diaphragms 1a to 4a of the microphones 1 to 4. The sound is therefore reliably transmitted to the diaphragms 1a to 4a, and the direction of the sound source can be estimated with even higher accuracy.

In addition, since the shortest distance from each of the openings 11 to 14 to the corresponding microphone 1 to 4 is larger than the inner diameter of that opening, high-frequency sound is prevented from reaching the microphones 1 to 4 directly. Instead, sound attenuated by diffraction near the openings 11 to 14 reaches the microphones 1 to 4, so the direction-estimating effect described above is obtained more favorably.

A preferred embodiment of the present invention has been described above, but the present invention is not limited to that embodiment.

For example, the above embodiment describes the case where the cylindrical bodies 16 to 19 are arranged in the radial direction of the first housing 10a and the propagation paths B1 to B4 extend horizontally. However, the cylindrical bodies 16 to 19 may instead be arranged so as to incline upward or downward from the center side of the first housing 10a toward the openings 11 to 14. With such a configuration, the speaker's voice can be taken into the first housing 10a effectively even when the video conference camera is installed lower or higher than the heads of the conference participants.

The above embodiment also describes the cylindrical bodies 16 to 19 as having a circular cylindrical shape, but they may have a rectangular tube shape, in which case each propagation path has a prismatic shape. Likewise, the microphones 1 to 4 are described as cylindrical, but they may be rectangular parallelepipeds.

The above embodiment describes the case where four microphones 1 to 4 are provided, but the number of microphones may be one to three, or five or more. For example, when the loudness of the sound generated around the sound source direction estimating device is known in advance, the direction of the sound source can be estimated with a single microphone, based on the output value of the frequency components at or above a predetermined frequency.
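
The single-microphone case can be sketched as a table lookup: if the on-axis high-band level is known in advance, the drop in measured level indicates how far off axis the source is. The calibration table below is entirely hypothetical (the patent gives no such figures here), as are the function and parameter names. The sketch only illustrates the idea of mapping an output value to an angle by linear interpolation.

```python
# Hypothetical calibration: (angle off the opening's axis in degrees,
# high-band level relative to the on-axis level in dB). These values are
# invented for illustration and would have to be measured for a real housing.
CALIBRATION = [(0.0, 0.0), (30.0, -3.0), (60.0, -8.0), (90.0, -14.0)]


def angle_from_level(measured_db, expected_on_axis_db, calibration=CALIBRATION):
    """Map a measured high-band level to an off-axis angle.

    Assumes the level falls monotonically as the source moves off axis, and
    that expected_on_axis_db (the level a source would produce on axis) is
    known in advance, as the description requires for the single-mic case.
    """
    relative_db = measured_db - expected_on_axis_db
    if relative_db >= calibration[0][1]:
        return calibration[0][0]   # at least as loud as on-axis: clamp to 0
    if relative_db <= calibration[-1][1]:
        return calibration[-1][0]  # quieter than the last entry: clamp
    for (a0, d0), (a1, d1) in zip(calibration, calibration[1:]):
        if d1 <= relative_db <= d0:
            t = (d0 - relative_db) / (d0 - d1)
            return a0 + t * (a1 - a0)
```

Under these assumptions the result is only the angle away from the opening's axis. Which side of the axis the source lies on would have to come from other information.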

The cutoff frequency of the HPF 21 can be set as appropriate within the audible range according to the characteristics of the microphones. In addition to the HPF 21, a low-pass filter may be provided that passes, from among the frequency components that have passed through the HPF 21, those at or below a frequency higher than the cutoff frequency of the HPF 21 (for example, 7 kHz or lower). In this case, the HPF 21 and the low-pass filter together constitute a band-pass filter.
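
The way a high-pass filter and a low-pass filter combine into a band-pass filter can be checked numerically. The sketch below assumes ideal first-order analog responses for both filters, which the patent does not specify. The 2.5 kHz and 7 kHz defaults are the values mentioned in the description, and the function names are hypothetical.

```python
import math


def high_pass_gain(freq_hz, cutoff_hz):
    """Magnitude response of a first-order high-pass filter."""
    ratio = freq_hz / cutoff_hz
    return ratio / math.sqrt(1.0 + ratio * ratio)


def low_pass_gain(freq_hz, cutoff_hz):
    """Magnitude response of a first-order low-pass filter."""
    ratio = freq_hz / cutoff_hz
    return 1.0 / math.sqrt(1.0 + ratio * ratio)


def band_pass_gain(freq_hz, hpf_cutoff_hz=2500.0, lpf_cutoff_hz=7000.0):
    """Gain of the cascade: the product of the two individual gains.

    The low-pass cutoff must lie above the high-pass cutoff, otherwise the
    two filters leave no pass band between them.
    """
    if lpf_cutoff_hz <= hpf_cutoff_hz:
        raise ValueError("low-pass cutoff must be higher than high-pass cutoff")
    return (high_pass_gain(freq_hz, hpf_cutoff_hz)
            * low_pass_gain(freq_hz, lpf_cutoff_hz))
```

Evaluating `band_pass_gain` at a frequency between the two cutoffs gives a larger value than at frequencies far below 2.5 kHz or far above 7 kHz, which is the band-pass behaviour the paragraph describes.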

Furthermore, the above embodiment describes the case where the sound source direction estimating device 20 is applied to the video conference camera A, but the invention is not limited to this and can be applied to any device that has a microphone, estimates the direction of a sound source, and uses the estimated direction. For example, the sound source direction estimating device of the present invention can also be applied to a surveillance camera or the like that detects an abnormal sound and films in the direction of the sound source where the abnormal sound occurred.

DESCRIPTION OF SYMBOLS: 1 to 4 ... microphone; 1a to 4a ... diaphragm; 10a ... housing; 11 to 14 ... opening; 16 to 19 ... cylindrical body; 20 ... sound source direction estimating device; 21 ... HPF; 23 ... sound source direction calculation unit (sound source direction estimation unit); B1 to B4 ... propagation path.

Claims

1. A sound source direction estimating apparatus comprising:
a housing in which an opening is formed for taking in sound generated by an external sound source;
a microphone that is embedded in the housing on the inner side of the opening, whose diaphragm is vibrated by the sound propagating into the housing through the opening, and that converts the sound into an electrical signal;
a high-pass filter that passes, of the electrical signal output from the microphone, frequency components at or above a predetermined frequency within the audible range; and
a sound source direction estimation unit that estimates the direction of the sound source based on the frequency components that have passed through the high-pass filter.

2. The sound source direction estimating apparatus according to claim 1, wherein
the housing is provided with a cylindrical body that forms a propagation path extending from the opening toward the inside of the housing,
the microphone is disposed at an end of the cylindrical body, and
a cross section of the propagation path is larger than the diaphragm of the microphone.

3. The sound source direction estimating apparatus according to claim 1, wherein a shortest distance from the opening to the microphone is larger than a width of the opening.