JP2007312181A

JP2007312181A - Imaging sound pickup signal reproduction system

Info

Publication number: JP2007312181A
Application number: JP2006140039A
Authority: JP
Inventors: Katsumi Hasegawa; 勝巳長谷川
Original assignee: Victor Company of Japan Ltd
Current assignee: Victor Company of Japan Ltd
Priority date: 2006-05-19
Filing date: 2006-05-19
Publication date: 2007-11-29

Abstract

PROBLEM TO BE SOLVED: To provide an imaging sound pickup signal reproduction system that lets a viewer reliably recognize and judge an abnormality and that switches the output video to the monitoring point where the abnormality occurred.

SOLUTION: An imaging sound pickup device 1 comprises an imaging sound pickup part 11, which has a cylindrical part 116 carrying a plurality of cameras 113a-113d and a plurality of microphones 112a-112h that pick up binaural sound, and a selection part 12, which outputs, as a selected sound signal, the sound signal accompanying whichever of the video signals obtained from the cameras is set as the main video signal. A playback device 3 has a video signal output part 33, which sets the main video signal to be output to a display means 42, and a crosstalk canceler 34, which processes the selected sound signal so that crosstalk is canceled, using a filter that stores a head-related transfer function measured with a cylindrical sound receiving object 64, and outputs the result to right and left speakers.

COPYRIGHT: (C)2008, JPO&INPIT

Description

The present invention relates to an imaging sound pickup signal reproduction system that has a plurality of imaging cameras for shooting a plurality of subjects and a plurality of binaural sound pickup microphones that pick up binaural sound with the shooting direction of each imaging camera as the front, and that reproduces the binaurally picked-up audio signal through left and right speakers placed in front of the viewer.

In recent years the performance of surveillance cameras has improved, and surveillance systems have come to be used in many places, for example to monitor factory equipment, the cash register area of a convenience store, or a sales floor.
A single surveillance camera can monitor effectively, for example over a wide viewing angle, by panning around its installation position. When the sound field of the monitored area is picked up binaurally, the area can be monitored effectively with high-quality surveillance video together with a reproduced sound field that conveys a strong sense of presence. Further, when an abnormality is found during monitoring, it should be possible to switch the surveillance camera immediately to the subject considered abnormal. It is also desirable that the pickup direction of the binaural sound can be switched as needed when the surveillance camera is switched.

Patent Document 1 discloses a multipoint remote monitoring method that makes it easier to identify the monitoring area in which an abnormality has occurred while a plurality of surveillance cameras monitor a plurality of targets. A virtual sound space with a planar spread corresponding to the monitoring points A, B, and C of the monitored site is defined over the 360 degrees around the user in the monitoring room, and sound image localization processing is applied so that the sound information is localized within those 360 degrees. The localized position of the sound image tells the user where the actual sound originates, which reduces the delay in discovering an abnormality.
Patent Document 1: JP 10-51759 A

However, the multipoint remote monitoring method disclosed in Patent Document 1 rearranges the monitoring points as a virtual space in the monitoring room and relies on the user to recognize the direction from which a sound is heard as the point where the abnormality occurred. Only after that recognition does the user switch the video and audio output to the display over to the recognized monitoring area and search it for the abnormality. Because the video and audio are switched according to the user's recognition of the abnormal sound, discovery of the cause is delayed when that recognition or judgment is ambiguous. It has therefore not been possible to realize a multipoint remote monitoring method that switches the output monitoring video and audio to the affected monitoring point immediately after an abnormality occurs.

The present invention has been made to solve the above problem. Its object is to provide an imaging sound pickup signal reproduction system with which the viewer can reliably recognize and judge an abnormality, and which can switch the output monitoring video and monitoring audio to the monitoring point where the abnormality occurred immediately after it occurs.

A first aspect of the present invention provides an imaging sound pickup signal reproduction system having an imaging sound pickup device that outputs a video signal captured by a camera and an audio signal picked up by a microphone, and a playback device that outputs the video signal supplied from the imaging sound pickup device to a display device and outputs the audio signal supplied from the imaging sound pickup device to left and right speakers. The imaging sound pickup device comprises an imaging sound pickup unit and a selection unit. The imaging sound pickup unit has a cylindrical part provided with a plurality of cameras and a plurality of microphones. When one of the cameras is taken as the shooting camera, a pair of left and right microphones, placed at a predetermined angle to the left and to the right of that camera and at the same height from the bottom surface, picks up binaural audio that serves as the audio signal accompanying the video signal shot by that camera; the plurality of microphones pick up such binaural audio for each of the cameras. The selection unit outputs, as a selected audio signal, the audio signal accompanying the video signal that has been set as the main video signal on the playback device side from among the plurality of video signals obtained from the cameras. The playback device comprises a video signal output unit and a crosstalk canceller. The video signal output unit sets one of the plurality of video signals as the main video signal, generates a main video signal for display from it, and outputs it to a display means. The crosstalk canceller receives the selected audio signal, processes it so as to cancel the crosstalk signals that arise when the selected audio signal is reproduced by the left and right speakers, namely the signal from the left speaker heard at the viewer's right ear and the signal from the right speaker heard at the viewer's left ear, and outputs the result to the left and right speakers. The crosstalk canceller has a filter that stores a head-related transfer function obtained in advance, and that head-related transfer function is measured from audio signals obtained by picking up a measurement signal with a pair of microphones mounted on a cylindrical structure.

According to the present invention, the imaging sound pickup device comprises an imaging sound pickup unit and a selection unit, and the playback device comprises a video signal output unit and a crosstalk canceller. The imaging sound pickup unit has a cylindrical part provided with a plurality of cameras and a plurality of microphones; when one of the cameras is taken as the shooting camera, a pair of left and right microphones placed at a predetermined angle to the left and to the right of that camera, at the same height from the bottom surface, picks up binaural audio that accompanies the video signal shot by that camera, and the microphones pick up such binaural audio for each of the cameras. The selection unit outputs, as a selected audio signal, the audio signal accompanying the video signal set as the main video signal on the playback device side from among the video signals obtained from the cameras. The video signal output unit sets one of the video signals as the main video signal, generates a main video signal for display from it, and outputs it to a display means. The crosstalk canceller receives the selected audio signal, processes it so as to cancel the crosstalk signals heard at the viewer's opposite-side ears when the selected audio signal is reproduced by the left and right speakers, and outputs the result to the left and right speakers. The crosstalk canceller has a filter storing a head-related transfer function obtained in advance, measured from audio signals obtained by picking up a measurement signal with a pair of microphones mounted on a cylindrical structure. As a result, an imaging sound pickup signal reproduction system can be realized with which the viewer can reliably recognize and judge an abnormality and which can switch the output monitoring video and monitoring audio to the monitoring point where the abnormality occurred immediately after it occurs.

An imaging sound pickup and reproduction system according to an embodiment of the present invention is described below with reference to FIGS. 1 to 12.
FIG. 1 is a block diagram showing a configuration example of the imaging sound pickup and reproduction system according to the embodiment.
FIG. 2 is a diagram (part 1) showing a configuration example of a main part of the system.
FIG. 3 is a diagram for explaining a configuration example of a main part of the system.
FIG. 4 is a diagram (part 2) showing a configuration example of a main part of the system.
FIG. 5 is a diagram showing a display example of a reproduced image.
FIG. 6 is a diagram (part 3) showing a configuration example of a main part of the system.
FIG. 7 is a diagram showing a configuration example of a filter characteristic measuring device for the crosstalk canceller.
FIG. 8 is a diagram showing a configuration example of the binaural sound receiving unit used when measuring the filter characteristics.
FIG. 9 is a diagram showing an example of the impulse response characteristics of the crosstalk canceller.
FIG. 10 is a diagram showing an example of the frequency response characteristics of the crosstalk canceller.
FIG. 11 is a diagram showing, for reference, an example of the impulse response characteristics of a conventional crosstalk canceller.
FIG. 12 is a diagram showing, for reference, an example of the frequency response characteristics of a conventional crosstalk canceller.

The object of realizing an imaging sound pickup signal reproduction system with which the viewer can reliably recognize and judge an abnormality, and which can switch the output monitoring video and monitoring audio to the monitoring point where the abnormality occurred immediately after it occurs, is achieved by the following configuration. The imaging sound pickup device comprises an imaging sound pickup unit and a selection unit. The imaging sound pickup unit has a cylindrical part provided with a plurality of cameras and a plurality of microphones; when one of the cameras is taken as the shooting camera, a pair of left and right microphones placed at a predetermined angle to the left and to the right of that camera, at the same height from the bottom surface, picks up binaural audio that accompanies the video signal shot by that camera, and the microphones pick up such binaural audio for each of the cameras. The selection unit outputs, as a selected audio signal, the audio signal accompanying the video signal set as the main video signal on the playback device side from among the video signals obtained from the cameras. The playback device comprises a video signal output unit, which sets one of the video signals as the main video signal, generates a main video signal for display from it, and outputs it to a display means, and a crosstalk canceller, which receives the selected audio signal, processes it so as to cancel the crosstalk signals heard at the viewer's opposite-side ears when the selected audio signal is reproduced by the left and right speakers, and outputs the result to the left and right speakers. The crosstalk canceller has a filter storing a head-related transfer function obtained in advance, and that head-related transfer function is measured from audio signals obtained by picking up a measurement signal with a pair of microphones mounted on a cylindrical structure.

The configuration of the imaging sound pickup signal reproduction system is as follows.
The imaging sound pickup signal reproduction system 10 shown in FIG. 1 consists of an imaging sound pickup device 1 (imaging sound pickup unit 11, selection unit 12, compression unit 13, and transmission unit 14), a transmission path 2, a playback device 3 (receiving unit 31, decompression unit 32, signal control unit 33, and crosstalk canceller 34), and a display unit 4 (speakers 41 and 43 and monitor 42).
The imaging sound pickup unit 11 shown in FIG. 2 consists of auricles 111a to 111d, microphones 112a to 112d, imaging cameras 113b and 113c, and a structure 117.
The imaging sound pickup unit 11a shown in FIG. 4 consists of auricles 111a to 111h, microphones 112a to 112h, imaging cameras 113a to 113d, and a cylindrical structure 116.
The crosstalk canceller 34 shown in FIG. 6 consists of filters 341 to 344, adders 345 and 346, and filters 347 and 348.
The filter characteristic measuring device 6 shown in FIG. 7 consists of a personal computer 61, amplifiers 62, 65, and 66, a speaker 63, and a cylindrical sound receiver 64.
The cylindrical sound receiver 64 shown in FIGS. 8(A) and 8(B) consists of microphones 641 and 642 and a cylindrical structure 649. The human-head-type sound receiver 69 shown in FIG. 8(C) consists of microphones 691 and 692, ear canals 693 and 694, auricles 695 and 696, and a human-head-type structure 699.

First, the imaging sound pickup unit 11 of the imaging sound pickup device 1 installed in the monitored area shoots monitoring images covering four directions (front, left, rear, and right of the unit) and picks up a plurality of binaural sounds, each with a different direction as its front. The selection unit 12, controlled by the signal control unit 33 of the playback device 3, treats the front image as the main image and the left, rear, and right images as sub-images. From the binaural sounds picked up by the imaging sound pickup unit 11 it selects and outputs, as the audio signal, the one picked up with the shooting direction of the main image as the front. The images and binaural sound obtained by the imaging sound pickup unit 11 are compression-coded by the compression unit 13 and sent from the transmission unit 14 to the transmission path 2. The receiving unit 31 of the playback device 3 receives the image and audio signals sent from the imaging sound pickup device 1, and the decompression unit 32 decodes the compression-coded images and binaural sound. From the decoded images, the signal control unit 33 generates a multi-image composed of the main image and the sub-images and supplies it to the monitor 42 for display. The decoded binaural sound undergoes crosstalk cancellation in the crosstalk canceller 34 and is supplied to the left and right speakers 41 and 43 for output. The viewer watches the multi-image on the monitor 42 while listening to binaural sound picked up with the main image direction as the front.

The binaural sound undergoes crosstalk cancellation in the crosstalk canceller 34, described later, and is output from the speakers 41 and 43. Crosstalk sound is the sound emitted from the left (right) speaker and received at the viewer's right (left) ear. The processed binaural sound is supplied to the speakers with a cancellation signal added in advance to cancel that crosstalk sound. The viewer can therefore hear, through speakers, binaural sound with the same strong sense of presence as when listening to the binaural audio signal through headphones.

Next, suppose for example that an abnormal sound is heard from the left. The listener operates the signal control unit 33 to make the image shot in the left direction the main image. The signal control unit 33 sends the selection unit 12 a control signal designating the left-direction image as the main image. The selection unit 12 treats the left image as the main image and the other three images as sub-images, obtains the binaural sound picked up with the left side of the imaging sound pickup unit 11 as the front, and the imaging sound pickup device 1 sends these out in the same way as above. The monitor 42 displays the main image facing left, and the speakers 41 and 43 reproduce binaural sound with the left direction as the front. The monitored area can thus be observed with the direction of the abnormal sound as the main image and with binaural sound whose front is the direction of the main image.
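The switching behaviour just described can be sketched in a few lines. This is only an illustration under stated assumptions: the class name `SelectionUnit`, the direction labels, and the dictionary of binaural streams are hypothetical and do not come from the patent text.

```python
from dataclasses import dataclass

# Direction labels are illustrative; the text names front, left, rear, right.
DIRECTIONS = ("front", "left", "rear", "right")


@dataclass
class SelectionUnit:
    """Hypothetical model of selection unit 12: tracks the main image direction."""
    main: str = "front"

    def set_main(self, direction):
        # Corresponds to the control signal sent by signal control unit 33.
        if direction not in DIRECTIONS:
            raise ValueError(f"unknown direction: {direction}")
        self.main = direction

    def sub_images(self):
        # The three remaining directions are treated as sub-images.
        return [d for d in DIRECTIONS if d != self.main]

    def select_audio(self, binaural_by_direction):
        # Output the binaural stream whose front is the main image direction.
        return binaural_by_direction[self.main]


unit = SelectionUnit()
unit.set_main("left")  # operator heard an abnormal sound from the left
streams = {d: f"binaural({d})" for d in DIRECTIONS}
selected = unit.select_audio(streams)
```

Keeping the audio choice a pure function of the main image direction mirrors the text: video and audio always switch together.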

The details are described next.
The imaging sound pickup unit 11 of the imaging sound pickup device 1 is described with reference to FIG. 2.
FIG. 2(A) is an elevation view of the imaging sound pickup unit 11. Each of the four imaging cameras 113a to 113d provided around the structure 117 images one of four monitoring areas. The structure 117 has acoustic shielding characteristics roughly equivalent to those of a viewer's head. Auricles 111a to 111d and microphones 112a to 112f are provided around the structure 117. Two auricles and two microphones placed opposite each other are used together, so that two binaural sounds are picked up, each with one of two of the four directions as its front.
FIG. 2(B) is a front view of the imaging sound pickup unit 11. When the imaging camera 113c is set as the main image camera, the auricles 111b and 111d and the microphones 112b and 112f are used to pick up binaural sound. The binaural sound picked up has its median plane in the direction in which the imaging camera 113c points.

The auricles of the imaging sound pickup unit 11 are described further with reference to FIG. 3.
In contrast to FIG. 2(A), FIG. 3(A) shows only the auricles and microphones used for binaural pickup when the imaging camera 113b is set as the main image camera. The direction in which the camera 113b points is the front. The auricles 111a and 111c and the microphones 112a and 112c on the side surface of the structure 117 perform binaural pickup, treating sound from a subject imaged in front of the camera 113b as frontal sound. The auricles 111a and 111c are located on the left side in the figure. Sound emitted from a subject on the right side of the structure 117 is collected by the auricles 111a and 111c and input to the microphones 112a and 112c. Sound emitted from the left side (rear) of the structure 117 reaches the microphones 112a and 112c with its high-frequency components in particular attenuated by the auricles 111a and 111c.

FIG. 3(C) shows the movement of the auricle 111c when the main image camera is switched from the imaging camera 113b to the imaging camera 113d. The auricle 111c has the shape of a quarter of a sphere and rotates 90 degrees about the center line shown by the dash-dot line in the figure. With the auricle 111c rotated 90 degrees, the left and right channels of the binaural sound output from the microphones 112a and 112c are exchanged. The auricles 111a and 111c and the microphones 112a and 112c then perform binaural pickup, treating sound from a subject imaged by the camera 113d as frontal sound.
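The left/right exchange for opposed cameras can be written down as a small sketch. The text does not say which of the microphones 112a and 112c is the left channel for camera 113b, so the initial assignment below is an assumption, and the function name is hypothetical.

```python
# Assumption: for camera 113b, microphone 112a is taken as the left channel
# and 112c as the right channel. The patent text does not state this.
PAIR_FOR_113B = {"left": "112a", "right": "112c"}


def pair_for_opposite_camera(pair):
    """Reuse the same microphone pair for the camera facing the other way,
    with the left and right channels exchanged."""
    return {"left": pair["right"], "right": pair["left"]}


# Camera 113d faces opposite to camera 113b, so the channels are exchanged.
PAIR_FOR_113D = pair_for_opposite_camera(PAIR_FOR_113B)
```

Applying the exchange twice returns the original assignment, which matches switching back from camera 113d to camera 113b.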

An applied example of the imaging sound pickup unit is described with reference to FIG. 4.
In the imaging sound pickup unit 11a shown in the figure, the auricles 111a to 111d and microphones 112a to 112d are arranged around the upper circumference of the cylindrical structure 116, the imaging cameras 113a to 113d around the middle circumference, and the auricles 111e to 111h and microphones 112e to 112h around the lower circumference. The auricles 111a to 111d and the auricles 111e to 111h face in opposite directions to each other.

The auricles of the imaging sound pickup unit 11 shown in FIG. 2 move to reverse the pickup direction. In contrast, the imaging sound pickup unit 11a shown in FIG. 4 has four pairs of auricles and microphones, one pair for each of the four pickup directions, so the binaural sound can be switched simply by switching between the audio signals output from the microphones. The imaging sound pickup unit 11 picks up binaural sound in four directions using only four auricles 111a to 111d and four microphones 112a to 112d. The imaging sound pickup unit 11a picks up binaural sound in four directions using eight auricles 111a to 111h and eight microphones 112a to 112h. Because the unit 11a has four more microphones and auricles than the unit 11, the auricles 111a to 111h need not be rotated when the pickup direction is switched. No auricle switching mechanism is required, switching is instantaneous, and no frictional noise from moving auricles occurs.

The multi-image displayed on the monitor 42 is described with reference to FIG. 5.
FIG. 5(A) has a large-screen display area (a) and small-screen display areas (b) to (d). The display areas (a) to (d) show the video signals output from the imaging cameras 113a to 113d, respectively. This is a display example for the case where the video signal from the camera 113a is designated as the main image and the video signals from the cameras 113b to 113d are selected as sub-images. In FIG. 5(B) the large-screen display area is (b) and the small-screen display areas are (a), (c), and (d). This is an example of the display area setting when the video signal from the camera 113b is designated as the main image and the video signals from the other cameras are selected as sub-images.
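The two display examples of FIG. 5 follow one rule: each camera keeps its display area label, and the area of the main camera becomes the large one. A minimal sketch of that rule follows; the function name `layout` and the dictionary form are illustrative choices, not part of the patent.

```python
# Fixed pairing of cameras and display areas, as stated for FIG. 5.
AREA_FOR_CAMERA = {"113a": "a", "113b": "b", "113c": "c", "113d": "d"}


def layout(main_camera):
    """Return which display area is large and which are small
    for a given main image camera."""
    if main_camera not in AREA_FOR_CAMERA:
        raise ValueError(f"unknown camera: {main_camera}")
    large = AREA_FOR_CAMERA[main_camera]
    small = [area for cam, area in AREA_FOR_CAMERA.items() if cam != main_camera]
    return {"large": large, "small": small}


fig_5a = layout("113a")  # main image from camera 113a
fig_5b = layout("113b")  # main image from camera 113b
```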

The crosstalk canceller is described with reference to FIG. 6.
First, of the binaural audio signals, the left-channel signal P_L(t) is input to the filters 341 and 342, and the right-channel signal P_R(t) is input to the filters 343 and 344. The filters 341 to 344 have, in that order, the electroacoustic conversion characteristics h_rs(t), −h_lo(t), −h_ro(t), and h_ls(t), which are based on the head-related transfer functions described later, and apply those characteristics to their input signals. The adder 345 adds the signals output from the filters 341 and 343, and the filter 347 applies the characteristic d(t) to the sum. The adder 346 adds the signals output from the filters 342 and 344, and the filter 348 applies the characteristic d(t) to the sum. The signals output from the filters 347 and 348 are the output signals of the crosstalk canceller 34. These output signals are amplified by amplifiers (not shown) and supplied to the speakers 41 and 43, which emit them as sound.
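The signal flow through the filters 341 to 344, the adders 345 and 346, and the filters 347 and 348 can be sketched as follows. This is a minimal illustration, not the implementation of the crosstalk canceller 34: it assumes each filter is a finite impulse response applied by direct convolution, and it assumes that the output of the filter 347 feeds the speaker 41 and the output of the filter 348 feeds the speaker 43. All function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def convolve(x: Sequence[float], h: Sequence[float]) -> List[float]:
    """Direct-form linear convolution of a signal x with an impulse response h."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y


def add_signals(a: Sequence[float], b: Sequence[float]) -> List[float]:
    """Sample-wise sum of two signals, zero-padding the shorter one."""
    n = max(len(a), len(b))
    pa = list(a) + [0.0] * (n - len(a))
    pb = list(b) + [0.0] * (n - len(b))
    return [u + v for u, v in zip(pa, pb)]


def negate(h: Sequence[float]) -> List[float]:
    """Sign-inverted copy of an impulse response."""
    return [-v for v in h]


def crosstalk_canceller(p_l, p_r, h_ls, h_lo, h_rs, h_ro, d) -> Tuple[List[float], List[float]]:
    """Return the (left, right) speaker feeds for binaural inputs p_l and p_r."""
    f341 = convolve(p_l, h_rs)           # filter 341: h_rs applied to P_L
    f342 = convolve(p_l, negate(h_lo))   # filter 342: -h_lo applied to P_L
    f343 = convolve(p_r, negate(h_ro))   # filter 343: -h_ro applied to P_R
    f344 = convolve(p_r, h_ls)           # filter 344: h_ls applied to P_R
    out_left = convolve(add_signals(f341, f343), d)   # adder 345, then filter 347
    out_right = convolve(add_signals(f342, f344), d)  # adder 346, then filter 348
    return out_left, out_right
```

With no crosstalk paths (h_lo and h_ro zero) and unit direct paths, the network passes both channels through unchanged, which is a quick sanity check on the wiring.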

The signal emitted from the speaker 41 is heard by the left ear of the viewer 48, and part of that signal is also heard by the right ear of the viewer 48 as a first crosstalk signal, indicated by a broken line. The crosstalk canceller 34 generates a first crosstalk cancellation signal for cancelling the first crosstalk signal and has it emitted from the speaker 43. The first crosstalk cancellation signal cancels (attenuates) the crosstalk signal heard by the right ear. Likewise, a second crosstalk cancellation signal, for cancelling the second crosstalk signal (indicated by a broken line) that is emitted from the speaker 43 and heard by the left ear of the viewer 48, is emitted from the speaker 41.

Note that the first crosstalk cancellation signal emitted from the speaker 43 is in turn heard by the left ear of the viewer 48 as a crosstalk component. This crosstalk component of the first crosstalk cancellation signal is heard by the left ear together with the left-channel binaural audio signal P_L(t). Because the two signals are similar, the effect on listening quality is small. Similarly, the crosstalk component of the second crosstalk cancellation signal emitted from the speaker 41 is heard by the right ear together with the right-channel binaural audio signal P_R(t), and its effect on listening quality is likewise small.

The filter characteristic measuring device 6, which is used to obtain the head-related transfer function characteristics stored in the filters 341 to 344, 347, and 348, is described with reference to FIG. 7.
First, the personal computer 61 generates a measurement signal, for example an impulse sound with a raised-cosine waveform. The measurement signal is amplified by the amplifier 62. The measurement signal emitted from the speaker 63 is received by the left and right microphone units 641 and 642 attached to the cylindrical sound receiver 64. The resulting left and right signals are amplified by the amplifiers 66 and 67 and input to the personal computer 61. The personal computer 61 compares the generated measurement signal with the received signals and obtains the head-related transfer functions h_ls(t) and h_lo(t) of the signal emitted from the speaker 63 and received by the left and right microphone units 641 and 642. Here, h_ls(t) is the characteristic with which a signal emitted from the left speaker is received by the left microphone unit, and h_lo(t) is the crosstalk characteristic with which a signal emitted from the left speaker is received by the right microphone unit.
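As a minimal sketch of the kind of measurement signal mentioned above, the following generates a single raised-cosine pulse. The exact pulse shape and length used by the personal computer 61 are not given in the text, so the formula 0.5 * (1 − cos(2πn / (N − 1))) and the function name are assumptions.

```python
import math
from typing import List


def raised_cosine_pulse(length: int) -> List[float]:
    """One raised-cosine pulse: starts at 0, peaks at 1 in the middle, returns to 0."""
    if length < 2:
        raise ValueError("length must be at least 2")
    return [
        0.5 * (1.0 - math.cos(2.0 * math.pi * n / (length - 1)))
        for n in range(length)
    ]


# Hypothetical 9-sample pulse; the real pulse length is not stated in the source.
pulse = raised_cosine_pulse(9)
```

A smooth pulse of this kind has no negative lobe, which is why a received waveform that stays close to it (small negative portion, little ringing) indicates a clean transfer path.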

Although FIG. 7 shows only the left speaker 63, the right speaker is measured in the same way with a speaker placed at the left-right symmetric position, giving the head-related transfer functions h_ro(t) and h_rs(t). Here, h_rs(t) is the characteristic with which a signal emitted from the right speaker is received by the right microphone unit, and h_ro(t) is the characteristic with which a signal emitted from the right speaker is received by the left microphone unit.

The cylindrical sound receiver 64 is described with reference to FIG. 8. (A) is a top view and (B) is a perspective view.
Microphone units 641 and 642 are provided on both sides of the cylindrical sound receiver 64 shown in the figure. Neither microphone unit has an auricle or an ear canal. The diaphragms (not shown) of the microphone units 641 and 642 are located at the surface of the cylindrical sound receiver 64. (C), shown for reference, is a top view of a so-called human-head-type sound receiver. This human-head-type sound receiver 69 has auricle-shaped members 695 and 696 and ear canals 693 and 694 on both sides of a human-head-type structure 699, with microphones 691 and 692 at the inner ends of the ear canals. The microphones 691 and 692 are placed at positions corresponding to the human eardrums so as to pick up an audio signal that approximates the sound a person hears.

Unlike the human-head-type sound receiver 69 shown in (C), the cylindrical sound receiver 64 shown in (A) and (B), to which the microphone units 641 and 642 are attached, has no auricles or ear canals. The sound-receiving characteristics of the microphone units 641 and 642 attached to the cylindrical structure 649 are therefore unaffected by the differences introduced by auricles and ear canals, whose shapes vary from person to person, and allow the head-related transfer function to be measured. The characteristic measured with the microphone units 641 and 642 is that of a sound wave emitted from the speaker 63 which, although shielded by the cylindrical structure 649, diffracts along the surface of the cylindrical structure 649 and arrives at the microphone unit 642. Using the cylindrical structure 649 yields a head-related transfer function whose head-shielding characteristic is average with respect to the differing head-shielding characteristics of viewers with various head shapes.

FIG. 9 shows the waveforms of the head-related transfer functions obtained by measurement with an impulse sound using the acoustic signal transfer characteristic measuring device 6. (A) to (D) are the waveforms of h_ls(t), h_lo(t), h_rs(t), and h_ro(t), in that order, and (E) is the waveform of d(t). The vertical axis is the amplitude of the signal voltage normalized by a predetermined output voltage, and the horizontal axis is time, expressed as the number of samples when the measured signal is sampled at 48 kHz.
FIGS. 10(A) to 10(E) show the frequency characteristics obtained by Fourier analysis of the signals shown in FIG. 9. The frequency positions of 100 Hz, 1 kHz, and 10 kHz are indicated by broken lines. In the response characteristics, the gain difference between adjacent broken lines is 10 dB.
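The step from the sampled waveforms of FIG. 9 to the frequency characteristics of FIG. 10 can be sketched with a plain discrete Fourier transform. This is only an illustration under assumptions: the text says "Fourier analysis" without naming the transform length, windowing, or scaling, and the function name is hypothetical. The 48 kHz sampling rate is the one stated for FIG. 9.

```python
import cmath
import math
from typing import List, Sequence, Tuple


def magnitude_response_db(h: Sequence[float], fs: float = 48000.0) -> List[Tuple[float, float]]:
    """Return (frequency in Hz, gain in dB) pairs from DC up to fs / 2."""
    n = len(h)
    response = []
    for m in range(n // 2 + 1):
        # m-th DFT bin of the impulse response
        spectrum = sum(h[k] * cmath.exp(-2j * cmath.pi * k * m / n) for k in range(n))
        magnitude = abs(spectrum)
        gain_db = 20.0 * math.log10(magnitude) if magnitude > 0.0 else float("-inf")
        response.append((m * fs / n, gain_db))
    return response
```

An ideal unit impulse gives 0 dB at every bin, so any departure from a flat line in such a plot reflects boost or attenuation introduced by the measured path.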

The characteristics h_rs(t), −h_lo(t), −h_ro(t), and h_ls(t) of the filters 341 to 344 use the characteristics h_ls(t) and h_rs(t) measured with the acoustic signal transfer characteristic measuring device 6, together with sign-inverted (opposite-polarity) versions of h_lo(t) and h_ro(t).
The characteristic d(t) of the filters 347 and 348 is given by the following equation.
d(t) = {h_ls(t) * h_rs(t) − h_lo(t) * h_ro(t)}^-1
Here, "*" is an arithmetic symbol denoting multiplication.
Each filter characteristic obtained in this way is stored in a storage unit (not shown) provided in the respective filter, and the predetermined filter characteristic is applied by convolving the characteristic read from the storage unit with the input signal.
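The role of d(t) can be checked numerically with a short sketch. It rests on an interpretation that is an assumption, not a statement of the source: the product in the equation is evaluated per frequency bin on the Fourier transforms of the four responses (equivalent to convolution in the time domain), and the exponent −1 is taken as the reciprocal per bin. Under that reading, the left ear should receive the left channel unchanged and the right ear should receive nothing. Function names and the example responses are hypothetical.

```python
import cmath
from typing import List, Sequence, Tuple


def dft(x: Sequence[float], n: int) -> List[complex]:
    """n-point discrete Fourier transform of x, zero-padded to length n."""
    padded = list(x) + [0.0] * (n - len(x))
    return [
        sum(padded[k] * cmath.exp(-2j * cmath.pi * k * m / n) for k in range(n))
        for m in range(n)
    ]


def ear_responses(h_ls, h_lo, h_rs, h_ro, n: int = 8) -> List[Tuple[complex, complex]]:
    """Per-bin (left ear, right ear) response to a unit left-channel input."""
    results = []
    for ls, lo, rs, ro in zip(*(dft(h, n) for h in (h_ls, h_lo, h_rs, h_ro))):
        d = 1.0 / (ls * rs - lo * ro)   # assumed per-bin reading of d; fails if this is zero
        left_speaker = d * rs           # filter 341 followed by filter 347
        right_speaker = d * (-lo)       # filter 342 followed by filter 348
        left_ear = ls * left_speaker + ro * right_speaker    # direct path plus crosstalk
        right_ear = lo * left_speaker + rs * right_speaker   # crosstalk plus cancellation
        results.append((left_ear, right_ear))
    return results


# Hypothetical responses: unit direct paths, crosstalk delayed by one sample at half amplitude.
bins = ear_responses([1.0, 0.0], [0.0, 0.5], [1.0, 0.0], [0.0, 0.5])
```

In this toy case every bin gives a left-ear response of 1 and a right-ear response of 0, which is the cancellation behaviour the filter network is meant to produce.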

FIGS. 11 and 12 show the characteristics measured using the human-head-type sound receiver 69 shown in FIG. 8(C) in place of the microphone units 641 and 642 attached to the cylindrical structure 649.
FIG. 11 shows characteristics measured in the same way as FIG. 9. The impulse response waveforms measured with the human-head-type sound receiver 69 contain a large amount of oscillation, whereas the waveforms obtained with the microphone units 641 and 642 are close to the raised cosine used as the measurement input signal: the amplitude of the negative-voltage portion is small, and the amplitude of the oscillation that follows is also small.

FIG. 12 shows the frequency response characteristics corresponding to FIG. 11, measured with the human-head-type sound receiver 69. The characteristics obtained with the microphone units 641 and 642 attached to the cylindrical sound receiver 64 show less disturbance and are closer to flat. Each of the characteristics (A) to (E) in FIG. 12 is boosted and attenuated between 1.5 and 7 kHz, whereas in the characteristics shown in FIG. 10 the levels of boost and attenuation are smaller. This is because there is no disturbance caused by auricles or ear canals. With the microphone units 641 and 642, there is no effect of part of the sound wave emitted from the speaker 63 being reflected by an auricle and combining with the directly arriving sound wave, in phase to boost it or in opposite phase to attenuate it. Moreover, the resulting characteristic shows little boost or attenuation at specific frequencies due to ear canal resonance and anti-resonance.

The difference in crosstalk cancellation performance resulting from different characteristics of the filters 341 to 344 is described by comparison.
The crosstalk cancellation characteristics were measured using the binaural audio speaker reproduction system shown in FIG. 7. Listening tests were carried out with multiple viewers while the characteristics of the filters 341 to 344 and the filters 347 and 348 of the crosstalk canceller 34 in that system were switched between those obtained with the microphone units 641 and 642 attached to the cylindrical sound receiver 64 and those obtained with the human-head-type sound receiver 69. Each viewer had a thin miniature microphone inserted in the ear canal, and the sound received by that microphone was taken to be the sound heard by the viewer. When the filters 341 to 344 and the filters 347 and 348 used the characteristics obtained with the microphone units 641 and 642 attached to the cylindrical sound receiver 64, a separation (crosstalk cancellation effect) exceeding 20 dB was obtained from 100 Hz to 2 kHz, whereas with the characteristics obtained with the human-head-type sound receiver 69 the separation was only about 12 dB.
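One way to express separation in decibels from two in-ear recordings is sketched below. The text does not define how the separation figures were computed, so the definition used here (ratio of RMS levels at the intended ear and at the opposite ear, over the whole signal rather than per frequency band) and the function names are assumptions for illustration.

```python
import math
from typing import Sequence


def rms(x: Sequence[float]) -> float:
    """Root-mean-square level of a signal."""
    return math.sqrt(sum(v * v for v in x) / len(x))


def separation_db(intended_ear: Sequence[float], opposite_ear: Sequence[float]) -> float:
    """Level difference in dB between the intended ear and the opposite ear."""
    return 20.0 * math.log10(rms(intended_ear) / rms(opposite_ear))
```

On this scale, 20 dB corresponds to the opposite-ear signal being one tenth of the intended-ear amplitude, and 12 dB to roughly one quarter.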

Obtaining sufficient separation means that the crosstalk cancellation signal is working effectively, and also that the crosstalk cancellation signal itself is not heard as an opposite-phase component. The binaurally picked-up signal is heard by the viewer with a good sense of presence. When an abnormal sound occurs, the viewer can recognize its approximate location even if that location is not shown on the main screen displayed on the monitor 42.
The multi-screen shown in FIG. 5 has been described as being generated by the signal control unit 33 of the playback device 3. The multi-screen may instead be generated by the selection unit 12 of the imaging sound pickup device 1. In that case, the multi-screen is compression-encoded by the compression unit 13 and sent over the transmission path 2, so the code amount needed to transmit the sub-screens can be reduced.
In addition, if the imaging sound pickup device 1 is provided with a video recording function and the sub-screen video captured by the imaging sound pickup unit 11 is recorded without being reduced in size, the recorded images can help in later analysis of the cause of an abnormal sound.

As described above, in the imaging sound pickup signal reproduction system of this embodiment, the imaging sound pickup device 1 includes an imaging sound pickup unit 11 and a selection unit 12. The imaging sound pickup unit 11 has a cylindrical part 116 provided with a plurality of cameras 113a to 113d and a plurality of microphones 112a to 112h that pick up binaural sound for each of the cameras: taking one of the cameras as the shooting camera, the binaural sound picked up by a pair of left and right microphones, placed at a predetermined angle to the left and to the right of the shooting camera and at the same height from the bottom surface, serves as the audio signal accompanying the video signal captured by that camera. The selection unit 12 outputs, as a selected audio signal, the audio signal accompanying the video signal that has been set as the main video signal on the playback device side from among the video signals obtained from the cameras. The playback device 3 includes a video signal output unit 33 and a crosstalk canceller 34. The video signal output unit 33 sets one of the video signals as the main video signal, generates a main video signal for display from it, and outputs it to the display means 42. The crosstalk canceller 34 receives the selected audio signal, processes it so as to cancel the crosstalk signals that would be heard at the viewer's opposite (right and left) ears when the selected audio signal is reproduced through the left and right speakers, and outputs the result to the left and right speakers. The crosstalk canceller 34 has filters 341 to 344 that store head-related transfer functions obtained in advance, and these head-related transfer functions are measured from audio signals obtained by picking up a measurement signal with a pair of microphones 641 and 642 mounted on a cylindrical structure 649. This makes it possible to realize an imaging sound pickup signal reproduction system 10 with which the viewer can reliably recognize and judge an abnormality, and with which the monitoring video and monitoring audio can be switched immediately to the monitoring point where the abnormality has occurred.

FIG. 1 is a block diagram showing a configuration example of an imaging sound pickup and reproduction system according to an embodiment of the present invention.
FIG. 2 is a diagram (part 1) showing a configuration example of the main part of the imaging sound pickup and reproduction system according to an embodiment of the present invention.
FIG. 3 is a diagram for explaining a configuration example of the main part of the imaging sound pickup and reproduction system according to an embodiment of the present invention.
FIG. 4 is a diagram (part 2) showing a configuration example of the main part of the imaging sound pickup and reproduction system according to an embodiment of the present invention.
FIG. 5 is a diagram showing a display example of a reproduced image according to an embodiment of the present invention.
FIG. 6 is a diagram (part 3) showing a configuration example of the main part of the imaging sound pickup and reproduction system according to an embodiment of the present invention.
FIG. 7 is a diagram showing a configuration example of a filter characteristic measuring device for the crosstalk canceller according to an embodiment of the present invention.
FIG. 8 is a diagram showing a configuration example of the binaural sound receiving unit used when measuring filter characteristics according to an embodiment of the present invention.
FIG. 9 is a diagram showing an example of the impulse response characteristics of the crosstalk canceller according to an embodiment of the present invention.
FIG. 10 is a diagram showing an example of the frequency response characteristics of the crosstalk canceller according to an embodiment of the present invention.
FIG. 11 is a diagram showing, for reference, an example of the impulse response characteristics of a conventional crosstalk canceller.
FIG. 12 is a diagram showing, for reference, an example of the frequency response characteristics of a conventional crosstalk canceller.

Explanation of symbols

1 Imaging sound pickup device
2 Transmission path
3 Playback device
4 Display unit
6 Filter characteristic measuring device
10 Imaging sound pickup signal reproduction system
11, 11a Imaging sound pickup unit
12 Selection unit
13 Compression unit
14 Sending unit
31 Receiving unit
32 Decompression unit
33 Signal control unit
34 Crosstalk canceller
41, 43, 63 Speaker
42 Monitor
61 Personal computer
62, 65, 66 Amplifier
64 Cylindrical sound receiver
69 Human-head-type sound receiver
111a to 111h Auricle
112a to 112h Microphone
113a to 113d Imaging camera
116 Cylindrical structure
117 Structure
341 to 344, 347, 348 Filter
345, 346 Adder
641, 642, 691, 692 Microphone
649 Cylindrical structure
693, 694 Ear canal
695, 696 Auricle
699 Human-head-type structure

Claims

An imaging sound pickup signal reproduction system comprising an imaging sound pickup device that outputs video signals captured by cameras and audio signals picked up by microphones, and a playback device that outputs the video signals supplied from the imaging sound pickup device to a display device and outputs the audio signals supplied from the imaging sound pickup device to left and right speakers, wherein
the imaging sound pickup device comprises:
an imaging sound pickup unit having a cylindrical part provided with a plurality of cameras and a plurality of microphones that pick up binaural sound for each of the plurality of cameras, where, taking one of the plurality of cameras as a shooting camera, binaural sound picked up by a pair of left and right microphones arranged at a predetermined angle to the left and to the right of the shooting camera and at the same height from a bottom surface serves as the audio signal accompanying the video signal captured by the shooting camera; and
a selection unit that outputs, as a selected audio signal, the audio signal accompanying the video signal set as a main video signal on the playback device side from among the plurality of video signals obtained from the plurality of cameras,
the playback device comprises:
a video signal output unit that sets one of the plurality of video signals as the main video signal, generates a main video signal for display from the video signal set as the main video signal, and outputs it to display means; and
a crosstalk canceller that is supplied with the selected audio signal, processes it so as to cancel the crosstalk signals heard at the viewer's right and left ears from the signals reproduced by the left and right speakers when the selected audio signal is reproduced by the left and right speakers, and outputs the result to the left and right speakers, and
the crosstalk canceller has a filter that stores a head-related transfer function obtained in advance, the head-related transfer function being a head-related transfer function measured on the basis of audio signals picked up by a pair of microphones mounted on a cylindrical structure.