JP2016152557A

JP2016152557A - Sound collection system and sound collection setting method

Info

Publication number: JP2016152557A
Application number: JP2015029920A
Authority: JP
Inventors: Hiroyuki Matsumoto; Shuichi Watanabe; Toshitsugu Tsuji
Original assignee: Panasonic Intellectual Property Management Co Ltd
Current assignee: Panasonic Intellectual Property Management Co Ltd
Priority date: 2015-02-18
Filing date: 2015-02-18
Publication date: 2016-08-22
Anticipated expiration: 2035-02-18
Also published as: JP6504539B2

Abstract

PROBLEM TO BE SOLVED: To provide a sound collection system and a sound collection setting method that can properly form directivity toward a predetermined imaging position and clearly output the sound at that position even when the positional relationship between a camera and a microphone array is unknown.

SOLUTION: When a microphone array device MA is installed afterwards in a store 10 where camera devices C1 to C4 are already installed, preset processing is performed in which a sound source is placed at each of the preset positions P1 to Pn, which are the imaging centers of the camera devices C1 to Cn, and made to produce sound. When the microphone array device MA collects sound of at least a predetermined volume emitted by the sound source and transmits the audio data to an audio processing device 50, a sound source direction detection unit 52 causes a display 58 to show sound source marks SD1 to SD4 representing the directivity directions (horizontal angle θ and vertical angle φ), prompting the user to select a directivity direction and input camera information. The audio processing device 50 transmits the input camera information and the directivity direction to the microphone array device MA, which stores them as a preset information table 90 in a storage unit 24.

SELECTED DRAWING: Figure 7

Description

The present invention relates to a sound collection system and a sound collection setting method that form directivity toward a predetermined position in collected sound and output the result.

Conventionally, in surveillance systems installed at predetermined positions (for example, ceilings or walls) in factories, stores (for example, retail stores or banks), shopping streets, and public places (for example, stations or libraries), a plurality of camera devices are connected via a network, and video data (including still images and moving images; the same applies hereinafter) of a predetermined range to be monitored is watched on a monitoring device installed at a single location.

However, because video-only monitoring inevitably limits the amount of information that can be obtained, demand is growing for monitoring systems that also obtain audio data so that monitoring by sound becomes possible.

In response to this demand, some camera products are fitted with a microphone and transmit audio data together with the video data over the network. However, the microphones used in such products are often omnidirectional, and even a unidirectional microphone has a wide-angle directional characteristic. As a result, the sound that the operator wants to hear during monitoring is often drowned out by noise.

In recent years, microphone arrays for surveillance camera systems have been developed to meet demands such as detecting the direction in which an abnormal sound occurred, or picking out only the sound from a specific direction when it is buried in noise (see, for example, Patent Document 1). The microphone array of Patent Document 1 has a plurality of microphone units and uses the sound collected by each unit to emphasize and reproduce the sound from a location that the user designates on a screen displaying video captured by a surveillance camera (for example, an omnidirectional camera).

Patent Document 1: JP 2014-143678 A

In Patent Document 1, when the omnidirectional camera and the microphone array are mounted coaxially, the video and audio coordinates coincide (that is, the imaging direction from the omnidirectional camera is the same as the direction in which the sound collected by the microphone array is emphasized), so no particular problem arises. However, when the surveillance camera and the microphone array are mounted apart from each other, their positional relationship is unknown. Consequently, unless the correspondence between the coordinate system of the surveillance camera and that of the microphone array is obtained, for example at initial setup, it is difficult during monitoring to form audio directivity in the direction of the predetermined position that the surveillance camera images.

In particular, when the mounting positions of the surveillance camera and the microphone array are constrained by the strength or structure of the building, accurate information on the actual mounting positions cannot be obtained. To obtain the correspondence between the camera coordinate system and the microphone array coordinate system, the mounting positions, orientations, and so on must then be measured on site, which requires dedicated measuring instruments and a great deal of labor.

To solve the conventional problem described above, an object of the present invention is to provide a sound collection system and a sound collection setting method that properly form directivity toward a predetermined imaging position and clearly output the sound at that position even when the positional relationship between the camera and the microphone array is unknown.

The present invention is a sound collection system comprising: a sound collection unit that has a plurality of sound collection elements and collects sound with them; at least one imaging unit that images a predetermined position; a sound source detection unit that detects a sound source direction from the sound collection unit on the basis of audio data of the collected sound; a display unit that displays the detected sound source direction from the sound collection unit in response to a predetermined sound output at the predetermined position; an operation unit that, in response to designation of the displayed sound source direction, accepts input of information about the imaging unit that images the predetermined position; and a storage unit that stores correspondence information associating the input information about the imaging unit with the sound source direction from the sound collection unit.

The present invention is also a sound collection setting method for a sound collection system that includes a sound collection unit and at least one imaging unit that images a predetermined position, the method comprising the steps of: collecting, with the sound collection unit having a plurality of sound collection elements, a predetermined sound output by a sound source placed at the predetermined position; detecting a sound source direction from the sound collection unit on the basis of audio data of the collected sound; displaying the detected sound source direction on a display unit; designating the sound source direction displayed on the display unit; inputting, in response to the designation of the sound source direction, information about the imaging unit that images the predetermined position; and storing, in a storage unit, correspondence information associating the input information about the imaging unit with the sound source direction from the sound collection unit.

According to the present invention, directivity can be properly formed toward a predetermined imaging position even when the positional relationship between the camera and the microphone array is unknown, so the sound at the predetermined imaging position can be output clearly.

FIG. 1: Block diagram showing the configuration of the sound collection system in the first embodiment
FIG. 2: Block diagram showing the configuration of the audio processing device
FIG. 3: Block diagram showing the configuration of the microphone array device
FIG. 4: Diagram showing the structure of the audio data packet transmitted from the microphone array device to the audio processing device
FIG. 5: Diagram showing the layout of the store in which the sound collection system is installed
FIG. 6: Diagram outlining the preset processing
FIG. 7: Flowchart showing the sound collection procedure during preset processing and monitoring
FIG. 8: Diagram showing the display screen on which the audio map is displayed during preset processing
FIG. 9: Diagram showing the registered contents of the preset information table stored in the microphone array device
FIG. 10: Diagram showing the display screen shown after preset processing
FIG. 11: Diagram showing the display screen shown during monitoring and the sound output of the speaker
FIG. 12: Block diagram showing the configuration of the sound collection system in the second embodiment
FIG. 13: Flowchart showing the sound collection procedure during preset processing and monitoring
FIG. 14: Flowchart, continued from FIG. 13, showing the sound collection procedure during preset processing and monitoring
FIG. 15: Diagram showing the display screen shown during preset processing
FIG. 16: Diagram showing the display screen shown during monitoring and the sound output of the speaker
FIG. 17: Block diagram showing the configuration of the sound collection system in the third embodiment
FIG. 18: Diagram showing the registered contents of the preset information table stored in the table memory
FIG. 19: Flowchart showing the preset processing procedure
FIG. 20: Diagram showing the display screen shown during preset processing
FIG. 21: Flowchart showing the sound collection procedure during monitoring
FIG. 22: Diagram showing the display screen shown during monitoring and the sound output of the speaker
FIG. 23: Diagram showing the display screen shown during monitoring in Modification 1 of the third embodiment
FIG. 24: Table showing the registered contents of the preset information table in Modification 3 of the third embodiment

Embodiments that specifically disclose the sound collection system and the sound collection setting method according to the present invention are described below with reference to the drawings.

(First embodiment)

FIG. 1 is a block diagram showing the configuration of a sound collection system 5 according to the first embodiment. The sound collection system 5 is installed in a store such as a convenience store and is configured so that a plurality of surveillance camera devices C1 to Cn, a microphone array device MA, a recorder device 40, and a PC (Personal Computer) 30 are connected to one another via a network 15.

Each of the camera devices C1 to Cn is a fixed camera with a fixed angle of view and captures video (including still images and moving images; the same applies hereinafter) of the surroundings of a predetermined position in its imaging target area. Here, n is a positive value corresponding to the identification number of the camera device. The camera devices C1 to Cn differ only in their imaging target areas and otherwise have the same configuration, so the configuration and operation of the camera device C1 are described as representative. Where a camera device has specifications different from those of the camera device C1, that camera device is described as the need arises. The camera device C1 transfers the captured video data to the PC 30 via the network 15 and also records it in the recorder device 40.

The microphone array device MA is installed, for example, on the ceiling of the store 10 (see FIG. 5). A plurality of microphones M1 to Mn (for example, eight; see FIG. 3) are arranged concentrically and face downward so that sound in the store can be collected. The microphone array device MA uses the microphones M1 to Mn to collect sound around the imaging target areas, transmits the audio data collected by each of the microphones M1 to Mn to the PC 30 via the network 15, and also records it in the recorder device 40. Each of the microphones M1 to Mn may be an omnidirectional microphone, a bidirectional microphone, a unidirectional microphone, or a sharply directional microphone.

The recorder device 40 includes a control unit (not shown) for controlling processes such as data recording, and a recording unit (not shown) for storing video data and audio data. The recorder device 40 records the video data captured by the camera devices C1 to Cn in association with the audio data collected by the microphone array device MA.

The PC 30 is used to monitor the video captured by the camera devices C1 to Cn and the sound collected by the microphone array device MA, and includes an audio processing device 50 and a video processing device 70.

FIG. 2 is a block diagram showing the configuration of the audio processing device 50. The audio processing device 50 includes a signal processing unit 51, a memory 55, a communication unit 56, an operation unit 57, a display 58, and a speaker 59. The communication unit 56 receives a packet PKT (see FIG. 4) transmitted from the microphone array device MA or the recorder device 40 via the network 15 and outputs it to the signal processing unit 51; it also transmits the preset information (see FIG. 9) generated by the signal processing unit 51 to the microphone array device MA. The memory 55 is configured using, for example, a RAM (Random Access Memory), functions as a work memory while the units of the audio processing device 50 operate, and stores the data those units need during operation.

The signal processing unit 51 is configured using, for example, a CPU (Central Processing Unit), an MPU (Micro Processing Unit), or a DSP (Digital Signal Processor), and includes a sound source direction detection unit 52, a directivity forming unit 53, and an input/output control unit 54. It executes control processing for overall supervision of the operation of each unit of the PC 30, data input/output processing with the other units, data computation processing, and data storage processing.

The sound source direction detection unit 52 estimates the direction in which a sound source lies for the sound collected by the microphone array device MA. In this embodiment, the sound source direction is expressed by a horizontal angle θ and a vertical angle φ (see FIG. 6) centered on the microphone array device MA, and is estimated using, for example, the volume. The horizontal angle θ is an angle in the horizontal plane (X-Y plane) in real space whose origin is the center of the microphone array device MA, and the vertical angle φ is the inclination from the Z axis passing through the center of the microphone array device MA. For example, when the sound source is almost directly below the microphone array device MA, the vertical angle φ is detected as a small value.
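The angle convention described above can be illustrated with a short sketch. It assumes a coordinate frame that the text only partly specifies: the origin is the center of the microphone array device MA, the Z axis points straight down from the ceiling, and θ is measured from the +X axis. The function name `direction_from_position` is hypothetical.

```python
import math


def direction_from_position(x, y, z):
    """Return (theta, phi) in degrees for a point given relative to the array center.

    Assumed convention (not fixed by the source text):
      - origin at the center of the microphone array device,
      - Z axis pointing straight down toward the floor,
      - theta measured in the X-Y plane from the +X axis, in [0, 360),
      - phi measured as the tilt away from the Z axis.
    """
    theta = math.degrees(math.atan2(y, x)) % 360.0
    phi = math.degrees(math.atan2(math.hypot(x, y), z))
    return theta, phi


# A source almost directly below the array yields a small vertical angle.
theta, phi = direction_from_position(0.1, 0.0, 2.5)
print(f"theta={theta:.1f} deg, phi={phi:.1f} deg")
```

Under this convention a source 0.1 m off axis and 2.5 m below the array has φ of roughly 2 degrees, which matches the statement that φ is small for a source near the point directly below the array.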

When sound is generated for the preset processing described later, it is effective to identify the sounding location (sound source direction) not merely from the volume but from the characteristics of the sound. Sound can be given a characteristic by, for example, outputting from the speaker 59 a sine wave of constant frequency, a sine wave whose frequency changes at a fixed period, white noise that switches on and off at a fixed period, or a registered utterance. As a result, even when preset processing is performed in a noisy place (for example, a room undergoing interior work, or a shopping street), the sound source direction detection unit 52 can identify the characteristic sound among the sounds collected by the microphone array device MA. Likewise, when preset processing is performed at night in quiet conditions, the sound source direction detection unit 52 can identify the characteristic sound among the collected sounds even if its volume is low.
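As an illustration of the characteristic sounds listed above, the following sketch generates three of them as lists of samples. The function names, the default sample rate, and the use of a stepped (rather than continuously swept) frequency change are assumptions made for this example only.

```python
import math
import random


def sine_tone(freq_hz, duration_s, sample_rate=16000):
    """Sine wave of constant frequency."""
    n = round(duration_s * sample_rate)
    return [math.sin(2 * math.pi * freq_hz * i / sample_rate) for i in range(n)]


def stepped_sine(freqs_hz, step_s, duration_s, sample_rate=16000):
    """Sine wave whose frequency changes at a fixed period (cycles through freqs_hz)."""
    step = max(1, round(step_s * sample_rate))
    phase = 0.0
    out = []
    for i in range(round(duration_s * sample_rate)):
        freq = freqs_hz[(i // step) % len(freqs_hz)]
        out.append(math.sin(phase))
        # Accumulate phase so the waveform stays continuous at frequency changes.
        phase += 2 * math.pi * freq / sample_rate
    return out


def gated_noise(on_s, off_s, duration_s, sample_rate=16000, seed=0):
    """White noise that switches on and off at a fixed period."""
    rng = random.Random(seed)
    on = round(on_s * sample_rate)
    period = on + round(off_s * sample_rate)
    n = round(duration_s * sample_rate)
    return [rng.uniform(-1.0, 1.0) if (i % period) < on else 0.0 for i in range(n)]


tone = sine_tone(1000.0, 0.1)
chirp = stepped_sine([800.0, 1200.0], 0.05, 0.2)
burst = gated_noise(0.01, 0.01, 0.04, sample_rate=1000)
print(len(tone), len(chirp), len(burst))
```

Any of these signals could serve as the known reference when the collected sound is later searched for a matching characteristic.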

The direction of the sound source that emits the characteristic sound can be identified by the following two methods. In the first method, the sound source direction detection unit 52 analyzes the characteristics of the sound collected by the microphone array device MA in order, starting from the location with the highest volume, and judges the direction of the sound whose characteristics match to be the direction of the sound source. In the second method, the sound source direction detection unit 52 divides the imaging target area, searches each of the resulting areas (also called divided areas) for the sound characteristics, and judges the direction of the divided area in which the characteristics match to be the direction of the sound source. In the second method, the sound source direction detection unit 52 captures audio data over a fixed collection time and then searches for the characteristics, so the volume does not matter and a low volume suffices. The characteristic sound can therefore be identified without disturbing the surroundings, and even in a noisy place. As a third method, the direction can also be determined from the cross-correlation with the emitted signal. As described later, the sound emitted by the sound source during preset processing may be, for example, a human voice, a buzzer, or sound output from a speaker.
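The per-area search and the cross-correlation test can be combined in a minimal sketch. It assumes that a separate signal is already available for each divided area (for example, the output of directivity formed toward that area) and that the emitted signal is known as a reference. The names `normalized_correlation` and `best_matching_area` are hypothetical, and the brute-force loop is chosen for clarity rather than speed.

```python
import math


def normalized_correlation(signal, reference):
    """Peak normalized cross-correlation of `reference` over all lags inside `signal`."""
    n = len(reference)
    ref_norm = math.sqrt(sum(r * r for r in reference))
    best = 0.0
    for lag in range(len(signal) - n + 1):
        window = signal[lag:lag + n]
        win_norm = math.sqrt(sum(w * w for w in window))
        if win_norm == 0.0 or ref_norm == 0.0:
            continue  # silent window: nothing to compare
        score = sum(w * r for w, r in zip(window, reference)) / (win_norm * ref_norm)
        best = max(best, score)
    return best


def best_matching_area(area_signals, reference):
    """Return the label of the divided area whose signal best matches the reference."""
    return max(area_signals,
               key=lambda label: normalized_correlation(area_signals[label], reference))


# Reference: 1 kHz tone sampled at 8 kHz (64 samples).
reference = [math.sin(2 * math.pi * 1000 * i / 8000) for i in range(64)]
areas = {
    # Area A contains an unrelated 300 Hz tone at full volume.
    "A": [math.sin(2 * math.pi * 300 * i / 8000) for i in range(128)],
    # Area B contains the reference at low volume, starting at sample 20.
    "B": [0.0] * 20 + [0.2 * r for r in reference] + [0.0] * 44,
}
print(best_matching_area(areas, reference))
```

Because the score is normalized by the energy of both the window and the reference, the quiet copy in area B still wins over the louder unrelated tone in area A, which is the property the text attributes to searching by characteristic rather than by volume.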

The directivity forming unit 53 uses audio data transferred directly from the microphone array device MA, or audio data recorded in the recorder device 40, and through directivity control processing adds the audio data collected by the microphones M1 to Mn to generate audio data in which directivity is formed in a specific direction, so that the sound (volume level) in the specific direction from the positions of the microphones M1 to Mn of the microphone array device MA is emphasized (amplified). The specific direction is the direction (also called the directivity direction) from the microphone array device MA toward the real-space position corresponding to the position designated with the operation unit 57. The directivity control processing of audio data for forming the directivity of sound collected by the microphone array device MA is a known technique, as shown in, for example, JP 2014-143678 A (Patent Document 1 above).
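One common way to form directivity by adding microphone signals is delay-and-sum beamforming. The sketch below is an illustration of that general technique, not the specific processing of Patent Document 1. It assumes a far-field source, a circular array lying in the X-Y plane with a 0.1 m radius, whole-sample delays, and the angle convention in which φ is the tilt from the Z axis; all function names are hypothetical.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value


def circular_array(num_mics=8, radius=0.1):
    """Microphone positions on a circle in the X-Y plane (assumed geometry)."""
    return [(radius * math.cos(2 * math.pi * k / num_mics),
             radius * math.sin(2 * math.pi * k / num_mics),
             0.0) for k in range(num_mics)]


def steering_delays(mics, theta_deg, phi_deg, sample_rate):
    """Whole-sample delay per channel that aligns a far-field wavefront from (theta, phi)."""
    t, p = math.radians(theta_deg), math.radians(phi_deg)
    u = (math.sin(p) * math.cos(t), math.sin(p) * math.sin(t), math.cos(p))
    # Microphones with a larger projection onto u are nearer the source and
    # hear the wavefront earlier, so they need the larger delay.
    proj = [m[0] * u[0] + m[1] * u[1] + m[2] * u[2] for m in mics]
    base = min(proj)
    return [round((q - base) / SPEED_OF_SOUND * sample_rate) for q in proj]


def delay_and_sum(channels, delays):
    """Delay each channel by its steering delay and average the channels."""
    length = len(channels[0])
    out = [0.0] * length
    for channel, d in zip(channels, delays):
        for i in range(d, length):
            out[i] += channel[i - d]
    return [v / len(channels) for v in out]


# Simulate an impulse arriving from theta = 30 deg, phi = 60 deg.
mics = circular_array()
delays = steering_delays(mics, 30.0, 60.0, 48000)
channels = []
for d in delays:
    ch = [0.0] * 100
    ch[50 - d] = 1.0  # nearer microphones receive the impulse earlier
    channels.append(ch)

steered = delay_and_sum(channels, delays)
unsteered = delay_and_sum(channels, [0] * len(mics))
print(max(steered), max(unsteered))
```

When the array is steered toward the true direction, the impulses from all channels line up and add coherently; without steering they stay spread out in time and the averaged peak is much lower.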

The input/output control unit 54 controls the input and output of various data to and from the operation unit 57, the display 58, and the speaker 59. The operation unit 57 is, for example, arranged in correspondence with the screen of the display 58 and configured using a touch panel or touch pad that accepts input from the user's finger or a stylus pen. In response to a user operation, the operation unit 57 outputs to the signal processing unit 51 the data of one or more designated locations (coordinates) at which the user wants the volume level of the audio data to be emphasized (amplified). The operation unit 57 may instead be configured using a pointing device such as a mouse or a keyboard.

The display 58 displays an audio map 65 that shows the sound source positions estimated by the sound source direction detection unit 52. The speaker 59 outputs the audio data collected by the microphone array device MA and transferred via the network 15, the audio data recorded in the recorder device 40, or audio data that the directivity forming unit 53 has emphasized in a specific direction on the basis of that audio data.

The video processing device 70, in contrast, operates independently without being linked to the audio processing device 50 and, following operation instructions from the user, performs control to display the video data captured by the fixed camera devices C1 to Cn on the display 58. That is, when the user selects the camera device to be used, the video processing device 70 displays the video from the selected camera device on the camera monitor 71 with which it is equipped.

FIG. 3 is a block diagram showing the configuration of the microphone array device MA. The microphone array device MA collects sound from all directions (360 degrees) and includes a plurality of microphone units (also simply called microphones) M1 to Mn (here, n = 8), a plurality of amplifiers PA1 to PAn that amplify the output signals of the microphone units M1 to Mn, a plurality of A/D converters A1 to An that convert the analog signals output from the amplifiers PA1 to PAn into digital signals, an encoding unit 25, a storage unit 24, and a transmission unit 26.

The storage unit 24 stores preset information that represents the correspondence between the predetermined positions (preset positions P1 to Pn) imaged by the camera devices C1 to Cn and the directivity directions from the microphone array device MA (specifically, pairs of a horizontal angle θ and a vertical angle φ). The encoding unit 25 adds the preset information stored in the storage unit 24 to the digital audio signals output from the A/D converters A1 to An to generate an audio data packet PKT. The transmission unit 26 transmits the audio data packet PKT generated by the encoding unit 25 to the audio processing device 50 via the network 15.
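The preset information can be pictured as a small lookup table that maps each preset position to camera information and a directivity direction. The sketch below is only an illustration: the field names, the use of a string for the camera information, and the class names `PresetEntry` and `PresetTable` are assumptions, since this passage does not define the table layout.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PresetEntry:
    """One row of the preset information (assumed layout)."""
    camera_info: str   # e.g. an identifier of the camera imaging the preset position
    theta_deg: float   # horizontal angle from the microphone array
    phi_deg: float     # vertical angle from the microphone array


class PresetTable:
    """Maps preset position labels (e.g. "P1") to camera information and direction."""

    def __init__(self):
        self._entries = {}

    def register(self, preset_id, camera_info, theta_deg, phi_deg):
        """Store or overwrite the entry for one preset position."""
        self._entries[preset_id] = PresetEntry(camera_info, theta_deg, phi_deg)

    def direction_for_camera(self, camera_info):
        """Return (theta, phi) for the given camera, or None if it is not registered."""
        for entry in self._entries.values():
            if entry.camera_info == camera_info:
                return entry.theta_deg, entry.phi_deg
        return None


table = PresetTable()
table.register("P1", "C1", 45.0, 30.0)   # illustrative values only
table.register("P2", "C2", 120.0, 35.0)
print(table.direction_for_camera("C2"))
```

Looking up the direction by camera is the operation needed during monitoring: once the user selects a camera, the stored angles tell the directivity processing where to point.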

In this way, the microphone array device MA amplifies the output signals of the microphones M1 to Mn with the amplifiers PA1 to PAn, converts them into digital audio signals with the A/D converters A1 to An, adds the preset information stored in the storage unit 24 to the digital audio signals to generate an audio data packet PKT, and transmits the audio data packet PKT to the audio processing device 50 in the PC 30 via the network 15.

FIG. 4 is a diagram showing the structure of the audio data packet PKT transmitted from the microphone array device MA to the audio processing device 50. The audio data packet PKT consists of a header and the audio data as its payload. The header contains the preset information described above.
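A header-plus-payload packet of this kind could be serialized as in the following sketch. The byte layout is entirely hypothetical (a little-endian entry count followed by one camera number and two 32-bit float angles per preset); the source states only that the header carries the preset information and the payload carries the audio data.

```python
import struct

# Hypothetical layout: uint16 entry count, then per entry
# uint16 camera number, float32 theta, float32 phi (little-endian, no padding).
_COUNT = struct.Struct("<H")
_ENTRY = struct.Struct("<Hff")


def build_packet(presets, audio_payload):
    """Prefix the audio payload with a header that carries the preset entries."""
    header = _COUNT.pack(len(presets))
    header += b"".join(_ENTRY.pack(cam, theta, phi) for cam, theta, phi in presets)
    return header + audio_payload


def parse_packet(packet):
    """Split a packet back into (list of preset entries, audio payload)."""
    (count,) = _COUNT.unpack_from(packet, 0)
    offset = _COUNT.size
    presets = []
    for _ in range(count):
        presets.append(_ENTRY.unpack_from(packet, offset))
        offset += _ENTRY.size
    return presets, packet[offset:]


packet = build_packet([(1, 45.0, 30.5), (2, 120.0, 60.25)], b"\x01\x02\x03\x04")
presets, audio = parse_packet(packet)
print(presets, audio)
```

Carrying the table in every packet keeps the receiver stateless, at the cost of repeating a few bytes per packet.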

In this embodiment, the preset information is transmitted from the microphone array device MA to the audio processing device 50 by including it in the audio data packet PKT, but the audio processing device 50 may acquire the preset information in other ways. For example, the audio processing device 50 may read the preset information together with the initial information of the microphone array device MA when it reads that initial information. Alternatively, the microphone array device MA may transmit the preset information in response to a request from the audio processing device 50.

FIG. 5 is a diagram showing the layout of the store 10 in which the sound collection system 5 is installed. As an example, the store 10, such as a convenience store, contains an "entrance", a "register R1", a "register R2", three rows of "product shelves", a "boxed-lunch shelf", a "prepared-food shelf", a "beverage (drink) shelf", and a "magazine shelf". The microphone array device MA is installed on the ceiling of the store 10, and a plurality of camera devices C1 to Cn (here, n = 4) are installed on the upper part of the walls or on the ceiling of the store 10.

In FIG. 5, the camera devices C1 to Cn are each aimed so as to image one of a plurality of preset positions P1 to Pn set in advance in the store 10. The preset positions P1 to Pn are determined by the user as the imaging target areas to be monitored. The imaging ranges of the video captured by the camera devices C1 to Cn are denoted CR1 to CRn, and the preset positions P1 to Pn lie at approximately the centers of the imaging ranges CR1 to CRn, respectively.

The camera devices C1 and C2 capture video of the preset position P1 at “register R1” and the preset position P2 at “register R2”, respectively, which are monitoring target areas (sound collection areas). The camera device C3 captures video of the preset position P3 at the “magazine shelf”, which is also a sound collection area. When the microphone array device MA is added later to a store 10 in which the camera devices C1 to C4 are already installed, the preset processing described below is performed.

The operation of the sound collection system 5 configured as above will now be described. First, the preset processing performed before operation (monitoring) of the sound collection system 5 begins is described. Here, preset processing means setting the directions from the microphone array device MA toward the predetermined positions (preset positions) P1 to Pn in the store 10, that is, the horizontal angle θ and the vertical angle φ measured from the center of the microphone array device MA. FIG. 6 is a diagram outlining the preset processing. The microphone array device MA and the camera devices C1 to Cn are installed on the ceiling RF of the store 10. The camera devices C1 to Cn, which are fixed cameras, are aimed at the preset positions P1 to Pn, and the video they capture is displayed on a camera monitor 71 provided in the video processing device 70. During the preset work, a sound source is placed at the preset positions P1 to Pn. As described earlier, examples of the sound source include a human voice, a buzzer, and sound output from a speaker. FIG. 6 shows the case where a speaking person 81 standing on the floor FLR speaks at the preset positions P1 to Pn. When the microphone array device MA picks up the sound, it transmits the audio data to the audio processing device 50. The audio processing device 50 displays the position of the source of the collected sound as a sound source mark (marker) SD on the screen of the display 58 (the audio map 65 described later).

Note that the display 58 of the audio processing device 50 may be used in place of the camera monitor 71. When the audio processing device 50 and the video processing device 70 are integrated into a single monitoring device, the display can serve as the camera monitor by switching its screen (window), or can show both at once in a split view.

FIG. 7 is a flowchart showing the sound collection procedure during preset processing and monitoring. Preset processing is performed after the microphone array device MA has been mounted on the ceiling of the store 10. First, the audio processing device 50 performs the initial setting of the microphone array device MA (S1). In this initial setting, the audio processing device 50 sets the IP address of the microphone array device MA so that it can communicate. The audio processing device 50 then enters preset mode and displays the audio map 65 (see FIG. 8) on the display 58.

When the initial setting of the microphone array device MA is finished, a sound source is placed at the preset positions P1 to Pn at which the camera devices C1 to Cn are aimed, and it emits sound at or above a predetermined volume for a predetermined time (S2). Here, the speaking person 81 acts as the sound source and speaks at the preset positions P1 to Pn. The microphone array device MA picks up this sound and transmits the audio data to the audio processing device 50.

The communication unit 56 in the audio processing device 50 receives the audio data transmitted from the microphone array device MA (S3). Based on the volume of the received audio data, the sound source direction detection unit 52 in the audio processing device 50 obtains the directivity direction (horizontal angle θ and vertical angle φ) from the microphone array device MA toward the sound source, and displays a sound source mark SD1 representing the source position on the audio map 65 shown on the display 58 (S4). FIG. 8 is a diagram showing the screen of the display 58 on which the audio map 65 is displayed during preset processing.

The audio map 65 is drawn with the position of the microphone array device MA as its center point O, and consists of three concentric circles 65h, 65i, and 65j and radial line segments 65m that divide the full angle around the center into 12 equal parts. The innermost circle 65h corresponds to a vertical angle φ = 30°, the middle circle 65i to φ = 60°, and the outermost circle 65j to φ = 90°. Thus, the further inward the sound source mark SD1 lies, the closer the source is to the microphone array device MA. The line segment 65m extending horizontally to the right from the center point O has a central angle of 0° and corresponds to the horizontal angle θ = 0°. The 12 line segments 65m represent horizontal angles from 0° to 360° in 30° steps starting from the central angle 0°. Here, the sound source mark SD1 is drawn at the coordinates (θ, φ) = (240°, 70°) on the audio map 65. Because the sound source mark SD1 has not yet been confirmed as the sound source at a preset position, it is drawn as a rectangle. In FIG. 8, angles such as 30° and 60° are labeled for explanation, but they need not be displayed. The scale may also use a different interval, for example a vertical angle every 15°.
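
As an illustration of the audio map geometry described above, the following minimal sketch converts a directivity direction (θ, φ) into a drawing position relative to the center point O. The function name `map_position`, the linear relation between φ and the radius, the counterclockwise sense of θ, and the upward-pointing y axis are assumptions made for this example; the description only states that the circles correspond to φ = 30°, 60°, and 90° and that θ = 0° points to the right.

```python
import math


def map_position(theta_deg, phi_deg, outer_radius=1.0):
    """Return (x, y) of a sound source mark relative to the center point O.

    Assumes the radius grows linearly with the vertical angle, so that
    phi = 90 degrees lands on the outermost circle.
    """
    r = outer_radius * phi_deg / 90.0
    theta = math.radians(theta_deg)
    # theta = 0 points right from O; counterclockwise rotation is assumed.
    return (r * math.cos(theta), r * math.sin(theta))


# The mark SD1 at (240 deg, 70 deg) falls between the middle and outer circles.
x, y = map_position(240.0, 70.0)
print(round(x, 3), round(y, 3))
```

With these assumptions, a mark at φ = 30° sits on the innermost circle at one third of the outer radius.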

The user (the speaking person) selects the sound source mark SD1 displayed on the display 58 with the cursor 87 and inputs the information (camera information) of the camera device C1 corresponding to the sound source mark SD1 (S5). When the sound source mark SD1 is selected, a camera information input field 88 is displayed in the lower right corner of the screen of the display 58. A location (for example, register R1) and a camera IP address can be entered in the camera information input field 88 by user operation. Instead of the user selecting a sound source mark, the audio processing device may automatically recognize the direction of a sound when the volume of the sound picked up by the microphone array device stays at or above a threshold for a predetermined time, display a sound source mark on the display, and prompt the user to input the camera information.
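
The automatic variant mentioned above, in which a sound is accepted once its volume stays at or above a threshold for a predetermined time, could be sketched as follows. The function name `sustained_loud`, the representation of volume as one level value per frame, and the expression of the predetermined time as a frame count are assumptions for illustration and are not taken from the description.

```python
def sustained_loud(levels, threshold, min_frames):
    """Return the index of the frame at which the level has stayed at or
    above `threshold` for `min_frames` consecutive frames, or None.
    """
    run = 0
    for i, level in enumerate(levels):
        # Extend the current run of loud frames, or reset it on a quiet frame.
        run = run + 1 if level >= threshold else 0
        if run >= min_frames:
            return i
    return None


# A short burst is ignored; the later three-frame run triggers detection.
print(sustained_loud([0.1, 0.9, 0.8, 0.2, 0.9, 0.9, 0.9], 0.5, 3))
```

Requiring a consecutive run is one way to keep brief noises from being mistaken for the deliberately placed sound source.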

The audio processing device 50 reads the horizontal angle θ and the vertical angle φ of the sound source mark SD1 (S6), and transmits the camera information input in step S5 together with the horizontal angle θ and the vertical angle φ of the sound source mark SD1 to the microphone array device MA (S7). The microphone array device MA registers the preset information transmitted from the audio processing device 50 in the preset information table 90 (see FIG. 9) and stores it in the storage unit 24.

FIG. 9 is a diagram showing the contents registered in the preset information table 90 stored in the microphone array device MA. For each of the preset positions P1 to Pn included in the imaging target areas of the camera devices C1 to Cn, the preset information table 90 holds a camera IP address, a location, and a directivity direction (horizontal angle θ, vertical angle φ).
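
One possible in-memory shape for such a table is sketched below. The class names `PresetEntry` and `PresetTable`, the choice of the location name as lookup key, and the IP address (taken from a documentation address range) are hypothetical; only the three registered fields and the direction (240°, 70°) of the mark SD1 come from the description.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PresetEntry:
    camera_ip: str    # camera IP address
    location: str     # place name, e.g. "Register R1"
    theta_deg: float  # horizontal angle of the directivity direction
    phi_deg: float    # vertical angle of the directivity direction


class PresetTable:
    """Holds one entry per preset position, keyed by location (an assumption)."""

    def __init__(self):
        self._entries = {}

    def register(self, entry):
        # Registering the same location again overwrites the earlier entry.
        self._entries[entry.location] = entry

    def direction(self, location):
        entry = self._entries.get(location)
        return None if entry is None else (entry.theta_deg, entry.phi_deg)

    def entries(self):
        return list(self._entries.values())


table = PresetTable()
table.register(PresetEntry("192.0.2.11", "Register R1", 240.0, 70.0))
print(table.direction("Register R1"))
```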

The user determines whether the setting processing has been completed for all preset positions (S8). If it has not, the preset processing returns to step S2 and the same processing is repeated.

When the preset processing has been completed for all preset positions, the preset processing before the start of operation is complete, and operation (actual monitoring) begins. The audio processing device 50 acquires audio data from the microphone array device MA and displays on the display 58 all the preset positions extracted from the preset information contained in the header of the audio data packet PKT (S9). FIG. 10 is a diagram showing the screen of the display 58 after preset processing. On the audio map 65 shown on the display 58, the sound source marks SD1, SD2, SD3, and SD4, confirmed as the sound source directions (directivity directions) of the preset positions “register R1”, “register R2”, “magazine shelf”, and “beverage shelf”, respectively, are drawn as circles. When the sound source marks SD1 to SD4 need not be distinguished, they are referred to collectively as the sound source mark SD. Camera information 67 is displayed in the lower right corner of the screen of the display 58. The camera information 67 includes the preset position and the camera IP address corresponding to each of the camera devices C1 to Cn.

When the user designates, via the operation unit 57, one of the preset positions displayed on the display 58 (S10), the audio processing device 50 reads the horizontal angle θ and the vertical angle φ representing the directivity direction of the designated preset position (S11). The directivity forming unit 53 in the audio processing device 50 forms the directivity of the audio data in the directivity direction of the preset position specified by the read horizontal angle θ and vertical angle φ, and outputs the sound from the speaker 59 (S12).
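
This passage does not state how the directivity forming unit 53 forms directivity. As one common technique, a delay-and-sum beamformer delays each microphone channel so that sound arriving from the chosen direction adds in phase. The sketch below computes only those per-microphone steering delays for a far-field source. The names `direction_vector` and `steering_delays`, the microphone coordinates, and the angle convention (φ = 0° straight down from the ceiling-mounted array, φ = 90° horizontal, θ counterclockwise from the x axis) are assumptions for this example.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s in air at roughly 20 degrees C (approximate)


def direction_vector(theta_deg, phi_deg):
    """Unit vector from the array center toward the source (assumed convention)."""
    theta = math.radians(theta_deg)
    phi = math.radians(phi_deg)
    return (math.sin(phi) * math.cos(theta),
            math.sin(phi) * math.sin(theta),
            -math.cos(phi))


def steering_delays(mic_positions, theta_deg, phi_deg):
    """Delays in seconds that align a plane wave arriving from (theta, phi).

    A microphone lying further toward the source hears the wavefront earlier,
    so it receives the larger delay; the smallest delay is normalized to zero.
    """
    u = direction_vector(theta_deg, phi_deg)
    proj = [sum(p_k * u_k for p_k, u_k in zip(p, u)) for p in mic_positions]
    base = min(proj)
    return [(p - base) / SPEED_OF_SOUND for p in proj]


# Two hypothetical microphones 10 cm apart along the x axis.
mics = [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0)]
print(steering_delays(mics, 0.0, 90.0))
```

Summing the channels after applying these delays emphasizes sound from the preset direction relative to sound from other directions.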

If the user designates another position displayed on the display 58 while sound collected during operation (that is, actual monitoring) is being played back (S13, YES), the audio processing device 50 reads the horizontal angle and the vertical angle from the preset information of the designated position (S11), forms the directivity of the audio data in that directivity direction, and outputs the sound from the speaker 59 (S12).

If no new position is designated (S13, NO), the audio processing device 50 continues playback until the power is turned off (S14). Alternatively, instead of maintaining the directivity until the power is turned off, the directivity may be released at the user's instruction so that the overall sound is monitored until the next position is designated.

FIG. 11 is a diagram showing the screen of the display 58 during monitoring and the sound output of the speaker 59. During operation (monitoring), when the user selects, for example, the sound source mark SD3 via the operation unit 57, the audio processing device 50 forms the directivity of the audio data in the directivity direction (θ3, φ3) of the sound source mark SD3, that is, the direction of the magazine shelf (preset position P3), and outputs the resulting sound from the speaker 59. If the audio processing device 50 then detects an abnormal (loud) sound whose volume exceeds a threshold at or near the preset position P3, it blinks the sound source mark SD3 to notify the user. Blinking is used here to make the sound source mark identifiable, but the color, shape, size, or the like may be changed instead. The audio processing device 50 may also change the color, size, or shape of the sound source mark according to the volume at the preset position, or change the text color or background color of only the corresponding entry in the camera information.
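
The threshold check behind the blinking mark might look like the following sketch. Measuring volume as the root-mean-square of a frame of samples, and the names `rms` and `mark_state`, are assumptions for illustration; the description does not specify how the volume is computed.

```python
import math


def rms(samples):
    """Root-mean-square level of one frame of audio samples."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def mark_state(samples, threshold):
    """Return "blink" when the frame's level exceeds the threshold, else "steady"."""
    return "blink" if rms(samples) > threshold else "steady"


print(mark_state([0.5, -0.5, 0.5, -0.5], 0.4))
print(mark_state([0.1, -0.1], 0.4))
```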

As described above, in the sound collection system 5 of the first embodiment, when the microphone array device MA is added later to the store 10 in which the camera devices C1 to C4 are already installed, the preset processing places a sound source at each of the preset positions P1 to Pn, which lie in the optical axis directions (imaging centers) of the camera devices C1 to Cn, and has it emit sound. When the microphone array device MA picks up the sound output from the sound source and sends the audio data to the audio processing device 50, the sound source direction detection unit 52 displays on the display 58 the sound source marks SD1 to SD4 representing the directivity directions (horizontal angle θ, vertical angle φ), and prompts the user to select a mark and input the camera information. The audio processing device 50 transmits the input camera information and the directivity direction to the microphone array device MA. The microphone array device MA registers the camera information and the directivity direction in the preset information table 90 and stores them in the storage unit 24. During operation, when the user selects one of the sound source marks SD1 to SD4 on the audio map 65 shown on the display 58, the directivity forming unit 53 forms the directivity of the audio data collected by the microphone array device MA in the directivity direction (horizontal angle θ, vertical angle φ) corresponding to that sound source mark, and the audio processing device 50 outputs the sound from the speaker 59.

As a result, even when the positional relationship between the camera devices C1 to Cn and the microphone array device MA is unknown, the sound collection system 5 can form directivity in the directivity directions from the microphone array device MA toward the predetermined imaging positions (that is, the preset positions P1 to Pn), so that sound from a source at those positions can be heard clearly. The sound collection system 5 therefore does not need on-site measurement of mounting positions and orientations or geometric calculations to obtain the correspondence between the coordinate system of the camera devices and that of the microphone array device, and the camera devices and the microphone array device can be associated with each other easily. In addition, the audio processing device 50 can obtain the preset information using only the microphone array device MA.

During operation (monitoring), the sound collection system 5 can form the directivity of the audio data in the directivity direction (horizontal angle θ, vertical angle φ) associated with a preset position on the basis of the preset information, and output the sound collected at that preset position from the speaker 59.

The display 58 shows the audio map 65, drawn as concentric circles centered on the microphone array device MA in which the central angle represents the horizontal angle and the radius represents the magnitude of the vertical angle, and displays the sound source marks SDn on this audio map 65, so the user can easily see the preset position Pn indicated by each sound source mark SDn.

When the operation unit 57 accepts the designation of a sound source mark SD displayed on the audio map 65, the directivity forming unit 53 forms the directivity of the audio data collected by the microphone array device MA in the directivity direction associated with the designated sound source mark SD, so the user can listen to the sound emitted at the preset position Pn with a simple operation.

When the volume of the sound emitted at a preset position P exceeds the threshold, the display 58 blinks the corresponding sound source mark SD, so the user is notified promptly that a loud (abnormal) sound exceeding the threshold has been detected.

Because the preset information is written in the header of the audio data packet PKT, the directivity direction of each preset position can be obtained from the audio data alone. Moreover, because the preset information table 90 is stored in the storage unit 24 of the microphone array device MA, the correspondence between each microphone array device MA and its preset information need not be managed separately, even when a plurality of microphone array devices are installed.
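
To make the idea of carrying preset information in the packet header concrete, the sketch below packs a list of presets ahead of the audio payload and recovers both on the receiving side. The wire format shown (a 4-byte big-endian length followed by a JSON-encoded header) and the names `build_packet` and `parse_packet` are purely hypothetical; this passage does not define the actual layout of the packet PKT.

```python
import json
import struct


def build_packet(presets, audio):
    """Prefix the audio payload with a length-delimited JSON header (hypothetical format)."""
    header = json.dumps(presets).encode("utf-8")
    return struct.pack(">I", len(header)) + header + audio


def parse_packet(packet):
    """Split a packet built by build_packet into (presets, audio payload)."""
    (header_len,) = struct.unpack(">I", packet[:4])
    presets = json.loads(packet[4:4 + header_len].decode("utf-8"))
    return presets, packet[4 + header_len:]


packet = build_packet(
    [{"location": "Register R1", "theta": 240.0, "phi": 70.0}],
    b"\x01\x02\x03",
)
presets, audio = parse_packet(packet)
print(presets[0]["location"], len(audio))
```

Because every packet is self-describing in this scheme, a receiver needs no separate lookup to know which directions belong to which microphone array.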

(Second Embodiment)
In the first embodiment, the audio processing device and the video processing device operate separately. The second embodiment describes the case where the audio processing device and the video processing device are integrated into a monitoring device, and the video captured by the camera devices and the audio map are displayed simultaneously on the display screen.

FIG. 12 is a block diagram showing the configuration of a sound collection system 5A according to the second embodiment. The sound collection system of the second embodiment has substantially the same configuration as that of the first embodiment. Components identical to those of the first embodiment are given the same reference numerals, and their description is omitted.

In the sound collection system 5A, a monitoring device 100 is connected to the network 15. The monitoring device 100 includes an audio processing unit 105, a video processing unit 107, an operation unit 117, a display 118, and a speaker 119.

Like the signal processing unit 51 in the audio processing device 50 of the first embodiment, the audio processing unit 105 has the functions of the sound source direction detection unit 52, the directivity forming unit 53, and the input/output control unit 54. The video processing unit 107 displays the video data captured by the fixed camera devices C1 to Cn on the display 118 in accordance with operation instructions from the user.

The operation unit 117 is, for example, a touch panel or touch pad arranged to correspond to the screen of the display 118 and operable with the user's finger or a stylus pen. In response to a user operation, the operation unit 117 outputs to the audio processing unit 105 the coordinates of one or more designated locations where emphasis (amplification) of the volume level of the audio data is desired. The operation unit 117 may instead be configured with a pointing device such as a mouse or a keyboard.

The display 118 displays video (images) based on the video data captured by the camera devices C1 to Cn and either transferred via the network 15 or recorded in the recorder device 40.

The speaker 119 outputs the audio data collected by the microphone array device MA and either transferred via the network 15 or recorded in the recorder device 40, or audio data that the audio processing unit 105 has emphasized in a specific direction on the basis of that audio data.

The operation of the sound collection system 5A configured as above will now be described. FIG. 13 is a flowchart showing the sound collection procedure during preset processing and monitoring. FIG. 14 is a flowchart, continuing from FIG. 13, showing the sound collection procedure during preset processing and monitoring. Steps identical to those of the first embodiment are given the same step numbers, and their description is omitted.

In step S1, the audio processing unit 105 sets the IP address of the microphone array device MA so that it can communicate. The audio processing unit 105 then enters preset mode and displays the audio map 65 on the display 118. The video processing unit 107 searches for and detects the camera devices C1 to Cn connected to the network 15 by broadcasting to all of them and receiving their responses (S1A).

The audio processing unit 105 stores the total number n of camera devices found by the search and their IP addresses in a memory (not shown) in the audio processing unit 105 (S1B). The video processing unit 107 displays the video captured by the detected camera devices C1 to Cn on the screen of the display 118. FIG. 15 is a diagram showing the screen of the display 118 during preset processing. On the left side of the screen, thumbnails SZ1 to SZ4 of the video captured by the respective camera devices C1 to Cn are displayed so that they can be selected. When the thumbnails SZ1 to SZ4 need not be distinguished, they are referred to simply as the thumbnail SZ. Each thumbnail SZ is produced by extracting a still image at fixed intervals from the video captured by the corresponding camera device. The audio processing unit 105 displays the audio map 65 from the center to the right side of the screen of the display 118.

The audio processing unit 105 sets a variable i representing the camera device number to an initial value of 0 (S1C), and then increments the variable i by 1 (S1D). The audio processing unit 105 accepts the thumbnail SZ selected by the user via the operation unit 117 (S1E). To select a thumbnail SZ, the user moves the cursor 123 displayed on the screen of the display 118. In FIG. 15, the thumbnail SZ3 is selected, and its frame is highlighted, for example in red. As in the first embodiment, a sound source mark SD may be selected instead of a thumbnail SZ.

The user places a sound source within the imaging range of the camera device corresponding to the thumbnail SZ and has it emit sound at or above a predetermined volume for a predetermined time (S2). The sound source need not be on the optical axis of the camera device; it only has to be within the imaging range. As in the first embodiment, the user may act as the sound source by standing within the imaging range and speaking.

When the microphone array device MA picks up sound at or above the predetermined volume emitted from the sound source and transmits the audio data to the audio processing unit 105, the audio processing unit 105 receives the audio data (S3).

Based on the volume of the received audio data, the audio processing unit 105 obtains the directivity direction (horizontal angle θ and vertical angle φ) from the microphone array device MA toward the sound source, and displays a sound source mark SD representing the source position on the audio map 65 shown on the display 118 (S4). The sound source mark SD3 representing the new source position is drawn on the display 118 as a rectangle (see FIG. 15). The sound source marks SD1 and SD2 for “register R1” and “register R2” have already been confirmed and are therefore drawn as circles. The audio processing unit 105 also displays an input field 129 for the camera name (for example, a location name) in the lower right corner of the screen of the display 118 to prompt input.

The user selects a thumbnail SZ or a sound source mark SD and enters the camera information in the camera name input field 129 (S5A). Instead of the user selecting a thumbnail SZ or a sound source mark SD, the audio processing device may automatically recognize the direction of a sound when the volume of the sound picked up by the microphone array device stays at or above a threshold for a predetermined time, display a sound source mark on the display, and prompt the user to input the camera information.

The audio processing unit 105 reads the horizontal angle θ and the vertical angle φ of the sound source mark SD (S6), and transmits the camera information input in step S5A (camera name and IP address) together with the directivity direction of the microphone array device MA (horizontal angle θ and vertical angle φ) to the microphone array device MA (S7). The microphone array device MA registers the preset information transmitted from the audio processing unit 105 in the preset information table 90 and stores it in the storage unit 24.

音声処理部１０５は、変数ｉが探索されたカメラ装置の総数ｎに達したか否かを判別する（Ｓ８Ａ）。変数ｉがカメラ装置の総数ｎに達していない場合、音声処理部１０５はステップＳ１Ｄに戻り、同様の処理を繰り返す。一方、変数ｉがカメラ装置の総数ｎに達した場合、プリセット処理が完了し、運用（監視）時の処理に移行する。 The voice processing unit 105 determines whether or not the variable i has reached the total number n of searched camera devices (S8A). If the variable i has not reached the total number n of camera devices, the audio processing unit 105 returns to step S1D and repeats the same processing. On the other hand, when the variable i reaches the total number n of camera devices, the preset process is completed, and the process proceeds to the operation (monitoring) process.

During monitoring, the audio processing unit 105 acquires all the preset positions P1 to Pn from the microphone array device MA and displays them on the sound map 65 shown on the display 118 (S9A). The video processing unit 107 reads the video captured by the camera devices C1 to Cn and displays it on the screen of the display 118 (S9B). FIG. 16 shows the screen of the display 118 during monitoring and the sound output operation of the speaker 119. Here, eight camera devices C1 to C8 are installed. The images GZ1 to GZ8 captured by the camera devices C1 to C8 are displayed on the left side of the screen of the display 118. Here, the images GZ1 to GZ8 are not thumbnails but the images in which the camera devices C1 to C8 capture “Register R1”, “Register R2”, “Register R3”, “Entrance”, “Magazine shelf T2”, “Magazine shelf T1”, “Passage”, and “Service entrance”, respectively.

The sound map 65 and an operation panel 140 are displayed on the right side of the screen of the display 118. The sound source marks SD1 to SD8 are displayed on the sound map 65. The operation panel 140 provides a brightness button 141 for adjusting the brightness of the images GZ1 to GZ8, a focus button 142 for adjusting the focus of the video captured by the camera devices C1 to C8, a selection button 143 for selecting one of the camera devices C1 to C8, a volume button 145 for adjusting the volume, and a preset button 146 for switching from directional sound collection to collection of the overall sound.

When sound is to be output from the speaker 119, the audio processing unit 105 accepts the sound source mark SD or the image GZ designated by the user (S10A). The user selects a sound source mark SD on the sound map 65 by clicking it with the cursor 123, or selects one of the images GZ1 to GZ8 displayed on the screen of the display 118 by clicking it with the cursor 123. Here, the image GZ5 or the sound source mark SD5 has been selected: the frame of the image GZ5 is highlighted in red, and the sound source mark SD5 is enclosed in a rectangle with a red background. The speaker 119 outputs audio data whose directivity direction is “Magazine shelf T2”.

一方、新たな指定位置の指定が無ければ（Ｓ１３、ＮＯ）、音声処理装置５０は、電源がＯＦＦになるまで再生を続ける（Ｓ１４）。尚、ユーザがプリセットボタン１４６をクリックしてプリセットテーブルの内容を新規追加、変更、削除でき、次の位置指定があるまで全体の音をモニタリングしても良い。 On the other hand, if there is no designation of a new designated position (S13, NO), the audio processing device 50 continues reproduction until the power is turned off (S14). The user can click the preset button 146 to newly add, change, or delete the contents of the preset table, and the entire sound may be monitored until the next position is designated.

以上により、第２の実施形態の収音システム５Ａは、カメラ装置Ｃ１〜Ｃｎで撮像された映像を実際に見ながら、プリセット位置に対応する音声の指向方向を登録するプリセット処理を行うことができ、プリセット処理時の作業性が向上する。例えば、音源をプリセット位置に置く場合（発音者がプリセット位置に立つ場合も同様）、カメラ装置で撮像される映像の中心に音源を置けば良いことが簡単に分かる。また、監視時、マイクアレイ装置ＭＡの指向方向を切り替える場合、ユーザは、カメラ装置Ｃ１〜Ｃｎで撮像された映像を見て切り替え先を決めることができる。 As described above, the sound collection system 5A of the second embodiment can perform the preset processing for registering the sound directivity direction corresponding to the preset position while actually viewing the video imaged by the camera devices C1 to Cn. Workability at the time of preset processing is improved. For example, when the sound source is placed at a preset position (the same applies when the sounder stands at the preset position), it can be easily understood that the sound source may be placed at the center of the image captured by the camera device. Further, when switching the directivity direction of the microphone array device MA during monitoring, the user can determine the switching destination by looking at the images captured by the camera devices C1 to Cn.

(Third embodiment)
In the first and second embodiments, the preset information is stored in the microphone array device. The third embodiment describes a case where a plurality of microphone array devices are installed and the monitoring device centrally manages the preset information.

FIG. 17 is a block diagram showing the configuration of a sound collection system 5B according to the third embodiment. The sound collection system of the third embodiment has almost the same configuration as that of the first embodiment. Components identical to those of the first embodiment are given the same reference signs, and their description is omitted.

A plurality of microphone array devices MA1 to MAm are connected to the network 15. Unlike in the first and second embodiments, the microphone array devices MA1 to MAm do not have a storage unit for storing preset information. This embodiment shows a case where two microphone array devices MA1 and MA2 are connected to the network 15. Three or more microphone array devices may also be connected.

監視装置１００Ａは、プリセット情報が登録されるプリセット情報テーブル１３０を格納するテーブルメモリ１１０を有する。図１８は、テーブルメモリ１１０に格納されたプリセット情報テーブル１３０の登録内容を示す図である。 The monitoring apparatus 100A includes a table memory 110 that stores a preset information table 130 in which preset information is registered. FIG. 18 is a diagram showing the registration contents of the preset information table 130 stored in the table memory 110.

プリセット情報テーブル１３０には、場所、プリセット値、及びカメラＩＰアドレスが登録されている。また、プリセット値として、マイクアレイ装置の番号（ＭｉｃＮｏ．）、指向方向、及び指向性制御パラメータが登録される。指向性制御パラメータは、指向性フィルタの係数であり、それぞれの指向方向で学習制御を行うことで決まる。 In the preset information table 130, a location, a preset value, and a camera IP address are registered. Also, the microphone array device number (Mic No.), directivity direction, and directivity control parameter are registered as preset values. The directivity control parameter is a coefficient of the directivity filter, and is determined by performing learning control in each directivity direction.

As preset information, for example, location: Register R1, Mic No.: MA1, directivity direction: (θ_11, φ_11), directivity control parameters: (p_111, ..., p_11q), and camera IP address: “165.254.10.11” are registered. For the magazine shelf, preset information is registered twice, once for the microphone array device MA1 and once for the microphone array device MA2. That is, both of the following are registered: location: magazine shelf, Mic No.: MA1, directivity direction: (θ_13, φ_13), directivity control parameters: (p_131, ..., p_13q), camera IP address: “165.254.10.13”; and location: magazine shelf, Mic No.: MA2, directivity direction: (θ_23, φ_23), directivity control parameters: (p_231, ..., p_23q), camera IP address: “165.254.10.13”. When preset information is registered in duplicate for the same preset position, the preset information of whichever of the two microphone array devices MA1 and MA2 collects the sound at the higher volume is used preferentially, and the sound collected by the microphone array device MA corresponding to that preset information is output from the speaker 119.
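The duplicate-registration rule can be illustrated with a minimal sketch, given here under stated assumptions rather than as the actual implementation. The `Preset` class, the `choose_preset` function, the angle values, and the volume figures are all hypothetical; the directivity control parameters are left out for brevity. Only the locations, Mic No. values, and camera IP addresses follow the example registrations in the text.

```python
from dataclasses import dataclass


@dataclass
class Preset:
    location: str
    mic_no: str
    direction: tuple  # (theta, phi) in degrees; the values below are made up
    camera_ip: str


TABLE = [
    Preset("Register R1", "MA1", (30.0, 40.0), "165.254.10.11"),
    Preset("Magazine shelf", "MA1", (120.0, 55.0), "165.254.10.13"),
    Preset("Magazine shelf", "MA2", (250.0, 35.0), "165.254.10.13"),
]


def choose_preset(table, location, volume_by_mic):
    """Pick the preset for a location; on duplicates prefer the louder array.

    volume_by_mic maps a Mic No. to the volume currently collected by that
    array. Arrays with no reported volume are treated as the quietest.
    """
    candidates = [p for p in table if p.location == location]
    if not candidates:
        return None
    return max(candidates,
               key=lambda p: volume_by_mic.get(p.mic_no, float("-inf")))


# Both arrays hold a preset for the magazine shelf; MA2 is louder here.
chosen = choose_preset(TABLE, "Magazine shelf", {"MA1": 52.0, "MA2": 61.5})
print(chosen.mic_no, chosen.direction)
```

Keeping one row per (location, microphone array) pair, instead of one row per location, is what lets the same lookup also serve as a fallback when one array is unavailable.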

The operation of the sound collection system 5B configured as above is described next. FIG. 19 is a flowchart showing the preset processing procedure. Steps identical to those of the first and second embodiments are given the same step numbers, and their description is omitted. The case with two microphone array devices MA1 and MA2 is shown.

The audio processing unit 105 in the monitoring device 100A sets an IP address for each of the microphone array devices MA1 and MA2 so that they become able to communicate (step S1Z). The audio processing unit 105 then enters the preset mode and displays two sound maps 65A and 65B on the display 118. The video processing unit 107 searches for the camera devices C1 to Cn connected to the network 15 by broadcasting to all of them and receiving their responses (S1A).

音声処理部１０５は、探索の結果得られたカメラ装置の総数ｎと各ＩＰアドレスを音声処理部１０５内のメモリ（図示せず）に格納する（Ｓ１Ｂ）。映像処理部１０７は、探索されたカメラ装置Ｃ１〜Ｃｎで撮像された映像をディスプレイ１１８の画面に表示する。図２０は、プリセット処理時に表示されるディスプレイ１１８の画面を示す図である。ディスプレイ１１８の画面の左側には、カメラ装置Ｃ１〜Ｃｎでそれぞれ撮像された映像のサムネイルＳＺ１〜ＳＺ４が選択可能に表示される。特に、サムネイルＳＺ１〜ＳＺ４を区別する必要が無い場合、単にサムネイルＳＺと総称する。また、サムネイルＳＺは、カメラ装置Ｃ１〜Ｃｎで撮像された映像から、一定時間毎に静止画を取り出すことで表示される。また、音声処理部１０５は、ディスプレイ１１８の画面の中央から右側に２つの音声マップ６５Ａ、６５Ｂを表示する。 The voice processing unit 105 stores the total number n of camera devices and each IP address obtained as a result of the search in a memory (not shown) in the voice processing unit 105 (S1B). The video processing unit 107 displays the video captured by the searched camera devices C1 to Cn on the screen of the display 118. FIG. 20 is a diagram showing a screen of the display 118 displayed during the preset process. On the left side of the screen of the display 118, thumbnails SZ1 to SZ4 of images captured by the camera devices C1 to Cn are displayed in a selectable manner. In particular, when there is no need to distinguish the thumbnails SZ1 to SZ4, they are simply referred to as thumbnails SZ. In addition, the thumbnail SZ is displayed by taking out a still image at regular time intervals from videos captured by the camera devices C1 to Cn. In addition, the sound processing unit 105 displays two sound maps 65A and 65B on the right side from the center of the screen of the display 118.

ステップＳ２で、ユーザがサムネイルＳＺに対応するカメラ装置Ｃ１〜Ｃｎで撮像された撮像範囲に音源を設置し、所定の音量以上で所定時間発音させると、マイクアレイ装置ＭＡ１〜ＭＡｍがそれぞれ音源から発せられた所定音量以上の音声を収音し、各音声データを音声処理部１０５に送信する。音声処理部１０５は、各々のマイクアレイ装置ＭＡ１、ＭＡ２から送信された音声データを受信する（Ｓ３Ａ）。 In step S2, when the user installs a sound source in the imaging range imaged by the camera devices C1 to Cn corresponding to the thumbnail SZ and generates sound for a predetermined time at a predetermined sound volume or higher, the microphone array devices MA1 to MAm respectively emit sound from the sound source. The sound of the predetermined volume or higher is collected, and each sound data is transmitted to the sound processing unit 105. The audio processing unit 105 receives the audio data transmitted from each of the microphone array devices MA1 and MA2 (S3A).

音声処理部１０５は、各々のマイクアレイ装置ＭＡ１、ＭＡ２から受信した音声データの音量を基に、マイクアレイ装置ＭＡ１、ＭＡ２から音源に向かう指向方向（水平角θ及び垂直角φ）をそれぞれ求め、発音源位置を表す音源マークＳＤ（ＳＤ３Ａ、ＳＤ３Ｂ）をディスプレイ１１８に表示された音声マップ６５Ａ、６５Ｂ上に表示する（Ｓ４Ａ）。ディスプレイ１１８には、新たな発音源位置を表す音源マークＳＤ３Ａ、ＳＤ３Ｂが矩形で描画される（図２０参照）。なお、「レジＲ１」、「レジＲ２」の音源マークＳＤ１、ＳＤ２は、既に確定されているので、丸形で描画される。更に、音声処理部１０５は、ディスプレイ１１８の画面の右下隅にカメラ名称（例えば場所名）の入力欄１２９を表示して入力を促す。ここでは、マイクアレイ装置ＭＡ２で収音された音声の方が音量が大きいので、この音源マークＳＤ３Ｂの矩形の大きさが、マイクアレイ装置ＭＡ１に対応する音源マークＳＤ３Ａと比べて大きい。 The sound processing unit 105 obtains the directivity directions (horizontal angle θ and vertical angle φ) from the microphone array devices MA1 and MA2 toward the sound source based on the volume of the sound data received from the respective microphone array devices MA1 and MA2. The sound source mark SD (SD3A, SD3B) representing the sound source position is displayed on the voice maps 65A, 65B displayed on the display 118 (S4A). On the display 118, sound source marks SD3A and SD3B representing new sound source positions are drawn in a rectangular shape (see FIG. 20). Since the sound source marks SD1 and SD2 of “Register R1” and “Register R2” have already been determined, they are drawn in a round shape. Furthermore, the voice processing unit 105 displays an input field 129 for a camera name (for example, a place name) in the lower right corner of the screen of the display 118 to prompt input. Here, since the sound collected by the microphone array apparatus MA2 has a higher volume, the rectangular size of the sound source mark SD3B is larger than that of the sound source mark SD3A corresponding to the microphone array apparatus MA1.

音声処理部１０５は、プリセット処理の対象となる複数のマイクアレイ装置（ここではマイクアレイ装置ＭＡ１、ＭＡ２）のいずれかを選択する（Ｓ４Ｂ）。複数のマイクアレイ装置で所定音量以上の音声が収音された場合、マイクアレイ装置ＭＡ１、ＭＡ２の選択は、次の３通りの方法のいずれかで行われる。第１の方法では、音声処理部１０５が、マイクアレイ装置ＭＡ１、ＭＡ２のうち、音量の大きい方のマイクアレイ装置を選択する。第２の方法では、ユーザがマイクアレイ装置ＭＡ１、ＭＡ２の一方を選択する。第３の方法では、音量を閾値と比較し、閾値以上の音量で収音したマイクアレイ装置を選択する。この場合、マイクアレイ装置が複数選択される場合もある。 The audio processing unit 105 selects one of a plurality of microphone array devices (here, the microphone array devices MA1 and MA2) to be subjected to preset processing (S4B). When sound having a predetermined volume or higher is picked up by a plurality of microphone array devices, the microphone array devices MA1 and MA2 are selected by one of the following three methods. In the first method, the sound processing unit 105 selects a microphone array device having a larger volume among the microphone array devices MA1 and MA2. In the second method, the user selects one of the microphone array devices MA1 and MA2. In the third method, the sound volume is compared with a threshold value, and a microphone array apparatus that picks up sound with a sound volume equal to or higher than the threshold value is selected. In this case, a plurality of microphone array devices may be selected.
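The three selection methods in step S4B can be sketched as follows. This is only an illustration under assumptions: the function names, the dictionary of per-array volumes, and the numeric values are hypothetical and do not come from the description.

```python
def select_loudest(volumes):
    """First method: choose the array that collected the largest volume."""
    return max(volumes, key=volumes.get)


def select_by_user(volumes, user_choice):
    """Second method: use the array picked by the user, if it is known."""
    if user_choice not in volumes:
        raise ValueError(f"unknown microphone array: {user_choice}")
    return user_choice


def select_above_threshold(volumes, threshold):
    """Third method: every array at or above the threshold (may be several)."""
    return [mic for mic, vol in volumes.items() if vol >= threshold]


# Hypothetical volumes measured while the sound source is active.
volumes = {"MA1": 48.0, "MA2": 63.0}
print(select_loudest(volumes))
print(select_above_threshold(volumes, 45.0))
```

The third method returns a list because, as noted above, more than one microphone array device can qualify, in which case a preset would be registered for each of them.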

ユーザは、選択されたマイクアレイ装置ＭＡに対し、サムネイルＳＺ又は音源マークＳＤを選択し、そのカメラ情報をカメラ名称の入力欄１２９に入力する（Ｓ５Ｂ）。この後のステップＳ６〜Ｓ８Ａの処理は、第２の実施形態と同様である。ステップＳ８Ａで変数ｉがカメラ装置の総数ｎに達すると、音声処理部１０５は本動作を終了する。 The user selects the thumbnail SZ or the sound source mark SD for the selected microphone array device MA, and inputs the camera information to the camera name input field 129 (S5B). The subsequent steps S6 to S8A are the same as those in the second embodiment. When the variable i reaches the total number n of camera devices in step S8A, the sound processing unit 105 ends this operation.

図２１は、監視時における収音手順を示すフローチャートである。監視装置１００Ａ内の映像処理部１０７は、カメラ装置Ｃ１〜Ｃｎのいずれかを選択し、選択されたカメラ装置Ｃ１〜Ｃｎで撮像された映像をディスプレイ１１８に表示させる（Ｓ２１）。 FIG. 21 is a flowchart showing a sound collection procedure during monitoring. The video processing unit 107 in the monitoring device 100A selects any one of the camera devices C1 to Cn, and displays the video imaged by the selected camera devices C1 to Cn on the display 118 (S21).

FIG. 22 shows the screen of the display 118 during monitoring and the sound output operation of the speaker 119. A pull-down menu 160 of various items is displayed on the left side of the screen of the display 118. Here, the pull-down menu of the device tree is expanded and the camera device C2 is selected. A monitor screen 150, which shows the video captured by the selected camera device C2, is arranged in the upper part of the approximate center of the screen of the display 118. An operation panel 140A is arranged in the lower part of the approximate center of the screen. The operation panel 140A provides a brightness button 141 for adjusting the brightness of the video, a focus button 142A for adjusting the focus of the video captured by the camera devices C1 to C8, a selection button 143 for selecting one of the camera devices C1 to C8, a zoom button 147 for zooming, and a preset input field 146A used for input when a new preset position is added.

音声処理部１０５は、選択されたカメラ情報に対応するプリセット情報を読み込む（Ｓ２２）。音声処理部１０５は、このプリセット情報から得られる指向方向（水平角θ，垂直角φ）に音声データの指向性を形成する（Ｓ２３）。音声処理部１０５は、プリセット処理されたマイクアレイ装置ＭＡが複数であるか否かを判別する（Ｓ２４）。複数のマイクアレイ装置ＭＡがある場合、音声処理部１０５は、例えばプリセット処理時に決定された音量の一番大きいマイクアレイ装置ＭＡを選択する（Ｓ２５）。 The audio processing unit 105 reads preset information corresponding to the selected camera information (S22). The audio processing unit 105 forms the directivity of the audio data in the directivity direction (horizontal angle θ, vertical angle φ) obtained from the preset information (S23). The audio processing unit 105 determines whether or not there are a plurality of preset microphone array devices MA (S24). When there are a plurality of microphone array devices MA, the audio processing unit 105 selects, for example, the microphone array device MA having the largest volume determined during the preset process (S25).

The audio processing unit 105 outputs from the speaker 119 the audio data whose directivity has been formed for the selected microphone array device MA (S26). In FIG. 22, the greeting “Irasshaimase” (“Welcome”) is output from the speaker 119. The audio processing unit 105 then returns to step S21 and repeats the same operation.

As described above, the sound collection system 5B of the third embodiment includes a plurality of microphone array devices MA, so sound uttered in the store can be collected with whichever microphone array device makes it easier for the user to hear. In addition, when a plurality of microphone array devices can collect sound at or above the predetermined volume at a preset position, performing the preset processing for the microphone array device that collected the sound at the highest volume makes it possible to hear even quiet sounds without missing them. Furthermore, even if one microphone array device fails, another microphone array device can be used to listen to the sound at the same preset position.

また、監視装置１００Ａがプリセット情報を一元管理しているので、各マイクアレイ装置はプリセット情報を格納する記憶部を有しなくてよく、構成を単純化できる。また、音声処理部１０５が各マイクアレイ装置ＭＡにプリセット情報を送信しなくて済み、処理の負荷を軽減できるとともに、ネットワークのトラフィックの軽減に繋がる。 Further, since the monitoring apparatus 100A centrally manages the preset information, each microphone array apparatus does not need to have a storage unit for storing the preset information, and the configuration can be simplified. In addition, the voice processing unit 105 does not need to transmit preset information to each microphone array apparatus MA, so that the processing load can be reduced and network traffic can be reduced.

(Modification 1)
FIG. 23 shows the screen of the display 118A during monitoring in Modification 1 of the third embodiment. A monitor screen divided into nine parts is arranged in the area of the screen of the display 118A other than its lower part. On the monitor screen, the images GZ1A to GZ8A and GZ9 captured by the camera devices C1 to C9 are displayed at a slightly larger size. A pull-down menu 160A is arranged on the lower left of the screen, and an operation panel 140B on the lower right. The operation panel 140B is the same as in the second embodiment except that a preset input field 146A is arranged in place of the preset button 146. In Modification 1, no sound map is displayed.

In the monitoring device of Modification 1, when the user selects the place to listen to from among the multiple videos displayed on the screen of the display 118A, the speaker 119 outputs the sound of the place being captured. In addition, when the audio processing unit 105 receives audio data in which a loud sound has occurred, the video processing unit 107 changes the color of the frame of the video of the place where the loud sound occurred, notifying the user of where it occurred. The audio may be switched manually or automatically.

(Modification 2)
In Modification 2 of the third embodiment, the plurality of microphone array devices MA1 to MAn each have a storage unit, and the monitoring device integrates the preset information received from each of the microphone array devices MA1 to MAn to create a single preset information table. The monitoring device stores the created preset information table in the table memory. The integrated preset information stored in the table memory as the preset information table is also transmitted to each of the microphone array devices MA1 to MAn.

これにより、新たなマイクアレイ装置が接続された場合でも、監視装置は新たにプリセット処理を行う必要がなく、マイクアレイ装置からプリセット情報を取得して統合するだけで、新たなプリセット情報が登録されたプリセット情報テーブルを得ることができる。また、別の監視装置が追加された場合でも、マイクアレイ装置から別の監視装置にプリセット情報を送信し、別の監視装置が複数のマイクアレイ装置から送信されたプリセット情報を統合することで、プリセット情報テーブルを得ることができる。このように、複数のマイクアレイ装置と複数の監視装置とを組み合わせた収音システムの構築を簡単に行うことができる。 As a result, even when a new microphone array device is connected, the monitoring device does not need to perform a new preset process. New preset information is registered simply by acquiring and integrating preset information from the microphone array device. A preset information table can be obtained. In addition, even when another monitoring device is added, by transmitting preset information from the microphone array device to another monitoring device, by integrating the preset information transmitted from the plurality of microphone array devices by another monitoring device, A preset information table can be obtained. In this way, it is possible to easily construct a sound collection system in which a plurality of microphone array devices and a plurality of monitoring devices are combined.
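The integration step of Modification 2 might look like the sketch below, offered as an assumption-laden illustration only. The `merge_presets` function, the dictionary layout of each preset entry, and the angle values are hypothetical; the description does not specify how the per-device preset information is represented or transferred.

```python
def merge_presets(per_device_presets):
    """Integrate per-device preset lists into one preset table.

    per_device_presets maps a microphone array id to a list of dicts with
    the keys "location", "direction" and "camera_ip". Each output row is a
    copy of the entry with the originating "mic_no" added.
    """
    table = []
    for mic_no in sorted(per_device_presets):
        for entry in per_device_presets[mic_no]:
            table.append(dict(entry, mic_no=mic_no))
    # Group rows for the same location next to each other.
    table.sort(key=lambda row: (row["location"], row["mic_no"]))
    return table


# Hypothetical preset information received from two microphone arrays.
received = {
    "MA1": [{"location": "Register R1", "direction": (30.0, 40.0),
             "camera_ip": "165.254.10.11"}],
    "MA2": [{"location": "Magazine shelf", "direction": (250.0, 35.0),
             "camera_ip": "165.254.10.13"}],
}
table = merge_presets(received)
print([(row["location"], row["mic_no"]) for row in table])
```

Because the merge only reads what each device reports, adding a new microphone array device or a second monitoring device amounts to running the same function over a larger input.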

(Modification 3)
In Modification 3 of the third embodiment, one of the plurality of camera devices C1 to Cn is a PTZ camera having a pan/tilt function, a zoom-in function, and a zoom-out function (hereinafter, PTZ functions) that can be operated remotely from the monitoring device. The PTZ camera treats places specified in advance as preset values and stores their pan/tilt angles and zoom values in a memory.

ＰＴＺカメラに複数のプリセット位置が設定されている場合、複数のカメラ装置Ｃ１〜ＣｎのうちＰＴＺカメラ以外に固定カメラが含まれていれば、マイクのプリセットを行なう回数は、カメラ総数ｎではなく、固定カメラの台数とＰＴＺカメラのプリセット数との和を考慮したプリセット数Ｎとなる。 When a plurality of preset positions are set in the PTZ camera, if a fixed camera is included in addition to the PTZ camera among the plurality of camera devices C1 to Cn, the number of times of performing the microphone preset is not the total number n of cameras, The preset number N takes into consideration the sum of the number of fixed cameras and the preset number of PTZ cameras.
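As a small worked example of this count, the sketch below adds one microphone preset per fixed camera and one per PTZ camera preset. The `preset_count` function and the camera records are hypothetical and exist only to show the arithmetic.

```python
def preset_count(cameras):
    """Return the number N of microphone presets to register.

    cameras is a list of dicts with "kind" ("fixed" or "ptz") and, for a
    PTZ camera, "presets" (the number of camera preset positions).
    """
    total = 0
    for cam in cameras:
        if cam["kind"] == "ptz":
            total += cam["presets"]  # one microphone preset per PTZ preset
        else:
            total += 1               # a fixed camera images a single position
    return total


# Three fixed cameras plus one PTZ camera with two preset positions.
cameras = [
    {"kind": "fixed"},
    {"kind": "fixed"},
    {"kind": "fixed"},
    {"kind": "ptz", "presets": 2},
]
print(preset_count(cameras))
```

With these inputs the camera total n is 4, while the preset count N is 3 + 2 = 5.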

FIG. 24 is a table showing the registered contents of the preset information table 130A in Modification 3 of the third embodiment. The location, microphone preset value, camera IP address, and camera preset value are registered in the preset information table 130A. The location, microphone preset value, and camera IP address are registered in the same way as in the preset information table 130 shown in FIG. 18. As for the newly added camera preset value, a fixed camera's imaging position does not change, so its camera preset value is “Null”. For the PTZ camera, on the other hand, the imaging position seen from the PTZ camera (in other words, the directivity direction from the microphone array device) differs between the magazine shelf T2 and the passage U1, so the camera preset values are “PT1” and “PT2”.

雑誌棚Ｔ２、通路Ｕ１のようなＰＴＺカメラが撮像する対象エリア（場所）が選択された場合、監視装置は、マイクアレイ装置の指向方向の音声データを読み出すと同時に、ＰＴＺカメラにカメラプリセット値を送信する。ＰＴＺカメラは、プリセット値に対応する撮像方向の映像を撮像する。ＰＴＺカメラを用いることで、監視対象となる撮像エリアを容易に変えることができる。なお、変形例３では、固定カメラの代わりとして、ＰＴＺカメラを用いたが、全方位カメラが用いられてもよい。 When a target area (location) to be imaged by the PTZ camera such as the magazine shelf T2 and the passage U1 is selected, the monitoring device reads the audio data in the direction of the microphone array device, and at the same time, sets the camera preset value to the PTZ camera. Send. The PTZ camera captures an image in the imaging direction corresponding to the preset value. By using the PTZ camera, the imaging area to be monitored can be easily changed. In Modification 3, a PTZ camera is used instead of a fixed camera, but an omnidirectional camera may be used.
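The handling of a selected location in Modification 3 can be sketched as follows, as an illustration under assumptions. The `on_location_selected` function, the row layout, the angle values, and the PTZ camera's IP address are hypothetical; `send_camera_preset` is a caller-supplied stand-in for whatever protocol the real camera uses, which the description does not specify.

```python
def on_location_selected(row, send_camera_preset):
    """Return the directivity direction for a table row.

    row is one preset table entry with "direction", "camera_ip" and
    "camera_preset" (None for a fixed camera, mirroring the "Null" entry).
    If the row has a camera preset, it is forwarded to the camera so the
    PTZ camera turns toward the same location.
    """
    if row["camera_preset"] is not None:
        send_camera_preset(row["camera_ip"], row["camera_preset"])
    return row["direction"]


# Record the commands instead of talking to a real camera.
sent = []
row = {"location": "Magazine shelf T2", "direction": (120.0, 55.0),
       "camera_ip": "165.254.10.15", "camera_preset": "PT1"}
direction = on_location_selected(
    row, lambda ip, preset: sent.append((ip, preset)))
print(direction, sent)
```

Passing the sender in as a parameter keeps the table lookup independent of the camera control protocol, so the same function covers fixed cameras, PTZ cameras, and test doubles.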

Various embodiments have been described above with reference to the drawings, but it goes without saying that the present invention is not limited to these examples. It is clear that a person skilled in the art could arrive at various changes or modifications within the scope of the claims, and these are understood to belong naturally to the technical scope of the present invention.

本発明は、カメラとマイクアレイとの互いの位置関係が不明である場合でも、既定の撮像位置に指向性を適正に形成し、既定の撮像位置における音声を明瞭に出力する、収音システム及び収音設定方法として有用である。 The present invention provides a sound collection system that appropriately forms directivity at a predetermined imaging position and clearly outputs sound at the predetermined imaging position even when the positional relationship between the camera and the microphone array is unknown. This is useful as a sound collection setting method.

5, 5A, 5B Sound collection system
10 Store
15 Network
24 Storage unit
25 Encoding unit
26 Transmission unit
40 Recorder device
50 Audio processing device
51 Signal processing unit
52 Sound source direction detection unit
53 Directivity forming unit
54 Input/output control unit
55 Memory
56 Communication unit
57, 117 Operation unit
58, 118 Display
59, 119 Speaker
65, 65A, 65B Sound map
65h, 65i, 65j Concentric circle
65m Line segment
67 Camera information
70 Video processing device
71 Camera monitor
81 Speaking person
87, 123 Cursor
88 Input field
90, 130, 130A Preset information table
100 Monitoring device
105 Audio processing unit
107 Video processing unit
110 Table memory
129 Input field
140, 140A, 140B Operation panel
141 Brightness button
142, 142A Focus button
143 Selection button
145 Volume button
146 Preset button
146A Preset input field
147 Zoom button
150 Monitor screen
160, 160A Pull-down menu
A1 to An A/D converter
C1 to Cn Camera device
CR1 to CRn Imaging range
FLR Floor surface
GZ1 to GZn, GZ1A to GZnA Image
M1 to Mn Microphone
MA, MA1 to MAn Microphone array device
O Center point
P1 to Pn Preset position
PA1 to PAn Amplifier
PKT Packet
RF Ceiling
SD, SD1 to SDn, SD3A, SD3B Sound source mark
SZ1 to SZ4 Thumbnail

Claims

A sound collection unit having a plurality of sound collection elements, and collecting sound by the sound collection elements;
At least one imaging unit for imaging a predetermined position;
A sound source detection unit that detects a sound source direction from the sound collection unit, based on the sound data of the collected sound;
A display unit that displays a detected sound source direction from the sound pickup unit according to a predetermined sound output at the predetermined position;
An operation unit that receives input of information related to the imaging unit that images the predetermined position in accordance with the designation of the sound source direction from the displayed sound collection unit;
A storage unit that stores correspondence information in which the input information about the imaging unit and the sound source direction from the sound collection unit are associated with each other;
Sound collection system.

The sound collection system according to claim 1,
Based on the correspondence information, a directivity forming unit that forms the directivity of the voice data of the sound collected by the sound collection unit in the sound source direction associated with the predetermined position;
An output unit that outputs voice data in which directivity is formed by the directivity forming unit;
Sound collection system.

The sound collection system according to claim 2,
The display unit displays a marker indicating the sound source direction from the sound collection unit on a concentric sound map that is centered on the position of the sound collection unit and in which the central angle represents the horizontal angle and the length of the radius represents the vertical angle,
Sound collection system.
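
For illustration only, the mapping from a detected direction to a marker position on the concentric audio map can be sketched as a polar-to-Cartesian conversion. The function name `marker_xy`, the linear scaling of the vertical angle onto the radius, and the 90-degree full-scale value are assumptions; the claim states only that the central angle represents the horizontal angle and the radial length represents the vertical angle.

```python
import math


def marker_xy(theta_deg, phi_deg, max_radius, max_phi_deg=90.0):
    """Return (x, y) map coordinates of a sound source marker, with the
    sound collection unit at the origin.

    Assumption: the radius grows linearly with the vertical angle phi and
    reaches max_radius at max_phi_deg.
    """
    radius = max_radius * phi_deg / max_phi_deg
    theta = math.radians(theta_deg)
    return (radius * math.cos(theta), radius * math.sin(theta))


# A marker at horizontal angle 0 deg and vertical angle 45 deg lands
# halfway out along the positive x axis of a map with radius 100.
x, y = marker_xy(0.0, 45.0, 100.0)
```

Under this scaling, concentric circles drawn at fixed radii correspond to fixed vertical angles.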

The sound collection system according to claim 3,
wherein the operation unit accepts designation of the marker displayed on the audio map, and
the directivity forming unit forms directivity of the audio data of the sound collected by the sound collection unit in the sound source direction corresponding to the designated marker.

The sound collection system according to claim 4,
wherein, when the volume of the audio data in which directivity has been formed by the directivity forming unit exceeds a threshold value, the display unit displays, in an identifiable manner on the audio map, the marker corresponding to the sound source direction in which the directivity is formed.
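
For illustration only, the threshold check that decides which markers to display identifiably can be sketched as follows. The use of RMS amplitude as the measure of volume, the function names `rms` and `markers_to_highlight`, and the marker labels are assumptions; the claim requires only that the volume of the directivity-formed audio data exceed a threshold value.

```python
import math


def rms(samples):
    """Root-mean-square amplitude, used here as an assumed volume measure."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def markers_to_highlight(beam_outputs, threshold):
    """Return the marker labels whose directivity-formed audio is louder
    than the threshold.

    beam_outputs maps a marker label (hypothetical, e.g. "SD1") to the
    audio samples beamformed toward that marker's direction.
    """
    return [label for label, samples in beam_outputs.items()
            if rms(samples) > threshold]


loud = markers_to_highlight(
    {"SD1": [0.5, -0.5], "SD2": [0.01, 0.0]}, threshold=0.1)
```

A display routine could then draw the returned markers in a different color or blink them so that they can be told apart from quiet directions.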

The sound collection system according to claim 2,
wherein the sound collection unit includes the storage unit that stores the correspondence information, and transmits, to the directivity forming unit, data in which the correspondence information is added to the audio data of the collected sound.

The sound collection system according to claim 1,
wherein a plurality of the imaging units are provided,
the display unit displays an image of each predetermined position captured by the respective imaging unit, and
the operation unit designates the sound source direction through an operation of selecting one of the images of the predetermined positions displayed on the display unit.

The sound collection system according to claim 1,
wherein a plurality of the sound collection units and a plurality of the imaging units are provided, and
the storage unit stores, for each of the plurality of sound collection units, the correspondence information in which information about any one of the imaging units is associated with the sound source direction from that sound collection unit.

The sound collection system according to claim 8,
wherein, for predetermined positions that overlap, the correspondence information includes the sound source directions from a plurality of the sound collection units.

A sound collection setting method in a sound collection system including a sound collection unit and at least one imaging unit that images a predetermined position, the method comprising:
collecting, with the sound collection unit having a plurality of sound collection elements, a predetermined sound output from a sound source placed at the predetermined position;
detecting a sound source direction from the sound collection unit based on audio data of the sound collected by the sound collection unit;
displaying the detected sound source direction from the sound collection unit on a display unit;
receiving designation of the sound source direction from the sound collection unit displayed on the display unit;
receiving, in response to the designation of the sound source direction, input of information about the imaging unit that images the predetermined position; and
storing, in a storage unit, correspondence information in which the input information about the imaging unit is associated with the sound source direction from the sound collection unit.
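
For illustration only, the designation, input, and storing steps of the setting method can be sketched as a small preset table keyed by camera information. The names `Direction`, `PresetTable`, and `run_preset`, the use of callbacks to stand in for the operation unit, and the string form of the camera information are all assumptions made for this sketch; the detection and display steps are represented only by the list of already detected directions.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Direction:
    """Sound source direction as seen from the sound collection unit."""
    theta_deg: float  # horizontal angle
    phi_deg: float    # vertical angle


@dataclass
class PresetTable:
    """Correspondence information: camera information -> direction."""
    entries: dict = field(default_factory=dict)

    def store(self, camera_info, direction):
        self.entries[camera_info] = direction

    def direction_for(self, camera_info):
        return self.entries[camera_info]


def run_preset(detected, designate, ask_camera_info, table):
    """Run one pass of the setting steps for already detected directions.

    detected: list of Direction values shown on the display unit.
    designate: callback returning the index of the direction the user picks.
    ask_camera_info: callback returning the camera information the user
        enters for the designated direction.
    """
    chosen = detected[designate(detected)]
    camera_info = ask_camera_info(chosen)
    table.store(camera_info, chosen)
    return camera_info


table = PresetTable()
run_preset(
    [Direction(30.0, 60.0), Direction(120.0, 45.0)],
    designate=lambda directions: 1,      # user picks the second marker
    ask_camera_info=lambda chosen: "C2",  # user enters camera C2
    table=table,
)
```

After this pass, `table.direction_for("C2")` returns the designated direction, which a directivity forming step could later use when camera C2 is selected.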