JP5192414B2 - Audio information display system

Info

Publication number: JP5192414B2
Application number: JP2009026090A
Authority: JP
Inventors: 直人秋良; 達彦影広; 恒弥栗原
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 2009-02-06
Filing date: 2009-02-06
Publication date: 2013-05-08
Anticipated expiration: 2029-02-06
Also published as: JP2010183417A

Description

The present invention relates to a video display system including a plurality of cameras and a plurality of microphones.

With the spread of IP-connectable imaging devices such as web cameras, it has become possible to build wide-area, large-scale surveillance systems over IP networks. Such systems can acquire a large volume of surveillance images and enable monitoring from many angles. Recently, there are even applications in which video from 100 or more cameras is displayed side by side on the screen of a monitoring terminal for surveillance work.

However, installing cameras across a wide area without blind spots is very costly and requires a large number of monitor devices, which makes it difficult to build an efficient surveillance system. A technique has therefore been proposed in which cameras with controllable shooting directions are installed at locations where the need for monitoring is high, and a camera is turned toward the direction in which an abnormal sound occurred within the monitored area, thereby keeping camera blind spots to a minimum (see Patent Document 1).
Patent Document 1: JP 2006-254277 A

However, with the technique disclosed in Patent Document 1, a suspicious person who knows how the camera operates can throw an object to make a noise and thereby turn the camera away from his or her own area of activity. As a result, the suspicious person ends up outside the shooting range.

In addition, when abnormal sounds occur in multiple directions, the direction of the camera cannot be uniquely determined. Furthermore, if the camera is turned toward the direction of an abnormal sound, there is a risk that other important locations can no longer be captured. A camera is normally installed facing an important direction, and it is preferable to be able to grasp the surrounding situation while continuing to capture video in that direction.

For example, microphones could be installed to obtain information about camera blind spots. However, if the collected sound is played back to the observer as is, it becomes difficult to grasp the situation when sound occurs at several microphones at the same time, and privacy may be violated because the content of conversations can be understood.

An object of the present invention is to make use of audio information while taking care not to violate privacy, and to grasp information about camera blind spots without reducing monitoring efficiency.

In a representative embodiment of the present invention, an audio information display system includes a camera, a microphone to which sound is input, and a display device that displays video captured by the camera, and comprises an audio processing device that generates audio information based on the sound input to the microphone and a video processing device that processes the video captured by the camera. The system manages sound type information including types of sound, microphone management information including the position where the microphone is installed, and camera management information including the position where the camera is installed. The audio processing device analyzes the sound input to the microphone to obtain a speech recognition result that can be displayed on the display device, estimates the type of the sound input to the microphone based on the sound type information, estimates the position of the source of the sound input to the microphone based on the microphone management information and the camera management information, generates audio information including the obtained speech recognition result, the estimated sound type, and information indicating the estimated sound source position, and transmits the generated audio information to the video processing device. The video processing device combines the audio information received from the audio processing device with the video captured by the camera and displays the combined video on the display device. When estimating the position of the sound source, if the distance between the microphone to which the sound was input and the camera that captured the video with which the audio information is combined is greater than a predetermined threshold, the audio processing device uses the direction in which that microphone is installed as the information indicating the position of the sound source.
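The distance rule at the end of this summary can be sketched in Python as follows. This is only an illustration under stated assumptions, not the patented implementation: the `Placement` type, the `locate_source` function, the use of 2-D site coordinates, the reading of "direction in which the microphone is installed" as the bearing from the camera to the microphone, and the behavior when the microphone is within the threshold (reporting the microphone's own coordinates) are all hypothetical. The text above specifies only the case where the distance exceeds the threshold.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Placement:
    """Installed position of a camera or microphone (assumed 2-D site coordinates)."""
    x: float
    y: float


def locate_source(mic: Placement, cam: Placement, threshold: float) -> dict:
    """Return the sound source indicator to overlay on the camera's video."""
    dx, dy = mic.x - cam.x, mic.y - cam.y
    distance = math.hypot(dx, dy)
    if distance > threshold:
        # Far microphone: report only the direction of the microphone as seen
        # from the camera (degrees, counterclockwise from the +x axis).
        bearing = math.degrees(math.atan2(dy, dx)) % 360.0
        return {"kind": "direction", "bearing_deg": bearing}
    # Near microphone: assumption for this sketch, report the microphone position.
    return {"kind": "position", "x": mic.x, "y": mic.y}
```

Returning a tagged dictionary keeps the two cases distinguishable for whatever component draws the overlay; a real system might use an arrow for the direction case and a marker for the position case.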

According to one aspect of the present invention, displaying video with audio information combined (superimposed) on it makes it possible to grasp the situation in camera blind spots, which is difficult to grasp from video alone.

Embodiments of the present invention are described below with reference to the drawings.

FIG. 1 is a configuration diagram showing an example of an audio information display monitoring system according to an embodiment of the present invention.

The audio information display monitoring system provides video information and audio information collected by a plurality of cameras 110 and a plurality of microphones 111 installed in a building or the like to an administrator or operator at the monitoring center 2.

The audio information display monitoring system according to the embodiment of the present invention includes input device groups (1A, 1B), each including a plurality of cameras 110 and a plurality of microphones 111, and a monitoring center 2 that controls the input device groups and displays surveillance video.

The monitoring center 2 includes an audio processing PC 3, a camera control PC 4, a display PC 5, and a video storage PC 6.

The audio processing PC 3 acquires sound from the microphones 111 and generates audio information. The camera control PC 4 acquires the video captured by the cameras 110 and transmits the acquired video to the display PC 5.

The display PC 5 generates surveillance video by combining (superimposing) the audio information generated by the audio processing PC 3 with the video transmitted from the camera control PC 4. When no audio information corresponding to the video has been transmitted from the audio processing PC 3, the video transmitted from the camera control PC 4 is used as the surveillance video as is. The generated audio information includes, among other things, information about an event estimated by analyzing the acquired sound.
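The pass-through rule described above (overlay when audio information exists, otherwise use the camera video unchanged) can be illustrated with a small Python sketch. The function name `make_surveillance_frame`, the dictionary representation of a frame, and the field names `sound_type`, `source`, and `overlays` are assumptions made for this example; the source does not define a data format.

```python
from typing import Optional


def make_surveillance_frame(frame: dict, audio_info: Optional[dict]) -> dict:
    """Combine a camera frame with audio information, if any was received."""
    if audio_info is None:
        # No audio information for this video: use the camera video unchanged.
        return frame
    # Build a short text label from the (assumed) audio information fields.
    label = "{} @ {}".format(audio_info["sound_type"], audio_info["source"])
    overlays = list(frame.get("overlays", [])) + [label]
    # Return a new frame so the original camera frame is left untouched.
    return {**frame, "overlays": overlays}
```

For example, calling `make_surveillance_frame({"camera_id": "C1"}, {"sound_type": "footsteps", "source": "left"})` yields a frame whose `overlays` list contains the single label `"footsteps @ left"`.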

A monitor device 50 that displays the surveillance video is connected to the display PC 5. The monitor device 50 can display the surveillance video for the plurality of cameras 110 side by side, camera by camera. The administrator or operator watches the video displayed on the monitor device 50. The video storage PC 6 stores the surveillance video in a storage device.

As described above, when the display PC 5 receives audio information from the audio processing PC 3, it displays on the monitor device 50 surveillance video in which information about the event is superimposed on the video of the camera 110 near the microphone at which the event corresponding to that audio information occurred. Each microphone 111 is fixed at a predetermined position and is associated with a predetermined camera 110. By combining the audio information (event information) obtained by processing the sound acquired by a microphone 111 with the video captured by the corresponding camera 110 and displaying the result on the monitor device 50, it is possible to monitor incidents that occur in camera blind spots and changes that are hard to notice from video. The display PC 5 can also display the video of a camera 110 designated by the administrator in high definition.

The details of each component are described below.

<Configuration of input device group>
The input device groups (1A, 1B) are managed by the monitoring center 2. Each input device group is defined for a predetermined area of a building or the like (for example, each floor), or for a given number of cameras or microphones. An input device group includes a plurality of cameras 110 and a plurality of microphones 111. Because the input device groups have the same configuration, the same elements are given the same reference numerals in the following description and duplicate descriptions are omitted.

The number of cameras 110 and the number of microphones 111 need not be the same. A camera 110 and a microphone 111 may be installed at the same position or at different positions, and they may be integrated into a single unit or housed separately.

A camera 110 installed at a predetermined position captures a moving image of a predetermined resolution at a predetermined frame rate. The captured moving image (video) is transmitted via the communication network 100, which consists of network cables, dedicated cables, or the like.

The camera 110 is connected to the camera control PC 4 via the communication network 100. The camera 110 is, for example, a network-connectable IP camera. It may be a camera that outputs video in response to an external request or a camera that outputs video unilaterally from the camera side.

A microphone 111 installed at a predetermined position is connected to the audio processing PC 3 via the communication network 100. Other communication means may be used as long as sound can be transmitted from the microphone 111 to the audio processing PC 3. The microphone 111 may be a device that can identify the sound source direction on its own, such as a microphone array.

A plurality of cameras 110 and microphones 111 are installed at each site. The position information of the cameras 110 is managed by the camera control PC 4. The position information of the microphones 111 is managed by the audio processing PC 3.

<Configuration of monitoring center>
Next, the components of the monitoring center 2 (the audio processing PC 3, camera control PC 4, display PC 5, and video storage PC 6) are described with reference to FIGS. 1 to 5.

FIG. 2 is a diagram showing an example of the configuration of the audio processing PC 3 according to the embodiment of the present invention.

As described above, the audio processing PC 3 generates audio information including event information by analyzing the sound acquired from the microphones 111, and transmits the generated audio information to the display PC 5. The audio processing PC 3 is a computer including a CPU 201, a memory 202, an input unit 203, a communication unit 204, and a storage unit 210.

The CPU 201 executes various processes by running the programs stored in the memory 202. The memory 202 stores the programs executed by the CPU 201 and the data needed to execute them.

The input unit 203 is an interface for accepting input of information from a user or the like; specifically, it is an input device such as a keyboard or a mouse. The communication unit 204 is an interface for communicating with external devices, for example a network interface card. The audio processing PC 3 accepts audio input, via the communication unit 204, from the microphones 111 connected to the communication network 100.

The storage unit 210 stores the programs and data needed for audio processing. Specifically, the storage unit 210 stores an OS 211, microphone management information 212, abnormal sound data 213, abnormal conversation data 214, an audio acquisition program 215, a sound presence determination program 216, a sound source direction estimation program 217, a sound type determination program 218, a speech recognition program 219, and an event notification program 220.

The OS 211 is the operating system needed to run the various programs.

The microphone management information 212 stores identification information, installation information, and the like for the installed microphones 111. Details of the microphone management information 212 are described with reference to FIG. 8.

The abnormal sound data 213 is information for identifying the type of sound input to a microphone 111. For example, it is used to identify that the input sound is footsteps, or that it is an abnormal sound such as the sound of something being broken. Details of the abnormal sound data 213 are described with reference to FIG. 10.

The abnormal conversation data 214 stores information for determining, when the sound input to a microphone 111 is a conversation, whether the conversation contains suspicious words. Whether the event that occurred is a problem is determined by checking whether the conversation contains words or phrases found in conversations of suspicious persons, emergency conversations, and the like. Details of the abnormal conversation data 214 are described with reference to FIG. 11.
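A minimal Python sketch of such a check is shown below. It assumes that the speech recognizer has already produced a text transcript and that the abnormal conversation data can be reduced to a plain list of watch words; the function name `find_suspicious_words` and the sample words are hypothetical and are not taken from the source. Substring matching is used instead of whitespace tokenization so that the same logic also works for languages written without spaces.

```python
from typing import Iterable, List


def find_suspicious_words(transcript: str, watch_words: Iterable[str]) -> List[str]:
    """Return the watch words that occur in the recognized transcript."""
    lowered = transcript.lower()
    # Case-insensitive substring match, preserving the order of watch_words.
    return [word for word in watch_words if word.lower() in lowered]
```

With the hypothetical watch list `["help", "fire"]`, the transcript `"Please HELP me"` yields `["help"]`, and `"good morning"` yields an empty list.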

The audio acquisition program 215 acquires sound from the microphones 111. The sound presence determination program 216 determines whether sound has been input to a microphone 111.

The sound source direction estimation program 217 estimates the direction of the source of a sound based on the sound input to a microphone 111 and the microphone management information 212.

The sound type determination program 218 determines the type of a sound based on the sound input to a microphone 111 and the abnormal sound data 213. The sound type corresponds to the event that caused the sound to be produced.

When the type of sound input to a microphone 111 is conversation, the speech recognition program 219 determines, based on the content of the conversation and the abnormal conversation data 214, whether the conversation contains words that are highly likely to indicate an abnormality.

The event notification program 220 transmits to the display PC 5 audio information including the event information generated by the audio acquisition program 215, the sound source direction estimation program 217, the sound type determination program 218, and the speech recognition program 219.
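How these programs might fit together can be sketched in Python as follows. The ordering of the steps, the RMS-energy test used for the presence check, the threshold value, and all function and field names (`has_sound`, `build_audio_info`, `flagged_words`, and so on) are assumptions made for illustration; this part of the source lists the programs but does not describe their algorithms. The direction estimator, type classifier, and recognizer are passed in as callables so the sketch does not pretend to implement them.

```python
import math
from typing import Callable, List, Optional, Sequence


def has_sound(samples: Sequence[float], rms_threshold: float = 0.01) -> bool:
    """Presence check (assumed): True when the RMS level reaches the threshold."""
    if not samples:
        return False
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return rms >= rms_threshold


def build_audio_info(
    mic_id: str,
    samples: Sequence[float],
    estimate_direction: Callable[[str, Sequence[float]], float],
    classify: Callable[[Sequence[float]], str],
    recognize: Callable[[Sequence[float]], List[str]],
) -> Optional[dict]:
    """Run the processing steps in order and return audio information, or None."""
    if not has_sound(samples):
        return None  # nothing to notify
    info = {
        "mic_id": mic_id,
        "direction": estimate_direction(mic_id, samples),
        "sound_type": classify(samples),
    }
    # Speech recognition runs only when the sound type is conversation.
    if info["sound_type"] == "conversation":
        info["flagged_words"] = recognize(samples)
    return info
```

The returned dictionary stands in for the audio information that the event notification step would send to the display PC 5; a `None` result means no notification is sent.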

FIG. 3 is a diagram showing an example of the configuration of the camera control PC 4 according to the embodiment of the present invention.

As described above, the camera control PC 4 acquires video from the cameras 110 and transmits the video to the display PC 5. The camera control PC 4 is a computer including a CPU 301, a memory 302, an input unit 303, a communication unit 304, and a storage unit 310.

The CPU 301 executes various processes by running the programs stored in the memory 302. The memory 302 stores the programs executed by the CPU 301 and the data needed to execute them.

The input unit 303 is an interface for accepting input from an administrator, operator, or the like; specifically, it is an input device such as a keyboard or a mouse. The communication unit 304 is an interface for communicating with external devices, for example a network interface card. The camera control PC 4 accepts video input, via the communication unit 304, from the cameras 110 connected to the communication network 100.

The storage unit 310 stores the programs and data needed to acquire or transmit video. Specifically, the storage unit 310 stores an OS 311, camera management information 312, a camera video acquisition program 313, and a camera video transmission program 314.

The OS 311 is the operating system needed to run the various programs.

The camera management information 312 stores identification information, installation information, and the like for the installed cameras 110. Details of the camera management information 312 are described with reference to FIG. 9.

The camera video acquisition program 313 acquires the video captured by a designated camera 110. The camera video transmission program 314 transmits video to the display PC 5.

FIG. 4 is a diagram showing an example of the configuration of the display PC 5 according to the embodiment of the present invention.

The display PC 5 superimposes the audio information transmitted from the audio processing PC 3 on the video captured by a camera 110 and displays the result. The display PC 5 is a computer including a CPU 401, a memory 402, an input unit 403, a display unit 404, a communication unit 405, and a storage unit 410.

The CPU 401 executes various processes by running the programs stored in the memory 402. The memory 402 stores the programs executed by the CPU 401 and the data needed to execute them.

The input unit 403 is an interface for accepting input of information from a user or the like; specifically, it is an input device such as a keyboard or a mouse. For example, it accepts the designation of the camera 110 whose video is to be displayed.

The display unit 404 outputs a video signal in which the audio information transmitted from the audio processing PC 3 is superimposed on the video received from the camera control PC 4, and causes the monitor device 50 to display it.

The communication unit 405 is an interface for communicating with external devices, for example a network interface card. The display PC 5 accepts, via the communication unit 405, the audio information and video transmitted from the audio processing PC 3 and the camera control PC 4 connected to the communication network 100.

The storage unit 410 stores the programs and data needed to display video. Specifically, the storage unit 410 stores an OS 411, an event acquisition program 412, a video processing program 413, a display control program 414, a video storage program 415, and a stored video acquisition program 416.

The OS 411 is the operating system needed to run the various programs.

The event acquisition program 412 acquires the audio information, including event information, transmitted by the audio processing PC 3. The video processing program 413 generates video in which the audio information is superimposed on the video information received from the camera control PC 4.

The display control program 414 controls the display of video information. For example, it displays the video captured by a plurality of cameras 110 simultaneously, or enlarges the video captured by a designated camera 110.

The video storage program 415 transmits the video and audio information received by the display PC 5 to the video storage PC 6. The stored video acquisition program 416 acquires the video and audio information stored in the video storage PC 6.

FIG. 5 is a diagram showing an example of the configuration of the video storage PC 6 according to the embodiment of the present invention.

The video storage PC 6 is a computer including a CPU 501, a memory 502, an input unit 503, a communication unit 504, and a storage unit 510. The video storage PC 6 stores the video displayed on the display PC 5. When audio information is superimposed on the acquired video, it also stores the sound and audio information corresponding to that video. The display PC 5 can retrieve and view the video data and audio data stored in the video storage PC 6 at any time.

The CPU 501 executes various processes by running the programs stored in the memory 502. The memory 502 stores the programs executed by the CPU 501 and the data needed to execute them.

The input unit 503 is an interface for accepting input from an administrator, operator, or the like; specifically, it is an input device such as a keyboard or a mouse. The communication unit 504 is an interface for communicating with external devices, for example a network interface card. The video storage PC 6 receives video data and audio information data, via the communication unit 504, from the display PC 5 connected to the communication network 100.

The storage unit 510 stores video data 512 and audio information data 513. It also stores the programs and data needed to store and manage video and audio information. Specifically, the storage unit 510 stores an OS 511, a video storage program 514, a video search program 515, and a data management program 516.

The OS 511 is the operating system needed to run the various programs.

The video data 512 is video captured by the cameras 110 and is transmitted to the video storage PC 6 via the camera control PC 4 and the display PC 5.

The audio information data 513 consists of the sound input to the microphones 111 and the information generated from that sound, and is transmitted from the audio processing PC 3 to the video storage PC 6 via the display PC 5.

The video storage program 514 receives the video and audio information transmitted from the display PC 5 and stores them in the storage unit 510. The video search program 515 searches for a specified video, retrieves the corresponding video data 512 and audio information data 513, and transmits them to the display PC 5.

The data management program 516 is a program for managing the video data 512 and the audio information data 513. It may be a general-purpose program or a dedicated program specialized for managing the video data 512 and the audio information data 513.

Note that the audio information display monitoring system may also be configured by consolidating the audio processing PC 3, the camera control PC 4, the display PC 5, and the video storage PC 6 into one or more PCs.

FIG. 8 is a diagram showing an example of the microphone management information 212 according to the embodiment of the present invention.

The microphone management information 212 includes a microphone ID 801, an address 802, a site ID 803, an installation position 804, and a peripheral camera ID 805.

The microphone ID 801 is the identifier of a microphone 111. The address 802 is the access destination information for the microphone 111. In the embodiment of the present invention, an IP address is used as the access destination information.

The site ID 803 is the identifier of the site where the microphone 111 is installed. A site corresponds, for example, to the building or room in which an input device group is installed. The installation position 804 is information indicating the installation position within the site; specifically, it is expressed as, for example, the coordinates at which the microphone 111 is installed. In the embodiment of the present invention, it also includes the installation angle of the microphone 111. The peripheral camera ID 805 is the identifier of a camera 110 installed in the vicinity of the microphone 111 identified by the microphone ID 801.

FIG. 9 is a diagram showing an example of the camera management information 312 according to the embodiment of the present invention.

The camera management information 312 includes a camera ID 901, an address 902, a site ID 903, an installation position 904, and a peripheral microphone ID 905.

The camera ID 901 is the identifier of a camera 110. The address 902 is the access destination information for the camera 110. In the embodiment of the present invention, as with the microphones 111, an IP address is used as the access destination information.

拠点ＩＤ９０３は、カメラ１１０が設置された拠点の識別子である。設置拠点は、マイク１１１と同様に、入力装置群が設置されている建物又は部屋などに対応する。設置位置９０４は、設置拠点内の設置位置を示す情報である。設置位置９０４は、マイク１１１の設置位置８０４と同様に、座標及び設置角度によって構成される。周辺マイクＩＤ９０５は、カメラＩＤ９０１によって識別されるカメラ１１０の周辺に設置されたマイク１１１の識別子である。 The site ID 903 is an identifier of the site where the camera 110 is installed. As with the microphone 111, a site corresponds to, for example, a building or room in which an input device group is installed. The installation position 904 is information indicating the installation position within the site and, like the installation position 804 of the microphone 111, consists of coordinates and an installation angle. The peripheral microphone ID 905 is an identifier of a microphone 111 installed near the camera 110 identified by the camera ID 901.
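The two management tables described above could be modeled roughly as follows. This is only an illustrative sketch: the class names, field names, the two-dimensional coordinate form, and the sample addresses and coordinates are all assumptions, since the description specifies only which items each table contains.

```python
from dataclasses import dataclass
from math import hypot
from typing import Tuple


@dataclass(frozen=True)
class Placement:
    """Installation position within a site: coordinates plus installation angle."""
    x: float
    y: float
    angle_deg: float


@dataclass(frozen=True)
class MicrophoneRecord:
    mic_id: str                              # microphone ID 801
    address: str                             # address 802 (an IP address in the embodiment)
    site_id: str                             # site ID 803
    placement: Placement                     # installation position 804
    nearby_camera_ids: Tuple[str, ...] = ()  # peripheral camera ID 805


@dataclass(frozen=True)
class CameraRecord:
    camera_id: str                           # camera ID 901
    address: str                             # address 902
    site_id: str                             # site ID 903
    placement: Placement                     # installation position 904
    nearby_mic_ids: Tuple[str, ...] = ()     # peripheral microphone ID 905


def distance(a: Placement, b: Placement) -> float:
    """Straight-line distance between two installation positions."""
    return hypot(a.x - b.x, a.y - b.y)


# Hypothetical sample rows; the coordinates and addresses are made up.
M04 = MicrophoneRecord("M04", "192.0.2.14", "A", Placement(3.0, 0.0, 90.0), ("C03",))
C03 = CameraRecord("C03", "192.0.2.23", "A", Placement(3.0, 4.0, 270.0), ("M03", "M04"))
```

Keeping the peripheral IDs on both records mirrors the two tables in the description, so a lookup can start from either a microphone or a camera.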

図１０は、本発明の実施の形態の異常音声データ２１３の一例を示す図である。 FIG. 10 is a diagram illustrating an example of the abnormal voice data 213 according to the embodiment of this invention.

異常音声データ２１３には、音声ＩＤ１００１、音声種別１００２及び重要度１００３が含まれる。 The abnormal voice data 213 includes a voice ID 1001, a voice type 1002, and an importance level 1003.

音声ＩＤ１００１は、音声種別の識別子である。音声種別１００２は、マイク１１１に入力された音声の種類である。例えば、会話又は足音などの平常時に発せられる音声であったり、悲鳴又は打撃音などの異常が発生した場合に発せられる可能性が高い音声であったりする。 The voice ID 1001 is an identifier of a voice type. The voice type 1002 is the type of sound input to the microphone 111. Examples include sounds produced under normal conditions, such as conversation or footsteps, and sounds likely to be produced when an abnormality occurs, such as screams or impact sounds.

重要度１００３は、音声種別１００２の重要度であって、異常の可能性が高い種類の音声ほど、高い重要度が設定される。本発明の実施の形態では、１から５までの値が設定されるが、さらに細分化して設定してもよい。 The importance 1003 is the importance of the voice type 1002, and a higher importance is set for a voice having a higher possibility of abnormality. In the embodiment of the present invention, values from 1 to 5 are set, but they may be further subdivided.

図１１は、本発明の実施の形態の異常会話データ２１４の一例を示す図である。 FIG. 11 is a diagram illustrating an example of the abnormal conversation data 214 according to the embodiment of this invention.

異常会話データ２１４には、会話ＩＤ１１０１、テキスト１１０２、読み１１０３及び重要度１１０４が含まれる。 The abnormal conversation data 214 includes a conversation ID 1101, text 1102, reading 1103, and importance 1104.

会話ＩＤ１１０１は、会話に含まれる単語又は言葉を識別する識別子である。テキスト１１０２は、会話に含まれる単語又は言葉である。テキスト１１０２には、主に不審な行為に想起させる言葉（異常音声）が登録される。 The conversation ID 1101 is an identifier of a word or phrase contained in a conversation. The text 1102 is a word or phrase contained in a conversation. Words that mainly suggest suspicious behavior (abnormal voices) are registered in the text 1102.

読み１１０３は、テキスト１１０２の読みを示す情報である。異常音声は、マイク１１１から収集された音声種別が「会話」である場合に、会話の内容から読み１１０３に一致する単語又は言葉を取得することによって抽出される。 The reading 1103 is information indicating the reading (pronunciation) of the text 1102. When the type of the voice collected from the microphone 111 is “conversation”, an abnormal voice is extracted by finding, in the content of the conversation, a word or phrase that matches the reading 1103.

重要度１１０４は、会話内容の重要度であって、異常の可能性が高い音声情報ほど、高い重要度が設定される。本発明の実施の形態では、異常音声データ２１３と同様に、１から５までの値が設定されている。 The importance 1104 is the importance of the conversation content; the more likely the voice information is to indicate an abnormality, the higher the importance that is set. In the embodiment of the present invention, a value from 1 to 5 is set, as with the abnormal voice data 213.
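As a rough illustration of how the abnormal conversation data might be used, the sketch below matches registered readings against the reading of a recognized utterance and returns the hits ordered by importance. The record layout follows the four items above, but the class name, the function name, the substring-matching rule, and the sample rows with their importance values are assumptions made for this example.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class AbnormalWord:
    conversation_id: str  # conversation ID 1101
    text: str             # text 1102
    reading: str          # reading 1103
    importance: int       # importance 1104 (1 to 5 in the embodiment)


def find_abnormal_words(recognized_reading: str,
                        table: List[AbnormalWord]) -> List[AbnormalWord]:
    """Return registered words whose reading occurs in the recognized reading,
    most important first."""
    hits = [w for w in table if w.reading in recognized_reading]
    return sorted(hits, key=lambda w: w.importance, reverse=True)


# Hypothetical table rows; the importance values are invented.
TABLE = [
    AbnormalWord("W01", "危ない", "あぶない", 4),
    AbnormalWord("W02", "助けて", "たすけて", 5),
]
```

Matching on the reading rather than the written text avoids depending on how the recognizer chooses between kana and kanji spellings, which appears to be the purpose of the reading 1103 item.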

＜処理手順＞ <Processing procedure>
続いて、本発明の実施の形態の音声情報表示監視システムの各処理の手順について説明する。まず、音声処理ＰＣ３によって、マイク１１１から入力された音声を処理する手順について説明する。 Next, the procedure of each process of the audio information display monitoring system according to the embodiment of this invention will be described. First, the procedure by which the voice processing PC 3 processes voice input from the microphone 111 will be described.

図６は、本発明の実施の形態のマイク１１１に入力された音声から音声情報を生成する手順を示すフローチャートである。 FIG. 6 is a flowchart illustrating a procedure for generating voice information from the voice input to the microphone 111 according to the embodiment of this invention.

音声処理ＰＣ３のＣＰＵ２０１は、まず、音声取得プログラム２１５を実行することによって、マイク管理情報２１２に基づいて各マイク１１１から音声データを取得する（Ｓ６０１）。なお、ステップＳ６０１における音声取得処理は、マイク１１１から一方的に送信される音声を受信する方式であってもよいし、音声処理ＰＣ３からの音声取得要求に応じてマイク１１１から音声を受信する方式であってもよい。 The CPU 201 of the voice processing PC 3 first executes the voice acquisition program 215 to acquire voice data from each microphone 111 based on the microphone management information 212 (S601). The voice acquisition in step S601 may either receive voice that the microphone 111 transmits unilaterally or receive voice from the microphone 111 in response to a voice acquisition request from the voice processing PC 3.

音声処理ＰＣ３のＣＰＵ２０１は、次に、音声有無判定プログラム２１６を実行することによって、ステップＳ６０１の処理で取得された音声データに音声が含まれているか否かを判定する（Ｓ６０２）。 Next, the CPU 201 of the voice processing PC 3 executes the voice presence/absence determination program 216 to determine whether the voice data acquired in step S601 contains sound (S602).

音声処理ＰＣ３のＣＰＵ２０１は、ステップＳ６０２の判定結果に基づいて、音声データに音声が含まれているか否かを判定する（Ｓ６０３）。音声有無の判定は、事前に定めた時間の音声を取得し、取得された時間の音声データに事前に定めた音圧レベル以上の音声が含まれているか否かを判定する。また、マイク１１１ごとに異なる音圧レベルの閾値を設けてもよい。 Based on the determination result of step S602, the CPU 201 of the voice processing PC 3 determines whether the voice data contains sound (S603). The presence or absence of sound is determined by acquiring voice for a predetermined length of time and checking whether the voice data for that period contains sound at or above a predetermined sound pressure level. A different sound pressure level threshold may be set for each microphone 111.
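The presence check of steps S602 and S603 can be sketched as a level comparison with an optional per-microphone threshold. The description speaks only of a "sound pressure level"; measuring it as the RMS level in dB relative to full scale, the function names, and the default threshold value are assumptions of this sketch.

```python
import math
from typing import Mapping, Sequence

DEFAULT_THRESHOLD_DB = -40.0  # hypothetical default threshold


def level_dbfs(samples: Sequence[float]) -> float:
    """RMS level of samples in the range [-1, 1], in dB relative to full scale."""
    if not samples:
        return float("-inf")
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return 20.0 * math.log10(rms) if rms > 0 else float("-inf")


def contains_sound(samples: Sequence[float], mic_id: str,
                   thresholds: Mapping[str, float]) -> bool:
    """True when the chunk reaches the threshold configured for this microphone,
    falling back to the default when no per-microphone value is set."""
    threshold = thresholds.get(mic_id, DEFAULT_THRESHOLD_DB)
    return level_dbfs(samples) >= threshold
```

A quiet microphone in a noisy corridor and one in a silent room can then share the same code path while using different entries in `thresholds`.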

音声処理ＰＣ３のＣＰＵ２０１は、取得された音声データが無音区間である場合には（Ｓ６０３の結果が「Ｎ」）、音声取得のステップＳ６０１の処理に戻る。 When the acquired voice data is a silent section (the result of S603 is “N”), the CPU 201 of the voice processing PC 3 returns to the voice acquisition of step S601.

一方、音声処理ＰＣ３のＣＰＵ２０１は、取得された音声データに音声が含まれている場合には（Ｓ６０３の結果が「Ｙ」）、音源方向推定プログラム２１７を実行することによって、マイク１１１で取得された音声の音源方向を推定する（Ｓ６０４）。音源方向の推定は、特開２００８−９２５１２号公報に開示されているように、複数のマイクによって得られた情報に基づいて方向を推定してもよい。 On the other hand, when the acquired voice data contains sound (the result of S603 is “Y”), the CPU 201 of the voice processing PC 3 executes the sound source direction estimation program 217 to estimate the direction of the source of the sound acquired by the microphone 111 (S604). The sound source direction may be estimated based on information obtained by a plurality of microphones, as disclosed in Japanese Patent Laid-Open No. 2008-92512.

また、音源の方向を取得する別の方法としては、音圧が最大となるマイクを取得し、図８に示したマイク管理情報２１２及び図９に示したカメラ管理情報３１２に基づいて、音源の方向を推定するようにしてもよい。 As another method of obtaining the sound source direction, the microphone with the highest sound pressure may be identified, and the sound source direction may be estimated based on the microphone management information 212 shown in FIG. 8 and the camera management information 312 shown in FIG. 9.

カメラ１１０及びマイク１１１の位置情報を用いて音源の方向を推定する方法では、事前にカメラとマイクを同じ場所として扱うか否かを判定するための閾値距離Ｘを設定しておく。そして、音圧が一定以上のマイク１１１がカメラ１１０から距離Ｘ以上離れている場合には、音源の位置を特定することが困難なため、マイクの位置を音源位置とする。図１２を参照しながらさらに詳しく説明する。 In the method of estimating the direction of the sound source using the positional information of the camera 110 and the microphone 111, a threshold distance X for determining whether or not the camera and the microphone are handled as the same place is set in advance. When the microphone 111 having a certain sound pressure or more is separated from the camera 110 by a distance X or more, it is difficult to specify the position of the sound source, and therefore the position of the microphone is set as the sound source position. This will be described in more detail with reference to FIG.

図１２は、本発明の実施の形態のカメラ１１０及びマイク１１１の設置レイアウトの一例を示す図である。 FIG. 12 is a diagram illustrating an example of an installation layout of the camera 110 and the microphone 111 according to the embodiment of this invention.

図１２に示すレイアウトでは、３台のカメラ１１０（Ｃ０１〜Ｃ０３）及び４台のマイク１１１（Ｍ０１〜Ｍ０４）が設置されている。 In the layout shown in FIG. 12, three cameras 110 (C01 to C03) and four microphones 111 (M01 to M04) are installed.

ここで、マイクＭ０４の音圧が高い場合には、まず、図８に示したマイク管理情報２１２に基づいてカメラＣ０３を周辺カメラと特定する。さらに、マイクＭ０４の設置位置８０４及びカメラＣ０３の設置位置９０４に基づいて、マイクＭ０４とカメラＣ０３との間の距離を算出する。算出された距離がＸ以上離れていた場合には、カメラＣ０３に対する音源の方向は、真下方向、つまり２７０°とする。 Here, when the sound pressure of the microphone M04 is high, the camera C03 is first identified as a peripheral camera based on the microphone management information 212 shown in FIG. Further, the distance between the microphone M04 and the camera C03 is calculated based on the installation position 804 of the microphone M04 and the installation position 904 of the camera C03. If the calculated distance is greater than or equal to X, the direction of the sound source with respect to the camera C03 is set to a direct downward direction, that is, 270 °.

また、マイクＭ０３の音圧が高い場合には、同様に、カメラＣ０３が周辺カメラと特定する。そして、マイクＭ０３とカメラＣ０３との間の距離がＸ以下であれば、カメラＣ０３に対する音源の方向はマイクＭ０３で推定される音源の方向となる。 Similarly, when the sound pressure of the microphone M03 is high, the camera C03 is identified as a peripheral camera. If the distance between the microphone M03 and the camera C03 is X or less, the direction of the sound source with respect to the camera C03 is the direction of the sound source estimated by the microphone M03.

なお、同じ位置に設置されていても、カメラ１１０とマイク１１１の方向が異なっている場合は、マイク管理情報２１２及びカメラ管理情報３１２に登録されている設置位置の情報に基づいて角度を補正する。 Even when the camera 110 and the microphone 111 are installed at the same position, if they face different directions, the angle is corrected based on the installation position information registered in the microphone management information 212 and the camera management information 312.
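The position-based method described with reference to FIG. 12 might look like the following sketch. Several points are assumptions rather than statements of the description: angles are measured counterclockwise in the site plan with 270° pointing straight down, the far case is taken as a distance strictly greater than X (the description uses "X or more" and "X or less", so the boundary is ambiguous), and the angle correction is modeled as adding the microphone's installation angle and subtracting the camera's.

```python
import math
from typing import Tuple

Point = Tuple[float, float]


def bearing_deg(from_xy: Point, to_xy: Point) -> float:
    """Direction from one point to another in site-plan coordinates, 0 to 360."""
    dx = to_xy[0] - from_xy[0]
    dy = to_xy[1] - from_xy[1]
    return math.degrees(math.atan2(dy, dx)) % 360.0


def source_direction(camera_xy: Point, mic_xy: Point, mic_angle_deg: float,
                     mic_estimate_deg: float, threshold_x: float) -> float:
    """Sound source direction in site-plan coordinates.

    Far microphone: its own position is treated as the source position,
    so the direction is simply camera -> microphone.
    Near microphone: the direction estimated at the microphone is used,
    shifted by the microphone's installation angle.
    """
    if math.dist(camera_xy, mic_xy) > threshold_x:
        return bearing_deg(camera_xy, mic_xy)
    return (mic_angle_deg + mic_estimate_deg) % 360.0


def relative_to_camera(site_direction_deg: float, camera_angle_deg: float) -> float:
    """Express a site-plan direction relative to the camera's installation angle."""
    return (site_direction_deg - camera_angle_deg) % 360.0
```

With a microphone placed directly below the camera in the plan and farther away than X, `source_direction` yields 270°, which corresponds to the M04 and C03 case in the text.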

ここで、図６の音声情報を取得する手順を示すフローチャートの説明に戻る。 Here, the description returns to the flowchart of the procedure for acquiring the audio information in FIG.

音声処理ＰＣ３のＣＰＵ２０１は、音源方向の推定が完了すると、音声種別判定プログラム２１８を実行することによって、異常音声データ２１３に基づいて、破壊音又は足音などの音声の種別を推定する（Ｓ６０５）。ステップＳ６０５の音声種別を推定する処理は、公知技術を利用することができる。例えば、特開２００１−３１２２９２公報には、音声のスペクトル情報を利用することによって、音声の種別を取得する技術が開示されている。 When the estimation of the sound source direction is complete, the CPU 201 of the voice processing PC 3 executes the voice type determination program 218 to estimate the type of the sound, such as a breaking sound or footsteps, based on the abnormal voice data 213 (S605). A known technique can be used to estimate the voice type in step S605. For example, Japanese Patent Laid-Open No. 2001-312292 discloses a technique for obtaining the type of a sound by using its spectrum information.

さらに、音声処理ＰＣ３のＣＰＵ２０１は、音声の種別が会話か否かを判定する（Ｓ６０６）。音声の種別が会話と判定された場合には（Ｓ６０６の結果が「Ｙ」）、音声認識プログラム２１９を実行することによって、会話内容の所定のテキストが含まれているかを判定することによって会話内容を認識する（Ｓ６０７）。所定のテキストとは、図１１に示した異常会話データ２１４に含まれるテキスト１１０２である。 Further, the CPU 201 of the voice processing PC 3 determines whether the voice type is conversation (S606). When the voice type is determined to be conversation (the result of S606 is “Y”), the CPU 201 executes the voice recognition program 219 and recognizes the conversation content by determining whether the conversation contains a predetermined text (S607). The predetermined text is the text 1102 included in the abnormal conversation data 214 shown in FIG. 11.

音声認識には、音声認識の分野で広く知られているＨＭＭ（ＨｉｄｄｅｎＭａｒｋｏｖＭｏｄｅｌ）に基づいた方式などを利用することができる。例えば、特開２００８―５８５０３には、音声認識の対象となる発話者と他者との会話が常時存在し得る環境下で、発話者の発話部分のみある程度の信頼性で音声認識が可能な方法が開示されている。 For speech recognition, a method based on the HMM (Hidden Markov Model), which is widely known in the field of speech recognition, can be used, among others. For example, Japanese Patent Laid-Open No. 2008-58503 discloses a method that can recognize, with a certain degree of reliability, only the speech of the target speaker in an environment where conversation between that speaker and other people may be present at any time.

また、一般に公開されているオープンソースの音声認識ソフトウェアＪｕｌｉｕｓ（ｈｔｔｐ：／／ｊｕｌｉｕｓ．ｓｏｕｒｃｅｆｏｒｇｅ．ｊｐ）などを利用することも可能である。なお、音声の内容を示す情報であれば、抑揚情報などの音響情報に基づいて会話の危険度を取得するなど言葉以外の情報を取得してもよい。 Publicly available open source speech recognition software such as Julius (http://julius.sourceforge.jp) can also be used. Information other than words may also be acquired, provided that it indicates the content of the speech; for example, the degree of danger of a conversation may be obtained from acoustic information such as intonation.

音声処理ＰＣ３のＣＰＵ２０１は、次に、イベント通知プログラム２２０を実行することによって、ステップＳ６０４からＳ６０７の処理で取得された音声情報を、表示ＰＣ５に送信する（Ｓ６０８）。なお、音声情報を送信することによってイベントを通知する際には、マイク１１１を識別するための識別子及び対応するカメラ１１０の識別子も併せて送信する。 Next, the CPU 201 of the voice processing PC 3 executes the event notification program 220 to transmit the audio information obtained in steps S604 to S607 to the display PC 5 (S608). When an event is notified by transmitting the audio information, the identifier of the microphone 111 and the identifier of the corresponding camera 110 are transmitted together with it.

音声処理ＰＣ３のＣＰＵ２０１は、次に、終了指示を受け付けたか否かを判定する（Ｓ６０９）。終了指示を受け付けた場合には（Ｓ６０９の結果が「Ｙ」）、本処理を終了する。終了指示を受け付けていない場合には（Ｓ６０９の結果が「Ｎ」）、ステップＳ６０１の処理に戻り、本処理を継続して実行する。 Next, the CPU 201 of the voice processing PC 3 determines whether or not an end instruction has been received (S609). If an end instruction has been received (the result of S609 is “Y”), this process ends. If an end instruction has not been received (the result of S609 is “N”), the process returns to the process of step S601, and this process is continued.
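One pass through the flow of FIG. 6 for a single chunk of audio could be outlined as below. This is a sketch under assumptions: the event fields, the function names, and the "conversation" label are illustrative, and the four processing stages are passed in as callables because the description leaves their implementations to known techniques.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass(frozen=True)
class AudioEvent:
    """Payload sent to the display PC in S608 (field names are hypothetical)."""
    mic_id: str
    camera_id: str
    direction_deg: float
    sound_type: str
    keyword: Optional[str] = None


def process_chunk(mic_id: str, camera_id: str, samples: Sequence[float], *,
                  has_sound: Callable[[Sequence[float]], bool],
                  estimate_direction: Callable[[Sequence[float]], float],
                  classify: Callable[[Sequence[float]], str],
                  recognize: Callable[[Sequence[float]], Optional[str]],
                  ) -> Optional[AudioEvent]:
    """Return the event to notify, or None for a silent chunk."""
    if not has_sound(samples):                 # S602 / S603
        return None                            # caller goes back to S601
    direction = estimate_direction(samples)    # S604
    sound_type = classify(samples)             # S605
    keyword = None
    if sound_type == "conversation":           # S606
        keyword = recognize(samples)           # S607
    return AudioEvent(mic_id, camera_id, direction, sound_type, keyword)
```

The caller would loop over incoming chunks, send each non-`None` result to the display PC, and stop when an end instruction is received (S609).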

続いて、音声情報が付与された監視映像をモニタ装置５０に表示する手順について説明する。 Next, a procedure for displaying the monitoring video to which the audio information is added on the monitor device 50 will be described.

図７は、本発明の実施の形態の音声情報が付与された監視映像を生成及び表示する手順を示すフローチャートである。 FIG. 7 is a flowchart illustrating a procedure for generating and displaying a monitoring video to which audio information is added according to the embodiment of this invention.

まず、カメラ制御ＰＣ４のＣＰＵ３０１は、カメラ映像取得プログラム３１３を実行することによって、カメラ管理情報３１２を参照し、カメラ１１０から映像を取得する（Ｓ７０１）。さらに、カメラ映像送信プログラム３１４を実行し、表示ＰＣ５に映像を送信する。なお、映像を取得する方式は、カメラ制御ＰＣ４からカメラ１１０に映像取得要求を送信して映像を取得する方式であってもよいし、カメラ１１０からカメラ制御ＰＣ４に一方的に送信される映像を取得する方式であってもよい。 First, the CPU 301 of the camera control PC 4 executes the camera video acquisition program 313 to acquire video from the camera 110 with reference to the camera management information 312 (S701). It then executes the camera video transmission program 314 to transmit the video to the display PC 5. The video may be acquired either by sending a video acquisition request from the camera control PC 4 to the camera 110 or by receiving video that the camera 110 transmits unilaterally to the camera control PC 4.

次に、表示ＰＣ５のＣＰＵ４０１は、イベント取得プログラム４１２を実行することによって、音声処理ＰＣ３によって送信された音声情報を取得する（Ｓ７０２）。さらに、音声情報を受信したか否かに基づいて、イベントの有無を判定する（Ｓ７０３、Ｓ７０４）。 Next, the CPU 401 of the display PC 5 executes the event acquisition program 412 to acquire the audio information transmitted by the voice processing PC 3 (S702). It then determines whether an event has occurred based on whether audio information has been received (S703, S704).

本発明の実施の形態では、表示ＰＣ５のＣＰＵ４０１は、カメラ１１０によって撮影された監視映像をカメラ制御ＰＣ４から受信し、当該監視映像を撮影したカメラ１１０の識別情報などとともに表示する。表示ＰＣ５のＣＰＵ４０１は、イベントがあったと判定された場合には（Ｓ７０４の結果が「Ｙ」）、映像処理プログラム４１３を実行し、音声処理ＰＣ３から受信した音声情報を映像データにさらに重畳する（Ｓ７０５）。 In the embodiment of the present invention, the CPU 401 of the display PC 5 receives the monitoring video captured by the camera 110 from the camera control PC 4 and displays it together with, for example, the identification information of the camera 110 that captured it. When it is determined that an event has occurred (the result of S704 is “Y”), the CPU 401 of the display PC 5 executes the video processing program 413 and additionally superimposes the audio information received from the voice processing PC 3 on the video data (S705).

映像情報に重畳される音声情報は、例えば、矢印など音源の方向を示す記号、及び音声の種別を示す情報である。さらに、会話から異常会話データ２１４で指定された言葉が抽出された場合には、音声種別情報とともに抽出された言葉を監視映像に重畳させる。 The audio information to be superimposed on the video information is, for example, a symbol indicating the direction of the sound source such as an arrow and information indicating the type of audio. Furthermore, when a word specified by the abnormal conversation data 214 is extracted from the conversation, the extracted word together with the voice type information is superimposed on the monitoring video.

なお、監視映像に音声情報を重畳させる期間は、イベント発生時から事前に設定された時間内とする。また、重要度に応じて、文字の大きさ又は色などを変更して音声情報を表示するようにしてもよい。また、音圧の大きさ又は周波数成分ごとの音の分布など、音声の性質を示す情報を表示してもよい。さらに、複数の映像が表示され、重要度が大きいイベントが発生した場合には、当該イベントが発生している映像が表示されている領域の枠を強調するなどして、管理者の注意を引くように表示してもよい。以下、図１３を参照しながら表示ＰＣ５によって表示される監視映像の一例について説明する。 The audio information is superimposed on the monitoring video for a preset length of time from the occurrence of the event. The audio information may be displayed with the character size, color, or the like changed according to the importance. Information indicating the properties of the sound, such as the magnitude of the sound pressure or the distribution of the sound over frequency components, may also be displayed. Furthermore, when a plurality of videos are displayed and an event of high importance occurs, the display may draw the administrator's attention by, for example, highlighting the frame of the area showing the video in which the event is occurring. An example of the monitoring video displayed by the display PC 5 is described below with reference to FIG. 13.
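The display rules in this paragraph can be illustrated with a small helper that decides how long an overlay stays visible and how it is styled. All concrete values here (the 10-second period, the font sizes, the colors, and the importance level from which the frame is highlighted) are hypothetical; the description states only that such settings exist.

```python
from dataclasses import dataclass

OVERLAY_SECONDS = 10.0     # hypothetical preset display period
HIGHLIGHT_IMPORTANCE = 4   # hypothetical level from which the frame is highlighted


@dataclass(frozen=True)
class OverlayStyle:
    font_px: int
    color: str
    highlight_frame: bool


def overlay_style(importance: int) -> OverlayStyle:
    """Larger, more conspicuous text for more important events (importance 1 to 5)."""
    level = max(1, min(5, importance))
    urgent = level >= HIGHLIGHT_IMPORTANCE
    return OverlayStyle(font_px=12 + 4 * level,
                        color="red" if urgent else "white",
                        highlight_frame=urgent)


def overlay_visible(event_time: float, now: float,
                    duration: float = OVERLAY_SECONDS) -> bool:
    """True while the current time is within the preset period after the event."""
    return 0.0 <= now - event_time <= duration
```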

図１３は、本発明の実施の形態の表示ＰＣ５によって表示される監視映像を表示した画面の一例を示す図である。 FIG. 13 is a diagram illustrating an example of a screen displaying a monitoring video displayed by the display PC 5 according to the embodiment of the present invention.

図１３に示す画面の一例では、表示制御プログラム４１４によって、表示対象のカメラの映像を並べた複数の映像を表示している。表示される映像の数は、設置されたカメラ１１０の数としてもよい。設置されたカメラ１１０の数が画面に表示可能な映像の数Ｍを超えている場合には、管理者が選択した画面、又は音声入力イベントを直前に受信したＭ個の映像を並べて表示してもよい。 In the example screen shown in FIG. 13, the display control program 414 displays the videos of the cameras to be displayed side by side. The number of videos displayed may equal the number of installed cameras 110. When the number of installed cameras 110 exceeds the number M of videos that can be displayed on the screen, the screens selected by the administrator, or the M videos for which an audio input event was received most recently, may be displayed side by side.
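Choosing the M videos with the most recent audio input events could be done as in the sketch below. The function name and the timestamp dictionary are assumptions, and cameras that have never produced an event are simply ranked last in their original order.

```python
from typing import Dict, List, Sequence


def select_cameras(camera_ids: Sequence[str],
                   last_event_time: Dict[str, float],
                   m: int) -> List[str]:
    """Return up to m camera IDs, those with the most recent audio event first.

    Cameras without any recorded event sort after all others; the sort is
    stable, so they keep their original relative order.
    """
    ranked = sorted(camera_ids,
                    key=lambda cid: last_event_time.get(cid, float("-inf")),
                    reverse=True)
    return ranked[:m]
```

For example, with four cameras, events on C03 (most recent) and C02, and M = 3, the result lists C03, C02, and then C01.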

各画像には、「拠点Ａ」「拠点Ｂ」などの撮影の拠点を示す情報が表示されている。さらに、撮影しているカメラ１１０の詳細な設置位置を表示してもよいし、カメラ１１０の識別情報を表示するようにしてもよい。 Each image shows information indicating the site being captured, such as “Site A” or “Site B”. The detailed installation position of the capturing camera 110 or the identification information of the camera 110 may also be displayed.

また、映像１３２などには、音声処理ＰＣ３から受信した音声情報が表示されている。映像１３２には、マイク１１１によって通行人の足音を検知し、音声処理ＰＣ３が表示ＰＣ５に通知した結果、音声情報として、音声種別「足音」と、通行人の方向を指す矢印が表示されている。 The audio information received from the voice processing PC 3 is displayed on the video 132 and others. In the video 132, the microphone 111 has detected the footsteps of a passerby and the voice processing PC 3 has notified the display PC 5; as a result, the voice type “footsteps” and an arrow pointing toward the passerby are displayed as audio information.

また、映像１３３では、拠点Ａにおいて、画面との相対位置で右方向に「危ない」という会話が検知されている。また、映像１３６及び映像１３７では、拠点Ｂにおいて、カメラ１１０の死角で衝撃音が発生し、異常が発生していることを確認することができる。 In the video 133, a conversation containing the word “dangerous” (「危ない」) has been detected at site A, to the right relative to the screen. In the videos 136 and 137, it can be seen that an impact sound has occurred in a blind spot of the camera 110 at site B, indicating that an abnormality has occurred.

特に、衝撃音などの重要度の高い異常音声を検知した場合には、図１３の映像１３６及び映像１３７に示すように、映像を表示した枠を強調して表示するなどして管理者の注意を引くようにしている。 In particular, when an abnormal sound of high importance, such as an impact sound, is detected, the administrator's attention is drawn by, for example, highlighting the frame in which the video is displayed, as shown in the videos 136 and 137 of FIG. 13.

ここで、図７の監視映像を生成及び表示する手順を示すフローチャートの説明に戻る。 Returning to the description of the flowchart showing the procedure for generating and displaying the monitoring video in FIG.

表示ＰＣ５のＣＰＵ４０１は、表示制御プログラム４１４を実行することによって、図１３に示したように、カメラ１１０によって撮影された映像を並べた表示用映像を生成する（Ｓ７０６）。さらに、表示部４０４（モニタ装置５０）に生成された表示用映像を出力する（Ｓ７０７）。 The CPU 401 of the display PC 5 executes the display control program 414 to generate a display video in which the videos captured by the cameras 110 are arranged as shown in FIG. 13 (S706). It then outputs the generated display video to the display unit 404 (monitor device 50) (S707).

次に、表示ＰＣ５のＣＰＵ４０１は、映像蓄積プログラム４１５を実行することによって、表示ＰＣ５から映像蓄積ＰＣ６に映像及び音声情報を送信する。映像蓄積ＰＣ６のＣＰＵ５０１は、映像蓄積プログラム５１４を実行することによって、表示ＰＣ５から送信された映像及び音声情報を映像データ５１２及び音声情報データ５１３に蓄積する（Ｓ７０８）。 Next, the CPU 401 of the display PC 5 transmits the video and audio information from the display PC 5 to the video storage PC 6 by executing the video storage program 415. The CPU 501 of the video storage PC 6 stores the video and audio information transmitted from the display PC 5 in the video data 512 and the audio information data 513 by executing the video storage program 514 (S708).

また、表示ＰＣ５のＣＰＵ４０１は、蓄積映像取得プログラム４１６を実行することによって、映像蓄積ＰＣ６に蓄積された過去の監視映像を表示することができる。表示ＰＣ５のＣＰＵ４０１は、過去の監視映像の指定を表示するために入力された撮影日時などの条件を映像蓄積ＰＣ６に送信する。 The CPU 401 of the display PC 5 can also display past monitoring video stored in the video storage PC 6 by executing the stored video acquisition program 416. The CPU 401 of the display PC 5 transmits to the video storage PC 6 the conditions, such as the capture date and time, that were entered to specify the past monitoring video to be displayed.

映像蓄積ＰＣ６のＣＰＵ５０１は、映像検索プログラム５１５を実行することによって、指定された条件を満たす映像データ５１２及び音声情報データ５１３を検索し、表示ＰＣ５に送信する。表示ＰＣ５のＣＰＵ４０１は、受信した映像データ及び音声データに基づいて要求された映像を表示する。 The CPU 501 of the video storage PC 6 searches the video data 512 and the audio information data 513 that satisfy the specified conditions by executing the video search program 515, and transmits it to the display PC 5. The CPU 401 of the display PC 5 displays the requested video based on the received video data and audio data.

なお、映像蓄積ＰＣ６のＣＰＵ５０１は、記憶部５１０の空き容量が事前に定めた閾値よりも少なくなった場合には、古い映像及び音声情報を消去することによって、空き容量が閾値以上になるようにする。 When the free capacity of the storage unit 510 falls below a predetermined threshold, the CPU 501 of the video storage PC 6 deletes old video and audio information so that the free capacity becomes equal to or greater than the threshold.
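The capacity rule above amounts to deleting the oldest records until enough space is free. The following sketch keeps the bookkeeping in memory; the record representation as (timestamp, size) pairs and the function name are assumptions, and a real implementation would delete the stored files as well.

```python
from collections import deque
from typing import Deque, Tuple

Record = Tuple[float, int]  # (timestamp, size in bytes), oldest first


def prune_oldest(records: Deque[Record], free_bytes: int,
                 threshold_bytes: int) -> int:
    """Drop the oldest records until free space reaches the threshold.

    Returns the free space after pruning. If every record has been removed
    and the threshold is still not met, the remaining free space is returned.
    """
    while free_bytes < threshold_bytes and records:
        _, size = records.popleft()
        free_bytes += size
    return free_bytes
```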

最後に、表示ＰＣ５のＣＰＵ４０１は、管理者から終了指示を受け付けたか否かを判定する（Ｓ７０９）。管理者から終了指示を受け付けた場合には（Ｓ７１０の結果が「Ｙ」）、本処理を終了する。管理者から終了指示を受け付けていない場合には（Ｓ７１０の結果が「Ｎ」）、ステップＳ７０１の処理に戻り、映像データを取得する処理から継続して本処理を実行する。 Finally, the CPU 401 of the display PC 5 determines whether an end instruction has been received from the administrator (S709). When an end instruction is received from the administrator (the result of S710 is “Y”), this process ends. If no termination instruction has been received from the administrator (the result of S710 is “N”), the process returns to the process of step S701, and this process is executed continuously from the process of acquiring video data.

本発明の実施の形態によれば、多数のカメラによって撮影された映像に音声情報を重畳して表示することができる。したがって、カメラ１１０の死角で発生した異常及び映像では把握しにくい異常を視覚的に表示することができるため、効率よく監視対象を監視することができる。 According to the embodiment of the present invention, it is possible to superimpose and display audio information on video captured by a large number of cameras. Therefore, since an abnormality that has occurred at the blind spot of the camera 110 and an abnormality that is difficult to grasp from the video can be visually displayed, the monitoring target can be efficiently monitored.

また、本発明の実施の形態によれば、音声の種別及び異常会話の認識結果を表示することによって、会話の具体的な内容を表示してプライバシーを侵害する可能性を少なくすることができる。 Further, according to the embodiment of the present invention, because what is displayed is the voice type and the recognition result for abnormal conversation, the likelihood of infringing privacy by displaying the specific content of a conversation can be reduced.

本発明の実施の形態の音声情報表示監視システムの一例を示す構成図である。 FIG. 1 is a block diagram showing an example of the audio information display monitoring system according to the embodiment of this invention.
本発明の実施の形態の音声処理ＰＣの構成の一例を示す図である。 FIG. 2 is a diagram showing an example of the configuration of the voice processing PC according to the embodiment of this invention.
本発明の実施の形態のカメラ制御ＰＣの構成の一例を示す図である。 FIG. 3 is a diagram showing an example of the configuration of the camera control PC according to the embodiment of this invention.
本発明の実施の形態の表示ＰＣの構成の一例を示す図である。 FIG. 4 is a diagram showing an example of the configuration of the display PC according to the embodiment of this invention.
本発明の実施の形態の映像蓄積ＰＣの構成の一例を示す図である。 FIG. 5 is a diagram showing an example of the configuration of the video storage PC according to the embodiment of this invention.
本発明の実施の形態のマイクに入力された音声から音声情報を生成する手順を示すフローチャートである。 FIG. 6 is a flowchart showing the procedure for generating audio information from the voice input to the microphone according to the embodiment of this invention.
本発明の実施の形態の音声情報が付与された監視映像を生成及び表示する手順を示すフローチャートである。 FIG. 7 is a flowchart showing the procedure for generating and displaying a monitoring video to which audio information is added according to the embodiment of this invention.
本発明の実施の形態のマイク管理情報の一例を示す図である。 FIG. 8 is a diagram showing an example of the microphone management information according to the embodiment of this invention.
本発明の実施の形態のカメラ管理情報の一例を示す図である。 FIG. 9 is a diagram showing an example of the camera management information according to the embodiment of this invention.
本発明の実施の形態の異常音声データの一例を示す図である。 FIG. 10 is a diagram showing an example of the abnormal voice data according to the embodiment of this invention.
本発明の実施の形態の異常会話データの一例を示す図である。 FIG. 11 is a diagram showing an example of the abnormal conversation data according to the embodiment of this invention.
本発明の実施の形態のカメラ及びマイクの設置レイアウトの一例を示す図である。 FIG. 12 is a diagram showing an example of the installation layout of the cameras and microphones according to the embodiment of this invention.
本発明の実施の形態の表示ＰＣによって表示される監視映像を表示した画面の一例を示す図である。 FIG. 13 is a diagram showing an example of a screen displaying the monitoring video displayed by the display PC according to the embodiment of this invention.

１入力装置群 1 Input device group
２監視センタ 2 Monitoring center
３音声処理ＰＣ 3 Voice processing PC
４カメラ制御ＰＣ 4 Camera control PC
５表示ＰＣ 5 Display PC
６映像蓄積ＰＣ 6 Video storage PC
５０モニタ 50 Monitor
１００通信網 100 Communication network
１１０カメラ 110 Camera
１１１マイク 111 Microphone
２０１ＣＰＵ 201 CPU
２０２メモリ 202 Memory
２０３入力部 203 Input unit
２０４通信部 204 Communication unit
２１０記憶部 210 Storage unit
２１１ＯＳ 211 OS
２１２マイク管理情報 212 Microphone management information
２１３異常音声データ 213 Abnormal voice data
２１４異常会話データ 214 Abnormal conversation data
２１５音声取得プログラム 215 Voice acquisition program
２１６音声有無判定プログラム 216 Voice presence/absence determination program
２１７音源方向推定プログラム 217 Sound source direction estimation program
２１８音声種別判定プログラム 218 Voice type determination program
２１９音声認識プログラム 219 Voice recognition program
２２０イベント通知プログラム 220 Event notification program
３０１ＣＰＵ 301 CPU
３０２メモリ 302 Memory
３０３入力部 303 Input unit
３０４通信部 304 Communication unit
３１０記憶部 310 Storage unit
３１１ＯＳ 311 OS
３１２カメラ管理情報 312 Camera management information
３１３カメラ映像取得プログラム 313 Camera video acquisition program
３１４カメラ映像送信プログラム 314 Camera video transmission program
４０１ＣＰＵ 401 CPU
４０２メモリ 402 Memory
４０３入力部 403 Input unit
４０４表示部 404 Display unit
４０５通信部 405 Communication unit
４１０記憶部 410 Storage unit
４１１ＯＳ 411 OS
４１２イベント取得プログラム 412 Event acquisition program
４１３映像処理プログラム 413 Video processing program
４１４表示制御プログラム 414 Display control program
４１５映像蓄積プログラム 415 Video storage program
４１６蓄積映像取得プログラム 416 Stored video acquisition program
５０１ＣＰＵ 501 CPU
５０２メモリ 502 Memory
５０３入力部 503 Input unit
５０４通信部 504 Communication unit
５１０記憶部 510 Storage unit
５１１ＯＳ 511 OS
５１２映像データ 512 Video data
５１３音声情報データ 513 Audio information data
５１４映像蓄積プログラム 514 Video storage program
５１５映像検索プログラム 515 Video search program
５１６データ管理プログラム 516 Data management program

Claims

An audio information display system including a camera, a microphone to which audio is input, and a display device that displays an image captured by the camera,
An audio processing device that generates audio information based on audio input to the microphone;
A video processing device that processes video captured by the camera;
Managing audio type information including the type of audio, microphone management information including a position where the microphone is installed, and camera management information including a position where the camera is installed;
The voice processing device
By analyzing the voice input to the microphone, a voice recognition result that can be displayed on the display device is obtained,
Based on the audio type information, estimate the type of audio input to the microphone,
Based on the microphone management information and the camera management information , estimate the position of the sound source of the audio input to the microphone,
Generating voice information including information indicating the acquired voice recognition result, the estimated voice type, and the position of the estimated sound source;
Transmitting the generated audio information to the video processing device;
The video processing device includes:
Synthesizing audio information received from the audio processing device with video captured by the camera;
Displaying the video combined with the audio information on the display device;
When the voice processing device estimates the position of the sound source of the audio input to the microphone, if the distance between the microphone to which the audio was input and the camera that captured the video with which the audio information is combined is larger than a predetermined threshold value, the direction in which that microphone is installed is set as the information indicating the position of the sound source.

In the audio type information, importance is set for each type of audio,
The audio information display system according to claim 1, wherein the video processing device displays the audio information based on the importance.

The voice information display system includes abnormal conversation information including a predesignated word,
The voice processing device
When the estimated voice type is conversation, by recognizing the voice input to the microphone, information indicating the content of the conversation is generated,
The voice information is generated based on the abnormal conversation information when a word included in the abnormal conversation information is included in information indicating the content of the conversation. Voice information display system.

In the abnormal conversation information, importance is set for each word,
The audio information display system according to claim 3, wherein the video processing device displays the audio information based on the importance.

The audio information display system according to claim 1, wherein the audio information includes information indicating a sound pressure of the audio input to the microphone.

The audio information display system according to claim 1, wherein the video processing device, when displaying a plurality of videos captured by the camera, preferentially displays the video combined with the audio information.

  The audio information display system further includes a video storage device including a storage unit that stores video captured by the camera.
  The video processing device transmits a video to be displayed on the display device to the video storage device,
  The video storage device
  Storing the video transmitted from the video processing device in the storage unit;
  The audio information display system according to claim 1, wherein the requested video is provided when an acquisition request for the video stored in the storage unit is received.