JP4175180B2 - Monitoring and reporting system

Info

Publication number: JP4175180B2
Application number: JP2003152826A
Authority: JP
Inventors: 剛宏関根; 朗馬場; 高史西山
Original assignee: Matsushita Electric Works Ltd
Current assignee: Panasonic Electric Works Co Ltd
Priority date: 2003-05-29
Filing date: 2003-05-29
Publication date: 2008-11-05
Anticipated expiration: 2023-05-29
Also published as: JP2004357014A

Description

【０００１】
【発明の属する技術分野】
本発明は、複数の通報ユニットにて発生した緊急事態を監視者に通報するための監視通報システムに関する。
【０００２】
【従来の技術】
従来より、緊急通報装置としては、下記の特許文献１〜特許文献４などに記載された技術が知られている。下記特許文献１に記載された技術は、緊急通報装置に設けられた緊急通報ボタンが押下された場合、又は話者の特徴的な音声を認識した場合に、異常が発生しているか否かを判断している。
【０００３】
また、下記の特許文献２及び特許文献３に記載された技術では、水平面で異なる方位に向けて配置した複数のマイクを用いて、侵入物体等の音源の方位を特定し、当該特定した方位にビデオカメラを向けることにより、侵入物体等の画像及び音声を取得している。
【０００４】
さらに、下記特許文献４に関連する監視通報システムとして、出願人は図１５に示すシステムを構成している。具体的には、同図に示すように、複数の通報ユニット１０１Ａ〜１０１Ｃ・・・と、監視サーバ１０２及び監視用モニタ１０３とを光ネットワーク１０４にて接続して、各通報ユニット１０１にて取得した画像及び音声を送信するように構成されている。ここで、各通報ユニット１０１は、全方位式カメラ１１１及び可動式カメラ１１２により撮像した画像データを環境画像ノイズ除去部１２１によりノイズ除去して画像記憶部１２２に記憶すると共に、マイク１１３により集音した音声を環境音除去部１３１により雑音除去して音声記憶部１３２に記憶したりする。
【０００５】
そして、この監視通報システムでは、監視者が監視サーバ１０２を操作することによりカメラ遠隔制御信号を生成して、カメラ制御部１２３により可動式カメラ１１２の撮像方向を制御している。また、この監視通報システムでは、通話スイッチ１１４がユーザにより押下操作された場合には、画像記憶部１２２に記憶した画像データをメディアコンバータ１１５及びルータ１１６を介して送信したり、ＶＯＩＰ処理部１３３によってスピーカ部１１７及びマイク１１３を使用して、監視者とユーザとの通話を実現していた。
【０００６】
【特許文献１】
特開２０００−３４８２７８号公報
【０００７】
【特許文献２】
特開２００２−３４４９５７号公報
【０００８】
【特許文献３】
特開平７−２８４１８６号公報
【０００９】
【特許文献４】
特開２００２−２８８７６４号公報
【００１０】
【発明が解決しようとする課題】
しかしながら、前記特許文献４に関連する技術では、クライアント側である端末にて取得した画像情報をそのまま監視サーバ等に伝送する構成となっており、クライアントとサーバ間での伝送帯域や、サーバ側での監視タスクが限られているため、大規模な数量のクライアントに対応することが困難である。また、仮に同技術を採用して、大規模クライアントを備えたシステムを構築できたとしても、クライアントが設置されている全域に亘ってサーバ側で監視を行うためのタスクが大きくなり、監視者の負担が大きくなるという問題点があった。
【００１１】
また、かかる問題を解決するために、本願の発明者らは、後述する解決手段を提案するに至ったが、特許文献１のような屋内での異常認識でなく、屋外での異常認識にあっては環境騒音が認識精度を低下させるという問題に直面した。より詳しく説明すると、例えば、異常音声に似ている環境騒音が発せられた場合には、当該環境騒音を異常音声であると誤認識して誤報を発生することがあり、その誤報がまず問題となること、及びその誤報が多い場合には、サーバ側の監視タスクを増大させるに至り、当該解決手段の効果を没却せしめかねないという問題があった。
【００１２】
そこで、本発明は、上述した実情に鑑みて提案されたものであり、大規模なクライアント側で取得した音声及び画像をサーバ側に伝送するに際して、クライアントから送信する情報伝送量を低減してサーバ側の監視タスクを低減すると共に、緊急通常時の監視応答レスポンスを高め、更には通報者にて特別な操作をする必要を無くすことができ、併せて、環境騒音があっても誤報することなく通報信頼性を向上させた監視通報システムを提供することを目的とする。
【００２５】
【課題を解決するための手段】
本発明に係る他の監視通報システムは、上述の課題を解決するために、複数の通報ユニットと、当該通報ユニットにより生成した音声及び画像を監視者に提示する監視サーバとが通信回線を介して接続されたシステムであって、前記各通報ユニットは、監視対象を撮像して、画像データを生成する撮像手段と、前記撮像手段周辺の音を集音して、音声データを生成する集音手段と、前記集音手段で生成された音声データから前記監視対象の状況を認識する音声認識手段と、少なくとも画像データ及び音声データを前記監視サーバとの間で通信し、前記音声認識手段により前記監視対象の状況に異常が発生したと判定した場合に、異常発生信号を前記監視サーバに送信する通信手段と、前記監視サーバからの遠隔制御信号により前記撮像手段の撮像方向を制御する撮像制御手段とを備え、前記監視サーバは、前記通報ユニットとの間で通信をする通信手段と、前記複数の通報ユニットの設置場所を示す設置場所データを記憶し、前記複数の通報ユニットから複数の異常発生信号を受信した場合に、前記設置場所データを参照して異常が発生した通報ユニットの設置場所を抽出し、異常が発生した前記通報ユニットの設置場所の順序に従って、前記監視対象の移動方向を認識する移動方向認識手段と、前記異常発生信号を送信した通報ユニット、及び当該通報ユニットから前記移動方向認識手段により認識した移動方向に存在する通報ユニットの前記撮像手段に対する前記監視対象の方向を推定する方向推定手段と、前記方向推定手段により推定した前記各通報ユニットの撮像手段の撮像方向を示す遠隔制御信号を前記各通報ユニットに送信するように前記通信手段を制御し、当該遠隔制御信号に従って前記撮像制御手段により撮像方向が制御された各通報ユニットの前記撮像手段により生成した画像データ、及び前記集音手段により生成した音声データを受信する監視制御手段とを備える。
【００２６】
このような監視通報システムでは、監視サーバにより周囲の異常を監視させるために、監視サーバに予め各通報ユニットの設置場所データを記憶しておき、移動方向を認識して、当該移動方向に応じて通報ユニットの撮像手段の撮像方向を制御する。
【００２７】
【発明の実施の形態】
以下、本発明の実施の形態について図面を参照して説明する。
【００２８】
本発明は、例えば図１に示すように構成された監視通報システムに適用される。
【００２９】
［監視通報システムの構成］
この監視通報システムは、光伝送ネットワーク１に、クライアント側である複数の通報ユニット２Ａ、２Ｂ、２Ｃ・・・（以下、総称するときには単に「通報ユニット２」と呼ぶ。）と、各通報ユニット２を管理するための監視サーバ３とが接続されて構成されている。
【００３０】
監視サーバ３は、光伝送ネットワーク１を介して、各通報ユニット２からの音声データ及び画像データが送信され、受信した音声データを音声記憶部に記憶すると共に、画像データを画像記憶部に記憶する。また、この監視サーバ３では、ユーザである監視者に画像及び音声を提示するための複数の監視用モニタ１１、及び監視者に異常を判断させるための異常判断表示用モニタ１２を備える。更に、監視用モニタ１１及び異常判断表示用モニタ１２に表示している画像に対応した音声信号を放音する放音機構が設けられている。これにより、監視サーバ３では、監視者により通報ユニット２の周辺で発生した緊急事態を監視することを可能としている。
【００３１】
更にまた、この監視サーバ３は、監視者に操作される操作入力機構を備え、当該操作入力機構が操作されることに応じた通報ユニット２に対するカメラ遠隔制御信号、音声認識設定信号及び画像認識制御信号を通報ユニット２に送る。なお、監視サーバ３による他の処理については後述する。
【００３２】
この監視通報システムでは、例えば交差点道路周辺において、図２に示すように、通報ユニット２Ａ〜２Ｇが屋外のある地域の範囲の道路に、間隔を介して設置されて光伝送ネットワーク１に接続されて構成されている。各通報ユニット２は、その構成例の一例を図３に示すように、道路に設けられたポール２ａにマイク２６、全方位カメラ４１及び可動式カメラ４２が設けられている。本例において、通報ユニット２は、ユーザが歩行しているときの略頭部位置のポール２ａにマイク２６を設け、ユーザの頭上のポール２ａに可動式カメラ４２及び全方位カメラ４１が設けられている。
【００３３】
通報ユニット２は、設置場所周囲の音声及び画像を取得して、監視サーバ３に送信するものである。この通報ユニット２は、光伝送ネットワーク１を介して監視サーバ３との間で通信をするための通信機能として、メディアコンバータ２１、ルータ２２を備える。
【００３４】
この通報ユニット２では、音声や画像を監視サーバ３に送信するに際して、ルータ２２により宛先を監視サーバ３とした通信データを作成し、メディアコンバータ２１により各種信号変換を行って光信号として光伝送ネットワーク１に送出する。また、この通報ユニット２では、光伝送ネットワーク１を介して監視サーバ３からの光信号をメディアコンバータ２１及びルータ２２にて受信して、カメラサーバ２３や音声信号処理部２４に出力する。
【００３５】
「通報ユニット２による音声データ処理」
つぎに、通報ユニット２による音声データに関する処理及びその機能的な構成について説明する。
【００３６】
また、この通報ユニット２は、監視サーバ３に音声データを伝送するための構成として、音声信号処理部２４、スピーカ部２５、複数のマイク２６Ａ、２６Ｂ、２６Ｃ（以下、総称するときには単に「マイク２６」と呼ぶ。）、通話スイッチ２７を備える。
【００３７】
マイク２６Ａ〜２６Ｃは、図４にポール２ａを上方から見た場合を示すように、ポール２ａの側面に同一直線上に並べずに、一定の距離ｄを介して配置されている。このマイク２６は、通報ユニット２の周辺の音声を受けると、当該音声から音声信号を生成して音声信号処理部２４に出力する。
【００３８】
音声信号処理部２４は、マイク２６Ａ〜２６Ｃからの音声信号を入力すると、当該各音声信号を図示しないアンプ及びＡ／Ｄ変換器により所定のレベルのディジタルデータである音声データとして環境音除去部３１に送る。
【００３９】
環境音除去部３１は、マイク２６Ａ〜２６Ｃから音声データが送られると、環境雑音成分を除去する処理を各音声データについて行い、通報ユニット２の近傍に存在するユーザの音声を含む複数の音声データを作成し、音源方向推定部３２及び音声認識部３３に送る。
【００４０】
音源方向推定部３２では、環境音除去部３１からの複数の音声データが送られると、当該複数の音声データを用いて相関処理を行う。このとき、音源方向推定部３２は、サンプリング時刻をずらした２つの異なる音声データを足し合わせ、信号レベルが最も大きくなる音声データの組み合わせを認識し、当該音声データの組み合わせたマイク２６の音声検出方向を音源到来方向候補とする。具体的に説明すると、音源方向推定部３２は、例えばマイク２６Ａにより検出された音声の音声データとマイク２６Ｂにより検出された音声の音声データとを足し合わせ、その音声レベルを計算する。また、音源方向推定部３２は、マイク２６Ａとマイク２６Ｃとの組み合わせ、マイク２６Ｂとマイク２６Ｃとの組み合わせについても同様の計算をして、３つの組み合わせのうち最も音声レベルが高い組み合わせを認識して音源到来方向候補とする。そして、音源方向推定部３２では、複数の音源到来方向候補から、監視対象の方向となる方向推定情報θｖを推定する。
【００４１】
なお、音源方向推定部３２では、同一のポール２ａに取り付けられたマイク２６のみを使用して方向推定情報θｖを推定する場合のみならず、複数の通報ユニット２で生成した音声データを光伝送ネットワーク１を介して受信して方向推定情報θｖを推定しても良い。
【００４２】
また、音源方向推定部３２では、各マイク２６の音声検出時刻の時間差又は位相差を求めて、監視対象の方向を推定しても良い。更に、音源方向推定部３２では、各マイク２６が、地面に対して垂直に設置されたポール２ａの側面であって、地面に対する水平面上に他のマイク２６と所定の間隔を介して取り付けられている場合、各マイク２６の音波の到達時間差又は位相差を求めて、監視対象の水平面方向を推定しても良い。更にまた、この音源方向推定部３２では、各マイク２６が、地面に対して垂直に設置されたポール２ａの側面であって、地面に対する垂直面上に他のマイク２６と所定の間隔を介して取り付けられている場合、各マイク２６の音波の到達時間差又は位相差を求めて、監視対象の垂直面方向を推定しても良い。これにより、音源方向推定部３２では、監視対象からの異常音の到来方向を的確に推定することができる。
【００４３】
音声認識部３３は、環境音除去部３１から音声データが送られると、当該音声データから音響的な特徴ベクトルを抽出し、当該抽出した特徴ベクトルと、予め異常特徴音データベース記憶部３４に記憶しておいた特徴ベクトルで表現された単語辞書データ或いは文章辞書データとのマッチングを行って音声認識をする。ここで、単語辞書及び文章辞書には、例えば「助けて」という音声を示す特徴ベクトルや、「やめて」という音声を示す特徴ベクトルなど、異常時に発声されると想定されるものが含まれている。
【００４４】
また、この異常特徴音データベース記憶部３４には、監視対象が生体である場合の異常発生時の音声データのみならず、又は監視対象が物体である場合の物体の衝突音又は破壊音を示す音響データを記憶していても良く、当該音響データを用いて監視対象の異常発生の有無を認識しても良い。これにより、音声認識部３３では、異常特徴音データベース記憶部３４の内容を参照して、特定の異常音を認識することができる。
【００４５】
更に、異常特徴音データベース記憶部３４には、音声データ又は音響データの基本周波数、パワースペクトル、フォルマント、ケプストラム及びこれらの時間的変位のうち、少なくとも一つを音声データ又は音響データの特徴ベクトル（特徴量）として記憶しておいても良い。
【００４６】
更にまた、異常特徴音データベース記憶部３４には、音声データがサンプリングされた時系列データｓ（ｔ）として与えられたとき、下記の式に示すＦＦＴ（Fast Fourier Transform）などのフーリエ変換を行って、音声データを周波数関数に変換し、図５に示すように、当該周波数関数の複数のピークｆ１〜ｆ３に対応した周波数を近似的にフォルマントとする。
【００４７】
$S(\exp(-j\omega_m)) = \sum_{t} s(t)\,\exp(-j\omega_m t)$
そして、異常特徴音データベース記憶部３４には、フォルマントを音声データの特徴量として記憶しておく。
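The Fourier-transform step described above can be illustrated with a minimal Python sketch. It computes a magnitude spectrum with a direct DFT and returns the largest local peaks, which the text treats as approximate formants. The function names `dft_magnitudes` and `spectral_peaks`, the sample rate, and the peak-picking rule are assumptions made for this example; the patent does not specify them, and a practical system would likely use an FFT library and a smoothed spectral envelope.

```python
import cmath
import math


def dft_magnitudes(samples):
    """Magnitude spectrum |S(k)| for k = 0 .. N/2, using a direct DFT."""
    n = len(samples)
    return [
        abs(sum(s * cmath.exp(-2j * math.pi * k * t / n)
                for t, s in enumerate(samples)))
        for k in range(n // 2 + 1)
    ]


def spectral_peaks(samples, sample_rate, count=3):
    """Frequencies (Hz) of the `count` largest local maxima of the spectrum.

    Assumption: raw spectral peaks stand in for formants f1..f3.
    """
    mags = dft_magnitudes(samples)
    n = len(samples)
    peaks = [
        (mags[k], k * sample_rate / n)
        for k in range(1, len(mags) - 1)
        if mags[k] > mags[k - 1] and mags[k] >= mags[k + 1]
    ]
    peaks.sort(reverse=True)  # strongest peaks first
    return sorted(freq for _, freq in peaks[:count])
```

The returned frequencies could then be stored as one feature vector per reference sound.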
【００４８】
また、異常特徴音データベース記憶部３４には、上記式における周波数関数Ｓ（ｅｘｐ（−ｊｗｍ））の対数を下記の式のように逆フーリエ変換を行うことによりケプストラムＣ（ｎ）を求め、当該ケプストラムを音声データの特徴量として記憶しておいても良い。
【００４９】
$C(n) = \frac{1}{N}\sum_{k=0}^{N-1} \log Y(k)\,\exp(j 2\pi k n / N)$
そして、音声認識部３３は、異常特徴音データベース記憶部３４に記憶されている特徴ベクトルのうち、抽出した特徴ベクトルと最も距離が近い特徴ベクトルを選択する。そして、音声認識部３３は、選択した特徴ベクトルと、抽出した特徴ベクトルとの距離によって異常音である度合い（以下、音声異常類似度Ａｖと呼ぶ。）を計算する。音声認識部３３は、計算した音声異常類似度Ａｖを音声記憶部３５に記憶する。
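As a rough illustration of the cepstrum feature and the distance-based similarity Av described above, the following Python sketch computes a real cepstrum (inverse DFT of the log magnitude spectrum) and maps a feature-vector distance to a similarity score. The names `real_cepstrum` and `abnormal_similarity`, the small `floor` that avoids log(0), and the mapping 1 / (1 + distance) are assumptions for this example; the patent only states that Av is calculated from the distance.

```python
import cmath
import math


def real_cepstrum(samples, floor=1e-12):
    """Real cepstrum C(n): inverse DFT of log|S(k)|."""
    n = len(samples)
    spectrum = [
        sum(s * cmath.exp(-2j * math.pi * k * t / n)
            for t, s in enumerate(samples))
        for k in range(n)
    ]
    # `floor` is an assumed guard so that log() never receives zero.
    log_mag = [math.log(max(abs(x), floor)) for x in spectrum]
    return [
        (sum(v * cmath.exp(2j * math.pi * k * q / n)
             for k, v in enumerate(log_mag)) / n).real
        for q in range(n)
    ]


def abnormal_similarity(distance):
    """Hypothetical mapping from distance to Av: 1.0 at distance 0, falling toward 0."""
    return 1.0 / (1.0 + distance)
```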
【００５０】
なお、異常特徴音データベース記憶部３４に記憶されているデータは、人の声に限らず、例えば車の衝突音の特徴ベクトルなどであっても良い。これにより、人の声のみならず、物体の衝突や破壊等の事故の異常であっても音声異常類似度Ａｖを生成することができる。
【００５１】
また、この音声信号処理部２４は、通報ユニット２が設置されている場所に応じて特有の環境音の特徴ベクトルが記憶された環境特徴音データベース記憶部３６を備える。この環境特徴音データベース記憶部３６には、例えば電車の踏切音や横断歩道の警告音など異常音でない環境音の特徴ベクトルが記憶されている。
【００５２】
この環境音の特徴ベクトルは、音声信号処理部２４からルータ２２及びメディアコンバータ２１を介して監視サーバ３に送られ、監視サーバ３にて異常音でない音と判断されたことを示す音声認識設定信号が送られることに応じて、環境特徴音データベース記憶部３６に登録される。なお、監視サーバ３では、通報ユニット２からの環境音が異常音か否かを判定する処理を予め用意しておいた環境特徴音データベースを使用して自動的に行っても良く、監視者の手動により行っても良い。
【００５３】
この環境特徴音データベース記憶部３６に記憶された環境音の特徴ベクトルは、音声認識部３３にて異常音の音声認識をするに際して、音声認識部３３により読み込まれて、異常音か否かを判定する特徴ベクトルと比較される。そして、音声認識部３３では、異常音か否かを判定する特徴ベクトルが環境音の特徴ベクトルに近い場合には異常音と判定しないとする。これにより、通報ユニット２では、環境音を異常音と誤認識することなく、異常検出の信頼性を向上させることができる。
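The environmental-sound check described above might be sketched as follows. The function `is_abnormal`, the Euclidean distance, and the rule "abnormal only if the nearest abnormal reference is within a threshold and closer than every registered environmental sound" are assumptions for illustration; the patent only says that a feature vector close to an environmental-sound vector is not judged abnormal.

```python
import math


def is_abnormal(feature, abnormal_refs, environment_refs, threshold):
    """Return True only if `feature` matches an abnormal sound better than any environmental sound."""
    d_abnormal = min(math.dist(feature, ref) for ref in abnormal_refs)
    # With no registered environmental sounds, nothing can veto the match.
    d_environment = min(
        (math.dist(feature, ref) for ref in environment_refs),
        default=math.inf,
    )
    return d_abnormal <= threshold and d_abnormal < d_environment
```

Registering a new environmental sound (for example a railroad-crossing bell) then amounts to appending its feature vector to `environment_refs`.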
【００５４】
音声記憶部３５には、以前にマイク２６にて検出した音声データ、及び当該音声データに対応した音声異常類似度Ａｖが記憶されている。この音声記憶部３５に記憶されている音声データ及び音声異常類似度Ａｖは、監視者が監視サーバ３を操作することにより、光伝送ネットワーク１を介して監視サーバ３でダウンロードすることが可能となっている。
【００５５】
なお、この音声異常類似度Ａｖは、監視サーバ３からの音声認識設定信号によって設定されたものを含む。すなわち、監視サーバ３では、各通報ユニット２の音声記憶部３５に記憶された音声データを参照して、当該音声データが異常音に該当すると判定した場合には、当該音声データを音声認識部３３により音声認識させて特徴ベクトルを異常音として異常特徴音データベース記憶部３４に追加する。このように、通報ユニット２では、異常特徴音データベース記憶部３４や環境特徴音データベース記憶部３６に記憶する特徴ベクトルが監視サーバ３により追加されることにより、設置初期時と比較して異常検出の信頼性を向上させることができる。
【００５６】
また、音声信号処理部２４は、環境音除去部３１から音声データが出力されるＶＯＩＰ（Voice over IP（Internet Protocol））処理部３７を備える。このＶＯＩＰ処理部３７は、通話スイッチ２７が操作された場合に、環境音除去部３１からの音声データを監視サーバ３に送る。
【００５７】
「通報ユニット２による画像データ処理」
つぎに、通報ユニット２による画像データに関する処理及びその機能的な構成について説明する。
【００５８】
この通報ユニット２は、監視サーバ３に画像信号を伝送するための構成として、カメラサーバ２３、全方位カメラ４１、可動式カメラ４２及び入力センサ４３を備える。
【００５９】
全方位カメラ４１は、広視野角を有するレンズにより集光し、内部のＣＣＤ（Charge Coupled Device）撮像素子により画像信号を生成する。可動式カメラ４２は、パン機能、チルト機能及びズーム機能を備え、ＣＣＤ撮像素子により画像信号を生成する。全方位カメラ４１及び可動式カメラ４２は、カメラサーバ２３と接続され、画像信号をカメラサーバ２３に出力する。
【００６０】
カメラサーバ２３では、全方位カメラ４１から通報ユニット２の周辺状況を撮像した画像信号を入力すると、当該画像信号をＡ／Ｄ変換して画像データとして環境画像ノイズ除去部５１に送る。この環境画像ノイズ除去部５１では、画像データを入力すると、当該画像データからノイズを除去して、画像認識部５２に送る。
【００６１】
画像認識部５２は、図６に示すような機能構成を有し、全方位カメラ４１及び可動式カメラ４２からの画像データについて画像認識処理をする。
【００６２】
この画像認識部５２では、画像データを入力すると、オブジェクト画像抽出部６１により、例えば背景差分法などの画像処理を行うことにより、背景画像と、その他のオブジェクト画像とを分割する。このオブジェクト画像としては、例えば人物や車などを示す画像データである。そして、画像認識部５２では、例えば静止画像の画像データを入力した場合、特徴量抽出部６２により、オブジェクト画像の画像内位置情報、大きさ情報、色情報などを認識して、当該各情報をオブジェクトについての特徴量に変換する。また、特徴量抽出部６２では、複数のフレームに亘る動画像を入力した場合には、オブジェクト画像の動き速さ情報も画像特徴量として変換する。
【００６３】
そして、画像認識部５２では、オブジェクト画像の位置情報から、当該オブジェクト画像に相当する監視対象物（以下、オブジェクトと呼ぶ。）の方向θｉの候補である方向推定情報を移動方向推定部６３により求めて、統合方向検知部５３に送る。
【００６４】
また、このカメラサーバ２３は、異常と想定されるオブジェクトの特徴量を蓄積して記憶した特徴画像データベース記憶部５４を備える。画像認識部５２では、オブジェクトの特徴量を求めると、特徴画像データベース記憶部５４に蓄積されたオブジェクトの特徴量とマッチング処理をして、求めたオブジェクトの特徴量と蓄積されたオブジェクトの特徴量とを用いて特徴ベクトルの距離を計算する。そして、画像認識部５２では、特徴ベクトルの距離によって、異常画像類似度計算部６４により、異常である度合いを示す画像異常類似度Ａｉを求める。
【００６５】
更に、画像認識部５２は、環境画像ノイズ除去部５１からの画像データを画像記憶部５５に記憶させる。
【００６６】
更に、このカメラサーバ２３は、例えば赤外線を用い、異なる方向の人体を検知するための複数の人体検知センサを入力センサ４３として備える。この入力センサ４３は、人物オブジェクトが通報ユニット２の周囲に存在する場合に、複数の人体検知センサのうち、人体を検出した人体検知センサを特定する。そして、この入力センサ４３は、特定した人体検知センサの人体検知方向から人体オブジェクトが存在する方向を推定して、方向推定情報として統合方向検知部５３に送る。
【００６７】
また、この入力センサ４３としては、図７に通報ユニット２を上方から見た様子を示すように、全方位カメラ４１又は可動式カメラ４２の撮像領域内であって異なる検出範囲とされた複数の距離センサ４４Ａ〜４４Ｄを備えるものであっても良い。そして、この距離センサ４４Ａ〜４４Ｄでは、人体等の監視対象を検出した場合には距離情報を含む方向推定情報を統合方向検知部５３に送る。なお、各距離センサ４４は、超音波センサ又は光学式センサであれば良い。
【００６８】
更に、入力センサ４３としては、全方位カメラ４１又は可動式カメラ４２の撮像領域内であって異なる検出範囲とされた複数の赤外線センサからなるものであっても良い。これにより、各赤外線センサでは、監視対象から反射して検出した赤外線情報を用いて監視対象の存在する領域を示す方向推定情報を統合方向検知部５３に送る。
【００６９】
統合方向検知部５３は、音源方向推定部３２、画像認識部５２及び入力センサ４３からの各方向推定情報から、オブジェクトの存在する通報ユニット２に対する方向を決定して、カメラ制御部５６に送る。
【００７０】
カメラ制御部５６では、統合方向検知部５３により決定されたオブジェクトの方向から、可動式カメラ４２のパン及びチルト量Δθ、ズーム量ΔＺを設定する。このとき、カメラ制御部５６では、パン及びチルト量Δθを設定するための可動式カメラ４２の方向推定角θを、音声異常類似度Ａｖ、画像異常類似度Ａｉ、音声を用いた方向推定情報θｖ、画像を用いた方向推定情報θｉを用いて、
θ＝（Ａｖ×θｖ＋Ａｉ×θｉ）／（Ａｖ＋Ａｉ）
なる演算をすることにより求める。すなわち、カメラ制御部５６では、方向推定情報θｖ及び方向推定情報θｉの重み付け係数として、音声異常類似度Ａｖ及び画像異常類似度Ａｉを使用する。そして、カメラ制御部５６では、求めた方向推定角θに対する現在の可動式カメラ４２の撮像方向から、パン及びチルト量Δθを決定する。
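The weighted average θ = (Av × θv + Ai × θi) / (Av + Ai) and the resulting pan amount can be written as a short Python sketch. The function names and the handling of the case Av + Ai = 0 are assumptions for this example. Angles are treated as plain numbers in degrees, so wrap-around at 360 degrees is not handled here.

```python
def fuse_direction(a_v, theta_v, a_i, theta_i):
    """Direction estimate weighted by the audio and image abnormality similarities."""
    total = a_v + a_i
    if total == 0:
        # Assumed behaviour: with no evidence from either source, refuse to estimate.
        raise ValueError("both similarities are zero; no direction estimate")
    return (a_v * theta_v + a_i * theta_i) / total


def pan_amount(theta, current_direction):
    """Pan amount: difference between the fused estimate and the current camera direction."""
    return theta - current_direction
```

For example, with Av = 0.75, θv = 40, Ai = 0.25 and θi = 80, the fused direction is 50 degrees, closer to the audio estimate because the audio similarity is higher.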
【００７１】
また、カメラ制御部５６は、ズーム量ΔＺを設定するためのズーム設定値Ｚを、方向推定情報θｖと方向推定情報θｉとの差が小さい場合には、当該方向推定情報θｉ及び方向推定情報θｖにおけるオブジェクトの存在確率が高いのでズーム設定値Ｚを大きくし、方向推定情報θｖと方向推定情報θｉとの差が大きい場合には当該方向推定情報θｉ及び方向推定情報θｖにおけるオブジェクトの存在確率が低いのでズーム設定値Ｚを小さくするように設定する。
【００７２】
このとき、カメラ制御部５６は、例えば、方向推定情報θｖ及び方向推定情報θｉを用いて、
Ｚ＝α／（θｖ−θｉ）
α：定数
なる演算をすることによりズーム設定値Ｚを求める。そして、カメラ制御部５６では、求めたズーム設定値Ｚに対する現在の可動式カメラ４２のズーム設定値から、ズーム量ΔＺを決定する。
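A minimal Python sketch of the zoom rule Z = α / (θv − θi) follows. The formula as written is undefined when the two estimates coincide and negative when θv < θi, so this sketch uses the absolute difference and clamps the result to the camera's zoom range. The absolute value, the clamping, and the parameters `z_min` and `z_max` are assumptions, not part of the patent text.

```python
def zoom_setting(theta_v, theta_i, alpha, z_min, z_max):
    """Zoom setting that grows as the audio and image direction estimates agree."""
    diff = abs(theta_v - theta_i)  # assumed: only the size of the disagreement matters
    if diff == 0:
        return z_max  # perfect agreement: zoom in as far as the camera allows
    return max(z_min, min(z_max, alpha / diff))
```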
【００７３】
そして、カメラ制御部５６では、パン及びチルト量Δθ及びズーム量ΔＺだけ可動式カメラ４２を駆動させて、可動式カメラ４２にオブジェクトを撮像させ、可動式カメラ４２により撮像した画像データを画像認識部５２に送る。
【００７４】
「異常判定処理」
つぎに、通報ユニット２及び監視サーバ３による異常判定に関する処理及びその機能的な構成について説明する。
【００７５】
画像認識部５２では、異常判定をするに際して、上述の音声異常類似度Ａｖ及び画像異常類似度Ａｉを統合することにより異常度Ａを演算する。このとき、画像認識部５２では、例えば下記の式を用いて、
Ａ＝α×Ａｉ＋β×Ａｖ
α、β：定数
なる演算をする。これにより、画像認識部５２では、通報ユニット２についての異常度Ａを計算し、光伝送ネットワーク１を介して監視サーバ３に送る。また、画像認識部５２では、異常度Ａを演算し、当該異常度Ａが予め設定しておいた閾値よりも高くなった場合には、可動式カメラ４２にて撮像された画像及び音声を監視サーバ３に送信させても良い。
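The abnormality degree A = α × Ai + β × Av and the threshold test can be sketched in Python as below. The default weights of 0.5 and the function names are assumptions for illustration; the patent only states that α and β are constants and that transmission may start once A exceeds a preset threshold.

```python
def abnormality_degree(a_i, a_v, alpha=0.5, beta=0.5):
    """Combined abnormality degree A from image (Ai) and audio (Av) similarities."""
    return alpha * a_i + beta * a_v


def should_transmit(a_i, a_v, threshold, alpha=0.5, beta=0.5):
    """True when the unit should start sending images and sound to the monitoring server."""
    return abnormality_degree(a_i, a_v, alpha, beta) > threshold
```

Because only units whose A exceeds the threshold transmit, the server receives data from a small subset of units at any time.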
【００７６】
これに応じて、監視サーバ３では、通報ユニット２から送信された複数の画像を監視用モニタ１１または異常判断表示用モニタ１２にて表示することにより、監視者に異常度Ａが閾値以上となっている通報ユニット２周辺の画像及び音声を監視させる。
【００７７】
また、監視サーバ３では、図８に示すように、光伝送ネットワーク１を介して接続されている複数の通報ユニット２が存在する自身の監視地域７１において、図８（Ａ）に示すように、通報ユニット２−１の異常度Ａが低い場合には、任意の通報ユニット２−２，２−３，２−４により撮像した画像データを監視しているとする。そして、通報ユニット２−１の異常度Ａが閾値を越えた場合には、当該通報ユニット２−１から監視サーバ３にその旨の情報が送信される。
【００７８】
これに応じて、監視サーバ３では、異常が通報ユニット２−１周辺で発生していることを判断して、当該通報ユニット２−１の周辺の通報ユニット２−１１，通報ユニット２−１２，通報ユニット２−１３により撮像した画像データ及び音声データを送信する制御信号を各通報ユニット２に送信する。また、監視サーバ３では、通報ユニット２−１の方向を撮像方向とするカメラ遠隔制御信号を通報ユニット２−１１，通報ユニット２−１２，通報ユニット２−１３に送信する。また、監視サーバ３では、異常度Ａが閾値以上となった通報ユニット２−１について、ＶＯＩＰ処理部３７による音声通信を可能とする制御信号を送信して、通報ユニット２−１の通話スイッチ２７を操作したユーザと監視サーバ３側の監視者との通話を可能とする。これにより、通報ユニット２−１では、通話スイッチ２７が操作されてマイク２６で検出した音声データを監視サーバ３に送信する状態となる。
【００７９】
これにより、通報ユニット２−１１，２−１２，２−１３は、可動式カメラ４２によりカメラ遠隔制御信号に従った方向を撮像して、画像データ及び音声データを監視サーバ３に送信して、監視者による監視を可能とするアクティブ状態となる。監視サーバ３では、通報ユニット２−１，通報ユニット２−１１，通報ユニット２−１２，通報ユニット２−１３から画像データ及び音声データが送られると、監視用モニタ１１及び異常判断表示用モニタ１２により画像及び音声を確認して、多くの情報によって監視者による監視を行わせる。
【００８０】
また、この監視通報システムでは、異常度Ａが高くなった場合に、通報ユニット２−１のみならず、通報ユニット２−１１，通報ユニット２−１２，通報ユニット２−１３により取得した画像データ及び音声データを記憶しておくことにより、多角的に異常が発生した地域の情報を分析することを可能とする。
【００８１】
このような監視通報システムでは、異常度Ａを上回る通報ユニット２についてのみ監視サーバ３にて監視をするようにしたので、監視サーバ３側にてすべての通報ユニット２について監視を行う必要がない。したがって、この監視通報システムによれば、監視サーバ３側の監視タスクを低減することができる。
【００８２】
また、この監視通報システムでは、全方位カメラ４１及び可動式カメラ４２にて撮像して取得した画像データを画像記憶部５５に記憶して、監視サーバ３からカメラサーバ２３へのダウンロード要求に応じて画像データを送信することができるので、監視者により過去の画像をダウンロードして解析をさせることができる。
【００８３】
「異常監視処理」
つぎに、上述したように構成された監視通報システムにおいて、上述した処理の他に実現可能な異常監視処理の処理手順について図９〜図１４を参照して説明する。
【００８４】
図９に示す通報ユニット２の異常監視処理では、先ず、処理開始時において可動式カメラ４２の角度を初期位置とするようにカメラ制御部５６により図示しない可動機構を制御し（ステップＳ１）、ズームアウトの程度を最大とするように可動式カメラ４２のズーム機能をカメラ制御部５６により制御する（ステップＳ２）。これにより、通報ユニット２では、可動式カメラ４２により撮像した画像信号を用いて広角の画像データを生成する状態となる。
【００８５】
このような状態において、通報ユニット２では、可動式カメラ４２により撮像されて取得した画像データを用いて、少なくとも画像認識部５２による画像異常類似度Ａｉを求めて、異常度Ａを演算する異常判定処理を行う。そして、通報ユニット２では、異常度Ａが所定の閾値よりも低いと判定した場合には（ステップＳ３）、ステップＳ１及びステップＳ２の処理を繰り返し、異常度Ａが所定の閾値よりも高いと判定した場合には（ステップＳ３）、画像認識部５２により、ステップＳ２にて設定したズーム機能により撮像して生成している広角の画像データから、監視対象となる移動物体を抽出する処理をする（ステップＳ４）。そして、画像認識部５２では、抽出した移動物体の位置から、方向推定情報θｉを作成して統合方向検知部５３に送る。
【００８６】
次に、通報ユニット２では、統合方向検知部５３により、少なくとも方向推定情報θｉを用いて可動式カメラ４２の撮像方向を推定し、移動物体が画像データの中心となるように可動式カメラ４２を制御し（ステップＳ５）、ズームインするように可動式カメラ４２を制御する（ステップＳ６）。これにより、通報ユニット２では、移動物体を画像中心に含む画像データを生成して、画像記憶部５５に記憶したり、監視サーバ３に送信することが可能となる。
【００８７】
つぎに、図１０に示す通報ユニット２の異常監視処理では、図９に示した異常監視処理と同様にステップＳ１〜ステップＳ３の処理を行い、ステップＳ３にて異常度Ａが所定の閾値よりも高いと判定した場合に、画像認識部５２により、全方位カメラ４１により撮像して生成した全方位の画像データを用いて移動物体を抽出する（ステップＳ１１）。そして、画像認識部５２では、移動物体の抽出位置から、方向推定情報θｉを作成して統合方向検知部５３に送る。
【００８８】
これに応じて、通報ユニット２では、統合方向検知部５３により少なくとも方向推定情報θｉを用いて移動物体の存在位置を計算し（ステップＳ１２）、当該存在位置の計算値からカメラ制御部５６により可動式カメラ４２の撮像方向（角度）を制御し（ステップＳ１３）、ズームインするように可動式カメラ４２を制御する（ステップＳ６）。これにより、通報ユニット２では、移動物体を含む画像データを生成して、画像記憶部５５に記憶したり、監視サーバ３に送信することが可能となる。
【００８９】
つぎに、図１１に示す通報ユニット２の異常監視処理では、先ず、ステップＳ２１において、音声認識部３３により、複数のマイク２６Ａ〜マイク２６Ｃについて、監視対象からの音声の方向を決定するための遅延時間を決定する。そして、音声認識部３３では、ステップＳ２２において、各マイク２６Ａ、マイク２６Ｂ及びマイク２６Ｃにより検出して生成した音声データ（チャンネル信号）を、ステップＳ２１にて決定した遅延時間だけずらして加算することにより、遅延和Ｓ（λ）を得る。このとき、音声認識部３３では、マイク２６Ａとマイク２６Ｂ、マイク２６Ａとマイク２６Ｃ、マイク２６Ｂとマイク２６Ｃの組み合わせについての遅延和を求める。
【００９０】
これにより、音声認識部３３では、ステップＳ２３において、全ての組み合わせ、すなわち全ての角度について遅延和Ｓ（λ）を計算したと判定した場合には、最も値が大きい遅延和Ｓ（λ）を求めて統合方向検知部５３に方向推定情報θｖとして送る（ステップＳ２４）。また、この音声認識部３３では、音声検出時刻の時間差又は位相差を求めて、方向推定情報θｖを作成しても良い。
【００９１】
次に、統合方向検知部５３では、少なくとも方向推定情報θｖから可動式カメラ４２が撮像する推定方向を求め、カメラ制御部５６により可動式カメラ４２の撮像方向を制御する。
【００９２】
つぎに、図１２に示す監視サーバ３の異常監視処理では、監視サーバ３により、通報ユニット２Ａからの時系列データである音声データＳＡ（ｔ）を受信し（ステップＳ３１）、通報ユニット２Ｂからの時系列データである音声データＳＢ（ｔ）を受信し（ステップＳ３２）、通報ユニット２Ｃからの時系列データである音声データＳＣ（ｔ）を受信する（ステップＳ３３）。
【００９３】
次に、監視サーバ３では、各音声データＳＡ（ｔ）、音声データＳＢ（ｔ）及び音声データＳＣ（ｔ）を用いて、各通報ユニット２間における音声データの相関を計算する（ステップＳ３４）。そして、監視サーバ３では、相関計算結果から、監視対象が発する音声の到来方向を推定し、当該到来方向を撮像方向とするためのカメラ遠隔制御信号を通報ユニット２Ａ、通報ユニット２Ｂ、通報ユニット２Ｃについて作成して送信する。これにより、通報ユニット２Ａ、通報ユニット２Ｂ及び通報ユニット２Ｃは、ステップＳ３５にて推定された到来方向を撮像方向とするように可動式カメラ４２を制御することができる。
【００９４】
このような異常監視処理を行う監視通報システムによれば、設置場所が異なる複数の通報ユニット２を単一の音声到来場所に向けることにより、複数の角度から異常事態の画像データを得ることができ、多角的な異常事態の分析をさせることができる。
【００９５】
つぎに、図１３に示す監視サーバ３の異常監視処理では、先ず、監視用モニタ１１により画像及び音声を監視者に提示している通報ユニット２Ａ、及び当該通報ユニット２Ａの周辺場所に設置された通報ユニット２Ｂ及び通報ユニット２Ｃにより取得した音声データＳＡ（λ）、ＳＢ（λ）及びＳＣ（λ）を監視サーバ３により受信する（ステップＳ４１）。
【００９６】
そして、監視サーバ３では、通報ユニット２Ａの異常度Ａが所定の閾値を超えて、通報ユニット２Ａから異常発生信号を受信した場合（ステップＳ４２）、通報ユニット２Ａの周辺の通報ユニット２Ｂ及び通報ユニット２Ｃを監視対象とし（ステップＳ４３）、ステップＳ４１にて受信した各音声データＳＡ（λ）、ＳＢ（λ）、ＳＣ（λ）の相関を計算する（ステップＳ４４）。そして、監視サーバ３では、相関計算結果から、監視対象が発する音声の到来方向を推定し、当該到来方向を撮像方向とするためのカメラ遠隔制御信号を通報ユニット２Ｂ、通報ユニット２Ｃについて作成して送信する。これにより、通報ユニット２Ｂ及び通報ユニット２Ｃは、ステップＳ３５にて推定された到来方向を撮像方向とするように可動式カメラ４２を制御することができる。
【００９７】
このような異常監視処理を行う監視通報システムによれば、異常が発生した通報ユニット２の周囲の通報ユニット２の撮像方向を音声の到来方向とすることができるので、監視対象が移動して異常発生場所が移動する場合であっても周囲状況を把握することができる。
【００９８】
つぎに、図１４に示す監視サーバ３の異常監視処理では、予め監視サーバ３に複数の通報ユニット２の設置場所を示す設置場所データを記憶しておき、監視地域７１（通報ユニット２の設置エリア）内で異常度Ａが最も高い通報ユニット２Ａを抽出する（ステップＳ５１）。なお、本例では、例えば異常度Ａが所定値以上となった場合に自動的に通報ユニット２から監視サーバ３に異常度Ａを示す情報を送信するものとする。
【００９９】
次に、監視サーバ３では、異常度Ａが最も高い通報ユニット２が通報ユニット２Ａから通報ユニット２Ｂに変更したか否かを判定する（ステップＳ５２）。そして、監視サーバ３では、通報ユニット２Ａ及び通報ユニット２Ｂの設置場所を設置場所データから抽出し、異常発生順に設置場所データを並べて監視対象の移動方向を認識する。これにより、監視サーバ３では、監視対象の移動方向（通報ユニット２Ａから通報ユニット２Ｂ）の延長方向を可動式カメラ４２の撮像方向とする（ステップＳ５３）。
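The movement-direction step described above can be sketched in Python as follows. Installation locations are assumed to be planar (x, y) coordinates, the direction is taken from the last two units that reported an abnormality, and units lying within an assumed angular tolerance of that direction are selected as the ones whose cameras should be steered. The function names, the coordinate model, and `tolerance_deg` are hypothetical.

```python
import math


def movement_bearing(locations, alarm_order):
    """Bearing in degrees (0 = +x axis) from the previous alarming unit to the latest one."""
    (x0, y0), (x1, y1) = locations[alarm_order[-2]], locations[alarm_order[-1]]
    return math.degrees(math.atan2(y1 - y0, x1 - x0)) % 360.0


def units_ahead(locations, alarm_order, tolerance_deg=45.0):
    """Units that lie along the extension of the movement direction."""
    bearing = movement_bearing(locations, alarm_order)
    x1, y1 = locations[alarm_order[-1]]
    ahead = []
    for unit, (x, y) in locations.items():
        if unit in alarm_order:
            continue
        b = math.degrees(math.atan2(y - y1, x - x1)) % 360.0
        diff = abs((b - bearing + 180.0) % 360.0 - 180.0)  # smallest angular difference
        if diff <= tolerance_deg:
            ahead.append(unit)
    return sorted(ahead)
```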
【０１００】
このような異常監視処理を行う監視通報システムによれば、通報ユニット２の撮像方向を異常が発生した監視対象の移動方向とすることができるので、監視対象が移動して異常発生場所が移動する場合であっても周囲状況を把握することができる。
【０１０１】
［実施形態の効果］
以上詳細に説明したように、本発明を適用した監視通報システムによれば、通報ユニット２の近傍にいるユーザが危険な状態に遭遇したときに、適切に監視サーバ３の監視者に通報を行い、迅速な対応をすることで、より安全な環境を提供することができる。
【０１０２】
また、この監視通報システムによれば、多数の通報ユニット２を設定して大規模なクライアントで取得した音声及び画像を監視サーバ３に伝送するに際して、異常度Ａが閾値以上となった各通報ユニット２や周辺の通報ユニット２からのみ画像及び音声を送信するので、各通報ユニット２の情報伝送量を低減して監視サーバ３の監視タスクを低減すると共に、緊急通常時の監視応答レスポンスを高め、更には通報者にて特別な操作をする必要を無くすことができる。
【０１０３】
更に、この監視通報システムによれば、通報ユニット２が設置されている地域において、犯罪や事故が発生した時、通報ユニット２の近傍にいるユーザや被害者が通報ユニット２の通報ボタンを操作して通報して監視サーバ３側で通報ユニット２のカメラ機構を手動で切り換え、カメラ機構の角度やズーム等を制御する必要が無い。
【０１０４】
更にまた、この監視通報システムによれば、通話スイッチ２７をユーザが操作することによる通報や監視サーバ３の監視者による手動の可動式カメラ４２の制御に加えて、自動的に且つ迅速に警察や警備会社等の監視者に通報を行うことができる。例えば、この監視通報システムによれば、歩行しているユーザがひったくりなどの犯罪者に襲われて悲鳴等を発した場合、異常な状態になったことを通報ユニット２により検出してユーザや犯罪者の音声や画像を通報することができる。
【０１０５】
なお、上述の実施の形態は本発明の一例である。このため、本発明は、上述の実施形態に限定されることはなく、この実施の形態以外であっても、本発明に係る技術的思想を逸脱しない範囲であれば、設計等に応じて種々の変更が可能であることは勿論である。
【０１０６】
【発明の効果】
本発明によれば、多数の場所の画像及び音声を取得する大規模なシステムを使用した場合であって、音声及び画像を監視サーバに伝送するに際して、送信する情報伝送量を低減してサーバ側の監視タスクを低減すると共に、緊急通常時の監視応答レスポンスを高め、更には通報者にて特別な操作をする必要を無くすことができる。
【図面の簡単な説明】
【図１】本発明を適用した監視通報システムの機能的な構成を示すブロック図である。
【図２】本発明を適用した監視通報システムにおいて、通報ユニットの配置例を説明するための図である。
【図３】本発明を適用した監視通報システムの通報ユニットの側面図である。
【図４】本発明を適用した監視通報システムにおいて、通報ユニットのマイクの設置例を示す上面図である。
【図５】異常特徴音データベース記憶部に記憶する音声データの特徴量としてのフォルマントについて説明するための図である。
【図６】画像認識部の機能的な構成を示すブロック図である。
【図７】入力センサとして距離センサを使用した場合に、方向推定情報を作成するときの説明図である。
【図８】監視地域に多数の通報ユニットを設定した場合において、異常度が低い場合に監視サーバでモニタによる監視が行われているアクティブ状態の通報ユニットを説明するための図を（Ａ）に示し、異常度が高い場合に監視サーバでモニタによる監視が行われているアクティブ状態の通報ユニットを説明するための図を（Ｂ）に示す。
【図９】通報ユニットによる異常監視処理の一例を示すフローチャートである。
【図１０】通報ユニットによる異常監視処理の他の一例を示すフローチャートである。
【図１１】通報ユニットによる異常監視処理の更に他の一例を示すフローチャートである。
【図１２】監視サーバによる異常監視処理の一例を示すフローチャートである。
【図１３】監視サーバによる異常監視処理の他の一例を示すフローチャートである。
【図１４】監視サーバによる異常監視処理の更に他の一例を示すフローチャートである。
【図１５】従来の監視通報システムの具体的な構成例を示すブロック図である。
【符号の説明】
１光伝送ネットワーク
２通報ユニット
３監視サーバ
１１監視用モニタ
１２異常判断表示用モニタ
２１メディアコンバータ
２２ルータ
２３カメラサーバ
２４音声信号処理部
２５スピーカ部
２６マイク
２７通話スイッチ
３１環境音除去部
３２音源方向推定部
３３音声認識部
３４異常特徴音データベース記憶部
３５音声記憶部
３６環境特徴音データベース記憶部
３７ＶＯＩＰ処理部
４１全方位カメラ
４２可動式カメラ
４３入力センサ
５１環境画像ノイズ除去部
５２画像認識部
５３統合方向検知部
５４特徴画像データベース記憶部
５５画像記憶部
５６カメラ制御部
６１オブジェクト画像抽出部
６２特徴量抽出部
６３移動方向推定部
６４異常画像類似度計算部
７１監視領域
[0001]
BACKGROUND OF THE INVENTION
The present invention relates to a monitoring and reporting system for reporting an emergency situation that has occurred in a plurality of reporting units to a supervisor.
[0002]
[Prior art]
The techniques described in Patent Documents 1 to 4 below are known as conventional emergency reporting devices. The technique described in Patent Document 1 determines whether an abnormality has occurred when an emergency call button provided on the emergency reporting device is pressed, or when a characteristic utterance of a speaker is recognized.
[0003]
In the techniques described in Patent Documents 2 and 3 below, a plurality of microphones arranged facing different directions in a horizontal plane are used to identify the direction of a sound source such as an intruding object, and a video camera is pointed in the identified direction to acquire images and sound of the intruding object.
[0004]
Further, as a monitoring and reporting system related to Patent Document 4 below, the applicant has built the system shown in FIG. 15. Specifically, as shown in that figure, a plurality of reporting units 101A to 101C, ..., a monitoring server 102, and a monitoring monitor 103 are connected by an optical network 104, and the images and sound acquired by each reporting unit 101 are transmitted. In each reporting unit 101, image data captured by the omnidirectional camera 111 and the movable camera 112 has its noise removed by the environmental image noise removal unit 121 and is stored in the image storage unit 122, and sound collected by the microphone 113 has its noise removed by the environmental sound removal unit 131 and is stored in the voice storage unit 132.
[0005]
In this monitoring and reporting system, the supervisor operates the monitoring server 102 to generate a camera remote control signal, and the camera control unit 123 controls the imaging direction of the movable camera 112 accordingly. When the call switch 114 is pressed by a user, the image data stored in the image storage unit 122 is transmitted via the media converter 115 and the router 116, and the VOIP processing unit 133 uses the speaker unit 117 and the microphone 113 to enable a call between the supervisor and the user.
[0006]
[Patent Document 1]
JP 2000-348278 A
[0007]
[Patent Document 2]
JP 2002-344957 A
[0008]
[Patent Document 3]
Japanese Patent Laid-Open No. 7-284186
[0009]
[Patent Document 4]
JP 2002-288764 A
[0010]
[Problems to be solved by the invention]
However, in the technique related to Patent Document 4, the image information acquired by the client-side terminal is transmitted as-is to the monitoring server. Because the transmission bandwidth between client and server and the monitoring capacity on the server side are limited, it is difficult to support a large number of clients. Moreover, even if a system with a large number of clients could be built with that technique, the task of monitoring, on the server side, the entire area in which the clients are installed would grow, increasing the burden on the supervisor.
[0011]
To solve this problem, the inventors arrived at the solution described later, but then faced a further problem: unlike indoor abnormality recognition as in Patent Document 1, outdoor abnormality recognition is affected by environmental noise, which lowers recognition accuracy. More specifically, when environmental noise resembling an abnormal sound occurs, it may be misrecognized as an abnormal sound and cause a false report. Such false reports are a problem in themselves, and when they are frequent they increase the monitoring tasks on the server side and may negate the effect of the solution.
[0012]
The present invention has been proposed in view of the above circumstances. Its object is to provide a monitoring and reporting system that, when sound and images acquired by a large number of clients are transmitted to the server side, reduces the amount of information transmitted from the clients and thereby reduces the monitoring tasks on the server side, improves the monitoring response in an emergency, eliminates the need for any special operation by the person making the report, and in addition improves reporting reliability by avoiding false reports even in the presence of environmental noise.
[0025]
[Means for Solving the Problems]
To solve the above problems, another monitoring and reporting system according to the present invention is a system in which a plurality of reporting units and a monitoring server, which presents to a supervisor the sound and images generated by the reporting units, are connected via a communication line. Each reporting unit comprises: imaging means for imaging a monitoring target and generating image data; sound collecting means for collecting sound around the imaging means and generating voice data; voice recognition means for recognizing the situation of the monitoring target from the voice data generated by the sound collecting means; communication means for communicating at least image data and voice data with the monitoring server and for transmitting an abnormality occurrence signal to the monitoring server when the voice recognition means determines that an abnormality has occurred in the situation of the monitoring target; and imaging control means for controlling the imaging direction of the imaging means in accordance with a remote control signal from the monitoring server. The monitoring server comprises: communication means for communicating with the reporting units; moving direction recognition means for storing installation location data indicating the installation locations of the plurality of reporting units and, when a plurality of abnormality occurrence signals are received from the plurality of reporting units, extracting the installation locations of the reporting units at which an abnormality has occurred by referring to the installation location data and recognizing the moving direction of the monitoring target from the order of those installation locations; direction estimation means for estimating the direction of the monitoring target relative to the imaging means of the reporting unit that transmitted the abnormality occurrence signal and of the reporting units located, as seen from that reporting unit, in the moving direction recognized by the moving direction recognition means; and monitoring control means for controlling the communication means so as to transmit to each of those reporting units a remote control signal indicating the imaging direction estimated by the direction estimation means for its imaging means, and for receiving the image data generated by the imaging means of each reporting unit whose imaging direction has been controlled by the imaging control means in accordance with the remote control signal, together with the voice data generated by the sound collecting means.
[0026]
In such a monitoring and reporting system, so that the monitoring server can monitor abnormalities in the surroundings, the installation location data of each reporting unit is stored in the monitoring server in advance, the moving direction is recognized, and the imaging direction of the imaging means of the reporting units is controlled according to that moving direction.
[0027]
DETAILED DESCRIPTION OF THE INVENTION
Hereinafter, embodiments of the present invention will be described with reference to the drawings.
[0028]
The present invention is applied to, for example, a monitoring and reporting system configured as shown in FIG. 1.
[0029]
[Configuration of monitoring and reporting system]
This monitoring and reporting system is configured by connecting, to an optical transmission network 1, a plurality of client-side reporting units 2A, 2B, 2C, ... (hereinafter simply called "reporting unit 2" when referred to collectively) and a monitoring server 3 for managing the reporting units 2.
[0030]
The monitoring server 3 receives the voice data and image data transmitted from each reporting unit 2 via the optical transmission network 1, stores the received voice data in a voice storage unit, and stores the image data in an image storage unit. The monitoring server 3 also includes a plurality of monitoring monitors 11 for presenting images and sound to the supervisor, who is the user, and an abnormality determination display monitor 12 for letting the supervisor judge abnormalities. It is further provided with a sound emission mechanism that emits the audio corresponding to the images displayed on the monitoring monitors 11 and the abnormality determination display monitor 12. The monitoring server 3 thereby enables the supervisor to monitor emergencies occurring around the reporting units 2.
[0031]
Furthermore, the monitoring server 3 includes an operation input mechanism operated by the supervisor and, in response to operation of that mechanism, sends a camera remote control signal, a voice recognition setting signal, and an image recognition control signal to the reporting unit 2. Other processing performed by the monitoring server 3 is described later.
[0032]
In this monitoring and reporting system, for example around a road intersection as shown in FIG. 2, reporting units 2A to 2G are installed at intervals along the roads of an outdoor area and connected to the optical transmission network 1. As one configuration example shown in FIG. 3, each reporting unit 2 has a microphone 26, an omnidirectional camera 41, and a movable camera 42 mounted on a pole 2a installed on the road. In this example, the microphone 26 is mounted on the pole 2a at approximately the head height of a walking user, and the movable camera 42 and the omnidirectional camera 41 are mounted on the pole 2a above the user's head.
[0033]
The reporting unit 2 acquires sound and images of the surroundings of its installation location and transmits them to the monitoring server 3. As its communication function for communicating with the monitoring server 3 via the optical transmission network 1, the reporting unit 2 includes a media converter 21 and a router 22.
[0034]
When the reporting unit 2 transmits sound or images to the monitoring server 3, the router 22 creates communication data addressed to the monitoring server 3, and the media converter 21 performs the necessary signal conversions and sends the data out onto the optical transmission network 1 as an optical signal. The reporting unit 2 also receives optical signals from the monitoring server 3 via the optical transmission network 1 at the media converter 21 and the router 22, and outputs them to the camera server 23 and the audio signal processing unit 24.
[0035]
"Voice data processing by the reporting unit 2"
Next, processing related to voice data by the reporting unit 2 and its functional configuration will be described.
[0036]
As the configuration for transmitting voice data to the monitoring server 3, the reporting unit 2 includes an audio signal processing unit 24, a speaker unit 25, a plurality of microphones 26A, 26B, and 26C (hereinafter simply called "microphone 26" when referred to collectively), and a call switch 27.
[0037]
As shown in FIG. 4, which is a view of the pole 2a from above, the microphones 26A to 26C are arranged on the side surface of the pole 2a so that they do not lie on a single straight line, spaced a fixed distance d apart. When a microphone 26 picks up sound around the reporting unit 2, it generates an audio signal from that sound and outputs it to the audio signal processing unit 24.
[0038]
When the audio signals from the microphones 26A to 26C are input, the audio signal processing unit 24 converts each of them, using an amplifier and an A/D converter (not shown), into voice data, that is, digital data of a predetermined level, and sends it to the environmental sound removal unit 31.
[0039]
When voice data from the microphones 26A to 26C is received, the environmental sound removal unit 31 removes the environmental noise component from each piece of voice data, produces a plurality of pieces of voice data containing the voice of a user near the reporting unit 2, and sends them to the sound source direction estimation unit 32 and the voice recognition unit 33.
[0040]
When the plurality of pieces of voice data are received from the environmental sound removal unit 31, the sound source direction estimation unit 32 performs correlation processing on them. It adds together two different pieces of voice data with their sampling times shifted relative to each other, identifies the combination of voice data that gives the highest signal level, and takes the sound detection direction of the microphones 26 in that combination as a sound source arrival direction candidate. For example, the sound source direction estimation unit 32 adds the voice data detected by the microphone 26A to the voice data detected by the microphone 26B and calculates the resulting sound level. It performs the same calculation for the combination of the microphones 26A and 26C and the combination of the microphones 26B and 26C, identifies the combination with the highest sound level among the three, and takes it as a sound source arrival direction candidate. From the plurality of sound source arrival direction candidates, the sound source direction estimation unit 32 then estimates the direction estimation information θv, which is the direction of the monitoring target.
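The pairwise shift-and-add comparison in this paragraph can be sketched in Python as below. For every microphone pair and every candidate sample shift, the two channels are added and the mean power of the sum is taken as the "signal level"; the pair and shift with the highest level are returned. The function names, the use of mean power as the level, and the `max_lag` search range are assumptions for this example, and mapping the winning pair and shift to an actual angle is left out.

```python
from itertools import combinations


def delay_and_sum_level(a, b, lag):
    """Mean power of a[t] + b[t + lag] over the samples where both channels overlap."""
    total, count = 0.0, 0
    for t in range(len(a)):
        u = t + lag
        if 0 <= u < len(b):
            total += (a[t] + b[u]) ** 2
            count += 1
    return total / count if count else 0.0


def best_pair(channels, max_lag):
    """Return (level, (mic_a, mic_b), lag) for the strongest shift-and-add combination."""
    best = None
    for (name_a, a), (name_b, b) in combinations(sorted(channels.items()), 2):
        for lag in range(-max_lag, max_lag + 1):
            level = delay_and_sum_level(a, b, lag)
            if best is None or level > best[0]:
                best = (level, (name_a, name_b), lag)
    return best
```

With a pulse that reaches microphone 26B two samples after 26A and never reaches 26C, the sketch selects the pair (26A, 26B) at a shift of 2 samples.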
[0041]
Note that the sound source direction estimation unit 32 is not limited to estimating the direction estimation information θv using only the microphones 26 attached to the same pole 2a; it may also estimate the direction estimation information θv using audio data generated by a plurality of reporting units 2 and transmitted over the optical transmission network 1.
[0042]
In addition, the sound source direction estimation unit 32 may estimate the direction of the monitoring target by obtaining the time difference or phase difference between the sound detection times of the microphones 26. When each microphone 26 is attached to the side surface of the pole 2a, which is installed perpendicular to the ground, at a predetermined interval from the other microphones 26 in a plane horizontal to the ground, the sound source direction estimation unit 32 may obtain the arrival time difference or phase difference of the sound wave at each microphone 26 to estimate the direction of the monitoring target in the horizontal plane. Likewise, when each microphone 26 is attached to the side surface of the pole 2a at a predetermined interval from the other microphones 26 in a plane perpendicular to the ground, the sound source direction estimation unit 32 may obtain the arrival time difference or phase difference of the sound wave at each microphone 26 to estimate the direction of the monitoring target in the vertical plane. In this way, the sound source direction estimation unit 32 can accurately estimate the arrival direction of an abnormal sound from the monitoring target.
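As an illustration of the time-difference approach, the following Python sketch estimates an arrival angle for a single microphone pair. It assumes a far-field source, a speed of sound of 343 m/s, and the standard relation sin θ = c·τ/d between the arrival time difference τ and the microphone spacing d; none of these values or function names come from the description itself.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s; assumed value for air at roughly 20 degrees C


def best_lag(x, y, max_lag):
    """Lag in samples by which y trails x, chosen to maximise cross-correlation."""
    def corr(lag):
        return sum(x[i] * y[i + lag]
                   for i in range(len(x)) if 0 <= i + lag < len(y))
    return max(range(-max_lag, max_lag + 1), key=corr)


def arrival_angle(x, y, sample_rate, mic_distance):
    """Far-field arrival angle in radians, measured from the pair's broadside."""
    # Physically possible delays are bounded by the microphone spacing.
    max_lag = int(math.ceil(mic_distance / SPEED_OF_SOUND * sample_rate))
    tau = best_lag(x, y, max_lag) / sample_rate
    # Clamp to [-1, 1] so rounding or noise cannot push asin out of range.
    s = max(-1.0, min(1.0, SPEED_OF_SOUND * tau / mic_distance))
    return math.asin(s)
```

A pair mounted in a horizontal plane would give a horizontal angle and a pair mounted vertically would give an elevation angle, matching the two mounting arrangements described above.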
[0043]
When voice data is sent from the environmental sound removal unit 31, the voice recognition unit 33 extracts an acoustic feature vector from the voice data and performs speech recognition by matching the extracted feature vector against word dictionary data or sentence dictionary data that is stored in advance in the abnormal feature sound database storage unit 34 and expressed as feature vectors. Here, the word dictionary and the sentence dictionary include, for example, a feature vector representing the utterance "help" and a feature vector representing the utterance "stop", which are expected to be uttered when an abnormality occurs.
[0044]
In addition, the abnormal feature sound database storage unit 34 may store not only audio data for abnormalities in which the monitoring target is a living body, but also acoustic data representing the collision sound or destruction sound of an object for cases in which the monitoring target is an object, and the presence or absence of an abnormality of the monitoring target may be recognized using the acoustic data. Thereby, the voice recognition unit 33 can recognize a specific abnormal sound with reference to the contents of the abnormal feature sound database storage unit 34.
[0045]
Further, the abnormal feature sound database storage unit 34 may store, as a feature vector (feature amount of the voice data or acoustic data), at least one of the fundamental frequency, power spectrum, formants, cepstrum, and temporal displacement of the voice data or acoustic data.
[0046]
Furthermore, when the voice data is given as sampled time-series data s(t), the abnormal feature sound database storage unit 34 converts the voice data into a frequency function by a Fourier transform such as an FFT (Fast Fourier Transform), as shown in the following equation, and treats the frequencies corresponding to the plurality of peaks f1 to f3 of the frequency function as approximate formants, as shown in FIG.
[0047]
S(exp(−jwm)) = Σ s(t) · exp(−jwm)
Then, the abnormal feature sound database storage unit 34 stores formants as feature values of the voice data.
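The peak-picking idea can be sketched in Python as follows. This is only an illustration under stated assumptions: it uses a direct discrete Fourier transform rather than an FFT for brevity, takes the strongest local maxima of the magnitude spectrum as the peaks f1 to f3, and uses hypothetical function names. Practical formant estimation usually works on a smoothed spectral envelope, which is not attempted here.

```python
import cmath
import math


def dft_magnitudes(s):
    """Magnitude of the discrete Fourier transform for bins 0 .. N/2."""
    n = len(s)
    return [abs(sum(s[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n)))
            for k in range(n // 2 + 1)]


def spectral_peaks(s, sample_rate, count=3):
    """Frequencies (Hz) of the `count` strongest local maxima, in ascending order."""
    mags = dft_magnitudes(s)
    n = len(s)
    peaks = [k for k in range(1, len(mags) - 1)
             if mags[k] > mags[k - 1] and mags[k] > mags[k + 1]]
    peaks.sort(key=lambda k: mags[k], reverse=True)
    return sorted(k * sample_rate / n for k in peaks[:count])
```

The returned frequencies could then be stored as a three-element feature vector for the corresponding voice data.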
[0048]
Further, the abnormal feature sound database storage unit 34 may obtain a cepstrum C(n) by applying an inverse Fourier transform to the logarithm of the frequency function S(exp(−jwm)) of the above equation, as in the following equation, and store the cepstrum as a feature amount of the audio data.
[0049]
C(n) = (1/N) Σ log Y · exp(2πk/N)
Then, the voice recognition unit 33 selects, from the feature vectors stored in the abnormal feature sound database storage unit 34, the feature vector that is closest to the extracted feature vector. The voice recognition unit 33 then calculates the degree to which the sound is abnormal (hereinafter referred to as the audio abnormality similarity Av) based on the distance between the selected feature vector and the extracted feature vector, and stores the calculated audio abnormality similarity Av in the voice storage unit 35.
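The nearest-vector selection and distance-based similarity can be sketched as follows. The description states only that Av is based on the distance; the Euclidean metric and the mapping Av = 1 / (1 + distance) used here are assumptions chosen so that a perfect match gives 1 and larger distances give smaller values. The function names and dictionary layout are likewise hypothetical.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def abnormality_similarity(extracted, database):
    """Return (label, Av) for the stored vector closest to `extracted`.

    database: dict mapping a label (e.g. a dictionary word) to its
    stored feature vector.
    """
    label = min(database, key=lambda name: euclidean(extracted, database[name]))
    distance = euclidean(extracted, database[label])
    # Assumed mapping from distance to similarity in the range (0, 1].
    return label, 1.0 / (1.0 + distance)
```

For example, with stored vectors for "help" and "stop", an input identical to the "help" vector yields a similarity of 1.0 under this mapping.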
[0050]
The data stored in the abnormal feature sound database storage unit 34 is not limited to a human voice, and may be a feature vector of a car collision sound, for example. As a result, the audio abnormality similarity Av can be generated not only for a human voice but also for an abnormal sound such as the collision or destruction of an object.
[0051]
In addition, the audio signal processing unit 24 includes an environmental feature sound database storage unit 36 in which feature vectors of specific environmental sounds are stored according to the location where the reporting unit 2 is installed. The environmental feature sound database storage unit 36 stores feature vectors of environmental sounds that are not abnormal sounds, such as railroad crossing sounds and pedestrian crossing warning sounds.
[0052]
The feature vector of an environmental sound is sent from the audio signal processing unit 24 to the monitoring server 3 via the router 22 and the media converter 21 and, when the monitoring server 3 returns a voice recognition setting signal indicating that the sound has been determined not to be an abnormal sound, is registered in the environmental feature sound database storage unit 36. The monitoring server 3 may determine automatically whether the environmental sound from the reporting unit 2 is an abnormal sound, using an environmental feature sound database prepared in advance, or the determination may be made manually.
[0053]
The feature vector of the environmental sound stored in the environmental feature sound database storage unit 36 is read by the speech recognition unit 33 when the speech recognition unit 33 performs abnormal sound recognition, and is compared with the feature vector being evaluated for abnormality. When the feature vector being evaluated is close to the feature vector of an environmental sound, the speech recognition unit 33 does not determine that the sound is abnormal. Thereby, the reporting unit 2 can improve the reliability of abnormality detection without erroneously recognizing environmental sound as abnormal sound.
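A minimal Python sketch of this environmental-sound check is given below. The two radii that define "close" are hypothetical parameters, as are the function names and the use of Euclidean distance; the description does not state how closeness is measured.

```python
import math


def distance(u, v):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def nearest_distance(feature, vectors):
    """Distance to the closest stored vector, or infinity if none are stored."""
    return min((distance(feature, v) for v in vectors), default=math.inf)


def is_abnormal_sound(feature, abnormal_vectors, environmental_vectors,
                      abnormal_radius=1.0, environmental_radius=0.5):
    """Reject sounds that resemble a registered environmental sound first."""
    if nearest_distance(feature, environmental_vectors) <= environmental_radius:
        return False  # e.g. a registered railroad crossing sound
    return nearest_distance(feature, abnormal_vectors) <= abnormal_radius
```

Checking the environmental database before the abnormal database means that a sound resembling both is treated as not abnormal, which is the behaviour the paragraph above describes.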
[0054]
The sound storage unit 35 stores sound data previously detected by the microphone 26 and the audio abnormality similarity Av corresponding to the sound data. The sound data and the audio abnormality similarity Av stored in the sound storage unit 35 can be downloaded by the monitoring server 3 via the optical transmission network 1 when the supervisor operates the monitoring server 3.
[0055]
The audio abnormality similarity Av includes values set by a voice recognition setting signal from the monitoring server 3. That is, when the monitoring server 3 refers to the voice data stored in the voice storage unit 35 of each reporting unit 2 and determines that the voice data corresponds to an abnormal sound, the monitoring server 3 causes the voice recognition unit 33 to convert the voice data into a feature vector, and the feature vector is added to the abnormal feature sound database storage unit 34 as an abnormal sound. In this way, because feature vectors are added by the monitoring server 3 to the abnormal feature sound database storage unit 34 or the environmental feature sound database storage unit 36, the reporting unit 2 can improve the reliability of abnormality detection compared with the time of initial installation.
[0056]
The audio signal processing unit 24 includes a VOIP (Voice over IP (Internet Protocol)) processing unit 37 to which audio data is output from the environmental sound removal unit 31. The VOIP processing unit 37 sends the audio data from the environmental sound removal unit 31 to the monitoring server 3 when the call switch 27 is operated.
[0057]
"Image data processing by report unit 2"
Next, processing related to image data by the reporting unit 2 and its functional configuration will be described.
[0058]
The notification unit 2 includes a camera server 23, an omnidirectional camera 41, a movable camera 42, and an input sensor 43 as a configuration for transmitting an image signal to the monitoring server 3.
[0059]
The omnidirectional camera 41 collects light with a lens having a wide viewing angle, and generates an image signal with an internal CCD (Charge Coupled Device) image sensor. The movable camera 42 has a pan function, a tilt function, and a zoom function, and generates an image signal by a CCD image sensor. The omnidirectional camera 41 and the movable camera 42 are connected to the camera server 23 and output image signals to the camera server 23.
[0060]
In the camera server 23, when an image signal obtained by capturing the surrounding situation of the notification unit 2 is input from the omnidirectional camera 41, the image signal is A / D converted and sent to the environmental image noise removing unit 51 as image data. When the environmental image noise removing unit 51 receives image data, the environmental image noise removing unit 51 removes noise from the image data and sends it to the image recognition unit 52.
[0061]
The image recognition unit 52 has a functional configuration as shown in FIG. 6 and performs image recognition processing on image data from the omnidirectional camera 41 and the movable camera 42.
[0062]
In the image recognition unit 52, when image data is input, the object image extraction unit 61 separates the background image from the other object images by performing image processing such as a background subtraction method. An object image is image data showing, for example, a person or a car. In the image recognition unit 52, for example, when image data of a still image is input, the feature amount extraction unit 62 recognizes the in-image position information, size information, color information, and the like of the object image, and converts each piece of information into a feature amount for the object. In addition, when a moving image spanning a plurality of frames is input, the feature amount extraction unit 62 converts the motion speed information of the object image into an image feature amount.
[0063]
In the image recognition unit 52, the movement direction estimation unit 63 obtains, from the position information of the object image, direction estimation information that is a candidate for the direction θi of the monitoring target (hereinafter referred to as an object) corresponding to the object image, and sends it to the integrated direction detection unit 53.
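The chain from background subtraction to a direction candidate can be sketched in Python on small grayscale grids. Everything specific here is an assumption for illustration: the difference threshold, the use of the centroid as position information, and the linear mapping from image column to angle across a given horizontal field of view (which would not hold for an omnidirectional lens without further correction).

```python
def extract_object(frame, background, threshold=30):
    """Pixel coordinates (row, col) that differ from the background by more than threshold."""
    return [(r, c)
            for r, row in enumerate(frame)
            for c, value in enumerate(row)
            if abs(value - background[r][c]) > threshold]


def object_features(pixels):
    """Centroid (row, col) and size in pixels of the extracted object, or None."""
    if not pixels:
        return None
    n = len(pixels)
    centroid = (sum(r for r, _ in pixels) / n, sum(c for _, c in pixels) / n)
    return {"centroid": centroid, "size": n}


def direction_from_column(col, width, horizontal_fov_deg):
    """Angle in degrees relative to the optical axis, assuming a linear mapping."""
    return ((col + 0.5) / width - 0.5) * horizontal_fov_deg
```

The angle returned for the centroid column plays the role of the direction estimation information θi in this sketch.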
[0064]
In addition, the camera server 23 includes a feature image database storage unit 54 that accumulates and stores feature amounts of objects that are assumed to be abnormal. When the feature amount of the object is obtained, the image recognition unit 52 performs matching processing against the feature amounts of objects stored in the feature image database storage unit 54, and calculates the distance between feature vectors using the obtained feature amount of the object and the accumulated feature amounts of objects. In the image recognition unit 52, the abnormal image similarity calculation unit 64 obtains an image abnormality similarity Ai indicating the degree of abnormality based on the distance between the feature vectors.
[0065]
Further, the image recognition unit 52 stores the image data from the environmental image noise removal unit 51 in the image storage unit 55.
[0066]
Further, the camera server 23 includes, as an input sensor 43, a plurality of human body detection sensors that detect human bodies in different directions using, for example, infrared rays. When a person object exists around the reporting unit 2, the input sensor 43 identifies which of the plurality of human body detection sensors has detected the human body. The input sensor 43 then estimates the direction in which the human body object exists from the human body detection direction of the identified human body detection sensor, and sends it to the integrated direction detection unit 53 as direction estimation information.
[0067]
In addition, as shown in FIG. 7, which shows the reporting unit 2 viewed from above, the input sensor 43 may comprise a plurality of distance sensors 44A to 44D whose detection ranges are set to different parts of the imaging region of the omnidirectional camera 41 or the movable camera 42. When a monitoring target such as a human body is detected, the distance sensors 44A to 44D send direction estimation information including distance information to the integrated direction detection unit 53. Each distance sensor 44 may be an ultrasonic sensor or an optical sensor.
[0068]
Further, the input sensor 43 may be composed of a plurality of infrared sensors that have different detection ranges within the imaging area of the omnidirectional camera 41 or the movable camera 42. In this case, each infrared sensor uses the infrared light reflected from the monitoring target to send direction estimation information indicating the area where the monitoring target exists to the integrated direction detection unit 53.
[0069]
The integrated direction detection unit 53 determines the direction with respect to the reporting unit 2 where the object exists from each direction estimation information from the sound source direction estimation unit 32, the image recognition unit 52, and the input sensor 43, and sends it to the camera control unit 56.
[0070]
The camera control unit 56 sets the pan and tilt amount Δθ and the zoom amount ΔZ of the movable camera 42 from the object direction determined by the integrated direction detection unit 53. At this time, the camera control unit 56 obtains the direction estimation angle θ of the movable camera 42, which is used for setting the pan and tilt amount Δθ, from the audio abnormality similarity Av, the image abnormality similarity Ai, the direction estimation information θv obtained from audio, and the direction estimation information θi obtained from images, by performing the following calculation:
θ = (Av × θv + Ai × θi) / (Av + Ai)
That is, the camera control unit 56 uses the audio abnormality similarity Av and the image abnormality similarity Ai as the weighting coefficients of the direction estimation information θv and the direction estimation information θi. Then, the camera control unit 56 determines the pan and tilt amount Δθ from the current imaging direction of the movable camera 42 and the obtained direction estimation angle θ.
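The weighted combination of the two direction estimates can be written as a short Python sketch. The function names are hypothetical, angles are assumed to be in degrees on a common reference, and wrap-around at 360 degrees is not handled. The behaviour when both similarities are zero is not specified in the description, so the sketch simply raises an error in that case.

```python
def fuse_direction(av, theta_v, ai, theta_i):
    """Direction estimation angle: theta = (Av*theta_v + Ai*theta_i) / (Av + Ai)."""
    total = av + ai
    if total == 0:
        # Assumption: with no evidence from either modality there is no estimate.
        raise ValueError("both similarities are zero; direction is undefined")
    return (av * theta_v + ai * theta_i) / total


def pan_tilt_amount(current_direction, av, theta_v, ai, theta_i):
    """Pan and tilt amount: fused direction minus the current imaging direction."""
    return fuse_direction(av, theta_v, ai, theta_i) - current_direction
```

With Av = 0.8, θv = 30, Ai = 0.2, and θi = 50, the fused angle is 34 degrees, closer to the audio estimate because it carries the larger weight.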
[0071]
In addition, when the difference between the direction estimation information θv and the direction estimation information θi is small, the camera control unit 56 regards the probability that the object exists in the direction indicated by the direction estimation information θi and the direction estimation information θv as high, and sets the zoom setting value Z, which is used for setting the zoom amount ΔZ, to a large value. When the difference between the direction estimation information θv and the direction estimation information θi is large, the probability that the object exists in the direction indicated by the direction estimation information θi and the direction estimation information θv is low, and therefore the zoom setting value Z is set to a small value.
[0072]
At this time, the camera control unit 56 obtains the zoom setting value Z from, for example, the direction estimation information θv and the direction estimation information θi by performing the following calculation:
Z = α / (θv − θi)
α: Constant
Then, the camera control unit 56 determines the zoom amount ΔZ from the current zoom setting value of the movable camera 42 and the obtained zoom setting value Z.
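The zoom calculation can be sketched in Python as follows. Two additions are assumptions that go beyond the formula as written: the absolute value of the difference is used so that Z does not become negative, and the result is clamped to a hypothetical range [z_min, z_max] so that identical estimates do not cause a division by zero.

```python
def zoom_setting(alpha, theta_v, theta_i, z_min=1.0, z_max=8.0):
    """Zoom setting value Z = alpha / |theta_v - theta_i|, clamped to [z_min, z_max]."""
    difference = abs(theta_v - theta_i)
    if difference == 0:
        return z_max  # estimates agree exactly: zoom in as far as allowed
    return max(z_min, min(z_max, alpha / difference))


def zoom_amount(current_zoom, alpha, theta_v, theta_i):
    """Zoom amount: target zoom setting minus the current zoom setting."""
    return zoom_setting(alpha, theta_v, theta_i) - current_zoom
```

A small disagreement between the audio and image estimates therefore yields a large zoom value, and a large disagreement keeps the camera wide.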
[0073]
Then, the camera control unit 56 drives the movable camera 42 by the pan and tilt amount Δθ and the zoom amount ΔZ to cause the movable camera 42 to image the object, and the image data captured by the movable camera 42 is sent to the image recognition unit 52.
[0074]
"Abnormality judgment processing"
Next, processing related to abnormality determination by the reporting unit 2 and the monitoring server 3 and its functional configuration will be described.
[0075]
The image recognition unit 52 calculates the degree of abnormality A by integrating the above-described audio abnormality similarity Av and image abnormality similarity Ai when making an abnormality determination. At this time, the image recognition unit 52 performs, for example, the following calculation:
A = α × Ai + β × Av
α, β: Constants
In this way, the image recognition unit 52 calculates the degree of abnormality A for the reporting unit 2 and sends it to the monitoring server 3 via the optical transmission network 1. Alternatively, the image recognition unit 52 may calculate the degree of abnormality A and, when the degree of abnormality A becomes higher than a preset threshold value, transmit the image captured by the movable camera 42 and the sound to the monitoring server 3.
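The integration of the two similarities and the threshold test can be sketched in Python. The default weights of 0.5 and the function names are assumptions; the description gives α and β only as constants.

```python
def degree_of_abnormality(ai, av, alpha=0.5, beta=0.5):
    """Degree of abnormality A = alpha * Ai + beta * Av."""
    return alpha * ai + beta * av


def should_transmit(ai, av, threshold, alpha=0.5, beta=0.5):
    """True when A is higher than the preset threshold, so image and sound are sent."""
    return degree_of_abnormality(ai, av, alpha, beta) > threshold
```

Because transmission happens only when this test passes, a reporting unit in a quiet area sends nothing beyond, at most, the scalar value A.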
[0076]
In response to this, the monitoring server 3 displays the plurality of images transmitted from the reporting units 2 on the monitoring monitor 11 or the abnormality determination display monitor 12, so that the supervisor monitors the images and sound around each reporting unit 2 whose degree of abnormality A has become equal to or greater than the threshold.
[0077]
Moreover, as shown in FIG. 8, in its own monitoring area 71, which contains a plurality of reporting units 2 connected via the optical transmission network 1, the monitoring server 3 is assumed to be monitoring image data captured by any of the reporting units 2-2, 2-3, and 2-4 while the degree of abnormality A of the reporting unit 2-1 is low. When the degree of abnormality A of the reporting unit 2-1 exceeds the threshold value, information to that effect is transmitted from the reporting unit 2-1 to the monitoring server 3.
[0078]
In response to this, the monitoring server 3 determines that an abnormality has occurred in the vicinity of the reporting unit 2-1, and transmits, to the reporting unit 2-11, the reporting unit 2-12, and the reporting unit 2-13 located in the vicinity of the reporting unit 2-1, a control signal instructing each of them to transmit the image data and audio data it captures. In addition, the monitoring server 3 transmits a camera remote control signal that sets the direction of the reporting unit 2-1 as the imaging direction to the reporting unit 2-11, the reporting unit 2-12, and the reporting unit 2-13. The monitoring server 3 also transmits a control signal enabling voice communication by the VOIP processing unit 37 to the reporting unit 2-1 whose degree of abnormality A is equal to or greater than the threshold, which makes possible a call between the user who has operated the call switch 27 of the reporting unit 2-1 and the supervisor on the monitoring server 3 side. Thereby, in the reporting unit 2-1, when the call switch 27 is operated, the voice data detected by the microphone 26 is transmitted to the monitoring server 3.
[0079]
As a result, the reporting units 2-11, 2-12, and 2-13 image the direction specified by the camera remote control signal with the movable camera 42, transmit the image data and the audio data to the monitoring server 3, and enter an active state that enables monitoring by the supervisor. In the monitoring server 3, when image data and audio data are sent from the reporting unit 2-1, the reporting unit 2-11, the reporting unit 2-12, and the reporting unit 2-13, the supervisor checks the images and sound on the monitoring monitor 11 and the abnormality determination display monitor 12, so that monitoring is performed with a large amount of information.
[0080]
In this monitoring and reporting system, when the degree of abnormality A becomes high, the image data and audio data acquired not only by the reporting unit 2-1 but also by the reporting unit 2-11, the reporting unit 2-12, and the reporting unit 2-13 are stored, which makes it possible to analyze information on the area where the abnormality has occurred from various angles.
[0081]
In such a monitoring and reporting system, only the reporting units 2 whose degree of abnormality A exceeds the threshold are monitored by the monitoring server 3, so it is not necessary to monitor all the reporting units 2 on the monitoring server 3 side. Therefore, according to this monitoring and reporting system, monitoring tasks on the monitoring server 3 side can be reduced.
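The server-side selection of which reporting units to activate can be sketched in Python. The data layout (a dictionary of reported degrees of abnormality and a dictionary of neighbouring units) and the function name are assumptions; the comparison uses "equal to or greater than the threshold", one of the two wordings that appear in the description.

```python
def units_to_monitor(abnormality, neighbours, threshold):
    """Return the set of reporting units that should stream image and audio data.

    abnormality: dict mapping a unit id to its degree of abnormality A
    neighbours:  dict mapping a unit id to a list of nearby unit ids
    """
    active = set()
    for unit, degree in abnormality.items():
        if degree >= threshold:
            active.add(unit)                          # the unit that detected the abnormality
            active.update(neighbours.get(unit, []))   # plus the units around it
    return active
```

All units outside the returned set stay silent, which is what keeps the transmission volume and the supervisor's workload low as the number of units grows.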
[0082]
Further, in this monitoring notification system, image data obtained by capturing with the omnidirectional camera 41 and the movable camera 42 is stored in the image storage unit 55, and in response to a download request from the monitoring server 3 to the camera server 23. Since image data can be transmitted, a past image can be downloaded and analyzed by a supervisor.
[0083]
"Abnormality monitoring process"
Next, in the monitoring notification system configured as described above, a processing procedure of the abnormality monitoring processing that can be realized in addition to the processing described above will be described with reference to FIGS.
[0084]
In the abnormality monitoring process of the reporting unit 2 shown in FIG. 9, first, the movable mechanism (not shown) is controlled by the camera control unit 56 so that the angle of the movable camera 42 is set to the initial position at the start of the process (step S1), and the zoom function of the movable camera 42 is controlled by the camera control unit 56 so as to zoom out to the maximum extent (step S2). As a result, the reporting unit 2 is in a state of generating wide-angle image data using the image signal captured by the movable camera 42.
[0085]
In such a state, the reporting unit 2 uses the image data captured by the movable camera 42 to obtain at least the image abnormality similarity Ai by the image recognition unit 52, and calculates the degree of abnormality A. When the reporting unit 2 determines that the degree of abnormality A is lower than the predetermined threshold (step S3), the processing of step S1 and step S2 is repeated. When it determines that the degree of abnormality A is higher than the predetermined threshold (step S3), the image recognition unit 52 extracts the moving object to be monitored from the wide-angle image data generated by imaging with the zoom function set in step S2 (step S4). Then, the image recognition unit 52 creates direction estimation information θi from the position of the extracted moving object and sends it to the integrated direction detection unit 53.
[0086]
Next, in the reporting unit 2, the integrated direction detection unit 53 estimates the imaging direction of the movable camera 42 using at least the direction estimation information θi, controls the movable camera 42 so that the moving object is at the center of the image data (step S5), and controls the movable camera 42 to zoom in (step S6). As a result, the reporting unit 2 can generate image data containing the moving object at the center of the image and store it in the image storage unit 55 or transmit it to the monitoring server 3.
[0087]
Next, in the abnormality monitoring process of the reporting unit 2 shown in FIG. 10, the processes of steps S1 to S3 are performed in the same manner as in the abnormality monitoring process shown in FIG. 9. When it is determined in step S3 that the degree of abnormality A is higher than the predetermined threshold value, the image recognition unit 52 extracts a moving object using the omnidirectional image data generated by imaging with the omnidirectional camera 41 (step S11). Then, the image recognition unit 52 creates direction estimation information θi from the position at which the moving object was extracted and sends it to the integrated direction detection unit 53.
[0088]
In response to this, in the reporting unit 2, the integrated direction detection unit 53 calculates the position of the moving object using at least the direction estimation information θi (step S12), the camera control unit 56 controls the imaging direction (angle) of the movable camera 42 based on the calculated position (step S13), and the movable camera 42 is controlled to zoom in (step S6). As a result, the reporting unit 2 can generate image data containing the moving object and store it in the image storage unit 55 or transmit it to the monitoring server 3.
[0089]
Next, in the abnormality monitoring process of the reporting unit 2 shown in FIG. 11, first, in step S21, the voice recognition unit 33 determines, for the plurality of microphones 26A to 26C, the delay time corresponding to a direction of the voice from the monitoring target. Then, in step S22, the voice recognition unit 33 shifts the voice data (channel signals) detected and generated by the microphone 26A, the microphone 26B, and the microphone 26C by the delay time determined in step S21 and adds them to obtain the delay sum S(λ). At this time, the speech recognition unit 33 obtains a delay sum for the combination of the microphone 26A and the microphone 26B, the combination of the microphone 26A and the microphone 26C, and the combination of the microphone 26B and the microphone 26C.
[0090]
As a result, when it is determined in step S23 that the delay sum S(λ) has been calculated for all the combinations, that is, for all angles, the speech recognition unit 33 sends the direction giving the delay sum S(λ) with the largest value to the integrated direction detection unit 53 as the direction estimation information θv (step S24). The voice recognition unit 33 may instead obtain the direction estimation information θv from the time difference or phase difference of the voice detection times.
[0091]
Next, the integrated direction detection unit 53 obtains the estimated direction to be imaged by the movable camera 42 from at least the direction estimation information θv, and the camera control unit 56 controls the imaging direction of the movable camera 42.
[0092]
Next, in the abnormality monitoring process of the monitoring server 3 shown in FIG. 12, the monitoring server 3 receives the voice data SA (t), which is time-series data from the reporting unit 2A (step S31), and receives from the reporting unit 2B. The voice data SB (t) that is time series data is received (step S32), and the voice data SC (t) that is time series data from the reporting unit 2C is received (step S33).
[0093]
Next, the monitoring server 3 calculates the correlation of the voice data between the reporting units 2 using the voice data SA(t), the voice data SB(t), and the voice data SC(t) (step S34). Then, the monitoring server 3 estimates the arrival direction of the sound emitted by the monitoring target from the correlation calculation result (step S35), and creates and transmits, to the reporting unit 2A, the reporting unit 2B, and the reporting unit 2C, camera remote control signals that set the arrival direction as the imaging direction. Thereby, the reporting unit 2A, the reporting unit 2B, and the reporting unit 2C can control the movable camera 42 so that the arrival direction estimated in step S35 is the imaging direction.
[0094]
According to the monitoring and reporting system that performs such abnormality monitoring processing, a plurality of reporting units 2 with different installation locations are directed toward a single sound arrival location, so that image data of the abnormal situation can be obtained from a plurality of angles and the abnormal situation can be analyzed from various angles.
[0095]
Next, in the abnormality monitoring process of the monitoring server 3 shown in FIG. 13, first, the monitoring server 3 receives the voice data SA(λ), SB(λ), and SC(λ) acquired by the reporting unit 2A, whose image and sound are being presented to the supervisor on the monitoring monitor 11, and by the reporting unit 2B and the reporting unit 2C installed in the vicinity of the reporting unit 2A (step S41).
[0096]
In the monitoring server 3, when the degree of abnormality A of the reporting unit 2A exceeds a predetermined threshold and an abnormality occurrence signal is received from the reporting unit 2A (step S42), the reporting unit 2B and the reporting unit 2C around the reporting unit 2A are set as monitoring targets (step S43), and the correlation between the audio data SA(λ), SB(λ), and SC(λ) received in step S41 is calculated (step S44). Then, the monitoring server 3 estimates the arrival direction of the sound emitted by the monitoring target from the correlation calculation result, and creates and transmits, to the reporting unit 2B and the reporting unit 2C, a camera remote control signal that sets the arrival direction as the imaging direction. Thereby, the reporting unit 2B and the reporting unit 2C can control the movable camera 42 so that the arrival direction estimated in step S35 is the imaging direction.
[0097]
According to the monitoring and reporting system that performs such an abnormality monitoring process, the imaging direction of the reporting units 2 around the reporting unit 2 in which the abnormality has occurred can be set to the sound arrival direction, so that the surrounding situation can be grasped even if the place where the abnormality occurs moves.
[0098]
Next, in the abnormality monitoring process of the monitoring server 3 shown in FIG. 14, installation location data indicating the installation locations of the plurality of reporting units 2 is stored in the monitoring server 3 in advance, and the reporting unit 2A having the highest degree of abnormality A within the monitoring area 71 (the installation area of the reporting units 2) is extracted (step S51). In this example, when the degree of abnormality A exceeds a predetermined value, information indicating the degree of abnormality A is automatically transmitted from the reporting unit 2 to the monitoring server 3.
[0099]
Next, the monitoring server 3 determines whether or not the reporting unit 2 having the highest degree of abnormality A has changed from the reporting unit 2A to the reporting unit 2B (step S52). Then, the monitoring server 3 extracts the installation locations of the reporting unit 2A and the reporting unit 2B from the installation location data, and arranges the installation location data in the order in which the abnormality occurred to recognize the moving direction of the monitoring target. Thereby, the monitoring server 3 sets the extension of the moving direction of the monitoring target (from the reporting unit 2A toward the reporting unit 2B) as the imaging direction of the movable camera 42 (step S53).
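The recognition of the moving direction from two installation locations can be sketched in Python. The planar (x, y) coordinates, the convention of measuring angles counterclockwise from the positive x axis, the lookahead factor used to extrapolate beyond the second unit, and the function names are all assumptions made for illustration.

```python
import math


def moving_direction_deg(previous_location, current_location):
    """Heading in degrees (0 to 360) from the earlier unit to the later one."""
    dx = current_location[0] - previous_location[0]
    dy = current_location[1] - previous_location[1]
    return math.degrees(math.atan2(dy, dx)) % 360.0


def camera_bearing_deg(camera_location, previous_location, current_location,
                       lookahead=1.0):
    """Bearing from a camera to a point extrapolated along the moving direction."""
    # Extend the path beyond the current location by `lookahead` times the last step.
    tx = current_location[0] + lookahead * (current_location[0] - previous_location[0])
    ty = current_location[1] + lookahead * (current_location[1] - previous_location[1])
    return math.degrees(math.atan2(ty - camera_location[1],
                                   tx - camera_location[0])) % 360.0
```

Here the first argument pair corresponds to the installation locations of the reporting unit 2A and the reporting unit 2B, taken in the order in which each had the highest degree of abnormality.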
[0100]
According to the monitoring and reporting system that performs such abnormality monitoring processing, the imaging direction of the reporting unit 2 can be the moving direction of the monitoring target in which an abnormality has occurred, so that the monitoring target moves and the location of occurrence of the abnormality moves. Even if it is a case, the surrounding situation can be grasped.
[0101]
[Effect of the embodiment]
As described in detail above, according to the monitoring and reporting system to which the present invention is applied, when a user in the vicinity of the reporting unit 2 encounters a dangerous state, the monitoring server 3 is appropriately notified, and by responding quickly, a safer environment can be provided.
[0102]
In addition, according to this monitoring and reporting system, when a large number of reporting units 2 are installed and the audio and images acquired by these large-scale clients are to be transmitted to the monitoring server 3, images and sound are transmitted only from each reporting unit 2 whose degree of abnormality A is equal to or greater than the threshold and from the reporting units 2 around it. This reduces the amount of information transmitted by each reporting unit 2 and therefore the monitoring task of the monitoring server 3, improves monitoring responsiveness both in emergencies and in normal times, and eliminates the need for any special operation by the person making the report.
[0103]
Furthermore, according to this monitoring and reporting system, when a crime or an accident occurs in the area where the reporting unit 2 is installed, a user or victim in the vicinity of the reporting unit 2 operates the reporting button of the reporting unit 2. Therefore, it is not necessary to manually switch the camera mechanism of the reporting unit 2 on the monitoring server 3 side and control the angle and zoom of the camera mechanism.
[0104]
Furthermore, according to this monitoring and reporting system, in addition to reporting by the user operating the call switch 27 and manual control of the movable camera 42 by the monitoring person at the monitoring server 3, it is possible to make a report to the police or a security company. For example, when a user who is walking is attacked by a criminal such as a purse snatcher and screams, the reporting unit 2 detects that an abnormal state has occurred, and the voice and image of the user or the criminal can be reported.
[0105]
The above-described embodiment is an example of the present invention. The present invention is therefore not limited to this embodiment, and various modifications other than this embodiment can of course be made according to the design and the like, as long as they do not depart from the technical idea of the present invention.
[0106]
[Effect of the invention]
According to the present invention, in a large-scale system that acquires images and sounds at a large number of locations, the amount of information transmitted when sending sounds and images to the monitoring server is reduced. This reduces the monitoring task on the server side, improves monitoring responsiveness both in emergencies and in normal operation, and eliminates the need for any special operation by the person making the report.
[Brief description of the drawings]
FIG. 1 is a block diagram showing the functional configuration of a monitoring and reporting system to which the present invention is applied.
FIG. 2 is a diagram for explaining an arrangement example of reporting units in the monitoring and reporting system to which the present invention is applied.
FIG. 3 is a side view of a reporting unit of the monitoring and reporting system to which the present invention is applied.
FIG. 4 is a top view showing an installation example of the microphones of a reporting unit in the monitoring and reporting system to which the present invention is applied.
FIG. 5 is a diagram for explaining a formant as a feature amount of the voice data stored in the abnormal feature sound database storage unit.
FIG. 6 is a block diagram illustrating a functional configuration of an image recognition unit.
FIG. 7 is an explanatory diagram when creating direction estimation information when a distance sensor is used as an input sensor.
FIG. 8(A) is a diagram for explaining the active reporting units monitored by the monitoring server when the degree of abnormality is low in a case where a large number of reporting units are installed in the monitoring area, and FIG. 8(B) is a diagram for explaining the active reporting units monitored by the monitoring server when the degree of abnormality is high.
FIG. 9 is a flowchart showing an example of abnormality monitoring processing by a reporting unit.
FIG. 10 is a flowchart showing another example of the abnormality monitoring process by the reporting unit.
FIG. 11 is a flowchart showing still another example of abnormality monitoring processing by a reporting unit.
FIG. 12 is a flowchart illustrating an example of an abnormality monitoring process performed by a monitoring server.
FIG. 13 is a flowchart illustrating another example of the abnormality monitoring process performed by the monitoring server.
FIG. 14 is a flowchart showing still another example of abnormality monitoring processing by the monitoring server.
FIG. 15 is a block diagram illustrating a specific configuration example of a conventional monitoring and reporting system.
[Explanation of symbols]
1 Optical transmission network
2 Reporting unit
3 Monitoring server
11 Monitor for monitoring
12 Abnormality judgment display monitor
21 Media converter
22 Router
23 Camera server
24 Audio signal processor
25 Speaker section
26 Microphone
27 Call switch
31 Environmental sound removal unit
32 Sound source direction estimation unit
33 Voice recognition unit
34 Abnormal feature sound database storage unit
35 Voice memory
36 Environmental characteristic sound database storage unit
37 VOIP processing section
41 Omnidirectional camera
42 Movable camera
43 Input sensor
51 Environmental image noise removal unit
52 Image recognition unit
53 Integrated direction detector
54 Feature image database storage unit
55 Image storage
56 Camera control unit
61 Object image extraction unit
62 Feature extraction unit
63 Movement direction estimation unit
64 Abnormal image similarity calculator
71 Monitoring area

Claims

A monitoring and reporting system in which a plurality of reporting units and a monitoring server that presents to a monitoring person the sound and images generated by the reporting units are connected via a communication line, wherein
each reporting unit comprises:
imaging means for imaging a monitoring target and generating image data;
sound collecting means for collecting sound around the imaging means and generating audio data;
sound recognition means for recognizing the status of the monitoring target from the audio data generated by the sound collecting means;
communication means for communicating at least the image data and the audio data with the monitoring server and, when the sound recognition means determines that an abnormality has occurred in the status of the monitoring target, transmitting an abnormality occurrence signal to the monitoring server; and
imaging control means for controlling the imaging direction of the imaging means according to a remote control signal from the monitoring server, and
the monitoring server comprises:
communication means for communicating with the reporting units;
moving direction recognition means that stores installation location data indicating the installation locations of the plurality of reporting units and, when abnormality occurrence signals are received from a plurality of reporting units, refers to the installation location data to extract the installation locations of the reporting units in which an abnormality has occurred, and recognizes the moving direction of the monitoring target from the order of the installation locations of the reporting units in which an abnormality has occurred;
direction estimation means for estimating the direction of the monitoring target relative to the imaging means of the reporting unit that transmitted the abnormality occurrence signal and of each reporting unit located, as seen from that reporting unit, in the moving direction recognized by the moving direction recognition means; and
monitoring control means for controlling the communication means so as to transmit to each reporting unit a remote control signal indicating the imaging direction of the imaging means of that reporting unit as estimated by the direction estimation means, and for receiving the image data generated by the imaging means of each reporting unit whose imaging direction has been controlled by the imaging control means according to the remote control signal, together with the audio data generated by the sound collecting means.