JP5627962B2

JP5627962B2 - Anomaly detection device

Info

Publication number: JP5627962B2
Application number: JP2010200586A
Authority: JP
Inventors: 和義福士
Original assignee: Secom Co Ltd
Current assignee: Secom Co Ltd
Priority date: 2010-09-08
Filing date: 2010-09-08
Publication date: 2014-11-19
Anticipated expiration: 2030-09-08
Also published as: JP2012058944A

Description

The present invention relates to an acoustic abnormality detection device that detects an abnormal state in a monitoring area on the basis of acoustic information.

Conventionally, acoustic abnormality detection devices have been proposed that detect, from an acoustic signal acquired by a sound collection unit, abnormal sounds having abnormal waveforms, such as a robber's threatening voice, a gunshot, or the scream of a customer, and report them to a security company or the like (Patent Document 1). To reduce false detections caused by ambient noise, an acoustic abnormality detection device has also been proposed that detects the presence of a scream amid ambient noise by spectrally analyzing the acoustic signal (Patent Document 2).

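Patent Document 1: Japanese Examined Patent Publication No. H5-6718 (JP 5-6718)
Patent Document 2: Japanese Unexamined Patent Application Publication No. H9-251583 (JP 9-251583 A)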

However, because the conventional acoustic abnormality detection devices described above detect an abnormal state in the monitoring area from the acoustic signal alone, they erroneously detect acoustic signals that resemble abnormal sounds, such as nearby announcements or television audio. Lowering the detection sensitivity to reduce such false detections can instead cause missed detections, so tuning the detection sensitivity was also difficult.

Meanwhile, having analyzed many robbery cases, the inventor noticed that when a robbery occurs, at least two persons are present in the vicinity at the moment an abnormal sound such as a scream or a gunshot is produced. In other words, a scream-like sound such as that of a person wailing alone for some reason is not a sound arising from a robbery, which involves at least two persons: a robber and a victim.

Accordingly, the present invention focuses on such abnormal sounds and on the number of persons present in the vicinity, and aims to detect an abnormal state in the monitoring area without frequent false detections, even in an environment where various sounds resembling abnormal sounds exist.

To solve this problem, the present invention provides an acoustic abnormality detection device comprising: a sound collection unit that acquires an acoustic signal arising in a monitoring area where valuables are stored; an acoustic analysis unit that analyzes the acoustic signal collected by the sound collection unit and detects a predetermined abnormal sound; an imaging unit that sequentially acquires images of an area including the monitoring area; an image analysis unit that extracts person images from the images and calculates the number of person images; an abnormality determination unit that determines an abnormal state in the monitoring area on the basis of the abnormal sound detection result of the acoustic analysis unit and the number of person images calculated by the image analysis unit; and an output unit that outputs the determination result of the abnormality determination unit, wherein the abnormality determination unit determines that an abnormal state exists if a plurality of person images are detected by the image analysis unit when the acoustic analysis unit has determined that the predetermined abnormal sound is present.

With this configuration, the acoustic analysis unit of the present invention determines whether an abnormal sound such as a scream has occurred in the monitoring area by analyzing the acoustic signal collected by the sound collection unit. The image analysis unit analyzes the images, acquired by the imaging unit, of an area including at least the monitoring area, extracts person images, and calculates their number. When the acoustic analysis unit determines that an abnormal sound such as a scream has occurred in the monitoring area, the abnormality determination unit determines that an abnormal state exists if the image analysis unit has detected a plurality of person images. The output unit then outputs the determination result.
As a result, for example, when an employee is threatened by a robber trying to take valuables stored in a safe or the like, the acoustic analysis unit detects the scream uttered by the employee and the image analysis unit detects the person images of both the robber and the employee, so the abnormality determination unit can determine that the monitoring area is in an abnormal state. Conversely, even if the acoustic analysis unit detects a scream coming from, for example, a television installed in the monitoring area, the abnormality determination unit does not determine that the monitoring area is in an abnormal state unless the image analysis unit detects a plurality of person images. In this way, even when the monitoring area is an environment containing various sounds that resemble abnormal sounds, the present invention suppresses frequent false detections by taking into account the situation in the room, namely whether a plurality of persons are present.

In a preferred aspect of the present invention, the abnormal sound detected by the acoustic analysis unit is a human scream.

As described above, by actively using information on the presence of persons in the monitoring area, the abnormality detection device of the present invention can detect an abnormal state in the monitoring area without frequent false detections, even in an environment where various sounds resembling abnormal sounds exist.

FIG. 1: Diagram schematically showing the configuration and arrangement of the abnormality detection device
FIG. 2: Block diagram showing the configuration of the management device
FIG. 3: Diagram showing the movement trajectory information
FIG. 4: Flowchart showing the processing in the control unit
FIG. 5: Flowchart showing the multiple person image detection processing
FIG. 6: Flowchart showing the scream detection processing
FIG. 7: Flowchart showing the abnormality determination processing

Hereinafter, as one embodiment of the present invention, an example is described with reference to the drawings in which the interior of a room in a building where a safe is installed is the monitoring area, and an abnormal state in the monitoring area is detected from acoustic information in the monitoring area and image information obtained by imaging a predetermined region within the monitoring area.

FIG. 1 is a diagram schematically showing the overall configuration and arrangement of the abnormality detection device 1. The abnormality detection device 1 comprises a management device 2, an imaging device 3, and a sound collection device 4.
In this embodiment, the abnormality detection device 1 treats the interior of the room in which the safe 7 is installed as the monitoring area. When a scream, which is an abnormal sound arising in the monitoring area, is detected and a plurality of person images are detected in the monitoring image from the imaging device 3, the device determines that an abnormal state exists and reports the abnormality, via the security device 5, to the security center device 6 located at a remote site.

The management device 2 is installed in an office, security room, or the like (not shown) in the building, and performs abnormal state detection processing on the basis of the monitoring images transmitted from the imaging device 3 and the acoustic signal transmitted from the sound collection device 4, both of which are connected to the management device 2. When an abnormal state is detected, the management device 2 notifies administrators, guards, and others present in the office or the like by issuing an alarm through sound or image display. The management device 2 is also connected to the security device 5 and transmits an abnormality signal to the security device 5 when an abnormal state is detected.

The imaging device 3 is a so-called surveillance camera that includes an image sensor such as a CCD or CMOS element, optical components, and the like. The imaging device 3 is installed on the upper part of an indoor wall 9 or on the ceiling so that it images the monitoring area looking down obliquely from above. Preferably, the imaging device 3 is installed so that its imaging range covers substantially the entire room (closed space). The imaging device 3 images the monitoring area at predetermined time intervals and sequentially transmits the monitoring images to the management device 2. The interval at which input images are captured is, for example, 1/5 second.

The sound collection device 4 is an electric circuit that includes a microphone, an amplifier, an A/D converter, and the like, and converts sound in the monitoring space into a digital signal (acoustic signal). The sound collection device 4 is installed at a position where it can collect sound arising in an area including at least the monitoring area, converts that sound into an acoustic signal, and outputs it to the management device 2. The gain of the amplifier is set in advance so that the level of the acoustic signal output when a scream is uttered in the room where the safe 7 is installed falls within the range of 70 to 100 dB.

The security device 5 connected to the management device 2 is installed in an office, security room, or the like (not shown) in the same building as the monitoring area. On receiving an abnormality signal from the management device 2, the security device 5 transmits the abnormality to the security center device 6 via a wide-area communication network such as a public telephone line, and thereby notifies the guards stationed at a remote security center (not shown) that an abnormality has occurred.

FIG. 2 shows the configuration of the management device 2. The management device 2 has computer functions and includes a storage unit 21, a control unit 22, an input unit 23, an output unit 24, and a communication unit 25.

The communication unit 25 is a communication interface such as LAN or USB, and communicates with the imaging device 3, the sound collection device 4, and the security device 5.

The input unit 23 is an information input device such as a keyboard, mouse, touch panel, or reader for portable storage media. Using the input unit 23, an administrator or the like can enter various setting information, such as the amplifier gain, and operation information into the management device 2.

The output unit 24 is an interface for outputting the processing results of the control unit 22 to various external devices. The output unit 24 is connected to a sound output device such as a speaker or buzzer and, on instruction from the control unit 22, outputs an abnormality signal that causes the sound output device to sound a warning. The output unit 24 is also connected to a display output device such as a display and, on instruction from the control unit 22, causes the display output device to show a warning message. By checking the display output or sound output from the output unit 24, an administrator or the like can recognize an abnormal state in the monitoring area. By checking the display output, the administrator or the like can also review the setting information of the abnormality detection device 1, the monitoring images of the monitoring area, and so on.
The output unit 24 may also include a communication interface that transmits the abnormality signal to the external security device 5. In that case, when an abnormal state is detected by the processing of the control unit 22, the management device 2 can send a signal indicating the abnormal state to an external monitoring center or the like. The output unit 24 may then share an interface device with the communication unit 25.

The storage unit 21 is an information storage device such as ROM, RAM, or an HDD. The storage unit 21 stores various programs and data and exchanges this information with the control unit 22. The data include the vowel feature quantity 211 and the movement trajectory information 212.

The vowel feature quantity 211 is vowel feature data prepared in advance, and is referenced by the vowel extraction means 224, described later, to extract vowels from the acoustic signal. A vowel feature quantity 211 is stored for each vowel type, in association with an identifier representing that vowel type.
The source data for the vowel feature quantity 211 are sample acoustic signals of scream endings collected from many speakers. In advance, a spectral envelope parameter representing the frequency feature of the scream ending is extracted from each sample acoustic signal, and the distribution of that parameter is learned for each vowel type. In this embodiment, an 8th-order LPC (Linear Predictive Coding) cepstrum is used as the spectral envelope parameter, and a GMM (Gaussian Mixture Model) is used as the distribution. The frequency feature is not limited to this; various spectral parameters known from speech analysis, such as MFCC (Mel-Frequency Cepstral Coefficients), can also be used. Three vowels, "a", "e", and "o", are used as scream-ending vowels. That is, the storage unit 21 stores, as the vowel feature quantity 211, a GMM of the LPC cepstrum for "a", a GMM of the LPC cepstrum for "e", and a GMM of the LPC cepstrum for "o".
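As an illustration of the LPC cepstrum feature mentioned above, the following minimal sketch converts linear prediction coefficients into cepstral coefficients using the standard recursion for an all-pole model. The function name `lpc_to_cepstrum`, the sign convention A(z) = 1 + a_1 z^-1 + ... + a_p z^-p, and the assumption that the LPC coefficients have already been estimated are illustrative choices, not details given in this document.

```python
def lpc_to_cepstrum(a, n_ceps=8):
    """Cepstrum c_1..c_n_ceps of the all-pole model 1/A(z).

    a: [a_1, ..., a_p] with A(z) = 1 + sum_k a_k z^-k (assumed convention).
    Recursion: c_n = -a_n - (1/n) * sum_{k=1}^{n-1} k * c_k * a_{n-k},
    with a_j taken as 0 for j > p.
    """
    p = len(a)
    c = []
    for n in range(1, n_ceps + 1):
        acc = sum(k * c[k - 1] * a[n - k - 1]
                  for k in range(1, n) if n - k <= p)
        a_n = a[n - 1] if n <= p else 0.0
        c.append(-a_n - acc / n)
    return c
```

For a model with poles at alpha and beta, the n-th cepstral coefficient is (alpha^n + beta^n) / n, which gives a convenient check of the recursion.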

The movement trajectory information 212 is generated by analyzing the monitoring images acquired by the imaging device 3 and storing, in time series, the coordinates on the monitoring image of each person image they contain, so that the movement trajectory of each person image can be tracked. As a specific example, shown in FIG. 3, the movement trajectory information 212 is stored in the storage unit 21 as a table that associates the time at which a monitoring image was acquired, the image ID (a unique identifier assigned to a person image when the monitoring image contains one), and the coordinates on the monitoring image of the centroid of that person image. As described later, the control unit 22 analyzes the relationship between person images in monitoring images at past times and a person image at the current time. If the two are determined to be the same, the common image ID is assigned; if not, a new image ID is assigned.
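The table layout described for the movement trajectory information 212 can be sketched as follows. The class and field names (`TrackPoint`, `TrajectoryTable`, `image_id`) are hypothetical; the document specifies only that time, image ID, and centroid coordinates are associated with one another.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrackPoint:
    time: float    # acquisition time of the monitoring image (seconds)
    image_id: int  # unique ID of the tracked person image
    x: float       # centroid x coordinate on the monitoring image
    y: float       # centroid y coordinate on the monitoring image


class TrajectoryTable:
    """In-memory stand-in for the trajectory table (illustrative only)."""

    def __init__(self):
        self._rows = []

    def add(self, time, image_id, x, y):
        self._rows.append(TrackPoint(time, image_id, x, y))

    def trajectory(self, image_id):
        """Time-ordered (time, x, y) points recorded for one image ID."""
        points = [(r.time, r.x, r.y) for r in self._rows
                  if r.image_id == image_id]
        return sorted(points)

    def ids_at(self, time):
        """Set of image IDs recorded for the image acquired at `time`."""
        return {r.image_id for r in self._rows if r.time == time}
```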

The control unit 22 is an arithmetic device such as a CPU or DSP and executes various kinds of information processing according to programs stored in the storage unit 21. In this embodiment, the control unit 22 performs abnormality detection processing: it analyzes the acoustic signal acquired by the sound collection device 4 and the monitoring images acquired by the imaging device 3 and, when an abnormal state is detected, outputs a signal corresponding to the nature of the abnormality to the output unit 24. The control unit 22 also stores input information from the input unit 23, such as setting information and operation information, in the storage unit 21. Functionally, the control unit 22 comprises person image extraction means 221, tracking means 222, person image calculation means 223, vowel extraction means 224, scream determination means 225, and abnormality determination means 226.

The person image extraction means 221 performs person image extraction processing that extracts one or more person images from each monitoring image captured by the imaging device 3 at predetermined intervals. Specifically, for example, the person image extraction means 221 stores in the storage unit 21, as a background image, a monitoring image obtained in advance by imaging the monitoring space with no one present, and extracts difference pixels by comparing the background image with the monitoring image. Then, among groups of mutually adjacent difference pixels, it extracts as a person image any group whose shape satisfies a predetermined condition and whose size is at least a predetermined value. In addition to this method, the person image extraction means 221 may extract person images by combining various image processing techniques such as edge detection. Information on the extracted person images (for example, the position of the person image and image features such as a color histogram) is stored in the storage unit 21 and used in the person image tracking processing by the tracking means 222, described later.
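The background difference step described above can be sketched as follows, under simplifying assumptions: images are grayscale lists of rows, adjacency is 4-connected, and only the size condition is applied (the shape condition is omitted). The function names and the default values of `diff_threshold` and `min_area` are hypothetical.

```python
from collections import deque


def extract_person_regions(background, frame, diff_threshold=30, min_area=4):
    """Return connected groups of difference pixels as lists of (row, col)."""
    rows, cols = len(frame), len(frame[0])
    # A pixel is a difference pixel if it deviates enough from the background.
    diff = [[abs(frame[r][c] - background[r][c]) > diff_threshold
             for c in range(cols)] for r in range(rows)]
    seen = [[False] * cols for _ in range(rows)]
    regions = []
    for r in range(rows):
        for c in range(cols):
            if not diff[r][c] or seen[r][c]:
                continue
            # Breadth-first search over 4-connected difference pixels.
            queue = deque([(r, c)])
            seen[r][c] = True
            region = []
            while queue:
                pr, pc = queue.popleft()
                region.append((pr, pc))
                for nr, nc in ((pr - 1, pc), (pr + 1, pc),
                               (pr, pc - 1), (pr, pc + 1)):
                    if (0 <= nr < rows and 0 <= nc < cols
                            and diff[nr][nc] and not seen[nr][nc]):
                        seen[nr][nc] = True
                        queue.append((nr, nc))
            # Keep only groups that meet the size condition.
            if len(region) >= min_area:
                regions.append(region)
    return regions


def centroid(region):
    """Centroid (row, col) of a pixel group."""
    n = len(region)
    return (sum(p[0] for p in region) / n, sum(p[1] for p in region) / n)
```

The centroid of each surviving region is the kind of position information that a later tracking step could consume.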

The tracking means 222 performs person image tracking processing each time the person image extraction means 221 extracts person images from a monitoring image acquired by the imaging device 3 at the predetermined interval. In the tracking processing, the tracking means 222 compares each person image extracted from the newly captured monitoring image with the person images extracted at past times and stored in the storage unit 21, and thereby stores the movement path, within the monitoring space, of the person corresponding to each person image in the movement trajectory information 212 of the storage unit 21. Specifically, the tracking means 222 compares information on each person image extracted from the new monitoring image (for example, its position and image features) with information on the person images extracted from the previously captured monitoring image stored in the storage unit 21, and associates person images that represent the same person. Person images associated with each other as representing the same person are given the same image ID, and their coordinates on each monitoring image, captured in time series, are stored in the movement trajectory information 212 of the storage unit 21 in association with the image ID.
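One simple way to realize the association step is greedy nearest-neighbor matching on centroid position, sketched below. This is an assumption for illustration: the document also mentions image features such as color histograms, which this sketch ignores, and the `Tracker` class and `max_distance` parameter are hypothetical.

```python
import math
from itertools import count


class Tracker:
    """Assigns image IDs by matching centroids to the previous frame."""

    def __init__(self, max_distance=50.0):
        self.max_distance = max_distance  # largest allowed jump per frame
        self._next_id = count(1)
        self._previous = {}               # image_id -> (x, y), previous frame

    def update(self, centroids):
        """Return {image_id: (x, y)} for the centroids of the new frame."""
        unmatched = dict(self._previous)
        current = {}
        for point in centroids:
            best_id, best_dist = None, self.max_distance
            for image_id, prev in unmatched.items():
                d = math.dist(point, prev)
                if d <= best_dist:
                    best_id, best_dist = image_id, d
            if best_id is None:
                # No previous person image is close enough: new image ID.
                best_id = next(self._next_id)
            else:
                # Same person as before: reuse the ID, match it only once.
                del unmatched[best_id]
            current[best_id] = point
        self._previous = current
        return current
```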

The person image calculation means 223 performs person image calculation processing that calculates the total number of person images extracted at the current time. In this embodiment, the processing refers to the movement trajectory information 212 and obtains the total number of image IDs of the person images being tracked at the current time.
When the calculated total is two or more, the person image calculation means 223 sets the multiple-person flag in the storage unit 21 to ON. The multiple-person flag indicates that a plurality of person images have been detected.
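The counting step amounts to counting distinct image IDs at the current time and comparing the count with two. A minimal sketch, assuming trajectory rows are plain `(time, image_id, x, y)` tuples (a hypothetical layout):

```python
def count_person_images(rows, current_time):
    """Number of distinct image IDs recorded at current_time."""
    return len({image_id for time, image_id, _x, _y in rows
                if time == current_time})


def multi_person_flag(rows, current_time):
    """True (flag ON) when two or more person images are being tracked."""
    return count_person_images(rows, current_time) >= 2
```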

The vowel extraction means 224 extracts the vowel parts of speech from the acoustic signal collected by the sound collection device 4 and outputs the extraction result to the scream determination means 225. That is, the vowel extraction means 224 calculates the frequency feature of the input acoustic signal for each frame, compares each frequency feature with the vowel feature quantity 211, determines whether each frame has the frequency feature of a vowel, and outputs the determination result to the scream determination means 225.

Preferably, when a frame is determined to have the frequency feature of a vowel, the vowel extraction means 224 also outputs the vowel type of that frame ("a", "e", or "o") to the scream determination means 225. The vowels "a", "e", and "o" extracted by the vowel extraction means 224 are the vowels typically uttered at the end of a scream. Typical screams ending in "a" include "kyaa", "gyaa", "waa", "uwaa", and "awawaa". Typical screams ending in "e" include "tasuketee" ("help me"), "yametee" ("stop"), "tasuketekuree" ("somebody help me"), and "yametekuree" ("stop it, please"). A typical scream ending in "o" is "yameroo" ("stop it").

The frequency feature calculated by the vowel extraction means 224 is of the same kind as the vowel feature quantity 211, which in this example is the 8th-order LPC cepstrum. The frame length and frame period are preset to values suitable for speech analysis; in this example, the frame length is 20 ms and the frame period is 10 ms. Specifically, the vowel extraction means 224 calculates the distance between the frequency feature of each frame of the input acoustic signal and the vowel feature quantity 211 of each vowel and compares it with a preset vowel determination threshold. If the distance between the frame's frequency feature and any vowel feature quantity 211 is at or below the vowel determination threshold, the frame is determined to be a vowel of the type corresponding to that vowel feature quantity 211. If no vowel feature quantity 211 lies within the vowel determination threshold, the frame is determined not to be a vowel.
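The framing and thresholding described above can be sketched as follows. The sample rate, the function names, and the rule of picking the closest vowel when more than one lies within the threshold are assumptions; the document specifies only the 20 ms frame length, the 10 ms frame period, and the threshold comparison.

```python
def split_into_frames(samples, sample_rate, frame_ms=20, period_ms=10):
    """Split a signal into overlapping frames (20 ms long, every 10 ms)."""
    frame_len = sample_rate * frame_ms // 1000
    hop = sample_rate * period_ms // 1000
    return [samples[s:s + frame_len]
            for s in range(0, len(samples) - frame_len + 1, hop)]


def classify_vowel(distances, threshold):
    """distances: {vowel: distance to that vowel's feature model}.

    Returns the closest vowel if its distance is within the threshold,
    otherwise None (the frame is not a vowel).
    """
    vowel, dist = min(distances.items(), key=lambda kv: kv[1])
    return vowel if dist <= threshold else None
```

With a 16 kHz signal, for example, each frame holds 320 samples and consecutive frames start 160 samples apart.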

In this example, in which the vowel feature quantity 211 is stored as a GMM, the distance D between each vowel feature quantity 211 and the frequency feature of a frame is calculated by Equation 1 (the equation image is not reproduced in this text).

Here, K is the number of normal distributions that make up the vowel feature quantity 211 for which the distance is calculated, and k is their index (k = 1, ..., K); n is the order of the frequency feature, and i is its index (i = 1, ..., n). Further, x_i is the i-th element of the vector representing the frequency feature of the input acoustic signal, m_{k,i} is the i-th element of the mean vector of the k-th normal distribution, σ²_{k,i} is the variance of the i-th vector element in the k-th normal distribution, and w_k is the weight coefficient of the k-th normal distribution.
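Because the equation itself is not available here, the following sketch shows one common distance that is consistent with the symbols defined above: the negative log-likelihood of the feature vector under a diagonal-covariance GMM. This is an assumed form, not necessarily the document's Equation 1, and the function name `gmm_distance` is hypothetical.

```python
import math


def gmm_distance(x, weights, means, variances):
    """Negative log-likelihood of x under a diagonal-covariance GMM.

    x:         feature vector [x_1, ..., x_n]
    weights:   [w_1, ..., w_K]
    means:     K mean vectors [m_{k,1}, ..., m_{k,n}]
    variances: K variance vectors [sigma^2_{k,1}, ..., sigma^2_{k,n}]
    """
    log_terms = []
    for w, m, v in zip(weights, means, variances):
        log_p = math.log(w)
        for xi, mi, vi in zip(x, m, v):
            log_p += -0.5 * math.log(2 * math.pi * vi) - (xi - mi) ** 2 / (2 * vi)
        log_terms.append(log_p)
    # Log-sum-exp over the K components for numerical stability.
    peak = max(log_terms)
    log_likelihood = peak + math.log(sum(math.exp(t - peak) for t in log_terms))
    return -log_likelihood
```

Under this choice, a smaller D means the frame is better explained by the vowel model, which matches the "distance at or below a threshold" test used for vowel determination.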

The scream determination means 225 determines whether a frame being processed has the loudness of a scream by checking whether its volume (power) is at or above a preset scream volume threshold. It also detects a scream by determining whether the vowel part extracted by the vowel extraction means 224 has continued for at least a preset scream determination time. The scream volume threshold and the scream determination time are preset to values suited to detecting scream endings. In this example, the scream volume threshold is 70 dB and the scream determination time is 200 ms. With the frame period of 10 ms used in this example, the scream determination time corresponds to 20 frames.

As described above, the utterance content of a scream as a whole varies widely, but focusing on the scream ending reduces the utterance content to be extracted to at most three types. This markedly reduces failures to detect screams whose content was not anticipated. Requiring a vowel part reduces false detection, as screams, of loud, long-lasting sounds other than screams, such as coughs, sneezes, car horns, doors closing, creaks, and falling cans. The volume condition on the vowel part reduces false detection of normal-volume conversation as a scream. The duration condition on the vowel part reduces false detection of loud vowels, such as laughter, as a scream.

If a word in loud conversational speech contains a section of consecutive vowels, it may be falsely detected as a scream. The scream determination means 225 therefore refers to the vowel type extracted by the vowel extraction means 224 and treats each section in which the same vowel continues as a separate vowel portion (determination section) when determining whether a scream is present. In other words, sections with different vowel types are judged as different vowel portions. For example, even if a word in loud conversation contains the section "ao", in which "a" and "o" are consecutive, and the length of the "ao" section reaches the scream determination time, the "a" section and the "o" section are separate determination sections, so false detection of this loud conversation as a scream is reduced.
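The separation into determination sections can be illustrated with a short sketch. It assumes that each frame has already been labeled with a vowel type ("a", "e", "o") or `None` for non-vowel frames; the function names `vowel_runs` and `contains_scream` are hypothetical and are not taken from the description.

```python
from itertools import groupby
from typing import List, Optional, Sequence, Tuple


def vowel_runs(labels: Sequence[Optional[str]]) -> List[Tuple[str, int]]:
    """Split per-frame vowel labels into runs of the same vowel type.

    Non-vowel frames (None) separate runs and are not reported.
    """
    runs = []
    for vowel, group in groupby(labels):
        if vowel is not None:
            runs.append((vowel, sum(1 for _ in group)))
    return runs


def contains_scream(labels: Sequence[Optional[str]], min_frames: int = 20) -> bool:
    """True if any single-vowel run lasts at least min_frames frames."""
    return any(length >= min_frames for _, length in vowel_runs(labels))


# "ao" in loud conversation: 24 vowel frames in total, but two separate runs.
conversation = ["a"] * 12 + ["o"] * 12
# A sustained "a" at the end of a scream.
scream = ["a"] * 25
```

Here `conversation` spans more than 20 vowel frames in total, yet neither run reaches 20 frames on its own, whereas `scream` does.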

Based on the multi-person flag, which is the detection result of the person image calculation means 223, and the scream flag, which is the result of the scream detection process of the scream determination means 225, the abnormality determination means 226 determines whether multiple person images are detected when a scream is detected, and thereby performs the abnormal state determination process for the monitoring area. When it determines that an abnormal state exists in the monitoring area, the abnormality determination means 226 causes the output unit 24 to output a warning.

An example of the flow of processing executed by the control unit 22 of the management device 2 according to this embodiment is described below with reference to the flowcharts of FIGS. 4 to 7.
FIG. 4 is a flowchart showing the abnormality detection process in the control unit 22.

Prior to operation, an administrator or the like uses the input unit 23 of the management device 2 to make various initial settings, such as the scream volume threshold (S1). In the initial setting, when an administrator who has confirmed that the room is unoccupied starts the management device 2, the person image extraction means 221 of the control unit 22 generates a background image from monitoring images captured during a predetermined time after startup and stores it in the storage unit 21. The scream volume threshold, the scream determination time, and other values are also entered and registered as needed. Once the initial setting is complete, the abnormality detection process of steps S3 to S8 is repeated each time a new monitoring image is input from the imaging device 3.

Returning to FIG. 4, the multiple person image detection process of step S3 is described next. This process is performed by the person image extraction means 221, the tracking means 222, and the person image calculation means 223 each time the imaging device 3 acquires a monitoring image (that is, every frame period of the monitoring image).
FIG. 5 is a flowchart showing the multiple person image detection process according to this embodiment. The process is described below with reference to the flowchart of FIG. 5.

As shown in FIG. 5, the imaging device 3 first acquires a monitoring image at the current time (S30). When the monitoring image is transmitted via the communication unit 25, the person image extraction means 221 compares it with the background image and performs the person image extraction process (S32).

Next, the control unit 22 determines whether one or more person images were extracted in the person image extraction process of step S32 (S34). If no person image was extracted (S34-No), the monitoring image acquired in step S30 is stored in the storage unit 21 as a new background image (S44), and processing proceeds to step S46. The background image is updated when no person image is extracted in order to follow changes in the background that occur over time.

If a person image was extracted in the person image extraction process of step S32 (S34-Yes), the tracking means 222 performs the person image tracking process described above (S36). Next, the person image calculation means 223 performs the person image calculation process (S38).

Next, the person image calculation means 223 determines whether multiple person images exist (S40). If they do (S40-Yes), it sets the multi-person flag to ON (S42), ends the multiple person image detection process, and proceeds to step S5.

On the other hand, if multiple person images do not exist in step S40 (S40-No), or if the background image was updated in step S44, the multi-person flag is set to OFF (S46), the multiple person image detection process ends, and processing proceeds to step S5.
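The branch structure of FIG. 5 can be summarized in a short sketch. Everything named here is hypothetical: `extract_persons` stands in for the background-difference extraction of step S32, and the tracking and counting of steps S36 and S38 are reduced to counting the extracted regions, which is a simplification of the embodiment.

```python
from typing import Any, Callable, List, Tuple


def detect_multiple_persons(
    frame: Any,
    background: Any,
    extract_persons: Callable[[Any, Any], List[Any]],
) -> Tuple[bool, Any]:
    """Return (multi_person_flag, background to use for the next frame)."""
    persons = extract_persons(frame, background)  # S32: compare with background
    if not persons:                               # S34-No: nobody in view
        # S44: adopt the current frame as the new background,
        # S46: multi-person flag OFF.
        return False, frame
    # S36/S38: tracking and counting, reduced here to len() (assumption).
    multi_person = len(persons) >= 2              # S40
    return multi_person, background               # S42 (ON) or S46 (OFF)
```

Returning the background alongside the flag keeps the function free of shared state; in the embodiment both values are held in the storage unit 21.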

Returning to FIG. 4, the scream detection process of step S5 is described next. The scream detection process is performed by the vowel extraction means 224 and the scream determination means 225, and is applied at each predetermined frame period to the acoustic signal transmitted by the sound collection device 4 and stored in the storage unit 21. Specifically, the control unit 22 checks whether an acoustic signal one frame period in length has been newly added to the storage unit 21. When the frame period arrives, the control unit 22 reads frame data of a preset frame length from the storage unit 21, applies a Hamming window to it, and performs the scream detection process on the windowed frame data.
FIG. 6 is a flowchart showing the scream detection process according to this embodiment. The process is described below with reference to the flowchart of FIG. 6.

When the control unit 22 reads the frame data from the storage unit 21, the scream determination means 225 calculates the volume (power) of the frame data (S52) and compares the calculated volume with the scream volume threshold (S54). If the volume does not exceed the scream volume threshold (S54-No), the scream determination means 225 determines that there is no scream, the control unit 22 resets the scream counter (S70), ends the scream detection process, and advances processing to step S8. If the volume exceeds the scream volume threshold (S54-Yes), the scream determination means 225 regards a scream as possibly occurring and advances processing to step S56.
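The windowing and volume calculation can be sketched as follows. This is an illustration under stated assumptions, not the device's implementation: the description does not give the power formula, so mean squared amplitude is assumed, and `ref` is a hypothetical calibration constant. Relating digital sample values to the 70 dB threshold would require calibration of the microphone, which is not covered here.

```python
import math
from typing import List, Sequence


def hamming(n: int) -> List[float]:
    """Standard Hamming window coefficients for a frame of n samples."""
    if n < 2:
        return [1.0] * n
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def frame_level_db(samples: Sequence[float], ref: float = 1.0) -> float:
    """Level of a Hamming-windowed frame in dB relative to `ref`.

    Mean squared amplitude is used as the power (assumption).
    """
    if not samples:
        raise ValueError("frame must contain at least one sample")
    window = hamming(len(samples))
    windowed = [s * w for s, w in zip(samples, window)]
    power = sum(x * x for x in windowed) / len(windowed)
    if power <= 0.0:
        return float("-inf")  # digital silence
    return 10.0 * math.log10(power / (ref * ref))
```

A frame whose amplitude is ten times larger yields a level 20 dB higher, so the threshold comparison of step S54 reduces to comparing the returned value with the calibrated threshold.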

Next, the vowel extraction means 224 calculates the frequency feature of the frame data (S56). It then sequentially reads the vowel feature 211 for "a", the vowel feature 211 for "e", and the vowel feature 211 for "o" from the storage unit 21 and compares each of them with the frequency feature of the frame data to determine whether the frame data is a vowel (S56). If the frame data is determined to be a vowel, the vowel extraction means 224 also identifies its vowel type.
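This passage does not specify the form of the frequency feature or the comparison method. Purely as an illustration, the sketch below assumes a nearest-template match with a distance limit; the two-element feature vectors are placeholder numbers, not measured data, and the names `VOWEL_FEATURES` and `classify_vowel` are hypothetical.

```python
import math
from typing import Dict, Optional, Sequence

# Placeholder templates (NOT measured data); stands in for vowel features 211.
VOWEL_FEATURES: Dict[str, Sequence[float]] = {
    "a": (800.0, 1200.0),
    "e": (500.0, 1900.0),
    "o": (500.0, 900.0),
}


def classify_vowel(
    feature: Sequence[float],
    templates: Dict[str, Sequence[float]] = VOWEL_FEATURES,
    max_distance: float = 200.0,
) -> Optional[str]:
    """Return the closest vowel type, or None if no template is near enough.

    Euclidean distance and the max_distance limit are assumptions.
    """
    best_vowel, best_distance = None, float("inf")
    for vowel, template in templates.items():
        distance = math.dist(feature, template)
        if distance < best_distance:
            best_vowel, best_distance = vowel, distance
    return best_vowel if best_distance <= max_distance else None
```

Returning `None` corresponds to the "not a vowel" outcome, and the returned key corresponds to the identified vowel type.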

If the frame data is determined not to be a vowel (S58-No), the vowel extraction means 224 notifies the scream determination means 225 that no vowel was extracted. On receiving this output, the scream determination means 225 determines that there is no scream. The control unit 22 then resets the scream counter (S70), ends the scream detection process, and advances processing to step S8.

On the other hand, if the frame data is determined to be a vowel (S58-Yes), the vowel extraction means 224 checks whether the vowel type identified this time is the same as the one identified the previous time, and stores the current vowel type in the storage unit 21 (S60). At the next check, the current vowel type stored in the storage unit 21 is referred to as the previous vowel type.

If the vowel type identified this time is the same as the one identified the previous time (S60-Yes), the vowel extraction means 224 notifies the scream determination means 225 that the same vowel has been extracted again. On receiving this output, the scream determination means 225 increments the scream counter by 1 (S62).
If the vowel type identified this time differs from the one identified the previous time (S60-No), the vowel extraction means 224 notifies the scream determination means 225 that a new vowel has been extracted. On receiving this output, the scream determination means 225 sets the scream counter to 1 (S64).
Once the scream counter has been updated in step S62 or step S64, the scream determination means 225 compares the scream counter with the scream determination time (S66). If the scream counter does not exceed the scream determination time (S66-No), the scream determination means 225 regards the determination as still in progress, and the control unit 22 suspends the scream detection process until the next frame period arrives and advances processing to step S8.

If the scream counter exceeds the scream determination time (S66-Yes), the scream determination means 225 sets the scream flag to ON (S68). The control unit 22 then resets the scream counter (S70), ends the scream detection process, and advances processing to step S8.
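The per-frame flow of FIG. 6 amounts to a small state machine, sketched below. The class name and the constants are hypothetical. The flowchart text speaks of "exceeding" the thresholds while the description of the scream determination means 225 says "equal to or higher than"; the sketch uses `>=` at both boundaries, which is an assumption.

```python
from dataclasses import dataclass
from typing import Optional

SCREAM_VOLUME_DB = 70.0  # scream volume threshold of this example
SCREAM_FRAMES = 20       # 200 ms at a 10 ms frame period


@dataclass
class ScreamDetector:
    counter: int = 0                  # scream counter
    prev_vowel: Optional[str] = None  # vowel type of the previous vowel frame
    scream_flag: bool = False

    def process_frame(self, volume_db: float, vowel: Optional[str]) -> None:
        """Process one frame: its level and vowel type (None if not a vowel)."""
        if volume_db < SCREAM_VOLUME_DB:  # S54-No
            self.counter = 0              # S70
            return
        if vowel is None:                 # S58-No
            self.counter = 0              # S70
            return
        if vowel == self.prev_vowel:      # S60-Yes
            self.counter += 1             # S62
        else:                             # S60-No
            self.counter = 1              # S64
        self.prev_vowel = vowel           # S60: remember for the next frame
        if self.counter >= SCREAM_FRAMES:  # S66 (boundary is an assumption)
            self.scream_flag = True       # S68
            self.counter = 0              # S70
```

Twenty consecutive loud frames of the same vowel set `scream_flag`; a change of vowel type or a quiet frame restarts the count.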

Returning to FIG. 4, the abnormality determination process of step S8 is described next. This process is performed by the abnormality determination means 226 based on the multi-person flag and the scream flag held in the storage unit 21.
FIG. 7 is a flowchart showing the abnormality determination process according to this embodiment. The process is described below with reference to the flowchart of FIG. 7.

As shown in FIG. 7, the abnormality determination means 226 first determines whether the scream flag in the storage unit 21 is ON (S80).

If the scream flag is ON in step S80 (S80-Yes), the abnormality determination means 226 determines whether the multi-person flag in the storage unit 21 is ON (S82).

If the multi-person flag is ON in step S82 (S82-Yes), the abnormality determination means 226 outputs a warning signal to the output unit 24. On receiving the warning signal from the abnormality determination means 226 of the control unit 22, the output unit 24 performs the warning output process (S84).
In the warning output process, the output unit 24 issues a predetermined warning. For example, it sounds a warning tone corresponding to the type of abnormal state from a speaker connected to the output unit 24 and outputs the type of abnormal state as a warning message on a display connected to the output unit 24. By checking the warning tone and the warning message, an administrator or the like can recognize the abnormal state in the monitoring area. The output unit 24 also notifies the security device 5 of the abnormal state via the communication unit 25. The management device 2 can thereby report the abnormal state to the security center device 6 via the security device 5, and so notify security staff stationed at an external monitoring center.

If the multi-person flag is not ON in step S82 (S82-No), or after the warning output process has been performed in step S84, the abnormality determination means 226 sets the scream flag to OFF (S86).

After the scream flag has been set to OFF in step S86, or if the scream flag was not ON in step S80 (S80-No), the abnormality determination process ends and processing returns to step S3. Steps S3 to S8 are then repeated as described above.
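The decision logic of FIG. 7 can be written compactly. The sketch below is illustrative only; the function name `judge_abnormality` is hypothetical, and the warning output of step S84 is represented by a returned boolean rather than by an actual output unit.

```python
from typing import Tuple


def judge_abnormality(scream_flag: bool, multi_person_flag: bool) -> Tuple[bool, bool]:
    """Return (warning, scream_flag after the process)."""
    warning = False
    if scream_flag:            # S80-Yes
        if multi_person_flag:  # S82-Yes
            warning = True     # S84: warning output
        scream_flag = False    # S86: clear the scream flag in either case
    return warning, scream_flag
```

A warning is raised only when both flags are ON, and a scream detected while at most one person is present is discarded without a warning.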

The present invention is not limited to the embodiment described above and may be implemented in various other embodiments within the scope of the technical idea set out in the claims. The effects described in the embodiment are likewise not limiting.

In the embodiment described above, the person image calculation means 223 performs the person image calculation process by referring to the movement trajectory information 212, which is the result of the tracking process by the tracking means 222, and obtaining the total number of image IDs of the person images being tracked at the current time. However, the person image calculation means 223 may instead obtain the total number of person images by counting the person images extracted by the person image extraction means 221. In other words, the effect of the present invention can be obtained even when the tracking means 222 is omitted and no tracking process is performed.

In the embodiment described above, an abnormal state in the monitoring area is detected by detecting a scream as the abnormal sound. However, an abnormal state in the monitoring area may also be detected from abnormal sounds other than screams, such as gunshots or angry shouting. For example, when a volume equal to or higher than a preset abnormal volume threshold is detected, the sound can simply be regarded as an abnormal sound such as a gunshot or angry shouting. This can be realized by comparing the volume calculated in step S52 with the abnormal volume threshold and setting the scream flag to ON when the volume is equal to or higher than the threshold.
A destructive sound (the sound of something being broken) typically rises rapidly to a high volume and then decays within a short time, so the rise and fall of the detected sound show distinctive characteristics. When a destructive sound is to be detected as the abnormal sound, the rise rate, peak volume, fall decay rate, and duration of the acoustic signal acquired by the sound collection device 4 are calculated and checked against preset thresholds that characterize a destructive sound, which makes it possible to determine whether the sound is a destructive sound. The frequency components of the sound produced when a safe or the like is struck can also be analyzed in advance; evaluating at detection time whether the acoustic signal contains those characteristic frequency components, and taking the result into account, can greatly improve the reliability of the determination. Furthermore, because a sturdily built object such as a safe is difficult to break with a single blow, it is also effective to impose the condition that detection of the abnormal sound is confirmed only when similar destructive sounds are observed several times in succession within a short time.
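The destructive-sound check can be illustrated with a sketch that works on a per-frame level envelope in dB. All names and threshold values are hypothetical placeholders, and the definitions of rise rate and fall decay rate used here (level above a floor divided by the time to or from the peak) are assumptions, since the description does not define them.

```python
from typing import Dict, Optional, Sequence


def impact_features(
    levels_db: Sequence[float], frame_ms: float = 10.0, floor_db: float = 50.0
) -> Optional[Dict[str, float]]:
    """Rise rate, peak, fall rate and duration of the part above floor_db."""
    active = [i for i, v in enumerate(levels_db) if v > floor_db]
    if not active:
        return None
    start, end = active[0], active[-1]
    peak_idx = max(range(start, end + 1), key=lambda i: levels_db[i])
    peak = levels_db[peak_idx]
    rise_ms = (peak_idx - start + 1) * frame_ms
    fall_ms = (end - peak_idx + 1) * frame_ms
    return {
        "peak_db": peak,
        "rise_db_per_ms": (peak - floor_db) / rise_ms,
        "fall_db_per_ms": (peak - floor_db) / fall_ms,
        "duration_ms": (end - start + 1) * frame_ms,
    }


def looks_like_impact(
    f: Optional[Dict[str, float]],
    min_peak_db: float = 80.0,      # placeholder thresholds, not from the source
    min_rise: float = 2.0,
    min_fall: float = 1.0,
    max_duration_ms: float = 100.0,
) -> bool:
    """Check the features against the (placeholder) destructive-sound thresholds."""
    return (
        f is not None
        and f["peak_db"] >= min_peak_db
        and f["rise_db_per_ms"] >= min_rise
        and f["fall_db_per_ms"] >= min_fall
        and f["duration_ms"] <= max_duration_ms
    )
```

A short burst such as `[40, 90, 70, 55, 40]` passes the check, while a sustained loud sound fails on the fall rate and duration. The additional condition of several similar sounds within a short time could be layered on top by counting positive results in a sliding time window.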

An embodiment of the present invention has been described above. In this embodiment, the abnormality detection device 1 functions as the abnormality detection device of the present invention, and the sound collection device 4 functions as the sound collection unit of the present invention. The vowel extraction means 224 and the scream determination means 225 of the control unit 22 function as the acoustic analysis unit of the present invention. The imaging device 3 functions as the imaging unit of the present invention. The person image extraction means 221, the tracking means 222, and the person image calculation means 223 of the control unit 22 function as the image analysis unit of the present invention. The output unit 24 functions as the output unit of the present invention.

DESCRIPTION OF SYMBOLS
1 ... Abnormality detection device
2 ... Management device
3 ... Imaging device
4 ... Sound collection device
5 ... Security device
6 ... Security center device
7 ... Safe
9 ... Wall
21 ... Storage unit
22 ... Control unit
23 ... Input unit
24 ... Output unit
25 ... Communication unit
211 ... Vowel feature
212 ... Movement trajectory information
221 ... Person image extraction means
222 ... Tracking means
223 ... Person image calculation means
224 ... Vowel extraction means
225 ... Scream determination means
226 ... Abnormality determination means

Claims

An acoustic abnormality detection device comprising:
a sound collection unit that acquires an acoustic signal generated in a monitoring area;
an acoustic analysis unit that analyzes the acoustic signal collected by the sound collection unit and detects a predetermined abnormal sound;
an imaging unit that sequentially acquires images of an area including the monitoring area;
an image analysis unit that extracts person images from the images and calculates the number of the person images;
an abnormality determination unit that determines an abnormal state indicating the occurrence of a burglary in the monitoring area, based on the abnormal sound detection result of the acoustic analysis unit and the number of person images calculated by the image analysis unit; and
an output unit that outputs the determination result of the abnormality determination unit,
wherein the abnormality determination unit determines that the abnormal state exists when a plurality of the person images are detected by the image analysis unit at the time the acoustic analysis unit determines that the predetermined abnormal sound is present.

The abnormality detection device according to claim 1, wherein the abnormal sound detected by the acoustic analysis unit is a human screaming voice.