JP2018101317A

JP2018101317A - Abnormality monitoring system

Info

Publication number: JP2018101317A
Application number: JP2016247408A
Authority: JP
Inventors: 真希楠見; Maki Kusumi
Original assignee: Hochiki Corp
Current assignee: Hochiki Corp
Priority date: 2016-12-21
Filing date: 2016-12-21
Publication date: 2018-06-28
Anticipated expiration: 2036-12-21
Also published as: JP6867153B2

Abstract

PROBLEM TO BE SOLVED: To provide an abnormality monitoring system that generates an image description from an image input from a monitoring camera, determines abnormalities such as fire or theft with high accuracy, and issues an alarm. SOLUTION: An image of a monitoring target (shrine 14) captured by a monitoring camera 12 is input to an image extraction unit 16 of a fire detector 10, which extracts the image portion that has changed. The changed image portion is input to a convolutional neural network 24 of an image analysis unit 20, which extracts its features. The extracted features are input to a recurrent neural network 26, which generates and outputs an image description summarizing the image. When a determiner 30 of a fire determination unit 22 determines a fire by comparing the description with fire determination words in a thesaurus dictionary 32, it outputs a fire determination signal and an alarm is issued. SELECTED DRAWING: Figure 1

Description

The present invention relates to an abnormality monitoring system that uses a neural network to determine fire or theft from an image of a monitoring area captured by a monitoring camera and issues an alarm.

Conventionally, systems that determine fire using a sensor that monitors a specific physical quantity, such as a smoke detector or a heat detector, have been put into practical use.

Various devices and systems have also been proposed that detect fire by applying image processing to images of a monitoring area captured by a monitoring camera.

In the crime prevention field, various devices and systems have likewise been proposed, such as those that monitor intrusions and criminal acts using human-presence sensors and monitoring cameras.

In such disaster prevention and crime prevention systems, early detection of an abnormality is important for a prompt initial response.

For this reason, the conventional device (Patent Document 1) derives from the image the following phenomena caused by smoke from a fire: a decrease in transmittance or contrast, convergence of luminance values to a specific value, a reduction in luminance variance as the luminance distribution range narrows, a change in the average luminance due to smoke, a decrease in the total amount of edges, and an increase in intensity in the low-frequency band. It evaluates these comprehensively to determine the presence of smoke.
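For illustration only, several of the cues listed for Patent Document 1 can be computed on a grayscale frame as below. This is a hedged sketch in Python with NumPy; the exact definitions used here (global Michelson-style contrast, the sum of absolute neighbour differences as the edge total, and the share of spectral energy near DC as the low-frequency intensity) are assumptions, not the formulas of Patent Document 1.

```python
import numpy as np

def smoke_features(gray: np.ndarray) -> dict:
    """Global smoke cues for a 2-D luminance array (values 0..255).

    Each definition below is an assumed, simplified stand-in.
    """
    g = gray.astype(np.float64)
    gmax, gmin = g.max(), g.min()
    # Contrast drops as smoke veils the scene.
    contrast = (gmax - gmin) / (gmax + gmin) if (gmax + gmin) > 0 else 0.0
    # Variance shrinks as the luminance distribution narrows.
    variance = g.var()
    # Total edge amount: absolute vertical plus horizontal neighbour differences.
    edges = np.abs(np.diff(g, axis=0)).sum() + np.abs(np.diff(g, axis=1)).sum()
    # Share of power spectrum energy in a small window around DC.
    spec = np.abs(np.fft.fftshift(np.fft.fft2(g))) ** 2
    cy, cx = spec.shape[0] // 2, spec.shape[1] // 2
    r = max(1, min(spec.shape) // 8)
    total = spec.sum()
    low = spec[cy - r:cy + r + 1, cx - r:cx + r + 1].sum()
    return {
        "mean": g.mean(),
        "contrast": contrast,
        "variance": variance,
        "edges": edges,
        "low_freq_ratio": low / total if total > 0 else 0.0,
    }
```

Blending a sharp scene toward uniform gray (a crude stand-in for haze) lowers contrast, variance, and edges and raises the low-frequency share, which is the direction of change the related art relies on.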

Patent Document 1: JP 2008-046916 A
Patent Document 2: JP-A-7-245757
Patent Document 3: JP 2010-238028 A
Patent Document 4: JP-A-6-325270

However, conventional abnormality monitoring systems that detect fire from images of fire smoke require smoke features such as transmittance, contrast, and edges in the smoke image to be defined in advance and then derived by processing the images captured by the monitoring camera. Smoke from fires arises under a wide variety of conditions, and it is extremely difficult to identify which characteristics are distinctive of smoke among them. Because no decisive feature has been found, an abnormality monitoring system that accurately determines fire smoke from monitoring images and outputs a fire alarm has not yet reached practical use.

Systems that monitor intrusions and criminal acts with monitoring cameras have been put into practical use, but there is room for improvement; for example, monitoring accuracy depends on on-site conditions such as lighting.

Meanwhile, in recent years a technique has been disclosed in which a large number of labeled images of cats and dogs are used to train a multilayer neural network that includes a convolutional neural network (so-called deep learning); a new image is then presented to the trained network, which determines whether it shows a cat or a dog.

Deep learning is also being studied for uses beyond image analysis, such as natural language processing and behavior analysis.

Suppose such a multilayer neural network is provided in an abnormality determiner that takes an image of the monitoring area captured by a monitoring camera as input information and determines abnormality from it. During training, a large number of input images of abnormal and normal situations are prepared and used to train the network; during monitoring, the input image is fed to the trained network. An abnormality monitoring system can then be built that estimates from the network's output, with high accuracy, whether an abnormality exists and outputs an alarm.

A person looking at an image captured by a monitoring camera can readily describe what the scene shows and judge situations such as fire or theft. An equivalent capability can also be realized with a multilayer neural network, and if this technology is applied to monitoring abnormalities such as fire and theft, it can be expected to estimate fire or theft with sufficiently high accuracy from monitoring images input from a monitoring camera and to issue an alarm.

An object of the present invention is to provide an abnormality monitoring system that generates an image description from an image input from a monitoring camera, determines abnormalities such as fire or theft with high accuracy, and issues an alarm.

(Abnormality monitoring system 1)
The present invention is an abnormality monitoring system that determines an abnormality from an input image of a monitoring area captured by an imaging unit, comprising:
an image extraction unit that extracts the image portion that has changed within the image of the monitoring area captured by the imaging unit;
an image analysis unit, configured as a multilayer neural network, that analyzes the image portion extracted by the image extraction unit and outputs the elements contained in the image as words; and
an abnormality determination unit that compares the words for the elements contained in the image output from the image analysis unit with words indicating a predetermined abnormality stored in advance in a dictionary and, when it determines an abnormality, outputs an abnormality determination signal and causes an alarm to be issued.
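The data flow through the three claimed units can be sketched as a single monitoring cycle. In this minimal Python sketch the names `monitor_step`, `extract`, and `analyze` are hypothetical; the extractor and analyzer are passed in as plain functions, and an exact-match word set stands in for the dictionary comparison (the claim also allows similar words).

```python
def monitor_step(prev_frame, frame, extract, analyze, abnormal_words):
    """One cycle: changed-part extraction -> words -> dictionary comparison.

    extract(prev, curr) returns an image patch or None when nothing changed;
    analyze(patch) returns a list of words. Both are hypothetical stand-ins
    for the image extraction unit and the image analysis unit.
    """
    patch = extract(prev_frame, frame)
    if patch is None:  # no change: skip the (expensive) analysis
        return False, []
    words = analyze(patch)
    # Abnormality determination: any word found in the abnormal-word set.
    alarm = any(w in abnormal_words for w in words)
    return alarm, words
```

Passing the units in as functions keeps the sketch independent of any particular change detector or neural network.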

(Abnormality monitoring system 2)
In another embodiment, the present invention is an abnormality monitoring system that determines an abnormality from an input image of a monitoring area captured by an imaging unit, comprising:
an image analysis unit, configured as a multilayer neural network, that analyzes the image captured by the imaging unit and outputs the elements contained in the image as words; and
an abnormality determination unit that compares the words for the elements contained in the image output from the image analysis unit with words indicating a predetermined abnormality stored in advance in a dictionary and, when it determines an abnormality, outputs an abnormality determination signal and causes an alarm to be issued.

(Determination of abnormality by comparison in SVO format, etc.)
The image analysis unit further generates an image description based on the words for the elements contained in the image.
The abnormality determination unit classifies the image description output from the image analysis unit into a subject (S), verb (V), object (O), and/or complement (C), compares these with the subject, verb, object, and/or complement indicating a predetermined abnormality stored in the dictionary, and determines an abnormality when at least one of them matches or is similar.
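A minimal sketch of the role-by-role comparison, assuming the caption has already been split into subject, verb, object, and complement by some upstream parser (not shown). `Clause`, `svo_abnormal`, and the `min_hits` threshold are hypothetical; with `min_hits=1` it reproduces the "at least one role matches" rule, and raising it is one way to require combinations of roles.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional

@dataclass
class Clause:
    """One caption clause split into grammatical roles (parsing is assumed)."""
    subject: Optional[str] = None     # S
    verb: Optional[str] = None        # V
    obj: Optional[str] = None         # O
    complement: Optional[str] = None  # C

ROLES = ("subject", "verb", "obj", "complement")

def svo_abnormal(clause: Clause, rule: dict[str, set[str]], min_hits: int = 1) -> bool:
    """True when at least `min_hits` roles of the clause appear in the rule's word sets."""
    hits = 0
    for role in ROLES:
        word = getattr(clause, role)
        if word is not None and word in rule.get(role, set()):
            hits += 1
    return hits >= min_hits
```

Exact set membership is used here for brevity; a "similar" match could be added by expanding each rule set with synonyms from the dictionary.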

(Alarm using the image description)
The abnormality determination unit outputs the image description when it determines an abnormality.

(Functional configuration of the neural network)
The multilayer neural network of the image analysis unit consists of a convolutional neural network (CNN) and a multilayer recurrent neural network (RNN) that uses long short-term memory (LSTM) in its hidden layers.
The convolutional neural network extracts and outputs the features of the input image.
The recurrent neural network receives the features output from the convolutional neural network and generates and outputs an image description.
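To make the CNN-to-LSTM hand-off concrete, here is an untrained toy in Python/NumPy. It assumes the CNN has already reduced the image to a feature vector, feeds that vector (after a linear projection) as the LSTM's first input in the style of common image-captioning models, and then decodes greedily until an end token. The class name, sizes, and the `<start>`/`<end>` token convention are assumptions; with random weights the words it emits are meaningless.

```python
import numpy as np

def _sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

class TinyCaptioner:
    """Untrained toy: CNN feature vector -> LSTM -> greedy word sequence."""

    def __init__(self, feat_dim, vocab, hidden=16, seed=0):
        rng = np.random.default_rng(seed)
        self.vocab = vocab    # assumed: vocab[0] = "<start>", vocab[1] = "<end>"
        self.hidden = hidden
        v, h = len(vocab), hidden
        self.W_img = rng.normal(0, 0.1, (h, feat_dim))  # projects the CNN feature
        self.E = rng.normal(0, 0.1, (v, h))              # word embeddings
        self.W = rng.normal(0, 0.1, (4 * h, 2 * h))      # gate weights on [x, h_prev]
        self.b = np.zeros(4 * h)
        self.W_out = rng.normal(0, 0.1, (v, h))          # hidden state -> word scores

    def _step(self, x, h_prev, c_prev):
        z = self.W @ np.concatenate([x, h_prev]) + self.b
        i, f, o, g = np.split(z, 4)  # input, forget, output gates and candidate
        c = _sigmoid(f) * c_prev + _sigmoid(i) * np.tanh(g)
        h = _sigmoid(o) * np.tanh(c)
        return h, c

    def caption(self, feature, max_len=8):
        h = np.zeros(self.hidden)
        c = np.zeros(self.hidden)
        h, c = self._step(self.W_img @ feature, h, c)  # image feature is the first input
        token, words = 0, []                           # decoding starts from "<start>"
        for _ in range(max_len):
            h, c = self._step(self.E[token], h, c)
            token = int(np.argmax(self.W_out @ h))     # greedy choice of the next word
            if token == 1:                             # "<end>" terminates the sentence
                break
            words.append(self.vocab[token])
        return words
```

In a trained system the feature vector would come from an intermediate layer of the trained CNN and all weights would be learned; only the data flow is meant to carry over.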

(Training of the convolutional and recurrent neural networks)
The convolutional neural network is trained by inputting training images without supervision.
The recurrent neural network is trained by inputting the features output from a predetermined intermediate layer when a training image is input to the trained convolutional neural network, together with a predetermined training image description corresponding to that training image.
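The second training stage pairs each CNN feature vector with its training description. A common way to turn a description into per-step training targets is teacher forcing, sketched below in Python; the whitespace tokenizer, the `<start>`/`<end>` tokens, and the function names are assumptions rather than details given in this document.

```python
def tokenize(caption: str) -> list:
    # Naive lower-case whitespace tokenizer that drops a final period (assumed).
    return caption.lower().rstrip(".").split()

def build_vocab(captions) -> dict:
    vocab = {"<start>": 0, "<end>": 1}
    for cap in captions:
        for w in tokenize(cap):
            vocab.setdefault(w, len(vocab))
    return vocab

def teacher_forcing_pair(caption: str, vocab: dict):
    """Input ids the RNN sees at each step and the ids it should predict."""
    ids = [vocab[w] for w in tokenize(caption)]
    inputs = [vocab["<start>"]] + ids   # shifted right by one step
    targets = ids + [vocab["<end>"]]    # next word at every step
    return inputs, targets
```

Each `(feature, inputs, targets)` triple would then drive one training step of the recurrent network, with the feature vector supplied before the first input token.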

(Fire monitoring)
The multilayer neural network of the image analysis unit has been trained with predetermined fire training images and predetermined image descriptions, and generates and outputs an image description by analyzing the image of the monitoring area input from the imaging unit.
The abnormality determination unit compares the words of the image description output from the image analysis unit with words indicating a predetermined fire stored in advance in the dictionary and, when it determines a fire, outputs a fire determination signal and causes an alarm to be issued.

(Theft monitoring)
The multilayer neural network of the image analysis unit has been trained with predetermined theft training images and predetermined image descriptions, and generates and outputs an image description by analyzing the image of the monitoring area input from the imaging unit.
The abnormality determination unit compares the words of the image description output from the image analysis unit with words indicating a predetermined theft stored in advance in the dictionary and, when it determines a theft, outputs a theft determination signal and causes an alarm to be issued.

(Changing the determination criteria by time)
The abnormality determiner operates differently depending on the time of day.
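One way to realize time-dependent operation is to switch the set of abnormal words by time window. The sketch below is hypothetical: the windows and word sets (for example, treating any person as abnormal at night when a shrine is closed) are illustrative only.

```python
from datetime import time

def in_window(now: time, start: time, end: time) -> bool:
    """Half-open window [start, end); a window with start > end wraps past midnight."""
    if start <= end:
        return start <= now < end
    return now >= start or now < end

# Hypothetical schedule: visitors are normal by day, any person is abnormal at night.
SCHEDULE = [
    (time(6, 0), time(18, 0), frozenset({"smoke", "flame", "lighter"})),
    (time(18, 0), time(6, 0), frozenset({"smoke", "flame", "lighter", "person"})),
]

def is_abnormal(words, now: time) -> bool:
    for start, end, abnormal_words in SCHEDULE:
        if in_window(now, start, end):
            return any(w in abnormal_words for w in words)
    return False
```

Half-open windows avoid double-counting the boundary instants, and the wrap-around case lets a single night window span midnight.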

(Effect of abnormality monitoring system 1)
The present invention is an abnormality monitoring system that determines an abnormality from an input image of a monitoring area captured by an imaging unit, and is provided with an image extraction unit that extracts the changed image portion from the image of the monitoring area, an image analysis unit configured as a multilayer neural network that analyzes the extracted image portion and outputs the elements contained in the image as words, and an abnormality determination unit that compares those words with words indicating a predetermined abnormality stored in advance in a dictionary and, when it determines an abnormality, outputs an abnormality determination signal and causes an alarm to be issued. When a monitoring camera monitors, for example, a shrine of heritage value, the event causing an abnormality is expected to occupy only a small part of the whole image, so analyzing the whole monitoring image may fail to detect such an event. In the present invention, the changed image portion of the monitoring image captured by the monitoring camera is extracted and analyzed, so the features of one or more objects representing the event that caused the abnormality are extracted and converted into words. When the extracted words match or resemble predetermined words indicating an abnormality in the dictionary, an abnormality determination signal is output and an alarm is issued. Words indicating the event that caused the abnormality can therefore be output reliably, improving the accuracy of abnormality determination. In addition, determining whether the features correspond to an abnormality from combinations of words and the like improves the accuracy of distinguishing true abnormalities from false alarms.

(Effect of abnormality monitoring system 2)
In another embodiment, the present invention is an abnormality monitoring system that determines an abnormality from an input image of a monitoring area captured by an imaging unit, and is provided with an image analysis unit configured as a multilayer neural network that analyzes the captured image and outputs the elements contained in the image as words, and an abnormality determination unit that compares those words with words indicating a predetermined abnormality stored in advance in a dictionary and, when it determines an abnormality, outputs an abnormality determination signal and causes an alarm to be issued. When the image of the monitoring area captured by the monitoring camera is input to the image analysis unit, the features of one or more objects present in the image are extracted and converted into words, for example "shrine / roof / abnormal / present". When these words match or resemble predetermined words indicating an abnormality in the dictionary, an abnormality determination signal is output and an alarm is issued. Determining whether the features correspond to an abnormality from combinations of words and the like improves the accuracy of distinguishing true abnormalities from false alarms.

(Effect of determining abnormality by comparison in SVO format)
The image analysis unit further generates an image description based on the words for the elements contained in the image, and the abnormality determination unit classifies the image description into a subject (S), verb (V), object (O), and/or complement (C), compares these with the subject, verb, and/or object indicating a predetermined abnormality stored in the dictionary, and determines an abnormality when at least one of them matches or is similar. By comparing each of the subject (S), verb (V), object (O), and complement (C) of the image description, or a combination of them, with the subject, verb, object, and complement indicating an abnormality registered in the dictionary, more detailed determination criteria based on the relationships between words can be applied. This improves the accuracy of distinguishing true abnormalities from false alarms, so that abnormalities are determined reliably and alarms are issued.

(Effect of the alarm using the image description)
Because the abnormality determination unit outputs the image description when it determines an abnormality, the person monitoring can understand the abnormality that occurred in the monitoring area. For example, in a system where a guard watches a monitor in a monitoring room, an alarm is output when an abnormality is determined, and when the guard checks the screen, the monitor displays the image of the monitoring area captured by the imaging unit together with the image description. The guard can examine the image while reading the summary in the description, which makes the situation easy to understand.

(Effect of the functional configuration of the neural network)
The multilayer neural network of the image analysis unit consists of a convolutional neural network and a recurrent neural network that uses long short-term memory in its hidden layers; the convolutional neural network extracts and outputs the features of the input image, and the recurrent neural network receives those features and generates and outputs an image description. Because the convolutional neural network extracts the features of the input image automatically, the features of the input information from the monitoring area are obtained without preprocessing such as extracting contours from the image, and the subsequent recurrent neural network can estimate and generate the image description with high accuracy.

Moreover, the image analysis unit consisting of the convolutional and recurrent neural networks extracts the image features and generates an accurate image description not only when the changed portion of the monitoring camera image is extracted and input, as in abnormality monitoring system 1 described above, but also when the monitoring camera image is input as-is, as in abnormality monitoring system 2. Comparing the words of the image description with the dictionary words then allows abnormalities to be determined reliably and alarms to be issued.

(Effect of the training of the convolutional and recurrent neural networks)
The convolutional neural network is trained by inputting training images without supervision, and the recurrent neural network is trained by inputting the features output from a predetermined intermediate layer when a training image is input to the trained convolutional neural network, together with a predetermined training image description corresponding to that training image. By preparing a large number of training data sets, each pairing a training image related to abnormality monitoring with its image description, the convolutional neural network learns to extract image features and the recurrent neural network learns to convert those features into an image description. Analyzing the image of the monitoring area captured by the monitoring camera then allows an image description to be estimated and generated with high accuracy.

(Effect of fire monitoring)
The multilayer neural network of the image analysis unit has been trained with predetermined fire training images and predetermined image descriptions and generates an image description by analyzing the image of the monitoring area input from the imaging unit; the abnormality determination unit compares the words of that description with words indicating a predetermined fire stored in advance in the dictionary and, when it determines a fire, outputs a fire determination signal and causes an alarm to be issued. For example, when an image showing smoke rising from a shrine monitored by the monitoring camera is input, the multilayer neural network of the image analysis unit outputs an image description such as "Smoke is coming from the roof of the shrine." By comparison with the words "shrine", "roof", and "smoke" registered in the dictionary, the description is determined to indicate a fire, and a fire determination signal is output so that a fire alarm is issued.

Likewise, when an image is input of a person lighting a lighter near, for example, a shrine monitored by the monitoring camera, the neural network of the image analysis unit outputs an image description such as "A person near the shrine is lighting a lighter." By comparison with the words "shrine", "lighter", and "lighting" registered in the dictionary, the description is determined to indicate arson, and a fire determination signal is output so that a fire alarm is issued.

(Effect of theft monitoring)
The multilayer neural network of the image analysis unit has been trained with predetermined theft training images and predetermined image descriptions and generates an image description by analyzing the image of the monitoring area input from the imaging unit; the abnormality determination unit compares the words of that description with words indicating a predetermined theft stored in advance in the dictionary and, when it determines a theft, outputs a theft determination signal and causes an alarm to be issued. For example, when an image is input of a suspicious person moving back and forth near the offertory box of a shrine monitored by the monitoring camera, the neural network of the image analysis unit outputs an image description such as "There seems to be a suspicious person near the offertory box of the shrine." By comparison with the words "shrine", "offertory box", and "suspicious person" registered in the dictionary, the description is determined to indicate theft, and a theft determination signal is output so that a theft alarm is issued.

(Effect of changing the determination criteria by time)
Because the abnormality determiner operates differently depending on the time of day, an event that is abnormal only at certain times can be determined to be abnormal only during those times. The system can also handle locations where the objects and behaviors to be monitored change with the time of day.

Fig. 1 is an explanatory diagram showing an outline of a monitoring camera and an abnormality monitoring system that monitors a shrine for fire.
Fig. 2 is an explanatory diagram showing the functional configuration of the convolutional neural network and the recurrent neural network provided in the image analysis unit of Fig. 1.
Fig. 3 is an explanatory diagram showing fire determination processing by image analysis of a monitoring image of the shrine captured by the monitoring camera.
Fig. 4 is a flowchart showing fire monitoring control by the fire detector of Fig. 1.

[Outline of fire monitoring system]
FIG. 1 is an explanatory diagram showing an outline of an abnormality monitoring system that monitors for fire with a monitoring camera.

As shown in FIG. 1, a monitoring camera 12 is installed to monitor a predetermined monitoring target, for example a shrine 14 of value as historical heritage, and captures video of the monitoring area including the shrine 14.

The monitoring camera 12 captures RGB color images at, for example, 30 frames per second and outputs them as video. Each frame has a pixel array of, for example, 4056 × 4056 pixels (vertical × horizontal).
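As a rough sense of scale, and assuming uncompressed 8-bit-per-channel RGB (3 bytes per pixel), which the document does not state, the example frame size and frame rate give:

```latex
4056 \times 4056 = 16\,451\,136\ \text{pixels}, \qquad
16\,451\,136 \times 3\ \text{B} = 49\,353\,408\ \text{B} \approx 49.4\ \text{MB per frame}

49\,353\,408\ \text{B} \times 30\ \text{s}^{-1} = 1\,480\,602\,240\ \text{B/s} \approx 1.48\ \text{GB/s}
```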

A fire detector 10 is installed in, for example, the management building of the monitored shrine 14. The monitoring camera 12 is connected to it by a signal cable 15, and the video captured by the monitoring camera 12 is input to it.

The fire detector 10 includes an image extraction unit 16, a learning data set storage unit 18, an image analysis unit 20, and a fire determination unit 22. The image analysis unit 20 includes a convolutional neural network 24 and a recurrent neural network 26 as a multilayer neural network, together with a learning control unit 28 that trains the convolutional neural network 24 and the recurrent neural network 26. The fire determination unit 22 consists of a determiner 30 and a thesaurus dictionary 32, in which fire determination words are stored systematically, organized from major categories down to minor categories.

Here, the functions of the image analysis unit 20 and the fire determination unit 22 are realized by a program executed by the CPU of a computer circuit capable of multilayer neural network processing.

The image extraction unit 16 receives the image of the monitoring area including the shrine 14 captured by the monitoring camera 12 and, for example, takes the difference from the previous frame on a frame-by-frame basis to extract the image portion that has changed in the monitoring image, which it outputs to the image analysis unit 20. To extract the changed portion, the image extraction unit 16 takes a range extending a predetermined number of pixels up, down, left, and right from the center position of the changed image region. Alternatively, the image extraction unit 16 may divide the monitoring image into blocks of a predetermined size and extract and output the one or more blocks that have changed.
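The frame-difference extraction and its block-based alternative can be sketched as follows with NumPy. The threshold, crop half-size, block size, and the use of the changed region's bounding-box centre are assumptions; the document only specifies a crop of a predetermined number of pixels around the centre of the changed region.

```python
import numpy as np

def _abs_diff(prev, curr):
    d = np.abs(curr.astype(np.int32) - prev.astype(np.int32))
    return d.max(axis=2) if d.ndim == 3 else d  # RGB: largest per-channel change

def changed_patch(prev, curr, half=32, thresh=25):
    """Crop a (2*half) x (2*half) window centred on the changed region, or None."""
    diff = _abs_diff(prev, curr)
    ys, xs = np.nonzero(diff > thresh)
    if ys.size == 0:
        return None
    cy = (int(ys.min()) + int(ys.max())) // 2  # centre of the changed bounding box
    cx = (int(xs.min()) + int(xs.max())) // 2
    H, W = diff.shape
    # Clamp so the window stays inside the frame.
    y0 = min(max(cy - half, 0), max(H - 2 * half, 0))
    x0 = min(max(cx - half, 0), max(W - 2 * half, 0))
    return curr[y0:y0 + 2 * half, x0:x0 + 2 * half]

def changed_blocks(prev, curr, block=64, thresh=25):
    """Block variant: (row, col) indices of fixed-size blocks containing change."""
    diff = _abs_diff(prev, curr)
    H, W = diff.shape
    return [(r // block, c // block)
            for r in range(0, H - block + 1, block)
            for c in range(0, W - block + 1, block)
            if (diff[r:r + block, c:c + block] > thresh).any()]
```

Casting to `int32` before subtracting avoids the wrap-around that unsigned 8-bit frames would otherwise produce.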

The convolutional neural network 24 in the image analysis unit 20 extracts and outputs the features of the input monitoring image. The recurrent neural network 26 receives the features output from the convolutional neural network 24 and generates and outputs an image description summarizing the input image.

The determiner 30 of the fire determination unit 22 compares the one or more words making up the image description output from the recurrent neural network 26 of the image analysis unit 20 with the fire determination words stored in the thesaurus dictionary 32. When a word in the image description matches or resembles a fire determination word in the thesaurus dictionary 32, the determiner 30 determines a fire and outputs a fire determination signal, for example to the fire receiver of the fire alarm equipment installed at the shrine 14, causing a fire precursor warning or a fire alarm to be output. The abnormality determination words used by the determiner 30, such as fire determination words, can be changed as appropriate by the person monitoring according to the monitoring target or the behavior being monitored.
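A sketch of the word comparison against a hierarchical thesaurus such as thesaurus dictionary 32, assuming "similar" means "belongs to the same minor category". The thesaurus contents, category names, and whole-word regular-expression matching are hypothetical; the captions in the test reuse the example descriptions from this document.

```python
import re

# Hypothetical thesaurus fragment: major category -> minor category -> words.
THESAURUS = {
    "fire": {
        "smoke": {"smoke", "fumes", "haze"},
        "flame": {"flame", "fire", "blaze"},
        "ignition": {"lighter", "match", "torch"},
    },
    "theft": {
        "person": {"suspicious person", "intruder", "prowler"},
        "target": {"offertory box", "donation box"},
    },
}

def classify(caption: str, major: str) -> dict:
    """Map each minor category under `major` to the dictionary words (or phrases)
    found in the caption as whole words."""
    text = caption.lower()
    found = {}
    for minor, group in THESAURUS.get(major, {}).items():
        hits = sorted(w for w in group
                      if re.search(r"\b" + re.escape(w) + r"\b", text))
        if hits:
            found[minor] = hits
    return found

def judge(caption: str, major: str, min_categories: int = 1) -> bool:
    """Abnormal when words from at least `min_categories` minor categories appear."""
    return len(classify(caption, major)) >= min_categories
```

Whole-word matching keeps "fire" from firing on "firefly", and counting minor categories gives a simple knob for requiring word combinations.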

The convolutional neural network 24 and the recurrent neural network 26 of the image analysis unit 20 are trained by the learning control unit 28. Training uses a large number of learning data sets stored in advance in the learning data set storage unit 18, each consisting of a learning image paired with its image description.

The learning images stored in the learning data set storage unit 18 are produced, for example, as follows. A monitoring image of the shrine 14 is captured by the monitoring camera 12 in the normal monitoring state. This image is scanned with a window the same size as the partial images extracted by the image extraction unit 16, shifting the window vertically and horizontally by a predetermined pixel pitch to generate a large number of partial images. Images of flames and smoke captured in fire experiments or the like are then composited onto these partial images, and the composite images are stored as learning images.
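The generation of learning images can be sketched as follows, assuming grayscale images held as nested lists. The source states only that flame and smoke images are "synthesized" with the scanned partial images. The per-pixel alpha blending used here is an assumption chosen for illustration, and the crop size and pitch are hypothetical.

```python
def crops(image, size, pitch):
    """Yield (top, left, crop) for every size x size window shifted by `pitch` pixels."""
    h, w = len(image), len(image[0])
    for top in range(0, h - size + 1, pitch):
        for left in range(0, w - size + 1, pitch):
            yield top, left, [row[left:left + size] for row in image[top:top + size]]


def composite(background, overlay, alpha):
    """Alpha-blend an overlay (e.g. a smoke patch) onto a same-sized background.

    alpha[r][c] in [0, 1] is the assumed opacity of the overlay at each pixel.
    """
    return [[round(a * o + (1 - a) * b) for b, o, a in zip(rb, ro, ra)]
            for rb, ro, ra in zip(background, overlay, alpha)]
```

Each composite would then be paired with a caption such as "Smoke is coming out of the roof of the shrine." to form one learning data set.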

An image description is also prepared for each stored learning image, so that many learning data sets, each pairing a learning image with its image description, are stored. For example, a learning image in which smoke is rising from the roof of the shrine 14 is stored with an image description such as “Smoke is coming out of the roof of the shrine.”

The learning control unit 28 first reads the learning images stored in the learning data set storage unit 18. It inputs them to the convolutional neural network 24 as unsupervised (unlabeled) learning images and trains the network by backpropagation.

Next, the learning control unit 28 inputs a learning image to the convolutional neural network 24 that has already been trained on the learning images. At the same time, it inputs the image description paired with that learning image to the recurrent neural network 26. The recurrent neural network 26 is thus trained on the feature quantity output by the trained convolutional neural network 24 together with the corresponding image description.

[Multilayer neural networks of the image analysis unit]
FIG. 2 is an explanatory diagram showing the functional configuration of the convolutional neural network and the recurrent neural network provided in the image analysis unit of FIG. 1.

(Convolutional neural network)
As shown in FIG. 2, the convolutional neural network 24 consists of an input layer 34 and a plurality of intermediate layers 36. A typical convolutional neural network places a fully connected multilayer neural network after the last intermediate layer. That network has its own input layer, intermediate layers and output layer, and it estimates an output from the image features. In this embodiment, however, only the feature quantity of the input image needs to be extracted, so this downstream fully connected network is omitted.

The convolutional neural network 24 differs somewhat from an ordinary neural network in that its structure is modeled on the biological visual cortex. The visual cortex contains receptive fields, which are groups of small cells that are each sensitive to a small subregion of the visual field. The behavior of a receptive field can be imitated by learning weights in the form of a matrix. This matrix is called a weight filter (kernel). Like a biological receptive field, it becomes sensitive to similar subregions of an image.

Through the convolution operation, the convolutional neural network 24 can express the similarity between a weight filter and each small region of the image, and can thereby extract appropriate features from the image.

The convolutional neural network 24 convolves the image fed to the input layer 34 with a weight filter. For example, the weight filter may be a 3 × 3 matrix with predetermined weights. The filter center is aligned with each pixel of the input image in turn, and the nine input pixels under the filter are combined into one pixel of a feature map in the next intermediate layer 36. Repeating this for every pixel generates the feature map in the intermediate layer 36.
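The 3 × 3 filtering step can be illustrated with a short pure-Python sketch. As in most CNN libraries, the operation is implemented as cross-correlation, meaning the kernel is not flipped. Zero padding at the image border is an assumption, since the source does not say how edge pixels are handled.

```python
def conv2d_same(image, kernel):
    """'Same'-size 2-D convolution, implemented as cross-correlation.

    The kernel center is aligned with every pixel of `image`; pixels outside
    the image are treated as zero (zero padding is an assumption).
    """
    h, w = len(image), len(image[0])
    k = len(kernel) // 2  # 1 for a 3x3 kernel
    out = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            acc = 0.0
            for i in range(-k, k + 1):
                for j in range(-k, k + 1):
                    rr, cc = r + i, c + j
                    if 0 <= rr < h and 0 <= cc < w:
                        acc += kernel[i + k][j + k] * image[rr][cc]
            out[r][c] = acc  # nine input pixels -> one feature-map pixel
    return out
```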

Next, a pooling operation is applied to the feature map of the intermediate layer 36 obtained by the convolution. Pooling discards features that are not needed for discrimination and keeps those that are.
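The source does not name the pooling method. Max pooling over non-overlapping 2 × 2 windows is a common choice and is assumed in this sketch.

```python
def max_pool(fmap, size=2):
    """Non-overlapping max pooling (size x size windows, stride = size).

    Trailing rows/columns that do not fill a whole window are dropped.
    """
    h = len(fmap) // size * size
    w = len(fmap[0]) // size * size
    return [[max(fmap[r + i][c + j] for i in range(size) for j in range(size))
             for c in range(0, w, size)]
            for r in range(0, h, size)]
```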

Convolution with weight filters followed by pooling is then repeated for each intermediate layer 36, generating feature maps up to the last intermediate layer 36. In this embodiment, the feature map generated in a chosen intermediate layer 36 is input to the recurrent neural network 26 as the feature quantity of the input image.

The convolutional neural network 24 undergoes unsupervised learning on the learning images stored in the learning data set storage unit 18, which are supplied by the learning control unit 28 shown in FIG. 1. Through this learning, it produces clustered feature quantities in which similar images are grouped together.

(Recurrent neural network)
The recurrent neural network 26 shown in FIG. 2 receives the image feature quantity extracted by the convolutional neural network 24 together with word vectors, and predicts an image description.

The recurrent neural network 26 of this embodiment uses an LSTM-LM (Long Short-Term Memory Language Model), a deep learning model designed for time-series data.

An ordinary recurrent neural network model consists of an input layer, a hidden layer and an output layer. It performs time-series analysis using past history by feeding the hidden-layer state back in as input at the next time step. The LSTM language model, by contrast, computes the probability that each word is chosen as the t-th word, given the preceding t-1 words as context. The LSTM model takes three inputs:
- the hidden state from the previous time step, which summarizes the words at times 1 to t-1;
- the word at time t-1, which is the prediction made at the previous time step;
- external information (here, the image features).
It generates a sentence by repeatedly predicting the next word.
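For reference, one time step of a standard LSTM cell can be written as follows. The cell has input, forget and output gates plus a tanh candidate. These gate equations are the usual textbook formulation and are not given in the source. The weight layout, with gates stacked in the order i, f, o, g, is an assumption of this sketch.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(m, v):
    """Multiply matrix m (list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def lstm_step(x, h_prev, c_prev, W, U, b):
    """One step of a standard LSTM cell.

    W: 4H x D input weights, U: 4H x H recurrent weights, b: 4H biases,
    stacked as [input gate, forget gate, output gate, candidate] (assumed layout).
    Returns the new hidden state h and cell state c.
    """
    H = len(h_prev)
    z = [wx + uh + bb for wx, uh, bb in zip(matvec(W, x), matvec(U, h_prev), b)]
    i = [sigmoid(v) for v in z[0:H]]          # input gate
    f = [sigmoid(v) for v in z[H:2 * H]]      # forget gate
    o = [sigmoid(v) for v in z[2 * H:3 * H]]  # output gate
    g = [math.tanh(v) for v in z[3 * H:4 * H]]  # candidate cell values
    c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c_prev, i, g)]
    h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
    return h, c
```

In a caption-generating LSTM language model of this kind, the image feature vector would typically be supplied as the first input x. Each following step receives the embedding of the previous word, and h is passed to a softmax layer to obtain the next-word probabilities.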

The recurrent neural network 26 of FIG. 2 consists of the following parts:
- an LSTM input layer 37 that converts the image feature vector extracted by the convolutional neural network 24 into a matrix for input to the LSTM hidden layer 38;
- a vector conversion unit 42 that converts the words $S_0$ to $S_{N-1}$, stored word by word in the register 40, into the vectors $W_e S_0$ to $W_e S_{N-1}$;
- $N-1$ stages of the LSTM hidden layer 38;
- a probability conversion unit 44 that converts the outputs of the LSTM hidden layer 38 into the appearance probabilities $p_1$ to $p_N$;
- a cost calculation unit 46 that computes the cost $-\sum_{t=1}^{N} \log p_t(S_t)$ from the log-probabilities $\log p_1(S_1)$ to $\log p_N(S_N)$ of outputting each word, and minimizes that cost.

(Training of the recurrent neural network)
Only the vector conversion unit 42 and the LSTM hidden layer 38 of the recurrent neural network 26 are trained. The feature quantity supplied by the convolutional neural network 24 is computed with its already-trained parameters, which are left unchanged.

The training data consist of a learning image $I$ and the word string $\{S_t\}$ ($t = 0, \dots, N$) of its image description. Training proceeds as follows.
(1) Input the image $I$ to the convolutional neural network 24 and take the output of a specific intermediate layer 36 as the feature vector.
(2) Input the feature vector to the LSTM hidden layer 38.
(3) Input the words $S_t$ one at a time from $t = 0$ to $t = N-1$, obtaining the probability distribution $p_{t+1}$ at each step.
(4) Minimize the cost obtained from the probabilities $p_{t+1}(S_{t+1})$ of outputting the words $S_{t+1}$.
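The cost in step (4) is commonly taken as the negative log-likelihood of the reference caption. Minimizing it maximizes the product of the probabilities p_{t+1}(S_{t+1}). The sketch below assumes that each step's output is a probability list over a vocabulary and that each target word is given as an index into that list.

```python
import math


def caption_nll(step_probs, target_ids):
    """Negative log-likelihood of one caption.

    step_probs[t] is the predicted distribution (list over the vocabulary)
    after feeding word S_t; target_ids[t] is the vocabulary index of S_{t+1}.
    Minimizing this sum maximizes prod_t p_{t+1}(S_{t+1}).
    """
    return -sum(math.log(p[w]) for p, w in zip(step_probs, target_ids))
```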

(Generation of image description)
To generate an image description with the trained convolutional neural network 24 and recurrent neural network 26, an image is first input to the convolutional neural network 24 to produce a feature vector. This feature vector is fed to the recurrent neural network 26, and candidate word strings are ranked in descending order of the product of their word appearance probabilities to produce the image description. The procedure is as follows.
(1) Input the image to the convolutional neural network 24 and take the output of a specific intermediate layer 36 as the feature vector.
(2) Input the feature vector from the LSTM input layer 37 to the LSTM hidden layer 38.
(3) Convert the sentence start symbol <S> into a vector using the vector conversion unit 42 and input it to the LSTM hidden layer 38.
(4) The output of the LSTM hidden layer 38 gives the appearance probability of each word, so select the top M words (for example, M = 20).
(5) Convert each word output in the previous step into a vector using the vector conversion unit 42 and input it to the LSTM hidden layer 38.
(6) From the output of the LSTM hidden layer 38, compute for each candidate the product of the probabilities of the words output so far, and keep the top M word strings.
(7) Repeat (5) and (6) until the output word is the end-of-sentence symbol.
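Steps (3) to (7) describe a beam search. The sketch below implements it under one assumption: a hypothetical next_probs function returns next-word probabilities for a given prefix, standing in for the LSTM hidden layer 38 and the probability conversion unit 44. Scores are accumulated as sums of log-probabilities. This ranks word strings in the same order as the product of probabilities but avoids numerical underflow. No length normalization is applied, which matches the plain product described above.

```python
import math


def beam_search(next_probs, beam_width, start="<S>", end="</S>", max_len=20):
    """Beam search over a word-level language model.

    next_probs(prefix) -> {word: probability} for the word after `prefix`
    (hypothetical interface). Keeps the `beam_width` word strings with the
    highest product of probabilities until all kept strings have ended.
    """
    beams = [([start], 0.0)]   # live prefixes as (words, log-probability)
    finished = []              # prefixes that have emitted the end symbol
    for _ in range(max_len):
        candidates = []
        for words, score in beams:
            for word, p in next_probs(words).items():
                if p > 0:
                    candidates.append((words + [word], score + math.log(p)))
        # Keep the top `beam_width` extensions by (log) probability product.
        candidates.sort(key=lambda item: item[1], reverse=True)
        beams = []
        for words, score in candidates[:beam_width]:
            if words[-1] == end:
                finished.append((words, score))
            else:
                beams.append((words, score))
        if not beams:
            break
    finished.extend(beams)  # prefixes still open when max_len was reached
    if not finished:
        return []
    best_words, _ = max(finished, key=lambda item: item[1])
    return [w for w in best_words if w not in (start, end)]
```

With M = 20 as in the example above, beam_width would be 20. A toy next_probs backed by a lookup table is enough to exercise the search.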

[Fire monitoring control]
FIG. 3 is an explanatory diagram showing fire determination control based on image analysis of a monitoring image of the shrine captured by the monitoring camera.

The fire detector 10 shown in FIG. 1 monitors the images of the monitored shrine 14 captured by the monitoring camera 12. Suppose that, during monitoring, smoke rises from near the eaves of the upper-right roof of the shrine 14, for example because of a lightning strike, as shown in the monitoring image 48 of FIG. 3. The image extraction unit 16 detects the image change where the smoke appears by taking the difference from the previous frame. It then performs an image extraction process 52 that cuts out a partial image 50 of a predetermined area containing the smoke. The partial image 50 is input to the image analysis unit 20, which performs an image analysis process 54.

In the image analysis process 54, the convolutional neural network 24 extracts the feature quantity of the partial image 50 and inputs it to the recurrent neural network 26. The recurrent neural network 26 then outputs an image description 58 such as “Smoke is coming out of the roof.”

The image description 58 output by the image analysis unit 20 is input to the determiner 30 of the fire determination unit 22. The determiner 30 converts the description into SVO form, with “roof” as the subject S, “coming out” as the verb V and “smoke” as the object O. It then compares these words with the fire-related fire determination words stored systematically in the thesaurus dictionary 32, which are organized from broad to narrow categories. The determiner 30 judges a fire when either all of the words “roof”, “smoke” and “coming out”, or just “smoke”, match or resemble fire determination words. It then outputs a fire determination signal to the fire receiver of fire alarm equipment (not shown), causing the receiver to issue a pre-fire warning or a fire alarm.
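The SVO comparison can be sketched as follows. The sentence is assumed to have already been parsed into subject, verb and object, and the dictionary contents are hypothetical examples. The two modes correspond to the two criteria named in the text: either every SVO slot matches a fire determination word, or a key word such as "smoke" appears on its own.

```python
# Hypothetical fire determination dictionary, organized by SVO slot.
FIRE_DICT = {
    "subject": {"roof", "eaves", "shrine"},
    "verb": {"coming out", "rising"},
    "object": {"smoke", "flame"},
}
# Hypothetical key words: a single hit in any slot is enough in "key" mode.
KEY_WORDS = {"smoke", "flame"}


def judge_svo(svo, mode="key", fire_dict=FIRE_DICT, key_words=KEY_WORDS):
    """Judge a fire from an already-parsed (subject, verb, object) triple.

    mode="all": every slot must match the dictionary entry for that slot.
    mode="key": any slot containing a key word such as "smoke" is enough.
    """
    subject, verb, obj = svo
    if mode == "all":
        return (subject in fire_dict["subject"]
                and verb in fire_dict["verb"]
                and obj in fire_dict["object"])
    if mode == "key":
        return any(word in key_words for word in svo)
    raise ValueError(f"unknown mode: {mode!r}")
```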

Instead of SVO form, the determiner 30 may convert the image description into SVOC form, which adds a complement C, or may simply split it into a sequence of words.

The fire detector 10 may also be provided with a monitor device. When a fire is determined, the monitor displays the image of the monitoring area, captured by the monitoring camera 12, in which the fire was determined. A manager or disaster-prevention officer who has learned of the pre-fire warning or fire alarm from the fire receiver can then confirm the fire. In this case, the image description 58 is shown on the monitor, and the extracted image portion is highlighted with a surrounding frame. In a system that switches the monitor among the videos of several monitoring cameras 12, the display is switched to the video of the monitoring camera 12 for which the abnormality was determined. A fire confirmation switch may also be provided on the operation section of the fire detector 10. When a fire is confirmed from the monitor image and this switch is operated, a fire notification signal is output to the fire receiver, just as when a manual call point is operated, and the fire receiver issues a fire alarm.

FIG. 4 is a flowchart showing the fire monitoring control performed by the fire detector of FIG. 1. As shown in FIG. 4, in step S1 the fire detector 10 reads the monitoring image captured by the monitoring camera 12 into the image extraction unit 16. In step S2, it extracts the changed image portion from the difference image with respect to the previous frame. In step S3, it inputs that portion to the image analysis unit 20, where the convolutional neural network 24 extracts the feature quantity of the partial image. The recurrent neural network 26 receives that feature quantity and outputs an image description.

Next, in step S4, the fire detector 10 inputs the image description generated by the image analysis unit 20 to the fire determination unit 22 and compares it with the fire determination words registered in the dictionary. If the words match or are similar, a fire is determined in step S5. In step S6, a fire determination signal is output to the fire receiver, which issues a pre-fire warning or a fire alarm.
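The flow of steps S1 to S6 can be expressed as a single monitoring pass. In this sketch, the four stages are injected as callables. These are hypothetical stand-ins for the image extraction unit 16, the image analysis unit 20, the fire determination unit 22 and the fire receiver. Step S1 corresponds to the caller supplying the current frame.

```python
def monitor_step(prev_frame, frame, extract, describe, judge, alarm):
    """One pass of the S1-S6 flow with each stage supplied as a callable.

    extract(prev, curr) -> partial image or None     (S2)
    describe(partial)   -> image description string  (S3)
    judge(description)  -> True when a fire is found (S4, S5)
    alarm(description)  -> side effect, e.g. signal the fire receiver (S6)
    Returns the description when a fire was judged, otherwise None.
    """
    partial = extract(prev_frame, frame)   # S2: changed image portion
    if partial is None:
        return None                        # no change: keep monitoring
    description = describe(partial)        # S3: CNN features -> LSTM caption
    if judge(description):                 # S4-S5: dictionary comparison
        alarm(description)                 # S6: fire determination signal
        return description
    return None
```

Injecting the stages keeps the control flow independent of the particular networks. The same loop could therefore serve the arson and theft variants described below by swapping the judge function.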

[Other embodiments of the fire detector]
In the fire detector 10 shown in FIG. 1, the image extraction unit 16 extracts the changed portion of the monitoring image and inputs it to the image analysis unit 20 to generate an image description. In another embodiment, the fire detector 10 may omit the image extraction unit 16.

In a fire detector 10 without the image extraction unit 16, the image of the monitoring area captured by the monitoring camera 12 is input to the image analysis unit 20 as it is. The input image may contain an event change, such as smoke from a fire, in only part of the frame, as in the monitoring image 48 shown in FIG. 3. Even so, the convolutional neural network 24 and the recurrent neural network 26 of the image analysis unit 20 shown in FIG. 2 can estimate and output an image description such as “Smoke is coming out of the roof of the shrine.” with high accuracy. This holds provided that the networks have been trained on a data set containing a sufficient number of pairs of learning images and image descriptions. Eliminating the image extraction unit 16 simplifies the configuration accordingly.

[Arson monitoring by the fire detector]
The fire detector 10 shown in FIG. 1 performs fire monitoring of the shrine being monitored, but it can also monitor the shrine for arson.

To perform arson monitoring with the fire detector 10, several monitoring cameras are installed to capture the surroundings of the shrine. The learning data set storage unit 18 of FIG. 1 stores data sets for this purpose, each consisting of a learning image paired with its image description. The fire detector 10 switches among the monitoring cameras in turn to input monitoring images. Suppose an image is input showing a suspicious person lighting a lighter in order to set fire to the shrine. The convolutional neural network 24 and the recurrent neural network 26 of the image analysis unit 20 then generate an image description such as “A man is lighting a lighter.” Arson is determined by comparing this description with arson determination words registered in the dictionary, such as “man”, “lighter” and “light”. An arson determination signal is then output to the fire receiver, which issues a fire alarm.

Arson monitoring by the fire detector 10 is scheduled to operate only during hours when no worshippers are present and the entrances are closed, including at night. As a result, a worshipper lighting a lighter in a smoking area is not judged to be arson, and arson by a suspicious person can be reliably determined and alarmed.
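Restricting arson or theft monitoring to closed hours amounts to a time-window check that may wrap past midnight. The following is a minimal sketch using Python's datetime.time. The 20:00 to 06:00 default window and the arson_alarm helper are hypothetical examples, not values or functions from the source.

```python
from datetime import time


def in_window(now, start=time(20, 0), end=time(6, 0)):
    """Return True if `now` falls in [start, end); the window may wrap past midnight."""
    if start <= end:                  # same-day window, e.g. 09:00-17:00
        return start <= now < end
    return now >= start or now < end  # overnight window, e.g. 20:00-06:00


def arson_alarm(now, arson_detected):
    """Alarm only when arson is detected while monitoring is armed (hypothetical helper)."""
    return in_window(now) and arson_detected
```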

[Theft monitoring by a theft detector]
The fire detector 10 shown in FIG. 1 performs fire monitoring of the shrine being monitored. It can also be configured as a theft detector that monitors items placed in the shrine, such as the offertory box, for theft.

To perform theft monitoring with the theft detector, images of people moving around the offertory box in the shrine are captured as learning images. Each is paired with an image description such as “A suspicious person is near the offertory box.” Many such pairs are prepared as data sets and stored in the learning data set storage unit 18 of FIG. 1, and the convolutional neural network 24 and the recurrent neural network 26 are trained on them.

Suppose a monitoring image showing a man loitering around the offertory box is input to a theft detector trained in this way. The convolutional neural network 24 and the recurrent neural network 26 of the image analysis unit 20 generate an image description such as “A person is near the offertory box.” Theft is determined by comparing this description with theft determination words registered in the dictionary, such as “person”, “offertory box” and “is”. A theft determination signal is then output to a theft receiver, which issues a theft alarm.

Theft monitoring by the theft detector is likewise scheduled to operate during hours when no worshippers are present and the entrances are closed, including at night. As a result, a worshipper approaching the offertory box is not judged to be a theft, and theft by a suspicious person can be reliably determined and alarmed.

[Modifications of the present invention]
(System operation mode)
In the above embodiment, a fire determination signal is output to the fire receiver of the fire alarm equipment to issue a fire alarm. This output method is only one example, and the system may also be operated without being connected to a fire receiver. In that case, when the determiner judges an abnormality, an alarm is output to a monitoring room that watches the monitoring camera video. The link between the monitoring cameras and the monitoring room may be a local system. Alternatively, the monitoring room may centrally monitor the video of monitoring cameras at multiple sites over the Internet or the like.

(System operation target)
In the above embodiment the shrine is the monitoring target, but the monitoring target is not limited to this. For example, the system may be applied to a store. During the daytime, abnormalities relating to theft are judged, and the determination criteria are relaxed to reduce false alarms. At night, the presence of any person is treated as an abnormality, and the determination is set more strictly so that an alarm is reliably issued.

As another example, the system may be applied to a construction site. During the daytime, prohibited acts are judged as abnormalities, for example unsafe acts such as not wearing a safety belt. At night, the site is monitored for arson.

(Learning through the network)
The convolutional neural network and the recurrent neural network may be networks trained at other sites. They may also be networks trained comprehensively on data that is not limited to the disaster-prevention and crime-prevention fields. This is possible because the elements needed for word extraction and image-description generation are not specific to disaster prevention or crime prevention.

The convolutional neural network and the recurrent neural network may also be trained on images collected from other sites via the Internet or the like. In this case, it is preferable, though not required, to provide a learning server. The server collects and learns from the learning images of each site and then applies the learning results to the abnormality monitoring system at each site.

It is also preferable for the thesaurus dictionary and the determiner to be shared with, and updated from, systems at other sites. This allows the system to handle abnormalities that have occurred at other properties.

(Infrared illumination and infrared imaging)
In the above embodiment, the monitoring camera images the monitoring area under the area's own lighting and/or natural light. Alternatively, an infrared illuminator may irradiate the monitoring area with infrared light, and a monitoring camera sensitive in the infrared range may capture infrared images. These images are input to an image analysis unit composed of two networks: a convolutional neural network trained on infrared images, and a recurrent neural network trained on infrared image features and image descriptions. The image analysis unit generates image descriptions of the infrared images, and abnormalities such as fire or theft are then determined and alarmed.

By inputting infrared images of the monitoring area to the fire detector or theft detector in this way, fire and theft can be monitored from the monitoring images regardless of the area's lighting conditions or changes in brightness between day and night.

(Other)
The present invention is not limited to the above embodiment. It includes appropriate modifications that do not impair its objects and advantages, and it is not limited by the numerical values shown in the above embodiment.

10: Fire detector
12: Monitoring camera
14: Shrine
16: Image extraction unit
18: Learning data set storage unit
20: Image analysis unit
22: Fire determination unit
24: Convolutional neural network
26: Recurrent neural network
28: Learning control unit
30: Determiner
32: Thesaurus dictionary
34: Input layer
36: Intermediate layer
38: Feature extraction unit
37: LSTM input layer
38: LSTM hidden layer
40: Word register
42: Word vector conversion unit
44: Probability conversion unit
46: Cost calculation unit
48: Monitoring image
50: Partial image
52: Image extraction process
54: Image analysis process
56: Generated image description
60: Fire determination process

Claims

An abnormality monitoring system that determines an abnormality by inputting an image of a monitoring area captured by an imaging unit, the system comprising:
an image extraction unit that extracts a changed image portion from the image of the monitoring area captured by the imaging unit;
an image analysis unit configured with a multilayer neural network that analyzes the image portion extracted by the image extraction unit and outputs elements contained in the image as words; and
an abnormality determination unit that compares the words for the elements contained in the image, output from the image analysis unit, with words indicating a predetermined abnormality stored in advance in a dictionary, and that outputs an abnormality determination signal and issues an alarm when an abnormality is determined.

An abnormality monitoring system that determines an abnormality by inputting an image of a monitoring area captured by an imaging unit, the system comprising:
an image analysis unit configured with a multilayer neural network that analyzes the image captured by the imaging unit and outputs elements contained in the image as words; and
an abnormality determination unit that compares the words for the elements contained in the image, output from the image analysis unit, with words indicating a predetermined abnormality stored in advance in a dictionary, and that outputs an abnormality determination signal and issues an alarm when an abnormality is determined.

The abnormality monitoring system according to claim 1 or 2, wherein
the image analysis unit further generates an image description based on the words for the elements contained in the image, and
the abnormality determination unit classifies the image description output from the image analysis unit into a subject, a verb, an object and/or a complement, and determines an abnormality when these match or are similar to a subject, verb, object and complement indicating a predetermined abnormality stored in the dictionary.

The abnormality monitoring system according to claim 3, wherein
the abnormality determination unit outputs the image description when it determines an abnormality.

The abnormality monitoring system according to claim 3, wherein
the multilayer neural network of the image analysis unit comprises a convolutional neural network and a recurrent neural network that uses long short-term memory (LSTM) in its hidden layer,
the convolutional neural network extracts and outputs a feature amount of the input image, and
the recurrent neural network receives the feature amount output from the convolutional neural network and generates and outputs the image description.
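To make the role of the LSTM hidden layer concrete, here is a pure-Python sketch of one standard LSTM cell step and its unrolling over a sequence. It assumes the CNN feature vector is fed as the first input and that word embeddings share its dimension; the weight layout (`W`, `U`, `b` keyed by gate name) is a hypothetical choice, and trained weights and the word-output layer are omitted.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x: Vector, h_prev: Vector, c_prev: Vector,
              W: Dict[str, Matrix], U: Dict[str, Matrix],
              b: Dict[str, Vector]) -> Tuple[Vector, Vector]:
    """One LSTM step; gates keyed 'i' (input), 'f' (forget), 'o' (output), 'g' (candidate)."""
    def pre(gate: str) -> Vector:
        return [wx + uh + bb for wx, uh, bb in
                zip(matvec(W[gate], x), matvec(U[gate], h_prev), b[gate])]
    i = [sigmoid(z) for z in pre("i")]
    f = [sigmoid(z) for z in pre("f")]
    o = [sigmoid(z) for z in pre("o")]
    g = [math.tanh(z) for z in pre("g")]
    c = [ft * cp + it * gt for ft, cp, it, gt in zip(f, c_prev, i, g)]  # new cell state
    h = [ot * math.tanh(ct) for ot, ct in zip(o, c)]                   # new hidden state
    return h, c


def run_lstm(inputs: List[Vector], hidden: int,
             W: Dict[str, Matrix], U: Dict[str, Matrix],
             b: Dict[str, Vector]) -> List[Vector]:
    """Unroll from a zero state; inputs[0] is assumed to be the CNN feature vector."""
    h, c = [0.0] * hidden, [0.0] * hidden
    outputs = []
    for x in inputs:
        h, c = lstm_step(x, h, c, W, U, b)
        outputs.append(h)
    return outputs
```

In a captioning setup, each hidden state `h` would be mapped to a probability over the vocabulary to choose the next word of the image description; that projection is outside this sketch.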

The abnormality monitoring system according to claim 5, wherein
the convolutional neural network is trained by inputting learning images for unsupervised learning, and
the recurrent neural network is trained by inputting the feature amount that a predetermined intermediate layer of the trained convolutional neural network outputs when a learning image is input to it, together with a predetermined learning image description corresponding to that learning image.
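The two-stage training can be illustrated by how the training data for the recurrent network might be assembled. This sketch assumes the trained CNN is represented as a list of frozen layer functions, that features are read at a hypothetical `layer_index`, and that the recurrent network is trained with teacher forcing (input sequence shifted one step against the target sequence); the `<start>`/`<end>` tokens and helper names are hypothetical, and the gradient updates themselves are not shown.

```python
from typing import Callable, Dict, List, Sequence, Tuple

START, END = "<start>", "<end>"  # hypothetical sentence-boundary tokens


def intermediate_feature(layers: Sequence[Callable], image, layer_index: int):
    """Run the already-trained (frozen) CNN only up to the chosen intermediate layer."""
    out = image
    for layer in layers[: layer_index + 1]:
        out = layer(out)
    return out


def build_vocab(captions: Sequence[str]) -> Dict[str, int]:
    vocab = {START: 0, END: 1}
    for caption in captions:
        for word in caption.lower().split():
            vocab.setdefault(word, len(vocab))
    return vocab


def build_caption_examples(features: Sequence[Sequence[float]],
                           captions: Sequence[str],
                           vocab: Dict[str, int]) -> List[Tuple[List[float], List[int], List[int]]]:
    """Pair each feature with (input ids, target ids) for teacher-forced RNN training."""
    examples = []
    for feat, caption in zip(features, captions):
        ids = [vocab[w] for w in caption.lower().split()]
        inputs = [vocab[START]] + ids   # what the RNN sees at each step
        targets = ids + [vocab[END]]    # the word it should predict at each step
        examples.append((list(feat), inputs, targets))
    return examples
```

Keeping the CNN fixed while its intermediate-layer outputs are collected means only the recurrent network's parameters change in the second stage, which matches the order described in the claim.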

The abnormality monitoring system according to claim 3, wherein
the multilayer neural network of the image analysis unit is trained by inputting predetermined fire learning images and predetermined image descriptions, and analyzes the image of the monitoring area input from the imaging unit to generate and output an image description, and
the abnormality determination unit compares the words of the image description output from the image analysis unit with words indicating a predetermined fire stored in advance in the dictionary and, when it determines a fire, outputs a fire determination signal and issues an alarm.

The abnormality monitoring system according to claim 3, wherein
the multilayer neural network of the image analysis unit is trained by inputting predetermined theft learning images and predetermined image descriptions, and analyzes the image of the monitoring area input from the imaging unit to generate and output an image description, and
the abnormality determination unit compares the words of the image description output from the image analysis unit with words indicating a predetermined theft stored in advance in the dictionary and, when it determines a theft, outputs a theft determination signal and issues an alarm.
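The fire and theft variants differ only in which dictionary entries are consulted and which signal is output. A minimal sketch, assuming one dictionary holds a word set per abnormality type and the description is compared by simple word overlap; the `Signal` enum and the listed words are hypothetical, since the registered dictionary words are not disclosed here.

```python
from enum import Enum
from typing import Dict, Set


class Signal(Enum):
    FIRE = "fire determination signal"
    THEFT = "theft determination signal"


# Hypothetical dictionary contents, one word set per abnormality type
DICTIONARY: Dict[Signal, Set[str]] = {
    Signal.FIRE: {"fire", "flame", "flames", "smoke", "burning"},
    Signal.THEFT: {"steals", "stealing", "takes", "carries", "thief"},
}


def determine(description: str) -> Set[Signal]:
    """Return every determination signal whose dictionary shares a word with the description."""
    words = set(description.lower().replace(".", " ").split())
    return {signal for signal, vocab in DICTIONARY.items() if words & vocab}
```

Returning a set rather than a single value allows one description to raise both signals at once, for example if smoke and a person carrying an object appear in the same scene.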

The abnormality monitoring system according to any one of claims 1 to 3, wherein
the operation of the abnormality determination unit varies depending on the time period of the day.
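The claim does not say how the operation varies with the time period. One possible reading, shown here only as an assumption, is that the set of abnormality types being checked is switched by a daily schedule, for example checking for theft only at night when no visitors are expected. The `SCHEDULE` windows and category names are hypothetical; windows may wrap past midnight.

```python
from datetime import time
from typing import List, Set, Tuple

# Hypothetical schedule: (start, end, enabled abnormality types); end is exclusive
SCHEDULE: List[Tuple[time, time, Set[str]]] = [
    (time(22, 0), time(6, 0), {"fire", "theft"}),  # night: watch for fire and theft
    (time(6, 0), time(22, 0), {"fire"}),           # daytime: theft check disabled
]


def in_window(t: time, start: time, end: time) -> bool:
    if start <= end:
        return start <= t < end
    return t >= start or t < end  # window wraps past midnight


def active_categories(t: time) -> Set[str]:
    """Abnormality types the determination unit evaluates at time t."""
    for start, end, categories in SCHEDULE:
        if in_window(t, start, end):
            return set(categories)
    return set()
```

Under this reading the dictionary comparison itself is unchanged; only the dictionary entries consulted (and hence the signals that can be raised) depend on the clock.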