JP6905850B2

JP6905850B2 - Image processing system, imaging device, learning model creation method, information processing device

Info

Publication number: JP6905850B2
Application number: JP2017073011A
Authority: JP
Inventors: 亜沙美岡田; 二見　聡; 聡二見
Original assignee: SOHGO SECURITY SERVICES CO.,LTD.
Current assignee: SOHGO SECURITY SERVICES CO.,LTD.
Priority date: 2017-03-31
Filing date: 2017-03-31
Publication date: 2021-07-21
Anticipated expiration: 2037-03-31
Also published as: JP2018173914A

Description

The present invention relates to an image processing system, an imaging device, a learning model creation method, and an information processing device.

A known monitoring method captures and records video with an imaging device such as a surveillance camera, so that the persons concerned can review the video later to check whether an abnormality occurred and what kind it was. However, they may never review the video unless they notice a need to do so, and an abnormality can only ever be confirmed after the fact. Moreover, they cannot confirm whether an abnormality occurred without watching the entire video.

For this reason, techniques have been devised that analyze video in real time to detect abnormalities (see, for example, Patent Document 1). Abnormality detection means detecting a phenomenon that deviates from the normal states that can occur in daily life. Patent Document 1 discloses a method of suppressing congestion by performing suspicious-person determination before the images captured by a camera are collected and transmitting only the relevant images.

However, Patent Document 1 determines an abnormality when someone other than a pre-registered resident is detected, so if the registered content is inappropriate, the detection result may be inaccurate.

To address this drawback, a technique has been devised in which an information processing device learns a variety of normal video by machine learning and detects abnormalities from images captured at the place where a camera is installed (see, for example, Patent Document 2). Patent Document 2 discloses a method of selecting information suitable for learning by having a learning executor or a learning agent further select from the information narrowed down by an online learning unit.

Patent Document 1: Japanese Unexamined Patent Publication No. 2013-222216
Patent Document 2: Japanese Unexamined Patent Publication No. 2016-173682

However, in the conventional technology, a newly installed camera must learn normal images at its installation location before it can detect abnormalities, so it is difficult to start abnormality determination immediately after installation. For example, when an image processing system detects a suspicious person using a machine-learned learning model, it needs a learning model matched to the scene the camera captures. A newly installed camera has no image data to learn from, so abnormalities cannot be determined until enough training images have accumulated.

In view of the above problem, an object of the present invention is to provide an image processing system that can detect abnormalities soon after installation.

The present invention is an image processing system that detects abnormalities from images captured by imaging devices, comprising: image acquisition means for acquiring the images from a plurality of first imaging devices installed at different installation locations; classification means for classifying images with similar features into clusters; learning means for constructing, for each cluster, abnormality determination means that learns the images classified into that cluster and detects abnormalities; and decision means for determining the cluster into which images of the first imaging devices similar to an image captured by a second imaging device, installed at a location different from those of the first imaging devices, have been classified, and for selecting the abnormality determination means constructed from the images classified into that cluster; wherein the abnormality determination means selected by the decision means determines the presence or absence of an abnormality from the image captured by the second imaging device.

According to the present invention, it is possible to provide an image processing system that can detect abnormalities soon after installation.

FIG. 1 is an example of a diagram illustrating image learning and abnormality detection by machine learning.
FIG. 2 is an example of a diagram illustrating the outline of the image processing.
FIG. 3 is an example of a system configuration diagram of the image processing system.
FIG. 4 is an example of a hardware configuration diagram of the camera device.
FIG. 5 is a block diagram showing a schematic hardware configuration of the information processing device.
FIG. 6 is an example of a functional block diagram illustrating the functions of the image processing system.
FIG. 7 is an example of a flowchart showing the clustering procedure.
FIG. 8 is a flowchart showing the procedure by which the cluster setting unit extracts local features and global features.
FIG. 9 is a diagram schematically showing classification by the K-means method.
FIG. 10 is an example of a diagram illustrating clustering.
FIG. 11 is an example of a diagram illustrating the degree of separation R.
FIG. 12 is an example of a diagram illustrating the learning method of the SAE.
FIG. 13 is a diagram schematically showing a learning model that determines the presence or absence of an abnormality from an image.
FIG. 14 is an example of a diagram illustrating the learning procedure of FIG. 13 in more detail.
FIG. 15 is an example of a functional block diagram of the information processing device relating to abnormality determination.
FIG. 16 is an example of a diagram schematically illustrating abnormality detection at an existing installation location.
FIG. 17 is an example of a diagram schematically illustrating abnormality detection at a new installation location.
FIG. 18 is an example of a diagram schematically illustrating the processing performed by the determination result processing unit.
FIG. 19 is an example of a diagram illustrating a configuration in which the camera device determines abnormalities.

Hereinafter, as an example of an embodiment for carrying out the present invention, an image processing system and the learning model creation method it performs will be described with reference to the drawings.

<Specific example of abnormality detection>
Before describing the present embodiment, a specific example of abnormality detection is given. An abnormality is a phenomenon that deviates from the normal states that can occur in daily life.

FIG. 1(a) is an example of a diagram illustrating how normal images 6 are learned by machine learning.
(1) A large number of images 6 containing no abnormality (normal images 6) are prepared. The images 6 are assumed to be video. "Containing no abnormality" means that the content shown is not subject to an alarm. Whether an image is normal depends on where the camera is installed and on what is shown. For example, among images 6 showing a pedestrian, an image of the pedestrian walking is normal, whereas an image 6 of the pedestrian climbing over a wall is abnormal. Among images 6 showing a car, an image 6 of a person getting in or out, or of the car stopping, parking, or starting, is normal, whereas an image 6 showing someone releasing the door lock is abnormal. For bicycles, as for cars, normal and abnormal situations yield different images.
(2) The learning unit 34 uses machine learning such as deep learning or an SVM (support vector machine) to learn the range within which the shape and movement of the objects shown in the normal images 6 are normal, and constructs a learning model. The learning model is an estimate of this normal range, and it outputs whether the shape and movement of an input object are normal.
(3) An example of a learning model follows. In images 6 showing a pedestrian, the background is stationary while the pedestrian moves, so the learning unit 34 automatically generates quantities such as the pedestrian's posture and moving speed, and then estimates their normal range. The learning unit 34 needs no teacher signal such as "pedestrian", and does not need to recognize that the moving object is a pedestrian (a person). It merely quantifies what the normal behavior pattern (here, moving speed) of a certain shape (here, a pedestrian) looks like.

FIG. 1(b) is an example of a diagram illustrating the abnormality determination performed by the identification device 5.
(4) An image 6 of a person climbing over a wall is captured as an abnormal image 6. People rarely climb over walls, so such images are not among the normal images 6.
(5) The identification device 5 determines whether there is an abnormality by comparing the image against the learning model created in (3).
(6) For example, the degree of agreement is determined to be 20% for the person's posture and 60% for the moving speed. When the degree of agreement with the normal images 6 is low, the identification device 5 determines that an abnormality is likely.
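
As a rough illustration of steps (1) to (6), the following Python sketch estimates a normal range from feature values taken only from normal footage and then scores how well a new observation agrees with it. The feature names (`posture_deg`, `speed_mps`), the Gaussian-shaped agreement score, and the 0.5 threshold are all assumptions made for illustration; the embodiment names deep learning and SVMs and does not specify this formula.

```python
import math
from statistics import mean, pstdev


def fit_normal_range(samples):
    """Estimate a per-feature (mean, std) pair from normal samples only."""
    model = {}
    for feature in samples[0]:
        values = [s[feature] for s in samples]
        # Guard against a zero spread so the score below never divides by zero.
        model[feature] = (mean(values), pstdev(values) or 1e-9)
    return model


def agreement(model, observation):
    """Return a 0..1 agreement score per feature (1.0 = at the normal mean)."""
    scores = {}
    for feature, (mu, sigma) in model.items():
        z = (observation[feature] - mu) / sigma
        scores[feature] = math.exp(-0.5 * z * z)
    return scores


def is_abnormal(scores, threshold=0.5):
    """Flag the observation when any feature agrees poorly with the normal range."""
    return min(scores.values()) < threshold


# Hypothetical features extracted from normal pedestrian footage.
normal_samples = [
    {"posture_deg": 88.0, "speed_mps": 1.2},
    {"posture_deg": 92.0, "speed_mps": 1.4},
    {"posture_deg": 90.0, "speed_mps": 1.3},
]
model = fit_normal_range(normal_samples)

walking = agreement(model, {"posture_deg": 90.0, "speed_mps": 1.3})
climbing = agreement(model, {"posture_deg": 40.0, "speed_mps": 0.3})
```

No label such as "pedestrian" is supplied anywhere: the model only records what the observed quantities usually look like, which mirrors the unsupervised character of step (3).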

In this way, by having the learning unit 34 learn a large number of normal images 6, an identification device can be constructed that judges an image 6 to be abnormal just as a human would. The image processing system of the present embodiment can therefore detect an abnormality in real time, before an intruder enters the facility, and respond appropriately, for example by dispatching a security guard.

On the other hand, the presence or absence of an abnormality cannot be determined until a large number of normal images 6 have been learned. As noted above, whether an image 6 is normal depends on where the camera is installed, what is shown, and so on. Simply reusing the normal images 6 that another camera learned from to train a newly installed camera therefore does not necessarily produce an appropriate learning model.

<Outline of image processing of the present embodiment>
Therefore, the image processing system of the present embodiment clusters the images 6 collected from many existing installation locations by scene similarity and learns each cluster. It then uses, as the learning model for a new installation location, the learning model whose scenes resemble the images 6 from that new location. Because a learning model from existing installation locations can be used, monitoring can begin immediately after a camera is installed at the new location.

FIG. 2 is an example of a diagram illustrating the outline of the image processing of the present embodiment. The description below is divided into a preparation phase before the camera device is installed and an abnormality determination phase based on the images 6 captured by the new camera device.

- Preparation phase
(1) Camera devices 10 are already installed at a plurality of existing installation locations 8a, 8b, 8c, 8d, and so on. When individual existing installation locations 8 need to be distinguished, a letter is appended to the reference numeral 8 (the number of existing installation locations is not limited). The images 6 captured by these camera devices 10 (an example of the first imaging devices) are accumulated in the image storage unit 32, which holds the images 6 captured by the camera devices 10 at all existing installation locations 8.
(2) The cluster setting unit 33 clusters the accumulated images 6 by similar scene. Through clustering, for example, images whose camera installation location and subject matter are similar are classified into the same cluster.
(3) Machine learning units 1 to n
The machine learning units 1 to n perform machine learning on the clustered normal images 6. FIG. 2 shows as many machine learning units 1 to n as there are clusters (n), but a single machine learning unit is sufficient at minimum. The machine learning units 1 to n each construct a learning model, and each learning model is used by an abnormality determination unit 36.
(4) Next, the cluster determination unit 35 determines which of the clusters formed from the images 6 of the existing installation locations 8 the images 6 captured by the camera device 10 at the new installation location 7 (an example of the second imaging device) fall into, and decides to apply the learning model of that cluster. In this way, an appropriate learning model can be selected for a newly installed camera device 10 immediately after installation.

- Abnormality determination phase
(5) The abnormality determination unit 36 corresponding to a camera device 10 at an existing installation location 8 uses a learning model to determine whether an image 6 is normal or abnormal. The learning model suited to, for example, existing installation location 8a is determined automatically by the clustering result of the images 6 from location 8a. In FIG. 2, the learning model of machine learning unit 1 is provided to both existing installation locations 8a and 8b, which indicates that the images 6 from 8a and 8b were clustered into the same cluster.
(6) The abnormality determination unit 36 corresponding to the camera device 10 at the new installation location 7 likewise uses the learning model decided in (4) to determine whether an image 6 is normal or abnormal.
(7) The determination result processing unit 37 acquires the determination result from each abnormality determination unit 36 and, for example, issues an instruction to dispatch a security guard or transmits the image 6 to the customer.

In this way, because the images 6 of a newly installed camera device 10 are assigned to the cluster of similar images from existing installation locations 8, the learning model learned from the images 6 of the camera devices 10 at those locations can be used, and the presence or absence of an abnormality can be determined immediately after the new camera device 10 is installed. For the existing installation locations 8 as well, it may be possible to build a more appropriate learning model than one learned from the images of a single installation location.
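
The preparation phase can be sketched in Python as follows: scenes from existing locations are clustered, one model is kept per cluster, and a new location simply borrows the model of its nearest cluster. The two-element scene descriptors, the seeding of centroids from the first k vectors, and the strings standing in for trained models are assumptions for illustration only; the embodiment's actual feature extraction is not reproduced here.

```python
import math


def nearest(centroids, vec):
    """Index of the centroid closest to vec (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(centroids[i], vec))


def kmeans(vectors, k, iterations=20):
    """Minimal k-means; the first k vectors seed the centroids (an assumption)."""
    centroids = [list(v) for v in vectors[:k]]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for v in vectors:
            groups[nearest(centroids, v)].append(v)
        for i, group in enumerate(groups):
            if group:  # keep the previous centroid if a cluster is empty
                centroids[i] = [sum(col) / len(group) for col in zip(*group)]
    return centroids


# Hypothetical scene descriptors for existing installation locations 8a to 8d.
existing = {
    "8a": (0.80, 0.10),
    "8b": (0.75, 0.15),
    "8c": (0.20, 0.90),
    "8d": (0.25, 0.85),
}

site_ids = list(existing)
centroids = kmeans([existing[s] for s in site_ids], k=2)
site_cluster = {s: nearest(centroids, existing[s]) for s in site_ids}

# One learning model per cluster; strings stand in for trained models.
models = {c: f"model_for_cluster_{c}" for c in range(len(centroids))}

# Descriptor computed from the first frames at new installation location 7.
new_site_vector = (0.78, 0.12)
chosen_model = models[nearest(centroids, new_site_vector)]
```

Here locations 8a and 8b end up sharing a cluster, so the new location, whose descriptor resembles theirs, is given the same model without any training of its own.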

<Terminology>
A cluster means a group; in the present embodiment it is a collection of images with similar scenes. A cluster may also be called a class, a group, a category, and so on.

Learning means extracting useful regularities, rules, knowledge representations, judgment criteria, and the like from data so that a computer can realize a function similar to the human ability to learn.

<System configuration example>
FIG. 3 shows an example of a system configuration diagram of the image processing system 100. The image processing system 100 mainly includes camera devices 10 and an information processing device 30 connected via a network N. One or more camera devices 10 are installed at each existing installation location 8. "Existing" means an installation location where enough images 6 for learning have already been captured. One or more camera devices 10 are also installed at the new installation location 7, which is a place that previously had no camera device 10 and where one has been newly installed.

The camera device 10 is an imaging device that captures several or more images 6 per second, and can thus capture what is commonly called video. Existing and new installation locations are mainly places that may be monitored for intrusion by third parties. Examples include, but are not limited to, the surroundings of a facility or a private residence, parking lots, forests, coastal areas, and restricted areas. Installation is not limited to outdoors; camera devices may also be installed indoors in facilities, offices, private residences, hotels, stores, laboratories, warehouses, factories, and the like.

The camera device 10 preferably has not only an imaging function but also an image processing function for detecting moving objects, although this is not essential in the present embodiment. The camera device 10 captures several or more images 6 per second and transmits them to the monitoring center.

The network N is built from the telephone line or LAN laid at the installation location of the camera device 10, the provider network of the provider that connects the LAN to the Internet, lines provided by a carrier, and the like. When the network N includes a plurality of LANs, it is called a WAN and may include the Internet. The network N may be wired, wireless, or a combination of both. When the camera device 10 connects to a wireless public network such as 3G, 4G, 5G, or LTE (Long Term Evolution), it can connect to the provider network without going through a wired telephone line or LAN.

The information processing device 30 is installed in the monitoring center 9. It constructs learning models and performs abnormality determination. A determination that an abnormality has occurred is transmitted, together with the image 6, to the customer's mobile terminal or to a monitoring operator's terminal. The customer or monitoring operator checks the image 6 judged to be abnormal and, if necessary, dispatches a security guard to the installation location or reports to the police.

The information processing device 30 is a device commonly called a general-purpose computer, a PC (Personal Computer), a server, or the like. For convenience of explanation FIG. 3 shows a single information processing device 30, but there need not be only one; the functions described in the present embodiment may be shared among a plurality of information processing devices 30.

<Hardware configuration example>
<<Hardware configuration example of camera device>>
FIG. 4 is an example of a hardware configuration diagram of the camera device 10. The camera device 10 includes an imaging unit 101, an image processing IC 103, a ROM 105, a CPU 106, a RAM 107, and a communication device 108.

The imaging unit 101 is a camera having a lens, an aperture, a shutter (mechanical or electronic), a solid-state image sensor such as a CMOS or CCD sensor, a memory, and the like. The image 6 may be monochrome or color. Under the control of the CPU 106, the imaging unit 101 periodically captures a predetermined range and sends the image 6 to the image processing IC 103. The image processing IC 103 is an integrated circuit that applies image processing to the image 6 to detect moving objects.

The CPU 106 executes a program stored in the ROM 105, using the RAM 107 as working memory, and controls the camera device 10 as a whole. That is, it controls imaging by the imaging unit 101, and it also controls the communication device 108 to transmit images 6 to the monitoring center 9.

The communication device 108 is what is called a network interface or an Ethernet (registered trademark) card, and provides the function of connecting to a network. It may connect to a wireless LAN access point or to a base station of a mobile phone network.

<<Hardware configuration example of information processing device 30>>
FIG. 5 is a block diagram showing a schematic hardware configuration of the information processing device 30. The information processing device 30 can generally be implemented as a personal computer, a workstation, or an appliance server. It includes a CPU 201 and a memory 202 that enables high-speed access to the data used by the CPU 201. The CPU 201 and the memory 202 are connected via a system bus 203 to other devices or drivers of the information processing device 30, for example a graphics driver 204 and a network driver (NIC: Network Interface Card) 205.

An LCD (display device) 206 is connected to the graphics driver 204 and is used to monitor the results of processing by the CPU 201. A touch panel may be integrated with the LCD 206, in which case the user can operate the information processing device 30 with a finger.

The network driver 205 connects the information processing device 30 to the network N at the transport layer and physical layer levels, and establishes sessions with the camera devices 10 and other devices.

An I/O bus bridge 207 is also connected to the system bus 203. Downstream of the I/O bus bridge 207, a storage device such as an HDD 209 is connected via an I/O bus 208 such as PCI, using IDE, ATA, ATAPI, Serial ATA, SCSI, USB, or the like. An SSD (Solid State Drive) may be provided instead of, or together with, the HDD 209.

The HDD 209 stores a program 209p that controls the information processing device 30 as a whole. By executing the program 209p, the information processing device 30 accepts operations from the monitoring operator, performs learning, and so on. The program 209p may be delivered from a program distribution server, or distributed stored on a portable storage medium such as a USB memory or an optical storage medium.

An input device 210 such as a keyboard or a mouse (a so-called pointing device) is connected to the I/O bus 208 via a bus such as USB, and accepts input and commands from the operator.

The information processing device 30 may support cloud computing. Cloud computing is a form of use in which resources on a network are used without awareness of specific hardware resources. In this case, the hardware configuration shown in FIG. 5 need not be housed in a single enclosure or provided as a single unit; it simply indicates the hardware elements that the information processing device 30 preferably has.

<Functional configuration example of image processing system 100>
FIG. 6 is an example of a functional block diagram illustrating the functions of the image processing system 100. FIG. 6 mainly covers the functions related to constructing learning models. The functions of the camera devices 10 are assumed either to be common across installation locations or, if they differ, to differ in ways that do not affect the description of the present embodiment. FIG. 6 therefore shows only one camera device 10.

<<Functional configuration of camera device>>
The camera device 10 includes an image acquisition unit 11, an image processing unit 12, and a communication unit 13. Each of these is a function or means realized by the CPU 106 shown in FIG. 4 executing a program and cooperating with the hardware of the camera device 10. There is no need to distinguish sharply between functions realized in hardware and in software; some or all of these functions may be realized by a hardware circuit such as an IC.

The camera device 10 also has a storage unit 19 implemented by the ROM 105 or RAM 107 shown in FIG. 4. An image DB 191 is built in the storage unit 19. The image DB 191 need not be held directly by the camera device 10; it may be located anywhere on the network that the camera device 10 can access.

画像取得部１１は、異常の有無に関係なく定期的に画像６を取得する（撮像する）。十分に短い時間間隔で撮像することで動画を撮像できる。なお、撮像間隔は必ずしも一定でなくよく、時間帯や動体検知結果等で撮像間隔を変更してもよい。画像取得部１１は、図４のＣＰＵ１０６がプログラムを実行し撮像部１０１を制御すること等により実現される。 The image acquisition unit 11 periodically acquires (images) the image 6 regardless of the presence or absence of an abnormality. A moving image can be captured by capturing images at sufficiently short time intervals. The imaging interval does not necessarily have to be constant, and the imaging interval may be changed depending on the time zone, the moving object detection result, or the like. The image acquisition unit 11 is realized by the CPU 106 of FIG. 4 executing a program to control the image pickup unit 101 and the like.

画像処理部１２は、クラスタリングに有効な画像処理を行う。例えば、画像６の輝度の平均や撮像時刻から撮像時刻が日中か夜間を判定し各画像６にラベルとして添付する。あるいは、ノイズ除去、トリミング、高解像度化などの処理を行ってもよい。画像処理部１２は、撮像された画像６を画像ＤＢ１９１に記憶する。画像ＤＢ１９１では古い画像６から上書きされ常に一定の新しい画像６が保持されている。画像処理部１２は、図４のＣＰＵ１０６がプログラムを実行すること等により実現される。 The image processing unit 12 performs image processing that is useful for clustering. For example, it determines from the average luminance of an image 6 and its capture time whether the image was captured in the daytime or at night, and attaches the result to the image 6 as a label. Alternatively, it may perform processing such as noise removal, trimming, and resolution enhancement. The image processing unit 12 stores the captured image 6 in the image DB 191. In the image DB 191, images are overwritten starting from the oldest, so that a fixed amount of the most recent images 6 is always retained. The image processing unit 12 is realized by the CPU 106 of FIG. 4 executing a program or the like.
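The day/night labeling and the overwrite-oldest behavior of the image DB 191 can be sketched as follows. This is a minimal illustration, not the patented implementation: the threshold `LUMA_THRESHOLD`, the hour range `DAY_HOURS`, the rule that both conditions must agree, and the class name `ImageDB` are all assumptions, since the text does not state how luminance and capture time are combined.

```python
from collections import deque
from datetime import datetime

# Hypothetical values; the source gives no concrete thresholds.
LUMA_THRESHOLD = 80        # mean luminance (0-255) at or above which a frame counts as bright
DAY_HOURS = range(6, 18)   # capture hours treated as daytime

def mean_luminance(gray):
    """Average pixel value of a grayscale image given as a list of rows."""
    total = sum(sum(row) for row in gray)
    count = sum(len(row) for row in gray)
    return total / count

def day_night_label(gray, captured_at):
    """Label a frame 'day' only when brightness and capture time both say so (assumed rule)."""
    bright = mean_luminance(gray) >= LUMA_THRESHOLD
    daytime = captured_at.hour in DAY_HOURS
    return "day" if bright and daytime else "night"

class ImageDB:
    """Fixed-capacity store: once full, the oldest frame is dropped first."""
    def __init__(self, capacity):
        self.frames = deque(maxlen=capacity)

    def store(self, gray, captured_at):
        label = day_night_label(gray, captured_at)
        self.frames.append((captured_at, label, gray))

db = ImageDB(capacity=2)
db.store([[200, 210], [190, 220]], datetime(2017, 3, 31, 12, 0))
db.store([[10, 20], [5, 15]], datetime(2017, 3, 31, 23, 0))
db.store([[200, 200], [200, 200]], datetime(2017, 3, 31, 13, 0))  # evicts the first frame
```

A `deque` with `maxlen` is used here only because it gives the overwrite-from-oldest behavior without extra bookkeeping.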

通信部１３は、画像ＤＢ１９１に記憶された画像６を情報処理装置３０に送信する。異常検出のリアルタイム性を保証するため、記憶された画像６はすぐに情報処理装置３０に送信することが好ましい。このため、通信部１３は継続的に画像６を情報処理装置３０に送信するが、一定量の画像６が蓄積されてからまとめて送信してもよい。通信部１３は、図４のＣＰＵ１０６がプログラムを実行して通信装置１０８を制御すること等により実現される。 The communication unit 13 transmits the image 6 stored in the image DB 191 to the information processing device 30. In order to guarantee the real-time property of abnormality detection, it is preferable that the stored image 6 is immediately transmitted to the information processing apparatus 30. Therefore, the communication unit 13 continuously transmits the images 6 to the information processing device 30, but the images 6 may be collectively transmitted after a certain amount of the images 6 are accumulated. The communication unit 13 is realized by the CPU 106 of FIG. 4 executing a program to control the communication device 108 and the like.

<<情報処理装置の機能構成>> <<Functional configuration of information processing device>>
情報処理装置３０は、通信部３１、画像蓄積部３２、クラスタ設定部３３、学習部３４、及び、クラスタ判定部３５を有している。これらの各機能は図５に示したＣＰＵ２０１がプログラム２０９ｐを実行して情報処理装置３０のハードウェアと協働することで実現される機能又は手段である。ハード的又はソフト的に実現される機能を明確に区別する必要はなく、これらの機能の一部又は全てがＩＣなどのハードウェア回路により実現されてもよい。 The information processing device 30 includes a communication unit 31, an image storage unit 32, a cluster setting unit 33, a learning unit 34, and a cluster determination unit 35. Each of these functions is a function or means realized by the CPU 201 shown in FIG. 5 executing the program 209p and cooperating with the hardware of the information processing device 30. It is not necessary to clearly distinguish the functions realized by hardware or software, and some or all of these functions may be realized by a hardware circuit such as an IC.

また、情報処理装置３０は、図５に示したメモリ２０２又はＨＤＤ２０９より構築される記憶部３９を有している。記憶部３９には画像蓄積ＤＢ３９１及び学習モデルＤＢ３９２が構築されている。これらＤＢは、情報処理装置３０が直接有していなくてもよく、情報処理装置３０がアクセス可能なネットワーク上の任意の場所にあればよい。 Further, the information processing device 30 has a storage unit 39 constructed from the memory 202 or HDD 209 shown in FIG. An image storage DB 391 and a learning model DB 392 are constructed in the storage unit 39. These DBs do not have to be directly possessed by the information processing device 30, and may be provided at any location on the network accessible to the information processing device 30.

通信部３１は、カメラ装置１０から画像６を受信する。通信部３１は図５に示したＣＰＵ２０１がプログラム２０９ｐを実行してネットワーク・ドライバ２０５を制御すること等により実現される。 The communication unit 31 receives the image 6 from the camera device 10. The communication unit 31 is realized by the CPU 201 shown in FIG. 5 executing the program 209p to control the network driver 205 and the like.

画像蓄積部３２は、通信部３１が受信した画像６をカメラ装置１０の識別情報（設置場所に１台しかカメラ装置１０がない場合は設置場所の識別情報でもよい）に対応付けて画像蓄積ＤＢ３９１に記憶する。画像蓄積部３２は図５に示したＣＰＵ２０１がプログラム２０９ｐを実行すること等により実現される。 The image storage unit 32 stores the image 6 received by the communication unit 31 in the image storage DB 391 in association with the identification information of the camera device 10 (or the identification information of the installation location, if only one camera device 10 is installed there). The image storage unit 32 is realized by the CPU 201 shown in FIG. 5 executing the program 209p or the like.

クラスタ設定部３３は、画像蓄積ＤＢ３９１に蓄積された画像６から特徴量を抽出し、特徴量を用いて各カメラ装置１０から送信された画像６をシーンが類似したクラスタにクラスタリングする。シーンとは、画像６が撮像される広義の場面ということができる。例えば、駐車場に設置されたカメラ装置１０の画像は異なる駐車場の画像６でもシーンが似ている可能性が高い。クラスタリングの際、最良のクラスタ数の決定を行う。また、日中と夜間では特徴が異なる傾向が強いため、同じカメラ装置１０の画像６でも日中と夜間で別々にクラスタが分けられる可能性が高い。更に上記のラベルを使えば確実に日中と夜間で別々にクラスタにクラスタリングできる。クラスタ設定部３３は図５に示したＣＰＵ２０１がプログラム２０９ｐを実行すること等により実現される。 The cluster setting unit 33 extracts a feature amount from the image 6 stored in the image storage DB 391, and clusters the image 6 transmitted from each camera device 10 into clusters having similar scenes using the feature amount. The scene can be said to be a scene in a broad sense in which the image 6 is captured. For example, it is highly possible that the images of the camera device 10 installed in the parking lot are similar in the images 6 of different parking lots. When clustering, determine the best number of clusters. Further, since the characteristics tend to be different between daytime and nighttime, there is a high possibility that the clusters of image 6 of the same camera device 10 will be separated into daytime and nighttime. Furthermore, the above label can be used to ensure that clusters can be clustered separately during the day and at night. The cluster setting unit 33 is realized by the CPU 201 shown in FIG. 5 executing the program 209p or the like.

学習部３４は、更に機械学習部１、機械学習部２，…、機械学習部ｎ（ｎは自然数）を有している。機械学習部１〜ｎの１〜ｎはクラスタリングにより形成されるクラスタの数（ｎ個）に対応している。しかしながら、１つの機械学習部（例えば機械学習部１）が全てのクラスタの画像６を学習してもよく、機械学習部１〜ｎは１つでもよい。機械学習部１は学習により構築した学習モデルＣ１を学習モデルＤＢ３９２に記憶させ、機械学習部２は学習により構築した学習モデルＣ２を学習モデルＤＢ３９２に記憶させ、機械学習部ｎは学習により構築した学習モデルＣｎを学習モデルＤＢ３９２に記憶させる。以下では、説明の便宜上、単に「学習部３４」が学習モデルを構築すると説明する場合がある。学習部３４は図５に示したＣＰＵ２０１がプログラム２０９ｐを実行すること等により実現される。 The learning unit 34 further includes a machine learning unit 1, a machine learning unit 2, ..., And a machine learning unit n (n is a natural number). 1 to n of the machine learning units 1 to n correspond to the number of clusters (n) formed by clustering. However, one machine learning unit (for example, the machine learning unit 1) may learn the image 6 of all the clusters, and the machine learning units 1 to n may be one. The machine learning unit 1 stores the learning model C1 constructed by learning in the learning model DB 392, the machine learning unit 2 stores the learning model C2 constructed by learning in the learning model DB 392, and the machine learning unit n stores the learning constructed by learning. The model Cn is stored in the learning model DB 392. In the following, for convenience of explanation, it may be explained that the "learning unit 34" simply constructs the learning model. The learning unit 34 is realized by the CPU 201 shown in FIG. 5 executing the program 209p or the like.

クラスタ判定部３５は、クラスタ設定部３３が生成したクラスタのうち新規設置場所７のカメラ装置１０が撮像する画像６がクラスタリングされるクラスタを判定する。例えば、特徴量が最も近いクラスタを選択する。カメラ装置１０と学習モデル（クラスタ）の対応は学習モデルＤＢ３９２に登録される。カメラ装置１０が日中か夜間のラベルを付さない場合でも、日中と夜間で特徴量が異なることを利用して日中と夜間のそれぞれで対応するクラスタを判定することが好ましい。なお、画像６が日中に撮像されたのか、夜間に撮像されたのかは情報処理装置３０側でも判定可能である。クラスタ判定部３５は図５に示したＣＰＵ２０１がプログラム２０９ｐを実行すること等により実現される。 The cluster determination unit 35 determines the cluster in which the image 6 captured by the camera device 10 at the new installation location 7 is clustered among the clusters generated by the cluster setting unit 33. For example, select the cluster with the closest features. The correspondence between the camera device 10 and the learning model (cluster) is registered in the learning model DB 392. Even when the camera device 10 does not label daytime or nighttime, it is preferable to determine the corresponding clusters in daytime and nighttime by utilizing the fact that the feature amounts are different between daytime and nighttime. It should be noted that the information processing apparatus 30 can also determine whether the image 6 is captured during the daytime or at night. The cluster determination unit 35 is realized by the CPU 201 shown in FIG. 5 executing the program 209p or the like.
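As a rough sketch of "select the cluster with the closest features", the snippet below picks the cluster whose centroid is nearest to a camera's feature vector, once for a daytime image and once for a nighttime image. Representing each cluster by a centroid and using Euclidean distance are assumptions; the text only says the closest cluster is chosen. The two-dimensional vectors and the IDs `C1`, `C2` are illustrative.

```python
import math

def nearest_cluster(feature, centroids):
    """Return the ID of the cluster whose centroid is closest (Euclidean distance)."""
    return min(centroids, key=lambda cid: math.dist(feature, centroids[cid]))

# Hypothetical centroids of two existing clusters.
centroids = {"C1": [0.9, 0.1], "C2": [0.1, 0.8]}

# One representative feature vector per period for the newly installed camera.
assignment = {
    "day": nearest_cluster([0.8, 0.2], centroids),
    "night": nearest_cluster([0.2, 0.7], centroids),
}
```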

また、クラスタ判定部３５は、既存設置場所８ａのカメラ装置１０が撮像する画像６がクラスタリングされるクラスタを判定する。この場合も、クラスタ判定部３５は日中と夜間のそれぞれで対応するクラスタを判定することが好ましい。 Further, the cluster determination unit 35 determines a cluster in which the image 6 captured by the camera device 10 at the existing installation location 8a is clustered. In this case as well, it is preferable that the cluster determination unit 35 determines the corresponding clusters during the daytime and at nighttime.

表１は、学習モデルＤＢ３９２に格納される情報を模式的に示す。学習モデルＤＢ３９２は、クラスタＩＤに対応付けて、学習モデル、カメラＩＤ、及び、日中／夜間の各項目を有する。クラスタＩＤは、最終的にクラスタリングされた各クラスタを特定する情報である。ＩＤはIdentificationの略であり識別子や識別情報という意味である。ＩＤは複数の対象から、ある特定の対象を一意的に区別するために用いられる名称、符号、文字列、数値又はこれらのうち１つ以上の組み合わせをいう。他のＩＤについても同様である。カメラＩＤは各カメラ装置１０を特定する情報であり、学習モデルと対応付けられている。日中／夜間の項目は各カメラ装置１０の日中の画像６と夜間の画像６のそれぞれがどの学習モデルと最も近いかを示す。これにより、既存設置場所８及び新規設置場所７の各カメラ装置１０が撮像した画像６をどの学習モデルで異常判定すればよいか決定される。

Table 1 schematically shows the information stored in the learning model DB 392. The learning model DB 392 has the items learning model, camera ID, and daytime/nighttime, each associated with a cluster ID. The cluster ID is information that identifies each cluster obtained as the final result of clustering. ID is an abbreviation of identification and means an identifier or identification information. An ID is a name, code, character string, numerical value, or a combination of one or more of these, used to uniquely distinguish a specific object from a plurality of objects. The same applies to the other IDs. The camera ID is information that identifies each camera device 10 and is associated with a learning model. The daytime/nighttime item indicates which learning model the daytime images 6 and the nighttime images 6 of each camera device 10 are closest to. This determines which learning model should be used for abnormality determination on the images 6 captured by each camera device 10 at the existing installation locations 8 and the new installation location 7.
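The lookup that Table 1 enables can be sketched as below. Table 1 itself is not reproduced in this text, so every row, the key names, and the helper `model_for` are hypothetical; only the four items (cluster ID, learning model, camera ID, daytime/nighttime) come from the description.

```python
# Hypothetical rows standing in for Table 1.
LEARNING_MODEL_DB = [
    {"cluster_id": "001", "model": "C1", "camera_id": "cam-A", "period": "day"},
    {"cluster_id": "002", "model": "C2", "camera_id": "cam-A", "period": "night"},
    {"cluster_id": "001", "model": "C1", "camera_id": "cam-B", "period": "day"},
]

def model_for(camera_id, period):
    """Look up which learning model should judge frames from this camera and period."""
    for row in LEARNING_MODEL_DB:
        if row["camera_id"] == camera_id and row["period"] == period:
            return row["model"]
    return None  # this camera/period pair has not been registered
```

Note that one model can serve several cameras (here `C1` serves both `cam-A` and `cam-B` in the daytime), which is the point of sharing models across similar scenes.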

＜クラスタリングの必要性＞ <Necessity of clustering>
クラスタリングの必要性を説明する。まず前提として、カメラ装置１０が設置される設置場所の種類は限定的であり、その場所で考えられる正常行動（歩行者等の行動パターン）も限定されるとしてよい。表２は各設置場所の正常行動の一例を示す。 The need for clustering is explained here. As a premise, the types of installation locations where the camera device 10 is installed are limited, and the normal behaviors (behavior patterns of pedestrians and the like) conceivable at each location can also be regarded as limited. Table 2 shows an example of normal behavior at each installation location.

表２に示すように、設置場所が駐車場、家の玄関、建物外周のそれぞれで正常行動は幾つかに限定される。このため、画像６をクラスタリングすることで特徴が類似した画像を集めることができ、類似した画像から学習モデルを構築することで、異常判定の精度を向上できると期待できる。

As shown in Table 2, whether the installation location is a parking lot, the entrance of a house, or the perimeter of a building, the normal behaviors at each location are limited to a few. Therefore, clustering the images 6 gathers images with similar features, and constructing a learning model from such similar images can be expected to improve the accuracy of abnormality determination.

＜クラスタリングの処理＞ <Clustering processing>
次に、図７を用いてクラスタリングについて詳細に説明する。図７は、クラスタリングの処理手順を示す流れ図の一例である。以下、図７の各ステップを順番に説明する。 Next, clustering will be described in detail with reference to FIG. 7. FIG. 7 is an example of a flowchart showing the clustering procedure. Each step of FIG. 7 is described in order below.

（Ｓ１） (S1)
クラスタ設定部３３は各画像６からノイズを除去した背景画像を取得する。ノイズの除去には、例えば公知の平滑化フィルタを使用すればよい。 The cluster setting unit 33 acquires, from each image 6, a background image with noise removed. A known smoothing filter, for example, may be used to remove the noise.

（Ｓ２） (S2)
クラスタ設定部３３は背景画像から局所的特徴量及び大域的特徴量を抽出する。局所的特徴量及び大域的特徴量について表３を用いて説明する。 The cluster setting unit 33 extracts local features and global features from the background image. The local features and the global features will be described with reference to Table 3.

表３は、局所的特徴量と大域的特徴量の利用目的と判定できないことがそれぞれまとめられたものである。局所的特徴量では画像６の一部（局所）の特徴が得られるが、シーンを分類することが得意でない。例えば、車が何台かあっても道路なのか駐車場なのかの特徴が得られない。大域的特徴量はベランダなのか駐車場なのかという画像全体の特徴が得られるが、駐車場と住宅のガレージを分類できない可能性がある。

Table 3 summarizes, for local features and global features respectively, the purpose of use and what cannot be determined. Local features capture the characteristics of a part (a local region) of the image 6, but they are not good at classifying scenes. For example, even if several cars are present, local features do not tell whether the scene is a road or a parking lot. Global features capture the characteristics of the image as a whole, such as whether it shows a veranda or a parking lot, but they may not be able to distinguish a parking lot from a residential garage.

このように、局所的特徴量と大域的特徴量は補完する関係にあるため、シーンを適切に分類するには局所的特徴量と大域的特徴量の両方を抽出することが有効である。 In this way, since the local features and the global features are in a complementary relationship, it is effective to extract both the local features and the global features in order to properly classify the scene.

図８は、クラスタ設定部３３が局所的特徴量と大域的特徴量を抽出する手順を示す流れ図を示す。 FIG. 8 shows a flow chart showing a procedure in which the cluster setting unit 33 extracts the local feature amount and the global feature amount.

S2-1：まず、クラスタ設定部３３は局所的特徴量を検出する。本実施形態では局所的特徴量の一例としてＳＩＦＴ（Scale-Invariant Feature Transform）特徴量を取得する。ＳＩＦＴ特徴量は、スケールスペースを使った照明変化や回転、拡大、縮小に頑強な特徴量であり、１つの画像６から１２８次元の複数の特徴点を取得できる。クラスタ設定部３３はまずキーポイントを検出する。キーポイントは、スケールが変わっても画像６上に特徴的に現れる点である。検出には、解像度の異なるガウシアンフィルターを段階的に適用して解像度の異なる画像６を作成し、ＤｏＧ（Difference of Gaussian）画像の極値を探索する。探索で得られた局所領域の中心部分がキーポイントである。 S2-1: First, the cluster setting unit 33 detects local features. In this embodiment, SIFT (Scale-Invariant Feature Transform) features are acquired as an example of local features. SIFT features use a scale space and are robust to illumination changes, rotation, enlargement, and reduction; multiple 128-dimensional feature points can be obtained from one image 6. The cluster setting unit 33 first detects keypoints. A keypoint is a point that appears characteristically in the image 6 even when the scale changes. For detection, Gaussian filters of different resolutions are applied in stages to create versions of the image 6 at different resolutions, and the extrema of the DoG (Difference of Gaussian) images are searched for. The center of each local region found by the search is a keypoint.

S2-2：次に、クラスタ設定部３３は、キーポイントの近傍に対し濃度の勾配の強さと勾配の向きを検出する（特徴を抽出する）。各キーポイントの勾配の強さと勾配がＳＩＦＴ特徴量である。クラスタ設定部３３は分類対象の全ての画像６についてＳＩＦＴ特徴量を算出する。 S2-2: Next, the cluster setting unit 33 detects the strength and the direction of the intensity gradient in the neighborhood of each keypoint (that is, it extracts features). The gradient strength and gradient direction at each keypoint constitute the SIFT features. The cluster setting unit 33 calculates SIFT features for all the images 6 to be classified.

S2-3：次に、クラスタ設定部３３は分類対象の全ての画像６について局所的特徴量を分類する。全ての画像６とは既存設置場所８の全ての画像６である。 S2-3: Next, the cluster setting unit 33 classifies the local features for all the images 6 to be classified. All images 6 are all images 6 of the existing installation location 8.

図９（ａ）はK-means法による分類を模式的に示す。図９（ａ）は３次元で各特徴点を示すが、実際は１２８次元（ＳＩＦＴ特徴量）×画像一枚から得られた特徴点数×画像枚数、の次元を有する。図９（ａ）の１つの点が１つの特徴点を表す。K-means法では、任意に与えられたクラスタ数の各クラスタに各特徴点を分類する。まず、クラスタ数ｋは、以下の式で算出される。 FIG. 9(a) schematically shows classification by the K-means method. FIG. 9(a) shows the feature points in three dimensions, but the actual dimensionality is 128 dimensions (SIFT features) × number of feature points obtained from one image × number of images. One point in FIG. 9(a) represents one feature point. The K-means method classifies each feature point into one of an arbitrarily given number of clusters. First, the number of clusters k is calculated by the following formula.
クラスタ数ｋ＝√｛１２８次元（ＳＩＦＴ特徴量）×画像一枚から得られた特徴点数×画像枚数｝ Number of clusters k = √{128 dimensions (SIFT features) × number of feature points obtained from one image × number of images}
次に、クラスタ設定部３３は、このクラスタ数のクラスタに全ての画像６を分類する。クラスタの初期の重心をランダムに与え、各特徴点を最も近い重心のクラスタに振り分け、重心を再計算し、再度、各特徴点を最も近い重心のクラスタに振り分る処理をクラスタ間の移動が無くなるまで繰り返す。図９（ｂ）は３つのクラスタ７１に分類された各特徴点を模式的に示す。 Next, the cluster setting unit 33 classifies the feature points of all the images 6 into this number of clusters. Initial cluster centroids are given at random, each feature point is assigned to the cluster with the nearest centroid, the centroids are recalculated, and each feature point is again assigned to the cluster with the nearest centroid; this is repeated until no feature point moves between clusters. FIG. 9(b) schematically shows the feature points classified into three clusters 71.
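The cluster-count formula and the K-means loop described above can be sketched in plain Python. This is a minimal illustration under stated assumptions: rounding k down to an integer, the function names, the fixed random seed, and the handling of empty clusters are not specified in the text.

```python
import math
import random

def cluster_count(dims, points_per_image, num_images):
    """k = sqrt(dims x feature points per image x number of images), rounded down (assumed)."""
    return math.isqrt(dims * points_per_image * num_images)

def kmeans(points, k, seed=0, max_iter=100):
    """Assign each point to the nearest centroid, recompute centroids, repeat until stable."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)          # random initial centroids
    labels = [None] * len(points)
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:               # no point moved between clusters
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:                        # keep the old centroid if a cluster empties
                centroids[c] = [sum(v) / len(members) for v in zip(*members)]
    return centroids, labels

# Two well-separated groups of 2-D points standing in for 128-D SIFT descriptors.
points = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
centroids, labels = kmeans(points, k=2)
```

The resulting centroids are what the following steps treat as visual words.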

S2-4：次に、クラスタ設定部３３は、visual wordsを決定することで量子化する。すなわち、各クラスタの中心（重心）をvisual words７２に決定する。これにより、クラスタ数であるｋ個のvisual wordsが特定される。図９（ｃ）には一例としてｋ＝３とした場合の３つのvisual words７２を模式的に示す。visual words７２の数はクラスタ数と同じであり、各visual words７２の次元数は１２８次元（ＳＩＦＴ特徴量）×画像一枚から得られた特徴点数×画像枚数である。 S2-4: Next, the cluster setting unit 33 performs quantization by determining visual words. That is, the center (centroid) of each cluster is taken as a visual word 72. As a result, k visual words, where k is the number of clusters, are specified. FIG. 9(c) schematically shows three visual words 72 for the case of k = 3 as an example. The number of visual words 72 is the same as the number of clusters, and the number of dimensions of each visual word 72 is 128 dimensions (SIFT features) × number of feature points obtained from one image × number of images.

S2-5：次に、クラスタ設定部３３は、visual words７２を次元とする特徴ベクトルにクラスタリング対象の全ての画像６を変換する。１つの画像６の各局所特徴量について一番近いvisual words７２を検索し、そのvisual words７２に投票する。これにより、１つの画像６のヒストグラムがえられる。図９（ｄ）にヒストグラムを模式的に示す。図９（ｄ）の横軸はvisual wordsであり、縦軸はある画像６において各visual wordsに最も近いと判定された局所的特徴（SIFT特徴量）の数である。このヒストグラムが各画像６の特徴ベクトルになる。 S2-5: Next, the cluster setting unit 33 converts all the images 6 to be clustered into feature vectors whose dimensions are the visual words 72. For each local feature of one image 6, the nearest visual word 72 is searched for and a vote is cast for that visual word 72. This yields a histogram for the image 6. FIG. 9(d) schematically shows such a histogram. The horizontal axis of FIG. 9(d) represents the visual words, and the vertical axis represents the number of local features (SIFT features) in a given image 6 that were determined to be closest to each visual word. This histogram becomes the feature vector of each image 6.
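The voting step that turns one image's local features into a histogram over visual words might look like the following. It is a sketch only: Euclidean distance as the notion of "nearest", the function name `bovw_histogram`, and the two-dimensional toy vectors (standing in for 128-dimensional SIFT descriptors) are assumptions.

```python
import math

def bovw_histogram(descriptors, visual_words):
    """Count, for each visual word, how many local descriptors are nearest to it."""
    hist = [0] * len(visual_words)
    for d in descriptors:
        nearest = min(range(len(visual_words)),
                      key=lambda w: math.dist(d, visual_words[w]))
        hist[nearest] += 1                      # one vote per local descriptor
    return hist

visual_words = [[0, 0], [10, 10], [0, 10]]       # k = 3 cluster centroids
descriptors = [[1, 0], [0, 1], [9, 9], [1, 9]]   # local features of one image
hist = bovw_histogram(descriptors, visual_words)
```

The length of the histogram equals the number of visual words regardless of how many keypoints an image has, which is what makes images comparable.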

S2-6：次に、クラスタ設定部３３は大域的特徴量を検出する。本実施形態では大域的特徴量の一例としてＧＩＳＴ特徴量を取得する。ＧＩＳＴ特徴量は、画像６を小領域に区切り、それらの小領域に対し様々な方向及び周波数のガボールフィルターを適用してシーン情報を記述する特徴量であり、９６０次元の特徴を有する。ＧＩＳＴ特徴量は１枚の画像６から１つ求められるので、ヒストグラムの作成は不要である。 S2-6: Next, the cluster setting unit 33 detects the global feature amount. In this embodiment, the GIST feature amount is acquired as an example of the global feature amount. The GIST feature amount is a feature amount that divides the image 6 into small areas and applies Gabor filters of various directions and frequencies to the small areas to describe scene information, and has 960-dimensional features. Since one GIST feature amount can be obtained from one image 6, it is not necessary to create a histogram.

S2-7：次に、クラスタ設定部３３は、局所的特徴量と大域的特徴量を合体して最終的な特徴量を生成する。したがって、次元数はvisual wordsの数＋９６０次元となる。 S2-7: Next, the cluster setting unit 33 combines the local feature amount and the global feature amount to generate the final feature amount. Therefore, the number of dimensions is the number of visual words + 960 dimensions.

S2-8：クラスタ設定部３３は特徴量を出力する。 S2-8: The cluster setting unit 33 outputs the feature amount.

（Ｓ３、Ｓ４） (S3, S4)
次に、クラスタ設定部３３は、特徴量をｎ個のクラスタにクラスタリングする。ここで、ｎ個は固定された値ではなく、最良のクラスタ数が動的に決定される。 Next, the cluster setting unit 33 clusters the features into n clusters. Here, n is not a fixed value; the best number of clusters is determined dynamically.

図１０は、クラスタリングを説明する図の一例である。図１０（ａ）はステップＳ２で特徴量に変換された各画像６の集合を模式的に示している。計算を容易にするため、画像６は１つのカメラ装置１０に対し１枚でよい。 FIG. 10 is an example of a diagram illustrating clustering. FIG. 10A schematically shows a set of each image 6 converted into a feature amount in step S2. In order to facilitate the calculation, one image 6 may be used for one camera device 10.

クラスタリングには局所的特徴量の分類に使用したK-means法を用いる他、ガウシアンミクスチャーモデルを利用したＥＭアルゴリズム、又は、ＬＤＡ（Latent Dirichlet Allocation）等の方法を使用してもよい。なお、K-means法とＥＭアルゴリズムではクラスタ数を決定できない（適宜、与える必要がある）。これに対し、ＬＤＡではクラスタリングとクラスタ数を同時に決定することができる。本実施形態では、K-means法、又は、ＥＭアルゴリズムによりクラスタリングされた例として、後述するStacked Auto Encoderを用いて最適なクラスタ数を決定する。 For clustering, the K-means method used for classifying local features may be used, or an EM algorithm using a Gaussian mixture model or a method such as LDA (Latent Dirichlet Allocation) may be used. The number of clusters cannot be determined by the K-means method and the EM algorithm (it is necessary to give them as appropriate). On the other hand, in LDA, clustering and the number of clusters can be determined at the same time. In this embodiment, the optimum number of clusters is determined by using the Stacked Auto Encoder described later as an example of clustering by the K-means method or the EM algorithm.

最良のクラスタ数を決定するため、クラスタ設定部３３はクラスタ数ｉを変えてクラスタリングを行う。図１０ではｉ＝５，６，１００のクラスタが模式的に示されている。クラスタ設定部３３は、クラスタ数をｉ＝５〜１００としてクラスタリングを行う。図１０（ｂ）はクラスタ数をｉ＝５、６及び１００のクラスタリングの結果を模式的に示す。 In order to determine the best number of clusters, the cluster setting unit 33 performs clustering by changing the number of clusters i. In FIG. 10, clusters of i = 5, 6, 100 are schematically shown. The cluster setting unit 33 performs clustering with the number of clusters being i = 5 to 100. FIG. 10B schematically shows the results of clustering with the number of clusters i = 5, 6 and 100.

クラスタ数の設定の際は、クラスタ数ｉを１つずつ大きくしてもよいし、クラスタ数ｉを例えば１０ずつ大きくし分離度がよくなるクラスタ数の周辺でより小さくクラスタ数ｉを変えて評価してもよい。 When setting the number of clusters, the number of clusters i may be increased one at a time, or it may be increased in larger steps, for example by 10, and then varied in smaller steps around the number of clusters at which the degree of separation improves.
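The coarse-then-fine variation of the cluster count can be sketched as a small search routine. The callable `evaluate` is hypothetical: it stands for running clustering with i clusters and returning the degree of separation, with smaller values treated as better, as the description of the degree of separation R states. The range 5 to 100 and the step of 10 are the example values from the text; the width of the refinement window is an assumption.

```python
def coarse_to_fine(evaluate, low=5, high=100, coarse_step=10):
    """Evaluate on a coarse grid, then every count near the best coarse candidate."""
    coarse = list(range(low, high + 1, coarse_step))
    if coarse[-1] != high:
        coarse.append(high)                     # make sure the upper bound is tried
    scores = {i: evaluate(i) for i in coarse}
    best = min(scores, key=scores.get)
    # Refine with step 1 in the neighbourhood of the best coarse candidate.
    first = max(low, best - coarse_step + 1)
    last = min(high, best + coarse_step - 1)
    for i in range(first, last + 1):
        if i not in scores:
            scores[i] = evaluate(i)
    return min(scores, key=scores.get)

# Toy stand-in for the degree of separation, minimal at i = 37.
best_i = coarse_to_fine(lambda i: (i - 37) ** 2)
```

Compared with trying every i from 5 to 100, this evaluates far fewer candidates, at the risk of missing a narrow optimum far from the best coarse point.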

クラスタ数を評価するためクラスタ判定部３５はクラスタリングを行うごとに分離度Ｒを算出する。図１１は分離度Ｒを説明する図の一例である。本実施例では分離度Ｒの算出のため一例としてＳＡＥ（Stacked Auto Encoder）を利用する。図１１ではクラスタ数ｉ＝４の場合を例にして説明する。クラスタリングでそれぞれのクラスタに２５００枚の画像６が分類されたとする。クラスタの画像６の一部（例えば２０００枚）を学習に使用し、残り（例えば５００枚）を識別に使用する。 In order to evaluate the number of clusters, the cluster determination unit 35 calculates the degree of separation R each time clustering is performed. FIG. 11 is an example of a diagram for explaining the degree of separation R. In this embodiment, SAE (Stacked Auto Encoder) is used as an example for calculating the degree of separation R. In FIG. 11, a case where the number of clusters i = 4 will be described as an example. It is assumed that 2500 images 6 are classified into each cluster by clustering. A part of the cluster image 6 (for example, 2000) is used for learning, and the rest (for example, 500) is used for identification.

図１２はＳＡＥの学習方法を説明する図の一例である。まず、オートエンコーダ（Auto Encoder）とは、特徴を圧縮しても元のパターンへ復元が可能な特徴表現力があるネットワークをいう。したがって、オートエンコーダは特徴を圧縮して学習することができ、学習がうまくいくと入力層のデータが圧縮されても復元できる。ＳＡＥは、オートエンコーダが組み合わされた（積み重ねられた）ネットワークである。図１２（ａ）はＳＡＥの一例であり、図１２（ｂ）（ｃ）（ｄ）はオートエンコーダの一例である。通常、中間層が２層以上あると、不適切な極小解に収束してしまい誤差逆伝播がうまくいかない（これを勾配消失問題という）。ＳＡＥではこれを２層連続して計算するのではなくて、１層ずつ積み上げていくことで勾配消失問題に陥りにくくする。ＳＡＥの学習がうまくいくとＳＡＥは画像６を復元できる。ただし、ＳＡＥは異常画像が入力された場合は復元できないので、画像６は正常な画像のみであるとする。 FIG. 12 is an example of a diagram illustrating how the SAE is trained. An autoencoder is a network with enough representational power to restore the original pattern even after the features are compressed. An autoencoder can therefore learn while compressing features, and when training succeeds, the data of the input layer can be restored even after compression. An SAE is a network in which autoencoders are combined (stacked). FIG. 12(a) is an example of an SAE, and FIGS. 12(b), 12(c), and 12(d) are examples of autoencoders. Normally, with two or more intermediate layers, training converges to an inappropriate local minimum and error backpropagation does not work well (this is called the vanishing gradient problem). Instead of computing the two layers in succession, the SAE builds them up one layer at a time, which makes it less prone to the vanishing gradient problem. When training succeeds, the SAE can restore the image 6. However, since the SAE cannot restore an abnormal image that is input, the images 6 are assumed to be normal images only.

図１２（ａ）に示すように入力層ｘ、隠れ層ｙ、隠れ層ｚ、出力層ｓにかけて特徴（ノード）が減少することが特徴の圧縮に相当し、出力層ｓ、隠れ層ｚ´、隠れ層ｙ´、入力層ｘ´にかけてノード数が元に戻ることが元のパターンへの復元に相当する。ＳＡＥでこのようなニューラルネットワークのパラメータを学習する場合、図１２（ｂ）に示すオートエンコーダで、入力層ｘと隠れ層ｙの間のパラメータ（重み）を学習する。 As shown in FIG. 12(a), the decrease in the number of features (nodes) from the input layer x through the hidden layer y and the hidden layer z to the output layer s corresponds to compression of the features, and the return of the number of nodes to its original value from the output layer s through the hidden layer z', the hidden layer y', and the input layer x' corresponds to restoration of the original pattern. When the SAE learns the parameters of such a neural network, the autoencoder shown in FIG. 12(b) first learns the parameters (weights) between the input layer x and the hidden layer y.

次に、図１２（ａ）の隠れ層ｙと隠れ層ｚのパラメータ（重み）を得るために、図１２（ｂ）のｙを復元するように、図１２（ｃ）に示すオートエンコーダで隠れ層ｙと隠れ層ｚのパラメータ（重み）を学習する。次に、図１２（ａ）の隠れ層ｚと出力層ｓのパラメータ（重み）を得るために、図１２（ｃ）のｚを復元するように、図１２（ｄ）に示すオートエンコーダで隠れ層ｚと出力層ｓのパラメータ（重み）を学習する。以上で、図１２（ａ）のＳＡＥのパラメータ（重み）が学習される。 Next, to obtain the parameters (weights) between the hidden layer y and the hidden layer z in FIG. 12(a), the autoencoder shown in FIG. 12(c) learns the parameters (weights) between the hidden layer y and the hidden layer z so as to restore y of FIG. 12(b). Then, to obtain the parameters (weights) between the hidden layer z and the output layer s in FIG. 12(a), the autoencoder shown in FIG. 12(d) learns the parameters (weights) between the hidden layer z and the output layer s so as to restore z of FIG. 12(c). In this way, the parameters (weights) of the SAE in FIG. 12(a) are learned.

図１１に戻って説明する。識別用の５００枚の画像６をｘ、ＳＡＥを関数ｆ（）で表してＳＡＥの出力をｆ（ｘ）とする。スコアＳを以下で定義する。ｉはクラスタの番号である。スコアＳは画像６の特徴量の空間におけるＬ２ノルム（距離）の二乗である。 The description now returns to FIG. 11. Let x denote the 500 images 6 used for identification, let the SAE be represented by a function f(), and let f(x) be the output of the SAE. The score S is defined below, where i is the cluster number. The score S is the square of the L2 norm (distance) in the feature space of the image 6.
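The expression for the score is not reproduced in this text. Reading the description literally (the squared L2 norm, in feature space, between an identification image x and the SAE output f(x) for cluster i), one plausible form is the following; the subscript placement and the notation f_{c_i} for the SAE trained on cluster i are assumptions.

```latex
S_{c_i} = \left\lVert x - f_{c_i}(x) \right\rVert_2^{2}
```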

この計算を各クラスタから抽出した１枚の画像６に対して行う。Ｓ_ciは学習がうまくいくほど小さくなる。

This calculation is performed on one image 6 extracted from each cluster. The better the learning succeeds, the smaller S_ci becomes.

次に、全クラスタの分離度Ｒを次のように定義する。 Next, the degree of separation R of all clusters is defined as follows.
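The defining expression for R is likewise not reproduced in this text. The description states that the number of clusters is chosen so that the total of the scores S of the clusters is smallest, which suggests a sum over the n clusters as sketched here. Whether the original also normalizes this sum, for example by the number of clusters, cannot be recovered from the text.

```latex
R = \sum_{i=1}^{n} S_{c_i}
```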

学習がうまくいくこととクラスタリングがうまくいくことは相関するので、分離度Ｒが小さい方が好ましい。したがって、クラスタ設定部３３は幾つかクラスタ数ｉを変えて分離度Ｒを算出し、分離度Ｒが最も小さい時のクラスタ数ｉでクラスタリングすると判定する。つまり、各クラスタのスコアＳを合計した値が最も小さくなるようにクラスタ数を決定する
なお、ＳＡＥ以外でクラスタ数を決定する方法としては、ＡＩＣ（Akaike’s Information Criterion）、ＢＩＣ（Bayesian Information Criterion）などがある。

Since successful learning correlates with successful clustering, a smaller degree of separation R is preferable. Therefore, the cluster setting unit 33 calculates the degree of separation R for several different numbers of clusters i and decides to perform clustering with the number of clusters i that gives the smallest degree of separation R. That is, the number of clusters is determined so that the sum of the scores S of the clusters is smallest. Methods other than the SAE for determining the number of clusters include AIC (Akaike's Information Criterion) and BIC (Bayesian Information Criterion).
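Choosing the cluster count from reconstruction scores can be sketched as follows. The sketch assumes that R is the plain sum of the per-cluster scores and that the SAE reconstructions f(x) have already been computed; the function names and the toy vectors are illustrative, not taken from the source.

```python
def score(x, reconstructed):
    """Squared L2 distance between a feature vector and its SAE reconstruction."""
    return sum((a - b) ** 2 for a, b in zip(x, reconstructed))

def separation(samples):
    """Sum the scores of one held-out sample per cluster (assumed form of R)."""
    return sum(score(x, fx) for x, fx in samples)

def best_cluster_count(results):
    """results maps a cluster count i to its list of (x, f(x)) pairs, one per cluster."""
    return min(results, key=lambda i: separation(results[i]))

# Toy data: with 3 clusters the reconstructions are closer to the inputs.
results = {
    2: [([1, 1], [0, 0]), ([2, 2], [2, 1])],
    3: [([1, 1], [1, 1]), ([2, 2], [2, 2]), ([0, 0], [0, 1])],
}
chosen = best_cluster_count(results)
```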

（Ｓ５） (S5)
次に、学習部３４は、クラスタリングされた画像６から異常判定するための学習モデルを作成する。異常判定するため、学習部３４は正常な画像６のみを学習する。 Next, the learning unit 34 creates, from the clustered images 6, a learning model for abnormality determination. For the purpose of abnormality determination, the learning unit 34 learns only normal images 6.

図１３は画像６から異常の有無を判定する学習モデルを模式的に示す。本実施形態では一例としてディープラーニングとＳＶＭを組み合わせた学習モデルを説明する。図１３に示すように、ディープラーニングのための多層パーセプトロンの入力層３０１には１つの画像６が均等に分割されたパッチ領域が入力される。学習フェーズでは出力層３０３に画像６が正常であることを意味する値（例えば１）が入力される。多層パーセプトロンにおいて入力層３０１と中間層３０２と出力層３０３の重みの学習には誤差逆伝播法が用いられることが一般的である。なお、出力層３０３のノードの数は、画像６の特徴を適切に抽出できる数として適宜、設計される。出力層３０３の各ノードの出力値は動体の特徴を表す値に数値化されるが詳細は図１４で説明する。 FIG. 13 schematically shows a learning model for determining the presence or absence of an abnormality from the image 6. In this embodiment, a learning model that combines deep learning and SVM will be described as an example. As shown in FIG. 13, a patch region in which one image 6 is evenly divided is input to the input layer 301 of the multi-layer perceptron for deep learning. In the learning phase, a value (for example, 1) indicating that the image 6 is normal is input to the output layer 303. In a multi-layer perceptron, an error backpropagation method is generally used for learning the weights of the input layer 301, the intermediate layer 302, and the output layer 303. The number of nodes in the output layer 303 is appropriately designed as a number capable of appropriately extracting the features of the image 6. The output value of each node of the output layer 303 is quantified into a value representing the characteristics of the moving body, and the details will be described with reference to FIG.

ＳＶＭ３０５は本来２クラス判別を行う教師付き学習のアルゴリズムであるが、ＳＶＭは１クラスＳＶＭとしても利用可能である。１クラスＳＶＭでは、２クラス判別がデータの高密度領域を推定する領域判別問題に置き換えられるので、教師信号が不要になる。サポートベクトルの算出にはラグランジュの未定定数法が使用されるが詳細は省略する。図に示すように正常な特徴ベクトルの領域を特定できるため、この領域から外れるほど異常の可能性が高いと異常判定部３６が判定できる。 The SVM 305 is originally a supervised learning algorithm for two-class discrimination, but an SVM can also be used as a one-class SVM. In a one-class SVM, two-class discrimination is replaced by a region discrimination problem of estimating the high-density region of the data, so no teacher signal is required. The method of Lagrange multipliers is used to calculate the support vectors, but the details are omitted. As shown in the figure, the region of normal feature vectors can be identified, so the abnormality determination unit 36 can determine that the further a feature vector lies outside this region, the more likely it is to be abnormal.

なお、学習はステップＳ３でクラスタリングされた同じクラスタの画像６ごとに行われる。したがって、シーンごとに正常であることを学習した学習モデルが得られる。 The learning is performed for each image 6 of the same cluster clustered in step S3. Therefore, a learning model learned to be normal for each scene can be obtained.

図１４は、図１３の学習の手順をより詳細に説明する図の一例である。図１４（ａ）は各クラスタ７１の複数の画像６を模式的に示す。図１４（ａ）のクラスタ７１はいずれも駐車場の画像６が分類されるクラスタであるが、車の移動、歩行などのシーンごとにクラスタリングされている。なお、それぞれ正常行動のみの画像６である。図１４（ｂ）に示すように、学習部３４は１枚の画像６をパッチ領域（縦横均等に分割）に分割し、図１３に示した多層パーセプトロンの入力層３０１に入力する。パッチ領域に分割することで、シーンの構造を考慮する必要がなくなるという利点がある。例えば、出庫する車はどちらの方向から移動しても同じ特徴になる。図１４（ｂ）では移動中の車が写っているパッチ領域３０４を太枠で示した。パッチ領域ごとに特徴抽出した場合、車などが動くと、同じ車が異なる画像６の異なるパッチ領域で検出される。したがって、学習部３４は抽出された特徴量により同じ車を異なる画像間で追跡することができる。 FIG. 14 is an example of a diagram for explaining the learning procedure of FIG. 13 in more detail. FIG. 14A schematically shows a plurality of images 6 of each cluster 71. The cluster 71 in FIG. 14A is a cluster in which the image 6 of the parking lot is classified, but is clustered for each scene such as movement of a car or walking. It should be noted that each is an image 6 of only normal behavior. As shown in FIG. 14B, the learning unit 34 divides one image 6 into patch regions (divided evenly in the vertical and horizontal directions) and inputs them to the input layer 301 of the multi-layer perceptron shown in FIG. Dividing into patch areas has the advantage that it is not necessary to consider the structure of the scene. For example, a car leaving the garage has the same characteristics regardless of which direction it moves. In FIG. 14B, the patch area 304 showing the moving vehicle is shown in a thick frame. When the features are extracted for each patch area, when a car or the like moves, the same car is detected in different patch areas of different images 6. Therefore, the learning unit 34 can track the same car between different images depending on the extracted features.

From the features obtained at the output layer 303 of the multilayer perceptron, the learning unit 34 acquires the features of each patch region of the image 6 at time t and matches them against the features of past images 6 at times t-1, t-2, and so on. This makes it possible to determine which patch region the car has moved to. For example, it can be found that a car that was in the leftmost patch region in the previous image 6 (one frame of the video) has moved one patch region to the right. As shown in FIG. 14(c), the shape (features) and movement (moving speed) of the moving car can thus be quantified. The features representing the shape of the car are extracted by deep learning. The moving speed may be calculated in any convenient unit, such as n patch regions per image, n patch regions per second, or n meters per second. The shape (features) and movement (moving speed) quantified in this way are input to the SVM 305. Because an abnormality always involves some movement, features need to be extracted only from patch regions in which a moving object is detected (moving speed greater than zero).
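The matching between frames and the resulting speed in patch regions per image can be sketched as below. The description does not specify the matching rule, so nearest-neighbor matching by Euclidean distance, the `max_dist` cutoff, and the names `find_match` and `moving_patches` are assumptions for illustration. The sketch also assumes that an upstream step has already kept only the patch regions that contain an object.

```python
import math
from typing import Dict, List, Optional, Tuple

Patch = Tuple[int, int]              # (row, column) index of a patch region
Features = Dict[Patch, List[float]]  # feature vector of each patch region containing an object


def find_match(feature: List[float], previous: Features, max_dist: float) -> Optional[Patch]:
    """Return the previous-frame patch with the closest features, or None if none is closer than max_dist."""
    best, best_dist = None, max_dist
    for patch, prev_feature in previous.items():
        dist = math.dist(feature, prev_feature)
        if dist < best_dist:
            best, best_dist = patch, dist
    return best


def moving_patches(current: Features, previous: Features,
                   max_dist: float = 0.5) -> List[Tuple[Patch, List[float], float]]:
    """Return (patch, feature, speed) for objects that changed patch region since the previous frame.

    Speed is the Euclidean displacement in patch regions per frame.
    """
    moving = []
    for patch, feature in current.items():
        origin = find_match(feature, previous, max_dist)
        if origin is None:
            continue  # no counterpart in the previous frame
        speed = math.hypot(patch[0] - origin[0], patch[1] - origin[1])
        if speed > 0:  # keep only patch regions with a moving object
            moving.append((patch, feature, speed))
    return moving


previous = {(2, 0): [0.90, 0.10, 0.40]}   # car in the leftmost patch region of row 2
current = {(2, 1): [0.88, 0.12, 0.41],    # same car, one patch region to the right
           (0, 3): [0.10, 0.80, 0.20]}    # unrelated object with no match
result = moving_patches(current, previous)
```

In this toy case only the car is returned, with a speed of one patch region per frame. The feature vector and the speed together would form the input to the SVM 305.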

In this way, the learning unit 34 learns all the images 6 of each cluster and creates the learning models C1 to Cn.

<Functions for abnormality detection>
Once the learning models C1 to Cn have been obtained, the information processing device 30 starts abnormality determination on video. FIG. 15 is an example of a functional block diagram of the information processing device 30 for abnormality determination. The information processing device 30 of FIG. 15 has a communication unit 31, abnormality determination units 36 (called abnormality determination units 1 to n when they need to be distinguished), an abnormality determination unit x, and a determination result processing unit 37. The abnormality determination units 1 to n hold the learning models C1 to Cn, respectively, where n is the number of clusters in steps S3 and S4 of FIG. 7. The cluster determination unit 35 obtains the local and global feature amounts from the images 6 of each existing installation location 8 and thereby determines the cluster closest to those images. The correspondence between clusters and existing installation locations 8 (camera IDs) is shown in Table 1. The association between an existing installation location 8 and a learning model needs to be made only once, before abnormality determination starts.

Based on the correspondence between the learning models C1 to Cn and the camera IDs, the communication unit 31 distributes the images 6 transmitted from each existing installation location 8 to the abnormality determination units 1 to n. It is more preferable to also refer to the daytime/nighttime label and distribute each image to the daytime or the nighttime abnormality determination unit 36. The association between the new installation location 7 and a learning model needs to be made only once, before abnormality determination starts.
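The distribution by camera ID and daytime/nighttime label amounts to a table lookup that is built once before determination starts. The sketch below illustrates this; the camera IDs, the label strings, the table contents, and the function name `route` are hypothetical and are not taken from Table 1.

```python
from typing import Dict, Tuple

# Hypothetical assignment table: (camera ID, day/night label) -> learning model name.
# It is filled in once, before abnormality determination starts.
MODEL_FOR_CAMERA: Dict[Tuple[str, str], str] = {
    ("cam-001", "day"): "C1",
    ("cam-001", "night"): "C4",
    ("cam-002", "day"): "C1",
    ("cam-002", "night"): "C2",
}


def route(camera_id: str, label: str) -> str:
    """Return the learning model (and hence the abnormality determination unit) for an image."""
    try:
        return MODEL_FOR_CAMERA[(camera_id, label)]
    except KeyError:
        raise LookupError(f"no learning model assigned to {camera_id} ({label})") from None
```

Two cameras at different locations can share a model (here both daytime views map to C1), which reflects that clusters group similar scenes regardless of installation location.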

The abnormality determination unit x performs abnormality determination on the images 6 from the camera at the new installation location 7 and holds the learning model Cx. As with the existing installation locations, the cluster determination unit 35 obtains the local and global feature amounts from the images 6 of the new installation location 7 and thereby determines the cluster closest to those images. Daytime and nighttime may be distinguished in the same way as for the abnormality determination units 1 to n.
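Determining the closest cluster for a new camera can be sketched as a nearest-centroid lookup. This is an assumption made for illustration: the passage does not state how closeness is measured, so the sketch represents each cluster by a centroid in the combined local and global feature space and uses Euclidean distance. The name `nearest_cluster` and the centroid values are hypothetical.

```python
import math
from typing import Dict, List


def nearest_cluster(feature: List[float], centroids: Dict[str, List[float]]) -> str:
    """Return the name of the cluster whose centroid is closest to the feature vector."""
    if not centroids:
        raise ValueError("at least one cluster is required")
    return min(centroids, key=lambda name: math.dist(feature, centroids[name]))


# Hypothetical centroids of two clusters learned from existing installation locations.
centroids = {"C1": [0.0, 0.0], "C2": [1.0, 1.0]}
chosen = nearest_cluster([0.9, 0.8], centroids)  # feature of an image from the new location
```

The learning model of the chosen cluster then serves as Cx for the new installation location.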

Using their assigned learning models, the abnormality determination units 1 to n and the abnormality determination unit x calculate a confidence score indicating how close an image 6 is to a normal image 6. The score takes a value from 0 to 1, for example, and is higher the closer the image is to normal.

The determination result processing unit 37 acquires the confidence scores (the degree of closeness to normal) calculated by the abnormality determination units 1 to n and the abnormality determination unit x and determines whether there is an abnormality. For example, it may determine that there is an abnormality when the score is below a threshold. When it determines that there is an abnormality, it dispatches a security guard, notifies the customer of the abnormality, or the like.
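The threshold test can be sketched as follows. The description does not say how the 0 to 1 score is derived from the SVM output, so the logistic mapping from a signed distance to the normal-region boundary, the `scale` parameter, the default threshold of 0.5, and both function names are assumptions for illustration only.

```python
import math


def confidence_from_margin(margin: float, scale: float = 1.0) -> float:
    """Squash a signed distance from the normal-region boundary into the range 0 to 1.

    Positive margins (inside the normal region) give values above 0.5.
    The logistic mapping is an illustrative assumption.
    """
    z = margin / scale
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)  # computed this way to avoid overflow for large negative margins
    return e / (1.0 + e)


def is_abnormal(confidence: float, threshold: float = 0.5) -> bool:
    """Flag an abnormality when the confidence score is below the threshold."""
    return confidence < threshold
```

For instance, `is_abnormal(confidence_from_margin(-2.0))` is true because the feature vector lies outside the normal region, while a score exactly equal to the threshold is not flagged.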

<Abnormality detection at an existing installation location>
FIG. 16 is an example of a diagram schematically explaining abnormality detection at an existing installation location 8. FIG. 16 uses the existing installation location 8a as an example, but the same applies to the other existing installation locations 8.
(1) The cluster determination unit 35 clusters the images 6 captured by the camera device 10 at the existing installation location 8a and thereby determines the learning model suited to the daytime images 6 and the one suited to the nighttime images 6.
(2) After abnormality detection starts, the camera device 10 at the existing installation location 8a transmits video to the information processing device 30. The video is assumed to carry a daytime or nighttime label. Alternatively, the information processing device 30 may determine whether it is daytime or nighttime.
(3) The abnormality determination unit 36 corresponding to the existing installation location 8a determines whether the images 6 contain an abnormality, using the learning model assigned to the daytime or nighttime of that installation location.
(4) The determination result (confidence score) from the abnormality determination unit 36 is sent to the determination result processing unit 37.

In this way, abnormality determination on the images 6 of the existing installation location 8a can also use a learning model created from clustered images 6, which makes the determination more accurate.

<Abnormality detection at a new installation location>
FIG. 17 is an example of a diagram schematically explaining abnormality detection at the new installation location 7.
(1) The cluster determination unit 35 clusters the images 6 captured by the camera device 10 at the new installation location 7 and thereby determines the learning model suited to the daytime images 6 and the one suited to the nighttime images 6.
(2) After abnormality detection starts, the camera device 10 at the new installation location 7 transmits video to the information processing device 30. The video is assumed to carry a daytime or nighttime label. Alternatively, the information processing device 30 may determine whether it is daytime or nighttime.
(3) The abnormality determination unit 36 for the new installation location 7 determines whether the images 6 contain an abnormality, using the learning model assigned to the daytime or nighttime of the new installation location 7.
(4) The determination result (confidence score) from the abnormality determination unit 36 is sent to the determination result processing unit 37.

In this way, when a camera device 10 is newly installed at the new installation location 7, abnormality determination can use a learning model obtained by clustering the images 6 of the existing installation locations 8, so abnormalities can be determined accurately from the moment the camera device 10 is installed.

<Example of processing the determination result>
The processing of the determination result processing unit 37 is described with reference to FIG. 18. FIG. 18 is an example of a diagram schematically explaining the processing performed by the determination result processing unit 37.
(1) The images 6 are input to the abnormality determination unit 36 as described above. The abnormality determination unit 36 sends the abnormality determination result to the determination result processing unit 37, together with information about the installation location of the camera device 10 (for example, the customer ID or customer name) and the video used for the determination.
(2) The determination result processing unit 37 compares the abnormality determination result with a threshold and determines whether an abnormality is likely. When it determines that an abnormality is likely, it transmits at least part of the video and the determination result to the customer and to security guards. It is more preferable for a monitoring operator to check the video visually before transmission. Destinations include the guard center that serves as the control center for dispatching guards, the standby stations where guards are posted, guards near the site, and the customer.

FIG. 18 shows example screens of the mobile terminals 74a and 74b carried by a customer, a security guard, or the like. The screen of the mobile terminal 74a displays a message 401 reading "Abnormality level 80%", a video display field 402, a video playback button 403, and so on. The customer or guard can view the message and video and respond appropriately. The mobile terminal 74b displays one of a plurality of camera devices 10 highlighted. When a plurality of camera devices 10 are installed at one installation location, showing which camera device 10 detected the abnormality in this way makes it easier for the customer or guard to identify where the abnormality occurred.
(3) The customer or guard reports to the police as necessary, or a guard goes to the site to check. If the guard goes to the site and finds the situation normal, the guard notifies the determination result processing unit 37 of this (feedback). With the notification, the video sent to the mobile terminals 74a and 74b (or information identifying that video) is transmitted to the determination result processing unit 37. The learning unit 34 can then learn with the fact that this video is normal as a teacher signal, which improves the accuracy of abnormality detection.

<Using the learning model in the camera device>
In the present embodiment the abnormality is determined at the monitoring center, but the camera device 10 can also perform the determination.

FIG. 19(a) shows the procedure by which the camera device 10 at the new installation location 7 performs abnormality determination using the learning model Cx.
(1) The camera device 10 at the new installation location 7 captures an image 6. At least one image 6 is sufficient; it need not be video.
(2) The camera device 10 transmits the image 6 to the information processing device 30.
(3) The information processing device 30 stores the image 6.
(4) The information processing device 30 determines which cluster the image 6 of the new installation location 7 belongs to.
(5) The information processing device 30 transmits the learning model Cx of the cluster determined to apply to the camera device 10. Because the camera device 10 performs abnormality determination with this learning model Cx, no video needs to be transmitted to the information processing device 30 in the abnormality determination phase.

FIG. 19(b) shows the procedure for abnormality determination using the images 6 captured by the camera device 10 at the new installation location 7. After some time has passed, the camera device 10 at the new installation location 7 will also have captured enough images 6 for learning.
(1) The camera device 10 at the new installation location 7 captures images 6. Because these images 6 are used for abnormality determination, a plurality of images 6 (video) is preferable.
(2) The camera device 10 transmits the images 6 to the information processing device 30.
(3) The information processing device 30 stores the plurality of images 6.
(4) Once a certain number of images 6 have accumulated, the information processing device 30 learns the normal images 6 and creates a learning model Cx (Ver2.0).
(5) The information processing device 30 transmits the created learning model Cx (Ver2.0) to the camera device 10 at the new installation location 7.
(6) The camera device 10 can perform abnormality determination using this learning model Cx (Ver2.0). It may also be used together with the original learning model Cx. In that case, if either the original learning model Cx or the learning model Cx (Ver2.0) determines an abnormality, the abnormality is reported to the monitoring center. This reduces missed detections.

As described with reference to FIG. 19, the camera device 10 at the new installation location 7 can perform abnormality determination immediately after installation, and once time has passed it can perform abnormality determination with a learning model built from the images 6 captured at its own location.

<Other application examples>
The best mode for carrying out the present invention has been described above using an embodiment, but the present invention is in no way limited to this embodiment, and various modifications and substitutions can be made without departing from the gist of the present invention.

In the present embodiment abnormality detection is based on the images 6, but the information processing device 30 can also detect abnormalities by identifying sound. In this case, the camera device 10 has a microphone or similar device that collects sound. The sound data collected by the camera device 10 is transmitted to the information processing device 30 together with the images 6. The information processing device 30 learns normal sound data in advance and determines whether incoming sound is abnormal. In this case as well, abnormalities can be detected immediately after installation at the new installation location 7.

In the system configuration diagram of FIG. 2 the camera device 10 and the information processing device 30 are separate units, but, as described with reference to FIG. 19, they may be integrated.

It has been explained that daytime and nighttime images from a camera device 10 at the same location can be classified into different clusters, but the images may be classified more finely into different clusters for each time period. They may also be classified into different clusters by weather or season. In this way, the cluster setting unit 33 can classify each image into an appropriate cluster according to the environment in which the camera device 10 captures it.

The configuration examples in FIGS. 6, 15, and elsewhere are divided according to main function in order to make the processing of the camera device 10 and the information processing device 30 easier to understand. The present invention is not limited by how the processing units are divided or named. The processing of the camera device 10 and the information processing device 30 can be divided into a larger number of processing units according to the processing content. The division may also be made so that one processing unit includes more processing.

FIGS. 6 and 15 show a single information processing device 30, but there may be a plurality of identical information processing devices 30, or the functions of FIGS. 6 and 15 may be distributed across a plurality of information processing devices 30.

The communication unit 31 and the image storage unit 32 are an example of the image acquisition means, the cluster setting unit 33 is an example of the classification means, the learning unit 34 is an example of the learning means, the abnormality determination unit 36 is an example of the abnormality determination means, the abnormality determination unit 36 using the learning model Cx (Ver2.0) is an example of the second abnormality determination means, and the cluster determination unit 35 is an example of the determination means.

6: Image
7: New installation location
8: Existing installation location
9: Monitoring center
10: Camera device
30: Information processing device
100: Image processing system

Claims

An image processing system that detects anomalies from images captured by imaging devices, comprising:
an image acquisition means for acquiring the images from a plurality of first imaging devices installed at different installation locations;
a classification means for classifying images having similar characteristics into the same cluster;
a learning means for constructing, for each cluster, an abnormality determination means that detects an abnormality by learning the images classified into that cluster; and
a determination means for determining the cluster into which images of the first imaging devices similar to an image captured by a second imaging device, installed at an installation location different from those of the first imaging devices, are classified, and for determining the abnormality determination means constructed from the images classified into that cluster,
wherein the abnormality determination means determined by the determination means determines the presence or absence of an abnormality from the image captured by the second imaging device.

The image processing system according to claim 1, wherein the classification means extracts a local feature amount and a global feature amount from the images captured by the first imaging devices, and classifies images having similar local feature amounts and global feature amounts into the same cluster regardless of installation location.

The image processing system according to claim 2, wherein the classification means classifies the images from which the local feature amount and the global feature amount have been extracted into several different numbers of clusters, calculates for each number a degree of separation that becomes smaller as the classification becomes more appropriate, determines the number with the smallest degree of separation, and classifies the images captured by the first imaging devices into that number of clusters.

The image processing system according to claim 3, wherein the classification means calculates, for each cluster, the difference between an image captured by the first imaging devices and input to a neural network constructed as a Stacked Auto Encoder and the output value of that neural network, and determines the number so that the total of the differences over the clusters is smallest.

The image processing system according to any one of claims 1 to 4, wherein the classification means classifies images captured by a first imaging device at the same installation location into different clusters depending on the environment in which the images were captured, and
the learning means constructs the abnormality determination means from the images classified into the different clusters according to the environment.

The image processing system according to any one of claims 1 to 4, wherein the learning means extracts, by deep learning, features of the image divided into regions,
quantifies the shape and movement of an object shown in the image based on the features of the regions, and
constructs, by an SVM (support vector machine), the abnormality determination means that classifies the shape and movement of the object as normal.

An imaging device that acquires a learning model from an information processing device and performs abnormality detection on captured images, the information processing device having:
an image acquisition means for acquiring images from a plurality of other imaging devices installed at different installation locations;
a classification means for classifying images having similar characteristics into the same cluster; and
a learning means for constructing, for each cluster, an abnormality determination means that detects an abnormality by learning the images classified into that cluster,
wherein the imaging device transmits an image to the information processing device to acquire the abnormality determination means constructed from the cluster into which images having characteristics similar to those of the transmitted image are classified, and
determines the presence or absence of an abnormality in a captured image using the abnormality determination means acquired from the information processing device.

The imaging device according to claim 7, wherein the imaging device acquires from the information processing device a second abnormality determination means constructed from images captured by the imaging device, and
performs abnormality determination using both the second abnormality determination means and the abnormality determination means.

A learning model creation method performed by an image processing system that detects anomalies from images captured by imaging devices, the method comprising:
a step in which an image acquisition means acquires the images from a plurality of first imaging devices installed at different installation locations;
a step in which a classification means classifies images having similar characteristics into the same cluster;
a step in which a learning means constructs, for each cluster, an abnormality determination means that detects an abnormality by learning the images classified into that cluster;
a step in which a determination means determines the cluster into which images similar to an image captured by a second imaging device, installed at an installation location different from those of the first imaging devices, are classified, and determines the abnormality determination means constructed from that cluster; and
a step in which the determined abnormality determination means determines the presence or absence of an abnormality from the image captured by the second imaging device.

An information processing device that creates a learning model for detecting anomalies from images captured by imaging devices, comprising:
an image acquisition means for acquiring the images from a plurality of first imaging devices installed at different installation locations;
a classification means for classifying images having similar characteristics into the same cluster;
a learning means for constructing, for each cluster, an abnormality determination means that detects an abnormality by learning the images classified into that cluster; and
a determination means for determining the cluster into which images similar to an image captured by a second imaging device, installed at an installation location different from those of the first imaging devices, are classified, and for determining the abnormality determination means constructed from that cluster,
wherein the abnormality determination means determined by the determination means determines the presence or absence of an abnormality from the image captured by the second imaging device.