JP6989294B2

JP6989294B2 - Monitoring system and monitoring method

Info

Publication number: JP6989294B2
Application number: JP2017108164A
Authority: JP
Inventors: 利章篠原; 義人東澤; 徹寺田
Original assignee: パナソニックｉ−ＰＲＯセンシングソリューションズ株式会社
Priority date: 2017-05-31
Filing date: 2017-05-31
Publication date: 2022-01-05
Anticipated expiration: 2037-05-31
Also published as: JP2018205900A

Description

The present disclosure relates to a monitoring system and a monitoring method in which a server and a plurality of cameras installed in a monitoring area are communicably connected to each other.

At present, the computing power of a camera is said to be 0.3 T (tera) ops. T (tera) denotes 10 to the 12th power, and ops is known as a unit of computing power. Going forward, high-performance GPUs (Graphics Processing Units) and FPGAs (Field Programmable Gate Arrays) of the kind mounted in game consoles and similar devices are expected to be adopted as camera processors. In that case, the computing power of a camera is expected to improve dramatically, for example within one year, by a factor of ten or more to about 2.6 Tops.

It has also been pointed out that a camera needs 1.3 Tops of computing power to perform image recognition using deep learning, which is one example of machine learning. Because this requirement is so high, it has conventionally been considered difficult for a camera to perform image recognition using deep learning. With the computing power expected of cameras one year from now, however, image recognition using deep learning is considered fully feasible.

On the other hand, when high-quality (for example, 4K) captured image data from a camera is transferred to a server and the server performs the image recognition, the volume of traffic carried over the network inevitably grows as the captured image data grows, so communication efficiency falls and delays occur. For this reason, the camera is expected to perform image recognition using deep learning on the camera itself, without transferring the high-quality (for example, 4K) captured image data.

In general, when image recognition is performed using deep learning, a device such as a camera learns the objects contained in captured image data (that is, the objects to be recognized) and updates its learning model by changing the model parameters used in image recognition (for example, weighting coefficients and threshold values). Based on the updated learning model, the device improves the accuracy with which it detects objects (that is, the objects to be recognized) contained in captured image data.

As prior art for recognizing an object from captured image data and acquiring object motion information, the object tracking device of Patent Document 1, for example, has been proposed. Using a time series of images acquired from a camera capable of photographing an object, this object tracking device learns from a teacher data set that contains image information on the acquired images together with object motion information that is treated as the correct answer and that includes position information on the object's position in real space. The object tracking device then uses a tracking classifier which, for each image in which the object is to be tracked, takes the image information of that image as input and outputs at least the position information treated as the correct real-space position of the object, and thereby acquires the object's real-space position from moment to moment.

Patent Document 1: Japanese Unexamined Patent Publication No. 2016-206795

In the future, the captured image data handled by cameras is expected to become high-definition and high-volume, for example 4K or 8K, so that data size will increase. As captured image data grows, if the parameters used for detection in the captured image data are learned on a server rather than on the cameras, the processing load concentrates on the server. In addition, sending large volumes of data to the server one item at a time increases traffic on the network and causes a corresponding delay in data communication. Conventional techniques such as Patent Document 1 give no particular consideration to technical measures against these problems.

The present disclosure has been devised in view of the conventional circumstances described above. Its object is to provide a monitoring system and a monitoring method that, when detecting at least one object in each of the captured images taken by a plurality of cameras installed in a monitoring area, distribute processing such as the learning of the parameters used for that detection among the plurality of cameras, suppress the increase in traffic on the network, and help reduce the processing load on a server connected to the plurality of cameras.

The present disclosure provides a monitoring system in which a server and a plurality of cameras installed in a monitoring area are communicably connected to each other. The server has a memory that holds, for each camera, information on the amount of free resources relative to that camera's processing capacity for generating a learning model, together with the captured images obtained by each camera imaging the monitoring area. Based on the information on the amount of free resources of the cameras, the server determines, for each camera, the processing that the camera is to execute in connection with detecting at least one object appearing in the captured images obtained by each camera, and transmits an execution instruction for the determined processing to each camera. Each camera executes the processing corresponding to the execution instruction, based on the execution instruction transmitted from the server.

The present disclosure also provides a monitoring method using a monitoring system in which a server and a plurality of cameras installed in a monitoring area are communicably connected to each other. The server holds in a memory, for each camera, information on the amount of free resources relative to that camera's processing capacity for generating a learning model, together with the captured images obtained by each camera imaging the monitoring area. Based on the information on the amount of free resources of the cameras, the server determines, for each camera, the processing that the camera is to execute in connection with detecting at least one object appearing in the captured images obtained by each camera, and transmits an execution instruction for the determined processing to each camera. Each camera executes the processing corresponding to the execution instruction, based on the execution instruction transmitted from the server.
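The server-side decision described in the two preceding paragraphs can be sketched as follows. This is a minimal illustration under stated assumptions, not the disclosed implementation: the function name assign_processing, the 0-to-1 scale for free resources, the threshold of 0.5, and the two instruction labels are all hypothetical and were introduced only for this example.

```python
from typing import Dict

# Hypothetical instruction labels; the disclosure only says that the server
# decides the "processing" each camera is to execute.
LEARN_AND_DETECT = "learn_and_detect"
DETECT_ONLY = "detect_only"


def assign_processing(free_resources: Dict[str, float],
                      learn_threshold: float = 0.5) -> Dict[str, str]:
    """Decide, per camera, which processing to instruct.

    free_resources maps a camera ID to the fraction (0.0 to 1.0) of its
    processing capacity that is currently free (an assumed scale).
    """
    instructions = {}
    for camera_id, free in free_resources.items():
        # Assumed policy: cameras with enough spare capacity also take on
        # learning; the others only run detection.
        if free >= learn_threshold:
            instructions[camera_id] = LEARN_AND_DETECT
        else:
            instructions[camera_id] = DETECT_ONLY
    return instructions


# Example table of free resources held in the server's memory.
table = {"10A": 0.8, "10B": 0.2, "10C": 0.5}
instructions = assign_processing(table)
```

In this sketch the server would then send each entry of instructions to the corresponding camera as its execution instruction. A real policy could weigh other factors as well; the single threshold is used here only to keep the example short.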

According to the present disclosure, when at least one object is detected in each of the captured images taken by a plurality of cameras installed in a monitoring area, processing such as the learning of the parameters used for that detection can be distributed among the plurality of cameras, the increase in traffic on the network can be suppressed, and the processing load on a server connected to the plurality of cameras can be reduced.

Block diagram showing an example of the system configuration of the monitoring system of Embodiment 1
Explanatory diagram of an overview example of learning and detection
Block diagram showing in detail an example of the internal configuration of the camera of Embodiment 1
Block diagram showing in detail an example of the internal configuration of the server of Embodiment 1
Explanatory diagram of an overview example of learning in a device
Explanatory diagram of an overview example of detection by a camera
Explanatory diagram of an overview example of processing when learning is distributed across a plurality of cameras in the monitoring system
Explanatory diagram of an overview example of resource management in the monitoring system
Sequence diagram showing in detail an example of the operation procedure in which the server instructs a camera to execute processing in Embodiment 1
Sequence diagram showing in detail an example of the operation procedure in which the server controls the feedback amount of model parameters in Embodiment 1
Explanatory diagram of an overview example of sharing learning results in the monitoring system
Diagram showing an example of the UI screen displayed during local learning
Diagram showing an example of the UI screen displayed on the display unit of the server during integrated learning
Block diagram showing in detail an example of the internal configuration of the processing execution unit of the camera of Embodiment 2
Flowchart showing in detail an example of the operation procedure of local learning by a camera
Explanatory diagram of an overview example of sharing learning results in the monitoring system
Diagram showing an example of the UI screen displayed during local learning
Diagram showing an example of the UI screen displayed on the display unit of the server during integrated learning

(Background to the first embodiment)
In the future, the captured image data handled by cameras is expected to become high-definition and high-volume, for example 4K or 8K, so that data size will increase. As captured image data grows, if the parameters used for detection in the captured image data are learned on a server rather than on the cameras, the processing load concentrates on the server. In addition, sending large volumes of data to the server one item at a time increases traffic on the network and causes a corresponding delay in data communication. Conventional techniques such as Patent Document 1 give no particular consideration to technical measures against these problems.

Embodiment 1 therefore describes an example of a monitoring system and a monitoring method that, when detecting at least one object in each of the captured images taken by a plurality of cameras installed in a monitoring area, distribute processing such as the learning of the parameters used for that detection among the plurality of cameras, suppress the increase in traffic on the network, and help reduce the processing load on a server connected to the plurality of cameras.

(Embodiment 1)
FIG. 1 is a block diagram showing an example of the system configuration of the monitoring system 5 of Embodiment 1.

The monitoring system 5 is, for example, a security monitoring system installed indoors in a bank, store, company, facility, or the like, or outdoors in a parking lot, park, or the like. These indoor and outdoor locations constitute the monitoring area of the monitoring system 5. The monitoring system 5 of the present embodiment includes at least one camera 10 that uses artificial intelligence (AI) technology to recognize at least one target (in other words, an object) appearing in a captured image, a server 30, and a recorder 50. The at least one camera 10, the server 30, and the recorder 50 are communicably connected to one another via a network NW.

Hereinafter, when the plurality of cameras 10 need to be distinguished from one another, they are denoted as cameras 10A, 10B, 10C, and so on. The plurality of cameras 10 may be installed at the same location as the monitoring area, for example within one building, or some of the cameras 10 may be installed at a location different from the others. Here it is assumed that the cameras 10A, 10B, and 10C, which are installed at different locations as monitoring areas, share the same installation status (for example, installation angle and angle of view). For example, each of the cameras 10A, 10B, and 10C is mounted on a wall above a doorway fitted with an automatic door and images people passing through the doorway from slightly above, looking down. The installation status of the cameras 10A, 10B, and 10C is not limited to placement above a doorway fitted with an automatic door.

First, an overview is given of learning, which generates the neural network (in other words, the learning model) used in deep learning as one example of machine learning in artificial intelligence (AI) technology, and of detection (that is, inference), which inputs data to a trained learning model (hereinafter, "trained model") and outputs a result.

FIG. 2 is an explanatory diagram of an overview example of learning and detection.

The learning process (hereinafter simply "learning") is performed, for example, by deep learning as one example of machine learning in artificial intelligence (AI) technology. In other words, learning is carried out using deep learning in a neural network (hereinafter abbreviated as "NN"), a form of machine learning that has attracted attention in recent years. Machine learning by deep learning includes "supervised learning", which uses teacher data, and "unsupervised learning", which does not use teacher data. Machine learning produces a trained model. Detection, in contrast, is the process of inputting data to the generated trained model and obtaining a result.

Learning may be performed in real time, but because it requires a large amount of computation it is usually performed offline (that is, asynchronously). The detection process (hereinafter simply "detection"), in contrast, is usually performed in real time. The device on which learning is performed may be, for example, any of the camera 10, the server 30, and the recorder 50; the case in which learning is performed on the camera 10 is shown here. Detection is performed on the camera 10. Note that the server 30 or the recorder 50 may perform detection in cases where transferring the captured image data from the camera 10 to the server 30 or the recorder 50 does not give rise to traffic on the network NW.

During learning, the device 150 receives a large amount of learning data (for example, image data captured by the camera 10). Based on the input learning data, the device 150 performs machine learning (for example, deep learning processing) and updates the model parameter P of the neural network (NN 140) that serves as the learning model. The model parameter P comprises the weighting coefficients (that is, biases), threshold values, and the like that are set in each of the neurons making up the NN 140. When performing machine learning (for example, deep learning processing), the device 150 uses teacher data to obtain, for each item of learning data, whether the result is correct or incorrect, or to calculate an evaluation value (that is, a score). The device 150 changes the degree to which the model parameter P is learned according to whether the learning data was judged correctly or according to how high the score is. After learning, the NN 140 is used as a trained model for detection on the device 150.
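As a rough illustration of how weighting coefficients and a threshold might be updated from teacher data, the sketch below trains a single neuron with the classic perceptron rule. The perceptron rule, the two-element feature vectors, the learning rate, and every name in the code are assumptions chosen for brevity; the disclosure targets deep neural networks and does not specify an update rule.

```python
from typing import List, Tuple


def predict(weights: List[float], threshold: float, x: List[float]) -> int:
    """Fire (1) when the weighted sum reaches the threshold, else 0."""
    total = sum(w * xi for w, xi in zip(weights, x))
    return 1 if total >= threshold else 0


def train(samples: List[Tuple[List[float], int]],
          epochs: int = 10,
          rate: float = 0.1) -> Tuple[List[float], float]:
    """Adjust weights and threshold only when a sample is misjudged."""
    weights = [0.0] * len(samples[0][0])
    threshold = 0.0
    for _ in range(epochs):
        for x, label in samples:
            # error is 0 for a correct judgment, +1 or -1 otherwise.
            error = label - predict(weights, threshold, x)
            weights = [w + rate * error * xi for w, xi in zip(weights, x)]
            threshold -= rate * error
    return weights, threshold


# Hypothetical teacher data: (features, 1 = target object, 0 = not target).
teacher_data = [([1.0, 0.0], 1), ([0.9, 0.1], 1),
                ([0.0, 1.0], 0), ([0.1, 0.9], 0)]
weights, threshold = train(teacher_data)
```

Here the correct-or-incorrect judgment for each sample decides whether the parameters move at all, which loosely mirrors changing the degree of learning according to correctness. A score-weighted update would scale the step by the score instead.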

During detection (that is, during inference), the device 150 receives input data (for example, captured image data taken in real time by the camera 10), executes inference in the NN 140, and outputs the resulting inference result (that is, the judgment result for the detected object). The judgment result includes, for example, information on whether the report is a correct report or a false report, depending on whether the target is present in the captured image data, and information on a score indicating the evaluation value of the target. A correct report is a report indicating that the target was detected correctly with high confidence. A false report is a report indicating that the target was detected erroneously with high confidence.

FIG. 3 is a block diagram showing in detail an example of the internal configuration of the camera 10 of Embodiment 1.

The camera 10, for example, images a subject in the monitoring area and acquires captured image data. Specifically, the camera 10 includes a lens 11, an image sensor 12, a signal processing unit 13, a processing execution unit 14, a resource monitoring unit 15, a crop encoding unit 17, and a network I/F 16.

The camera 10 forms the subject image incident from the monitoring area SA on the image sensor 12 through the lens 11, which is arranged so that the subject image from the monitoring area SA can enter it, and the image sensor 12 converts the subject image (that is, the optical image) into an electrical signal to capture it. At least the lens 11 and the image sensor 12 constitute the imaging unit of the camera 10. Using the electrical signal obtained by the image sensor 12, the signal processing unit 13 of the camera 10 generates an RGB signal and performs various predetermined image processing such as white balance and contrast adjustment, thereby generating and outputting captured image data.

The processing execution unit 14 is configured using, for example, a GPU (Graphics Processing Unit) or an FPGA (Field Programmable Gate Array). Once high-performance GPUs or FPGAs with high computing power are adopted as the processor of the camera 10, the computing power of the camera 10 is expected to improve dramatically, to the point where the camera 10 can fully execute deep learning processing. The processing execution unit 14 contains the learning model or trained model, in the form of the NN 140, generated or updated by processing on the GPU or FPGA, and for the input captured image data it outputs a judgment result for at least one target (that is, object) appearing in the captured image.

The resource monitoring unit 15 monitors information on the processing capacity of the camera 10 (for example, the amount of free resources) based on the usage status of the GPU or FPGA, the memory, and the like in the processing execution unit 14.

During detection, the crop encoding unit 17 cuts out a part of the target (that is, object) appearing in the captured image data and outputs it as captured image data to be processed or as thumbnail data.

The network I/F 16 controls the connection with the network NW. Via the network I/F 16, the camera 10 transmits to the server 30 and the recorder 50 the judgment result for the target (that is, object) output from the processing execution unit 14, the amount of free resources monitored by the resource monitoring unit 15, thumbnail data, and the like. Via the network I/F 16, the camera 10 also receives the model parameter P, which is the result of learning, from the server 30, the recorder 50, and other cameras 10.
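One way to picture what the camera 10 sends over the network I/F 16 is a small structured report. The sketch below is purely illustrative: the CameraReport class, its field names, and the use of JSON are assumptions, since the disclosure does not define a message format.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Dict, List


@dataclass
class CameraReport:
    """Hypothetical status report from one camera to the server or recorder."""
    camera_id: str
    free_resources: float  # assumed scale: fraction of capacity that is free
    # Judgment results for detected objects, for example label and score.
    detections: List[Dict[str, object]] = field(default_factory=list)


report = CameraReport(
    camera_id="10A",
    free_resources=0.6,
    detections=[{"label": "person", "score": 0.92}],
)

# Serialize for transmission; the receiver can rebuild the same fields.
payload = json.dumps(asdict(report))
restored = json.loads(payload)
```

Thumbnail data is omitted here because binary image data would normally be sent separately or encoded, and the disclosure does not say which.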

FIG. 4 is a block diagram showing in detail an example of the internal configuration of the server 30 of Embodiment 1.

The server 30 includes a processor 31 (for example, a CPU (Central Processing Unit), an MPU (Micro Processing Unit), or a DSP (Digital Signal Processor)), a memory 32, a communication unit 33, an operation unit 36, a display unit 37, a learning DB (database) 34, and a table memory 35. The processor 31 cooperates with the memory 32 to carry out overall processing and control of each part of the server 30. The memory 32 has a non-volatile memory and a volatile memory. The non-volatile memory stores, for example, information on the unit cost of each of the cameras 10A, 10B, and 10C as notified by those cameras (for example, information on the power cost of the cameras 10A, 10B, and 10C). The information on power cost, described in detail later, is an index value indicating, for example, how much electric power (that is, cost) results from a given amount of use of the cameras 10A, 10B, and 10C.

When the server 30 performs machine learning (for example, deep learning processing), the processor 31 executes a program stored in the non-volatile memory and generates a learning model (neural network: NN). The server 30 also receives the model parameter P, which is the result of learning, from the plurality of cameras 10 and integrates the model parameters P of those cameras 10 installed in the monitoring area SA whose installation status (that is, installation environment such as installation angle and angle of view) is the same.
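The integration step can be sketched as grouping the received parameters by installation status and combining each group. In the sketch below the combination is a plain element-wise average, which is an assumption: the disclosure says only that the parameters are integrated. The function name, the status labels, and the two-element parameter vectors are likewise hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# (installation status label, model parameters) reported by one camera.
Report = Tuple[str, List[float]]


def integrate(reports: Dict[str, Report]) -> Dict[str, List[float]]:
    """Average model parameters across cameras sharing an installation status."""
    groups: Dict[str, List[List[float]]] = defaultdict(list)
    for status, params in reports.values():
        groups[status].append(params)
    integrated = {}
    for status, param_sets in groups.items():
        # zip(*param_sets) walks the parameter vectors element by element.
        integrated[status] = [sum(vals) / len(vals) for vals in zip(*param_sets)]
    return integrated


# Cameras 10A and 10B share an installation status; 10C does not.
reports = {
    "10A": ("above_doorway", [1.0, 3.0]),
    "10B": ("above_doorway", [3.0, 5.0]),
    "10C": ("side_view", [10.0, 20.0]),
}
merged = integrate(reports)
```

Parameters from a camera with a different installation status stay in their own group, so they are never mixed with those of the doorway cameras.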

The learning DB (database) 34 stores the model parameters P (for example, weighting coefficients and threshold values) that are the results of learning transmitted from the plurality of cameras 10 and received by the server 30.

The table memory 35 stores a table in which information on the processing capacity of the plurality of cameras 10 (for example, the amount of free resources) is registered.

The operation unit 36 has various buttons, such as a learning button bt5 that the user can operate (see, for example, FIG. 13), and accepts input operations from the user.

The display unit 37 displays a UI (user interface) screen 310 (see, for example, FIG. 12 or FIG. 13) that presents the processing results of integrated learning on the server 30.

FIG. 5 is an explanatory diagram of an overview example of learning in the device 150.

Here, the case in which the device 150 learns a "car" appearing in a captured image as the target obj is described as an example. As noted above, learning is normally performed offline (asynchronously) and may be performed on any of the camera 10, the server 30, and the recorder 50. In the present embodiment, the camera 10 performs learning as an example of the device 150. The device 150 includes a processing execution unit 164, a resource monitoring unit 165, a network I/F 166, and a parameter gradient calculation unit 168.

The network I/F 166 controls the connection with the network NW and receives learning data via the network NW. Here, the learning data is the captured image data gz1 and gz2, for which the target obj is a car. Each of the captured image data gz1 and gz2 is teacher data to which a score (evaluation value) and a correct-report or false-report label have been added. For example, the captured image data gz1 is a captured image containing the target "car" and is teacher data with a high score or a correct-report label. The captured image data gz2, in contrast, is an image of a "tree", which is not the target car, and is teacher data with a low score or a false-report label.

The processing execution unit 164 updates the model parameter P of the learning model (for example, weighting coefficients and threshold values) by executing inference based on the teacher data input via the network I/F 166. The processing execution unit 164 also transmits the updated model parameter P via the network I/F 166 to other devices such as the camera 10, the server 30, and the recorder 50. Performing "supervised learning" in this way raises the learning ability, so the processing execution unit 164 can generate a high-quality learning model.

パラメータ勾配算出部１６８は、教師データの撮像画像に出現する対象物の勾配を算出する。例えば、横からカメラにより撮像された撮像画像と正面からカメラにより撮像された撮像画像とでは、同じ対象物であっても撮像画像は異なる。つまり、カメラの設置状況（例えば設置角度や画角）に応じて、同じ対象物を検出する際に用いられる学習モデルのモデルパラメータＰも異なってくる。このため、パラメータ勾配算出部１６８は、撮像方向を表す勾配（以下、「パラメータ勾配」という）を算出し、ネットワークＩ／Ｆ１６６を介して、パラメータ勾配Ｐｔをカメラ１０、サーバ３０、レコーダ５０等、他のデバイスに送信する。パラメータ勾配Ｐｔは、モデルパラメータと一緒に或いは別に送信されてもよい。いずれにせよ、カメラの設置状況は頻繁に変更されないので、パラメータ勾配Ｐｔは少なくとも１回送信されればよい。パラメータ勾配Ｐｔを用いることで、カメラの設置状況毎に異なる学習モデルが利用可能となる。 The parameter gradient calculation unit 168 calculates the gradient of the object appearing in the captured image of the teacher data. For example, the captured image captured by the camera from the side and the captured image captured by the camera from the front are different even if they are the same object. That is, the model parameter P of the learning model used when detecting the same object also differs depending on the installation state of the camera (for example, the installation angle and the angle of view). Therefore, the parameter gradient calculation unit 168 calculates a gradient representing the imaging direction (hereinafter referred to as “parameter gradient”), and sets the parameter gradient Pt to the camera 10, the server 30, the recorder 50, etc. via the network I / F166. Send to other devices. The parameter gradient Pt may be transmitted with or separately from the model parameters. In any case, the installation status of the camera does not change frequently, so the parameter gradient Pt may be transmitted at least once. By using the parameter gradient Pt, different learning models can be used depending on the installation status of the camera.

リソース監視部１６５は、処理実行部１６４内のＧＰＵやメモリ等の使用状況を基に、空きリソースの量を監視する。なお、デバイス１５０がカメラ１０である場合には、図５に示す処理実行部１６４及びパラメータ勾配算出部１６８は図３の処理実行部１４に対応し、図５に示すリソース監視部１６５は図３に示すリソース監視部１５に対応し、図５に示すネットワークＩ／Ｆ１６６は図３に示すネットワークＩ／Ｆ１６に対応する。 The resource monitoring unit 165 monitors the amount of free resources based on the usage of the GPU, memory, and the like in the processing execution unit 164. When the device 150 is the camera 10, the processing execution unit 164 and the parameter gradient calculation unit 168 shown in FIG. 5 correspond to the processing execution unit 14 shown in FIG. 3, the resource monitoring unit 165 shown in FIG. 5 corresponds to the resource monitoring unit 15 shown in FIG. 3, and the network I/F 166 shown in FIG. 5 corresponds to the network I/F 16 shown in FIG. 3.

図６は、カメラ１０の検出の概要例の説明図である。 FIG. 6 is an explanatory diagram of a schematic example of detection of the camera 10.

ここでは、カメラ１０が撮像画像に出現する「車」を対象物として検出する場合を例示して説明する。カメラ１０の処理実行部１４は、機械学習（例えばディープラーニングの処理）が行われた後の学習モデル（つまり、学習済みモデル）を有する。処理実行部１４は、レンズ１１を通して撮像された被写体の撮像画像ｏｇを入力し、学習済みモデルを用いて検出（つまり、撮像画像ｏｇに出現するオブジェクトの推論）を行い、その検出結果（つまり、推論結果）を出力する。クロップエンコード部１７は、被写体の撮像画像ｏｇに含まれる、対象物となる画像を切り出し、検出の結果として切り出し画像を出力する。 Here, a case where the camera 10 detects a "car" appearing in the captured image as the object is described as an example. The processing execution unit 14 of the camera 10 holds a learning model on which machine learning (for example, deep learning processing) has already been performed (that is, a trained model). The processing execution unit 14 receives the captured image og of the subject captured through the lens 11, performs detection using the trained model (that is, inference of the objects appearing in the captured image og), and outputs the detection result (that is, the inference result). The crop encoding unit 17 cuts out the image of the object contained in the captured image og of the subject and outputs the cut-out image as the detection result.

ここでは、クロップエンコード部１７によって切り出された、「車」の切り出し画像ｔｇ２と、「木」の切り出し画像ｔｇ１が出力される。「車」の切り出し画像ｔｇ２は、対象物となる車の撮像画像を含むので、高いスコアと正報を有する。一方、「木」の切り出し画像ｔｇ１は、対象物となる車の撮像画像を含まないので、低いスコアと誤報を有する。 Here, the cut-out image tg2 of the "car" and the cut-out image tg1 of the "tree" cut out by the crop encoding unit 17 are output. Since the cut-out image tg2 of the "car" includes a captured image of the car as an object, it has a high score and a positive report. On the other hand, the cut-out image tg1 of the "tree" does not include the captured image of the target vehicle, and therefore has a low score and a false alarm.

次に、本実施の形態の監視システム５の具体的な動作について、図面を参照して説明する。 Next, the specific operation of the monitoring system 5 of the present embodiment will be described with reference to the drawings.

図７は、監視システム５における複数のカメラを用いた学習時の分散を行う時の処理概要例の説明図である。 FIG. 7 is an explanatory diagram of an example of processing outline when performing distribution during learning using a plurality of cameras in the monitoring system 5.

前述したように、学習では、人工知能（ＡＩ）技術の機械学習の一例としてのディープラーニングの処理を用いて生成される学習モデル（つまり、ニューラルネットワーク）のモデルパラメータＰを更新する処理が行われる。一例として、学習を行うデバイスとして、３つのカメラ１０Ａ，１０Ｂ，１０Ｃが学習を行う場合を示す。なお、学習を行うデバイスは、カメラに限らず、サーバ、レコーダでもよい。各カメラ１０Ａ，１０Ｂ，１０Ｃは、それぞれ入力した撮像画像データに対し、例えば「教師なし学習」を行う。教師なし学習では、カメラ１０は、学習モデルのモデルパラメータが収束しない場合、アラームを発生する。このとき、ユーザは、アラームを解除して、「教師あり学習」を行う。教師あり学習では、ユーザは、画像データの正報或いは誤報を入力する。なお、教師データの入力では、画像データの正報或いは誤報を入力する代わりに、正報或いは誤報とともにスコア（評価値）を入力してもよい。スコアは、撮像画像データが対象物を含む撮像画像データであることを評価する値であり、例えば８０点，１０点等の点数や、５０％，２０％等の確率で表現される。 As described above, learning is the process of updating the model parameter P of a learning model (that is, a neural network) generated by deep learning processing, which is one example of machine learning in artificial intelligence (AI) technology. As an example, a case where three cameras 10A, 10B, and 10C serve as the learning devices is shown. The learning device is not limited to a camera and may be a server or a recorder. Each of the cameras 10A, 10B, and 10C performs, for example, "unsupervised learning" on the captured image data it has received. In unsupervised learning, the camera 10 raises an alarm when the model parameters of the learning model do not converge. In that case, the user clears the alarm and performs "supervised learning". In supervised learning, the user inputs whether the image data is a positive report or a false report. When entering teacher data, the user may input a score (evaluation value) together with the positive or false report instead of inputting the positive or false report alone. The score evaluates how likely the captured image data is to contain the object, and is expressed, for example, as points such as 80 or 10, or as a probability such as 50% or 20%.

３つのカメラ１０Ａ，１０Ｂ，１０Ｃは、それぞれ学習の結果であるモデルパラメータＰをサーバ３０に送信する。また、送信されるモデルパラメータＰには、前述したパラメータ勾配Ｐｔが付加される。 The three cameras 10A, 10B, and 10C each transmit the model parameter P, which is the result of learning, to the server 30. Further, the parameter gradient Pt described above is added to the transmitted model parameter P.

サーバ３０は、３つのカメラ１０Ａ，１０Ｂ，１０Ｃから送信されたモデルパラメータＰを基に、学習モデルのモデルパラメータＰを更新する。このとき、パラメータ勾配Ｐｔが同じであるモデルパラメータ、つまり、カメラの設置状況が同じであるモデルパラメータを統合する。従って、パラメータ勾配が同じである学習モデルのモデルパラメータが更新される。ここでは、カメラ１０Ａ，１０Ｂ，１０Ｃの設置状況はいずれも同じであり、サーバ３０は、カメラ１０Ａ，１０Ｂ，１０Ｃの、更新された各モデルパラメータを統合する。 The server 30 updates the model parameter P of the learning model based on the model parameter P transmitted from the three cameras 10A, 10B, and 10C. At this time, the model parameters having the same parameter gradient Pt, that is, the model parameters having the same camera installation status are integrated. Therefore, the model parameters of the learning model with the same parameter gradient are updated. Here, the installation status of the cameras 10A, 10B, and 10C is the same, and the server 30 integrates the updated model parameters of the cameras 10A, 10B, and 10C.
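
The integration step described above can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the source only says that model parameters sharing the same parameter gradient Pt are "integrated", so the element-wise mean used here, the string labels used for Pt, and the names `integrate_by_gradient` and `reports` are all assumptions introduced for the example.

```python
from collections import defaultdict


def integrate_by_gradient(reports):
    """Group reported model parameters by parameter gradient and merge each group.

    reports: list of (camera_id, parameter_gradient, model_parameters) tuples.
    The parameter gradient is assumed to be a hashable label for the
    installation status (hypothetical representation).
    """
    groups = defaultdict(list)
    for _camera_id, gradient, params in reports:
        groups[gradient].append(params)

    integrated = {}
    for gradient, param_lists in groups.items():
        # Assumption: "integration" is an element-wise mean of the parameters.
        integrated[gradient] = [sum(vals) / len(vals) for vals in zip(*param_lists)]
    return integrated


# Cameras 10A, 10B, and 10C share the same installation status, so their
# parameters end up in one group and are merged into a single model.
reports = [
    ("10A", "front", [1.0, 4.0]),
    ("10B", "front", [2.0, 5.0]),
    ("10C", "front", [3.0, 6.0]),
]
merged = integrate_by_gradient(reports)
```

Parameters reported with a different gradient label would land in a separate group and stay a separate model, which matches the statement that only cameras with the same installation status are integrated.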

サーバ３０は、統合したモデルパラメータを３つのカメラ１０Ａ，１０Ｂ，１０Ｃにフィードバック送信する。これにより、３つのカメラ１０Ａ，１０Ｂ，１０Ｃに記憶されるモデルパラメータは、同じになる。なお、３つのカメラ１０Ａ，１０Ｂ，１０Ｃからサーバ３０へのモデルパラメータの送信は、非同期で行われる。 The server 30 feeds back the integrated model parameters to the three cameras 10A, 10B, and 10C. As a result, the model parameters stored in the three cameras 10A, 10B, and 10C are the same. The model parameters are transmitted asynchronously from the three cameras 10A, 10B, and 10C to the server 30.

図８は、監視システム５におけるリソース管理の概要例の説明図である。 FIG. 8 is an explanatory diagram of an outline example of resource management in the monitoring system 5.

３つのカメラ１０Ａ，１０Ｂ，１０Ｃでは、リソース監視部１５は、それぞれ学習モデルを生成するＧＰＵ或いはＦＰＧＡ等の処理能力に対し、空きリソースの量（言い換えると、処理能力の余り度合いを示す余力）を監視している。３つのカメラ１０Ａ，１０Ｂ，１０Ｃは、リソース監視部１５によって監視された空きリソースの量を非同期で又は周期的にサーバ３０に通知する。空きリソースの量は、処理能力の百分率（％）で表される。一例として、カメラ１０Ａの空きリソースの量が９０％であり、カメラ１０Ｂの空きリソースの量が２０％であり、カメラ１０Ｃの空きリソースの量が１０％である場合、サーバ３０は、空きリソースの量が多いカメラ１０Ａに優先的に学習させるように、つまり、学習量を増やすように、このカメラ１０Ａに学習の指示を出力する。 In each of the three cameras 10A, 10B, and 10C, the resource monitoring unit 15 monitors the amount of free resources (in other words, the headroom remaining in the processing capacity) of the GPU, FPGA, or the like that generates the learning model. The three cameras 10A, 10B, and 10C notify the server 30 of the amount of free resources monitored by the resource monitoring unit 15, either asynchronously or periodically. The amount of free resources is expressed as a percentage (%) of the processing capacity. As an example, when the free resources are 90% for the camera 10A, 20% for the camera 10B, and 10% for the camera 10C, the server 30 outputs a learning instruction to the camera 10A so that the camera 10A, which has the most free resources, learns preferentially, that is, so that its learning amount increases.
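
The selection in this example amounts to picking the camera that reports the largest free-resource percentage. A minimal sketch, assuming the server keeps the latest reports in a dictionary keyed by camera ID (the name `pick_learning_camera` and the dictionary layout are hypothetical):

```python
def pick_learning_camera(free_resources):
    """Return the ID of the camera with the largest free-resource percentage."""
    return max(free_resources, key=free_resources.get)


# Values from the example in the text: 10A = 90%, 10B = 20%, 10C = 10%.
free = {"10A": 90, "10B": 20, "10C": 10}
chosen = pick_learning_camera(free)
```

With the figures from the text, the camera 10A is the one that receives the learning instruction.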

また、サーバ３０は、ネットワークＮＷの帯域が広い、或いはネットワークＮＷが空いている場合には、空きリソースの量が１０％と少ないカメラ１０Ｃで撮像された撮像画像データ（正報或いは誤報の情報付き）を受信すると、空きリソースの量が９０％と多いカメラ１０Ａにその撮像画像データを送信して学習を指示してもよい。これにより、カメラ間で偏った処理の負荷がかかることなく、適正な学習が実現可能となる。 When the bandwidth of the network NW is wide or the network NW is lightly loaded, the server 30, upon receiving captured image data (with positive-report or false-report information attached) captured by the camera 10C, whose free resources are as low as 10%, may transmit that captured image data to the camera 10A, whose free resources are as high as 90%, and instruct it to learn. This allows appropriate learning without the processing load being concentrated on particular cameras.

また、サーバ３０は、空きリソース量の少ないカメラで撮像された撮像画像データを、空きリソース量の多いカメラに直接に転送して、学習を行うように指示してもよい。これにより、監視システム内で学習を分散させることができ、特定のカメラに大きな負荷をかけることなく、効率の良い学習が可能である。 Further, the server 30 may instruct the camera to directly transfer the captured image data captured by the camera having a small amount of free resources to the camera having a large amount of free resources for learning. As a result, learning can be distributed within the monitoring system, and efficient learning is possible without imposing a heavy load on a specific camera.

また、サーバ３０は、空きリソース量の少ないカメラで撮像された撮像画像データを、空きリソース量の多いカメラに直接に転送して、検出を行うように指示してもよい。これにより、監視システム内で検出を分散させることができ、特定のカメラに大きな負荷をかけることなく、効率の良い検出が可能である。 Further, the server 30 may instruct the camera to directly transfer the captured image data captured by the camera having a small amount of free resources to the camera having a large amount of free resources for detection. As a result, the detection can be distributed in the surveillance system, and efficient detection is possible without imposing a heavy load on a specific camera.

また、サーバ３０は、空きリソース量の少ないカメラで撮像された撮像画像データを、空きリソース量の多いカメラに直接に転送して、分析を行うように指示してもよい。ここで、分析とは、撮像画像に出現する対象物（つまり、オブジェクト）ｏｂｊを追尾する、或いは、対象物が不審人物に該当するか否かを認識する、等の処理であり、分析の内容は本実施の形態では特に限定されない。これにより、監視システム内で分析を分散させることができ、特定のカメラに大きな負荷をかけることなく、効率の良い分析が可能である。 The server 30 may also instruct that captured image data captured by a camera with few free resources be transferred directly to a camera with many free resources for analysis. Here, analysis refers to processing such as tracking the object obj appearing in the captured image or recognizing whether the object corresponds to a suspicious person; the content of the analysis is not particularly limited in this embodiment. This distributes the analysis within the monitoring system and enables efficient analysis without placing a heavy load on any specific camera.

また、サーバ３０は、監視システム５の全体の処理能力を監視し、システム全体の空きリソースの量が多い場合、各カメラ１０に対し、学習量を増やすように指示し、一方、システム全体の空きリソースの量が少ない場合、各カメラ１０に対し、学習量を減らすように指示してもよい。これにより、監視システム全体に大きな負荷をかけることなく、適正な学習が可能となる。 Further, the server 30 monitors the overall processing capacity of the monitoring system 5, and when the amount of free resources in the entire system is large, the server 30 instructs each camera 10 to increase the learning amount, while the free space in the entire system. If the amount of resources is small, each camera 10 may be instructed to reduce the amount of learning. This enables proper learning without imposing a heavy load on the entire monitoring system.
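
A rough sketch of this system-wide adjustment follows. The source does not say how the overall free resources are computed or what counts as "large" or "small", so the mean over all cameras and the `high` and `low` thresholds below are purely illustrative assumptions, as is the function name `learning_adjustment`.

```python
def learning_adjustment(free_resources, high=60.0, low=30.0):
    """Decide whether cameras should learn more or less.

    Assumption: system-wide free resources are the mean of the per-camera
    percentages; the thresholds `high` and `low` are hypothetical.
    """
    system_free = sum(free_resources.values()) / len(free_resources)
    if system_free >= high:
        return "increase"
    if system_free <= low:
        return "decrease"
    return "keep"


busy_system = learning_adjustment({"10A": 10, "10B": 20})
idle_system = learning_adjustment({"10A": 90, "10B": 80})
```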

また、サーバ３０は、各カメラ１０による検出の結果を全てのカメラ１０に共有するように、指示してもよい。これにより、各カメラ１０に検出の結果を分散させることができ、次回以降の検出に用いることで検出精度の向上を図ることができる。 Further, the server 30 may instruct all cameras 10 to share the result of detection by each camera 10. As a result, the detection results can be distributed to each camera 10, and the detection accuracy can be improved by using the detection results for the next and subsequent detections.

また、サーバ３０は、カメラ１０の空きリソースの量が多い場合、このカメラ１０に送信する統合学習の結果のフィードバック量（例えばフィードバック回数）を増やすように指示し、一方、カメラ１０の空きリソースの量が少ない場合、このカメラ１０に送信する、統合学習の結果のフィードバック量（例えばフィードバック回数）を減らすように指示してもよい。これにより、カメラに大きな負荷をかけることなく、適正な量の学習の結果をカメラにフィードバックする（戻す）ことができる。 Further, when the amount of free resources of the camera 10 is large, the server 30 instructs to increase the feedback amount (for example, the number of feedbacks) of the result of the integrated learning transmitted to the camera 10, while the free resources of the camera 10 are used. If the amount is small, it may be instructed to reduce the feedback amount (for example, the number of feedbacks) of the result of the integrated learning transmitted to the camera 10. This makes it possible to feed back (return) an appropriate amount of learning results to the camera without imposing a heavy load on the camera.

また、３つのカメラ１０Ａ，１０Ｂ，１０Ｃは、それぞれ単価コストに関する情報（例えば電力コストに関する情報）をサーバ３０に通知する。電力コストは、カメラ固有の値であり、例えばワット／フレーム（Ｗ／ｆｒａｍｅ）の単位で表現される。一例として、カメラ１０Ａでは１／２００、カメラ１０Ｂでは１／２００、カメラ１０Ｃでは１／４００が挙げられる。なお、電力コストは、通常カメラの使用状況によって大きく変化しないので、１回の通知で充分である。また、電力コストの単位は、フレーム／ワット（ｆｒａｍｅ／Ｗ）で表現されてもよい。 Further, each of the three cameras 10A, 10B, and 10C notifies the server 30 of information regarding the unit price cost (for example, information regarding the power cost). The power cost is a value unique to the camera and is expressed in units of watts / frames (W / frame), for example. As an example, the camera 10A has 1/200, the camera 10B has 1/200, and the camera 10C has 1/400. Since the power cost does not usually change significantly depending on the usage status of the camera, one notification is sufficient. Further, the unit of power cost may be expressed in frames / watts (frame / watt).

サーバ３０は、カメラ１０Ａとカメラ１０Ｂの電力コストが同じように高い場合、電力コストの低いカメラ１０Ｃに対し、優先的に学習を割り当てる。 When the power costs of the camera 10A and the camera 10B are similarly high, the server 30 preferentially allocates learning to the camera 10C having a low power cost.

サーバ３０は、カメラ１０Ａ，１０Ｂ，１０Ｃの空きリソースの量が同じ或いは同程度である場合、例えばカメラ１０Ａの空きリソースの量が１０％であり、カメラ１０Ｂ，１０Ｃの空きリソースの量がいずれも４５％である場合、電力コストのかからないカメラ１０Ｃで優先的に学習するように、このカメラ１０Ｃに学習の指示を出力する。 When the amounts of free resources of the cameras 10A, 10B, and 10C are the same or similar, for example when the free resources of the camera 10A are 10% and those of the cameras 10B and 10C are both 45%, the server 30 outputs a learning instruction to the camera 10C so that the camera 10C, which has the lower power cost, learns preferentially.
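
One way to read this rule is as a tie-break: among the cameras whose free resources are close to the best value, choose the one with the lowest power cost. The sketch below assumes that reading; the `tolerance` that defines "similar", the dictionary layout, and the name `pick_by_cost` are hypothetical and not taken from the source.

```python
from fractions import Fraction


def pick_by_cost(free_resources, power_cost, tolerance=5):
    """Among cameras with near-maximal free resources, pick the cheapest one.

    tolerance: how many percentage points below the best value still counts
    as "similar" (an assumed parameter).
    """
    best_free = max(free_resources.values())
    candidates = [cam for cam, free in free_resources.items()
                  if best_free - free <= tolerance]
    return min(candidates, key=lambda cam: power_cost[cam])


# Figures from the text: free resources 10%, 45%, 45%; power cost in W/frame.
free = {"10A": 10, "10B": 45, "10C": 45}
cost = {"10A": Fraction(1, 200), "10B": Fraction(1, 200), "10C": Fraction(1, 400)}
chosen = pick_by_cost(free, cost)
```

Here the cameras 10B and 10C tie on free resources, and the camera 10C is selected because 1/400 W/frame is lower than 1/200 W/frame.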

なお、カメラの空きリソースの多寡に拘わらず、サーバ３０は、コスト優先で電力コストの低いカメラで学習を実行するように指示してもよい。また、各デバイスの空きリソース及び電力コストの管理を、サーバ３０が行っていたが、各カメラやレコーダが管理してもよく、その場合、空きリソース及び電力コストを監視システム５内の全てデバイス１５０で共有できる。従って、空きリソース及び電力コストを考慮して、各デバイスは、処理の指示実行を行うことも可能となり、多様な運用が可能となる。 Regardless of the amount of free resources of the camera, the server 30 may instruct the camera to perform learning with a camera that gives priority to cost and has a low power cost. Further, although the server 30 manages the free resources and the power cost of each device, each camera or the recorder may manage the free resources and the power cost. In that case, the free resources and the power cost are managed by all the devices 150 in the monitoring system 5. Can be shared with. Therefore, in consideration of free resources and power cost, each device can execute an instruction for processing, and various operations are possible.

図９は、実施の形態１においてサーバ３０がカメラ１０に処理の実行指示を行う動作手順の一例を詳細に示すシーケンス図である。 FIG. 9 is a sequence diagram showing in detail an example of an operation procedure in which the server 30 gives an instruction to execute a process to the camera 10 in the first embodiment.

図９の動作手順では、サーバ３０は、カメラ１０の空きリソースの情報を基に、複数のカメラ１０の中から、分散処理の対象となるカメラ１０を決定し、該当するカメラ１０に処理実行を指示する。カメラの台数Ｎは、任意の台数でよく、ここでは、説明を簡単にするために２台（カメラ１０Ａ，１０Ｂ）を例示する。なお、サーバ３０の代わりに、レコーダ５０が処理実行を指示するカメラ１０を決定してもよい。 In the operation procedure of FIG. 9, the server 30 determines the camera 10 to be distributed processing from among the plurality of cameras 10 based on the information of the free resources of the camera 10, and executes the processing on the corresponding camera 10. Instruct. The number N of cameras may be any number, and here, two cameras (cameras 10A and 10B) are exemplified for the sake of simplicity. Instead of the server 30, the recorder 50 may determine the camera 10 instructing the processing execution.

カメラ１０Ａは、リソース監視部１５によって監視された空きリソースの情報を繰り返し（例えば常時、又は周期的に）サーバ３０に通知する（Ｔ１）。同様に、カメラ１０Ｂは、リソース監視部１５によって監視された空きリソースの情報を繰り返し（例えば常時、又は周期的に）サーバ３０に通知する（Ｔ２）。 The camera 10A repeatedly (for example, constantly or periodically) notifies the server 30 of the information of the free resource monitored by the resource monitoring unit 15 (T1). Similarly, the camera 10B repeatedly (for example, constantly or periodically) notifies the server 30 of the information of the free resource monitored by the resource monitoring unit 15 (T2).

サーバ３０は、カメラ１０Ａ，１０Ｂの空きリソースの情報をテーブルメモリ３５に登録して管理する（Ｔ３）。サーバ３０は、所定値（例えば７０％）以上の空きリソースを有する少なくとも１台のカメラの有無を判別する（Ｔ４）。ここでは、カメラ１０Ｂだけが所定値以上の空きリソースを有すると想定する。 The server 30 registers and manages information on free resources of the cameras 10A and 10B in the table memory 35 (T3). The server 30 determines the presence or absence of at least one camera having a predetermined value (for example, 70%) or more of free resources (T4). Here, it is assumed that only the camera 10B has a free resource of a predetermined value or more.

所定値以上の空きリソースを有するカメラがある場合（Ｔ４、ＹＥＳ）、サーバ３０は、該当するカメラに対する検出と学習との両方の実行指示を生成する（Ｔ５）。サーバ３０は、カメラ１０Ｂに対し、ネットワークＮＷを経由して、検出と学習との両方の実行指示を送信する（Ｔ６）。カメラ１０Ｂは、該当する処理を実行する（Ｔ７）。 When there is a camera having a free resource of a predetermined value or more (T4, YES), the server 30 generates an execution instruction for both detection and learning for the corresponding camera (T5). The server 30 transmits an execution instruction for both detection and learning to the camera 10B via the network NW (T6). The camera 10B executes the corresponding process (T7).

一方、手順Ｔ４で所定値以上の空きリソースを有するカメラが無い場合（Ｔ４、ＮＯ）、サーバ３０は、全てのカメラ（ここでは、カメラ１０Ａ，１０Ｂ）に対する検出の実行指示を生成する（Ｔ８）。カメラ１０Ａ，１０Ｂは、学習を実行できる程の空きリソースを有していないので、検出のみを行うことになる。サーバ３０は、検出の実行指示を全てのカメラ（ここでは、カメラ１０Ａ，１０Ｂ）に送信する（Ｔ９）。カメラ１０Ａ，１０Ｂは、それぞれ該当する処理を実行する（Ｔ１０，Ｔ１１）。 On the other hand, when there is no camera having a free resource of a predetermined value or more in the procedure T4 (T4, NO), the server 30 generates a detection execution instruction for all the cameras (here, the cameras 10A and 10B) (T8). .. Since the cameras 10A and 10B do not have enough free resources to execute learning, only detection is performed. The server 30 transmits a detection execution instruction to all cameras (here, cameras 10A and 10B) (T9). The cameras 10A and 10B execute the corresponding processes, respectively (T10, T11).

手順Ｔ７でカメラ１０Ｂが該当する処理を実行した場合、カメラ１０Ｂは、学習結果を生成し（Ｔ１２）、生成した学習結果をサーバ３０に送信する（Ｔ１３）。 When the camera 10B executes the corresponding process in the procedure T7, the camera 10B generates a learning result (T12) and transmits the generated learning result to the server 30 (T13).
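
The branch at step T4 of FIG. 9 can be sketched as below. This is only an illustration of the decision logic in steps T4 to T9: the 70% threshold is the example value from the text, while the function name `build_instructions`, the tuple encoding of instructions, and the dictionary of reported free resources are assumptions. The source does not state what the remaining cameras receive in the YES branch, so the sketch returns instructions only for the qualifying cameras.

```python
THRESHOLD = 70  # example predetermined value (%) given in the text


def build_instructions(free_resources, threshold=THRESHOLD):
    """Return {camera_id: tuple of processes to execute}.

    T4 YES: cameras at or above the threshold get detection and learning.
    T4 NO:  every camera gets detection only.
    """
    capable = [cam for cam, free in free_resources.items() if free >= threshold]
    if capable:
        return {cam: ("detection", "learning") for cam in capable}
    return {cam: ("detection",) for cam in free_resources}


# Only camera 10B has enough free resources, as assumed in the text.
with_capacity = build_instructions({"10A": 30, "10B": 80})
# No camera reaches the threshold, so all cameras perform detection only.
without_capacity = build_instructions({"10A": 30, "10B": 40})
```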

図１０は、実施の形態１においてサーバ３０がモデルパラメータのフィードバック量を制御する動作手順の一例を詳細に示すシーケンス図である。 FIG. 10 is a sequence diagram showing in detail an example of an operation procedure in which the server 30 controls the feedback amount of the model parameters in the first embodiment.

図１０の動作手順では、サーバ３０は、カメラ１０の空きリソースの情報を基に、モデルパラメータのフィードバック量を制御する。カメラの台数Ｎは、任意の台数でよく、ここでは、説明を簡単にするために２台（カメラ１０Ａ，１０Ｂ）である。なお、サーバ３０の代わりに、レコーダ５０がモデルパラメータをフィードバックするカメラ１０を決定してもよい。 In the operation procedure of FIG. 10, the server 30 controls the feedback amount of the model parameter based on the information of the free resource of the camera 10. The number N of the cameras may be any number, and here, two cameras (cameras 10A and 10B) are used for the sake of simplicity. Instead of the server 30, the recorder 50 may determine the camera 10 to feed back the model parameters.

サーバ３０は、カメラ１０Ａ，１０Ｂからそれぞれ学習結果であるモデルパラメータを受信し、学習用ＤＢ３４に蓄積する（Ｔ２１）。サーバ３０は、各カメラ１０Ａ，１０Ｂの空きリソースの量に応じて、学習結果である多くのモデルパラメータの中から、推論（検出）処理時に用いる学習モデルのモデルパラメータのフィードバック量をカメラごとに算出する（Ｔ２２）。 The server 30 receives model parameters, which are learning results, from the cameras 10A and 10B, respectively, and stores them in the learning DB 34 (T21). The server 30 calculates the feedback amount of the model parameter of the learning model used at the time of inference (detection) processing from many model parameters which are the learning results according to the amount of free resources of each camera 10A and 10B for each camera. (T22).

サーバ３０は、カメラ１０Ｂに対し、算出されたフィードバック量分のモデルパラメータのデータを送信する（Ｔ２３）。同様に、サーバ３０は、カメラ１０Ａに対し、算出されたフィードバック量分のモデルパラメータのデータを送信する（Ｔ２４）。カメラ１０Ｂは、サーバ３０から受信したモデルパラメータを、処理実行部１４のメモリに追加登録して蓄積する（Ｔ２５）。同様に、カメラ１０Ａは、サーバ３０から受信したモデルパラメータを、処理実行部１４のメモリに追加登録して蓄積する（Ｔ２６）。 The server 30 transmits the calculated feedback amount of model parameter data to the camera 10B (T23). Similarly, the server 30 transmits the calculated feedback amount of model parameter data to the camera 10A (T24). The camera 10B additionally registers and stores the model parameters received from the server 30 in the memory of the processing execution unit 14 (T25). Similarly, the camera 10A additionally registers and stores the model parameters received from the server 30 in the memory of the processing execution unit 14 (T26).

なお、ここでは、フィードバック量は、各カメラの空きリソースの情報を基に、サーバ３０により決定されたが、空きリソースに限らず、教師データに基づく正報検出数や教師データに基づく誤報検出数に応じて、決定されてもよい。 Here, the amount of feedback is determined by the server 30 based on the information of the free resources of each camera, but it is not limited to the free resources, but the number of positive reports detected based on the teacher data and the number of false alarms detected based on the teacher data. May be determined accordingly.
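
Step T22 does not specify how the feedback amount is derived from the free resources. As one hedged possibility, the sketch below makes the amount proportional to each camera's free-resource percentage; the proportional rule, the name `feedback_amounts`, and the notion of a count of stored model parameters are all assumptions for illustration.

```python
def feedback_amounts(free_resources, stored_parameter_count):
    """Assumed rule: feed back a share of the stored model parameters
    proportional to each camera's free-resource percentage (rounded down)."""
    return {cam: stored_parameter_count * free // 100
            for cam, free in free_resources.items()}


# Hypothetical figures: 50 stored parameter sets, cameras at 20% and 80% free.
amounts = feedback_amounts({"10A": 20, "10B": 80}, 50)
```

A camera with more headroom receives more of the accumulated parameters, which follows the direction described in the text even though the exact formula is not given there.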

図１１は、監視システム５における学習結果の共有の概要例の説明図である。 FIG. 11 is an explanatory diagram of an outline example of sharing learning results in the monitoring system 5.

各カメラ１０（１０Ａ，１０Ｂ，１０Ｃ）は、撮像により得られた撮像画像データを用いてローカル学習を行い、モデルパラメータを更新する。また、各カメラ１０は、正報が得られた撮像画像データだけを用いて学習を行うことができ、学習の結果であるモデルパラメータの精度を向上できる。また、カメラ１０は、オプションとして接続された表示器１９に、ローカル学習において、撮像画像データを評価するためのＵＩ画面３２０（図１２参照）を表示可能である。また、カメラ１０は、ローカル学習時のＵＩ画面３２０をサーバ３０の表示部３７に表示させることも可能である。 Each camera 10 (10A, 10B, 10C) performs local learning using the captured image data obtained by imaging, and updates the model parameters. In addition, each camera 10 can perform learning using only the captured image data for which the correct report has been obtained, and can improve the accuracy of the model parameters that are the result of the learning. Further, the camera 10 can display the UI screen 320 (see FIG. 12) for evaluating the captured image data in the local learning on the display 19 connected as an option. Further, the camera 10 can display the UI screen 320 at the time of local learning on the display unit 37 of the server 30.

図１２は、ローカル学習時に表示されるＵＩ画面３２０を示す図である。 FIG. 12 is a diagram showing a UI screen 320 displayed during local learning.

ＵＩ画面３２０は、例えばカメラ１０のローカル学習時に、カメラ１０と通信可能に接続されたサーバ３０の表示部３７又はＰＣ（図示略）の表示部において表示され、具体的には、撮像画像データから切り出された学習データごとに、正誤の判定、カメラＩＤ、リジェクトボタンｂｘを表示する。なお、撮像画像データのサムネイルは、カメラ１０が元の撮像画像データを記憶しているので、ここでは表示されないが、表示されるようにしてもよい。検出の対象物（オブジェクト）ｏｂｊは「人」である。 The UI screen 320 is displayed, for example, on the display unit 37 of the server 30 or the display unit of a PC (not shown) communicably connected to the camera 10 during local learning of the camera 10, and specifically, from the captured image data. For each of the cut out learning data, the correctness judgment, the camera ID, and the reject button bx are displayed. The thumbnail of the captured image data is not displayed here because the camera 10 stores the original captured image data, but it may be displayed. The object (object) obj to be detected is a "person".

サーバ３０は、正誤の判定処理において、撮像画像データに対象物ｏｂｊを検出できた場合に正報と判定し、撮像画像データに対象物ｏｂｊを検出できなかった場合に誤報と判定する。なお、ユーザが、サーバ３０の表示部３７に表示されたＵＩ画面３２０に対して入力することで、サーバ３０は、正報或いは誤報を判定してもよい。 In the correct/incorrect determination processing, the server 30 determines a positive report when the object obj is detected in the captured image data, and a false report when the object obj is not detected in the captured image data. The server 30 may also determine a positive report or a false report from the user's input on the UI screen 320 displayed on the display unit 37 of the server 30.

カメラＩＤは、学習データを得るために撮像したカメラの識別情報である。 The camera ID is the identification information of the camera imaged to obtain the learning data.

リジェクトボタンｂｘは、ユーザにより選択され、チェックマークが表示される。リジェクトボタンｂｘにチェックマークが付加された学習データは、ユーザが学習ボタンｂｔ５を押下すると、学習に用いられなくなる。 The reject button bx is selected by the user and a check mark is displayed. The learning data with a check mark added to the reject button bx is not used for learning when the user presses the learning button bt5.
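
The effect of the reject button can be sketched as a simple filter over the learning data listed on the UI screen. The `LearningItem` record and its field names are hypothetical stand-ins for the rows of UI screen 320; only the behavior (checked items are excluded once learning is started) comes from the text.

```python
from dataclasses import dataclass


@dataclass
class LearningItem:
    camera_id: str          # camera that captured the learning data
    is_positive: bool       # True for a positive report, False for a false report
    rejected: bool = False  # True when the reject button bx is checked


def select_for_learning(items):
    """Return the items that remain after the learning button is pressed."""
    return [item for item in items if not item.rejected]


items = [
    LearningItem("10A", is_positive=True),
    LearningItem("10B", is_positive=False, rejected=True),
    LearningItem("10C", is_positive=True),
]
selected = select_for_learning(items)
```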

カメラ１０は、自動的に、誤報の撮像画像データを採用する学習に用いず、正報の撮像画像データを採用する学習に用いたが、カメラ１０の代わりに、ユーザがリジェクトボタンｂｘを用いて撮像画像データを指示してもよい。例えば、ユーザは、誤報の撮像画像データを採用する学習に用いず、正報の撮像画像データを採用する学習に用いるように指示してもよい。これにより、誤報の撮像画像データを用いて学習することができる。また、カメラ１０は、正報の撮像画像データと誤報の撮像画像データとを組み合わせて学習に用いてもよい。これにより、撮像画像データの品質に照らして、学習に用いる撮像画像データを選別できる。 In the description above, the camera 10 automatically excludes false-report captured image data from learning and uses positive-report captured image data for learning, but instead of the camera 10 deciding this, the user may designate the captured image data with the reject button bx. For example, the user may instruct that false-report captured image data not be used for learning and that positive-report captured image data be used. As a result, learning can be performed using the false-report captured image data. The camera 10 may also use a combination of positive-report and false-report captured image data for learning. This allows the captured image data used for learning to be selected in light of its quality.

サーバ３０は、各カメラ１０（１０Ａ，１０Ｂ，１０Ｃ）から送信されたモデルパラメータＰを受信し、受信した各モデルパラメータＰを合算する統合学習を行い、合算したモデルパラメータＰを学習用ＤＢ３４に追加する。ここで、統合されるモデルパラメータは、設置状況が同じカメラで撮像された画像データを基に得られたモデルパラメータである。一方、設置状況が異なるカメラで撮像された画像データを基に得られるモデルパラメータは、合算されず、別々の学習モデルに対するモデルパラメータとして個別に登録される。 The server 30 receives the model parameter P transmitted from each camera 10 (10A, 10B, 10C), performs integrated learning that sums the received model parameters P, and adds the summed model parameter P to the learning DB 34. The model parameters that are integrated here are those obtained from image data captured by cameras with the same installation status. Model parameters obtained from image data captured by cameras with different installation statuses are not summed; they are registered individually as model parameters for separate learning models.

図１３は、統合学習時にサーバ３０の表示部３７に表示されるＵＩ画面３１０を示す図である。 FIG. 13 is a diagram showing a UI screen 310 displayed on the display unit 37 of the server 30 during integrated learning.

サーバ３０は、表示部３７に、統合学習時のＵＩ画面３１０（図１３参照）を表示可能である。ＵＩ画面３１０は、撮像画像データから切り出された学習データごとに、正誤の判定、サムネイル、カメラＩＤ、リジェクトボタンｂｘを表示する。ここでは、検出の対象物（オブジェクト）が「人」である場合を示す。 The server 30 can display the UI screen 310 (see FIG. 13) at the time of integrated learning on the display unit 37. The UI screen 310 displays a correct / incorrect determination, a thumbnail, a camera ID, and a reject button bx for each learning data cut out from the captured image data. Here, the case where the object (object) to be detected is a “person” is shown.

サーバ３０は、正誤の判定処理では、撮像画像データに対象物ｏｂｊを検出できた場合に正報と判定し、撮像画像データに対象物ｏｂｊを検出できなかった場合に誤報と判定する。なお、ユーザが、サーバ３０の表示部３７に表示されたＵＩ画面３１０に対して入力することで、サーバ３０は、正報或いは誤報を判定してもよい。 In the correct/incorrect determination processing, the server 30 determines a positive report when the object obj is detected in the captured image data, and a false report when the object obj is not detected in the captured image data. The server 30 may also determine a positive report or a false report from the user's input on the UI screen 310 displayed on the display unit 37 of the server 30.

サムネイルは、学習データの縮小画像である。サムネイルであるので、カメラ１０からサーバ３０に送信される際、データ転送量は抑えられる。カメラＩＤは、学習データを得るために撮像したカメラの識別情報である。 Thumbnails are reduced images of learning data. Since it is a thumbnail, the amount of data transfer can be suppressed when it is transmitted from the camera 10 to the server 30. The camera ID is the identification information of the camera imaged to obtain the learning data.

サーバ３０は、自動的に正報の撮像画像データを採用するように学習し（つまり、正報の撮像画像データの検出に用いたモデルパラメータを蓄積するように学習し）、誤報の撮像画像データを排除するように学習する（つまり、誤報の撮像画像データの検出に用いたモデルパラメータを蓄積しないように学習する）。但し、学習に用いる撮像画像データの選択について、ユーザが主体的にリジェクトボタンｂｘを用いて、学習に用いる撮像画像データが指示されてもよい。また、ユーザは、誤報の撮像画像データを排除する学習を行わず、正報の撮像画像データを採用する学習を行うように指示してもよい。 The server 30 automatically learns to adopt positive-report captured image data (that is, it learns so as to accumulate the model parameters used to detect positive-report captured image data) and learns to exclude false-report captured image data (that is, it learns so as not to accumulate the model parameters used to detect false-report captured image data). However, the user may take the initiative and designate the captured image data to be used for learning with the reject button bx. The user may also instruct the server to perform only the learning that adopts positive-report captured image data, without the learning that excludes false-report captured image data.

このように、サーバ３０がモデルパラメータを統合学習することで、モデルパラメータの学習の精度が向上する。サーバ３０は、統合学習の結果である更新されたモデルパラメータを、該当するカメラ１０にフィードバック送信する。これにより、各カメラ１０で得られる撮像画像データの正報が多くなるほど、カメラの検出精度が高くなる。 In this way, the server 30 performs integrated learning of the model parameters, so that the accuracy of learning the model parameters is improved. The server 30 feeds back the updated model parameters that are the result of the integrated learning to the corresponding camera 10. As a result, the more positive reports of the captured image data obtained by each camera 10, the higher the detection accuracy of the cameras.

また、サーバ３０は、統合学習の結果である更新されたモデルパラメータＰを、各カメラ１０にフィードバック送信する際、各カメラ１０の正報の数に応じて、フィードバック量を制御する。つまり、サーバ３０は、誤報の数が多いカメラ１０に対し、フィードバック量（例えばフィードバック回数）が多くなるように、更新済みモデルパラメータを送信する。これにより、正報の数が増加し、カメラの検出精度が向上する。 Further, when the updated model parameter P, which is the result of the integrated learning, is fed back to each camera 10, the server 30 controls the amount of feedback according to the number of positive reports of each camera 10. That is, the server 30 transmits updated model parameters to the camera 10 having a large number of false alarms so that the amount of feedback (for example, the number of feedbacks) is large. This increases the number of positive reports and improves the detection accuracy of the camera.

一方、サーバ３０は、正報の数が多いカメラ１０に対し、フィードバック量（例えばフィードバック回数）が少なくなるように、更新済みモデルパラメータを送信する。これにより、カメラの処理負荷を軽減できる。なお、サーバ３０は、設置環境が同じであるカメラに対し、同一の更新済みのモデルパラメータを送信して共有させることは、前述した通りである。 On the other hand, the server 30 transmits updated model parameters to the camera 10 having a large number of positive reports so that the amount of feedback (for example, the number of feedbacks) is small. As a result, the processing load of the camera can be reduced. As described above, the server 30 transmits and shares the same updated model parameters to cameras having the same installation environment.

また、サーバ３０は、各カメラ１０に対し、学習の実行指示を行う際、各カメラ１０の正報の数に応じて、学習の量を指示する。誤報の数が多いカメラ１０に対し、学習量が多くなるように、学習の実行指示を行う。これにより、正報の数が増加し、カメラの検出精度が向上する。一方、サーバ３０は、正報の数が多いカメラ１０に対し、学習量が少なくなるように、学習の実行指示を行う。これにより、カメラの処理負荷を軽減できる。 Further, when the server 30 instructs each camera 10 to execute learning, it specifies the amount of learning according to the number of positive reports of that camera 10. For a camera 10 having a large number of false alarms, the server 30 issues the learning execution instruction so that the learning amount is large. This increases the number of positive reports and improves the detection accuracy of the camera. On the other hand, for a camera 10 having a large number of positive reports, the server 30 issues the learning execution instruction so that the learning amount is small. As a result, the processing load of the camera can be reduced.
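The relationship described above (more learning for cameras with many false alarms, less for cameras with many positive reports) could be sketched as follows. This is only an illustrative sketch: the `CameraReport` type, the `learning_amount` function, the linear mapping, and the step limits of 10 and 100 are assumptions introduced here and do not appear in the disclosure.

```python
from dataclasses import dataclass


@dataclass
class CameraReport:
    """Hypothetical per-camera counters kept by the server."""
    camera_id: str
    positive_reports: int  # detections judged correct
    false_alarms: int      # detections judged wrong


def learning_amount(report: CameraReport, min_steps: int = 10, max_steps: int = 100) -> int:
    """Return a learning amount that shrinks as the share of positive reports grows.

    The linear mapping and the step limits are assumptions for illustration.
    """
    total = report.positive_reports + report.false_alarms
    if total == 0:
        # No evidence yet: assume the camera still needs the full amount of learning.
        return max_steps
    positive_ratio = report.positive_reports / total
    return round(max_steps - (max_steps - min_steps) * positive_ratio)
```

The same mapping could equally be applied to the feedback amount (for example, the number of feedback transmissions) instead of the learning amount.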

また、サーバ３０は、各カメラ１０で撮像された撮像画像に出現する対象物を検出する検出の結果を統合して管理してもよい。検出の結果を統合する場合、対象物の動きをベクトルで表し、ベクトルで検出の結果を管理してもよい。 Further, the server 30 may integrate and manage the detection result of detecting the object appearing in the captured image captured by each camera 10. When integrating the detection results, the movement of the object may be represented by a vector, and the detection results may be managed by the vector.

以上により、第１の実施形態の監視システム５では、サーバ３０と、監視エリアＳＡに設置された複数のカメラ１０とが互いに通信可能に接続される。サーバ３０は、それぞれのカメラ１０の空きリソース（つまり、処理能力に関する情報）と、それぞれのカメラ１０により監視エリアＳＡの撮像により得られた撮像画像のデータとを保持するテーブルメモリ３５を有する。サーバ３０は、カメラ１０の処理能力に関する情報に基づいて、それぞれのカメラ１０により得られる撮像画像に出現する少なくとも１つの対象物（オブジェクト）ｏｂｊの検出に関してカメラ１０が実行する処理をカメラ１０ごとに決定し、決定された処理の実行指示をカメラ１０ごとに送信する。それぞれのカメラ１０は、サーバ３０から送信された処理の実行指示に基づいて、実行指示に対応する処理を実行する。 As described above, in the monitoring system 5 of the first embodiment, the server 30 and the plurality of cameras 10 installed in the monitoring area SA are communicably connected to each other. The server 30 has a table memory 35 that holds the free resources of each camera 10 (that is, information regarding its processing capacity) and the data of the captured images obtained by each camera 10 imaging the monitoring area SA. Based on the information regarding the processing capacity of the cameras 10, the server 30 determines, for each camera 10, the process that the camera 10 is to execute in relation to the detection of at least one object obj appearing in the captured image obtained by that camera 10, and transmits an execution instruction for the determined process to each camera 10. Each camera 10 executes the process corresponding to the execution instruction transmitted from the server 30.
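As a rough illustration of how a server might decide the process for each camera from the free resources held in its table, consider the sketch below. The process names follow the learning, detection, and analysis processes named in this description, but the cost figures, the greedy rule, and the names `PROCESS_COST` and `decide_processes` are hypothetical and are not taken from the disclosure.

```python
from typing import Dict, List

# Hypothetical cost of each process, in the same arbitrary units as the free resources.
PROCESS_COST: Dict[str, int] = {"learning": 50, "detection": 20, "analysis": 10}


def decide_processes(free_resources: Dict[str, int]) -> Dict[str, List[str]]:
    """For each camera, pick the processes that fit into its free resources."""
    plan: Dict[str, List[str]] = {}
    for camera_id, free in free_resources.items():
        assigned: List[str] = []
        # Try the most expensive process first so capable cameras take on learning.
        for name, cost in sorted(PROCESS_COST.items(), key=lambda item: -item[1]):
            if cost <= free:
                assigned.append(name)
                free -= cost
        plan[camera_id] = assigned
    return plan
```

Under these assumed costs, a camera with ample free resources would be instructed to run all three processes, while a heavily loaded camera would receive no additional work.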

これにより、監視システム５は、監視エリアＳＡに設置された複数のカメラ１０において撮像されたそれぞれの撮像画像内の少なくとも１つのオブジェクトの検出に際し、その検出に用いるパラメータの学習等の処理を複数のカメラ１０間で分散でき、ネットワーク上のトラフィックの増大を抑制し、複数のカメラ１０に接続されるサーバ３０の処理負荷の軽減を支援することができる。 As a result, when detecting at least one object in each of the captured images captured by the plurality of cameras 10 installed in the monitoring area SA, the monitoring system 5 can distribute processing, such as the learning of the parameters used for the detection, among the plurality of cameras 10. This suppresses an increase in traffic on the network and helps reduce the processing load of the server 30 connected to the plurality of cameras 10.

また、上述した処理は、撮像画像に出現する少なくとも１つの対象物（オブジェクト）ｏｂｊの検出に用いるモデルパラメータＰを学習する学習である。これにより、監視システム５は、負荷の大きな学習を複数のカメラ１０に分散させることができる。 Further, the above-mentioned processing is learning to learn the model parameter P used for detecting at least one object (object) obj appearing in the captured image. As a result, the monitoring system 5 can distribute the learning with a heavy load to a plurality of cameras 10.

また、サーバ３０は、複数のカメラ１０に対し、学習の実行指示をそれぞれ送信する。複数のカメラ１０は、それぞれ学習の実行指示に従って、学習を実行する。サーバ３０は、複数のカメラ１０により実行された学習の結果を受信する。これにより、サーバ３０は、例えば自装置で学習することなく、複数のカメラ１０から学習の結果を得ることができる。 Further, the server 30 transmits a learning execution instruction to each of the plurality of cameras 10. Each of the plurality of cameras 10 executes learning according to a learning execution instruction. The server 30 receives the result of learning executed by the plurality of cameras 10. As a result, the server 30 can obtain the learning result from the plurality of cameras 10 without learning by the own device, for example.

また、サーバ３０は、自身で学習を実行するとともに、複数のカメラ１０に学習の実行指示をそれぞれ送信する。複数のカメラ１０は、それぞれ学習の実行指示に従い、学習を実行する。サーバ３０は、複数のカメラ１０により実行された学習の結果を受信する。これにより、サーバ３０は、複数のカメラ１０から得た学習の結果に、自装置の学習結果を加えることができ、次回以降の学習の効率化を図ることができる。 Further, the server 30 executes the learning by itself and also transmits the learning execution instruction to the plurality of cameras 10. Each of the plurality of cameras 10 executes learning according to the learning execution instruction. The server 30 receives the result of learning executed by the plurality of cameras 10. As a result, the server 30 can add the learning result of its own device to the learning result obtained from the plurality of cameras 10, and can improve the efficiency of learning from the next time onward.

また、サーバ３０は、学習の結果を複数の前記カメラに送信する。複数のカメラ１０は、学習の結果を共有する。これにより、複数のカメラは、同じ学習の結果を利用できる。 Further, the server 30 transmits the learning result to the plurality of cameras. The plurality of cameras 10 share the learning result. This allows multiple cameras to take advantage of the same learning results.

また、複数のカメラ１０のうち一部の複数のカメラ１０は同一の設置状況で設置される。サーバ３０は、学習の結果を、設置状況が同じである一部の複数のカメラ１０にそれぞれ送信する。設置状況が同じである一部の複数のカメラ１０は、サーバ３０から送信された学習の結果を共有する。これにより、監視システム５は、設置状況が同じである複数のカメラ１０によるオブジェクトの検出精度を高めることができる。 Further, some of the plurality of cameras 10 among the plurality of cameras 10 are installed in the same installation situation. The server 30 transmits the learning result to each of a plurality of cameras 10 having the same installation status. Some of the plurality of cameras 10 having the same installation status share the learning result transmitted from the server 30. As a result, the surveillance system 5 can improve the detection accuracy of the object by the plurality of cameras 10 having the same installation condition.

また、サーバ３０は、カメラ１０により検出された対象物（オブジェクト）ｏｂｊの検出数に応じて、学習の処理量を制御する。これにより、サーバ３０は、オブジェクトの検出数が多くて、処理の負荷が大きいカメラに対し、負荷を増加させるような、学習の量を減らすことができる。一方、サーバ３０は、オブジェクトの検出数が少なくて、処理の負荷が小さいカメラに対し、学習の量を増やすことができる。従って、カメラの処理の負荷の均一化に繋がる。 Further, the server 30 controls the learning processing amount according to the number of objects obj detected by the camera 10. As a result, for a camera that detects many objects and therefore has a heavy processing load, the server 30 can reduce the amount of learning, which would otherwise increase that load. On the other hand, for a camera that detects few objects and therefore has a light processing load, the server 30 can increase the amount of learning. This helps equalize the processing load across the cameras.

また、サーバ３０は、カメラ１０により検出された対象物（オブジェクト）ｏｂｊの検出の正報の数に応じて、カメラ１０における学習の処理量を制御する。これにより、サーバ３０は、正報の学習の結果を多く用いることで、学習の結果の精度（言い換えると、次回以降の検出の精度）を向上できる。 Further, the server 30 controls the amount of learning processing in the camera 10 according to the number of positive reports of the detection of the object (object) obj detected by the camera 10. As a result, the server 30 can improve the accuracy of the learning result (in other words, the accuracy of the detection from the next time onward) by using a lot of the learning results of the positive report.

また、サーバ３０は、カメラ１０により検出された対象物（オブジェクト）ｏｂｊの検出の誤報の数に応じて、カメラ１０における学習の量を制御する。これにより、サーバ３０は、誤報の学習の結果を用いないようにすることで、結果的に学習の結果の精度（言い換えると、次回以降の検出の精度）を向上できる。 Further, the server 30 controls the amount of learning in the camera 10 according to the number of false alarms of the detection of the object (object) obj detected by the camera 10. As a result, the server 30 can improve the accuracy of the learning result (in other words, the accuracy of the detection from the next time onward) by not using the learning result of the false alarm.

また、サーバ３０は、カメラ１０の処理能力の量に応じて、カメラ１０における学習の処理量を制御する。これにより、サーバ３０は、特定のカメラに大きな負荷をかけることなく、複数のカメラに学習を分散させることができ、効率の良い学習の実現が可能である。 Further, the server 30 controls the amount of learning processing in the camera 10 according to the amount of processing capacity of the camera 10. As a result, the server 30 can distribute learning to a plurality of cameras without imposing a heavy load on a specific camera, and efficient learning can be realized.

また、サーバ３０は、監視システム５を構成するサーバ３０及び複数のカメラ１０のそれぞれの処理能力に関する情報をテーブルメモリ３５に保持する。サーバ３０は、サーバ３０及び複数のカメラ１０のそれぞれの処理能力の量に応じて、学習の処理量を制御する。これにより、サーバ３０は、監視システムの特定のデバイスに大きな負荷をかけることなく、複数のデバイスに学習を分散させることができ、効率の良い学習の実現が可能である。 Further, the server 30 holds information on the processing capabilities of the server 30 constituting the monitoring system 5 and the plurality of cameras 10 in the table memory 35. The server 30 controls the amount of learning processing according to the amount of processing capacity of each of the server 30 and the plurality of cameras 10. As a result, the server 30 can distribute learning to a plurality of devices without imposing a large load on a specific device of the monitoring system, and efficient learning can be realized.

また、上記処理は、撮像画像ｏｇに出現する少なくとも１つの対象物（オブジェクト）ｏｂｊの検出に用いるモデルパラメータＰを学習する学習と、撮像画像ｏｇに出現する少なくとも１つの対象物（オブジェクト）ｏｂｊを検出する検出と、検出によって検出された少なくとも１つの対象物（オブジェクト）ｏｂｊを分析する分析と、を含む。これにより、サーバ３０は、学習の他、検出と分析においても、複数のカメラ１０に処理を分散させることができる。 Further, the above processing includes learning, which learns the model parameter P used for detecting at least one object obj appearing in the captured image og; detection, which detects at least one object obj appearing in the captured image og; and analysis, which analyzes the at least one object obj found by the detection. As a result, the server 30 can distribute not only learning but also detection and analysis to the plurality of cameras 10.

また、サーバ３０は、複数のカメラ１０のうち、他のカメラと比べて相対的に処理能力の高い少なくとも１つのカメラ１０に対し、テーブルメモリ３５に保持される撮像画像のデータを送信し、学習の実行指示を行う。これにより、サーバ３０は、ネットワークの帯域が広い場合或いはネットワークが空いている時等において、他の処理能力の高いカメラに撮像画像データを送信することも可能であり、結果的に学習のスピードを向上させることができる。 Further, the server 30 transmits the captured image data held in the table memory 35 to at least one camera 10 whose processing capacity is relatively high compared with the other cameras among the plurality of cameras 10, and instructs that camera to execute learning. As a result, when the network bandwidth is wide or the network is lightly loaded, for example, the server 30 can send captured image data to another camera with high processing capacity, which in turn increases the speed of learning.

また、サーバ３０は、複数のカメラ１０のうち、他のカメラと比べて相対的に処理能力の高い少なくとも１つのカメラ１０に対し、テーブルメモリ３５に保持される撮像画像のデータを送信し、検出の実行指示を行う。これにより、サーバ３０は、ネットワークの帯域が広い場合或いはネットワークが空いている時等において、他の処理能力の高いカメラに撮像画像データを送信することも可能であり、結果的に検出のスピードを向上させることができる。 Further, the server 30 transmits the captured image data held in the table memory 35 to at least one camera 10 whose processing capacity is relatively high compared with the other cameras among the plurality of cameras 10, and instructs that camera to execute detection. As a result, when the network bandwidth is wide or the network is lightly loaded, for example, the server 30 can send captured image data to another camera with high processing capacity, which in turn increases the speed of detection.

また、サーバ３０は、複数のカメラ１０のうち、他のカメラと比べて相対的に処理能力の高い少なくとも１つのカメラ１０に対し、テーブルメモリ３５に保持される撮像画像のデータを送信し、分析の実行指示を行う。これにより、サーバ３０は、ネットワークの帯域が広い場合或いはネットワークが空いている時等において、他の処理能力の高いカメラに撮像画像データを送信することも可能であり、結果的に分析のスピードを向上させることができる。 Further, the server 30 transmits the captured image data held in the table memory 35 to at least one camera 10 whose processing capacity is relatively high compared with the other cameras among the plurality of cameras 10, and instructs that camera to execute analysis. As a result, when the network bandwidth is wide or the network is lightly loaded, for example, the server 30 can send captured image data to another camera with high processing capacity, which in turn increases the speed of analysis.

また、上記処理は、撮像画像ｏｇに出現する少なくとも１つの対象物（オブジェクト）ｏｂｊを検出する検出である。サーバ３０は、検出の結果を複数のカメラ１０にそれぞれ送信する。複数のカメラ１０は、検出の結果を共有する。これにより、サーバ３０は、特定のカメラに大きな負荷をかけることなく、複数のカメラに検出を分散させることができ、検出の効率を高めることができる。 Further, the above processing is detection for detecting at least one object (object) obj that appears in the captured image og. The server 30 transmits the detection result to each of the plurality of cameras 10. The plurality of cameras 10 share the detection result. As a result, the server 30 can distribute the detection to a plurality of cameras without imposing a heavy load on a specific camera, and can improve the efficiency of the detection.

また、サーバ３０は、複数のカメラ１０により実行された学習の結果を統合する。これにより、監視システム５は、サーバ３０における統合によって集約された学習の結果の精度を向上できる。 Further, the server 30 integrates the learning results executed by the plurality of cameras 10. Thereby, the monitoring system 5 can improve the accuracy of the learning result aggregated by the integration in the server 30.

また、複数のカメラ１０のうち一部の複数のカメラ１０は、同一の設置状況で設置される。サーバ３０は、カメラ１０の設置状況が同じである、複数のカメラ１０により実行された学習の結果を統合する。これにより、サーバ３０は、同じ設置状況のカメラによるオブジェクトの検出精度を高めることができる。 Further, some of the plurality of cameras 10 among the plurality of cameras 10 are installed in the same installation situation. The server 30 integrates the results of learning performed by a plurality of cameras 10 having the same installation status of the cameras 10. As a result, the server 30 can improve the detection accuracy of the object by the camera in the same installation condition.

サーバ３０は、設置状況が同一の一部の複数のカメラ１０から、それぞれのカメラの設置状況に関する情報の通知を受信し、その一部の複数のカメラ１０により実行された学習の結果を統合する。これにより、サーバ３０は、同じ設置状況のカメラによる学習の結果を統合し易くなる。 The server 30 receives, from the subset of cameras 10 having the same installation status, notifications of information on the installation status of each camera, and integrates the results of the learning executed by that subset of cameras 10. This makes it easier for the server 30 to integrate the learning results from cameras in the same installation status.

また、サーバ３０及び複数のカメラ１０が、複数のカメラ１０の処理能力に関する情報と複数のカメラ１０の単価コストに関する情報（例えば個々のカメラ１０の電力コストの情報）とを共有する。これにより、サーバ３０は、空きリソース及び電力コストを考慮して、サーバ及び複数のカメラ等の各デバイスは、処理の指示実行を行うことも可能となり、多様な運用が可能となる。 Further, the server 30 and the plurality of cameras 10 share information regarding the processing capacity of the plurality of cameras 10 and information regarding the unit cost of the plurality of cameras 10 (for example, information on the power cost of each camera 10). As a result, each device, such as the server and the plurality of cameras, can issue and execute processing instructions in consideration of free resources and power cost, which allows a variety of operating modes.

（第２の実施の形態に至る経緯） (Background to the second embodiment)
上述した特許文献１のような従来技術では、撮像画像内において追跡対象となる物体の正解となる物体動き情報を得るために、その物体に関する評価関数のスコアを用いることは開示されている。しかし、物体の検出精度を示すスコアに応じて、物体の検出において必要なパラメータの学習量をコントロールすることについては特段の考慮がなされていなかった。このため、例えば本来学習が必要ではない、検出に用いるパラメータを学習してしまうことでパラメータの学習精度にばらつきが生じ、物体の検出精度に影響を及ぼすことが懸念される。 In the prior art such as Patent Document 1 described above, it is disclosed that the score of the evaluation function for the object is used in order to obtain the object motion information which is the correct answer of the object to be tracked in the captured image. However, no particular consideration has been given to controlling the learning amount of the parameters required for detecting the object according to the score indicating the accuracy of detecting the object. For this reason, for example, learning the parameters used for detection, which originally does not require learning, causes variations in the learning accuracy of the parameters, and there is a concern that the detection accuracy of the object may be affected.

そこで、実施の形態２では、監視エリアに設置されたカメラにおいて撮像された撮像画像内の少なくとも１つのオブジェクトの検出に得られた、そのオブジェクトの検出精度を示すスコアに応じて、検出に用いるパラメータの学習量を適切に制御し、カメラにおける学習精度を向上する監視システム及び監視方法の例、並びに、カメラ及びパラメータ登録方法を説明する。 Therefore, the second embodiment describes an example of a monitoring system and a monitoring method, as well as a camera and a parameter registration method, that appropriately control the learning amount of the parameters used for detection according to a score indicating the detection accuracy of at least one object detected in a captured image captured by a camera installed in the monitoring area, thereby improving the learning accuracy in the camera.

（実施の形態２） (Embodiment 2)
実施の形態２の監視システム５のシステム構成は、上述した実施の形態１の監視システム５のシステム構成と同一であるので、同一の符号を用いることで、その説明を簡略化又は省略し、異なる内容について説明する。 Since the system configuration of the monitoring system 5 of the second embodiment is the same as that of the monitoring system 5 of the first embodiment described above, the same reference numerals are used, the description is simplified or omitted, and only the differences are described.

図１４は、実施の形態２のカメラ１０の処理実行部１４の内部構成の一例を詳細に示すブロック図である。 FIG. 14 is a block diagram showing in detail an example of the internal configuration of the processing execution unit 14 of the camera 10 of the second embodiment.

カメラ１０の主要な構成である処理実行部１４は、ニューラルネットワーク（つまり、ＮＮ１４０）の他、教師データセットメモリ１５１及びパラメータメモリ１５２を含む。 The processing execution unit 14, which is the main configuration of the camera 10, includes a neural network (that is, NN140), a teacher data set memory 151, and a parameter memory 152.

ＮＮ１４０は、オブジェクト推論機能１４１と、スコア導出機能１４２と、正誤判定機能１４３と、パラメータ学習機能１４４との各機能を有する。 The NN 140 has an object inference function 141, a score derivation function 142, a correctness determination function 143, and a parameter learning function 144.

検出部の一例としてのオブジェクト推論機能１４１では、ＮＮ１４０は、モデルパラメータに従い、撮像画像に出現する対象物が何であるかを推論（つまり、検出）する。 In the object inference function 141 as an example of the detection unit, the NN 140 infers (that is, detects) what the object appearing in the captured image is according to the model parameters.

導出部の一例としてのスコア導出機能１４２では、ＮＮ１４０は、推論時に対象物の検出精度を示すスコア（評価値）を、教師データセットメモリ１５１に登録された教師データを用いて導出し、そのスコアを出力する。 In the score derivation function 142 as an example of the derivation unit, the NN 140 derives a score (evaluation value) indicating the detection accuracy of the object at the time of inference using the teacher data registered in the teacher data set memory 151, and the score is derived. Is output.

正誤判定機能１４３では、ＮＮ１４０は、推論時に対象物の正誤の判定を、教師データセットメモリ１５１に登録された教師データを用いて導出し、その判定結果を出力する。 In the correctness determination function 143, the NN 140 derives the correctness determination of the object at the time of inference using the teacher data registered in the teacher data set memory 151, and outputs the determination result.

パラメータ学習部の一例としてのパラメータ学習機能１４４では、ＮＮ１４０は、スコアが高い対象物の推論に用いられたモデルパラメータを採用するように学習する。また、パラメータ学習機能１４４では、ＮＮ１４０は、スコアが低い対象の推論に用いられたモデルパラメータを排除するように学習する。ＮＮ１４０は、学習したモデルパラメータをパラメータメモリ１５２に登録して蓄積する。パラメータメモリ１５２に登録された第１所定値（例えば８０点）よりスコアが高い対象物の推論に用いられたモデルパラメータは、後述する学習結果の共有において、サーバ３０に送信され、統合学習において利用される。 In the parameter learning function 144, which is an example of the parameter learning unit, the NN 140 learns to adopt the model parameters used for inferring objects with a high score. In the same function, the NN 140 also learns to exclude the model parameters used for inferring objects with a low score. The NN 140 registers and accumulates the learned model parameters in the parameter memory 152. Among the model parameters registered in the parameter memory 152, those used for inferring objects whose score is higher than the first predetermined value (for example, 80 points) are transmitted to the server 30 when the learning results are shared, as described later, and are used in the integrated learning.

図１５は、カメラ１０のローカル学習の動作手順の一例を詳細に示すフローチャートである。 FIG. 15 is a flowchart showing in detail an example of the operation procedure of the local learning of the camera 10.

図１５において、カメラ１０は、イメージセンサ１２において被写体像から対象物を撮像し（Ｓ１）、撮像画像データを生成する（Ｓ２）。 In FIG. 15, the camera 10 captures an object from a subject image by the image sensor 12 (S1) and generates captured image data (S2).

処理実行部１４は、撮像画像データを入力し、撮像画像に現れる少なくとも１つの対象物（つまり、オブジェクト）を推論（検出）する（Ｓ３）。処理実行部１４は、推論（検出）時に少なくとも１つのオブジェクトのスコアリング処理を行う（Ｓ４）。このスコアリング処理では、処理実行部１４は、教師データセットメモリ１５１に登録された教師データを用いて、オブジェクトのスコア（評価値）を出力する。 The processing execution unit 14 inputs captured image data and infers (detects) at least one object (that is, an object) appearing in the captured image (S3). The processing execution unit 14 performs scoring processing of at least one object at the time of inference (detection) (S4). In this scoring process, the process execution unit 14 outputs the score (evaluation value) of the object using the teacher data registered in the teacher data set memory 151.

処理実行部１４は、スコアリング処理の結果、第１所定値（例えば８０点）より上位スコアのオブジェクトの推論に用いたモデルパラメータ、及び第２所定値（例えば１０点）より下位スコアのオブジェクトの推論に用いたモデルパラメータを用いて、ＮＮ１４０のモデルパラメータを学習する（Ｓ５）。また、ステップＳ５において、処理実行部１４は、第１所定値より上位スコアのモデルパラメータをパラメータメモリ１５２に登録して蓄積する。この後、カメラ１０は、図１５に示す処理を終了する。 As a result of the scoring process, the processing execution unit 14 learns the model parameters of the NN 140 (S5) using the model parameters used for inferring objects whose score is higher than a first predetermined value (for example, 80 points) and the model parameters used for inferring objects whose score is lower than a second predetermined value (for example, 10 points). In step S5, the processing execution unit 14 also registers and accumulates, in the parameter memory 152, the model parameters whose score is higher than the first predetermined value. After this, the camera 10 ends the process shown in FIG. 15.

上位スコアは、例えば８０点〜１００点である。下位スコアは、例えば０点〜１０点である。処理実行部１４は、例えば上位スコアのモデルパラメータを採用し、下位スコアのモデルパラメータを排除する。 The top score is, for example, 80 to 100 points. The lower score is, for example, 0 to 10 points. The processing execution unit 14 adopts, for example, a model parameter having a high score and excludes a model parameter having a low score.

上位スコアのオブジェクトは正報の可能性が高いオブジェクトであり、下位スコアのオブジェクトは誤報の可能性が高いオブジェクトである。従って、上位スコアのオブジェクトの推論（検出）に用いたモデルパラメータを採用するように学習することで、正報の可能性が高いオブジェクトの推定に適用されたモデルパラメータが用いられるようになり、モデルパラメータの学習精度を向上させることができる。 An object with a high score is an object with a high possibility of a positive report, and an object with a low score is an object with a high possibility of a false alarm. Therefore, by learning to adopt the model parameters used for inference (detection) of the object with the highest score, the model parameters applied to the estimation of the object with high possibility of correct information will be used, and the model will be used. The learning accuracy of parameters can be improved.

また、下位スコアのオブジェクトの推論に用いたモデルパラメータを排除するように学習することで、誤報の可能性が高いオブジェクトの推定に適用されたモデルパラメータが用いられなくなり、モデルパラメータの学習精度を向上させることができる。 In addition, by learning to eliminate the model parameters used to infer low-score objects, the model parameters applied to the estimation of objects with a high possibility of false alarms are no longer used, and the learning accuracy of model parameters is improved. Can be made to.

また、処理実行部１４は、上位スコアのオブジェクトの推論に用いたモデルパラメータと、下位スコアのオブジェクトの推論に用いたモデルパラメータとを組み合わせて、学習してもよい。このように、上位スコアのオブジェクトの推論に用いたモデルパラメータと、下位スコアのオブジェクトの推論に用いたモデルパラメータとを組み合わせて学習することで、モデルパラメータの学習精度をより一層向上させることができる。 Further, the processing execution unit 14 may learn by combining the model parameters used for inferring upper-score objects with the model parameters used for inferring lower-score objects. Learning from both sets in combination can further improve the learning accuracy of the model parameters.
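The score-based selection described for step S5 can be illustrated with the short sketch below. The thresholds reuse the example values given in the text (80 points and 10 points, with upper scores of 80 to 100 and lower scores of 0 to 10 treated as inclusive ranges); the function names, the "adopt", "exclude", and "ignore" labels, and the data layout are assumptions made purely for illustration.

```python
from typing import Dict, List, Tuple

UPPER_THRESHOLD = 80  # first predetermined value (example value from the text)
LOWER_THRESHOLD = 10  # second predetermined value (example value from the text)


def classify_score(score: int) -> str:
    """Map a 0-100 detection score to an assumed learning action."""
    if not 0 <= score <= 100:
        raise ValueError("score must be in the range 0 to 100")
    if score >= UPPER_THRESHOLD:
        return "adopt"    # likely a positive report
    if score <= LOWER_THRESHOLD:
        return "exclude"  # likely a false alarm
    return "ignore"       # mid-range scores are not used in this sketch


def split_by_score(samples: List[Tuple[str, int]]) -> Dict[str, List[str]]:
    """Group sample identifiers by the action chosen for their score."""
    groups: Dict[str, List[str]] = {"adopt": [], "exclude": [], "ignore": []}
    for sample_id, score in samples:
        groups[classify_score(score)].append(sample_id)
    return groups
```

Only the parameters tied to the "adopt" group would then be registered in the parameter memory, which mirrors the registration rule stated for step S5.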

図１６は、監視システム５における学習結果の共有の概要例の説明図である。 FIG. 16 is an explanatory diagram of an outline example of sharing learning results in the monitoring system 5.

各カメラ１０（１０Ａ，１０Ｂ，１０Ｃ）は、撮像により得られた撮像画像データを用いてローカル学習を行う。ローカル学習において、各カメラ１０は、撮像画像データの中で検出した対象物の検出精度に関するスコアリング処理を実行し、得られたスコアに応じて、ＮＮ１４０のモデルパラメータを学習する。また、各カメラ１０は、上位スコアのオブジェクトの推論（検出）に用いたモデルパラメータのみ採用して学習する。これにより、モデルパラメータの学習精度を向上できる。 Each camera 10 (10A, 10B, 10C) performs local learning using the captured image data obtained by imaging. In the local learning, each camera 10 executes a scoring process regarding the detection accuracy of the object detected in the captured image data, and learns the model parameters of the NN 140 according to the obtained score. Further, each camera 10 learns by adopting only the model parameters used for inference (detection) of the object with the higher score. As a result, the learning accuracy of the model parameters can be improved.

また、ローカル学習では、撮像画像データを評価するためのＵＩ画面３４０（図１７参照）が表示可能である。例えば、カメラ１０は、カメラ１０にオプションとして接続された表示器（図示略）にＵＩ画面３４０を表示させてもよいし、サーバ３０の表示部３７に転送してＵＩ画面３４０を表示させることも可能である。 Further, in local learning, a UI screen 340 (see FIG. 17) for evaluating captured image data can be displayed. For example, the camera 10 may display the UI screen 340 on a display (not shown) connected to the camera 10 as an option, or may transfer the UI screen 340 to the display unit 37 of the server 30 to display the UI screen 340. It is possible.

図１７は、ローカル学習時に表示されるＵＩ画面３４０の一例を示す図である。 FIG. 17 is a diagram showing an example of the UI screen 340 displayed during local learning.

ＵＩ３４０は、撮像画像データから切り出された学習データごとに、スコア、カメラＩＤ、リジェクトボタンｂｘを表示する。なお、画像データのサムネイルは、カメラ１０が元の撮像画像データを記憶しているので、ここでは表示されないが、表示されるようにしてもよい。検出の対象（オブジェクト）は「人」である。 The UI 340 displays a score, a camera ID, and a reject button bx for each learning data cut out from the captured image data. The thumbnail of the image data is not displayed here because the camera 10 stores the original captured image data, but it may be displayed. The target (object) of detection is a "person".

スコアは、０点〜１００点の範囲で数値化される。なお、スコアは、各カメラ１０がスコアリング処理することで算出されたが、ユーザがＵＩ３２０から入力することで取得されてもよい。カメラＩＤは、学習データを得るために撮像したカメラの識別情報である。リジェクトボタンｂｘは、ユーザにより選択された場合、チェックマークが表示される。リジェクトボタンｂｘにチェックマークが付加された学習データは、ユーザが学習ボタンｂｔ５を押下すると、学習に用いられなくなる。 The score is quantified in the range of 0 to 100 points. The score was calculated by each camera 10 performing scoring processing, but it may be acquired by inputting from the UI 320 by the user. The camera ID is the identification information of the camera imaged to obtain the learning data. When the reject button bx is selected by the user, a check mark is displayed. The learning data with a check mark added to the reject button bx is not used for learning when the user presses the learning button bt5.

カメラ１０は、自動的に、下位スコアの撮像画像データを採用する学習を行わず、上位スコアの撮像画像データを採用する学習を行った。但し、例えばカメラ１０の代わりに、ユーザが、リジェクトボタンｂｘを用いて学習に用いる撮像画像データを指示してもよい。例えば、ユーザは、下位スコアの撮像画像データを排除する学習を行わず、上位スコアの撮像画像データを採用する学習を行うように指示してもよい。 In the above description, the camera 10 automatically performs the learning that adopts the upper-score captured image data and does not perform learning that adopts the lower-score captured image data. However, instead of the camera 10, the user may, for example, use the reject button bx to specify the captured image data to be used for learning. For example, the user may instruct the camera to perform only the learning that adopts the upper-score captured image data, without the learning that excludes the lower-score captured image data.

また、各カメラ１０は、上位スコアの撮像画像データのみ採用する学習を行ったが、例えば下位スコアの撮像画像データのみ排除する学習を行うように指示してもよい。これにより、下位スコアの撮像画像データが排除された撮像画像データを用いて学習を行うことができる。また、上位スコアの撮像画像データと下位スコアの撮像画像データとを組み合わせて用いるように設定されてもよい。これにより、撮像画像データの品質に照らして、学習に用いる画像データを、カメラ或いはユーザが個別に選別できる。 Further, although each camera 10 has learned to adopt only the captured image data of the upper score, it may be instructed to perform learning to exclude only the captured image data of the lower score, for example. As a result, learning can be performed using the captured image data from which the captured image data of the lower score is excluded. Further, it may be set to use the captured image data of the upper score and the captured image data of the lower score in combination. Thereby, the camera or the user can individually select the image data used for learning in light of the quality of the captured image data.

サーバ３０は、各カメラ１０（１０Ａ，１０Ｂ，１０Ｃ）から送信されたモデルパラメータを受信し、受信した各モデルパラメータを合算する統合学習を行い、合算したモデルパラメータを学習用ＤＢ３４に追加する。ここで、統合されるモデルパラメータは、設置状況が同じカメラで撮像された画像データを基に得られたモデルパラメータである。一方、設置状況が異なるカメラで撮像された画像データを基に得られるモデルパラメータは、合算されず、別々の学習モデルに対するモデルパラメータとして個別に登録される。 The server 30 receives the model parameters transmitted from each camera 10 (10A, 10B, 10C), performs integrated learning to sum the received model parameters, and adds the summed model parameters to the learning DB 34. Here, the model parameters to be integrated are model parameters obtained based on the image data captured by the cameras having the same installation status. On the other hand, the model parameters obtained based on the image data captured by the cameras having different installation conditions are not added up and are individually registered as model parameters for different learning models.
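The integration rule above, in which parameters from cameras with the same installation status are combined and those from different installation statuses are kept separate, might be sketched as follows. The text only says that the received parameters are summed up; the element-wise averaging used here, the flat list representation of the parameters, and the name `integrate_parameters` are assumptions for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def integrate_parameters(
    results: List[Tuple[str, List[float]]]
) -> Dict[str, List[float]]:
    """Combine parameter vectors per installation status.

    Each entry in results is (installation_status, parameter_vector).
    Vectors that share an installation status are averaged element-wise
    (an assumed stand-in for the summation described in the text), while
    different installation statuses are kept as separate models.
    """
    grouped: Dict[str, List[List[float]]] = defaultdict(list)
    for installation_status, parameters in results:
        grouped[installation_status].append(parameters)

    merged: Dict[str, List[float]] = {}
    for installation_status, vectors in grouped.items():
        count = len(vectors)
        merged[installation_status] = [sum(values) / count for values in zip(*vectors)]
    return merged
```

Keying the result by installation status keeps one parameter set per learning model, so the server can feed the same updated parameters back to every camera in that group.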

図１８は、統合学習時にサーバ３０の表示部３７に表示されるＵＩ画面３５０の一例を示す図である。 FIG. 18 is a diagram showing an example of a UI screen 350 displayed on the display unit 37 of the server 30 during integrated learning.

サーバ３０は、表示部３７に、統合学習時のＵＩ３５０（図１８参照）を表示可能である。ＵＩ３５０は、撮像画像データから切り出された学習データ毎に、スコア、サムネイル、カメラＩＤ、リジェクトボタンｂｘを表示する。ここでは、検出の対象（オブジェクト）が「人」である場合を示す。 The server 30 can display the UI 350 used during integrated learning (see FIG. 18) on the display unit 37. For each piece of learning data cut out from the captured image data, the UI 350 displays a score, a thumbnail, a camera ID, and a reject button bx. The case shown here is one in which the detection target (object) is a "person".

スコアは、０点〜１００点の範囲で数値化される。例えば、対象が「人」である場合、人が写っている画像データの点数は、８０点〜１００点と高くなる。一方、人でなく「木」が写っている画像データの点数は、１０点と低くなる。サムネイルは、学習データの縮小画像である。サムネイルであるので、カメラ１０からサーバ３０に送信される際、データ転送量は抑えられる。カメラＩＤは、学習データを得るために撮像したカメラの識別情報である。リジェクトボタンｂｘは、ユーザにより選択され、チェックマークが表示される。リジェクトボタンｂｘにチェックマークが付加された学習データは、ユーザが学習ボタンｂｔ５を押下すると、学習に用いられなくなる。 The score is quantified in the range of 0 to 100 points. For example, when the target is a "person", the score of the image data showing the person is as high as 80 to 100 points. On the other hand, the score of the image data showing a "tree" instead of a person is as low as 10 points. Thumbnails are reduced images of learning data. Since it is a thumbnail, the amount of data transfer can be suppressed when it is transmitted from the camera 10 to the server 30. The camera ID is the identification information of the camera imaged to obtain the learning data. The reject button bx is selected by the user and a check mark is displayed. The learning data with a check mark added to the reject button bx is not used for learning when the user presses the learning button bt5.

サーバ３０は、自動的に上位スコアの撮像画像データを採用する学習を行わず、下位スコアの撮像画像データを排除する学習を行う。但し、例えばユーザが、リジェクトボタンｂｘを用いて学習に用いる撮像画像データを指示してもよい。例えば、ユーザは、下位スコアの撮像画像データを排除する学習を行わず、上位スコアの撮像画像データを採用する学習を行うように指示してもよい。 The server 30 does not automatically learn to adopt the captured image data of the upper score, but performs learning to exclude the captured image data of the lower score. However, for example, the user may instruct the captured image data to be used for learning by using the reject button bx. For example, the user may instruct to perform learning to adopt the captured image data of the upper score without performing learning to exclude the captured image data of the lower score.

このように、サーバ３０がモデルパラメータを統合学習することで、モデルパラメータの学習の精度が向上する。サーバ３０は、統合学習の結果である更新されたモデルパラメータを、該当するカメラ１０にフィードバック送信する。これにより、各カメラ１０で得られる画像データの正報が多くなるほど、カメラの検出精度が高くなる。 In this way, the server 30 performs integrated learning of the model parameters, so that the accuracy of learning the model parameters is improved. The server 30 feeds back the updated model parameters that are the result of the integrated learning to the corresponding camera 10. As a result, the more positive reports of the image data obtained by each camera 10, the higher the detection accuracy of the cameras.

また、サーバ３０は、統合学習の結果である、更新されたモデルパラメータを、各カメラ１０にフィードバック送信する際、各カメラ１０の正報の数に応じて、フィードバック量を制御する。つまり、サーバ３０は、誤報の数が多いカメラ１０に対し、フィードバック量（例えばフィードバック回数）が多くなるように、更新済みモデルパラメータを送信する。これにより、正報の数が増加し、カメラの検出精度が向上する。 Further, when the updated model parameter, which is the result of the integrated learning, is fed back to each camera 10, the server 30 controls the amount of feedback according to the number of positive reports of each camera 10. That is, the server 30 transmits updated model parameters to the camera 10 having a large number of false alarms so that the amount of feedback (for example, the number of feedbacks) is large. This increases the number of positive reports and improves the detection accuracy of the camera.

Conversely, the server 30 transmits the updated model parameters to a camera 10 with many correct reports so that the amount of feedback (for example, the number of feedback transmissions) is small. This reduces the processing load on the camera. As described above, the server 30 transmits the same updated model parameters to cameras that have the same installation environment so that those cameras share them.

In addition, when instructing each camera 10 to execute learning, the server 30 specifies the amount of learning processing according to the number of correct reports of each camera 10. The server 30 instructs a camera 10 with many false alarms to execute learning so that the amount of learning is large. This increases the number of correct reports and improves the detection accuracy of the camera. Conversely, the server 30 instructs a camera 10 with many correct reports to execute learning so that the amount of learning is small. This reduces the processing load on the camera.
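
One way to picture the control of the feedback amount and the learning amount is the sketch below. The disclosure states only the direction of the control (more false alarms means a larger amount, more correct reports means a smaller amount); the proportional rule, the function name `scaled_amount`, and the cap `max_amount` are assumptions made purely for illustration.

```python
def scaled_amount(correct_reports, false_alarms, max_amount=10):
    """Scale an amount (feedback transmissions or learning iterations)
    with the share of false alarms. Hypothetical rule, not from the patent.

    Returns an integer between 1 and max_amount.
    """
    if correct_reports < 0 or false_alarms < 0:
        raise ValueError("report counts must be non-negative")
    total = correct_reports + false_alarms
    if total == 0:
        return 1  # no reports yet: assume the minimum amount
    # Integer ceiling of max_amount * false_alarms / total.
    amount = -(-max_amount * false_alarms // total)
    return max(1, amount)


# A camera with many false alarms receives a large amount,
# a camera with many correct reports receives a small one.
noisy_camera = scaled_amount(correct_reports=20, false_alarms=80)
reliable_camera = scaled_amount(correct_reports=95, false_alarms=5)
print(noisy_camera, reliable_camera)
```

The same function could serve for both the number of feedback transmissions and the amount of learning, since the text describes the two controls symmetrically.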

The server 30 may also integrate and manage the results of detecting targets that appear in the images captured by the cameras 10. When integrating the detection results, the movement of a target may be expressed as a vector, and the detection results may be managed using such vectors.
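
A minimal sketch of vector-based management of detection results follows. It assumes, beyond what the text states, that each detection yields a 2D position in image coordinates and that the same target carries a common identifier across cameras; `motion_vector` and `DetectionStore` are hypothetical names.

```python
from collections import defaultdict


def motion_vector(prev_pos, curr_pos):
    """Displacement of a target between two detections, as (dx, dy)."""
    return (curr_pos[0] - prev_pos[0], curr_pos[1] - prev_pos[1])


class DetectionStore:
    """Server-side store that integrates detection results from all cameras."""

    def __init__(self):
        self._vectors = defaultdict(list)

    def add(self, object_id, camera_id, prev_pos, curr_pos):
        """Record one movement of a target as seen by one camera."""
        vector = motion_vector(prev_pos, curr_pos)
        self._vectors[object_id].append((camera_id, vector))

    def track(self, object_id):
        """Return the recorded movements of a target, in insertion order."""
        return list(self._vectors.get(object_id, []))


store = DetectionStore()
store.add("person-1", "cam-A", prev_pos=(10, 20), curr_pos=(15, 18))
store.add("person-1", "cam-B", prev_pos=(0, 5), curr_pos=(4, 5))
print(store.track("person-1"))
```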

As described above, the camera 10 of the second embodiment is installed in the monitoring area SA and is used in the monitoring system 5, in which it is communicably connected to the server 30. The camera 10 captures subject light from the monitoring area SA with the image sensor 12. Using the captured image og based on the imaging of the subject light, the camera 10 detects at least one object appearing in the captured image with the object inference function 141, which is an example of a detection unit. The camera 10 holds teacher data sets prepared for each object type in the teacher data set memory 151. With the score derivation function 142, which is an example of a derivation unit, the camera 10 uses the teacher data sets to derive a score indicating the detection accuracy of the detected object. With the parameter learning function 144, which is an example of a parameter learning unit, the camera 10 learns the model parameters used for object detection according to the derived score. The parameter learning function 144 also registers and accumulates the learning results of the model parameters in the parameter memory 152.
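
The data flow summarized above (detection, score derivation, parameter learning, registration in the parameter memory) might be sketched as below. The class `CameraPipeline` and its three injected callables are hypothetical placeholders; the disclosure does not specify how each function is implemented, so stubs are used in the usage example.

```python
class CameraPipeline:
    """Data-flow sketch of the camera 10 (all names are illustrative)."""

    def __init__(self, detect, derive_score, learn, initial_params):
        self.detect = detect              # stands in for object inference function 141
        self.derive_score = derive_score  # stands in for score derivation function 142
        self.learn = learn                # stands in for parameter learning function 144
        self.params = initial_params
        self.parameter_memory = []        # stands in for parameter memory 152

    def process(self, captured_image):
        """Detect objects, score them, learn, and store the learning result."""
        objects = self.detect(captured_image, self.params)
        scores = [self.derive_score(obj) for obj in objects]
        self.params = self.learn(self.params, scores)
        self.parameter_memory.append(self.params)
        return objects, scores


# Stub functions, used only to show the flow of data.
pipeline = CameraPipeline(
    detect=lambda image, params: ["person", "tree"],
    derive_score=lambda obj: {"person": 90, "tree": 10}[obj],
    learn=lambda params, scores: params + 1,  # counts learning rounds
    initial_params=0,
)
objects, scores = pipeline.process(captured_image=b"")
print(objects, scores, pipeline.parameter_memory)
```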

As a result, the camera 10 can appropriately control the amount of learning of the parameters used for detection according to the score that indicates the detection accuracy of an object and that is obtained by detecting at least one object in an image captured by the camera installed in the monitoring area, and can thereby improve the learning accuracy in the camera.

The parameter learning function 144 also learns so as to adopt model parameters for which a score higher than a first predetermined value was derived. By learning to adopt the model parameters used for inferring high-scoring objects in this way, the camera 10 comes to use the model parameters that were applied to inferring objects likely to be correct reports, which improves the learning accuracy of the model parameters.

The parameter learning function 144 also learns so as to exclude model parameters for which a score lower than a second predetermined value was derived. By learning to exclude the model parameters used for inferring low-scoring objects in this way, the camera 10 stops using the model parameters that were applied to inferring objects likely to be false alarms, which improves the learning accuracy of the model parameters.

The parameter learning function 144 may also learn so as to adopt model parameters for which a score higher than the first predetermined value was derived and, at the same time, to exclude model parameters for which a score lower than the second predetermined value was derived. By combining the model parameters used for inferring high-scoring objects with those used for inferring low-scoring objects in its learning, the camera 10 can improve the learning accuracy of the model parameters even further.
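
The adoption and exclusion rules can be illustrated with a simple split by score. The threshold values 80 and 20 below are hypothetical stand-ins for the first and second predetermined values, which the disclosure does not fix, and `split_by_score` is an illustrative name; how adopted or excluded parameters then influence the learning itself is left open here.

```python
FIRST_PREDETERMINED_VALUE = 80   # hypothetical value
SECOND_PREDETERMINED_VALUE = 20  # hypothetical value


def split_by_score(scored_params,
                   first=FIRST_PREDETERMINED_VALUE,
                   second=SECOND_PREDETERMINED_VALUE):
    """Split (parameter, score) pairs into adopted and excluded parameters.

    Parameters with a score above `first` are adopted, parameters with a
    score below `second` are excluded, and the rest are left undecided.
    """
    if second > first:
        raise ValueError("second threshold must not exceed the first")
    adopted = [p for p, score in scored_params if score > first]
    excluded = [p for p, score in scored_params if score < second]
    return adopted, excluded


scored = [("params-1", 95), ("params-2", 50), ("params-3", 10)]
adopted, excluded = split_by_score(scored)
print(adopted, excluded)
```

Passing only one of the two result lists to the learner corresponds to the adopt-only or exclude-only variants, and passing both corresponds to the combined variant.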

Although various embodiments have been described above with reference to the drawings, it goes without saying that the present invention is not limited to these examples. It is clear that a person skilled in the art could conceive of various changes or modifications within the scope of the claims, and these are naturally understood to fall within the technical scope of the present invention. The components of the embodiments described above may also be combined in any way without departing from the spirit of the invention.

For example, the embodiments above describe the monitoring system as applied to a crime-prevention monitoring system that finds and tracks suspicious persons such as thieves, but it may also be applied to, for example, a monitoring system for product inspection on an unmanned, automated (factory automation, FA) production line.

The present disclosure is useful as a monitoring system that can suppress an increase in network traffic and help reduce the processing load on a server connected to a plurality of cameras.

5 Monitoring system
10, 10A, 10B, 10C Camera
11 Lens
12 Image sensor
13 Signal processing unit
14 Processing execution unit
15 Resource monitoring unit
16 Network I/F
17 Crop encoding unit
30 Server
31 Processor
32 Memory
33 Communication unit
34 Learning DB
35 Table memory (memory)
36 Operation unit
37 Display unit
50 Recorder
150 Device
NW Network
P Model parameter

Claims

A monitoring system in which a server and a plurality of cameras installed in a monitoring area are communicably connected to each other, wherein
the server
has a memory that holds information on the amount of free resources of each camera's processing capacity for generating a learning model, and the captured images obtained by each of the cameras imaging the monitoring area, and
determines, for each camera and based on the information on the amount of free resources of that camera, the process that the camera is to execute in relation to detecting at least one object appearing in the captured image obtained by each camera, and transmits an execution instruction for the determined process to each camera, and
each of the cameras executes the process corresponding to the execution instruction transmitted from the server.

The process is learning of parameters used for detecting at least one object appearing in the captured image.
The monitoring system according to claim 1.

The server transmits the learning execution instruction to the plurality of cameras, respectively.
The plurality of cameras execute the learning according to the learning execution instruction, respectively.
The server receives the result of the learning performed by the plurality of cameras.
The monitoring system according to claim 2.

The server executes the learning, and also sends an execution instruction for the learning to the plurality of cameras.
The plurality of cameras execute the learning according to the learning execution instruction, respectively.
The server receives the result of the learning performed by the plurality of cameras.
The monitoring system according to claim 2.

The server transmits the learning result to the plurality of cameras, respectively.
The plurality of cameras share the learning result transmitted from the server.
The monitoring system according to claim 3 or 4.

Among the plurality of cameras, some of the plurality of cameras are installed under the same installation conditions.
The server transmits the learning result to each of the plurality of cameras.
Some of the plurality of cameras share the learning result transmitted from the server.
The monitoring system according to claim 5.

The server controls the amount of learning processing in the camera according to the number of object detections made by the camera.
The monitoring system according to claim 2.

The server controls the amount of learning processing in the camera according to the number of correct reports among the detections of the object made by the camera.
The monitoring system according to claim 7.

The server controls the amount of learning processing in the camera according to the number of false alarms among the detections of the object made by the camera.
The monitoring system according to claim 7.

The server controls the amount of learning in the camera according to information about the amount of free resources in the camera.
The monitoring system according to claim 2.

The server holds, in the memory, information on the amount of free resources of each of the server and the plurality of cameras constituting the monitoring system, and controls the amount of learning processing according to the information on the amount of free resources of each of the server and the plurality of cameras.
The monitoring system according to claim 2.

The process includes learning of parameters used for detecting at least one object appearing in the captured image, detection of at least one object appearing in the captured image, and analysis of the at least one object detected by the detection.
The monitoring system according to claim 1.

The server transmits the captured image held in the memory to at least one camera, among the plurality of cameras, that has a relatively large amount of free resources compared with the other cameras, and instructs that camera to execute the learning.
The monitoring system according to claim 12.

The server transmits the captured image held in the memory to at least one camera, among the plurality of cameras, that has a relatively large amount of free resources compared with the other cameras, and instructs that camera to execute the detection.
The monitoring system according to claim 12.

The server transmits the captured image held in the memory to at least one camera, among the plurality of cameras, that has a relatively large amount of free resources compared with the other cameras, and instructs that camera to execute the analysis.
The monitoring system according to claim 12.

The process is a detection for detecting at least one object appearing in the captured image.
The server transmits the detection result to the plurality of cameras, respectively.
The plurality of cameras share the result of the detection.
The monitoring system according to claim 1.

The server integrates the results of learning performed by the plurality of cameras.
The monitoring system according to claim 2.

Among the plurality of cameras, some of the plurality of cameras are installed under the same installation conditions.
The server integrates the results of learning performed by the plurality of cameras.
The monitoring system according to claim 17.

The server receives notifications of information about the installation status of each of the plurality of cameras from the plurality of cameras, and integrates the learning results performed by the plurality of cameras.
The monitoring system according to claim 18.

The server and the plurality of cameras share information regarding the amount of free resources of the plurality of cameras and information regarding the unit price cost of the plurality of cameras.
The monitoring system according to claim 1.

A monitoring method using a monitoring system in which a server and a plurality of cameras installed in a monitoring area are communicably connected to each other, wherein
the server
holds, in a memory, information on the amount of free resources of each camera's processing capacity for generating a learning model, and the captured images obtained by each of the cameras imaging the monitoring area, and
determines, for each camera and based on the information on the amount of free resources of that camera, the process that the camera is to execute in relation to detecting at least one object appearing in the captured image obtained by each camera, and transmits an execution instruction for the determined process to each camera, and
each of the cameras executes the process corresponding to the execution instruction transmitted from the server.