JP2018207222A

JP2018207222A - Camera and parameter registration method

Info

Publication number: JP2018207222A
Application number: JP2017108163A
Authority: JP
Inventors: Toshiaki Shinohara (篠原 利章); Yoshito Tosawa (東澤 義人); Toru Terada (寺田 徹)
Original assignee: Panasonic Intellectual Property Management Co Ltd
Current assignee: Panasonic Intellectual Property Management Co Ltd
Priority date: 2017-05-31
Filing date: 2017-05-31
Publication date: 2018-12-27

Abstract

To improve learning accuracy in a camera by appropriately controlling how much the parameters used for detection are learned.
SOLUTION: A camera 10 is installed in a monitoring area SA. The camera 10 includes an image sensor 12 that images subject light from the monitoring area SA; an object estimation function that detects at least one object appearing in a captured image obtained by imaging the subject light; a teacher data set memory that holds a teacher data set prepared for each object type; a score derivation function that uses the teacher data set to derive a score indicating the detection accuracy of a detected object; and a parameter learning function that learns, according to the derived score, the model parameters used to detect the object. The parameter learning function registers and accumulates the learning results of the model parameters in the parameter memory.
SELECTED DRAWING: Figure 15

Description

The present disclosure relates to a camera that is installed in a monitoring area and images subjects in that area to detect objects, and to a parameter registration method for registering the parameters used when detecting those objects.

At present, the arithmetic processing capability of a camera is said to be about 0.3 TOPS (tera operations per second), where the prefix T (tera) denotes 10^12 and "ops" is a unit of arithmetic processing capability. High-performance GPUs (Graphics Processing Units) and FPGAs (Field Programmable Gate Arrays) of the kind mounted in game consoles are expected to be adopted as camera processing units. In that case, the processing capability of a camera is expected to rise dramatically within about a year, to roughly 2.6 TOPS, nearly ten times the current figure.

It has also been pointed out that a camera performing image recognition with deep learning, one example of machine learning, needs a processing capability of 1.3 TOPS. Because of this high requirement, image recognition with deep learning on a camera was previously considered difficult; with the processing capability expected about a year from now, however, it is considered fully feasible.
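The capability figures quoted above can be checked with simple arithmetic. The short Python sketch below uses only the numbers stated in the text (0.3, 1.3 and 2.6 TOPS). It shows that today's figure falls short of the quoted deep-learning requirement while the projected figure exceeds it, and that the projected gain works out to about 8.7 times.

```python
# Figures quoted in the text; TOPS = 10**12 operations per second.
TERA = 10**12

current_tops = 0.3    # camera processing capability today
projected_tops = 2.6  # capability expected about one year later
required_tops = 1.3   # quoted requirement for deep-learning image recognition


def to_ops(tops: float) -> float:
    """Convert TOPS to raw operations per second."""
    return tops * TERA


speedup = projected_tops / current_tops
print(f"current:   {to_ops(current_tops):.1e} ops/s")
print(f"projected: {to_ops(projected_tops):.1e} ops/s")
print(f"speed-up:  {speedup:.1f}x")
print("deep learning feasible today:", current_tops >= required_tops)
print("deep learning feasible later:", projected_tops >= required_tops)
```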

On the other hand, if high-quality captured image data (for example, 4K) is transferred from the camera to a server and the server performs image recognition, the communication volume (traffic) on the network inevitably grows with the size of the image data, which lowers communication efficiency and introduces delay. Cameras are therefore expected to perform image recognition with deep learning on the device itself, without transferring high-quality (for example, 4K) captured image data.
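A back-of-the-envelope estimate shows why sending raw 4K frames strains the network. This is a minimal sketch under assumptions that are not from the source: uncompressed 8-bit RGB at 30 frames per second, compared with a hypothetical scheme that sends only 200-byte detection results, ten per frame. Real cameras compress video, so the absolute numbers are illustrative only.

```python
def raw_video_bitrate(width: int, height: int, bytes_per_pixel: int, fps: int) -> int:
    """Uncompressed video bit rate in bits per second."""
    return width * height * bytes_per_pixel * fps * 8


def metadata_bitrate(bytes_per_result: int, results_per_frame: int, fps: int) -> int:
    """Bit rate when only detection results are sent instead of frames."""
    return bytes_per_result * results_per_frame * fps * 8


# Assumed values for illustration only (not taken from the source).
video = raw_video_bitrate(3840, 2160, 3, 30)  # 4K, 8-bit RGB, 30 fps
meta = metadata_bitrate(200, 10, 30)          # hypothetical 200-byte results

print(f"raw 4K stream: {video / 1e9:.2f} Gbit/s")
print(f"results only:  {meta / 1e6:.2f} Mbit/s")
print(f"reduction:     {video // meta}x")
```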

In general, when image recognition is performed with deep learning, a device such as a camera learns the objects contained in captured image data (that is, the objects to be recognized) and updates its learning model by changing the model parameters used in image recognition (for example, weighting coefficients and thresholds). Based on the updated learning model, the device improves the accuracy with which it detects the objects to be recognized in captured image data.
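To make "updating the model by changing weighting coefficients and thresholds" concrete, here is a minimal sketch using the classical perceptron rule on a single neuron. This is a deliberately simplified stand-in, not the deep-learning procedure of the disclosure. The two-feature toy data set is hypothetical.

```python
from typing import List, Tuple


def predict(weights: List[float], threshold: float, x: List[float]) -> int:
    """Output 1 ("object") when the weighted input sum reaches the threshold."""
    activation = sum(w * xi for w, xi in zip(weights, x))
    return 1 if activation >= threshold else 0


def train(samples: List[Tuple[List[float], int]], epochs: int = 20, lr: float = 0.1):
    """Perceptron rule: change weights and threshold only when a sample is misjudged."""
    weights = [0.0] * len(samples[0][0])
    threshold = 0.0
    for _ in range(epochs):
        for x, target in samples:
            error = target - predict(weights, threshold, x)
            if error != 0:
                weights = [w + lr * error * xi for w, xi in zip(weights, x)]
                # A missed object lowers the threshold; a false alarm raises it.
                threshold -= lr * error
    return weights, threshold


# Hypothetical toy data: two features per sample, label 1 = object present.
data = [([1.0, 1.0], 1), ([0.9, 0.8], 1), ([0.1, 0.2], 0), ([0.0, 0.1], 0)]
weights, threshold = train(data)
print([predict(weights, threshold, x) for x, _ in data])
```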

As prior art that recognizes an object in image data captured by a camera and acquires object motion information, the object tracking device of Patent Document 1, for example, has been proposed. This device learns from a teacher data set built from a time-series group of images acquired from a camera capable of photographing the object. The teacher data set contains image information on the acquired images together with object motion information, treated as the correct answer, that includes the position of the object in real space. For each image in which the object is to be tracked, the device then inputs the image information into a tracking discriminator that outputs at least the correct position of the object in real space, and thereby acquires the position of the object in real space from moment to moment.

Patent Document 1: JP 2016-206795 A

Prior art such as Patent Document 1 discloses using the score of an evaluation function for an object to obtain object motion information that serves as the correct answer for the object tracked in the captured image. However, no particular consideration has been given to controlling how much the parameters needed for object detection are learned according to a score indicating object detection accuracy. There is therefore a concern that learning detection parameters that do not actually need to be learned causes variation in parameter learning accuracy and degrades object detection accuracy.

The present disclosure was devised in view of these conventional circumstances. Its object is to provide a camera and a parameter registration method that, for at least one object detected in an image captured by a camera installed in a monitoring area, appropriately control how much the parameters used for detection are learned according to a score indicating the detection accuracy of that object, thereby improving learning accuracy in the camera.

The present disclosure provides a camera used in a monitoring system, the camera being installed in a monitoring area and communicably connected to a server. The camera includes: an imaging unit that images subject light from the monitoring area; a detection unit that detects at least one object appearing in a captured image based on the imaging of the subject light by the imaging unit; a memory that holds a teacher data set prepared for each object type; a derivation unit that uses the teacher data set to derive a score indicating the detection accuracy of the object detected by the detection unit; and a parameter learning unit that learns, according to the score derived by the derivation unit, a parameter used to detect the object. The parameter learning unit registers and accumulates the learning result of the parameter in the memory.
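The claimed camera can be pictured as a composition of units. The Python sketch below is one hypothetical way to wire them together. Each unit is an injected function, so stubs can stand in for the real imaging, detection, scoring and learning logic, none of which is specified here. The class names, field names and the `run_once` loop are illustrative assumptions, and server communication is omitted.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Frame = List[List[int]]  # placeholder for a captured image


@dataclass
class Memory:
    """Teacher data sets per object type plus the accumulated learning results."""
    teacher_sets: Dict[str, list] = field(default_factory=dict)
    learning_results: List[dict] = field(default_factory=list)


@dataclass
class Camera:
    capture: Callable[[], Frame]                # imaging unit (lens + image sensor)
    detect: Callable[[Frame], List[str]]        # detection unit: object types found
    derive_score: Callable[[str, list], float]  # derivation unit
    learn: Callable[[str, float], dict]         # parameter learning unit
    memory: Memory = field(default_factory=Memory)

    def run_once(self) -> None:
        frame = self.capture()
        for object_type in self.detect(frame):
            teacher = self.memory.teacher_sets.get(object_type, [])
            score = self.derive_score(object_type, teacher)
            result = self.learn(object_type, score)
            self.memory.learning_results.append(result)  # register and accumulate


# Stub units for illustration only.
cam = Camera(
    capture=lambda: [[0, 1], [1, 0]],
    detect=lambda frame: ["car", "person"],
    derive_score=lambda t, teacher: 0.9 if t == "car" else 0.4,
    learn=lambda t, score: {"type": t, "score": score},
)
cam.run_once()
print(cam.memory.learning_results)
```

Injecting the units keeps the sketch agnostic about how each one is implemented, which matches the claim's level of abstraction.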

The present disclosure also provides a parameter registration method using a camera installed in a monitoring area. The method includes: imaging subject light from the monitoring area; detecting at least one object appearing in a captured image based on the imaging of the subject light; deriving a score indicating the detection accuracy of the detected object by using a teacher data set prepared for each object type; learning, according to the derived score, a parameter used to detect the object; and registering and accumulating the learning result of the parameter in a memory.
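The score-derivation step is not defined numerically in this section. A minimal sketch, assuming the score is simply the share of teacher samples for one object type that the detector judges correctly, could look like this. The sample format, the stand-in detector and the accuracy-style score are all hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Sample = Tuple[List[float], bool]  # (hypothetical feature vector, is it this object type?)


def derive_score(detector: Callable[[List[float]], bool],
                 teacher_sets: Dict[str, List[Sample]],
                 object_type: str) -> float:
    """Return the share of teacher samples for object_type that the detector judges correctly."""
    samples = teacher_sets.get(object_type, [])
    if not samples:
        raise ValueError(f"no teacher data for {object_type!r}")
    correct = sum(detector(features) == label for features, label in samples)
    return correct / len(samples)


def mean_feature_detector(features: List[float]) -> bool:
    """Stand-in detector: reports "object" when the mean feature value exceeds 0.5."""
    return sum(features) / len(features) > 0.5


teacher_sets = {
    "car": [([0.9, 0.8], True), ([0.8, 0.9], True), ([0.2, 0.1], False), ([0.6, 0.7], False)],
}
score = derive_score(mean_feature_detector, teacher_sets, "car")
print(score)
```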

According to the present disclosure, how much the parameters used for detection are learned can be appropriately controlled according to a score indicating the detection accuracy of at least one object detected in an image captured by a camera installed in a monitoring area. This improves learning accuracy in the camera.

FIG. 1 is a block diagram showing an example of the system configuration of the monitoring system according to the first embodiment.
FIG. 2 is an explanatory diagram of an example outline of learning and detection.
FIG. 3 is a block diagram showing in detail an example of the internal configuration of the camera according to the first embodiment.
FIG. 4 is a block diagram showing in detail an example of the internal configuration of the server according to the first embodiment.
FIG. 5 is an explanatory diagram of an example outline of learning in a device.
FIG. 6 is an explanatory diagram of an example outline of detection by the camera.
FIG. 7 is an explanatory diagram of an example processing outline when learning is distributed across multiple cameras in the monitoring system.
FIG. 8 is an explanatory diagram of an example outline of resource management in the monitoring system.
FIG. 9 is a sequence diagram showing in detail an example of the operating procedure by which the server instructs a camera to execute processing in the first embodiment.
FIG. 10 is a sequence diagram showing in detail an example of the operating procedure by which the server controls the feedback amount of model parameters in the first embodiment.
FIG. 11 is an explanatory diagram of an example outline of sharing learning results in the monitoring system.
FIG. 12 is a diagram showing an example of the UI screen displayed during local learning.
FIG. 13 is a diagram showing an example of the UI screen displayed on the display unit of the server during integrated learning.
FIG. 14 is a block diagram showing in detail an example of the internal configuration of the processing execution unit of the camera according to the second embodiment.
FIG. 15 is a flowchart showing in detail an example of the operating procedure of local learning in the camera.
FIG. 16 is an explanatory diagram of an example outline of sharing learning results in the monitoring system.
FIG. 17 is a diagram showing an example of the UI screen displayed during local learning.
FIG. 18 is a diagram showing an example of the UI screen displayed on the display unit of the server during integrated learning.

(Background to the first embodiment)
The image data handled by cameras is expected to grow in size as it moves to high definition and large capacity, for example 4K and 8K. If the server, rather than the camera, learns the parameters used for detection in this growing image data, the processing load concentrates on the server. In addition, sending large volumes of data to the server one item at a time increases network traffic and causes corresponding delays in data communication. Prior art such as Patent Document 1 gives no particular consideration to technical countermeasures against these problems.

Therefore, the first embodiment describes an example of a monitoring system and a monitoring method in which, when at least one object is detected in each of the images captured by a plurality of cameras installed in a monitoring area, processing such as learning the parameters used for that detection is distributed among the cameras. This suppresses the growth of network traffic and helps reduce the processing load on the server connected to the cameras.

(Embodiment 1)
FIG. 1 is a block diagram showing an example of the system configuration of the monitoring system 5 according to the first embodiment.

The monitoring system 5 is, for example, a crime-prevention monitoring system installed indoors in a bank, store, company, facility or the like, or outdoors in a parking lot, park or the like. These indoor and outdoor locations are the monitoring areas of the monitoring system 5. The monitoring system 5 of this embodiment includes a server 30, a recorder 50, and at least one camera 10 that uses artificial intelligence (AI) technology to recognize at least one target (in other words, an object) appearing in a captured image. The camera(s) 10, the server 30 and the recorder 50 are communicably connected to one another via a network NW.

Hereinafter, when the cameras 10 need to be distinguished, they are denoted cameras 10A, 10B, 10C, and so on. The cameras 10 may all be installed in the same place, for example inside one building, or some cameras 10 may be installed in places different from the others. Here it is assumed that cameras 10A, 10B and 10C, although installed in different places, share the same installation conditions (for example, installation angle and angle of view). For example, cameras 10A, 10B and 10C are each mounted on a wall above an entrance fitted with an automatic door, and image people passing through the entrance while looking down on them slightly from above. The installation of cameras 10A, 10B and 10C is not limited to positions above an entrance with an automatic door.

First, an outline is given of two processes. Learning generates a neural network (in other words, a learning model) used in deep learning, an example of machine learning in artificial intelligence (AI) technology. Detection (that is, inference) inputs data into a trained learning model (hereinafter, a "trained model") and outputs a result.

FIG. 2 is an explanatory diagram of an example outline of learning and detection.

The learning process (hereinafter simply "learning") is performed, for example, by deep learning as an example of machine learning in AI technology. In other words, learning uses deep learning in a neural network (hereinafter abbreviated "NN"), a form of machine learning that has attracted attention in recent years. Machine learning by deep learning includes "supervised learning", which uses teacher data, and "unsupervised learning", which does not. Machine learning produces a trained model. Detection, by contrast, is the process of inputting data into the generated trained model to obtain a result.

Learning may be performed in real time, but because it requires a large amount of computation it is usually performed offline (that is, asynchronously). Detection processing (hereinafter simply "detection"), on the other hand, is normally performed in real time. Learning may take place on any device, for example the camera 10, the server 30 or the recorder 50; the case described here is learning in the camera 10. Detection is also performed in the camera 10. If transferring the image data captured by the camera 10 to the server 30 or the recorder 50 would not burden the network NW with traffic, the server 30 or the recorder 50 may perform detection instead.
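The split between real-time detection and asynchronous learning can be sketched with a background worker fed by a queue. In this minimal Python sketch, the detection path answers every frame immediately while frames are handed off to a learner thread that processes them in batches at its own pace. The integer "frames", the placeholder decision rule and the batch size of 4 are hypothetical.

```python
import queue
import threading

frames_for_learning = queue.Queue()
learned_batches = []


def learner() -> None:
    """Background worker: consumes queued frames and learns from them in batches."""
    batch = []
    while True:
        item = frames_for_learning.get()
        if item is None:  # sentinel: stop the worker
            break
        batch.append(item)
        if len(batch) == 4:  # hypothetical batch size
            learned_batches.append(list(batch))
            batch.clear()
    if batch:
        learned_batches.append(list(batch))


def detect(frame: int) -> bool:
    """Real-time path: must return immediately for every frame."""
    return frame % 3 == 0  # placeholder decision rule


worker = threading.Thread(target=learner)
worker.start()
detections = []
for frame in range(10):
    detections.append(detect(frame))  # synchronous, once per frame
    frames_for_learning.put(frame)    # deferred to the asynchronous learner
frames_for_learning.put(None)
worker.join()
print(detections.count(True), learned_batches)
```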

During learning, a large amount of learning data (for example, image data captured by the camera 10) is input to the device 150. Based on this learning data, the device 150 performs machine learning (for example, deep learning) and updates the model parameters P of the neural network NN140, which is the learning model. The model parameters P are the weighting coefficients, biases, thresholds and similar values set in each of the neurons that make up NN140. During machine learning, the device 150 uses teacher data to decide whether each learning data item is correct or incorrect, or to calculate an evaluation value (that is, a score) for it. The device 150 changes how much the model parameters P are learned according to whether the learning data is correct and how high or low its score is. After learning, NN140 is used as the trained model for detection in the device 150.
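The text does not give a formula for how the score changes the learning degree. One simple possibility, assumed here purely for illustration, is to scale a gradient-descent step by (1 - score) and to skip learning once the score is near perfect. The constants `base_lr=0.1` and `skip_above=0.95` and the gradient values are hypothetical.

```python
from typing import List


def learning_rate(score: float, base_lr: float = 0.1, skip_above: float = 0.95) -> float:
    """Hypothetical rule: learn less as the detection score rises; skip near-perfect cases."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be in [0, 1]")
    if score >= skip_above:
        return 0.0
    return base_lr * (1.0 - score)


def update(params: List[float], grads: List[float], score: float) -> List[float]:
    """One gradient-descent step whose size depends on the score."""
    lr = learning_rate(score)
    return [p - lr * g for p, g in zip(params, grads)]


params = [0.5, -0.2]
grads = [0.4, -0.1]
print(update(params, grads, score=0.98))  # high score: parameters unchanged
print(update(params, grads, score=0.30))  # low score: larger step
```

Skipping learning above a cutoff reflects the concern in the background section that learning parameters which do not need it can add variation.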

During detection (that is, inference), the device 150 receives input data (for example, image data captured in real time by the camera 10), runs inference in NN140, and outputs the resulting inference result (that is, the judgment result for the detected object). The judgment result includes, for example, information on a correct report or false report according to whether the target object is present in the captured image data, and information on a score indicating the evaluation value of the object. A correct report indicates that the object was detected correctly with high certainty; a false report (false alarm) indicates that the object was detected erroneously with high certainty.
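The judgment result described above can be modelled as a small record. The following is a minimal sketch, assuming that "high certainty" means a score at or above a threshold (0.8 here, a hypothetical value) and that a correct or false report can only be attached when ground truth is known, for example during evaluation against teacher data.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Judgment:
    label: str
    score: float                  # evaluation value output by the model, 0..1
    report: Optional[str] = None  # "correct report" / "false report" when known


def judge(label: str, score: float, truth: Optional[bool], threshold: float = 0.8) -> Judgment:
    """Attach a correct/false report to a confident detection when ground truth is known."""
    result = Judgment(label, score)
    if truth is not None and score >= threshold:
        result.report = "correct report" if truth else "false report"
    return result


print(judge("car", 0.92, truth=True))   # confident and right
print(judge("car", 0.91, truth=False))  # confident but wrong, e.g. a tree
print(judge("car", 0.40, truth=False))  # not confident: no report
```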

FIG. 3 is a block diagram showing in detail an example of the internal configuration of the camera 10 according to the first embodiment.

The camera 10 captures, for example, a subject image in the monitoring area and acquires captured image data. Specifically, the camera 10 includes a lens 11, an image sensor 12, a signal processing unit 13, a processing execution unit 14, a resource monitoring unit 15, a crop encoding unit 17 and a network I/F 16.

The lens 11 is arranged to receive the subject image from the monitoring area SA and focuses it onto the image sensor 12, which converts the subject image (that is, the optical image) into an electrical signal. At least the lens 11 and the image sensor 12 make up the imaging unit of the camera 10. Using the electrical signal from the image sensor 12, the signal processing unit 13 generates RGB signals and performs predetermined image processing such as white balance and contrast adjustment, thereby generating and outputting captured image data.
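The source does not say which white-balance method the signal processing unit 13 uses. For illustration, here is a minimal sketch of the common "gray world" heuristic, which is a named substitute and not necessarily the camera's method. It scales each RGB channel so that its mean matches the overall mean, removing a uniform color cast. Pixels are plain float tuples in the range 0 to 255.

```python
from typing import List, Tuple

Pixel = Tuple[float, float, float]  # (R, G, B), 0..255


def gray_world_white_balance(pixels: List[Pixel]) -> List[Pixel]:
    """Scale each channel so its mean equals the mean over all three channels."""
    n = len(pixels)
    means = [sum(p[c] for p in pixels) / n for c in range(3)]
    gray = sum(means) / 3
    gains = [gray / m if m > 0 else 1.0 for m in means]
    # Clip to the 8-bit range after applying per-channel gains.
    return [tuple(min(255.0, p[c] * gains[c]) for c in range(3)) for p in pixels]


# A reddish cast: the red channel is twice as strong as green and blue.
balanced = gray_world_white_balance([(200.0, 100.0, 100.0), (100.0, 50.0, 50.0)])
print(balanced)
```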

The processing execution unit 14 is built with, for example, a GPU (Graphics Processing Unit) or an FPGA (Field Programmable Gate Array). As high-performance GPUs or FPGAs with high processing capability are adopted as camera processors, the processing capability of the camera 10 is expected to improve dramatically, making deep learning fully practical in the camera 10. The processing execution unit 14 contains NN140 as a learning model or trained model generated or updated by processing on the GPU or FPGA. For input captured image data, it outputs a judgment result for at least one target (that is, an object) appearing in the captured image.

The resource monitoring unit 15 monitors information on the processing capability of the camera 10 (for example, the amount of free resources) based on the usage of the GPU or FPGA, memory and so on in the processing execution unit 14.

During detection, the crop encoding unit 17 cuts out the region of the captured image data containing a target (that is, an object) and outputs it as captured image data to be processed or as thumbnail data.
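Cropping and thumbnailing can be sketched on a plain grayscale array. This minimal Python sketch assumes the detection unit supplies a bounding box `(x, y, w, h)` and uses nearest-neighbour subsampling for the thumbnail. The encoding step (for example JPEG compression) is omitted, and both function names are hypothetical.

```python
from typing import List

Image = List[List[int]]  # grayscale, row-major


def crop(image: Image, x: int, y: int, w: int, h: int) -> Image:
    """Cut out the bounding box (x, y, w, h) around a detected object."""
    return [row[x:x + w] for row in image[y:y + h]]


def thumbnail(image: Image, factor: int) -> Image:
    """Nearest-neighbour downscale by keeping every `factor`-th pixel."""
    return [row[::factor] for row in image[::factor]]


# Synthetic 8x6 frame whose pixel value encodes its position (10 * row + col).
frame = [[10 * r + c for c in range(8)] for r in range(6)]
patch = crop(frame, x=2, y=1, w=4, h=4)
thumb = thumbnail(patch, 2)
print(thumb)
```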

The network I/F 16 controls the connection with the network NW. Via the network I/F 16, the camera 10 sends the server 30 and the recorder 50 the object judgment results output from the processing execution unit 14, the amount of free resources monitored by the resource monitoring unit 15, thumbnail data and so on. The camera 10 also receives, via the network I/F 16, model parameters P resulting from learning from the server 30, the recorder 50 and other cameras 10.

FIG. 4 is a block diagram showing in detail an example of the internal configuration of the server 30 according to the first embodiment.

The server 30 includes a processor 31 (for example, a CPU (Central Processing Unit), MPU (Micro Processing Unit) or DSP (Digital Signal Processor)), a memory 32, a communication unit 33, an operation unit 36, a display unit 37, a learning DB (database) 34 and a table memory 35. The processor 31 works with the memory 32 to carry out the processing and control of every part of the server 30. The memory 32 has a nonvolatile memory and a volatile memory. The nonvolatile memory stores, for example, information on the unit cost of each of the cameras 10A, 10B and 10C as reported by those cameras (for example, information on their power cost). As described in detail later, the power-cost information is an index value indicating how much electric energy (that is, cost) results from a given amount of use of the cameras 10A, 10B and 10C.

When the server 30 performs machine learning (for example, deep learning), the processor 31 executes a program stored in the nonvolatile memory and generates a learning model (a neural network, NN). The server 30 also receives model parameters P resulting from learning from the cameras 10. Among the cameras 10 installed in the monitoring area SA, it integrates the model parameters P of those whose installation conditions (that is, the installation environment, such as installation angle and angle of view) are the same.
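The integration rule itself is not specified in this section. A minimal sketch, assuming element-wise averaging of parameter vectors within each group of cameras that share an installation condition (an approach similar in spirit to federated averaging), might look like this. The `(tilt, field of view)` condition key and the parameter values are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Condition = Tuple[float, float]  # hypothetical (tilt in degrees, field of view in degrees)
Report = Tuple[str, Condition, List[float]]  # (camera id, condition, model parameters)


def integrate(reports: List[Report]) -> Dict[Condition, List[float]]:
    """Average parameters element-wise among cameras sharing an installation condition."""
    groups: Dict[Condition, List[List[float]]] = defaultdict(list)
    for _camera_id, condition, params in reports:
        groups[condition].append(params)
    return {
        condition: [sum(values) / len(values) for values in zip(*param_lists)]
        for condition, param_lists in groups.items()
    }


reports = [
    ("10A", (30.0, 90.0), [0.2, 0.4]),
    ("10B", (30.0, 90.0), [0.4, 0.6]),
    ("10C", (0.0, 60.0), [1.0, 1.0]),  # different installation: kept separate
]
integrated = integrate(reports)
print(integrated)
```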

The learning DB (database) 34 stores the model parameters P resulting from learning (for example, weighting coefficients and thresholds) that the cameras 10 send and the server 30 receives.

The table memory 35 stores a table in which information on the processing capability of each camera 10 (for example, the amount of free resources) is registered.
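A table of free resources lends itself to choosing which camera should take on a learning job. The selection policy below is an assumption, not taken from the source: among cameras with enough free capacity, prefer the lowest power-cost index, then the most free capacity. The `free_ratio` and `power_cost` fields are hypothetical stand-ins for the free-resource and power-cost information mentioned above.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ResourceEntry:
    camera_id: str
    free_ratio: float  # share of GPU/FPGA/memory currently unused, 0..1
    power_cost: float  # hypothetical relative power-cost index


def pick_camera(table: List[ResourceEntry], needed: float) -> Optional[str]:
    """Choose a camera with enough free resources, preferring lower power cost."""
    candidates = [entry for entry in table if entry.free_ratio >= needed]
    if not candidates:
        return None
    best = min(candidates, key=lambda entry: (entry.power_cost, -entry.free_ratio))
    return best.camera_id


table = [
    ResourceEntry("10A", free_ratio=0.20, power_cost=1.0),
    ResourceEntry("10B", free_ratio=0.70, power_cost=1.2),
    ResourceEntry("10C", free_ratio=0.60, power_cost=0.9),
]
print(pick_camera(table, needed=0.5))
```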

The operation unit 36 has various user-operable buttons, such as a learning button bt5 (see, for example, FIG. 13), and accepts the user's input operations.

The display unit 37 displays a UI (user interface) screen 310 (see, for example, FIG. 12 or FIG. 13) presenting the results of integrated learning on the server 30.

FIG. 5 is an explanatory diagram of an example outline of learning in the device 150.

Here, an example is described in which the device 150 learns a "car" appearing in a captured image as the target object obj. As noted above, learning is normally performed offline (asynchronously) and may be performed by any of the camera 10, the server 30 and the recorder 50. In this embodiment, the camera 10 performs learning as an example of the device 150. The device 150 includes a processing execution unit 164, a resource monitoring unit 165, a network I/F 166 and a parameter gradient calculation unit 168.

The network I/F 166 controls the connection with the network NW and receives learning data via the network NW. Here, the learning data are captured image data gz1 and gz2, whose target object obj is a car. Each of gz1 and gz2 is teacher data annotated with a score (evaluation value) and a correct or false report. For example, captured image data gz1 is an image containing the target "car" and is teacher data with a high score or a correct report. Captured image data gz2, on the other hand, is an image of a "tree", which is not the target car, and is teacher data with a low score or a false report.

The processing execution unit 164 updates the model parameters P of the learning model (for example, weighting coefficients and thresholds) by running inference on the teacher data input via the network I/F 166. The processing execution unit 164 also sends the updated model parameters P via the network I/F 166 to other devices such as cameras 10, the server 30 and the recorder 50. Performing "supervised learning" in this way improves learning performance, allowing the processing execution unit 164 to generate a high-quality learning model.

The parameter gradient calculation unit 168 calculates the gradient of the object appearing in the captured images of the teacher data. For example, an image of an object captured from the side differs from an image of the same object captured from the front. In other words, the model parameters P of the learning model used to detect the same object also differ depending on the camera installation conditions (for example, installation angle and angle of view). The parameter gradient calculation unit 168 therefore calculates a gradient representing the imaging direction (hereinafter, the "parameter gradient") and transmits the parameter gradient Pt via the network I/F 166 to other devices such as the camera 10, the server 30, and the recorder 50. The parameter gradient Pt may be transmitted together with the model parameters or separately. In either case, because camera installation conditions rarely change, the parameter gradient Pt needs to be transmitted only once. Using the parameter gradient Pt makes it possible to use a different learning model for each camera installation condition.

The resource monitoring unit 165 monitors the amount of free resources based on the usage of the GPU, memory, and other resources in the process execution unit 164. When the device 150 is the camera 10, the process execution unit 164 and the parameter gradient calculation unit 168 shown in FIG. 5 correspond to the process execution unit 14 shown in FIG. 3, the resource monitoring unit 165 shown in FIG. 5 corresponds to the resource monitoring unit 15 shown in FIG. 3, and the network I/F 166 shown in FIG. 5 corresponds to the network I/F 16 shown in FIG. 3.

FIG. 6 is an explanatory diagram of an example outline of detection by the camera 10.

Here, the case where the camera 10 detects a "car" appearing in a captured image as the object is described as an example. The process execution unit 14 of the camera 10 holds a learning model that has undergone machine learning (for example, deep learning), that is, a trained model. The process execution unit 14 receives the captured image og of the subject imaged through the lens 11, performs detection using the trained model (that is, inference of the objects appearing in the captured image og), and outputs the detection result (that is, the inference result). The crop encoding unit 17 cuts out the image of the target object contained in the captured image og and outputs the cut-out image as the detection result.

Here, the crop encoding unit 17 outputs a cut-out image tg2 of a "car" and a cut-out image tg1 of a "tree". Because the cut-out image tg2 contains an image of the target car, it has a high score and a correct-report label. In contrast, the cut-out image tg1 does not contain an image of the target car, so it has a low score and a false-report label.

Next, specific operations of the monitoring system 5 of the present embodiment are described with reference to the drawings.

FIG. 7 is an explanatory diagram of an example processing outline when learning is distributed across a plurality of cameras in the monitoring system 5.

As described above, learning is a process of updating the model parameters P of a learning model (that is, a neural network) generated by deep learning, an example of machine learning in artificial intelligence (AI) technology. As an example, three cameras 10A, 10B, and 10C serve as the devices that perform learning. The devices that perform learning are not limited to cameras and may also be servers or recorders. Each of the cameras 10A, 10B, and 10C performs, for example, "unsupervised learning" on its input captured image data. In unsupervised learning, the camera 10 raises an alarm if the model parameters of the learning model do not converge. The user then clears the alarm and performs "supervised learning". In supervised learning, the user inputs whether each item of image data is a correct report or a false report. When inputting teacher data, the user may input a score (evaluation value) together with the correct/false report instead of the correct/false report alone. The score is a value indicating how likely it is that the captured image data contains the object, and is expressed, for example, as points such as 80 or 10, or as a probability such as 50% or 20%.
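The patent does not define how non-convergence is detected. A minimal sketch, assuming convergence means "each of the last few parameter updates moved the parameters by less than a tolerance (L2 norm)" and that the alarm fires once a step budget is exhausted without convergence; `tol`, `window`, and `max_steps` are hypothetical settings.

```python
import math


def param_delta(prev, cur):
    """L2 norm of the change between two parameter vectors."""
    return math.sqrt(sum((c - p) ** 2 for p, c in zip(prev, cur)))


def has_converged(history, tol=1e-3, window=3):
    """True if each of the last `window` updates changed the parameters by less than tol."""
    if len(history) < window + 1:
        return False
    recent = history[-(window + 1):]
    return all(param_delta(p, c) < tol for p, c in zip(recent, recent[1:]))


def convergence_status(history, max_steps, tol=1e-3):
    """Classify an unsupervised-learning run from its parameter history."""
    if has_converged(history, tol):
        return "converged"
    if len(history) - 1 >= max_steps:
        return "alarm"  # prompt the user to clear it and switch to supervised learning
    return "running"
```

An oscillating parameter history that exhausts `max_steps` yields `"alarm"`, at which point the user would supply correct/false labels (and optionally scores).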

The three cameras 10A, 10B, and 10C each transmit the model parameters P obtained as their learning results to the server 30. The parameter gradient Pt described above is attached to the transmitted model parameters P.

The server 30 updates the model parameters P of the learning model based on the model parameters P transmitted from the three cameras 10A, 10B, and 10C. At this time, it integrates model parameters that have the same parameter gradient Pt, that is, model parameters from cameras with the same installation conditions. Accordingly, the model parameters of the learning model associated with that parameter gradient are updated. Here, the cameras 10A, 10B, and 10C all have the same installation conditions, so the server 30 integrates the updated model parameters from all three cameras.
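As a sketch of the grouping rule above: parameters are integrated only within groups that share the same parameter gradient Pt. The patent says the parameters are "integrated" (combined) without giving a formula, so the element-wise mean used here is an assumption, as is representing Pt as a hashable tuple such as (installation angle, angle of view).

```python
from collections import defaultdict


def integrate_by_gradient(reports):
    """reports: iterable of (camera_id, pt_key, params).

    Returns {pt_key: integrated params}; parameters with different Pt are never mixed.
    """
    groups = defaultdict(list)
    for _camera_id, pt_key, params in reports:
        groups[pt_key].append(params)
    integrated = {}
    for pt_key, param_list in groups.items():
        n = len(param_list)
        # Element-wise mean across cameras with the same installation conditions (assumed rule)
        integrated[pt_key] = [sum(values) / n for values in zip(*param_list)]
    return integrated


# Hypothetical Pt keys: (installation angle, angle of view)
result = integrate_by_gradient([
    ("10A", (30, 60), [1.0, 2.0]),
    ("10B", (30, 60), [3.0, 4.0]),
    ("10C", (30, 60), [5.0, 6.0]),
    ("10D", (0, 90), [7.0, 8.0]),
])
```

Cameras 10A to 10C share one integrated parameter set, which the server would then feed back to all three, while the hypothetical camera 10D keeps its own.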

The server 30 feeds the integrated model parameters back to the three cameras 10A, 10B, and 10C. As a result, the model parameters stored in the three cameras become identical. The model parameters are transmitted from the cameras 10A, 10B, and 10C to the server 30 asynchronously.

FIG. 8 is an explanatory diagram of an example outline of resource management in the monitoring system 5.

In each of the three cameras 10A, 10B, and 10C, the resource monitoring unit 15 monitors the amount of free resources (in other words, the spare processing capacity remaining) relative to the processing capacity of the GPU, FPGA, or other hardware that generates the learning model. The three cameras notify the server 30 of the amount of free resources monitored by their resource monitoring units 15, either asynchronously or periodically. The amount of free resources is expressed as a percentage (%) of processing capacity. For example, when camera 10A has 90% free resources, camera 10B has 20%, and camera 10C has 10%, the server 30 outputs a learning instruction to camera 10A, which has the most free resources, so that it learns preferentially, that is, so that its learning amount increases.

When the bandwidth of the network NW is wide or the network NW is lightly loaded, the server 30, upon receiving captured image data (with correct/false-report information) from camera 10C, which has only 10% free resources, may also transmit that image data to camera 10A, which has 90% free resources, and instruct it to perform learning. This achieves appropriate learning without concentrating the processing load on particular cameras.

The server 30 may also instruct that captured image data from a camera with few free resources be transferred directly to a camera with many free resources, which then performs learning. This distributes learning within the monitoring system and enables efficient learning without placing a heavy load on any particular camera.

The server 30 may also instruct that captured image data from a camera with few free resources be transferred directly to a camera with many free resources, which then performs detection. This distributes detection within the monitoring system and enables efficient detection without placing a heavy load on any particular camera.

The server 30 may also instruct that captured image data from a camera with few free resources be transferred directly to a camera with many free resources, which then performs analysis. Here, analysis refers to processing such as tracking an object obj appearing in the captured image or recognizing whether the object is a suspicious person; the content of the analysis is not particularly limited in the present embodiment. This distributes analysis within the monitoring system and enables efficient analysis without placing a heavy load on any particular camera.
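The three forwarding variants above (learning, detection, analysis) share one routing decision. A minimal sketch, assuming a hypothetical "low free resources" threshold of 30% and a boolean flag standing in for the network-bandwidth condition described for camera 10C; neither value comes from the patent.

```python
JOB_TYPES = {"learning", "detection", "analysis"}


def route_job(job_type, source_camera, free_resources, low_threshold=30.0,
              network_available=True):
    """Return the camera that should process image data captured by source_camera.

    free_resources: {camera_id: free resources in % of processing capacity}.
    """
    if job_type not in JOB_TYPES:
        raise ValueError(f"unknown job type: {job_type}")
    # Keep the work local if the source camera has headroom or the network is congested.
    if free_resources[source_camera] >= low_threshold or not network_available:
        return source_camera
    # Otherwise forward the image data to the camera with the most free resources.
    return max(free_resources, key=free_resources.get)


free = {"10A": 90.0, "10B": 20.0, "10C": 10.0}
```

With the FIG. 8 example values, image data from camera 10C would be forwarded to camera 10A, whichever of the three job types is requested.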

The server 30 may also monitor the overall processing capacity of the monitoring system 5, instructing each camera 10 to increase its learning amount when the system as a whole has a large amount of free resources and to reduce it when the system has few free resources. This enables appropriate learning without placing a heavy load on the monitoring system as a whole.
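The system-wide rule can be sketched as a comparison of the average free resources against two thresholds. Using the mean as the "overall" measure, and the 60% and 30% cut-offs, are assumptions for illustration only.

```python
def learning_amount_instruction(free_resources, high=60.0, low=30.0):
    """Instruction broadcast to every camera based on system-wide free resources (%)."""
    average_free = sum(free_resources.values()) / len(free_resources)
    if average_free >= high:
        return "increase"
    if average_free <= low:
        return "decrease"
    return "keep"
```

The band between the two thresholds avoids toggling the learning amount on every small fluctuation.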

The server 30 may also instruct that the detection results of each camera 10 be shared with all cameras 10. This distributes the detection results across the cameras 10, and using them in subsequent detections can improve detection accuracy.

When a camera 10 has a large amount of free resources, the server 30 may also instruct an increase in the feedback amount (for example, the number of feedback transmissions) of integrated-learning results sent to that camera; when a camera 10 has few free resources, it may instruct a decrease. This feeds back (returns) an appropriate amount of learning results to each camera without placing a heavy load on it.

Each of the three cameras 10A, 10B, and 10C also notifies the server 30 of information about its unit cost (for example, its power cost). The power cost is a camera-specific value, expressed for example in watts per frame (W/frame). For example, it is 1/200 for camera 10A, 1/200 for camera 10B, and 1/400 for camera 10C. Because the power cost usually does not change much with how the camera is used, a single notification suffices. The power cost may also be expressed in frames per watt (frame/W).

When cameras 10A and 10B have equally high power costs, the server 30 preferentially assigns learning to camera 10C, which has the lower power cost.

When cameras have the same or similar amounts of free resources, for example when camera 10A has 10% free resources and cameras 10B and 10C each have 45%, the server 30 outputs a learning instruction to camera 10C, whose power cost is lower, so that it learns preferentially.
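The selection rule above (most free resources first, lower power cost among cameras with similar free resources) and the cost-first mode described next can be sketched as follows. The 5-percentage-point "similar" tolerance and the `cost_priority` switch are hypothetical; the camera values are the examples from the text.

```python
from dataclasses import dataclass


@dataclass
class CameraStatus:
    camera_id: str
    free_pct: float          # free resources, % of processing capacity
    watts_per_frame: float   # power cost notified once by the camera


def pick_learning_camera(cameras, similar_within=5.0, cost_priority=False):
    """Choose the camera to instruct to learn."""
    if cost_priority:
        # Cost-first mode: ignore free resources and pick the lowest power cost.
        return min(cameras, key=lambda c: c.watts_per_frame).camera_id
    best_free = max(c.free_pct for c in cameras)
    # Cameras whose free resources are "the same or similar" to the best one
    candidates = [c for c in cameras if best_free - c.free_pct <= similar_within]
    return min(candidates, key=lambda c: c.watts_per_frame).camera_id


similar = [CameraStatus("10A", 10.0, 1 / 200), CameraStatus("10B", 45.0, 1 / 200),
           CameraStatus("10C", 45.0, 1 / 400)]
skewed = [CameraStatus("10A", 90.0, 1 / 200), CameraStatus("10B", 20.0, 1 / 200),
          CameraStatus("10C", 10.0, 1 / 400)]
```

With equal 45% free resources, camera 10C wins on cost; with the 90/20/10 split, camera 10A wins on free resources unless cost priority is enabled.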

Regardless of how many free resources the cameras have, the server 30 may instead give priority to cost and instruct the camera with the lowest power cost to perform learning. Although the server 30 manages the free resources and power costs of each device in this description, each camera or recorder may manage them instead; in that case, the free resources and power costs can be shared among all devices 150 in the monitoring system 5. Each device can then issue and execute processing instructions while taking free resources and power costs into account, enabling a variety of operating modes.

FIG. 9 is a sequence diagram showing in detail an example of an operating procedure in which the server 30 instructs the cameras 10 to execute processing in the first embodiment.

In the operating procedure of FIG. 9, the server 30 determines which of the plurality of cameras 10 will be the targets of distributed processing based on free-resource information for the cameras 10, and instructs those cameras to execute processing. The number of cameras N may be arbitrary; for simplicity, two cameras (10A and 10B) are shown here. The recorder 50 may determine the cameras 10 to instruct instead of the server 30.

Camera 10A repeatedly (for example, continuously or periodically) notifies the server 30 of the free-resource information monitored by its resource monitoring unit 15 (T1). Likewise, camera 10B repeatedly (for example, continuously or periodically) notifies the server 30 of the free-resource information monitored by its resource monitoring unit 15 (T2).

The server 30 registers and manages the free-resource information of cameras 10A and 10B in the table memory 35 (T3). The server 30 determines whether there is at least one camera with free resources equal to or greater than a predetermined value (for example, 70%) (T4). Here, it is assumed that only camera 10B has free resources at or above the predetermined value.

If there is a camera with free resources at or above the predetermined value (T4, YES), the server 30 generates an instruction for that camera to execute both detection and learning (T5). The server 30 transmits the instruction to execute both detection and learning to camera 10B via the network NW (T6). Camera 10B executes the corresponding processing (T7).

If, on the other hand, no camera has free resources at or above the predetermined value in step T4 (T4, NO), the server 30 generates a detection execution instruction for all cameras (here, cameras 10A and 10B) (T8). Because cameras 10A and 10B do not have enough free resources to perform learning, they perform detection only. The server 30 transmits the detection execution instruction to all cameras (here, cameras 10A and 10B) (T9). Cameras 10A and 10B each execute the corresponding processing (T10, T11).
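The branch at step T4 can be sketched as a function from the table of free resources (T3) to the per-camera instructions (T5/T6 or T8/T9). The 70% default is the example value from the text; the function name and the tuple encoding of instructions are hypothetical. As in the sequence diagram, the YES branch addresses only the qualifying cameras.

```python
def plan_instructions(free_resources, threshold=70.0):
    """free_resources: {camera_id: free %}. Returns {camera_id: tuple of tasks}."""
    capable = [cam for cam, free in free_resources.items() if free >= threshold]
    if capable:
        # T4: YES -> instruct qualifying cameras to run both detection and learning (T5, T6)
        return {cam: ("detection", "learning") for cam in capable}
    # T4: NO -> instruct every camera to run detection only (T8, T9)
    return {cam: ("detection",) for cam in free_resources}
```

Only cameras that receive the learning task go on to generate and return a learning result (T12, T13).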

When camera 10B has executed the corresponding processing in step T7, it generates a learning result (T12) and transmits the generated learning result to the server 30 (T13).

FIG. 10 is a sequence diagram showing in detail an example of an operating procedure in which the server 30 controls the feedback amount of model parameters in the first embodiment.

In the operating procedure of FIG. 10, the server 30 controls the feedback amount of model parameters based on free-resource information for the cameras 10. The number of cameras N may be arbitrary; for simplicity, two cameras (10A and 10B) are used here. The recorder 50 may determine the cameras 10 to which model parameters are fed back instead of the server 30.

The server 30 receives model parameters as learning results from cameras 10A and 10B and accumulates them in the learning DB 34 (T21). According to the amount of free resources of each camera 10A and 10B, the server 30 calculates, for each camera, the feedback amount, that is, how many of the many accumulated model parameters (learning results) to send for use in inference (detection) (T22).
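A minimal sketch of step T22, assuming the feedback amount is the number of accumulated parameter sets sent to a camera and that it scales linearly with the camera's free resources, with a floor of one set. Both the linear rule and the floor are assumptions; the patent only says the amount is calculated "according to" free resources.

```python
def feedback_amounts(free_resources, available_params, min_amount=1):
    """Per-camera count of parameter sets to feed back (T22).

    free_resources: {camera_id: free %}; available_params: sets held in the learning DB.
    """
    return {
        cam: max(min_amount, int(available_params * free / 100.0))
        for cam, free in free_resources.items()
    }
```

For example, with 10 parameter sets in the learning DB, a camera with 80% free resources would receive 8 and one with 20% would receive 2; each camera then appends what it receives to its own memory (T25, T26).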

The server 30 transmits model parameter data amounting to the calculated feedback amount to camera 10B (T23). Likewise, the server 30 transmits model parameter data amounting to the calculated feedback amount to camera 10A (T24). Camera 10B additionally registers and stores the model parameters received from the server 30 in the memory of its process execution unit 14 (T25). Likewise, camera 10A additionally registers and stores the model parameters received from the server 30 in the memory of its process execution unit 14 (T26).

Here, the feedback amount is determined by the server 30 based on free-resource information for each camera, but the basis is not limited to free resources; the feedback amount may also be determined according to the number of correct reports or the number of false reports detected on the basis of teacher data.

FIG. 11 is an explanatory diagram of an example outline of sharing learning results in the monitoring system 5.

Each camera 10 (10A, 10B, 10C) performs local learning using captured image data obtained by imaging and updates its model parameters. Each camera 10 can also learn using only the captured image data for which a correct report was obtained, which improves the accuracy of the resulting model parameters. The camera 10 can also display, on an optionally connected display 19, a UI screen 320 (see FIG. 12) for evaluating captured image data during local learning. The camera 10 can also display the UI screen 320 for local learning on the display unit 37 of the server 30.

FIG. 12 is a diagram showing the UI screen 320 displayed during local learning.

The UI screen 320 is displayed during local learning of the camera 10, for example on the display unit 37 of the server 30 or on the display unit of a PC (not shown) communicably connected to the camera 10. Specifically, for each item of learning data cut out from captured image data, it shows the correct/false determination, the camera ID, and a reject button bx. Thumbnails of the captured image data are not shown here because the camera 10 stores the original captured image data, but they may be shown. The detection target object obj is a "person".

In the correct/false determination process, the server 30 determines a correct report when the object obj is detected in the captured image data and a false report when the object obj is not detected. Alternatively, the server 30 may determine correct or false reports from user input on the UI screen 320 displayed on the display unit 37 of the server 30.

The camera ID is identification information of the camera that captured the image from which the learning data were obtained.

When the user selects the reject button bx, a check mark is displayed. Learning data whose reject button bx is checked are not used for learning when the user presses the learning button bt5.

In the above, the camera 10 automatically excludes false-report image data from learning that adopts image data and uses correct-report image data for it. Instead of the camera 10 making this choice, the user may designate the captured image data with the reject button bx. For example, the user may instruct that false-report image data not be adopted for learning and that correct-report image data be adopted. This also makes it possible to learn using the false-report image data. The camera 10 may also use correct-report and false-report image data in combination for learning. In this way, the captured image data used for learning can be selected according to its quality.
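The selection of learning data on the UI screen can be sketched as a filter applied when the learning button bt5 is pressed: items with the reject button bx checked are always dropped, and a switch chooses between correct-report data only and a combination of correct-report and false-report data. The `LearningItem` record and the `correct_only` switch are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class LearningItem:
    item_id: str
    camera_id: str
    is_correct: bool         # correct/false determination shown on the UI screen
    rejected: bool = False   # True when the reject button bx is checked


def items_for_learning(items, correct_only=True):
    """Items actually used for learning when the learning button bt5 is pressed."""
    selected = [item for item in items if not item.rejected]
    if correct_only:
        selected = [item for item in selected if item.is_correct]
    return selected


items = [
    LearningItem("t1", "10A", True),
    LearningItem("t2", "10A", False),
    LearningItem("t3", "10B", True, rejected=True),
]
```

Passing `correct_only=False` corresponds to combining correct-report and false-report data; a rejected item is excluded in either mode.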

The server 30 receives the model parameters P transmitted from each camera 10 (10A, 10B, 10C), performs integrated learning by combining the received model parameters P, and adds the combined model parameters P to the learning DB 34. The model parameters integrated here are those obtained from image data captured by cameras with the same installation conditions. Model parameters obtained from image data captured by cameras with different installation conditions are not combined but are registered separately as model parameters for different learning models.

FIG. 13 is a diagram showing the UI screen 310 displayed on the display unit 37 of the server 30 during integrated learning.

The server 30 can display the UI screen 310 for integrated learning (see FIG. 13) on the display unit 37. For each item of learning data cut out from captured image data, the UI screen 310 shows the correct/false determination, a thumbnail, the camera ID, and a reject button bx. Here, the detection target object is a "person".

In the correct/false determination process, the server 30 determines a correct report when the object obj is detected in the captured image data and a false report when the object obj is not detected. Alternatively, the server 30 may determine correct or false reports from user input on the UI screen 310 displayed on the display unit 37 of the server 30.

A thumbnail is a reduced image of the learning data. Because only thumbnails are sent, the amount of data transferred from the camera 10 to the server 30 is kept small. The camera ID is identification information of the camera that captured the image from which the learning data were obtained.

The server 30 automatically learns to adopt correct-report image data (that is, it accumulates the model parameters used to detect the correct-report image data) and learns to exclude false-report image data (that is, it does not accumulate the model parameters used to detect the false-report image data). However, the user may instead actively designate the captured image data to be used for learning with the reject button bx. The user may also instruct the server to perform only learning that adopts correct-report image data, without performing learning that excludes false-report image data.

Through this integrated learning of model parameters by the server 30, the accuracy of model parameter learning improves. The server 30 feeds the updated model parameters resulting from integrated learning back to the relevant cameras 10. As a result, the more correct reports are obtained from each camera 10's captured image data, the higher that camera's detection accuracy becomes.

When feeding the updated model parameters P resulting from integrated learning back to each camera 10, the server 30 also controls the feedback amount according to the number of correct reports of each camera 10. That is, the server 30 transmits the updated model parameters to a camera 10 with many false reports so that the feedback amount (for example, the number of feedback transmissions) is larger. This increases the number of correct reports and improves the camera's detection accuracy.

Conversely, the server 30 transmits the updated model parameters to a camera 10 with many correct reports so that the feedback amount (for example, the number of feedback transmissions) is smaller. This reduces the camera's processing load. As described above, the server 30 transmits the same updated model parameters to cameras with the same installation environment so that they share them.

Likewise, when instructing each camera 10 to perform learning, the server 30 specifies the learning amount according to the number of correct reports of each camera 10. It instructs a camera 10 with many false reports to perform learning with a larger learning amount, which increases the number of correct reports and improves the camera's detection accuracy. Conversely, it instructs a camera 10 with many correct reports to perform learning with a smaller learning amount, which reduces the camera's processing load.
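Both the feedback amount and the learning amount move in the same direction: up for cameras with many false reports, down for cameras with many correct reports. A minimal sketch, assuming the amount is interpolated linearly on the false-report ratio between a hypothetical base of 10 and a cap of 100; the patent gives no formula.

```python
def amount_by_reports(correct, false, base=10, max_amount=100):
    """Feedback or learning amount for one camera from its report counts (assumed linear rule)."""
    total = correct + false
    if total == 0:
        return base  # no evidence yet: use the base amount
    false_ratio = false / total
    amount = round(base + (max_amount - base) * false_ratio)
    return max(1, min(max_amount, amount))
```

A camera reporting mostly false detections would receive close to the cap, while a camera that is already accurate stays near the base, which keeps its processing load low.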

The server 30 may also integrate and manage the results of detecting objects that appear in the images captured by each camera 10. When doing so, it may represent each object's motion as a vector and manage the detection results in vector form.
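One possible reading of "managing detection results as vectors" is sketched below. It assumes each detected object has a track of image positions in a single coordinate frame and reduces each track to its net displacement. The data layout and function names are hypothetical, not taken from the text.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def motion_vector(track: List[Point]) -> Tuple[float, float]:
    """Net displacement of an object from its first to its last observed position."""
    if len(track) < 2:
        return (0.0, 0.0)
    (x0, y0), (x1, y1) = track[0], track[-1]
    return (x1 - x0, y1 - y0)


def integrate_detections(
        per_camera: Dict[str, Dict[str, List[Point]]]) -> Dict[Tuple[str, str], Tuple[float, float]]:
    """Flatten per-camera tracks into one table keyed by (camera_id, object_id)."""
    return {
        (cam, obj): motion_vector(track)
        for cam, objects in per_camera.items()
        for obj, track in objects.items()
    }


tracks = {"10A": {"person-1": [(0.0, 0.0), (1.0, 1.0), (3.0, 4.0)]},
          "10B": {"person-2": [(2.0, 2.0)]}}
print(integrate_detections(tracks))
# {('10A', 'person-1'): (3.0, 4.0), ('10B', 'person-2'): (0.0, 0.0)}
```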

As described above, in the monitoring system 5 of the first embodiment, the server 30 and the plurality of cameras 10 installed in the monitoring area SA are communicably connected to each other. The server 30 has a table memory 35. It holds each camera's free resources (that is, information on its processing capability) and the image data each camera captures of the monitoring area SA. Based on this capability information, the server 30 decides, for each camera 10, which process that camera will run to detect at least one object obj in its captured images. The server 30 then sends each camera 10 an instruction to execute the process decided for it. Each camera 10 executes the process specified in the instruction it receives from the server 30.
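The per-camera decision step can be illustrated with a greedy assignment over the table memory. This is a sketch under stated assumptions. Free resources are treated as a single number in a hypothetical unit. The per-process costs are invented for illustration. The most expensive process that still fits is assigned first. `CameraEntry`, `PROCESS_COST`, and `decide_processes` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class CameraEntry:
    camera_id: str
    free_tops: float  # free compute reported to the table memory (hypothetical unit: Tops)


# Hypothetical compute cost of each kind of process, in the same unit.
PROCESS_COST = {"learning": 1.0, "detection": 0.3, "analysis": 0.2}


def decide_processes(entry: CameraEntry) -> List[str]:
    """Greedily assign processes, most expensive first, that fit in the free resources."""
    remaining = entry.free_tops
    assigned = []
    for name, cost in sorted(PROCESS_COST.items(), key=lambda kv: kv[1], reverse=True):
        if cost <= remaining:
            assigned.append(name)
            remaining -= cost
    return assigned


def build_instructions(table: List[CameraEntry]) -> Dict[str, List[str]]:
    """One execution instruction (a list of processes) per camera."""
    return {e.camera_id: decide_processes(e) for e in table}


table = [CameraEntry("10A", 2.6), CameraEntry("10B", 0.3), CameraEntry("10C", 0.1)]
print(build_instructions(table))
# {'10A': ['learning', 'detection', 'analysis'], '10B': ['detection'], '10C': []}
```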

Thus, when detecting at least one object in the images captured by the plurality of cameras 10 in the monitoring area SA, the monitoring system 5 can spread processing, such as learning the parameters used for detection, across the cameras 10. This keeps network traffic from growing and helps reduce the processing load on the server 30 connected to the cameras 10.

The process described above is learning of the model parameter P used to detect at least one object obj appearing in a captured image. The monitoring system 5 can therefore spread this computationally heavy learning across the plurality of cameras 10.

The server 30 sends a learning execution instruction to each of the plurality of cameras 10. Each camera executes learning according to its instruction, and the server 30 receives the results. The server 30 can thus obtain learning results from the cameras 10 without, for example, performing any learning itself.

Alternatively, the server 30 performs learning itself while also sending learning execution instructions to the plurality of cameras 10. Each camera executes learning according to its instruction, and the server 30 receives the results. The server 30 can then combine its own learning results with those from the cameras 10, making subsequent rounds of learning more efficient.

The server 30 also sends the learning results to the plurality of cameras 10, which share them. All the cameras can therefore use the same learning results.

Some of the plurality of cameras 10 are installed under the same installation conditions. The server 30 sends the learning results to each camera in such a group, and the cameras in the group share those results. This lets the monitoring system 5 improve object detection accuracy for cameras installed under the same conditions.

The server 30 also controls the amount of learning processing according to how many objects obj each camera 10 detects. A camera that detects many objects already carries a heavy processing load, so the server 30 can reduce its learning to avoid adding to that load. A camera that detects few objects carries a light load, so the server 30 can increase its learning. This helps even out the processing load across cameras.

The server 30 also controls the amount of learning processing in each camera 10 according to how many of that camera's detections of objects obj are correct reports. By relying more on learning results derived from correct reports, the server 30 can improve the accuracy of the learning results (in other words, the accuracy of later detections).

The server 30 also controls the amount of learning in each camera 10 according to how many of that camera's detections of objects obj are false alarms. By not using learning results derived from false alarms, the server 30 can in turn improve the accuracy of the learning results (in other words, the accuracy of later detections).

The server 30 also controls the amount of learning processing in each camera 10 according to that camera's processing capacity. The server 30 can thus spread learning across multiple cameras without overloading any one camera, making learning efficient.

The server 30 keeps, in the table memory 35, information on the processing capability of itself and of each camera 10 in the monitoring system 5. It controls the amount of learning processing according to the processing capacity of each of these devices. The server 30 can thus spread learning across multiple devices without overloading any one device in the monitoring system, making learning efficient.
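Capacity-proportional allocation across the server and the cameras can be sketched with largest-remainder rounding. This is one illustrative technique, not the patent's stated method. It assumes learning work can be divided into interchangeable batches and that free capacity is a single comparable number per device. The device names and figures are hypothetical.

```python
from typing import Dict


def allocate_learning(free_capacity: Dict[str, float], total_batches: int) -> Dict[str, int]:
    """Split total_batches across devices in proportion to their free capacity."""
    total = sum(free_capacity.values())
    if total <= 0:
        return {dev: 0 for dev in free_capacity}
    exact = {dev: total_batches * cap / total for dev, cap in free_capacity.items()}
    alloc = {dev: int(x) for dev, x in exact.items()}
    # Largest-remainder rounding: hand leftover batches to the largest fractional parts.
    leftover = total_batches - sum(alloc.values())
    for dev in sorted(exact, key=lambda d: exact[d] - alloc[d], reverse=True)[:leftover]:
        alloc[dev] += 1
    return alloc


capacity = {"server30": 5.0, "camera10A": 2.6, "camera10B": 0.3}
print(allocate_learning(capacity, 100))
# {'server30': 63, 'camera10A': 33, 'camera10B': 4}
```

Largest-remainder rounding guarantees that every batch is assigned exactly once while keeping each device's share close to its exact proportion.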

The processes above include three kinds of work. The first is learning the model parameter P used to detect at least one object obj in a captured image og. The second is detecting at least one object obj in the captured image og. The third is analyzing at least one object obj found by detection. The server 30 can therefore spread not only learning but also detection and analysis across the plurality of cameras 10.

The server 30 also sends captured image data held in the table memory 35 to at least one camera 10 whose processing capability is relatively high compared with the others, and instructs it to execute learning. When network bandwidth is ample or the network is idle, the server 30 can send image data to such a high-capability camera, which ultimately speeds up learning.

Likewise, the server 30 sends captured image data held in the table memory 35 to at least one camera 10 whose processing capability is relatively high compared with the others, and instructs it to execute detection. When network bandwidth is ample or the network is idle, the server 30 can send image data to such a high-capability camera, which ultimately speeds up detection.

Likewise, the server 30 sends captured image data held in the table memory 35 to at least one camera 10 whose processing capability is relatively high compared with the others, and instructs it to execute analysis. When network bandwidth is ample or the network is idle, the server 30 can send image data to such a high-capability camera, which ultimately speeds up analysis.
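The offload decision in the three paragraphs above (learning, detection, analysis) can be reduced to one selection rule. This minimal sketch assumes that network availability is summarized as a link-utilization fraction with a hypothetical idle threshold, and that the target is simply the camera with the most free capacity. `pick_offload_target` is a hypothetical name.

```python
from typing import Dict, Optional


def pick_offload_target(free_capacity: Dict[str, float],
                        link_utilization: float,
                        max_utilization: float = 0.5) -> Optional[str]:
    """Return the camera with the most free capacity if the network is idle enough."""
    if link_utilization > max_utilization or not free_capacity:
        return None  # network too busy (or no cameras): do not offload image data
    return max(free_capacity, key=free_capacity.get)


cameras = {"10A": 0.3, "10B": 2.6, "10C": 1.1}
print(pick_offload_target(cameras, link_utilization=0.2))  # 10B
print(pick_offload_target(cameras, link_utilization=0.9))  # None
```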

The process may also be detection of at least one object obj appearing in a captured image og. The server 30 sends the detection results to each of the plurality of cameras 10, which share them. The server 30 can thus spread detection across multiple cameras without overloading any one camera, making detection more efficient.

The server 30 also integrates the learning results produced by the plurality of cameras 10. Through this integration at the server 30, the monitoring system 5 can improve the accuracy of the aggregated learning results.

Some of the plurality of cameras 10 are installed under the same installation conditions. The server 30 integrates the learning results produced by cameras 10 that share the same installation conditions. The server 30 can thus improve object detection accuracy for cameras installed under the same conditions.

Each camera in a group that shares the same installation conditions sends the server 30 a notification describing its installation conditions. The server 30 then integrates the learning results produced by the cameras in that group. This makes it easier for the server 30 to integrate learning results from cameras installed under the same conditions.

The server 30 and the plurality of cameras 10 also share information on each camera's processing capability and unit cost (for example, the power cost of each camera 10). This lets the server 30 and each device, such as the cameras, issue and carry out processing instructions that take both free resources and power costs into account. A variety of operating policies then becomes possible.
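A cost-aware variant of the task placement can be sketched as "cheapest device that still has enough free capacity". This is only one possible policy. The capacity and cost units are hypothetical, and `Device` and `cheapest_capable` are illustrative names rather than anything defined in the text.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Device:
    name: str
    free_tops: float   # free processing capacity (hypothetical unit)
    power_cost: float  # hypothetical unit cost of running work on this device


def cheapest_capable(devices: List[Device], required_tops: float) -> Optional[Device]:
    """Among devices with enough free capacity, pick the one with the lowest power cost."""
    capable = [d for d in devices if d.free_tops >= required_tops]
    return min(capable, key=lambda d: d.power_cost, default=None)


devices = [Device("server30", 5.0, 30.0), Device("10A", 2.6, 20.0), Device("10B", 0.3, 10.0)]
print(cheapest_capable(devices, 1.0).name)   # 10A
print(cheapest_capable(devices, 10.0))       # None
```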

(Background to the second embodiment)
Related art such as Patent Document 1, described above, uses an evaluation-function score for an object to obtain object motion information that serves as the correct answer for the object being tracked in a captured image. However, that art gives no particular consideration to controlling how much the detection parameters are learned based on a score indicating object detection accuracy. As a result, detection parameters may be learned even when no learning is needed. This can make parameter learning accuracy inconsistent and degrade object detection accuracy.

The second embodiment therefore describes examples of a monitoring system, a monitoring method, a camera, and a parameter registration method. A camera installed in a monitoring area detects at least one object in its captured images and derives a score indicating the detection accuracy for that object. The amount of learning for the detection parameters is then controlled according to that score, which improves learning accuracy in the camera.

(Embodiment 2)
The monitoring system 5 of the second embodiment has the same system configuration as the monitoring system 5 of the first embodiment. Components therefore keep the same reference numerals, their descriptions are simplified or omitted, and only the differences are described.

FIG. 14 is a block diagram showing in detail an example of the internal configuration of the processing execution unit 14 of the camera 10 according to the second embodiment.

The processing execution unit 14, the main component of the camera 10, includes a neural network (NN 140), a teacher data set memory 151, and a parameter memory 152.

The NN 140 provides four functions: an object inference function 141, a score derivation function 142, a correctness determination function 143, and a parameter learning function 144.

Through the object inference function 141, an example of the detection unit, the NN 140 infers (that is, detects) what objects appear in a captured image according to the model parameters.

Through the score derivation function 142, an example of the derivation unit, the NN 140 derives and outputs a score (evaluation value) for each object at inference time. The score indicates the detection accuracy for that object and is derived from the teacher data registered in the teacher data set memory 151.

Through the correctness determination function 143, the NN 140 determines at inference time whether each object detection is correct, using the teacher data registered in the teacher data set memory 151, and outputs the result.

Through the parameter learning function 144, an example of the parameter learning unit, the NN 140 learns to adopt the model parameters used to infer high-scoring objects and to exclude those used to infer low-scoring objects. The NN 140 registers and accumulates the learned model parameters in the parameter memory 152. Some of these registered parameters were used to infer objects that scored above a first predetermined value (for example, 80 points). When learning results are shared, as described later, those parameters are sent to the server 30 and used in integrated learning.

FIG. 15 is a flowchart showing in detail an example of the local learning procedure of the camera 10.

In FIG. 15, the camera 10 captures the subject image containing the object with the image sensor 12 (S1) and generates captured image data (S2).

The processing execution unit 14 receives the captured image data and infers (detects) at least one object appearing in the captured image (S3). During inference (detection), it scores the detected objects (S4). This scoring uses the teacher data registered in the teacher data set memory 151 to output a score (evaluation value) for each object.

Based on the scoring results, the processing execution unit 14 learns the model parameters of the NN 140 (S5). It does so using two sets of model parameters: those used to infer objects scoring above the first predetermined value (for example, 80 points), and those used to infer objects scoring below a second predetermined value (for example, 10 points). Also in step S5, the processing execution unit 14 registers and accumulates in the parameter memory 152 the model parameters associated with scores above the first predetermined value. The camera 10 then ends the process shown in FIG. 15.

A higher score is, for example, 80 to 100 points, and a lower score is, for example, 0 to 10 points. The processing execution unit 14, for example, adopts model parameters associated with higher scores and excludes those associated with lower scores.
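Steps S4 and S5 with the example thresholds (80 to 100 counts as higher, 0 to 10 as lower) can be sketched as a partition of inference results. The sketch makes three assumptions. Each inference carries a hypothetical `params_id` handle to the model parameters it used. Only the "adopt" set is written to the parameter memory. The actual NN training on the two sets is left to the caller. None of these names come from the text.

```python
from dataclasses import dataclass
from typing import List, Tuple

HIGH_SCORE = 80  # first predetermined value (example value from the text)
LOW_SCORE = 10   # second predetermined value (example value from the text)


@dataclass
class Inference:
    label: str
    score: float    # 0-100, derived using the teacher data set
    params_id: str  # hypothetical handle to the model parameters used for this inference


def partition(results: List[Inference]) -> Tuple[List[str], List[str]]:
    """Split parameter handles into those to adopt (higher score) and exclude (lower score)."""
    adopt = [r.params_id for r in results if r.score >= HIGH_SCORE]
    exclude = [r.params_id for r in results if r.score <= LOW_SCORE]
    return adopt, exclude


def local_learning_step(results: List[Inference],
                        parameter_memory: List[str]) -> Tuple[List[str], List[str]]:
    """S5: register only higher-score parameters; the caller trains with both lists."""
    adopt, exclude = partition(results)
    parameter_memory.extend(adopt)
    return adopt, exclude


memory: List[str] = []
results = [Inference("person", 92, "p1"), Inference("tree", 10, "p2"), Inference("person", 50, "p3")]
print(local_learning_step(results, memory))  # (['p1'], ['p2'])
print(memory)                                # ['p1']
```

Mid-range scores (here 50) fall into neither set, which matches the text's focus on clearly correct and clearly false detections.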

A higher-score object is likely to be a correct report, whereas a lower-score object is likely to be a false alarm. Learning to adopt the model parameters used to infer (detect) higher-score objects therefore favors parameters that produced likely-correct results, which improves the learning accuracy of the model parameters.

Likewise, learning to exclude the model parameters used to infer lower-score objects stops the use of parameters that produced likely false alarms, which also improves the learning accuracy of the model parameters.

The processing execution unit 14 may also learn from both sets together: the model parameters used to infer higher-score objects and those used to infer lower-score objects. Learning from the two in combination can improve the learning accuracy of the model parameters even further.

FIG. 16 is an explanatory diagram showing an overview of how learning results are shared in the monitoring system 5.

Each camera 10 (10A, 10B, 10C) performs local learning using the image data it captures. In local learning, each camera 10 scores the detection accuracy of the objects detected in its image data and learns the model parameters of the NN 140 according to those scores. Each camera 10 adopts only the model parameters used to infer (detect) higher-score objects, which improves the learning accuracy of the model parameters.

During local learning, a UI screen 340 (see FIG. 17) for evaluating the captured image data can be displayed. For example, the camera 10 may show the UI screen 340 on an optional display (not shown) connected to it. Alternatively, it may transfer the screen to the display unit 37 of the server 30 for display there.

FIG. 17 is a diagram showing an example of the UI screen 340 displayed during local learning.

For each item of learning data cut out from the captured image data, the UI screen 340 shows a score, a camera ID, and a reject button bx. Thumbnails are not shown here because the camera 10 itself stores the original image data, although they could be shown. In this example, the detection target (object) is a "person".

The score is a number from 0 to 100. Here each camera 10 computes the score through its scoring process, but the score may instead come from user input via the UI 320. The camera ID identifies the camera that captured the image from which the learning data was taken. When the user selects the reject button bx, a check mark appears. Learning data with a checked reject button bx is excluded from learning when the user presses the learning button bt5.

By default, the camera 10 automatically learns by adopting higher-score image data, without adopting lower-score data. However, the user may instead use the reject button bx to specify which image data is used for learning. For example, the user may instruct that learning adopt higher-score image data without excluding lower-score data.

In the example above, each camera 10 learns by adopting only higher-score image data. It could instead be instructed to learn by excluding only lower-score image data, so that learning uses all image data except the lower-score items. It could also be configured to use higher-score and lower-score image data in combination. This lets the camera or the user choose, item by item, which image data to use for learning based on its quality.
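The three selection modes described above, together with the user's reject checks, can be sketched as a single filter. The mode names and default thresholds are hypothetical. "combined" is interpreted here as keeping both clear-cut ends (for example, as positive and negative examples), which is an assumption because the text does not say how the two are combined.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Sample:
    camera_id: str
    score: float
    rejected: bool = False  # True when the user has checked the reject button bx


def select_training_data(samples: List[Sample], mode: str = "adopt_high",
                         high: float = 80, low: float = 10) -> List[Sample]:
    """Filter learning data by score after dropping user-rejected samples."""
    kept = [s for s in samples if not s.rejected]
    if mode == "adopt_high":    # use only higher-score data
        return [s for s in kept if s.score >= high]
    if mode == "exclude_low":   # use everything except lower-score data
        return [s for s in kept if s.score > low]
    if mode == "combined":      # keep both ends, e.g. as positive and negative examples
        return [s for s in kept if s.score >= high or s.score <= low]
    raise ValueError(f"unknown mode: {mode}")


data = [Sample("10A", 95), Sample("10B", 50), Sample("10C", 5), Sample("10A", 90, rejected=True)]
for m in ("adopt_high", "exclude_low", "combined"):
    print(m, [s.score for s in select_training_data(data, m)])
# adopt_high [95]
# exclude_low [95, 50]
# combined [95, 5]
```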

The server 30 receives the model parameters sent by each camera 10 (10A, 10B, 10C). It performs integrated learning by summing the received model parameters and adds the summed parameters to the learning DB 34. Only model parameters obtained from image data captured by cameras with the same installation conditions are integrated in this way. Model parameters from cameras with different installation conditions are not summed together. Instead, they are registered separately as parameters for separate learning models.
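Grouped integration can be sketched as follows. The text says the server "sums" the received parameters per installation condition. This sketch assumes that each camera's parameters arrive as a flat list of floats, and it divides the element-wise sum by the group size (a simple average). The division is an assumption, not something the text specifies. The condition labels and function name are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def integrate_by_installation(
        updates: List[Tuple[str, str, List[float]]]) -> Dict[str, List[float]]:
    """updates: (camera_id, installation_condition, flattened parameter vector).

    Parameters are merged only within the same installation condition; each
    condition yields a separate model.
    """
    groups: Dict[str, List[List[float]]] = defaultdict(list)
    for _camera_id, condition, params in updates:
        groups[condition].append(params)
    # Sum element-wise, then divide by the group size (i.e. a simple average).
    return {cond: [sum(col) / len(vecs) for col in zip(*vecs)]
            for cond, vecs in groups.items()}


updates = [("10A", "indoor-ceiling", [1.0, 2.0]),
           ("10B", "indoor-ceiling", [3.0, 4.0]),
           ("10C", "outdoor-pole", [5.0, 6.0])]
print(integrate_by_installation(updates))
# {'indoor-ceiling': [2.0, 3.0], 'outdoor-pole': [5.0, 6.0]}
```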

FIG. 18 is a diagram showing an example of a UI screen 350 displayed on the display unit 37 of the server 30 during integrated learning.

The server 30 can display the integrated-learning UI screen 350 (see FIG. 18) on the display unit 37. For each item of learning data cut out from the captured image data, the UI screen 350 shows a score, a thumbnail, a camera ID, and a reject button bx. The example shown is for the case where the detection target (object) is a "person".

The score is a number from 0 to 100. For example, when the target is a "person", image data showing a person scores high, at 80 to 100 points. Image data showing a "tree" rather than a person scores low, at 10 points. The thumbnail is a reduced image of the learning data, so sending it from the camera 10 to the server 30 keeps the data transfer small. The camera ID identifies the camera that captured the image from which the learning data was taken. When the user selects the reject button bx, a check mark appears. Learning data with a checked reject button bx is excluded from learning when the user presses the learning button bt5.

By default, the server 30 automatically learns by excluding lower-score image data, rather than by adopting higher-score data. However, the user may, for example, use the reject button bx to specify which image data is used for learning. For instance, the user may instruct that learning adopt higher-score image data without excluding lower-score data.

Integrated learning of the model parameters at the server 30 thus improves the learning accuracy of those parameters. The server 30 feeds the updated model parameters from integrated learning back to the relevant cameras 10. As a result, the more correct reports a camera 10 produces from its image data, the higher its detection accuracy becomes.

When feeding the updated model parameters from integrated learning back to each camera 10, the server 30 also controls the feedback amount according to that camera's number of correct reports. That is, the server 30 sends the updated model parameters with a larger feedback amount (for example, more feedback transmissions) to a camera 10 with many false alarms. This increases that camera's number of correct reports and improves its detection accuracy.

Conversely, the server 30 sends the updated model parameters with a smaller feedback amount (for example, fewer feedback transmissions) to a camera 10 with many correct reports. This reduces that camera's processing load. As described above, the server 30 also sends the same updated model parameters to cameras installed in the same environment so that they share them.

Similarly, when instructing each camera 10 to execute learning, the server 30 specifies the amount of learning processing according to that camera's number of correct reports. A camera 10 with many false alarms is told to perform more learning, which increases its number of correct reports and improves its detection accuracy. A camera 10 with many correct reports is told to perform less learning, which reduces its processing load.

The server 30 may also integrate and manage the results of detecting targets that appear in the images captured by the cameras 10. When integrating the detection results, the server 30 may represent the motion of each target as a vector and manage the detection results in vector form.
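One way to picture the vector representation is sketched below. It assumes, beyond what the text states, that detections from different cameras have already been matched to a shared `object_id` and mapped into a common floor coordinate system; the `Detection` fields and function names are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    """One detection reported by a camera (fields are illustrative)."""
    camera_id: str
    object_id: str  # assumed to be already matched across cameras
    t: float        # capture time in seconds
    x: float        # position in a shared floor coordinate system (assumed)
    y: float


def motion_vector(track: list[Detection]) -> tuple[float, float]:
    """Average velocity (vx, vy) from the earliest to the latest detection."""
    if len(track) < 2:
        return (0.0, 0.0)
    first = min(track, key=lambda d: d.t)
    last = max(track, key=lambda d: d.t)
    dt = last.t - first.t
    if dt <= 0:
        return (0.0, 0.0)
    return ((last.x - first.x) / dt, (last.y - first.y) / dt)


def integrate(detections: list[Detection]) -> dict[str, tuple[float, float]]:
    """Group detections from all cameras by object and summarize each as a vector."""
    tracks: dict[str, list[Detection]] = {}
    for d in detections:
        tracks.setdefault(d.object_id, []).append(d)
    return {obj: motion_vector(track) for obj, track in tracks.items()}


if __name__ == "__main__":
    dets = [
        Detection("10A", "person-1", t=0.0, x=0.0, y=0.0),
        Detection("10B", "person-1", t=2.0, x=4.0, y=2.0),
    ]
    print(integrate(dets))
```

Here a person seen by camera 10A and, two seconds later, by camera 10B is summarized as the single vector `(2.0, 1.0)`, which is more compact to store than the raw detections.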

As described above, the camera 10 according to the second embodiment is installed in the monitoring area SA and used in the monitoring system 5, in which it is communicably connected to the server 30. The camera 10 images subject light from the monitoring area SA with the image sensor 12. Using the object inference function 141, an example of the detection unit, the camera 10 detects at least one object appearing in the captured image og obtained by imaging the subject light. The camera 10 holds a teacher data set prepared for each object type in the teacher data set memory 151. Using the score derivation function 142, an example of the derivation unit, the camera 10 derives a score indicating the detection accuracy of each detected object by using the teacher data set. Using the parameter learning function 144, an example of the parameter learning unit, the camera 10 learns the model parameters used for object detection according to the derived score, and registers and accumulates the learning results in the parameter memory 152.
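The detect, score, learn, and register flow above can be sketched as follows. The patent does not say how the score is computed from the teacher data set, so this sketch assumes each object type has reference feature vectors and scores a detection by its best cosine similarity to them. The detector itself is out of scope (its output is modeled by the hypothetical `DetectedObject`), and "learning" is reduced to recording the mean score for the parameter set in use; all names are illustrative.

```python
from __future__ import annotations

import math
from dataclasses import dataclass, field


@dataclass
class DetectedObject:
    """Output of the detection step (object inference function 141); fields assumed."""
    object_type: str
    feature: tuple[float, ...]


@dataclass
class ParameterMemory:
    """Stand-in for parameter memory 152: accumulates (parameter id, score) pairs."""
    entries: list[tuple[str, float]] = field(default_factory=list)

    def register(self, params_id: str, score: float) -> None:
        self.entries.append((params_id, score))


def cosine(a: tuple[float, ...], b: tuple[float, ...]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def derive_score(obj: DetectedObject, teacher_sets: dict[str, list[tuple[float, ...]]]) -> float:
    """Assumed metric: best cosine similarity to the teacher samples of the same type."""
    samples = teacher_sets.get(obj.object_type, [])
    return max((cosine(obj.feature, s) for s in samples), default=0.0)


def learn_and_register(
    detections: list[DetectedObject],
    teacher_sets: dict[str, list[tuple[float, ...]]],
    params_id: str,
    memory: ParameterMemory,
) -> float:
    """Score every detection, then record the mean score for the parameters used."""
    scores = [derive_score(d, teacher_sets) for d in detections]
    mean_score = sum(scores) / len(scores) if scores else 0.0
    memory.register(params_id, mean_score)
    return mean_score


if __name__ == "__main__":
    teacher = {"person": [(1.0, 0.0), (0.0, 1.0)]}
    dets = [DetectedObject("person", (1.0, 0.0)), DetectedObject("car", (1.0, 0.0))]
    memory = ParameterMemory()
    print(learn_and_register(dets, teacher, "P-v1", memory), memory.entries)
```

In the example the "person" detection matches a teacher sample exactly (score 1.0), while the "car" detection has no teacher data and scores 0.0, so the parameter set `P-v1` is registered with a mean score of 0.5.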

The camera 10 can therefore appropriately control the learning amount of the parameters used for detection according to the score indicating the detection accuracy of at least one object detected in an image captured by a camera installed in the monitoring area, and can thereby improve the learning accuracy of the camera.

In addition, the parameter learning function 144 learns to adopt the model parameters for which a score higher than a first predetermined value is derived. By learning to adopt the model parameters used to infer high-scoring objects, the camera 10 comes to use the model parameters that were applied to objects likely to be correct reports, which improves the learning accuracy of the model parameters.

The parameter learning function 144 also learns to exclude the model parameters for which a score lower than a second predetermined value is derived. By learning to exclude the model parameters used to infer low-scoring objects, the camera 10 stops using the model parameters that were applied to objects likely to be false reports, which improves the learning accuracy of the model parameters.

The parameter learning function 144 can also combine both rules: it learns to adopt the model parameters for which a score higher than the first predetermined value is derived, and to exclude the model parameters for which a score lower than the second predetermined value is derived. By learning from both the model parameters used to infer high-scoring objects and those used to infer low-scoring objects, the camera 10 can improve the learning accuracy of the model parameters even further.
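The combined rule from the three paragraphs above can be sketched as a simple pool update. The patent leaves both the threshold values and the exact meaning of "adopting" a parameter unspecified. This sketch assumes parameter sets are identified by strings, that adopting means keeping a set in a pool of usable parameters, and that the default thresholds 0.8 and 0.3 are placeholders.

```python
from __future__ import annotations


def update_adopted_parameters(
    adopted: set[str],
    scores: dict[str, float],
    first_value: float = 0.8,
    second_value: float = 0.3,
) -> set[str]:
    """Apply the adopt/exclude rule to a pool of parameter sets.

    Parameters scoring above first_value are adopted (likely correct reports);
    those scoring below second_value are excluded (likely false reports);
    anything in between leaves the pool unchanged.
    """
    if second_value > first_value:
        raise ValueError("second_value must not exceed first_value")
    updated = set(adopted)
    for params_id, score in scores.items():
        if score > first_value:
            updated.add(params_id)
        elif score < second_value:
            updated.discard(params_id)
    return updated


if __name__ == "__main__":
    pool = {"P1", "P2"}
    print(sorted(update_adopted_parameters(pool, {"P1": 0.1, "P2": 0.5, "P3": 0.9})))
```

This prints `['P2', 'P3']`: P1 falls below the second value and is dropped, P3 exceeds the first value and is adopted, and P2 sits between the two thresholds and is left unchanged. Using only one of the two branches reproduces the adopt-only or exclude-only variants.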

Various embodiments have been described above with reference to the drawings, but the present invention is of course not limited to these examples. Those skilled in the art can clearly conceive of various changes and modifications within the scope of the claims, and these are naturally understood to fall within the technical scope of the present invention. The components of the embodiments described above may also be combined arbitrarily without departing from the spirit of the invention.

For example, the embodiments above describe the monitoring system as a crime-prevention system that discovers and tracks suspicious persons such as thieves. However, it may also be applied to other uses, such as a monitoring system for product inspection on an unmanned, automated (factory automation, FA) production line.

The present disclosure is useful as a camera and a parameter registration method that appropriately control the learning amount of the parameters used for detection according to a score indicating the detection accuracy of at least one object detected in an image captured by a camera installed in a monitoring area, thereby improving learning accuracy in the camera.

5 Monitoring system
10, 10A, 10B, 10C Camera
11 Lens
12 Image sensor
13 Signal processing unit
14 Processing execution unit
15 Resource monitoring unit
16 Network I/F
17 Crop encoding unit
30 Server
31 Processor
32 Memory
33 Communication unit
34 Learning DB
35 Table memory (memory)
36 Operation unit
37 Display unit
50 Recorder
150 Device
NW Network
P Model parameter

Claims

A camera used in a monitoring system, the camera being installed in a monitoring area and communicably connected to a server, the camera comprising:
an imaging unit that images subject light from the monitoring area;
a detection unit that detects at least one object appearing in a captured image based on the imaging of the subject light by the imaging unit;
a memory that holds a teacher data set prepared for each type of the object;
a derivation unit that derives, using the teacher data set, a score indicating the detection accuracy of the object detected by the detection unit; and
a parameter learning unit that learns parameters used for detection of the object according to the score derived by the derivation unit,
wherein the parameter learning unit registers and accumulates the learning result of the parameters in the memory.

The camera according to claim 1, wherein the parameter learning unit learns to adopt the parameters for which a score higher than a first predetermined value is derived.

The camera according to claim 1, wherein the parameter learning unit learns to exclude the parameters for which a score lower than a second predetermined value is derived.

The camera according to claim 1, wherein the parameter learning unit learns to adopt the parameters for which a score higher than a first predetermined value is derived, and learns to exclude the parameters for which a score lower than a second predetermined value is derived.

A parameter registration method using a camera installed in a monitoring area, the method comprising:
imaging subject light from the monitoring area;
detecting at least one object appearing in a captured image based on the imaging of the subject light;
deriving, using a teacher data set prepared for each type of the object, a score indicating the detection accuracy of the detected object;
learning parameters used for detecting the object according to the derived score; and
registering and accumulating the learning result of the parameters in a memory.