JP2020042367A

JP2020042367A - Learning system, server, and feature amount image drawing interpolation program

Info

Publication number: JP2020042367A
Application number: JP2018167272A
Authority: JP
Inventors: 安紘土田; Yasuhiro Tsuchida
Original assignee: AWL Inc
Current assignee: AWL Inc
Priority date: 2018-09-06
Filing date: 2018-09-06
Publication date: 2020-03-19

Abstract

PROBLEM TO BE SOLVED: To provide a learning system, a server, and a program that can reduce the number of cameras installed in each store of a chain and thereby reduce the cost of opening and maintaining each store.

SOLUTION: In a learning system, a server 2 includes: a drawing processing unit that draws a feature amount image based on feature amounts extracted from images captured by a plurality of cameras; a feature amount image interpolation unit that interpolates a missing portion of the feature amount image by applying a trained neural network to the entire feature amount image, including the portion that is missing because captured images are missing; and a machine learning unit that trains the neural network, before its training is complete, using a learning image based on the feature amount image that the drawing processing unit draws from feature amounts extracted from images captured by all the cameras.

SELECTED DRAWING: Figure 2

Description

The present invention relates to a learning system, a server, and a feature amount image drawing interpolation program.

Conventionally, cameras such as surveillance cameras and so-called AI (Artificial Intelligence) cameras have been installed at various locations inside stores such as retail shops and restaurants. These cameras are used to prevent crimes such as shoplifting and to track the behavior of customers and employees (see, for example, Patent Literature 1). For example, installing about 50 cameras in each drugstore of about 400 square meters makes it possible to convert the behavior of customers and employees in the store into data that can be applied to various analyses.

Patent Literature 1: JP 2018-93283 A

However, cameras such as the surveillance cameras and AI cameras described above are expensive, so installing many cameras in each store raises both the cost of opening a new store (initial cost) and the cost of maintaining it (running cost).

The present invention solves the above problem. Its object is to provide a learning system, a server, and a feature amount image drawing interpolation program that make it possible to reduce the number of cameras installed in each store of a chain and thereby reduce the opening and maintenance costs of each store.

To solve the above problem, a learning system according to a first aspect of the present invention comprises a plurality of cameras arranged in a store of a chain whose member stores share the same layout, and a server that communicates with the plurality of cameras. The server includes: a drawing processing unit that draws a feature amount image based on feature amounts extracted from images captured by the plurality of cameras; a feature amount image interpolation unit that, when captured images are missing because the number of cameras arranged in a certain store of the chain is smaller than the number of cameras that should originally be arranged, interpolates the missing portion of the feature amount image by applying a trained neural network to the entire feature amount image, including the missing portion caused by the missing captured images; and a machine learning unit that trains the neural network, before its training is complete, using a learning image based on the feature amount image that the drawing processing unit draws from feature amounts extracted from images captured by all the cameras arranged in another store of the same chain, the other store being equipped with the cameras that the certain store lacks among the cameras that should originally be arranged.

In this learning system, it is desirable that the other store is a store that, among the stores of the chain other than the certain store, is equipped with all the cameras that should originally be arranged, and that the machine learning unit trains the neural network, before its training is complete, using a learning image based on the feature amount image that the drawing processing unit draws from feature amounts extracted from images captured by all of those cameras arranged in the other store.

In this learning system, the feature amount image is desirably a heat map image in which the feature amounts extracted from the images captured by the plurality of cameras are visualized using color.

In this learning system, it is desirable that the feature amount is multidimensional data composed of a plurality of types of feature amount data, and that the feature amount image is a color heat map image in which the multidimensional data is visualized using color.

In this learning system, the drawing processing unit desirably draws the heat map image based on the feature amounts for a predetermined period of time extracted from the images captured by the plurality of cameras.

In this learning system, the feature amounts may be the images captured by the respective cameras themselves. In that case, the drawing processing unit in the server draws, as the feature amount image, an image of the store interior obtained by combining the images captured by the respective cameras, and the trained neural network completes the image of the interior of the certain store by interpolating the captured images that are missing because the number of cameras arranged in the certain store is smaller than the number of cameras that should originally be arranged.

In this learning system, the neural network may be a neural network of a generative model, and the feature amount image interpolation unit may interpolate the missing portion of the feature amount image by generating it with the trained neural network.

A server according to a second aspect of the present invention is a server capable of communicating with a plurality of cameras arranged in a certain store of a chain whose member stores share the same layout. The server includes: a drawing processing unit that draws a feature amount image based on feature amounts extracted from images captured by the plurality of cameras; a feature amount image interpolation unit that, when captured images are missing because the number of cameras arranged in the certain store of the chain is smaller than the number of cameras that should originally be arranged, interpolates the missing portion of the feature amount image by applying a trained neural network to the entire feature amount image, including the missing portion caused by the missing captured images; and a machine learning unit that trains the neural network, before its training is complete, using a learning image based on the feature amount image that the drawing processing unit draws from feature amounts extracted from images captured by all the cameras arranged in another store of the same chain, the other store being equipped with the cameras that the certain store lacks among the cameras that should originally be arranged.

In this server, the feature amount image is desirably a heat map image in which the feature amounts extracted from the images captured by the plurality of cameras are visualized using color.

A feature amount image drawing interpolation program according to a third aspect of the present invention causes a computer to function as: a drawing processing unit that draws a feature amount image based on feature amounts extracted from images captured by a plurality of cameras arranged in a certain store of a chain whose member stores share the same layout; a feature amount image interpolation unit that, when captured images are missing because the number of cameras arranged in the certain store of the chain is smaller than the number of cameras that should originally be arranged, interpolates the missing portion of the feature amount image by applying a trained neural network to the entire feature amount image, including the missing portion caused by the missing captured images; and a machine learning unit that trains the neural network, before its training is complete, using a learning image based on the feature amount image that the drawing processing unit draws from feature amounts extracted from images captured by all the cameras arranged in another store of the same chain, the other store being equipped with the cameras that the certain store lacks among the cameras that should originally be arranged.

In this feature amount image drawing interpolation program, the feature amount image is desirably a heat map image in which the feature amounts extracted from the images captured by the plurality of cameras are visualized using color.

According to the learning system of the first aspect, the server of the second aspect, and the feature amount image drawing interpolation program of the third aspect of the present invention, the neural network is trained, before its training is complete, using a learning image based on a feature amount image drawn from feature amounts extracted from images captured by all the cameras arranged in another store of the (same) chain, that other store being equipped with the cameras that a certain store lacks among the cameras that should originally be arranged. In a so-called chain, every member store has the same layout. Therefore, by training the neural network with learning images based on feature amount images drawn from feature amounts extracted from images captured in such another store, the missing portion of the feature amount image can be interpolated with the trained neural network even when a certain store of the same chain has fewer cameras than should originally be arranged and the images from some of those cameras are therefore missing.

As a result, even when the number of cameras arranged in each store of the chain is smaller than the number that should originally be arranged, a feature amount image comparable to one obtained with the full number of cameras can be obtained. The number of cameras installed in each store can therefore be kept down, reducing the opening and maintenance costs of each store of the chain.

FIG. 1 is a block diagram showing a schematic configuration of a learning system according to an embodiment of the present invention.
FIG. 2 is a functional block diagram of an edge camera and a server included in the learning system.
FIG. 3 is a flowchart of the preparation process for training the image interpolation network (and the discriminator) in the learning system.
FIG. 4 is an explanatory diagram of the shooting areas of an edge camera of the learning system and the feature amount of each shooting area.
FIG. 5 is an explanatory diagram of the shooting areas when the images captured by the edge cameras are laid out so as to reproduce a captured image of the entire store.
FIG. 6 is an explanatory diagram of the training preparation process of FIG. 3.
FIG. 7 is a flowchart of the process of generating a heat map image that includes a missing portion (a missing heat map image) and the process of interpolating the missing heat map image in the learning system.
FIG. 8 is an explanatory diagram of the missing heat map image generation process and the missing heat map image interpolation process of FIG. 7.
FIG. 9 is an explanatory diagram of the image interpolation network in the learning system.
FIG. 10 is an explanatory diagram of the layers included in the image interpolation network in the learning system.
FIG. 11 is an explanatory diagram of the discriminator in the learning system.
FIG. 12 is an explanatory diagram of the layers included in the discriminator in the learning system.
FIG. 13 is an explanatory diagram of the machine learning of the image interpolation network and the discriminator in the learning system.

Hereinafter, a learning system, a server, and a feature amount image drawing interpolation program according to an embodiment of the present invention will be described with reference to the drawings. FIG. 1 is a block diagram showing a schematic configuration of the learning system according to the present embodiment. The learning system 10 includes a plurality of edge cameras 1 (so-called AI (Artificial Intelligence) cameras, corresponding to the "cameras" in the claims) and a server 2 (in the cloud) that communicates with these edge cameras 1. Each edge camera 1 is a camera with a so-called edge computing function and is installed in a store of a chain whose member stores share the same layout. Although FIG. 1 shows only one edge camera 1, a plurality of edge cameras 1 are installed in each store of the chain. The number of edge cameras 1 that should originally be arranged in each store is, for example, 50 or more. However, only some stores are equipped with all of them (50 or more edge cameras 1); most other stores have fewer than the intended number (for example, 5 to 10 edge cameras 1).

The edge camera 1 includes a camera unit 11, a CPU 12 that controls the entire device and performs various computations, and a communication unit 13. The CPU 12 desirably either includes a GPU (a discrete GPU, not shown) or is connected to an external GPU (not shown), such as a graphics card or a device carrying a GPU for machine learning computation that can be connected to the edge camera 1 via USB or the like.

The edge camera 1 also includes a memory 14 that stores various data and programs. The programs stored in the memory 14 include an edge camera-side control program 15. Part or all of the edge camera-side control program 15 may be stored in memory inside the GPU (not shown).

The server 2 includes a CPU 21 that controls the entire device and performs various computations. The server 2 also has a communication unit 22, through which it communicates with the edge cameras 1. The communication unit 22 includes a communication IC.

The server 2 further includes a hard disk 23 that stores various programs and data, a RAM 24 into which programs and data are loaded when the programs are executed, a display 25, and an operation unit 30 used for various input operations. The hard disk 23 stores a heat map drawing interpolation program 26, an image interpolation network I, a discriminator D, and a training data set 29 for training the image interpolation network I. The image interpolation network I and the discriminator D correspond to the Generator and the Discriminator, respectively, of a GAN, which is a kind of generative model (more precisely, a DCGAN (Deep Convolutional Generative Adversarial Networks)). The hard disk 23 also stores parameter data 27 of the image interpolation network I and parameter data 28 of the discriminator D. Although not shown in the figure, the server 2 desirably also includes a GPU.

FIG. 2 shows the functional blocks on the edge camera 1 side and on the server 2 side. The CPU 12 of the edge camera 1 has a feature amount extraction unit 30. The feature amount extraction unit 30 extracts feature amounts such as the number of women, the number of men, and the dwell time from the images captured by its own device (the edge camera 1). To detect the numbers of men and women in a captured image, the feature amount extraction unit 30 uses, for example, an R-CNN-based object detection engine. An R-CNN-based object detection engine includes a function that extracts object-like regions from an input image and a function that applies a CNN to each extracted region to classify which class the image of that region belongs to.

The CPU 21 of the server 2 has a drawing processing unit 31, a feature amount image interpolation unit 32, and a machine learning unit 33. The drawing processing unit 31 draws a heat map image (the "feature amount image" in the claims) based on the feature amounts that the feature amount extraction unit 30 extracts from the images captured by the plurality of edge cameras 1. This heat map image visualizes, using color, the feature amounts extracted from those captured images. When the feature amounts extracted by the feature amount extraction unit 30 are multidimensional data composed of a plurality of types of feature amount data, the heat map image is a color heat map image that visualizes the multidimensional data using color.

When images from some edge cameras 1 are missing because the number of edge cameras 1 arranged in a certain store of the chain is smaller than the number that should originally be arranged, the feature amount image interpolation unit 32 interpolates the missing portion of the heat map image by applying the trained image interpolation network I (the "trained neural network" in the claims) to the entire heat map image, including the missing portion caused by the missing captured images.
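The following is a minimal sketch of how the missing portion might be marked before the network is applied to the entire image. It assumes the 4 × 4 shooting-area split and the 8 × 8 camera grid that this description uses as examples elsewhere. The names `missing_mask` and `interpolate`, the zero-filling of missing areas, and the identity function standing in for the trained image interpolation network I are all illustrative assumptions, not details taken from the description.

```python
import numpy as np

AREAS_PER_SIDE = 4  # shooting areas per side of one camera image (4 x 4 grid)


def missing_mask(cams_per_side, installed):
    """Return a boolean grid that is True where no installed camera covers the area."""
    side = cams_per_side * AREAS_PER_SIDE
    mask = np.ones((side, side), dtype=bool)
    for cam_row, cam_col in installed:
        r0, c0 = cam_row * AREAS_PER_SIDE, cam_col * AREAS_PER_SIDE
        mask[r0:r0 + AREAS_PER_SIDE, c0:c0 + AREAS_PER_SIDE] = False
    return mask


def interpolate(heatmap, mask, network):
    """Apply the network to the whole heat map, missing portion included."""
    blanked = heatmap.copy()
    blanked[mask] = 0  # assumption: missing areas are zero-filled
    return network(blanked)


# Two installed cameras out of 64; the identity lambda is only a stand-in
# for a trained generator.
mask = missing_mask(cams_per_side=8, installed=[(0, 0), (3, 5)])
heatmap = np.full((32, 32, 3), 100, dtype=np.uint8)
result = interpolate(heatmap, mask, network=lambda image: image)
```

With a real generator in place of the stand-in, `result` would hold the completed heat map rather than the blanked one.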

The machine learning unit 33 trains the image interpolation network I (and the discriminator D), before training is complete, using learning images based on heat map images that the drawing processing unit 31 draws from the feature amounts extracted from images captured by all the edge cameras 1 arranged in another store of the same chain, the other store being equipped with the edge cameras 1 that a certain store lacks among those that should originally be arranged. More precisely, the machine learning unit 33 uses learning images based on heat map images that the drawing processing unit 31 draws from the feature amounts extracted from images captured by all the edge cameras 1 in another store that, among the stores of the chain other than the certain store, is equipped with all the edge cameras 1 that should originally be arranged.

The function of the feature amount extraction unit 30 on the edge camera 1 side is realized by the CPU 12 of the edge camera 1 executing the edge camera-side control program 15. The functions of the blocks in the CPU 21 of the server 2 (the drawing processing unit 31, the feature amount image interpolation unit 32, and the machine learning unit 33) are realized by the CPU 21 executing the heat map drawing interpolation program 26. The configuration is not limited to this, however; for example, at least one of the functions of the blocks in the CPU 12 and the CPU 21 may be realized by dedicated hardware such as an ASIC (Application Specific Integrated Circuit).

Next, the preparation process for training the image interpolation network I (and the discriminator D) will be described with reference to FIGS. 4 to 6 in addition to the flowchart of FIG. 3. In this preparation process, the server 2 repeatedly stores in the training data set, as learning images (or the images on which they are based), the heat map images that the drawing processing unit 31 draws using the feature amounts at each time t transmitted moment by moment from all the edge cameras 1 that should originally be arranged.

In this preparation process, the feature amount extraction unit 30 of the CPU 12 of each edge camera 1 divides the image captured by its own device (the edge camera 1) into the 4 × 4 (4 rows by 4 columns) shooting areas $AR_i$ shown in FIG. 4, extracts (determines) the feature amount $a_{it}$ of each shooting area $AR_i$ at time t, and transmits shooting area feature amount information $c_{it}$ containing the feature amount $a_{it}$ to the server 2; this processing is repeated (S1). Here, i is the number of the shooting area. The shooting area numbers i are serial numbers assigned across the shooting areas contained in the images captured by all the edge cameras 1 that should originally be arranged. More specifically, as shown in FIG. 5, when the images captured by all the edge cameras 1 are laid out so as to reproduce a captured image of the entire store, each shooting area contained in that whole-store image is given a serial number i. In the example shown in FIG. 5, 64 edge cameras are arranged in the store in 8 rows and 8 columns, so the maximum value of the shooting area number i is (8 × 4) × (8 × 4) = 32 × 32 = 1024.
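The serial numbering described above can be sketched as follows. The description does not state the order in which the numbers are assigned, so row-major order over the whole-store grid is an assumption here, as is the helper name `area_number`.

```python
AREAS_PER_SIDE = 4  # each camera image is split into 4 x 4 shooting areas


def area_number(cam_row, cam_col, area_row, area_col, cams_per_row=8):
    """Return the 1-based serial number i of a shooting area.

    Assumes row-major numbering over the whole-store image; all row and
    column arguments are 0-based.
    """
    store_row = cam_row * AREAS_PER_SIDE + area_row
    store_col = cam_col * AREAS_PER_SIDE + area_col
    areas_per_store_row = cams_per_row * AREAS_PER_SIDE
    return store_row * areas_per_store_row + store_col + 1


first = area_number(0, 0, 0, 0)  # top-left area of the top-left camera
last = area_number(7, 7, 3, 3)   # bottom-right area of the bottom-right camera
```

For the 8 × 8 camera layout of FIG. 5, `last` evaluates to 1024, matching the maximum value of i given in the text.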

The shooting area feature amount information $c_{it}$ for each shooting area $AR_i$ is expressed by the following equation (1).

$$c_{it} = \{x_i,\ y_i,\ a_{it}\} \tag{1}$$

Here, $x_i$ and $y_i$ represent the position of the shooting area $AR_i$ in the x-coordinate direction (horizontal) and the y-coordinate direction (vertical), respectively, and $a_{it}$ represents the feature amount of the shooting area $AR_i$ at time t. The positions in the x-coordinate and y-coordinate directions are, for example, the minimum x coordinate and the minimum y coordinate within the shooting area $AR_i$. As equation (1) shows, the positions $x_i$ and $y_i$ of the shooting area $AR_i$ do not change with time t.

The feature amount $a_{it}$ of each shooting area $AR_i$ at time t consists of, for example, the number of men, the number of women, and the dwell time of the people present (dwelling) in the shooting area $AR_i$ at time t. In this case, the feature amount $a_{it}$ is expressed by the following equation (2). Here, the dwell time is, for example, the dwell time of the person who has stayed longest among the people dwelling in the shooting area $AR_i$. Alternatively, the average dwell time of the people currently dwelling in the shooting area $AR_i$ may be used as the dwell time value.

$$a_{it} = \{\text{number of men}_{it},\ \text{number of women}_{it},\ \text{dwell time}_{it}\} \tag{2}$$

The lower part of FIG. 4 shows the feature amount $a_{it}$ of each shooting area $AR_i$ at each time. For example, as shown in FIG. 4, $a_{it} = \{1, 1, 4\}$ means that at time t the shooting area $AR_i$ contains (as dwelling people) one man and one woman, and that the dwell time is 4 seconds.
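The records of equations (1) and (2) can be pictured as two small data structures. This is only an illustrative sketch: the class names, field names, the `dwell_value` helper, and the choice to report zero dwell time for an empty area are assumptions that do not appear in the description.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class AreaFeature:
    """Feature amount a_it of equation (2)."""
    men: int
    women: int
    dwell_seconds: float


@dataclass(frozen=True)
class AreaFeatureInfo:
    """Shooting area feature amount information c_it of equation (1)."""
    x: int  # minimum x coordinate of the shooting area (does not change with t)
    y: int  # minimum y coordinate of the shooting area (does not change with t)
    feature: AreaFeature


def dwell_value(dwell_times, use_mean=False):
    """Longest dwell time by default; the mean is the alternative the text allows."""
    if not dwell_times:
        return 0.0  # assumption: an empty area reports zero dwell time
    return mean(dwell_times) if use_mean else max(dwell_times)


# The FIG. 4 example: one man, one woman, dwell time of 4 seconds.
c_it = AreaFeatureInfo(x=0, y=0, feature=AreaFeature(1, 1, dwell_value([4.0, 2.0])))
```

Keeping the position fields outside `AreaFeature` mirrors the point that only $a_{it}$ depends on time.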

The server 2 gathers the shooting area feature amount information $c_{1t}$ to $c_{nt}$ at time t for all shooting areas $AR_1$ to $AR_n$, acquired from all the edge cameras 1, into feature amount information $c_t$, as shown in the following equation (3).

$$c_t = \{c_{1t},\ c_{2t},\ \ldots,\ c_{nt}\} \tag{3}$$

The server 2 then generates the heat map image HM_t at time t from the feature amount information c_(t-T) to c_t, acquired from all the edge cameras 1 for the period from time (t-T) to time t (S2). Here, T is a predetermined length of time, set to, for example, 30 seconds. The heat map image HM_t visualizes, using color, the feature amounts contained in the feature amount information c_(t-T) to c_t for the predetermined time T. In more detail, the feature amounts a_1t to a_nt contained in the feature amount information c_t at time t are, as shown in equation (2), multidimensional data consisting of, for example, the number of women, the number of men, and the dwell time. In this example, representing the number of men_it, the number of women_it, and the dwell time_it contained in the feature amount a_it by R (red), B (blue), and G (green), respectively, makes the heat map image HM_t a color heat map image.

As described above, however, the heat map image HM_t is generated from the feature amount information c_(t-T) to c_t for the predetermined time T. It is a composite of the color heat map images for times (t-T) through t, each generated from the feature amount information at the corresponding time. When the color heat map images of the individual times are combined, it is desirable to reduce the weight applied to the R, G, and B luminance values of each heat map image the further back in time it lies (so that the weight at the most recent time t is the largest and the weight at the oldest time (t-T) is the smallest) before combining them.

As shown in FIG. 5, when the imaging areas are arranged in 32 rows and 32 columns and each imaging area is represented by one pixel, the size of the heat map image HM_t is 32 × 32 = 1024 pixels.
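The color mapping and the time-weighted compositing described above can be sketched as follows. This is an illustration under stated assumptions, not the patent's method: the normalisation caps `max_people` and `max_dwell` and the geometric `decay` factor are hypothetical, since the text only requires that newer frames weigh more than older ones.

```python
def area_to_rgb(men, women, dwell, max_people=10, max_dwell=30.0):
    """Map one feature amount to an (R, G, B) pixel in [0, 1].

    Men -> R, dwell time -> G, women -> B, as in the text. The caps
    max_people and max_dwell are assumed normalisation constants.
    """
    r = min(men / max_people, 1.0)
    g = min(dwell / max_dwell, 1.0)
    b = min(women / max_people, 1.0)
    return (r, g, b)


def decay_weights(num_frames, decay=0.9):
    """Weights for frames ordered oldest -> newest, normalised to sum to 1.

    The newest frame gets the largest weight; geometric decay is an assumption.
    """
    raw = [decay ** (num_frames - 1 - k) for k in range(num_frames)]
    total = sum(raw)
    return [w / total for w in raw]


def composite(frames, decay=0.9):
    """Blend per-time RGB frames (oldest first) into one heat map image.

    Each frame is a list of rows; each row is a list of (R, G, B) tuples.
    """
    weights = decay_weights(len(frames), decay)
    height, width = len(frames[0]), len(frames[0][0])
    return [
        [
            tuple(
                sum(w * frame[y][x][ch] for w, frame in zip(weights, frames))
                for ch in range(3)
            )
            for x in range(width)
        ]
        for y in range(height)
    ]
```

With T = 30 seconds and one frame per second, `frames` would hold the per-second images for times (t-T) through t, oldest first; the sampling interval is likewise an assumption.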

The CPU 21 of the server 2 adds the heat map image HM_t generated in S2 of FIG. 3 to the training data set 29 (see FIG. 2) as an image on which the learning images are based (S3 in FIG. 3).

Viewing the processing of FIG. 3 across the learning system 10 as a whole, as shown in FIG. 6, the learning system 10 extracts the feature amount information c_(t-T) to c_t for times (t-T) through t in all imaging areas from the images captured by all the edge cameras 1 from time (t-T) to time t, and generates the heat map image HM_t based on this feature amount information for the (predetermined) time T. It repeatedly adds the heat map images HM_t, which are generated from moment to moment, to the training data set 29 as (the basis of) learning images. At learning time, (machine) learning of the image interpolation network I is performed based on the heat map image at each time stored in the training data set 29 (for example, the heat map image HM_t) and a random mask corresponding to that heat map image. That is, the heat map image with the random mask applied is used as a learning image for (machine) learning of the image interpolation network I. Strictly speaking, this embodiment uses a network corresponding to the Generator of a DCGAN as the image interpolation network I, so the discriminator D (Discriminator) is also trained when the image interpolation network I is trained. The (machine) learning processing of the image interpolation network I (and the discriminator D) is described in detail later.

Next, the heat map image interpolation processing using the image interpolation network I after the above machine learning (the trained image interpolation network I) is described. Image interpolation by the trained image interpolation network I corresponds to so-called inference in a neural network. In a store where fewer edge cameras 1 (for example, 5 to 10) are installed than the number that should originally be installed, some captured images are missing. Consequently, when the images captured by all the edge cameras 1 installed in such a store are arranged as in FIG. 5, missing portions such as those shown in FIG. 8 appear. When captured images are missing in this way, the trained image interpolation network I interpolates the missing portions of the heat map image caused by those missing captured images.

The heat map image interpolation processing is described in detail with reference to the flowchart of FIG. 7 and to FIG. 8. First, each edge camera 1 in a store where fewer edge cameras 1 are installed than the number that should originally be installed repeatedly extracts (determines), from its own captured images, the feature amount a^L_it of each imaging area AR_i at time t and transmits imaging area feature amount information c^L_it containing the feature amount a^L_it to the server 2 (S11).

The imaging area feature amount information c^L_it of each imaging area AR_i is expressed by equation (4) below, which has the same form as equation (1).
c^L_it = {x^L_i, y^L_i, a^L_it}    (4)

Here, x^L_i, y^L_i, and a^L_it correspond to x_i, y_i, and a_it in equation (1), respectively. Like the feature amount a_it, the feature amount a^L_it contains the number of men, the number of women, and the dwell time as feature amounts, and is expressed by an equation of the same form as equation (2).

The server 2 collects the imaging area feature amount information c^L_1t to c^L_mt at time t that it was able to acquire from the edge cameras 1 into feature amount information c^L_t, as shown in equation (5) below. Here, m is a number smaller than n in equation (3).
c^L_t = {c^L_0t, c^L_1t, ..., c^L_mt}    (5)

The server 2 then generates the missing heat map image HM^L_t at time t from the feature amount information c^L_(t-T) to c^L_t for the period from time (t-T) to time t that it was able to acquire from the edge cameras 1 (S12). Because heat map image interpolation is performed precisely when captured images are missing, as described above, the imaging area feature amount information at each time transmitted from the edge cameras 1 to the server 2 (for example, the imaging area feature amount information c^L_1t to c^L_nt at time t) also has gaps. As a result, as shown in FIG. 8, the missing heat map image HM^L_t generated from the feature amount information c^L_(t-T) to c^L_t, which collects the imaging area feature amount information from time (t-T) to time t, also has missing portions.
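One way to picture how the gaps arise is to place only the reported imaging areas on the pixel grid and record which pixels received no data. This is a hedged sketch: the function name `rasterize`, the use of (x, y) as direct grid indices, and the placeholder `fill` color are assumptions, not details given in the text.

```python
def rasterize(area_pixels, height, width, fill=(0.0, 0.0, 0.0)):
    """Place reported area pixels on a grid and mark the rest as missing.

    area_pixels maps (x, y) grid positions to (R, G, B) tuples for the areas
    whose feature amounts actually reached the server.
    Returns (image, missing_mask); missing_mask[y][x] is True where no data arrived.
    """
    image = [
        [area_pixels.get((x, y), fill) for x in range(width)]
        for y in range(height)
    ]
    missing_mask = [
        [(x, y) not in area_pixels for x in range(width)]
        for y in range(height)
    ]
    return image, missing_mask
```

The resulting mask is what a later step would need in order to know which pixels to take from the network output.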

As shown in FIG. 8, when the missing heat map image HM^L_t is input to the trained image interpolation network I (S13 in FIG. 7), the trained image interpolation network I outputs the interpolated heat map image IHM_t (S14 in FIG. 7). That is, the trained image interpolation network I interpolates the missing portions of the input missing heat map image HM^L_t.

As described above, by interpolating the missing heat map image at each time using the trained image interpolation network I, a heat map image (the interpolated heat map image IHM_t) comparable to that obtained with the full number of edge cameras 1 can be obtained even when fewer edge cameras 1 than should originally be installed are placed in each store of a chain. The number of edge cameras 1 installed in each store of the chain can therefore be kept down, reducing the cost of newly opening and maintaining each store.

Next, the image interpolation network I, which corresponds to the Generator of a DCGAN, and the discriminator D, which corresponds to the Discriminator of a DCGAN, are described with reference to FIGS. 9 to 12. The image interpolation network I is an FCN (Fully Convolutional Network)-based neural network and, as shown in FIGS. 9 and 10, consists of many convolution layers. More precisely, each convolution layer in the image interpolation network I is followed by a ReLU (Rectified Linear Unit) layer. The output layer of the image interpolation network I ("OUTPUT" in FIG. 10) is a convolution layer with a sigmoid function.

In FIG. 10, "Type" indicates the type of layer, and "Filter size" indicates the size of the kernel (the filter used for the convolution operation). "Dilation" indicates the spacing used in dilated convolution (a convolution that leaves gaps between the input elements multiplied by the filter (kernel)), and "Stride" indicates the spacing between the windows to which the filter is applied. "Number of output channels" in FIG. 10 indicates the number of output channels of each layer (equal to the number of filters). In the "Type" column of FIG. 10, conv., deconv., dilated conv., and output denote, respectively, an ordinary convolution layer, a layer that performs deconvolution (upsampling (unpooling) followed by convolution), a layer that performs the dilated convolution described above, and the output layer.

As shown in FIGS. 9 and 10, in its second and fourth convolution layers the image interpolation network I uses a convolution stride of 2 × 2, so that it performs convolution while reducing the original image size H × W to (H/2) × (W/2) and (H/4) × (W/4), respectively. Repeating convolution while shrinking the image in this way makes the positional information of features in the image less precise. Dilated convolution is then performed to enlarge the receptive field (the part of the input that influences a given pixel). After that, in the deconvolution layers (deconv.) that are third and fifth from the end, deconvolution is performed with a stride of (1/2) × (1/2), and convolution is repeated while the image size, once reduced to (H/4) × (W/4), is restored to (H/2) × (W/2) and then to H × W. H × W in FIG. 9 is, for example, 32 (pixels) × 32 (pixels).
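The size changes above can be checked with the standard output-size formulas for convolution and transposed convolution. The sketch below traces only the layers that resize the image plus one dilated layer. The kernel sizes, paddings, and dilation are assumptions (FIG. 10 is not reproduced in this text), and the deconvolution is modelled as a stride-2 transposed convolution, which is also an assumption.

```python
def conv_out(size, kernel, stride=1, padding=0, dilation=1):
    """Output length of a convolution along one axis."""
    return (size + 2 * padding - dilation * (kernel - 1) - 1) // stride + 1


def deconv_out(size, kernel, stride=1, padding=0):
    """Output length of a transposed convolution along one axis."""
    return (size - 1) * stride - 2 * padding + kernel


def trace_sizes(h=32):
    """Follow one side of an H x H heat map through the resizing layers."""
    sizes = [h]
    sizes.append(conv_out(sizes[-1], kernel=3, stride=2, padding=1))    # H/2
    sizes.append(conv_out(sizes[-1], kernel=3, stride=2, padding=1))    # H/4
    sizes.append(conv_out(sizes[-1], kernel=3, padding=2, dilation=2))  # unchanged
    sizes.append(deconv_out(sizes[-1], kernel=4, stride=2, padding=1))  # H/2
    sizes.append(deconv_out(sizes[-1], kernel=4, stride=2, padding=1))  # H
    return sizes
```

For H = 32 the trace passes through 16 and 8 and returns to 32, matching the (H/2), (H/4), (H/2), H sequence in the text.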

At inference time with the trained image interpolation network I, as shown in FIG. 8, the input image to the image interpolation network I is the missing heat map image HM^L_t. After applying the image interpolation network I to the input missing heat map image HM^L_t, the feature amount image interpolation unit 32 of the CPU 21 on the server 2 side replaces only the missing portions of the input image with the output image from the output layer of the image interpolation network I (the output heat map image in FIG. 9). That is, the feature amount image interpolation unit 32 takes the missing portions from the output image (output heat map image) of the image interpolation network I, but uses the input missing heat map image HM^L_t (its corresponding partial image) unchanged everywhere else, and thereby generates the interpolated heat map image IHM_t shown in FIG. 8.
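The replacement step amounts to a per-pixel selection between the input and the network output. A minimal sketch, assuming images are lists of rows of (R, G, B) tuples and that a boolean mask of the missing pixels is available (the function and parameter names are hypothetical):

```python
def composite_interpolated(input_image, network_output, missing_mask):
    """Build the interpolated heat map from the input and the network output.

    missing_mask[y][x] is True where the input pixel is missing; only those
    pixels are taken from network_output, the rest are copied from the input.
    """
    return [
        [
            out_px if missing else in_px
            for in_px, out_px, missing in zip(in_row, out_row, mask_row)
        ]
        for in_row, out_row, mask_row in zip(
            input_image, network_output, missing_mask
        )
    ]
```

Keeping the observed pixels untouched means the network can only affect regions for which no camera data exists.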

In contrast, when the image interpolation network I (and the discriminator D) is trained as shown in FIG. 6, the input image (learning image) to the image interpolation network I is a heat map image with random mask information. That is, the input image at learning time is a heat map image HM_t without missing portions to which a mask region of random position and size has been added. At learning time, the feature amount image interpolation unit 32 of the CPU 21 on the server 2 side applies the image interpolation network I to the heat map image having this mask region (hereinafter, the "masked heat map image") and then replaces only the mask region of the input masked heat map image with the output image from the output layer of the image interpolation network I (the output heat map image in FIG. 9). That is, as at inference time, the feature amount image interpolation unit 32 takes the mask region from the output image (output heat map image) of the image interpolation network I, but uses the input masked heat map image (its corresponding partial image) unchanged outside the mask region, and thereby generates the interpolated heat map image IHM_t. At learning time, this interpolated heat map image IHM_t is sent to the discriminator D and used to train the discriminator D.

Before input to the image interpolation network I, it is desirable to set the R, G, and B luminance values of every pixel in the missing portions of the missing heat map image HM^L_t (at inference time) and in the mask region of the masked heat map image (at learning time) to the average R, G, and B luminance values of all heat map images in the training data set 29.
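The mask generation and mean-color fill can be sketched as below. This is an illustration only: a single rectangular mask and the side-length bounds `min_side` and `max_side` are assumptions, since the text says only that the mask region has a random position and size.

```python
import random


def random_rect_mask(height, width, rng, min_side=4, max_side=16):
    """Binary mask with one rectangle of random position and size set to 1.

    min_side and max_side are assumed bounds; the text only says "random".
    """
    h = rng.randint(min_side, min(max_side, height))
    w = rng.randint(min_side, min(max_side, width))
    top = rng.randint(0, height - h)
    left = rng.randint(0, width - w)
    return [
        [1 if top <= y < top + h and left <= x < left + w else 0
         for x in range(width)]
        for y in range(height)
    ]


def mean_rgb(images):
    """Average (R, G, B) over every pixel of every image in the data set."""
    pixels = [px for img in images for row in img for px in row]
    return tuple(sum(px[ch] for px in pixels) / len(pixels) for ch in range(3))


def fill_masked(image, mask, fill):
    """Return a copy of image with masked pixels (mask == 1) set to fill."""
    return [
        [fill if m else px for px, m in zip(row, mask_row)]
        for row, mask_row in zip(image, mask)
    ]
```

At inference time the same `fill_masked` step would be applied with the missing-pixel mask instead of a random one.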

Next, the discriminator D is described. The discriminator D is a network that identifies whether an input image is a (real) heat map image generated from the feature amount information acquired from all the edge cameras 1 (for example, the heat map image HM_t shown in FIG. 6) or an interpolated heat map image generated (interpolated) using the image interpolation network I (for example, the interpolated heat map image IHM_t). When the image interpolation network I and the discriminator D are trained, the discriminator D learns to identify correctly whether the input image is an interpolated heat map image generated using the image interpolation network I or a real heat map image (one not generated by interpolation). The image interpolation network I, in turn, learns so that the discriminator D misclassifies the interpolated heat map images it generates as real heat map images. By training the image interpolation network I and the discriminator D in competition with each other in this way, the image interpolation network I becomes able to generate interpolated heat map images close to real heat map images. This learning processing is described in detail later.

The configuration of the discriminator D is now described with reference to FIGS. 11 and 12. As these figures show, the discriminator D is a CNN (Convolutional Neural Network)-based network consisting of three convolution layers and one fully connected layer ("FC" in FIG. 12). Each of the three convolution layers uses a stride of 2 (pixels) × 2 (pixels) to reduce the image size (lower the image resolution), while doubling the number of output filters relative to the preceding convolution layer. The size (H × W) of the input image in FIG. 11 is 32 (pixels) × 32 (pixels). The output layer of the discriminator D ("FC" in FIG. 12) is a fully connected layer with a sigmoid function and outputs (a value corresponding to) the probability that the input image is a real heat map image. "Type", "Filter size", "Stride", and "Number of output channels" in FIG. 12 have the same meanings as in FIG. 10. In the "Type" column of FIG. 12, conv. and FC denote an ordinary convolution layer and a fully connected layer, respectively.
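The shape progression through the discriminator can be sketched as follows. The starting channel count `base_channels=64` is a hypothetical value (FIG. 12 is not reproduced in this text), and the halving assumes padding chosen so that each stride-2 layer exactly halves the side length.

```python
import math


def discriminator_shapes(size=32, base_channels=64, num_convs=3):
    """(height, width, channels) after each stride-2 convolution layer."""
    shapes = []
    channels = base_channels
    for _ in range(num_convs):
        size = size // 2   # a 2 x 2 stride halves each side
        shapes.append((size, size, channels))
        channels *= 2      # the next layer has twice as many filters
    return shapes


def sigmoid(z):
    """Logistic function applied by the FC output layer to give a probability."""
    return 1.0 / (1.0 + math.exp(-z))
```

Under these assumptions a 32 × 32 input reaches the fully connected layer as a 4 × 4 map, and the sigmoid maps the layer's single output to a value between 0 and 1.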

Next, the machine learning of the image interpolation network I and the discriminator D performed by the machine learning unit 33 of the CPU 21 on the server 2 side is described with reference to FIG. 13. The heat map images are drawn by the drawing processing unit 31 from the feature amounts at each time, which are transmitted from moment to moment by all the edge cameras 1 that should originally be installed. Once at least a predetermined number of these heat map images have been stored in the training data set 29, the user instructs the server 2 via the operation unit 30 to start machine learning. The machine learning unit 33 of the CPU 21 then initializes the iteration count IT to 1 and initializes (sets initial values for) the parameters of the image interpolation network I and the discriminator D (for convolution layers, the filters and biases) (S21).

Next, the machine learning unit 33 of the CPU 21 randomly selects a mini-batch of heat map images (without missing portions) from the training data set 29 (S23) and, for each heat map image X_i in the mini-batch, randomly generates a mask MI_i, which is a binary mask representing the interpolation region (an interpolation region mask) (S24). Then, based on each heat map image X_i in the mini-batch and the corresponding mask MI_i, the machine learning unit 33 updates the parameters (filters and biases) of the image interpolation network I per mini-batch, using the weighted sum-of-squares error loss function of equation (6) below (S26). This per-mini-batch update is performed to stabilize the learning of the image interpolation network I, and the processing of S23 to S26 is repeated until the iteration count IT reaches IT_int1 (YES in S25). The loss function of equation (6) is a weighted sum-of-squares error loss function that takes the mask MI_i representing the interpolation region into account.

In equation (6), L(X_i, MI_i) denotes the error (loss) computed from the heat map image X_i and its corresponding mask MI_i. C(X_i, MI_i) expresses the image interpolation network I in functional form when the heat map image X_i and the mask MI_i are used as the input image and the interpolation region mask; it corresponds to the output image of the image interpolation network I for those inputs. ||·|| in equation (6) denotes a norm.
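The definitions above suggest a masked squared-error form for this loss. The following is a sketch consistent with those definitions, not the published equation (6): it assumes that MI_i equals 1 inside the interpolation region and 0 elsewhere, and that ⊙ denotes element-wise multiplication.

```latex
L(X_i, MI_i) = \left\| MI_i \odot \bigl( C(X_i, MI_i) - X_i \bigr) \right\|^{2}
```

Under this reading, only pixels inside the interpolation region contribute to the error, which is what "weighted by the mask" would mean.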

In the parameter update of S26, the machine learning unit 33 of the CPU 21 computes the error L(X_i, MI_i) of equation (6) for each heat map image X_i in the mini-batch, calculates the average of all these errors, and updates the parameters in the direction of its gradient. The machine learning unit 33 repeats the parameter update of S26 IT_int1 times, changing the mini-batch of heat map images used for learning each time.
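The per-image error and its mini-batch average can be sketched in plain Python. This assumes a masked squared-error reading of the loss, in which only pixels with mask value 1 contribute; the function names are hypothetical and the gradient step itself is omitted.

```python
def masked_squared_error(target, output, mask):
    """Sum of squared channel differences over pixels where mask == 1."""
    total = 0.0
    for t_row, o_row, m_row in zip(target, output, mask):
        for t_px, o_px, m in zip(t_row, o_row, m_row):
            if m:
                total += sum((o - t) ** 2 for t, o in zip(t_px, o_px))
    return total


def minibatch_loss(targets, outputs, masks):
    """Average of the per-image masked errors over a mini-batch."""
    errors = [
        masked_squared_error(t, o, m)
        for t, o, m in zip(targets, outputs, masks)
    ]
    return sum(errors) / len(errors)
```

In practice an automatic-differentiation framework would compute the gradient of this average with respect to the network parameters.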

When the iteration count IT exceeds IT_int1 (NO in S25), the machine learning unit 33 of the CPU 21 temporarily stops updating the parameters of the image interpolation network I and repeats the update of the parameters of the discriminator D IT_int2 times. Specifically, until the iteration count IT reaches (IT_int1 + IT_int2) (YES in S27), the machine learning unit 33 performs, in addition to the processing of S23 and S24, per-mini-batch updates of the parameters of the discriminator D using the cross-entropy error as the loss function (S28). This update is performed by stochastic gradient descent using both the output images (fake images) of the image interpolation network I obtained with each heat map image X_i and mask MI_i in the mini-batch and the heat map images X_i themselves (real images). The output image (fake image) of the image interpolation network I used here may be either the output heat map image in FIG. 9 or the interpolated heat map image IHM_t.

When the iteration count IT exceeds (IT_int1 + IT_int2) (NO in S27), the machine learning unit 33 of the CPU 21 enters the final stage of learning, in which the image interpolation network I and the discriminator D are trained together. Here, the objective of the DCGAN composed of the image interpolation network I and the discriminator D is equation (8) below, which combines equation (7) below, corresponding to the general GAN objective, with the loss function of equation (6).

In equations (7) and (8), E[·] denotes the expected value, and α is a weighting hyperparameter. The loss function in the general GAN objective of equation (7) is equation (9) below, and the loss function in equation (8) is equation (10) below. Because the loss function of equation (10) combines the weighted sum-of-squares error loss function of equation (6) with the GAN loss function of equation (9), equation (10) is referred to below as the combined loss function.
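For orientation, the general GAN objective and a combined objective of the kind described can be written in this document's notation as follows. These are assumed forms, not the published equations (7) and (8); in particular, where the weight α enters is an assumption.

```latex
% General GAN objective adapted to this notation (assumed form)
\min_{C}\max_{D}\; \mathbb{E}\bigl[\log D(X_i) + \log\bigl(1 - D(C(X_i, MI_i))\bigr)\bigr]

% Combined objective adding the masked error L(X_i, MI_i) (assumed form)
\min_{C}\max_{D}\; \mathbb{E}\bigl[L(X_i, MI_i) + \alpha \log D(X_i)
    + \alpha \log\bigl(1 - D(C(X_i, MI_i))\bigr)\bigr]
```

In both forms the image interpolation network C minimizes the expression while the discriminator D maximizes it, and D(·) is the discriminator's probability that its input is a real heat map image.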

When the iteration count IT exceeds (IT_int1 + IT_int2) (NO in S27), the machine learning unit 33 of the CPU 21 updates the parameters of the image interpolation network I and the discriminator D per mini-batch, using the combined loss function of equation (10) as the loss function, until the iteration count IT reaches IT_k (S29). Each time this update completes, the iteration count IT is incremented, and the update of S29 is repeated until IT exceeds IT_k (YES in S22). Denoting the parameters (weights and biases) of the image interpolation network I by θ_c, the stochastic gradient of the combined loss function with respect to θ_c (the average, over all heat map images X_i in the mini-batch, of the gradient of the combined loss function with respect to θ_c) is expressed by equation (11) below. In equation (11), m is the number of heat map images X_i in the mini-batch.
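The three-stage schedule controlled by the iteration count IT (the checks at S25, S27, and S22) can be summarised in a small helper. This is a sketch of the control flow only; the function name and the returned labels are hypothetical.

```python
def training_stage(it, it_int1, it_int2, it_k):
    """Return which networks are updated at iteration `it` (counted from 1)."""
    if it <= it_int1:
        return "interpolation_only"   # S26: masked squared-error loss
    if it <= it_int1 + it_int2:
        return "discriminator_only"   # S28: cross-entropy loss
    if it <= it_k:
        return "joint"                # S29: combined loss
    return "done"
```

For example, with the made-up values IT_int1 = 10, IT_int2 = 5, and IT_k = 30, iterations 1 to 10 update only the image interpolation network I, iterations 11 to 15 update only the discriminator D, and iterations 16 to 30 update both.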

Likewise, denoting the parameters of the discriminator D by θ_d, the stochastic gradient of the coupling loss function with respect to θ_d is given by equation (12) below. In equation (12), m is again the number of heat map images X_i in the mini-batch.
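Written out under the assumption that L(X_i; θ_c, θ_d) denotes the coupling loss evaluated on a single heat map image X_i (a notation introduced here for illustration, not taken from equations (11) and (12)), the mini-batch averages described above would take the following form:

```latex
g_{\theta_c} = \frac{1}{m} \sum_{i=1}^{m} \nabla_{\theta_c} L(X_i;\, \theta_c, \theta_d),
\qquad
g_{\theta_d} = \frac{1}{m} \sum_{i=1}^{m} \nabla_{\theta_d} L(X_i;\, \theta_c, \theta_d)
```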

In the parameter update process of S29, the machine learning unit 33 of the CPU 21 updates the parameter θ_c of the image interpolation network I by a small amount along the gradient direction so as to decrease the value of the coupling loss function of equation (10). For the parameter θ_d of the discriminator D, on the other hand, the machine learning unit 33 updates θ_d by a small amount along the gradient direction so as to increase the value of the coupling loss function of equation (10). By repeating these updates of θ_c and θ_d, the machine learning of the image interpolation network I and the discriminator D is completed, and the learned image interpolation network I is obtained. In this way, the machine learning unit 33 of the CPU 21 performs machine learning of the image interpolation network I and the discriminator D by stochastic gradient descent using mini-batches.
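The opposing update directions can be illustrated with a toy min-max problem. In the sketch below, the scalar objective f(c, d) = c² + c·d − d² is only a stand-in for the coupling loss, c and d stand in for θ_c and θ_d, and the learning rate and step count are hypothetical values chosen so that the iteration converges.

```python
def objective_grads(c, d):
    """Gradients of the toy objective f(c, d) = c**2 + c*d - d**2."""
    grad_c = 2.0 * c + d
    grad_d = c - 2.0 * d
    return grad_c, grad_d


def train(c, d, lr=0.1, steps=200):
    """Alternate-direction updates: descend in c, ascend in d."""
    for _ in range(steps):
        grad_c, grad_d = objective_grads(c, d)
        c -= lr * grad_c  # step against the gradient: decreases f
        d += lr * grad_d  # step along the gradient: increases f
    return c, d


c_final, d_final = train(1.0, -1.0)
```

For this toy objective both values approach the saddle point at (0, 0). Real GAN training offers no such guarantee; the example shows only the sign convention of the two updates.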

As described above, in the learning system 10 of the present embodiment, machine learning of the image interpolation network I (and the discriminator D) before completion of learning is performed using learning images (masked heat map images) based on the heat map images HM_t drawn from the feature amounts extracted from the images captured by all the edge cameras 1 in another store of the chain that is equipped with all the edge cameras 1 that should originally be installed. In a so-called chain store, the stores belonging to the chain have the same layout. Therefore, by training the image interpolation network I (and the discriminator D) with learning images based on the heat map images HM_t of such a fully equipped store, the missing portions of a heat map image (missing heat map image HM^L_t) can be interpolated with the learned image interpolation network I, even when a store of the same chain has fewer edge cameras 1 than should originally be installed and the images from some of those edge cameras 1 are therefore missing.
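One way to picture the masked heat map images used for learning is to blank out, in a complete heat map, the regions that the absent cameras would have covered. The sketch below assumes that each absent camera corresponds to a rectangular region and that masked pixels are filled with a constant; both choices, and all names, are hypothetical and are not taken from the embodiment.

```python
def make_masked_heatmap(heatmap, missing_regions, fill=0.0):
    """Return (masked_image, mask) for a complete heat map.

    heatmap: list of rows of numeric values.
    missing_regions: list of (top, left, height, width) rectangles,
        assumed here to stand for the coverage of absent cameras.
    mask: 1 where a pixel is treated as missing, 0 elsewhere.
    """
    rows, cols = len(heatmap), len(heatmap[0])
    mask = [[0] * cols for _ in range(rows)]
    for top, left, height, width in missing_regions:
        for r in range(top, min(top + height, rows)):
            for c in range(left, min(left + width, cols)):
                mask[r][c] = 1
    masked = [
        [fill if mask[r][c] else heatmap[r][c] for c in range(cols)]
        for r in range(rows)
    ]
    return masked, mask
```

The pair (masked image, original heat map) can then serve as input and target when training a network to fill in the masked region.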

As a result, even when the number of edge cameras 1 installed in each store of the chain is smaller than the number that should originally be installed, a heat map image (interpolated heat map image IHM_t) comparable to the one obtained with the full number of edge cameras 1 can be obtained. The number of edge cameras 1 installed in each store can therefore be reduced, which lowers the cost of opening and maintaining each store of the chain.

The server 2 and the heat map drawing interpolation program 26 of the present embodiment provide the same effects as described above.

Further, in the learning system 10 of the present embodiment, the image drawn from the feature amounts extracted from the images captured by the plurality of edge cameras 1 (the "feature amount image" in the claims) is a heat map image in which those feature amounts are visualized using colors. This allows a user such as an employee to visually grasp the current features of the store in question.

Further, in the learning system 10 of the present embodiment, the feature amount extracted from the images captured by the plurality of edge cameras 1 is multidimensional data composed of data for a plurality of types of feature amounts, and the image drawn from this feature amount (the "feature amount image" in the claims) is a color heat map image in which the multidimensional data is visualized using colors. This allows a user such as an employee to grasp several types of current features of the store in question easily and visually.
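To make the idea of visualizing multidimensional feature data with color concrete, the sketch below maps the three feature types named in Modification 4 (number of men, number of women, dwell time) to one RGB pixel. The assignment of each feature to a color channel and the scaling limits max_people and max_dwell are assumptions for illustration; the text does not specify them.

```python
def feature_to_rgb(men, women, dwell_seconds, max_people=10, max_dwell=600.0):
    """Map a 3-dimensional feature vector to an (R, G, B) tuple in 0..255.

    Assumed mapping: men -> red, women -> green, dwell time -> blue.
    Values are clipped to [0, maximum] before scaling.
    """
    def scale(value, maximum):
        clipped = max(0.0, min(float(value), float(maximum)))
        return int(round(255 * clipped / maximum))

    return (
        scale(men, max_people),
        scale(women, max_people),
        scale(dwell_seconds, max_dwell),
    )
```

A cell with many men and a long dwell time would then appear magenta, while a cell with many women and little dwell time would appear green.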

Further, in the learning system 10 of the present embodiment, the heat map image is drawn based on the feature amounts for a predetermined time (for example, feature amount information c_(t−T) to c_t) extracted from the images captured by the plurality of edge cameras 1. Consequently, even when captured images are missing because a store has fewer edge cameras 1 than should originally be installed, both the missing heat map image HM^L_t drawn from the feature amounts extracted from the available captured images and the interpolated heat map image IHM_t obtained with the learned image interpolation network I reflect the feature amounts for the predetermined time before the current time t. A user such as an employee can therefore grasp features that do not appear in the images currently being captured by the plurality of edge cameras 1.
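The effect of drawing from a time window rather than a single instant can be sketched as follows. The example assumes that the feature information c_(t−T) to c_t is stored as a list of per-cell grids and that the grids are combined by summation; the combination rule and the function name are hypothetical and are not stated in the text.

```python
def accumulate_window(feature_frames, t, window):
    """Combine per-cell feature values over frames t-window .. t (inclusive).

    feature_frames: list indexed by time; each entry is a grid (list of rows).
    Summation is an assumed combination rule used only for illustration.
    """
    start = max(0, t - window)
    frames = feature_frames[start:t + 1]
    rows, cols = len(frames[0]), len(frames[0][0])
    return [
        [sum(frame[r][c] for frame in frames) for c in range(cols)]
        for r in range(rows)
    ]
```

A cell that is empty at time t but was occupied earlier in the window still contributes to the result, which is the property the paragraph above relies on.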

Further, in the learning system 10 of the present embodiment, the image interpolation network I and the discriminator D (the "neural network" in the claims) are neural networks of a DCGAN, which is a type of generative model, and the feature amount image interpolation unit 32 interpolates the missing portion of the missing heat map image HM^L_t by generating that portion with the learned image interpolation network I. Because DCGAN is a generative model built on CNNs, which are well suited to deep learning on images, generating the missing portion of the missing heat map image HM^L_t with a DCGAN neural network yields a plausible interpolated heat map image (interpolated heat map image IHM_t).

Modifications:
The present invention is not limited to the configurations of the embodiments above, and various modifications are possible without departing from the gist of the invention. Modifications of the present invention are described next.

Modification 1:
In the embodiment above, the "feature amount image" in the claims is a heat map image in which the feature amounts extracted from the images captured by the plurality of edge cameras 1 are visualized using colors. Alternatively, the feature amounts extracted from the captured images may be the images captured by each of the plurality of edge cameras themselves, and the server may draw, as the feature amount image, an in-store image that combines the images captured by each of the plurality of edge cameras. In this case, when captured images are missing because a store has fewer edge cameras than should originally be installed, the learned neural network (corresponding to the "learned image interpolation network I" of the embodiment above) completes the in-store image by interpolating the missing captured images.

In Modification 1 as well, as in the embodiment above, it is desirable that the interpolated image reflect the feature amounts for a predetermined time (in Modification 1, the images captured by each of the plurality of edge cameras over a predetermined time). In the embodiment above, the heat map image with missing portions before interpolation (missing heat map image HM^L_t) is itself drawn from the feature amounts for a predetermined time (for example, feature amount information c_(t−T) to c_t). At interpolation time there is therefore no need to consider past missing heat map images; considering only the current missing heat map image is enough for the interpolated image to reflect the feature amounts for the predetermined time. In Modification 1, however, the image to be interpolated is the captured image itself, so unless past captured images are taken into account during interpolation, the interpolated image cannot reflect the feature amounts (captured images) for the predetermined time. If it cannot, a user such as an employee cannot grasp a feature that appears only in the missing portion of the captured image at a given time (for example, a person present in the missing portion).

Accordingly, when, as in Modification 1, the feature amounts extracted from the captured images are the images captured by each of the plurality of edge cameras themselves, and the server draws, as the feature amount image, an in-store image that combines those captured images, it is desirable to use a CNN that takes the time-series direction into account, such as a 3D CNN, or a recurrent neural network (RNN). A recurrent neural network can reflect not only the current input but also past inputs in the current output. Using a time-series-aware CNN such as a 3D CNN, or a recurrent neural network, makes it possible for the interpolated image to reflect the images captured by each of the plurality of edge cameras over the predetermined time.
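The property of a recurrent network mentioned above, that past inputs influence the current output, can be seen in a minimal single-unit recurrence. The scalar weights w_in and w_rec and the tanh activation are illustrative assumptions; the text does not specify an RNN architecture.

```python
import math


def recurrent_outputs(inputs, w_in=1.0, w_rec=0.5):
    """Run a one-unit recurrence: state_t = tanh(w_in * x_t + w_rec * state_{t-1})."""
    state = 0.0
    outputs = []
    for x in inputs:
        # The previous state carries information about earlier inputs forward.
        state = math.tanh(w_in * x + w_rec * state)
        outputs.append(state)
    return outputs
```

Feeding the sequences [0.0, 1.0] and [1.0, 1.0] gives different final outputs even though the final input is the same in both, because the earlier input is carried in the state.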

Modification 2:
In the embodiment above, the neural network in the claims is a DCGAN, which is a type of generative model. However, the neural network used to interpolate the missing portion of a feature amount image such as a heat map image is not limited to this. It may be a neural network of another type of generative model, such as WGAN (Wasserstein GAN), or a neural network that is not a generative model.

Modification 3:
In the embodiment above, the edge camera extracts the feature amounts from the images it captures. However, the invention is not limited to this, and the server may extract the feature amounts from captured images transmitted from the edge cameras. Also, in the embodiment above, the server on the cloud draws the feature amount image based on the feature amounts extracted from the images captured by the edge cameras. Alternatively, a server installed in the store may perform this drawing process and transmit the drawn feature amount image to the server on the cloud.

Modification 4:
In the embodiment above, the "feature amount" in the claims is three-dimensional data composed of data for three types of feature amounts (number of men_it, number of women_it, dwell time_it). However, the "feature amount" in the claims may be data of any number of dimensions, for example four-dimensional data composed of data for four types of feature amounts.

1 Edge camera (camera)
2 Server
10 Learning system
26 Heat map drawing interpolation program (feature amount image drawing interpolation program)
31 Drawing processing unit
32 Feature amount image interpolation unit
33 Machine learning unit
T Predetermined time

Claims

A learning system comprising a plurality of cameras installed in a store of a chain in which the stores belonging to the chain have the same layout, and a server capable of communicating with the plurality of cameras,
wherein the server comprises:
a drawing processing unit that draws a feature amount image based on feature amounts extracted from images captured by the plurality of cameras;
a feature amount image interpolation unit that, when captured images are missing because the number of cameras installed in a certain store of the chain is smaller than the number of cameras that should originally be installed, interpolates the missing portion of the feature amount image caused by the missing captured images by applying a learned neural network to the entire feature amount image including the missing portion; and
a machine learning unit that performs machine learning of the neural network before completion of learning, using a learning image based on the feature amount image drawn by the drawing processing unit from feature amounts extracted from images captured by all the cameras installed in another store of the chain, the other store having, among the cameras that should originally be installed, the cameras that are lacking in the certain store.

The learning system according to claim 1, wherein the other store is a store, among the stores of the chain other than the certain store, that is equipped with all the cameras that should originally be installed, and
the machine learning unit performs machine learning of the neural network before completion of learning, using a learning image based on the feature amount image drawn by the drawing processing unit from feature amounts extracted from images captured by all the originally intended cameras installed in the other store.

The learning system according to claim 1, wherein the feature amount image is a heat map image in which feature amounts extracted from images captured by the plurality of cameras are visualized using colors.

The learning system according to claim 3, wherein the feature amount is multidimensional data composed of data for a plurality of types of feature amounts, and
the feature amount image is a color heat map image in which the multidimensional data is visualized using colors.

The learning system according to claim 3, wherein the drawing processing unit draws the heat map image based on the feature amounts for a predetermined time extracted from images captured by the plurality of cameras.

The learning system according to claim 1, wherein the feature amount is the image itself captured by each of the plurality of cameras,
the drawing processing unit of the server draws, as the feature amount image, an in-store image that combines the images captured by each of the plurality of cameras, and
when captured images are missing because the number of cameras installed in the certain store is smaller than the number of cameras that should originally be installed, the learned neural network completes the image of the inside of the certain store by interpolating the missing captured images.

The learning system according to any one of claims 1 to 6, wherein the neural network is a neural network of a generative model, and the feature amount image interpolation unit interpolates the missing portion of the feature amount image by generating the missing portion using the learned neural network.

A server capable of communicating with a plurality of cameras installed in a store of a chain in which the stores belonging to the chain have the same layout, the server comprising:
a drawing processing unit that draws a feature amount image based on feature amounts extracted from images captured by the plurality of cameras;
a feature amount image interpolation unit that, when captured images are missing because the number of cameras installed in a certain store of the chain is smaller than the number of cameras that should originally be installed, interpolates the missing portion of the feature amount image caused by the missing captured images by applying a learned neural network to the entire feature amount image including the missing portion; and
a machine learning unit that performs machine learning of the neural network before completion of learning, using a learning image based on the feature amount image drawn by the drawing processing unit from feature amounts extracted from images captured by all the cameras installed in another store of the chain, the other store having, among the cameras that should originally be installed, the cameras that are lacking in the certain store.

The server according to claim 8, wherein the feature amount image is a heat map image in which feature amounts extracted from images captured by the plurality of cameras are visualized using colors.

A feature amount image drawing interpolation program for causing a computer to function as:
a drawing processing unit that draws a feature amount image based on feature amounts extracted from images captured by a plurality of cameras installed in a store of a chain in which the stores belonging to the chain have the same layout;
a feature amount image interpolation unit that, when captured images are missing because the number of cameras installed in a certain store of the chain is smaller than the number of cameras that should originally be installed, interpolates the missing portion of the feature amount image caused by the missing captured images by applying a learned neural network to the entire feature amount image including the missing portion; and
a machine learning unit that performs machine learning of the neural network before completion of learning, using a learning image based on the feature amount image drawn by the drawing processing unit from feature amounts extracted from images captured by all the cameras installed in another store of the chain, the other store having, among the cameras that should originally be installed, the cameras that are lacking in the certain store.

The feature amount image drawing interpolation program according to claim 10, wherein the feature amount image is a heat map image in which feature amounts extracted from images captured by the plurality of cameras are visualized using colors.