JP7036401B2

JP7036401B2 - Learning server, image collection support system for insufficient learning, and image estimation program for insufficient learning

Info

Publication number: JP7036401B2
Application number: JP2018086457A
Authority: JP
Inventors: 安紘土田
Original assignee: Individual
Current assignee: Individual
Priority date: 2018-04-27
Filing date: 2018-04-27
Publication date: 2022-03-15
Anticipated expiration: 2038-04-27
Also published as: JP2019192082A

Description

The present invention relates to a learning server, an image collection support system for insufficient learning, and an image estimation program for insufficient learning, and more particularly to machine learning of a neural network for recognizing a specific object.

Neural networks such as convolutional neural networks (CNNs) have conventionally been used for class classification (object recognition) of input images, for example recognition of handwritten digits. Neural networks such as CNNs are also used for object detection, which is an application of object recognition. Object detection is processing that identifies the position and type (class) of an object in an image.

R-CNN-based object detection engines are known as programs (object detection engines) that perform this object detection (see, for example, the background art of Patent Document 1). An R-CNN-based object detection engine consists mainly of a candidate region extraction unit that extracts object-like regions and a CNN that performs class classification (object recognition) on each region extracted by the candidate region extraction unit. Such an engine can be used, for example, to detect product tags on product shelves in retail stores.
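The two-stage structure described above (candidate region extraction followed by per-region classification) can be sketched as follows. This is only an illustrative outline, not the engine of Patent Document 1: the names `detect_objects`, `propose_regions`, `classify_region`, and the `"background"` label are assumptions introduced here, and both stages are passed in as plain callables.

```python
from typing import Callable, List, Sequence, Tuple

Box = Tuple[int, int, int, int]      # (x, y, width, height) in pixels
Image = Sequence[Sequence[int]]      # hypothetical stand-in for real image data


def detect_objects(
    image: Image,
    propose_regions: Callable[[Image], List[Box]],
    classify_region: Callable[[Image, Box], Tuple[str, float]],
    background_label: str = "background",
) -> List[Tuple[Box, str, float]]:
    """Run the two stages: extract candidate regions, then classify each one."""
    detections = []
    for box in propose_regions(image):                 # stage 1: object-like regions
        label, score = classify_region(image, box)     # stage 2: class classification
        if label != background_label:                  # keep only regions with an object
            detections.append((box, label, score))
    return detections
```

In a real engine both callables would be backed by trained models; here they are left open so the control flow stays visible.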

Japanese Unexamined Patent Publication No. 2018-22484

However, a neural network that performs object recognition (class classification) as described above has the problem that a user in the field cannot know the content and quantity of learning images required to complete the machine learning for recognizing a given object. As a result, an AI (artificial intelligence) engineer with knowledge of deep learning is needed every time the neural network for object recognition is retrained.

For example, when an R-CNN-based object detection engine is used to detect product tags on the shelves of a retail store, as in the example above, the store clerk (user) has been unable to know the content and quantity of learning images required to complete the machine learning (of the neural network parameters) for recognizing the product tags. In a retail store, the object detection engine for product tag detection (that is, the parameters of the neural network inside it) may need to be retrained when the product tags are replaced. Conventionally, an AI engineer with knowledge of deep learning was needed every time this object detection engine was retrained.

The present invention solves the above problem. Its object is to provide a learning server, an image collection support system for insufficient learning, and an image estimation program for insufficient learning that can inform the user of the content and quantity of learning images required to complete the machine learning of a neural network for recognizing a specific object, so that even a user without knowledge of deep learning can easily create the learning images required to complete that machine learning.

To solve the above problem, a learning server according to a first aspect of the present invention comprises: an image acquisition unit that acquires input images including learning images; a machine learning unit that performs machine learning of a neural network for recognizing a specific object, based on the learning images acquired by the image acquisition unit; a region-of-interest extraction unit that extracts the region of interest in the input image on which the neural network is currently focusing when recognizing the specific object; a feature portion storage unit that stores a feature portion of the input image for discriminating the specific object; and an insufficient-learning image estimation unit that estimates the content and quantity of learning images required to complete the machine learning, based on the region of interest extracted by the region-of-interest extraction unit and the feature portion stored in the feature portion storage unit.

In this learning server, the neural network may be an image classifier that classifies which class the input image belongs to, the classes into which the image classifier classifies may include a specific class corresponding to the specific object, and the insufficient-learning image estimation unit may estimate the content and quantity of learning images required to complete the machine learning for the specific class, based on the region of interest extracted by the region-of-interest extraction unit and the feature portion stored in the feature portion storage unit.

In this learning server, the insufficient-learning image estimation unit may calculate the degree of progress of the machine learning of the neural network based on the degree of coincidence between the region of interest extracted by the region-of-interest extraction unit and the feature portion stored in the feature portion storage unit, and may estimate the content and quantity of learning images required to complete the machine learning based on this degree of progress, the extracted region of interest, and the stored feature portion.

An image collection support system for insufficient learning according to a second aspect of the present invention comprises an information processing terminal and a learning server connected to the information processing terminal via a network. The information processing terminal comprises: a photographing unit that captures input images including learning images; an operation unit for performing an instruction input operation that designates, in an input image captured by the photographing unit, a feature portion for discriminating a specific object; a terminal-side transmission unit that transmits to the learning server the feature portion designated by the user with the operation unit and the input images, including the learning images, captured by the photographing unit; and a required-learning-image presentation unit that, based on insufficient-learning image information received from the learning server, presents to the user the content and quantity of learning images required to complete the machine learning of a neural network for recognizing the specific object. The learning server comprises: an image receiving unit that receives the input images, including the learning images, transmitted by the terminal-side transmission unit; a machine learning unit that performs machine learning of the neural network based on the learning images received by the image receiving unit; a region-of-interest extraction unit that extracts the region of interest in the input image on which the neural network is currently focusing when recognizing the specific object; a feature portion storage unit that stores the feature portion transmitted by the terminal-side transmission unit; an insufficient-learning image estimation unit that estimates the content and quantity of learning images required to complete the machine learning, based on the region of interest extracted by the region-of-interest extraction unit and the feature portion stored in the feature portion storage unit; and a server-side transmission unit that transmits the content and quantity of required learning images estimated by the insufficient-learning image estimation unit to the information processing terminal as the insufficient-learning image information.
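The exchange between the information processing terminal and the learning server can be pictured as two small message types. The sketch below is an assumption made for illustration only: the patent does not specify a wire format, and the class names, field names, and the `format_for_display` helper are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height) in pixels


@dataclass
class TerminalUpload:
    """Terminal to server: one input image plus, for test images, the feature portion."""
    image_bytes: bytes
    class_name: str
    is_learning_image: bool
    feature_box: Optional[Box] = None  # region designated by the user, if any


@dataclass
class RequiredImages:
    """One entry of the estimate: what kind of image, and how many more."""
    content: str
    quantity: int


@dataclass
class InsufficientLearningImageInfo:
    """Server to terminal: content and quantity of learning images still needed."""
    class_name: str
    required: List[RequiredImages] = field(default_factory=list)


def format_for_display(info: InsufficientLearningImageInfo) -> str:
    """Render the server's estimate as text the terminal could show to the user."""
    if not info.required:
        return f"{info.class_name}: no further learning images needed"
    lines = [f"{info.class_name}: additional learning images needed"]
    lines += [f"- {item.quantity} x {item.content}" for item in info.required]
    return "\n".join(lines)
```

Keeping content and quantity together in one entry mirrors the claim language, in which both are estimated and presented as a pair.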

In this image collection support system for insufficient learning, the neural network may be an image classifier that classifies which class the input image belongs to, the classes into which the image classifier classifies may include a specific class corresponding to the specific object, and the insufficient-learning image estimation unit may estimate the content and quantity of learning images required to complete the machine learning for the specific class, based on the region of interest extracted by the region-of-interest extraction unit and the feature portion stored in the feature portion storage unit.

In this image collection support system for insufficient learning, the insufficient-learning image estimation unit may calculate the degree of progress of the machine learning of the neural network based on the degree of coincidence between the region of interest extracted by the region-of-interest extraction unit and the feature portion stored in the feature portion storage unit, and may estimate the content and quantity of learning images required to complete the machine learning based on this degree of progress, the extracted region of interest, and the stored feature portion.

An image estimation program for insufficient learning according to a third aspect of the present invention causes a computer to function as: an image acquisition unit that acquires input images including learning images; a region-of-interest extraction unit that extracts the region of interest in the input image on which a neural network for recognizing a specific object is currently focusing when recognizing the specific object; a feature portion storage unit that stores a feature portion of the input image for discriminating the specific object; and an insufficient-learning image estimation unit that estimates the content and quantity of learning images required to complete the machine learning of the neural network for recognizing the specific object, based on the region of interest extracted by the region-of-interest extraction unit and the feature portion stored in the feature portion storage unit.

In this image estimation program for insufficient learning, the neural network may be an image classifier that classifies which class the input image belongs to, the classes into which the image classifier classifies may include a specific class corresponding to the specific object, and the insufficient-learning image estimation unit may estimate the content and quantity of learning images required to complete the machine learning for the specific class, based on the region of interest extracted by the region-of-interest extraction unit and the feature portion stored in the feature portion storage unit.

In this image estimation program for insufficient learning, the insufficient-learning image estimation unit may calculate the degree of progress of the machine learning of the neural network based on the degree of coincidence between the region of interest extracted by the region-of-interest extraction unit and the feature portion stored in the feature portion storage unit, and may estimate the content and quantity of learning images required to complete the machine learning based on this degree of progress, the extracted region of interest, and the stored feature portion.

With the learning server according to the first aspect of the present invention and the image estimation program for insufficient learning according to the third aspect, the content and quantity of learning images required to complete the machine learning of the neural network for recognizing a specific object can be estimated based on the feature portion for discriminating the specific object and the region of interest in the input image on which the neural network is currently focusing when recognizing that object. The user can thus be informed of the content and quantity of learning images required to complete the machine learning, so even a user without knowledge of deep learning can easily create the required learning images. This avoids the situation in which an AI engineer with knowledge of deep learning is needed every time the neural network is retrained.

With the image collection support system for insufficient learning according to the second aspect of the present invention, the learning server estimates the content and quantity of learning images required to complete the machine learning of the neural network for recognizing a specific object, based on the feature portion designated by the user for discriminating the specific object and the region of interest in the input image on which the neural network is currently focusing when recognizing that object, and transmits the estimated content and quantity to the information processing terminal as insufficient-learning image information. Based on the insufficient-learning image information received from the learning server, the information processing terminal then presents to the user the content and quantity of learning images required to complete the machine learning of the neural network for recognizing the specific object (that is, it informs the user holding the information processing terminal of the content and quantity of learning images required to complete the machine learning). Even a user without knowledge of deep learning can therefore easily create the required learning images based on the content and quantity presented by the information processing terminal. This avoids the situation in which an AI engineer with knowledge of deep learning is needed every time the neural network is retrained.

FIG. 1 is a schematic block diagram of an image collection support system for insufficient learning that includes a learning server implementing an image estimation program for insufficient learning according to an embodiment of the present invention.
FIG. 2 is a functional block diagram of the learning server and the smartphone.
FIG. 3 is a flowchart of the insufficient-learning image estimation process in the learning server.
FIG. 4 is an explanatory diagram of the schematic configuration of the object detection engine and the detailed processing of the region-of-interest extraction unit in the learning server.
A front view of an example of the existing product tag, for the case where the image classifier of the object detection engine is retrained to recognize product tags.
A front view of an example of the new product tag, for the case where the image classifier of the object detection engine is retrained to recognize product tags.
An explanatory diagram of an example of the feature portion instruction input operation performed by the user on the smartphone.
A diagram showing an example of the content and quantity of learning images required to complete the machine learning, as displayed on the display of the smartphone.

A learning server, an image collection support system for insufficient learning, and an image estimation program for insufficient learning according to an embodiment of the present invention are described below with reference to the drawings. FIG. 1 is a block diagram showing the schematic internal configuration of the learning server 1 (the "learning server" and the "computer" in the claims) and the smartphone 2 (the "information processing terminal" in the claims), which constitute the image collection support system 10 for insufficient learning according to this embodiment. The learning server 1 in the figure includes a CPU 11 (the "machine learning unit", "region-of-interest extraction unit", and "insufficient-learning image estimation unit" in the claims) that controls the entire device and performs various computations. The learning server 1 also has a communication unit 12 (the "image acquisition unit", "image receiving unit", and "server-side transmission unit" in the claims) and is connected to the smartphone 2 via the communication unit 12 and a network (for example, the Internet). The communication unit 12 includes a communication IC.

The learning server 1 further includes a hard disk 13 that stores various programs and data, and a RAM 14 into which the programs to be executed and their data are loaded at execution time. The hard disk 13 stores an object detection engine 16, a feature portion DB 18 (the "feature portion storage unit" in the claims), and an image estimation program 19 for insufficient learning.

The object detection engine 16 is, for example, an object detection engine based on R-CNN (Regions with Convolutional Neural Network features). The object detection engine 16 has parameter data 17 such as weights and biases. In this specification, an "engine" means a kind of program that performs various information processing using an information processing device.

The feature portion DB 18 stores feature portions (in images) for discriminating a specific object, transmitted from the smartphone 2. More specifically, the feature portion DB 18 stores, for each class into which the image classifier included in the object detection engine 16 classifies, the portion (of the image) that the user regards as the feature portion. The image estimation program 19 for insufficient learning is a program for estimating the content and quantity of learning images required for the image classifier included in the object detection engine 16 to complete the machine learning for a specific class (corresponding to the specific object).

The smartphone 2, for its part, includes a CPU 21 that controls the entire device and performs various computations, and a communication unit 22 (the "terminal-side transmission unit" in the claims). The communication unit 22 includes a communication IC and an antenna. The smartphone 2 is connected to the learning server 1 via the communication unit 22 and the network.

The smartphone 2 also includes a memory 23 that stores various data and programs. The programs stored in the memory 23 include a required-learning-image presentation program 24, which is described in detail later.

The smartphone 2 further includes a camera 27 (the "photographing unit" in the claims), a display 28, an operation button 29, a microphone 30, a speaker 31, and a secondary battery 32. The camera 27 is used to capture input images (for the object detection engine 16), including the learning images used for the machine learning of the image classifier in the object detection engine 16.

The display 28 is a touch-panel display and is used when the user designates, in the input image, the feature portion for discriminating the specific object. The display 28 therefore corresponds to the "operation unit" in the claims. The display 28 is also used to display (present) the content and quantity of learning images required to complete the machine learning of the image classifier, as described later. The operation button 29 is used by the user to input instructions such as power on/off. Instead of the touch-panel display 28, the operation button 29 may be used to designate the feature portion; alternatively, a voice instruction program may be stored in the memory 23, and the feature portion may be designated by the user's voice using this program and the microphone 30. The secondary battery 32 is a rechargeable battery, such as a lithium-ion battery, that can be used repeatedly, and supplies power to each part of the smartphone 2.

FIG. 2 shows the functional blocks of the learning server 1 and of the smartphone 2. The functions of the blocks in the CPU 11 of the learning server 1 (machine learning unit 43, region-of-interest extraction unit 44, and insufficient-learning image estimation unit 45) are realized by the CPU 11 executing the image estimation program 19 for insufficient learning. The functions of the blocks in the CPU 21 of the smartphone 2 (learning image acquisition unit 41, feature portion registration unit 46, and required-learning-image presentation unit 47) are realized by the CPU 21 executing the required-learning-image presentation program 24. The configuration is not limited to this; for example, at least one of the functions of the blocks in the CPU 11 and the CPU 21 may be realized by dedicated hardware such as an ASIC (Application Specific Integrated Circuit). The image receiving unit 42 in FIG. 2 corresponds to the "image receiving unit" and the "image acquisition unit" in the claims.

Next, the insufficient-learning image presentation process performed in the image collection support system 10 for insufficient learning is described with reference to the flowchart of FIG. 3 in addition to FIG. 2. FIG. 3 is a flowchart of the insufficient-learning image estimation process performed by the learning server 1.

When the user captures input images such as learning images with the camera 27 shown in FIG. 2, the learning image acquisition unit 41 of the smartphone 2 acquires the input images from the camera 27 and transmits them to the learning server 1 through the communication unit 22 (see FIG. 1). The input images include learning images (training data, also called teacher data) and test images (test data) in which the specific object appears. One of the learning images may, however, be used as a test image. The description below covers the case where learning images and test images are kept separate.

The image receiving unit 42 of the learning server 1 (corresponding to the communication unit 12) receives the input image transmitted by the communication unit 22 of the smartphone 2 (S1 in FIG. 3). When the received input image is a learning image (YES in S2 of FIG. 3), the machine learning unit 43 of the learning server 1 performs machine learning of the image classifier (see FIG. 4), that is, of its parameter data 17, based on the received learning image (S3 in FIG. 3).
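To make "machine learning of the parameter data 17 (weights and biases)" concrete, the sketch below performs one gradient-descent update on a single learning image. It is a deliberately simplified stand-in: the patent's classifier is a CNN, whereas this example uses a linear softmax classifier on a feature vector, and the function names and the learning rate are assumptions chosen for illustration.

```python
import numpy as np


def softmax(logits: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over a 1-D array of class scores."""
    shifted = np.exp(logits - logits.max())
    return shifted / shifted.sum()


def train_step(weights: np.ndarray, biases: np.ndarray,
               features: np.ndarray, label: int,
               learning_rate: float = 0.1) -> float:
    """Update weights and biases in place for one labeled image; return the loss before the update."""
    probs = softmax(weights @ features + biases)
    loss = -np.log(probs[label])            # cross-entropy for the true class
    grad_logits = probs.copy()
    grad_logits[label] -= 1.0               # gradient of the loss w.r.t. the logits
    weights -= learning_rate * np.outer(grad_logits, features)
    biases -= learning_rate * grad_logits
    return float(loss)
```

Calling `train_step` repeatedly on the same image should make the returned loss shrink, which is the behavior step S3 relies on when new learning images arrive.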

In contrast, when the input image captured by the user with the camera 27 is a test image, the user uses the touch-panel display 28 to perform an instruction input operation designating the feature portion of the test image for discriminating the specific object. In other words, the user gives an instruction to register the portion (region) of the test image that the user regards as the feature portion for recognizing the specific object. In response to this instruction, the feature portion registration unit 46 of the smartphone 2 registers the designated feature portion in the feature portion DB 18 on the learning server 1. Instead of designating the feature portion by touching the display 28, the feature portion registered in the feature portion DB 18 may be set by default to the central portion of the image, and the user may designate the feature portion by capturing the image so that the portion (region) the user regards as the feature portion lies at the center of the image.
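The registration step, including the default of using the central portion of the image, might look like the following sketch. The class `FeaturePortionDB`, the helper `default_center_box`, and the choice of a centered box covering half of each dimension are assumptions for illustration; the patent states only that the default is the central portion of the image.

```python
from typing import Dict, Optional, Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height) in pixels


def default_center_box(image_width: int, image_height: int,
                       fraction: float = 0.5) -> Box:
    """Return a centered box covering `fraction` of each image dimension (assumed default)."""
    width = int(image_width * fraction)
    height = int(image_height * fraction)
    return ((image_width - width) // 2, (image_height - height) // 2, width, height)


class FeaturePortionDB:
    """In-memory stand-in for the feature portion DB: one feature box per class."""

    def __init__(self) -> None:
        self._boxes: Dict[str, Box] = {}

    def register(self, class_name: str, image_size: Tuple[int, int],
                 box: Optional[Box] = None) -> Box:
        """Store the user-designated box, or the central default when none is given."""
        if box is None:
            box = default_center_box(*image_size)
        self._boxes[class_name] = box
        return box

    def lookup(self, class_name: str) -> Optional[Box]:
        """Return the registered feature box for a class, if any."""
        return self._boxes.get(class_name)
```

For a 640 x 480 test image with no touch input, `register("product_tag", (640, 480))` stores the box `(160, 120, 320, 240)`.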

When the input image received by the image receiving unit 42 is a test image (an image in which the specific object appears) (NO in S2 of FIG. 3), the CPU 11 of the learning server 1 skips the machine learning process of the machine learning unit 43 and instead runs the region-of-interest extraction unit 44. Using a technique such as Grad-CAM (Gradient-weighted Class Activation Mapping), the region-of-interest extraction unit 44 extracts, for the test image, the region of interest on which the image classifier is currently focusing when recognizing the specific object (that is, when classifying into the specific class) (S4 in FIG. 3). In other words, it determines which part of the test image the CNN-based image classifier attends to when classifying into that specific class.

Then the insufficient-learning image estimation unit 45 of the learning server 1 estimates the content and quantity of learning images required to complete the machine learning of the image classifier for recognizing the specific object, based on the region of interest extracted by the region-of-interest extraction unit 44 and the feature portion stored (registered) in the feature portion DB 18. More specifically, the insufficient-learning image estimation unit 45 calculates the progress rate of machine learning for the specific class of the image classifier (corresponding to the "degree of progress" in the claims) from the degree of match between the region of interest extracted for the specific class and the feature portion (region) registered for that class in the feature portion DB 18 (S5 in FIG. 3). Based on this progress rate, the extracted region of interest, and the stored feature portion, it then estimates the content and quantity of learning images required to complete the machine learning for the specific class, that is, what kind of learning images still need to be collected and how many (S6 in FIG. 3). The communication unit 12 of the learning server 1 transmits the estimated content and quantity of the required learning images to the smartphone 2 as insufficient-learning image information (S7 in FIG. 3).
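As one way to picture the degree of match used in S5, the sketch below computes the intersection over union (IoU) of two rectangular regions. Treating both regions as axis-aligned boxes and using IoU as the measure are assumptions made for illustration; the description does not state which measure of match is used or how it maps to the progress rate.

```python
def box_area(box):
    """Area of a box given as (left, top, right, bottom); empty boxes give 0."""
    left, top, right, bottom = box
    return max(0, right - left) * max(0, bottom - top)


def match_degree(attention_box, feature_box):
    """Intersection over union of the region of interest and the feature portion."""
    inter_box = (
        max(attention_box[0], feature_box[0]),
        max(attention_box[1], feature_box[1]),
        min(attention_box[2], feature_box[2]),
        min(attention_box[3], feature_box[3]),
    )
    inter = box_area(inter_box)
    union = box_area(attention_box) + box_area(feature_box) - inter
    return inter / union if union > 0 else 0.0
```

Under this assumption, a value near 1 would mean the classifier attends to the registered feature portion, and a value of 0 would mean the two regions do not overlap at all.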

Based on the insufficient-learning image information received from the learning server 1, the required learning image presentation unit 47 of the smartphone 2 presents, using the display 28 or the like, the content and quantity of learning images required to complete the machine learning of the image classifier for recognizing the specific object (that is, the machine learning for the specific class of the image classifier).

When the region of interest extracted by the region-of-interest extraction unit 44 overlaps the feature portion stored in the feature portion DB 18, the insufficient-learning image estimation unit 45 of the learning server 1 transmits to the smartphone 2, as the insufficient-learning image information, information indicating that no further learning images are required for that class. The required learning image presentation unit 47 of the smartphone 2 then presents, using the display 28 or the like, a notice that no learning images are required for that class.

Next, with reference to FIG. 4, the general configuration of the object detection engine 16 and an example of the detailed processing performed when the region-of-interest extraction unit 44 uses Grad-CAM are described. The R-CNN-based object detection engine 16 includes a candidate region extraction unit 62 and an image classifier 63 (the "neural network" in the claims) implemented as a CNN (Convolutional Neural Network). The candidate region extraction unit 62 finds (extracts) object-like regions in the input image 61. The image classifier 63 applies the CNN to each region extracted by the candidate region extraction unit 62 and classifies the image of that region into a class. The image classifier 63 includes a feature extraction unit 64 and an identification unit 65.

The feature extraction unit 64 extracts CNN features from the image of each region extracted by the candidate region extraction unit 62. It includes a Convolution layer 64a, a ReLU layer 64b, and a Pooling layer 64c, although it may consist of only the Convolution layer 64a and the ReLU layer 64b. FIG. 4 shows a single simplified set of Convolution layer 64a, ReLU layer 64b, and Pooling layer 64c, but the feature extraction unit 64 actually contains many such sets (many layers). The Convolution layer 64a performs a convolution operation on the image of the extracted region, the ReLU layer 64b applies an activation to the convolution result, and the Pooling layer 64c reduces the vertical and horizontal spatial size of the activated output of the ReLU layer 64b. For each value in the feature map output by the Convolution layer 64a, the ReLU layer 64b outputs 0 if the value is 0 or less and outputs the value unchanged if it is greater than 0.
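The behavior of the ReLU layer 64b and the Pooling layer 64c can be illustrated with a small sketch on a feature map held as a list of rows. The function names are illustrative, and the choice of 2 x 2 max pooling is an assumption: the description only says that pooling reduces the vertical and horizontal size.

```python
def relu(feature_map):
    """Replace every value that is 0 or less with 0; keep positive values."""
    return [[v if v > 0 else 0.0 for v in row] for row in feature_map]


def max_pool_2x2(feature_map):
    """Halve height and width by keeping the maximum of each 2 x 2 block.

    Max pooling is assumed here; a trailing odd row or column is dropped.
    """
    h, w = len(feature_map), len(feature_map[0])
    return [
        [
            max(feature_map[i][j], feature_map[i][j + 1],
                feature_map[i + 1][j], feature_map[i + 1][j + 1])
            for j in range(0, w - 1, 2)
        ]
        for i in range(0, h - 1, 2)
    ]


conv_output = [
    [-1, 2, 0, -3],
    [4, -5, 1, 0],
    [0, 0, -2, -2],
    [3, -1, -4, 5],
]
pooled = max_pool_2x2(relu(conv_output))
```

The 4 x 4 map shrinks to 2 x 2, and negative responses never reach the next layer.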

The identification unit 65 classifies the image of each region extracted by the candidate region extraction unit 62 based on the CNN features output by the feature extraction unit 64. It is a fully connected multilayer neural network that includes at least an Affine layer 65a and a Softmax layer 65b. For the image of each extracted region, the identification unit 65 calculates, for every class into which the image classifier 63 can classify, a probability score expressing how likely the image is to belong to that class, and selects the class with the highest probability score as the classification result. The Softmax layer 65b converts the per-class scores output by the immediately preceding Affine layer 65a into probability scores. The identification unit 65 also performs supervised learning based on the error between these probability scores and the teacher label (class label) of each learning image.
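The conversion performed by the Softmax layer 65b and the selection of the highest-scoring class can be sketched as follows. The function names and the example class names are hypothetical; only the softmax formula itself is standard.

```python
import math


def softmax(scores):
    """Convert raw per-class scores into probabilities that sum to 1."""
    peak = max(scores)  # subtracting the maximum avoids overflow in exp()
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify(scores, class_names):
    """Return the class with the highest probability score and all scores."""
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    return class_names[best], probs


# Hypothetical raw scores from the Affine layer for three classes.
label, probs = classify([2.0, 0.5, -1.0], ["product_tag", "poster", "other"])
```

Because softmax preserves the ordering of the raw scores, the selected class is the same whether the raw score or the probability score is compared.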

Next, an example of the detailed processing performed when the region-of-interest extraction unit 44 uses Grad-CAM, shown in the lower part of FIG. 4, is described. In the figure, y^c denotes the probability score of class c output by the Softmax layer 65b; alternatively, y^c may be the raw score of class c before the Softmax layer 65b converts it into a probability score. α^c_k is the weight (coefficient) of the k-th filter (of the Convolution layer 64a) for class c. A^k denotes the k-th feature map for class c (the output of the Pooling layer 64c that follows the k-th Convolution layer 64a).

The region-of-interest extraction unit 44 of the CPU 11 calculates the weight α^c_k according to equation (1) below. Specifically, it takes the partial derivative of the probability score y^c of class c with respect to the intensity A^k_ij at pixel (i, j) of the k-th feature map A^k, giving the gradient ∂y^c/∂A^k_ij, repeats this for every pixel, and averages the resulting gradients over all pixels of the k-th feature map A^k to obtain the weight α^c_k. The gradient ∂y^c/∂A^k_ij indicates how strongly pixel (i, j) of the k-th feature map influences the probability score y^c of class c, and the weight α^c_k indicates how strongly the k-th feature map A^k as a whole influences y^c.
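Equation (1) is not reproduced in this text. From the description above (the gradient of y^c with respect to A^k_ij, averaged over all pixels of the k-th feature map), it can be reconstructed in the form used by the published Grad-CAM method. The symbol Z for the number of pixels in the feature map is an assumption borrowed from that formulation.

```latex
\alpha^{c}_{k} = \frac{1}{Z} \sum_{i} \sum_{j} \frac{\partial y^{c}}{\partial A^{k}_{ij}}
```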

Next, using the weight α^c_k of each feature map A^k obtained from equation (1), the region-of-interest extraction unit 44 calculates, according to equation (2) below, the weighted average of the n feature maps for each pixel. It then passes this per-pixel weighted average as the argument x of the activation function ReLU(x) = max{x, 0} and takes the result as the Grad-CAM heat map output value L^c_Grad-CAM.
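Equation (2) is likewise not reproduced in this text. A reconstruction consistent with the description is given below, written as a weighted sum over the n feature maps as in the published Grad-CAM formulation. The text calls the quantity a weighted average, so the patent's own equation may include a normalizing factor that is not shown here.

```latex
L^{c}_{\text{Grad-CAM}} = \mathrm{ReLU}\left( \sum_{k=1}^{n} \alpha^{c}_{k} A^{k} \right)
```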

The activation function ReLU is used here because only the features (pixels) that have a positive influence on the class of interest (class c) matter: the pixels needed for the heat map are those for which an increase in the value at pixel (i, j) increases the probability score y^c of class c. Among the per-pixel output values of the ReLU (the Grad-CAM heat map output values), the region-of-interest extraction unit 44 extracts as the region of interest 68 the region formed by the pixels whose output value is equal to or greater than a predetermined value.
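The heat map computation and the thresholding step described above can be sketched as follows, assuming the feature maps A^k and the gradients ∂y^c/∂A^k_ij are already available as lists of equally sized 2-D maps. The function names and the `threshold` parameter are illustrative, and the weighted sum follows the published Grad-CAM form rather than a confirmed reading of the patent's equations.

```python
def grad_cam_heatmap(feature_maps, gradients):
    """Compute per-map weights and the ReLU-rectified heat map.

    feature_maps, gradients: lists of n maps, each an H x W list of floats.
    """
    n = len(feature_maps)
    h, w = len(feature_maps[0]), len(feature_maps[0][0])
    # Weight of each map: mean gradient over all of its pixels.
    alphas = [sum(sum(row) for row in g) / (h * w) for g in gradients]
    # Per-pixel weighted combination of the maps, followed by ReLU.
    heat = [
        [
            max(sum(alphas[k] * feature_maps[k][i][j] for k in range(n)), 0.0)
            for j in range(w)
        ]
        for i in range(h)
    ]
    return alphas, heat


def extract_region(heat, threshold):
    """Mark pixels whose heat map value is at or above the threshold."""
    return [[value >= threshold for value in row] for row in heat]


# Two toy 2 x 2 feature maps: the first helps class c, the second hurts it.
maps = [[[1.0, 0.0], [0.0, 0.0]], [[0.0, 0.0], [0.0, 1.0]]]
grads = [[[1.0, 1.0], [1.0, 1.0]], [[-1.0, -1.0], [-1.0, -1.0]]]
alphas, heat = grad_cam_heatmap(maps, grads)
region = extract_region(heat, threshold=0.5)
```

In this toy case only the top-left pixel survives: the second map has a negative weight, so the pixel it activates is clipped to zero by the ReLU and excluded from the region of interest.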

Next, taking as an example the re-learning of the object detection engine 16 when it is used to detect product tags, the process by which the insufficient-learning image collection support system 10 presents the learning images required to complete the machine learning of the object detection engine 16 is described.

For example, suppose that the product tags on the shelves of a retail store have so far all been product tags 71 in the format shown in FIG. 5, and that product tags 72 in the new format shown in FIG. 6 are newly added. The existing product tag 71 shown in FIG. 5 carries a product name 71a, a price 71b, and a barcode 71c; the new product tag 72 shown in FIG. 6 carries a product name 72a, a price 72b, and a barcode 72c, plus a big sale display 72d. In this case, the image classifier 63 of the object detection engine 16 may need to be re-trained so that the new-format product tag 72 is also recognized as a product tag. Ideally, the image classifier 63 should focus on the regions of the barcodes 71c and 72c, whose visual features are common to the existing product tag 71 and the new product tag 72. (The classifier may also attend to the prices 71b and 72b, but these numbers generally differ from tag to tag, so apart from the "yen" mark they often do not become a region of interest. A barcode, by contrast, always appears as vertical lines distributed within a rectangle, whatever number it encodes, and so readily becomes a region of interest.) Suppose, however, that partway through this re-learning the big sale display 72d on the new product tag 72 happened to resemble features of another class (a class other than the one corresponding to product tags), so that the region on which the image classifier 63 is currently focusing (when recognizing the specific class corresponding to product tags) is the region of the big sale display 72d on the new product tag 72.

In this situation, suppose that a store clerk (the user), wishing to check the (re-)learning status of the object detection engine 16, photographs the new-format product tag 72 shown in FIG. 6 with the camera 27 as a test image and then indicates the feature portion of that test image by which a product tag is to be identified. Specifically, the clerk touches the touch-panel display 28 of the smartphone 2 and uses the feature portion indication frame 81 shown in FIG. 7 to enclose the part (region) of the test image that the clerk regards as the feature portion for recognizing a product tag. In response, the feature portion registration unit 46 of the smartphone 2 registers the indicated feature portion (the region inside the feature portion indication frame 81) in the feature portion DB 18 on the learning server 1 side. Here, the clerk is assumed to have registered the region of the barcode 72c on the product tag 72 as the feature portion for identifying a product tag.

When registration of the feature portion is complete, the region-of-interest extraction unit 44 of the learning server 1 extracts, for the test image received from the smartphone 2, the region of interest on which the image classifier 63 is currently focusing when recognizing a product tag. As described above, the region on which the image classifier 63 is currently focusing is the region of the big sale display 72d, whereas the feature portion indicated by the clerk is the region of the barcode 72c, so the progress rate of machine learning calculated in S5 of FIG. 3 is low.

Based on this progress rate, the region of interest extracted by the region-of-interest extraction unit 44, and the feature portion stored in the feature portion DB 18, the insufficient-learning image estimation unit 45 of the learning server 1 estimates the content and quantity of learning images required to complete the re-learning (for product tag recognition) of the image classifier 63 of the object detection engine 16. In this example, the progress rate is considered to be low because the big sale display 72d on the new product tag 72 resembled features of another class (a class other than the one corresponding to product tags). The insufficient-learning image estimation unit 45 therefore estimates that the learning images required to complete the machine learning for the product tag class are images showing product tags that include the big sale display 72d (new product tags 72), and estimates how many more such images are needed. The communication unit 12 of the learning server 1 transmits the content (type) and quantity of the required learning images to the smartphone 2 as insufficient-learning image information.

The insufficient-learning image estimation unit 45 can be implemented as a regression-model neural network. This network is trained in advance with the region of interest and the feature portion as input and the number of learning images required in that situation as output. Trained on a sufficient amount of training data (input-output pairs), it can regress (predict) the required number of learning images for a previously unseen combination of region of interest and feature portion. As the content of the required learning images, the image obtained by ANDing the region of interest with the test image can be used.
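The AND of the region of interest with the test image can be read as masking: pixels inside the region are kept and all others are cleared. The sketch below assumes a grayscale image and a boolean mask of the same size, both held as lists of rows; the function name `mask_image` and this data layout are illustrative, not taken from the description.

```python
def mask_image(image, region):
    """Keep pixels where `region` is True and set all other pixels to 0.

    image:  H x W list of pixel values (grayscale assumed).
    region: H x W list of booleans marking the region of interest.
    """
    return [
        [pixel if keep else 0 for pixel, keep in zip(image_row, region_row)]
        for image_row, region_row in zip(image, region)
    ]


test_image = [[10, 20], [30, 40]]
attention = [[True, False], [False, True]]
content_preview = mask_image(test_image, attention)
```

The resulting image shows only what the classifier is currently attending to, which is the kind of content the user would be asked to photograph more of.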

Based on the insufficient-learning image information received from the learning server 1, the required learning image presentation unit 47 of the smartphone 2 displays (presents) on the display 28 the content and quantity of learning images required to complete the machine learning (re-learning) of the image classifier for recognizing product tags. In this example, the content and quantity of learning images (the guidance) displayed on the display 28 are as shown in FIG. 8. By following the guidance in FIG. 8 and capturing 50 learning images that include the big sale display 72d with the camera 27, the clerk can complete the machine learning (re-learning) of the image classifier for recognizing product tags. In this way, a clerk on the shop floor with no knowledge of deep learning can easily create the learning images required to complete the machine learning, based on the guidance presented by the smartphone 2, and can thereby easily correct the course of the machine learning so far.

As described above, the learning server 1 running the insufficient-learning image estimation program 19 of this embodiment can estimate the content and quantity of learning images required to complete the machine learning of the image classifier 63 (the "neural network" in the claims) for recognizing a specific object such as a product tag, based on the feature portion by which the specific object is identified and the region of interest in the input image (test image) on which the image classifier 63 is currently focusing when recognizing that object. The user can thus be told the content and quantity of learning images required to complete the machine learning, so even a user with no knowledge of deep learning can easily create those images. This avoids the situation in which an AI engineer with deep learning expertise is needed every time the image classifier 63 is re-trained.

In addition, in the learning server 1 running the insufficient-learning image estimation program 19 of this embodiment, the insufficient-learning image estimation unit 45 of the CPU 11 calculates the degree of progress of the machine learning of the image classifier 63 from the degree of match between the region of interest extracted by the region-of-interest extraction unit 44 and the feature portion stored in the feature portion DB 18, and estimates the content and quantity of learning images required to complete the machine learning of the image classifier 63 based on this degree of progress, the extracted region of interest, and the stored feature portion. Calculating the degree of progress from this degree of match allows it to be determined accurately, and using this accurate degree of progress together with the extracted region of interest and the stored feature portion in turn allows the required content and quantity of learning images to be estimated accurately.

Furthermore, in the insufficient-learning image collection support system 10 of this embodiment, the learning server 1 estimates the content and quantity of learning images required to complete the machine learning of the image classifier 63 for recognizing a specific object (such as a product tag), based on the feature portion indicated by the user for identifying that object and the region of interest in the input image (test image) on which the image classifier 63 is currently focusing when recognizing it, and transmits the estimated content and quantity to the smartphone 2 as insufficient-learning image information. Based on the received insufficient-learning image information, the smartphone 2 presents to the user holding it the content and quantity of learning images required to complete the machine learning of the image classifier 63. Even a user with no knowledge of deep learning can therefore easily create the required learning images from what the smartphone 2 presents, which avoids the situation in which an AI engineer with deep learning expertise is needed every time the image classifier 63 is re-trained.

Modifications:
The present invention is not limited to the configurations of the embodiments described above, and various modifications are possible without departing from the gist of the invention. Modifications of the present invention are described next.

Modification 1:
In the above embodiment, the required learning image presentation unit 47 of the smartphone 2 presents the content and quantity of learning images required to complete the machine learning of the image classifier by displaying them on the display 28. They may instead be presented to the user by voice guidance through a speaker.

Modification 2:
In the above embodiment, the learning server 1 receives from the smartphone 2 the images that the user (store clerk) captured with the camera 27 of the smartphone 2 and uses them as learning images and test images. The invention is not limited to this; for example, images transmitted from another server may be used as learning images and test images. The learning server may also obtain learning images and test images by reading them from removable media such as a USB memory. That is, the image acquisition unit in the claims is not limited to a communication device (corresponding to the communication unit 12 in FIG. 1) that acquires input images such as learning images from an information processing terminal like the smartphone 2 of the above embodiment; it may be, for example, an input port for acquiring (inputting) such images from removable media.

Modification 3:
In the above embodiment, the learning server 1 uses the object detection engine 16, which includes the image classifier 63. When the learning server 1 only recognizes an object (for example, a product tag) without detecting it, the image classifier may be used on its own instead of the object detection engine.

Modification 4:
The above embodiment uses the R-CNN-based object detection engine 16, but a Faster R-CNN-based object detection engine may be used instead. With a Faster R-CNN-based engine, a single CNN can perform not only the object recognition process (corresponding to the image classification performed by the image classifier 63 in FIG. 4) but also the extraction of object candidate regions in the image (the process performed by the candidate region extraction unit 62 in FIG. 4).

Modification 5:
In the above embodiment, an example is shown in which the information processing terminal of the present invention is the smartphone 2. However, the information processing terminal of the present invention is not limited to this and may be, for example, a tablet computer provided with a camera.

1 Learning server (computer)
2 Smartphone (information processing terminal)
12 Communication unit (image acquisition unit, image receiving unit, server-side transmission unit)
18 Feature portion DB (feature portion storage unit)
19 Insufficient learning image estimation program
22 Communication unit (terminal-side transmission unit)
27 Camera (photographing unit)
28 (Touch-panel type) display (operation unit)
42 Image receiving unit (image acquisition unit, image receiving unit)
43 Machine learning unit
44 Region of interest extraction unit
45 Insufficient learning image estimation unit
47 Necessary learning image presentation unit
61 Input image
63 Image classifier (neural network)
68 Region of interest

Claims

A learning server comprising:
an image acquisition unit that acquires input images including a learning image;
a machine learning unit that performs machine learning of a neural network for recognizing a specific object, based on the learning image acquired by the image acquisition unit;
a region of interest extraction unit that extracts a region of interest in the input image on which the neural network is currently focusing when recognizing the specific object;
a feature portion storage unit that stores a feature portion for discriminating the specific object in the input image; and
an insufficient learning image estimation unit that estimates the content and quantity of learning images required to complete the machine learning, based on the region of interest extracted by the region of interest extraction unit and the feature portion stored in the feature portion storage unit.

The learning server according to claim 1, wherein
the neural network is an image classifier that classifies which class the input image belongs to, the classes classified by the image classifier including a specific class corresponding to the specific object, and
the insufficient learning image estimation unit estimates the content and quantity of learning images required to complete the machine learning for the specific class, based on the region of interest extracted by the region of interest extraction unit and the feature portion stored in the feature portion storage unit.

The learning server according to claim 1 or 2, wherein the insufficient learning image estimation unit calculates a degree of progress of the machine learning of the neural network based on the degree of coincidence between the region of interest extracted by the region of interest extraction unit and the feature portion stored in the feature portion storage unit, and estimates the content and quantity of learning images required to complete the machine learning based on the degree of progress, the region of interest extracted by the region of interest extraction unit, and the feature portion stored in the feature portion storage unit.
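The relationship among degree of coincidence, degree of progress, and the estimated images can be sketched as below. This is one possible reading under stated assumptions, not the claimed method: the region of interest and the feature portion are simplified to rectangles, the degree of coincidence is taken to be intersection over union, and the condition names, the target value, and the linear scaling by images_per_full_gap are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom), right/bottom exclusive
Pair = Tuple[Box, Box]           # (region of interest, feature portion)


def area(box: Box) -> int:
    left, top, right, bottom = box
    return max(0, right - left) * max(0, bottom - top)


def coincidence(roi: Box, feature: Box) -> float:
    """Degree of coincidence, assumed here to be intersection over union (0.0 to 1.0)."""
    overlap = (max(roi[0], feature[0]), max(roi[1], feature[1]),
               min(roi[2], feature[2]), min(roi[3], feature[3]))
    inter = area(overlap)
    union = area(roi) + area(feature) - inter
    return inter / union if union else 0.0


def learning_progress(pairs: List[Pair]) -> float:
    """Degree of progress, assumed to be the mean coincidence over test images."""
    if not pairs:
        return 0.0
    return sum(coincidence(roi, feat) for roi, feat in pairs) / len(pairs)


def estimate_missing_images(pairs_by_condition: Dict[str, List[Pair]],
                            target: float = 0.8,
                            images_per_full_gap: int = 50) -> Dict[str, int]:
    """Map each under-learned condition (content) to a rough image count (quantity).

    The linear scaling is a placeholder assumption, not a rule from the source.
    """
    needed: Dict[str, int] = {}
    for condition, pairs in pairs_by_condition.items():
        gap = max(0.0, target - learning_progress(pairs))
        if gap > 0:
            needed[condition] = math.ceil(gap / target * images_per_full_gap)
    return needed
```

Conditions whose regions of interest already match the stored feature portions drop out of the result, so only the image types that are still lacking would be reported.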

An insufficient learning image collection support system comprising an information processing terminal and a learning server connected to the information processing terminal via a network, wherein
the information processing terminal comprises:
a photographing unit that captures input images including a learning image;
an operation unit for performing an operation of designating a feature portion for discriminating a specific object in the input image captured by the photographing unit;
a terminal-side transmission unit that transmits, to the learning server, the feature portion designated by the user with the operation unit and the input images including the learning image captured by the photographing unit; and
a necessary learning image presentation unit that presents to the user, based on insufficient learning image information received from the learning server, the content and quantity of learning images required to complete machine learning of a neural network for recognizing the specific object, and
the learning server comprises:
an image receiving unit that receives the input images including the learning image transmitted by the terminal-side transmission unit;
a machine learning unit that performs machine learning of the neural network based on the learning image received by the image receiving unit;
a region of interest extraction unit that extracts a region of interest in the input image on which the neural network is currently focusing when recognizing the specific object;
a feature portion storage unit that stores the feature portion transmitted by the terminal-side transmission unit;
an insufficient learning image estimation unit that estimates the content and quantity of learning images required to complete the machine learning, based on the region of interest extracted by the region of interest extraction unit and the feature portion stored in the feature portion storage unit; and
a server-side transmission unit that transmits, to the information processing terminal, the content and quantity of the necessary learning images estimated by the insufficient learning image estimation unit as the insufficient learning image information.
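The information exchanged from the learning server to the terminal can be pictured as a small structured message. The following sketch is an assumption made for illustration: the source does not specify a wire format, and the JSON encoding, the class names RequiredImages and InsufficientLearningImageInfo, and the function presentation_lines are all hypothetical.

```python
import json
from dataclasses import asdict, dataclass
from typing import List


@dataclass
class RequiredImages:
    content: str   # what kind of learning image is still needed
    quantity: int  # how many such images are estimated to be needed


@dataclass
class InsufficientLearningImageInfo:
    target_object: str
    required: List[RequiredImages]

    def to_json(self) -> str:
        """Server side: serialize the estimate for transmission to the terminal."""
        return json.dumps(asdict(self), ensure_ascii=False)

    @classmethod
    def from_json(cls, text: str) -> "InsufficientLearningImageInfo":
        """Terminal side: rebuild the estimate from the received message."""
        raw = json.loads(text)
        return cls(raw["target_object"],
                   [RequiredImages(**item) for item in raw["required"]])


def presentation_lines(info: InsufficientLearningImageInfo) -> List[str]:
    """Terminal side: text that a presentation unit might show to the user."""
    return [f"{info.target_object}: {item.quantity} more image(s) of {item.content}"
            for item in info.required]
```

Keeping content and quantity as separate fields mirrors the two things the claim says are estimated and presented.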

The insufficient learning image collection support system according to claim 4, wherein
the neural network is an image classifier that classifies which class the input image belongs to, the classes classified by the image classifier including a specific class corresponding to the specific object, and
the insufficient learning image estimation unit estimates the content and quantity of learning images required to complete the machine learning for the specific class, based on the region of interest extracted by the region of interest extraction unit and the feature portion stored in the feature portion storage unit.

The insufficient learning image collection support system according to claim 4 or 5, wherein the insufficient learning image estimation unit calculates a degree of progress of the machine learning of the neural network based on the degree of coincidence between the region of interest extracted by the region of interest extraction unit and the feature portion stored in the feature portion storage unit, and estimates the content and quantity of learning images required to complete the machine learning based on the degree of progress, the region of interest extracted by the region of interest extraction unit, and the feature portion stored in the feature portion storage unit.

An insufficient learning image estimation program for causing a computer to function as:
an image acquisition unit that acquires input images including a learning image;
a region of interest extraction unit that extracts a region of interest in the input image on which a neural network for recognizing a specific object is currently focusing when recognizing the specific object;
a feature portion storage unit that stores a feature portion for discriminating the specific object in the input image; and
an insufficient learning image estimation unit that estimates the content and quantity of learning images required to complete machine learning of the neural network for recognizing the specific object, based on the region of interest extracted by the region of interest extraction unit and the feature portion stored in the feature portion storage unit.

The insufficient learning image estimation program according to claim 7, wherein
the neural network is an image classifier that classifies which class the input image belongs to, the classes classified by the image classifier including a specific class corresponding to the specific object, and
the insufficient learning image estimation unit estimates the content and quantity of learning images required to complete the machine learning for the specific class, based on the region of interest extracted by the region of interest extraction unit and the feature portion stored in the feature portion storage unit.

The insufficient learning image estimation program according to claim 7 or 8, wherein the insufficient learning image estimation unit calculates a degree of progress of the machine learning of the neural network based on the degree of coincidence between the region of interest extracted by the region of interest extraction unit and the feature portion stored in the feature portion storage unit, and estimates the content and quantity of learning images required to complete the machine learning based on the degree of progress, the region of interest extracted by the region of interest extraction unit, and the feature portion stored in the feature portion storage unit.