JP6964316B1 - Artificial intelligence (AI) estimation system, learning data generator, learning device, fruit thinning object estimation device, learning system, and program

Info

Publication number: JP6964316B1
Application number: JP2021010624A
Authority: JP
Inventors: 浩二佐々木; 和治井上; 葵岩渕
Original assignee: AdIn Research Inc
Current assignee: AdIn Research Inc
Priority date: 2021-01-26
Filing date: 2021-01-26
Publication date: 2021-11-10
Anticipated expiration: 2041-01-26
Also published as: JP2022114352A

Abstract

Problem: To provide an AI that estimates, with high accuracy, the fruit thinning objects in fruit thinning work on agricultural crops.

Solution: An estimation system 501 has a learning data generation device, a learning device, and a fruit thinning object estimation device. The learning data generation device includes: an extraction unit that receives image data showing a crop before fruit thinning (first) and after fruit thinning (second) and extracts the target objects that differ between the two as fruit thinning objects; a generation unit that learns using first learning data including the extraction result and estimates the fruit thinning objects; and a discrimination unit that discriminates the estimation result image data from the generation unit and generates second learning data. The learning device includes a learning unit having a generation unit that generates, from the second learning data, estimation result image data of the fruit thinning objects, and a discrimination unit that feeds a discrimination result, based on comparison with the learning data, back to the generation unit to train the learning model. The fruit thinning object estimation device uses the trained model to estimate the fruit thinning objects from unknown image data showing a crop before fruit thinning.

Selected drawing: FIG. 20

Description

The present invention relates to an AI-based estimation system, a learning data generation device, a learning device, a fruit thinning object estimation device, a learning system, and a program.

Techniques are known in which artificial intelligence (hereinafter "AI") makes estimates based on current conditions or recognizes various objects.

For example, there are robot systems used in factories and the like in which a robot hand is installed on a conveyor. Specifically, the robot system first photographs an object with a camera. The object is then recognized from the captured image, and the robot system calculates the position of the object's center of gravity from that image. Based on the calculated center of gravity, the robot system determines, for example, the exact position at which the robot hand should grip the object. A technique for stably gripping an object with a robot hand in this way is known (see, for example, Patent Document 1).

AI-based object recognition is also used in agriculture. Specifically, a technique is known in which AI automatically determines the number of berries during grape berry thinning (see, for example, Non-Patent Document 1).

Patent Document 1: Japanese Unexamined Patent Application Publication No. 2019-185204

Non-Patent Document 1: Kenichi Shoji, "AI that automatically determines 'how many berries are on the grape bunch?' to be put into practical use next summer: jointly developed by University of Yamanashi and an agricultural production corporation", [online], August 17, 2020, DG Lab Haus, [retrieved December 2, 2020], Internet, <URL: https://media.dglab.com/2020/08/17-grape-01/>

The technique described in Patent Document 1 assumes a lighting environment such as that inside a factory. Compared with an environment under natural light, such as outdoors, a factory interior usually offers stable lighting conditions for photographing and for processing such as image recognition. A technique that assumes factory lighting is therefore difficult to apply to the lighting environments in which crops are handled.

In addition, with a technique such as that described in Non-Patent Document 1, sufficient learning data must be secured to train the AI. In particular, a large amount of learning data is desirable to make the AI highly accurate. It is therefore difficult with a technique such as that of Non-Patent Document 1 to estimate fruit thinning objects with high accuracy by AI.

An object of the present invention is to estimate, with high accuracy by AI, the fruit thinning objects in fruit thinning work on agricultural crops.

To solve the above problem, one aspect of the present invention is an estimation system having a learning data generation device that generates learning data, a learning device that has a generation unit and a discrimination unit and trains a learning model using the learning data, and a fruit thinning object estimation device that uses the trained model trained by the learning device, wherein
the learning data generation device includes:
an image data input unit that inputs first input image data, which is image data showing a crop before fruit thinning, and second input image data, which is image data showing the crop after fruit thinning;
an extraction unit that extracts, as fruit thinning objects, those target objects (fruits, flowers, leaves, or a combination of these, of the crop) that differ between the first input image data and the second input image data;
a generation unit that learns a set of the first input image data and the extraction result of the extraction unit as first learning data, and generates estimation result image data showing the result of estimating the fruit thinning objects; and
a discrimination unit that discriminates the estimation result image data and generates second learning data based on the discrimination result,
the learning device includes:
a learning data input unit that inputs the second learning data;
the generation unit, which generates, from the second learning data, estimation result image data showing the result of estimating the fruit thinning objects; and
the discrimination unit, which discriminates the estimation result image data by comparison with the learning data and feeds the discrimination result back to the generation unit to train the learning model,
and the fruit thinning object estimation device includes:
an image data input unit that inputs unknown image data showing an unknown crop before fruit thinning;
an estimation unit that estimates the fruit thinning objects using the trained model; and
an output unit that outputs the estimation result of the estimation unit.

According to the present invention, the fruit thinning objects in fruit thinning work on agricultural crops can be estimated with high accuracy by AI.

FIG. 1 is a diagram showing an example of the overall configuration of a learning data generation device for AI.
FIG. 2 is a diagram showing an example of the hardware configuration of an information processing device.
FIG. 3 is a diagram showing an example of the overall processing of the first embodiment.
FIG. 4 is a diagram showing a configuration example of a generative adversarial network.
FIG. 5 is a diagram showing an example of a photographing method.
FIG. 6 is a diagram showing an example of the overall processing of the second embodiment.
FIG. 7 is a diagram showing an example of extraction processing.
FIG. 8 is a diagram showing a processing example of instance segmentation and an example of mask image data.
FIG. 9 is a diagram showing a processing example of illustration.
FIG. 10 is a diagram showing a modification of illustrated image data or mask image data.
FIG. 11 is a diagram showing an example of recognition of target objects.
FIG. 12 is a diagram showing an example of the result of the overall processing.
FIG. 13 is a diagram showing a configuration example of a learning device.
FIG. 14 is a diagram showing an example of a configuration in which learning is performed by the learning device.
FIG. 15 is a diagram showing an example of the functional configuration of the learning device.
FIG. 16 is a diagram showing a configuration example of a fruit thinning object estimation device.
FIG. 17 is a diagram showing an example of a configuration in which estimation is performed by the fruit thinning object estimation device.
FIG. 18 is a diagram showing an example of the functional configuration of the fruit thinning object estimation device.
FIG. 19 is a diagram showing an example of the functional configuration of a learning system.
FIG. 20 is a diagram showing an example of the functional configuration of an estimation system.
FIG. 21 is a diagram showing an example of a network structure.

Specific examples are described below with reference to the accompanying drawings. In the following description, identical reference numerals in the drawings denote identical elements.

[First Embodiment]
FIG. 1 is a diagram showing an example of the overall configuration of a learning data generation device for AI. For example, the learning data generation device for AI (hereinafter "learning data generation device 10") is used as follows.

The learning data generation device 10 is, for example, an information processing device such as the one described below.

[Hardware configuration example of information processing device]
FIG. 2 is a diagram showing an example of the hardware configuration of an information processing device. For example, the learning data generation device 10 has a hardware configuration including a Central Processing Unit (CPU, hereinafter "CPU 10H1"), a storage device 10H2, an interface 10H3, an input device 10H4, an output device 10H5, and the like. The learning data generation device 10 desirably also has a Graphics Processing Unit (GPU, hereinafter "GPU 10H6").

The CPU 10H1 is an example of an arithmetic device and a control device. For example, the CPU 10H1 performs computation based on a program, a user operation, or the like.

The storage device 10H2 is a main storage device such as memory. The storage device 10H2 may also include an auxiliary storage device such as a Solid State Drive (SSD) or a hard disk.

The interface 10H3 transmits data to and receives data from external devices via a network, a cable, or the like. For example, the interface 10H3 is a connector, an antenna, or the like.

The input device 10H4 is a device for inputting user operations. For example, the input device 10H4 is a mouse, a keyboard, or the like.

The output device 10H5 is a device that outputs processing results and the like to the user. For example, the output device 10H5 is a display or the like.

The GPU 10H6 is an arithmetic device for image processing. The GPU 10H6 is sometimes called a graphics controller or the like. In particular, the GPU 10H6 is used when image processing is performed in real time, or for parallel computation during learning.

The learning data generation device 10 may have a hardware configuration that further includes hardware resources other than the above, either internally or externally. The learning data generation device 10 may also consist of a plurality of devices.

[About crops, target objects, fruit thinning objects, and fruit thinning work]
The learning data generation device 10 receives image data (hereinafter "first input image data 11D1") obtained by photographing, with a camera 11, a crop before fruit thinning work is performed (hereinafter, a crop in the state before fruit thinning work is called the "first crop 12"). An imaging device such as the camera 11 may be part of the learning data generation device 10.

The learning data generation device 10 further receives image data (hereinafter "second input image data 11D2") obtained by photographing, with the camera 11, the crop after fruit thinning work has been performed (hereinafter, a crop in the state after fruit thinning work is called the "second crop 13").

Hereinafter, the first input image data 11D1 and the second input image data 11D2 may be referred to collectively as simply "input image data".

The first input image data 11D1 and the second input image data 11D2 are moving images, still images, or a combination of these. When the input is a moving image, for example, one frame or a predetermined number of frames are cut out from the frames constituting the moving image and used as the input image data.
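
As a minimal sketch of the frame cut-out described above, the following Python function chooses which frames of a moving image to use as input image data. The source does not state how frames are chosen; the even spacing, the function name `select_frame_indices`, and its parameters are assumptions made for illustration only.

```python
def select_frame_indices(total_frames: int, count: int) -> list[int]:
    """Pick `count` evenly spaced frame indices from a clip of `total_frames` frames.

    Even spacing is an illustrative assumption; the source only says that one
    frame or a predetermined number of frames are cut out.
    """
    if total_frames <= 0 or count <= 0:
        return []
    count = min(count, total_frames)  # cannot pick more frames than exist
    if count == 1:
        return [total_frames // 2]    # a single frame: take the middle of the clip
    # Integer arithmetic keeps the first and last frame and avoids rounding issues.
    return [(i * (total_frames - 1)) // (count - 1) for i in range(count)]
```

The returned indices could then be passed to whatever video decoder is in use to read the corresponding frames.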

Fruit thinning work is the work of thinning out fruits, flowers, leaves, or a combination of these (hereinafter "target objects") that belong to a crop or exist around it. That is, fruit thinning work is berry thinning, fruit thinning, flower thinning, or a combination of these.

Hereinafter, the target objects that are to be thinned out in fruit thinning work are called "fruit thinning objects". In other words, fruit thinning work is the work of selecting some fruit thinning objects from among a plurality of target objects and thinning them out. In the drawings, a fruit thinning object is marked with "×" to show that it has been thinned out. However, any form may be used to distinguish fruit thinning objects from the other target objects.

A worker 14 decides which of the target objects are to be fruit thinning objects.

For example, the fruit thinning objects may differ depending on the purpose, even for the same crop. Purposes include, for example, exposing the whole crop evenly to sunlight, adjusting the taste, keeping the crop reasonably dense, keeping the crop within a predetermined size, and improving the appearance of the crop at harvest (color, shape, fewer damaged target objects, or the overall appearance combining these).

Based on the purpose of the fruit thinning, the worker 14 performs model fruit thinning work on the first crop 12. The worker 14 photographs the crop separately before and after the fruit thinning work. Each of these shots generates input image data.

Input image data is photographed separately for each purpose of fruit thinning, each type of crop, and so on. That is, because the content of the fruit thinning work may differ depending on the purpose, input image data is generated separately according to the purpose, the type of crop, and the like. Since the worker 14 demonstrates model fruit thinning work, the worker 14 is, for example, a skilled farmer.

By comparing the first input image data 11D1 with the second input image data 11D2, the learning data generation device 10 or the like can determine which target objects, at which locations, are treated as fruit thinning objects, and how many are treated as fruit thinning objects.

The crop is, for example, a fruit-bearing crop such as a tomato. The following description takes a tomato as an example. However, the crop is not limited to tomatoes. For example, the crop may be a fruit such as a persimmon, cherry, strawberry, grape, or mandarin orange. The crop may also be a flower, a vegetable, or the like. Even when the crop is a tomato or the like, the fruit thinning objects may include leaves, stems, or the like that exist around the fruit.

The first input image data 11D1 and the second input image data 11D2 photographed as described above are input to the learning data generation device 10. When they are input, the learning data generation device 10 generates image data for learning (hereinafter "learning data 15") through the overall processing. The AI 16 receives the learning data 15 generated in this way and performs learning.

[Overall processing example]
FIG. 3 is a diagram showing an example of the overall processing of the first embodiment.

In step S0301, the worker 14 captures the first input image data 11D1. That is, the worker 14 photographs the first crop 12 before performing the fruit thinning work, generating the first input image data 11D1.

In step S0302, the worker 14 performs the fruit thinning work. Through this work, the fruit thinning objects are removed from the first crop 12, which thereby becomes the second crop 13. Step S0303 is performed after this fruit thinning work.

In step S0303, the worker 14 captures the second input image data 11D2. That is, the worker 14 photographs the second crop 13 after performing the fruit thinning work, generating the second input image data 11D2.

In step S0304, the learning data generation device 10 inputs the first input image data 11D1 and the second input image data 11D2.

In step S0305, the learning data generation device 10 extracts the fruit thinning objects. For example, the learning data generation device 10 compares the first input image data 11D1 with the second input image data 11D2 and, among all the target objects shown in the first input image data 11D1, extracts those that are no longer present in the second input image data 11D2 as fruit thinning objects.
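
The comparison in step S0305 can be sketched as follows. This is only an illustration under stated assumptions, not the extraction method of the source: it assumes that the target objects have already been recognized in both images, that the two images are roughly aligned, and that each object is summarized by a centroid. The function name `extract_thinning_objects` and the `tolerance` parameter are hypothetical.

```python
from math import hypot


def extract_thinning_objects(before, after, tolerance=10.0):
    """Return IDs of objects in `before` that have no counterpart in `after`.

    `before` and `after` map an object ID to its (x, y) centroid in pixels.
    An object counts as still present if an unmatched "after" centroid lies
    within `tolerance` pixels (an assumed matching rule).
    """
    unmatched_after = dict(after)
    removed = []
    for obj_id, (x, y) in before.items():
        # Find the closest still-unmatched object in the "after" image.
        best_id, best_dist = None, tolerance
        for cand_id, (cx, cy) in unmatched_after.items():
            dist = hypot(x - cx, y - cy)
            if dist <= best_dist:
                best_id, best_dist = cand_id, dist
        if best_id is None:
            removed.append(obj_id)        # nothing nearby: the object was thinned out
        else:
            del unmatched_after[best_id]  # each "after" object is matched only once
    return removed
```

For example, with three objects before thinning and two after, the object without a nearby counterpart is reported as a fruit thinning object.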

Accordingly, the extraction result takes a form such as image data indicating the positions of the fruit thinning objects. Specifically, the extraction result is presented by processing the first input image data 11D1 so that the regions of the fruit thinning objects are filled with a predetermined color, hatched, or the like.

The extraction result is not limited to an image data format; it only needs to identify the fruit thinning objects. For example, when target objects are recognized during extraction, an identification number or a coordinate value in the image data (which may be a representative value such as the centroid) is set for each target object. The extraction result may be generated in a form that identifies the fruit thinning objects by specifying such identification numbers, coordinate values, or the like.

It is assumed, however, that the learning data generation device 10 can generate image data showing the extraction result as long as data such as identification numbers is available. The following description uses an example in which the extraction result is in an image data format.
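
The conversion from identification numbers to an image-format extraction result can be sketched as below. The bounding-box representation of each target object, the function name `render_extraction_mask`, and the single-channel mask are assumptions for illustration; the source only states that the regions of the fruit thinning objects are filled with a predetermined color or hatched.

```python
def render_extraction_mask(width, height, objects, thinning_ids, fill=1):
    """Build a 2-D mask in which pixels of fruit thinning objects are set to `fill`.

    `objects` maps an object ID to a bounding box (left, top, right, bottom),
    with right and bottom exclusive. Bounding boxes stand in for the object
    regions here (an assumption made to keep the sketch short).
    """
    mask = [[0] * width for _ in range(height)]
    for obj_id in thinning_ids:
        left, top, right, bottom = objects[obj_id]
        # Clip the box to the image so partly visible objects do not raise errors.
        for y in range(max(top, 0), min(bottom, height)):
            for x in range(max(left, 0), min(right, width)):
                mask[y][x] = fill
    return mask
```

A mask produced this way could be overlaid on the first input image data 11D1 to obtain an extraction result in image form.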

The extraction result may be specified, corrected, or supplemented by the user.

In step S0306, the learning data generation device 10 performs learning using image data or the like showing the extraction result as learning data.

The learning data is image data or the like showing the extraction result, that is, an image or the like in an illustrated form. However, the learning data may be image data in a plurality of formats. The format of the learning data is described later.

The learning may be repeated. That is, the learning may be repeated until steps S0307 and S0308, described below, can be executed with a predetermined accuracy.

In step S0307, the learning data generation device 10 generates estimation result image data.

In step S0308, the learning data generation device 10 discriminates the estimation result image data.

Steps S0307 and S0308 are desirably realized by, for example, the following configuration.

[Example of generating and discriminating image data with a generative adversarial network (Generative Adversarial Networks, hereinafter "GAN")]
FIG. 4 is a diagram showing a configuration example of a generative adversarial network. For example, the learning data generation device 10 desirably has the following configuration, formed by an extraction unit 10F2, a generation unit 10F3, a discrimination unit 10F4, and the like.

As illustrated, the GAN is configured so that the discrimination unit 10F4 distinguishes between image data generated by the generation unit 10F3 and image data showing the extraction result of the extraction unit 10F2.

The generation unit 10F3 serves as the generator (also called a generative network or the like) of the generative adversarial network. That is, the generation unit 10F3 is a neural network model that creates image data.

The discrimination unit 10F4 serves as the discriminator (also called a discriminative network or the like) of the generative adversarial network. That is, the discrimination unit 10F4 is a neural network model that discriminates whether or not image data was generated by the generator.

Hereinafter, the learning data used to train the generator and the discriminator constituting the GAN is called "first learning data". The learning data generated by the overall processing, that is, the learning data output based on the discrimination result of the discrimination unit 10F4, is called "second learning data".

In the illustrated GAN, the image data showing the extraction result (hereinafter simply "extraction result 20") is the "genuine" data. The extraction result 20 also serves as a "sample" for the generation unit 10F3. That is, the generation unit 10F3 is configured so that, for example, it learns several extraction results 20 in advance as first learning data and can generate image data resembling the extraction result 20 with a certain degree of accuracy.

In contrast, the image data generated by the generation unit 10F3, which shows the result of estimating the content of the fruit thinning work (hereinafter "estimation result image data 21"), is the "fake" data.

In step S0307, the generation unit 10F3 generates the estimation result image data 21.

The estimation result image data 21 is image data generated in imitation of the extraction result 20. The estimation result image data 21 therefore has the same format as the extraction result 20 and is image data that identifies fruit thinning objects. In this way, the generation unit 10F3 generates the "fake" estimation result image data 21 with the aim of having the discrimination unit 10F4 identify it as "genuine".

However, because the estimation result image data 21 is generated by the generation unit 10F3, it is not image data showing a crop that actually exists. In this way, the generation unit 10F3 and the discrimination unit 10F4, that is, the GAN, generate synthetic image data.

The estimation result image data 21 also reproduces, on a different crop, the fruit thinning work shown by the extraction result 20. That is, the estimation result image data 21 shows the result of estimating which of all the target objects are fruit thinning objects.

The generation unit 10F3 learns patterns of fruit thinning work and the like in advance, using the extraction result 20 and the like as first learning data. Therefore, when first input image data 11D1 showing an unknown crop is input, the generation unit 10F3 can first recognize, through this prior learning, the target objects shown in the first input image data 11D1.

Next, through the prior learning, the generation unit 10F3 can estimate which of the recognized target objects, at which positions, are to be fruit thinning objects, and how many are to be fruit thinning objects. The generation unit 10F3 then presents these estimation results in an image data format, generating the estimation result image data 21.

ステップＳ０３０８では、抽出結果２０、及び、推定結果画像データ２１を混ぜ、識別部１０Ｆ４は、「本物」であるか、又は、「偽物」であるかを識別する。 In step S0308, the extraction result 20 and the estimation result image data 21 are mixed, and the identification unit 10F4 identifies whether it is “genuine” or “fake”.

生成部１０Ｆ３は、できる限り「本物」と識別部１０Ｆ４に識別されるように推定結果画像データ２１を生成するように、画像処理等を学習する。一方で、識別部１０Ｆ４は、フィードバック等に基づき、「偽物」を「偽物」と識別できる精度を高めるように学習する。 The generation unit 10F3 learns image processing and the like so as to generate estimation result image data 21 that the identification unit 10F4 identifies as “genuine” as often as possible. On the other hand, the identification unit 10F4 learns, based on feedback and the like, to improve the accuracy with which it identifies a "fake" as a "fake".

具体的には、識別部１０Ｆ４による識別結果に対し、第１学習データには、識別対象となった画像データが「本物」であるか、又は、「偽物」であるかの「正解」を示すデータ（以下「正解データ２２」という。）が用意される。そして、識別結果と正解データ２２を照合すると、識別部１０Ｆ４が正しい識別であったか否かを評価できる。 Specifically, for the identification results of the identification unit 10F4, the first learning data includes data indicating the "correct answer", that is, whether the image data being identified is "genuine" or "fake" (hereinafter referred to as "correct answer data 22"). Collating the identification result with the correct answer data 22 then makes it possible to evaluate whether the identification unit 10F4 identified the image data correctly.
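The collation of identification results against the correct answer data 22 can be sketched as follows. This is a minimal illustration only: the function name `evaluate_discriminator`, the string labels, and the returned metrics are assumptions made for this example and are not specified in the source.

```python
def evaluate_discriminator(predictions, correct_answers):
    """Collate identification results with correct-answer labels.

    Both arguments are lists of "genuine" / "fake" strings (hypothetical
    encoding; the source does not define a data format).
    """
    if not predictions or len(predictions) != len(correct_answers):
        raise ValueError("inputs must be non-empty and of equal length")
    pairs = list(zip(predictions, correct_answers))
    # Count identifications that match the correct answer.
    hits = sum(p == c for p, c in pairs)
    # Count "fake" images that were mistaken for "genuine".
    missed_fakes = sum(p == "genuine" and c == "fake" for p, c in pairs)
    return {"accuracy": hits / len(pairs), "missed_fakes": missed_fakes}


results = evaluate_discriminator(
    ["genuine", "fake", "genuine", "fake"],  # identification results
    ["genuine", "fake", "fake", "fake"],     # correct answer data
)
```

An evaluation of this kind is what would be fed back to both units; the feedback mechanism itself (for example, a loss function) is not shown here.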

このような評価、及び、識別結果等が生成部１０Ｆ３にフィードバック（Ｆｅｅｄｂａｃｋ）されると、生成部１０Ｆ３は、識別部１０Ｆ４に「本物」と識別されるのを狙って、推定結果画像データ２１を生成するように学習できる。すなわち、生成部１０Ｆ３は、フィードバックによって「本物」と識別されやすい「偽物」を生成できるように学習する。 When such evaluations, identification results, and the like are fed back to the generation unit 10F3, the generation unit 10F3 can learn to generate estimation result image data 21 aimed at being identified as "genuine" by the identification unit 10F4. That is, through the feedback, the generation unit 10F3 learns to generate a "fake" that is easily identified as "genuine".

また、評価が識別部１０Ｆ４にフィードバックされると、識別部１０Ｆ４は、「偽物」を「偽物」と識別できる精度を高めるように学習できる。すなわち、識別部１０Ｆ４は、フィードバックによって、「偽物」を見逃す、又は、「偽物」を「本物」と誤認する確率を低くするように学習する。 Further, when the evaluation is fed back to the identification unit 10F4, the identification unit 10F4 can learn to improve the accuracy with which it identifies a "fake" as a "fake". That is, through the feedback, the identification unit 10F4 learns to reduce the probability of overlooking a "fake" or misidentifying a "fake" as "genuine".

なお、学習データ生成装置１０は、事前にステップＳ０３０６による第１学習データに基づく学習を繰り返す、学習処理を行って、生成部１０Ｆ３、及び、識別部１０Ｆ４にある程度の精度を持たせてもよい。 The learning data generation device 10 may perform a learning process in which learning based on the first learning data in step S0306 is repeated in advance to give the generation unit 10F3 and the identification unit 10F4 a certain degree of accuracy.

そして、識別部１０Ｆ４によって「本物」と識別される程度の品質で生成された推定結果画像データ２１を第２学習データとする。このように、学習データ１５を生成すると、ＡＩ１６が学習に用いる第２学習データを増やすことができる。 Then, the estimation result image data 21 generated with a quality that can be identified as "genuine" by the identification unit 10F4 is used as the second learning data. By generating the learning data 15 in this way, the second learning data used by the AI 16 for learning can be increased.

一方で、識別部１０Ｆ４によって「偽物」と識別された推定結果画像データ２１は、「再利用」の対象とする。すなわち、「偽物」と識別された推定結果画像データは、学習が不十分な結果である。 On the other hand, the estimation result image data 21 identified as "fake" by the identification unit 10F4 is subject to "reuse". That is, the estimation result image data identified as "fake" is a result of insufficient learning.

そこで、例えば、「偽物」と識別された推定結果画像データに対して、「本物」と識別させるように、不十分な点を修正する操作を行う。このように、手動で操作された内容を反映させた画像データ等により、生成部１０Ｆ３にフィードバックさせる等の処理が「再利用」となる。このような「再利用」がされると、生成部１０Ｆ３は、不十分な点を学習し、より「本物」と識別されやすい推定結果画像データ２１を生成できる。 Therefore, for example, an operation of correcting insufficient points is performed so that the estimation result image data identified as "fake" is identified as "genuine". In this way, the process of feeding back to the generation unit 10F3 by the image data or the like reflecting the manually operated contents is “reuse”. When such "reuse" is performed, the generation unit 10F3 can learn the insufficient points and generate the estimation result image data 21 that is more easily identified as "genuine".

なお、「再利用」は、生成部１０Ｆ３の学習に用いるに限られない。例えば、「再利用」は、手動で操作された内容を反映させた画像データを学習データ１５に加える等でもよい。ただし、「再利用」が難しい場合には、「偽物」と識別された推定結果画像データは、破棄されてもよい。 Note that "reuse" is not limited to being used for learning of the generation unit 10F3. For example, “reuse” may include adding image data reflecting the manually operated contents to the learning data 15. However, if "reuse" is difficult, the estimation result image data identified as "fake" may be discarded.
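The handling of generated images described above (adoption as second learning data, "reuse", or discarding) can be sketched as a simple routing step. The function name `route_estimation_results`, the `can_correct` callback, and the image identifiers are hypothetical and are used only for illustration.

```python
def route_estimation_results(items, can_correct):
    """Route generated images by the identification unit's verdict.

    items: list of (image_id, verdict) pairs, verdict being "genuine" or "fake".
    can_correct: callable deciding whether a "fake" can be manually corrected
                 (an assumption; the source leaves this decision open).
    """
    second_learning_data, reuse_queue, discarded = [], [], []
    for image_id, verdict in items:
        if verdict == "genuine":
            # Good enough to be used as second learning data.
            second_learning_data.append(image_id)
        elif can_correct(image_id):
            # To be corrected manually and fed back ("reuse").
            reuse_queue.append(image_id)
        else:
            # "Reuse" is difficult, so the image is discarded.
            discarded.append(image_id)
    return second_learning_data, reuse_queue, discarded


adopted, reused, dropped = route_estimation_results(
    [("img_a", "genuine"), ("img_b", "fake"), ("img_c", "fake")],
    can_correct=lambda image_id: image_id == "img_b",
)
```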

なお、図示するようなＧＡＮは、ＡＩ１６の学習に用いる学習データ１５を生成する。このように生成される第２学習データは、農作物の摘果箇所を推定するＡＩ用であり、人による目視で評価される画像データとは異なる。 The GAN as shown in the figure generates learning data 15 used for learning the AI 16. The second learning data generated in this way is for AI for estimating the fruit-picked portion of the crop, and is different from the image data evaluated visually by a human.

例えば、一般的な風景等を撮影した場合には、画像データには、人の目視では判断しにくいような微小な色の変化等が存在する場合がある。このような変化は、人の目視による評価ではあまり重視されない。一方で、コンピュータによる評価では、画素値の変動等を計算すると把握できる場合がある。このように、画像データの生成は、コンピュータによる評価を意識するか、又は、人の目視による評価を意識するかにより、重視する評価項目等が異なる場合がある。 For example, when a general landscape or the like is photographed, the image data may contain minute color changes and the like that are difficult for a person to judge visually. Such changes carry little weight in visual evaluation by a person. In evaluation by a computer, on the other hand, such changes can sometimes be detected by calculating the variation in pixel values and the like. Thus, when image data is generated, the evaluation items to be emphasized may differ depending on whether the data is intended for evaluation by a computer or for visual evaluation by a person.

［撮影方法の例］ [Example of shooting method]
第１入力画像データ１１Ｄ１、及び、第２入力画像データ１１Ｄ２等の入力画像データは、例えば、以下のように撮影されるのが望ましい。 It is desirable that the input image data, such as the first input image data 11D1 and the second input image data 11D2, be captured as follows, for example.

図５は、撮影方法の例を示す図である。以下、図において上下方向を「Ｚ軸方向」とする。Ｚ軸方向は、いわゆる重力方向である。また、図において、主に左右方向を「Ｘ軸方向」とする。Ｘ軸方向は、農作物に対して正面に向かい合った状態で右手方向とする。さらに、奥行き方向を「Ｙ軸方向」とする。 FIG. 5 is a diagram showing an example of a photographing method. Hereinafter, in the figure, the vertical direction is referred to as the "Z-axis direction". The Z-axis direction is the so-called gravity direction. Further, in the figure, the left-right direction is mainly defined as the "X-axis direction". The X-axis direction is the right-hand direction while facing the front of the crop. Further, the depth direction is defined as the "Y-axis direction".

以下、第１農作物１２を撮影する場合を例に説明する。 Hereinafter, a case where the first crop 12 is photographed will be described as an example.

入力画像データは、Ｚ軸回りに複数の視点で撮影するのが望ましい。すなわち、入力画像データは、第１農作物１２をできるだけ様々な視点で示す画像データであるのが望ましい。 It is desirable that the input image data be photographed from a plurality of viewpoints around the Z axis. That is, it is desirable that the input image data is image data showing the first crop 12 from various viewpoints as much as possible.

具体的には、カメラ１１は、光軸を第１農作物１２に向けて、Ｚ軸を中心に回転するように（いわゆるＹａｗ軸回転である。図において「Ｙａｗ」で示す回転である。）動画で撮影するのが望ましい。 Specifically, it is desirable that the camera 11 shoot video with its optical axis directed toward the first crop 12 while rotating about the Z axis (so-called yaw-axis rotation, indicated by “Yaw” in the figure).

このように撮影すると、第１農作物１２を全周方向から撮影できる。なお、入力画像データは、３６０°のうち、３視点程度を撮影する静止画等でもよい。 When photographed in this way, the first crop 12 can be photographed from the entire circumference direction. The input image data may be a still image or the like that captures about 3 viewpoints out of 360 °.

摘果作業は、農作物の全体的な形状、又は、日当たり等を気にして行う場合がある。したがって、摘果対象物は、様々な角度に存在する場合がある。ゆえに、カメラ１１は、１つの視点では、すべての摘果対象物を撮影できない場合もある。そのため、入力画像データは、できるだけ死角がないように様々な視点で撮影されるのが望ましい。 The fruit-picking work may be carried out in consideration of the overall shape of the crop or the sunlight. Therefore, the fruit-picking object may exist at various angles. Therefore, the camera 11 may not be able to capture all the fruit-picking objects from one viewpoint. Therefore, it is desirable that the input image data be photographed from various viewpoints so as to have as few blind spots as possible.

なお、入力画像データは、Ｘ軸回りに複数の視点で更に撮影するのがより望ましい。例えば、カメラ１１は、光軸を第１農作物１２に向けて、第１農作物１２の正面となる視点、第１農作物１２を下から撮影する視点（いわゆる見上げ視点である。）、及び、第１農作物１２の背面となる視点等で撮影する。 It is more desirable that the input image data be additionally captured from a plurality of viewpoints around the X axis. For example, the camera 11, with its optical axis directed toward the first crop 12, shoots from a viewpoint in front of the first crop 12, a viewpoint from below the first crop 12 (a so-called looking-up viewpoint), a viewpoint behind the first crop 12, and so on.

このように、カメラ１１は、Ｘ軸を中心に回転するように（いわゆるＰｉｔｃｈ軸回転である。図において「Ｐｉｔｃｈ」で示す回転である。）撮影するのが望ましい。 As described above, it is desirable that the camera 11 shoots so as to rotate about the X axis (so-called Pitch axis rotation, which is the rotation indicated by “Pitch” in the figure).

また、第２入力画像データ１１Ｄ２も同様に撮影されるのが望ましい。 Further, it is desirable that the second input image data 11D2 is also photographed in the same manner.

以上のように、Ｐｉｔｃｈ、又は、Ｙａｗの回転を行って複数の視点で農作物を撮影して入力画像データが撮影されるのが望ましい。このような撮影であると、農作物の全体の形状を整える摘果作業、又は、農作物の日当たりの良さを整える摘果作業等を入力画像データから把握できる。 As described above, it is desirable to rotate the Pitch or Yaw to photograph the crops from a plurality of viewpoints and capture the input image data. With such shooting, it is possible to grasp the fruit-picking work for adjusting the overall shape of the crop, the fruit-picking work for adjusting the sunnyness of the crop, and the like from the input image data.
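The viewpoint arrangement described above can be illustrated with a small geometric sketch. It assumes, purely for illustration, that the crop is at the origin, that the camera keeps a fixed distance `radius` from it, and that the front viewpoint lies on the negative Y (depth) axis; none of these values or sign conventions are given in the source.

```python
import math


def yaw_viewpoints(radius, count):
    """Camera positions evenly spaced around the Z axis (yaw rotation)."""
    positions = []
    for k in range(count):
        theta = 2 * math.pi * k / count
        # theta = 0 is the assumed front viewpoint at (0, -radius, 0).
        positions.append((radius * math.sin(theta), -radius * math.cos(theta), 0.0))
    return positions


def pitch_viewpoints(radius, pitch_degrees):
    """Camera positions rotated about the X axis (pitch rotation)."""
    positions = []
    for deg in pitch_degrees:
        phi = math.radians(deg)
        # Negative angles place the camera below the crop (looking up).
        positions.append((0.0, -radius * math.cos(phi), radius * math.sin(phi)))
    return positions


# About three still viewpoints over 360 degrees, as mentioned in the text.
yaw_cameras = yaw_viewpoints(radius=2.0, count=3)
# Front, looking-up, and back viewpoints.
pitch_cameras = pitch_viewpoints(radius=2.0, pitch_degrees=[0, -60, 180])
```

In every case the optical axis is assumed to point from the returned position toward the origin, that is, toward the crop.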

また、入力画像データは、異なる気象条件、又は、異なる周囲物の配置等の条件下で撮影されてもよい。つまり、入力画像データは、季節又は天候等により、異なる周囲環境、又は、異なる照明条件下で撮影された状態を示すのが望ましい。 Further, the input image data may be captured under different weather conditions, different arrangements of surrounding objects, and the like. That is, it is desirable that the input image data shows a state of being photographed under different ambient environments or different lighting conditions depending on the season, weather, and the like.

［第２実施形態］ [Second Embodiment]
第２実施形態は、第１実施形態と比較すると、全体処理が以下のようになる点が異なる。 The second embodiment differs from the first embodiment in that the overall processing is as follows.

図６は、第２実施形態の全体処理例を示す図である。以下、第１実施形態と異なる点を中心に説明し、重複する説明を省略する。第２実施形態における全体処理は、第１実施形態における全体処理と比較すると、ステップＳ０６０１を行う点が異なる。 FIG. 6 is a diagram showing an example of overall processing of the second embodiment. Hereinafter, the points different from those of the first embodiment will be mainly described, and duplicate description will be omitted. The overall process in the second embodiment is different from the overall process in the first embodiment in that step S0601 is performed.

ステップＳ０６０１では、学習データ生成装置１０は、摘果対象物を抽出する。具体的には、学習データ生成装置１０は、以下のような抽出処理を行って摘果対象物を抽出する。 In step S0601, the learning data generation device 10 extracts the fruit-picking object. Specifically, the learning data generation device 10 extracts the fruit-picking object by performing the following extraction process.

図７は、抽出処理の例を示す図である。例えば、ステップＳ０６０１は、以下のような処理を行う。 FIG. 7 is a diagram showing an example of extraction processing. For example, step S0601 performs the following processing.

ステップＳ０７０１では、学習データ生成装置１０は、第１マスク画像データを生成する。 In step S0701, the learning data generation device 10 generates the first mask image data.

第１マスク画像データは、後段のステップＳ０７０２で行うインスタンスセグメンテーション（ＩｎｓｔａｎｃｅＳｅｇｍｅｎｔａｔｉｏｎ）用の学習において学習データとなるマスク画像データである。すなわち、第１マスク画像データは、「見本」となる画像データである。 The first mask image data is mask image data that becomes learning data in the learning for instance segmentation (Instance Segmentation) performed in the subsequent step S0702. That is, the first mask image data is image data that serves as a "sample".

なお、第１マスク画像データは、画像データ内の一部、又は、全部を塗り潰す等のマスクする領域を指定するデータでもよい。 The first mask image data may be data that specifies a masked area such as filling a part or the whole of the image data.

以下、第１マスク画像データをインスタンスセグメンテーション用の学習データとし、かつ、インスタンスセグメンテーションにより生成されるマスク画像データを「第２マスク画像データ」という。なお、マスク画像データの詳細は後述する。 Hereinafter, the first mask image data is used as training data for instance segmentation, and the mask image data generated by the instance segmentation is referred to as "second mask image data". The details of the mask image data will be described later.

ステップＳ０７０２では、学習データ生成装置１０は、インスタンスセグメンテーションの学習を行う。 In step S0702, the training data generation device 10 learns instance segmentation.

ステップＳ０７０３では、学習データ生成装置１０は、インスタンスセグメンテーションを評価する。 In step S0703, the training data generator 10 evaluates the instance segmentation.

ステップＳ０７０４では、学習データ生成装置１０は、インスタンスセグメンテーションを行う第２マスク画像データを生成する。 In step S0704, the learning data generation device 10 generates the second mask image data for instance segmentation.

例えば、インスタンスセグメンテーション、及び、マスク画像データの生成は以下のような処理である。 For example, instance segmentation and generation of mask image data are the following processes.

図８は、インスタンスセグメンテーションの処理例、及び、マスク画像データの例を示す図である。以下、図８（Ａ）に示す第１入力画像データ１１Ｄ１を例に説明する。 FIG. 8 is a diagram showing an example of instance segmentation processing and an example of mask image data. Hereinafter, the first input image data 11D1 shown in FIG. 8A will be described as an example.

例えば、第１入力画像データ１１Ｄ１に、第１物体３１、第２物体３２、第３物体３３、及び、第４物体３４の４つの対象物体が撮影されたとする。 For example, it is assumed that four target objects of the first object 31, the second object 32, the third object 33, and the fourth object 34 are photographed in the first input image data 11D1.

図８（Ｂ）は、インスタンスセグメンテーションの実行結果、及び、インスタンスセグメンテーションにより生成されるマスク画像データ４０の例を示す図である。 FIG. 8B is a diagram showing an execution result of instance segmentation and an example of mask image data 40 generated by the instance segmentation.

インスタンスセグメンテーションは、例えば、図８（Ａ）に示す第１入力画像データ１１Ｄ１に対して処理を実行することで、図８（Ｂ）に示すマスク画像データ４０を生成する処理である。 The instance segmentation is, for example, a process of generating the mask image data 40 shown in FIG. 8 (B) by executing the process on the first input image data 11D1 shown in FIG. 8 (A).

具体的には、インスタンスセグメンテーションは、第１入力画像データ１１Ｄ１において、物体の検出、及び、検出した複数の物体を別々の物体と識別する処理である。 Specifically, the instance segmentation is a process of detecting objects in the first input image data 11D1 and identifying the plurality of detected objects as separate objects.

図８（Ｂ）に示す例は、第１物体３１、第２物体３２、第３物体３３、及び、第４物体３４を示す領域（以下、画像データにおいて対象物体を示す領域を「第１領域」という。）と、第１物体３１、第２物体３２、第３物体３３、及び、第４物体３４以外の領域（以下「第２領域」という。例えば、第２領域は背景等である。）とを２色で区別して示すマスク画像データ４０の例である。 The example shown in FIG. 8B is an example of the mask image data 40 that uses two colors to distinguish the regions showing the first object 31, the second object 32, the third object 33, and the fourth object 34 (hereinafter, a region showing a target object in the image data is referred to as a "first region") from the region other than the first object 31, the second object 32, the third object 33, and the fourth object 34 (hereinafter referred to as the "second region"; for example, the second region is the background or the like).

具体的には、図８（Ｂ）に示すように、マスク画像データ４０において、第１領域は、白色で示す領域である。一方で、マスク画像データ４０において、第２領域は、黒色で示す領域である。このように、マスク画像データ４０は、例えば、第１領域、及び、第２領域を二値化して異なる色で示す画像データである。 Specifically, as shown in FIG. 8B, in the mask image data 40, the first region is a region shown in white. On the other hand, in the mask image data 40, the second region is a region shown in black. As described above, the mask image data 40 is, for example, image data in which the first region and the second region are binarized and shown in different colors.

なお、マスク画像データ４０は、図８（Ｂ）に示すような形式に限られない。例えば、第１領域、及び、第２領域をどのような色にするか等は事前に設定でき、他の色の組み合わせでもよい。また、マスク画像データ４０は、色で領域を区別する形式に限られず、例えば、ハッチングの有無、又は、識別データで区別する等の形式でもよい。 The mask image data 40 is not limited to the format shown in FIG. 8B. For example, the colors of the first region and the second region can be set in advance, and other color combinations may be used. Further, the mask image data 40 is not limited to a format in which regions are distinguished by color, and may be in a format such as the presence or absence of hatching or discrimination by identification data.

学習データ生成装置１０は、マスク画像データ４０を第１入力画像データ１１Ｄ１に適用すると、第１領域を抽出した画像データを生成できる。すなわち、マスク画像データ４０を参照すると、学習データ生成装置１０は、第１入力画像データ１１Ｄ１において、対象物体を認識し、対象物体を抽出した画像データを生成できる。 When the mask image data 40 is applied to the first input image data 11D1, the learning data generation device 10 can generate image data obtained by extracting the first region. That is, with reference to the mask image data 40, the learning data generation device 10 can recognize the target object in the first input image data 11D1 and generate image data obtained by extracting the target object.

マスク画像データ４０を利用すると、第１入力画像データ１１Ｄ１が示す背景等を削除できる。すなわち、学習において、背景等といった対象物体以外のデータを排除できると、ＡＩが、摘果作業において重要でない物体、又は、背景等を無駄に学習してしまうのを防ぐことができる。 By using the mask image data 40, the background and the like indicated by the first input image data 11D1 can be deleted. That is, if data other than the target object such as the background can be excluded in the learning, it is possible to prevent the AI from unnecessarily learning the object or the background which is not important in the fruit-picking work.
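Applying a binary mask to remove the background can be sketched as follows. The nested-list image representation, the function name `apply_mask`, and the choice of black as the replacement color are assumptions made for this example.

```python
def apply_mask(image, mask, background=(0, 0, 0)):
    """Keep pixels in the first region (mask value 1); replace the rest.

    image: rows of (R, G, B) tuples.
    mask:  rows of 0/1 values with the same shape as image.
    """
    return [
        [pixel if keep else background for pixel, keep in zip(image_row, mask_row)]
        for image_row, mask_row in zip(image, mask)
    ]


photo = [
    [(200, 30, 30), (20, 120, 40)],
    [(25, 110, 35), (210, 40, 35)],
]
mask = [
    [1, 0],
    [0, 1],
]
extracted = apply_mask(photo, mask)
```

Only the pixels of the target objects survive, so a model trained on `extracted` would not see the original background values.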

このように、マスク画像データ４０は、背景等を第２領域とする等のように、第１領域以外をマスク化ができる画像データであるのが望ましい。 As described above, it is desirable that the mask image data 40 is image data capable of masking a region other than the first region, such as setting the background or the like as the second region.

また、マスク画像データ４０は、同じ種類の対象物体であっても、個々の対象物体を識別できる。すなわち、マスク画像データ４０を適用すると、図８（Ｂ）に示すように、第１物体３１、第２物体３２、第３物体３３、及び、第４物体３４を第１対象物体４１、第２対象物体４２、第３対象物体４３、及び、第４対象物体４４のように、異なる物体と識別できる。 Further, the mask image data 40 can identify individual target objects even when they are of the same type. That is, when the mask image data 40 is applied, as shown in FIG. 8B, the first object 31, the second object 32, the third object 33, and the fourth object 34 can be identified as different objects, namely the first target object 41, the second target object 42, the third target object 43, and the fourth target object 44.

例えば、セマンティックセグメンテーション（ＳｅｍａｎｔｉｃＳｅｇｍｅｎｔａｔｉｏｎ）の処理であると、第１対象物体４１、第２対象物体４２、第３対象物体４３、及び、第４対象物体４４は、同じ物体又はカテゴリーに分類され、区別されない場合が多い。 For example, in the process of semantic segmentation, the first target object 41, the second target object 42, the third target object 43, and the fourth target object 44 are classified into the same object or category and distinguished. Often not done.

一方で、インスタンスセグメンテーションの処理であると、１つの対象物体を示す複数の画素をまとめて１つの物体と識別し、かつ、同じ種類であっても異なる物体であれば、別の物体であると識別できる。 On the other hand, instance segmentation collectively identifies the plurality of pixels showing one target object as a single object, and can identify objects of the same type as separate objects when they are in fact different objects.

すなわち、インスタンスセグメンテーションの処理を行うと、画像データ内において同じ種類の複数の対象物体がある場合には、いわゆるラベリング（ｌａｂｅｌｉｎｇ）が可能となる。例えば、図８（Ｂ）に示す例では、第１対象物体４１、第２対象物体４２、第３対象物体４３、及び、第４対象物体４４が異なる識別番号等で管理できる。 That is, when the instance segmentation process is performed, so-called labeling becomes possible when there are a plurality of target objects of the same type in the image data. For example, in the example shown in FIG. 8B, the first target object 41, the second target object 42, the third target object 43, and the fourth target object 44 can be managed by different identification numbers or the like.
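The difference between a single-category mask and per-object identification numbers can be illustrated with connected-component labeling. This is only a simple stand-in: a learned instance segmentation model, as used in the source, can also separate touching or overlapping objects, which this sketch cannot. The function name `label_instances` and the 4-connectivity rule are assumptions.

```python
from collections import deque


def label_instances(mask):
    """Assign a distinct identification number to each connected region.

    mask: rows of 0/1 values (1 = target object pixel).
    Returns (labels, count), where labels has the same shape as mask.
    """
    height = len(mask)
    width = len(mask[0]) if height else 0
    labels = [[0] * width for _ in range(height)]
    count = 0
    for y in range(height):
        for x in range(width):
            if not mask[y][x] or labels[y][x]:
                continue
            count += 1
            labels[y][x] = count
            queue = deque([(y, x)])
            while queue:  # breadth-first flood fill over 4-neighbors
                cy, cx = queue.popleft()
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and mask[ny][nx] and not labels[ny][nx]):
                        labels[ny][nx] = count
                        queue.append((ny, nx))
    return labels, count


semantic_mask = [
    [1, 1, 0, 0],
    [0, 0, 0, 1],
    [1, 0, 0, 1],
]
instance_labels, object_count = label_instances(semantic_mask)
```

The input mask treats all target pixels as one category, while the output manages each object under its own identification number.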

したがって、ステップＳ０７０２における学習は、対象物体を精度良く識別できる程度に行われる。そして、ステップＳ０７０３における評価は、対象物体を抽出する精度等を評価する。このようなステップＳ０７０２、及び、ステップＳ０７０３が行われると、ステップＳ０７０４で、学習データ生成装置１０は、インスタンスセグメンテーションを行う第２マスク画像データを生成できる。 Therefore, the learning in step S0702 is performed to such an extent that the target object can be accurately identified. Then, the evaluation in step S0703 evaluates the accuracy of extracting the target object and the like. When such steps S0702 and S0703 are performed, in step S0704, the learning data generation device 10 can generate the second mask image data for instance segmentation.

そして、インスタンスセグメンテーションの評価結果によっては、ステップＳ０７０１乃至ステップＳ０７０３は繰り返し実行される。すなわち、「学習処理」、及び、図７に示す処理は、ある程度の精度が確保されるまで繰り返し実行され、その後、十分な学習が完了している状態下において、「生成処理」、及び、図７に示す処理が行われてもよい。 Then, depending on the evaluation result of the instance segmentation, steps S0701 to S0703 are executed repeatedly. That is, the "learning process" and the process shown in FIG. 7 may be executed repeatedly until a certain degree of accuracy is ensured, after which the "generation process" and the process shown in FIG. 7 may be performed in a state where sufficient learning has been completed.

なお、学習データ生成装置１０は、ステップＳ０７０５のように、イラスト化を更に行うのが望ましい。例えば、イラスト化は以下のような処理である。 It is desirable that the learning data generation device 10 further perform illustration (conversion into an illustration), as in step S0705. For example, the illustration process is as follows.

図９は、イラスト化の処理例を示す図である。以下、図９（Ａ）に示すような写真形式の第１入力画像データ１１Ｄ１を入力する場合を例に説明する。 FIG. 9 is a diagram showing an example of an illustration process. Hereinafter, a case where the first input image data 11D1 in the photographic format as shown in FIG. 9A is input will be described as an example.

図９（Ａ）に示す例は、画像データの中央部分（図において果実が撮影されている部分である。以下「対象物体領域５１」という。）に、対象物体が存在する例を示す。例えば、対象物体領域５１に写る対象物体は、インスタンスセグメンテーション等の物体認識により識別される。 The example shown in FIG. 9A shows an example in which the target object exists in the central portion of the image data (the portion in which the fruit is photographed. Hereinafter referred to as “target object region 51”). For example, the target object reflected in the target object area 51 is identified by object recognition such as instance segmentation.

イラスト化の処理は、例えば、第１入力画像データ１１Ｄ１を入力し、図９（Ｂ）に示すような画像データ（以下「イラスト化画像データ５０」という。）を生成する処理である。 The illustration process is, for example, a process of inputting the first input image data 11D1 and generating image data (hereinafter referred to as “illustrated image data 50”) as shown in FIG. 9 (B).

図９（Ｂ）は、イラスト化画像データ５０の例を示す図である。 FIG. 9B is a diagram showing an example of the illustrated image data 50.

イラスト化画像データ５０は、対象物体の領域を所定の色で塗り潰す。例えば、図９（Ｂ）に示すように、イラスト化画像データ５０は、ハッチングで示す、対象物体の領域を塗り潰した画像データである。 The illustrated image data 50 fills the area of the target object with a predetermined color. For example, as shown in FIG. 9B, the illustrated image data 50 is image data in which a region of a target object is filled, which is shown by hatching.

以下、図９（Ｂ）に示す例において、対象物体の領域と識別され、イラスト化の処理で塗り潰す領域を「塗り潰し領域５２」という。 Hereinafter, in the example shown in FIG. 9B, the area that is identified as the area of the target object and is filled by the illustration process is referred to as “filled area 52”.

さらに、イラスト化画像データ５０は、塗り潰し領域５２以外の領域（背景等を示す領域である。）を白色（塗り潰し領域５２とは異なる色で塗り潰す等である。）とする。 Further, in the illustrated image data 50, the area other than the filled area 52 (the area showing the background and the like) is made white (that is, for example, filled with a color different from that of the filled area 52).

このように、イラスト化の処理は、対象物体の領域と、それ以外の領域を所定の色で色分けする処理等である。このように、イラスト化の処理を行うと、画像データにおけるＲＧＢ値又は輝度値等が単純化できる。 As described above, the process of creating an illustration is a process of color-coding the region of the target object and the other regions with a predetermined color. By performing the illustration process in this way, the RGB value, the luminance value, and the like in the image data can be simplified.

第１入力画像データ１１Ｄ１のような写真形式の画像データであると、人の目には分かりにくい細かなＲＧＢ値、又は、輝度値等の変化がある場合が多い。 In the case of image data in a photographic format such as the first input image data 11D1, there are many cases where there are small changes in RGB values or luminance values that are difficult for the human eye to understand.

例えば、トマトの果実は、単純には赤色の１色である。このような対象物体を示す場合において、写真形式の画像データであると、同じ対象物体における赤色を示す画素は、細かくＲＧＢ値等の画素値が変化する場合がある。このような細かなＲＧＢ値等の変化は、学習の対象としない方がよい場合が多い。 For example, a tomato fruit is, simply put, a single red color. When such a target object is shown as image data in a photographic format, the pixel values, such as RGB values, of the red pixels within the same target object may vary finely. In many cases, it is better not to make such fine variations in RGB values and the like a target of learning.

そこで、イラスト化の処理は、対象物体を同じ色で統一して示す等の処理を行う。具体的には、第１入力画像データ１１Ｄ１に対して、インスタンスセグメンテーション等を行うと、対象物体と識別できる画素がグルーピング化される。 Therefore, in the illustration process, the target object is shown in the same color in a unified manner. Specifically, when instance segmentation or the like is performed on the first input image data 11D1, the pixels that can be identified as the target object are grouped.

そして、イラスト化の処理は、このように同じグルーピング化された画素を同じ色で塗り潰す処理である。さらに、イラスト化の処理は、背景等の領域を対象物体の領域とは異なる色で別の色に塗り潰す処理である。 The illustration process is a process of filling the same grouped pixels with the same color. Further, the illustration process is a process of painting an area such as a background with a color different from the area of the target object.

なお、イラスト化の処理は、画像データを単純化する処理であれば、所定の色で塗り潰す以外の処理であってもよい。例えば、イラスト化の処理は、背景等を単色にする等でもよい。また、イラスト化の処理は、色で塗り潰すに代えて、ハッチング等を用いる処理でもよい。 The illustration process may be a process other than filling with a predetermined color as long as it is a process for simplifying the image data. For example, the illustration process may be performed by making the background or the like a single color. Further, the illustration process may be a process using hatching or the like instead of painting with a color.
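The illustration process can be sketched as flat filling driven by the grouped pixels. The label grid is assumed to come from a prior instance segmentation step, and the function name `illustrate` and the specific colors are illustrative choices, not values from the source.

```python
def illustrate(labels, fill=(255, 0, 0), background=(255, 255, 255)):
    """Fill every target-object pixel with one color and the rest with another.

    labels: rows of integers, 0 for background and nonzero for target objects.
    """
    return [[fill if label else background for label in row] for row in labels]


def distinct_colors(image):
    """Count distinct pixel values, as a rough measure of image complexity."""
    return len({pixel for row in image for pixel in row})


photo = [
    [(201, 32, 30), (198, 35, 28), (20, 120, 40)],
    [(25, 110, 35), (205, 30, 33), (30, 115, 42)],
]
labels = [
    [1, 1, 0],
    [0, 2, 0],
]
illustrated = illustrate(labels)
```

The photographic input has six distinct pixel values while the illustrated output has two, which is the simplification of RGB values that the text describes.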

このように、画像データをイラスト化すると、抽出結果等を単純化して表現できる。抽出結果は、対象物体の位置、及び、形状等が大まかに表現できればよい場合が多い。すなわち、抽出結果には、細かな色の変化、及び、背景等のデータが不要な場合が多い。 By making the image data into an illustration in this way, the extraction result and the like can be simplified and expressed. In many cases, the extraction result should be able to roughly express the position and shape of the target object. That is, in many cases, the extraction result does not require data such as fine color changes and backgrounds.

そこで、対象物体を単色で簡略に示す方が、写真形式等と比較して、学習の妨げとなる要素を排除し、精度良く学習できる。すなわち、イラスト化された画像データを学習データに摘果作業をＡＩに学習させると、ＡＩは、摘果作業に重要な特徴量を精度良く学習できる。 Therefore, if the target object is simply shown in a single color, it is possible to eliminate elements that hinder learning and to learn with high accuracy as compared with a photographic format or the like. That is, when the AI is made to learn the fruit-picking work by using the illustrated image data as the learning data, the AI can accurately learn the feature amount important for the fruit-picking work.

また、写真形式等の画像データより、イラスト化された画像データの方が、色の表現等が簡略であるため、データ量を少なくできる。 In addition, the amount of data can be reduced in the illustrated image data because the color expression and the like are simpler than the image data in the photographic format.

図１０は、イラスト化された画像データ、又は、マスク画像データの変形例を示す図である。例えば、マスク画像データは、図１０（Ｂ）又は図１０（Ｃ）のように生成されてもよい。以下、図１０（Ａ）に示す第１入力画像データ１１Ｄ１を例に説明する。 FIG. 10 is a diagram showing a modified example of the illustrated image data or the mask image data. For example, the mask image data may be generated as shown in FIG. 10 (B) or FIG. 10 (C). Hereinafter, the first input image data 11D1 shown in FIG. 10A will be described as an example.

図１０（Ａ）は、林檎の４つの果実を対象物体にする第１入力画像データ１１Ｄ１の例を示す図である。以下、学習データ生成装置１０は、このような第１入力画像データ１１Ｄ１を入力し、学習データ生成装置１０は、インスタンスセグメンテーション等を行う例で説明する。 FIG. 10A is a diagram showing an example of the first input image data 11D1 in which four apple fruits are targeted objects. Hereinafter, the learning data generation device 10 will be described with an example in which such first input image data 11D1 is input, and the learning data generation device 10 performs instance segmentation or the like.

例えば、図８に示すインスタンスセグメンテーションを行う場合には、第２マスク画像データは、図１０（Ｂ）に示すように生成される。 For example, when performing the instance segmentation shown in FIG. 8, the second mask image data is generated as shown in FIG. 10 (B).

一方で、第２マスク画像データは、図１０（Ｃ）に示すように生成されてもよい。 On the other hand, the second mask image data may be generated as shown in FIG. 10 (C).

図１０（Ｂ）は、４つの対象物体をまとめて１つの画像データで示す形式の例を示す図である。このように、第２マスク画像データは、複数の対象物体を１つの画像データで示してもよい。 FIG. 10B is a diagram showing an example of a format in which four target objects are collectively shown by one image data. As described above, the second mask image data may represent a plurality of target objects with one image data.

図１０（Ｃ）は、４つの対象物体を対象物体ごとに分けた４つの画像データとし、画像データ群の形式とする例を示す図である。このように、第２マスク画像データは、対象物体ごとに、画像データを分けて、複数の画像データ群で１つの第２マスク画像データとする画像データ群の形式でもよい。 FIG. 10C is a diagram showing an example in which four target objects are divided into four image data for each target object and are in the form of an image data group. As described above, the second mask image data may be in the form of an image data group in which the image data is divided for each target object and a plurality of image data groups are used as one second mask image data.

以上のように、マスク画像データ、又は、イラスト化して生成する画像データは、複数の対象物体をまとめて１つの画像データとしてもよいし、又は、対象物体ごとに別々に分けて画像データ群としてもよい。 As described above, the mask image data, or the image data generated by illustration, may show a plurality of target objects together in one piece of image data, or may be divided per target object into an image data group.
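The two formats (one combined image as in FIG. 10 (B), or an image data group as in FIG. 10 (C)) can be converted into each other, as the following sketch suggests. The label-grid representation and the function names `split_by_object` and `merge_objects` are assumptions for illustration.

```python
def split_by_object(labels):
    """Split one labeled mask into a group of per-object binary masks."""
    object_ids = sorted({value for row in labels for value in row if value})
    return {
        object_id: [[1 if value == object_id else 0 for value in row] for row in labels]
        for object_id in object_ids
    }


def merge_objects(mask_group):
    """Merge per-object binary masks back into one combined binary mask."""
    masks = list(mask_group.values())
    return [
        [1 if any(cells) else 0 for cells in zip(*rows)]
        for rows in zip(*masks)
    ]


labels = [
    [1, 1, 0, 0],
    [0, 0, 0, 2],
    [3, 0, 0, 2],
]
mask_group = split_by_object(labels)   # image data group format
combined = merge_objects(mask_group)   # single image format
```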

［抽出結果の例］ [Example of extraction result]
図１１は、対象物体の認識例を示す図である。以下、図１１（Ａ）に示す第１入力画像データ１１Ｄ１を例に説明する。 FIG. 11 is a diagram showing a recognition example of the target object. Hereinafter, the first input image data 11D1 shown in FIG. 11A will be described as an example.

図１１（Ａ）に示す対象物体を扱う場合には、学習データ生成装置１０は、対象物体の形状、色、又は、これらの組み合わせ等を事前に学習する。このような学習を行うと、例えば、学習データ生成装置１０は、図１１（Ｂ）又は図１１（Ｃ）のように対象物体を認識できる。 When handling the target object shown in FIG. 11A, the learning data generation device 10 learns in advance the shape, color, combination thereof, and the like of the target object. When such learning is performed, for example, the learning data generation device 10 can recognize the target object as shown in FIG. 11B or FIG. 11C.

図１１（Ｂ）、及び、図１１（Ｃ）は、対象物体を認識した位置、及び、範囲等を破線で囲んで示す例である。なお、認識結果は、図１１（Ｂ）、及び、図１１（Ｃ）以外の形式で出力されてもよい。 11 (B) and 11 (C) are examples in which the position where the target object is recognized, the range, and the like are surrounded by a broken line. The recognition result may be output in a format other than those shown in FIGS. 11 (B) and 11 (C).

図１１（Ｂ）は、対象物体を認識した結果の第１例を示す図である。例えば、図１１（Ｂ）に示すように、対象物体は、第１対象物体１０１、第２対象物体１０２、第３対象物体１０３、第４対象物体１０４、第５対象物体１０５、第６対象物体１０６、及び、第７対象物体１０７のように、学習データ生成装置１０によって認識される。 FIG. 11B is a diagram showing a first example of the result of recognizing the target object. For example, as shown in FIG. 11B, the target objects are the first target object 101, the second target object 102, the third target object 103, the fourth target object 104, the fifth target object 105, and the sixth target object. Like 106 and the seventh object object 107, it is recognized by the learning data generator 10.

また、対象物体は、例えば、図１１（Ｃ）のような形式で認識されてもよい。 Further, the target object may be recognized in the form shown in FIG. 11C, for example.

図１１（Ｃ）は、対象物体を認識した結果の第２例を示す図である。第２例は、対象物体ごとに認識結果を別々の画像データに分ける形式の例である。具体的には、学習データ生成装置１０は、第１対象物体１０１、第２対象物体１０２、第３対象物体１０３、第４対象物体１０４、第５対象物体１０５、第６対象物体１０６、及び、第７対象物体１０７の認識結果を対象物体ごとに分けて出力する。 FIG. 11C is a diagram showing a second example of the result of recognizing the target objects. The second example is a format in which the recognition result is divided into separate image data for each target object. Specifically, the learning data generation device 10 outputs the recognition results of the first target object 101, the second target object 102, the third target object 103, the fourth target object 104, the fifth target object 105, the sixth target object 106, and the seventh target object 107 separately for each target object.

なお、対象物体の認識結果は、図１１（Ｂ）又は図１１（Ｃ）に示すように、画像データの形式にされなくともよい。すなわち、対象物体の認識結果は、中間生成物であり、対象物体が画像データ内において占める位置、大きさ、範囲、数、又は、座標等のパラメータ（統計値、又は、代表値を用いる場合を含む。）を学習データ生成装置１０が把握できる形式であればよい。 The recognition result of the target object does not have to be in the form of image data as shown in FIG. 11 (B) or FIG. 11 (C). That is, the recognition result of the target object is an intermediate product, and may be in any format from which the learning data generation device 10 can grasp parameters such as the position, size, range, number, or coordinates that the target objects occupy in the image data (including cases where statistical values or representative values are used).

したがって、学習データ生成装置１０は、認識結果を示すパラメータを内部に記憶し、図示するような画像データ等を出力しなくともよい。 Therefore, the learning data generation device 10 may store the parameters indicating the recognition result internally and need not output image data or the like as shown in the figure.
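A parameter-only form of the recognition result might look like the following sketch, which derives the area and bounding range of each object from a labeled grid instead of producing an image. The dictionary layout and the function name `summarize_objects` are hypothetical; the source does not define a concrete parameter format.

```python
def summarize_objects(labels):
    """Collect per-object parameters (area and bounding range) from a label grid.

    labels: rows of integers, 0 for background and nonzero object identifiers.
    """
    stats = {}
    for y, row in enumerate(labels):
        for x, object_id in enumerate(row):
            if not object_id:
                continue
            entry = stats.setdefault(
                object_id,
                {"area": 0, "x_min": x, "x_max": x, "y_min": y, "y_max": y},
            )
            entry["area"] += 1
            entry["x_min"] = min(entry["x_min"], x)
            entry["x_max"] = max(entry["x_max"], x)
            entry["y_min"] = min(entry["y_min"], y)
            entry["y_max"] = max(entry["y_max"], y)
    return stats


labels = [
    [1, 1, 0, 0],
    [0, 0, 0, 2],
    [3, 0, 0, 2],
]
parameters = summarize_objects(labels)
object_count = len(parameters)
```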

ステップＳ０３０６では、学習データ生成装置１０は、学習データを用いて学習モデルを学習させる。例えば、学習データは、ステップＳ０６０１で生成する画像データ、すなわち、イラスト化した画像データ等である。なお、学習データは、学習データは、複数の形式の画像データでもよい。学習データの詳細は後述する。 In step S0306, the learning data generation device 10 trains the learning model using the learning data. For example, the learning data is the image data generated in step S0601, that is, the illustrated image data and the like. The learning data may be image data in a plurality of formats. The details of the training data will be described later.

[Example of processing results of the overall processing]
FIG. 12 shows an example of the processing results of the overall processing. In the following, FIG. 12(A) and FIG. 12(B) are taken to show the state before and after fruit thinning, respectively.

FIG. 12(A) shows an example of the first input image data 11D1.

FIG. 12(B) shows an example of the second input image data 11D2.

FIG. 12(C) shows an example of the second learning data.

In the following example, the first input image data 11D1, that is, the state before the fruit thinning work, contains seven target objects as shown in FIG. 12(A): the first target object 101, the second target object 102, the third target object 103, the fourth target object 104, the fifth target object 105, the sixth target object 106, and the seventh target object 107.

In the second input image data 11D2, that is, after the fruit thinning work has been performed, four of these target objects, namely the second target object 102, the third target object 103, the fourth target object 104, and the sixth target object 106, were treated as fruit thinning targets and have been thinned, as shown in FIG. 12(B).

Comparing the first input image data 11D1 with the second input image data 11D2 in this way allows the fruit thinning targets to be extracted. Once it has learned such extraction results, the learning data generation device 10 can, when unknown first input image data 11D1 is input, estimate the fruit thinning work and generate estimation result image data.
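The before/after comparison can be sketched as follows. This is a simplified illustration under stated assumptions: target objects are reduced to center coordinates, an object counts as still present if some object in the second image lies within `max_dist` pixels of it, and all coordinates are made up. The function name `extract_thinning_targets` and the distance threshold are hypothetical and do not come from the specification.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def extract_thinning_targets(
    before: Dict[int, Point],
    after: List[Point],
    max_dist: float = 5.0,
) -> List[int]:
    """Return IDs of objects seen before thinning that are missing afterwards."""
    targets = []
    for object_id, (bx, by) in before.items():
        # An object is "still present" if any object in the second image
        # lies close enough to its position in the first image.
        still_present = any(
            math.hypot(bx - ax, by - ay) <= max_dist for ax, ay in after
        )
        if not still_present:
            targets.append(object_id)
    return sorted(targets)


# Seven objects before thinning (cf. FIG. 12(A)); coordinates are illustrative.
before = {
    101: (10, 10), 102: (30, 10), 103: (50, 10), 104: (20, 30),
    105: (40, 30), 106: (30, 50), 107: (50, 50),
}
# Three objects remain after thinning (cf. FIG. 12(B)), slightly shifted.
after = [(11, 10), (40, 31), (50, 49)]

thinning_targets = extract_thinning_targets(before, after)
```

With these sample coordinates, the objects missing from the second image are 102, 103, 104, and 106, matching the example described for FIG. 12(B).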

The estimation result image data and the like generated in this way become the learning data 15. The AI 16 then learns the fruit thinning work using the learning data 15 and the like as the second learning data.

FIG. 12(C) shows an example of a format in which each target object is enclosed by a dotted line and the fruit thinning targets are indicated by hatching.

The second learning data is not limited to the format shown in FIG. 12(C). It suffices that the AI 16 can learn the position, number, arrangement, shape, range, and so on of the fruit thinning targets from the second learning data. The second learning data may therefore identify the fruit thinning targets and the target objects in other formats.

[Third Embodiment]
FIG. 13 shows a configuration example of the learning device. In the third embodiment, the configuration of the learning data generation device 10 and the like is, for example, the same as in the first embodiment. The learning device 301 is an information processing device or the like. The learning data generation device 10 and the learning device 301 may be the same information processing device.

The third embodiment trains the learning model 302 using the learning data 15 and the like generated by the configuration of the first or second embodiment, and thereby generates the trained model 303.

Hereinafter, the AI before or during training is simply called the "learning model 302", whereas the AI after it has been trained to some extent with the second learning data is called the "trained model 303".

The learning device 301 receives the learning data 15 as input and trains the learning model 302 with it.

Data other than the learning data 15 may also be used for training. For example, the learning device 301 may additionally receive the first input image data 11D1, the second input image data 11D2, and the like to train the learning model 302. The learning device 301 may also receive the second learning data in the form of extraction results or the like.

As described above, the learning device 301 trains the learning model 302 to generate the trained model 303. Once such a trained model 303 has been generated, an AI that estimates fruit thinning targets can be realized.

Specifically, the learning device 301 is configured, for example, as follows.

FIG. 14 shows an example of a configuration in which learning is performed by the learning device. As illustrated, the learning model 302 includes at least a generation unit 10F3 and an identification unit 10F4.

The generation unit 10F3 and the identification unit 10F4 are, respectively, the generator and the discriminator of a generative adversarial network (GAN).

First, the learning device 301 receives the first input image data 11D1.

Next, the generation unit 10F3 estimates which of the target objects shown in the first input image data 11D1 are fruit thinning targets, and generates the estimation result image data 21. The description below uses, as in FIG. 12(C), a format in which each target object is enclosed by a dotted line and the fruit thinning targets among them are indicated by hatching.

When the estimation result image data 21 has been generated, the identification unit 10F4 performs identification on it, treating the learning data 15 as the "correct answer".

Specifically, the estimation result image data 21 indicates the positions, number, and so on of the fruit thinning targets. The learning data 15 likewise indicates the positions, number, and so on of the fruit thinning targets.

In the example described below, the identification unit 10F4 refers to the estimation result image data 21 and identifies it as a "correct answer" when both the positions and the number of the fruit thinning targets match the learning data 15.

Conversely, the identification unit 10F4 identifies it as a "wrong answer" when at least one of the positions and the number of the fruit thinning targets indicated by the estimation result image data 21 differs from the learning data 15.

The identification unit 10F4 then feeds back the identification result, "correct answer" or "wrong answer", to at least the generation unit 10F3. Feedback is thus processing that conveys the identification result from the identification unit 10F4 to at least the generation unit 10F3.

To help the generation unit 10F3 learn, the feedback may also convey the identification process followed by the identification unit 10F4, the identification criteria, intermediate data generated during identification, and the like. That is, the feedback may convey the process leading up to the identification result and the data generated along the way together with the identification result. The generation unit 10F3 learns by referring to the identification result that is fed back. When other data is sent together with the result, the generation unit 10F3 may refer to that data as well.

Specifically, in the example shown in FIG. 14, the estimation result image data 21 and the learning data 15 each indicate the fruit thinning targets selected from seven target objects. Comparing the estimate given by the estimation result image data 21 with the "correct answer" given by the learning data 15, the two differ in whether the target object located at the center (the target object indicated by difference 151 in the figure) is a fruit thinning target.

Consequently, the comparison of the estimation result image data 21 with the learning data 15 finds a difference, because the number of fruit thinning targets and the determination for difference 151 both differ. Since, based on this comparison, both the number and the positions of the fruit thinning targets differ from the learning data 15 used as the reference, the identification unit 10F4 identifies the result as a "wrong answer".

The identification by the identification unit 10F4 may allow a tolerance relative to the reference. For example, a setting may allow the number to differ from the reference by up to two. With such a tolerance, if difference 151 is the only difference, the identification unit 10F4 identifies the result as a "correct answer". There may also be other items that can be configured for learning.
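The correct/wrong decision with an optional tolerance can be sketched as below. This is an illustrative simplification, not the identification unit 10F4 itself: each fruit thinning target is represented by an object ID standing in for its position, and the function name `discriminate`, the two tolerance parameters, and the choice of object 104 as the differing object are all assumptions made for the example.

```python
from typing import Set


def discriminate(
    estimated: Set[int],
    reference: Set[int],
    count_tolerance: int = 0,
    position_tolerance: int = 0,
) -> str:
    """Compare estimated thinning targets with the reference learning data.

    Returns "correct" when both the number and the positions (object IDs here)
    agree within the given tolerances, otherwise "wrong".
    """
    count_diff = abs(len(estimated) - len(reference))
    # Objects marked in only one of the two sets are positional mismatches.
    position_diff = len(estimated ^ reference)
    within_tolerance = (
        count_diff <= count_tolerance and position_diff <= position_tolerance
    )
    return "correct" if within_tolerance else "wrong"


reference = {102, 103, 104, 106}   # learning data 15 ("correct answer")
estimated = {102, 103, 106}        # estimate that misses one object

strict_result = discriminate(estimated, reference)
tolerant_result = discriminate(
    estimated, reference, count_tolerance=2, position_tolerance=2
)
```

Under the strict setting the single differing object yields "wrong"; with a tolerance of two the same estimate is accepted, which mirrors the behavior described for difference 151.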

Then, for example, an expert reviews a plurality of estimation result image data 21 generated by the generation unit 10F3 to evaluate them. Specifically, if the generation unit 10F3 generates 100 pieces of estimation result image data 21 and the expert judges all 100 to be acceptable, training of the generation unit 10F3 and the like is evaluated as complete.

By repeating the generation and the identification feedback described above, the learning device 301 can raise the accuracy with which the estimation result image data 21 is generated.

It is desirable that the learning device 301 extract the fruit thinning targets during generation or identification. Specifically, the learning device 301 performs processing such as generating mask image data and illustration during generation or identification.

Performing extraction by masking the image data, converting it into an illustration, or both allows the extraction result to be expressed in a simplified form. In many cases it is enough for the extraction result to express the position, shape, and so on of the target objects roughly. That is, data such as subtle color variations, subjects with little relevance to the fruit thinning work, and the background is often unnecessary in the extraction result.
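The masking and illustration steps can be sketched as follows, assuming target objects are given as bounding boxes and images are plain nested lists. The function names `make_mask` and `illustrate`, the rectangular regions, and the colors are hypothetical; the specification does not state how the mask image data generation unit 10F5 or the illustration processing unit 10F6 is implemented.

```python
from typing import List, Sequence, Tuple

BBox = Tuple[int, int, int, int]   # (x_min, y_min, x_max, y_max), max exclusive
Color = Tuple[int, int, int]


def make_mask(width: int, height: int, boxes: Sequence[BBox]) -> List[List[int]]:
    """Build a binary mask: 1 inside any target-object box, 0 elsewhere."""
    mask = [[0] * width for _ in range(height)]
    for x0, y0, x1, y1 in boxes:
        # Clip each box to the image bounds before filling it.
        for y in range(max(0, y0), min(height, y1)):
            for x in range(max(0, x0), min(width, x1)):
                mask[y][x] = 1
    return mask


def illustrate(
    mask: List[List[int]],
    object_color: Color = (0, 128, 0),
    background_color: Color = (255, 255, 255),
) -> List[List[Color]]:
    """Render the mask as a flat, single-color illustration."""
    return [
        [object_color if value else background_color for value in row]
        for row in mask
    ]


mask = make_mask(4, 3, [(1, 0, 3, 2)])
illustration = illustrate(mask)
```

The result discards texture, fine color changes, and background detail, leaving only the rough position and shape of each target object.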

In particular, in environments where crops are grown, it is often difficult to arrange the surroundings for AI training or for photography. Such environments are also prone to unrelated subjects entering the frame unexpectedly. If masking the image data reduces such disturbances, the AI can accurately learn the features that are important for understanding the fruit thinning work.

Furthermore, compared with a photographic format, representing the target objects as simple single-color illustrations, or using image data narrowed down to the important parts, removes elements that hinder learning of the fruit thinning work and allows accurate learning. That is, when extraction is applied to the image data as preprocessing before the AI learns the fruit thinning work, the AI can accurately learn the features that are important for understanding that work.

The identification unit 10F4 may also improve its identification accuracy by learning from the estimation result image data 21, the identification results, the learning data 15, and the like.

The learning data 15 may be data generated by the learning data generation device 10, data generated by manipulating the first input image data 11D1, or a combination of these.

Furthermore, the formats of the estimation result image data 21 and the learning data 15 are not limited to those illustrated. Any format suffices as long as the content of the fruit thinning work can be identified. For example, the estimation result image data 21 and the learning data 15 may express the positions, number, and so on of the fruit thinning targets as numerical values (indicating coordinates within the image, quantities, and the like).

The identification criteria are not limited to the positions and number of the fruit thinning targets and may be other criteria. What to use as the criterion may itself be subject to learning, or may be configurable by a person.

[Functional configuration example]
FIG. 15 shows a functional configuration example of the learning device. For example, the learning device 301 has a functional configuration including an image data input unit 10F1, a learning data input unit 301F1, a generation unit 10F3, an identification unit 10F4, and the like. It is desirable that the learning device 301 further include an extraction unit 10F2, a mask image data generation unit 10F5, and an illustration processing unit 10F6. The illustrated functional configuration is used as an example below.

The image data input unit 10F1 performs an image data input procedure for inputting the first input image data 11D1. For example, the image data input unit 10F1 is realized by the camera 11, the interface 10H3, and the like.

The generation unit 10F3 performs a generation procedure for generating the estimation result image data 21. For example, the generation unit 10F3 is realized by the CPU 10H1 or the like.

The identification unit 10F4 performs an identification procedure in which it identifies the estimation result image data 21 by comparison with the learning data 15 and feeds the identification result back to the generation unit 10F3 to train the learning model 302. For example, the identification unit 10F4 is realized by the CPU 10H1 or the like.

It is desirable that one or both of the estimation result image data 21 and the learning data 15 undergo extraction processing, in which the extraction unit 10F2, the mask image data generation unit 10F5, and the illustration processing unit 10F6 generate mask image data, produce an illustration, or do both.

When the fruit thinning targets are extracted in this way, the learning model 302 can learn important features such as the fruit thinning targets more accurately than when image data of the photographed crop is used as is. That is, the learning device 301 can train the learning model 302 to generate a trained model 303 that estimates the fruit thinning work accurately.

[Fourth Embodiment]
FIG. 16 shows a configuration example of the fruit thinning object estimation device. Hereinafter, image data showing an unknown crop before fruit thinning is called "unknown image data 401".

The fourth embodiment executes the trained model 303 generated by the learning of the third embodiment. Hereinafter, the device that uses the trained model 303 is called the "fruit thinning object estimation device 402".

The fruit thinning object estimation device 402 is, for example, an information processing device such as a smartphone. Alternatively, the trained model 303 may be used by another device such as a server, with the fruit thinning object estimation device 402 communicating with the server to obtain and output the estimation result of the trained model 303.

Specifically, the trained model 303 is distributed via a network or the like. The trained model 303 may also be embedded in application software or the like. When the trained model 303 distributed in this way is installed on the fruit thinning object estimation device 402, the device becomes able to perform estimation and output the estimation result as illustrated.

The unknown image data 401 is image data captured by the fruit thinning object estimation device 402. The crop shown by the unknown image data 401 is in the state before the fruit thinning work is performed. The crop shown by the unknown image data 401 is thus an "unknown" crop, different from the crops used for learning in the first or second embodiment.

It is desirable that the fruit thinning object estimation device 402 extract the fruit thinning targets during estimation. Specifically, it is desirable that the device perform processing such as generating mask image data and illustration during estimation. With such extraction, the fruit thinning object estimation device 402 can estimate accurately.

The fruit thinning object estimation device 402 identifies the target objects based on the unknown image data 401 and then estimates the fruit thinning targets with the trained model 303. The estimation result is output, for example, in augmented reality (AR) form. Specifically, the fruit thinning object estimation device 402 displays the output screen 403 to the user 404.

The output screen 403 overlays "×" marks on the unknown image data 401 to show the user 404 which objects are fruit thinning targets. The output may instead use another display format, audio, or the like.
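One way to prepare the "×" overlay is to compute a marker position for each estimated fruit thinning target and hand the list to whatever drawing layer the device uses. The sketch below covers only that computation; the function name `build_overlay`, the marker dictionary layout, and the bounding-box input are assumptions for illustration, and no actual AR or drawing API is shown.

```python
from typing import Dict, List, Tuple

BBox = Tuple[int, int, int, int]   # (x_min, y_min, x_max, y_max) in pixels


def build_overlay(thinning_targets: Dict[int, BBox], symbol: str = "×") -> List[dict]:
    """Return one marker per thinning target, centered on its bounding box."""
    markers = []
    for object_id, (x0, y0, x1, y1) in sorted(thinning_targets.items()):
        markers.append(
            {
                "object_id": object_id,
                "symbol": symbol,
                "x": (x0 + x1) // 2,   # horizontal center of the box
                "y": (y0 + y1) // 2,   # vertical center of the box
            }
        )
    return markers


# Illustrative boxes for two estimated thinning targets.
markers = build_overlay({104: (20, 20, 40, 40), 102: (0, 0, 10, 20)})
```

A different `symbol`, or a non-visual channel such as audio, could be substituted without changing how the targets themselves are estimated.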

It is desirable that the fruit thinning object estimation device 402 be configured to accept items through, for example, an "optimization item settings" operation screen (hereinafter simply called the "setting screen 405").

Fruit thinning work is sometimes performed according to preference. The setting screen 405 is therefore an interface for setting such preferences.

In this example, the setting screen 405 sets items such as "sweetness", "sourness", "size (overall)", "size (per fruit)", "color", "uniformity", and "shape to fit in the case". The items and the setting format are determined in advance.

"Sweetness" and "sourness" are items for adjusting the taste of the crop at harvest.

"Size (overall)" is an item for adjusting the overall size of the crop at harvest. For example, for a crop bearing multiple fruits, "size (overall)" is used to adjust the overall balance across the fruits.

"Size (per fruit)" is an item for adjusting the size of each individual fruit at harvest. For example, for a crop bearing multiple fruits, "size (per fruit)" is used to adjust how large each fruit becomes.

"Color" is an item for adjusting the color of the crop at harvest.

"Uniformity" is an item for adjusting whether the fruits of the crop should be uniform in size at harvest.

"Shape to fit in the case" is an item for setting whether the crop should be sized to fit within a predetermined shape used for shipping. Items may thus be entered in checkbox form.

Alternatively, "shape to fit in the case" may take a form in which the case size is specified numerically, for example "to fit in a case of length (mm) × width (mm) × height (mm)".

These items can be adjusted through fruit thinning work. Which kind of fruit thinning work affects which item is entered into the learning data during learning (that is, in the third embodiment). For example, the learning model learns per purpose of the thinning work, such as thinning that makes the crop sweeter or thinning that makes the crop larger. The trained model can therefore identify the fruit thinning work that optimizes a given item. The degree (for example, of sweetness or size) is entered, for example, as a numerical value.

The reception unit that accepts the items is not limited to the setting screen 405. Items other than those illustrated may be configurable. The reception unit may be an interface other than a taskbar or checkboxes, for example an interface with text-box input. The items to be optimized may also be fixed.
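The items accepted through the setting screen 405 could be held in a small settings object such as the one sketched below. The class name `OptimizationSettings`, the 0 to 100 scale, the default values, and the validation rules are all assumptions for illustration; the specification only says that the items and the setting format are determined in advance.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class OptimizationSettings:
    """Hypothetical container for the optimization items of setting screen 405."""
    sweetness: int = 50        # assumed scale: 0 (low) to 100 (high)
    sourness: int = 50
    size_overall: int = 50
    size_per_fruit: int = 50
    color: int = 50
    uniformity: bool = False   # checkbox-style item
    fit_in_case: bool = False  # checkbox-style item
    case_mm: Optional[Tuple[int, int, int]] = None  # (length, width, height)

    def __post_init__(self) -> None:
        for name in ("sweetness", "sourness", "size_overall",
                     "size_per_fruit", "color"):
            value = getattr(self, name)
            if not 0 <= value <= 100:
                raise ValueError(f"{name} must be between 0 and 100, got {value}")
        if self.case_mm is not None:
            if not self.fit_in_case:
                raise ValueError("case_mm requires fit_in_case=True")
            if any(side <= 0 for side in self.case_mm):
                raise ValueError("case dimensions must be positive")


settings = OptimizationSettings(
    sweetness=80, fit_in_case=True, case_mm=(300, 200, 100)
)
```

Validating at construction time keeps out-of-range values from reaching the trained model, whichever input widget supplied them.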

FIG. 17 shows an example of a configuration in which estimation is performed by the fruit thinning object estimation device. For example, the trained model 303 is the configuration obtained, after the learning of the third embodiment, by removing the identification unit 10F4 from the generation unit 10F3 and the identification unit 10F4 that make up the generative adversarial network used in the third embodiment.

That is, when the unknown image data 401 is input, the fruit thinning object estimation device 402 estimates the fruit thinning work suited to the target objects shown in the unknown image data 401, and outputs the estimation result image data 21 indicating the estimation result.

It is sufficient for the identification unit 10F4 to be disabled. That is, the trained model 303 may retain the identification unit 10F4 as in the learning model 302, as long as the identification unit 10F4 is stopped. Alternatively, the identification unit 10F4 may be removed or absent, so that the trained model 303 contains no identification unit at all.
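The relationship between the learning model and the trained model can be sketched with a small wrapper that holds a generator and an optional discriminator. Everything here is a stand-in: the class `ThinningGan`, its method names, and the stub generator and discriminator are hypothetical and contain no actual learning; the sketch only shows that estimation needs the generator alone.

```python
from typing import Callable, Iterable, Optional, Set, Tuple

Generator = Callable[[Iterable[int]], Set[int]]
Discriminator = Callable[[Set[int], Set[int]], str]


class ThinningGan:
    """Generator plus optional discriminator (illustrative wrapper only)."""

    def __init__(self, generator: Generator,
                 discriminator: Optional[Discriminator] = None) -> None:
        self.generator = generator
        self.discriminator = discriminator

    def train_step(self, object_ids: Iterable[int],
                   reference: Set[int]) -> Tuple[Set[int], str]:
        """Generate an estimate and have the discriminator judge it."""
        if self.discriminator is None:
            raise RuntimeError("discriminator is not available")
        estimate = self.generator(object_ids)
        return estimate, self.discriminator(estimate, reference)

    def as_trained_model(self) -> "ThinningGan":
        """Return a copy that keeps the generator and drops the discriminator."""
        return ThinningGan(self.generator, None)

    def estimate(self, object_ids: Iterable[int]) -> Set[int]:
        """Estimation uses the generator only."""
        return self.generator(object_ids)


def stub_generator(object_ids: Iterable[int]) -> Set[int]:
    # Placeholder rule: mark every second object as a thinning target.
    return set(sorted(object_ids)[1::2])


def stub_discriminator(estimate: Set[int], reference: Set[int]) -> str:
    return "correct" if estimate == reference else "wrong"


learning_model = ThinningGan(stub_generator, stub_discriminator)
trained_model = learning_model.as_trained_model()
result = trained_model.estimate(range(101, 108))
```

Disabling rather than removing the discriminator would correspond to keeping the attribute but never calling it; the estimation path is the same in both cases.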

[Functional configuration example]
FIG. 18 shows a functional configuration example of the fruit thinning object estimation device. For example, the fruit thinning object estimation device 402 has a functional configuration including an image data input unit 10F1, an estimation unit 402F1, an output unit 402F2, and the like. It is desirable that the fruit thinning object estimation device 402 further include an extraction unit 10F2, a mask image data generation unit 10F5, and an illustration processing unit 10F6. The illustrated functional configuration is used as an example below.

The estimation unit 402F1 performs an estimation procedure for estimating the fruit thinning targets with the trained model 303. For example, the estimation unit 402F1 is realized by the CPU 10H1 or the like.

For example, the estimation unit 402F1 is composed of the generation unit 10F3 and the like.

The output unit 402F2 performs an output procedure for outputting the estimation result. For example, the output unit 402F2 is realized by the output device 10H5 or the like.

It is desirable that the unknown image data 401 undergo extraction processing, in which the extraction unit 10F2, the mask image data generation unit 10F5, and the illustration processing unit 10F6 generate mask image data, produce an illustration, or do both.

In estimation as well, the fruit thinning object estimation device 402 can estimate the fruit thinning targets and the like more accurately when it focuses as much as possible on the elements it has learned.

When the fruit thinning targets are extracted from the unknown image data 401 in this way, the fruit thinning object estimation device 402 can estimate the fruit thinning targets and the like more accurately than when image data of the photographed crop is used as is.

[Functional configuration example of the learning system]
FIG. 19 shows a functional configuration example. For example, the learning data generation device 10 has a functional configuration including an image data input unit 10F1, an extraction unit 10F2, a generation unit 10F3, an identification unit 10F4, and the like. As illustrated, it is desirable that the learning data generation device 10 further include a mask image data generation unit 10F5, an illustration processing unit 10F6, and the like.

画像データ入力部１０Ｆ１は、第１入力画像データ１１Ｄ１、及び、第２入力画像データ１１Ｄ２を入力する画像データ入力手順を行う。例えば、画像データ入力部１０Ｆ１は、カメラ１１、及び、インタフェース１０Ｈ３等で実現する。 The image data input unit 10F1 performs an image data input procedure for inputting the first input image data 11D1 and the second input image data 11D2. For example, the image data input unit 10F1 is realized by the camera 11, the interface 10H3, and the like.

抽出部１０Ｆ２は、対象物体のうち、第１入力画像データ１１Ｄ１、及び、第２入力画像データ１１Ｄ２の差異となる対象物体を摘果対象物として抽出する抽出手順を行う。例えば、抽出部１０Ｆ２は、ＣＰＵ１０Ｈ１等で実現する。 The extraction unit 10F2 performs an extraction procedure for extracting the target object, which is the difference between the first input image data 11D1 and the second input image data 11D2, as the fruit-picking target object among the target objects. For example, the extraction unit 10F2 is realized by the CPU 10H1 or the like.
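
The extraction procedure can be illustrated with a minimal sketch. The specification does not state how the difference between the two images is computed here, so the per-pixel absolute difference, the `threshold` value, the function name `extract_thinning_mask`, and the tiny grayscale images below are all assumptions made for illustration only.

```python
def extract_thinning_mask(before, after, threshold=30):
    """Return a binary mask that is 1 where the two images differ strongly.

    `before` and `after` are 2D lists of grayscale values (0-255) with the
    same shape. The threshold is an assumed value, not taken from the patent.
    """
    if len(before) != len(after) or any(
        len(row_b) != len(row_a) for row_b, row_a in zip(before, after)
    ):
        raise ValueError("images must have the same shape")
    return [
        [1 if abs(b - a) > threshold else 0 for b, a in zip(row_b, row_a)]
        for row_b, row_a in zip(before, after)
    ]


# Hypothetical images: a bright fruit at row 1, columns 1-2 is present
# before thinning and absent after thinning.
before = [
    [10, 10, 10, 10],
    [10, 200, 200, 10],
    [10, 10, 10, 10],
]
after = [
    [10, 10, 10, 10],
    [10, 12, 11, 10],
    [10, 10, 10, 10],
]

mask = extract_thinning_mask(before, after)
for row in mask:
    print(row)
```

In practice the two photographs would need to be aligned before such a comparison, and lighting changes between shots would have to be tolerated; the sketch ignores both.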

The generation unit 10F3 performs a generation procedure in which it learns from image data indicating the extraction result as the first learning data and generates estimation result image data. For example, the generation unit 10F3 is implemented by the CPU 10H1 or the like.

The identification unit 10F4 performs an identification procedure in which it identifies the estimation result image data and generates the second learning data based on the identification result. For example, the identification unit 10F4 is implemented by the CPU 10H1 or the like.

The mask image data generation unit 10F5 performs a mask image data generation procedure that generates mask image data distinguishing the target objects from everything else. For example, the mask image data generation unit 10F5 is implemented by the CPU 10H1 or the like.

The illustration processing unit 10F6 performs an illustration procedure that converts the target objects and everything else into an illustration. For example, the illustration processing unit 10F6 is implemented by the CPU 10H1 or the like.

As described above, the learning data generation device 10 generates second learning data such as the learning data 15. Being able to generate the second learning data in this way reduces the workload of preparing learning data for an AI that estimates where crops should be thinned, compared with, for example, generating the learning data manually. For example, learning data for such an AI requires at least several thousand images. Preparing them often takes at least one year to several years.

In particular, crops are often photographed under so-called natural light, for example outdoors. Such a lighting environment is often less stable than that of a factory or the like. Specifically, sunlight and the like are difficult to adjust artificially. Compared with factory lighting, natural light therefore varies in many conditions, such as intensity, direction, and the presence or absence of shadows. As a result, the lighting conditions for photographing crops are often harsher than those indoors, such as inside a factory. When AI is used under conditions with this much disturbance, a large amount of learning data is particularly desirable.

The preparation period differs depending on the growing cycle of the target crop.

Furthermore, if the estimation accuracy of the AI is to be made sufficiently high, it is desirable to prepare even more learning data. For example, according to Uncle Bernie's rule and the like, it is desirable to prepare learning data amounting to at least 10 times the number of parameters in the neural network for AI training. Therefore, it may be desirable to prepare tens of thousands to hundreds of thousands of images or more as learning data for an AI that estimates where crops should be thinned.
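
The arithmetic behind this rule of thumb can be written out as a short sketch. The function name `min_training_samples` and the parameter count of 20,000 are hypothetical values chosen only to show the scale involved; the factor of 10 is the one stated above.

```python
def min_training_samples(num_parameters, factor=10):
    """Rule-of-thumb lower bound on training samples (Uncle Bernie's rule).

    `factor` defaults to 10, the multiple of the parameter count cited above.
    """
    return factor * num_parameters


# Hypothetical small network with 20,000 trainable parameters.
hypothetical_parameters = 20_000
print(min_training_samples(hypothetical_parameters))
```

Even a network of this modest, assumed size lands in the range of hundreds of thousands of images, which is consistent with the scale described above.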

As the amount of learning data to be prepared grows, generating it by photographing real crops tends to lengthen the preparation period and increase the workload. A larger workload in turn increases development cost and prolongs development.

In contrast, when learning data can be generated as in the present embodiment, a large amount of learning data can be prepared with a small workload. The workload of preparing learning data can therefore be reduced.

The learning device 301 has, for example, a functional configuration including a learning data input unit 301F1, a learning unit 301F2, and the like.

The learning data input unit 301F1 performs a learning data input procedure for inputting the second learning data. For example, the learning data input unit 301F1 is implemented by the interface 10H3 or the like.

The learning unit 301F2 performs a learning procedure that trains the learning model 302 with the second learning data. For example, the learning unit 301F2 is implemented by the CPU 10H1 or the like.

As described above, the learning device 301 trains the learning model 302 using the second learning data and the like generated by the learning data generation device 10. Through such training, the learning device 301 can generate a trained model 303 that estimates the fruit thinning target. For example, the trained model 303 is used by the fruit thinning object estimation device 402 as follows.

The fruit thinning object estimation device 402 has a functional configuration including the image data input unit 10F1, an estimation unit 402F1, an output unit 402F2, and the like.

The image data input unit 10F1 performs an image data input procedure for inputting the unknown image data 401. For example, the image data input unit 10F1 is implemented by the camera 11, the interface 10H3, and the like.

As described above, once the trained model 303 is installed, the fruit thinning object estimation device 402 can use the trained model 303 to estimate the content of the thinning work and the fruit thinning target (including information such as position, number, or candidates). When such an estimation result is output, the user 404, even a beginner, can refer to it and carry out appropriate thinning work. In other words, even a beginner can tell from the estimation result which fruits to leave and which fruits to thin.

The learning system 500 has, for example, any of the functional configurations of the learning data generation device 10, the learning device 301, and the fruit thinning object estimation device 402.

Specifically, the learning system 500 is composed of a plurality of information processing devices such as the learning data generation device 10 and the learning device 301. Such a learning system 500 can generate learning data and train the learning model 302 to generate the trained model 303.

The learning system 500 is not limited to a plurality of information processing devices and may be a single information processing device.

The learning system 500 may also be a combination of the learning device 301 and the fruit thinning object estimation device 402.

[Example of functional configuration of estimation system]
FIG. 20 is a diagram showing a functional configuration example of the estimation system. For example, the estimation system 501 is composed of the learning data generation device 10, the learning device 301, the fruit thinning object estimation device 402, and the like. However, the estimation system 501 need not include the learning data generation device 10. That is, for the learning data 15, the estimation system 501 may use photographed image data, image data generated by the learning data generation device 10, or both.

The learning model 302 and the trained model 303 (including a program that uses the trained model 303) may be duplicated, so that there may be a plurality of learning devices 301, fruit thinning object estimation devices 402, and the like.

The learning device 301 has, for example, a functional configuration including the image data input unit 10F1, the learning data input unit 301F1, the learning unit 301F2, the extraction unit 10F2, the mask image data generation unit 10F5, the illustration processing unit 10F6, and the like.

The learning data input unit 301F1 performs a learning data input procedure for inputting the learning data 15. For example, the learning data input unit 301F1 is implemented by the interface 10H3 or the like.

The learning unit 301F2 performs a learning procedure that trains the learning model 302 based on the learning data 15. For example, the learning unit 301F2 is implemented by the CPU 10H1 or the like.

The extraction unit 10F2 performs an extraction procedure that extracts the target object or the fruit thinning target from the first input image data 11D1 and the learning data 15. For example, the extraction unit 10F2 is implemented by the CPU 10H1 or the like.

It is desirable that either or both of the first input image data 11D1 and the learning data 15 undergo extraction processing by the extraction unit 10F2, the mask image data generation unit 10F5, and the illustration processing unit 10F6, in which mask image data is generated, the image is converted into an illustration, or both.

When the target object, the fruit thinning target, or the like is extracted in this way, the learning model 302 can learn important features of the fruit thinning target and the like more accurately than when, for example, image data obtained by simply photographing the crop is used as is. That is, the learning device 301 can train the learning model 302 to generate a trained model 303 that can accurately estimate the thinning work.

As described above, the estimation system 501 trains the learning model 302 with the learning unit 301F2 to generate the trained model 303. The trained model 303 generated in this way is sent to the fruit thinning object estimation device 402 via a network or the like.

The fruit thinning object estimation device 402 has a functional configuration including the image data input unit 10F1, the extraction unit 10F2, the mask image data generation unit 10F5, the illustration processing unit 10F6, the estimation unit 402F1, the output unit 402F2, and the like.

The extraction unit 10F2 performs an extraction procedure that extracts the target object or the fruit thinning target from the unknown image data 401. For example, the extraction unit 10F2 is implemented by the CPU 10H1 or the like.

As described above, in the estimation system 501, the learning device 301 first trains the learning model 302 to generate the trained model 303. The trained model 303 generated in this way is then distributed to the fruit thinning object estimation device 402.

Once the trained model 303 is installed, the fruit thinning object estimation device 402 can use the trained model 303 to estimate the content of the thinning work and the fruit thinning target (including information such as position, number, or candidates). When such an estimation result is output, the user 404, even a beginner, can refer to it and carry out appropriate thinning work.

In other words, even a beginner can tell from the estimation result which fruits to leave and which fruits to thin. In addition, when the learning device 301 uses a cloud environment or the like, for example, data collection and distribution of the trained model 303 can be carried out quickly.

[About the format of learning data]
For learning data such as the first learning data and the second learning data, it is desirable to use image data in a format in which the crop has been extracted. However, the extraction may be performed in a plurality of stages. In that case, the learning device 301 may include in the learning data image data in a format corresponding to an intermediate stage of the extraction.

For example, suppose the extraction processing is divided into three stages, a first stage to a third stage.

The first stage is image data in the state in which it was input, that is, in photographic form (although white balance and the like may have been adjusted).

The second stage is image data in a format in which everything other than the crop is treated as background and the background is masked. For example, the background is masked in white (the color to use is set in the mask).
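
The second stage can be sketched as follows, assuming that a binary crop mask is already available (how that mask is obtained is outside this sketch). The names `mask_background`, `crop_mask`, and `WHITE`, as well as the RGB tuple representation, are illustrative assumptions rather than details from the specification.

```python
WHITE = (255, 255, 255)  # assumed default fill colour; the text says it is configurable


def mask_background(image, crop_mask, fill=WHITE):
    """Replace every pixel outside the crop mask with the fill colour.

    `image` is a 2D list of (R, G, B) tuples and `crop_mask` is a 2D list of
    0/1 flags with the same shape, where 1 marks a crop pixel to keep.
    """
    return [
        [pixel if keep else fill for pixel, keep in zip(img_row, mask_row)]
        for img_row, mask_row in zip(image, crop_mask)
    ]


# Hypothetical 2x2 image: left column is foliage/background, right column is fruit.
image = [
    [(30, 90, 20), (200, 40, 40)],
    [(25, 80, 30), (210, 50, 45)],
]
crop_mask = [
    [0, 1],
    [0, 1],
]

masked = mask_background(image, crop_mask)
for row in masked:
    print(row)
```

Passing a different `fill` value corresponds to choosing another mask colour.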

The third stage is image data in a format in which the crop and the like have been converted into an illustration.

The learning data may be image data of any of the first to third stages above. The learning data is also not limited to image data of a single stage and may include image data of a plurality of stages, that is, image data from both before and after the extraction processing.

When the image data is in a format in which the crop has been extracted by masking or the like, the learning device 301 can have the learning model learn the fruit thinning target accurately.

On the other hand, it is sometimes desirable for the learning data to include image data in a format such as a photograph. For example, conversion into an illustration may omit damage and the like on the target object (caused by, for example, poor sunlight, salt damage, rot, disease, injury, or insect damage; discoloration and the like are also included). In thinning work, however, target objects with damage and the like are sometimes thinned preferentially. An AI for such thinning work is preferably trained with image data in the first-stage or second-stage format, that is, a format in which damage and the like remain visible. The format of the learning data may therefore be selected according to preferences regarding the thinning work and the like.

In this way, when the learning data consists of image data of a plurality of stages, the learning device 301 can have the learning model learn thinning work that better matches those preferences.

[About AI]
The AI processes image data and the like with, for example, the following network structure.

FIG. 21 is a diagram showing an example of a network structure. For example, the AI may have a network structure including an input layer L1, a hidden layer L2, and an output layer L3.

Specifically, the AI has a network structure including a Convolutional Neural Network (CNN) or the like as shown in the figure.

The input layer L1 is a layer that receives input data DIN.

The hidden layer L2 is a layer that applies processing such as convolution, pooling, normalization, or a combination of these to the input data DIN received from the input layer L1.

The output layer L3 is a layer that outputs the result of the processing in the hidden layer L2 as output data DOUT. For example, the output layer L3 is composed of a fully connected layer or the like.

Convolution is, for example, a process that applies filtering based on a filter, a mask, a kernel, or the like (hereinafter simply referred to as a "filter") to an image, or to a feature map or the like generated by applying predetermined processing to an image, and thereby generates a feature map.

Specifically, a filter is data used in a calculation that multiplies the pixel values of an image or feature map by filter coefficients (sometimes called "weights" or "parameters"). The filter coefficients are values determined by learning, by setting, or the like.

The convolution process then multiplies the pixel value of each pixel constituting the image or feature map by the filter coefficients and generates a feature map whose elements are the results of that calculation.
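
The following is a minimal sketch of this filtering in plain Python. It uses the form common in CNNs (no kernel flip, no padding, stride 1), in which the products of pixel values and filter coefficients within each window are summed to give one element of the feature map. The function name `conv2d`, the example image, and the hand-set `edge_filter` are illustrative assumptions; in training, the coefficients would be learned.

```python
def conv2d(image, kernel):
    """Slide `kernel` over `image` and sum the element-wise products.

    Both arguments are 2D lists of numbers. No padding and stride 1 are
    assumed, so the output is smaller than the input.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(
                image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh)
                for dj in range(kw)
            )
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


# Hypothetical image with a vertical edge between columns 1 and 2.
image = [
    [0, 0, 9, 9],
    [0, 0, 9, 9],
    [0, 0, 9, 9],
]
# Hand-set filter that responds to dark-to-bright transitions from left to right.
edge_filter = [
    [-1, 1],
    [-1, 1],
]

feature_map = conv2d(image, edge_filter)
for row in feature_map:
    print(row)
```

The feature map is non-zero only where the window straddles the edge, which is the kind of edge component mentioned as an example of an extracted feature.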

When convolution is performed in this way, features of the image or feature map can be extracted. A feature is, for example, an edge component or the result of statistical processing of the neighborhood of the pixel of interest.

In addition, when convolution is performed, similar features can be extracted even from an image or feature map in which the subject or the like is shifted vertically, shifted horizontally, shifted diagonally, rotated, or altered by a combination of these.

Pooling is a process that applies an operation such as computing the average, extracting the minimum value, or extracting the maximum value to a target range, thereby extracting features and generating a feature map. That is, pooling is max pooling, avg pooling, or the like.
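
A minimal sketch of pooling over non-overlapping windows is shown below. The window size of 2, the function name `pool2d`, and the sample feature map are assumptions for illustration; the three modes correspond to the maximum, minimum, and average operations named above.

```python
def pool2d(feature_map, size=2, mode="max"):
    """Reduce each non-overlapping size x size window to a single value.

    `mode` selects the reduction: "max", "min", or "avg".
    """
    reducers = {
        "max": max,
        "min": min,
        "avg": lambda window: sum(window) / len(window),
    }
    reducer = reducers[mode]
    pooled = []
    for i in range(0, len(feature_map) - size + 1, size):
        row = []
        for j in range(0, len(feature_map[0]) - size + 1, size):
            window = [
                feature_map[i + di][j + dj]
                for di in range(size)
                for dj in range(size)
            ]
            row.append(reducer(window))
        pooled.append(row)
    return pooled


# Hypothetical 4x4 feature map.
fmap = [
    [1, 3, 2, 0],
    [5, 7, 4, 6],
    [8, 2, 1, 1],
    [0, 2, 3, 3],
]

print(pool2d(fmap, mode="max"))
print(pool2d(fmap, mode="avg"))
```

Each 2x2 window collapses to one value, so the 4x4 map becomes 2x2, which illustrates the reduction in data volume that pooling provides.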

Convolution and pooling may be preceded by preprocessing such as zero padding.

Through convolution, pooling, or a combination of these as described above, a so-called data volume reduction effect, compositionality, translation invariance, and the like can be obtained.

Normalization is, for example, a process that equalizes the variance and the mean. Normalization includes the case where it is performed locally. When normalization is performed, the data takes values within a predetermined range or the like. The data is therefore easier to handle in subsequent processing.
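
One common way to equalize the mean and variance is to shift the data to zero mean and scale it to unit variance. The sketch below assumes that form; the function name `standardize` and the small constant `eps`, which guards against division by zero, are not taken from the specification.

```python
import math


def standardize(values, eps=1e-8):
    """Shift `values` to zero mean and scale them to (approximately) unit variance."""
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    std = math.sqrt(variance + eps)
    return [(v - mean) / std for v in values]


# Hypothetical activations from one feature map, flattened into a list.
activations = [2.0, 4.0, 6.0, 8.0]
normalized = standardize(activations)
print(normalized)
```

Applying the same function to a small neighborhood of values rather than to the whole list would correspond to the local normalization mentioned above.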

Full connection (fully connected) is a process that reduces data such as feature maps to an output.

For example, suppose the output is binary, such as "YES" or "NO". With such an output format, full connection is a process that connects nodes, based on the features extracted in the hidden layer L2, so as to reach one of the two conclusions.

On the other hand, when there are three or more types of output, full connection is a process that applies a so-called softmax function or the like. In this way, full connection enables classification (including the case of outputting probabilities) by maximum likelihood estimation or the like.
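
The softmax function mentioned here converts a list of raw scores into probabilities that sum to 1. The sketch below subtracts the maximum score before exponentiating, a standard step for numerical stability; the three class labels and the score values are hypothetical.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtracting the maximum avoids overflow in exp()
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical class labels and output-layer scores for one detected fruit.
labels = ["thin", "keep", "background"]
scores = [2.0, 1.0, 0.1]

probs = softmax(scores)
predicted = max(range(len(probs)), key=probs.__getitem__)
print(labels[predicted], probs)
```

Taking the class with the highest probability gives the classification result, while the probabilities themselves can be output directly when a confidence value is wanted.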

[Other Embodiments]
The learning data generation device 10, the learning device 301, and the fruit thinning object estimation device 402 may be different types of information processing devices. That is, the learning data generation device 10, the learning device 301, and the fruit thinning object estimation device 402 may have different hardware configurations.

The learning data may also be called teacher data, training data, or the like.

An embodiment may be a combination of the above embodiments. That is, the device that generates the learning data, the device that performs learning processing on the learning model to generate the trained model, and the device that performs execution processing using the trained model may be the same device or different devices. The training of the learning model and the execution by the trained model thus need not be performed on the same information processing device and may be performed on different information processing devices.

When the devices are different, they send and receive data such as the learning data or the trained model to and from each other via, for example, a network.

Therefore, after being generated by training, the trained model may be distributed in the form of a program or the like via a network or the like and executed on a device different from the information processing device on which it was trained. Additional training may also be performed on a learning model that was trained and generated on another information processing device.

The learning data may be subjected to data augmentation. Specifically, when the learning data is image data, it may be augmented by, for example, cropping part of the image indicated by the image data to generate new data.

Similarly, data augmentation is, for example, a process that randomly applies to the image data operations such as rotation, translation, shearing part of the data, horizontal flipping, vertical flipping, adding distortion, correcting distortion, changing shading, color correction, reducing noise, adding noise, filtering, enlargement, reduction, edge enhancement, or a combination of these.
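
A few of these operations can be sketched in plain Python as follows. Only flipping, 90-degree rotation, and cropping are shown; the function names, the 50% probability for applying each transform, and the fixed random seed are assumptions made so that the example is small and reproducible.

```python
import random


def flip_horizontal(img):
    """Mirror the image left to right."""
    return [row[::-1] for row in img]


def flip_vertical(img):
    """Mirror the image top to bottom."""
    return [row[:] for row in img[::-1]]


def rotate_90(img):
    """Rotate the image 90 degrees clockwise."""
    return [list(col) for col in zip(*img[::-1])]


def random_crop(img, height, width, rng):
    """Cut out a height x width patch at a random position."""
    top = rng.randint(0, len(img) - height)
    left = rng.randint(0, len(img[0]) - width)
    return [row[left:left + width] for row in img[top:top + height]]


def augment(img, rng):
    """Apply each transform with probability 0.5 (an assumed setting)."""
    out = [row[:] for row in img]
    for transform in (flip_horizontal, flip_vertical, rotate_90):
        if rng.random() < 0.5:
            out = transform(out)
    return out


# Hypothetical 2x2 grayscale image; the seed makes the run repeatable.
rng = random.Random(0)
original = [[1, 2], [3, 4]]
variants = [augment(original, rng) for _ in range(4)]
for variant in variants:
    print(variant)
```

Each call yields a rearranged copy of the same pixels, so one photographed image can contribute several training examples.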

Increasing the learning data through data augmentation in this way increases the learning data available for training the learning model.

In the embodiment, processing that mitigates overfitting, such as batch normalization or dropout, may be performed. Other processing such as dimensionality reduction may also be performed.

The network structure of the learning model, the trained model, and the like is not limited to a CNN network structure. For example, the network structure may include a configuration such as an RNN (Recurrent Neural Network), an LSTM (Long Short-Term Memory), or a Transformer.

The learning model and the trained model may also be configured to have hyperparameters. That is, the learning model and the trained model may be configured so that the user makes some of the settings.

In addition, when handling graphs (data composed of vertices and edges), for example, the learning model and the trained model may have a structure such as a Graph Neural Network (GNN).

The learning model and the trained model may also use other machine learning. For example, the learning model and the trained model may perform normalization or the like as preprocessing using an unsupervised model.

The present invention may be realized by the learning data generation method, the learning method, or the estimation method exemplified above, or by a program (including firmware and anything equivalent to a program; hereinafter simply referred to as a "program") that executes processing equivalent to the processing described above.

That is, the present invention may be realized by a program or the like written in a programming language or the like so that a predetermined result is obtained by issuing instructions to a computer. The program may be configured so that part of the processing is executed by hardware such as an Integrated Circuit (IC) or by an arithmetic unit such as a Graphics Processing Unit (GPU).

The program causes the computer to execute the processing described above by making the arithmetic unit, control unit, storage device, and the like of the computer work together. That is, the program is loaded into the main storage device or the like and issues instructions to the arithmetic unit to perform operations, thereby operating the computer.

The program may also be provided on a computer-readable recording medium or via a telecommunication line such as a network.

The present invention may be realized by a system composed of a plurality of devices. That is, a system of a plurality of computers may execute the processing described above redundantly, in parallel, in a distributed manner, or in a combination of these. The present invention may therefore be realized by devices with hardware configurations other than those described above and by systems other than those described above.

The present invention is not limited to the embodiments exemplified above. Components may therefore be added or modified without departing from the technical gist of the invention. Accordingly, all technical matters included in the technical idea described in the claims are covered by the present invention. The embodiments exemplified above are specific examples suitable for implementation. A person skilled in the art can realize various modifications from the disclosed content, and such modifications are included in the technical scope described in the claims.

10: Learning data generation device
10F1: Image data input unit
10F2: Extraction unit
10F3: Generation unit
10F4: Identification unit
10F5: Mask image data generation unit
10F6: Illustration processing unit
11: Camera
11D1: First input image data
11D2: Second input image data
12: First crop
13: Second crop
14: Worker
15: Learning data
20: Extraction result
21: Estimation result image data
22: Correct answer data
31: First object
32: Second object
33: Third object
34: Fourth object
40: Mask image data
41: First target object
42: Second target object
43: Third target object
44: Fourth target object
50: Illustrated image data
51: Target object region
52: Filled region
101: First target object
102: Second target object
103: Third target object
104: Fourth target object
105: Fifth target object
106: Sixth target object
107: Seventh target object
301: Learning device
301F1: Learning data input unit
301F2: Learning unit
302: Learning model
303: Trained model
401: Unknown image data
402: Fruit thinning object estimation device
402F1: Estimation unit
402F2: Output unit
403: Output screen
404: User
405: Setting screen
500: Learning system

Claims

An estimation system comprising: a learning data generation device that generates learning data; a learning device that has a generation unit and an identification unit and trains a learning model using the learning data; and a fruit thinning object estimation device that uses a trained model trained by the learning device, wherein
the learning data generation device comprises:
an image data input unit that inputs first input image data, which is image data showing a crop before fruit thinning, and second input image data, which is image data showing the crop after fruit thinning;
an extraction unit that extracts, as a fruit thinning object, a target object that constitutes the difference between the first input image data and the second input image data, from among target objects that are fruits, flowers, leaves, or a combination thereof of the crop;
a generation unit that learns using first learning data including the extraction result of the extraction unit and generates estimation result image data showing the result of estimating the fruit thinning object; and
an identification unit that identifies the estimation result image data and generates second learning data based on the identification result,
the learning device comprises:
a learning data input unit that inputs the second learning data;
the generation unit, which generates, from the second learning data, estimation result image data showing the result of estimating the fruit thinning object; and
the identification unit, which identifies the estimation result image data by comparison with the learning data and feeds the identification result back to the generation unit to train the learning model, and
the fruit thinning object estimation device comprises:
an image data input unit that inputs unknown image data showing an unknown crop before fruit thinning;
an estimation unit that estimates the fruit thinning object using the trained model; and
an output unit that outputs the estimation result of the estimation unit.

The estimation system according to claim 1, wherein the learning device further comprises a mask image data generation unit that generates mask image data distinguishing the target object from everything other than the target object.

The estimation system according to claim 2, wherein the mask image data generation unit generates the mask image data for performing instance segmentation that identifies the target object.
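
For illustration only, the idea of extracting the difference between the before and after images and turning it into an instance-level mask can be sketched as follows. This is a minimal sketch under stated assumptions, not the method disclosed in the specification: it assumes the two images are already aligned, grayscale, and given as nested lists, and the names `difference_mask`, `label_instances`, and the `threshold` value are hypothetical.

```python
from collections import deque


def difference_mask(before, after, threshold=30):
    """Binary mask: 1 where aligned grayscale pixels differ by more than threshold.

    Assumption: pixel-wise differencing stands in for the extraction step;
    real images would first need registration and noise handling.
    """
    return [
        [1 if abs(b - a) > threshold else 0 for b, a in zip(row_b, row_a)]
        for row_b, row_a in zip(before, after)
    ]


def label_instances(mask):
    """Give each 4-connected region of 1s its own positive label (0 = background)."""
    height = len(mask)
    width = len(mask[0]) if height else 0
    labels = [[0] * width for _ in range(height)]
    count = 0
    for y in range(height):
        for x in range(width):
            if mask[y][x] == 0 or labels[y][x] != 0:
                continue
            # Start a new instance and flood-fill it breadth-first.
            count += 1
            labels[y][x] = count
            queue = deque([(y, x)])
            while queue:
                cy, cx = queue.popleft()
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    inside = 0 <= ny < height and 0 <= nx < width
                    if inside and mask[ny][nx] == 1 and labels[ny][nx] == 0:
                        labels[ny][nx] = count
                        queue.append((ny, nx))
    return labels, count


# Toy data: two bright objects are removed by thinning, one (value 180) remains.
before = [
    [200, 200, 10, 10, 10, 10],
    [200, 200, 10, 10, 190, 190],
    [10, 10, 10, 10, 190, 190],
    [10, 180, 180, 10, 10, 10],
]
after = [
    [10, 10, 10, 10, 10, 10],
    [10, 10, 10, 10, 10, 10],
    [10, 10, 10, 10, 10, 10],
    [10, 180, 180, 10, 10, 10],
]
labels, count = label_instances(difference_mask(before, after))
# count is 2: only the two removed objects receive instance labels.
```

The object that is present in both images produces no difference and therefore stays in the background, which is the property the mask is meant to capture.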

The estimation system according to any one of claims 1 to 3, wherein the learning device further comprises an illustration processing unit that renders the target object as an illustration.

The estimation system according to any one of claims 1 to 4, wherein, in the learning device, the first input image data and the second input image data are image data representing a moving image or a plurality of still images captured from a plurality of viewpoints and showing the entire circumference of the crop.

The estimation system according to any one of claims 1 to 5, wherein, in the learning device,
the generation unit and the identification unit constitute a generative adversarial network,
the generation unit generates the estimation result image data with the aim of having it identified as genuine by the identification unit,
the identification unit identifies whether its input is genuine data showing the extraction result or fake data that is the estimation result image data generated by the generation unit, and
the generation unit and the identification unit generate, as the second learning data, the estimation result image data identified as genuine by the identification unit.
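
For reference, the adversarial relationship described in this claim corresponds to the standard objective of a generative adversarial network, written below. This is offered as a sketch only: the claim does not specify a loss function, and the symbols are assumptions introduced here, with G for the generation unit, D for the identification unit, x for image data before fruit thinning, y for a genuine extraction result, and τ for a hypothetical acceptance threshold.

```latex
\min_{G}\,\max_{D}\; V(D, G)
  = \mathbb{E}_{y}\bigl[\log D(y)\bigr]
  + \mathbb{E}_{x}\bigl[\log\bigl(1 - D(G(x))\bigr)\bigr]

% Assumed selection rule for the second learning data:
\mathcal{D}_{2} = \bigl\{\, G(x) \;:\; D(G(x)) \ge \tau \,\bigr\}
```

Under this reading, D is trained to score genuine extraction results high and generated images low, G is trained to raise the score of its outputs, and the generated images that D accepts as genuine are kept as the second learning data.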

A learning system comprising: a learning data generation device that generates learning data; and a learning device that trains a learning model using the learning data, wherein
the learning data generation device comprises:
an image data input unit that inputs first input image data, which is image data showing a crop before fruit thinning, and second input image data, which is image data showing the crop after fruit thinning;
an extraction unit that extracts, as a fruit thinning object, a target object that constitutes the difference between the first input image data and the second input image data, from among target objects that are fruits, flowers, leaves, or a combination thereof of the crop;
a generation unit that learns using first learning data including the extraction result of the extraction unit and generates estimation result image data showing the result of estimating the fruit thinning object; and
an identification unit that identifies the estimation result image data and generates second learning data based on the identification result, and
the learning device comprises:
a learning data input unit that inputs the second learning data;
a generation unit that generates, based on the second learning data, estimation result image data showing the result of estimating the fruit thinning object; and
an identification unit that identifies the estimation result image data by comparison with the learning data and feeds the identification result back to the generation unit of the learning device to train the learning model.

A learning data generation device comprising:
an image data input unit that inputs first input image data, which is image data showing a crop before fruit thinning, and second input image data, which is image data showing the crop after fruit thinning;
an extraction unit that extracts, as a fruit thinning object, a target object that constitutes the difference between the first input image data and the second input image data, from among target objects that are fruits, flowers, leaves, or a combination thereof of the crop;
a generation unit that learns using first learning data including the extraction result of the extraction unit and generates estimation result image data showing the result of estimating the fruit thinning object; and
an identification unit that identifies the estimation result image data and generates second learning data based on the identification result.

A program that causes a computer to execute a learning data generation method, the program causing the computer to execute:
an image data input procedure in which the computer inputs first input image data, which is image data showing a crop before fruit thinning, and second input image data, which is image data showing the crop after fruit thinning;
an extraction procedure in which the computer extracts, as a fruit thinning object, a target object that constitutes the difference between the first input image data and the second input image data, from among target objects that are fruits, flowers, leaves, or a combination thereof of the crop;
a generation procedure in which the computer learns using first learning data including the extraction result of the extraction procedure and generates estimation result image data showing the result of estimating the fruit thinning object; and
an identification procedure in which the computer identifies the estimation result image data and generates second learning data based on the identification result.