JP2022114418A

JP2022114418A - Training device of artificial intelligence (ai), picking object estimation device, estimation system, and program

Info

Publication number: JP2022114418A
Application number: JP2021112182A
Authority: JP
Inventors: 浩二佐々木; Koji Sasaki; 和治井上; Kazuharu Inoue; 葵岩渕; Aoi Iwabuchi
Original assignee: AdIn Research Inc
Current assignee: AdIn Research Inc
Priority date: 2021-01-26
Filing date: 2021-07-06
Publication date: 2022-08-05
Anticipated expiration: 2041-01-26
Also published as: JP6994212B1

Abstract

PROBLEM TO BE SOLVED: To provide a training device, a picking object estimation device, an estimation system, and a program for estimating the object to be picked in farm crop picking work with high accuracy by AI.

SOLUTION: An estimation system 501 comprises a training data generator, a training device, and a picking object estimation device. The training data generator generates training data from image data before picking and image data after picking. In the training device, a training data input unit inputs the training data; a generation unit generates estimation result image data, which is the result of estimating the object to be picked using the image data before picking and the training data; an identification unit identifies the estimation result image data using the training data; and a learning model is trained by feeding the identification result back to the generation unit. In the picking object estimation device, an image data input unit inputs unknown image data showing an unknown farm crop before picking, an estimation unit estimates the object to be picked using the trained learning model, and an output unit outputs the result of the estimation.

SELECTED DRAWING: Figure 20

Description

The present invention relates to an artificial intelligence (AI) learning device, a thinning object estimation device, an estimation system, and a program.

Techniques that use artificial intelligence (hereinafter "AI") to make estimations based on current conditions, or to recognize various objects, are known.

For example, there are robot systems used in factories and the like in which a robot hand is installed at a conveyor. Specifically, the robot system first photographs an object with a camera. The object is then recognized from the captured image. The robot system then calculates the position of the object's center of gravity from the captured image. Based on the center of gravity calculated in this way, the robot system determines, for example, the exact position at which the robot hand grips the object. Techniques for stably gripping an object with a robot hand in this manner are known (see, for example, Patent Document 1).

Object recognition by AI is also used in agriculture. Specifically, a technique is known in which AI automatically counts the berries on a bunch during grape berry thinning (see, for example, Non-Patent Document 1).

Patent Document 1: JP 2019-185204 A

Non-Patent Document 1: Kenichi Shoji, "AI that automatically determines 'How many berries are on the grape bunch?' to be put into practical use next summer: joint development by the University of Yamanashi and an agricultural production corporation", [online], August 17, 2020, DG Lab Haus, [retrieved December 2, 2020], Internet, <URL: https://media.dglab.com/2020/08/17-grape-01/>

The technology described in Patent Document 1 assumes a lighting environment such as that inside a factory. For photographing and for processing such as image recognition, such an environment often has more stable conditions, such as light, than an environment under natural light, for example outdoors. A technology that assumes factory lighting is therefore difficult to apply to the lighting environments in which agricultural crops are handled.

In addition, with a technique such as that described in Non-Patent Document 1, sufficient learning data must be secured to train the AI. In particular, a large amount of learning data is desirable for improving the AI's accuracy. With such a technique, it is therefore difficult to estimate thinning objects with high accuracy by AI.

An object of the present invention is to estimate, with high accuracy by AI, the thinning objects in the thinning work of agricultural crops.

In order to solve the above problems, in one aspect of the present invention,
a learning device for training a learning model having a generation unit and an identification unit comprises:
an image data input unit that inputs first input image data, which is image data showing a crop before thinning, and second input image data, which is image data showing the crop after thinning;
the generation unit, which generates estimation result image data showing a result of estimating a thinning object in the crop; and
the identification unit, which identifies the estimation result image data and feeds the identification result back to the generation unit to train the learning model.

According to the present invention, thinning objects in the thinning work of agricultural crops can be estimated with high accuracy by AI.

FIG. 1 is a diagram showing an example of the overall configuration of a learning data generation device for AI.
FIG. 2 is a diagram showing a hardware configuration example of an information processing device.
FIG. 3 is a diagram showing an example of the overall processing of the first embodiment.
FIG. 4 is a diagram showing a configuration example of a generative adversarial network.
FIG. 5 is a diagram showing an example of an imaging method.
FIG. 6 is a diagram showing an example of the overall processing of the second embodiment.
FIG. 7 is a diagram showing an example of extraction processing.
FIG. 8 is a diagram showing an example of instance segmentation processing and an example of mask image data.
FIG. 9 is a diagram showing an example of illustration processing.
FIG. 10 is a diagram showing a modified example of illustrated image data or mask image data.
FIG. 11 is a diagram showing an example of target object recognition.
FIG. 12 is a diagram showing an example of a processing result of the overall processing.
FIG. 13 is a diagram showing a configuration example of a learning device.
FIG. 14 is a diagram showing an example of a configuration in which learning is performed by the learning device.
FIG. 15 is a diagram showing a functional configuration example of the learning device.
FIG. 16 is a diagram showing a configuration example of a thinning object estimation device.
FIG. 17 is a diagram showing an example of a configuration in which estimation is performed by the thinning object estimation device.
FIG. 18 is a diagram showing a functional configuration example of the thinning object estimation device.
FIG. 19 is a diagram showing a functional configuration example of a learning system.
FIG. 20 is a diagram showing a functional configuration example of an estimation system.
FIG. 21 is a diagram showing an example of a network structure.

Specific examples are described below with reference to the accompanying drawings. In the following description, identical reference numerals in the drawings denote identical elements.

[First Embodiment]
FIG. 1 is a diagram showing an example of the overall configuration of a learning data generation device for AI. For example, the learning data generation device for AI (hereinafter "learning data generation device 10") is used as follows.

The learning data generation device 10 is, for example, an information processing device such as the following.

[Hardware Configuration Example of Information Processing Device]
FIG. 2 is a diagram showing a hardware configuration example of an information processing device. For example, the learning data generation device 10 has a hardware configuration including a Central Processing Unit (CPU, hereinafter "CPU 10H1"), a storage device 10H2, an interface 10H3, an input device 10H4, an output device 10H5, and the like. The learning data generation device 10 preferably also includes a Graphics Processing Unit (GPU, hereinafter "GPU 10H6").

The CPU 10H1 is an example of an arithmetic device and a control device. For example, the CPU 10H1 performs calculations based on programs, operations, or the like.

The storage device 10H2 is a main storage device such as a memory. The storage device 10H2 may also include an auxiliary storage device such as a solid state drive (SSD) or a hard disk.

The interface 10H3 transmits and receives data to and from external devices via a network, a cable, or the like. For example, the interface 10H3 is a connector, an antenna, or the like.

The input device 10H4 is a device for inputting operations by the user. For example, the input device 10H4 is a mouse, a keyboard, or the like.

The output device 10H5 is a device that outputs processing results and the like to the user. For example, the output device 10H5 is a display or the like.

The GPU 10H6 is an arithmetic device for image processing. The GPU 10H6 is sometimes called a graphics controller or the like. In particular, the GPU 10H6 is used when image processing is performed in real time, or for parallel computation during learning.

The learning data generation device 10 may have a hardware configuration that further includes, internally or externally, hardware resources other than those described above. The learning data generation device 10 may also consist of a plurality of devices.

[Crops, Target Objects, Thinning Objects, and Thinning Work]
The learning data generation device 10 receives as input image data (hereinafter "first input image data 11D1") obtained by photographing, with a camera 11, a crop before thinning work is performed (hereinafter, a crop in its state before thinning work is referred to as "first crop 12"). An imaging device such as the camera 11 may be part of the learning data generation device 10.

Furthermore, the learning data generation device 10 receives as input image data (hereinafter "second input image data 11D2") obtained by photographing, with the camera 11, the crop after the thinning work has been performed (hereinafter, a crop in its state after thinning work is referred to as "second crop 13").

Hereinafter, the first input image data 11D1 and the second input image data 11D2 may be collectively referred to simply as "input image data".

The first input image data 11D1 and the second input image data 11D2 are moving images, still images, or a combination of these. When the input is a moving image, for example, one frame or a predetermined number of frames are cut out of the frames that make up the moving image and used as the input image data.
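
As an illustration of cutting a predetermined number of frames out of a moving image, the following minimal Python sketch picks evenly spaced frame indices. The function name `select_frame_indices` and the even-spacing policy are assumptions made for this example; the description does not specify how the frames are chosen.

```python
def select_frame_indices(total_frames: int, num_samples: int) -> list[int]:
    """Return evenly spaced frame indices to cut out of a moving image.

    Hypothetical helper: the even-spacing rule is an assumption, not
    something the description prescribes.
    """
    if total_frames <= 0 or num_samples <= 0:
        return []
    # Never ask for more frames than the moving image contains.
    num_samples = min(num_samples, total_frames)
    step = total_frames / num_samples
    # Take the frame in the middle of each of the equal-length segments.
    return [int(step * i + step / 2) for i in range(num_samples)]


if __name__ == "__main__":
    # A 100-frame moving image reduced to 4 input images.
    print(select_frame_indices(100, 4))  # [12, 37, 62, 87]
```

The selected frames would then be decoded with whatever video library is in use and treated as the input image data.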

Thinning work is work that thins out fruits, flowers, leaves, or a combination of these (hereinafter "target objects") that a crop bears or that exist around the crop. That is, thinning work consists of berry thinning, fruit thinning, flower thinning, or a combination of these.

Hereinafter, among the target objects, those to be thinned out in the thinning work are referred to as "thinning objects". In other words, thinning work is work in which some thinning objects are selected from among a plurality of target objects and thinned out. In the drawings, a thinning object is marked with "x" to indicate that it has been thinned out. The form in which target objects and thinning objects are distinguished is not limited, however.

The worker 14 decides which of the target objects are to be thinning objects.

For example, the thinning objects may differ depending on the purpose, even for the same crop. Purposes include, for example: making sunlight reach the whole crop evenly; adjusting the taste; keeping the crop reasonably dense; keeping the crop within a predetermined size; or improving the crop's appearance at harvest (color, shape, few damaged target objects, or the overall appearance combining these).

Based on the purpose of the thinning, the worker 14 performs exemplary thinning work on the first crop 12. The worker 14 then photographs the crop separately before and after the thinning work. The input image data is generated by these photographs.

The input image data is also captured separately for each thinning purpose, crop type, or the like. That is, the content of the thinning work may differ depending on the purpose. The input image data is therefore generated separately according to the purpose, the crop type, or the like. Because the worker 14 demonstrates exemplary thinning work, the worker 14 is, for example, a skilled farmer.

By comparing the first input image data 11D1 with the second input image data 11D2, the learning data generation device 10 or the like can determine, for example, which target objects at which locations are treated as thinning objects and how many are treated as thinning objects.

The crop is, for example, a fruit-bearing crop such as a tomato plant. The following description uses the case where the crop is a tomato plant as an example. The crop is not limited to tomatoes, however. For example, the crop may be a fruit such as persimmons, cherries, strawberries, grapes, or mandarin oranges. Alternatively, the crop may be flowers, vegetables, or the like. Even when the crop is a tomato plant or the like, the thinning objects may include leaves, stems, or the like that exist around the fruit.

The first input image data 11D1 and the second input image data 11D2 captured as described above are input to the learning data generation device 10. When they are input, the learning data generation device 10 generates, through the overall processing, image data for learning (hereinafter "learning data 15"). The AI 16 learns using the learning data 15 generated in this way as input.

[Overall Processing Example]
FIG. 3 is a diagram showing an example of the overall processing of the first embodiment.

In step S0301, the worker 14 captures the first input image data 11D1. That is, before performing the thinning work, the worker 14 photographs the first crop 12 to generate the first input image data 11D1.

In step S0302, the worker 14 performs the thinning work. Through this thinning work, the thinning objects are removed from the first crop 12, which thereby becomes the second crop 13. Step S0303 follows the thinning work.

In step S0303, the worker 14 captures the second input image data 11D2. That is, after performing the thinning work, the worker 14 photographs the second crop 13 to generate the second input image data 11D2.

In step S0304, the learning data generation device 10 receives the first input image data 11D1 and the second input image data 11D2 as input.

In step S0305, the learning data generation device 10 extracts the thinning objects. For example, the learning data generation device 10 compares the first input image data 11D1 with the second input image data 11D2 and, from among all the target objects shown in the first input image data 11D1, extracts as thinning objects those target objects that are no longer present in the second input image data 11D2.

The extraction result therefore takes a form such as image data showing the positions of the thinning objects. Specifically, the extraction result is shown by processing the first input image data 11D1 so that the regions of the thinning objects are, for example, filled with a predetermined color or hatched.

The extraction result is not limited to an image data format, as long as it identifies the thinning objects. For example, when target objects are recognized during extraction, an identification number or coordinate values in the image data (a representative value such as the centroid may be used) are assigned to each target object. The extraction result may be generated in a format that identifies the thinning objects by specifying such identification numbers, coordinate values, or the like.
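
The comparison in step S0305 can be sketched with identification numbers and centroid coordinates as follows. This is only an illustrative Python sketch under stated assumptions: the target objects are taken to be already recognized in both images, the two images are taken to share roughly the same viewpoint, and the function name `extract_thinned_objects`, the greedy nearest-centroid matching, and the `max_distance` tolerance are all hypothetical choices that the description does not prescribe.

```python
import math


def extract_thinned_objects(before, after, max_distance=20.0):
    """Return the identification numbers of target objects that have
    no counterpart in the image captured after thinning.

    before: dict mapping identification number -> (x, y) centroid
            in the first input image data.
    after:  list of (x, y) centroids in the second input image data.
    max_distance: assumed pixel tolerance for treating two centroids
                  as the same target object.
    """
    unmatched_after = list(after)
    thinned = []
    for obj_id, (x, y) in before.items():
        best_index = None
        best_dist = max_distance
        # Look for the closest remaining centroid within the tolerance.
        for i, (ax, ay) in enumerate(unmatched_after):
            dist = math.hypot(x - ax, y - ay)
            if dist <= best_dist:
                best_index, best_dist = i, dist
        if best_index is None:
            # Nothing nearby after thinning: treat as a thinning object.
            thinned.append(obj_id)
        else:
            # Each "after" centroid may explain only one "before" object.
            unmatched_after.pop(best_index)
    return thinned


if __name__ == "__main__":
    before = {1: (10, 10), 2: (50, 10), 3: (90, 10)}
    after = [(11, 9), (89, 12)]
    print(extract_thinned_objects(before, after))  # [2]
```

Greedy matching depends on the order in which objects are visited; a practical implementation would likely need image alignment and a more robust assignment method.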

It is assumed, however, that the learning data generation device 10 can generate image data showing the extraction result whenever data such as identification numbers is available. The following description uses an example in which the extraction result is in the form of image data.

The extraction result may also be specified, corrected, or supplemented by the user.

In step S0306, the learning data generation device 10 performs learning, using image data or the like showing the extraction result as learning data.

The learning data is image data or the like showing the extraction result, that is, for example, an image in an illustrated format. The learning data may, however, be image data in multiple formats. The format of the learning data is described later.

Learning may be performed repeatedly. That is, learning may be repeated until steps S0307 and S0308, described below, can be executed with a predetermined accuracy.

In step S0307, the learning data generation device 10 generates estimation result image data.

In step S0308, the learning data generation device 10 identifies the estimation result image data.

Steps S0307 and S0308 are preferably realized by, for example, the following configuration.

[Example of Generating and Identifying Image Data with Generative Adversarial Networks (hereinafter "GAN")]
FIG. 4 is a diagram showing a configuration example of a generative adversarial network. For example, the learning data generation device 10 preferably has the following configuration, comprising an extraction unit 10F2, a generation unit 10F3, an identification unit 10F4, and the like.

As illustrated, the GAN is a configuration in which the identification unit 10F4 distinguishes between image data generated by the generation unit 10F3 and image data showing the extraction result produced by the extraction unit 10F2.

The generation unit 10F3 serves as the generator (also called a generative network or the like) in the generative adversarial network. That is, the generation unit 10F3 is a neural network model that creates image data.

The identification unit 10F4 serves as the discriminator (also called a discriminative network or the like) in the generative adversarial network. That is, the identification unit 10F4 is a neural network model that identifies whether or not image data was generated by the generator.

Hereinafter, the learning data used to train the generator and the discriminator that constitute the GAN is referred to as "first learning data". The learning data that is generated by the overall processing, that is, output based on the identification result of the identification unit 10F4, is referred to as "second learning data".

In the illustrated GAN, the image data showing the extraction result (hereinafter simply "extraction result 20") is "genuine". The extraction result 20 also serves as a "sample" for the generation unit 10F3. That is, the generation unit 10F3 is configured, for example, to learn several extraction results 20 in advance as first learning data so that it can generate image data resembling the extraction result 20 with a certain degree of accuracy.

The image data generated by the generation unit 10F3, which shows the result of estimating the content of the thinning work (hereinafter "estimation result image data 21"), is "fake".

In step S0307, the generation unit 10F3 generates the estimation result image data 21.

The estimation result image data 21 is image data generated in imitation of the extraction result 20. The estimation result image data 21 therefore has the same format as the extraction result 20 and is image data that identifies thinning objects. In this way, the generation unit 10F3 generates the estimation result image data 21, which is "fake", with the aim of having the identification unit 10F4 identify it as "genuine".

Because the estimation result image data 21 is generated by the generation unit 10F3, however, it is not image data showing a crop that actually exists. Thus, the generation unit 10F3 and the identification unit 10F4, that is, the GAN, generate synthetic image data.

The estimation result image data 21 also reproduces, on a different crop, the thinning work shown by the extraction result 20. That is, the estimation result image data 21 shows the result of estimating which of all the target objects are to be thinning objects.

The generation unit 10F3 learns patterns of thinning work and the like in advance, using the extraction result 20 and the like as first learning data. Therefore, when first input image data 11D1 showing an unknown crop is input, the generation unit 10F3 can first recognize, thanks to the prior learning, the target objects shown in the first input image data 11D1.

Next, thanks to the prior learning, the generation unit 10F3 can estimate, among the recognized target objects, which target objects at which positions are to be thinning objects, how many are to be thinning objects, and so on. The generation unit 10F3 then expresses these estimation results in the form of image data to generate the estimation result image data 21.

In step S0308, the extraction results 20 and the estimation result image data 21 are mixed together, and the identification unit 10F4 identifies whether each is "genuine" or "fake".

The generation unit 10F3 learns image processing and the like so that the estimation result image data 21 it generates is identified as "genuine" by the identification unit 10F4 as often as possible. The identification unit 10F4, for its part, learns from feedback and the like so as to improve the accuracy with which it identifies a "fake" as "fake".

Specifically, for the identification results of the identification unit 10F4, the first learning data includes data (hereinafter "correct answer data 22") giving the "correct answer" as to whether each identified piece of image data is "genuine" or "fake". Comparing the identification result with the correct answer data 22 makes it possible to evaluate whether the identification unit 10F4 identified the image correctly.
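
The comparison between identification results and the correct answer data 22 can be sketched as follows. This is a minimal Python sketch; the function name `evaluate_identification`, the string labels "genuine" and "fake", and the choice of returned metrics are assumptions made for illustration and are not taken from the description.

```python
def evaluate_identification(results, correct_answers):
    """Compare identification results with the correct answer data.

    results:         labels output by the identification unit,
                     each "genuine" or "fake".
    correct_answers: the corresponding correct labels.
    Returns the accuracy and the two kinds of error counts.
    """
    if len(results) != len(correct_answers):
        raise ValueError("results and correct_answers must be the same length")
    hits = 0
    fake_judged_genuine = 0   # a "fake" that slipped through
    genuine_judged_fake = 0   # a "genuine" image that was rejected
    for result, answer in zip(results, correct_answers):
        if result == answer:
            hits += 1
        elif answer == "fake":
            fake_judged_genuine += 1
        else:
            genuine_judged_fake += 1
    total = len(results)
    return {
        "accuracy": hits / total if total else 0.0,
        "fake_judged_genuine": fake_judged_genuine,
        "genuine_judged_fake": genuine_judged_fake,
    }


if __name__ == "__main__":
    results = ["genuine", "fake", "genuine", "fake"]
    answers = ["genuine", "genuine", "fake", "fake"]
    print(evaluate_identification(results, answers))
    # {'accuracy': 0.5, 'fake_judged_genuine': 1, 'genuine_judged_fake': 1}
```

Counts of this kind are one possible form of the evaluation that is fed back to the generation unit and the identification unit.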

When such evaluations, identification results, and the like are fed back to the generation unit 10F3, the generation unit 10F3 can learn to generate estimation result image data 21 aimed at being identified as "genuine" by the identification unit 10F4. That is, through the feedback, the generation unit 10F3 learns to generate "fakes" that are easily identified as "genuine".

When the evaluation is fed back to the identification unit 10F4, the identification unit 10F4 can learn to improve the accuracy with which it identifies a "fake" as "fake". That is, through the feedback, the identification unit 10F4 learns to lower the probability of overlooking a "fake", that is, of mistaking a "fake" for "genuine".
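
The two opposing learning goals can be expressed as loss values. The following Python sketch uses the binary cross-entropy losses of the standard GAN formulation, with the commonly used non-saturating generator loss; this is an assumption for illustration, since the description does not state which loss functions are used. The names `bce`, `discriminator_loss`, and `generator_loss` are hypothetical.

```python
import math


def bce(prob: float, target: float) -> float:
    """Binary cross-entropy for one probability, clipped for stability."""
    eps = 1e-12
    prob = min(max(prob, eps), 1.0 - eps)
    return -(target * math.log(prob) + (1.0 - target) * math.log(1.0 - prob))


def discriminator_loss(d_real, d_fake):
    """Loss for the identification unit (assumed standard GAN loss).

    d_real: probabilities of "genuine" output for extraction results.
    d_fake: probabilities of "genuine" output for generated images.
    The loss is low when real images score near 1 and fakes near 0.
    """
    real_term = sum(bce(p, 1.0) for p in d_real) / len(d_real)
    fake_term = sum(bce(p, 0.0) for p in d_fake) / len(d_fake)
    return real_term + fake_term


def generator_loss(d_fake):
    """Loss for the generation unit (assumed non-saturating form).

    The loss is low when generated images are judged "genuine".
    """
    return sum(bce(p, 1.0) for p in d_fake) / len(d_fake)


if __name__ == "__main__":
    # The generator is better off when its fakes are judged genuine.
    print(generator_loss([0.9]) < generator_loss([0.1]))  # True
    # The discriminator is better off when it separates the two.
    print(discriminator_loss([0.9], [0.1]) < discriminator_loss([0.5], [0.5]))  # True
```

In an actual training loop, the gradients of these losses would update the two neural network models in alternation.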

なお、学習データ生成装置１０は、事前にステップＳ０３０６による第１学習データに基づく学習を繰り返す、学習処理を行って、生成部１０Ｆ３、及び、識別部１０Ｆ４にある程度の精度を持たせてもよい。 Note that the learning data generation device 10 may repeat learning based on the first learning data in step S0306 in advance, and perform learning processing to give the generation unit 10F3 and the identification unit 10F4 a certain degree of accuracy.

そして、識別部１０Ｆ４によって「本物」と識別される程度の品質で生成された推定結果画像データ２１を第２学習データとする。このように、学習データ１５を生成すると、ＡＩ１６が学習に用いる第２学習データを増やすことができる。 Then, the estimation result image data 21 generated with such a quality as to be identified as "genuine" by the identification unit 10F4 is used as second learning data. By generating the learning data 15 in this way, the second learning data used by the AI 16 for learning can be increased.

一方で、識別部１０Ｆ４によって「偽物」と識別された推定結果画像データ２１は、「再利用」の対象とする。すなわち、「偽物」と識別された推定結果画像データは、学習が不十分な結果である。 On the other hand, the estimation result image data 21 identified as "counterfeit" by the identification unit 10F4 is targeted for "reuse". That is, the estimation result image data identified as "fake" is the result of insufficient learning.

そこで、例えば、「偽物」と識別された推定結果画像データに対して、「本物」と識別させるように、不十分な点を修正する操作を行う。このように、手動で操作された内容を反映させた画像データ等により、生成部１０Ｆ３にフィードバックさせる等の処理が「再利用」となる。このような「再利用」がされると、生成部１０Ｆ３は、不十分な点を学習し、より「本物」と識別されやすい推定結果画像データ２１を生成できる。 Therefore, for example, an operation is performed on the estimation result image data identified as "fake" to correct its insufficient points so that it will be identified as "genuine". "Reuse" refers to processing such as feeding back to the generation unit 10F3 the image data or the like that reflects this manual operation. When such "reuse" is performed, the generation unit 10F3 can learn the insufficient points and generate estimation result image data 21 that is more likely to be identified as "genuine".

なお、「再利用」は、生成部１０Ｆ３の学習に用いるに限られない。例えば、「再利用」は、手動で操作された内容を反映させた画像データを学習データ１５に加える等でもよい。ただし、「再利用」が難しい場合には、「偽物」と識別された推定結果画像データは、破棄されてもよい。 Note that "reuse" is not limited to use in the learning of the generation unit 10F3. For example, "reuse" may consist of adding the image data that reflects the manual operation to the learning data 15. However, if "reuse" is difficult, the estimation result image data identified as "fake" may be discarded.
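
The handling of generated images described above (adopt as second learning data, "reuse", or discard) can be summarized as a small routing function. This is a sketch under stated assumptions: the function name `route_generated_image`, the destination labels, and the convention that a manually corrected image is passed in as `corrected` are all hypothetical and not taken from the source.

```python
def route_generated_image(image, judged_genuine, corrected=None):
    """Decide what happens to one estimation result image.

    image          -- the generated estimation result image (any object)
    judged_genuine -- True if the identification unit judged it "genuine"
    corrected      -- manually corrected version, or None if correction is impractical
    Returns a (destination, payload) pair.
    """
    if judged_genuine:
        # Good enough to serve as second learning data.
        return ("second_learning_data", image)
    if corrected is not None:
        # "Reuse": the corrected image is fed back or added to the learning data.
        return ("reuse", corrected)
    # "Reuse" is difficult, so the image may simply be discarded.
    return ("discard", None)
```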

なお、図示するようなＧＡＮは、ＡＩ１６の学習に用いる学習データ１５を生成する。このように生成される第２学習データは、農作物の摘果箇所を推定するＡＩ用であり、人による目視で評価される画像データとは異なる。 The illustrated GAN generates learning data 15 used for AI 16 learning. The second learning data generated in this way is for AI to estimate thinning locations of crops, and is different from image data visually evaluated by humans.

例えば、一般的な風景等を撮影した場合には、画像データには、人の目視では判断しにくいような微小な色の変化等が存在する場合がある。このような変化は、人の目視による評価ではあまり重視されない。一方で、コンピュータによる評価では、画素値の変動等を計算すると把握できる場合がある。このように、画像データの生成は、コンピュータによる評価を意識するか、又は、人の目視による評価を意識するかにより、重視する評価項目等が異なる場合がある。 For example, when a general landscape or the like is photographed, the image data may contain minute color changes that are difficult to discern by human eyes. Such changes are given little weight in human visual evaluation. In evaluation by a computer, on the other hand, they can sometimes be detected by calculating fluctuations in pixel values. Thus, the evaluation items to be emphasized when generating image data may differ depending on whether the image data is intended for evaluation by a computer or for visual evaluation by a person.

［撮影方法の例］ [Example of imaging method]
第１入力画像データ１１Ｄ１、及び、第２入力画像データ１１Ｄ２等の入力画像データは、例えば、以下のように撮影されるのが望ましい。 The input image data, such as the first input image data 11D1 and the second input image data 11D2, is desirably captured as follows, for example.

図５は、撮影方法の例を示す図である。以下、図において上下方向を「Ｚ軸方向」とする。Ｚ軸方向は、いわゆる重力方向である。また、図において、主に左右方向を「Ｘ軸方向」とする。Ｘ軸方向は、農作物に対して正面に向かい合った状態で右手方向とする。さらに、奥行き方向を「Ｙ軸方向」とする。 FIG. 5 is a diagram showing an example of an imaging method. Hereinafter, the vertical direction in the drawings will be referred to as the "Z-axis direction". The Z-axis direction is the so-called direction of gravity. Also, in the drawings, the horizontal direction is mainly defined as the "X-axis direction". The X-axis direction is the right-hand direction when facing the crops. Further, the depth direction is defined as "Y-axis direction".

以下、第１農作物１２を撮影する場合を例に説明する。 A case of photographing the first crop 12 will be described below as an example.

入力画像データは、Ｚ軸回りに複数の視点で撮影するのが望ましい。すなわち、入力画像データは、第１農作物１２をできるだけ様々な視点で示す画像データであるのが望ましい。 The input image data is desirably photographed from a plurality of viewpoints around the Z-axis. That is, it is desirable that the input image data be image data showing the first crop 12 from as many viewpoints as possible.

具体的には、カメラ１１は、光軸を第１農作物１２に向けて、Ｚ軸を中心に回転するように（いわゆるＹａｗ軸回転である。図において「Ｙａｗ」で示す回転である。）動画で撮影するのが望ましい。 Specifically, it is desirable that the camera 11 shoot video while keeping its optical axis pointed at the first crop 12 and rotating around the Z axis (so-called yaw rotation, indicated by "Yaw" in the figure).

このように撮影すると、第１農作物１２を全周方向から撮影できる。なお、入力画像データは、３６０°のうち、３視点程度を撮影する静止画等でもよい。 By photographing in this way, the first crop 12 can be photographed from all directions. Note that the input image data may be still images or the like obtained by photographing about three viewpoints out of 360 degrees.

摘果作業は、農作物の全体的な形状、又は、日当たり等を気にして行う場合がある。したがって、摘果対象物は、様々な角度に存在する場合がある。ゆえに、カメラ１１は、１つの視点では、すべての摘果対象物を撮影できない場合もある。そのため、入力画像データは、できるだけ死角がないように様々な視点で撮影されるのが望ましい。 The fruit thinning work may be performed with consideration given to the overall shape of the crops or the sun exposure. Therefore, the thinning object may exist at various angles. Therefore, the camera 11 may not be able to photograph all thinning objects from one viewpoint. Therefore, it is desirable that the input image data be shot from various viewpoints with as few blind spots as possible.

なお、入力画像データは、Ｘ軸回りに複数の視点で更に撮影するのがより望ましい。例えば、カメラ１１は、光軸を第１農作物１２に向けて、第１農作物１２の正面となる視点、第１農作物１２を下から撮影する視点（いわゆる見上げ視点である。）、及び、第１農作物１２の背面となる視点等で撮影する。 It is more desirable that the input image data additionally be captured from a plurality of viewpoints around the X axis. For example, with its optical axis pointed at the first crop 12, the camera 11 shoots from a viewpoint in front of the first crop 12, a viewpoint that captures the first crop 12 from below (a so-called looking-up viewpoint), a viewpoint behind the first crop 12, and the like.

このように、カメラ１１は、Ｘ軸を中心に回転するように（いわゆるＰｉｔｃｈ軸回転である。図において「Ｐｉｔｃｈ」で示す回転である。）撮影するのが望ましい。 In this way, it is desirable that the camera 11 takes an image while rotating around the X axis (so-called Pitch axis rotation, which is indicated by "Pitch" in the figure).

また、第２入力画像データ１１Ｄ２も同様に撮影されるのが望ましい。 Also, it is desirable that the second input image data 11D2 is similarly captured.

以上のように、Ｐｉｔｃｈ、又は、Ｙａｗの回転を行って複数の視点で農作物を撮影して入力画像データが撮影されるのが望ましい。このような撮影であると、農作物の全体の形状を整える摘果作業、又は、農作物の日当たりの良さを整える摘果作業等を入力画像データから把握できる。 As described above, it is desirable that the input image data be captured by photographing the crop from a plurality of viewpoints using pitch or yaw rotation. With this kind of imaging, fruit thinning work that adjusts the overall shape of the crop, fruit thinning work that adjusts the crop's exposure to sunlight, and the like can be grasped from the input image data.
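
The yaw and pitch viewpoints can be illustrated by computing camera positions on a circle around the crop. This is only a geometric sketch under assumptions not stated in the source: the crop sits at the origin, +Z points up, +X is the right-hand direction when facing the crop, +Y is the depth direction, the front viewpoint is at (0, -radius, 0), and the optical axis always points at the origin. The function names are hypothetical.

```python
import math


def yaw_viewpoint(radius, yaw_deg):
    """Camera position after rotating the front viewpoint about the Z axis."""
    t = math.radians(yaw_deg)
    return (radius * math.sin(t), -radius * math.cos(t), 0.0)


def pitch_viewpoint(radius, pitch_deg):
    """Camera position after rotating the front viewpoint about the X axis.

    0 deg is the front, 90 deg is directly below (looking up), 180 deg is the back.
    """
    p = math.radians(pitch_deg)
    return (0.0, -radius * math.cos(p), -radius * math.sin(p))


def evenly_spaced_yaw(radius, count):
    """Still-image alternative: `count` viewpoints spread over 360 degrees."""
    return [yaw_viewpoint(radius, 360.0 * i / count) for i in range(count)]
```

For instance, `evenly_spaced_yaw(2.0, 3)` would give three viewpoints 120 degrees apart, corresponding to the case of capturing about three still images out of 360 degrees.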

また、入力画像データは、異なる気象条件、又は、異なる周囲物の配置等の条件下で撮影されてもよい。つまり、入力画像データは、季節又は天候等により、異なる周囲環境、又は、異なる照明条件下で撮影された状態を示すのが望ましい。 Also, the input image data may be captured under different weather conditions, different arrangement of surrounding objects, or the like. In other words, it is desirable that the input image data show the state of being photographed under different ambient environments or different lighting conditions depending on the season, weather, or the like.

［第２実施形態］ [Second embodiment]
第２実施形態は、第１実施形態と比較すると、全体処理が以下のようになる点が異なる。 The second embodiment differs from the first embodiment in that the overall processing is as follows.

図６は、第２実施形態の全体処理例を示す図である。以下、第１実施形態と異なる点を中心に説明し、重複する説明を省略する。第２実施形態における全体処理は、第１実施形態における全体処理と比較すると、ステップＳ０６０１を行う点が異なる。 FIG. 6 is a diagram illustrating an example of overall processing of the second embodiment. In the following, differences from the first embodiment will be mainly described, and redundant description will be omitted. The overall processing in the second embodiment differs from the overall processing in the first embodiment in that step S0601 is performed.

ステップＳ０６０１では、学習データ生成装置１０は、摘果対象物を抽出する。具体的には、学習データ生成装置１０は、以下のような抽出処理を行って摘果対象物を抽出する。 In step S0601, the learning data generation device 10 extracts a thinning object. Specifically, the learning data generation device 10 extracts the thinning object by performing the following extraction process.

図７は、抽出処理の例を示す図である。例えば、ステップＳ０６０１は、以下のような処理を行う。 FIG. 7 is a diagram illustrating an example of extraction processing. For example, step S0601 performs the following processing.

ステップＳ０７０１では、学習データ生成装置１０は、第１マスク画像データを生成する。 In step S0701, the learning data generation device 10 generates first mask image data.

第１マスク画像データは、後段のステップＳ０７０２で行うインスタンスセグメンテーション（ＩｎｓｔａｎｃｅＳｅｇｍｅｎｔａｔｉｏｎ）用の学習において学習データとなるマスク画像データである。すなわち、第１マスク画像データは、「見本」となる画像データである。 The first mask image data is mask image data that becomes learning data in learning for instance segmentation performed in step S0702 below. That is, the first mask image data is image data that serves as a "sample".

なお、第１マスク画像データは、画像データ内の一部、又は、全部を塗り潰す等のマスクする領域を指定するデータでもよい。 It should be noted that the first mask image data may be data specifying a masked area, such as filling out part or all of the image data.

以下、第１マスク画像データをインスタンスセグメンテーション用の学習データとし、かつ、インスタンスセグメンテーションにより生成されるマスク画像データを「第２マスク画像データ」という。なお、マスク画像データの詳細は後述する。 Hereinafter, the first mask image data is used as learning data for instance segmentation, and the mask image data generated by instance segmentation is referred to as "second mask image data". Details of the mask image data will be described later.

ステップＳ０７０２では、学習データ生成装置１０は、インスタンスセグメンテーションの学習を行う。 In step S0702, the learning data generation device 10 learns instance segmentation.

ステップＳ０７０３では、学習データ生成装置１０は、インスタンスセグメンテーションを評価する。 In step S0703, the learning data generation device 10 evaluates instance segmentation.

ステップＳ０７０４では、学習データ生成装置１０は、インスタンスセグメンテーションを行う第２マスク画像データを生成する。 In step S0704, the learning data generation device 10 performs instance segmentation to generate the second mask image data.

例えば、インスタンスセグメンテーション、及び、マスク画像データの生成は以下のような処理である。 For example, instance segmentation and generation of mask image data are the following processes.

図８は、インスタンスセグメンテーションの処理例、及び、マスク画像データの例を示す図である。以下、図８（Ａ）に示す第１入力画像データ１１Ｄ１を例に説明する。 FIG. 8 is a diagram illustrating an example of instance segmentation processing and an example of mask image data. The first input image data 11D1 shown in FIG. 8A will be described below as an example.

例えば、第１入力画像データ１１Ｄ１に、第１物体３１、第２物体３２、第３物体３３、及び、第４物体３４の４つの対象物体が撮影されたとする。 For example, assume that four target objects, a first object 31, a second object 32, a third object 33, and a fourth object 34, are photographed in the first input image data 11D1.

図８（Ｂ）は、インスタンスセグメンテーションの実行結果、及び、インスタンスセグメンテーションにより生成されるマスク画像データ４０の例を示す図である。 FIG. 8B is a diagram showing an example of the execution result of instance segmentation and mask image data 40 generated by instance segmentation.

インスタンスセグメンテーションは、例えば、図８（Ａ）に示す第１入力画像データ１１Ｄ１に対して処理を実行することで、図８（Ｂ）に示すマスク画像データ４０を生成する処理である。 Instance segmentation is, for example, a process of generating the mask image data 40 shown in FIG. 8B by executing the process on the first input image data 11D1 shown in FIG. 8A.

具体的には、インスタンスセグメンテーションは、第１入力画像データ１１Ｄ１において、物体の検出、及び、検出した複数の物体を別々の物体と識別する処理である。 Specifically, instance segmentation is a process of detecting objects in the first input image data 11D1 and identifying the plurality of detected objects as separate objects.

図８（Ｂ）に示す例は、第１物体３１、第２物体３２、第３物体３３、及び、第４物体３４を示す領域（以下、画像データにおいて対象物体を示す領域を「第１領域」という。）と、第１物体３１、第２物体３２、第３物体３３、及び、第４物体３４以外の領域（以下「第２領域」という。例えば、第２領域は背景等である。）とを２色で区別して示すマスク画像データ４０の例である。 The example shown in FIG. 8(B) is an example of mask image data 40 that uses two colors to distinguish the region showing the first object 31, the second object 32, the third object 33, and the fourth object 34 (hereinafter, a region showing a target object in image data is referred to as a "first region") from the region other than the first object 31, the second object 32, the third object 33, and the fourth object 34 (hereinafter referred to as a "second region"; for example, the second region is the background or the like).

具体的には、図８（Ｂ）に示すように、マスク画像データ４０において、第１領域は、白色で示す領域である。一方で、マスク画像データ４０において、第２領域は、黒色で示す領域である。このように、マスク画像データ４０は、例えば、第１領域、及び、第２領域を二値化して異なる色で示す画像データである。 Specifically, as shown in FIG. 8B, in the mask image data 40, the first area is the area shown in white. On the other hand, in the mask image data 40, the second area is the area shown in black. Thus, the mask image data 40 is, for example, image data in which the first region and the second region are binarized and shown in different colors.

なお、マスク画像データ４０は、図８（Ｂ）に示すような形式に限られない。例えば、第１領域、及び、第２領域をどのような色にするか等は事前に設定でき、他の色の組み合わせでもよい。また、マスク画像データ４０は、色で領域を区別する形式に限られず、例えば、ハッチングの有無、又は、識別データで区別する等の形式でもよい。 Note that the mask image data 40 is not limited to the format shown in FIG. 8B. For example, the colors of the first area and the second area can be set in advance, and other color combinations may be used. Also, the mask image data 40 is not limited to a format in which areas are distinguished by color, and may be in a format in which areas are distinguished by the presence or absence of hatching, identification data, or the like.

学習データ生成装置１０は、マスク画像データ４０を第１入力画像データ１１Ｄ１に適用すると、第１領域を抽出した画像データを生成できる。すなわち、マスク画像データ４０を参照すると、学習データ生成装置１０は、第１入力画像データ１１Ｄ１において、対象物体を認識し、対象物体を抽出した画像データを生成できる。 By applying the mask image data 40 to the first input image data 11D1, the learning data generation device 10 can generate image data in which the first region is extracted. That is, referring to the mask image data 40, the learning data generation device 10 can recognize the target object in the first input image data 11D1 and generate image data in which the target object is extracted.

マスク画像データ４０を利用すると、第１入力画像データ１１Ｄ１が示す背景等を削除できる。すなわち、学習において、背景等といった対象物体以外のデータを排除できると、ＡＩが、摘果作業において重要でない物体、又は、背景等を無駄に学習してしまうのを防ぐことができる。 By using the mask image data 40, the background or the like indicated by the first input image data 11D1 can be deleted. That is, if data other than the target object, such as the background, can be excluded in learning, it is possible to prevent AI from learning unimportant objects or the background in fruit thinning work in vain.
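
Applying the mask image data 40 to the first input image data 11D1 can be sketched as a per-pixel selection. The example below assumes, purely for illustration, that the image is a nested list of RGB tuples and the mask is a nested list of 0/1 values with 1 marking the first region; the function name `apply_mask` and the black fill color are hypothetical choices.

```python
def apply_mask(image, mask, background=(0, 0, 0)):
    """Keep pixels in the first region (mask == 1); overwrite the second region.

    image -- rows of (R, G, B) tuples
    mask  -- rows of 0/1 values with the same shape as image
    """
    return [
        [pixel if keep else background for pixel, keep in zip(img_row, mask_row)]
        for img_row, mask_row in zip(image, mask)
    ]
```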

このように、マスク画像データ４０は、背景等を第２領域とする等のように、第１領域以外をマスク化ができる画像データであるのが望ましい。 Thus, it is desirable that the mask image data 40 be image data capable of masking areas other than the first area, such as setting the background or the like as the second area.

また、マスク画像データ４０は、同じ種類の対象物体であっても、個々の対象物体を識別できる。すなわち、マスク画像データ４０を適用すると、図８（Ｂ）に示すように、第１物体３１、第２物体３２、第３物体３３、及び、第４物体３４を第１対象物体４１、第２対象物体４２、第３対象物体４３、及び、第４対象物体４４のように、異なる物体と識別できる。 Further, the mask image data 40 can identify individual target objects even if they are of the same type. That is, when the mask image data 40 is applied, as shown in FIG. 8(B), the first object 31, the second object 32, the third object 33, and the fourth object 34 can be identified as different objects, namely the first target object 41, the second target object 42, the third target object 43, and the fourth target object 44.

例えば、セマンティックセグメンテーション（ＳｅｍａｎｔｉｃＳｅｇｍｅｎｔａｔｉｏｎ）の処理であると、第１対象物体４１、第２対象物体４２、第３対象物体４３、及び、第４対象物体４４は、同じ物体又はカテゴリーに分類され、区別されない場合が多い。 For example, in semantic segmentation processing, the first target object 41, the second target object 42, the third target object 43, and the fourth target object 44 are often classified into the same object or category and are not distinguished from one another.

一方で、インスタンスセグメンテーションの処理であると、１つの対象物体を示す複数の画素をまとめて１つの物体と識別し、かつ、同じ種類であっても異なる物体であれば、別の物体であると識別できる。 With instance segmentation processing, on the other hand, the plurality of pixels representing one target object are collectively identified as one object, and objects of the same type can still be identified as separate objects if they are different objects.

すなわち、インスタンスセグメンテーションの処理を行うと、画像データ内において同じ種類の複数の対象物体がある場合には、いわゆるラベリング（ｌａｂｅｌｉｎｇ）が可能となる。例えば、図８（Ｂ）に示す例では、第１対象物体４１、第２対象物体４２、第３対象物体４３、及び、第４対象物体４４が異なる識別番号等で管理できる。 That is, when the instance segmentation process is performed, so-called labeling becomes possible when there are multiple target objects of the same type in the image data. For example, in the example shown in FIG. 8B, the first target object 41, the second target object 42, the third target object 43, and the fourth target object 44 can be managed with different identification numbers or the like.
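
The idea of managing each target object under its own identification number can be illustrated with connected-component labeling on a binary mask. Note that this is a classical stand-in and not the learned instance segmentation described in the source: it assumes that objects do not touch in the mask (touching objects would be merged), uses 4-connectivity, and the function name `label_regions` is hypothetical.

```python
from collections import deque


def label_regions(mask):
    """Assign 1, 2, 3, ... to each 4-connected group of nonzero mask pixels.

    Returns (labels, count); labels has the same shape as mask, 0 = second region.
    """
    height = len(mask)
    width = len(mask[0]) if height else 0
    labels = [[0] * width for _ in range(height)]
    count = 0
    for y in range(height):
        for x in range(width):
            if not mask[y][x] or labels[y][x]:
                continue
            count += 1  # a new, not yet labeled object starts here
            labels[y][x] = count
            queue = deque([(y, x)])
            while queue:  # breadth-first flood fill over neighboring pixels
                cy, cx = queue.popleft()
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and mask[ny][nx] and not labels[ny][nx]):
                        labels[ny][nx] = count
                        queue.append((ny, nx))
    return labels, count
```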

したがって、ステップＳ０７０２における学習は、対象物体を精度良く識別できる程度に行われる。そして、ステップＳ０７０３における評価は、対象物体を抽出する精度等を評価する。このようなステップＳ０７０２、及び、ステップＳ０７０３が行われると、ステップＳ０７０４で、学習データ生成装置１０は、インスタンスセグメンテーションを行う第２マスク画像データを生成できる。 Therefore, the learning in step S0702 is performed to the extent that target objects can be identified accurately. The evaluation in step S0703 then evaluates, for example, the accuracy with which target objects are extracted. Once steps S0702 and S0703 have been performed, the learning data generation device 10 can, in step S0704, perform instance segmentation to generate the second mask image data.

そして、インスタンスセグメンテーションの評価結果によっては、ステップＳ０７０１乃至ステップＳ０７０３は繰り返し実行される。すなわち、「学習処理」、及び、図７に示す処理は、ある程度の精度が確保されるまで繰り返し実行され、その後、十分な学習が完了している状態下において、「生成処理」、及び、図７に示す処理が行われてもよい。 Depending on the evaluation result of the instance segmentation, steps S0701 to S0703 are executed repeatedly. That is, the "learning process" and the process shown in FIG. 7 may be executed repeatedly until a certain degree of accuracy is secured, after which the "generation process" and the process shown in FIG. 7 may be performed in a state where sufficient learning has been completed.
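
The repetition of steps S0701 to S0703 until sufficient accuracy is reached can be sketched as a simple loop. The callables `train_step` and `evaluate`, the accuracy threshold, and the round limit are hypothetical placeholders; the source does not specify the evaluation metric or a stopping criterion.

```python
def train_until_accurate(train_step, evaluate, target_accuracy, max_rounds=100):
    """Repeat learning and evaluation until the target accuracy is reached.

    train_step -- callable performing one round of learning (S0701, S0702)
    evaluate   -- callable returning the current accuracy as a float (S0703)
    Returns (rounds_executed, last_accuracy).
    """
    if max_rounds < 1:
        raise ValueError("max_rounds must be at least 1")
    accuracy = None
    for round_index in range(1, max_rounds + 1):
        train_step()
        accuracy = evaluate()
        if accuracy >= target_accuracy:
            break  # accurate enough; generation (S0704) can follow
    return round_index, accuracy
```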

なお、学習データ生成装置１０は、ステップＳ０７０５のように、イラスト化を更に行うのが望ましい。例えば、イラスト化は以下のような処理である。 In addition, it is desirable that the learning data generation device 10 further performs illustration as in step S0705. For example, illustration processing is as follows.

図９は、イラスト化の処理例を示す図である。以下、図９（Ａ）に示すような写真形式の第１入力画像データ１１Ｄ１を入力する場合を例に説明する。 FIG. 9 is a diagram showing an example of illustration processing. A case of inputting the first input image data 11D1 in the photograph format as shown in FIG. 9A will be described below as an example.

図９（Ａ）に示す例は、画像データの中央部分（図において果実が撮影されている部分である。以下「対象物体領域５１」という。）に、対象物体が存在する例を示す。例えば、対象物体領域５１に写る対象物体は、インスタンスセグメンテーション等の物体認識により識別される。 The example shown in FIG. 9A shows an example in which the target object exists in the central portion of the image data (the portion where the fruit is photographed in the drawing; hereinafter referred to as "target object region 51"). For example, the target object appearing in the target object region 51 is identified by object recognition such as instance segmentation.

イラスト化の処理は、例えば、第１入力画像データ１１Ｄ１を入力し、図９（Ｂ）に示すような画像データ（以下「イラスト化画像データ５０」という。）を生成する処理である。 The illustration process is, for example, a process of inputting the first input image data 11D1 and generating image data as shown in FIG. 9B (hereinafter referred to as "illustration image data 50").

図９（Ｂ）は、イラスト化画像データ５０の例を示す図である。 FIG. 9(B) is a diagram showing an example of the illustrated image data 50.

イラスト化画像データ５０は、対象物体の領域を所定の色で塗り潰す。例えば、図９（Ｂ）に示すように、イラスト化画像データ５０は、ハッチングで示す、対象物体の領域を塗り潰した画像データである。 The illustrated image data 50 fills in the area of the target object with a predetermined color. For example, as shown in FIG. 9B, the illustrated image data 50 is image data in which the area of the target object indicated by hatching is painted out.

以下、図９（Ｂ）に示す例において、対象物体の領域と識別され、イラスト化の処理で塗り潰す領域を「塗り潰し領域５２」という。 Hereinafter, in the example shown in FIG. 9(B), the area identified as the area of the target object and painted in the illustration process is referred to as "filled area 52".

さらに、イラスト化画像データ５０は、塗り潰し領域５２以外の領域（背景等を示す領域である。）を白色（塗り潰し領域５２とは異なる色で塗り潰す等である。）とする。 Further, in the illustrated image data 50, the area other than the painted area 52 (the area indicating the background, etc.) is made white (filled with a different color from the painted area 52, etc.).

このように、イラスト化の処理は、対象物体の領域と、それ以外の領域を所定の色で色分けする処理等である。このように、イラスト化の処理を行うと、画像データにおけるＲＧＢ値又は輝度値等が単純化できる。 In this way, the illustration process is a process of classifying the area of the target object and the other areas with a predetermined color. By performing the illustration processing in this manner, the RGB values or luminance values in the image data can be simplified.

第１入力画像データ１１Ｄ１のような写真形式の画像データであると、人の目には分かりにくい細かなＲＧＢ値、又は、輝度値等の変化がある場合が多い。 Photographic image data such as the first input image data 11D1 often includes subtle changes in RGB values or luminance values that are difficult for the human eye to perceive.

例えば、トマトの果実は、単純には赤色の１色である。このような対象物体を示す場合において、写真形式の画像データであると、同じ対象物体における赤色を示す画素は、細かくＲＧＢ値等の画素値が変化する場合がある。このような細かなＲＧＢ値等の変化は、学習の対象としない方がよい場合が多い。 For example, a tomato fruit is simply one color, red. In the case of showing such a target object, if the image data is in the form of a photograph, the pixel values such as the RGB values of the pixels showing red in the same target object may change minutely. In many cases, such fine changes in RGB values should not be learned.

そこで、イラスト化の処理は、対象物体を同じ色で統一して示す等の処理を行う。具体的には、第１入力画像データ１１Ｄ１に対して、インスタンスセグメンテーション等を行うと、対象物体と識別できる画素がグルーピング化される。 Therefore, in the illustration processing, processing such as displaying the target objects in the same color is performed. Specifically, when instance segmentation or the like is performed on the first input image data 11D1, pixels that can be identified as a target object are grouped.

そして、イラスト化の処理は、このように同じグルーピング化された画素を同じ色で塗り潰す処理である。さらに、イラスト化の処理は、背景等の領域を対象物体の領域とは異なる色で別の色に塗り潰す処理である。 The illustration process is a process of filling in the same grouped pixels with the same color. Furthermore, the illustration process is a process of filling in a region such as a background with a color different from that of the region of the target object.
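
The illustration process can be sketched as filling every pixel grouped as a target object with one flat color and every other pixel with a different flat color. The sketch assumes a label map in which 0 marks the background and any nonzero value marks a target object; the function name `illustrate` and the default colors are hypothetical.

```python
def illustrate(labels, object_color=(255, 0, 0), background_color=(255, 255, 255)):
    """Turn a label map into a flat two-color image.

    labels -- rows of integers; 0 = background, nonzero = pixel of a target object
    Returns rows of (R, G, B) tuples.
    """
    return [
        [object_color if label else background_color for label in row]
        for row in labels
    ]
```

However many slightly different reds the photograph contains inside a fruit, the result holds at most two distinct pixel values, which is the simplification of RGB values the text describes.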

なお、イラスト化の処理は、画像データを単純化する処理であれば、所定の色で塗り潰す以外の処理であってもよい。例えば、イラスト化の処理は、背景等を単色にする等でもよい。また、イラスト化の処理は、色で塗り潰すに代えて、ハッチング等を用いる処理でもよい。 Note that the illustration process may be any process other than painting with a predetermined color as long as it simplifies the image data. For example, the illustration processing may be such that the background or the like is rendered in a single color. Further, the illustration process may be a process using hatching or the like instead of filling with color.

このように、画像データをイラスト化すると、抽出結果等を単純化して表現できる。抽出結果は、対象物体の位置、及び、形状等が大まかに表現できればよい場合が多い。すなわち、抽出結果には、細かな色の変化、及び、背景等のデータが不要な場合が多い。 In this way, if the image data is illustrated, extraction results and the like can be expressed in a simplified manner. As for the extraction result, it is often sufficient if the position, shape, etc. of the target object can be roughly expressed. That is, extraction results often do not require data such as fine color changes and backgrounds.

そこで、対象物体を単色で簡略に示す方が、写真形式等と比較して、学習の妨げとなる要素を排除し、精度良く学習できる。すなわち、イラスト化された画像データを学習データに摘果作業をＡＩに学習させると、ＡＩは、摘果作業に重要な特徴量を精度良く学習できる。 Therefore, simply showing the target object in a single color eliminates factors that hinder learning and allows for more accurate learning than in a photographic format or the like. In other words, if the AI learns the fruit thinning work using the illustrated image data as learning data, the AI can accurately learn the feature quantities that are important for the fruit thinning work.

また、写真形式等の画像データより、イラスト化された画像データの方が、色の表現等が簡略であるため、データ量を少なくできる。 In addition, illustrated image data has simpler color representation than image data in a photographic format or the like, so the amount of data can be reduced.

図１０は、イラスト化された画像データ、又は、マスク画像データの変形例を示す図である。例えば、マスク画像データは、図１０（Ｂ）又は図１０（Ｃ）のように生成されてもよい。以下、図１０（Ａ）に示す第１入力画像データ１１Ｄ１を例に説明する。 FIG. 10 is a diagram showing modified examples of illustrated image data or mask image data. For example, mask image data may be generated as shown in FIG. 10(B) or FIG. 10(C). The first input image data 11D1 shown in FIG. 10A will be described below as an example.

図１０（Ａ）は、林檎の４つの果実を対象物体にする第１入力画像データ１１Ｄ１の例を示す図である。以下、学習データ生成装置１０は、このような第１入力画像データ１１Ｄ１を入力し、学習データ生成装置１０は、インスタンスセグメンテーション等を行う例で説明する。 FIG. 10A is a diagram showing an example of the first input image data 11D1 with four apples as target objects. An example in which the learning data generation device 10 inputs such first input image data 11D1 and performs instance segmentation and the like will be described below.

例えば、図８に示すインスタンスセグメンテーションを行う場合には、第２マスク画像データは、図１０（Ｂ）に示すように生成される。 For example, when performing the instance segmentation shown in FIG. 8, the second mask image data is generated as shown in FIG. 10(B).

一方で、第２マスク画像データは、図１０（Ｃ）に示すように生成されてもよい。 On the other hand, the second mask image data may be generated as shown in FIG. 10(C).

図１０（Ｂ）は、４つの対象物体をまとめて１つの画像データで示す形式の例を示す図である。このように、第２マスク画像データは、複数の対象物体を１つの画像データで示してもよい。 FIG. 10B is a diagram showing an example of a format in which four target objects are collectively shown as one piece of image data. In this way, the second mask image data may represent a plurality of target objects with one image data.

図１０（Ｃ）は、４つの対象物体を対象物体ごとに分けた４つの画像データとし、画像データ群の形式とする例を示す図である。このように、第２マスク画像データは、対象物体ごとに、画像データを分けて、複数の画像データ群で１つの第２マスク画像データとする画像データ群の形式でもよい。 FIG. 10(C) is a diagram showing an example of an image data group format in which the four target objects are divided into four pieces of image data, one per target object. In this way, the second mask image data may take the form of an image data group, in which the image data is divided for each target object and the group of image data as a whole constitutes one piece of second mask image data.

以上のように、マスク画像データ、又は、イラスト化して生成する画像データは、複数の対象物体をまとめて１つの画像データとしてもよいし、又は、対象物体ごとに別々に分けて画像データ群としてもよい。 As described above, the mask image data, or the image data generated by illustration, may combine a plurality of target objects into a single piece of image data, or may be divided separately for each target object to form an image data group.
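
The two formats (one combined image as in FIG. 10(B), or one image per target object as in FIG. 10(C)) can be converted into each other. The sketch below assumes a label map with 0 for the background and a distinct positive integer per target object; the function names `split_masks` and `merge_masks` are hypothetical.

```python
def split_masks(labels):
    """Combined label map -> {object_id: binary mask}, one mask per target object."""
    ids = sorted({label for row in labels for label in row if label})
    return {
        obj_id: [[1 if label == obj_id else 0 for label in row] for row in labels]
        for obj_id in ids
    }


def merge_masks(mask_group):
    """{object_id: binary mask} -> single binary mask covering all target objects."""
    masks = list(mask_group.values())
    if not masks:
        return []
    height, width = len(masks[0]), len(masks[0][0])
    return [
        [1 if any(m[y][x] for m in masks) else 0 for x in range(width)]
        for y in range(height)
    ]
```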

［抽出結果の例］ [Example of extraction result]
図１１は、対象物体の認識例を示す図である。以下、図１１（Ａ）に示す第１入力画像データ１１Ｄ１を例に説明する。 FIG. 11 is a diagram showing an example of target object recognition. The first input image data 11D1 shown in FIG. 11(A) will be described below as an example.

図１１（Ａ）に示す対象物体を扱う場合には、学習データ生成装置１０は、対象物体の形状、色、又は、これらの組み合わせ等を事前に学習する。このような学習を行うと、例えば、学習データ生成装置１０は、図１１（Ｂ）又は図１１（Ｃ）のように対象物体を認識できる。 When the target object shown in FIG. 11A is handled, the learning data generation device 10 learns in advance the shape, color, or combination thereof of the target object. By performing such learning, for example, the learning data generation device 10 can recognize a target object as shown in FIG. 11(B) or FIG. 11(C).

図１１（Ｂ）、及び、図１１（Ｃ）は、対象物体を認識した位置、及び、範囲等を破線で囲んで示す例である。なお、認識結果は、図１１（Ｂ）、及び、図１１（Ｃ）以外の形式で出力されてもよい。 FIGS. 11(B) and 11(C) are examples in which the position, range, and the like at which each target object is recognized are shown enclosed by broken lines. Note that the recognition result may be output in formats other than those shown in FIGS. 11(B) and 11(C).

図１１（Ｂ）は、対象物体を認識した結果の第１例を示す図である。例えば、図１１（Ｂ）に示すように、対象物体は、第１対象物体１０１、第２対象物体１０２、第３対象物体１０３、第４対象物体１０４、第５対象物体１０５、第６対象物体１０６、及び、第７対象物体１０７のように、学習データ生成装置１０によって認識される。 FIG. 11B is a diagram showing a first example of the result of recognizing the target object. For example, as shown in FIG. 11B, the target objects are a first target object 101, a second target object 102, a third target object 103, a fourth target object 104, a fifth target object 105, and a sixth target object. 106 and the seventh target object 107 are recognized by the learning data generation device 10 .

また、対象物体は、例えば、図１１（Ｃ）のような形式で認識されてもよい。 Also, the target object may be recognized in a format as shown in FIG. 11(C), for example.

図１１（Ｃ）は、対象物体を認識した結果の第２例を示す図である。第２例は、対象物体ごとに認識結果を別々の画像データに分ける形式の例である。具体的には、学習データ生成装置１０は、第１対象物体１０１、第２対象物体１０２、第３対象物体１０３、第４対象物体１０４、第５対象物体１０５、第６対象物体１０６、及び、第７対象物体１０７の認識結果を対象物体ごとに分けて出力する。 FIG. 11(C) is a diagram showing a second example of the result of recognizing the target objects. The second example is a format in which the recognition results are divided into separate image data for each target object. Specifically, the learning data generation device 10 outputs the recognition results of the first target object 101, the second target object 102, the third target object 103, the fourth target object 104, the fifth target object 105, the sixth target object 106, and the seventh target object 107 separately for each target object.

なお、対象物体の認識結果は、図１１（Ｂ）又は図１１（Ｃ）に示すように、画像データの形式にされなくともよい。すなわち、対象物体の認識結果は、中間生成物であり、対象物体が画像データ内において占める位置、大きさ、範囲、数、又は、座標等のパラメータ（統計値、又は、代表値を用いる場合を含む。）を学習データ生成装置１０が把握できる形式であればよい。 Note that the recognition result of the target objects need not be put into the form of image data as shown in FIG. 11(B) or FIG. 11(C). That is, the recognition result is an intermediate product, and any format suffices as long as the learning data generation device 10 can grasp parameters such as the position, size, range, number, or coordinates that the target objects occupy in the image data (including cases where statistical values or representative values are used).

したがって、学習データ生成装置１０は、認識結果を示すパラメータを内部に記憶し、図示するような画像データ等を出力しなくともよい。 Therefore, the learning data generation device 10 may store the parameters indicating the recognition result internally and need not output image data or the like as illustrated.
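
A recognition result held as parameters rather than as image data might look like the following. This is an illustrative sketch only: it assumes a label map (0 = background, positive integer = object identifier) as the starting point, and the chosen parameters (area in pixels, bounding box, centroid) and the function name `region_parameters` are hypothetical examples of "position, size, range, or coordinates".

```python
def region_parameters(labels):
    """Summarize each labeled object as area, bounding box, and centroid.

    Returns {object_id: {"area": int,
                         "bbox": (x_min, y_min, x_max, y_max),
                         "centroid": (x_mean, y_mean)}}.
    """
    sums = {}
    for y, row in enumerate(labels):
        for x, label in enumerate(row):
            if label == 0:
                continue
            s = sums.setdefault(label, {"area": 0, "sum_x": 0, "sum_y": 0,
                                        "x_min": x, "x_max": x,
                                        "y_min": y, "y_max": y})
            s["area"] += 1
            s["sum_x"] += x
            s["sum_y"] += y
            s["x_min"], s["x_max"] = min(s["x_min"], x), max(s["x_max"], x)
            s["y_min"], s["y_max"] = min(s["y_min"], y), max(s["y_max"], y)
    return {
        label: {
            "area": s["area"],
            "bbox": (s["x_min"], s["y_min"], s["x_max"], s["y_max"]),
            "centroid": (s["sum_x"] / s["area"], s["sum_y"] / s["area"]),
        }
        for label, s in sums.items()
    }
```

The number of recognized target objects is then simply the number of keys in the returned dictionary.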

ステップＳ０３０６では、学習データ生成装置１０は、学習データを用いて学習モデルを学習させる。例えば、学習データは、ステップＳ０６０１で生成する画像データ、すなわち、イラスト化した画像データ等である。なお、学習データは、学習データは、複数の形式の画像データでもよい。学習データの詳細は後述する。 In step S0306, the learning data generation device 10 makes the learning model learn using the learning data. For example, the learning data is image data generated in step S0601, that is, illustrated image data. Note that the learning data may be image data in a plurality of formats. Details of the learning data will be described later.

［全体処理の処理結果例］ [Example of processing result of overall processing]
図１２は、全体処理の処理結果例を示す図である。以下、図１２（Ａ）及び図１２（Ｂ）を摘果前及び摘果後とする場合を例に説明する。 FIG. 12 is a diagram illustrating an example of the processing result of the overall processing. Hereinafter, a case where FIGS. 12(A) and 12(B) show the state before and after thinning, respectively, will be described as an example.

図１２（Ａ）は、第１入力画像データ１１Ｄ１の例を示す図である。 FIG. 12A is a diagram showing an example of the first input image data 11D1.

図１２（Ｂ）は、第２入力画像データ１１Ｄ２の例を示す図である。 FIG. 12B is a diagram showing an example of the second input image data 11D2.

図１２（Ｃ）は、第２学習データの例を示す図である。 FIG. 12C is a diagram showing an example of the second learning data.

以下、第１入力画像データ１１Ｄ１において、すなわち、摘果作業の前において、図１２（Ａ）に示すように、第１対象物体１０１、第２対象物体１０２、第３対象物体１０３、第４対象物体１０４、第５対象物体１０５、第６対象物体１０６、及び、第７対象物体１０７の７つの対象物体がある例とする。 In the following example, the first input image data 11D1, that is, the state before the fruit thinning work, contains seven target objects as shown in FIG. 12(A): the first target object 101, the second target object 102, the third target object 103, the fourth target object 104, the fifth target object 105, the sixth target object 106, and the seventh target object 107.

一方で、第２入力画像データ１１Ｄ２において、すなわち、摘果作業が行われた後において、図１２（Ｂ）に示すように、第２対象物体１０２、第３対象物体１０３、第４対象物体１０４、及び、第６対象物体１０６の４つの対象物体が摘果対象物となり、摘果対象物が摘果される例とする。 On the other hand, in the second input image data 11D2, that is, after the fruit thinning work has been performed, four target objects, namely the second target object 102, the third target object 103, the fourth target object 104, and the sixth target object 106, are the thinning objects and have been thinned, as shown in FIG. 12(B).

このように、第１入力画像データ１１Ｄ１、及び、第２入力画像データ１１Ｄ２を比較すると、摘果対象物が抽出できる。このような抽出結果を学習すると、学習データ生成装置１０は、未知の第１入力画像データ１１Ｄ１が入力されると、摘果作業を推定し、推定結果画像データを生成できる。 In this way, by comparing the first input image data 11D1 and the second input image data 11D2, the thinning object can be extracted. By learning such extraction results, the learning data generation device 10 can estimate the fruit thinning work and generate estimation result image data when unknown first input image data 11D1 is input.
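
The comparison of the two images can be illustrated with object centroids: an object present before thinning that has no counterpart near the same position afterwards is treated as a thinning object. The sketch rests on assumptions the source does not state: both images are taken from the same viewpoint, each object is reduced to an (x, y) centroid, and a fixed pixel tolerance decides whether two centroids match. The function name `find_thinned` and the coordinates in the usage example are hypothetical.

```python
import math


def find_thinned(before, after, tolerance=5.0):
    """Return the identifiers of objects that disappeared between two images.

    before    -- {object_id: (x, y)} centroids in the image before thinning
    after     -- list of (x, y) centroids in the image after thinning
    tolerance -- maximum distance for two centroids to count as the same object
    """
    thinned = []
    for obj_id, (bx, by) in before.items():
        still_present = any(
            math.hypot(bx - ax, by - ay) <= tolerance for ax, ay in after
        )
        if not still_present:
            thinned.append(obj_id)
    return sorted(thinned)


# Usage with the seven objects of FIG. 12 (coordinates are made up).
before = {101: (10, 10), 102: (40, 12), 103: (70, 15), 104: (25, 40),
          105: (55, 42), 106: (85, 45), 107: (50, 70)}
after = [(11, 9), (54, 43), (50, 71)]  # 101, 105 and 107 remain
thinned_ids = find_thinned(before, after)
```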

このように生成される推定結果画像データ等が学習データ１５となる。そして、ＡＩ１６は、学習データ１５等を第２学習データとし、摘果作業を学習する。 The estimation result image data and the like generated in this way become the learning data 15 . Then, the AI 16 uses the learning data 15 and the like as second learning data, and learns the fruit thinning work.

図１２（Ｃ）は、対象物体を点線で囲んで示す形式の例を示す図である。また、図１２（Ｃ）は、摘果対象物をハッチングで示す形式の例を示す図である。 FIG. 12C is a diagram showing an example of a format in which the target object is surrounded by dotted lines. Also, FIG. 12C is a diagram showing an example of a format in which the thinning object is indicated by hatching.

なお、第２学習データは、図１２（Ｃ）に示す形式に限られない。すなわち、第２学習データは、摘果対象物の位置、数、配置、形状、又は、範囲等をＡＩ１６が学習できればよい。したがって、第２学習データは、摘果対象物、及び、対象物体を他の形式で特定してもよい。 Note that the second learning data is not limited to the format shown in FIG. 12(C). That is, as for the second learning data, it is sufficient that the AI 16 can learn the position, number, arrangement, shape, range, or the like of the thinning object. Therefore, the second learning data may specify the thinning target object and the target object in other formats.

［第３実施形態］ [Third embodiment]
図１３は、学習装置の構成例を示す図である。第１実施形態等と比較すると、第３実施形態における学習データ生成装置１０等の構成は、例えば、第１実施形態と同様である。一方で、学習装置３０１は、情報処理装置等である。なお、学習データ生成装置１０、及び、学習装置３０１は同じ情報処理装置等でもよい。 FIG. 13 is a diagram illustrating a configuration example of a learning device. Compared with the first embodiment and the like, the configuration of the learning data generation device 10 and the like in the third embodiment is, for example, the same as in the first embodiment. On the other hand, the learning device 301 is an information processing device or the like. Note that the learning data generation device 10 and the learning device 301 may be the same information processing device or the like.

第３実施形態は、第１実施形態、又は、第２実施形態における構成により生成された学習データ１５等を用いて学習モデル３０２を学習させて学習済みモデル３０３を生成する。 In the third embodiment, a learned model 303 is generated by learning a learning model 302 using the learning data 15 or the like generated by the configuration in the first embodiment or the second embodiment.

以下、学習中、又は、学習が行われる前のＡＩを単に「学習モデル３０２」という。一方で、ある程度、第２学習データによる学習が行われた後のＡＩを「学習済みモデル３０３」という。 Hereinafter, AI during learning or before learning is simply referred to as "learning model 302". On the other hand, the AI after learning with the second learning data to some extent is referred to as a "learned model 303".

学習装置３０１は、学習データ１５を入力する。そして、学習装置３０１は、学習データ１５により、学習モデル３０２を学習させる。 The learning device 301 inputs the learning data 15 . Then, the learning device 301 causes the learning model 302 to learn using the learning data 15 .

なお、学習には、学習データ１５以外のデータが用いられてもよい。例えば、学習装置３０１は、第１入力画像データ１１Ｄ１、及び、第２入力画像データ１１Ｄ２等も入力して、学習モデル３０２を学習させてもよい。ほかにも、学習装置３０１は、抽出結果等の形式で第２学習データを入力してもよい。 Note that data other than the learning data 15 may be used for learning. For example, the learning device 301 may also input the first input image data 11D1, the second input image data 11D2, etc., and allow the learning model 302 to learn. Alternatively, the learning device 301 may input the second learning data in the form of an extraction result or the like.

以上のように、学習装置３０１は、学習モデル３０２を学習させて学習済みモデル３０３を生成する。このような学習済みモデル３０３が生成できると、摘果対象物を推定するＡＩが実現できる。 As described above, the learning device 301 trains the learning model 302 to generate the trained model 303 . If such a learned model 303 can be generated, an AI for estimating a thinning object can be realized.

具体的には、学習装置３０１は、例えば、以下のような構成である。 Specifically, the learning device 301 has, for example, the following configuration.

図１４は、学習装置によって学習を行う構成の例を示す図である。図示するように、学習モデル３０２は、少なくとも生成部１０Ｆ３、及び、識別部１０Ｆ４を備える構成である。 FIG. 14 is a diagram showing an example of a configuration in which learning is performed by a learning device. As illustrated, the learning model 302 is configured to include at least a generation unit 10F3 and an identification unit 10F4.

そして、生成部１０Ｆ３、及び、識別部１０Ｆ４は、敵対的生成ネットワークにおける生成器、及び、識別器である。 The generation unit 10F3 and the identification unit 10F4 are, respectively, the generator and the discriminator of a generative adversarial network (GAN).

まず、学習装置３０１は、第１入力画像データ１１Ｄ１を入力する。 First, the learning device 301 inputs the first input image data 11D1.

次に、生成部１０Ｆ３は、第１入力画像データ１１Ｄ１が示す対象物体のうち、摘果対象物となる対象物体を推定する。そして、生成部１０Ｆ３は、推定結果画像データ２１を生成する。以下、図１２（Ｃ）と同様に、対象物体を点線で囲んで示し、かつ、対象物体のうち、摘果対象物をハッチングで示す形式の例で説明する。 Next, the generation unit 10F3 estimates which of the target objects indicated by the first input image data 11D1 are thinning objects. Then, the generation unit 10F3 generates the estimation result image data 21. The following description uses, as in FIG. 12(C), a format in which target objects are enclosed in dotted lines and, among them, thinning objects are indicated by hatching.

次に、推定結果画像データ２１が生成されると、識別部１０Ｆ４は、推定結果画像データ２１に対して、識別を行う。そして、識別部１０Ｆ４は、学習データ１５を「正解」とし、推定結果画像データ２１の識別を行う。 Next, when the estimation result image data 21 is generated, the identification unit 10F4 identifies the estimation result image data 21 . Then, the identification unit 10F4 identifies the estimation result image data 21 with the learning data 15 as “correct”.

具体的には、まず、推定結果画像データ２１は、摘果対象物の位置、及び、数等を示す。一方で、学習データ１５も、推定結果画像データ２１と同様に、摘果対象物の位置、及び、数等を示す。 Specifically, first, the estimation result image data 21 indicates the positions, numbers, and the like of thinning objects. On the other hand, similarly to the estimation result image data 21, the learning data 15 also indicates the positions, numbers, and the like of thinning objects.

以下に説明する例では、識別部１０Ｆ４は、推定結果画像データ２１を参照して、摘果対象物の位置、及び、数がどちらも学習データ１５と一致すると、「正解」と識別する。 In the example described below, the identification unit 10F4 refers to the estimation result image data 21, and if both the position and the number of the thinning target objects match the learning data 15, the identification unit 10F4 identifies them as "correct".

一方で、識別部１０Ｆ４は、推定結果画像データ２１が示す摘果対象物の位置、及び、数のうち、少なくともどちらか一方が学習データ１５と異なると、「誤答」と識別する。 On the other hand, if at least one of the positions and the number of thinning objects indicated by the estimation result image data 21 differs from the learning data 15, the identification unit 10F4 identifies the result as a "wrong answer".

そして、識別部１０Ｆ４は、少なくとも生成部１０Ｆ３に「正解」、又は、「誤答」の識別結果をフィードバックさせる。このように、フィードバックは、識別部１０Ｆ４から少なくとも生成部１０Ｆ３に、識別結果を伝える処理等である。 Then, the identification unit 10F4 feeds back the identification result, "correct answer" or "wrong answer", to at least the generation unit 10F3. In other words, the feedback is a process of conveying the identification result from the identification unit 10F4 to at least the generation unit 10F3.

なお、生成部１０Ｆ３の学習のため、フィードバックは、識別部１０Ｆ４による識別の過程、識別の基準、又は、識別の途中で生成した中間データ等を伝えてもよい。すなわち、フィードバックは、識別結果を出力するまでの過程、及び、途中で生成されたデータ等も識別結果とセットで伝えてもよい。そして、生成部１０Ｆ３は、フィードバックされる識別結果を参照して学習する。なお、他にセットでデータが送信される場合には、生成部１０Ｆ３は、セットのデータも参照して学習してもよい。 To support learning by the generation unit 10F3, the feedback may also convey the process of identification by the identification unit 10F4, the criteria for identification, intermediate data generated during identification, or the like. In other words, the feedback may convey the process leading up to the output of the identification result and the data generated along the way, together with the identification result as a set. The generation unit 10F3 then learns with reference to the identification result fed back to it. When other data is sent together with the result as a set, the generation unit 10F3 may also refer to that data when learning.

具体的には、図１４に示す例では、推定結果画像データ２１、及び、学習データ１５は、７個の対象物体から摘果対象物を選択して示す。そして、推定結果画像データ２１による推定結果、及び、学習データ１５による「正解」を比較すると、この例は、中央に位置する対象物体（図において、差異１５１で示す対象物体である。）が摘果対象物となるか否かが異なる。 Specifically, in the example shown in FIG. 14, the estimation result image data 21 and the learning data 15 each show thinning objects selected from seven target objects. Comparing the estimation result in the estimation result image data 21 with the "correct answer" in the learning data 15, the two differ in this example as to whether the target object located in the center (the target object indicated by the difference 151 in the figure) is a thinning object.

ゆえに、推定結果画像データ２１、及び、学習データ１５の比較結果は、摘果対象物の数、及び、差異１５１の判断結果が異なるため、差異があると識別される。したがって、比較結果に基づき、摘果対象物の数、及び、位置がいずれも基準とする学習データ１５と異なるため、識別部１０Ｆ４は、「誤答」と識別する。 Therefore, the comparison of the estimation result image data 21 with the learning data 15 is identified as showing a difference, because the number of thinning objects and the determination for the difference 151 both differ. Based on this comparison result, both the number and the positions of the thinning objects differ from the reference learning data 15, so the identification unit 10F4 identifies the result as a "wrong answer".

なお、識別部１０Ｆ４による識別は、基準に対して許容範囲があってもよい。例えば、数は、基準に対して２個以下であれば許容する等と設定されてもよい。このような許容範囲の設定である場合には、差異１５１の差異だけであれば、識別部１０Ｆ４は、「正解」と識別する。また、学習において、設定できる項目があってもよい。 Note that the identification by the identification unit 10F4 may allow a tolerance relative to the reference. For example, the number may be set so that a deviation of two or fewer from the reference is allowed. With such a tolerance, if the only discrepancy is the difference 151, the identification unit 10F4 identifies the result as a "correct answer". There may also be items that can be configured for learning.
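The identification rule and its tolerance can be sketched as follows. This is an illustrative reading, not the specification's implementation: it assumes thinning objects are represented as a set of object indices, and that the tolerance applies both to the count and to the number of differing positions. The function name `identify` and the data are hypothetical.

```python
def identify(estimate, reference, tolerance=0):
    """Return "correct" or "wrong" for an estimate compared with the reference.

    estimate, reference: sets of object indices marked as thinning objects
                         (hypothetical representation).
    tolerance: allowed deviation in the count and in the number of differing
               positions (an assumed interpretation of the tolerance).
    """
    count_gap = abs(len(estimate) - len(reference))
    # Objects marked in exactly one of the two sets.
    position_gap = len(estimate ^ reference)
    if count_gap <= tolerance and position_gap <= tolerance:
        return "correct"
    return "wrong"


# Seven target objects indexed 0..6; the center object (index 3) differs.
reference = {0, 2, 3, 5}  # made-up "correct answer"
estimate = {0, 2, 5}      # made-up estimate that misses object 3
print(identify(estimate, reference))               # strict comparison
print(identify(estimate, reference, tolerance=2))  # tolerance of two
```

With the strict setting the single differing object yields "wrong"; with a tolerance of two the same estimate is accepted, which mirrors the behavior described for the difference 151.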

そして、例えば、生成部１０Ｆ３が生成する複数の推定結果画像データ２１を専門家が見て、評価が行われる。具体的には、生成部１０Ｆ３が１００枚の推定結果画像データ２１を生成し、専門家が推定結果画像データ２１を見て１００枚ともすべて問題ないと判断すれば、生成部１０Ｆ３等は学習が完了したと評価される。 Then, for example, an expert views the plurality of pieces of estimation result image data 21 generated by the generation unit 10F3 and evaluates them. Specifically, if the generation unit 10F3 generates 100 pieces of estimation result image data 21 and the expert, on viewing them, judges that all 100 are acceptable, the generation unit 10F3 and the like are evaluated as having completed learning.

以上のような生成、及び、識別のフィードバックを繰り返すと、学習装置３０１は、推定結果画像データ２１の生成精度を高くできる。 By repeating generation and identification feedback as described above, the learning device 301 can improve the generation accuracy of the estimation result image data 21 .
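The cycle of generation, identification, and feedback can be illustrated with a deliberately small toy. This is not a generative adversarial network: the "generation unit" below is a one-feature linear scorer updated in a perceptron style, and the "identification unit" simply compares against the reference. All names, the fruit-size feature, and the data are assumptions made for illustration.

```python
def generate(sizes, weight, bias):
    """Toy generation unit: mark object i as a thinning object if its score is positive."""
    return {i for i, size in enumerate(sizes) if weight * size + bias > 0}


def identify(estimate, reference):
    """Toy identification unit: return the result together with the differing objects."""
    differing = sorted(estimate ^ reference)
    return ("correct" if not differing else "wrong"), differing


def train(sizes, reference, max_rounds=5000):
    """Repeat generate -> identify -> feed back until the estimate is "correct"."""
    weight, bias = 0.0, 0.0
    for round_no in range(max_rounds):
        estimate = generate(sizes, weight, bias)
        result, differing = identify(estimate, reference)
        if result == "correct":
            return weight, bias, round_no
        # Feedback: adjust the scorer toward the reference for one differing object.
        i = differing[0]
        target = 1.0 if i in reference else -1.0
        weight += target * sizes[i]
        bias += target
    raise RuntimeError("did not converge within max_rounds")


sizes = [1.0, 2.0, 4.0, 5.0, 2.5, 6.0, 1.5]  # made-up fruit sizes
reference = {0, 1, 4, 6}                     # made-up labels: small fruits are thinned
weight, bias, rounds = train(sizes, reference)
print(generate(sizes, weight, bias) == reference)
```

The feedback here includes the differing objects in addition to the correct/wrong result, in line with the remark that intermediate data may be conveyed together with the identification result.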

なお、学習装置３０１は、生成、又は、識別において、摘果対象物を抽出するのが望ましい。具体的には、学習装置３０１は、生成、又は、識別において、マスク画像データの生成、及び、イラスト化等の処理を行う。 Note that the learning device 301 preferably extracts thinning objects during generation or identification. Specifically, the learning device 301 performs processing such as generation of mask image data and illustration during generation or identification.

このように、画像データをマスクする、イラスト化する、又は、両方の処理を行って、抽出を行うと、抽出結果等を単純化して表現できる。そして、抽出結果は、対象物体の位置、及び、形状等が大まかに表現できればよい場合が多い。すなわち、抽出結果には、細かな色の変化、摘果作業に関係の薄い被写体、及び、背景等のデータが不要な場合が多い。 In this way, if the image data is masked, illustrated, or both are processed for extraction, extraction results and the like can be expressed in a simplified manner. In many cases, it is sufficient that the extraction result can roughly express the position, shape, and the like of the target object. That is, in many cases, extraction results do not require data such as fine color changes, subjects with little relation to the fruit thinning work, backgrounds, and the like.

特に、農作物がある環境は、周囲の環境をＡＩの学習用、及び、撮影用に調整しにくい場合も多い。また、農作物がある環境は、不意に関係の薄い被写体も入り込みやすい環境である場合が多い。したがって、画像データをマスクする処理により、このような外乱を少なくできると、ＡＩは、摘果作業の内容を把握するのに重要な特徴量を精度良く学習できる。 In particular, in environments with crops, it is often difficult to adjust the surrounding environment for AI learning and photography. In addition, in many cases, an environment with crops is an environment in which it is easy for subjects with little relation to enter unexpectedly. Therefore, if such a disturbance can be reduced by masking the image data, the AI can accurately learn the feature quantity that is important for grasping the content of the fruit thinning work.

また、対象物体をイラスト化して単色で簡略に示す、又は、重要な部分に絞った画像データとする方が、写真形式等と比較して、摘果作業の内容を学習する妨げとなる要素を排除し、精度良く学習できる。すなわち、画像データに対して抽出処理を前処理として施して、摘果作業をＡＩに学習させると、ＡＩは、摘果作業の内容を把握するのに重要な特徴量を精度良く学習できる。 Also, illustrating the target object so that it is shown simply in a single color, or using image data narrowed down to the important parts, removes elements that hinder learning of the fruit thinning work and allows more accurate learning than a photographic format or the like. That is, if extraction processing is applied to the image data as preprocessing before the AI learns the fruit thinning work, the AI can accurately learn the feature quantities that are important for grasping the content of the fruit thinning work.
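A rough sketch of masking and illustration as preprocessing is given below. It assumes, purely for illustration, that each target object has already been located and can be approximated by a circle; the function names `make_mask` and `illustrate`, the circle representation, and the colors are hypothetical.

```python
def make_mask(width, height, objects):
    """Return a binary mask: 1 inside any target object, 0 elsewhere.

    objects: list of (cx, cy, radius) circles standing in for detected fruits
             (an assumed representation).
    """
    mask = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            if any((x - cx) ** 2 + (y - cy) ** 2 <= r * r for cx, cy, r in objects):
                mask[y][x] = 1
    return mask


def illustrate(mask, object_color=(255, 0, 0), background_color=(255, 255, 255)):
    """Turn a mask into a flat two-color image, discarding fine color detail."""
    return [[object_color if cell else background_color for cell in row] for row in mask]


mask = make_mask(8, 5, [(2, 2, 1), (6, 2, 1)])
for row in mask:
    print("".join("#" if cell else "." for cell in row))
```

Both outputs keep only the rough position and shape of the target objects, which is the simplification the text describes: background, unrelated subjects, and subtle color variation are dropped before learning.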

なお、識別部１０Ｆ４は、推定結果画像データ２１、識別結果、及び、学習データ１５等で学習して識別の精度を向上させてもよい。 Note that the identification unit 10F4 may improve the accuracy of identification by learning using the estimation result image data 21, the identification result, the learning data 15, and the like.

また、学習データ１５は、学習データ生成装置１０が生成したデータでもよいし、第１入力画像データ１１Ｄ１を操作して生成したデータでもよいし、又は、これらの組み合わせでもよい。 The learning data 15 may be data generated by the learning data generation device 10, data generated by manipulating the first input image data 11D1, or a combination thereof.

さらに、推定結果画像データ２１、及び、学習データ１５の形式は、図示する形式に限られない。すなわち、推定結果画像データ２１、及び、学習データ１５の形式は、摘果作業の内容が特定できればよい。例えば、推定結果画像データ２１、及び、学習データ１５の形式は、摘果対象物の位置、及び、数等の内容を数値（画像内の座標又は数量等を示す。）を用いる形式等でもよい。 Furthermore, the formats of the estimation result image data 21 and the learning data 15 are not limited to the illustrated formats. That is, the format of the estimation result image data 21 and the learning data 15 should be able to specify the details of the fruit thinning work. For example, the format of the estimation result image data 21 and the learning data 15 may be a format using numerical values (indicating coordinates or quantities in the image) for contents such as the positions and numbers of thinning objects.

なお、識別の基準は、摘果対象物の位置、及び、数に限られず、他の基準でもよい。そして、何を基準にして識別するかも学習の対象となってよい。また、何を基準にして識別するかは、人が設定できてもよい。 Note that the criteria for identification are not limited to the position and number of thinning objects, and other criteria may be used. What is used as a reference for identification may also be an object of learning. Moreover, what is used as a reference for identification may be set by a person.

［機能構成例］ [Example of functional configuration]
図１５は、学習装置の機能構成例を示す図である。例えば、学習装置３０１は、画像データ入力部１０Ｆ１、学習データ入力部３０１Ｆ１、生成部１０Ｆ３、及び、識別部１０Ｆ４等を備える機能構成である。なお、学習装置３０１は、抽出部１０Ｆ２、マスク画像データ生成部１０Ｆ５、及び、イラスト化処理部１０Ｆ６を更に備える機能構成であるのが望ましい。以下、図示する機能構成を例に説明する。 FIG. 15 is a diagram illustrating a functional configuration example of a learning device. For example, the learning device 301 has a functional configuration including an image data input unit 10F1, a learning data input unit 301F1, a generation unit 10F3, an identification unit 10F4, and the like. The learning device 301 preferably has a functional configuration further including an extraction unit 10F2, a mask image data generation unit 10F5, and an illustration processing unit 10F6. The illustrated functional configuration will be described below as an example.

画像データ入力部１０Ｆ１は、第１入力画像データ１１Ｄ１を入力する画像データ入力手順を行う。例えば、画像データ入力部１０Ｆ１は、カメラ１１、及び、インタフェース１０Ｈ３等で実現する。 The image data input unit 10F1 performs an image data input procedure for inputting the first input image data 11D1. For example, the image data input unit 10F1 is implemented by the camera 11, the interface 10H3, and the like.

生成部１０Ｆ３は、推定結果画像データ２１を生成する生成手順を行う。例えば、生成部１０Ｆ３は、ＣＰＵ１０Ｈ１等で実現する。 The generation unit 10F3 performs a generation procedure for generating the estimation result image data 21. For example, the generation unit 10F3 is realized by the CPU 10H1 or the like.

識別部１０Ｆ４は、学習データ１５と比較して、推定結果画像データ２１を識別して、識別結果を生成部１０Ｆ３へフィードバックさせて学習モデル３０２を学習させる識別手順を行う。例えば、識別部１０Ｆ４は、ＣＰＵ１０Ｈ１等で実現する。 The identification unit 10F4 compares the estimation result image data 21 with the learning data 15 and feeds back the identification result to the generation unit 10F3 to perform the identification procedure for learning the learning model 302 . For example, the identification unit 10F4 is implemented by the CPU 10H1 or the like.

推定結果画像データ２１、及び、学習データ１５は、どちらか一方、又は、両方が抽出部１０Ｆ２、マスク画像データ生成部１０Ｆ５、及び、イラスト化処理部１０Ｆ６により、マスク画像データを生成する、イラスト化する、又は、両方の処理を行う抽出処理がされるのが望ましい。 It is desirable that either or both of the estimation result image data 21 and the learning data 15 undergo extraction processing by the extraction unit 10F2, the mask image data generation unit 10F5, and the illustration processing unit 10F6, that is, generation of mask image data, illustration, or both.

このように、摘果対象物が抽出されると、単純に農作物を撮影した画像データをそのまま用いる場合等と比較して、学習モデル３０２は、摘果対象物等の重要な特徴量を精度良く学習できる。すなわち、学習装置３０１は、学習モデル３０２を学習させて、摘果作業を精度良く推定できる学習済みモデル３０３を生成できる。 In this way, when the thinning target object is extracted, the learning model 302 can learn important feature quantities of the thinning target object and the like more accurately than when image data of the crops is simply used as it is. That is, the learning device 301 can train the learning model 302 and generate a learned model 303 that can accurately estimate the fruit thinning work.

［第４実施形態］ [Fourth Embodiment]
図１６は、摘果対象物推定装置の構成例を示す図である。以下、未知の摘果前の農作物を示す画像データの例を「未知画像データ４０１」という。 FIG. 16 is a diagram illustrating a configuration example of a thinning target object estimation device. Hereinafter, an example of image data representing an unknown crop before thinning is referred to as "unknown image data 401".

第４実施形態は、第３実施形態による学習によって生成された学習済みモデル３０３を実行する実施形態である。以下、学習済みモデル３０３を用いる摘果対象物推定装置を「摘果対象物推定装置４０２」とする。 The fourth embodiment is an embodiment for executing the trained model 303 generated by learning according to the third embodiment. Hereinafter, the thinning target object estimation device using the learned model 303 is referred to as a "thinning target object estimation device 402".

摘果対象物推定装置４０２は、例えば、スマートフォン等の情報処理装置である。なお、学習済みモデル３０３は、他のサーバ装置等が用いる構成であって、摘果対象物推定装置４０２は、サーバ装置と通信して学習済みモデル３０３による推定結果を取得し、出力する構成でもよい。 The thinning target object estimation device 402 is, for example, an information processing device such as a smartphone. Alternatively, the learned model 303 may be used by another server device or the like, and the thinning target object estimation device 402 may communicate with the server device to acquire and output the estimation result of the learned model 303.

具体的には、学習済みモデル３０３は、ネットワーク等を介して配布される。なお、学習済みモデル３０３は、アプリケーションソフト等に組み込まれる形式等でもよい。このように配布される学習済みモデル３０３を摘果対象物推定装置４０２にインストールすると、摘果対象物推定装置４０２は、図示するような推定、及び、推定結果の出力等ができる状態となる。 Specifically, the trained model 303 is distributed via a network or the like. Note that the learned model 303 may be in a format incorporated in application software or the like. When the learned model 303 distributed in this way is installed in the thinning target object estimation device 402, the thinning target object estimation device 402 is put into a state in which it is possible to perform estimation as shown in the figure, output of the estimation result, and the like.

未知画像データ４０１は、摘果対象物推定装置４０２が撮影する画像データである。また、未知画像データ４０１が示す農作物は、摘果作業が行われる前の状態である。このように、未知画像データ４０１が示す農作物は、第１実施形態、又は、第２実施形態において、学習の対象となった農作物とは異なる「未知」の農作物である。 The unknown image data 401 is image data captured by the thinning object estimation device 402 . Also, the crop indicated by the unknown image data 401 is in a state before the fruit thinning work is performed. In this way, the crops indicated by the unknown image data 401 are "unknown" crops different from the crops that were the target of learning in the first embodiment or the second embodiment.

なお、摘果対象物推定装置４０２は、推定において、摘果対象物を抽出するのが望ましい。具体的には、摘果対象物推定装置４０２は、推定において、マスク画像データの生成、及び、イラスト化等の処理を行うのが望ましい。このような摘果対象物の抽出が行われると、摘果対象物推定装置４０２は、推定を精度良くできる。 In addition, the thinning target object estimation device 402 preferably extracts the thinning target object in the estimation. Specifically, the thinning object estimation device 402 desirably performs processes such as generation of mask image data and illustration during estimation. When such a thinning target object is extracted, the thinning target object estimation device 402 can perform estimation with high accuracy.

摘果対象物推定装置４０２は、未知画像データ４０１に基づき、対象物体を識別する。そして、摘果対象物推定装置４０２は、学習済みモデル３０３により、摘果対象物を推定する。例えば、推定結果は、ＡｕｇｍｅｎｔｅｄＲｅａｌｉｔｙ（ＡＲ、拡張現実）の形式等で出力される。具体的には、摘果対象物推定装置４０２は、出力画面４０３をユーザ４０４に対して表示する。 A thinning object estimation device 402 identifies a target object based on the unknown image data 401 . Then, the thinning target object estimation device 402 estimates the thinning target object using the learned model 303 . For example, the estimation result is output in the form of Augmented Reality (AR). Specifically, the thinning object estimation device 402 displays the output screen 403 to the user 404 .

出力画面４０３は、未知画像データ４０１の上に「×」を重ねて表示して、摘果対象物をユーザ４０４に伝える画面である。なお、出力は、他の表示形式、又は、音声を用いる等の形式でもよい。 The output screen 403 is a screen that displays an “x” superimposed on the unknown image data 401 to inform the user 404 of the object to be thinned. Note that the output may be in another display format or in a format using voice.
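The idea of overlaying "×" on the thinning objects can be sketched with a text grid in place of a camera image. This is only an illustration of the display logic; the function name `render_overlay`, the grid coordinates, and the object identifiers are hypothetical, and an actual AR display would draw onto the captured image instead.

```python
def render_overlay(width, height, objects, thinning):
    """Draw every object as 'o' and overlay '×' on the ones to be thinned.

    objects: dict mapping an object id to its (x, y) grid cell (assumed layout).
    thinning: set of object ids estimated as thinning objects.
    """
    grid = [["." for _ in range(width)] for _ in range(height)]
    for obj_id, (x, y) in objects.items():
        grid[y][x] = "×" if obj_id in thinning else "o"
    return "\n".join("".join(row) for row in grid)


# Made-up layout: three fruits, of which "b" is estimated as a thinning object.
objects = {"a": (1, 0), "b": (3, 1), "c": (5, 0)}
print(render_overlay(7, 2, objects, thinning={"b"}))
```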

なお、摘果対象物推定装置４０２は、例えば、「最適化項目設定」の操作画面（以下単に「設定画面４０５」という。）等により、項目を受け付ける構成があるのが望ましい。 In addition, it is desirable that the thinning object estimation device 402 has a configuration for accepting items, for example, through an operation screen for "optimization item setting" (hereinafter simply referred to as "setting screen 405").

摘果作業は、いわゆる好みに応じて行われる場合がある。そこで、設定画面４０５は、好み等を設定するインタフェースである。 Fruit thinning work may be performed according to so-called preference. Therefore, the setting screen 405 is an interface for setting preferences.

設定画面４０５は、「甘味」、「酸味」、「サイズ（全体）」、「サイズ（粒）」、「色」、「均一性」、及び、「ケースに入る形状にする。」等の項目を設定する例である。なお、項目、及び、設定形式は事前に定める。 The setting screen 405 is an example in which items such as "sweetness", "sourness", "size (whole)", "size (grain)", "color", "uniformity", and "shape to fit in a case" are set. The items and the setting format are determined in advance.

「甘味」、及び、「酸味」は、収穫時の農作物の味を調整する項目である。 “Sweetness” and “sourness” are items for adjusting the taste of crops at the time of harvest.

「サイズ（全体）」は、収穫時の農作物の全体的なサイズを調整する項目である。例えば、「サイズ（全体）」は、複数の実を有する農作物等の場合に、複数の実による全体的なバランス等を調整するのに用いる。 “Size (whole)” is an item for adjusting the overall size of crops at the time of harvest. For example, "size (whole)" is used to adjust the overall balance of a plurality of fruits in the case of crops having a plurality of fruits.

「サイズ（実）」は、収穫時の農作物の１つの実当たりのサイズを調整する項目である。例えば、「サイズ（実）」は、複数の実を有する農作物等の場合に、１つ当たりの実の大きさ等を調整するのに用いる。 "Size (fruit)" is an item for adjusting the size of each individual fruit of the crop at the time of harvest. For example, "size (fruit)" is used to adjust the size of each fruit in the case of crops bearing a plurality of fruits.

「色」は、収穫時の農作物の色を調整する項目である。 “Color” is an item for adjusting the color of crops at the time of harvest.

「均一性」は、収穫時の農作物の実の大きさを均一にするかを調整する項目である。 “Uniformity” is an item for adjusting whether or not the fruit size of crops at the time of harvesting is uniform.

「ケースに入る形状にする」は、出荷に用いる所定の形状に収まるサイズにするか否かを調整する項目である。このように、項目は、チェックボックス形式で入力されてもよい。 “Make a shape to fit in a case” is an item for adjusting whether or not the size should fit within a predetermined shape used for shipping. Thus, items may be entered in the form of checkboxes.

また、「ケースに入る形状にする」は、例えば、「縦（ｍｍ）×横（ｍｍ）×高さ（ｍｍ）のケースに入るように」等のように、ケースのサイズが数値で指定できる形式等でもよい。 Also, "shape to fit in a case" may use a format in which the case size can be specified numerically, for example "to fit in a case of length (mm) x width (mm) x height (mm)".

これらの項目は、摘果作業で調整できる項目である。また、どのような摘果作業を行うと、どの項目に影響するかは、学習（すなわち、第３実施形態である。）において、学習データに入力される。例えば、農作物が甘くなる摘果作業、又は、農作物を大きくする摘果作業等のように、学習モデルは摘果作業の目的ごとに学習する。したがって、学習済みモデルは、項目を最適化する摘果作業を特定できる。また、程度（例えば、甘さ、又は、大きさ等である。）は、例えば、数値等で入力する。 These items are items that can be adjusted in the fruit thinning operation. Also, what kind of fruit thinning work affects which item is input to learning data in learning (that is, in the third embodiment). For example, the learning model learns for each purpose of fruit thinning work, such as fruit thinning work that makes the crops sweeter or fruit thinning work that makes the crops larger. Therefore, the trained model can identify the thinning operation that optimizes the item. Also, the degree (for example, sweetness, size, etc.) is input, for example, as a numerical value.
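One way the settings accepted on the setting screen might be held in memory is sketched below. The class name `OptimizationSettings`, the field names, and the value ranges are hypothetical; the specification only states that items and their setting format are determined in advance and that degrees may be entered as numerical values.

```python
from dataclasses import dataclass, asdict
from typing import Optional, Tuple


@dataclass
class OptimizationSettings:
    """Hypothetical container for the items accepted on the setting screen 405."""
    sweetness: Optional[float] = None   # degree entered as a number, if set
    sourness: Optional[float] = None
    size_whole: Optional[float] = None
    size_fruit: Optional[float] = None
    color: Optional[float] = None
    uniformity: bool = False            # checkbox-style item
    fit_in_case: bool = False           # checkbox-style item
    case_mm: Optional[Tuple[int, int, int]] = None  # length, width, height in mm

    def active_items(self):
        """Return only the items the user actually set."""
        return {
            key: value
            for key, value in asdict(self).items()
            if value is not None and value is not False
        }


# Made-up example: optimize for sweetness and a 300 x 200 x 100 mm case.
settings = OptimizationSettings(sweetness=0.8, fit_in_case=True, case_mm=(300, 200, 100))
print(settings.active_items())
```

Keeping unset items out of `active_items()` reflects that only the selected items are meant to steer which thinning operation the learned model proposes.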

なお、項目を受け付ける受付部は、設定画面４０５に限られない。すなわち、設定できる項目は、図示する以外の項目があってもよい。また、受付部は、タスクバー、又は、チェックボックス以外のインタフェースでよい。例えば、受付部は、テキストボックス等で入力するインタフェースでよい。さらに、最適化する項目は、固定であってもよい。 Note that the reception unit that receives items is not limited to the setting screen 405 . That is, the items that can be set may include items other than those shown in the figure. Also, the reception unit may be an interface other than a taskbar or a check box. For example, the reception unit may be an interface for inputting using a text box or the like. Furthermore, the items to be optimized may be fixed.

図１７は、摘果対象物推定装置によって推定を行う構成の例を示す図である。例えば、学習済みモデル３０２は、第３実施形態による学習後、第３実施形態で用いた敵対的生成ネットワークを構成する生成部１０Ｆ３、及び、識別部１０Ｆ４のうち、識別部１０Ｆ４を取り除いた構成である。 FIG. 17 is a diagram illustrating an example of a configuration for performing estimation by the thinning target object estimation device. For example, the trained model 302 is, after the learning of the third embodiment, the configuration obtained by removing the identification unit 10F4 from the generation unit 10F3 and the identification unit 10F4 that make up the generative adversarial network used in the third embodiment.

すなわち、摘果対象物推定装置４０２は、未知画像データ４０１を入力すると、未知画像データ４０１が示す対象物体に適した摘果作業を推定する。そして、摘果対象物推定装置４０２は、推定結果を示す推定結果画像データ２１を出力する。 That is, when the unknown image data 401 is input, the thinning target object estimation device 402 estimates a thinning work suitable for the target object indicated by the unknown image data 401 . Then, the thinning target object estimation device 402 outputs the estimation result image data 21 indicating the estimation result.

なお、識別部１０Ｆ４は、機能が停止していればよい。すなわち、学習済みモデル３０２は、学習モデル３０２と同様に識別部１０Ｆ４を有しても、識別部１０Ｆ４を停止させればよい。一方で、学習済みモデル３０２は、識別部１０Ｆ４を取り除く、又は、識別部１０Ｆ４がない構成とし、識別部１０Ｆ４の構成が全くなくともよい。 Note that it suffices for the identification unit 10F4 to be non-functional. That is, the trained model 302 may include the identification unit 10F4, as the learning model 302 does, as long as the identification unit 10F4 is stopped. Alternatively, the trained model 302 may have the identification unit 10F4 removed, or may be built without it, so that no identification unit 10F4 is present at all.

［機能構成例］ [Example of functional configuration]
図１８は、摘果対象物推定装置の機能構成例を示す図である。例えば、摘果対象物推定装置４０２は、画像データ入力部１０Ｆ１、推定部４０２Ｆ１、及び、出力部４０２Ｆ２等を備える機能構成である。なお、摘果対象物推定装置４０２は、抽出部１０Ｆ２、マスク画像データ生成部１０Ｆ５、及び、イラスト化処理部１０Ｆ６を更に備える機能構成であるのが望ましい。以下、図示する機能構成を例に説明する。 FIG. 18 is a diagram illustrating a functional configuration example of a thinning target object estimation device. For example, the thinning object estimation device 402 has a functional configuration including an image data input unit 10F1, an estimation unit 402F1, an output unit 402F2, and the like. It is desirable that the thinning object estimation device 402 has a functional configuration further including an extraction unit 10F2, a mask image data generation unit 10F5, and an illustration processing unit 10F6. The illustrated functional configuration will be described below as an example.

推定部４０２Ｆ１は、学習済みモデル３０３により、摘果対象物を推定する推定手順を行う。例えば、推定部４０２Ｆ１は、ＣＰＵ１０Ｈ１等で実現する。 The estimating unit 402F1 performs an estimating procedure for estimating the thinning object using the learned model 303 . For example, the estimation unit 402F1 is realized by the CPU 10H1 or the like.

例えば、推定部４０２Ｆ１は、生成部１０Ｆ３等で構成する。 For example, the estimation unit 402F1 is composed of the generation unit 10F3 and the like.

出力部４０２Ｆ２は、推定結果を出力する出力手順を行う。例えば、出力部４０２Ｆ２は、出力装置１０Ｈ５等で実現する。 The output unit 402F2 performs an output procedure for outputting the estimation result. For example, the output unit 402F2 is realized by the output device 10H5 or the like.

未知画像データ４０１は、抽出部１０Ｆ２、マスク画像データ生成部１０Ｆ５、及び、イラスト化処理部１０Ｆ６により、マスク画像データを生成する、イラスト化する、又は、両方の処理を行う抽出処理がされるのが望ましい。 It is desirable that the unknown image data 401 undergo extraction processing by the extraction unit 10F2, the mask image data generation unit 10F5, and the illustration processing unit 10F6, that is, generation of mask image data, illustration, or both.

推定においても、学習した要素にできるだけ注目した方が、摘果対象物推定装置４０２は、摘果対象物等を精度良く推定できる。 Also in estimation, the thinning target object estimation device 402 can accurately estimate the thinning target object and the like by paying attention to the learned elements as much as possible.

このように、未知画像データ４０１において摘果対象物が抽出されると、単純に農作物を撮影した画像データをそのまま用いる場合等と比較して、摘果対象物推定装置４０２は、摘果対象物等を精度良く推定できる。 In this way, when the thinning target object is extracted from the unknown image data 401, the thinning target object estimation device 402 can estimate the thinning target object and the like more accurately than when image data of the crops is simply used as it is.

［学習システムの機能構成例］ [Example of functional configuration of learning system]
図１９は、機能構成例を示す図である。例えば、学習データ生成装置１０は、画像データ入力部１０Ｆ１、抽出部１０Ｆ２、生成部１０Ｆ３、及び、識別部１０Ｆ４等を備える機能構成である。また、学習データ生成装置１０は、図示するように、マスク画像データ生成部１０Ｆ５、及び、イラスト化処理部１０Ｆ６等を更に備える機能構成であるのが望ましい。 FIG. 19 is a diagram illustrating an example of a functional configuration. For example, the learning data generation device 10 has a functional configuration including an image data input unit 10F1, an extraction unit 10F2, a generation unit 10F3, an identification unit 10F4, and the like. Moreover, as shown in the drawing, the learning data generation device 10 preferably has a functional configuration further including a mask image data generation unit 10F5, an illustration processing unit 10F6, and the like.

画像データ入力部１０Ｆ１は、第１入力画像データ１１Ｄ１、及び、第２入力画像データ１１Ｄ２を入力する画像データ入力手順を行う。例えば、画像データ入力部１０Ｆ１は、カメラ１１、及び、インタフェース１０Ｈ３等で実現する。 The image data input unit 10F1 performs an image data input procedure for inputting first input image data 11D1 and second input image data 11D2. For example, the image data input unit 10F1 is implemented by the camera 11, the interface 10H3, and the like.

抽出部１０Ｆ２は、対象物体のうち、第１入力画像データ１１Ｄ１、及び、第２入力画像データ１１Ｄ２の差異となる対象物体を摘果対象物として抽出する抽出手順を行う。例えば、抽出部１０Ｆ２は、ＣＰＵ１０Ｈ１等で実現する。 The extraction unit 10F2 performs an extraction procedure for extracting target objects that are different between the first input image data 11D1 and the second input image data 11D2 from among the target objects as thinning targets. For example, the extraction unit 10F2 is realized by the CPU 10H1 or the like.

生成部１０Ｆ３は、抽出結果を示す画像データを第１学習データとして学習し、かつ、推定結果画像データを生成する生成手順を行う。例えば、生成部１０Ｆ３は、ＣＰＵ１０Ｈ１等で実現する。 The generation unit 10F3 learns the image data representing the extraction result as the first learning data, and performs a generation procedure of generating estimation result image data. For example, the generation unit 10F3 is realized by the CPU 10H1 or the like.

識別部１０Ｆ４は、推定結果画像データを識別して、識別結果に基づき第２学習データを生成する識別手順を行う。例えば、識別部１０Ｆ４は、ＣＰＵ１０Ｈ１等で実現する。 The identification unit 10F4 identifies the estimation result image data and performs an identification procedure of generating second learning data based on the identification result. For example, the identification unit 10F4 is implemented by the CPU 10H1 or the like.

マスク画像データ生成部１０Ｆ５は、対象物体、及び、対象物体以外を区別して示すマスク画像データを生成するマスク画像データ生成手順を行う。例えば、マスク画像データ生成部１０Ｆ５は、ＣＰＵ１０Ｈ１等で実現する。 The mask image data generation unit 10F5 performs a mask image data generation procedure for generating mask image data that distinguishes between a target object and non-target objects. For example, the mask image data generation unit 10F5 is realized by the CPU 10H1 or the like.

イラスト化処理部１０Ｆ６は、対象物体、及び、対象物体以外をイラスト化するイラスト化手順を行う。例えば、イラスト化処理部１０Ｆ６は、ＣＰＵ１０Ｈ１等で実現する。 The illustration processing unit 10F6 performs an illustration procedure for illustrating a target object and objects other than the target object. For example, the illustration processing unit 10F6 is realized by the CPU 10H1 or the like.

以上のように、学習データ生成装置１０は、学習データ１５等の第２学習データを生成する。このように、第２学習データを生成できると、学習データを人手で生成する場合等と比較して、農作物の摘果箇所を推定するＡＩ用の学習データを用意する作業負荷を軽減できる。例えば、農作物の摘果箇所を推定するＡＩ用の学習データは、少なくとも数千枚の画像データを用意する必要がある。このような用意を行うには、少なくとも１年乃至数年程度の準備期間を要する場合が多い。 As described above, the learning data generation device 10 generates the second learning data such as the learning data 15 and the like. In this way, when the second learning data can be generated, the workload of preparing learning data for AI for estimating thinning locations of crops can be reduced compared to the case of manually generating learning data. For example, it is necessary to prepare at least several thousand pieces of image data as learning data for AI for estimating thinning locations of crops. Such preparation often requires a preparation period of at least one to several years.

特に、農作物は、屋外等のように、いわゆる自然光下で撮影される場合が多い。このような照明環境下は、工場等より、照明環境が安定しない条件の場合が多い。具体的には、日光等は、人為的に調整するのが難しい。ゆえに、自然光は、工場等の照明等と比較して、光の強さ、向き、又は、影の有無等といった様々な条件が変動する。ゆえに、農作物を対象とする撮影は、照明環境が工場内等の屋内と比較して条件が厳しい場合が多い。このような外乱の多い条件下でＡＩを用いる場合には、特に学習データが多いのが望ましい。 In particular, crops are often photographed under so-called natural light, for example outdoors. In such an environment, the lighting is often less stable than in a factory or the like. Specifically, sunlight and the like are difficult to adjust artificially. Natural light therefore varies in conditions such as intensity, direction, and the presence or absence of shadows, compared with factory lighting. As a result, photographing crops often involves harsher lighting conditions than indoor settings such as a factory. When AI is used under conditions with this much disturbance, a large amount of learning data is particularly desirable.

なお、準備期間は、対象とする農作物の周期によって異なる。 Note that the preparation period varies depending on the cycle of the target agricultural products.

さらに、ＡＩの推定精度を十分に高めようとするのであれば、学習データは、更に多く準備されるのが望ましい。例えば、バーニーおじさんのルール（ＵｎｃｌｅＢｅｒｎｉｅ‘ｓｒｕｌｅ）等に基づくと、ＡＩの学習には、ニューラルネットワークにおけるパラメータ数の１０倍以上の学習データを準備するのが望ましい。したがって、農作物の摘果箇所を推定するＡＩ用の学習データは、数万枚乃至数十万枚以上の画像データが準備されるのが望ましい場合もある。 Furthermore, if an attempt is made to sufficiently improve the AI estimation accuracy, it is desirable to prepare a larger amount of learning data. For example, based on Uncle Bernie's rule, it is desirable to prepare learning data ten times or more the number of parameters in the neural network for AI learning. Therefore, it may be desirable to prepare tens of thousands to hundreds of thousands of image data as AI learning data for estimating thinning locations of crops.
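The rule of thumb cited above reduces to simple arithmetic, sketched below. The function name `min_training_samples` and the parameter count of 5,000 are made up for illustration; the specification does not state the size of the neural network.

```python
def min_training_samples(parameter_count, factor=10):
    """Rule-of-thumb lower bound: `factor` training samples per network parameter."""
    return factor * parameter_count


# Made-up example: a small network with 5,000 parameters.
print(min_training_samples(5_000))
```

Under this rule, a network with several thousand to several tens of thousands of parameters would call for learning data on the order of tens of thousands to hundreds of thousands of images, which matches the range mentioned in the text.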

準備する学習データの量が多くなれば、学習データを実物の農作物を撮影して生成する場合には、準備期間が長くなり、作業負荷も大きくなりやすい。このように、作業負荷が大きくなると、開発コストの増大、及び、開発の長期化等の原因になる。 As the amount of learning data to be prepared increases, the preparation period becomes longer and the workload tends to increase when the learning data is generated by photographing the actual crops. As described above, when the work load increases, it causes an increase in development costs and a prolonged development period.

一方で、本実施形態のように、学習データを生成できると、少ない作業負荷で多くの学習データを用意できる。したがって、学習データを用意する作業負荷を軽減できる。 On the other hand, if learning data can be generated as in this embodiment, a large amount of learning data can be prepared with a small workload. Therefore, the workload of preparing learning data can be reduced.

学習装置３０１は、例えば、学習データ入力部３０１Ｆ１、及び、学習部３０１Ｆ２等を備える機能構成である。 The learning device 301 has, for example, a functional configuration including a learning data input unit 301F1, a learning unit 301F2, and the like.

学習データ入力部３０１Ｆ１は、第２学習データを入力する学習データ入力手順を行う。例えば、学習データ入力部３０１Ｆ１は、インタフェース１０Ｈ３等で実現する。 The learning data input unit 301F1 performs a learning data input procedure for inputting second learning data. For example, the learning data input unit 301F1 is realized by the interface 10H3 or the like.

学習部３０１Ｆ２は、第２学習データにより、学習モデル３０２を学習させる学習手順を行う。例えば、学習部３０１Ｆ２は、ＣＰＵ１０Ｈ１等で実現する。 The learning unit 301F2 performs a learning procedure for learning the learning model 302 using the second learning data. For example, the learning unit 301F2 is realized by the CPU 10H1 or the like.

以上のように、学習装置３０１は、学習データ生成装置１０が生成する第２学習データ等を用いて学習モデル３０２を学習させる。このような学習により、学習装置３０１は、摘果対象物を推定する学習済みモデル３０３を生成できる。例えば、学習済みモデル３０３は、以下のように摘果対象物推定装置４０２が用いる。 As described above, the learning device 301 makes the learning model 302 learn using the second learning data generated by the learning data generation device 10 and the like. Through such learning, the learning device 301 can generate a learned model 303 for estimating the thinning object. For example, the learned model 303 is used by the thinning object estimation device 402 as follows.

摘果対象物推定装置４０２は、画像データ入力部１０Ｆ１、推定部４０２Ｆ１、及び、出力部４０２Ｆ２等を備える機能構成である。 The thinning object estimation device 402 has a functional configuration including an image data input unit 10F1, an estimation unit 402F1, an output unit 402F2, and the like.

画像データ入力部１０Ｆ１は、未知画像データ４０１を入力する画像データ入力手順を行う。例えば、画像データ入力部１０Ｆ１は、カメラ１１、及び、インタフェース１０Ｈ３等で実現する。 The image data input unit 10F1 performs an image data input procedure for inputting unknown image data 401. For example, the image data input unit 10F1 is implemented by the camera 11, the interface 10H3, and the like.

以上のように、摘果対象物推定装置４０２は、学習済みモデル３０３を実装すると、学習済みモデル３０３により、摘果作業の内容を推定し、摘果対象物（なお、位置、数、又は、候補等の情報を含む。）を推定できる。このような推定結果が出力されると、ユーザ４０４は、初心者等であっても、推定結果を参照して、適切な摘果作業を行うことができる。すなわち、ユーザ４０４が初心者等であっても、推定結果を参照すると、摘果作業で残す果実と、摘果する果実とが把握できる。 As described above, when the learned model 303 is installed, the thinning target object estimation device 402 can use the learned model 303 to estimate the content of the thinning work and to estimate the thinning target objects (including information such as position, number, or candidates). When such an estimation result is output, the user 404, even if a beginner, can refer to the estimation result and perform appropriate thinning work. That is, even a beginner can tell from the estimation result which fruits to leave and which fruits to thin.

学習システム５００は、例えば、学習データ生成装置１０、学習装置３０１、及び、摘果対象物推定装置４０２の備える機能構成のうち、いずれかの機能構成を備える。 The learning system 500 includes, for example, any one of the functional configurations of the learning data generation device 10, the learning device 301, and the thinning target object estimation device 402.

具体的には、学習システム５００は、学習データ生成装置１０、及び、学習装置３０１等の複数の情報処理装置で構成する。このような学習システム５００であると、学習データを生成し、かつ、学習モデル３０２を学習させて学習済みモデル３０３を生成できる。 Specifically, the learning system 500 includes a plurality of information processing devices such as the learning data generation device 10 and the learning device 301. Such a learning system 500 can generate learning data and train the learning model 302 to generate the learned model 303.

なお、学習システム５００は、複数の情報処理装置に限られず、１台の情報処理装置であってもよい。 Note that the learning system 500 is not limited to a plurality of information processing devices, and may be a single information processing device.

また、学習システム５００は、学習装置３０１、及び、摘果対象物推定装置４０２の組み合わせでもよい。 Also, the learning system 500 may be a combination of the learning device 301 and the thinning object estimation device 402 .

［推定システムの機能構成例］ [Functional configuration example of the estimation system]
図２０は、推定システムの機能構成例を示す図である。例えば、推定システム５０１は、学習データ生成装置１０、学習装置３０１、及び、摘果対象物推定装置４０２等で構成する。ただし、推定システム５０１は、学習データ生成装置１０がなくともよい。すなわち、推定システム５０１は、学習データ１５に、撮影した画像データを用いる、学習データ生成装置１０が生成した画像データを用いる、及び、両方を用いるのうち、いずれでもよい。 FIG. 20 is a diagram illustrating a functional configuration example of the estimation system. For example, the estimation system 501 includes the learning data generation device 10, the learning device 301, the thinning target object estimation device 402, and the like. However, the estimation system 501 need not include the learning data generation device 10. That is, as the learning data 15, the estimation system 501 may use photographed image data, image data generated by the learning data generation device 10, or both.

なお、学習モデル３０２、及び、学習済みモデル３０３（学習済みモデル３０３を利用するプログラムを含む。）は、複製されて学習装置３０１、及び、摘果対象物推定装置４０２等が複数であってもよい。 Note that the learning model 302 and the learned model 303 (including a program that uses the learned model 303) may be duplicated, and there may be a plurality of learning devices 301, thinning target object estimation devices 402, and the like.

学習装置３０１は、例えば、画像データ入力部１０Ｆ１、学習データ入力部３０１Ｆ１、学習部３０１Ｆ２、抽出部１０Ｆ２、マスク画像データ生成部１０Ｆ５、及び、イラスト化処理部１０Ｆ６等を備える機能構成である。 The learning device 301 has, for example, a functional configuration including an image data input unit 10F1, a learning data input unit 301F1, a learning unit 301F2, an extraction unit 10F2, a mask image data generation unit 10F5, an illustration processing unit 10F6, and the like.

学習データ入力部３０１Ｆ１は、学習データ１５を入力する学習データ入力手順を行う。例えば、学習データ入力部３０１Ｆ１は、インタフェース１０Ｈ３等で実現する。 The learning data input unit 301F1 performs a learning data input procedure for inputting the learning data 15. For example, the learning data input unit 301F1 is realized by the interface 10H3 or the like.

学習部３０１Ｆ２は、学習データ１５に基づき、学習モデル３０２を学習させる学習手順を行う。例えば、学習部３０１Ｆ２は、ＣＰＵ１０Ｈ１等で実現する。 The learning unit 301F2 performs a learning procedure for training the learning model 302 based on the learning data 15. For example, the learning unit 301F2 is realized by the CPU 10H1 or the like.

抽出部１０Ｆ２は、第１入力画像データ１１Ｄ１、及び、学習データ１５において、対象物体、又は、摘果対象物を抽出する抽出手順を行う。例えば、抽出部１０Ｆ２は、ＣＰＵ１０Ｈ１等で実現する。 The extraction unit 10F2 performs an extraction procedure for extracting a target object or a thinning target object from the first input image data 11D1 and the learning data 15. For example, the extraction unit 10F2 is realized by the CPU 10H1 or the like.

第１入力画像データ１１Ｄ１、及び、学習データ１５は、どちらか一方、又は、両方が抽出部１０Ｆ２、マスク画像データ生成部１０Ｆ５、及び、イラスト化処理部１０Ｆ６により、マスク画像データを生成する、イラスト化する、又は、両方の処理を行う抽出処理がされるのが望ましい。 It is desirable that either or both of the first input image data 11D1 and the learning data 15 undergo extraction processing by the extraction unit 10F2, the mask image data generation unit 10F5, and the illustration processing unit 10F6, that is, generation of mask image data, illustration, or both.

このように、対象物体、又は、摘果対象物等が抽出されると、単純に農作物を撮影した画像データをそのまま用いる場合等と比較して、学習モデル３０２は、摘果対象物等の重要な特徴量を精度良く学習できる。すなわち、学習装置３０１は、学習モデル３０２を学習させて、摘果作業を精度良く推定できる学習済みモデル３０３を生成できる。 When the target object, the thinning target object, or the like is extracted in this way, the learning model 302 can learn important features of the thinning target object and the like more accurately than when image data of photographed crops is used as is. That is, the learning device 301 can train the learning model 302 to generate a learned model 303 that estimates the thinning work accurately.

以上のように、推定システム５０１は、学習部３０１Ｆ２により、学習モデル３０２を学習させて、学習済みモデル３０３を生成する。このように、生成された学習済みモデル３０３が、ネットワーク等を介して、摘果対象物推定装置４０２に送られる。 As described above, the estimation system 501 trains the learning model 302 with the learning unit 301F2 to generate the learned model 303. The learned model 303 generated in this way is sent to the thinning target object estimation device 402 via a network or the like.

摘果対象物推定装置４０２は、画像データ入力部１０Ｆ１、抽出部１０Ｆ２、マスク画像データ生成部１０Ｆ５、イラスト化処理部１０Ｆ６、推定部４０２Ｆ１、及び、出力部４０２Ｆ２等を備える機能構成である。 The thinning object estimation device 402 has a functional configuration including an image data input unit 10F1, an extraction unit 10F2, a mask image data generation unit 10F5, an illustration processing unit 10F6, an estimation unit 402F1, an output unit 402F2, and the like.

抽出部１０Ｆ２は、未知画像データ４０１において、対象物体、又は、摘果対象物を抽出する抽出手順を行う。例えば、抽出部１０Ｆ２は、ＣＰＵ１０Ｈ１等で実現する。 The extraction unit 10F2 performs an extraction procedure for extracting a target object or a thinning target object from the unknown image data 401. For example, the extraction unit 10F2 is realized by the CPU 10H1 or the like.

以上のように、推定システム５０１では、まず、学習装置３０１が学習モデル３０２を学習させて、学習済みモデル３０３を生成する。次に、推定システム５０１では、このように生成された学習済みモデル３０３が摘果対象物推定装置４０２に配布される。 As described above, in the estimation system 501, the learning device 301 first trains the learning model 302 to generate the learned model 303. The learned model 303 generated in this way is then distributed to the thinning target object estimation device 402.

摘果対象物推定装置４０２は、学習済みモデル３０３を実装すると、学習済みモデル３０３により、摘果作業の内容を推定し、摘果対象物（なお、位置、数、又は、候補等の情報を含む。）を推定できる。このような推定結果が出力されると、ユーザ４０４は、初心者等であっても、推定結果を参照して、適切な摘果作業を行うことができる。 When the learned model 303 is installed, the thinning target object estimation device 402 can use the learned model 303 to estimate the content of the thinning work and to estimate the thinning target objects (including information such as position, number, or candidates). When such an estimation result is output, the user 404, even if a beginner, can refer to the estimation result and perform appropriate thinning work.

すなわち、ユーザ４０４が初心者等であっても、推定結果を参照すると、摘果作業で残す果実と、摘果する果実とが把握できる。また、例えば、学習装置３０１がクラウド環境等を利用する場合には、データの収集、及び、学習済みモデル３０３の配布等を速やかに行うことができる。 That is, even if the user 404 is a beginner or the like, he/she can grasp the fruit left in the fruit thinning work and the fruit to be thinned by referring to the estimation result. Further, for example, when the learning device 301 uses a cloud environment or the like, it is possible to quickly collect data, distribute the learned model 303, and the like.

［学習データの形式について］ [Format of the learning data]
第１学習データ、及び、第２学習データ等の学習データは、農作物を抽出した形式の画像データを用いるのが望ましい。ただし、抽出は、複数の段階に分けて行ってもよい。このような場合において、学習装置３０１は、抽出において、途中の段階となる形式の画像データ等を学習データに含めてもよい。 Learning data such as the first learning data and the second learning data is desirably image data in a format in which the crops have been extracted. However, extraction may be performed in multiple stages. In that case, the learning device 301 may include, in the learning data, image data in a format from an intermediate stage of the extraction.

例えば、抽出処理は、第１段階乃至第３段階の３段階に分けて行うとする。 For example, it is assumed that the extraction process is divided into three stages, ie, the first stage to the third stage.

第１段階は、入力された状態、すなわち、写真の形式（ただし、ホワイトバランス等の調整がされてもよい。）の画像データである。 The first stage is image data in the input state, that is, image data in a photograph format (however, white balance and the like may be adjusted).

第２段階は、農作物以外の箇所を背景とし、背景をマスクした形式の画像データである。例えば、背景は白色（マスクにより、どのような色にするかは設定する。）にマスク化される。 The second stage is image data in a format in which everything other than the crops is treated as background and the background is masked. For example, the background is masked in white (the mask color is configurable).

第３段階は、農作物等をイラスト化した形式の画像データである。 The third stage is image data in the form of illustrations of crops and the like.

学習データは、上記の第１段階乃至第３段階のうち、どの段階の画像データでもよい。また、学習データは、上記の第１段階乃至第３段階のうち、どの段階の画像データだけでなく、複数の段階、すなわち、抽出処理がされる前と後の両方の画像データでもよい。 The learning data may be image data from any one of the first to third stages. The learning data may also include image data from several stages rather than only one, that is, image data from both before and after the extraction processing.
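The three stages can be sketched on a toy image as follows. This is a simplified illustration under stated assumptions: the crop mask is given as input (the text does not specify how the extraction unit derives it), the third stage is approximated by filling the crop region with one flat color, and `mask_background` and `illustrate` are hypothetical helper names.

```python
WHITE = (255, 255, 255)  # assumed mask colour; the text says the colour is configurable


def mask_background(image, crop_mask, fill=WHITE):
    """Stage 2: keep pixels that belong to the crop and paint the rest with `fill`."""
    return [
        [pixel if is_crop else fill for pixel, is_crop in zip(row, mask_row)]
        for row, mask_row in zip(image, crop_mask)
    ]


def illustrate(crop_mask, flat_color, fill=WHITE):
    """Stage 3 (simplified): draw the crop region in one flat colour."""
    return [
        [flat_color if is_crop else fill for is_crop in mask_row]
        for mask_row in crop_mask
    ]


# Stage 1: a 2x2 "photograph" of RGB pixels; only the top-right pixel is fruit.
photo = [
    [(10, 120, 30), (200, 40, 40)],
    [(12, 118, 28), (90, 90, 90)],
]
crop_mask = [
    [False, True],
    [False, False],
]

stage2 = mask_background(photo, crop_mask)
stage3 = illustrate(crop_mask, flat_color=(255, 0, 0))

# Learning data may hold a single stage or several stages side by side.
learning_samples = {"stage1": photo, "stage2": stage2, "stage3": stage3}
```

Note that the flat fill in `stage3` discards surface detail such as blemishes, while `stage2` keeps the original fruit pixels.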

マスク化等で農作物が抽出された形式の画像データであると、学習装置３０１は、学習モデルに摘果対象物を精度良く学習できる。 With image data in a format in which the crops are extracted by masking or the like, the learning device 301 can train the learning model on thinning target objects accurately.

一方で、学習データは、写真等の形式の画像データを含むのが望ましい場合もある。例えば、イラスト化すると、画像データは、対象物体に発生している傷等（例えば、日当たりが悪い、塩害、腐食、病気、外傷、又は、虫食い等を原因とする。また、変色等でもよい。）を省略する場合がある。これに対し、摘果作業は、傷等がある対象物体を優先的に摘果する場合もある。このような摘果作業のためのＡＩは、第１段階、又は、第２段階等の形式、すなわち、傷等を表示する形式の画像データで学習するのが望ましい。したがって、学習データは、摘果作業の好み等に応じて形式が選択されてもよい。 On the other hand, it may be desirable for the learning data to include image data in a photograph-like format. For example, illustration may omit from the image data blemishes on the target object (for example, damage caused by poor sunlight, salt damage, rot, disease, injury, or insect feeding; discoloration is also included). In thinning work, however, target objects with blemishes are sometimes thinned preferentially. An AI for such thinning work is desirably trained on image data in the first-stage or second-stage format, that is, a format that shows the blemishes. The format of the learning data may therefore be selected according to preferences for the thinning work.

このように、学習データは、複数段階の画像データであると、学習装置３０１は、より好みに合致した摘果作業を学習モデルに学習させることができる。 In this way, when the learning data is image data of multiple stages, the learning device 301 can cause the learning model to learn the fruit thinning work that more closely matches the user's preference.

［ＡＩについて］ [About AI]
ＡＩは、例えば、以下のようなネットワーク構造で画像データ等を処理する。 The AI processes image data and the like with, for example, the following network structure.

図２１は、ネットワーク構造例を示す図である。例えば、ＡＩは、入力層Ｌ１、隠れ層Ｌ２、及び、出力層Ｌ３を有するネットワーク構造を有してもよい。 FIG. 21 is a diagram illustrating an example network structure. For example, an AI may have a network structure with an input layer L1, a hidden layer L2, and an output layer L3.

具体的には、ＡＩは、図示するようなＣｏｎｖｏｌｕｔｉｏｎＮｅｕｒａｌＮｅｔｗｏｒｋ（畳み込みニューラルネットワーク、ＣＮＮ）等を有するネットワーク構造である。 Specifically, the AI has a network structure including a convolutional neural network (CNN) or the like, as illustrated.

入力層Ｌ１は、入力データＤＩＮを入力する層である。 The input layer L1 is a layer for inputting input data DIN.

隠れ層Ｌ２は、入力層Ｌ１から入力される入力データＤＩＮに対して、畳み込み、プーリング、正規化、又は、これらの組み合わせ等の処理を行う層である。 The hidden layer L2 is a layer that performs processing such as convolution, pooling, normalization, or a combination thereof on the input data DIN input from the input layer L1.

出力層Ｌ３は、隠れ層Ｌ２で処理された結果を出力データＤＯＵＴで出力する層である。例えば、出力層Ｌ３は、全結合層等で構成される。 The output layer L3 is a layer that outputs the result processed by the hidden layer L2 as output data DOUT. For example, the output layer L3 is composed of a fully connected layer or the like.

畳み込み（Ｃｏｎｖｏｌｕｔｉｏｎ）は、例えば、フィルタ、マスク、又は、カーネル（以下単に「フィルタ」という。）等に基づいて、画像、又は、画像に対して所定の処理を行って生成される特徴マップ等に対して、フィルタ処理を行って、特徴マップを生成する処理である。 Convolution is a process that generates a feature map by applying filter processing, based on a filter, mask, or kernel (hereinafter simply a "filter"), to an image or to a feature map generated by performing predetermined processing on an image.

具体的には、フィルタは、フィルタ係数（「重み」又は「パラメータ」等という場合もある。）を画像又は特徴マップの画素値に乗じる計算をするのに用いるデータである。なお、フィルタ係数は、学習又は設定等により定まる値である。 Specifically, a filter is data used to perform calculations for multiplying pixel values of an image or feature map by filter coefficients (sometimes referred to as "weights" or "parameters"). Note that the filter coefficient is a value determined by learning, setting, or the like.

そして、畳み込みの処理は、画像又は特徴マップを構成する画素のそれぞれの画素値に、フィルタ係数を乗じる計算を行い、計算結果を構成要素とする特徴マップを生成する処理である。 The convolution process is a process of multiplying each pixel value of pixels constituting an image or a feature map by a filter coefficient to generate a feature map having the calculation results as constituent elements.

このように、畳み込みの処理が行われると、画像又は特徴マップの特徴が抽出できる。特徴は、例えば、エッジ成分、又は、対象とする画素の周辺を統計処理した結果等である。 Thus, once the convolution process is performed, the features of the image or feature map can be extracted. A feature is, for example, an edge component, a result of statistical processing of the periphery of a target pixel, or the like.
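A minimal sketch of the convolution described above, in plain Python. As in most CNN implementations, the filter is slid over the input without flipping, and the products inside each window are summed into one output element (the summation is standard practice and is stated here as an assumption, since the text mentions only the multiplication). The `conv2d` name and the filter values are illustrative.

```python
def conv2d(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and return the feature map."""
    k_rows, k_cols = len(kernel), len(kernel[0])
    out_rows = len(image) - k_rows + 1
    out_cols = len(image[0]) - k_cols + 1
    feature_map = []
    for i in range(out_rows):
        row = []
        for j in range(out_cols):
            # Multiply each pixel in the window by its filter coefficient and sum.
            acc = 0
            for di in range(k_rows):
                for dj in range(k_cols):
                    acc += image[i + di][j + dj] * kernel[di][dj]
            row.append(acc)
        feature_map.append(row)
    return feature_map


# A dark region (0) next to a bright region (9): the boundary is a vertical edge.
image = [
    [0, 0, 9, 9],
    [0, 0, 9, 9],
    [0, 0, 9, 9],
]
edge_filter = [[-1, 1]]  # hand-set coefficients; in a CNN they would be learned

feature_map = conv2d(image, edge_filter)
print(feature_map)  # [[0, 9, 0], [0, 9, 0], [0, 9, 0]]
```

The non-zero column of the feature map marks where the brightness changes, which is the kind of edge component the text refers to.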

また、畳み込みの処理が行われると、対象とする画像又は特徴マップが示す被写体等が、上下にずれる、左右にずれる、斜めにずれる、回転、又は、これらの組み合わせとなる画像又は特徴マップであっても同様の特徴が抽出できる。 Furthermore, when the convolution process is performed, similar features can be extracted even from an image or feature map in which the subject is shifted vertically, horizontally, or diagonally, is rotated, or any combination of these.

プーリング（Ｐｏｏｌｉｎｇ）は、対象とする範囲に対して、平均の計算、最小値の抽出、又は、最大値の抽出等の処理を行って、特徴を抽出して特徴マップを生成する処理である。すなわち、プーリングは、ｍａｘプーリング、又は、ａｖｇプーリング等である。 Pooling is a process of calculating the average, extracting the minimum value, or extracting the maximum value for a target range, extracting features, and generating a feature map. That is, the pooling is max pooling, avg pooling, or the like.
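The pooling variants named above can be sketched with one helper that takes the reduction as an argument. The non-overlapping window (stride equal to the window size) is an assumption made for brevity, and `pool2d` and `mean` are illustrative names.

```python
def pool2d(feature_map, size, reduce_fn):
    """Apply `reduce_fn` to each non-overlapping size x size window."""
    pooled = []
    for i in range(0, len(feature_map) - size + 1, size):
        row = []
        for j in range(0, len(feature_map[0]) - size + 1, size):
            window = [
                feature_map[i + di][j + dj]
                for di in range(size)
                for dj in range(size)
            ]
            row.append(reduce_fn(window))
        pooled.append(row)
    return pooled


def mean(values):
    return sum(values) / len(values)


feature_map = [
    [1, 3, 2, 0],
    [5, 7, 4, 6],
    [9, 8, 1, 2],
    [3, 4, 0, 5],
]

max_pooled = pool2d(feature_map, 2, max)   # [[7, 6], [9, 5]]
avg_pooled = pool2d(feature_map, 2, mean)  # [[4.0, 3.0], [6.0, 2.0]]
min_pooled = pool2d(feature_map, 2, min)   # [[1, 0], [3, 0]]
```

Each 4x4 map shrinks to 2x2, which illustrates the reduction in data volume that pooling provides.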

なお、畳み込み、及び、プーリングは、ゼロパディング（ＺｅｒｏＰａｄｄｉｎｇ）等の前処理があってもよい。 Note that convolution and pooling may be preprocessed such as zero padding.
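Zero padding can be sketched as follows. The `zero_pad` helper is an illustrative name; the point is that surrounding the input with a border of zeros lets a filter also be centered on edge pixels, so that, for example, a 3x3 filter with a padding of 1 produces an output of the same size as the input.

```python
def zero_pad(feature_map, pad):
    """Return a copy of `feature_map` surrounded by `pad` rows and columns of zeros."""
    width = len(feature_map[0]) + 2 * pad
    padded = [[0] * width for _ in range(pad)]          # top border
    for row in feature_map:
        padded.append([0] * pad + list(row) + [0] * pad)  # left and right borders
    padded.extend([0] * width for _ in range(pad))      # bottom border
    return padded


padded = zero_pad([[1, 2], [3, 4]], 1)
for row in padded:
    print(row)
# [0, 0, 0, 0]
# [0, 1, 2, 0]
# [0, 3, 4, 0]
# [0, 0, 0, 0]
```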

以上のような、畳み込み、プーリング、又は、これらの組み合わせによって、いわゆるデータ量削減効果、合成性、又は、移動不変性等が獲得できる。 Convolution, pooling, or a combination of them as described above provides a data reduction effect, compositionality, translation invariance, and the like.

正規化（Ｎｏｒｍａｌｉｚａｔｉｏｎ）は、例えば、分散及び平均値を揃える処理等である。なお、正規化は、局所的に行う場合を含む。そして、正規化が行われるとは、データは、所定の範囲内の値等になる。ゆえに、以降の処理においてデータの扱いが容易にできる。 Normalization is, for example, a process of aligning the variance and the mean. Normalization includes cases where it is performed locally. Once normalized, the data takes values within a predetermined range, which makes the data easier to handle in subsequent processing.
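One common way to align the mean and variance is standardization to zero mean and unit variance, sketched below. The text does not fix a particular normalization, so this choice, the `standardize` name, and the small `eps` term (added to avoid division by zero) are assumptions.

```python
import math


def standardize(values, eps=1e-8):
    """Shift and scale `values` to approximately zero mean and unit variance."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / n
    std = math.sqrt(variance + eps)
    return [(v - mean) / std for v in values]


normalized = standardize([2.0, 4.0, 6.0, 8.0])
new_mean = sum(normalized) / len(normalized)
new_variance = sum((v - new_mean) ** 2 for v in normalized) / len(normalized)
print(new_mean, new_variance)  # close to 0 and 1
```

Applying the same function to a small neighborhood of values instead of the whole input gives a local variant of the kind the text mentions.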

全結合（Ｆｕｌｌｙｃｏｎｎｅｃｔｅｄ）は、特徴マップ等のデータを出力に落とし込む処理である。 The fully connected process maps data such as feature maps to the output.

例えば、出力は、「ＹＥＳ」又は「ＮＯ」等のように、出力が２値の形式である。このような出力形式では、全結合は、２種類のうち、いずれかの結論となるように、隠れ層Ｌ２で抽出される特徴に基づいてノードを結合する処理である。 For example, the output may be binary, such as "YES" or "NO". For this output format, the fully connected process connects nodes based on the features extracted in the hidden layer L2 so that one of the two conclusions is reached.

一方で、出力が３種類以上ある場合等には、全結合は、いわゆるソフトマックス関数等を行う処理である。このようにして、全結合により、最尤推定法等によって分類（確率を示す出力を行う場合を含む。）を行うことができる。 On the other hand, when there are three or more kinds of output, the fully connected process applies a so-called softmax function or the like. In this way, the fully connected process enables classification (including output that indicates probabilities) by maximum likelihood estimation or the like.
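A fully connected layer followed by softmax can be sketched as below. The weights, the input features, and the three class labels are hypothetical values chosen only for illustration; `dense` and `softmax` are helper names introduced here.

```python
import math


def dense(features, weights, biases):
    """Fully connected layer: one weighted sum of all features per output node."""
    return [
        sum(w * f for w, f in zip(node_weights, features)) + b
        for node_weights, b in zip(weights, biases)
    ]


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits)                       # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


labels = ["leave", "thin", "candidate"]      # hypothetical output classes
features = [1.0, 2.0]                        # e.g. values pooled from the hidden layer
weights = [[0.5, -0.5], [1.0, 1.0], [0.0, 0.2]]
biases = [0.0, 0.0, 0.0]

logits = dense(features, weights, biases)    # [-0.5, 3.0, 0.4]
probabilities = softmax(logits)
predicted = labels[probabilities.index(max(probabilities))]
print(predicted)  # thin
```

Choosing the label with the highest probability corresponds to the classification step, while the probability list itself is the kind of output the text describes as indicating probabilities.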

［その他の実施形態］ [Other embodiments]
学習データ生成装置１０、学習装置３０１、及び、摘果対象物推定装置４０２は、異なる種類の情報処理装置であってもよい。すなわち、学習データ生成装置１０、学習装置３０１、及び、摘果対象物推定装置４０２は、異なるハードウェア構成であってもよい。 The learning data generation device 10, the learning device 301, and the thinning target object estimation device 402 may be different types of information processing devices. That is, the learning data generation device 10, the learning device 301, and the thinning target object estimation device 402 may have different hardware configurations.

学習データは、教師データ、又は、訓練データ等と呼ばれる場合もある。 The learning data may also be called teacher data, training data, or the like.

実施形態は、上記の実施形態を組み合わせたものでもよい。すなわち、学習データを生成する装置、学習モデルに対して学習処理を行って学習済みモデルを生成する装置、及び、学習済みモデルを用いて実行処理を行う装置は、同じ装置でもよいし、異なる装置であってもよい。このように、学習モデルの学習、及び、学習済みモデルによる実行は、同一の情報処理装置で行われなくともよい。すなわち、学習モデルの学習、及び、学習済みモデルによる実行は、異なる情報処理装置で行われてもよい。 An embodiment may be a combination of the above embodiments. That is, the device that generates learning data, the device that performs learning processing on a learning model to generate a learned model, and the device that performs execution processing using the learned model may be the same device or different devices. Thus, the training of the learning model and the execution of the learned model need not be performed by the same information processing device; they may be performed by different information processing devices.

なお、異なる装置である場合には、互いの装置は、例えば、ネットワーク等を介して、学習データ、又は、学習済みモデル等のデータを送受信する。 If the devices are different devices, the devices transmit and receive data such as learning data or learned models via a network or the like.

ゆえに、学習済みモデルは、学習によって生成された後、ネットワーク等を介して、プログラム等の形式で配信され、学習された情報処理装置とは異なる装置で実行されてもよい。なお、他の情報処理装置において学習して生成された学習モデルに対し、追加して学習が行われてもよい。 Therefore, a trained model may be distributed in the form of a program or the like via a network or the like after being generated by learning, and may be executed by a device different from the information processing device in which the trained model was trained. In addition, learning may be performed in addition to a learning model generated by learning in another information processing apparatus.

なお、学習データは、データ拡張（ｄａｔａａｕｇｍｅｎｔａｔｉｏｎ）が行われてもよい。具体的には、学習データは、画像データの場合には、画像データが示す画像の一部を切り出して新たなデータを生成する等のデータ拡張がされてもよい。 Note that the learning data may be subjected to data augmentation. Specifically, when the learning data is image data, it may be augmented by, for example, cropping part of the image represented by the image data to generate new data.

同様に、データ拡張は、例えば、回転、スライド、データの一部せん断、左右反転、上下反転、歪みを加える、歪みを補正する、濃淡の変更、色の補正、ノイズを減らす、ノイズを加える、フィルタをかける、拡大、縮小、エッジの強調、又は、これらの組み合わせとなる処理等を画像データに対してランダムに適用する処理である。 Similarly, data augmentation is, for example, a process of randomly applying to the image data operations such as rotation, translation, shearing part of the data, horizontal flipping, vertical flipping, adding distortion, correcting distortion, changing shading, color correction, noise reduction, noise addition, filtering, enlargement, reduction, edge enhancement, or a combination of these.

このようにデータ拡張により、学習データを増やせると、学習モデルの学習に用いる学習データを増やすことができる。 Increasing the learning data through data augmentation in this way increases the amount of learning data available for training the learning model.
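A few of the listed operations (horizontal flip, vertical flip, and cropping) can be combined into a random augmentation step as sketched below. The 50% flip probability, the crop size, and the function names are assumptions made for illustration; the remaining operations in the list would be added in the same way.

```python
import random


def hflip(image):
    """Mirror the image left to right."""
    return [row[::-1] for row in image]


def vflip(image):
    """Mirror the image top to bottom."""
    return [list(row) for row in image[::-1]]


def random_crop(image, height, width, rng):
    """Cut out a height x width window at a random position."""
    top = rng.randint(0, len(image) - height)
    left = rng.randint(0, len(image[0]) - width)
    return [row[left:left + width] for row in image[top:top + height]]


def augment(image, rng, crop_size=None):
    """Randomly flip the image, then optionally crop it."""
    result = [list(row) for row in image]
    if rng.random() < 0.5:
        result = hflip(result)
    if rng.random() < 0.5:
        result = vflip(result)
    if crop_size is not None:
        result = random_crop(result, crop_size[0], crop_size[1], rng)
    return result


image = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
]
rng = random.Random(0)  # seeded so that a run can be repeated
augmented = [augment(image, rng, crop_size=(2, 2)) for _ in range(5)]
```

Here one source image yields five 2x2 samples, which is how augmentation multiplies the available learning data without additional photography.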

実施形態では、バッチノーマライゼーション（ＢａｔｃｈＮｏｒｍａｌｉｚａｔｉｏｎ）、又は、ドロップアウト等といった過学習（「過剰適合」又は「過適合」等ともいう。ｏｖｅｒｆｉｔｔｉｎｇ）を軽減化させる処理が行われてもよい。ほかにも、次元削減等の処理が行われてもよい。 In the embodiments, processing that mitigates overfitting (also called overtraining), such as batch normalization or dropout, may be performed. Dimensionality reduction and similar processing may also be performed.
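Dropout can be sketched as follows, using the common "inverted dropout" formulation, in which the surviving activations are scaled up during training so that nothing needs to change at inference time. The text does not specify a formulation, so this variant, the `dropout` name, and the drop rate are assumptions.

```python
import random


def dropout(activations, rate, rng, training=True):
    """Zero each activation with probability `rate` (0 <= rate < 1) while training."""
    if not training or rate == 0.0:
        return list(activations)  # inference: pass values through unchanged
    keep_probability = 1.0 - rate
    return [
        a / keep_probability if rng.random() < keep_probability else 0.0
        for a in activations
    ]


rng = random.Random(0)
activations = [1.0, 2.0, 3.0, 4.0]

train_output = dropout(activations, rate=0.5, rng=rng)
infer_output = dropout(activations, rate=0.5, rng=rng, training=False)
```

Because a different random subset of nodes is silenced at every training step, the network cannot rely on any single node, which is why the technique tends to reduce overfitting.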

学習モデル、及び、学習済みモデル等におけるネットワーク構造は、ＣＮＮのネットワーク構造に限られない。例えば、ネットワーク構造は、ＲＮＮ（再帰型ニューラルネットワーク、ＲｅｃｕｒｒｅｎｔＮｅｕｒａｌＮｅｔｗｏｒｋ）、ＬＳＴＭ（ＬｏｎｇＳｈｏｒｔ－ＴｅｒｍＭｅｍｏｒｙ）、又は、Ｔｒａｎｓｆｏｒｍｅｒ等の構成を有してもよい。 The network structure in the learning model, the trained model, etc. is not limited to the network structure of CNN. For example, the network structure may have a configuration such as RNN (Recurrent Neural Network), LSTM (Long Short-Term Memory), or Transformer.

また、学習モデル、及び、学習済みモデルは、ハイパパラメータを有する構成であってもよい。すなわち、学習モデル、及び、学習済みモデルは、一部の設定をユーザが行う構成でもよい。 Also, the learning model and the trained model may be configured to have hyperparameters. That is, the learning model and the learned model may be partially configured by the user.

ほかにも、例えば、グラフ（頂点、及び、辺で構成されるデータである。）を扱う場合には、学習モデル、及び、学習済みモデルは、ＧｒａｐｈＮｅｕｒａｌＮｅｔｗｏｒｋ（グラフニューラルネットワーク、ＧＮＮ）等の構造を有してもよい。 In addition, when handling graphs (data composed of vertices and edges), for example, the learning model and the learned model may have a structure such as a graph neural network (GNN).

また、学習モデル、及び、学習済みモデルは、他の機械学習を利用してもよい。例えば、学習モデル、及び、学習済みモデルは、教師なしのモデルにより、正規化等を前処理で行ってもよい。 Also, the learning model and the trained model may utilize other machine learning. For example, the learning model and the trained model may be subjected to preprocessing such as normalization by an unsupervised model.

本発明は、上記に例示する学習データ生成方法、学習方法、推定方法、又は、上記に示す処理と等価な処理を実行するプログラム（ファームウェア、及び、プログラムに準ずるものを含む。以下単に「プログラム」という。）で実現されてもよい。 The present invention may be implemented as the learning data generation method, learning method, or estimation method exemplified above, or as a program that executes processing equivalent to the processing described above (including firmware and anything equivalent to a program; hereinafter simply a "program").

すなわち、本発明は、コンピュータに対して指令を行って所定の結果が得られるように、プログラミング言語等で記載されたプログラム等で実現されてもよい。なお、プログラムは、処理の一部をＩｎｔｅｇｒａｔｅｄＣｉｒｃｕｉｔ（集積回路、ＩＣ）等のハードウェア又はＧｒａｐｈｉｃｓＰｒｏｃｅｓｓｉｎｇＵｎｉｔ（ＧＰＵ）等の演算装置等で実行する構成であってもよい。 That is, the present invention may be realized by a program or the like written in a programming language or the like so as to issue a command to a computer and obtain a predetermined result. Note that the program may be configured such that part of the processing is executed by hardware such as an Integrated Circuit (IC) or an arithmetic unit such as a Graphics Processing Unit (GPU).

プログラムは、コンピュータが有する演算装置、制御装置、及び、記憶装置等を協働させて上記に示す処理等をコンピュータに実行させる。すなわち、プログラムは、主記憶装置等にロードされて、演算装置に命令を発して演算を行わせてコンピュータを動作させる。 The program causes the computer to execute the processes described above by cooperating with the arithmetic device, the control device, the storage device, and the like of the computer. That is, the program is loaded into the main storage device or the like, issues instructions to the arithmetic unit to perform arithmetic operation, and operates the computer.

また、プログラムは、コンピュータが読み込み可能な記録媒体、又は、ネットワーク等の電気通信回線を介して提供されてもよい。 Also, the program may be provided via a computer-readable recording medium or an electric communication line such as a network.

本発明は、複数の装置で構成されるシステムで実現されてもよい。すなわち、複数のコンピュータによるシステムは、上記に示す処理を冗長、並列、分散、又は、これらの組み合わせとなるように実行してもよい。したがって、本発明は、上記に示すハードウェア構成以外の装置、及び、上記に示す装置以外のシステムで実現されてもよい。 The present invention may be implemented in a system composed of multiple devices. That is, a system of multiple computers may perform the processes described above redundantly, in parallel, distributed, or any combination thereof. Therefore, the present invention may be realized by devices with hardware configurations other than those shown above, and systems other than those shown above.

なお、本発明は、上記に例示する各実施形態に限定されない。したがって、本発明は、技術的な要旨を逸脱しない範囲で、構成要素の追加、又は、変形が可能である。ゆえに、特許請求の範囲に記載された技術思想に含まれる技術的事項のすべてが本発明の対象となる。なお、上記に例示する実施形態は、実施において好適な具体例である。そして、当業者であれば、開示した内容から様々な変形例を実現で可能であって、このような変形例は、特許請求の範囲に記載された技術的範囲に含まれる。 The present invention is not limited to the embodiments exemplified above. Components may be added or modified without departing from the technical gist of the invention. Accordingly, all technical matters included in the technical idea described in the claims fall within the scope of the present invention. The embodiments exemplified above are specific examples suitable for implementation. A person skilled in the art can realize various modifications from the disclosed content, and such modifications are included in the technical scope described in the claims.

１０：学習データ生成装置
１０Ｆ１：画像データ入力部
１０Ｆ２：抽出部
１０Ｆ３：生成部
１０Ｆ４：識別部
１０Ｆ５：マスク画像データ生成部
１０Ｆ６：イラスト化処理部
１１：カメラ
１１Ｄ１：第１入力画像データ
１１Ｄ２：第２入力画像データ
１２：第１農作物
１３：第２農作物
１４：作業者
１５：学習データ
２０：抽出結果
２１：推定結果画像データ
２２：正解データ
３１：第１物体
３２：第２物体
３３：第３物体
３４：第４物体
４０：マスク画像データ
４１：第１対象物体
４２：第２対象物体
４３：第３対象物体
４４：第４対象物体
５０：イラスト化画像データ
５１：対象物体領域
５２：塗り潰し領域
１０１：第１対象物体
１０２：第２対象物体
１０３：第３対象物体
１０４：第４対象物体
１０５：第５対象物体
１０６：第６対象物体
１０７：第７対象物体
３０１：学習装置
３０１Ｆ１：学習データ入力部
３０１Ｆ２：学習部
３０２：学習モデル
３０３：学習済みモデル
４０１：未知画像データ
４０２：摘果対象物推定装置
４０２Ｆ１：推定部
４０２Ｆ２：出力部
４０３：出力画面
４０４：ユーザ
４０５：設定画面
５００：学習システム
10: Learning data generation device
10F1: Image data input unit
10F2: Extraction unit
10F3: Generation unit
10F4: Identification unit
10F5: Mask image data generation unit
10F6: Illustration processing unit
11: Camera
11D1: First input image data
11D2: Second input image data
12: First crop
13: Second crop
14: Worker
15: Learning data
20: Extraction result
21: Estimation result image data
22: Correct answer data
31: First object
32: Second object
33: Third object
34: Fourth object
40: Mask image data
41: First target object
42: Second target object
43: Third target object
44: Fourth target object
50: Illustrated image data
51: Target object region
52: Filled region
101: First target object
102: Second target object
103: Third target object
104: Fourth target object
105: Fifth target object
106: Sixth target object
107: Seventh target object
301: Learning device
301F1: Learning data input unit
301F2: Learning unit
302: Learning model
303: Learned model
401: Unknown image data
402: Thinning target object estimation device
402F1: Estimation unit
402F2: Output unit
403: Output screen
404: User
405: Setting screen
500: Learning system

Claims

1. A learning device for training a learning model having a generation unit and an identification unit, the learning device comprising:
an image data input unit for inputting first input image data, which is image data representing the crops before thinning, and second input image data, which is image data representing the crops after thinning;
the generation unit, which generates estimation result image data indicating a result of estimating a thinning target object in the crops; and
the identification unit, which identifies the estimation result image data and feeds back the identification result to the generation unit to train the learning model.

2. A thinning target object estimation device that uses a learned model trained by the learning device according to claim 1, the thinning target object estimation device comprising:
an image data input unit for inputting unknown image data showing unknown crops before thinning;
an estimation unit that estimates the thinning target object using the learned model; and
an output unit that outputs an estimation result obtained by the estimation unit.

3. The thinning target object estimation apparatus according to claim 2, further comprising a mask image data generation unit that generates mask image data that distinguishes between a target object and objects other than the target object.

4. The thinning target object estimation device according to claim 2 or 3, further comprising an illustration processing unit that illustrates the target object in the first input image data.

5. A program for causing a computer having a generation unit and an identification unit to execute a learning method, the program causing the computer to execute:
an image data input procedure in which the computer inputs first input image data, which is image data representing the crops before thinning, and second input image data, which is image data representing the crops after thinning;
a generation procedure in which the computer generates estimation result image data indicating a result of estimating a thinning target object in the crops; and
an identification procedure in which the computer identifies the estimation result image data and feeds back the identification result to the generation unit to train a learning model.

6. A program for causing a computer that uses a learned model trained by executing the program according to claim 5 to execute an estimation method, the program causing the computer to execute:
an image data input procedure in which the computer inputs unknown image data showing unknown crops before thinning;
an estimation procedure in which the computer estimates the thinning target object using the learned model; and
an output procedure in which the computer outputs an estimation result obtained by the estimation procedure.

7. An estimation system comprising a learning device for training a learning model having a generation unit and an identification unit, and a thinning target object estimation device that uses the learned model trained by the learning device,
wherein the learning device includes:
an image data input unit for inputting first input image data, which is image data representing the crops before thinning, and second input image data, which is image data representing the crops after thinning;
the generation unit, which generates estimation result image data indicating a result of estimating a thinning target object in the crops; and
the identification unit, which identifies the estimation result image data and feeds back the identification result to the generation unit to train the learning model,
and wherein the thinning target object estimation device includes:
an image data input unit for inputting unknown image data showing unknown crops before thinning;
an estimation unit that estimates the thinning target object using the learned model; and
an output unit that outputs an estimation result obtained by the estimation unit.