JP7479007B2

JP7479007B2 - Information processing device, program, system, and method for detecting grapes from an image

Info

Publication number: JP7479007B2
Application number: JP2020094006A
Authority: JP
Inventors: 暁陽茅; プラウィットブアヤイ; 正広豊浦; 公司三井
Original assignee: University of Yamanashi NUC
Current assignee: University of Yamanashi NUC
Priority date: 2020-05-29
Filing date: 2020-05-29
Publication date: 2024-05-08
Anticipated expiration: 2040-05-29
Also published as: JP2021189718A

Description

The present invention relates to an information processing device, a program, a system, and a method for detecting grape berries from an image.

Patent Document 1 discloses, among other things, a grape berry counting device in which a single target bunch of grapes is placed in an imaging space formed by partitions and the number of berries is counted by image analysis.

Patent Document 1: Japanese Unexamined Patent Application Publication No. 2019-200563

However, to measure the berries on a bunch without interrupting the normal flow of berry-thinning work, the bunch cannot be placed in an imaging space formed by partitions or the like. It was therefore necessary to be able to detect the berries belonging to the bunch being worked on even from an image that contains multiple bunches, similar to the field of view of the thinning worker.

The present invention provides an information processing device, a program, a system, and a method capable of detecting the grape berries that belong to the bunch being worked on from an image that contains multiple bunches of grapes.

According to the present invention, there is provided an information processing device that detects grape berries from an image, the device comprising a berry detection unit, a bunch identification unit, and an integration processing unit. The berry detection unit detects the grape berries contained in the image. The bunch identification unit detects the grape bunches contained in the image and, based on the position and size of each bunch in the image, identifies the bunch being worked on from among the detected bunches. The integration processing unit determines the grape berries that belong to the bunch being worked on, based on the berry detection results and the bunch identification results.

In the present invention, grape berries and grape bunches are both detected, the bunch being worked on is identified from among the detected bunches based on the position and size of each bunch, and the berries belonging to that bunch are then determined. Therefore, even when the bunch being worked on is not placed in an imaging space formed by partitions or the like, the bunch being worked on can be identified in an image containing multiple bunches, and the berries belonging to it can be detected.

Various embodiments of the present invention are exemplified below. The embodiments described below can be combined with each other.
Preferably, in the information processing device, the berry detection unit detects the grape berries contained in the image based on a first learning model, and the first learning model is a learning model including an object detector that detects objects using feature quantities of the image and a classifier that classifies objects using feature quantities of the image.
Preferably, in the information processing device, the classifier included in the first learning model has a non-positional classifier, which does not use position information as a feature quantity for classification, and a positional classifier, which uses position information as a feature quantity for classification, and classifies based on the classification results of the non-positional classifier and the classification results of the positional classifier.
Preferably, in the information processing device, the bunch identification unit detects the grape bunches contained in the image based on a second learning model, the second learning model is a learning model including an object detector that detects objects using feature quantities of the image and a classifier that classifies objects using feature quantities of the image, and the second learning model is the same as or different from the first learning model.
Preferably, the information processing device further comprises a berry count calculation unit, which counts the number of grape berries detected in the image as belonging to the bunch being worked on and calculates a range for the total number of berries on the bunch based on the counted number of berries and a predetermined coefficient.

Furthermore, according to the present invention, there is provided a system for detecting grape berries from an image, the system comprising an image capturing unit and an image analysis unit. The image capturing unit captures the image. The image analysis unit comprises a berry detection unit, a bunch identification unit, and an integration processing unit. The berry detection unit detects the grape berries contained in the image. The bunch identification unit detects the grape bunches contained in the image and, based on the position and size of each bunch in the image, identifies the bunch being worked on from among the detected bunches. The integration processing unit determines the grape berries that belong to the bunch being worked on, based on the berry detection results and the bunch identification results.
Preferably, the system further comprises an analysis result display unit, which displays the range of the total number of grape berries on the bunch being worked on.

Furthermore, according to the present invention, there is provided a program for detecting grape berries from an image, the program causing a computer to execute a berry detection step, a bunch identification step, and an integration processing step. In the berry detection step, the grape berries contained in the image are detected. In the bunch identification step, the grape bunches contained in the image are detected and, based on the position and size of each bunch in the image, the bunch being worked on is identified from among the detected bunches. In the integration processing step, the grape berries that belong to the bunch being worked on are determined based on the berry detection results and the bunch identification results.

Furthermore, according to the present invention, there is provided an information processing method for detecting grape berries from an image, the method comprising a berry detection step, a bunch identification step, and an integration processing step. In the berry detection step, the grape berries contained in the image are detected. In the bunch identification step, the grape bunches contained in the image are detected and, based on the position and size of each bunch in the image, the bunch being worked on is identified from among the detected bunches. In the integration processing step, the grape berries that belong to the bunch being worked on are determined based on the berry detection results and the bunch identification results.

A diagram showing an overview of the grape berry detection system 1 according to the first embodiment.
A block diagram showing the hardware configuration of the information processing device 10 and the user terminal 20 according to the first embodiment.
A block diagram showing the functional configuration of the information processing device 10 and the user terminal 20 according to the first embodiment.
A conceptual diagram illustrating the results of grape berry detection by the berry detection unit 11a.
A conceptual diagram illustrating the results of detecting and identifying the bunch being worked on by the bunch identification unit 11b.
A conceptual diagram illustrating duplicate detection of the same bunch by the bunch identification unit 11b and the processing that eliminates it.
A conceptual diagram illustrating the detection results, by the integration processing unit 11c, of the grape berries belonging to the bunch being worked on.
A diagram showing verification results of the berry count calculation by the berry count calculation unit 11d.
A schematic diagram of the structure of a convolutional neural network according to the second embodiment.
A schematic diagram of the flow from the region proposal network to the output of position information.
A schematic diagram showing an example of a learning model.
A schematic diagram of grape berry detection results represented by bounding boxes.
A schematic diagram of an example of a learning model including a mask.
A schematic diagram showing the results of classification for each subdivided region, with the grape berries filled in.
A schematic diagram of an example of a learning model having multiple classifiers and object detectors.
A schematic diagram of an example of a learning model having multiple classifiers, object detectors, and masks.
A diagram explaining verification results for the effect of image analysis using the model of Modification 2.
A conceptual diagram illustrating the image synthesis process according to the third embodiment.

Several embodiments of the present invention are described below with reference to the drawings. The various features shown in the embodiments below can be combined with each other. In addition, each feature independently constitutes an invention.

<1. First embodiment>
(1-1. Grape Berry Detection System 1)
An information processing device according to one embodiment of the present invention is an information processing device 10, such as a server, that constitutes part of a grape berry detection system 1 as shown in FIG. 1. The grape berry detection system 1 includes the information processing device 10 and a user terminal 20.

The information processing device 10 is configured to be able to communicate with the user terminal 20 via a communication line 5. The user terminal 20 captures an image P1, which may contain multiple grape bunches, and transmits it to the information processing device 10. The information processing device 10 analyzes the image P1 received from the user terminal 20. By detecting grape berries, detecting grape bunches, and identifying the bunch being worked on, it detects the grape berries that belong to the bunch being worked on. Each component is described below.

(1-2. Hardware Configuration of Grape Berry Detection System 1)
The hardware configuration of the grape berry detection system 1 is described with reference to FIG. 2.

(1-2-1. Hardware Configuration of Information Processing Device 10)
FIG. 2 is a block diagram showing the hardware configuration of the information processing device 10 and the user terminal 20 according to this embodiment. The information processing device 10 includes a control unit 11, a storage unit 12, and a communication unit 13. The information processing device 10 may also include an operation input unit 14, composed of a keyboard, a mouse, and the like, that accepts input of various operations, and a monitor 15, such as a liquid crystal display device, that displays various images.

The control unit 11 is, for example, a CPU (Central Processing Unit), a microprocessor, or a DSP (Digital Signal Processor), and controls the overall operation of the information processing device 10.

Part of the storage unit 12 is composed of, for example, RAM (Random Access Memory) or DRAM (Dynamic Random Access Memory), and is used as a work area or the like when the control unit 11 executes processing based on various programs. Another part of the storage unit 12 is, for example, a non-volatile memory such as ROM (Read Only Memory), or an HDD (Hard Disk Drive), and stores various data and the programs used in the processing of the control unit 11.

The programs stored in the storage unit 12 are, for example, an OS (Operating System) for realizing the basic functions of the information processing device 10, drivers for controlling various hardware, and programs for realizing various functions, and include the computer program according to this embodiment.

The communication unit 13 is, for example, a NIC (Network Interface Controller) and has a function of connecting to the communication line 5. Instead of or together with the NIC, the communication unit 13 may have a function of connecting to a wireless LAN (Local Area Network), a function of connecting to a wireless WAN (Wide Area Network), and a function enabling short-range wireless communication such as Bluetooth (registered trademark), infrared communication, and the like. The information processing device 10 is connected via the communication line 5 to other information processing devices such as the user terminal 20, and can exchange various data with them.

The control unit 11, storage unit 12, communication unit 13, operation input unit 14, and monitor 15 are electrically connected to each other via a system bus 16. The control unit 11 can therefore access the storage unit 12, display images on the monitor 15, ascertain the state of the user's operations on the operation input unit 14, and access various communication networks and other information processing devices via the communication unit 13.

(1-2-2. Hardware Configuration of User Terminal 20)
The user terminal 20 is an information processing terminal such as AR (augmented reality) glasses, MR (mixed reality) glasses, smart glasses, a smartphone, or a tablet terminal, and includes a control unit 21, a storage unit 22, a communication unit 23, an imaging unit 24, and a display unit 25. The user terminal 20 may also include a speaker 26 that outputs sound, an operation unit (not shown) composed of a power button and other operation buttons, and the like. The following description focuses on the differences from the information processing device 10.

The imaging unit 24 includes a camera capable of capturing still images, video, and the like. When the user terminal 20 is AR (augmented reality) glasses, MR (mixed reality) glasses, smart glasses, or the like, the display unit 25 may include a display formed by combining a part corresponding to the lenses of eyeglasses with a part having a projection function or the like. AR glasses, MR glasses, smart glasses, and the like may also include a part that projects images directly onto the retina. When the user terminal 20 is a smartphone, tablet terminal, or the like, the display unit 25 may be a touch panel display or the like that can display images and accept operations.

The speaker 26 may also be used to convey information on the total number of grape berries on the bunch being worked on, described later, to the worker W by voice, signal tones, or the like.

The control unit 21, storage unit 22, communication unit 23, imaging unit 24, display unit 25, and speaker 26 are electrically connected to each other via a system bus 27. The control unit 21 can therefore access the storage unit 22, control the imaging unit 24, display images on the display unit 25, ascertain the state of the worker's operations, output sound from the speaker 26, and access various communication networks and other information processing devices via the communication unit 23.

(1-3. Functional Configuration of Information Processing Device 10)
As shown in FIG. 3, the control unit 11 of the information processing device 10 has a berry detection unit 11a, a bunch identification unit 11b, and an integration processing unit 11c. The control unit 11 may further have a berry count calculation unit 11d. The berry detection unit 11a, the bunch identification unit 11b, and the integration processing unit 11c are sometimes referred to collectively as the image analysis unit 30.

The berry detection unit 11a detects the grape berries contained in the image P1. The bunch identification unit 11b detects the grape bunches contained in the image P1 and, based on the position and size of each bunch in the image P1, identifies the bunch being worked on from among the detected bunches. The integration processing unit 11c determines the grape berries that belong to the bunch being worked on, based on the berry detection results and the bunch identification results.
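The division of roles among the three units can be illustrated with a minimal Python sketch. This passage does not specify how the integration processing unit combines the two results, so the rule used here (a berry belongs to the bunch being worked on when the berry's center lies inside that bunch's region) is an assumption made only for illustration. The `Box` type and the function name are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned region in normalized image coordinates (0..1). Hypothetical type."""
    x_min: float
    y_min: float
    x_max: float
    y_max: float

    def center(self):
        return ((self.x_min + self.x_max) / 2, (self.y_min + self.y_max) / 2)

    def contains(self, point):
        x, y = point
        return self.x_min <= x <= self.x_max and self.y_min <= y <= self.y_max


def berries_in_working_bunch(berries, working_bunch):
    """Assumed integration rule: keep berries whose center is inside the bunch region."""
    return [b for b in berries if working_bunch.contains(b.center())]


# Stand-ins for the berry detection results and the identified bunch.
berries = [
    Box(0.40, 0.40, 0.45, 0.45),  # center inside the bunch
    Box(0.50, 0.55, 0.56, 0.61),  # center inside the bunch
    Box(0.85, 0.10, 0.90, 0.15),  # belongs to some other bunch
]
working_bunch = Box(0.30, 0.30, 0.70, 0.80)

selected = berries_in_working_bunch(berries, working_bunch)
print(len(selected))  # 2
```

A containment test on bounding boxes is the simplest possible choice; an overlap ratio or a mask-based test would be equally plausible alternatives.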

The berry count calculation unit 11d counts the number of grape berries detected in the image P1 as belonging to the bunch being worked on, and calculates a range for the total number of berries on the bunch based on the counted number of berries and a predetermined coefficient. The details of each function are described later.
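As a rough illustration of this calculation, the following Python sketch scales the number of berries visible in the image into a range for the whole bunch. How the predetermined coefficient yields a range is not stated in this passage, so the sketch assumes a lower and an upper coefficient. The function name and the coefficient values 1.5 and 2.0 are hypothetical placeholders, not values taken from the document.

```python
import math


def estimate_total_range(visible_count, coeff_low, coeff_high):
    """Scale the visible berry count into a (min, max) estimate for the whole bunch.

    Assumption: the range comes from two coefficients; the source only says
    that a predetermined coefficient is used.
    """
    if visible_count < 0:
        raise ValueError("visible_count must be non-negative")
    if not 0 < coeff_low <= coeff_high:
        raise ValueError("coefficients must satisfy 0 < coeff_low <= coeff_high")
    return (math.floor(visible_count * coeff_low),
            math.ceil(visible_count * coeff_high))


# 20 berries detected on the visible side, placeholder coefficients.
print(estimate_total_range(20, 1.5, 2.0))  # (30, 40)
```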

(1-4. Functional Configuration of User Terminal 20)
As shown in FIG. 3, the control unit 21 of the user terminal 20 has an image capturing unit 21a. The image capturing unit 21a captures the image P1. The control unit 21 may also have an analysis result display unit 21b. The analysis result display unit 21b displays the range of the total number of grape berries on the bunch being worked on. That is, the analysis result display unit 21b conveys this range to the worker W, who is the user of the user terminal 20, by displaying it on the display unit 25.

As described above, the grape berry detection system 1 includes the image capturing unit 21a and the image analysis unit 30. The grape berry detection system 1 may further include the analysis result display unit 21b.

The functional configuration described above may be realized by software (including so-called apps) installed as appropriate on the information processing device 10 or the user terminal 20, or may be realized by hardware. When realized by software, the various functions can be realized by the control unit 11 or the control unit 21 executing the programs that constitute the software.

When the functions are realized by executing a program, the program may be stored in the storage unit 12 or the storage unit 22 built into the information processing device 10 or the user terminal 20, or may be stored on a non-transitory computer-readable recording medium. The functions may also be realized by so-called cloud computing, by reading out a program stored in an external storage device. Alternatively, when realized by hardware, the functions can be realized by various circuits such as an ASIC, SOC, FPGA, or DRP. Some of the functional configurations described as functions of the information processing device 10 may instead be processed by software or hardware on the user terminal 20 or the like. Conversely, some of the functional configurations described as functions of the user terminal 20 may be processed by software or hardware on the information processing device 10 or the like.

(1-5. Functions of the Berry Detection Unit 11a)
The functions of the berry detection unit 11a are described with reference to FIG. 4. The berry detection unit 11a detects the grape berries contained in the image P1. The image P1 is an image captured by the imaging unit 24 of the user terminal 20 carried by the worker W performing the berry thinning, and transmitted to the information processing device 10. When the user terminal 20 is AR glasses or the like, the range shown in the image P1 can be close to the field of view of the worker W.

Grape berry detection by the berry detection unit 11a is performed based on the hue, brightness, and saturation of the image P1 and on various other feature quantities obtained by image analysis. For example, the image P1 may be binarized and the berries then detected by image analysis based on contours, sizes, and the like.
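The binarize-then-analyze approach mentioned above can be sketched in plain Python. This is a toy illustration on a small grayscale grid, not the device's actual detector: the fixed threshold, the 4-connected flood fill, and the `min_area` size filter are all assumptions chosen to keep the example self-contained.

```python
from collections import deque


def binarize(gray, threshold):
    """Map each grayscale value to 1 (candidate berry pixel) or 0 (background)."""
    return [[1 if v >= threshold else 0 for v in row] for row in gray]


def find_regions(binary, min_area=1):
    """Group 4-connected foreground pixels and drop regions smaller than min_area."""
    height = len(binary)
    width = len(binary[0]) if height else 0
    seen = [[False] * width for _ in range(height)]
    regions = []
    for r in range(height):
        for c in range(width):
            if binary[r][c] != 1 or seen[r][c]:
                continue
            queue = deque([(r, c)])
            seen[r][c] = True
            pixels = []
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and binary[ny][nx] == 1 and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if len(pixels) >= min_area:
                regions.append(pixels)
    return regions


gray = [
    [10, 10, 10, 10, 10, 10],
    [10, 200, 200, 10, 10, 10],
    [10, 200, 200, 10, 220, 10],
    [10, 10, 10, 10, 10, 10],
]
regions = find_regions(binarize(gray, threshold=128), min_area=2)
print(len(regions), len(regions[0]))  # 1 4
```

The isolated bright pixel is rejected by the size filter, which mirrors the idea of using size as a criterion after binarization.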

The berry detection results are obtained as the position, size, extent, and the like of each grape berry, and can be recorded in the storage unit 12. Processing the image P1 based on these detection results yields, for example, the image P2 shown in FIG. 4. The regions of berries that were detected as grape berries (detected grape berries DG) are filled in black. The regions of berries that were not detected as grape berries (non-detected grape berries DG) are left unfilled and remain white.

(1-6. Functions of the Bunch Identification Unit 11b)
The functions of the bunch identification unit 11b are described with reference to FIG. 5. The bunch identification unit 11b detects the grape bunches contained in the image P1. As with berry detection, bunch detection by the bunch identification unit 11b is performed based on the hue, brightness, and saturation of the image P1 and on various other feature quantities obtained by image analysis. For example, the image P1 may be binarized and the bunches then detected by image analysis based on contours, sizes, and the like.

The bunch detection results are obtained as the position, size, extent, and the like of each grape bunch, and can be recorded in the storage unit 12. Processing the image P1 based on these detection results yields, for example, the image P3 shown in FIG. 5. The detected bunches are shown enclosed in frames as bunch B1, bunch B2, and bunch B3.

The bunch identification unit 11b identifies the bunch being worked on from among the detected bunches, based on the position and size of each bunch in the image P1. In FIG. 5, bunches B1, B2, and B3 have been detected, so the bunch being worked on is identified from among these three.

Here, the "bunch being worked on" is the bunch that the worker W is treating as the target of the work. This means, for example, a bunch that is about to be thinned, a bunch that is currently being thinned, or a bunch whose thinning status is being checked. It can also be described as the bunch that the worker W is focusing on or gazing at for the purpose of berry thinning.

The criterion for the "bunch position" considered when identifying the bunch being worked on is, for example, how close each detected bunch is to the center of the image P1. As shown in FIG. 5, the bottom-left corner of the image P1 is taken as the origin, the rightward direction as the X axis, and the upward direction as the Y axis, with the right edge at X = 1 and the top edge at Y = 1. The center of the image P1 is then the point D1 at (X, Y) = (0.5, 0.5). Closeness can be calculated from the distance between this point D1 and the center of each detected bunch (the image-to-bunch center distance). As an index of closeness, for example, the value obtained by dividing 1 by the image-to-bunch center distance, in other words the "proximity rate to the image center", can be used.
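The coordinate convention and the reciprocal-distance index can be written out as a short Python sketch. The helper names are hypothetical. The conversion from pixel coordinates assumes the common top-left pixel origin, and the small `eps` guard against division by zero (a bunch centered exactly on D1) is an added assumption not discussed in the text.

```python
import math

IMAGE_CENTER = (0.5, 0.5)  # point D1 in normalized coordinates


def to_normalized(px, py, width, height):
    """Convert a pixel position (top-left origin assumed) to the bottom-left,
    0..1 coordinate system described for the image P1."""
    return (px / width, 1.0 - py / height)


def proximity_rate(bunch_center, eps=1e-9):
    """Return 1 divided by the distance between the bunch center and D1."""
    distance = math.dist(bunch_center, IMAGE_CENTER)
    return 1.0 / max(distance, eps)


center = to_normalized(320, 120, width=640, height=480)
print(center)                  # (0.5, 0.75)
print(proximity_rate(center))  # 4.0
```

Note that with normalized coordinates this raw reciprocal is always greater than 1 for a bunch center inside the image, so some rescaling would presumably be needed before comparing it with values below 1.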

The "bunch size" considered when identifying the bunch being worked on is, for example, the area of the region belonging to each detected bunch in the image P1. As an index of the size of each bunch in the image P1, for example, the value obtained by dividing the area of the region belonging to the bunch by the area of the image P1, in other words the "occupancy rate in the image", can be used.
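The occupancy rate is a simple area ratio, sketched below in Python. The function name is hypothetical, and measuring both areas in pixels is an assumption; any consistent unit gives the same ratio.

```python
def occupancy_rate(region_area, image_width, image_height):
    """Return the fraction of the image covered by the bunch region."""
    image_area = image_width * image_height
    if image_area <= 0:
        raise ValueError("image dimensions must be positive")
    if region_area < 0:
        raise ValueError("region_area must be non-negative")
    return region_area / image_area


# A bunch region of 61,440 pixels in a 640 x 480 image.
print(occupancy_rate(61440, 640, 480))  # 0.2
```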

One example of how the bunch identification unit 11b identifies the bunch being worked on from the "bunch position" and the "bunch size" is to use the closeness of each detected bunch to the center of the image P1 together with the area of the region belonging to each detected bunch in the image P1. More specifically, for example, the sum of the "proximity rate to the image center" and the "occupancy rate in the image" is used as the criterion. In one aspect, the larger this sum, the higher the probability that the bunch is the one being worked on.

図５において、仮に、ぶどう房Ｂ１～Ｂ３の「画像中心への近接率」がそれぞれ０．５、０．３、０．１であり、ぶどう房Ｂ１～Ｂ３の「画像における占有率」がそれぞれ０．２、０．１５、０．１であるとする。このような場合、「画像中心への近接率」と「画像における占有率」を足した値、作業中のぶどう房である確率は、ぶどう房Ｂ１～Ｂ３についてそれぞれ０．７、０．４５、０．２である。すなわち、画像Ｐ１の中でぶどう房Ｂ１が作業中のぶどう房である確率が一番高く、作業中のぶどう房として特定される。 In FIG. 5, let us assume that the "proximity rate to the image center" of grape bunches B1 to B3 is 0.5, 0.3, and 0.1, respectively, and that the "occupancy rate in the image" of grape bunches B1 to B3 is 0.2, 0.15, and 0.1, respectively. In such a case, the sum of the "proximity rate to the image center" and the "occupancy rate in the image", that is, the probability that a bunch is being worked on, is 0.7, 0.45, and 0.2 for grape bunches B1 to B3, respectively. In other words, grape bunch B1 has the highest probability of being a bunch being worked on in image P1, and is identified as a bunch being worked on.
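
The selection in this worked example can be reproduced with a few lines of Python. The sketch below simply adds the two indices and picks the largest sum; the function name `select_working_bunch` and the dictionary layout are hypothetical, while the numbers are the ones given for bunches B1 to B3.

```python
def select_working_bunch(bunches):
    """Return (name, score) of the bunch with the largest summed index.

    `bunches` maps a bunch name to (proximity_rate, occupancy_rate).
    """
    scores = {name: prox + occ for name, (prox, occ) in bunches.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]


# Values taken from the FIG. 5 example in the text.
example = {"B1": (0.5, 0.2), "B2": (0.3, 0.15), "B3": (0.1, 0.1)}
name, score = select_working_bunch(example)
```

Here `name` is `"B1"`, matching the bunch identified in the text.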

また、房特定部１１ｂによるぶどう房の位置及びぶどう房の大きさに基づき、検出されたぶどう房から作業中のぶどう房を特定する処理は、図６の画像Ｐ３'に示すように、同一のぶどう房がぶどう房Ｂ１及びぶどう房Ｂ１'として重複して検出されてしまった際に除外することにも寄与する。ぶどう房Ｂ１'はぶどう房Ｂ１の一部が誤ってぶどう房として検出されてしまったものである。しかし、作業中のぶどう房を特定するためにぶどう房の位置及びぶどう房の大きさを考慮すると、作業中のぶどう房としてはぶどう房Ｂ１'は除外されることになり、精度良く作業中のぶどう房を特定することができる。 The process of identifying a bunch of grapes being worked on from the detected bunches based on the position and size of the bunch by the bunch identification unit 11b also contributes to excluding the same bunch of grapes when it is detected as bunch B1 and bunch B1' in duplicate, as shown in image P3' in FIG. 6. Bunch B1' is a part of bunch B1 that has been mistakenly detected as a bunch of grapes. However, when the position and size of the bunch of grapes are taken into consideration in order to identify the bunch of grapes being worked on, bunch B1' is excluded as a bunch of grapes being worked on, and the bunch of grapes being worked on can be identified with high accuracy.

（１－７．統合処理部１１ｃの機能）
図７を参照し、統合処理部１１ｃの機能を説明する。統合処理部１１ｃは、ぶどう粒の検出結果及びぶどう房の特定結果に基づき、作業中のぶどう房に属するぶどう粒を決定する。ぶどう粒の検出結果とぶどう房の特定結果を総合的に考慮して判断する。 (1-7. Functions of the integration processing unit 11c)
The function of the integration processing unit 11c will be described with reference to Fig. 7. The integration processing unit 11c determines which grapes belong to the grape bunch being worked on based on the grape detection results and the grape bunch identification results. The determination is made by considering the grape detection results and the grape bunch identification results together.

「ぶどう粒の検出結果」に基づきぶどう粒であると考えられるぶどう粒ＤＧのうち、「ぶどう房の特定結果」に基づき特定された作業中のぶどう房Ｂ１に属する可能性の高いものを、作業中のぶどう房Ｂ１に属するぶどう粒ＤＧとして決定する。このような決定に基づいて画像Ｐ１を加工すると、例えば、図７に示す画像Ｐ４のようになる。 Of the grape berries DG that are considered to be grape berries based on the "grape berry detection result," those that are highly likely to belong to the grape bunch B1 being worked on, as identified based on the "grape bunch identification result," are determined to be the grape berries DG that belong to the grape bunch B1 being worked on. When image P1 is processed based on such a determination, it becomes, for example, image P4 shown in FIG. 7.

画像Ｐ４全体で検出されたぶどう粒ＤＧ（黒塗りの粒）は２６個である。しかし、作業中と特定されたぶどう房Ｂ１は実線の枠内であり、作業中のぶどう房Ｂ１に属するぶどう粒として決定されたはぶどう粒は、ぶどう房Ｂ１は実線の枠で囲まれた範囲内の９個のぶどう粒ＤＧとなる。 There are 26 grape berries DG (black berries) detected in the entire image P4. However, the grape bunch B1 identified as being in work is within the solid line frame, and the grape berries determined to belong to the grape bunch B1 in work are the nine grape berries DG within the range enclosed by the solid line frame.
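
One simple way to realize this integration step is sketched below. It rests on an assumption: the source only says that grapes "highly likely to belong" to the bunch are kept, and this sketch interprets that as "the center of the grape's box lies inside the frame of the working bunch". The function name `grapes_in_bunch` and the box layout are hypothetical.

```python
def grapes_in_bunch(grape_boxes, bunch_box):
    """Keep the grape boxes whose center lies inside the bunch box.

    All boxes are (x_min, y_min, x_max, y_max) in the same coordinate system.
    The center-inside-frame rule is an assumed membership criterion.
    """
    bx0, by0, bx1, by1 = bunch_box
    selected = []
    for box in grape_boxes:
        cx = (box[0] + box[2]) / 2.0
        cy = (box[1] + box[3]) / 2.0
        if bx0 <= cx <= bx1 and by0 <= cy <= by1:
            selected.append(box)
    return selected
```

The number of grapes belonging to the working bunch is then `len(grapes_in_bunch(...))`.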

（１－８．粒数算出部１１ｄの機能）
粒数算出部１１ｄは、画像Ｐ１において作業中のぶどう房に属するぶどう粒として検出されたぶどう粒の数を計測する。作業中のぶどう房に属するぶどう粒として検出されたぶどう粒の数は、例えば、図７では、ぶどう房Ｂ１は実線の枠で囲まれた範囲内の９個である。 (1-8. Functions of the grape number calculation unit 11d)
The grape number calculation unit 11d counts the number of grapes detected as belonging to the grape bunch being worked on in the image P1. For example, in Fig. 7, the number of grapes detected as belonging to the grape bunch being worked on is 9, namely the grapes within the solid-line frame of grape bunch B1.

また、粒数算出部１１ｄは、計測したぶどう粒の数及び所定の係数に基づき、ぶどう房Ｂ１が有するぶどう粒の総数の範囲を算出する。所定の係数は、例えば、２次元画像から検出し計測したぶどう房に属しているぶどう粒の数と、２次元画像上では見えない等の理由により検出されない裏側に位置しているぶどう粒等も含めた、当該ぶどう房に属しているぶどう粒の実際の数の関係に基づいて決定される係数である。 The grape number calculation unit 11d also calculates the range of the total number of grapes in the grape bunch B1 based on the number of measured grapes and a predetermined coefficient. The predetermined coefficient is a coefficient determined based on the relationship between the number of grapes belonging to the grape bunch detected and measured from the two-dimensional image, and the actual number of grapes belonging to the grape bunch, including grapes located on the back side that are not detected because they are not visible in the two-dimensional image, for example.

一態様における所定の係数の決定について説明する。まず、ぶどう房に属しているぶどう粒の実際の数がわかっている複数のぶどう房について、それらのぶどう房を含む複数のサンプル画像を用意し、作業中のぶどう房に属するぶどう粒として検出されたぶどう粒の数を計測する。計測したぶどう粒の数毎に、ぶどう房に属しているぶどう粒の実際の数を集計（クラスタリング）する。このような集計によって、計測したぶどう粒の数に対応する、ぶどう房が有するぶどう粒の総数の範囲を算出する係数を決定することができる。なお、複数のぶどう房の一部は同一のぶどう房であってもよい。ただし、同一のぶどう房であっても、摘粒作業によって付いている粒の数が異なるものであることが好ましい。 The determination of the predetermined coefficient in one embodiment will be described. First, for multiple grape bunches for which the actual number of grapes belonging to the bunch is known, multiple sample images including those grape bunches are prepared, and the number of grapes detected as grapes belonging to the bunch under operation is counted. For each number of measured grapes, the actual number of grapes belonging to the bunch is tallied (clustered). By such tallying, it is possible to determine a coefficient for calculating the range of the total number of grapes in the bunch, which corresponds to the number of measured grapes. Note that some of the multiple grape bunches may be the same bunch. However, even if they are the same bunch, it is preferable that the number of grapes on the bunch differs due to the thinning operation.

例えば、複数のサンプル画像から計測したぶどう粒の数が４４の場合には、ぶどう房に属しているぶどう粒の実際の数が４４～５２であることがわかったとする。計測したぶどう粒の数が４４の場合には、所定の係数は１（＝４４／４４）～約１．１８（＝５２／４４）と決定することができる。すなわち、計測したぶどう粒の数毎に個別に所定の係数が決定されてよい。 For example, suppose that, for sample images in which the measured number of grapes is 44, the actual number of grapes belonging to the grape bunch is found to be between 44 and 52. In that case, the predetermined coefficient for a measured count of 44 can be determined to be between 1 (=44/44) and approximately 1.18 (=52/44). In other words, the predetermined coefficient may be determined separately for each measured number of grapes.
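
The tallying described above can be sketched as a small Python routine that groups the known actual counts by measured count and derives a coefficient range per group. The function names and the `(measured_count, actual_count)` pair layout are hypothetical; rounding to whole grapes is also an assumption.

```python
from collections import defaultdict


def coefficient_ranges(samples):
    """Map each measured count to its (min, max) coefficient.

    `samples` is an iterable of (measured_count, actual_count) pairs, where
    measured_count is the number detected in a sample image and actual_count
    is the known total for that bunch.
    """
    actual_by_measured = defaultdict(list)
    for measured, actual in samples:
        if measured <= 0:
            continue  # a coefficient is undefined when nothing was detected
        actual_by_measured[measured].append(actual)
    return {
        measured: (min(actuals) / measured, max(actuals) / measured)
        for measured, actuals in actual_by_measured.items()
    }


def total_range(measured, coefficient_range):
    """Apply a coefficient range to a measured count, rounded to whole grapes."""
    low, high = coefficient_range
    return round(measured * low), round(measured * high)
```

With samples whose measured count is 44 and whose actual counts span 44 to 52, `total_range(44, ...)` returns the range 44 to 52 again, as expected.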

計測したぶどう粒の数毎に、ぶどう房に属しているぶどう粒の実際の数を集計した結果は、図８のグラフのように表すことができる。計測したぶどう粒の数をＸとし、ぶどう房に属しているぶどう粒の実際の数をＹとする。全ての点（Ｘ，Ｙ）をプロットした上で、同一のＸにおけるＹの最小点、最大点をプロットする。このプロットにより、各最小点に基づく回帰曲線と、各最大点に基づく回帰曲線を描くことができる。各最小点に基づく回帰曲線は、ぶどう房に属しているぶどう粒の実際の数の下限曲線となり、各最大点に基づく回帰曲線は、ぶどう房に属しているぶどう粒の実際の数の上限曲線となる。 The results of tallying up the actual number of grapes belonging to a grape cluster for each number of grapes measured can be represented as in the graph in Figure 8. Let X be the number of grapes measured, and Y be the actual number of grapes belonging to a grape cluster. After plotting all points (X, Y), plot the minimum and maximum points of Y at the same X. This plot makes it possible to draw a regression curve based on each minimum point and a regression curve based on each maximum point. The regression curve based on each minimum point becomes the lower limit curve for the actual number of grapes belonging to a grape cluster, and the regression curve based on each maximum point becomes the upper limit curve for the actual number of grapes belonging to a grape cluster.
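
As a rough illustration of the lower and upper limit curves, the sketch below fits one regression through the per-X minimum points and another through the per-X maximum points. The source does not state the form of the regression curve; a straight line fitted by ordinary least squares is assumed here purely for simplicity, and all function names are hypothetical. At least two distinct measured counts are required.

```python
def fit_line(points):
    """Ordinary least-squares fit of y = a * x + b to (x, y) points."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    a = sxy / sxx
    return a, mean_y - a * mean_x


def fit_bounds(samples):
    """Fit lower and upper lines through the per-X minimum and maximum of Y."""
    ys_by_x = {}
    for x, y in samples:
        ys_by_x.setdefault(x, []).append(y)
    lower = fit_line([(x, min(ys)) for x, ys in ys_by_x.items()])
    upper = fit_line([(x, max(ys)) for x, ys in ys_by_x.items()])
    return lower, upper


def predict_range(x, lower, upper):
    """Evaluate both fitted lines at a measured count x."""
    return lower[0] * x + lower[1], upper[0] * x + upper[1]
```

A higher-order polynomial or another curve family could be substituted in `fit_line` without changing the rest of the procedure.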

＜粒数算出の検証＞
サンプル画像として１００枚の画像を用意し、検証用画像として２６枚の画像を用意して試験を行った際の粒数算出の結果が図８に示されている。サンプル画像として１００枚の画像の解析に基づき下限曲線４１、上限曲線４２が描かれている。また、検証用画像から計測したぶどう粒の数毎に、ぶどう房に属しているぶどう粒の実際の数がプロットされている。サンプル画像から作成された下限曲線４１及び上限曲線４２によって区切られた範囲内にプロットがほぼ収まったことがわかる。 <Verification of grain number calculation>
FIG. 8 shows the results of calculating the number of grapes when a test was conducted using 100 sample images and 26 verification images. A lower limit curve 41 and an upper limit curve 42 are drawn based on the analysis of the 100 sample images. In addition, the actual number of grapes belonging to a grape cluster is plotted for each number of grapes measured from the verification images. It can be seen that the plots are almost within the range bounded by the lower limit curve 41 and the upper limit curve 42 created from the sample images.

（１－８．処理の流れ）
ぶどう粒検出システム１による処理の流れについて説明する。まず、画像撮影工程が実行される。画像撮影工程では、ユーザ端末２０の撮影部２４により画像Ｐ１を撮影する。画像Ｐ１は、ユーザ端末２０から情報処理装置１０へ送信される。 (1-8. Processing flow)
The following describes the flow of processing by the grape grain detection system 1. First, an image capturing step is executed. In the image capturing step, an image P1 is captured by the image capturing unit 24 of the user terminal 20. The image P1 is transmitted from the user terminal 20 to the information processing device 10.

次に、粒検出工程が実行される。粒検出工程では、画像Ｐ１に含まれるぶどう粒を検出する。粒検出工程と前後して、又は並列的に房特定工程が実行される。房特定工程では、画像Ｐ１に含まれるぶどう房を検出し、画像Ｐ１中におけるぶどう房の位置及びぶどう房の大きさに基づき、検出されたぶどう房から作業中のぶどう房を特定する。そして、統合処理工程が実行される。統合処理工程では、ぶどう粒の検出結果及び房の特定結果に基づき、作業中のぶどう房に属するぶどう粒を決定する。これら粒検出工程、房特定工程、及び統合処理工程を含む工程を画像解析工程と称することがある。 Next, a grain detection process is performed. In the grain detection process, grapes contained in image P1 are detected. A bunch identification process is performed before or after the grain detection process, or in parallel. In the bunch identification process, grape bunches contained in image P1 are detected, and a bunch of grapes currently being worked on is identified from the detected bunches based on the position of the bunch and the size of the bunch in image P1. Then, an integration process is performed. In the integration process, grapes belonging to the bunch currently being worked on are determined based on the results of grape grain detection and bunch identification. The process including the grain detection process, bunch identification process, and integration process is sometimes referred to as an image analysis process.

次に、粒数算出工程が実行される。粒数算出工程では、画像Ｐ１において作業中のぶどう房に属するぶどう粒として検出されたぶどう粒の数を計測し、計測したぶどう粒の数及び所定の係数に基づき、ぶどう房が有する粒の総数の範囲を算出する。 Next, a grape number calculation process is executed. In the grape number calculation process, the number of grapes detected in image P1 as belonging to the grape bunch being worked on is counted, and the range of the total number of grapes in the grape bunch is calculated based on the counted number of grapes and a predetermined coefficient.

その後、解析結果表示工程が実行される。解析結果表示工程では、作業中のぶどう房が有するぶどう粒の総数の範囲をユーザ端末２０の表示部２５に表示する。解析結果表示工程では、ぶどう粒の領域やぶどう房の位置等の解析結果をＡＲ技術によって視界に重ねて表示することも可能である。 Then, an analysis result display process is executed. In the analysis result display process, the range of the total number of grapes in the grape bunch being worked on is displayed on the display unit 25 of the user terminal 20. In the analysis result display process, it is also possible to display the analysis results, such as the area of the grapes and the position of the grape bunch, superimposed on the field of view using AR technology.

＜２．第２実施形態＞
（２－１．第２実施形態に係るぶどう粒検出システム１）
以下、本発明の第２実施形態に係るぶどう粒検出システム１について説明する。第２実施形態におけるぶどう粒検出システム１は、学習モデルに基づいて種々の検出、分類等を行う点で第１実施形態と異なる。以下、相違点を中心に説明する。 <2. Second embodiment>
(2-1. Grape Grain Detection System 1 According to the Second Embodiment)
A grape grain detection system 1 according to a second embodiment of the present invention will be described below. The grape grain detection system 1 according to the second embodiment differs from the first embodiment in that it performs various detections, classifications, etc. based on a learning model. The following description will focus on the differences.

（２－１．粒検出部１１ａの機能）
粒検出部１１ａは、第１学習モデルに基づき画像Ｐ１に含まれるぶどう粒を検出する。すなわち、粒検出部１１ａは、機械学習、ディープラーニング等に基づく画像認識処理を行い、物体の検出や分類等を行う。第１学習モデルとは、ぶどう粒を検出することができる学習モデルであれば特に制限されないが、例えば、多数の教師データ（既知の入力データと正解データの組）を用いてモデルを訓練し、将来の出力を予測可能にする学習モデルである。 (2-1. Functions of the Grain Detector 11a)
The grain detection unit 11a detects grapes contained in the image P1 based on the first learning model. That is, the grain detection unit 11a performs image recognition processing based on machine learning, deep learning, etc., and detects and classifies objects. The first learning model is not particularly limited as long as it is a learning model that can detect grapes, but for example, it is a learning model that trains a model using a large amount of teacher data (a set of known input data and correct answer data) and makes it possible to predict future outputs.

第１学習モデルには、畳み込みニューラルネットワーク（ＣＮＮ：ＣｏｎｖｏｌｕｔｉｏｎａｌＮｅｕｒａｌＮｅｔｗｏｒｋｓ）を利用した深層学習モデルを採用することができる。このような学習モデルにおいては、入力画像（画像Ｐ１）に対して畳み込みニューラルネットワークによる処理を行う。畳み込みニューラルネットワークは、畳み込み層とプリーリング層の１以上の組み合わせにより構成される。例えば図９に示すように、入力画像は、第１畳み込み層、第１プーリング層、第２畳み込み層、第２プーリング層・・・第ｎ畳み込み層、第ｎプーリング層により構成される畳み込みニューラルネットワークにより処理されて、特徴マップが生成される。 For the first learning model, a deep learning model using a convolutional neural network (CNN) can be adopted. In such a learning model, the input image (image P1) is processed by the convolutional neural network. The convolutional neural network is composed of one or more combinations of convolutional layers and pooling layers. For example, as shown in FIG. 9, the input image is processed by a convolutional neural network composed of a first convolutional layer, a first pooling layer, a second convolutional layer, a second pooling layer, ... an nth convolutional layer, and an nth pooling layer, and a feature map is generated.
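
To make the alternating convolution and pooling structure concrete, here is a deliberately small pure-Python sketch. It is not the model of the source: it assumes a single-channel image, hand-written kernels instead of learned ones, a ReLU activation (not mentioned in the text), and 2 x 2 max pooling. All names are hypothetical.

```python
def conv2d(image, kernel):
    """Valid (no padding), stride-1 2D cross-correlation followed by ReLU."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            s = sum(image[i + di][j + dj] * kernel[di][dj]
                    for di in range(kh) for dj in range(kw))
            row.append(max(0.0, s))  # ReLU is an assumed activation
        out.append(row)
    return out


def max_pool2d(feature, size=2):
    """Non-overlapping max pooling with a square window."""
    out = []
    for i in range(0, len(feature) - size + 1, size):
        row = []
        for j in range(0, len(feature[0]) - size + 1, size):
            row.append(max(feature[i + di][j + dj]
                           for di in range(size) for dj in range(size)))
        out.append(row)
    return out


def feature_map(image, kernels):
    """Apply one convolution + pooling pair per kernel, in order."""
    x = image
    for kernel in kernels:
        x = max_pool2d(conv2d(x, kernel))
    return x
```

Passing n kernels to `feature_map` corresponds to the first through nth convolution and pooling layers of FIG. 9, in a heavily simplified form.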

第１学習モデルは、画像Ｐ１の特徴量を利用し物体を検出する物体検出器と、画像Ｐ１の特徴量を利用し物体の分類を行う分類器と、を含む学習モデルである。第１学習モデルが、畳み込みニューラルネットワークを利用した深層学習モデルである場合には、物体検出器及び分類器は、上記特徴マップを利用する。 The first learning model is a learning model that includes an object detector that detects objects using the features of image P1, and a classifier that classifies objects using the features of image P1. If the first learning model is a deep learning model that uses a convolutional neural network, the object detector and the classifier use the feature map.

第１学習モデルに含まれる分類器は、特徴量として位置情報を分類に利用しない分類器である非位置的分類器と、特徴量として位置情報を分類に利用する分類器である位置的分類器と、を有し、非位置的分類器による分類結果及び位置的分類器による分類結果に基づき分類するように構成されてもよい。学習モデルにおいて、物体の分類のためには位置に関する情報は特徴量から排除されることが通常である。画像内での位置、例えば物体が画像内で右上に位置していることは、その物体が何であるかということを判別するための分類器においては悪影響を及ぼすことが一般的だからである。 The classifier included in the first learning model may have a non-positional classifier that is a classifier that does not use positional information as a feature for classification, and a positional classifier that is a classifier that uses positional information as a feature for classification, and may be configured to perform classification based on the classification results by the non-positional classifier and the classification results by the positional classifier. In a learning model, information about position is usually excluded from the features for classifying objects. This is because the position in an image, for example an object being located in the upper right corner of an image, usually has a negative effect on a classifier that determines what the object is.

しかし、発明者らは、第１学習モデルに、分類器として、特徴量として位置情報を分類に利用しない分類器である非位置的分類器だけを含む場合に比べ、特徴量として位置情報を分類に利用する分類器である位置的分類器も含める場合に、ぶどう粒をぶどう粒として正しく検出できた割合である再現率が改善されることを見出した。すなわち、物体をぶどう粒としてより正しく分類することができたといえる。 However, the inventors discovered that the recall rate, which is the rate at which grapes are correctly detected as grapes, is improved when the first learning model also includes a positional classifier, which is a classifier that uses position information as a feature for classification, compared to when the first learning model includes only a non-positional classifier, which is a classifier that does not use position information as a feature for classification. In other words, it can be said that objects can be more correctly classified as grapes.

また、特徴量として位置情報を分類に利用する分類器である位置的分類器も含める場合には、作業中のぶどう房に属するぶどう粒以外の検出が抑制された。その結果、検出されたぶどう粒のうちの作業中のぶどう房に属するぶどう粒の割合が向上するという効果も得られた。 In addition, when a positional classifier, which is a classifier that uses position information as a feature for classification, was included, the detection of grapes other than those belonging to the bunch being worked on was suppressed. As a result, the proportion of grapes belonging to the bunch being worked on among the detected grapes was improved.

ここで、特徴量として位置情報は、種々の方法によって入力画像から得ることができるが、例えば、領域提案ネットワーク（ＲＰＮ：ＲｅｓｉｏｎＰｒｏｐｏｓａｌＮｅｔｗｏｒｋ）を利用することによって得てもよい。領域提案ネットワークは、畳み込みニューラルネットワークと同様のネットワーク構造を有してもよい。また、図１０に示すように、領域提案ネットワークが、畳み込みニューラルネットワークにより生成された特徴マップを入力として、位置情報を出力するように構成されてもよい。 Here, the location information as a feature can be obtained from the input image by various methods, but for example, it may be obtained by using a region proposal network (RPN: Region Proposal Network). The region proposal network may have a network structure similar to that of a convolutional neural network. Also, as shown in FIG. 10, the region proposal network may be configured to receive a feature map generated by a convolutional neural network as input and output location information.

畳み込みニューラルネットワーク（ＣＮＮ）及び領域提案ネットワーク（ＲＰＮ）を利用し、さらに特徴量として位置情報を分類に利用する分類器を含む第１学習モデルは、図１１のように表すことができる。すなわち、畳み込みニューラルネットワークにより出力された特徴マップを入力として領域提案ネットワークが位置情報を出力する。特徴マップと位置情報は、これらを統合するＲｏＩプーリング層（ＲｏＩＰｏｏｌｉｎｇ）に入力され、抽出処理が行われ、特徴量が出力される。 The first learning model, which uses a convolutional neural network (CNN) and a region proposal network (RPN) and further includes a classifier that uses location information as a feature for classification, can be expressed as shown in Figure 11. That is, the region proposal network outputs location information using the feature map output by the convolutional neural network as input. The feature map and location information are input to an RoI pooling layer that integrates them, where extraction processing is performed and features are output.

物体検出器及び非位置的分類器には、ＲｏＩプーリング層で抽出された特徴量が入力される。一方、位置的分類器には、ＲｏＩプーリング層で抽出された特徴量に加え、領域提案ネットワークから出力される位置情報を含んだ特徴量も入力される。物体検出器、非位置的分類器、及び位置的分類器は、１層以上によって構成される全結合層等であってよい。 The object detector and the non-positional classifier are input with the features extracted in the RoI pooling layer. Meanwhile, the positional classifier is input with features including position information output from the region proposal network in addition to the features extracted in the RoI pooling layer. The object detector, the non-positional classifier, and the positional classifier may be a fully connected layer composed of one or more layers.

物体検出器からは、検出した物体の画像Ｐ１中における位置、領域等の情報が出力される。図１２の例においては、物体検出の結果に基づいて加工された画像Ｐ５が示されている。この画像Ｐ５では、検出された物体の領域を四角形（バウンディングボックスＢＢ１）で表すために、領域の左下の点Ｄ２と右上の点Ｄ３の座標が出力されている。点Ｄ２は（ｘ_１,ｙ_１）であり、点Ｄ３は（ｘ_２,ｙ_２）である。これら２つの点の座標が物体検出器から出力される情報の一例である。 The object detector outputs information such as the position and area of the detected object in image P1. In the example of Fig. 12, an image P5 processed based on the result of object detection is shown. In this image P5, the coordinates of point D2 in the lower left and point D3 in the upper right of the area are output in order to represent the area of the detected object as a rectangle (bounding box BB1). Point D2 is (x_1, y_1), and point D3 is (x_2, y_2). The coordinates of these two points are an example of information output from the object detector.

非位置的分類器からは、検出された物体について特定の物であるかの確率が出力される。粒検出部１１ａでは、第１学習モデルとしてぶどう粒を検出するために学習が行われた学習モデルを利用するため、検出された物体のぶどう粒である確率が出力される。この際、背景である確率も同時に出力し、分類の際に考慮してもよい。 The non-positional classifier outputs the probability that a detected object is a specific object. The grain detection unit 11a uses a learning model that has been trained to detect grapes as the first learning model, and outputs the probability that the detected object is a grape. At this time, the probability that it is the background may also be output at the same time and taken into consideration when classifying.

位置的分類器からは、入力された位置情報も利用する点で異なるが、非位置的分類器と同様に、検出された物体について特定の物であるかの確率が出力される。粒検出部１１ａでは、第１学習モデルとしてぶどう粒を検出するために学習が行われた学習モデルを利用するため、検出された物体のぶどう粒である確率が出力される。この際、背景である確率も同時に出力し、分類の際に考慮してもよい。 The positional classifier differs from the non-positional classifier in that it also uses input position information, but like the non-positional classifier, it outputs the probability that a detected object is a specific object. The grain detection unit 11a uses a learning model that has been trained to detect grapes as the first learning model, so it outputs the probability that the detected object is a grape. At this time, the probability that it is the background may also be output at the same time and taken into consideration when classifying.

非位置的分類器のみを分類器として含む場合には、非位置的分類器が出力する確率が一定以上であるかに基づいて、ぶどう粒か否かを分類する。非位置的分類器とともに位置的分類器を分類器として含む場合には、非位置的分類器が出力する確率と、位置的分類器が出力する確率の両方に基づいてぶどう粒か否かを分類する。より具体的には、例えば、「非位置的分類器が出力する確率」に０．５を掛けた値と、「位置的分類器が出力する確率」に０．５を掛けた値と、を足した値（すなわち、平均値）に基づいて分類する。すなわち、足した値がぶどう粒である確率となり、当該確率が一定以上であるか否かに基づいて、ぶどう粒か否かを分類する。 When the classifier only includes a non-positional classifier, it classifies whether or not it is a grape based on whether the probability output by the non-positional classifier is a certain level or higher. When the classifier includes a positional classifier together with a non-positional classifier, it classifies whether or not it is a grape based on both the probability output by the non-positional classifier and the probability output by the positional classifier. More specifically, for example, it classifies based on the sum (i.e., the average value) of the value obtained by multiplying the "probability output by the non-positional classifier" by 0.5 and the value obtained by multiplying the "probability output by the positional classifier" by 0.5. In other words, the sum is the probability that it is a grape, and it is classified whether or not it is a grape based on whether or not this probability is a certain level or higher.
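
The equal-weight combination described above amounts to the following Python sketch. The weights of 0.5 come from the text; the default threshold of 0.5 is an assumption, since the source only speaks of "a certain level", and the function names are hypothetical.

```python
def combined_grape_probability(p_non_positional, p_positional):
    """Average the two classifier outputs with equal weights of 0.5."""
    return 0.5 * p_non_positional + 0.5 * p_positional


def is_grape(p_non_positional, p_positional, threshold=0.5):
    """Classify as a grape when the averaged probability reaches the threshold.

    The threshold value is an assumed placeholder, not a value from the source.
    """
    return combined_grape_probability(p_non_positional, p_positional) >= threshold
```

For instance, outputs of 0.9 and 0.5 average to 0.7 and are accepted at the assumed threshold, while 0.6 and 0.2 average to 0.4 and are rejected.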

（２－２．房特定部１１ｂの機能）
房特定部１１ｂは、第２学習モデルに基づき画像Ｐ１に含まれるぶどう房を検出するように構成されてもよい。すなわち、房特定部１１ｂは、機械学習、ディープラーニング等に基づく画像認識処理を行い、物体の検出や分類等を行う。第２学習モデルは、画像Ｐ１の特徴量を利用し物体を検出する物体検出器と、画像Ｐ１の特徴量を利用し物体の分類を行う分類器と、を含む学習モデルである。第２学習モデルには、畳み込みニューラルネットワークを利用した深層学習モデルを採用することができる。 (2-2. Function of the bunch identification unit 11b)
The bunch identification unit 11b may be configured to detect grape bunches included in the image P1 based on the second learning model. That is, the bunch identification unit 11b performs image recognition processing based on machine learning, deep learning, etc., and detects and classifies objects. The second learning model is a learning model including an object detector that detects objects using the feature amount of the image P1, and a classifier that classifies objects using the feature amount of the image P1. A deep learning model using a convolutional neural network can be adopted as the second learning model.

また、第２学習モデルは、第１学習モデルと同一又は異なる学習モデルである。第２学習モデルが、第１学習モデルと同一である場合には、ぶどう粒の検出とぶどう房の検出を同じ学習モデルによって行うように構成することができる。 The second learning model may be the same as or different from the first learning model. If the second learning model is the same as the first learning model, the detection of grapes and the detection of grape clusters may be configured to be performed using the same learning model.

（２－３．第２実施形態の変形例１）
粒検出部１１ａ及び房特定部１１ｂは、細分化された領域毎に分類を行うように構成されてもよい。「細分化された領域毎」とは、例えば、ピクセル毎等である。画像全体に対して細分化された領域毎の分類を行ってもよいが、検出された物体、例えばぶどう粒やぶどう房に対してのみ細分化された領域毎の分類を行うようにしてもよい。細分化された領域毎に分類を行う処理は、セグメンテーションと称することもある。 (2-3. Modification 1 of the second embodiment)
The grain detection unit 11a and the bunch identification unit 11b may be configured to perform classification for each subdivided region. "For each subdivided region" refers to, for example, for each pixel. Classification for each subdivided region may be performed for the entire image, but classification for each subdivided region may also be performed for only detected objects, such as grapes or grape bunches. The process of classifying for each subdivided region is sometimes called segmentation.

細分化された領域毎に分類を行う場合には、例えば、図１３に示すようなモデルを利用することができる。図１３のモデルはマスクを有し、図１１のモデルとはＲｏＩプーリング層で抽出された特徴量がマスクにも入力される点で異なる。このマスクにより、細分化された領域毎に分類が行われる。 When classifying each subdivided region, for example, a model such as that shown in FIG. 13 can be used. The model in FIG. 13 has a mask, and differs from the model in FIG. 11 in that the features extracted in the RoI pooling layer are also input to the mask. This mask is used to classify each subdivided region.

物体検出器による検出結果に基づくだけでは、図１４の画像Ｐ６に示すように、バウンディングボックスＢＢ２で囲むように物体を認識できるだけであるが、マスクを用いることでその物体（ぶどう粒）の実際の輪郭に近い形で物体を認識可能となる。 Based solely on the detection results from the object detector, it is only possible to recognize the object as being surrounded by a bounding box BB2, as shown in image P6 in Figure 14, but by using a mask, it is possible to recognize the object (grapes) in a form that is closer to its actual contours.

（２－４．第２実施形態の変形例２）
図１５に示すように、第１学習モデルは、物体検出器として、第１物体検出器と、第２物体検出器と、を含むように構成されてもよい。第２物体検出器による検出の閾値は、第１物体検出器による検出の閾値と異なる値であるように構成される。また、第２物体検出器は、第１物体検出器の検出結果を利用することができる。このような構成とすることにより、異なる複数の閾値により物体検出が行われ物体の検出における取りこぼしが抑制されることを期待できる。 (2-4. Modification 2 of the Second Embodiment)
As shown in Fig. 15, the first learning model may be configured to include a first object detector and a second object detector as object detectors. A threshold value for detection by the second object detector is configured to be different from a threshold value for detection by the first object detector. In addition, the second object detector can use the detection result of the first object detector. With such a configuration, object detection is performed using a plurality of different threshold values, and it is expected that oversight in object detection is suppressed.

検出の閾値が異なる第１物体検出器及び第２物体検出器は、学習段階においては独立して学習を行うことができる。第１物体検出器及び第２物体検出器は異なる検出の閾値が設定され、学習が行われる。ここで、検出の閾値とは、例えば、ＩｏＵ（ＩｎｔｅｒｓｅｃｔｉｏｎｏｖｅｒＵｎｉｏｎ）に対する値であり、検出した物体の画像の重なりの割合である。バウンディングボックス同士の重なりと言い換えてもよい。 The first object detector and the second object detector, which have different detection thresholds, can be trained independently during the training phase. Different detection thresholds are set for the first object detector and the second object detector, and training is performed. Here, the detection threshold is, for example, a value for IoU (Intersection over Union), and is the ratio of overlap of the images of the detected objects. It may also be rephrased as the overlap between bounding boxes.
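
For reference, IoU between two bounding boxes can be computed as in the sketch below. It assumes boxes given as (x_1, y_1, x_2, y_2) with a lower-left and an upper-right corner; the function name `iou` is hypothetical.

```python
def iou(box_a, box_b):
    """Intersection over Union of two (x1, y1, x2, y2) boxes."""
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    # Width and height of the overlap are clamped at zero for disjoint boxes.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```

A detector trained with a higher IoU threshold only treats proposals that overlap the correct box more tightly as positives, which is one way to read the "low" and "higher" thresholds of the two detectors.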

そして、正解が未知のデータに関する推論時に第１物体検出器と第２物体検出器を連携させて用いる。例えば、推論時に図１５におけるようなモデルを用いる場合には、低い閾値で学習を行った第１物体検出器と、より高い閾値で学習を行った第２物体検出器がＲｏＩプーリング層を介して連携されている。低い閾値で学習を行った第１物体検出器では、検出の閾値が低く検出ノイズが比較的多い出力を得ることになる。より高い閾値で学習を行った第２物体検出器では、検出の閾値がより高く検出ノイズが比較的少ない出力を得ることができ、さらに第１物体検出器の出力結果を利用しているため検出精度の向上を期待できる。 The first and second object detectors are then used in conjunction with each other when making inferences about data for which the correct answer is unknown. For example, when using a model such as that shown in FIG. 15 during inference, the first object detector trained with a low threshold and the second object detector trained with a higher threshold are linked via an RoI pooling layer. The first object detector, trained with a low threshold, has a low detection threshold and produces an output with a relatively large amount of detection noise. The second object detector, trained with a higher threshold, has a higher detection threshold and produces an output with relatively little detection noise; because it also uses the output of the first object detector, improved detection accuracy can be expected.

図１５に示すように、第１学習モデルは、分類器として、第１分類器と、第２分類器と、を含み、第２分類器は、第１物体検出器の検出結果を利用するように構成されてもよい。より具体的には、第２分類器が有する第２位置的分類器が第１物体検出器の検出結果を利用する。第２位置的分類器は、第１物体検出器から位置情報を含む出力を受け取る。このような構成とすることにより、ぶどう粒をぶどう粒として正しく検出できた割合である再現率が向上することが期待できる。 As shown in FIG. 15, the first learning model may include a first classifier and a second classifier as classifiers, and the second classifier may be configured to use the detection results of the first object detector. More specifically, a second positional classifier included in the second classifier uses the detection results of the first object detector. The second positional classifier receives an output including position information from the first object detector. With this configuration, it is expected that the recall rate, which is the rate at which grapes are correctly detected as grapes, will be improved.

制御部１１は、細分化された領域毎に分類を行う細分化領域検出部をさらに備えてもよい。細分化領域検出部は、第１学習モデルに基づき画像に含まれるぶどう粒の領域を細分化された領域毎に検出する。第１学習モデルは、画像Ｐ１の特徴量を利用し細分化された領域毎に物体を分類する（セグメンテーションする）マスクを含むように構成されうる。 The control unit 11 may further include a subdivided area detection unit that performs classification for each subdivided area. The subdivided area detection unit detects grape grain areas included in the image for each subdivided area based on the first learning model. The first learning model may be configured to include a mask that uses the features of the image P1 to classify (segment) objects for each subdivided area.

図１６に示すように、第１学習モデルは、第２ＲｏＩプーリング層が第１物体検出器から位置情報を含む出力を受け取り、第１マスクが第２ＲｏＩプーリング層からの出力に基づき細分化された領域毎に物体を分類した結果を出力するように構成されてもよい。 As shown in FIG. 16, the first learning model may be configured such that the second RoI pooling layer receives an output including position information from the first object detector, and the first mask outputs a result of classifying objects for each subdivided region based on the output from the second RoI pooling layer.

同様に、第２マスクは第３ＲｏＩプーリング層からの出力に基づき、第３マスクは第４ＲｏＩプーリング層からの出力に基づき処理を行う。このように構成することにより、モデルの後ろに位置するＲｏＩプーリング層からの出力を利用することになり、より洗練された特徴量を利用することでより良いセグメンテーションの結果が得られることが期待できる。 Similarly, the second mask performs processing based on the output from the third RoI pooling layer, and the third mask performs processing based on the output from the fourth RoI pooling layer. With this configuration, each mask uses the output of an RoI pooling layer located later in the model, and better segmentation results can be expected because more refined features are used.

第１学習モデルが、特徴量として位置情報を分類に利用する分類器である位置的分類器を含む学習モデルである場合には、学習段階において重み付けの調整等に利用されるロス関数（損失関数）にも位置情報による項を加えてもよい。ロス関数Ｌは、例えば、下記式（１）のように設定されうる。
When the first learning model is a learning model including a positional classifier that uses positional information as a feature for classification, a term based on the positional information may be added to a loss function used for adjusting weights in the learning stage. The loss function L may be set, for example, as shown in the following formula (1).

第１学習モデルが、複数のステージ、第１ステージ～第Ｔステージで構成されている場合のロス関数Ｌについて説明する。なお、図１６では、Ｔ＝３であり、第１ステージ～第３ステージで構成されているといえる。 The loss function L is explained when the first learning model is composed of multiple stages, from the first stage to the Tth stage. In FIG. 16, T=3, which means that the model is composed of the first to third stages.

Ｌ_ｂｂｏｘ^ｔは、第ｔステージにおけるバウンディングボックス（物体検出器）のロスを表す。Ｌ_ｃｌｓ^ｔは、第ｔステージにおける非位置的分類器のロスを表す。Ｌ_ｓｃｌｓ^ｔは、第ｔステージにおける位置的分類器のロスを表す。Ｌ_ｍａｓｋ^ｔは、第ｔステージにおけるマスクのロスを表す。係数α_ｔは、各ステージのロス関数における寄与度を調整するための係数である。βは、各ロス（Ｌ_ｂｂｏｘ^ｔ、Ｌ_ｃｌｓ^ｔ、及びＬ_ｓｃｌｓ^ｔ）に対する重み付け係数である。このようなロス関数における設計、各係数値の設定は、Cascade R-CNN (Cai and Vasconcelos 2019)、Mask R-CNN (He et al. 2017)、Hybrid Task Cascade (HTC) (Chen, Ouyang, et al. 2019)などを参考にすることができる。 L_bbox^t represents the loss of the bounding box (object detector) in the t-th stage. L_cls^t represents the loss of the non-positional classifier in the t-th stage. L_scls^t represents the loss of the positional classifier in the t-th stage. L_mask^t represents the loss of the mask in the t-th stage. The coefficient α_t is a coefficient for adjusting the contribution of each stage to the loss function. β is a weighting coefficient for each loss (L_bbox^t, L_cls^t, and L_scls^t). For the design of such a loss function and the setting of each coefficient value, reference can be made to Cascade R-CNN (Cai and Vasconcelos 2019), Mask R-CNN (He et al. 2017), Hybrid Task Cascade (HTC) (Chen, Ouyang, et al. 2019), etc.
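
Formula (1) itself is not reproduced in this text. Purely as an illustration, one stage-weighted sum that is consistent with the term definitions above, and that follows the general pattern used in Hybrid Task Cascade, is shown below. The placement of β in particular is an assumption and should not be read as the formula of the source.

```latex
% Illustrative form only; the position of \beta is assumed.
L = \sum_{t=1}^{T} \alpha_t \left[ \beta \left( L_{bbox}^{t} + L_{cls}^{t} + L_{scls}^{t} \right) + L_{mask}^{t} \right]
```

In this reading, α_t scales the whole contribution of stage t, and β balances the detection and classification losses against the mask loss.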

各ロスは下記（２）（４）（５）（６）のように定義される。下記式（２）中の「ｓｍｏｏｔｈ_Ｌ１」については、下記式（３）のように定義される。 Each loss is defined as in formulas (2), (4), (5), and (6) below. The term "smooth_L1" in formula (2) is defined as in formula (3) below.

Ｌ_ｂｂｏｘは、正解とするバウンディングボックスと予測したバウンディングボックスそれぞれ4次元のベクトル（位置情報ｘ，ｙとサイズ情報ｗ，ｈ）として表すことができ、そしてロス関数は二つのベクトル間のマンハッタン距離（各次元の座標の差の絶対値の総和）として算出される。ここで、正解とするバウンディングボックスについては、ｖ＝（ｖ_ｘ，ｖ_ｙ，ｖ_ｗ，ｖ_ｈ）、予測したバウンディングボックスについては、ｂ＝（ｂ_ｘ，ｂ_ｙ，ｂ_ｗ，ｂ_ｈ）と表すことができる。 For L_bbox, the correct (ground-truth) bounding box and the predicted bounding box can each be expressed as a four-dimensional vector (position information x, y and size information w, h), and the loss is calculated as the Manhattan distance between the two vectors (the sum of the absolute values of the differences in each dimension). Here, the correct bounding box can be expressed as v = (v_x, v_y, v_w, v_h), and the predicted bounding box as b = (b_x, b_y, b_w, b_h).
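
Formulas (2) and (3) are not reproduced in this text. The sketch below assumes that "smooth_L1" is the usual smooth L1 function known from Fast R-CNN (quadratic for differences smaller than 1 in absolute value, linear otherwise) and that L_bbox sums it over the four components of b − v. The function names are hypothetical.

```python
def smooth_l1(x):
    """Smooth L1: 0.5 * x^2 when |x| < 1, otherwise |x| - 0.5."""
    ax = abs(x)
    return 0.5 * x * x if ax < 1.0 else ax - 0.5


def bbox_loss(predicted, target):
    """Sum of smooth L1 over the (x, y, w, h) differences b_i - v_i.

    `predicted` is b = (b_x, b_y, b_w, b_h) and `target` is
    v = (v_x, v_y, v_w, v_h), as in the notation of the text.
    """
    return sum(smooth_l1(b - v) for b, v in zip(predicted, target))
```

For large differences this behaves like the Manhattan distance mentioned above (minus a constant 0.5 per dimension), while small differences are penalized quadratically.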

L_cls^t and L_scls^t are defined as the cross-entropy (CE) (formula (4) below). The output of each classifier is the probability that the input belongs to a given class (represented by a label), and the loss function is the cross-entropy between the probabilities actually output by the classifier (predicted probabilities) and the ground-truth probabilities. The further the prediction deviates from the ground truth, the larger the cross-entropy becomes. K is the number of classes in the model, p is the predicted probability calculated by the softmax function of the fully connected layer, and u is the ground truth for each class.

When K is 2, the cross-entropy is the binary cross-entropy loss (BCE) and can be calculated by formula (5) below.
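As an illustration of the two classification losses, the sketch below assumes the textbook definitions CE = -Σ_k u_k log p_k and BCE = -(u log p + (1 - u) log(1 - p)). Formulas (4) and (5) themselves are not reproduced in this text, so these forms, the function names, and the clipping constant `eps` are assumptions.

```python
import math


def cross_entropy(p, u, eps=1e-12):
    """Cross-entropy over K classes: -sum_k u_k * log(p_k).

    p: predicted probabilities (e.g. softmax output), length K
    u: ground-truth values per class (e.g. one-hot), length K
    """
    if len(p) != len(u):
        raise ValueError("p and u must have the same length K")
    # Clip probabilities away from 0 so log() stays finite.
    return -sum(uk * math.log(max(pk, eps)) for pk, uk in zip(p, u))


def binary_cross_entropy(p, u, eps=1e-12):
    """Binary cross-entropy for K = 2, with p the probability of the positive class."""
    p = min(max(p, eps), 1.0 - eps)
    return -(u * math.log(p) + (1.0 - u) * math.log(1.0 - p))
```

With K = 2 and a one-hot ground truth, `cross_entropy([p, 1 - p], [1, 0])` and `binary_cross_entropy(p, 1)` give the same value, which is the sense in which BCE is the K = 2 case.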

The mask branch outputs a result for each RoI, and the result is K × m^2-dimensional. The output is a value for each dimension (one value per pixel), which is a real number between 0 and 1. The ground truth of the mask is a binary image of 0s and 1s, so L_mask can be calculated using sigmoid cross-entropy and can be defined as the average of the binary cross-entropy loss (BCE). m_pred is the predicted mask, and m_gt is the ground truth.
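A minimal sketch of the mask loss as an average of per-pixel BCE values is shown below. It assumes the function receives a single m × m predicted mask (already passed through a sigmoid, so values lie in 0..1) together with its binary ground truth. How the K per-class masks are selected is not specified here and is left out. The name `mask_loss` and the constant `eps` are illustrative.

```python
import math


def mask_loss(m_pred, m_gt, eps=1e-12):
    """Mean binary cross-entropy over all pixels of one m x m mask.

    m_pred: rows of predicted values in [0, 1]
    m_gt:   rows of ground-truth values (0 or 1)
    """
    total, count = 0.0, 0
    for row_pred, row_gt in zip(m_pred, m_gt):
        for p, g in zip(row_pred, row_gt):
            p = min(max(p, eps), 1.0 - eps)  # keep log() finite
            total -= g * math.log(p) + (1 - g) * math.log(1.0 - p)
            count += 1
    if count == 0:
        raise ValueError("mask must contain at least one pixel")
    return total / count
```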

<Verification of the Effect of Image Analysis Using the Model of Modification 2>
The inventors conducted an experiment on a model in which the first learning model is composed of the first to third stages as shown in Figure 16.

First, the model was trained using 790 images. In this training, each stage was trained separately. The IoU thresholds of the object detectors were set to increase in the order of the first, second, and third object detectors, so that the first object detector was the most tolerant of noise in object detection.
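The stage-wise thresholds can be illustrated as follows, reading the threshold as an intersection-over-union (IoU) threshold in the manner of Cascade R-CNN. The values 0.5, 0.6, and 0.7 are hypothetical (the text only states that the thresholds increase from the first to the third detector). Boxes are given here as corner coordinates (x1, y1, x2, y2) for simplicity.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# Hypothetical thresholds for the first, second, and third object detectors.
STAGE_THRESHOLDS = (0.5, 0.6, 0.7)


def positive_per_stage(proposal, ground_truth):
    """For each stage, report whether the proposal counts as a positive sample."""
    score = iou(proposal, ground_truth)
    return [score >= t for t in STAGE_THRESHOLDS]
```

A proposal that overlaps the ground truth only loosely is accepted by the first stage but rejected by later ones, which is why the first detector tolerates the most noise.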

After training, the stages were assembled into the configuration shown in Figure 16 and tested with 198 images. The results are shown in Table 1. Compared with the Hybrid Task Cascade (HTC) model reported by Chen, Ouyang, et al. (2019), the false positive rate (the proportion of non-grape objects among the objects detected as grapes) fell by 0.07%, while the recall (the proportion of grapes that were detected as grapes) improved by 1.77%. In addition, grapes that could not be detected by the existing method were detected (Figure 17). The grapes in the three circled areas in Figure 17A could not be detected by the conventional model, but were successfully detected by the first learning model of the present invention (Figure 17B).
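The two metrics, as defined in the parenthetical explanations above, can be computed from detection counts as in the sketch below. The function names and the counts in the usage note are hypothetical and are not the values from Table 1.

```python
def false_positive_rate(true_positives: int, false_positives: int) -> float:
    """Share of non-grape objects among everything detected as a grape."""
    detected = true_positives + false_positives
    if detected == 0:
        raise ValueError("no detections")
    return false_positives / detected


def recall(true_positives: int, false_negatives: int) -> float:
    """Share of actual grapes that were detected as grapes."""
    actual = true_positives + false_negatives
    if actual == 0:
        raise ValueError("no ground-truth grapes")
    return true_positives / actual
```

For example, with 90 correct detections, 10 spurious detections, and 30 missed grapes (made-up numbers), `false_positive_rate(90, 10)` is 0.1 and `recall(90, 30)` is 0.75.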

<3. Third Embodiment>
(3-1. Grape Detection System 1 According to the Third Embodiment)
The grape detection system 1 according to the third embodiment of the present invention is described below. The grape detection system 1 of the third embodiment is characterized by how the images used as training data for model training are prepared. The description below focuses on the differences from the first and second embodiments.

(2-2. Functional Configuration of Information Processing Device 10)
The control unit 11 further includes a training data generation unit and a model learning unit. The training data generation unit generates the training data (pairs of known input data and correct answer data) that the learning model uses in the learning stage. The model learning unit trains the model using the generated training data.

(2-3. Training Data Generation Unit)
The training data generation unit pseudo-synthesizes images of the thinning process by removing the grapes in an image one at a time (training data generation step). The training data generation step may include a circularity calculation step, a removal candidate extraction step, and a removal and completion step.

In the circularity calculation step, the circularity is calculated for each region in the image that is known to be a grape. In the removal candidate extraction step, grapes whose region has a circularity equal to or less than a threshold are extracted as removal candidates. In the removal and completion step, one grape is selected at random from the extracted grapes and removed, and the removed portion is completed with background or the like by image completion. For the image completion, the technique of Iizuka, Satoshi, Edgar Simo-Serra, and Hiroshi Ishikawa. 2017. "Globally and Locally Consistent Image Completion." ACM Transactions on Graphics (Proc. of SIGGRAPH) 36(4): 107. can be applied. The removal and completion step is repeated until no extracted candidates remain. Each repetition yields an image in a different thinning state. In addition, by redoing the removal from the beginning or from an intermediate point, images with different randomly thinned patterns can be generated.
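The candidate extraction and random removal loop can be sketched as below. The text does not give a formula for circularity, so the sketch assumes the common definition 4π × area / perimeter² (1.0 for a perfect circle) and represents each grape region as a polygon outline. The image completion itself is omitted. All names (`circularity`, `removal_candidates`, `thinning_states`) are illustrative.

```python
import math
import random


def circularity(polygon):
    """4*pi*area / perimeter^2 for a closed polygon [(x, y), ...] (assumed definition)."""
    n = len(polygon)
    twice_area, perimeter = 0.0, 0.0
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        twice_area += x1 * y2 - x2 * y1        # shoelace formula
        perimeter += math.hypot(x2 - x1, y2 - y1)
    if perimeter == 0:
        return 0.0
    return 4 * math.pi * (abs(twice_area) / 2) / perimeter ** 2


def removal_candidates(berries, threshold):
    """IDs of grapes whose region circularity is at or below the threshold."""
    return [bid for bid, poly in berries.items() if circularity(poly) <= threshold]


def thinning_states(berries, threshold, rng):
    """Remove one random candidate at a time until none remain.

    Returns the set of remaining grape IDs after each removal; each set
    corresponds to one synthesized image (the image completion is not shown).
    """
    candidates = removal_candidates(berries, threshold)
    remaining = set(berries)
    states = []
    while candidates:
        chosen = candidates.pop(rng.randrange(len(candidates)))
        remaining.discard(chosen)
        states.append(frozenset(remaining))
    return states
```

Calling `thinning_states` again with a differently seeded `random.Random` produces a different removal order, which corresponds to generating a different thinning pattern from the same original image.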

In Figure 18A, grape 52 is located behind grape 51. In such an image, the region of grape 52 has a lower circularity than that of grape 51 and the like. In the circularity calculation step, the circularity of the region of grape 52 is calculated, and if it is equal to or less than the threshold, grape 52 is extracted as a removal candidate in the removal candidate extraction step (Figure 18B). Then, in the removal and completion step, grape 52 is deleted, and the hole (blank region) left in the image by the deletion is completed based on the features of the non-grape background image (Figure 18C).

<Verification of the Effect of Training Using Synthetic Images>
The inventors trained the model using 50 original images and 790 synthetic images synthesized from them. The grape detection performance of the model was then verified using another 10 original images and 198 synthetic images synthesized from them. Compared with training on the 50 original images alone, both the recall and the false positive rate improved. The verification used the deep learning model of Chen, Kai, Wanli Ouyang, et al. 2019. "Hybrid Task Cascade for Instance Segmentation." Proceedings of the IEEE International Conference on Computer Vision: 4969-78. http://arxiv.org/abs/1901.07518 (November 21, 2019).

By using such an image synthesis technique to generate, from original images, a large number of synthetic images that can serve as training data, the performance of the first learning model and other models used in the second embodiment and elsewhere can be improved.

1: Grape detection system, 5: Communication line, 10: Information processing device, 11: Control unit, 11a: Grape detection unit, 11b: Bunch identification unit, 11c: Integration processing unit, 11d: Grape count calculation unit, 12: Storage unit, 13: Communication unit, 14: Operation input unit, 15: Monitor, 16: System bus, 20: User terminal, 21: Control unit, 21a: Image capturing unit, 21b: Analysis result display unit, 22: Storage unit, 23: Communication unit, 24: Imaging unit, 25: Display unit, 26: Speaker, 27: System bus, 30: Image analysis unit, 41: Lower limit curve, 42: Upper limit curve, 51, 52, DG, UG: Grapes, B1 to B3, B1: Grape bunches, BB1, BB2: Bounding boxes, D1 to D3: Points, P1 to P6, P3': Images, W: Worker, WAN: Wireless

Claims

1. An information processing device for detecting grapes from an image, comprising:
a grape detection unit, a bunch identification unit, and an integration processing unit, wherein
the grape detection unit detects the grapes included in the image,
the bunch identification unit
detects grape bunches included in the image, and
identifies, from among the detected grape bunches, a grape bunch that is the target of a grape thinning operation, based on the positions and sizes of the grape bunches in the image, and
the integration processing unit determines the grapes belonging to the grape bunch that is the target of the grape thinning operation, based on the detection result of the grapes and the identification result of the grape bunch.

2. The information processing device according to claim 1, wherein
the grape detection unit detects the grapes included in the image based on a first learning model, and
the first learning model is a learning model including an object detector that detects an object using features of the image, and a classifier that classifies the object using the features of the image.

3. The information processing device according to claim 2, wherein
the classifier included in the first learning model
includes a non-positional classifier, which is a classifier that does not use position information as a feature for classification, and a positional classifier, which is a classifier that uses position information as a feature for classification, and
performs classification based on the classification result of the non-positional classifier and the classification result of the positional classifier.

4. The information processing device according to claim 2, wherein
the bunch identification unit detects the grape bunches included in the image based on a second learning model,
the second learning model is a learning model including an object detector that detects an object using features of the image, and a classifier that classifies the object using the features of the image, and
the second learning model is the same as or different from the first learning model.

5. The information processing device according to any one of claims 1 to 4,
further comprising a grape count calculation unit, wherein
the grape count calculation unit
counts the number of grapes detected in the image as belonging to the grape bunch that is the target of the grape thinning operation, and
calculates a range of the total number of grapes in the grape bunch based on the counted number of grapes and a predetermined coefficient.

6. A system for detecting grapes from an image, comprising:
an image capturing unit and an image analysis unit, wherein
the image capturing unit captures the image,
the image analysis unit includes a grape detection unit, a bunch identification unit, and an integration processing unit,
the grape detection unit detects the grapes included in the image,
the bunch identification unit
detects grape bunches included in the image, and
identifies, from among the detected grape bunches, a grape bunch that is the target of a grape thinning operation, based on the positions and sizes of the grape bunches in the image, and
the integration processing unit determines the grapes belonging to the grape bunch that is the target of the grape thinning operation, based on the detection result of the grapes and the identification result of the grape bunch.

7. The system according to claim 6,
further comprising an analysis result display unit, wherein
the analysis result display unit displays a range of the total number of grapes in the grape bunch that is the target of the grape thinning operation.

8. A program for detecting grapes from an image,
the program causing a computer to execute a grape detection step, a bunch identification step, and an integration processing step, wherein
in the grape detection step, the grapes included in the image are detected,
in the bunch identification step,
grape bunches included in the image are detected, and
a grape bunch that is the target of a grape thinning operation is identified from among the detected grape bunches, based on the positions and sizes of the grape bunches in the image, and
in the integration processing step, the grapes belonging to the grape bunch that is the target of the grape thinning operation are determined based on the detection result of the grapes and the identification result of the grape bunch.

9. An information processing method for detecting grapes from an image, comprising:
a grape detection step, a bunch identification step, and an integration processing step, wherein
in the grape detection step, the grapes included in the image are detected,
in the bunch identification step,
grape bunches included in the image are detected, and
a grape bunch that is the target of a grape thinning operation is identified from among the detected grape bunches, based on the positions and sizes of the grape bunches in the image, and
in the integration processing step, the grapes belonging to the grape bunch that is the target of the grape thinning operation are determined based on the detection result of the grapes and the identification result of the grape bunch.