JP7208503B2 - Machine learning program, machine learning method and machine learning apparatus

Info

Publication number: JP7208503B2
Application number: JP2019042111A
Authority: JP
Inventors: 由信飯村
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 2019-03-08
Filing date: 2019-03-08
Publication date: 2023-01-19
Anticipated expiration: 2039-03-08
Also published as: JP2020144720A

Description

The present invention relates to a machine learning program, a machine learning method, and a machine learning apparatus.

Machine learning is sometimes performed as a form of computer-based data analysis. In machine learning, training data representing multiple cases with known outcomes is input to a computer. The computer analyzes the training data and generates a prediction model that generalizes the relationship between factors (sometimes called explanatory variables or independent variables) and outcomes (sometimes called objective variables or dependent variables). The generated prediction model can then be used to predict unknown outcomes.

Machine learning is sometimes used for crop harvest prediction. For example, a prediction device has been proposed that predicts the optimal harvest date of a crop. The proposed prediction device collects teacher data that includes images of a crop captured on several different days before harvest and the harvest date on which that crop was actually harvested. The prediction device generates a prediction model from the teacher data by machine learning, inputs an image of a target crop into the prediction model, and predicts the harvest date of the target crop.

Japanese Laid-open Patent Publication No. 2018-169993 (JP 2018-169993 A)

In machine learning for crop harvest prediction, one conceivable approach is to generate a prediction model whose explanatory variables describe the growing environment, such as temperature and solar radiation, and whose objective variable is the number of days required from a reference date, such as the fruit-set date, to the harvest date. However, crops exhibit individual differences: their growth rates differ even when they are grown in the same growing environment. For some types of crops these individual differences are particularly large. In contrast, typical machine learning generates a prediction model that calculates a single expected value (the most likely value) of the objective variable for a given value of the explanatory variables. As a result, although actual harvest dates vary, the prediction model concentrates the predicted harvest dates of many crops on a specific day, and the prediction results may diverge from reality.

In one aspect, an object of the present invention is to provide a machine learning program, a machine learning method, and a machine learning apparatus that generate a prediction model with improved accuracy in crop harvest prediction.

In one aspect, a machine learning program is provided that causes a computer to execute the following process. The computer acquires training data and total count data. The training data includes a plurality of records, each associating information on the growing environment of a sample crop with the number of days required from a reference date, on which a predetermined state was observed, to the harvest date of the sample crop. The total count data indicates the actual distribution of harvest counts over harvest dates for a crop set that includes the plurality of sample crops indicated by the plurality of records and other crops. The computer generates a prediction model that calculates a probability distribution of the required number of days from the growing environment information, and starts a learning process that uses the training data to repeatedly evaluate the error of the probability distribution calculated by the prediction model and update the prediction model. During the learning process, the computer combines a plurality of probability distributions that the prediction model calculates from the growing environment information indicated by the plurality of records, calculates a predicted distribution of harvest counts over harvest dates, and determines when to stop the learning process based on the similarity between the predicted distribution and the actual distribution indicated by the total count data.

In another aspect, a machine learning method executed by a computer is provided. In yet another aspect, a machine learning apparatus having a storage unit and a processing unit is provided.

In one aspect, a prediction model with improved accuracy in crop harvest prediction can be generated.

A diagram illustrating an example of the machine learning apparatus of the first embodiment.
A diagram illustrating an example of the information processing system of the second embodiment.
A diagram illustrating an example of the hardware of the machine learning apparatus.
A diagram illustrating an example of the data flow of harvest prediction.
A diagram illustrating an example of using a prediction model that outputs an expected value.
A diagram illustrating an example of using a prediction model that outputs a probability distribution.
A diagram illustrating an example of using an undertrained prediction model.
A diagram illustrating an example of using an overfitted prediction model.
A diagram illustrating an example of the stop timing of machine learning.
A diagram illustrating an example of the data flow of machine learning.
A block diagram illustrating an example of the functions of the machine learning apparatus.
A diagram illustrating example tables of weather data, sample data, and total count data.
A diagram illustrating an example of a training data table.
A flowchart illustrating an example of the machine learning procedure.
A flowchart (continued) illustrating an example of the machine learning procedure.
A flowchart illustrating an example of the harvest prediction procedure.

Hereinafter, the embodiments will be described with reference to the drawings.
[First embodiment]
A first embodiment will be described.

FIG. 1 is a diagram illustrating an example of the machine learning apparatus of the first embodiment.
The machine learning apparatus 10 of the first embodiment uses machine learning to generate a prediction model for crop harvest prediction. The machine learning apparatus 10 may also be referred to as an information processing apparatus or a computer. The machine learning apparatus 10 may be a client device or a server device.

The machine learning apparatus 10 has a storage unit 11 and a processing unit 12. The storage unit 11 may be a volatile semiconductor memory such as RAM (Random Access Memory), or non-volatile storage such as an HDD (Hard Disk Drive) or flash memory. The processing unit 12 is, for example, a processor such as a CPU (Central Processing Unit), a GPU (Graphics Processing Unit), or a DSP (Digital Signal Processor). However, the processing unit 12 may include a special-purpose electronic circuit such as an ASIC (Application Specific Integrated Circuit) or an FPGA (Field Programmable Gate Array). The processor executes a program stored in a memory such as RAM (which may be the storage unit 11). A set of multiple processors is sometimes called a "multiprocessor" or simply a "processor."

The storage unit 11 stores training data 13 and total count data 14. Training data 13 and total count data 14 are historical data on crops that have already been harvested. They indicate the growing conditions and harvest results of crops harvested in the past, such as crops harvested in the previous year. The crops indicated by training data 13 and the crops indicated by total count data 14 were harvested in the same year. Training data 13 and total count data 14 may be single-year data covering one year of crops, or multi-year data in which crops from several years are mixed. The crops may include fruits. Fruits here are the edible fruits of plants, whether classed as vegetables or as fruit, and are cultivated by farmers. Crops show individual differences in growth even when grown in the same growing environment, so the dates on which they can be harvested vary. The crop may be of a type with large individual differences in growth, such as paprika.

Training data 13 includes a plurality of records on sample crops, which are a part of the entire set of harvested crops (the crop set). The sample crops are those members of the crop set for which individual detailed information on growing conditions and harvest results was collected. The other crops in the crop set are those for which no individual detailed information was collected. The proportion of sample crops in the crop set (the sample ratio) may be about 0.01% to 0.3%, because collecting detailed information is labor-intensive.

Each of the plurality of records included in training data 13 associates information on the growing environment of a sample crop with a required number of days. The growing environment information includes indicators that correlate with crop growth, such as temperature and solar radiation. For example, the growing environment information includes the average temperature and average solar radiation between the reference date described below and the harvest date. However, provided that a correlation with crop growth is recognized, temperature and solar radiation before the reference date may be used, and cumulative temperature and cumulative solar radiation may also be used. The reference date may differ from one sample crop to another. When the reference date differs, the growing environment information associated with that sample crop may differ as a result. The required number of days is the number of days from the reference date, on which a predetermined state was observed for the sample crop, to the harvest date on which that sample crop was harvested. For example, the reference date is the date on which the plant was observed to have set fruit (the fruit-set date). However, the reference date may instead be a date on which the plant reached a predetermined state before fruit set, or a date on which the sample crop reached a predetermined state after fruit set. When harvest management is performed on a weekly basis, the required number of days may be expressed in weeks.
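The record structure described above can be sketched as follows. This is a minimal illustration, not the format used in the embodiment: the names `TrainingRecord` and `build_record`, the per-day measurement dictionary, and the rounding of days to whole weeks are all assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from statistics import mean


@dataclass
class TrainingRecord:
    """One sample crop: growing environment plus required time (hypothetical layout)."""
    reference_date: date          # e.g. the fruit-set date
    avg_temperature: float        # average over reference date .. harvest date
    avg_solar_radiation: float    # average over reference date .. harvest date
    required_weeks: int           # required time, expressed in weeks


def build_record(reference_date, harvest_date, daily_env):
    """Build a record from daily measurements.

    daily_env maps a date to a (temperature, solar_radiation) pair and is
    assumed to contain every day from reference_date to harvest_date.
    """
    days = (harvest_date - reference_date).days
    window = [daily_env[reference_date + timedelta(days=d)] for d in range(days + 1)]
    return TrainingRecord(
        reference_date=reference_date,
        avg_temperature=mean(t for t, _ in window),
        avg_solar_radiation=mean(s for _, s in window),
        required_weeks=round(days / 7),  # assumption: round to the nearest week
    )
```

Because the averages are taken over the window that starts at the reference date, two sample crops with different reference dates receive different growing environment values even when they grow in the same greenhouse.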

Total count data 14 indicates the actual distribution of harvest counts (the number of crops harvested) over harvest dates for the crop set, which includes the sample crops indicated by training data 13 and other crops. Each crop is cut from the plant and harvested on the day the farmer judges it to have grown sufficiently. Harvest dates vary because of differences in fruit-set dates and individual differences in growth. When harvest management is performed on a weekly basis, total count data 14 may indicate the actual distribution of harvest counts over the weeks to which the harvest dates belong. Total count data 14 is collected for shipment management and takes less effort to collect than training data 13.

For example, one record included in training data 13 indicates that, for a sample crop grown under a particular average temperature and average solar radiation, the time required from the fruit-set date to the harvest date was 8 weeks. Another record included in training data 13 indicates that, for a sample crop grown under a different average temperature and average solar radiation, the time required from the fruit-set date to the harvest date was 7 weeks. Total count data 14 indicates that, of 12,000 crops comprising the sample crops and other crops, 3,700 were harvested in one week, 5,800 in the following week, and 2,500 in the week after that.

The processing unit 12 executes a learning process 15 to generate a prediction model 16. Various machine learning algorithms can be used to generate the prediction model 16, such as genetic programming (GP), multiple regression analysis, and neural networks (NN). The prediction model 16 is a statistical model that accepts growing environment information as explanatory variables and outputs a probability distribution of the required number of days as the objective variable. When the required number of days in training data 13 is expressed in weeks, the prediction model 16 may output a probability distribution over the number of weeks. Instead of outputting only the single most probable required number of days (the expected value of the required number of days), the prediction model 16 is trained to output a probability for each of several candidate required numbers of days. For example, for a particular average temperature and average solar radiation, the prediction model 16 outputs a probability distribution of 30% for 7 weeks, 50% for 8 weeks, and 20% for 9 weeks.
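As one possible illustration of a model that outputs a distribution rather than a single value, the sketch below scores each candidate week with a linear function and normalizes the scores with a softmax. The softmax form, the function names, and the coefficient values are assumptions chosen for this example; the text above leaves the algorithm open (genetic programming, multiple regression, neural networks, and so on).

```python
import math


def softmax(scores):
    """Convert arbitrary scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict_distribution(coefficients, avg_temperature, avg_solar_radiation,
                         week_classes=(7, 8, 9)):
    """Return {required_weeks: probability} for one growing environment.

    coefficients holds one hypothetical (bias, w_temp, w_solar) triple per
    candidate week in week_classes.
    """
    scores = [bias + w_temp * avg_temperature + w_solar * avg_solar_radiation
              for bias, w_temp, w_solar in coefficients]
    return dict(zip(week_classes, softmax(scores)))


# Hypothetical coefficients chosen so that the output reproduces the
# 30% / 50% / 20% example in the text, regardless of the inputs.
coefficients = [(math.log(0.3), 0.0, 0.0),
                (math.log(0.5), 0.0, 0.0),
                (math.log(0.2), 0.0, 0.0)]
distribution = predict_distribution(coefficients, 22.0, 15.0)
```

In a trained model the temperature and radiation weights would be non-zero, so the distribution would shift toward shorter or longer required times as the growing environment changes.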

In the learning process 15, the processing unit 12 uses training data 13 to repeatedly evaluate the error of the probability distribution calculated by the prediction model 16 and update the prediction model 16. For example, for each of the plurality of records included in training data 13, the processing unit 12 inputs the growing environment information indicated by the record into the prediction model 16 and uses the required number of days indicated by the record to evaluate the error of the probability distribution output by the prediction model 16. Then, for example, the processing unit 12 updates the coefficients included in the prediction model 16 so that the error decreases. In the case of a neural network, the weights of the edges (synapses) between nodes are updated.
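The text does not fix a particular error measure for the output distribution. One common choice, shown here purely as an assumption, is the negative log-likelihood of the observed required time under the predicted distribution. The names `record_error` and `training_error` are hypothetical.

```python
import math


def record_error(distribution, observed_weeks, floor=1e-12):
    """Negative log-likelihood of the observed required time.

    distribution maps required_weeks -> probability. The floor avoids
    log(0) when the model assigns no probability to the observed value.
    """
    probability = distribution.get(observed_weeks, 0.0)
    return -math.log(max(probability, floor))


def training_error(records, predict):
    """Mean per-record error.

    records is a list of (features, observed_weeks) pairs and predict is
    any callable that maps features to a {required_weeks: probability} dict.
    """
    errors = [record_error(predict(features), weeks) for features, weeks in records]
    return sum(errors) / len(errors)
```

With this measure, driving the training error toward zero pushes each predicted distribution toward a single spike at the observed value, which is the narrowing of variance that the later paragraphs on overfitting are concerned with.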

Here, the processing unit 12 controls the number of iterations in which the prediction model 16 is updated in the learning process 15. When the number of iterations is small, the probability distribution output by the prediction model 16 has a large error with respect to training data 13, and the fit to training data 13 is poor. As the number of iterations increases, the error of the probability distribution output by the prediction model 16 with respect to training data 13 decreases gradually, and the fit to training data 13 improves gradually.

However, the sample crops indicated by training data 13 are few relative to the entire crop set, and crop growth varies between individuals. The sample of required numbers of days in training data 13 therefore does not necessarily represent the true probability distribution for the entire crop set faithfully, and it contains bias. For this reason, if the number of iterations is increased too far, overfitting causes the prediction model 16 to fit training data 13 excessively. The probability distribution output by an overfitted prediction model 16 has an excessively small variance and may diverge from the true probability distribution, in which the required number of days varies because of individual differences.

The processing unit 12 therefore refers to total count data 14 so as to stop updating the prediction model 16 in the learning process 15 at an appropriate time.
Specifically, during the learning process 15, the processing unit 12 uses the current prediction model 16 to calculate a plurality of probability distributions from the growing environment information indicated by the plurality of records included in training data 13, and combines these probability distributions to calculate a predicted distribution 17. The predicted distribution 17 is calculated, for example, each time the prediction model 16 is updated. The predicted distribution 17 is a prediction of the distribution of harvest counts over harvest dates for the crop set, which includes the sample crops and other crops.

When training data 13 includes data on sample crops with different reference dates, such as different fruit-set dates, the reference date can, for example, be included in training data 13, and the probability distributions output by the prediction model 16 can be shifted according to their reference dates before being combined. When the prediction model 16 outputs a probability distribution over the number of weeks, the predicted distribution 17 may indicate the distribution of harvest counts over the weeks to which the harvest dates belong. The processing unit 12 may also use the sample ratio to convert the predicted distribution of harvest counts for the sample crops into the predicted distribution 17 of harvest counts for the entire crop set. For example, the predicted distribution 17 indicates a prediction that, of 12,000 crops comprising the sample crops and other crops, 3,600 will be harvested in one week, 6,000 in the following week, and 2,400 in the week after that.
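The shift, combine, and scale steps described above can be sketched as follows. The function name, the use of integer week indices for reference dates, and the data layout are assumptions made for the example.

```python
from collections import defaultdict


def predicted_harvest_distribution(records, predict, sample_ratio):
    """Combine per-record distributions into harvest counts per week.

    records      : iterable of (reference_week, features) pairs, where
                   reference_week is an integer week index (assumed layout)
    predict      : callable mapping features to {required_weeks: probability}
    sample_ratio : fraction of the whole crop set that was sampled
    Returns {harvest_week: predicted count for the whole crop set}.
    """
    totals = defaultdict(float)
    for reference_week, features in records:
        for required_weeks, probability in predict(features).items():
            # Shift each distribution by its own reference week, then add.
            totals[reference_week + required_weeks] += probability
    # Scale expected sample counts up to the whole crop set.
    return {week: count / sample_ratio for week, count in sorted(totals.items())}
```

As a usage example under these assumptions, 12 sample records that all share reference week 0 and a predicted distribution of 30% / 50% / 20% for 7, 8, and 9 weeks, with a sample ratio of 0.1%, give approximately 3,600, 6,000, and 2,400 crops for weeks 7, 8, and 9, which matches the figures in the text.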

The processing unit 12 then evaluates the similarity between the predicted distribution 17 and the actual distribution indicated by total count data 14, and determines when to stop the learning process 15 based on that similarity. In the early stage of the learning process 15, each update brings the output of the prediction model 16 closer to the true probability distribution, and as a result the predicted distribution 17 approaches total count data 14. Once overfitting sets in, however, each update makes the variance of the output of the prediction model 16 excessively small and moves it away from the true probability distribution, and as a result the predicted distribution 17 moves away from total count data 14.

Accordingly, for example, the processing unit 12 evaluates the similarity each time the prediction model 16 is updated in order to detect a peak in the similarity, stops the learning process 15 when the peak is detected, and outputs the prediction model 16 corresponding to the peak as the learning result. The processing unit 12 may calculate the error between the predicted distribution 17 and the actual distribution indicated by total count data 14 (the total count error) as an index of their similarity, and detect the point at which the total count error is smallest. The error may be a residual sum of squares, obtained by summing over harvest dates the squared difference between the predicted and actual harvest counts. The processing unit 12 may also stop the learning process 15 when the similarity evaluation indicates that the predicted distribution 17 and the actual distribution indicated by total count data 14 are similar to at least a predetermined degree.
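The stopping rule described above can be sketched as a loop that tracks the total count error after every update and keeps the model at the minimum. The function names, the callable-based interface, and the `patience` parameter (how many consecutive non-improving updates are tolerated before stopping) are assumptions for this example, not details given in the text.

```python
def residual_sum_of_squares(predicted, actual):
    """Sum over harvest weeks of (predicted count - actual count) squared."""
    weeks = set(predicted) | set(actual)
    return sum((predicted.get(w, 0.0) - actual.get(w, 0.0)) ** 2 for w in weeks)


def train_with_total_error_stopping(model, update, predict_totals, actual,
                                    max_iterations=1000, patience=1):
    """Update the model until the total count error stops improving.

    update         : callable returning the next model (one learning iteration)
    predict_totals : callable mapping a model to {harvest_week: predicted count}
    actual         : {harvest_week: actual count} from the total count data
    Returns the model with the smallest total count error and that error.
    """
    best_model = model
    best_error = residual_sum_of_squares(predict_totals(model), actual)
    worse_in_a_row = 0
    for _ in range(max_iterations):
        model = update(model)
        error = residual_sum_of_squares(predict_totals(model), actual)
        if error < best_error:
            best_model, best_error = model, error
            worse_in_a_row = 0
        else:
            worse_in_a_row += 1
            if worse_in_a_row >= patience:
                break  # the minimum (similarity peak) has been passed
    return best_model, best_error
```

For the figures used in the text, a predicted distribution of 3,600 / 6,000 / 2,400 against an actual distribution of 3,700 / 5,800 / 2,500 gives a residual sum of squares of 100² + 200² + 100² = 60,000.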

According to the machine learning apparatus 10 of the first embodiment, a prediction model 16 is generated that predicts, from growing environment information such as temperature and solar radiation, the number of days required from a reference date such as the fruit-set date to the harvest date. This makes it possible to predict the harvest dates and harvest counts of crops. In addition, the prediction model 16 is trained to output a probability distribution of the required number of days rather than its expected value. This makes it possible to represent the spread of harvest dates, reflecting the fact that crops differ individually and grow at different rates even in the same growing environment.

Furthermore, in addition to evaluating the error of the prediction model 16 for each individual record included in training data 13, the similarity between the predicted distribution 17 of harvest counts, predicted from training data 13 as a whole, and the actual distribution of harvest counts indicated by total count data 14 is evaluated. The iterations in which the learning process 15 updates the prediction model 16 are stopped based on this similarity. This suppresses the tendency of an overfitted prediction model 16 to output probability distributions with excessively small variance, and improves the prediction accuracy of the prediction model 16.

In particular, because collecting detailed information such as the reference date and harvest date for individual sample crops places a heavy burden on farmers, training data 13 may not contain data on a sufficient number of sample crops. In addition, because of individual differences in growth, the required numbers of days indicated by training data 13 are biased. When the prediction model 16 is generated from such training data 13 and overfitting occurs, the prediction model 16 is likely to output probability distributions with inappropriate variance. In contrast, the machine learning apparatus 10 suppresses overfitting, so the prediction model 16 can output probability distributions with appropriate variance and can represent the spread of harvest dates.

Note that cross-validation is a machine learning technique for generating a prediction model with the highest possible prediction accuracy from a small amount of training data. In cross-validation, the data set is divided into multiple blocks (for example, 10 blocks), one of the blocks is selected as test data, and the remaining blocks (for example, 9 blocks) are selected as training data. A prediction model is generated using the training data, and its prediction accuracy is measured using the test data. Generation of the prediction model is repeated multiple times (for example, 10 times), changing the block selected as test data each time.

In other words, cross-validation repeatedly generates prediction models while swapping the records included in the training data, in order to find the combination of records that yields a prediction model with the highest possible prediction accuracy. However, when very few records are available, it is difficult to divide the data set appropriately into multiple blocks, and it is not easy to improve the accuracy of crop harvest prediction even with cross-validation.
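For reference, the block-splitting step of cross-validation described above can be sketched as follows. The function name and the round-robin assignment of records to blocks are assumptions for the example.

```python
def k_fold_splits(records, k=10):
    """Yield (training_records, test_records) pairs for k-fold cross-validation.

    records must be a list. Records are assigned to k blocks in round-robin
    order, and each block is used as the test data exactly once.
    """
    folds = [records[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [r for j, fold in enumerate(folds) if j != i for r in fold]
        yield train, test
```

With only a handful of records, each test block in this scheme holds one or two records at most, which illustrates why the text considers cross-validation hard to apply when detailed sample data is scarce.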

[Second embodiment]
Next, a second embodiment will be described.
FIG. 2 is a diagram illustrating an example of the information processing system of the second embodiment.

The information processing system of the second embodiment uses machine learning to predict the harvest dates and harvest counts of crops. Predictions of harvest dates and harvest counts can be used as basic material when a farmer makes contracts with buyers. The information processing system of the second embodiment is suited to managing crops with large individual differences in growth and widely varying harvest dates. The second embodiment assumes paprika as the type of crop. However, the information processing system of the second embodiment can also be applied to the management of crops other than paprika.

The information processing system of the second embodiment includes a greenhouse 20, a network 30, a weather data server 31, and a machine learning apparatus 100.
The interior of the greenhouse 20 includes a sample cultivation area 21 and a general cultivation area 22 as farmland for cultivating paprika. The paprika cultivated in the sample cultivation area 21 are sample fruits, for which the farmer individually observes the fruit-set date and the harvest date. The paprika cultivated in the general cultivation area 22 are fruits whose fruit-set dates and harvest dates are not individually observed. The sample fruits cultivated in the sample cultivation area 21 account for about 0.1% of all the fruits in the sample cultivation area 21 and the general cultivation area 22 combined. However, the total number harvested on each harvest date is counted for shipment management. Instead of dividing the farmland into the sample cultivation area 21 and the general cultivation area 22, the fruits of several plants scattered across the farmland may be selected as sample fruits. Although FIG. 2 shows a single greenhouse, the farmland may be divided among multiple greenhouses.

A sensor 23 is installed inside the greenhouse 20. The sensor 23 is a sensor device that measures at least air temperature and solar radiation. The temperature and solar radiation measured by the sensor 23 are those inside the greenhouse 20 and differ from the outdoor temperature and solar radiation. The sensor 23 periodically transmits the measured data to a predetermined information processing device.

The network 30 includes a wide area data communication network such as the Internet. The weather data server 31 and the machine learning device 100 are connected to the network 30. The sensor 23 may also be connected to the network 30.

The weather data server 31 is a server computer that provides weather forecast data indicating the weather forecast from the current day onward. The weather forecast data is provided by a public agency or a private weather company. The weather data server 31 transmits the weather forecast data to the machine learning device 100 in response to a request from the machine learning device 100. The weather forecast data includes the forecast outdoor temperature and the forecast outdoor solar radiation from the current day onward. The forecast temperature and forecast solar radiation are preferably hourly values. The values may be daily forecasts, such as the temperature and solar radiation at 6:00 a.m. on the next day; weekly forecasts, such as the average temperature and average solar radiation at 6:00 a.m. over the next week; or monthly forecasts, such as the average temperature and average solar radiation at 6:00 a.m. over the next month.

The machine learning device 100 is a computer that generates a prediction model by machine learning and uses the prediction model to predict the harvest date and harvest count of paprika. The machine learning device 100 collects sample data indicating the fruit-set date and harvest date of each sample fruit in a past year (for example, the previous year). The machine learning device 100 also collects weather data indicating the temperature and solar radiation measured by the sensor 23. Using the sample data and the weather data, the machine learning device 100 generates a prediction model that predicts the number of days paprika requires from fruit set to harvest, based on the temperature and solar radiation during that period.

After fruit set of the current year's paprika has been observed in the sample cultivation area 21 and before the harvest season arrives, the machine learning device 100 uses the prediction model to predict the required number of days. At this time, the machine learning device 100 receives weather forecast data from the weather data server 31. For the temperature and solar radiation supplied as inputs to the prediction model, the values measured by the sensor 23 are used for the period before the prediction date, and the values from the weather forecast data are used for the period from the prediction date onward. The machine learning device 100 predicts the harvest date and harvest count of paprika from the fruit-set dates and fruit-set counts of the sample fruits, the required number of days output by the prediction model, and the proportion of sample fruits (0.1%).

The machine learning device 100 may be a client computer or a server computer. It may be a computer owned by the farmer or a computer owned by an information processing business, for example in a data center. When the farmer owns the machine learning device 100, the device may, for example, receive weather data from the sensor 23 without going through the wide area data communication network and accept sample data entered by a user at the farm. When the farmer does not own the machine learning device 100, the device may, for example, receive the weather data and sample data over the wide area data communication network from a terminal device owned by the farmer. In the second embodiment, the machine learning device 100 both generates the prediction model and performs harvest prediction with it, but the two tasks may be performed by different computers. For example, a server computer may generate the prediction model, and a client computer may use it to perform harvest prediction.

FIG. 3 is a diagram illustrating an example of the hardware of the machine learning device.
The machine learning device 100 has a CPU 101, a RAM 102, an HDD 103, an image interface 104, an input interface 105, a medium reader 106, and a communication interface 107. These units are connected to a bus. The CPU 101 corresponds to the processing unit 12 of the first embodiment. The RAM 102 or the HDD 103 corresponds to the storage unit 11 of the first embodiment. The weather data server 31 and other devices have similar hardware.

The CPU 101 is a processor that executes program instructions. The CPU 101 loads at least part of the programs and data stored in the HDD 103 into the RAM 102 and executes the programs. The CPU 101 may include a plurality of processor cores, and the machine learning device 100 may include a plurality of processors. A set of processors is sometimes called a "multiprocessor" or simply a "processor."

The RAM 102 is a volatile semiconductor memory that temporarily stores the programs executed by the CPU 101 and the data the CPU 101 uses in computation. The machine learning device 100 may include a type of memory other than RAM and may include a plurality of memories.

The HDD 103 is non-volatile storage that stores data and software programs such as an OS (Operating System), middleware, and application software. The machine learning device 100 may include other types of storage, such as flash memory or an SSD (Solid State Drive), and may include a plurality of storage devices.

The image interface 104 outputs images to a display device 111 connected to the machine learning device 100 in accordance with instructions from the CPU 101. Any type of display device may be used as the display device 111, such as a CRT (Cathode Ray Tube) display, a liquid crystal display (LCD), an organic EL (OEL: Organic Electro-Luminescence) display, or a projector. An output device other than the display device 111, such as a printer, may also be connected to the machine learning device 100.

The input interface 105 receives input signals from an input device 112 connected to the machine learning device 100. Any type of input device may be used as the input device 112, such as a mouse, a touch panel, a touch pad, or a keyboard. A plurality of types of input devices may be connected to the machine learning device 100.

The medium reader 106 is a reading device that reads programs and data recorded on a recording medium 113. Any type of recording medium may be used as the recording medium 113, such as a magnetic disk (for example, a flexible disk (FD) or an HDD), an optical disc (for example, a CD (Compact Disc) or a DVD (Digital Versatile Disc)), or a semiconductor memory. The medium reader 106, for example, copies programs and data read from the recording medium 113 to another recording medium such as the RAM 102 or the HDD 103. The programs that are read are executed by, for example, the CPU 101. The recording medium 113 may be a portable recording medium and may be used to distribute programs and data. The recording medium 113 and the HDD 103 are sometimes referred to as computer-readable recording media.

The communication interface 107 is connected to the network 30 and communicates with other information processing devices such as the weather data server 31. The communication interface 107 may be a wired communication interface connected to a wired communication device such as a switch or router, or a wireless communication interface connected to a wireless communication device such as a base station or access point.

Next, the method of harvest prediction using the prediction model is described. In the second embodiment, fruit set is observed and harvesting is managed on a weekly basis. The fruit-set dates and harvest dates in the sample data are therefore dates that fall on a specific day of the week. Likewise, the required number of days output by the prediction model is expressed as a number of weeks, and the predicted harvest date falls on that specific day of the week.

FIG. 4 is a diagram illustrating an example of the data flow of harvest prediction.
When fruit set of one or more sample fruits is observed on a given day, a fruit-set date 211 and a sample fruit-set count 212 are collected as sample data. The fruit-set date 211 is the day on which the farmer observed fruit set, and the sample fruit-set count 212 is the number of sample fruits that set on that day. For example, the fruit-set date 211 is October 23 and the sample fruit-set count 212 is 5.

Then, for the period from the fruit-set date 211 to the harvest season, an average temperature 213 and an average solar radiation 214 inside the greenhouse 20 are calculated as explanatory variables. The average temperature 213 and the average solar radiation 214 each consist of one value per hour of the day. The explanatory variables therefore form a 48-dimensional vector (24 hourly values for each of the two quantities). Although the second embodiment uses average temperature and average solar radiation as explanatory variables, other indicators such as cumulative temperature or cumulative solar radiation may be used instead.
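As an illustration of how such a 48-dimensional vector could be assembled, the following minimal sketch averages readings by hour of day. The function name `hourly_mean_features`, the tuple layout of `readings`, and the sample values are all assumptions made for this example; the source does not specify a data format.

```python
from collections import defaultdict


def hourly_mean_features(readings):
    """Build a 48-element vector: 24 hourly mean temperatures followed by
    24 hourly mean solar-radiation values.

    `readings` is an iterable of (hour_of_day, temperature, solar_radiation)
    tuples covering the period of interest (hypothetical layout).
    """
    temps = defaultdict(list)
    solars = defaultdict(list)
    for hour, temp, solar in readings:
        temps[hour].append(temp)
        solars[hour].append(solar)

    def mean(values):
        return sum(values) / len(values)

    # Every hour 0..23 must be present; a real pipeline would have to handle gaps.
    return [mean(temps[h]) for h in range(24)] + [mean(solars[h]) for h in range(24)]


# Two days of made-up readings: 20.0 then 22.0 degrees at every hour,
# with a constant solar-radiation value of 100.0.
readings = [(h, 20.0, 100.0) for h in range(24)] + [(h, 22.0, 100.0) for h in range(24)]
features = hourly_mean_features(readings)
```

With this input, the first 24 entries of `features` are the hourly mean temperatures and the last 24 are the hourly mean solar-radiation values.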

The average temperature 213 and the average solar radiation 214 are calculated as follows. For the period from the fruit-set date to the day before the prediction date, the measured temperature 221 and the measured solar radiation 222 inside the greenhouse 20, as measured by the sensor 23, are used. For the period from the prediction date to the harvest season, the forecast temperature 223 and the forecast solar radiation 224 of the weather forecast data are used.
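The split between measured and forecast-derived values can be sketched as follows. This is only an illustration: the function `merge_daily_series`, the use of one value per day, the year 2018, and the temperature values are assumptions, not details given in the source.

```python
from datetime import date, timedelta


def merge_daily_series(fruit_set, prediction_date, end, measured, predicted):
    """Return {day: value} from fruit_set through end (inclusive).

    Days before prediction_date take the sensor measurement; days on or
    after prediction_date take the forecast-derived value.
    """
    merged = {}
    day = fruit_set
    while day <= end:
        source = measured if day < prediction_date else predicted
        merged[day] = source[day]
        day += timedelta(days=1)
    return merged


# Made-up daily temperatures (degrees Celsius) for a four-day window.
measured = {date(2018, 10, 23): 21.0, date(2018, 10, 24): 23.0}
predicted = {date(2018, 10, 25): 20.0, date(2018, 10, 26): 24.0}
series = merge_daily_series(
    date(2018, 10, 23), date(2018, 10, 25), date(2018, 10, 26), measured, predicted
)
average_temperature = sum(series.values()) / len(series)
```

The same merge would be applied to solar radiation before averaging.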

However, the forecast temperature 223 and the forecast solar radiation 224 are outdoor values. Environmental parameters 227 are therefore used to convert the forecast temperature 223 into a predicted temperature 225 inside the greenhouse 20 and the forecast solar radiation 224 into a predicted solar radiation 226 inside the greenhouse 20. The environmental parameters 227 express the relationship between outdoor and indoor temperature and the relationship between outdoor and indoor solar radiation. For example, the environmental parameters 227 include a linear expression that converts outdoor temperature into indoor temperature and a linear expression that converts outdoor solar radiation into indoor solar radiation. The environmental parameters 227 are prepared in advance. They may be individual parameters tuned separately for each greenhouse or general-purpose parameters applied in common to various greenhouses.
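A linear conversion of this kind might look like the sketch below. The coefficient values are invented purely for illustration; the source states only that the environmental parameters 227 include linear expressions and does not give their coefficients.

```python
def to_indoor(outdoor_value, slope, intercept):
    """Convert an outdoor value to an indoor estimate with a linear expression."""
    return slope * outdoor_value + intercept


# Hypothetical environmental parameters (not taken from the source).
TEMP_SLOPE, TEMP_INTERCEPT = 0.5, 12.0
SOLAR_SLOPE, SOLAR_INTERCEPT = 0.75, 0.0

# Forecast outdoor temperature 10.0 and solar radiation 200.0 (made-up values).
predicted_indoor_temp = to_indoor(10.0, TEMP_SLOPE, TEMP_INTERCEPT)
predicted_indoor_solar = to_indoor(200.0, SOLAR_SLOPE, SOLAR_INTERCEPT)
```

Per-greenhouse parameters would simply use a different slope and intercept for each greenhouse.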

The measured temperature 221 and the predicted temperature 225 are averaged to obtain the average temperature 213, and the measured solar radiation 222 and the predicted solar radiation 226 are averaged to obtain the average solar radiation 214. The average temperature 213 and the average solar radiation 214 are then input to a prediction model 210 generated in advance, and the prediction model 210 outputs a required number of days 215. The required number of days 215 is a prediction of the number of days from fruit set to harvest. A harvest date 216 is calculated by adding the required number of days 215 to the fruit-set date 211. The harvest date 216 is a prediction of a suitable harvest date for the fruits that set on the fruit-set date 211. For example, the required number of days 215 is 8 weeks and the harvest date 216 is December 18.
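The date arithmetic in the example above can be checked with a short sketch. The year 2018 is an assumption (the source gives only month and day), and `predicted_harvest_date` is a hypothetical helper name.

```python
from datetime import date, timedelta


def predicted_harvest_date(fruit_set_date, required_weeks):
    """Add the required period, expressed in weeks, to the fruit-set date."""
    return fruit_set_date + timedelta(weeks=required_weeks)


# Fruit set on October 23 with a required period of 8 weeks.
harvest_date = predicted_harvest_date(date(2018, 10, 23), 8)
```

Here `harvest_date` is December 18, matching the example in the text.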

A sample harvest count 217 is also calculated from the sample fruit-set count 212. The sample harvest count 217 is a prediction of how many of the sample fruits observed on the fruit-set date 211 will be harvested on the harvest date 216. Here, the prediction model 210 is assumed to output the expected value of the required number of days as the required number of days 215, so the sample harvest count 217 equals the sample fruit-set count 212. For example, the sample harvest count 217 is 5. As described later, however, it is also possible to generate a prediction model that outputs a probability distribution of the required number of days. In that case, the sample harvest count 217 indicates the number of sample fruits for each possible required number of days, which can be calculated by multiplying the sample fruit-set count 212 by the probability of each required number of days.

The sample harvest count 217 is then converted into a harvest count 218. The harvest count 218 is a prediction of the number of fruits that are presumed to have set on the fruit-set date 211 and that will be harvested on the harvest date 216. The fruits presumed to have set on the fruit-set date 211 include both the observed sample fruits and other fruits. The harvest count 218 is calculated from the sample harvest count 217 and a sample ratio 219. The sample ratio 219 is the proportion of sample fruits among all fruits. The harvest count 218 is obtained by dividing the sample harvest count 217 by the sample ratio 219, that is, by multiplying the sample harvest count 217 by the reciprocal of the sample ratio 219. For example, the sample ratio 219 is 0.1%, and the harvest count 218 is 5 ÷ 0.1% = 5 × 1,000 = 5,000.
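As a quick check of this scaling step, the following sketch divides a sample harvest count by the sample ratio. The function name `total_harvest` is a hypothetical choice for this example.

```python
def total_harvest(sample_harvest, sample_ratio):
    """Scale a sample harvest count up to the whole crop."""
    return sample_harvest / sample_ratio


# 5 sample fruits at a sample ratio of 0.1% (0.001).
# round() guards against floating-point representation error in 0.001.
harvest_count = round(total_harvest(5, 0.001))
```

This gives a harvest count of 5,000 for 5 sample fruits.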

In this way, for the fruits that set on the fruit-set date 211, it is predicted that the harvest count 218 of fruits will be harvested on the harvest date 216. For example, for fruits that set on October 23, it is predicted that 5,000 fruits will be harvested on December 18. By summing the predictions for different fruit-set dates, the overall harvest dates and harvest counts can be predicted.

However, because growth varies greatly between individual paprika fruits, harvest dates in practice are spread out even among fruits with the same fruit-set date. Consequently, if the prediction model 210 that outputs the expected value of the required number of days is used, the predicted overall harvest dates and harvest counts may deviate from reality.

FIG. 5 is a diagram illustrating an example of using a prediction model that outputs an expected value.
Suppose that sample fruit-set counts 231, 232, and 233 are measured on different fruit-set dates. The sample fruit-set count 231 represents 5 sample fruits whose fruit set was observed on October 23. The sample fruit-set count 232 represents 3 sample fruits whose fruit set was observed on October 30. The sample fruit-set count 233 represents 4 sample fruits whose fruit set was observed on November 6.

The required number of days is predicted for each of the sample fruit-set counts 231, 232, and 233. Here, a prediction model that outputs the expected value of the required number of days is used. For the sample fruit-set count 231, the expected required number of days is calculated from the average temperature and average solar radiation from October 23 onward. For the sample fruit-set count 232, it is calculated from the averages from October 30 onward. For the sample fruit-set count 233, it is calculated from the averages from November 6 onward. Because different average temperatures and average solar radiation values are used for different fruit-set dates, different expected values may be obtained. Here, the required number of days is 8 weeks for the sample fruit-set count 231, 7 weeks for the sample fruit-set count 232, and 6 weeks for the sample fruit-set count 233.

Sample harvest counts 234, 235, and 236 are then predicted. The sample harvest count 234 represents 5 sample fruits predicted to be harvested on December 18, eight weeks after October 23. The sample harvest count 235 represents 3 sample fruits predicted to be harvested on December 18, seven weeks after October 30. The sample harvest count 236 represents 4 sample fruits predicted to be harvested on December 18, six weeks after November 6. Summing the sample harvest counts 234, 235, and 236 gives a prediction that 0 sample fruits will be harvested on December 11, 12 sample fruits on December 18, and 0 sample fruits on December 25.

Converting these sample harvest counts into overall harvest counts with a sample ratio of 0.1% gives predicted harvest counts 237, 238, and 239. The harvest count 237 represents 0 fruits predicted to be harvested on December 11. The harvest count 238 represents 12,000 fruits predicted to be harvested on December 18. The harvest count 239 represents 0 fruits predicted to be harvested on December 25. Thus, using a prediction model that outputs the expected value of the required number of days can lead to a prediction in which the harvest is concentrated on a single harvest date. In practice, however, harvest dates are spread out because of individual differences, so the harvest counts 237, 238, and 239 have low reliability. For this reason, a prediction model that outputs a probability distribution of the required number of days, rather than its expected value, is used.
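The concentration effect in the FIG. 5 example can be reproduced with the sketch below. The year 2018, the tuple layout of `observations`, and the variable names are assumptions made for illustration.

```python
from collections import Counter
from datetime import date, timedelta

SCALE = 1000  # reciprocal of the 0.1% sample ratio

# (fruit-set date, sample fruit-set count, expected required weeks)
observations = [
    (date(2018, 10, 23), 5, 8),
    (date(2018, 10, 30), 3, 7),
    (date(2018, 11, 6), 4, 6),
]

# With a point estimate, every fruit from one fruit-set date lands on one harvest date.
sample_by_date = Counter()
for fruit_set, count, weeks in observations:
    sample_by_date[fruit_set + timedelta(weeks=weeks)] += count

harvest_by_date = {day: count * SCALE for day, count in sample_by_date.items()}
```

All three groups fall on December 18, so `harvest_by_date` contains a single entry of 12,000 fruits and nothing for December 11 or December 25.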

FIG. 6 is a diagram illustrating an example of using a prediction model that outputs a probability distribution.
A probability distribution of the required number of days is predicted for each of the sample fruit-set counts 231, 232, and 233. For the sample fruit-set count 231, the distribution is calculated from the average temperature and average solar radiation from October 23 onward. For the sample fruit-set count 232, it is calculated from the averages from October 30 onward. For the sample fruit-set count 233, it is calculated from the averages from November 6 onward.

Because different average temperatures and average solar radiation values are used for different fruit-set dates, different probability distributions of the required number of days may be obtained. Here, the distribution for the sample fruit-set count 231 is 30% for 7 weeks, 50% for 8 weeks, and 20% for 9 weeks. The distribution for the sample fruit-set count 232 is 30% for 6 weeks, 50% for 7 weeks, and 20% for 8 weeks. The distribution for the sample fruit-set count 233 is 30% for 5 weeks, 50% for 6 weeks, and 20% for 7 weeks.

For the sample fruit-set count 231 of October 23, a distribution of sample harvest counts 241, 242, and 243 is then predicted. The sample harvest count 241 represents 5 × 30% = 1.5 sample fruits predicted to be harvested on December 11. The sample harvest count 242 represents 5 × 50% = 2.5 sample fruits predicted to be harvested on December 18. The sample harvest count 243 represents 5 × 20% = 1.0 sample fruits predicted to be harvested on December 25.

Similarly, for the sample fruit-set count 232 of October 30, a distribution of sample harvest counts 244, 245, and 246 is predicted. The sample harvest count 244 represents 3 × 30% = 0.9 sample fruits predicted to be harvested on December 11. The sample harvest count 245 represents 3 × 50% = 1.5 sample fruits predicted to be harvested on December 18. The sample harvest count 246 represents 3 × 20% = 0.6 sample fruits predicted to be harvested on December 25.

For the sample fruit-set count 233 of November 6, a distribution of sample harvest counts 247, 248, and 249 is predicted. The sample harvest count 247 represents 4 × 30% = 1.2 sample fruits predicted to be harvested on December 11. The sample harvest count 248 represents 4 × 50% = 2.0 sample fruits predicted to be harvested on December 18. The sample harvest count 249 represents 4 × 20% = 0.8 sample fruits predicted to be harvested on December 25.

Summing the sample harvest counts 241, 244, and 247 gives a prediction that 3.6 sample fruits will be harvested on December 11. Summing the sample harvest counts 242, 245, and 248 gives a prediction that 6.0 sample fruits will be harvested on December 18. Summing the sample harvest counts 243, 246, and 249 gives a prediction that 2.4 sample fruits will be harvested on December 25. Converting these sample harvest counts into overall harvest counts with a sample ratio of 0.1% gives predicted harvest counts 251, 252, and 253. The harvest count 251 represents 3,600 fruits predicted to be harvested on December 11. The harvest count 252 represents 6,000 fruits predicted to be harvested on December 18. The harvest count 253 represents 2,400 fruits predicted to be harvested on December 25. Using a prediction model that outputs a probability distribution thus makes it possible to express the spread of harvest dates, which increases the reliability of the harvest counts 251, 252, and 253.
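The FIG. 6 arithmetic can be verified with the following sketch, which uses exact fractions to avoid floating-point rounding. The year 2018, the data layout, and the function name `expected_sample_harvest` are assumptions for illustration; the probabilities and counts are those given in the text.

```python
from collections import defaultdict
from datetime import date, timedelta
from fractions import Fraction

SAMPLE_RATIO = Fraction(1, 1000)  # 0.1%

# (fruit-set date, sample fruit-set count, {required weeks: probability})
observations = [
    (date(2018, 10, 23), 5, {7: Fraction(3, 10), 8: Fraction(5, 10), 9: Fraction(2, 10)}),
    (date(2018, 10, 30), 3, {6: Fraction(3, 10), 7: Fraction(5, 10), 8: Fraction(2, 10)}),
    (date(2018, 11, 6), 4, {5: Fraction(3, 10), 6: Fraction(5, 10), 7: Fraction(2, 10)}),
]


def expected_sample_harvest(observations):
    """Spread each fruit-set count over harvest dates according to its distribution."""
    by_date = defaultdict(Fraction)
    for fruit_set, count, distribution in observations:
        for weeks, probability in distribution.items():
            by_date[fruit_set + timedelta(weeks=weeks)] += count * probability
    return dict(by_date)


sample_harvest = expected_sample_harvest(observations)
harvest = {day: count / SAMPLE_RATIO for day, count in sample_harvest.items()}
```

The resulting `harvest` values are 3,600 for December 11, 6,000 for December 18, and 2,400 for December 25, in agreement with the harvest counts 251, 252, and 253.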

The question then is how to train a prediction model that outputs a probability distribution of the required number of days. Typical machine learning repeatedly evaluates the error of the prediction model's output using training data and updates the coefficients of the prediction model so that the error decreases. Various machine learning models can be used as the prediction model, such as GP models, multiple regression models, and neural networks. While the number of iterations is small, the error of the model's output is large and the fit to the training data is poor. As the number of iterations increases, the error decreases and the fit to the training data improves. This process is often repeated until the error on the training data becomes sufficiently small.
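The generic iterate-and-update loop described here can be sketched with plain gradient descent on a one-variable linear model. This is not the training procedure of the embodiment; the model, learning rate, and data are assumptions chosen only to show the error shrinking as iterations accumulate.

```python
def train_linear(xs, ys, iterations, learning_rate=0.05):
    """Fit y = w * x + b by gradient descent on the mean squared error.

    Returns the coefficients and the error recorded before each update.
    """
    w, b = 0.0, 0.0
    history = []
    n = len(xs)
    for _ in range(iterations):
        errors = [(w * x + b) - y for x, y in zip(xs, ys)]
        history.append(sum(e * e for e in errors) / n)
        # Gradients of the mean squared error with respect to w and b.
        grad_w = 2 * sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = 2 * sum(errors) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b, history


# Made-up training data generated from y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
w, b, history = train_linear(xs, ys, iterations=200)
```

In this toy setting, `history` starts large and decreases toward zero, mirroring the statement that fitting accuracy on the training data improves with the number of iterations.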

In crop harvest prediction, however, observing and tracking individual sample fruits places a heavy burden on the farmer, so the sample fruits are limited to a small number and only a small amount of training data is available for machine learning. Moreover, because growth varies greatly between individual paprika fruits, the required number of days of these few sample fruits does not accurately represent the spread of the required number of days across all harvested fruits. If the number of iterations is increased until the error on the training data becomes sufficiently small, overfitting, in which the prediction model fits the training data excessively, is therefore likely to occur. A prediction model overfitted to a small amount of training data outputs a probability distribution whose variance is excessively small. As a result, the reliability of the probability distribution output by the prediction model decreases.

図７は、学習不足の予測モデルの使用例を示す図である。
イテレーション回数が少ない初期段階の予測モデルを考える。学習不足の予測モデルが出力する確率分布は、所要日数を十分に絞り込めておらず分散が大きい。 FIG. 7 is a diagram showing an example of using an undertrained prediction model.
Consider a prediction model at an early stage, after only a small number of iterations. The probability distribution output by this undertrained prediction model does not narrow down the required number of days sufficiently, so its variance is large.

１０月２３日の着果に対して、予測モデルは７週間が３３％、８週間が３３％、９週間が３３％という確率分布を出力する。すると、標本着果数２３１が５個であるため、１２月１１日に１．７個、１２月１８日に１．７個、１２月２５日に１．７個という標本収穫数が予測される。同様に、１０月３０日の着果に対して、予測モデルは６週間が３３％、７週間が３３％、８週間が３３％という確率分布を出力する。すると、標本着果数２３２が３個であるため、１２月１１日に１．０個、１２月１８日に１．０個、１２月２５日に１．０個という標本収穫数が予測される。１１月６日の着果に対して、予測モデルは５週間が３３％、６週間が３３％、７週間が３３％という確率分布を出力する。すると、標本着果数２３３が４個であるため、１２月１１日に１．３個、１２月１８日に１．３個、１２月２５日に１．３個という標本収穫数が予測される。 For fruit set on October 23rd, the prediction model outputs a probability distribution of 33% for 7 weeks, 33% for 8 weeks, and 33% for 9 weeks. Since the sample fruit-bearing number 231 is 5, the predicted sample harvest numbers are 1.7 on December 11th, 1.7 on December 18th, and 1.7 on December 25th. Similarly, for fruit set on October 30th, the prediction model outputs a probability distribution of 33% for 6 weeks, 33% for 7 weeks, and 33% for 8 weeks. Since the sample fruit-bearing number 232 is 3, the predicted sample harvest numbers are 1.0 on December 11th, 1.0 on December 18th, and 1.0 on December 25th. For fruit set on November 6th, the prediction model outputs a probability distribution of 33% for 5 weeks, 33% for 6 weeks, and 33% for 7 weeks. Since the sample fruit-bearing number 233 is 4, the predicted sample harvest numbers are 1.3 on December 11th, 1.3 on December 18th, and 1.3 on December 25th.

上記の標本収穫数を収穫日毎に合計すると、１２月１１日は４．０個、１２月１８日は４．０個、１２月２５日は４．０個と算出される。標本割合＝０．１％を用いると、全体の収穫数２５４，２５５，２５６が予測される。収穫数２５４は、１２月１１日の収穫数として４，０００個を示す。収穫数２５５は、１２月１８日の収穫数として４，０００個を示す。収穫数２５６は、１２月２５日の収穫数として４，０００個を示す。 Summing the sample harvest numbers above by harvest date yields 4.0 on December 11th, 4.0 on December 18th, and 4.0 on December 25th. Using the sample ratio of 0.1%, the overall harvest numbers 254, 255, and 256 are predicted. Harvest number 254 indicates 4,000 as the harvest number on December 11th. Harvest number 255 indicates 4,000 as the harvest number on December 18th. Harvest number 256 indicates 4,000 as the harvest number on December 25th.

このように、学習不足の予測モデルを使用すると、確率分布の分散が過度に大きくなり所要日数が適切に絞り込まれない。その結果、予測される収穫数２５４，２５５，２５６が過度にばらつくことになり信頼度が低下してしまう。 In this way, if an under-trained prediction model is used, the variance of the probability distribution becomes excessively large, and the required number of days is not properly narrowed down. As a result, the predicted harvest numbers 254, 255, and 256 fluctuate excessively and reliability decreases.
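The arithmetic of this example can be reproduced with a minimal Python sketch. The function name `predict_total_harvest`, the record layout, and the calendar year 2018 are assumptions made only for illustration; the description does not state a year or an implementation.

```python
from collections import defaultdict
from datetime import date, timedelta


def predict_total_harvest(records, sample_ratio):
    """Aggregate per-record predictions into overall harvest numbers.

    records: iterable of (fruit_set_date, sample_count, {weeks: probability}).
    sample_ratio: fraction of all fruits that are sample fruits (0.1% -> 0.001).
    """
    per_date = defaultdict(float)
    for fruit_set, sample_count, distribution in records:
        for weeks, probability in distribution.items():
            harvest_date = fruit_set + timedelta(weeks=weeks)
            # Expected number of sample fruits harvested on this date.
            per_date[harvest_date] += sample_count * probability
    # Scale the sample totals by the reciprocal of the sample ratio.
    return {d: n / sample_ratio for d, n in sorted(per_date.items())}


third = 1 / 3  # the "33%" entries of the undertrained model
records = [
    (date(2018, 10, 23), 5, {7: third, 8: third, 9: third}),
    (date(2018, 10, 30), 3, {6: third, 7: third, 8: third}),
    (date(2018, 11, 6), 4, {5: third, 6: third, 7: third}),
]
totals = predict_total_harvest(records, sample_ratio=0.001)
for harvest_date, number in totals.items():
    print(harvest_date, round(number))
```

With the three records above, every harvest date receives one third of the 12 sample fruits, which is why the undertrained model spreads the harvest evenly at about 4,000 fruits per date.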

図８は、過学習した予測モデルの使用例を示す図である。
イテレーション回数が多く過学習された予測モデルを考える。過学習された予測モデルが出力する確率分布は、訓練データが示す所要日数に適合し過ぎており分散が小さい。 FIG. 8 is a diagram showing an example of using an over-trained prediction model.
Consider an overtrained prediction model after a large number of iterations. The probability distribution output by the overtrained prediction model fits the required numbers of days in the training data too closely, so its variance is small.

１０月２３日の着果に対して、予測モデルは７週間が０％、８週間が１００％、９週間が０％という確率分布を出力する。すると、標本着果数２３１が５個であるため、１２月１１日に０個、１２月１８日に５個、１２月２５日に０個という標本収穫数が予測される。１０月３０日の着果に対して、予測モデルは６週間が０％、７週間が１００％、８週間が０％という確率分布を出力する。すると、標本着果数２３２が３個であるため、１２月１１日に０個、１２月１８日に３個、１２月２５日に０個という標本収穫数が予測される。１１月６日の着果に対して、予測モデルは５週間が０％、６週間が１００％、７週間が０％という確率分布を出力する。すると、標本着果数２３３が４個であるため、１２月１１日に０個、１２月１８日に４個、１２月２５日に０個という標本収穫数が予測される。 For fruit set on October 23rd, the prediction model outputs a probability distribution of 0% for 7 weeks, 100% for 8 weeks, and 0% for 9 weeks. Since the sample fruit-bearing number 231 is 5, the predicted sample harvest numbers are 0 on December 11th, 5 on December 18th, and 0 on December 25th. For fruit set on October 30th, the prediction model outputs a probability distribution of 0% for 6 weeks, 100% for 7 weeks, and 0% for 8 weeks. Since the sample fruit-bearing number 232 is 3, the predicted sample harvest numbers are 0 on December 11th, 3 on December 18th, and 0 on December 25th. For fruit set on November 6th, the prediction model outputs a probability distribution of 0% for 5 weeks, 100% for 6 weeks, and 0% for 7 weeks. Since the sample fruit-bearing number 233 is 4, the predicted sample harvest numbers are 0 on December 11th, 4 on December 18th, and 0 on December 25th.

上記の標本収穫数を収穫日毎に合計すると、１２月１１日は０個、１２月１８日は１２個、１２月２５日は０個と算出される。標本割合＝０．１％を用いると、全体の収穫数２５７，２５８，２５９が予測される。収穫数２５７は、１２月１１日の収穫数として０個を示す。収穫数２５８は、１２月１８日の収穫数として１２，０００個を示す。収穫数２５９は、１２月２５日の収穫数として０個を示す。 Summing the sample harvest numbers above by harvest date yields 0 on December 11th, 12 on December 18th, and 0 on December 25th. Using the sample ratio of 0.1%, the overall harvest numbers 257, 258, and 259 are predicted. Harvest number 257 indicates 0 as the harvest number on December 11th. Harvest number 258 indicates 12,000 as the harvest number on December 18th. Harvest number 259 indicates 0 as the harvest number on December 25th.

上記の例では、収穫数２５７，２５８，２５９は、図５に示した収穫数２３７，２３８，２３９と同一になっている。すなわち、確率分布を出力する予測モデルを使用しても、過学習により分散が過度に小さくなってしまうと、結果的に期待値を出力する予測モデルに近い予測結果が得られることになり予測結果の信頼度が向上しない。 In the above example, the harvest numbers 257, 258, and 259 are identical to the harvest numbers 237, 238, and 239 shown in FIG. 5. In other words, even when a prediction model that outputs a probability distribution is used, if overfitting makes the variance excessively small, the prediction results end up close to those of a prediction model that outputs an expected value, and the reliability of the prediction results does not improve.

予測モデルが出力する確率分布の分散は、機械学習のイテレーション回数の増加に応じて小さくなる。そのため、機械学習のイテレーションを適切な回数で停止することで、確率分布の分散を適切な大きさに誘導することができる。そこで、機械学習のイテレーションを何れのタイミングで停止すればよいかが問題となる。 The variance of the probability distribution output by the prediction model decreases as the number of machine learning iterations increases. Therefore, by stopping machine learning iterations at an appropriate number of times, it is possible to guide the variance of the probability distribution to an appropriate size. Therefore, the question arises as to when to stop machine learning iterations.

ここで、過年度のパプリカの栽培について、着果から収穫までの所要日数の実績を示す標本データは、少数の標本果実についてのみ収集される一方、収穫日毎の全体の収穫数の実績を示す総数データは、出荷管理のために農業機械などを用いて収集されている。そこで、機械学習装置１００は、予測モデルの係数を更新するイテレーション毎に、そのときの予測モデルと訓練データと標本割合から過年度の全体の収穫数を予測し、予測と総数データが示す実績とを比較して、イテレーションを停止するタイミングを判定する。全体の収穫数は、訓練データに対して図６と同様の方法を適用することで予測できる。 Here, for paprika cultivation in past years, sample data indicating the actual number of days required from fruit set to harvest is collected only for a small number of sample fruits, whereas total data indicating the actual total number of harvests for each harvest date is collected using agricultural machinery or the like for shipping management. Therefore, at each iteration that updates the coefficients of the prediction model, the machine learning device 100 predicts the total number of harvests in the past year from the prediction model at that point, the training data, and the sample ratio, compares the prediction with the actual results indicated by the total data, and determines when to stop the iterations. The total number of harvests can be predicted by applying a method similar to that of FIG. 6 to the training data.

予測モデルが出力する確率分布の分散が過度に大きい場合、全体の収穫数の予測は実績と類似しない可能性が高い。また、予測モデルが出力する確率分布の分散が過度に小さい場合も、全体の収穫数の予測は実績と類似しない可能性が高い。一方、予測モデルが出力する確率分布の分散が実際の収穫日のばらつきを反映して最適である場合、全体の収穫数の予測と実績との間の類似度が最大になる可能性が高い。そのため、機械学習装置１００は、類似度が最大になったときの予測モデルを学習結果として採用する。 If the variance of the probability distribution output by the prediction model is excessively large, the prediction of the total number of harvests is likely to be dissimilar to the actual results. Also, when the variance of the probability distribution output by the prediction model is excessively small, there is a high possibility that the prediction of the total number of harvests will not resemble the actual results. On the other hand, when the variance of the probability distribution output by the prediction model is optimal reflecting the variation in the actual harvest dates, it is likely that the degree of similarity between the prediction of the total number of harvests and the actual number of harvests will be maximized. Therefore, the machine learning device 100 adopts the prediction model with the maximum similarity as the learning result.

図９は、機械学習の停止タイミング例を示す図である。
イテレーション回数の増加に応じて、特定の平均気温および平均日射量に対して予測モデルが出力する確率分布は、確率分布２６１，２６２，２６３のように変化する。 FIG. 9 is a diagram illustrating an example of stop timing of machine learning.
As the number of iterations increases, the probability distribution output by the prediction model for a specific average temperature and average solar radiation amount changes in the order of probability distributions 261, 262, and 263.

確率分布２６１は、学習不足の予測モデルから出力されるものであり、図７の予測モデルに対応する。すなわち、確率分布２６１の分散は過度に大きい。確率分布２６２は、最適な予測モデルから出力されたものであり、図６の予測モデルに対応する。すなわち、確率分布２６２の分散はパプリカの収穫日のばらつきを反映して最適である。確率分布２６３は、過学習された予測モデルから出力されたものであり、図８の予測モデルに対応する。すなわち、確率分布２６３の分散は過度に小さい。 Probability distribution 261 is output from the undertrained prediction model and corresponds to the prediction model of FIG. 7. That is, the variance of probability distribution 261 is excessively large. Probability distribution 262 is output from the optimal prediction model and corresponds to the prediction model of FIG. 6. That is, the variance of probability distribution 262 is optimal, reflecting the variation in paprika harvest dates. Probability distribution 263 is output from the overtrained prediction model and corresponds to the prediction model of FIG. 8. That is, the variance of probability distribution 263 is excessively small.

機械学習装置１００は、予測モデルが確率分布２６１を出力するとき、訓練データから図７と同様の方法で収穫数分布２６４を予測する。収穫数分布２６４は、収穫日毎の全体の収穫数の予測を示す。収穫数分布２６４は、収穫数２５４，２５５，２５６に相当する。すなわち、機械学習装置１００は、訓練データのレコード毎に、平均気温および平均日射量を予測モデルに入力して収穫数の確率分布を算出し、確率分布に標本着果数を乗じて収穫日毎の標本収穫数を算出する。機械学習装置１００は、訓練データのレコード毎の予測を合計し、標本割合の逆数を乗じて収穫日毎の収穫数を算出する。 When the prediction model outputs the probability distribution 261, the machine learning device 100 predicts the harvest number distribution 264 from the training data in the same manner as in FIG. 7. The harvest number distribution 264 indicates the predicted total number of harvests for each harvest date and corresponds to the harvest numbers 254, 255, and 256. That is, for each record of the training data, the machine learning device 100 inputs the average temperature and the average solar radiation amount into the prediction model to calculate the probability distribution, and multiplies the probability distribution by the sample fruit-bearing number to calculate the sample harvest number for each harvest date. The machine learning device 100 then sums the predictions over the records of the training data and multiplies the sum by the reciprocal of the sample ratio to calculate the number of harvests for each harvest date.

収穫数分布２６４が予測されると、機械学習装置１００は、収穫数分布２６４と収穫数分布２６７とを比較して誤差（総数誤差）を算出する。収穫数分布２６７は、訓練データと同じ年度の収穫状況であって、収穫日毎の全体の収穫数の実績を示す。収穫数分布２６７は、１２月１１日の収穫数が３，７００個、１２月１８日の収穫数が５，８００個、１２月２５日の収穫数が２，５００個であることを示す。総数誤差の指標として、例えば、残差平方和を用いる。残差平方和は、収穫日毎に予測と実績の間で収穫数の差の二乗を算出し、差の二乗を合計した数値である。収穫数分布２６４と収穫数分布２６７の残差平方和は、５，５８０，０００である。よって、総数誤差は大きい。 Once the harvest number distribution 264 is predicted, the machine learning device 100 compares it with the harvest number distribution 267 to calculate an error (total error). The harvest number distribution 267 represents the harvest in the same year as the training data and indicates the actual total number of harvests for each harvest date. The harvest number distribution 267 indicates that the number of harvests is 3,700 on December 11th, 5,800 on December 18th, and 2,500 on December 25th. The residual sum of squares, for example, is used as the index of the total error. The residual sum of squares is obtained by squaring the difference between the predicted and actual number of harvests for each harvest date and summing the squares. The residual sum of squares between the harvest number distribution 264 and the harvest number distribution 267 is 5,580,000. The total error is therefore large.

次に、機械学習装置１００は、予測モデルが確率分布２６２を出力するとき、訓練データから図６と同様の方法で収穫数分布２６５を予測する。収穫数分布２６５は、収穫数２５１，２５２，２５３に相当する。収穫数分布２６５が予測されると、機械学習装置１００は、収穫数分布２６５と収穫数分布２６７とを比較して総数誤差を算出する。収穫数分布２６５と収穫数分布２６７の残差平方和は、６０，０００である。よって、総数誤差は予測モデルが確率分布２６１を出力するときよりも小さい。 Next, when the prediction model outputs the probability distribution 262, the machine learning device 100 predicts the harvest number distribution 265 from the training data in the same manner as in FIG. 6. The harvest number distribution 265 corresponds to the harvest numbers 251, 252, and 253. Once the harvest number distribution 265 is predicted, the machine learning device 100 compares it with the harvest number distribution 267 to calculate the total error. The residual sum of squares between the harvest number distribution 265 and the harvest number distribution 267 is 60,000. The total error is therefore smaller than when the prediction model outputs the probability distribution 261.

次に、機械学習装置１００は、予測モデルが確率分布２６３を出力するとき、訓練データから図８と同様の方法で収穫数分布２６６を予測する。収穫数分布２６６は、収穫数２５７，２５８，２５９に相当する。収穫数分布２６６が予測されると、機械学習装置１００は、収穫数分布２６６と収穫数分布２６７とを比較して総数誤差を算出する。収穫数分布２６６と収穫数分布２６７の残差平方和は、５８，３８０，０００である。よって、総数誤差は予測モデルが確率分布２６２を出力するときよりも大きい。 Next, when the prediction model outputs the probability distribution 263, the machine learning device 100 predicts the harvest number distribution 266 from the training data in the same manner as in FIG. 8. The harvest number distribution 266 corresponds to the harvest numbers 257, 258, and 259. Once the harvest number distribution 266 is predicted, the machine learning device 100 compares it with the harvest number distribution 267 to calculate the total error. The residual sum of squares between the harvest number distribution 266 and the harvest number distribution 267 is 58,380,000. The total error is therefore larger than when the prediction model outputs the probability distribution 262.

このようにして、機械学習装置１００は、予測モデルが確率分布２６２を出力するときに総数誤差が最小になった、すなわち、類似度が最大になったことを検出する。すると、機械学習装置１００は、機械学習のイテレーションを停止し、確率分布２６２を出力する予測モデルを学習結果として出力する。 In this way, the machine learning device 100 detects that the total error is minimized when the prediction model outputs the probability distribution 262, that is, the similarity is maximized. Then, the machine learning device 100 stops the machine learning iterations and outputs a prediction model that outputs the probability distribution 262 as a learning result.
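The total errors quoted for the undertrained and overtrained models can be checked with a short Python sketch. The function name `residual_sum_of_squares` is illustrative; the per-date values of the harvest number distribution 265 are not restated in this passage, so only the two cases with known values are computed.

```python
def residual_sum_of_squares(predicted, actual):
    """Sum of squared differences between predicted and actual harvest numbers."""
    return sum((p - a) ** 2 for p, a in zip(predicted, actual))


# Harvest numbers for December 11th, 18th, and 25th.
actual_267 = [3700, 5800, 2500]       # actual totals
undertrained_264 = [4000, 4000, 4000]  # prediction of the undertrained model
overtrained_266 = [0, 12000, 0]        # prediction of the overtrained model

rss_under = residual_sum_of_squares(undertrained_264, actual_267)
rss_over = residual_sum_of_squares(overtrained_266, actual_267)
print(rss_under)  # 5580000
print(rss_over)   # 58380000
```

Both values match the figures stated in the description, and both are far larger than the 60,000 reported for the harvest number distribution 265.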

図１０は、機械学習のデータフローの例を示す図である。
予測モデル２７０の生成に使用する訓練データは、着果日の異なる複数のレコードを含む。訓練データの各レコードは、着果日２７１、標本着果数２７２、標本日数分布２７３、平均気温２７７および平均日射量２７８を含む。標本日数分布２７３は、所要日数毎の標本収穫数を示す。標本日数分布２７３は、個数で表現されていてもよいし、個数を標本着果数２７２で割った確率で表現されていてもよい。例えば、標本日数分布２７３は、７週間が４０％、８週間が６０％、９週間が０％であることを示す。 FIG. 10 is a diagram illustrating an example of a machine learning data flow.
The training data used to generate the prediction model 270 includes multiple records with different fruit-bearing dates. Each record of the training data includes a fruit-bearing date 271, a sample fruit-bearing number 272, a sample number of days distribution 273, an average temperature 277, and an average solar radiation amount 278. The sample number of days distribution 273 indicates the sample harvest number for each required number of days. It may be expressed as counts, or as probabilities obtained by dividing the counts by the sample fruit-bearing number 272. For example, the sample number of days distribution 273 indicates 40% for 7 weeks, 60% for 8 weeks, and 0% for 9 weeks.

平均気温２７７は、１時間毎の屋内の気温であって着果日２７１から収穫日までの期間で平均化したものである。平均日射量２７８は、１時間毎の屋内の日射量であって着果日２７１から収穫日までの期間で平均化したものである。よって、平均気温２７７および平均日射量２７８はそれぞれ２４次元のベクトルであり、合わせて４８次元のベクトルになる。平均気温２７７は、センサ２３によって測定された測定気温２７５から算出される。平均日射量２７８は、センサ２３によって測定された測定日射量２７６から算出される。訓練データは過年度の標本果実を示しているため、着果日２７１から収穫日までの測定気温２７５および測定日射量２７６は既知であり、気象予報データは使用しなくてよい。 The average temperature 277 is the indoor temperature for each hour, averaged over the period from the fruit-bearing date 271 to the harvest date. The average solar radiation amount 278 is the indoor solar radiation amount for each hour, averaged over the same period. The average temperature 277 and the average solar radiation amount 278 are therefore each a 24-dimensional vector, and together they form a 48-dimensional vector. The average temperature 277 is calculated from the measured temperature 275 measured by the sensor 23. The average solar radiation amount 278 is calculated from the measured solar radiation amount 276 measured by the sensor 23. Since the training data describes sample fruits from past years, the measured temperature 275 and the measured solar radiation amount 276 from the fruit-bearing date 271 to the harvest date are already known, and weather forecast data need not be used.
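One way to read the 24-dimensional vectors is as one mean value per hour of the day, taken over all days in the period. The following Python sketch follows that reading; the function name `hourly_feature_vector`, the tuple layout of the measurements, and the synthetic two-day data are assumptions for illustration only.

```python
from datetime import datetime, timedelta


def hourly_feature_vector(measurements, start, end):
    """Return 24 mean temperatures followed by 24 mean solar radiation values.

    measurements: iterable of (timestamp, temperature, solar_radiation).
    Only measurements with start <= timestamp < end are used.
    """
    temp_sum = [0.0] * 24
    solar_sum = [0.0] * 24
    count = [0] * 24
    for timestamp, temperature, solar in measurements:
        if start <= timestamp < end:
            hour = timestamp.hour
            temp_sum[hour] += temperature
            solar_sum[hour] += solar
            count[hour] += 1
    if 0 in count:
        raise ValueError("every hour of the day needs at least one measurement")
    temps = [temp_sum[h] / count[h] for h in range(24)]
    solars = [solar_sum[h] / count[h] for h in range(24)]
    return temps + solars  # 48-dimensional input vector


# Synthetic data: two days of hourly readings (values are made up).
start = datetime(2018, 10, 23)
end = start + timedelta(days=2)
measurements = [
    (start + timedelta(hours=i), 20.0 + (i % 24) + 2.0 * (i // 24), 10.0 * (i % 24))
    for i in range(48)
]
features = hourly_feature_vector(measurements, start, end)
print(len(features))  # 48
```

Raising an error for an hour with no measurements is a design choice of this sketch; the description does not say how missing sensor readings are handled.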

また、訓練データとは別に収穫数２７４を示す総数データが予め用意される。収穫数２７４は、収穫日毎の収穫数の実績である。例えば、収穫数２７４は、１２月１１日に３，７００個の果実が収穫され、１２月１８日に５，８００個の果実が収穫され、１２月２５日に２，５００個の果実が収穫されたことを示す。 Separately from the training data, total data indicating the harvest number 274 is prepared in advance. The harvest number 274 is the actual number of harvests for each harvest date. For example, the harvest number 274 indicates that 3,700 fruits were harvested on December 11th, 5,800 on December 18th, and 2,500 on December 25th.

機械学習が開始されると、予測モデル２７０の係数が初期化される。訓練データのレコード毎に、予測モデル２７０に平均気温２７７および平均日射量２７８が入力され、予測モデル２７０から所要日数分布２８１が出力される。所要日数分布２８１は、所要日数毎の収穫確率の予測を示す。例えば、所要日数分布２８１は、７週間が３３％、８週間が３３％、９週間が３３％であることを示す。訓練データのレコード毎に、所要日数分布２８１と標本日数分布２７３が比較されて誤差が算出される。そして、訓練データのレコード毎の誤差が合算されて、訓練データ全体に対するモデル誤差２８２が算出される。 When machine learning starts, the coefficients of the prediction model 270 are initialized. For each record of the training data, the average temperature 277 and the average solar radiation amount 278 are input to the prediction model 270, and the prediction model 270 outputs a required number of days distribution 281. The required number of days distribution 281 indicates the predicted harvest probability for each required number of days. For example, the required number of days distribution 281 indicates 33% for 7 weeks, 33% for 8 weeks, and 33% for 9 weeks. For each record of the training data, the required number of days distribution 281 is compared with the sample number of days distribution 273 to calculate an error. The errors of the individual records are then summed to calculate the model error 282 for the entire training data.

訓練データのレコード毎の誤差には、例えば、残差平方和を用いる。この残差平方和は、所要日数毎に所要日数分布２８１の値と標本日数分布２７３の値の差を二乗し、複数の所要日数について差の二乗を合計した指標である。所要日数分布２８１と標本日数分布２７３の比較は、３３％と４０％の比較など確率同士の比較として行ってもよい。また、所要日数分布２８１と標本日数分布２７３の比較は、確率に標本着果数２７２を乗ずることで、１．７個と２個の比較など個数同士の比較として行ってもよい。 For the error for each record of training data, for example, residual sum of squares is used. This residual sum of squares is an index obtained by squaring the difference between the value of the distribution of required days 281 and the value of the sample distribution of days 273 for each required number of days, and totaling the squares of the differences for a plurality of required days. The comparison of the required number of days distribution 281 and the sample number of days distribution 273 may be performed as a comparison of probabilities such as a comparison of 33% and 40%. Further, the comparison between the required number of days distribution 281 and the sample number of days distribution 273 may be performed by multiplying the probability by the sample number of fruits 272 to compare numbers such as 1.7 and 2.
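The per-record error and the model error 282 can be sketched in Python as follows. The function names `record_error` and `model_error` and the optional `sample_count` argument are illustrative assumptions; the numbers reuse the 33% prediction and the 40%/60%/0% sample distribution from the examples above.

```python
def record_error(predicted, observed, sample_count=None):
    """Residual sum of squares for one training record.

    With sample_count=None the comparison is between probabilities.
    With a sample_count, both distributions are first converted to counts.
    """
    scale = 1.0 if sample_count is None else float(sample_count)
    return sum(((p - o) * scale) ** 2 for p, o in zip(predicted, observed))


def model_error(records):
    """Sum the per-record errors over (predicted, observed) pairs."""
    return sum(record_error(p, o) for p, o in records)


predicted_281 = [0.33, 0.33, 0.33]  # 7, 8, 9 weeks
observed_273 = [0.40, 0.60, 0.00]

err_prob = record_error(predicted_281, observed_273)
err_count = record_error(predicted_281, observed_273, sample_count=5)
print(round(err_prob, 4))   # comparison of probabilities
print(round(err_count, 4))  # comparison of counts for 5 sample fruits
```

Comparing counts instead of probabilities multiplies each record's error by the square of its sample fruit-bearing number, so records with more sample fruits weigh more heavily in the model error.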

モデル誤差２８２が算出されると、モデル誤差２８２が小さくなるように予測モデル２７０の係数が更新される。予測モデル２７０の係数を更新する際には、１つ前の係数を退避しておく。以上の予測モデル２７０の更新からモデル誤差２８２の算出までが１回のイテレーションである。下記の停止判定によってイテレーションの停止が決定されるまで予測モデル２７０の更新が繰り返される。停止判定はイテレーション毎に実行される。停止判定は、予測モデル２７０が所要日数分布２８１を出力してから予測モデル２７０が次に更新されるまでの間に、イテレーションを中断して実行してもよい。また、上記のイテレーションと並列に停止判定を実行してもよい。異なるプロセッサまたはプロセッサコアを用いて、イテレーションと停止判定を並列実行してもよい。 After the model error 282 is calculated, the coefficients of the prediction model 270 are updated so that the model error 282 becomes smaller. When updating the coefficients of the prediction model 270, the previous coefficient is saved. The process from updating the prediction model 270 to calculating the model error 282 is one iteration. Updating of the prediction model 270 is repeated until it is decided to stop the iteration by the following stop determination. Stop judgment is executed for each iteration. The stop determination may be performed by interrupting the iteration after the forecast model 270 outputs the required days distribution 281 until the forecast model 270 is updated next time. Also, the stop determination may be executed in parallel with the above iterations. Different processors or processor cores may be used to perform iteration and stop determination in parallel.

所要日数分布２８１が算出されると、訓練データのレコード毎に、収穫確率に標本着果数２７２を乗じて標本収穫数２８３が算出される。標本収穫数２８３は、所要日数毎の標本果実の収穫数の予測を示す。例えば、標本収穫数２８３は、５個の標本果実のうち、７週間が１．７個、８週間が１．７個、９週間が１．７個であることを示す。 Once the required number of days distribution 281 is calculated, the sample harvest number 283 is calculated for each record of the training data by multiplying the harvest probability by the sample fruit-bearing number 272. The sample harvest number 283 indicates the predicted number of sample fruits harvested for each required number of days. For example, the sample harvest number 283 indicates that, of 5 sample fruits, 1.7 are harvested at 7 weeks, 1.7 at 8 weeks, and 1.7 at 9 weeks.

訓練データのレコード毎の標本収穫数２８３の所要日数が、着果日２７１に基づいて、収穫日が揃うようにシフトされる。例えば、１０月２３日の７週間後は１０月３０日の６週間後に相当するため、着果日２７１が１０月３０日であるレコードに対応する標本収穫数２８３は、着果日２７１が１０月２３日であるレコードに対応する標本収穫数２８３に対して１週間後ろにシフトされる。訓練データの複数のレコードについて、収穫日が揃った標本収穫数２８３が収穫日毎に合算される。 The required numbers of days in the sample harvest number 283 of each training data record are shifted based on the fruit-bearing date 271 so that the harvest dates align. For example, 7 weeks after October 23rd corresponds to 6 weeks after October 30th, so the sample harvest number 283 for the record whose fruit-bearing date 271 is October 30th is shifted one week later relative to the sample harvest number 283 for the record whose fruit-bearing date 271 is October 23rd. Across the multiple records of the training data, the aligned sample harvest numbers 283 are summed for each harvest date.

そして、合算された標本収穫数に標本割合２８４の逆数を乗じて（標本割合２８４で割って）収穫数２８５が算出される。例えば、合算された標本収穫数が１，０００倍される。収穫数２８５は、収穫日毎の果実全体の収穫数の予測を示す。例えば、収穫数２８５は、１２月１１日に４，０００個の果実が収穫され、１２月１８日に４，０００個の果実が収穫され、１２月２５日に４，０００個の果実が収穫されるという予測を示す。 The harvest number 285 is then calculated by multiplying the summed sample harvest number by the reciprocal of the sample ratio 284 (that is, dividing by the sample ratio 284). For example, the summed sample harvest number is multiplied by 1,000. The harvest number 285 indicates the predicted number of all fruits harvested for each harvest date. For example, the harvest number 285 indicates a prediction that 4,000 fruits will be harvested on December 11th, 4,000 on December 18th, and 4,000 on December 25th.

収穫数２８５が算出されると、収穫数２８５と収穫数２７４とが比較されて総数誤差２８６が算出される。総数誤差２８６には、例えば、残差平方和を用いる。そして、前回のイテレーションの総数誤差２８６と今回のイテレーションの総数誤差２８６とが比較される。今回の総数誤差２８６が前回の総数誤差２８６以下であれば、イテレーションの継続が決定される。この場合、モデル誤差２８２に応じて予測モデル２７０が更新される。 After the harvested number 285 is calculated, the harvested number 285 and the harvested number 274 are compared to calculate the total error 286 . For the total error 286, for example, residual sum of squares is used. Then, the total error 286 of the previous iteration and the total error 286 of the current iteration are compared. If the current total error 286 is less than or equal to the previous total error 286, it is decided to continue the iteration. In this case, predictive model 270 is updated according to model error 282 .

一方、今回の総数誤差２８６が前回の総数誤差２８６より大きければ、イテレーションの停止が決定される。この場合、予測モデル２７０は更新されない。最適な予測モデル２７０の係数は前回のイテレーションの係数であるため、退避しておいた予測モデル２７０の係数が読み出され、学習結果として出力される。すなわち、収穫数２７４と収穫数２８５の類似度が最大になり、総数誤差２８６が最小になったことが検出される。ここでは、最適な予測モデル２７０の係数に到達する前は総数誤差２８６が単調に減少し、最適な予測モデル２７０の係数に到達した後は総数誤差２８６が単調に増加すると仮定している。 On the other hand, if the current total error 286 is greater than the previous total error 286, it is decided to stop the iteration. In this case, predictive model 270 is not updated. Since the coefficients of the optimum prediction model 270 are the coefficients of the previous iteration, the saved coefficients of the prediction model 270 are read out and output as the learning result. That is, it is detected that the similarity between the number of harvests 274 and 285 is maximized and the total error 286 is minimized. Here, it is assumed that the total error 286 monotonically decreases before reaching the optimal prediction model 270 coefficients, and that the total error 286 monotonically increases after reaching the optimal prediction model 270 coefficients.
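The stop determination can be sketched as a small Python loop. This is a minimal sketch under stated assumptions: `update` and `total_error` are hypothetical callbacks standing in for the coefficient update driven by the model error 282 and for the calculation of the total error 286, and the scalar "coefficient" in the usage example is a toy, not the actual prediction model 270.

```python
def train_with_total_error_stopping(coeffs, update, total_error, max_iters=1000):
    """Iterate until the total error starts to increase.

    update(coeffs) must return new coefficients without mutating its argument.
    Returns the saved coefficients from the iteration before the increase,
    together with their total error.
    """
    saved_coeffs = None
    previous_error = float("inf")
    for _ in range(max_iters):
        current_error = total_error(coeffs)
        if current_error > previous_error:
            # Total error got worse: stop and return the saved coefficients.
            return saved_coeffs, previous_error
        # Save the current coefficients before updating them.
        saved_coeffs, previous_error = coeffs, current_error
        coeffs = update(coeffs)
    return saved_coeffs, previous_error


# Toy usage: the "coefficient" grows by 1 per iteration and the total error
# is smallest at 3, so the error falls monotonically and then rises.
best, best_error = train_with_total_error_stopping(
    coeffs=0,
    update=lambda c: c + 1,
    total_error=lambda c: (c - 3) ** 2,
)
print(best, best_error)  # 3 0
```

Like the description, this loop relies on the total error decreasing monotonically before the optimum and increasing after it; if the error curve were noisy, the first increase would stop training early.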

次に、機械学習装置１００の機能について説明する。
図１１は、機械学習装置の機能例を示すブロック図である。
機械学習装置１００は、気象データ記憶部１２１、標本データ記憶部１２２、総数データ記憶部１２３、予測モデル記憶部１２４、データ収集部１２５、データ加工部１２６、機械学習部１２７、イテレーション制御部１２８および収穫予測部１２９を有する。気象データ記憶部１２１、標本データ記憶部１２２、総数データ記憶部１２３および予測モデル記憶部１２４は、例えば、ＲＡＭ１０２またはＨＤＤ１０３の記憶領域を用いて実現される。データ収集部１２５、データ加工部１２６、機械学習部１２７、イテレーション制御部１２８および収穫予測部１２９は、例えば、プログラムを用いて実現される。 Next, functions of the machine learning device 100 will be described.
FIG. 11 is a block diagram illustrating an example of functions of the machine learning device.
The machine learning device 100 includes a weather data storage unit 121, a sample data storage unit 122, a total data storage unit 123, a prediction model storage unit 124, a data collection unit 125, a data processing unit 126, a machine learning unit 127, an iteration control unit 128, and a harvest prediction unit 129. The weather data storage unit 121, the sample data storage unit 122, the total data storage unit 123, and the prediction model storage unit 124 are implemented using, for example, storage areas of the RAM 102 or the HDD 103. The data collection unit 125, the data processing unit 126, the machine learning unit 127, the iteration control unit 128, and the harvest prediction unit 129 are implemented using, for example, programs.

気象データ記憶部１２１は、過年度の着果日から収穫日までの気象データと、今年度の着果日から予測日の前日までの気象データを記憶する。気象データは、センサ２３によって測定された測定気温および測定日射量を含む。また、気象データ記憶部１２１は、今年度の予測日以降の気象予報データを記憶する。気象予報データは、気象データサーバ３１から収集される。気象予報データは、屋外の予報気温および予報日射量を含む。また、気象データ記憶部１２１は、屋外の予報気温および予報日射量を、屋内の予想気温および予想日射量に変換するための環境パラメータを記憶する。 The meteorological data storage unit 121 stores meteorological data from the date of bearing fruit in the past year to the date of harvest, and meteorological data from the date of bearing fruit in the current year to the day before the forecast date. The meteorological data includes measured air temperature and measured solar radiation measured by sensor 23 . In addition, the weather data storage unit 121 stores weather forecast data after the forecast date of the current year. Weather forecast data is collected from the weather data server 31 . Weather forecast data includes outdoor forecast temperatures and forecast solar radiation. The weather data storage unit 121 also stores environmental parameters for converting outdoor forecast air temperature and forecast solar radiation into indoor forecast air temperature and forecast solar radiation.

標本データ記憶部１２２は、過年度の標本果実毎の着果日および収穫日を示す標本データと、今年度の標本果実毎の着果日を示す標本データを記憶する。また、標本データ記憶部１２２は、全体の果実に対する標本果実の割合である標本割合を記憶する。 The sample data storage unit 122 stores sample data indicating the fruit bearing date and harvest date for each sample fruit in the past year and sample data indicating the fruit bearing date for each sample fruit in the current year. The sample data storage unit 122 also stores a sample ratio, which is the ratio of the sample fruit to the whole fruit.

総数データ記憶部１２３は、過年度の収穫日毎の収穫数を示す総数データを記憶する。
予測モデル記憶部１２４は、学習結果としての予測モデルを記憶する。
データ収集部１２５は、気象データ記憶部１２１、標本データ記憶部１２２および総数データ記憶部１２３に記憶される各種のデータを収集する。データの収集方法として、データ収集部１２５は、ユーザからデータの入力を受け付けることがある。また、データ収集部１２５は、他の情報処理装置からデータを受信することがある。 The total number data storage unit 123 stores total number data indicating the number of harvests for each harvest date in the past year.
The prediction model storage unit 124 stores prediction models as learning results.
The data collection unit 125 collects various data stored in the weather data storage unit 121 , sample data storage unit 122 and total number data storage unit 123 . As a data collection method, the data collection unit 125 may receive data input from the user. The data collection unit 125 may also receive data from other information processing devices.

データ加工部１２６は、気象データ記憶部１２１に記憶された過年度の気象データと、標本データ記憶部１２２に記憶された過年度の標本データを加工して、着果日が異なる複数のレコードを含む訓練データを生成する。具体的には、データ加工部１２６は、過年度の標本データから着果日を抽出し、着果日毎に標本着果数をカウントし、着果日毎に着果日と収穫日の差から標本日数分布を算出する。また、データ加工部１２６は、過年度の気象データから、着果日毎に着果日から収穫日までの測定気温および測定日射量を抽出し、１時間毎の平均気温および平均日射量を算出する。 The data processing unit 126 processes the past-year weather data stored in the weather data storage unit 121 and the past-year sample data stored in the sample data storage unit 122 to generate training data including multiple records with different fruit-bearing dates. Specifically, the data processing unit 126 extracts the fruit-bearing dates from the past-year sample data, counts the sample fruit-bearing number for each fruit-bearing date, and calculates the sample number of days distribution for each fruit-bearing date from the difference between the fruit-bearing date and the harvest date. The data processing unit 126 also extracts, for each fruit-bearing date, the measured temperature and the measured solar radiation amount from the fruit-bearing date to the harvest date from the past-year weather data, and calculates the average temperature and the average solar radiation amount for each hour.
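The sample-data part of this processing can be sketched in Python as follows. The function name `build_training_records`, the dictionary keys, and the example dates (including the year 2018) are assumptions for illustration; the weather features are omitted here.

```python
from collections import Counter, defaultdict
from datetime import date


def build_training_records(samples):
    """Group sample fruits by fruit-bearing date.

    samples: iterable of (fruit_bearing_date, harvest_date), one per sample fruit.
    Returns one record per fruit-bearing date with the sample count and the
    distribution of required days expressed as probabilities.
    """
    days_by_date = defaultdict(Counter)
    for fruit_bearing, harvest in samples:
        required_days = (harvest - fruit_bearing).days
        days_by_date[fruit_bearing][required_days] += 1

    records = []
    for fruit_bearing in sorted(days_by_date):
        counts = days_by_date[fruit_bearing]
        total = sum(counts.values())
        records.append({
            "fruit_bearing_date": fruit_bearing,
            "sample_count": total,
            "days_distribution": {d: n / total for d, n in sorted(counts.items())},
        })
    return records


# Five sample fruits set on October 23rd: two harvested after 7 weeks,
# three after 8 weeks (matching the 40% / 60% example above).
samples = (
    [(date(2018, 10, 23), date(2018, 12, 11))] * 2
    + [(date(2018, 10, 23), date(2018, 12, 18))] * 3
)
records = build_training_records(samples)
print(records[0]["sample_count"], records[0]["days_distribution"])
```

Required days with no observed harvest, such as 9 weeks in this example, are simply absent from the dictionary and can be treated as probability 0.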

また、データ加工部１２６は、気象データ記憶部１２１に記憶された今年度の気象データおよび気象予報データと、標本データ記憶部１２２に記憶された今年度の標本データを加工して、収穫予測用の入力データを生成する。具体的には、データ加工部１２６は、今年度の標本データから着果日を抽出し、着果日毎に標本着果数をカウントする。また、データ加工部１２６は、今年度の気象データから、着果日毎に着果日から予測日の前日までの測定気温および測定日射量を抽出する。データ加工部１２６は、気象予報データから、予測日から収穫時期までの予報気温および予報日射量を抽出し、気象データ記憶部１２１に記憶された環境パラメータを用いて、屋内の予想気温および予想日射量に変換する。そして、データ加工部１２６は、着果日毎に着果日から収穫時期までの通算の１時間毎の平均気温および平均日射量を算出する。 The data processing unit 126 also processes the current-year weather data and weather forecast data stored in the weather data storage unit 121 and the current-year sample data stored in the sample data storage unit 122 to generate input data for harvest prediction. Specifically, the data processing unit 126 extracts the fruit-bearing dates from the current-year sample data and counts the sample fruit-bearing number for each fruit-bearing date. The data processing unit 126 also extracts, for each fruit-bearing date, the measured temperature and the measured solar radiation amount from the fruit-bearing date to the day before the prediction date from the current-year weather data. The data processing unit 126 extracts the forecast temperature and the forecast solar radiation amount from the prediction date to the harvest time from the weather forecast data, and converts them into the expected indoor temperature and the expected indoor solar radiation amount using the environmental parameters stored in the weather data storage unit 121. The data processing unit 126 then calculates, for each fruit-bearing date, the average temperature and the average solar radiation amount for each hour over the whole period from the fruit-bearing date to the harvest time.

データ加工部１２６は、訓練データを機械学習部１２７に提供する。また、データ加工部１２６は、総数データ記憶部１２３に記憶された総数データをイテレーション制御部１２８に提供する。データ加工部１２６は、入力データを収穫予測部１２９に提供する。 The data processing unit 126 provides training data to the machine learning unit 127 . The data processing unit 126 also provides the total number data stored in the total number data storage unit 123 to the iteration control unit 128 . The data processing unit 126 provides input data to the harvest prediction unit 129 .

The machine learning unit 127 performs machine learning using training data that includes a plurality of records for different fruiting dates. The machine learning algorithm to be used is specified in advance. The generated prediction model outputs a probability distribution of the number of days required from fruiting to harvest. The machine learning unit 127 repeatedly updates the coefficients of the prediction model and calculates the model error with respect to the training data. When the iteration control unit 128 instructs it to stop iterating, the machine learning unit 127 outputs the prediction model from the immediately preceding iteration to the prediction model storage unit 124.

Each time the machine learning unit 127 updates the prediction model, the iteration control unit 128 predicts the total harvest count for each harvest date in the past year from the required-days distribution output by the prediction model, the sample fruiting count, and the sample ratio, and compares the prediction with the actual results indicated by the total number data. The iteration control unit 128 calculates the total-number error between the predicted and actual total harvest counts and, when the total-number error has increased from the previous iteration, instructs the machine learning unit 127 to stop iterating.
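The stopping rule described above can be sketched as a short loop. This is only an illustration under stated assumptions: the function name `train_with_total_error_stopping`, the callbacks `update` and `total_error`, and the safety cap `max_iters` are all hypothetical and do not appear in the description, and the model is treated as an opaque object.

```python
import copy
from typing import Any, Callable


def train_with_total_error_stopping(
    model: Any,
    update: Callable[[Any], Any],
    total_error: Callable[[Any], float],
    max_iters: int = 1000,
) -> Any:
    """Return the model from the iteration just before the total-number error rises.

    `update` (hypothetical) returns a model whose coefficients were adjusted to
    reduce the model error on the training data. `total_error` (hypothetical)
    scores a model against the past-year total number data.
    """
    previous_model = copy.deepcopy(model)
    previous_error = total_error(previous_model)
    for _ in range(max_iters):  # max_iters is an assumed safety cap
        candidate = update(copy.deepcopy(previous_model))
        error = total_error(candidate)
        if error > previous_error:
            # The total-number error increased: stop and keep the previous model.
            return previous_model
        previous_model, previous_error = candidate, error
    return previous_model
```

Keeping a copy of the previous model is what allows the model from one iteration earlier to be output once the increase is detected.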

The harvest prediction unit 129 predicts the harvest count for each harvest date in the current year based on the prediction model stored in the prediction model storage unit 124 and the input data provided by the data processing unit 126. Specifically, the harvest prediction unit 129 inputs the current year's average temperature and average solar radiation into the prediction model and predicts the required-days distribution for each fruiting date. It calculates each harvest date by adding the required number of days to the fruiting date, calculates the sample harvest count by multiplying the probability indicated by the required-days distribution by the sample fruiting count, and converts the result into a harvest count by multiplying it by the reciprocal of the sample ratio. The harvest prediction unit 129 then sums the harvest counts of different fruiting dates for each harvest date to predict the total harvest count for each harvest date.
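The combination step in the preceding paragraph can be illustrated as follows. This is a minimal sketch, not the patented implementation: the function name `forecast_harvest_counts` and the dictionary layout (fruiting date mapped to a distribution keyed by required days) are assumptions made for the example.

```python
from collections import defaultdict
from datetime import date, timedelta
from typing import Dict, Mapping


def forecast_harvest_counts(
    days_distributions: Mapping[date, Mapping[int, float]],
    sample_counts: Mapping[date, int],
    sample_ratio: float,
) -> Dict[date, float]:
    """Combine per-fruiting-date distributions into totals per harvest date.

    days_distributions[fruiting_date][required_days] is a probability,
    sample_counts[fruiting_date] is the sample fruiting count, and
    sample_ratio is the fraction of all fruits that are sample fruits.
    """
    totals: Dict[date, float] = defaultdict(float)
    for fruiting_date, distribution in days_distributions.items():
        for required_days, probability in distribution.items():
            # Shift by the required days so that harvest dates line up.
            harvest_date = fruiting_date + timedelta(days=required_days)
            sample_harvest = probability * sample_counts[fruiting_date]
            # Scale the sample count up to the whole crop.
            totals[harvest_date] += sample_harvest / sample_ratio
    return dict(totals)
```

Because every fruiting date contributes to several harvest dates, the accumulation is keyed by harvest date rather than by fruiting date.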

The harvest prediction unit 129 outputs the predicted total harvest count for each harvest date. For example, it displays the prediction result on the display device 111, saves it to non-volatile storage such as the HDD 103, outputs it to another output device such as a printer, or transmits it to another information processing device.

FIG. 12 is a diagram showing example tables of weather data, sample data, and total number data.
The weather data table 131 is stored in the weather data storage unit 121. The weather data table 131 contains past-year weather data. The current year's weather data and weather forecast data can also be managed in tables with the same layout as the weather data table 131. The weather data table 131 includes the items date and time, temperature, and solar radiation. The date and time are in one-hour increments. The temperature is the one-hour average temperature, in units of, for example, °C. The solar radiation is the one-hour average of the instantaneous solar radiation, in units of, for example, kW/m².

The sample data table 132 is stored in the sample data storage unit 122. The sample data table 132 contains past-year sample data. The current year's sample data can also be managed in a table with the same layout as the sample data table 132, except that no harvest date is registered for the current year's sample data. The sample data table 132 includes the items variety, fruit number, fruiting date, and harvest date. The variety is a paprika variety; the varieties include red, yellow, and orange varieties, which differ in fruit color. Harvest prediction is performed for each variety. The fruit number is an identification number that identifies an individual sample fruit and is unique within a variety. The fruiting date is the date on which fruit set of the sample fruit was observed. The harvest date is the date on which the sample fruit was harvested. For convenience of data management, however, fruiting dates and harvest dates are recorded as the date of a fixed day of the week.

The total number data table 133 is stored in the total number data storage unit 123. The total number data table 133 contains past-year total number data. It includes the items harvest date, red count, yellow count, and orange count. The harvest date is the date on which fruits were harvested; for convenience of data management, it is recorded as the date of a fixed day of the week. The red count, yellow count, and orange count are the numbers of fruits harvested for the red, yellow, and orange varieties, respectively.

FIG. 13 is a diagram showing an example of a training data table.
The training data table 134 is generated from the weather data table 131 and the sample data table 132 and is used for machine learning. The training data table 134 includes the items fruiting date, sample fruiting count, objective variable, and explanatory variables. The fruiting date is a fruiting date that appears in the sample data table 132. The sample fruiting count is the number of sample fruits registered in the sample data table 132 that share that fruiting date.

The objective variable is the sample days distribution. The sample days distribution gives the number of sample fruits for each required number of days, for example 0 fruits at 6 weeks, 2 at 7 weeks, and 3 at 8 weeks. The required number of days is the difference between the fruiting date and the harvest date in the sample data table 132. The sample fruits counted for a given required number of days are those registered in the sample data table 132 with that required number of days. The sum of the counts over all required numbers of days equals the sample fruiting count.
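As an illustration of how the sample fruiting count and the sample days distribution relate, the following sketch groups sample records by fruiting date. The function name `build_targets` and the representation of a sample record as a `(fruiting_date, harvest_date)` pair are assumptions made for the example.

```python
from collections import Counter, defaultdict
from datetime import date
from typing import Dict, Iterable, Tuple


def build_targets(samples: Iterable[Tuple[date, date]]) -> Dict[date, Counter]:
    """Map each fruiting date to a Counter of required days -> number of sample fruits."""
    targets: Dict[date, Counter] = defaultdict(Counter)
    for fruiting_date, harvest_date in samples:
        required_days = (harvest_date - fruiting_date).days
        targets[fruiting_date][required_days] += 1
    return dict(targets)
```

For any fruiting date `d`, `sum(targets[d].values())` is the sample fruiting count, which matches the statement that the per-day counts add up to that count.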

The explanatory variables are the hourly average temperature and the hourly average solar radiation. The hourly average temperature is calculated by extracting from the weather data table 131 the temperatures for every date from the fruiting date to the harvest date, grouping them by time of day (0:00, 1:00, 2:00, and so on), and averaging each group. The hourly average solar radiation is calculated in the same way from the solar radiation values for every date from the fruiting date to the harvest date. The harvest date associated with a given fruiting date may be the last, the first, or the middle harvest date among the harvest dates on which at least one sample fruit was harvested according to the sample days distribution.
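The hour-of-day averaging described above can be sketched as follows. The function name `hourly_features`, the row layout `(timestamp, temperature, insolation)`, and the choice of a half-open period (`start` inclusive, `end` exclusive) are assumptions for illustration; the description does not state how the period boundaries are treated.

```python
from datetime import datetime
from typing import Iterable, List, Tuple


def hourly_features(
    weather: Iterable[Tuple[datetime, float, float]],
    start: datetime,
    end: datetime,
) -> List[float]:
    """Return 24 hour-of-day mean temperatures followed by 24 mean insolation values."""
    temp_sums = [0.0] * 24
    sun_sums = [0.0] * 24
    counts = [0] * 24
    for timestamp, temperature, insolation in weather:
        if start <= timestamp < end:  # assumed half-open period
            hour = timestamp.hour
            temp_sums[hour] += temperature
            sun_sums[hour] += insolation
            counts[hour] += 1
    if 0 in counts:
        raise ValueError("every hour of the day needs at least one observation")
    mean_temps = [s / n for s, n in zip(temp_sums, counts)]
    mean_suns = [s / n for s, n in zip(sun_sums, counts)]
    return mean_temps + mean_suns
```

The result has 48 entries, one mean temperature and one mean solar radiation value per hour of the day.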

Next, the processing procedure of the machine learning device 100 will be described.
FIG. 14 is a flowchart showing an example of the machine learning procedure.
(S10) The data collection unit 125 collects weather data, sample data, and total number data. Machine learning is performed for each variety. Alternatively, the variety may be added to the explanatory variables of the prediction model.

(S11) The data processing unit 126 extracts the fruiting dates from the sample data and classifies the sample fruits by fruiting date. It counts the sample fruits for each fruiting date to obtain the sample fruiting count of the training data. It also calculates the required number of days, which is the difference between the fruiting date and the harvest date, and counts the sample fruits for each required number of days to obtain the sample days distribution of the training data.

(S12) For each fruiting date, the data processing unit 126 extracts from the weather data the temperature and solar radiation from the fruiting date to the harvest date. It groups the extracted temperatures by time of day and uses the average for each time of day as the average temperature of the training data. Likewise, it groups the extracted solar radiation values by time of day and uses the average for each time of day as the average solar radiation of the training data.

(S13) The machine learning unit 127 initializes the coefficients of the prediction model.
(S14) The machine learning unit 127 selects one record of the training data.
(S15) The machine learning unit 127 inputs the 48-dimensional explanatory variable data representing the average temperature and average solar radiation (24 hourly values each) into the prediction model and reads the objective variable data from the prediction model, thereby predicting the required-days distribution. The required-days distribution indicates the probability of each required number of days.

(S16) The machine learning unit 127 determines whether all records of the training data have been selected in step S14. If all records have been selected, the process proceeds to step S17; if an unselected record remains, the process returns to step S14.

(S17) For each record of the training data, the machine learning unit 127 compares the required-days distribution predicted in step S15 with the sample days distribution to calculate an error. The error is, for example, the residual sum of squares. The machine learning unit 127 then calculates the model error for the training data as a whole. For example, the model error is the sum of the per-record errors.
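A residual-sum-of-squares model error of the kind mentioned in step S17 might look like the sketch below. One point is an assumption rather than something stated in the description: because the prediction is a probability distribution while the sample days distribution is a set of counts, this example first normalizes the counts to relative frequencies. The function names `record_error` and `model_error` are hypothetical.

```python
from typing import Iterable, Mapping, Tuple


def record_error(predicted: Mapping[int, float], observed_counts: Mapping[int, int]) -> float:
    """Residual sum of squares between predicted probabilities and observed frequencies."""
    total = sum(observed_counts.values())  # equals the sample fruiting count
    keys = set(predicted) | set(observed_counts)
    return sum(
        (predicted.get(k, 0.0) - observed_counts.get(k, 0) / total) ** 2
        for k in keys
    )


def model_error(records: Iterable[Tuple[Mapping[int, float], Mapping[int, int]]]) -> float:
    """Sum of the per-record errors over the whole training data."""
    return sum(record_error(predicted, observed) for predicted, observed in records)
```

Scaling the probabilities by the sample fruiting count instead of normalizing the counts would be an equally plausible reading; the two differ only in how records with many sample fruits are weighted.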

FIG. 15 is a flowchart (continued) showing an example of the machine learning procedure.
(S18) The iteration control unit 128 selects one record of the training data.
(S19) The iteration control unit 128 multiplies the probabilities of the required-days distribution calculated by the machine learning unit 127 in step S15 by the sample fruiting count to predict the sample harvest count for each required number of days.

(S20) The iteration control unit 128 determines whether all records of the training data have been selected in step S18. If all records have been selected, the process proceeds to step S21; if an unselected record remains, the process returns to step S18.

(S21) The iteration control unit 128 shifts the sample harvest counts of different fruiting dates according to their fruiting dates so that the harvest dates line up, and totals the sample harvest counts for each harvest date.
(S22) The iteration control unit 128 multiplies the total sample harvest count for each harvest date by the reciprocal of the sample ratio to predict the total harvest count for each harvest date.

(S23) The iteration control unit 128 compares the actual harvest count for each harvest date indicated by the total number data with the predicted harvest count for each harvest date calculated in step S22, and calculates the total-number error. The total-number error is, for example, the residual sum of squares.

(S24) The iteration control unit 128 determines whether the iteration that evaluates the model error in steps S14 to S17 is the second or a later iteration. If it is the second or a later iteration, the process proceeds to step S25; if it is the first iteration, the process proceeds to step S26.

(S25) The iteration control unit 128 determines whether the current total-number error is greater than the previous total-number error. If the current total-number error is greater than the previous one, the process proceeds to step S27; if it is less than or equal to the previous one, the process proceeds to step S26.

(S26) The iteration control unit 128 instructs the machine learning unit 127 to continue iterating. The machine learning unit 127 saves the coefficients of the current prediction model and then updates the coefficients so as to reduce the model error calculated in step S17. The process then returns to step S14.

(S27) The iteration control unit 128 reads the saved coefficients of the prediction model and outputs the prediction model used in the immediately preceding iteration to the prediction model storage unit 124.
FIG. 16 is a flowchart showing an example of the harvest prediction procedure.

(S30) The data collection unit 125 collects the current year's weather data and sample data. It also collects weather forecast data for the prediction date onward.
(S31) The data processing unit 126 extracts the fruiting dates from the sample data and classifies the sample fruits by fruiting date. It counts the sample fruiting count for each fruiting date.

(S32) The data processing unit 126 extracts from the weather forecast data the forecast temperature and forecast solar radiation from the prediction date to the harvest period. Using the environmental parameters, it converts the forecast temperature and forecast solar radiation into the expected indoor temperature and expected indoor solar radiation.

(S33) For each fruiting date, the data processing unit 126 extracts from the weather data the measured temperature and measured solar radiation from the fruiting date to the day before the prediction date. It groups the measured and expected temperatures by time of day and calculates the average temperature for each time of day. Likewise, it groups the measured and expected solar radiation values by time of day and calculates the average solar radiation for each time of day.

(S34) The harvest prediction unit 129 selects one fruiting date.
(S35) The harvest prediction unit 129 inputs the average temperature and average solar radiation corresponding to the selected fruiting date into the prediction model and predicts the required-days distribution.

(S36) The harvest prediction unit 129 multiplies the probabilities indicated by the required-days distribution by the sample fruiting count corresponding to the selected fruiting date, converting them into a sample harvest count for each required number of days.
(S37) The harvest prediction unit 129 determines whether all fruiting dates appearing in the sample data have been selected in step S34. If all fruiting dates have been selected, the process proceeds to step S38; if an unselected fruiting date remains, the process returns to step S34.

(S38) The harvest prediction unit 129 shifts the sample harvest counts of different fruiting dates according to their fruiting dates so that the harvest dates line up, and totals the sample harvest counts for each harvest date.
(S39) The harvest prediction unit 129 multiplies the total sample harvest count for each harvest date by the reciprocal of the sample ratio to predict the total harvest count for each harvest date.

(S40) The harvest prediction unit 129 outputs a prediction result indicating the harvest count for each harvest date. For example, the harvest prediction unit 129 displays the prediction result on the display device 111.
According to the information processing system of the second embodiment, a prediction model that predicts the required number of days from the average temperature and average solar radiation is trained using training data that associates the number of days required from fruiting to harvest in past years with the average temperature and average solar radiation during that period. The harvest dates and harvest counts for the current year are then predicted from the trained prediction model and the current year's fruiting status. Information useful for farm management can therefore be provided before the paprika is harvested.

In addition, the prediction model is trained to output a probability distribution of the required number of days rather than its expected value. This makes it possible to predict the spread of harvest dates while taking into account the individual variation of paprika, whose growth rate differs greatly even under the same growing environment. Furthermore, the total harvest counts predicted from the training data with the partially trained prediction model are compared with the actual total harvest counts of past years; when it is detected that the total-number error has reached its minimum, the machine learning iteration is stopped and the prediction model at which the total-number error was smallest is output. This keeps overfitting from driving the prediction model toward probability distributions with excessively small variance, and improves the prediction accuracy of the prediction model. Moreover, a prediction model that appropriately reflects the spread of harvest dates is generated even from a small amount of sample data, which reduces the burden on farmers who observe sample fruits and collect sample data.

REFERENCE SIGNS LIST
10 machine learning device
11 storage unit
12 processing unit
13 training data
14 total number data
15 learning process
16 prediction model
17 predicted distribution

Claims

A machine learning program that causes a computer to execute a process comprising:
acquiring training data including a plurality of records, each associating information on a growing environment of a sample crop with a required number of days from a reference date, on which a predetermined state was observed, to a harvest date of the sample crop, and total number data indicating an actual distribution of harvest counts over harvest dates for a crop set that includes the plurality of sample crops indicated by the plurality of records and other crops;
generating a prediction model that calculates a probability distribution of the required number of days from information on a growing environment, and starting a learning process that, using the training data, repeats evaluating an error of the probability distribution calculated by the prediction model and updating the prediction model; and
during the learning process, combining a plurality of probability distributions calculated by the prediction model from the growing environment information indicated by the plurality of records to calculate a predicted distribution of harvest counts over harvest dates, and determining when to stop the learning process based on a similarity between the predicted distribution and the actual distribution indicated by the total number data.

The machine learning program according to claim 1, wherein
each of the sample crop and the other crops is a fruit,
the reference date is a date on which fruiting was observed, and
the information on the growing environment of the sample crop includes temperature and solar radiation from the fruiting date to the harvest date of the sample crop.

The machine learning program according to claim 1 or 2, wherein
in determining the stop timing, the similarity is evaluated each time the prediction model is updated and, when a peak of the similarity is detected, the learning process is stopped and the prediction model corresponding to the peak of the similarity is output as a learning result.

The machine learning program according to any one of claims 1 to 3, wherein
each of the plurality of records includes the reference date, and
in determining the stop timing, the plurality of probability distributions corresponding to the plurality of records are combined based on the reference dates to calculate a sample predicted distribution of harvest counts over harvest dates for the plurality of sample crops, and the predicted distribution for the crop set is calculated from the sample predicted distribution and a sample ratio of the plurality of sample crops to the crop set.

The machine learning program according to any one of claims 1 to 4, wherein
in determining the stop timing, the learning process is stopped when the similarity indicates that the predicted distribution and the actual distribution are similar to at least a predetermined criterion.

A machine learning method in which a computer:
acquires training data including a plurality of records, each associating information on a growing environment of a sample crop with a required number of days from a reference date, on which a predetermined state was observed, to a harvest date of the sample crop, and total number data indicating an actual distribution of harvest counts over harvest dates for a crop set that includes the plurality of sample crops indicated by the plurality of records and other crops;
generates a prediction model that calculates a probability distribution of the required number of days from information on a growing environment, and starts a learning process that, using the training data, repeats evaluating an error of the probability distribution calculated by the prediction model and updating the prediction model; and
during the learning process, combines a plurality of probability distributions calculated by the prediction model from the growing environment information indicated by the plurality of records to calculate a predicted distribution of harvest counts over harvest dates, and determines when to stop the learning process based on a similarity between the predicted distribution and the actual distribution indicated by the total number data.

A machine learning device comprising:
a storage unit that stores training data including a plurality of records, each associating information on a growing environment of a sample crop with a required number of days from a reference date, on which a predetermined state was observed, to a harvest date of the sample crop, and total number data indicating an actual distribution of harvest counts over harvest dates for a crop set that includes the plurality of sample crops indicated by the plurality of records and other crops; and
a processing unit that generates a prediction model that calculates a probability distribution of the required number of days from information on a growing environment, starts a learning process that, using the training data, repeats evaluating an error of the probability distribution calculated by the prediction model and updating the prediction model, combines, during the learning process, a plurality of probability distributions calculated by the prediction model from the growing environment information indicated by the plurality of records to calculate a predicted distribution of harvest counts over harvest dates, and determines when to stop the learning process based on a similarity between the predicted distribution and the actual distribution indicated by the total number data.