JP6570929B2

JP6570929B2 - Characteristic estimation model generation apparatus and method, analysis target characteristic estimation apparatus and method

Info

Publication number: JP6570929B2
Application number: JP2015176416A
Authority: JP
Inventors: 井澤 毅; 深田 麻衣子
Original assignee: National Agriculture and Food Research Organization
Current assignee: National Agriculture and Food Research Organization
Priority date: 2015-09-08
Filing date: 2015-09-08
Publication date: 2019-09-04
Anticipated expiration: 2035-09-08
Also published as: JP2017051118A

Description

The present invention relates to an apparatus and method for generating a characteristic estimation model of an analysis target from the target's characteristic variables and state information, and to an apparatus and method for estimating the characteristics of the analysis target. In particular, it relates to a characteristic estimation model generation apparatus and method, and a characteristic estimation apparatus and method, suited to estimating traits (the characteristics of a crop) from gene expression information of a crop such as rice.

In recent years, the performance of processing devices such as PCs has improved markedly. As a result, there has been active development of techniques that model complex systems, such as naturally occurring phenomena, through statistical analysis of large amounts of data. A technique often used for such statistical analysis is regression analysis with a regularization term, such as LASSO regression.
However, regression analysis with a regularization term is concerned solely with producing the best model from the given data. Consequently, the result is not unique, and the explanatory variables for the objective variable cannot be extracted exhaustively when the model is generated.

Agriculture is one of the fields in which this kind of modeling is desired.
In recent years, techniques have been developed that manage cultivation by quantitatively measuring the growth of crops and performing growth diagnosis and soil diagnosis. Specifically, the degree of crop growth and similar quantities are measured or predicted quantitatively, and the appropriate fertilization timing and amount are determined, so that crops are cultivated and harvested scientifically and systematically.

Fertilization is a particularly important factor in achieving scientific, systematic cultivation and harvesting of crops. The timing of fertilization strongly affects crop yield and quality control.
In general, increasing the amount of fertilizer raises the yield but also raises the cost. In addition, excessive fertilization causes a large amount of protein, that is, nitrogen, to accumulate in the harvested part of the plant, which lowers quality. Fertilizer also contains phosphate. With excessive fertilization, phosphate that has saturated in a paddy field, for example, flows out into rivers and the sea, which can lead to serious water pollution.
Therefore, fertilizing in an appropriate amount and at an appropriate time, according to the growth status of the crop, is important for raising crop yield and quality.

Accordingly, appropriate management of fertilization is important for achieving scientific, systematic cultivation and harvesting of crops. To that end, it is important to grasp the growth status of the crop more accurately and in a short time.
Conventionally, the growth status of a crop has been judged from information obtained from its appearance, such as leaf color. However, because the appearance of a crop is affected by many factors, such as temperature, hours of sunshine, and water quality, it is difficult to judge growth status quantitatively from appearance.

To address this problem, Patent Document 1 discloses a method that measures the chlorophyll content and the delayed luminescence of crop leaves to determine the state of drought stress, which is an indicator of whether the crop has sufficient water.

However, chlorophyll values are easily affected by disturbances, so individual differences are large. Using chlorophyll content therefore makes the determination of growth status unstable, and a method for determining growth status quantitatively without using chlorophyll content has been desired.

As a conventional technique for determining growth status quantitatively without using chlorophyll content, Non-Patent Document 1 discloses a technique that analyzes maize gene expression information to generate an estimation model for determining whether a gene responds to the presence or absence of fertilization.

By implementing the technique of Non-Patent Document 1, it is possible to determine whether the gene information expressed in the maize under measurement responds to the presence or absence of fertilization. However, generating the estimation model requires experiments to narrow down the genes that respond to fertilization, which is very laborious. The technique can also be used to create a model that estimates nitrogen content, which indicates how much nitrogen has accumulated in the leaves. However, it cannot comprehensively estimate agricultural traits other than nitrogen content.

Furthermore, gene analysis of this kind is usually performed by engineers with specialized knowledge, so determining growth status takes considerable effort and time. A technique that determines growth status automatically and in a short time is therefore desired.

Patent Document 1: JP 2013-183702 A

Non-Patent Document 1: Yang XS, Wu J, Ziegler TE, Yang X, Zayed A, Rajani MS, Zhou D, Basra AS, Schachtman DP, Peng M, Armstrong CL, Caldo RA, Morrell JA, Lacy M, Staub JM (2011) Gene expression biomarkers provide sensitive indicators of in planta nitrogen status in maize. Plant Physiology, December 2011, Vol. 157, pp. 1841-1852

In view of the above circumstances, an object of the present invention is to obtain a model generation apparatus that, when generating a model for estimating the characteristics of an analysis target, can exhaustively extract the explanatory variables for an objective variable, and an apparatus that automatically estimates the growth status of a crop using the generated model.

(Configuration 1)
A characteristic estimation model generation apparatus that generates, from state variables representing the state of an analysis target and a characteristic variable representing a characteristic of the analysis target, a model that estimates the characteristic variable from the state variables, the apparatus comprising:
a data output unit that receives the state variables and the characteristic variable of the analysis target and outputs them as analysis data;
a regression analysis unit that generates a regression model representing the relationship between an objective variable and explanatory variables by performing, on the analysis data, a regression analysis having a regularization term, with the characteristic variable as the objective variable and the state variables as the explanatory variables;
a characteristic estimation model generation unit that performs cross-validation, using the regression model and the analysis data, up to a preset number of verifications, and generates, as a characteristic estimation model, the regression model having the optimal regularization term; and
an analysis data update unit that generates, as update data, the analysis data from which the data corresponding to the explanatory variables selected in the characteristic estimation model have been excluded, and outputs the update data to the data output unit as the analysis data for the next generation of a characteristic estimation model,
wherein the update of the analysis data and the generation of a characteristic estimation model using the update data are repeated up to a preset number of repetitions.

(Configuration 2)
The characteristic estimation model generation apparatus according to Configuration 1, further comprising a characteristic-related state information extraction unit that, for the regression models generated each time cross-validation is performed in the characteristic estimation model generation unit, extracts the explanatory variables selected in all of the models as characteristic-related state information, which is state information related to the characteristic of the analysis target.

(Configuration 3)
The characteristic estimation model generation apparatus according to Configuration 1 or 2, wherein the regression analysis is LASSO regression, Ridge regression, or Elastic-net.

(Configuration 4)
The characteristic estimation model generation apparatus according to any one of Configurations 1 to 3, wherein the analysis target is a living organism, the state variables are gene expression information, and the characteristic variable is trait information.

(Configuration 5)
The characteristic estimation model generation apparatus according to Configuration 4, wherein the organism is a crop.

(Configuration 6)
The characteristic estimation model generation apparatus according to Configuration 5, wherein the crop is a gramineous crop.

(Configuration 7)
The characteristic estimation model generation apparatus according to Configuration 6, wherein the gramineous crop is rice.

(Configuration 8)
The characteristic estimation model generation apparatus according to any one of Configurations 5 to 7, wherein the trait information is the number of days after transplanting, which represents the number of days elapsed since the transplanting date; the flowering date, which represents the number of days required from the sampling date to flowering; or the nitrogen content, which represents the nitrogen content per dry weight of the observation target.

(Configuration 9)
A characteristic estimation apparatus that estimates the characteristic of the analysis target, comprising a characteristic estimation unit that, when state variables of the analysis target are input to the characteristic estimation model generated by the characteristic estimation model generation apparatus according to any one of Configurations 1 to 8, outputs characteristic estimation information, which is the estimation result for the characteristic of the analysis target.

(Configuration 10)
A characteristic estimation apparatus that estimates the characteristic of the analysis target, comprising a characteristic estimation unit that, when gene expression information of the organism is input to the characteristic estimation model generated by the characteristic estimation model generation apparatus according to any one of Configurations 4 to 8, outputs characteristic estimation information, which is the estimation result for the characteristic of the analysis target,
wherein the gene expression information of the organism is measured from the amount of mRNA transcribed from the genes corresponding to the characteristic-related state information extracted by the characteristic-related state information extraction unit.

(Configuration 11)
The characteristic estimation apparatus according to Configuration 10, wherein the expression level of the mRNA is measured by a quantitative RT-PCR method.

(Configuration 12)
The characteristic estimation apparatus according to Configuration 11, wherein the quantitative RT-PCR method is a real-time RT-PCR method.

(Configuration 13)
The characteristic estimation apparatus according to Configuration 11, wherein the quantitative RT-PCR method is a multiplex RT-PCR method.

(Configuration 14)
The characteristic estimation apparatus according to any one of Configurations 10 to 13, further comprising a state diagnosis unit that diagnoses the state of the organism using the characteristic estimation information.

(Configuration 15)
A characteristic estimation model generation method for generating, from state variables representing the state of an analysis target and a characteristic variable representing a characteristic of the analysis target, a model that estimates the characteristic variable from the state variables, the method comprising:
taking the state variables and the characteristic variable of the analysis target as analysis data;
a regression analysis step of generating a regression model representing the relationship between an objective variable and explanatory variables by performing, on the analysis data, a regression analysis having a regularization term, with the characteristic variable as the objective variable and the state variables as the explanatory variables;
a characteristic estimation model generation step of performing cross-validation, using the regression model and the analysis data, up to a preset number of verifications, and generating, as a characteristic estimation model, the regression model having the optimal regularization term; and
an analysis data update step of generating, as update data, the analysis data from which the data corresponding to the explanatory variables selected in the characteristic estimation model have been excluded, and using the update data as the analysis data for the next generation of a characteristic estimation model,
wherein the update of the analysis data and the generation of a characteristic estimation model using the update data are repeated up to a preset number of repetitions.

(Configuration 16)
The characteristic estimation model generation method according to Configuration 15, further comprising a characteristic-related state information extraction step of extracting, for the regression models generated each time cross-validation is performed in the characteristic estimation model generation step, the explanatory variables selected in all of the models as characteristic-related state information, which is state information related to the characteristic of the analysis target.

(Configuration 17)
The characteristic estimation model generation method according to Configuration 15 or 16, wherein the regression analysis is LASSO regression, Ridge regression, or Elastic-net.

(Configuration 18)
The characteristic estimation model generation method according to any one of Configurations 15 to 17, wherein the analysis target is a living organism, the state variables are gene expression information, and the characteristic variable is trait information.

(Configuration 19)
The characteristic estimation model generation method according to Configuration 18, wherein the organism is a crop.

(Configuration 20)
The characteristic estimation model generation method according to Configuration 19, wherein the crop is a gramineous crop.

(Configuration 21)
The characteristic estimation model generation method according to Configuration 20, wherein the gramineous crop is rice.

(Configuration 22)
The characteristic estimation model generation method according to any one of Configurations 19 to 21, wherein the trait information is the number of days after transplanting, which represents the number of days elapsed since the transplanting date; the flowering date, which represents the number of days required until flowering; or the nitrogen content, which represents the nitrogen content per dry weight of the observation target.

(Configuration 23)
A characteristic estimation method for estimating the characteristic of the analysis target, comprising a characteristic estimation step in which, when state variables of the analysis target are input to the characteristic estimation model generated by the characteristic estimation model generation method according to any one of Configurations 15 to 22, characteristic estimation information, which is the estimation result for the characteristic of the analysis target, is output.

(Configuration 24)
A characteristic estimation method for estimating the characteristic of the analysis target, comprising a characteristic estimation step in which, when gene expression information of the organism is input to the characteristic estimation model generated by the characteristic estimation model generation method according to any one of Configurations 18 to 22, characteristic estimation information, which is the estimation result for the characteristic of the analysis target, is output,
wherein the gene expression information of the organism is measured from the amount of mRNA transcribed from the genes corresponding to the characteristic-related state information extracted in the characteristic-related state information extraction step.

(Configuration 25)
The characteristic estimation method according to Configuration 24, wherein the expression level of the mRNA is measured by a quantitative RT-PCR method.

(Configuration 26)
The characteristic estimation method according to Configuration 25, wherein the quantitative RT-PCR method is a real-time RT-PCR method.

(Configuration 27)
The characteristic estimation method according to Configuration 25, wherein the quantitative RT-PCR method is a multiplex RT-PCR method.

(Configuration 28)
The characteristic estimation method according to any one of Configurations 24 to 27, further comprising a state diagnosis step of diagnosing the state of the organism using the characteristic estimation information.

According to the present invention, it is possible to obtain a characteristic estimation model generation apparatus that can exhaustively extract the state variables related to a characteristic variable of the analysis target. Furthermore, it is possible to obtain an apparatus that automatically estimates the growth status of a crop using the generated model.
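The repeat-and-exclude loop of Configuration 1 is what makes the extraction exhaustive: each round removes the state variables the previous model selected, so the next model has to find others. The following is a minimal sketch of that outer loop only. The names `iterate_models` and `toy_select` are hypothetical, and `toy_select` is a deliberately simple stand-in (it is not LASSO and not the patent's selector) so that the example runs on its own.

```python
from typing import Callable, Dict, List, Sequence, Set

Row = Dict[str, float]  # one sample: state-variable name -> measured value
Selector = Callable[[List[Row], Sequence[float]], Set[str]]


def iterate_models(rows: List[Row], y: Sequence[float],
                   select: Selector, n_repeats: int) -> List[List[str]]:
    """Run up to n_repeats rounds; after each round, drop the selected variables."""
    data = [dict(r) for r in rows]  # work on a copy of the analysis data
    rounds: List[List[str]] = []
    for _ in range(n_repeats):
        if not data or not data[0]:
            break  # no state variables left to analyse
        chosen = select(data, y)
        if not chosen:
            break  # the model selected nothing, so there is nothing to exclude
        rounds.append(sorted(chosen))
        # Update data: the analysis data minus the selected explanatory variables.
        data = [{k: v for k, v in r.items() if k not in chosen} for r in data]
    return rounds


def toy_select(rows: List[Row], y: Sequence[float]) -> Set[str]:
    """Stand-in selector (NOT LASSO): the variable with the largest |covariance| with y."""
    n = len(y)
    y_mean = sum(y) / n

    def abs_cov(name: str) -> float:
        x_mean = sum(r[name] for r in rows) / n
        return abs(sum((r[name] - x_mean) * (yi - y_mean) for r, yi in zip(rows, y)))

    return {max(sorted(rows[0]), key=abs_cov)}
```

In practice `select` would wrap the regularized regression and cross-validation described in the embodiment and return the variables with nonzero coefficients.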

FIG. 1 is a block diagram showing the characteristic estimation model generation apparatus according to the embodiment of the present invention.
FIG. 2 is a flowchart showing the operation of the characteristic estimation model generation apparatus according to the embodiment of the present invention.
FIG. 3 is a block diagram showing the characteristic estimation apparatus according to the embodiment of the present invention.
FIG. 4 is a flowchart showing the operation of the characteristic estimation apparatus according to the embodiment of the present invention.
FIG. 5 is a plot, with the number of days after transplanting on the horizontal axis and the nitrogen content per dry weight on the vertical axis, showing 225 analysis data points and the results of estimating the characteristic with the characteristic estimation model created from the analysis data.
FIG. 6 evaluates, by mean squared error, the accuracy of the characteristic estimation model generated at each repetition, with rice as the analysis target and with 497, 428, and 224 samples collected from actual fields used as the analysis data for the number of days after transplanting, the flowering date, and the nitrogen content, respectively.
FIG. 7 shows the number of characteristic variables recorded at each repetition when the number of repetitions is 10 and the other conditions are the same as in FIG. 6.
FIG. 8 shows, for the number of days after transplanting, how many times the same explanatory variable was selected when cross-validation was repeated 10 times.
FIG. 9 shows, for the flowering date, how many times the same explanatory variable was selected when cross-validation was repeated 10 times.
FIG. 10 shows, for the nitrogen content, how many times the same explanatory variable was selected when cross-validation was repeated 10 times.

Embodiment
FIG. 1 is a block diagram showing the configuration of a characteristic estimation model generation apparatus according to an embodiment of the present invention. The configuration of the apparatus is described below with reference to the drawings.

In the present embodiment, the analysis target is rice.
The state variables are rice gene expression information.
The characteristic variables represent trait information, that is, information expressing the properties and features of rice. Specifically, they are the leaf nitrogen content per dry weight of rice leaves, the flowering date (the number of days from the sampling date until the ears emerge), and the number of days after rice planting (the number of days elapsed since planting).

In FIG. 1, the characteristic estimation model generation apparatus 100 comprises a data output unit 110, a regression analysis unit 120, a characteristic estimation model generation unit 130, a characteristic-related state information extraction unit 140, and an analysis data update unit 150. It generates a characteristic variable estimation model of the analysis target from the target's characteristic variables and gene expression information.

The data output unit 110 receives, via an interface or the like (not shown), the characteristic variables and state variables of the analysis target as the data for creating a characteristic estimation model, and outputs them to the regression analysis unit 120 as analysis data.

The regression analysis unit 120 receives the analysis data from the data output unit 110 and performs a regression analysis having a regularization term on it. Specifically, it generates a regression model having a regularization term, taking the characteristic variable in the analysis data as the objective variable (dependent variable) and each of the state variables as an explanatory variable.

The regression analysis used in the regression analysis unit 120 is described below.

In the present embodiment, the regression analysis unit 120 uses LASSO (Least Absolute Shrinkage and Selection Operator) regression. The formula for the regression model obtained by LASSO regression is given by Equation 1 below.

Here, y is the objective variable, x is an explanatory variable, β is a regression coefficient, and γ is the regularization parameter.
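Equation 1 itself is not reproduced in this text. Assuming it is the standard LASSO objective, and using the symbols defined above together with an index i over the N samples, an index j over the p explanatory variables, and an intercept β_0 (these three are assumptions, not taken from the source), the quantity S that is minimized would read:

```latex
S(\beta) \;=\; \sum_{i=1}^{N} \Bigl( y_i - \beta_0 - \sum_{j=1}^{p} \beta_j x_{ij} \Bigr)^{2}
\;+\; \gamma \sum_{j=1}^{p} \lvert \beta_j \rvert
```

The patent's own Equation 1 may scale either term by a constant. Such a rescaling changes the numerical value of γ that corresponds to a given model, but not the family of models that can be obtained.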

A regression model is generated by substituting the input analysis data into Equation 1 and calculating the coefficients β that minimize S; that is, by solving the minimization problem for S. This minimization problem can be solved as a convex optimization problem in S, which yields a unique solution.
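One common way to solve this kind of minimization is cyclic coordinate descent with soft-thresholding. The sketch below assumes the standard LASSO objective S = Σ(y_i − β_0 − Σ_j β_j x_ij)² + γ Σ_j |β_j| (the patent's Equation 1 is not reproduced here and may be scaled differently); the function names and the fixed number of sweeps are illustrative choices, not the patent's procedure.

```python
from typing import List, Sequence, Tuple


def soft_threshold(value: float, threshold: float) -> float:
    """Shrink value toward zero by threshold; values inside the band become 0.0."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_fit(X: Sequence[Sequence[float]], y: Sequence[float],
              gamma: float, n_sweeps: int = 100) -> Tuple[float, List[float]]:
    """Return (intercept, coefficients) after a fixed number of coordinate sweeps."""
    n, p = len(X), len(X[0])
    beta0 = sum(y) / n
    beta = [0.0] * p
    for _ in range(n_sweeps):
        for j in range(p):
            rho = 0.0  # correlation of x_j with the residual that ignores x_j
            z = 0.0    # sum of squares of x_j
            for row, yi in zip(X, y):
                others = sum(beta[k] * row[k] for k in range(p) if k != j)
                rho += row[j] * (yi - beta0 - others)
                z += row[j] * row[j]
            # Setting dS/d(beta_j) = 0 gives beta_j = soft(rho, gamma / 2) / z.
            beta[j] = soft_threshold(rho, gamma / 2.0) / z if z > 0.0 else 0.0
        # The intercept is not penalized: it is the mean of the remaining residual.
        beta0 = sum(yi - sum(b * x for b, x in zip(beta, row))
                    for row, yi in zip(X, y)) / n
    return beta0, beta
```

The soft-threshold step is what sets some coefficients exactly to zero, which is why the nonzero coefficients can later be read as the "selected" explanatory variables. A production implementation would normally stop on a convergence tolerance instead of a fixed sweep count.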

The optimal solution obtained is a regression model, expressed as a function of γ, with the characteristic variable as the objective variable and the state variables as the explanatory variables.

The characteristic estimation model generation unit 130 determines the optimal value of γ for the regression model generated by the regression analysis unit 120 and generates the characteristic estimation model, which is a relational expression between a given characteristic variable and the state variables.

The method by which the characteristic estimation model generation unit 130 generates the characteristic estimation model is described below.

The characteristic estimation model generation unit 130 performs cross-validation to determine the optimal value of γ for the regression model generated by the regression analysis unit 120.

In the present embodiment, k-fold cross-validation is used as the cross-validation method. In k-fold cross-validation, the value of k, which represents the number of verifications, is set in advance. All of the analysis data is then divided into k sample groups; one group serves as the test data and the remaining k-1 groups serve as the training data. In other words, k combinations of test data and training data are created from the analysis data.
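The division into k combinations of training and test data can be sketched as follows. The function name `k_fold_splits` and the interleaved assignment of samples to groups are assumptions made for illustration; the source does not say how samples are assigned to groups, and implementations often shuffle first.

```python
from typing import List, Tuple


def k_fold_splits(n_samples: int, k: int) -> List[Tuple[List[int], List[int]]]:
    """Return k (train_indices, test_indices) pairs covering every sample once as test."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and the number of samples")
    # Group g holds samples g, g + k, g + 2k, ... (an illustrative assignment).
    groups = [list(range(start, n_samples, k)) for start in range(k)]
    splits = []
    for test in groups:
        held_out = set(test)
        train = [i for i in range(n_samples) if i not in held_out]
        splits.append((train, test))
    return splits
```

Each sample appears in exactly one test group, so every observation is used for validation once and for training k-1 times.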

そして、１つの訓練データを数１に代入し、γの関数としての回帰モデルを求め、記録する。そのモデルに対してテストデータを当てはめ、γの値を逐次的に代入し、目的変数についての平均二乗誤差を求める。これをｋ通りのテストデータおよび訓練データの組み合わせについて繰り返す。 Then, one training data is substituted into Equation 1, and a regression model as a function of γ is obtained and recorded. Test data is applied to the model, and the value of γ is sequentially substituted to find the mean square error for the objective variable. This is repeated for k combinations of test data and training data.

このようにすることで、あるγの値において、平均二乗誤差がｋ通り計算される。これらを加算平均したものを、あるγにおける平均二乗誤差の値とする。更に、あるγにおける標準誤差を算出する。 Thus, k mean square errors are calculated for a certain value of γ. The average of these values is taken as the value of the mean square error at a certain γ. Further, a standard error at a certain γ is calculated.

Then the minimum of the mean squared errors calculated in this way, and the γ at which it occurs, are found. Next, the optimal value of γ is selected according to the one-standard-error rule: the largest γ whose mean squared error lies within one standard error of that minimum, the standard error being the one at the minimizing γ, is selected as the optimal value.
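A minimal sketch of this selection step is given below. The function name `select_gamma_one_se`, the data layout, and the numbers are illustrative assumptions; the standard error is taken as the sample standard deviation of the k errors divided by √k, which the source does not specify.

```python
from math import sqrt
from statistics import mean, stdev


def select_gamma_one_se(gammas, fold_mse):
    """Pick gamma by the one-standard-error rule.

    gammas:   candidate gamma values
    fold_mse: fold_mse[g][i] is the test MSE of fold i at gammas[g]
    """
    means = [mean(errors) for errors in fold_mse]
    ses = [stdev(errors) / sqrt(len(errors)) for errors in fold_mse]
    # Index of the gamma with the smallest cross-validated error.
    best = min(range(len(gammas)), key=lambda g: means[g])
    threshold = means[best] + ses[best]
    # Largest gamma whose mean error is within one standard error of the minimum.
    return max(gammas[g] for g in range(len(gammas)) if means[g] <= threshold)


# Made-up errors for three candidate values and three folds.
gammas = [0.01, 0.1, 1.0]
fold_mse = [[1.0, 1.4, 1.2], [1.1, 1.3, 1.5], [2.0, 2.2, 2.1]]
chosen = select_gamma_one_se(gammas, fold_mse)
```

In this toy case the minimum error occurs at 0.01, but 0.1 is chosen because its error is within one standard error of the minimum. Preferring the larger γ gives a more strongly regularized, and for LASSO a sparser, model.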

Finally, a regression model is obtained according to Equation 1 from the optimal value of γ found in this way and the analysis data; this model is the characteristic estimation model, and it is recorded.

In this embodiment, k = 10. The value of k may be set in advance or may be input through an interface (not shown).

The characteristic-related state information extraction unit 140 uses the plurality of regression models generated by the characteristic estimation model generation unit 130 to extract and record the state variables related to a given characteristic.

First, the regression model that the characteristic estimation model generation unit 130 calculates by substituting the training data into Equation 1 is recorded each time a round of cross-validation is performed, so that k regression models are recorded.
The k recorded regression models are then compared. State variables selected in all k models, that is, explanatory variables for which β ≠ 0, are extracted as likely to be related to the characteristic.
The state variables extracted in this way are recorded as characteristic-related state information.
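The comparison of the k models amounts to a set intersection, which can be sketched as follows. The function name `stable_variables`, the dictionary representation of a model, and the gene names are hypothetical.

```python
def stable_variables(fold_coefficients):
    """Return the variables whose coefficient is nonzero in every fold model.

    fold_coefficients: one dict per fold, mapping variable name -> coefficient.
    """
    selected_per_fold = [
        {name for name, beta in coefs.items() if beta != 0.0}
        for coefs in fold_coefficients
    ]
    if not selected_per_fold:
        return set()
    return set.intersection(*selected_per_fold)


# Three illustrative fold models (k = 3 here only to keep the example short).
fold_models = [
    {"geneA": 0.8, "geneB": 0.0, "geneC": -0.3},
    {"geneA": 0.7, "geneB": 0.1, "geneC": -0.2},
    {"geneA": 0.9, "geneB": 0.0, "geneC": 0.0},
]
related = stable_variables(fold_models)
```

Only `geneA` is selected in all three models, so it alone would be recorded as characteristic-related state information in this example.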

The analysis data update unit 150 generates update data, which is the analysis data to be used in the next round of characteristic estimation model generation. It outputs the update data to the data output unit 110, thereby updating the analysis data.

To create the update data, the data for every explanatory variable whose coefficient β is nonzero in the characteristic estimation model generated by the characteristic estimation model generation unit 130 is excluded from the analysis data.

In other words, the next round of characteristic estimation model generation does not use the state variables that were used in the previous round.

Generating a new characteristic estimation model from the update data created in this way yields a model built only from state variables that were not used in the previous characteristic estimation model.

Furthermore, by repeating the update of the analysis data and the generation of a characteristic estimation model up to a preset number of repetitions, a plurality of characteristic estimation models is generated that together make exhaustive use of the state variables related to a given characteristic variable.
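The repeat-and-exclude loop can be sketched as below. Everything here is an assumption made for illustration: `iterate_models` is a hypothetical name, the model-fitting routine is passed in as `fit_model`, and `toy_fit` is a deliberately trivial stand-in for the regularized regression with cross-validation (it keeps a single variable) so that the example runs without any statistics library.

```python
def iterate_models(columns, target, fit_model, n_repeats):
    """Fit up to n_repeats models, removing the selected variables each time.

    columns:   dict mapping variable name -> list of values
    fit_model: callable(columns, target) -> dict name -> coefficient
    """
    remaining = dict(columns)
    models = []
    for _ in range(n_repeats):
        if not remaining:
            break  # every variable has already been used
        coefs = fit_model(remaining, target)
        models.append(coefs)
        used = {name for name, beta in coefs.items() if beta != 0.0}
        if not used:
            break  # an all-zero model would never shrink the data
        # Update data: drop the variables the current model selected.
        remaining = {n: v for n, v in remaining.items() if n not in used}
    return models


def toy_fit(columns, target):
    """Stand-in for a LASSO fit: keep only the variable with the largest |x.y|."""
    score = {n: abs(sum(x * y for x, y in zip(v, target))) for n, v in columns.items()}
    best = max(score, key=score.get)
    return {n: (1.0 if n == best else 0.0) for n in columns}


columns = {"g1": [1, 2, 3], "g2": [1, 0, 1], "g3": [3, 2, 1]}
target = [1, 2, 3]
models = iterate_models(columns, target, toy_fit, n_repeats=5)
selected_order = [max(m, key=m.get) for m in models]
```

With these toy inputs the loop stops after three models because no variables remain, even though five repetitions were requested.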

In this embodiment, the number of repetitions is 5. The number of repetitions may be set in advance or may be input through an interface or the like (not shown).

FIG. 2 is a flowchart showing the general operation of the characteristic estimation model generation apparatus 100 according to this embodiment. The apparatus operates as follows.

First, the characteristic variable, the state variables, the number of verifications, and the number of repetitions are input through an input unit (not shown), and the characteristic variable and state variables are taken as the analysis data (S201).

Next, the analysis data is divided into k sample groups; the i-th sample group is used as test data and the remaining sample groups as training data. The initial value of i is 1. The division is made so that the sample groups contain equal numbers of samples (S202).

Using the training data, the regression analysis unit 120 obtains a regression model, which is recorded together with the values of i and j, where j is the counter for the number of repetitions (S203).

Next, the characteristic estimation model generation unit 130 inputs the test data into the regression model obtained in S203 and calculates the mean squared error and standard error at each γ (S204).

The value of i is then checked. If i is not equal to the number of verifications (NO in S205), i is incremented by 1 and processing returns to S202. If i is equal to the number of verifications (YES in S205), processing proceeds to S206.

When i equals the number of verifications in S205, the characteristic estimation model generation unit 130 averages the mean squared errors at each value of γ and determines the optimal value of γ according to the one-standard-error rule (S206).

Next, the characteristic estimation model generation unit 130 has the regression analysis unit 120 obtain the characteristic estimation model from the optimal value of γ determined in S206 and all of the analysis data, and records it (S207).

Of the regression models recorded in S203, all those recorded with the current value of j are compared. Explanatory variables whose coefficients are nonzero in every compared model are extracted and recorded as characteristic-related state information (S208).
Because the characteristic-related state information recorded here was selected in all of these models as state variables describing the characteristic variable, it is likely to consist of state variables related to that characteristic variable.

After the characteristic-related state information has been recorded in S208, the analysis data update unit 150 generates update data by excluding from the analysis data the data for the explanatory variables whose coefficients are nonzero in the characteristic estimation model recorded in S207 (S209).

The value of j is then checked. If j is not equal to the number of repetitions (NO in S210), j is incremented by 1, the analysis data is replaced with the update data generated in S209, i is reset to its initial value (i = 1), and processing returns to S202. If j is equal to the number of repetitions (YES in S210), the operation ends.

By updating the analysis data and repeating the generation of the characteristic estimation model and the extraction of characteristic-related state information as described above, a plurality of models for estimating the characteristic can be generated while the characteristic-related state information for a given characteristic variable is extracted exhaustively.

FIG. 3 is a block diagram showing the configuration of the characteristic estimation apparatus according to the embodiment of the present invention. The configuration is described below with reference to the drawing.

In FIG. 3, the characteristic estimation apparatus 300 includes a characteristic estimation unit 310 and a state diagnosis unit 320, and outputs characteristic estimation information and a state diagnosis result from the input state variables.

The characteristic estimation unit 310 stores the estimation models for various characteristics generated by the characteristic estimation model generation apparatus 100 and receives the state information of the analysis target. Substituting the input state information into the stored characteristic estimation models yields estimation information for the various characteristics. Here, estimation information means the estimated value output by the characteristic estimation model itself.
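For a linear regression model such as one produced by LASSO, the substitution reduces to an intercept plus a weighted sum of the measured values. The sketch below assumes that form; the function name `estimate_characteristic`, the gene names, and all numbers are hypothetical.

```python
def estimate_characteristic(intercept, coefficients, expression):
    """Evaluate a linear estimation model on measured expression levels.

    coefficients: dict mapping gene name -> coefficient (zero entries are skipped)
    expression:   dict mapping gene name -> measured expression level
    """
    return intercept + sum(
        beta * expression[gene]
        for gene, beta in coefficients.items()
        if beta != 0.0
    )


# Illustrative model with two selected genes and one unselected gene.
model = {"geneA": 2.0, "geneB": 0.0, "geneC": -1.0}
estimate = estimate_characteristic(10.0, model, {"geneA": 3.0, "geneC": 4.0})
```

Because variables with a zero coefficient are skipped, only the genes actually selected by the model need to be measured.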

The estimation information may be displayed on a display unit (not shown) or the like, either as the raw output value or, with a threshold defined, as a qualitative expression such as higher or lower than the threshold. All of the estimation information may be output, or the user may specify the characteristics to be estimated through an input unit (not shown) when the characteristic estimation apparatus 300 starts operating.

The state information used by the characteristic estimation apparatus 300 is described next.

In this embodiment, the state information is rice gene expression information, specifically the expression levels of the mRNAs corresponding to the genes of rice. Rice is estimated to have more than 32,000 genes in total, so the state information in this embodiment represents the mRNA expression levels corresponding to all of these genes.
However, measuring and inputting the mRNA expression levels for all 32,000 genes is impractical when the growth status of rice is to be determined quickly, in terms of both labor and the computational cost of the estimation. Moreover, not every mRNA expression level carries information relevant to growth diagnosis.
For this reason, the characteristic estimation apparatus 300 uses as its state variables those extracted by the characteristic-related state information extraction unit 140.

The gene expression levels of the rice whose characteristics are to be estimated can be measured by quantitative RT-PCR.

As the quantitative RT-PCR method, any technique may be used, including ordinary RT-PCR, real-time RT-PCR, and multiplex RT-PCR. Gene amplification techniques other than RT-PCR, such as the LAMP method, may also be used.

The state diagnosis unit 320 receives the characteristic estimation information output by the characteristic estimation unit 310, diagnoses the current state of the analysis target on the basis of that information, and outputs a state diagnosis result.

The state diagnosis result expresses the growth status of the rice using characteristic estimation information such as the days after transplanting (the number of days since the rice was planted), the flowering date (the number of days required from the sampling date to flowering), or the nitrogen content (the nitrogen content per dry weight of the observed sample).
In addition to the growth status, ideal growth data may be input in advance, the deviation from the ideal growth data calculated, and guidance based on the degree of deviation, such as when and how much fertilizer to apply, shown on a display unit (not shown).
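One way the deviation from ideal growth data might be computed is sketched below. The source does not describe how the deviation is measured or how it is turned into guidance, so the function names, the per-characteristic tolerances, and all values are assumptions made purely for illustration.

```python
def growth_deviation(estimated, ideal):
    """Return estimated minus ideal for every characteristic present in both."""
    return {name: estimated[name] - ideal[name] for name in estimated if name in ideal}


def needs_attention(deviation, tolerance):
    """List characteristics whose absolute deviation exceeds a user-set tolerance."""
    return sorted(name for name, d in deviation.items() if abs(d) > tolerance[name])


# Made-up estimates, ideal values, and tolerances.
estimated = {"nitrogen_content": 2.5, "days_after_transplanting": 40.0}
ideal = {"nitrogen_content": 3.5, "days_after_transplanting": 41.0}
tolerance = {"nitrogen_content": 0.5, "days_after_transplanting": 3.0}

deviation = growth_deviation(estimated, ideal)
flagged = needs_attention(deviation, tolerance)
```

Here only the nitrogen content falls outside its tolerance, which is the kind of result that could prompt fertilizer guidance on the display unit.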

FIG. 4 is a flowchart showing the general operation of the characteristic estimation apparatus 300 according to this embodiment. The apparatus operates as follows.

First, the rice whose characteristics are to be estimated is prepared, and the expression levels of the genes extracted by the characteristic-related state information extraction unit 140 are measured in that rice (S401).

Next, the measurement results are substituted into the characteristic estimation model, generated by the characteristic estimation model generation apparatus 100, for the characteristic to be estimated (S402).

The estimate produced by the characteristic estimation model is output as characteristic estimation information (S403).

The growth status of the rice is diagnosed from the estimate produced by the characteristic estimation model, and the diagnosis is output (S404).

As described above, this embodiment provides the following effects.

In the characteristic estimation model generation apparatus 100, the characteristic estimation model generation unit 130 is configured to generate, from the characteristic variable and state variables of the analysis target, a characteristic estimation model in which the characteristic variable is the dependent variable and the state variables are the explanatory variables. A model that quantitatively estimates any given characteristic variable can therefore be generated.

Furthermore, in the characteristic estimation model generation apparatus 100, the analysis data update unit 150 generates, as update data, the analysis data with the gene information used for characteristic estimation model generation excluded. Generation of the characteristic estimation model is repeated with the update data up to a preset number of repetitions, and each time a model is generated the characteristic-related state information extraction unit 140 extracts and records the genes related to the characteristic. Gene information related to any given characteristic variable can therefore be extracted exhaustively.

In the characteristic estimation apparatus 300, the characteristic estimation unit 310 quantitatively estimates the characteristic variable from the input state variables using the model generated by the characteristic estimation model generation apparatus 100, and the state diagnosis unit 320 diagnoses the growth status of the crop using the estimated characteristic variable. With this configuration, the growth status of the analysis target can be estimated automatically without the involvement of a specialist. Automatic estimation of the growth status in turn enables more advanced cultivation management through growth diagnosis and soil diagnosis.

Although rice has been described as the analysis target in this embodiment, the present invention is not limited to rice; any analysis target for which an estimation model can be generated by regression analysis with a regularization term may be used.

Likewise, the characteristic variables described were the leaf nitrogen content per dry weight, the flowering date (how many days after planting the ears emerge), and the days after planting (how many days have passed since planting). Other characteristic variables, such as leaf phosphorus content or ear size and weight, may be used, as may any characteristic variable of an analysis target for which an estimation model can be generated by regression analysis with a regularization term.

The characteristic estimation model generation apparatus 100 and the characteristic estimation apparatus 300 of this embodiment need only be capable of numerical computation, numerical input, and recording and output of results. Specifically, a computer having a CPU, memory, a display unit, an input/output interface, and the like, or dedicated hardware, can be used.

Although LASSO regression is used as the regression analysis technique in the regression analysis unit 120 in this embodiment, any regression analysis technique with a regularization term, such as Ridge regression or Elastic-Net, may be used. Similarly, although k-fold cross-validation is used in the characteristic estimation model generation unit 130 and the characteristic-related state information extraction unit 140 in this embodiment, other cross-validation methods, such as leave-one-out cross-validation, may be used.

In this embodiment, model creation by LASSO regression and determination of the optimal γ were carried out with the statistical analysis software R.

FIG. 5 shows how well the model generated by the characteristic estimation model generation apparatus 100 of this embodiment reproduces the data when the analysis target is rice and the analysis data consists of 224 samples collected from actual fields. The number of verifications is 10 and the number of repetitions is 1.
The upper panel of FIG. 5 plots the 224 analysis samples with days after transplanting on the horizontal axis and nitrogen content per dry weight on the vertical axis. The marker type indicates where the rice was collected.
Characteristic estimation models are then created by the characteristic estimation model generation apparatus 100 of this embodiment, with the gene expression information in the analysis data as the state variables and the days after transplanting and the nitrogen content per dry weight as the characteristic variables.
The lower panel of FIG. 5 plots the results of substituting the analysis data into the characteristic estimation models for days after transplanting and nitrogen content per dry weight created in this way.

FIG. 6 evaluates the accuracy of the characteristic estimation models generated by the characteristic estimation model generation apparatus 100 of this embodiment, with rice as the analysis target, using 497, 428, and 224 samples collected from actual fields as the analysis data for days after transplanting, days to flowering, and nitrogen content, respectively. Accuracy is evaluated as the mean squared error against the samples. The number of verifications is 10 and the number of repetitions is 5. The labels 1stModel, 2ndModel, ... in FIG. 6 denote the evaluations of the models generated at each repetition.

FIG. 7 shows the number of state variables, that is, genes, recorded by the characteristic-related state information extraction unit 140 at each repetition when the number of repetitions is 10 and all other conditions are the same as in FIG. 6.
Taking nitrogen content as an example, the genes likely to be involved in nitrogen content have been narrowed down to 582 out of the 32,000 genes said to constitute rice.

FIGS. 8, 9, and 10 show the stability distributions for days after transplanting, flowering date, and nitrogen content per dry weight, respectively, with rice as the analysis target.
The vertical axis of each stability distribution shows, for the explanatory-variable extraction performed by the characteristic-related state information extraction unit 140, the number of times an explanatory variable was selected across the models generated over all verifications, together with the number of explanatory variables selected that many times.
The horizontal axis is the repetition number.
As the repetition number increases, the number of explanatory variables selected in all of the models (one per verification) decreases.
In FIGS. 8, 9, and 10 the number of verifications is 10 in every case. The numbers of repetitions are 15, 25, and 15, respectively, and the analysis data consist of 939, 819, and 302 rice samples collected from actual fields, respectively.
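The counts behind one column of such a stability distribution can be sketched as follows, assuming each fold model is available as a mapping from variable name to coefficient. The function name `selection_histogram` and the example models are hypothetical.

```python
from collections import Counter


def selection_histogram(fold_coefficients):
    """Map 'number of fold models that selected a variable' -> 'number of such variables'."""
    times_selected = Counter(
        name
        for coefs in fold_coefficients
        for name, beta in coefs.items()
        if beta != 0.0
    )
    return dict(Counter(times_selected.values()))


# Three illustrative fold models from a single repetition.
fold_models = [
    {"geneA": 0.8, "geneB": 0.0, "geneC": -0.3},
    {"geneA": 0.7, "geneB": 0.1, "geneC": -0.2},
    {"geneA": 0.9, "geneB": 0.0, "geneC": 0.0},
]
histogram = selection_histogram(fold_models)
```

In this example one variable is selected in all three models, one in two, and one in a single model; the entry for the full number of verifications corresponds to the variables recorded as characteristic-related state information.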

The present invention has been described above with reference to the embodiment, but it is not limited to that embodiment. Various modifications that a person skilled in the art can understand may be made to the configuration and operation of the present invention without departing from its spirit.

DESCRIPTION OF SYMBOLS
100 ... Characteristic estimation model generation apparatus
110 ... Data output unit
120 ... Regression analysis unit
130 ... Characteristic estimation model generation unit
140 ... Characteristic-related state information extraction unit
150 ... Analysis data update unit
300 ... Characteristic estimation apparatus
310 ... Characteristic estimation unit
320 ... State diagnosis unit

Claims

A characteristic estimation model generation apparatus that generates, from state variables representing the state of an analysis target and a characteristic variable representing a characteristic of the analysis target, a model for estimating the characteristic variable from the state variables, the apparatus comprising:
a data output unit that receives the state variables of the analysis target and the characteristic variable of the analysis target and outputs them as analysis data;
a regression analysis unit that performs, on the analysis data, a regression analysis having a regularization term with the characteristic variable as the objective variable and the state variables as the explanatory variables, thereby generating a regression model representing the relationship between the objective variable and the explanatory variables;
a characteristic estimation model generation unit that performs cross-validation up to a preset number of verifications using the regression model and the analysis data, and generates, as a characteristic estimation model, the regression model having the optimal regularization term; and
an analysis data update unit that generates, as update data, the analysis data from which the data corresponding to the explanatory variables selected in the characteristic estimation model has been excluded, and outputs the update data to the data output unit as the analysis data for the next generation of a characteristic estimation model,
wherein the updating of the analysis data and the generation of a characteristic estimation model using the update data are repeated up to a preset number of repetitions.

The characteristic estimation model generation apparatus according to claim 1, further comprising a characteristic-related state information extraction unit that, for the regression models generated each time cross-validation is performed in the characteristic estimation model generation unit, extracts the explanatory variables selected in all of the models as characteristic-related state information, which is state information related to the characteristic of the analysis target.

The characteristic estimation model generation device according to claim 1, wherein the regression analysis is LASSO regression, Ridge regression, or Elastic-net.

The characteristic estimation model generation apparatus according to claim 1, wherein the analysis target is a living organism, the state variable is gene expression information, and the characteristic variable is trait information.

The characteristic estimation model generation device according to claim 4, wherein the organism is a crop.

The characteristic estimation model generation device according to claim 5, wherein the crop is a gramineous crop.

The characteristic estimation model generation apparatus according to claim 6, wherein the gramineous crop is rice.

The trait information is the number of days after the date of transplantation representing the number of days from the date of transplantation, the date of flowering representing the number of days required for flowering, or the nitrogen content representing the nitrogen content per dry weight of the observation target. The characteristic estimation model generation device described in 1.

A characteristic estimation apparatus for estimating a characteristic of an analysis target, comprising a characteristic estimation unit that inputs the state variables of the analysis target into the characteristic estimation model generated by the characteristic estimation model generation apparatus according to claim 1 and outputs characteristic estimation information, which is the estimation result for the characteristic of the analysis target.

A characteristic estimation apparatus for estimating a characteristic of an analysis target, comprising a characteristic estimation unit that inputs gene expression information of the organism into the characteristic estimation model generated by the characteristic estimation model generation apparatus according to claim 4 and outputs characteristic estimation information, which is the estimation result for the characteristic of the analysis target,
wherein the gene expression information of the organism is measured from the amount of mRNA transcribed from the genes corresponding to the characteristic-related state information extracted by the characteristic-related state information extraction unit.

The characteristic estimation apparatus according to claim 10, wherein the expression level of the mRNA is measured by a quantitative RT-PCR method.

The characteristic estimation apparatus according to claim 11, wherein the quantitative RT-PCR method is a real-time RT-PCR method.

The characteristic estimation apparatus according to claim 11, wherein the quantitative RT-PCR method is a multiplex RT-PCR method.

The characteristic estimation apparatus according to claim 10, further comprising a state diagnosis unit that diagnoses the state of the living organism using the characteristic estimation information.

A characteristic estimation model generation method in which a computer generates, from a state variable representing the state of an analysis target and a characteristic variable representing the characteristic that the analysis target exhibits in that state, a model for estimating the characteristic variable from the state variable, the method comprising:
A regression analysis step in which the computer performs, on analysis data comprising the state variables and the characteristic variables of the analysis target, a regression analysis having a regularization term with the characteristic variable as the objective variable and the state variable as the explanatory variable, and generates a regression model representing the relationship between the objective variable and the explanatory variable;
A characteristic estimation model generation step in which the computer performs cross-validation up to a preset number of verifications using the regression model and the analysis data, and generates, as a characteristic estimation model, the model having the optimal regularization term among the regression models; and
An analysis data update step in which the computer generates, as update data, data obtained by excluding from the analysis data the data corresponding to the explanatory variables selected in the characteristic estimation model, and sets the update data as the analysis data for the next generation of the characteristic estimation model,
wherein
the computer repeats the updating of the analysis data and the generation of a characteristic estimation model using the update data up to a preset number of repetitions.
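
The loop recited in this claim (regularized regression, cross-validated choice of the regularization term, exclusion of the selected explanatory variables, repetition) can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the patent: it uses LASSO fitted by plain coordinate descent without an intercept, k-fold cross-validation over a small grid of regularization strengths, and synthetic data. The names `lasso_cd`, `cv_error`, `iterative_models`, the variable names `g0` to `g5`, and all numeric settings are hypothetical.

```python
import random

def soft(z, t):
    """Soft-thresholding operator used by the LASSO coordinate update."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0

def lasso_cd(X, y, lam, n_iter=100):
    """Coordinate descent for (1/2n)*||y - X b||^2 + lam*||b||_1 (no intercept)."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # current residual y - X beta
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            rho = sum(X[i][j] * (resid[i] + X[i][j] * beta[j]) for i in range(n)) / n
            new = soft(rho, lam) / col_sq[j]
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new
    return beta

def cv_error(X, y, lam, k):
    """Mean squared prediction error of LASSO under k-fold cross-validation."""
    n = len(X)
    err = 0.0
    for f in range(k):
        train = [i for i in range(n) if i % k != f]
        test = [i for i in range(n) if i % k == f]
        beta = lasso_cd([X[i] for i in train], [y[i] for i in train], lam)
        err += sum((y[i] - sum(b * x for b, x in zip(beta, X[i]))) ** 2 for i in test)
    return err / n

def iterative_models(X, y, names, lams, k=5, n_rounds=3):
    """Fit a model, drop the variables it selected, and repeat up to n_rounds times."""
    models = []
    cols = list(range(len(names)))  # columns still present in the analysis data
    for _ in range(n_rounds):
        if not cols:
            break
        Xc = [[row[j] for j in cols] for row in X]
        best = min(lams, key=lambda lam: cv_error(Xc, y, lam, k))
        beta = lasso_cd(Xc, y, best)
        selected = {names[j]: b for j, b in zip(cols, beta) if b != 0.0}
        if not selected:
            break
        models.append({"lambda": best, "coef": selected})
        cols = [j for j in cols if names[j] not in selected]  # update the analysis data
    return models

# Hypothetical data: g1 is a noisy copy of g0, and y depends on g0 only.
rng = random.Random(0)
names = ["g0", "g1", "g2", "g3", "g4", "g5"]
X = []
for _ in range(40):
    base = rng.gauss(0, 1)
    X.append([base, base + rng.gauss(0, 0.1)] + [rng.gauss(0, 1) for _ in range(4)])
y = [2.0 * row[0] + rng.gauss(0, 0.1) for row in X]

models = iterative_models(X, y, names, lams=[0.01, 0.1, 0.5])
for round_no, m in enumerate(models, 1):
    print(round_no, m["lambda"], sorted(m["coef"]))
```

Because each round removes the variables chosen in the previous round, the variable sets of successive models are disjoint, which is how the repetition can surface explanatory variables that a single regularized fit would leave out.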

The characteristic estimation model generation method according to claim 15, further comprising a characteristic-related state information extraction step in which the computer extracts, from the regression models generated for each cross-validation run in the characteristic estimation model generation step, the explanatory variables selected in all of the models as characteristic-related state information, which is state information relating to the characteristic of the analysis target.
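
The extraction recited in this claim amounts to intersecting the sets of explanatory variables selected by the individual cross-validation models. A minimal sketch, assuming each model is available as a list of coefficients in a fixed variable order; the function name `extract_common_variables`, the gene names, and the coefficient values are hypothetical.

```python
def extract_common_variables(fold_coefs, names, tol=0.0):
    """Return the names whose coefficient is non-zero in every per-fold model."""
    supports = [
        {name for name, b in zip(names, coefs) if abs(b) > tol}
        for coefs in fold_coefs
    ]
    if not supports:
        return []
    common = set.intersection(*supports)
    # Preserve the original variable order in the result.
    return [name for name in names if name in common]

# Hypothetical coefficients from three cross-validation models.
names = ["gene_a", "gene_b", "gene_c", "gene_d"]
fold_coefs = [
    [0.8, 0.0, -0.3, 0.1],
    [0.7, 0.2, -0.4, 0.0],
    [0.9, 0.0, -0.2, 0.0],
]
related = extract_common_variables(fold_coefs, names)
print(related)  # ['gene_a', 'gene_c']
```

Only `gene_a` and `gene_c` have a non-zero coefficient in all three models, so only they are returned; `gene_b` and `gene_d` are each selected in a single model and are dropped.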

The characteristic estimation model generation method according to claim 15 or 16, wherein the regression analysis is LASSO regression, Ridge regression, or Elastic-net.
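
For reference, the three regression analyses named in this claim differ only in the regularization term added to the squared-error loss. The symbols below are assumptions introduced for illustration and do not appear in the claims: y is the vector of characteristic variables, X the matrix of state variables, β the coefficient vector, n the number of samples, λ ≥ 0 the regularization strength, and α ∈ [0, 1] the elastic-net mixing parameter. The elastic-net form shown is one common parametrization.

```latex
\begin{aligned}
\text{LASSO:} \quad & \hat{\beta} = \arg\min_{\beta}\; \frac{1}{2n}\lVert y - X\beta \rVert_2^2 + \lambda \lVert \beta \rVert_1 \\
\text{Ridge:} \quad & \hat{\beta} = \arg\min_{\beta}\; \frac{1}{2n}\lVert y - X\beta \rVert_2^2 + \lambda \lVert \beta \rVert_2^2 \\
\text{Elastic-net:} \quad & \hat{\beta} = \arg\min_{\beta}\; \frac{1}{2n}\lVert y - X\beta \rVert_2^2 + \lambda \left( \alpha \lVert \beta \rVert_1 + \frac{1-\alpha}{2} \lVert \beta \rVert_2^2 \right)
\end{aligned}
```

The L1 term can set coefficients exactly to zero, so LASSO and Elastic-net (with α > 0) select a subset of explanatory variables, whereas Ridge only shrinks coefficients toward zero.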

The characteristic estimation model generation method according to claim 15, wherein the analysis target is an organism, the state variable is gene expression information, and the characteristic variable is trait information.

The characteristic estimation model generation method according to claim 18, wherein the organism is a crop.

The characteristic estimation model generation method according to claim 19, wherein the crop is a gramineous crop.

The characteristic estimation model generation method according to claim 20, wherein the gramineous crop is rice.

The characteristic estimation model generation method described in 1, wherein the trait information is the number of days after transplanting, representing the number of days elapsed since the date of transplanting; the flowering date, representing the number of days required for flowering; or the nitrogen content, representing the nitrogen content per dry weight of the observation target.

A characteristic estimation method for estimating the characteristic of the analysis target,
the method comprising a characteristic estimation step in which the computer inputs the state variables of the analysis target into the estimation model generated by the estimation model generation method according to any one of claims 15 to 22, and outputs characteristic estimation information that is an estimation result of the characteristic of the analysis target.

A characteristic estimation method for estimating the characteristic of the analysis target,
the method comprising a characteristic estimation step in which the computer inputs the gene expression information of the organism into the estimation model generated by the estimation model generation method according to any one of claims 18 to 22, and outputs characteristic estimation information that is an estimation result of the characteristic of the analysis target,
wherein the gene expression information of the organism is information measured from the amount of mRNA transcribed from a gene corresponding to the characteristic-related state information extracted in the characteristic-related state information extraction step.

The characteristic estimation method according to claim 24, wherein the expression level of the mRNA is measured by a quantitative RT-PCR method.

The characteristic estimation method according to claim 25, wherein the quantitative RT-PCR method is a real-time RT-PCR method.

The characteristic estimation method according to claim 25, wherein the quantitative RT-PCR method is a multiplex RT-PCR method.

The characteristic estimation method according to any one of claims 24 to 27, further comprising a state diagnosis step in which the computer diagnoses the state of the organism using the characteristic estimation information.