CN110705182B

CN110705182B - Crop breeding adaptive time prediction method coupling crop model and machine learning

Info

Publication number: CN110705182B
Application number: CN201911076188.7A
Authority: CN
Inventors: 张朝; 张亮亮; 陶福禄
Original assignee: Beijing Normal University
Current assignee: Beijing Normal University
Priority date: 2019-09-06
Filing date: 2019-11-06
Publication date: 2020-07-10
Anticipated expiration: 2039-11-06
Also published as: CN110705182A

Abstract

The invention discloses a crop breeding adaptation time prediction method coupling a crop model and machine learning, comprising the following steps: S1: calibrating the crop model and simulating management scenarios to obtain the growth period (DOY) and yield (Y) of the crop; S2: selecting key feature variables; S3: constructing a hybrid evaluation model, including selecting the hybrid evaluation model with the highest precision using machine learning methods; S4: assessing the impact of climate change, including calculating the yield change (Yc) of each variety; and S5: identifying the breeding adaptation time, by determining whether the median yield change exceeds an adaptation threshold in at least half of the years in each time window and, if so, determining that the grid point needs breeding intervention, the breeding adaptation time being the middle time t of the time window. The method ultimately yields the times and locations in the study area that require breeding adaptation under a specific future climate scenario.

Description

Crop breeding adaptive time prediction method coupling crop model and machine learning

Technical Field

The invention relates to the technical field of agricultural information, in particular to a crop breeding adaptive time prediction method by coupling a crop model and machine learning.

Background

Climate change is significantly increasing the frequency and intensity of extreme climatic events (e.g., extreme high temperatures, droughts and heat waves), which poses a serious threat to global food security. Variety renewal is a key measure by which agricultural production systems cope with climate change. It involves three stages (breeding, delivery and adoption), generally takes 15-30 years and consumes a large amount of capital. The breeding adaptation time should therefore be predicted scientifically and in advance so that funds are not wasted. A prerequisite for determining when and where breeding adaptation is needed is a systematic regional-scale assessment of the impact of climate change on existing varieties.

At present, there are two main methods for evaluating the impact of climate change on crop varieties. (1) Statistical models establish a regression relationship between meteorological factors and yield over a reference period, and the trends of meteorological elements under a future climate scenario are then substituted into the statistical model to estimate the impact of climate change. However, this method can only evaluate the impact of climate change on a single variety and cannot systematically study the response of existing varieties to climate change. (2) Crop models reproduce the continuous process of crop growth from sowing to maturity at a daily or even hourly time step and reflect how crop growth responds to different environmental and management factors. To evaluate the impact of climate change, the weather, soil and management data for the reference period and for the future scenario are input into a crop model to simulate the corresponding yields, and the yields of the two periods are then compared. Existing crop models can be divided into two categories. Site models are designed for specific field trials; although they successfully characterize the effects of management measures on yield formation, they can only perform single-point simulations. Regional application of a site model can be achieved through regionalization of variety parameters and spatial interpolation of meteorological elements, but this inevitably introduces new errors. Gridded crop models can represent regional spatial differences, but they require a large amount of driving data for construction and operation, and their parameters are very difficult to determine because of the large spatial heterogeneity of surface parameters, varieties and management practices, so large-region studies remain difficult. In addition, gridded models mainly consider the influence of changes in meteorological elements on crop yield and usually ignore the contribution of agricultural management measures.

Most existing breeding adaptation studies use field warming experiments or crop models with simulated hypothetical varieties to examine whether newly bred heat- and drought-resistant varieties can compensate for the yield loss caused by climate change; at present, no framework exists for predicting breeding adaptation time.

Therefore, it is necessary to construct a flexible and efficient method that can quantify the impact of climate change on a regional scale while accounting for the contribution of management measures such as variety choice, which is the basis for predicting breeding adaptation time.

Disclosure of Invention

In the course of research, the inventors realized that machine learning is a direct successor of statistical methods. The difference is that machine learning makes predictions using learned weights without making any assumptions about the input information, so it is more robust to noisy data and can better describe the complex nonlinear relationships of an agricultural system. Furthermore, machine learning is entirely data-driven, i.e., the spatial scale of its predictions depends on the input data, which allows flexible multi-scale application. Therefore, the process-based mechanisms of the site model are combined with the data-driven advantages of machine learning: a hybrid evaluation model is constructed by training a machine learning algorithm on the output of the crop model so as to capture the complex relationships among climate, soil, management, variety and yield in a specific environment, and these relationships are then applied to a homogeneous region. In this way the impact of climate change on different varieties can be evaluated on a regional scale, which lays the foundation for the subsequent prediction of breeding adaptation time.

Cross-threshold analysis, as used in climate change prediction, can determine the time and place at which a certain event occurs (for example, estimating when and where the temperature will exceed the pre-industrial level by 2 ℃). Applied to the study of climate change adaptation measures, it can predict the time and place of breeding adaptation and give decision makers an early signal of when and where existing varieties can no longer be grown, thereby promoting investment in breeding, which is of great importance for guaranteeing national and regional food security.

Based on these findings, the invention constructs a hybrid evaluation model by coupling a site crop model with machine learning, evaluates the impact of climate change on different varieties at the regional scale, and then predicts the time and place of breeding adaptation using cross-threshold analysis.

According to an aspect of the present invention, there is provided a method for predicting adaptation time of crop breeding by coupling crop models and machine learning, comprising the steps of:

S1: calibrating a crop model and simulating management scenarios, comprising: acquiring soil data (S), meteorological data (W) and agricultural production data (A) of experimental sites in a study area, calibrating a crop model using the data, and simulating a plurality of management scenarios with the calibrated crop model to obtain the growth period (DOY) and the yield (Y) of the crop under each simulated scenario;

S2: selecting key feature variables, comprising: for each simulated scenario, extracting the daily meteorological data of the crop from sowing to maturity according to the simulated growth period (DOY), and calculating the agro-climatic indices within the growth period; integrating the feature variables that influence crop growth and development for each simulated scenario to establish a feature variable table; calculating the correlations between the feature variables by Pearson correlation analysis, ranking the importance of the feature variables with respect to yield using a machine learning model, removing feature variables whose correlation is greater than a preset value (for example, 0.75) and feature variables that do not contribute significantly to yield, and keeping the remaining feature variables as key feature variables;

S3: constructing a hybrid evaluation model, comprising: inserting the yield (Y) of each simulation into the corresponding feature variable table, dividing the simulated scenarios into a training set and a test set in a certain proportion, optimizing the hyper-parameters of a machine learning model on the training set using the key feature variables and the yield (Y) by the grid search (GridSearchCV) method in Python, and selecting the hybrid evaluation model with the highest precision by 10-fold cross-validation on the test set;

S4: assessing the impact of climate change, comprising: inputting the grid-point-scale key feature variables of each variety under the climate conditions of the reference period and of the future period into the hybrid evaluation model with the highest precision to obtain the grid-point yields of the reference period and the future period, and comparing them to calculate the yield change (Y_c) of each variety, the yield change being calculated as follows:

$$Y_c = \frac{Y_f - Y_b}{Y_b} \times 100\%$$

wherein Y_f and Y_b are the annual yield in the future period and the annual average yield in the reference period, respectively;

S5: identifying the breeding adaptation time, comprising: for each crop planting grid point, calculating the median of the annual yield changes of a plurality of varieties over the future period; setting a time window within the future period, with the starting point of the future period as the starting point of the time window, and determining whether the median yield change exceeds an adaptation threshold in at least half of the years in the time window; if the condition is met, determining that the grid point needs breeding intervention, the breeding adaptation time being the middle time t of the time window; otherwise, setting the next time window with t+1 as its midpoint and performing the same analysis, and repeating until the end point of the time window reaches the end point of the future period, thereby finally obtaining the times and places in the study area that need breeding adaptation under the specific future climate scenario, wherein the adaptation threshold is a specified negative value of the yield change, and the threshold is exceeded when the yield change falls below that value.

Preferably, the crop model is a DSSAT model.

Preferably, in steps S2 and S3, the machine learning model is selected from RF and XGBoost.

Preferably, the variables influencing the growth and development of the crop comprise the agro-climatic indices, soil attributes, geographical position and variety.

Preferably, step S3 includes using the mean absolute error (MAE), the root mean square error (RMSE) and the coefficient of determination (R²) to evaluate the precision, the hybrid evaluation model with the lowest MAE and RMSE and the highest R² being the highest-precision hybrid evaluation model.

Preferably, the ratio of the training set to the test set is about 7:3 to 9:1, preferably 8:2.

Preferably, the time window is 20 years.

Preferably, a yield change of -10% is used as the adaptation threshold.

Preferably, the crop breeding adaptation time prediction method coupling a crop model and machine learning further comprises replacing the median with the maximum loss or the minimum loss of the annual yield changes of the plurality of varieties over the future period, and repeating step S5 to identify the earliest and latest breeding adaptation times and places.

Compared with the prior art, the invention realizes the beneficial technical effects that:

1. Regional application of the site crop model is realized. A site model is used to simulate the yield under various local production conditions, and a machine learning model is then trained on the simulation results, with the aim of using machine learning to describe the complex relationships among climate, management, variety and yield in a specific area and then applying these relationships to a homogeneous region, thereby realizing point-to-area extrapolation. Compared with the traditional regionalization of variety parameters, the method is more scientific and reasonable.

2. The efficiency of climate change impact evaluation is improved. Compared with simulation based on a gridded crop model, the method needs only a small amount of experimental data to calibrate the site model and avoids the complex data preparation and parameter determination of the gridded model. The spatial scale of the machine-learning-based hybrid evaluation model depends only on the input data, so multi-scale climate change impact evaluation can be completed flexibly.

3. A framework for predicting climate change adaptation measures is presented. The technique applies the cross-threshold analysis used in climate change prediction to the determination of crop breeding time and predicts the time and place of breeding adaptation for the first time, which is important for agricultural production systems to cope with climate change and to guarantee food security. The method is not limited to determining the breeding adaptation time and can also be applied to the study of other adaptation measures such as transformational adaptation.

Drawings

The same reference numbers in the drawings identify the same or similar elements or components. The objects and features of the present invention will become more apparent in view of the following description taken in conjunction with the accompanying drawings, in which:

Fig. 1 is a schematic flow diagram of a crop breeding adaptation time prediction method coupling a crop model and machine learning according to an embodiment of the present invention.

Fig. 2 is a schematic diagram of the prediction results of a crop breeding adaptation time prediction method coupling crop models and machine learning according to an embodiment of the present invention.

Detailed Description

For a clear description of the solution according to the invention, preferred embodiments are given below and are described in detail with reference to the accompanying drawings. The following description is merely exemplary in nature and is not intended to limit the present disclosure, application, or uses.

It should be understood that the crop model and the machine learning models referred to in the present invention, including their sub-modules, parameters and operating mechanisms, are known per se; the present invention therefore focuses on the process of coupling the crop model with machine learning.

Fig. 1 is a schematic flow diagram of a method for predicting adaptation time for crop breeding by coupling crop models and machine learning according to an embodiment of the present invention, which is further described below with reference to the accompanying drawings.

Referring to fig. 1, the method for predicting adaptation time for crop breeding by coupling crop model and machine learning according to the present invention may include the following steps:

First, an appropriate crop model is selected, such as a DSSAT series model or an MCWLA series model, and the crop model is calibrated (i.e., localized) using soil data (S), meteorological data (W) and agricultural production data (A) from experimental sites within the study area. The soil parameters may include, for example, soil type, color, slope, permeability, albedo, soil thickness, soil evaporation limit, runoff curve number, soil drainage rate, photosynthesis factor, lower limit of soil water (wilting-point water content), field capacity, saturated water content, bulk density, soil organic carbon, nitrogen, soil pH, and clay and silt content. The meteorological parameters may include daily solar radiation, daily maximum temperature, daily minimum temperature, daily rainfall, daily relative humidity, daily average wind speed, etc.

In the embodiment shown in the figure, the DSSAT model is adopted. Specifically, the data (S), (W) and (A) are input into the DSSAT model to generate files S', W' and A' that the model can execute, and the GLUE parameter estimation tool calls the files S', W' and A' to calculate a file C containing the crop variety parameters of the study area. A plurality of agricultural management scenarios are then set based on the agricultural production data (A), i.e., the agricultural production data A are expanded to simulate various management scenarios, and the modified data are input into the crop model again to generate a file A''. Finally, the crop system model embedded in the crop model calls the files S', W', A'' and C to simulate the growth period (DOY) and yield (Y) of the crop.

If necessary, the calibrated model may be verified using the data (S), (W) and (A); for example, the data (S), (W) and (A) may be divided into two parts, a calibration part used to calibrate the model and a verification part used to verify it.

Next, key feature variables are selected. According to the simulated growth period (DOY), the daily meteorological data of the crop from sowing to maturity, such as maximum temperature, minimum temperature, average temperature and rainfall, are extracted, and agro-climatic indices within the growth period are calculated, such as growing degree days (GDD), cumulative precipitation (Pgs) and the standardized precipitation index (SPI). The feature variables that influence crop growth and development for each simulated scenario are then integrated to establish a feature variable table.
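As a minimal sketch of the index calculation described above, the following Python functions compute two of the named indices, GDD and Pgs, from daily series that have already been clipped to the sowing-to-maturity period. The function names, the mean-temperature form of GDD and the 10 ℃ default base temperature are assumptions made for illustration only and are not prescribed by the invention.

```python
from typing import Sequence


def growing_degree_days(tmax: Sequence[float], tmin: Sequence[float],
                        t_base: float = 10.0) -> float:
    """Sum of daily mean temperature above t_base (assumed GDD form)."""
    if len(tmax) != len(tmin):
        raise ValueError("tmax and tmin must cover the same days")
    # Days whose mean temperature is below the base contribute nothing.
    return sum(max(0.0, (hi + lo) / 2.0 - t_base) for hi, lo in zip(tmax, tmin))


def growing_season_precip(precip: Sequence[float]) -> float:
    """Cumulative precipitation (Pgs) over the growth period."""
    return float(sum(precip))
```

Each simulated scenario would contribute one row to the feature variable table, with these values stored alongside the soil, location and variety columns.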

The correlations among the feature variables are then calculated by Pearson correlation analysis, the importance of the feature variables with respect to yield is ranked using a machine learning model, the feature variables whose correlation is greater than a preset value and the feature variables that do not contribute significantly to yield are removed, and the remaining feature variables are kept as key feature variables.

Pearson correlation analysis calculates the correlation between feature variables; when the correlation coefficient is greater than a certain value, for example 0.75, the variables concerned are excluded. The machine learning models, which may be for example RF and XGBoost, are known per se. The feature variables are input into these machine learning models, which automatically calculate and output an importance ranking of the feature variables, i.e., a ranking according to the influence (contribution) of each feature variable on yield. The feature variables that do not contribute significantly to yield are removed; for example, the last few variables in the ranking may be deleted. One or more machine learning models may be used, such as RF and XGBoost, in which case the results of both models are considered together.

After these feature variables are eliminated, the remaining feature variables are the key feature variables, which have a large influence on crop yield.
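The screening just described can be sketched in plain Python as follows. The importance scores are assumed to have been produced beforehand by a model such as RF or XGBoost and are simply passed in; the function names, the `min_importance` cut-off and the rule of keeping the more important variable of a highly correlated pair are illustrative assumptions, since the text does not fix which member of a correlated pair is dropped.

```python
from math import sqrt
from typing import Dict, List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0.0 or sy == 0.0:
        return 0.0  # a constant column carries no correlation information
    return cov / (sx * sy)


def select_key_features(table: Dict[str, Sequence[float]],
                        importance: Dict[str, float],
                        max_corr: float = 0.75,
                        min_importance: float = 0.0) -> List[str]:
    """Keep important features that are not highly correlated with a kept one."""
    # Visit features from most to least important (assumed tie-breaking rule).
    ranked = sorted(table, key=lambda name: importance[name], reverse=True)
    kept: List[str] = []
    for name in ranked:
        if importance[name] < min_importance:
            continue  # insignificant contribution to yield
        if all(abs(pearson(table[name], table[k])) <= max_corr for k in kept):
            kept.append(name)
    return kept
```

When two models are used, their importance scores could be averaged or rank-combined before being passed in; that combination step is left out of this sketch.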

Next, a hybrid evaluation model is constructed using the key feature variables and the yield (Y). This may include inserting the yield (Y) of each simulation into the corresponding feature variable table and then dividing the simulated scenarios (i.e., samples) into a training set and a test set in a certain ratio; for example, the ratio of the training set to the test set may be about 7:3 to 9:1, preferably 8:2. The hyper-parameters of a machine learning model, such as the coefficients of a multiple linear regression, are optimized on the training set using the key feature variables and the yield (Y) by the grid search (GridSearchCV) method in Python. The machine learning model may be, for example, RF and/or XGBoost, and a hybrid evaluation model is thereby constructed.
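The sample partitioning can be illustrated without any third-party library. The sketch below shows an 8:2 split of sample indices and a generic k-fold partition that can be applied to whichever subset is being cross-validated; the function names, the fixed random seed and the interleaved fold assignment are assumptions for illustration, and the hyper-parameter search itself is not reproduced here.

```python
import random
from typing import List, Tuple


def train_test_split_indices(n_samples: int, train_fraction: float = 0.8,
                             seed: int = 0) -> Tuple[List[int], List[int]]:
    """Shuffle sample indices and cut them into training and test parts."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # seeded for reproducibility
    cut = int(round(n_samples * train_fraction))
    return indices[:cut], indices[cut:]


def k_fold_indices(indices: List[int],
                   k: int = 10) -> List[Tuple[List[int], List[int]]]:
    """Return k (fit, held_out) index pairs covering `indices` exactly once."""
    folds = [indices[i::k] for i in range(k)]
    splits = []
    for i, held_out in enumerate(folds):
        fit = [j for m, fold in enumerate(folds) if m != i for j in fold]
        splits.append((fit, held_out))
    return splits
```

Each candidate hyper-parameter combination would be scored on the held-out part of every fold, and the combination with the best average score retained.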

The precision of the hybrid models is then evaluated on the test set using 10-fold cross-validation, and the hybrid evaluation model with the highest precision is selected. The mean absolute error (MAE), root mean square error (RMSE) and coefficient of determination (R²) can be used to evaluate the precision: the hybrid evaluation model with the lowest MAE and RMSE and the highest R² is the highest-precision hybrid evaluation model.

Then, the impact of climate change is evaluated using the hybrid evaluation model with the highest precision. This may include inputting the grid-point-scale key feature variables of each variety under the climate scenarios of the reference period and of the future period into the hybrid evaluation model with the highest precision to obtain the grid-point yields of the reference period and the future period, and comparing them to calculate the yield change (Y_c) of each variety. The yield change is calculated as follows:

$$Y_c = \frac{Y_f - Y_b}{Y_b} \times 100\%$$

wherein Y_f and Y_b are the annual yield in the future period and the annual average yield in the reference period, respectively.
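A small Python sketch of this comparison for one variety at one grid point is given below. It assumes that the yield change is the percentage difference between each future year's yield and the mean yield of the reference period, which is consistent with the percentage thresholds used later; the function name and the dictionary layout are hypothetical.

```python
from statistics import mean
from typing import Dict, Sequence


def yield_change_percent(future_yield: Dict[int, float],
                         baseline_yields: Sequence[float]) -> Dict[int, float]:
    """Percent change of each future year's yield relative to the baseline mean."""
    y_b = mean(baseline_yields)  # annual average yield of the reference period
    if y_b <= 0:
        raise ValueError("baseline mean yield must be positive")
    # Negative values indicate a yield loss relative to the reference period.
    return {year: (y_f - y_b) / y_b * 100.0 for year, y_f in future_yield.items()}
```

Running this for every variety and grid point gives the per-year yield-change series used in the threshold analysis.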

Finally, the breeding adaptation time is identified. To determine the time and place of breeding adaptation, an adaptation threshold must first be defined for the varieties; a yield change below this threshold means that planting the existing varieties at that location under the future climate scenario will suffer a large yield loss, so new varieties need to be bred. That is, the adaptation threshold is a negative yield change of a specified magnitude; for example, Y_c = -10%, or some other suitable value, may be defined as the adaptation threshold, where negative values mean yield loss (yield reduction).

For each crop planting grid point, the median of the annual yield changes of a plurality of varieties over the future period is calculated, i.e., the annual yield change of different varieties of one crop is simulated at each planting grid point and the median across the varieties is calculated for each year. A time window is then set within the future period, with the starting point of the future period as the starting point of the time window, and it is determined whether the median yield change (yield loss) exceeds the adaptation threshold in at least half of the years in the time window. If the condition is met, the grid point (place) is determined to need breeding intervention, and the breeding adaptation time is the middle time t of the time window. Otherwise, the next time window is set with t+1 as its midpoint and the same analysis is performed, and this is repeated until the end point of the time window reaches the end point of the future period. The times and places in the study area that need breeding adaptation under the specific future climate scenario are thereby obtained.

Referring to fig. 1, a 20-year time window is used to determine whether the yield loss of at least any 10 years in the 20 years exceeds the adaptive threshold, if the conditions are met, the site is considered to need breeding intervention, and the breeding adaptive time is the middle time of the 20-year time window; if the condition is not met, the next time window is analyzed, and so on. The time and place at which breeding adaptations are required for a particular future climate scenario in the research area are ultimately obtained.
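The moving-window test can be sketched as follows for a single grid point. This is one possible reading of the procedure rather than a definitive implementation: the function name is hypothetical, the years are assumed to be consecutive, "exceeds the threshold" is read as "yield change below the threshold", and the middle time of a window is taken as the year at index `window // 2` within it.

```python
from typing import Optional, Sequence


def breeding_adaptation_time(years: Sequence[int],
                             yield_change: Sequence[float],
                             window: int = 20,
                             threshold: float = -10.0) -> Optional[int]:
    """Midpoint year of the first window with losses in at least half its years.

    Returns None when no window in the future period meets the condition.
    """
    if len(years) != len(yield_change):
        raise ValueError("years and yield_change must have the same length")
    # Slide the window one year at a time until its end reaches the last year.
    for start in range(len(years) - window + 1):
        in_window = yield_change[start:start + window]
        loss_years = sum(1 for yc in in_window if yc < threshold)
        if 2 * loss_years >= window:
            return years[start + window // 2]
    return None
```

Applying the function to every planting grid point and mapping the returned years gives the time and place of breeding adaptation; grid points that return `None` need no intervention within the period analysed.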

The crop breeding adaptation time prediction method coupling a crop model and machine learning according to the present invention may further comprise replacing the median with the maximum loss or the minimum loss of the annual yield changes of the plurality of varieties over the future period, and repeating step S5 to identify the earliest and latest breeding adaptation times and places.

The earliest and latest breeding adaptation times are identified in the same way as above, but based on the maximum and minimum losses, respectively, of the annual yield changes of the plurality of varieties over the future period. That is, for each crop planting grid point, the maximum loss or minimum loss of the annual yield changes of the plurality of varieties over the future period is calculated. A time window is set within the future period, with the starting point of the future period as the starting point of the time window, and it is determined whether the maximum loss or minimum loss exceeds the adaptation threshold in at least half of the years in the time window. If the condition is met, the grid point (place) is determined to need breeding intervention, and the earliest or latest breeding adaptation time is the middle time t of the time window. Otherwise, the next time window is set with t+1 as its midpoint and the same analysis is performed, and this is repeated until the end point of the time window reaches the end point of the future period. The earliest and latest times and places at which the study area needs breeding adaptation under the specific future climate scenario are thereby obtained.
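The three per-year summaries used in this analysis (median, maximum loss and minimum loss across varieties) might be computed as in the sketch below. The function name and the input layout, one equal-length list of yearly yield changes per variety, are hypothetical. Because losses are negative yield changes, the maximum loss is taken here as the most negative value in each year and the minimum loss as the least negative one.

```python
from statistics import median
from typing import Dict, List, Sequence


def summarize_varieties(
        yc_by_variety: Dict[str, Sequence[float]]) -> Dict[str, List[float]]:
    """Per-year median, maximum loss and minimum loss across varieties."""
    lengths = {len(series) for series in yc_by_variety.values()}
    if len(lengths) != 1:
        raise ValueError("all varieties must cover the same years")
    # Transpose: one tuple of variety values per year.
    per_year = list(zip(*yc_by_variety.values()))
    return {
        "median": [median(values) for values in per_year],
        "max_loss": [min(values) for values in per_year],  # most negative change
        "min_loss": [max(values) for values in per_year],  # least negative change
    }
```

Feeding the `max_loss` series into the window analysis would presumably give the earliest adaptation time, and the `min_loss` series the latest.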

Examples

This example further illustrates the technical solution of the present invention by taking the yield estimation of summer corn in the North China Plain as an example. The example is intended only to illustrate the invention, not to limit its scope; for example, the invention may also be used for other crops such as wheat.

In this example, the prediction of the breeding adaptation time of Huang-Huai-Hai summer corn in China under the RCP8.5 scenario (a hypothetical future carbon emission scenario in which the atmospheric carbon dioxide concentration by 2100 is 3-4 times higher than before the industrial revolution) is used to further illustrate the workflow of the crop breeding adaptation time prediction method coupling a crop model and machine learning, which specifically includes:

S1: The CERES-Maize model of the DSSAT model series is used for calibration and for simulating local management scenarios. Six maize varieties were tested at 13 sites in the Huang-Huai-Hai maize planting area, as shown in Table 1. The soil data S, meteorological data W and agricultural production data A of all experiments for each variety are input into the CERES-Maize model and divided into a calibration part and a verification part in a certain proportion. The GLUE parameter estimation tool calls the three types of files for the calibration years to calculate a file C containing the crop variety parameters of the area; the calibrated genetic parameters are then verified using the data of the verification years, finally yielding 6 sets of variety genetic parameters.

TABLE 1 Huang-Huai-Hai plain tested maize varieties

Based on the agricultural production data a, for example, the agricultural production data a of each experimental site can be modified (augmented) according to specific agricultural planting experience or records of multiple years, so as to simulate the yield of 6 varieties in a reference year (1986-.

S2: Selecting key feature variables. According to the growth period (DOY) simulated in S1, the daily maximum temperature, minimum temperature, average temperature and rainfall of the corn from sowing to maturity are extracted for each management scenario, and 5 agro-climatic indices are calculated according to the formulas in Table 2 (the calculation of agro-climatic indices is well known in the art and is not repeated here). Then the growth period length (DOY) of each simulated corn crop, 10 surface soil attributes (such as soil physicochemical properties, hydrological properties and pH at a depth of 100 cm) and 3 items of geographic position information (longitude, latitude and elevation) of the corresponding site are extracted, and a feature variable table is established. Next, the correlations among the features are calculated by Pearson correlation analysis, and the importance of the factors is ranked using RF and XGBoost respectively; finally, the factors with high correlation and low combined importance scores are removed, and the finally selected features are saved.

TABLE 2 calculation of the agricultural gas index for the corn growth period

s is the sowing date and m is the maturity date.

*Droughts in a warming climate: a global assessment of standardized precipitation index (SPI) and reconnaissance drought index, Asadi Zarch et al. 2015

S3: Constructing the hybrid evaluation model. The yield of each simulation is inserted into the feature variable table, and the samples are divided into a training set and a test set in the proportion 8:2. Based on 80% of the samples, the parameters of RF and XGBoost are each optimized by grid search in Python 3.7; the precision of the models is then evaluated by 10-fold cross-validation based on the remaining 20% of the samples, and the model with the lowest MAE (mean absolute error) and RMSE (root mean square error) and the highest R² (coefficient of determination) is finally selected. The results show that the hybrid evaluation model constructed on XGBoost has the highest precision (Table 3), and it will be used in the next step to evaluate the impact of climate change at the grid-point scale.

O_i and S_i are the observed and simulated values, and O_avg and S_avg are the corresponding averages. Y_pred is the value predicted by the hybrid evaluation model, Y_simu is the value simulated by CERES-Maize, and n is the sample size.
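The formulas to which these symbols belong are not reproduced in this text. The sketch below shows the standard definitions of MAE and RMSE, together with R² written as the squared Pearson correlation; that form is an assumption chosen because the symbol list refers to O_avg and S_avg, and other definitions of R² exist. The helper `pick_best`, which returns a model only if it is at least as good as every other model on all three metrics, is likewise hypothetical.

```python
from math import sqrt
from typing import Dict, Optional, Sequence, Tuple


def mae(pred: Sequence[float], simu: Sequence[float]) -> float:
    """Mean absolute error between predicted and simulated yields."""
    return sum(abs(p - s) for p, s in zip(pred, simu)) / len(simu)


def rmse(pred: Sequence[float], simu: Sequence[float]) -> float:
    """Root mean square error between predicted and simulated yields."""
    return sqrt(sum((p - s) ** 2 for p, s in zip(pred, simu)) / len(simu))


def r_squared(obs: Sequence[float], sim: Sequence[float]) -> float:
    """Squared Pearson correlation of two series (assumed form of R^2)."""
    o_avg, s_avg = sum(obs) / len(obs), sum(sim) / len(sim)
    cov = sum((o - o_avg) * (s - s_avg) for o, s in zip(obs, sim))
    var_o = sum((o - o_avg) ** 2 for o in obs)
    var_s = sum((s - s_avg) ** 2 for s in sim)
    if var_o == 0.0 or var_s == 0.0:
        return 0.0  # undefined for a constant series; reported as 0 here
    return cov * cov / (var_o * var_s)


def pick_best(scores: Dict[str, Tuple[float, float, float]]) -> Optional[str]:
    """Name of the model best on MAE, RMSE and R^2 together, else None."""
    for name, (m, r, r2) in scores.items():
        if all(m <= o[0] and r <= o[1] and r2 >= o[2] for o in scores.values()):
            return name
    return None
```

If no model is best on all three metrics at once, `pick_best` returns `None`, and some additional tie-breaking rule would be needed.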

Table 3 accuracy of corn yield predicted by RF and XGBoost on test set

S4: Evaluating the impact of climate change. To evaluate the impact of climate change at the grid-point scale, grid-point feature variables must be input into the evaluation model. The soil and geographic position data are 0.5° × 0.5° grid-point data, so only the 10 surface soil attributes and 3 items of spatial position information of the corn planting grid points need to be extracted, whereas the 5 agro-climatic indices and the growth period length (DOY) must be further calculated.

S4.1: Obtain the DOY at the grid-point scale. The meteorological file W of each experimental site in S1 is replaced with meteorological data of the reference period (1986-.

S4.2: Calculate the agro-climatic indices. According to the annual grid-point-scale DOY of the 6 varieties obtained in the previous step, the agro-climatic indices (GDD, TCD, OCA, Pgs and SPI) within the corn growth period of each variety in 1986-2005 are calculated grid point by grid point, yielding 20 years of agro-climatic indices at a resolution of 0.5° × 0.5° for each variety.

S4.3: Calculate the yield at the grid-point scale. For each variety, the 5 agro-climatic indices, 10 surface soil attributes, 1 DOY and 3 spatial position features at the grid-point scale for 1986-2005 are input into the hybrid evaluation model to obtain the grid-point-scale yield of each variety in 1986-2005.

S4.4: The meteorological data of the reference period are replaced with data for 2020-2060 under the RCP8.5 scenario, and steps S4.1, S4.2 and S4.3 are repeated to obtain the grid-point-scale yield of each variety in 2020-2060.

S4.5: The yield in the future period is compared with the average yield of the reference years to obtain the yield change (Y_c) for 2020-2060 under RCP8.5. The calculation formula is as follows:

Y_c is the predicted yield change, and Y_f and Y_b are the yield of each year in the future period and the average yield of the baseline years, respectively.
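The yield change calculation can be sketched in Python as follows. The sketch assumes that Y_c is the relative change of each future-year yield with respect to the baseline mean, expressed in percent, i.e. (Y_f - Y_b) / Y_b × 100; this form is an assumption chosen to be consistent with the symbol definitions and with a threshold expressed as a percentage yield loss. The function name and the sample numbers are hypothetical.

```python
def yield_change(future_yields, baseline_yields):
    """Percent yield change of each future year relative to the baseline mean.

    future_yields:   dict mapping year -> predicted yield (Y_f) at one grid point
    baseline_yields: list of predicted yields for the baseline years
    Assumed form: Y_c = (Y_f - Y_b) / Y_b * 100, with Y_b the baseline mean.
    """
    y_b = sum(baseline_yields) / len(baseline_yields)
    return {year: (y_f - y_b) / y_b * 100.0 for year, y_f in future_yields.items()}


# Hypothetical yields (kg/ha), for illustration only.
baseline = [8000.0, 10000.0]               # baseline mean: 9000
future = {2020: 8100.0, 2021: 9900.0}
changes = yield_change(future, baseline)
print(changes)
```

A negative value indicates a yield loss relative to the baseline period, so a year with Y_c below -10 would fall under a 10% yield loss threshold.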

S5: Identify the breeding adaptation time. A yield loss of 10% is first defined as the adaptability threshold; a yield change below this threshold means that maize production will suffer a large yield loss and requires breeding intervention. For each corn planting grid point, the median yield change of the 6 varieties is calculated for each year in 2020-. If the probability is less than 0.5, the time window is re-centered on the following year, covering the 10 years before and after the new midpoint, and the calculation is repeated in this way until the midpoint reaches 2050. Finally, the breeding adaptation time of the Huang-Huai-Hai corn planting area under the RCP8.5 scenario is obtained, and the result is shown in Figure 2.
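The moving-window identification in S5 can be sketched in Python as follows. The sketch assumes that the "probability" is the fraction of years in the window whose median yield change falls below the threshold, that a window contains the midpoint year and the 10 years on either side of it (both end years included), and that the first midpoint is 2030 so that the first window starts in 2020. The function name, these window conventions and the synthetic data are assumptions for illustration.

```python
from statistics import median


def breeding_adaptation_time(yield_change, start=2020, end=2060,
                             half_window=10, threshold=-10.0):
    """Return the first window midpoint at which breeding intervention is flagged.

    yield_change: dict mapping year -> list of per-variety yield changes (%)
                  at a single grid point, covering every year from start to end
    Returns None if no window meets the condition.
    """
    # Median across varieties for each year.
    med = {year: median(values) for year, values in yield_change.items()}
    # Slide the midpoint one year at a time until the window reaches `end`.
    for mid in range(start + half_window, end - half_window + 1):
        window = [med[y] for y in range(mid - half_window, mid + half_window + 1)]
        # Fraction of years whose median change is below the threshold.
        fraction = sum(v < threshold for v in window) / len(window)
        if fraction >= 0.5:
            return mid
    return None


# Synthetic grid point: every variety loses 0.5 percentage points per year,
# so the median change first drops below -10% in 2041.
declining = {y: [-(y - 2020) * 0.5] * 6 for y in range(2020, 2061)}
stable = {y: [0.0] * 6 for y in range(2020, 2061)}
print(breeding_adaptation_time(declining))
print(breeding_adaptation_time(stable))
```

Under these conventions the declining grid point is first flagged at the window centered on 2041, where 11 of the 21 years lie below the threshold, while the stable grid point is never flagged and returns None.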

The crop breeding adaptation time prediction method coupling a crop model and machine learning integrates the advantages of both. The machine learning model is trained on the output of the site-scale crop model, which enables the site-scale crop model to be applied at the regional scale. In addition, threshold-crossing analysis is applied to the prediction of crop breeding adaptation time, providing a new framework for research on climate change adaptation measures.

The principles and embodiments of the present invention have been described herein using specific examples, which are presented solely to aid in understanding the method and its core concepts. Meanwhile, for a person skilled in the art, the specific embodiments and the scope of application may vary according to the idea of the present invention. In summary, the content of this specification should not be construed as limiting the present invention.

Claims

1. A crop breeding adaptive time prediction method coupling a crop model and machine learning comprises the following steps:

S1: calibration of a crop model and simulation of management scenarios, comprising: acquiring soil data, meteorological data and agricultural production data of experiment sites in a research area, calibrating a crop model by using the data, and simulating various management scenarios by using the calibrated crop model to acquire the growth period and the yield of crops under the various simulation scenarios;

S2: selecting key feature variables, comprising: for each simulation scenario, extracting the daily meteorological data of the crops from sowing to maturity according to the growth period obtained by the simulation, and calculating the agro-climatic indices within the growth period; integrating the characteristic variables which influence the growth and development of the crops and correspond to each simulation scenario to establish a characteristic variable table; calculating the correlation coefficients among the characteristic variables through Pearson correlation analysis, analyzing and ranking the importance of the characteristic variables relative to the yield by utilizing a machine learning model, removing the characteristic variables with a correlation larger than a preset value and the characteristic variables with an insignificant contribution to the yield, and keeping the remaining characteristic variables as key characteristic variables;

S3: constructing a hybrid evaluation model, comprising: inserting the yield corresponding to each simulation into the corresponding characteristic variable table, dividing the simulation scenarios into a training set and a test set according to a certain proportion, optimizing the hyper-parameters of a machine learning model on the training set by the grid search method GridSearchCV in Python, using the key characteristic variables and the yield, and then evaluating the precision of the hybrid evaluation model on the test set by 10-fold cross-validation to select the hybrid evaluation model with the highest precision;

S4: assessing the effect of climate change, comprising: respectively inputting the grid-point-scale key characteristic variables of each variety under the climatic conditions of the reference period and the future period into the hybrid evaluation model with the highest precision to obtain the gridded yield of the reference period and the future period, and comparing them to calculate the yield change Yc of each variety, wherein the yield change is calculated by the following formula;

S5: identifying a time of breeding adaptation, comprising: calculating, for each crop planting grid point, the median of the yield change of a plurality of varieties for each year in the future period; setting a time window in the future period, taking the starting point of the future period as the starting point of the time window, and determining whether the median yield change exceeds an adaptability threshold in at least half of the years in the time window; if the condition is met, determining that the grid point needs breeding intervention, the breeding adaptation time being the middle time t of the time window; and then setting the next time window with t+1 as its midpoint to perform the same analysis, and repeating until the end point of the time window reaches the end point of the future period, thereby finally obtaining the times and locations in the research area requiring breeding adaptation under the specific future climate scenario, wherein the adaptability threshold is defined as a negative yield change smaller than a certain value.

2. The method of claim 1, wherein the crop model is a DSSAT model.

3. A method of crop breeding adaptation time prediction coupling crop model and machine learning as claimed in claim 1 wherein in steps S2 and S3, the machine learning model is selected from RF and XGBoost.

4. The method of claim 1, wherein the characteristic variables affecting the growth and development of the crop comprise agro-climatic indices, soil properties, geographic location, and variety.

5. The method of claim 1, wherein step S3 includes using the mean absolute error MAE, the root mean square error RMSE, and the coefficient of determination R² to evaluate the accuracy, and the hybrid evaluation model with the lowest MAE and RMSE and the highest R² is the hybrid evaluation model with the highest precision.

6. The method of claim 1, wherein the training set is 70-90% in proportion and the test set is 10-30% in proportion.

7. The method of claim 1, wherein the time window is 20 years.

8. The method of claim 1, wherein the adaptability threshold is -10%.

9. The method of predicting crop breeding adaptation time coupled with crop model and machine learning of claim 1, further comprising replacing the median with a maximum loss or a minimum loss value for yield variation of a plurality of varieties per year over a future period, and repeating step S5 to identify earliest and latest breeding adaptation times and locations.

10. The method of predicting adaptation time for crop breeding by coupling crop model and machine learning of claim 1, wherein the crop is corn or wheat.

11. The method of claim 1, wherein the preset value is 0.75.

12. The method of claim 1, wherein the training set is 80% and the test set is 20% in proportion.