CN111985728A - Method for establishing organic sorghum yield prediction model - Google Patents


Info

Publication number
CN111985728A
Authority
CN
China
Prior art keywords
model
organic sorghum
organic
data
yield
Prior art date
Legal status
Granted
Application number
CN202010926712.1A
Other languages
Chinese (zh)
Other versions
CN111985728B (en)
Inventor
张华�
孙守伟
耿云涛
Current Assignee
Inspur Software Co Ltd
Original Assignee
Inspur Software Co Ltd
Priority date
Filing date
Publication date
Application filed by Inspur Software Co Ltd
Priority to CN202010926712.1A
Publication of CN111985728A
Application granted
Publication of CN111985728B
Legal status: Active


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/04 Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/067 Enterprise or organisation modelling
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/02 Agriculture; Fishing; Forestry; Mining

Landscapes

  • Business, Economics & Management (AREA)
  • Engineering & Computer Science (AREA)
  • Strategic Management (AREA)
  • Human Resources & Organizations (AREA)
  • Economics (AREA)
  • Theoretical Computer Science (AREA)
  • General Business, Economics & Management (AREA)
  • Entrepreneurship & Innovation (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Tourism & Hospitality (AREA)
  • Marketing (AREA)
  • Development Economics (AREA)
  • Quality & Reliability (AREA)
  • Operations Research (AREA)
  • Game Theory and Decision Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Marine Sciences & Fisheries (AREA)
  • Primary Health Care (AREA)
  • Mining & Mineral Resources (AREA)
  • Educational Administration (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Animal Husbandry (AREA)
  • Agronomy & Crop Science (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract


The invention relates to a method for establishing an organic sorghum yield prediction model. The method first divides the growth cycle of organic sorghum into three periods and builds a training dataset for each: the initial planting period, the growing period, and the mature period. The three training datasets are preprocessed and screened to select the most important feature variables, and the models are trained. Model variables are then adjusted and the feasibility of the models is evaluated until the modeler's requirements and the evaluation reference coefficients are met. Finally, the trained models for the three growth stages are saved separately to a back-end big-data cluster so that users can call them later. The method does more than predict organic sorghum yield separately for each growth stage, which improves prediction speed and accuracy. It also issues early warnings that prompt users to take timely remedial measures, and it provides a benchmark and reference for assessing organic sorghum production.


Description

A method for establishing an organic sorghum yield prediction model

Technical Field

The invention relates to the technical field of machine learning algorithms, and in particular to a method for establishing an organic sorghum yield prediction model.

Background Art

Organic sorghum is one of the raw materials that distilleries commonly use for brewing liquor, so its yield is critical to a distillery's operations. At present, however, organic sorghum yield is predicted mainly by experts, who estimate each plot's yield from its area based on experience. Such manual predictions are imprecise and subject to large errors and deviations.

Accordingly, the present invention proposes a method for establishing an organic sorghum yield prediction model.

Summary of the Invention

To remedy these shortcomings of the prior art, the present invention provides a simple and efficient method for establishing an organic sorghum yield prediction model.

The present invention is achieved through the following technical solution:

A method for establishing an organic sorghum yield prediction model, characterized by comprising the following steps:

Step 1: Build the training datasets

The growth cycle of organic sorghum is divided into three periods: the initial planting period, the growing period, and the mature period. Three kinds of historical data serve as the model's initial explanatory variables: the basic planting attributes, the natural environment during the planting period, and field management. The historical yield records of the planted plots serve as the response variable. These data are aggregated into three model training datasets, one for each growth stage of organic sorghum.

Step 2: Data preprocessing

The three training datasets are preprocessed. Anomalous data points in each dataset are inspected, and information suspected of having been entered incorrectly by hand is removed.

Step 3: Variable screening

To improve the stability and prediction accuracy of the model, variable screening is performed on the preprocessed data. The 15 most important feature variables are selected and then tested for collinearity.

Step 4: Model training

Organic sorghum yield is set as the response variable, and the other feature variables that affect it are the model's explanatory variables. The model is trained on the three training datasets, one for each growth stage of organic sorghum.

Step 5: Model diagnosis

The model variables are inspected and adjusted and the feasibility of the model is evaluated. This is repeated until the modeler's requirements and the evaluation reference coefficients are met.

Step 6: Model application

The three trained organic sorghum yield models, for the initial planting, growing, and mature stages, are saved separately to the back-end big-data cluster so that users can call them later.

In Step 1, the training dataset consists of the following three parts:

Basic planting attributes, including land area, soil temperature, soil moisture, soil pH, and the content of specific chemical substances, obtained from manual records and soil tester measurements;

Natural environment data for the planting period, including recent temperature changes, hours of sunshine, and the degree of impact of particular natural disasters, collected from the local meteorological observation station and the base station where the plot is located;

Field management data, including records of the amount of fertilizer applied to the land, the amount of seed used, the number of pest, disease, and weed incidents, and the number of on-site support visits by agricultural technicians, obtained through on-site collection.
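One way to hold the three groups of inputs listed above is a small set of dataclasses, which are flattened into a single feature dictionary before modeling. The field names, units, and the 0 to 3 disaster-severity grade are hypothetical.

```python
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class PlantingAttributes:      # manual records and soil tester
    area_mu: float
    soil_temp_c: float
    soil_moisture_pct: float
    soil_ph: float
    special_chem_mg_per_kg: float


@dataclass
class EnvironmentData:         # weather station and plot base station
    recent_temp_change_c: float
    sunshine_hours: float
    disaster_impact: int       # hypothetical 0-3 severity grade


@dataclass
class FieldManagement:         # on-site collection
    fertilizer_kg: float
    seed_kg: float
    pest_disease_weed_events: int
    technician_visits: int


@dataclass
class PlotRecord:
    stage: str
    planting: PlantingAttributes
    environment: EnvironmentData
    management: FieldManagement
    yield_t: Optional[float] = None  # response variable; None when predicting

    def features(self):
        """Flatten the three groups into one dict of explanatory variables."""
        row = {}
        for group in (self.planting, self.environment, self.management):
            row.update(asdict(group))
        return row


rec = PlotRecord(
    stage="growing",
    planting=PlantingAttributes(12.0, 18.5, 24.0, 6.8, 0.3),
    environment=EnvironmentData(-1.2, 210.0, 0),
    management=FieldManagement(300.0, 15.0, 2, 3),
)
```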

In Step 2, a feature column in a training dataset may have missing values. If the missing entries make up no more than 5% of the column, they are filled with the column mean. Otherwise, the feature column is deleted or its missing entries are filled in manually.
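Below is a minimal sketch of this missing-value rule in plain Python. It assumes each feature column is a list in which `None` marks a missing entry, and it treats the 5% threshold as inclusive. The patent leaves the operator to choose between deleting a column that exceeds the threshold and filling it by hand, so the sketch returns such columns for review.

```python
def handle_missing(columns, threshold=0.05):
    """Apply the missing-value rule to a dict of feature columns.

    Returns (kept, needs_review): kept holds the columns with no gaps or with
    gaps in at most `threshold` of their entries (mean-imputed); needs_review
    lists the names of columns with more gaps, left for deletion or manual filling.
    """
    kept, needs_review = {}, []
    for name, values in columns.items():
        present = [v for v in values if v is not None]
        missing = len(values) - len(present)
        if missing == 0:
            kept[name] = list(values)
        elif missing <= threshold * len(values):
            mean = sum(present) / len(present)
            kept[name] = [mean if v is None else v for v in values]
        else:
            needs_review.append(name)
    return kept, needs_review


cols = {
    "soil_ph": [6.0] * 29 + [None],      # 1 of 30 missing (about 3.3%): impute
    "seed_kg": [1.0, None, None, 2.0],   # 50% missing: review
    "area_mu": [10.0, 12.0],             # complete
}
kept, review = handle_missing(cols)
```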

In Step 3, the variance inflation factor (VIF) from mathematical statistics is used to measure how severe the collinearity among variables is in the regression model. Variables with a VIF above 10 are removed. The result is three training datasets for organic sorghum yield, one for each growth stage.
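The VIF of feature j is 1/(1 - R²_j), where R²_j comes from regressing feature j on all the other features. The stdlib-only sketch below computes it with a small least-squares solver. It also applies a common refinement that the patent does not specify: it removes the single worst feature and recomputes until every VIF is at most 10, so that only one of a pair of near-duplicate variables is discarded. In statsmodels, `variance_inflation_factor` in `statsmodels.stats.outliers_influence` computes the same quantity from a design matrix.

```python
def solve(a, b):
    """Solve the square system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [bv] for row, bv in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def r_squared(X, y):
    """R^2 of an ordinary least-squares fit of y on the rows of X (intercept added)."""
    rows = [[1.0] + list(r) for r in X]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yv for r, yv in zip(rows, y)) for i in range(p)]
    beta = solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    mean_y = sum(y) / len(y)
    ss_res = sum((yv - f) ** 2 for yv, f in zip(y, fitted))
    ss_tot = sum((yv - mean_y) ** 2 for yv in y)
    return 1 - ss_res / ss_tot


def vif(columns, name):
    """VIF of one feature given a dict name -> list of values."""
    others = [c for c in columns if c != name]
    X = list(zip(*(columns[c] for c in others)))
    r2 = r_squared(X, columns[name])
    return float("inf") if r2 >= 1 else 1 / (1 - r2)


def drop_high_vif(columns, limit=10.0):
    """Repeatedly remove the feature with the largest VIF while it exceeds `limit`."""
    cols = dict(columns)
    while len(cols) > 1:
        scores = {c: vif(cols, c) for c in cols}
        worst = max(scores, key=scores.get)
        if scores[worst] <= limit:
            break
        del cols[worst]
    return list(cols)


cols = {
    "fertilizer_kg": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "fertilizer_dup": [1.1, 2.0, 2.9, 4.2, 5.0, 5.9],  # near-copy of fertilizer_kg
    "rainfall_mm": [3.0, 1.0, 4.0, 1.0, 5.0, 9.0],
}
kept = drop_high_vif(cols)
```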

In Step 4, the organic sorghum yield prediction models are written with statsmodels, a regression-modeling package for Python. Each model is trained on the training dataset for its growth stage. This yields three organic sorghum yield prediction models, one for each growth stage.
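The patent fits the per-stage models with statsmodels, where an ordinary least-squares fit is typically written `sm.OLS(y, sm.add_constant(X)).fit()`. To keep the example self-contained, the sketch below solves the same least-squares problem with the normal equations in plain Python and fits one linear model per growth stage. Treating each stage model as a plain linear regression is our assumption; the patent only says "regression modeling".

```python
def fit_ols(X, y):
    """Return [intercept, b1, ..., bp] minimizing squared error (normal equations)."""
    rows = [[1.0] + list(r) for r in X]
    p = len(rows[0])
    # Augmented matrix [X'X | X'y]
    m = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [sum(r[i] * yv for r, yv in zip(rows, y))] for i in range(p)]
    for col in range(p):  # Gauss-Jordan elimination with partial pivoting
        piv = max(range(col, p), key=lambda k: abs(m[k][col]))
        m[col], m[piv] = m[piv], m[col]
        for k in range(p):
            if k != col:
                f = m[k][col] / m[col][col]
                m[k] = [a - f * b for a, b in zip(m[k], m[col])]
    return [m[i][p] / m[i][i] for i in range(p)]


def predict(coef, x):
    """Linear prediction for one feature vector x."""
    return coef[0] + sum(c * v for c, v in zip(coef[1:], x))


def train_stage_models(stage_data):
    """Fit one model per growth stage; stage_data maps stage -> (X, y)."""
    return {stage: fit_ols(X, y) for stage, (X, y) in stage_data.items()}


stage_data = {
    "initial_planting": ([[1.0], [2.0], [3.0]], [2.0, 4.0, 6.0]),
    # Toy data generated from y = 1 + 2*x1 + 0.5*x2
    "growing": ([[1, 2], [2, 1], [3, 4], [4, 3], [5, 5]], [4.0, 5.5, 9.0, 10.5, 13.5]),
}
models = train_stage_models(stage_data)
```

Because the toy "growing" data are noise-free, the fit recovers the generating coefficients [1, 2, 0.5].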

In Step 5, adjusting and evaluating the model variables comprises the following steps:

S1. Use residual diagnostic plots to examine how well the model fits the data. Inspect outliers and highly influential sample points, and remove influential outliers during model training;

S2. Draw GAM (generalized additive model) plots for the important variables in the model to assess whether the power to which each variable is fitted is reasonable, and add alternative candidate models;

S3. Remove feature variables whose t-test p-value in the model's statistical diagnostics exceeds 0.2, because such a p-value indicates no significant linear relationship between the variable and yield;

S4. For each candidate model, compute the adjusted coefficient of determination (adjusted R-squared), the Akaike information criterion (AIC), and the mean squared error on the test dataset. Compare these values and select the model that best explains and fits the original data.
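Three small functions can sketch the S4 comparison:

* Adjusted R² uses 1 - (1 - R²)(n - 1)/(n - p - 1) for n observations and p predictors.
* The AIC uses the Gaussian log-likelihood of an OLS fit with k = p + 1 parameters, which is the form statsmodels reports for OLS results.
* The MSE is computed on held-out test data.

The tie-breaking order in `pick_model` is an assumption: lowest test MSE, then lowest AIC, then highest adjusted R². The patent only says that the model with the best explanatory power and fit is chosen.

```python
import math


def _ssr(y, fitted):
    """Residual sum of squares."""
    return sum((a - b) ** 2 for a, b in zip(y, fitted))


def adjusted_r2(y, fitted, n_predictors):
    """Adjusted R^2 for n observations and p predictors (intercept excluded)."""
    n = len(y)
    mean_y = sum(y) / n
    ss_tot = sum((a - mean_y) ** 2 for a in y)
    r2 = 1 - _ssr(y, fitted) / ss_tot
    return 1 - (1 - r2) * (n - 1) / (n - n_predictors - 1)


def gaussian_aic(y, fitted, n_predictors):
    """AIC = -2 log L + 2k for a Gaussian OLS fit, with k = p + 1 (intercept).

    Undefined for a perfect fit (zero residual sum of squares).
    """
    n = len(y)
    llf = -n / 2 * (math.log(2 * math.pi) + math.log(_ssr(y, fitted) / n) + 1)
    return -2 * llf + 2 * (n_predictors + 1)


def mse(y, predicted):
    """Mean squared error, e.g. on a held-out test set."""
    return _ssr(y, predicted) / len(y)


def pick_model(candidates):
    """Pick a candidate by lowest test MSE, then lowest AIC, then highest adj. R^2.

    `candidates` maps a model name to a dict with keys 'test_mse', 'aic'
    and 'adj_r2'. The ordering is a hypothetical choice.
    """
    return min(
        candidates,
        key=lambda name: (
            candidates[name]["test_mse"],
            candidates[name]["aic"],
            -candidates[name]["adj_r2"],
        ),
    )
```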

In Step 6, the basic planting attributes, natural environment data, and field management data are monitored continuously while the sorghum grows, and the monitoring data are fed into the model's feature variables. The user enters the growth stage, and the yield prediction model for that stage is called to predict organic sorghum yield at regular intervals.

If the predicted value is lower than the user's demand or an empirical value, an early warning is issued to remind the user to remedy the abnormal situation promptly.
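Below is a sketch of the stage-based model call and the shortfall warning. `LinearModel`, the feature names, and the message text are hypothetical. As in the text, the comparison value `demand` stands for either the user's required quantity or an empirical value.

```python
from dataclasses import dataclass, field


@dataclass
class LinearModel:
    """Hypothetical container for a fitted per-stage linear model."""
    intercept: float
    coefs: dict = field(default_factory=dict)  # feature name -> coefficient

    def predict(self, features):
        return self.intercept + sum(c * features[name] for name, c in self.coefs.items())


def check_plot(models, stage, features, demand):
    """Predict a plot's yield with its stage model and flag any shortfall.

    Returns (predicted_yield, warning_message_or_None).
    """
    if stage not in models:
        raise KeyError(f"no model trained for stage {stage!r}")
    predicted = models[stage].predict(features)
    warning = None
    if predicted < demand:
        warning = (f"Predicted yield {predicted:.2f} is below the required "
                   f"{demand:.2f}; consider remedial measures for this plot.")
    return predicted, warning


models = {
    "growing": LinearModel(0.5, {"fertilizer_kg": 0.01, "sunshine_hours": 0.005}),
}
plot = {"fertilizer_kg": 200.0, "sunshine_hours": 300.0}
predicted, warning = check_plot(models, "growing", plot, demand=4.5)
```

Keeping one model per stage behind a single lookup means a stage entered by the user selects the matching model, and an unknown stage fails loudly instead of silently using the wrong model.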

In Step 6, the purchased quantity is checked during grain purchasing. If it exceeds the predicted yield by more than 15%, the farmer may have borrowed grain from elsewhere or passed off inferior grain as qualified grain, and the model issues a system alert.
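The purchase check reduces to a single comparison. The sketch below assumes that both quantities use the same unit and treats "more than 15%" as strictly greater than 1.15 times the predicted yield.

```python
def purchase_alert(purchased, predicted, tolerance=0.15):
    """Return True if the purchased quantity exceeds the predicted yield by more than `tolerance`."""
    if predicted <= 0:
        raise ValueError("predicted yield must be positive")
    return purchased > predicted * (1 + tolerance)


# Predicted 10.0 t for a plot: buying 11.6 t triggers the alert, 11.0 t does not.
print(purchase_alert(11.6, 10.0), purchase_alert(11.0, 10.0))  # prints True False
```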

The beneficial effects of the invention are as follows. The method builds a separate yield prediction model for each growth stage of organic sorghum, which improves prediction speed and accuracy. It also gives users early warnings that prompt them to take remedial measures in time, and it provides a benchmark and reference for assessing organic sorghum production.

Description of Drawings

To illustrate the technical solutions in the embodiments of the present invention or in the prior art more clearly, the drawings needed to describe them are briefly introduced below. The drawings described below show only some embodiments of the present invention; a person of ordinary skill in the art can derive other drawings from them without creative effort.

Fig. 1 is a schematic diagram of the method for establishing an organic sorghum yield prediction model of the present invention.

Detailed Description

To help those skilled in the art better understand the technical solutions of the present invention, the technical solutions in the embodiments are described clearly and completely below. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by persons of ordinary skill in the art from these embodiments without creative effort fall within the protection scope of the present invention.

The method for establishing an organic sorghum yield prediction model comprises the following steps:

Step 1: Build the training datasets

Organic sorghum needs different amounts of fertilizer at different growth stages and is affected differently by external factors such as rainfall. This model therefore divides its growth cycle into three periods: the initial planting period, the growing period, and the mature period. Three kinds of historical data serve as the model's initial explanatory variables: the basic planting attributes, the natural environment during the planting period, and field management. The historical yield records of the planted plots serve as the response variable. These data are aggregated into three model training datasets, one for each growth stage of organic sorghum.

Step 2: Data preprocessing

The three training datasets are preprocessed. Anomalous data points in each dataset are inspected, and information suspected of having been entered incorrectly by hand is removed.

Step 3: Variable screening

To improve the stability and prediction accuracy of the model, variable screening is performed on the preprocessed data. The 15 most important feature variables are selected and then tested for collinearity.

Step 4: Model training

Organic sorghum yield is set as the response variable, and the other feature variables that affect it are the model's explanatory variables. The model is trained on the three training datasets, one for each growth stage of organic sorghum.

Step 5: Model diagnosis

The model variables are inspected and adjusted and the feasibility of the model is evaluated. This is repeated until the modeler's requirements and the evaluation reference coefficients are met.

Step 6: Model application

The three trained organic sorghum yield prediction models, for the initial planting, growing, and mature stages, are saved separately to the back-end big-data cluster so that users can call them later.

In Step 1, the training dataset consists of the following three parts:

Basic planting attributes, including land area, soil temperature, soil moisture, soil pH, and the content of specific chemical substances, obtained from manual records and soil tester measurements;

Natural environment data for the planting period, including recent temperature changes, hours of sunshine, and the degree of impact of particular natural disasters, collected from the local meteorological observation station and the base station where the plot is located;

Field management data, including records of the amount of fertilizer applied to the land, the amount of seed used, the number of pest, disease, and weed incidents, and the number of on-site support visits by agricultural technicians, obtained through on-site collection.

In Step 2, a feature column in a training dataset may have missing values. If the missing entries make up no more than 5% of the column, they are filled with the column mean. Otherwise, the feature column is deleted or its missing entries are filled in manually.

In Step 3, the variance inflation factor (VIF) from mathematical statistics is used to measure how severe the collinearity among variables is in the regression model. Variables with a VIF above 10 are removed. The result is three training datasets for organic sorghum yield, one for each growth stage.

In Step 4, the organic sorghum yield prediction models are written with statsmodels, a regression-modeling package for Python. Each model is trained on the training dataset for its growth stage. This yields three organic sorghum yield prediction models, one for each growth stage.

In Step 5, adjusting and evaluating the model variables comprises the following steps:

S1. Use residual diagnostic plots to examine how well the model fits the data. Inspect outliers and highly influential sample points, and remove influential outliers during model training;

S2. Draw GAM (generalized additive model) plots for the important variables in the model to assess whether the power to which each variable is fitted is reasonable, and add alternative candidate models;

S3. Remove feature variables with a high t-test p-value (greater than 0.2) in the model's statistical diagnostics, because such variables show no significant linear relationship with yield;

S4. For each candidate model, compute the adjusted coefficient of determination (adjusted R-squared), the Akaike information criterion (AIC), and the mean squared error on the test dataset. Compare these values and select the model that best explains and fits the original data.

The user estimates next year's demand for organic sorghum from next year's brewing plan and draws up a planting plan accordingly. To ensure that this demand is met, Step 6 monitors the basic planting attributes, natural environment data, and field management data continuously while the sorghum grows. The monitored values of the model's feature variables are entered on a web page, together with the current growth stage of the organic sorghum on the plot being evaluated. After reading the necessary parameters, a background task calls the yield prediction model for that growth stage to predict organic sorghum yield at regular intervals.

If the predicted value is lower than the user's demand or an empirical value, an early warning is issued to remind the user to remedy the abnormal situation promptly.

In Step 6, the purchased quantity is checked during grain purchasing. If it exceeds the predicted yield by more than 15%, the farmer may have borrowed grain from elsewhere or passed off inferior grain as qualified grain, and the model issues a system alert.

Compared with the prior art, the method for establishing an organic sorghum yield prediction model has the following features:

First, a separate training dataset and yield model are built for each of the three growth stages of organic sorghum. Each model can therefore learn, in a targeted way, how strongly the various indicators affect organic sorghum in its own period.

Second, a result-comparison function is added. When the user predicts the future yield of organic sorghum with the model and the result is below the user's demand or an empirical value, the user receives an early warning. The warning is a timely reminder to take remedial measures for the plot's insufficient yield.

Third, machine learning and statistical regression modeling give the customer a set of rules for assessing organic sorghum yield, in the form of the main factors that affect it. After rigorous screening, these factors are observable, adjustable, and amenable to simulation. They provide a benchmark and reference for later research on organic sorghum and other crops.

The embodiment described above is only one specific implementation of the present invention. Ordinary changes and substitutions made by those skilled in the art within the scope of the technical solution of the present invention shall fall within its protection scope.

Claims (8)

1.一种建立有机高粱产量预测模型的方法,其特征在于,包括以下步骤:1. a method for establishing an organic sorghum yield prediction model, is characterized in that, comprises the following steps: 第一步,构建训练数据集The first step is to build a training dataset 将有机高粱的成长周期分为种植初期,成长期和成熟期三个时期,并将种植的基础属性,种植时期的自然环境数据和田间管理数据的历史数据作为模型的初始解释变量,将种植地块的历史产量记录作为模型的响应变量,根据有机高粱的成长阶段分别汇总成三份模型训练数据集;The growth cycle of organic sorghum is divided into three periods: the initial planting period, the growing period and the mature period, and the basic attributes of planting, the natural environment data during the planting period and the historical data of field management data are used as the initial explanatory variables of the model. The historical yield records of the block were used as the response variable of the model, and were aggregated into three model training data sets according to the growth stage of organic sorghum; 第二步,数据预处理The second step is data preprocessing 对三份训练数据集进行预处理,包括观察数据集中的异样数据点,剔除可疑的人为错误录入的信息;Preprocess the three training data sets, including observing abnormal data points in the data set, and eliminating suspicious human error input information; 第三步,变量筛选The third step, variable filtering 为了提高模型的稳定程度与预测精度,对预处理完后的数据进行变量筛选,筛选出15个最重要的特征变量,并将特征变量进行共线性问题测试;In order to improve the stability and prediction accuracy of the model, the preprocessed data is filtered, and the 15 most important feature variables are screened out, and the feature variables are tested for collinearity; 第四步,训练模型The fourth step is to train the model 将有机高粱产量数设为响应变量,将其他影响有机高粱产量的特征变量作为模型的解释变量,以三份不同成长期的有机高粱的训练数据作为训练数据集对模型进行训练;The organic sorghum yield number was set as the response variable, and other characteristic variables that affected the organic sorghum yield were used as the explanatory variables of the model, and the training data of three organic sorghums in different growth stages were used as the training data set to train the model; 第五步,模型诊断The fifth step, model diagnosis 观察以及调整模型变量以及评测模型的可行性,不断循环调整直至达到建模者的要求以及评判参考系数为止;Observe 
and adjust the model variables and evaluate the feasibility of the model, and continuously adjust the cycle until the requirements of the modeler and the evaluation reference coefficient are met; 第六步,模型应用The sixth step, model application 将训练好的有机高粱产量在种植初期,成长期以及成熟期的三份模型分别保存到后台大数据集群中,方便后期用户调用。The three models of the trained organic sorghum yield in the early planting, growing and mature stages are saved to the backend big data cluster, which is convenient for users to call later. 2.根据权利要求1所述的建立有机高粱产量预测模型的方法,其特征在于:所述第一步中,训练数据集由以下三个部分组成:2. the method for establishing organic sorghum yield prediction model according to claim 1, is characterized in that: in the described first step, training data set is made up of following three parts: 种植的基础属性,包括土地的面积、土壤温度、土壤湿度、土壤酸碱度以及特殊化学物质含量信息,通过人工记录以及土壤检测仪的检测获得;The basic attributes of planting, including land area, soil temperature, soil moisture, soil pH and content of special chemical substances, are obtained through manual records and soil detector testing; 种植时期的自然环境数据,包括近期温度变化,光照时间,以及特殊自然灾害的影响程度,通过本地气象观测站以及地块所在基站来采集获得;The natural environment data during the planting period, including recent temperature changes, lighting time, and the degree of influence of special natural disasters, are collected through the local meteorological observation station and the base station where the plot is located; 田间管理数据,土地使用肥料量的记录,种子使用量,病虫草害的次数以及农技员现场支持次数,通过现场采集获得。Field management data, records of the amount of fertilizer used on the land, the amount of seeds used, the number of pests and weeds, and the number of on-site support by agricultural technicians were obtained through on-site collection. 3.根据权利要求1或2所述的建立有机高粱产量预测模型的方法,其特征在于:所述第二步中,当训练数据集中某列特征数据存在空置情况时,若空置列占总列的5%以下,则在空置列中选择填充相应列平均值,否则删除该列特征数据或者人工对空置列进行填充。3. 
the method for establishing organic sorghum yield prediction model according to claim 1 and 2, is characterized in that: in the described second step, when a certain column of characteristic data in the training data set has a vacancy situation, if the vacant column accounts for the total column 5% or less, choose to fill the average value of the corresponding column in the vacant column, otherwise delete the characteristic data of the column or manually fill the vacant column. 4.根据权利要求1或2所述的建立有机高粱产量预测模型的方法,其特征在于:所述第三步中,使用数理统计中的方差膨胀系数概念衡量在回归建模中变量的共线性严重程度,剔除方差膨胀系数超过10的变量,得到三份不同成长期的有机高粱的训练数据作为有机高粱在不同时期产量的训练数据集。4. the method for establishing organic sorghum yield prediction model according to claim 1 and 2, is characterized in that: in the described 3rd step, use the variance expansion coefficient concept in mathematical statistics to measure the collinearity of variable in regression modeling Severity, excluding variables with a variance expansion coefficient exceeding 10, three training data of organic sorghum in different growth stages were obtained as the training data set of organic sorghum yield in different periods. 5.根据权利要求4所述的建立有机高粱产量预测模型的方法,其特征在于:所述第四步中,采用Python语言中的Statsmodels专业回归建模扩展工具语言包对有机高粱产量预测模型进行编写,并将三份不同成长期的有机高粱的训练数据作为训练数据集对分别模型进行训练,从而得到三份不同成长期的有机高粱产量预测模型。5. the method for establishing organic sorghum yield prediction model according to claim 4, is characterized in that: in the described 4th step, adopt the Statsmodels professional regression modeling expansion tool language package in the Python language to carry out the organic sorghum yield prediction model. Write and use the training data of three organic sorghum in different growth stages as training data sets to train the respective models, thereby obtaining three organic sorghum yield prediction models in different growth stages. 6.根据权利要求1或4所述的建立有机高粱产量预测模型的方法,其特征在于:所述第五步中,对模型的变量调整及测评,包括以下步骤:6. 
The method for establishing an organic sorghum yield prediction model according to claim 1 or 4, characterized in that, in the fifth step, the adjustment of the model variables and the evaluation of the model comprise the following steps:
S1. Use residual diagnostic plots to observe how well the constructed model fits the data, examine outliers and highly influential sample points, and remove the highly influential outliers during model training;
S2. Draw GAM (generalized additive model) plots, as used in statistics, for the important variables in the model to assess whether the power to which each variable is fitted is reasonable, and add alternative model schemes;
S3. Remove feature variables whose t-test p-value in the model's statistical diagnostics is greater than 0.2, because such a high p-value indicates that there is no significant linear relationship between the feature variable and the yield;
S4. Calculate, for each candidate model, the adjusted coefficient of determination (adjusted R²), the Akaike information criterion (AIC) and the mean squared error obtained on the test data set, and finally compare them to select the model that best explains and fits the original data.
7.
The method for establishing an organic sorghum yield prediction model according to claim 2 or 6, characterized in that, in the sixth step, the basic planting attributes, the natural environment data for the planting period and the field management data are continuously monitored throughout the sorghum growth stages; the monitoring data are fed into the feature variables of the model, and the organic sorghum yield prediction model for the growth stage entered by the user is called to predict the organic sorghum yield periodically;
if the predicted value is less than the user's required quantity or an empirical value, an early-warning message is issued to remind the user to remedy the abnormal situation in time.
8. The method for establishing an organic sorghum yield prediction model according to claim 7, characterized in that, in the sixth step, if at the grain purchasing stage the purchased quantity exceeds the predicted yield by more than 15%, the farmer is considered to be possibly borrowing grain from others or passing off inferior grain as good grain, and the model issues a system reminder.
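The missing-value rule of claim 3 can be sketched as follows. This is a minimal illustration, not the applicant's implementation. It makes these assumptions:
- each feature column is a Python list, with None marking a vacant entry;
- "5% or less" is inclusive;
- columns above the threshold are only reported back, since the claim leaves the choice between deletion and manual filling to the modeler.

The function name handle_missing and the column names are hypothetical.

```python
from statistics import mean
from typing import Optional

MISSING_PERCENT_LIMIT = 5  # claim 3: a column with 5% or less missing entries is mean-filled


def handle_missing(
    columns: dict[str, list[Optional[float]]],
) -> tuple[dict[str, list[float]], list[str]]:
    """Apply the claim-3 rule to every feature column.

    Returns (cleaned, flagged): columns with no or few (<= 5%) missing entries,
    mean-filled where needed, plus the names of columns over the limit, which
    are left for the modeler to delete or fill manually.
    """
    cleaned: dict[str, list[float]] = {}
    flagged: list[str] = []
    for name, values in columns.items():
        present = [v for v in values if v is not None]
        missing = len(values) - len(present)
        # Integer comparison avoids floating-point error at exactly 5%.
        if present and missing * 100 <= MISSING_PERCENT_LIMIT * len(values):
            fill = mean(present)
            cleaned[name] = [fill if v is None else v for v in values]
        else:
            flagged.append(name)
    return cleaned, flagged


if __name__ == "__main__":
    data = {
        "soil_ph": [6.5] * 19 + [None],             # 1 of 20 missing (5%): mean-filled
        "fertilizer_kg": [10.0, None, 30.0, None],  # 2 of 4 missing (50%): flagged
    }
    cleaned, flagged = handle_missing(data)
    print(cleaned["soil_ph"][-1], flagged)
```

Comparing integer counts rather than a computed ratio keeps the boundary case (exactly 5%) on the mean-fill side.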
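The collinearity screen in claim 4 can be sketched with NumPy, assuming the features of one growth stage are held in a two-dimensional array with one column per variable. The VIF of variable j is 1 / (1 - R_j^2), where R_j^2 comes from regressing column j on the remaining columns plus an intercept.

The claim fixes only the threshold of 10. Removing the single worst variable and recomputing, rather than dropping every variable above 10 at once, is an assumption: a common practice, because collinear variables inflate each other's VIF. The function names are hypothetical.

```python
import numpy as np

VIF_LIMIT = 10.0  # claim 4: remove variables whose VIF exceeds 10


def vif(X: np.ndarray) -> np.ndarray:
    """VIF_j = 1 / (1 - R_j^2), with R_j^2 from regressing column j on the
    other columns plus an intercept."""
    n, k = X.shape
    scores = np.empty(k)
    for j in range(k):
        target = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, target, rcond=None)
        resid = target - others @ coef
        ss_tot = np.sum((target - target.mean()) ** 2)
        r2 = 1.0 - (resid @ resid) / ss_tot
        scores[j] = np.inf if r2 >= 1.0 else 1.0 / (1.0 - r2)
    return scores


def drop_collinear(X: np.ndarray, names: list[str], limit: float = VIF_LIMIT):
    """Drop the variable with the largest VIF, recompute, and repeat until
    every remaining VIF is <= limit (one-at-a-time removal is an assumption)."""
    names = list(names)
    while X.shape[1] > 1:
        scores = vif(X)
        worst = int(np.argmax(scores))
        if scores[worst] <= limit:
            break
        X = np.delete(X, worst, axis=1)
        del names[worst]
    return X, names
```

Statsmodels provides the same statistic as `statsmodels.stats.outliers_influence.variance_inflation_factor`. That function does not add an intercept itself, so a constant column must be included in its input.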
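Step S1 of claim 6 does not name a specific influence measure. The sketch below makes two assumptions:
- influence is measured by Cook's distance, computed with NumPy for an ordinary least squares fit with an intercept;
- points are flagged using the common 4/n rule of thumb.

Neither choice comes from the source, and the function names are hypothetical. In Statsmodels, a fitted OLS result exposes the same statistic through `get_influence().cooks_distance`.

```python
import numpy as np


def cooks_distance(X: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Cook's distance D_i = e_i^2 / (p * s^2) * h_ii / (1 - h_ii)^2 for an OLS fit."""
    A = np.column_stack([np.ones(len(y)), X])  # design matrix with intercept
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    n, p = A.shape
    s2 = resid @ resid / (n - p)  # residual variance estimate
    # Leverages h_ii = diag(A (A^T A)^-1 A^T); pinv(A) equals (A^T A)^-1 A^T for full-rank A.
    h = np.einsum("ij,ji->i", A, np.linalg.pinv(A))
    return resid**2 / (p * s2) * h / (1.0 - h) ** 2


def influential_points(X: np.ndarray, y: np.ndarray, cutoff=None) -> np.ndarray:
    """Indices whose Cook's distance exceeds `cutoff` (default: 4/n, an assumed rule of thumb)."""
    d = cooks_distance(X, y)
    limit = 4.0 / len(y) if cutoff is None else cutoff
    return np.flatnonzero(d > limit)
```

Flagged rows are candidates for removal before retraining. Whether to remove them remains the modeler's judgment after inspecting the residual plots, as S1 describes.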
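The monitoring and alert logic of claims 7 and 8 reduces to two comparisons. The sketch makes these assumptions:
- the three saved stage models are plain Python callables keyed by a stage label;
- the threshold passed to periodic_check stands for the user's required quantity or empirical value.

These representations, the labels and the function names are illustrative only.

```python
from typing import Callable, Mapping, Optional

# Hypothetical representation of a saved stage model: features in, predicted yield out.
StageModel = Callable[[Mapping[str, float]], float]

OVER_PURCHASE_RATIO = 0.15  # claim 8: purchase more than 15% above the predicted yield


def periodic_check(
    models: Mapping[str, StageModel],
    stage: str,
    features: Mapping[str, float],
    threshold: float,
) -> tuple[float, Optional[str]]:
    """Claim 7: call the model for the user-entered stage and warn when the
    prediction falls below the user's demand or empirical value (`threshold`)."""
    if stage not in models:
        raise KeyError(f"no model saved for stage {stage!r}")
    predicted = models[stage](features)
    if predicted < threshold:
        return predicted, (
            f"Early warning: predicted yield {predicted:.1f} is below "
            f"the expected {threshold:.1f}; check for abnormal conditions."
        )
    return predicted, None


def purchase_alert(purchased: float, predicted: float) -> bool:
    """Claim 8: flag a purchase that exceeds the predicted yield by more than 15%."""
    return purchased > predicted * (1 + OVER_PURCHASE_RATIO)
```

For example, a predicted yield of 1000 kg would trigger purchase_alert for a purchase of 1200 kg but not for 1100 kg.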
CN202010926712.1A 2020-09-07 2020-09-07 Method for establishing organic sorghum yield prediction model Active CN111985728B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010926712.1A CN111985728B (en) 2020-09-07 2020-09-07 Method for establishing organic sorghum yield prediction model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010926712.1A CN111985728B (en) 2020-09-07 2020-09-07 Method for establishing organic sorghum yield prediction model

Publications (2)

Publication Number Publication Date
CN111985728A CN111985728A (en) 2020-11-24
CN111985728B CN111985728B (en) 2025-01-17

Family

ID=73446970

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010926712.1A Active CN111985728B (en) 2020-09-07 2020-09-07 Method for establishing organic sorghum yield prediction model

Country Status (1)

Country Link
CN (1) CN111985728B (en)

Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20020120591A1 (en) * 2000-11-29 2002-08-29 International Business Machines Corporation Partial stepwise regression for data mining
US20060161403A1 (en) * 2002-12-10 2006-07-20 Jiang Eric P Method and system for analyzing data and creating predictive models
CN106408132A (en) * 2016-09-30 2017-02-15 深圳前海弘稼科技有限公司 Method and device of crop yield prediction based on plantation device
US20170270446A1 (en) * 2015-05-01 2017-09-21 360 Yield Center, Llc Agronomic systems, methods and apparatuses for determining yield limits
CN109242201A (en) * 2018-09-29 2019-01-18 上海中信信息发展股份有限公司 A kind of method, apparatus and computer readable storage medium for predicting crop yield
CN109325433A (en) * 2018-09-14 2019-02-12 东北农业大学 Multi-temporal remote sensing inversion method of soybean biomass in black soil area by introducing terrain factors
CN110309960A (en) * 2019-06-20 2019-10-08 武汉华电工研科技有限公司 A kind of big data intellectual analysis forecasting system
CN110414738A (en) * 2019-08-01 2019-11-05 吉林高分遥感应用研究院有限公司 A kind of crop yield prediction technique and system
CN110428107A (en) * 2019-08-06 2019-11-08 吉林大学 A kind of corn yield remote sensing prediction method and system
CN110443420A (en) * 2019-08-05 2019-11-12 山东农业大学 A kind of crop production forecast method based on machine learning
CN111461435A (en) * 2020-04-01 2020-07-28 中国农业科学院农业信息研究所 Crop yield prediction method and system
US20220308568A1 (en) * 2018-11-16 2022-09-29 Korea University Research And Business Foundation System and method for monitoring soil gas and performing responsive processing on basis of result of monitoring
KR102542661B1 (en) * 2022-07-29 2023-06-15 농업협동조합중앙회 Apparatus and method for providing user interface for crop production prediction

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113705657A (en) * 2021-08-24 2021-11-26 华北电力大学 Stepwise clustering statistical downscaling method for eliminating multiple collinearity based on difference method
CN113705657B (en) * 2021-08-24 2024-01-19 华北电力大学 Gradual clustering statistical downscaling method for eliminating multiple collinearity based on difference method
CN116579521A (en) * 2023-05-12 2023-08-11 中山大学 Yield prediction time window determining method, device, equipment and readable storage medium
CN116579521B (en) * 2023-05-12 2024-01-19 中山大学 Yield prediction time window determining method, device, equipment and readable storage medium
CN117217372A (en) * 2023-09-08 2023-12-12 湖北泰跃卫星技术发展股份有限公司 Method for predicting agricultural product yield and quantifying influence of agricultural activities on yield
CN117217372B (en) * 2023-09-08 2024-03-08 湖北泰跃卫星技术发展股份有限公司 Method for predicting agricultural product yield and quantifying influence of agricultural activities on yield

Also Published As

Publication number Publication date
CN111985728B (en) 2025-01-17

Similar Documents

Publication Publication Date Title
CN111985728A (en) Method for establishing organic sorghum yield prediction model
CN110555553B (en) Multi-factor comprehensive identification method for sudden drought
CN112070297A (en) Weather index prediction method, device, equipment and storage medium for farming activities
CN109615150B (en) Method and system for determining rice meteorological output
CN119006207B (en) An ecological assessment method and system for garden plant environmental monitoring
CN111754045A (en) Prediction system based on fruit tree growth
CN107103395B (en) Short-term early warning method for crop pests
CN115619583A (en) Construction method of composite agricultural meteorological disaster monitoring index system
CN119398963A (en) Corn cultivation environment monitoring system based on sensor
CN117784290A (en) A sudden drought early warning method and system based on Bayesian neural network
CN118627909B (en) Animal husbandry statistics monitoring system
CN118901368B (en) Orchard water and fertilizer regulation and control method and system
CN110046756A (en) Short-time weather forecasting method based on Wavelet Denoising Method and Catboost
CN117579784B (en) Remote video monitoring system for aquaculture farm
CN118779829A (en) An intelligent detection method for pesticide residues in vegetables
CN116757332B (en) Leafy vegetable yield prediction methods, devices, equipment and media
KR101859410B1 (en) Method, Apparatus for Regional Food Safety Factor Computing, And a Computer-readable Storage Medium for executing the Method
CN113762768B (en) Dynamic Risk Assessment Method of Agricultural Drought Based on Weather Generator and Crop Model
Cheeroo-Nayamuth Crop modelling/simulation: an overview
CN111681122A (en) Construction and application of summer corn drought influence evaluation model based on soil humidity
CN118229108B (en) Rapid analysis method for fertilization amount based on grass yield
CN100451882C (en) System and method for monitoring breed of edible fungus
CN117217372B (en) Method for predicting agricultural product yield and quantifying influence of agricultural activities on yield
CN118711069B (en) Adaptive weight adjustment corn water deficit diagnosis method and device
CN111667167B (en) Agricultural grain yield estimation method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Country or region after: China

Address after: No. 527 Dongyue Street, Daiyue District, Tai'an City, Shandong Province, 271000

Applicant after: INSPUR SOFTWARE Co.,Ltd.

Address before: No. 1036 Langchao Road, High-tech Zone, Jinan, Shandong

Applicant before: INSPUR SOFTWARE Co.,Ltd.

Country or region before: China

GR01 Patent grant