CN113591632A - Fruit tree yield estimation method, device and equipment based on multi-source data - Google Patents
- Publication number: CN113591632A (application CN202110809221.3A)
- Authority: CN (China)
- Prior art keywords: data, fruit tree, yield, estimation, source data
- Legal status: Pending (as listed by Google; an assumption, not a legal conclusion)
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/213—Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
- G06F18/2135—Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods based on approximation criteria, e.g. principal component analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/243—Classification techniques relating to the number of classes
- G06F18/24323—Tree-organised classifiers
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Artificial Intelligence (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Evolutionary Computation (AREA)
- Evolutionary Biology (AREA)
- Bioinformatics & Computational Biology (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Software Systems (AREA)
- Medical Informatics (AREA)
- Computing Systems (AREA)
- Mathematical Physics (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
The invention discloses a fruit tree yield estimation method based on multi-source data, comprising the following steps: obtaining multi-source data of a fruit tree to be estimated; calculating correlation coefficients among the multi-source data using a product-moment correlation algorithm; when any correlation coefficient is greater than a threshold, converting the multi-source data into a plurality of mutually uncorrelated principal components using a data dimensionality reduction method; and inputting these principal components into a pre-established estimation model to calculate the yield per mu (a Chinese unit of area; 1 mu ≈ 666.7 m²) of the fruit tree to be estimated. The method takes into account the growth characteristics of crops and the availability of data. It establishes different estimation models according to the data that can be acquired at different time points, so that each yield prediction uses the best-quality data available and model accuracy is improved.
Description
Technical Field
The invention relates to the technical field of crop growth monitoring, in particular to a fruit tree yield estimation method, device and equipment based on multi-source data.
Background
China is a major agricultural country, and agricultural development is crucial to people's livelihoods. Crop yield reflects the crop production capacity of a region: if the yield of a region can be predicted in advance, the market value of the crop can also be anticipated. Moreover, analyzing the key factors that control yield allows production costs such as irrigation, fertilization and labor to be managed flexibly, so that high yield is achieved at minimum cost. The spread of machine learning algorithms has greatly improved the accuracy of yield estimation, and smart agriculture has become a boon for farmers and the country.
Most existing fruit tree prediction models analyze and predict only from the physiological condition of the trees themselves. Existing approaches have four shortcomings:
- each relies on a single model and cannot make different predictions under different conditions;
- the algorithms compared are few, so accuracy cannot be improved by drawing on multiple algorithms;
- most algorithms chosen in existing research suit only the yield prediction of particular fruit trees;
- they therefore lack generality and extensibility.
Disclosure of Invention
The invention provides a fruit tree yield estimation method based on multi-source data. It aims to solve two problems of the prior art: the compared algorithms and the prediction model are each limited to a single choice, and the model generalizes poorly.
In order to achieve the purpose, the invention adopts the following technical scheme:
the invention discloses a fruit tree yield estimation method based on multi-source data, which comprises the following steps:
acquiring multi-source data of a fruit tree to be estimated;
calculating correlation coefficients among the multi-source data according to a product moment correlation algorithm;
when any correlation coefficient is greater than a threshold, converting the multi-source data into a plurality of mutually uncorrelated principal components using a data dimensionality reduction method;
and inputting the mutually uncorrelated principal components into a pre-established yield estimation model to calculate the yield per mu of the fruit tree to be estimated.
The method obtains the vegetation indices EVI and RVI, meteorological data and geographic environment data by remote sensing and geographic information technology, and divides the data into data sets for different time periods. Correlation coefficients among the data in each data set are calculated with the product-moment correlation algorithm. When any coefficient exceeds a set threshold, the data are reduced in dimension as follows:
- the data are converted into dimensionless sample data by standardization;
- principal component analysis converts the correlated sample data into uncorrelated principal components;
- the principal components are fitted with several algorithms to generate a plurality of sub-estimation models: ridge regression, Bayesian regression, linear-kernel support vector machine regression, Gaussian process regression, random forest, bagging of SVR (support vector regression), gradient tree boosting regression, a voting regressor, and the like;
- the sub-estimation models are evaluated by cross-validation to find the best-performing one for each data set, and the algorithm that generated it is taken as the optimal algorithm for that data set;
- the optimal algorithm is trained on the whole data set to obtain the final estimation model for each data set;
- a suitable final estimation model is selected to estimate fruit tree yield according to the time period covered by the new data.

The method jointly considers the topographic, climatic and remote sensing characteristics of the key growth periods of crops. It is therefore well suited to predicting fruit tree yield in a specific area and can also be applied to predicting the yield of other crops.
Preferably, converting the multi-source data into a plurality of mutually uncorrelated principal components using a data dimensionality reduction method when any correlation coefficient is greater than a threshold includes:
converting the multi-source data into sample data of uniform dimension (i.e., dimensionless) using a data standardization method;
and, when any correlation coefficient is greater than the threshold, applying an orthogonal transformation to the standardized sample data by principal component analysis to obtain a plurality of mutually uncorrelated principal components, each principal component being a linear combination of the sample variables.
Preferably, inputting the mutually uncorrelated principal components into a pre-established yield estimation model and calculating the yield per mu of the fruit tree to be estimated comprises:
obtaining N different data sets and fitting each of them with a plurality of regression algorithms to obtain M sub-estimation models, where N and M are integers greater than 1 and M is greater than N;
evaluating the M sub-estimation models by cross-validation, determining the optimal sub-estimation model for each data set from the evaluation results, and identifying the regression algorithm that generated each optimal sub-estimation model;
training that regression algorithm on the whole of the corresponding data set to obtain N candidate estimation models, and selecting one of the N candidate estimation models as the target estimation model;
and inputting the mutually uncorrelated principal components into the target estimation model to obtain the yield per mu of the fruit tree to be estimated.
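The text says only that the target model is selected according to the time period covered by the new data. The Python sketch below illustrates that selection step with two candidate models. The month windows are hypothetical, loosely based on the hickory example in step S440: Model I from November to July, Model II from August to October. The names `MODEL_WINDOWS` and `select_target_model` are illustrative only.

```python
from datetime import date

# Hypothetical usage windows (months 1-12) for the two candidate models.
# Model I relies on historical data and is assumed usable from November to July;
# Model II relies on current-season data (e.g. August EVI/RVI, flowering-period
# weather) and is assumed usable from August to October.
MODEL_WINDOWS = {
    "model_I": {11, 12, 1, 2, 3, 4, 5, 6, 7},
    "model_II": {8, 9, 10},
}


def select_target_model(prediction_date: date) -> str:
    """Return the name of the candidate model whose data window covers the date."""
    for name, months in MODEL_WINDOWS.items():
        if prediction_date.month in months:
            return name
    raise ValueError(f"no candidate model covers month {prediction_date.month}")
```

For example, `select_target_model(date(2021, 3, 15))` returns `"model_I"`. A real deployment would replace the month sets with whatever windows the data actually support.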
Preferably, obtaining N different data sets and fitting each of them with a plurality of regression algorithms to obtain M sub-estimation models (N and M integers greater than 1, M greater than N) includes:
acquiring N different data sets and dividing each data set into a training set and a validation set;
and performing feature learning on each training set with each of the plurality of regression algorithms to obtain the M sub-estimation models, and testing the accuracy of each sub-estimation model on the corresponding validation set.
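One reading of the N/M bookkeeping: if every one of K regression algorithms is fitted to every one of N data sets, then M = N × K. This matches step S440, where 2 data sets × 8 algorithms give 16 sub-models. Below is a minimal NumPy sketch of the per-data-set split and the enumeration. The 25% validation fraction, the seed and the function names are hypothetical.

```python
import numpy as np


def split_dataset(X, y, val_fraction=0.25, seed=0):
    """Randomly split one data set into a training set and a validation set."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))                  # shuffle sample indices
    n_val = max(1, int(round(val_fraction * len(y))))
    val_idx, train_idx = idx[:n_val], idx[n_val:]
    return X[train_idx], y[train_idx], X[val_idx], y[val_idx]


def enumerate_sub_models(dataset_names, algorithm_names):
    """Pair every data set with every algorithm, giving M = N * K sub-estimation models."""
    return [(d, a) for d in dataset_names for a in algorithm_names]
```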
A fruit tree yield estimation device based on multi-source data comprises:
the acquisition module is used for acquiring multi-source data of the fruit tree to be estimated;
the calculation module is used for calculating correlation coefficients among the multi-source data according to a product moment correlation algorithm;
the conversion module is used for converting the multi-source data into a plurality of mutually uncorrelated principal components according to a data dimensionality reduction method when any correlation coefficient is greater than a threshold;
and the yield estimation module is used for inputting the mutually uncorrelated principal components into a pre-established yield estimation model and calculating the yield per mu of the fruit tree to be estimated.
Preferably, the conversion module includes:
the processing unit is used for converting the multi-source data into sample data of uniform dimension according to a data standardization method when any correlation coefficient is greater than a threshold;
and the transformation unit is used for applying an orthogonal transformation to the standardized sample data according to principal component analysis to obtain a plurality of mutually uncorrelated principal components, each principal component being a linear combination of the sample variables.
Preferably, the yield estimation module comprises:
the training unit is used for obtaining N different data sets and fitting each of them with a plurality of regression algorithms to obtain M sub-estimation models, where N and M are integers greater than 1 and M is greater than N;
the evaluation unit is used for evaluating the M sub-estimation models by cross-validation, determining the optimal sub-estimation model for each data set from the evaluation results, and identifying the regression algorithm that generated it;
the selection unit is used for training the regression algorithm that generated the optimal sub-estimation model on the whole of the corresponding data set to obtain N candidate estimation models, and selecting one of them as the target estimation model;
and the prediction unit is used for inputting the mutually uncorrelated principal components into the target estimation model to obtain the yield per mu of the fruit tree to be estimated.
Preferably, the training unit includes:
the dividing subunit is used for acquiring N different data sets and dividing each data set into a training set and a validation set;
and the learning subunit is used for performing feature learning on each training set with each of the plurality of regression algorithms to obtain M sub-estimation models and testing their accuracy on the corresponding validation sets, where N and M are integers greater than 1 and M is greater than N.
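The module structure above can be sketched as a small Python composition. This is an illustrative sketch, not the patented device:
- each module is modeled as a plain callable;
- the class name `YieldEstimationDevice` is hypothetical;
- the default threshold of 0.3 is taken from Example 4.

Following the method claim, the conversion module runs only when some pair of variables exceeds the threshold.

```python
from dataclasses import dataclass
from typing import Callable

import numpy as np


@dataclass
class YieldEstimationDevice:
    """Hypothetical composition of the four modules of the device."""
    acquire: Callable[[str], np.ndarray]           # acquisition module
    correlate: Callable[[np.ndarray], np.ndarray]  # calculation module (correlation matrix)
    convert: Callable[[np.ndarray], np.ndarray]    # conversion module (e.g. standardize + PCA)
    estimate: Callable[[np.ndarray], np.ndarray]   # yield estimation module
    threshold: float = 0.3

    def run(self, region: str) -> np.ndarray:
        data = self.acquire(region)
        r = self.correlate(data)
        off_diag = r[~np.eye(r.shape[0], dtype=bool)]  # ignore self-correlations
        if np.any(np.abs(off_diag) > self.threshold):
            data = self.convert(data)                  # decorrelate only when needed
        return self.estimate(data)
```

Keeping each module injectable makes it easy to swap in, for example, a different dimensionality-reduction method for the conversion slot without touching the rest.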
An electronic device comprises a memory and a processor. The memory stores one or more computer instructions which, when executed by the processor, implement the fruit tree yield estimation method based on multi-source data described in any one of the above.
A computer-readable storage medium stores a computer program which, when executed by a computer, implements the fruit tree yield estimation method based on multi-source data described in any one of the above.
The invention has the following beneficial effects:
(1) the data sets in the scheme include environmental data acquired with geospatial technology and annual yield data from statistical surveys. Using geographic information and remote sensing technology, the topographic, climatic and remote sensing characteristics of the key growth periods of crops are jointly considered. The data sets therefore better capture the influence of environmental factors and are well suited to predicting fruit tree yield in a specific area;
(2) taking into account the growth characteristics of crops and the availability of data, the scheme establishes two different models from the different data that can be acquired at different time points, so that each yield prediction uses the best-quality data available;
(3) the scheme fits the data of each of the two models with 8 algorithms, evaluates the fits by cross-validation, and finally selects the algorithm best suited to each data set to generate the corresponding model;
(4) the whole process can be extended to yield prediction for other crops, with the most suitable algorithm selected from the data specific to each crop.
Drawings
FIG. 1 is a first flowchart of a method for fruit tree yield estimation based on multi-source data according to an embodiment of the present invention;
FIG. 2 is a second flowchart of a fruit tree yield estimation method based on multi-source data according to an embodiment of the present invention;
FIG. 3 is a third flowchart of a method for fruit tree yield estimation based on multi-source data according to an embodiment of the present invention;
FIG. 4 is a flowchart illustrating an embodiment of a fruit tree yield estimation method based on multi-source data according to the present invention;
FIG. 5 is a schematic diagram of a fruit tree yield estimation apparatus based on multi-source data according to an embodiment of the present invention;
FIG. 6 is a schematic diagram of the conversion module of a fruit tree yield estimation device based on multi-source data according to an embodiment of the present invention;
FIG. 7 is a schematic diagram of the yield estimation module of a fruit tree yield estimation device based on multi-source data according to an embodiment of the present invention;
FIG. 8 is a block diagram of a fruit tree yield estimation device based on multi-source data according to an embodiment of the present invention;
FIG. 9 is a schematic diagram of an electronic device for implementing a fruit tree yield estimation method based on multi-source data according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The terms "first", "second" and the like in the claims and description of the present application are used to distinguish between similar elements and not necessarily to describe a particular sequential or chronological order. It should be understood that terms so used are interchangeable under appropriate circumstances and merely distinguish between similar elements in the embodiments of the present application. The terms "comprising" and "having" and any variations thereof are intended to cover a non-exclusive inclusion. A process, method, system, article or apparatus that comprises a list of elements is therefore not necessarily limited to those elements and may include other elements not expressly listed or inherent to it.
Example 1
As shown in fig. 1, a fruit tree yield estimation method based on multi-source data includes the following steps:
S110, obtaining multi-source data of the fruit tree to be estimated;
S120, calculating correlation coefficients among the multi-source data according to a product-moment correlation algorithm;
S130, when any correlation coefficient is greater than a threshold, converting the multi-source data into a plurality of mutually uncorrelated principal components according to a data dimensionality reduction method;
and S140, inputting the mutually uncorrelated principal components into a pre-established yield estimation model and calculating the yield per mu of the fruit tree to be estimated.
In Example 1, the product-moment correlation algorithm, also known as the Pearson correlation coefficient, measures the degree of linear correlation between two variables X and Y, and its value lies between -1 and 1. It is computed by the product-moment method. The deviation of each variable from its own mean is taken and the two deviations are multiplied. The normalized sum of these products reflects the degree of correlation between the two variables.
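Written out, the coefficient is r = Σ(xᵢ − x̄)(yᵢ − ȳ) / √(Σ(xᵢ − x̄)² · Σ(yᵢ − ȳ)²). A dependency-free Python sketch of this formula follows; the function name `pearson_r` is our own.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Product-moment (Pearson) correlation coefficient of two equal-length samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    dx = [xi - mx for xi in x]                   # deviations of X from its mean
    dy = [yi - my for yi in y]                   # deviations of Y from its mean
    num = sum(a * b for a, b in zip(dx, dy))     # sum of deviation products
    den = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    if den == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return num / den
```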
In this scheme, three kinds of data are divided into different data sets by time period:
- environmental data acquired with geospatial technology;
- historical yield data from statistical surveys;
- multi-source remote sensing data acquired with geographic information and remote sensing technology.

The data sets are then processed as follows:
- correlation coefficients among the data in each data set are calculated with the product-moment correlation algorithm;
- standardization converts the raw data into dimensionless sample data;
- when any correlation coefficient exceeds the set threshold, principal component analysis reduces the dimension of the sample data, converting the correlated data into mutually uncorrelated principal components;
- these are fitted with ridge regression, Bayesian regression, linear-kernel support vector machine regression, Gaussian process regression, random forest, bagging of SVR, gradient tree boosting regression, a voting regressor and similar algorithms to obtain a plurality of sub-estimation models;
- cross-validation evaluates their performance and determines the best sub-estimation model for each data set, and the algorithm that generated it is taken as the optimal algorithm for that data set;
- the optimal algorithm is trained on the whole data set to obtain the final estimation model for each data set;
- a suitable final estimation model is selected to estimate fruit tree yield according to the time period covered by the new data.

Because the scheme jointly considers the topographic, climatic and remote sensing characteristics of the key growth periods of crops, it is well suited to predicting fruit tree yield in a specific area. It is also applicable to yield prediction for other crops.
Example 2
As shown in fig. 2, a fruit tree yield estimation method based on multi-source data includes:
S210, obtaining multi-source data of the fruit tree to be estimated;
S220, calculating correlation coefficients among the multi-source data according to a product-moment correlation algorithm;
S230, converting the multi-source data into sample data of uniform dimension according to a data standardization method;
S240, when any correlation coefficient is greater than a threshold, applying an orthogonal transformation to the standardized sample data according to principal component analysis to obtain a plurality of mutually uncorrelated principal components, each principal component being a linear combination of the sample variables;
and S250, inputting the mutually uncorrelated principal components into a pre-established yield estimation model and calculating the yield per mu of the fruit tree to be estimated.
In Example 2, data standardization rescales each variable to a dimensionless form with zero mean and unit variance. This removes the units of each variable and also helps the subsequent algorithms fit better, because many algorithms work best on data that are centered and on a common scale. If the variables have different dimensions (that is, units), variables measured on a large scale can implicitly receive a large weight, whereas each variable should initially carry the same weight. Standardization is therefore an important step. It changes only the location and scale of a variable, not the shape of its distribution. The data standardization used in this scheme is a common prior-art method and is not described further here.
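The text does not give a formula. Z-score standardization, z = (x − mean) / standard deviation, is the usual choice and matches the description of converting variables to dimensionless values. A minimal NumPy sketch under that assumption (using the sample standard deviation, `ddof=1`, is also an assumption):

```python
import numpy as np


def standardize(X) -> np.ndarray:
    """Z-score each column: subtract its mean and divide by its standard deviation."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    std = X.std(axis=0, ddof=1)                  # sample standard deviation per column
    if np.any(std == 0):
        raise ValueError("a constant column cannot be standardized")
    return (X - mean) / std                      # dimensionless, zero mean, unit variance
```

Two columns that differ only in units, such as millimetres and metres of the same quantity, map to identical standardized columns.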
Data dimensionality reduction lowers the number of dimensions of the data, increasing sample density and removing noise. There are two main approaches:
- Feature selection: a subset of the original dimensions is selected directly and used in place of all dimensions for subsequent computation and modeling, so no new dimensions are created. Specific methods include empirical methods, measurement algorithms, statistical analysis and machine learning.
- Feature extraction: data points in a high-dimensional space are mapped into a low-dimensional space by some mathematical transformation, and the mapped variables are then used to express the original overall features.

Principal component analysis (PCA) is a statistical procedure that uses an orthogonal transformation to convert observations of a set of possibly correlated variables (entities, each taking various numerical values) into values of linearly uncorrelated variables called principal components. If there are n observations of p variables, the number of distinct principal components is min(n-1, p). The transformation gives the first principal component the largest possible variance, so that it accounts for as much of the variability in the data as possible. Each succeeding component has the largest possible variance under the constraint that it is orthogonal to the preceding components. The resulting vectors (each a linear combination of the variables, containing n observations) form an uncorrelated orthogonal basis set. In this scheme, PCA transforms the correlated multi-source data into mutually uncorrelated principal components, so that the data suit the fitting algorithms better and the algorithms can perform to their full potential.
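A compact way to compute the principal components is an eigendecomposition of the covariance matrix of the centered data. When the input has already been standardized, that matrix is the correlation matrix. The sketch below assumes NumPy and keeps min(n − 1, p) components by default. The optional `n_components` truncation is a hypothetical parameter, since the text does not state how many components are kept.

```python
from typing import Optional

import numpy as np


def pca_scores(Z, n_components: Optional[int] = None):
    """Project data Z (n x p) onto its principal components.

    Returns (scores, explained_variance_ratio). Components are ordered by
    decreasing variance, so the first one explains the most variability.
    """
    Z = np.asarray(Z, dtype=float)
    Zc = Z - Z.mean(axis=0)                      # PCA works on centered columns
    cov = np.cov(Zc, rowvar=False)               # p x p covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)       # symmetric input; ascending eigenvalues
    order = np.argsort(eigvals)[::-1]            # reorder by descending variance
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    k = min(Z.shape[0] - 1, Z.shape[1]) if n_components is None else n_components
    scores = Zc @ eigvecs[:, :k]                 # each column is a linear combination of variables
    ratio = eigvals[:k] / eigvals.sum()
    return scores, ratio
```

The covariance matrix of the returned scores is diagonal, which is exactly the "mutually uncorrelated" property the method relies on.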
Example 3
As shown in fig. 3, a fruit tree yield estimation method based on multi-source data includes:
S310, acquiring multi-source data of the fruit tree to be estimated;
S320, calculating correlation coefficients among the multi-source data according to a product-moment correlation algorithm;
S330, when any correlation coefficient is greater than a threshold, converting the multi-source data into a plurality of mutually uncorrelated principal components according to a data dimensionality reduction method;
S340, acquiring N different data sets and dividing each data set into a training set and a validation set;
S350, performing feature learning on each training set with each of a plurality of regression algorithms to obtain M sub-estimation models, where N and M are integers greater than 1 and M is greater than N;
S360, evaluating the M sub-estimation models by cross-validation, determining the optimal sub-estimation model for each data set from the evaluation results, and identifying the regression algorithm that generated it;
S370, training that regression algorithm on the whole of the corresponding data set to obtain N candidate estimation models, and selecting one of the N candidate estimation models as the target estimation model;
and S380, inputting the mutually uncorrelated principal components into the target estimation model to obtain the yield per mu of the fruit tree to be estimated.
In Example 3, the data sets obtained are pre-processed in three steps:
- calculating correlation coefficients among the raw data;
- converting the raw data into sample data of uniform dimension;
- when any correlation coefficient is greater than the threshold, converting the sample data into mutually uncorrelated principal components by principal component analysis.

Fitting establishes an effective functional relationship between the dependent variable and the independent variables. Fitting the variables with different algorithms makes it possible to identify the model that best describes the cause-and-effect relationship among them, improving estimation accuracy.
The basic idea of cross-validation is to partition the original data: one part is used as the training set and the other as the test set. A model is first trained on the training set and then tested on the test set, and the test result serves as the model's performance index. In this scheme, each sub-estimation model is cross-validated to evaluate its performance and find the best-performing sub-estimation model for each data set. The algorithm that generated that model is taken as the optimal algorithm for the corresponding data set. The whole data set is then trained with the optimal algorithm to obtain its final estimation model. Fitting followed by cross-validation selects the algorithm that trains best on this scheme's data, which improves how well the algorithm matches the acquired data and thus the accuracy of the final estimation model.
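Below is a minimal k-fold cross-validation sketch in NumPy. To stay dependency-light it compares only closed-form ridge regression at different penalty strengths, standing in for the eight algorithms. The 5 folds, the RMSE metric and all function names are assumptions, since the text names neither a fold count nor a specific performance index.

```python
import numpy as np


def ridge_fit(X, y, alpha):
    """Closed-form ridge regression with an unpenalized intercept."""
    Xa = np.column_stack([np.ones(len(X)), X])   # prepend intercept column
    penalty = alpha * np.eye(Xa.shape[1])
    penalty[0, 0] = 0.0                          # do not shrink the intercept
    return np.linalg.solve(Xa.T @ Xa + penalty, Xa.T @ y)


def ridge_predict(coef, X):
    return coef[0] + X @ coef[1:]


def kfold_rmse(X, y, alpha, k=5, seed=0):
    """Mean validation RMSE over k folds for ridge with the given alpha."""
    idx = np.random.default_rng(seed).permutation(len(y))
    folds = np.array_split(idx, k)
    errors = []
    for i in range(k):
        val = folds[i]                           # fold i is held out for testing
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        coef = ridge_fit(X[train], y[train], alpha)
        resid = y[val] - ridge_predict(coef, X[val])
        errors.append(np.sqrt(np.mean(resid ** 2)))
    return float(np.mean(errors))
```

The candidate with the lowest mean RMSE would play the role of the optimal algorithm. It would then be refitted on the whole data set with `ridge_fit`.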
Example 4
As shown in fig. 4, a fruit tree yield estimation method based on multi-source data includes:
s410, acquiring a vegetation index, meteorological data and geographic environment data of a fruit tree to be estimated;
In this embodiment, hickory is taken as an example. The data collected are:
- historical hickory yield data for 16 regions;
- the enhanced vegetation index EVI and the ratio vegetation index RVI, acquired by remote sensing;
- meteorological data;
- geographic environment data.

S420, calculating correlation coefficients among the acquired data according to the product-moment correlation algorithm, and visualizing them with a heat map;
the correlation coefficients among the data in each data set are calculated with the product-moment correlation algorithm, i.e., the Pearson correlation coefficient, and plotted as a heat map, which visualizes the relationships among the data and makes them easy to inspect. For example, in the first data set, altitude and longitude are strongly negatively correlated (-0.82), whereas the correlation between the previous year's precipitation and the previous year's yield is weak (-0.05).
S430, when any correlation coefficient is greater than a threshold, converting the multi-source data into a plurality of mutually uncorrelated principal components according to a data dimensionality reduction method;
correlation between the variables affects the construction of the subsequent model, so the variables need to be converted into mutually uncorrelated data in two steps:
- the variables are first standardized: all variables with different units are converted into dimensionless quantities, so that every variable becomes unitless, like the vegetation indices EVI and RVI;
- an orthogonal transformation then converts the values of the correlated variables into values of linearly uncorrelated variables, i.e., the dimensionless quantities are processed by principal component analysis to obtain mutually uncorrelated principal components.

In this process some variables are dropped, so that as much information as possible is expressed with fewer variables; in this scheme, 10 variables are reduced to 8.
In addition, in this scheme the threshold may be an absolute correlation of 0.3. When the absolute value of any calculated correlation coefficient is greater than 0.3, the correlation between the two variables is considered high, and data standardization and principal component analysis are then applied to the acquired multi-source data.
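The threshold rule of this step can be sketched as follows, assuming NumPy. The sketch checks every off-diagonal Pearson coefficient against |r| > 0.3. Only if one exceeds it does it standardize the data and project it onto principal components, as described above. The function names are hypothetical and all components are kept.

```python
import numpy as np


def needs_decorrelation(X, threshold: float = 0.3) -> bool:
    """True if any pair of distinct columns has |Pearson r| > threshold."""
    r = np.corrcoef(np.asarray(X, dtype=float), rowvar=False)
    off_diagonal = r[~np.eye(r.shape[0], dtype=bool)]   # drop the r = 1 diagonal
    return bool(np.any(np.abs(off_diagonal) > threshold))


def preprocess(X, threshold: float = 0.3) -> np.ndarray:
    """Standardize and project onto principal components only when needed."""
    X = np.asarray(X, dtype=float)
    if not needs_decorrelation(X, threshold):
        return X                                         # already weakly correlated
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)     # z-score standardization
    eigvals, eigvecs = np.linalg.eigh(np.cov(Z, rowvar=False))
    return Z @ eigvecs[:, np.argsort(eigvals)[::-1]]     # principal component scores
```

For the first data set's altitude and longitude pair, |-0.82| > 0.3, so `needs_decorrelation` would return `True` and the data would be decorrelated.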
S440, obtaining N different data sets and fitting each of them with a plurality of regression algorithms to obtain M sub-estimation models, where N and M are integers greater than 1 and M is greater than N;
Two data sets covering different time periods are acquired. The first data set contains data available from November of the previous year to August of the following year, such as yield per mu and the July RVI and EVI; the second data set contains data obtained from August to November of the current year, such as the August RVI and EVI, the current-year yield per mu, and the number of sunny days during the flowering period. The two data sets correspond to two prediction models for different time periods, Model I and Model II. Model I uses historical data, including the July EVI and the precipitation and number of sunny days during the flowering period, to predict the following year's yield per mu during the period from November to August of the following year; Model II replaces the historical data with updated data, including the current year's flowering-period precipitation and sunny days, to predict the current year's yield per mu during the period from August to November of the same year.
The correlation coefficients among the data are calculated and the data are converted to a uniform dimension; when any correlation coefficient is greater than the set threshold, the dimensionless data are converted by principal component analysis into mutually uncorrelated principal component data. Eight different algorithms are then selected to fit the uncorrelated data: ridge regression, Bayesian regression, linear-kernel support vector regression (SVR), Gaussian process regression, random forest, bagging of SVR, gradient tree boosting regression, and a voting regressor. The choice of these eight algorithms is not fixed and can be changed as preferred; they were chosen because they are representative, mainstream algorithms. Each of the eight algorithms fits the principal component data of each data set, i.e., learns the relationship between the uncorrelated data and the yield, and each such relationship can be expressed as a model; with two data sets and eight algorithms, sixteen sub-estimation models are obtained.
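The following sketch shows how N data sets and K algorithms combine into M = N × K sub-estimation models (2 × 8 = 16 in this scheme). To stay self-contained it uses two hand-written stand-ins, ridge regression and a mean-yield baseline, instead of the eight algorithms listed above; in scikit-learn those eight would typically correspond to classes such as `Ridge`, `BayesianRidge`, `SVR(kernel="linear")`, `GaussianProcessRegressor`, `RandomForestRegressor`, `BaggingRegressor` wrapping `SVR`, `GradientBoostingRegressor` and `VotingRegressor`. All function names, data-set keys and toy values are hypothetical.

```python
def solve(a, b):
    """Solve the linear system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_ridge(X, y, alpha=1.0):
    """Ridge regression with an unpenalized intercept; returns a predict(x) function."""
    n, p = len(X), len(X[0])
    xm = [sum(r[j] for r in X) / n for j in range(p)]
    ym = sum(y) / n
    Xc = [[r[j] - xm[j] for j in range(p)] for r in X]
    # Normal equations on centred data: (Xc^T Xc + alpha I) w = Xc^T (y - ym)
    A = [[sum(r[i] * r[j] for r in Xc) + (alpha if i == j else 0.0) for j in range(p)]
         for i in range(p)]
    b = [sum(r[i] * (t - ym) for r, t in zip(Xc, y)) for i in range(p)]
    w = solve(A, b)
    b0 = ym - sum(wj * mj for wj, mj in zip(w, xm))
    return lambda x: b0 + sum(wj * xj for wj, xj in zip(w, x))


def fit_mean(X, y):
    """Baseline that always predicts the mean training yield."""
    mu = sum(y) / len(y)
    return lambda x: mu


# Stand-ins for the eight algorithms named in the text (hypothetical keys).
ALGORITHMS = {"ridge": fit_ridge, "mean_baseline": fit_mean}


def fit_all(datasets, algorithms=ALGORITHMS):
    """Fit every algorithm to every data set: N data sets x K algorithms = M sub-models."""
    return {
        (ds_name, alg_name): fit(X, y)
        for ds_name, (X, y) in datasets.items()
        for alg_name, fit in algorithms.items()
    }


if __name__ == "__main__":
    # Hypothetical toy data: rows of principal-component scores and yields per mu.
    datasets = {
        "set_I": ([[0.1, -0.2], [0.4, 0.3], [-0.5, 0.1], [0.0, 0.2]], [410, 455, 380, 430]),
        "set_II": ([[0.2, 0.0], [-0.3, 0.4], [0.5, -0.1], [0.1, 0.1]], [440, 400, 470, 435]),
    }
    sub_models = fit_all(datasets)
    print(len(sub_models))  # 4 = 2 data sets x 2 algorithms
```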
S450, evaluating the M sub-estimation models by cross-validation, determining from the evaluation results the optimal sub-estimation model for each data set and the regression algorithm that generated it, and training that regression algorithm on the whole of the corresponding data set to obtain N candidate estimation models;
In this scheme, each of the sixteen sub-estimation models is cross-validated, i.e., trained and validated repeatedly on different splits of its data set. This mitigates both the influence of a single unusually large test error and the shortage of training data, making the final estimation model more accurate. Repeated validation identifies the sub-estimation model with the highest accuracy; the algorithm behind it is taken as the optimal algorithm for the corresponding data set, and that algorithm is then trained on the whole data set to obtain the final estimation model for that data set. In this scheme, the optimal algorithm for both prediction models is Bayesian regression, and the two resulting models are Model I and Model II.
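Cross-validated selection followed by overall training can be sketched as below. It rests on stated assumptions: k-fold cross-validation with interleaved folds, root-mean-square error (RMSE) as the accuracy measure, and two stand-in algorithms (k-nearest-neighbour regression and a mean baseline) in place of the eight listed earlier, among which the text reports Bayesian regression as the winner for both data sets. All names are hypothetical.

```python
import math


def fit_mean(X, y):
    """Stand-in baseline: always predicts the mean training yield."""
    mu = sum(y) / len(y)
    return lambda x: mu


def fit_knn(X, y, k=3):
    """Stand-in k-nearest-neighbour regressor (Euclidean distance)."""
    pairs = list(zip(X, y))

    def predict(x):
        nearest = sorted(pairs, key=lambda p: math.dist(p[0], x))[:k]
        return sum(t for _, t in nearest) / len(nearest)

    return predict


def kfold_rmse(fit, X, y, k=5):
    """RMSE over k interleaved folds (fold f holds rows f, f+k, f+2k, ...)."""
    n = len(y)
    sq_err, count = 0.0, 0
    for f in range(k):
        test_idx = list(range(f, n, k))
        if not test_idx:
            continue
        held_out = set(test_idx)
        train = [i for i in range(n) if i not in held_out]
        model = fit([X[i] for i in train], [y[i] for i in train])
        for i in test_idx:
            sq_err += (model(X[i]) - y[i]) ** 2
            count += 1
    return math.sqrt(sq_err / count)


def select_and_refit(datasets, algorithms, k=5):
    """Per data set: keep the algorithm with the lowest CV error, then retrain it on all rows."""
    chosen = {}
    for name, (X, y) in datasets.items():
        scores = {alg: kfold_rmse(fit, X, y, k) for alg, fit in algorithms.items()}
        best = min(scores, key=scores.get)
        chosen[name] = (best, scores[best], algorithms[best](X, y))
    return chosen


if __name__ == "__main__":
    # Hypothetical toy data set in which yield rises with a single component score.
    X = [[float(i)] for i in range(20)]
    y = [2.0 * i for i in range(20)]
    result = select_and_refit({"toy": (X, y)}, {"knn": fit_knn, "mean": fit_mean})
    best, cv_rmse, model = result["toy"]
    print(best, round(cv_rmse, 2), model([10.0]))
```

Refitting the winning algorithm on the full data set, rather than reusing one of the fold models, mirrors the "overall training" step and lets the final model see every sample.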
S460, selecting one of the N candidate estimation models as the target estimation model, and inputting the plurality of mutually uncorrelated principal component data into the target estimation model for calculation to obtain the yield per mu of the fruit tree to be estimated.
The acquired vegetation indexes, meteorological data and geographic environment data are divided by time period. If the acquired data belong to the first data set, Model I is applied to the mutually uncorrelated principal component data to predict the next-year yield of the fruit tree to be estimated; if they belong to the second data set, Model II is applied to predict the current-year yield. Using different models for different time periods improves the accuracy and predictive capability of the estimation.
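Routing by time period can be sketched as follows. The month boundaries are an assumption: the text gives overlapping windows (November to August for Model I, August to November for Model II), so this sketch assigns August to October to Model II and November to July to Model I. The model labels and the function name are hypothetical.

```python
from datetime import date


def route(today: date):
    """Return (model label, harvest year being predicted) for a prediction date.

    Assumed windows: Aug-Oct -> Model II predicts the current year's yield;
    Nov-Jul -> Model I predicts the yield of the coming harvest.
    """
    if 8 <= today.month <= 10:
        return "model_II", today.year
    # In November/December the coming harvest falls in the next calendar year.
    return "model_I", (today.year + 1 if today.month >= 11 else today.year)


print(route(date(2021, 12, 1)))  # ('model_I', 2022)
print(route(date(2022, 3, 15)))  # ('model_I', 2022)
print(route(date(2022, 9, 1)))   # ('model_II', 2022)
```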
Example 5
As shown in fig. 5, a fruit tree yield estimation device based on multi-source data includes:
the obtaining module 10 is used for obtaining multi-source data of the fruit trees to be estimated;
the calculation module 20 is used for calculating correlation coefficients among the multi-source data according to a product moment correlation algorithm;
a conversion module 30, configured to, when any one of the correlation coefficients is greater than a certain threshold, convert the multi-source data into a plurality of mutually uncorrelated principal component data according to a data dimensionality reduction method;
and the yield estimation module 40 is used for inputting the plurality of mutually uncorrelated principal component data into a pre-established yield estimation model and calculating the yield per mu of the fruit tree to be estimated.
One embodiment of the above device may be as follows: the obtaining module 10 obtains the multi-source data of the fruit tree to be estimated; the calculation module 20 calculates correlation coefficients among the multi-source data according to a product-moment correlation algorithm; when any correlation coefficient is greater than a certain threshold, the conversion module 30 converts the multi-source data into a plurality of mutually uncorrelated principal component data according to a data dimensionality reduction method; and the yield estimation module 40 inputs the plurality of mutually uncorrelated principal component data into a pre-established yield estimation model to calculate the yield per mu of the fruit tree to be estimated.
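The module structure of Example 5 can be sketched as a small class that wires four callables together. This is an illustrative architecture under assumptions: the class name, the `orchard_id` argument and the behaviour when no coefficient exceeds the threshold (the data are passed through unchanged) are not specified by the text.

```python
from dataclasses import dataclass
from typing import Callable, List

Rows = List[List[float]]


@dataclass
class YieldEstimationDevice:
    """Hypothetical wiring of modules 10-40; each module is injected as a callable."""

    obtain: Callable[[str], Rows]              # obtaining module 10
    correlate: Callable[[Rows], Rows]          # calculation module 20 (correlation matrix)
    convert: Callable[[Rows], Rows]            # conversion module 30 (standardize + PCA)
    estimate: Callable[[Rows], List[float]]    # yield estimation module 40
    threshold: float = 0.3

    def run(self, orchard_id: str) -> List[float]:
        data = self.obtain(orchard_id)
        corr = self.correlate(data)
        p = len(corr)
        if any(abs(corr[i][j]) > self.threshold for i in range(p) for j in range(i + 1, p)):
            data = self.convert(data)  # decorrelate only when some |r| exceeds the threshold
        return self.estimate(data)  # yield per mu for each sample
```

Injecting the modules as callables keeps each one replaceable; for instance, the regression model behind module 40 can be swapped without touching the other modules.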
Example 6
As shown in fig. 6, the conversion module 30 of the fruit tree yield estimation device based on multi-source data includes:
the processing unit 32 is configured to convert the multi-source data into sample data with uniform dimensions according to a data standardization processing method;
and a transforming unit 34, configured to perform orthogonal transformation on the sample data with uniform dimensions according to a principal component analysis method when any one of the correlation coefficients is greater than a certain threshold, to obtain a plurality of mutually uncorrelated principal component data, where each principal component data is used to represent a linear combination of a plurality of sample data.
One embodiment of the conversion module 30 of the above apparatus may be: the processing unit 32 converts the multi-source data into sample data with uniform dimensions according to a data standardization processing method, and the transformation unit 34 performs orthogonal transformation on the sample data with uniform dimensions according to a principal component analysis method when any correlation coefficient is larger than a certain threshold value, so as to obtain a plurality of principal component data which are not related to each other, wherein each principal component data is used for representing a linear combination of the plurality of sample data.
Example 7
As shown in fig. 7, the yield estimation module 40 of the fruit tree yield estimation device based on multi-source data includes:
the training unit 42 is configured to obtain N different data sets and fit each of them with a plurality of regression algorithms to obtain M sub-estimation models, where N and M are integers greater than 1 and M is greater than N;
the evaluation unit 44 is configured to evaluate the M sub-estimation models by cross-validation and determine, from the evaluation results, the optimal sub-estimation model for each data set and the regression algorithm that generated it;
a selecting unit 46, configured to train the regression algorithm that generated the optimal sub-estimation model on the whole of the corresponding data set to obtain N candidate estimation models, and select one of the N candidate estimation models as a target estimation model;
and the prediction unit 48 is used for inputting the plurality of mutually uncorrelated principal component data into the target estimation model for calculation to obtain the yield per mu of the fruit tree to be estimated.
One embodiment of the yield estimation module 40 of the above device may be as follows: the training unit 42 obtains N different data sets and fits each of them with a plurality of regression algorithms to obtain M sub-estimation models, where N and M are integers greater than 1 and M is greater than N; the evaluation unit 44 evaluates the M sub-estimation models by cross-validation and determines, from the evaluation results, the optimal sub-estimation model for each data set and the regression algorithm that generated it; the selecting unit 46 trains that regression algorithm on the whole of the corresponding data set to obtain N candidate estimation models and selects one of them as the target estimation model; and the prediction unit 48 inputs the plurality of mutually uncorrelated principal component data into the target estimation model to obtain the yield per mu of the fruit tree to be estimated.
Example 8
As shown in fig. 8, a specific implementation comprises the following modules:
the collecting module 1 is used for obtaining vegetation indexes, meteorological data and geographic environment data of fruit trees to be estimated;
the visualization module 2 is used for calculating correlation coefficients among the data acquired by the collecting module 1 according to a product-moment correlation algorithm and visualizing the correlation coefficients as a heatmap;
the dimensionality reduction module 3 is used for converting the multi-source data into a plurality of mutually uncorrelated principal component data according to a data dimensionality reduction method when any correlation coefficient computed by the visualization module 2 is greater than a certain threshold;
the fitting module 4 is configured to obtain N different data sets and fit each of them with a plurality of regression algorithms to obtain M sub-estimation models, where N and M are integers greater than 1 and M is greater than N;
the evaluation module 5 is used for evaluating the M sub-estimation models obtained by the fitting module 4 by cross-validation, determining from the evaluation results the optimal sub-estimation model for each data set and the regression algorithm that generated it, and training that regression algorithm on the whole of the corresponding data set to obtain N candidate estimation models;
and the prediction module 6 is configured to select one of the N candidate estimation models obtained by the evaluation module 5 as a target estimation model, and input the plurality of mutually uncorrelated principal component data into the target estimation model for calculation to obtain the yield per mu of the fruit tree to be estimated.
Example 9
As shown in fig. 9, an electronic device includes a memory 901 and a processor 902, where the memory 901 is configured to store one or more computer instructions, and the one or more computer instructions are executed by the processor 902 to implement the above fruit tree yield estimation method based on multi-source data.
It can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working process of the electronic device described above may refer to the corresponding process in the foregoing method embodiment, and is not described herein again.
A computer-readable storage medium stores a computer program which, when executed by a computer, implements the fruit tree yield estimation method based on multi-source data described above.
Illustratively, the computer program may be divided into one or more modules/units, which are stored in the memory 901 and executed by the processor 902, with data input and output carried out through the input interface 905 and the output interface 906, to implement the present invention. The one or more modules/units may be a series of computer program instruction segments capable of performing specific functions, which describe the execution of the computer program in the computer device.
The computer device may be a desktop computer, a notebook computer, a palmtop computer, a cloud server, or another computing device. The computer device may include, but is not limited to, the memory 901 and the processor 902. Those skilled in the art will appreciate that this embodiment is only an example of the computer device and does not limit it; the computer device may include more or fewer components, combine certain components, or use different components. For example, it may further include the input device 907, a network access device, a bus, and the like.
The processor 902 may be a central processing unit (CPU), another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, and the like. A general-purpose processor may be a microprocessor or any conventional processor.
The memory 901 may be an internal storage unit of the computer device, such as a hard disk or internal memory of the computer device. The memory 901 may also be an external storage device of the computer device, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card provided on the computer device. Further, the memory 901 may include both an internal storage unit and an external storage device of the computer device. The memory 901 is used for storing the computer program and other programs and data required by the computer device, and may also be used for temporarily storing data that has been or will be output through the output device 908. The aforementioned storage media include various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM) 903, a random access memory (RAM) 904, a magnetic disk, and an optical disc.
Claims (10)
1. A fruit tree yield estimation method based on multi-source data is characterized by comprising the following steps:
acquiring multi-source data of a fruit tree to be estimated;
calculating correlation coefficients among the multi-source data according to a product moment correlation algorithm;
when any correlation coefficient is greater than a certain threshold, converting the multi-source data into a plurality of mutually uncorrelated principal component data according to a data dimensionality reduction method;
and inputting the plurality of mutually uncorrelated principal component data into a pre-established yield estimation model, and calculating the yield per mu of the fruit tree to be estimated.
2. The fruit tree yield estimation method based on multi-source data according to claim 1, wherein converting the multi-source data into a plurality of mutually uncorrelated principal component data according to a data dimensionality reduction method when any correlation coefficient is greater than a certain threshold comprises:
converting the multi-source data into sample data with uniform dimension according to a data standardization processing method;
and when any one correlation coefficient is larger than a certain threshold value, performing orthogonal transformation on the sample data with the uniform dimension according to a principal component analysis method to obtain a plurality of mutually uncorrelated principal component data, wherein each principal component data is used for representing the linear combination of the plurality of sample data.
3. The fruit tree yield estimation method based on multi-source data according to claim 1, wherein inputting the plurality of mutually uncorrelated principal component data into a pre-established yield estimation model to calculate the yield per mu of the fruit tree to be estimated comprises:
obtaining N different data sets, and fitting each of them with a plurality of regression algorithms to obtain M sub-estimation models, wherein N and M are integers greater than 1 and M is greater than N;
evaluating the M sub-estimation models by cross-validation, and determining, according to the evaluation results, the optimal sub-estimation model corresponding to each data set and the regression algorithm that generated it;
performing overall training on the corresponding data set with the regression algorithm that generated the optimal sub-estimation model to obtain N candidate estimation models, and selecting one of the N candidate estimation models as a target estimation model;
and inputting the plurality of mutually uncorrelated principal component data into the target estimation model for calculation to obtain the yield per mu of the fruit tree to be estimated.
4. The fruit tree yield estimation method based on multi-source data according to claim 3, wherein obtaining N different data sets and fitting each of them with a plurality of regression algorithms to obtain M sub-estimation models, N and M being integers greater than 1 and M being greater than N, comprises:
acquiring N different data sets, and dividing each data set into a training set and a verification set;
and respectively performing feature learning on each training set by using a plurality of regression algorithms to obtain M sub-estimation models, and testing the precision of the sub-estimation models by using corresponding verification sets, wherein N, M are integers greater than 1, and M is greater than N.
5. A fruit tree yield estimation device based on multi-source data, characterized by comprising:
the acquisition module is used for acquiring multi-source data of the fruit tree to be estimated;
the calculation module is used for calculating correlation coefficients among the multi-source data according to a product moment correlation algorithm;
the conversion module is used for converting the multi-source data into a plurality of mutually uncorrelated principal component data according to a data dimensionality reduction method when any correlation coefficient is greater than a certain threshold;
and the yield estimation module is used for inputting the plurality of mutually uncorrelated principal component data into a pre-established yield estimation model and calculating the yield per mu of the fruit tree to be estimated.
6. The fruit tree yield estimation device based on multi-source data according to claim 5, wherein the conversion module comprises:
the processing unit is used for converting the multi-source data into sample data with uniform dimensions according to a data standardization processing method;
and the transformation unit is used for performing orthogonal transformation on the sample data with uniform dimensions according to a principal component analysis method when any correlation coefficient is greater than a certain threshold, to obtain a plurality of mutually uncorrelated principal component data, wherein each principal component data represents a linear combination of a plurality of sample data.
7. The fruit tree yield estimation device based on multi-source data according to claim 5, wherein the yield estimation module comprises:
the training unit is used for obtaining N different data sets, and fitting each of them with a plurality of regression algorithms to obtain M sub-estimation models, wherein N and M are integers greater than 1 and M is greater than N;
the evaluation unit is used for evaluating the M sub-estimation models by cross-validation, and determining, according to the evaluation results, the optimal sub-estimation model corresponding to each data set and the regression algorithm that generated it;
the selection unit is used for performing overall training on the corresponding data set with the regression algorithm that generated the optimal sub-estimation model to obtain N candidate estimation models, and selecting one of the N candidate estimation models as a target estimation model;
and the prediction unit is used for inputting the plurality of mutually uncorrelated principal component data into the target estimation model for calculation to obtain the yield per mu of the fruit tree to be estimated.
8. The fruit tree yield estimation device based on multi-source data according to claim 7, wherein the training unit comprises:
the dividing subunit is used for acquiring N different data sets and dividing each data set into a training set and a verification set;
and the learning subunit is used for respectively performing feature learning on each training set by using a plurality of regression algorithms to obtain M sub-estimation models, and testing the precision of the sub-estimation models by using corresponding verification sets, wherein N, M are integers greater than 1, and M is greater than N.
9. An electronic device, comprising a memory and a processor, wherein the memory is used for storing one or more computer instructions, and the one or more computer instructions are executed by the processor to implement the fruit tree yield estimation method based on multi-source data according to any one of claims 1 to 4.
10. A computer-readable storage medium storing a computer program, wherein the computer program is used for causing a computer to execute the fruit tree yield estimation method based on multi-source data according to any one of claims 1 to 4.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110809221.3A CN113591632A (en) | 2021-07-16 | 2021-07-16 | Fruit tree yield estimation method, device and equipment based on multi-source data |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110809221.3A CN113591632A (en) | 2021-07-16 | 2021-07-16 | Fruit tree yield estimation method, device and equipment based on multi-source data |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113591632A (en) | 2021-11-02 |
Family
ID=78247936
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110809221.3A Pending CN113591632A (en) | 2021-07-16 | 2021-07-16 | Fruit tree yield estimation method, device and equipment based on multi-source data |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113591632A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116362416A (en) * | 2023-05-19 | 2023-06-30 | 广东省气象服务中心(广东气象影视宣传中心) | Regional longan annual output prediction method and device and computer equipment |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN112906298B (en) | Blueberry yield prediction method based on machine learning | |
CN111126662A (en) | Irrigation decision making method, device, server and medium based on big data | |
Brdar et al. | Support vector machines with features contribution analysis for agricultural yield prediction | |
CN116610735B (en) | Intelligent management method and system for data storage | |
CN117787510B (en) | Optimization method of pesticide residue monitoring process based on time sequence predictive analysis | |
Choudhary et al. | Yieldpredict: A crop yield prediction framework for smart farms | |
Yang et al. | Prediction of corn variety yield with attribute-missing data via graph neural network | |
Yamparla et al. | Crop yield prediction using random forest algorithm | |
CN113591632A (en) | Fruit tree yield estimation method, device and equipment based on multi-source data | |
Bhasha et al. | Automated crop yield prediction system using machine learning algorithm | |
Zhang et al. | Collaborative Forecasting and Analysis of Fish Catch in Hokkaido From Multiple Scales by Using Neural Network and ARIMA Model | |
Mohd et al. | Comparative study of rainfall prediction modeling techniques (A case study on Srinagar, J&K, India) | |
Deforce et al. | Harnessing the power of transformers and data fusion in smart irrigation | |
Jackson et al. | Robust Ensemble Machine Learning for Precision Agriculture | |
CN117930012A (en) | Battery consistency assessment method and device, computer equipment and storage medium | |
Bóbeda et al. | Using regression trees to predict citrus load balancing accuracy and costs | |
Devi et al. | Hybrid deep WaveNet-LSTM architecture for crop yield prediction | |
O'Leary et al. | An evaluation of machine learning approaches for milk volume prediction in Ireland | |
Manjula et al. | Efficient prediction of recommended crop variety through soil nutrients using deep learning algorithm | |
CN110458438A (en) | The calculation method and device of the impact factor of vegetation water use efficiency WUE | |
KR20200070736A (en) | Method of predicting crop yield and apparatus for embodying the same | |
Dharwadkar et al. | Crop yield prediction using deep learning algorithm based on CNN-LSTM with Attention Layer and Skip Connection | |
Islam et al. | Machine learning models to predict soil moisture for irrigation schedule | |
Pawar et al. | A comparative study of different algorithms used to predict the crop, its yield and price | |
Soodtoetong et al. | The performance of crop yield forecasting model based on artificial intelligence |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||