CN116629492A - Integrated learning optimization evaluation method for soil quality improvement effect - Google Patents
- Publication number: CN116629492A
- Application number: CN202310650387.4A
- Authority: CN (China)
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/06—Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
- G06Q10/063—Operations research, analysis or management
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
- G06N20/20—Ensemble learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N5/00—Computing arrangements using knowledge-based models
- G06N5/01—Dynamic search techniques; Heuristics; Dynamic trees; Branch-and-bound
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/04—Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/02—Agriculture; Fishing; Forestry; Mining
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/10—Services
- G06Q50/26—Government or public services
Abstract
The invention relates to the technical field of evaluation methods, and in particular to an integrated learning optimization evaluation method for soil quality improvement effect. The method comprises establishing an example soil quality prediction data set, labeling the soil quality prediction data set, and constructing and verifying machine learning single models and integrated (ensemble) models based on that data set. The construction and verification step comprises model performance evaluation on a training set and a validation set, model performance evaluation on a test set, yield verification of the integrated models, and evaluation of the response of the soil quality index to inputs of different organic materials. The method addresses problems of traditional quantitative soil quality evaluation methods, including insufficient biological index information, insufficient comprehensive soil property data at large spatial scales, and insufficient accuracy in the verification of the Minimum Data Set (MDS).
Description
Technical Field
The invention relates to the technical field of evaluation methods, in particular to an integrated learning optimization evaluation method for soil quality improvement effect.
Background
Soil quality evaluation is a means of assessing how human activities, such as management measures and land-use change, affect soil. It helps track the current status and dynamics of soil quality in a timely manner and thereby supports sustainable management of land resources. The key step in evaluating soil quality is to establish a sensitive and representative evaluation index system. Soil quality is expressed primarily through its intrinsic and extrinsic functions, which can be represented by a range of physical, chemical and biological indicators. Soil quality indicators are selected according to the following principles: (1) they are involved in ecological processes and related to process models; (2) they integrate the physical, chemical and biological processes of soil; (3) they are acceptable to most users and applicable under field conditions; (4) they are easy to measure and reproducible; (5) they are sensitive to changes in climate and management, so that changes in soil properties can be monitored; (6) they are part of existing databases wherever possible. Many indicators are available for soil quality evaluation. Selecting a more comprehensive set of indicators (a full data set) reflects soil quality more faithfully, but markedly increases the cost of data acquisition.
Traditional quantitative soil quality evaluation methods suffer from insufficient biological index information, insufficient comprehensive soil property data at large spatial scales, and insufficient accuracy in the verification of the Minimum Data Set (MDS).
To address these problems, an integrated learning optimization evaluation method for soil quality improvement effect is provided.
Disclosure of Invention
To overcome the shortcomings of the prior art, the invention provides an integrated learning optimization evaluation method for soil quality improvement effect, comprising the steps of establishing an example soil quality prediction data set, labeling the soil quality prediction data set, and constructing and verifying machine learning single models and integrated models based on the soil quality prediction data set.
The construction and verification of the machine learning single models and integrated models comprises model performance evaluation based on a training set and a validation set, model performance evaluation based on a test set, yield verification of the integrated models, and evaluation of the response of the soil quality index to inputs of different organic materials.
Preferably, the soil quality prediction data set is established using a minimum data set, and the final MDS comprises six evaluation indices: bulk density, organic matter, available phosphorus, available potassium, microbial biomass carbon and microbial biomass nitrogen.
Preferably, the soil quality prediction data set is labeled using the soil quality index and minimum data set verification: after each evaluation index is scored and weighted, the soil quality index is calculated from the total data set (TDS) and from the MDS, respectively.
Preferably, in the model performance evaluation based on the training set and the validation set, a Welch-corrected analysis of variance is used to test for differences in the soil quality index between the soil quality index method and the three machine learning models.
Preferably, the model performance evaluation based on the training set and the validation set further evaluates each model by 10-fold cross-validation: the training set is randomly divided into 10 mutually exclusive subsets of similar size; in each round, the union of 9 subsets is used as the training set and the remaining subset as the validation set, so that the model is trained and evaluated 10 times.
Preferably, in the model performance evaluation based on the test set, the further test set evaluation, combined with the training set and validation set results, shows that RFR-MDS achieves the highest predictive performance on the training set while maintaining high accuracy on the test set, and that LightGBMR-MDS achieves test set performance similar to RFR-MDS with the greatest potential for avoiding overfitting.
Preferably, yield verification of the integrated models is used to verify the accuracy of the soil quality index calculation, because soil quality is strongly correlated with crop yield.
Preferably, for the response of the soil quality index to different organic material inputs, the two integrated models are applied to rice, maize and wheat to evaluate how different organic material inputs affect the overall soil quality index of the three crops; according to the predictions of both integrated models, organic material type has a significant effect on the soil quality index (P < 0.001).
The advantages of the invention are as follows:
1. The method uses machine learning integrated models, combined with an MDS evaluation index system based on soil classification, to predict the TDS soil quality index. Crop yield verification shows that SQI-TDS-classified for each crop under different soil types is significantly positively correlated with yield (P < 0.05), and R² exceeds 0.5 in 75.5% of the samples, indicating that the calculated soil quality index is reasonable. The evaluation of the machine learning models demonstrates the high accuracy and good application prospects of the integrated models for soil quality index prediction.
2. Linear regression analysis on the test set and training set shows that the R² values of the machine learning integrated models (0.976 for RFR and 0.974 for LightGBMR, P < 0.001) are substantially higher than the R² value obtained by verifying the MDS with the soil quality index method (0.771).
3. Combining the validation set and test set results, LightGBMR is the best model choice in terms of both accuracy and mitigation of overfitting, so LightGBMR is recommended as a soil quality evaluation model to replace the traditional MDS-based soil quality index method. Furthermore, the RFR and LightGBMR integrated models are not constrained by the sample size of each soil type when predicting the soil quality index, and they achieve high prediction accuracy even with relatively small samples.
Drawings
To illustrate the embodiments of the invention or the technical solutions of the prior art more clearly, the drawings used in describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the invention, and a person skilled in the art can derive other drawings from them without inventive effort.
FIG. 1 shows, for the total data set of the present invention, the principal component load values, eigenvalues and cumulative variance contribution rates, together with the Norm values and communality (common factor variance) ratios of the evaluation indices. In (a)-(c), the physical, chemical and biological soil indices are grouped in different color blocks; the principal component load value is the correlation coefficient between an index and a principal component and reflects the importance of each index to the three principal components. (d) shows the communality ratios obtained by factor analysis of the total data set;
FIG. 2 shows the results of linear regression analysis between SQI-TDS and SQI-MDS of the present invention (the black dashed line is the 1:1 line);
FIG. 3 is a violin plot of the soil quality indices obtained by different methods of the present invention (left of the black dotted line: soil quality index method; right: machine learning models; (a) training set, (b) test set);
FIG. 4 is a violin plot of the soil quality indices obtained by different methods of the present invention (left of the black dotted line: soil quality index method; right: machine learning models; (a) training set, (b) test set);
FIG. 5 compares the yield verification results for the regional subdivisions of the three major crops according to the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Example 1
1. Example of the soil quality prediction data set: establishment of the minimum data set
To reduce the number of evaluation indices and eliminate the information overlap caused by interactions between indices, principal component analysis is performed on the 11 initially selected indices, and principal components (PCs) with eigenvalues greater than 1 are extracted. The KMO (Kaiser-Meyer-Olkin) value exceeds 0.8 (0.834) and the P value of Bartlett's test is below 0.05, indicating that the indices are strongly correlated and suitable for principal component analysis. Three PCs have eigenvalues greater than 1, with a cumulative variance contribution rate of 62.5%. First, the indices are grouped by the absolute values of their loadings on the three PCs. PC1 groups microbial biomass carbon, microbial biomass nitrogen, urease, total nitrogen, phosphatase and sucrase, whose absolute loadings are all 0.5 or greater; PC2 groups bulk density, available phosphorus and pH; PC3 includes available potassium and organic matter. For each of PC1, PC2 and PC3, the Norm values of the indices and the correlations among them are then analyzed. In PC1, microbial biomass carbon has the highest Norm value (1.71), and the only other index within 10% of that value is microbial biomass nitrogen. Although the correlation coefficient between microbial biomass carbon and microbial biomass nitrogen is greater than 0.5 (P < 0.001), both are selected into the MDS to increase the diversity of biological indicators. In PC2, available phosphorus has the highest Norm value (1.18), and no other index is within 10% of it. In PC3, available potassium has the highest Norm value (1.08), and organic matter is the only index within 10% of it. Because available potassium and organic matter are not correlated, both enter the MDS.
In addition, because bulk density is frequently selected into MDSs, bulk density also enters the MDS. The highest Norm value in PC2 is that of available phosphorus (1.17), and the only index within 10% of that value (1.05) is available potassium. Because the correlation coefficient between available potassium and available phosphorus is less than 0.5 (P < 0.001), both available phosphorus and available potassium enter the MDS. In summary, the final MDS comprises six evaluation indices: bulk density, organic matter, available phosphorus, available potassium, microbial biomass carbon and microbial biomass nitrogen.
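The MDS screening steps above (PCA, keeping PCs with eigenvalue > 1, grouping indices by loading, then keeping the indices within 10% of each group's highest Norm value unless they are correlated at |r| ≥ 0.5) can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented procedure: the Norm formula N_i = sqrt(Σ_k u_ik² λ_k) is the form commonly used in MDS studies and is assumed here, each index is assumed to join the PC on which its absolute loading is largest, and the manual exceptions described in the text (keeping both microbial biomass indices despite r > 0.5, and adding bulk density on the basis of its selection frequency) are not encoded. The function names are hypothetical.

```python
import numpy as np

def pca_loadings(X):
    """PCA on the correlation matrix of X (rows = samples, columns = indices).
    Keeps PCs with eigenvalue > 1 and returns (eigenvalues, loadings)."""
    corr = np.corrcoef(X, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(corr)        # ascending order for symmetric matrices
    order = np.argsort(eigvals)[::-1]              # sort PCs by decreasing eigenvalue
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    keep = eigvals > 1.0                           # eigenvalue > 1 criterion from the text
    loadings = eigvecs[:, keep] * np.sqrt(eigvals[keep])  # index-PC correlations
    return eigvals[keep], loadings

def norm_values(loadings, eigvals):
    """Assumed Norm formula: N_i = sqrt(sum_k loading_ik**2 * eigenvalue_k)."""
    return np.sqrt((loadings ** 2 * eigvals).sum(axis=1))

def select_mds(X, names, window=0.10, r_max=0.5):
    """Hypothetical MDS screening: returns (selected index names, Norm values)."""
    eigvals, loadings = pca_loadings(X)
    norms = norm_values(loadings, eigvals)
    # Assumption: each index joins the PC on which its absolute loading is largest.
    group = np.argmax(np.abs(loadings), axis=1)
    corr = np.corrcoef(X, rowvar=False)
    selected = []
    for pc in range(loadings.shape[1]):
        members = np.flatnonzero(group == pc)
        if members.size == 0:
            continue
        top = norms[members].max()
        # Candidates: indices whose Norm is within 10% of the group maximum, best first.
        cands = sorted((i for i in members if norms[i] >= (1 - window) * top),
                       key=lambda i: -norms[i])
        chosen = [cands[0]]
        for i in cands[1:]:
            # Keep a further candidate only if weakly correlated with every chosen index.
            if all(abs(corr[i, j]) < r_max for j in chosen):
                chosen.append(i)
        selected.extend(chosen)
    return [names[i] for i in selected], norms
```

In practice `X` would hold the 11 initially selected indices measured at each sample point; expert adjustments such as those described above would be applied to the returned list by hand.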
2. Labeling of the soil quality prediction data set: soil quality index and minimum data set verification
Verifying the rationality of the minimum data set evaluation index system is an important step in soil quality evaluation, because the evaluation results must be checked to ensure their accuracy. After each evaluation index is scored and weighted, the soil quality index is calculated from the TDS and from the MDS, respectively. SQI-TDS ranges from 0.138 to 0.992, with a mean of 0.543 ± 0.171 and a coefficient of variation of 31.5%. SQI-MDS ranges from 0.101 to 1.00, with a mean of 0.587 ± 0.195 and a coefficient of variation of 33.2%. Compared with SQI-TDS, SQI-MDS has a larger range, a higher mean and relatively greater fluctuation. Linear regression of SQI-TDS against SQI-MDS shows a highly significant positive correlation (P < 0.001) with an R² of 0.737, close to most published findings (Guo et al., 2017; Li et al., 2019; Li et al., 2020). The RMSE is 0.109 and the RPD is 1.57, a relatively low level. The 1:1 line reflects the agreement between the two compared quantities, and SQI-TDS and SQI-MDS are distributed fairly evenly on both sides of it. These results show that SQI-MDS achieves the verification effect expected of the traditional method in soil quality evaluation at large spatial scales, but a clear gap remains when high accuracy is required.
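The scoring, weighting and agreement statistics described above can be sketched as below. The text does not specify the scoring functions or weights, so this minimal sketch assumes min-max linear scores ("more is better", or "less is better" for an index such as bulk density), weights proportional to each index's communality (the common factor variance ratio of FIG. 1(d)), SQI = Σ W_i S_i, RMSE computed from the direct SQI differences, and RPD = SD(reference) / RMSE. All function names are hypothetical.

```python
import numpy as np

def linear_score(x, more_is_better=True):
    """Min-max linear score in [0, 1] (assumed scoring function)."""
    s = (x - x.min()) / (x.max() - x.min())
    return s if more_is_better else 1.0 - s

def soil_quality_index(X, more_is_better, weights):
    """SQI = sum_i W_i * S_i, with the weights normalised to sum to 1.
    `weights` could be, for example, the communality of each index (assumption)."""
    S = np.column_stack([linear_score(X[:, j], more_is_better[j])
                         for j in range(X.shape[1])])
    w = np.asarray(weights, dtype=float)
    return S @ (w / w.sum())

def agreement(ref, est):
    """R^2 of a simple linear fit, RMSE, and RPD = SD(ref) / RMSE."""
    r = np.corrcoef(ref, est)[0, 1]
    rmse = np.sqrt(np.mean((est - ref) ** 2))
    return {"R2": r ** 2, "RMSE": rmse, "RPD": np.std(ref, ddof=1) / rmse}
```

As a consistency check on this RPD definition, the reported SD of SQI-TDS divided by the reported RMSE gives 0.171 / 0.109 ≈ 1.57, which matches the RPD given above.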
3. Construction and verification of machine learning single models and integrated models based on the soil quality prediction data set
3.1 Model performance evaluation based on the training set and validation set
Based on the hyperparameter tuning results, model performance is first evaluated on the training set. The violin plots of SQI-TDS, SQI-MDS and the machine learning model predictions show that DTR-MDS has a narrower range, whereas the ranges of SQI-MDS and the other models are closer to that of SQI-TDS. The soil quality indices obtained by the different methods are most densely distributed between the median and the 75% quantile. The density distributions of RFR-MDS and LightGBMR-MDS are closer to that of SQI-TDS. A Welch-corrected analysis of variance is used to test for differences in the soil quality index between the soil quality index method and the three machine learning models. The methods differ significantly in soil quality index (P < 0.001), and the mean of SQI-MDS is significantly higher than those of SQI-TDS, DTR-MDS, RFR-MDS and LightGBMR-MDS (P < 0.001). Linear regression of SQI-TDS against DTR-MDS, RFR-MDS and LightGBMR-MDS, respectively, shows that all three are highly significantly positively correlated with SQI-TDS (P < 0.001). The R² values of all three models exceed 0.93, a relatively high accuracy, with RFR-MDS the highest (R² = 0.983). For RMSE and RPD values, RFR-MDS. The RPD values of all three models exceed 2.5, which meets the very high requirements of quantitative soil quality evaluation.
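The Welch-corrected analysis of variance used above does not assume equal variances across groups. A minimal NumPy sketch of Welch's F statistic and its degrees of freedom follows; the function name is hypothetical, and the p-value can be obtained from the F distribution, for example with `scipy.stats.f.sf(F, df1, df2)`. Post hoc pairwise comparisons (such as SQI-MDS against each of the other methods) are not covered here.

```python
import numpy as np

def welch_anova(*groups):
    """Welch's one-way ANOVA. Returns (F, df1, df2)."""
    k = len(groups)
    n = np.array([len(g) for g in groups], dtype=float)
    means = np.array([np.mean(g) for g in groups])
    var = np.array([np.var(g, ddof=1) for g in groups])
    w = n / var                                   # precision weight of each group mean
    W = w.sum()
    grand = (w * means).sum() / W                 # weighted grand mean
    A = (w * (means - grand) ** 2).sum() / (k - 1)
    lam = ((1 - w / W) ** 2 / (n - 1)).sum()
    B = 1 + 2 * (k - 2) / (k ** 2 - 1) * lam      # Welch correction term
    return A / B, k - 1, (k ** 2 - 1) / (3 * lam)
```

For two groups, Welch's F reduces to the square of Welch's t statistic, which is a convenient check on an implementation.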
Notably, the high predictive performance of the three machine learning models on the training set may carry a risk of overfitting, so the performance of each model is further evaluated by 10-fold cross-validation. The training set is randomly divided into 10 mutually exclusive subsets of similar size; in each round, the union of 9 subsets is used as the training set and the remaining subset as the validation set, so that each model is trained and evaluated 10 times, yielding 10 evaluations (RMSE values) per model. The Welch-corrected analysis of variance shows that the three models differ significantly in RMSE (P < 0.001), with DTR-MDS having the highest RMSE, significantly higher than those of RFR-MDS and LightGBMR-MDS (P < 0.001). Consistent with the linear regression results, the two integrated models show excellent predictive performance.
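The 10-fold procedure above can be sketched in NumPy as a function that returns one validation RMSE per fold for any model supplied as a pair of fit and predict callables. This is a minimal sketch: the ordinary least squares model is only a hypothetical stand-in so the example runs without extra dependencies. The models in the text are presumably decision tree, random forest and LightGBM regressors; with scikit-learn and LightGBM installed, `sklearn.model_selection.KFold(n_splits=10, shuffle=True)` together with `DecisionTreeRegressor`, `RandomForestRegressor` and `lightgbm.LGBMRegressor` could be wrapped in the same way.

```python
import numpy as np

def kfold_rmse(X, y, fit, predict, k=10, seed=0):
    """Return the k validation RMSE values from k-fold cross-validation."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)  # k disjoint, near-equal subsets
    scores = []
    for i in range(k):
        val = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])  # union of other folds
        model = fit(X[train], y[train])
        pred = predict(model, X[val])
        scores.append(np.sqrt(np.mean((pred - y[val]) ** 2)))
    return np.array(scores)

# Hypothetical stand-in model: ordinary least squares with an intercept.
def ols_fit(X, y):
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def ols_predict(coef, X):
    return coef[0] + X @ coef[1:]
```

The 10 RMSE values per model returned by `kfold_rmse` are what a Welch-corrected ANOVA across models would then compare.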
3.2 Model performance evaluation based on the test set
Compared with the training set, the test set results are closer to real application scenarios, so the test set analysis is essential. The violin plots of SQI-TDS, SQI-MDS and the three machine learning model predictions show that the density distributions of RFR-MDS and LightGBMR-MDS are similar to that of SQI-TDS, but with higher peaks than on the training set. Except for SQI-MDS, which is most densely distributed around the 75% quantile, SQI-TDS, DTR-MDS, RFR-MDS and LightGBMR-MDS are all most densely distributed between the median and the 75% quantile. As with the Welch-corrected analysis of variance on the training set, the methods differ significantly in soil quality index (P < 0.05), and the mean of SQI-MDS is significantly higher than those of SQI-TDS, DTR-MDS, RFR-MDS and LightGBMR-MDS (P < 0.05). Linear regression verification shows that DTR-MDS, RFR-MDS and LightGBMR-MDS are all highly significantly positively correlated with SQI-TDS (P < 0.001). The R² of DTR-MDS drops markedly compared with the training set, indicating clear overfitting (even though the hyperparameters were tuned as far as possible to reduce overfitting). The two integrated models, RFR-MDS and LightGBMR-MDS, clearly outperform the single model in mitigating overfitting, with LightGBMR-MDS showing the lowest overfitting risk (R² of 0.905 and 0.903 for RFR-MDS and LightGBMR-MDS, respectively). In terms of RMSE and RPD, none of the three models performs as well on the test set as on the training set. LightGBMR-MDS has the lowest RMSE (0.0549) and the highest RPD (3.21).
The RPD values of both integrated models are high, exceeding 3, which is sufficient for quantitative soil quality evaluation. Combining the further test set evaluation with the training set and validation set results, RFR-MDS achieves the highest predictive performance on the training set while maintaining high accuracy on the test set, whereas LightGBMR-MDS achieves test set performance similar to RFR-MDS and has the greatest potential for avoiding overfitting.
3.3 Yield verification of the integrated models
Because soil quality is strongly correlated with crop yield, crop yield is often used to verify the accuracy of soil quality index calculations. At least 5 sample points are drawn for each of the three crops, giving 455 samples in total. RFR-MDS and LightGBMR-MDS show highly significant positive correlations with rice, maize and wheat yields (P < 0.001). The R² values range from 0.294 to 0.397; RFR-MDS has higher R² for rice and maize, whereas LightGBMR-MDS has higher R² for wheat. The three major crops are further divided by region for yield verification, with the following results. For rice, both RFR-MDS and LightGBMR-MDS have the highest R² in the early rice season in the middle and lower reaches of the Yangtze River; for maize and wheat, both have the highest R² in the Huang-Huai-Hai region. In every maize region the R² of RFR-MDS is higher than that of LightGBMR-MDS, whereas in every wheat region it is lower. For rice in the middle and lower reaches of the Yangtze River, RFR-MDS has higher R² than LightGBMR-MDS for early rice, late rice and the total double-cropping rice yield, but lower R² for single-season rice and for the rice season of the rice-wheat rotation.
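The region-by-region yield verification above amounts to fitting a simple linear regression of yield on the predicted soil quality index within each crop or region group and reporting R². A minimal sketch follows; the function name and the rule of skipping groups with fewer than 5 samples (loosely following the "at least 5 sample points" mentioned above) are assumptions for illustration.

```python
import numpy as np

def r2_by_group(sqi, crop_yield, groups, min_n=5):
    """Per-group R^2 of a simple linear regression of yield on SQI."""
    out = {}
    for g in np.unique(groups):
        m = groups == g
        if m.sum() < min_n:            # assumed rule: skip groups with too few samples
            continue
        x, y = sqi[m], crop_yield[m]
        slope, intercept = np.polyfit(x, y, 1)
        resid = y - (slope * x + intercept)
        out[str(g)] = 1.0 - np.sum(resid ** 2) / np.sum((y - y.mean()) ** 2)
    return out
```

Calling this once with RFR-MDS predictions and once with LightGBMR-MDS predictions, using the same `groups` labels (for example crop and region), gives the side-by-side R² comparison described in the text.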
3.4 response of soil quality index to different organic Material inputs
For rice, corn and wheat, two integrated models were used to evaluate the impact of different organic material inputs on the overall soil quality index of three major classes of crops. Based on the prediction results of the two integrated models, different organic material types exhibit significance (P < 0.001) for the soil quality index. Compared with the application of no fertilizer and the application of inorganic fertilizer, the application of animal-derived organic materials and plant-derived organic materials obviously improves the soil quality index (P < 0.01) in the rice, corn and wheat planting modes. In the rice planting mode, compared with the mode without fertilization, the soil quality indexes of the applied animal source organic materials, the plant source organic materials and the animal source organic materials based on the RFR model are respectively improved by 84.4%, 61.9% and 80.6%, and the soil quality indexes of the three based on the LightGBMR model are respectively improved by 87.6%, 63.9% and 83.3%; compared with the application of inorganic fertilizers, the soil quality indexes of the application animal source organic materials, the plant source organic materials and the animal source organic materials based on the RFR model are respectively improved by 37.9%, 21.0% and 35.0%, and the soil quality indexes based on the LightGBMR model are respectively improved by 39.7%, 22.1% and 36.5%. 
Under corn cropping, compared with no fertilization, applying animal-derived, plant-derived and mixed animal- and plant-derived organic materials increased the soil quality index by 78.3%, 67.3% and 87.5%, respectively, according to the RFR model, and by 86.1%, 72.8% and 97.4% according to the LightGBMR model; compared with inorganic fertilizer, the corresponding increases were 44.0%, 35.1% and 51.5% (RFR) and 49.1%, 38.4% and 58.1% (LightGBMR). Under wheat cropping, compared with no fertilization, the same three organic material types increased the soil quality index by 80.4%, 61.8% and 71.2% according to the RFR model, and by 83.5%, 64.5% and 72.3% according to the LightGBMR model; compared with inorganic fertilizer, the corresponding increases were 31.0%, 17.5% and 24.3% (RFR) and 31.3%, 17.7% and 23.3% (LightGBMR).
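If, for a given crop and model, each treatment's two gains (vs. no fertilization and vs. inorganic fertilizer) are computed from the same treatment SQI, then (1 + gain_none) / (1 + gain_inorganic) should give the same SQI(inorganic) / SQI(no fertilizer) ratio for every organic material type. The sketch below applies that arithmetic check to the RFR percentages reported above; the function name is hypothetical and the equal-treatment-SQI assumption is ours, not the source's. Worked through by hand, the implied ratio comes out at about 1.34 for rice, 1.24 for corn and 1.38 for wheat, with the three organic material types agreeing to within about 0.001 in each crop.

```python
def implied_reference_ratio(gain_vs_none_pct, gain_vs_inorganic_pct):
    """SQI(inorganic) / SQI(no fertilizer) implied by two gains for the same treatment."""
    return (1 + gain_vs_none_pct / 100) / (1 + gain_vs_inorganic_pct / 100)


# RFR-model gains (%) as reported in the text, in the order the three organic
# material types are listed: (vs. no fertilization, vs. inorganic fertilizer).
rfr_gains = {
    "rice": [(84.4, 37.9), (61.9, 21.0), (80.6, 35.0)],
    "corn": [(78.3, 44.0), (67.3, 35.1), (87.5, 51.5)],
    "wheat": [(80.4, 31.0), (61.8, 17.5), (71.2, 24.3)],
}

for crop, pairs in rfr_gains.items():
    ratios = [implied_reference_ratio(a, b) for a, b in pairs]
    print(crop, [round(r, 3) for r in ratios])
```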
The foregoing has shown and described the basic principles, principal features and advantages of the invention. It will be understood by those skilled in the art that the present invention is not limited to the embodiments described above, and that the above embodiments and descriptions are merely illustrative of the principles of the present invention, and various changes and modifications may be made without departing from the spirit and scope of the invention, which is defined in the appended claims.
Claims (8)
1. An integrated learning optimization evaluation method for soil quality improvement effect, characterized by comprising the following steps: establishing a soil quality prediction data set; labeling the soil quality prediction data set; and constructing and verifying machine learning single models and integrated models based on the soil quality prediction data set;
wherein the construction and verification of the machine learning single models and integrated models based on the soil quality prediction data set comprises: model performance evaluation on the training set and validation set, model performance evaluation on the test set, yield verification of the integrated models, and evaluation of the response of the soil quality index to different organic material inputs.
2. The integrated learning optimization evaluation method for soil quality improvement effect according to claim 1, wherein the soil quality prediction data set is established using a minimum data set (MDS), and the finally selected MDS comprises 6 evaluation indicators: bulk density, organic matter, available phosphorus, available potassium, microbial biomass carbon and microbial biomass nitrogen.
3. The integrated learning optimization evaluation method for soil quality improvement effect according to claim 1, wherein the soil quality prediction data set is labeled by means of the soil quality index and verification of the minimum data set: each evaluation indicator is scored and weighted, and the soil quality index is calculated separately from the total data set (TDS) and from the MDS.
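Claim 3 describes a scored-and-weighted index. A common form is the weighted additive SQI = Σ wᵢ·Sᵢ, where Sᵢ is a 0-1 indicator score and wᵢ a normalized weight. The sketch below assumes linear scoring ("more is better" scores as x / max, "less is better" as min / x) and equal weights; neither the scoring functions nor the weights are given in the source, and in practice weights are often derived from principal component analysis. All names and measurements are hypothetical.

```python
# Direction of each MDS indicator: True = "more is better", False = "less is better".
# These directions and the equal weights below are illustrative assumptions.
MDS_DIRECTIONS = {
    "bulk_density": False,
    "organic_matter": True,
    "available_p": True,
    "available_k": True,
    "microbial_biomass_c": True,
    "microbial_biomass_n": True,
}


def linear_scores(samples, directions):
    """Scale each indicator to (0, 1] with a linear scoring function over the data set."""
    scores = [{} for _ in samples]
    for name, more_is_better in directions.items():
        values = [sample[name] for sample in samples]
        if min(values) <= 0:
            raise ValueError(f"{name}: values must be positive")
        best = max(values) if more_is_better else min(values)
        for score, value in zip(scores, values):
            score[name] = value / best if more_is_better else best / value
    return scores


def soil_quality_index(score, weights):
    """Weighted additive SQI: sum of normalized weight times indicator score."""
    total = sum(weights.values())
    return sum(weights[name] / total * score[name] for name in weights)


samples = [  # hypothetical measurements, not from the source
    {"bulk_density": 1.20, "organic_matter": 30.0, "available_p": 20.0,
     "available_k": 150.0, "microbial_biomass_c": 400.0, "microbial_biomass_n": 60.0},
    {"bulk_density": 1.50, "organic_matter": 15.0, "available_p": 10.0,
     "available_k": 75.0, "microbial_biomass_c": 200.0, "microbial_biomass_n": 30.0},
]
weights = {name: 1.0 for name in MDS_DIRECTIONS}  # equal weights, hypothetical
for s in linear_scores(samples, MDS_DIRECTIONS):
    print(round(soil_quality_index(s, weights), 3))
```

Computing the index once over all measured indicators (TDS) and once over the six MDS indicators, then comparing the two, would correspond to the TDS/MDS calculation the claim describes.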
4. The integrated learning optimization evaluation method for soil quality improvement effect according to claim 1, wherein the model performance evaluation on the training set and validation set uses Welch-corrected analysis of variance (Welch's ANOVA) to test for differences in the soil quality index obtained by the soil quality index method and by the three machine learning models.
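Welch's ANOVA compares group means without assuming equal variances. A minimal stdlib-only sketch of the standard Welch F statistic and its degrees of freedom; the group values are hypothetical SQI outputs of four methods, and the P-value can be obtained from `scipy.stats.f.sf(F, df1, df2)` if SciPy is available.

```python
from statistics import mean, variance


def welch_anova(*groups):
    """Welch's heteroscedastic one-way ANOVA. Returns (F, df1, df2)."""
    k = len(groups)
    if k < 2 or any(len(g) < 2 for g in groups):
        raise ValueError("need >= 2 groups with >= 2 observations each")
    n = [len(g) for g in groups]
    m = [mean(g) for g in groups]
    v = [variance(g) for g in groups]  # sample variance, n - 1 denominator
    if any(vi == 0 for vi in v):
        raise ValueError("each group needs non-zero variance")
    w = [ni / vi for ni, vi in zip(n, v)]  # precision weights n_i / s_i^2
    big_w = sum(w)
    grand = sum(wi * mi for wi, mi in zip(w, m)) / big_w  # weighted grand mean
    a = sum(wi * (mi - grand) ** 2 for wi, mi in zip(w, m)) / (k - 1)
    lam = sum((1 - wi / big_w) ** 2 / (ni - 1) for wi, ni in zip(w, n))
    b = 1 + 2 * (k - 2) / (k ** 2 - 1) * lam
    return a / b, k - 1, (k ** 2 - 1) / (3 * lam)


# Hypothetical SQI values from the SQI method and three machine learning models.
sqi_method = [0.52, 0.48, 0.55, 0.50, 0.47]
model_a = [0.51, 0.49, 0.54, 0.52, 0.46]
model_b = [0.60, 0.58, 0.63, 0.57, 0.61]
model_c = [0.50, 0.45, 0.58, 0.49, 0.53]
print(welch_anova(sqi_method, model_a, model_b, model_c))
```

For two groups this F equals the squared Welch t statistic and df2 equals the Welch-Satterthwaite degrees of freedom, which is a convenient sanity check.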
5. The integrated learning optimization evaluation method for soil quality improvement effect according to claim 1, wherein the model performance evaluation on the training set and validation set further evaluates the performance of each model by 10-fold cross-validation: the training set is randomly divided into 10 mutually exclusive subsets of similar size; in each round, the union of 9 subsets is used as the training set and the remaining subset as the validation set, so that each model is trained and evaluated 10 times.
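The 10-fold procedure in claim 5 can be sketched as follows. This is a stdlib-only illustration in which `fit` and `score` are hypothetical user-supplied callables standing in for the actual models and metrics; in practice scikit-learn's `KFold(n_splits=10, shuffle=True, random_state=...)` performs the same split.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Randomly partition range(n_samples) into k mutually exclusive folds of near-equal size."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    # Striding the shuffled list gives fold sizes that differ by at most one.
    return [idx[i::k] for i in range(k)]


def cross_validate(X, y, fit, score, k=10, seed=0):
    """Train on k-1 folds, evaluate on the held-out fold, k times; return the k scores."""
    folds = k_fold_indices(len(X), k, seed)
    results = []
    for i, val_idx in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        model = fit([X[j] for j in train_idx], [y[j] for j in train_idx])
        results.append(score(model, [X[j] for j in val_idx], [y[j] for j in val_idx]))
    return results


# Hypothetical baseline: predict the training mean, score by mean squared error.
X = [[float(i)] for i in range(50)]
y = [float(i % 5) for i in range(50)]
fit_mean = lambda X_tr, y_tr: sum(y_tr) / len(y_tr)
mse = lambda m, X_val, y_val: sum((v - m) ** 2 for v in y_val) / len(y_val)
print(cross_validate(X, y, fit_mean, mse))
```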
6. The integrated learning optimization evaluation method for soil quality improvement effect according to claim 1, wherein, in the model performance evaluation on the test set, further evaluation of model performance on the test set, combined with the results for the training set and validation set, shows that RFR-MDS achieves the highest prediction performance on the training set while maintaining high accuracy on the test set, and that LightGBMR-MDS achieves prediction performance on the test set similar to that of RFR-MDS and has the greatest potential to avoid the risk of overfitting.
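One simple way to compare overfitting risk is the gap between training and test performance: a model whose R² drops little (and whose RMSE rises little) from training to test generalizes better. The source does not state which metric it used for this judgment, so the sketch below is an assumption, with hypothetical function names and values.

```python
from math import sqrt


def r2_score(y_true, y_pred):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def generalization_gap(train_true, train_pred, test_true, test_pred):
    """Train-minus-test R^2 and test-minus-train RMSE; larger gaps suggest more overfitting."""
    r2_train = r2_score(train_true, train_pred)
    r2_test = r2_score(test_true, test_pred)
    return {
        "r2_train": r2_train,
        "r2_test": r2_test,
        "r2_gap": r2_train - r2_test,
        "rmse_gap": rmse(test_true, test_pred) - rmse(train_true, train_pred),
    }


# Hypothetical SQI predictions for one model (not from the source).
print(generalization_gap([0.4, 0.5, 0.6, 0.7], [0.41, 0.49, 0.61, 0.69],
                         [0.45, 0.55, 0.65], [0.48, 0.52, 0.61]))
```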
7. The integrated learning optimization evaluation method for soil quality improvement effect according to claim 1, wherein, in the yield verification of the integrated models, crop yield is used to verify the accuracy of the soil quality index calculation because soil quality is strongly correlated with crop yield.
8. The integrated learning optimization evaluation method for soil quality improvement effect according to claim 1, wherein the response of the soil quality index to different organic material inputs is evaluated with the two integrated models, which assess the effect of different organic material inputs on the overall soil quality index of the three major crop types; according to the predictions of both integrated models, organic material type has a significant effect on the soil quality index (P < 0.001).
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310650387.4A CN116629492A (en) | 2023-06-03 | 2023-06-03 | Integrated learning optimization evaluation method for soil quality improvement effect |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310650387.4A CN116629492A (en) | 2023-06-03 | 2023-06-03 | Integrated learning optimization evaluation method for soil quality improvement effect |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116629492A true CN116629492A (en) | 2023-08-22 |
Family
ID=87621091
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310650387.4A Pending CN116629492A (en) | 2023-06-03 | 2023-06-03 | Integrated learning optimization evaluation method for soil quality improvement effect |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116629492A (en) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109374860A (en) * | 2018-11-13 | 2019-02-22 | 西北大学 | A kind of soil nutrient prediction and integrated evaluating method based on machine learning algorithm |
CN110348490A (en) * | 2019-06-20 | 2019-10-18 | 宜通世纪科技股份有限公司 | A kind of soil quality prediction technique and device based on algorithm of support vector machine |
CN113344409A (en) * | 2021-06-22 | 2021-09-03 | 山东农业大学 | Evaluation method and system for facility continuous cropping soil quality |
CN115349316A (en) * | 2022-08-17 | 2022-11-18 | 陕西省微生物研究所 | Soil quality improvement and monitoring system and soil improvement method |
CN115616194A (en) * | 2022-11-03 | 2023-01-17 | 中科合肥智慧农业协同创新研究院 | Soil organic matter prediction method based on auxiliary information |
CN116148438A (en) * | 2023-01-10 | 2023-05-23 | 中南大学 | Soil mineral content prediction method based on machine learning |
- 2023-06-03: application CN202310650387.4A filed in CN; publication CN116629492A, status active (pending)
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117541129A (en) * | 2024-01-10 | 2024-02-09 | 四川省华地建设工程有限责任公司 | Soil quality assessment method and system |
CN117541129B (en) * | 2024-01-10 | 2024-04-09 | 四川省华地建设工程有限责任公司 | Soil quality assessment method and system |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |