CN117541095A - Agricultural land soil environment quality classification method - Google Patents


Info

Publication number
CN117541095A
Authority
CN
China
Legal status
Pending
Application number
CN202311275337.9A
Other languages
Chinese (zh)
Inventor
任顺
张清
任东
安毅
孙航
王成龙
闫仁凯
闫艳
Current Assignee
China Three Gorges University CTGU
Original Assignee
China Three Gorges University CTGU
Application filed by China Three Gorges University CTGU
Priority to CN202311275337.9A
Publication of CN117541095A


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0639 Performance analysis of employees; Performance analysis of enterprise or organisation operations
    • G06Q10/06395 Quality analysis or management
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/243 Classification techniques relating to the number of classes
    • G06F18/24323 Tree-organised classifiers
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G06N20/20 Ensemble learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00 Computing arrangements using knowledge-based models
    • G06N5/01 Dynamic search techniques; Heuristics; Dynamic trees; Branch-and-bound
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/04 Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/02 Agriculture; Fishing; Forestry; Mining

Abstract

The invention provides a method for classifying the soil environment quality of agricultural land, comprising the following steps: collecting soil and agricultural product samples in a research area and measuring soil physicochemical indicators in the samples; preprocessing the data of the collected samples, including cleaning, removing outliers, interpolating missing values, deleting redundant data and normalizing, to obtain a preprocessed data set, which is divided into a training set and a test set; constructing an agricultural land soil environment quality classification model based on a random forest algorithm through feature selection, decision tree generation and model integration; training the model using the training set; and predicting the test set data using the optimized random forest model and evaluating the classification performance of the model. Compared with existing methods that classify and evaluate soil environment quality based on the experience of supervisory personnel, the method uses a random forest algorithm that can process larger-scale data and automatically discover relationships in the data from its features, improving the objectivity and accuracy of the evaluation.

Description

Agricultural land soil environment quality classification method
Technical Field
The invention relates to the field of computer applications, and in particular to a method for classifying the soil environment quality of agricultural land.
Background
Against the background of agricultural development and the need to guarantee the quality and safety of agricultural products, classifying agricultural land by soil environment quality has become an important management task. Soil environment quality directly affects crop growth and the quality of agricultural products, which are indispensable food sources in daily life. A scientific and accurate classification of agricultural land soil environment quality is therefore critical for rational land use and effective environmental protection supervision.
In the past, soil environment quality assessment and classification often relied on manual experience and traditional statistical methods, which are subjective and limited. A scientific and accurate classification of agricultural land soil environment quality that supports rational land use and effective environmental supervision is therefore of positive practical significance.
Disclosure of Invention
The technical problem to be solved by the invention is to provide a method for classifying the soil environment quality of agricultural land that considers multiple indicators and factors together, achieves accurate classification of agricultural land soil environment quality, and facilitates subsequent evaluation work.
To solve this problem, the invention adopts the following technical scheme: a method for classifying agricultural land soil environment quality that uses a random forest algorithm to obtain the quality class, comprising the following steps:
step 1, collecting soil and agricultural product samples in a research area, measuring soil physicochemical indicators in the samples, and obtaining the atlas of the research area relevant to the agricultural land soil environment quality classification work;
step 2, preprocessing the data of the samples collected in step 1, including cleaning, removing outliers, interpolating missing values, deleting redundant data and normalizing, to obtain a preprocessed data set, and then dividing the preprocessed data set into a training set and a test set at a given ratio;
step 3, constructing an agricultural land soil environment quality classification model based on a random forest algorithm through feature selection, decision tree generation and model integration;
step 4, training the model using the training set and optimizing its parameters by grid search with cross-validation;
step 5, predicting the test set data using the optimized random forest model and evaluating the classification performance of the model;
step 6, predicting the soil environment quality class of new samples using the optimized model.
In a preferred embodiment, step 1 specifically includes the following substeps:
step 1.1: determining the research area and obtaining the soil sampling data, precipitation sampling data and the contents of agricultural inputs;
step 1.2: collecting samples at soil sampling points and of agricultural products, and measuring soil physicochemical indicators, including but not limited to heavy metal content, pH value, organic matter content and soil particle size, in real time.
In a preferred embodiment, step 2 specifically includes the following substeps:
step 2.1: cleaning the data to eliminate possible errors and abnormal data;
step 2.2: filling in missing values by interpolation;
step 2.3: deleting feature data that are irrelevant or redundant to the target evaluation;
step 2.4: converting the format of each feature index to facilitate subsequent processing;
step 2.5: dividing the data into a training set and a test set at a given ratio.
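The preprocessing substeps above can be sketched in plain Python. This is a minimal illustration under assumptions the text does not state: records are dictionaries of numeric indicators with `None` marking a missing value, exact duplicate records are the erroneous data to drop, missing values are filled with the column mean (a simple stand-in for the unspecified interpolation method), and normalization is min-max scaling to [0, 1]. The function name `preprocess` is hypothetical.

```python
from statistics import mean

def preprocess(rows):
    """Deduplicate, fill missing values and min-max normalize a list of records.

    rows: list of dicts mapping indicator name -> float, or None when missing.
    Assumes every record has the same keys and every indicator has at least
    one observed value.
    """
    # Step 2.1 (assumed rule): drop exact duplicate records, keeping the first copy.
    records, seen = [], set()
    for row in rows:
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            records.append(dict(row))  # copy so the caller's data is not modified
    features = sorted(records[0])
    for f in features:
        # Step 2.2 (assumed rule): fill missing values with the column mean.
        fill = mean(r[f] for r in records if r[f] is not None)
        for r in records:
            if r[f] is None:
                r[f] = fill
        # Normalization: min-max scale the indicator to [0, 1].
        lo = min(r[f] for r in records)
        hi = max(r[f] for r in records)
        for r in records:
            r[f] = (r[f] - lo) / (hi - lo) if hi > lo else 0.0
    return records
```

For example, `preprocess([{"pH": 5.0, "Cd": 0.2}, {"pH": 7.0, "Cd": None}, {"pH": 5.0, "Cd": 0.2}])` drops the duplicate third record and fills the missing cadmium value before scaling.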
In a preferred embodiment, step 3 specifically includes the following sub-steps:
step 3.1: randomly selecting samples: using the Bagging method, n samples are drawn from the training set obtained in step 2 to form a new sub-sample set;
step 3.2: randomly selecting features: for each sub-sample set selected in step 3.1, m features are randomly selected from all features for training, and features are selected using the Gini index, where a smaller Gini index indicates a purer data set; for a given data set $D$, the Gini index is calculated as:
$$\mathrm{Gini}(D) = 1 - \sum_{k=1}^{K} \left( \frac{|C_k|}{|D|} \right)^2$$
where $C_k$ is the subset of samples in $D$ belonging to the $k$-th class and $K$ is the number of agricultural land soil environment quality classes;
step 3.3: constructing decision trees: a decision tree is constructed for each sub-sample set based on the selected features;
steps 3.2 and 3.3 are repeated a plurality of times; a decision tree stops growing when its child nodes reach full purity, i.e. each child node contains samples of only one class and its Gini index is 0; the decision trees together form a random forest.
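As a sketch of how the Gini index in step 3.2 is evaluated, the functions below compute Gini(D) from a list of class labels and the size-weighted Gini of a candidate binary split, the quantity a CART-style tree minimizes when choosing a split. Weighting the two children by their sizes is the usual CART convention and is an assumption here; the function names are illustrative.

```python
from collections import Counter

def gini(labels):
    """Gini(D) = 1 - sum_k (|C_k| / |D|)^2 for a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def split_gini(left, right):
    """Size-weighted Gini of a binary split; 0 means both children are pure."""
    n = len(left) + len(right)
    return len(left) / n * gini(left) + len(right) / n * gini(right)
```

A node holding only one class has a Gini index of 0, which is the full-purity stopping condition described above; an even two-class mix gives 0.5.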
In a preferred scheme, in step 4, the performance of each parameter combination in the grid search is evaluated by 5-fold cross-validation: the training samples are divided into 5 subsets; in each round, 4 subsets are used to train the model and the remaining subset is used as the validation set to evaluate it. The process is repeated 5 times so that each subset serves once as the validation set, and the mean of the 5 evaluation results is taken as the final performance index. The best-performing parameter combination is selected according to the cross-validation results and returned for subsequent use.
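A minimal plain-Python sketch of the 5-fold procedure in step 4. `evaluate` is a hypothetical callback that trains a model with the given parameters on the training indices and returns its score (e.g. accuracy) on the validation indices; the shuffling seed and the round-robin assignment of samples to folds are assumptions, not details from the text.

```python
import random

def k_fold_indices(n, k=5, seed=0):
    """Shuffle the indices 0..n-1 and deal them round-robin into k folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def grid_search_cv(param_grid, n, evaluate, k=5, seed=0):
    """Return (best_params, best_mean_score) over a list of parameter dicts.

    evaluate(params, train_idx, val_idx) is a hypothetical callback that trains
    a model on train_idx and returns its score on val_idx.
    """
    folds = k_fold_indices(n, k, seed)
    best_params, best_score = None, float("-inf")
    for params in param_grid:
        scores = []
        for i, val_idx in enumerate(folds):
            # The remaining k-1 folds form the training part of this round.
            train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
            scores.append(evaluate(params, train_idx, val_idx))
        mean_score = sum(scores) / k  # average of the k validation scores
        if mean_score > best_score:
            best_params, best_score = params, mean_score
    return best_params, best_score
```

With 956 training samples (the 80% share of the 1195 records in the embodiment), `k_fold_indices(956)` yields one fold of 192 samples and four of 191.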
In a preferred scheme, in step 5, the model tuned in step 4 is evaluated using a confusion matrix, and the accuracy, precision, recall and F1-score are obtained from the confusion matrix as follows:
$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$
$$\mathrm{Precision} = \frac{TP}{TP + FP}$$
$$\mathrm{Recall} = \frac{TP}{TP + FN}$$
$$F1 = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$$
where TP (True Positive) is a positive sample that the model also predicts as positive; FP (False Positive) is a negative sample that the model predicts as positive; TN (True Negative) is a negative sample that the model also predicts as negative; and FN (False Negative) is a positive sample that the model predicts as negative.
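For the three soil quality classes, the binary counts above can be applied one class at a time (one-vs-rest): for class i, TP is the diagonal entry, FN the rest of row i, FP the rest of column i, and TN everything else. The sketch below, with a hypothetical function name and a toy matrix, computes per-class precision, recall and F1 this way, and reports overall accuracy as the sum of the diagonal over the total, i.e. the share of correctly classified samples.

```python
def per_class_metrics(cm):
    """cm[i][j] = number of samples whose true class is i and predicted class is j.

    Returns (accuracy, report) where report[i] holds one-vs-rest counts and
    precision / recall / F1 for class i.
    """
    total = sum(map(sum, cm))
    accuracy = sum(cm[i][i] for i in range(len(cm))) / total
    report = []
    for i in range(len(cm)):
        tp = cm[i][i]
        fn = sum(cm[i]) - tp                  # rest of row i
        fp = sum(row[i] for row in cm) - tp   # rest of column i
        tn = total - tp - fn - fp
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        report.append({"tp": tp, "fp": fp, "fn": fn, "tn": tn,
                       "precision": precision, "recall": recall, "f1": f1})
    return accuracy, report
```

For the toy matrix `[[5, 1], [2, 2]]`, the accuracy is 7/10 = 0.7 and class 0 has precision 5/7 and recall 5/6.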
Compared with the prior art, the invention has the main beneficial effects that:
compared with existing methods that classify and evaluate soil environment quality based on the experience of supervisory personnel, the method uses a random forest algorithm that can process larger-scale data and automatically discover relationships in the data from its features, improving the objectivity and accuracy of the evaluation.
Compared with traditional methods, the method is data-driven: the random forest algorithm automatically learns features from large-scale data without manual feature extraction, which reduces subjectivity and makes the classification of agricultural land soil environment quality more scientific.
Drawings
The invention is further described below with reference to the accompanying drawings.
Fig. 1 is a flow chart of the method for classifying agricultural land soil environment quality.
Fig. 2 is a flow chart of the random forest algorithm.
Fig. 3 is a schematic diagram of splitting the training set for 5-fold cross-validation.
Fig. 4 is a flow chart of parameter optimization in the present invention.
Fig. 5 is a schematic diagram of the training-set and validation-set accuracy of the random forest model for different numbers of decision trees.
Detailed Description
The embodiments of the present invention are described below clearly and completely with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the invention. All other embodiments obtained by those skilled in the art based on these embodiments without inventive effort fall within the scope of protection of the invention.
As a preferred embodiment of the present invention, as shown in Figs. 1 to 5, a method for classifying agricultural land soil environment quality comprises the following steps:
Step 1, collecting soil and agricultural product samples in a research area, testing the samples for multiple indicators such as heavy metal content, pH value and other physicochemical properties, and obtaining the atlas of the research area relevant to the agricultural land soil environment quality classification work;
step 1.1: determining the research area and obtaining the soil sampling data, precipitation sampling data and the contents of agricultural inputs;
step 1.2: collecting samples at soil sampling points and of agricultural products and testing them in the laboratory for multiple soil physicochemical indicators, including but not limited to heavy metal content (such as cadmium, mercury and lead), pH, organic matter content and soil particle composition.
A sampling point in Yichang City is selected, and soil, agricultural product and other samples are collected there and sent to a laboratory for testing; other environmental information for the region where the point is located, such as precipitation and temperature, and atlas information related to the agricultural land soil environment quality classification are also obtained.
Step 2, preprocessing the data of the samples collected in step 1, including cleaning, removing outliers, interpolating missing values, deleting redundant data and normalizing, to obtain a preprocessed data set, which is then divided into a training set and a test set at a given ratio;
step 2.1: cleaning the data to eliminate possible errors and abnormal data;
step 2.2: filling in missing values using a suitable interpolation method;
step 2.3: deleting feature data that are irrelevant or redundant to the target evaluation;
step 2.4: converting the format of each feature index to facilitate subsequent processing;
step 2.5: dividing the data into a training set and a test set at a given ratio.
Historical inspection sample records are obtained and cleaned in combination with the environmental factors to eliminate possible errors and abnormal data: missing and duplicate entries are deleted, the data format is organized, and the length, units and other attributes of the index data are standardized. Class labels are then assigned from the data: 1 indicates that the agricultural land soil environment quality class of the sampling point is the priority protection class; 2 indicates the safe utilization class; and 3 indicates that the agricultural land at the point is seriously polluted and belongs to the strict control class.
After data cleaning, the 1195 records in this example are used as input to the random forest algorithm and divided into a training set and a test set at a ratio of 8:2. Both sets contain all three classes described above: priority protection, safe utilization and strict control.
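The text does not say how the 8:2 split is drawn. The sketch below assumes a stratified random split, which guarantees that all three classes appear in both the training and test sets; the function name and seed are hypothetical. Note that 20% of 1195 records is 239, which matches the total test-set support in the evaluation report.

```python
import random
from collections import defaultdict

def stratified_split(labels, test_ratio=0.2, seed=0):
    """Return (train_idx, test_idx), drawing test_ratio of each class separately."""
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for idx in by_class.values():
        rng.shuffle(idx)
        n_test = round(len(idx) * test_ratio)  # per-class share of the test set
        test_idx += idx[:n_test]
        train_idx += idx[n_test:]
    return sorted(train_idx), sorted(test_idx)
```

Because the test share is rounded per class, the test set can differ from exactly 20% by a sample or two when class sizes are not multiples of 5.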
Step 3, constructing an agricultural land soil environment quality category division model through processes such as feature selection, decision tree generation, model integration and the like based on a random forest algorithm;
step 3.1: randomly selecting samples. Using the Bagging method, n samples are drawn from the training set obtained in step 2 to form a new sub-sample set. Because the samples are drawn with replacement, different sub-sample sets partly overlap, yet each also contains samples that others lack. This increases the diversity of the model and reduces overfitting.
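A sketch of the bootstrap sampling behind step 3.1, assuming (as in standard Bagging) that each sub-sample set has the same size n as the training set and is drawn with replacement. The helper name is hypothetical. Samples never drawn into a bag are that bag's out-of-bag samples.

```python
import random

def bootstrap_sample(n, seed=None):
    """Draw n indices with replacement from range(n); return (in_bag, out_of_bag)."""
    rng = random.Random(seed)
    in_bag = [rng.randrange(n) for _ in range(n)]       # may contain repeats
    out_of_bag = sorted(set(range(n)) - set(in_bag))    # indices never drawn
    return in_bag, out_of_bag
```

On average a bootstrap sample contains about 1 - 1/e, roughly 63.2%, of the distinct training samples, so each tree sees a partly overlapping but different subset; this is the diversity the step relies on.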
Step 3.2: randomly selecting features. For each sub-sample set selected in step 3.1, m features are randomly selected from all features for training. Each decision tree therefore considers only part of the features, which increases randomness and improves the generalization ability of the model. Feature selection typically uses the information gain, the Gini index or another criterion to evaluate the importance of each feature. The invention selects features using the Gini index: the smaller the Gini index, the higher the purity of the data set. For a given data set $D$, the Gini index is calculated as:
$$\mathrm{Gini}(D) = 1 - \sum_{k=1}^{K} \left( \frac{|C_k|}{|D|} \right)^2$$
where $C_k$ is the subset of samples in $D$ belonging to the $k$-th class and $K$ is the number of agricultural land soil environment quality classes.
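As a worked example of the formula, take the class composition of the test set given in the evaluation report (117, 105 and 17 samples in classes 1 to 3, 239 in total):

```latex
\mathrm{Gini}(D_{\text{test}})
  = 1 - \left[ \left(\tfrac{117}{239}\right)^2 + \left(\tfrac{105}{239}\right)^2 + \left(\tfrac{17}{239}\right)^2 \right]
  = 1 - \frac{13689 + 11025 + 289}{57121}
  = \frac{32118}{57121} \approx 0.562
```

The value lies between 0 (a pure node) and the three-class maximum of 2/3, reflecting that the set mixes all three quality classes, with class 3 strongly under-represented.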
Step 3.3: and constructing a decision tree. For each sub-sample set, a decision tree is constructed based on the selected features. The decision tree progressively divides the data according to the features so that each leaf node contains as many homogeneous samples as possible.
Steps 3.2 and 3.3 are repeated a plurality of times; a decision tree stops growing when its child nodes reach full purity, i.e. each child node contains samples of only one class and its Gini index is 0. Together, these decision trees form the random forest.
In this example, the split criterion of the random forest is set to criterion='gini', and the minimum number of samples a node must contain before it can be split is set to min_samples_split=1.
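A from-scratch sketch in plain Python of steps 3.1 to 3.3 put together: a bootstrap sample and m randomly chosen features per tree (following the text, which draws features per sub-sample set rather than per node), Gini-based binary splits, trees grown until every leaf is pure or cannot be split further, and a majority vote across trees. Candidate thresholds at midpoints between neighbouring values, majority labels at impure leaves, and all function names are assumptions for illustration. Setting `n_trees=26` would mirror the number of trees found in the example.

```python
import random
from collections import Counter

def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(X, y, features):
    """Return the (feature, threshold) pair with the lowest weighted Gini, or None."""
    best, best_score = None, float("inf")
    for f in features:
        values = sorted({row[f] for row in X})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2  # candidate threshold halfway between neighbouring values
            left = [yi for row, yi in zip(X, y) if row[f] <= t]
            right = [yi for row, yi in zip(X, y) if row[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if score < best_score:
                best, best_score = (f, t), score
    return best

def grow_tree(X, y, features):
    """Split recursively until a node is pure (Gini = 0) or cannot be split further."""
    split = best_split(X, y, features) if gini(y) > 0 else None
    if split is None:
        return Counter(y).most_common(1)[0][0]  # leaf: majority class label
    f, t = split
    left = [i for i, row in enumerate(X) if row[f] <= t]
    right = [i for i, row in enumerate(X) if row[f] > t]
    return (f, t,
            grow_tree([X[i] for i in left], [y[i] for i in left], features),
            grow_tree([X[i] for i in right], [y[i] for i in right], features))

def predict_tree(node, x):
    while isinstance(node, tuple):  # internal nodes are (feature, threshold, left, right)
        f, t, left, right = node
        node = left if x[f] <= t else right
    return node

def fit_forest(X, y, n_trees, m, seed=0):
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]  # step 3.1: bootstrap sample
        features = rng.sample(range(len(X[0])), m)          # step 3.2: m random features
        forest.append(grow_tree([X[i] for i in idx], [y[i] for i in idx], features))  # step 3.3
    return forest

def predict_forest(forest, x):
    """Majority vote over all trees."""
    return Counter(predict_tree(tree, x) for tree in forest).most_common(1)[0][0]
```

This toy version is meant to show the mechanics only; it rescans every threshold at every node and is far slower than a library implementation.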
Step 4, training the model using the training set and optimizing its parameters by grid search, cross-validation and similar methods;
in this example, the optimal parameter combination is found by grid search with 5-fold cross-validation on the training set obtained in step 2. Accuracy is used as the evaluation index for the parameters, and the parameter set with the highest accuracy is finally selected to train the model on the entire training set.
In this example, the results of the grid search with cross-validation are shown in Fig. 5: the optimal number of decision trees in the random forest, n_estimators, is 26, at which the model accuracy reaches 78.87%.
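The parameter names used in this example match scikit-learn's `RandomForestClassifier` and `GridSearchCV`, although the text does not name a library. The sketch below assumes scikit-learn and uses synthetic data from `make_classification` in place of the 1195 soil records; the search range for `n_estimators` is hypothetical. It leaves `min_samples_split` at scikit-learn's default of 2, because scikit-learn rejects an integer value below 2. With the default `refit=True`, `GridSearchCV` retrains the best parameter set on the whole training set, as the procedure describes.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for the 1195 preprocessed soil records with three classes.
X, y = make_classification(n_samples=1195, n_features=8, n_informative=5,
                           n_classes=3, random_state=0)
# 8:2 split; stratify keeps all three classes in both sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

search = GridSearchCV(
    RandomForestClassifier(criterion="gini", random_state=0),
    param_grid={"n_estimators": list(range(10, 51, 8))},  # hypothetical search range
    cv=5,                # 5-fold cross-validation on the training set
    scoring="accuracy",  # evaluation index used in the example
)
search.fit(X_train, y_train)  # refit=True (default) retrains the best model on all of X_train
print("best:", search.best_params_, "cv accuracy:", round(search.best_score_, 4))
print("test accuracy:", round(search.score(X_test, y_test), 4))
```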
Step 5, predicting the test set data by using the optimized random forest model, and evaluating the classification performance of the model;
In this example, the optimal-parameter model obtained in step 4 is applied to the test set and the relevant evaluation indices are calculated; the confusion matrix is given in Table 1 and the model's evaluation report on the test set in Table 2. In the confusion matrix of Table 1, the diagonal elements give the number of correctly classified samples in each category and the off-diagonal elements give the number of misclassified samples. For category 1, 102 samples are correctly classified, 13 are misclassified as category 2 and 2 as category 3, so the model predicts this category well. For category 2, 80 samples are correctly classified, 21 are misclassified as category 1 and 4 as category 3. For category 3, only 3 samples are correctly classified and most of its samples are misclassified, so the model predicts category 3 poorly.
As can be seen from the evaluation report of table 2:
Precision is the proportion of samples predicted as a given class that actually belong to that class. Precision is high for categories 1 and 2, at 0.82 and 0.75 respectively, but only 0.33 for category 3.
Recall is the proportion of samples of a given class that the model correctly identifies. Category 1 has the highest recall, 0.87, followed by category 2 at 0.76, whereas category 3 reaches only 0.18; this is likely related to the imbalance in the input sample sizes (only 17 of the 239 test samples belong to category 3).
The F1 score combines precision and recall. It is 0.85 for category 1 and 0.76 for category 2, but only 0.23 for category 3, indicating that the model's classification of category 3 needs to be improved.
Taken together, the accuracy, precision, recall and other indices in the report show that the model predicts categories 1 and 2 acceptably, but its ability to identify category 3 needs to be strengthened and made more targeted.
Table 1 Test set confusion matrix
Table 2 Evaluation report
Class          precision   recall   f1-score   support
1              0.82        0.87     0.85       117
2              0.75        0.76     0.76       105
3              0.33        0.18     0.23       17
accuracy                            0.77       239
macro avg      0.64        0.60     0.61       239
weighted avg   0.76        0.77     0.76       239
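Part of the evaluation report can be checked against the confusion-matrix counts given in the text. Recall needs only each class's row (correct predictions and row total), so it can be recomputed; precision cannot, because the text does not say how the 14 misclassified category-3 samples are distributed. The weighted-average recall equals the overall accuracy by construction.

```python
# Counts stated in the text: correctly classified samples and row totals per true class.
correct = {1: 102, 2: 80, 3: 3}
support = {1: 102 + 13 + 2, 2: 21 + 80 + 4, 3: 17}

recall = {k: correct[k] / support[k] for k in correct}
accuracy = sum(correct.values()) / sum(support.values())  # trace / total
macro_recall = sum(recall.values()) / len(recall)          # unweighted mean over classes

for k in recall:
    print(f"class {k}: recall {recall[k]:.2f}, support {support[k]}")
print(f"accuracy {accuracy:.2f}, macro-avg recall {macro_recall:.2f}")
```

The recomputed values (0.87, 0.76 and 0.18 for recall, 0.77 for accuracy, 0.60 for macro-average recall) agree with the report.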
Step 6, predicting the soil environment quality class of new samples using the optimized model.
The invention provides a method for classifying agricultural land soil environment quality. The method achieves automatic classification of soil environment quality by constructing a data-driven model. Compared with traditional methods, it uses a machine learning algorithm to learn features automatically, reducing subjectivity. It provides more accurate and reliable evaluation results for soil environment monitoring and management decisions, and strong support for the scientific planning of agricultural land, the implementation of soil pollution control, and similar tasks.
The random forest algorithm is an ensemble learning method that performs classification and regression by constructing multiple decision trees and combining their outputs by voting or averaging. It has strong generalization ability and robustness and can handle large amounts of data and complex feature relationships. In classifying agricultural soil environment quality, the random forest algorithm can use large volumes of data to automatically discover patterns and characteristics in the monitoring data and environmental factors, helping decision makers classify agricultural soil environment quality accurately.
Compared with traditional methods, the model built by this method generalizes well, enables automated and accurate soil environment quality classification of large-scale agricultural land, and is applicable to various agricultural regions and soil types. Agricultural land soil monitoring data and related environmental factors of the research area are collected, a suitable feature set is constructed, and the data are preprocessed and labeled. The model is then trained and optimized with the random forest algorithm, and the classification results are evaluated and verified. Finally, a data-driven agricultural land soil environment quality classification model is obtained and compared with traditional methods to verify its accuracy and practicability. The invention can also be combined with other machine learning algorithms to further improve the accuracy and efficiency of agricultural land soil environment quality classification.
Finally, it is noted that the above embodiments are only for illustrating the technical solution of the present invention and not for limiting the same, and although the present invention has been described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that modifications and equivalents may be made thereto without departing from the spirit and scope of the technical solution of the present invention, which is intended to be covered by the scope of the claims of the present invention.

Claims (6)

1. A method for classifying the soil environment quality of agricultural land, characterized in that a random forest algorithm is used to obtain the agricultural land soil environment quality class, the method comprising the following steps:
step 1, collecting soil and agricultural product samples in a research area, measuring soil physicochemical indicators in the samples, and obtaining the atlas of the research area relevant to the agricultural land soil environment quality classification work;
step 2, preprocessing the data of the samples collected in step 1, including cleaning, removing outliers, interpolating missing values, deleting redundant data and normalizing, to obtain a preprocessed data set, and then dividing the preprocessed data set into a training set and a test set at a given ratio;
step 3, constructing an agricultural land soil environment quality classification model based on a random forest algorithm through feature selection, decision tree generation and model integration;
step 4, training the model using the training set and optimizing its parameters by grid search with cross-validation;
step 5, predicting the test set data using the optimized random forest model and evaluating the classification performance of the model;
step 6, predicting the soil environment quality class of new samples using the optimized model.
2. The method for classifying the soil environment quality of agricultural land according to claim 1, wherein step 1 specifically comprises the following substeps:
step 1.1: determining the research area and obtaining the soil sampling data, precipitation sampling data and the contents of agricultural inputs;
step 1.2: collecting samples at soil sampling points and of agricultural products, and measuring soil physicochemical indicators, including but not limited to heavy metal content, pH value, organic matter content and soil particle size, in real time.
3. The method for classifying the soil environment quality of agricultural land according to claim 1, wherein step 2 specifically comprises the following substeps:
step 2.1: cleaning the data to eliminate possible errors and abnormal data;
step 2.2: filling in missing values by interpolation;
step 2.3: deleting feature data that are irrelevant or redundant to the target evaluation;
step 2.4: converting the format of each feature index to facilitate subsequent processing;
step 2.5: dividing the data into a training set and a test set at a given ratio.
4. The method for classifying the environmental quality of the agricultural land according to claim 1, wherein the method comprises the following steps: the step 3 specifically comprises the following sub-steps:
step 3.1: randomly selecting samples, and selecting n samples from the training set separated in the step 2 by adopting a Bagging method to form a new sub-sample set;
step 3.2: randomly selecting features, randomly selecting m features from all features for training each sub-sample set selected in the step 3.1, selecting features by using a coefficient of the foundation, wherein the smaller the coefficient of the foundation is, the higher the purity of the data set is, and for a given data set D, the calculation formula of the coefficient of the foundation is as follows:
wherein C is k Is a sample subset belonging to the kth class in the data set D, and K is the quality class number of the soil environment of the agricultural land;
step 3.3: decision tree construction: a decision tree is constructed for each sub-sample set based on the selected features;
steps 3.2 and 3.3 are repeated multiple times; a decision tree stops growing when its child nodes reach full purity, i.e., each child node contains samples of only one class and its Gini index is 0; the resulting decision trees together form the random forest.
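Steps 3.1 to 3.3 can be combined into the following dependency-free sketch. It rests on several assumptions that the claims leave open: the bootstrap sample size equals the training-set size, the m features are drawn once per tree (as step 3.2 describes) rather than at every split as in Breiman's original random forest, split thresholds are midpoints between distinct feature values, and the forest predicts by majority vote across trees. Names such as `fit_forest` and `predict_forest` are hypothetical.

```python
import random
from collections import Counter


def gini(labels):
    """Gini(D) = 1 - sum_k (|C_k| / |D|)^2."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values()) if n else 0.0


def best_split(X, y, features):
    """Return the (feature, threshold) pair with the lowest weighted child Gini, or None."""
    best, best_score = None, float("inf")
    for f in features:
        values = sorted({row[f] for row in X})
        for a, b in zip(values, values[1:]):  # candidate thresholds between distinct values
            t = (a + b) / 2
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if score < best_score:
                best, best_score = (f, t), score
    return best


def build_tree(X, y, features):
    """Step 3.3: grow a tree until every leaf is pure (Gini = 0) or cannot be split."""
    if len(set(y)) == 1:
        return ("leaf", y[0])
    split = best_split(X, y, features)
    if split is None:  # selected features are identical across samples: majority leaf
        return ("leaf", Counter(y).most_common(1)[0][0])
    f, t = split
    li = [i for i, row in enumerate(X) if row[f] <= t]
    ri = [i for i, row in enumerate(X) if row[f] > t]
    return ("split", f, t,
            build_tree([X[i] for i in li], [y[i] for i in li], features),
            build_tree([X[i] for i in ri], [y[i] for i in ri], features))


def predict_tree(node, row):
    """Follow split nodes down to a leaf and return its class label."""
    while node[0] == "split":
        _, f, t, left, right = node
        node = left if row[f] <= t else right
    return node[1]


def fit_forest(X, y, n_trees=25, m_features=2, seed=0):
    """Build n_trees trees, each on a bootstrap sample with its own random feature subset."""
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]        # step 3.1: sampling with replacement
        feats = rng.sample(range(d), min(m_features, d))  # step 3.2: m random features per tree
        forest.append(build_tree([X[i] for i in idx], [y[i] for i in idx], feats))
    return forest


def predict_forest(forest, row):
    """Majority vote over the trees (assumed aggregation rule)."""
    return Counter(predict_tree(tree, row) for tree in forest).most_common(1)[0][0]
```

Because every tree is grown until its leaves are pure, individual trees overfit their bootstrap samples; the variety introduced by steps 3.1 and 3.2 is what lets the vote generalise.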
5. The agricultural land soil environment quality classification method according to claim 1, wherein in step 4, a grid search is performed and the performance of each parameter combination is evaluated by 5-fold cross-validation: the training samples are divided into 5 subsets; in each round of cross-validation, 4 subsets are used for model training and the remaining subset is used as the validation set for model evaluation; this process is repeated 5 times so that each subset serves as the validation set once, and the average of the 5 evaluation results is taken as the final performance index; the best-performing parameter combination is identified from the cross-validation results and returned for subsequent use.
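The grid search with 5-fold cross-validation described above can be sketched as follows. The callback `fit_and_score(params, X_train, y_train, X_val, y_val)` is a hypothetical interface: it is assumed to train a model (for example, the random forest of step 3) with the given parameters and return a score where higher is better. The fold assignment by shuffled round-robin dealing is also an assumption.

```python
import itertools
import random
from statistics import mean
from typing import Any, Callable, Dict, List, Sequence, Tuple


def k_fold_indices(n: int, k: int = 5, seed: int = 0) -> List[List[int]]:
    """Shuffle sample indices and deal them into k folds of near-equal size."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def grid_search_cv(
    X: Sequence[Any],
    y: Sequence[Any],
    param_grid: Dict[str, Sequence[Any]],
    fit_and_score: Callable[..., float],
    k: int = 5,
    seed: int = 0,
) -> Tuple[Dict[str, Any], float]:
    """Return (best_params, best_mean_score) over every combination in param_grid."""
    folds = k_fold_indices(len(X), k, seed)
    names = sorted(param_grid)
    best_params, best_score = None, float("-inf")
    for combo in itertools.product(*(param_grid[name] for name in names)):
        params = dict(zip(names, combo))
        scores = []
        for fold in folds:  # each fold serves as the validation set exactly once
            val = set(fold)
            train = [i for i in range(len(X)) if i not in val]
            scores.append(fit_and_score(
                params,
                [X[i] for i in train], [y[i] for i in train],
                [X[i] for i in fold], [y[i] for i in fold]))
        score = mean(scores)  # average of the k evaluation results
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score
```

A typical grid for the forest sketch might cover `n_trees` and `m_features`; since all folds share the same split, every parameter combination is compared on identical data.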
6. The agricultural land soil environment quality classification method according to claim 1, wherein in step 5, the model tuned in step 4 is evaluated using a confusion matrix, from which the accuracy, precision, recall and F1-score are obtained as follows:

$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$

$$\mathrm{Precision} = \frac{TP}{TP + FP}$$

$$\mathrm{Recall} = \frac{TP}{TP + FN}$$

$$F1 = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$$

where TP (true positive) is a sample that is actually positive and predicted positive by the model; FP (false positive) is a sample that is actually negative but predicted positive; TN (true negative) is a sample that is actually negative and predicted negative; and FN (false negative) is a sample that is actually positive but predicted negative.
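The evaluation of step 5 can be sketched as below. Because the soil quality classification has K classes, this sketch assumes a one-vs-rest reading: each class is treated in turn as the positive class to obtain its TP, FP and FN, and the per-class scores are macro-averaged. Accuracy is taken as the fraction of correct predictions (the diagonal of the confusion matrix over all samples), which reduces to (TP + TN) / (TP + TN + FP + FN) in the two-class case. The claims do not specify an averaging scheme; `evaluate` is a hypothetical name.

```python
from collections import Counter
from typing import Dict, Hashable, List, Sequence, Tuple


def confusion_matrix(y_true: Sequence[Hashable], y_pred: Sequence[Hashable],
                     classes: Sequence[Hashable]) -> List[List[int]]:
    """Rows are actual classes, columns are predicted classes."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(actual, pred)] for pred in classes] for actual in classes]


def evaluate(y_true, y_pred, classes) -> Tuple[float, Dict[Hashable, Tuple[float, float, float]], tuple]:
    """Return accuracy, per-class (precision, recall, F1) and their macro averages."""
    cm = confusion_matrix(y_true, y_pred, classes)
    k = len(classes)
    total = sum(sum(row) for row in cm)
    accuracy = sum(cm[i][i] for i in range(k)) / total  # correct predictions / all samples
    per_class = {}
    for i, c in enumerate(classes):  # one-vs-rest: class c is the positive class
        tp = cm[i][i]
        fp = sum(cm[r][i] for r in range(k)) - tp  # predicted c, actually another class
        fn = sum(cm[i]) - tp                        # actually c, predicted another class
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        per_class[c] = (precision, recall, f1)
    macro = tuple(sum(v[j] for v in per_class.values()) / k for j in range(3))
    return accuracy, per_class, macro
```

For actual labels `["A", "A", "B", "B"]` and predictions `["A", "B", "B", "B"]`, accuracy is 0.75, class A has precision 1.0 and recall 0.5, and class B has precision 2/3 and recall 1.0.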
CN202311275337.9A 2023-09-28 2023-09-28 Agricultural land soil environment quality classification method Pending CN117541095A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311275337.9A CN117541095A (en) 2023-09-28 2023-09-28 Agricultural land soil environment quality classification method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311275337.9A CN117541095A (en) 2023-09-28 2023-09-28 Agricultural land soil environment quality classification method

Publications (1)

Publication Number Publication Date
CN117541095A true CN117541095A (en) 2024-02-09

Family

ID=89792619

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311275337.9A Pending CN117541095A (en) 2023-09-28 2023-09-28 Agricultural land soil environment quality classification method

Country Status (1)

Country Link
CN (1) CN117541095A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118015661A (en) * 2024-04-08 2024-05-10 南京启数智能系统有限公司 Portrait view archive accuracy detection method based on random forest algorithm
CN118520281A (en) * 2024-07-24 2024-08-20 山东科技大学 Granite construction environment discriminating method based on machine learning

Similar Documents

Publication Publication Date Title
CN107292330B (en) Iterative label noise identification algorithm based on double information of supervised learning and semi-supervised learning
CN117541095A (en) Agricultural land soil environment quality classification method
CN110335168B (en) Method and system for optimizing power utilization information acquisition terminal fault prediction model based on GRU
CN107368700A (en) Based on the microbial diversity interaction analysis system and method for calculating cloud platform
CN110647830B (en) Bearing fault diagnosis method based on convolutional neural network and Gaussian mixture model
CN109597968A (en) Paste solder printing Performance Influence Factor analysis method based on SMT big data
CN105631203A (en) Method for recognizing heavy metal pollution source in soil
CN109558893B (en) Rapid integrated sewage treatment fault diagnosis method based on resampling pool
CN115641162A (en) Prediction data analysis system and method based on construction project cost
CN111105041B (en) Machine learning method and device for intelligent data collision
CN115602337A (en) Cryptocaryon irritans disease early warning method and system based on machine learning
CN112348264A (en) Carbon steel corrosion rate prediction method based on random forest algorithm
CN112183459B (en) Remote sensing water quality image classification method based on evolution multi-objective optimization
CN116468160A (en) Aluminum alloy die casting quality prediction method based on production big data
CN115794803B (en) Engineering audit problem monitoring method and system based on big data AI technology
CN113919235A (en) Method and medium for detecting abnormal emission of mobile source pollution based on LSTM evolution clustering
CN114186644A (en) Defect report severity prediction method based on optimized random forest
CN116416884A (en) Testing device and testing method for display module
CN118039029A (en) Method and system for identifying granite type based on machine learning and zircon component
CN117764413A (en) Accurate carbon emission accounting algorithm based on machine learning
CN113824580B (en) Network index early warning method and system
CN116884536A (en) Automatic optimization method and system for production formula of industrial waste residue bricks
CN116930423A (en) Automatic verification and evaluation method and system for air quality model simulation effect
CN114764682B (en) Rice safety risk assessment method based on multi-machine learning algorithm fusion
CN116502943A (en) Quality tracing method for investment casting product

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination