CN117789038B - Training method of data processing and recognition model based on machine learning

Info

Publication number: CN117789038B
Application number: CN202410205784.5A
Authority: CN
Inventors: 张镇; 靖婉琦; 刘晨甲; 王兆信; 谢东明; 宋光恒; 孙德润; 徐如明
Original assignee: Shuju Shandong Intelligent Technology Co ltd; Liaocheng Laike Intelligent Robot Co ltd
Current assignee: Shuju Shandong Intelligent Technology Co ltd; Liaocheng Laike Intelligent Robot Co ltd
Priority date: 2024-02-26
Filing date: 2024-02-26
Publication date: 2024-05-10
Anticipated expiration: 2044-02-26
Also published as: CN117789038A

Abstract

The invention provides a training method for a machine-learning-based data processing and recognition model, belonging to the technical field of data processing. The method first collects soil information and labels training samples for model training, then performs a dimension reduction operation on the data, and then expands the samples using a SMOTE sample generation method based on fast clustering. Features are then extracted from the data by a neural network whose neuron parameters are optimized by a search operator algorithm, which avoids the vanishing-gradient and exploding-gradient phenomena caused by traditional neural network parameter optimization methods. Finally, the hyperspectral data are classified by a machine learning classification model based on an improved random forest, which effectively improves the classification precision of the classifier by evaluating the classification performance of each decision tree during the decision tree training stage. The algorithm designed by the invention has higher detection precision as well as higher robustness and generalization capability.

Description

Training method of data processing and recognition model based on machine learning

Technical Field

The invention relates to the technical field of data processing, and in particular to a training method for a machine-learning-based data processing and recognition model.

Background

Heavy metal elements in soil are difficult to degrade in the natural environment, and heavy metal pollution is difficult to treat and highly hazardous. Monitoring the heavy metal pollution of soil in real time therefore makes it possible to stop a polluted area from spreading in time and to prevent the pollution from worsening. The traditional method for identifying heavy metal pollution of soil is to collect soil in the field and have a laboratory perform chemical analysis to judge the pollution condition in a given area. Although its identification precision is high, it has the drawbacks of a long analysis period and heavy consumption of manpower and material resources, and it can hardly meet the real-time monitoring requirements of a large region. The development of hyperspectral remote sensing and related fields offers a solution for rapidly monitoring soil heavy metal pollution over large regions. Hyperspectral remote sensing is rapid, dynamic and non-destructive, and applying it to soil heavy metal pollution can meet the requirement of large-scale real-time monitoring. The spatial information of a hyperspectral image reflects the external physical structure of an object, such as texture features and geometric features; the spectral information reflects changes in the chemical composition within the object. Whether it is a single-point high-resolution spectrum measured in a laboratory or a hyperspectral image obtained by satellite or airborne means, the spectral bands contain a large amount of information about the measured object, but adjacent bands are highly correlated, and the resulting information redundancy increases the difficulty of feature extraction. Moreover, soil is complex in composition and low in heavy metal content, so the response of heavy metals in the soil spectrum is weak. How to effectively extract important feature information from a large number of complex spectral bands is an important research topic in the hyperspectral field.

The invention patent with the patent number of CN202110651965.7 in the prior art provides an undisturbed soil profile carbon component prediction method based on hyperspectral imaging and support vector machine technology, and based on the acquisition of hyperspectral images of soil profile samples with preset depths at various sample positions, the method takes each characteristic spectrum band of a target sample spectrum region corresponding to the soil carbon component type as input, and soil carbon component data of the target sample spectrum region corresponding to the soil carbon component type as output, and obtains a soil carbon component prediction model corresponding to the soil carbon component type through training, so as to further realize the prediction of the soil profile carbon component of the target region; the whole design scheme can rapidly and accurately predict the contents of the components such as organic carbon, soluble carbon, carbon easy to oxidize, soil microbial biomass carbon and the like in the undisturbed soil profile, and realize the fine drawing of the spatial distribution of the components on the soil profile; makes up the defects of the traditional laboratory chemical analysis method.

The prior-art invention patent No. CN201910717696.2 provides a soil quality monitoring method based on airborne hyperspectral data, which comprises the following steps: step 1, acquiring airborne hyperspectral data of a soil quality monitoring area, and collecting samples in the field from the area to analyze the content of heavy metal elements; step 2, preprocessing the airborne hyperspectral data; step 3, reconstructing the spectra of the airborne hyperspectral data to eliminate the radiation distortion of ground object spectra caused by various atmospheric components; step 4, extracting the spectra of the sampling points from the airborne hyperspectral remote sensing images; step 5, performing spectral transformation and correlation coefficient analysis to obtain the correlation coefficients between soil content and soil spectral parameters and to find the sensitive bands of the characteristic spectra; and step 6, establishing an inversion model for soil quality monitoring from the airborne hyperspectral data to obtain the monitored soil nutrient and metal element content data. When applied, the method can accurately obtain large-scale basic soil data, reduce the workload, shorten the soil quality monitoring period and reduce cost.

The prior-art invention patent No. CN201510119440.3 provides a technical method for hyperspectral identification of soil attributes and relates to the technical field of soil exploration. The method comprises the following steps: S1, acquiring soil hyperspectral images at different times based on remote sensing satellite data; S2, after image preprocessing, obtaining bare soil through supervised classification, extracting the surface reflectivity of the bare soil, and establishing a bare soil surface reflectivity inversion model from it; S3, designing an indoor soil erosion test and acquiring soil erosion data corresponding to the acquisition times of the soil hyperspectral images; S4, obtaining the soil classification and calculating the soil K value from the soil erosion data obtained in step S3; and S5, establishing a hyperspectral model of the soil attributes affecting the erodibility K value according to the soil K value and the spectral data in the surface reflectivity inversion model. That invention solves the problem that hyperspectral remote sensing technology could not be used to measure soil erodibility.

Although the above prior art can identify the pollution degree of soil, the existing method still needs to be further improved in terms of model design and data processing, and especially needs to be further optimized in terms of improving detection precision, robustness and generalization capability of the model.

Disclosure of Invention

To address the above technical problems, the invention adopts the following technical scheme: a training method for a machine-learning-based data processing and recognition model, comprising the following steps:

S1, acquiring soil data and labeling training samples for model training;

S2, reducing the dimension of the data by recombining high-dimensional feature variables with large correlation coefficients into a set of low-dimensional, linearly independent variables;

S3, sample expansion, namely generating new samples in the minority classes to reduce sample class imbalance;

S4, extracting features from the data obtained in step S3, using a neural network model optimized by a search operator algorithm to optimize the parameters of the neurons, wherein the neural network adopted in this step has 2 layers and the parameters are searched by the search operator algorithm;

S5, training a classifier machine learning model;

S6, applying the trained model to identify the degree of soil heavy metal pollution: the model is trained with the labeled samples, and after model training is completed the data to be detected are detected and identified.

Further, the dimension reduction method adopted in S2 is principal component analysis, which comprises the following steps:

S201, data standardization, where the standardized value is calculated as:

$$Z = \frac{\text{variable value} - \text{mean}}{\text{standard deviation}}$$

where $Z$ represents the standardized value; through this step all variables are scaled proportionally;

S202, calculating the covariance matrix. Mathematically, the covariance matrix is defined as a $p \times p$ matrix, where $p$ represents the dimensionality of the acquired data and each element of the matrix represents the covariance of the corresponding pair of variables. For a hyperspectral band scene with variables $a$ and $b$, the covariance matrix is a 2 × 2 matrix, as follows:

$$C = \begin{bmatrix} \operatorname{Cov}(a,a) & \operatorname{Cov}(a,b) \\ \operatorname{Cov}(b,a) & \operatorname{Cov}(b,b) \end{bmatrix}$$

where $C$ represents the covariance matrix; $\operatorname{Cov}(a,a)$ represents the covariance of variable $a$ with itself, i.e., the variance of $a$; $\operatorname{Cov}(a,b)$ represents the covariance between variable $a$ and variable $b$; and $\operatorname{Cov}(b,b)$ represents the variance of variable $b$;

S203, calculating the eigenvectors and eigenvalues:

The eigenvectors and eigenvalues are calculated from the covariance matrix. They are computed in pairs, i.e., each eigenvector has a corresponding eigenvalue, and the number of eigenvectors to be calculated is determined by the dimensionality of the data;

The eigenvectors, obtained from the covariance matrix, indicate the directions of maximum variance in the data. Because greater variance in the hyperspectral data carries more information about the data, the eigenvectors are used to identify and calculate the principal components. The eigenvalues, on the other hand, are simply the scalars associated with the respective eigenvectors, so the eigenvectors and eigenvalues together are used to calculate the principal components of the data;

S204, calculating the principal components:

After the eigenvectors and eigenvalues are calculated, the eigenvectors are sorted in descending order of eigenvalue. An eigenvector with a higher eigenvalue is more important: the eigenvector with the highest eigenvalue serves as the first principal component, and the retained principal components then form a feature matrix;

S205, reducing the dimension of the data set:

The raw data are rearranged using the final principal components, which represent the largest and most important information of the data set. To replace the original data set with the newly formed principal components, the feature matrix is simply multiplied with the transpose of the original data, and the result is used as the dimension-reduced data.
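As a non-limiting illustration, steps S201 to S205 can be sketched in Python with NumPy as follows. The function name `pca_reduce`, the synthetic band matrix and the use of `numpy.linalg.eigh` are assumptions made only for this sketch and are not part of the disclosed method; the sketch projects row-wise samples onto the retained eigenvectors, which is equivalent to the transposed multiplication described in S205.

```python
import numpy as np

def pca_reduce(X, n_components):
    """Project X (n_samples x n_bands) onto its leading principal components."""
    # S201: standardize each band (variable value - mean) / standard deviation
    mu = X.mean(axis=0)
    sigma = X.std(axis=0)
    sigma[sigma == 0] = 1.0  # guard against constant bands
    Z = (X - mu) / sigma
    # S202: p x p covariance matrix of the standardized bands
    C = np.cov(Z, rowvar=False)
    # S203: eigenvalue/eigenvector pairs (eigh is suited to symmetric matrices)
    eigvals, eigvecs = np.linalg.eigh(C)
    # S204: sort in descending order of eigenvalue and keep the leading vectors
    order = np.argsort(eigvals)[::-1]
    W = eigvecs[:, order[:n_components]]  # feature matrix, p x k
    # S205: express the standardized data in the principal-component basis
    return Z @ W, eigvals[order]

# Hypothetical data: 200 samples, 4 bands, the last band nearly redundant
rng = np.random.default_rng(0)
bands = rng.normal(size=(200, 3))
bands = np.hstack([bands, 2.0 * bands[:, :1] + 0.01 * rng.normal(size=(200, 1))])
reduced, eigvals = pca_reduce(bands, n_components=2)
print(reduced.shape)
```

In this sketch the retained components are mutually uncorrelated, which corresponds to the set of low-dimensional variables described in S2.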

Further, in S3, sample expansion adopts a SMOTE sample generation method based on fast clustering, which comprises the following steps:

S301, for each minority class sample, obtaining its $k$ nearest neighbor samples by calculating the Euclidean distance from the sample to the other minority class samples, and generating a new minority class sample by linear interpolation between the sample and a randomly selected neighbor sample, as follows:

$$x_{new} = x_i + \mathrm{rand}(0,1) \times (\hat{x}_i - x_i)$$

where $\hat{x}_i$ represents one sample among the $k$ nearest neighbors, $\mathrm{rand}(0,1)$ is a random number, $x_i$ is the input sample, and $x_{new}$ is the newly generated sample;

S302, performing fast clustering on the generated samples; first, the distance between objects is calculated according to the following formula:

；

where the formula takes 2 of the generated samples and gives the distance between them. To accelerate clustering, a threshold is set by the following formula:

；

where the threshold is determined by a scaling factor, which is set manually and is typically greater than 0 and less than 1, together with the minimum distance and the maximum distance between categories;

Among the generated samples, the samples generated in each category are screened to improve the quality of the generated samples; the screening condition is:

；

The samples satisfying the screening condition form the screened sample set, which is combined with the original data set to obtain a balanced sample set for subsequent feature extraction.
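As a non-limiting illustration, the interpolation of step S301 can be sketched in Python as follows. The function name `smote_interpolate`, its parameters and the toy minority samples are hypothetical; the clustering-based screening of step S302 is not reproduced here, because its distance, threshold and screening formulas are not available in this text.

```python
import numpy as np

def smote_interpolate(minority, k=5, n_new=10, seed=0):
    """Generate n_new synthetic samples by interpolating toward k-nearest neighbors."""
    rng = np.random.default_rng(seed)
    minority = np.asarray(minority, dtype=float)
    n = len(minority)
    k = min(k, n - 1)
    # Pairwise Euclidean distances among the minority class samples
    diff = minority[:, None, :] - minority[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    np.fill_diagonal(dist, np.inf)  # a sample is not its own neighbor
    neighbours = np.argsort(dist, axis=1)[:, :k]
    new_samples = np.empty((n_new, minority.shape[1]))
    for m in range(n_new):
        i = rng.integers(n)                # input sample x_i
        j = neighbours[i, rng.integers(k)] # randomly selected neighbor
        gap = rng.random()                 # random number in [0, 1)
        new_samples[m] = minority[i] + gap * (minority[j] - minority[i])
    return new_samples

# Hypothetical minority class with four samples in two dimensions
minority = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
new_samples = smote_interpolate(minority, k=2, n_new=6)
print(new_samples.shape)
```

Because each new sample lies on the segment between an input sample and one of its neighbors, the generated samples stay within the region already occupied by the minority class.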

Further, in S4, a neural network model optimized by a search operator algorithm is adopted to optimize the weight parameters and the threshold parameters of the neurons;

The neural network adopted in this step has 2 layers, and the method by which the search operator algorithm searches for the weight parameters and the threshold parameters comprises the following steps:

S401, defining the search operators and setting the search conditions:

Let there be $n$ search operators in the search operator population, with the individual states expressed as $X = (x_1, x_2, \ldots, x_n)$, where $x_i$ ($i = 1, \ldots, n$) are the states of the search operators, namely the free variables in the parameter optimization problem; the objective function is represented by $Y = f(X)$; the distance between search operators $X_i$ and $X_j$ is $d_{ij} = \lVert X_i - X_j \rVert$; the search radius of a search operator is Visual; the step length of the search is Step; and the crowding factor is $\delta$. At a certain moment, a search operator $X_i$ searches for an arbitrary position $X_v$ within the search radius Visual. If the state at position $X_v$ is better than that at position $X_i$, the operator moves one step forward in the direction of $X_v$, arriving at position $X_{next}$; otherwise, it continues to search other positions within the field of view. The process is expressed as:

$$X_v = X_i + \mathrm{Visual} \times \mathrm{rand}(), \qquad X_{next} = X_i + \frac{X_v - X_i}{\lVert X_v - X_i \rVert} \times \mathrm{Step} \times \mathrm{rand}()$$

where $\mathrm{rand}()$ is a random number between 0 and 1;

Before acting, each search operator evaluates the search behavior, the aggregation behavior, the rear-end collision behavior and the random behavior in turn, and then selects the best behavior to execute, so that the search operator population can reach a position closer to the optimal solution:

(1) Search behavior

Assume that the state of the $i$-th search operator at a certain moment is $X_i$, and that a state $X_j$ is randomly selected within its search range, satisfying the following formula:

$$X_j = X_i + \mathrm{Visual} \times \mathrm{rand}()$$

$Y_i$ and $Y_j$ respectively denote the preferred-solution concentration in states $X_i$ and $X_j$. If $Y_i < Y_j$, the search operator moves one step in this direction, namely:

$$X_i^{t+1} = X_i^{t} + \frac{X_j - X_i^{t}}{\lVert X_j - X_i^{t} \rVert} \times \mathrm{Step} \times \mathrm{rand}()$$

If the forward condition is not met, another state is selected within the search range and the moving condition is checked again; if the moving condition is still not met after the set number of repeated attempts, the operator moves randomly;

(2) Aggregation behavior

Assume that the state of the $i$-th search operator at a certain moment is $X_i$, that the number of other search operators found in the current state is $n_f$, and that their central position is $X_c$. The judgment criterion is:

$$\frac{Y_c}{n_f} > \delta Y_i$$

where $\delta$ is the crowding factor, and $Y_c$ and $Y_i$ represent the preferred-solution concentration at the central position and at the current position, respectively;

If the above formula holds, the center has a higher preferred-solution concentration and is not crowded, and the operator moves one step toward the center; otherwise, the search behavior is executed;

(3) Rear-end collision behavior

Assume that the state of the $i$-th search operator at a certain moment is $X_i$. It searches the other search operators nearby in the current state and finds, among these peers, the one with the maximum preferred-solution concentration $Y_j$, whose position is $X_j$. The judgment criterion is:

$$\frac{Y_j}{n_f} > \delta Y_i$$

If the above formula holds, the position of the other search operator $X_j$ has a higher preferred-solution concentration and is less crowded, and the search operator $X_i$ moves one step in that direction; otherwise, the search behavior is executed;

(4) Random behavior

This behavior is the default of the search behavior: a position within the field of view is selected at random and the operator moves toward it. The position of the next state is:

$$X_i^{t+1} = X_i^{t} + \mathrm{Visual} \times \mathrm{rand}()$$

By the above method, the optimal solution set of the neural network parameters is obtained.
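As a non-limiting illustration, the search behavior with its random fallback can be sketched in Python for a single search operator as follows. The names `search_step`, `toy_objective` and `try_number`, the box-shaped field of view and the assumption that a larger objective value is better are all hypothetical choices for this sketch; the aggregation and rear-end collision behaviors, which require a population, are omitted.

```python
import numpy as np

def search_step(x, objective, visual, step, try_number, rng):
    """One search-behavior move; a larger objective value is assumed to be better."""
    for _ in range(try_number):
        # Candidate state within the field of view (a box of half-width visual)
        candidate = x + visual * rng.uniform(-1.0, 1.0, size=x.shape)
        if objective(candidate) > objective(x):
            direction = candidate - x
            # Move a random fraction of Step toward the better candidate
            return x + direction / np.linalg.norm(direction) * step * rng.random()
    # Random behavior: no better state found after try_number attempts
    return x + visual * rng.uniform(-1.0, 1.0, size=x.shape)

def toy_objective(x):
    # Hypothetical objective with its maximum at (1, 1)
    return -np.sum((x - 1.0) ** 2)

rng = np.random.default_rng(0)
state = np.zeros(2)
best = state.copy()
for _ in range(200):
    state = search_step(state, toy_objective, visual=0.5, step=0.3,
                        try_number=5, rng=rng)
    if toy_objective(state) > toy_objective(best):
        best = state.copy()  # keep the best state seen so far
print(best)
```

In the method described above the state would hold the weight and threshold parameters of the neurons, and the objective would measure how well the network performs with those parameters; the toy objective here only stands in for that measure.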

Further, in S5, the hyperspectral data are classified by a machine learning classification model based on an improved random forest, and the degree of heavy metal pollution is identified. The improved random forest algorithm is as follows:

In the training stage of the decision trees, the classification performance of each decision tree is evaluated, a higher weight is given to decision trees that can accurately classify minority class samples, and the final prediction result is obtained by weighted voting. The prediction result of the random forest is defined as:

$$H(x) = \arg\max_{y} \sum_{t=1}^{T} w_t \, I\big(h_t(x) = y\big), \quad x \in N$$

where $H(x)$ represents the prediction result of the random forest, $\arg\max$ represents the maximum index function, $N$ is the test set, $T$ is the number of decision trees, $I(\cdot)$ is the indicator function, $h_t(x)$ is the prediction result of the $t$-th decision tree, $y$ represents the category, and $w_t$ is the voting weight of the $t$-th decision tree; when the prediction result of the decision tree is true, the value of the indicator function $I(\cdot)$ is 1, and otherwise it is 0;

When the improved random forest algorithm works, a confusion matrix is first constructed, in which TP represents a stable sample judged as stable, FN represents a stable sample judged as unstable, FP represents an unstable sample judged as stable, and TN represents an unstable sample judged as unstable;

The harmonic mean $F_t$ of the precision $P_t$ and the recall $R_t$ with which each decision tree classifies unstable samples is used as the weight of the tree, and the voting weight $w_t$ of each tree is defined as:

$$w_t = F_t = \frac{2 P_t R_t}{P_t + R_t}$$

The larger $w_t$ is, the better the classification performance of the decision tree on minority class samples; the degree of heavy metal pollution is identified by the machine learning classification model based on the improved random forest.
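As a non-limiting illustration, the weighted voting can be sketched in Python as follows. The function names `f1_from_counts` and `weighted_vote`, the confusion counts and the tree predictions are hypothetical; the counts passed to `f1_from_counts` are assumed to treat the minority class as the positive class.

```python
import numpy as np

def f1_from_counts(tp, fp, fn):
    """Harmonic mean of precision and recall, with the minority class as positive."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

def weighted_vote(tree_predictions, weights, classes):
    """Return, per sample, the class with the largest sum of tree weights."""
    tree_predictions = np.asarray(tree_predictions)  # shape (T, n_samples)
    weights = np.asarray(weights, dtype=float)       # shape (T,)
    scores = np.zeros((len(classes), tree_predictions.shape[1]))
    for c_idx, c in enumerate(classes):
        # Indicator I(h_t(x) = y) multiplied by w_t and summed over the trees
        scores[c_idx] = ((tree_predictions == c) * weights[:, None]).sum(axis=0)
    return np.asarray(classes)[scores.argmax(axis=0)]

# Hypothetical validation counts (tp, fp, fn) for three trees
weights = [f1_from_counts(8, 2, 2), f1_from_counts(1, 4, 9), f1_from_counts(2, 5, 8)]
# Hypothetical predictions of the three trees for three samples
preds = [[1, 0, 1],
         [0, 0, 1],
         [0, 1, 1]]
result = weighted_vote(preds, weights, classes=[0, 1])
print(result)
```

For the first sample, an unweighted majority vote would return class 0, while the weighted vote follows the single tree that classifies the minority class well.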

Compared with the prior art, the invention has the following beneficial effects: the algorithm designed by the invention reduces the dimension of the high-dimensional original data and introduces a SMOTE sample generation method based on fast clustering to expand the samples, thereby reducing sample class imbalance; an optimal solution set of the neural network parameters is obtained by using a neural network model optimized by a search operator algorithm; with the improved random forest algorithm, the classification performance of each decision tree is evaluated in the training stage, a higher weight is given to decision trees that can accurately classify minority class samples, and the final prediction result is obtained by weighted voting, which improves the classification performance of the model; as a result, the obtained algorithm model has higher detection precision as well as higher robustness and generalization capability.

Drawings

Fig. 1 is a flowchart illustrating an embodiment of the present invention.

Detailed Description

The following describes the embodiments of the present invention further with reference to the drawings.

Examples: referring to fig. 1, a training method of a machine learning-based data processing and recognition model includes the steps of:

S1, acquiring soil data, and marking a training sample for model training; the collected data is derived from hyperspectral remote sensing images or sensor data, and in this embodiment, hyperspectral remote sensing images are taken as an example for illustration.

S2, performing dimension reduction operation on high-dimension hyperspectral data

The original hyperspectral data have many bands, high dimensionality and a large data volume, and contain redundant data. To reduce the influence of the curse of dimensionality while losing as little information as possible when the dimensionality of the data is reduced, the proposed soil heavy metal pollution identification and classification framework first constrains the spectral dimension of the original hyperspectral remote sensing image, achieving dimension reduction and removal of redundant information by retaining a number of principal components.

The dimension reduction method adopted in this step is principal component analysis, which projects the high-dimensional hyperspectral remote sensing image into a low-dimensional subspace and recombines the high-dimensional feature variables with large correlation coefficients into a set of low-dimensional, linearly independent variables. When principal component analysis processes the original hyperspectral remote sensing image, it mainly comprises the following steps:

S201, data standardization. Standardization brings all variables and values in the hyperspectral data into a similar range; if it is not performed, the results may be biased. The standardized value is calculated as:

$$Z = \frac{\text{variable value} - \text{mean}}{\text{standard deviation}}$$

where $Z$ represents the standardized value; through this step all variables are scaled proportionally.

S202, calculating the covariance matrix. Principal component analysis helps to identify the correlation and dependence among the elements of the hyperspectral data set, and the covariance matrix represents the correlation among the different variables in the data set. Mathematically, the covariance matrix is defined as a $p \times p$ matrix, where, for a hyperspectral remote sensing image, $p$ represents the dimensionality of the image and each element of the matrix represents the covariance of the corresponding pair of variables. For a hyperspectral band scene with variables $a$ and $b$, the covariance matrix is a 2 × 2 matrix, as follows:

$$C = \begin{bmatrix} \operatorname{Cov}(a,a) & \operatorname{Cov}(a,b) \\ \operatorname{Cov}(b,a) & \operatorname{Cov}(b,b) \end{bmatrix}$$

where $C$ represents the covariance matrix; $\operatorname{Cov}(a,a)$ represents the covariance of variable $a$ with itself, i.e., the variance of $a$; $\operatorname{Cov}(a,b)$ represents the covariance between variable $a$ and variable $b$; and $\operatorname{Cov}(b,b)$ represents the variance of variable $b$. In the covariance matrix, a covariance value indicates the degree to which two variables depend on each other: a negative value indicates that the variables tend to vary in opposite directions, and a positive value indicates that they tend to vary in the same direction.

S203, calculating the eigenvectors and eigenvalues:

The eigenvectors and eigenvalues are calculated from the covariance matrix. The principal components are new variables obtained by transforming the original variables; in the process of extracting the principal components, most of the information originally scattered across the original variables is compressed and re-integrated. If the first 5 dimensions of the hyperspectral data are retained, 5 principal components are calculated, such that the 1st principal component stores the maximum possible information, the 2nd principal component stores the maximum of the remaining information, and so on. The eigenvectors and eigenvalues are computed in pairs, i.e., each eigenvector has a corresponding eigenvalue, and the number of eigenvectors that need to be computed is determined by the dimensionality of the data.

The hyperspectral remote sensing image is a 3-dimensional data set, so the number of eigenvectors and eigenvalues is 3. The eigenvectors, obtained from the covariance matrix, indicate the directions of maximum variance in the data; because greater variance in the hyperspectral data carries more information about the data, the eigenvectors are used to identify and calculate the principal components. The eigenvalues, on the other hand, are simply the scalars associated with the respective eigenvectors, so the eigenvectors and eigenvalues together are used to calculate the principal components of the hyperspectral data.

S204, calculating the principal components:

After the eigenvectors and eigenvalues are calculated, the eigenvectors are sorted in descending order of eigenvalue. An eigenvector with a higher eigenvalue is more important: the eigenvector with the highest eigenvalue serves as the first principal component, and so on. Principal components of lower importance can therefore be deleted to reduce the size of the data, and the retained principal components form a feature matrix that contains the important data variables carrying the most information.
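As a non-limiting illustration, one common way to decide how many principal components to retain in S204 is to keep the smallest number whose eigenvalues account for a chosen share of the total variance. The function name `components_to_keep`, the 0.85 ratio and the eigenvalues below are hypothetical; the embodiment itself does not specify a selection rule.

```python
import numpy as np

def components_to_keep(eigvals, min_ratio=0.95):
    """Smallest number of leading components whose variance share reaches min_ratio."""
    eigvals = np.sort(np.asarray(eigvals, dtype=float))[::-1]  # descending order
    cumulative = np.cumsum(eigvals) / eigvals.sum()
    # Index of the first cumulative share that is >= min_ratio, converted to a count
    count = np.searchsorted(cumulative, min_ratio) + 1
    return int(min(count, len(eigvals)))

# Hypothetical eigenvalues of a 5-band covariance matrix
k = components_to_keep([4.0, 3.0, 2.0, 0.5, 0.5], min_ratio=0.85)
print(k)
```

With these eigenvalues the first two components cover 70% of the variance and the first three cover 90%, so three components are retained for a target of 85%.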

S205, reducing the dimension of the data set:

S3, sample expansion:

Because data acquisition often suffers from sample class imbalance, that is, the numbers of samples in different classes differ greatly, classes with few samples are difficult to distinguish effectively during classification. The invention therefore provides a SMOTE sample generation method based on fast clustering, which generates new samples in the minority classes and reduces sample class imbalance.

A SMOTE sample generation method based on rapid clustering is adopted, and comprises the following steps:

；

S4, extracting characteristics of the hyperspectral data:

Features are extracted from the data obtained through the above steps, using a neural network to extract features from the hyperspectral data. Unlike a traditional neural network model, this step improves the neural network optimization algorithm and provides a neural network model optimized by a search operator algorithm to optimize the weight parameters and the threshold parameters of the neurons. The neural network adopted in this step has 2 layers, and the method by which the search operator algorithm searches for the weight parameters and the threshold parameters comprises the following steps:

s401, defining a search operator, and setting search conditions:

；

where $\mathrm{rand}()$ is a random number between 0 and 1.

(1) Search behavior

；

If the forward condition is not met, a state is selected again in the search range, whether the moving condition is met or not is judged, after the set repeated times are repeatedly selected, if the moving condition is still not met, the moving is carried out randomly.

(2) Aggregation behavior

；

where $\delta$ is the crowding factor, and $Y_c$ and $Y_i$ represent the preferred-solution concentration at the central position and at the current position, respectively.

If the above formula holds, the center has a higher preferred-solution concentration and is not crowded, and the operator moves one step toward the center; otherwise, the search behavior is executed.

(3) Rear-end collision behavior

；

If the above formula holds, the position of the other search operator $X_j$ has a higher preferred-solution concentration and is less crowded, and the search operator $X_i$ moves one step in that direction; otherwise, the search behavior is executed.

(4) Random behavior

；

S5, training a classifier machine learning model;

After feature extraction, the invention provides a machine learning classification model based on an improved random forest to classify hyperspectral data and identify the heavy metal pollution degree.

To improve the ability of the random forest to recognize minority class samples, the invention provides an improved random forest algorithm: in the training stage of the decision trees, the classification performance of each decision tree is evaluated, a higher weight is given to decision trees that can accurately classify minority class samples, and the final prediction result is obtained by weighted voting. The prediction result of the random forest is defined as:

$$H(x) = \arg\max_{y} \sum_{t=1}^{T} w_t \, I\big(h_t(x) = y\big), \quad x \in N$$

where $H(x)$ represents the prediction result of the random forest, $\arg\max$ represents the maximum index function, $N$ is the test set, $T$ is the number of decision trees, $I(\cdot)$ is the indicator function, $h_t(x)$ is the prediction result of the $t$-th decision tree, $y$ represents the category, and $w_t$ is the voting weight of the $t$-th decision tree; when the prediction result of the decision tree is true, the value of the indicator function $I(\cdot)$ is 1, and otherwise it is 0.

；

Claims

1. The training method of the data processing and identifying model based on machine learning is characterized by comprising the following steps:

s1, acquiring soil data, and marking a training sample for model training;

S2, reducing the dimension of the data, and recombining the high-dimension characteristic variables to form a group of low-dimension linear independent variables;

S4, adopting a neural network model optimized by a search operator algorithm to optimize the weight parameters and the threshold parameters of the neurons;

s401, defining a search operator, and setting search conditions:

setting $n$ search operators in the search operator population, wherein the individual states of the search operators can be expressed as $X = (x_1, x_2, \ldots, x_n)$, where $x_i$ ($i = 1, \ldots, n$) are the states of the search operators, namely the free variables in the parameter optimization problem; the objective function is represented by $Y = f(X)$; the distance between search operators $X_i$ and $X_j$ is $d_{ij} = \lVert X_i - X_j \rVert$; the search radius of a search operator is Visual; the step length of the search is Step; the crowding factor is $\delta$; at a certain moment, a search operator $X_i$ searches for an arbitrary position $X_v$ within the search radius Visual; if the state at position $X_v$ is better than that at position $X_i$, the operator moves one step forward in the direction of $X_v$, arriving at position $X_{next}$; otherwise, it continues to search other positions within the field of view; the process is expressed as:

$$X_v = X_i + \mathrm{Visual} \times \mathrm{rand}(), \qquad X_{next} = X_i + \frac{X_v - X_i}{\lVert X_v - X_i \rVert} \times \mathrm{Step} \times \mathrm{rand}()$$

where $\mathrm{rand}()$ is a random number between 0 and 1;

(1) Search behavior

Assume that the state of the $i$-th search operator at a certain moment is $X_i$, and that a state $X_j$ is randomly selected within its search range, satisfying the following formula:

$$X_j = X_i + \mathrm{Visual} \times \mathrm{rand}()$$

(2) Aggregation behavior

；

(3) Rear-end collision behavior

；

(4) Random behavior

；

Acquiring an optimal solution set of the neural network parameters through searching behaviors, aggregating behaviors, rear-end collision behaviors and random behaviors;

S5, training a classifier machine learning model;

S6, applying the trained model to identify the degree of soil heavy metal pollution: the model is trained with the labeled samples, and after model training is completed the data to be detected are detected and identified;

S3, sample expansion adopts a SMOTE sample generation method based on rapid clustering, and the method comprises the following steps:

S301, obtaining the $k$ neighbor samples of each minority class sample by calculating the Euclidean distances from the sample to the other minority class samples, and generating a new minority class sample by linear interpolation between the sample and a randomly selected neighbor sample, as shown in the following formula:

$$x_{new} = x_i + \mathrm{rand}(0,1) \times (\hat{x}_i - x_i)$$

where $\hat{x}_i$ represents one sample among the $k$ neighbors, $\mathrm{rand}(0,1)$ is a random number, $x_i$ is the input sample, and $x_{new}$ is the newly generated sample;

；

where the threshold is determined by a proportionality coefficient, which is set manually and has a value range greater than 0 and less than 1, together with the minimum distance and the maximum distance between categories;

；

2. The training method of a machine learning-based data processing and recognition model according to claim 1, wherein the dimension reduction method adopted in S2 is a principal component analysis method, comprising the steps of:

；

S202, calculating the covariance matrix, wherein the covariance matrix is mathematically defined as a $p \times p$ matrix, $p$ representing the dimensionality of the acquired data and each element of the matrix representing the covariance of the corresponding pair of variables; for a hyperspectral band scene with variables $a$ and $b$, the covariance matrix is a 2 × 2 matrix, as follows:

$$C = \begin{bmatrix} \operatorname{Cov}(a,a) & \operatorname{Cov}(a,b) \\ \operatorname{Cov}(b,a) & \operatorname{Cov}(b,b) \end{bmatrix}$$

S203, calculating a feature vector and a feature value:

S204, calculating main components:

s205, reducing the dimension of the data set:

The raw data are rearranged using the final principal components, which represent the largest and most important information of the data set; to replace the original data set with the newly formed principal components, the feature matrix is simply multiplied with the transpose of the original data, the resulting data being the dimension-reduced data;

$$H(x) = \arg\max_{y} \sum_{t=1}^{T} w_t \, I\big(h_t(x) = y\big), \quad x \in N$$

where $H(x)$ represents the prediction result of the random forest, $\arg\max$ represents the maximum index function, $N$ is the test set, $T$ is the number of decision trees, $I(\cdot)$ is the indicator function, $h_t(x)$ is the prediction result of the $t$-th decision tree, $y$ represents the category, and $w_t$ is the voting weight of the $t$-th decision tree; when the prediction result of the decision tree is true, the value of the indicator function $I(\cdot)$ is 1, and otherwise it is 0;

；