CN113269254A - Coal and gangue identification method for particle swarm optimization XGboost algorithm - Google Patents

Coal and gangue identification method for particle swarm optimization XGboost algorithm

Info

Publication number
CN113269254A
CN113269254A (application number CN202110580411.2A)
Authority
CN
China
Prior art keywords
coal
gangue
model
xgboost
tree
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN202110580411.2A
Other languages
Chinese (zh)
Inventor
周孟然
闫鹏程
胡锋
来文豪
卞凯
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Anhui University of Science and Technology
Original Assignee
Anhui University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Anhui University of Science and Technology filed Critical Anhui University of Science and Technology
Priority to CN202110580411.2A priority Critical patent/CN113269254A/en
Publication of CN113269254A publication Critical patent/CN113269254A/en
Withdrawn legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/004Artificial life, i.e. computing arrangements simulating life
    • G06N3/006Artificial life, i.e. computing arrangements simulating life based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/0002Inspection of images, e.g. flaw detection
    • G06T7/0004Industrial image inspection

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Quality & Reliability (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a coal and gangue identification method based on a particle swarm-optimized XGBoost algorithm, belonging to the field of coal and gangue identification and comprising the following steps: collecting multispectral image information of coal and gangue and preprocessing it; dividing the collected coal and gangue multispectral images into samples, randomly splitting the preprocessed images into independent training and test sets at a ratio of 7:3, and assigning labels to the samples; extracting features from the coal and gangue multispectral images in the training and test sets; constructing a coal and gangue identification model based on the XGBoost algorithm from the extracted multispectral image features, training the model on the training set, and optimizing the parameters of the XGBoost algorithm with a particle swarm optimization algorithm; and testing the classification accuracy of the model on coal and gangue with the test set to verify its performance. The XGBoost model adopted by the method offers high identification accuracy and strong interpretability, is not prone to overfitting, and achieves a good classification effect.

Description

Coal and gangue identification method for particle swarm optimization XGboost algorithm
Technical Field
The invention belongs to the technical field of coal and gangue identification, and particularly relates to a coal and gangue identification method based on a particle swarm-optimized XGBoost algorithm.
Background
Coal has long been the primary energy source in China. Coal that has not been processed after mining and excavation is called raw coal, and raw coal contains a large amount of gangue. Gangue has a high heavy-metal content and a low calorific value; when mixed with coal it lowers the calorific value and quality of the coal and pollutes the environment during combustion. As China develops clean coal technology, the separation of coal and gangue is an important step.
Existing methods for separating coal and gangue mainly include manual gangue picking, jigging, flotation, selective crushing, dense-medium separation, and ray-detection-based separation, but these methods generally suffer from low identification accuracy, large footprint, high investment cost, or serious environmental pollution. This application provides an identification method based on multispectral imaging and a particle swarm-optimized XGBoost algorithm. XGBoost is a gradient boosting ensemble learning algorithm built on gradient boosted decision trees; its principle is to integrate multiple weak classifiers and obtain a more accurate classification through repeated iterations. XGBoost has many advantages: it is fast, effective, able to handle large-scale data, uses second-order derivatives for a more accurate loss, and supports custom loss functions. The particle swarm optimization algorithm is an evolutionary computing technique whose basic idea is to find the optimal solution through cooperation and information sharing among individuals in a population. It approaches the optimal solution quickly, is simple and easy to implement, requires few parameter settings, and can effectively optimize system parameters. Constructing the coal and gangue identification model with a particle swarm-optimized XGBoost algorithm is therefore an effective identification method.
Apart from manual gangue picking, automatic gangue (coal) separation technology can be divided into wet and dry separation according to whether water resources are used. Wet separation consumes a large amount of water, and the resulting coal slime pollution is difficult to treat, which runs counter to the concept of producing clean coal. Gamma rays, X-rays, and other rays carry a certain amount of radiation and can harm human health, while ordinary image-based gangue identification is strongly affected by lighting and other factors, so its accuracy is not high. The coal and gangue identification method based on the particle swarm-optimized XGBoost algorithm achieves a high identification rate at high speed and can make up for the shortcomings of existing coal and gangue identification methods.
Disclosure of Invention
In order to overcome the defects of the prior art, the invention provides a coal and gangue identification method based on a particle swarm-optimized XGBoost algorithm.
In order to achieve the above purpose, the invention provides the following technical scheme:
a coal and gangue identification method for a particle swarm optimization XGboost algorithm comprises the following steps:
collecting multispectral image information of coal and gangue, and preprocessing the multispectral image information;
carrying out sample division on the collected coal and gangue multispectral images, randomly dividing the preprocessed coal and gangue multispectral images into independent training sets and test sets according to a ratio of 7:3, and setting labels for the samples, wherein the label of the coal is 1, and the label of the gangue is 0;
performing feature extraction on the coal and gangue multispectral images in the training set and the testing set;
constructing a coal and gangue identification model based on an XGboost algorithm by using the extracted multispectral image characteristics, training the coal and gangue identification model on a training set, and optimizing parameters of the XGboost algorithm through a particle swarm optimization algorithm;
and testing the classification accuracy of the coal and gangue identification model on coal and gangue through the test set, and verifying the performance of the model.
Preferably, a multispectral image acquisition system is used for acquiring multispectral images of a plurality of samples of coal and gangue to obtain multispectral images of the coal and the gangue.
Preferably, training the coal and gangue identification model comprises:

for a given training sample set $D=\{(x_i,y_i)\}$ with $N$ samples and $M$ features ($i=1,2,\dots,N$, $x_i\in\mathbb{R}^M$, $y_i\in\mathbb{R}$), XGBoost training finally yields an ensemble model formed by the addition of $K$ CART decision trees:

$$\hat{y}_i=\sum_{k=1}^{K}f_k(x_i),\qquad f_k\in F$$

where $\hat{y}_i$ is the output of the XGBoost model and $F=\{f(x)=w_{q(x)}\}$ ($q:\mathbb{R}^M\to\{1,\dots,T\}$, $w\in\mathbb{R}^T$) is the set of all CART decision trees in the model, $f$ denoting a specific CART tree; each decision tree function $f_k$ corresponds to a specific tree structure $q$ and a corresponding leaf node weight vector $w$; for one sample, the XGBoost model obtains the final predicted value by mapping the sample to the corresponding leaf node on each decision tree and then summing the weights of the $K$ leaf nodes corresponding to the sample; the machine learning model defines a loss function to measure the deviation between the predicted value and the true value of the model;

the loss function of the XGBoost model is:

$$L=\sum_{i=1}^{N}l\left(y_i,\hat{y}_i\right)+\sum_{k=1}^{K}\Omega\left(f_k\right)$$

the formula comprises two parts, the first part being the training loss and the second part being the regularization term;

in the XGBoost algorithm, training proceeds by iteratively adding tree models, that is, a CART decision tree function $f$ is added at each step of training so that the loss function is further reduced; after several iterations, an optimal CART tree $f_t$, i.e. the CART tree that minimizes the loss function, is added at step $t$, and the loss function becomes:

$$L^{(t)}=\sum_{i=1}^{N}l\left(y_i,\hat{y}_i^{(t-1)}+f_t(x_i)\right)+\Omega\left(f_t\right)$$

to select a tree structure $f_t$ that maximizes the reduction of the loss function $L^{(t)}$, a second-order Taylor expansion is applied to the above equation:

$$L^{(t)}\approx\sum_{i=1}^{N}\left[l\left(y_i,\hat{y}_i^{(t-1)}\right)+g_i f_t(x_i)+\tfrac{1}{2}h_i f_t^{2}(x_i)\right]+\Omega\left(f_t\right)$$

where $g_i=\partial_{\hat{y}_i^{(t-1)}}\,l\left(y_i,\hat{y}_i^{(t-1)}\right)$ and $h_i=\partial^{2}_{\hat{y}_i^{(t-1)}}\,l\left(y_i,\hat{y}_i^{(t-1)}\right)$ are respectively the first and second derivatives of the loss function at the expansion point $\hat{y}_i^{(t-1)}$;

the regularization term contained in the loss function can be used to control the complexity of the trained model, and is defined as follows:

$$\Omega(f)=\gamma T+\tfrac{1}{2}\lambda\sum_{j=1}^{T}w_j^{2}$$

where $T$ represents the number of leaf nodes and $w$ represents the leaf weights;

in the expanded form, the term $l\left(y_i,\hat{y}_i^{(t-1)}\right)$, formed by the outputs $\hat{y}_i^{(t-1)}$ of all CART tree functions obtained before step $t$ and the sample labels, is a constant; because the reduction of the loss function is unrelated to constant terms, the constant terms are removed, and combining the regularization term expression (with $I_j$ denoting the set of samples assigned to leaf $j$, $G_j=\sum_{i\in I_j}g_i$ and $H_j=\sum_{i\in I_j}h_i$) further simplifies the loss function to:

$$\tilde{L}^{(t)}=\sum_{j=1}^{T}\left[G_j w_j+\tfrac{1}{2}\left(H_j+\lambda\right)w_j^{2}\right]+\gamma T$$

taking the derivative of this formula with respect to $w_j$ and setting it to 0, the optimal leaf node weight is:

$$w_j^{*}=-\frac{G_j}{H_j+\lambda}$$

and the optimal loss function at this point is:

$$\tilde{L}^{(t)}(q)=-\tfrac{1}{2}\sum_{j=1}^{T}\frac{G_j^{2}}{H_j+\lambda}+\gamma T$$

which is used to measure the quality of any tree structure: the smaller it is, the better the tree structure, since the loss function of the model can then be reduced more;

the XGBoost training process adds CART functions iteratively and finally obtains the XGBoost model

$$\hat{y}_i=\sum_{k=1}^{K}f_k(x_i)$$

the iteration terminates when continuing to add tree models improves the accuracy of the model by less than $s$; each newly added function $f_t$ is obtained as follows: starting from a single leaf node, one branch is added at a time and the tree-growing scheme with the smallest loss function value is selected; this process is repeated until the maximum depth of the tree reaches a specified value or the minimum sample weight sum falls below a threshold, at which point splitting stops.
Preferably, parameters of the XGBoost algorithm such as the learning rate (learning_rate), the maximum tree depth (max_depth) and the minimum leaf weight (min_child_weight) are optimized by the particle swarm optimization algorithm, which specifically comprises the following steps:

initializing the particle swarm;

determining a fitness function according to the objective function of the optimization problem and calculating the fitness of each particle in the particle swarm;

calculating the individual extreme value (personal best) of each particle in the particle swarm, comparing the current fitness value of each particle with its individual extreme value, and replacing the individual extreme value with the current fitness value if the current fitness value is better;

comparing the current fitness values of all the particles in the particle swarm with the global extreme value (global best), and replacing the global extreme value with the current fitness value if the current fitness value is better;

updating the velocity and position of each particle by the formulas

$$\nu_{id}(t+1)=\omega\,\nu_{id}(t)+c_1 r_1\left[p_{id}(t)-x_{id}(t)\right]+c_2 r_2\left[p_{gd}(t)-x_{id}(t)\right]$$

and

$$x_{id}(t+1)=x_{id}(t)+\nu_{id}(t+1);$$

judging whether the iteration termination condition is met: if it is met, the optimization ends; if not, returning to the fitness calculation step and continuing the optimization;

iteration termination condition: the iteration stops when the number of iterations reaches the set maximum number of iterations or the set minimum error criterion is met; otherwise the iteration continues until the termination condition is satisfied.
The coal and gangue identification method based on the particle swarm-optimized XGBoost algorithm provided by the invention has the following beneficial effects:

The invention acquires images of coal and gangue with multispectral imaging technology and, through feature extraction and sample division, establishes a coal and gangue identification model based on a particle swarm-optimized XGBoost algorithm, so that coal and gangue can be identified quickly and accurately. The XGBoost model is highly interpretable and not prone to overfitting; tuning it with the particle swarm algorithm adds little complexity to the original XGBoost algorithm while improving the stability of the model and effectively improving its identification accuracy.

Combining the XGBoost algorithm with the particle swarm optimization algorithm makes it possible to find the optimal parameters quickly and build the optimal model, which improves the running speed and accuracy of the model, achieves a good classification effect, and gives the method an advantage over other machine learning algorithms.
Drawings
In order to more clearly illustrate the embodiments of the present invention and the design thereof, the drawings required for the embodiments will be briefly described below. The drawings in the following description are only some embodiments of the invention and it will be clear to a person skilled in the art that other drawings can be derived from them without inventive effort.
Fig. 1 is a flowchart of a coal and gangue identification method of a particle swarm optimization XGBoost algorithm according to embodiment 1 of the present invention;
FIG. 2 is a flow chart of XGboost parameter optimization based on particle swarm optimization.
Detailed Description
In order that those skilled in the art will better understand the technical solutions of the present invention and can practice the same, the present invention will be described in detail with reference to the accompanying drawings and specific examples. The following examples are only for illustrating the technical solutions of the present invention more clearly, and the protection scope of the present invention is not limited thereby.
Example 1
The invention provides a coal and gangue identification method for a particle swarm optimization XGboost algorithm, which comprises the following steps of:
step 1, acquiring multispectral image information of coal and gangue by using a multispectral image acquisition system to obtain a multispectral image data set of the coal and the gangue, and preprocessing the data; the multispectral image acquisition system selects a real-time multispectral Mosai surface camera of Shanghai five-bell photo-electronic technology Limited to acquire multispectral images of a plurality of samples of coal and gangue, and the multispectral images of the coal and the gangue are obtained, wherein the pixels of the multispectral images are 2048 x 1088.
Step 2: divide the collected coal and gangue multispectral images into samples, randomly split the preprocessed images into an independent training set and test set at a ratio of 7:3, and assign labels to the samples, with coal labelled 1 and gangue labelled 0.
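As a point of reference only, the 7:3 random split and labelling described in this step can be sketched in Python as below; the array names (`coal_images`, `gangue_images`) and the use of scikit-learn are illustrative assumptions rather than part of the claimed method.

```python
# Illustrative sketch of step 2: random 7:3 split with labels coal = 1, gangue = 0.
# `coal_images` and `gangue_images` are assumed NumPy arrays of preprocessed
# multispectral samples (one sample per row); the names are hypothetical.
import numpy as np
from sklearn.model_selection import train_test_split

X = np.concatenate([coal_images, gangue_images], axis=0)
y = np.concatenate([np.ones(len(coal_images)),       # coal   -> label 1
                    np.zeros(len(gangue_images))])    # gangue -> label 0

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y)  # 7:3 random split
```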
Step 3: extract features from the collected coal and gangue multispectral images.
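The description does not fix a particular feature set for this step; one common choice for multispectral data is per-band statistics, sketched below purely as an assumed example (the use of SciPy and the function name `band_statistics` are not part of the original description).

```python
# Assumed per-band statistical features (mean, standard deviation, skewness)
# for a multispectral image of shape (height, width, n_bands).
import numpy as np
from scipy.stats import skew

def band_statistics(image):
    bands = image.reshape(-1, image.shape[-1])   # (n_pixels, n_bands)
    return np.concatenate([bands.mean(axis=0),
                           bands.std(axis=0),
                           skew(bands, axis=0)])

# Applied to the hypothetical arrays from the previous sketch.
train_features = np.stack([band_statistics(img) for img in X_train])
test_features  = np.stack([band_statistics(img) for img in X_test])
```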
Step 4: construct a coal and gangue identification model based on the XGBoost algorithm using the extracted multispectral image features, train the model on the training set, and optimize the parameters of the XGBoost algorithm with the particle swarm optimization algorithm. The particle swarm optimization algorithm is a global stochastic search algorithm based on swarm intelligence; it converges quickly, requires few parameters, and is easy to implement.
Specifically, in this embodiment, training the coal and gangue identification model includes:

For a given training sample set $D=\{(x_i,y_i)\}$ with $N$ samples and $M$ features ($i=1,2,\dots,N$, $x_i\in\mathbb{R}^M$, $y_i\in\mathbb{R}$), XGBoost training finally yields an ensemble model formed by the addition of $K$ CART decision trees:

$$\hat{y}_i=\sum_{k=1}^{K}f_k(x_i),\qquad f_k\in F$$

where $\hat{y}_i$ is the output of the XGBoost model and $F=\{f(x)=w_{q(x)}\}$ ($q:\mathbb{R}^M\to\{1,\dots,T\}$, $w\in\mathbb{R}^T$) is the set of all CART decision trees in the model, $f$ denoting a specific CART tree. Each decision tree function $f_k$ corresponds to a specific tree structure $q$ and a corresponding leaf node weight vector $w$. For one sample, the XGBoost model obtains the final predicted value by mapping the sample to the corresponding leaf node on each decision tree and then summing the weights of the $K$ leaf nodes corresponding to the sample. The machine learning model defines a loss function to measure the deviation between the predicted value and the true value of the model; during training, the objective is to make the value of the loss function as small as possible.

The loss function of the XGBoost model is:

$$L=\sum_{i=1}^{N}l\left(y_i,\hat{y}_i\right)+\sum_{k=1}^{K}\Omega\left(f_k\right)$$

The formula contains two parts, the first part being the training loss and the second part being the regularization term.

In the XGBoost algorithm, training proceeds by iteratively adding tree models, that is, a CART decision tree function $f$ is added at each step of training so that the loss function is further reduced. After several iterations, an optimal CART tree $f_t$, i.e. the CART tree that minimizes the loss function, is added at step $t$, and the loss function becomes:

$$L^{(t)}=\sum_{i=1}^{N}l\left(y_i,\hat{y}_i^{(t-1)}+f_t(x_i)\right)+\Omega\left(f_t\right)$$

To select a tree structure $f_t$ that maximizes the reduction of the loss function $L^{(t)}$, a second-order Taylor expansion is applied to the above equation:

$$L^{(t)}\approx\sum_{i=1}^{N}\left[l\left(y_i,\hat{y}_i^{(t-1)}\right)+g_i f_t(x_i)+\tfrac{1}{2}h_i f_t^{2}(x_i)\right]+\Omega\left(f_t\right)$$

where $g_i=\partial_{\hat{y}_i^{(t-1)}}\,l\left(y_i,\hat{y}_i^{(t-1)}\right)$ and $h_i=\partial^{2}_{\hat{y}_i^{(t-1)}}\,l\left(y_i,\hat{y}_i^{(t-1)}\right)$ are respectively the first and second derivatives of the loss function at the expansion point $\hat{y}_i^{(t-1)}$.

The regularization term contained in the loss function is used to control the complexity of the trained model, so that the model does not become excessively complex while maintaining accuracy on the training samples, thereby avoiding overfitting and enhancing generalization; it is defined as follows:

$$\Omega(f)=\gamma T+\tfrac{1}{2}\lambda\sum_{j=1}^{T}w_j^{2}$$

where $T$ represents the number of leaf nodes and $w$ represents the leaf weights.

In the expanded form, the term $l\left(y_i,\hat{y}_i^{(t-1)}\right)$, formed by the outputs $\hat{y}_i^{(t-1)}$ of all CART tree functions obtained before step $t$ and the sample labels, is a constant. Because the reduction of the loss function is unrelated to constant terms, the constant terms are removed, and combining the regularization term expression (with $I_j$ denoting the set of samples assigned to leaf $j$, $G_j=\sum_{i\in I_j}g_i$ and $H_j=\sum_{i\in I_j}h_i$) further simplifies the loss function to:

$$\tilde{L}^{(t)}=\sum_{j=1}^{T}\left[G_j w_j+\tfrac{1}{2}\left(H_j+\lambda\right)w_j^{2}\right]+\gamma T$$

Taking the derivative of this formula with respect to $w_j$ and setting it to 0, the optimal leaf node weight is:

$$w_j^{*}=-\frac{G_j}{H_j+\lambda}$$

and the optimal loss function at this point is:

$$\tilde{L}^{(t)}(q)=-\tfrac{1}{2}\sum_{j=1}^{T}\frac{G_j^{2}}{H_j+\lambda}+\gamma T$$

This quantity is used to measure the quality of any tree structure: the smaller it is, the better the tree structure, since the loss function of the model can then be reduced more.

The XGBoost training process adds CART functions iteratively and finally obtains the XGBoost model

$$\hat{y}_i=\sum_{k=1}^{K}f_k(x_i)$$

The iteration terminates when continuing to add tree models improves the accuracy of the model by less than $s$. Each newly added function $f_t$ is obtained as follows: starting from a single leaf node, one branch is added at a time and the tree-growing scheme with the smallest loss function value is selected; this process is repeated until the maximum depth of the tree reaches a specified value or the minimum sample weight sum falls below a threshold, at which point splitting stops.
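The closed-form quantities derived above translate directly into a few lines of code. The sketch below is only a numeric illustration of $w_j^{*}=-G_j/(H_j+\lambda)$ and the structure score, with made-up values for $G_j$, $H_j$, $\lambda$ and $\gamma$; it is not part of the XGBoost library itself.

```python
# Numeric illustration of the derived formulas (values are made up).
def leaf_weight(G, H, reg_lambda):
    """Optimal leaf weight w* = -G / (H + lambda)."""
    return -G / (H + reg_lambda)

def structure_score(G_list, H_list, reg_lambda, gamma):
    """Optimal loss -0.5 * sum(G^2 / (H + lambda)) + gamma * T; smaller is better."""
    score = sum(-0.5 * G * G / (H + reg_lambda) for G, H in zip(G_list, H_list))
    return score + gamma * len(G_list)

print(leaf_weight(G=2.4, H=5.0, reg_lambda=1.0))            # optimal weight of one leaf
print(structure_score([2.4, -1.1], [5.0, 3.0], 1.0, 0.1))   # quality of a 2-leaf tree
```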
There are many parameters in the XGBoost algorithm, and parameter optimization is necessary to obtain good classification results.

Specifically, XGBoost parameters fall into three types: general parameters, booster parameters, and learning task parameters. The booster parameters are the main parameters during training on the data samples, and adjusting them has the greatest influence on the accuracy of the model.
Table 1: Booster parameter information for the XGBoost algorithm (the table itself is rendered as an image in the original publication; it lists the booster parameters discussed below, including learning_rate, max_depth, min_child_weight, gamma, subsample, colsample_bytree, lambda and alpha).
According to extensive XGBoost parameter tuning experience and engineering practice, a learning_rate that is too large prevents the algorithm from converging, while one that is too small leads to overfitting. If max_depth is too large, the model is more likely to fall into a local optimum and an overfitting phenomenon occurs. min_child_weight is the minimum sample weight sum in a child node; if it is too small the algorithm overfits, and if it is too large the classification performance of the algorithm on linearly inseparable data decreases. The larger the value of gamma, the more conservative the algorithm; this parameter is closely related to the loss function and needs to be adjusted. Decreasing subsample makes the algorithm more conservative and avoids overfitting, but setting it too small may cause underfitting. lambda controls the regularization part of XGBoost and helps reduce overfitting. alpha can be applied in very high-dimensional cases to make the algorithm faster. Therefore, in the present application, the parameters learning_rate, max_depth, min_child_weight, gamma, subsample, colsample_bytree, lambda and alpha are optimized, and the other parameters are left at their default values.
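For orientation, the booster parameters discussed above map onto the constructor of `xgboost.XGBClassifier` as sketched below. The numeric values are placeholders; in the described method they are the quantities tuned by the particle swarm optimizer, and `train_features`/`y_train` are the hypothetical arrays from the earlier sketches.

```python
# Sketch of the XGBoost coal/gangue classifier with the booster parameters exposed.
from xgboost import XGBClassifier

clf = XGBClassifier(
    learning_rate=0.1,        # step size shrinkage
    max_depth=6,              # maximum tree depth
    min_child_weight=1,       # minimum sample weight sum in a child node
    gamma=0.0,                # minimum loss reduction required for a split
    subsample=0.8,            # row subsampling ratio
    colsample_bytree=0.8,     # column subsampling ratio per tree
    reg_lambda=1.0,           # L2 regularization (lambda)
    reg_alpha=0.0,            # L1 regularization (alpha)
    objective="binary:logistic",
)
clf.fit(train_features, y_train)
```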
The XGBoost algorithm has many parameters, and adjusting them greatly influences the accuracy of the model, so in this embodiment the learning rate (learning_rate), the maximum tree depth (max_depth) and the minimum leaf weight (min_child_weight) of the XGBoost algorithm are optimized by the particle swarm optimization algorithm. As shown in fig. 2, this specifically includes the following steps:

Step 4.1: initialize the particle swarm.

Step 4.2: determine a fitness function according to the objective function of the optimization problem and calculate the fitness of each particle in the particle swarm.

Step 4.3: calculate the individual extreme value (personal best) of each particle in the particle swarm, compare the current fitness value of each particle with its individual extreme value, and replace the individual extreme value with the current fitness value if the current fitness value is better.

Step 4.4: compare the current fitness values of all the particles in the particle swarm with the global extreme value (global best), and replace the global extreme value with the current fitness value if the current fitness value is better.

Step 4.5: update the velocity and position of each particle by the formulas

$$\nu_{id}(t+1)=\omega\,\nu_{id}(t)+c_1 r_1\left[p_{id}(t)-x_{id}(t)\right]+c_2 r_2\left[p_{gd}(t)-x_{id}(t)\right]$$

and

$$x_{id}(t+1)=x_{id}(t)+\nu_{id}(t+1).$$

Step 4.6: judge whether the iteration termination condition is met; if it is met, the optimization ends, otherwise return to step 4.2 and continue the optimization.

Iteration termination condition: the iteration stops when the number of iterations reaches the set maximum number of iterations or the set minimum error criterion is met; otherwise the iteration continues until the termination condition is satisfied.
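A compact rendering of steps 4.1 to 4.6 is sketched below for three of the tuned parameters. The parameter bounds, swarm size, and the inertia and acceleration constants (omega, c1, c2) are assumed values chosen only for illustration; cross-validated accuracy on the training set is used as the fitness, which is one reasonable choice but is not prescribed by this description.

```python
# Illustrative PSO loop tuning (learning_rate, max_depth, min_child_weight).
# `train_features` and `y_train` are the hypothetical arrays from the earlier sketches.
import numpy as np
from sklearn.model_selection import cross_val_score
from xgboost import XGBClassifier

bounds = np.array([[0.01, 0.3],    # learning_rate
                   [2.0, 10.0],    # max_depth
                   [1.0, 10.0]])   # min_child_weight
n_particles, n_iter = 20, 30
w, c1, c2 = 0.7, 1.5, 1.5          # inertia and acceleration constants (assumed)

def fitness(p):
    model = XGBClassifier(learning_rate=p[0], max_depth=int(round(p[1])),
                          min_child_weight=p[2])
    return cross_val_score(model, train_features, y_train, cv=5).mean()

pos = np.random.uniform(bounds[:, 0], bounds[:, 1], size=(n_particles, 3))
vel = np.zeros_like(pos)
pbest, pbest_fit = pos.copy(), np.array([fitness(p) for p in pos])
gbest = pbest[pbest_fit.argmax()]

for _ in range(n_iter):
    r1 = np.random.rand(n_particles, 3)
    r2 = np.random.rand(n_particles, 3)
    vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)   # velocity update
    pos = np.clip(pos + vel, bounds[:, 0], bounds[:, 1])                # position update
    fit = np.array([fitness(p) for p in pos])
    improved = fit > pbest_fit                                          # update personal bests
    pbest[improved], pbest_fit[improved] = pos[improved], fit[improved]
    gbest = pbest[pbest_fit.argmax()]                                   # update global best
```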
Step 5: test the classification accuracy of the coal and gangue identification model on coal and gangue using the test set and verify the performance of the model.

Specifically, the performance of the coal and gangue identification model is reflected in its identification accuracy for coal and gangue: high identification accuracy indicates good performance, and low identification accuracy indicates poor performance. Coal and gangue identification is therefore carried out as part of verifying the model performance.
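As a final illustrative step, the tuned parameters can be refit on the full training set and scored on the held-out test set; again, `gbest`, `train_features`, `test_features`, `y_train` and `y_test` refer to the hypothetical variables from the sketches above.

```python
# Sketch of step 5: accuracy of the tuned model on the independent test set.
from sklearn.metrics import accuracy_score
from xgboost import XGBClassifier

best = XGBClassifier(learning_rate=gbest[0], max_depth=int(round(gbest[1])),
                     min_child_weight=gbest[2])
best.fit(train_features, y_train)
print("test accuracy:", accuracy_score(y_test, best.predict(test_features)))
```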
The identification method provided by this embodiment has the following advantages:

1. Effective identification of coal and gangue is an important prerequisite for coal and gangue separation. Coal and gangue identification based on the particle swarm-optimized XGBoost algorithm provides a method for accurate identification of coal and gangue and strongly supports the development of clean coal technology.

2. The XGBoost algorithm is a gradient boosting ensemble learning algorithm based on gradient boosted decision trees. A strong classifier is formed by integrating CART trees; a second-order Taylor expansion of the loss function exploits the information in both the first- and second-order derivatives, and a regularization term is added to reduce the complexity of the model and prevent overfitting.

3. The XGBoost algorithm has many parameters, and the booster parameters are particularly important for the accuracy of the model; optimizing them with the particle swarm optimization algorithm yields a good classification effect.
The above-mentioned embodiments are only preferred embodiments of the present invention, and the scope of the present invention is not limited thereto, and any simple modifications or equivalent substitutions of the technical solutions that can be obviously obtained by those skilled in the art within the technical scope of the present invention are included in the scope of the present invention.

Claims (4)

1. A coal and gangue identification method for a particle swarm optimization XGboost algorithm is characterized by comprising the following steps:
collecting multispectral image information of coal and gangue, and preprocessing the multispectral image information;
carrying out sample division on the collected coal and gangue multispectral images, randomly dividing the preprocessed coal and gangue multispectral images into independent training sets and test sets according to a ratio of 7:3, and setting labels for the samples, wherein the label of the coal is 1, and the label of the gangue is 0;
performing feature extraction on the coal and gangue multispectral images in the training set and the testing set;
constructing a coal and gangue identification model based on an XGboost algorithm by using the extracted multispectral image characteristics, training the coal and gangue identification model on a training set, and optimizing parameters of the XGboost algorithm through a particle swarm optimization algorithm;
and testing the classification accuracy of the coal and gangue identification model on coal and gangue through the test set, and verifying the performance of the model.
2. The coal and gangue identification method based on the particle swarm-optimized XGBoost algorithm according to claim 1, characterized in that a multispectral image acquisition system is used to acquire multispectral images of a plurality of coal and gangue samples, obtaining the multispectral images of the coal and the gangue.
3. The coal and gangue identification method based on the particle swarm-optimized XGBoost algorithm according to claim 1, characterized in that training the coal and gangue identification model comprises:

for a given training sample set $D=\{(x_i,y_i)\}$ with $N$ samples and $M$ features ($i=1,2,\dots,N$, $x_i\in\mathbb{R}^M$, $y_i\in\mathbb{R}$), XGBoost training finally yields an ensemble model formed by the addition of $K$ CART decision trees:

$$\hat{y}_i=\sum_{k=1}^{K}f_k(x_i),\qquad f_k\in F$$

where $\hat{y}_i$ is the output of the XGBoost model and $F=\{f(x)=w_{q(x)}\}$ ($q:\mathbb{R}^M\to\{1,\dots,T\}$, $w\in\mathbb{R}^T$) is the set of all CART decision trees in the model, $f$ denoting a specific CART tree; each decision tree function $f_k$ corresponds to a specific tree structure $q$ and a corresponding leaf node weight vector $w$; for one sample, the XGBoost model obtains the final predicted value by mapping the sample to the corresponding leaf node on each decision tree and then summing the weights of the $K$ leaf nodes corresponding to the sample; the machine learning model defines a loss function to measure the deviation between the predicted value and the true value of the model;

the loss function of the XGBoost model is:

$$L=\sum_{i=1}^{N}l\left(y_i,\hat{y}_i\right)+\sum_{k=1}^{K}\Omega\left(f_k\right)$$

the formula comprises two parts, the first part being the training loss and the second part being the regularization term;

in the XGBoost algorithm, training proceeds by iteratively adding tree models, that is, a CART decision tree function $f$ is added at each step of training so that the loss function is further reduced; after several iterations, an optimal CART tree $f_t$, i.e. the CART tree that minimizes the loss function, is added at step $t$, and the loss function becomes:

$$L^{(t)}=\sum_{i=1}^{N}l\left(y_i,\hat{y}_i^{(t-1)}+f_t(x_i)\right)+\Omega\left(f_t\right)$$

to select a tree structure $f_t$ that maximizes the reduction of the loss function $L^{(t)}$, a second-order Taylor expansion is applied to the above equation:

$$L^{(t)}\approx\sum_{i=1}^{N}\left[l\left(y_i,\hat{y}_i^{(t-1)}\right)+g_i f_t(x_i)+\tfrac{1}{2}h_i f_t^{2}(x_i)\right]+\Omega\left(f_t\right)$$

where $g_i=\partial_{\hat{y}_i^{(t-1)}}\,l\left(y_i,\hat{y}_i^{(t-1)}\right)$ and $h_i=\partial^{2}_{\hat{y}_i^{(t-1)}}\,l\left(y_i,\hat{y}_i^{(t-1)}\right)$ are respectively the first and second derivatives of the loss function at the expansion point $\hat{y}_i^{(t-1)}$;

the regularization term contained in the loss function can be used to control the complexity of the trained model, and is defined as follows:

$$\Omega(f)=\gamma T+\tfrac{1}{2}\lambda\sum_{j=1}^{T}w_j^{2}$$

where $T$ represents the number of leaf nodes and $w$ represents the leaf weights;

in the expanded form, the term $l\left(y_i,\hat{y}_i^{(t-1)}\right)$, formed by the outputs $\hat{y}_i^{(t-1)}$ of all CART tree functions obtained before step $t$ and the sample labels, is a constant; because the reduction of the loss function is unrelated to constant terms, the constant terms are removed, and combining the regularization term expression (with $I_j$ denoting the set of samples assigned to leaf $j$, $G_j=\sum_{i\in I_j}g_i$ and $H_j=\sum_{i\in I_j}h_i$) further simplifies the loss function to:

$$\tilde{L}^{(t)}=\sum_{j=1}^{T}\left[G_j w_j+\tfrac{1}{2}\left(H_j+\lambda\right)w_j^{2}\right]+\gamma T$$

taking the derivative of this formula with respect to $w_j$ and setting it to 0, the optimal leaf node weight is:

$$w_j^{*}=-\frac{G_j}{H_j+\lambda}$$

and the optimal loss function at this point is:

$$\tilde{L}^{(t)}(q)=-\tfrac{1}{2}\sum_{j=1}^{T}\frac{G_j^{2}}{H_j+\lambda}+\gamma T$$

which is used to measure the quality of any tree structure: the smaller it is, the better the tree structure, since the loss function of the model can then be reduced more;

the XGBoost training process adds CART functions iteratively and finally obtains the XGBoost model

$$\hat{y}_i=\sum_{k=1}^{K}f_k(x_i)$$

the iteration terminates when continuing to add tree models improves the accuracy of the model by less than $s$; each newly added function $f_t$ is obtained as follows: starting from a single leaf node, one branch is added at a time and the tree-growing scheme with the smallest loss function value is selected; this process is repeated until the maximum depth of the tree reaches a specified value or the minimum sample weight sum falls below a threshold, at which point splitting stops.
4. The coal and gangue identification method based on the particle swarm-optimized XGBoost algorithm according to claim 3, characterized in that parameter optimization of the learning rate, the maximum tree depth and the minimum leaf weight in the XGBoost algorithm is carried out by the particle swarm optimization algorithm, which specifically comprises the following steps:

initializing the particle swarm;

determining a fitness function according to the objective function of the optimization problem and calculating the fitness of each particle in the particle swarm;

calculating the individual extreme value (personal best) of each particle in the particle swarm, comparing the current fitness value of each particle with its individual extreme value, and replacing the individual extreme value with the current fitness value if the current fitness value is better;

comparing the current fitness values of all the particles in the particle swarm with the global extreme value (global best), and replacing the global extreme value with the current fitness value if the current fitness value is better;

updating the velocity and position of each particle by the formulas

$$\nu_{id}(t+1)=\omega\,\nu_{id}(t)+c_1 r_1\left[p_{id}(t)-x_{id}(t)\right]+c_2 r_2\left[p_{gd}(t)-x_{id}(t)\right]$$

and

$$x_{id}(t+1)=x_{id}(t)+\nu_{id}(t+1);$$

judging whether the iteration termination condition is met: if it is met, the optimization ends; if not, returning to the fitness calculation step and continuing the optimization;

iteration termination condition: the iteration stops when the number of iterations reaches the set maximum number of iterations or the set minimum error criterion is met; otherwise the iteration continues until the termination condition is satisfied.
CN202110580411.2A 2021-05-26 2021-05-26 Coal and gangue identification method for particle swarm optimization XGboost algorithm Withdrawn CN113269254A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110580411.2A CN113269254A (en) 2021-05-26 2021-05-26 Coal and gangue identification method for particle swarm optimization XGboost algorithm

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110580411.2A CN113269254A (en) 2021-05-26 2021-05-26 Coal and gangue identification method for particle swarm optimization XGboost algorithm

Publications (1)

Publication Number Publication Date
CN113269254A (en) 2021-08-17

Family

ID=77233143

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110580411.2A Withdrawn CN113269254A (en) 2021-05-26 2021-05-26 Coal and gangue identification method for particle swarm optimization XGboost algorithm

Country Status (1)

Country Link
CN (1) CN113269254A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113688903A (en) * 2021-08-24 2021-11-23 贵州电网有限责任公司 Power transmission line micro-terrain classification method easy to cover ice
CN113688903B (en) * 2021-08-24 2024-03-22 贵州电网有限责任公司 Method for classifying ice-covered micro-topography of power transmission line Louis
CN114235740A (en) * 2021-11-12 2022-03-25 华南理工大学 XGboost model-based waste plastic spectrum identification method
CN114264620A (en) * 2021-11-24 2022-04-01 淮阴工学院 Portable spectral data analysis system and method based on python language
CN114943189A (en) * 2022-07-26 2022-08-26 广东海洋大学 XGboost-based acoustic velocity profile inversion method and system

Similar Documents

Publication Publication Date Title
CN113269254A (en) Coal and gangue identification method for particle swarm optimization XGboost algorithm
WO2022160771A1 (en) Method for classifying hyperspectral images on basis of adaptive multi-scale feature extraction model
Zhao et al. Cloud shape classification system based on multi-channel cnn and improved fdm
CN107563381B (en) Multi-feature fusion target detection method based on full convolution network
CN107609525B (en) Remote sensing image target detection method for constructing convolutional neural network based on pruning strategy
CN112464883B (en) Automatic detection and identification method and system for ship target in natural scene
CN106980858B (en) Language text detection and positioning system and language text detection and positioning method using same
CN108648191B (en) Pest image recognition method based on Bayesian width residual error neural network
CN108229550B (en) Cloud picture classification method based on multi-granularity cascade forest network
CN105825502B (en) A kind of Weakly supervised method for analyzing image of the dictionary study based on conspicuousness guidance
CN109697469A (en) A kind of self study small sample Classifying Method in Remote Sensing Image based on consistency constraint
CN110853070A (en) Underwater sea cucumber image segmentation method based on significance and Grabcut
CN107680099A (en) A kind of fusion IFOA and F ISODATA image partition method
CN110458022B (en) Autonomous learning target detection method based on domain adaptation
CN112613428B (en) Resnet-3D convolution cattle video target detection method based on balance loss
CN107945210A (en) Target tracking algorism based on deep learning and environment self-adaption
CN113435486A (en) Coal gangue identification method based on PCA-IFOA-SVM combined with gray level-texture fusion features
CN116977633A (en) Feature element segmentation model training method, feature element segmentation method and device
CN114329031A (en) Fine-grained bird image retrieval method based on graph neural network and deep hash
CN115953666B (en) Substation site progress identification method based on improved Mask-RCNN
Wicaksono et al. Improve image segmentation based on closed form matting using K-means clustering
CN116343205A (en) Automatic labeling method for fluorescence-bright field microscopic image of planktonic algae cells
Osumi et al. Domain adaptation using a gradient reversal layer with instance weighting
CN113724233B (en) Transformer equipment appearance image defect detection method based on fusion data generation and transfer learning technology
CN114913158A (en) Hydrogeological rock mass crack and crack water seepage detection method and system

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20210817

WW01 Invention patent application withdrawn after publication