CN112801140A - XGboost breast cancer rapid diagnosis method based on moth fire suppression optimization algorithm - Google Patents

Publication number
CN112801140A
CN112801140A CN202110018388.8A CN202110018388A CN112801140A CN 112801140 A CN112801140 A CN 112801140A CN 202110018388 A CN202110018388 A CN 202110018388A CN 112801140 A CN112801140 A CN 112801140A
Authority
CN
China
Prior art keywords
moth
xgboost
fire suppression
optimization algorithm
breast cancer
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110018388.8A
Other languages
Chinese (zh)
Inventor
胡雪梅
徐蔚鸿
陈沅涛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Changsha University of Science and Technology
Original Assignee
Changsha University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2021-01-07
Filing date
2021-01-07
Publication date
2021-05-14
Application filed by Changsha University of Science and Technology filed Critical Changsha University of Science and Technology
Priority to CN202110018388.8A priority Critical patent/CN112801140A/en
Publication of CN112801140A publication Critical patent/CN112801140A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/243 Classification techniques relating to the number of classes
    • G06F18/24323 Tree-organised classifiers
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/004 Artificial life, i.e. computing arrangements simulating life
    • G06N3/006 Artificial life, i.e. computing arrangements simulating life based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO]
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Biomedical Technology (AREA)
  • Public Health (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Evolutionary Biology (AREA)
  • Epidemiology (AREA)
  • Pathology (AREA)
  • Databases & Information Systems (AREA)
  • Primary Health Care (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Investigating Or Analysing Biological Materials (AREA)

Abstract

The invention relates to the field of breast cancer diagnosis, and in particular to an XGBoost breast cancer diagnosis method based on the moth fire suppression optimization algorithm, comprising the following steps: acquiring an original breast cancer data set and normalizing it; selecting features with an XGBoost model using default parameters to reduce the data dimensionality, and dividing the dimension-reduced breast cancer sample data set into a training sample set and a test sample set; optimizing the parameters of the XGBoost model with the moth fire suppression optimization algorithm; inputting the training sample set into the optimized XGBoost model for training, verifying the performance of the model with 10-fold cross-validation, and measuring the trained model with the Accuracy index; and inputting the test sample set into the trained model to obtain a classification result, which is measured with the Accuracy, F1, G-mean and AUC indexes. Compared with the prior art, the method has the advantages of a simple model, interpretability, high prediction accuracy and high prediction speed.

Description

XGboost breast cancer rapid diagnosis method based on moth fire suppression optimization algorithm
Technical Field
The invention relates to the field of breast cancer diagnosis, in particular to an XGboost breast cancer rapid diagnosis method based on a moth fire suppression optimization algorithm.
Background
Worldwide, breast cancer accounts for approximately 15% of all cancers affecting women and is a common cause of cancer-related death in women. Women of any age may develop breast cancer. Preventive screening is therefore the key to early detection and treatment, and screening programs have been launched successfully in many countries. For most people, however, the relevant medical resources are scarce: experts and medical staff specializing in serious diseases such as cancer are few, and medical resources are unevenly distributed. Applying machine learning to cancer diagnosis can greatly improve diagnostic efficiency.
Ensemble learning improves the stability and accuracy of a model by combining multiple weak classifiers into a strong classifier. XGBoost is an ensemble learning algorithm and an improvement on the GBDT algorithm. XGBoost performs a second-order Taylor expansion of the loss function, which improves the accuracy of the model, and adds a regularization term to the objective function when seeking the overall optimal solution, which balances the reduction of the objective function against the complexity of the model and so avoids overfitting. It performs well in feature extraction and data classification and can solve practical problems in breast cancer diagnosis, but such applications are still lacking at present.
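As a worked form of the second-order expansion mentioned above, the regularized objective at boosting round t is commonly written as shown below. This is the usual formulation from the XGBoost literature, not notation taken from this document; the symbols $g_i$, $h_i$, $f_t$, $J$, $w_j$, $\gamma$ and $\lambda$ are introduced here purely for illustration.

```latex
\mathcal{L}^{(t)} \approx \sum_{i=1}^{n} \left[ l\left(y_i, \hat{y}_i^{(t-1)}\right) + g_i f_t(x_i) + \tfrac{1}{2} h_i f_t^{2}(x_i) \right] + \Omega(f_t),
\qquad
\Omega(f_t) = \gamma J + \tfrac{1}{2} \lambda \sum_{j=1}^{J} w_j^{2}
```

Here $g_i$ and $h_i$ are the first and second derivatives of the loss $l$ with respect to the previous round's prediction, $f_t$ is the tree added at round t, $J$ is its number of leaves and $w_j$ are its leaf weights. The term $\Omega(f_t)$ is the regularization term that trades objective reduction against model complexity.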
The performance of the XGBoost model depends on its parameter settings, and reasonable settings can greatly improve the overall effect of the model. Traditional manual tuning trains the algorithm on parameter sets chosen by hand, but cannot guarantee that an optimal parameter combination is found. The conventional, widely used methods are grid search and random search. Random search, like manual tuning, cannot guarantee an optimal result; grid search consumes a large amount of resources and, because it does not take previously evaluated parameter information into account, easily falls into a local minimum. A new and effective parameter tuning method is therefore needed to improve model training.
The moth-flame optimization (MFO) algorithm, referred to in this document as the moth fire suppression optimization algorithm, is an intelligent optimization algorithm proposed by Seyedali Mirjalili in 2015. It provides a new heuristic search paradigm for the optimization field: the algorithm has strong parallel optimization capability and good global search properties, and does not easily fall into local extrema. The invention first uses XGBoost with default parameters to select features from the original breast cancer data set and reduce the sample dimensionality, then uses the moth fire suppression optimization algorithm to optimize the parameters of the XGBoost model and obtain an optimal parameter set, and finally trains the optimized XGBoost model and verifies its performance with 10-fold cross-validation.
Disclosure of Invention
The invention aims to solve the problems of diagnosis accuracy and diagnosis speed in the field of breast cancer diagnosis, and provides an XGboost breast cancer diagnosis method based on a moth fire suppression optimization algorithm.
The purpose of the invention can be realized by the following technical scheme:
an XGboost breast cancer diagnosis method based on a moth fire suppression optimization algorithm comprises the following steps:
1) acquiring an original breast cancer data set and carrying out normalization processing;
2) an XGboost feature selection method is adopted to perform feature selection on an original breast cancer data set, so that the dimensionality of the data set is reduced;
3) dividing the breast cancer data set subjected to dimensionality reduction into a training set and a testing set according to a fixed proportion;
4) optimizing XGboost model parameters by adopting a moth fire suppression optimization algorithm, and determining an optimal parameter set;
5) training the optimized XGboost classification model by using a training sample set;
6) verifying the performance of the trained model by adopting a 10-fold cross validation method, and measuring the quality of the model by adopting the Accuracy, F1, G-Mean and AUC indexes;
7) inputting a test sample set to a trained XGboost classification model for breast cancer classification diagnosis, and measuring the classification diagnosis effect of the model by adopting Accuracy, F1, G-Mean and AUC indexes.
In step 1), the original data set is acquired by downloading a breast cancer data set from the UCI machine learning repository, and the data set is normalized as follows:
$$x^{*}_{i,j} = \frac{x_{i,j} - \mathrm{Min}(x_i)}{\mathrm{Max}(x_i) - \mathrm{Min}(x_i)} \qquad (1)$$
In formula (1), $x^{*}_{i,j}$ is the normalized value of the jth data item in the ith dimension of the breast cancer data set, $x_{i,j}$ is the jth original data item in the ith dimension, $\mathrm{Max}(x_i)$ is the maximum in the ith dimension, and $\mathrm{Min}(x_i)$ is the minimum in the ith dimension.
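The normalization of step 1) can be sketched in a few lines of plain Python, assuming it is ordinary min-max scaling as the Max and Min definitions suggest. The function name `min_max_normalize` and the handling of constant features are illustrative assumptions, not part of the source.

```python
from typing import List


def min_max_normalize(columns: List[List[float]]) -> List[List[float]]:
    """Scale each feature dimension (one inner list per dimension i) to [0, 1]."""
    normalized = []
    for values in columns:
        lo, hi = min(values), max(values)
        span = hi - lo
        # Assumption: a constant feature (zero range) is mapped to 0.0 to
        # avoid division by zero; the source does not cover this case.
        normalized.append([(v - lo) / span if span else 0.0 for v in values])
    return normalized


# Two toy feature dimensions with three samples each.
features = [[1.0, 5.0, 9.0], [10.0, 10.0, 10.0]]
scaled = min_max_normalize(features)
```

Each dimension is scaled independently, so features measured in different units end up on the same [0, 1] range before feature selection.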
In step 2), the XGBoost feature selection method first divides the breast cancer data set into a training set and a test set at a ratio of 7:3, trains an XGBoost classification model with default parameters on the training set, ranks the features by the feature importance of the trained model, and excludes features whose importance score is below a set threshold.
In step 4), the dimension of each moth in the moth fire suppression optimization algorithm is set to 9, corresponding to the learning rate learning_rate, the number of trees n_estimators, the maximum tree depth max_depth, the minimum leaf node weight min_child_weight, the gamma value, the random sampling ratio subsample, the ratio colsample_bytree of columns randomly sampled for each tree, the weight reg_alpha of the L1 regularization term, and the weight reg_lambda of the L2 regularization term.
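One way to picture the 9-dimensional moth is as a real-valued vector that is decoded into an XGBoost parameter dictionary. The sketch below is a minimal illustration: the bounds in `SEARCH_SPACE` are hypothetical placeholders (the actual search space is given in Table 1), and the clipping and integer rounding are assumptions.

```python
from typing import Dict, List, Tuple, Union

# (name, lower bound, upper bound, is_integer). The bounds are HYPOTHETICAL
# placeholders for illustration; the patent's real bounds are in Table 1.
SEARCH_SPACE: List[Tuple[str, float, float, bool]] = [
    ("learning_rate", 0.01, 0.3, False),
    ("n_estimators", 50, 500, True),
    ("max_depth", 1, 10, True),
    ("min_child_weight", 0.0, 10.0, False),
    ("gamma", 0.0, 1.0, False),
    ("subsample", 0.5, 1.0, False),
    ("colsample_bytree", 0.5, 1.0, False),
    ("reg_alpha", 0.0, 1.0, False),
    ("reg_lambda", 0.0, 5.0, False),
]


def decode_position(position: List[float]) -> Dict[str, Union[int, float]]:
    """Turn a 9-dimensional moth position into an XGBoost parameter dict."""
    if len(position) != len(SEARCH_SPACE):
        raise ValueError("position must have one value per parameter")
    params: Dict[str, Union[int, float]] = {}
    for value, (name, low, high, is_int) in zip(position, SEARCH_SPACE):
        clipped = min(max(value, low), high)  # keep the moth inside its bounds
        params[name] = int(round(clipped)) if is_int else clipped
    return params


params = decode_position([0.5, 120.6, 4.2, 1.0, 0.1, 0.8, 0.9, 0.0, 1.0])
```

Integer-valued parameters such as n_estimators and max_depth are rounded because the optimizer itself moves through a continuous space.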
In step 4), the fitness function of a moth is the error rate err_rate of the XGBoost classification model. The fitness value of each moth is calculated by training an XGBoost classification model on the training set with the parameter values specified by that moth, classifying with the trained model, and computing the classification error rate err_rate = 1 - Accuracy.
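The fitness value itself is a one-line computation once predictions are available. The sketch below leaves model training out and only shows err_rate = 1 - Accuracy; the function name `error_rate` is an illustrative choice.

```python
from typing import Sequence


def error_rate(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fitness of a moth: classification error rate, i.e. 1 - Accuracy."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return 1.0 - correct / len(y_true)


# Labels follow the document's encoding: 1 = benign, -1 = malignant.
fitness = error_rate([1, -1, 1, 1], [1, 1, 1, -1])
```

Because the optimizer minimizes fitness, a lower error rate corresponds to a better parameter set.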
In step 4), the first generation of moths and flames is initialized: the position of each moth takes random values in the parameter search space, the fitness value of each moth is calculated, and the flame positions and their fitness values are set to the moth positions and fitness values sorted in ascending order of fitness.
In step 4), the number of flames is reduced adaptively according to the following formula:
$$\mathrm{flame\_no} = \mathrm{round}\left(N - l \times \frac{N - 1}{T}\right) \qquad (2)$$
In formula (2), l is the current iteration number, N is the maximum number of flames, and T is the maximum number of iterations.
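A minimal sketch of the adaptive flame count follows, assuming the schedule is the one from the original MFO formulation, round(N - l(N - 1)/T). The function name `flame_count` is illustrative, and Python's built-in `round` (which rounds exact halves to the nearest even integer) is used as the rounding rule, which is itself an assumption.

```python
def flame_count(iteration: int, max_flames: int, max_iterations: int) -> int:
    """Number of flames kept at a given iteration (1-based)."""
    # Linear decrease from about max_flames at the start to 1 at the end.
    value = max_flames - iteration * (max_flames - 1) / max_iterations
    return int(round(value))


# Schedule for N = 10 flames and T = 50 iterations, the values used in this document.
schedule = [flame_count(l, 10, 50) for l in range(1, 51)]
```

With fewer flames in later iterations, more moths circle the same few best positions, which shifts the search from exploration toward local refinement.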
In step 4), the position of each moth relative to a flame is updated according to formulas (3) to (5):
$$M_i = S(M_i, F_j) \qquad (3)$$
$$S(M_i, F_j) = D_i \cdot e^{bt} \cdot \cos(2\pi t) + F_j \qquad (4)$$
$$D_i = |F_j - M_i| \qquad (5)$$
where $M_i$ denotes the ith moth, $F_j$ denotes the jth flame, S denotes the spiral function, $D_i$ denotes the distance between the ith moth and the jth flame, b is a constant defining the shape of the logarithmic spiral (b is set to 1), and the path coefficient t is a random number in [-1, 1]. The spiral starts at the moth and ends at the flame position, and its range of fluctuation is the parameter search space.
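The spiral update can be written directly from the three formulas. In the sketch below, `spiral_step` applies the update to one coordinate and `update_moth` applies it to a whole position; drawing a fresh t for every dimension is an assumption borrowed from common MFO implementations, and both function names are illustrative.

```python
import math
import random
from typing import List


def spiral_step(m: float, f: float, t: float, b: float = 1.0) -> float:
    """One coordinate of S(M_i, F_j) = D_i * e^(b*t) * cos(2*pi*t) + F_j."""
    distance = abs(f - m)  # D_i = |F_j - M_i|
    return distance * math.exp(b * t) * math.cos(2.0 * math.pi * t) + f


def update_moth(moth: List[float], flame: List[float],
                rng: random.Random, b: float = 1.0) -> List[float]:
    """Move a moth around its flame, one random t in [-1, 1] per dimension."""
    return [spiral_step(m, f, rng.uniform(-1.0, 1.0), b)
            for m, f in zip(moth, flame)]


new_position = update_moth([0.0] * 9, [1.0] * 9, random.Random(0))
```

At t = 0 the cosine and exponential factors are both 1, so the moth lands at the flame plus the full distance; a moth already sitting on its flame (distance 0) does not move.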
In step 4), the algorithm checks whether the current iteration number has reached the maximum number of iterations. If it has, the position of the flame is returned directly as the optimal parameter set found; if not, the iterative search continues.
The invention has the beneficial effects that:
First, aiming at the problem of breast cancer diagnosis, the invention provides an XGBoost breast cancer diagnosis method based on the moth fire suppression optimization algorithm.
Second, the parameters of the XGBoost classification model are optimized with the moth fire suppression optimization algorithm, which searches for the optimal parameter set along a heuristic spiral.
Third, the trained XGBoost classification model is used to diagnose breast cancer, with higher diagnostic accuracy and higher operating efficiency than the prior art.
Fourth, the performance of the model is verified with 10-fold cross-validation; the final evaluation result is better than the prior art in Accuracy, which prevents overfitting to a certain extent and verifies the generalization of the classification model.
Drawings
Fig. 1 is a schematic flow diagram of a breast cancer diagnosis method for optimizing an XGBoost model based on a moth fire suppression optimization algorithm in an embodiment of the present invention.
FIG. 2 is a schematic flow diagram of optimizing XGboost by the moth fire suppression optimization algorithm in the embodiment.
Detailed Description
Embodiments of the present invention will be described in detail below with reference to the accompanying drawings.
Examples
An XGBoost breast cancer rapid diagnosis method based on the moth fire suppression optimization algorithm extracts key feature attributes from an original breast cancer data set, optimizes the XGBoost model parameters with the moth fire suppression optimization algorithm, trains the model, and thereby diagnoses breast cancer. Referring to Fig. 1, the specific method flow is as follows:
1. data set selection and preprocessing
The Breast Cancer Wisconsin (primary) data set WDBC from the UCI machine learning repository is selected. It contains a total of 699 samples in 2 classes, namely 458 benign samples and 241 malignant samples, and 32 feature attributes. The data set is preprocessed as follows. First, feature attribute names, including id and class, are added to the original data set, and the id attribute is removed. In the class attribute column, samples of the benign class, whose label value is B, are set to 1, and samples of the malignant class, whose label value is M, are set to -1. The class attribute column is assigned to the class label vector y, and the remaining attribute columns are assigned to the sample data X. Finally, the sample data X is normalized as follows:
$$X^{*}_{i,j} = \frac{X_{i,j} - \mathrm{Min}(X_i)}{\mathrm{Max}(X_i) - \mathrm{Min}(X_i)} \qquad (1)$$
In formula (1), $X^{*}_{i,j}$ is the normalized value of the jth data item in the ith dimension of the sample data X, $X_{i,j}$ is the jth original data item in the ith dimension of X, $\mathrm{Max}(X_i)$ is the maximum value in the ith dimension of X, and $\mathrm{Min}(X_i)$ is the minimum value in the ith dimension of X.
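The label encoding and column assignment described in this section can be sketched as follows. The row layout (dictionaries with `id`, `class` and feature keys) and the feature names `f1` and `f2` are hypothetical; only the B to 1 and M to -1 mapping and the removal of the id column come from the text.

```python
from typing import Any, Dict, List, Tuple

# Label encoding from the text: benign "B" -> 1, malignant "M" -> -1.
LABEL_MAP = {"B": 1, "M": -1}


def split_features_and_labels(
    rows: List[Dict[str, Any]]
) -> Tuple[List[List[float]], List[int]]:
    """Drop the id column, encode the class column as y, keep the rest as X."""
    X: List[List[float]] = []
    y: List[int] = []
    for row in rows:
        y.append(LABEL_MAP[row["class"]])
        X.append([float(v) for k, v in row.items() if k not in ("id", "class")])
    return X, y


# Hypothetical rows; real records carry many more feature columns.
rows = [
    {"id": 101, "class": "B", "f1": 12.3, "f2": 0.08},
    {"id": 102, "class": "M", "f1": 20.1, "f2": 0.15},
]
X, y = split_features_and_labels(rows)
```

The resulting X would then be passed through the min-max normalization of formula (1).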
2. XGboost feature selection processing and sample data set partitioning
The sample data set X and its labels y are divided into a training sample set and a test sample set at a ratio of 7:3. An XGBoost model with default parameters is trained on the training sample set, the importance score of each feature is read from the feature_importances_ attribute of the trained model, and a chart of the features sorted in ascending order of importance is drawn. Features with an importance score below 0.003099 are screened out with the transform method of SelectFromModel in the sklearn feature_selection package, giving a new sample data set X with only 22 features. The new sample data set X and its labels y are again divided into a training sample set and a test sample set at a ratio of 7:3.
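The screening rule and the 7:3 split can be illustrated without any machine learning library. The sketch below assumes the importance scores have already been obtained from a trained model; the scores in `importances` are made up, and the helper names are illustrative. Only the threshold 0.003099 and the 7:3 ratio come from the text.

```python
import random
from typing import List, Sequence, Tuple


def select_by_importance(importances: Sequence[float], threshold: float) -> List[int]:
    """Indices of features kept, i.e. those with importance >= threshold."""
    return [i for i, score in enumerate(importances) if score >= threshold]


def train_test_split_indices(n_samples: int, train_ratio: float = 0.7,
                             seed: int = 0) -> Tuple[List[int], List[int]]:
    """Shuffle sample indices and cut them at the given train ratio."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    cut = int(round(n_samples * train_ratio))
    return indices[:cut], indices[cut:]


# Made-up importance scores for five features.
importances = [0.41, 0.0012, 0.25, 0.003099, 0.0]
kept = select_by_importance(importances, threshold=0.003099)
train_idx, test_idx = train_test_split_indices(10)
```

Features scoring exactly at the threshold are kept, matching the rule that only scores below it are screened out.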
3. Optimization of XGboost classification model parameters by moth fire suppression optimization algorithm
The XGBoost model has many parameters. To further improve the classification accuracy of the model, the moth fire suppression optimization algorithm is used to search for the optimal parameter set of the model. Referring to Fig. 2, the moth fire suppression optimization algorithm searches for the optimal parameter set of the XGBoost classification model in the following steps:
step one: initialize the first generation of moths and flames. The number of moths is set to 10, the number of variables of each moth to 9, and the maximum number of iterations to 50. The position matrices and fitness value vectors of the moths and flames are expressed as follows:
$$M = \begin{bmatrix} m_{1,1} & m_{1,2} & \cdots & m_{1,d} \\ m_{2,1} & m_{2,2} & \cdots & m_{2,d} \\ \vdots & \vdots & \ddots & \vdots \\ m_{n,1} & m_{n,2} & \cdots & m_{n,d} \end{bmatrix}$$
$$OM = \begin{bmatrix} OM_1 \\ OM_2 \\ \vdots \\ OM_n \end{bmatrix}$$
$$F = \begin{bmatrix} f_{1,1} & f_{1,2} & \cdots & f_{1,d} \\ f_{2,1} & f_{2,2} & \cdots & f_{2,d} \\ \vdots & \vdots & \ddots & \vdots \\ f_{n,1} & f_{n,2} & \cdots & f_{n,d} \end{bmatrix}$$
$$OF = \begin{bmatrix} OF_1 \\ OF_2 \\ \vdots \\ OF_n \end{bmatrix}$$
where n = 10 is the number of moths and d = 9 is the number of variables, $m_{1,1}$ denotes the first-dimension data value of the first moth, $OM_1$ denotes the fitness value of the first moth, $f_{1,1}$ denotes the first-dimension data value of the first flame, and $OF_1$ denotes the fitness value of the first flame;
step two: the 9 parameters of the XGBoost model, namely the learning rate learning_rate, the number of trees n_estimators, the maximum tree depth max_depth, the minimum leaf node weight min_child_weight, the gamma value, the random sampling ratio subsample, the ratio colsample_bytree of columns randomly sampled for each tree, the weight reg_alpha of the L1 regularization term and the weight reg_lambda of the L2 regularization term, are selected as the variables of the moths; the variables of each first-generation moth are randomly initialized within the search space in Table 1 and the fitness value of each moth is calculated; the flame positions and fitness values of the first generation are set to the moth positions and fitness values sorted in ascending order of fitness;
step three: reduce the number of flames with an adaptive mechanism. If every position update of the 10 moths were based on 10 different positions in the search space, the local exploitation capability of the algorithm would be reduced. To solve this problem, an adaptive mechanism reduces the number of flames so as to balance the global exploration and local exploitation capabilities of the algorithm in the search space. The number of flames is calculated according to formula (6):
$$\mathrm{flame\_no} = \mathrm{round}\left(N - l \times \frac{N - 1}{T}\right) \qquad (6)$$
where l is the current iteration number, N = 10 is the maximum number of flames and T = 50 is the maximum number of iterations;
step four: update the position of each moth with the following mechanism:
$$M_i = S(M_i, F_j) \qquad (7)$$
$$S(M_i, F_j) = D_i \cdot e^{bt} \cdot \cos(2\pi t) + F_j \qquad (8)$$
$$D_i = |F_j - M_i| \qquad (9)$$
where $M_i$ denotes the ith moth, $F_j$ denotes the jth flame, $D_i$ denotes the distance between the ith moth and the jth flame, b is a constant defining the shape of the logarithmic spiral (b is set to 1), and the path coefficient t is a random number in [-1, 1];
step five: calculate the fitness value of each moth. An XGBoost classification model is trained on the training sample set with the 9 parameters specified by the position of each moth, and the classification error rate err_rate = 1 - Accuracy of the trained model is taken as the fitness value of that moth;
step six: set the flame positions and their fitness values. The updated moth positions and the flame positions are re-sorted by fitness value, and the positions with the smaller fitness values are selected as the positions of the next generation of flames;
step seven: check whether the iteration termination condition is met. If the number of iterations has reached the maximum number of iterations, the position parameters of the flame are returned as the optimal parameter set found; otherwise, return to step two and continue the iterative search.
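The seven steps can be put together in a compact, self-contained sketch. To keep it runnable without XGBoost, the fitness function here is a toy sphere function standing in for the classification error rate. The flame schedule assumes the standard MFO form round(N - l(N - 1)/T), and clipping positions to the bounds, drawing one t per dimension, and re-entering the loop at the flame-count update (as common MFO implementations do) are all assumptions; `mfo_minimize` is an illustrative name.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

Bounds = Sequence[Tuple[float, float]]


def mfo_minimize(fitness: Callable[[List[float]], float], bounds: Bounds,
                 n_moths: int = 10, max_iter: int = 50, b: float = 1.0,
                 seed: int = 0) -> Tuple[List[float], float]:
    rng = random.Random(seed)
    # Steps one and two: random moths; flames are the moths sorted by fitness.
    moths = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_moths)]
    flames = sorted(((fitness(m), list(m)) for m in moths), key=lambda p: p[0])
    for l in range(1, max_iter + 1):
        # Step three: adaptive number of flames (assumed standard MFO schedule).
        n_flames = int(round(n_moths - l * (n_moths - 1) / max_iter))
        # Step four: spiral update; surplus moths share the last kept flame.
        for i, moth in enumerate(moths):
            flame = flames[min(i, n_flames - 1)][1]
            for d, (lo, hi) in enumerate(bounds):
                t = rng.uniform(-1.0, 1.0)
                dist = abs(flame[d] - moth[d])
                new = dist * math.exp(b * t) * math.cos(2.0 * math.pi * t) + flame[d]
                moth[d] = min(max(new, lo), hi)  # assumption: clip to the bounds
        # Step five: evaluate the updated moths.
        scored = [(fitness(m), list(m)) for m in moths]
        # Step six: next flames are the best positions among old flames and moths.
        flames = sorted(flames + scored, key=lambda p: p[0])[:n_moths]
    # Step seven: the best flame is the result.
    return flames[0][1], flames[0][0]


def sphere(x: List[float]) -> float:
    """Toy objective standing in for err_rate = 1 - Accuracy."""
    return sum(v * v for v in x)


bounds = [(-5.0, 5.0)] * 3
best_pos, best_fit = mfo_minimize(sphere, bounds)
```

In the setting of this document, `fitness` would instead train an XGBoost classifier with the parameters decoded from the moth's position and return its error rate, and `bounds` would be the nine ranges of Table 1.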
TABLE 1 parameter search space
Figure BDA0002887825140000051
4. Training optimized XGboost classification model
Training an XGboost classification model with optimal parameters by using a training sample set and storing the model;
5. cross validation model training effect
Performing 10-fold cross validation on the trained XGBoost classification model by using the sample data set X and its labels y, and measuring the classification effect of the model with the evaluation indexes running time, Accuracy, F1, G-Mean and AUC;
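How the 10 folds are formed is not spelled out in the text; a minimal sketch, assuming a simple shuffled partition of the sample indices, is shown below. The names `k_fold_indices` and `cross_validate` and the `score_fold` callback are illustrative; in practice `score_fold` would train the model on the training indices and return a metric on the held-out fold.

```python
import random
from typing import Callable, Iterator, List, Tuple


def k_fold_indices(n_samples: int, k: int = 10,
                   seed: int = 0) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, test_indices) for each of the k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k nearly equal parts
    for i, test_idx in enumerate(folds):
        train_idx = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train_idx, test_idx


def cross_validate(n_samples: int,
                   score_fold: Callable[[List[int], List[int]], float],
                   k: int = 10) -> float:
    """Average a per-fold score over all k folds."""
    scores = [score_fold(train, test) for train, test in k_fold_indices(n_samples, k)]
    return sum(scores) / len(scores)


splits = list(k_fold_indices(25, k=10))
# Dummy scorer: reports the held-out fold size instead of a real metric.
mean_test_size = cross_validate(25, lambda train, test: float(len(test)))
```

Every sample is held out exactly once, so the averaged metric reflects performance on data the model did not see during that fold's training.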
6. breast cancer diagnosis
Inputting the test sample set into the trained classification model to obtain a diagnosis result, and measuring the classification diagnosis effect of the model with evaluation indexes such as running time, Accuracy, F1, G-Mean and AUC;
7. description of evaluation index
In the breast cancer sample data set X, the classification label value of a benign tumor is 1 and that of a malignant tumor is -1. As shown in the confusion matrix in Table 2, TP is the number of benign tumors correctly diagnosed as benign, FP is the number of malignant tumors incorrectly diagnosed as benign, TN is the number of malignant tumors correctly diagnosed as malignant, and FN is the number of benign tumors incorrectly diagnosed as malignant. The indexes Accuracy, F1, G-Mean and AUC are calculated as follows:
Figure BDA0002887825140000061
Figure BDA0002887825140000062
Figure BDA0002887825140000063
Figure BDA0002887825140000064
Figure BDA0002887825140000065
Figure BDA0002887825140000066
Figure BDA0002887825140000067
Figure BDA0002887825140000068
AUC is the area under the ROC curve, whose ordinate is the true positive rate TPR and whose abscissa is the false positive rate FPR.
Table 2 Confusion matrix of results
Actual value \ Predicted value    Positive (1)           Negative (-1)
Positive (1)                      TP (True Positive)     FN (False Negative)
Negative (-1)                     FP (False Positive)    TN (True Negative)
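Since the formulas for the indexes are only available as images, the sketch below uses the standard textbook definitions, which may differ in form from the patent's own. It assumes the usual confusion-matrix convention with benign (1) as the positive class: TN counts malignant cases correctly diagnosed as malignant, and FN counts benign cases incorrectly diagnosed as malignant. The function names are illustrative, and all denominators are assumed to be non-zero.

```python
import math
from typing import Dict, Sequence


def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Accuracy, F1 and G-Mean from confusion-matrix counts (standard definitions)."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)        # true positive rate (TPR)
    specificity = tn / (tn + fp)   # true negative rate (TNR) = 1 - FPR
    f1 = 2.0 * precision * recall / (precision + recall)
    g_mean = math.sqrt(recall * specificity)
    return {"accuracy": accuracy, "f1": f1, "g_mean": g_mean}


def auc_from_scores(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == -1]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


metrics = classification_metrics(tp=40, fp=10, tn=45, fn=5)
auc = auc_from_scores([1, 1, -1, -1], [0.9, 0.4, 0.5, 0.1])
```

The pairwise form of AUC is equivalent to the area under the ROC curve and avoids building the curve explicitly.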
8. Results of the experiment
The optimal parameter set of the XGBoost model found by the moth fire suppression optimization algorithm is shown in Table 3. Two groups of comparison experiments were also carried out. The first group applies different parameter optimization methods on top of feature selection, comparing a genetic optimization algorithm, grid search, a Bayesian optimization algorithm and the original model against the moth fire suppression optimization algorithm proposed by the invention. The average Accuracy, average F1, average G-mean, average AUC and running time after 10-fold cross-validation are shown in Table 4. The Tree-MFO-XGB model proposed by the invention has the highest average Accuracy, average F1, average G-mean and average AUC and the second-shortest running time; considering classification accuracy and running time together, the proposed model gives the best classification effect. The second group, after processing the original data set with feature selection, performs classification diagnosis with a support vector machine (SVM) model, a GBDT model, a random forest model and the K-nearest-neighbor algorithm, and compares them with the proposed method under 10-fold cross-validation. The comparison results are shown in Table 5: the average Accuracy, average F1, average G-mean and average AUC of these models are all lower than those of the proposed method. Considering diagnostic accuracy and time together, the proposed method is the most effective.
TABLE 3 optimal parameter set
Figure BDA0002887825140000071
Table 4 comparative experiment result 1 based on 10-fold cross validation
Figure BDA0002887825140000072
Table 5 comparative experiment result 2 based on 10-fold cross validation
Figure BDA0002887825140000081
The above embodiment describes in detail a specific implementation of the XGBoost breast cancer rapid diagnosis method based on the moth fire suppression optimization algorithm; the description is intended only to help in understanding the method and core idea of the invention.

Claims (6)

1. An XGboost breast cancer rapid diagnosis method based on a moth fire suppression optimization algorithm is characterized by comprising the following steps:
(1) carrying out normalization processing on an original breast cancer data set to obtain a sample data set;
(2) the XGboost feature selection algorithm with default parameters is adopted to sort and screen the features of the sample data set according to feature importance, extract key features, reduce the dimensionality of the sample data, and divide the dimensionality-reduced sample data set into a training sample set and a testing sample set according to a fixed proportion;
(3) optimizing XGboost model parameters by adopting a moth fire suppression optimization algorithm, determining an optimal parameter set, and inputting a training sample set into the optimized XGboost model for training;
(4) verifying the performance of the trained model by adopting a 10-fold cross validation method, and measuring the final classification effect of the model by adopting the running time, Accuracy, F1, G-Mean and AUC indexes.
2. The XGboost breast cancer diagnosis method based on moth fire suppression optimization algorithm according to claim 1,
in the moth fire suppression optimization algorithm of step (3), the moths are taken to be candidate parameter sets, that is, search individuals moving in the search space, and the parameter variables to be solved are the positions of the moths in the search space; the matrix of the moth population is expressed as follows:
$$M = \begin{bmatrix} m_{1,1} & m_{1,2} & \cdots & m_{1,d} \\ m_{2,1} & m_{2,2} & \cdots & m_{2,d} \\ \vdots & \vdots & \ddots & \vdots \\ m_{n,1} & m_{n,2} & \cdots & m_{n,d} \end{bmatrix}$$
wherein n is the number of moths, set to 10, and d is the number of parameter variables to be solved, set to 9; the n moths have a corresponding column vector of fitness values, expressed as follows:
$$OM = \begin{bmatrix} OM_1 \\ OM_2 \\ \vdots \\ OM_n \end{bmatrix}$$
a flame is the best position found so far by the corresponding moth in the search space, and each moth updates its position using the unique flame corresponding to it, which avoids being trapped in a local extremum; the moth positions and the flame positions in the search space are therefore variable matrices of the same dimensions, expressed as follows:
$$F = \begin{bmatrix} f_{1,1} & f_{1,2} & \cdots & f_{1,d} \\ f_{2,1} & f_{2,2} & \cdots & f_{2,d} \\ \vdots & \vdots & \ddots & \vdots \\ f_{n,1} & f_{n,2} & \cdots & f_{n,d} \end{bmatrix}$$
$$OF = \begin{bmatrix} OF_1 \\ OF_2 \\ \vdots \\ OF_n \end{bmatrix}$$
wherein OF is the vector of fitness values corresponding to the flames, and the flame positions and their fitness values are set to the moth positions and fitness values sorted in ascending order of fitness.
3. The XGboost breast cancer rapid diagnosis method based on moth fire suppression optimization algorithm according to claim 1,
in step (3), the moth fire suppression optimization algorithm selects the 9 parameters with the greatest influence on the XGBoost model, namely the learning rate learning_rate, the number of trees n_estimators, the maximum tree depth max_depth, the minimum leaf node weight min_child_weight, the gamma value, the random sampling ratio subsample, the ratio colsample_bytree of columns randomly sampled for each tree, the weight reg_alpha of the L1 regularization term and the weight reg_lambda of the L2 regularization term, and the number of control variables of each moth equals the number of XGBoost parameters to be optimized, namely 9.
4. The XGboost breast cancer diagnosis method based on moth fire suppression optimization algorithm according to claim 1,
in step (3), the moth fire suppression optimization algorithm uses the classification error rate of the XGBoost classification model as the fitness function of the moths, the error rate being calculated as err_rate = 1 - Accuracy.
5. The XGboost breast cancer rapid diagnosis method based on moth fire suppression optimization algorithm according to claim 1,
in step (3), the moth fire suppression optimization algorithm mathematically models the flying behavior of moths toward a flame, and the position updating mechanism of each moth relative to a flame is represented by the following equations:
$$M_i = S(M_i, F_j) \qquad (5)$$
$$S(M_i, F_j) = D_i \cdot e^{bt} \cdot \cos(2\pi t) + F_j \qquad (6)$$
$$D_i = |F_j - M_i| \qquad (7)$$
wherein $M_i$ denotes the ith moth, $F_j$ denotes the jth flame, S denotes the spiral function, $D_i$ denotes the distance between the ith moth and the jth flame, b is a constant defining the shape of the logarithmic spiral (b is set to 1), and the path coefficient t is a random number in [-1, 1]; the spiral starts at the moth and ends at the flame position, and its range of fluctuation is the parameter search space.
6. The XGboost breast cancer rapid diagnosis method based on moth fire suppression optimization algorithm according to claim 1,
wherein in the step (3), the moth fire suppression optimization algorithm adaptively reduces the number of flames over the course of the iterations according to the following formula:
Figure FDA0002887825130000021
where l is the current iteration number, N is the maximum number of flames, set to 10, and T is the maximum number of iterations, set to 50.
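The formula of this claim survives here only as an image reference, so the sketch below assumes the flame-number rule from the cited moth-flame optimization paper by Mirjalili, flame_no = round(N - l * (N - 1) / T), which uses the same symbols l, N and T. The function name `flame_count` is hypothetical, and the rule should be checked against the original figure before being relied on.

```python
def flame_count(l, N=10, T=50):
    """Number of flames kept at iteration l (assumed standard MFO rule).

    Decreases linearly from N at l = 0 to 1 at l = T.
    """
    if not 0 <= l <= T:
        raise ValueError("iteration number must lie in [0, T]")
    return round(N - l * (N - 1) / T)


# With the claimed settings N = 10 and T = 50:
print([flame_count(l) for l in (0, 25, 50)])  # prints [10, 6, 1]
```

Under this assumption the search starts with 10 flames and ends with a single flame, so late iterations concentrate all moths around the best solution found.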
CN202110018388.8A 2021-01-07 2021-01-07 XGboost breast cancer rapid diagnosis method based on moth fire suppression optimization algorithm Pending CN112801140A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110018388.8A CN112801140A (en) 2021-01-07 2021-01-07 XGboost breast cancer rapid diagnosis method based on moth fire suppression optimization algorithm

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110018388.8A CN112801140A (en) 2021-01-07 2021-01-07 XGboost breast cancer rapid diagnosis method based on moth fire suppression optimization algorithm

Publications (1)

Publication Number Publication Date
CN112801140A true CN112801140A (en) 2021-05-14

Family

ID=75808922

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110018388.8A Pending CN112801140A (en) 2021-01-07 2021-01-07 XGboost breast cancer rapid diagnosis method based on moth fire suppression optimization algorithm

Country Status (1)

Country Link
CN (1) CN112801140A (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113506630A (en) * 2021-07-08 2021-10-15 上海中医药大学附属龙华医院 Whole all-round intelligent management system of breast cancer postoperative
CN114358169A (en) * 2021-12-30 2022-04-15 上海应用技术大学 Colorectal cancer detection system based on XGboost
CN115406882A (en) * 2022-10-31 2022-11-29 常州安控电器成套设备有限公司 GBDT and improved MFO-based water quality pollutant detection method
CN117253555A (en) * 2023-11-17 2023-12-19 山东阜丰发酵有限公司 Method for improving xanthan gum fermentation process and operation system thereof
CN117314700A (en) * 2023-11-27 2023-12-29 朗朗教育科技股份有限公司 Dual-teacher teaching management method and system for preschool education
CN117877740A (en) * 2023-12-08 2024-04-12 南通大学附属医院 Gastric cancer lymph node metastasis prediction method based on noninvasive test indexes

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110784455A (en) * 2019-10-16 2020-02-11 国网湖北省电力有限公司电力科学研究院 Method for optimizing Xgboost model based on linear decreasing weight particle swarm algorithm
CN111242302A (en) * 2019-12-27 2020-06-05 冶金自动化研究设计院 XGboost prediction method of intelligent parameter optimization module
CN111243662A (en) * 2020-01-15 2020-06-05 云南大学 Pan-cancer gene pathway prediction method, system and storage medium based on improved XGboost
CN111708865A (en) * 2020-06-18 2020-09-25 海南大学 Technology forecasting and patent early warning analysis method based on improved XGboost algorithm

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
SEYEDALI MIRJALILI: "《Moth-flame optimization algorithm: A novel nature-inspired heuristic paradigm》", 《ELSEVIER》 *
俞圣亮 (Yu Shengliang): "Analysis of Breast Cancer Prognosis Based on Gene Data", China Master's Theses Full-text Database, Medicine and Health Sciences *

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113506630A (en) * 2021-07-08 2021-10-15 上海中医药大学附属龙华医院 Whole all-round intelligent management system of breast cancer postoperative
CN114358169A (en) * 2021-12-30 2022-04-15 上海应用技术大学 Colorectal cancer detection system based on XGboost
CN114358169B (en) * 2021-12-30 2023-09-26 上海应用技术大学 Colorectal cancer detection system based on XGBoost
CN115406882A (en) * 2022-10-31 2022-11-29 常州安控电器成套设备有限公司 GBDT and improved MFO-based water quality pollutant detection method
CN117253555A (en) * 2023-11-17 2023-12-19 山东阜丰发酵有限公司 Method for improving xanthan gum fermentation process and operation system thereof
CN117314700A (en) * 2023-11-27 2023-12-29 朗朗教育科技股份有限公司 Dual-teacher teaching management method and system for preschool education
CN117877740A (en) * 2023-12-08 2024-04-12 南通大学附属医院 Gastric cancer lymph node metastasis prediction method based on noninvasive test indexes

Similar Documents

Publication Publication Date Title
CN112801140A (en) XGboost breast cancer rapid diagnosis method based on moth fire suppression optimization algorithm
Lu et al. A hybrid ensemble algorithm combining AdaBoost and genetic algorithm for cancer classification with gene expression data
CN111368891B (en) K-Means text classification method based on immune clone gray wolf optimization algorithm
CN111626336A (en) Subway fault data classification method based on unbalanced data set
Kianmehr et al. Fuzzy clustering-based discretization for gene expression classification
CN114841257A (en) Small sample target detection method based on self-supervision contrast constraint
CN106548041A (en) A kind of tumour key gene recognition methods based on prior information and parallel binary particle swarm optimization
CN112215259B (en) Gene selection method and apparatus
CN112164426A (en) Drug small molecule target activity prediction method and device based on TextCNN
CN113159264A (en) Intrusion detection method, system, equipment and readable storage medium
CN110010204B (en) Fusion network and multi-scoring strategy based prognostic biomarker identification method
CN116821715A (en) Artificial bee colony optimization clustering method based on semi-supervision constraint
CN106951728B (en) Tumor key gene identification method based on particle swarm optimization and scoring criterion
CN110991494A (en) Method for constructing prediction model based on improved moth optimization algorithm
CN114496112A (en) Multi-objective optimization-based breast cancer resistant drug component intelligent quantification method
Qin et al. Two-stage feature selection for classification of gene expression data based on an improved Salp Swarm Algorithm
CN114219228A (en) Stadium evacuation evaluation method based on EM clustering algorithm
Babu et al. A simplex method-based bacterial colony optimization algorithm for data clustering analysis
CN111583194A (en) High-dimensional feature selection algorithm based on Bayesian rough set and cuckoo algorithm
CN114117876A (en) Feature selection method based on improved Harris eagle algorithm
CN114943866A (en) Image classification method based on evolutionary neural network structure search
CN115249513A (en) Neural network copy number variation detection method and system based on Adaboost integration idea
CN114358191A (en) Gene expression data clustering method based on depth automatic encoder
CN114095268A (en) Method, terminal and storage medium for network intrusion detection
CN110782950A (en) Tumor key gene identification method based on preference grid and Levy flight multi-target particle swarm algorithm

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication (application publication date: 20210514)