CN114611719A - XGboost training method based on cuckoo search algorithm - Google Patents


Info

Publication number
CN114611719A
Authority
CN
China
Prior art keywords
bird nest
xgboost
bird
nest
random
Legal status
Pending
Application number
CN202210236632.2A
Other languages
Chinese (zh)
Inventor
胡雪梅
徐蔚鸿
Current Assignee
Changsha University of Science and Technology
Original Assignee
Changsha University of Science and Technology
Priority date
2022-03-11
Filing date
2022-03-11
Publication date
2022-06-10
Application filed by Changsha University of Science and Technology
Priority to CN202210236632.2A
Publication of CN114611719A
Legal status: Pending (current)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G06N20/20 Ensemble learning
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/211 Selection of the most significant subset of features
    • G06F18/2113 Selection of the most significant subset of features by ranking or filtering the set of features, e.g. using a measure of variance or of feature cross-correlation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F18/2148 Generating training patterns; Bootstrap methods, e.g. bagging or boosting characterised by the process organisation or structure, e.g. boosting cascade
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/004 Artificial life, i.e. computing arrangements simulating life
    • G06N3/006 Artificial life, i.e. computing arrangements simulating life based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO]
    • G06N5/00 Computing arrangements using knowledge-based models
    • G06N5/01 Dynamic search techniques; Heuristics; Dynamic trees; Branch-and-bound

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Software Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computing Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Medical Informatics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to the field of machine learning, and in particular to a novel XGBoost training method based on cuckoo search (CS). The XGBoost trained by this method (CS-based XGBoost) is applied to employee turnover prediction on a real-world staff information data set from enterprise personnel management. The CS-based XGBoost is also compared with existing XGBoost models trained by other optimization algorithms, including GA and PSO, and with four classifiers: GBDT, RF, SVM and KNN. The experimental results and the corresponding discussion show that the CS-based XGBoost outperforms the comparison models on the main performance indexes, such as accuracy, precision and recall.

Description

XGboost training method based on cuckoo search algorithm
Technical Field
The invention relates to the field of machine learning, in particular to a novel XGboost training method based on a cuckoo search algorithm.
Background
With the rapid development of artificial intelligence, machine learning algorithms are being applied across industries to solve practical problems. Data in every field is growing explosively as industry develops. This volume of data cannot be processed effectively by hand, so effective computer algorithms are urgently needed to analyze and use it. Applying artificial intelligence to data processing in each field has therefore long been a research hotspot. XGBoost, a typical representative of ensemble learning techniques, handles large-scale machine learning tasks efficiently. Since its introduction, its strong performance and affordable time and memory complexity have led to wide use in research areas ranging from cancer diagnosis and medical history analysis to credit risk assessment and metagenomics. The traditional XGBoost (i.e., XGBoost with default parameter settings) is widely applied. However, a model whose parameters have not been optimized often fits a given data set poorly, which leads to poor generalization and adaptability. XGBoost has more than thirty hyperparameters, and its performance depends heavily on how they are set during training, so tuning them is very important.
Disclosure of Invention
The invention aims to solve the problem of parameter optimization during XGboost model training, and provides a novel XGboost training method based on a cuckoo search algorithm.
The purpose of the invention can be realized by the following technical scheme:
A novel XGBoost training method based on the cuckoo search algorithm comprises the following steps:
(1) preprocessing the original data set: first, scale each attribute column to the interval [0, 1] with min-max normalization; second, reduce the feature dimensionality of the data set with random forest feature selection;
(2) dividing the preprocessed data set into a training set and a test set according to a user-defined ratio;
(3) optimizing the XGBoost hyperparameters with the XGBoost training method based on the cuckoo search algorithm;
(4) constructing the XGBoost model from the set of optimal parameter values obtained in step (3), then training it on the training set;
(5) testing the trained XGBoost model on the test set and outputting the prediction results;
(6) evaluating the prediction performance of the XGBoost model with four evaluation indexes: Accuracy, Precision, Recall and F1 score.
In step (1), the data set is screened with a random forest feature selection algorithm. The data set is first divided into a training set and a test set at a fixed ratio. The training set is used to train a random forest model, which outputs an importance score for each feature, and these scores are sorted in descending order. A feature importance threshold is then set. Finally, every feature whose importance score is below the threshold is deleted, which gives the dimensionality-reduced data set.
In step (3), the XGBoost hyperparameters are optimized with the XGBoost training method based on the cuckoo search algorithm, specifically:
(4-1) Determine the following settings:
- the nest population size $n$;
- the dimension $d$ of a nest position, i.e., the number of XGBoost parameters to be optimized;
- the discovery probability $P_a$;
- the upper and lower bounds of the nest search space;
- the maximum number of iterations Max_iter.

The classification accuracy predicted by the XGBoost model is used as the fitness function of a nest. The nest positions are represented by the matrix Nest in formula (1), and the corresponding fitness values by the vector NF in formula (2):
$$\mathrm{Nest}=\begin{bmatrix} x_{1,1} & x_{1,2} & \cdots & x_{1,d} \\ x_{2,1} & x_{2,2} & \cdots & x_{2,d} \\ \vdots & \vdots & \ddots & \vdots \\ x_{n,1} & x_{n,2} & \cdots & x_{n,d} \end{bmatrix} \tag{1}$$
$$\mathrm{NF}=\begin{bmatrix} f_1 \\ f_2 \\ \vdots \\ f_n \end{bmatrix} \tag{2}$$
where $n$ is the number of nests, $d$ is the dimension of a nest position, $x_{i,j}$ is the $j$-th dimension of the $i$-th nest, and $f_i$ is the fitness value of the $i$-th nest.
(4-2) Randomly initialize the nest positions in the search space $S=[lb, ub]$. Each component is drawn as $x_{*,j}=\mathrm{random}(lb_j, ub_j)$, where $ub_j$ and $lb_j$ are the upper and lower search bounds of the $j$-th hyperparameter to be optimized, and random() returns a random number in the interval $[lb_j, ub_j]$;
(4-3) Calculate the fitness value of each nest with the fitness function, and retain the best nest gt (i.e., the nest position vector with the largest fitness value);
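Steps (4-1) to (4-3) can be sketched in plain Python as follows. The sketch stores the Nest matrix of formula (1) as a list of rows and the NF vector of formula (2) as a list. The function names `init_nests`, `fitness_vector` and `best_nest` are hypothetical, and the lambda objective is a toy stand-in for the XGBoost classification accuracy used in the patent.

```python
import random


def init_nests(n, lb, ub, rng=random):
    """Nest matrix of formula (1): n rows, each x[i][j] drawn uniformly from [lb[j], ub[j]]."""
    if len(lb) != len(ub):
        raise ValueError("lb and ub must both have length d")
    return [[rng.uniform(lo, hi) for lo, hi in zip(lb, ub)] for _ in range(n)]


def fitness_vector(nests, fitness):
    """Fitness vector NF of formula (2): one fitness value f_i per nest."""
    return [fitness(x) for x in nests]


def best_nest(nests, nf):
    """Return (position, fitness) of the nest with the largest fitness, i.e. gt."""
    i = max(range(len(nf)), key=nf.__getitem__)
    return list(nests[i]), nf[i]


# Usage: 25 nests over the first three search ranges of Table 2, with a toy
# objective standing in for XGBoost test accuracy (an assumption).
rng = random.Random(0)
lb, ub = [0.01, 10, 1], [0.3, 2000, 15]
nests = init_nests(25, lb, ub, rng)
nf = fitness_vector(nests, lambda x: -abs(x[0] - 0.1))
gt, g_fit = best_nest(nests, nf)
```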
(4-4) Update the nest positions by Lévy flight. Each current nest position is randomly perturbed with formula (3) to obtain a set of new nest positions. Each new position is compared with the old one, and the position with the larger fitness value is kept;
$$x_i^{(t+1)} = x_i^{(t)} + \alpha \oplus L(\lambda) \tag{3}$$
where $\alpha > 0$ is the step-size scaling factor and $\oplus$ denotes entrywise multiplication. $L(\lambda)$ is the Lévy flight, whose step lengths follow the power law $\text{Lévy} \sim u = t^{-\lambda},\ (1<\lambda\le 3)$.
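Formula (3) does not specify how the Lévy steps $L(\lambda)$ are drawn. The sketch below makes the following assumptions, none of which are stated by the patent:
- It uses Mantegna's algorithm with index $\beta = 1.5$. Since $\lambda = 1 + \beta$, this gives $\lambda = 2.5$, inside the stated range.
- It uses $\alpha = 0.01$.
- It scales each step by the distance to the current best nest gt, as in widely circulated reference implementations of cuckoo search.
- It clips the result to the search bounds.

The function names are hypothetical.

```python
import math
import random


def mantegna_sigma(beta):
    """Standard deviation of u in Mantegna's algorithm for a Levy index beta in (0, 2]."""
    num = math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
    den = math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2)
    return (num / den) ** (1 / beta)


def levy_step(d, beta=1.5, rng=random):
    """d heavy-tailed step lengths u / |v|**(1/beta), u ~ N(0, sigma^2), v ~ N(0, 1)."""
    sigma = mantegna_sigma(beta)
    return [rng.gauss(0, sigma) / abs(rng.gauss(0, 1)) ** (1 / beta) for _ in range(d)]


def levy_update(x, gt, lb, ub, alpha=0.01, beta=1.5, rng=random):
    """Formula (3)-style move x + alpha * L * (x - gt), clipped to [lb, ub].

    Scaling by (x - gt) is an assumption borrowed from common implementations;
    a nest equal to gt therefore does not move.
    """
    step = levy_step(len(x), beta, rng)
    return [min(max(xj + alpha * sj * (xj - gj), lo), hi)
            for xj, gj, sj, lo, hi in zip(x, gt, step, lb, ub)]


rng = random.Random(1)
new_x = levy_update([0.2, 500.0], gt=[0.1, 85.0], lb=[0.01, 10], ub=[0.3, 2000], rng=rng)
```

With this scaling, nests far from gt take larger steps than nests close to it. The heavy tail of the Lévy distribution still produces occasional long jumps.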
(4-5) Abandon a fraction of the worse nests and build new ones. Loop over the nests from the 1st to the $n$-th, drawing a uniformly distributed random number $r \in [0,1]$ in each pass. If $r > P_a$, update the nest position with formula (4); otherwise, leave it unchanged. When the loop ends, a new set of nest positions is obtained;
$$x_{i,j}^{(t+1)} = x_{i,j}^{(t)} + s\,H(P_a-\varepsilon)\,\bigl(x_{l,j}^{(t)} - x_{k,j}^{(t)}\bigr) \tag{4}$$
where $x_{l,j}$ and $x_{k,j}$ belong to two randomly selected solutions and $H(\cdot)$ is the Heaviside function. $P_a$ is the switching parameter that balances local and global random walks, $s$ is the step size, and $\varepsilon$ is a uniformly distributed random number.
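Step (4-5) can be sketched as follows. The sketch follows the text's rule: a nest moves when $r > P_a$, by $s\,(x_l - x_k)$ with $s$ drawn uniformly from [0, 1).

Several choices here are assumptions, not values from the patent:
- The default $P_a = 0.25$, since the patent does not state a value.
- The choice of $s$.
- Clipping the moved nest to the bounds.

The sketch only generates candidate positions. Whether they are then compared with the old nests (as in step (4-4)) is left to the caller.

```python
import random


def abandon_nests(nests, lb, ub, pa=0.25, rng=random):
    """Step (4-5) sketch: nests with r > pa take the biased random walk of formula (4)."""
    n = len(nests)
    new_nests = []
    for x in nests:
        r = rng.random()                                  # uniform r in [0, 1)
        if r > pa:
            l, k = rng.randrange(n), rng.randrange(n)     # two randomly chosen nests
            s = rng.random()                              # step size s (assumed uniform)
            moved = [xj + s * (xl - xk) for xj, xl, xk in zip(x, nests[l], nests[k])]
            new_nests.append([min(max(v, lo), hi) for v, lo, hi in zip(moved, lb, ub)])
        else:
            new_nests.append(list(x))                     # nest left unchanged
    return new_nests


rng = random.Random(2)
nests = [[0.05, 100.0], [0.2, 900.0], [0.25, 1500.0]]
lb, ub = [0.01, 10], [0.3, 2000]
kept = abandon_nests(nests, lb, ub, pa=1.0, rng=rng)    # r > 1.0 never holds
moved = abandon_nests(nests, lb, ub, pa=0.0, rng=rng)
```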
(4-6) Calculate the fitness of each updated nest position and retain the locally best nest pt (i.e., the nest position vector with the largest fitness value among the current nests);
(4-7) Compare the fitness values of pt and gt. If the fitness value of pt is larger than that of gt, update the global best gt;
(4-8) When gt is updated, store both its nest position GXbox and its fitness value Gfmax;
(4-9) Check whether the maximum number of iterations has been reached. If not, return to (4-4) and continue iterating; otherwise, return the global best nest position gt.
Compared with existing XGBoost training methods, the method has the following beneficial effects:
(1) when optimizing multimodal functions, the proposed XGBoost training method based on the cuckoo search algorithm outperforms the existing PSO-based and GA-based XGBoost training methods;
(2) the method maintains an effective balance between local search and diversity (randomness);
(3) the method has only two control parameters, which makes the algorithm simpler and more general.
Drawings
Fig. 1 is a schematic flow diagram of XGBoost optimized by the cuckoo search algorithm in the embodiment.
FIG. 2 shows the ranking of the feature importance scores in the embodiment.
Detailed Description
Embodiments of the present invention will be described in detail below with reference to the accompanying drawings.
Examples
A novel XGboost training method based on cuckoo search algorithm comprises the following specific processes:
1. Data set preprocessing
The employee data set HR_comma_sep, from human resource management, is obtained from the Kaggle website. It contains 14999 employee records with 10 attributes and no missing values; the attributes are detailed in Table 1. The attribute left is the class label indicating whether the employee left (1 = left, 0 = stayed) and is denoted y. The first 9 attributes are denoted X, and X is normalized with the min-max method.
TABLE 1 Attribute details of the employee data set
Attribute             Meaning                                   ID     Maximum  Minimum
satisfaction_level    Satisfaction level                        f0     1.00     0.00
last_evaluation       Performance evaluation                    f1     1.00     0.36
number_project        Number of projects completed              f2     7.00     2.00
average_montly_hours  Average monthly working hours             f3     310.00   96.00
time_spend_company    Time spent at the company                 f4     10.00    2.00
work_accident         Whether the employee had a work accident  f5     1.00     0.00
promotion             Whether promoted in the past 5 years      f6     1.00     0.00
department            Department                                f7     9.00     0.00
Salary                Salary level                              f8     2.00     0.00
left                  Whether the employee left                 class  1        0
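Min-max normalization maps each value $x$ of a column to $(x - \min)/(\max - \min)$. Below is a dependency-free sketch of that step. The function names are hypothetical; scikit-learn's `MinMaxScaler` is the usual library equivalent. The usage line uses three values spanning the Table 1 range of `average_montly_hours` (f3).

```python
def min_max_scale(column):
    """Scale one attribute column to [0, 1] with (x - min) / (max - min)."""
    lo, hi = min(column), max(column)
    span = hi - lo
    if span == 0:
        # Constant column: every value maps to 0.0 (avoids division by zero)
        return [0.0 for _ in column]
    return [(x - lo) / span for x in column]


def min_max_scale_rows(rows):
    """Apply min_max_scale to every column of a row-major table (list of samples)."""
    columns = [min_max_scale(list(col)) for col in zip(*rows)]
    return [list(sample) for sample in zip(*columns)]


# Values spanning the Table 1 range of average_montly_hours (f3): 96 to 310
print(min_max_scale([96, 203, 310]))  # [0.0, 0.5, 1.0]
```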
2. Screening the data set with random forest feature selection
A feature selection method is used to screen the original data set. Reducing the dimensionality improves computational efficiency, and deleting redundant or irrelevant attributes improves the prediction accuracy of the model. The data set is screened with the random forest feature selection algorithm as follows:
Step one: divide the data set (X, y) into a training set (X_train, y_train) and a test set (X_test, y_test) at a ratio of 7:3;
Step two: train the random forest classifier rf_model on the training set, read the importance score of each feature from rf_model.feature_importances_, and sort the scores in descending order, as shown in FIG. 2;
Step three: set the importance score threshold thresh to 0.004383. Use the SelectFromModel function to retain the features whose scores reach thresh, and use its transform(X) function to convert the original samples X into new samples X. The retained features are f0, f4, f2, f3, f1, f7 and f8; features f5 and f6 are deleted.
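The ranking-and-threshold logic of steps two and three can be sketched without scikit-learn. With scikit-learn, the scores would come from `rf_model.feature_importances_`, and `SelectFromModel(rf_model, threshold=thresh, prefit=True)` would apply the cut, keeping scores greater than or equal to the threshold. The importance scores below are invented for illustration and are not the values behind FIG. 2. The function name is hypothetical.

```python
def select_features(importances, thresh):
    """Rank features by importance (descending) and split them at thresh.

    Scores at or above thresh are kept, matching SelectFromModel's convention.
    Returns (ranking, kept, dropped).
    """
    ranking = sorted(importances.items(), key=lambda kv: kv[1], reverse=True)
    kept = [name for name, score in ranking if score >= thresh]
    dropped = [name for name, score in ranking if score < thresh]
    return ranking, kept, dropped


# Made-up scores for illustration only (not read from FIG. 2)
scores = {"f0": 0.35, "f4": 0.20, "f2": 0.17, "f3": 0.14, "f1": 0.12,
          "f7": 0.015, "f8": 0.006, "f5": 0.003, "f6": 0.001}
ranking, kept, dropped = select_features(scores, thresh=0.004383)
print(kept)     # ['f0', 'f4', 'f2', 'f3', 'f1', 'f7', 'f8']
print(dropped)  # ['f5', 'f6']
```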
3. Data set partitioning
The dimensionality-reduced data set (X, y) is divided into a training set (X_train, y_train) and a test set (X_test, y_test) at a ratio of 7:3.
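Below is a minimal sketch of a shuffled 7:3 split. The function name and the seed value are assumptions. The source does not say whether the split is stratified, so this sketch does not stratify. scikit-learn's `train_test_split(X, y, test_size=0.3, random_state=...)` is the usual library call for the same operation.

```python
import random


def split_7_3(X, y, test_ratio=0.3, seed=42):
    """Shuffle the sample indices and split (X, y) into training and test parts."""
    if len(X) != len(y):
        raise ValueError("X and y must contain the same number of samples")
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    n_test = round(len(idx) * test_ratio)  # 30 % of the samples go to the test set
    test_idx, train_idx = idx[:n_test], idx[n_test:]
    X_train = [X[i] for i in train_idx]
    X_test = [X[i] for i in test_idx]
    y_train = [y[i] for i in train_idx]
    y_test = [y[i] for i in test_idx]
    return X_train, X_test, y_train, y_test


X = [[i] for i in range(14999)]  # placeholder samples, as many as in HR_comma_sep
y = [i % 2 for i in range(14999)]
X_train, X_test, y_train, y_test = split_7_3(X, y)
print(len(X_train), len(X_test))  # 10499 4500
```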
4. Training XGBoost with the training method based on the cuckoo search algorithm
XGBoost has many hyperparameters. To further improve the prediction accuracy of the model, the cuckoo search algorithm is used to search for its optimal parameter set. Referring to Fig. 1, training XGBoost with the cuckoo-search-based training method proceeds as follows:
Step one: set the nest population size n to 25 and the dimension d to 9, and determine the discovery probability $P_a$. The upper and lower bounds of the nest search space are given in Table 2, the maximum number of iterations MaxN is 100, and the nest population is represented by the matrix in formula (1);
Step two: randomly initialize the nest positions in the search space. Each component is drawn as $x_{*,j}=\mathrm{random}(lb_j, ub_j)$, where $ub_j$ and $lb_j$ are the upper and lower search bounds of the $j$-th hyperparameter to be optimized, and random() returns a random number in the interval $[lb_j, ub_j]$;
Step three: calculate the fitness value of each nest with the fitness function, and retain the best nest gt (i.e., the nest position vector with the largest fitness value). The fitness of a nest is computed as follows:
1. set the XGBoost parameters to the nest's position values in Nest;
2. train the model on the training set;
3. feed the test set into the trained model and compute its classification accuracy.
Step four: update the nest positions by Lévy flight. Randomly perturb each current nest position with formula (3) to obtain a set of new nest positions, compare them with the old positions, and keep the positions with the larger fitness values;
Step five: abandon a fraction of the worse nests and build new ones. Loop over the nests from the 1st to the n-th, drawing a uniformly distributed random number $r \in [0,1]$ in each pass. If $r > P_a$, update the nest position with formula (4); otherwise, leave it unchanged. When the loop ends, a new set of nest positions is obtained;
Step six: calculate the fitness of each updated nest position and retain the locally best nest pt (i.e., the nest position vector with the largest fitness value among the current nests);
Step seven: compare the fitness value of pt with that of gt, and update the global best gt if the fitness value of pt is larger;
Step eight: check whether the maximum number of iterations has been reached. If not, return to step four and continue iterating; otherwise, return the global best nest position gt.
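Steps two to eight can be combined into one compact loop. The sketch below is self-contained and maximizes a toy objective in place of the XGBoost accuracy. It uses n = 25 and max_iter = 100 as in the embodiment.

It rests on several assumptions:
- $P_a = 0.25$, because the source does not give the value.
- Mantegna's algorithm with $\beta = 1.5$ for the Lévy steps.
- $\alpha = 0.01$, with steps scaled by the distance to gt.
- Greedy acceptance in step five, which the text does not state explicitly.
- Clipping to the bounds.

The function name `cuckoo_search` is hypothetical.

```python
import math
import random


def cuckoo_search(fitness, lb, ub, n=25, pa=0.25, max_iter=100,
                  alpha=0.01, beta=1.5, seed=0):
    """Maximize `fitness` over the box [lb, ub] with a basic cuckoo search.

    Returns (gt, g_fit, history): best position, its fitness, and the best
    fitness recorded after every iteration.
    """
    rng = random.Random(seed)
    d = len(lb)

    def clip(v, j):
        return min(max(v, lb[j]), ub[j])

    # Mantegna's sigma for Levy steps with index beta (lambda = 1 + beta)
    sigma = (math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
             / (math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))) ** (1 / beta)

    # Steps two and three: random nests, their fitness, and the global best gt
    nests = [[rng.uniform(lb[j], ub[j]) for j in range(d)] for _ in range(n)]
    nf = [fitness(x) for x in nests]
    g = max(range(n), key=nf.__getitem__)
    gt, g_fit = list(nests[g]), nf[g]
    history = []

    for _ in range(max_iter):
        # Step four: Levy flight for every nest, keep the fitter position
        for i in range(n):
            cand = [clip(nests[i][j] + alpha
                         * rng.gauss(0, sigma) / abs(rng.gauss(0, 1)) ** (1 / beta)
                         * (nests[i][j] - gt[j]), j) for j in range(d)]
            f = fitness(cand)
            if f > nf[i]:
                nests[i], nf[i] = cand, f
        # Step five: nests with r > pa take the biased random walk of formula (4)
        for i in range(n):
            if rng.random() > pa:
                l, k = rng.randrange(n), rng.randrange(n)
                s = rng.random()
                cand = [clip(nests[i][j] + s * (nests[l][j] - nests[k][j]), j)
                        for j in range(d)]
                f = fitness(cand)
                if f > nf[i]:  # greedy acceptance (assumption)
                    nests[i], nf[i] = cand, f
        # Steps six and seven: local best pt, then update the global best gt
        p = max(range(n), key=nf.__getitem__)
        if nf[p] > g_fit:
            gt, g_fit = list(nests[p]), nf[p]
        history.append(g_fit)
    return gt, g_fit, history


gt, g_fit, history = cuckoo_search(lambda x: -sum(v * v for v in x),
                                   lb=[-5.0] * 3, ub=[5.0] * 3)
print(g_fit)
```

In the patent's setting, `fitness` would decode a nest into XGBoost hyperparameters, train on the training set and return the test accuracy, with `lb` and `ub` taken from Table 2. Because gt is only replaced by a fitter nest, the recorded best fitness never decreases.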
TABLE 2 Upper and lower bounds of the parameters
Parameter           Search range
learning_rate       [0.01, 0.3]
n_estimators        [10, 2000]
max_depth           [1, 15]
min_child_weight    [0, 10]
gamma               [0.01, 10.0]
subsample           [0.01, 1.0]
colsample_bytree    [0.01, 1.0]
reg_alpha           [0.01, 1.0]
reg_lambda          [0.01, 1.0]
TABLE 3 Optimal parameter set
Parameter           Optimal value
learning_rate       0.1457
n_estimators        85
max_depth           15
min_child_weight    0.019
gamma               0.0113
subsample           0.86916
colsample_bytree    1.0
reg_alpha           0.7277
reg_lambda          0.2664
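The patent does not say how a continuous nest position is mapped to XGBoost hyperparameters. One plausible sketch is below; it assumes the nest dimensions follow the Table 2 order and that n_estimators and max_depth are rounded to integers. The keyword names match parameters of the `XGBClassifier` scikit-learn wrapper in the xgboost package, so the dict could be passed as `XGBClassifier(**params)`. That call is not executed here, and `decode_nest` is a hypothetical name.

```python
# Table 2 search ranges, in an assumed order of the 9 nest dimensions
SEARCH_SPACE = [
    ("learning_rate", 0.01, 0.3),
    ("n_estimators", 10, 2000),
    ("max_depth", 1, 15),
    ("min_child_weight", 0, 10),
    ("gamma", 0.01, 10.0),
    ("subsample", 0.01, 1.0),
    ("colsample_bytree", 0.01, 1.0),
    ("reg_alpha", 0.01, 1.0),
    ("reg_lambda", 0.01, 1.0),
]
INTEGER_PARAMS = {"n_estimators", "max_depth"}  # assumed to be rounded to integers
LB = [lo for _, lo, _ in SEARCH_SPACE]
UB = [hi for _, _, hi in SEARCH_SPACE]


def decode_nest(position):
    """Turn a 9-dimensional nest position into XGBoost keyword arguments."""
    if len(position) != len(SEARCH_SPACE):
        raise ValueError("expected one value per Table 2 hyperparameter")
    params = {}
    for value, (name, lo, hi) in zip(position, SEARCH_SPACE):
        value = min(max(value, lo), hi)  # stay inside the Table 2 range
        params[name] = int(round(value)) if name in INTEGER_PARAMS else float(value)
    return params


# The Table 3 values, read as a nest position, decode unchanged
table3 = decode_nest([0.1457, 85.0, 15.0, 0.019, 0.0113, 0.86916, 1.0, 0.7277, 0.2664])
print(table3["n_estimators"], table3["max_depth"])  # 85 15
```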
5. Training the optimized XGBoost model and evaluating it
The training set is used to train the optimized XGBoost model. The trained classifier is evaluated with four indexes, Accuracy, Precision, Recall and F1, computed as follows:
$$\text{Accuracy}=\frac{TP+TN}{TP+TN+FP+FN}$$
$$\text{Precision}=\frac{TP}{TP+FP}$$
$$\text{Recall}=\frac{TP}{TP+FN}$$
$$F1=\frac{2\times\text{Precision}\times\text{Recall}}{\text{Precision}+\text{Recall}}$$
where:
- TP is the number of employees who left and were correctly predicted as leaving;
- FP is the number who stayed but were incorrectly predicted as leaving;
- TN is the number who stayed and were correctly predicted as staying;
- FN is the number who left but were incorrectly predicted as staying.
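The four indexes can be computed directly from the confusion counts, with label 1 (left) as the positive class. The sketch below uses hypothetical function names and returns 0.0 for a ratio whose denominator is zero, which is an assumed convention. The toy labels give TP = 2, FP = 1, TN = 2 and FN = 1.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (TP, FP, TN, FN), treating `positive` (1 = left) as the positive class."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive:
            if t == positive:
                tp += 1
            else:
                fp += 1
        elif t == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, Precision, Recall and F1 (0.0 when a denominator is zero)."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"Accuracy": accuracy, "Precision": precision, "Recall": recall, "F1": f1}


# Toy labels: 1 = left, 0 = stayed
m = classification_metrics([1, 0, 1, 1, 0, 0], [1, 0, 0, 1, 1, 0])
```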
6. Predicting employee turnover
The test set is fed into the trained XGBoost model to obtain the final prediction results.
7. Experimental design
To verify the effectiveness of the proposed method, two groups of comparative experiments are set up.

The first group compares the Accuracy, Precision, Recall and F1 of three models; the results are shown in Table 4. The three models are:
- the original XGBoost model XGB;
- the model RF-XGB, which uses random forest feature screening;
- the proposed model RF-CS-XGB.

The second group compares the proposed RF-CS-XGB with common classifiers that use only the random forest feature selection step; the results are shown in Table 5. These classifiers are:
- random forest (RF-RF);
- logistic regression (RF-LR);
- support vector machine (RF-SVM);
- gradient boosting decision tree (RF-GBDT);
- K-nearest neighbors (RF-KNN).
TABLE 4 Results of the first group of comparative experiments
Model        Accuracy   Precision   Recall    F1
XGB          97.40%     97.17%      91.53%    94.27%
RF-XGB       97.44%     97.27%      91.63%    94.37%
RF-CS-XGB    99.09%     99.22%      96.86%    98.03%
TABLE 5 Results of the second group of comparative experiments
Model        Accuracy   Precision   Recall    F1
RF-RF        99.04%     99.32%      96.57%    97.93%
RF-LR        76.60%     49.83%      27.40%    35.36%
RF-SVM       81.53%     92.31%      22.84%    36.61%
RF-GBDT      97.58%     97.38%      92.10%    94.67%
RF-KNN       95.62%     90.36%      90.96%    90.66%
RF-CS-XGB    99.09%     99.22%      96.86%    98.03%
The above embodiment describes in detail a specific implementation of the XGBoost training method based on the cuckoo search algorithm, applied to employee turnover prediction. It is intended only to help explain the proposed method and its core ideas.

Claims (2)

1. A novel XGBoost training method based on the cuckoo search algorithm, characterized by comprising the following steps:
Step 1: preprocessing the original data set, including normalization and feature dimensionality reduction, and dividing the processed data set into a training set and a test set at a fixed ratio;
Step 2: training the XGBoost with the XGBoost training method based on the cuckoo search algorithm;
Step 3: constructing the XGBoost from the set of parameter values obtained by training;
Step 4: testing the constructed XGBoost on the test set, and comprehensively evaluating the model with four evaluation indexes: Accuracy, Precision, Recall and F1 score.
2. The new XGBoost training method based on cuckoo search algorithm as claimed in claim 1, wherein: the training of the XGBoost by the XGBoost training method based on the cuckoo search algorithm in step 2 specifically comprises:
Step 2-1: determining:
- the nest population size $n$;
- the dimension $d$ of a nest position, i.e., the number of XGBoost parameters to be optimized;
- the discovery probability $P_a$;
- the upper and lower bounds of the nest search space;
- the maximum number of iterations Max_iter.

The classification Accuracy predicted by the XGBoost model is set as the fitness function of a nest. The matrix Nest of nest positions and the corresponding fitness vector NF are represented as follows:
$$\mathrm{Nest}=\begin{bmatrix} x_{1,1} & x_{1,2} & \cdots & x_{1,d} \\ x_{2,1} & x_{2,2} & \cdots & x_{2,d} \\ \vdots & \vdots & \ddots & \vdots \\ x_{n,1} & x_{n,2} & \cdots & x_{n,d} \end{bmatrix}$$
where $x_{i,j}$ denotes the $j$-th dimension of the $i$-th nest, $n$ denotes the number of nests, and $d$ denotes the dimension of a nest, i.e., the number of XGBoost parameters to be optimized;
$$\mathrm{NF}=\begin{bmatrix} f_1 \\ f_2 \\ \vdots \\ f_n \end{bmatrix}$$
where $f_i$ denotes the fitness value of the $i$-th nest and $n$ denotes the number of nests.
Step 2-2: randomly initializing the nest positions in the search space $S=[lb, ub]$. Each component is drawn as $x_{*,j}=\mathrm{random}(lb_j, ub_j)$, where $ub_j$ and $lb_j$ are the upper and lower search bounds of the $j$-th hyperparameter to be optimized, and random() returns a random number in the interval $[lb_j, ub_j]$;
Step 2-3: calculating the fitness value of each nest as the classification Accuracy of the XGBoost, and retaining the best nest gt (i.e., the nest position vector with the largest fitness value);
Step 2-4: updating the nest positions by Lévy flight. Each current nest position is randomly perturbed with the following formula to obtain a set of new nest positions. The new positions are compared with the old ones, and the positions with the larger fitness values are kept;
$$x_i^{(t+1)} = x_i^{(t)} + \alpha \oplus L(\lambda)$$
where $\alpha > 0$ is the step-size scaling factor and $\oplus$ denotes entrywise multiplication. $L(\lambda)$ is the Lévy flight, with $\text{Lévy} \sim u = t^{-\lambda},\ (1<\lambda\le 3)$.
Step 2-5: abandoning a fraction of the worse nests and building new ones. The nests are looped over from the 1st to the $n$-th, and a uniformly distributed random number $r \in [0,1]$ is drawn in each pass. If $r > P_a$, the nest position is updated with the following formula; otherwise, it is left unchanged. When the loop ends, a new set of nest positions is obtained;
$$x_{i,j}^{(t+1)} = x_{i,j}^{(t)} + s\,H(P_a-\varepsilon)\,\bigl(x_{l,j}^{(t)} - x_{k,j}^{(t)}\bigr)$$
where $x_{l,j}$ and $x_{k,j}$ belong to two randomly selected solutions and $H(\cdot)$ is the Heaviside function. $P_a$ is the switching parameter that balances local and global random walks, $s$ is the step size, and $\varepsilon$ is a uniformly distributed random number;
Step 2-6: calculating the fitness of each updated nest position, and retaining the locally best nest pt (i.e., saving the position of the nest with the largest fitness value among the current nests);
Step 2-7: comparing the fitness value of pt with that of gt, and updating the global best gt if the fitness value of pt is greater;
Step 2-8: judging whether the maximum number of iterations has been reached. If not, returning to step 2-4 to continue iterating; otherwise, returning the global best nest position gt.
CN202210236632.2A 2022-03-11 2022-03-11 XGboost training method based on cuckoo search algorithm Pending CN114611719A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210236632.2A CN114611719A (en) 2022-03-11 2022-03-11 XGboost training method based on cuckoo search algorithm

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210236632.2A CN114611719A (en) 2022-03-11 2022-03-11 XGboost training method based on cuckoo search algorithm

Publications (1)

Publication Number Publication Date
CN114611719A true CN114611719A (en) 2022-06-10

Family

ID=81862317

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210236632.2A Pending CN114611719A (en) 2022-03-11 2022-03-11 XGboost training method based on cuckoo search algorithm

Country Status (1)

Country Link
CN (1) CN114611719A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115406882A (en) * 2022-10-31 2022-11-29 常州安控电器成套设备有限公司 GBDT and improved MFO-based water quality pollutant detection method



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination