CN111639695B - Method and system for classifying data based on improved drosophila optimization algorithm - Google Patents

Method and system for classifying data based on improved drosophila optimization algorithm

Info

Publication number
CN111639695B
CN111639695B
Authority
CN
China
Prior art keywords
data
drosophila
optimal
pos
max
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010454464.5A
Other languages
Chinese (zh)
Other versions
CN111639695A (en)
Inventor
汪鹏君
范毅
陈慧灵
施一剑
管晓春
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wenzhou University
Original Assignee
Wenzhou University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wenzhou University filed Critical Wenzhou University
Priority to CN202010454464.5A priority Critical patent/CN111639695B/en
Publication of CN111639695A publication Critical patent/CN111639695A/en
Application granted granted Critical
Publication of CN111639695B publication Critical patent/CN111639695B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2411 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/004 Artificial life, i.e. computing arrangements simulating life
    • G06N3/006 Artificial life, i.e. computing arrangements simulating life based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO]

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The embodiment of the invention discloses a method and a system for classifying data based on an improved drosophila optimization algorithm. The method comprises the following steps: (1) preprocessing the acquired historical data; (2) using the preprocessed historical data as training samples and applying the improved drosophila optimization algorithm to optimize the parameters (penalty coefficient and kernel width) of a prediction model (a support vector machine); (3) constructing the prediction model from the parameters obtained by the algorithm optimization; (4) classifying the data to be detected with the newly constructed prediction model. By implementing the method, problems of the drosophila optimization algorithm such as becoming trapped in local optima and insufficient convergence accuracy can be alleviated, problems in specific fields can be classified and predicted, and the decision accuracy of the prediction model is improved.

Description

Method and system for classifying data based on improved drosophila optimization algorithm
Technical Field
The invention relates to the technical field of big data, in particular to a method and a system for classifying data based on an improved drosophila optimization algorithm.
Background
With the development of technology, the fields in which big data are applied keep broadening, which poses new challenges for tasks such as big data classification and prediction; swarm intelligence optimization algorithms in particular are increasingly used for big data classification and prediction.
Swarm intelligence optimization algorithms achieve optimization by simulating the collective intelligent behaviors exhibited by various biological and non-biological systems in nature and by exploiting cooperation and communication among the individuals of a swarm. Well-known examples include the ant colony algorithm, the particle swarm algorithm, and the artificial bee colony algorithm.
Pan et al. proposed the fruit fly (Drosophila) optimization algorithm (FOA) in 2012, inspired by the foraging behavior of fruit flies: a fruit fly first approaches a food source by smell and then determines its exact location by vision. The random search mechanism has a strong influence on the optimization efficiency of FOA, and the algorithm is prone to becoming trapped in local optima. In view of these problems, the drosophila algorithm is improved here by introducing a new search mechanism combined with a mixed-distribution strategy, so as to improve the accuracy of the prediction model.
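For orientation, the classical FOA search loop can be sketched as follows (Python). This is only an illustrative reconstruction of the well-known smell/vision two-phase search, not the improved algorithm of the invention; all function and parameter names are chosen for the example.

```python
import numpy as np

def basic_foa(objective, n_flies=20, n_iter=100, init_range=(-10.0, 10.0)):
    """Illustrative sketch of the classical FOA: flies scatter randomly around the swarm
    location (smell phase), and the swarm then flies to the best-smelling fly (vision phase)."""
    rng = np.random.default_rng()
    x_axis = rng.uniform(*init_range)          # swarm location the flies start from
    y_axis = rng.uniform(*init_range)
    best_smell, best_s = np.inf, None

    for _ in range(n_iter):
        # Smell phase: each fly takes a random flight around the swarm location
        x = x_axis + rng.uniform(-1, 1, n_flies)
        y = y_axis + rng.uniform(-1, 1, n_flies)
        dist = np.sqrt(x ** 2 + y ** 2)        # distance to the origin
        s = 1.0 / dist                         # smell concentration judgment value
        smell = np.array([objective(si) for si in s])

        # Vision phase: the swarm flies toward the fly with the best smell concentration
        i = int(np.argmin(smell))
        if smell[i] < best_smell:
            best_smell, best_s = smell[i], s[i]
            x_axis, y_axis = x[i], y[i]

    return best_s, best_smell

# Example: minimise (s - 0.5)^2; the search should drive s toward 0.5
print(basic_foa(lambda s: (s - 0.5) ** 2))
```

Because every fly is generated around a single swarm location, the search concentrates quickly and can stall in a local optimum, which is the weakness the improved algorithm below addresses.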
Disclosure of Invention
The embodiment of the invention aims to provide a method and a system for classifying data based on an improved drosophila optimization algorithm, which can alleviate problems of the drosophila optimization algorithm such as convergence to local optima and slow convergence, realize classification and prediction for problems in specific fields, and improve decision accuracy.
In order to solve the above technical problems, the embodiment of the present invention provides a method for optimizing prediction model parameters with an improved drosophila optimization algorithm and thereby classifying data, comprising the following steps:
step S1: preprocessing the acquired historical data;
step S2: using the preprocessed historical data as training samples and applying the improved drosophila optimization algorithm to optimize the parameters (penalty coefficient and kernel width) of the prediction model (a support vector machine);
step S3: constructing the prediction model from the parameters obtained by the algorithm optimization;
step S4: classifying the data to be detected with the newly constructed prediction model.
The step S2 specifically includes:
step S2.1: initializing parameters: the maximum number of iterations T, the drosophila population size N, the definition range [C_min, C_max] of the penalty coefficient C, and the definition range [γ_min, γ_max] of the kernel width γ;
step S2.2: randomly generating the initial positions of the drosophila population, mapping the population into the defined ranges with formulas (1)-(5) to obtain the initial positions S_i = (S_{i,1}, S_{i,2}), (i = 1, 2, …, N; j = 1, 2);
X_{i,1} = C_min + w×(C_max - C_min) (1)
Y_{i,1} = C_min + w×(C_max - C_min) (2)
X_{i,2} = γ_min + w×(γ_max - γ_min) (3)
Y_{i,2} = γ_min + w×(γ_max - γ_min) (4)
where w is a random number uniformly distributed in [0, 1];
step S2.3: saving the current optimal population position and its fitness value: taking the preprocessed historical data as training samples of the prediction model, calculating the fitness values F of all current fruit fly positions, and saving the best fitness value F_best together with the corresponding position (X_pos, Y_pos, S_pos); the accuracy ACC of the model is verified with an internal K-fold cross-validation strategy;
step S2.4: iteratively searching and updating the population positions: updating the positions of the next-generation drosophila with formulas (6)-(9) according to the current optimal population position, calculating the new population positions with formulas (5) and (10), deciding with formula (11) whether mutation in the manner of formula (12) occurs, and finally calculating the fitness values; if the best fitness value is better than that of the previous generation, the optimal position is replaced and updated, otherwise it is kept;
vx_{i,j}^{t+1} = vx_{i,j}^t + (X_pos(j) - X_{1,j}^t)×(-2×M) (6)
vy_{i,j}^{t+1} = vy_{i,j}^t + (Y_pos(j) - Y_{1,j}^t)×(-2×M) (7)
X_{i,j}^{t+1} = X_{1,j}^t + vx_{i,j}^{t+1} (8)
Y_{i,j}^{t+1} = Y_{1,j}^t + vy_{i,j}^{t+1} (9)
S_{i,j}^{t+1} = S_{i,j}^{t+1}×(0.1 + 0.9×(L×Gauss + (1-L)×Cauchy)) (10)
r_i = r_i×(1 - exp((-0.9)×t)) (11)
S_{i,j}^{t+1} = S_pos(k) + 0.9×A_i×(2×B - 1) (12)
where L is an adaptive dynamic variable related to the number of iterations, B and M are random values drawn from a uniform distribution, k is an integer drawn from a uniform distribution, t is the current iteration number, Gauss is a random value drawn from a Gaussian distribution, and Cauchy is a random value drawn from a Cauchy distribution;
step S2.5: judging whether the maximum number of iterations T has been reached; if so, outputting the optimal drosophila position to obtain the optimal penalty coefficient C and kernel width γ; otherwise, returning to step S2.3.
The prediction model in step S3 is realized by the formula f(x_j) = sgn(Σ_i a_i y_i K(x_i, x_j) + b), where x_j is the sample to be tested, x_i is a training sample, y_i is the class label of training sample x_i, b is a preset threshold, a_i is a Lagrangian coefficient, and K(x_i, x_j) = exp(-γ‖x_i - x_j‖²).
The embodiment of the invention also provides another method for optimizing prediction model parameters with an improved drosophila optimization algorithm so as to classify data, which comprises the following steps:
step S001: preprocessing the acquired historical data;
step S002: determining the drosophila optimization algorithm and initializing the necessary parameters: the maximum number of iterations T, the drosophila population size N, the definition range [C_min, C_max] of the penalty coefficient C, and the definition range [γ_min, γ_max] of the kernel width γ;
step S003: randomly generating the initial positions of the drosophila population, mapping the population into the defined ranges with formulas (a)-(e) to obtain the initial positions S_i = (S_{i,1}, S_{i,2}), (i = 1, 2, …, N; j = 1, 2);
X_{i,1} = C_min + w×(C_max - C_min) (a)
Y_{i,1} = C_min + w×(C_max - C_min) (b)
X_{i,2} = γ_min + w×(γ_max - γ_min) (c)
Y_{i,2} = γ_min + w×(γ_max - γ_min) (d)
where w is a random number uniformly distributed in [0, 1];
step S004: taking the preprocessed historical data as training samples of the prediction model, calculating the fitness values F of all current fruit fly positions, and saving the best fitness value F_best together with the corresponding position (X_pos, Y_pos, S_pos); the accuracy ACC of the model is verified with an internal K-fold cross-validation strategy;
step S005: updating the positions of the next-generation drosophila with formulas (f)-(i) according to the current optimal population position, calculating the new population positions with formulas (e) and (j), deciding with formula (k) whether mutation in the manner of formula (l) occurs, and finally calculating the fitness values; if the best fitness value is better than that of the previous generation, the optimal position is replaced and updated, otherwise it is kept;
vx_{i,j}^{t+1} = vx_{i,j}^t + (X_pos(j) - X_{1,j}^t)×(-2×M) (f)
vy_{i,j}^{t+1} = vy_{i,j}^t + (Y_pos(j) - Y_{1,j}^t)×(-2×M) (g)
X_{i,j}^{t+1} = X_{1,j}^t + vx_{i,j}^{t+1} (h)
Y_{i,j}^{t+1} = Y_{1,j}^t + vy_{i,j}^{t+1} (i)
S_{i,j}^{t+1} = S_{i,j}^{t+1}×(0.1 + 0.9×(L×Gauss + (1-L)×Cauchy)) (j)
r_i = r_i×(1 - exp((-0.9)×t)) (k)
S_{i,j}^{t+1} = S_pos(k) + 0.9×A_i×(2×B - 1) (l)
where L is an adaptive dynamic variable related to the number of iterations, B and M are random values drawn from a uniform distribution, k is an integer drawn from a uniform distribution, t is the current iteration number, Gauss is a random value drawn from a Gaussian distribution, and Cauchy is a random value drawn from a Cauchy distribution;
step S006: judging whether the maximum number of iterations T has been reached; if so, outputting the optimal drosophila position to obtain the optimal penalty coefficient C and kernel width γ; otherwise, returning to step S004;
step S007: constructing the prediction model from the parameters obtained by the algorithm optimization;
step S008: classifying the data to be detected with the newly constructed prediction model.
The prediction model in step S007 is realized by the formula f(x_j) = sgn(Σ_i a_i y_i K(x_i, x_j) + b), where x_j is the sample to be tested, x_i is a training sample, y_i is the class label of training sample x_i, b is a preset threshold, a_i is a Lagrangian coefficient, and K(x_i, x_j) = exp(-γ‖x_i - x_j‖²).
The embodiment of the invention also provides a system for classifying data by optimizing prediction model parameters with an improved drosophila optimization algorithm, which comprises:
a data acquisition and processing unit: for preprocessing the acquired historical data;
a model parameter improvement unit: for using the preprocessed historical data as training samples and applying the improved drosophila optimization algorithm to optimize the parameters (penalty coefficient and kernel width) of the prediction model (a support vector machine);
a model reconstruction unit: for constructing the prediction model from the parameters obtained by the algorithm optimization;
a data classification prediction unit: for classifying the data to be detected with the newly constructed prediction model.
The model parameter improvement unit specifically comprises:
a first initialization module for initializing the necessary parameters, including the maximum number of iterations T, the drosophila population size N, the definition range [C_min, C_max] of the penalty coefficient C, and the definition range [γ_min, γ_max] of the kernel width γ;
a second initialization module for randomly generating the initial positions of the drosophila population, mapping the population into the defined ranges with formulas (1)-(5) to obtain the initial positions S_i = (S_{i,1}, S_{i,2}), (i = 1, 2, …, N; j = 1, 2);
X_{i,1} = C_min + w×(C_max - C_min) (1)
Y_{i,1} = C_min + w×(C_max - C_min) (2)
X_{i,2} = γ_min + w×(γ_max - γ_min) (3)
Y_{i,2} = γ_min + w×(γ_max - γ_min) (4)
where w is a random number uniformly distributed in [0, 1];
a calculation module for taking the preprocessed historical data as training samples of the prediction model, calculating the fitness values F of all current fruit fly positions, and saving the best fitness value F_best together with the corresponding position (X_pos, Y_pos, S_pos); the accuracy ACC of the model is verified with an internal K-fold cross-validation strategy;
an updating module for iteratively searching and updating the population positions: updating the positions of the next-generation drosophila with formulas (6)-(9) according to the current optimal population position, calculating the new population positions with formulas (5) and (10), deciding with formula (11) whether mutation in the manner of formula (12) occurs, and finally calculating the fitness values; if the best fitness value is better than that of the previous generation, the optimal position is replaced and updated, otherwise it is kept;
vx_{i,j}^{t+1} = vx_{i,j}^t + (X_pos(j) - X_{1,j}^t)×(-2×M) (6)
vy_{i,j}^{t+1} = vy_{i,j}^t + (Y_pos(j) - Y_{1,j}^t)×(-2×M) (7)
X_{i,j}^{t+1} = X_{1,j}^t + vx_{i,j}^{t+1} (8)
Y_{i,j}^{t+1} = Y_{1,j}^t + vy_{i,j}^{t+1} (9)
S_{i,j}^{t+1} = S_{i,j}^{t+1}×(0.1 + 0.9×(L×Gauss + (1-L)×Cauchy)) (10)
r_i = r_i×(1 - exp((-0.9)×t)) (11)
S_{i,j}^{t+1} = S_pos(k) + 0.9×A_i×(2×B - 1) (12)
where L is an adaptive dynamic variable related to the number of iterations, B and M are random values drawn from a uniform distribution, k is an integer drawn from a uniform distribution, t is the current iteration number, Gauss is a random value drawn from a Gaussian distribution, and Cauchy is a random value drawn from a Cauchy distribution;
a judging module for judging whether the maximum number of iterations T has been reached;
an output module for outputting the optimal drosophila position to obtain the optimal penalty coefficient C and kernel width γ.
The prediction model is realized by the formula f(x_j) = sgn(Σ_i a_i y_i K(x_i, x_j) + b), where x_j is the sample to be tested, x_i is a training sample, y_i is the class label of training sample x_i, b is a preset threshold, a_i is a Lagrangian coefficient, and K(x_i, x_j) = exp(-γ‖x_i - x_j‖²).
The embodiment of the invention has the following beneficial effects:
the method introduces a new search mechanism and a mixed-distribution strategy into the drosophila optimization algorithm, which improves the search capability of the algorithm and avoids becoming trapped in local optima; the improved algorithm is further used to optimize the parameters of models such as the support vector machine and the kernel extreme learning machine, so that an optimal machine learning model is built and classification and prediction of problems in specific fields are realized.
Drawings
In order to more clearly illustrate the embodiments of the invention or the technical solutions of the prior art, the drawings required for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the invention, and those skilled in the art may obtain other drawings from them without inventive effort.
FIG. 1 is a flow chart of a method for optimizing predictive model parameters to classify data by improving a Drosophila optimization algorithm in accordance with an embodiment of the present invention;
FIG. 2 is a flow chart of another method for optimizing predictive model parameters to classify data provided by an embodiment of the invention that improves the Drosophila optimization algorithm;
fig. 3 is a schematic structural diagram of a system for optimizing prediction model parameters to classify data by improving a drosophila optimization algorithm according to an embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present invention more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
As shown in fig. 1, in an embodiment of the present invention, a method for optimizing prediction model parameters with an improved drosophila optimization algorithm is provided, which includes the following steps:
step S1: preprocessing the acquired historical data;
the specific process is to obtain the historical data related to the problem under study, normalize it with formula (1), and classify it;
wherein the classified attributes include data attributes and category attributes.
For example, for data distinguishing benign and malignant thyroid nodules based on ultrasound features, a single sample's attribute values fall into two major categories: the data attributes X_1-X_8 represent ultrasound attributes of benign and malignant thyroid nodule disease, and X_9 represents the class of the data sample, i.e., it distinguishes benign nodules from malignant nodules; if the sample is a malignant nodule the value is 1, and if the sample is a benign nodule the value is -1.
As another example, for enterprise bankruptcy risk prediction data, a single sample's attribute values are divided into two main classes: the data attributes X_1-X_n are related financial indicators such as the debt ratio and total assets, and X_{n+1} represents the category label, namely whether the enterprise is at risk of bankruptcy within two years; the value is 1 if there is a bankruptcy risk and -1 if there is not.
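A minimal preprocessing sketch along these lines is shown below (Python). It assumes that formula (1) is a min-max normalization to [0, 1], which is a common choice but not confirmed by this text, and that the category attribute is stored in the last column; all names are illustrative.

```python
import numpy as np

def preprocess(raw_data, n_attributes):
    """Sketch of step S1: min-max normalisation of the data attributes and
    +1/-1 encoding of the category attribute (assumed to be the last column)."""
    raw = np.asarray(raw_data, dtype=float)
    X = raw[:, :n_attributes]            # data attributes X_1 ... X_n
    y = raw[:, n_attributes]             # category attribute, already coded as 1 / -1

    # Assumed normalisation formula (1): x' = (x - x_min) / (x_max - x_min)
    x_min, x_max = X.min(axis=0), X.max(axis=0)
    span = np.where(x_max > x_min, x_max - x_min, 1.0)   # guard against constant columns
    return (X - x_min) / span, y
```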
Step S2: the history data after preprocessing is used as training samples and a new improved drosophila optimization algorithm is utilized to optimize parameters (punishment coefficient and kernel width) of a prediction model (support vector machine);
the specific process is that the penalty coefficient (C) and the kernel width (gamma) of a predictive support vector machine are optimized by utilizing an improved drosophila algorithm;
step S2.1: initializing parameters. Maximum iteration number (T), drosophila population number (N), and punishment coefficient (C) min ,C max ]Definition range of core width gamma [ gamma ] minmax ];
Step S2.2: and randomly generating initial positions of the drosophila population. Mapping Drosophila population into defined range by using formulas (2) - (6) to obtain initial position S i,j =(S i,1 ,S i,2 ),(i=1,2,…,N;j=1,2);
X i,1 =C min +w×(C max -C min ) (2)
Y i,1 =C min +w×(C max -C min ) (3)
X i,2 =γ min +w×(γ maxmin ) (4)
Y i,2 =γ min +w×(γ maxmin ) (5)
Where w is a random number subject to uniform distribution between [0,1 ].
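For illustration only, the initialization of step S2.2 might be coded as below (Python). The mapping from the (X, Y) coordinates to the candidate solution S_i (formula (6), not reproduced in this text) is replaced here by a simple midpoint, which is an assumption; all names are illustrative.

```python
import numpy as np

def init_population(N, C_min, C_max, gamma_min, gamma_max, rng=None):
    """Sketch of formulas (2)-(5): place N fruit flies randomly inside the ranges of
    the penalty coefficient C (dimension j=1) and the kernel width gamma (j=2)."""
    rng = rng or np.random.default_rng()
    X = np.empty((N, 2))
    Y = np.empty((N, 2))
    X[:, 0] = C_min + rng.uniform(0, 1, N) * (C_max - C_min)              # formula (2)
    Y[:, 0] = C_min + rng.uniform(0, 1, N) * (C_max - C_min)              # formula (3)
    X[:, 1] = gamma_min + rng.uniform(0, 1, N) * (gamma_max - gamma_min)  # formula (4)
    Y[:, 1] = gamma_min + rng.uniform(0, 1, N) * (gamma_max - gamma_min)  # formula (5)

    # Placeholder for the unshown mapping (formula (6)) from (X, Y) to the candidate
    # solutions S_i = (S_{i,1}, S_{i,2}): here simply the midpoint of X and Y.
    S = 0.5 * (X + Y)
    return X, Y, S
```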
step S2.3: saving the current optimal population position and its fitness value: taking the preprocessed historical data as training samples of the prediction model, calculating the fitness values F of all current fruit fly positions, and saving the best fitness value F_best together with the corresponding position (X_pos, Y_pos, S_pos); the accuracy ACC of the model is verified with an internal K-fold cross-validation strategy.
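A sketch of how such a fitness value might be computed is given below (Python, using scikit-learn). It assumes the fitness is the mean K-fold cross-validated classification accuracy (ACC) of an RBF-kernel support vector machine trained with the candidate (C, γ); function names are illustrative.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

def fitness(position, X_train, y_train, k=5):
    """Sketch of step S2.3: a fruit fly position encodes (C, gamma) of an RBF-kernel SVM,
    and its fitness is the mean K-fold cross-validated accuracy on the training samples."""
    C, gamma = float(position[0]), float(position[1])
    model = SVC(C=C, kernel="rbf", gamma=gamma)
    acc = cross_val_score(model, X_train, y_train, cv=k, scoring="accuracy")
    return float(acc.mean())
```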
step S2.4: iteratively searching and updating the population positions: updating the positions of the next-generation drosophila with formulas (7)-(10) according to the current optimal population position, calculating the new population positions with formulas (6) and (11), deciding with formula (12) whether mutation in the manner of formula (13) occurs, and finally calculating the fitness values; if the best fitness value is better than that of the previous generation, the optimal position is replaced and updated, otherwise it is kept;
vx_{i,j}^{t+1} = vx_{i,j}^t + (X_pos(j) - X_{1,j}^t)×(-2×M) (7)
vy_{i,j}^{t+1} = vy_{i,j}^t + (Y_pos(j) - Y_{1,j}^t)×(-2×M) (8)
X_{i,j}^{t+1} = X_{1,j}^t + vx_{i,j}^{t+1} (9)
Y_{i,j}^{t+1} = Y_{1,j}^t + vy_{i,j}^{t+1} (10)
S_{i,j}^{t+1} = S_{i,j}^{t+1}×(0.1 + 0.9×(L×Gauss + (1-L)×Cauchy)) (11)
r_i = r_i×(1 - exp((-0.9)×t)) (12)
S_{i,j}^{t+1} = S_pos(k) + 0.9×A_i×(2×B - 1) (13)
where L is an adaptive dynamic variable related to the number of iterations, B and M are random values drawn from a uniform distribution, k is an integer drawn from a uniform distribution, t is the current iteration number, Gauss is a random value drawn from a Gaussian distribution, and Cauchy is a random value drawn from a Cauchy distribution;
step S2.5: judging whether the maximum number of iterations T has been reached; if so, outputting the optimal drosophila position to obtain the optimal penalty coefficient C and kernel width γ; otherwise, returning to step S2.3.
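Putting steps S2.2-S2.5 together, one possible reading of the iterative search is sketched below (Python). Several quantities are only partially specified in this text (for example the adaptive variable L, the amplitude A_i, the mutation rate r_i, the index k, and the mapping to S), so the choices made here are assumptions rather than the exact procedure of the invention; the fitness function is the cross-validated ACC sketched above.

```python
import numpy as np

def improved_foa(fitness, N, T, C_min, C_max, gamma_min, gamma_max, seed=None):
    """Sketch of the improved fruit fly search loop (steps S2.2-S2.5); the handling of
    L, A_i, r_i and k below is an assumption, not the patent's exact definition."""
    rng = np.random.default_rng(seed)
    lo = np.array([C_min, gamma_min])
    hi = np.array([C_max, gamma_max])

    # step S2.2: random initial positions inside the defined ranges
    X = lo + rng.uniform(0, 1, (N, 2)) * (hi - lo)
    Y = lo + rng.uniform(0, 1, (N, 2)) * (hi - lo)
    vx = np.zeros((N, 2))
    vy = np.zeros((N, 2))
    S = 0.5 * (X + Y)                     # placeholder for the unshown mapping to S

    # step S2.3: fitness of all positions, keep the best one
    F = np.array([fitness(s) for s in S])
    best = int(np.argmax(F))
    F_best, S_pos = F[best], S[best].copy()
    X_pos, Y_pos = X[best].copy(), Y[best].copy()

    for t in range(1, T + 1):             # steps S2.4-S2.5
        M = rng.uniform(0, 1, (N, 2))
        L_t = np.exp(-t / T)              # assumed adaptive variable tied to the iteration

        # formulas (7)-(10): velocity-like move of every fly toward the best position
        vx = vx + (X_pos - X) * (-2 * M)
        vy = vy + (Y_pos - Y) * (-2 * M)
        X = np.clip(X + vx, lo, hi)
        Y = np.clip(Y + vy, lo, hi)

        # formula (11): hybrid Gaussian/Cauchy perturbation of the candidate solutions
        gauss = rng.normal(0.0, 1.0, (N, 2))
        cauchy = rng.standard_cauchy((N, 2))
        S_new = 0.5 * (X + Y) * (0.1 + 0.9 * (L_t * gauss + (1 - L_t) * cauchy))

        # formulas (12)-(13): occasional mutation around the best position (assumed rate)
        mutate = rng.uniform(0, 1, N) < (1.0 - np.exp(-0.9 * t))
        B = rng.uniform(0, 1, (N, 2))
        A = np.abs(S_pos)                 # assumed amplitude term A_i
        S_new[mutate] = S_pos + 0.9 * A * (2 * B[mutate] - 1)
        S_new = np.clip(S_new, lo, hi)

        # greedy selection: keep the best position found so far
        F_new = np.array([fitness(s) for s in S_new])
        i = int(np.argmax(F_new))
        if F_new[i] > F_best:
            F_best, S_pos = F_new[i], S_new[i].copy()
            X_pos, Y_pos = X[i].copy(), Y[i].copy()
        S = S_new

    return S_pos, F_best                  # optimised (C, gamma) and its fitness value
```

In use, fitness would be a closure over the training data, for example lambda pos: fitness(pos, X_train, y_train) with the cross-validation sketch above, and S_pos would then hold the optimized (C, γ) pair passed on to step S3.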
Step S3: constructing a prediction model according to parameters obtained by algorithm optimization;
the specific process is realized by the formula f(x_j) = sgn(Σ_i a_i y_i K(x_i, x_j) + b), where x_j is the sample to be tested, x_i is a training sample, y_i is the class label of training sample x_i, b is a preset threshold, a_i is a Lagrangian coefficient, and K(x_i, x_j) = exp(-γ‖x_i - x_j‖²).
Step S4: and classifying the data to be detected by adopting the newly constructed prediction model.
As shown in FIG. 2, another method for optimizing prediction model parameters with an improved drosophila optimization algorithm so as to classify data, provided by an embodiment of the invention, comprises the following steps:
step S001: preprocessing the acquired historical data;
the specific process is to obtain the historical data related to the problem under study, normalize it with formula (a), and classify it;
wherein the classified attributes include data attributes and category attributes.
For example, for data distinguishing benign and malignant thyroid nodules based on ultrasound features, a single sample's attribute values fall into two major categories: the data attributes X_1-X_8 represent ultrasound attributes of benign and malignant thyroid nodule disease, and X_9 represents the class of the data sample, i.e., it distinguishes benign nodules from malignant nodules; if the sample is a malignant nodule the value is 1, and if the sample is a benign nodule the value is -1.
As another example, for enterprise bankruptcy risk prediction data, a single sample's attribute values are divided into two main classes: the data attributes X_1-X_n are related financial indicators such as the debt ratio and total assets, and X_{n+1} represents the category label, namely whether the enterprise is at risk of bankruptcy within two years; the value is 1 if there is a bankruptcy risk and -1 if there is not.
step S002: determining the drosophila optimization algorithm and initializing the necessary parameters: the maximum number of iterations T, the drosophila population size N, the definition range [C_min, C_max] of the penalty coefficient C, and the definition range [γ_min, γ_max] of the kernel width γ;
step S003: randomly generating the initial positions of the drosophila population, mapping the population into the defined ranges with formulas (b)-(f) to obtain the initial positions S_i = (S_{i,1}, S_{i,2}), (i = 1, 2, …, N; j = 1, 2);
X_{i,1} = C_min + w×(C_max - C_min) (b)
Y_{i,1} = C_min + w×(C_max - C_min) (c)
X_{i,2} = γ_min + w×(γ_max - γ_min) (d)
Y_{i,2} = γ_min + w×(γ_max - γ_min) (e)
where w is a random number uniformly distributed in [0, 1];
step S004: taking the preprocessed historical data as training samples of the prediction model, calculating the fitness values F of all current fruit fly positions, and saving the best fitness value F_best together with the corresponding position (X_pos, Y_pos, S_pos); the accuracy ACC of the model is verified with an internal K-fold cross-validation strategy;
step S005: iteratively searching and updating the population positions: updating the positions of the next-generation drosophila with formulas (g)-(j) according to the current optimal population position, calculating the new population positions with formulas (f) and (k), deciding with formula (l) whether mutation in the manner of formula (m) occurs, and finally calculating the fitness values; if the best fitness value is better than that of the previous generation, the optimal position is replaced and updated, otherwise it is kept;
vx_{i,j}^{t+1} = vx_{i,j}^t + (X_pos(j) - X_{1,j}^t)×(-2×M) (g)
vy_{i,j}^{t+1} = vy_{i,j}^t + (Y_pos(j) - Y_{1,j}^t)×(-2×M) (h)
X_{i,j}^{t+1} = X_{1,j}^t + vx_{i,j}^{t+1} (i)
Y_{i,j}^{t+1} = Y_{1,j}^t + vy_{i,j}^{t+1} (j)
S_{i,j}^{t+1} = S_{i,j}^{t+1}×(0.1 + 0.9×(L×Gauss + (1-L)×Cauchy)) (k)
r_i = r_i×(1 - exp((-0.9)×t)) (l)
S_{i,j}^{t+1} = S_pos(k) + 0.9×A_i×(2×B - 1) (m)
where L is an adaptive dynamic variable related to the number of iterations, B and M are random values drawn from a uniform distribution, k is an integer drawn from a uniform distribution, t is the current iteration number, Gauss is a random value drawn from a Gaussian distribution, and Cauchy is a random value drawn from a Cauchy distribution;
step S006: judging whether the maximum number of iterations T has been reached; if so, outputting the optimal drosophila position to obtain the optimal penalty coefficient C and kernel width γ; otherwise, returning to step S004;
step S007: constructing the prediction model from the parameters obtained by the algorithm optimization;
step S008: classifying the data to be detected with the newly constructed prediction model.
The prediction model in step S007 is realized by the formula f(x_j) = sgn(Σ_i a_i y_i K(x_i, x_j) + b), where x_j is the sample to be tested, x_i is a training sample, y_i is the class label of training sample x_i, b is a preset threshold, a_i is a Lagrangian coefficient, and K(x_i, x_j) = exp(-γ‖x_i - x_j‖²).
As shown in fig. 3, in an embodiment of the present invention, a system for optimizing prediction model parameters with an improved drosophila optimization algorithm so as to classify data is provided, which comprises:
the data acquisition and processing unit 110: for preprocessing the acquired historical data;
the model parameter improvement unit 120: for using the preprocessed historical data as training samples and applying the improved drosophila optimization algorithm to optimize the parameters (penalty coefficient and kernel width) of the prediction model (a support vector machine);
the model reconstruction unit 130: for constructing the prediction model from the parameters obtained by the algorithm optimization;
the data classification prediction unit 140: for classifying the data to be detected with the newly constructed prediction model.
The model parameter improvement unit 120 specifically includes:
a first initialization module for initializing the necessary parameters, including the maximum number of iterations T, the drosophila population size N, the definition range [C_min, C_max] of the penalty coefficient C, and the definition range [γ_min, γ_max] of the kernel width γ;
a second initialization module for randomly generating the initial positions of the drosophila population, mapping the population into the defined ranges with formulas (1)-(5) to obtain the initial positions S_i = (S_{i,1}, S_{i,2}), (i = 1, 2, …, N; j = 1, 2);
X_{i,1} = C_min + w×(C_max - C_min) (1)
Y_{i,1} = C_min + w×(C_max - C_min) (2)
X_{i,2} = γ_min + w×(γ_max - γ_min) (3)
Y_{i,2} = γ_min + w×(γ_max - γ_min) (4)
where w is a random number uniformly distributed in [0, 1];
a calculation module for taking the preprocessed historical data as training samples of the prediction model, calculating the fitness values F of all current fruit fly positions, and saving the best fitness value F_best together with the corresponding position (X_pos, Y_pos, S_pos); the accuracy ACC of the model is verified with an internal K-fold cross-validation strategy;
an updating module for iteratively searching and updating the population positions: updating the positions of the next-generation drosophila with formulas (6)-(9) according to the current optimal population position, calculating the new population positions with formulas (5) and (10), deciding with formula (11) whether mutation in the manner of formula (12) occurs, and finally calculating the fitness values; if the best fitness value is better than that of the previous generation, the optimal position is replaced and updated, otherwise it is kept;
vx_{i,j}^{t+1} = vx_{i,j}^t + (X_pos(j) - X_{1,j}^t)×(-2×M) (6)
vy_{i,j}^{t+1} = vy_{i,j}^t + (Y_pos(j) - Y_{1,j}^t)×(-2×M) (7)
X_{i,j}^{t+1} = X_{1,j}^t + vx_{i,j}^{t+1} (8)
Y_{i,j}^{t+1} = Y_{1,j}^t + vy_{i,j}^{t+1} (9)
S_{i,j}^{t+1} = S_{i,j}^{t+1}×(0.1 + 0.9×(L×Gauss + (1-L)×Cauchy)) (10)
r_i = r_i×(1 - exp((-0.9)×t)) (11)
S_{i,j}^{t+1} = S_pos(k) + 0.9×A_i×(2×B - 1) (12)
where L is an adaptive dynamic variable related to the number of iterations, B and M are random values drawn from a uniform distribution, k is an integer drawn from a uniform distribution, t is the current iteration number, Gauss is a random value drawn from a Gaussian distribution, and Cauchy is a random value drawn from a Cauchy distribution;
a judging module for judging whether the maximum number of iterations T has been reached;
an output module for outputting the optimal drosophila position to obtain the optimal penalty coefficient C and kernel width γ.
The prediction model is realized by the formula f(x_j) = sgn(Σ_i a_i y_i K(x_i, x_j) + b), where x_j is the sample to be tested, x_i is a training sample, y_i is the class label of training sample x_i, b is a preset threshold, a_i is a Lagrangian coefficient, and K(x_i, x_j) = exp(-γ‖x_i - x_j‖²).
it should be noted that, in the above system embodiment, each included system unit is only divided according to the functional logic, but not limited to the above division, so long as the corresponding function can be implemented; in addition, the specific names of the functional units are also only for distinguishing from each other, and are not used to limit the protection scope of the present invention.
Those of ordinary skill in the art will appreciate that all or a portion of the steps in implementing the methods of the above embodiments may be implemented by a program to instruct related hardware, where the program may be stored in a computer readable storage medium, such as ROM/RAM, a magnetic disk, an optical disk, etc.
The foregoing description of the preferred embodiments of the invention is not intended to be limiting, but rather is intended to cover all modifications, equivalents, and alternatives falling within the spirit and principles of the invention.

Claims (2)

1. A method for classifying data based on an improved drosophila optimization algorithm, the method being performed by a program for instructing associated hardware, said program being stored on a computer readable storage medium, comprising the following steps:
step S1: normalizing and classifying the acquired historical data, wherein the classified attributes comprise data attributes and category attributes;
the historical data are data distinguishing benign and malignant thyroid nodules based on ultrasonic features; the data attribute values are divided into two main categories, namely the data attributes X1-X8 represent ultrasonic attributes of benign and malignant thyroid nodule disease, and X9 represents the category of the data sample, i.e., it distinguishes benign nodules from malignant nodules; if the sample is a malignant nodule the value is 1, and if the sample is a benign nodule the value is -1;
step S2: using the preprocessed historical data as training samples and optimizing the parameters of a prediction model with the improved drosophila optimization algorithm, wherein the prediction model is a support vector machine and the parameters comprise a penalty coefficient and a kernel width;
step S3: constructing the prediction model from the optimized parameters of the support vector machine;
step S4: classifying the sample to be detected with the newly constructed prediction model;
the step S2 specifically includes:
step S2.1: initializing parameters, and defining the maximum number of iterations T, the drosophila population size N, the definition range [C_min, C_max] of the penalty coefficient C, and the definition range [γ_min, γ_max] of the kernel width γ;
step S2.2: randomly generating the initial positions of the drosophila population, and mapping the population into the defined ranges with formulas (1)-(5) to obtain the initial positions S_i = (S_{i,1}, S_{i,2}), i = 1, 2, …, N; j = 1, 2;
X_{i,1} = C_min + w×(C_max - C_min) (1)
Y_{i,1} = C_min + w×(C_max - C_min) (2)
X_{i,2} = γ_min + w×(γ_max - γ_min) (3)
Y_{i,2} = γ_min + w×(γ_max - γ_min) (4)
wherein w is a random number uniformly distributed in [0, 1];
step S2.3: saving the current optimal population position and its fitness value: taking the preprocessed historical data as training samples of the prediction model, calculating the fitness values F of all current drosophila positions, saving the best fitness value F_best and the corresponding position (X_pos, Y_pos, S_pos), and verifying the accuracy ACC of the model with an internal K-fold cross-validation strategy;
step S2.4: iteratively searching and updating the population positions: updating the positions of the next-generation drosophila with formulas (6)-(9) according to the current optimal population position, calculating the new population positions with formulas (5) and (10), deciding with formula (11) whether mutation in the manner of formula (12) occurs, and finally calculating the fitness values; if the best fitness value is better than that of the previous generation, the optimal position is replaced and updated, otherwise it is kept;
vx_{i,j}^{t+1} = vx_{i,j}^t + (X_pos(j) - X_{1,j}^t)×(-2×M) (6)
vy_{i,j}^{t+1} = vy_{i,j}^t + (Y_pos(j) - Y_{1,j}^t)×(-2×M) (7)
X_{i,j}^{t+1} = X_{1,j}^t + vx_{i,j}^{t+1} (8)
Y_{i,j}^{t+1} = Y_{1,j}^t + vy_{i,j}^{t+1} (9)
S_{i,j}^{t+1} = S_{i,j}^{t+1}×(0.1 + 0.9×(L×Gauss + (1-L)×Cauchy)) (10)
r_i = r_i×(1 - exp((-0.9)×t)) (11)
S_{i,j}^{t+1} = S_pos(k) + 0.9×A_i×(2×B - 1) (12)
wherein L is an adaptive dynamic variable related to the number of iterations, B and M are random values drawn from a uniform distribution, k is an integer drawn from a uniform distribution, t is the current iteration number, Gauss is a random value drawn from a Gaussian distribution, and Cauchy is a random value drawn from a Cauchy distribution;
step S2.5: judging whether the maximum number of iterations T has been reached; if so, outputting the optimal drosophila position to obtain the optimal penalty coefficient C and kernel width γ; otherwise, returning to step S2.3;
the prediction model in the step S3 is realized by the formula f(x_j) = sgn(Σ_i a_i y_i K(x_i, x_j) + b), wherein x_j is the sample to be tested, x_i is a training sample, y_i is the class label of training sample x_i, b is a preset threshold, a_i is a Lagrangian coefficient, and K(x_i, x_j) = exp(-γ||x_i - x_j||²).
2. A system for classifying data by optimizing prediction model parameters based on an improved drosophila optimization algorithm, the system being implemented by a program for instructing associated hardware, said program being stored on a computer readable storage medium, comprising:
a data acquisition unit: for acquiring historical data, normalizing it and classifying it, wherein the classified attributes comprise data attributes and category attributes;
the historical data are data distinguishing benign and malignant thyroid nodules based on ultrasonic features; the data attribute values are divided into two main categories, namely the data attributes X1-X8 represent ultrasonic attributes of benign and malignant thyroid nodule disease, and X9 represents the category of the data sample, i.e., it distinguishes benign nodules from malignant nodules; if the sample is a malignant nodule the value is 1, and if the sample is a benign nodule the value is -1;
a model parameter improvement unit: for using the preprocessed historical data as training samples and optimizing the penalty coefficient and kernel width of the support vector machine with the improved drosophila optimization algorithm;
a model reconstruction unit: for constructing a prediction model from the parameters obtained by the algorithm optimization;
a data classification prediction unit: for classifying the sample to be detected with the newly constructed prediction model;
the model parameter improvement unit specifically comprises:
a first initialization module for initializing the necessary parameters, including the maximum number of iterations T, the drosophila population size N, the definition range [C_min, C_max] of the penalty coefficient C, and the definition range [γ_min, γ_max] of the kernel width γ;
a second initialization module for randomly generating the initial positions of the drosophila population, and mapping the population into the defined ranges with formulas (1)-(5) to obtain the initial positions S_i = (S_{i,1}, S_{i,2}), i = 1, 2, …, N; j = 1, 2;
X_{i,1} = C_min + w×(C_max - C_min) (1)
Y_{i,1} = C_min + w×(C_max - C_min) (2)
X_{i,2} = γ_min + w×(γ_max - γ_min) (3)
Y_{i,2} = γ_min + w×(γ_max - γ_min) (4)
wherein w is a random number uniformly distributed in [0, 1];
a calculation module for taking the preprocessed historical data as training samples of the prediction model, calculating the fitness values F of all current fruit fly positions, saving the best fitness value F_best and the corresponding position (X_pos, Y_pos, S_pos), and verifying the accuracy ACC of the model with an internal K-fold cross-validation strategy;
an updating module for iteratively searching and updating the population positions: updating the positions of the next-generation drosophila with formulas (6)-(9) according to the current optimal population position, calculating the new population positions with formulas (5) and (10), deciding with formula (11) whether mutation in the manner of formula (12) occurs, and finally calculating the fitness values; if the best fitness value is better than that of the previous generation, the optimal position is replaced and updated, otherwise it is kept;
vx_{i,j}^{t+1} = vx_{i,j}^t + (X_pos(j) - X_{1,j}^t)×(-2×M) (6)
vy_{i,j}^{t+1} = vy_{i,j}^t + (Y_pos(j) - Y_{1,j}^t)×(-2×M) (7)
X_{i,j}^{t+1} = X_{1,j}^t + vx_{i,j}^{t+1} (8)
Y_{i,j}^{t+1} = Y_{1,j}^t + vy_{i,j}^{t+1} (9)
S_{i,j}^{t+1} = S_{i,j}^{t+1}×(0.1 + 0.9×(L×Gauss + (1-L)×Cauchy)) (10)
r_i = r_i×(1 - exp((-0.9)×t)) (11)
S_{i,j}^{t+1} = S_pos(k) + 0.9×A_i×(2×B - 1) (12)
wherein L is an adaptive dynamic variable related to the number of iterations, B and M are random values drawn from a uniform distribution, k is an integer drawn from a uniform distribution, t is the current iteration number, Gauss is a random value drawn from a Gaussian distribution, and Cauchy is a random value drawn from a Cauchy distribution;
a judging module for judging whether the maximum number of iterations T has been reached;
an output module for outputting the optimal drosophila position to obtain the optimal penalty coefficient C and kernel width γ;
the prediction model is realized by the formula f(x_j) = sgn(Σ_i a_i y_i K(x_i, x_j) + b), wherein x_j is the sample to be tested, x_i is a training sample, y_i is the class label of training sample x_i, b is a preset threshold, a_i is a Lagrangian coefficient, and K(x_i, x_j) = exp(-γ||x_i - x_j||²).
CN202010454464.5A 2020-05-26 2020-05-26 Method and system for classifying data based on improved drosophila optimization algorithm Active CN111639695B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010454464.5A CN111639695B (en) 2020-05-26 2020-05-26 Method and system for classifying data based on improved drosophila optimization algorithm

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010454464.5A CN111639695B (en) 2020-05-26 2020-05-26 Method and system for classifying data based on improved drosophila optimization algorithm

Publications (2)

Publication Number Publication Date
CN111639695A CN111639695A (en) 2020-09-08
CN111639695B true CN111639695B (en) 2024-02-20

Family

ID=72329361

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010454464.5A Active CN111639695B (en) 2020-05-26 2020-05-26 Method and system for classifying data based on improved drosophila optimization algorithm

Country Status (1)

Country Link
CN (1) CN111639695B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113222172A (en) * 2021-04-21 2021-08-06 深圳供电局有限公司 Electric shock early warning method and device based on fruit fly algorithm
CN114154679B (en) * 2021-10-22 2024-01-26 南京华盾电力信息安全测评有限公司 Spark-based PCFOA-KELM wind power prediction method and device
CN116842454B (en) * 2023-06-06 2024-04-30 南京财经大学 Financial asset classification method and system based on support vector machine algorithm

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103544548A (en) * 2013-10-31 2014-01-29 辽宁工程技术大学 Method for predicting height of mine water flowing fractured zone
CN106679880A (en) * 2016-12-21 2017-05-17 华南理工大学 Pressure sensor temperature compensating method based on FOA-optimized SOM-RBF
CN107908688A (en) * 2017-10-31 2018-04-13 温州大学 A kind of data classification Forecasting Methodology and system based on improvement grey wolf optimization algorithm
WO2018072351A1 (en) * 2016-10-20 2018-04-26 北京工业大学 Method for optimizing support vector machine on basis of particle swarm optimization algorithm
CN109116833A (en) * 2018-08-31 2019-01-01 重庆邮电大学 Based on improvement drosophila-bat algorithm mechanical failure diagnostic method
CN109948675A (en) * 2019-03-05 2019-06-28 温州大学 The method for constructing prediction model based on outpost's mechanism drosophila optimization algorithm on multiple populations

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2017197626A1 (en) * 2016-05-19 2017-11-23 江南大学 Extreme learning machine method for improving artificial bee colony optimization

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103544548A (en) * 2013-10-31 2014-01-29 辽宁工程技术大学 Method for predicting height of mine water flowing fractured zone
WO2018072351A1 (en) * 2016-10-20 2018-04-26 北京工业大学 Method for optimizing support vector machine on basis of particle swarm optimization algorithm
CN106679880A (en) * 2016-12-21 2017-05-17 华南理工大学 Pressure sensor temperature compensating method based on FOA-optimized SOM-RBF
CN107908688A (en) * 2017-10-31 2018-04-13 温州大学 A kind of data classification Forecasting Methodology and system based on improvement grey wolf optimization algorithm
CN109116833A (en) * 2018-08-31 2019-01-01 重庆邮电大学 Based on improvement drosophila-bat algorithm mechanical failure diagnostic method
CN109948675A (en) * 2019-03-05 2019-06-28 温州大学 The method for constructing prediction model based on outpost's mechanism drosophila optimization algorithm on multiple populations

Non-Patent Citations (7)

* Cited by examiner, † Cited by third party
Title
A new fruit fly optimization algorithm enhanced support vector machine for diagnosis of breast cancer based on high-level features; Hui Huang et al.; International Conference on Data Science, Medicine and Bioinformatics; pp. 1-14 *
A fruit fly optimization algorithm based on Lévy flight trajectory; Guo Delong; Yang Nan; Zhou Yongquan; Computer & Digital Engineering (No. 02); pp. 113-119 *
Fruit fly optimization algorithm based on Cauchy mutation; Han Xuming; Qiu Bing; Liu Qiaoming; Zhou Liyuan; Wang Limin; Microelectronics & Computer (No. 11); pp. 32-36 *
Research on a fruit fly optimization algorithm based on Cauchy-Gaussian dynamic decreasing mutation; Du Xiaoxin et al.; Computer Engineering & Science; Vol. 38 (No. 06); pp. 1171-1176 *
Fruit fly optimization algorithm based on hybrid mutation; Guo Delong; Luo Xiaobin; Zhou Yongquan; Mathematics in Practice and Theory (No. 12); pp. 167-175 *
Application of the bat optimization algorithm in slope reliability analysis; He Ziguang; Wu Bo; Zhao Fasuo; Cheng Zhenquan; Wang Banqiao; Duan Zhao; Journal of Catastrophology (No. 03); pp. 31-39 *

Also Published As

Publication number Publication date
CN111639695A (en) 2020-09-08

Similar Documents

Publication Publication Date Title
CN111639695B (en) Method and system for classifying data based on improved drosophila optimization algorithm
CN106023195B (en) BP neural network image partition method and device based on self-adapted genetic algorithm
US20200372400A1 (en) Tree alternating optimization for learning classification trees
CN107330902B (en) Chaotic genetic BP neural network image segmentation method based on Arnold transformation
CN112488283B (en) Improved multi-objective gray wolf optimization algorithm implementation method
CN110188358A (en) The training method and device of Natural Language Processing Models
KR20210030063A (en) System and method for constructing a generative adversarial network model for image classification based on semi-supervised learning
US11334791B2 (en) Learning to search deep network architectures
Abd-Alsabour A review on evolutionary feature selection
CN111105045A (en) Method for constructing prediction model based on improved locust optimization algorithm
CN111046178B (en) Text sequence generation method and system
Zhang et al. Evolving neural network classifiers and feature subset using artificial fish swarm
CN111459988A (en) Method for automatic design of machine learning assembly line
CN115687610A (en) Text intention classification model training method, recognition device, electronic equipment and storage medium
Mu et al. Auto-CASH: A meta-learning embedding approach for autonomous classification algorithm selection
JP3896868B2 (en) Pattern feature selection method, classification method, determination method, program, and apparatus
CN112364980B (en) Deep neural network training method based on reinforcement learning under weak supervision scene
Klein et al. Synthetic data at scale: A paradigm to efficiently leverage machine learning in agriculture
CN115599918B (en) Graph enhancement-based mutual learning text classification method and system
KR102518825B1 (en) Reinforcement learning system for self-development
CN116451859A (en) Bayesian optimization-based stock prediction method for generating countermeasure network
CN112115969B (en) Method and device for optimizing FKNN model parameters based on variant sea squirt swarm algorithm
Zhang et al. A YOLOv7 incorporating the Adan optimizer based corn pests identification method
Raximov et al. The importance of loss function in artificial intelligence
Saini et al. Image compression using APSO

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant