CN111639695B - Method and system for classifying data based on improved drosophila optimization algorithm - Google Patents

Method and system for classifying data based on improved drosophila optimization algorithm

Info

Publication number
CN111639695B
CN111639695B
Authority
CN
China
Prior art keywords
data
drosophila
optimal
pos
max
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010454464.5A
Other languages
Chinese (zh)
Other versions
CN111639695A (en)
Inventor
汪鹏君
范毅
陈慧灵
施一剑
管晓春
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wenzhou University
Original Assignee
Wenzhou University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wenzhou University filed Critical Wenzhou University
Priority to CN202010454464.5A priority Critical patent/CN111639695B/en
Publication of CN111639695A publication Critical patent/CN111639695A/en
Application granted granted Critical
Publication of CN111639695B publication Critical patent/CN111639695B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2411 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/004 Artificial life, i.e. computing arrangements simulating life
    • G06N3/006 Artificial life, i.e. computing arrangements simulating life based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO]

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The embodiment of the invention discloses a method and a system for classifying data based on an improved drosophila optimization algorithm. The method comprises the following steps: (1) preprocessing the acquired historical data; (2) using the preprocessed historical data as training samples and applying the improved drosophila optimization algorithm to optimize the parameters (penalty coefficient and kernel width) of a prediction model (a support vector machine); (3) constructing the prediction model from the parameters obtained by the algorithm optimization; (4) classifying the data to be detected with the newly constructed prediction model. By implementing the method, problems of the drosophila optimization algorithm such as becoming trapped in local optima and insufficient convergence accuracy can be alleviated, problems in specific fields can be classified and predicted, and the decision accuracy of the prediction model is improved.

Description

Method and system for classifying data based on improved drosophila optimization algorithm
Technical Field
The invention relates to the technical field of big data, in particular to a method and a system for classifying data based on an improved drosophila optimization algorithm.
Background
With the development of technology, the fields in which big data are applied keep broadening, which poses new challenges for tasks such as big data classification and prediction; swarm intelligence optimization algorithms in particular are increasingly used for big data classification and prediction.
Swarm intelligence optimization algorithms achieve optimization by simulating the collective intelligent behaviors exhibited by various biological and non-biological systems in nature and by exploiting cooperation and communication among the individuals of a swarm. Well-known examples include the ant colony algorithm, the particle swarm algorithm, and the artificial bee colony algorithm.
Pan et al. proposed the fruit fly (Drosophila) optimization algorithm (FOA) in 2012, inspired by the foraging behavior of fruit flies: a fruit fly first approaches a food source by smell and then determines its exact location by vision. The random search mechanism has a strong influence on the optimization efficiency of FOA, and the algorithm is prone to becoming trapped in local optima. In view of these problems, the drosophila algorithm is improved here by introducing a new search mechanism combined with a mixed-distribution strategy, so as to improve the accuracy of the prediction model.
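For orientation, the classical FOA search loop can be sketched as follows (Python). This is only an illustrative reconstruction of the well-known smell/vision two-phase search, not the improved algorithm of the invention; all function and parameter names are chosen for the example.

```python
import numpy as np

def basic_foa(objective, n_flies=20, n_iter=100, init_range=(-10.0, 10.0)):
    """Illustrative sketch of the classical FOA: flies scatter randomly around the swarm
    location (smell phase), and the swarm then flies to the best-smelling fly (vision phase)."""
    rng = np.random.default_rng()
    x_axis = rng.uniform(*init_range)          # swarm location the flies start from
    y_axis = rng.uniform(*init_range)
    best_smell, best_s = np.inf, None

    for _ in range(n_iter):
        # Smell phase: each fly takes a random flight around the swarm location
        x = x_axis + rng.uniform(-1, 1, n_flies)
        y = y_axis + rng.uniform(-1, 1, n_flies)
        dist = np.sqrt(x ** 2 + y ** 2)        # distance to the origin
        s = 1.0 / dist                         # smell concentration judgment value
        smell = np.array([objective(si) for si in s])

        # Vision phase: the swarm flies toward the fly with the best smell concentration
        i = int(np.argmin(smell))
        if smell[i] < best_smell:
            best_smell, best_s = smell[i], s[i]
            x_axis, y_axis = x[i], y[i]

    return best_s, best_smell

# Example: minimise (s - 0.5)^2; the search should drive s toward 0.5
print(basic_foa(lambda s: (s - 0.5) ** 2))
```

Because every fly is generated around a single swarm location, the search concentrates quickly and can stall in a local optimum, which is the weakness the improved algorithm below addresses.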
Disclosure of Invention
The embodiment of the invention aims to provide a method and a system for classifying data based on an improved drosophila optimization algorithm, which can alleviate problems of the drosophila optimization algorithm such as convergence to local optima and slow convergence, realize classification and prediction for problems in specific fields, and improve decision accuracy.
In order to solve the above technical problems, the embodiment of the present invention provides a method for optimizing prediction model parameters with an improved drosophila optimization algorithm and thereby classifying data, comprising the following steps:
step S1: preprocessing the acquired historical data;
step S2: using the preprocessed historical data as training samples and applying the improved drosophila optimization algorithm to optimize the parameters (penalty coefficient and kernel width) of the prediction model (a support vector machine);
step S3: constructing the prediction model from the parameters obtained by the algorithm optimization;
step S4: classifying the data to be detected with the newly constructed prediction model.
The step S2 specifically includes:
step S2.1: initializing parameters: the maximum number of iterations T, the drosophila population size N, the definition range [C_min, C_max] of the penalty coefficient C, and the definition range [γ_min, γ_max] of the kernel width γ;
step S2.2: randomly generating the initial positions of the drosophila population, mapping the population into the defined ranges with formulas (1)-(5) to obtain the initial positions S_i = (S_{i,1}, S_{i,2}), (i = 1, 2, …, N; j = 1, 2);
X_{i,1} = C_min + w×(C_max - C_min) (1)
Y_{i,1} = C_min + w×(C_max - C_min) (2)
X_{i,2} = γ_min + w×(γ_max - γ_min) (3)
Y_{i,2} = γ_min + w×(γ_max - γ_min) (4)
where w is a random number uniformly distributed in [0, 1];
step S2.3: saving the current optimal population position and its fitness value: taking the preprocessed historical data as training samples of the prediction model, calculating the fitness values F of all current fruit fly positions, and saving the best fitness value F_best together with the corresponding position (X_pos, Y_pos, S_pos); the accuracy ACC of the model is verified with an internal K-fold cross-validation strategy;
step S2.4: iteratively searching and updating the population positions: updating the positions of the next-generation drosophila with formulas (6)-(9) according to the current optimal population position, calculating the new population positions with formulas (5) and (10), deciding with formula (11) whether mutation in the manner of formula (12) occurs, and finally calculating the fitness values; if the best fitness value is better than that of the previous generation, the optimal position is replaced and updated, otherwise it is kept;
vx_{i,j}^{t+1} = vx_{i,j}^t + (X_pos(j) - X_{1,j}^t)×(-2×M) (6)
vy_{i,j}^{t+1} = vy_{i,j}^t + (Y_pos(j) - Y_{1,j}^t)×(-2×M) (7)
X_{i,j}^{t+1} = X_{1,j}^t + vx_{i,j}^{t+1} (8)
Y_{i,j}^{t+1} = Y_{1,j}^t + vy_{i,j}^{t+1} (9)
S_{i,j}^{t+1} = S_{i,j}^{t+1}×(0.1 + 0.9×(L×Gauss + (1-L)×Cauchy)) (10)
r_i = r_i×(1 - exp((-0.9)×t)) (11)
S_{i,j}^{t+1} = S_pos(k) + 0.9×A_i×(2×B - 1) (12)
where L is an adaptive dynamic variable related to the number of iterations, B and M are random values drawn from a uniform distribution, k is an integer drawn from a uniform distribution, t is the current iteration number, Gauss is a random value drawn from a Gaussian distribution, and Cauchy is a random value drawn from a Cauchy distribution;
step S2.5: judging whether the maximum number of iterations T has been reached; if so, outputting the optimal drosophila position to obtain the optimal penalty coefficient C and kernel width γ; otherwise, returning to step S2.3.
The prediction model in step S3 is realized by the formula f(x_j) = sgn(Σ_i a_i y_i K(x_i, x_j) + b), where x_j is the sample to be tested, x_i is a training sample, y_i is the class label of training sample x_i, b is a preset threshold, a_i is a Lagrangian coefficient, and K(x_i, x_j) = exp(-γ‖x_i - x_j‖²).
The embodiment of the invention also provides another method for optimizing prediction model parameters with an improved drosophila optimization algorithm so as to classify data, which comprises the following steps:
step S001: preprocessing the acquired historical data;
step S002: determining the drosophila optimization algorithm and initializing the necessary parameters: the maximum number of iterations T, the drosophila population size N, the definition range [C_min, C_max] of the penalty coefficient C, and the definition range [γ_min, γ_max] of the kernel width γ;
step S003: randomly generating the initial positions of the drosophila population, mapping the population into the defined ranges with formulas (a)-(e) to obtain the initial positions S_i = (S_{i,1}, S_{i,2}), (i = 1, 2, …, N; j = 1, 2);
X_{i,1} = C_min + w×(C_max - C_min) (a)
Y_{i,1} = C_min + w×(C_max - C_min) (b)
X_{i,2} = γ_min + w×(γ_max - γ_min) (c)
Y_{i,2} = γ_min + w×(γ_max - γ_min) (d)
where w is a random number uniformly distributed in [0, 1];
step S004: taking the preprocessed historical data as training samples of the prediction model, calculating the fitness values F of all current fruit fly positions, and saving the best fitness value F_best together with the corresponding position (X_pos, Y_pos, S_pos); the accuracy ACC of the model is verified with an internal K-fold cross-validation strategy;
step S005: updating the positions of the next-generation drosophila with formulas (f)-(i) according to the current optimal population position, calculating the new population positions with formulas (e) and (j), deciding with formula (k) whether mutation in the manner of formula (l) occurs, and finally calculating the fitness values; if the best fitness value is better than that of the previous generation, the optimal position is replaced and updated, otherwise it is kept;
vx_{i,j}^{t+1} = vx_{i,j}^t + (X_pos(j) - X_{1,j}^t)×(-2×M) (f)
vy_{i,j}^{t+1} = vy_{i,j}^t + (Y_pos(j) - Y_{1,j}^t)×(-2×M) (g)
X_{i,j}^{t+1} = X_{1,j}^t + vx_{i,j}^{t+1} (h)
Y_{i,j}^{t+1} = Y_{1,j}^t + vy_{i,j}^{t+1} (i)
S_{i,j}^{t+1} = S_{i,j}^{t+1}×(0.1 + 0.9×(L×Gauss + (1-L)×Cauchy)) (j)
r_i = r_i×(1 - exp((-0.9)×t)) (k)
S_{i,j}^{t+1} = S_pos(k) + 0.9×A_i×(2×B - 1) (l)
where L is an adaptive dynamic variable related to the number of iterations, B and M are random values drawn from a uniform distribution, k is an integer drawn from a uniform distribution, t is the current iteration number, Gauss is a random value drawn from a Gaussian distribution, and Cauchy is a random value drawn from a Cauchy distribution;
step S006: judging whether the maximum number of iterations T has been reached; if so, outputting the optimal drosophila position to obtain the optimal penalty coefficient C and kernel width γ; otherwise, returning to step S004;
step S007: constructing the prediction model from the parameters obtained by the algorithm optimization;
step S008: classifying the data to be detected with the newly constructed prediction model.
The prediction model in step S007 is realized by the formula f(x_j) = sgn(Σ_i a_i y_i K(x_i, x_j) + b), where x_j is the sample to be tested, x_i is a training sample, y_i is the class label of training sample x_i, b is a preset threshold, a_i is a Lagrangian coefficient, and K(x_i, x_j) = exp(-γ‖x_i - x_j‖²).
The embodiment of the invention also provides a system for classifying data by optimizing prediction model parameters with an improved drosophila optimization algorithm, which comprises:
a data acquisition and processing unit: for preprocessing the acquired historical data;
a model parameter improvement unit: for using the preprocessed historical data as training samples and applying the improved drosophila optimization algorithm to optimize the parameters (penalty coefficient and kernel width) of the prediction model (a support vector machine);
a model reconstruction unit: for constructing the prediction model from the parameters obtained by the algorithm optimization;
a data classification prediction unit: for classifying the data to be detected with the newly constructed prediction model.
The model parameter improvement unit specifically comprises:
a first initialization module for initializing the necessary parameters, including the maximum number of iterations T, the drosophila population size N, the definition range [C_min, C_max] of the penalty coefficient C, and the definition range [γ_min, γ_max] of the kernel width γ;
a second initialization module for randomly generating the initial positions of the drosophila population, mapping the population into the defined ranges with formulas (1)-(5) to obtain the initial positions S_i = (S_{i,1}, S_{i,2}), (i = 1, 2, …, N; j = 1, 2);
X_{i,1} = C_min + w×(C_max - C_min) (1)
Y_{i,1} = C_min + w×(C_max - C_min) (2)
X_{i,2} = γ_min + w×(γ_max - γ_min) (3)
Y_{i,2} = γ_min + w×(γ_max - γ_min) (4)
where w is a random number uniformly distributed in [0, 1];
a calculation module for taking the preprocessed historical data as training samples of the prediction model, calculating the fitness values F of all current fruit fly positions, and saving the best fitness value F_best together with the corresponding position (X_pos, Y_pos, S_pos); the accuracy ACC of the model is verified with an internal K-fold cross-validation strategy;
an updating module for iteratively searching and updating the population positions: updating the positions of the next-generation drosophila with formulas (6)-(9) according to the current optimal population position, calculating the new population positions with formulas (5) and (10), deciding with formula (11) whether mutation in the manner of formula (12) occurs, and finally calculating the fitness values; if the best fitness value is better than that of the previous generation, the optimal position is replaced and updated, otherwise it is kept;
vx_{i,j}^{t+1} = vx_{i,j}^t + (X_pos(j) - X_{1,j}^t)×(-2×M) (6)
vy_{i,j}^{t+1} = vy_{i,j}^t + (Y_pos(j) - Y_{1,j}^t)×(-2×M) (7)
X_{i,j}^{t+1} = X_{1,j}^t + vx_{i,j}^{t+1} (8)
Y_{i,j}^{t+1} = Y_{1,j}^t + vy_{i,j}^{t+1} (9)
S_{i,j}^{t+1} = S_{i,j}^{t+1}×(0.1 + 0.9×(L×Gauss + (1-L)×Cauchy)) (10)
r_i = r_i×(1 - exp((-0.9)×t)) (11)
S_{i,j}^{t+1} = S_pos(k) + 0.9×A_i×(2×B - 1) (12)
where L is an adaptive dynamic variable related to the number of iterations, B and M are random values drawn from a uniform distribution, k is an integer drawn from a uniform distribution, t is the current iteration number, Gauss is a random value drawn from a Gaussian distribution, and Cauchy is a random value drawn from a Cauchy distribution;
a judging module for judging whether the maximum number of iterations T has been reached;
an output module for outputting the optimal drosophila position to obtain the optimal penalty coefficient C and kernel width γ.
The prediction model is realized by the formula f(x_j) = sgn(Σ_i a_i y_i K(x_i, x_j) + b), where x_j is the sample to be tested, x_i is a training sample, y_i is the class label of training sample x_i, b is a preset threshold, a_i is a Lagrangian coefficient, and K(x_i, x_j) = exp(-γ‖x_i - x_j‖²).
The embodiment of the invention has the following beneficial effects:
the method introduces a new search mechanism and a mixed-distribution strategy into the drosophila optimization algorithm, which improves the search capability of the algorithm and avoids becoming trapped in local optima; the improved algorithm is further used to optimize the parameters of models such as the support vector machine and the kernel extreme learning machine, so that an optimal machine learning model is built and classification and prediction of problems in specific fields are realized.
Drawings
In order to more clearly illustrate the embodiments of the invention or the technical solutions of the prior art, the drawings required for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the invention, and those skilled in the art may obtain other drawings from them without inventive effort.
FIG. 1 is a flow chart of a method for optimizing predictive model parameters to classify data by improving a Drosophila optimization algorithm in accordance with an embodiment of the present invention;
FIG. 2 is a flow chart of another method for optimizing predictive model parameters to classify data provided by an embodiment of the invention that improves the Drosophila optimization algorithm;
fig. 3 is a schematic structural diagram of a system for optimizing prediction model parameters to classify data by improving a drosophila optimization algorithm according to an embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present invention more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
As shown in fig. 1, in an embodiment of the present invention, a method for optimizing prediction model parameters with an improved drosophila optimization algorithm is provided, which includes the following steps:
step S1: preprocessing the acquired historical data;
the specific process is to obtain the historical data related to the problem under study, normalize it with formula (1), and classify it;
wherein the classified attributes include data attributes and category attributes.
For example, for data distinguishing benign and malignant thyroid nodules based on ultrasound features, a single sample's attribute values fall into two major categories: the data attributes X_1-X_8 represent ultrasound attributes of benign and malignant thyroid nodule disease, and X_9 represents the class of the data sample, i.e., it distinguishes benign nodules from malignant nodules; if the sample is a malignant nodule the value is 1, and if the sample is a benign nodule the value is -1.
As another example, for enterprise bankruptcy risk prediction data, a single sample's attribute values are divided into two main classes: the data attributes X_1-X_n are related financial indicators such as the debt ratio and total assets, and X_{n+1} represents the category label, namely whether the enterprise is at risk of bankruptcy within two years; the value is 1 if there is a bankruptcy risk and -1 if there is not.
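A minimal preprocessing sketch along these lines is shown below (Python). It assumes that formula (1) is a min-max normalization to [0, 1], which is a common choice but not confirmed by this text, and that the category attribute is stored in the last column; all names are illustrative.

```python
import numpy as np

def preprocess(raw_data, n_attributes):
    """Sketch of step S1: min-max normalisation of the data attributes and
    +1/-1 encoding of the category attribute (assumed to be the last column)."""
    raw = np.asarray(raw_data, dtype=float)
    X = raw[:, :n_attributes]            # data attributes X_1 ... X_n
    y = raw[:, n_attributes]             # category attribute, already coded as 1 / -1

    # Assumed normalisation formula (1): x' = (x - x_min) / (x_max - x_min)
    x_min, x_max = X.min(axis=0), X.max(axis=0)
    span = np.where(x_max > x_min, x_max - x_min, 1.0)   # guard against constant columns
    return (X - x_min) / span, y
```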
Step S2: the history data after preprocessing is used as training samples and a new improved drosophila optimization algorithm is utilized to optimize parameters (punishment coefficient and kernel width) of a prediction model (support vector machine);
the specific process is that the penalty coefficient (C) and the kernel width (gamma) of a predictive support vector machine are optimized by utilizing an improved drosophila algorithm;
step S2.1: initializing parameters. Maximum iteration number (T), drosophila population number (N), and punishment coefficient (C) min ,C max ]Definition range of core width gamma [ gamma ] minmax ];
Step S2.2: and randomly generating initial positions of the drosophila population. Mapping Drosophila population into defined range by using formulas (2) - (6) to obtain initial position S i,j =(S i,1 ,S i,2 ),(i=1,2,…,N;j=1,2);
X i,1 =C min +w×(C max -C min ) (2)
Y i,1 =C min +w×(C max -C min ) (3)
X i,2 =γ min +w×(γ maxmin ) (4)
Y i,2 =γ min +w×(γ maxmin ) (5)
Where w is a random number subject to uniform distribution between [0,1 ].
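For illustration only, the initialization of step S2.2 might be coded as below (Python). The mapping from the (X, Y) coordinates to the candidate solution S_i (formula (6), not reproduced in this text) is replaced here by a simple midpoint, which is an assumption; all names are illustrative.

```python
import numpy as np

def init_population(N, C_min, C_max, gamma_min, gamma_max, rng=None):
    """Sketch of formulas (2)-(5): place N fruit flies randomly inside the ranges of
    the penalty coefficient C (dimension j=1) and the kernel width gamma (j=2)."""
    rng = rng or np.random.default_rng()
    X = np.empty((N, 2))
    Y = np.empty((N, 2))
    X[:, 0] = C_min + rng.uniform(0, 1, N) * (C_max - C_min)              # formula (2)
    Y[:, 0] = C_min + rng.uniform(0, 1, N) * (C_max - C_min)              # formula (3)
    X[:, 1] = gamma_min + rng.uniform(0, 1, N) * (gamma_max - gamma_min)  # formula (4)
    Y[:, 1] = gamma_min + rng.uniform(0, 1, N) * (gamma_max - gamma_min)  # formula (5)

    # Placeholder for the unshown mapping (formula (6)) from (X, Y) to the candidate
    # solutions S_i = (S_{i,1}, S_{i,2}): here simply the midpoint of X and Y.
    S = 0.5 * (X + Y)
    return X, Y, S
```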
step S2.3: saving the current optimal population position and its fitness value: taking the preprocessed historical data as training samples of the prediction model, calculating the fitness values F of all current fruit fly positions, and saving the best fitness value F_best together with the corresponding position (X_pos, Y_pos, S_pos); the accuracy ACC of the model is verified with an internal K-fold cross-validation strategy.
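A sketch of how such a fitness value might be computed is given below (Python, using scikit-learn). It assumes the fitness is the mean K-fold cross-validated classification accuracy (ACC) of an RBF-kernel support vector machine trained with the candidate (C, γ); function names are illustrative.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

def fitness(position, X_train, y_train, k=5):
    """Sketch of step S2.3: a fruit fly position encodes (C, gamma) of an RBF-kernel SVM,
    and its fitness is the mean K-fold cross-validated accuracy on the training samples."""
    C, gamma = float(position[0]), float(position[1])
    model = SVC(C=C, kernel="rbf", gamma=gamma)
    acc = cross_val_score(model, X_train, y_train, cv=k, scoring="accuracy")
    return float(acc.mean())
```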
step S2.4: iteratively searching and updating the population positions: updating the positions of the next-generation drosophila with formulas (7)-(10) according to the current optimal population position, calculating the new population positions with formulas (6) and (11), deciding with formula (12) whether mutation in the manner of formula (13) occurs, and finally calculating the fitness values; if the best fitness value is better than that of the previous generation, the optimal position is replaced and updated, otherwise it is kept;
vx_{i,j}^{t+1} = vx_{i,j}^t + (X_pos(j) - X_{1,j}^t)×(-2×M) (7)
vy_{i,j}^{t+1} = vy_{i,j}^t + (Y_pos(j) - Y_{1,j}^t)×(-2×M) (8)
X_{i,j}^{t+1} = X_{1,j}^t + vx_{i,j}^{t+1} (9)
Y_{i,j}^{t+1} = Y_{1,j}^t + vy_{i,j}^{t+1} (10)
S_{i,j}^{t+1} = S_{i,j}^{t+1}×(0.1 + 0.9×(L×Gauss + (1-L)×Cauchy)) (11)
r_i = r_i×(1 - exp((-0.9)×t)) (12)
S_{i,j}^{t+1} = S_pos(k) + 0.9×A_i×(2×B - 1) (13)
where L is an adaptive dynamic variable related to the number of iterations, B and M are random values drawn from a uniform distribution, k is an integer drawn from a uniform distribution, t is the current iteration number, Gauss is a random value drawn from a Gaussian distribution, and Cauchy is a random value drawn from a Cauchy distribution;
step S2.5: judging whether the maximum number of iterations T has been reached; if so, outputting the optimal drosophila position to obtain the optimal penalty coefficient C and kernel width γ; otherwise, returning to step S2.3.
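Putting steps S2.2-S2.5 together, one possible reading of the iterative search is sketched below (Python). Several quantities are only partially specified in this text (for example the adaptive variable L, the amplitude A_i, the mutation rate r_i, the index k, and the mapping to S), so the choices made here are assumptions rather than the exact procedure of the invention; the fitness function is the cross-validated ACC sketched above.

```python
import numpy as np

def improved_foa(fitness, N, T, C_min, C_max, gamma_min, gamma_max, seed=None):
    """Sketch of the improved fruit fly search loop (steps S2.2-S2.5); the handling of
    L, A_i, r_i and k below is an assumption, not the patent's exact definition."""
    rng = np.random.default_rng(seed)
    lo = np.array([C_min, gamma_min])
    hi = np.array([C_max, gamma_max])

    # step S2.2: random initial positions inside the defined ranges
    X = lo + rng.uniform(0, 1, (N, 2)) * (hi - lo)
    Y = lo + rng.uniform(0, 1, (N, 2)) * (hi - lo)
    vx = np.zeros((N, 2))
    vy = np.zeros((N, 2))
    S = 0.5 * (X + Y)                     # placeholder for the unshown mapping to S

    # step S2.3: fitness of all positions, keep the best one
    F = np.array([fitness(s) for s in S])
    best = int(np.argmax(F))
    F_best, S_pos = F[best], S[best].copy()
    X_pos, Y_pos = X[best].copy(), Y[best].copy()

    for t in range(1, T + 1):             # steps S2.4-S2.5
        M = rng.uniform(0, 1, (N, 2))
        L_t = np.exp(-t / T)              # assumed adaptive variable tied to the iteration

        # formulas (7)-(10): velocity-like move of every fly toward the best position
        vx = vx + (X_pos - X) * (-2 * M)
        vy = vy + (Y_pos - Y) * (-2 * M)
        X = np.clip(X + vx, lo, hi)
        Y = np.clip(Y + vy, lo, hi)

        # formula (11): hybrid Gaussian/Cauchy perturbation of the candidate solutions
        gauss = rng.normal(0.0, 1.0, (N, 2))
        cauchy = rng.standard_cauchy((N, 2))
        S_new = 0.5 * (X + Y) * (0.1 + 0.9 * (L_t * gauss + (1 - L_t) * cauchy))

        # formulas (12)-(13): occasional mutation around the best position (assumed rate)
        mutate = rng.uniform(0, 1, N) < (1.0 - np.exp(-0.9 * t))
        B = rng.uniform(0, 1, (N, 2))
        A = np.abs(S_pos)                 # assumed amplitude term A_i
        S_new[mutate] = S_pos + 0.9 * A * (2 * B[mutate] - 1)
        S_new = np.clip(S_new, lo, hi)

        # greedy selection: keep the best position found so far
        F_new = np.array([fitness(s) for s in S_new])
        i = int(np.argmax(F_new))
        if F_new[i] > F_best:
            F_best, S_pos = F_new[i], S_new[i].copy()
            X_pos, Y_pos = X[i].copy(), Y[i].copy()
        S = S_new

    return S_pos, F_best                  # optimised (C, gamma) and its fitness value
```

In use, fitness would be a closure over the training data, for example lambda pos: fitness(pos, X_train, y_train) with the cross-validation sketch above, and S_pos would then hold the optimized (C, γ) pair passed on to step S3.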
Step S3: constructing a prediction model according to parameters obtained by algorithm optimization;
the specific process is realized by the formula f(x_j) = sgn(Σ_i a_i y_i K(x_i, x_j) + b), where x_j is the sample to be tested, x_i is a training sample, y_i is the class label of training sample x_i, b is a preset threshold, a_i is a Lagrangian coefficient, and K(x_i, x_j) = exp(-γ‖x_i - x_j‖²).
Step S4: and classifying the data to be detected by adopting the newly constructed prediction model.
As shown in FIG. 2, another method for optimizing prediction model parameters with an improved drosophila optimization algorithm so as to classify data, provided by an embodiment of the invention, comprises the following steps:
step S001: preprocessing the acquired historical data;
the specific process is to obtain the historical data related to the problem under study, normalize it with formula (a), and classify it;
wherein the classified attributes include data attributes and category attributes.
For example, for data distinguishing benign and malignant thyroid nodules based on ultrasound features, a single sample's attribute values fall into two major categories: the data attributes X_1-X_8 represent ultrasound attributes of benign and malignant thyroid nodule disease, and X_9 represents the class of the data sample, i.e., it distinguishes benign nodules from malignant nodules; if the sample is a malignant nodule the value is 1, and if the sample is a benign nodule the value is -1.
As another example, for enterprise bankruptcy risk prediction data, a single sample's attribute values are divided into two main classes: the data attributes X_1-X_n are related financial indicators such as the debt ratio and total assets, and X_{n+1} represents the category label, namely whether the enterprise is at risk of bankruptcy within two years; the value is 1 if there is a bankruptcy risk and -1 if there is not.
step S002: determining the drosophila optimization algorithm and initializing the necessary parameters: the maximum number of iterations T, the drosophila population size N, the definition range [C_min, C_max] of the penalty coefficient C, and the definition range [γ_min, γ_max] of the kernel width γ;
step S003: randomly generating the initial positions of the drosophila population, mapping the population into the defined ranges with formulas (b)-(f) to obtain the initial positions S_i = (S_{i,1}, S_{i,2}), (i = 1, 2, …, N; j = 1, 2);
X_{i,1} = C_min + w×(C_max - C_min) (b)
Y_{i,1} = C_min + w×(C_max - C_min) (c)
X_{i,2} = γ_min + w×(γ_max - γ_min) (d)
Y_{i,2} = γ_min + w×(γ_max - γ_min) (e)
where w is a random number uniformly distributed in [0, 1];
step S004: taking the preprocessed historical data as training samples of the prediction model, calculating the fitness values F of all current fruit fly positions, and saving the best fitness value F_best together with the corresponding position (X_pos, Y_pos, S_pos); the accuracy ACC of the model is verified with an internal K-fold cross-validation strategy;
step S005: iteratively searching and updating the population positions: updating the positions of the next-generation drosophila with formulas (g)-(j) according to the current optimal population position, calculating the new population positions with formulas (f) and (k), deciding with formula (l) whether mutation in the manner of formula (m) occurs, and finally calculating the fitness values; if the best fitness value is better than that of the previous generation, the optimal position is replaced and updated, otherwise it is kept;
vx_{i,j}^{t+1} = vx_{i,j}^t + (X_pos(j) - X_{1,j}^t)×(-2×M) (g)
vy_{i,j}^{t+1} = vy_{i,j}^t + (Y_pos(j) - Y_{1,j}^t)×(-2×M) (h)
X_{i,j}^{t+1} = X_{1,j}^t + vx_{i,j}^{t+1} (i)
Y_{i,j}^{t+1} = Y_{1,j}^t + vy_{i,j}^{t+1} (j)
S_{i,j}^{t+1} = S_{i,j}^{t+1}×(0.1 + 0.9×(L×Gauss + (1-L)×Cauchy)) (k)
r_i = r_i×(1 - exp((-0.9)×t)) (l)
S_{i,j}^{t+1} = S_pos(k) + 0.9×A_i×(2×B - 1) (m)
where L is an adaptive dynamic variable related to the number of iterations, B and M are random values drawn from a uniform distribution, k is an integer drawn from a uniform distribution, t is the current iteration number, Gauss is a random value drawn from a Gaussian distribution, and Cauchy is a random value drawn from a Cauchy distribution;
step S006: judging whether the maximum number of iterations T has been reached; if so, outputting the optimal drosophila position to obtain the optimal penalty coefficient C and kernel width γ; otherwise, returning to step S004;
step S007: constructing the prediction model from the parameters obtained by the algorithm optimization;
step S008: classifying the data to be detected with the newly constructed prediction model.
The prediction model in step S007 is realized by the formula f(x_j) = sgn(Σ_i a_i y_i K(x_i, x_j) + b), where x_j is the sample to be tested, x_i is a training sample, y_i is the class label of training sample x_i, b is a preset threshold, a_i is a Lagrangian coefficient, and K(x_i, x_j) = exp(-γ‖x_i - x_j‖²).
As shown in fig. 3, in an embodiment of the present invention, a system for optimizing prediction model parameters with an improved drosophila optimization algorithm so as to classify data is provided, which comprises:
the data acquisition and processing unit 110: for preprocessing the acquired historical data;
the model parameter improvement unit 120: for using the preprocessed historical data as training samples and applying the improved drosophila optimization algorithm to optimize the parameters (penalty coefficient and kernel width) of the prediction model (a support vector machine);
the model reconstruction unit 130: for constructing the prediction model from the parameters obtained by the algorithm optimization;
the data classification prediction unit 140: for classifying the data to be detected with the newly constructed prediction model.
The model parameter improvement unit 120 specifically includes:
a first initialization module for initializing the necessary parameters, including the maximum number of iterations T, the drosophila population size N, the definition range [C_min, C_max] of the penalty coefficient C, and the definition range [γ_min, γ_max] of the kernel width γ;
a second initialization module for randomly generating the initial positions of the drosophila population, mapping the population into the defined ranges with formulas (1)-(5) to obtain the initial positions S_i = (S_{i,1}, S_{i,2}), (i = 1, 2, …, N; j = 1, 2);
X_{i,1} = C_min + w×(C_max - C_min) (1)
Y_{i,1} = C_min + w×(C_max - C_min) (2)
X_{i,2} = γ_min + w×(γ_max - γ_min) (3)
Y_{i,2} = γ_min + w×(γ_max - γ_min) (4)
where w is a random number uniformly distributed in [0, 1];
a calculation module for taking the preprocessed historical data as training samples of the prediction model, calculating the fitness values F of all current fruit fly positions, and saving the best fitness value F_best together with the corresponding position (X_pos, Y_pos, S_pos); the accuracy ACC of the model is verified with an internal K-fold cross-validation strategy;
an updating module for iteratively searching and updating the population positions: updating the positions of the next-generation drosophila with formulas (6)-(9) according to the current optimal population position, calculating the new population positions with formulas (5) and (10), deciding with formula (11) whether mutation in the manner of formula (12) occurs, and finally calculating the fitness values; if the best fitness value is better than that of the previous generation, the optimal position is replaced and updated, otherwise it is kept;
vx_{i,j}^{t+1} = vx_{i,j}^t + (X_pos(j) - X_{1,j}^t)×(-2×M) (6)
vy_{i,j}^{t+1} = vy_{i,j}^t + (Y_pos(j) - Y_{1,j}^t)×(-2×M) (7)
X_{i,j}^{t+1} = X_{1,j}^t + vx_{i,j}^{t+1} (8)
Y_{i,j}^{t+1} = Y_{1,j}^t + vy_{i,j}^{t+1} (9)
S_{i,j}^{t+1} = S_{i,j}^{t+1}×(0.1 + 0.9×(L×Gauss + (1-L)×Cauchy)) (10)
r_i = r_i×(1 - exp((-0.9)×t)) (11)
S_{i,j}^{t+1} = S_pos(k) + 0.9×A_i×(2×B - 1) (12)
where L is an adaptive dynamic variable related to the number of iterations, B and M are random values drawn from a uniform distribution, k is an integer drawn from a uniform distribution, t is the current iteration number, Gauss is a random value drawn from a Gaussian distribution, and Cauchy is a random value drawn from a Cauchy distribution;
a judging module for judging whether the maximum number of iterations T has been reached;
an output module for outputting the optimal drosophila position to obtain the optimal penalty coefficient C and kernel width γ.
The prediction model is realized by the formula f(x_j) = sgn(Σ_i a_i y_i K(x_i, x_j) + b), where x_j is the sample to be tested, x_i is a training sample, y_i is the class label of training sample x_i, b is a preset threshold, a_i is a Lagrangian coefficient, and K(x_i, x_j) = exp(-γ‖x_i - x_j‖²).
it should be noted that, in the above system embodiment, each included system unit is only divided according to the functional logic, but not limited to the above division, so long as the corresponding function can be implemented; in addition, the specific names of the functional units are also only for distinguishing from each other, and are not used to limit the protection scope of the present invention.
Those of ordinary skill in the art will appreciate that all or a portion of the steps in implementing the methods of the above embodiments may be implemented by a program to instruct related hardware, where the program may be stored in a computer readable storage medium, such as ROM/RAM, a magnetic disk, an optical disk, etc.
The foregoing description of the preferred embodiments of the invention is not intended to be limiting, but rather is intended to cover all modifications, equivalents, and alternatives falling within the spirit and principles of the invention.

Claims (2)

1. A method for classifying data based on an improved drosophila optimization algorithm, the method being performed by a program for instructing associated hardware, said program being stored on a computer readable storage medium, comprising the following steps:
step S1: normalizing and classifying the acquired historical data, wherein the classified attributes comprise data attributes and category attributes;
the historical data are data distinguishing benign and malignant thyroid nodules based on ultrasonic features; the data attribute values are divided into two main categories, namely the data attributes X1-X8 represent ultrasonic attributes of benign and malignant thyroid nodule disease, and X9 represents the category of the data sample, i.e., it distinguishes benign nodules from malignant nodules; if the sample is a malignant nodule the value is 1, and if the sample is a benign nodule the value is -1;
step S2: using the preprocessed historical data as training samples and optimizing the parameters of a prediction model with the improved drosophila optimization algorithm, wherein the prediction model is a support vector machine and the parameters comprise a penalty coefficient and a kernel width;
step S3: constructing the prediction model from the optimized parameters of the support vector machine;
step S4: classifying the sample to be detected with the newly constructed prediction model;
the step S2 specifically includes:
step S2.1: initializing parameters, and defining the maximum number of iterations T, the drosophila population size N, the definition range [C_min, C_max] of the penalty coefficient C, and the definition range [γ_min, γ_max] of the kernel width γ;
step S2.2: randomly generating the initial positions of the drosophila population, and mapping the population into the defined ranges with formulas (1)-(5) to obtain the initial positions S_i = (S_{i,1}, S_{i,2}), i = 1, 2, …, N; j = 1, 2;
X_{i,1} = C_min + w×(C_max - C_min) (1)
Y_{i,1} = C_min + w×(C_max - C_min) (2)
X_{i,2} = γ_min + w×(γ_max - γ_min) (3)
Y_{i,2} = γ_min + w×(γ_max - γ_min) (4)
wherein w is a random number uniformly distributed in [0, 1];
step S2.3: saving the current optimal population position and its fitness value: taking the preprocessed historical data as training samples of the prediction model, calculating the fitness values F of all current drosophila positions, saving the best fitness value F_best and the corresponding position (X_pos, Y_pos, S_pos), and verifying the accuracy ACC of the model with an internal K-fold cross-validation strategy;
step S2.4: iteratively searching and updating the population positions: updating the positions of the next-generation drosophila with formulas (6)-(9) according to the current optimal population position, calculating the new population positions with formulas (5) and (10), deciding with formula (11) whether mutation in the manner of formula (12) occurs, and finally calculating the fitness values; if the best fitness value is better than that of the previous generation, the optimal position is replaced and updated, otherwise it is kept;
vx_{i,j}^{t+1} = vx_{i,j}^t + (X_pos(j) - X_{1,j}^t)×(-2×M) (6)
vy_{i,j}^{t+1} = vy_{i,j}^t + (Y_pos(j) - Y_{1,j}^t)×(-2×M) (7)
X_{i,j}^{t+1} = X_{1,j}^t + vx_{i,j}^{t+1} (8)
Y_{i,j}^{t+1} = Y_{1,j}^t + vy_{i,j}^{t+1} (9)
S_{i,j}^{t+1} = S_{i,j}^{t+1}×(0.1 + 0.9×(L×Gauss + (1-L)×Cauchy)) (10)
r_i = r_i×(1 - exp((-0.9)×t)) (11)
S_{i,j}^{t+1} = S_pos(k) + 0.9×A_i×(2×B - 1) (12)
wherein L is an adaptive dynamic variable related to the number of iterations, B and M are random values drawn from a uniform distribution, k is an integer drawn from a uniform distribution, t is the current iteration number, Gauss is a random value drawn from a Gaussian distribution, and Cauchy is a random value drawn from a Cauchy distribution;
step S2.5: judging whether the maximum number of iterations T has been reached; if so, outputting the optimal drosophila position to obtain the optimal penalty coefficient C and kernel width γ; otherwise, returning to step S2.3;
the prediction model in the step S3 is realized by the formula f(x_j) = sgn(Σ_i a_i y_i K(x_i, x_j) + b), wherein x_j is the sample to be tested, x_i is a training sample, y_i is the class label of training sample x_i, b is a preset threshold, a_i is a Lagrangian coefficient, and K(x_i, x_j) = exp(-γ||x_i - x_j||²).
2. A system for classifying data by optimizing prediction model parameters based on an improved drosophila optimization algorithm, the system being implemented by a program for instructing associated hardware, said program being stored on a computer readable storage medium, comprising:
a data acquisition unit: for acquiring historical data, normalizing it and classifying it, wherein the classified attributes comprise data attributes and category attributes;
the historical data are data distinguishing benign and malignant thyroid nodules based on ultrasonic features; the data attribute values are divided into two main categories, namely the data attributes X1-X8 represent ultrasonic attributes of benign and malignant thyroid nodule disease, and X9 represents the category of the data sample, i.e., it distinguishes benign nodules from malignant nodules; if the sample is a malignant nodule the value is 1, and if the sample is a benign nodule the value is -1;
a model parameter improvement unit: for using the preprocessed historical data as training samples and optimizing the penalty coefficient and kernel width of the support vector machine with the improved drosophila optimization algorithm;
a model reconstruction unit: for constructing a prediction model from the parameters obtained by the algorithm optimization;
a data classification prediction unit: for classifying the sample to be detected with the newly constructed prediction model;
the model parameter improvement unit specifically comprises:
a first initialization module for initializing the necessary parameters, including the maximum number of iterations T, the drosophila population size N, the definition range [C_min, C_max] of the penalty coefficient C, and the definition range [γ_min, γ_max] of the kernel width γ;
a second initialization module for randomly generating the initial positions of the drosophila population, and mapping the population into the defined ranges with formulas (1)-(5) to obtain the initial positions S_i = (S_{i,1}, S_{i,2}), i = 1, 2, …, N; j = 1, 2;
X_{i,1} = C_min + w×(C_max - C_min) (1)
Y_{i,1} = C_min + w×(C_max - C_min) (2)
X_{i,2} = γ_min + w×(γ_max - γ_min) (3)
Y_{i,2} = γ_min + w×(γ_max - γ_min) (4)
wherein w is a random number uniformly distributed in [0, 1];
a calculation module for taking the preprocessed historical data as training samples of the prediction model, calculating the fitness values F of all current fruit fly positions, saving the best fitness value F_best and the corresponding position (X_pos, Y_pos, S_pos), and verifying the accuracy ACC of the model with an internal K-fold cross-validation strategy;
an updating module for iteratively searching and updating the population positions: updating the positions of the next-generation drosophila with formulas (6)-(9) according to the current optimal population position, calculating the new population positions with formulas (5) and (10), deciding with formula (11) whether mutation in the manner of formula (12) occurs, and finally calculating the fitness values; if the best fitness value is better than that of the previous generation, the optimal position is replaced and updated, otherwise it is kept;
vx_{i,j}^{t+1} = vx_{i,j}^t + (X_pos(j) - X_{1,j}^t)×(-2×M) (6)
vy_{i,j}^{t+1} = vy_{i,j}^t + (Y_pos(j) - Y_{1,j}^t)×(-2×M) (7)
X_{i,j}^{t+1} = X_{1,j}^t + vx_{i,j}^{t+1} (8)
Y_{i,j}^{t+1} = Y_{1,j}^t + vy_{i,j}^{t+1} (9)
S_{i,j}^{t+1} = S_{i,j}^{t+1}×(0.1 + 0.9×(L×Gauss + (1-L)×Cauchy)) (10)
r_i = r_i×(1 - exp((-0.9)×t)) (11)
S_{i,j}^{t+1} = S_pos(k) + 0.9×A_i×(2×B - 1) (12)
wherein L is an adaptive dynamic variable related to the number of iterations, B and M are random values drawn from a uniform distribution, k is an integer drawn from a uniform distribution, t is the current iteration number, Gauss is a random value drawn from a Gaussian distribution, and Cauchy is a random value drawn from a Cauchy distribution;
a judging module for judging whether the maximum number of iterations T has been reached;
an output module for outputting the optimal drosophila position to obtain the optimal penalty coefficient C and kernel width γ;
the prediction model is realized by the formula f(x_j) = sgn(Σ_i a_i y_i K(x_i, x_j) + b), wherein x_j is the sample to be tested, x_i is a training sample, y_i is the class label of training sample x_i, b is a preset threshold, a_i is a Lagrangian coefficient, and K(x_i, x_j) = exp(-γ||x_i - x_j||²).
CN202010454464.5A 2020-05-26 2020-05-26 Method and system for classifying data based on improved drosophila optimization algorithm Active CN111639695B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010454464.5A CN111639695B (en) 2020-05-26 2020-05-26 Method and system for classifying data based on improved drosophila optimization algorithm

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010454464.5A CN111639695B (en) 2020-05-26 2020-05-26 Method and system for classifying data based on improved drosophila optimization algorithm

Publications (2)

Publication Number Publication Date
CN111639695A CN111639695A (en) 2020-09-08
CN111639695B true CN111639695B (en) 2024-02-20

Family

ID=72329361

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010454464.5A Active CN111639695B (en) 2020-05-26 2020-05-26 Method and system for classifying data based on improved drosophila optimization algorithm

Country Status (1)

Country Link
CN (1) CN111639695B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113222172A (en) * 2021-04-21 2021-08-06 深圳供电局有限公司 Electric shock early warning method and device based on fruit fly algorithm
CN114154679B (en) * 2021-10-22 2024-01-26 南京华盾电力信息安全测评有限公司 Spark-based PCFOA-KELM wind power prediction method and device
CN116842454B (en) * 2023-06-06 2024-04-30 南京财经大学 Financial asset classification method and system based on support vector machine algorithm

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103544548A (en) * 2013-10-31 2014-01-29 辽宁工程技术大学 Method for predicting height of mine water flowing fractured zone
CN106679880A (en) * 2016-12-21 2017-05-17 华南理工大学 Pressure sensor temperature compensating method based on FOA-optimized SOM-RBF
CN107908688A (en) * 2017-10-31 2018-04-13 温州大学 A kind of data classification Forecasting Methodology and system based on improvement grey wolf optimization algorithm
WO2018072351A1 (en) * 2016-10-20 2018-04-26 北京工业大学 Method for optimizing support vector machine on basis of particle swarm optimization algorithm
CN109116833A (en) * 2018-08-31 2019-01-01 重庆邮电大学 Based on improvement drosophila-bat algorithm mechanical failure diagnostic method
CN109948675A (en) * 2019-03-05 2019-06-28 温州大学 The method for constructing prediction model based on outpost's mechanism drosophila optimization algorithm on multiple populations

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2017197626A1 (en) * 2016-05-19 2017-11-23 江南大学 Extreme learning machine method for improving artificial bee colony optimization

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103544548A (en) * 2013-10-31 2014-01-29 辽宁工程技术大学 Method for predicting height of mine water flowing fractured zone
WO2018072351A1 (en) * 2016-10-20 2018-04-26 北京工业大学 Method for optimizing support vector machine on basis of particle swarm optimization algorithm
CN106679880A (en) * 2016-12-21 2017-05-17 华南理工大学 Pressure sensor temperature compensating method based on FOA-optimized SOM-RBF
CN107908688A (en) * 2017-10-31 2018-04-13 温州大学 A kind of data classification Forecasting Methodology and system based on improvement grey wolf optimization algorithm
CN109116833A (en) * 2018-08-31 2019-01-01 重庆邮电大学 Based on improvement drosophila-bat algorithm mechanical failure diagnostic method
CN109948675A (en) * 2019-03-05 2019-06-28 温州大学 The method for constructing prediction model based on outpost's mechanism drosophila optimization algorithm on multiple populations

Non-Patent Citations (7)

* Cited by examiner, † Cited by third party
Title
A new fruit fly optimization algorithm enhanced support vector machine for diagnosis of breast cancer based on high-level features; Hui Huang et al.; International Conference on Data Science, Medicine and Bioinformatics; pp. 1-14 *
A fruit fly optimization algorithm based on Lévy flight trajectory; Guo Delong; Yang Nan; Zhou Yongquan; Computer & Digital Engineering (No. 02); pp. 113-119 *
Fruit fly optimization algorithm based on Cauchy mutation; Han Xuming; Qiu Bing; Liu Qiaoming; Zhou Liyuan; Wang Limin; Microelectronics & Computer (No. 11); pp. 32-36 *
Research on a fruit fly optimization algorithm based on Cauchy-Gaussian dynamic decreasing mutation; Du Xiaoxin et al.; Computer Engineering & Science; Vol. 38 (No. 06); pp. 1171-1176 *
Fruit fly optimization algorithm based on hybrid mutation; Guo Delong; Luo Xiaobin; Zhou Yongquan; Mathematics in Practice and Theory (No. 12); pp. 167-175 *
Application of the bat optimization algorithm in slope reliability analysis; He Ziguang; Wu Bo; Zhao Fasuo; Cheng Zhenquan; Wang Banqiao; Duan Zhao; Journal of Catastrophology (No. 03); pp. 31-39 *

Also Published As

Publication number Publication date
CN111639695A (en) 2020-09-08

Similar Documents

Publication Publication Date Title
CN111639695B (en) Method and system for classifying data based on improved drosophila optimization algorithm
CN106023195B (en) BP neural network image partition method and device based on self-adapted genetic algorithm
US20200372400A1 (en) Tree alternating optimization for learning classification trees
CN107330902B (en) Chaotic genetic BP neural network image segmentation method based on Arnold transformation
CN112488283B (en) Improved multi-objective gray wolf optimization algorithm implementation method
CN110188358A (en) The training method and device of Natural Language Processing Models
KR20210030063A (en) System and method for constructing a generative adversarial network model for image classification based on semi-supervised learning
US11334791B2 (en) Learning to search deep network architectures
Abd-Alsabour A review on evolutionary feature selection
CN111105045A (en) Method for constructing prediction model based on improved locust optimization algorithm
CN111046178B (en) Text sequence generation method and system
Zhang et al. Evolving neural network classifiers and feature subset using artificial fish swarm
CN111459988A (en) Method for automatic design of machine learning assembly line
CN115687610A (en) Text intention classification model training method, recognition device, electronic equipment and storage medium
Mu et al. Auto-CASH: A meta-learning embedding approach for autonomous classification algorithm selection
JP3896868B2 (en) Pattern feature selection method, classification method, determination method, program, and apparatus
CN112364980B (en) Deep neural network training method based on reinforcement learning under weak supervision scene
Klein et al. Synthetic data at scale: A paradigm to efficiently leverage machine learning in agriculture
CN115599918B (en) Graph enhancement-based mutual learning text classification method and system
KR102518825B1 (en) Reinforcement learning system for self-development
CN116451859A (en) Bayesian optimization-based stock prediction method for generating countermeasure network
CN112115969B (en) Method and device for optimizing FKNN model parameters based on variant sea squirt swarm algorithm
Zhang et al. A YOLOv7 incorporating the Adan optimizer based corn pests identification method
Raximov et al. The importance of loss function in artificial intelligence
Saini et al. Image compression using APSO

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant