CN111079074A - Method for constructing prediction model based on improved sine and cosine algorithm - Google Patents

Method for constructing prediction model based on improved sine and cosine algorithm

Info

Publication number
CN111079074A
CN111079074A
Authority
CN
China
Prior art keywords
population
optimal
value
fitness
iteration
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911106482.8A
Other languages
Chinese (zh)
Inventor
陈慧灵
乔雪婷
谷至阳
汪鹏君
孙诚
赵学华
刘国民
罗云纲
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wenzhou University
Original Assignee
Wenzhou University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wenzhou University filed Critical Wenzhou University
Priority to CN201911106482.8A
Publication of CN111079074A

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10Complex mathematical operations
    • G06F17/14Fourier, Walsh or analogous domain transformations, e.g. Laplace, Hilbert, Karhunen-Loeve, transforms

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • General Physics & Mathematics (AREA)
  • Pure & Applied Mathematics (AREA)
  • Mathematical Optimization (AREA)
  • Mathematical Analysis (AREA)
  • Computational Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Algebra (AREA)
  • Databases & Information Systems (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention provides a method for constructing a prediction model based on an improved sine and cosine algorithm, which comprises: obtaining sample data and normalizing the obtained sample data; optimizing the penalty factor C and kernel width γ of a support vector machine with an improved sine and cosine algorithm; and constructing a prediction model with the normalized data based on the obtained penalty factor C and kernel width γ, then classifying and predicting the samples to be classified with the constructed prediction model. By implementing the method, the penalty factor and kernel width of the SVM are optimized by the improved sine and cosine algorithm, which effectively improves the convergence speed and convergence accuracy of the algorithm, strengthens its ability to escape local optima, and finds a better global approximate optimal solution, so that an SVM model with higher classification accuracy is obtained.

Description

Method for constructing prediction model based on improved sine and cosine algorithm
Technical Field
The invention relates to the technical field of computers, in particular to a method for constructing a prediction model based on an improved sine and cosine algorithm.
Background
As is well known, science and technology, and computer science in particular, increasingly cross and permeate other fields, changing the way people produce and live. The application fields of big data keep widening, which poses new challenges for the classification, prediction, and other processing of big data, in particular for meta-heuristic optimization algorithms applied to big data classification and prediction.
For Support Vector Machines (SVMs), the two most commonly used parameter optimization methods when constructing prediction models to analyze data are grid search and gradient descent. In the first method, grid search is an exhaustive search: a designated parameter space is divided by setting reasonable upper and lower interval bounds and an interval step, the parameter combination represented by each grid node is then trained and evaluated, and the group of parameters with the highest score in the prediction results is taken as the optimal parameters of the final SVM model. Although this method can, to a certain extent, guarantee the optimal parameter combination within the given parameter space, its search efficiency drops sharply as the parameter space grows; in particular, setting reasonable intervals and step values is very difficult, which greatly reduces feasibility, and the model very easily falls into a local optimum. In the second method, although gradient descent can overcome the shortcomings of grid search, it is very sensitive to the initial value; especially when the initial parameter setting is far from the optimal solution, the model easily converges to a local optimum.
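The exhaustive grid search described above can be sketched as follows. This is an illustrative sketch, not the patent's method; `evaluate` is a hypothetical stand-in for training an SVM with a candidate (C, γ) pair and measuring its cross-validated accuracy.

```python
import itertools

def evaluate(C, gamma):
    # Hypothetical smooth score that peaks near C = 10, gamma = 0.1,
    # standing in for cross-validated SVM accuracy.
    return -((C - 10.0) ** 2 / 100.0 + (gamma - 0.1) ** 2 * 100.0)

# The "interval and step" the text mentions: a fixed log-scale grid.
C_grid = [2.0 ** k for k in range(-2, 9)]
gamma_grid = [2.0 ** k for k in range(-8, 3)]

# Evaluate every grid node and keep the best-scoring pair.
best_C, best_gamma = max(itertools.product(C_grid, gamma_grid),
                         key=lambda cg: evaluate(*cg))
print(best_C, best_gamma)
```

The cost is one training run per grid node, which is why the text notes that efficiency collapses as the parameter space grows.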
In recent years, meta-heuristic search algorithms have received extensive attention from both academia and industry because of their unique global optimization capability; they are generally considered to have a greater chance of finding the global optimal solution than conventional optimization methods. Various meta-heuristic SVM training algorithms have therefore been proposed to deal with the parameter optimization problem.
In concrete applications of the SVM, its performance is mainly affected by the kernel function, with options including the linear kernel, the polynomial kernel, the radial basis function (RBF) kernel, the sigmoid kernel, and so on; generally, an SVM based on the RBF kernel is selected. The RBF-kernel SVM mainly involves two important parameters, C and γ. C is a penalty factor that controls the degree of punishment of misclassified samples and balances the training error against model complexity; the smaller the value of C, the smaller the penalty for misjudged samples in the data, so the training error grows and with it the structural risk. Conversely, the larger the value of C, the stronger the constraint on misclassified samples, which can make the model's misjudgment rate on the training data low while its overall generalization ability is poor, that is, prone to "overfitting". The parameter γ is the kernel width of the RBF kernel; it determines the width of the kernel function and directly affects the performance of the SVM. If γ is not chosen properly, the SVM can hardly achieve the desired learning effect: too large a value of γ tends to cause overfitting, while too small a value of γ makes the SVM's discrimination function too flat. The penalty factor C and the kernel width γ affect the SVM's classification hyperplane from different angles; in practical applications, values that are too large or too small degrade the SVM's generalization performance.
However, when the existing meta-heuristic search algorithms are applied to the SVM parameter optimization problem, the convergence speed and convergence accuracy of the algorithms still need further improvement, as does their ability to escape local optima, so that a better global approximate optimal solution can be found.
Disclosure of Invention
The invention aims to provide a method for constructing a prediction model based on an improved sine and cosine algorithm, which can effectively improve the convergence speed and convergence accuracy of the algorithm, strengthen its ability to escape local optima, and find a better global approximate optimal solution so as to obtain an SVM (support vector machine) model with higher classification accuracy.
In order to achieve the above object, the present invention provides a method for constructing a prediction model based on an improved sine and cosine algorithm, comprising the following steps:
step S1, acquiring sample data and normalizing the acquired sample data;
step S2, optimizing a penalty factor C and a kernel width gamma of the support vector machine by using an improved sine and cosine algorithm, specifically:
s2.1, initializing parameters; the initialized parameters comprise: the maximum number of iterations T, the current iteration number t, the population size N, the search space upper boundary ub, the search space lower boundary lb, the optimal point position Pbest, the search space [Cmin, Cmax] of the penalty factor C, the search space [γmin, γmax] of the kernel width γ, the position X_i^t of the current solution in the i-th dimension at the t-th iteration, the random numbers r1, r2, r3, and the target position P_i in the i-th dimension;
s2.2, randomly initializing the positions of N points, wherein the position of the i-th individual in the population is M_i = (M_i1, M_i2), i = 1, 2, ..., N; M_i1 represents the penalty factor C at the current position of the individual, and M_i2 represents the kernel width γ at the current position;
s2.3, performing iteration according to the formula (1);
X_i^{t+1} = X_i^t + r1 · sin(r2) · |r3 · P_i^t − X_i^t|, if r4 < 0.5
X_i^{t+1} = X_i^t + r1 · cos(r2) · |r3 · P_i^t − X_i^t|, if r4 ≥ 0.5    (1)
wherein r4 is a random number in [0, 1];
as shown in equation (1), there are four key parameters: r1, r2, r3 and r4. r1 determines the region of the next position, which may lie between the current solution and the target position or outside them; r2 defines how far the movement goes toward the target position or in the opposite direction; r3 assigns a random weight to the target position so as to randomly emphasize or de-emphasize its effect on the distance; r4 switches equation (1) between the sine and cosine forms. In addition, r2 takes the value 2π × random and r3 takes the value 2 × random;
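The position update of equation (1) can be sketched in code. This is a minimal sketch assuming a 2-D search point (C, γ); `pbest` plays the role of the target position P_i, and r2 = 2π × random, r3 = 2 × random as stated above.

```python
import numpy as np

rng = np.random.default_rng(42)

def sca_step(x, pbest, r1):
    # Draw the per-dimension random parameters of equation (1).
    r2 = 2.0 * np.pi * rng.random(x.shape)
    r3 = 2.0 * rng.random(x.shape)
    r4 = rng.random(x.shape)
    sin_move = x + r1 * np.sin(r2) * np.abs(r3 * pbest - x)
    cos_move = x + r1 * np.cos(r2) * np.abs(r3 * pbest - x)
    # r4 < 0.5 selects the sine branch, otherwise the cosine branch.
    return np.where(r4 < 0.5, sin_move, cos_move)

x_next = sca_step(np.array([1.0, 0.5]), np.array([10.0, 0.1]), r1=2.0)
print(x_next.shape)
```

In the full method this step would be applied to every individual M_i at every iteration, with r1 decaying as in equation (3).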
s2.4, judging whether this is the first iteration; if so, directly calculating the fitness f_i of each point M_i; otherwise, merging the latest population positions M with the historical optimal population positions F, calculating the fitness f_i, sorting the points by fitness f_i from largest to smallest, and selecting the first N population positions as the historical optimal positions F;
s2.5, from the points whose fitness exceeds that of the current optimal point, selecting the one with the highest fitness, replacing the optimal point Pbest with it, and assigning its position to the optimal population position Best_pos;
wherein the fitness f_i of each point i is the accuracy ACC of the support vector machine, calculated with an internal K-fold cross-validation strategy according to equation (2), based on the penalty factor C and kernel width γ at the current position of individual i;
ACC = (1/K) · Σ_{k=1}^{K} acc_k    (2)
wherein acc_k represents the accuracy obtained on the k-th fold of data;
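The fitness of equation (2) is the mean accuracy over K folds, which can be sketched as below. `train_and_score` is a hypothetical callback standing in for "train an SVM with the candidate (C, γ) on K−1 folds and score the held-out fold" (the acc_k of the text).

```python
import numpy as np

def kfold_fitness(n_samples, K, train_and_score):
    idx = np.arange(n_samples)
    folds = np.array_split(idx, K)
    # acc_k for each fold: train on the complement, score the fold.
    accs = [train_and_score(np.setdiff1d(idx, fold), fold) for fold in folds]
    # Equation (2): ACC = (1/K) * sum(acc_k)
    return float(np.mean(accs))

# Usage with a dummy scorer that always reports 0.9 accuracy.
acc = kfold_fitness(10, K=5, train_and_score=lambda train, test: 0.9)
print(acc)  # 0.9
```

In the method, this value is the fitness f_i assigned to individual i, so better (C, γ) pairs attract the population.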
step S2.6, for a stable balance of exploitation and exploration, the sine and cosine range in equation (1) is adaptively modified using the following equation:
r1 = a − t · (a / T)    (3)
wherein t is the current iteration, T is the maximum number of iterations, and a is a constant;
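Equation (3) decays r1 linearly from a down to 0 over the run, shrinking the sine/cosine amplitude so the search shifts from exploration to exploitation. A minimal sketch (a = 2 follows the constant stated in the embodiment):

```python
def r1_schedule(t, T, a=2.0):
    # Equation (3): r1 = a - t * (a / T)
    return a - t * (a / T)

vals = [r1_schedule(t, T=100) for t in (0, 50, 100)]
print(vals)  # [2.0, 1.0, 0.0]
```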
s2.7, judging whether the maximum number of iterations T has been exceeded; if not, jumping to step S2.4; if so, executing the next step S2.8;
s2.8, outputting the position Best_pos of the optimal point Pbest and the fitness corresponding to Best_pos, namely the optimal penalty factor C and kernel width γ;
step S3, based on the obtained penalty factor C and kernel width γ, using the normalized data to construct the prediction model shown in equation (4) below, and classifying and predicting the samples to be classified based on the constructed prediction model; wherein K(x_i, x_j) is shown in equation (5); x_j represents the j-th normalized sample; x_i (i = 1, ..., l) represents a training sample; y_i (i = 1, ..., l) represents the label corresponding to the training sample, y_i = 1 representing a positive-class sample and y_i = −1 representing a negative-class sample; b is a threshold and α_i is a Lagrange coefficient;
f(x_j) = sgn( Σ_{i=1}^{l} α_i · y_i · K(x_i, x_j) + b )    (4)
K(x_i, x_j) = exp(−γ · ||x_i − x_j||²)    (5).
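Equations (4) and (5) can be sketched together. The support vectors, alpha, y and b below are illustrative placeholders; in the method they would come from solving the SVM dual with the optimized (C, γ).

```python
import numpy as np

def rbf(xi, xj, gamma):
    # Equation (5): K(x_i, x_j) = exp(-gamma * ||x_i - x_j||^2)
    return np.exp(-gamma * np.sum((np.asarray(xi) - np.asarray(xj)) ** 2))

def decide(x, support, alpha, y, b, gamma):
    # Equation (4): sign of the kernel expansion plus the threshold b.
    s = sum(a * yi * rbf(xi, x, gamma) for a, yi, xi in zip(alpha, y, support))
    return 1 if s + b >= 0 else -1

support = [[0.0, 0.0], [1.0, 1.0]]
pred = decide([0.1, 0.1], support, alpha=[1.0, 1.0], y=[1, -1], b=0.0, gamma=0.5)
print(pred)  # 1, since the query sits closer to the positive support vector
```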
Further, step S2.4 also includes the following steps:
step S2.4.1, introducing the PSO-SCA algorithm, which combines the exploitation capability of PSO with the exploration capability of SCA; in each iteration, the fitness value of each search agent in the current population is compared with the best value found in the iteration, and all better solutions are stored in matrix form; each solution also tracks the best value obtained so far by any nearby solution in order to approach the global optimum; the core formula is as follows:
[Equation (6), the PSO-SCA hybrid position update, appears only as an image in the source and is not reproduced here.]
wherein X_i^t is the position of the current solution in the i-th dimension at the t-th iteration, r1, r2, r3 are random numbers, P_i is the target position in the i-th dimension, and r4 is a random number in [0, 1];
step S2.4.2, evolving the PSO-SCA result through a DE (differential evolution) algorithm; if the fitness of the evolved result is better than the fitness value of the original individual, keeping the evolved individual;
s2.4.3, mutating the current optimal point with the Gaussian, Cauchy, and Lévy distributions, finding the minimum among the three mutation results, and updating the fitness value and the corresponding point;
X=min{X_m_Levy,X_m_gaus,X_m_cauchy} (7)
wherein X_m_Levy, X_m_gaus, and X_m_cauchy are the mutated values obtained by Lévy, Gaussian, and Cauchy mutation, respectively.
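Step S2.4.3 can be sketched as follows, assuming the objective f is minimized. Equation (7) keeps the lowest-objective mutant of the three; keeping the incumbent as well is an added safeguard beyond equation (7), and the Lévy sampler uses Mantegna's algorithm, one common choice the patent does not specify.

```python
import numpy as np
from math import gamma as gamma_fn, pi, sin

rng = np.random.default_rng(7)

def levy(shape, beta=1.5):
    # Mantegna's algorithm for Levy-stable step sizes (assumed sampler).
    sigma = (gamma_fn(1 + beta) * sin(pi * beta / 2)
             / (gamma_fn((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))) ** (1 / beta)
    u = rng.normal(0.0, sigma, shape)
    v = rng.normal(0.0, 1.0, shape)
    return u / np.abs(v) ** (1 / beta)

def mutate_best(x_best, f):
    candidates = [x_best + rng.standard_normal(x_best.shape),  # Gaussian mutant
                  x_best + rng.standard_cauchy(x_best.shape),  # Cauchy mutant
                  x_best + levy(x_best.shape),                 # Levy mutant
                  x_best]                                      # incumbent safeguard
    # Equation (7): keep the minimum of the mutation results.
    return min(candidates, key=f)

sphere = lambda x: float(np.sum(x ** 2))
x_new = mutate_best(np.array([3.0, 4.0]), sphere)
print(sphere(x_new) <= 25.0)  # True: never worse than the incumbent
```

The heavy-tailed Cauchy and Lévy steps give occasional long jumps, which is what lets the optimizer escape local optima, while the Gaussian step refines locally.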
The invention has the beneficial effects that:
the PSO algorithm is reasonably combined in the optimization process of the Sine and Cosine Algorithm (SCA), the DE and the variation mechanism are added at proper positions to realize the optimization of the punishment factor C and the kernel width gamma code of the SVM into individual positions, the K-fold cross validation is adopted in the optimization process to prevent the sine and cosine algorithm from falling into local extreme values, a more efficient and accurate intelligent model can be obtained, the point diversity is increased, the searching capability of the algorithm is enhanced, the algorithm can be prevented from falling into local optimum, the global optimum solution is quickly found, and therefore a more accurate prediction effect can be obtained, and a decision maker can be effectively assisted to make scientific and reasonable decisions.
Drawings
Fig. 1 is a flowchart of a method for constructing a prediction model based on an improved sine and cosine algorithm according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in further detail with reference to the accompanying drawings.
As shown in fig. 1, a method for constructing a prediction model based on an improved sine and cosine algorithm includes the following steps:
Step S1, acquiring sample data and normalizing the acquired sample data.
Specifically, the sample data may come from many different fields and can be chosen according to actual needs, for example the medical field or the financial field, and its attributes divide into data attributes and a category attribute. For example, for breast cancer disease data, the attribute values of a single sample fall into two broad categories: the data attributes X1-X9 represent relevant medical and pathological attributes of the breast cancer disease, and X10 represents the category of the data sample, i.e., whether the sample suffers from breast cancer; if the sample has breast cancer, the value is 1, and if the sample is healthy, the value is -1. As another example, for a single sample of enterprise bankruptcy-risk prediction data, the attribute indexes X1-Xn may be related financial indicators such as the debt ratio and total assets, and Xn+1 is the category label: whether the enterprise is at risk of bankruptcy within two years, 1 if at risk of bankruptcy and -1 if not.
For convenience of data processing, normalization processing is performed on the acquired sample data.
Step S2, optimizing a penalty factor C and a kernel width gamma of the support vector machine by using an improved sine and cosine algorithm, specifically:
s2.1, initializing parameters; the initialized parameters comprise: the maximum number of iterations T, the current iteration number t, the population size N, the search space upper boundary ub, the search space lower boundary lb, the optimal point position Pbest, the search space [Cmin, Cmax] of the penalty factor C, the search space [γmin, γmax] of the kernel width γ, the position X_i^t of the current solution in the i-th dimension at the t-th iteration, the random numbers r1, r2, r3, and the target position P_i in the i-th dimension;
s2.2, randomly initializing the positions of N points, wherein the position of the i-th individual in the population is M_i = (M_i1, M_i2), i = 1, 2, ..., N; M_i1 represents the penalty factor C at the current position of the individual, and M_i2 represents the kernel width γ at the current position;
s2.3, performing iteration according to the formula (1);
X_i^{t+1} = X_i^t + r1 · sin(r2) · |r3 · P_i^t − X_i^t|, if r4 < 0.5
X_i^{t+1} = X_i^t + r1 · cos(r2) · |r3 · P_i^t − X_i^t|, if r4 ≥ 0.5    (1)
wherein r4 is a random number in [0, 1];
as shown in equation (1), there are four key parameters: r1, r2, r3 and r4. r1 determines the region (or direction of movement) of the next position, which may lie between the current solution and the target position or outside them; r2 defines how far the movement goes toward the target position or in the opposite direction; r3 assigns a random weight to the target position so as to randomly emphasize (r3 > 1) or de-emphasize (r3 < 1) its effect on the distance; r4 switches equation (1) between the sine and cosine forms. In addition, r2 takes the value 2π × random and r3 takes the value 2 × random;
s2.4, judging whether this is the first iteration; if so, directly calculating the fitness f_i of each point M_i; otherwise, merging the latest population positions M with the historical optimal population positions F, calculating the fitness f_i, sorting the points by fitness f_i from largest to smallest, and selecting the first N population positions as the historical optimal positions F;
s2.5, from the points whose fitness exceeds that of the current optimal point, selecting the one with the highest fitness, replacing the optimal point Pbest with it, and assigning its position to the optimal population position Best_pos;
wherein the fitness f_i of each point i is the accuracy ACC of the support vector machine, calculated with an internal K-fold cross-validation strategy according to equation (2), based on the penalty factor C and kernel width γ at the current position of individual i;
ACC = (1/K) · Σ_{k=1}^{K} acc_k    (2)
wherein acc_k represents the accuracy obtained on the k-th fold of data;
step S2.6, for a stable balance of exploitation and exploration, the sine and cosine range in equation (1) is adaptively modified using the following equation:
r1 = a − t · (a / T)    (3)
wherein t is the current iteration, T is the maximum number of iterations, and a is a constant (here taken to be 2);
s2.7, judging whether the maximum number of iterations T has been exceeded; if not, jumping to step S2.4; if so, executing the next step S2.8;
s2.8, outputting the position Best_pos of the optimal point Pbest and the fitness corresponding to Best_pos, namely the optimal penalty factor C and kernel width γ;
step S3, based on the obtained penalty factor C and kernel width γ, using the normalized data to construct the prediction model shown in equation (4) below, and classifying and predicting the samples to be classified based on the constructed prediction model; wherein K(x_i, x_j) is shown in equation (5); x_j represents the j-th normalized sample; x_i (i = 1, ..., l) represents a training sample; y_i (i = 1, ..., l) represents the label corresponding to the training sample, y_i = 1 representing a positive-class sample and y_i = −1 representing a negative-class sample; b is a threshold and α_i is a Lagrange coefficient;
f(x_j) = sgn( Σ_{i=1}^{l} α_i · y_i · K(x_i, x_j) + b )    (4)
K(x_i, x_j) = exp(−γ · ||x_i − x_j||²)    (5).
Further, step S2.4 also includes the following steps:
step S2.4.1, introducing the PSO-SCA algorithm, which combines the exploitation capability of PSO with the exploration capability of SCA; in each iteration, the fitness value of each search agent in the current population is compared with the best value found in the iteration, and all better solutions are stored in matrix form; each solution also tracks the best value obtained so far by any nearby solution in order to approach the global optimum; the core formula is as follows:
[Equation (6), the PSO-SCA hybrid position update, appears only as an image in the source and is not reproduced here.]
wherein X_i^t is the position of the current solution in the i-th dimension at the t-th iteration, r1, r2, r3 are random numbers, P_i is the target position in the i-th dimension, and r4 is a random number in [0, 1];
step S2.4.2, evolving the PSO-SCA result through a DE (differential evolution) algorithm; if the fitness of the evolved result is better than the fitness value of the original individual, keeping the evolved individual;
s2.4.3, mutating the current optimal point with the Gaussian, Cauchy, and Lévy distributions, finding the minimum among the three mutation results, and updating the fitness value and the corresponding point;
X=min{X_m_Levy,X_m_gaus,X_m_cauchy} (7)
wherein X_m_Levy, X_m_gaus, and X_m_cauchy are the mutated values obtained by Lévy, Gaussian, and Cauchy mutation, respectively.
Application examples
Using breast cancer data as sample data, the sample set is represented as (x_i, y_i), i = 1, ..., 699, wherein x_i denotes a 9-dimensional feature vector and y_i is a sample label taking the value 1 or −1: '1' denotes that the sample has breast cancer and '−1' denotes that the patient is healthy.
Firstly, each feature attribute value of the sample data to be tested is standardized, and the sample data is normalized using the formula
S_i' = (S_i − S_min) / (S_max − S_min)
wherein S_i represents the raw value of a feature attribute in a sample, S_i' is the normalized value of S_i obtained from the formula, S_min represents the minimum value in the corresponding sample data, and S_max represents the maximum value in the corresponding sample data.
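The min-max normalization above, applied per feature attribute (column-wise), can be sketched as:

```python
import numpy as np

def normalize(X):
    # S' = (S - S_min) / (S_max - S_min), computed per column,
    # mapping every feature into [0, 1].
    lo = X.min(axis=0)
    hi = X.max(axis=0)
    return (X - lo) / (hi - lo)

X = np.array([[1.0, 10.0],
              [3.0, 30.0],
              [2.0, 20.0]])
Xn = normalize(X)
print(Xn[:, 0])  # [0.  1.  0.5]
```

Note that a column with constant values would make the denominator zero; real data would need a guard for that case.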
Then, the improved sine and cosine algorithm is used to optimize the penalty coefficient C and kernel width γ of the support vector machine, with a K-fold cross-validation strategy inside the support vector machine (that is, the samples fed into the model are cut into K folds; each time, K−1 folds are used as training data, and the two key parameters are optimized with the improved sine and cosine algorithm during training in the hope of obtaining the optimal intelligent classification model; after the model is constructed, the remaining data are used as test data to evaluate the performance of the constructed intelligent decision model). In short, for different intelligent classification decision problems, an improved sine and cosine algorithm with global search capability needs to be adopted to construct the classification decision model best suited to such problems. As discussed previously, the penalty coefficient C and kernel width γ have an important influence on the performance of the model, and the quality of these two parameters directly determines the quality of the decision model, so an improved sine and cosine algorithm is proposed to complete the selection of these two parameters, improving the traditional algorithm, skipping local extreme points, and raising the convergence speed and accuracy of the algorithm to a certain extent.
The training samples (x_i, y_i) are input, and the problem, optimized according to the Lagrange dual problem, becomes:
max_α  Σ_{i=1}^{699} α_i − (1/2) · Σ_{i=1}^{699} Σ_{j=1}^{699} α_i α_j y_i y_j K(x_i, x_j)
s.t.  Σ_{i=1}^{699} α_i y_i = 0,  0 ≤ α_i ≤ C,  i = 1, ..., 699
Then, for the above optimization problem, the improved sine and cosine algorithm is adopted to optimize C and γ (γ being the kernel width of the radial basis kernel function K(x_i, x_j) = exp(−γ · ||x_i − x_j||²)), and the optimal solution is obtained:
α* = (α_1*, α_2*, ..., α_699*)^T
then the following solution is given:
b* = y_j − Σ_{i=1}^{699} α_i* y_i K(x_i, x_j), for any j with 0 < α_j* < C
the final optimal classification hyperplane function is then:
f(x) = sgn( Σ_{i=1}^{699} α_i* y_i K(x_i, x) + b* )
the embodiment of the invention has the following beneficial effects:
the invention combines the PSO algorithm in the optimization process of the Sine and Cosine Algorithm (SCA), adds DE and a variation mechanism at a proper position to realize that the punishment factor C and the kernel width gamma code of the SVM are optimized as an individual position, adopts K-fold cross validation in the optimization process to prevent the sine and cosine algorithm from falling into a local extreme value, can obtain a more efficient and accurate intelligent model, not only increases the point diversity and enhances the searching capability of the algorithm, but also can prevent the algorithm from falling into the local optimum and quickly find out the global optimum solution, thereby obtaining more accurate prediction effect and more effectively assisting a decision maker to make scientific and reasonable decisions.
It will be understood by those skilled in the art that all or part of the steps in the method for implementing the above embodiments may be implemented by relevant hardware instructed by a program, and the program may be stored in a computer-readable storage medium, such as ROM/RAM, magnetic disk, optical disk, etc.
While the invention has been described in connection with what is presently considered to be the most practical and preferred embodiment, it is to be understood that the invention is not to be limited to the disclosed embodiment, but on the contrary, is intended to cover various modifications and equivalent arrangements included within the spirit and scope of the appended claims.

Claims (2)

1. A method for constructing a prediction model based on an improved sine and cosine algorithm is characterized by comprising the following steps:
step S1, acquiring sample data and normalizing the acquired sample data;
step S2, optimizing a penalty factor C and a kernel width gamma of the support vector machine by using an improved sine and cosine algorithm, specifically:
s2.1, initializing parameters; the initialized parameters comprise: the maximum number of iterations T, the current iteration number t, the population size N, the search space upper boundary ub, the search space lower boundary lb, the optimal point position Pbest, the search space [Cmin, Cmax] of the penalty factor C, the search space [γmin, γmax] of the kernel width γ, the position X_i^t of the current solution in the i-th dimension at the t-th iteration, the random numbers r1, r2, r3, and the target position P_i in the i-th dimension;
s2.2, randomly initializing the positions of N points, wherein the position of the i-th individual in the population is M_i = (M_i1, M_i2), i = 1, 2, ..., N; M_i1 represents the penalty factor C at the current position of the individual, and M_i2 represents the kernel width γ at the current position;
s2.3, performing iteration according to the formula (1);
X_i^{t+1} = X_i^t + r1 · sin(r2) · |r3 · P_i^t − X_i^t|, if r4 < 0.5
X_i^{t+1} = X_i^t + r1 · cos(r2) · |r3 · P_i^t − X_i^t|, if r4 ≥ 0.5    (1)
wherein r4 is a random number in [0, 1];
as shown in equation (1), there are four key parameters: r1, r2, r3 and r4. r1 determines the region of the next position, which may lie between the current solution and the target position or outside them; r2 defines how far the movement goes toward the target position or in the opposite direction; r3 assigns a random weight to the target position so as to randomly emphasize or de-emphasize its effect on the distance; r4 switches equation (1) between the sine and cosine forms. In addition, r2 takes the value 2π × random and r3 takes the value 2 × random;
s2.4, judging whether this is the first iteration; if so, directly calculating the fitness f_i of each point M_i; otherwise, merging the latest population positions M with the historical optimal population positions F, calculating the fitness f_i, sorting the points by fitness f_i from largest to smallest, and selecting the first N population positions as the historical optimal positions F;
s2.5, from the points whose fitness exceeds that of the current optimal point, selecting the one with the highest fitness, replacing the optimal point Pbest with it, and assigning its position to the optimal population position Best_pos;
wherein the fitness f_i of each point i is the accuracy ACC of the support vector machine, calculated with an internal K-fold cross-validation strategy according to equation (2), based on the penalty factor C and kernel width γ at the current position of individual i;
ACC = (1/K) · Σ_{k=1}^{K} acc_k    (2)
wherein acc_k represents the accuracy obtained on the k-th fold of data;
step S2.6, for a stable balance of exploitation and exploration, the sine and cosine range in equation (1) is adaptively modified using the following equation:
r1 = a − t · (a / T)    (3)
wherein t is the current iteration, T is the maximum number of iterations, and a is a constant;
s2.7, judging whether the maximum number of iterations T has been exceeded; if not, jumping to step S2.4; if so, executing the next step S2.8;
s2.8, outputting the position Best_pos of the optimal point Pbest and the fitness corresponding to Best_pos, namely the optimal penalty factor C and kernel width γ;
Step S3: based on the obtained penalty factor C and kernel width γ, construct the prediction model shown in the following formula (4) using the normalized data, and classify the samples to be predicted with the constructed model; wherein K(x_i, x_j) is given by formula (5); x_j denotes the j-th normalized sample; x_i (i = 1, ..., l) denotes a training sample; y_i (i = 1, ..., l) denotes the label of the corresponding training sample, with y_i = 1 for a positive-class sample and y_i = −1 for a negative-class sample; b is a threshold and α_i is a Lagrange coefficient;

f(x) = sgn( Σ_{i=1}^{l} α_i · y_i · K(x_i, x) + b )    (4)

K(x_i, x_j) = exp(−γ ‖x_i − x_j‖²)    (5).
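Formulas (4) and (5) together are the standard RBF-kernel SVM decision function. A self-contained sketch of the prediction step, assuming the support vectors, labels, coefficients α_i and threshold b have already been obtained from training (the toy values below are illustrative, not from the patent):

```python
import math

def rbf_kernel(xi, xj, gamma):
    """Formula (5): K(xi, xj) = exp(-gamma * ||xi - xj||^2)."""
    sq = sum((a - b) ** 2 for a, b in zip(xi, xj))
    return math.exp(-gamma * sq)

def svm_predict(x, support, labels, alphas, b, gamma):
    """Formula (4): f(x) = sgn(sum_i alpha_i * y_i * K(x_i, x) + b)."""
    s = sum(a * y * rbf_kernel(xi, x, gamma)
            for a, y, xi in zip(alphas, labels, support)) + b
    return 1 if s >= 0 else -1

# Toy "trained" model: one positive and one negative support vector.
support = [[0.0, 0.0], [1.0, 1.0]]
labels = [1, -1]
alphas = [1.0, 1.0]
pred = svm_predict([0.1, 0.1], support, labels, alphas, b=0.0, gamma=0.5)
```

A query near the positive support vector is classified as positive because its kernel similarity to that vector dominates the sum.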
2. The method according to claim 1, wherein step S2.4 further comprises the steps of:
Step S2.4.1: apply the hybrid PSO-SCA update, which combines the exploitation capability of PSO with the exploration capability of SCA. In each iteration, the fitness value of every search agent in the current population is compared with the best value found so far, and all better solutions are stored in matrix form; each solution also tracks the best value obtained so far among nearby solutions so as to approach the globally optimal solution. The core formula is as follows:
X_i^{t+1} = X_i^t + r1 · sin(r2) · | r3 · P_i − X_i^t |,  if r4 < 0.5
X_i^{t+1} = X_i^t + r1 · cos(r2) · | r3 · P_i − X_i^t |,  if r4 ≥ 0.5    (6)

wherein X_i^t is the position of the current solution in the i-th dimension at the t-th iteration, r1, r2 and r3 are random numbers, P_i is the target position in the i-th dimension, and r4 is a random number in [0, 1];
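A minimal sketch of one position update per formula (6). The ranges used for r2 and r3 (r2 ∈ [0, 2π], r3 ∈ [0, 2]) follow the usual sine-cosine algorithm convention, since the patent only calls them random numbers; `sca_update` is an illustrative name:

```python
import math
import random

def sca_update(x, p, r1, rng):
    """One update of formula (6) for a single solution.

    x:  current position (list of floats)
    p:  target position P (the best solution found so far)
    r1: adaptive amplitude from formula (3)
    r2, r3, r4 are drawn fresh for every dimension.
    """
    new = []
    for xi, pi in zip(x, p):
        r2 = rng.uniform(0.0, 2.0 * math.pi)
        r3 = rng.uniform(0.0, 2.0)
        r4 = rng.random()
        step = abs(r3 * pi - xi)
        if r4 < 0.5:
            new.append(xi + r1 * math.sin(r2) * step)  # sine branch
        else:
            new.append(xi + r1 * math.cos(r2) * step)  # cosine branch
    return new

rng = random.Random(0)
x_next = sca_update([0.2, 0.8], [0.5, 0.5], r1=1.5, rng=rng)
```

Note that as r1 decays to 0 via formula (3), the update leaves the position unchanged, which is exactly the intended late-stage exploitation behaviour.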
Step S2.4.2: evolve the PSO-SCA result with the differential evolution (DE) algorithm; if the fitness of the evolved individual is better than that of the original individual, the evolved individual is retained;
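The patent does not specify which DE variant is used; the sketch below assumes the common DE/rand/1/bin scheme with the greedy selection the claim describes (an evolved individual replaces its parent only if its fitness is better). Function and parameter names are illustrative:

```python
import random

def de_evolve(pop, fitness, F=0.5, CR=0.9, rng=None):
    """One DE/rand/1/bin generation (an assumed variant) with greedy
    selection: keep the trial vector only if its fitness is lower."""
    rng = rng or random.Random()
    n, dim = len(pop), len(pop[0])
    out = []
    for i in range(n):
        # Three distinct donors, all different from the current individual.
        a, b, c = rng.sample([j for j in range(n) if j != i], 3)
        jrand = rng.randrange(dim)  # force at least one mutated dimension
        trial = [pop[a][j] + F * (pop[b][j] - pop[c][j])
                 if (rng.random() < CR or j == jrand) else pop[i][j]
                 for j in range(dim)]
        out.append(trial if fitness(trial) < fitness(pop[i]) else pop[i])
    return out

sphere = lambda x: sum(v * v for v in x)  # toy minimization objective
pop = [[0.1 * i, 1.0 - 0.1 * i] for i in range(5)]
new_pop = de_evolve(pop, sphere, rng=random.Random(1))
```

By construction no individual can get worse, which matches the claim's "keep the evolved individual only if its fitness is superior".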
Step S2.4.3: mutate the current optimal point using the Gaussian, Cauchy and Lévy distributions, take the minimum of the three mutation results, and update the fitness value and the corresponding point accordingly:

X = min{X_m_Levy, X_m_gaus, X_m_cauchy}    (7)

wherein X_m_Levy, X_m_gaus and X_m_cauchy are the candidate solutions obtained by Lévy, Gaussian and Cauchy mutation, respectively.
CN201911106482.8A 2019-11-13 2019-11-13 Method for constructing prediction model based on improved sine and cosine algorithm Pending CN111079074A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911106482.8A CN111079074A (en) 2019-11-13 2019-11-13 Method for constructing prediction model based on improved sine and cosine algorithm

Publications (1)

Publication Number Publication Date
CN111079074A true CN111079074A (en) 2020-04-28

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111474850A (en) * 2020-05-25 2020-07-31 南昌航空大学 PID (proportion integration differentiation) hydraulic leveling system control method based on improved sine and cosine algorithm
CN111611727A (en) * 2020-06-23 2020-09-01 南昌航空大学 Optimal design method for ensuring motion reliability of cam mechanism
CN112200224A (en) * 2020-09-23 2021-01-08 温州大学 Medical image feature processing method and device
CN112485652A (en) * 2020-12-09 2021-03-12 电子科技大学 Analog circuit single fault diagnosis method based on improved sine and cosine algorithm

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108462711A (en) * 2018-03-22 2018-08-28 Jiangnan University An intrusion detection method based on cosine time-varying PSO-SVM
CN109948675A (en) * 2019-03-05 2019-06-28 Wenzhou University Method for constructing a prediction model based on a multi-population fruit fly optimization algorithm with an outpost mechanism
CN110222751A (en) * 2019-05-28 2019-09-10 Wenzhou University Method for constructing a prediction model based on a multi-population orthogonal sine and cosine algorithm


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200428