CN111079074A - Method for constructing prediction model based on improved sine and cosine algorithm - Google Patents

Method for constructing prediction model based on improved sine and cosine algorithm

Info

Publication number
CN111079074A
CN111079074A
Authority
CN
China
Prior art keywords
population
optimal
value
fitness
iteration
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911106482.8A
Other languages
Chinese (zh)
Inventor
陈慧灵
乔雪婷
谷至阳
汪鹏君
孙诚
赵学华
刘国民
罗云纲
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wenzhou University
Original Assignee
Wenzhou University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wenzhou University filed Critical Wenzhou University
Priority to CN201911106482.8A
Publication of CN111079074A

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10Complex mathematical operations
    • G06F17/14Fourier, Walsh or analogous domain transformations, e.g. Laplace, Hilbert, Karhunen-Loeve, transforms

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • General Physics & Mathematics (AREA)
  • Pure & Applied Mathematics (AREA)
  • Mathematical Optimization (AREA)
  • Mathematical Analysis (AREA)
  • Computational Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Algebra (AREA)
  • Databases & Information Systems (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention provides a method for constructing a prediction model based on an improved sine and cosine algorithm, which comprises: obtaining sample data and normalizing the obtained sample data; optimizing the penalty factor C and kernel width γ of a support vector machine with an improved sine and cosine algorithm; and constructing a prediction model with the normalized data based on the obtained penalty factor C and kernel width γ, then classifying and predicting the samples to be classified with the constructed prediction model. By implementing the method, the penalty factor and kernel width of the SVM are optimized by the improved sine and cosine algorithm, which effectively improves the convergence speed and convergence accuracy of the algorithm, strengthens its ability to escape local optima, and finds a better global approximate optimal solution, so that an SVM model with higher classification accuracy is obtained.

Description

Method for constructing prediction model based on improved sine and cosine algorithm
Technical Field
The invention relates to the technical field of computers, in particular to a method for constructing a prediction model based on an improved sine and cosine algorithm.
Background
As is well known, science and technology, and computer science in particular, increasingly cross and permeate other fields, changing the way people produce and live. The application fields of big data keep widening, which poses new challenges for the classification, prediction, and other processing of big data, in particular for meta-heuristic optimization algorithms applied to big data classification and prediction.
For Support Vector Machines (SVMs), the two most commonly used parameter optimization methods when constructing prediction models to analyze data are grid search and gradient descent. In the first method, grid search is an exhaustive search: a designated parameter space is divided by setting reasonable upper and lower interval bounds and an interval step, the parameter combination represented by each grid node is then trained and evaluated, and the group of parameters with the highest score in the prediction results is taken as the optimal parameters of the final SVM model. Although this method can, to a certain extent, guarantee the optimal parameter combination within the given parameter space, its search efficiency drops sharply as the parameter space grows; in particular, setting reasonable intervals and step values is very difficult, which greatly reduces feasibility, and the model very easily falls into a local optimum. In the second method, although gradient descent can overcome the shortcomings of grid search, it is very sensitive to the initial value; especially when the initial parameter setting is far from the optimal solution, the model easily converges to a local optimum.
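The exhaustive grid search described above can be sketched as follows. This is an illustrative sketch, not the patent's method; `evaluate` is a hypothetical stand-in for training an SVM with a candidate (C, γ) pair and measuring its cross-validated accuracy.

```python
import itertools

def evaluate(C, gamma):
    # Hypothetical smooth score that peaks near C = 10, gamma = 0.1,
    # standing in for cross-validated SVM accuracy.
    return -((C - 10.0) ** 2 / 100.0 + (gamma - 0.1) ** 2 * 100.0)

# The "interval and step" the text mentions: a fixed log-scale grid.
C_grid = [2.0 ** k for k in range(-2, 9)]
gamma_grid = [2.0 ** k for k in range(-8, 3)]

# Evaluate every grid node and keep the best-scoring pair.
best_C, best_gamma = max(itertools.product(C_grid, gamma_grid),
                         key=lambda cg: evaluate(*cg))
print(best_C, best_gamma)
```

The cost is one training run per grid node, which is why the text notes that efficiency collapses as the parameter space grows.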
In recent years, meta-heuristic search algorithms have received extensive attention from both academia and industry because of their unique global optimization capability; they are generally considered to have a greater chance of finding the global optimal solution than conventional optimization methods. Various meta-heuristic SVM training algorithms have therefore been proposed to deal with the parameter optimization problem.
In concrete applications of the SVM, its performance is mainly affected by the kernel function, with options including the linear kernel, the polynomial kernel, the radial basis function (RBF) kernel, the sigmoid kernel, and so on; generally, an SVM based on the RBF kernel is selected. The RBF-kernel SVM mainly involves two important parameters, C and γ. C is a penalty factor that controls the degree of punishment of misclassified samples and balances the training error against model complexity; the smaller the value of C, the smaller the penalty for misjudged samples in the data, so the training error grows and with it the structural risk. Conversely, the larger the value of C, the stronger the constraint on misclassified samples, which can make the model's misjudgment rate on the training data low while its overall generalization ability is poor, that is, prone to "overfitting". The parameter γ is the kernel width of the RBF kernel; it determines the width of the kernel function and directly affects the performance of the SVM. If γ is not chosen properly, the SVM can hardly achieve the desired learning effect: too large a value of γ tends to cause overfitting, while too small a value of γ makes the SVM's discrimination function too flat. The penalty factor C and the kernel width γ affect the SVM's classification hyperplane from different angles; in practical applications, values that are too large or too small degrade the SVM's generalization performance.
However, when the existing meta-heuristic search algorithms are applied to the SVM parameter optimization problem, the convergence speed and convergence accuracy of the algorithms still need further improvement, as does their ability to escape local optima, so that a better global approximate optimal solution can be found.
Disclosure of Invention
The invention aims to provide a method for constructing a prediction model based on an improved sine and cosine algorithm, which can effectively improve the convergence speed and convergence accuracy of the algorithm, strengthen its ability to escape local optima, and find a better global approximate optimal solution so as to obtain an SVM (support vector machine) model with higher classification accuracy.
In order to achieve the above object, the present invention provides a method for constructing a prediction model based on an improved sine and cosine algorithm, comprising the following steps:
step S1, acquiring sample data and normalizing the acquired sample data;
step S2, optimizing a penalty factor C and a kernel width gamma of the support vector machine by using an improved sine and cosine algorithm, specifically:
s2.1, initializing parameters; the initialized parameters comprise: the maximum number of iterations T, the current iteration number t, the population size N, the search space upper boundary ub, the search space lower boundary lb, the optimal point position Pbest, the search space [Cmin, Cmax] of the penalty factor C, the search space [γmin, γmax] of the kernel width γ, the position X_i^t of the current solution in the i-th dimension at the t-th iteration, the random numbers r1, r2, r3, and the target position P_i in the i-th dimension;
s2.2, randomly initializing the positions of N points, wherein the position of the i-th individual in the population is M_i = (M_i1, M_i2), i = 1, 2, ..., N; M_i1 represents the penalty factor C at the current position of the individual, and M_i2 represents the kernel width γ at the current position;
s2.3, performing iteration according to the formula (1);
X_i^{t+1} = X_i^t + r1 · sin(r2) · |r3 · P_i^t − X_i^t|, if r4 < 0.5
X_i^{t+1} = X_i^t + r1 · cos(r2) · |r3 · P_i^t − X_i^t|, if r4 ≥ 0.5    (1)
wherein r4 is a random number in [0, 1];
as shown in equation (1), there are four key parameters: r1, r2, r3 and r4. r1 determines the region of the next position, which may lie between the current solution and the target position or outside them; r2 defines how far the movement goes toward the target position or in the opposite direction; r3 assigns a random weight to the target position so as to randomly emphasize or de-emphasize its effect on the distance; r4 switches equation (1) between the sine and cosine forms. In addition, r2 takes the value 2π × random and r3 takes the value 2 × random;
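The position update of equation (1) can be sketched in code. This is a minimal sketch assuming a 2-D search point (C, γ); `pbest` plays the role of the target position P_i, and r2 = 2π × random, r3 = 2 × random as stated above.

```python
import numpy as np

rng = np.random.default_rng(42)

def sca_step(x, pbest, r1):
    # Draw the per-dimension random parameters of equation (1).
    r2 = 2.0 * np.pi * rng.random(x.shape)
    r3 = 2.0 * rng.random(x.shape)
    r4 = rng.random(x.shape)
    sin_move = x + r1 * np.sin(r2) * np.abs(r3 * pbest - x)
    cos_move = x + r1 * np.cos(r2) * np.abs(r3 * pbest - x)
    # r4 < 0.5 selects the sine branch, otherwise the cosine branch.
    return np.where(r4 < 0.5, sin_move, cos_move)

x_next = sca_step(np.array([1.0, 0.5]), np.array([10.0, 0.1]), r1=2.0)
print(x_next.shape)
```

In the full method this step would be applied to every individual M_i at every iteration, with r1 decaying as in equation (3).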
s2.4, judging whether this is the first iteration; if so, directly calculating the fitness f_i of each point M_i; otherwise, merging the latest population positions M with the historical optimal population positions F, calculating the fitness f_i, sorting the points by fitness f_i from largest to smallest, and selecting the first N population positions as the historical optimal positions F;
s2.5, from the points whose fitness exceeds that of the current optimal point, selecting the one with the highest fitness, replacing the optimal point Pbest with it, and assigning its position to the optimal population position Best_pos;
wherein the fitness f_i of each point i is the accuracy ACC of the support vector machine, calculated with an internal K-fold cross-validation strategy according to equation (2), based on the penalty factor C and kernel width γ at the current position of individual i;
ACC = (1/K) · Σ_{k=1}^{K} acc_k    (2)
wherein acc_k represents the accuracy obtained on the k-th fold of data;
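The fitness of equation (2) is the mean accuracy over K folds, which can be sketched as below. `train_and_score` is a hypothetical callback standing in for "train an SVM with the candidate (C, γ) on K−1 folds and score the held-out fold" (the acc_k of the text).

```python
import numpy as np

def kfold_fitness(n_samples, K, train_and_score):
    idx = np.arange(n_samples)
    folds = np.array_split(idx, K)
    # acc_k for each fold: train on the complement, score the fold.
    accs = [train_and_score(np.setdiff1d(idx, fold), fold) for fold in folds]
    # Equation (2): ACC = (1/K) * sum(acc_k)
    return float(np.mean(accs))

# Usage with a dummy scorer that always reports 0.9 accuracy.
acc = kfold_fitness(10, K=5, train_and_score=lambda train, test: 0.9)
print(acc)  # 0.9
```

In the method, this value is the fitness f_i assigned to individual i, so better (C, γ) pairs attract the population.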
step S2.6, for a stable balance of exploitation and exploration, the sine and cosine range in equation (1) is adaptively modified using the following equation:
r1 = a − t · (a / T)    (3)
wherein t is the current iteration, T is the maximum number of iterations, and a is a constant;
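Equation (3) decays r1 linearly from a down to 0 over the run, shrinking the sine/cosine amplitude so the search shifts from exploration to exploitation. A minimal sketch (a = 2 follows the constant stated in the embodiment):

```python
def r1_schedule(t, T, a=2.0):
    # Equation (3): r1 = a - t * (a / T)
    return a - t * (a / T)

vals = [r1_schedule(t, T=100) for t in (0, 50, 100)]
print(vals)  # [2.0, 1.0, 0.0]
```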
s2.7, judging whether the maximum number of iterations T has been exceeded; if not, jumping to step S2.4; if so, executing the next step S2.8;
s2.8, outputting the position Best_pos of the optimal point Pbest and the fitness corresponding to Best_pos, namely the optimal penalty factor C and kernel width γ;
step S3, based on the obtained penalty factor C and kernel width γ, using the normalized data to construct the prediction model shown in equation (4) below, and classifying and predicting the samples to be classified based on the constructed prediction model; wherein K(x_i, x_j) is shown in equation (5); x_j represents the j-th normalized sample; x_i (i = 1, ..., l) represents a training sample; y_i (i = 1, ..., l) represents the label corresponding to the training sample, y_i = 1 representing a positive-class sample and y_i = −1 representing a negative-class sample; b is a threshold and α_i is a Lagrange coefficient;
f(x_j) = sgn( Σ_{i=1}^{l} α_i · y_i · K(x_i, x_j) + b )    (4)
K(x_i, x_j) = exp(−γ · ||x_i − x_j||²)    (5).
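Equations (4) and (5) can be sketched together. The support vectors, alpha, y and b below are illustrative placeholders; in the method they would come from solving the SVM dual with the optimized (C, γ).

```python
import numpy as np

def rbf(xi, xj, gamma):
    # Equation (5): K(x_i, x_j) = exp(-gamma * ||x_i - x_j||^2)
    return np.exp(-gamma * np.sum((np.asarray(xi) - np.asarray(xj)) ** 2))

def decide(x, support, alpha, y, b, gamma):
    # Equation (4): sign of the kernel expansion plus the threshold b.
    s = sum(a * yi * rbf(xi, x, gamma) for a, yi, xi in zip(alpha, y, support))
    return 1 if s + b >= 0 else -1

support = [[0.0, 0.0], [1.0, 1.0]]
pred = decide([0.1, 0.1], support, alpha=[1.0, 1.0], y=[1, -1], b=0.0, gamma=0.5)
print(pred)  # 1, since the query sits closer to the positive support vector
```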
Further, step S2.4 also includes the following steps:
step S2.4.1, introducing the PSO-SCA algorithm, which combines the exploitation capability of PSO with the exploration capability of SCA; in each iteration, the fitness value of each search agent in the current population is compared with the best value found in the iteration, and all better solutions are stored in matrix form; each solution also tracks the best value obtained so far by any nearby solution in order to approach the global optimum; the core formula is as follows:
[Equation (6), the PSO-SCA hybrid position update, appears only as an image in the source and is not reproduced here.]
wherein X_i^t is the position of the current solution in the i-th dimension at the t-th iteration, r1, r2, r3 are random numbers, P_i is the target position in the i-th dimension, and r4 is a random number in [0, 1];
step S2.4.2, evolving the PSO-SCA result through a DE (differential evolution) algorithm; if the fitness of the evolved result is better than the fitness value of the original individual, keeping the evolved individual;
s2.4.3, mutating the current optimal point with the Gaussian, Cauchy, and Lévy distributions, finding the minimum among the three mutation results, and updating the fitness value and the corresponding point;
X=min{X_m_Levy,X_m_gaus,X_m_cauchy} (7)
wherein X_m_Levy, X_m_gaus, and X_m_cauchy are the mutated values obtained by Lévy, Gaussian, and Cauchy mutation, respectively.
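Step S2.4.3 can be sketched as follows, assuming the objective f is minimized. Equation (7) keeps the lowest-objective mutant of the three; keeping the incumbent as well is an added safeguard beyond equation (7), and the Lévy sampler uses Mantegna's algorithm, one common choice the patent does not specify.

```python
import numpy as np
from math import gamma as gamma_fn, pi, sin

rng = np.random.default_rng(7)

def levy(shape, beta=1.5):
    # Mantegna's algorithm for Levy-stable step sizes (assumed sampler).
    sigma = (gamma_fn(1 + beta) * sin(pi * beta / 2)
             / (gamma_fn((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))) ** (1 / beta)
    u = rng.normal(0.0, sigma, shape)
    v = rng.normal(0.0, 1.0, shape)
    return u / np.abs(v) ** (1 / beta)

def mutate_best(x_best, f):
    candidates = [x_best + rng.standard_normal(x_best.shape),  # Gaussian mutant
                  x_best + rng.standard_cauchy(x_best.shape),  # Cauchy mutant
                  x_best + levy(x_best.shape),                 # Levy mutant
                  x_best]                                      # incumbent safeguard
    # Equation (7): keep the minimum of the mutation results.
    return min(candidates, key=f)

sphere = lambda x: float(np.sum(x ** 2))
x_new = mutate_best(np.array([3.0, 4.0]), sphere)
print(sphere(x_new) <= 25.0)  # True: never worse than the incumbent
```

The heavy-tailed Cauchy and Lévy steps give occasional long jumps, which is what lets the optimizer escape local optima, while the Gaussian step refines locally.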
The invention has the beneficial effects that:
the PSO algorithm is reasonably combined in the optimization process of the Sine and Cosine Algorithm (SCA), the DE and the variation mechanism are added at proper positions to realize the optimization of the punishment factor C and the kernel width gamma code of the SVM into individual positions, the K-fold cross validation is adopted in the optimization process to prevent the sine and cosine algorithm from falling into local extreme values, a more efficient and accurate intelligent model can be obtained, the point diversity is increased, the searching capability of the algorithm is enhanced, the algorithm can be prevented from falling into local optimum, the global optimum solution is quickly found, and therefore a more accurate prediction effect can be obtained, and a decision maker can be effectively assisted to make scientific and reasonable decisions.
Drawings
Fig. 1 is a flowchart of a method for constructing a prediction model based on an improved sine and cosine algorithm according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in further detail with reference to the accompanying drawings.
As shown in fig. 1, a method for constructing a prediction model based on an improved sine and cosine algorithm includes the following steps:
Step S1, acquiring sample data and normalizing the acquired sample data.
Specifically, the sample data may come from many different fields and can be chosen according to actual needs, for example the medical field or the financial field, and its attributes divide into data attributes and a category attribute. For example, for breast cancer disease data, the attribute values of a single sample fall into two broad categories: the data attributes X1-X9 represent relevant medical and pathological attributes of the breast cancer disease, and X10 represents the category of the data sample, i.e., whether the sample suffers from breast cancer; if the sample has breast cancer, the value is 1, and if the sample is healthy, the value is -1. As another example, for a single sample of enterprise bankruptcy-risk prediction data, the attribute indexes X1-Xn may be related financial indicators such as the debt ratio and total assets, and Xn+1 is the category label: whether the enterprise is at risk of bankruptcy within two years, 1 if at risk of bankruptcy and -1 if not.
For convenience of data processing, normalization processing is performed on the acquired sample data.
Step S2, optimizing a penalty factor C and a kernel width gamma of the support vector machine by using an improved sine and cosine algorithm, specifically:
s2.1, initializing parameters; the initialized parameters comprise: the maximum number of iterations T, the current iteration number t, the population size N, the search space upper boundary ub, the search space lower boundary lb, the optimal point position Pbest, the search space [Cmin, Cmax] of the penalty factor C, the search space [γmin, γmax] of the kernel width γ, the position X_i^t of the current solution in the i-th dimension at the t-th iteration, the random numbers r1, r2, r3, and the target position P_i in the i-th dimension;
s2.2, randomly initializing the positions of N points, wherein the position of the i-th individual in the population is M_i = (M_i1, M_i2), i = 1, 2, ..., N; M_i1 represents the penalty factor C at the current position of the individual, and M_i2 represents the kernel width γ at the current position;
s2.3, performing iteration according to the formula (1);
X_i^{t+1} = X_i^t + r1 · sin(r2) · |r3 · P_i^t − X_i^t|, if r4 < 0.5
X_i^{t+1} = X_i^t + r1 · cos(r2) · |r3 · P_i^t − X_i^t|, if r4 ≥ 0.5    (1)
wherein r4 is a random number in [0, 1];
as shown in equation (1), there are four key parameters: r1, r2, r3 and r4. r1 determines the region (or direction of movement) of the next position, which may lie between the current solution and the target position or outside them; r2 defines how far the movement goes toward the target position or in the opposite direction; r3 assigns a random weight to the target position so as to randomly emphasize (r3 > 1) or de-emphasize (r3 < 1) its effect on the distance; r4 switches equation (1) between the sine and cosine forms. In addition, r2 takes the value 2π × random and r3 takes the value 2 × random;
s2.4, judging whether this is the first iteration; if so, directly calculating the fitness f_i of each point M_i; otherwise, merging the latest population positions M with the historical optimal population positions F, calculating the fitness f_i, sorting the points by fitness f_i from largest to smallest, and selecting the first N population positions as the historical optimal positions F;
s2.5, from the points whose fitness exceeds that of the current optimal point, selecting the one with the highest fitness, replacing the optimal point Pbest with it, and assigning its position to the optimal population position Best_pos;
wherein the fitness f_i of each point i is the accuracy ACC of the support vector machine, calculated with an internal K-fold cross-validation strategy according to equation (2), based on the penalty factor C and kernel width γ at the current position of individual i;
ACC = (1/K) · Σ_{k=1}^{K} acc_k    (2)
wherein acc_k represents the accuracy obtained on the k-th fold of data;
step S2.6, for a stable balance of exploitation and exploration, the sine and cosine range in equation (1) is adaptively modified using the following equation:
r1 = a − t · (a / T)    (3)
wherein t is the current iteration, T is the maximum number of iterations, and a is a constant (here taken to be 2);
s2.7, judging whether the maximum number of iterations T has been exceeded; if not, jumping to step S2.4; if so, executing the next step S2.8;
s2.8, outputting the position Best_pos of the optimal point Pbest and the fitness corresponding to Best_pos, namely the optimal penalty factor C and kernel width γ;
step S3, based on the obtained penalty factor C and kernel width γ, using the normalized data to construct the prediction model shown in equation (4) below, and classifying and predicting the samples to be classified based on the constructed prediction model; wherein K(x_i, x_j) is shown in equation (5); x_j represents the j-th normalized sample; x_i (i = 1, ..., l) represents a training sample; y_i (i = 1, ..., l) represents the label corresponding to the training sample, y_i = 1 representing a positive-class sample and y_i = −1 representing a negative-class sample; b is a threshold and α_i is a Lagrange coefficient;
f(x_j) = sgn( Σ_{i=1}^{l} α_i · y_i · K(x_i, x_j) + b )    (4)
K(x_i, x_j) = exp(−γ · ||x_i − x_j||²)    (5).
Further, step S2.4 also includes the following steps:
step S2.4.1, introducing the PSO-SCA algorithm, which combines the exploitation capability of PSO with the exploration capability of SCA; in each iteration, the fitness value of each search agent in the current population is compared with the best value found in the iteration, and all better solutions are stored in matrix form; each solution also tracks the best value obtained so far by any nearby solution in order to approach the global optimum; the core formula is as follows:
[Equation (6), the PSO-SCA hybrid position update, appears only as an image in the source and is not reproduced here.]
wherein X_i^t is the position of the current solution in the i-th dimension at the t-th iteration, r1, r2, r3 are random numbers, P_i is the target position in the i-th dimension, and r4 is a random number in [0, 1];
step S2.4.2, evolving the PSO-SCA result through a DE (differential evolution) algorithm; if the fitness of the evolved result is better than the fitness value of the original individual, keeping the evolved individual;
s2.4.3, mutating the current optimal point with the Gaussian, Cauchy, and Lévy distributions, finding the minimum among the three mutation results, and updating the fitness value and the corresponding point;
X=min{X_m_Levy,X_m_gaus,X_m_cauchy} (7)
wherein X_m_Levy, X_m_gaus, and X_m_cauchy are the mutated values obtained by Lévy, Gaussian, and Cauchy mutation, respectively.
Application examples
Using breast cancer data as sample data, the sample set is represented as (x_i, y_i), i = 1, ..., 699, wherein x_i denotes a 9-dimensional feature vector and y_i is a sample label taking the value 1 or −1: '1' denotes that the sample has breast cancer and '−1' denotes that the patient is healthy.
Firstly, each feature attribute value of the sample data to be tested is standardized, and the sample data is normalized using the formula
S_i' = (S_i − S_min) / (S_max − S_min)
wherein S_i represents the raw value of a feature attribute in a sample, S_i' is the normalized value of S_i obtained from the formula, S_min represents the minimum value in the corresponding sample data, and S_max represents the maximum value in the corresponding sample data.
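The min-max normalization above, applied per feature attribute (column-wise), can be sketched as:

```python
import numpy as np

def normalize(X):
    # S' = (S - S_min) / (S_max - S_min), computed per column,
    # mapping every feature into [0, 1].
    lo = X.min(axis=0)
    hi = X.max(axis=0)
    return (X - lo) / (hi - lo)

X = np.array([[1.0, 10.0],
              [3.0, 30.0],
              [2.0, 20.0]])
Xn = normalize(X)
print(Xn[:, 0])  # [0.  1.  0.5]
```

Note that a column with constant values would make the denominator zero; real data would need a guard for that case.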
Then, the improved sine and cosine algorithm is used to optimize the penalty coefficient C and kernel width γ of the support vector machine, with a K-fold cross-validation strategy inside the support vector machine (that is, the samples fed into the model are cut into K folds; each time, K−1 folds are used as training data, and the two key parameters are optimized with the improved sine and cosine algorithm during training in the hope of obtaining the optimal intelligent classification model; after the model is constructed, the remaining data are used as test data to evaluate the performance of the constructed intelligent decision model). In short, for different intelligent classification decision problems, an improved sine and cosine algorithm with global search capability needs to be adopted to construct the classification decision model best suited to such problems. As discussed previously, the penalty coefficient C and kernel width γ have an important influence on the performance of the model, and the quality of these two parameters directly determines the quality of the decision model, so an improved sine and cosine algorithm is proposed to complete the selection of these two parameters, improving the traditional algorithm, skipping local extreme points, and raising the convergence speed and accuracy of the algorithm to a certain extent.
The training samples (x_i, y_i) are input, and the problem, optimized according to the Lagrange dual problem, becomes:
max_α  Σ_{i=1}^{699} α_i − (1/2) · Σ_{i=1}^{699} Σ_{j=1}^{699} α_i α_j y_i y_j K(x_i, x_j)
s.t.  Σ_{i=1}^{699} α_i y_i = 0,  0 ≤ α_i ≤ C,  i = 1, ..., 699
Then, for the above optimization problem, the improved sine and cosine algorithm is adopted to optimize C and γ (γ being the kernel width of the radial basis kernel function K(x_i, x_j) = exp(−γ · ||x_i − x_j||²)), and the optimal solution is obtained:
α* = (α_1*, α_2*, ..., α_699*)^T
then the following solution is given:
b* = y_j − Σ_{i=1}^{699} α_i* y_i K(x_i, x_j), for any j with 0 < α_j* < C
the final optimal classification hyperplane function is then:
f(x) = sgn( Σ_{i=1}^{699} α_i* y_i K(x_i, x) + b* )
the embodiment of the invention has the following beneficial effects:
the invention combines the PSO algorithm in the optimization process of the Sine and Cosine Algorithm (SCA), adds DE and a variation mechanism at a proper position to realize that the punishment factor C and the kernel width gamma code of the SVM are optimized as an individual position, adopts K-fold cross validation in the optimization process to prevent the sine and cosine algorithm from falling into a local extreme value, can obtain a more efficient and accurate intelligent model, not only increases the point diversity and enhances the searching capability of the algorithm, but also can prevent the algorithm from falling into the local optimum and quickly find out the global optimum solution, thereby obtaining more accurate prediction effect and more effectively assisting a decision maker to make scientific and reasonable decisions.
It will be understood by those skilled in the art that all or part of the steps in the method for implementing the above embodiments may be implemented by relevant hardware instructed by a program, and the program may be stored in a computer-readable storage medium, such as ROM/RAM, magnetic disk, optical disk, etc.
While the invention has been described in connection with what is presently considered to be the most practical and preferred embodiment, it is to be understood that the invention is not to be limited to the disclosed embodiment, but on the contrary, is intended to cover various modifications and equivalent arrangements included within the spirit and scope of the appended claims.

Claims (2)

1. A method for constructing a prediction model based on an improved sine and cosine algorithm is characterized by comprising the following steps:
step S1, acquiring sample data and normalizing the acquired sample data;
step S2, optimizing a penalty factor C and a kernel width gamma of the support vector machine by using an improved sine and cosine algorithm, specifically:
s2.1, initializing parameters; the initialized parameters comprise: the maximum number of iterations T, the current iteration number t, the population size N, the search space upper boundary ub, the search space lower boundary lb, the optimal point position Pbest, the search space [Cmin, Cmax] of the penalty factor C, the search space [γmin, γmax] of the kernel width γ, the position X_i^t of the current solution in the i-th dimension at the t-th iteration, the random numbers r1, r2, r3, and the target position P_i in the i-th dimension;
s2.2, randomly initializing the positions of N points, wherein the position of the i-th individual in the population is M_i = (M_i1, M_i2), i = 1, 2, ..., N; M_i1 represents the penalty factor C at the current position of the individual, and M_i2 represents the kernel width γ at the current position;
s2.3, performing iteration according to the formula (1);
X_i^{t+1} = X_i^t + r1 · sin(r2) · |r3 · P_i^t − X_i^t|, if r4 < 0.5
X_i^{t+1} = X_i^t + r1 · cos(r2) · |r3 · P_i^t − X_i^t|, if r4 ≥ 0.5    (1)
wherein r4 is a random number in [0, 1];
as shown in equation (1), there are four key parameters: r1, r2, r3 and r4. r1 determines the region of the next position, which may lie between the current solution and the target position or outside them; r2 defines how far the movement goes toward the target position or in the opposite direction; r3 assigns a random weight to the target position so as to randomly emphasize or de-emphasize its effect on the distance; r4 switches equation (1) between the sine and cosine forms. In addition, r2 takes the value 2π × random and r3 takes the value 2 × random;
s2.4, judging whether this is the first iteration; if so, directly calculating the fitness f_i of each point M_i; otherwise, merging the latest population positions M with the historical optimal population positions F, calculating the fitness f_i, sorting the points by fitness f_i from largest to smallest, and selecting the first N population positions as the historical optimal positions F;
s2.5, from the points whose fitness exceeds that of the current optimal point, selecting the one with the highest fitness, replacing the optimal point Pbest with it, and assigning its position to the optimal population position Best_pos;
wherein the fitness f_i of each point i is the accuracy ACC of the support vector machine, calculated with an internal K-fold cross-validation strategy according to equation (2), based on the penalty factor C and kernel width γ at the current position of individual i;
ACC = (1/K) · Σ_{k=1}^{K} acc_k    (2)
wherein acc_k represents the accuracy obtained on the k-th fold of data;
step S2.6, for a stable balance of exploitation and exploration, the sine and cosine range in equation (1) is adaptively modified using the following equation:
r1 = a − t · (a / T)    (3)
wherein t is the current iteration, T is the maximum number of iterations, and a is a constant;
s2.7, judging whether the maximum number of iterations T has been exceeded; if not, jumping to step S2.4; if so, executing the next step S2.8;
s2.8, outputting the position Best_pos of the optimal point Pbest and the fitness corresponding to Best_pos, namely the optimal penalty factor C and kernel width γ;
Step S3: based on the obtained penalty factor C and kernel width γ, construct the prediction model shown in the following formula (4) using the normalized data, and classify the samples to be predicted with the constructed model; wherein K(x_i, x_j) is given by formula (5); x_j denotes the j-th normalized sample; x_i (i = 1, ..., l) denotes a training sample; y_i (i = 1, ..., l) denotes the label of the corresponding training sample, with y_i = 1 for a positive-class sample and y_i = −1 for a negative-class sample; b is a threshold and α_i is a Lagrange coefficient;

f(x) = sgn( Σ_{i=1}^{l} α_i · y_i · K(x_i, x) + b )    (4)

K(x_i, x_j) = exp(−γ ‖x_i − x_j‖²)    (5).
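Formulas (4) and (5) together are the standard RBF-kernel SVM decision function. A self-contained sketch of the prediction step, assuming the support vectors, labels, coefficients α_i and threshold b have already been obtained from training (the toy values below are illustrative, not from the patent):

```python
import math

def rbf_kernel(xi, xj, gamma):
    """Formula (5): K(xi, xj) = exp(-gamma * ||xi - xj||^2)."""
    sq = sum((a - b) ** 2 for a, b in zip(xi, xj))
    return math.exp(-gamma * sq)

def svm_predict(x, support, labels, alphas, b, gamma):
    """Formula (4): f(x) = sgn(sum_i alpha_i * y_i * K(x_i, x) + b)."""
    s = sum(a * y * rbf_kernel(xi, x, gamma)
            for a, y, xi in zip(alphas, labels, support)) + b
    return 1 if s >= 0 else -1

# Toy "trained" model: one positive and one negative support vector.
support = [[0.0, 0.0], [1.0, 1.0]]
labels = [1, -1]
alphas = [1.0, 1.0]
pred = svm_predict([0.1, 0.1], support, labels, alphas, b=0.0, gamma=0.5)
```

A query near the positive support vector is classified as positive because its kernel similarity to that vector dominates the sum.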
2. The method according to claim 1, wherein step S2.4 further comprises the steps of:
Step S2.4.1: apply the hybrid PSO-SCA update, which combines the exploitation capability of PSO with the exploration capability of SCA. In each iteration, the fitness value of every search agent in the current population is compared with the best value found so far, and all better solutions are stored in matrix form; each solution also tracks the best value obtained so far among nearby solutions so as to approach the globally optimal solution. The core formula is as follows:
X_i^{t+1} = X_i^t + r1 · sin(r2) · | r3 · P_i − X_i^t |,  if r4 < 0.5
X_i^{t+1} = X_i^t + r1 · cos(r2) · | r3 · P_i − X_i^t |,  if r4 ≥ 0.5    (6)

wherein X_i^t is the position of the current solution in the i-th dimension at the t-th iteration, r1, r2 and r3 are random numbers, P_i is the target position in the i-th dimension, and r4 is a random number in [0, 1];
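A minimal sketch of one position update per formula (6). The ranges used for r2 and r3 (r2 ∈ [0, 2π], r3 ∈ [0, 2]) follow the usual sine-cosine algorithm convention, since the patent only calls them random numbers; `sca_update` is an illustrative name:

```python
import math
import random

def sca_update(x, p, r1, rng):
    """One update of formula (6) for a single solution.

    x:  current position (list of floats)
    p:  target position P (the best solution found so far)
    r1: adaptive amplitude from formula (3)
    r2, r3, r4 are drawn fresh for every dimension.
    """
    new = []
    for xi, pi in zip(x, p):
        r2 = rng.uniform(0.0, 2.0 * math.pi)
        r3 = rng.uniform(0.0, 2.0)
        r4 = rng.random()
        step = abs(r3 * pi - xi)
        if r4 < 0.5:
            new.append(xi + r1 * math.sin(r2) * step)  # sine branch
        else:
            new.append(xi + r1 * math.cos(r2) * step)  # cosine branch
    return new

rng = random.Random(0)
x_next = sca_update([0.2, 0.8], [0.5, 0.5], r1=1.5, rng=rng)
```

Note that as r1 decays to 0 via formula (3), the update leaves the position unchanged, which is exactly the intended late-stage exploitation behaviour.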
Step S2.4.2: evolve the PSO-SCA result with the differential evolution (DE) algorithm; if the fitness of the evolved individual is better than that of the original individual, the evolved individual is retained;
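The patent does not specify which DE variant is used; the sketch below assumes the common DE/rand/1/bin scheme with the greedy selection the claim describes (an evolved individual replaces its parent only if its fitness is better). Function and parameter names are illustrative:

```python
import random

def de_evolve(pop, fitness, F=0.5, CR=0.9, rng=None):
    """One DE/rand/1/bin generation (an assumed variant) with greedy
    selection: keep the trial vector only if its fitness is lower."""
    rng = rng or random.Random()
    n, dim = len(pop), len(pop[0])
    out = []
    for i in range(n):
        # Three distinct donors, all different from the current individual.
        a, b, c = rng.sample([j for j in range(n) if j != i], 3)
        jrand = rng.randrange(dim)  # force at least one mutated dimension
        trial = [pop[a][j] + F * (pop[b][j] - pop[c][j])
                 if (rng.random() < CR or j == jrand) else pop[i][j]
                 for j in range(dim)]
        out.append(trial if fitness(trial) < fitness(pop[i]) else pop[i])
    return out

sphere = lambda x: sum(v * v for v in x)  # toy minimization objective
pop = [[0.1 * i, 1.0 - 0.1 * i] for i in range(5)]
new_pop = de_evolve(pop, sphere, rng=random.Random(1))
```

By construction no individual can get worse, which matches the claim's "keep the evolved individual only if its fitness is superior".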
Step S2.4.3: mutate the current optimal point using the Gaussian, Cauchy and Lévy distributions, take the minimum of the three mutation results, and update the fitness value and the corresponding point accordingly:

X = min{X_m_Levy, X_m_gaus, X_m_cauchy}    (7)

wherein X_m_Levy, X_m_gaus and X_m_cauchy are the candidate solutions obtained by Lévy, Gaussian and Cauchy mutation, respectively.
CN201911106482.8A 2019-11-13 2019-11-13 Method for constructing prediction model based on improved sine and cosine algorithm Pending CN111079074A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911106482.8A CN111079074A (en) 2019-11-13 2019-11-13 Method for constructing prediction model based on improved sine and cosine algorithm

Publications (1)

Publication Number Publication Date
CN111079074A true CN111079074A (en) 2020-04-28

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111474850A (en) * 2020-05-25 2020-07-31 南昌航空大学 PID (proportion integration differentiation) hydraulic leveling system control method based on improved sine and cosine algorithm
CN111611727A (en) * 2020-06-23 2020-09-01 南昌航空大学 Optimal design method for ensuring motion reliability of cam mechanism
CN112200224A (en) * 2020-09-23 2021-01-08 温州大学 Medical image feature processing method and device
CN112485652A (en) * 2020-12-09 2021-03-12 电子科技大学 Analog circuit single fault diagnosis method based on improved sine and cosine algorithm

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108462711A (en) * 2018-03-22 2018-08-28 Jiangnan University An intrusion detection method based on cosine time-varying PSO-SVM
CN109948675A (en) * 2019-03-05 2019-06-28 Wenzhou University Method for constructing a prediction model based on a multi-population fruit fly optimization algorithm with an outpost mechanism
CN110222751A (en) * 2019-05-28 2019-09-10 Wenzhou University Method for constructing a prediction model based on a multi-population orthogonal sine and cosine algorithm


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200428