CN116842354A

CN116842354A - Feature selection method based on quantum artificial jellyfish search mechanism

Info

Publication number: CN116842354A
Application number: CN202310548465.XA
Authority: CN
Inventors: 高洪元; 郭颖; 揣济阁; 孙溶辰; 杜亚男; 任立群; 狄妍岐; 陈暄; 谷晓苑
Original assignee: Harbin Engineering University
Current assignee: Harbin Engineering University
Priority date: 2023-05-16
Filing date: 2023-05-16
Publication date: 2023-10-03

Abstract

The invention provides a feature selection method based on a quantum artificial jellyfish search mechanism. Quantum optimization theory is combined with the artificial jellyfish search mechanism to obtain the quantum artificial jellyfish search mechanism. This discretizes the artificial jellyfish search mechanism, which was originally designed for continuous optimization problems, so the method is more robust and is no longer restricted to continuous problems. Wrapper feature selection with synchronous optimization effectively reduces the time complexity of conventional wrapper feature selection. Because a swarm intelligence optimization method optimizes the support vector machine hyperparameters and selects the feature subset at the same time, the chosen optimization method must perform well, and the quantum artificial jellyfish search mechanism meets this requirement with excellent convergence and short running time. Compared with feature selection methods based on existing conventional swarm intelligence algorithms, the method therefore converges faster and more precisely, has lower time complexity, and is more robust.

Description

Feature selection method based on quantum artificial jellyfish search mechanism

Technical Field

The invention relates to a feature selection method based on a quantum artificial jellyfish search mechanism and belongs to the field of data processing.

Background

Feature selection picks the most effective features from the original features to reduce the dimensionality of a data set and optimize specific system metrics. It is an important means of improving the performance of learning algorithms and a key data preprocessing step in pattern recognition. Depending on whether the evaluation criterion for feature subsets is coupled with the subsequent learning algorithm, existing feature selection methods are classified as embedded, filter, or wrapper methods. Wrapper methods are widely used because they couple feature selection with the subsequent learner and evaluate candidate feature subsets with a classifier, which gives the selected subsets a clear performance advantage. Wrapper feature selection has also been combined with various intelligent algorithms to improve efficiency and reduce time complexity. However, because engineering problems are difficult and complex, existing swarm intelligence algorithms struggle to achieve good convergence, high convergence precision, and low time complexity at the same time. Designing a feature selection method with short selection time, well-performing selected subsets, and strong robustness therefore has important theoretical value and research significance.

A search of the prior literature shows the following. Zhang Yong et al., in Information Sciences (2017, vol. 418-419, no. 47, pp. 561-574), disclose a feature selection method based on a binary firefly algorithm. It exploits the simplicity and fast convergence of the firefly algorithm and combines several improvements to reduce its tendency to become trapped in local optima. However, the firefly algorithm is an early swarm intelligence algorithm whose convergence accuracy is clearly limited, and conventional improvements struggle to raise that accuracy, so the method does not achieve ideal results in feature subset selection. Balakrishanan K et al., in "A novel control factor and Brownian motion-based improved Harris Hawks Optimization for feature selection", published in Journal of Ambient Intelligence and Humanized Computing (2022, pp. 1-23), propose a Harris hawks algorithm based on a new control factor and Brownian motion and apply it to feature selection. The improvement strengthens the convergence of the Harris hawks algorithm and yields higher classification accuracy and better feature subsets. However, the method is compared with existing feature selection methods only on high-dimensional data sets, so its advantages on low-dimensional data sets are not demonstrated and its robustness is limited. 
Gu Heming et al., in "Synchronous optimization feature selection based on the genetic sooty tern algorithm", published in Acta Automatica Sinica (2022, vol. 48, no. 06, pp. 1601-1615), propose a model that uses a single algorithm to optimize the support vector machine parameters and perform feature selection simultaneously, which greatly reduces the time spent on wrapper feature selection. The genetic sooty tern algorithm, which combines the genetic algorithm with the sooty tern algorithm, improves the balance between exploration and exploitation as well as the optimization capability. Its accuracy is therefore much higher and the average size of the selected feature subsets is small, which fulfils the dimensionality-reduction goal of feature selection well. However, the algorithm has no clear time advantage, and convergence accuracy and running time are not considered together.

The existing literature shows that applying an effective optimization algorithm to feature selection can screen out important features, reduce the dimensionality of a data set more effectively, and ease subsequent data processing. Applying conventional optimization methods to feature selection, however, leads to low convergence accuracy, high time cost, and similar problems. Introducing swarm intelligence methods such as the Harris hawks mechanism and the artificial jellyfish mechanism into feature selection alleviates these problems, but some of these newly proposed methods share the same weaknesses, so they need effective improvements before they are applied to feature selection. In addition, synchronous-optimization feature selection can effectively reduce the time complexity of wrapper feature selection, but the chosen algorithm must perform well on both support vector machine hyperparameter selection and feature subset selection. To achieve fast convergence of the intelligent optimization method and a high-quality selected feature subset, a new evolution method must be designed to provide a feasible and effective feature selection method. Building on wrapper synchronous-optimization feature selection, the invention designs a discrete quantum artificial jellyfish mechanism and provides a new method that selects the two support vector machine hyperparameters and the feature subset synchronously.

Disclosure of Invention

The invention aims to solve the insufficient precision and high time complexity of feature subset selection in existing wrapper feature selection methods, and provides a quantum artificial jellyfish search mechanism with faster convergence and greater effectiveness that can solve engineering problems and be widely applied. The invention combines quantum optimization theory with the artificial jellyfish search mechanism to obtain the quantum artificial jellyfish search mechanism, discretizing the artificial jellyfish search mechanism originally designed for continuous optimization problems; this makes it more robust and removes its restriction to continuous problems. Meanwhile, wrapper feature selection with synchronous optimization effectively reduces the time complexity of conventional wrapper feature selection. Because a swarm intelligence optimization method optimizes the support vector machine hyperparameters and selects the feature subset simultaneously, the chosen optimization method must perform well, and the quantum artificial jellyfish search mechanism designed by the invention meets this requirement with excellent convergence and short running time. Compared with feature selection methods based on conventional swarm intelligence, the designed feature selection method therefore converges faster and more precisely, has lower time complexity, and is more robust.

The purpose of the invention is achieved through the following steps:

Step one: input a data set, preprocess it, normalize it, split it into a training set and a test set, and build a support vector machine model to train on, test, and classify the data set.

Input the data set $I=[(m_1,y_1),(m_2,y_2),\dots,(m_L,y_L)]$, where $M=[m_1,m_2,\dots,m_L]$ are the data samples in the data set, $Y=[y_1,y_2,\dots,y_L]$ are the class labels, and $L$ is the total number of data samples. Each data sample is a feature vector with $n$ feature elements, $m_i=[m_{i1},m_{i2},\dots,m_{in}]$, $i=1,2,\dots,L$, where $n$ is the number of features in the data set. Preprocess the input data set by converting all data samples and class labels to numbers, and then normalize it. Let the per-feature maximum of the data samples be $m_{\max}=[m_{1,\max},m_{2,\max},\dots,m_{n,\max}]$ and the per-feature minimum be $m_{\min}=[m_{1,\min},m_{2,\min},\dots,m_{n,\min}]$, where $m_{l,\max}$ and $m_{l,\min}$ are the maximum and minimum of the $l$-th feature element, $l=1,2,\dots,n$. All data samples are normalized with
$$m'_{il}=\frac{m_{il}-m_{l,\min}}{m_{l,\max}-m_{l,\min}},$$
which gives the normalized data set $I'=[(m'_1,y_1),(m'_2,y_2),\dots,(m'_L,y_L)]$, where $m'_i=[m'_{i1},m'_{i2},\dots,m'_{in}]$, $i=1,2,\dots,L$, and $M'=[m'_1,m'_2,\dots,m'_L]$ is the normalized data sample set. From each group of feature data in the data set input to the support vector machine, a proportion $\alpha_1$ of the data samples and their class labels is randomly selected as the training set $I'_1$, and the remaining data form the test set $I'_2$. Because of the complexity of the data set, the invention uses the directed acyclic graph (DAG) method to build a multi-class support vector machine model that classifies the data set input to the support vector machine.
Let the number of classes in the data set input to the support vector machine be $k$. Construct $k(k-1)/2$ nonlinear binary support vector machines, which converts the $k$-class problem into $k(k-1)/2$ two-class problems; solving these completes the classification of a data set with $k$ classes.
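The preprocessing in step one can be sketched in plain Python as follows. This is a minimal sketch, not the patent's implementation: it assumes min-max scaling to [0, 1], maps a constant feature to 0 (a case the text does not address), and uses a plain random split, although a per-class split would be an equally valid reading of "each group of feature data". `class_pairs` lists the k(k-1)/2 class pairs, one binary support vector machine per pair. All function names are hypothetical.

```python
import random
from itertools import combinations

def min_max_normalize(samples):
    """Scale every feature column of `samples` (list of equal-length rows) to [0, 1]."""
    n = len(samples[0])
    col_min = [min(row[l] for row in samples) for l in range(n)]
    col_max = [max(row[l] for row in samples) for l in range(n)]
    normalized = []
    for row in samples:
        new_row = []
        for l in range(n):
            span = col_max[l] - col_min[l]
            # A constant feature would divide by zero; map it to 0.0 instead (our choice).
            new_row.append((row[l] - col_min[l]) / span if span > 0 else 0.0)
        normalized.append(new_row)
    return normalized

def train_test_split(samples, labels, alpha1, seed=None):
    """Put a random fraction alpha1 of the (sample, label) pairs in the training set."""
    rng = random.Random(seed)
    order = list(range(len(samples)))
    rng.shuffle(order)
    n_train = int(round(alpha1 * len(samples)))
    train = [(samples[i], labels[i]) for i in order[:n_train]]
    test = [(samples[i], labels[i]) for i in order[n_train:]]
    return train, test

def class_pairs(k):
    """The k(k-1)/2 class pairs (i, j) with i < j; each needs one binary SVM."""
    return list(combinations(range(1, k + 1), 2))
```

For example, `min_max_normalize([[1, 10], [3, 20], [5, 30]])` returns `[[0.0, 0.0], [0.5, 0.5], [1.0, 1.0]]`, and `class_pairs(3)` returns `[(1, 2), (1, 3), (2, 3)]`.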

In the classification process, the training set $I'_1$ and the initial hyperparameters of the support vector machine are first input to the support vector machine for training. Training is the process in which the support vector machine searches for the optimal classification hyperplane, the optimal penalty factor, and the optimal relaxation variable. This optimization is equivalent to a quadratic programming problem, which is written as a constrained optimization problem by introducing Lagrange multipliers. For the $i$-th and $j$-th classes the constrained optimization problem is
$$\max_{\lambda^{ij}}\ \sum_{s}\lambda_s^{ij}-\frac{1}{2}\sum_{s}\sum_{t}\lambda_s^{ij}\lambda_t^{ij}y_s y_t K(m'_s,m'_t)$$
subject to $\sum_{s}\lambda_s^{ij}y_s=0$ and $0\le\lambda_s^{ij}\le C$, where the sums run over the training samples of classes $i$ and $j$ (with $y_s=+1$ for class $i$ and $y_s=-1$ for class $j$), $\lambda_s^{ij}$ are the Lagrange multipliers for classes $i$ and $j$, $C$ is the penalty factor, and $K(\cdot,\cdot)$ is the kernel function of the support vector machine. Because no prior knowledge is available, the Gaussian kernel, which maps into an infinite-dimensional space, is chosen:
$$K(m'_s,m'_t)=\exp\left(-\frac{\|m'_s-m'_t\|^2}{2\delta^2}\right),$$
where $\|m'_s-m'_t\|$ is the Euclidean distance between two data samples, $\exp(\cdot)$ is the exponential function with base $e$, and $\delta$ is the standard deviation of all data samples in the data set. Training constructs the optimal hyperplane of classes $i$ and $j$ from the optimal solution $\lambda^{ij*}$ of this problem and optimizes the penalty factor and relaxation variable by traversing a given range.
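The kernel and the dual objective can be evaluated directly, which is useful for checking a trained solution. The sketch below assumes the Gaussian kernel form exp(-||a - b||^2 / (2 delta^2)) and reads "standard deviation of all data samples" as the standard deviation of all feature values pooled together; it does not implement the SMO solver itself. Function names are hypothetical.

```python
import math

def gaussian_kernel(a, b, delta):
    """Assumed Gaussian kernel K(a, b) = exp(-||a - b||^2 / (2 * delta^2))."""
    sq_dist = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-sq_dist / (2.0 * delta ** 2))

def pooled_std(samples):
    """Population standard deviation of all feature values, pooled over samples."""
    values = [v for row in samples for v in row]
    mean = sum(values) / len(values)
    return math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))

def dual_objective(lams, xs, ys, delta):
    """sum_s l_s - 1/2 * sum_s sum_t l_s l_t y_s y_t K(x_s, x_t), labels y in {+1, -1}."""
    quad = 0.0
    for s in range(len(xs)):
        for t in range(len(xs)):
            quad += lams[s] * lams[t] * ys[s] * ys[t] * gaussian_kernel(xs[s], xs[t], delta)
    return sum(lams) - 0.5 * quad

def is_feasible(lams, ys, C, tol=1e-9):
    """Check the dual constraints sum_s l_s * y_s = 0 and 0 <= l_s <= C."""
    balanced = abs(sum(l * y for l, y in zip(lams, ys))) <= tol
    boxed = all(-tol <= l <= C + tol for l in lams)
    return balanced and boxed
```

Any multiplier vector returned by a solver should pass `is_feasible`; among feasible vectors, the optimum maximizes `dual_objective`.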

(1) Input the initial penalty factor $C_0$ and the initial relaxation variable $\zeta_0$ of the support vector machine. Set the search grid ranges to $[C_0-r_1,\,C_0+r_1]$ and $[\zeta_0-r_2,\,\zeta_0+r_2]$, the search step to $r_3$, the current search count to $s$, the maximum search count to $S_{\max}$, and the search starting point to $C=C_0-r_1$ and $\zeta=\zeta_0-r_2$, where $r_1$ is the search radius of the penalty factor and $r_2$ is the search radius of the relaxation variable.

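(2) Search with the sequential minimal optimization (SMO) method and record the current search count. Compute the optimal weight vector $w^{ij*}$ and optimal bias $b^{ij*}$ of classes $i$ and $j$, and construct the decision function of classes $i$ and $j$, $f^{ij}(m')=\operatorname{sgn}\left(w^{ij*}\cdot\varphi(m')+b^{ij*}\right)$, where $\varphi(\cdot)$ maps a data sample into a high-dimensional space and $\operatorname{sgn}(\cdot)$ is the sign function: for any variable $u$, $\operatorname{sgn}(u)=1$ if $u>0$, $0$ if $u=0$, and $-1$ if $u<0$.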

(3) Use the hinge loss as the loss function for cross-validation, and compute the classification accuracy under the current parameters $C$ and $\zeta$ by c-fold cross-validation.

(4) If the search is not finished, increase $C$ by the step $r_3$, increase the search count by 1, and return to process (2) to continue the search. When $C$ has traversed all values in its interval, that is, when the search count is an integer multiple of the number of grid values of $C$, $\lfloor 2r_1/r_3\rfloor+1$, increase $\zeta$ by $r_3$ and reset $C$ to the starting value of its range, where $\lfloor\cdot\rfloor$ denotes rounding down. When the search count reaches the maximum search count, end the search, take the parameters with the highest classification accuracy as the optimal hyperparameters of classes $i$ and $j$ obtained by training, denoted $C^{ij}$ and $\zeta^{ij}$, and output the optimal decision function of classes $i$ and $j$.

(5) Combining the $k$ classes in pairs gives $k(k-1)/2$ combinations. A support vector machine is trained for each combination according to processes (1) to (4), which yields $k(k-1)/2$ sets of hyperparameters and corresponding decision functions adapted to the different class combinations and completes the construction of the DAG support vector machine model.
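Processes (1) to (4) amount to a two-level grid traversal in which C varies fastest and the best-scoring pair is kept. A minimal sketch, assuming the grid ranges [C0 - r1, C0 + r1] and [zeta0 - r2, zeta0 + r2] with step r3; `evaluate` is a hypothetical callback standing in for SMO training plus c-fold cross-validation scoring, and `c_fold_indices` shows one simple way to form the folds. Callers should choose centers and radii that keep C and zeta positive.

```python
import math

def grid_values(center, radius, step):
    """Grid points center - radius, center - radius + step, ..., up to center + radius."""
    # floor(2r/step) + 1 points; the tiny epsilon guards against float round-off.
    count = math.floor(2 * radius / step + 1e-9) + 1
    return [center - radius + q * step for q in range(count)]

def grid_search(c0, zeta0, r1, r2, r3, evaluate):
    """Traverse the (C, zeta) grid with C varying fastest; return (best_acc, C, zeta)."""
    best = None
    for zeta in grid_values(zeta0, r2, r3):
        for c in grid_values(c0, r1, r3):
            acc = evaluate(c, zeta)  # e.g. mean c-fold cross-validation accuracy
            if best is None or acc > best[0]:
                best = (acc, c, zeta)
    return best

def c_fold_indices(n_samples, c):
    """Split indices 0..n_samples-1 into c contiguous folds of near-equal size."""
    base, extra = divmod(n_samples, c)
    folds, start = [], 0
    for f in range(c):
        size = base + (1 if f < extra else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds
```

The total number of evaluations is the product of the two grid sizes, which plays the role of the maximum search count.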

After training, the trained model is saved, and the test set $I'_2$ is input to the support vector machine for testing. For any test sample $m'_r$, a discrete decision rule determines the class to which it belongs: if the decision function of classes $i$ and $j$ returns 1 for $m'_r$, the sample is assigned to class $i$; otherwise it is assigned to class $j$. After the test set has been classified, the class labels output by the support vector machine are compared with the correct class labels; a sample is classified correctly if the labels agree and incorrectly otherwise. The classification accuracy is the percentage of correctly classified samples among all samples in the test set.
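The directed acyclic graph decision can be sketched as an elimination list: each node tests the first and last remaining classes and drops the class that loses, so a k-class decision needs k - 1 binary evaluations. `decide` is a hypothetical stand-in for the trained pairwise decision functions; the sketch assumes it returns +1 for class i and -1 otherwise.

```python
def dag_classify(x, k, decide):
    """Classify x with a DAG of pairwise SVMs over classes 1..k."""
    candidates = list(range(1, k + 1))
    while len(candidates) > 1:
        i, j = candidates[0], candidates[-1]
        if decide(i, j, x) == 1:
            candidates.pop()     # x is judged not to be class j
        else:
            candidates.pop(0)    # x is judged not to be class i
    return candidates[0]

def accuracy(predicted, actual):
    """Percentage of test samples whose predicted label equals the true label."""
    correct = sum(1 for p, a in zip(predicted, actual) if p == a)
    return 100.0 * correct / len(actual)
```

With a toy rule such as `decide = lambda i, j, x: 1 if abs(x - i) <= abs(x - j) else -1`, `dag_classify(3, 5, decide)` walks the pairs (1, 5), (1, 4), (2, 4), (2, 3) and returns 3.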

For a data set with $k$ classes, the classes are $\omega=[\omega_1,\omega_2,\dots,\omega_k]$. The objective function of the constructed support vector machine, together with its constraints, is defined over $w^{ij}$, the weight vector of the hyperplane separating classes $i$ and $j$, $b^{ij}$, the bias of that hyperplane, and the mapping function $\varphi(\cdot)$, which maps data samples into a high-dimensional space. Searching for the optimal classification hyperplane through this objective function enables accurate classification of complex data sets.

Step two: initialize the quantum position of each individual in the quantum artificial jellyfish population, construct and compute the fitness of each quantum artificial jellyfish, and determine the initial global optimal quantum position of the population.

Let the population size of the quantum artificial jellyfish be $N$, the maximum number of iterations be $T$, $t$ the iteration index, and $D$ the dimension of the search space. The quantum position of the $i$-th quantum artificial jellyfish in generation $t$ is $v_i^t=[v_{i,1}^t,v_{i,2}^t,\dots,v_{i,D}^t]$, and measuring the quantum position gives the position of the jellyfish, $x_i^t=[x_{i,1}^t,x_{i,2}^t,\dots,x_{i,D}^t]$. The $d$-th dimension quantum position $v_{i,d}^t$ of the $i$-th jellyfish in generation $t$ corresponds to the measured position $x_{i,d}^t\in\{0,1\}$, obtained by a measurement rule that compares $v_{i,d}^t$ with a random number $\xi_{i,d}^t$ uniformly distributed in $[0,1]$. The dimension is $D=n+2n_1$.

In the feature selection method based on the quantum artificial jellyfish search mechanism, the first $2n_1$ dimensions of each quantum artificial jellyfish are used to search the initial penalty factor $C_0$ and the initial relaxation variable $\zeta_0$ of the support vector machine, and the remaining $n$ dimensions are used for feature selection. The penalty factor and the relaxation variable are binary-coded, each with a code length of $n_1$ bits. For the features $l_1,l_2,\dots,l_n$, a value of 1 means the feature is selected, and a value of 0 means it is not.
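Measurement and decoding can be sketched as below. The measurement rule itself is an assumption here (x = 1 when a uniform random number exceeds v squared, one convention used in quantum-inspired optimizers); the text only states that the rule compares the quantum position with a uniform random number. The sketch also assumes the C code comes first, then the zeta code, then the n feature bits, and it stops at the integer codes because the mapping from code to actual C and zeta values is not specified. Names are hypothetical.

```python
import random

def measure(quantum_position, rng):
    """Collapse each quantum bit v in [0, 1] to a classical bit.

    Assumed rule: x = 1 if xi > v**2 else 0, with xi uniform on [0, 1).
    """
    return [1 if rng.random() > v * v else 0 for v in quantum_position]

def bits_to_int(bits):
    """Read 0/1 values as an unsigned binary number, most significant bit first."""
    value = 0
    for b in bits:
        value = value * 2 + b
    return value

def decode(position, n1):
    """Split a measured position of length 2*n1 + n into (C code, zeta code, features).

    Assumed layout: n1 bits for C, then n1 bits for zeta, then one bit per feature.
    Selected features are returned as 0-based indices.
    """
    c_code = bits_to_int(position[:n1])
    zeta_code = bits_to_int(position[n1:2 * n1])
    selected = [l for l, bit in enumerate(position[2 * n1:]) if bit == 1]
    return c_code, zeta_code, selected
```

For example, with n1 = 3 the position `[1, 0, 1, 0, 1, 1, 1, 0, 0, 1]` decodes to C code 5, zeta code 3, and selected features 0 and 3.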

Because the main goals of feature selection are classification accuracy and the number of selected features, the ideal result is a small number of selected features with high classification accuracy. These two criteria are therefore used to evaluate how well the quantum artificial jellyfish mechanism and the comparison methods perform in support vector machine feature selection. The fitness function combines the two criteria: $\alpha$ is the weight of classification accuracy in the fitness function, $\gamma_R$ is the classification accuracy, $\beta$ is the importance of the selected features, that is, the weight of the number of selected features in the fitness function, with $\beta=1-\alpha$, and $N_s$ is the number of selected features.
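A hedged sketch of a fitness function with the properties described (accuracy weighted by alpha, feature count weighted by beta = 1 - alpha, smaller is better). The exact formula is an assumption: it uses the error rate 1 - gamma_R and the selected-feature fraction N_s / n, a common form in wrapper feature selection, and the default alpha = 0.99 is a hypothetical value.

```python
def fitness(accuracy, n_selected, n_features, alpha=0.99):
    """Assumed form: alpha * (1 - accuracy) + (1 - alpha) * n_selected / n_features.

    accuracy is the classification accuracy gamma_R as a fraction in [0, 1];
    smaller fitness values are better.
    """
    beta = 1.0 - alpha
    return alpha * (1.0 - accuracy) + beta * n_selected / n_features
```

Because alpha is close to 1, accuracy dominates, and the feature-count term mainly breaks ties between subsets of similar accuracy.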

Initialize the $d$-th dimension quantum position of the $i$-th quantum artificial jellyfish in generation 1 as a random number in $[0,1]$, $i=1,2,\dots,N$, $d=1,2,\dots,D$. Obtain the position of each quantum artificial jellyfish with the measurement rule, which yields the initial support vector machine parameters $C_0$ and $\zeta_0$ and a feature subset. Convert the binary codes of $C_0$ and $\zeta_0$ to decimal, split the data restricted to the selected feature subset into a training set and a test set, input them together with the parameters into the support vector machine model to obtain the classification accuracy, and substitute the accuracy into the fitness function to obtain the initial fitness value of each quantum artificial jellyfish; a smaller fitness value is better. The quantum position of the quantum artificial jellyfish with the best initial fitness value in the population becomes the initial global optimal quantum position of the population.
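Initialization and the choice of the initial global best can be sketched as below. `init_population` and `initial_global_best` are hypothetical names, and the fitness values are assumed to have been computed already with a fitness function like the one described in the previous paragraph.

```python
import random

def init_population(N, D, rng):
    """Generation-1 quantum positions: each v_{i,d} is uniform on [0, 1)."""
    return [[rng.random() for _ in range(D)] for _ in range(N)]

def initial_global_best(population, fitnesses):
    """Return (quantum position, fitness) of the individual with the smallest fitness."""
    best_i = min(range(len(population)), key=lambda i: fitnesses[i])
    return list(population[best_i]), fitnesses[best_i]
```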

Step three: ocean-current-following movement. Each quantum artificial jellyfish updates its quantum position through a quantum rotation gate and searches for the optimal solution over the global range.

The foraging mechanism adopted by the $i$-th quantum artificial jellyfish, $i=1,2,\dots,N$, is controlled by a time control function $c(t)$, which contains a random number in $(0,1)$, and a constant $p_0$. When $c(t)\ge p_0$, the quantum artificial jellyfish follows the ocean current.
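The formula of the time control function is not reproduced in the text. The sketch below borrows the one used by the original, continuous artificial jellyfish search algorithm, c(t) = |(1 - t/T)(2 rand - 1)|, which shrinks toward 0 as t approaches T; whether the patent uses exactly this form is an assumption. The original algorithm used a threshold of 0.5, which would correspond to p0 here.

```python
import random

def time_control(t, T, rng):
    """Assumed time control function c(t) = |(1 - t/T) * (2*rand - 1)|."""
    return abs((1.0 - t / T) * (2.0 * rng.random() - 1.0))

def follows_ocean_current(t, T, p0, rng):
    """True when the jellyfish follows the ocean current, i.e. c(t) >= p0."""
    return time_control(t, T, rng) >= p0
```

Since c(t) never exceeds 1 - t/T, ocean-current following becomes impossible once 1 - t/T < p0, so late iterations concentrate on movement within the population.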

Because ocean currents are rich in food, they strongly attract artificial jellyfish and have an important influence on their foraging. In the ocean-current-following movement, the optimal quantum position in the population is taken as the direction of the ocean current, and each quantum artificial jellyfish updates its quantum position mainly through a simulated, simplified quantum rotation gate: the $d$-th dimension quantum position $v_{i,d}^t$ of the $i$-th quantum artificial jellyfish is rotated to $v_{i,d}^{t+1}$ by the rotation angle $\theta_{i,d}^{t+1}$, $d=1,2,\dots,D$. For ocean-current following, the rotation angle is computed from $v_{g,d}^t$, the quantum position of the global optimal solution found in the previous $t$ iterations (that is, the quantum position of the jellyfish with the best fitness value so far), two random numbers in $(0,1)$, and the distribution factor $\beta_1$. If the quantum rotation angle $\theta_{i,d}^{t+1}=0$, the qubit $v_{i,d}^t$ is instead updated with a small probability by a quantum NOT gate, $v_{i,d}^{t+1}=\sqrt{1-(v_{i,d}^t)^2}$. In ocean-current following, the quantum position is therefore updated by the quantum NOT gate when $\theta_{i,d}^{t+1}=0$ and a random number uniformly distributed in $[0,1]$ falls below the mutation probability of ocean-current following, and by the quantum rotation gate otherwise.
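The quantum rotation gate and quantum NOT gate update for a single dimension can be sketched as follows. The rotation-gate formula |v cos(theta) - sqrt(1 - v^2) sin(theta)| is an assumption (a simulated rotation gate common in quantum-inspired optimizers), and the caller is assumed to have computed theta from the ocean-current rule. The NOT gate swaps the two real amplitudes v and sqrt(1 - v^2). Function names are hypothetical.

```python
import math
import random

def rotate(v, theta):
    """Assumed simulated rotation gate: |v*cos(theta) - sqrt(1 - v^2)*sin(theta)|."""
    return abs(v * math.cos(theta) - math.sqrt(max(0.0, 1.0 - v * v)) * math.sin(theta))

def not_gate(v):
    """Quantum NOT gate on a real amplitude v in [0, 1]: swaps the two amplitudes."""
    return math.sqrt(max(0.0, 1.0 - v * v))

def update_qubit(v, theta, mutation_prob, rng):
    """NOT gate with a small probability when theta is exactly 0, otherwise rotate.

    With theta == 0 and no mutation, rotate() leaves v unchanged.
    """
    if theta == 0 and rng.random() < mutation_prob:
        return not_gate(v)
    return rotate(v, theta)
```

Writing v = cos(phi), the rotation gives |cos(phi + theta)|, so the gate shifts the angle of the qubit and the result always stays in [0, 1].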

Step four: movement within the population. Within the population, each quantum artificial jellyfish has two foraging strategies, active motion and passive motion, through which the population performs local exploitation.

Movement of the quantum artificial jellyfish within the population takes one of two forms, active motion or passive motion. When $c(t)<p_0$, the quantum artificial jellyfish moves within the population; a random number in $(0,1)$ then decides the form: when its condition is met the jellyfish performs active motion, and otherwise it performs passive motion.
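Mode selection can be sketched as a small dispatch function. The ocean-current test c(t) >= p0 follows the text; the active/passive test is an assumption borrowed from the original jellyfish search algorithm, where a random number r > 1 - c(t) selects passive motion and active motion is used otherwise.

```python
def choose_motion(c_t, p0, r):
    """Pick the update used by one jellyfish in the current iteration.

    c_t: value of the time control function c(t)
    p0:  threshold constant
    r:   random number in (0, 1)
    """
    if c_t >= p0:
        return "ocean_current"
    # Assumed rule from the original jellyfish search: r > 1 - c(t) -> passive.
    if r > 1.0 - c_t:
        return "passive"
    return "active"
```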

Passive motion is a search around the quantum artificial jellyfish's own position. The $d$-th dimension quantum position $v_{i,d}^t$ of the $i$-th quantum artificial jellyfish is updated by the quantum rotation gate with the passive-motion rotation angle $\theta_{i,d}^{t+1}$, which is computed from a standard normal random number and the $d$-th dimension average quantum position of the population, $\bar v_d^t=\frac{1}{N}\sum_{i=1}^{N}v_{i,d}^t$. If the quantum rotation angle $\theta_{i,d}^{t+1}=0$, the qubit $v_{i,d}^t$ is instead updated with a small probability by a quantum NOT gate, $v_{i,d}^{t+1}=\sqrt{1-(v_{i,d}^t)^2}$. In passive motion, the $d$-th dimension quantum position of the $i$-th jellyfish is therefore updated by the quantum NOT gate when $\theta_{i,d}^{t+1}=0$ and a random number uniformly distributed in $[0,1]$ falls below the mutation probability of passive motion, and by the quantum rotation gate otherwise.
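The population-mean quantum position used in passive motion is a per-dimension average over all N jellyfish; a minimal sketch with hypothetical names, including a standard normal draw from Python's `random.gauss`:

```python
import random

def mean_quantum_position(population):
    """Return the per-dimension average quantum position over all jellyfish."""
    n_ind = len(population)
    dims = len(population[0])
    return [sum(ind[d] for ind in population) / n_ind for d in range(dims)]

def standard_normal(rng):
    """One standard normal draw, as used in the passive-motion rotation angle."""
    return rng.gauss(0.0, 1.0)
```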

Active motion is a search guided by the best position found by the quantum artificial jellyfish. The $d$-th dimension quantum position $v_{i,d}^t$ of the $i$-th quantum artificial jellyfish is updated by the quantum rotation gate with the active-motion rotation angle $\theta_{i,d}^{t+1}$, which is computed from two random numbers in $(0,1)$ and $v_{i,d}^{\mathrm{pb}}$, the $d$-th dimension of the best quantum position of the $i$-th jellyfish during the iterations, that is, its $d$-th dimension quantum position when its fitness value was best. If the quantum rotation angle $\theta_{i,d}^{t+1}=0$, the qubit $v_{i,d}^t$ is instead updated with a small probability by a quantum NOT gate, $v_{i,d}^{t+1}=\sqrt{1-(v_{i,d}^t)^2}$. In active motion, the $d$-th dimension quantum position of the $i$-th jellyfish is therefore updated by the quantum NOT gate when $\theta_{i,d}^{t+1}=0$ and a random number uniformly distributed in $[0,1]$ falls below the mutation probability of active motion, and by the quantum rotation gate otherwise.

Step five: compute the fitness value of each quantum artificial jellyfish after its quantum position has been updated, and update the optimal quantum position of the quantum artificial jellyfish population.

After the quantum position $v_i^{t+1}$ of the $i$-th quantum artificial jellyfish in iteration $t+1$ is obtained, measure it with the measurement rule to obtain the position $x_i^{t+1}$. Convert the two initial support vector machine hyperparameters found by the search from binary to decimal, split the data restricted to the selected feature subset into a training set and a test set, and input them together with the initial hyperparameters into the support vector machine model to obtain the classification accuracy. Substitute the classification accuracy and the number of selected features into the fitness function to compute the corresponding fitness value, and record the fitness value of each jellyfish in iteration $t+1$. If the best fitness value in the population at iteration $t+1$ is smaller than the fitness value of the global optimal solution found in the previous $t$ iterations, the best quantum position of iteration $t+1$ becomes the global optimal quantum position; otherwise, the global optimal quantum position of iteration $t+1$ is the same as in the previous $t$ iterations.
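The bookkeeping in step five, for a minimization problem, can be sketched as below. The global-best rule follows the text; the personal-best update (replace only on strictly smaller fitness), which supplies the personal best used in active motion, is an assumption, and the function name is hypothetical.

```python
def update_bests(positions, fitnesses, personal_best, personal_fit, global_best, global_fit):
    """Update personal bests in place and return the (possibly new) global best.

    Smaller fitness is better.
    """
    for i, (pos, fit) in enumerate(zip(positions, fitnesses)):
        # Assumed rule: keep a jellyfish's new quantum position as its personal best
        # only if its fitness strictly improves.
        if fit < personal_fit[i]:
            personal_best[i] = list(pos)
            personal_fit[i] = fit
    # Best individual of this iteration.
    best_i = min(range(len(fitnesses)), key=lambda i: fitnesses[i])
    if fitnesses[best_i] < global_fit:
        return list(positions[best_i]), fitnesses[best_i]
    # Otherwise the global best carries over unchanged.
    return global_best, global_fit
```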

Step six: check whether the maximum number of iterations $T$ has been reached. If not, set $t=t+1$ and return to step three to continue iterating; otherwise, output the selected optimal feature subset together with the classification accuracy and the classification results.

Compared with the prior art, the invention has the following beneficial effects. The conventional artificial jellyfish search mechanism converges slowly and imprecisely, easily falls into local optima, and can only solve continuous optimization problems. To address these problems, the invention proposes a discrete quantum artificial jellyfish search mechanism that combines the artificial jellyfish search mechanism with quantum optimization theory and designs a new evolution strategy based on quantum coding and a simulated quantum rotation gate. Incorporating quantum optimization theory greatly improves the convergence speed and convergence precision of the conventional artificial jellyfish search mechanism and removes its restriction to continuous problems, so that it can solve discrete optimization problems. The designed quantum artificial jellyfish search mechanism solves the objective function quickly and with high precision, with superior convergence and stronger robustness.

The conventional wrapper feature selection method repeatedly selects a feature subset from the initial data set, trains a learner, and evaluates the subset by the learner's performance until the optimal feature subset is found. Judged by the learner's classification results, subsets chosen by wrapper selection are more accurate than those chosen by filter selection, but because the learner must be retrained repeatedly, wrapper selection usually costs much more computation than filter selection. To reduce this cost and the time complexity of wrapper feature selection, the invention uses synchronous-optimization feature selection, in which a swarm intelligence search mechanism completes the support vector machine hyperparameter optimization and the feature subset selection at the same time. Compared with classical feature selection methods such as greedy algorithms and recursive feature elimination, synchronous-optimization feature selection greatly reduces computational complexity and the time needed to find the optimal feature subset.

Synchronous-optimization feature selection places high performance demands on the chosen swarm intelligence search mechanism. Conventional swarm intelligence methods have difficulty achieving good convergence and short running time together, so in practice they cannot fully deliver the high accuracy and short running time that synchronous-optimization feature selection promises. For feature selection, the proposed quantum artificial jellyfish search mechanism uses quantum optimization theory to overcome the shortcomings of the conventional artificial jellyfish search mechanism, greatly reduces running time, and compensates for the weaknesses of wrapper feature selection. It performs better on three indicators (average classification accuracy, average number of selected features, and average running time), which shows that the quantum artificial jellyfish search mechanism has strong practical value in feature selection.

Drawings

FIG. 1 is a flow chart of a feature selection method based on a quantum artificial jellyfish search mechanism;

FIG. 2 is a schematic diagram of k-class classification by the directed acyclic graph method;

FIG. 3 is a graph of convergence performance versus time for different methods on the same low-dimensional data set;

FIG. 4 is a graph of convergence performance versus time for different methods on the same high-dimensional data set;

FIG. 5 is a histogram of the average number of selected features for different methods on different data sets;

FIG. 6 is a histogram of the average classification accuracy for different methods on different data sets.

Detailed Description


The invention is described in further detail below with reference to the drawings and the detailed description. The flow of the feature selection method based on the quantum artificial jellyfish search mechanism is shown in FIG. 1, and the technical scheme of the invention comprises the following steps:

Input the data set $I=[(m_1,y_1),(m_2,y_2),\dots,(m_L,y_L)]$, where $M=[m_1,m_2,\dots,m_L]$ are the data samples in the data set, $Y=[y_1,y_2,\dots,y_L]$ are the class labels, and $L$ is the total number of data samples. Each data sample is a feature vector with $n$ feature elements, $m_i=[m_{i1},m_{i2},\dots,m_{in}]$, $i=1,2,\dots,L$, where $n$ is the number of features in the data set. Preprocess the input data set by converting all data samples and class labels to numbers, and then normalize it. Let the per-feature maximum of the data samples be $m_{\max}=[m_{1,\max},m_{2,\max},\dots,m_{n,\max}]$ and the per-feature minimum be $m_{\min}=[m_{1,\min},m_{2,\min},\dots,m_{n,\min}]$, where $m_{l,\max}$ and $m_{l,\min}$ are the maximum and minimum of the $l$-th feature element, $l=1,2,\dots,n$. All data samples are normalized with
$$m'_{il}=\frac{m_{il}-m_{l,\min}}{m_{l,\max}-m_{l,\min}},$$
which gives the normalized data set $I'=[(m'_1,y_1),(m'_2,y_2),\dots,(m'_L,y_L)]$, where $m'_i=[m'_{i1},m'_{i2},\dots,m'_{in}]$, $i=1,2,\dots,L$, and $M'=[m'_1,m'_2,\dots,m'_L]$ is the normalized data sample set. From each group of feature data in the data set input to the support vector machine, a proportion $\alpha_1$ of the data samples and their class labels is randomly selected as the training set $I'_1$, and the remaining data form the test set $I'_2$. Because of the complexity of the data set, the invention uses the directed acyclic graph (DAG) method to build a multi-class support vector machine model that classifies the data set input to the support vector machine.
Let the number of classes in the data set input to the support vector machine be k. Then k(k − 1)/2 nonlinear binary support vector machines are constructed, converting the k-class problem into k(k − 1)/2 two-class problems and thereby completing the classification of a data set with k classes.
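The DAG construction can be illustrated with a short sketch. It enumerates the k(k − 1)/2 class pairs and walks the decision DAG by eliminating one candidate class per binary test, so a prediction needs k − 1 evaluations. The callback `prefers_first` is a hypothetical stand-in for a trained i-versus-j support vector machine; the source does not specify how its DAG is traversed, so the common "compare first against last candidate" order is an assumption.

```python
from itertools import combinations


def pairwise_tasks(k):
    """All (i, j) class pairs with i < j: a DAG-SVM trains one binary SVM per pair."""
    return list(combinations(range(k), 2))


def dag_predict(x, classes, prefers_first):
    """Classify x by walking the decision DAG.

    `prefers_first(i, j, x)` is a hypothetical callback standing in for the trained
    i-vs-j SVM: it returns True if x is judged to belong to class i rather than j.
    Each test removes one candidate, so len(classes) - 1 tests are made.
    """
    candidates = list(classes)
    while len(candidates) > 1:
        i, j = candidates[0], candidates[-1]
        if prefers_first(i, j, x):
            candidates.pop()      # x is ruled out of class j
        else:
            candidates.pop(0)     # x is ruled out of class i
    return candidates[0]
```

With a toy rule such as "closer to i than to j", `dag_predict(2.2, [0, 1, 2, 3], rule)` returns class 2 after three binary tests.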

In the classification process, the training set I′₁ and the initial hyperparameters of the support vector machine are first input to the support vector machine for training. Training is the process in which the support vector machine searches for the optimal separating hyperplane, the optimal penalty factor, and the slack variable. This optimization is equivalent to a quadratic programming problem, solved by introducing Lagrange multipliers to construct a constrained optimization problem. For class i and class j the problem is

$$\max_{a^{ij}} \; \sum_{l} a_l^{ij} - \frac{1}{2}\sum_{l}\sum_{s} a_l^{ij} a_s^{ij}\, y_l y_s\, K(m'_l, m'_s)$$

subject to $\sum_{l} a_l^{ij} y_l = 0$ and $0 \le a_l^{ij} \le C$, where the sums run over the training samples of classes i and j (with labels relabelled as ±1), $a_l^{ij}$ are the Lagrange multipliers for class i and class j, C is the penalty factor, and $K(\cdot,\cdot)$ is the kernel function of the support vector machine. Because no prior knowledge is available, the Gaussian kernel, which maps samples into an infinite-dimensional space, is chosen:

$$K(m'_l, m'_s) = \exp\!\left(-\frac{\lVert m'_l - m'_s \rVert^2}{2\delta^2}\right),$$

where $\lVert m'_l - m'_s \rVert$ is the Euclidean distance between two data samples, exp(·) is the exponential function with base e, and δ is the standard deviation of all data samples in the data set. Training solves this problem for its optimal multipliers, constructs the optimal hyperplane separating class i from class j, and traverses the penalty factor and slack variable over a given range to optimize them.
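To make the pairwise training problem concrete, the sketch below evaluates the Gaussian kernel, the dual objective, and the dual constraints for a given multiplier vector. It only checks and scores candidate multipliers; it is not a QP solver, and the ±1 encoding of the labels (class i as +1, class j as −1) is an assumption.

```python
import math


def gaussian_kernel(a, b, delta):
    """K(a, b) = exp(-||a - b||^2 / (2 * delta^2))."""
    sq_dist = sum((p - q) ** 2 for p, q in zip(a, b))
    return math.exp(-sq_dist / (2.0 * delta ** 2))


def dual_objective(alpha, X, y, delta):
    """sum(alpha) - 1/2 * sum_l sum_s alpha_l alpha_s y_l y_s K(x_l, x_s)."""
    total = sum(alpha)
    for l in range(len(X)):
        for s in range(len(X)):
            total -= 0.5 * alpha[l] * alpha[s] * y[l] * y[s] * gaussian_kernel(X[l], X[s], delta)
    return total


def feasible(alpha, y, C, tol=1e-9):
    """Dual constraints: sum(alpha_l * y_l) == 0 and 0 <= alpha_l <= C."""
    balanced = abs(sum(a * t for a, t in zip(alpha, y))) <= tol
    boxed = all(-tol <= a <= C + tol for a in alpha)
    return balanced and boxed
```

A real trainer would maximize `dual_objective` over feasible multipliers; libraries such as LIBSVM do this with specialized decomposition methods.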

(1) Input the initial penalty factor and the initial slack variable of the support vector machine. The search grid covers the penalty factor within radius r₁ and the slack variable within radius r₂ of these initial values, with search step r₃; a counter records the current search number up to a maximum search number, and a search starting point is set for each of the two parameters. Here r₁ is the search radius of the penalty factor and r₂ is the search radius of the slack variable.

(4) If the search has not yet reached its limit, increment the current search number and return to process (2) to continue the search. When one parameter has traversed all values in its interval, i.e., when the search number is an integer multiple of the number of grid points in that interval (computed with the floor operation ⌊·⌋), the search moves to the next grid value of the other parameter. When the search number reaches the maximum, the search ends: the parameters giving the highest classification accuracy are taken as the optimal hyperparameters for class i and class j obtained by training, denoted C^ij and ζ^ij, and the optimal decision function for class i and class j is output.
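The grid traversal in processes (1) to (4) can be sketched with a single search counter, in the spirit of the counter the text describes. Several details are assumptions because they are not legible in the source: the grid is centred on the initial values, the slack variable is the inner (faster-changing) axis, non-positive values are skipped, and `evaluate` is a hypothetical callback that trains the i-versus-j SVM and returns its accuracy.

```python
def grid_search(C0, zeta0, r1, r2, r3, evaluate):
    """Scan C in [C0 - r1, C0 + r1] and zeta in [zeta0 - r2, zeta0 + r2] with step r3.

    `evaluate(C, zeta)` is a hypothetical callback returning classification accuracy.
    Returns (best_accuracy, best_C, best_zeta); ties keep the first grid point found.
    """
    n_c = int(2 * r1 // r3) + 1          # grid points along C (floor, as in the text)
    n_z = int(2 * r2 // r3) + 1          # grid points along zeta
    best = (float("-inf"), None, None)
    for count in range(n_c * n_z):       # one counter plays the role of the search number
        C = C0 - r1 + (count // n_z) * r3        # C advances after each full zeta interval
        zeta = zeta0 - r2 + (count % n_z) * r3
        if C <= 0 or zeta <= 0:          # assumption: non-positive hyperparameters are skipped
            continue
        acc = evaluate(C, zeta)
        if acc > best[0]:
            best = (acc, C, zeta)
    return best
```

With the experimental settings r₁ = 50, r₂ = 10 and r₃ = 0.5 this grid has 201 × 41 points per class pair, which illustrates why the search starting point found by the jellyfish population matters for running time.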

After training, the trained model is saved, and the test set I′₂ is input to the support vector machine for testing. For any test sample m′_r, the class to which it belongs is determined with a discrete decision rule: if the optimal decision function for class i and class j places m′_r on the positive side, the sample belongs to class i; otherwise it belongs to class j. After the test set has been classified, the class labels predicted by the support vector machine are compared with the true class labels; a sample is correctly classified if the two agree and misclassified otherwise. The classification accuracy is the percentage of correctly classified samples among all samples in the test set.
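The decision rule and the accuracy measure can be written in a few lines. This sketch assumes that a decision value of exactly zero is assigned to class i; the source does not state how ties are broken.

```python
def assign_pair(decision_value, i, j):
    """Discrete rule for the i-vs-j classifier: non-negative output -> class i, else class j."""
    return i if decision_value >= 0 else j


def accuracy_percent(predicted, true):
    """Percentage of test samples whose predicted label equals the true label."""
    correct = sum(p == t for p, t in zip(predicted, true))
    return 100.0 * correct / len(true)
```

For instance, predictions `[1, 2, 2, 3]` against true labels `[1, 2, 3, 3]` give an accuracy of 75.0%.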

Let the population size of the quantum artificial jellyfish be N, the maximum number of iterations of the whole population be T, t the iteration number, and D the dimension of the search space. The quantum position of the i-th quantum artificial jellyfish in generation t is $x_i^t = [x_{i1}^t, x_{i2}^t, \ldots, x_{iD}^t]$, and measuring this quantum position gives the position of the jellyfish, $s_i^t = [s_{i1}^t, s_{i2}^t, \ldots, s_{iD}^t]$. The d-th quantum bit $x_{id}^t$ corresponds to the measured bit $s_{id}^t$, whose value the measurement rule decides by comparison with a random number uniformly distributed on [0, 1]. The dimension of the search space is D = n + 2n₁.
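A measurement step can be sketched as below. The exact comparison in the patent's measurement rule is not legible, so this sketch assumes the common quantum-inspired convention that the square of the quantum bit is the probability of observing 1; the function names are hypothetical.

```python
import random


def search_dimension(n, n1):
    """D = n + 2*n1: n feature bits plus n1 bits each for the penalty factor and slack variable."""
    return n + 2 * n1


def measure(quantum_position, rng):
    """Collapse every quantum bit x_d in [0, 1] into a classical bit s_d.

    Assumption: x_d**2 is treated as the probability of observing 1, i.e. s_d = 1
    when a fresh uniform random number eta on [0, 1) is below x_d**2.
    """
    return [1 if rng.random() < x * x else 0 for x in quantum_position]
```

With n₁ = 15 and a 10-feature data set, for example, each jellyfish carries D = 40 quantum bits.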

In the feature selection method based on the quantum artificial jellyfish search mechanism, the first 2n₁ dimensions search the initial penalty factor and the initial slack variable of the support vector machine, and the remaining n dimensions perform feature selection. The penalty factor and the slack variable are binary-coded with n₁ bits each. For the features l₁, l₂, ..., l_n, a value of 1 means the feature is selected and a value of 0 means it is not selected.
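Decoding a measured position into the two hyperparameter codes and a feature mask might look like this. The sketch assumes the first n₁ bits encode the penalty factor, the next n₁ bits the slack variable, and that bits are read most significant first; how the resulting integers are scaled into parameter values is not specified in the text, so the function returns raw decimal codes.

```python
def decode(bits, n1):
    """Split a measured position into (penalty code, slack code, selected feature indices).

    Assumptions: bits[:n1] encode the penalty factor, bits[n1:2*n1] the slack variable,
    both most significant bit first; bits[2*n1:] is the 0/1 feature mask.
    """
    def to_int(chunk):
        value = 0
        for b in chunk:
            value = value * 2 + b
        return value

    c_code = to_int(bits[:n1])
    zeta_code = to_int(bits[n1:2 * n1])
    selected = [idx for idx, b in enumerate(bits[2 * n1:]) if b == 1]
    return c_code, zeta_code, selected
```

For n₁ = 3, the position `[1, 0, 1, 0, 1, 1, 1, 0, 0, 1]` decodes to codes 5 and 3 and selects features 0 and 3.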

Because the main goals of feature selection are classification accuracy and the number of selected features, the ideal result is a small number of selected features with high classification accuracy; these two criteria are therefore used to evaluate the quantum artificial jellyfish mechanism and the comparison mechanisms in support vector machine feature selection. The fitness function is

$$f = \alpha\,(1 - \gamma_R) + \beta\,\frac{|S|}{n},$$

where α is the weight of classification accuracy in the fitness function, γ_R is the classification accuracy, β = 1 − α is the weight of the number of selected features, |S| is the number of features in the selected subset, and n is the total number of features.
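The fitness function reads directly as code. This sketch assumes the accuracy is passed as a fraction in [0, 1] rather than a percentage, and uses the experimental weight α = 0.99 as the default.

```python
def fitness(accuracy, n_selected, n_total, alpha=0.99):
    """f = alpha * (1 - accuracy) + beta * n_selected / n_total, with beta = 1 - alpha.

    `accuracy` is gamma_R as a fraction in [0, 1]; smaller f is better.
    """
    beta = 1.0 - alpha
    return alpha * (1.0 - accuracy) + beta * n_selected / n_total
```

With α = 0.99, an accuracy of 0.9 using 5 of 10 features gives f = 0.099 + 0.005 = 0.104, so the error term dominates and the feature count acts mainly as a tie-breaker.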

The d-th quantum bit of the i-th quantum artificial jellyfish in generation 1 is initialized to a random number in [0, 1], i = 1, 2, ..., N, d = 1, 2, ..., D. The position of each quantum artificial jellyfish is obtained with the measurement rule, and the binary codes of the initial support vector machine parameters (penalty factor and slack variable) are converted to decimal. The selected feature subset is divided into a training set and a test set, both are input to the support vector machine model to obtain the classification accuracy, and the classification accuracy is substituted into the fitness function to obtain the initial fitness value of each quantum artificial jellyfish; a smaller fitness value is better. The quantum position of the quantum artificial jellyfish with the best initial fitness value in the population is set as the initial global optimal quantum position of the population.
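Initialization and selection of the initial global best reduce to two small helpers. In this sketch, `random.Random.random()` draws from [0, 1) rather than the closed interval, which makes no practical difference here.

```python
import random


def initialize(N, D, rng):
    """Generation-1 quantum positions: an N x D table of uniform random numbers in [0, 1)."""
    return [[rng.random() for _ in range(D)] for _ in range(N)]


def best_index(fitness_values):
    """Index of the jellyfish with the smallest (best) fitness value."""
    return min(range(len(fitness_values)), key=fitness_values.__getitem__)
```

After each jellyfish's fitness is computed, `positions[best_index(fits)]` is the initial global optimal quantum position.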

Ocean currents carry abundant food and strongly attract the artificial jellyfish, so they have an important influence on foraging. In ocean-current-following motion, the optimal quantum position in the quantum artificial jellyfish population is taken as the direction of the ocean current, and each quantum artificial jellyfish updates its quantum position mainly through a simulated simplified quantum rotation gate. In the rotation-gate update of the d-th quantum bit of the i-th quantum artificial jellyfish, d = 1, 2, ..., D, the quantum rotation angle for ocean current motion is built from the quantum position of the global optimal solution in the previous t iterations (i.e., the quantum position of the quantum artificial jellyfish with the best fitness value so far), two random numbers in (0, 1), and the distribution factor β₁. If the quantum rotation angle is zero, the quantum bit is instead updated by a quantum NOT gate with a small probability. Thus, in ocean-current-following motion, the update of each quantum bit combines the rotation gate with this NOT-gate mutation, decided by a random number uniformly distributed on [0, 1] and the mutation probability for ocean current motion.
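The gate mechanics shared by all three motion types can be sketched as follows. Because the rotation-angle formulas are not legible in the source, the angles are passed in precomputed. The rotation x′ = |x cos θ + √(1 − x²) sin θ| and the NOT gate x′ = √(1 − x²) follow a common single-amplitude quantum-inspired convention and are assumptions, not confirmed details of the patent.

```python
import math
import random


def rotate_or_flip(x, theta, p_mut, rng):
    """One quantum-bit update: simplified rotation gate, or NOT-gate mutation when theta == 0.

    Assumed formulas (not legible in the source):
      rotation  x' = |x*cos(theta) + sqrt(1 - x^2)*sin(theta)|
      NOT gate  x' = sqrt(1 - x^2), applied with probability p_mut
    """
    amp0 = math.sqrt(max(0.0, 1.0 - x * x))   # companion amplitude; clamp guards float rounding
    if theta == 0.0:
        return amp0 if rng.random() < p_mut else x
    return abs(x * math.cos(theta) + amp0 * math.sin(theta))


def update_jellyfish(position, angles, p_mut, rng):
    """Apply the gate dimension by dimension; `angles` come from the motion-specific rule."""
    return [rotate_or_flip(x, th, p_mut, rng) for x, th in zip(position, angles)]
```

Writing x = sin φ shows the rotation returns |sin(φ + θ)|, so updated quantum bits stay in [0, 1].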

Passive motion is a search around the quantum artificial jellyfish's own position. In the rotation-gate update of the d-th quantum bit of the i-th quantum artificial jellyfish, the quantum rotation angle for passive motion is built from a standard normal random number r₁^t and the d-th dimension average quantum position of the population, $\bar{x}_d^t = \frac{1}{N}\sum_{k=1}^{N} x_{kd}^t$. If the quantum rotation angle is zero, the quantum bit is instead updated by a quantum NOT gate with a small probability. Thus, in passive motion, the update of the d-th quantum bit of the i-th quantum artificial jellyfish combines the rotation gate with this NOT-gate mutation, decided by a random number uniformly distributed on [0, 1] and the mutation probability for passive motion.
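The two legible inputs of the passive-motion angle, the per-dimension population mean and the standard normal number r₁^t, can be computed as below. How the patent combines them into the angle is not legible, so the sketch stops at these inputs; the helper names are hypothetical.

```python
import random


def population_mean(positions):
    """Return the d-th dimension average quantum position over all N jellyfish, for every d."""
    n = len(positions)
    return [sum(column) / n for column in zip(*positions)]


def draw_r1(rng):
    """Standard normal random number r_1^t used by the passive-motion rotation angle."""
    return rng.gauss(0.0, 1.0)
```

For two jellyfish at `[0.2, 0.4]` and `[0.6, 0.8]`, the mean quantum position is approximately `[0.4, 0.6]`.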

Active motion is a search guided by the quantum artificial jellyfish's best position. In the rotation-gate update of the d-th quantum bit of the i-th quantum artificial jellyfish, the quantum rotation angle for active motion is built from two random numbers in (0, 1) and the d-th dimension of the i-th jellyfish's optimal quantum position in the iterative process, i.e., its d-th quantum bit at the iteration where its fitness value was best. If the quantum rotation angle is zero, the quantum bit is instead updated by a quantum NOT gate with a small probability. Thus, in active motion, the update of the d-th quantum bit of the i-th quantum artificial jellyfish combines the rotation gate with this NOT-gate mutation, decided by a random number uniformly distributed on [0, 1] and the mutation probability for active motion.
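Active motion relies on each jellyfish remembering its own best quantum position. A minimal bookkeeping sketch for that memory, under minimization, is shown below; the function name is hypothetical.

```python
def update_personal_best(pbest_pos, pbest_fit, positions, fitness_values):
    """For each jellyfish, keep the quantum position at which its fitness was smallest (in place)."""
    for i, (pos, fit) in enumerate(zip(positions, fitness_values)):
        if fit < pbest_fit[i]:
            pbest_fit[i] = fit
            pbest_pos[i] = list(pos)   # copy so later updates do not alias the stored best
```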

After the quantum position of the i-th quantum artificial jellyfish at iteration t+1 is obtained, its position is measured with the measurement rule. The two initial support vector machine hyperparameters found by the search are converted from binary codes to decimal, the selected feature subset is divided into a training set and a test set, and these are input together with the initial hyperparameters to the support vector machine model to obtain the classification accuracy. The classification accuracy and the number of selected features are substituted into the fitness function to compute the corresponding fitness value, which is recorded for iteration t+1. If the best fitness value in the population at iteration t+1 is smaller than the fitness value of the global optimal solution obtained in the previous t iterations, the corresponding optimal quantum position of iteration t+1 becomes the global optimal quantum position; otherwise, the global optimal quantum position at iteration t+1 equals that of the previous t iterations.
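The greedy global-best update described above can be sketched as a pure function returning the new best position and fitness; the name is hypothetical, and fitness is minimized.

```python
def update_global_best(gbest_pos, gbest_fit, positions, fitness_values):
    """Return the (position, fitness) of the global best after one iteration (minimization)."""
    best = min(range(len(fitness_values)), key=fitness_values.__getitem__)
    if fitness_values[best] < gbest_fit:
        return list(positions[best]), fitness_values[best]
    return gbest_pos, gbest_fit   # no improvement: keep the previous global best
```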

For convenience of description, the quantum artificial jellyfish search mechanism of the invention is abbreviated QJS, and the comparison mechanisms, namely the artificial jellyfish search mechanism, the Harris hawks mechanism, and the particle swarm mechanism, are abbreviated JS, HHO, and PSO, respectively.

To compare the performance of the four methods comprehensively, the parameters of the quantum artificial jellyfish search mechanism are set to α₁ = 40%, c = 10, r₁ = 50, r₂ = 10, r₃ = 0.5, α = 0.99, p₀ = 0.5, β₁ = 3, and n₁ = 15. QJS searches C and ζ discretely with a 15-bit binary code, while the comparison mechanisms search C and ζ continuously in their first two dimensions. The JS parameters follow Zhao Xuewu et al., "Artificial jellyfish search optimization algorithm for human brain function division", Journal of Frontiers of Computer Science and Technology, 2022, vol. 16, no. 8, pp. 1829-1841; the HHO parameters follow Balakrishnan K et al., "A novel control factor and Brownian motion-based improved Harris Hawks Optimization for feature selection", Journal of Ambient Intelligence and Humanized Computing, 2022, pp. 1-23; the PSO parameters follow Li Jianyu et al., "Self-monitoring data-driven particle swarm optimization algorithm for large-scale feature selection", CAAI Transactions on Intelligent Systems, 2023, vol. 18, no. 1, pp. 1-13. All four mechanisms use the same population size and maximum number of iterations, N = 30 and T = 50. Each method is run independently 50 times, and the fitness curves are plotted from the average fitness over the 50 runs.

Fig. 3 shows the simulated fitness curves on the same low-dimensional data set from the same initial values, and Fig. 4 shows those on the same high-dimensional data set; both are averaged over 50 runs. Figs. 5 and 6 show, as histograms averaged over 50 runs, the classification accuracy and the number of selected features obtained by the different optimization methods on different data sets. Figs. 3 and 4 show that, regardless of the dimension of the data set, QJS converges faster and reaches a better final fitness than JS, HHO, and PSO, which reflects its better convergence.

To measure the effect of feature selection more precisely, feature selection is performed on 6 data sets of different dimensions and evaluated with 3 indices: average classification accuracy, average number of selected features, and average running time. The results for average classification accuracy and average number of selected features are shown in Figs. 5 and 6, which show that on data sets of every dimension the feature subset selected by QJS has the highest accuracy and the fewest features.

The average classification accuracy is the sum, over the repeated experiments, of the classification accuracy obtained by cross-validation after training and testing the support vector machine on the selected feature subset, divided by the number of repeated experiments; it measures how well the selected feature subset serves the subsequent learning algorithm. The average number of selected features is the total number of features in the selected subsets divided by the number of repeated experiments; it measures how much feature selection reduces the dimension of the data set. The average running time is the total time spent by each optimization method over the repeated runs divided by the number of repeated experiments; it measures the time complexity of each optimization method.

Table 1 gives the three indices for the four methods on the six data sets of different dimensions. A larger average classification accuracy means a more accurate selected feature subset, a smaller average number of selected features means more effective feature selection, and a lower average running time means lower time complexity of the optimization method. Bold entries mark the best result among the four methods.

TABLE 1

As Table 1 shows, among the 72 values obtained under identical test conditions, every result of the present invention is better than those of the comparison methods. It can therefore be concluded that the feature subset selected by the feature selection method based on the quantum artificial jellyfish search mechanism is clearly superior, in both accuracy and number of features, to those selected with the artificial jellyfish search mechanism, the Harris hawks mechanism, and the particle swarm mechanism, and that the proposed method has lower time complexity. The method is therefore more robust, and the quantum artificial jellyfish search mechanism designed with quantum optimization theory balances convergence and time consumption.

Claims

1. A feature selection method based on a quantum artificial jellyfish search mechanism, characterized by comprising the following steps:

step one: inputting a data set, preprocessing and normalizing the data set, dividing it into a training set and a test set, and constructing a support vector machine model to train on, test, and classify the data set;

step two: initializing the quantum position of each individual in the quantum artificial jellyfish population, constructing and calculating the fitness of each quantum artificial jellyfish, and determining the initial global optimal quantum position of the population;

step three: following ocean current motion: each quantum artificial jellyfish updates its quantum position through a quantum rotation gate and searches for the optimal solution globally;

step four: motion within the population: each quantum artificial jellyfish has two foraging strategies, active motion and passive motion, with which the quantum artificial jellyfish population is exploited locally;

step five: calculating the fitness value of each quantum artificial jellyfish after its quantum position is updated, and updating the optimal quantum position of the quantum artificial jellyfish population.

2. The feature selection method based on a quantum artificial jellyfish search mechanism according to claim 1, wherein step one specifically comprises: inputting the data set I = [(m₁, y₁), (m₂, y₂), ..., (m_L, y_L)], where M = [m₁, m₂, ..., m_L] are the data samples in the data set, Y = [y₁, y₂, ..., y_L] are the class labels, L is the total number of data samples, and each data sample has n feature elements, m_i = [m_i1, m_i2, ..., m_in], i = 1, 2, ..., L, n being the number of features in the data set; preprocessing the input data set so that all data samples and class labels are converted into numbers; normalizing the preprocessed data set, where the i-th data sample is m_i, m_max = [m_1,max, m_2,max, ..., m_n,max] and m_min = [m_1,min, m_2,min, ..., m_n,min], m_j,max and m_j,min being the maximum and minimum of the j-th feature element, j = 1, 2, ..., n, by applying the normalization formula $m'_{ij} = (m_{ij} - m_{j,\min})/(m_{j,\max} - m_{j,\min})$ to all data samples to obtain the normalized data set I′ = [(m′₁, y₁), (m′₂, y₂), ..., (m′_L, y_L)], where m′_i = [m′_i1, m′_i2, ..., m′_in], i = 1, 2, ..., L, and M′ = [m′₁, m′₂, ..., m′_L] is the normalized data sample set; and randomly selecting, from each group of data in the data set input to the support vector machine, a proportion α₁ of the data samples with their class labels as the training set I′₁, the remaining data forming the test set I′₂.

3. The feature selection method based on a quantum artificial jellyfish search mechanism according to claim 1, wherein step two specifically comprises: setting the population size of the quantum artificial jellyfish to N, the maximum number of iterations of the whole population to T, t denoting the iteration number, and the dimension of the search space to D; the quantum position of the i-th quantum artificial jellyfish in generation t is $x_i^t = [x_{i1}^t, x_{i2}^t, \ldots, x_{iD}^t]$, and measuring the quantum position gives the position of the quantum artificial jellyfish, $s_i^t = [s_{i1}^t, s_{i2}^t, \ldots, s_{iD}^t]$; the d-th quantum bit $x_{id}^t$ corresponds to the measured bit $s_{id}^t$, whose value the measurement rule decides by comparison with a random number uniformly distributed on [0, 1]; and D = n + 2n₁;

initializing the d-th quantum bit of the i-th quantum artificial jellyfish in generation 1 to a random number in [0, 1], i = 1, 2, ..., N, d = 1, 2, ..., D; obtaining the position of each quantum artificial jellyfish with the measurement rule, converting the binary codes of the initial support vector machine parameters into decimal, dividing the selected feature subset into a training set and a test set, inputting them into the support vector machine model to obtain the classification accuracy, and substituting the classification accuracy into the fitness function to obtain the initial fitness value of each quantum artificial jellyfish, a smaller fitness value being better; and setting the quantum position of the quantum artificial jellyfish with the best initial fitness value in the population as the initial global optimal quantum position of the population.

4. The feature selection method based on a quantum artificial jellyfish search mechanism according to claim 1, wherein step three specifically comprises: the foraging mechanism adopted by the i-th quantum artificial jellyfish, i = 1, 2, ..., N, is controlled by a time control function, which contains a random number in (0, 1), and a constant p₀; when the time control function is not less than p₀, the quantum artificial jellyfish follows the ocean current;

the quantum rotation gate update of the d-th quantum bit of the i-th quantum artificial jellyfish, d = 1, 2, ..., D, uses the quantum rotation angle for ocean current motion, which is built from the quantum position of the global optimal solution in the previous t iterations (i.e., the quantum position of the quantum artificial jellyfish with the best fitness value so far), two random numbers in (0, 1), and the distribution factor β₁; if the quantum rotation angle is zero, the quantum bit is instead updated by a quantum NOT gate with a small probability; and in ocean-current-following motion, the update of the d-th quantum bit of the i-th quantum artificial jellyfish combines these two operations, decided by a random number uniformly distributed on [0, 1] and the mutation probability for ocean current motion.

5. The feature selection method based on a quantum artificial jellyfish search mechanism according to claim 1, wherein step four specifically comprises: when the time control function is less than p₀, the quantum artificial jellyfish moves within the population; it performs active motion when a switching condition on a random number in (0, 1) holds, and passive motion otherwise;

the passive motion is a search around the quantum artificial jellyfish's own position: the quantum rotation gate update of the d-th quantum bit of the i-th quantum artificial jellyfish uses the quantum rotation angle for passive motion, which is built from a standard normal random number r₁^t and the d-th dimension average quantum position of the population, $\bar{x}_d^t = \frac{1}{N}\sum_{k=1}^{N} x_{kd}^t$; if the quantum rotation angle is zero, the quantum bit is instead updated by a quantum NOT gate with a small probability; and in passive motion, the update of the d-th quantum bit of the i-th quantum artificial jellyfish combines these two operations, decided by a random number uniformly distributed on [0, 1] and the mutation probability for passive motion;

the active motion is a search guided by the quantum artificial jellyfish's best position: the quantum rotation gate update of the d-th quantum bit of the i-th quantum artificial jellyfish uses the quantum rotation angle for active motion, which is built from two random numbers in (0, 1) and the d-th dimension of the i-th quantum artificial jellyfish's optimal quantum position in the iterative process, i.e., its d-th quantum bit at the iteration where its fitness value was best; if the quantum rotation angle is zero, the quantum bit is instead updated by a quantum NOT gate with a small probability; and in active motion, the update of the d-th quantum bit of the i-th quantum artificial jellyfish combines these two operations, decided by a random number uniformly distributed on [0, 1] and the mutation probability for active motion.
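The motion selection of claims 4 and 5 can be sketched as follows. The time control function's formula is not legible in the source, so this sketch borrows the one from the original artificial jellyfish search algorithm, c(t) = |(1 − t/T)(2·rand − 1)|; it also borrows that algorithm's test rand > 1 − c(t) for the intra-population branch and maps it to active motion, which the legible text does not confirm. All names are hypothetical.

```python
import random


def time_control(t, T, rng):
    """Assumed time control function from classical JS: |(1 - t/T) * (2*rand - 1)|."""
    return abs((1.0 - t / T) * (2.0 * rng.random() - 1.0))


def choose_motion(t, T, p0, rng):
    """Return 'ocean', 'active' or 'passive' for one jellyfish at iteration t.

    Assumptions: c(t) >= p0 selects ocean-current following; otherwise the classical
    JS test rand > 1 - c(t) is mapped to active motion and its complement to passive.
    """
    c = time_control(t, T, rng)
    if c >= p0:
        return "ocean"
    return "active" if rng.random() > 1.0 - c else "passive"
```

Because c(t) shrinks towards 0 as t approaches T, ocean-current following (global search) dominates early iterations and intra-population motion (local search) dominates late ones.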

6. The feature selection method based on a quantum artificial jellyfish search mechanism according to claim 1, wherein step five specifically comprises: after obtaining the quantum position of the i-th quantum artificial jellyfish at iteration t+1, measuring its position with the measurement rule; converting the two initial support vector machine hyperparameters found by the search from binary codes to decimal, dividing the selected feature subset into a training set and a test set, and inputting the training set, the test set, and the initial hyperparameters into the support vector machine model to obtain the classification accuracy; substituting the classification accuracy and the number of selected features into the fitness function to calculate the corresponding fitness value, and recording the fitness value of each quantum artificial jellyfish at iteration t+1; and, if the best fitness value in the population at iteration t+1 is smaller than the fitness value of the global optimal solution obtained in the previous t iterations, setting the corresponding optimal quantum position of iteration t+1 as the global optimal quantum position, otherwise keeping the global optimal quantum position of the previous t iterations as that of iteration t+1.