CN111553475A - High-dimensional multi-mode evolution optimization method based on random embedding technology - Google Patents

High-dimensional multi-mode evolution optimization method based on random embedding technology

Info

Publication number
CN111553475A
CN111553475A
Authority
CN
China
Prior art keywords: dimensional, population, optimization, subtasks, matrix
Prior art date
Legal status: Withdrawn
Application number
CN202010341693.6A
Other languages
Chinese (zh)
Inventor
候亚庆
曹雨梦
杨鑫
张强
魏小鹏
Current Assignee: Dalian University of Technology
Original Assignee: Dalian University of Technology
Priority date
Filing date
Publication date
Application filed by Dalian University of Technology filed Critical Dalian University of Technology
Priority to CN202010341693.6A priority Critical patent/CN111553475A/en
Publication of CN111553475A publication Critical patent/CN111553475A/en

Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06N — COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 — Computing arrangements based on biological models
    • G06N 3/02 — Neural networks
    • G06N 3/08 — Learning methods
    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06F — ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 — Pattern recognition
    • G06F 18/20 — Analysing
    • G06F 18/24 — Classification techniques
    • G06F 18/241 — Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F 18/2411 — Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06N — COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 — Computing arrangements based on biological models
    • G06N 3/004 — Artificial life, i.e. computing arrangements simulating life
    • G06N 3/006 — Artificial life based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO]
    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06N — COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 — Computing arrangements based on biological models
    • G06N 3/12 — Computing arrangements based on biological models using genetic models
    • G06N 3/126 — Evolutionary algorithms, e.g. genetic algorithms or genetic programming


Abstract

The invention belongs to the intersection of evolutionary computation and machine learning and relates to a high-dimensional multi-mode evolution optimization method based on a random embedding technique, which can be used for automatic parameter tuning of a multi-class Support Vector Machine (SVM) model. The invention is an improved optimization algorithm that replaces multiple independent optimization runs with parallel operation and adopts an autoencoding technique to perform transfer learning among multiple subtasks, thereby improving optimization efficiency; it can quickly and effectively optimize a large number of parameters at the same time and is therefore well suited to hyper-parameter tuning of large-scale machine learning models. The method generates a plurality of low-dimensional random embeddings, each of which serves as a surrogate representation of the original high-dimensional task, and exploits the uniqueness of and correlations among the subtasks to perform beneficial transfer learning between them, effectively improving the search efficiency for the optimal solution while avoiding the time cost of multiple independent runs.

Description

High-dimensional multi-mode evolution optimization method based on random embedding technology
Technical Field
The invention belongs to the intersection of evolutionary computation and machine learning and relates to a high-dimensional multi-mode evolution optimization method based on a random embedding technique, which can be used for automatic parameter tuning of a multi-class Support Vector Machine (SVM) model.
Background
In clinical work, medical images provide important auxiliary information for clinical decision making. Traditional image diagnosis, however, relies mainly on the subjective judgment of radiologists, so diagnostic accuracy depends on the personal experience of experts; because physician skill varies, misdiagnosis occurs easily in many hospitals and affects subsequent treatment. If computers can assist doctors in detection, diagnostic accuracy can be greatly improved and doctors can be helped to formulate subsequent treatment plans.
In computer-aided diagnosis, the current mainstream approach is to fine-tune a deep neural network pre-trained on natural images and then apply it to medical image classification tasks (Kermany, Goldbaum, et al. 2018). However, the features of medical images are not as rich as those of natural images, and the differences between images are relatively small, so medical image classification leans toward fine-grained classification, and a very deep neural network is not needed when constructing the model. On the contrary, if the network scale is enlarged simply by deepening the network in pursuit of higher classification accuracy, overfitting occurs easily and classification performance degrades. Compared with a deep neural network, the support vector machine is a traditional machine learning classification model: although the scale of data it can process is smaller, it classifies quickly and does not fall into poor local optima, so combining it with a neural network is well suited to medical image diagnosis tasks.
However, a multi-disease diagnosis system built on support vector machines has many hyper-parameters to adjust, and hyper-parameter tuning of models is itself a significant challenge in machine learning. During modeling, one must consider not only how to build an optimal network structure but also how to find an optimal hyper-parameter combination; because parameter tuning of machine learning models follows no fixed rules or patterns, it can only rely on human experience and random attempts, consuming manpower and time. As the hyper-parameter dimension grows, hyper-parameter tuning becomes an NP-hard problem, and the optimal parameter combination is difficult to find manually. In developing a machine learning algorithm, most of the time is spent constructing an optimal model, i.e., searching for the optimal hyper-parameter combination; whether by manual tuning or by methods such as random search and grid search, a great deal of time is consumed, and the time cost grows exponentially with the number of parameters. Therefore, if automated hyper-parameter tuning can be achieved, modeling efficiency will be greatly improved.
Hyper-parameter tuning of a multi-class support vector machine is essentially a combinatorial optimization problem, and a widely applied method for such problems is the evolutionary algorithm. Evolutionary algorithms are population-based metaheuristic optimization algorithms inspired by Darwinian evolutionary theory; genetic algorithms, differential evolution, and the like are easy to implement, highly general, and widely used in solving optimization problems. Although evolutionary algorithms are broadly applicable, previous research shows that the efficiency of genetic algorithms gradually declines as the problem dimension grows. Hence, to extend existing genetic algorithms to high-dimensional hyper-parameter optimization, the most direct idea is to reduce the hyper-parameter dimension with a dimension-reduction technique and then apply the genetic algorithm. Among such methods, decomposition-based and embedding-based approaches are currently the two dominant technical routes.
Decomposition-based approaches have been successfully applied to separable problems, in which a high-dimensional search space can be decomposed into multiple low-dimensional sub-problems (Kandasamy, Schneider, and Póczos 2015; Friesen and Domingos 2015). However, by the nature of this technique, applying it to the hyper-parameter optimization problem of machine learning models is not straightforward, because machine learning hyper-parameters are usually not mutually independent, often have complex dependency relationships, and need to be optimized simultaneously.
On the other hand, considering that in an optimization problem some decision variables do not significantly influence the target problem, embedding methods (random projection, random embedding, and so on; Chen, Krause, and Castro 2012; Carpentier and Munos 2012; Wang et al. 2013; Kabán, Bootkrajang, and Durrant 2013; Qian, Hu, and Yu 2016) can be used to search a high-dimensional space efficiently.
Research shows that in high-dimensional model-parameter optimization problems in machine learning, the parameters of most dimensions do not significantly influence the target problem; that is, these problems are characterized by a low effective dimension. The mathematical description for solving such problems with random embedding is as follows. A target function f: R^D → R has effective dimension d_e (d_e < D) if there exists a d_e-dimensional linear subspace Ψ such that, writing any x ∈ R^D as x = x_e + x_⊥ with x_e ∈ Ψ and x_⊥ ∈ Ψ^⊥,
f(x) = f(x_e + x_⊥) = f(x_e)
where Ψ^⊥ denotes the orthogonal complement of Ψ; Ψ is called the effective subspace of f and Ψ^⊥ the constant subspace of f. The equation states that the function f is sensitive to variation within the d_e-dimensional effective subspace Ψ, while its value hardly changes within the constant subspace Ψ^⊥. Under this premise, the random embedding method can solve high-dimensional optimization problems. Specifically, consider a function f: R^D → R with effective dimension d_e and a random embedding matrix M ∈ R^{D×d}, each element of which is sampled independently from the normal distribution N(0, 1), where D > d ≥ d_e. Then for any x ∈ R^D there must exist a solution y ∈ R^d such that
f(x) = f(My)
In particular, for any maximizer x* ∈ R^D there must exist y* ∈ R^d such that f(My*) = f(x*). Accordingly, the optimization problem on f(x) is restated as
optimize g(y) over y ∈ R^d, where g(y) = f(My)
We can therefore optimize the low-dimensional objective function g(y) and then use the matrix M to map solutions back into the high-dimensional search space.
To make the evolutionary algorithm feasible, the boundary of the low-dimensional search space must be further defined. After a point is mapped back to the high-dimensional space, any coordinate that falls beyond the box boundary is fixed on the boundary, i.e., clipped coordinate-wise:
x = clip(My, lower bound, upper bound)
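The low-effective-dimension setting and the boundary clipping above can be sketched as follows. This is an illustrative example, not the patented implementation: the 100-dimensional function, its 2-dimensional effective subspace, the box bounds, and the crude random search are all assumptions chosen for demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)

D, d = 100, 5            # original dimension and embedding dimension (d >= d_e)

def f(x):
    # Effective subspace is spanned by the first two axes (d_e = 2); the
    # remaining D-2 coordinates form the constant subspace and do not affect f.
    return (x[0] - 1.0) ** 2 + (x[1] + 2.0) ** 2

M = rng.standard_normal((D, d))   # random embedding, entries i.i.d. N(0, 1)

def g(y, lo=-5.0, hi=5.0):
    # Map the low-dimensional point back to R^D and clip coordinate-wise to
    # the box boundary, as the text above requires for feasibility.
    x = np.clip(M @ y, lo, hi)
    return f(x)

# Even a crude random search in the d-dimensional space reaches near-optimal
# values of the 100-dimensional problem, since d >= d_e.
ys = rng.uniform(-2, 2, size=(20000, d))
best = min(g(y) for y in ys)
```

The search touches only 5 coordinates per candidate instead of 100, which is the point of optimizing g(y) rather than f(x).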
However, because of the randomness of the random embedding method, the optimal solution is not always contained in the compressed space. According to the random embedding method and its assumptions, there is a non-zero probability γ that the optimal solution does not fall within the boundaries of the generated low-dimensional subspace, so it is difficult to determine whether a randomly generated low-dimensional space is valid. For this reason, a single random embedding is not ideal for the machine-learning model tuning problem.
Existing approaches overcome this problem by considering multiple random embeddings. Specifically, in the hyper-parameter optimization problem of a machine learning model, N different low-dimensional parameter search spaces are sampled independently and each is optimized, which reduces the failure probability γ(N), i.e., the probability that the optimal solution of the target problem is contained in none of the N low-dimensional spaces. However, because each search is run from scratch, the potential relations among the low-dimensional spaces are not exploited, and optimizing multiple independent spaces is very time-consuming; this drawback is especially significant when computational resources are limited and the runs cannot be highly parallelized. The biggest defect of the existing technical scheme is that, across its multiple runs, the correlations among the search results are not fully utilized, so sequential optimization consumes a great deal of time.
Disclosure of Invention
To solve these problems, the high-dimensional multi-mode evolution optimization method provided by the invention is based mainly on a random embedding technique: a joint evolutionary search is performed over multiple low-dimensional spaces, and gene migration is achieved with a crossover method, so the target task can simultaneously obtain beneficial information from multiple low-dimensional subtasks. The correlations among subtasks are thus fully utilized, overcoming the search deficiency caused by the randomness of random embedding, greatly improving the search efficiency for optimal parameters, and accelerating the parameter optimization process.
The purpose of the invention is realized by the following technical scheme:
a high-dimensional multi-mode evolution optimization method based on a random embedding technology is an improved optimization algorithm which is based on a multi-mode obtained by parallel operation of multiple times of optimization, namely, a self-coding technology is adopted among a plurality of subtasks to carry out transfer learning among the subtasks so as to improve the optimization efficiency, and comprises the following steps:
(1) The target value of the single-objective optimization problem to be solved is denoted f(x), and all parameters to be optimized are collected into a long vector x, each dimension of which corresponds to one parameter (i.e., optimizing x means finding the value of x that minimizes f(x)); the f(x) corresponding to each x can be computed according to the specific single-objective optimization problem.
(2) A classical genetic algorithm is used as the basic optimization algorithm, combined with the random embedding idea, to optimize the target value. The parameters are set as follows:
For the genetic algorithm: the number of subtasks N, the population size K of each subtask, the maximum number of generations or the maximum number of evaluations FEVs as the iteration stop condition, the crossover probability P_C, and the mutation probability P_M.
For the random embedding idea: the dimension of the high-dimensional optimization problem, i.e., the number D of parameters to be tuned; the dimension d of each subtask; the effective dimension d_e of the whole problem, set according to random embedding theory; and the generation interval G_w at which transfer learning occurs.
(3) Using the random embedding idea, initialize a low-dimensional population P of size K, P = {p_1, p_2, ..., p_K}, in which each individual randomly corresponds to a low-dimensional representation and each individual is uniformly randomly assigned to a subtask, thereby obtaining subpopulations
P = {P_1, P_2, ..., P_N}
where N is the number of low-dimensional tasks generated.
(4) Initialize random embedding matrices {M_1, M_2, ..., M_N}, where
M_i ∈ R^{D×d_multitask}, i = 1, ..., N
and d_multitask is the upper limit of the dimension of each low-dimensional subtask. The random embedding method then uses the matrices {M_1, M_2, ..., M_N} to solve the high-dimensional optimization problem f(x), x ∈ R^D. The random embedding idea replaces the original high-dimensional objective f(x) with the low-dimensional objective g(y) = f(My), where the DNA denotes the genetic material (chromosome), i.e., the individual to be optimized; the resulting low-dimensional DNA is substituted into the genetic algorithm for genetic evolution, so that the optimization is carried out by parallel search over multiple low-dimensional search spaces y ∈ R^{d_multitask} instead of in the original high-dimensional space.
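Steps (3) and (4) can be sketched as follows. This is a hedged illustration: the dimensions follow the embodiment given later in the description (D = 91, d = 30, N = 3, K = 30), while the value ranges, the random seed, and the helper `lift` are assumptions introduced for demonstration.

```python
import numpy as np

rng = np.random.default_rng(42)

D, d_multitask, N, K = 91, 30, 3, 30

# One random embedding matrix M_i in R^{D x d_multitask} per subtask.
Ms = [rng.standard_normal((D, d_multitask)) for _ in range(N)]

# Population P = {p_1, ..., p_K} of low-dimensional individuals, each
# uniformly randomly assigned to one of the N subtasks.
P = rng.uniform(-1, 1, size=(K, d_multitask))
assignment = rng.integers(0, N, size=K)
subpops = [P[assignment == i] for i in range(N)]

def lift(i, y):
    # Map subtask i's low-dimensional individual back to R^D via its M_i.
    return Ms[i] @ y
```

Each subtask thus sees the original 91-dimensional problem through its own 30-dimensional random view.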
(5) Search efficiency is improved by performing transfer learning between any two subtasks through an autoencoding technique. Because, in multi-mode optimization in a multi-task environment, different low-dimensional tasks have different perceptual views of the original high-dimensional task, the genetic material of one task cannot be used directly by the others; the invention therefore provides an autoencoding technique for generating a mapping matrix W between any two subtasks. Suppose the populations of the two subtasks are
P_i and P_j
whose Q individuals separately optimize the two low-dimensional tasks Θ_i and Θ_j. The mapping matrix W_ij between them is the least-mean-square solution of (W_ij P_i − P_j), as in
min_W ||W P_i − P_j||^2
For this simple optimization problem, the least-squares method gives
W_ij = P_j P_i^T (P_i P_i^T)^{-1}
where P_i^T denotes the transpose of P_i. W_ij provides the connection between the low-dimensional tasks Θ_i and Θ_j: a solution obtained by optimizing population P_i can be mapped into population P_j by multiplying by the matrix W_ij. If P_i and P_j have different dimensions, the shorter chromosome codes are first zero-padded and the W matrix is then computed. Obtaining a migration matrix between every pair of subtasks enables transfer learning among the populations.
(6) Generate an offspring population C1 by crossover and mutation within the population of each subtask.
(7) Apply the autoencoding matrix W to perform transfer learning among the subtasks, generating for the population of each subtask an offspring population C2 obtained by transfer learning.
(8) Merge the parent population P with the offspring populations C1 and C2, calculate the target value of each individual in the merged population, and, using an elitist strategy, select the K individuals with the best target values to update the population P.
(9) Repeat steps (5)-(8) until the iteration stop condition is reached, i.e., the set maximum number of evaluations, and finally obtain the parameter value corresponding to each individual in population P.
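One generation of steps (6)-(9) can be sketched compactly as follows. This is a hedged, minimal illustration: the sphere objective stands in for f(My), the stand-in migration matrix, the uniform-crossover operator, and all rates are assumptions, not the patented parameter values.

```python
import numpy as np

rng = np.random.default_rng(7)

K, d = 10, 5
P = rng.uniform(-1, 1, size=(K, d))          # parent population
W = rng.standard_normal((d, d)) * 0.1        # stand-in migration matrix

def target(pop):
    # Toy objective standing in for f(My): sphere function, minimized at 0.
    return np.sum(pop ** 2, axis=1)

def crossover_mutate(pop, pc=0.9, pm=0.1):
    # Uniform crossover between two shuffled parent halves, then Gaussian
    # mutation applied per gene with probability pm.
    parents = pop[rng.permutation(len(pop))]
    a, b = parents[: len(pop) // 2], parents[len(pop) // 2 :]
    child = np.where(rng.random(a.shape) < pc, a, b)
    child = child + (rng.random(child.shape) < pm) * rng.normal(0, 0.3, child.shape)
    return child

C1 = crossover_mutate(P)                     # step (6): crossover/mutation
C2 = P @ W.T                                  # step (7): migrated offspring
merged = np.vstack([P, C1, C2])              # step (8): merge and evaluate
elite = merged[np.argsort(target(merged))[:K]]
```

Because the parents P are included in the merged pool, the elitist selection can never lose the best individual found so far.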
Compared with the prior art, the invention has the following beneficial effects. The invention is an effective high-dimensional optimization method: it can quickly and effectively optimize and adjust a large number of parameters at the same time, and is therefore well suited to hyper-parameter tuning of large-scale machine learning models, such as neural network models and SVM models. For optimization problems of high dimension but low effective dimension, the method provides an evolutionary multi-task multi-mode optimization method based on random embedding as the dimension-reduction means. The method generates multiple low-dimensional random embeddings, each of which serves as a surrogate representation of the original (high-dimensional) task, and uses the uniqueness of and correlations among the subtasks to perform beneficial transfer learning between them, effectively improving the search efficiency for the optimal solution while avoiding the time cost of multiple runs.
Drawings
Fig. 1 is a block diagram of a chest disease diagnosis system.
Fig. 2 is a block diagram of the feature extraction network VGG 16.
FIG. 3 is a flowchart of an SVM multi-classifier auto-parameter adjustment.
Fig. 4 is a schematic diagram of a random embedding process.
Detailed description of the preferred embodiments
The following further describes a specific embodiment of the present invention with reference to the drawings and technical solutions.
The method can be applied to the hyper-parameter tuning process of many large-scale machine learning models, such as a large-scale neural network model or SVM model. This embodiment applies it to a medical diagnosis system for chest diseases, as shown in Fig. 1. The chest disease diagnosis system adopts a multi-class support vector machine (SVM) as its model; the high-dimensional multi-mode evolution optimization method tunes the hyper-parameters of the SVM model, and the optimized SVM model is trained to obtain the diagnosis system for chest diseases. The specific embodiments discussed are merely illustrative of implementations of the invention and do not limit its scope. The following describes embodiments of the invention for the chest disease diagnosis problem, comprising the following steps:
(1) breast medical image feature extraction module
The medical images of the chest in the data set are input into the existing pre-trained classical feature extraction neural network VGG16, and are used for extracting corresponding feature maps in the medical images and connecting the feature maps into a long feature vector. As shown in fig. 2: the structure of the pre-trained feature extraction network VGG16 is characterized in that a medical image is input into 2 3 x 3 convolutional layers (Cov) connected in series to perform primary feature extraction, then a pooling layer (Pool) is connected to compress an obtained feature map, the feature map is input into 3 series 3 x 3 convolutional layers (Cov) to perform final feature extraction after the feature map is repeated twice, finally the feature map is input into 3 series full-connection layers (Fc) to obtain a final long feature vector as an input vector of an SVM, and the feature extraction step of the medical image is completed.
(2) Basic setting of the multi-class support vector machine (SVM) model
The SVM is a classical classification model in machine learning. The classification task is, given a labeled training set
{(x_i, y_i)}, i = 1, ..., U
to train a classifier that can then be applied to unlabeled data. The training objective of the SVM is to minimize the loss function
min_{w,b} λ||w||^2 + (1/U) Σ_{i=1}^{U} max(0, 1 − y_i(w^T x_i + b))
so as to obtain an SVM model with good classification performance, where w is the parameter matrix of the SVM model, λ is a constant coefficient, U is the number of samples, y_i is the classification label of the i-th sample, x_i is the data of the i-th sample, and b is the bias term of the SVM model.
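The regularized hinge-loss objective above can be evaluated directly for a fixed (w, b). A minimal sketch, with illustrative data and an assumed λ, not tied to the patented system:

```python
import numpy as np

def svm_loss(w, b, X, y, lam):
    # lam*||w||^2 + (1/U) * sum_i max(0, 1 - y_i*(w . x_i + b))
    margins = y * (X @ w + b)                 # y_i * (w . x_i + b)
    hinge = np.maximum(0.0, 1.0 - margins)    # per-sample hinge loss
    return lam * np.dot(w, w) + hinge.mean()

X = np.array([[2.0, 0.0], [-2.0, 0.0]])      # two linearly separable samples
y = np.array([1.0, -1.0])                    # labels in {+1, -1}
w, b = np.array([1.0, 0.0]), 0.0

loss = svm_loss(w, b, X, y, lam=0.01)
```

For this separable toy, both margins equal 2, so the hinge term vanishes and only the regularization term λ||w||^2 remains.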
For the chest disease diagnosis problem, as shown in Table 1, chest medical images must be analyzed to determine whether the patient has any of the following 14 diseases: atelectasis, consolidation, infiltration, pneumothorax, edema, emphysema, fibrosis, effusion, pneumonia, pleural thickening, cardiac hypertrophy, nodules, mass, hernia, or a combination thereof. When the SVM model is used to solve this classification problem, the hyper-parameters of the model greatly influence the balance between the model's two objectives of training-set error and margin maximization. For multi-class tasks a common approach is the one-vs-one (OvO) strategy, which converts the multi-class problem into multiple binary problems: in an H-class problem, OvO trains
H(H − 1)/2
binary classifiers. Since 14 classes are trained in the chest disease diagnosis system, the SVM model used has
14 × 13 / 2 = 91
hyper-parameters that need to be adjusted. These 91 hyper-parameters to be optimized are collected into a long vector x, where the value of each dimension represents the value of one hyper-parameter and the value ranges are the commonly used SVM hyper-parameter ranges. Each vector x, i.e., a group of hyper-parameters within the fixed ranges, is substituted into the multi-class SVM model for training until convergence; the accuracy of the trained classification model on the test data set then represents the training effect of that group of hyper-parameters, i.e., the optimization target value, so a model with higher accuracy corresponds to a better-optimized group of hyper-parameters.
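The one-vs-one bookkeeping above reduces to a one-line count. A small sketch (the assumption of one tunable hyper-parameter per binary classifier follows the 14-class/91-parameter arithmetic in the text):

```python
def num_ovo_classifiers(H):
    # One-vs-one trains one binary classifier per unordered pair of classes.
    return H * (H - 1) // 2

# 14 disease classes -> 91 binary classifiers, hence the 91-dimensional
# hyper-parameter vector x searched by the optimizer.
dim_x = num_ovo_classifiers(14)
```

This is why the problem dimension D below is set to 91.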
(3) Automatic parameter tuning and optimization of the SVM model (as shown in Fig. 3)
(3.1) Basic setting of the genetic algorithm
The random embedding based high-dimensional multi-mode evolution optimization method is applied to the hyper-parameter tuning problem of the multi-class SVM model, and the optimized group of hyper-parameters is substituted into the multi-class SVM model. All hyper-parameters to be optimized in the SVM model are collected into a long vector x, and the accuracy on the test set of the SVM model corresponding to each x, after training to convergence, is used as the optimization target value f(x). The genetic algorithm (GA) is used as the basic evolutionary algorithm, performing heuristic optimization over the different populations. The number of low-dimensional subtasks is set to N = 3; the population size of the whole problem to K = 30; the maximum number of generations or maximum number of evaluations under the iteration stop condition to FEVs = 300; and the crossover probability P_C, mutation probability P_M, and so on are set accordingly. For the random embedding idea the corresponding parameter values are: the dimension of the high-dimensional optimization problem, i.e., the number of hyper-parameters to be tuned, D = 91; the dimension of each subtask, set to 1/3 of the original problem dimension, d = 30; the effective dimension of the whole problem, set according to random embedding theory to d_e = 10; and the generation interval at which migration occurs, set to G_w = 1.
(3.2) Module for generating multiple low-dimensional subtasks
According to the random embedding idea, initialize a series of normally distributed random embedding matrices {M_1, M_2, ..., M_N}, where
M_i ∈ R^{D×d_multitask}, i = 1, ..., N
D is the original high dimension, and d_multitask is the maximum of the upper limits of the effective dimensions of the N subtasks. Through the matrix mapping g(y) = f(My) = f(x), the high-dimensional optimization problem f(x), x ∈ R^D is converted into a series of low-dimensional sub-problems g(y), y ∈ R^{d_multitask}.
(3.3) Module for generating multiple sub-populations
Initialize a low-dimensional population P of size K, P = {p_1, p_2, ..., p_K}, in which each individual randomly corresponds to a low-dimensional representation and each individual is uniformly randomly assigned to a subtask, thereby obtaining subpopulations
P = {P_1, P_2, ..., P_N}
where N is the number of low-dimensional subtasks generated.
(3.4) Population evolution optimization module
Repeat the following operations until the iteration stop condition is reached, i.e., the set maximum number of evaluations. Compute the migration matrix W through the autoencoding technique; then, in each generation: first, evaluate each individual in the population, computing its target value f(x) (i.e., the training model module of step (4)); then generate an offspring population C1 by crossover and mutation within the population of each subtask; next, apply the autoencoded matrix W to perform transfer learning among the subtasks, generating for the population of each subtask an offspring population C2 obtained by transfer learning; compute the target value of each individual in the offspring populations C1 and C2 (the training model module); finally, merge the parent population P with the offspring populations C1 and C2 and, using an elitist strategy, select the top K individuals by target value to update population P. The hyper-parameter values corresponding to the individuals in the final population P are substituted into the SVM model; this is the finally obtained higher-precision SVM model and can be applied to medical diagnosis of chest diseases.
(4) Training model module
The individual whose target value is to be calculated, i.e., a group of fixed hyper-parameter values x, is substituted into the SVM model to be trained; the feature vectors extracted from the training data set by the feature extraction module are input into the SVM model for training until convergence. The test data set is then input into the trained SVM model for testing, and the accuracy of the model on the test data set is computed as the optimization target value f(x) of the individual x.

Claims (1)

1. A high-dimensional multi-mode evolution optimization method based on a random embedding technology is characterized by comprising the following steps:
(1) the target value of the single-objective optimization problem to be solved is denoted f(x), and all parameters to be optimized are represented by a long vector x, each dimension of x corresponding to one parameter to be optimized;
(2) a genetic algorithm is adopted as the basic optimization algorithm, combined with the random embedding idea, to optimize the target value; the parameters are set as follows:
setting the corresponding parameter values for the genetic algorithm: the number of subtasks N, the population size K of each subtask, the iteration stop condition as a maximum number of generations or a maximum number of evaluations FEVs, the crossover probability P_C, and the mutation probability P_M;
setting the corresponding parameter values for the random embedding idea: the dimensionality of the high-dimensional optimization problem, i.e. the number of parameters to be tuned D; the dimensionality d of each subtask; the effective dimensionality d_e of the whole problem, set according to random embedding theory; and the generation interval G_w at which transfer learning occurs;
(3) initializing a population P of size K in the low-dimensional space using the random embedding idea, P = {p_1, p_2, …, p_K}, wherein each individual randomly corresponds to a low-dimensional representation and each individual is uniformly randomly assigned to a subtask, thereby obtaining
[Equation image: the population P partitioned into N subtask subpopulations]
wherein N is the number of generated low-dimensional tasks;
(4) initializing random embedding matrices {M_1, M_2, …, M_N}, wherein each M_k is a random matrix in R^(D×d), d being no larger than the upper limit d_multitask of each low-dimensional subtask dimension; using the random embedding matrices {M_1, M_2, …, M_N}, the high-dimensional optimization problem f(x), x ∈ R^D, is converted into low-dimensional subproblems g(y) = f(My) = f(x), y ∈ R^d;
(5) a mapping matrix W is generated between any two subtasks through an autoencoding technique, and transfer learning is performed to improve search efficiency: let the populations of the two low-dimensional tasks Θ_i and Θ_j be P_i and P_j respectively, each consisting of the Q individuals optimizing the corresponding task, arranged as columns; the mapping matrix W_ij between them is obtained by minimizing the error (W_ij P_i − P_j) in the least-squares sense, i.e.
min over W_ij of ||W_ij P_i − P_j||^2
which the least-squares method solves in closed form as
W_ij = P_j P_i^T (P_i P_i^T)^(−1)
wherein P_i^T denotes the transpose of P_i; W_ij provides the connection between the low-dimensional tasks Θ_i and Θ_j; thus, a solution obtained by optimizing population P_i is mapped into population P_j by multiplication with the matrix W_ij; if the dimensions of P_i and P_j differ, the shorter chromosome codes are first zero-padded and the W matrix is then computed; a transfer matrix is thus obtained between every two subtasks, i.e., transfer learning is performed on the populations;
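The least-squares mapping of step (5) can be sketched with numpy's least-squares routine; the subtask dimension d, population size Q, and the synthetic populations are assumptions for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)
d, Q = 3, 8                         # subtask dimension and population size (assumed)
P_i = rng.standard_normal((d, Q))   # columns are the Q individuals of task i
W_true = rng.standard_normal((d, d))
P_j = W_true @ P_i                  # here task j's population is exactly W_true * P_i

# min_W ||W P_i - P_j||^2: lstsq solves P_i^T W^T ~ P_j^T,
# which is the transposed form of W P_i ~ P_j
W_ij = np.linalg.lstsq(P_i.T, P_j.T, rcond=None)[0].T
```

Because Q > d here, P_i has full row rank (almost surely), so the recovered W_ij reproduces the closed-form solution W_ij = P_j P_i^T (P_i P_i^T)^(−1) exactly.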
(6) generating an offspring population C1 by crossover and mutation within the population of each subtask;
(7) applying the matrix W obtained by autoencoding to perform transfer learning among the subtasks, generating for the population of each subtask an offspring population C2 obtained by transfer learning;
(8) merging the parent population P with the offspring populations C1 and C2, calculating the objective value of each individual in the merged population, and, according to the objective values, selecting the K individuals with the best objective values via an elitist strategy to update the population P;
(9) repeating steps (5) to (8) until the iteration stop condition, i.e. the set maximum number of evaluations, is reached, finally obtaining the parameter value corresponding to each individual in the population P.
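The genetic operators of step (6) can be sketched as follows. The specific operators (pairwise arithmetic crossover, per-gene Gaussian mutation) and the default rates are assumptions; the claim fixes only the probabilities P_C and P_M, not the operator form.

```python
import numpy as np

def crossover_mutate(P, p_c=0.9, p_m=0.1, sigma=0.1,
                     rng=np.random.default_rng(1)):
    """P: (K, d) parent population; returns a same-shaped offspring set C1."""
    K, d = P.shape
    C = P.copy()
    # pairwise arithmetic crossover with probability p_c
    for k in range(0, K - 1, 2):
        if rng.random() < p_c:
            a = rng.random()
            C[k], C[k + 1] = (a * P[k] + (1 - a) * P[k + 1],
                              a * P[k + 1] + (1 - a) * P[k])
    # per-gene Gaussian mutation with probability p_m
    mask = rng.random(C.shape) < p_m
    C[mask] += sigma * rng.standard_normal(mask.sum())
    return C
```

Arithmetic crossover keeps offspring inside the convex hull of each parent pair, while the Gaussian mutation provides the local perturbation needed to escape it.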
CN202010341693.6A 2020-04-26 2020-04-26 High-dimensional multi-mode evolution optimization method based on random embedding technology Withdrawn CN111553475A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010341693.6A CN111553475A (en) 2020-04-26 2020-04-26 High-dimensional multi-mode evolution optimization method based on random embedding technology

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010341693.6A CN111553475A (en) 2020-04-26 2020-04-26 High-dimensional multi-mode evolution optimization method based on random embedding technology

Publications (1)

Publication Number Publication Date
CN111553475A true CN111553475A (en) 2020-08-18

Family

ID=72001194

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010341693.6A Withdrawn CN111553475A (en) 2020-04-26 2020-04-26 High-dimensional multi-mode evolution optimization method based on random embedding technology

Country Status (1)

Country Link
CN (1) CN111553475A (en)


Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112508248A (en) * 2020-11-26 2021-03-16 山东大学 Integrated energy system day-ahead optimization method and system based on heterogeneous transfer learning
CN113589810A (en) * 2021-07-26 2021-11-02 南方科技大学 Intelligent dynamic autonomous obstacle avoidance movement method and device, server and storage medium
CN113589810B (en) * 2021-07-26 2024-04-30 南方科技大学 Dynamic autonomous obstacle avoidance movement method and device for intelligent body, server and storage medium

Similar Documents

Publication Publication Date Title
Cai et al. Efficient architecture search by network transformation
Rani Parallel approach for diagnosis of breast cancer using neural network technique
Zhang et al. A novel fuzzy hybrid quantum artificial immune clustering algorithm based on cloud model
Muangkote et al. Rr-cr-IJADE: An efficient differential evolution algorithm for multilevel image thresholding
Zemmal et al. A new hybrid system combining active learning and particle swarm optimisation for medical data classification
CN108764280A (en) A kind of medical data processing method and system based on symptom vector
CN111553475A (en) High-dimensional multi-mode evolution optimization method based on random embedding technology
Chen et al. Binarized neural architecture search for efficient object recognition
Zheng et al. Dynamic distribution pruning for efficient network architecture search
CN113449802A (en) Graph classification method and device based on multi-granularity mutual information maximization
Liu et al. EACP: An effective automatic channel pruning for neural networks
Khan et al. Convolutional neural network based SARS-CoV-2 patients detection model using CT images
CN115393632A (en) Image classification method based on evolutionary multi-target neural network architecture structure
Weber et al. Automated labeling of electron microscopy images using deep learning
Zhan et al. Deep model compression via two-stage deep reinforcement learning
Khan et al. A multi-perspective revisit to the optimization methods of Neural Architecture Search and Hyper-parameter optimization for non-federated and federated learning environments
CN114093445A (en) Patient screening and marking method based on multi-label learning
CN108921853A (en) Image partition method based on super-pixel and clustering of immunity sparse spectrums
CN112668633A (en) Adaptive graph migration learning method based on fine granularity field
Jensen et al. Feature grouping-based fuzzy-rough feature selection
CN115249312A (en) Semi-supervised medical image classification method based on class decoupling distribution alignment
Hassan et al. A missing data imputation method based on salp swarm algorithm for diabetes disease
Soni et al. HMC: A hybrid reinforcement learning based model compression for healthcare applications
CN115063374A (en) Model training method, face image quality scoring method, electronic device and storage medium
Chandra et al. Bayesian multi-task learning for dynamic time series prediction

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20200818