CN113762370A

CN113762370A - Depth network set generation method combined with Gaussian random field

Info

Publication number: CN113762370A
Application number: CN202111001978.6A
Authority: CN
Inventors: 代子风; 张宸; 梁晓龙; 赵海童; 张长胜; 张斌
Original assignee: Northeastern University China
Current assignee: Northeastern University China
Priority date: 2021-08-30
Filing date: 2021-08-30
Publication date: 2021-12-07

Abstract

The invention discloses a deep network set generation method combined with a Gaussian random field, which uses a pre-screening strategy based on a Gaussian random field model to accelerate the generation of a set of deep neural networks. The method comprises the following steps. Step one: training a Gaussian random field model by using neural networks whose fitness has already been evaluated. Step two: initializing a neural network set to generate a network set with higher accuracy. Step three: optimizing the neural network set in combination with the Gaussian random field model. Step four: compressing the network set. The method predicts the fitness of each neural network through the Gaussian random field model and, combined with the pre-screening strategy, reduces the number of fitness evaluations, so that the neural network set is obtained more quickly and model performance is improved.

Description

Deep network set generation method combined with Gaussian random field

Technical Field

The invention belongs to the field of machine learning algorithms, and relates to a deep network set generation method combined with a Gaussian random field.

Background

With the development of modern science and technology and the growth of productivity, computer performance has improved greatly, and acquiring massive data is no longer difficult. Deep learning, now virtually synonymous with artificial neural networks, has become the most active research direction in the field of machine learning, and various algorithms for improving the training speed of neural networks have been proposed in succession. On this basis, neural networks have developed rapidly, and more and more deep network models have been proposed and widely applied in fields such as face recognition, image classification and natural language processing.

The neural network model simulates the working principle of human brain neurons: it performs a series of calculations on the input information and finally extracts useful information as the output result. A classical fully connected neural network can be roughly divided into an input layer, hidden layers and an output layer. Each layer of the fully connected network comprises a plurality of nodes; each node is connected with all nodes of the adjacent layers, and nodes of the same layer are not connected with each other. Each layer processes the information from the previous layer and then passes it to the next layer, up to the output layer. The information contained in the output layer nodes is the information extracted from the original input by the neural network, and can be used for the corresponding task. However, calculating the accuracy of a neural network model usually requires the network to be constructed and fully computed, which takes a large amount of time, so the training speed of the neural network model is difficult to improve.

If a Gaussian random field model is used to predict accuracy, the performance of the model can be improved and training time reduced. The basic principle of the Gaussian random field model is that, before the real fitness function is evaluated, a Gaussian random field model is built from the solutions already evaluated and is then used to predict the function value of an unknown solution. By setting a pre-screening rule, only those solutions with greater room for improvement are retained. This reduces the number of real fitness function evaluations that the evolutionary algorithm performs during optimization and improves the overall training speed of the model.

As the above background shows, obtaining a set of neural network models with high accuracy and large diversity requires a time-consuming computation, and the resulting set of neural network models is too large to store conveniently.

Disclosure of Invention

In order to reduce the execution time of the differential evolution algorithm, the invention provides a pre-screening strategy combined with a Gaussian random field model, which reduces the number of unnecessary fitness function evaluations and thereby improves the overall training speed of the model.

The technical scheme of the invention is as follows:

A deep network set generation method combined with a Gaussian random field. First, a Gaussian random field model is trained on neural network individuals whose exact fitness has already been calculated, and is then used to predict the fitness of individuals. Next, a network set is initialized with a differential evolution algorithm to obtain a group of networks with higher accuracy. Then, starting from the generated network set, two objective functions are constructed from the two indexes of difference and accuracy, so that the task is converted into a multi-objective problem and solved with a multi-objective differential evolution algorithm. In the selection process of the evolutionary algorithm, the value of the fitness function is predicted by the established Gaussian random field model, and a pre-screening strategy decides whether the exact fitness is calculated, which improves the overall operating efficiency of the model. Finally, a clustering algorithm reduces the scale of the final network set, which facilitates its storage.

The method comprises the following steps:

step one, using the neural network individuals whose fitness function has been accurately calculated, constructing a Gaussian random field prediction model for each of the two indexes of accuracy and difference;

step two, performing the population initialization operation with a single-objective differential evolution algorithm to obtain a neural network set with higher accuracy, that is, initializing the neural network set; in this process, predicting the accuracy of the mutant individuals by using the Gaussian random field prediction model established in step one;

step three, further optimizing the neural network set obtained by the initialization in step two;

calculating the fitness function of each individual in the initial population, and performing Gaussian mutation on the individuals; predicting the fitness function of each new individual by using the Gaussian random field prediction model, and making a decision in combination with the pre-screening strategy; updating the reference point and the neighborhood sub-problems of the individual; judging whether to update the external storage set; outputting the external storage set;

step four, reducing the scale of the network set with a clustering algorithm;

acquiring the external storage set output in step three; adopting a clustering algorithm to narrow the selection range of the networks, clustering according to the accuracy and difference of each network in the set; and selecting a central network from each cluster, constructing a new deep neural network set and outputting it as the final result.

Further, the deep network set generation method combined with the Gaussian random field specifically comprises the following steps:

Step one: building a prediction model combined with the Gaussian random field.

A Gaussian random field prediction model is constructed for each of the two indexes of accuracy and difference by using the neural network individuals whose exact fitness function has been calculated. A kernel function for Gaussian regression prediction is first defined, which conveys the covariance between input individuals. Gaussian process regression is then fitted to the neural network individuals and their fitness values to obtain a regression model. Finally, the hyper-parameters of the kernel function are optimized by maximizing the log-marginal likelihood (LML) based on the passed optimizer, which yields the final Gaussian random field prediction model.

Further, the first step specifically comprises:

step 1.1, acquiring a set of neural network individuals whose fitness function has been accurately calculated, establishing the accuracy set and the difference set of the individuals, and ensuring that the sets have the same length;

step 1.2, defining a kernel function for Gaussian regression prediction, which conveys the covariance between input individuals; the kernel function is WhiteKernel, which is used for estimating the noise of the objective function and reducing its influence;

step 1.3, fitting Gaussian process regression to the neural network individuals and their exact fitness function values to obtain a regression model;

step 1.4, optimizing the hyper-parameters of the kernel function by maximizing the log-marginal likelihood (LML) based on the passed optimizer; because the LML may have a plurality of local optima, the optimization is repeated several times by specifying the parameter n_restarts_optimizer; the first optimization run starts from the initial hyper-parameter values of the kernel; subsequent runs start from hyper-parameter values chosen randomly from the allowed range;

step 1.5, obtaining the prediction model combined with the Gaussian random field, and establishing a reference network set S_eval of the Gaussian random field model, which is used for subsequently predicting the fitness of individuals and is updated in real time during prediction.
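As a minimal sketch of steps 1.1 to 1.5, assuming the scikit-learn library (which provides a `WhiteKernel` class and an `n_restarts_optimizer` parameter matching the names used above), a fitness predictor could be fitted as follows. The RBF term, the toy data and the names `X_eval`, `acc_eval` and `predict_fitness` are illustrative assumptions and are not taken from the patent; a second model of the same form would be fitted for the difference index.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(0)

# Hypothetical stand-in for S_eval: 30 encoded networks (4 hyper-parameters each)
# with an already-evaluated accuracy. Real data would come from trained networks.
X_eval = rng.uniform(0.0, 1.0, size=(30, 4))
acc_eval = 0.9 + 0.05 * np.sin(X_eval.sum(axis=1))

# WhiteKernel models observation noise. The RBF term is an assumed choice for the
# covariance between individuals (a WhiteKernel alone cannot relate different inputs).
kernel = 1.0 * RBF(length_scale=1.0) + WhiteKernel(noise_level=1e-3)

# n_restarts_optimizer repeats the log-marginal-likelihood maximisation: the first
# run starts from the kernel's initial hyper-parameters, later runs from random ones.
gpr = GaussianProcessRegressor(kernel=kernel, n_restarts_optimizer=5,
                               normalize_y=True, random_state=0)
gpr.fit(X_eval, acc_eval)

def predict_fitness(x):
    """Return (predicted mean, predicted standard deviation) for one individual."""
    mean, std = gpr.predict(np.asarray(x).reshape(1, -1), return_std=True)
    return float(mean[0]), float(std[0])
```

Newly evaluated individuals would be appended to `X_eval` and `acc_eval` and the model refitted, which corresponds to the real-time update of S_eval described in step 1.5.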

Still further, the step 1.3 further comprises:

step 1.3.1: taking a neural network individual x in the neural network individual set obtained in step 1.1 as the independent variable and the corresponding fitness function value as the dependent variable y, and constructing an objective function y = g(x);

step 1.3.2: according to Gaussian process regression prediction, assuming that the objective function y = g(x) obeys a normal distribution with mean μ and variance δ²;

step 1.3.3: constructing a maximum likelihood estimation function for fitting a normal distribution curve; the maximum likelihood function PDF is shown in equation (1.1):

wherein exp denotes the exponential function with the natural constant e as its base, and det denotes the determinant of the corresponding matrix; the matrix C is a K × K matrix whose entry at position (i, j) is C_{i,j} = c(x^i, x^j), where c(x^i, x^j) is the value of the correlation function; for arbitrary arguments x, x' ∈ R^n, where R^n denotes the n-dimensional real space, the correlation function c(x, x') = exp[-d(x, x')] characterizes the correlation between the objective function values g(x) and g(x') corresponding to x and x'; θ_i and p_i are the hyper-parameters that control the distance function d(x, x') and are independent of the arguments x, x', so that the value of the correlation function depends only on the magnitude of (x - x'): the larger (x - x') is, the smaller the correlation, and vice versa; the vector y = (y¹, y², ..., y^K), and 1 is a column vector of dimension K;
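The formula images for equation (1.1) and for the distance function did not survive in this text. A reconstruction consistent with the symbols defined above (exp, det, C, y, 1, θ_i, p_i), assuming the standard Gaussian random field (Kriging) likelihood, would be:

$$\mathrm{PDF} = \frac{1}{(2\pi\delta^{2})^{K/2}\sqrt{\det(C)}}\exp\left[-\frac{(y-\mu\mathbf{1})^{T}C^{-1}(y-\mu\mathbf{1})}{2\delta^{2}}\right] \tag{1.1}$$

$$d(x, x') = \sum_{i=1}^{n}\theta_i\,\lvert x_i - x'_i\rvert^{p_i}$$

In the usual formulation the hyper-parameters satisfy θ_i > 0 and 1 ≤ p_i ≤ 2; these ranges are an assumption here, since the text does not state them.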

step 1.3.4: according to the properties of the normal distribution, maximizing the likelihood function to obtain estimates of the mean μ and of the variance δ²; the unbiased estimate of the objective function g(x) and the variance of that estimate are then expressed in terms of the estimated mean, the estimated variance and the vector r = (c(x, x¹), c(x, x²), ..., c(x, x^K))^T; the objective function g(x) is considered to obey a normal distribution whose mean is the unbiased estimate and whose variance is the estimated variance at x;
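The formula images of step 1.3.4 are missing from this text. Assuming the standard Gaussian random field (Kriging) predictor, and using hatted symbols (an assumed notation) for the estimates, the quantities described would read:

$$\hat{\mu} = \frac{\mathbf{1}^{T}C^{-1}y}{\mathbf{1}^{T}C^{-1}\mathbf{1}}, \qquad \hat{\delta}^{2} = \frac{(y-\hat{\mu}\mathbf{1})^{T}C^{-1}(y-\hat{\mu}\mathbf{1})}{K}$$

$$\hat{g}(x) = \hat{\mu} + r^{T}C^{-1}(y-\hat{\mu}\mathbf{1})$$

$$\hat{s}^{2}(x) = \hat{\delta}^{2}\left[1 - r^{T}C^{-1}r + \frac{(1-\mathbf{1}^{T}C^{-1}r)^{2}}{\mathbf{1}^{T}C^{-1}\mathbf{1}}\right]$$

$$g(x) \sim N\left(\hat{g}(x),\ \hat{s}^{2}(x)\right)$$

Here r = (c(x, x¹), ..., c(x, x^K))^T as defined in the text, so a point far from all evaluated individuals has small correlations, a prediction close to the estimated mean and a large predictive variance.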

Step 1.3.5: fitting the Gaussian random field prediction model to the neural network individuals used in step 1.3.1 and their fitness values to obtain the final regression model.

Step two: initializing the neural network set.

First, hyper-parameters of the deep convolutional network, such as the size and number of convolution kernels, the filter size and stride of the pooling layers, and the number of nodes in the fully connected layers, are encoded as individuals of an evolutionary algorithm. A single-objective evolutionary algorithm, with accuracy as the only objective function, then generates a series of neural networks with sufficiently high accuracy. When the accuracy of the individuals in the set is higher than the control threshold, namely the minimum output accuracy r₁ of the single-objective algorithm, the computation ends and the last generation of individuals is output as the initial population of the subsequent multi-objective algorithm. During this process, the accuracy of each mutant individual is predicted by the Gaussian random field prediction model established in step one, and a pre-screening strategy decides whether the true accuracy is calculated.

Further, the second step specifically comprises:

step 2.1, generating a plurality of individuals uniformly at random in a decision space Ω, wherein every individual consists of real-number codes corresponding to hyper-parameters of the neural network structure, as shown in formula (2.1):

$$x_i(0) = (x_{i,1}(0),\ x_{i,2}(0),\ \ldots,\ x_{i,d}(0)), \quad i = 1, 2, 3, \ldots, M, \quad d = 1, 2, 3, \ldots, V \tag{2.1}$$

wherein M represents the number of individuals in the target generation, V represents the maximum dimension of the decision space Ω, and the j-th dimension of the i-th individual is initialized as shown in formula (2.2):

$$x_{i,j}(0) = L_{j\_min} + \mathrm{rand}(0, 1)\cdot(L_{j\_max} - L_{j\_min}), \quad i = 1, 2, 3, \ldots, M, \quad j = 1, 2, 3, \ldots, d \tag{2.2}$$

wherein L_{j_min} and L_{j_max} respectively represent the lower and upper bounds of the value of the j-th dimension of the parameter vector, and rand(0, 1) generates a random number between 0 and 1;
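The uniform initialization described in step 2.1 can be sketched as follows. The function name `init_population` and the three example bounds are hypothetical and only illustrate sampling each dimension between its lower and upper bound.

```python
import random

def init_population(bounds, m, seed=None):
    """Sample m individuals; bounds holds one (L_min, L_max) pair per dimension."""
    rng = random.Random(seed)
    return [[lo + rng.random() * (hi - lo) for lo, hi in bounds]
            for _ in range(m)]

# Hypothetical 3-dimensional decision space: kernel size, kernel count, dense nodes.
bounds = [(1.0, 7.0), (8.0, 64.0), (32.0, 512.0)]
population = init_population(bounds, m=100, seed=0)
```

Real-coded values would be rounded or decoded into valid network hyper-parameters before a network is built; that decoding step is not specified in the text and is left out here.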

step 2.2, performing the mutation operation of the population initialization algorithm, namely randomly selecting two different individuals from the population, scaling their vector difference, and then combining it by vector addition with the individual to be mutated, as shown in formula (2.3):

$$x'_i(g) = x_{r_1}(g) + F\cdot(x_{r_2}(g) - x_{r_3}(g)) \tag{2.3}$$

wherein the scaling factor F ∈ [0, 2], r₁ ≠ r₂ ≠ r₃ ≠ i, x_{r_i}(g) represents an individual before mutation, and x'_i(g) represents the new individual generated by mutation; boundary value control is applied to the newly generated individuals during mutation;
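A minimal sketch of the mutation in formula (2.3) is given below. The helper name `de_mutate` is hypothetical, and the boundary value control is assumed to be simple clipping, since the text does not fix the exact operation.

```python
import random

def de_mutate(population, i, f, bounds, rng):
    """Mutation of formula (2.3) for individual i, with r1, r2, r3 distinct and != i."""
    candidates = [k for k in range(len(population)) if k != i]
    r1, r2, r3 = rng.sample(candidates, 3)
    mutant = [a + f * (b - c)
              for a, b, c in zip(population[r1], population[r2], population[r3])]
    # Boundary control: assumed here to be clipping each value into [L_min, L_max].
    return [min(max(v, lo), hi) for v, (lo, hi) in zip(mutant, bounds)]
```

The population must contain at least four individuals so that three distinct indices different from i can be drawn.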

step 2.3, performing the crossover operation of population initialization, wherein the value of each dimension of the crossed individual is randomly taken either from the corresponding dimension of the mutant individual or from the corresponding original individual, so as to obtain the crossed individual, the specific generation method being shown in formula (2.4):

wherein the crossover probability cr ∈ [0, 1], and x″_{i,j} represents the value of the j-th dimension of the crossed individual x″_i.
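Formula (2.4) is not reproduced in this text. A sketch of the crossover it describes, assuming the standard binomial crossover of differential evolution, is shown below. The forced dimension `j_rand`, which guarantees that at least one value comes from the mutant, is part of that standard scheme and is an assumption here, as is the function name.

```python
import random

def de_crossover(parent, mutant, cr, rng):
    """Take each dimension from the mutant with probability cr, else from the parent."""
    j_rand = rng.randrange(len(parent))  # assumed: one dimension always from the mutant
    return [m if (rng.random() <= cr or j == j_rand) else p
            for j, (p, m) in enumerate(zip(parent, mutant))]
```

With cr close to 1 the crossed individual is almost identical to the mutant; with cr close to 0 it differs from the original individual in roughly one dimension.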

Step 2.4, performing the selection operation of population initialization: the Gaussian random field prediction model established in step one is called, and the mutant individuals are pre-screened by a pre-screening rule; the pre-screening rule is based on the probability of improvement (PoI) defined in formula (2.5):

wherein x represents an individual solution vector; the formula uses the unbiased estimate of the objective function at x, the minimum value f_min of the fitness function and the unbiased estimate of the variance at x; and PoI(x) represents the probability of improvement for the individual solution vector x;
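Formula (2.5) is not reproduced in this text. Assuming the usual definition of the probability of improvement for a minimised fitness function, PoI(x) = Φ((f_min - ĝ(x)) / ŝ(x)) with Φ the standard normal distribution function, it can be computed as in the sketch below. The names `y_hat` and `s_hat` for the predicted mean and standard deviation are hypothetical.

```python
import math

def probability_of_improvement(y_hat, s_hat, f_min):
    """PoI = Phi((f_min - y_hat) / s_hat), assuming the fitness is minimised."""
    if s_hat <= 0.0:
        # No predictive uncertainty: improvement is either certain or impossible.
        return 1.0 if y_hat < f_min else 0.0
    z = (f_min - y_hat) / s_hat
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
```

A prediction equal to the current best value gives a PoI of exactly 0.5, which is why a threshold of 0.5 separates individuals predicted to be better than the current best from those predicted to be worse.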

according to the PoI values of all mutant individuals, M/2 individuals are selected by roulette-wheel selection for true fitness function evaluation; their fitness function values are compared with those of the corresponding parent individuals, the better individual is selected according to the greedy rule, and a new generation population is constructed, the specific selection rule being shown in formula (2.6):

wherein f(x) represents the fitness function value of the target individual;

when an individual in the population is updated, the new individual is added to the reference network set S_eval of the Gaussian random field model; if no change occurs, no update is performed.

Step 2.5: judging the output accuracy of the population: when the accuracy of the networks corresponding to all individuals in the population is greater than the minimum output accuracy of the single-objective algorithm, r₁ = 0.9, the algorithm terminates and outputs the final generation population; otherwise, it returns to step 2.2.
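The selection operation of step 2.4 can be sketched as follows. Drawing the M/2 individuals without replacement and treating a smaller fitness value as better are assumptions, since formula (2.6) is not reproduced in this text; both function names are hypothetical.

```python
import random

def roulette_pick(poi_values, k, rng):
    """Pick k distinct indices, each draw proportional to the remaining PoI values."""
    remaining = list(range(len(poi_values)))
    chosen = []
    for _ in range(min(k, len(remaining))):
        weights = [poi_values[i] + 1e-12 for i in remaining]  # avoid all-zero weights
        pick = rng.choices(remaining, weights=weights, k=1)[0]
        chosen.append(pick)
        remaining.remove(pick)
    return chosen

def greedy_select(parent, parent_fit, child, child_fit):
    """Keep whichever individual has the smaller (assumed better) fitness value."""
    return (child, child_fit) if child_fit <= parent_fit else (parent, parent_fit)
```

Only the individuals returned by `roulette_pick` would have their true fitness evaluated, which is where the saving in network training time comes from.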

Step three: optimizing the network set with the multi-objective differential evolution algorithm. First, the last generation of individuals from step two is taken as the initial population of the multi-objective differential evolution algorithm. Then, the fitness function of each individual in the initial population is calculated and the individual is placed in an external storage set. The adjacent sub-problems of each individual are calculated and Gaussian mutation is performed on the individual. The fitness function of each new individual is predicted by the Gaussian random field prediction model constructed in step one, a decision is made in combination with the pre-screening strategy, and the reference point and neighborhood sub-problems of the individual are updated. Finally, the accuracy is compared with the control threshold r₂, the minimum accuracy for joining the external storage set in the multi-objective algorithm, to judge whether to update the external storage set. When the accuracy of every individual in the external storage set is higher than the control threshold r₃, the minimum output accuracy of the multi-objective algorithm, or when the evolution generation exceeds the control threshold on the number of generations, the loop ends and the external storage set is output.

Further, the third step specifically comprises:

step 3.1: taking the neural network individuals of the last generation population of the single-objective algorithm obtained in step two as the initial population of the multi-objective algorithm;

step 3.2: calculating the fitness function of each individual of the population, and storing the individuals in the external storage set outEP and in the reference network set S_eval of the Gaussian random field model;

step 3.3: generating the adjacent sub-problems: the multi-objective differential evolution algorithm generates a set of evenly distributed weight vectors {λ¹, ..., λ^M} for all sub-problems, wherein each component of the weight vector corresponding to the i-th sub-problem represents a single weight value; the T sub-problems closest to each sub-problem, namely its neighborhood, are obtained by calculating the Euclidean distances between the weight vectors corresponding to the sub-problems;
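A minimal sketch of step 3.3 for the two objectives used here (accuracy and difference) follows. The function names are hypothetical, and the evenly spaced two-component weights (w, 1 - w) are an assumed way of generating evenly distributed weight vectors.

```python
import math

def uniform_weights(m):
    """m evenly spaced two-objective weight vectors (w, 1 - w), m >= 2."""
    return [(i / (m - 1), 1.0 - i / (m - 1)) for i in range(m)]

def neighborhoods(weights, t):
    """For each weight vector, the indices of its t nearest vectors (Euclidean)."""
    result = []
    for w in weights:
        order = sorted(range(len(weights)),
                       key=lambda j: math.dist(w, weights[j]))
        result.append(order[:t])
    return result

weights = uniform_weights(5)
B = neighborhoods(weights, t=3)
```

Each neighborhood includes the sub-problem itself, because its distance to its own weight vector is zero.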

step 3.4: Gaussian mutation of the neural network individuals: in the evolution of the first generation of population individuals, looping over the sub-problems, two indexes p and q are randomly selected from the neighborhood of the i-th sub-problem, and the corresponding individuals are obtained; a mutant individual is obtained according to the basic mutation formula of the differential evolution algorithm, and, with a given probability, a Gaussian random variable is added to each dimension value of the mutant individual, as shown in formula (3.1):

wherein the scaling factor F ∈ [0, 2]; rnd_U(0, 1) represents a decimal obtained by uniform random sampling from the range [0, 1]; rnd_G(0, σ) represents a Gaussian random vector with mean 0 and standard deviation σ, where σ is one twentieth of the value range of the corresponding dimension element; the probability takes the value 0.5; and formula (3.1) relates the individual after mutation to the original individual;

after every individual in the population has been traversed, the loop ends and the next step begins;
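Formula (3.1) is not reproduced in this text, so the following is only a sketch under stated assumptions: the base vector is taken to be the current individual x_i, the Gaussian term is added per dimension when a uniform random number falls below the probability `p_m`, and out-of-range values are clipped. The names `gaussian_de_mutation` and `p_m` are hypothetical.

```python
import random

def gaussian_de_mutation(x_i, x_p, x_q, f, bounds, p_m, rng):
    """Differential mutation plus optional per-dimension Gaussian noise."""
    child = []
    for xi, xp, xq, (lo, hi) in zip(x_i, x_p, x_q, bounds):
        value = xi + f * (xp - xq)           # assumed base vector: x_i itself
        if rng.random() < p_m:
            sigma = (hi - lo) / 20.0         # one twentieth of the dimension's range
            value += rng.gauss(0.0, sigma)
        child.append(min(max(value, lo), hi))  # assumed clipping to the bounds
    return child
```

The Gaussian term lets the search leave the subspace spanned by differences of existing individuals, which helps to maintain the diversity of the network set.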

step 3.5: predicting the fitness of the new individual with the Gaussian random field prediction model and calculating its PoI value according to formula (2.5); if PoI > 0.5, calculating the true fitness of the individual; otherwise, discarding the mutant individual and continuing with the prediction of the next mutant individual;

step 3.6: updating the reference point and the neighborhood: before the reference point is updated, it is first judged whether the accuracy of the network corresponding to the mutant individual is greater than the threshold r₂ = 0.9, the minimum accuracy for joining the external storage set in the multi-objective algorithm; if the accuracy is greater than r₂, the reference point is updated, otherwise it is not; each component of the reference point is updated as follows:

wherein f_i(x) represents the fitness function value of the corresponding individual;

when the neighborhood replacement operation is performed, the T neighborhood individuals of the i-th individual are examined, and when formula (3.3) is satisfied, the corresponding neighborhood network individual is replaced by the new mutant individual;

wherein i_s represents an element of the neighborhood B(i) corresponding to the i-th individual;

the neighborhood replacement operation is performed only when the accuracy of the network corresponding to the mutant individual is greater than the threshold r₂ = 0.9 and the replacement condition shown in formula (3.3) is satisfied;
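The update formulas of step 3.6 are not reproduced in this text. The sketch below assumes the conventions commonly used in decomposition-based multi-objective algorithms: both objectives are minimised, the reference point is the component-wise minimum, and the replacement condition compares weighted Tchebycheff values. All function names, and the Tchebycheff form itself, are assumptions rather than the patent's stated formulas.

```python
def update_reference_point(z, f_new):
    """Component-wise minimum of the reference point and the new objective vector."""
    return [min(zj, fj) for zj, fj in zip(z, f_new)]

def tchebycheff(f, weight, z):
    """Weighted Tchebycheff value: max over objectives of w_j * |f_j - z_j|."""
    return max(w * abs(fj - zj) for w, fj, zj in zip(weight, f, z))

def replace_neighbors(pop_f, neighborhood, weights, z, f_new, acc_new, r2=0.9):
    """Return the neighbor indices whose individual the new one should replace."""
    if acc_new <= r2:  # accuracy gate described in step 3.6
        return []
    return [j for j in neighborhood
            if tchebycheff(f_new, weights[j], z)
            <= tchebycheff(pop_f[j], weights[j], z)]
```

The accuracy gate keeps low-accuracy networks out of the population even when they would improve the difference objective.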

step 3.7: after the mutant individual has been used for the update in step 3.6, adding it to the reference network set S_eval of the Gaussian random field model;

step 3.8: continuing the loop until all individuals of the current generation have been iterated;

step 3.9: updating the external storage set outEP: in each generation of the multi-objective algorithm, the mutant individuals whose network accuracy is greater than the minimum output accuracy of the multi-objective algorithm, r₃ = 0.95, are first saved, and are added to the external storage set after all mutations of the current population are finished; then the evolution generation is updated as G = G + 1;

step 3.10: when the accuracy of every individual in the external storage set is greater than the minimum output accuracy of the multi-objective algorithm, r₃ = 0.95, or when the evolution generation G is greater than the control threshold 20, ending the algorithm and outputting all individuals in the external storage set; when neither condition is met, returning to step 3.4 to continue.

Step four: the clustering algorithm reduces the size of the network set. Firstly, the external storage set output in the step three is obtained as the network set of the step. Then, a clustering algorithm is adopted to narrow the selection range of the networks, K clusters are formed according to the two objective function values of the accuracy and the difference of each network in the set, and the networks which are relatively close to each other in the set are clustered into a whole. And finally, selecting a central network from each cluster, constructing a new deep neural network set and outputting the new deep neural network set as a final result.

Further, the fourth step further comprises:

step 4.1: obtaining the difference and the accuracy of the deep neural network in the external storage set outEP in the third step to generate a data set;

step 4.2: initializing centers of K categories randomly;

step 4.3: initializing the cluster center matrix;

step 4.4: traversing the distances between all data in the data set and the K cluster centers, the distance between two individuals being measured by the Euclidean distance, calculated as shown in formula (4.1):

wherein x₁ and x₂ represent two different network individuals, f_i(x_i) represents the objective function value of the network individual x_i, with i taking the value 1 or 2, and dis represents the distance between the two individuals;

step 4.5: finding the minimum distance and determining whether to update the cluster center matrix; if the minimum distance is smaller, updating the cluster center individual; traversing the K clusters, and ending the loop after every cluster has been traversed;

step 4.6: from each cluster, the central individual outputs are selected and combined together as the final set of deep neural networks output.
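Steps 4.1 to 4.6 can be sketched with a plain k-means procedure over (accuracy, difference) pairs, followed by the choice of the member closest to each cluster center. The function names, the iteration limit and the random choice of initial centers from the data are assumptions; the text only specifies K clusters and Euclidean distance.

```python
import math
import random

def kmeans(points, k, iters=50, seed=0):
    """Cluster 2-D points (accuracy, difference) into k groups."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centers[c]))
            clusters[nearest].append(p)
        new_centers = [tuple(sum(v) / len(c) for v in zip(*c)) if c else centers[i]
                       for i, c in enumerate(clusters)]
        if new_centers == centers:  # assignments are stable
            break
        centers = new_centers
    return centers, clusters

def central_individuals(centers, clusters):
    """From each non-empty cluster, pick the member closest to the cluster center."""
    return [min(c, key=lambda p: math.dist(p, ctr))
            for ctr, c in zip(centers, clusters) if c]
```

The networks returned by `central_individuals` would form the final, smaller deep neural network set, with one representative per cluster.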

The invention has the beneficial effects that: the method introduces the Gaussian random field model to predict the fitness function of the deep neural network, and combines the pre-screening strategy to reduce the calculation times, so that a proper deep neural network set can be obtained more quickly and accurately, the training speed of the model is accelerated, and the performance of the model is improved.

Drawings

FIG. 1 is a flow chart of a method for generating a deep network set by combining Gaussian random fields according to the invention.

FIG. 2 is an algorithm flow diagram of the population initialization strategy of the present invention.

Detailed Description

The following detailed description of embodiments of the invention refers to the accompanying drawings.

The embodiment of the invention is carried out on laboratory equipment, using a PC running Windows 10. The CPU used in the experiment is an Intel Core i7-7700K processor with 4 cores and 8 threads, a base frequency of 4.2 GHz and a turbo frequency of 4.5 GHz. The GPU is an NVIDIA GTX 1080 Ti with 3584 CUDA cores, 11 GB of video memory and a 352-bit memory bus width. The PC is also equipped with 16 GB of memory and a 1 TB hard disk.

The data set used in the present invention is the MNIST data set. MNIST (Modified National Institute of Standards and Technology database) is a classic data set in the field of computer vision. It contains 70000 grayscale pictures of handwritten digits in total, each of 28 × 28 pixels. In this task, each picture has a label, which is the actual digit represented by the handwritten picture. The entire MNIST data set is divided into two parts, a training data set of 60000 pictures and a test data set of 10000 pictures. The training data set is further divided into a training set of 55000 pictures and a validation set of 5000 pictures.

The meaning of some parameters in the steps of the invention is as follows:

PoI: the probability of improvement, derived from the Gaussian random field model.

r₁: the minimum output accuracy of the single-objective algorithm.

r₂: the minimum accuracy for joining the external storage set in the multi-objective algorithm.

r₃: the minimum output accuracy of the multi-objective algorithm.

S_eval: the reference set of network individuals of the Gaussian random field model.

outEP: the external storage set, used for outputting the final individuals.

λ_i: the evenly distributed weight vector of a network individual.

G: the evolution generation of the multi-objective evolutionary algorithm.

The method for generating the depth network set by combining the Gaussian random field comprises the following specific steps:

Step 1.1: acquiring a set of neural network individuals whose exact fitness function has been calculated, the number of network individuals being 300; establishing the accuracy set and the difference set of the individuals, and ensuring that the sets have the same length.

Step 1.2: defining a kernel function for Gaussian regression prediction, which conveys the covariance between input individuals. The kernel function uses WhiteKernel, which estimates the noise of the objective function and reduces its influence.

Step 1.3: fitting Gaussian process regression to the neural network individuals and their fitness values to obtain a regression model.

Step 1.4: optimizing the hyper-parameters of the kernel function by maximizing the log-marginal likelihood (LML) based on the passed optimizer. Since the LML may have multiple local optima, the optimization can be repeated several times by specifying the parameter n_restarts_optimizer. The first run starts from the initial hyper-parameter values of the kernel; subsequent runs start from hyper-parameter values chosen randomly from the allowed range.

Step 1.5: obtaining the prediction model combined with the Gaussian random field and establishing the reference network set S_eval, which is used for subsequently predicting the fitness of individuals and is updated in real time during prediction.

Step 1.3 is divided into the following steps:

Step 1.3.1: taking a neural network individual x in the neural network individual set obtained in step 1.1 as the independent variable and the corresponding fitness function value as the dependent variable y, and constructing an objective function y = g(x).

Step 1.3.2: according to Gaussian process regression prediction, assuming that the objective function y = g(x) obeys a normal distribution with mean μ and variance δ².

Step 1.3.3: constructing a maximum likelihood estimation function for fitting a normal distribution curve. The maximum likelihood function PDF is shown in equation (1.1):

wherein exp denotes the exponential function with the natural constant e as its base, and det denotes the determinant of the corresponding matrix; the matrix C is a K × K matrix whose entry at position (i, j) is C_{i,j} = c(x^i, x^j), where c(x^i, x^j) is the value of the correlation function; for arbitrary arguments x, x' ∈ R^n, where R^n denotes the n-dimensional real space, the correlation function c(x, x') = exp[-d(x, x')] characterizes the correlation between the objective function values g(x) and g(x') corresponding to x and x'. θ_i and p_i are the hyper-parameters that control the distance function d(x, x') and are independent of the arguments x, x'. The value of the correlation function therefore depends only on the magnitude of (x - x'): the larger (x - x') is, the smaller the correlation, and vice versa; the vector y = (y¹, y², ..., y^K), and 1 is a column vector of dimension K.

Step 1.3.4: according to the properties of the normal distribution, the likelihood function is maximized to obtain estimates of the mean μ and of the variance δ². The unbiased estimate of the objective function g(x) and the variance of that estimate are then expressed in terms of the estimated mean, the estimated variance and the vector r = (c(x, x¹), c(x, x²), ..., c(x, x^K))^T. At this point, the objective function g(x) can be considered to obey a normal distribution whose mean is the unbiased estimate and whose variance is the estimated variance at x.

Step two: obtaining neural network set with higher accuracy by single-target differential evolution algorithm

Step 2.1: 100 individuals are uniformly and randomly generated in a decision space omega, all the individuals are composed of real number codes corresponding to hyper-parameters of a neural network structure to be structured, and the specific composition is shown in a formula (2.1).

where M is the number of individuals to be generated and V is the maximum dimension of the decision space Ω. The j-th dimension of the i-th individual is initialized as shown in formula (2.2).

where L_{j_min} and L_{j_max} are the lower and upper bounds, respectively, of the j-th dimension of the parameter vector, and rand(0, 1) generates a random number between 0 and 1.
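The initialization in step 2.1 can be sketched as follows. This is a minimal sketch assuming formula (2.2) has the usual form x_{i,j}(0) = L_{j_min} + rand(0, 1) · (L_{j_max} - L_{j_min}); the function name `init_population` and the example bounds are hypothetical.

```python
import numpy as np

def init_population(lower, upper, M=100, seed=None):
    """Draw M individuals uniformly at random inside the per-dimension bounds."""
    rng = np.random.default_rng(seed)
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    # Assumed form of (2.2): x_{i,j}(0) = L_{j_min} + rand(0, 1) * (L_{j_max} - L_{j_min})
    return lower + rng.random((M, lower.size)) * (upper - lower)

# Hypothetical bounds for three encoded hyperparameters
# (for example kernel size, a rate in [0, 1], number of nodes).
population = init_population([1, 0.0, 16], [7, 1.0, 256], M=100, seed=0)
```

Each row of `population` is one real-coded individual; how the real values are decoded into a network structure is not specified here.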

Step 2.2: Mutation operation of the population initialization algorithm. Different individuals are randomly selected from the population, the vector difference of two of them is scaled, and the result is combined by vector addition with the individual to be mutated, as shown in formula (2.3):

x′_i(g) = x_{r1}(g) + F · (x_{r2}(g) - x_{r3}(g)) (2.3)

where the scaling factor F ∈ [0, 2] and r_1 ≠ r_2 ≠ r_3 ≠ i; x_{ri}(g) denotes an individual before mutation, and x′_i(g) denotes the new individual generated by mutation. Newly generated individuals must be kept strictly within bounds during mutation: when the value of a dimension falls outside its interval, the algorithm remaps it into the valid range through a specific operation.
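Formula (2.3) together with the boundary control can be sketched as below. This is a minimal sketch: the text does not say which remapping operation is used, so clipping to the bounds is an assumed stand-in, and the function name `de_mutation` is illustrative.

```python
import numpy as np

def de_mutation(pop, lower, upper, F=0.5, seed=None):
    """Apply x'_i = x_r1 + F * (x_r2 - x_r3) to every individual, with r1, r2, r3, i distinct."""
    rng = np.random.default_rng(seed)
    M = len(pop)  # requires M >= 4 so that three other indices exist
    mutants = np.empty_like(pop, dtype=float)
    for i in range(M):
        candidates = [k for k in range(M) if k != i]
        r1, r2, r3 = rng.choice(candidates, size=3, replace=False)
        mutants[i] = pop[r1] + F * (pop[r2] - pop[r3])
    # Boundary control: clipping is an assumption; the original only says the
    # value is remapped into the valid range by "a specific operation".
    return np.clip(mutants, lower, upper)
```

Other remapping choices, such as reflecting at the bound or resampling the offending dimension, fit the description equally well.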

Step 2.3: Crossover operation of population initialization. The value of each dimension of a crossover individual is randomly taken from the corresponding dimension of either the mutated individual or the original individual. The crossover individuals are generated as shown in formula (2.4):
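The description of step 2.3 matches the binomial crossover commonly used in differential evolution. The following is a minimal sketch under that assumption; the crossover probability `cr` follows the symbol used in the claims, while forcing at least one dimension to come from the mutated individual is a common convention assumed here, not something stated in the text.

```python
import numpy as np

def binomial_crossover(parents, mutants, cr=0.9, seed=None):
    """Take each dimension from the mutant with probability cr, else from the parent."""
    rng = np.random.default_rng(seed)
    M, V = parents.shape
    take_mutant = rng.random((M, V)) <= cr
    # Assumed convention: guarantee one mutant dimension per individual.
    take_mutant[np.arange(M), rng.integers(0, V, size=M)] = True
    return np.where(take_mutant, mutants, parents)
```

With `cr` close to 1 the crossover individual is almost identical to the mutated individual; with `cr` close to 0 it differs from the original in only one dimension.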

Step 2.4: Selection operation of population initialization. Call the Gaussian random field prediction model built in step one and pre-screen the mutated individuals according to a pre-screening rule. The rule is based on the probability of improvement (PoI) defined in equation (2.5).

where x is an individual solution vector,

denotes the unbiased estimate of the variance, and PoI(x) is the probability of improvement corresponding to the individual solution vector x; its value is about 0.5.

Based on the PoI values of all mutated individuals, M/2 individuals (half the population size) are selected by roulette-wheel selection for true fitness evaluation. The evaluated fitness values are compared with those of the parent population individuals, the better individual is kept according to a greedy rule, and the new generation of the population is constructed. The selection rule is shown in formula (2.6).

where f(x) is the fitness function value of the target individual.

When an individual in the population is updated, the new individual is added to the reference network set S_eval of the Gaussian random field model. If no change occurs, no update is made.
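The pre-screening and selection in step 2.4 can be sketched as follows. This is a minimal sketch that assumes PoI(x) is the probability that a normally distributed prediction with mean `g_hat` and standard deviation `s` exceeds a reference fitness `f_ref` (accuracy is maximized); the exact form of equations (2.5) and (2.6) is not shown in the text, and all function names are illustrative.

```python
import math
import numpy as np

def poi(g_hat, s, f_ref):
    """Assumed PoI: probability that N(g_hat, s^2) exceeds the reference fitness f_ref."""
    if s <= 0.0:
        return 1.0 if g_hat > f_ref else 0.0
    return 0.5 * (1.0 + math.erf((g_hat - f_ref) / (s * math.sqrt(2.0))))

def roulette_select(poi_values, n_select, seed=None):
    """Pick n_select distinct candidates with probability proportional to their PoI."""
    rng = np.random.default_rng(seed)
    w = np.asarray(poi_values, dtype=float) + 1e-12  # keep every weight positive
    return rng.choice(len(w), size=n_select, replace=False, p=w / w.sum())

def greedy_select(parent, f_parent, child, f_child):
    """Greedy rule: keep the child only if its true fitness is better."""
    return (child, f_child) if f_child > f_parent else (parent, f_parent)
```

Under this assumed form, a candidate whose predicted fitness equals the reference value has a PoI of exactly 0.5, which is consistent with the value of about 0.5 mentioned above.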

Step 2.5: Check the output accuracy of the population. The role of the single-objective algorithm here is to generate a set of networks with sufficiently high accuracy, so an accuracy parameter r_1 = 0.9 is added to control termination. When the accuracy of the networks corresponding to all individuals in the population is greater than r_1 = 0.9, the algorithm terminates and outputs the final-generation population. Otherwise, return to step 2.2.

Step three: Further optimize the network set obtained by initialization.

This step applies a multi-objective differential evolution algorithm to the final-generation population obtained by the single-objective evolutionary algorithm.

Step 3.1: Take the 100 neural network individuals of the final-generation population of the single-objective algorithm as the initial population of the multi-objective algorithm.

Step 3.2: Calculate the fitness function of each individual in the population and store the results in the external storage set outEP and in the reference network set S_eval of the Gaussian random field model.

Step 3.3: Generate adjacent subproblems. The multi-objective differential evolution algorithm generates a set of evenly distributed weight vectors {λ^1, ..., λ^M} for all subproblems, where, in the weight vector corresponding to the i-th subproblem,

each component represents a single weight value. The T subproblems closest to each subproblem (its neighborhood) are then obtained by computing the Euclidean distances between the weight vectors of the subproblems, and the multi-objective algorithm evolves through information exchange between adjacent subproblems.
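The weight vectors and neighborhoods of step 3.3 can be sketched as follows. This is a minimal sketch assuming two objectives (accuracy and diversity, as used in the clustering step), so that evenly spaced weights on a line suffice; the function names are illustrative.

```python
import numpy as np

def weight_vectors(M):
    """Evenly distributed two-objective weight vectors whose components sum to 1."""
    w1 = np.linspace(0.0, 1.0, M)
    return np.column_stack([w1, 1.0 - w1])

def neighborhoods(weights, T):
    """For each subproblem, the indices of the T subproblems with the closest weight vectors."""
    diff = weights[:, None, :] - weights[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))   # pairwise Euclidean distances
    return np.argsort(dist, axis=1)[:, :T]    # each subproblem is its own nearest neighbor
```

Row i of the result plays the role of the neighborhood B(i) referred to later in step 3.6.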

Step 3.4: Gaussian mutation of neural network individuals. During the evolution of the individuals of each generation, the algorithm loops over the subproblems, randomly selects two indices p and q from the neighborhood of the i-th subproblem, and obtains the corresponding individuals

and

A Gaussian random variable is then added, with a given probability, to each dimension of the mutated individual, as shown in formula (3.1):

where the scaling factor F ∈ [0, 2]; rnd_U(0, 1) denotes a decimal obtained by uniform random sampling from the range [0, 1]; and rnd_G(0, σ) denotes a Gaussian random vector with mean 0 and standard deviation σ, where σ is one twentieth of the value range of the corresponding dimension element, with a value of 0.5;

denotes the individual after mutation, and

denotes the original individual.

After every individual in the population has been traversed, the loop ends and the next step begins.
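The mutation of step 3.4 can be sketched as below. Formula (3.1) is not reproduced in the text, so this is a minimal sketch of one plausible reading: a differential step built from the neighbors p and q, plus Gaussian noise added to each dimension with some probability. The mutation probability `p_m`, the clipping to bounds, and the function name are assumptions.

```python
import numpy as np

def gaussian_de_mutation(x_i, x_p, x_q, lower, upper, F=0.5, p_m=0.1, seed=None):
    """Differential step from neighbors p and q, with Gaussian noise added per dimension."""
    rng = np.random.default_rng(seed)
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    sigma = (upper - lower) / 20.0              # one twentieth of each dimension's range
    base = x_i + F * (x_p - x_q)                # scaled difference of the two neighbors
    add_noise = rng.random(x_i.shape) < p_m     # rnd_U(0, 1) test for each dimension
    noise = rng.normal(0.0, sigma)              # rnd_G(0, sigma)
    return np.clip(base + np.where(add_noise, noise, 0.0), lower, upper)
```

With `p_m = 0` the operator reduces to the plain differential step; larger values of `p_m` perturb more dimensions.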

Step 3.5: Predict the fitness of the new individual with the Gaussian random field prediction model and calculate its PoI value according to formula (2.5). If the PoI is greater than 0.5, calculate the true fitness of the individual. Otherwise, discard the mutated individual and continue with the prediction for the next mutated individual.

Step 3.6: Update the reference point and the neighborhood. Before the reference point is updated, it is first determined whether the accuracy of the network corresponding to the mutated individual is greater than the threshold accuracy r_2 = 0.9. If the accuracy is greater than r_2, the reference point is updated; otherwise it is not. The reference point is

and for

we have

where f_i(x) denotes the fitness function value of the i-th individual.

In the neighborhood replacement operation, the T neighborhood individuals of the i-th individual (subproblem) are checked; when formula (3.3) is satisfied, the corresponding neighborhood network individual is replaced by the new mutated individual.

where i_s denotes an element of the neighborhood B(i) corresponding to the i-th individual.

The neighborhood replacement operation is performed only when the accuracy of the network corresponding to the mutated individual is greater than the threshold r_2 = 0.9 and the replacement condition in formula (3.3) is satisfied. This prevents an individual with higher accuracy from being replaced by one with lower accuracy.
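The gated update of step 3.6 can be sketched as follows. The reference-point formula and formula (3.3) are not reproduced in the text, so this is a minimal sketch assuming a Tchebycheff aggregation, maximized objectives, and a reference point that holds the best value seen for each objective; only the threshold r_2 = 0.9 comes from the text, and the function names are illustrative.

```python
import numpy as np

R2 = 0.9  # threshold accuracy r_2 from the text

def tchebycheff(f, lam, z):
    """Assumed aggregation: largest weighted gap between the reference point and f."""
    return np.max(lam * np.abs(z - f))

def update_step(f_new, acc_new, F_pop, weights, neighbors, z):
    """Return the updated reference point and the neighbor indices to be replaced."""
    if acc_new <= R2:
        return z, []                 # low-accuracy mutants change nothing
    z = np.maximum(z, f_new)         # keep the best value seen per objective
    replaced = [s for s in neighbors
                if tchebycheff(f_new, weights[s], z) <= tchebycheff(F_pop[s], weights[s], z)]
    return z, replaced
```

Here `F_pop[s]` holds the objective values of the individual currently assigned to subproblem s, and `neighbors` is the index list B(i).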

Step 3.7: After the mutated individual has been updated in step 3.6, add it to the reference network set S_eval of the Gaussian random field model.

Step 3.8: Continue looping until the iteration over the individuals of the current generation is complete.

Step 3.9: Update the external storage set outEP. In each generation of the multi-objective algorithm, mutated individuals whose network accuracy is greater than r_3 = 0.95 are retained and, once all mutations in the current population are complete, added to the external storage set. The generation counter is then incremented: G = G + 1.

Step 3.10: When the accuracy of every individual in the external storage set is greater than the output accuracy threshold r_3 = 0.95, or the generation counter G is greater than the control threshold of 20, the algorithm terminates and all individuals in the external storage set are output. Otherwise, return to step 3.4 and continue.
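The termination test of step 3.10 combines two conditions with a logical "or". A minimal sketch, using the thresholds given in the text; the function name and the treatment of an empty external set as "not yet finished" are assumptions:

```python
R3 = 0.95             # output accuracy threshold r_3 from the text
MAX_GENERATIONS = 20  # control threshold on the generation counter G

def should_stop(out_ep_accuracies, generation):
    """Stop when every stored network exceeds r_3, or when G exceeds the generation limit."""
    all_accurate = len(out_ep_accuracies) > 0 and all(a > R3 for a in out_ep_accuracies)
    return all_accurate or generation > MAX_GENERATIONS
```

The generation limit guarantees termination even if some network in the external storage set never exceeds r_3.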

Step four: Reduce the size of the neural network set with a clustering algorithm.

Step 4.1: Obtain the diversity and accuracy of the deep neural networks in the external storage set outEP from step three and use them to generate a data set.

Step 4.2: Randomly initialize the centers of the 10 classes.

Step 4.3: Initialize the cluster center matrix.

Step 4.4: Traverse the data set and compute the distance between every data point and the 10 cluster centers. The distance between two individuals is measured by the Euclidean distance, calculated as shown in formula (4.1):

where x_1 and x_2 denote two different network individuals, f_i(x_i) denotes the objective function value of network individual x_i, i takes the value 1 or 2, and dis denotes the distance between the two individuals.

Step 4.5: Find the minimum distance and determine whether to update the cluster center matrix. If the minimum distance is smaller, update the cluster center individual. Traverse the 10 clusters; the loop ends once every cluster has been traversed.
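Step four can be sketched as a small clustering routine over the (accuracy, diversity) pairs. The text names only "a clustering algorithm", so the following is a minimal sketch that assumes a k-means style update and then picks the stored network nearest to each center as the central network; the function names and the iteration limit are illustrative.

```python
import numpy as np

def dis(f_a, f_b):
    """Euclidean distance between the objective vectors of two networks."""
    return float(np.sqrt(np.sum((np.asarray(f_a) - np.asarray(f_b)) ** 2)))

def cluster_centers(data, k=10, iters=50, seed=None):
    """Return indices of the networks closest to each of k cluster centers."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)
    centers = data[rng.choice(len(data), size=k, replace=False)]  # random initial centers
    for _ in range(iters):
        d = np.linalg.norm(data[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)                 # nearest center for every network
        new_centers = centers.copy()
        for c in range(k):
            members = data[labels == c]
            if len(members):
                new_centers[c] = members.mean(axis=0)
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    d = np.linalg.norm(data[:, None, :] - centers[None, :, :], axis=2)
    return np.unique(d.argmin(axis=0))            # one representative per center
```

The returned indices identify the networks kept in the compressed set; when two centers share the same nearest network, fewer than k networks are returned.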

To verify the effectiveness of the pre-screening strategy combined with the Gaussian random field in the proposed method, 5 experiments were run on the model with the Gaussian random field and 5 on the model without it, keeping the overall algorithm settings unchanged, in order to reduce the effect of chance and make the experimental results as reliable as possible. The results were averaged and the running times rounded, giving the performance of the model on the test set and the running time of the optimization part with and without the pre-screening strategy. The experimental data are shown in Table 1:

Table 1. Influence of the pre-screening strategy based on the Gaussian random field on the experimental results

In conclusion, the deep network set generation method combined with a Gaussian random field effectively improves the training speed of the network set while achieving higher accuracy.

The above description of exemplary embodiments has been presented only to illustrate the technical solution of the invention and is not intended to be exhaustive or to limit the invention to the precise form described. Obviously, many modifications and variations are possible in light of the above teaching to those skilled in the art. The exemplary embodiments were chosen and described in order to explain certain principles of the invention and its practical application to thereby enable others skilled in the art to understand, implement and utilize the invention in various exemplary embodiments and with various alternatives and modifications. It is intended that the scope of the invention be defined by the following claims and their equivalents.

Claims

1. A deep network set generation method combined with a Gaussian random field is characterized by comprising the following steps:

step four, the clustering algorithm reduces the scale of the network set;

2. The method of claim 1, wherein the step one further comprises:

step 1.2, defining a kernel function for Gaussian regression prediction to convey the covariance between input individuals; the kernel function is WhiteKernel, which is used to estimate the noise of the objective function and reduce the influence of the noise;

3. The method of claim 2, wherein the step 1.3 further comprises:

step 1.3.1, taking a neural network individual x in the neural network individual set obtained in step 1.1 as the independent variable and the corresponding fitness function value as the dependent variable y, and constructing an objective function y = g(x);

step 1.3.2, performing regression prediction according to a Gaussian process, assuming that the objective function y = g(x) follows a normal distribution with mean μ and variance δ^2;

step 1.3.3, constructing a maximum likelihood estimation function for fitting a normal distribution curve; the maximum likelihood function PDF is shown in equation (1.1):

step 1.3.4, maximizing the likelihood function according to the properties of the normal distribution, with the mean estimated as

and the variance as

the unbiased estimate of the objective function g(x) is

in which the estimate of the mean μ appears, and the variance of this estimate is

wherein r = (c(x, x^1), c(x, x^2), ..., c(x, x^K))^T;

step 1.3.5, training the Gaussian regression prediction model with the neural network individuals used in step 1.3.1 and their fitness values to obtain the final regression model.

4. The method according to any one of claims 1 to 3, wherein in step two, hyperparameters including the size and number of convolution kernels in the deep convolutional network, the filter size and stride of the pooling layers, and the number of nodes in the fully connected layers are encoded into the individuals of the evolutionary algorithm; a series of neural networks with sufficiently high accuracy is generated using a single-objective evolutionary algorithm with accuracy as the only objective function; and when the accuracies in the individual set are all higher than the control threshold, namely the minimum output accuracy r_1 of the single-objective algorithm, the computation ends and the final generation of individuals is output.

5. The method of claim 4, wherein the step two further comprises:

x_{i,j}(0) = L_{j_min} + rand(0, 1) · (L_{j_max} - L_{j_min}), i = 1, 2, 3, ..., M; j = 1, 2, 3, ..., d (2.2)

x′_i(g) = x_{r1}(g) + F · (x_{r2}(g) - x_{r3}(g)) (2.3)

wherein the crossover probability cr ∈ [0, 1], and x″_{i,j} denotes the j-th dimension value of the mutated individual x″_i;

wherein x represents an individual solution vector,

wherein f(x) represents the fitness function value of the target individual;

when an individual in the population is updated, adding the new individual to the reference network set S_eval of the Gaussian random field model; if no change occurs, no update is performed;

6. The method according to claim 5, wherein in step three, the final generation of individuals obtained by the single-objective evolutionary algorithm in step two is taken as the initial population of the multi-objective differential evolution algorithm; the fitness function of each individual in the initial population is calculated and the individual is placed in the external storage set; the adjacent subproblems of each individual are calculated and Gaussian mutation is applied to the individual; the fitness function of the new individual is predicted with the Gaussian random field prediction model and judged in combination with the pre-screening strategy; the reference point and the neighborhood of the individual are updated; the accuracy of the network is compared with the control threshold, namely the minimum accuracy r_2 with which the multi-objective algorithm controls joining the external storage set, to decide whether to update the external storage set; and when the accuracies of the individuals in the external storage set are all higher than the control threshold, namely the minimum output accuracy r_3 of the multi-objective algorithm, or the number of generations is greater than the control threshold on the number of generations, the loop ends and the external storage set is output.

7. The method of claim 6, wherein the step three further comprises:

step 3.2: calculating the fitness function of each individual in the population and storing the results in the external storage set outEP and in the reference network set S_eval of the Gaussian random field model;

step 3.4: Gaussian mutation of neural network individuals: during the evolution of the individuals of each generation, looping over the subproblems, randomly selecting two indices p and q from the neighborhood of the i-th subproblem, and obtaining the corresponding individuals

and

then adding, with a given probability, a Gaussian random variable to each dimension value of the mutated individual, as shown in formula (3.1):

wherein the scaling factor F ∈ [0, 2]; rnd_U(0, 1) denotes a decimal obtained by uniform random sampling from the range [0, 1]; rnd_G(0, σ) denotes a Gaussian random vector with mean 0 and standard deviation σ, where σ is one twentieth of the value range of the corresponding dimension element, with a value of 0.5;

denotes the individual after mutation, and

denotes the original individual;

and for

we have:

8. The method according to claim 7, wherein in step four, the external storage set output in step three is obtained; a clustering algorithm is used to narrow the selection range of networks, forming K clusters according to the two objective function values, accuracy and diversity, of each network in the set, so that networks that are relatively close to one another in the set are grouped into one class; and finally, a central network is selected from each cluster, and a new deep neural network set is constructed and output as the final result.

9. The method of claim 8, wherein the step four further comprises:

step 4.2: randomly initializing the centers of the K categories;

step 4.3: initializing the cluster center matrix;