CN110399917B - Image classification method based on hyper-parameter optimization CNN - Google Patents


Info

Publication number
CN110399917B
CN110399917B (application CN201910671268.0A)
Authority
CN
China
Prior art keywords
cnn
hyper
particle
iteration
parameters
Prior art date
Legal status
Active
Application number
CN201910671268.0A
Other languages
Chinese (zh)
Other versions
CN110399917A (en)
Inventor
付俊
王思淼
Current Assignee
Northeastern University China
Original Assignee
Northeastern University China
Priority date
Filing date
Publication date
Application filed by Northeastern University China filed Critical Northeastern University China
Priority to CN201910671268.0A priority Critical patent/CN110399917B/en
Publication of CN110399917A publication Critical patent/CN110399917A/en
Application granted granted Critical
Publication of CN110399917B publication Critical patent/CN110399917B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2413Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
    • G06F18/24147Distances to closest patterns, e.g. nearest neighbour classification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00Machine learning
    • G06N20/20Ensemble learning
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Medical Informatics (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides an image classification method based on hyper-parameter-optimized CNN, belonging to the technical field of image recognition. According to the structural characteristics of the CNN architecture, the method selects the structural parameters of convolutional layer C1 and pooling layer P1 as the hyper-parameters of the invention and limits their value range to (X_l, X_u). The CNN hyper-parameters are then optimized with a periodic-mutation PSO algorithm combining global mutation and local mutation, which prevents the hyper-parameter search from stalling at a local optimum, as conventional PSO tends to do, and yields more competitive image classification performance than conventional PSO. The efficiency and cost of deep-learning CNN hyper-parameter optimization are markedly improved, the image classification potential of the CNN architecture is exploited to the fullest, hardware resources and computation cost for CNN image classification are saved, and the method has practical application value in engineering.

Description

Image classification method based on hyper-parameter optimization CNN
Technical Field
The invention relates to the technical field of image recognition, and in particular to an image classification method based on a hyper-parameter-optimized convolutional neural network (CNN).
Background
Image classification technology has matured considerably, and CNN architectures tailored to different classification scenarios keep emerging, but a complex CNN structure usually consumes substantial hardware resources and computation cost. Before a CNN is trained for image classification, some of its parameters must be set in advance; these are called hyper-parameters, and selecting an optimal set of them can maximize the CNN's image classification performance without changing its structure. Selecting appropriate hyper-parameters to fully release the image classification performance of a CNN architecture is therefore particularly important in engineering practice.
There has been prior work on hyper-parameter optimization methods for image classification; early studies focused on applying machine-learning hyper-parameter optimization to CNNs. Hyper-parameter optimization methods fall mainly into model-free and model-based optimization: the former includes simple grid search and random search, while the latter includes population-based heuristic optimization algorithms and Gaussian-process-based Bayesian optimization (GP). Heuristic optimization algorithms are of particular interest for CNN hyper-parameter optimization; among them, particle swarm optimization has proven very effective at solving tasks in many areas thanks to its simplicity and versatility, and it has great potential for large-scale parallelization. The search efficiency of particle-swarm-based hyper-parameter optimization is far superior to that of grid search, random search and other hyper-parameter optimization algorithms, shortening the search time and alleviating the low efficiency and high time cost of traditional hyper-parameter optimization. However, particle swarm optimization is prone to falling into local optima, so the search may settle on a locally rather than globally optimal set of hyper-parameters; to a certain extent this prevents finding the hyper-parameters that make CNN performance optimal, and CNN image classification cannot achieve its best result.
Disclosure of Invention
Aiming at the defects of the prior art, the invention provides an image classification method based on a hyper-parameter-optimized CNN.
The technical scheme adopted by the invention is as follows:
An image classification method based on hyper-parameter-optimized CNN comprises the following steps:
step 1: preprocessing an image data set needing to be classified, and proportionally dividing the image data set into a training set T, a testing set C and a verification set V, wherein the verification set V and the training set T are extracted from image data and satisfy | T | =10 LiV |.
Step 2: build a CNN architecture comprising convolutional layer C1, pooling layer P1, convolutional layer C2 and pooling layer P2, terminated by Softmax activation. According to the structural characteristics of the CNN architecture, select the structural parameters of convolutional layer C1 and pooling layer P1 as the hyper-parameters of the invention, and determine the value range of the hyper-parameters as (X_l, X_u).
Step 3: optimize the hyper-parameters of the CNN architecture on the validation set with a periodic-mutation PSO algorithm to obtain a set of hyper-parameter values X_i(g);
Step 3-1: initial particle velocity, fitness function value, individual optimum position P g Global optimum position P i Setting the iteration number g as 0, the iteration precision delta as more than or equal to 0, the particle search space dimension as D and the number of particles as N;
the current position vector of each particle in the population is X i =(x i,1 ,x i,2 ,...,x i,D ) I =1,2,.., N, with the current velocity vector being V i =(v i,1 ,v i,2 ,...,v i,D ) I =1,2,.., N, with the individual optimum position vector P g =(p g,1 ,p g,2 ,...,p g,D ) G =1,2,.., N, with the global optimum position vector P i =(p i1 ,p i2 ,...,p iD ),i=1,2,...,N;
Step 3-2: at the g-th iteration, each particle updates its velocity and position:
V_i(g+1) = ωV_i(g) + c_1 r_1 (P_i(g) - X_i(g)) + c_2 r_2 (P_g(g) - X_i(g))   (1)
X_i(g+1) = X_i(g) + V_i(g+1)   (2)
where ω is the inertia factor, c_1 and c_2 are learning factors (all constants), and r_1 and r_2 are random numbers drawn from the range (0, 1);
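Equations (1) and (2) can be sketched per particle as follows (a minimal illustration; representing particles as plain lists and injecting the random source are implementation choices, not part of the patent):

```python
import random

def pso_update(X, V, P_i, P_g, omega=0.5, c1=0.5, c2=0.5, rng=random):
    """One plain PSO step per equations (1)-(2): X and V are the particle's
    position/velocity, P_i its individual best, P_g the global best;
    all are length-D lists."""
    r1, r2 = rng.random(), rng.random()  # r1, r2 drawn from (0, 1)
    V_new = [omega * v + c1 * r1 * (pi - x) + c2 * r2 * (pg - x)
             for v, x, pi, pg in zip(V, X, P_i, P_g)]      # equation (1)
    X_new = [x + v for x, v in zip(X, V_new)]              # equation (2)
    return X_new, V_new
```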
step 3-3: selecting a global mutation operator to change the positions of all particles in the whole population, or selecting a local mutation operator to change the positions of elite particles in the population:
the formula of the global mutation operator and the local mutation operator for the position change is as follows:
Figure GDA0004083066020000021
Figure GDA0004083066020000022
wherein A is 1 ,A 2 Is a self-defined rangeFactor, constant, r 3r 4 0,1),
Figure GDA0004083066020000023
for elite particles, q is the number of new particles generated by the local mutation operator, f 1 Is the global variation frequency, f 2 Is the local variation frequency;
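The mutation schedule of step 3-3 can be sketched as follows. Since the patent's exact operator formulas survive only as images, the perturbation `x + A * r * (x_u - x_l)` with `r` uniform in (-1, 1) is an assumed form; only the schedule (global mutation every f_1 iterations; local mutation of the elite particle spawning q new particles every f_2 iterations) and the names A_1, A_2, q, f_1, f_2 follow the text:

```python
import random

def periodic_mutation(swarm, g, X_l, X_u, elite_idx, q=3,
                      f1=10, f2=2, A1=1.0, A2=1.0, rng=random):
    """Every f1 iterations, perturb all particles in place (global mutation);
    every f2 iterations, spawn q candidate positions around the elite particle
    (local mutation). The perturbation form itself is an assumption."""
    candidates = []
    if g % f1 == 0:  # global mutation: move every particle
        for X in swarm:
            for d in range(len(X)):
                X[d] += A1 * rng.uniform(-1, 1) * (X_u[d] - X_l[d])
    if g % f2 == 0:  # local mutation: q new particles near the elite
        elite = swarm[elite_idx]
        for _ in range(q):
            candidates.append([x + A2 * rng.uniform(-1, 1) * (xu - xl)
                               for x, xl, xu in zip(elite, X_l, X_u)])
    return candidates
```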
Step 3-4: check whether the velocity and position of each particle are out of range; if so, replace the out-of-range component with the violated boundary value. The specific rule is:
if V_i(g) ≤ V_l, then V_i(g) = V_l; if V_i(g) ≥ V_u, then V_i(g) = V_u; if X_i(g) ≤ X_l, then X_i(g) = X_l; if X_i(g) ≥ X_u, then X_i(g) = X_u;
where (V_l, V_u) is the velocity range of the particles and (X_l, X_u) is their position range;
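The boundary check of step 3-4 amounts to clamping each velocity and position component to its range, e.g. (a sketch; the list representation is an implementation choice):

```python
def clamp_particle(X, V, X_l, X_u, V_l, V_u):
    """Step 3-4: replace any out-of-range velocity or position component
    with the boundary value it violated."""
    V = [min(max(v, vl), vu) for v, vl, vu in zip(V, V_l, V_u)]
    X = [min(max(x, xl), xu) for x, xl, xu in zip(X, X_l, X_u)]
    return X, V
```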
Step 3-5: the optimal position obtained by executing steps 3-1 to 3-4 is the required hyper-parameter value X_i(g).
Step 4: input the hyper-parameters X_i(g) into the CNN, and train the optimized CNN with the training set obtained in step 1;
Step 5: input the test set obtained in step 1 into the trained CNN to obtain the classification result of test set C;
step 6: judging whether the iteration reaches a termination condition;
Step 6-1: calculate the fitness function value of each particle in the periodic-mutation PSO:
[The fitness function formula is rendered as an image in the source and is not reproduced here.]
where CNN(X_i(g)) is the accuracy of the classification result obtained in step 5, and X_i(g) is the hyper-parameter value obtained in step 3;
Step 6-2: compare the particle fitness function values of this iteration obtained in step 6-1 to update the individual historical optimal positions P_i(g) and the population optimal position P_g(g), obtaining the optimal particle X_min(g) of this iteration:
[The update formulas are rendered as images in the source and are not reproduced here.]
Step 6-3: judge whether the increase of the optimal particle's fitness value is smaller than a threshold ε, whether the position update of the optimal particle in the population is smaller than a minimum step length, and whether the iteration number g has reached the maximum iteration number g_max; if any one of these termination conditions is satisfied, terminate the iteration.
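The three termination checks of step 6-3 can be sketched as follows (a hedged illustration: the five-generation window follows embodiment 1, while the `eps` and `min_step` defaults are placeholders, not values from the patent):

```python
def should_terminate(fitness_hist, pos_hist, g, g_max,
                     eps=1e-4, min_step=1e-3, window=5):
    """Stop when (a) g reaches g_max, (b) the best fitness improved by less
    than eps over `window` generations, or (c) the best position moved by
    less than min_step over that window."""
    if g >= g_max:
        return True
    if len(fitness_hist) > window:
        if fitness_hist[-1] - fitness_hist[-1 - window] < eps:
            return True
        moved = max(abs(a - b)
                    for a, b in zip(pos_hist[-1], pos_hist[-1 - window]))
        if moved < min_step:
            return True
    return False
```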
Step 7: if the termination condition is not reached, execute steps 3 to 6 to continue iterating;
Step 8: if the termination condition is reached, obtain the final optimal hyper-parameters, denoted X_min(g);
Step 9: substitute the final optimal hyper-parameters X_min(g) into the CNN, and classify the images of the whole data set to obtain the classification result.
The beneficial effects of the above technical scheme are as follows: the image classification method based on hyper-parameter-optimized CNN effectively alleviates the tendency of the particle swarm algorithm to fall into local optima and further improves its convergence speed and accuracy; the stronger search performance of the hyper-parameter optimization method avoids, to a certain extent, the unsatisfactory CNN classification accuracy caused by improper hyper-parameter selection, improves image classification accuracy, and exploits the performance of CNN image processing to the fullest.
Drawings
FIG. 1 is a flowchart of an image classification method based on hyper-parametric optimization CNN according to the present invention;
fig. 2 is a segment of a handwritten digit recognition MNIST dataset for image classification in a first embodiment of the invention;
FIG. 3 is a diagram showing the change of the classification performance of the MNIST data set with the iteration number before and after the improvement of the hyper-parameter optimization method in the first embodiment of the present invention;
FIG. 4 is a fragment of an object recognition cifar-10 dataset for image classification according to a second embodiment of the present invention;
fig. 5 is a graph showing the classification performance of the cifar-10 data set as a function of iteration number before and after the improvement of the hyper-parameter optimization method according to the second embodiment of the present invention.
Detailed Description
The following detailed description of embodiments of the present invention is provided in connection with the accompanying drawings and examples. The following examples are intended to illustrate the invention but are not intended to limit the scope of the invention.
Example 1
As shown in fig. 1, the method of the present embodiment is as follows.
Step 1: the reference data set MNIST data set is chosen as the data set to be classified, the data set being shown in fig. 2 as a segment, the data set having 70000 grayscale images, each image being 28 × 28 pixels and containing 10 classes, each class having 7000 images. 60000 images in the data set are randomly selected as a training set, the remaining 10000 images are selected as a testing set, and 6000 images are randomly selected from the training set as a verification set.
Step 2: and building a CNN architecture which comprises a convolutional layer C1, a pooling layer P1, a convolutional layer C2 and a pooling layer P2 and is activated and terminated by Softmax. Selecting the structural parameters of the convolutional layer C1 and the pooling layer P1 as the hyper-parameters of the invention according to the structural characteristics of the CNN architecture, and determining the value range of the hyper-parameters as (X) l ,X u ) As shown in table 1.
TABLE 1 parameters of convolutional and max-pooling layers and their allowable ranges
[Table 1 is rendered as an image in the source and is not reproduced here.]
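Since Table 1 survives only as an image, a sketch of how such structural hyper-parameters determine the C1-P1-C2-P2 feature-map sizes may be useful; the 28 × 28 input matches the MNIST images of embodiment 1, while the 5 × 5 kernels and 2 × 2 stride-2 pooling are assumed example values, not the patent's:

```python
def conv_out(size, kernel, stride=1, pad=0):
    """Spatial output size of a convolution or pooling layer:
    floor((size + 2*pad - kernel) / stride) + 1."""
    return (size + 2 * pad - kernel) // stride + 1

# Assumed example: 28x28 input, 5x5 convolutions, 2x2 max-pooling (stride 2)
s = conv_out(28, kernel=5)            # C1: 28 -> 24
s = conv_out(s, kernel=2, stride=2)   # P1: 24 -> 12
s = conv_out(s, kernel=5)             # C2: 12 -> 8
s = conv_out(s, kernel=2, stride=2)   # P2:  8 -> 4
```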
Step 3: optimize the hyper-parameters of the CNN architecture on the validation set with the periodic-mutation PSO algorithm to obtain a set of hyper-parameter values X_i(g);
Step 3-1: initialize the particle velocities, fitness function values, individual optimal positions P_i and global optimal position P_g; set the iteration number g = 0, the iteration precision δ ≥ 0, the particle search space dimension D = 4 and the number of particles N = 10;
The current position vector of each particle in the population is X_i = (x_i,1, x_i,2, ..., x_i,4), i = 1, 2, ..., 10; the current velocity vector is V_i = (v_i,1, v_i,2, ..., v_i,4), i = 1, 2, ..., 10; the individual optimal position vector is P_i = (p_i,1, p_i,2, ..., p_i,4), i = 1, 2, ..., 10; and the global optimal position vector is P_g = (p_g,1, p_g,2, ..., p_g,4);
Step 3-2: at the g-th iteration, each particle updates its velocity and position:
V_i(g+1) = ωV_i(g) + c_1 r_1 (P_i(g) - X_i(g)) + c_2 r_2 (P_g(g) - X_i(g))   (1)
X_i(g+1) = X_i(g) + V_i(g+1)   (2)
where ω = c_1 = c_2 = 0.5, and r_1 and r_2 are random numbers drawn from the range (0, 1);
Step 3-3: select either the global mutation operator, which changes the positions of all particles in the population, or the local mutation operator, which changes the positions of the elite particles:
[The position-change formulas of the global mutation operator and the local mutation operator are rendered as images in the source and are not reproduced here.]
where A_1 = A_2 = 1, r_3 and r_4 are random numbers in the range (0, 1), the number of new particles generated by the local mutation operator is q = 3, the global mutation frequency is f_1 = 10, and the local mutation frequency is f_2 = 2;
Step 3-4: check whether the velocity and position of each particle are out of range; if so, replace the out-of-range component with the violated boundary value:
if V_i(g) ≤ V_l, then V_i(g) = V_l; if V_i(g) ≥ V_u, then V_i(g) = V_u; if X_i(g) ≤ X_l, then X_i(g) = X_l; if X_i(g) ≥ X_u, then X_i(g) = X_u;
where the particle velocity range (V_l, V_u) is (-2, 2) and the particle position range (X_l, X_u) is as given in table 1;
Step 3-5: the optimal position obtained by executing steps 3-1 to 3-4 is the required hyper-parameter value X_i(g).
Step 4: input the hyper-parameters X_i(g) into the CNN, and train the optimized CNN with the training set obtained in step 1;
Step 5: input the test set obtained in step 1 into the trained CNN to obtain the classification result of the images;
step 6: judging whether the iteration reaches a termination condition;
Step 6-1: calculate the fitness function value of each particle in the periodic-mutation PSO and plot the fitness function curve, as shown in fig. 3:
[The fitness function formula is rendered as an image in the source and is not reproduced here.]
where CNN(X_i(g)) is the accuracy of the classification result obtained in step 5, and X_i(g) is the hyper-parameter value obtained in step 3;
Step 6-2: compare the particle fitness function values of this iteration obtained in step 6-1 to update the individual historical optimal positions P_i(g) and the population optimal position P_g(g), obtaining the optimal particle X_min(g) of this iteration:
[The update formulas are rendered as images in the source and are not reproduced here.]
Step 6-3: judge whether the increase of the optimal particle's fitness value over five consecutive generations is smaller than the threshold ε = 0.0001, whether the position update of the optimal particle in the population over five consecutive generations is smaller than the minimum step length, and whether the iteration number g has reached the maximum iteration number 20; if any one of these termination conditions is satisfied, terminate the iteration.
Step 7: if the termination condition is not reached, execute steps 3 to 6;
Step 8: if the termination condition is reached, obtain the final optimal hyper-parameters, denoted X_min(g);
Step 9: substitute the final optimal hyper-parameters X_min(g) into the CNN, and classify the images of the whole data set to obtain the classification result.
Example 2
As shown in fig. 1, the method of the present embodiment is as follows.
Step 1: the reference dataset object identification cifar-10 dataset was chosen as the dataset to be classified, the segment of the dataset is shown in fig. 4, the dataset has 60000 32 × 32 pixel color images, which contain 10 classes, each of which has 6000 images. 50000 images in the data set are randomly selected as a training set, the remaining 10000 images are used as a test set, and 5000 images in the training set are randomly selected as a verification set.
Steps 2 to 9 are the same as in embodiment 1; the classification performance of the cifar-10 data set before and after the improvement of the hyper-parameter optimization method, as a function of iteration number, is shown in fig. 5.
A comparison of image classification accuracy on the handwritten-digit MNIST data set and the object-recognition cifar-10 data set before and after the improvement of the hyper-parameter optimization method is given in table 2:
TABLE 2 accuracy of CNN image classification for different datasets before and after hyper-parameter optimization
[Table 2 is rendered as an image in the source and is not reproduced here.]
The results show that, without changing the classification CNN architecture, the method of the invention improves image classification accuracy to a certain extent, exploits the image classification potential of the CNN architecture to the fullest, saves hardware resources and computation cost for CNN image classification, and has practical application value in engineering.

Claims (3)

1. An image classification method based on hyper-parameter optimization CNN is characterized by comprising the following steps:
step 1: preprocessing an image data set to be classified, and dividing it proportionally into a training set T, a test set C and a validation set V;
step 2: building a CNN architecture, and selecting a hyper-parameter and a value range thereof according to the structural characteristics of the CNN architecture;
step 3: optimizing the hyper-parameters of the CNN architecture on the validation set with a periodic-mutation PSO algorithm to obtain a set of hyper-parameter values X_i(g);
step 4: inputting the hyper-parameters X_i(g) into the CNN, and training the optimized CNN with the training set obtained in step 1;
step 5: inputting the test set C obtained in step 1 into the trained CNN to obtain the classification result of test set C;
step 6: judging whether the iteration reaches a termination condition;
step 7: if the termination condition is not reached, executing steps 3 to 6 to continue iterating;
step 8: if the termination condition is reached, obtaining the final optimal hyper-parameters, denoted X_min(g);
step 9: substituting the final optimal hyper-parameters X_min(g) into the CNN, and classifying the images of the whole data set to obtain the classification result;
the process of optimizing the hyper-parameters of the CNN architecture by adopting the periodic variation PSO algorithm in the verification set in the step 3 is as follows:
step 3-1: initializing the particle velocities, fitness function values, individual optimal positions P_i and global optimal position P_g; setting the iteration number g = 0, the iteration precision δ ≥ 0, the particle search space dimension D and the number of particles N;
the current position vector of each particle in the population is X_i = (x_i,1, x_i,2, ..., x_i,D), i = 1, 2, ..., N; the current velocity vector is V_i = (v_i,1, v_i,2, ..., v_i,D), i = 1, 2, ..., N; the individual optimal position vector is P_i = (p_i,1, p_i,2, ..., p_i,D), i = 1, 2, ..., N; and the global optimal position vector is P_g = (p_g,1, p_g,2, ..., p_g,D);
Step 3-2: at the g-th iteration, each particle updates its velocity and position:
V_i(g+1) = ωV_i(g) + c_1 r_1 (P_i(g) - X_i(g)) + c_2 r_2 (P_g(g) - X_i(g))   (1)
X_i(g+1) = X_i(g) + V_i(g+1)   (2)
where ω is the inertia factor, c_1 and c_2 are learning factors (all constants), and r_1 and r_2 are random numbers drawn from the range (0, 1);
step 3-3: selecting either the global mutation operator, which changes the positions of all particles in the population, or the local mutation operator, which changes the positions of the elite particles:
[The position-change formulas of the global mutation operator and the local mutation operator are rendered as images in the source and are not reproduced here.]
where A_1 and A_2 are user-defined amplitude factors (constants), r_3 and r_4 are random numbers in the range (0, 1), the elite particles are denoted by a symbol likewise rendered as an image, q is the number of new particles generated by the local mutation operator, f_1 is the global mutation frequency, and f_2 is the local mutation frequency;
step 3-4: checking whether the velocity and position of each particle are out of range, and if so, replacing the out-of-range component with the violated boundary value:
if V_i(g) ≤ V_l, then V_i(g) = V_l; if V_i(g) ≥ V_u, then V_i(g) = V_u; if X_i(g) ≤ X_l, then X_i(g) = X_l; if X_i(g) ≥ X_u, then X_i(g) = X_u;
where (V_l, V_u) is the velocity range of the particles and (X_l, X_u) is their position range;
step 3-5: the optimal position obtained by executing steps 3-1 to 3-4 is the required hyper-parameter value X_i(g);
The process of judging whether the iteration reaches the termination condition in the step 6 is as follows:
step 6-1: calculating the fitness function value of each particle in the periodic-mutation PSO:
[The fitness function formula is rendered as an image in the source and is not reproduced here.]
where CNN(X_i(g)) is the accuracy of the classification result obtained in step 5, and X_i(g) is the hyper-parameter value obtained in step 3;
step 6-2: comparing the particle fitness function values of this iteration obtained in step 6-1 to update the individual historical optimal positions P_i(g) and the population optimal position P_g(g), obtaining the optimal particle X_min(g) of this iteration:
[The update formulas are rendered as images in the source and are not reproduced here.]
step 6-3: judging whether the increase of the optimal particle's fitness value is smaller than a threshold ε, whether the position update of the optimal particle in the population is smaller than a minimum step length, and whether the iteration number g has reached the maximum iteration number g_max; if any one of these termination conditions is satisfied, terminating the iteration.
2. The image classification method based on hyper-parameter-optimized CNN according to claim 1, wherein the validation set V and the training set T in step 1 are extracted from the image data and satisfy |T| = 10|V|.
3. The image classification method based on hyper-parameter-optimized CNN according to claim 1, wherein the CNN architecture built in step 2 comprises convolutional layer C1, pooling layer P1, convolutional layer C2 and pooling layer P2, terminated by Softmax activation; the structural parameters of convolutional layer C1 and pooling layer P1 are selected as the hyper-parameters according to the structural characteristics of the CNN architecture, and the value range of the hyper-parameters is determined as (X_l, X_u).
CN201910671268.0A 2019-07-24 2019-07-24 Image classification method based on hyper-parameter optimization CNN Active CN110399917B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910671268.0A CN110399917B (en) 2019-07-24 2019-07-24 Image classification method based on hyper-parameter optimization CNN

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910671268.0A CN110399917B (en) 2019-07-24 2019-07-24 Image classification method based on hyper-parameter optimization CNN

Publications (2)

Publication Number Publication Date
CN110399917A CN110399917A (en) 2019-11-01
CN110399917B true CN110399917B (en) 2023-04-18

Family

ID=68324921

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910671268.0A Active CN110399917B (en) 2019-07-24 2019-07-24 Image classification method based on hyper-parameter optimization CNN

Country Status (1)

Country Link
CN (1) CN110399917B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110942090B (en) * 2019-11-11 2024-03-29 北京迈格威科技有限公司 Model training method, image processing device, electronic equipment and storage medium
CN111160459A (en) * 2019-12-30 2020-05-15 上海依图网络科技有限公司 Device and method for optimizing hyper-parameters
CN112197876A (en) * 2020-09-27 2021-01-08 中国科学院光电技术研究所 Single far-field type depth learning wavefront restoration method based on four-quadrant discrete phase modulation

Citations (3)

Publication number Priority date Publication date Assignee Title
CN108334949A (en) * 2018-02-11 2018-07-27 浙江工业大学 A kind of tachytelic evolution method of optimization depth convolutional neural networks structure
CN109085469A (en) * 2018-07-31 2018-12-25 中国电力科学研究院有限公司 A kind of method and system of the signal type of the signal of cable local discharge for identification
CN109919202A (en) * 2019-02-18 2019-06-21 新华三技术有限公司合肥分公司 Disaggregated model training method and device

Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
US11003994B2 (en) * 2017-12-13 2021-05-11 Cognizant Technology Solutions U.S. Corporation Evolutionary architectures for evolution of deep neural networks

Patent Citations (3)

Publication number Priority date Publication date Assignee Title
CN108334949A (en) * 2018-02-11 2018-07-27 浙江工业大学 A kind of tachytelic evolution method of optimization depth convolutional neural networks structure
CN109085469A (en) * 2018-07-31 2018-12-25 中国电力科学研究院有限公司 A kind of method and system of the signal type of the signal of cable local discharge for identification
CN109919202A (en) * 2019-02-18 2019-06-21 新华三技术有限公司合肥分公司 Disaggregated model training method and device

Non-Patent Citations (1)

Title
Improved particle-swarm-optimization-based joint feature selection and parameter optimization algorithm for support vector machines; Zhang Jin et al.; Journal of Computer Applications (《计算机应用》); 2016-05-10 (No. 05); full text *

Also Published As

Publication number Publication date
CN110399917A (en) 2019-11-01

Similar Documents

Publication Publication Date Title
CN110399917B (en) Image classification method based on hyper-parameter optimization CNN
CN107392919B (en) Adaptive genetic algorithm-based gray threshold acquisition method and image segmentation method
CN108875933B (en) Over-limit learning machine classification method and system for unsupervised sparse parameter learning
CN110322445B (en) Semantic segmentation method based on maximum prediction and inter-label correlation loss function
CN111882040A (en) Convolutional neural network compression method based on channel number search
CN116503676B (en) Picture classification method and system based on knowledge distillation small sample increment learning
CN109934826A (en) A kind of characteristics of image dividing method based on figure convolutional network
CN109034062B (en) Weak supervision abnormal behavior detection method based on time sequence consistency
CN108985342A (en) A kind of uneven classification method based on depth enhancing study
Barman et al. Transfer learning for small dataset
CN110991621A (en) Method for searching convolutional neural network based on channel number
CN111931801B (en) Dynamic route network learning method based on path diversity and consistency
CN105512675A (en) Memory multi-point crossover gravitational search-based feature selection method
CN114882278A (en) Tire pattern classification method and device based on attention mechanism and transfer learning
CN112348153A (en) Group intelligent optimization acceleration method based on simplex local search
CN113191445A (en) Large-scale image retrieval method based on self-supervision countermeasure Hash algorithm
Wang et al. A novel imperialist competitive algorithm for multithreshold image segmentation
Zhang et al. The WuC-Adam algorithm based on joint improvement of Warmup and cosine annealing algorithms
CN107274357B (en) Gray level image enhancement processing system with optimal parameters
CN114581470B (en) Image edge detection method based on plant community behaviors
CN113592174B (en) Knowledge-driven two-dimensional polygonal cloth piece automatic discharging method
Zhou et al. Effective vision transformer training: A data-centric perspective
Lai et al. A LPSO-SGD algorithm for the Optimization of Convolutional Neural Network
CN113095489A (en) Inductive multi-level graph network representation learning method
Leconte et al. AskewSGD: an annealed interval-constrained optimisation method to train quantized neural networks

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant