CN111222545B - Image classification method based on linear programming incremental learning - Google Patents

Image classification method based on linear programming incremental learning

Info

Publication number
CN111222545B
CN111222545B (application CN201911348984.1A)
Authority
CN
China
Prior art keywords
layer
training set
image
convolutional neural
neural network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201911348984.1A
Other languages
Chinese (zh)
Other versions
CN111222545A (en)
Inventor
白静
员安然
王鼎臣
周华吉
肖竹
张丹
杨韦洁
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xidian University
Original Assignee
Xidian University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xidian University
Priority to CN201911348984.1A
Publication of CN111222545A
Application granted
Publication of CN111222545B
Legal status: Active
Anticipated expiration

Links

Images

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computational Linguistics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses an image classification method based on linear-programming incremental learning, comprising the following steps: constructing a convolutional neural network; generating an initial training set; initially training the convolutional neural network; obtaining the class-average feature vectors of the initial training set; judging whether the class of the image to be classified belongs to the classes of the initial training set, classifying with the convolutional neural network if it does, and executing the next step if it does not; generating an incremental training set; obtaining the class-average feature vector of the incremental training set; solving a weight column vector with a linear programming model; updating the convolutional neural network; and classifying with the convolutional neural network. The method has strong adaptive capacity: a single image is enough to generate the incremental training set, incremental learning requires only few computing resources and little computing time, and classification accuracy is high on both the initial and the incremental training set.

Description

Image classification method based on linear programming incremental learning
Technical Field
The invention relates to the technical field of image processing, and in particular to an image classification method based on linear-programming incremental learning in the field of image classification. The method can be used to classify the main target in an optical image or the ground-object targets in a hyperspectral image.
Background
Image classification is an important field of image processing: it distinguishes different types of objects according to the different characteristics they exhibit in an image. The problem currently faced in this field is that a trained model can only classify images of the classes contained in its training set and cannot correctly classify images of classes outside it; a mechanism for learning classes not contained in the training data requires a large number of images with labelled classes, the labelling work is time-consuming and labour-intensive, and the training consumes large amounts of computing resources and time.
The patent "An image classification training method capable of incremental learning in a big-data scene" (application No. 201710550339.2, publication No. CN107358257B), owned by South China University of Technology, proposes an image classification training method capable of incremental learning in a big-data scene. First, an image classifier is trained on the initial image data; second, when images of a new class appear, incremental training is performed on the initial model to obtain an updated image classifier; finally, the trained incremental image classifier classifies the test data to obtain the classification result. The method can effectively perform incremental learning and classification on images of new classes. However, it still has the following defect: a large number of images with labelled classes are required to form the incremental training set, which costs labour and time.
Xi'an Jiaotong University proposed an automatic incremental learning method for image recognition in the patent document "Automatic incremental learning method for image recognition" (application No. 201810574578.6, publication No. CN108805196A). The method first reads a number of labelled images and trains a pre-trained model, then computes an evaluation criterion alpha and classifies the unlabelled data according to an entropy loss function beta, and finally feeds the newly obtained training data back into the current model for retraining until the iteration is complete. This approach provides a solution for image recognition using a small amount of labelled data and a relatively large amount of unlabelled (or noisy) data. However, the method still has a disadvantage: it involves an iterative process that consumes a large amount of computing resources and time, so it responds slowly to incremental classes.
Disclosure of Invention
The purpose of the invention is to provide an image classification method based on linear-programming incremental learning that addresses the defects of the prior art. The method solves the problems that existing image classification techniques cannot correctly classify images of classes not contained in the training set, and that the incremental learning process requires a large number of labelled images and consumes large amounts of computing resources and time.
To achieve this purpose, the idea of the invention is as follows: the convolutional neural network is divided into a feature extraction module and a classification module; the feature extraction module is kept fixed during incremental training and only the classification module is updated. When the classification module is updated, it is ensured that the updated module correctly classifies both the features of the initial training classes and the features of the incremental training class.
The method comprises the following specific steps:
(1) constructing a convolutional neural network:
(1a) a 10-layer feature extraction module is built, with the following structure in sequence: input layer → first convolutional layer → second convolutional layer → first pooling layer → third convolutional layer → fourth convolutional layer → second pooling layer → Flatten layer → normalization layer → first fully-connected layer;
the parameters of each layer are set as follows: the numbers and sizes of the convolution kernels of the first to fourth convolutional layers and the window sizes and strides of the first and second pooling layers are set accordingly; the first fully-connected layer consists of 512 nodes;
(1b) a classification module consisting of a dot-product layer and an output layer is built; the weight matrix of the dot-product layer has 512 rows and a number of columns equal to the total number of labelled classes of all input images, and the activation function of the output layer is softmax;
(1c) connecting the feature extraction module and the classification module in sequence to form a convolutional neural network;
(2) generating an initial training set:
inputting at least 1000 images with labeled categories, wherein all the images at least comprise 3 labeled categories, preprocessing each input image, and forming an initial training set by all the preprocessed images;
(3) initial training of the convolutional neural network:
inputting the initial training set into a convolutional neural network, updating the weight of each layer of the convolutional neural network by using a gradient descent method until the root mean square error value is reduced to be below 5.0, and obtaining the initially trained convolutional neural network;
(4) obtaining a class-average feature vector of an initial training set:
(4a) inputting each image of the initial training set in turn into the initially trained convolutional neural network, and taking the 512-dimensional vector output by the first fully-connected layer of the feature extraction module as the feature vector of that image;
(4b) averaging each element of the feature vectors of all the images of the same labeling category, and forming a category average feature vector of the labeling category by the average values of all the elements;
(5) judging whether the category of the image to be classified belongs to the category in the initial training set, if so, executing the step (10), otherwise, executing the step (6);
(6) generating an incremental training set:
inputting one or more images with the same labeling category as the images to be classified, preprocessing all the input images in the same way as in the step (2), and forming all the preprocessed images into an incremental training set;
(7) obtaining a class-average feature vector of an incremental training set;
(7a) inputting each image of the incremental training set in turn into the initially trained convolutional neural network, and taking the 512-dimensional vector output by the first fully-connected layer of the feature extraction module as the feature vector of that image;
(7b) averaging each element of the feature vectors of all the images in the incremental training set, and forming the average value of all the elements into a class-average feature vector of the labeling class;
(8) solving the weight column vector by using a linear programming model:
(8a) maximizing the classification score of the class-average feature vector of the incremental training set, subject to constraints that preserve correct classification, using the following model:
max Z = f·W
s.t. f_i·W < f_i·W_j
f·W > f·W_j
where max denotes the maximization operation, Z the objective function, f the class-average feature vector of the incremental training set, · the dot-product operation, W the weight column vector to be solved, and s.t. the constraint conditions; f_i denotes the class-average feature vector of the i-th class in the initial training set, i = 1…n, where n is the total number of labelled classes in the initial training set; W_j denotes the j-th column of the dot-product-layer weight matrix of the classification module in the initially trained convolutional neural network, with j taken equal to i;
(8b) solving the linear model with any existing linear-programming tool software to obtain the weight column vector; in the embodiment of the invention, the Python sklearn library is used to solve the linear model;
(9) updating the convolutional neural network:
(9a) appending the weight column vector obtained in step (8b) as a new column to the dot-product-layer weight matrix of the classification module in the initially trained convolutional neural network, yielding an updated weight matrix; replacing the old weight matrix with the updated one gives the updated classification module;
(9b) sequentially connecting a feature extraction module of the initially trained convolutional neural network with an updated classification module to form an updated convolutional neural network, and then executing the step (10);
(10) classification with convolutional neural networks:
and inputting the image to be classified into a convolutional neural network, and outputting a classification result.
Compared with the prior art, the invention has the following advantages:
First, because the invention builds the linear programming model from the class-average feature vector of the incremental training set, it places no requirement on the minimum number of images in the incremental training set. This overcomes the labour and time cost of the prior art, in which the incremental training set must be formed from a large number of labelled images, so the invention can generate the incremental training set and complete incremental learning from a single labelled image.
Second, obtaining the class-average feature vector of the incremental training set requires only forward propagation, and the incremental step requires solving only one linear programming model. This overcomes the prior-art problems of an iterative, computation-heavy and time-consuming incremental process with slow response to incremental classes, so the invention completes incremental learning with only few computing resources and little computing time.
Drawings
FIG. 1 is a flow chart of the present invention;
FIG. 2 is a simulation diagram of the present invention.
Detailed Description
The present invention is described in further detail below with reference to the attached drawing figures.
The specific implementation steps of the present invention are described in further detail with reference to fig. 1.
Step 1, constructing a convolutional neural network.
A 10-layer feature extraction module is built, with the following structure in sequence: input layer → first convolutional layer → second convolutional layer → first pooling layer → third convolutional layer → fourth convolutional layer → second pooling layer → Flatten layer → normalization layer → first fully-connected layer. The parameters of each layer (the numbers and sizes of the convolution kernels of the four convolutional layers, and the window sizes and strides of the two pooling layers) are set accordingly; the first fully-connected layer consists of 512 nodes.
A classification module consisting of a dot-product layer and an output layer is built; the weight matrix of the dot-product layer has 512 rows and a number of columns equal to the total number of labelled classes of all input images, and the activation function of the output layer is softmax.
The feature extraction module and the classification module are connected in sequence to form the convolutional neural network.
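The following sketch illustrates one way to assemble the two modules in tf.keras (TensorFlow is the platform named in the experimental conditions below). The kernel counts and sizes, the input patch shape and the number of initial classes are illustrative assumptions, since the original parameter listing is not fully recoverable; only the layer order and the 512-node fully-connected layer follow the text.

```python
from tensorflow.keras import layers, models

def build_feature_extractor(input_shape):
    """10-layer feature extraction module: input -> 4 conv layers with 2 pooling layers
    -> Flatten -> normalization -> 512-node fully-connected layer."""
    inputs = layers.Input(shape=input_shape)
    x = layers.Conv2D(32, 3, padding='same', activation='relu')(inputs)  # first convolutional layer
    x = layers.Conv2D(32, 3, padding='same', activation='relu')(x)       # second convolutional layer
    x = layers.MaxPooling2D(pool_size=2)(x)                              # first pooling layer
    x = layers.Conv2D(64, 3, padding='same', activation='relu')(x)       # third convolutional layer
    x = layers.Conv2D(64, 3, padding='same', activation='relu')(x)       # fourth convolutional layer
    x = layers.MaxPooling2D(pool_size=2)(x)                              # second pooling layer
    x = layers.Flatten()(x)                                              # Flatten layer
    x = layers.BatchNormalization()(x)                                   # normalization layer
    features = layers.Dense(512, activation='relu')(x)                   # first fully-connected layer (512 nodes)
    return models.Model(inputs, features, name='feature_extractor')

def build_classifier(num_classes):
    """Classification module: dot-product layer (512 x num_classes weight matrix, no bias) + softmax output."""
    features = layers.Input(shape=(512,))
    logits = layers.Dense(num_classes, use_bias=False)(features)         # dot-product layer
    outputs = layers.Activation('softmax')(logits)                       # output layer
    return models.Model(features, outputs, name='classifier')

# Illustrative instantiation: 11x11 patches with 30 spectral components, 8 initial classes (both assumed).
feature_extractor = build_feature_extractor(input_shape=(11, 11, 30))
classifier = build_classifier(num_classes=8)
cnn = models.Model(feature_extractor.input, classifier(feature_extractor.output), name='cnn')
```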
Step 2, generating an initial training set.
At least 1000 images with labelled classes, covering at least 3 labelled classes, are input and each input image is preprocessed: if the input is an optical image, rotation, cropping, stretching, denoising, brightness change and contrast change are applied in sequence; if the input is a hyperspectral image, principal component analysis (PCA) dimensionality reduction and normalization are applied in sequence. All preprocessed images form the initial training set.
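A minimal sketch of the hyperspectral branch of this preprocessing, assuming per-image PCA over the spectral bands; the number of retained components (30 here) is an assumption, as the text does not state it.

```python
from sklearn.decomposition import PCA

def preprocess_hyperspectral(patch, n_components=30):
    """patch: (H, W, B) hyperspectral image; PCA over the spectral bands followed by
    zero-mean / unit-variance normalization. Returns an (H, W, n_components) array."""
    h, w, b = patch.shape
    pixels = patch.reshape(-1, b)                                   # one row per pixel, one column per band
    reduced = PCA(n_components=n_components).fit_transform(pixels)  # spectral dimensionality reduction
    reduced = (reduced - reduced.mean()) / (reduced.std() + 1e-8)   # simple normalization
    return reduced.reshape(h, w, n_components)
```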
Step 3, initially training the convolutional neural network.
The initial training set is input into the convolutional neural network and the weights of every layer are updated by gradient descent until the root-mean-square error drops below 5.0, giving the initially trained convolutional neural network.
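A hedged sketch of this training step: plain gradient descent (SGD) with a squared-error loss and a callback that stops once the loss falls below the stated 5.0 threshold. The learning rate, batch size, epoch cap and the array names x_train / y_train_onehot are assumptions; `cnn` is the model from the earlier sketch.

```python
import tensorflow as tf

class StopBelowThreshold(tf.keras.callbacks.Callback):
    """Stop training once the monitored loss drops below the threshold (5.0 in the text)."""
    def __init__(self, threshold=5.0):
        super().__init__()
        self.threshold = threshold
    def on_epoch_end(self, epoch, logs=None):
        if logs and logs.get('loss', float('inf')) < self.threshold:
            self.model.stop_training = True

cnn.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=0.01),  # plain gradient descent
            loss='mean_squared_error')                              # squared-error criterion, per the text
cnn.fit(x_train, y_train_onehot,                                    # assumed arrays: images and one-hot labels
        batch_size=32, epochs=200, callbacks=[StopBelowThreshold(5.0)])
```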
Step 4, obtaining the class-average feature vectors of the initial training set.
Each image of the initial training set is input in turn into the initially trained convolutional neural network, and the 512-dimensional vector output by the first fully-connected layer of the feature extraction module is taken as the feature vector of that image.
Each element of the feature vectors of all images with the same labelled class is averaged, and the element-wise averages form the class-average feature vector of that class.
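A short sketch of this per-class averaging. `images_by_class` and `initial_images_by_class` (mappings from each label to an array of preprocessed images) are hypothetical names, and `feature_extractor` is the model from the earlier sketch.

```python
def class_average_features(feature_extractor, images_by_class):
    """Return {label: 512-dimensional class-average feature vector}."""
    means = {}
    for label, images in images_by_class.items():
        feats = feature_extractor.predict(images)   # (num_images, 512) features from the fully-connected layer
        means[label] = feats.mean(axis=0)           # element-wise average over the class
    return means

initial_class_means = class_average_features(feature_extractor, initial_images_by_class)
```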
Step 5, manually judging whether the class of the image to be classified belongs to the classes of the initial training set; if so, step 10 is executed, otherwise step 6.
Step 6, generating an incremental training set.
One or more images with the same labelled class as the image to be classified are input and preprocessed: if the input is an optical image, rotation, cropping, stretching, denoising, brightness change and contrast change are applied in sequence; if the input is a hyperspectral image, principal component analysis (PCA) dimensionality reduction and normalization are applied in sequence. All preprocessed images form the incremental training set.
Step 7, obtaining the class-average feature vector of the incremental training set.
Each image of the incremental training set is input in turn into the initially trained convolutional neural network, and the 512-dimensional vector output by the first fully-connected layer of the feature extraction module is taken as the feature vector of that image.
Each element of the feature vectors of all images in the incremental training set is averaged, and the element-wise averages form the class-average feature vector of the labelled class.
Step 8, solving the weight column vector with a linear programming model.
The classification score of the class-average feature vector of the incremental training set is maximized, subject to constraints that preserve correct classification, using the following model:
max Z = f·W
s.t. f_i·W < f_i·W_j
f·W > f·W_j
where max denotes the maximization operation, Z the objective function, f the class-average feature vector of the incremental training set, · the dot-product operation, W the weight column vector to be solved, and s.t. the constraint conditions; f_i denotes the class-average feature vector of the i-th class in the initial training set, i = 1…n, where n is the total number of labelled classes in the initial training set; W_j denotes the j-th column of the dot-product-layer weight matrix of the classification module in the initially trained convolutional neural network, with j taken equal to i.
The linear model is solved with the Python sklearn library to obtain the weight column vector.
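A sketch of this step using scipy.optimize.linprog as the linear-programming tool (the patent names a Python library; scipy is used here as one standard choice, not necessarily the one in the original). A small margin eps turns the strict inequalities into solvable "<=" constraints and box bounds keep the problem bounded; both are practical assumptions not stated in the text.

```python
import numpy as np
from scipy.optimize import linprog

def solve_new_weight_column(f_new, old_class_means, W_old, eps=1e-3, bound=10.0):
    """Solve max f_new·W  s.t.  f_i·W < f_i·W_i  and  f_new·W > f_new·W_i  for every initial class i.

    f_new:           512-dim class-average feature vector of the incremental class.
    old_class_means: list of 512-dim class-average vectors f_i of the initial classes.
    W_old:           (512, n) dot-product-layer weight matrix; column i belongs to class i.
    """
    dim, _ = W_old.shape
    A_ub, b_ub = [], []
    for i, f_i in enumerate(old_class_means):
        A_ub.append(f_i)                                   #  f_i·W <= f_i·W_i - eps  (old class stays correct)
        b_ub.append(float(f_i @ W_old[:, i]) - eps)
        A_ub.append(-f_new)                                # -f_new·W <= -(f_new·W_i + eps)  (new class outscores)
        b_ub.append(-(float(f_new @ W_old[:, i]) + eps))
    result = linprog(c=-f_new,                             # maximize f_new·W == minimize -f_new·W
                     A_ub=np.array(A_ub), b_ub=np.array(b_ub),
                     bounds=[(-bound, bound)] * dim, method='highs')
    if not result.success:
        raise RuntimeError('linear programme not solved: ' + result.message)
    return result.x
```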
Step 9, updating the convolutional neural network.
The weight column vector obtained in step 8 is appended as a new column to the dot-product-layer weight matrix of the classification module in the initially trained convolutional neural network, yielding an updated weight matrix; replacing the old weight matrix with the updated one gives the updated classification module.
The feature extraction module of the initially trained convolutional neural network is connected in sequence with the updated classification module to form the updated convolutional neural network, and step 10 is then executed.
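A sketch of the update, reusing names from the earlier sketches (feature_extractor, classifier, build_classifier, solve_new_weight_column, initial_class_means and the incremental class-average vector f_new are all assumed from those sketches): the solved column is appended to the old weight matrix, a classifier with one extra output class is rebuilt around it, and the feature extractor is left untouched.

```python
import numpy as np
from tensorflow.keras import models

W_old = classifier.get_layer(index=1).get_weights()[0]           # (512, n) dot-product-layer weight matrix
w_new = solve_new_weight_column(f_new, list(initial_class_means.values()), W_old)

W_updated = np.concatenate([W_old, w_new[:, None]], axis=1)      # append the solved column: (512, n+1)
updated_classifier = build_classifier(num_classes=W_updated.shape[1])
updated_classifier.get_layer(index=1).set_weights([W_updated])   # only the classification module changes
updated_cnn = models.Model(feature_extractor.input,
                           updated_classifier(feature_extractor.output))
```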
Step 10, classifying with the convolutional neural network.
The image to be classified is input into the convolutional neural network, and the classification result is output.
The effect of the invention is further explained with the following simulation experiment.
1. Experimental conditions:
The hardware platform of the simulation experiment is: an NVIDIA GeForce GTX 1080 Ti GPU (PCIe/SSE2); 20 CPU cores at a base frequency of 2.4 GHz; 64 GB of main memory; 20 GB of video memory.
The software platform of the simulation experiment is: the Ubuntu 18.04 LTS operating system with TensorFlow version 1.2.1.
2. Simulation content
The simulation experiment classifies each ground-object image of the input PaviaU hyperspectral dataset of the University of Pavia with the proposed method and with two prior-art methods (the RR method, which co-trains on samples of the new and old classes, and the RIC method, which randomly initializes the weight parameters), and obtains the classification results.
The prior-art RR method, which co-trains on samples of the new and old classes, refers to the image classification method proposed by W. Hu, Y. Huang, W. Li, F. Zhang, and H. Li in the paper "Deep Convolutional Neural Networks for Hyperspectral Image Classification" (Journal of Sensors, vol. 2015, pp. 1-12), abbreviated here as the RR method.
The prior-art RIC method, which randomly initializes the weight parameters, refers to the image classification method set forth by Hang Qi in the paper "Low-Shot Learning with Imprinted Weights" (Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018: 5822-5830), abbreviated here as the RIC method.
The input images used in the simulation experiment are ground-object images extracted from the PaviaU hyperspectral dataset of the University of Pavia, each of size 11 × 11 × 103 and stored in .mat format. The PaviaU dataset consists of hyperspectral data acquired over the University of Pavia, Italy, in 2003 by the German airborne Reflective Optics System Imaging Spectrometer (ROSIS-03). The spectrometer continuously images 115 bands in the 0.43-0.86 μm wavelength range with a spatial resolution of 1.3 m/pixel; 12 bands are discarded because of noise, and the PaviaU dataset uses the images formed by the remaining 103 spectral bands. The scene is 610 × 340 pixels (207,400 pixels in total), of which only 42,776 pixels contain ground objects belonging to 9 classes (asphalt, meadows, gravel, trees, painted metal sheets, bare soil, bitumen, self-blocking bricks and shadows); the remaining pixels are background. An 11 × 11 pixel patch centred on a pixel containing a ground object is taken as one ground-object image.
In the simulation experiment, the training and test sets are selected in the following proportions:
(1) All images of the eight classes asphalt, meadows, gravel, trees, painted metal sheets, bare soil, bitumen and self-blocking bricks are taken from the PaviaU dataset; 5% of the images of each class are used to generate the initial training set and the remaining 95% are used for testing. The number of images of each class in the initial training set and the number used for testing are listed in Table 1.
(2) All images of the shadow class are taken from the PaviaU dataset; one of them is used to generate the incremental training set and all the others are used for testing. The number of images of each class in the incremental training set and the number used for testing are listed in Table 2.
Table 1. Number of images of each class in the initial training set and number of test images
Category                Initial training set    Test images
Asphalt                 33                      6598
Meadows                 93                      18556
Gravel                  10                      2089
Trees                   15                      3049
Painted metal sheets    6                       1339
Bare soil               25                      5004
Bitumen                 6                       1324
Self-blocking bricks    18                      3664
Table 2. Number of images of each class in the incremental training set and number of test images
Category    Incremental training set    Test images
Shadows     1                           946
The effect of the invention is further described with reference to the simulation results in Fig. 2.
Fig. 2(a) is the ground-truth map of the PaviaU hyperspectral dataset of the University of Pavia, with a size of 610 × 340 pixels. Fig. 2(b) shows the result of classifying each ground-object image taken from the PaviaU dataset with the prior-art RR method (co-training on samples of the new and old classes). Fig. 2(c) shows the result obtained with the prior-art RIC method (random initialization of the weight parameters). Fig. 2(d) shows the result obtained with the method of the invention.
To compare the classification performance of the different methods on each class, the classification results of the three methods are evaluated with the per-class classification accuracy. The accuracy of each of the nine classes in the simulation experiment is computed with the per-class accuracy formula (shown as an image in the original document), and the accuracies of the different methods on each class are listed in Table 3:
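A minimal sketch of the per-class accuracy reported in Table 3, assuming the standard definition (correctly classified test images of a class divided by all test images of that class, as a percentage); y_true and y_pred are assumed integer label arrays.

```python
import numpy as np

def per_class_accuracy(y_true, y_pred, label):
    """Percentage of test images of class `label` classified correctly."""
    mask = (y_true == label)
    return 100.0 * float(np.mean(y_pred[mask] == y_true[mask]))
```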
Table 3. Classification accuracy (%) of each method on each class
Category                RR       RIC      Proposed method
Asphalt                 0.00     99.08    95.65
Meadows                 68.05    99.87    98.32
Gravel                  5.16     94.64    93.63
Trees                   0.07     98.66    96.68
Painted metal sheets    0.00     99.53    91.64
Bare soil               0.04     89.96    86.21
Bitumen                 0.00     86.36    82.44
Self-blocking bricks    0.63     98.91    91.48
Shadows                 91.67    0.00     100.00
As can be seen from Fig. 2(b) together with Table 3, the prior-art RR method (co-training on samples of the new and old classes) achieves very low accuracy on every class contained in the initial training set, because during the incremental process it does not inherit any classification capability for the labelled classes of the initial training set and can only learn the labelled class of the incremental training set.
As can be seen from Fig. 2(c) together with Table 3, the prior-art RIC method (random initialization of the weight parameters) achieves high classification accuracy on the classes of the initial training set but low accuracy on the class of the incremental training set, because it fails to learn the incremental class effectively.
As can be seen from Fig. 2(d) together with Table 3, the classification result of the invention is superior to those of the two prior-art methods: it retains the ability to classify the labelled classes of the initial training set and completes incremental learning on an incremental training set generated from a single image.
As can be seen from Fig. 2 together with Table 3, the simulation experiments show that the invention can use the feature extraction module of the initially trained convolutional neural network to extract features of the images in the incremental training set and, while retaining the ability to recognize the images of the initial training set, achieve incremental learning quickly and efficiently, so that it recognizes the classes contained in both the initial and the incremental training sets. The method suits various application scenarios, offers clear advantages when the data are unbalanced or timeliness is critical, and is an efficient and flexible image classification method.

Claims (3)

1. An image classification method based on linear programming incremental learning, characterized by: constructing a convolutional neural network formed by connecting a feature extraction module and a classification module, initially training the convolutional neural network, and classifying with the convolutional neural network; if an image to be classified whose class does not belong to the classes of the initial training set is encountered, building a linear programming model from the class-average features of the initial training set and the incremental training set, solving a weight column vector and updating the classifier; the method comprises the following steps:
(1) constructing a convolutional neural network:
(1a) a 10-layer feature extraction module is built, with the following structure in sequence: input layer → first convolutional layer → second convolutional layer → first pooling layer → third convolutional layer → fourth convolutional layer → second pooling layer → Flatten layer → normalization layer → first fully-connected layer;
the parameters of each layer are set as follows: the numbers and sizes of the convolution kernels of the first to fourth convolutional layers and the window sizes and strides of the first and second pooling layers are set accordingly; the first fully-connected layer consists of 512 nodes;
(1b) a classification module consisting of a dot-product layer and an output layer is built; the weight matrix of the dot-product layer has 512 rows and a number of columns equal to the total number of labelled classes of all input images, and the activation function of the output layer is softmax;
(1c) connecting the feature extraction module and the classification module in sequence to form a convolutional neural network;
(2) generating an initial training set:
inputting at least 1000 images with labeled categories, wherein all the images at least comprise 3 labeled categories, preprocessing each input image, and forming an initial training set by all the preprocessed images;
(3) initial training of the convolutional neural network:
inputting the initial training set into a convolutional neural network, updating the weight of each layer of the convolutional neural network by using a gradient descent method until the root mean square error value is reduced to be below 5.0, and obtaining the initially trained convolutional neural network;
(4) obtaining a class-average feature vector of an initial training set:
(4a) inputting each image of the initial training set in turn into the initially trained convolutional neural network, and taking the 512-dimensional vector output by the first fully-connected layer of the feature extraction module as the feature vector of that image;
(4b) averaging each element of the feature vectors of all the images of the same labeling category, and forming a category average feature vector of the labeling category by the average values of all the elements;
(5) judging whether the category of the image to be classified belongs to the category in the initial training set, if so, executing the step (10), otherwise, executing the step (6);
(6) generating an incremental training set:
inputting one or more images with the same labeling category as the images to be classified, preprocessing all the input images in the same way as in the step (2), and forming all the preprocessed images into an incremental training set;
(7) obtaining a class-average feature vector of an incremental training set;
(7a) inputting each image of the incremental training set in turn into the initially trained convolutional neural network, and taking the 512-dimensional vector output by the first fully-connected layer of the feature extraction module as the feature vector of that image;
(7b) averaging each element of the feature vectors of all the images in the incremental training set, and forming the average value of all the elements into a class-average feature vector of the labeling class;
(8) solving the weight column vector by using a linear programming model:
(8a) maximizing the classification score of the class-average feature vector of the incremental training set, subject to constraints that preserve correct classification, using the following model:
max Z = f·W
s.t. f_i·W < f_i·W_j
f·W > f·W_j
where max denotes the maximization operation, Z the objective function, f the class-average feature vector of the incremental training set, · the dot-product operation, W the weight column vector to be solved, and s.t. the constraint conditions; f_i denotes the class-average feature vector of the i-th class in the initial training set, i = 1…n, where n is the total number of labelled classes in the initial training set; W_j denotes the j-th column of the dot-product-layer weight matrix of the classification module in the initially trained convolutional neural network, with j taken equal to i;
(8b) solving the linear model to obtain a weight column vector;
(9) updating the convolutional neural network:
(9a) appending the weight column vector obtained in step (8b) as a new column to the dot-product-layer weight matrix of the classification module in the initially trained convolutional neural network, yielding an updated weight matrix; replacing the old weight matrix with the updated one gives the updated classification module;
(9b) sequentially connecting a feature extraction module of the initially trained convolutional neural network with an updated classification module to form an updated convolutional neural network, and then executing the step (10);
(10) classification with convolutional neural networks:
and inputting the image to be classified into a convolutional neural network, and outputting a classification result.
2. The image classification method based on linear programming incremental learning of claim 1, characterized in that: the preprocessing in the step (2) and the step (6) is to perform preprocessing of rotating, cutting, stretching, denoising, changing brightness and changing contrast on each image in sequence if the input image is an optical image; and if the input image is a hyperspectral image, sequentially performing Principal Component Analysis (PCA) dimensionality reduction and normalization preprocessing on each image.
3. The image classification method based on linear programming incremental learning of claim 1, characterized in that: the linear model in step (8b) is solved with any linear-programming tool software.
CN201911348984.1A 2019-12-24 2019-12-24 Image classification method based on linear programming incremental learning Active CN111222545B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911348984.1A CN111222545B (en) 2019-12-24 2019-12-24 Image classification method based on linear programming incremental learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911348984.1A CN111222545B (en) 2019-12-24 2019-12-24 Image classification method based on linear programming incremental learning

Publications (2)

Publication Number Publication Date
CN111222545A CN111222545A (en) 2020-06-02
CN111222545B (en) 2022-04-19

Family

ID=70830926

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911348984.1A Active CN111222545B (en) 2019-12-24 2019-12-24 Image classification method based on linear programming incremental learning

Country Status (1)

Country Link
CN (1) CN111222545B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113591913B (en) * 2021-06-28 2024-03-29 河海大学 Picture classification method and device supporting incremental learning
CN113688787A (en) * 2021-09-14 2021-11-23 青岛农业大学 Peanut leaf disease identification method
CN114782960B (en) * 2022-06-22 2022-09-02 深圳思谋信息科技有限公司 Model training method and device, computer equipment and computer readable storage medium
KR20240106276A (en) * 2022-12-29 2024-07-08 연세대학교 산학협력단 Apparatus And Method for Class Incremental Semantic Segmentation Learning based on Feature Vector Storage
CN117710161A (en) * 2024-01-05 2024-03-15 广东聚智诚科技有限公司 Patent value analysis system, method, equipment and medium based on big data technology

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103279768A (en) * 2013-05-31 2013-09-04 北京航空航天大学 Method for identifying faces in videos based on incremental learning of face partitioning visual representations
US8582807B2 (en) * 2010-03-15 2013-11-12 Nec Laboratories America, Inc. Systems and methods for determining personal characteristics
CN104598552A (en) * 2014-12-31 2015-05-06 大连钜正科技有限公司 Method for learning incremental update-supported big data features
CN109492765A (en) * 2018-11-01 2019-03-19 浙江工业大学 A kind of image Increment Learning Algorithm based on migration models

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9536293B2 (en) * 2014-07-30 2017-01-03 Adobe Systems Incorporated Image assessment using deep convolutional neural networks
WO2019092041A1 (en) * 2017-11-08 2019-05-16 AVAST Software s.r.o. Malware classification of executable files by convolutional networks

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8582807B2 (en) * 2010-03-15 2013-11-12 Nec Laboratories America, Inc. Systems and methods for determining personal characteristics
CN103279768A (en) * 2013-05-31 2013-09-04 北京航空航天大学 Method for identifying faces in videos based on incremental learning of face partitioning visual representations
CN104598552A (en) * 2014-12-31 2015-05-06 大连钜正科技有限公司 Method for learning incremental update-supported big data features
CN109492765A (en) * 2018-11-01 2019-03-19 浙江工业大学 A kind of image Increment Learning Algorithm based on migration models

Also Published As

Publication number Publication date
CN111222545A (en) 2020-06-02

Similar Documents

Publication Publication Date Title
CN110516596B (en) Octave convolution-based spatial spectrum attention hyperspectral image classification method
CN111222545B (en) Image classification method based on linear programming incremental learning
CN108388927B (en) Small sample polarization SAR terrain classification method based on deep convolution twin network
CN110334765B (en) Remote sensing image classification method based on attention mechanism multi-scale deep learning
CN112052755B (en) Semantic convolution hyperspectral image classification method based on multipath attention mechanism
CN106203523B (en) The hyperspectral image classification method of the semi-supervised algorithm fusion of decision tree is promoted based on gradient
CN110533631B (en) SAR image change detection method based on pyramid pooling twin network
CN111914728B (en) Hyperspectral remote sensing image semi-supervised classification method and device and storage medium
Rahaman et al. An efficient multilevel thresholding based satellite image segmentation approach using a new adaptive cuckoo search algorithm
CN109684922B (en) Multi-model finished dish identification method based on convolutional neural network
CN108460391B (en) Hyperspectral image unsupervised feature extraction method based on generation countermeasure network
CN110070008A (en) Bridge disease identification method adopting unmanned aerial vehicle image
CN106503739A (en) The target in hyperspectral remotely sensed image svm classifier method and system of combined spectral and textural characteristics
CN110956187A (en) Unmanned aerial vehicle image plant canopy information extraction method based on ensemble learning
CN107832797B (en) Multispectral image classification method based on depth fusion residual error network
CN108229551B (en) Hyperspectral remote sensing image classification method based on compact dictionary sparse representation
CN109598306A (en) Hyperspectral image classification method based on SRCM and convolutional neural networks
CN111160396B (en) Hyperspectral image classification method of graph convolution network based on multi-graph structure
CN114463637B (en) Winter wheat remote sensing identification analysis method and system based on deep learning
CN114972885B (en) Multi-mode remote sensing image classification method based on model compression
CN112766161B (en) Hyperspectral target detection method based on integrated constraint multi-example learning
CN107895136A (en) A kind of colliery area recognizing method and system
CN108256557B (en) Hyperspectral image classification method combining deep learning and neighborhood integration
CN116863345A (en) High-resolution image farmland recognition method based on dual attention and scale fusion
CN111738052A (en) Multi-feature fusion hyperspectral remote sensing ground object classification method based on deep learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant