CN111815529A - Low-quality image classification enhancement method based on model fusion and data enhancement - Google Patents


Publication number
CN111815529A
Authority
CN
China
Prior art keywords
image
classified
neural network
probability vector
enhancement
Prior art date
Legal status
Granted
Application number
CN202010607913.5A
Other languages
Chinese (zh)
Other versions
CN111815529B (en)
Inventor
王道累
张天宇
朱瑞
孙嘉珺
李明山
李超
韩清鹏
袁斌霞
Current Assignee
Shanghai Electric Power University
Original Assignee
Shanghai Electric Power University
Priority date
Filing date
Publication date
Application filed by Shanghai Electric Power University filed Critical Shanghai Electric Power University
Priority to CN202010607913.5A priority Critical patent/CN111815529B/en
Publication of CN111815529A publication Critical patent/CN111815529A/en
Application granted granted Critical
Publication of CN111815529B publication Critical patent/CN111815529B/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00 Image enhancement or restoration
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G06F18/254 Fusion techniques of classification results, e.g. of results related to same input data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20081 Training; Learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20084 Artificial neural networks [ANN]

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Biophysics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Probability & Statistics with Applications (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The invention relates to a low-quality image classification enhancement method based on model fusion and data enhancement, comprising the following steps. S1: establish an image set; S2: perform data enhancement on the image set; S3: construct and train a VGG16 convolutional neural network model; S4: construct and train a ResNet convolutional neural network model; S5: input an image to be classified; S6: obtain the first and second probability vectors of the image to be classified; S7: obtain the fused probability vector and, from it, the image type of the image to be classified; if the type is sharp, go to step S9, otherwise go to step S8; S8: enhance the image to be classified to obtain an enhanced image, and feed the enhanced image back into step S6 as the image to be classified; S9: output the image. Compared with the prior art, the invention improves classification accuracy through both data enhancement and model fusion, classifies images effectively, and achieves a good enhancement effect.

Description

Low-quality image classification enhancement method based on model fusion and data enhancement
Technical Field
The invention relates to a low-quality image classification enhancement method, in particular to a low-quality image classification enhancement method based on model fusion and data enhancement.
Background
When a camera takes a picture, poor lighting or camera shake often produces a low-quality image: shadows, low brightness or blur cause image details to be lost, making subsequent operations such as recognition and analysis difficult.
Chinese patent CN201610079472.X discloses a low-quality image enhancement method for extreme weather conditions, proposing a classification enhancement method that distinguishes haze images from rain and snow images according to chromaticity component values. That method uses a purely physical algorithm, can only distinguish a few image types, has poor robustness, and is prone to misjudgment. Chinese patent CN201811484514.3 discloses an imaging identification method and system for severe weather, proposing to first enhance the low-quality images collected in severe weather and then identify them: an electronic image-stabilization algorithm removes motion blur, or an adaptive defogging algorithm removes cloud and fog interference, after which a VGG16 convolutional neural network identifies and classifies the images. However, that invention does not explain how to determine whether an acquired image is of low quality, and it applies a single enhancement algorithm to all categories of low-quality images, so the enhancement effect is mediocre.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provide a low-quality image classification enhancement method based on model fusion and data enhancement.
The purpose of the invention can be realized by the following technical scheme:
a low-quality image classification enhancement method based on model fusion and data enhancement comprises the following steps:
S1: establishing an image set, wherein the image set comprises sharp images, blurred images and low-brightness images;
S2: performing data enhancement on the images in the image set;
S3: constructing a VGG16 convolutional neural network model, and training it on the image set;
S4: constructing a ResNet convolutional neural network model, and training it on the image set;
S5: inputting an image to be classified;
S6: inputting the image to be classified into the VGG16 convolutional neural network model to obtain its first probability vector, and into the ResNet convolutional neural network model to obtain its second probability vector;
S7: fusing the first probability vector and the second probability vector into a fused probability vector, and obtaining from it the image type of the image to be classified; if the image type is sharp, go to step S9, otherwise go to step S8;
S8: selecting the image enhancement algorithm corresponding to the image type, enhancing the image to be classified to obtain an enhanced image, and feeding the enhanced image back into step S6 as the image to be classified;
S9: outputting the image.
Preferably, the first probability vector is:
[λa1, λa2, λa3]
where λa1, λa2, λa3 are the probabilities, given by the VGG16 convolutional neural network model, that the image is a sharp image, a blurred image or a low-brightness image, respectively.
The second probability vector is:
[λb1, λb2, λb3]
where λb1, λb2, λb3 are the corresponding probabilities given by the ResNet convolutional neural network model.
The fused probability vector is:
[λa1+λb1, λa2+λb2, λa3+λb3]
The image type corresponding to the maximum value in the fused probability vector is the image type of the image to be classified.
Preferably, the classifier of the VGG16 convolutional neural network model comprises two fully connected layers, the second of which uses a Softmax activation function and maps the values from the first fully connected layer into the interval (0, 1) to obtain the first probability vector.
Preferably, the classifier of the ResNet convolutional neural network model comprises two fully connected layers, the second of which uses a Softmax activation function and maps the values from the first fully connected layer into the interval (0, 1) to obtain the second probability vector.
Preferably, the data enhancement randomly applies one or more of flipping, rotating, scaling, cropping, shifting, adding noise and modifying contrast to the image data.
Preferably, the data enhancement in step S2 applies horizontal rotation and vertical rotation to the image data, and step S2 specifically comprises:
S21: taking the lower-left vertex of the image as the origin O, establishing an xyz spatial coordinate system in which the image, of size a × b, lies in the xy plane;
S22: randomly deciding whether to rotate the image horizontally, with probability 0.5 either way; if so, rotating the image 180 degrees about the axis (x = a/2, z = 0); then proceeding to step S23;
S23: randomly deciding whether to rotate the image vertically, with probability 0.5 either way; if so, rotating the image 180 degrees about the axis (y = b/2, z = 0) and finishing the data enhancement; if not, finishing the data enhancement.
Preferably, when the number of enhancements applied to the image is greater than or equal to the maximum number of enhancements, the image is output:
S5: inputting an image to be classified, and setting the enhancement count p = 0;
S6: inputting the image to be classified into the VGG16 convolutional neural network model to obtain its first probability vector, and into the ResNet convolutional neural network model to obtain its second probability vector;
S7: fusing the first probability vector and the second probability vector into a fused probability vector, and obtaining from it the image type of the image to be classified; if the image type is sharp, go to step S9, otherwise go to step S8;
S8: selecting the image enhancement algorithm corresponding to the image type, enhancing the image to be classified to obtain an enhanced image, and setting p = p + 1; if p is greater than or equal to the maximum number of enhancements, go to step S9; otherwise feed the enhanced image back into step S6 as the image to be classified;
S9: outputting the image.
Preferably, the VGG16 convolutional neural network model comprises 13 convolutional layers, 3 fully-connected layers and 5 pooling layers.
Preferably, in step S8, images whose type is blurred image are enhanced using a GAN-based blind motion deblurring algorithm.
Preferably, in step S8, images whose type is low-brightness image are enhanced using a low-brightness image enhancement algorithm based on the camera response model.
Compared with the prior art, the invention has the following advantages:
(1) the invention classifies images with convolutional neural networks, which is more accurate than traditional physics-based classifiers, and further improves classification accuracy through both data enhancement and model fusion;
(2) the invention integrates two algorithms specialized for enhancing the two types of degraded images, blurred images and low-brightness images, so that images can be effectively classified and their detail information restored;
(3) to handle images that exhibit both types of degradation at once, the method sets a maximum number of enhancements and classifies and enhances an image several times, maximizing the enhancement effect;
(4) the random image data enhancement scheme applies horizontal-rotation and vertical-rotation data enhancement to the image data, effectively improving classification accuracy.
Drawings
FIG. 1 is a flow chart of the present invention;
FIG. 2 is a flow chart of an embodiment of the present invention;
FIG. 3 is a block diagram of the VGG16 model;
FIG. 4 is a block diagram of the VGG16 model;
FIG. 5 is a block diagram of a residual unit;
FIG. 6 is a block diagram of the ResNet model;
FIG. 7 is a structural diagram of the DeblurGAN model.
Detailed Description
The invention is described in detail below with reference to the figures and specific embodiments. Note that the following embodiments are merely illustrative examples; the invention is not limited to the applications or uses described, nor to the following embodiments.
Examples
A low-quality image classification enhancement method based on model fusion and data enhancement is disclosed, as shown in FIG. 1, and comprises the following steps:
s1: an image set is established.
The image set of the present embodiment includes three types of images: sharp images, blurred images and low-brightness images. It is divided into a training set, a test set and a validation set, with 300 images of each type in the training set, 50 of each type in the test set and 50 of each type in the validation set.
S2: data enhancement is performed on the images in the image set.
Data enhancement randomly applies one or more of flipping, rotating, scaling, cropping, shifting, adding noise and modifying contrast to the image data.
In this embodiment, the data enhancement in step S2 applies horizontal rotation and vertical rotation to the image data, and step S2 specifically comprises:
S21: taking the lower-left vertex of the image as the origin O, establishing an xyz spatial coordinate system in which the image, of size a × b, lies in the xy plane;
S22: randomly deciding whether to rotate the image horizontally, with probability 0.5 either way; if so, rotating the image 180 degrees about the axis (x = a/2, z = 0); then proceeding to step S23;
S23: randomly deciding whether to rotate the image vertically, with probability 0.5 either way; if so, rotating the image 180 degrees about the axis (y = b/2, z = 0) and finishing the data enhancement; if not, finishing the data enhancement.
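As a concrete illustration, the random flipping of steps S21–S23 can be sketched in a few lines of Python (numpy assumed): a 180-degree rotation about the vertical axis x = a/2 is a horizontal mirror, and about the horizontal axis y = b/2 a vertical mirror.

```python
import numpy as np

def random_flip(image: np.ndarray, p: float = 0.5, rng=None) -> np.ndarray:
    """Steps S21-S23: independently mirror an H x W x C image with probability p."""
    if rng is None:
        rng = np.random.default_rng()
    if rng.random() < p:   # S22: 180-degree rotation about x = a/2 (horizontal mirror)
        image = image[:, ::-1, :]
    if rng.random() < p:   # S23: 180-degree rotation about y = b/2 (vertical mirror)
        image = image[::-1, :, :]
    return image
```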
S3: a VGG16 convolutional neural network model is constructed and trained on the image set.
As shown in fig. 3 and 4, the VGG16 model structure comprises 13 convolutional layers (Convolutional Layer), 3 fully connected layers (Fully Connected Layer) and 5 pooling layers (Pooling Layer). The convolutional and pooling layers are grouped into blocks, numbered Block1–Block5 from front to back, each containing several convolutional layers and one pooling layer; for example, Block4 contains 3 convolutional layers (Conv3-512) and 1 pooling layer (Maxpool). Within the same block, all convolutional layers have the same number of channels, for example: Block2 contains 2 convolutional layers, each denoted Conv3-128, i.e. a 3 × 3 convolution kernel with 128 channels; Block3 contains 3 convolutional layers, each denoted Conv3-256, i.e. a 3 × 3 convolution kernel with 256 channels. The input of the VGG16 model is 224 × 224 × 3. Going through the blocks, the number of channels doubles from 64 to 128 to 256 and finally to 512, after which it stays constant, while the height and width of the feature map are halved at each stage: 224 → 112 → 56 → 28 → 14 → 7.
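A minimal sketch of such a model, assuming PyTorch/torchvision (the patent names no framework) and following the two-fully-connected-layer classifier head of claim 3; the hidden width of 256 is an illustrative assumption:

```python
import torch
import torch.nn as nn
from torchvision.models import vgg16  # recent torchvision (>= 0.13) assumed

class VGG16Classifier(nn.Module):
    """VGG16 feature extractor with a two-layer Softmax head for 3 classes."""
    def __init__(self, num_classes: int = 3, hidden: int = 256):
        super().__init__()
        self.features = vgg16(weights=None).features  # 13 conv + 5 pooling layers
        self.head = nn.Sequential(
            nn.Flatten(),
            nn.Linear(512 * 7 * 7, hidden),   # first fully connected layer
            nn.ReLU(inplace=True),
            nn.Linear(hidden, num_classes),   # second fully connected layer
            nn.Softmax(dim=1),                # maps outputs into (0, 1), summing to 1
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.head(self.features(x))    # x: (N, 3, 224, 224)
```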
S4: a ResNet convolutional neural network model is constructed and trained on the image set.
The ResNet convolutional neural network solves the degradation problem of deep networks through residual learning: as network depth increases, accuracy saturates and then even decreases. As shown in fig. 5 and 6, the structure of the ResNet model is based on the VGG19 structure, modified by adding residual units through a shortcut mechanism.
The residual unit has two layers. The first layer computes:
F(x) = W2·σ(W1·x)
where σ denotes the nonlinear activation function ReLU, and W1 and W2 are the two weight matrices. The first layer is then connected to the second through a shortcut, and the ReLU of the second layer yields the output y:
y = F(x) + x
where F(x) is the output of the residual branch.
ResNet's main distinguishing features are that downsampling is performed directly by convolutions with a stride of 2, and that a global average pooling layer replaces the fully connected layers. Compared with a plain network, the ResNet network adds a shortcut connection between every two layers, forming residual learning.
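The two-layer residual unit above can be sketched as follows (PyTorch assumed; the kernel size and channel count are illustrative):

```python
import torch
import torch.nn as nn

class ResidualUnit(nn.Module):
    """y = ReLU(F(x) + x) with F(x) = W2·σ(W1·x), as in the formulas above."""
    def __init__(self, channels: int):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        f = self.conv2(self.relu(self.conv1(x)))  # residual branch F(x)
        return self.relu(f + x)                   # shortcut addition, then ReLU
```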
S5: an image to be classified is input.
S6: the image to be classified is input into the VGG16 convolutional neural network model to obtain its first probability vector, and into the ResNet convolutional neural network model to obtain its second probability vector.
S7: the first probability vector and the second probability vector are fused into a fused probability vector, from which the image type of the image to be classified is obtained; if the image type is sharp, go to step S9, otherwise go to step S8.
The invention fuses a VGG16 convolutional neural network model and a ResNet convolutional neural network model. Model fusion directly adds the probability vectors predicted by two or more base models, and takes the class with the highest summed probability as the predicted class of the image.
In this embodiment, the probability vector is represented in one-hot encoded form. One-hot encoding is a common class-label encoding in neural networks: it converts an integer-valued class label into a binary vector in which the index of the label value is set to 1 and all other indices are 0. In this embodiment there are 3 class labels, for the sharp image, the blurred image and the low-brightness image, corresponding to label values 0, 1 and 2 respectively. After one-hot encoding, label value 0 becomes [1,0,0], label value 1 becomes [0,1,0], and label value 2 becomes [0,0,1].
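For instance, in numpy the three one-hot vectors are simply the rows of a 3 × 3 identity matrix:

```python
import numpy as np

labels = np.array([0, 1, 2])            # sharp, blurred, low-brightness
one_hot = np.eye(3, dtype=int)[labels]  # [[1 0 0], [0 1 0], [0 0 1]]
```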
In this embodiment, each of the two model classifiers comprises two fully connected layers; the second fully connected layer uses a Softmax activation function and maps the values from the first fully connected layer into the interval (0, 1), yielding the first and second probability vectors. Specifically, the Softmax function maps the outputs of the first fully connected layer into (0, 1), forming a probability vector such as [λ1, λ2, λ3], where λi is the probability of each class and the probabilities sum to 1; the class with the largest λi is usually taken as the predicted class.
The Softmax function is:
Softmax(Zi) = e^{Zi} / Σ_{c=1}^{C} e^{Zc}
where Zi is the output value of the i-th node and C is the number of output nodes, i.e. the number of classes.
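A direct numpy rendering of this formula (the max-subtraction is a standard numerical-stability trick that leaves the result unchanged):

```python
import numpy as np

def softmax(z: np.ndarray) -> np.ndarray:
    """Softmax(Z_i) = e^{Z_i} / sum_c e^{Z_c} over the C output nodes."""
    e = np.exp(z - np.max(z))
    return e / e.sum()

# e.g. raw outputs of the second fully connected layer for the 3 classes
print(softmax(np.array([2.0, 0.5, -1.0])))  # probabilities summing to 1
```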
In this embodiment, the pre-trained VGG16 convolutional neural network model and ResNet convolutional neural network model output the first probability vector [λa1, λa2, λa3] and the second probability vector [λb1, λb2, λb3], respectively.
Here λa1, λa2, λa3 are the probabilities given by the VGG16 convolutional neural network model that the image is a sharp image, a blurred image or a low-brightness image, and λb1, λb2, λb3 are the corresponding probabilities given by the ResNet convolutional neural network model.
Adding the two probability vectors gives the fused probability vector:
[λa1+λb1, λa2+λb2, λa3+λb3]
The image type corresponding to the maximum value in the fused probability vector is the image type of the image to be classified.
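The fusion rule therefore reduces to an element-wise sum and an argmax; a minimal sketch with hypothetical model outputs:

```python
import numpy as np

CLASSES = ["sharp", "blurred", "low-brightness"]

def fuse_and_classify(p_vgg: np.ndarray, p_resnet: np.ndarray) -> str:
    fused = p_vgg + p_resnet            # [λa1+λb1, λa2+λb2, λa3+λb3]
    return CLASSES[int(np.argmax(fused))]

# hypothetical probability vectors for one image:
print(fuse_and_classify(np.array([0.2, 0.7, 0.1]),
                        np.array([0.4, 0.5, 0.1])))  # -> "blurred"
```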
S8: the image enhancement algorithm corresponding to the image type is selected, the image to be classified is enhanced to obtain an enhanced image, and the enhanced image is fed back into step S6 as the image to be classified.
Specifically, images whose type is blurred image are enhanced using a GAN-based blind motion deblurring algorithm, and images whose type is low-brightness image are enhanced using a low-brightness image enhancement algorithm based on the camera response model.
Further, the GAN-based blind motion deblurring algorithm:
The algorithm treats image deblurring as an end-to-end task for a GAN: by learning from pairs of blurred and sharp images, a generator automatically produces a sharp image from a blurred input. Blind deblurring here aims to recover the sharp image IS from a given blurred image IB without knowledge of the blur kernel; the deblurring is performed by the generator, and a discriminator network is introduced during training so that learning proceeds adversarially.
As shown in fig. 7, the generator contains two downsampling convolution modules, 9 residual modules (each containing a convolution, instance normalization (IN) and ReLU) and two upsampling transposed-convolution modules, and also introduces a global residual connection. This structure allows faster training and, at the same time, better performance. Besides the generator, the algorithm defines a discriminator used during training and adopts WGAN (Wasserstein GAN) with a gradient penalty term for adversarial training.
The loss function of the algorithm consists of two parts, content loss and adversarial loss:
L = LGAN + λ·LX
where LGAN is the adversarial loss, LX is the content loss, and λ is a weight.
For the adversarial loss, WGAN-GP is used, and the generator's adversarial loss is calculated as:
LGAN = Σ_{n=1}^{N} −D(G(IB))
where IB is the input blurred image, G is the generator, and D is the discriminator (critic) network.
The content loss is a perceptual loss, based on the difference between the CNN feature maps of the generated image and the target image:
LX = (1 / (Wi,j·Hi,j)) · Σ_{x=1}^{Wi,j} Σ_{y=1}^{Hi,j} (φi,j(IS)x,y − φi,j(G(IB))x,y)²
where IS is the sharp target image, IB is the input blurred image, G is the generator, φi,j is the feature map obtained from the j-th convolution (after activation) before the i-th max-pooling layer of a VGG19 network, and Wi,j and Hi,j are the dimensions of that feature map.
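A sketch of this combined loss, assuming PyTorch and the conv3_3 feature map of VGG19 (a common choice for DeblurGAN-style perceptual loss; the layer index, the weight λ = 100, and the omission of the gradient-penalty term and of input normalization are simplifying assumptions here):

```python
import torch
import torch.nn as nn
from torchvision.models import vgg19  # recent torchvision assumed

# frozen VGG19 feature extractor up to the conv3_3 activation
phi = vgg19(weights="IMAGENET1K_V1").features[:16].eval()
for p in phi.parameters():
    p.requires_grad_(False)

def generator_loss(generated: torch.Tensor, sharp: torch.Tensor,
                   critic: nn.Module, lam: float = 100.0) -> torch.Tensor:
    """L = L_GAN + lambda * L_X for the generator; critic is the WGAN-GP discriminator."""
    l_content = nn.functional.mse_loss(phi(generated), phi(sharp))  # perceptual loss L_X
    l_adv = -critic(generated).mean()                               # WGAN generator loss
    return l_adv + lam * l_content
```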
Further, the low-brightness picture enhancement algorithm based on the camera response model:
For a picture taken by a camera, the pixel value is not proportional to the brightness reflected by the object. The nonlinear transformation from the luminance sensed by the camera's photosensitive element to the actual pixel value of the image is called the Camera Response Function (CRF). The algorithm first fits a camera response model by analysing the relationship between pictures of different exposure, then obtains an exposure-ratio map of the image using a brightness-component estimation method, and finally enhances the low-brightness picture using the camera response model and the exposure-ratio map.
The camera response model is defined as:
P = f(E)
where E is the irradiance of the picture and P is the pixel value of the picture; f should satisfy, together with the brightness transform function g introduced below, the relation g(f(E), k) = f(kE).
the algorithm uses a luminance Transform Function (BTF) to estimate f.
The BTF is the mapping function between two pictures of the same scene taken at different exposures:
P1 = g(P0, k)
where P1 and P0 are images of the same scene with different exposures, and k is the exposure ratio.
The camera response model (CRM) can therefore be calculated from g(f(E), k) = f(kE).
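A sketch of BTF-based brightening under the widely used beta-gamma camera response model, where g(P, k) = β·P^γ with γ = k^a and β = e^{b(1−k^a)}; the constants a and b below are the fitted values commonly reported for this model family, an assumption rather than values taken from the patent:

```python
import numpy as np

A, B = -0.3293, 1.1258  # assumed beta-gamma CRM constants

def btf_enhance(p0: np.ndarray, k: float) -> np.ndarray:
    """Map image p0 (float, range [0, 1]) to its k-times-exposed version P1 = g(P0, k)."""
    gamma = k ** A
    beta = np.exp(B * (1.0 - gamma))
    return np.clip(beta * np.power(p0, gamma), 0.0, 1.0)

# e.g. simulate a 4x longer exposure of a dark image
dark = np.random.default_rng(0).random((8, 8, 3)) * 0.2
bright = btf_enhance(dark, k=4.0)
```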
S9: the image is output.
The experimental environment of this example is: Windows 10, a Tesla P100 graphics card, and the deep learning framework TensorFlow 2.0. The accuracies of the VGG16 model, the ResNet model and the fused model are shown in the following table; the accuracy of the fused model is higher than that of either single model.
[Table: classification accuracies of VGG16, ResNet and the fused model; the values appear only as an image in the source.]
In an embodiment of the present invention, to handle an image that still shows blur or low brightness after being processed by one of the two enhancement algorithms (for instance because both degradations occur in the same image), a maximum number of enhancements is set, as shown in fig. 2; when the number of enhancements of the image is greater than or equal to this maximum, the image is output:
S1: establishing an image set, wherein the image set comprises sharp images, blurred images and low-brightness images;
S2: performing data enhancement on the images in the image set;
S3: constructing a VGG16 convolutional neural network model, and training it on the image set;
S4: constructing a ResNet convolutional neural network model, and training it on the image set;
S5: inputting an image to be classified, and setting the enhancement count p = 0;
S6: inputting the image to be classified into the VGG16 convolutional neural network model to obtain its first probability vector, and into the ResNet convolutional neural network model to obtain its second probability vector;
S7: fusing the first probability vector and the second probability vector into a fused probability vector, and obtaining from it the image type of the image to be classified; if the image type is sharp, go to step S9, otherwise go to step S8;
S8: selecting the image enhancement algorithm corresponding to the image type, enhancing the image to be classified to obtain an enhanced image, and setting p = p + 1; if p is greater than or equal to the maximum number of enhancements, go to step S9; otherwise feed the enhanced image back into step S6 as the image to be classified;
S9: outputting the image.
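Putting steps S5–S9 together, the bounded classify-then-enhance loop looks roughly as follows; classify, deblur and brighten are hypothetical stand-ins for the fused-model prediction and the two enhancement algorithms described above:

```python
def classify(img):    # placeholder: fused VGG16 + ResNet prediction (S6-S7)
    return "sharp"

def deblur(img):      # placeholder: GAN-based blind motion deblurring (S8)
    return img

def brighten(img):    # placeholder: CRM-based low-brightness enhancement (S8)
    return img

def classify_and_enhance(img, max_enhancements: int = 3):
    p = 0                                    # S5: enhancement counter
    while True:
        label = classify(img)                # S6 + S7: fused image type
        if label == "sharp":
            return img                       # S7 -> S9: sharp images are output directly
        img = deblur(img) if label == "blurred" else brighten(img)  # S8
        p += 1
        if p >= max_enhancements:            # S8: stop at the maximum count
            return img                       # S9
```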
The above embodiments are merely examples and do not limit the scope of the present invention. They may be implemented in various other forms, and various omissions, substitutions and changes may be made without departing from the technical spirit of the present invention.

Claims (10)

1. A low-quality image classification enhancement method based on model fusion and data enhancement, characterized by comprising the following steps:
S1: establishing an image set, wherein the image set comprises sharp images, blurred images and low-brightness images;
S2: performing data enhancement on the images in the image set;
S3: constructing a VGG16 convolutional neural network model, and training it on the image set;
S4: constructing a ResNet convolutional neural network model, and training it on the image set;
S5: inputting an image to be classified;
S6: inputting the image to be classified into the VGG16 convolutional neural network model to obtain its first probability vector, and into the ResNet convolutional neural network model to obtain its second probability vector;
S7: fusing the first probability vector and the second probability vector into a fused probability vector, and obtaining from it the image type of the image to be classified; if the image type is sharp, go to step S9, otherwise go to step S8;
S8: selecting the image enhancement algorithm corresponding to the image type, enhancing the image to be classified to obtain an enhanced image, and feeding the enhanced image back into step S6 as the image to be classified;
S9: outputting the image.
2. The method of claim 1, wherein the first probability vector is:
[λa1, λa2, λa3]
where λa1, λa2, λa3 are the probabilities, given by the VGG16 convolutional neural network model, that the image is a sharp image, a blurred image or a low-brightness image, respectively;
the second probability vector is:
[λb1, λb2, λb3]
where λb1, λb2, λb3 are the corresponding probabilities given by the ResNet convolutional neural network model;
the fused probability vector is:
[λa1+λb1, λa2+λb2, λa3+λb3]
and the image type corresponding to the maximum value in the fused probability vector is the image type of the image to be classified.
3. The method of claim 1, wherein the classifier of the VGG16 convolutional neural network model comprises two fully connected layers, the second of which uses a Softmax activation function and maps the values from the first fully connected layer into the interval (0, 1) to obtain the first probability vector.
4. The method of claim 1, wherein the classifier of the ResNet convolutional neural network model comprises two fully connected layers, the second of which uses a Softmax activation function and maps the values from the first fully connected layer into the interval (0, 1) to obtain the second probability vector.
5. The method of claim 1, wherein the data enhancement randomly applies one or more of flipping, rotating, scaling, cropping, shifting, adding noise and modifying contrast to the image data.
6. The method of claim 5, wherein the data enhancement in step S2 applies horizontal rotation and vertical rotation to the image data, and step S2 specifically comprises:
S21: taking the lower-left vertex of the image as the origin O, establishing an xyz spatial coordinate system in which the image, of size a × b, lies in the xy plane;
S22: randomly deciding whether to rotate the image horizontally, with probability 0.5 either way; if so, rotating the image 180 degrees about the axis (x = a/2, z = 0); then proceeding to step S23;
S23: randomly deciding whether to rotate the image vertically, with probability 0.5 either way; if so, rotating the image 180 degrees about the axis (y = b/2, z = 0) and finishing the data enhancement; if not, finishing the data enhancement.
7. The method of claim 1, wherein when the number of enhancements applied to the image is greater than or equal to the maximum number of enhancements, the image is output:
S5: inputting an image to be classified, and setting the enhancement count p = 0;
S6: inputting the image to be classified into the VGG16 convolutional neural network model to obtain its first probability vector, and into the ResNet convolutional neural network model to obtain its second probability vector;
S7: fusing the first probability vector and the second probability vector into a fused probability vector, and obtaining from it the image type of the image to be classified; if the image type is sharp, go to step S9, otherwise go to step S8;
S8: selecting the image enhancement algorithm corresponding to the image type, enhancing the image to be classified to obtain an enhanced image, and setting p = p + 1; if p is greater than or equal to the maximum number of enhancements, go to step S9; otherwise feed the enhanced image back into step S6 as the image to be classified;
S9: outputting the image.
8. The method of claim 1, wherein the VGG16 convolutional neural network model comprises 13 convolutional layers, 3 fully-connected layers and 5 pooling layers.
9. The method of claim 1, wherein in step S8 images whose type is blurred image are enhanced using a GAN-based blind motion deblurring algorithm.
10. The method of claim 1, wherein in step S8 images whose type is low-brightness image are enhanced using a low-brightness image enhancement algorithm based on the camera response model.
CN202010607913.5A 2020-06-30 2020-06-30 Low-quality image classification enhancement method based on model fusion and data enhancement Active CN111815529B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010607913.5A CN111815529B (en) 2020-06-30 2020-06-30 Low-quality image classification enhancement method based on model fusion and data enhancement


Publications (2)

Publication Number Publication Date
CN111815529A (en) 2020-10-23
CN111815529B (en) 2023-02-07

Family

ID=72856666

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010607913.5A Active CN111815529B (en) 2020-06-30 2020-06-30 Low-quality image classification enhancement method based on model fusion and data enhancement

Country Status (1)

Country Link
CN (1) CN111815529B (en)


Patent Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106650721A (en) * 2016-12-28 2017-05-10 吴晓军 Industrial character identification method based on convolution neural network
CN106875352A (en) * 2017-01-17 2017-06-20 北京大学深圳研究生院 A kind of enhancement method of low-illumination image
CN108734667A (en) * 2017-04-14 2018-11-02 Tcl集团股份有限公司 A kind of image processing method and system
CN107169450A (en) * 2017-05-15 2017-09-15 中国科学院遥感与数字地球研究所 The scene classification method and system of a kind of high-resolution remote sensing image
CN109522945A (en) * 2018-10-31 2019-03-26 中国科学院深圳先进技术研究院 One kind of groups emotion identification method, device, smart machine and storage medium
CN109801224A (en) * 2018-12-04 2019-05-24 北京奇艺世纪科技有限公司 A kind of image processing method, device, server and storage medium
CN109934293A (en) * 2019-03-15 2019-06-25 苏州大学 Image-recognizing method, device, medium and obscure perception convolutional neural networks
CN110264424A (en) * 2019-06-20 2019-09-20 北京理工大学 A kind of fuzzy retinal fundus images Enhancement Method based on generation confrontation network
CN110428011A (en) * 2019-08-06 2019-11-08 华南理工大学 A kind of deep learning image fault classification method towards video transmission quality
CN110956201A (en) * 2019-11-07 2020-04-03 江南大学 Image distortion type classification method based on convolutional neural network
CN111127435A (en) * 2019-12-25 2020-05-08 福州大学 No-reference image quality evaluation method based on double-current convolutional neural network

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112541877A (en) * 2020-12-24 2021-03-23 广东宜教通教育有限公司 Condition-based generation of deblurring method, system, device and medium for countermeasure network
CN112541877B (en) * 2020-12-24 2024-03-19 广东宜教通教育有限公司 Defuzzification method, system, equipment and medium for generating countermeasure network based on condition
CN113469083A (en) * 2021-07-08 2021-10-01 西安电子科技大学 SAR image target classification method and system based on anti-sawtooth convolution neural network
CN113469083B (en) * 2021-07-08 2024-05-31 西安电子科技大学 SAR image target classification method and system based on antialiasing convolutional neural network

Also Published As

Publication number Publication date
CN111815529B (en) 2023-02-07

Similar Documents

Publication Publication Date Title
CN112233038B (en) True image denoising method based on multi-scale fusion and edge enhancement
CN109584248B (en) Infrared target instance segmentation method based on feature fusion and dense connection network
CN108229468B (en) Vehicle appearance feature recognition and vehicle retrieval method and device, storage medium and electronic equipment
EP3937481A1 (en) Image display method and device
CN111402146B (en) Image processing method and image processing apparatus
CN107274445B (en) Image depth estimation method and system
CN111915525B Low-illumination image enhancement method based on an improved depth-separable generative adversarial network
CN111754446A (en) Image fusion method, system and storage medium based on generation countermeasure network
WO2022021999A1 (en) Image processing method and image processing apparatus
US10614736B2 (en) Foreground and background detection method
US10706558B2 (en) Foreground and background detection method
CN111260738A (en) Multi-scale target tracking method based on relevant filtering and self-adaptive feature fusion
EP4226322A1 (en) Segmentation for image effects
CN109903315B (en) Method, apparatus, device and readable storage medium for optical flow prediction
CN110532959B (en) Real-time violent behavior detection system based on two-channel three-dimensional convolutional neural network
CN110866879A (en) Image rain removing method based on multi-density rain print perception
CN111582074A (en) Monitoring video leaf occlusion detection method based on scene depth information perception
CN112115979A (en) Fusion method and device of infrared image and visible image
CN113034417A (en) Image enhancement system and image enhancement method based on generation countermeasure network
CN111274988B (en) Multispectral-based vehicle weight identification method and device
CN112949453A (en) Training method of smoke and fire detection model, smoke and fire detection method and smoke and fire detection equipment
CN116977674A (en) Image matching method, related device, storage medium and program product
CN111914938A (en) Image attribute classification and identification method based on full convolution two-branch network
Chen et al. Visual depth guided image rain streaks removal via sparse coding
CN115272437A (en) Image depth estimation method and device based on global and local features

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant