CN111914904B - Image classification method fusing DarkNet and CapsuleNet models - Google Patents

Image classification method fusing DarkNet and CapsuleNet models

Info

Publication number
CN111914904B
CN111914904B CN202010652781.8A CN202010652781A
Authority
CN
China
Prior art keywords
capsule
darknet
network
model
layer
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010652781.8A
Other languages
Chinese (zh)
Other versions
CN111914904A (en)
Inventor
李钢
张玲
王飞龙
李晶
冯军鹏
郝中良
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Taiyuan University of Technology
Original Assignee
Taiyuan University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Taiyuan University of Technology
Priority to CN202010652781.8A
Publication of CN111914904A
Application granted
Publication of CN111914904B

Classifications

    • G06F18/24 Pattern recognition: classification techniques
    • G06F18/214 Pattern recognition: generating training patterns; bootstrap methods, e.g. bagging or boosting
    • G06F18/25 Pattern recognition: fusion techniques
    • G06N3/045 Neural networks: combinations of networks
    • G06N3/08 Neural networks: learning methods

Abstract

The invention discloses an image classification method fusing the DarkNet and CapsuleNet models, belongs to the technical field of image processing, and solves the problem of poor classification performance caused by unbalanced data. The technical scheme comprises the following steps: constructing a DarkNet-CapsuleNet fusion classification model and defining the loss function of the fusion classification model; inputting the images to be classified into the fusion classification model, performing forward training with DarkNet, and extracting a feature map of the target image; further processing the feature map of the target image with the capsule network and backpropagating the error through the loss to update the parameters of the whole network; and, after multiple rounds of iterative learning, classifying the images with the fusion classification model. In the field of image classification, the invention can further improve classification accuracy when the data are unbalanced, and at the same time lays a firmer foundation for research in machine vision.

Description

Image classification method fusing DarkNet and CapsuleNet models
Technical Field
The invention belongs to the technical field of image processing, and particularly relates to an image classification method fusing DarkNet and CapsuleNet models.
Background
Image classification is an important technology in the field of image processing, and in recent years, with the development of deep learning, image classification technology has been developed greatly.
The DarkNet model is an image feature extraction model from the YOLO detection framework, improved with the residual concept; it not only inherits the residual network's resistance to network degradation but also reduces the number of model parameters. However, as the amount of training data shrinks, the generalization performance of the model deteriorates and its classification accuracy drops sharply.
The CapsuleNet model addresses a limitation of convolutional neural networks: because image features are extracted with convolution kernels, the network cannot learn representations of the same feature under all of the different viewing angles. To solve this problem, the model represents image features with vectors rather than the scalar outputs of neurons, which improves generalization, so that a good classification result can be obtained from a relatively small amount of data; however, the model's classification accuracy is poor when it faces a relatively complex data set.
In practice, the class distribution of the data sets we own is often extremely skewed, and current models extract the image features of a minority class poorly, so the final classification result suffers. Based on this, it is necessary to provide a model that can improve the classification result degraded by data imbalance.
Disclosure of Invention
The invention overcomes the defects of the prior art, provides an image classification method fusing the DarkNet and CapsuleNet models, and solves the problem of poor classification performance caused by unbalanced data.
In order to achieve the above object, the present invention is achieved by the following technical solutions.
An image classification method fusing DarkNet and CapsuleNet models comprises the following specific steps:
step S1: constructing a DarkNet-CapsuleNet fusion classification model, including the definition of the DarkNet model and the definition of the CapsuleNet model; specifically, a plurality of DarkNet blocks form the DarkNet model, and a CapsuleNet network is built after the last DarkNet block.
Step S2: realizing the definition of the fusion classification model loss function; the loss of the fusion model is the margin loss of the capsule network plus the L2 weight regularization loss of DarkNet:

$$L = T_k \max(0,\, m^+ - \|v_k\|)^2 + \lambda\,(1 - T_k)\,\max(0,\, \|v_k\| - m^-)^2 + \|w_D\|^2$$

where $T_k$ indicates whether a class is present, $m^+ = 0.9$, $m^- = 0.1$, $\lambda$ is a hyperparameter set to 0.5, $\|v_k\|$ represents the probability that a capsule unit belongs to this class, and $\|w_D\|^2$ is the sum of the L2 regularization losses of each layer's weights of the DarkNet network; that is, the loss function is the MarginLoss of CapsuleNet plus the L2 regularization loss of every DarkNet layer's weights.
Step S3: inputting images to be classified in the fusion classification model, performing forward training by using DarkNet, and extracting a feature map of a target image;
step S4: further processing the feature map of the target image with the capsule network, and backpropagating the error through the loss to update the parameters of the whole network;
step S5: and after multiple rounds of iterative learning, the images are classified by utilizing a fusion classification model.
Preferably, the step S1 specifically includes:
step S11: building the DarkNet model, wherein the DarkNet model consists of a plurality of DarkNet blocks, and every DarkNet block consists of two convolutions and a residual connection, the kernel sizes of the two convolutions being 1 and 3, respectively; an independent convolution layer sits between every two DarkNet blocks for downsampling, in which the original input is zero-padded and then convolved with a VALID convolution of stride 2 and kernel size 3.
Step S12: building a capsuleNet network after the last DarkNet block, wherein the capsuleNet network consists of a main capsule network and a digital capsule network, the main capsule network changes the characteristics into a plurality of capsule units, and each capsule unit is a vector; the digital capsule layer separates each capsule unit, traverses each capsule unit, performs dynamic routing and outputs the final result of the network.
Preferably, the images to be classified in step S3 comprise the training-set, validation-set, and test-set images and labels for all classes, and all images are adjusted to a consistent size required by the fusion classification model during input.
Preferably, the process of extracting the feature map by using the DarkNet in step S3 is as follows:
(1) inputting the image into the DarkNet network and returning the outputs of the last 2 DarkNet blocks, so as to provide features with richer receptive fields;
(2) adjusting the small-scale output to the same size as the large-scale output by nearest-neighbour interpolation, combining the features of the two outputs, and then performing convolution to extract features.
Preferably, the step S4 specifically includes:
step S41: inputting the feature map processed by the DarkNet model into the primary capsule layer of the capsule network, converting it into capsule units, each of which is an 8-dimensional vector.
Step S42: when the digital capsule layer divides the input according to capsule units, b is initialized firstly for all the capsule units i of the layer I and the capsule units j of the layer (l +1)ijWith 0, r iterations are performed with the dynamic routing algorithm:
c, calculating capsule units i of the layer Iij=soft max(bij);
② to capsule sheet with (l +1) layerThe element j calculates:
Figure BDA0002575610740000031
③ for the (l +1) layer of capsule units j:
Figure BDA0002575610740000032
fourthly, updating parameters of the layer I of capsule units and the layer (l +1) of capsule units j:
Figure BDA0002575610740000033
wherein, bijIs a temporary variable whose value is updated in the course of iteration, and when the whole algorithm is finished, its value is stored in cij(ii) a Calculating the vector cijThe value of (d), i.e. all weights of the l layers of capsules i;
Figure BDA0002575610740000034
is the output of the layer of capsule units, sjUsed for measuring the importance degree of the capsules with l layers to the (l +1) capsules; the squarh function is a nonlinear function to obtain an output capsule vjThe function ensures that the direction of the vector is preserved, while the length is limited to 1 or less, the modulo length of the vector represents the probability of belonging to this category, the greater the modulo length, the greater the probability, and the vector itself represents the information of the individual features in the image, including position, pose information.
Step S43: and solving the loss corresponding to the network output, and updating the parameters of the fusion classification model according to the solved loss.
Preferably, the step S5 specifically includes:
step S51: preprocessing data, inputting the data into a fusion classification model, and extracting image features;
step S52: after 5 rounds of iterative learning, updating the learning rate and the learning-rate decay, and continuing the iterative optimization;
step S53: after the model accuracy and the loss have converged, performing experiments with the validation set and the test set, and saving checkpoints.
Compared with the prior art, the invention has the following beneficial effects.
The image classification method fusing the DarkNet and CapsuleNet models provided by the invention combines DarkNet's excellent feature-extraction capability for complex images with CapsuleNet's equivariance and good generalization in feature extraction, so that the two models make up for each other's shortcomings and the problem of poor classification caused by unbalanced data is solved at the level of the network model. The model improves classification accuracy when the data are unbalanced and strengthens the learning capability of the model, enabling it to converge quickly. In the field of image classification, the method can further improve classification accuracy when the data are unbalanced, and at the same time lays a firmer foundation for research in machine vision.
Drawings
FIG. 1 is a flow chart of the image classification method fusing the DarkNet and CapsuleNet models according to the present invention.
FIG. 2 is a block diagram of DarkNet in the fused DarkNet and CapsuleNet model according to the present invention.
Detailed Description
To make the technical problems to be solved, the technical solutions, and the advantageous effects of the present invention clearer, the invention is further described in detail below with reference to the embodiments and the accompanying drawings. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit it. The technical solution of the present invention is described in detail below with reference to the embodiments and the drawings, but the scope of protection is not limited thereto.
As shown in FIGS. 1-2, the image classification method fusing the DarkNet and CapsuleNet models specifically comprises the following steps:
step S1: processing the image data to be loaded, which comprises the training-set, validation-set, and test-set images and labels for all classes; all images are adjusted to a consistent size required by the fusion classification model, so that the model can process them conveniently. An example preprocessing pipeline is sketched below.
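The preprocessing can be illustrated with the following minimal sketch; it assumes a PyTorch/torchvision pipeline, and the 256x256 target size is an assumed value, since the patent only requires a size consistent with the fusion classification model.

```python
from PIL import Image
from torchvision import transforms

# Resize every training/validation/test image to the input size expected by
# the fusion classification model (256x256 is an assumption, not fixed here).
preprocess = transforms.Compose([
    transforms.Resize((256, 256)),
    transforms.ToTensor(),  # HWC uint8 image -> CHW float tensor in [0, 1]
])

image = preprocess(Image.open("example.jpg").convert("RGB"))
```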
Step S2: realizing the definition of the DarkNet-CapsuleNet fusion classification model, which comprises the definition of the DarkNet model and the definition of the CapsuleNet model.
Step S21: building the DarkNet model, wherein the DarkNet model consists of a plurality of DarkNet blocks, and every DarkNet block consists of two convolutions (with kernel sizes 1 and 3, respectively) and a residual connection; an independent convolution layer sits between every two DarkNet blocks for downsampling, in which the original input is zero-padded and then convolved with a VALID convolution of stride 2 and kernel size 3, as sketched below.
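For illustration, a minimal PyTorch sketch of one DarkNet block and of the downsampling layer between blocks follows; the leaky-ReLU activations, the channel halving inside the block, and all names are assumptions of this sketch rather than details fixed by the patent.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DarkNetBlock(nn.Module):
    """One DarkNet block: a 1x1 convolution, then a 3x3 convolution,
    with a residual link around the pair (step S21)."""
    def __init__(self, channels: int):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels // 2, kernel_size=1)
        self.conv2 = nn.Conv2d(channels // 2, channels, kernel_size=3, padding=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = F.leaky_relu(self.conv1(x), 0.1)
        out = F.leaky_relu(self.conv2(out), 0.1)
        return x + out  # the residual link helps avoid network degradation

class Downsample(nn.Module):
    """Independent convolution layer between two blocks: zero-pad the input,
    then apply a VALID (no implicit padding) convolution, stride 2, kernel 3."""
    def __init__(self, in_channels: int, out_channels: int):
        super().__init__()
        self.conv = nn.Conv2d(in_channels, out_channels,
                              kernel_size=3, stride=2, padding=0)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = F.pad(x, (1, 0, 1, 0))  # supplement the original input with 0
        return F.leaky_relu(self.conv(x), 0.1)
```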
Step S22: building the CapsuleNet network after the last DarkNet block, wherein the CapsuleNet network consists of a primary capsule layer and a digit capsule layer. The primary capsule layer turns the features into a number of capsule units, each of which is a vector; the digit capsule layer splits the capsule units, traverses each of them, performs dynamic routing, and outputs the final result of the network. The primary capsule step is sketched below.
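The primary capsule step can be sketched as follows. The sketch assumes the capsule units are obtained by reshaping the convolutional feature map into 8-dimensional vectors and applying the squash nonlinearity; real implementations often insert one more convolution first.

```python
import torch

def squash(s: torch.Tensor, dim: int = -1, eps: float = 1e-8) -> torch.Tensor:
    """Nonlinearity that preserves a vector's direction but limits its length to < 1."""
    sq_norm = (s ** 2).sum(dim=dim, keepdim=True)
    return (sq_norm / (1.0 + sq_norm)) * s / torch.sqrt(sq_norm + eps)

def primary_capsules(feature_map: torch.Tensor, capsule_dim: int = 8) -> torch.Tensor:
    """Turn a (B, C, H, W) feature map into (B, N, capsule_dim) capsule units,
    each capsule being an 8-dimensional vector; assumes C*H*W is divisible
    by capsule_dim."""
    batch = feature_map.size(0)
    u = feature_map.reshape(batch, -1, capsule_dim)  # N = C*H*W / capsule_dim
    return squash(u)
```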
Step S3: realizing the definition of the fusion model loss function; the loss of the fusion model is the margin loss of the capsule network plus the L2 weight regularization loss of DarkNet:

$$L = T_k \max(0,\, m^+ - \|v_k\|)^2 + \lambda\,(1 - T_k)\,\max(0,\, \|v_k\| - m^-)^2 + \|w_D\|^2$$

where $T_k$ indicates whether a class is present, $m^+ = 0.9$, $m^- = 0.1$, $\lambda$ is a hyperparameter typically set to 0.5, $\|v_k\|$ represents the probability that the capsule unit belongs to this class, and $\|w_D\|^2$ is the sum of the L2 regularization losses of each layer's weights of the DarkNet network. The loss function is thus the MarginLoss of CapsuleNet plus the L2 regularization loss of every DarkNet layer's weights; a sketch is given below.
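A minimal sketch of this loss, assuming PyTorch, one-hot labels, and a sum-over-classes, mean-over-batch reduction (the reduction is an assumption; the patent does not specify it):

```python
import torch

def fusion_loss(v_lengths: torch.Tensor, targets: torch.Tensor, darknet_weights,
                m_pos: float = 0.9, m_neg: float = 0.1, lam: float = 0.5) -> torch.Tensor:
    """L = T_k max(0, m+ - ||v_k||)^2 + lam (1 - T_k) max(0, ||v_k|| - m-)^2 + ||w_D||^2.

    v_lengths:       (B, K) capsule lengths ||v_k||, one per class k
    targets:         (B, K) one-hot labels T_k
    darknet_weights: iterable of DarkNet weight tensors for the L2 term
    """
    present = targets * torch.clamp(m_pos - v_lengths, min=0.0) ** 2
    absent = lam * (1.0 - targets) * torch.clamp(v_lengths - m_neg, min=0.0) ** 2
    margin = (present + absent).sum(dim=1).mean()      # sum over classes, mean over batch
    l2 = sum((w ** 2).sum() for w in darknet_weights)  # ||w_D||^2 summed over all layers
    return margin + l2
```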
Step S4: inputting the images to be classified into the fusion classification model, performing forward training with DarkNet, and extracting a feature map of the target image; the method specifically comprises the following steps:
step S41: inputting the image into the DarkNet network and returning the outputs of the last 2 DarkNet blocks, so as to provide features with richer receptive fields;
step S42: adjusting the small-scale output to the same size as the large-scale output by nearest-neighbour interpolation, combining the features of the two outputs, and then performing convolution to extract features, as sketched below;
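A sketch of this multi-scale merging step, assuming PyTorch and channel-wise concatenation as the combining operation (the patent does not name the exact combination operator):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def merge_scales(small: torch.Tensor, large: torch.Tensor, conv: nn.Module) -> torch.Tensor:
    """Resize the small-scale return to the large-scale size with nearest-neighbour
    interpolation, combine the two feature maps, then convolve to extract features."""
    small_up = F.interpolate(small, size=large.shape[-2:], mode="nearest")
    merged = torch.cat([small_up, large], dim=1)  # channel-wise combination (assumed)
    return conv(merged)

# Usage sketch: conv must accept small-channels + large-channels input channels, e.g.
# conv = nn.Conv2d(512 + 256, 256, kernel_size=3, padding=1)
```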
step S5: further processing the feature map of the target image with the capsule network, and backpropagating the error through the loss to update the parameters of the whole network;
step S51: inputting the feature map processed by the DarkNet model into the primary capsule layer of the capsule network, turning it into capsule units, each of which is an 8-dimensional vector;
step S52: when the digit capsule layer splits its input into capsule units, first initialize $b_{ij} = 0$ for all capsule units $i$ of layer $l$ and all capsule units $j$ of layer $(l+1)$, then perform $r$ iterations of the dynamic routing algorithm:

① for all capsule units $i$ of layer $l$, compute $c_{ij} = \mathrm{softmax}(b_{ij})$;

② for each capsule unit $j$ of layer $(l+1)$, compute $s_j = \sum_i c_{ij}\,\hat{u}_{j|i}$, where $\hat{u}_{j|i} = W_{ij} u_i$;

③ for each capsule unit $j$ of layer $(l+1)$, compute $v_j = \mathrm{squash}(s_j) = \dfrac{\|s_j\|^2}{1 + \|s_j\|^2} \cdot \dfrac{s_j}{\|s_j\|}$;

④ update the parameters linking the layer-$l$ capsule units $i$ and the layer-$(l+1)$ capsule units $j$: $b_{ij} \leftarrow b_{ij} + \hat{u}_{j|i} \cdot v_j$.

Here $b_{ij}$ is a temporary variable whose value is updated during the iterations; when the whole algorithm finishes, its value is stored in $c_{ij}$, i.e. the vector of all routing weights of the layer-$l$ capsules $i$. $\hat{u}_{j|i}$ is the transformed output of the layer-$l$ capsule units, and $s_j$ measures how important the layer-$l$ capsules are to the layer-$(l+1)$ capsule; the squash function is a nonlinearity that yields the output capsule $v_j$, preserving the direction of the vector while limiting its length to 1 or less. The modulus of the vector represents the probability of belonging to this class (the larger the modulus, the greater the probability), and the vector itself can represent the information of the various features in the image (e.g. position, pose, etc.). A minimal sketch of this routing loop follows.
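The routing loop can be illustrated with the following sketch, assuming PyTorch; the prediction vectors û_{j|i} are taken as precomputed input, and r = 3 iterations is a typical value rather than one fixed by the patent.

```python
import torch

def squash(s: torch.Tensor, dim: int = -1, eps: float = 1e-8) -> torch.Tensor:
    # same squash nonlinearity as in the primary-capsule sketch above
    sq_norm = (s ** 2).sum(dim=dim, keepdim=True)
    return (sq_norm / (1.0 + sq_norm)) * s / torch.sqrt(sq_norm + eps)

def dynamic_routing(u_hat: torch.Tensor, r: int = 3) -> torch.Tensor:
    """Dynamic routing between layer l and layer (l+1).

    u_hat: (B, N_l, N_l1, D) prediction vectors u_hat_{j|i} = W_ij u_i
    r:     number of routing iterations
    Returns v: (B, N_l1, D), the output capsules v_j.
    """
    b_ij = torch.zeros(u_hat.shape[:3], device=u_hat.device)  # initialize b_ij = 0
    for _ in range(r):
        c_ij = torch.softmax(b_ij, dim=2)                 # (1) c_ij = softmax(b_ij)
        s_j = (c_ij.unsqueeze(-1) * u_hat).sum(dim=1)     # (2) s_j = sum_i c_ij u_hat_{j|i}
        v_j = squash(s_j)                                 # (3) v_j = squash(s_j)
        b_ij = b_ij + (u_hat * v_j.unsqueeze(1)).sum(-1)  # (4) b_ij += u_hat_{j|i} . v_j
    return v_j
```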
Then the loss corresponding to the network output is calculated, and the parameters of the preceding network are updated according to the calculated loss.
Step S6: after multiple rounds of iterative learning, classifying the images with the fusion model;
step S61: preprocessing the data, inputting it into the fusion network model, and extracting image features;
step S62: after 5 rounds of iterative learning, updating the learning rate and the learning-rate decay, and continuing the iterative optimization;
step S63: after the model accuracy and the loss have converged, performing experiments with the validation set and the test set, and saving checkpoints; a schematic training loop is sketched below.
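A schematic training loop follows, assuming PyTorch, an Adam optimizer, and a hypothetical model.loss helper that runs the forward pass and evaluates the fusion loss; the per-epoch checkpointing and the every-5-epoch decay schedule are assumptions consistent with steps S62-S63.

```python
import torch

def train(model, loader, epochs: int = 50, lr: float = 1e-3, gamma: float = 0.9):
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    # StepLR multiplies the learning rate by gamma every 5 scheduler steps
    scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=5, gamma=gamma)
    for epoch in range(epochs):
        for images, labels in loader:
            optimizer.zero_grad()
            loss = model.loss(images, labels)  # hypothetical: forward pass + fusion loss
            loss.backward()                    # backpropagate error through both networks
            optimizer.step()
        scheduler.step()                       # one step per epoch -> decay every 5 epochs
        torch.save({"epoch": epoch, "state": model.state_dict()}, f"ckpt_{epoch}.pt")
```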
In the fusion model, the DarkNet network avoids network degradation, has relatively few parameters, and has a strong feature-extraction capability for complex data sets; the capsule network improves upon the convolutional network, so that it obtains better results on smaller data sets and generalizes more strongly. The advantages of the two are complementary: DarkNet's strong feature extraction for complex data sets makes up for CapsuleNet's poor performance on such data, while CapsuleNet replaces the scalar outputs of neurons with vectors, makes up for the shortcomings of convolution, converts the feature map extracted by DarkNet into vectors, and improves generalization. Therefore, fusing DarkNet and CapsuleNet into the same deep learning model overcomes the defects of both: the new model still shows strong classification ability when the image classes are unbalanced and, inheriting the advantages of CapsuleNet, is more robust.
While the invention has been described in further detail with reference to specific preferred embodiments thereof, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (6)

1. An image classification method fusing DarkNet and CapsuleNet models is characterized by comprising the following specific steps:
step S1: constructing a DarkNet-CapsuleNet fusion classification model, including the definition of the DarkNet model and the definition of the CapsuleNet model; specifically, a plurality of DarkNet blocks form the DarkNet model, and a CapsuleNet network is built after the last DarkNet block;
step S2: realizing the definition of the fusion classification model loss function; the loss of the fusion model is the margin loss of the capsule network plus the L2 weight regularization loss of DarkNet, with the loss function:

$$L = T_k \max(0,\, m^+ - \|v_k\|)^2 + \lambda\,(1 - T_k)\,\max(0,\, \|v_k\| - m^-)^2 + \|w_D\|^2$$

where $T_k$ indicates whether a class is present, $m^+ = 0.9$, $m^- = 0.1$, $\lambda$ is a hyperparameter set to 0.5, $\|v_k\|$ represents the probability that a capsule unit belongs to this class, and $\|w_D\|^2$ is the sum of the L2 regularization losses of each layer's weights of the DarkNet network; the loss function is the MarginLoss of CapsuleNet plus the L2 regularization loss of every DarkNet layer's weights;
step S3: inputting the images to be classified into the fusion classification model, performing forward training with DarkNet, and extracting a feature map of the target image;
step S4: further processing the feature map of the target image with the capsule network, and backpropagating the error through the loss to update the parameters of the whole network;
step S5: and after multiple rounds of iterative learning, the images are classified by utilizing a fusion classification model.
2. The image classification method fusing the DarkNet and CapsuleNet models according to claim 1, wherein the step S1 specifically comprises:
step S11: building the DarkNet model, wherein the DarkNet model consists of a plurality of DarkNet blocks, and every DarkNet block consists of two convolutions and a residual connection, the kernel sizes of the two convolutions being 1 and 3, respectively; an independent convolution layer sits between every two DarkNet blocks for downsampling, in which the original input is zero-padded and then convolved with a VALID convolution of stride 2 and kernel size 3;
step S12: building the CapsuleNet network after the last DarkNet block, wherein the CapsuleNet network consists of a primary capsule layer and a digit capsule layer; the primary capsule layer turns the features into a number of capsule units, each of which is a vector, and the digit capsule layer splits the capsule units, traverses each of them, performs dynamic routing, and outputs the final result of the network.
3. The method of claim 1, wherein the images to be classified in step S3 comprise the training-set, validation-set, and test-set images and labels of all classes, and all images are adjusted to a consistent size required by the fusion classification model during input.
4. The image classification method fusing the DarkNet and CapsuleNet models according to claim 1, wherein the process of extracting the feature map with DarkNet in step S3 is as follows:
(1) inputting the image into the DarkNet network and returning the outputs of the last 2 DarkNet blocks, so as to provide features with richer receptive fields;
(2) adjusting the small-scale output to the same size as the large-scale output by nearest-neighbour interpolation, combining the features of the two outputs, and then performing convolution to extract features.
5. The image classification method fusing the DarkNet and CapsuleNet models according to claim 1, wherein the step S4 specifically comprises:
step S41: inputting the feature map processed by the DarkNet model into the primary capsule layer of the capsule network, converting it into capsule units, each of which is an 8-dimensional vector;
step S42: when the digit capsule layer splits its input into capsule units, first initializing $b_{ij} = 0$ for all capsule units $i$ of layer $l$ and all capsule units $j$ of layer $(l+1)$, then performing $r$ iterations of the dynamic routing algorithm:

① for all capsule units $i$ of layer $l$, computing $c_{ij} = \mathrm{softmax}(b_{ij})$;

② for each capsule unit $j$ of layer $(l+1)$, computing $s_j = \sum_i c_{ij}\,\hat{u}_{j|i}$, where $\hat{u}_{j|i} = W_{ij} u_i$;

③ for each capsule unit $j$ of layer $(l+1)$, computing $v_j = \mathrm{squash}(s_j) = \dfrac{\|s_j\|^2}{1 + \|s_j\|^2} \cdot \dfrac{s_j}{\|s_j\|}$;

④ updating the parameters of the layer-$l$ capsule units $i$ and the layer-$(l+1)$ capsule units $j$: $b_{ij} \leftarrow b_{ij} + \hat{u}_{j|i} \cdot v_j$;

wherein $b_{ij}$ is a temporary variable whose value is updated during the iterations, and when the whole algorithm finishes its value is stored in $c_{ij}$, i.e. the vector of all routing weights of the layer-$l$ capsules $i$; $\hat{u}_{j|i}$ is the transformed output of the layer-$l$ capsule units, and $s_j$ measures the importance of the layer-$l$ capsules to the layer-$(l+1)$ capsule; the squash function is a nonlinearity yielding the output capsule $v_j$, which preserves the direction of the vector while limiting its length to 1 or less; the modulus of the vector represents the probability of belonging to this class, the larger the modulus the greater the probability, and the vector itself represents the information of the individual features in the image, including position and pose information;
step S43: computing the loss corresponding to the network output, and updating the parameters of the fusion classification model according to the computed loss.
6. The image classification method fusing the DarkNet and CapsuleNet models according to claim 1, wherein the step S5 specifically comprises:
Step S51: preprocessing data, inputting the data into a fusion classification model, and extracting image features;
step S52: after 5 rounds of iterative learning, updating the learning rate and the learning-rate decay, and continuing the iterative optimization;
step S53: after the model accuracy and the loss have converged, performing experiments with the validation set and the test set, and saving checkpoints.
CN202010652781.8A 2020-07-08 2020-07-08 Image classification method fusing DarkNet and CapsuleNet models Active CN111914904B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010652781.8A CN111914904B (en) Image classification method fusing DarkNet and CapsuleNet models

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010652781.8A CN111914904B (en) Image classification method fusing DarkNet and CapsuleNet models

Publications (2)

Publication Number Publication Date
CN111914904A (en) 2020-11-10
CN111914904B (en) 2022-07-01

Family

ID=73227625

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010652781.8A Active CN111914904B (en) 2020-07-08 2020-07-08 Image classification method fusing DarkNet and CapsuleNet models

Country Status (1)

Country Link
CN (1) CN111914904B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116188792B (en) * 2023-02-23 2023-10-20 四川大学 Quantitative analysis method and system for whole blood cell scatter diagram


Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11010902B2 (en) * 2018-06-04 2021-05-18 University Of Central Florida Research Foundation, Inc. Capsules for image analysis
US11886989B2 (en) * 2018-09-10 2024-01-30 International Business Machines Corporation System for measuring information leakage of deep learning models
US11100352B2 (en) * 2018-10-16 2021-08-24 Samsung Electronics Co., Ltd. Convolutional neural network for object detection

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108830826A * 2018-04-28 2018-11-16 Sichuan University System and method for detecting lung nodules
CN108805223A * 2018-05-18 2018-11-13 Xiamen University Seal character text recognition method and system based on Incep-CapsNet networks
CN109816024A * 2019-01-29 2019-05-28 University of Electronic Science and Technology of China Real-time vehicle logo detection method based on multi-scale feature fusion and DCNN
CN110232406A * 2019-05-28 2019-09-13 Xiamen University Liquid crystal display panel CF image recognition method based on statistical learning
CN110533004A * 2019-09-07 2019-12-03 Harbin University of Science and Technology Complex-scene face recognition system based on deep learning
AU2019101142A4 * 2019-09-30 2019-10-31 Dong, Qirui MR A pedestrian detection method with lightweight backbone based on yolov3 network
CN111191660A * 2019-12-30 2020-05-22 Zhejiang University of Technology Rectal cancer pathology image classification method based on multi-channel collaborative capsule network
CN111292305A * 2020-01-22 2020-06-16 Chongqing University Improved YOLO-V3 metal machining surface defect detection method

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
Dynamic Routing Between Capsules; Sara Sabour et al.; arXiv; 2017-11-07; 1-11 *
YOLOv3: An Incremental Improvement; Joseph Redmon et al.; arXiv; 2018-04-08; 1-6 *
Research on classification of thyroid nodule ultrasound images based on capsule networks; Liu Kai; China Master's Theses Full-text Database, Medicine & Health Sciences; 2020-06-15 (No. 06, 2020); E060-70 *
Few-shot image classification algorithm with an improved capsule network; Wang Feilong et al.; Journal of Frontiers of Computer Science and Technology; 2021-04-26; 1-9 *

Also Published As

Publication number Publication date
CN111914904A (en) 2020-11-10

Similar Documents

Publication Publication Date Title
CN110188685B (en) Target counting method and system based on double-attention multi-scale cascade network
CN111666836B (en) High-resolution remote sensing image target detection method of M-F-Y type light convolutional neural network
CN111489358B (en) Three-dimensional point cloud semantic segmentation method based on deep learning
CN107316066B (en) Image classification method and system based on multi-channel convolutional neural network
CN109685152B (en) Image target detection method based on DC-SPP-YOLO
CN107977932B (en) Face image super-resolution reconstruction method based on discriminable attribute constraint generation countermeasure network
CN111652321B (en) Marine ship detection method based on improved YOLOV3 algorithm
CN108230278B (en) Image raindrop removing method based on generation countermeasure network
CN108681752B (en) Image scene labeling method based on deep learning
CN107103285B (en) Face depth prediction method based on convolutional neural network
CN110287777B (en) Golden monkey body segmentation algorithm in natural scene
CN110309835B (en) Image local feature extraction method and device
CN110223234A (en) Depth residual error network image super resolution ratio reconstruction method based on cascade shrinkage expansion
Zhang et al. Automatic velocity picking based on deep learning
CN112801169A (en) Camouflage target detection method based on improved YOLO algorithm
CN109711401A (en) Text detection method for natural scene images based on Faster R-CNN
CN115222946B (en) Single-stage instance image segmentation method and device and computer equipment
CN109948575A (en) Eyeball segmentation method in ultrasound images
CN114004333A (en) Oversampling method for generating countermeasure network based on multiple false classes
CN114283320A (en) Target detection method based on full convolution and without branch structure
CN113205103A (en) Lightweight tattoo detection method
CN111914904B (en) Image classification method fusing DarkNet and CapsuleNet models
CN111382684B (en) Angle robust personalized facial expression recognition method based on antagonistic learning
CN108470209B (en) Convolutional neural network visualization method based on gram matrix regularization
CN113869503B (en) Data processing method and storage medium based on depth matrix decomposition completion

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant