CN113537325A - Deep learning method for image classification based on logic of extracting high-low-level features - Google Patents

Deep learning method for image classification based on logic of extracting high-low-level features

Info

Publication number
CN113537325A
CN113537325A
Authority
CN
China
Prior art keywords
logic
feature
extraction unit
layer
feature extraction
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110758190.3A
Other languages
Chinese (zh)
Other versions
CN113537325B (en)
Inventor
马辛
付幸文
孙亦琦
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beihang University
Original Assignee
Beihang University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beihang University filed Critical Beihang University
Priority to CN202110758190.3A priority Critical patent/CN113537325B/en
Publication of CN113537325A publication Critical patent/CN113537325A/en
Application granted granted Critical
Publication of CN113537325B publication Critical patent/CN113537325B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a deep learning method for image classification based on logic extracted from high- and low-level features. The model consists of a feature extraction network and a logic network. The feature extraction network is formed by connecting a plurality of feature extraction units in series; the logic network is formed by connecting a plurality of logic extraction units in series. The feature extraction units are convolutional layers or residual blocks, and the logic extraction units are convolutional dendritic modules composed of convolutional layers and Hadamard products. The feature extraction network is single-input, multi-output: it extracts image features at every level from low to high, taking the image to be classified as input and outputting feature maps of all levels. The logic network is multi-input, single-output: it constructs logical relations among the high- and low-level image features and classifies according to those relations, taking the feature maps of each layer as input and outputting the classification result. The method achieves higher classification accuracy, faster convergence, and better robustness.

Description

Deep learning method for image classification based on logic of extracting high-low-level features
Technical Field
The invention relates to the field of artificial intelligence, and in particular to a deep learning method that extracts high- and low-level feature logic for classification, applied to image signals.
Background
Classification is a fundamental problem in many areas, including fault diagnosis, automation, Computer Vision (CV), and Natural Language Processing (NLP). Deep learning has proved a useful tool for solving classification problems. Feature extraction has long been the key to how Convolutional Neural Networks (CNNs) mimic human vision: a CNN extracts features from the input image with convolution kernels, and owing to its translation invariance and hierarchical representation ability it can extract deep image features for classification. Later, researchers deepened the CNN with skip connections to obtain the residual network ResNet, which alleviates the vanishing-gradient, gradient-diffusion, and degradation problems of deep CNNs and gives the network a stronger feature extraction capability. Biological dendrites in the brain have been shown to possess AND/OR/NOT logical operation capability and to compute logical relations between input signals. The existing model simulating this dendritic function is the dendritic network (DD), which provides a mechanism for logic extraction; as the number of layers increases, the DD can construct more complex logical relations. The convolutional dendritic network (CDD) inherits the logic extraction capability of the DD and the feature extraction capability of the CNN, overcoming the defects that the DD cannot extract features and the CNN cannot extract logic; the CDD thus focuses on combinations of features and relies on the logical relations between features for classification. Many models in the field of object detection, such as the Feature Pyramid Network (FPN), DenseNet, and YOLOv3, improve the detection accuracy of small objects by fusing high-level and low-level features. In the field of image segmentation, high-level semantic segmentation results preserve large semantic structures but lose small ones, while low-level feature segmentation results preserve abundant detail but predict semantic categories poorly; fusing high- and low-level features combines their advantages and yields better-performing semantic segmentation models.
There have been many attempts to mimic human visual perception, but the following problems remain. Traditional neural network models cannot perform logic extraction between features. Although the CDD combines the advantages of the DD and the CNN, its feature extraction layers are shallow, so its feature extraction capability is weak. In addition, because neural networks represent features hierarchically, both the high-level and the low-level features carry a large amount of image information, yet existing classification networks use only the high-level features for classification and discard much of the information in the low-level features.
Disclosure of Invention
The invention aims to solve the technical problems that the traditional network cannot perform logic extraction between features, that the feature extraction capability of the convolutional dendritic network (CDD) is weak, and that existing neural networks cannot combine high- and low-level features, and provides a deep learning method for image classification based on logic extracted from high- and low-level features. The method is mainly applied to the field of image classification; it can improve the classification accuracy of all traditional models that classify only by extracting image features, and it offers high classification accuracy, fast convergence, and good robustness.
The technical scheme adopted by the invention to solve these problems is as follows. A deep learning method for image classification based on logic extracted from high- and low-level features comprises a feature extraction network and a logic network. The feature extraction network is formed by connecting a plurality of feature extraction units in series, and the logic network is formed by connecting a plurality of logic extraction units in series. The feature extraction units are convolutional layers or residual blocks; the logic extraction units are convolutional dendritic modules composed of convolutional layers and Hadamard products. The units are connected as follows: the input of the feature extraction unit at a given layer is the output of the feature extraction unit at the previous layer, and the input of the logic extraction unit at a given layer is the output of the logic extraction unit at the previous layer together with the output of the feature extraction unit at the same layer. The feature extraction network is single-input, multi-output and extracts image features at every level from low to high: its input is the image to be classified and its outputs are the feature maps of all levels. The logic network is multi-input, single-output and constructs the logical relations among the high- and low-level image features, classifying according to these relations: its inputs are the feature maps of each layer and its output is the classification result. The method specifically comprises the following steps:
step 1, sending an input image with an image label into a first-layer convolutional layer for dimension conversion and primary feature extraction to obtain a feature map 0;
$$X_F^0 = W_{first} \otimes X_{in}$$
where $X_F^0$ is feature map 0, $X_{in}$ is the input image, $W_{first}$ is the weight matrix of the first convolutional layer, and $\otimes$ denotes convolution;
step 2, sending the feature map 0 into a feature extraction unit 1 to obtain a feature map 1 after feature extraction;
$$X_F^1 = F\left(W_F^1 \otimes X_F^0 + W_S^1 \otimes X_F^0\right)$$
where $X_F^0$ is feature map 0, $X_F^1$ is feature map 1, $W_F^1$ is the weight matrix of feature extraction unit 1, $W_S^1$ is the linear-mapping weight matrix in the skip connection of feature extraction unit 1, $F(\cdot)$ is a nonlinear activation function, and $\otimes$ denotes convolution;
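For illustration, a minimal PyTorch sketch of one feature extraction unit as described in steps 1 and 2 is given below. It assumes a residual block whose skip connection carries a 1×1 linear-mapping convolution; the channel counts and kernel size are illustrative assumptions, not values fixed by the patent.

```python
import torch
import torch.nn as nn

class FeatureExtractionUnit(nn.Module):
    """Residual block: X_F^k = F(W_F^k * X + W_S^k * X), with * denoting convolution."""
    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1)  # main path W_F
        self.skip = nn.Conv2d(in_ch, out_ch, kernel_size=1)             # linear mapping W_S
        self.act = nn.ReLU()                                            # activation F(.)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.act(self.conv(x) + self.skip(x))
```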
step 3, simultaneously sending the feature map 0 and the feature map 1 to a logic extraction unit 1, and obtaining a feature logic map 1 after high-low layer feature logic combination;
$$X_L^1 = \left(W_L^1 \otimes X_F^0\right) \circ \left(W_L^1 \otimes X_F^1\right)$$
where $X_L^1$ is feature logic map 1, $X_F^0$ is feature map 0, $X_F^1$ is feature map 1, $W_L^1$ is the weight matrix of logic extraction unit 1, $\otimes$ denotes convolution, and $\circ$ denotes the Hadamard product;
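A corresponding hedged sketch of a logic extraction unit follows: the convolutional dendritic module of step 3, which convolves each of its two inputs and combines the results with a Hadamard product. Using a separate weight tensor per branch is an assumption for generality; the step-3 formula writes a single $W_L^1$.

```python
import torch
import torch.nn as nn

class LogicExtractionUnit(nn.Module):
    """Convolutional dendritic module: X_L = (W_L * X_low) o (W_L * X_high)."""
    def __init__(self, low_ch: int, high_ch: int, out_ch: int):
        super().__init__()
        # one convolution per input branch (an assumption; the formula shares W_L)
        self.conv_low = nn.Conv2d(low_ch, out_ch, kernel_size=3, padding=1, stride=1)
        self.conv_high = nn.Conv2d(high_ch, out_ch, kernel_size=3, padding=1, stride=1)

    def forward(self, x_low: torch.Tensor, x_high: torch.Tensor) -> torch.Tensor:
        # the element-wise (Hadamard) product realizes AND-like feature interactions
        return self.conv_low(x_low) * self.conv_high(x_high)
```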
step 4, sending the feature map 1 into a feature extraction unit 2, and obtaining a feature map 2 after feature extraction;
$$X_F^2 = F\left(W_F^2 \otimes X_F^1 + W_S^2 \otimes X_F^1\right)$$
where $X_F^1$ is feature map 1, $X_F^2$ is feature map 2, $W_F^2$ is the weight matrix of feature extraction unit 2, $W_S^2$ is the linear-mapping weight matrix in the skip connection of feature extraction unit 2, $F(\cdot)$ is a nonlinear activation function, and $\otimes$ denotes convolution;
step 5, simultaneously sending feature logic map 1 and feature map 2 into logic extraction unit 2, and obtaining feature logic map 2 after high- and low-level feature logic combination;
$$X_L^2 = \left(W_L^2 \otimes X_L^1\right) \circ \left(W_L^2 \otimes X_F^2\right)$$
where $X_L^1$ is feature logic map 1, $X_L^2$ is feature logic map 2, $X_F^2$ is feature map 2, $W_L^2$ is the weight matrix of logic extraction unit 2, $\otimes$ denotes convolution, and $\circ$ denotes the Hadamard product;
step 6, continuing to extract higher-level features with the next-layer feature extraction unit and to logically combine the high- and low-level features with the next-layer logic extraction unit, until the feature logic map of the highest layer is obtained, with the recurrence:
$$X_F^k = F\left(W_F^k \otimes X_F^{k-1} + W_S^k \otimes X_F^{k-1}\right)$$
$$X_L^k = \left(W_L^k \otimes X_L^{k-1}\right) \circ \left(W_L^k \otimes X_F^k\right)$$
where $k$ is the layer index ($k = 2, 3, \ldots$), $X_F^{k-1}$ is feature map $k-1$, $X_F^k$ is feature map $k$, $X_L^{k-1}$ is feature logic map $k-1$, $X_L^k$ is feature logic map $k$, $W_F^k$ is the weight matrix of feature extraction unit $k$, $W_S^k$ is the linear-mapping weight matrix in the skip connection of feature extraction unit $k$, $W_L^k$ is the weight matrix of logic extraction unit $k$, $F(\cdot)$ is a nonlinear activation function, and $\otimes$ denotes convolution;
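The recurrence can be wired up as in the following sketch, which reuses the FeatureExtractionUnit and LogicExtractionUnit sketches above and appends the reshape-and-classify operation of step 7 below. The depth, channel width, and image size are assumptions chosen to match the Cifar experiments described later.

```python
import torch
import torch.nn as nn

class FeatureLogicNet(nn.Module):
    """Feature extraction network plus logic network, per the step-6 recurrence."""
    def __init__(self, in_ch=3, width=64, num_units=3, num_classes=10, img_hw=32):
        super().__init__()
        self.first = nn.Conv2d(in_ch, width, kernel_size=3, padding=1)  # step 1, W_first
        self.feat_units = nn.ModuleList(
            [FeatureExtractionUnit(width, width) for _ in range(num_units)])
        self.logic_units = nn.ModuleList(
            [LogicExtractionUnit(width, width, width) for _ in range(num_units)])
        self.fc = nn.Linear(width * img_hw * img_hw, num_classes)       # step 7, W_FC

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        feat = self.first(x)   # feature map 0
        logic = feat           # low-level input to the first logic unit
        for f_unit, l_unit in zip(self.feat_units, self.logic_units):
            new_feat = f_unit(feat)           # feature map k
            logic = l_unit(logic, new_feat)   # feature logic map k
            feat = new_feat
        return self.fc(logic.flatten(1))      # Reshape(.) then the classifier
```

Note that the first logic unit receives feature map 0 and feature map 1 (step 3), and each later unit receives the previous feature logic map together with the current feature map, exactly as the recurrence prescribes.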
step 7, reforming the feature logic map of the highest layer into a one-dimensional tensor and feeding it into a classifier, which obtains the classification result of the image from the highest-layer feature logic;
$$X_{out} = W_{FC} \cdot \mathrm{Reshape}\left(X_L^{K}\right)$$
where $X_{out}$ is the classification result, $X_L^{K}$ is the feature logic map of the highest layer, $W_{FC}$ is the weight matrix of the fully connected layer in the classifier, and $\mathrm{Reshape}(\cdot)$ denotes reforming a variable into a one-dimensional tensor;
step 8, comparing the classification result with the image label, calculating the classification error, and adjusting the weights of the first-layer convolutional layer, each feature extraction unit, and each logic extraction unit with the error back-propagation algorithm; after multiple iterations, the optimal image classification model is obtained;
$$E = L\left(X_{lab}, X_{out}\right)$$
$$W_*' = W_* - \eta \frac{\partial E}{\partial W_*}$$
where $X_{out}$ is the classification result, $X_{lab}$ is the image label, $E$ is the classification error, $W_*$ denotes all weights in the model, $W_*'$ denotes those weights after adjustment by the error back-propagation algorithm, $L(\cdot)$ is the loss function, and $\eta$ is the learning rate;
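Step 8 corresponds to an ordinary supervised training loop. The sketch below assumes cross-entropy as the loss function $L(\cdot)$ and plain SGD for the update $W_*' = W_* - \eta\,\partial E / \partial W_*$; the patent does not fix either choice, and the hyperparameters are illustrative.

```python
import torch
import torch.nn as nn

def train(model, loader, epochs=10, eta=0.01, device="cpu"):
    loss_fn = nn.CrossEntropyLoss()                    # L(., .), assumed cross-entropy
    opt = torch.optim.SGD(model.parameters(), lr=eta)  # eta is the learning rate
    model.to(device).train()
    for _ in range(epochs):
        for images, labels in loader:                  # inputs X_in with labels X_lab
            images, labels = images.to(device), labels.to(device)
            out = model(images)                        # X_out, steps 1 to 7
            loss = loss_fn(out, labels)                # classification error E
            opt.zero_grad()
            loss.backward()                            # error back propagation
            opt.step()                                 # W' = W - eta * dE/dW
    return model
```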
step 9, classifying input images without image labels by using the optimal image classification model to obtain the categories of the input images.
Further, expanding the formula in step 3 yields feature logic map 1, which contains all possible feature logic combinations between feature map 0 and feature map 1;
$$X_L^1 = \left(W_L^1 \underset{(padding=1,\ stride=1)}{\otimes} X_F^0\right) \circ \left(W_L^1 \underset{(padding=1,\ stride=1)}{\otimes} X_F^1\right)$$
Expanded element-wise, each entry of $X_L^1$ is a weighted combination of AND terms $f_{ij}^0 \cdot f_{ij}^1$, OR terms $f_{ij}^0 + f_{ij}^1$, and NOT terms $-f_{ij}^{*}$, where $X_L^1$ is feature logic map 1, $X_F^0$ is feature map 0, $X_F^1$ is feature map 1, $W_L^1$ is the weight of logic extraction unit 1, $f_{ij}^0$ is an image feature of feature map 0, $f_{ij}^1$ is an image feature of feature map 1, $i, j$ are matrix indices ($i = 1, 2, \ldots, n$; $j = 1, 2, \ldots, m$), $n$ is the length of the image, $m$ is the width of the image, $*$ denotes any layer, $\otimes$ denotes convolution, $\circ$ denotes the Hadamard product, and $(padding = 1,\ stride = 1)$ indicates that one pixel is padded around the image and the convolution stride is 1.
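As a minimal worked illustration (assuming, for readability only, a 1×1 kernel with a single learnable weight per branch instead of the 3×3 kernels with padding 1 above), one entry of the feature logic map expands as

$$X_L^1\big|_{ij} = \left(w^0 f_{ij}^0\right)\left(w^1 f_{ij}^1\right) = w^0 w^1 \, f_{ij}^0 f_{ij}^1,$$

i.e. the AND term $f_{ij}^0 \cdot f_{ij}^1$ scaled by a learnable gain $w^0 w^1$; a negative learned weight realizes NOT ($-f_{ij}^{*}$), and with a larger kernel the summation inside each convolution contributes additive OR terms over neighbouring features before the Hadamard product is taken.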
Further, by applying the recurrence in step 6, the feature logic combinations among all high- and low-level features are obtained; the weights preceding each logic combination are then adjusted by the error back-propagation algorithm, so that the high- and low-level feature logic relations that contribute to classification accuracy are retained.
The principle of the invention is as follows. Traditional classification networks can extract robust image features for classification but ignore the potential logical relationships between features; moreover, classifying from the high-level features alone discards the large amount of detail contained in the low-level features. The invention therefore combines a feature extraction module with good robustness and the CDD module in a carefully designed framework: it extracts and logically combines the high- and low-level image features, and finally classifies according to the logical relations among them.
Compared with the prior art, the invention has the advantages that:
the advantages compared to conventional models (such as ResNet, WRN, etc.) that are used for classification only by extracting image features are: the invention considers the logic relation between the high-bottom characteristics, and the classification precision and the convergence speed are both obviously improved; the advantages compared to CDD networks that rely on logical relationships between features for classification are: the method has stronger feature extraction capability, and can extract image features of higher levels, so that the model is more robust.
Drawings
FIG. 1 is a flow chart of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, exemplary embodiments of the present invention are described in further detail below with reference to the accompanying drawings. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention. In addition, the technical features involved in the embodiments of the present invention described below may be combined with each other as long as they do not conflict with each other.
To illustrate the deep learning method that extracts high- and low-level feature logic for classification, a complete image classification model is constructed by the method and used to classify the image signals of two open-source datasets, Cifar10 and Cifar100; the classification target in this technical scheme can be a tensor of any dimension and size. The model is compared, under the same framework, with two conventional models (ResNet, WRN) that rely only on high-level features for classification. The model applied to ResNet is called ResNet-CDD, and the model applied to WRN is called WRN-CDD.
As shown in fig. 1, the present invention is composed of feature extraction units and logic extraction units. According to the image sizes in the Cifar10 and Cifar100 datasets, the input data are reshaped into 3×32×32 tensors, quantized, and fed into the model; the model is trained according to steps 1 to 8, and the optimal classification model is obtained after multiple iterations. The whole process of extracting features and obtaining beneficial feature logic combinations aims at reducing the error, and the iterative optimization is performed automatically by the error back-propagation algorithm without manual participation. The prediction (classification) process of the model is steps 1 to 7. Since the various weights in the model are adjusted and optimized on a large amount of training data, the error between the model outputs for the test-set image signals and their image labels also stays within an allowable range.
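A hedged sketch of this data pipeline, using torchvision to obtain the 3×32×32 Cifar10 tensors and driving the train() and FeatureLogicNet sketches above, is given below; the batch size and epoch count are illustrative assumptions.

```python
import torch
import torchvision
import torchvision.transforms as T

transform = T.ToTensor()  # yields 3x32x32 float tensors in [0, 1]
train_set = torchvision.datasets.CIFAR10(
    root="./data", train=True, download=True, transform=transform)
loader = torch.utils.data.DataLoader(train_set, batch_size=128, shuffle=True)

model = FeatureLogicNet(num_classes=10)           # use num_classes=100 for Cifar100
model = train(model, loader, epochs=10, eta=0.01)
```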
Tables 1, 2, 3 and 4 compare the present invention with the two conventional models (ResNet, WRN), which classify only by means of high-level features, on the two open-source datasets Cifar10 and Cifar100 under the same framework; the model applied to ResNet is called ResNet-CDD, and the model applied to WRN is called WRN-CDD.
The following phenomena can be seen from Tables 1, 2, 3 and 4. On the Cifar10 dataset, where classification is less difficult, the present invention improves the original model substantially when the original model is small (few layers, narrow width, or few parameters, i.e., weak feature extraction capability); as the model grows (i.e., its feature extraction capability strengthens), the original model's accuracy gradually approaches that of the present method. On the Cifar100 dataset, where classification is more difficult, the method improves the original model considerably, and the improvement likewise shrinks as the original model grows from small to large (i.e., as its feature extraction capability increases).
TABLE 1 Comparative experimental results of the present invention (ResNet-CDD) and the original model (ResNet) on the Cifar10 dataset
(table reproduced as an image in the original publication)
TABLE 2 Comparative experimental results of the present invention (ResNet-CDD) and the original model (ResNet) on the Cifar100 dataset
(table reproduced as an image in the original publication)
TABLE 3 Comparative experimental results of the present invention (WRN-CDD) and the original model (WRN) on the Cifar10 dataset
(table reproduced as an image in the original publication)
TABLE 4 Comparative experimental results of the present invention (WRN-CDD) and the original model (WRN) on the Cifar100 dataset
(table reproduced as an image in the original publication)
First, the above results show that the present invention offers higher accuracy, faster convergence, and better robustness. Second, the invention greatly strengthens classification models with weak feature extraction capability, which accords with the characteristics of human vision: in fact, humans can achieve high-accuracy classification by extracting only a few low-level image features (such as contour, texture, and color) and then combining them logically in the brain, and humans are not limited to combining features of the same level but can combine all available high- and low-level features. Meanwhile, if the recognized object is simple and only a few simple features are needed to distinguish it, human vision does not need to classify through logical combinations among features, which explains why the original models can reach the same level of classification accuracy as the present invention on the Cifar10 dataset.
Those skilled in the art will appreciate that the invention may be practiced without these specific details.
The above examples are provided only for the purpose of describing the present invention, and are not intended to limit the scope of the present invention. The scope of the invention is defined by the appended claims. Various equivalent substitutions and modifications can be made without departing from the spirit and principles of the invention, and are intended to be within the scope of the invention.

Claims (1)

1. A deep learning method for image classification based on logic extracted from high- and low-level features, characterized by comprising a feature extraction network and a logic network, wherein: the feature extraction network is formed by connecting a plurality of feature extraction units in series, and the logic network is formed by connecting a plurality of logic extraction units in series; the feature extraction units are convolutional layers or residual blocks, and the logic extraction units are convolutional dendritic modules composed of convolutional layers and Hadamard products; the units are connected such that the input of the feature extraction unit at a given layer is the output of the feature extraction unit at the previous layer, and the input of the logic extraction unit at a given layer is the output of the logic extraction unit at the previous layer together with the output of the feature extraction unit at the same layer; the feature extraction network is single-input, multi-output and extracts image features at every level from low to high, its input being the image to be classified and its outputs being the feature maps of all levels; the logic network is multi-input, single-output and constructs the logical relations among the high- and low-level image features and classifies according to those relations, its inputs being the feature maps of each layer and its output being the classification result; the method specifically comprises the following steps:
step 1, sending an input image with an image label into a first-layer convolutional layer for dimension conversion and primary feature extraction to obtain a feature map 0;
$$X_F^0 = W_{first} \otimes X_{in}$$
where $X_F^0$ is feature map 0, $X_{in}$ is the input image, $W_{first}$ is the weight matrix of the first convolutional layer, and $\otimes$ denotes convolution;
step 2, sending the feature map 0 into a feature extraction unit 1 to obtain a feature map 1 after feature extraction;
$$X_F^1 = F\left(W_F^1 \otimes X_F^0 + W_S^1 \otimes X_F^0\right)$$
where $X_F^1$ is feature map 1, $W_F^1$ is the weight matrix of feature extraction unit 1, $W_S^1$ is the linear-mapping weight matrix in the skip connection of feature extraction unit 1, $F(\cdot)$ is a nonlinear activation function, and $\otimes$ denotes convolution;
step 3, simultaneously sending feature map 0 and feature map 1 into logic extraction unit 1, and obtaining feature logic map 1 after high- and low-level feature logic combination, wherein feature logic map 1 contains all possible feature logic combinations between feature map 0 and feature map 1;
$$X_L^1 = \left(W_L^1 \underset{(padding=1,\ stride=1)}{\otimes} X_F^0\right) \circ \left(W_L^1 \underset{(padding=1,\ stride=1)}{\otimes} X_F^1\right)$$
Expanded element-wise, each entry of $X_L^1$ is a weighted combination of AND terms $f_{ij}^0 \cdot f_{ij}^1$, OR terms $f_{ij}^0 + f_{ij}^1$, and NOT terms $-f_{ij}^{*}$, where $X_L^1$ is feature logic map 1, $X_F^0$ is feature map 0, $X_F^1$ is feature map 1, $W_L^1$ is the weight of logic extraction unit 1, $f_{ij}^0$ is an image feature of feature map 0, $f_{ij}^1$ is an image feature of feature map 1, $i, j$ are matrix indices ($i = 1, 2, \ldots, n$; $j = 1, 2, \ldots, m$), $n$ is the length of the image, $m$ is the width of the image, $*$ denotes any layer, $\otimes$ denotes convolution, and $\circ$ denotes the Hadamard product;
step 4, sending the feature map 1 into a feature extraction unit 2, and obtaining a feature map 2 after feature extraction;
$$X_F^2 = F\left(W_F^2 \otimes X_F^1 + W_S^2 \otimes X_F^1\right)$$
where $X_F^2$ is feature map 2, $W_F^2$ is the weight matrix of feature extraction unit 2, and $W_S^2$ is the linear-mapping weight matrix in the skip connection of feature extraction unit 2;
step 5, simultaneously sending feature logic map 1 and feature map 2 into logic extraction unit 2, and obtaining feature logic map 2 after high- and low-level feature logic combination;
$$X_L^2 = \left(W_L^2 \otimes X_L^1\right) \circ \left(W_L^2 \otimes X_F^2\right)$$
where $X_L^2$ is feature logic map 2 and $W_L^2$ is the weight matrix of logic extraction unit 2;
step 6, continuing to extract higher-level features with the next-layer feature extraction unit and to logically combine the high- and low-level features with the next-layer logic extraction unit, until the feature logic map of the highest layer is obtained, this map containing the feature logic combinations among all high- and low-level features, with the recurrence:
$$X_F^k = F\left(W_F^k \otimes X_F^{k-1} + W_S^k \otimes X_F^{k-1}\right)$$
$$X_L^k = \left(W_L^k \otimes X_L^{k-1}\right) \circ \left(W_L^k \otimes X_F^k\right)$$
where $k$ is the layer index ($k = 2, 3, \ldots$), $X_F^{k-1}$ is feature map $k-1$, $X_F^k$ is feature map $k$, $X_L^{k-1}$ is feature logic map $k-1$, $X_L^k$ is feature logic map $k$, $W_F^k$ is the weight matrix of feature extraction unit $k$, and $W_S^k$ is the linear-mapping weight matrix in the skip connection of feature extraction unit $k$;
step 7, reforming the feature logic map of the highest layer into a one-dimensional tensor and feeding it into a classifier, which obtains the classification result of the image from the highest-layer feature logic;
$$X_{out} = W_{FC} \cdot \mathrm{Reshape}\left(X_L^{K}\right)$$
where $X_{out}$ is the classification result, $X_L^{K}$ is the feature logic map of the highest layer, $W_{FC}$ is the weight matrix of the fully connected layer in the classifier, and $\mathrm{Reshape}(\cdot)$ denotes reforming a variable into a one-dimensional tensor;
step 8, comparing the classification result with the image label, calculating a classification error, adjusting the weights of the first-layer convolutional layer, each feature extraction unit and each logic extraction unit by using an error back propagation algorithm, and obtaining an optimal image classification model after multiple iterations;
$$E = L\left(X_{lab}, X_{out}\right)$$
$$W_*' = W_* - \eta \frac{\partial E}{\partial W_*}$$
where $X_{out}$ is the classification result, $X_{lab}$ is the image label, $E$ is the classification error, $W_*$ denotes all weights in the model, $W_*'$ denotes those weights after adjustment by the error back-propagation algorithm, $L(\cdot)$ is the loss function, and $\eta$ is the learning rate;
step 9, classifying input images without image labels by using the optimal image classification model to obtain the categories of the input images.
CN202110758190.3A 2021-07-05 2021-07-05 Deep learning method for image classification based on extracted high-low layer feature logic Active CN113537325B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110758190.3A CN113537325B (en) 2021-07-05 2021-07-05 Deep learning method for image classification based on extracted high-low layer feature logic

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110758190.3A CN113537325B (en) 2021-07-05 2021-07-05 Deep learning method for image classification based on extracted high-low layer feature logic

Publications (2)

Publication Number Publication Date
CN113537325A true CN113537325A (en) 2021-10-22
CN113537325B CN113537325B (en) 2023-07-11

Family

ID=78126781

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110758190.3A Active CN113537325B (en) 2021-07-05 2021-07-05 Deep learning method for image classification based on extracted high-low layer feature logic

Country Status (1)

Country Link
CN (1) CN113537325B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114886440A (en) * 2022-07-13 2022-08-12 武汉工程大学 Epileptic sample discharge classification model training and recognition method, system and electronic equipment

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020216227A1 (en) * 2019-04-24 2020-10-29 华为技术有限公司 Image classification method and apparatus, and data processing method and apparatus
WO2021057810A1 (en) * 2019-09-29 2021-04-01 深圳数字生命研究院 Data processing method, data training method, data identifying method and device, and storage medium
CN112926691A (en) * 2021-04-07 2021-06-08 北京航空航天大学 Convolutional dendrite method for extracting feature logic for classification

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020216227A1 (en) * 2019-04-24 2020-10-29 华为技术有限公司 Image classification method and apparatus, and data processing method and apparatus
WO2021057810A1 (en) * 2019-09-29 2021-04-01 深圳数字生命研究院 Data processing method, data training method, data identifying method and device, and storage medium
CN112926691A (en) * 2021-04-07 2021-06-08 北京航空航天大学 Convolutional dendrite method for extracting feature logic for classification

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
CHEN RUIRUI: "Image classification algorithm based on deep convolutional neural network", Journal of Henan Institute of Science and Technology (Natural Science Edition), no. 04 *

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114886440A (en) * 2022-07-13 2022-08-12 武汉工程大学 Epileptic sample discharge classification model training and recognition method, system and electronic equipment

Also Published As

Publication number Publication date
CN113537325B (en) 2023-07-11

Similar Documents

Publication Publication Date Title
CN111368896B (en) Hyperspectral remote sensing image classification method based on dense residual three-dimensional convolutional neural network
CN109711426B (en) Pathological image classification device and method based on GAN and transfer learning
CN108399421B (en) Deep zero sample classification method based on word embedding
CN108399428B (en) Triple loss function design method based on trace ratio criterion
CN111242288B (en) Multi-scale parallel deep neural network model construction method for lesion image segmentation
CN109783666B (en) Image scene graph generation method based on iterative refinement
CN110516539A (en) Remote sensing image building extracting method, system, storage medium and equipment based on confrontation network
CN112784801A (en) Text and picture-based bimodal gastric disease classification method and device
CN109389171B (en) Medical image classification method based on multi-granularity convolution noise reduction automatic encoder technology
CN113240683B (en) Attention mechanism-based lightweight semantic segmentation model construction method
CN112950780B (en) Intelligent network map generation method and system based on remote sensing image
CN108595558B (en) Image annotation method based on data equalization strategy and multi-feature fusion
CN108615231A (en) A kind of full reference picture assessment method for encoding quality based on neural network learning fusion
CN104036242A (en) Object recognition method based on convolutional restricted Boltzmann machine combining Centering Trick
CN113255602A (en) Dynamic gesture recognition method based on multi-modal data
CN113537325A (en) Deep learning method for image classification based on logic of extracting high-low-level features
CN114170657A (en) Facial emotion recognition method integrating attention mechanism and high-order feature representation
Lonij et al. Open-world visual recognition using knowledge graphs
CN114295967A (en) Analog circuit fault diagnosis method based on migration neural network
CN111639751A (en) Non-zero padding training method for binary convolutional neural network
CN116993639A (en) Visible light and infrared image fusion method based on structural re-parameterization
Di et al. FDNet: An end-to-end fusion decomposition network for infrared and visible images
CN114494284B (en) Scene analysis model and method based on explicit supervision area relation
CN113486929B (en) Rock slice image identification method based on residual shrinkage module and attention mechanism
CN115063374A (en) Model training method, face image quality scoring method, electronic device and storage medium

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant