CN113537325B

CN113537325B - Deep learning method for image classification based on extracted high-low layer feature logic

Info

Publication number: CN113537325B
Application number: CN202110758190.3A
Authority: CN
Inventors: 马辛; 付幸文; 孙亦琦
Original assignee: Beihang University
Current assignee: Beihang University
Priority date: 2021-07-05
Filing date: 2021-07-05
Publication date: 2023-07-11
Anticipated expiration: 2041-07-05
Also published as: CN113537325A

Abstract

The invention discloses a deep learning method based on feature logic of an extraction high-low layer for image classification, which consists of a feature extraction network and a logic network, wherein the feature extraction network is formed by connecting a plurality of feature extraction units in series, the logic network is formed by connecting a plurality of logic extraction units in series, the feature extraction units are convolution layers or residual blocks, the logic extraction units are convolution dendritic modules formed by convolution layers and Hadamard products, the feature extraction network is single-input multi-output, is used for extracting image features of all layers from low to high, is input into images to be classified, is output into feature images of all layers, the logic network is multi-input single-output, is used for constructing a logic relation between the features of the high-low layer and classifying according to the logic relation, is input into feature images of all layers, and is output into classification results.

Description

Deep learning method for image classification based on extracted high-low layer feature logic

Technical Field

The invention relates to the field of artificial intelligence, in particular to a deep learning method for classifying by extracting high-low layer feature logic applied to image signals.

Background

Classification is a fundamental problem in many fields such as fault diagnosis, automation, computer Vision (CV), and Natural Language Processing (NLP). Deep learning has been a useful tool to solve classification problems. Feature extraction has been a key to Convolutional Neural Networks (CNNs) mimicking human vision. The CNN performs feature extraction on the input image by convolution check. Due to the ability to translate, etc., and to stratify, CNNs can extract deeper image features for classification. Later, researchers improved deep CNNs with jump connections to yield residual neural networks res net. The model solves the problems of gradient disappearance, gradient dispersion and degradation of deep CNN, so that the neural network has stronger feature extraction capability. Biological dendrites in the brain have proven to have logical and/or/non-logical computing capabilities, enabling the calculation of logical relationships between input signals. An existing model that mimics biological dendritic functions is a dendritic network (DD) that can provide a measure for logical extraction, and as the number of layers increases, the DD can build more complex logical relationships. The convolution dendritic network (CDD) inherits the logic extraction capability of the DD and the feature extraction capability of the CNN, and makes up the defects that the DD cannot perform feature extraction and the CNN cannot perform logic extraction. CDD is therefore more concerned with the combination of features and relies on the logical relationships between features for classification. Many models in the field of target detection, such as Feature Pyramid Networks (FPN), densNet, YOLOv3, etc., improve the detection accuracy of small objects by fusing high-low layer features. In the image segmentation field, a large number of semantic structures are maintained by a high-level semantic segmentation result, but a small structure is seriously lost, a large number of details are maintained by a segmentation result of a low-level feature, but semantic category prediction is poor, so that a semantic segmentation model with better performance is obtained by fusing the high-level feature and the low-level feature and combining the advantages of the high-level feature and the low-level feature.

Many attempts have been made to emulate human visual perceptibility, but the following problems still remain. The traditional neural network model cannot do logical extraction between features. CDD inherits the advantages of DD and CNN, but the feature extraction layer of CDD is shallow, so that the feature extraction capability is weak. In addition, because the neural network has the hierarchical expression capability of the features, whether the high-level features or the low-level features contain a large amount of information of the images, the existing classification network only uses the high-level features to classify and can lose a large amount of information of the low-level features.

Disclosure of Invention

The invention aims to solve the technical problems that the traditional network cannot perform logic extraction between features, the feature extraction capability of a convolutional dendritic network (CDD) is weak, and the existing neural network cannot combine high-low-level features. The method is mainly applied to the field of image classification, can improve the classification precision of all traditional models which are used for classification only by extracting image features, and has the advantages of high classification precision, high convergence speed, good robustness and the like.

The technical scheme adopted for solving the technical problems is as follows: the deep learning method based on the feature logic of the high and low layers is characterized by comprising a feature extraction network and a logic network, wherein the feature extraction network is formed by connecting a plurality of feature extraction units in series, the logic network is formed by connecting a plurality of logic extraction units in series, the feature extraction units are convolution layers or residual blocks, the logic extraction units are convolution dendritic modules formed by convolution layers and Hadamard products, the connection relation among the units is that the input of the feature extraction unit of the layer is the output of the feature extraction unit of the upper layer, the input of the feature extraction unit of the layer is the output of the feature extraction unit of the upper layer and the output of the feature extraction unit of the layer, the feature extraction network is single-input multi-output, the function is to extract the image features of each layer from low to high, the input is the feature map of each layer, the logic network is multi-input single-output, the function is to construct the logic relation between the image features of the high and classify according to the logic relation, the input is the feature map of each layer, and the output is the classifying result, and the method specifically comprises the following steps:

step 1, an input image with an image tag is sent into a first layer convolution layer to perform dimension transformation and preliminary feature extraction to obtain a feature map 0;

in the method, in the process of the invention,

is characteristic diagram 0, X _in To input an image W _first Weight matrix for first layer convolution layer, +.>

Is a convolution;

step 2, sending the feature map 0 into a feature extraction unit 1 to obtain a feature map 1 after feature extraction;

in the method, in the process of the invention,

is characteristic diagram 0, X _F ¹ Is characterized by FIG. 1, W _F ¹ A weight matrix W for the feature extraction unit 1 _S ¹ For the linear mapping weight matrix in the jump connection in the feature extraction unit 1, F (·) is a nonlinear activation function,/->

Is a convolution;

step 3, simultaneously sending the feature map 0 and the feature map 1 to a logic extraction unit 1, and obtaining a feature logic map 1 after high-low layer feature logic combination;

in the method, in the process of the invention,

logic diagram 1, for features>

Is characteristic diagram 0, X _F ¹ For the features of FIG. 1, < >>

Weight matrix for logic extraction unit 1, +.>

Convolution, O is Hadamard product;

step 4, sending the feature map 1 into a feature extraction unit 2 to obtain a feature map 2 after feature extraction;

wherein X is _F ¹ As a feature of the figure 1 of the drawings,

for the features of FIG. 2, < >>

A weight matrix W for the feature extraction unit 2 _S ² For the linear mapping weight matrix in the jump connection in the feature extraction unit 2, F (·) is a nonlinear activation function,/->

Is a convolution;

step 5, the feature logic diagram 1 and the feature diagram 2 are simultaneously sent to a logic extraction unit 2, and the feature logic diagram 2 is obtained after high-low-layer feature logic combination;

in the method, in the process of the invention,

logic diagram 1, for features>

Is characterized by logic diagram 2, X _F ¹ Is characterized by FIG. 1, X _F ² For the features of FIG. 2, < >>

For the weight matrix of the logic extraction unit 2, +.>

Convolution, O is Hadamard product;

and step 6, continuously extracting higher-level features by using a next-level feature extraction unit and logically combining the higher-level features and the lower-level features by using the next-level logic extraction unit until a highest-level feature logic diagram is obtained, wherein a recursive formula is expressed as follows:

where k is a layer flag (k=2, 3, …),

is a characteristic diagram k-1,/or->

For the feature map k->

Is a characteristic logic diagram k-1,/o>

Is a characteristic logic diagram k->

Weight matrix for feature extraction unit k, W _S ^k For the linear mapping weight matrix in the jump connection in the feature extraction unit k,/for the feature extraction unit k>

For the weight matrix of the logic extraction unit 2, F (·) is a nonlinear activation function, ++>

Is a convolution;

step 7, reforming the highest-level feature logic diagram into a one-dimensional tensor and sending the one-dimensional tensor into a classifier, and obtaining a classification result of the image by the classifier according to the highest-level feature logic;

wherein X is _out In order to classify the result of the classification,

is the feature logic diagram of the highest layer, W _FC For the weight matrix of the full connection layer in the classifier, reshape (·) represents reforming variables into one-dimensional tensors;

step 8, comparing the classification result with the image label, calculating classification errors, and adjusting weights of the first-layer convolution layer, each feature extraction unit and the logic extraction unit by using an error back propagation algorithm to obtain an optimal image classification model after multiple iterations;

E＝L(X _lab ,X _out )

wherein X is _out For classification result, X _lab For image labels, E is the classification error, W _* For all weights in the model, W _* ' is all weights of the model after adjustment of an error back propagation algorithm, L (·) is a loss function, and η is a learning rate;

and 9, classifying the input image without the image tag by using an optimal image classification model to obtain the category to which the input image belongs.

Further, expanding the formula in the step 3, so as to obtain a feature logic diagram 1 including all possible feature logic combinations between the feature diagram 0 and the feature diagram 1;

in the method, in the process of the invention,

logic diagram 1, for features>

Is characteristic diagram 0, X _F ¹ For the features of FIG. 1, < >>

For logically extracting the weight of cell 1, f _ij ⁰ Image features, f, of feature map 0 _ij ¹ For characterizing the image features of FIG. 1, AND logic f _ij ⁰ ·f _ij ¹ Or logic f _ij ⁰ +f _ij ¹ Non-logic-f _ij ^* ，/>

Is the weight of the logic extraction unit 1, where i, j is the matrix index (i=1, 2, …, n; j=1, 2, …, m), n is the length of the image, m is the width of the image, and represents any layer>

For convolution, o is hadamard product, (padding=1, tap=1) indicates that one pixel is filled around the image and the convolution step is 1.

Further, by using the recurrence formula in step 6, feature logic combinations between all high-level and low-level features can be obtained, and then the weights before logic combination are adjusted by an error back propagation algorithm, so that the high-level and low-level feature logic relationship contributing to classification accuracy can be obtained.

The principle of the invention is as follows: traditional classification networks can extract more robust image features for classification, but ignore potential logical relationships between features. Meanwhile, the classification mode is only dependent on the high-level features, and a large number of detail features contained in the low-level features are ignored. Therefore, the invention combines the characteristic extraction module with better network robustness with the CDD module through a carefully designed architecture, extracts and combines the high-bottom image characteristics, and finally classifies the image characteristics by means of the logic relationship among the characteristics.

Compared with the prior art, the invention has the advantages that:

the advantages compared to conventional models (e.g., resNet, WRN, etc.) that are used for classification by extracting image features alone are: the invention considers the logic relation between the high-low-level characteristics, and obviously improves the classification precision and the convergence rate; the advantages compared to CDD networks that rely on logical relationships between features for classification are: the method has stronger feature extraction capability, and can extract higher-level image features, so that the model is more robust.

Drawings

Fig. 1 is a flow chart of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, exemplary embodiments of the present invention will be described in further detail below with reference to the accompanying drawings of the specification. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention. In addition, the technical features of the embodiments of the present invention described below may be combined with each other as long as they do not collide with each other.

In order to illustrate a deep learning method of extracting high-low level feature logic for classification according to the present invention, a complete image classification model is constructed by the present invention, and the image signals of two open source data sets Cifar10 and Cifar100 are classified and compared with two conventional models (res net, WRN) which rely on only high-level features for classification. And the model of the invention applied to ResNet is called ResNet-CDD, and the model applied to WRN is called WRN-CDD.

As shown in fig. 1, the present invention is composed of a feature extraction unit and a logic extraction unit. According to the size of the middle image of the Cifar10 and Cifar100 data, the input data is replaced by 3×32×32 tensors, the images are quantized and sent into a model, training is carried out on the model according to the steps 1 to 8, and the optimal classification model can be obtained after multiple iterations. The whole process of extracting the characteristics and obtaining the logic combination of the beneficial characteristics aims at reducing errors, and the iterative optimization is automatically carried out by taking an error back propagation algorithm as a method, so that manual participation is not needed. The model predictive classification process is steps 1 to 7. Because the weights in the model are already adjusted and optimized by a large amount of data in the training set, the error between the output of the image signal in the test set obtained by the model and the image label is also within the allowable range.

Tables 1,2, 3 and 4 show that in order to compare the present invention with two conventional models (ResNet, WRN) classified by only high-level features on two open source data sets, cifar10 and Cifar100, under the same framework, the model of the present invention applied to ResNet is called ResNet-CDD, and the model applied to WRN is called WRN-CDD.

From tables 1,2, 3 and 4, the following phenomena can be seen: on the Cifar10 data set with smaller classification difficulty, when the original model is smaller (the number of layers or the width or the parameter quantity is smaller, namely, the feature extraction capability is weaker), the precision of the original model is improved greatly, and when the model is gradually enlarged (namely, the feature extraction capability is gradually enhanced), the precision of the original model gradually reaches the precision level of the invention. On the Cifar100 data set with larger classification difficulty, the invention greatly improves the original model, and also presents a trend that the original model is improved from large to small as the original model is reduced from small to large (namely, the feature extraction capability of the original model is improved from large to strong).

TABLE 1 comparative experimental results of the invention (ResNet-CDD) with the prototype (ResNet) in the Cifar10 dataset

TABLE 2 comparative experimental results of the invention (ResNet-CDD) with the prototype (ResNet) in the Cifar100 dataset

TABLE 3 comparative experimental results of the invention (WRN-CDD) with the prototype model (WRN) in the Cifar10 dataset

TABLE 4 comparative experimental results of the invention (WRN-CDD) with the prototype model (WRN) in the Cifar100 dataset

First, the advantages of the invention, such as higher accuracy, faster convergence speed, and better robustness, can be seen from the foregoing. Secondly, the invention can greatly enhance the classification model with weaker feature extraction capability, which accords with the visual characteristics of human beings, and in practice, the human beings can realize high-precision classification only by extracting some low-level image features (such as outlines, textures, colors and the like) and then performing logical combination of brains, and the human beings are not limited to only logically combining the features of the same level, but combining all available high-level features. It should also be noted that if the identified object is relatively simple, only a few simple features are needed to distinguish, and that human vision does not need to be classified by logical combinations between features, which explains that the original model can achieve the same level of classification accuracy on the Cifar10 dataset as the present invention.

What is not described in detail in the present specification belongs to the prior art known to those skilled in the art.

The above examples are provided for the purpose of describing the present invention only and are not intended to limit the scope of the present invention. The scope of the invention is defined by the appended claims. Various equivalents and modifications that do not depart from the spirit and principles of the invention are intended to be included within the scope of the invention.

Claims

1. The deep learning method based on the feature logic of the high and low layers is characterized by comprising a feature extraction network and a logic network, wherein the feature extraction network is formed by connecting a plurality of feature extraction units in series, the logic network is formed by connecting a plurality of logic extraction units in series, the feature extraction units are convolution layers or residual blocks, the logic extraction units are convolution dendritic modules formed by convolution layers and Hadamard products, the connection relation among the units is that the input of the feature extraction unit of the layer is the output of the feature extraction unit of the upper layer, the input of the feature extraction unit of the layer is the output of the feature extraction unit of the upper layer and the output of the feature extraction unit of the layer, the feature extraction network is single-input multi-output, the function is to extract the image features of each layer from low to high, the input is the feature map of each layer, the logic network is multi-input single-output, the function is to construct the logic relation between the image features of the high and classify according to the logic relation, the input is the feature map of each layer, and the output is the classifying result, and the method specifically comprises the following steps:

wherein X is _F ⁰ Is characteristic diagram 0, X _in To input an image W _first Is the weight matrix of the first layer convolution layer,

is a convolution;

wherein X is _F ¹ Is characterized by FIG. 1, W _F ¹ A weight matrix W for the feature extraction unit 1 _S ¹ For the linear mapping weight matrix in the jump connection in the feature extraction unit 1, F (·) is a nonlinear activation function,

is a convolution;

step 3, simultaneously sending the feature map 0 and the feature map 1 to a logic extraction unit 1, and obtaining a feature map 1 after high-low layer feature logic combination, wherein the feature map 1 comprises all possible feature logic combinations between the feature map 0 and the feature map 1;

in the method, in the process of the invention,

is characterized by logic diagram 1, X _F ⁰ Is characteristic diagram 0, X _F ¹ For the features of FIG. 1, < >>

For convolution +.>

Is Hadamard product;

wherein X is _F ² Is characterized by FIG. 2, W _F ² A weight matrix W for the feature extraction unit 2 _S ² A linear mapping weight matrix in jump connection in the feature extraction unit 2;

in the method, in the process of the invention,

logic diagram 2, for feature>

A weight matrix which is a logic extraction unit 2;

and step 6, continuously extracting higher-level features by using a next-level feature extraction unit and logically combining the higher-level features and the lower-level features by using the next-level logic extraction unit until a highest-level feature logic diagram is obtained, wherein the feature logic diagram comprises feature logic combinations among all the higher-level features and the lower-level features, and a recursive formula is expressed as follows:

where k is a layer mark, k=2, 3, …, X _F ^k-1 For the characteristic diagram k-1, X _F ^k As a characteristic map k of the graph,

as a characteristic logic diagram k-1,

for the characteristic logic diagram k, W _F ^k Weight matrix for feature extraction unit k, W _S ^k A linear mapping weight matrix in jump connection in the feature extraction unit k;

wherein X is _out In order to classify the result of the classification,

step 8, comparing the classification result with the image label, calculating classification errors, and adjusting weights of a first-layer convolution layer, each feature extraction unit and a logic extraction unit by using an error back propagation algorithm, so as to obtain an optimal image classification model after multiple iterations;

E＝L(X _lab ,X _out )