CN112926691A - Convolutional dendrite method for extracting feature logic for classification

Info

Publication number: CN112926691A
Application number: CN202110373833.2A
Authority: CN
Inventors: 马辛; 付幸文; 孙亦琦
Original assignee: Beihang University
Current assignee: Beihang University
Priority date: 2021-04-07
Filing date: 2021-04-07
Publication date: 2021-06-08

Abstract

The invention discloses a convolutional dendrite method for extracting feature logic for classification, belonging to the field of artificial intelligence. The method classifies input data by means of its feature logic: it extracts data features and, at the same time, captures the logical relationships among those features. It comprises the following steps: (1) convolving a weight matrix with an input image signal or time-series signal to extract features of the input data, obtaining feature data; (2) summing the feature data with a bias matrix, which introduces a constant term, to obtain intermediate data; (3) taking the Hadamard product of the intermediate data, which contains the feature data and the constant term, with itself to construct logical relationships among the features; (4) iteratively optimizing the weight matrix and the bias matrix with the error back-propagation algorithm, so that the feature logical relationships that contribute to classification accuracy are retained as the output data of the layer, the output data of the layer being the input data of the next layer. The method has a small computational cost, high classification accuracy, good model stability, fast convergence, and strong portability.

Description

Convolutional dendrite method for extracting feature logic for classification

Technical Field

The invention relates to the field of artificial intelligence, and in particular to a convolutional dendrite method that classifies image signals or time-series signals by extracting feature logic.

Background

Classification is a fundamental problem in many areas, including fault diagnosis, automation, computer vision (CV), and natural language processing (NLP), and machine learning has long been a useful tool for solving it. Classification is the task of dividing data according to sample features, so it is natural to treat the problem as one of finding a suitable classification curve or surface. However, machine learning algorithms that follow this strategy produce only a black-box model. Most traditional classification models divide the data set by finding a suitable classification curve or surface from the sample features, but neglect the logical relationships among those features. In fact, the logical relationships among sample features are crucial for classification.

Biological dendrites in the brain have been shown to be capable of the logical operations AND, OR, and NOT. The existing model that imitates this function of biological dendrites is the dendritic network. The dendritic network differs from the conventional soma network in two respects: 1. it takes the logical relationships of the input data into account; 2. it replaces the nonlinear activation function of the traditional soma network with dendrites, which yields a white-box model, so that the number of dendritic layers can be adjusted in a targeted manner according to the classification or fitting result of the model, effectively avoiding the over-fitting and under-fitting caused by a network that is too deep or too shallow. For the sake of distinction, the present invention refers to a network that divides a data set by finding a suitable classification curve or surface from sample features as a soma network, and to a network that classifies by means of the logical relationships among input data or features as a dendritic network.

Although the dendritic network performs well on mainstream classification data sets such as MNIST and FASHION-MNIST, it still has the following three problems: 1. the dendritic network flattens the picture data into a one-dimensional tensor and sends all of the data into the network without first extracting features; although the logical relationships among all input data are thereby considered, not all of the data contribute to the classification accuracy of a picture; 2. when the dendritic network performs logic extraction, the dimensionality of the data must be kept the same before and after transmission, which makes the dendritic network difficult to embed into other networks; 3. in the process of logic extraction, the dendritic network loses the logical relationships extracted by earlier layers of the network.

Disclosure of Invention

The technical problem to be solved by the invention is that the traditional dendritic network cannot extract features, is difficult to embed into other networks, and loses the logical relationships extracted by earlier layers of the network. To address this, a convolutional dendrite method for extracting feature logic for classification is designed for the classification field.

The technical solution adopted by the invention to solve this technical problem is a convolutional dendrite method for extracting feature logic for classification, comprising the following steps:

Step 1, taking an image signal or a time-series signal as input data, and convolving a weight matrix with the input data to extract features of the input data, obtaining feature data;

Step 2, summing the feature data with a bias matrix, which introduces a constant term, to obtain intermediate data;

Step 3, taking the Hadamard product of the intermediate data, which contains the feature data and the constant term, with itself to construct logical relationships among the features;

Step 4, iteratively optimizing the weight matrix and the bias matrix with the error back-propagation algorithm, so that the feature logical relationships that contribute to classification accuracy are retained as the output data of the layer, the output data of the layer being the input data of the next layer;

Step 5, repeating steps 1 to 4 to construct deeper logical relationships for classification.

Further, in step 1, the convolution of the weight matrix with the input data to obtain the feature data is specifically:

$$X_F^{n} = W^{n} \ast X_{in}^{n}$$

where $n$ is the layer index, $X_F$ is the feature data, $X_{in}$ is the input data, $W$ is the weight matrix, and $\ast$ denotes convolution.
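The convolution in step 1 can be illustrated with a small sketch. It assumes stride 1, no padding, and the sliding-window form without kernel flipping that deep-learning libraries usually call convolution; the source does not specify any of these, and the function name `conv2d_valid` and the numeric values are hypothetical.

```python
def conv2d_valid(x, w):
    """Slide kernel w over x (stride 1, no padding) and return the feature map.

    Assumption: no kernel flip, i.e. the cross-correlation form that
    deep-learning frameworks usually call "convolution".
    """
    kh, kw = len(w), len(w[0])
    out_h = len(x) - kh + 1
    out_w = len(x[0]) - kw + 1
    return [
        [
            sum(w[a][b] * x[i + a][j + b] for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


# Illustrative 3 x 3 input and 2 x 2 weight matrix (values are made up).
x_in = [[1, 2, 3],
        [4, 5, 6],
        [7, 8, 9]]
w = [[1, 0],
     [0, -1]]

x_f = conv2d_valid(x_in, w)  # feature data, shape 2 x 2
print(x_f)
```

Each output entry depends only on the inputs under the kernel window, which is how the convolution restricts the later logic construction to local features instead of the whole flattened image.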

Further, in step 2, the feature data is summed with the bias matrix to obtain the intermediate data:

$$X_M^{n} = X_F^{n} + B^{n}$$

where $n$ is the layer index, $X_M$ is the intermediate data, $X_F$ is the feature data, and $B$ is the bias matrix;

Further, in step 3, the Hadamard product of the intermediate data with itself constructs the logical relationships among the features and yields the output data of the layer:

$$X_{out}^{n} = X_M^{n} \circ X_M^{n}$$

where $n$ is the layer index, $X_{out}$ is the output data, $X_M$ is the intermediate data, and $\circ$ denotes the Hadamard product;
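Steps 2 and 3 are both element-wise operations, which a short sketch can make concrete. It assumes the bias matrix has the same shape as the feature data; the helper names `add_bias` and `hadamard` and all numeric values are illustrative and not from the source.

```python
def add_bias(x_f, b):
    """Step 2: element-wise sum of feature data and bias matrix."""
    return [[f + c for f, c in zip(f_row, b_row)] for f_row, b_row in zip(x_f, b)]


def hadamard(p, q):
    """Step 3: element-wise (Hadamard) product of two same-shape matrices."""
    return [[u * v for u, v in zip(p_row, q_row)] for p_row, q_row in zip(p, q)]


# Made-up feature data and bias matrix of the same shape (an assumption).
x_f = [[1.0, -2.0],
       [0.5, 3.0]]
b = [[1.0, 1.0],
     [-0.5, 0.0]]

x_m = add_bias(x_f, b)      # intermediate data
x_out = hadamard(x_m, x_m)  # output data of the layer
print(x_m)
print(x_out)
```

Because both operations act entry by entry, the output has the same shape as the feature data, so no extra dimension bookkeeping is needed between steps 2 and 3.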

Further, in step 4, the output data of the layer is the input data of the next layer:

$$X_{in}^{n+1} = X_{out}^{n}$$

where $n$ is the layer index, $X_{in}$ is the input data, and $X_{out}$ is the output data.

Repeating the steps (1) to (4) can construct a deeper logical relationship for classification.

The principle of the invention is as follows: 1. Convolving the weight matrix with the input data extracts the features of the input data and yields the feature data. This eliminates a large amount of data that does not contribute to, or even degrades, classification accuracy, so the computational cost is reduced while classification accuracy, model stability, and convergence speed are improved. 2. Summing the feature data with the bias matrix yields the intermediate data; this operation introduces a constant term into the feature data. 3. Taking the Hadamard product of the intermediate data with itself yields the output data. Because the intermediate data is the sum of the feature data and the bias matrix (that is, a constant term), the output data obtained from this Hadamard product retains the feature logic extracted by the preceding layers of the network. In addition, dimension changes during data transmission need not be considered, which makes the model highly portable. 4. The output data of the layer is sent to the next layer of the network as input data, so that the network extracts the logical relationships among the features while extracting the data features, and classification is finally performed according to those logical relationships.
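The role of the constant term in point 3 can be made explicit by expanding a single output element. The following is an illustrative derivation, not taken from the source, for one output position whose convolution window covers inputs $x_1, \dots, x_K$ with weights $w_1, \dots, w_K$ and bias entry $b$ (all of these symbols are introduced here for illustration only):

```latex
\begin{aligned}
x_F &= \sum_{k=1}^{K} w_k x_k, \qquad x_M = x_F + b, \\
x_{out} &= x_M \cdot x_M = (x_F + b)^2 \\
        &= \sum_{k=1}^{K} \sum_{l=1}^{K} w_k w_l \, x_k x_l
           \;+\; 2b \sum_{k=1}^{K} w_k x_k \;+\; b^2 .
\end{aligned}
```

The first term contains products of pairs of inputs, while the second term passes the first-order features through and vanishes when $b = 0$. This is one way to read the statement that the constant term lets the layer retain previously extracted feature logic alongside the newly constructed product terms.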

Compared with the prior art, the invention has the advantages that:

Compared with the traditional soma network (such as a multilayer perceptron or a convolutional neural network with various activation functions), which divides a data set by finding a suitable classification curve or surface from sample features, the advantages are: (1) the logical relationships of the input data are taken into account; (2) the nonlinear activation function of the traditional soma network is replaced by dendrites, which yields a white-box model, so that the number of dendritic layers can be adjusted in a targeted manner according to the classification or fitting result of the model, effectively avoiding the over-fitting and under-fitting caused by a network that is too deep or too shallow.

Compared with the traditional dendritic network, the advantages are: redundant data is reduced by extracting feature data, so the method has a smaller computational cost, higher classification accuracy, better model stability, and faster convergence; by adding a bias matrix to introduce a constant term and then taking the Hadamard product of the intermediate data with itself, the feature logic extracted by preceding layers of the network is retained, and dimension changes during data transmission need not be considered, so the method is more portable; and the inability of the traditional dendritic network to extract features is overcome, since the logical relationships among the features are extracted at the same time as the data features.

Drawings

FIG. 1 is a flow chart of the present invention;

FIG. 2 shows a classification model constructed using the present invention;

FIG. 3 shows the framework for comparing the classification model constructed with the present invention against conventional mainstream models;

FIG. 4 shows the results of the comparative experiments on the MNIST data set;

FIG. 5 shows the results of the comparative experiments on the FASHION-MNIST data set.

Detailed Description

To make the objects, technical solutions, and advantages of the present invention clearer, exemplary embodiments of the invention are described in further detail below with reference to the accompanying drawings. It should be understood that the specific embodiments described herein merely illustrate the invention and are not intended to limit it. In addition, the technical features involved in the embodiments described below may be combined with each other as long as they do not conflict. The classification target in the foregoing technical solution may be a tensor of any dimension and size. Taking input data in the form of a 3 × 3 × 1 tensor as an example, the following describes the convolutional dendrite method for extracting feature logic for classification according to the invention. The method can be applied in the field of artificial intelligence, for example to an image signal or a time-series signal acquired by a circuit. As shown in FIG. 1, the method comprises the following steps:

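Step 1, taking an image signal or a time-series signal as input data, and convolving the weight matrix with the input data to obtain the feature data:

$$X_F^{n} = W^{n} \ast X_{in}^{n}$$

where $n$ is the layer index, $X_F$ is the feature data, $X_{in}$ is the input data, $W$ is the weight matrix, and $\ast$ denotes convolution;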

Step 2, summing the feature data with the bias matrix, which introduces a constant term, to obtain the intermediate data:

$$X_M^{n} = X_F^{n} + B^{n}$$

where $n$ is the layer index, $X_M$ is the intermediate data, $X_F$ is the feature data, and $B$ is the bias matrix;

Step 3, taking the Hadamard product of the intermediate data, which contains the feature data and the constant term, with itself to construct the logical relationships among the features and obtain the output data of the layer:

$$X_{out}^{n} = X_M^{n} \circ X_M^{n}$$

where $n$ is the layer index, $X_{out}$ is the output data, $X_M$ is the intermediate data, and $\circ$ denotes the Hadamard product;

From the expansion of X_out it can be seen that X_out contains AND logic (x·x), OR logic (x + x), and NOT logic (-x); that is, the logical relationships required for classification are constructed. The combination of feature logical relationships that contributes to classification accuracy can then be obtained by adjusting the weight matrix and the bias matrix with the error back-propagation algorithm.

Step 4, iteratively optimizing the weight matrix and the bias matrix with the error back-propagation algorithm, so that the feature logical relationships that contribute to classification accuracy are retained as the output data of the layer, the output data of the layer being the input data of the next layer:

$$X_{in}^{n+1} = X_{out}^{n}$$

where $n$ is the layer index, $X_{in}$ is the input data, and $X_{out}$ is the output data.

Step 5, repeating steps 1 to 4 to construct deeper logical relationships for classification.
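The forward part of the steps above can be sketched end to end for a 3 × 3 × 1 input. This is a minimal sketch in plain Python, assuming stride 1, no padding, 2 × 2 kernels, a bias matrix with the same shape as the feature data, and hand-picked weights in place of trained ones; none of these choices come from the source, and the names `conv2d_valid`, `cdd_layer`, and `cdd_forward` are hypothetical.

```python
def conv2d_valid(x, w):
    """Step 1: sliding-window convolution, stride 1, no padding (assumed)."""
    kh, kw = len(w), len(w[0])
    return [
        [
            sum(w[a][b] * x[i + a][j + b] for a in range(kh) for b in range(kw))
            for j in range(len(x[0]) - kw + 1)
        ]
        for i in range(len(x) - kh + 1)
    ]


def cdd_layer(x_in, w, b):
    """One layer: convolution, bias sum, then Hadamard product with itself."""
    x_f = conv2d_valid(x_in, w)                                  # step 1
    x_m = [[f + c for f, c in zip(fr, br)] for fr, br in zip(x_f, b)]  # step 2
    return [[m * m for m in row] for row in x_m]                 # step 3


def cdd_forward(x, layers):
    """Output of each layer becomes the input of the next (step 4 data flow)."""
    for w, b in layers:
        x = cdd_layer(x, w, b)
    return x


# Illustrative 3 x 3 single-channel input and two hand-picked layers.
x = [[1, 0, 1],
     [0, 1, 0],
     [1, 0, 1]]
layers = [
    ([[1, 0], [0, 1]], [[0, 1], [1, 0]]),  # 3 x 3 -> 2 x 2
    ([[1, -1], [-1, 1]], [[1]]),           # 2 x 2 -> 1 x 1
]

print(cdd_layer(x, *layers[0]))
print(cdd_forward(x, layers))
```

In the method itself the weights and biases would be learned by error back-propagation; here they are fixed so that only the data flow between layers is visible.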

To further describe how the invention is applied, a complete image classification model is constructed with it and used to classify the image signals of two open-source data sets, MNIST and FASHION-MNIST, and is compared with current mainstream classification models under the same framework. For convenience of description, the classification method of the present invention is abbreviated as CDD, and the classification model constructed with it is referred to as CDD Net.

As shown in FIG. 2, CDD Net consists of several CDD modules and a fully connected layer. Training CDD Net comprises the following process:

Step 1, converting the image signals in the MNIST and FASHION-MNIST training sets into tensors of size 1 × 28 × 28 and sending them into a CDD module, where they are processed according to steps 1 to 4 of the method above: features are extracted by convolving the weight matrix with the image signal, and feature logic combinations are then constructed by the Hadamard product of the intermediate data with itself, yielding relatively shallow feature logic combinations as output data;

Step 2, taking the output data of the previous CDD module as the input data of the next CDD module, and continuing to extract relatively deep logic combinations;

Step 3, repeating process step 2 until the output of the last CDD module is obtained, and then sending that output to the fully connected layer to obtain a classification result;

Step 4, calculating the error between the classification result and the known label, and optimizing the weight matrix and the bias matrix with the error back-propagation algorithm;

Step 5, repeating process steps 1 to 4 for iterative optimization until the error between the output and the image label is within an allowable range, so that the feature logic combinations that contribute to classification accuracy are obtained, which yields the optimized CDD Net.

The whole process of extracting features and obtaining the beneficial feature logic combinations aims at reducing the error, and is carried out automatically by iterative optimization with the error back-propagation algorithm, without manual intervention. Prediction with CDD Net follows process steps 1 to 3 of training: features are extracted, and the output used for classification is then obtained from the beneficial combinations of logical relationships among the features. Since the weight matrices and bias matrices inside CDD Net have been adjusted and optimized on the large amount of data in the training set, the error between the output of CDD Net for an image signal in the test set and the image label is also within the allowable range.
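The optimization in process steps 4 and 5 can be illustrated on the smallest possible case. The sketch below reduces one CDD layer to scalars (a single weight `w`, bias `b`, input `x`, and target `t`) and applies plain gradient descent with a squared-error loss. The loss function, learning rate, step count, and all values are assumptions made for illustration; the source only states that the error back-propagation algorithm is used.

```python
def loss_and_grads(w, b, x, t):
    """Forward and backward pass for a scalar layer y = (w * x + b) ** 2.

    Assumption: squared-error loss (y - t) ** 2; the source does not name one.
    """
    m = w * x + b                # intermediate data: feature plus bias
    y = m * m                    # Hadamard product with itself (scalar case)
    err = y - t
    dm = 2.0 * err * 2.0 * m     # chain rule: dL/dy * dy/dm
    return err * err, dm * x, dm  # loss, dL/dw, dL/db


# Illustrative starting point, sample, and learning rate (all made up).
w, b = 0.5, 0.5
x, t = 2.0, 4.0
lr = 0.01

losses = []
for _ in range(50):
    loss, dw, db = loss_and_grads(w, b, x, t)
    losses.append(loss)
    w -= lr * dw                 # gradient-descent update of the weight
    b -= lr * db                 # gradient-descent update of the bias

print(losses[0], losses[-1])
```

In a full CDD Net the same chain rule would be applied through every CDD module and the fully connected layer, typically by an automatic-differentiation framework rather than by hand.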

As shown in FIG. 3, the present invention (CDD for short) is compared with current mainstream classification models under the same framework, including the convolutional neural network (CNN), which excels at image recognition, the traditional dendritic network (DD), and the classical multilayer perceptron (MLP) with different activation functions, where Relu, Sig, and Tsig are all common activation functions. The data sets used are MNIST and FASHION-MNIST, two open-source data sets that are mainstream in the field of artificial intelligence. To provide a controlled comparison, all models were built with similar structures.

The comparison results obtained on MNIST and FASHION-MNIST are shown in FIG. 4 and FIG. 5 respectively. It can be seen from FIG. 4 and FIG. 5 that the present invention has the highest test accuracy, the fastest convergence, and the best stability, which shows that: 1. CDD approaches the global minimum, rather than a local minimum, fastest and most closely; 2. CDD quickly extracts sample features and learns the logical relationships among them while avoiding interference from useless data (which is the problem with DD), so it has high accuracy and fast convergence; 3. CDD is more stable than conventional soma networks (such as CNN or MLP), which generally use embedded nonlinear functions to find a hyperplane.

Those skilled in the art will appreciate that the invention may be practiced without these specific details.

The above examples are provided only for the purpose of describing the present invention, and are not intended to limit the scope of the present invention. The scope of the invention is defined by the appended claims. Various equivalent substitutions and modifications can be made without departing from the spirit and principles of the invention, and are intended to be within the scope of the invention.

Claims

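1. A convolutional dendrite method for extracting feature logic for classification, comprising the steps of:

step 1, taking an image signal or a time-series signal as input data, and convolving a weight matrix with the input data to extract features of the input data, obtaining feature data;

step 2, summing the feature data with a bias matrix, which introduces a constant term, to obtain intermediate data;

step 3, taking the Hadamard product of the intermediate data, which contains the feature data and the constant term, with itself to construct logical relationships among the features;

step 4, iteratively optimizing the weight matrix and the bias matrix with the error back-propagation algorithm, so that the feature logical relationships that contribute to classification accuracy are retained as the output data of the layer, the output data of the layer being the input data of the next layer;

step 5, repeating steps 1 to 4 to construct deeper logical relationships for classification.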

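2. The convolutional dendrite method for extracting feature logic for classification according to claim 1, wherein in step 1 the convolution of the weight matrix with the input data to obtain the feature data is specifically:

$$X_F^{n} = W^{n} \ast X_{in}^{n}$$

where $n$ is the layer index, $X_F$ is the feature data, $X_{in}$ is the input data, $W$ is the weight matrix, and $\ast$ denotes convolution.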

3. The convolutional dendrite method for extracting feature logic for classification according to claim 1, wherein in step 2 the feature data is summed with the bias matrix to obtain the intermediate data:

$$X_M^{n} = X_F^{n} + B^{n}$$

where $n$ is the layer index, $X_M$ is the intermediate data, $X_F$ is the feature data, and $B$ is the bias matrix.

4. The convolutional dendrite method for extracting feature logic for classification according to claim 1, wherein in step 3 the logical relationships among the features are constructed by taking the Hadamard product of the intermediate data with itself, obtaining the output data of the layer:

$$X_{out}^{n} = X_M^{n} \circ X_M^{n}$$

where $n$ is the layer index, $X_{out}$ is the output data, $X_M$ is the intermediate data, and $\circ$ denotes the Hadamard product.

5. The convolutional dendrite method for extracting feature logic for classification according to claim 1, wherein in step 4 the output data of the layer is the input data of the next layer:

$$X_{in}^{n+1} = X_{out}^{n}$$

where $n$ is the layer index, $X_{in}$ is the input data, and $X_{out}$ is the output data.