CN109087264B - Method for making a network attend to important parts of data based on a deep network - Google Patents

Method for making a network attend to important parts of data based on a deep network

Info

Publication number
CN109087264B
CN109087264B
Authority
CN
China
Prior art keywords
feature map
original
similarity matrix
network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810891937.0A
Other languages
Chinese (zh)
Other versions
CN109087264A (en)
Inventor
李秀 (Li Xiu)
龙如蛟 (Long Rujiao)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Graduate School Tsinghua University
Original Assignee
Shenzhen Graduate School Tsinghua University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Graduate School Tsinghua University
Priority to CN201810891937.0A
Publication of CN109087264A
Application granted
Publication of CN109087264B
Active
Anticipated expiration

Classifications

    • G06T5/94
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/08: Learning methods
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00: Indexing scheme for image analysis or image enhancement
    • G06T2207/20: Special algorithmic details
    • G06T2207/20081: Training; Learning

Abstract

The invention discloses a method, based on a deep network, for making the network attend to the important parts of data. The method comprises: A1, vectorizing the original feature map, i.e. representing each pixel of the original feature map by a vector; A2, obtaining a similarity matrix through self-comparison learning and normalizing it to obtain a reconstructed feature map X*; A3, comparing the reconstructed feature map X* with the original feature map and then performing iterative processing. After the original feature map is vectorized, it is reconstructed to obtain the reconstructed feature map X*, and convergence is achieved through iterative processing. No parameters need to be introduced, important regions become very salient, and the recognition capability of the network is improved.

Description

Method for making a network attend to important parts of data based on a deep network
Technical Field
The invention relates to the field of computer vision, and in particular to a method, based on a deep network, for making the network attend to important parts of data; in the field of computer vision this is also called an attention mechanism.
Background
Human vision acquires a target region requiring attention by rapidly scanning the global image, then devotes more attentional resources to that region while suppressing other, useless information. Although this is a human instinct, a neural network has no such ability to judge; it treats each pixel equally, which limits the expressive ability of the network.
With the introduction of an attention mechanism, a neural network begins to learn to attend to important information. An attention mechanism may act on spatial pixels, so that the network attends to important spatial regions; it may act on feature-map channels, so that the network learns category semantics; it may also act on the time dimension to capture behaviors, actions, and so on. When acting on spatial pixels, a weight in the interval 0 to 1 must be learned for each pixel, and each pixel is then multiplied by its weight to give the new pixel value. Important pixels learn larger weights and less important pixels learn smaller weights, which amplifies important regions and suppresses the others, thereby simulating the human attention function.
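By way of illustration, the following is a minimal NumPy sketch of the generic pixel-weighting idea just described; the function names and the use of a sigmoid to squash scores into (0, 1) are assumptions made here for illustration, not part of the invention.

```python
import numpy as np

# Illustrative sketch only: generic spatial attention as described above,
# where each pixel is scaled by a weight in (0, 1). Names are hypothetical.
def apply_spatial_attention(feature_map, attention_scores):
    """feature_map: [h, w, c]; attention_scores: [h, w] learned scores."""
    weights = 1.0 / (1.0 + np.exp(-attention_scores))  # squash to (0, 1)
    return feature_map * weights[..., None]            # reweight every pixel
```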
Although attention mechanisms have had considerable success in the field of computer vision, existing results have the following disadvantages:
1) The network design is very complicated and the universality is poor. For example, although the network in "Residual Attention Network for Image Classification" works well, the process of learning the attention weights is so complicated and heavyweight that other researchers can hardly reuse the result.
2) In all current attention mechanisms, learning the weights requires introducing additional parameters. Increasing the parameter count causes various problems, such as easy overfitting, high demands on training hardware, and the inability to migrate to mobile devices.
Therefore, learning attention weights from the information contained in the data itself, without introducing additional parameters, is an urgent problem to be solved: it addresses the large parameter counts of deep networks, their tendency to overfit, models too large to be ported to mobile devices, and the question of how to extract important image information.
Disclosure of Invention
The invention aims to solve the problem in the prior art of how to obtain important image information in a neural network without introducing additional network parameters, and provides a method, based on a deep network, for making the network attend to important parts of data, that is, a parameter-free attention-mechanism design method based on a deep network.
In order to solve the above technical problem, the present application adopts the following technical scheme:
A method for making a network attend to important parts of data based on a deep network, comprising the following steps:
A1, vectorizing the original feature map (X), i.e. representing each pixel of the original feature map (X) by a vector;
A2, obtaining a similarity matrix (W) by self-comparison learning and normalizing it to obtain a reconstructed feature map (X*);
A3, comparing the reconstructed feature map (X*) with the original feature map (X) and then performing iterative processing so that the important regions become salient.
Preferably, in step A1, vectorizing the original feature map specifically comprises:
expressing the original feature map X of dimension [h, w, c] as:

X = [x_1, x_2, …, x_N]^T

where N = h × w, h is the length of the original feature map, w is the width of the original feature map, the number of elements in each vector is c, each pixel is a vector of length c, and T is the transpose operation; x_k denotes the k-th pixel in the feature map.
Preferably, in step A2, the similarity between any two pixels is compared by multiplying the original feature map by the transpose of the original feature map, that is, W = XX^T, obtaining the similarity matrix (W).
Preferably, in step A2, normalizing the similarity matrix (W) means normalizing the similarity matrix (W) with a softmax function.
Preferably, in step A2, the reconstructed feature map (X*) is obtained by normalizing the similarity matrix (W) and then multiplying the normalized similarity matrix by the original feature map (X).
Preferably, the reconstructed feature map (X*) is compared with the original feature map (X) by transposed multiplication, that is, W_1 = X(X*)^T, obtaining the similarity matrix (W_1) of the reconstructed feature map (X*).
Preferably, the similarity matrix (W_1) of the reconstructed feature map X* is iterated with the similarity matrix (W) of the original feature map (X) until the algorithm converges, obtaining a similarity matrix (W_0) embedded with the weight information of the important regions.
Preferably, the (X*) obtained from the previous iteration is fed into the next iteration as input to obtain the next (X*), and this process is repeated until the algorithm converges.
Preferably, the first iteration compares the original feature map (X) with itself.
Preferably, the algorithm converges after 4 iterations.
Compared with the prior art, the invention has the beneficial effects that:
the method for making network notice important part of data based on deep network of the invention is to convert original characteristic diagram X into vector quantity and then convert the vector quantity into original characteristic diagram XPerforming product to obtain a similarity matrix W of the original characteristic diagram, and normalizing the similarity matrix W to obtain a reconstructed characteristic diagram X*Then, the feature map X is reconstructed*The similarity matrix is iterated with the similarity matrix W of the original characteristic diagram to obtain the similarity matrix embedded with the importance information weight, network parameters do not need to be additionally introduced, the important area becomes very obvious, and the network identification capability is improved.
Further, compared with a scalar, a vector can convey and express more information, so more of the importance features of the original feature map X are retained.
Further, the similarity matrix is learned by self-comparison within the feature map, without introducing more parameters, thereby preventing overfitting and overly large models.
Furthermore, the method can be designed as a universal module suitable for insertion into any layer of any convolutional network, and thus has very strong universality.
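As an illustration of this universality, the following PyTorch sketch shows how such a parameter-free module might be dropped between convolution layers. The patent gives no reference code, so the class, its name, and the fixed iteration count are assumptions made here for illustration only.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ParameterFreeAttention(nn.Module):
    """Hypothetical drop-in module sketching the method; no learnable weights."""
    def __init__(self, num_iters=4):
        super().__init__()
        self.num_iters = num_iters  # the text reports convergence in ~4 iterations

    def forward(self, x):                        # x: [b, c, h, w]
        b, c, h, w = x.shape
        X = x.flatten(2).transpose(1, 2)         # [b, N, c], N = h*w
        W = X @ X.transpose(1, 2)                # eq. (1): W = X X^T
        X_star = F.softmax(W, dim=1) @ X         # eq. (4): column softmax, then PX
        for _ in range(self.num_iters - 1):
            W = W + X @ X_star.transpose(1, 2)   # eq. (6): X supervises X*
            X_star = F.softmax(W, dim=1) @ X     # eq. (7)
        return X_star.transpose(1, 2).reshape(b, c, h, w)

# Usage, e.g.: nn.Sequential(nn.Conv2d(3, 64, 3, padding=1), ParameterFreeAttention())
```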
Drawings
FIG. 1 is a schematic diagram of the algorithm structure of the present invention;
FIG. 2 is another algorithm flow chart of the present invention.
Detailed Description
The present invention will be described in further detail below with reference to specific embodiments and the accompanying drawings. It should be emphasized that the following description is merely exemplary and is not intended to limit the scope of the invention or its application.
The invention relates to a method for making a network attend to important parts of data based on a deep network, which comprises the following steps:
A1, vectorizing the original feature map X, i.e. representing each pixel of the original feature map X by a vector;
A2, obtaining a similarity matrix W by self-comparison learning and normalizing it to obtain a reconstructed feature map X*;
A3, comparing the reconstructed feature map X* with the original feature map X and then performing iterative processing so that the important regions become salient.
The algorithm flow of the method of this embodiment for making the network attend to important parts of data based on a deep network is shown in FIG. 1, wherein:
On the left is the first iteration of the algorithm, in which the original feature map is compared with itself for learning.
On the right is the remaining iteration process, in which the reconstructed feature map is compared with the original feature map.
Because the importance information is embedded into the weights during comparative learning and introduced into the similarity matrix W, reconstructing the feature map with the similarity matrix W is in fact a re-selection of features, and the important regions in the reconstructed feature map become very salient, thereby improving the recognition capability of the network.
Specifically, the parameter-free attention-mechanism design method based on a deep network of this embodiment proceeds according to the following steps:
Step A1, vectorizing the original feature map X.
Let the original feature map X be a feature map of dimension [h, w, c], where h and w denote the length and width of the original feature map and c is the number of channels.
The original feature map X therefore has h × w pixels, each with c channels, and can be viewed as consisting of h × w vectors, each with c elements. The original feature map X is expressed as
X = [x_1, x_2, …, x_N]^T, N = h × w,

where T is the transpose operation and x_k denotes the k-th pixel in the original feature map; each pixel is a vector of length c.
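A minimal NumPy sketch of step A1, assuming a row-major reshape so that row k of X holds the k-th pixel's c-vector:

```python
import numpy as np

def vectorize(feature_map):
    """Step A1: view an [h, w, c] feature map as N = h*w pixel vectors."""
    h, w, c = feature_map.shape
    return feature_map.reshape(h * w, c)  # X: [N, c], row k = k-th pixel
```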
Step A2, obtaining the similarity matrix W by self-comparison learning and normalizing it to obtain the reconstructed feature map X*, which comprises the following steps:
2.1 Self-comparison learning to obtain the similarity matrix W

W = XX^T    (1)
where X = [x_1, x_2, …, x_N]^T denotes the original feature map and X^T denotes the transpose of the original feature map X, namely X^T = [x_1, x_2, …, x_N].
The meaning expressed by equation (1) is: the original feature map X is multiplied with the transpose of the original feature map X; that is, starting from the first pixel, the inner product of each pixel of the original feature map X is taken with every other pixel of the input feature map. Because

x_i · x_j = ‖x_i‖ ‖x_j‖ cos θ

the smaller the angle between any two vectors, the larger (more similar) the result of their inner product. The result of XX^T is therefore a comparison of the similarity between any two pixels, giving the similarity matrix W:

W = (W_ij), W_ij = x_i · x_j, i, j = 1, …, N    (2)
The i-th row of the similarity matrix W represents the result of comparing the similarity of the i-th pixel with all pixels.
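Continuing the NumPy sketch above, equation (1) is a single matrix product:

```python
def self_similarity(X):
    """Step 2.1 (eq. (1)): W = X X^T, all pairwise pixel inner products.

    X: [N, c]; returns W: [N, N] with W[i, j] = <x_i, x_j>.
    """
    return X @ X.T
```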
2.2 Normalization processing
Each column of the similarity matrix W is normalized with the softmax function (the 0 in softmax(W, 0) denotes column normalization), giving the normalized matrix of the original feature map X, namely

P = softmax(W, 0), with p_ij = exp(w_ij) / Σ_k exp(w_kj)    (3)

where each element p_ij of the matrix is a number between 0 and 1, indicating the similarity ratio between pixels.
As can be seen from equation (3), the closer two pixels are, the larger the weight of the corresponding element in the normalized matrix.
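Column normalization can be sketched as follows; the max-subtraction is a standard numerical-stability trick added here, not part of the patent text:

```python
def softmax_columns(W):
    """Step 2.2 (eq. (3)): softmax(W, 0), normalizing each column to sum to 1."""
    W_shifted = W - W.max(axis=0, keepdims=True)  # stability; result unchanged
    e = np.exp(W_shifted)
    return e / e.sum(axis=0, keepdims=True)
```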
2.3 Reconstruction of the feature map X*
The original feature map X is reconstructed using the similarity ratios between pixels, obtaining a reconstructed feature map X* with salient features, i.e.
X* = softmax(W, 0) X    (4)
Specifically,

x_i* = Σ_j p_ij x_j, j = 1, …, N    (5)
the weight values of all elements in the normalized matrix are used for representing the mutual relation between the pixels, the larger the weight value is, the closer the two pixels are, the greater the contribution to each other is during reconstruction, therefore, the relevance and the importance among the pixels are embedded into all the elements in the normalized matrix, and the feature diagram X is reconstructed*Is the process of re-selecting features using the similarity matrix W. Thus reconstructing the feature map X*Becomes stronger relative to the similarity matrix X. Reconstruction of feature map X*The important area represented by (1) will be given more weight to be highlighted.
Step A3: the reconstructed feature map X* is compared with the original feature map X and then iteratively processed so that the important regions become salient:

W = W_old + X(X*)^T    (6)
X* = softmax(W, 0) X    (7)
the X obtained by the i-1 th iterationi-1 *Sending the ith iteration as input to obtain the ith Xi *… … repeat this process until the algorithm converges.
Considering that the similarity matrix W is obtained by self-comparison learning, it is not necessarily correct: if the similarity matrix W is problematic, the reconstructed feature map X* will also be problematic, and if the reconstructed feature map X* were compared only with itself, the comparison results would become more and more wrong. The present application therefore compares the reconstructed feature map X* with the original feature map X, so that the original feature map X plays a supervisory role, similar to the transition matrix of the PageRank algorithm.
In general, this process converges after 3 to 4 iterations.
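Putting steps A1 to A3 together, a minimal sketch of the full iteration (equations (1), (4), (6) and (7)), reusing the functions defined above and assuming the roughly 4 iterations the text reports are sufficient:

```python
def attend_to_important_parts(feature_map, num_iters=4):
    """Full pipeline sketch: vectorize, self-compare, reconstruct, iterate."""
    h, w, c = feature_map.shape
    X = vectorize(feature_map)                       # A1
    W = self_similarity(X)                           # first iteration: X vs. itself
    X_star = reconstruct(softmax_columns(W), X)      # A2
    for _ in range(num_iters - 1):                   # A3: remaining iterations
        W = W + X @ X_star.T                         # eq. (6): W_old + X (X*)^T
        X_star = reconstruct(softmax_columns(W), X)  # eq. (7)
    return X_star.reshape(h, w, c)
```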
Specifically, steps A1 to A3 of this embodiment are processed by the following algorithm flow, as shown in FIG. 2, with the following specific steps:
A11, start;
A12, vectorize the original feature map: X → [x_1, …, x_N]^T;
A13, initialize the variable i: 0 → i;
A14, self-comparison: XX^T → W;
A15, normalization: softmax(W, 0) → P;
A16, reconstruct the feature map: PX → X*;
A17, iterate: i + 1 → i;
A18, feed X*_(i-1) into the i-th iteration as input to obtain the i-th reconstructed feature map X*_i: W_old + X(X*)^T → W;
A19, normalize: softmax(W, 0) → P, where the 0 denotes normalization of the columns of W;
A20, multiply the newly learned weights P by the original feature map to reconstruct a new feature map, PX → X*, and determine whether the number of iterations exceeds the set value (i > 4?): if yes, continue; if not, return to A17;
A21, output the converged reconstructed feature map;
A22, end.
Here, steps A11 to A12 correspond to step A1, steps A13 to A16 correspond to step A2, and the remaining steps correspond to step A3.
The foregoing is a further detailed description of the invention in connection with specific preferred embodiments, and the implementation of the invention is not to be considered limited to these descriptions. For those skilled in the art to which the invention pertains, several equivalent substitutions or obvious modifications can be made without departing from the spirit of the invention, and all such variants are considered to fall within the scope of the invention.

Claims (8)

1. A method for making a network attend to important parts of image data based on a deep network, comprising the following steps:
A1, vectorizing the original feature map X of the image, i.e. representing each pixel of the original feature map X by a vector;
A2, obtaining a similarity matrix W by self-comparison learning and normalizing it to obtain a reconstructed feature map X*;
A3, comparing the reconstructed feature map X* with the original feature map X and then performing iterative processing so that the important regions become salient;
wherein the reconstructed feature map X* is compared with the original feature map X by transposed multiplication, i.e. W_1 = X(X*)^T, obtaining the similarity matrix W_1 of the reconstructed feature map X*;
and the similarity matrix W_1 of the reconstructed feature map X* is iterated with the similarity matrix W of the original feature map X until the algorithm converges, obtaining a similarity matrix W_0 embedded with the weight information of the important regions.
2. The method according to claim 1, wherein in step A1, vectorizing the original feature map specifically comprises:
expressing the original feature map X of dimension [h, w, c] as:

X = [x_1, x_2, …, x_N]^T

where N = h × w, h is the length of the original feature map, w is the width of the original feature map, the number of elements in each vector is c, each pixel is a vector of length c, and T is the transpose operation; x_k denotes the k-th pixel in the feature map, k being a number from 1 to N.
3. The method according to claim 1, wherein in step A2, the similarity between any two pixels is obtained by multiplying the original feature map by the transpose of the original feature map, i.e. W = XX^T, obtaining the similarity matrix W.
4. The method according to claim 1, wherein in step A2, normalizing the similarity matrix W means normalizing the similarity matrix W with a softmax function.
5. The method according to claim 1, wherein in step A2, the reconstructed feature map X* is obtained by normalizing the similarity matrix W and then multiplying the normalized similarity matrix by the original feature map X.
6. The method of claim 1, wherein the X* obtained from the previous iteration is fed into the next iteration as input to obtain the next X*, this process being repeated until the algorithm converges.
7. The method of claim 1, wherein the first iteration compares the original feature map X with itself.
8. The method of claim 1, wherein the algorithm converges after 4 iterations.
CN201810891937.0A 2018-08-07 2018-08-07 Method for making a network attend to important parts of data based on a deep network Active CN109087264B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810891937.0A CN109087264B (en) 2018-08-07 2018-08-07 Method for making a network attend to important parts of data based on a deep network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810891937.0A CN109087264B (en) 2018-08-07 2018-08-07 Method for making a network attend to important parts of data based on a deep network

Publications (2)

Publication Number Publication Date
CN109087264A CN109087264A (en) 2018-12-25
CN109087264B true CN109087264B (en) 2021-04-09

Family

ID=64834210

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810891937.0A Active CN109087264B (en) Method for making a network attend to important parts of data based on a deep network

Country Status (1)

Country Link
CN (1) CN109087264B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106780468A (en) * 2016-12-22 2017-05-31 中国计量大学 View-based access control model perceives the conspicuousness detection method of positive feedback
CN108334901A (en) * 2018-01-30 2018-07-27 福州大学 A kind of flowers image classification method of the convolutional neural networks of combination salient region

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106506901B (en) * 2016-09-18 2019-05-10 昆明理工大学 A kind of hybrid digital picture halftoning method of significance visual attention model

Also Published As

Publication number Publication date
CN109087264A (en) 2018-12-25

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant