CN109087264B - Method for making a network attend to important parts of data based on a deep network - Google Patents

Method for making a network attend to important parts of data based on a deep network

Info

Publication number
CN109087264B
CN109087264B
Authority
CN
China
Prior art keywords
feature map
original
similarity matrix
network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810891937.0A
Other languages
Chinese (zh)
Other versions
CN109087264A (en)
Inventor
李秀 (Li Xiu)
龙如蛟 (Long Rujiao)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Graduate School Tsinghua University
Original Assignee
Shenzhen Graduate School Tsinghua University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Graduate School Tsinghua University
Priority to CN201810891937.0A
Publication of CN109087264A
Application granted
Publication of CN109087264B
Active
Anticipated expiration

Classifications

    • G06T5/94
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/08: Learning methods
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00: Indexing scheme for image analysis or image enhancement
    • G06T2207/20: Special algorithmic details
    • G06T2207/20081: Training; Learning

Abstract

The invention discloses a method, based on a deep network, for making the network attend to the important parts of data. The method comprises: A1, vectorizing the original feature map, i.e. representing each pixel of the original feature map by a vector; A2, obtaining a similarity matrix through self-comparison learning and normalizing it to obtain a reconstructed feature map X*; A3, comparing the reconstructed feature map X* with the original feature map and then performing iterative processing. After the original feature map is vectorized, it is reconstructed to obtain the reconstructed feature map X*, and convergence is achieved through iterative processing. No parameters need to be introduced, important regions become very salient, and the recognition capability of the network is improved.

Description

Method for making a network attend to important parts of data based on a deep network
Technical Field
The invention relates to the field of computer vision, and in particular to a method, based on a deep network, for making the network attend to important parts of data; in the field of computer vision this is also called an attention mechanism.
Background
Human vision acquires a target region requiring attention by rapidly scanning the global image, then devotes more attentional resources to that region while suppressing other, useless information. Although this is a human instinct, a neural network has no such ability to judge; it treats each pixel equally, which limits the expressive ability of the network.
With the introduction of an attention mechanism, a neural network begins to learn to attend to important information. An attention mechanism may act on spatial pixels, so that the network attends to important spatial regions; it may act on feature-map channels, so that the network learns category semantics; it may also act on the time dimension to capture behaviors, actions, and so on. When acting on spatial pixels, a weight in the interval 0 to 1 must be learned for each pixel, and each pixel is then multiplied by its weight to give the new pixel value. Important pixels learn larger weights and less important pixels learn smaller weights, which amplifies important regions and suppresses the others, thereby simulating the human attention function.
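By way of illustration, the following is a minimal NumPy sketch of the generic pixel-weighting idea just described; the function names and the use of a sigmoid to squash scores into (0, 1) are assumptions made here for illustration, not part of the invention.

```python
import numpy as np

# Illustrative sketch only: generic spatial attention as described above,
# where each pixel is scaled by a weight in (0, 1). Names are hypothetical.
def apply_spatial_attention(feature_map, attention_scores):
    """feature_map: [h, w, c]; attention_scores: [h, w] learned scores."""
    weights = 1.0 / (1.0 + np.exp(-attention_scores))  # squash to (0, 1)
    return feature_map * weights[..., None]            # reweight every pixel
```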
Although attention mechanisms have had considerable success in the field of computer vision, existing results have the following disadvantages:
1) The network design is very complicated and the universality is poor. For example, although the network in "Residual Attention Network for Image Classification" works well, the process of learning the attention weights is so complicated and heavyweight that other researchers can hardly reuse the result.
2) In all current attention mechanisms, learning the weights requires introducing additional parameters. Increasing the parameter count causes various problems, such as easy overfitting, high demands on training hardware, and the inability to migrate to mobile devices.
Therefore, learning attention weights from the information contained in the data itself, without introducing additional parameters, is an urgent problem to be solved: it addresses the large parameter counts of deep networks, their tendency to overfit, models too large to be ported to mobile devices, and the question of how to extract important image information.
Disclosure of Invention
The invention aims to solve the problem in the prior art of how to obtain important image information in a neural network without introducing additional network parameters, and provides a method, based on a deep network, for making the network attend to important parts of data, that is, a parameter-free attention-mechanism design method based on a deep network.
In order to solve the above technical problem, the present application adopts the following technical scheme:
A method for making a network attend to important parts of data based on a deep network, comprising the following steps:
A1, vectorizing the original feature map (X), i.e. representing each pixel of the original feature map (X) by a vector;
A2, obtaining a similarity matrix (W) by self-comparison learning and normalizing it to obtain a reconstructed feature map (X*);
A3, comparing the reconstructed feature map (X*) with the original feature map (X) and then performing iterative processing so that the important regions become salient.
Preferably, in step A1, vectorizing the original feature map specifically comprises:
expressing the original feature map X of dimension [h, w, c] as:

X = [x_1, x_2, …, x_N]^T

where N = h × w, h is the length of the original feature map, w is the width of the original feature map, the number of elements in each vector is c, each pixel is a vector of length c, and T is the transpose operation; x_k denotes the k-th pixel in the feature map.
Preferably, in step A2, the similarity between any two pixels is compared by multiplying the original feature map by the transpose of the original feature map, that is, W = XX^T, obtaining the similarity matrix (W).
Preferably, in step A2, normalizing the similarity matrix (W) means normalizing the similarity matrix (W) with a softmax function.
Preferably, in step A2, the reconstructed feature map (X*) is obtained by normalizing the similarity matrix (W) and then multiplying the normalized similarity matrix by the original feature map (X).
Preferably, the reconstructed feature map (X*) is compared with the original feature map (X) by transposed multiplication, that is, W_1 = X(X*)^T, obtaining the similarity matrix (W_1) of the reconstructed feature map (X*).
Preferably, the similarity matrix (W_1) of the reconstructed feature map X* is iterated with the similarity matrix (W) of the original feature map (X) until the algorithm converges, obtaining a similarity matrix (W_0) embedded with the weight information of the important regions.
Preferably, the (X*) obtained from the previous iteration is fed into the next iteration as input to obtain the next (X*), and this process is repeated until the algorithm converges.
Preferably, the first iteration compares the original feature map (X) with itself.
Preferably, the algorithm converges after 4 iterations.
Compared with the prior art, the invention has the beneficial effects that:
the method for making network notice important part of data based on deep network of the invention is to convert original characteristic diagram X into vector quantity and then convert the vector quantity into original characteristic diagram XPerforming product to obtain a similarity matrix W of the original characteristic diagram, and normalizing the similarity matrix W to obtain a reconstructed characteristic diagram X*Then, the feature map X is reconstructed*The similarity matrix is iterated with the similarity matrix W of the original characteristic diagram to obtain the similarity matrix embedded with the importance information weight, network parameters do not need to be additionally introduced, the important area becomes very obvious, and the network identification capability is improved.
Further, compared with a scalar, a vector can convey and express more information, so more of the importance features of the original feature map X are retained.
Further, the similarity matrix is learned by self-comparison within the feature map, without introducing more parameters, thereby preventing overfitting and overly large models.
Furthermore, the method can be designed as a universal module suitable for insertion into any layer of any convolutional network, and thus has very strong universality.
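As an illustration of this universality, the following PyTorch sketch shows how such a parameter-free module might be dropped between convolution layers. The patent gives no reference code, so the class, its name, and the fixed iteration count are assumptions made here for illustration only.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ParameterFreeAttention(nn.Module):
    """Hypothetical drop-in module sketching the method; no learnable weights."""
    def __init__(self, num_iters=4):
        super().__init__()
        self.num_iters = num_iters  # the text reports convergence in ~4 iterations

    def forward(self, x):                        # x: [b, c, h, w]
        b, c, h, w = x.shape
        X = x.flatten(2).transpose(1, 2)         # [b, N, c], N = h*w
        W = X @ X.transpose(1, 2)                # eq. (1): W = X X^T
        X_star = F.softmax(W, dim=1) @ X         # eq. (4): column softmax, then PX
        for _ in range(self.num_iters - 1):
            W = W + X @ X_star.transpose(1, 2)   # eq. (6): X supervises X*
            X_star = F.softmax(W, dim=1) @ X     # eq. (7)
        return X_star.transpose(1, 2).reshape(b, c, h, w)

# Usage, e.g.: nn.Sequential(nn.Conv2d(3, 64, 3, padding=1), ParameterFreeAttention())
```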
Drawings
FIG. 1 is a schematic diagram of the algorithm structure of the present invention;
FIG. 2 is another algorithm flow chart of the present invention.
Detailed Description
The present invention will be described in further detail below with reference to specific embodiments and the accompanying drawings. It should be emphasized that the following description is merely exemplary and is not intended to limit the scope of the invention or its application.
The invention relates to a method for making a network attend to important parts of data based on a deep network, which comprises the following steps:
A1, vectorizing the original feature map X, i.e. representing each pixel of the original feature map X by a vector;
A2, obtaining a similarity matrix W by self-comparison learning and normalizing it to obtain a reconstructed feature map X*;
A3, comparing the reconstructed feature map X* with the original feature map X and then performing iterative processing so that the important regions become salient.
The algorithm flow of the method of this embodiment for making the network attend to important parts of data based on a deep network is shown in FIG. 1, wherein:
On the left is the first iteration of the algorithm, in which the original feature map is compared with itself for learning.
On the right is the remaining iteration process, in which the reconstructed feature map is compared with the original feature map.
Because the importance information is embedded into the weights during comparative learning and introduced into the similarity matrix W, reconstructing the feature map with the similarity matrix W is in fact a re-selection of features, and the important regions in the reconstructed feature map become very salient, thereby improving the recognition capability of the network.
Specifically, the parameter-free attention-mechanism design method based on a deep network of this embodiment proceeds according to the following steps:
Step A1, vectorizing the original feature map X.
Let the original feature map X be a feature map of dimension [h, w, c], where h and w denote the length and width of the original feature map and c is the number of channels.
The original feature map X therefore has h × w pixels, each with c channels, and can be viewed as consisting of h × w vectors, each with c elements. The original feature map X is expressed as
X = [x_1, x_2, …, x_N]^T, N = h × w,

where T is the transpose operation and x_k denotes the k-th pixel in the original feature map; each pixel is a vector of length c.
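A minimal NumPy sketch of step A1, assuming a row-major reshape so that row k of X holds the k-th pixel's c-vector:

```python
import numpy as np

def vectorize(feature_map):
    """Step A1: view an [h, w, c] feature map as N = h*w pixel vectors."""
    h, w, c = feature_map.shape
    return feature_map.reshape(h * w, c)  # X: [N, c], row k = k-th pixel
```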
Step A2, obtaining the similarity matrix W by self-comparison learning and normalizing it to obtain the reconstructed feature map X*, which comprises the following steps:
2.1 Self-comparison learning to obtain the similarity matrix W

W = XX^T    (1)
where X = [x_1, x_2, …, x_N]^T denotes the original feature map and X^T denotes the transpose of the original feature map X, namely X^T = [x_1, x_2, …, x_N].
The meaning expressed by equation (1) is: the original feature map X is multiplied with the transpose of the original feature map X; that is, starting from the first pixel, the inner product of each pixel of the original feature map X is taken with every other pixel of the input feature map. Because

x_i · x_j = ‖x_i‖ ‖x_j‖ cos θ

the smaller the angle between any two vectors, the larger (more similar) the result of their inner product. The result of XX^T is therefore a comparison of the similarity between any two pixels, giving the similarity matrix W:

W = (W_ij), W_ij = x_i · x_j, i, j = 1, …, N    (2)
The i-th row of the similarity matrix W represents the result of comparing the similarity of the i-th pixel with all pixels.
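Continuing the NumPy sketch above, equation (1) is a single matrix product:

```python
def self_similarity(X):
    """Step 2.1 (eq. (1)): W = X X^T, all pairwise pixel inner products.

    X: [N, c]; returns W: [N, N] with W[i, j] = <x_i, x_j>.
    """
    return X @ X.T
```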
2.2 Normalization processing
Each column of the similarity matrix W is normalized with the softmax function (the 0 in softmax(W, 0) denotes column normalization), giving the normalized matrix of the original feature map X, namely

P = softmax(W, 0), with p_ij = exp(w_ij) / Σ_k exp(w_kj)    (3)

where each element p_ij of the matrix is a number between 0 and 1, indicating the similarity ratio between pixels.
As can be seen from equation (3), the closer two pixels are, the larger the weight of the corresponding element in the normalized matrix.
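Column normalization can be sketched as follows; the max-subtraction is a standard numerical-stability trick added here, not part of the patent text:

```python
def softmax_columns(W):
    """Step 2.2 (eq. (3)): softmax(W, 0), normalizing each column to sum to 1."""
    W_shifted = W - W.max(axis=0, keepdims=True)  # stability; result unchanged
    e = np.exp(W_shifted)
    return e / e.sum(axis=0, keepdims=True)
```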
2.3 Reconstruction of the feature map X*
The original feature map X is reconstructed using the similarity ratios between pixels, obtaining a reconstructed feature map X* with salient features, i.e.
X* = softmax(W, 0) X    (4)
Specifically,

x_i* = Σ_j p_ij x_j, j = 1, …, N    (5)
the weight values of all elements in the normalized matrix are used for representing the mutual relation between the pixels, the larger the weight value is, the closer the two pixels are, the greater the contribution to each other is during reconstruction, therefore, the relevance and the importance among the pixels are embedded into all the elements in the normalized matrix, and the feature diagram X is reconstructed*Is the process of re-selecting features using the similarity matrix W. Thus reconstructing the feature map X*Becomes stronger relative to the similarity matrix X. Reconstruction of feature map X*The important area represented by (1) will be given more weight to be highlighted.
Step A3: the reconstructed feature map X* is compared with the original feature map X and then iteratively processed so that the important regions become salient:

W = W_old + X(X*)^T    (6)
X* = softmax(W, 0) X    (7)
the X obtained by the i-1 th iterationi-1 *Sending the ith iteration as input to obtain the ith Xi *… … repeat this process until the algorithm converges.
Considering that the similarity matrix W is obtained by self-comparison learning, it is not necessarily correct: if the similarity matrix W is problematic, the reconstructed feature map X* will also be problematic, and if the reconstructed feature map X* were compared only with itself, the comparison results would become more and more wrong. The present application therefore compares the reconstructed feature map X* with the original feature map X, so that the original feature map X plays a supervisory role, similar to the transition matrix of the PageRank algorithm.
In general, this process converges after 3 to 4 iterations.
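Putting steps A1 to A3 together, a minimal sketch of the full iteration (equations (1), (4), (6) and (7)), reusing the functions defined above and assuming the roughly 4 iterations the text reports are sufficient:

```python
def attend_to_important_parts(feature_map, num_iters=4):
    """Full pipeline sketch: vectorize, self-compare, reconstruct, iterate."""
    h, w, c = feature_map.shape
    X = vectorize(feature_map)                       # A1
    W = self_similarity(X)                           # first iteration: X vs. itself
    X_star = reconstruct(softmax_columns(W), X)      # A2
    for _ in range(num_iters - 1):                   # A3: remaining iterations
        W = W + X @ X_star.T                         # eq. (6): W_old + X (X*)^T
        X_star = reconstruct(softmax_columns(W), X)  # eq. (7)
    return X_star.reshape(h, w, c)
```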
Specifically, steps A1 to A3 of this embodiment are processed by the following algorithm flow, as shown in FIG. 2, with the following specific steps:
A11, start;
A12, vectorize the original feature map: X → [x_1, …, x_N]^T;
A13, initialize the variable i: 0 → i;
A14, self-comparison: XX^T → W;
A15, normalization: softmax(W, 0) → P;
A16, reconstruct the feature map: PX → X*;
A17, iterate: i + 1 → i;
A18, feed X*_(i-1) into the i-th iteration as input to obtain the i-th reconstructed feature map X*_i: W_old + X(X*)^T → W;
A19, normalize: softmax(W, 0) → P, where the 0 denotes normalization of the columns of W;
A20, multiply the newly learned weights P by the original feature map to reconstruct a new feature map, PX → X*, and determine whether the number of iterations exceeds the set value (i > 4?): if yes, continue; if not, return to A17;
A21, output the converged reconstructed feature map;
A22, end.
Here, steps A11 to A12 correspond to step A1, steps A13 to A16 correspond to step A2, and the remaining steps correspond to step A3.
The foregoing is a further detailed description of the invention in connection with specific preferred embodiments, and the implementation of the invention is not to be considered limited to these descriptions. For those skilled in the art to which the invention pertains, several equivalent substitutions or obvious modifications can be made without departing from the spirit of the invention, and all such variants are considered to fall within the scope of the invention.

Claims (8)

1. A method for making a network attend to important parts of image data based on a deep network, comprising the following steps:
A1, vectorizing the original feature map X of the image, i.e. representing each pixel of the original feature map X by a vector;
A2, obtaining a similarity matrix W by self-comparison learning and normalizing it to obtain a reconstructed feature map X*;
A3, comparing the reconstructed feature map X* with the original feature map X and then performing iterative processing so that the important regions become salient;
wherein the reconstructed feature map X* is compared with the original feature map X by transposed multiplication, i.e. W_1 = X(X*)^T, obtaining the similarity matrix W_1 of the reconstructed feature map X*;
and the similarity matrix W_1 of the reconstructed feature map X* is iterated with the similarity matrix W of the original feature map X until the algorithm converges, obtaining a similarity matrix W_0 embedded with the weight information of the important regions.
2. The method according to claim 1, wherein in step A1, vectorizing the original feature map specifically comprises:
expressing the original feature map X of dimension [h, w, c] as:

X = [x_1, x_2, …, x_N]^T

where N = h × w, h is the length of the original feature map, w is the width of the original feature map, the number of elements in each vector is c, each pixel is a vector of length c, and T is the transpose operation; x_k denotes the k-th pixel in the feature map, k being a number from 1 to N.
3. The method according to claim 1, wherein in step A2, the similarity between any two pixels is obtained by multiplying the original feature map by the transpose of the original feature map, i.e. W = XX^T, obtaining the similarity matrix W.
4. The method according to claim 1, wherein in step A2, normalizing the similarity matrix W means normalizing the similarity matrix W with a softmax function.
5. The method according to claim 1, wherein in step A2, the reconstructed feature map X* is obtained by normalizing the similarity matrix W and then multiplying the normalized similarity matrix by the original feature map X.
6. The method of claim 1, wherein the X* obtained from the previous iteration is fed into the next iteration as input to obtain the next X*, this process being repeated until the algorithm converges.
7. The method of claim 1, wherein the first iteration compares the original feature map X with itself.
8. The method of claim 1, wherein the algorithm converges after 4 iterations.
CN201810891937.0A 2018-08-07 2018-08-07 Method for making a network attend to important parts of data based on a deep network Active CN109087264B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810891937.0A CN109087264B (en) 2018-08-07 2018-08-07 Method for making a network attend to important parts of data based on a deep network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810891937.0A CN109087264B (en) 2018-08-07 2018-08-07 Method for making a network attend to important parts of data based on a deep network

Publications (2)

Publication Number Publication Date
CN109087264A CN109087264A (en) 2018-12-25
CN109087264B true CN109087264B (en) 2021-04-09

Family

ID=64834210

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810891937.0A Active CN109087264B (en) Method for making a network attend to important parts of data based on a deep network

Country Status (1)

Country Link
CN (1) CN109087264B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106780468A (en) * 2016-12-22 2017-05-31 中国计量大学 View-based access control model perceives the conspicuousness detection method of positive feedback
CN108334901A (en) * 2018-01-30 2018-07-27 福州大学 A kind of flowers image classification method of the convolutional neural networks of combination salient region

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106506901B (en) * 2016-09-18 2019-05-10 昆明理工大学 A kind of hybrid digital picture halftoning method of significance visual attention model

Also Published As

Publication number Publication date
CN109087264A (en) 2018-12-25

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant