CN115310607A - Vision Transformer model pruning method based on attention map - Google Patents
Vision Transformer model pruning method based on attention map
- Publication number
- CN115310607A (application number CN202211239440.3A)
- Authority
- CN
- China
- Prior art keywords
- attention
- model
- vit
- head
- visual
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/082—Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- General Health & Medical Sciences (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Computational Linguistics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Evolutionary Computation (AREA)
- Artificial Intelligence (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Health & Medical Sciences (AREA)
- Image Processing (AREA)
Abstract
The invention discloses an attention-map-based visual Transformer (ViT) model pruning method applied to a machine vision reasoning system, comprising the following steps: in the machine vision reasoning system, perform several rounds of initial training on a ViT model over the training data to generate complete attention maps; calculate the information entropy of each attention map and prune attention heads according to the calculated entropy; remove the weight parameters related to the pruned attention heads to obtain a new ViT model; and fine-tune the parameters of the new ViT model. By pruning the multi-head attention module and deleting high-uncertainty attention maps together with their corresponding attention heads, the computational complexity, parameter count, and size of the ViT model are reduced, finally achieving a lightweight ViT model with limited performance loss.
Description
Technical Field
The invention belongs to the technical field of neural network lightweighting, and particularly relates to an attention-map-based visual Transformer model pruning method.
Background
The Transformer is a deep neural network based mainly on the self-attention mechanism and was first applied in the field of natural language processing; the visual Transformer model is abbreviated as the ViT model. With its strong ability to model long-range dependencies, the Transformer has achieved remarkable success in various visual tasks. However, the huge computation and memory consumption of Transformer models are inherent problems that prevent them from being deployed on resource-limited edge computing devices. Pruning is a common method for effectively reducing the inference cost of neural networks and is widely used in computer vision and natural language processing applications.
An attention-map-based model pruning method makes it possible to deploy neural network models in embedded machine vision reasoning systems with low power consumption and limited computing resources. Such a system comprises an embedded computing board accelerated by a graphics processor and a neural network processor, and can generally provide less than 20% of the computing resources of a high-performance GPU.
Pruning operations can generally be divided into two categories: unstructured pruning and structured pruning. Unstructured pruning deletes individual unimportant weights under a specific criterion; it is a fine-grained paradigm that causes little damage to accuracy but requires special hardware design for actual acceleration. Structured pruning removes whole substructures of a model, such as channels and attention heads. Some work has pruned ViT by reducing the number of image coding blocks: Tang et al. developed a top-down image block pruning method that removes redundant image blocks based on the reconstruction errors of a pre-trained model, and Xu et al. exploited the full spatial structure through image coding block selection with a structure-preserving slow-fast combined update strategy. Although these methods save computation, they cannot reduce inference complexity or model size; the attention-map-based visual Transformer model pruning method is therefore proposed.
Disclosure of Invention
The invention aims to provide an attention-map-based visual Transformer model pruning method to solve the problems raised in the background art.
In order to achieve this purpose, the invention provides the following technical scheme: an attention-map-based visual Transformer model pruning method, applied to a machine vision reasoning system, comprising the following steps:
step A, in the machine vision reasoning system, performing several rounds of initial training on a ViT model over the training data to generate complete attention maps;
step B, calculating the information entropy of each attention map, which measures its uncertainty, and pruning attention heads according to the calculated entropy;
step C, removing all weight parameters related to the pruned attention heads to obtain a new ViT model;
step D, fine-tuning the parameters of the new ViT model.
Preferably, in step A, the ViT model splits the input image into N image blocks, attaches a class code to each image block, and then feeds the N image blocks with their attached class codes into an encoder similar to a common Transformer, forming N image coding blocks.
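The patch-splitting step above can be sketched as follows; the function name and NumPy layout are illustrative assumptions, and the class-code attachment and linear projection are omitted:

```python
import numpy as np

def split_into_patches(image, patch):
    """Split an H x W x C image into N = (H/patch) * (W/patch) flattened
    image blocks, as in the ViT input step described above. Illustrative
    sketch only; class codes and the patch embedding are not included."""
    h, w, c = image.shape
    assert h % patch == 0 and w % patch == 0
    gh, gw = h // patch, w // patch
    blocks = image.reshape(gh, patch, gw, patch, c)
    blocks = blocks.transpose(0, 2, 1, 3, 4).reshape(gh * gw, patch * patch * c)
    return blocks

img = np.arange(224 * 224 * 3, dtype=np.float32).reshape(224, 224, 3)
patches = split_into_patches(img, 16)
print(patches.shape)  # (196, 768): N = 14 * 14 image blocks
```

With a 224x224 RGB image and 16x16 patches this yields the familiar N = 196 blocks of a ViT-Base-style input.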
Preferably, step A includes the following stages:
A1, in the initial stage of ViT model training, the model has not yet learned useful information; at this point the attention maps are disordered and have large information entropy;
A2, after several rounds of initial training, the ViT model has learned basic information and the attention maps begin to exhibit a certain pattern;
A3, in the final stage of training, when the ViT model converges, each attention head yields an attention map in which important image coding blocks receive high attention, so the information entropy decreases; all attention maps are averaged over one training round.
Preferably, in step B, after the ViT model has undergone several rounds of initial training, an attention head that has learned more useful information focuses on particular image coding blocks, so its information entropy decreases and its attention map becomes more deterministic; an attention head that has learned less useful information attends almost uniformly to all blocks, so its information entropy increases, producing large uncertainty. The information entropy is therefore used to measure the uncertainty of an attention map.
Preferably, in step B, for a Transformer block, the multi-head self-attention (MSA) and the multi-layer perceptron (MLP) are the main consumers of computing resources.

Let $X^L \in \mathbb{R}^{N \times D}$ denote the input of the L-th layer. The attention computation of attention head h is given by equation (1):

$$Z_h^L = A_h^L V_h^L, \qquad A_h^L = \mathrm{Softmax}\!\left(\frac{Q_h^L (K_h^L)^{\top}}{\sqrt{d}}\right) \tag{1}$$

where Q, K and V denote the "query", "key" and "value" in the multi-head attention mechanism, respectively; for the h-th attention head module in the L-th layer, which participates in generating the attention map $A_h^L$, the computed "query", "key" and "value" are $Q_h^L = X^L W_{Q,h}^L$, $K_h^L = X^L W_{K,h}^L$ and $V_h^L = X^L W_{V,h}^L$, respectively;

d represents the attention head embedding dimension;

N represents the number of image blocks input into the ViT model;

T represents a visual Transformer network with H attention heads.

The computation of the multi-head self-attention MSA is given by equation (2):

$$\mathrm{MSA}(X^L) = \mathrm{Concat}\!\left(Z_1^L, \ldots, Z_H^L\right) W_O^L \tag{2}$$

where H denotes the number of attention heads and $W_O^L$ is the output projection matrix.
Preferably, the computational complexity implied by the parameters in equations (1) and (2) is given by equation (3):

$$C = 4NDHd + 2N^2Hd \tag{3}$$

where C represents the parameter computation complexity, $4NDHd$ is the sum of the projection computations (the query, key, value and output projections), and $2N^2Hd$ is the cost of the attention matrix products. Meanwhile, the parameter quantity is given by equation (4):

$$P = 4DHd \tag{4}$$

where P represents the number of parameters and D represents the embedding dimension; D = Hd when the ViT model has not been pruned.
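A small sketch of equations (3) and (4), assuming the reconstructed forms C = 4NDHd + 2N²Hd and P = 4DHd (biases and the MLP are deliberately ignored):

```python
def msa_cost(N, D, H, d):
    """Computation and parameter counts of one multi-head self-attention
    block, per the reconstructed equations (3) and (4). The 4NDHd term is
    the sum of the projection computations; 2N^2Hd covers the attention
    matrix products. Biases and the MLP are ignored in this sketch."""
    C = 4 * N * D * H * d + 2 * N * N * H * d  # equation (3)
    P = 4 * D * H * d                          # equation (4)
    return C, P

# ViT-Base-like setting: N = 197 blocks, H = 12 heads, d = 64, D = Hd = 768
C, P = msa_cost(197, 768, 12, 64)
print(C, P)
```

The example numbers are a common ViT-Base configuration, used here only to give the formulas a concrete scale.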
Preferably, when the input of the visual Transformer is a long sequence, the $2N^2Hd$ term dominates and the computational complexity of self-attention is $O(N^2Hd)$; when the sequence length is not long enough to dominate the complexity of the multi-head attention modules, the projection term dominates and the computational complexity of self-attention is $O(NDHd)$.
Preferably, after the ViT model is pruned, with the number of attention heads pruned from H to $H'$, the complexity after pruning is given by equation (5):

$$C' = 4NDH'd + 2N^2H'd \tag{5}$$

Meanwhile, the parameter quantity is given by equation (6):

$$P' = 4DH'd \tag{6}$$
Preferably, in step B, let $A_h^L$ denote the attention map of attention head h in the L-th layer; the information entropy of the attention map is given by equation (7):

$$E(A_h^L) = -\frac{1}{N}\sum_{i=1}^{N}\sum_{j=1}^{N} A_{h,ij}^L \log A_{h,ij}^L \tag{7}$$

where, because a Softmax operation is performed in the attention computation for the i-th query image block, the i-th row $A_{h,i}^L$ represents the probability distribution of the key image blocks with respect to the i-th query coding block.
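The entropy of equation (7) can be sketched numerically: a uniform attention map reaches the maximum entropy log N, while a near one-hot map approaches zero. The eps guard and the averaging over rows are assumptions of this sketch:

```python
import numpy as np

def attention_entropy(A, eps=1e-12):
    """Information entropy of an N x N attention map A per equation (7):
    E = -(1/N) * sum_i sum_j A[i, j] * log(A[i, j]),
    where row i is the Softmax distribution over key blocks for query i.
    eps guards against log(0) on exactly-zero attention weights."""
    return float(-(A * np.log(A + eps)).sum() / A.shape[0])

N = 8
uniform = np.full((N, N), 1.0 / N)             # maximally uncertain head
peaked = np.eye(N) * (1 - 1e-9) + 1e-9 / N     # near one-hot rows: certain head
print(attention_entropy(uniform))  # ~log(8) = 2.079...
print(attention_entropy(peaked))   # ~0
```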
Compared with the prior art, the invention has the beneficial effects that:
1. Analysis of the visual Transformer model structure shows that the multi-head attention modules occupy a large share of the computing resources. By pruning the multi-head attention module and deleting high-uncertainty feature maps together with their corresponding attention heads, the computational complexity, parameter count, and size of the ViT model are reduced without greatly affecting its accuracy, finally achieving a lightweight ViT model with limited performance loss;
2. compared with traditional pruning methods, the invention guides the pruning of attention heads with the attention map rather than the traditional Taylor criterion, providing a new approach to pruning decisions;
3. the importance of each attention head is measured through the information entropy, and a pruning decision is guided.
Drawings
FIG. 1 is a schematic view of the attention head pruning process of the present invention;
FIG. 2 is a schematic flow chart of the method of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Referring to fig. 1-2, the attention-map-based visual Transformer model pruning method provided by the present invention is applied to a machine vision reasoning system and includes the following steps:
step A, in the machine vision reasoning system, performing several rounds of initial training on a ViT model over the training data to generate complete attention maps;
in step A, the ViT model splits the input image into N image blocks, attaches a class code to each image block, and then feeds the N image blocks with their attached class codes into the encoder, forming N image coding blocks;
step A includes the following stages:
A1, in the initial stage of ViT model training, the model has not yet learned useful information; at this point the attention maps are disordered and have large information entropy;
A2, after several rounds of initial training, the ViT model has learned basic information and the attention maps begin to exhibit a certain pattern;
A3, in the final stage of training, when the ViT model converges, each attention head yields an attention map in which important image coding blocks receive high attention, so the information entropy decreases; all attention maps are averaged over one training round;
step B, calculating the information entropy of each attention map, which measures its uncertainty, and pruning attention heads according to the calculated entropy;
in step B, after the ViT model has undergone several rounds of initial training, an attention head that has learned more useful information focuses on particular image coding blocks, so its information entropy decreases and its attention map becomes more deterministic; an attention head that has learned less useful information attends almost uniformly to all blocks, so its information entropy increases, producing large uncertainty; in this process the information entropy is used to measure the uncertainty of an attention map;
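The entropy-driven pruning decision of step B can be sketched as follows; the 50% prune ratio is an illustrative assumption of this sketch, not a value fixed by the method:

```python
import numpy as np

def select_heads_to_prune(head_entropies, prune_ratio=0.5):
    """Rank attention heads by the information entropy of their attention
    maps and return the indices of the most uncertain (highest-entropy)
    heads to prune. prune_ratio is an illustrative assumption."""
    order = np.argsort(head_entropies)[::-1]   # highest entropy first
    k = int(len(head_entropies) * prune_ratio)
    return sorted(order[:k].tolist())

entropies = [2.1, 0.4, 1.9, 0.3, 2.0, 0.5]     # toy per-head entropies
print(select_heads_to_prune(entropies))        # [0, 2, 4]
```

Heads 0, 2 and 4 have the highest entropy (most uniform attention) and are selected for removal, while the sharply focused heads 1, 3 and 5 are kept.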
in step B, for the Transformer block, the multi-head self-attention (MSA) and the multi-layer perceptron (MLP) are the main consumers of computing resources;

let $X^L \in \mathbb{R}^{N \times D}$ denote the input of the L-th layer; the attention computation of attention head h is given by equation (1):

$$Z_h^L = A_h^L V_h^L, \qquad A_h^L = \mathrm{Softmax}\!\left(\frac{Q_h^L (K_h^L)^{\top}}{\sqrt{d}}\right) \tag{1}$$

where Q, K and V denote the "query", "key" and "value" in the multi-head attention mechanism, respectively; for the h-th attention head module in the L-th layer, which participates in generating the attention map $A_h^L$, the computed "query", "key" and "value" are $Q_h^L = X^L W_{Q,h}^L$, $K_h^L = X^L W_{K,h}^L$ and $V_h^L = X^L W_{V,h}^L$, respectively;

d represents the attention head embedding dimension;

N represents the number of image blocks input into the ViT model;

T represents a visual Transformer network with H attention heads;

the computation of the multi-head self-attention MSA is given by equation (2):

$$\mathrm{MSA}(X^L) = \mathrm{Concat}\!\left(Z_1^L, \ldots, Z_H^L\right) W_O^L \tag{2}$$

where H represents the number of attention heads and $W_O^L$ is the output projection matrix;
the computational complexity implied by the parameters in equations (1) and (2) is given by equation (3):

$$C = 4NDHd + 2N^2Hd \tag{3}$$

where C represents the parameter computation complexity, $4NDHd$ is the sum of the projection computations, and $2N^2Hd$ is the cost of the attention matrix products; meanwhile, the parameter quantity is given by equation (4):

$$P = 4DHd \tag{4}$$

where P represents the number of parameters and D represents the embedding dimension; D = Hd when the ViT model has not been pruned.
When the input of the visual Transformer is a long sequence, the $2N^2Hd$ term dominates and the computational complexity of self-attention is $O(N^2Hd)$; when the sequence length is not long enough to dominate the complexity of the multi-head attention modules, the projection term dominates and the computational complexity of self-attention is $O(NDHd)$.
In this embodiment, as shown in FIG. 1, after the ViT model is pruned, the number of attention heads is pruned toThen the complexity after pruning is shown in equation (5):
the simultaneous parameter quantity is shown in formula (6):
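Since equations (5) and (6) are linear in the head count, the relative saving from pruning H down to H' heads is simply 1 − H'/H; a quick check under the reconstructed formulas:

```python
def pruning_savings(N, D, H, d, H_pruned):
    """Relative reduction of MSA computation after pruning heads from H to
    H', using the reconstructed equations (3) and (5). Both complexity
    terms are linear in the head count, so the saving equals 1 - H'/H."""
    C  = 4 * N * D * H * d + 2 * N * N * H * d                # equation (3)
    Cp = 4 * N * D * H_pruned * d + 2 * N * N * H_pruned * d  # equation (5)
    return 1 - Cp / C

print(pruning_savings(197, 768, 12, 64, 6))  # 0.5: half the heads, half the cost
```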
in step B, let $A_h^L$ denote the attention map of attention head h in the L-th layer; the information entropy of the attention map is given by equation (7):

$$E(A_h^L) = -\frac{1}{N}\sum_{i=1}^{N}\sum_{j=1}^{N} A_{h,ij}^L \log A_{h,ij}^L \tag{7}$$

where, because a Softmax operation is performed in the attention computation for the i-th query image block, the i-th row $A_{h,i}^L$ represents the probability distribution of the key image blocks with respect to the i-th query coding block;
step C, removing all weight parameters related to the pruned attention heads to obtain a new ViT model;
step D, fine-tuning the parameters of the new ViT model;
the ViT model pruning method is shown in table 1 below:
table 1:
The attention-map-based ViT model pruning method can be used to deploy neural network models in embedded machine vision reasoning systems with low power consumption and limited computing resources. Such a system comprises an embedded computing board accelerated by a graphics processor and a neural network processor, and can only provide less than 20% of the computing resources of a high-performance GPU; the limited storage and computing resources make it very difficult to deploy a visual Transformer model on it. After the pruning task is completed, the requirements on storage, data bandwidth and computing resources fall within the computing capacity of the embedded machine vision reasoning system, so edge deployment of the visual Transformer model can be achieved smoothly;
The method treats the features along the key dimension in the multi-head self-attention module of the visual Transformer model as probability distributions and computes their information entropy to expose the uncertainty of attention; it then deletes the feature maps with high uncertainty and the corresponding attention heads, reducing the computational complexity and parameter count of the ViT model and finally achieving a lightweight ViT model with limited loss of performance.
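A toy end-to-end sketch of steps A–D on simulated attention maps; the shapes, random seed, and keep-half rule are all illustrative assumptions (real use would train an actual ViT, remove the pruned heads' weights, and fine-tune):

```python
import numpy as np

rng = np.random.default_rng(0)
L_layers, H, N = 2, 4, 8          # toy sizes: layers, heads, image blocks

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def entropy(A):
    # Equation (7): row-averaged entropy of an N x N attention map
    return float(-(A * np.log(A + 1e-12)).sum() / A.shape[0])

# Step A stand-in: "averaged" attention maps after a few warm-up epochs;
# head 0 of every layer is made sharply peaked (a "certain" head).
logits = rng.normal(size=(L_layers, H, N, N))
logits[:, 0] *= 8.0
maps = softmax(logits)

# Steps B-C: score each head by entropy, keep the lower-entropy half per
# layer (step D, fine-tuning the pruned model, is omitted in this sketch).
keep = []
for layer in maps:
    ents = [entropy(A) for A in layer]
    keep.append(sorted(np.argsort(ents)[: H // 2].tolist()))
print(keep)  # the sharply peaked head 0 survives in every layer
```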
Although embodiments of the present invention have been shown and described, it will be appreciated by those skilled in the art that changes, modifications, substitutions and alterations can be made in these embodiments without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.
Claims (9)
1. An attention-map-based visual Transformer model pruning method, applied to a machine vision reasoning system, characterized by comprising the following steps:
step A, in the machine vision reasoning system, performing several rounds of initial training on a ViT model over the training data to generate complete attention maps;
step B, calculating the information entropy of each attention map, which measures its uncertainty, and pruning attention heads according to the calculated entropy;
step C, removing all weight parameters related to the pruned attention heads to obtain a new ViT model;
step D, fine-tuning the parameters of the new ViT model.
2. The attention-map-based visual Transformer model pruning method according to claim 1, characterized in that: in step A, the ViT model splits the input image into N image blocks, attaches a class code to each image block, and then feeds the N image blocks with their attached class codes into an encoder similar to a common Transformer, forming N image coding blocks.
3. The attention-map-based visual Transformer model pruning method according to claim 2, characterized in that step A includes the following stages:
A1, in the initial stage of ViT model training, the model has not yet learned useful information; at this point the attention maps are disordered and have large information entropy;
A2, after several rounds of initial training, the ViT model has learned basic information and the attention maps begin to exhibit a certain pattern;
A3, in the final stage of training, when the ViT model converges, each attention head yields an attention map in which important image coding blocks receive high attention, so the information entropy decreases; all attention maps are averaged over one training round.
4. The attention-map-based visual Transformer model pruning method according to claim 1, characterized in that: in step B, after the ViT model has undergone several rounds of initial training, an attention head that has learned more useful information focuses on particular image coding blocks, so its information entropy decreases and its attention map becomes more deterministic; an attention head that has learned less useful information attends almost uniformly to all blocks, so its information entropy increases, producing large uncertainty; the information entropy is used to measure the uncertainty of an attention map.
5. The attention-map-based visual Transformer model pruning method according to claim 4, characterized in that: in step B, for the Transformer block, the multi-head self-attention (MSA) and the multi-layer perceptron (MLP) are the main consumers of computing resources;

let $X^L \in \mathbb{R}^{N \times D}$ denote the input of the L-th layer; the attention computation of attention head h is given by equation (1):

$$Z_h^L = A_h^L V_h^L, \qquad A_h^L = \mathrm{Softmax}\!\left(\frac{Q_h^L (K_h^L)^{\top}}{\sqrt{d}}\right) \tag{1}$$

where Q, K and V denote the "query", "key" and "value" in the multi-head attention mechanism, respectively; for the h-th attention head module in the L-th layer, which participates in generating the attention map $A_h^L$, the computed "query", "key" and "value" are $Q_h^L = X^L W_{Q,h}^L$, $K_h^L = X^L W_{K,h}^L$ and $V_h^L = X^L W_{V,h}^L$, respectively;

d represents the attention head embedding dimension;

N represents the number of image blocks input into the ViT model;

T represents a visual Transformer network with H attention heads;

the computation of the multi-head self-attention MSA is given by equation (2):

$$\mathrm{MSA}(X^L) = \mathrm{Concat}\!\left(Z_1^L, \ldots, Z_H^L\right) W_O^L \tag{2}$$

where H denotes the number of attention heads and $W_O^L$ is the output projection matrix.
6. The attention-map-based visual Transformer model pruning method according to claim 5, characterized in that: the computational complexity implied by the parameters in equations (1) and (2) is given by equation (3):

$$C = 4NDHd + 2N^2Hd \tag{3}$$

where C represents the parameter computation complexity, $4NDHd$ is the sum of the projection computations, and $2N^2Hd$ is the cost of the attention matrix products; meanwhile, the parameter quantity is given by equation (4):

$$P = 4DHd \tag{4}$$

where P represents the number of parameters and D represents the embedding dimension, with D = Hd when the ViT model has not been pruned.
7. The attention-map-based visual Transformer model pruning method according to claim 6, characterized in that: when the input sequence of the visual Transformer is a long sequence, the computational complexity of self-attention is $O(N^2Hd)$.
8. The attention-map-based visual Transformer model pruning method according to claim 7, characterized in that: after the ViT model is pruned, with the number of attention heads pruned from H to $H'$, the complexity after pruning is given by equation (5):

$$C' = 4NDH'd + 2N^2H'd \tag{5}$$

meanwhile, the parameter quantity is given by equation (6):

$$P' = 4DH'd \tag{6}$$
9. The attention-map-based visual Transformer model pruning method according to claim 8, characterized in that: in step B, let $A_h^L$ denote the attention map of attention head h in the L-th layer; the information entropy of the attention map is given by equation (7):

$$E(A_h^L) = -\frac{1}{N}\sum_{i=1}^{N}\sum_{j=1}^{N} A_{h,ij}^L \log A_{h,ij}^L \tag{7}$$
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211239440.3A CN115310607A (en) | 2022-10-11 | 2022-10-11 | Vision transform model pruning method based on attention diagram |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211239440.3A CN115310607A (en) | 2022-10-11 | 2022-10-11 | Vision transform model pruning method based on attention diagram |
Publications (1)
Publication Number | Publication Date |
---|---|
CN115310607A true CN115310607A (en) | 2022-11-08 |
Family
ID=83868361
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202211239440.3A Pending CN115310607A (en) | 2022-10-11 | 2022-10-11 | Vision transform model pruning method based on attention diagram |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115310607A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117689044A (en) * | 2024-02-01 | 2024-03-12 | 厦门大学 | Quantification method suitable for vision self-attention model |
- 2022-10-11: application CN202211239440.3A filed in China; published as CN115310607A, status Pending
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111242282B (en) | Deep learning model training acceleration method based on end edge cloud cooperation | |
Tang et al. | Patch slimming for efficient vision transformers | |
WO2021004366A1 (en) | Neural network accelerator based on structured pruning and low-bit quantization, and method | |
CN111242180B (en) | Image identification method and system based on lightweight convolutional neural network | |
CN113595993B (en) | Vehicle-mounted sensing equipment joint learning method for model structure optimization under edge calculation | |
CN115310607A (en) | Vision transform model pruning method based on attention diagram | |
CN114580636A (en) | Neural network lightweight deployment method based on three-target joint optimization | |
CN113722980A (en) | Ocean wave height prediction method, system, computer equipment, storage medium and terminal | |
CN112036564A (en) | Pruning method, device and equipment of neural network and storage medium | |
CN108268950A (en) | Iterative neural network quantization method and system based on vector quantization | |
CN116977763A (en) | Model training method, device, computer readable storage medium and computer equipment | |
CN114861907A (en) | Data calculation method, device, storage medium and equipment | |
CN112132219A (en) | General deployment scheme of deep learning detection model based on mobile terminal | |
CN116820762A (en) | Bian Yun cooperative computing method based on power edge chip | |
CN115049786B (en) | Task-oriented point cloud data downsampling method and system | |
CN114492847B (en) | Efficient personalized federal learning system and method | |
CN114550277A (en) | Lightweight face recognition method and system | |
CN114372565A (en) | Target detection network compression method for edge device | |
CN117808083B (en) | Distributed training communication method, device, system, equipment and storage medium | |
CN113298248B (en) | Processing method and device for neural network model and electronic equipment | |
Xu et al. | LPViT: Low-Power Semi-structured Pruning for Vision Transformers | |
CN117892219A (en) | Photovoltaic output day-ahead prediction method and device based on classification learning | |
CN118194929A (en) | Self-adaptive optimization method and device of artificial intelligent model, electronic equipment and product | |
CN115687930A (en) | Edge calculation model training method based on automatic machine learning | |
Li et al. | Implementation and Applications of Neural Networks Based on FPGA |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||