CN113963022B - Multi-outlet full convolution network target tracking method based on knowledge distillation - Google Patents
Multi-outlet full convolution network target tracking method based on knowledge distillation
- Publication number
- CN113963022B (application CN202111221017.6A)
- Authority
- CN
- China
- Prior art keywords
- outlet
- outlets
- knowledge distillation
- channel
- temp
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G06T7/246: Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
- G06N3/045: Combinations of networks
- G06N3/048: Activation functions
- G06N3/08: Learning methods
- G06T2207/10016: Video; Image sequence
- G06T2207/20081: Training; Learning
- G06T2207/20084: Artificial neural networks [ANN]
Abstract
The invention discloses a target tracking method using a multi-outlet fully convolutional network based on knowledge distillation, comprising: step one, constructing a multi-outlet fully convolutional network based on knowledge distillation; and step two, training the multiple outlets based on knowledge distillation. The invention provides a knowledge-distillation-based multi-outlet fully convolutional structure for classification-based tracking; knowledge distillation encourages the earlier outlets to imitate the probability outputs of the later outlets, which improves the discrimination capability of the earlier outlets. The invention further improves discrimination capability by using multiple RoIAlign layers to extract region features at different scales and fusing these region features at each outlet. Different kinds of attention modules capture different target-specific information, which improves the ability to distinguish the target from the background and from distractors. The invention achieves higher tracking accuracy together with a relatively high processing speed.
Description
Technical Field
The invention relates to a target tracking method, and in particular to a target tracking method using a multi-outlet fully convolutional network based on knowledge distillation.
Background
Convolutional Neural Networks (CNNs) have been successfully applied to visual target tracking tasks by virtue of their advantages in extracting high-level semantic feature representations. However, although CNN-based tracking methods can achieve good positioning accuracy, the processing speed of most methods is slow.
Disclosure of Invention
To better balance the speed and accuracy of CNN-based trackers, the invention provides a target tracking method using a multi-outlet fully convolutional network based on knowledge distillation.
The object of the invention is achieved by the following technical solution:
A target tracking method using a multi-outlet fully convolutional network based on knowledge distillation first selects the first three convolution layers of a VGG-M model pre-trained on ImageNet and embeds two MIN modules into the first and second convolution layers, respectively, to increase the nonlinearity of the feature representation and to alleviate the vanishing gradient caused by ReLU. The three convolution layers and the two MIN modules form a base network that extracts the feature representation of the input candidate samples. Three attention modules are then introduced into the base network: two residual attention modules and one channel attention module. Finally, three outlets are set in the base network, corresponding to video frames of three different difficulty levels. The three outlets have the same structure: one RoIAlign layer for extracting candidate region features, and two convolution layers (conv_exit_1 and conv_exit_2) for classifying the candidate regions. The method specifically comprises the following steps:
Step one, constructing a multi-outlet fully convolutional network based on knowledge distillation, wherein the specific construction steps are as follows:
(1) Selecting the first three convolution layers of the pre-trained VGG-M network and embedding two MIN modules into the first convolution layer and the second convolution layer, respectively, to increase the nonlinearity of the feature representation and to mitigate the effect of vanishing gradients, wherein the three convolution layers and the two MIN modules together form the base network;
the overall flow of the MIN module is as follows:
where $x_{i,j}$ is the input centered at coordinates (i, j), ch is the channel index of the feature F, and w and b denote the feature weights and biases, respectively; F is constructed by taking the maximum over k maxout hidden parts, so that the maxout unit acts as a max pooling layer across channels that selects the maximum output as the input to the next layer; in addition, a batch normalization (BN) layer is introduced to avoid the effects of differences in data distribution;
(2) On the basis of the base network, three attention modules are added to increase the discrimination capability of the feature representation, namely two residual attention modules and one channel attention module. The channel attention module is added after the second residual attention module to enhance the sensitivity of the channels in distinguishing the target from the background. The channel attention module takes the feature F as its input, removes the spatial information of F through a global pooling operation, obtains the channel dependencies through two fully connected layers, and computes the channel weight $w_c$ using a sigmoid function. The output $F_C(x)$ is obtained by multiplying F(x) by the channel weight $w_c$:
$F_C(x) = w_c \cdot F(x)$;
the mathematical expression of the residual attention module is as follows:
where the activation uses the sigmoid function, $F_R(x)$ is the residual attention feature, and the two operators denote element-wise multiplication and addition, respectively;
(3) Three outlets are provided in the overall network, all with the same structure: each outlet comprises a region feature extraction layer for extracting the features corresponding to each RoI, and two convolution layers, Conv_Exit_1 and Conv_Exit_2, for classifying candidate samples as target or background;
Step two, training the multiple outlets based on knowledge distillation, wherein the specific steps are as follows:
(1) Given a teacher classifier t and a student classifier s that learns from t, the learning process is optimized by minimizing the cross entropy of their outputs:
$[s^{1/\mathrm{temp}}(x)]_c = \mathrm{softmax}(s(x)/\mathrm{temp})$,
$[t^{1/\mathrm{temp}}(x)]_c = \mathrm{softmax}(t(x)/\mathrm{temp})$,
where t(x) and s(x) denote the predictions of t and s, respectively, temp is a temperature parameter, $[t^{1/\mathrm{temp}}(x)]_c$ and $[s^{1/\mathrm{temp}}(x)]_c$ are the soft predictions of t and s, respectively, and C is the number of categories;
(2) The whole model is optimized by minimizing the classification loss $L_{cls}$ and the distillation loss $L_{dis}$ of the multi-outlet structure:
$L = L_{cls} + a L_{dis}$,
where a is a hyperparameter that balances the two losses, and $L_{dis}$ is defined as follows:
where Ex is the number of outlets, $T(e) \in Ex$ denotes the set of teacher outlets, and cf(·) denotes the classifier corresponding to each outlet.
Compared with the prior art, the invention has the following advantages:
1. The invention provides a knowledge-distillation-based multi-outlet fully convolutional structure for classification-based tracking; knowledge distillation encourages the earlier outlets to imitate the probability outputs of the later outlets, which improves the discrimination capability of the earlier outlets.
2. The invention improves discrimination capability by using multiple RoIAlign layers to extract region features at different scales and fusing these region features at each outlet.
3. The invention uses different kinds of attention modules to capture different target-specific information, which improves the ability to distinguish the target from the background and from distractors.
4. Compared with the mainstream tracking method based on classification, the method provided by the invention has higher tracking precision and relatively higher processing speed.
Drawings
FIG. 1 is a flow chart of the target tracking method using a multi-outlet fully convolutional network based on knowledge distillation according to the present invention;
FIG. 2 shows examples of simple, medium, and difficult frames;
FIG. 3 shows the output statistics for each outlet;
FIG. 4 compares the method of the present invention with other mainstream target tracking methods on the OTB-100 dataset;
FIG. 5 compares the method of the present invention with other mainstream target tracking methods on the UAV123 dataset;
FIG. 6 shows the output statistics of each outlet on the four datasets.
Detailed Description
The present invention is further described below with reference to the accompanying drawings, but it is not limited to this description; any modification or equivalent substitution that does not depart from the spirit and scope of the present invention falls within its scope of protection.
The invention provides a target tracking method using a multi-outlet fully convolutional network based on knowledge distillation, named DMENet. In DMENet, different types of attention mechanisms are embedded at different levels of the fully convolutional network to capture more discriminative feature representations. In addition, three outlets (exits) are added to the fully convolutional network so that an accurate estimate of the target position in the current frame is obtained as early as possible. The entire DMENet is trained with a knowledge distillation strategy to improve the accuracy of the earlier outlets. Each outlet has a confidence score that decides whether processing of the video frame ends at the current outlet or is passed on to the next outlet.
Fig. 1 shows the overall structure of the entire network, which can be divided into three parts, specifically as follows:
The first part is the determination of the number of outlets. To determine an appropriate number of outlets, the difficulty of tracking the target in a video sequence is assumed to fall into three categories: simple (the appearance of the target changes relatively little), medium (the appearance of the target changes quickly but not severely) and difficult (the appearance of the target changes severely). It is further assumed that targets of these three difficulty levels can be located using low-level, mid-level and high-level features, respectively. This assumption is verified on the OTB-100 dataset.
In the OTB-100 dataset, each video frame is classified as simple, medium or difficult. The classification is based on the average overlap ratio of the predicted boxes output by 12 tracking methods. Simple frames are those with an average overlap ratio greater than or equal to 0.7, and the average overlap ratio thresholds for medium and difficult frames are 0.7 and 0.5, respectively. Fig. 2 shows examples of simple, medium and difficult frames.
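The thresholds above can be turned into a small labeling rule. The sketch below is an illustration only: it assumes that the 0.7 and 0.5 thresholds split the average overlap ratio into three bands (at least 0.7 is simple, from 0.5 up to 0.7 is medium, below 0.5 is difficult), and the function name `classify_frame_difficulty` is hypothetical.

```python
def classify_frame_difficulty(overlaps, simple_thr=0.7, difficult_thr=0.5):
    """Label one frame from the overlap ratios of several trackers.

    overlaps      -- overlap ratios (0..1), one per tracking method
    simple_thr    -- assumed lower bound of the "simple" band
    difficult_thr -- assumed upper bound of the "difficult" band
    """
    if not overlaps:
        raise ValueError("at least one overlap ratio is required")
    mean_overlap = sum(overlaps) / len(overlaps)
    if mean_overlap >= simple_thr:
        return "simple"
    if mean_overlap >= difficult_thr:
        return "medium"
    return "difficult"


print(classify_frame_difficulty([0.8, 0.9, 0.75]))   # simple
print(classify_frame_difficulty([0.6, 0.65, 0.55]))  # medium
print(classify_frame_difficulty([0.2, 0.4, 0.3]))    # difficult
```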
To count the actual output ratio of each outlet, the three outlets of the network were trained without knowledge distillation. At each outlet, a confidence threshold determines whether the target in the current frame is located at that outlet (high confidence) or processing proceeds to the next outlet (low confidence). That is, the predicted target position is output at an outlet only if the confidence score of that outlet reaches the threshold. Fig. 3 shows the statistics of simple, medium and difficult frames output at the first, second and third outlets, which support the assumption.
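The confidence-gated decision described in this paragraph can be sketched as a short loop. This is a minimal illustration under stated assumptions, not the patented implementation: the outlets are represented by stub callables, the threshold value 0.6 is hypothetical (the text does not give one), and the last outlet is assumed to always produce an answer.

```python
def locate_with_early_exit(exits, frame, threshold):
    """Run the exits in order and stop at the first confident one.

    exits     -- list of callables; each maps a frame to (confidence, box)
    threshold -- confidence needed to stop early (hypothetical value)
    Returns (exit_number, confidence, box); the last exit always answers.
    """
    if not exits:
        raise ValueError("at least one exit is required")
    for number, exit_fn in enumerate(exits, start=1):
        confidence, box = exit_fn(frame)
        if confidence >= threshold or number == len(exits):
            return number, confidence, box


# Stub exits standing in for the three classifier heads.
calls = []

def make_exit(name, confidence):
    def exit_fn(frame):
        calls.append(name)  # record which exits were actually evaluated
        return confidence, (10, 20, 50, 80)
    return exit_fn

exits = [make_exit("exit1", 0.3), make_exit("exit2", 0.8), make_exit("exit3", 0.9)]
result = locate_with_early_exit(exits, frame=None, threshold=0.6)
print(result[0], calls)  # 2 ['exit1', 'exit2']
```

Because the third stub is never called, the sketch also shows where the speed gain comes from: layers above the answering outlet are skipped for that frame.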
The second part is the network structure. As shown in Fig. 1, the first three convolution layers of the VGG-M model pre-trained on ImageNet are selected, and two MIN modules are embedded into the first and second convolution layers, respectively, to increase the nonlinearity of the feature representation and to alleviate the vanishing gradient problem caused by ReLU. The three convolution layers and the two MIN modules form a base network that extracts the feature representation of the target. Three attention modules are then introduced into the base network: two residual attention modules and one channel attention module. Finally, three outlets are set in the base network, corresponding to video frames of the three difficulty levels. The three outlets have the same structure: one RoIAlign layer extracts candidate region features, and two convolution layers (conv_exit_1 and conv_exit_2) classify the candidate regions. Details of the MIN module, the attention modules, and the outlets are as follows.
MIN module: although classification-based tracking methods achieve good accuracy, some problems remain: (1) the limited discrimination capability of the model; (2) vanishing gradients and saturation during training. In most classification-based methods, the feature representation of the target is extracted by a lightweight network that cannot cope with nonlinear changes of the target. Furthermore, a ReLU that is not activated outputs a constant 0, which blocks the gradient and causes it to vanish. In addition, changes in the data distribution during training may saturate the activation function, which slows down training (especially during the online update phase).
To solve these problems, the present invention embeds two MIN modules after the first and second convolution layers, respectively. First, a two-layer multi-layer perceptron (MLP) is employed to increase local nonlinearity. After each MLP, there is one maxout unit to overcome the vanishing gradient problem that arises with ReLU. The maxout unit is expressed mathematically as follows:
where $x_{i,j}$ is the input centered at (i, j), ch is the channel index of the feature F, and w and b denote the feature weights and biases, respectively. F is constructed by taking the maximum over k maxout hidden parts. The maxout unit acts as a max pooling layer across channels, selecting the maximum output as the input to the next layer. In addition, a batch normalization (BN) layer is introduced to avoid the effects of differences in data distribution. The overall flow of the MIN module is as follows:
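As a hedged illustration of the maxout step described in this paragraph (not the complete MIN flow), the sketch below takes the maximum over k linear pieces of one input patch. The form max over m of (w_m · x + b_m) is the commonly used maxout definition and is an assumption here; the function name `maxout` and the example values are hypothetical.

```python
def maxout(x, weights, biases):
    """Return the largest of k linear responses to the input patch x.

    x       -- flattened input patch centered at some (i, j)
    weights -- k weight vectors, one per maxout hidden part
    biases  -- k scalar biases
    """
    responses = [
        sum(w_i * x_i for w_i, x_i in zip(w, x)) + b
        for w, b in zip(weights, biases)
    ]
    return max(responses)


patch = [1.0, -2.0]
pieces = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]  # k = 3 hidden parts
offsets = [0.0, 0.0, 0.5]
print(maxout(patch, pieces, offsets))  # 1.5
```

Unlike a ReLU, which outputs a constant 0 for inactive inputs, one of the k pieces is always selected, so a gradient can always flow through the selected piece.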
Attention module: since the ultimate goal of the present invention is to stop processing the current frame as early as possible during tracking, the later outlets should be more discriminative so that they can guide the earlier outlets. Therefore, two residual attention modules are added to the base network to enhance the discrimination capability of the deep features. First, a max pooling layer is used to enlarge the receptive field and capture global features. Second, bilinear interpolation restores the original spatial resolution. The mathematical expression of the residual attention module is as follows:
where the activation uses the sigmoid function, $F_R(x)$ is the residual attention feature, and the two operators denote element-wise multiplication and addition, respectively.
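One way the three ingredients named above (a sigmoid activation, an element-wise multiplication and an element-wise addition) are commonly combined in residual attention is F_R = F + sigmoid(M) * F, where M is the output of the mask branch. The sketch below uses that form purely as an assumption; the mask branch itself (max pooling followed by bilinear interpolation) is omitted, and `mask_logits` is a hypothetical stand-in for its output.

```python
import math


def sigmoid(value):
    return 1.0 / (1.0 + math.exp(-value))


def residual_attention(features, mask_logits):
    """Element-wise F + sigmoid(M) * F over flattened feature values.

    features    -- flattened feature values F(x)
    mask_logits -- flattened mask-branch outputs M(x), same length
    """
    if len(features) != len(mask_logits):
        raise ValueError("features and mask_logits must have equal length")
    return [f + sigmoid(m) * f for f, m in zip(features, mask_logits)]


print(residual_attention([2.0, -1.0], [0.0, 0.0]))  # [3.0, -1.5]
```

The additive term keeps the original feature intact even where the mask is close to zero, which is the usual motivation for the residual form.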
After the second residual attention module, a channel attention module is added to enhance the sensitivity of the channels in distinguishing the target from the background. The channel attention module takes the feature F as its input and removes the spatial information of F through a global pooling operation. The channel dependencies are then obtained through two fully connected layers, and the channel weight $w_c$ is computed using a sigmoid function. The output $F_C(x)$ is obtained by multiplying F(x) by the channel weight $w_c$:
$F_C(x) = w_c \cdot F(x)$. (4)
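The channel attention steps in equation (4) can be sketched with plain Python lists. This is a minimal sketch under stated assumptions: global average pooling is used for the "global pooling" step, a ReLU is placed between the two fully connected layers, and all weights are made-up illustrative values; none of these choices is confirmed by the text.

```python
import math


def channel_attention(feature, w1, b1, w2, b2):
    """Compute channel weights w_c and the reweighted feature F_C.

    feature -- list of channels, each an H x W list of rows
    w1, b1  -- first fully connected layer (hidden x channels, hidden)
    w2, b2  -- second fully connected layer (channels x hidden, channels)
    """
    # Global average pooling removes the spatial dimensions (assumption: average).
    pooled = [
        sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in feature
    ]
    # First fully connected layer with ReLU (assumption).
    hidden = [
        max(0.0, sum(w * p for w, p in zip(row, pooled)) + b)
        for row, b in zip(w1, b1)
    ]
    # Second fully connected layer followed by sigmoid gives w_c per channel.
    weights = [
        1.0 / (1.0 + math.exp(-(sum(w * h for w, h in zip(row, hidden)) + b)))
        for row, b in zip(w2, b2)
    ]
    # F_C(x) = w_c * F(x), applied to every spatial position of channel c.
    scaled = [[[wc * v for v in row] for row in ch] for ch, wc in zip(feature, weights)]
    return weights, scaled


feature = [[[1.0, 3.0]], [[2.0, 2.0]]]  # 2 channels, each 1 x 2
weights, scaled = channel_attention(
    feature, w1=[[0.0, 0.0]], b1=[0.0], w2=[[0.0], [0.0]], b2=[0.0, 0.0]
)
print(weights)  # [0.5, 0.5]
print(scaled)   # [[[0.5, 1.5]], [[1.0, 1.0]]]
```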
Outlet: in the multi-outlet architecture of the present invention, each outlet comprises a RoIAlign layer and two convolution layers (conv_exit_1 and conv_exit_2); the output has two nodes, corresponding to the target and the background, respectively. Each outlet can therefore be regarded as a binary classifier. The output sizes of conv_exit_1 and conv_exit_2 are 3×3×128 and 1×1×2, respectively. The confidence score of an outlet is used to decide whether the target is located at that outlet with high confidence or processing continues to the next outlet.
In most classification-based tracking methods, region of interest (RoI) features are extracted from high-level features; however, high-level features lack the detail needed to locate the target accurately. To supplement the region features with detail, the present invention stacks the region features of the preceding outlets onto the current outlet. Specifically, a region feature has size 3×3×ch, where ch is the number of feature channels. At the current outlet, the region features of the preceding outlets are concatenated along the channel axis.
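The channel-axis concatenation described above can be illustrated with nested lists indexed as [row][column][channel]. The helper name `concat_along_channels` and the channel counts in the example are hypothetical; only the 3×3 spatial size comes from the text.

```python
def concat_along_channels(features):
    """Concatenate region features of equal spatial size along the channel axis.

    features -- list of region features, each indexed [row][col][channel]
    """
    rows, cols = len(features[0]), len(features[0][0])
    for feat in features:
        if len(feat) != rows or any(len(r) != cols for r in feat):
            raise ValueError("all region features must share the same spatial size")
    fused = []
    for r in range(rows):
        fused_row = []
        for c in range(cols):
            cell = []
            for feat in features:
                cell.extend(feat[r][c])  # append this exit's channels
            fused_row.append(cell)
        fused.append(fused_row)
    return fused


# Two 3 x 3 region features with 2 and 3 channels (illustrative sizes).
earlier = [[[1, 2] for _ in range(3)] for _ in range(3)]
current = [[[3, 4, 5] for _ in range(3)] for _ in range(3)]
fused = concat_along_channels([earlier, current])
print(len(fused), len(fused[0]), len(fused[0][0]))  # 3 3 5
```

The spatial size stays 3×3 while the channel count becomes the sum of the inputs, so later outlets see both the detailed earlier features and their own higher-level features.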
The third part is multi-outlet training based on knowledge distillation. Given a teacher classifier t and a student classifier s that learns from t, the learning process can be optimized by minimizing the cross entropy of their outputs:
where t(x) and s(x) denote the predictions of t and s, respectively; temp is a temperature parameter that controls the softness of the output of the teacher t; $[t^{1/\mathrm{temp}}(x)]_c$ and $[s^{1/\mathrm{temp}}(x)]_c$ denote the soft predictions of t and s, respectively; and C is the number of categories. The distillation loss of the multi-outlet structure is then defined as follows:
where Ex is the number of outlets and $T(e) \in Ex$ denotes the set of teacher outlets. Here, all outlets are set to learn from the last outlet. cf(·) denotes the classifier corresponding to each outlet. Finally, the whole model is optimized by minimizing the classification loss $L_{cls}$ and the distillation loss $L_{dis}$:
$L = L_{cls} + a L_{dis}$, (7)
where a is a hyperparameter that balances the two losses. In the experiments, a = 1.
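The training objective can be sketched in a few lines. The soft predictions follow the softmax(·/temp) definition given in the text and the total loss follows equation (7). The cross entropy between teacher and student soft predictions is the standard distillation form, and averaging that loss over the earlier outlets with the last outlet as teacher is an assumption made for illustration; all function names are hypothetical.

```python
import math


def soft_prediction(logits, temp):
    """softmax(logits / temp), computed in a numerically stable way."""
    scaled = [z / temp for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_cross_entropy(teacher_logits, student_logits, temp):
    """Cross entropy between the teacher and student soft predictions."""
    t = soft_prediction(teacher_logits, temp)
    s = soft_prediction(student_logits, temp)
    return -sum(tc * math.log(sc) for tc, sc in zip(t, s))


def multi_exit_distillation(exit_logits, temp):
    """Average loss of every earlier exit against the last exit (assumption)."""
    teacher, students = exit_logits[-1], exit_logits[:-1]
    if not students:
        raise ValueError("at least two exits are required")
    losses = [distillation_cross_entropy(teacher, s, temp) for s in students]
    return sum(losses) / len(losses)


def total_loss(l_cls, l_dis, a=1.0):
    """L = L_cls + a * L_dis, with a = 1 as in the experiments."""
    return l_cls + a * l_dis


# Three exits, two classes (target, background); logits are made-up values.
l_dis = multi_exit_distillation([[0.2, -0.1], [1.0, -0.5], [2.0, -1.0]], temp=2.0)
print(total_loss(l_cls=0.4, l_dis=l_dis))
```

A higher temp flattens both distributions, so the student is pushed to match the relative ranking of classes rather than only the top class.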
4. Experimental results
The invention is implemented in PyTorch and runs on a machine equipped with an Intel(R) 4790K CPU and an NVIDIA Tesla K40c GPU. For the offline training phase, the ImageNet-Vid dataset is used. Eight video frames are randomly selected from a given video, and 64 positive samples and 192 negative samples are drawn from each frame. Given an annotated bounding box, the collection threshold for positive samples is greater than or equal to 0.7, and the collection range for negative samples is 0 to 0.5. Training runs for 1000 cycles at a learning rate of 0.0001. For the online training phase, 500 positive samples and 5000 negative samples are drawn from the first frame to initialize the model. Once the estimated position in the current frame is obtained, 96 positive samples and 192 negative samples are collected. Every 10 frames, the model is updated using the collected positive and negative samples.
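The sample collection rule can be illustrated as follows. The sketch assumes that the 0.7 and 0.5 thresholds refer to the intersection over union (IoU) between a candidate box and the annotated box, which is the usual convention in classification-based trackers but is not stated explicitly in the text. Boxes are given as (x, y, width, height), and the function names are hypothetical.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x, y, width, height) boxes."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def label_sample(candidate, annotation, pos_thr=0.7, neg_thr=0.5):
    """Return "positive", "negative", or None for candidates in between."""
    overlap = iou(candidate, annotation)
    if overlap >= pos_thr:
        return "positive"
    if overlap <= neg_thr:
        return "negative"
    return None  # neither band: the candidate is not collected


annotation = (0, 0, 10, 10)
print(label_sample((0, 0, 10, 10), annotation))  # positive
print(label_sample((5, 0, 10, 10), annotation))  # negative
print(label_sample((2, 0, 10, 10), annotation))  # None
```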
To validate its performance, the method of the present invention, DMENet, was evaluated on four public datasets (OTB-100, UAV123, LaSOT and VOT2018).
Fig. 4 shows the comparison of the method with 11 other mainstream tracking methods on the OTB-100 dataset. The compared methods are VITAL, SiamRPN++, MDNet, KYS, DiMP, PrDiMP, DAT, DaSiamRPN, ATOM, TRACA and UDT. As shown in Fig. 4, DMENet achieves the highest success rate (Success) score. Both the precision (Precision) and the success rate of DMENet are also higher than those of VITAL, a current classification-based tracking method.
Unlike OTB-100, whose videos are captured in real-life scenes, the videos of UAV123 are captured from a drone platform. Fig. 5 shows the comparison of DMENet with other mainstream methods on the UAV123 dataset; DMENet achieves competitive results among all compared tracking methods.
The LaSOT dataset consists of 1400 video sequences. On this dataset, tracking methods are evaluated mainly in terms of success rate (Success). All methods were tested on a test set containing 280 videos. Table 1 lists the success rate of each method. As shown in Table 1, the success rate of DMENet is far higher than that of the other classification-based tracking methods, namely VITAL and MDNet.
Table 1. Comparison of success rates on the LaSOT dataset
The VOT2018 dataset contains 60 video sequences, and the evaluation criteria are accuracy (Ar), robustness (Rr) and expected average overlap (EAO). As shown in Table 2, DMENet ranks high among all compared tracking methods, with competitive results.
Table 2. Comparison results on VOT2018
Fig. 6 shows the output statistics of each outlet with and without knowledge distillation, where E denotes an outlet and E w/Dis denotes an outlet trained with knowledge distillation. As can be seen from Fig. 6, with knowledge distillation the earlier outlets output more frames, which increases the running speed of the algorithm.
Claims (4)
1. A target tracking method using a multi-outlet fully convolutional network based on knowledge distillation, the method comprising the steps of:
step one, constructing a multi-outlet fully convolutional network based on knowledge distillation, wherein the specific construction steps are as follows:
(1) selecting the first three convolution layers of the pre-trained VGG-M network and embedding two MIN modules into the first convolution layer and the second convolution layer, respectively, the three convolution layers and the two MIN modules together forming a base network;
(2) on the basis of the base network, adding three attention modules to increase the discrimination capability of the feature representation, the three attention modules comprising two residual attention modules and one channel attention module, wherein the channel attention module is added after the second residual attention module to enhance the sensitivity of the channels in distinguishing the target from the background; the channel attention module takes the feature F as its input, removes the spatial information of F through a global pooling operation, obtains the channel dependencies through two fully connected layers, and computes the channel weight $w_c$ using a sigmoid function; the output $F_C(x)$ is obtained by multiplying F(x) by the channel weight $w_c$;
(3) providing three outlets in the overall network, all with the same structure, wherein each outlet comprises a region feature extraction layer for extracting the features corresponding to each RoI, and two convolution layers, Conv_Exit_1 and Conv_Exit_2, for classifying candidate samples as target or background;
step two, training the multiple outlets based on knowledge distillation, wherein the specific steps are as follows:
(1) given a teacher classifier t and a student classifier s that learns from t, optimizing the learning process by minimizing the cross entropy of their outputs:
$[s^{1/\mathrm{temp}}(x)]_c = \mathrm{softmax}(s(x)/\mathrm{temp})$,
$[t^{1/\mathrm{temp}}(x)]_c = \mathrm{softmax}(t(x)/\mathrm{temp})$,
where t(x) and s(x) denote the predictions of t and s, respectively, temp is a temperature parameter, $[t^{1/\mathrm{temp}}(x)]_c$ and $[s^{1/\mathrm{temp}}(x)]_c$ are the soft predictions of t and s, respectively, and C is the number of categories;
(2) optimizing the whole model by minimizing the classification loss $L_{cls}$ and the distillation loss $L_{dis}$ of the multi-outlet structure:
$L = L_{cls} + a L_{dis}$,
where a is a hyperparameter that balances the two losses.
2. The target tracking method using a multi-outlet fully convolutional network based on knowledge distillation according to claim 1, characterized in that the overall flow of the MIN module is as follows:
where $x_{i,j}$ is the input centered at coordinates (i, j), ch is the channel index of the feature F, and w and b denote the feature weights and biases, respectively.
3. The target tracking method using a multi-outlet fully convolutional network based on knowledge distillation according to claim 1, characterized in that the mathematical expression of the residual attention module is as follows:
where the activation uses the sigmoid function, $F_R(x)$ is the residual attention feature, and the two operators denote element-wise multiplication and addition, respectively.
4. The target tracking method using a multi-outlet fully convolutional network based on knowledge distillation according to claim 1, characterized in that $L_{dis}$ is defined as follows:
where Ex is the number of outlets, $T(e) \in Ex$ denotes the set of teacher outlets, and cf(·) denotes the classifier corresponding to each outlet.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111221017.6A CN113963022B (en) | 2021-10-20 | 2021-10-20 | Multi-outlet full convolution network target tracking method based on knowledge distillation |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113963022A | 2022-01-21 |
CN113963022B | 2023-08-18 |
Family
ID=79465705
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111221017.6A Active CN113963022B (en) | 2021-10-20 | 2021-10-20 | Multi-outlet full convolution network target tracking method based on knowledge distillation |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113963022B (en) |
Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109409500A (en) * | 2018-09-21 | 2019-03-01 | 清华大学 | The model accelerating method and device of knowledge based distillation and nonparametric convolution |
WO2021023202A1 (en) * | 2019-08-07 | 2021-02-11 | 交叉信息核心技术研究院(西安)有限公司 | Self-distillation training method and device for convolutional neural network, and scalable dynamic prediction method |
CN112651998A (en) * | 2021-01-18 | 2021-04-13 | 沈阳航空航天大学 | Human body tracking algorithm based on attention mechanism and double-current multi-domain convolutional neural network |
CN113095480A (en) * | 2021-03-24 | 2021-07-09 | 重庆邮电大学 | Interpretable graph neural network representation method based on knowledge distillation |
CN113159073A (en) * | 2021-04-23 | 2021-07-23 | 上海芯翌智能科技有限公司 | Knowledge distillation method and device, storage medium and terminal |
CN113255899A (en) * | 2021-06-17 | 2021-08-13 | 之江实验室 | Knowledge distillation method and system with self-correlation of channels |
CN113326941A (en) * | 2021-06-25 | 2021-08-31 | 江苏大学 | Knowledge distillation method, device and equipment based on multilayer multi-attention migration |
CN113449680A (en) * | 2021-07-15 | 2021-09-28 | 北京理工大学 | Knowledge distillation-based multimode small target detection method |
CN113470036A (en) * | 2021-09-02 | 2021-10-01 | 湖南大学 | Hyperspectral image unsupervised waveband selection method and system based on knowledge distillation |
Non-Patent Citations (1)
Title |
---|
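Lightweight phytoplankton detection network based on knowledge distillation; Zhang Tongtong; Dong Junyu; Zhao Haoran; Li Qiong; Sun Xin; Journal of Applied Sciences (应用科学学报) (03); full text * |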
Also Published As
Publication number | Publication date |
---|---|
CN113963022A (en) | 2022-01-21 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
CN109949317B (en) | Semi-supervised image instance segmentation method based on progressive adversarial learning | |
CN108805083B (en) | Single-stage video behavior detection method | |
CN110516536B (en) | Weak supervision video behavior detection method based on time sequence class activation graph complementation | |
US10002290B2 (en) | Learning device and learning method for object detection | |
CN112734775B (en) | Image labeling, image semantic segmentation and model training methods and devices | |
US7783581B2 (en) | Data learning system for identifying, learning apparatus, identifying apparatus and learning method | |
CN110276248B (en) | Facial expression recognition method based on sample weight distribution and deep learning | |
CN107392919B (en) | Adaptive genetic algorithm-based gray threshold acquisition method and image segmentation method | |
CN111680706A (en) | Double-channel output contour detection method based on coding and decoding structure | |
CN110532920A (en) | Smallest number data set face identification method based on FaceNet method | |
CN104200237A (en) | High speed automatic multi-target tracking method based on kernelized correlation filtering | |
CN104504362A (en) | Face detection method based on convolutional neural network | |
CN102385592B (en) | Image concept detection method and device | |
CN112800876A (en) | Method and system for embedding hypersphere features for re-identification | |
CN110633727A (en) | Deep neural network ship target fine-grained identification method based on selective search | |
CN114842343A (en) | ViT-based aerial image identification method | |
CN113807176A (en) | Small sample video behavior identification method based on multi-knowledge fusion | |
CN110503090B (en) | Character detection network training method based on limited attention model, character detection method and character detector | |
CN115063664A (en) | Model learning method, training method and system for industrial vision detection | |
Silva et al. | Online weighted one-class ensemble for feature selection in background/foreground separation | |
CN114492634A (en) | Fine-grained equipment image classification and identification method and system | |
CN111242114B (en) | Character recognition method and device | |
CN110334703B (en) | Ship detection and identification method in day and night image | |
CN109815887B (en) | Multi-agent cooperation-based face image classification method under complex illumination | |
CN113963022B (en) | Multi-outlet full convolution network target tracking method based on knowledge distillation |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||