CN113963022B - Multi-outlet full convolution network target tracking method based on knowledge distillation - Google Patents

Info

Publication number
CN113963022B
CN113963022B CN202111221017.6A CN202111221017A CN113963022B CN 113963022 B CN113963022 B CN 113963022B CN 202111221017 A CN202111221017 A CN 202111221017A CN 113963022 B CN113963022 B CN 113963022B
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202111221017.6A
Other languages
Chinese (zh)
Other versions
CN113963022A (en)
Inventor
邬向前
卜巍
马丁
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Harbin Institute of Technology
Original Assignee
Harbin Institute of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2021-10-20
Filing date
2021-10-20
Publication date
2023-08-18
Application filed by Harbin Institute of Technology
Priority to CN202111221017.6A
Publication of CN113963022A
Application granted
Publication of CN113963022B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/20 Analysis of motion
    • G06T7/246 Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/048 Activation functions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10016 Video; Image sequence
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20081 Training; Learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20084 Artificial neural networks [ANN]

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a target tracking method of a multi-outlet full convolution network based on knowledge distillation, which comprises the following steps: step one, constructing a multi-outlet full convolution network based on knowledge distillation; and step two, training the plurality of outlets based on knowledge distillation. The invention provides a multi-outlet full convolution structure based on knowledge distillation for classification-based tracking. Knowledge distillation encourages the earlier outlets to imitate the probability outputs of the later outlets, which improves the discrimination capability of the earlier outlets. The invention further improves discrimination capability by extracting region features at different scales with a plurality of RoIAlign layers and fusing these region features at each outlet. Different kinds of attention modules are used to capture different target-specific information, which improves the ability to distinguish the target from the background and from distractors. The invention achieves higher tracking precision together with a relatively high processing speed.

Description

Multi-outlet full convolution network target tracking method based on knowledge distillation
Technical Field
The invention relates to a target tracking method, in particular to a target tracking method of a multi-outlet full convolution network based on knowledge distillation.
Background
Convolutional Neural Networks (CNNs) have been successfully applied to visual target tracking tasks by virtue of their advantages in extracting high-level semantic feature representations. However, although CNN-based tracking methods can achieve good positioning accuracy, the processing speed of most methods is slow.
Disclosure of Invention
In order to better balance the speed and the precision of a CNN-based tracker, the invention provides a target tracking method of a multi-outlet full convolution network based on knowledge distillation.
The object of the invention is achieved by the following technical scheme:
A target tracking method of a multi-outlet full convolution network based on knowledge distillation first selects the first three convolution layers of a VGG-M model pretrained on ImageNet and embeds one MIN module after each of the first and second convolution layers, so as to increase the nonlinearity of the feature representation and alleviate the vanishing gradient caused by ReLU. The three convolution layers and the two MIN modules form a base network for extracting a feature representation of an input candidate sample. Then, three attention modules are introduced into the base network: two residual attention modules and one channel attention module. Finally, three outlets are set in the base network, corresponding to video frames of three different difficulty levels. The three outlets have the same structure: one RoIAlign layer for extracting candidate region features, and two convolution layers (conv_exit_1 and conv_exit_2) for classifying candidate regions. The method specifically comprises the following steps:
step one, constructing a multi-outlet full convolution network based on knowledge distillation, wherein the specific construction steps are as follows:
(1) Selecting the first three convolution layers of the pretrained VGG-M network, and embedding one MIN module after each of the first and second convolution layers to increase the nonlinearity of the feature representation while mitigating the effect of vanishing gradients, wherein the three convolution layers and the two MIN modules together form the base network;
the overall flow of the MIN module is as follows:
wherein x_{i,j} is an input centered on coordinates (i, j), ch is the channel index of feature F, and w and b represent the feature weights and offsets, respectively; F is constructed by taking the maximum of k maxout hidden pieces, so that the maxout unit acts as a max-pooling layer across channels that selects the maximum output as the input to the next layer; in addition, a batch normalization (BN) layer is introduced to avoid the influence of differences in data distribution;
(2) On the basis of the base network, three attention modules are added to increase the discrimination capability of the feature representation, namely two residual attention modules and one channel attention module; the channel attention module is added after the second residual attention module to enhance the sensitivity of the channels to the distinction between target and background; the channel attention module takes the feature F as its input, removes the spatial information of F through a global pooling operation, obtains the channel dependencies through two fully connected layers, and calculates the channel weight w_c using a sigmoid function; the output F_C(x) is obtained by multiplying F(x) by the channel weight w_c:
F_C(x) = w_c · F(x);
the mathematical expression of the residual attention module is as follows:
wherein activation uses the sigmoid function, F_R(x) is the residual attention feature, and the two operators represent element-wise multiplication and element-wise addition, respectively;
(3) Three outlets are arranged in the whole network, each with the same structure: a region feature extraction layer for extracting the features corresponding to each RoI region, and two convolution layers, Conv_Exit_1 and Conv_Exit_2, for dividing candidate samples into target and background;
step two, training a plurality of outlets based on knowledge distillation, wherein the specific steps are as follows:
(1) Given a teacher classifier t and a student classifier s that learns from t, the learning process is optimized by minimizing the cross entropy of their outputs:
[s^{1/temp}(x)]_c = softmax(s(x)/temp),
[t^{1/temp}(x)]_c = softmax(t(x)/temp),
wherein t(x) and s(x) represent the predictions of t and s, respectively, temp is a temperature parameter, [t^{1/temp}(x)]_c and [s^{1/temp}(x)]_c are the soft predictions of t and s, respectively, and C represents the number of categories;
(2) The whole model is optimized by minimizing the classification loss L_{cls} and the distillation loss L_{dis} of the multi-outlet structure:
L = L_{cls} + a L_{dis},
wherein a is a hyperparameter for balancing the two losses, and L_{dis} is defined as follows:
where Ex is the number of outlets, T(e) ∈ Ex represents the set of teacher outlets, and cf(·) represents the classifier corresponding to each outlet.
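The cross-entropy and distillation-loss equations referred to in step two are not reproduced in this text. A form consistent with the symbols defined above, assuming the usual multi-exit distillation formulation, can be sketched as follows. The name L^{temp} for the temperature-scaled cross entropy and the subscripted classifiers cf_t and cf_e are notational assumptions, and a term whose teacher set T(e) is empty is taken to be zero.

```latex
% Temperature-scaled cross entropy between a teacher t and a student s (assumed form)
L^{temp}(t, s) = -\sum_{c=1}^{C} \bigl[t^{1/temp}(x)\bigr]_c \, \log \bigl[s^{1/temp}(x)\bigr]_c

% Distillation loss averaged over the Ex outlets and their teacher sets (assumed form)
L_{dis} = \frac{1}{Ex} \sum_{e=1}^{Ex} \frac{1}{|T(e)|} \sum_{t \in T(e)} L^{temp}\bigl(cf_t, cf_e\bigr)
```

Under this reading, each outlet e is a student of every outlet in T(e), and the total loss L = L_{cls} + a L_{dis} adds this term to the ordinary classification loss.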
Compared with the prior art, the invention has the following advantages:
1. The invention provides a multi-outlet full convolution structure based on knowledge distillation for classification-based tracking. Knowledge distillation encourages the earlier outlets to imitate the probability outputs of the later outlets, which improves the discrimination capability of the earlier outlets.
2. The invention improves discrimination capability by extracting region features at different scales with a plurality of RoIAlign layers and fusing these region features at each outlet.
3. The invention uses different kinds of attention modules to capture different target-specific information, which improves the ability to distinguish the target from the background and from distractors.
4. Compared with mainstream classification-based tracking methods, the method of the invention has higher tracking precision and a relatively high processing speed.
Drawings
FIG. 1 is a flow chart of a method for target tracking for a multi-outlet full convolution network of knowledge distillation in accordance with the present invention;
FIG. 2 is an example of a simple, medium, and difficult frame;
FIG. 3 is a graph of output statistics for each outlet;
FIG. 4 is a comparison of the method of the present invention and other mainstream target tracking methods on the OTB-100 dataset;
FIG. 5 is a comparison of the method of the present invention and other mainstream target tracking methods on the UAV123 dataset;
FIG. 6 shows the output statistics of each outlet on the four datasets.
Detailed Description
The technical scheme of the invention is further described below with reference to the accompanying drawings, but the invention is not limited to this description; any modification or equivalent substitution that does not depart from the spirit and scope of the invention falls within its scope of protection.
The invention provides a target tracking method of a multi-outlet full convolution network based on knowledge distillation, named DMENet. In DMENet, different types of attention mechanisms are embedded at different levels of the full convolution network to capture more discriminative feature representations. In addition, three outlets are added to the full convolution network so that an accurate estimate of the target position in the current frame is obtained as early as possible. The entire DMENet is trained with a knowledge distillation strategy to improve the accuracy of the earlier outlets. Each outlet has a confidence score that decides whether the processing of the video frame ends at the current outlet or is passed on to the next outlet.
Fig. 1 shows the overall structure of the entire network, which can be divided into three parts, specifically as follows:
The first part is the determination of the number of outlets. To determine an appropriate number of outlets, it is assumed that the tracking difficulty of the target in a video sequence falls into three categories: simple (the appearance of the target changes relatively little), medium (the appearance of the target changes quickly but not severely) and difficult (the appearance of the target changes severely). It is further assumed that targets of these three difficulty levels can be located using low-level, middle-level and high-level features, respectively. This assumption is verified on the OTB-100 dataset.
In the OTB-100 dataset, each video frame is classified as simple, medium or difficult. The classification is based on the average overlap ratio of the predicted boxes output by 12 tracking methods. Simple frames have an average overlap ratio greater than or equal to 0.7, and the average overlap ratio thresholds for medium and difficult frames are 0.7 and 0.5, respectively. Fig. 2 shows examples of simple, medium and difficult frames.
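The frame labelling described above can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the procedure used in the patent: the text gives only the thresholds 0.7 and 0.5, so the sketch assumes that medium frames are those with an average overlap in [0.5, 0.7) and difficult frames are those below 0.5. The function name `frame_difficulty` is hypothetical.

```python
from typing import Sequence


def frame_difficulty(overlaps: Sequence[float],
                     simple_thr: float = 0.7,
                     difficult_thr: float = 0.5) -> str:
    """Label one frame from the overlap ratios of several trackers.

    overlaps: overlap ratio of each tracker's predicted box with the
    ground truth on this frame (12 trackers in the text).
    Assumption: medium means difficult_thr <= mean < simple_thr.
    """
    if not overlaps:
        raise ValueError("at least one overlap ratio is required")
    mean_overlap = sum(overlaps) / len(overlaps)
    if mean_overlap >= simple_thr:
        return "simple"
    if mean_overlap >= difficult_thr:
        return "medium"
    return "difficult"
```

For example, a frame on which all 12 trackers reach an overlap of 0.8 is labelled simple, while one on which they average 0.2 is labelled difficult.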
To count the actual output ratio of each outlet, the three outlets of the network are trained without knowledge distillation. At each outlet, a confidence score determines whether the target of the current frame is located at that outlet (high confidence) or processing proceeds to the next outlet (low confidence). That is, the position prediction of the target is output at an outlet only if the confidence score of that outlet reaches a threshold. Fig. 3 shows the statistics of simple, medium and difficult frames output at the first, second and third outlets, which supports the assumption.
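The confidence-gated exit rule can be sketched as follows. This is a minimal illustration, not the patent's implementation: each outlet is modelled as a hypothetical callable that returns one (target, background) logit pair per candidate, the confidence score is assumed to be the highest target probability among the candidates, the threshold value 0.9 is an arbitrary placeholder (the text does not state one), and the last outlet is assumed to always produce an output.

```python
import math
from typing import Callable, List, Sequence, Tuple

Logits = Tuple[float, float]  # (target logit, background logit)


def target_probability(logits: Logits) -> float:
    """Softmax probability of the 'target' class for one candidate."""
    top = max(logits)
    exp_t, exp_b = (math.exp(v - top) for v in logits)
    return exp_t / (exp_t + exp_b)


def locate_target(exits: Sequence[Callable[[list], List[Logits]]],
                  candidates: list,
                  threshold: float = 0.9) -> Tuple[int, int, float]:
    """Return (outlet index, best candidate index, confidence).

    Outlets are evaluated in order; a later outlet is only called when
    every earlier outlet was below the confidence threshold.
    """
    if not exits:
        raise ValueError("at least one outlet is required")
    for exit_id, score_fn in enumerate(exits):
        probs = [target_probability(p) for p in score_fn(candidates)]
        best = max(range(len(probs)), key=probs.__getitem__)
        is_last = exit_id == len(exits) - 1
        if probs[best] >= threshold or is_last:
            return exit_id, best, probs[best]
    raise AssertionError("unreachable: the last outlet always returns")
```

Because later outlets are invoked lazily, a frame that is resolved at the first outlet never pays for the deeper layers, which is where the speed gain comes from.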
The second part is the network structure. As shown in Fig. 1, the first three convolution layers of the VGG-M model pretrained on ImageNet are selected, and one MIN module is embedded after each of the first and second convolution layers to increase the nonlinearity of the feature representation and alleviate the vanishing gradient problem caused by ReLU. The three convolution layers and the two MIN modules form a base network for extracting the feature representation of the target. Then, three attention modules are introduced into the base network: two residual attention modules and one channel attention module. Finally, three outlets are set in the base network, corresponding to video frames of the three difficulty levels. The three outlets have the same structure: one RoIAlign layer is used to extract candidate region features, and two convolution layers (conv_exit_1 and conv_exit_2) are used to classify candidate regions. Details of the MIN module, the attention modules and the outlets are as follows.
MIN module: although classification-based tracking methods achieve good accuracy, some problems remain: (1) the discrimination capability of the model; (2) vanishing gradients and saturation during training. Most classification-based methods extract the feature representation of the target with a lightweight network that cannot cope with nonlinear changes of the target. Furthermore, an inactive ReLU outputs a constant 0, which blocks the gradient and causes it to vanish. In addition, changes in the data distribution during training may saturate the activation function, which slows down training (especially during the online update phase).
To solve these problems, the invention embeds two MIN modules, one after the first convolution layer and one after the second. First, a two-layer multi-layer perceptron (MLP) is employed to increase local nonlinearity. Each MLP is followed by a maxout unit to overcome the vanishing gradient problem that arises with ReLU. The maxout unit is expressed mathematically as follows:
wherein x_{i,j} is an input centered on (i, j), ch is the channel index of feature F, and w and b represent the feature weights and offsets, respectively. F is constructed by taking the maximum of k maxout hidden pieces. The maxout unit acts as a max-pooling layer across channels, which selects the maximum output as the input to the next layer. In addition, a batch normalization (BN) layer is introduced to avoid the effects of differences in data distribution. The overall flow of the MIN module is as follows:
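As an illustration of the maxout unit defined in this paragraph (not of the overall MIN flow), the sketch below computes the output at a single location. It assumes the standard maxout form, in which each output channel ch is the maximum over k affine pieces, max over m of (w[ch][m] · x_{i,j} + b[ch][m]). The function name `maxout_unit` and the nested-list layout of the weights are illustrative choices, not taken from the patent.

```python
from typing import List


def maxout_unit(x: List[float],
                w: List[List[List[float]]],
                b: List[List[float]]) -> List[float]:
    """Maxout over k affine pieces for every output channel at one location.

    x        : flattened input patch centered on (i, j)
    w[ch][m] : weight vector of piece m for output channel ch
    b[ch][m] : offset of piece m for output channel ch
    """
    out = []
    for w_ch, b_ch in zip(w, b):
        # Evaluate the k affine pieces of this channel.
        pieces = [sum(wi * xi for wi, xi in zip(w_m, x)) + b_m
                  for w_m, b_m in zip(w_ch, b_ch)]
        # Keep only the largest piece (max pooling across the k pieces).
        out.append(max(pieces))
    return out
```

Unlike ReLU, the selected piece is always an affine function of the input, so its gradient with respect to the selected weights is never forced to zero; this is the property the text relies on.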
Attention module: since the goal of the invention is to stop the processing of the current frame as early as possible during tracking, the later outlets must be more discriminative so that they can guide the earlier outlets. Therefore, two residual attention modules are added to the base network to enhance the discrimination capability of the deep features. First, a max pooling layer is used to enlarge the receptive field and capture global features. Second, bilinear interpolation restores the feature map to its original spatial resolution. The mathematical expression of the residual attention module is as follows:
wherein activation uses the sigmoid function, F_R(x) is the residual attention feature, and the two operators represent element-wise multiplication and element-wise addition, respectively.
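The residual attention equation itself is not reproduced in this text. A form that is consistent with the description above is sketched below. It is an assumption based on common residual attention designs, not the patent's own equation: M(x) is a hypothetical name for the output of the mask branch (max pooling followed by bilinear interpolation), \sigma is the sigmoid function, F(x) is the input feature, and \odot and \oplus denote element-wise multiplication and addition.

```latex
% Assumed residual attention form: the sigmoid mask re-weights F(x),
% and the identity branch keeps the original feature.
F_R(x) = \bigl(\sigma(M(x)) \odot F(x)\bigr) \oplus F(x)
```

Under this reading, the module can only amplify features relative to the identity path, since the result equals (1 + \sigma(M(x))) \odot F(x) element by element.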
After the second residual attention module, a channel attention module is added to enhance the sensitivity of the channels to the distinction between target and background. The channel attention module takes the feature F as its input and removes the spatial information of F through a global pooling operation. The channel dependencies are then obtained through two fully connected layers, and the channel weight w_c is calculated using a sigmoid function. The output F_C(x) is obtained by multiplying F(x) by the channel weight w_c:
F_C(x) = w_c · F(x). (4)
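The channel attention computation can be sketched in plain Python as follows. Several details are assumptions that the text does not fix: the global pooling is taken to be average pooling, a ReLU is placed between the two fully connected layers (as in squeeze-and-excitation style blocks), and the function name `channel_attention` and the weight arguments `w1`, `b1`, `w2`, `b2` are hypothetical.

```python
import math
from typing import List, Tuple

Feature = List[List[List[float]]]  # layout [channel][row][column]


def channel_attention(feature: Feature,
                      w1: List[List[float]], b1: List[float],
                      w2: List[List[float]], b2: List[float]
                      ) -> Tuple[List[float], Feature]:
    """Return (channel weights w_c, re-weighted feature F_C)."""
    # Global pooling removes the spatial dimensions: one value per channel.
    pooled = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0]))
              for ch in feature]
    # First fully connected layer with ReLU (assumed nonlinearity).
    hidden = [max(0.0, sum(w * p for w, p in zip(row, pooled)) + b)
              for row, b in zip(w1, b1)]
    # Second fully connected layer maps back to one logit per channel.
    logits = [sum(w * h for w, h in zip(row, hidden)) + b
              for row, b in zip(w2, b2)]
    # Sigmoid turns the logits into channel weights in (0, 1).
    weights = [1.0 / (1.0 + math.exp(-z)) for z in logits]
    # F_C(x) = w_c * F(x), applied channel by channel.
    scaled = [[[wc * v for v in row] for row in ch]
              for wc, ch in zip(weights, feature)]
    return weights, scaled
```

With all-zero fully connected weights every channel weight is 0.5, so the output is simply the input scaled by one half; trained weights would instead emphasise the channels that separate target from background.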
Outlet: in the multi-outlet architecture of the invention, each outlet comprises a RoIAlign layer and two convolution layers (conv_exit_1 and conv_exit_2), whose output comprises two nodes corresponding to the target and the background, respectively. Each outlet can therefore be seen as a binary classifier. The output sizes of conv_exit_1 and conv_exit_2 are 3×3×128 and 1×1×2, respectively. The confidence score of an outlet decides whether the target is located at that outlet with high confidence or processing proceeds to the next outlet.
In most classification-based tracking methods, region (RoI) features are extracted from high-level features; however, high-level features lack the detailed information needed to locate the target accurately. To supplement the region features with detail, the invention stacks the region features of the preceding outlets onto those of the current outlet. Specifically, a region feature has size 3×3×ch, where ch is the number of feature channels. For the current outlet, the region features of the preceding outlets are concatenated along the channel axis.
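The concatenation of region features along the channel axis can be sketched as follows. The sketch assumes that each outlet receives the features of all preceding outlets (the text could also be read as only the immediately preceding one), and it stores a region feature as a nested list laid out [channel][3][3]. The function name `fuse_region_features` is hypothetical.

```python
from typing import List

RegionFeature = List[List[List[float]]]  # layout [channel][3][3]


def fuse_region_features(per_exit_feats: List[RegionFeature]
                         ) -> List[RegionFeature]:
    """For each outlet, concatenate its RoI feature with those of all
    preceding outlets along the channel axis."""
    fused: List[RegionFeature] = []
    stacked: RegionFeature = []
    for feat in per_exit_feats:
        for channel in feat:
            if len(channel) != 3 or any(len(row) != 3 for row in channel):
                raise ValueError("each channel must be a 3x3 map")
        # Build a new list so that results of earlier outlets stay unchanged.
        stacked = stacked + feat
        fused.append(stacked)
    return fused
```

With hypothetical channel counts of 2, 3 and 4 at the three outlets, the fused features have 2, 5 and 9 channels, so the deepest outlet still sees the fine detail captured by the shallow layers.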
The third part is multi-outlet training based on knowledge distillation. Given a teacher classifier t and a student classifier s that learns from t, the learning process can be optimized by minimizing the cross entropy of their outputs:
wherein t(x) and s(x) represent the predictions of t and s, respectively. temp is a temperature parameter used to control the softness of the output of the teacher t. [t^{1/temp}(x)]_c and [s^{1/temp}(x)]_c represent the soft predictions of t and s, respectively. C represents the number of categories. The distillation loss of the multi-outlet structure is then defined as follows:
where Ex is the number of outlets and T(e) ∈ Ex represents the set of teacher outlets. Here, all outlets are set to learn from the last outlet. cf(·) represents the classifier corresponding to each outlet. Finally, the whole model is optimized by minimizing the classification loss L_{cls} and the distillation loss L_{dis}:
L = L_{cls} + a L_{dis}, (7)
where a is a hyperparameter used to balance the two losses. In the experiments, a = 1.
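The training objective can be illustrated with a small pure-Python sketch for a single candidate sample. It is written under several assumptions that the text does not settle: the classification loss is taken as the sum of the ordinary cross entropies of all outlets, the distillation term is normalised by the number of outlets, the last outlet is the only teacher (as stated above), and the temperature value 2.0 is an arbitrary placeholder. All function names are hypothetical.

```python
import math
from typing import List


def softmax(logits: List[float], temp: float = 1.0) -> List[float]:
    """Temperature-scaled softmax; larger temp gives softer outputs."""
    scaled = [z / temp for z in logits]
    top = max(scaled)
    exps = [math.exp(z - top) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def soft_cross_entropy(teacher_logits: List[float],
                       student_logits: List[float],
                       temp: float) -> float:
    """Cross entropy between the soft predictions of teacher and student."""
    t = softmax(teacher_logits, temp)
    s = softmax(student_logits, temp)
    return -sum(tc * math.log(sc) for tc, sc in zip(t, s))


def hard_cross_entropy(logits: List[float], label: int) -> float:
    """Ordinary classification loss against the ground-truth label."""
    return -math.log(softmax(logits)[label])


def multi_exit_loss(exit_logits: List[List[float]], label: int,
                    temp: float = 2.0, a: float = 1.0) -> float:
    """L = L_cls + a * L_dis for one sample, outlets ordered shallow to deep."""
    teacher = exit_logits[-1]          # the last outlet teaches the others
    students = exit_logits[:-1]
    l_cls = sum(hard_cross_entropy(z, label) for z in exit_logits)
    # In a real framework the teacher output would be detached from the graph.
    l_dis = sum(soft_cross_entropy(teacher, z, temp)
                for z in students) / len(exit_logits)
    return l_cls + a * l_dis
```

Setting a = 0 recovers training without distillation, which corresponds to the baseline used when counting the outputs of each outlet.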
Experimental results
The invention is implemented in PyTorch and runs on a machine equipped with an Intel(R) 4790K CPU and an Nvidia Tesla K40c GPU. For the offline training phase, the ImageNet-Vid dataset is used. 8 video frames are randomly selected from a given video, and 64 positive samples and 192 negative samples are taken from each frame. Given a ground-truth box, positive samples are collected at an overlap threshold greater than or equal to 0.7, and negative samples are collected in the range 0 to 0.5. Training runs for 1000 cycles at a learning rate of 0.0001. For the online training phase, 500 positive samples and 5000 negative samples are taken from the first frame to initialize the model. Whenever the estimated target position in the current frame is obtained, 96 positive samples and 192 negative samples are collected. Every 10 frames, the model is trained using the collected positive and negative samples.
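The sample collection rule can be sketched as follows, assuming that the thresholds refer to the intersection-over-union (IoU) between a candidate box and the ground-truth box, which is the usual choice in classification-based trackers but is not stated explicitly in the text. Boxes are given as (x, y, width, height), and the function names `iou` and `label_sample` are hypothetical.

```python
from typing import Optional, Tuple

Box = Tuple[float, float, float, float]  # (x, y, width, height)


def iou(box_a: Box, box_b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def label_sample(candidate: Box, ground_truth: Box,
                 pos_thr: float = 0.7, neg_thr: float = 0.5) -> Optional[int]:
    """Return 1 for a positive sample, 0 for a negative sample, and None
    for a candidate that falls between the two ranges and is discarded."""
    overlap = iou(candidate, ground_truth)
    if overlap >= pos_thr:
        return 1
    if overlap <= neg_thr:
        return 0
    return None
```

Candidates whose overlap lies strictly between 0.5 and 0.7 are ambiguous and are left out of both sets in this sketch.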
To validate the performance of DMENet, four public datasets (OTB-100, UAV123, LaSOT and VOT2018) were used.
Fig. 4 shows the results of comparing the method of the invention with 11 other mainstream tracking methods on the OTB-100 dataset. The compared methods are: VITAL, SiamRPN++, MDNet, KYS, DiMP, PrDiMP, DAT, DaSiamRPN, ATOM, TRACA and UDT. As shown in Fig. 4, DMENet achieves the highest success rate (Success) score. In addition, both the precision (Precision) and the success rate of DMENet are higher than those of VITAL, the current classification-based tracking method.
Unlike the videos of OTB-100, which are captured from real life, the videos of UAV123 are captured from a drone platform. The results of comparing DMENet with other mainstream methods on the UAV123 dataset are shown in Fig. 5, from which it can be seen that DMENet achieves competitive results among all compared tracking methods.
The LaSOT dataset consists of 1400 video sequences. On this dataset, tracking methods are evaluated mainly in terms of success rate (Success). All methods were tested on a test set containing 280 videos. Table 1 shows the success rate of each method. As shown in Table 1, the success rate of DMENet is far higher than that of the other classification-based tracking methods, namely VITAL and MDNet.
Table 1 Comparison of success rates on the LaSOT dataset
The VOT2018 dataset contains 60 video sequences, and the evaluation criteria are accuracy (Ar), robustness (Rr) and expected average overlap (EAO). As shown in Table 2, DMENet ranks high among all compared tracking methods, with competitive results.
Table 2 Comparison results on VOT2018
Fig. 6 counts the outputs of each outlet with and without knowledge distillation, where E denotes an outlet and E w/Dis denotes an outlet trained with knowledge distillation. As can be seen from Fig. 6, with knowledge distillation more frames are output at the earlier outlets, which increases the running speed of the algorithm.

Claims (4)

1. A method for tracking a target of a multi-outlet full convolution network based on knowledge distillation, the method comprising the steps of:
step one, constructing a multi-outlet full convolution network based on knowledge distillation, wherein the specific construction steps are as follows:
(1) Selecting the first three convolution layers of the VGG-M pre-training network, respectively embedding two MIN modules into the first convolution layer and the second convolution layer, and forming a basic network by the three convolution layers and the two MIN modules together;
(2) On the basis of the basic network, three attention modules are added to increase the discrimination capability of the feature representation, namely two residual attention modules and one channel attention module; after the second residual attention module, the channel attention module is added to enhance the sensitivity of the channels to the distinction between target and background; the channel attention module takes the feature F as its input, removes the spatial information of F through a global pooling operation, obtains the channel dependencies through two fully connected layers, and calculates the channel weight w_c using a sigmoid function; the output F_C(x) is obtained by multiplying F(x) by the channel weight w_c;
(3) Three outlets are arranged in the whole network, each outlet has the same structure, wherein the three outlets comprise a region feature extraction layer for extracting features corresponding to each RoI region, and two convolution layers Conv_Exit_1 and Conv_Exit_2 are used for dividing candidate samples into targets and backgrounds;
step two, training a plurality of outlets based on knowledge distillation, wherein the specific steps are as follows:
(1) Given a teacher classifier t and a student classifier s that learns from t, the learning process is optimized by minimizing the cross entropy of their outputs:
[s^{1/temp}(x)]_c = softmax(s(x)/temp),
[t^{1/temp}(x)]_c = softmax(t(x)/temp),
wherein t(x) and s(x) represent the predictions of t and s, respectively, temp is a temperature parameter, [t^{1/temp}(x)]_c and [s^{1/temp}(x)]_c are the soft predictions of t and s, respectively, and C represents the number of categories;
(2) The whole model is optimized by minimizing the classification loss L_{cls} and the distillation loss L_{dis} of the multi-outlet structure:
L = L_{cls} + a L_{dis},
where a is a hyperparameter used to balance the two losses.
2. The target tracking method for a knowledge distillation based multi-outlet full convolution network according to claim 1, wherein the overall flow of the MIN module is as follows:
wherein x_{i,j} is an input centered on coordinates (i, j), ch is the channel index of feature F, and w and b represent the feature weights and offsets, respectively.
3. The method for target tracking for a knowledge distillation based multi-outlet full convolution network according to claim 1, characterized in that the mathematical expression of the residual attention module is as follows:
wherein activation uses the sigmoid function, F_R(x) is the residual attention feature, and the two operators represent element-wise multiplication and element-wise addition, respectively.
4. The target tracking method of a multi-outlet full convolution network based on knowledge distillation according to claim 1, characterized in that L_{dis} is defined as follows:
where Ex is the number of outlets, T(e) ∈ Ex represents the set of teacher outlets, and cf(·) represents the classifier corresponding to each outlet.
CN202111221017.6A 2021-10-20 2021-10-20 Multi-outlet full convolution network target tracking method based on knowledge distillation Active CN113963022B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111221017.6A CN113963022B (en) 2021-10-20 2021-10-20 Multi-outlet full convolution network target tracking method based on knowledge distillation

Publications (2)

Publication Number Publication Date
CN113963022A (en) 2022-01-21
CN113963022B (en) 2023-08-18

Family

ID=79465705

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109409500A (en) * 2018-09-21 2019-03-01 清华大学 The model accelerating method and device of knowledge based distillation and nonparametric convolution
WO2021023202A1 (en) * 2019-08-07 2021-02-11 交叉信息核心技术研究院(西安)有限公司 Self-distillation training method and device for convolutional neural network, and scalable dynamic prediction method
CN112651998A (en) * 2021-01-18 2021-04-13 沈阳航空航天大学 Human body tracking algorithm based on attention mechanism and double-current multi-domain convolutional neural network
CN113095480A (en) * 2021-03-24 2021-07-09 重庆邮电大学 Interpretable graph neural network representation method based on knowledge distillation
CN113159073A (en) * 2021-04-23 2021-07-23 上海芯翌智能科技有限公司 Knowledge distillation method and device, storage medium and terminal
CN113255899A (en) * 2021-06-17 2021-08-13 之江实验室 Knowledge distillation method and system with self-correlation of channels
CN113326941A (en) * 2021-06-25 2021-08-31 江苏大学 Knowledge distillation method, device and equipment based on multilayer multi-attention migration
CN113449680A (en) * 2021-07-15 2021-09-28 北京理工大学 Knowledge distillation-based multimode small target detection method
CN113470036A (en) * 2021-09-02 2021-10-01 湖南大学 Hyperspectral image unsupervised waveband selection method and system based on knowledge distillation

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
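Lightweight phytoplankton detection network based on knowledge distillation; Zhang Tongtong; Dong Junyu; Zhao Haoran; Li Qiong; Sun Xin; Journal of Applied Sciences (03); full text *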

Also Published As

Publication number Publication date
CN113963022A (en) 2022-01-21

Similar Documents

Publication Publication Date Title
CN109949317B (en) Semi-supervised image example segmentation method based on gradual confrontation learning
CN108805083B (en) Single-stage video behavior detection method
CN110516536B (en) Weak supervision video behavior detection method based on time sequence class activation graph complementation
US10002290B2 (en) Learning device and learning method for object detection
CN112734775B (en) Image labeling, image semantic segmentation and model training methods and devices
US7783581B2 (en) Data learning system for identifying, learning apparatus, identifying apparatus and learning method
CN110276248B (en) Facial expression recognition method based on sample weight distribution and deep learning
CN107392919B (en) Adaptive genetic algorithm-based gray threshold acquisition method and image segmentation method
CN111680706A (en) Double-channel output contour detection method based on coding and decoding structure
CN110532920A (en) Smallest number data set face identification method based on FaceNet method
CN104200237A (en) High speed automatic multi-target tracking method based on coring relevant filtering
CN104504362A (en) Face detection method based on convolutional neural network
CN102385592B (en) Image concept detection method and device
CN112800876A (en) Method and system for embedding hypersphere features for re-identification
CN110633727A (en) Deep neural network ship target fine-grained identification method based on selective search
CN114842343A (en) ViT-based aerial image identification method
CN113807176A (en) Small sample video behavior identification method based on multi-knowledge fusion
CN110503090B (en) Character detection network training method based on limited attention model, character detection method and character detector
CN115063664A (en) Model learning method, training method and system for industrial vision detection
Silva et al. Online weighted one-class ensemble for feature selection in background/foreground separation
CN114492634A (en) Fine-grained equipment image classification and identification method and system
CN111242114B (en) Character recognition method and device
CN110334703B (en) Ship detection and identification method in day and night image
CN109815887B (en) Multi-agent cooperation-based face image classification method under complex illumination
CN113963022B (en) Multi-outlet full convolution network target tracking method based on knowledge distillation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant