CN112633406A - Knowledge distillation-based few-sample target detection method - Google Patents

Knowledge distillation-based few-sample target detection method

Info

Publication number
CN112633406A
CN112633406A (application CN202011626826.0A / CN202011626826A)
Authority
CN
China
Prior art keywords
network
model
training
training data
weight
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011626826.0A
Other languages
Chinese (zh)
Inventor
杨嘉琛
郭晓岚
王晨光
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tianjin University
Original Assignee
Tianjin University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tianjin University filed Critical Tianjin University
Priority to CN202011626826.0A priority Critical patent/CN112633406A/en
Publication of CN112633406A publication Critical patent/CN112633406A/en
Pending legal-status Critical Current

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/21 - Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 - Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00 - Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07 - Target detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a knowledge-distillation-based few-sample target detection method, characterized by comprising the following steps: constructing a labeled picture database that satisfies the few-sample target detection setting, expanding sample diversity with a certain amount of data augmentation, and dividing the data into joint training data and fine-tuning training data; selecting a target detection framework and a backbone network, constructing a network model, and training the network model with the joint training data to obtain a weight model; fine-tuning the first weight model obtained in the second step with the fine-tuning training data to obtain a new weight model, called the second weight model; and taking the first weight model obtained in the second step as the student network and the second weight model obtained in the third step as the teacher network, reusing the fine-tuning training data, and performing knowledge distillation guided by the teacher network's predictions on the fine-tuning training data to fine-tune the student network weights and obtain the final third weight model.

Description

Knowledge distillation-based few-sample target detection method
Technical Field
The invention belongs to the field of few-sample object detection, and relates to a few-sample target detection method.
Background
In the field of computer vision, object detection is a very popular research topic. Especially with the introduction of the Convolutional Neural Network (CNN) and its wide application in image processing, object detection has developed rapidly and achieved remarkable results [1]. However, a general CNN-based target detection framework needs to be trained on a large data set. When faced with a detection task that offers only rare samples, current target detection cannot achieve a satisfactory effect, which is a limitation of general target detection technology. The main reason is that the representation capacity of a CNN is so strong that, given too few samples, it overfits them, loses generalization ability on new samples, and its detection ability therefore degrades. Hence, to improve target detection performance under the few-sample condition, a method that effectively alleviates network overfitting must be considered.
Overfitting is a problem in the model parameter fitting process: because the training data contain sampling errors, a complex model also fits these sampling errors during training. Its main symptom is that the model performs well on the training set but poorly on the test set, i.e. its generalization ability is weak. With a very small number of samples, a target detection network is highly susceptible to such overfitting. Several solutions exist. Some researchers use data-augmentation-based methods that increase sample complexity to perturb the neural network model, but such methods are still not applicable when the samples are extremely few (e.g., only 1 or 2). Other researchers have recently used meta-learning methods [2], which give the neural network the ability to learn to learn through a task-based training regime and thus learn well from a small number of samples. Although such methods have made some progress in few-sample target detection, they usually require an additional meta-learning module to strengthen the model's representation of the few samples, lack universality, and have a complex structure.
Knowledge distillation (KD) aims to transfer the dark knowledge in a complex model (the teacher) to a simple model (the student), which is typically more compact than the teacher. Through knowledge distillation, the student model is expected to approach or even exceed the teacher, achieving similar predictive results with lower complexity [3]. At present, knowledge distillation is applied to CNNs in two main modes: (1) the general distillation mode, i.e. distillation between different models, which transfers knowledge between models; and (2) the self-distillation mode, i.e. distillation between identical models, which treats distillation as a regularization that improves the network model. Self-knowledge distillation can therefore serve as a training method for a few-sample target detection task: the network is retrained by self-knowledge distillation, the degree of overfitting of the neural network under the few-sample condition is reduced, and no additional module needs to be added.
[1] Liu Dong, et al. A review of deep learning and its application in image object classification and detection [J]. Computer Science, 2016(12): 13-23.
[2] Pan Xingmu, Zhang Xulong, et al. Current research status of few-sample object detection [J]. Journal of Nanjing University of Information Science and Technology (Natural Science Edition), 2019(6).
[3] Hinton G, Vinyals O, Dean J. Distilling the Knowledge in a Neural Network [J]. Computer Science, 2015, 14(7): 38-39.
Disclosure of Invention
Aiming at the problem of network overfitting in few-sample target detection, the invention provides a knowledge-distillation-based few-sample target detection method that can conveniently, universally and effectively alleviate overfitting, so that a general target detection framework achieves better few-sample detection performance. The technical scheme is as follows:
A few-sample target detection method based on knowledge distillation is characterized by comprising the following specific operation steps:
The first step is as follows: constructing a labeled picture database that satisfies the few-sample target detection setting, expanding sample diversity with a certain amount of data augmentation, and dividing the data into joint training data D_joint and fine-tuning training data D_ft.
The second step is as follows: selecting a target detection framework and a backbone network, constructing a network model, and training the network model with the joint training data to obtain a weight model; the method comprises the following steps:
selecting Faster R-CNN as the target detection framework, adopting VGG16 with 13 convolutional layers, 3 fully connected layers and 5 pooling layers as the backbone network, selecting SGD as the optimizer and RoI Align in the ROI pooling stage, and training the above model with the joint training data D_joint to finally obtain the weights of the jointly trained model, namely the first weight model;
The third step: fine-tuning the first weight model obtained in the second step with the fine-tuning training data to obtain a new weight model, namely the second weight model, as follows:
training the first weight model with the fine-tuning training data D_ft, setting an initial learning rate with no learning-rate decay; training for 5 epochs, selecting SGD as the optimizer and RoI Align in the ROI pooling stage; freezing all feature layers of VGG16 and adjusting only the classification layer; finally obtaining the weights of the fine-tuned model, called the second weight model;
The fourth step: taking the first weight model obtained in the second step as the student network and the second weight model obtained in the third step as the teacher network, reusing the fine-tuning training data, performing knowledge distillation guided by the teacher network's predictions on the fine-tuning training data to fine-tune the student network weights, obtaining the final third weight model, and obtaining and outputting the detection results through the third model.
The fourth step comprises the following specific steps:
using the fine-tuning training data D_ft and the prior knowledge of the teacher network to perform knowledge distillation training on the student network, setting an initial learning rate with no learning-rate decay; training for 8 epochs, selecting SGD as the optimizer and RoI Align in the ROI pooling stage; freezing the first 10 feature layers of VGG16 and adjusting the last 3 feature layers and the classifier to alleviate overfitting in the higher layers; and, combining the prior knowledge of the teacher network with the knowledge of the real data labels, calculating the total loss function for knowledge distillation of the student network according to the following formula:
L = L_cls + λ·L_reg + γ·L_cpf
L_cls is the classification difference between the teacher network and the student network; it comprises both the distribution difference with the teacher network and the difference with the real labels, and as distillation training proceeds the share of the distribution difference with the teacher network goes from high to low while the share of the difference with the real labels goes from low to high;
L_reg is the difference in the prediction boxes between the teacher network and the student network; the localization ability is improved by regressing the distance between the student network's prediction boxes and the true values;
L_cpf addresses the inter-feature differences for the few-sample classes; the feature differences output by the teacher and student network models are weighted by an attention map generated for the few-sample classes, so that the student network is more inclined to learn the features of the few-sample data;
λ and γ are hyper-parameters for balancing the different loss function terms;
and the final third weight model is obtained.
The invention is based on the idea of self-knowledge distillation and designs a training method suitable for few-sample target detection frameworks. The idea is to retrain the neural network model through knowledge distillation; the resulting model generalizes better than before, learning of the few-sample class data is emphasized, and the model's detection ability on the few-sample classes is improved overall. Compared with other methods, the method is universal, can be extended to different target detection frameworks and different data sets, and is simple and efficient.
Drawings
FIG. 1 is a schematic flow chart of a method for detecting a small-sample target based on self-knowledge distillation according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of data enhancement provided by an embodiment of the present invention;
FIG. 3 is a frame diagram of a knowledge distillation training process based on fast-RCNN according to an embodiment of the present invention;
Detailed Description
In the following description, for purposes of explanation and not limitation, specific details are set forth, such as particular structures, parameters, etc. in order to provide a more thorough understanding of the embodiments of the invention. However, it will be apparent to one skilled in the art that the present application may be practiced in other embodiments that depart from these specific details. In some general cases, detailed descriptions of well-known structures are omitted so as not to unnecessarily obscure the present invention.
In order to make the technical scheme of the invention clearer, the invention is described below with reference to the accompanying drawings and specific embodiments.
Fig. 1 is a schematic flow chart of an implementation of a method for detecting a small sample target based on self-knowledge distillation according to an embodiment of the present invention, which is detailed as follows:
the first step of the embodiment is to prepare a data set. The invention selects and uses the PASCAL VOC2007 data set to produce a plurality of less-sample tasks, and the less-sample target detection task generally takes any 5 types of data in the VOC2007 data set 20 types as the less-sample data set and refers to the less-sample data set as CnovelIn this embodiment, C is selectednovel=[bird,bus,cow,motorbike,sofa]K pictures are taken for each category (K is usually equal to 1, 2, 3, 5, 10). The other 15 classes are called base class data, denoted CbaseUsually training with 5 classes of few sample data, where the training data is Djoint. Then, in the fine tuning and distilling stage, the same number of 15 types of base class data are selected and the model is fine tuned and distilled together with less sample data, and the training data is Dft. To increase sample diversity at the data level, DjointAnd DftEach picture in (a) is flipped to achieve data enhancement, see fig. 2.
The second step of the embodiment is joint training. This example is based on the Faster R-CNN detection framework, selects VGG16 with 13 Convolutional Layers, 3 Fully Connected Layers and 5 Pooling Layers as the backbone network, and implements the code with the deep learning framework PyTorch. Joint training means training with the 15 base classes together with the 5 few-sample classes, i.e. on D_joint. The reason for this joint training is that the 15 base classes let the model learn better shallow, general feature representations, which benefits few-sample detection performance. Specifically, the initial learning rate is set to 0.001 and decayed by a factor of 0.1 every 4 epochs; training runs for 10 epochs with the SGD optimizer and RoI Align in the ROI pooling stage; 9 prior anchor box sizes are set; and training starts from the pre-trained weights vgg16_noise. The training loss function at this stage is the same as that of the original Faster R-CNN. The model obtained from this training is referred to as model 1.
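A minimal training-loop sketch for the joint-training configuration just described (SGD, initial learning rate 0.001, decay by 0.1 every 4 epochs, 10 epochs) might look as follows. The detector constructor faster_rcnn_vgg16 and the loader d_joint_loader are placeholders, and the momentum and weight-decay values are assumptions not stated in the patent.

```python
import torch
from torch.optim import SGD
from torch.optim.lr_scheduler import StepLR

model = faster_rcnn_vgg16(num_classes=21)               # placeholder: Faster R-CNN + VGG16, 20 VOC classes + background
optimizer = SGD(model.parameters(), lr=0.001, momentum=0.9, weight_decay=5e-4)
scheduler = StepLR(optimizer, step_size=4, gamma=0.1)    # decay learning rate by 0.1 every 4 epochs

for epoch in range(10):                                  # 10 joint-training epochs on D_joint
    for images, targets in d_joint_loader:               # placeholder DataLoader over D_joint
        loss_dict = model(images, targets)               # standard Faster R-CNN losses
        loss = sum(loss_dict.values())
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    scheduler.step()

torch.save(model.state_dict(), "model1_joint.pth")       # "model 1": the first weight model
```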
The third step of the embodiment is fine-tuning training. Because the data set in the joint-training phase is severely class-imbalanced, the model overfits badly on the few-sample data. Fine-tuning is a common way to alleviate this problem. Specifically, the invention fine-tunes model 1 on a small data set: base-class data of the same size as the few-sample data is selected from the 15 base classes, so that the classification layer of the model does not have to be reinitialized; this data set is D_ft. A smaller learning rate is usually chosen for fine-tuning: the initial learning rate is set to 0.0001 with no decay; training runs for 5 epochs with the SGD optimizer and RoI Align in the ROI pooling stage; the feature layers of VGG16 are frozen and only the classification layer is adjusted, which helps the model's classifier express the features of the few-sample data better. The resulting fine-tuned weights are called model 2.
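Continuing the previous sketch, the fine-tuning stage could be written as below: load model 1, freeze the VGG16 feature layers, and train only the classification head on D_ft for 5 epochs at learning rate 0.0001. The attribute names (model.backbone, d_ft_loader) are again placeholders rather than the patent's actual code.

```python
from torch.optim import SGD

model.load_state_dict(torch.load("model1_joint.pth"))    # start from model 1

for p in model.backbone.parameters():                     # freeze all VGG16 feature layers
    p.requires_grad = False

head_params = [p for p in model.parameters() if p.requires_grad]
optimizer = SGD(head_params, lr=0.0001, momentum=0.9)     # small learning rate, no decay

for epoch in range(5):                                     # 5 fine-tuning epochs on D_ft
    for images, targets in d_ft_loader:
        loss = sum(model(images, targets).values())
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

torch.save(model.state_dict(), "model2_finetune.pth")      # "model 2": the teacher for distillation
```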
The fourth step of the embodiment is knowledge distillation training. The invention uses the fine-tuning training data D_ft and the prior knowledge of model 2 (the teacher network) to perform knowledge distillation training on model 1 (the student network). This step is analyzed with reference to Fig. 3, which shows the general detection process of Faster R-CNN: a branch generates proposals that may contain objects, and classification and regression are then performed to improve detection performance. Its main components are: 1) a backbone that produces the base feature map, with output f_base(x | W), where W denotes the network parameters; 2) a Region Proposal Network (RPN) that generates the proposals; 3) a network head (RCN) that classifies and regresses the regions of interest (ROIs), whose outputs are denoted p and R respectively, where p is the predicted logits for each class. The classification and regression outputs of the teacher network are denoted p_t and R_t, and those of the student network p_s and R_s. Since teacher and student are essentially the same model, this is called self-distillation. In the knowledge distillation training stage, the student network distills knowledge from the teacher network through loss functions in three aspects, expressed by the invention as follows:
L = L_cls + λ·L_reg + γ·L_cpf    (1)
where λ and γ are hyper-parameters for balancing different loss function terms, λ is 1 and γ is 0.01 in this embodiment.
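For reference, the combination of the three terms in formula (1) with the weights used in this embodiment can be expressed by the small helper below; the individual terms are sketched after their respective formulas further on.

```python
def total_distill_loss(l_cls, l_reg, l_cpf, lam=1.0, gamma=0.01):
    """Total loss L = L_cls + λ·L_reg + γ·L_cpf of formula (1), with λ = 1 and γ = 0.01."""
    return l_cls + lam * l_reg + gamma * l_cpf
```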
In the knowledge distillation, the invention uses a generalized softmax function, as shown in equation (2):
q_i = exp(p_i / T) / Σ_j exp(p_j / T)    (2)
where p is the vector of logits output by the classifier. The ordinary softmax yields a distribution close to a one-hot argmax vector; such an output is too hard, so the network pays too much attention to the high-probability class and ignores the classes with low probability. With the generalized softmax, the probability distribution of the classification output becomes softer, and the student learns a similar mapping by approximating the output distribution of the teacher network, which reduces the degree of network overfitting. T is called the temperature coefficient; the larger T is, the softer the output. For knowledge transfer between models it suffices to increase T appropriately and minimize the cross-entropy of the distributions between the networks, expressed as formula (3):
L_soft = CE(q_s, q_t) = -Σ_i q_t(i)·log q_s(i)    (3)
L_soft is the classification difference between the models. The invention increases the temperature T appropriately in the training stage to soften the outputs, so that the student network learns more knowledge and the generalization ability of the network improves; knowledge is then transferred between the models by reducing the difference between their output distributions. Through formula (3) the student model can learn the classification knowledge of the teacher model well. However, the teacher's knowledge is not necessarily correct, so to ensure that the student learns correct and well-generalized knowledge, the correct labels provided by the ground truth must also be incorporated. The classification loss L_cls is given by formula (4):
L_cls = (1 - μ)·L_soft + μ·CE(softmax(p_s), y_cls)    (4)
μ is a hyperparameter representing the ratio between the two knowledge sources, CE denotes the cross-entropy function, and y_cls is the true class label. At the start of distillation training this experiment sets μ to 0.5, and as distillation training proceeds the value of μ increases by 0.1 every two epochs, so that the share of the distribution difference with the teacher network goes from high to low. The trained model therefore not only possesses the prior knowledge obtained by distillation, which reduces overfitting, but also stays close to the real data, so the model performs better.
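Assuming p_s and p_t are the raw classification logits of the student and teacher RCN heads, the classification term of formulas (2)-(4) can be sketched as follows; the temperature value T = 4 is an illustrative assumption, and the μ schedule follows the description above.

```python
import torch.nn.functional as F

def softened_probs(logits, T):
    """Generalized softmax of formula (2): q_i = exp(p_i / T) / Σ_j exp(p_j / T)."""
    return F.softmax(logits / T, dim=-1)

def soft_cross_entropy(student_logits, teacher_logits, T):
    """Cross-entropy between the softened teacher and student distributions (formula (3))."""
    q_t = softened_probs(teacher_logits, T)
    log_q_s = F.log_softmax(student_logits / T, dim=-1)
    return -(q_t * log_q_s).sum(dim=-1).mean()

def cls_distill_loss(student_logits, teacher_logits, labels, mu, T=4.0):
    """Classification term of formula (4): the teacher term is weighted by (1 - μ) and the
    ground-truth term by μ, so the teacher's share decreases as μ grows."""
    l_soft = soft_cross_entropy(student_logits, teacher_logits, T)
    l_hard = F.cross_entropy(student_logits, labels)
    return (1.0 - mu) * l_soft + mu * l_hard

def mu_schedule(epoch, start=0.5, step=0.1, every=2, max_mu=1.0):
    """μ starts at 0.5 and increases by 0.1 every two epochs, as in the embodiment."""
    return min(max_mu, start + step * (epoch // every))
```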
L_reg is the difference in the prediction boxes between the models. The particularity of the object detection task is that, besides classification, bounding boxes are used to mark the objects in the picture, and the position and size of each box must be adjusted, so the prediction-box capability also needs to be transferred. However, bounding-box prediction regresses toward exact position values, and a soft label like the softmax output cannot be used. Since the teacher's predictions can hardly match the ground truth exactly, using them directly to regress the student network would have a negative effect. The invention therefore generates an extra gradient only when the student network's prediction is worse than the teacher network's, so that the student network approaches the ground truth faster; otherwise there is no extra penalty. This is expressed as formulas (5) and (6):
L_reg = sL1(R_s, y_bb) + σ·L_b(R_s, R_t, y_bb)    (5)
L_b(R_s, R_t, y_bb) = ||R_s - y_bb||², if ||R_s - y_bb||² + m > ||R_t - y_bb||²; 0 otherwise    (6)
sL1 denotes the Smooth L1 function, y_bb is the ground-truth value of the prediction box, σ is the weight of the penalty term and is set to 0.5 in the invention, and m is a margin hyperparameter, set to 0 in the invention.
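A sketch of the teacher-bounded regression term of formulas (5) and (6), assuming R_s, R_t and y_bb are box-regression tensors of matching shape; σ = 0.5 and m = 0 as stated above.

```python
import torch
import torch.nn.functional as F

def bounded_regression_loss(r_student, r_teacher, y_bb, sigma=0.5, m=0.0):
    """L_reg = sL1(R_s, y_bb) + σ·L_b (formula (5)). The bound term L_b (formula (6))
    penalizes the student only when its squared error exceeds the teacher's plus a margin m."""
    smooth_l1 = F.smooth_l1_loss(r_student, y_bb)

    err_s = ((r_student - y_bb) ** 2).sum(dim=-1)   # per-box squared error of the student
    err_t = ((r_teacher - y_bb) ** 2).sum(dim=-1)   # per-box squared error of the teacher
    l_b = torch.where(err_s + m > err_t, err_s, torch.zeros_like(err_s)).mean()

    return smooth_l1 + sigma * l_b
```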
L_cpf addresses the inter-feature differences for the few-sample classes. The first two losses do not treat the few-sample data specially; they only reduce the fitting degree of the network, so a term on the feature differences of the few-sample classes is added to detect few-sample targets better. This is equivalent to an attention mechanism that makes the student network learn the features of the few-sample data preferentially. The invention computes the IoU between the anchors generated by sliding over the feature map and the ground-truth bounding boxes of the few-sample targets, selects the anchors surrounding few-sample targets that exceed a threshold proportional to the maximum IoU, and then sums these anchors to obtain an attention map, denoted Θ_ij. Finally, this map is used to weight the feature differences between the models, expressed as formula (7):
L_cpf = (1/N)·Σ_i Σ_j Σ_c δ·Θ_ij·(f_base^t(i,j,c) - f_base^s(i,j,c))²    (7)
where f_base denotes the feature map output by the backbone network, i and j index the width and height of the feature map, and c indexes the channels. δ is a decision function that is valid only when the input class belongs to the few-sample classes. N is a normalization parameter whose value is given by equation (8):
N = Σ_i Σ_j Θ_ij    (8)
The advantage of L_cpf is that it encourages the student network to pay more attention to the knowledge of the few-sample classes during distillation training, while suppressing the background to some extent. After the knowledge distillation training is completed, the obtained model 3 is more robust and overfits the few-sample data less. Model 3 is then used to detect the few-sample data to be detected, and its performance is superior to that of model 2 and model 1.
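The few-sample feature term of formulas (7) and (8) can be sketched as below, assuming the anchors, their centres in feature-map coordinates, the few-sample ground-truth boxes and the teacher/student backbone feature maps are available; accumulating only the anchor centres and the 0.5 threshold factor are simplifications for illustration, not the patent's exact construction.

```python
import torch
from torchvision.ops import box_iou

def few_shot_attention_map(anchors, anchor_centers, novel_gt_boxes, feat_h, feat_w, thresh_ratio=0.5):
    """Build the attention map Θ of size (H, W): anchors whose IoU with a few-sample
    ground-truth box exceeds thresh_ratio times the maximum IoU are accumulated."""
    theta = torch.zeros(feat_h, feat_w)
    if novel_gt_boxes.numel() == 0:
        return theta
    ious = box_iou(anchors, novel_gt_boxes)          # (num_anchors, num_gt)
    best = ious.max(dim=1).values
    keep = best > thresh_ratio * best.max()
    for cx, cy in anchor_centers[keep]:              # accumulate the selected anchors
        theta[int(cy.clamp(0, feat_h - 1)), int(cx.clamp(0, feat_w - 1))] += 1.0
    return theta

def cpf_loss(feat_teacher, feat_student, theta):
    """L_cpf of formula (7): Θ-weighted squared feature difference summed over channels,
    normalized by N = Σ_ij Θ_ij (formula (8))."""
    n = theta.sum().clamp(min=1.0)
    diff = ((feat_teacher - feat_student) ** 2).sum(dim=0)   # (C, H, W) -> (H, W)
    return (theta * diff).sum() / n
```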
The above examples merely illustrate the technical solution of the present invention and do not limit it; the application of the present invention is not limited to the above examples, and many similar variations are possible. Modifications and equivalents of the embodiments of the invention described herein will occur to those skilled in the art and are intended to be included within the scope of the claims appended hereto.

Claims (2)

1. A few-sample target detection method based on knowledge distillation, characterized by comprising the following specific operation steps:
the first step is as follows: constructing a labeled picture database that satisfies the few-sample target detection setting, expanding sample diversity with a certain amount of data augmentation, and dividing the data into joint training data D_joint and fine-tuning training data D_ft;
the second step is as follows: selecting a target detection framework and a backbone network, constructing a network model, and training the network model with the joint training data to obtain a weight model; the method comprises the following steps:
selecting Faster R-CNN as the target detection framework, adopting VGG16 with 13 convolutional layers, 3 fully connected layers and 5 pooling layers as the backbone network, selecting SGD as the optimizer and RoI Align in the ROI pooling stage, and training the model with the joint training data D_joint to finally obtain the weights of the jointly trained model, called the first weight model;
the third step: fine-tuning the first weight model obtained in the second step with the fine-tuning training data to obtain a new weight model, namely the second weight model, as follows:
training the first weight model with the fine-tuning training data D_ft, setting an initial learning rate with no learning-rate decay; training for 5 epochs, selecting SGD as the optimizer and RoI Align in the ROI pooling stage; freezing all feature layers of VGG16 and adjusting only the classification layer; finally obtaining the weights of the fine-tuned model, called the second weight model;
the fourth step: taking the first weight model obtained in the second step as the student network and the second weight model obtained in the third step as the teacher network, reusing the fine-tuning training data, performing knowledge distillation guided by the teacher network's predictions on the fine-tuning training data to fine-tune the student network weights, obtaining the final third weight model, and obtaining and outputting the detection results through the third model.
2. The method according to claim 1, wherein the fourth step is as follows:
performing knowledge distillation training on the student network using the fine-tuning training data D_ft and the prior knowledge of the teacher network, setting an initial learning rate with no learning-rate decay; training for 8 epochs, selecting SGD as the optimizer and RoI Align in the ROI pooling stage; freezing the first 10 feature layers of VGG16 and adjusting the last 3 feature layers and the classifier to alleviate overfitting in the higher layers; and, combining the prior knowledge of the teacher network with the knowledge of the real data labels, calculating the total loss function for knowledge distillation of the student network according to the following formula:
L = L_cls + λ·L_reg + γ·L_cpf
L_cls is the classification difference between the teacher network and the student network; it comprises both the distribution difference with the teacher network and the difference with the real labels, and as distillation training proceeds the share of the distribution difference with the teacher network goes from high to low while the share of the difference with the real labels goes from low to high;
L_reg is the difference in the prediction boxes between the teacher network and the student network; the localization ability is improved by regressing the distance between the student network's prediction boxes and the true values;
L_cpf addresses the inter-feature differences for the few-sample classes; the feature differences output by the teacher and student network models are weighted by an attention map generated for the few-sample classes, so that the student network is more inclined to learn the features of the few-sample data;
λ and γ are hyper-parameters for balancing the different loss function terms;
and obtaining a final third weight model.
CN202011626826.0A 2020-12-31 2020-12-31 Knowledge distillation-based few-sample target detection method Pending CN112633406A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011626826.0A CN112633406A (en) 2020-12-31 2020-12-31 Knowledge distillation-based few-sample target detection method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011626826.0A CN112633406A (en) 2020-12-31 2020-12-31 Knowledge distillation-based few-sample target detection method

Publications (1)

Publication Number Publication Date
CN112633406A true CN112633406A (en) 2021-04-09

Family

ID=75290032

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011626826.0A Pending CN112633406A (en) 2020-12-31 2020-12-31 Knowledge distillation-based few-sample target detection method

Country Status (1)

Country Link
CN (1) CN112633406A (en)

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112991330A (en) * 2021-04-19 2021-06-18 征图新视(江苏)科技股份有限公司 Knowledge distillation-based positive sample industrial defect detection method
CN113222034A (en) * 2021-05-20 2021-08-06 浙江大学 Knowledge distillation-based fine-grained multi-class unbalanced fault classification method
CN113487614A (en) * 2021-09-08 2021-10-08 四川大学 Training method and device for fetus ultrasonic standard section image recognition network model
CN113610173A (en) * 2021-08-13 2021-11-05 天津大学 Knowledge distillation-based multi-span domain few-sample classification method
CN113850012A (en) * 2021-06-11 2021-12-28 腾讯科技(深圳)有限公司 Data processing model generation method, device, medium and electronic equipment
CN113919444A (en) * 2021-11-10 2022-01-11 北京市商汤科技开发有限公司 Training method of target detection network, target detection method and device
CN114332567A (en) * 2022-03-16 2022-04-12 成都数之联科技股份有限公司 Training sample acquisition method and device, computer equipment and storage medium
CN114970375A (en) * 2022-07-29 2022-08-30 山东飞扬化工有限公司 Rectification process monitoring method based on real-time sampling data

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108664893A (en) * 2018-04-03 2018-10-16 福州海景科技开发有限公司 A kind of method for detecting human face and storage medium
CN110033026A (en) * 2019-03-15 2019-07-19 深圳先进技术研究院 A kind of object detection method, device and the equipment of continuous small sample image
CN110633747A (en) * 2019-09-12 2019-12-31 网易(杭州)网络有限公司 Compression method, device, medium and electronic device for target detector
CN110674714A (en) * 2019-09-13 2020-01-10 东南大学 Human face and human face key point joint detection method based on transfer learning
CN111860236A (en) * 2020-07-06 2020-10-30 中国科学院空天信息创新研究院 Small sample remote sensing target detection method and system based on transfer learning
CN112115783A (en) * 2020-08-12 2020-12-22 中国科学院大学 Human face characteristic point detection method, device and equipment based on deep knowledge migration

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108664893A (en) * 2018-04-03 2018-10-16 福州海景科技开发有限公司 A kind of method for detecting human face and storage medium
CN110033026A (en) * 2019-03-15 2019-07-19 深圳先进技术研究院 A kind of object detection method, device and the equipment of continuous small sample image
CN110633747A (en) * 2019-09-12 2019-12-31 网易(杭州)网络有限公司 Compression method, device, medium and electronic device for target detector
CN110674714A (en) * 2019-09-13 2020-01-10 东南大学 Human face and human face key point joint detection method based on transfer learning
CN111860236A (en) * 2020-07-06 2020-10-30 中国科学院空天信息创新研究院 Small sample remote sensing target detection method and system based on transfer learning
CN112115783A (en) * 2020-08-12 2020-12-22 中国科学院大学 Human face characteristic point detection method, device and equipment based on deep knowledge migration

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
GUOBIN CHEN ET AL: "Learning Efficient Object Detection Models with Knowledge Distillation", 《NIPS"17:PROCEEDINGS OF THE 31ST INTERNATIONAL CONFERENCE ON NEURAL INFORMATION PROCESSING SYSTEMS》 *
KYUNGYUL KIM ET AL: "Self-Knowledge Distillation: A Simple Way for Better Generalization", 《ARXIV》 *

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112991330A (en) * 2021-04-19 2021-06-18 征图新视(江苏)科技股份有限公司 Knowledge distillation-based positive sample industrial defect detection method
CN112991330B (en) * 2021-04-19 2021-08-13 征图新视(江苏)科技股份有限公司 Knowledge distillation-based positive sample industrial defect detection method
CN113222034A (en) * 2021-05-20 2021-08-06 浙江大学 Knowledge distillation-based fine-grained multi-class unbalanced fault classification method
CN113222034B (en) * 2021-05-20 2022-01-14 浙江大学 Knowledge distillation-based fine-grained multi-class unbalanced fault classification method
CN113850012A (en) * 2021-06-11 2021-12-28 腾讯科技(深圳)有限公司 Data processing model generation method, device, medium and electronic equipment
CN113850012B (en) * 2021-06-11 2024-05-07 腾讯科技(深圳)有限公司 Data processing model generation method, device, medium and electronic equipment
CN113610173A (en) * 2021-08-13 2021-11-05 天津大学 Knowledge distillation-based multi-span domain few-sample classification method
CN113610173B (en) * 2021-08-13 2022-10-04 天津大学 Knowledge distillation-based multi-span domain few-sample classification method
CN113487614A (en) * 2021-09-08 2021-10-08 四川大学 Training method and device for fetus ultrasonic standard section image recognition network model
CN113919444A (en) * 2021-11-10 2022-01-11 北京市商汤科技开发有限公司 Training method of target detection network, target detection method and device
CN114332567A (en) * 2022-03-16 2022-04-12 成都数之联科技股份有限公司 Training sample acquisition method and device, computer equipment and storage medium
CN114970375A (en) * 2022-07-29 2022-08-30 山东飞扬化工有限公司 Rectification process monitoring method based on real-time sampling data

Similar Documents

Publication Publication Date Title
CN112633406A (en) Knowledge distillation-based few-sample target detection method
CN109034205B (en) Image classification method based on direct-push type semi-supervised deep learning
CN109086658B (en) Sensor data generation method and system based on generation countermeasure network
WO2021143396A1 (en) Method and apparatus for carrying out classification prediction by using text classification model
WO2020114378A1 (en) Video watermark identification method and apparatus, device, and storage medium
US20220198339A1 (en) Systems and methods for training machine learning model based on cross-domain data
US20160224903A1 (en) Hyper-parameter selection for deep convolutional networks
CN111967480A (en) Multi-scale self-attention target detection method based on weight sharing
US10579907B1 (en) Method for automatically evaluating labeling reliability of training images for use in deep learning network to analyze images, and reliability-evaluating device using the same
CN114841257B (en) Small sample target detection method based on self-supervision comparison constraint
CN113807420A (en) Domain self-adaptive target detection method and system considering category semantic matching
CN110348447B (en) Multi-model integrated target detection method with abundant spatial information
CN111898685B (en) Target detection method based on long tail distribution data set
CN113378959B (en) Zero sample learning method for generating countermeasure network based on semantic error correction
CN115393687A (en) RGB image semi-supervised target detection method based on double pseudo-label optimization learning
CN110689091A (en) Weak supervision fine-grained object classification method
CN112488229A (en) Domain self-adaptive unsupervised target detection method based on feature separation and alignment
CN113095249A (en) Robust multi-mode remote sensing image target detection method
WO2023088174A1 (en) Target detection method and apparatus
CN115424177A (en) Twin network target tracking method based on incremental learning
CN115546196A (en) Knowledge distillation-based lightweight remote sensing image change detection method
CN116824216A (en) Passive unsupervised domain adaptive image classification method
CN114863176A (en) Multi-source domain self-adaptive method based on target domain moving mechanism
CN113807214B (en) Small target face recognition method based on deit affiliated network knowledge distillation
CN111144462A (en) Unknown individual identification method and device for radar signals

Legal Events

Code Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 2021-04-09)