CN116343007A - Target detection method, device, equipment and storage medium - Google Patents

Target detection method, device, equipment and storage medium Download PDF

Info

Publication number
CN116343007A
CN116343007A CN202310317906.5A CN202310317906A CN116343007A CN 116343007 A CN116343007 A CN 116343007A CN 202310317906 A CN202310317906 A CN 202310317906A CN 116343007 A CN116343007 A CN 116343007A
Authority
CN
China
Prior art keywords
detection result
result set
detection
model
loss value
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310317906.5A
Other languages
Chinese (zh)
Inventor
李林超
权家新
周凯
温婷
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang Zhuoyun Intelligent Technology Co ltd
Original Assignee
Zhejiang Zhuoyun Intelligent Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang Zhuoyun Intelligent Technology Co ltd filed Critical Zhejiang Zhuoyun Intelligent Technology Co ltd
Priority to CN202310317906.5A priority Critical patent/CN116343007A/en
Publication of CN116343007A publication Critical patent/CN116343007A/en
Pending legal-status Critical Current

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/096Transfer learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07Target detection
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Multimedia (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)

Abstract

The application provides a target detection method, device, equipment and storage medium. The method comprises the following steps: acquiring a target image; detecting a target object in the target image by using a pre-trained target detection model to obtain a corresponding detection result; the target detection model is obtained by denoising tag data corresponding to a training image through a teacher model and a student model together, and training the student model based on the teacher model, the training image and the denoised tag data. That is, the calibrated tag data is denoised under the combined action of the teacher model and the student model, and the influence of noise in the tag data on the target detection model is reduced, so that the prediction precision of the target detection model is improved, and the accuracy of the image detection result is further improved.

Description

Target detection method, device, equipment and storage medium
Technical Field
The embodiment of the application relates to the technical field of computers, in particular to a target detection method, a target detection device, target detection equipment and a storage medium.
Background
With the continuous development of deep learning, deep learning is widely used in the field of image detection, for example, detecting the type and position of contraband in a target image through a deep learning network. However, in the training process of the deep learning network, as various target objects have different appearances, shapes and postures, and the interference of factors such as illumination, shielding and the like during imaging is added, the difficulty of marking training images is high, more noise label data are easy to appear, the training of a detection model is greatly influenced, and the detection accuracy of the detection model obtained after the training is reduced.
Disclosure of Invention
Aiming at the technical problems existing in the conventional technology, the embodiment of the application provides a target detection method, device, equipment and storage medium.
In a first aspect, an embodiment of the present application provides a target detection method, including:
acquiring a target image;
detecting a target object in the target image by using a pre-trained target detection model to obtain a corresponding detection result;
the target detection model is obtained by denoising tag data corresponding to a training image through a teacher model and a student model together, and training the student model based on the teacher model, the training image and the denoised tag data.
In a second aspect, an embodiment of the present application provides an object detection apparatus, including:
the acquisition module is used for acquiring a target image;
the processing module is used for detecting a target object in the target image by using a pre-trained target detection model to obtain a corresponding detection result;
the target detection model is obtained by denoising tag data corresponding to a training image through a teacher model and a student model together, and training the student model based on the teacher model, the training image and the denoised tag data.
In a third aspect, an embodiment of the present application provides an electronic device, including: one or more processors; a memory for storing one or more programs; the one or more programs, when executed by the one or more processors, cause the processors to implement the steps of the object detection method as provided in the first aspect of the embodiments of the present application.
In a fourth aspect, embodiments of the present application provide a computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the object detection method as provided in the first aspect of embodiments of the present application.
According to the technical scheme provided by the embodiment of the application, after the target image is acquired, a target object in the target image is detected by using a pre-trained target detection model, so that a corresponding detection result is obtained; the target detection model is obtained by denoising tag data corresponding to the training image through the teacher model and the student model together, and training the student model based on the teacher model, the training image and the denoised tag data. That is, the calibrated tag data is denoised under the combined action of the teacher model and the student model, and the influence of noise in the tag data on the target detection model is reduced, so that the prediction precision of the target detection model is improved, and the accuracy of the image detection result is further improved.
Drawings
Fig. 1 is a schematic flow chart of a target detection method according to an embodiment of the present application;
FIG. 2 is a schematic flow chart of a training process of a target detection model according to an embodiment of the present application;
FIG. 3 is a schematic flow chart of a reliability factor determination process of a detection result according to an embodiment of the present application;
fig. 4 is a schematic structural diagram of an object detection device according to an embodiment of the present application;
fig. 5 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
The present application is described in further detail below with reference to the drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the application and not limiting thereof. It should be further noted that, for convenience of description, only some, but not all of the structures related to the present application are shown in the drawings.
In the field of image detection, a target object to be detected (such as contraband) is easy to have phenomena of excessive shielding, extreme angles, abnormal objects, imaging noise and the like, so that the labeling difficulty of the target object in a training image is high, a large amount of noise label data is easy to appear, for example, a labeling frame type is wrong, a wrong label frame, a missed label frame and the like exist, the detection accuracy of a detection model obtained based on the label data training is reduced, and the accuracy of an image detection result is low.
Therefore, according to the technical scheme provided by the embodiment of the application, the label data corresponding to the training image can be denoised through the teacher model and the student model, and the student model is trained based on the teacher model, the training image and the denoised label data, so that the target detection model with good detection performance is obtained, and the target object (such as contraband) in the target image is detected by using the target detection model, so that the accuracy of a detection result can be improved.
Fig. 1 is a schematic flow chart of a target detection method according to an embodiment of the present application. As shown in fig. 1, the method may include:
s101, acquiring a target image.
S102, detecting a target object in a target image by using a pre-trained target detection model to obtain a corresponding detection result.
Specifically, the target image may be an image that needs to be detected by a target object, and the target object may be some objects such as a person, an animal, an article, or a vehicle, for example, the target object is contraband. In practical application, the target image may be acquired through the image capturing device, or may be acquired from a corresponding storage setting, and after the target image is obtained, the target image may be input into a pre-trained target detection model, where the target detection model may include a backbone network, a bottleneck layer, a detection head, and the like, and feature extraction is performed on the target image through the target detection model, and a detection result corresponding to the target image is determined based on the extracted feature. Taking a target object as a contraband example, detecting the contraband in the target image through a target detection model, so as to obtain the type and the position of the contraband.
The target detection model is obtained by denoising tag data corresponding to the training image through the teacher model and the student model together, and training the student model based on the teacher model, the training image and the denoised tag data. In this embodiment, two models may be set, one is a detection model with a complex network structure and a large number of parameters, and the other is a detection model with a simple network structure and a small number of parameters, as a student model, the student model is trained by distillation, the detection result of the training image by the student model is verified by the teacher model, the detection result of the training image by the teacher model is verified by the student model, and the error label, the miss-selected label and the like in the label data are determined based on the detection result after mutual verification, namely, the label data is denoised under the interaction of the teacher model and the student model, so that the influence of the noise of the label data on the target detection model is reduced, and the detection performance of the target detection model is improved.
According to the target detection method provided by the embodiment of the application, after the target image is acquired, a target object in the target image is detected by using a pre-trained target detection model, so that a corresponding detection result is obtained; the target detection model is obtained by denoising tag data corresponding to the training image through the teacher model and the student model together, and training the student model based on the teacher model, the training image and the denoised tag data. That is, the calibrated tag data is denoised under the combined action of the teacher model and the student model, and the influence of noise in the tag data on the target detection model is reduced, so that the prediction precision of the target detection model is improved, and the accuracy of the image detection result is further improved.
In one embodiment, the object detection model may also be trained with reference to the process described in the embodiments below. Optionally, as shown in fig. 2, before S101, the method further includes:
s201, detecting training images through a teacher model and a student model to obtain a first detection result set of the teacher model and a second detection result set of the student model.
The training image is an image for model training of the target detection model, and usually, a plurality of training images are adopted, so that for convenience of description, the training image is simply referred to herein. After the training images are obtained, the training images are respectively input into a teacher model and a student model, and a first detection result set processed by the teacher model and a second detection result set processed by the student model are obtained.
S202, determining a first loss value corresponding to the third detection result set according to the cross-over ratio between the third detection result set and the label data corresponding to the training image, the reliability factor of each detection result in the third detection result set and the original loss value corresponding to the third detection result set.
The third detection result set comprises m detection results with highest reliability factors in the first detection result set and m detection results with highest reliability factors in the second detection result set, wherein m is a natural number larger than 1.
The above-mentioned reliability factor is used to indicate the reliability of the detection result, and it is understood that the larger the reliability factor is, the higher the reliability (i.e., accuracy) of the detection result is, whereas the smaller the reliability factor is, the lower the reliability of the detection result is. After the first detection result set and the second detection result set are obtained, the third detection result set can be determined according to the reliability factor of each detection result in the first detection result set and the reliability factor of each detection result in the second detection result set. Specifically, the detection results in the first detection result set may be sorted from large to small according to the reliability factor, the detection results in the second detection result set may be sorted from large to small according to the reliability factor, m detection results with the highest reliability factor in the first detection result set and m detection results with the highest reliability factor in the second detection result set are selected to be combined into 2m detection results, and the 2m detection results form a third detection result set. Of course, non-maximum suppression (Non Maximum Suppression, NMS) processing may also be performed on 2m detection results.
Alternatively, the specific value of m may be determined by the following formula (1):
m=(1-min(FP,M))×B_M (1)
wherein FP represents the false positive probability of the last iteration student model, b_m represents the number of tag data (i.e. calibration frames) in each iteration, M represents the super parameter, and M can be set to 0.5, and it can be seen that as the false positive rate decreases, the number of selected detection results increases.
Alternatively, as shown in fig. 3, the reliability factor of each detection result in the first detection result set may be determined by the following procedure:
s301, determining an intersectional-over-Union (IoU) ratio of each detection result in the first detection result set to the detection result of the same category in the second detection result set.
Specifically, NMS processing may be performed on the second set of detection results of the student model, and then the first set of detection results of the teacher model and the detection results of the same category in the second set of detection results after NMS processing may be subjected to cross-correlation operation.
S302, determining the maximum cross-over ratio as a target cross-over ratio of each detection result.
And selecting a maximum cross ratio from all the cross ratios corresponding to a certain detection result, and determining the maximum cross ratio as a target cross ratio of the detection result.
S303, determining the KL divergence of the first detection result set relative to the second detection result set in terms of category.
Specifically, the KL divergence KL of the first detection result set with respect to the second detection result set in terms of category may be determined with reference to the following formula (2) t
Figure BDA0004151910780000071
Wherein N is the number of categories contained in the detection result,
Figure BDA0004151910780000072
confidence level of detection result output for student model for ith type, < >>
Figure BDA0004151910780000073
The detection result output for the teacher model is for the i-th type of confidence.
S304, determining the reliability factors of all detection results in the first detection result set according to the KL divergence and the target cross-correlation ratio.
Specifically, the following formula (3) may be referred to determine the reliability factor of each detection result in the first detection result set:
Q t =sigmoid(KL t )×Iou t (3)
wherein Q is t Is a reliability factor for each detection result in the first detection result set Iou t And the target cross ratio of each detection result in the first detection result set.
Accordingly, the reliability factor of each detection result in the second detection result set may be determined by the following procedure: determining the intersection ratio of each detection result in the second detection result set aiming at the detection result of the same category in the first detection result set, determining the maximum intersection ratio in the same category as the target intersection ratio of each detection result in the second detection result set, determining the KL divergence of the second detection result set relative to the first detection result set in the category aspect, and then determining the reliability factor of each detection result in the second detection result set according to the KL divergence and the target intersection ratio of each detection result in the second detection result set.
After the third detection result set is obtained, the cross-over ratio between each detection result in the third detection result set and the label data corresponding to the training image can be determined, and a calculation mode for calculating the first loss value corresponding to the third detection result set is determined based on the obtained cross-over ratio. That is, when the intersection ratio of the third detection result set and the tag data is in different value ranges, the calculation manners for calculating the first loss value corresponding to the third detection result set may be different. Specifically, the first loss value is related to an original loss value corresponding to the third detection result set and a reliability factor of each detection result in the third detection result set. The original loss value refers to a loss value between each detection result in the third detection result set and the tag data, and comprises an original loss value of the student model and an original loss value of the teacher model, meanwhile, the original loss value comprises a regression loss value and a category loss value, the regression loss value refers to loss between the predicted position of the target object and the position of the target object in the tag data, and the category loss value refers to loss between the predicted category of the target object and the category of the target object in the tag data. The first loss value may be considered as a loss value obtained by correcting the original loss value by combining the reliability factors of the detection results in the third detection result set, and the target detection model is trained by the first loss value, so that the influence of noise in the tag data on the target detection model is reduced.
S203, determining a second loss value corresponding to the fourth detection result set according to the reliability factor of each detection result in the fourth detection result set and the original loss value corresponding to the fourth detection result set.
The fourth detection result set comprises detection results except the third detection result set in the first detection result set and the second detection result set.
The original loss value corresponding to the fourth detection result set refers to the loss value between each detection result and the tag data in the fourth detection result set, including the original loss value of the student model and the original loss value of the teacher model, and meanwhile, the original loss value includes both the regression loss value and the category loss value, and the concepts of the regression loss value and the category loss value may be referred to the description in S202 above, which is not repeated herein.
The second loss value can be regarded as the loss value after the original loss value corresponding to the fourth detection result set is corrected by combining the reliability factors of the detection results in the fourth detection result set, and the target detection model is trained by the second loss value, so that the influence of noise in the tag data on the target detection model can be reduced.
Specifically, the following equation (4) -equation (6) may be referred to determine the second loss value corresponding to the fourth detection result set:
Figure BDA0004151910780000091
Figure BDA0004151910780000092
Figure BDA0004151910780000093
wherein,,
Figure BDA0004151910780000094
original regression loss value of detection result output by student model in fourth detection result set, ++>
Figure BDA0004151910780000095
Loss value of original category of the test result output by the student model in the fourth test result set, ++>
Figure BDA0004151910780000096
A first loss value for the test result output by the student model in the fourth test result set,
Figure BDA0004151910780000097
the original regression loss value of the test result output by the teacher model in the fourth test result set,
Figure BDA0004151910780000098
the original category loss value for the test result output by the teacher model in the fourth test result set,
Figure BDA0004151910780000099
a first loss value of the detection result output by the teacher model in the fourth detection result set remain For the first loss value corresponding to the fourth detection result set, Q k And the reliability factor corresponding to the detection result is obtained.
S204, training the student model according to the first loss value and the second loss value to obtain a target detection model.
After the first loss value and the second loss value are obtained, the first loss value and the second loss value are summed, the student model is subjected to back propagation based on the summed loss values, and parameters of the student model are optimized until the student model reaches a preset convergence condition, so that a corresponding target detection model is obtained.
In this embodiment, the first detection result set and the second detection result set are divided into a third detection result set with higher reliability and a fourth detection result set with lower reliability by the reliability factors of the detection results output by the teacher model and the student model, and the original loss values corresponding to the third detection result set are corrected by selecting different modes based on the cross-over ratio of the third detection result set and the label data to obtain the first loss value, and the original values corresponding to the fourth detection result set are corrected based on the reliability factors of the detection results in the fourth detection result set to obtain the second loss value, and then the student model is trained based on the first loss value and the second loss value to obtain the target detection model, so that the influence of noise in the label data on the target detection model is reduced.
In practical applications, when the cross-over ratio between the third detection result set and the tag data corresponding to the training image is in different value ranges, the calculation mode for calculating the first loss value corresponding to the third detection result set may be different. On the basis of the above embodiment, optionally, the above S202 may include the following three ways:
mode one: when the intersection ratio between the third detection result set and the label data corresponding to the training image is smaller than or equal to a first preset threshold value, determining the class weight corresponding to each detection result according to the reliability factor of each detection result in the third detection result set, the confidence coefficient of each class of each detection result and the sum of the confidence coefficients of all classes of each detection result; and determining a first loss value corresponding to the third detection result set according to the class weight corresponding to each detection result in the third detection result set and the original class loss value corresponding to the third detection result set.
The first preset threshold may be set based on actual requirements, and optionally, the first preset threshold may be set to 0.3. Because the third detection result set includes a plurality of detection results with the highest reliability factors output by the student model and a plurality of detection results with the highest reliability factors output by the teacher model, in other words, the reliability of each detection result in the third detection result set is relatively high, when the cross ratio between the third detection result set and the label data corresponding to the training image is determined to be smaller than or equal to the first preset threshold value, the possibility that errors exist in the label data manually calibrated is relatively high. For this case, only the tag data may be used to participate in calculation of the class loss, the regression loss is not calculated, and parameters such as a confidence coefficient of each class of each detection result and a sum of confidence coefficients of all classes of each detection result in the third detection result set are adopted, so as to correct the original class loss value corresponding to the third detection result set (that is, assign a corresponding class weight to the original class loss value corresponding to the third detection result set), thereby obtaining the first loss value corresponding to the third detection result set.
Specifically, the category weight corresponding to each detection result in the third detection result set may be determined with reference to the following formula (7), and the first loss value corresponding to the third detection result set may be determined with reference to the following formula (8):
Figure BDA0004151910780000111
loss pre_k1 =weight pre_k1 ×loss pre_k1_label (8)
wherein p (x) i ) For the confidence of the ith class of the kth test result,
Figure BDA0004151910780000112
the sum of confidence levels of all categories of the kth detection result is that N is the number of categories contained in the kth detection result, Q k For the k-th detection result, sum (Q) is the sum of the reliability factors of all detection results in the third detection result set, loss pre_k1 In order to form a first loss value corresponding to the third detection result set pre_k1_label In order to form the original category loss value corresponding to the third detection result set, weight pre_k1 In the third detection result set, the category weight of each detection result is set.
Further, because the reliability of each detection result in the third detection result set is higher, corresponding correction detection data can be determined according to each detection result in the third detection result set and the category weight corresponding to each detection result, and each obtained correction detection data is determined as label data of the next training round of the student model. That is, after the third detection result set is given the corresponding category weight, the third detection result set is used as the label data to participate in the next training of the student model, so that the expansion of the label data is realized, the accuracy of the label data is improved, and the accuracy of the target detection model is improved.
Through the first loss value corresponding to the third detection result set calculated in the mode, the false mark and missing mark phenomena in the tag data can be reduced, the influence of noise in the tag data on the target detection model is reduced, and therefore the performance of the target detection model is improved.
Mode two: when the intersection ratio between the third detection result set and the label data corresponding to the training image is larger than the first preset threshold value and smaller than the second preset threshold value, determining the class weight corresponding to each detection result in the third detection result set according to the reliability factor of each detection result in the third detection result set and the class error probability of each detection result; determining regression weights corresponding to all detection results according to the cross ratio corresponding to all detection results in the third detection result set and the reliability factors of all detection results; and determining a first loss value corresponding to the third detection result set according to the category weight, the regression weight, the original category loss value and the original regression loss value between the third detection result set and the label data corresponding to the training image.
The second preset threshold may be set based on actual requirements, and optionally, the second preset threshold may be set to 0.75. Because the third detection result set includes a plurality of detection results with the highest reliability factors output by the student model and a plurality of detection results with the highest reliability factors output by the teacher model, in other words, the reliability of each detection result in the third detection result set is relatively high, when the intersection ratio between the third detection result set and the label data corresponding to the training image is determined to be greater than the first preset threshold and less than the second preset threshold, it is indicated that similar errors exist in the detection results predicted by the teacher model and the student model, the corresponding label data are difficult samples, and the learning of the characteristics of the difficult samples by the teacher model and the student model needs to be enlarged, for example, relatively high weights are set for the original category loss value and the original regression loss value corresponding to the third detection result set. Specifically, the category weight corresponding to each detection result in the third detection result set may be determined by the following formula (9), the regression weight corresponding to each detection result in the third detection result set may be determined by the following formula (10), and the first loss value corresponding to the third detection result set may be determined by the following formula (11):
Figure BDA0004151910780000131
Figure BDA0004151910780000132
loss pre_k2 =weight pre_k2_label ×loss pre_k2_label +weight pre_k2_bbox ×loss pre_k2_bbox (11)
wherein IOU (pre-bbox, GT-bbox) is the cross-over ratio between the kth detection result pre-bbox and the corresponding tag data GT-bbox, weight pre_k2_label Weight, which is the regression weight of the kth detection result in the second mode pre_k2_label For class weight of kth detection result in mode two, loss pre_k2_label Loss of original category corresponding to the third detection result set in the second mode pre_k2_bbox In the second mode, the loss is the original regression loss value corresponding to the third detection result set pre_k2 And the first loss value corresponding to the third detection result set in the second mode.
Mode three: when the intersection ratio of the third detection result set and the label data corresponding to the training image is larger than or equal to a second preset threshold value, determining a first loss value corresponding to the third detection result set according to the product of the reliability factor of each detection result in the third detection result set and the confidence coefficient of each category of each detection result, and the original category loss value and the original regression loss value corresponding to the third detection result set.
When determining that the cross ratio between the third detection result set and the label data corresponding to the training image is greater than or equal to the second preset threshold, it indicates that the feature detection capability of the teacher model and the student model on the label data is strong, and then the label data can be regarded as easy-to-check samples, in order to avoid the occurrence of the over-fitting phenomenon, the proportion of the easy-to-check samples needs to be reduced, the attention of the target detection model on simple features is reduced, for example, a lower weight is set for the original category loss value and the original regression loss value corresponding to the third detection result set. Specifically, the first loss value corresponding to the third detection result set may be determined by the following formula (12):
loss pre_k3 =sigmoid(P)×Q k ×(loss pre_k3_label +loss pre_k3_bbox ) (12)
where P is the confidence of each category of the kth detection result, loss pre_k3_label Loss of the original category corresponding to the third detection result set in the third mode pre_k3_bbox To the original regression loss value corresponding to the third detection result set in the third mode pre_k3 The first loss value corresponding to the third detection result set in the third mode is obtained.
According to the method, the first loss value corresponding to the third detection result set is calculated in the second mode, the learning of the target detection model on the complex sample is improved, the second loss value corresponding to the third detection result set is calculated in the third mode, the learning of the target detection model on the simple sample is reduced, the phenomenon of fitting is avoided, namely when the cross ratio between the third detection result set and the label data corresponding to the training image is located in different value ranges, the first loss value is determined in the corresponding calculation mode, noise in the label data is reduced, the learning of the complex feature is enhanced, the learning of the simple feature is reduced, and therefore the accuracy of the target detection model is improved.
Fig. 4 is a schematic structural diagram of an object detection device according to an embodiment of the present application. As shown in fig. 4, the apparatus may include: an acquisition module 401 and a processing module 402.
Specifically, the acquiring module 401 is configured to acquire a target image;
the processing module 402 is configured to detect a target object in the target image by using a pre-trained target detection model, so as to obtain a corresponding detection result;
the target detection model is obtained by denoising tag data corresponding to a training image through a teacher model and a student model together, and training the student model based on the teacher model, the training image and the denoised tag data.
After the target image is acquired, the target detection device detects the target object in the target image by using a pre-trained target detection model to obtain a corresponding detection result; the target detection model is obtained by denoising tag data corresponding to the training image through the teacher model and the student model together, and training the student model based on the teacher model, the training image and the denoised tag data. That is, the calibrated tag data is denoised under the combined action of the teacher model and the student model, and the influence of noise in the tag data on the target detection model is reduced, so that the prediction precision of the target detection model is improved, and the accuracy of the image detection result is further improved.
On the basis of the above embodiment, optionally, the processing module 402 is further configured to detect, before the target image is acquired, the training image through the teacher model and the student model, to obtain a first detection result set of the teacher model and a second detection result set of the student model; determining a first loss value corresponding to a third detection result set according to the cross-over ratio between the third detection result set and the label data corresponding to the training image, the reliability factor of each detection result in the third detection result set and the original loss value corresponding to the third detection result set; determining a second loss value corresponding to a fourth detection result set according to the reliability factor of each detection result in the fourth detection result set and the original loss value corresponding to the fourth detection result set; wherein the fourth detection result set comprises detection results except the third detection result set in the first detection result set and the second detection result set; training the student model according to the first loss value and the second loss value to obtain a target detection model; the third detection result set comprises m detection results with highest reliability factors in the first detection result set and m detection results with highest reliability factors in the second detection result set, wherein m is a natural number larger than 1.
On the basis of the above embodiment, optionally, the processing module 402 is specifically configured to determine an intersection ratio between each detection result in the first detection result set and the detection result of the same category in the second detection result set; determining the maximum cross ratio as the target cross ratio of each detection result; determining a KL divergence of the first set of detection results in terms of category relative to the second set of detection results; and determining the reliability factor of each detection result in the first detection result set according to the KL divergence and the target cross-correlation ratio.
On the basis of the foregoing embodiment, optionally, the processing module 402 is specifically configured to determine, when the intersection ratio between the third detection result set and the tag data corresponding to the training image is less than or equal to a first preset threshold, a class weight corresponding to each detection result according to a reliability factor of each detection result, a confidence coefficient of each class of each detection result, and a sum of confidence coefficients of all classes of each detection result in the third detection result set; and determining a first loss value corresponding to the third detection result set according to the class weight corresponding to each detection result in the third detection result set and the original class loss value corresponding to the third detection result set.
On the basis of the foregoing embodiment, optionally, the processing module 402 is further configured to determine corresponding corrected detection data according to each detection result in the third detection result set and a class weight corresponding to each detection result; and determining each piece of correction detection data as label data of the next training round of the student model.
On the basis of the foregoing embodiment, optionally, the processing module 402 is specifically configured to determine, when the cross-over ratio between the third detection result set and the tag data corresponding to the training image is greater than the first preset threshold and less than the second preset threshold, a class weight corresponding to each detection result in the third detection result set according to the reliability factor of each detection result in the third detection result set and the class error probability of each detection result; determining regression weights corresponding to all detection results according to the cross ratio corresponding to all detection results in the third detection result set and the reliability factors of all detection results; and determining a first loss value corresponding to the third detection result set according to the category weight, the regression weight, the original category loss value and the original regression loss value between the third detection result set and the label data corresponding to the training image.
On the basis of the foregoing embodiment, optionally, the processing module 402 is specifically configured to determine, when the intersection ratio of the third detection result set and the tag data corresponding to the training image is greater than or equal to the second preset threshold, a first loss value corresponding to the third detection result set according to a product of a reliability factor of each detection result in the third detection result set and a confidence coefficient of each category of each detection result, and an original category loss value and an original regression loss value corresponding to the third detection result set.
Fig. 5 is a schematic structural diagram of an electronic device according to an embodiment of the present application, as shown in fig. 5, the electronic device may include a processor 500, a memory 501, an input device 502, and an output device 503; the number of processors 500 in the electronic device may be one or more, one processor 500 being taken as an example in fig. 5; the processor 500, the memory 501, the input means 502 and the output means 503 in the electronic device may be connected by a bus or otherwise, in fig. 5 by way of example.
The memory 501 is a computer readable storage medium, and may be used to store a software program, a computer executable program, and modules, such as program instructions/modules corresponding to the object detection method in the embodiment of the present application (for example, the acquisition module 401 and the processing module 402 in the object detection device). The processor 500 executes various functional applications of the electronic device and data processing, i.e., implements the above-described object detection method, by running software programs, instructions, and modules stored in the memory 501.
The memory 501 may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, at least one application program required for functions; the storage data area may store data created according to the use of the terminal, etc. In addition, memory 501 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device. In some examples, memory 501 may further include memory remotely located with respect to processor 500, which may be connected to a device/terminal/server via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The input device 502 may be used to receive input numeric or character information and to generate key signal inputs related to user settings and function control of the electronic device. The output means 503 may comprise a display device such as a display screen.
The present embodiments also provide a storage medium containing computer executable instructions, which when executed by a computer processor, are for performing a target detection method, the method comprising:
acquiring a target image;
detecting a target object in the target image by using a pre-trained target detection model to obtain a corresponding detection result;
the target detection model is obtained by denoising tag data corresponding to a training image through a teacher model and a student model together, and training the student model based on the teacher model, the training image and the denoised tag data.
Of course, the storage medium containing the computer executable instructions provided in the embodiments of the present application is not limited to the method operations described above, and may also perform the related operations in the object detection method provided in any embodiment of the present application.
From the above description of embodiments, it will be clear to a person skilled in the art that the present application may be implemented by means of software and necessary general purpose hardware, but of course also by means of hardware, but in many cases the former is a preferred embodiment. Based on such understanding, the technical solution of the present application may be embodied essentially or in a part contributing to the prior art in the form of a software product, which may be stored in a computer readable storage medium, such as a floppy disk, a Read-Only Memory (ROM), a random access Memory (Random Access Memory, RAM), a FLASH Memory (FLASH), a hard disk, or an optical disk of a computer, where the instructions include a number of instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to perform the method described in the embodiments of the present application.
It should be noted that, in the above-mentioned embodiments of the search apparatus, each unit and module included are only divided according to the functional logic, but not limited to the above-mentioned division, as long as the corresponding functions can be implemented; in addition, the specific names of the functional units are also only for distinguishing from each other, and are not used to limit the protection scope of the present application.
Note that the above is only a preferred embodiment of the present application and the technical principle applied. Those skilled in the art will appreciate that the present application is not limited to the particular embodiments described herein, but is capable of numerous obvious changes, rearrangements and substitutions as will now become apparent to those skilled in the art without departing from the scope of the present application. Therefore, while the present application has been described in connection with the above embodiments, the present application is not limited to the above embodiments, but may include many other equivalent embodiments without departing from the spirit of the present application, the scope of which is defined by the scope of the appended claims.

Claims (10)

1. A method of detecting an object, comprising:
acquiring a target image;
detecting a target object in the target image by using a pre-trained target detection model to obtain a corresponding detection result;
the target detection model is obtained by denoising tag data corresponding to a training image through a teacher model and a student model together, and training the student model based on the teacher model, the training image and the denoised tag data.
2. The method of claim 1, further comprising, prior to the acquiring the target image:
detecting training images through a teacher model and a student model to obtain a first detection result set of the teacher model and a second detection result set of the student model;
determining a first loss value corresponding to a third detection result set according to the cross-over ratio between the third detection result set and the label data corresponding to the training image, the reliability factor of each detection result in the third detection result set and the original loss value corresponding to the third detection result set; the third detection result set comprises m detection results with highest reliability factors in the first detection result set and m detection results with highest reliability factors in the second detection result set, wherein m is a natural number larger than 1;
determining a second loss value corresponding to a fourth detection result set according to the reliability factor of each detection result in the fourth detection result set and the original loss value corresponding to the fourth detection result set; wherein the fourth detection result set comprises detection results except the third detection result set in the first detection result set and the second detection result set;
and training the student model according to the first loss value and the second loss value to obtain a target detection model.
3. The method of claim 2, wherein the determining of the reliability factor for each test result in the first set of test results comprises:
determining the cross-over ratio of each detection result in the first detection result set to the detection result of the same category in the second detection result set;
determining the maximum cross ratio as the target cross ratio of each detection result;
determining a KL divergence of the first set of detection results in terms of category relative to the second set of detection results;
and determining the reliability factor of each detection result in the first detection result set according to the KL divergence and the target cross-correlation ratio.
4. The method according to claim 2, wherein determining the first loss value corresponding to the third detection result set according to the cross-over ratio between the third detection result set and the tag data corresponding to the training image, the reliability factor of each detection result in the third detection result set, and the original loss value corresponding to the third detection result set includes:
when the intersection ratio between the third detection result set and the label data corresponding to the training image is smaller than or equal to a first preset threshold value, determining the class weight corresponding to each detection result according to the reliability factor of each detection result in the third detection result set, the confidence coefficient of each class of each detection result and the sum of the confidence coefficients of all classes of each detection result;
and determining a first loss value corresponding to the third detection result set according to the class weight corresponding to each detection result in the third detection result set and the original class loss value corresponding to the third detection result set.
5. The method as recited in claim 4, further comprising:
determining corresponding correction detection data according to each detection result in the third detection result set and the class weight corresponding to each detection result;
and determining each piece of correction detection data as label data of the next training round of the student model.
6. The method as recited in claim 4, further comprising:
when the cross ratio between the third detection result set and the label data corresponding to the training image is larger than the first preset threshold value and smaller than the second preset threshold value, determining the class weight corresponding to each detection result in the third detection result set according to the reliability factor of each detection result in the third detection result set and the class error probability of each detection result;
determining regression weights corresponding to all detection results according to the cross ratio corresponding to all detection results in the third detection result set and the reliability factors of all detection results;
and determining a first loss value corresponding to the third detection result set according to the category weight, the regression weight, the original category loss value and the original regression loss value between the third detection result set and the label data corresponding to the training image.
7. The method as recited in claim 6, further comprising:
when the intersection ratio of the third detection result set and the label data corresponding to the training image is greater than or equal to the second preset threshold value, determining a first loss value corresponding to the third detection result set according to the product of the reliability factor of each detection result in the third detection result set and the confidence coefficient of each category of each detection result, and the original category loss value and the original regression loss value corresponding to the third detection result set.
8. An object detection apparatus, comprising:
the acquisition module is used for acquiring a target image;
the processing module is used for detecting a target object in the target image by using a pre-trained target detection model to obtain a corresponding detection result;
the target detection model is obtained by denoising tag data corresponding to a training image through a teacher model and a student model together, and training the student model based on the teacher model, the training image and the denoised tag data.
9. An electronic device, comprising:
one or more processors;
a memory for storing one or more programs;
the one or more programs, when executed by the one or more processors, cause the processor to perform the steps of the method of any of claims 1-7.
10. A computer readable storage medium, on which a computer program is stored, characterized in that the program, when being executed by a processor, implements the steps of the method according to any one of claims 1 to 7.
CN202310317906.5A 2023-03-28 2023-03-28 Target detection method, device, equipment and storage medium Pending CN116343007A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310317906.5A CN116343007A (en) 2023-03-28 2023-03-28 Target detection method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310317906.5A CN116343007A (en) 2023-03-28 2023-03-28 Target detection method, device, equipment and storage medium

Publications (1)

Publication Number Publication Date
CN116343007A true CN116343007A (en) 2023-06-27

Family

ID=86875880

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310317906.5A Pending CN116343007A (en) 2023-03-28 2023-03-28 Target detection method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN116343007A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117372819A (en) * 2023-12-07 2024-01-09 神思电子技术股份有限公司 Target detection increment learning method, device and medium for limited model space

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117372819A (en) * 2023-12-07 2024-01-09 神思电子技术股份有限公司 Target detection increment learning method, device and medium for limited model space
CN117372819B (en) * 2023-12-07 2024-02-20 神思电子技术股份有限公司 Target detection increment learning method, device and medium for limited model space

Similar Documents

Publication Publication Date Title
CN111639744A (en) Student model training method and device and electronic equipment
CN111931731B (en) Question judging method and device, electronic equipment and storage medium
US20100277586A1 (en) Method and apparatus for updating background
CN111931864B (en) Method and system for multiple optimization of target detector based on vertex distance and cross-over ratio
CN111738249B (en) Image detection method, image detection device, electronic equipment and storage medium
CN110990627B (en) Knowledge graph construction method, knowledge graph construction device, electronic equipment and medium
CN112085056B (en) Target detection model generation method, device, equipment and storage medium
CN112257703B (en) Image recognition method, device, equipment and readable storage medium
CN111027412A (en) Human body key point identification method and device and electronic equipment
CN116343007A (en) Target detection method, device, equipment and storage medium
CN113763348A (en) Image quality determination method and device, electronic equipment and storage medium
CN114743074B (en) Ship detection model training method and system based on strong and weak confrontation training
CN115567736A (en) Video content detection method, device, equipment and storage medium
CN112307900A (en) Method and device for evaluating facial image quality and electronic equipment
CN115797735A (en) Target detection method, device, equipment and storage medium
CN113516697B (en) Image registration method, device, electronic equipment and computer readable storage medium
CN117710756B (en) Target detection and model training method, device, equipment and medium
CN111832550B (en) Data set manufacturing method and device, electronic equipment and storage medium
CN106910207B (en) Method and device for identifying local area of image and terminal equipment
CN113052019A (en) Target tracking method and device, intelligent equipment and computer storage medium
CN116758280A (en) Target detection method, device, equipment and storage medium
CN113762382B (en) Model training and scene recognition method, device, equipment and medium
CN115601618A (en) Magnetic core defect detection method and system and computer storage medium
CN115661542A (en) Small sample target detection method based on feature relation migration
CN115423780A (en) Image quality-based key frame extraction method and device, electronic equipment and medium

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination