CN115482417B - Multi-target detection model, training method, device, medium and equipment thereof - Google Patents

Multi-target detection model, training method, device, medium and equipment thereof

Info

Publication number
CN115482417B
CN115482417B
Authority
CN
China
Prior art keywords
detection
target
real
frame
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202211212592.4A
Other languages
Chinese (zh)
Other versions
CN115482417A (en)
Inventor
陈瑞斌
肖兵
李正国
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhuhai Shixi Technology Co Ltd
Original Assignee
Zhuhai Shixi Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhuhai Shixi Technology Co Ltd
Priority to CN202211212592.4A
Publication of CN115482417A
Application granted
Publication of CN115482417B
Current legal status: Active


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774 Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00 Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07 Target detection
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00 Road transport of goods or passengers
    • Y02T10/10 Internal combustion engine [ICE] based vehicles
    • Y02T10/40 Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a multi-target detection model and a training method, apparatus, medium, and device therefor. The method comprises: acquiring at least one sample image for model training from a training sample set, and inputting the sample image into the multi-target detection model to obtain corresponding detection target information; evaluating the detection target information to obtain an evaluation result; and adjusting the detection target result according to the evaluation result and training the multi-target detection model with the adjusted detection target result and the sample image. Without increasing model parameters or training data, the scheme improves classification accuracy in object detection and thereby strengthens the robustness of the algorithm.

Description

Multi-target detection model, training method, device, medium and equipment thereof
Technical Field
The invention relates to the technical field of image processing, and in particular to a multi-target detection model and a training method, apparatus, medium, and device therefor.
Background
In image processing, target detection is an important component of image recognition, and the way the deep-learning detection network of a target detection model is trained directly affects the accuracy of the model's detection results. Most deep-learning detection networks, such as the YOLO family, are trained by decoupling the problem into multiple tasks. The tasks, however, are not fully independent: forcibly splitting them can scramble the features of the dependent tasks under special conditions and seriously degrade the final classification result.
Disclosure of Invention
In view of the foregoing, the present invention provides a multi-objective detection model, and training method, apparatus, medium and device thereof, which overcome or at least partially solve the foregoing problems.
According to one aspect of the present invention, there is provided a multi-target detection model created based on a YOLO network architecture, the multi-target detection model including a backbone network structure for image feature learning and a loss function structure. The loss function structure includes: a model prediction decoder, a real information distribution module, an intersection-over-union (IoU) calculation module, a detection target confidence regression module, a frame regression module, a classification module, and a model prior module.
Optionally, the model prediction decoder is coupled to the IoU calculation module, the detection target confidence regression module, and the classification module, respectively;
the IoU calculation module is further coupled to the detection target confidence regression module, the frame regression module, and the model prior module; the real information distribution module is coupled to the IoU calculation module and the model prior module, respectively; and the model prior module is further coupled to the classification module.
Optionally, the model prediction decoder is configured to output detection target information of the image;
the real information distribution module is configured to output real target information of the image;
the IoU calculation module is configured to calculate the IoU between a detection frame and a real frame; and
the model prior module is configured to adjust the detection target information according to the outputs of the real information distribution module and the IoU calculation module.
According to one aspect of the invention, a model prior-based multi-target detection model training method is provided and applied to the above multi-target detection model. The method comprises the following steps:
acquiring at least one sample image for model training from a training sample set, and inputting the sample image into the multi-target detection model to obtain corresponding detection target information;
evaluating the detection target information to obtain an evaluation result; and
adjusting the detection target result according to the evaluation result, and training the multi-target detection model with the adjusted detection target result and the sample image.
Optionally, evaluating the detection target information based on the real target information to obtain the evaluation result includes:
acquiring real target information of the sample image, where the real target information includes the image coordinates of the real frames and the detection target information includes the image coordinates of the detection frame corresponding to each detection target; and
calculating, by the IoU calculation module, the IoU between each detection frame and each real frame based on the image coordinates of the real frames and of the detection frames, and taking the IoU between the detection frame and the real frame as the evaluation result.
Optionally, adjusting the detection target result according to the evaluation result includes:
generating an overlap mask matrix based on the IoU between each detection frame and each real frame of the sample image;
generating an overlap category list corresponding to each detection target based on the overlap mask matrix; and
determining the category label of the detection target according to the overlap category list, and taking that category label as the final category label of the detection target.
Optionally, generating the overlap mask matrix based on the IoU between each detection frame and each real frame includes:
acquiring the detection target ID corresponding to each detection frame and the real target ID corresponding to each real frame, ordering the IoUs between the detection frames and the real frames, and generating an IoU matrix;
marking matrix elements of the IoU matrix whose IoU is greater than a first preset threshold as a first parameter, and marking elements less than or equal to the first preset threshold as a second parameter; and
generating the overlap mask matrix corresponding to each detection target based on the IoU matrix, the first parameter, and the second parameter.
Optionally, generating the overlap category list corresponding to the detection target based on the overlap mask matrix includes:
acquiring the real target class labels; and
slicing the real target class labels with the overlap mask matrix and deduplicating the result to obtain the overlap category list corresponding to each detection target.
Optionally, determining the category label of the detection target according to the overlap category list includes:
acquiring the total number of categories in the overlap category list of the detection target;
if the total number of categories is less than or equal to a second preset threshold, taking the category label in the overlap category list as the category label of the detection target; and
if the total number of categories is greater than the second preset threshold, determining the category label of the detection target from the overlap category list using an argmax function.
According to another aspect of the present invention, there is provided a model prior-based multi-target detection model training apparatus applied to any one of the above multi-target detection models, the apparatus comprising:
a target detection unit, configured to acquire at least one sample image for model training from a training sample set, and to input the sample image into the multi-target detection model to obtain corresponding detection target information;
an evaluation unit, configured to evaluate the detection target information to obtain an evaluation result; and
a training unit, configured to adjust the detection target result according to the evaluation result and to train the multi-target detection model with the adjusted detection target result and the sample image.
According to another aspect of the present invention, there is provided a computer readable storage medium for storing program code for performing the model prior based multi-objective detection model training method of any one of the above.
According to another aspect of the present invention, there is provided a computing device comprising a processor and a memory:
the memory is used for storing program codes and transmitting the program codes to the processor;
the processor is configured to execute the model prior-based multi-objective detection model training method according to any one of the above-described instructions in the program code.
According to another aspect of the present invention, there is provided an image capturing apparatus employing the multi-target detection model of any one of the above, or performing the model-prior-based multi-target detection model training method, or including the model-prior-based multi-target detection model training device, or having the computer-readable storage medium.
The invention provides a multi-target detection model and a training method, apparatus, medium, and device therefor. Without increasing model parameters or training data, the scheme improves classification accuracy in object detection and thereby strengthens the robustness of the algorithm.
The foregoing is merely an overview of the technical solution of the present invention. To make the technical means of the invention clearer and implementable according to the contents of the specification, and to make the above and other objects, features, and advantages of the invention more readily apparent, specific embodiments of the invention are set forth below.
The above and additional objects, advantages, and features of the present invention will become apparent to those skilled in the art from the following detailed description of specific embodiments of the invention, read in conjunction with the accompanying drawings.
Drawings
Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the invention. Also, like reference numerals are used to designate like parts throughout the figures. In the drawings:
FIG. 1 illustrates a multi-objective detection network training schematic in accordance with an embodiment of the present invention;
FIG. 2 illustrates an IOU computation schematic according to one embodiment of the invention;
FIG. 3 shows a flow diagram of a model prior-based multi-objective detection model training method in accordance with an embodiment of the present invention;
FIG. 4 shows a schematic diagram of the detection results of the multi-target detection network according to FIG. 1;
FIG. 5 shows a schematic diagram of the detection results of a conventional multi-target detection network;
FIG. 6 shows a comparative schematic of recall curves according to an embodiment of the present invention;
FIG. 7 shows a schematic structural diagram of a model prior-based multi-objective detection model training device according to an embodiment of the present invention.
Detailed Description
Exemplary embodiments of the present invention will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present invention are shown in the drawings, it should be understood that the present invention may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the invention to those skilled in the art.
The embodiment of the invention provides a model prior-based multi-target detection model training method. The multi-target detection model of this embodiment is created based on a YOLO network architecture, preferably YOLOv5 or YOLOv7, and includes a backbone network structure that performs image feature learning and a loss function structure. Since the YOLO network architecture is publicly available, it is not described in detail here. As shown in Fig. 1, the loss function structure may include a model prediction decoder M1, a real information distribution module M2, an IoU calculation module M3, a detection target confidence regression module M4, a frame regression module M5, a classification module M6, and a model prior module M7. As shown in Fig. 1, the model prediction decoder M1 is coupled to the IoU calculation module M3, the detection target confidence regression module M4, and the classification module M6, respectively; the IoU calculation module M3 is further coupled to the detection target confidence regression module M4, the frame regression module M5, and the model prior module M7; the real information distribution module M2 is coupled to the IoU calculation module M3 and the model prior module M7, respectively; and the model prior module M7 is further coupled to the classification module M6.
M1. Decoder (model prediction decoder): outputs the detection target information of the image. The decoding process differs between object detection networks; the decoder output is decoupled into three parts: object, characterizing the confidence of a detection target; bbox, characterizing the image coordinates of the detection frame; and class, characterizing the predicted classification confidence.
M2. Build target (real information distribution module): outputs the real target information of the image. Real label distribution dynamically assigns the input real labels and establishes their association with the model output layer. Label assignment differs between detection networks; the output is decoupled into two parts: bbox, characterizing the image coordinates of a real frame; and class, characterizing the class label of the real target.
M3. IoU calculator (intersection-over-union calculation module): calculates the IoU between the detection frame and the real frame, which serves as the evaluation criterion for the quality of the frame regression. As shown in Fig. 2, if the overlapping portion of rectangle A and rectangle B is AB, the IoU is computed as:

IoU = area(AB) / (area(A) + area(B) - area(AB))
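A minimal Python sketch of this formula for axis-aligned boxes (illustrative only, not the patent's code; the [x1, y1, x2, y2] box layout and the function name iou are assumptions):

    def iou(box_a, box_b) -> float:
        """IoU of two boxes given as [x1, y1, x2, y2] image coordinates."""
        # Overlapping rectangle AB of rectangles A and B (cf. Fig. 2)
        x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
        x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
        inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)           # area(AB)
        area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])  # area(A)
        area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])  # area(B)
        union = area_a + area_b - inter
        return inter / union if union > 0 else 0.0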
the detection target confidence regression module m4.Object regression: the module is typically composed of regression loss, such as cross entropy loss, focal loss, etc.
M5. Bbox regression (frame regression module): typically composed of a GIoU loss.
M6. Classification (classification module): generally composed of a classification loss.
M7. Model prior module: adjusts the detection target information according to the outputs of the real information distribution module and the IoU calculation module. Specifically, it can readjust the class label of a detection target according to the result of M3 (which evaluates the quality of the frame regression).
As shown in fig. 3, the model prior-based multi-objective detection model training method according to the embodiment of the present invention may at least include the following steps S1 to S3.
S1, acquiring at least one sample image for model training from a training sample set, and inputting the sample image into the multi-target detection model to obtain corresponding detection target information. The detection target information comprises at least one detection target and, for each detection target, the confidence of the detection target, the image coordinates of the detection frame, and the predicted classification confidence. The training sample set in this embodiment consists of samples annotated with real frames, and each real frame carries the real target category and the image coordinates of a real target. In each round of training, a number of sample images may be randomly selected from the training sample set for single-round iterative training. As noted above, the multi-target detection model of this embodiment can detect multiple classes of targets in an input image, and may be built on a YOLO network model.
S2, evaluating the detection target information to obtain an evaluation result.
And S3, adjusting a detection target result according to the evaluation result, and training the multi-target detection model by using the adjusted detection target result and the sample image.
In practice, multi-target detection networks such as YOLOv5, YOLOR, and YOLOv7 decouple training into three tasks: frame regression (bounding box regression), target confidence regression (object regression), and multi-class classification (classification). In other words, training the multi-target detection model chiefly means training the detection target confidence regression module M4, the frame regression module M5, and the classification module M6. Each of these modules has an independent loss function, so the number of training iterations can be controlled by using the loss functions corresponding to M4, M5, and M6 as training constraints until training stops.
In the conventional scheme there is no model prior module M7, and the target confidence regression M4 depends on the result of the IoU calculation module M3; that is, the quality of the frame regression affects the target confidence. The multi-class classification M6, however, is independent of M4 and M5, which means M6 has no way to compensate when the target detection is wrong. In practice the three training tasks of frame regression, target confidence regression, and multi-class classification are interdependent; for example, the frame regression determines both the target confidence and the class of the classification. In the scheme of this embodiment, after the detection target information corresponding to a detection target is obtained, the model's detection result is used as prior information, and the detection target information is adaptively modified by analyzing that prior. That is, the model prediction result is added into the loss functions of networks such as YOLOv5, YOLOR, and YOLOv7 as prior information, and the current classification label is adaptively revised by analyzing it, which effectively resolves the classification errors caused by targets of different classes being adjacent or overlapping. Without increasing model parameters or training data, this improves classification accuracy in object detection and thereby strengthens the robustness of the algorithm.
In the embodiment of the present invention, evaluating the detection target information based on the real target information in step S2 may include:
s2-1, acquiring real target information of the sample image; the real target information includes image coordinates of a real frame.
S2-2, calculating, by the IoU calculation module, the IoU between each detection frame and each real frame based on the image coordinates of the real frames and of the detection frames, and taking the IoU between the detection frame and the real frame as the evaluation result.
As shown in Fig. 2, once the image coordinates of the real frames and the detection frames are obtained, their IoU can be calculated. In this embodiment, every real target and every detection target has a corresponding ID, and the IoU between each detection frame and each real frame is calculated separately.
Adjusting the detection target result according to the evaluation result in step S3 may further include:
S3-1, generating an overlap mask matrix based on the IoU between each detection frame and each real frame of the sample image.
Step S2 can acquire the ID of each real target and of each detection target. Optionally, generating the overlap mask matrix based on the IoU between each detection frame and each real frame may include:
S3-1-1, obtaining the detection target ID corresponding to each detection frame and the real target ID corresponding to each real frame, ordering the IoUs between the detection frames and the real frames, and generating an IoU matrix.
As shown in Table 1, let the matrix have n rows and m columns, where n is the total number of predicted targets and m the total number of real targets. With two subscripts an element is written IOU_ij (0 ≤ i ≤ n-1, 0 ≤ j ≤ m-1), where subscript i indexes a predicted target and j a real target. With a single subscript, IOU_i denotes the row of the i-th predicted target, related to IOU_ij by IOU_i = [IOU_i0, IOU_i1, ..., IOU_i,m-1].
Table 1. IOU matrix

IOU_0,0    IOU_0,1    ...      IOU_0,m-2    IOU_0,m-1
IOU_1,0    IOU_1,1    ...      IOU_1,m-2    IOU_1,m-1
...        ...        IOU_i,j  ...          ...
IOU_n-2,0  IOU_n-2,1  ...      IOU_n-2,m-2  IOU_n-2,m-1
IOU_n-1,0  IOU_n-1,1  ...      IOU_n-1,m-2  IOU_n-1,m-1
Table 2 shows the IOU matrix of the embodiment shown in Fig. 4, where (a) is the model prediction result and (b) is the label.
Table 2. Example IOU matrix

0.99  0.32  0.0
0.76  0.44  0.0
0.0   0.0   0.99
That is, the IOU matrix is ordered with the real targets as columns and the detection targets as rows.
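A minimal sketch of building this matrix, reusing the iou() helper above (NumPy and the function name are assumptions, not the patent's implementation):

    import numpy as np

    def iou_matrix(det_boxes, gt_boxes) -> np.ndarray:
        """n-row, m-column IoU matrix: rows are detection targets, columns are real targets."""
        n, m = len(det_boxes), len(gt_boxes)
        mat = np.zeros((n, m), dtype=np.float32)
        for i in range(n):        # i: predicted-target ID
            for j in range(m):    # j: real-target ID
                mat[i, j] = iou(det_boxes[i], gt_boxes[j])
        return mat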
S3-1-2, marking matrix elements of the IoU matrix whose IoU is greater than a first preset threshold as a first parameter, and marking elements less than or equal to the first preset threshold as a second parameter.
Once the IoU matrix is obtained, the overlap mask can be generated. In this embodiment, the mask of the i-th predicted target with respect to the real targets is:

mask_i = [IOU_ij > 0], (0 ≤ j ≤ m-1)

In this embodiment the first preset threshold is set to 0: matrix elements of the IOU matrix whose IoU is greater than 0 are marked as the first parameter, True, and elements less than or equal to 0 are marked as the second parameter, False.
S3-1-3, generating the overlap mask matrix corresponding to each detection target based on the IoU matrix, the first parameter, and the second parameter.
Thus, the overlap mask matrix corresponding to the IoU matrix of Table 2 is as shown in Table 3.
Table 3. Example mask matrix
True True False
True True False
False False True
Corresponding to Table 3, each detection target (row) has its own overlap mask, i.e., mask_0 = [True, True, False], mask_1 = [True, True, False], mask_2 = [False, False, True].
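The masking step can be sketched as follows, under the same assumptions (first preset threshold 0, first parameter True, second parameter False); the Table 2 values serve as the example input:

    import numpy as np

    def overlap_mask(iou_mat, first_thresh=0.0):
        """Boolean overlap mask matrix; row i is mask_i of detection target i."""
        return iou_mat > first_thresh

    iou_mat = np.array([[0.99, 0.32, 0.0],
                        [0.76, 0.44, 0.0],
                        [0.0,  0.0,  0.99]])
    mask = overlap_mask(iou_mat)
    # mask rows: [True, True, False], [True, True, False], [False, False, True] (Table 3)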
S3-2, generating the overlap category list corresponding to each detection target based on the overlap mask matrix.
Once the overlap mask matrix is obtained, the overlap category list iou_class can be generated. In this embodiment, generating the overlap category list corresponding to the detection target based on the overlap mask matrix includes:
s3-2-1, obtaining a real target class label.
According to fig. 4, tclass= [ circle, hexagon ].
S3-2-2, slicing the real target class labels with the overlap mask matrix and deduplicating the result to obtain the overlap category list of each detection target.
Let the real target category list be Tclass, where the j-th real target category is Tclass_j (0 ≤ j ≤ m-1). The list of categories in which the i-th predicted target overlaps the real targets is:

iou_class_i = unique(Tclass[mask_i]), (0 ≤ i ≤ n-1)

Here Tclass[mask_i] denotes boolean slicing: if the mask A = [True, False, False, True] and the matrix B = [0, 1, 2, 3], then B[A] = [0, 3]. unique() is a deduplication function: if B = [0, 0, 1, 3, 2, 3, 4], then unique(B) = [0, 1, 2, 3, 4].
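The slicing and deduplication steps map directly onto NumPy boolean indexing and np.unique. A sketch (variable names assumed) that reproduces Table 4:

    import numpy as np

    tclass = np.array(["circle", "hexagon", "hexagon"])  # real-target class labels, Fig. 4

    def overlap_class_list(tclass, mask_i):
        """iou_class_i = unique(Tclass[mask_i])."""
        return np.unique(tclass[mask_i])

    overlap_class_list(tclass, np.array([True, True, False]))   # ['circle', 'hexagon']
    overlap_class_list(tclass, np.array([False, False, True]))  # ['hexagon']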
In connection with FIG. 4, the iou_class is shown in Table 4.
Table 4. Overlap category list

ID                      i=0                   i=1                   i=2
mask_i                  [True, True, False]   [True, True, False]   [False, False, True]
Tclass[mask_i]          [circle, hexagon]     [circle, hexagon]     [hexagon]
unique(Tclass[mask_i])  [circle, hexagon]     [circle, hexagon]     [hexagon]
iou_class_i             [circle, hexagon]     [circle, hexagon]     [hexagon]
S3-3, determining the category label of the detection target according to the overlap category list, and taking it as the final category label of the detection target. This specifically comprises the following steps:
S3-3-1, obtaining the total number of categories in the overlap category list of the detection target;
S3-3-2, if the total number of categories is less than or equal to a second preset threshold, taking the category label in the overlap category list as the category label of the detection target;
S3-3-3, if the total number of categories is greater than the second preset threshold, determining the category label of the detection target from the overlap category list using an argmax function.
The optimal class of each predicted target is then computed. Denote the class label predicted for the i-th target by Pclass_i and its optimal label by class_refine_i. With the second preset threshold taken as 1, as in the example below:

class_refine_i = iou_class_i[0], if len(iou_class_i) ≤ 1
class_refine_i = Tclass[argmax(IOU_i)], otherwise

where len() is the total number of elements in a list: if A = [True, False, False, True], then len(A) = 4; and argmax() returns the index of the maximum element: if A = [4, 1, 2, 0, 3], then argmax(A) = 0.
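A sketch of this selection rule, taking the second preset threshold as 1 as the Table 5 example implies (the function name and the fallback for an empty list are assumptions):

    import numpy as np

    def refine_class(pclass_i, iou_class_i, iou_i, tclass, second_thresh=1):
        """class_refine_i: final class label of the i-th detection target."""
        if len(iou_class_i) == 0:              # no overlapping real target: keep the prediction
            return pclass_i
        if len(iou_class_i) <= second_thresh:  # a single overlapping class
            return iou_class_i[0]
        return tclass[int(np.argmax(iou_i))]   # real-target label with the highest IoU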
Based on the above, the optimal predicted classes corresponding to Table 4 are shown in Table 5.
Table 5. Optimal predicted classes

ID                i=0                 i=1                 i=2
len(iou_class_i)  2                   2                   1
IOU_i             [0.99, 0.32, 0.0]   [0.76, 0.44, 0.0]   [0.0, 0.0, 0.99]
argmax(IOU_i)     0                   0                   N/A
Pclass_i          circle              hexagon             hexagon
class_refine_i    circle              circle              hexagon
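Chaining the sketches above reproduces the Table 5 result for the Fig. 4 example:

    pclass = ["circle", "hexagon", "hexagon"]  # Pclass_i: labels predicted by the model
    refined = [refine_class(pclass[i],
                            overlap_class_list(tclass, mask[i]),
                            iou_mat[i], tclass)
               for i in range(3)]
    # refined == ['circle', 'circle', 'hexagon'], i.e. class_refine_i in Table 5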
The class label obtained at the end of step S3-3 is taken as the final class label of the detection target, and the detection target confidence regression module, the frame regression module, and the classification module are then trained with the sample image and the adjusted final class labels, thereby training the multi-target detection model.
During training of the multi-target model, the tasks are decoupled into frame regression (bounding box regression), target confidence regression (object regression), and multi-class classification (classification).
As shown in Fig. 5, suppose circles and hexagons are to be detected. Fig. 5(a) shows the ideal case. When the detection frame of the circle drifts (Fig. 5(b)), M6 still judges the detected object to be a circle even though the actual features inside the frame are those of a hexagon. The situation is more pronounced when the centers of the targets are close together (Fig. 5(c)). Once the feature spaces of the classes are confused, the classification can still go wrong even when the frame regression and target confidence regression are normal.
In Fig. 6, (a) shows the precision and recall curves of the baseline (master): the head mAP50 is 0.991, the hand mAP50 is 0.929, and the average is 0.960. (b) shows the results of training with this scheme: the head mAP50 is 0.996, the hand mAP50 is 0.936, and the average is 0.966. Every metric improves to some extent.
Based on the same inventive concept, an embodiment of the invention further provides a model prior-based multi-target detection model training apparatus. The multi-target detection model is created based on a YOLO network architecture and comprises a model prediction decoder, a real information distribution module, an IoU calculation module, a detection target confidence regression module, a frame regression module, a classification module, and a model prior module. As shown in Fig. 7, the model prior-based multi-target detection model training apparatus of the invention may include:
a target detection unit 710, configured to obtain at least one sample image for model training from a training sample set, and input the sample image into the multi-target detection model to obtain corresponding detection target information;
an evaluation unit 720, configured to evaluate the detection target information to obtain an evaluation result;
and an adjusting unit 730, configured to adjust the detection target result according to the evaluation result, and train the multi-target detection model by using the adjusted detection target result and the sample image.
In an alternative embodiment of the invention, the evaluation unit 720 may further be adapted to:
acquiring real target information of the sample image; the real target information comprises image coordinates of a real frame; the detection target information comprises image coordinates of a detection frame corresponding to the detection target;
and calculating, by the IoU calculation module, the IoU between each detection frame and each real frame based on the image coordinates of the real frames and of the detection frames, and taking the IoU between the detection frame and the real frame as the evaluation result.
In an alternative embodiment of the present invention, the adjusting unit 730 may also be configured to:
generating an overlap mask matrix based on the IoU between each detection frame and each real frame of the sample image;
generating an overlap category list corresponding to the detection target based on the overlap mask matrix;
and determining the category label of the detection target according to the overlap category list, and taking it as the final category label of the detection target.
In an alternative embodiment of the present invention, the adjusting unit 730 may also be configured to:
acquiring the detection target ID corresponding to each detection frame and the real target ID corresponding to each real frame, ordering the IoUs between the detection frames and the real frames, and generating an IoU matrix;
marking matrix elements of the IoU matrix whose IoU is greater than a first preset threshold as a first parameter, and marking elements less than or equal to the first preset threshold as a second parameter;
and generating the overlap mask matrix corresponding to each detection target based on the IoU matrix, the first parameter, and the second parameter.
In an alternative embodiment of the present invention, the adjusting unit 730 may also be configured to:
acquiring the real target class labels;
and slicing the real target class labels with the overlap mask matrix and deduplicating the result to obtain the overlap category list corresponding to each detection target;
acquiring the total number of categories in the overlap category list of the detection target;
if the total number of categories is less than or equal to a second preset threshold, taking the category label in the overlap category list as the category label of the detection target;
and if the total number of categories is greater than the second preset threshold, determining the category label of the detection target from the overlap category list using an argmax function.
The embodiment of the invention also provides a computer readable storage medium for storing program codes for executing the multi-target detection model training method based on model prior described in the above embodiment.
The embodiment of the invention also provides a computing device, which comprises a processor and a memory: the memory is used for storing program codes and transmitting the program codes to the processor; the processor is configured to execute the model prior-based multi-objective detection model training method according to the foregoing embodiment according to instructions in the program code.
It will be clear to those skilled in the art that the specific working processes of the above-described systems, devices, modules and units may refer to the corresponding processes in the foregoing method embodiments, and for brevity, the description is omitted here.
In addition, each functional unit in the embodiments of the present invention may be physically independent, two or more functional units may be integrated together, or all functional units may be integrated in one processing unit. The integrated functional units may be implemented in hardware or in software or firmware.
Those of ordinary skill in the art will appreciate that the integrated functional units, if implemented in software and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied, in essence or in whole or in part, in the form of a software product stored in a storage medium, which comprises instructions for causing a computing device (e.g., a personal computer, a server, or a network device) to perform all or part of the steps of the methods described in the embodiments of the present invention. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a read-only memory (ROM), a random-access memory (RAM), a magnetic disk, an optical disk, and the like.
Alternatively, all or part of the steps of implementing the foregoing method embodiments may be implemented by hardware (such as a personal computer, a server, or a computing device such as a network device) associated with program instructions, where the program instructions may be stored on a computer-readable storage medium, and where the program instructions, when executed by a processor of the computing device, perform all or part of the steps of the method according to the embodiments of the present invention.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present invention, and not for limiting the same; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some or all technical features thereof can be replaced by others within the spirit and principle of the present invention; such modifications and substitutions do not depart from the scope of the invention.

Claims (10)

1. A model prior-based multi-objective detection model training method, the method comprising:
acquiring at least one sample image for model training from a training sample set, and inputting the sample image into the multi-target detection model to obtain corresponding detection target information;
evaluating the detection target information to obtain an evaluation result; the evaluation result comprises the intersection-over-union (IoU) between a detection frame of the detection target and a real frame of the real target in the sample image;
adjusting the detection target information according to the evaluation result, and training the multi-target detection model by using the adjusted detection target information and the sample image;
the adjusting the detection target information according to the evaluation result includes:
generating an overlap mask matrix based on the IoU between each detection frame and each real frame of the sample image; generating an overlap category list corresponding to the detection target based on the overlap mask matrix; determining a category label of the detection target according to the overlap category list, and taking the category label as the final category label of the detection target;
wherein generating the overlap mask matrix based on the IoU between each detection frame and each real frame comprises: acquiring a detection target ID corresponding to each detection frame and a real target ID corresponding to each real frame, ordering the IoUs between the detection frames and the real frames, and generating an IoU matrix; marking matrix elements of the IoU matrix whose IoU is greater than a first preset threshold as a first parameter, and marking elements less than or equal to the first preset threshold as a second parameter; and generating the overlap mask matrix corresponding to each detection target based on the IoU matrix, the first parameter, and the second parameter;
and generating the overlap category list corresponding to the detection target based on the overlap mask matrix comprises: acquiring real target class labels; and slicing the real target class labels with the overlap mask matrix and deduplicating the result to obtain the overlap category list corresponding to each detection target.
2. The method according to claim 1, wherein evaluating the detection target information to obtain an evaluation result includes:
acquiring real target information of the sample image; the real target information comprises image coordinates of a real frame; the detection target information comprises image coordinates of a detection frame corresponding to the detection target;
and calculating the IoU between each detection frame and each real frame based on the image coordinates of the real frame and the image coordinates of the detection frame, and taking the IoU between the detection frame and the real frame as the evaluation result.
3. The method of claim 1, wherein the determining the category label of the detection target from the overlap category list comprises:
acquiring the total number of categories in the overlap category list of the detection target;
if the total number of categories is less than or equal to a second preset threshold, taking the category label in the overlap category list as the category label of the detection target;
and if the total number of categories is greater than the second preset threshold, determining the category label of the detection target from the overlap category list using an argmax function.
4. A multi-target detection model, characterized in that the multi-target detection model is trained by the model prior-based multi-target detection model training method according to any one of claims 1-3; the multi-target detection model is created based on a YOLO network architecture and comprises a backbone network structure for image feature learning and a loss function structure; the loss function structure includes: a model prediction decoder, a real information distribution module, an intersection-over-union (IoU) calculation module, a detection target confidence regression module, a frame regression module, a classification module, and a model prior module.
5. The multi-target detection model of claim 4, wherein the model prediction decoder is coupled to the IoU calculation module, the detection target confidence regression module, and the classification module, respectively;
the IoU calculation module is further coupled to the detection target confidence regression module, the frame regression module, and the model prior module; the real information distribution module is coupled to the IoU calculation module and the model prior module, respectively; and the model prior module is further coupled to the classification module.
6. The multi-target detection model of claim 4, wherein
the model prediction decoder is configured to output detection target information of the image;
the real information distribution module is configured to output real target information of the image;
the IoU calculation module is configured to calculate the IoU between the detection frame and the real frame;
and the model prior module is configured to adjust the detection target information according to the outputs of the real information distribution module and the IoU calculation module.
7. A model prior-based multi-objective detection model training apparatus, the apparatus comprising:
the target detection unit is used for acquiring at least one sample image for model training from a training sample set, and inputting the sample image into the multi-target detection model to obtain corresponding detection target information;
the evaluation unit is used for evaluating the detection target information to obtain an evaluation result; the evaluation result comprises the intersection-over-union (IoU) between a detection frame of the detection target and a real frame of the real target in the sample image;
the training unit is used for adjusting the detection target information according to the evaluation result and training the multi-target detection model by utilizing the adjusted detection target information and the sample image;
the training unit is further configured to: generate an overlap mask matrix based on the IoU between each detection frame and each real frame of the sample image; generate an overlap category list corresponding to the detection target based on the overlap mask matrix; determine a category label of the detection target according to the overlap category list, and take the category label as the final category label of the detection target;
wherein generating the overlap mask matrix based on the IoU between each detection frame and each real frame comprises: acquiring a detection target ID corresponding to each detection frame and a real target ID corresponding to each real frame, ordering the IoUs between the detection frames and the real frames, and generating an IoU matrix; marking matrix elements of the IoU matrix whose IoU is greater than a first preset threshold as a first parameter, and marking elements less than or equal to the first preset threshold as a second parameter; and generating the overlap mask matrix corresponding to each detection target based on the IoU matrix, the first parameter, and the second parameter;
and generating the overlap category list corresponding to the detection target based on the overlap mask matrix comprises: acquiring real target class labels; and slicing the real target class labels with the overlap mask matrix and deduplicating the result to obtain the overlap category list corresponding to each detection target.
8. A computer readable storage medium, characterized in that the computer readable storage medium is for storing a program code for performing the method of any one of claims 1-3.
9. A computing device, the computing device comprising a processor and a memory:
the memory is used for storing program codes and transmitting the program codes to the processor;
the processor is configured to perform the method of any of claims 1-3 according to instructions in the program code.
10. An image capturing apparatus, which employs the multi-target detection model according to any one of claims 4-6, or performs the model prior-based multi-target detection model training method according to any one of claims 1-3, or comprises the model prior-based multi-target detection model training apparatus according to claim 7, or carries the computer-readable storage medium according to claim 8.
CN202211212592.4A 2022-09-29 2022-09-29 Multi-target detection model, training method, device, medium and equipment thereof Active CN115482417B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211212592.4A CN115482417B (en) 2022-09-29 2022-09-29 Multi-target detection model, training method, device, medium and equipment thereof

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211212592.4A CN115482417B (en) 2022-09-29 2022-09-29 Multi-target detection model, training method, device, medium and equipment thereof

Publications (2)

Publication Number Publication Date
CN115482417A CN115482417A (en) 2022-12-16
CN115482417B true CN115482417B (en) 2023-08-08

Family

ID=84394825

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211212592.4A Active CN115482417B (en) 2022-09-29 2022-09-29 Multi-target detection model, training method, device, medium and equipment thereof

Country Status (1)

Country Link
CN (1) CN115482417B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115908790B (en) * 2022-12-28 2024-07-26 北京斯年智驾科技有限公司 Method and device for detecting target detection center point offset and electronic equipment

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107808122A (en) * 2017-09-30 2018-03-16 中国科学院长春光学精密机械与物理研究所 Method for tracking target and device
CN111241947A (en) * 2019-12-31 2020-06-05 深圳奇迹智慧网络有限公司 Training method and device of target detection model, storage medium and computer equipment
CN113239982A (en) * 2021-04-23 2021-08-10 北京旷视科技有限公司 Training method of detection model, target detection method, device and electronic system
CN114462469A (en) * 2021-12-20 2022-05-10 浙江大华技术股份有限公司 Training method of target detection model, target detection method and related device
CN114764778A (en) * 2021-01-14 2022-07-19 北京图森智途科技有限公司 Target detection method, target detection model training method and related equipment

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109858569A (en) * 2019-03-07 2019-06-07 中国科学院自动化研究所 Multi-tag object detecting method, system, device based on target detection network
KR20210050129A (en) * 2019-10-28 2021-05-07 삼성에스디에스 주식회사 Machine learning apparatus and method for object detection
JP2022091270A (en) * 2020-12-09 2022-06-21 ブラザー工業株式会社 Method, system, and computer program

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107808122A (en) * 2017-09-30 2018-03-16 中国科学院长春光学精密机械与物理研究所 Method for tracking target and device
CN111241947A (en) * 2019-12-31 2020-06-05 深圳奇迹智慧网络有限公司 Training method and device of target detection model, storage medium and computer equipment
CN114764778A (en) * 2021-01-14 2022-07-19 北京图森智途科技有限公司 Target detection method, target detection model training method and related equipment
CN113239982A (en) * 2021-04-23 2021-08-10 北京旷视科技有限公司 Training method of detection model, target detection method, device and electronic system
CN114462469A (en) * 2021-12-20 2022-05-10 浙江大华技术股份有限公司 Training method of target detection model, target detection method and related device

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
程璐飞. Research on multi-target detection and classification algorithms based on deep learning. Wanfang Database, 2022, pp. 8-53. *

Also Published As

Publication number Publication date
CN115482417A (en) 2022-12-16

Similar Documents

Publication Publication Date Title
CN110070141B (en) Network intrusion detection method
CN109460793B (en) Node classification method, model training method and device
CN110222780B (en) Object detection method, device, equipment and storage medium
CN111564179B (en) Species biology classification method and system based on triple neural network
CN110969200B (en) Image target detection model training method and device based on consistency negative sample
CN113723070B (en) Text similarity model training method, text similarity detection method and device
CN115482417B (en) Multi-target detection model, training method, device, medium and equipment thereof
CN111382572A (en) Named entity identification method, device, equipment and medium
CN109472048A (en) The method of assessment intelligent electric meter structural reliability is extended based on sparse polynomial chaos
CN114943674A (en) Defect detection method, electronic device and storage medium
CN107688822B (en) Newly added category identification method based on deep learning
CN114943672A (en) Image defect detection method and device, electronic equipment and storage medium
CN114139636B (en) Abnormal operation processing method and device
CN114202671A (en) Image prediction optimization processing method and device
Liu et al. Fuzzy c-mean algorithm based on Mahalanobis distances and better initial values
CN110135507A (en) A kind of label distribution forecasting method and device
CN110751400A (en) Risk assessment method and device
CN107067034B (en) Method and system for rapidly identifying infrared spectrum data classification
CN114139643B (en) Monoglyceride quality detection method and system based on machine vision
CN109872006A (en) A kind of scoring distribution forecasting method and device
JP7306460B2 (en) Adversarial instance detection system, method and program
CN113128556B (en) Deep learning test case sequencing method based on mutation analysis
CN111859947B (en) Text processing device, method, electronic equipment and storage medium
CN111368576B (en) Global optimization-based Code128 bar Code automatic reading method
CN111108516B (en) Evaluating input data using a deep learning algorithm

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant