CN115482417B - Multi-target detection model, training method, device, medium and equipment thereof - Google Patents
- Publication number
- CN115482417B (application CN202211212592.4A)
- Authority
- CN
- China
- Prior art keywords
- detection
- target
- real
- frame
- model
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/764—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/774—Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V2201/00—Indexing scheme relating to image or video recognition or understanding
- G06V2201/07—Target detection
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T10/00—Road transport of goods or passengers
- Y02T10/10—Internal combustion engine [ICE] based vehicles
- Y02T10/40—Engine management systems
Abstract
The invention provides a multi-target detection model together with a training method, apparatus, medium and device therefor. The method comprises: acquiring at least one sample image for model training from a training sample set, and inputting the sample image into the multi-target detection model to obtain corresponding detection target information; evaluating the detection target information to obtain an evaluation result; and adjusting the detection target result according to the evaluation result, and training the multi-target detection model with the adjusted detection target result and the sample image. Without increasing model parameters or training data, the scheme improves the classification precision of object detection and thereby strengthens the robustness of the algorithm.
Description
Technical Field
The invention relates to the technical field of image processing, and in particular to a multi-target detection model and a training method, apparatus, medium and device therefor.
Background
In image processing, target detection is a key component of image recognition, and the way the deep-learning detection network of a target detection model is trained directly affects the accuracy of the model's detection results. Most deep-learning detection networks are trained with the tasks decoupled, as in the YOLO series. However, the tasks are in fact interdependent; forcibly splitting them can corrupt the features of the dependent tasks in corner cases and seriously degrade the final classification result.
Disclosure of Invention
In view of the foregoing, the present invention provides a multi-target detection model and a training method, apparatus, medium and device therefor, which overcome or at least partially solve the foregoing problems.
According to one aspect of the present invention, there is provided a multi-target detection model created on a YOLO network architecture, the model including a backbone network structure for image feature learning and a loss function structure. The loss function structure includes: a model prediction decoder, a real information distribution module, an intersection-over-union (IoU) calculation module, a detection target confidence regression module, a frame regression module, a classification module and a model prior module.
Optionally, the model prediction decoder is coupled with the IoU calculation module, the detection target confidence regression module and the classification module, respectively;
the IoU calculation module is further coupled with the detection target confidence regression module, the frame regression module and the model prior module; the real information distribution module is coupled with the IoU calculation module and the model prior module, respectively; the model prior module is also coupled with the classification module.
Optionally, the model prediction decoder is used for outputting detection target information of the image;
the real information distribution module is used for outputting real target information of the image;
the IoU calculation module is used for calculating the intersection-over-union between a detection frame and a real frame;
the model prior module is used for adjusting the detection target information according to the outputs of the real information distribution module and the IoU calculation module.
According to another aspect of the invention, a model-prior-based multi-target detection model training method is provided, applied to the above multi-target detection model; the method comprises the following steps:
acquiring at least one sample image for model training from a training sample set, and inputting the sample image into the multi-target detection model to obtain corresponding detection target information;
evaluating the detection target information to obtain an evaluation result;
and adjusting the detection target result according to the evaluation result, and training the multi-target detection model by using the adjusted detection target result and the sample image.
Optionally, the evaluating the detection target information based on the real target information, and obtaining an evaluation result includes:
acquiring real target information of the sample image; the real target information comprises image coordinates of a real frame; the detection target information comprises image coordinates of a detection frame corresponding to the detection target;
and calculating, with the IoU calculation module, the intersection-over-union of each detection frame with each real frame based on the image coordinates of the real frames and the image coordinates of the detection frames, and taking the intersection-over-union between the detection frames and the real frames as the evaluation result.
Optionally, the adjusting the detection target result according to the evaluation result includes:
generating an overlapping mask matrix based on the intersection ratio of each detection frame and each real frame of the sample image;
generating an overlapping category list corresponding to the detection target based on the overlapping mask matrix;
and determining the category label of the detection target according to the overlapped category list, and taking the category label as the final category label of the detection target.
Optionally, the generating the overlap mask matrix based on the intersection ratio of each detection frame and each real frame includes:
acquiring a detection target ID corresponding to each detection frame and a real target ID corresponding to each real frame, ordering the intersection-over-union of each detection frame with each real frame, and generating an IoU matrix;
marking matrix elements of the IoU matrix whose intersection-over-union is larger than a first preset threshold as a first parameter, and marking matrix elements smaller than or equal to the first preset threshold as a second parameter;
and generating an overlap mask matrix corresponding to each detection target based on the IoU matrix, the first parameter and the second parameter.
Optionally, the generating the overlapping category list corresponding to the detection target based on the overlapping mask matrix includes:
acquiring a real target class label;
and de-duplicating the real target class label and the overlapped mask matrix to obtain an overlapped class list corresponding to each detection target.
Optionally, the determining the category label of the detection target according to the overlapping category list includes:
acquiring the total number of categories of the overlapped category list of the detection target;
if the total number of categories is smaller than or equal to a second preset threshold value, the category label in the overlapped category list is used as the category label of the detection target;
and if the total number of categories is larger than the second preset threshold value, determining the category label of the detection target from the overlapped category list by using an argmax function.
According to another aspect of the present invention, there is provided a model prior-based multi-objective detection model training apparatus applied to any one of the above multi-objective detection models, the apparatus comprising:
the target detection unit is used for acquiring at least one sample image for model training from a training sample set, and inputting the sample image into the multi-target detection model to obtain corresponding detection target information;
the evaluation unit is used for evaluating the detection target information to obtain an evaluation result;
and the training unit is used for adjusting the detection target result according to the evaluation result and training the multi-target detection model by using the adjusted detection target result and the sample image.
According to another aspect of the present invention, there is provided a computer readable storage medium for storing program code for performing the model prior based multi-objective detection model training method of any one of the above.
According to another aspect of the present invention, there is provided a computing device comprising a processor and a memory:
the memory is used for storing program codes and transmitting the program codes to the processor;
the processor is configured to execute the model prior-based multi-objective detection model training method according to any one of the above-described instructions in the program code.
According to another aspect of the present invention, there is provided an image capturing apparatus employing the multi-target detection model of any one of the above, or performing the model-prior-based multi-target detection model training method, or including the model-prior-based multi-target detection model training device, or having the computer-readable storage medium.
The invention provides a multi-target detection model, a training method, a device, a medium and equipment thereof. On the premise of not increasing the model parameters and training data, the classification precision in the object detection is improved, so that the algorithm robustness is enhanced.
The foregoing is merely an overview of the technical solution of the present invention. To allow it to be understood more clearly and implemented according to the contents of the description, and to make the above and other objects, features and advantages of the present invention more apparent, specific embodiments are set forth below.
The above, as well as additional objectives, advantages, and features of the present invention will become apparent to those skilled in the art from the following detailed description of a specific embodiment of the present invention when read in conjunction with the accompanying drawings.
Drawings
Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the invention. Also, like reference numerals are used to designate like parts throughout the figures. In the drawings:
FIG. 1 illustrates a multi-objective detection network training schematic in accordance with an embodiment of the present invention;
FIG. 2 illustrates an IOU computation schematic according to one embodiment of the invention;
FIG. 3 shows a flow diagram of a model prior-based multi-objective detection model training method in accordance with an embodiment of the present invention;
FIG. 4 shows a schematic diagram of the detection results of the multi-target detection network according to FIG. 1;
FIG. 5 shows a schematic diagram of the detection results of a conventional multi-target detection network;
FIG. 6 shows a comparative schematic of recall curves according to an embodiment of the present invention;
FIG. 7 shows a schematic structural diagram of a model prior-based multi-objective detection model training device according to an embodiment of the present invention.
Detailed Description
Exemplary embodiments of the present invention will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present invention are shown in the drawings, it should be understood that the present invention may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the invention to those skilled in the art.
The embodiment of the invention provides a model-prior-based multi-target detection model training method. The multi-target detection model of this embodiment is created on a YOLO network architecture, preferably YOLOv5 or YOLOv7, and includes a backbone network structure that performs image feature learning and a loss function structure. Since the YOLO network architecture is open source, it is not described in detail here. As shown in fig. 1, the loss function structure may include a model prediction decoder M1, a real information distribution module M2, an IoU calculation module M3, a detection target confidence regression module M4, a frame regression module M5, a classification module M6, and a model prior module M7. As shown in fig. 1, the model prediction decoder M1 is coupled with the IoU calculation module M3, the detection target confidence regression module M4 and the classification module M6, respectively; the IoU calculation module M3 is further coupled with the detection target confidence regression module M4, the frame regression module M5 and the model prior module M7; the real information distribution module M2 is coupled with the IoU calculation module M3 and the model prior module M7, respectively; the model prior module M7 is also coupled with the classification module M6.
M1. Decoder (model prediction decoder): outputs the detection target information of the image. The decoding process differs between object detection networks; the decoder output is decoupled into three parts: object, the confidence of a detection target; bbox, the image coordinates of the detection frame; class, the predicted classification confidence.
M2. Build target (real information distribution module): outputs the real target information of the image. Real label distribution is responsible for dynamically assigning the input real labels and establishing their association with the model output layer. Label assignment differs between detection networks; the output is decoupled into two parts: bbox, the image coordinates of a real frame; class, the class label of a real target.
M3. IoU calculator (intersection-over-union calculation module): calculates the intersection-over-union between the detection frame and the real frame, used as the evaluation standard of detection frame regression quality. As shown in fig. 2, if the overlapping portion of rectangle A and rectangle B is AB, the intersection-over-union is:

IOU = S(AB) / (S(A) + S(B) - S(AB))

where S(.) denotes area.
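As a minimal illustrative sketch (not part of the patent text; the function name and the (x1, y1, x2, y2) corner layout are assumptions), the IoU of two axis-aligned rectangles can be computed as:

```python
def iou(box_a, box_b):
    """Intersection-over-union of two rectangles given as (x1, y1, x2, y2)."""
    # Corners of the overlapping rectangle AB (empty if the boxes are disjoint)
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```

For example, `iou((0, 0, 2, 2), (1, 1, 3, 3))` gives 1/7 (intersection 1, union 4 + 4 - 1), and disjoint boxes give 0.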
the detection target confidence regression module m4.Object regression: the module is typically composed of regression loss, such as cross entropy loss, focal loss, etc.
M5. Bbox regression (frame regression module): this module is typically composed of a GIoU loss.
M6. Classification (classification module): this module is generally composed of a classification loss.
M7. Model prior module: adjusts the detection target information according to the outputs of the real information distribution module and the IoU calculation module; specifically, it can readjust the class label of a detection target according to the result of M3 (a result that evaluates the quality of the frame regression).
As shown in fig. 3, the model prior-based multi-objective detection model training method according to the embodiment of the present invention may at least include the following steps S1 to S3.
S1, acquiring at least one sample image for model training from a training sample set, and inputting the sample image into the multi-target detection model to obtain corresponding detection target information. The target detection information comprises at least one detection target and the detection target information corresponding to it; the detection target information comprises the confidence of the detection target, the image coordinates of the detection frame and the predicted classification confidence. The training sample set of this embodiment consists of samples annotated with real frames, each real frame carrying the real target class of a real target and the related image coordinate information. For each round of iterative training, sample images may be drawn at random from the training sample set. As stated above, the multi-target detection model of this embodiment can detect multiple classes of targets in the input image, and may be built on the YOLO network model.
S2, evaluating the detection target information to obtain an evaluation result.
And S3, adjusting a detection target result according to the evaluation result, and training the multi-target detection model by using the adjusted detection target result and the sample image.
In practical applications, multi-target detection networks such as YOLOv5, YOLOR and YOLOv7 decouple training into three tasks: bounding box regression, object confidence regression and multi-class classification. In other words, training the multi-target detection model is mainly training the detection target confidence regression module M4, the frame regression module M5 and the classification module M6. Each of M4, M5 and M6 has an independent loss function, so the number of training iterations can be controlled by taking the loss functions corresponding to M4, M5 and M6 as training constraints until training stops.
In the conventional scheme there is no model prior module M7: the target confidence regression M4 depends on the result of the IoU calculation module M3, i.e. the quality of the frame regression affects the target confidence, whereas the multi-class classification M6 is independent of M4 and M5, which means M6 has no way to recover when the frame detection is wrong. In practice, the three training tasks of frame regression, target confidence regression and multi-class classification are interdependent; for example, the frame regression determines both the target confidence and the classification. In the scheme of the embodiment of the invention, after the detection target information corresponding to a detection target is obtained, the model's detection result is used as prior information and the detection target information is adaptively modified by analysing that prior: the model prediction result is added into the loss function of the YOLOv5, YOLOR or YOLOv7 network as prior information and the current classification label is adaptively corrected, which effectively solves the classification errors caused by targets of different classes being adjacent or overlapping. Without increasing model parameters or training data, the classification precision of object detection is improved and the robustness of the algorithm is thereby enhanced.
In the embodiment of the present invention, the step S2 of evaluating the detection target information based on the real target information may include:
s2-1, acquiring real target information of the sample image; the real target information includes image coordinates of a real frame.
S2-2, calculating, with the IoU calculation module, the intersection-over-union of each detection frame with each real frame based on the image coordinates of the real frames and the image coordinates of the detection frames, and taking the intersection-over-union between the detection frames and the real frames as the evaluation result.
As in fig. 2, after the image coordinates of the real frames and of the detection frames are obtained, their intersection-over-union can be calculated. In this embodiment, both real targets and detection targets have corresponding IDs, so when computing the intersection-over-union, the IoU between each detection frame and each real frame can be calculated separately.
Adjusting the detection target result according to the evaluation result in step S3 may further include:
s3-1, generating an overlapped mask matrix based on the intersection ratio of each detection frame and each real frame of the sample image.
The step S2 may acquire IDs of each real target and each detection target, and optionally, generating the overlap mask matrix based on the intersection ratio of each detection frame and each real frame may include:
s3-1-1, obtaining a detection target ID corresponding to each detection frame and a real target ID corresponding to each real frame, sequencing the intersection ratio of each detection frame and each real frame, and generating an intersection ratio matrix.
As shown in Table 1, let the matrix have n rows and m columns (n is the total number of predicted targets and m the total number of real targets). With double subscripts an element is written IOU_{i,j} (0 <= i <= n-1, 0 <= j <= m-1), where subscript i indexes the predicted target and j the real target. With a single subscript, IOU_i denotes the row of predicted target i, related to IOU_{i,j} by IOU_i = [IOU_{i,0}, IOU_{i,1}, ..., IOU_{i,m-1}].
TABLE 1. IOU matrix

IOU_{0,0} | IOU_{0,1} | ... | IOU_{0,m-2} | IOU_{0,m-1}
IOU_{1,0} | IOU_{1,1} | ... | IOU_{1,m-2} | IOU_{1,m-1}
... | ... | IOU_{i,j} | ... | ...
IOU_{n-2,0} | IOU_{n-2,1} | ... | IOU_{n-2,m-2} | IOU_{n-2,m-1}
IOU_{n-1,0} | IOU_{n-1,1} | ... | IOU_{n-1,m-2} | IOU_{n-1,m-1}
Table 2 shows the IOU matrix of the embodiment shown in FIG. 4, where (a) is the model prediction result and (b) is the label.

Table 2. Example IOU matrix (values per the IOU_i rows of Table 5)

0.99 | 0.32 | 0.0
0.76 | 0.44 | 0.0
0.0 | 0.0 | 0.99
That is, the IOU matrix is ordered in sequence with the real targets being columns and the detection targets being rows.
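As an illustrative sketch (not from the patent text; the function name and the (x1, y1, x2, y2) box layout are assumptions), the n x m IOU matrix with detections as rows and real targets as columns can be built with NumPy broadcasting:

```python
import numpy as np

def iou_matrix(det, gt):
    """n x m IoU matrix: row i = detection frame i, column j = real frame j."""
    det = np.asarray(det, dtype=float)  # shape (n, 4), boxes as x1, y1, x2, y2
    gt = np.asarray(gt, dtype=float)    # shape (m, 4)
    # Corners of every pairwise intersection rectangle, via broadcasting
    ix1 = np.maximum(det[:, None, 0], gt[None, :, 0])
    iy1 = np.maximum(det[:, None, 1], gt[None, :, 1])
    ix2 = np.minimum(det[:, None, 2], gt[None, :, 2])
    iy2 = np.minimum(det[:, None, 3], gt[None, :, 3])
    inter = np.clip(ix2 - ix1, 0, None) * np.clip(iy2 - iy1, 0, None)
    area_d = (det[:, 2] - det[:, 0]) * (det[:, 3] - det[:, 1])
    area_g = (gt[:, 2] - gt[:, 0]) * (gt[:, 3] - gt[:, 1])
    union = area_d[:, None] + area_g[None, :] - inter
    # Guard against zero-area unions (degenerate boxes)
    union_safe = np.where(union > 0, union, 1.0)
    return np.where(union > 0, inter / union_safe, 0.0)
```

One detection against two real frames yields a 1 x 2 matrix, one IoU per real target, matching the row layout of Table 1.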
S3-1-2, marking matrix elements with the cross ratio larger than a first preset threshold value in the cross ratio matrix as first parameters, and marking matrix elements smaller than or equal to the first preset threshold value as second parameters.
After the cross-over matrix is obtained, an overlap mask may be generated. In this embodiment, the mask of the i-th predicted target corresponding to the real target is:
mask_i = [IOU_{i,0} > 0, IOU_{i,1} > 0, ..., IOU_{i,m-1} > 0], i.e. mask_{i,j} = (IOU_{i,j} > 0) for 0 <= j <= m-1
in this embodiment, a first preset threshold is set to 0, and matrix elements in the IOU matrix with the cross ratio greater than the first preset threshold 0 are marked as a first parameter True; matrix elements in the IOU matrix that are less than or equal to the first preset threshold value 0 are labeled as the second parameter False.
S3-1-3, generating an overlapped mask matrix corresponding to each detection target based on the cross ratio matrix, the first parameter and the second parameter.
Thus, the overlapping mask matrix corresponding to the cross-ratio matrix of table 2 may be as shown in table 3.
Table 3 example mask matrix
True | True | False |
True | True | False |
False | False | True |
Corresponding to Table 3, each detection target (row) has a corresponding overlap mask, i.e. mask_0 = [True, True, False], mask_1 = [True, True, False], mask_2 = [False, False, True].
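With NumPy, steps S3-1-2 and S3-1-3 reduce to one elementwise comparison against the first preset threshold (0 in this embodiment). A sketch using the example IOU values of the FIG. 4 embodiment (the IOU_i rows listed later in Table 5):

```python
import numpy as np

# Example IOU matrix of the FIG. 4 embodiment: 3 detections x 3 real targets
iou_mat = np.array([[0.99, 0.32, 0.0],
                    [0.76, 0.44, 0.0],
                    [0.0,  0.0,  0.99]])

t1 = 0.0              # first preset threshold
mask = iou_mat > t1   # True = first parameter, False = second parameter
```

Each row of `mask` is the overlap mask of one detection target, reproducing Table 3.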
S3-2, generating an overlapped category list corresponding to the detection target based on the overlapped mask matrix.
After the overlapping mask matrix is obtained, an overlapping category list iou_class may be generated, and in this embodiment, generating, based on the overlapping mask matrix, an overlapping category list corresponding to the detection target includes:
s3-2-1, obtaining a real target class label.
According to fig. 4, Tclass = [Circle, Hexagon, Hexagon] (one class label per real target; consistent with the three-column masks of Table 4, two of the three real targets are hexagons).
S3-2-2, performing de-duplication on the real target class label and the overlapped mask matrix to obtain an overlapped class list corresponding to each detection target.
Let the real target classes be Tclass, with the class of the j-th real target written Tclass_j (0 <= j <= m-1). The list of classes of real targets overlapping the i-th predicted target is:

iou_class_i = unique(Tclass[mask_i]), (0 <= i <= n-1)

where Tclass[mask_i] is boolean-mask slicing: if the mask A = [True, False, False, True] and the matrix B = [0, 1, 2, 3], then B[A] = [0, 3].
Unique () is a deduplication function, and if matrix b= [0,0,1,3,2,3,4], unique (B) = [0,1,2,3,4].
In connection with FIG. 4, the iou_class is shown in Table 4.
TABLE 4. Overlapping category list

ID | i=0 | i=1 | i=2
mask_i | [True, True, False] | [True, True, False] | [False, False, True]
Tclass[mask_i] | [Circle, Hexagon] | [Circle, Hexagon] | [Hexagon]
unique(Tclass[mask_i]) | [Circle, Hexagon] | [Circle, Hexagon] | [Hexagon]
iou_class | [Circle, Hexagon] | [Circle, Hexagon] | [Hexagon]
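Steps S3-2-1 and S3-2-2 map directly onto NumPy boolean indexing and np.unique. In this sketch the three-element Tclass list is an assumption consistent with the masks of Table 4, and note that np.unique also sorts its output:

```python
import numpy as np

# Assumed real class labels of the three real targets (per Table 4)
tclass = np.array(["circle", "hexagon", "hexagon"])

# Overlap masks of the three detection targets (Table 3)
masks = np.array([[True, True, False],
                  [True, True, False],
                  [False, False, True]])

# Slice with each mask, then de-duplicate: one overlap class list per detection
iou_class = [np.unique(tclass[m]).tolist() for m in masks]
```

The result matches the iou_class row of Table 4: two detections overlap both a circle and a hexagon, the third only a hexagon.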
S3-3, determining the category label of the detection target according to the overlapped category list, and taking the category label as the final category label of the detection target. The method specifically comprises the following steps:
s3-3-1, obtaining the total number of categories of the overlapped category list of the detection target;
s3-3-2, if the total number of categories is smaller than or equal to a second preset threshold value, using the category labels in the overlapped category list as the category labels of the detection targets;
s3-3-3, if the total number of categories is larger than the second preset threshold value, determining the category label of the detection target from the overlapped category list by using an argmax function.
Calculate the optimal class of each predicted target. Denote the class label of the i-th predicted target as Pclass_i and its optimal label as class_refine_i; then

class_refine_i = iou_class_i[0], if len(iou_class_i) <= 1 (the second preset threshold, taken as 1 in this embodiment); otherwise class_refine_i = Tclass[argmax(IOU_i)]

where len() is the number of elements of a list, e.g. if A = [True, False, True], len(A) = 3; argmax() returns the index of the maximum element, e.g. if A = [4, 1, 2, 0, 3], argmax(A) = 0.
Based on the above, the prediction target optimal categories corresponding to table 4 are shown in table 5.
TABLE 5

ID | i=0 | i=1 | i=2
len(iou_class_i) | 2 | 2 | 1
IOU_i | [0.99, 0.32, 0.0] | [0.76, 0.44, 0.0] | [0.0, 0.0, 0.99]
argmax(IOU_i) | 0 | 0 | N/A
Pclass_i | Circle | Hexagon | Hexagon
class_refine_i | Circle | Circle | Hexagon
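A hedged sketch of step S3-3, taking the second preset threshold as 1 (an assumption consistent with Table 5); the fallback to the predicted label when the overlap list is empty is also an assumption:

```python
import numpy as np

def refine_class(iou_class_i, iou_row_i, tclass, pclass_i, t2=1):
    """Final label for detection i: keep the single overlapped class,
    otherwise take the class of the real frame with the largest IoU."""
    if len(iou_class_i) == 0:
        return pclass_i                       # no overlap: keep the prediction
    if len(iou_class_i) <= t2:
        return iou_class_i[0]                 # unambiguous overlap class
    return tclass[int(np.argmax(iou_row_i))]  # pick class of best-IoU real frame
```

This mirrors Table 5: detection 1 was predicted "hexagon", but argmax of its IOU row [0.76, 0.44, 0.0] points at real target 0 (a circle), so its label is refined to "circle".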
The class label obtained in step S3-3 is taken as the final class label of the detection target; the detection target confidence regression module, the frame regression module and the classification module are then trained with the sample image and the adjusted final class labels, realizing the training of the multi-target detection model.
During training of the multi-target model, the task is decoupled into bounding box regression, target confidence regression and multi-class classification.
As shown in fig. 5, assume that circles and hexagons are to be detected. The ideal case is shown in fig. 5 (a). When the detection frame of the circle is shifted (fig. 5 (b)), M6 still considers the detected object a circle, although the actual features are those of a hexagon. The situation is more pronounced when the centers of the objects are close (fig. 5 (c)). Once the feature spaces of the classes are confused, classification can still go wrong even if the frame regression and target confidence regression are normal.
In fig. 6, (a) shows the precision and recall curves of the baseline (master) model: the head mAP50 is 0.991, the hand mAP50 is 0.929, and the average is 0.960; (b) shows the results of training with the present scheme: the head mAP50 is 0.996, the hand mAP50 is 0.936, and the average is 0.966. Every index is improved to a certain extent.
Based on the same inventive concept, the embodiment of the invention also provides a model priori-based multi-target detection model training device, wherein the multi-target detection model is created based on a YOLO network architecture and comprises a model prediction decoder, a real information distribution module, an intersection ratio calculation module, a detection target confidence coefficient regression module, a frame regression module, a classification module and a model priori module; as shown in fig. 7, the model prior-based multi-objective detection model training apparatus of the present invention may include:
a target detection unit 710, configured to obtain at least one sample image for model training from a training sample set, and input the sample image into the multi-target detection model to obtain corresponding detection target information;
an evaluation unit 720, configured to evaluate the detection target information to obtain an evaluation result;
and an adjusting unit 730, configured to adjust the detection target information according to the evaluation result, and train the multi-target detection model by using the adjusted detection target information and the sample image.
In an alternative embodiment of the invention, the evaluation unit 720 may further be adapted to:
acquiring real target information of the sample image; the real target information comprises image coordinates of a real frame; the detection target information comprises image coordinates of a detection frame corresponding to the detection target;
and calculating the intersection ratio of each detection frame and each real frame based on the image coordinates of the real frame and the image coordinates of the detection frame by using the intersection ratio calculation module, so as to take the intersection ratio between the detection frame and the real frame as an evaluation result.
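The intersection ratio computation itself can be sketched minimally (assuming axis-aligned boxes given as (x1, y1, x2, y2) image coordinates with x2 > x1 and y2 > y1):

```python
def iou(a, b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))  # overlap width
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))  # overlap height
    inter = ix * iy
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

def iou_matrix(det_boxes, real_boxes):
    """One row per detection frame, one column per real frame."""
    return [[iou(d, r) for r in real_boxes] for d in det_boxes]
```

iou_matrix produces the detection-by-real matrix on which the subsequent sorting and masking steps operate.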
In an alternative embodiment of the present invention, the adjusting unit 730 may also be configured to:
generating an overlapping mask matrix based on the intersection ratio of each detection frame and each real frame of the sample image;
generating an overlapping category list corresponding to the detection target based on the overlapping mask matrix;
and determining the category label of the detection target according to the overlapped category list, and taking the category label as the final category label of the detection target.
In an alternative embodiment of the present invention, the adjusting unit 730 may also be configured to:
acquiring a detection target ID corresponding to each detection frame and a real target ID corresponding to each real frame, sequencing the intersection ratio of each detection frame and each real frame, and generating an intersection ratio matrix;
marking matrix elements with the cross-over ratio larger than a first preset threshold value in the cross-over ratio matrix as a first parameter, and marking matrix elements smaller than or equal to the first preset threshold value as a second parameter;
and generating an overlapping mask matrix corresponding to each detection target based on the cross ratio matrix, the first parameter and the second parameter.
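The thresholding step can be sketched as follows (the threshold of 0.5 and the marker values 1/0 are illustrative stand-ins for the first preset threshold and the first/second parameters):

```python
def overlap_mask(iou_mat, thresh=0.5, first=1, second=0):
    """Mark IoU entries greater than the first preset threshold with the
    first parameter and all other entries with the second parameter."""
    return [[first if v > thresh else second for v in row]
            for row in iou_mat]
```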
In an alternative embodiment of the present invention, the adjusting unit 730 may also be configured to:
acquiring a real target class label;
and de-duplicating the real target class label and the overlapped mask matrix to obtain an overlapped class list corresponding to each detection target.
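The de-duplication step can be sketched as follows (function name and inputs are illustrative):

```python
def overlap_class_list(mask_row, real_labels):
    """Select the real-target class labels picked out by one row of the
    overlap mask matrix and de-duplicate them, keeping first-seen order."""
    return list(dict.fromkeys(
        lab for m, lab in zip(mask_row, real_labels) if m))
```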
In an alternative embodiment of the present invention, the adjusting unit 730 may also be configured to:
acquiring the total number of categories of the overlapped category list of the detection target;
if the total number of categories is smaller than or equal to a second preset threshold value, the category label in the overlapped category list is used as the category label of the detection target;
and if the total number of categories is larger than the second preset threshold value, determining the category label of the detection target from the overlapped category list by using an argmax function.
The embodiment of the invention also provides a computer readable storage medium for storing program codes for executing the multi-target detection model training method based on model prior described in the above embodiment.
The embodiment of the invention also provides a computing device, which comprises a processor and a memory: the memory is used for storing program codes and transmitting the program codes to the processor; the processor is configured to execute the model prior-based multi-objective detection model training method according to the foregoing embodiment according to instructions in the program code.
It will be clear to those skilled in the art that the specific working processes of the above-described systems, devices, modules and units may refer to the corresponding processes in the foregoing method embodiments, and for brevity, the description is omitted here.
In addition, each functional unit in the embodiments of the present invention may be physically independent, two or more functional units may be integrated together, or all functional units may be integrated in one processing unit. The integrated functional units may be implemented in hardware or in software or firmware.
Those of ordinary skill in the art will appreciate that: the integrated functional units, if implemented in software and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in essence or in whole or in part in the form of a software product stored in a storage medium, comprising instructions for causing a computing device (e.g., a personal computer, a server, or a network device, etc.) to perform all or part of the steps of the methods described in the embodiments of the present invention when the instructions are executed. And the aforementioned storage medium includes: a USB flash drive, a removable hard disk, a read-only memory (ROM), a random-access memory (RAM), a magnetic disk, or an optical disk, etc.
Alternatively, all or part of the steps of implementing the foregoing method embodiments may be implemented by hardware (such as a personal computer, a server, or a computing device such as a network device) associated with program instructions, where the program instructions may be stored on a computer-readable storage medium, and where the program instructions, when executed by a processor of the computing device, perform all or part of the steps of the method according to the embodiments of the present invention.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present invention, and not for limiting the same; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some or all technical features thereof can be replaced by others within the spirit and principle of the present invention; such modifications and substitutions do not depart from the scope of the invention.
Claims (10)
1. A model prior-based multi-objective detection model training method, the method comprising:
acquiring at least one sample image for model training from a training sample set, and inputting the sample image into the multi-target detection model to obtain corresponding detection target information;
evaluating the detection target information to obtain an evaluation result; the evaluation result comprises the intersection ratio between a detection frame of the detection target and a real frame of the real target in the sample image;
adjusting the detection target information according to the evaluation result, and training the multi-target detection model by using the adjusted detection target information and the sample image;
the adjusting the detection target information according to the evaluation result includes:
generating an overlapping mask matrix based on the intersection ratio of each detection frame and each real frame of the sample image; generating an overlapping category list corresponding to the detection target based on the overlapping mask matrix; determining a category label of the detection target according to the overlapped category list, and taking the category label as a final category label of the detection target;
wherein generating the overlap mask matrix based on the intersection ratio of each detection frame and each real frame comprises: acquiring a detection target ID corresponding to each detection frame and a real target ID corresponding to each real frame, sequencing the intersection ratio of each detection frame and each real frame, and generating an intersection ratio matrix; marking matrix elements with the cross-over ratio larger than a first preset threshold value in the cross-over ratio matrix as a first parameter, and marking matrix elements smaller than or equal to the first preset threshold value as a second parameter; generating an overlapping mask matrix corresponding to each detection target based on the intersection ratio matrix, the first parameter and the second parameter;
the generating the overlapping category list corresponding to the detection target based on the overlapping mask matrix comprises the following steps: acquiring a real target class label; and de-duplicating the real target class label and the overlapped mask matrix to obtain an overlapped class list corresponding to each detection target.
2. The method according to claim 1, wherein evaluating the detection target information to obtain an evaluation result includes:
acquiring real target information of the sample image; the real target information comprises image coordinates of a real frame; the detection target information comprises image coordinates of a detection frame corresponding to the detection target;
and calculating the intersection ratio of each detection frame and each real frame based on the image coordinates of the real frame and the image coordinates of the detection frame, and taking the intersection ratio between the detection frame and the real frame as an evaluation result.
3. The method of claim 1, wherein the determining the category label of the detection target from the overlapping category list comprises:
acquiring the total number of categories of the overlapped category list of the detection target;
if the total number of categories is smaller than or equal to a second preset threshold value, the category label in the overlapped category list is used as the category label of the detection target;
and if the total number of categories is larger than the second preset threshold value, determining the category label of the detection target from the overlapped category list by using an argmax function.
4. A multi-target detection model, characterized in that the multi-target detection model is trained based on the model prior-based multi-target detection model training method according to any one of claims 1-3; the multi-target detection model is created based on a YOLO network architecture and comprises a backbone network structure and a loss function structure for image feature learning; the loss function structure includes: the system comprises a model prediction decoder, a real information distribution module, an intersection ratio calculation module, a detection target confidence coefficient regression module, a frame regression module, a classification module and a model priori module.
5. The multi-target detection model of claim 4, wherein the model prediction decoder is coupled to the intersection ratio calculation module, the detection target confidence coefficient regression module, and the classification module, respectively;
the intersection ratio calculation module is further coupled with the detection target confidence coefficient regression module, the frame regression module and the model prior module; the real information distribution module is respectively coupled with the intersection ratio calculation module and the model prior module; the model prior module is also coupled to the classification module.
6. The multi-target detection model of claim 4, wherein,
the model prediction decoder is used for outputting detection target information of the image;
the real information distribution module is used for outputting real target information of the image;
the intersection ratio calculation module is used for calculating the intersection ratio between the detection frame and the real frame;
and the model prior module is used for adjusting the detection target information according to the outputs of the real information distribution module and the intersection ratio calculation module.
7. A model prior-based multi-objective detection model training apparatus, the apparatus comprising:
the target detection unit is used for acquiring at least one sample image for model training from a training sample set, and inputting the sample image into the multi-target detection model to obtain corresponding detection target information;
the evaluation unit is used for evaluating the detection target information to obtain an evaluation result; the evaluation result comprises the intersection ratio between a detection frame of the detection target and a real frame of the real target in the sample image;
the training unit is used for adjusting the detection target information according to the evaluation result and training the multi-target detection model by utilizing the adjusted detection target information and the sample image;
the training unit is further configured to: generating an overlapping mask matrix based on the intersection ratio of each detection frame and each real frame of the sample image; generating an overlapping category list corresponding to the detection target based on the overlapping mask matrix; determining a category label of the detection target according to the overlapped category list, and taking the category label as a final category label of the detection target;
wherein generating the overlap mask matrix based on the intersection ratio of each detection frame and each real frame comprises: acquiring a detection target ID corresponding to each detection frame and a real target ID corresponding to each real frame, sequencing the intersection ratio of each detection frame and each real frame, and generating an intersection ratio matrix; marking matrix elements with the cross-over ratio larger than a first preset threshold value in the cross-over ratio matrix as a first parameter, and marking matrix elements smaller than or equal to the first preset threshold value as a second parameter; generating an overlapping mask matrix corresponding to each detection target based on the intersection ratio matrix, the first parameter and the second parameter;
the generating the overlapping category list corresponding to the detection target based on the overlapping mask matrix comprises the following steps: acquiring a real target class label; and de-duplicating the real target class label and the overlapped mask matrix to obtain an overlapped class list corresponding to each detection target.
8. A computer readable storage medium, characterized in that the computer readable storage medium is for storing a program code for performing the method of any one of claims 1-3.
9. A computing device, the computing device comprising a processor and a memory:
the memory is used for storing program codes and transmitting the program codes to the processor;
the processor is configured to perform the method of any of claims 1-3 according to instructions in the program code.
10. An image capturing apparatus employing the multi-target detection model according to any one of claims 4 to 6, or performing the model-prior-based multi-target detection model training method according to any one of claims 1 to 3, or comprising the model-prior-based multi-target detection model training device according to claim 7, or having the computer-readable storage medium according to claim 8.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211212592.4A CN115482417B (en) | 2022-09-29 | 2022-09-29 | Multi-target detection model, training method, device, medium and equipment thereof |
Publications (2)
Publication Number | Publication Date |
---|---|
CN115482417A CN115482417A (en) | 2022-12-16 |
CN115482417B true CN115482417B (en) | 2023-08-08 |
Family
ID=84394825
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202211212592.4A Active CN115482417B (en) | 2022-09-29 | 2022-09-29 | Multi-target detection model, training method, device, medium and equipment thereof |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115482417B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115908790B (en) * | 2022-12-28 | 2024-07-26 | 北京斯年智驾科技有限公司 | Method and device for detecting target detection center point offset and electronic equipment |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107808122A (en) * | 2017-09-30 | 2018-03-16 | 中国科学院长春光学精密机械与物理研究所 | Method for tracking target and device |
CN111241947A (en) * | 2019-12-31 | 2020-06-05 | 深圳奇迹智慧网络有限公司 | Training method and device of target detection model, storage medium and computer equipment |
CN113239982A (en) * | 2021-04-23 | 2021-08-10 | 北京旷视科技有限公司 | Training method of detection model, target detection method, device and electronic system |
CN114462469A (en) * | 2021-12-20 | 2022-05-10 | 浙江大华技术股份有限公司 | Training method of target detection model, target detection method and related device |
CN114764778A (en) * | 2021-01-14 | 2022-07-19 | 北京图森智途科技有限公司 | Target detection method, target detection model training method and related equipment |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109858569A (en) * | 2019-03-07 | 2019-06-07 | 中国科学院自动化研究所 | Multi-tag object detecting method, system, device based on target detection network |
KR20210050129A (en) * | 2019-10-28 | 2021-05-07 | 삼성에스디에스 주식회사 | Machine learning apparatus and method for object detection |
JP2022091270A (en) * | 2020-12-09 | 2022-06-21 | ブラザー工業株式会社 | Method, system, and computer program |
Non-Patent Citations (1)
Title |
---|
Cheng Lufei. Research on multi-target detection and classification algorithms based on deep learning. Wanfang Database. 2022, pp. 8-53. *
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||