WO2023071121A1

WO2023071121A1 - Multi-model fusion-based object detection method and apparatus, device and medium

Info

Publication number: WO2023071121A1
Application number: PCT/CN2022/090233
Authority: WO
Inventors: 金良; 李仁刚; 赵雅倩; 郭振华; 范宝余; 徐聪; 胡克坤
Original assignee: 苏州浪潮智能科技有限公司
Priority date: 2021-10-26
Filing date: 2022-04-29
Publication date: 2023-05-04
Also published as: CN113688957A

Abstract

A multi-model fusion-based object detection method and apparatus, a device and a medium, the method comprising: acquiring a plurality of trained object detection models, and acquiring a set of images to be detected obtained after image enhancement processing is performed on original images to be detected (S11); respectively using the object detection models to detect images to be detected in said set of images, so as to obtain a plurality of groups of initial object detection results corresponding to the plurality of object detection models (S12); respectively weighting all the initial object detection results in each group of initial object detection results, so as to obtain primary weighted object detection results respectively corresponding to the object detection models (S13); and weighting, on the basis of the weight of each object detection model determined in advance by using a validation set, the plurality of primary weighted object detection results corresponding to the plurality of object detection models, so as to obtain final object detection results corresponding to said original images (S14). The diversity of the models is fully used, thereby improving the precision of object detection.

Description

A target detection method, device, equipment and medium based on multi-model fusion

This application claims priority to Chinese patent application No. 202111244219.2, filed with the China Patent Office on October 26, 2021 and entitled "a target detection method, device, equipment and medium based on multi-model fusion", the entire contents of which are incorporated herein by reference.

Technical field

The present application relates to the technical field of image processing, in particular to a method, device, equipment and medium for target detection based on multi-model fusion.

Background

At present, with the vigorous development of big data and artificial intelligence, computer vision based on deep learning is widely used in fields such as visual navigation of autonomous vehicles, medical image analysis, and face recognition. Object detection is an important branch of computer vision and the first step in visual perception. It must determine not only what is in the image, but also where each object is located in the image.

Currently, a large number of network models and algorithms exist for target detection. A single algorithm usually cannot deliver sufficiently accurate results, so multiple learning algorithms need to be combined to obtain a more accurate result. In the field of target detection, the commonly used model integration algorithms are the multi-model direct averaging method, single-model multiple-snapshot integration (Snapshot Ensemble) average fusion, and the AABBFI algorithm (Axis-Aligned Bounding Box Fuzzy Integral). Each of the three has a limitation. The multi-model direct averaging method directly averages the outputs of multiple models without considering the differences between them, which limits detection accuracy. Single-model multiple-snapshot average fusion integrates snapshots of a single network model and does not consider the differences between different models, so model diversity is insufficient. The AABBFI algorithm only fuses the positions of the detection boxes (i.e., bounding boxes) in the detection results of a single model and considers neither the other factors within that model nor other models, so the accuracy improvement it offers is limited.

To sum up, how to improve the accuracy of object detection is a problem to be solved in this field.

Summary of the invention

In view of this, the purpose of the present application is to provide a target detection method, device, equipment and medium based on multi-model fusion, which can improve the accuracy of target detection. The specific solution is as follows:

In the first aspect, the present application discloses a target detection method based on multi-model fusion, including:

Obtain multiple target detection models that have been trained, and obtain the image set to be detected after performing image enhancement processing on the original image to be detected;

Using each of the target detection models to detect the images to be detected in the set of images to be detected to obtain multiple sets of initial target detection results corresponding to multiple target detection models;

Weighting all the initial target detection results in each group of the initial target detection results respectively, so as to obtain the initial weighted target detection results corresponding to each of the target detection models;

Based on the weight of each of the target detection models determined in advance using the verification set, weighting the multiple initial weighted target detection results corresponding to the multiple target detection models, so as to obtain the final target detection result corresponding to the original image to be detected.

Optionally, before acquiring the trained multiple target detection models, it also includes:

A plurality of target detection models to be trained are screened out by using screening conditions constructed based on model structure differences; wherein, the model structure differences between different target detection models to be trained all meet preset difference conditions;

Using the training set obtained after performing image enhancement on the historical original image set, the multiple target detection models to be trained are trained to obtain multiple trained target detection models.

Optionally, the acquisition of the image set to be detected obtained after performing image enhancement processing on the original image to be detected includes:

Determine an image enhancement algorithm corresponding to each of the target detection models according to the model category of each of the target detection models;

Using the image enhancement algorithm corresponding to each of the target detection models, corresponding image enhancement processing is performed on the original image to be detected, so as to obtain a set of images to be detected corresponding to each of the target detection models.

Optionally, weighting all initial target detection results in any set of initial target detection results includes:

clustering all the initial target detection results in any group of the initial target detection results to obtain a first clustering result corresponding to the group of the initial target detection results;

According to the first clustering result, and based on the axis-aligned bounding box fuzzy integral algorithm and the non-maximum suppression algorithm, determine the weight of each of the initial target detection results;

Based on the weight of each of the initial target detection results, weighting is performed on all the initial target detection results in the group of the initial target detection results, so as to obtain an initial weighted target detection result corresponding to the target detection model.

Optionally, the described target detection method based on multi-model fusion also includes:

Based on the verification set, determine the mean average precision evaluation index corresponding to each of the target detection models that have been trained;

Summing the mean average precision evaluation indexes corresponding to all the target detection models to obtain the corresponding evaluation index sum;

The weight of each target detection model is determined based on the mean average precision evaluation index corresponding to each target detection model and the evaluation index sum.

Optionally, based on the verification set, determine the mean average precision evaluation index corresponding to any of the trained target detection models, including:

Using the trained target detection model to predict each image in the verification set, so as to obtain a prediction result output by the target detection model corresponding to each image in the verification set;

Based on the difference between the prediction result and the corresponding real labeling result in the verification set, the mean average precision evaluation index of the target detection model is determined.

Optionally, using the trained target detection model to predict each image in the verification set includes:

Carry out image enhancement to each image in the verification set respectively, and obtain a plurality of enhanced images corresponding to each image in the verification set;

Using the trained target detection model to predict a plurality of enhanced images corresponding to each image in the verification set, so as to obtain a plurality of initial prediction results corresponding to each image in the verification set;

Weighting is performed on multiple initial prediction results corresponding to each image in the verification set, so as to obtain a prediction result corresponding to each image in the verification set.

In the second aspect, the present application discloses a target detection device based on multi-model fusion, including:

The obtaining module is used to obtain a plurality of target detection models that have been trained, and obtain the image set to be detected after performing image enhancement processing on the original image to be detected;

An image detection module, configured to use each of the target detection models to detect the images to be detected in the set of images to be detected to obtain multiple sets of initial target detection results corresponding to multiple target detection models;

A single-model weighting module, configured to weight all the initial target detection results in each group of the initial target detection results, so as to obtain the initial weighted target detection results corresponding to each of the target detection models;

A multi-model weighting module, configured to weight the multiple initial weighted target detection results corresponding to the multiple target detection models based on the weight of each target detection model determined in advance using the verification set, so as to obtain the final target detection result corresponding to the original image to be detected.

In a third aspect, the present application discloses an electronic device, including a processor and a memory; wherein, when the processor executes the computer program stored in the memory, the aforementioned multi-model fusion-based target detection method is implemented.

In a fourth aspect, the present application discloses a computer-readable storage medium for storing a computer program; wherein, when the computer program is executed by a processor, the aforementioned multi-model fusion-based target detection method is implemented.

It can be seen that the present application first obtains a plurality of trained target detection models and obtains the image set to be detected after image enhancement processing is performed on the original image to be detected. Each of the target detection models is then used to detect the images to be detected in the image set, yielding multiple sets of initial target detection results corresponding to the multiple target detection models. All the initial target detection results in each set are then weighted to obtain the initial weighted target detection result corresponding to each target detection model. Finally, based on the weight of each target detection model determined in advance using the verification set, the multiple initial weighted target detection results corresponding to the multiple target detection models are weighted to obtain the final target detection result corresponding to the original image to be detected. By weighting all the initial target detection results within each group, the present application reduces the differences in single-model detection results caused by image enhancement processing of the original image and improves the robustness of the model. By assigning different weights to different models based on the verification set, it makes full use of model diversity and improves the accuracy of target detection.

Description of the drawings

In order to illustrate the technical solutions in the embodiments of the present application or the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below are only embodiments of the present application, and those skilled in the art can obtain other drawings from the provided drawings without creative work.

Fig. 1 is a flow chart of a target detection method based on multi-model fusion disclosed in the present application;

FIG. 2 is a flow chart of a specific multi-model-based weight calculation method disclosed in the present application;

Fig. 3 is a schematic diagram of a target detection structure of a single model and a single image disclosed in the present application;

FIG. 4 is a flow chart of a specific multi-model fusion-based target detection method disclosed in the present application;

FIG. 5 is a schematic structural diagram of a target detection device based on multi-model fusion disclosed in the present application;

FIG. 6 is a structural diagram of an electronic device disclosed in the present application.

Detailed description

The technical solutions in the embodiments of the application are described clearly and completely below with reference to the drawings in the embodiments of the application. Obviously, the described embodiments are only some of the embodiments of the application, not all of them. Based on the embodiments in this application, all other embodiments obtained by persons of ordinary skill in the art without creative effort fall within the scope of protection of this application.

The embodiment of the present application discloses a target detection method based on multi-model fusion, as shown in Figure 1, the method includes:

Step S11: Obtain a plurality of trained target detection models, and obtain a set of images to be detected obtained after performing image enhancement processing on the original images to be detected.

In this embodiment, multiple target detection models obtained by training multiple target detection models to be trained in advance are acquired first. The image enhancement algorithms corresponding to the trained target detection models are then used to perform image enhancement processing on the original image to be detected, so as to obtain the image set to be detected. The target detection models include, but are not limited to, three types of models: One-stage, Two-stage, and Anchor-free. One-stage models include, but are not limited to, RetinaNet and YOLOV2/V3 (YOLO, You Only Look Once); Two-stage models include, but are not limited to, Faster R-CNN; Anchor-free models include, but are not limited to, CornerNet, ExtremeNet, CenterNet, and FCOS (Fully Convolutional One-Stage Object Detection).

In this embodiment, before the multiple trained target detection models are acquired, the method may specifically include: screening out multiple target detection models to be trained by using screening conditions constructed based on model structure differences, wherein the model structure differences between the different target detection models to be trained all meet preset difference conditions; and training the multiple target detection models to be trained by using the training set obtained after image enhancement is performed on the historical original image set, so as to obtain the multiple trained target detection models. It can be understood that, in order to improve the detection results after the fusion of multiple target detection models and to ensure model diversity, multiple initial target detection models can be screened before training according to the rule that the model structure differences must satisfy the preset difference conditions, yielding the multiple target detection models to be trained. For example, five target detection models, YOLOV3, RetinaNet, Faster R-CNN, CenterNet and FCOS, are selected from the above three types of models, namely One-stage, Two-stage and Anchor-free.

Optionally, after the multiple target detection models to be trained have been selected, image enhancement processing can be performed on the historical original images to obtain a training set, which improves the generalization ability of the models and the diversity of the samples. The training set is then used to train the screened target detection models to be trained, so as to obtain the multiple trained target detection models.

In this embodiment, acquiring the image set to be detected obtained after performing image enhancement processing on the original image to be detected may specifically include: determining the image enhancement algorithm corresponding to each target detection model according to the model category of that target detection model; and using the image enhancement algorithm corresponding to each target detection model to perform the corresponding image enhancement processing on the original image to be detected, so as to obtain a set of images to be detected corresponding to each target detection model. The image enhancement algorithms include, but are not limited to, algorithms based on geometric transformation and on color transformation. Image enhancement algorithms based on geometric transformation include, but are not limited to, random cropping (Random Cropping), random expansion (Random Expansion), random horizontal flip (Random Horizontal Flip), and random resizing (Random Resize); image enhancement algorithms based on color transformation include, but are not limited to, color jittering and Fancy PCA (PCA, Principal Component Analysis). Before the multiple target detection models to be trained are trained, target image enhancement algorithms also need to be screened out from the above image enhancement algorithms and added to the multiple target detection models to be trained as a data preprocessing stage.

It should be pointed out that, when the target image enhancement algorithms are added to the multiple target detection models to be trained, they should be added selectively according to the functions of each target detection model, and any image enhancement algorithm whose function duplicates one already present in the model should be removed. For example, the selected image enhancement algorithm Random Cropping is removed for Two-stage models, because the RPN (Region Proposal Network) in a Two-stage model has a function similar to Random Cropping. It can be understood that the selected target image enhancement algorithms have already been added to the corresponding target detection model before that model is trained. Therefore, the image enhancement algorithm corresponding to each target detection model can be determined according to its model category, and that algorithm can be used to perform the corresponding image enhancement processing on the original image to be detected, so as to obtain the set of images to be detected corresponding to each target detection model.
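The selective assignment described above can be sketched as a lookup keyed by model category. This is only an illustration: the function name, the category keys, and the augmentation identifiers are hypothetical, and only the pairing of Two-stage models with Random Cropping comes from the text.

```python
# Hypothetical identifiers; the text lists these augmentations but defines no API.
GEOMETRIC = ["random_cropping", "random_expansion",
             "random_horizontal_flip", "random_resize"]
COLOR = ["color_jitter", "fancy_pca"]

# Augmentations whose function a model category already covers internally.
# Only the two-stage / Random Cropping entry is stated in the text.
REDUNDANT = {"two_stage": {"random_cropping"}}


def augmentations_for(category):
    """Return the augmentation names to apply for one model category."""
    skip = REDUNDANT.get(category, set())
    return [name for name in GEOMETRIC + COLOR if name not in skip]
```

Because the same lookup is used at training time and at detection time, each model sees the image set produced by the augmentations it was trained with.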

Step S12: Using each of the target detection models to detect the images to be detected in the set of images to be detected to obtain multiple sets of initial target detection results corresponding to the plurality of target detection models.

In this embodiment, after the multiple trained target detection models and the image set to be detected (obtained by performing image enhancement processing on the original image to be detected) have been acquired, each target detection model is used to detect all the images to be detected in the image set, and the multiple sets of initial target detection results output by the respective target detection models are obtained. The number of results in each group of initial target detection results equals the number of images in the image set to be detected after image enhancement processing.
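A minimal sketch of this step, assuming each model is a callable that maps one image to a list of boxes. The type aliases, the function name `detect_all`, and the dictionary layout are assumptions made for illustration, not part of the application.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Box = Tuple[float, float, float, float, float]  # x1, y1, x2, y2, score (assumed layout)
Detector = Callable[[object], List[Box]]


def detect_all(models: Dict[str, Detector],
               image_sets: Dict[str, Sequence[object]]) -> Dict[str, List[List[Box]]]:
    """Run every model over its own augmented image set.

    Returns one group of initial results per model; each group holds one
    result list per augmented image, so group size equals image-set size.
    """
    groups = {}
    for name, model in models.items():
        groups[name] = [model(image) for image in image_sets[name]]
    return groups
```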

Step S13: weighting all the initial target detection results in each group of the initial target detection results, so as to obtain the initial weighted target detection results corresponding to each of the target detection models.

In this embodiment, after each of the target detection models has been used to detect the images to be detected in the image set, yielding multiple sets of initial target detection results corresponding to the multiple target detection models, all the initial target detection results corresponding to each target detection model need to be weighted. In other words, the detection results within a single model are weighted to obtain the initial weighted target detection result corresponding to each target detection model.

Step S14: Based on the weight of each target detection model determined in advance using the verification set, weight the multiple initial weighted target detection results corresponding to the multiple target detection models, so as to obtain the final target detection result corresponding to the original image to be detected.

In this embodiment, after all the initial target detection results in each group have been weighted to obtain the initial weighted target detection result corresponding to each target detection model, the weight of each target detection model, determined in advance by evaluating each trained target detection model on the verification set, is obtained. These weights are then used to weight the multiple initial weighted target detection results corresponding to the multiple target detection models, that is, different weight values are assigned to the detection results of the different target detection models, and the final target detection result corresponding to the original image to be detected is obtained.
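One possible reading of this second weighting stage is a weighted average of one box per model. The sketch below assumes that the per-model results have already been associated with the same object and that weighting means averaging coordinates and scores; the text specifies neither, and the function name and data layout are hypothetical.

```python
def fuse_across_models(per_model_boxes, model_weights):
    """Weighted average of one matched box per model.

    per_model_boxes: {model_name: (x1, y1, x2, y2, score)}
    model_weights:   {model_name: weight}, e.g. derived on the verification set
    """
    total = sum(model_weights[name] for name in per_model_boxes)
    if total <= 0:
        raise ValueError("model weights must sum to a positive value")
    return tuple(
        sum(model_weights[name] * box[i] for name, box in per_model_boxes.items()) / total
        for i in range(5)
    )
```

Dividing by the weight total keeps the result well defined even when only a subset of the models produced a box for the object.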

It can be seen that, in the embodiment of the present application, a plurality of trained target detection models are obtained first, together with the image set to be detected obtained after image enhancement processing is performed on the original image to be detected. Each of the target detection models is then used to detect the images to be detected in the image set, yielding multiple sets of initial target detection results corresponding to the multiple target detection models. All the initial target detection results in each set are then weighted to obtain the initial weighted target detection result corresponding to each target detection model. Finally, based on the weight of each target detection model determined in advance using the verification set, the multiple initial weighted target detection results corresponding to the multiple target detection models are weighted to obtain the final target detection result corresponding to the original image to be detected. By weighting all the initial target detection results within each group, the embodiment reduces the differences in single-model detection results caused by image enhancement processing of the original image and improves the robustness of the model. By assigning different weights to different models based on the verification set, it makes full use of model diversity and improves the accuracy of target detection.

The embodiment of the present application discloses a specific target detection method based on multi-model fusion, as shown in Figure 2, the method includes:

Step S21: Obtain a plurality of trained target detection models, and obtain an image set to be detected obtained after performing image enhancement processing on the original image to be detected.

Step S22: Using each of the target detection models to detect the images to be detected in the set of images to be detected to obtain multiple sets of initial target detection results corresponding to the multiple target detection models.

Step S23: Clustering all the initial object detection results in any group of the initial object detection results to obtain a first clustering result corresponding to the group of the initial object detection results.

In this embodiment, after each of the target detection models has been used to detect the images to be detected in the image set, yielding multiple sets of initial target detection results corresponding to the multiple target detection models, all the initial target detection results in any one of these groups are clustered based on a preset clustering algorithm, so as to obtain the first clustering result corresponding to that group of initial target detection results. The preset clustering algorithms include, but are not limited to, the K-means algorithm (K-means Clustering Algorithm).
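As an illustration of the clustering step, the sketch below runs plain K-means on box centers. The choice of box centers as features, the deterministic initialisation from the first k boxes, and the way k is supplied are all assumptions; the text names K-means only as one possible algorithm and gives no further detail.

```python
def box_center(box):
    """Center point of a box given as (x1, y1, x2, y2, ...)."""
    x1, y1, x2, y2 = box[:4]
    return ((x1 + x2) / 2.0, (y1 + y2) / 2.0)


def kmeans_boxes(boxes, k, iterations=20):
    """Cluster boxes by center with plain k-means; returns one label per box."""
    if not 1 <= k <= len(boxes):
        raise ValueError("k must be between 1 and the number of boxes")
    centers = [box_center(b) for b in boxes]
    centroids = centers[:k]  # simplistic initialisation, chosen for reproducibility
    labels = [0] * len(boxes)
    for _ in range(iterations):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [
            min(range(k), key=lambda j: (cx - centroids[j][0]) ** 2
                                        + (cy - centroids[j][1]) ** 2)
            for cx, cy in centers
        ]
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [c for c, lab in zip(centers, labels) if lab == j]
            if members:
                centroids[j] = (sum(x for x, _ in members) / len(members),
                                sum(y for _, y in members) / len(members))
    return labels
```

With this initialisation the result depends on the order of the input boxes, so a production implementation would normally use a more robust seeding scheme.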

Step S24: According to the first clustering result, and based on the axis-aligned bounding box fuzzy integral algorithm and the non-maximum suppression algorithm, determine the weight of each of the initial target detection results.

In this embodiment, after all the initial target detection results in any group have been clustered to obtain the first clustering result corresponding to that group, the non-maximum values in the first clustering result are suppressed based on the non-maximum suppression (NMS, Non Maximum Suppression) algorithm to obtain a first clustering suppression result, and the first clustering suppression result is then processed based on the axis-aligned bounding box fuzzy integral algorithm to obtain the weight corresponding to each initial target detection result. The specific process of suppressing the non-maximum values in the first clustering result based on the non-maximum suppression algorithm is as follows: obtain the index corresponding to the maximum value in the first clustering result; remove the clustering result at that index from the first clustering result to obtain the target clustering result; calculate the intersection-over-union ratio between each detection box in the target clustering result and the detection box at the index corresponding to the maximum value; and determine the weight corresponding to each initial target detection result according to the intersection-over-union ratio.
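The intersection-over-union part of this procedure can be sketched as follows. The sketch assumes that the "maximum value" refers to the highest confidence score in a cluster and that boxes are stored as (x1, y1, x2, y2, score); neither is stated explicitly. How the resulting ratios are turned into weights is left out because the text does not define that mapping.

```python
def iou(a, b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2, ...)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def iou_with_top_box(cluster):
    """Find the highest-scoring box and the IoU of every other box with it.

    Returns (top_index, {other_index: iou_value}).
    """
    top = max(range(len(cluster)), key=lambda i: cluster[i][4])
    others = {i: iou(cluster[i], cluster[top])
              for i in range(len(cluster)) if i != top}
    return top, others
```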

Step S25: Based on the weight of each of the initial target detection results, weight all the initial target detection results in the group of the initial target detection results, so as to obtain the initial weighted target detection result corresponding to the target detection model.

In this embodiment, after the weight of each initial target detection result has been determined according to the first clustering result and based on the axis-aligned bounding box fuzzy integral algorithm and the non-maximum suppression algorithm, these weights are used to weight all the initial target detection results in the group, so as to obtain the initial weighted target detection result corresponding to the target detection model. Specifically, the process of obtaining the initial weighted target detection result corresponding to the target detection model is mainly based on the AABBFI algorithm, whose implementation is shown in Figure 3. First, data enhancement processing is performed on the single input original image to be detected to obtain the image set to be detected. The single trained target detection model then performs inference on the images in the image set based on the AABB (Axis-Aligned Bounding Box) algorithm to obtain multiple inference results. Finally, a fuzzy operation based on the FI (Fuzzy Integral) algorithm is performed on the inference results to obtain the target detection result.
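The three stages of Figure 3 (enhance, infer, fuse) can be sketched as a small pipeline with pluggable callables. The function and parameter names are hypothetical, and the fusion step is passed in as a parameter because the text does not spell out the fuzzy integral computation.

```python
def single_model_result(image, augmentations, detector, fuse):
    """Enhance one image, run one model on every variant, then fuse.

    augmentations: callables image -> image (the data enhancement stage)
    detector:      callable image -> inference result (the trained model)
    fuse:          callable list-of-results -> fused result (stands in for
                   the fuzzy-integral stage, which is not detailed here)
    """
    image_set = [augment(image) for augment in augmentations]
    inferences = [detector(variant) for variant in image_set]
    return fuse(inferences)
```

Keeping the fusion step as an argument lets the same skeleton be exercised with a simple stand-in, such as an average, before a fuzzy-integral implementation is available.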

Step S26: Based on the weight of each target detection model determined in advance using the verification set, weight the multiple initial weighted target detection results corresponding to the multiple target detection models, so as to obtain the final target detection result corresponding to the original image to be detected.

In this embodiment, after all the initial target detection results in the group have been weighted based on the weight of each initial target detection result to obtain the initial weighted target detection result corresponding to the target detection model, the weight value corresponding to each target detection model, determined in advance based on the verification set, is obtained. These model weight values are then used to weight the multiple initial weighted target detection results corresponding to the multiple target detection models, so as to obtain the final target detection result corresponding to the original image to be detected.

In this embodiment, as shown in FIG. 4, the target detection method based on multi-model fusion may specifically include:

Step S31: Determine the mean average precision evaluation index corresponding to each of the trained target detection models based on the verification set;

Step S32: Sum the mean average precision evaluation indexes corresponding to all the target detection models to obtain the corresponding evaluation index sum;

Step S33: Determine the weight of each target detection model based on the mean average precision evaluation index corresponding to each target detection model and the evaluation index sum.

In this embodiment, before the multiple trained target detection models are obtained, the verification set can be used to determine, for each target detection model, the index that measures the accuracy of target detection and recognition, namely the mean average precision evaluation index (mAP, mean Average Precision). The mean average precision evaluation indexes corresponding to all the target detection models are then summed to obtain the corresponding evaluation index sum. The weight corresponding to each target detection model can then be determined as the ratio of that model's mean average precision evaluation index to the evaluation index sum.
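The ratio described in steps S31 to S33 amounts to normalising the mAP values so that the model weights sum to one. A minimal sketch, with a hypothetical function name and with mAP values supplied by the caller:

```python
def model_weights(map_scores):
    """Turn per-model mAP values into weights: weight = mAP / sum of all mAPs.

    map_scores: {model_name: mAP measured on the verification set}
    """
    total = sum(map_scores.values())
    if total <= 0:
        raise ValueError("the sum of the mAP values must be positive")
    return {name: score / total for name, score in map_scores.items()}
```

A model with a higher mAP on the verification set therefore contributes proportionally more to the fused result.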

In this embodiment, determining, based on the verification set, the mean average precision evaluation index corresponding to any of the trained target detection models may specifically include: using the trained target detection model to predict each image in the verification set, so as to obtain the prediction result, output by the target detection model, corresponding to each image in the verification set; and determining the mean average precision evaluation index of the target detection model based on the difference between the prediction result and the corresponding real labeling result in the verification set. That is, the trained target detection model is used to predict each image in the verification set to obtain the prediction result corresponding to each image, and the mean average precision evaluation index corresponding to the target detection model is then determined from the degree of difference between the prediction result and the corresponding real labeling result (i.e., Ground Truth) in the verification set.
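The application does not specify how the mean average precision is computed from the predictions and the Ground Truth. The following Python sketch shows one common choice, all-point interpolated average precision per class averaged over classes. It assumes that each prediction has already been marked as a true or false positive, for example by an IoU threshold against the Ground Truth. The function names and the input layout are hypothetical.

```python
from typing import Dict, List, Tuple

Ranked = List[Tuple[float, bool]]   # (confidence, is_true_positive)


def average_precision(preds: Ranked, num_gt: int) -> float:
    """All-point interpolated AP for one class (assumed metric variant)."""
    if num_gt == 0:
        return 0.0
    ranked = sorted(preds, key=lambda p: p[0], reverse=True)
    tp = 0
    precisions: List[float] = []
    recalls: List[float] = []
    for rank, (_, is_tp) in enumerate(ranked, start=1):
        tp += int(is_tp)
        precisions.append(tp / rank)
        recalls.append(tp / num_gt)
    # Replace each precision by the best precision at equal or higher recall.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    # Area under the resulting precision-recall curve.
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


def mean_average_precision(per_class: Dict[str, Tuple[Ranked, int]]) -> float:
    """Mean of the per-class AP values."""
    aps = [average_precision(preds, n) for preds, n in per_class.values()]
    return sum(aps) / len(aps) if aps else 0.0
```

Other mAP variants, such as averaging over several IoU thresholds, would change the numbers but not the way the resulting index is used to weight the models.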

Specifically, using the trained target detection model to predict each image in the verification set may include: performing image enhancement on each image in the verification set to obtain a plurality of enhanced images corresponding to each image in the verification set; using the trained target detection model to predict the plurality of enhanced images corresponding to each image in the verification set, so as to obtain a plurality of initial prediction results corresponding to each image in the verification set; and weighting the plurality of initial prediction results corresponding to each image in the verification set, so as to obtain the prediction result corresponding to each image in the verification set. In this embodiment, when the trained target detection model is used to predict each image in the verification set, image enhancement processing is first performed on each image in the verification set to obtain the multiple enhanced images corresponding to that image. The trained target detection model then predicts these enhanced images to obtain multiple initial prediction results for each image in the verification set. Finally, the multiple initial prediction results corresponding to each image are weighted based on the axis-aligned bounding box fuzzy integration algorithm and the non-maximum suppression algorithm, to obtain the prediction result corresponding to each image in the verification set.
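As an illustration of predicting on enhanced images, the following Python sketch assumes a single enhancement, horizontal flipping, and shows how boxes predicted on the flipped image can be mapped back to the coordinates of the original image before weighting. The application does not limit the enhancement to flipping. The `detector` callable, the list-of-rows image layout and all function names are hypothetical.

```python
from typing import Callable, List, Tuple

Box = Tuple[float, float, float, float]   # (x1, y1, x2, y2), pixel-edge coordinates
Detection = Tuple[Box, float]             # (box, confidence)
Image = List[List[int]]                   # rows of pixel values


def hflip_image(image: Image) -> Image:
    """Mirror the image left to right."""
    return [row[::-1] for row in image]


def unflip_box(box: Box, width: float) -> Box:
    """Map a box predicted on the flipped image back to the original image."""
    x1, y1, x2, y2 = box
    return (width - x2, y1, width - x1, y2)


def predict_with_flip(image: Image,
                      detector: Callable[[Image], List[Detection]]) -> List[Detection]:
    """Collect the initial prediction results of the original and flipped views."""
    width = len(image[0])
    results = list(detector(image))
    for box, score in detector(hflip_image(image)):
        results.append((unflip_box(box, width), score))
    return results
```

The list returned by `predict_with_flip` plays the role of the multiple initial prediction results of one image, which are subsequently weighted into a single prediction result.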

Wherein, for more specific processing procedures of the above-mentioned steps S21 and S22, reference may be made to the corresponding contents disclosed in the foregoing embodiments, and details are not repeated here.

It can be seen that, in the embodiment of the present application, all the initial target detection results in any group of initial target detection results are clustered to obtain the first clustering result corresponding to that group. The weight of each initial target detection result corresponding to each cluster center is then determined according to the degree of overlap of the detection frames corresponding to different cluster centers in the first clustering result. Based on the weight of each initial target detection result, all the initial target detection results in the group are weighted to obtain the initial weighted target detection result corresponding to the target detection model. Clustering and weighting all the initial target detection results in each group in this way reduces the differences in detection results caused by image enhancement of the original image and improves the robustness of a single target detection model.
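The clustering and weighting of one group of initial target detection results can be sketched in Python as follows. This is a simplified stand-in, not the axis-aligned bounding box fuzzy integration algorithm itself: detections are greedily grouped by IoU with the highest-confidence box of each cluster, and each cluster is fused by a confidence-weighted average. The threshold of 0.5 and all names are assumptions made for illustration.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]   # (x1, y1, x2, y2)
Detection = Tuple[Box, float]             # (box, confidence)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def cluster_detections(dets: List[Detection], iou_thr: float = 0.5) -> List[List[Detection]]:
    """Greedy clustering: the first member of a cluster acts as its center."""
    clusters: List[List[Detection]] = []
    for det in sorted(dets, key=lambda d: d[1], reverse=True):
        for cluster in clusters:
            if iou(cluster[0][0], det[0]) >= iou_thr:
                cluster.append(det)
                break
        else:
            clusters.append([det])
    return clusters


def fuse_cluster(cluster: List[Detection]) -> Detection:
    """Confidence-weighted box average; the score is the mean confidence."""
    total = sum(score for _, score in cluster)
    if total <= 0:                         # fall back to equal weights
        weights = [1.0 / len(cluster)] * len(cluster)
    else:
        weights = [score / total for _, score in cluster]
    box = tuple(sum(w * b[i] for w, (b, _) in zip(weights, cluster)) for i in range(4))
    return box, sum(score for _, score in cluster) / len(cluster)


def weight_group(dets: List[Detection], iou_thr: float = 0.5) -> List[Detection]:
    """One fused detection per cluster."""
    return [fuse_cluster(c) for c in cluster_detections(dets, iou_thr)]
```

Unlike plain non-maximum suppression, which keeps only the highest-confidence box of each cluster, this kind of fusion lets every overlapping box contribute to the final coordinates.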

Correspondingly, the embodiment of the present application also discloses a target detection device based on multi-model fusion, as shown in Fig. 5, the device includes:

The acquiring module 11 is used to acquire a plurality of target detection models that have been trained, and acquire an image set to be detected obtained after performing image enhancement processing on the original image to be detected;

The image detection module 12 is configured to use each of the target detection models to detect the images to be detected in the set of images to be detected to obtain multiple sets of initial target detection results corresponding to multiple target detection models;

A single-model weighting module 13, configured to weight all the initial target detection results in each group of the initial target detection results, so as to obtain the initial weighted target detection results corresponding to each of the target detection models;

The multi-model weighting module 14 is used to weight the multiple initial weighted target detection results corresponding to the multiple target detection models, based on the weight of each target detection model determined in advance using the verification set, to obtain the final target detection result corresponding to the original image to be detected.

For the specific work flow of each of the above modules, reference may be made to the corresponding content disclosed in the foregoing embodiments, which will not be repeated here.

It can be seen that, in the embodiment of the present application, a plurality of trained target detection models are first obtained, together with the image set to be detected obtained after performing image enhancement processing on the original image to be detected. Each of the target detection models is then used to detect the images to be detected in the image set to be detected, to obtain multiple groups of initial target detection results corresponding to the multiple target detection models. All the initial target detection results in each group of initial target detection results are then weighted, to obtain the initial weighted target detection result corresponding to each of the target detection models. Finally, based on the weight of each of the target detection models determined in advance using the verification set, the multiple initial weighted target detection results corresponding to the multiple target detection models are weighted to obtain the final target detection result corresponding to the original image to be detected. It can be seen that, in the embodiment of the present application, weighting all the initial target detection results in each group of initial target detection results reduces the differences in single-model detection results caused by image enhancement processing on the original image and improves the robustness of the model. Assigning different weights to different models based on the verification set makes full use of the diversity of the models and improves the accuracy of target detection.

In some specific embodiments, the device may further include, before the acquisition module 11:

The model screening unit is used to screen out a plurality of target detection models to be trained by using screening conditions constructed based on model structure differences; wherein, the model structure differences between different target detection models to be trained all meet the preset difference conditions;

The first training unit is configured to use a training set obtained after performing image enhancement on a historical original image set to train a plurality of target detection models to be trained, so as to obtain a plurality of trained target detection models.

In some specific embodiments, the acquisition module 11 may specifically include:

An algorithm determining unit, configured to determine an image enhancement algorithm corresponding to each of the target detection models according to the model category of each of the target detection models;

The first image enhancement unit is configured to use the image enhancement algorithm corresponding to each of the target detection models to perform corresponding image enhancement processing on the original image to be detected, so as to obtain the image set to be detected corresponding to each of the target detection models.

In some specific embodiments, to weight all the initial target detection results in any group of the initial target detection results, the device may specifically include:

The first clustering unit is configured to cluster all the initial target detection results in any group of the initial target detection results to obtain a first clustering result corresponding to the group of the initial target detection results;

The first weight determination unit is configured to determine the weight of each of the initial target detection results according to the first clustering result and based on the axis-aligned bounding box fuzzy integration algorithm and the non-maximum suppression algorithm;

The first weighting unit is configured to weight all the initial target detection results in the group of initial target detection results based on the weight of each of the initial target detection results, so as to obtain the initial weighted target detection result corresponding to the corresponding target detection model.

In some specific embodiments, the target detection device based on multi-model fusion may also include:

The first evaluation index determination unit is configured to determine the mean average precision evaluation index corresponding to each of the trained target detection models based on the verification set;

A summation unit, configured to sum the mean average precision evaluation indicators corresponding to all the target detection models to obtain the corresponding evaluation index sum;

The second weight determination unit is configured to determine the weight of each of the target detection models based on the mean average precision evaluation index corresponding to each of the target detection models and the sum of the evaluation indexes.

In some specific embodiments, the first evaluation indicator determining unit may specifically include:

A first prediction unit, configured to use the trained target detection model to predict each image in the verification set, so as to obtain a prediction result output by the target detection model corresponding to each image in the verification set;

The second evaluation index determination unit is configured to determine the mean average precision evaluation index of the target detection model based on the difference between the prediction result and the corresponding real labeling result in the verification set.

In some specific embodiments, the first prediction unit may specifically include:

The second image enhancement unit is used to respectively perform image enhancement on each image in the verification set to obtain a plurality of enhanced images corresponding to each image in the verification set;

The second prediction unit is configured to use the trained target detection model to predict the plurality of enhanced images corresponding to each image in the verification set, so as to obtain a plurality of initial prediction results corresponding to each image in the verification set;

The second weighting unit is configured to respectively perform weighting processing on a plurality of initial prediction results corresponding to each image in the verification set, so as to obtain a prediction result corresponding to each image in the verification set.

Optionally, the embodiment of the present application also discloses an electronic device. FIG. 6 is a structural diagram of an electronic device 20 according to an exemplary embodiment. The content in the figure should not be regarded as any limitation on the application scope of the present application.

FIG. 6 is a schematic structural diagram of an electronic device 20 provided by an embodiment of the present application. The electronic device 20 may specifically include: at least one processor 21 , at least one memory 22 , a power supply 23 , a communication interface 24 , an input/output interface 25 and a communication bus 26 . Wherein, the memory 22 is used to store a computer program, and the computer program is loaded and executed by the processor 21 to implement relevant steps in the multi-model fusion-based target detection method disclosed in any of the foregoing embodiments. In addition, the electronic device 20 in this embodiment may specifically be an electronic computer.

In this embodiment, the power supply 23 is used to provide working voltage for each hardware device on the electronic device 20; the communication interface 24 can create a data transmission channel between the electronic device 20 and external devices, and the communication protocol it follows may be any communication protocol applicable to the technical solution of the present application, which is not specifically limited here; the input/output interface 25 is used to obtain external input data or to output data to the outside, and its specific interface type can be selected according to specific application needs, which is not specifically limited here.

In addition, the memory 22, as a resource storage carrier, can be a read-only memory, random access memory, magnetic disk or optical disk, etc.; the resources stored thereon can include an operating system 221, a computer program 222, etc.; and the storage method can be temporary storage or permanent storage.

Wherein, the operating system 221 is used to manage and control each hardware device on the electronic device 20 and the computer program 222, and may be Windows Server, Netware, Unix, Linux, etc. In addition to the computer program that can be used to complete the multi-model fusion-based target detection method performed by the electronic device 20 disclosed in any of the foregoing embodiments, the computer program 222 may further include computer programs that can be used to complete other specific tasks.

Optionally, the present application also discloses a computer-readable storage medium for storing a computer program; wherein, when the computer program is executed by a processor, the aforementioned multi-model fusion-based target detection method is implemented. Regarding the specific steps of the method, reference may be made to the corresponding content disclosed in the foregoing embodiments, and details are not repeated here.

Each embodiment in this specification is described in a progressive manner, each embodiment focuses on the difference from other embodiments, and the same or similar parts of each embodiment can be referred to each other. As for the device disclosed in the embodiment, since it corresponds to the method disclosed in the embodiment, the description is relatively simple, and for the related information, please refer to the description of the method part.

Those skilled in the art can further appreciate that the units and algorithm steps of the examples described in conjunction with the embodiments disclosed herein can be implemented by electronic hardware, computer software or a combination of the two. In order to clearly illustrate the interchangeability of hardware and software, the composition and steps of each example have been described above generally in terms of their functions. Whether these functions are executed by hardware or software depends on the specific application and design constraints of the technical solution. Those skilled in the art may use different methods to implement the described functions for each specific application, but such implementation should not be regarded as exceeding the scope of the present application.

The steps of the methods or algorithms described in connection with the embodiments disclosed herein may be directly implemented by hardware, software modules executed by a processor, or a combination of both. Software modules can be placed in random access memory (RAM), internal memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, hard disk, removable disk, CD-ROM, or any other known form of storage medium.

Finally, it should also be noted that, in this text, relational terms such as first and second are only used to distinguish one entity or operation from another, and do not necessarily require or imply any such actual relationship or order between these entities or operations. Furthermore, the terms "comprises", "includes" or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus comprising a set of elements includes not only those elements, but also other elements not expressly listed, or elements inherent in such a process, method, article, or apparatus. Without further limitations, an element defined by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article or apparatus comprising said element.

The target detection method, device, equipment and medium based on multi-model fusion provided by this application have been introduced in detail above. Specific examples are used herein to illustrate the principle and implementation of this application, and the description of the above embodiments is only intended to help understand the method of the present application and its core idea. At the same time, for those of ordinary skill in the art, there will be changes in the specific implementation and application scope according to the idea of the present application. In summary, the contents of this specification should not be understood as limiting the application.

Claims

A target detection method based on multi-model fusion, characterized in that it comprises:

Obtaining a plurality of trained target detection models, and obtaining the image set to be detected obtained after performing image enhancement processing on the original image to be detected;

Using each of the target detection models to detect the images to be detected in the set of images to be detected, to obtain multiple groups of initial target detection results corresponding to the multiple target detection models;

Weighting all the initial target detection results in each group of the initial target detection results respectively, so as to obtain the initial weighted target detection result corresponding to each of the target detection models;

Weighting, based on the weight of each of the target detection models determined in advance using the verification set, the multiple initial weighted target detection results corresponding to the multiple target detection models, to obtain the final target detection result corresponding to the original image to be detected.
The target detection method based on multi-model fusion according to claim 1, wherein, before obtaining a plurality of trained target detection models, further comprising:

A plurality of target detection models to be trained are screened out by using screening conditions constructed based on model structure differences; wherein, the model structure differences between different target detection models to be trained all meet preset difference conditions;

Using the training set obtained after performing image enhancement on the historical original image set, the multiple target detection models to be trained are trained to obtain multiple trained target detection models.
The target detection method based on multi-model fusion according to claim 1, wherein said acquisition of the image set to be detected after performing image enhancement processing on the original image to be detected comprises:

Determine an image enhancement algorithm corresponding to each of the target detection models according to the model category of each of the target detection models;

Using the image enhancement algorithm corresponding to each of the target detection models, corresponding image enhancement processing is performed on the original image to be detected, so as to obtain a set of images to be detected corresponding to each of the target detection models.
The target detection method based on multi-model fusion according to claim 1, wherein weighting all initial target detection results in any set of initial target detection results includes:

clustering all the initial target detection results in any group of the initial target detection results to obtain a first clustering result corresponding to the group of the initial target detection results;

According to the first clustering result, and based on the axis-aligned bounding box fuzzy integration algorithm and the non-maximum suppression algorithm, determine the weight of each of the initial target detection results;

Based on the weight of each of the initial target detection results, weighting is performed on all the initial target detection results in the group of the initial target detection results, so as to obtain an initial weighted target detection result corresponding to the target detection model.
The target detection method based on multi-model fusion according to any one of claims 1 to 4, further comprising:

Based on the verification set, determine the mean average precision evaluation index corresponding to each of the target detection models that have been trained;

Summing the mean average precision evaluation indexes corresponding to all the target detection models to obtain the corresponding sum of evaluation indexes;

The weight of each target detection model is determined based on the mean average precision evaluation index corresponding to each target detection model and the sum of the evaluation indexes.
The target detection method based on multi-model fusion according to claim 5, wherein determining, based on the verification set, the mean average precision evaluation index corresponding to any of the trained target detection models comprises:

Using the trained target detection model to predict each image in the verification set, so as to obtain a prediction result output by the target detection model corresponding to each image in the verification set;

Based on the difference between the prediction result and the corresponding real labeling result in the verification set, the mean average precision evaluation index of the target detection model is determined.
The target detection method based on multi-model fusion according to claim 6, wherein said utilizing the trained target detection model to predict each image in the verification set comprises:

Carry out image enhancement to each image in the verification set respectively, and obtain a plurality of enhanced images corresponding to each image in the verification set;

Using the trained target detection model to predict a plurality of enhanced images corresponding to each image in the verification set, so as to obtain a plurality of initial prediction results corresponding to each image in the verification set;

Weighting is performed on multiple initial prediction results corresponding to each image in the verification set, so as to obtain a prediction result corresponding to each image in the verification set.
A target detection device based on multi-model fusion, characterized in that it comprises:

The obtaining module is used to obtain a plurality of target detection models that have been trained, and obtain the image set to be detected after performing image enhancement processing on the original image to be detected;

An image detection module, configured to use each of the target detection models to detect the images to be detected in the set of images to be detected to obtain multiple sets of initial target detection results corresponding to multiple target detection models;

A single-model weighting module, configured to weight all the initial target detection results in each group of the initial target detection results, so as to obtain the initial weighted target detection results corresponding to each of the target detection models;

A multi-model weighting module, configured to weight the multiple initial weighted target detection results corresponding to multiple target detection models based on the weight of each target detection model determined in advance using the verification set, to obtain the The final target detection result corresponding to the original image to be detected.
An electronic device, characterized in that it includes a processor and a memory; wherein, when the processor executes the computer program stored in the memory, the target detection method based on multi-model fusion according to any one of claims 1 to 7 is implemented.
A computer-readable storage medium, characterized in that it is used to store a computer program; wherein, when the computer program is executed by a processor, the multi-model fusion-based target detection method according to any one of claims 1 to 7 is implemented.