CN114299480A - Target detection model training method, target detection method and device

Info

Publication number: CN114299480A
Authority: CN (China)
Prior art keywords: image, sample, feature, cluster, sample image
Legal status: Pending (the legal status is an assumption and is not a legal conclusion)
Application number: CN202111582593.3A
Other languages: Chinese (zh)
Inventors: 徐方凯, 虞抒沁, 童俊艳
Current Assignee: Hangzhou Hikvision Digital Technology Co Ltd
Application filed by Hangzhou Hikvision Digital Technology Co Ltd
Priority to: CN202111582593.3A
Publication of: CN114299480A

Landscapes

  • Image Analysis (AREA)

Abstract

The embodiment of the application provides a target detection model training method, a target detection method and a target detection device, relating to the technical field of deep learning. Each first sample image is detected based on a first detection model to obtain pseudo labels for the objects in each first sample image; first object image features of the objects in the first sample images and second object image features of the objects in second sample images are acquired; sample images to be calibrated are determined from the first sample images based on the first object image features and the second object image features; and a second detection model to be trained is trained based on the pseudo labels of the objects in third sample images (the first sample images other than the sample images to be calibrated), the real labels of the objects in the second sample images, and the real labels of the objects in the sample images to be calibrated, to obtain a target detection model of the current detection scene, so that the detection cost can be reduced and the generation efficiency of the detection model improved.

Description

Target detection model training method, target detection method and device
Technical Field
The application relates to the technical field of deep learning, in particular to a target detection model training method, a target detection method and a target detection device.
Background
With the rapid development of computer technology, image detection is widely applied in many fields. For example, in the field of video surveillance, target detection is performed on a surveillance image based on a trained target detection model, so that a preset object (e.g., an animal or a person) contained in the surveillance image, and the image area it occupies, can be determined.
In the related art, in order to improve the accuracy of target detection, a large number of sample images corresponding to a given detection scene, manually labeled with the labels of the target objects, need to be acquired; a convolutional neural network model with a preset structure can then be trained based on these sample images to obtain a target detection model corresponding to the detection scene.
However, manually labeling a large number of sample images increases the complexity and cost of labeling, and thus the overall cost of detection, which reduces the generation efficiency of the detection model.
Disclosure of Invention
The embodiment of the application aims to provide a target detection model training method, a target detection method and a target detection device, so as to reduce the detection cost and improve the generation efficiency of a detection model. The specific technical scheme is as follows:
in a first aspect, in order to achieve the above object, an embodiment of the present application discloses a method for training a target detection model, where the method includes:
acquiring unlabeled first sample images of a current detection scene;
detecting each first sample image based on a first detection model to obtain a label of the object in each first sample image as a pseudo label; wherein the first detection model is a model obtained by training on labeled second sample images of detection scenes other than the current detection scene;
acquiring image features of the object in each first sample image as first object image features, and acquiring image features of the object in each second sample image as second object image features;
determining sample images to be calibrated from the first sample images based on the first object image features and the second object image features, so as to obtain real labels marked by a user for the objects in the sample images to be calibrated; wherein the sample images to be calibrated do not match the other detection scenes;
and performing model training on a second detection model to be trained based on the pseudo labels of the objects in third sample images (the first sample images other than the sample images to be calibrated), the real labels of the objects in the second sample images, and the real labels of the objects in the sample images to be calibrated, to obtain a target detection model of the current detection scene.
Optionally, the determining a sample image to be calibrated from each first sample image based on each first object image feature and each second object image feature includes:
clustering the second object image features to obtain a plurality of object image feature clusters as first object image feature clusters;
determining candidate object image features from the first object image features based on their distances to the first object image feature clusters; wherein the candidate object image features do not match the second object image features;
and determining the sample images to be calibrated based on the first sample images corresponding to the determined candidate object image features.
Optionally, the first sample images comprise a plurality of sample image sets; each sample image set comprises an unlabeled sample image to be enhanced of the current detection scene, and an enhanced sample image obtained by performing image enhancement on the sample image to be enhanced;
the determining candidate object image features from the first object image features based on the distances to the first object image feature clusters includes:
clustering the first object image features to obtain a plurality of object image feature clusters as second object image feature clusters;
determining the center of each second object image feature cluster as a central object image feature, and determining the object image features, among the first object image features, that do not belong to any second object image feature cluster as out-of-cluster object image features;
determining first candidate object image features from the central object image features based on their distances to the first object image feature clusters; wherein the distance between a first candidate object image feature and the first object image feature clusters is greater than the distances between the other central object image features and the first object image feature clusters;
determining second candidate object image features from the out-of-cluster object image features based on their distances to the first object image feature clusters and their label stability; wherein the label stability of an out-of-cluster object image feature represents the degree of consistency of the pseudo labels of the object corresponding to that out-of-cluster object image feature across the images contained in the corresponding sample image set;
and determining the first candidate object image features and the second candidate object image features as the final candidate object image features.
Optionally, the determining first candidate object image features from the central object image features based on the distances to the first object image feature clusters includes:
for each currently unselected central object image feature, calculating the distance between that central object image feature and the current first image feature set to be compared as the feature distance corresponding to that central object image feature; wherein the current first image feature set to be compared comprises the centers of the first object image feature clusters;
and selecting the central object image feature with the largest corresponding feature distance as a first candidate object image feature, adding it to the current first image feature set to be compared, and returning to the step of calculating, for each currently unselected central object image feature, the distance between that feature and the current first image feature set to be compared as its corresponding feature distance, until a first preset number of central object image features have been determined as first candidate object image features.
Optionally, before the determining second candidate object image features from the out-of-cluster object image features based on the distances to the first object image feature clusters and the label stability, the method further includes:
for each out-of-cluster object image feature, calculating, as a first value, the entropy of the mean of the confidences of the pseudo labels of the object corresponding to the out-of-cluster object image feature across the images contained in the corresponding sample image set, and calculating, as a second value, the entropy of those confidences;
and calculating the difference between the first value and the second value as the label stability of the out-of-cluster object image feature.
Optionally, the determining second candidate object image features from the out-of-cluster object image features based on the distances to the first object image feature clusters and the label stability includes:
for each currently unselected out-of-cluster object image feature, calculating the distance between that out-of-cluster object image feature and the current second image feature set to be compared as the feature distance corresponding to that out-of-cluster object image feature; wherein the current second image feature set to be compared comprises the center of each first object image feature cluster;
calculating a weighted sum of the label stability of the out-of-cluster object image feature and its corresponding feature distance to obtain a target value;
and selecting the out-of-cluster object image feature with the largest corresponding target value as a second candidate object image feature, adding it to the current second image feature set to be compared, and returning to the step of calculating, for each currently unselected out-of-cluster object image feature, the distance between that feature and the current second image feature set to be compared as its corresponding feature distance, until a second preset number of out-of-cluster object image features have been determined as second candidate object image features.
Optionally, the determining a sample image to be calibrated based on the first sample image corresponding to the determined image feature of each candidate object includes:
and, for each determined candidate object image feature, determining the sample image to be enhanced in the sample image set to which the first sample image corresponding to that candidate object image feature belongs as a sample image to be calibrated.
Optionally, before the performing model training on the second detection model to be trained based on the pseudo labels of the objects in the third sample images (the first sample images other than the sample images to be calibrated), the real labels of the objects in the second sample images, and the real labels of the objects in the sample images to be calibrated to obtain the target detection model of the current detection scene, the method further includes:
replacing the pseudo labels of the objects whose object image features belong to the same second object image feature cluster as an object image feature of an object in a sample image to be calibrated with the real label marked by the user for the object in that sample image to be calibrated.
Optionally, the acquiring image features of the object in each first sample image as first object image features, and acquiring image features of the object in each second sample image as second object image features, includes:
for each first sample image, inputting the first sample image into the first detection model and the universal detection model respectively to obtain first object image features of objects in the first sample image output by the feature extraction layers of the first detection model and the universal detection model; the universal detection model is obtained by training sample images based on a plurality of different detection scenes;
and for each second sample image, inputting the second sample image to the first detection model and the universal detection model respectively to obtain second object image characteristics of the object in the second sample image output by the feature extraction layers of the first detection model and the universal detection model.
In a second aspect, in order to achieve the above object, an embodiment of the present application discloses a target detection method, including:
acquiring an image to be detected;
inputting the image to be detected into a target detection model of a current detection scene to obtain a detection result of the image to be detected; wherein the target detection model is obtained by using the target detection model training method according to any one of the first aspect.
Optionally, the detection result includes a position of the object in the image to be detected, and/or a probability that the image to be detected contains a preset object.
In order to achieve the above object, an embodiment of the present application discloses an apparatus for training a target detection model, where the apparatus includes:
the first sample image acquisition module is used for acquiring unlabeled first sample images of the current detection scene;
the pseudo label acquisition module is used for detecting each first sample image based on the first detection model to obtain a label of the object in each first sample image as a pseudo label; wherein the first detection model is a model obtained by training on labeled second sample images of detection scenes other than the current detection scene;
the object image feature acquisition module is used for acquiring image features of the object in each first sample image as first object image features, and image features of the object in each second sample image as second object image features;
the to-be-calibrated sample image determining module is used for determining sample images to be calibrated from the first sample images based on the first object image features and the second object image features, so as to obtain real labels marked by a user for the objects in the sample images to be calibrated; wherein the sample images to be calibrated do not match the other detection scenes;
and the model training module is used for performing model training on a second detection model to be trained based on the pseudo labels of the objects in the third sample images (the first sample images other than the sample images to be calibrated), the real labels of the objects in the second sample images, and the real labels of the objects in the sample images to be calibrated, to obtain the target detection model of the current detection scene.
Optionally, the module for determining an image of a sample to be calibrated includes:
the first object image feature cluster determining submodule is used for clustering the second object image features to obtain a plurality of object image feature clusters as first object image feature clusters;
the candidate object image feature determining submodule is used for determining candidate object image features from the first object image features based on their distances to the first object image feature clusters; wherein the candidate object image features do not match the second object image features;
and the to-be-calibrated sample image determining submodule is used for determining the to-be-calibrated sample image based on the first sample image corresponding to the determined image characteristics of each candidate object.
Optionally, each first sample image comprises a plurality of sample image sets; each sample image set comprises a sample image to be enhanced of the current detection scene without labeling, and an enhanced sample image obtained by performing image enhancement on the sample image to be enhanced;
the candidate image feature determination submodule includes:
the second object image feature cluster determining unit is used for clustering the first object image features to obtain a plurality of object image feature clusters serving as second object image feature clusters;
the first processing unit is used for determining the center of each second object image feature cluster as a central object image feature, and determining the object image features, among the first object image features, that do not belong to any second object image feature cluster as out-of-cluster object image features;
a first candidate object image feature determining unit, configured to determine first candidate object image features from the central object image features based on their distances to the first object image feature clusters; wherein the distance between a first candidate object image feature and the first object image feature clusters is greater than the distances between the other central object image features and the first object image feature clusters;
a second candidate object image feature determining unit, configured to determine second candidate object image features from the out-of-cluster object image features based on their distances to the first object image feature clusters and their label stability; wherein the label stability of an out-of-cluster object image feature represents the degree of consistency of the pseudo labels of the object corresponding to that out-of-cluster object image feature across the images contained in the corresponding sample image set;
and a final candidate object image feature determining unit, configured to determine the first candidate object image features and the second candidate object image features as the final candidate object image features.
Optionally, the first candidate object image feature determining unit is specifically configured to calculate, for each currently unselected central object image feature, the distance between that central object image feature and the current first image feature set to be compared as the feature distance corresponding to that central object image feature; wherein the current first image feature set to be compared comprises the centers of the first object image feature clusters;
and to select the central object image feature with the largest corresponding feature distance as a first candidate object image feature, add it to the current first image feature set to be compared, and return to the step of calculating, for each currently unselected central object image feature, the distance between that feature and the current first image feature set to be compared as its corresponding feature distance, until a first preset number of central object image features have been determined as first candidate object image features.
Optionally, the apparatus further comprises:
before the second candidate object image features are determined from the out-of-cluster object image features based on the distances to the first object image feature clusters and the label stability, calculating, for each out-of-cluster object image feature, the entropy of the mean of the confidences of the pseudo labels of the object corresponding to the out-of-cluster object image feature across the images included in the corresponding sample image set as a first value, and the entropy of those confidences as a second value;
and calculating the difference between the first value and the second value as the label stability of the out-of-cluster object image feature.
Optionally, the second candidate object image feature determining unit is specifically configured to calculate, for each currently unselected out-of-cluster object image feature, the distance between that out-of-cluster object image feature and the current second image feature set to be compared as the feature distance corresponding to that out-of-cluster object image feature; wherein the current second image feature set to be compared comprises the center of each first object image feature cluster;
to calculate a weighted sum of the label stability of the out-of-cluster object image feature and its corresponding feature distance to obtain a target value;
and to select the out-of-cluster object image feature with the largest corresponding target value as a second candidate object image feature, add it to the current second image feature set to be compared, and return to the step of calculating, for each currently unselected out-of-cluster object image feature, the distance between that feature and the current second image feature set to be compared as its corresponding feature distance, until a second preset number of out-of-cluster object image features have been determined as second candidate object image features.
Optionally, the to-be-calibrated sample image determining submodule is specifically configured to determine, for each determined candidate object image feature, a to-be-enhanced sample image in a sample image set to which the first sample image corresponding to the candidate object image feature belongs, as the to-be-calibrated sample image.
Optionally, the apparatus further comprises:
and a replacing module, configured to, before model training is performed on the second detection model to be trained based on the pseudo labels of the objects in the third sample images (the first sample images other than the sample images to be calibrated), the real labels of the objects in the second sample images, and the real labels of the objects in the sample images to be calibrated to obtain the target detection model of the current detection scene, replace the pseudo labels of the objects whose object image features belong to the same second object image feature cluster as an object image feature of an object in a sample image to be calibrated with the real label marked by the user for the object in that sample image to be calibrated.
Optionally, the object image feature obtaining module is specifically configured to, for each first sample image, respectively input the first sample image to the first detection model and the general detection model, and obtain a first object image feature of an object in the first sample image, which is output by the feature extraction layers of the first detection model and the general detection model; the universal detection model is obtained by training sample images based on a plurality of different detection scenes;
and for each second sample image, inputting the second sample image to the first detection model and the universal detection model respectively to obtain second object image characteristics of the object in the second sample image output by the feature extraction layers of the first detection model and the universal detection model.
In order to achieve the above object, an embodiment of the present application discloses an object detection apparatus, including:
the image acquisition module to be detected is used for acquiring an image to be detected;
the detection module is used for inputting the image to be detected to a target detection model of a current detection scene to obtain a detection result of the image to be detected; wherein the target detection model is obtained by using the target detection model training method according to any one of the first aspect.
Optionally, the detection result includes a position of the object in the image to be detected, and/or a probability that the image to be detected contains a preset object.
In another aspect of this application, in order to achieve the above object, an embodiment of this application further discloses an electronic device, where the electronic device includes a processor, a communication interface, a memory, and a communication bus, where the processor, the communication interface, and the memory complete communication with each other through the communication bus;
the memory is used for storing a computer program;
the processor is configured to implement the target detection model training method according to the first aspect or the target detection method according to the second aspect when executing the program stored in the memory.
In yet another aspect of the embodiments of the present application, there is further provided a computer-readable storage medium having instructions stored therein which, when run on a computer, implement the target detection model training method according to the first aspect or the target detection method according to the second aspect.
In yet another aspect of this embodiment, a computer program product containing instructions is provided, which when executed on a computer, causes the computer to perform the object detection model training method according to the first aspect or the object detection method according to the second aspect.
The embodiment of the application provides a target detection model training method, which can acquire unlabeled first sample images of a current detection scene; detect each first sample image based on a first detection model to obtain a label of the object in each first sample image as a pseudo label, wherein the first detection model is a model obtained by training on labeled second sample images of detection scenes other than the current detection scene; acquire image features of the object in each first sample image as first object image features, and image features of the object in each second sample image as second object image features; determine sample images to be calibrated from the first sample images based on the first object image features and the second object image features, so as to obtain real labels marked by a user for the objects in the sample images to be calibrated, wherein the sample images to be calibrated do not match the other detection scenes; and perform model training on a second detection model to be trained based on the pseudo labels of the objects in the third sample images (the first sample images other than the sample images to be calibrated), the real labels of the objects in the second sample images, and the real labels of the objects in the sample images to be calibrated, to obtain the target detection model of the current detection scene.
Based on the target detection model training method provided by the embodiment of the application, the sample images to be calibrated among the first sample images can be determined, and these sample images to be calibrated do not match the other detection scenes. That is to say, the first detection model has not learned the image features of the objects in the sample images to be calibrated, so the accuracy of the pseudo labels of those objects is low; such sample images can therefore be pushed to a user for labeling, and the resulting real labels used to train the second detection model, so that the trained target detection model can learn the image features of the objects in the sample images to be calibrated. In contrast, the accuracy of the pseudo labels of the objects in the third sample images is high, and those pseudo labels can therefore be used to train the second detection model directly. Thus, the target detection model of the current detection scene can be obtained with only part of the first sample images (namely the sample images to be calibrated) being manually labeled, which reduces the detection cost and improves the generation efficiency of the detection model. In addition, using the trained first detection model to guide the training of the second detection model can reduce the false alarm rate of the resulting target detection model and improve detection accuracy.
Of course, not all advantages described above need to be achieved at the same time in the practice of any one product or method of the present application.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present application; for those skilled in the art, other drawings can be obtained from these drawings without creative effort.
Fig. 1 is a flowchart of a target detection model training method according to an embodiment of the present disclosure;
FIG. 2 is a flow chart of another method for training a target detection model according to an embodiment of the present disclosure;
FIG. 3 is a flow chart of another method for training a target detection model according to an embodiment of the present disclosure;
FIG. 4 is a flowchart of another method for training a target detection model according to an embodiment of the present disclosure;
FIG. 5 is a schematic diagram illustrating a method for training an object detection model according to an embodiment of the present disclosure;
fig. 6 is a schematic structural diagram of an active push module according to an embodiment of the present disclosure;
FIG. 7 is a block diagram of an object detection model training apparatus according to an embodiment of the present disclosure;
fig. 8 is a block diagram of an electronic device according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
In the related art, for a given detection scene, a large number of labeled sample images corresponding to the detection scene need to be acquired, and training can then be performed based on these labeled sample images to obtain a target detection model corresponding to the detection scene. However, manually labeling a large number of sample images increases the complexity and cost of labeling, and thus the overall cost of detection, which reduces the generation efficiency of the detection model.
In order to solve the above problem, an embodiment of the present application provides a target detection model training method, which may be applied to an electronic device. For example, the electronic device may be a server provided with a GPU (Graphics Processing Unit), or a file storage server.
Referring to fig. 1, fig. 1 is a flowchart of a method for training a target detection model according to an embodiment of the present application, where the method may include the following steps:
S101: acquiring unlabeled first sample images of the current detection scene.
S102: detecting each first sample image based on the first detection model to obtain a label of the object in each first sample image as a pseudo label.
Wherein the first detection model is a model obtained by training on labeled second sample images of detection scenes other than the current detection scene.
S103: acquiring image features of the object in each first sample image as first object image features, and image features of the object in each second sample image as second object image features.
S104: determining sample images to be calibrated from the first sample images based on the first object image features and the second object image features, so as to obtain real labels marked by a user for the objects in the sample images to be calibrated.
Wherein the sample images to be calibrated do not match the other detection scenes.
S105: performing model training on the second detection model to be trained based on the pseudo labels of the objects in the third sample images (the first sample images other than the sample images to be calibrated), the real labels of the objects in the second sample images, and the real labels of the objects in the sample images to be calibrated, to obtain the target detection model of the current detection scene.
The target detection model training method provided by the embodiment of the application can determine the sample images to be calibrated among the first sample images, where the sample images to be calibrated do not match the other detection scenes. That is to say, the first detection model has not learned the image features of the objects in the sample images to be calibrated, so the accuracy of the pseudo labels of those objects is low; such sample images can therefore be pushed to a user for labeling, and the resulting real labels used to train the second detection model, so that the trained target detection model can learn the image features of the objects in the sample images to be calibrated. In contrast, the accuracy of the pseudo labels of the objects in the third sample images is high, so those pseudo labels can be used to train the second detection model directly. Thus, the target detection model of the current detection scene can be obtained with only part of the first sample images (namely the sample images to be calibrated) being manually labeled, which reduces the detection cost and improves the generation efficiency of the detection model. In addition, using the trained first detection model to guide the training of the second detection model can reduce the false alarm rate of the resulting target detection model and improve detection accuracy.
In practical applications, there are usually a number of different detection scenarios. For example, a scene for detecting a surveillance video image at an entrance of a park, a scene for detecting a surveillance video image at an entrance of a mall, a scene for detecting a surveillance video image at a street intersection, and the like.
For each detection scene, the monitoring video image of the detection scene can be detected based on the corresponding target detection model, and the label of the object in the monitoring video image is determined. The object in the monitoring video image may be an animal, or may also be a person, or may also be a vehicle.
Because the difference exists between the monitoring video images of different detection scenes, model training can be performed on the basis of the sample image of the detection scene for each detection scene, so that a target detection model (namely, a target detection model corresponding to the detection scene) suitable for the detection scene is obtained.
With respect to step S101, in one embodiment, the first sample images are enriched to improve the accuracy of the target detection model trained on them. The first sample images comprise a plurality of sample image sets. Each sample image set comprises an unlabeled sample image to be enhanced of the current detection scene, and an enhanced sample image obtained by performing image enhancement on the sample image to be enhanced. The label of a sample image may include the position of the object in the sample image and the category of the object.
That is to say, a plurality of unlabeled sample images to be enhanced of the current detection scene may be obtained, and image enhancement processing is then performed on each sample image to be enhanced to obtain a corresponding enhanced sample image. Each sample image to be enhanced and its corresponding enhanced sample image are all first sample images.
The image enhancement processing may include at least one of: adjusting the resolution of the sample image to be enhanced (e.g., increasing the resolution, decreasing the resolution), adjusting the brightness of the sample image to be enhanced (e.g., increasing the brightness, decreasing the brightness).
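As an illustration of this step, the following minimal sketch builds one such sample image set; it assumes the Pillow library, and the particular enhancement operations and parameters (scale factors, brightness factors) are illustrative choices, not values prescribed by the embodiment.

```python
from PIL import Image, ImageEnhance

def build_sample_image_set(path):
    """Build one sample image set: the unlabeled sample image to be
    enhanced plus enhanced sample images derived from it by adjusting
    resolution and brightness, as described above."""
    base = Image.open(path).convert("RGB")
    w, h = base.size
    enhanced = [
        base.resize((max(1, w // 2), max(1, h // 2))),  # decrease resolution
        base.resize((w * 2, h * 2)),                    # increase resolution
        ImageEnhance.Brightness(base).enhance(0.6),     # decrease brightness
        ImageEnhance.Brightness(base).enhance(1.4),     # increase brightness
    ]
    return [base] + enhanced
```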
For step S102, when a target detection model suitable for the current detection scene needs to be obtained, a first detection model may be obtained, that is, a detection model obtained by training second sample images based on other detection scenes except the current detection scene is obtained.
In one embodiment, the first detection model may be a Fast Region-based Convolutional Neural Network (Fast-RCNN) model, a YOLO model, or a Mask Region-based Convolutional Neural Network (Mask-RCNN) model.
It can be understood that, since the first detection model is trained based on sample images of other detection scenes, it is not fully suited to the current detection scene. That is, the accuracy of the pseudo labels of the objects in the first sample images determined based on the first detection model may be high for some objects and low for others.
With respect to step S103, in one embodiment, the following steps may be included:
the method comprises the following steps: and for each first sample image, inputting the first sample image into a first detection model and a general detection model respectively to obtain first object image features of the object in the first sample image output by the feature extraction layers of the first detection model and the general detection model.
The universal detection model is obtained by training sample images based on a plurality of different detection scenes.
Step two: and for each second sample image, inputting the second sample image into the first detection model and the universal detection model respectively to obtain second object image characteristics of the object in the second sample image output by the characteristic extraction layers of the first detection model and the universal detection model.
In this embodiment, in one implementation, the first detection model may include an ROIAlign (Region of Interest Align) layer. For a sample image (i.e., a first sample image or a second sample image), the image features output by the ROIAlign layer may be obtained, and the image features of the object in the sample image may then be obtained through global average pooling.
In one implementation, the general detection model is obtained by training a convolutional neural network model based on sample images of a plurality of different detection scenes. That is, the sample images used for training the universal test model may include the second sample image, and may also include labeled sample images of other scenes different from the second sample image and the first sample image.
That is, in the embodiment of the present application, for each first sample image, the object image feature (which may be referred to as a domain image feature) obtained based on the first detection model may be obtained, and the object image feature (which may be referred to as a general image feature) obtained based on the general detection model may also be obtained. Similarly, for each second sample image, the object image feature obtained based on the first detection model, that is, the domain image feature, may also be obtained based on the general detection model, that is, the general image feature.
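A hedged sketch of this dual extraction is shown below. The `extract_roi_features` hook is a hypothetical stand-in for the feature extraction layers described above (the ROIAlign output, here followed by global average pooling); the real models' interfaces are not specified by the embodiment.

```python
import torch

def object_image_features(image, boxes, domain_model, general_model):
    """Concatenate per-object features from the first (domain) detection
    model and the general detection model. `extract_roi_features` is a
    hypothetical hook returning ROIAlign output of shape
    (num_objects, C, H, W) for the given boxes."""
    with torch.no_grad():
        dom = domain_model.extract_roi_features(image, boxes)
        gen = general_model.extract_roi_features(image, boxes)
    # Global average pooling over the spatial dimensions.
    dom = dom.mean(dim=(-2, -1))
    gen = gen.mean(dim=(-2, -1))
    # One enriched feature vector per object.
    return torch.cat([dom, gen], dim=1)
```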
Based on the above processing, by combining two different detection models (i.e., the first detection model and the general detection model), the first object image features and the second object image features can be enriched; further, based on step S104, the reliability of the determined sample images to be calibrated can be improved, further improving the accuracy of the target detection model.
With respect to step S103, in another embodiment, only the first object image feature of the object in the first sample image and the second object image feature of the object in the second sample image may be extracted based on the first detection model.
In another embodiment, the first object image features of the objects in the first sample images and the second object image features of the objects in the second sample images may be extracted based on the general detection model only.
For step S104, the sample image to be calibrated is the sample image that needs to be pushed to the user, and accordingly, the user may mark the object in the sample image to be calibrated, and may obtain the real label of the object in the sample image to be calibrated.
In the embodiment of the application, since the sample image to be calibrated is not matched with other detection scenes (i.e., the detection scene of the second sample image), the accuracy of the pseudo label of the object in the sample image to be calibrated, which is obtained based on the first detection model, is also low, and a user is required to mark such sample image. On the contrary, the accuracy of the pseudo label of the object in the sample images (i.e., the third sample image in the embodiment of the present application) other than the sample image to be calibrated is relatively high, and thus, the pseudo label does not need to be marked by the user and can be directly used for performing model training on the second detection model.
In one embodiment, the object image features may be represented as feature vectors; accordingly, for each first object image feature, the sum of the distances between that first object image feature and the second object image features may be calculated, and the first sample images corresponding to the first object image features with larger sums taken as the sample images to be calibrated. The first sample image corresponding to a first object image feature is the first sample image to which the object corresponding to that feature belongs.
In another embodiment, referring to fig. 2, on the basis of fig. 1, the step S104 may include the following steps:
S1041: clustering the second object image features to obtain a plurality of object image feature clusters as the first object image feature clusters.
S1042: candidate image features are determined from the first object image features based on distances from the first object image feature clusters.
Wherein the candidate object image features do not match the second object image features.
S1043: determining the sample images to be calibrated based on the first sample images corresponding to the determined candidate object image features.
In the embodiment of the present application, when clustering, similar second object image features may be grouped into the same first object image feature cluster. Each first object image feature cluster has a center (which may be referred to as a first central object image feature). By calculating the distances between the first object image features and the first object image feature clusters to determine the sample images to be calibrated, instead of calculating the distance between each first object image feature and each second object image feature, the amount of calculation can be reduced and the efficiency of determining the sample images to be calibrated improved.
The clustering may be performed using a hierarchical clustering algorithm or, for example, K-Means (the K-Means clustering algorithm).
In one implementation, a distance between the first object image feature and the first center object image feature may be calculated as a distance between the first object image feature and the first object image feature cluster.
Alternatively, the sum of the distances between the first object image feature and a specified number of second object image features in each first object image feature cluster may be calculated as the distance between the first object image feature and the first object image feature cluster.
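For illustration, the sketch below implements the clustering and the two distance variants just described, using scikit-learn's K-Means; the number of clusters and the number k of cluster members used in the second variant are assumed hyperparameters, and reading "a specified number of second object image features" as the k nearest members is one plausible interpretation.

```python
import numpy as np
from sklearn.cluster import KMeans

def cluster_second_features(second_feats, n_clusters=10):
    """Cluster the second object image features into first object image
    feature clusters; the fitted model exposes centers and assignments."""
    return KMeans(n_clusters=n_clusters, n_init=10).fit(second_feats)

def dist_to_center(feat, center):
    # Variant 1: distance to the first central object image feature.
    return float(np.linalg.norm(feat - center))

def dist_to_members(feat, members, k=5):
    # Variant 2: sum of distances to a specified number of cluster
    # members (here, the k nearest, which is an assumption).
    d = np.linalg.norm(members - feat, axis=1)
    return float(np.sort(d)[:k].sum())
```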
With reference to step S1042, in an embodiment, for each first object image feature, a sum of distances between the first object image feature and each first central object image feature may be calculated, and a first sample image corresponding to the first object image feature with a larger calculated sum is used as a sample image to be calibrated.
In another embodiment, referring to fig. 3, on the basis of fig. 2, the step S1042 may include the following steps:
S10421: clustering the first object image features to obtain a plurality of object image feature clusters as second object image feature clusters.
S10422: determining the center of each second object image feature cluster as a center object image feature, and determining object image features except the second object image feature cluster in each first object image feature as out-of-cluster object image features.
S10423: a first candidate image feature is determined from the central object image features based on a distance from each first object image feature cluster.
Wherein the distance between a first candidate object image feature and the first object image feature clusters is greater than the distances between the other central object image features and the first object image feature clusters.
S10424: determining second candidate object image features from the out-of-cluster object image features based on their distances to the first object image feature clusters and their label stability.
Wherein the label stability of an out-of-cluster object image feature represents the degree of consistency of the pseudo labels of the object corresponding to that out-of-cluster object image feature across the images contained in the corresponding sample image set.
S10425: determining the first candidate object image features and the second candidate object image features as the final candidate object image features.
In the embodiment of the application, in order to further reduce the calculation amount and improve the efficiency of determining the sample image to be calibrated, the image features of each first object may be clustered to obtain a plurality of second object image feature clusters. Each second object image feature cluster has a center (which may be referred to as a second center object image feature). In addition, the first object image feature that does not belong to the second object image feature cluster (i.e., the out-of-cluster object image feature in the embodiment of the present application) may also be determined.
Accordingly, the second central object image feature and the cluster-outside object image feature may be processed separately to determine the candidate object image feature therefrom.
That is, from among the second center object image features, an object image feature having a large distance from each first object image feature cluster is specified as a first candidate image feature.
And determining the object image features which are relatively large in distance from the first object image feature clusters and relatively low in label stability from the object image features outside the clusters as second candidate object image features.
The label stability is low, that is, the degree of coincidence of the pseudo labels of the objects corresponding to the image features of the objects outside the cluster is low in the images included in the sample image set corresponding to the image features of the objects outside the cluster. That is, the first detection model cannot effectively identify the first sample image to which the image features of such out-of-cluster objects belong, that is, the accuracy of the pseudo labels of the objects corresponding to the image features of such out-of-cluster objects is low, and therefore, the pseudo labels can be pushed to a user to obtain real labels.
In an embodiment, referring to fig. 4, on the basis of fig. 3, the step S10423 may include the following steps:
S104231: for each currently unselected central object image feature, calculating the distance between that central object image feature and the current first image feature set to be compared as the feature distance corresponding to that central object image feature.
The current first image feature set to be compared comprises the center of each first object image feature cluster.
S104232: selecting the central object image feature with the largest corresponding feature distance as a first candidate object image feature, adding it to the current first image feature set to be compared, and returning to execute step S104231, until a first preset number of central object image features have been determined as first candidate object image features.
For example, the second center object image features include: a central object image feature 1, a central object image feature 2, a central object image feature 3, and a central object image feature 4; the first central object image feature comprises: center object image features 5, center object image features 6, center object image features 7, and center object image features 8.
That is, the initial first set of image features to be compared includes: center object image features 5, center object image features 6, center object image features 7, and center object image features 8.
At this time, the currently unselected second center object image features include: center object image feature 1, center object image feature 2, center object image feature 3, and center object image feature 4. The sum of the distances between each second central object image feature that is not currently selected and the object image features in the current first image feature set to be compared (i.e., the feature distance) may be calculated, and the corresponding second central object image feature with the largest feature distance (e.g., the central object image feature 4) may be selected as the first candidate object image feature.
Then, the central object image feature 4 is added to the current first set of image features to be compared, i.e. the current first set of image features to be compared comprises: center object image features 4, center object image features 5, center object image features 6, center object image features 7, and center object image features 8.
Accordingly, the currently unselected second center object image features include: center object image feature 1, center object image feature 2, and center object image feature 3. The sum of the distances between each second central object image feature that is not currently selected and the object image features in the current first image feature set to be compared (i.e., the feature distance) may be calculated, and the corresponding second central object image feature with the largest feature distance (e.g., central object image feature 2) may be selected as the first candidate object image feature.
Then, the central object image feature 2 is added to the current first set of image features to be compared, i.e. the current first set of image features to be compared comprises: center object image features 2, center object image features 4, center object image features 5, center object image features 6, center object image features 7, and center object image features 8.
And so on, until a first preset number of second central object image features have been determined, that is, until a first preset number of first candidate object image features have been determined.
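The worked example above follows a greedy farthest-first loop, sketched below under the assumption that features are NumPy vectors and that the distance to the comparison set is the sum of distances to its members, as in the example.

```python
import numpy as np

def select_first_candidates(second_centers, first_centers, num):
    """Greedily pick first candidate object image features: repeatedly
    select the unselected central object image feature whose summed
    distance to the current comparison set is largest, then add it to
    that set (as with central object image features 4 and 2 above)."""
    compare = [np.asarray(c, dtype=float) for c in first_centers]
    remaining = [np.asarray(c, dtype=float) for c in second_centers]
    selected = []
    for _ in range(min(num, len(remaining))):
        dists = [sum(np.linalg.norm(f - c) for c in compare) for f in remaining]
        chosen = remaining.pop(int(np.argmax(dists)))
        selected.append(chosen)
        compare.append(chosen)  # future distances also account for it
    return selected
```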
In an embodiment, referring to fig. 4, on the basis of fig. 3, the step S10424 may include the following steps:
S104241: for each currently unselected out-of-cluster object image feature, calculating the distance between that out-of-cluster object image feature and the current second image feature set to be compared as the feature distance corresponding to that out-of-cluster object image feature.
And the current second image feature set to be compared comprises the center of each first object image feature cluster.
S104242: calculating a weighted sum of the label stability of the out-of-cluster object image feature and its corresponding feature distance to obtain a target value.
S104243: selecting the out-of-cluster object image feature with the largest corresponding target value as a second candidate object image feature, adding it to the current second image feature set to be compared, and returning to execute step S104241, until a second preset number of out-of-cluster object image features have been determined as second candidate object image features.
For example, the out-of-cluster object image features include: out-of-cluster object image feature 1, out-of-cluster object image feature 2, out-of-cluster object image feature 3, and out-of-cluster object image feature 4; the first central object image features include: central object image feature 5, central object image feature 6, central object image feature 7, and central object image feature 8.
That is, the initial second image feature set to be compared includes: center object image feature 5, center object image feature 6, center object image feature 7, and center object image feature 8.
At this time, the currently unselected out-of-cluster object image features include: out-of-cluster object image feature 1, out-of-cluster object image feature 2, out-of-cluster object image feature 3, and out-of-cluster object image feature 4. For each of them, the sum of the distances between that out-of-cluster object image feature and the object image features in the current second image feature set to be compared (i.e., its feature distance) may be calculated, and the weighted sum of the feature distance and the label stability of that out-of-cluster object image feature may be calculated to obtain its target value. The out-of-cluster object image feature with the largest target value (for example, out-of-cluster object image feature 4) is then selected as a second candidate object image feature.
Then, out-of-cluster object image feature 4 is added to the current second image feature set to be compared, which then includes: out-of-cluster object image feature 4, center object image feature 5, center object image feature 6, center object image feature 7, and center object image feature 8.
Accordingly, the currently unselected out-of-cluster object image features include: out-of-cluster object image feature 1, out-of-cluster object image feature 2, and out-of-cluster object image feature 3. Again, the target value of each may be calculated as the weighted sum of its feature distance and its label stability, and the out-of-cluster object image feature with the largest target value (for example, out-of-cluster object image feature 2) is selected as a second candidate object image feature.
Then, out-of-cluster object image feature 2 is added to the current second image feature set to be compared, which then includes: out-of-cluster object image feature 2, out-of-cluster object image feature 4, center object image feature 5, center object image feature 6, center object image feature 7, and center object image feature 8.
And so on, until a second preset number of out-of-cluster object image features have been selected, that is, until a second preset number of second candidate object image features have been determined.
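A sketch of this second selection loop follows; it differs from the first only in that the score is the weighted sum of feature distance and label stability. The weight w (and equal weighting of the two terms) is an assumption for illustration, since the application does not fix the weights:

```python
import numpy as np

def select_second_candidates(out_feats, stabilities, first_cluster_centers, k, w=1.0):
    # out_feats: (M, D) out-of-cluster object image features
    # stabilities: (M,) label stability of each out-of-cluster feature
    # first_cluster_centers: (C, D) initial second image feature set to be compared
    compare_set = [c for c in first_cluster_centers]
    remaining = list(range(len(out_feats)))
    selected = []
    for _ in range(min(k, len(remaining))):
        scores = []
        for i in remaining:
            feat_dist = sum(np.linalg.norm(out_feats[i] - g) for g in compare_set)
            scores.append(feat_dist + w * stabilities[i])  # target value
        best = remaining[int(np.argmax(scores))]
        selected.append(best)
        remaining.remove(best)
        compare_set.append(out_feats[best])
    return selected
```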
In one embodiment, before the step S10424, the method may further include the steps of:
Step 1: for each out-of-cluster object image feature, calculate the entropy of the mean of the confidences of the pseudo labels of the corresponding object across the images contained in the sample image set corresponding to that out-of-cluster object image feature, as a first value; and calculate the mean of the entropies of those confidences, as a second value.
Step 2: calculate the difference between the first value and the second value as the label stability of the out-of-cluster object image feature.
In this embodiment of the application, for each object in a first sample image, the confidence of the pseudo label of the object may be determined based on the first detection model. For example, the probability that the object is a preset object may be used as the confidence.
Accordingly, a sample image set corresponding to each out-of-cluster object image feature (which may be referred to as a target sample image set) may be determined, that is, the sample image set containing the sample image to which the object corresponding to the out-of-cluster object image feature belongs. It can be understood that this sample image may be either a sample image to be enhanced or an enhanced sample image.
One target sample image set comprises a plurality of images with the same image content, each carrying a pseudo label for the same object. Therefore, the mean of the confidences of the pseudo labels of these images for the same object can be calculated, and the entropy of this mean taken as the first value. In addition, the entropy of each image's confidence for its pseudo label of the same object can be calculated, and the mean of these entropies taken as the second value. The difference between the first value and the second value is the label stability corresponding to the target sample image set, that is, the label stability of the object image features of that object in each image of the set.
That is, the images in one target sample image set share the same label stability for each object image feature of the same object.
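In other words, label stability = H(mean of confidences) - mean of H(confidence), where H denotes entropy. A minimal sketch, assuming the confidence is a scalar probability that the object is the preset object (so binary entropy applies):

```python
import numpy as np

def binary_entropy(p, eps=1e-12):
    # entropy of a Bernoulli probability, in nats
    p = np.clip(np.asarray(p, dtype=float), eps, 1.0 - eps)
    return -(p * np.log(p) + (1.0 - p) * np.log(1.0 - p))

def label_stability(confidences):
    # confidences: pseudo-label confidences of the same object across the
    # images of one target sample image set (the sample image to be enhanced
    # plus its enhanced variants)
    confidences = np.asarray(confidences, dtype=float)
    first_value = binary_entropy(confidences.mean())    # entropy of the mean
    second_value = binary_entropy(confidences).mean()   # mean of the entropies
    return first_value - second_value
```

By Jensen's inequality this difference is non-negative, and it grows when the pseudo labels of the enhanced variants disagree with each other, which is what makes it a useful term in the pushing score above.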
In one embodiment, the step S1043 may include the following steps:
For each determined candidate object image feature, the sample image to be enhanced in the sample image set to which the first sample image corresponding to that candidate object image feature belongs is determined as a sample image to be calibrated.
In the embodiment of the application, a determined candidate object image feature may be an image feature of an object in a sample image to be enhanced; in that case, the sample image to be enhanced itself may be determined as a sample image to be calibrated.
A determined candidate object image feature may also be an image feature of an object in an enhanced sample image; in that case, the sample image to be enhanced from which that enhanced sample image was generated may be determined as a sample image to be calibrated.
In one embodiment, each first sample image may include a plurality of objects, that is, one first sample image may correspond to one candidate object image feature or to a plurality of candidate object image features. Correspondingly, if the number of first sample images corresponding to candidate object image features (which may be referred to as sample images to be selected) is large, a third preset number of sample images may be selected from the sample images to be selected as the sample images to be calibrated, based on the number of candidate object image features each of them corresponds to.
For example, the sample images with the largest numbers of corresponding candidate object image features may be selected from the sample images to be selected as the sample images to be calibrated.
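A possible sketch of this selection, assuming each candidate object image feature has already been mapped to the first sample image it came from (the data layout is an assumption):

```python
from collections import Counter

def pick_images_to_calibrate(candidate_to_image, k):
    # candidate_to_image: dict mapping candidate feature id -> sample image id
    # returns the k sample images containing the most candidate features
    counts = Counter(candidate_to_image.values())
    return [image_id for image_id, _ in counts.most_common(k)]
```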
In one embodiment, before the step S105, the method may further include the steps of:
and replacing the pseudo label of the object corresponding to the object image characteristic of the object in each sample image to be calibrated, which belongs to the same second object image characteristic cluster, with the real label of the object label in the sample image to be calibrated marked by the user.
In the embodiment of the present application, after the real label marked by the user for an object in a sample image to be calibrated is obtained, the object image features that belong to the same cluster (i.e., the same second object image feature cluster) as the object image feature of that object may also be determined (these may be referred to as object image features to be processed).
Because object image features belonging to the same cluster are similar, the pseudo labels of the objects corresponding to the object image features to be processed can be replaced with the real label marked by the user for the object in the sample image to be calibrated, so that the replaced pseudo labels are closer to the real labels. Accordingly, in step S105, the second detection model can be trained with the replaced pseudo labels, improving the accuracy of the trained target detection model.
With this processing, the user only needs to mark a part of the first sample images, and the real labels marked by the user are propagated to similar objects. This reduces the user's labeling cost while improving the accuracy of the trained target detection model.
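A minimal sketch of this label propagation, assuming dictionary-based bookkeeping of cluster assignments (the data structures are illustrative):

```python
def propagate_real_labels(pseudo_labels, cluster_of, calibrated_labels):
    # pseudo_labels: dict object_id -> pseudo label
    # cluster_of: dict object_id -> second object image feature cluster id
    # calibrated_labels: dict object_id -> real label marked by the user
    #                    (objects in the sample images to be calibrated)
    cluster_label = {cluster_of[obj]: lbl for obj, lbl in calibrated_labels.items()}
    for obj in pseudo_labels:
        cid = cluster_of[obj]
        if cid in cluster_label:
            # object image features in the same cluster are similar, so the
            # object inherits the user's real label
            pseudo_labels[obj] = cluster_label[cid]
    return pseudo_labels
```

If two calibrated objects fall in the same cluster, this sketch keeps the last label seen; the application does not specify how such ties are resolved.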
For step S105, in one implementation manner, the second detection model may be a model that has not been trained on any sample images; correspondingly, the target detection model obtained by training the second detection model can effectively detect images of the current detection scene.
In another implementation manner, the second detection model may be a model obtained by training on the second sample images, where the first detection model has a more complex model structure than the second detection model. That is, the second detection model differs in structure from the first detection model, and for the detection scenes corresponding to the second sample images, the detection performance of the first detection model is better than that of the second detection model. Correspondingly, the target detection model obtained by training the second detection model is suitable both for the current detection scene and for the detection scenes corresponding to the second sample images.
In one embodiment, referring to fig. 5, a schematic diagram of a target detection model training method provided by an embodiment of the present application:
The input data includes: labeled data (namely, the labeled second sample images of other detection scenes in the embodiment of the application), unlabeled data (namely, the unlabeled sample images to be enhanced of the current detection scene in the application), and a basic model (namely, the second detection model in the application).
The system processing comprises the following steps:
the high-precision model training module 101 trains a high-precision detection model (i.e., the first detection model in the embodiment of the present application) using the labeled data, so as to obtain the pseudo label of the object in each first sample image.
The active pushing module 102 uses the high-precision detection model to actively push the unlabeled data, obtaining the data to be calibrated (i.e., the sample images to be calibrated in the present application), so that the sample images to be calibrated can be marked manually. That is, the first object image features of the objects in each first sample image and the second object image features of the objects in each second sample image are acquired, and the sample images to be calibrated are determined from the first sample images based on these features.
Then, the semi-supervised model training module 103 performs semi-supervised training to obtain a semi-supervised model. That is, model training is performed on the second detection model to be trained based on the pseudo labels of the objects in the third sample images, the real labels of the objects in each second sample image, and the real labels of the objects in the sample images to be calibrated, to obtain the target detection model of the current detection scene.
Referring to fig. 6, fig. 6 is a schematic structural diagram of an active push module according to an embodiment of the present disclosure.
The label candidate pool generating unit 201 performs image enhancement processing on the label-free data, that is, obtains the first sample images from the sample images to be enhanced, and generates a label candidate pool based on forward prediction of the high-precision detection model. That is, each first sample image is detected based on the first detection model to obtain the pseudo labels of the objects in each first sample image.
The feature extraction unit 202 receives the first sample images and the second sample images as input, obtains domain image features by applying global average pooling to the image features output by the RoIAlign layer of the high-precision detection model, and obtains general image features using a pre-trained general convolutional neural network model.
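A minimal sketch of the pooling step, assuming PyTorch tensors; combining the domain and general features by concatenation is an assumption, as the application only states that both kinds of features are used:

```python
import torch

def object_image_feature(roi_feature: torch.Tensor) -> torch.Tensor:
    # roi_feature: (C, H, W) map output by the RoIAlign layer for one object;
    # global average pooling over the spatial dimensions yields a (C,) vector
    return roi_feature.mean(dim=(1, 2))

def combined_feature(domain_feat: torch.Tensor, general_feat: torch.Tensor) -> torch.Tensor:
    # one possible way to use the domain and general features jointly (illustrative)
    return torch.cat([domain_feat, general_feat], dim=0)
```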
The hierarchical clustering unit 203 is configured to perform hierarchical clustering on the domain image features and general image features of the first sample images output by the feature extraction unit 202 to obtain the second object image feature clusters, and to perform hierarchical clustering on the domain image features and general image features of the second sample images to obtain the first object image feature clusters.
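A sketch of the clustering step using SciPy's agglomerative hierarchical clustering; the linkage method, metric, and cut threshold are assumptions, since the application does not fix them:

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

def cluster_object_features(features: np.ndarray, threshold: float) -> np.ndarray:
    # features: (N, D) object image features; returns a cluster id per feature
    Z = linkage(features, method="average", metric="euclidean")
    return fcluster(Z, t=threshold, criterion="distance")
```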
The pushing unit 204 obtains a pushed list (i.e., a sample image to be calibrated) based on the first object image feature cluster and the second object image feature cluster.
In an embodiment, a target detection method is further provided, which may obtain an image to be detected, and input the image to be detected to a target detection model of a current detection scene obtained by using the target detection model training method in the above embodiment, so as to obtain a detection result of the image to be detected.
The target detection method provided by the embodiment of the application can determine the sample images to be calibrated among the first sample images, where a sample image to be calibrated is not matched with the other detection scenes. That is, the first detection model has not learned the image features of the objects in the sample images to be calibrated, so the accuracy of their pseudo labels is low; such sample images can therefore be pushed to the user for labeling, and the resulting real labels are used to train the second detection model, so that the trained target detection model learns the image features of the objects in the sample images to be calibrated. In contrast, the accuracy of the pseudo labels of the objects in the third sample images is high, so the second detection model can be trained directly with those pseudo labels. Therefore, the target detection model of the current detection scene can be obtained with manual marking of only a part of the first sample images (namely, the sample images to be calibrated), which reduces the detection cost and improves the generation efficiency of the detection model, and correspondingly the detection efficiency. In addition, using the trained first detection model to guide the training of the second detection model can reduce the false alarm rate of the obtained target detection model and improve the detection accuracy.
In one embodiment, the detection result may include a position of the object in the image to be detected, and/or a probability that the image to be detected contains the preset object.
In one embodiment, the target detection model is used for detecting a preset object. Specifically, by setting different parameters of the target detection model, the target detection model can output the position of the object in the image to be detected and the probability that the included object is the preset object.
In addition, if there are a plurality of preset objects, then for each preset object, the probability that an object contained in the image to be detected is that preset object can be obtained. That is to say, an object in the image to be detected may correspond to a plurality of probabilities; if one of them is greater than the preset probability threshold, the object is the preset object corresponding to that probability, that is, the image to be detected contains that preset object.
In addition, the position information of the object in the image to be detected can be determined. For example, the coordinates of the vertices of the minimum bounding rectangle of the object in the image to be detected can be determined.
It is understood that the detection result output by the target detection model is not limited to the data shown in the above embodiments; other types of detection results can be obtained by setting the output parameters of the target detection model. For example, the number of objects contained in the image to be detected, or the image areas occupied by those objects, can also be obtained.
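A sketch of how such outputs might be post-processed, assuming the model yields per-box class probabilities and minimum-bounding-rectangle coordinates (the threshold value and output layout are assumptions):

```python
def parse_detections(boxes, class_probs, threshold=0.5):
    # boxes: list of (x1, y1, x2, y2) minimum bounding rectangles
    # class_probs: per-box list of probabilities, one per preset object class
    results = []
    for box, probs in zip(boxes, class_probs):
        best = max(range(len(probs)), key=lambda c: probs[c])
        if probs[best] > threshold:
            # the box contains the preset object corresponding to `best`
            results.append({"box": box, "class": best, "prob": probs[best]})
    return results
```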
Based on the same inventive concept, an embodiment of the present application further provides a target detection model training apparatus, referring to fig. 7, where fig. 7 is a structural diagram of the target detection model training apparatus provided in the embodiment of the present application, and the apparatus may include:
a first sample image obtaining module 701, configured to obtain each first sample image of a current detection scene that is not labeled with a label;
a pseudo tag obtaining module 702, configured to detect each first sample image based on a first detection model, to obtain a tag of an object in each first sample image, where the tag is used as a pseudo tag; wherein the first detection model is obtained by training based on the labeled second sample images of detection scenes other than the current detection scene;
an object image feature obtaining module 703, configured to obtain an image feature of an object in each first sample image as a first object image feature, and obtain an image feature of an object in each second sample image as a second object image feature;
a to-be-calibrated sample image determining module 704, configured to determine, based on the first object image features and the second object image features, the sample images to be calibrated from the first sample images, so as to obtain the real labels marked by the user for the objects in the sample images to be calibrated; wherein the sample images to be calibrated are not matched with the other detection scenes;
the model training module 705 is configured to perform model training on the second detection model to be trained based on the pseudo labels of the objects in the third sample images (the first sample images other than the sample images to be calibrated), the real labels of the objects in each second sample image, and the real labels of the objects in the sample images to be calibrated, to obtain the target detection model of the current detection scene.
Optionally, the module 704 for determining an image of a sample to be calibrated includes:
the first object image feature cluster determining submodule is used for clustering the features of each second object image to obtain a plurality of object image feature clusters serving as first object image feature clusters;
the candidate object image feature determining submodule is used for determining candidate object image features from the first object image features based on the distance to each first object image feature cluster; wherein the candidate object image features do not match the second object image features;
and the to-be-calibrated sample image determining submodule is used for determining the to-be-calibrated sample image based on the first sample image corresponding to the determined image characteristics of each candidate object.
Optionally, the first sample images comprise a plurality of sample image sets; each sample image set comprises an unlabeled sample image to be enhanced of the current detection scene, and enhanced sample images obtained by performing image enhancement on the sample image to be enhanced;
the candidate image feature determination submodule includes:
the second object image feature cluster determining unit is used for clustering the first object image features to obtain a plurality of object image feature clusters serving as second object image feature clusters;
the first processing unit is used for determining the center of each second object image feature cluster as a central object image feature, and for determining the object image features, among the first object image features, that do not belong to any second object image feature cluster as out-of-cluster object image features;
a first candidate image feature determination unit, configured to determine first candidate object image features from the central object image features based on the distance to each first object image feature cluster; wherein the distance between a first candidate object image feature and each first object image feature cluster is greater than the distance between the other central object image features and each first object image feature cluster;
a second candidate image feature determination unit, configured to determine a second candidate image feature from the image features of the objects outside each cluster based on a distance from each first object image feature cluster and a label stability; wherein the label stability of one out-of-cluster object image feature represents: the consistency degree of the pseudo labels of the objects corresponding to the image features of the objects outside the cluster in the images contained in the sample image set corresponding to the image features of the objects outside the cluster;
a final candidate image feature determining unit, configured to determine the first candidate image feature and the second candidate image feature as final candidate image features.
Optionally, the first candidate object image feature determining unit is specifically configured to calculate, for each currently unselected central object image feature, a distance between the central object image feature and the current first image feature set to be compared, as a feature distance corresponding to the central object image feature; the current first image feature set to be compared comprises the centers of the first object image feature clusters;
selecting the central object image feature with the largest corresponding feature distance as a first candidate object image feature, adding it to the current first image feature set to be compared, and returning to execute the step of calculating, for each currently unselected central object image feature, the distance between the central object image feature and the current first image feature set to be compared as the feature distance corresponding to the central object image feature, until a first preset number of central object image features are determined as first candidate object image features.
Optionally, the apparatus further comprises:
before determining second candidate object image features from the out-of-cluster object image features based on the distance from each first object image feature cluster and the label stability, calculating, for each out-of-cluster object image feature, the entropy of the mean of the confidence degrees of the pseudo labels of the object corresponding to the out-of-cluster object image feature in the images contained in the corresponding sample image set, as a first numerical value, and the mean of the entropies of those confidence degrees, as a second numerical value;
and calculating the difference value of the first numerical value and the second numerical value to be used as the label stability of the image characteristics of the object outside the cluster.
Optionally, the second candidate object image feature determining unit is specifically configured to calculate, for each image feature of the currently unselected object outside the cluster, a distance between the image feature of the object outside the cluster and a current second image feature set to be compared, as a feature distance corresponding to the image feature of the object outside the cluster; the current second image feature set to be compared comprises the center of each first object image feature cluster;
calculating the label stability of the image features of the objects outside the cluster, and obtaining a target numerical value by the weighted sum of the label stability and the corresponding feature distance;
selecting the out-of-cluster object image feature with the largest target value as a second candidate object image feature, adding it to the current second image feature set to be compared, and returning to execute the step of calculating, for each currently unselected out-of-cluster object image feature, the distance between the out-of-cluster object image feature and the current second image feature set to be compared as the feature distance corresponding to the out-of-cluster object image feature, until a second preset number of out-of-cluster object image features are determined as second candidate object image features.
Optionally, the to-be-calibrated sample image determining submodule is specifically configured to determine, for each determined candidate object image feature, a to-be-enhanced sample image in a sample image set to which the first sample image corresponding to the candidate object image feature belongs, as the to-be-calibrated sample image.
Optionally, the apparatus further comprises:
and a replacing module, configured to, before model training is performed on the second detection model to be trained to obtain the target detection model of the current detection scene, replace the pseudo labels of the objects corresponding to the object image features that belong to the same second object image feature cluster as the object image features of the objects in the sample images to be calibrated, with the real labels marked by the user for the objects in the sample images to be calibrated.
Optionally, the object image feature obtaining module 703 is specifically configured to, for each first sample image, respectively input the first sample image into the first detection model and the general detection model, so as to obtain a first object image feature of an object in the first sample image output by the feature extraction layers of the first detection model and the general detection model; the universal detection model is obtained by training sample images based on a plurality of different detection scenes;
and for each second sample image, inputting the second sample image to the first detection model and the universal detection model respectively to obtain second object image characteristics of the object in the second sample image output by the feature extraction layers of the first detection model and the universal detection model.
Based on the same inventive concept, the embodiment of the present application further provides an object detection apparatus, which may include:
the image acquisition module to be detected is used for acquiring an image to be detected;
the detection module is used for inputting the image to be detected to a target detection model of a current detection scene to obtain a detection result of the image to be detected; wherein the target detection model is obtained by using the target detection model training method according to any one of the first aspect.
Optionally, the detection result includes a position of the object in the image to be detected, and/or a probability that the image to be detected contains a preset object.
The embodiment of the present application further provides an electronic device, as shown in fig. 8, comprising a processor 801, a communication interface 802, a memory 803 and a communication bus 804, wherein the processor 801, the communication interface 802 and the memory 803 communicate with each other through the communication bus 804;
a memory 803 for storing a computer program;
the processor 801 is configured to implement the target detection model training method or the target detection method in the above embodiments when executing the program stored in the memory 803.
The communication bus mentioned in the electronic device may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one thick line is shown, but this does not mean that there is only one bus or one type of bus.
The communication interface is used for communication between the electronic equipment and other equipment.
The memory may include a Random Access Memory (RAM) or a non-volatile memory, such as at least one disk memory. Optionally, the memory may also be at least one storage device located remotely from the processor.
The processor may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; it may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
The embodiment of the present application further provides a computer-readable storage medium, where instructions are stored in the computer-readable storage medium, and when the instructions are executed on a computer, the computer is enabled to execute the target detection model training method or the target detection method provided in the embodiment of the present application.
The present application further provides another computer program product containing instructions, which when run on a computer, causes the computer to execute the target detection model training method provided in the present application, or the target detection method.
In the above embodiments, the implementation may be wholly or partially realized by software, hardware, firmware, or any combination thereof. When implemented in software, may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When loaded and executed on a computer, cause the processes or functions described in accordance with the embodiments of the application to occur, in whole or in part. The computer may be a general purpose computer, a special purpose computer, a network of computers, or other programmable device. The computer instructions may be stored in a computer readable storage medium or transmitted from one computer readable storage medium to another, for example, from one website site, computer, server, or data center to another website site, computer, server, or data center via wired (e.g., coaxial cable, fiber optic, Digital Subscriber Line (DSL)) or wireless (e.g., infrared, wireless, microwave, etc.). The computer-readable storage medium can be any available medium that can be accessed by a computer or a data storage device, such as a server, a data center, etc., that incorporates one or more of the available media. The usable medium may be a magnetic medium (e.g., floppy Disk, hard Disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., Solid State Disk (SSD)), among others.
It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
All the embodiments in the present specification are described in a related manner, and the same and similar parts among the embodiments may be referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, for the apparatus, the electronic device, the computer-readable storage medium, and the computer program product embodiments, since they are substantially similar to the method embodiments, the description is relatively simple, and for the relevant points, reference may be made to the partial description of the method embodiments.
The above description is only for the preferred embodiment of the present application and is not intended to limit the scope of the present application. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present application are included in the protection scope of the present application.

Claims (16)

1. A method for training an object detection model, the method comprising:
acquiring each first sample image of the current detection scene without labeling;
detecting each first sample image based on a first detection model to obtain a label of an object in each first sample image as a pseudo label; wherein the first detection model is obtained by training based on the labeled second sample images of detection scenes other than the current detection scene;
acquiring image characteristics of the object in each first sample image as first object image characteristics, and acquiring image characteristics of the object in each second sample image as second object image characteristics;
determining sample images to be calibrated from the first sample images based on the first object image features and the second object image features, so as to obtain the real labels marked by the user for the objects in the sample images to be calibrated; wherein the sample images to be calibrated are not matched with the other detection scenes;
and performing model training on the second detection model to be trained based on the pseudo labels of the objects in the third sample images, which are the first sample images other than the sample images to be calibrated, the real labels of the objects in each second sample image, and the real labels of the objects in the sample images to be calibrated, to obtain the target detection model of the current detection scene.
2. The method of claim 1, wherein the determining the sample image to be calibrated from each first sample image based on the image features of each first object and the image features of each second object comprises:
clustering the second object image features to obtain a plurality of object image feature clusters serving as first object image feature clusters;
determining candidate image features from the first object image features based on the distance between the candidate image features and the first object image feature clusters; wherein the candidate image feature does not match the second object image feature;
and determining a sample image to be calibrated based on the determined first sample image corresponding to the image characteristics of each candidate object.
3. The method of claim 2, wherein the first sample images comprise a plurality of sample image sets; each sample image set comprises an unlabeled sample image to be enhanced of the current detection scene, and enhanced sample images obtained by performing image enhancement on the sample image to be enhanced;
the determining candidate image features from the first object image features based on the distance to the first object image feature clusters includes:
clustering the first object image features to obtain a plurality of object image feature clusters serving as second object image feature clusters;
determining the center of each second object image feature cluster as a central object image feature, and determining the object image features, among the first object image features, that do not belong to any second object image feature cluster as out-of-cluster object image features;
determining first candidate object image features from the central object image features based on the distance to each first object image feature cluster; wherein the distance between a first candidate object image feature and each first object image feature cluster is greater than the distance between the other central object image features and each first object image feature cluster;
determining a second candidate object image feature from the image features of the objects outside each cluster based on the distance between each first object image feature cluster and the label stability; wherein the label stability of one out-of-cluster object image feature represents: the consistency degree of the pseudo labels of the objects corresponding to the image features of the objects outside the cluster in the images contained in the sample image set corresponding to the image features of the objects outside the cluster;
and determining the first candidate object image characteristic and the second candidate object image characteristic as a final candidate object image characteristic.
4. The method of claim 3, wherein determining a first candidate image feature from the central object image features based on the distance to the first object image feature clusters comprises:
aiming at each central object image feature which is not selected currently, calculating the distance between the central object image feature and the current first image feature set to be compared as the feature distance corresponding to the central object image feature; the current first image feature set to be compared comprises the centers of the first object image feature clusters;
selecting the central object image feature with the largest corresponding feature distance as a first candidate object image feature, adding it to the current first image feature set to be compared, and returning to execute the step of calculating, for each currently unselected central object image feature, the distance between the central object image feature and the current first image feature set to be compared as the feature distance corresponding to the central object image feature, until a first preset number of central object image features are determined as first candidate object image features.
5. The method of claim 3, wherein prior to determining the second candidate image feature from the out-of-cluster object image features based on the distance from each cluster of first object image features and the label stability, the method further comprises:
for each out-of-cluster object image feature, calculating the entropy of the mean of the confidence degrees of the pseudo labels of the object corresponding to the out-of-cluster object image feature in the images contained in the sample image set corresponding to the out-of-cluster object image feature, as a first numerical value, and calculating the mean of the entropies of those confidence degrees, as a second numerical value;
and calculating the difference value of the first numerical value and the second numerical value to be used as the label stability of the image characteristics of the object outside the cluster.
6. The method of claim 3, wherein determining the second candidate image feature from the cluster-outside object image features based on the distance from each cluster of first object image features and the label stability comprises:
calculating the distance between the image features of the objects outside the cluster and the current second image feature set to be compared according to the image features of the objects outside the cluster which are not selected currently, and taking the distance as the feature distance corresponding to the image features of the objects outside the cluster; the current second image feature set to be compared comprises the center of each first object image feature cluster;
calculating the label stability of the image features of the objects outside the cluster, and obtaining a target numerical value by the weighted sum of the label stability and the corresponding feature distance;
selecting the out-of-cluster object image feature with the largest target value as a second candidate object image feature, adding it to the current second image feature set to be compared, and returning to execute the step of calculating, for each currently unselected out-of-cluster object image feature, the distance between the out-of-cluster object image feature and the current second image feature set to be compared as the feature distance corresponding to the out-of-cluster object image feature, until a second preset number of out-of-cluster object image features are determined as second candidate object image features.
7. The method according to claim 3, wherein the determining a sample image to be calibrated based on the first sample image corresponding to the determined image feature of each candidate object comprises:
for each determined candidate object image feature, determining the sample image to be enhanced in the sample image set to which the first sample image corresponding to the candidate object image feature belongs, as a sample image to be calibrated.
8. The method according to claim 3, wherein before model training is performed on the second detection model to be trained based on the pseudo label of the object in the third sample image in each first sample image except the sample image to be calibrated, the real label of the object in each second sample image, and the real label of the object in the sample image to be calibrated, and a target detection model of the current detection scene is obtained, the method further comprises:
and replacing the pseudo label of the object corresponding to the object image characteristic of the object in each sample image to be calibrated, which belongs to the same second object image characteristic cluster, with the real label of the object label in the sample image to be calibrated marked by the user.
9. The method according to claim 1, wherein the acquiring image features of the object in each first sample image as first object image features and acquiring image features of the object in each second sample image as second object image features comprises:
for each first sample image, inputting the first sample image into the first detection model and the universal detection model respectively to obtain first object image features of objects in the first sample image output by the feature extraction layers of the first detection model and the universal detection model; the universal detection model is obtained by training sample images based on a plurality of different detection scenes;
and for each second sample image, inputting the second sample image to the first detection model and the universal detection model respectively to obtain second object image characteristics of the object in the second sample image output by the feature extraction layers of the first detection model and the universal detection model.
10. A method of object detection, the method comprising:
acquiring an image to be detected;
inputting the image to be detected into a target detection model of a current detection scene to obtain a detection result of the image to be detected; wherein the object detection model is obtained by the object detection model training method according to any one of claims 1 to 9.
11. The method according to claim 10, wherein the detection result comprises a position of an object in the image to be detected and/or a probability that the image to be detected contains a preset object.
12. An object detection model training apparatus, characterized in that the apparatus comprises:
the first sample image acquisition module is used for acquiring each first sample image of the current detection scene without the label;
the pseudo label acquisition module is used for detecting each first sample image based on the first detection model to obtain a label of an object in each first sample image as a pseudo label; wherein the first detection model is obtained by training based on the labeled second sample images of detection scenes other than the current detection scene;
the object image characteristic acquisition module is used for acquiring the image characteristics of the object in each first sample image as first object image characteristics and acquiring the image characteristics of the object in each second sample image as second object image characteristics;
the to-be-calibrated sample image determining module is used for determining sample images to be calibrated from the first sample images based on the first object image features and the second object image features, so as to obtain the real labels marked by the user for the objects in the sample images to be calibrated; wherein the sample images to be calibrated are not matched with the other detection scenes;
and the model training module is used for performing model training on the second detection model to be trained based on the pseudo labels of the objects in the third sample images (the first sample images other than the sample images to be calibrated), the real labels of the objects in each second sample image, and the real labels of the objects in the sample images to be calibrated, to obtain the target detection model of the current detection scene.
13. An object detection apparatus, characterized in that the apparatus comprises:
the image acquisition module to be detected is used for acquiring an image to be detected;
the detection module is used for inputting the image to be detected to a target detection model of a current detection scene to obtain a detection result of the image to be detected; wherein the object detection model is obtained by the object detection model training method according to any one of claims 1 to 9.
14. The apparatus according to claim 13, wherein the detection result comprises a position of an object in the image to be detected, and/or a probability that the image to be detected contains a preset object.
15. An electronic device, comprising a processor, a communication interface, a memory and a communication bus, wherein the processor, the communication interface and the memory communicate with each other through the communication bus;
a memory for storing a computer program;
a processor for implementing the method steps of any of claims 1-9, or 10-11 when executing a program stored in a memory.
16. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the method steps of any one of claims 1 to 9, or 10 to 11.
CN202111582593.3A 2021-12-22 2021-12-22 Target detection model training method, target detection method and device Pending CN114299480A (en)
