CN112215255A - Training method of target detection model, target detection method and terminal equipment - Google Patents


Info

Publication number: CN112215255A (application CN202010933269.0A)
Authority: CN (China)
Prior art keywords: target, domain, image, domain image, detection model
Legal status: Granted (the legal status is an assumption and is not a legal conclusion)
Application number: CN202010933269.0A
Other languages: Chinese (zh)
Other versions: CN112215255B (en)
Inventors: 李国法 (Li Guofa), 纪泽锋 (Ji Zefeng)
Current Assignee: Shenzhen University
Original Assignee: Shenzhen University
Application filed by Shenzhen University
Priority to CN202010933269.0A
Publication of CN112215255A
Application granted
Publication of CN112215255B
Current legal status: Active
Anticipated expiration


Classifications

    • G06F18/214: Pattern recognition; generating training patterns; bootstrap methods, e.g. bagging or boosting
    • G06F18/24: Pattern recognition; classification techniques
    • G06N3/08: Neural networks; learning methods
    • G06V2201/07: Indexing scheme for image or video recognition or understanding; target detection
    • G06V2201/09: Indexing scheme for image or video recognition or understanding; recognition of logos
    • Y02T10/40: Climate change mitigation technologies related to transportation; engine management systems


Abstract

The application is applicable to the technical field of image processing and provides a training method for a target detection model, which includes the following steps: acquiring a plurality of data sets, each data set including a source domain image and a target domain image; converting the source domain image and the target domain image in each data set to obtain a transition domain image; and inputting the transition domain image and the target domain image into an adversarial learning model and training the target detection model to obtain a trained target detection model, where the adversarial learning model includes the target detection model and a domain classification model, and the target detection model and the domain classification model learn adversarially. With this scheme, the domain classification model and the target detection model learn against each other, so that the target detection model adapts to different scenes and the detection accuracy for images of different scenes is improved.

Description

Training method of target detection model, target detection method and terminal equipment
Technical Field
The application belongs to the technical field of image processing, and particularly relates to a training method of a target detection model, a target detection method and terminal equipment.
Background
With the rapid development of deep learning in recent years, the performance of image-based target detection has steadily improved, and such detection is now widely applied to fields such as automatic driving, intelligent transportation, surveillance systems, and face detection.
However, conventional target detection techniques often rely on a large number of reliably annotated source domain images for supervised learning, and they perform poorly when the training data and the test data come from different domains (where a domain refers to the environment in which an image is captured; for example, daytime and nighttime are different domains). Such domain difference, or domain shift, often leads to poor generalization of the model and thus low detection accuracy for images, which is a problem to be solved urgently.
Disclosure of Invention
In view of this, embodiments of the present application provide a training method for a target detection model, a target detection method, and a terminal device, which can solve the technical problem that domain difference or domain shift often causes poor generalization of a model, resulting in low detection accuracy for images.
A first aspect of an embodiment of the present application provides a method for training a target detection model, where the method includes:
acquiring a plurality of data sets, wherein each data set comprises a source domain image and a target domain image; the source domain image comprises the labeling information of the target object; the target domain image does not comprise the labeling information of the target object;
converting the source domain image and the target domain image in each data set to obtain a transition domain image; the transition domain image comprises the target object in the source domain image, the annotation information in the source domain image and background information in the target domain image;
inputting the transition domain image and the target domain image into an adversarial learning model and training a target detection model to obtain a trained target detection model; the adversarial learning model includes the target detection model and a domain classification model; and the target detection model and the domain classification model learn adversarially.
A second aspect of an embodiment of the present application provides a method for target detection, where the method includes:
collecting an image to be detected;
acquiring a pre-trained target detection model; the pre-trained target detection model is obtained by training an adversarial learning model using a transition domain image and a target domain image; the adversarial learning model includes the target detection model and a domain classification model; the target detection model and the domain classification model learn adversarially; and the transition domain image is an image obtained by converting a source domain image and the target domain image;
and inputting the image to be detected into the target detection model to obtain a target detection result output by the target detection model.
A third aspect of an embodiment of the present application provides a training apparatus for a target detection model, where the apparatus includes:
a first acquisition unit configured to acquire a plurality of data sets, each of the data sets including a source domain image and a target domain image; the source domain image comprises the labeling information of the target object; the target domain image does not comprise the labeling information of the target object;
the image processing unit is used for converting the source domain image and the target domain image in each data set to obtain a transition domain image; the transition domain image comprises the target object in the source domain image, the annotation information in the source domain image and background information in the target domain image;
the training unit is used for inputting the transition domain image and the target domain image into an adversarial learning model and training a target detection model to obtain a trained target detection model; the adversarial learning model includes the target detection model and a domain classification model; and the target detection model and the domain classification model learn adversarially.
A fourth aspect of an embodiment of the present application provides an apparatus for object detection, including:
the acquisition unit is used for acquiring an image to be detected;
the second acquisition unit is used for acquiring a pre-trained target detection model; the pre-trained target detection model is obtained by training an adversarial learning model using a transition domain image and a target domain image; the adversarial learning model includes the target detection model and a domain classification model; the target detection model and the domain classification model learn adversarially; and the transition domain image is an image obtained by converting a source domain image and the target domain image;
and the identification unit is used for inputting the image to be detected into the target detection model to obtain a target detection result output by the target detection model.
A fifth aspect of embodiments of the present application provides a terminal device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor implements the steps of the method according to the first aspect or the second aspect when executing the computer program.
A sixth aspect of embodiments of the present application provides a computer-readable storage medium, which stores a computer program that, when executed by a processor, implements the steps of the method of the first or second aspect.
Compared with the prior art, the embodiments of the present application have the following advantages: a plurality of data sets are acquired, each data set including a source domain image and a target domain image; the source domain image and the target domain image in each data set are converted to obtain a transition domain image; and the transition domain image and the target domain image are input into an adversarial learning model to train a target detection model and obtain a trained target detection model, where the adversarial learning model includes the target detection model and a domain classification model that learn adversarially. With this scheme, the domain classification model and the target detection model learn against each other, so that the target detection model adapts to different scenes and the detection accuracy for images of different scenes is improved.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed in the description of the embodiments or the related art are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present application, and other drawings can be derived from them by those skilled in the art without creative effort.
FIG. 1 is a schematic flowchart of a training method of a target detection model provided by the present application;
FIG. 2 is a schematic diagram of a source domain image provided by the present application;
FIG. 3 is a schematic diagram of a target domain image provided by the present application;
FIG. 4 is a detailed flowchart of step 102 of the training method of the target detection model provided by the present application;
FIG. 5 is a schematic diagram of a first image provided by the present application;
FIG. 6 is a detailed flowchart of step 103 of the training method of the target detection model provided by the present application;
FIG. 7 is a schematic diagram of the hierarchy of the target detection model and the domain classification model in the training method of the target detection model provided by the present application;
FIG. 8 is a detailed flowchart of step 1032 of the training method of the target detection model provided by the present application;
FIG. 9 is a detailed flowchart of step 1035 of the training method of the target detection model provided by the present application;
FIG. 10 is a detailed flowchart of step 1037 of the training method of the target detection model provided by the present application;
FIG. 11 is a schematic flowchart of another training method of the target detection model provided by the present application;
FIG. 12 is a schematic flowchart of a target detection method provided by the present application;
FIG. 13 is a schematic diagram of a training apparatus for a target detection model provided by the present application;
FIG. 14 is a schematic diagram of an apparatus for target detection provided by the present application;
FIG. 15 is a schematic diagram of a terminal device according to an embodiment of the present application.
Detailed Description
In the following description, for purposes of explanation and not limitation, specific details are set forth, such as particular system structures, techniques, etc. in order to provide a thorough understanding of the embodiments of the present application. It will be apparent, however, to one skilled in the art that the present application may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present application with unnecessary detail.
It will be understood that the terms "comprises" and/or "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It should also be understood that the term "and/or" as used in this specification and the appended claims refers to and includes any and all possible combinations of one or more of the associated listed items.
As used in this specification and the appended claims, the term "if" may be interpreted, depending on the context, as "when", "upon", "in response to determining", or "in response to monitoring". Similarly, the phrase "if it is determined" or "if [a described condition or event] is monitored" may be interpreted, depending on the context, to mean "upon determining", "in response to determining", "upon monitoring [the described condition or event]", or "in response to monitoring [the described condition or event]".
Furthermore, in the description of the present application and the appended claims, the terms "first," "second," "third," and the like are used for distinguishing between descriptions and not necessarily for describing or implying relative importance.
Reference throughout this specification to "one embodiment" or "some embodiments," or the like, means that a particular feature, structure, or characteristic described in connection with the embodiment is included in one or more embodiments of the present application. Thus, appearances of the phrases "in one embodiment," "in some embodiments," "in other embodiments," or the like, in various places throughout this specification are not necessarily all referring to the same embodiment, but rather "one or more but not all embodiments" unless specifically stated otherwise. The terms "comprising," "including," "having," and variations thereof mean "including, but not limited to," unless expressly specified otherwise.
In machine learning, a domain refers to the scene of an image. For example, daytime and nighttime are different domains, and sunny days and rainy days are different domains. Objects appear differently in different domains. Traditional target detection usually assumes that the training data and the test data come from the same domain, whereas in real-world applications the data often lie in a domain different from that of the training data. Making a target detection model suitable for different domains would require a large number of source domain images with annotation information for each domain; however, collecting annotated source domain images is expensive and sometimes impossible.
Conventional target detection techniques therefore perform poorly when the training data and the test data come from different domains (where a domain refers to a different scene of the image; domains may differ, for example, in scenery, weather, lighting conditions, and camera settings). Such domain difference, or domain shift, often leads to poor generalization of the model and thus low target detection accuracy, which is a problem to be solved urgently.
In view of this, embodiments of the present application provide a training method for a target detection model, a target detection method, and a terminal device, which can solve the above technical problems.
Referring to fig. 1, fig. 1 shows a schematic flow chart of a training method of a target detection model provided in the present application.
As shown in fig. 1, the method may include the steps of:
step 101, acquiring a plurality of data sets, wherein each data set comprises a source domain image and a target domain image; the source domain image comprises the labeling information of the target object; the target domain image does not include the labeling information of the target object.
A plurality of pre-stored source domain images and target domain images are acquired. The source domain image includes the annotation information of the target object. Referring to fig. 2, fig. 2 shows a schematic diagram of the source domain image provided by the present application. As shown in fig. 2, the source domain image is a road image captured in the daytime; the rectangular box around the vehicle (the target object) is the annotation, and this annotation box defines the size and position of the target object. The target domain image is an image from a domain different from the source domain and carries no annotation information. Referring to fig. 3, fig. 3 shows a schematic diagram of the target domain image provided by the present application. As shown in fig. 3, the target domain image is a road image captured at night.
Step 102, converting the source domain image and the target domain image in each data set to obtain a transition domain image; the transition domain image includes the target object in the source domain image, the annotation information in the source domain image, and background information in the target domain image.
The source domain image and the target domain image typically share some commonalities (i.e., parts of the same feature distribution) as well as differences. The source domain image and the target domain image are therefore combined to obtain the transition domain image, so that the transition domain image simultaneously has characteristics of both. This reduces the difference between the domains, better captures what the two domains have in common, and serves the purpose of training the target detection model.
The process of converting the source domain image and the target domain image includes the following steps. Referring to fig. 4, fig. 4 is a detailed flowchart of step 102 of the training method of the target detection model provided by the present application.
Step 1021, inputting the source domain image and the target domain image into an image conversion model to obtain a first image output by the image conversion model; the first image includes the target object in the source domain image and background information in the target domain image.
The source domain image and the target domain image are converted into the first image through the image conversion model. The first image includes the target object from the source domain image and the background information from the target domain image, with the position and size of the target object in the image unchanged. That is, the background information of the target domain image and the target object of the source domain image are synthesized into the first image.
The image conversion model may be, for example, CycleGAN (a cycle-consistent generative adversarial network).
Illustratively, CycleGAN contains two generative adversarial networks (GANs). Each GAN has a generator and a discriminator for learning a conversion function between the source domain image and the target domain image. Each generator learns its corresponding conversion function (i.e., from the source domain image to the target domain image, or from the target domain image to the source domain image) by minimizing a loss, and generates the corresponding converted image through that conversion function. The discriminator evaluates the difference between the converted image and real images of the corresponding domain to compute the generator loss; the generator and discriminator parameters are then updated so that the two improve each other and a better conversion effect is obtained. The first image output by the image conversion model is thereby obtained.
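For illustration, the following is a minimal sketch (in PyTorch) of one CycleGAN-style update in the source-to-target direction. The tiny networks, layer sizes, learning rate, and the cycle weight of 10.0 are stand-in assumptions, not values taken from this patent; a real implementation would use full ResNet generators and PatchGAN discriminators and would train both directions symmetrically.

```python
# A minimal sketch of one CycleGAN-style training step (assumed toy networks).
import torch
import torch.nn as nn

def tiny_net(in_ch, out_ch):
    # Stand-in for a full ResNet generator / PatchGAN discriminator.
    return nn.Sequential(nn.Conv2d(in_ch, 32, 3, padding=1), nn.ReLU(),
                         nn.Conv2d(32, out_ch, 3, padding=1))

G_s2t, G_t2s = tiny_net(3, 3), tiny_net(3, 3)  # source->target and target->source generators
D_t = nn.Sequential(tiny_net(3, 1), nn.AdaptiveAvgPool2d(1))  # "is this a real target-domain image?"

opt_G = torch.optim.Adam(list(G_s2t.parameters()) + list(G_t2s.parameters()), lr=2e-4)
opt_D = torch.optim.Adam(D_t.parameters(), lr=2e-4)
bce, l1 = nn.BCEWithLogitsLoss(), nn.L1Loss()

src = torch.rand(4, 3, 128, 128)               # batch of source domain images
tgt = torch.rand(4, 3, 128, 128)               # batch of target domain images

# Generator step: translate, cycle back, and try to fool the discriminator.
fake_tgt = G_s2t(src)                          # source content, target-domain appearance
rec_src = G_t2s(fake_tgt)                      # cycle back to the source domain
d_fake = D_t(fake_tgt).flatten(1)
loss_G = bce(d_fake, torch.ones_like(d_fake)) + 10.0 * l1(rec_src, src)
opt_G.zero_grad(); loss_G.backward(); opt_G.step()

# Discriminator step: real target images vs. translated (detached) images.
d_real, d_fake = D_t(tgt).flatten(1), D_t(fake_tgt.detach()).flatten(1)
loss_D = bce(d_real, torch.ones_like(d_real)) + bce(d_fake, torch.zeros_like(d_fake))
opt_D.zero_grad(); loss_D.backward(); opt_D.step()
# After training, fake_tgt plays the role of the "first image" of step 1021.
```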
Step 1022, synthesizing the annotation information of the target object into the first image to obtain the transition domain image.
The annotation information of the target object is synthesized into the first image according to the position of the target object, yielding the transition domain image. Referring to fig. 5, fig. 5 shows a schematic diagram of the first image provided by the present application. As shown in fig. 5, the transition domain image has the same target object and annotation information as the source domain image, and the same background information as the target domain image, which reduces the difference between the domains.
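Since the translation preserves the layout of the scene, assembling a transition domain sample can be as simple as pairing the translated image with the original source annotations. The helper below is a hypothetical illustration (the box format is an assumption), not code from the patent:

```python
# Hypothetical sketch of step 1022: carry the source annotations over unchanged.
def make_transition_sample(translated_img, source_boxes, source_labels):
    # The translation keeps positions and sizes, so the source boxes still fit.
    return {"image": translated_img,   # target-domain appearance
            "boxes": source_boxes,     # (N, 4) xyxy boxes from the source annotation
            "labels": source_labels}   # class ids from the source annotation
```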
Step 103, inputting the transition domain image and the target domain image into an adversarial learning model and training the target detection model to obtain a trained target detection model; the adversarial learning model includes the target detection model and a domain classification model; and the target detection model and the domain classification model learn adversarially.
The adversarial learning model includes two models: the target detection model and the domain classification model. The transition domain image and the target domain image are input into the adversarial learning model, and the target detection model and the domain classification model learn against each other, yielding the trained target detection model.
Specifically, step 103 includes the following steps. Referring to fig. 6, fig. 6 is a detailed flowchart of step 103 of the training method of the target detection model provided by the present application.
Step 1031, extracting a transition feature map in the transition domain image and a target feature map in the target domain image through a first convolution network shared by a target detection model and a domain classification model.
To better explain the technical solution of this embodiment, it is described with reference to fig. 7. Referring to fig. 7, fig. 7 is a schematic diagram of the hierarchy of the adversarial learning model in the training method of the target detection model provided by the present application. As shown in fig. 7, the adversarial learning model includes a first convolutional network, a second convolutional network, a third convolutional network, a domain classifier, a gradient inversion layer, and a target detector. The first convolutional network extracts features shared by the transition domain image and the target domain image. The second convolutional network extracts annotation-related features from the transition feature map. The third convolutional network extracts domain-type-related features from the target feature map. The domain classifier identifies the domain type. The target detector identifies the annotation information and classifies the target object. The gradient inversion layer negates the partial derivatives during back propagation, so that the first convolutional network and the domain classifier learn adversarially.
Fig. 7 includes the hierarchies of both the target detection model and the domain classification model: the layers from the first convolutional network to the target detector form the target detection model, and the layers from the first convolutional network to the domain classifier form the domain classification model.
As can be seen from the above, the first convolutional network is shared by the target detection model and the domain classification model. In this embodiment, the transition domain image and the target domain image are input into the first convolutional network, which extracts the transition feature map from the transition domain image and the target feature map from the target domain image.
It can be understood that, because the transition feature map and the target feature map are both extracted by the same first convolutional network, they capture features common to the transition domain image and the target domain image.
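To make the hierarchy of FIG. 7 concrete, the following is a minimal sketch of the adversarial wiring: a shared backbone ("first convolutional network") feeds a detection head and, through a gradient inversion layer, a domain classification head. All layer sizes here are illustrative assumptions, and a fuller gradient inversion layer (with the adaptive factor) is sketched after step 1037c below.

```python
# A minimal sketch (assumed layer sizes) of the shared-backbone adversarial model.
import torch
import torch.nn as nn

class _GradReverse(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        return x.clone()                 # identity on the forward pass
    @staticmethod
    def backward(ctx, grad):
        return -grad                     # flip the back-propagated partial derivatives

class AdversarialDetector(nn.Module):
    def __init__(self, num_classes=1):
        super().__init__()
        self.backbone = nn.Sequential(   # "first convolutional network" (shared)
            nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU())
        # "second convolutional network" + target detector: per-pixel class
        # heatmap channels plus offset (2) and size (2) channels.
        self.det_head = nn.Conv2d(64, num_classes + 4, 1)
        self.dom_head = nn.Sequential(   # "third convolutional network" + domain classifier
            nn.Conv2d(64, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 1, 1))         # per-location domain logit

    def forward(self, x):
        feat = self.backbone(x)                            # shared features
        det_out = self.det_head(feat)                      # detection branch (steps 1032-1034)
        dom_out = self.dom_head(_GradReverse.apply(feat))  # domain branch (steps 1035-1037)
        return det_out, dom_out

model = AdversarialDetector()
det, dom = model(torch.rand(2, 3, 128, 128))  # det: (2, 5, 128, 128), dom: (2, 1, 128, 128)
```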
Step 1032, identifying the annotation information in the transition domain image according to the transition feature map.
After the transition feature map is obtained, the annotation information in the transition domain image is identified through the subsequent hierarchy of the target detection model.
Specifically, step 1032 includes the following steps. Referring to fig. 8, fig. 8 is a detailed flowchart of step 1032 of the training method of the target detection model provided by the present application.
Step 1032a, extracting a first feature map to be identified in the transition feature map through a second convolution network in the target detection model.
The transition feature map is input into the second convolutional network, which extracts the first feature map to be identified from the transition feature map.
Step 1032b, identifying the annotation information in the transition domain image according to the first feature map to be identified.
Traditional cross-domain target detection usually obtains a number of candidate regions by sliding densely arranged regions over the image and then aligning the distributions of those candidate regions. A large amount of erroneous information often exists in the extracted candidate regions, and forcibly aligning their distributions harms target detection performance.
Therefore, in view of the above problems, the present application predicts the keypoint (i.e., the center) of the annotation using the recognition method of CenterNet (a keypoint-based detection model), and then regresses the other attributes of the annotation, such as size, 3D position, orientation, and even pose.
It can be understood that, as the transition domain image is processed layer by layer, the first feature map to be identified is a heatmap. The heatmap contains a number of peak points, which are the keypoints of the target objects. The center point of the annotation (i.e., the rectangular annotation box) can therefore be predicted by the keypoint detection method.
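As an illustration of reading keypoints off such a heatmap, the sketch below finds local maxima with a max-pooling trick and keeps the top-K peaks; K = 100 and the pooling window are common CenterNet-style defaults assumed here, not values stated in the patent.

```python
# Sketch of decoding center keypoints from a heatmap of confidences.
import torch
import torch.nn.functional as F

def decode_peaks(heatmap, k=100):
    # heatmap: (B, C, H, W) of per-class center confidences in [0, 1]
    b, c, h, w = heatmap.shape
    pooled = F.max_pool2d(heatmap, kernel_size=3, stride=1, padding=1)
    peaks = heatmap * (pooled == heatmap)          # keep only local maxima
    scores, idx = peaks.flatten(1).topk(k)         # top-K peaks per image
    cls = torch.div(idx, h * w, rounding_mode="floor")
    ys = torch.div(idx % (h * w), w, rounding_mode="floor")
    xs = idx % w
    return scores, cls, ys, xs                     # confidence, class, center coordinates

scores, cls, ys, xs = decode_peaks(torch.rand(1, 3, 96, 96))
```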
Step 1033, calculating a first loss value of the target detection model according to the annotation information.
The ground-truth annotation information in the source domain image is acquired, and the first loss value of the target detection model is calculated from the ground-truth annotation information and the annotation information output by the target detection model. The annotation information output by the target detection model includes the target keypoints and the size of the rectangular box.
The first loss value is calculated by the following formula (1):

$$L_{det} = L_k + \lambda_{off} L_{off} + \lambda_{size} L_{size}$$

where $L_{det}$ denotes the first loss value, $L_k$ denotes the target keypoint loss value, $L_{off}$ denotes the target center offset loss value, $L_{size}$ denotes the target size loss value, and $\lambda_{off}$ and $\lambda_{size}$ denote the corresponding weighting coefficients.
The target keypoint loss value $L_k$ is calculated as follows:

The center point coordinates of the ground-truth annotation are down-sampled and then splatted onto the transition feature map through a Gaussian kernel, from which the target keypoint loss value $L_k$ is computed.

First, the Gaussian kernel $Y_{xyc}$ is given by formula (2):

$$Y_{xyc} = \exp\left(-\frac{(x-\tilde{p}_x)^2 + (y-\tilde{p}_y)^2}{2\sigma_p^2}\right)$$

where $\sigma_p$ denotes the standard deviation, $x$ and $y$ denote coordinates on the feature map, and $\tilde{p}_x$ and $\tilde{p}_y$ denote the down-sampled center point coordinates of the ground-truth annotation.
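For illustration, the sketch below splats one ground-truth center onto a single-class heatmap with the Gaussian kernel of formula (2); the concrete sigma value is an assumption (in practice it is derived from the object size).

```python
# Sketch of building the ground-truth heatmap with the Gaussian kernel of formula (2).
import numpy as np

def draw_gaussian(heatmap, cx, cy, sigma):
    # heatmap: (H, W) array for one class; (cx, cy): down-sampled center coordinates
    h, w = heatmap.shape
    ys, xs = np.ogrid[0:h, 0:w]
    g = np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2 * sigma ** 2))
    np.maximum(heatmap, g, out=heatmap)  # overlapping objects keep the larger value
    return heatmap

hm = draw_gaussian(np.zeros((96, 96), dtype=np.float32), cx=40, cy=25, sigma=2.5)
```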
Then, the target keypoint loss value $L_k$ is calculated by formula (3):

$$L_k = -\frac{1}{N}\sum_{xyc}\begin{cases}(1-\hat{Y}_{xyc})^{\alpha}\log(\hat{Y}_{xyc}) & \text{if } Y_{xyc}=1\\(1-Y_{xyc})^{\beta}(\hat{Y}_{xyc})^{\alpha}\log(1-\hat{Y}_{xyc}) & \text{otherwise}\end{cases}$$

where $\alpha$ and $\beta$ denote the hyperparameters of the keypoint loss, $N$ denotes the number of keypoints, $Y_{xyc}$ denotes the Gaussian-kernel ground truth, and $\hat{Y}_{xyc}$ denotes the detection confidence.
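A sketch of this penalty-reduced focal loss is given below; alpha = 2 and beta = 4 are the values commonly used with CenterNet and are assumptions here, since the hyperparameters are left unspecified.

```python
# Sketch of the keypoint focal loss of formula (3).
import torch

def keypoint_focal_loss(pred, gt, alpha=2, beta=4, eps=1e-6):
    # pred, gt: (B, C, H, W); gt is the Gaussian-splatted ground-truth heatmap
    pred = pred.clamp(eps, 1 - eps)
    pos = gt.eq(1).float()                    # exact center locations
    pos_loss = ((1 - pred) ** alpha) * torch.log(pred) * pos
    neg_loss = ((1 - gt) ** beta) * (pred ** alpha) * torch.log(1 - pred) * (1 - pos)
    n = pos.sum().clamp(min=1)                # N = number of keypoints
    return -(pos_loss.sum() + neg_loss.sum()) / n
```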
The target center offset loss $L_{off}$ is calculated by formula (4):

$$L_{off} = \frac{1}{N}\sum_{p}\left|\hat{O}_{\tilde{p}} - \left(\frac{p}{R} - \tilde{p}\right)\right|$$

where $\hat{O}_{\tilde{p}}$ denotes the predicted offset, $R$ denotes the output stride, and $p$ and $\tilde{p}$ denote the ground-truth keypoint and the down-sampled target keypoint, respectively.
The target size loss value $L_{size}$ is calculated as follows:

Suppose $(x_1^{(k)}, y_1^{(k)}, x_2^{(k)}, y_2^{(k)})$ is the bounding box of object $k$ of class $c_k$. All center points in the image are predicted by the keypoint estimator $\hat{Y}$, and for each object $k$ the object size $s_k$ is regressed as shown in formula (5):

$$s_k = \left(x_2^{(k)} - x_1^{(k)},\; y_2^{(k)} - y_1^{(k)}\right)$$

The target size loss value $L_{size}$ is calculated by formula (6):

$$L_{size} = \frac{1}{N}\sum_{k=1}^{N}\left|\hat{S}_{p_k} - s_k\right|$$

where $s_k$ denotes the size of object $k$ and $\hat{S}_{p_k}$ denotes the predicted size.
From the target keypoint loss value $L_k$, the target center offset loss value $L_{off}$, and the target size loss value $L_{size}$ obtained above, the first loss value $L_{det}$ is calculated by formula (1).
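Putting the pieces together, a sketch of the first loss value follows. The masked L1 helper serves both formula (4) and formula (6), and the weights lambda_off = 1 and lambda_size = 0.1 follow common CenterNet practice; they are assumptions, not values from the patent.

```python
# Sketch of assembling the first loss value of formula (1).
import torch

def masked_l1(pred, target, mask):
    # Masked L1 used for both offset (formula 4) and size (formula 6) regression;
    # pred/target: (B, 2, H, W); mask: (B, 1, H, W) marking keypoint locations.
    n = mask.sum().clamp(min=1)
    return (torch.abs(pred - target) * mask).sum() / n

def detection_loss(l_k, l_off, l_size, lam_off=1.0, lam_size=0.1):
    # Formula (1); the lambda weights are assumed defaults.
    return l_k + lam_off * l_off + lam_size * l_size
```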
Step 1034, adjusting a first parameter of the target detection model according to the first loss value by using a back propagation algorithm.
The partial derivatives of the first parameters are calculated from the first loss value, and the first parameters are updated by gradient descent.
Step 1035, identifying the domain class of the target domain image according to the target feature map; the domain class is either the transition domain or the target domain.
The target feature map is fed into the subsequent hierarchy of the domain classification model, which identifies the domain class.
Specifically, step 1035 includes the following steps. Referring to fig. 9, fig. 9 is a detailed flowchart of step 1035 of the training method of the target detection model provided by the present application.
Step 1035a, extracting a second feature map to be identified in the target domain through a third convolution network in the domain classification model.
The target feature map is input into the third convolutional network, which extracts the second feature map to be identified in the target domain. The second feature map to be identified is a high-dimensional feature map related to the domain type.
Step 1035b, identifying the domain class of the target domain image according to the second feature map to be identified.
The domain classifier identifies the domain class from the second feature map to be identified.
Step 1036, calculating a second loss value of the domain classification model according to the domain category.
The second loss value $L_{da}$ is calculated by the following formula (7):

$$L_{da} = -\sum_{i,u,v}\left[d_i \log p_i^{(u,v)} + (1-d_i)\log\left(1-p_i^{(u,v)}\right)\right]$$

where $d_i$ denotes the domain label of the $i$-th training image ($d_i = 0$ for the transition domain, $d_i = 1$ for the target domain), and $p_i^{(u,v)}$ denotes the output of the domain classifier at location $(u, v)$ of the feature map of the $i$-th image.
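Formula (7) is a per-location binary cross-entropy, so it can be sketched directly; the logit-based formulation below is an implementation assumption.

```python
# Sketch of the domain classification loss of formula (7).
import torch
import torch.nn.functional as F

def domain_loss(dom_logits, domain_label):
    # dom_logits: (B, 1, H, W) classifier outputs p_i^(u,v) (as logits);
    # domain_label: 0 for the transition domain, 1 for the target domain.
    target = torch.full_like(dom_logits, float(domain_label))
    return F.binary_cross_entropy_with_logits(dom_logits, target)

loss_da = domain_loss(torch.randn(2, 1, 32, 32), domain_label=1)
```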
Step 1037, adjusting a second parameter of the domain classification model according to the second loss value by using a back propagation algorithm.
The partial derivatives of the second parameters are calculated from the second loss value, and the second parameters are updated by gradient descent.
Specifically, step 1037 includes the following steps. Referring to fig. 10, fig. 10 is a detailed flowchart of step 1037 of the training method of the target detection model provided by the present application.
Step 1037a, calculating partial derivatives of the second parameters of the first convolution network, the third convolution network and the gradient inversion layer according to the second loss values.
The process of calculating the partial derivative is prior art and will not be described herein.
Step 1037b, adjusting the second parameters of the first convolutional network and the third convolutional network according to the partial derivative.
Adjusting the model parameters according to the partial derivatives is prior art and will not be described in detail herein.
Step 1037c, adjusting a second parameter of the gradient inversion layer according to an inverse of the partial derivative.
After the partial derivative of the gradient inversion layer is calculated, the second parameter of the gradient inversion layer is updated according to the negative of that partial derivative. In back propagation, the loss (the difference between the predicted value and the true value) is propagated backward layer by layer, and each layer computes partial derivatives from the propagated error in order to update its own parameters. The gradient inversion layer simply multiplies the partial derivative passed to it by a negative number, so that the networks before and after it pursue opposite training objectives, which produces the adversarial effect.
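A sketch of such a gradient inversion layer is shown below, here already scaled by an adaptive factor lambda as introduced in step 113; folding the scaling into the backward pass follows common DANN-style implementations and is an assumption about this patent's exact layout. The small check at the end confirms the sign flip.

```python
# Sketch of a gradient inversion layer that multiplies gradients by -lambda.
import torch

class GradReverse(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.clone()               # identity going forward
    @staticmethod
    def backward(ctx, grad):
        return -ctx.lam * grad, None   # negate (and scale) the incoming partial derivative

x = torch.ones(3, requires_grad=True)
GradReverse.apply(x, 0.5).sum().backward()
print(x.grad)                          # tensor([-0.5000, -0.5000, -0.5000]): sign flipped
```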
Step 1038, inputting the transition domain images and target domain images corresponding to the plurality of data sets into the adversarial learning model, and cyclically executing the step of extracting the transition feature map from the transition domain image and the target feature map from the target domain image through the first convolutional network shared by the target detection model and the domain classification model, together with the subsequent steps.
Steps 1031 to 1038 are executed in sequence for each data set until all data sets have completed the training process.
Step 1039, taking the target detection model from the training iteration in which the first loss value is smallest and the second loss value is largest as the trained target detection model.
It should be emphasized that steps 1031 to 1034 train the target detection model, i.e., the hierarchy from the first convolutional network to the target detector, while steps 1035 to 1037 train the domain classification model, i.e., the hierarchy from the first convolutional network to the domain classifier.
The target detector and the domain classifier thus learn adversarially: when the first loss value of the target detector is smallest and the second loss value of the domain classifier is largest, the detection accuracy of the target detection model is highest.
To better explain the adversarial learning between the target detector and the domain classifier in this embodiment, the following analogy is used: the first convolutional network is likened to a criminal gang that manufactures counterfeit money, the transition feature map and target feature map it extracts are the counterfeit notes, the target detector is likened to an accomplice who studies counterfeit money, and the domain classifier is an official currency-inspection agency. Early on, the gang's workmanship is poor and its notes are easily recognized as counterfeit by the official agency. The gang then adjusts and improves, based on the accomplice's suggestions and the flaws found by the official agency, and produces finer counterfeits. Fooled by these finer counterfeits, the official agency in turn improves its own inspection level. The gang must then improve yet again according to the accomplice's suggestions and the newly discovered flaws, producing still better counterfeits. As the gang's workmanship becomes better and better, the official agency can no longer identify the counterfeits, and the gang and the accomplice together form a highly professional counterfeiting team; that is, a target detection model with high detection accuracy is obtained.
In this embodiment, a plurality of data sets are acquired, each data set including a source domain image and a target domain image; the source domain image and the target domain image in each data set are converted to obtain a transition domain image; and the transition domain image and the target domain image are input into an adversarial learning model to train the target detection model and obtain a trained target detection model, where the adversarial learning model includes the target detection model and a domain classification model that learn adversarially. With this scheme, the domain classification model and the target detection model learn against each other, so that the target detection model adapts to different scenes and the detection accuracy for images of different scenes is improved.
Optionally, on the basis of the embodiment shown in fig. 10, the following steps are further included before the second parameter of the gradient inversion layer is adjusted according to the negative of the partial derivative. Referring to fig. 11, fig. 11 is a schematic flowchart of another training method of the target detection model provided by the present application. In this embodiment, steps 111 to 112 are the same as steps 1037a to 1037b of the embodiment shown in fig. 10; please refer to the related description of steps 1037a to 1037b in the embodiment shown in fig. 10, which is not repeated here.
Step 111, respectively calculating the partial derivatives of the second parameters of the first convolutional network, the third convolutional network, and the gradient inversion layer according to the second loss value.
Step 112, adjusting the second parameters of the first convolutional network and the third convolutional network according to the partial derivative.
Step 113, adjusting the second parameter of the gradient inversion layer according to the number of training iterations to obtain a third parameter.
In order to suppress the noise signal of the domain classifier in the training process, a variable adaptive factor is introduced into the gradient inversion layer, so that the second parameter is adjusted through the variable adaptive factor to obtain a third parameter, and the purpose of suppressing the noise signal is achieved.
Specifically, step 113 calculates the third parameter according to the following formula:

$$\lambda_p = \frac{2}{1 + e^{-\tau p}} - 1$$

where $\tau$ denotes the second parameter, $p$ denotes the ratio of the current number of training iterations to the total number of training iterations, and $\lambda_p$ denotes the third parameter.
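The following sketch shows how this factor behaves: it ramps smoothly from 0 (domain gradients fully suppressed early in training) toward 1 (full adversarial signal); tau = 10.0 is the value used in DANN-style schedules and is an assumption here.

```python
# Sketch of the adaptive factor lambda_p from the formula above.
import math

def adaptive_factor(step, total_steps, tau=10.0):
    p = step / total_steps                      # fraction of training completed
    return 2.0 / (1.0 + math.exp(-tau * p)) - 1.0

print(adaptive_factor(0, 1000))     # 0.0     -> domain gradients fully suppressed
print(adaptive_factor(1000, 1000))  # ~0.9999 -> full adversarial signal
```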
Step 114, adjusting the third parameter of the gradient inversion layer according to the negative of the partial derivative.
In this embodiment, the third parameter is obtained by adjusting the second parameter of the gradient inversion layer according to the number of training iterations, which suppresses noise signals and improves the accuracy of the trained target detection model.
The embodiment of the application provides a target detection method, which is used for applying the trained target detection model.
Referring to fig. 12, fig. 12 is a schematic flow chart illustrating a method of object detection provided in the present application.
Step 121, acquiring an image to be detected.
Step 122, acquiring a pre-trained target detection model; the pre-trained target detection model is obtained by training an adversarial learning model using a transition domain image and a target domain image; the adversarial learning model includes the target detection model and a domain classification model; the target detection model and the domain classification model learn adversarially; and the transition domain image is an image obtained by converting a source domain image and the target domain image.
Step 123, inputting the image to be detected into the target detection model to obtain a target detection result output by the target detection model.
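For illustration, the deployment flow of steps 121 to 123 can be sketched as below, reusing the AdversarialDetector and decode_peaks sketches from the training method above; the checkpoint file name is hypothetical.

```python
# Sketch of steps 121-123, assuming the earlier sketches are in scope.
import torch

model = AdversarialDetector()                         # same architecture as in training
model.load_state_dict(torch.load("detector.pt", map_location="cpu"))  # hypothetical file
model.eval()

image = torch.rand(1, 3, 128, 128)                    # stand-in for a captured frame
with torch.no_grad():
    det_out, _ = model(image)                         # domain branch is unused at test time
heat = det_out[:, :1].sigmoid()                       # class heatmap channel(s)
scores, cls, ys, xs = decode_peaks(heat)              # peak decoding, as sketched earlier
```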
In this embodiment, an image to be detected is acquired; a pre-trained target detection model is acquired, where the pre-trained target detection model is obtained by training an adversarial learning model using a transition domain image and a target domain image, the adversarial learning model includes the target detection model and a domain classification model that learn adversarially, and the transition domain image is obtained by converting a source domain image and the target domain image; and the image to be detected is input into the target detection model to obtain a target detection result output by the target detection model. With this scheme, the trained target detection model is used to identify the target detection result in the image to be detected, improving the recognition accuracy of target detection.
Referring to fig. 13, fig. 13 shows a schematic diagram of a training apparatus 13 for a target detection model provided by the present application. The training apparatus for the target detection model shown in fig. 13 includes:
a first acquiring unit 131, configured to acquire a plurality of data sets, each of which includes a source domain image and a target domain image; the source domain image comprises the labeling information of the target object; the target domain image does not comprise the labeling information of the target object;
an image processing unit 132, configured to perform conversion processing on the source domain image and the target domain image in each data set to obtain a transition domain image; the transition domain image comprises the target object in the source domain image, the annotation information in the source domain image and background information in the target domain image;
the training unit 133 is configured to input the transition domain image and the target domain image into an antagonistic learning model, train a target detection model, and obtain a trained target detection model; the confrontation learning model comprises a target detection model and a domain classification model; the object detection model and the domain classification model are used for counterlearning.
The training apparatus for the target detection model provided by the present application acquires a plurality of data sets, each including a source domain image and a target domain image; converts the source domain image and the target domain image in each data set to obtain a transition domain image; and inputs the transition domain image and the target domain image into an adversarial learning model to train the target detection model and obtain a trained target detection model, where the adversarial learning model includes the target detection model and a domain classification model that learn adversarially. With this scheme, the domain classification model and the target detection model learn against each other, so that the target detection model adapts to different scenes and the detection accuracy for images of different scenes is improved.
Referring to fig. 14, fig. 14 shows a schematic diagram of an apparatus 14 for target detection provided by the present application. The apparatus for target detection shown in fig. 14 includes:
an acquisition unit 141 for acquiring an image to be detected;
a second acquisition unit 142, configured to acquire a pre-trained target detection model; the pre-trained target detection model is obtained by training an adversarial learning model using a transition domain image and a target domain image; the adversarial learning model includes the target detection model and a domain classification model; the target detection model and the domain classification model learn adversarially; and the transition domain image is an image obtained by converting a source domain image and the target domain image;
an identification unit 143, configured to input the image to be detected into the target detection model to obtain a target detection result output by the target detection model.
The apparatus for target detection provided by the present application acquires an image to be detected; acquires a pre-trained target detection model obtained by training an adversarial learning model using a transition domain image and a target domain image, where the adversarial learning model includes the target detection model and a domain classification model that learn adversarially, and the transition domain image is obtained by converting a source domain image and the target domain image; and inputs the image to be detected into the target detection model to obtain a target detection result output by the target detection model. With this scheme, the trained target detection model is used to identify the target detection result in the image to be detected, improving the recognition accuracy of target detection.
It should be understood that, the sequence numbers of the steps in the foregoing embodiments do not imply an execution sequence, and the execution sequence of each process should be determined by its function and inherent logic, and should not constitute any limitation to the implementation process of the embodiments of the present application.
Fig. 15 is a schematic diagram of a terminal device according to an embodiment of the present application. As shown in fig. 15, the terminal device 15 of this embodiment includes: a processor 150, a memory 151, and a computer program 152, such as a training program for a target detection model, stored in the memory 151 and executable on the processor 150. When executing the computer program 152, the processor 150 implements the steps in the above embodiments of the training method of the target detection model or of the target detection method, such as steps 101 to 103 shown in fig. 1. Alternatively, when executing the computer program 152, the processor 150 implements the functions of the units in the above apparatus embodiments, such as the functions of units 131 to 133 shown in fig. 13 or units 141 to 143 shown in fig. 14.
Illustratively, the computer program 152 may be divided into one or more units, which are stored in the memory 151 and executed by the processor 150 to implement the present application. The one or more units may be a series of computer program instruction segments capable of performing specific functions, and these segments describe the execution process of the computer program 152 in the terminal device 15. For example, the computer program 152 may be divided into an acquisition unit and a calculation unit, and the specific functions of the units are as follows:
a first acquisition unit configured to acquire a plurality of data sets, each of the data sets including a source domain image and a target domain image; the source domain image comprises the labeling information of the target object; the target domain image does not comprise the labeling information of the target object;
the image processing unit is used for converting the source domain image and the target domain image in each data set to obtain a transition domain image; the transition domain image comprises the target object in the source domain image, the annotation information in the source domain image and background information in the target domain image;
the training unit is used for inputting the transition domain image and the target domain image into an adversarial learning model and training the target detection model to obtain a trained target detection model; the adversarial learning model includes the target detection model and a domain classification model; and the target detection model and the domain classification model learn adversarially.
The acquisition unit is used for acquiring an image to be detected;
the second acquisition unit is used for acquiring a pre-trained target detection model; the pre-trained target detection model is obtained by training an adversarial learning model using a transition domain image and a target domain image; the adversarial learning model includes the target detection model and a domain classification model; the target detection model and the domain classification model learn adversarially; and the transition domain image is an image obtained by converting a source domain image and the target domain image;
and the identification unit is used for inputting the image to be detected into the target detection model to obtain a target detection result output by the target detection model.
The terminal device 15 may be a network device such as a wireless router, a wireless gateway, or a wireless bridge, and may include, but is not limited to, a processor 150 and a memory 151. Those skilled in the art will appreciate that fig. 15 is merely an example of the terminal device 15 and does not constitute a limitation of the terminal device 15, which may include more or fewer components than shown, combine some components, or have different components; for example, the terminal device may also include input-output devices, network access devices, buses, and the like.
The Processor 150 may be a Central Processing Unit (CPU), other general purpose Processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), an off-the-shelf Programmable Gate Array (FPGA) or other Programmable logic device, discrete Gate or transistor logic, discrete hardware components, etc. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
The storage 151 may be an internal storage unit of the terminal device 15, such as a hard disk or a memory of the terminal device 15. The memory 151 may also be an external storage device of the terminal device 15, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), or the like, provided on the terminal device 15. Further, the memory 151 may also include both an internal storage unit and an external storage device of the terminal device 15. The memory 151 is used for storing the computer program and other programs and data required by the kind of terminal equipment. The memory 151 may also be used to temporarily store data that has been output or is to be output.
It should be noted that, for the information interaction, execution process, and other contents between the above-mentioned devices/units, the specific functions and technical effects thereof are based on the same concept as those of the embodiment of the method of the present application, and specific reference may be made to the part of the embodiment of the method, which is not described herein again.
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-mentioned division of the functional units and modules is illustrated, and in practical applications, the above-mentioned function distribution may be performed by different functional units and modules according to needs, that is, the internal structure of the apparatus is divided into different functional units or modules to perform all or part of the above-mentioned functions. Each functional unit and module in the embodiments may be integrated in one processing unit, or each unit may exist alone physically, or two or more units are integrated in one unit, and the integrated unit may be implemented in a form of hardware, or in a form of software functional unit. In addition, specific names of the functional units and modules are only for convenience of distinguishing from each other, and are not used for limiting the protection scope of the present application. The specific working processes of the units and modules in the system may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
The embodiments of the present application further provide a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps in the above method embodiments.
The embodiments of the present application further provide a computer program product which, when run on a mobile terminal, causes the mobile terminal to implement the steps in the above method embodiments.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on this understanding, all or part of the processes in the methods of the above embodiments may be implemented by a computer program; the computer program may be stored in a computer-readable storage medium and, when executed by a processor, implements the steps of the above method embodiments. The computer program comprises computer program code, which may be in the form of source code, object code, an executable file, some intermediate form, or the like. The computer-readable medium may include at least: any entity or device capable of carrying the computer program code to a photographing apparatus/terminal device, a recording medium, a computer memory, a Read-Only Memory (ROM), a Random Access Memory (RAM), an electrical carrier signal, a telecommunication signal, and a software distribution medium, for example a USB flash drive, a removable hard disk, a magnetic disk, or an optical disk. In certain jurisdictions, according to legislation and patent practice, the computer-readable medium may not include an electrical carrier signal or a telecommunication signal.
In the above embodiments, the descriptions of the respective embodiments have respective emphasis, and reference may be made to the related descriptions of other embodiments for parts that are not described or illustrated in a certain embodiment.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus/network device and method may be implemented in other ways. For example, the apparatus/network device embodiments described above are merely illustrative; the division of the modules or units is only one kind of logical division, and there may be other divisions in actual implementation, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted or not implemented. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices, or units, and may be in electrical, mechanical, or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units.
The above embodiments are only intended to illustrate the technical solutions of the present application, not to limit them. Although the present application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be equivalently replaced; such modifications and substitutions do not cause the corresponding technical solutions to depart from the spirit and scope of the embodiments of the present application, and are intended to be included within the protection scope of the present application.

Claims (11)

1. A method for training a target detection model, the method comprising:
acquiring a plurality of data sets, wherein each data set comprises a source domain image and a target domain image; the source domain image comprises annotation information of a target object; the target domain image does not comprise annotation information of the target object;
converting the source domain image and the target domain image in each data set to obtain a transition domain image; the transition domain image comprises the target object in the source domain image, the annotation information in the source domain image, and background information in the target domain image;
inputting the transition domain image and the target domain image into an adversarial learning model, and training a target detection model to obtain a trained target detection model; the adversarial learning model comprises the target detection model and a domain classification model; the target detection model and the domain classification model are used for adversarial learning.
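By way of illustration only, the following is a minimal sketch of the two-branch structure recited in claim 1, written in PyTorch (an assumption; the patent does not name a framework, and every class and layer name below is hypothetical): a detection head and a domain classification head share a single convolutional feature extractor, and the two branches are later trained against each other through this shared network.

import torch
import torch.nn as nn

class AdversarialDetector(nn.Module):
    def __init__(self, num_classes: int):
        super().__init__()
        # First convolutional network, shared by both branches (see claim 3).
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        # Detection head: predicts per-location class scores (greatly simplified).
        self.detector = nn.Conv2d(64, num_classes, 1)
        # Domain classification head: transition domain vs. target domain.
        self.domain_head = nn.Conv2d(64, 2, 1)

    def forward(self, x):
        features = self.backbone(x)
        return self.detector(features), self.domain_head(features)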
2. The method of claim 1, wherein said converting said source domain image and said target domain image in each of said data sets to obtain a transition domain image comprises:
inputting the source domain image and the target domain image into an image conversion model to obtain a first image output by the image conversion model; the first image comprises the target object in the source domain image and background information in the target domain image;
and synthesizing the annotation information of the target object into the first image to obtain the transition domain image.
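A hedged sketch of claim 2 follows; here `translator` stands in for any image-to-image conversion model (for example a CycleGAN-style generator, which is an assumption and not named in the claim) already trained to re-render a source domain image with the target domain's appearance. Because such a conversion preserves geometry, the source annotation information can be attached to the converted image unchanged.

import torch

def make_transition_sample(translator, source_image, source_boxes):
    """Return a transition-domain image paired with the source annotations."""
    with torch.no_grad():
        # First image: source objects rendered with target-domain background style.
        first_image = translator(source_image)
    # "Synthesizing" the annotation information: the boxes carry over directly,
    # since the conversion changes appearance but not object positions.
    return first_image, source_boxes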
3. The method of claim 1, wherein the inputting the transition domain image and the target domain image into an adversarial learning model and training a target detection model to obtain a trained target detection model comprises:
extracting a transition feature map from the transition domain image and a target feature map from the target domain image through a first convolutional network shared by the target detection model and the domain classification model;
identifying the annotation information and the category of the target object in the transition domain image according to the transition feature map;
calculating a first loss value of the target detection model according to the annotation information and the category;
adjusting a first parameter of the target detection model according to the first loss value by using a back propagation algorithm;
identifying the domain category of the target domain image according to the target feature map; the domain category comprises a transition domain or a target domain;
calculating a second loss value of the domain classification model according to the domain category;
adjusting a second parameter of the domain classification model according to the second loss value by using a back propagation algorithm;
inputting the transition domain images and the target domain images corresponding to the plurality of data sets into the adversarial learning model, and cyclically executing the above steps, starting from extracting the transition features in the transition domain images and the target features in the target domain images through the first convolutional network shared by the target detection model and the domain classification model;
and taking the target detection model corresponding to the minimum first loss value and the maximum second loss value over multiple training iterations as the trained target detection model.
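The following is a simplified single training step for claim 3, again in assumed PyTorch with illustrative names only. The first loss value is the detection loss on the annotated transition domain image (reduced here to image-level classification; a real detector would also include box regression), and the second loss value is the domain classification loss on both domains. With a gradient inversion layer placed before the domain head (claim 6), jointly minimizing the two losses fits the domain classifier while pushing the shared features toward domain invariance.

import torch
import torch.nn.functional as F

def train_step(model, optimizer, transition_img, labels, target_img):
    optimizer.zero_grad()
    det_out, dom_transition = model(transition_img)
    _, dom_target = model(target_img)
    # First loss value: detection loss on the annotated transition image.
    first_loss = F.cross_entropy(det_out.mean(dim=(2, 3)), labels)
    # Second loss value: domain classification loss on both domains
    # (transition domain = 0, target domain = 1).
    dom_logits = torch.cat([dom_transition.mean(dim=(2, 3)),
                            dom_target.mean(dim=(2, 3))])
    dom_labels = torch.tensor([0] * transition_img.size(0) +
                              [1] * target_img.size(0))
    second_loss = F.cross_entropy(dom_logits, dom_labels)
    (first_loss + second_loss).backward()
    optimizer.step()
    return first_loss.item(), second_loss.item()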
4. The method of claim 3, wherein the identifying the annotation information and the category of the target object in the transition domain image according to the transition feature map comprises:
extracting a first feature map to be identified from the transition features through a second convolutional network in the target detection model;
and identifying the annotation information and the category of the target object in the transition domain image according to the first feature map to be identified.
5. The method of claim 3, wherein the identifying the domain category of the target domain image according to the target feature map comprises:
extracting a second feature map to be identified from the target features through a third convolutional network in the domain classification model;
and identifying the domain category of the target domain image according to the second feature map to be identified.
6. The method of claim 3, wherein a gradient inversion layer is disposed in the domain classification model;
and the adjusting a second parameter of the domain classification model according to the second loss value by using a back propagation algorithm comprises:
calculating, according to the second loss value, the partial derivatives with respect to the second parameters of the first convolutional network, the third convolutional network, and the gradient inversion layer, respectively;
adjusting the second parameters of the first convolutional network and the third convolutional network according to the partial derivatives;
and adjusting the second parameter of the gradient inversion layer according to the negative of the partial derivative.
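A gradient inversion layer is commonly realized as follows (a standard construction, sketched in assumed PyTorch; the claim does not prescribe an implementation): the forward pass is the identity, and the backward pass multiplies the incoming gradient by a negative factor, which corresponds to the "negative of the partial derivative" adjustment in claim 6.

import torch

class GradReverse(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        # Negate (and scale) the partial derivatives flowing back into
        # the shared first convolutional network.
        return grad_output.neg() * ctx.lam, None

def grad_reverse(x, lam=1.0):
    return GradReverse.apply(x, lam)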
7. The method of claim 6, further comprising, before the adjusting the second parameter of the gradient inversion layer according to the negative of the partial derivative:
adjusting the second parameter of the gradient inversion layer according to the number of training iterations to obtain a third parameter;
correspondingly, the adjusting the second parameter of the gradient inversion layer according to the negative of the partial derivative comprises:
adjusting the third parameter of the gradient inversion layer according to the negative of the partial derivative.
8. The method of claim 7, wherein the adjusting the second parameter of the gradient inversion layer according to the number of training iterations to obtain a third parameter comprises:
calculating the third parameter according to the following formula:
λp = 2 / (1 + e^(−τ·p)) − 1
where τ denotes the second parameter, p denotes the fraction of the current number of training iterations in the total number of training iterations, and λp denotes the third parameter.
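Assuming the formula above (reconstructed here as the standard gradient-reversal schedule consistent with the stated variable definitions), the third parameter can be computed as follows; the default τ = 10 is an assumed common choice, not a value stated in the claim.

import math

def grl_lambda(p: float, tau: float = 10.0) -> float:
    """Third parameter λp used to weight the reversed gradient."""
    # p is the fraction of completed training iterations, in [0, 1].
    return 2.0 / (1.0 + math.exp(-tau * p)) - 1.0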
9. A method of target detection, the method comprising:
collecting an image to be detected;
acquiring a pre-trained target detection model; the pre-trained target detection model is obtained by training an adversarial learning model using a transition domain image and a target domain image; wherein the adversarial learning model comprises a target detection model and a domain classification model; the target detection model and the domain classification model are used for adversarial learning; and the transition domain image is an image obtained by converting a source domain image and the target domain image;
and inputting the image to be detected into the target detection model to obtain a target detection result output by the target detection model.
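Inference per claim 9 reduces to a forward pass through the trained detector; a minimal sketch (hypothetical names, assumed PyTorch, reusing the two-headed model sketched above) is:

import torch

def detect(model, image_to_detect):
    model.eval()
    with torch.no_grad():
        # The domain classification head is unused at test time.
        detections, _ = model(image_to_detect)
    return detections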
10. A terminal device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the steps of the method according to any one of claims 1 to 8 or 9 when executing the computer program.
11. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 8 or 9.
CN202010933269.0A 2020-09-08 2020-09-08 Training method of target detection model, target detection method and terminal equipment Active CN112215255B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010933269.0A CN112215255B (en) 2020-09-08 2020-09-08 Training method of target detection model, target detection method and terminal equipment


Publications (2)

Publication Number Publication Date
CN112215255A true CN112215255A (en) 2021-01-12
CN112215255B CN112215255B (en) 2023-08-18

Family

ID=74050120

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010933269.0A Active CN112215255B (en) 2020-09-08 2020-09-08 Training method of target detection model, target detection method and terminal equipment

Country Status (1)

Country Link
CN (1) CN112215255B (en)



Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102080132A (en) * 2010-11-16 2011-06-01 烟台出入境检验检疫局检验检疫技术中心 Seminested-RT (reverse transcription)-Realtime PCR (polymerase chain reaction) kit for detecting potato virus X
CN102231191A (en) * 2011-07-17 2011-11-02 西安电子科技大学 Multimodal image feature extraction and matching method based on ASIFT (affine scale invariant feature transform)
CN109583342A (en) * 2018-11-21 2019-04-05 重庆邮电大学 Human face in-vivo detection method based on transfer learning
CN110414631A (en) * 2019-01-29 2019-11-05 腾讯科技(深圳)有限公司 Lesion detection method, the method and device of model training based on medical image
CN110399856A (en) * 2019-07-31 2019-11-01 上海商汤临港智能科技有限公司 Feature extraction network training method, image processing method, device and its equipment

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
WANG Ludi; XIE Yuejiang: "Target Detection in Complex Backgrounds Based on Domain-Adaptive Faster RCNN", Aerospace Control, no. 01, pages 1-5 *

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113128413A (en) * 2021-04-22 2021-07-16 广州织点智能科技有限公司 Face detection model training method, face detection method and related device thereof
CN113298122A (en) * 2021-04-30 2021-08-24 北京迈格威科技有限公司 Target detection method and device and electronic equipment
CN113283598A (en) * 2021-06-11 2021-08-20 清华大学 Model training method and device, storage medium and electronic equipment
CN113673570A (en) * 2021-07-21 2021-11-19 南京旭锐软件科技有限公司 Training method, device and equipment for electronic device picture classification model
CN113610787A (en) * 2021-07-27 2021-11-05 广东省科技基础条件平台中心 Training method and device of image defect detection model and computer equipment
CN113723457A (en) * 2021-07-28 2021-11-30 浙江大华技术股份有限公司 Image recognition method and device, storage medium and electronic device
CN113869361A (en) * 2021-08-20 2021-12-31 深延科技(北京)有限公司 Model training method, target detection method and related device
CN113723431A (en) * 2021-09-01 2021-11-30 上海云从汇临人工智能科技有限公司 Image recognition method, image recognition device and computer-readable storage medium
CN113723431B (en) * 2021-09-01 2023-08-18 上海云从汇临人工智能科技有限公司 Image recognition method, apparatus and computer readable storage medium
CN113837300A (en) * 2021-09-29 2021-12-24 上海海事大学 Automatic driving cross-domain target detection method based on block chain
CN113837300B (en) * 2021-09-29 2024-03-12 上海海事大学 Automatic driving cross-domain target detection method based on block chain
CN114266937A (en) * 2021-12-20 2022-04-01 北京百度网讯科技有限公司 Model training method, image processing method, device, equipment and storage medium
CN114973168A (en) * 2022-08-01 2022-08-30 北京科技大学 Cross-scene traffic target detection method and system

Also Published As

Publication number Publication date
CN112215255B (en) 2023-08-18

Similar Documents

Publication Publication Date Title
CN112215255B (en) Training method of target detection model, target detection method and terminal equipment
US7873208B2 (en) Image matching system using three-dimensional object model, image matching method, and image matching program
CN110689043A (en) Vehicle fine granularity identification method and device based on multiple attention mechanism
CN110728294A (en) Cross-domain image classification model construction method and device based on transfer learning
CN112862093B (en) Graphic neural network training method and device
Song et al. EM simulation-aided zero-shot learning for SAR automatic target recognition
CN113409361B (en) Multi-target tracking method and device, computer and storage medium
CN110378837B (en) Target detection method and device based on fish-eye camera and storage medium
CN110188829B (en) Neural network training method, target recognition method and related products
CN111444765B (en) Image re-identification method, training method of related model, related device and equipment
US20180349716A1 (en) Apparatus and method for recognizing traffic signs
CN111489401B (en) Image color constancy processing method, system, device and storage medium
CN110321761B (en) Behavior identification method, terminal equipment and computer readable storage medium
Kanwal et al. Digital image splicing detection technique using optimal threshold based local ternary pattern
CN110533119B (en) Identification recognition method, model training method and device thereof, and electronic system
CN112364916B (en) Image classification method based on transfer learning, related equipment and storage medium
CN113642639B (en) Living body detection method, living body detection device, living body detection equipment and storage medium
CN112634369A (en) Space and or graph model generation method and device, electronic equipment and storage medium
CN114913498A (en) Parallel multi-scale feature aggregation lane line detection method based on key point estimation
CN112668532A (en) Crowd counting method based on multi-stage mixed attention network
CN110619280B (en) Vehicle re-identification method and device based on deep joint discrimination learning
CN112990282A (en) Method and device for classifying fine-grained small sample images
CN112288701A (en) Intelligent traffic image detection method
CN110135428B (en) Image segmentation processing method and device
CN112288702A (en) Road image detection method based on Internet of vehicles

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant