WO2024012138A1

WO2024012138A1 - Target detection model training method and apparatus, and target detection method and apparatus

Info

Publication number: WO2024012138A1
Application number: PCT/CN2023/100274
Authority: WO
Inventors: 吕永春; 朱徽; 王钰; 周迅溢; 曾定衡; 蒋宁
Original assignee: 马上消费金融股份有限公司 (Mashang Consumer Finance Co., Ltd.)
Priority date: 2022-07-15
Filing date: 2023-06-14
Publication date: 2024-01-18
Also published as: CN117437395A

Abstract

Embodiments of the present application provide a target detection model training method and apparatus, and a target detection method and apparatus. In the model training stage, the model to be trained continuously learns the bounding box distribution on the basis of real bounding boxes and first initial bounding boxes, so that the first predicted bounding boxes move closer to the real bounding boxes. The model to be trained comprises a generation sub-model and a discrimination sub-model; a regression loss value is determined on the basis of the discrimination result sets output by the discrimination sub-model, and the model parameters are then iteratively updated on the basis of the regression loss value. Each discrimination result set comprises both a discrimination result representing bounding box distribution similarity and a discrimination result representing bounding box coordinate coincidence degree.

Description

Target detection model training method, target detection method and device

Cross-reference to related application

This application claims priority to Chinese patent application No. 202210831208.2, filed with the China Patent Office on July 15, 2022 and entitled "Target Detection Model Training Method, Target Detection Method and Device", the entire content of which is incorporated herein by reference.

Technical field

The present application relates to the field of target detection, and in particular, to a target detection model training method, target detection method and device.

Background

With the rapid development of artificial intelligence technology, there is a growing demand for detecting targets in an image with a pre-trained target detection model, which predicts the coordinate information and classification information of the bounding box of each target contained in the image. However, existing target detection models are trained by learning model parameters from extracted image features. As a result, the trained target detection model is relatively accurate on the sample image data set but less accurate on new images to be detected, which leads to relatively low target detection accuracy in the model application stage.

Summary

The purpose of this application is to provide a target detection model training method, target detection method and device.

In one aspect, this application provides a method for training a target detection model. The method includes: obtaining a first initial bounding box and obtaining a real bounding box corresponding to the first initial bounding box, where the first initial bounding box is obtained by extracting a target region from a sample image data set using a preset region of interest extraction model; and inputting the first initial bounding box and the real bounding box into a model to be trained for iterative model training until the current model training result meets a preset model training end condition, to obtain the target detection model. The model to be trained includes a generating sub-model and a discriminating sub-model. Each round of model training in the iterative model training includes: the generating sub-model performing bounding box prediction based on the first initial bounding box to obtain a first predicted bounding box; the discriminating sub-model generating a discrimination result set based on the real bounding box corresponding to the first initial bounding box and the first predicted bounding box corresponding to the first initial bounding box, where the discrimination result set includes a first discrimination result representing the bounding box distribution similarity between the first predicted bounding box and the real bounding box, and a second discrimination result representing the bounding box coordinate coincidence degree between the first predicted bounding box and the real bounding box; determining a regression loss value of the model to be trained based on the first discrimination result and the second discrimination result corresponding to the first initial bounding box; and updating the parameters of the generating sub-model and the discriminating sub-model based on the regression loss value.

In another aspect, this application provides a target detection method. The method includes: obtaining a second initial bounding box, where the second initial bounding box is obtained by extracting a target region from an image to be detected using a preset region of interest extraction model; inputting the second initial bounding box into the target detection model for target detection, to obtain a second predicted bounding box and a second predicted category corresponding to the second initial bounding box; and generating a target detection result of the image to be detected based on the second predicted bounding box and the second predicted category corresponding to the second initial bounding box.
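The inference flow of the target detection method can be sketched as follows. This is a minimal illustration, not the application's implementation: `extract_rois` and `detector` are hypothetical callables standing in for the preset region of interest extraction model and the trained target detection model, the `(x1, y1, x2, y2)` box format is an assumption, and the detection result is taken to be simply the list of predicted (box, category) pairs.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Box = Tuple[float, float, float, float]  # assumed (x1, y1, x2, y2) corner format


@dataclass
class Detection:
    box: Box        # second predicted bounding box
    category: str   # second predicted category


def detect(
    image: object,
    extract_rois: Callable[[object], List[Box]],   # hypothetical ROI extractor
    detector: Callable[[Box], Tuple[Box, str]],    # hypothetical trained model
) -> List[Detection]:
    """Sketch: ROI extraction -> per-box prediction -> detection result."""
    results: List[Detection] = []
    # Second initial bounding boxes from the region of interest extraction model.
    for init_box in extract_rois(image):
        # The model refines each box and assigns a category.
        pred_box, pred_category = detector(init_box)
        results.append(Detection(pred_box, pred_category))
    # Here the target detection result is just the collected predictions.
    return results
```

In practice a post-processing step (for example score filtering or non-maximum suppression) would usually sit between prediction and the final result; it is omitted because the text above does not specify one.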

In another aspect, this application provides a target detection model training device. The device includes: a first bounding box acquisition module, configured to acquire a first initial bounding box and acquire the real bounding box corresponding to the first initial bounding box, where the first initial bounding box is obtained by extracting a target region from a sample image data set using a preset region of interest extraction model; and a model training module, configured to input the first initial bounding box and the real bounding box into a model to be trained for iterative model training until the current model training result meets a preset model training end condition, to obtain the target detection model. The model to be trained includes a generating sub-model and a discriminating sub-model. Each round of model training in the iterative model training includes, for each first initial bounding box: the generating sub-model performing bounding box prediction based on the first initial bounding box to obtain a first predicted bounding box; the discriminating sub-model generating a discrimination result set based on the real bounding box and the first predicted bounding box corresponding to the first initial bounding box, where the discrimination result set includes a first discrimination result representing the bounding box distribution similarity between the first predicted bounding box and the real bounding box, and a second discrimination result representing the bounding box coordinate coincidence degree between the first predicted bounding box and the real bounding box; determining the regression loss value of the model to be trained based on the first discrimination result and the second discrimination result corresponding to the first initial bounding box; and updating the parameters of the generating sub-model and the discriminating sub-model based on the regression loss value.

In another aspect, this application provides a target detection device. The device includes: a second bounding box acquisition module, configured to acquire a second initial bounding box, where the second initial bounding box is obtained by extracting a target region from an image to be detected using a preset region of interest extraction model; a target detection module, configured to input the second initial bounding box into the target detection model for target detection, to obtain a second predicted bounding box and a second predicted category corresponding to the second initial bounding box; and a detection result generation module, configured to generate a target detection result of the image to be detected based on the second predicted bounding box and the second predicted category corresponding to the second initial bounding box.

In another aspect, the present application provides a computer device, including: a processor; and a memory arranged to store computer-executable instructions, the executable instructions being configured to be executed by the processor and including steps for performing the methods described above.

In another aspect, embodiments of the present application provide a storage medium for storing computer-executable instructions, the executable instructions causing a computer to perform the steps of the methods described above.

Brief description of the drawings

To explain the embodiments of the present application or the technical solutions in the prior art more clearly, the drawings used in describing the embodiments or the prior art are briefly introduced below. The drawings described below show only some of the embodiments recorded in the present application; a person of ordinary skill in the art may derive other drawings from them without creative effort.

Figure 1 is a schematic flow chart of a target detection model training method provided by an embodiment of the present application;

Figure 2 is a schematic flow chart of each model training process in the target detection model training method provided by the embodiment of the present application;

Figure 3 is a schematic diagram of the first implementation principle of the target detection model training method provided by the embodiment of the present application;

Figure 4a is a schematic diagram of the second implementation principle of the target detection model training method provided by the embodiment of the present application;

Figure 4b is a schematic diagram of the third implementation principle of the target detection model training method provided by the embodiment of the present application;

Figure 5 is a schematic flow chart of the target detection method provided by the embodiment of the present application;

Figure 6 is a schematic diagram of the implementation principle of the target detection method provided by the embodiment of the present application;

Figure 7 is a schematic diagram of the module composition of the target detection model training device provided by the embodiment of the present application;

Figure 8 is a schematic diagram of the module composition of the target detection device provided by the embodiment of the present application;

Figure 9 is a schematic structural diagram of a computer device provided by an embodiment of the present application.

Detailed description

To enable those skilled in the art to better understand one or more technical solutions in the present application, the technical solutions in the embodiments of the present application are described clearly and completely below with reference to the drawings in the embodiments. The described embodiments are only some, rather than all, of the embodiments of the present application. All other embodiments obtained by those of ordinary skill in the art based on one or more embodiments of this application without creative effort shall fall within the protection scope of this application.

It should be noted that, without conflict, one or more embodiments and features in the embodiments of the present application can be combined with each other. The embodiments of the present application will be described in detail below with reference to the accompanying drawings and embodiments.

When a deep network is used to extract features, the model learns the image features inside the bounding boxes: it continuously learns the similarity between the image features inside the predicted bounding box and those inside the real bounding box, and adjusts its parameters accordingly. The trained target detection model therefore depends heavily on the sample data set used in the training stage, generalizes poorly, and transfers poorly across data sets. Such a model inevitably has high detection accuracy on the sample data set but low detection accuracy on new images to be detected.

Based on this, in the model training stage, this application has the model to be trained continuously learn the bounding box distribution on the basis of the real bounding boxes and the first initial bounding boxes, so that the first predicted bounding boxes move closer to the real bounding boxes. This improves the accuracy with which the trained target detection model predicts the bounding box of a target object's location in an image to be detected, and also improves the model's generalization, so that the model can be used to detect targets in new images and adapts better to data migration.

The model to be trained includes a generating sub-model and a discriminating sub-model. The regression loss value of the model to be trained is determined from the discrimination result sets output by the discriminating sub-model, and the parameters of both sub-models are then updated over multiple iterations based on this regression loss value until the current model training result meets the preset model training end condition. In other words, the bounding box distribution is learned through multiple rounds of adversarial generation and discrimination. The discriminating sub-model judges whether a first predicted bounding box produced by the generating sub-model is realistic enough. Even when the generated bounding boxes (the first predicted bounding boxes) are already hard to distinguish from the real bounding boxes, adjusting the model parameters based on the discriminating sub-model's results further pushes the first predicted bounding boxes closer to the real bounding boxes, which improves both the parameter update efficiency of the generating sub-model and the accuracy with which it learns the bounding box distribution.

Furthermore, if the regression loss is determined and the parameters adjusted only along the coarse-grained comparison dimension of bounding box distribution similarity, the precise position of the bounding box cannot be learned; if they are determined only along the fine-grained comparison dimension of bounding box coordinate coincidence degree, the edge ambiguity of bounding boxes cannot be taken into account. This application therefore determines the regression loss by combining both dimensions: the discrimination result set output by the discriminating sub-model includes a first discrimination result characterizing bounding box distribution similarity and a second discrimination result characterizing bounding box coordinate coincidence degree. The regression loss thus accounts both for predicted bounding boxes whose distribution is similar to the real bounding box but whose position deviates from it, and for first predicted bounding boxes corresponding to real bounding boxes with ambiguous edges. The regression loss value obtained from the discrimination result sets is therefore more accurate, which in turn improves the accuracy of the model parameters updated with it.

Figure 1 is a first flow diagram of a target detection model training method provided by one or more embodiments of the present application. The method in Figure 1 can be executed by an electronic device provided with a target detection model training device; the electronic device can be a terminal device or a designated server. The hardware device used for target detection model training (the electronic device provided with the target detection model training device) and the hardware device used for target detection (the electronic device provided with the target detection device) may be the same or different. The target detection model trained with the training method provided by the embodiments of the present application can be applied to any specific application scenario that requires target detection on an image to be detected. For example, in specific application scenario 1, target detection is performed on images to be detected collected by an image acquisition device at the entrance of a public place (such as a shopping mall, subway station, tourist attraction, or performance venue). In specific application scenario 2, target detection is performed on images to be detected collected by image acquisition devices at each monitoring point of a breeding base.

Because the specific application scenarios of the target detection model differ, the sample image data sets used to train it also differ. For specific application scenario 1, the sample image data set may be historical sample images collected at the entrance of a designated public place within a preset historical time period; correspondingly, the target object enclosed by a first initial bounding box is a target user entering the designated public place in a historical sample image, and the real category and the first predicted category may be the category to which the target user belongs, such as at least one of age group, gender, height, and occupation. For specific application scenario 2, the sample image data set may be historical sample images collected at each monitoring point of a designated breeding base within a preset historical time period; correspondingly, the target object enclosed by a first initial bounding box is a target breeding object in a historical sample image, and the real category and the first predicted category may be the category of the target breeding object, such as at least one of its physical condition and body size.

The training process of the target detection model, as shown in Figure 1, includes at least the following steps:

S102: Obtain a first preset number of first initial bounding boxes, and obtain the real bounding box corresponding to each first initial bounding box. The first initial bounding boxes are obtained by extracting target regions from the sample image data set using a preset region of interest extraction model.

The first preset number of first initial bounding boxes may be determined in either of two ways. In the first, in each round of model training in the iterative model training, the preset region of interest extraction model is used to extract target regions from the sample image data set, yielding the first preset number of first initial bounding boxes. In the second, the preset region of interest extraction model is used in advance to extract target regions from the sample image data set, and then, in each round of model training, the first preset number of first initial bounding boxes are randomly sampled from the large number of pre-extracted candidate bounding boxes.

The sample image data set may contain multiple sample target objects, and each sample target object may correspond to multiple first initial bounding boxes; that is, the first preset number of first initial bounding boxes includes at least one first initial bounding box for each sample target object.

Before the first preset number of first initial bounding boxes is obtained, the method further includes: inputting the sample image data set into the preset region of interest extraction model for region of interest extraction, to obtain a second preset number of candidate bounding boxes, where the second preset number is equal to or greater than the first preset number (the number of first initial bounding boxes). When the second preset number equals the first preset number, in each round of model training in the iterative model training, the preset region of interest extraction model is used to extract regions of interest from multiple pieces of sample image data in the sample image data set, yielding the first preset number of first initial bounding boxes. When the second preset number is greater than the first preset number, in each round of model training in the iterative model training, the first preset number of first initial bounding boxes are randomly sampled from the second preset number of candidate bounding boxes.

One purpose of the model training process is to learn the bounding box distribution through iterative training of the model parameters, thereby improving the generalization and data transferability of the model (that is, the model parameters do not depend on the sample data used during training and better suit the data to be recognized in the model application stage). To help the model to be trained learn the bounding box distribution, the first initial bounding boxes extracted by the preset region of interest extraction model and input into the model to be trained need to obey a certain probability distribution (such as a Gaussian distribution or a Cauchy distribution). Under this requirement, the larger the number N of anchor boxes extracted by the preset region of interest extraction model, the better the model to be trained can learn the bounding box distribution. However, if the preset region of interest extraction model (such as a region of interest (ROI) extraction algorithm) were run in real time in every round to extract N anchor boxes as first initial bounding boxes for training, the data processing volume would be relatively large and the hardware requirements relatively high.

In a specific implementation, it is therefore preferable to extract N anchor boxes in advance with the preset region of interest extraction model and then, in each round of model training, randomly sample m of the N anchor boxes as first initial bounding boxes to input into the model to be trained. This keeps the data processing volume of each round manageable while still allowing the model to learn the bounding box distribution well; that is, it promotes bounding box distribution learning while taking the training data processing volume into account. In this case, the second preset number is greater than the first preset number, which is the number of first initial bounding boxes, and obtaining the first initial bounding boxes specifically includes: randomly selecting the first preset number of candidate bounding boxes from the second preset number of candidate bounding boxes as the first initial bounding boxes. In other words, the preset region of interest extraction model is used in advance to extract regions of interest from multiple pieces of sample image data in the sample image data set to obtain the second preset number of candidate bounding boxes, and then, in each round of model training, the first preset number of first initial bounding boxes are randomly sampled from them.

That is, in a preferred implementation, N anchor boxes (the second preset number of candidate bounding boxes) are pre-extracted; then, in each round of model training, m anchor boxes (the first preset number of first initial bounding boxes) are randomly sampled from the N anchor boxes, and step S104 below is performed.
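The preferred "extract N once, sample m per round" scheme can be sketched as below. This is an illustrative sketch only: the candidate list is synthetic stand-in data rather than real ROI output, the `(x1, y1, x2, y2)` box format is assumed, and sampling without replacement via Python's `random.Random.sample` is one reasonable reading of "randomly sample".

```python
import random
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # assumed (x1, y1, x2, y2) corner format


def sample_initial_boxes(candidates: Sequence[Box], m: int, rng: random.Random) -> List[Box]:
    """Draw m first initial bounding boxes from N pre-extracted candidates."""
    n = len(candidates)
    if m > n:
        # The first preset number m may not exceed the second preset number N.
        raise ValueError(f"first preset number m={m} exceeds second preset number N={n}")
    # Sampling without replacement: no candidate appears twice in one round.
    return rng.sample(list(candidates), m)


# Usage: N candidates are extracted once (synthetic stand-ins here),
# and each training round draws a fresh subset of m boxes from them.
rng = random.Random(0)
candidates = [(float(i), float(i), float(i) + 10.0, float(i) + 10.0) for i in range(1000)]
for _round in range(3):
    first_initial_boxes = sample_initial_boxes(candidates, m=64, rng=rng)
```

The cost of ROI extraction is paid once for all N boxes, while each round only processes m boxes, which is the trade-off the paragraph above describes.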

S104. Input the above-mentioned first initial bounding box and the real bounding box into the model to be trained for iterative model training until the current model training results meet the preset model training end conditions to obtain the target detection model; the above preset model training end conditions may include: The current number of model training rounds is equal to the total number of training rounds, the model loss function converges, or a balance is reached between the generative sub-model and the discriminative sub-model.
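The three end conditions listed in S104 can be checked with a small helper such as the one below. This is a sketch under stated assumptions: the convergence test (absolute loss change below `loss_tol`) and the reading of "balance" as the discriminating sub-model scoring both real and generated boxes near 0.5 are illustrative choices, not criteria given in the application, and `StopConfig` and `should_stop` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StopConfig:
    total_rounds: int          # total number of training rounds
    loss_tol: float = 1e-4     # assumed tolerance for "loss has converged"
    balance_tol: float = 0.05  # assumed tolerance for "generator/discriminator balance"


def should_stop(round_idx: int, loss: float, prev_loss: Optional[float],
                d_real: float, d_fake: float, cfg: StopConfig) -> bool:
    """round_idx is 0-based; d_real / d_fake are the discriminator's mean
    P(box is real) on real boxes and on generated boxes, respectively."""
    # Condition 1: the current number of rounds equals the total number of rounds.
    if round_idx + 1 >= cfg.total_rounds:
        return True
    # Condition 2: the loss has converged (change between rounds is tiny).
    if prev_loss is not None and abs(loss - prev_loss) < cfg.loss_tol:
        return True
    # Condition 3: balance, read here as the discriminator no longer
    # separating real and generated boxes (both scores close to 0.5).
    if abs(d_real - 0.5) < cfg.balance_tol and abs(d_fake - 0.5) < cfg.balance_tol:
        return True
    return False
```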

The specific implementation of the iterative model training in step S104 is explained below. Since every round of model training in the iterative process is handled in the same way, any one round is taken as an example. The model to be trained includes a generating sub-model and a discriminating sub-model; as shown in Figure 2, each round of model training in the iterative model training may include the following steps S1042 to S1046:

S1042: For each first initial bounding box, the generating sub-model performs bounding box prediction based on the first initial bounding box to obtain a first predicted bounding box, and the discriminating sub-model generates a discrimination result set based on the real bounding box and the first predicted bounding box corresponding to the first initial bounding box. The discrimination result set includes a first discrimination result and a second discrimination result: the first discrimination result represents the bounding box distribution similarity between the first predicted bounding box and the real bounding box, and the second discrimination result represents the bounding box coordinate coincidence degree between the first predicted bounding box and the real bounding box.
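The data flow of S1042 can be sketched as follows. All names are hypothetical: `generator` and `discriminator` are stand-ins for the two sub-models, the discriminator is assumed to output P(box comes from real data), and the second discrimination result is represented as an IoU value computed by a caller-supplied `iou_fn`. The sketch shows only how one discrimination result set is assembled per first initial bounding box, not how the sub-models are built.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # assumed (x1, y1, x2, y2) corner format


@dataclass
class DiscriminationResultSet:
    """Hypothetical container for one first initial bounding box's results."""
    p_real_is_real: float  # first result: P(real box comes from real data)
    p_pred_is_fake: float  # first result: P(predicted box comes from generated data)
    iou: float             # second result: coordinate coincidence of the two boxes


def run_s1042(
    initial_boxes: Sequence[Box],
    real_boxes: Sequence[Box],
    generator: Callable[[Box], Box],          # generating sub-model (stand-in)
    discriminator: Callable[[Box], float],    # returns P(box is real) (stand-in)
    iou_fn: Callable[[Box, Box], float],
) -> List[DiscriminationResultSet]:
    results: List[DiscriminationResultSet] = []
    for init_box, real_box in zip(initial_boxes, real_boxes):
        # Generating sub-model: first initial box -> first predicted box.
        pred_box = generator(init_box)
        # Discriminating sub-model: probabilities for the first result.
        p_real = discriminator(real_box)
        p_fake = 1.0 - discriminator(pred_box)
        # Second result: coordinate coincidence between real and predicted box.
        results.append(DiscriminationResultSet(p_real, p_fake, iou_fn(real_box, pred_box)))
    return results
```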

To determine the first discrimination result, which represents bounding box distribution similarity, the KL divergence (Kullback-Leibler divergence) between the real bounding box and the corresponding first predicted bounding box can be calculated directly. In a specific implementation, however, the discriminating sub-model can judge whether the first predicted bounding box produced by the generating sub-model is realistic enough: even when the generated bounding box (the first predicted bounding box) is hard to distinguish from the real bounding box, adjusting the model parameters based on the discriminating sub-model's results further pushes the first predicted bounding box closer to the real bounding box. Therefore, to improve the accuracy of the regression loss component associated with bounding box distribution similarity, and thus make the first predicted bounding boxes predicted by the target detection model more realistic, the discriminating sub-model can alternatively be used, for each first initial bounding box, to obtain the discriminant probabilities that the corresponding real bounding box comes from real data and that the corresponding first predicted bounding box comes from generated data.

The magnitude of these discriminant probabilities is related to how close the probability distributions of the two bounding boxes (the real bounding box and the corresponding first predicted bounding box) are, so the discriminant probabilities characterize the distribution similarity between them. A first regression loss component corresponding to the discriminant dimension of bounding box distribution similarity can therefore be determined from the discriminant probabilities, prompting the model to perform bounding box regression learning. Specifically, for the real bounding box and the first predicted bounding box corresponding to a given first initial bounding box, the discriminating sub-model determines the discriminant probability that the real bounding box comes from real data and the discriminant probability that the first predicted bounding box comes from generated data. The greater these two probabilities, the lower the probability distribution similarity between the first predicted bounding box and the corresponding real bounding box, and the greater the first regression loss component for the distribution similarity dimension. Since the distribution similarity between the first predicted bounding box and the corresponding real bounding box is thus determined by these discriminant probabilities, the first discrimination result can be generated from them, so that it represents bounding box distribution similarity; the first regression loss component for the distribution similarity dimension can then be determined from the discriminant probabilities in the first discrimination result.
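The text does not give a formula for the first regression loss component, only its direction (larger when the discriminator more confidently assigns the real box to real data and the predicted box to generated data). One form with that property, assumed here for illustration, is the standard GAN value function log D(real) + log(1 - D(pred)), where D(.) is P(box comes from real data), so 1 - D(pred) is the probability that the predicted box comes from generated data. The clamping constant `eps` is likewise an assumption to avoid log(0).

```python
import math


def first_regression_component(p_real_is_real: float, p_pred_is_fake: float,
                               eps: float = 1e-7) -> float:
    """Assumed GAN-value-function form: log D(real) + log(1 - D(pred)).

    p_real_is_real = D(real box); p_pred_is_fake = 1 - D(predicted box).
    The value is <= 0 and grows toward 0 as the discriminator separates the
    two boxes more confidently, i.e. as distribution similarity drops.
    """
    # Clamp probabilities away from 0 and 1 so the logarithms stay finite.
    p_r = min(max(p_real_is_real, eps), 1.0 - eps)
    p_f = min(max(p_pred_is_fake, eps), 1.0 - eps)
    return math.log(p_r) + math.log(p_f)
```

Under this assumed form the two sub-models play the usual minimax game: the generating sub-model is updated to decrease the term (make its boxes indistinguishable), while the discriminating sub-model is updated to increase it.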

To determine the second discrimination result, which represents bounding box coordinate coincidence degree, the target intersection-over-union (IoU) loss can be obtained by considering only the IoU loss between a given real bounding box and its corresponding first predicted bounding box. Alternatively, the target IoU loss can be determined by jointly considering the IoU loss between a given real bounding box and its corresponding first predicted bounding box and the IoU losses between that real bounding box and the first predicted bounding boxes corresponding to other real bounding boxes. Since the magnitude of the target IoU loss represents the degree of coordinate coincidence between the real bounding box and the corresponding first predicted bounding box, a second regression loss component corresponding to the discriminant dimension of bounding box coordinate coincidence degree can be determined from it. Specifically, for the real bounding box and the first predicted bounding box corresponding to a given first initial bounding box, the target IoU loss between them is determined; the greater the target IoU loss, the lower the degree of coordinate coincidence between the first predicted bounding box and the corresponding real bounding box, and the greater the second regression loss component for the coordinate coincidence dimension. Since the coordinate coincidence degree is thus determined by the target IoU loss between the real bounding box and the first predicted bounding box, the second discrimination result can be generated from the target IoU loss, so that it represents bounding box coordinate coincidence degree; the second regression loss component for the coordinate coincidence dimension can then be determined from the IoU loss in the second discrimination result.

S1044: Determine the regression loss value of the model to be trained based on the first discrimination result and the second discrimination result in the discrimination result set corresponding to each of the first initial bounding boxes.

After the discrimination result set of each first initial bounding box is obtained, the sub-regression loss value of each first initial bounding box can be obtained. The sub-regression loss value includes at least two components:

- the first regression loss component, corresponding to the first discrimination dimension, which is considered from the perspective of bounding box distribution similarity;
- the second regression loss component, corresponding to the second discrimination dimension, which is considered from the perspective of bounding box coordinate coincidence.

The regression loss value used to adjust the model parameters is then determined from the sub-regression loss values of all first initial bounding boxes.

When determining the sub-regression loss value of a first initial bounding box, both bounding box distribution similarity and bounding box coordinate coincidence may be considered. Alternatively, only bounding box distribution similarity may be considered. In that case, the discrimination result set of the first initial bounding box includes only the first discrimination result, and the sub-regression loss value is determined from the first regression loss component corresponding to that result.

S1046: Update the parameters of the generating sub-model and the discriminating sub-model based on the above regression loss value.

After the regression loss value is determined from the sub-regression loss values of all first initial bounding boxes, the parameters of the generation sub-model and the discrimination sub-model are adjusted by gradient descent based on that value.

Each sub-regression loss value reflects at least two components:

- the first regression loss component, for the discrimination dimension based on bounding box distribution similarity;
- the second regression loss component, for the discrimination dimension based on bounding box coordinate coincidence.

The regression loss value used to adjust the model parameters therefore also reflects both dimensions. As a result, the trained target detection model brings the probability distribution of the first predicted bounding boxes closer to that of the real bounding boxes. It also makes the coordinates of each first predicted bounding box coincide more closely with those of its real bounding box.

During training, the discrimination sub-model tries to tell whether a box is real data or generated data. Real data is the real bounding box corresponding to a first initial bounding box; generated data is the corresponding first predicted bounding box. Meanwhile, the regression loss of the model to be trained is minimized.

The generation sub-model aims to maximize the discrimination error of the discrimination sub-model, so it is forced to keep learning the bounding box distribution. This drives multiple rounds of adversarial learning between the two sub-models and yields a more accurate generation sub-model, which serves as the target detection model.

Iteratively training the model parameters on the regression loss value to obtain the target detection model can follow the existing practice of tuning parameters by back-propagation with gradient descent. It is not described further here.

Figure 3 is a schematic diagram of the implementation principle of a training process for a target detection model. The process includes the following steps:

- Obtain a first preset number of first initial bounding boxes, and obtain the real bounding box corresponding to each first initial bounding box.
- For each first initial bounding box, the generation sub-model predicts a bounding box from the first initial bounding box to obtain a first predicted bounding box.
- For each first initial bounding box, the discrimination sub-model generates a discrimination result set from the real bounding box and the first predicted bounding box corresponding to that first initial bounding box.

The process then determines the regression loss value of the model to be trained from the first discrimination result and the second discrimination result of each first initial bounding box. Based on this regression loss value, the model parameters of the model to be trained are updated iteratively until the current training result meets a preset model training end condition, which yields the target detection model.

In the embodiments of this application, during the model training phase, the model to be trained is driven to keep learning the bounding box distribution on the basis of the real bounding boxes and the first initial bounding boxes. This moves the first predicted bounding boxes closer to the real bounding boxes, which has three effects:

- it improves the accuracy with which the trained target detection model predicts the bounding box of a target object's location in an image to be detected;
- it improves the generalization of the model, which helps maintain detection accuracy on new images;
- it improves the model's adaptability to data migration.

The model to be trained includes a generation sub-model and a discrimination sub-model. The regression loss value of the model to be trained is determined from the discrimination result sets output by the discrimination sub-model. The model parameters of both sub-models are then updated iteratively over multiple rounds based on this value, until the current training result meets the preset model training end condition.

In other words, the bounding box distribution is learned through multiple rounds of generation-discrimination adversarial training. In this process, the discrimination sub-model judges whether the first predicted bounding box produced by the generation sub-model is realistic enough. A generated bounding box (that is, a first predicted bounding box) may be hard to distinguish from the real bounding box. Even then, adjusting the model parameters according to the discrimination results can push the generation sub-model's predictions still closer to the real bounding boxes. This further improves the parameter update efficiency and the distribution learning accuracy of the generation sub-model.

Moreover, the discrimination result set includes two kinds of results:

- the first discrimination result, which characterizes bounding box distribution similarity;
- the second discrimination result, which characterizes bounding box coordinate coincidence.

The second result accounts for the regression loss of boxes whose distribution is similar to that of the real box but whose position deviates from it. This makes the regression loss value derived from the discrimination result set more accurate, and so also the model parameters updated from it.

Furthermore, during training, the gradient of the regression loss corresponding to the first discrimination dimension (bounding box distribution similarity) may suddenly decrease or even vanish. To further improve the training accuracy of the model parameters, a regression loss compensation value is introduced, and the discrimination result set further includes a third discrimination result.

Accordingly, in S1042, the discrimination result set is generated from the real bounding box and the first predicted bounding box corresponding to the first initial bounding box. This specifically includes:

- discriminating the authenticity of the real bounding box and the first predicted bounding box to obtain the first discrimination result;
- calculating the intersection-over-union (IoU) loss between the same two boxes to obtain the second discrimination result;
- calculating, from the same two boxes, a regression loss compensation value that constrains the gradient of the regression loss function of the model to be trained, to obtain the third discrimination result.

For each first initial bounding box, the discrimination result set therefore includes three results:

- the first discrimination result, obtained from the perspective of bounding box distribution similarity;
- the second discrimination result, obtained from the perspective of bounding box coordinate coincidence;
- the third discrimination result, namely the regression loss compensation value that constrains the gradient of the regression loss corresponding to the first discrimination dimension.

This improves the accuracy of the regression loss value. It also addresses the problem of that gradient suddenly decreasing or vanishing.
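The text does not give a formula for the regression loss compensation value. A common way to constrain the gradient of an adversarial loss is a WGAN-GP-style gradient penalty. The sketch below assumes that form, and it is one possible reading, not the method of this application.

To keep the input gradient analytic, the sketch uses a deliberately simplified, hypothetical logistic discriminator, D(x) = sigmoid(w·x + c), over 4-d box vectors. All names are illustrative.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def logistic_discriminator(boxes, w, c):
    """Hypothetical discriminator: probability that each box (row) is real."""
    return sigmoid(boxes @ w + c)


def gradient_penalty(real_boxes, pred_boxes, w, c, rng):
    """WGAN-GP-style penalty, an ASSUMED form of the compensation value V_i3.

    The discriminator is evaluated on random interpolations between each
    real box and its predicted box. Input-gradient norms that deviate
    from 1 are penalised. Returns one value per box pair.
    """
    alpha = rng.uniform(size=(real_boxes.shape[0], 1))
    mixed = alpha * real_boxes + (1.0 - alpha) * pred_boxes
    d = logistic_discriminator(mixed, w, c)
    # d/dx sigmoid(w.x + c) = d * (1 - d) * w, evaluated row by row
    grads = (d * (1.0 - d))[:, None] * w[None, :]
    grad_norm = np.linalg.norm(grads, axis=1)
    return (grad_norm - 1.0) ** 2


# Usage with random boxes (values are illustrative only)
rng = np.random.default_rng(0)
real = rng.normal(size=(5, 4))
pred = real + rng.normal(scale=0.5, size=(5, 4))
w = np.array([0.5, -0.2, 0.1, 0.3])
gp = gradient_penalty(real, pred, w, c=0.0, rng=rng)
```

In a real implementation, the gradient would come from automatic differentiation of the actual discrimination sub-model rather than from a closed form.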

Figure 4a is a schematic diagram of the implementation principle of another training process for a target detection model. First, a preset region-of-interest extraction model extracts target regions from the sample image data set, yielding N anchor boxes. The sample image data set includes multiple original sample images, each containing at least one target object.

The feature information of each anchor box may include position information (x, y, w, h) and category information c, that is, (x, y, w, h, c). During training, the parameter dimensions may be set to be mutually independent. The iterative training of the model parameters for each dimension is then also independent.

In each round of training, m anchor boxes are randomly sampled from the N anchor boxes as first initial bounding boxes, and the real bounding box of each first initial bounding box is determined. Each target object in the sample image data set corresponds to one real bounding box. For example, if the sample image data set contains d target objects in total, there are d real bounding boxes before expansion.

To pair every first predicted bounding box with a real bounding box, first initial bounding boxes that contain the same target object share the same real bounding box. That is, the real bounding boxes are expanded according to the target object enclosed by each first initial bounding box, yielding m real bounding boxes (m > d).

For example, suppose an original sample image contains cat A, whose real bounding box is A. Suppose four first initial bounding boxes contain cat A, numbered 6, 7, 8, and 9. Real bounding box A is then expanded into four copies, the real bounding boxes numbered 6, 7, 8, and 9.
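The sampling-and-expansion step above can be sketched with array indexing. The sketch assumes the region-of-interest extraction has already recorded which target object each anchor encloses, stored in `anchor_target_ids`. The function name and this bookkeeping are hypothetical.

```python
import numpy as np


def sample_and_expand(anchor_boxes, anchor_target_ids, gt_boxes, m, rng):
    """Randomly sample m of the N anchors and pair each with its real box.

    anchor_boxes:      (N, 4) anchors as (x, y, w, h)
    anchor_target_ids: (N,) index of the target object each anchor encloses
    gt_boxes:          (d, 4) one real bounding box per target object
    Returns (initial_boxes, real_boxes), both of shape (m, 4). Sampled
    anchors that enclose the same object receive copies of the same real box.
    """
    idx = rng.choice(anchor_boxes.shape[0], size=m, replace=False)
    initial_boxes = anchor_boxes[idx]
    real_boxes = gt_boxes[anchor_target_ids[idx]]  # expansion by gather
    return initial_boxes, real_boxes


# Toy data: d = 2 objects, 4 anchors enclose object 0 ("cat A")
rng = np.random.default_rng(0)
gt = np.array([[50.0, 50.0, 20.0, 20.0], [120.0, 80.0, 30.0, 40.0]])
ids = np.array([0, 0, 0, 0, 1, 1])
anchors = gt[ids] + rng.normal(scale=3.0, size=(6, 4))  # noisy anchors
init, real = sample_and_expand(anchors, ids, gt, m=6, rng=rng)
```

The gather `gt_boxes[anchor_target_ids[idx]]` produces the m > d expanded real boxes without copying any box explicitly.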

For each first initial bounding box, the generation sub-model predicts a bounding box from it to obtain the first predicted bounding box. The discrimination sub-model then generates a discrimination result set from the real bounding box and the first predicted bounding box of that first initial bounding box.

Each first initial bounding box thus corresponds to one real bounding box and one first predicted bounding box. The first predicted bounding box is produced by the generation sub-model as it learns through continuous bounding box regression. Continuing the example, among the m first predicted bounding boxes output by the generation sub-model, those numbered 6, 7, 8, and 9 enclose cat A.

For each first initial bounding box, three regression loss components are determined from its discrimination result set:

- the first regression loss component, from the first discrimination result;
- the second regression loss component, from the second discrimination result;
- the third regression loss component, from the third discrimination result.

The regression loss value of the model to be trained is determined from the first, second, and third regression loss components of all first initial bounding boxes. The model parameters of the generation sub-model and the discrimination sub-model are then adjusted by stochastic gradient descent based on this regression loss value, yielding an updated generation sub-model and an updated discrimination sub-model.

If the current training result meets the preset model training end condition, the updated generation sub-model is taken as the trained target detection model.

If the current training result does not meet the preset model training end condition, the updated generation sub-model and discrimination sub-model are used as the model to be trained in the next round. Training continues until the end condition is met.

In each round of training, the model parameters of both the discrimination sub-model and the generation sub-model may be adjusted based on the discrimination result sets. In a specific implementation, however, the rounds may be structured to improve the training accuracy of the generation sub-model. In each round, the discrimination sub-model's parameters are adjusted t times based on the discrimination result sets, and only then are the generation sub-model's parameters adjusted once. The resulting discrimination sub-model and generation sub-model serve as the model to be trained in the next round.
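The round structure above (t discrimination-sub-model updates, then one generation-sub-model update, repeated until the end condition holds) can be sketched as a small driver loop. The three callables are hypothetical hooks standing in for the real parameter updates and the preset end condition. A `max_rounds` cap is added as a safety assumption.

```python
from typing import Callable


def adversarial_training(
    update_discriminator: Callable[[], None],
    update_generator: Callable[[], None],
    training_finished: Callable[[int], bool],
    t: int,
    max_rounds: int,
) -> int:
    """Run rounds of t discriminator updates followed by 1 generator update.

    Stops as soon as training_finished(round_index) is True, or after
    max_rounds rounds. Returns the number of completed rounds.
    """
    for round_idx in range(1, max_rounds + 1):
        for _ in range(t):
            update_discriminator()
        update_generator()
        if training_finished(round_idx):
            return round_idx
    return max_rounds


# Usage: count the updates instead of training a real model
counts = {"d": 0, "g": 0}


def step_d():
    counts["d"] += 1


def step_g():
    counts["g"] += 1


rounds = adversarial_training(step_d, step_g, lambda r: r >= 3, t=5, max_rounds=10)
```

Here training stops after 3 rounds, with 15 discriminator updates and 3 generator updates.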

The regression loss value of the model to be trained is determined jointly from the sub-regression loss values of the multiple first initial bounding boxes. Each sub-regression loss value is in turn determined jointly from several regression loss components.

Accordingly, in S1044, determining the regression loss value from the first and second discrimination results of each first initial bounding box specifically includes two steps. First, determine the sub-regression loss value of each first initial bounding box from target information, which includes one or a combination of the following:

- the bounding box distribution similarity characterized by its first discrimination result;
- the bounding box coordinate coincidence characterized by its second discrimination result;
- the regression loss compensation value characterized by its third discrimination result.

Second, determine the regression loss value of the model to be trained from the sub-regression loss values of all first initial bounding boxes.

In a specific implementation, the sub-regression loss value of a first initial bounding box may take into account any of three combinations:

- only the first regression loss component, from the first discrimination result;
- the first regression loss component together with the second regression loss component, from the second discrimination result;
- the first and second regression loss components together with the regression loss compensation component, from the third discrimination result.

Taking the last case as an example, the sub-regression loss value of each first initial bounding box equals the weighted sum of the three regression loss components:
$$V_i(D,G) = \lambda_1 V_{i1} + \lambda_2 V_{i2} + \lambda_3 V_{i3}$$

The symbols are defined as follows:

- $\lambda_1$ is the first weight coefficient, applied to the first regression loss component under the first discrimination dimension.
- $V_{i1}$ is the first regression loss component under the first discrimination dimension, that is, the component corresponding to the bounding box distribution similarity characterized by the first discrimination result.
- $\lambda_2$ is the second weight coefficient, applied to the second regression loss component under the second discrimination dimension.
- $V_{i2}$ is the second regression loss component under the second discrimination dimension, that is, the component corresponding to the bounding box coordinate coincidence characterized by the second discrimination result.
- $\lambda_3$ is the third weight coefficient, applied to the regression loss compensation value.
- $V_{i3}$ is the regression loss compensation value, that is, the third regression loss component.

The first discrimination dimension may be a regression loss discrimination dimension based on bounding box distribution similarity. The second may be a regression loss discrimination dimension based on bounding box coordinate coincidence.
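The weighted sum above can be computed for all first initial bounding boxes at once. The sketch then adds the per-box values, following the rule stated in this section that the regression loss value is the sum of the sub-regression loss values over the first preset number N_reg of boxes. The weight values in the usage line are illustrative assumptions.

```python
import numpy as np


def sub_regression_losses(v1, v2, v3, lam1, lam2, lam3):
    """V_i = lam1*V_i1 + lam2*V_i2 + lam3*V_i3 for every first initial box i.

    v1, v2, v3 are per-box component arrays. lam1..lam3 may be scalars
    or per-box arrays (for example, after per-box weight adjustment).
    """
    return lam1 * np.asarray(v1) + lam2 * np.asarray(v2) + lam3 * np.asarray(v3)


def regression_loss(sub_losses):
    """Regression loss of the model to be trained: sum over the N_reg boxes."""
    return float(np.sum(sub_losses))


# Two first initial boxes, illustrative components and weights
sub = sub_regression_losses([0.2, 0.4], [0.1, 0.3], [0.0, 0.5], 1.0, 0.5, 0.1)
total = regression_loss(sub)
```

Setting `lam3 = 0` reproduces the two-component case, and `lam2 = lam3 = 0` the distribution-similarity-only case.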

In a specific implementation, the first and second weight coefficients may be kept the same for all first initial bounding boxes. However, the first and second regression loss components correspond to different regression loss discrimination dimensions, and each dimension emphasizes different cases:

- The distribution-similarity dimension focuses on the regression loss of first initial bounding boxes whose real bounding boxes have blurred edges.
- The coordinate-coincidence dimension focuses on the regression loss of first initial bounding boxes whose distributions are similar to the real box but whose positions deviate from it.

The relative size of the first and second regression loss components therefore indicates, to some extent, which dimension more accurately characterizes the regression loss between the real bounding box and the first predicted bounding box.
Based on this, for each first initial bounding box, the first and second weight coefficients are adjusted according to the relative size of its first and second regression loss components:

- If the absolute difference between the two components is not greater than a preset loss threshold, both weight coefficients remain unchanged.
- If the absolute difference is greater than the threshold and the first component is larger, the first weight coefficient is increased according to a first preset adjustment method.
- If the absolute difference is greater than the threshold and the first component is smaller, the second weight coefficient is increased according to a second preset adjustment method.

In this way, for each first initial bounding box, training places more emphasis on the component of whichever discrimination dimension better reflects that box's bounding box regression loss. This further improves the accuracy of model parameter optimization.
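The three-way rule above can be written as a small per-box function. The text leaves the "preset adjustment methods" open. The sketch assumes simple additive increments `step1` and `step2` with hypothetical default values, and these could equally be multiplicative or scheduled.

```python
def adjust_weights(v1, v2, lam1, lam2, loss_threshold, step1=0.1, step2=0.1):
    """Per-box adjustment of the first and second weight coefficients.

    v1, v2: first and second regression loss components of one box.
    step1, step2: ASSUMED additive increments standing in for the
    unspecified first and second preset adjustment methods.
    """
    if abs(v1 - v2) <= loss_threshold:
        return lam1, lam2  # components comparable: keep both weights
    if v1 > v2:
        return lam1 + step1, lam2  # distribution-similarity dimension dominates
    return lam1, lam2 + step2  # coordinate-coincidence dimension dominates
```

For example, with a threshold of 0.1, components 0.9 and 0.2 raise the first weight, while 0.5 and 0.45 leave both weights unchanged.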

It should be noted that the increase applied to the first weight coefficient by the first preset adjustment method may be the same as, or different from, the increase applied to the second weight coefficient by the second preset adjustment method. The size of each increase can be set according to actual needs, and this application does not limit it.

The first discrimination result is obtained from the discrimination dimension of bounding box distribution similarity. It is produced by discriminating the authenticity of the real bounding box and the first predicted bounding box corresponding to the first initial bounding box, which specifically includes:

Step A1: Based on the real bounding box corresponding to the first initial bounding box, determine a first discrimination probability that the discrimination sub-model predicts the real bounding box to be real. Based on the first predicted bounding box corresponding to the first initial bounding box, determine a second discrimination probability that the discrimination sub-model predicts the first predicted bounding box to be fake.

Step A2: Generate a first discrimination result corresponding to the first initial bounding box based on the first discrimination probability and the second discrimination probability corresponding to the first initial bounding box.

For each first initial bounding box, the discrimination sub-model estimates the probability that the corresponding real bounding box comes from real data. That is, it judges the authenticity of the real bounding box, giving the first discrimination probability that the real bounding box is real data.

Similarly, the discrimination sub-model estimates the probability that the corresponding first predicted bounding box comes from generated data. This equals 1 minus the probability that the discrimination sub-model assigns to the first predicted bounding box coming from real data. In other words, it judges the authenticity of the first predicted bounding box, giving the second discrimination probability that the first predicted bounding box is generated data.

The discrimination sub-model judges authenticity from the perspective of bounding box distribution similarity. It compares the first probability distribution, corresponding to the real bounding boxes, with the second probability distribution, corresponding to the first predicted bounding boxes. The resulting discrimination probabilities therefore characterize the distribution similarity between a real bounding box and its first predicted bounding box.

Once the first and second discrimination probabilities are determined, the first discrimination result is obtained. From it, the first regression loss component for the distribution-similarity discrimination dimension can be determined. The greater the first and second discrimination probabilities, the more easily the discrimination sub-model separates the two boxes. The distribution similarity between the real bounding box and its first predicted bounding box is then lower, and the corresponding first regression loss component is greater.

The model parameters of the generation sub-model are then updated based on the first regression loss component. After this update, the generation sub-model's outputs, once judged by the discrimination sub-model, optimize the loss value of the model to be trained. This optimizes the generation sub-model and improves its bounding box prediction.

Further, step A2 can be refined to improve the accuracy of the first discrimination result of each first initial bounding box. This in turn improves the accuracy of the first regression loss component for the distribution-similarity discrimination dimension when the sub-regression loss value is determined from the first discrimination result. Generating the first discrimination result from the first and second discrimination probabilities of the first initial bounding box then specifically includes:

Step A21: Determine a first weighted probability from the first discrimination probability and a first prior probability of the real bounding box corresponding to the first initial bounding box. Determine a second weighted probability from the second discrimination probability and a second prior probability of the first initial bounding box.

Step A22: Generate a first discrimination result based on the first weighted probability and the second weighted probability corresponding to the first initial bounding box.

The first discrimination result characterizes bounding box distribution similarity. Determining it takes into account the first prior probability of the real bounding box and the second prior probability of the first initial bounding box. The discrimination sub-model judges the real bounding box and the first predicted bounding box as real or fake, giving the first and second discrimination probabilities. These probabilities are weighted by the prior probabilities to determine the first discrimination result. That is, the first discrimination result may include the first weighted probability and the second weighted probability.

The first regression loss component relating to bounding box distribution similarity, obtained from the first discrimination result, can therefore be expressed as:

$$V_{i1} = p(g_i)\log P_{i1} + p(b_i)\log P_{i2}$$

The symbols are defined as follows:

- $p(g_i)$ is the prior probability that the i-th real bounding box appears (the first prior probability).
- $P_{i1}$ is the first discrimination probability that the discrimination sub-model predicts the i-th real bounding box to be real.
- $p(b_i)$ is the prior probability that the i-th first initial bounding box appears (the second prior probability).
- $P_{i2}$ is the second discrimination probability that the discrimination sub-model predicts the i-th first predicted bounding box to be fake.
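The sketch below assumes the first regression loss component takes a prior-weighted, GAN-style log form, V_i1 = p(g_i)·log P_i1 + p(b_i)·log P_i2, where P_i2 = 1 - D(predicted box). Here D denotes the discrimination sub-model's probability that a box is real. The function names are hypothetical, and the discriminator outputs are passed in as plain numbers.

```python
import numpy as np


def discrimination_probabilities(d_real, d_pred):
    """P_i1 = D(real box) judged real; P_i2 = 1 - D(predicted box) judged fake."""
    return np.asarray(d_real, dtype=float), 1.0 - np.asarray(d_pred, dtype=float)


def first_regression_component(p_real_prior, p_init_prior, p_i1, p_i2, eps=1e-12):
    """V_i1 = p(g_i) * log P_i1 + p(b_i) * log P_i2 (ASSUMED GAN-style form).

    eps guards against log(0) when a probability is exactly zero.
    """
    return (p_real_prior * np.log(np.asarray(p_i1) + eps)
            + p_init_prior * np.log(np.asarray(p_i2) + eps))


p_i1, p_i2 = discrimination_probabilities([0.8], [0.3])
# Higher discrimination probabilities (easier separation) give a larger V_i1
v_high = first_regression_component(1.0, 1.0, 0.9, 0.9)
v_low = first_regression_component(1.0, 1.0, 0.5, 0.5)
```

This matches the monotonic behaviour described earlier: as the discrimination sub-model separates real and predicted boxes more confidently, V_i1 grows.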

It should be noted that, in a specific implementation, the second prior probability may be the prior probability that the i-th first initial bounding box appears. Alternatively, because the generation sub-model predicts the first predicted bounding box from the first initial bounding box, the second prior probability may instead be the prior probability that the i-th first predicted bounding box appears.

The occurrence of both real bounding boxes and predicted bounding boxes follows some probability distribution, such as a Gaussian distribution. The first prior probability and the second prior probability can therefore be obtained as follows:

$$p(g_i) = \frac{1}{\sqrt{2\pi\sigma_1}}\exp\left(-\frac{(g_i-\mu_1)^2}{2\sigma_1}\right)$$

where $g_i$ is the real bounding box corresponding to the first initial bounding box numbered i. $\sigma_1$ is the variance of the distribution of the first preset number of real bounding boxes, and $\mu_1$ is the mean of that distribution.

$$p(b_i) = \frac{1}{\sqrt{2\pi\sigma_2}}\exp\left(-\frac{(b_i-\mu_2)^2}{2\sigma_2}\right)$$

where $b_i$ is the first initial bounding box numbered i. $\sigma_2$ is the variance of the distribution of the first preset number of first initial bounding boxes, and $\mu_2$ is the mean of that distribution.
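The Gaussian priors can be estimated from the current batch of N_reg boxes. The sketch estimates the mean and variance per coordinate and evaluates a Gaussian density. Because each box has four coordinates, it assumes the coordinates are independent and multiplies their densities. That choice is an assumption, loosely motivated by the mutually independent parameter dimensions mentioned for Figure 4a. The function name is hypothetical.

```python
import numpy as np


def gaussian_prior(boxes, eps=1e-8):
    """Gaussian prior density of each box in a batch of N_reg boxes.

    boxes: (n, 4) array of (x, y, w, h), e.g. the real boxes g_i or the
    first initial boxes b_i. Mean and variance are estimated from the batch.
    Multiplying per-coordinate densities ASSUMES independent coordinates.
    Note that this is a density, so values may exceed 1.
    """
    boxes = np.asarray(boxes, dtype=float)
    mu = boxes.mean(axis=0)        # per-coordinate mean
    var = boxes.var(axis=0) + eps  # per-coordinate variance (eps avoids /0)
    per_coord = np.exp(-((boxes - mu) ** 2) / (2.0 * var)) / np.sqrt(2.0 * np.pi * var)
    return per_coord.prod(axis=1)


# The third box is far from the others, so it gets the lowest prior
real = np.array([[10.0, 10.0, 4.0, 4.0], [12.0, 11.0, 5.0, 4.0], [30.0, 25.0, 9.0, 8.0]])
p_real = gaussian_prior(real)
```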

The regression loss value is the sum of the sub-regression loss values corresponding to the first preset number of first initial bounding boxes, and can be expressed as:

Here, N_reg represents the first preset number, and i is the serial number of the first initial bounding box, ranging from 1 to N_reg.
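Written out with notation introduced here only for illustration ($\mathcal{L}_{reg}$ for the regression loss value and $\mathcal{L}_{reg}^{(i)}$ for the sub-regression loss value of the first initial bounding box with serial number i), the sum described above reads:

```latex
\mathcal{L}_{reg} = \sum_{i=1}^{N_{reg}} \mathcal{L}_{reg}^{(i)}
```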

To obtain the second discrimination result, which considers the discrimination dimension of bounding box coordinate coincidence degree, the step of calculating a bounding box intersection-over-union (IoU) loss from the real bounding box and the first predicted bounding box corresponding to the first initial bounding box specifically includes:

Step B1: Calculate the bounding box intersection-over-union (IoU) loss between the real bounding box corresponding to the first initial bounding box and the first predicted bounding box corresponding to the same first initial bounding box, to obtain a first IoU loss.

Taking the first initial bounding box with serial number i as an example, the IoU loss between the real bounding box with serial number i and the first predicted bounding box with serial number i is calculated, giving the first IoU loss corresponding to the first initial bounding box with serial number i.
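A minimal sketch of the bounding box IoU loss, assuming axis-aligned boxes given as (x1, y1, x2, y2) and the plain form 1 - IoU. The text does not specify which IoU loss variant is used; GIoU or DIoU losses would be drop-in alternatives. The function names are illustrative.

```python
from typing import Sequence


def iou(a: Sequence[float], b: Sequence[float]) -> float:
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def iou_loss(real_box: Sequence[float], pred_box: Sequence[float]) -> float:
    """Assumed 1 - IoU loss: 0 for identical boxes, 1 for disjoint boxes."""
    return 1.0 - iou(real_box, pred_box)
```

For example, two 2×2 boxes offset by one unit in each direction overlap in a 1×1 square, so IoU = 1/7.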

Step B2: Based on the first IoU loss, determine the second discrimination result corresponding to the first initial bounding box.

Since the IoU loss between two bounding boxes characterizes how closely their coordinates coincide, the second discrimination result can be obtained from the IoU loss between the real bounding box and the first predicted bounding box. The second regression loss component, corresponding to the discrimination dimension of bounding box coordinate coincidence degree, is then determined from the second discrimination result, which drives the model to perform bounding box regression learning.

The second discrimination result could be determined from the first IoU loss between the real bounding box and its corresponding first predicted bounding box alone. However, to improve the accuracy of the second discrimination result, and thereby the accuracy of the second regression loss component for the coordinate coincidence dimension and of the regression loss value used to adjust the model parameters, the second IoU losses between the real bounding box and the other first predicted bounding boxes are considered in addition to the first IoU loss. The real bounding box is thus contrasted, on the discrimination dimension of bounding box coordinate coincidence degree, with a positive sample (the first predicted bounding box learned through bounding box regression for that real bounding box) and with negative samples (the first predicted bounding boxes learned through bounding box regression for the other real bounding boxes). This lets the model learn the specific position representation of the real bounding box and perform bounding box regression learning better. On this basis, step B2, determining the second discrimination result corresponding to the first initial bounding box based on the first IoU loss, specifically includes:

B21: Determine a set of comparison bounding boxes among the first predicted bounding boxes respectively corresponding to the first preset number of first initial bounding boxes.

The comparison bounding box set includes the first predicted bounding boxes other than the one corresponding to the first initial bounding box, or alternatively only those other first predicted bounding boxes that do not contain the target object enclosed by the first initial bounding box.

Again taking the first initial bounding box with serial number i as an example, the comparison bounding box set may include every first predicted bounding box other than the one with serial number i (i.e., the first predicted bounding boxes with serial number k, where k≠p and p=i); that is, all first predicted bounding boxes except the one with serial number i serve as negative samples for the real bounding box with serial number i. To select negative samples more accurately, the comparison bounding box set may instead include only those other first predicted bounding boxes that do not contain the target object enclosed by the first initial bounding box with serial number i (i.e., the first predicted bounding boxes with serial number k, where k≠p, and p=i or p=j, j being the serial number of any first predicted bounding box that encloses the same target object as the first initial bounding box with serial number i). In other words, only first predicted bounding boxes containing a target object different from that of the first initial bounding box with serial number i serve as negative samples for the real bounding box with serial number i.
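The two ways of building the comparison bounding box set can be sketched as index selection. The sketch assumes a hypothetical bookkeeping list `object_ids`, where `object_ids[k]` identifies the target object enclosed by the k-th first initial bounding box (for example, obtained during ground-truth matching); neither the name nor the encoding comes from the text.

```python
from typing import List, Sequence


def comparison_set(i: int, object_ids: Sequence[int],
                   exclude_same_object: bool = True) -> List[int]:
    """Indices k of first predicted boxes used as negatives for real box i.

    object_ids is a hypothetical list: object_ids[k] is the target object
    enclosed by the k-th first initial bounding box.
    """
    if exclude_same_object:
        # Stricter variant: drop i and every j enclosing the same object as i.
        return [k for k in range(len(object_ids)) if object_ids[k] != object_ids[i]]
    # Basic variant: every predicted box except the i-th one (k != i).
    return [k for k in range(len(object_ids)) if k != i]
```

With `object_ids = [0, 0, 1, 2]`, box 1 shares box 0's object, so the stricter variant returns `[2, 3]` for i = 0, while the basic variant returns `[1, 2, 3]`.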

B22: Calculate the bounding box IoU loss between the real bounding box corresponding to the first initial bounding box and each of the other first predicted bounding boxes, to obtain second IoU losses.

Again taking the first initial bounding box with serial number i as an example, for each other first predicted bounding box in the comparison bounding box set, the IoU loss between the real bounding box with serial number i and the first predicted bounding box with serial number k is calculated, giving the second IoU loss corresponding to the first predicted bounding box with serial number k.

B23: Based on the first IoU loss and the second IoU losses, determine the second discrimination result corresponding to the first initial bounding box.

In determining the second discrimination result, which represents the bounding box coordinate coincidence degree, the first IoU loss is calculated from the real bounding box with serial number i and the first predicted bounding box with serial number i, and the second IoU losses are calculated from the real bounding box with serial number i and the first predicted bounding boxes with serial number k (k≠p); together these determine the second discrimination result (that is, the second discrimination result may include the first IoU loss and the second IoU losses). The second regression loss component related to the bounding box coordinate coincidence degree can then be determined from the second discrimination result. Adjusting the model parameters based on this component makes the coordinates of the real bounding box with serial number i coincide more closely with those of the first predicted bounding box with serial number i and less closely with those of the other first predicted bounding boxes, which strengthens the global nature of bounding box regression learning and further improves its accuracy.

The second regression loss component is the logarithm of the target IoU loss, where the target IoU loss is the quotient of the exponential of the first IoU loss and the sum of the exponentials of the multiple second IoU losses. Taking p=i as an example, the second regression loss component can be expressed as:

The quantities in this formula are: the real bounding box corresponding to the first initial bounding box with serial number i; the first initial bounding box with serial number i; the first predicted bounding box corresponding to the first initial bounding box with serial number i; the first IoU loss; the first initial bounding box with serial number k; the first predicted bounding box corresponding to the first initial bounding box with serial number k; the second IoU loss; θ_g, the model parameters of the generation sub-model; and ω, the preset adjustment factor.
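The log-of-quotient structure above can be sketched as follows, taking precomputed IoU losses as inputs. How ω enters the formula is not recoverable from the text; here it is assumed to act as a temperature dividing each IoU loss, and the function name is illustrative. The log-sum-exp is computed with max subtraction for numerical stability.

```python
import math
from typing import Sequence


def second_regression_component(first_iou_loss: float,
                                second_iou_losses: Sequence[float],
                                omega: float = 1.0) -> float:
    """log( exp(L1 / omega) / sum_k exp(L2_k / omega) ).

    Assumption: omega is a temperature; at least one negative is required.
    Minimizing this pulls the first IoU loss down and pushes the second
    IoU losses (against negatives) up.
    """
    a = first_iou_loss / omega
    scaled = [loss / omega for loss in second_iou_losses]
    m = max(scaled)
    # Stable log of the denominator: m + log(sum(exp(s - m))).
    log_denominator = m + math.log(sum(math.exp(s - m) for s in scaled))
    return a - log_denominator
```

For instance, a first IoU loss of 0 against two negatives with IoU loss 1 gives -(1 + ln 2). Lowering the first IoU loss lowers the component, which matches the stated goal of tighter coincidence with the matching box and looser coincidence with the others.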

For the determination of the regression loss compensation value, the step of calculating, from the real bounding box and the first predicted bounding box corresponding to the first initial bounding box, a regression loss compensation value that constrains the loss gradient of the regression loss function of the model to be trained specifically includes:

Step C1: Generate a synthetic bounding box corresponding to the first initial bounding box based on the real bounding box and the first predicted bounding box corresponding to the first initial bounding box.

Taking the first initial bounding box with serial number i as an example, a sampling coordinate information set is determined, according to a preset coordinate information sampling method, from the first coordinate information set of the real bounding box with serial number i and the second coordinate information set of the first predicted bounding box with serial number i; the synthetic bounding box with serial number i is then determined from the sampling coordinate information set.

Step C2: Determine the regression loss compensation value based on the bounding box distribution similarity between the synthetic bounding box and the real bounding box corresponding to the first initial bounding box.

After the synthetic bounding box corresponding to the first initial bounding box with serial number i is determined, the bounding box distribution similarity between the synthetic bounding box with serial number i and the real bounding box with serial number i is calculated. A compensation gradient with respect to the synthetic bounding box is then calculated from this similarity, and the regression loss compensation value corresponding to the first initial bounding box with serial number i is determined from the matrix 2-norm of the compensation gradient.
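The procedure above resembles the gradient penalty of WGAN-GP, and the following sketch borrows that form: it estimates the gradient of a similarity score with respect to the synthetic box by central differences and returns (‖g‖₂ − 1)². The penalty form, the finite-difference estimate and the `similarity` callable (standing in for the discrimination sub-model's distribution similarity score) are all assumptions; the text only says the value is derived from the 2-norm of the compensation gradient. A real implementation would typically use automatic differentiation instead.

```python
import math
from typing import Callable, List, Sequence


def compensation_value(similarity: Callable[[Sequence[float]], float],
                       synthetic_box: Sequence[float],
                       h: float = 1e-4) -> float:
    """Gradient-penalty-style compensation value for one synthetic box.

    similarity is a hypothetical stand-in for the discrimination sub-model's
    bounding box distribution similarity score.
    """
    grad: List[float] = []
    for d in range(len(synthetic_box)):
        up = list(synthetic_box)
        up[d] += h
        down = list(synthetic_box)
        down[d] -= h
        # Central-difference estimate of d(similarity)/d(coordinate d).
        grad.append((similarity(up) - similarity(down)) / (2 * h))
    norm = math.sqrt(sum(g * g for g in grad))  # 2-norm of the gradient vector
    return (norm - 1.0) ** 2  # assumed WGAN-GP-style penalty
```

A score whose gradient has unit norm incurs almost no penalty, while a score whose gradient has norm 2 incurs a penalty of about 1.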

For determining the synthetic bounding box corresponding to a given first initial bounding box, step C1, generating the synthetic bounding box from the real bounding box and the first predicted bounding box corresponding to the first initial bounding box, specifically includes:

C11: Determine the first coordinate information subset based on the first sampling ratio and the first coordinate information set of the real bounding box corresponding to the first initial bounding box.

C12: Determine a second coordinate information subset based on a second sampling ratio and the second coordinate information set of the first predicted bounding box corresponding to the first initial bounding box. Note that the first sampling ratio and the second sampling ratio may be preset according to the actual situation, and their sum equals 1.

C13: Generate a synthetic bounding box corresponding to the first initial bounding box based on the first coordinate information subset and the second coordinate information subset.

Again taking the first initial bounding box with serial number i as an example, the first coordinate information subset is obtained by random sampling, according to the first sampling ratio, from the first coordinate information set of the real bounding box with serial number i, and the second coordinate information subset is obtained by random sampling, according to the second sampling ratio, from the second coordinate information set of the first predicted bounding box with serial number i. The union of the two subsets is taken as the sampling coordinate information set, and the bounding box drawn from it is the synthetic bounding box with serial number i. Because the synthetic bounding box is obtained by randomly sampling and mixing the coordinate information of the real bounding box with serial number i (real data) and of the first predicted bounding box with serial number i (generated data), part of its coordinate information comes from real data and the rest from generated data. The synthetic bounding box is therefore determined jointly by real and generated data and has a certain degree of randomness. It can thus be used to compensate the gradient of the regression loss value when the gradient of the regression loss corresponding to the first discrimination dimension drops sharply or even becomes zero, avoiding the resulting sudden drop in the gradient of the regression loss value during training and further improving the training accuracy of the model parameters.
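The coordinate mixing in steps C11 to C13 can be sketched as follows, assuming each box is a list of coordinates and that the first sampling ratio selects, at random, which coordinate slots are taken from the real bounding box; the remaining slots (second sampling ratio = 1 − first ratio) come from the first predicted bounding box. Rounding the slot count to a whole number and the function name are assumptions. Mixing can occasionally yield x2 < x1 when the two boxes differ greatly; the sketch does not repair this.

```python
import random
from typing import List, Optional, Sequence


def synthetic_box(real_box: Sequence[float], pred_box: Sequence[float],
                  first_ratio: float = 0.5,
                  rng: Optional[random.Random] = None) -> List[float]:
    """Randomly mix coordinates of a real box and a first predicted box."""
    rng = rng or random.Random()
    n = len(real_box)
    n_real = round(first_ratio * n)  # size of the first coordinate subset
    real_slots = set(rng.sample(range(n), n_real))  # slots from real data
    # Remaining slots form the second subset, taken from generated data.
    return [real_box[d] if d in real_slots else pred_box[d] for d in range(n)]
```

With `first_ratio=0.5` and four coordinates, exactly two coordinates come from the real box; passing a seeded `random.Random` makes the mix reproducible.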

In target detection, the model must determine not only the location of the target object but also its specific category. During training, therefore, category identification may be inaccurate for some first initial bounding boxes. For such boxes, the corresponding first predicted bounding boxes may not truly reflect the bounding box prediction accuracy of the generation sub-model, and the discrimination results of the discrimination sub-model between those first predicted bounding boxes and their real bounding boxes likewise cannot truly reflect that accuracy. To further improve the accuracy of the regression loss value, the first predicted category corresponding to each first predicted bounding box is taken into account when determining its sub-regression loss value: the sub-regression loss value is counted only if the real category corresponding to the first predicted bounding box matches the first predicted category; otherwise only the corresponding sub-classification loss value is counted. That is, the sub-regression loss values of first initial bounding boxes whose category prediction results do not meet the preset requirement are excluded.
On this basis, the model to be trained further includes a classification sub-model, and each round of model training further includes: performing, by the classification sub-model, classification on the first initial bounding box or the first predicted bounding box to obtain the first predicted category. In a specific implementation, the classification sub-model performs category prediction on the first initial bounding box or the first predicted bounding box and outputs a first category prediction result, which includes the predicted probability that the target object enclosed by the first initial bounding box or the first predicted bounding box belongs to each candidate category. The candidate category with the maximum predicted probability is the first predicted category; that is, the classification sub-model predicts the category of the target object in the image area within the first initial bounding box or the first predicted bounding box to be the first predicted category. Moreover, since the position of the first predicted bounding box does not deviate greatly from that of the first initial bounding box, the image features within the two boxes do not deviate greatly either, so using either box does not affect recognition of the target object category in the image area within the bounding box.
Accordingly, when bounding box prediction and category prediction are executed sequentially, the first predicted bounding box can be input into the classification sub-model for category prediction to obtain the corresponding first category prediction result; that is, the first predicted bounding box is first obtained from the first initial bounding box, and category prediction is then performed on the first predicted bounding box. When bounding box prediction and category prediction are executed simultaneously, the first initial bounding box can instead be input into the classification sub-model; that is, while the first predicted bounding box is being predicted from the first initial bounding box, category prediction is performed on the first initial bounding box to obtain the first category prediction result.
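Selecting the first predicted category from the first category prediction result amounts to an argmax over candidate categories. The sketch below assumes the prediction result is available as a mapping from category name to predicted probability; that encoding and the function name are illustrative.

```python
from typing import Dict


def first_predicted_category(category_probs: Dict[str, float]) -> str:
    """Return the candidate category with the largest predicted probability."""
    return max(category_probs, key=category_probs.get)
```

For example, `{"cat": 0.2, "dog": 0.7, "car": 0.1}` yields `"dog"`.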

It should be noted that the iterative training of the model parameters of the classification sub-model can follow an existing classification model training process and is not described again here.

The target information further includes the matching relationship between the first predicted category corresponding to the first initial bounding box and the real category of the first initial bounding box. When determining the sub-regression loss value corresponding to each first initial bounding box: if the first predicted category corresponding to the first initial bounding box does not match the real category, the sub-regression loss value is zero; if it matches, the sub-regression loss value is determined from at least one of the first regression loss component corresponding to the bounding box distribution similarity, the second regression loss component corresponding to the bounding box coordinate coincidence degree, and the regression loss compensation value.
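The zeroing rule can be sketched as a masked sum over the first initial bounding boxes. The sketch assumes each box's sub-regression loss value has already been computed (for example, as the sum of whichever components are used) and that a boolean per box records whether its first predicted category matches its real category; the names are illustrative.

```python
from typing import Sequence


def regression_loss(sub_losses: Sequence[float],
                    category_matched: Sequence[bool]) -> float:
    """Sum of sub-regression loss values; mismatched boxes contribute zero."""
    if len(sub_losses) != len(category_matched):
        raise ValueError("one match flag is needed per first initial bounding box")
    return float(sum(loss for loss, ok in zip(sub_losses, category_matched) if ok))
```

With sub-regression losses `[1.0, 2.0, 3.0]` and match flags `[True, False, True]`, the regression loss value is 4.0.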

The preset category matching constraint used to decide whether the first predicted category corresponding to the first initial bounding box matches the real category may be related to the first category prediction result, and may specifically be a constraint of a fixed matching mode or a constraint of a varying matching mode. Under a fixed matching mode, the category matching constraint used in each round of model training stays the same (independent of the current round); for example, in every round, the first predicted category corresponding to the first initial bounding box is determined to match the real category if the two are the same. Under a varying matching mode, the category matching constraint used in each round depends on the current number of model training rounds, and can be either a category matching stage-type constraint or a category matching gradient constraint.

The category matching stage-type constraint may be: when the current number of model training rounds is less than a first preset number of rounds, the real category and the first predicted category belong to the same category group; when it is greater than or equal to the first preset number of rounds, the real category is the same as the first predicted category. Applying this constraint to the category prediction result corresponding to the first initial bounding box realizes staged category matching. The category matching gradient constraint may be: the sum of a first constraint term and a second constraint term is greater than a preset probability threshold, where the first constraint term is the first predicted probability corresponding to the real category in the category prediction probability subset, and the second constraint term is the product of the preset adjustment factor and the sum of the second predicted probabilities in that subset other than the first predicted probability.
The preset adjustment factor decreases gradually as the current number of training rounds increases, so applying the gradient constraint to the category prediction result corresponding to the first initial bounding box realizes gradual category matching. The category prediction probability subset is determined from the category prediction result corresponding to the first initial bounding box. It contains the first predicted probability that the target object enclosed by the first predicted bounding box belongs to the real category, and the second predicted probabilities that it belongs to the non-real categories in the target group; that is, of the probabilities the classification sub-model predicts for the first initial bounding box or the first predicted bounding box, the subset keeps the first predicted probability for the real category and the second predicted probabilities for the other candidate categories in the target group, where the target group is the category group containing the real category. In a specific implementation, the candidate categories associated with the target detection task are determined in advance and divided into multiple category groups according to their semantic information.
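The category matching stage-type constraint described above can be sketched as a round-dependent test. The sketch assumes a hypothetical mapping `group_of` from each candidate category to its semantic category group; the function and parameter names are illustrative.

```python
from typing import Dict


def stage_match(real: str, predicted: str, group_of: Dict[str, str],
                current_round: int, first_preset_rounds: int) -> bool:
    """Category matching stage-type constraint.

    group_of is a hypothetical mapping from candidate category to its
    semantic category group.
    """
    if current_round < first_preset_rounds:
        # Early stage: same category group is enough.
        return group_of[real] == group_of[predicted]
    # Later stage: the categories must be identical.
    return real == predicted
```

With "cat" and "dog" in the same group, predicting "dog" for a real "cat" counts as a match before the first preset number of rounds and as a mismatch from then on.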

Since the first initial bounding box is obtained by extracting a region of interest with a preset region-of-interest extraction model, the region it delineates around the target object may not be accurate enough, so that in the early stage of model training the category recognition of the first predicted bounding box corresponding to such a first initial bounding box is inaccurate. Therefore, when determining the sub-regression loss value corresponding to the first initial bounding box, the matching relationship between the corresponding first predicted category and the real category of the first initial bounding box is taken into account, and whether the two match is determined according to the preset category matching constraint.

Further, the classification sub-model may be pre-trained, or its model parameters may be trained together with those of the generation sub-model; in the latter case, a classification loss value is determined from the first predicted category and the real category, and the model parameters of the classification sub-model are iteratively trained based on it. When the classification sub-model is trained synchronously, its parameters may still be inaccurate in the early stage of model training, leading to inaccurate category identification of the first predicted bounding box corresponding to the first initial bounding box. Therefore, the category accuracy requirement is relaxed in the early stage: the sub-regression loss value is counted as long as the real category corresponding to the first predicted bounding box and the first predicted category belong to the same category group. In the later stage the requirement is tightened: the sub-regression loss value is counted only when the real category is the same as the first predicted category. On this basis, the preset category matching constraint may be a constraint of a varying matching mode (such as the category matching stage-type constraint or the category matching gradient constraint).

Furthermore, so that the preset category matching constraint moves gradually between its two branches (the first predicted category falls into the target group, and the first predicted category is the same as the real category), shifting from the former to the latter as the number of model training rounds increases, the preset category matching constraint preferably is the category matching gradient constraint.

In a specific implementation where the preset category matching constraint is the category matching gradient constraint, again taking the first initial bounding box with serial number i as an example, the constraint can be expressed as:

Here, groups represents the target group, real_i represents the real category of the first initial bounding box with serial number i within the target group, f∈groups\real_i ranges over the non-real categories in the target group, β represents the preset adjustment factor, and μ represents the preset probability threshold. The first predicted probability (the probability of real_i) is the first constraint term, each second predicted probability is the probability of a category f, and β times the sum of the second predicted probabilities is the second constraint term. The larger the first predicted probability, the closer the first predicted category is to the real category. Because β decreases as the current number of training rounds increases, the weight of the second constraint term gradually shrinks, so that in the later stages of training the first constraint term (the first predicted probability under the real category) increasingly decides whether the first predicted category matches the real category. Once the current number of training rounds passes a certain point, the second constraint term becomes zero; the constraint then holds only when the first predicted probability is greater than the preset probability threshold, which means the classification sub-model determines the real category as the first predicted category.
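The category matching gradient constraint, as defined in words above, can be sketched as follows. The sketch assumes the category prediction result is a mapping from category name to probability and the target group is a list of category names; the function name is illustrative.

```python
from typing import Dict, Sequence


def gradient_match(probs: Dict[str, float], real: str, group: Sequence[str],
                   beta: float, mu: float) -> bool:
    """First constraint term + second constraint term > mu.

    first term:  predicted probability of the real category
    second term: beta * sum of predicted probabilities of the other
                 categories in the target group
    """
    first_term = probs[real]
    second_term = beta * sum(probs[f] for f in group if f != real)
    return first_term + second_term > mu
```

With probabilities cat 0.4, dog 0.3, car 0.3, a target group {cat, dog}, real category "cat" and μ = 0.5, the constraint holds for β = 1 (0.7 > 0.5) but fails for β = 0 (0.4 ≤ 0.5), illustrating how the match tightens as β shrinks.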

The preset adjustment factor decreases as the current number of model training rounds increases. If the current number of rounds is less than or equal to a target number of training rounds, the second constraint term is positively correlated with the preset adjustment factor, which in turn is negatively correlated with the current number of rounds; if the current number of rounds is greater than the target number of training rounds, the second constraint term is zero. The target number of training rounds is less than the total number of training rounds.

In a specific implementation, to keep the adjustment smooth, the value of the preset adjustment factor β can be reduced gradually by linear decrease. The preset adjustment factor used in the current round of model training is determined as follows:

(1) For the first round of model training, determine the first preset value as the preset adjustment factor used in current model training.

The first preset value can be set according to actual needs. To simplify adjustment, it can be set to 1, i.e., the preset adjustment factor β=1, so that in the first round of model training the category matching gradient constraint becomes:

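that is, the first predicted probability plus the sum of the second predicted probabilities, which equals the total predicted probability over all categories in the target group, must be greater than the preset probability threshold μ.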

That is to say, for the first round of model training, based on the sum of the first predicted probability and the second predicted probability corresponding to the target group, it is determined whether the first predicted category corresponding to the first initial bounding box matches the true category.

(2) For rounds other than the first, determine the preset adjustment factor used in the current round of model training from the current number of training rounds, the target number of training rounds and the first preset value, according to a factor-decreasing adjustment method.

If the preset adjustment factor for the first round of model training is β=1, then in rounds other than the first, the category matching gradient constraint can be:

In other words, in rounds after the first, the category matching gradient constraint keeps the same form with β below 1, and as the number of model training rounds increases, the contribution of the second constraint term gradually decreases.

For example, the decreasing formula for the factor-decreasing adjustment method can be:

$$\beta = \max\left(0,\ 1 - \frac{\delta - 1}{Z}\right)$$

where max(0, ·) takes the larger of 0 and the expression that follows; the leading 1 is the first preset value (that is, the preset adjustment factor β used in the first round of training), δ is the current model training round number, and Z is the target training round number. The target training round number can be the total number of training rounds minus 1, or a specified number of training rounds smaller than the total number of training rounds; in the latter case, the difference between the total number of training rounds and the specified number is a preset number of rounds Q, where Q is greater than 2. That is, β is set to 0 starting from a round in the later stage of model training that is not the last round: from round δ = Z + 1 through the last round of model training, the judgment condition used is the category matching constraint with the second constraint term equal to zero.

It should be noted that when the target number of training rounds Z is the total number of training rounds minus 1, the decreasing formula takes the same form with Z equal to that value, so the preset adjustment factor is set to 0 only in the last round of model training, and the judgment condition used in that round is the category matching constraint with the second constraint term equal to zero. In addition, the decreasing formula above is only a relatively simple linear schedule; in practice, the rate at which β decreases can be set according to actual needs, so the formula does not limit the scope of protection of this application.
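The linear schedule can be sketched in Python as follows. This is a minimal illustration, assuming the first preset value is 1 and that the schedule has the form β = max(0, 1 − (δ − 1)/Z); that form is inferred from the conditions stated in this section (β = 1 in the first round, β > 0 while δ ≤ Z, β = 0 from δ = Z + 1 onward) rather than quoted from an original formula, and the function name `adjustment_factor` is hypothetical.

```python
def adjustment_factor(delta: int, target_round: int) -> float:
    """Preset adjustment factor beta for 1-based training round `delta`.

    Assumed schedule: beta = max(0, 1 - (delta - 1) / Z) with first preset value 1,
    so beta is 1 in round 1, stays positive while delta <= Z, and is 0 afterwards.
    """
    if delta < 1 or target_round < 1:
        raise ValueError("rounds are 1-based and Z must be at least 1")
    if delta == 1:
        return 1.0  # first round: use the first preset value directly
    return max(0.0, 1.0 - (delta - 1) / target_round)


# With Z = 4 the factor reaches 0 in round Z + 1 = 5 and stays there.
print([adjustment_factor(d, 4) for d in range(1, 7)])  # -> [1.0, 0.75, 0.5, 0.25, 0.0, 0.0]
```

Setting `target_round` to the total number of training rounds minus 1 makes β reach 0 only in the last round.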

In a specific implementation, the model to be trained includes a generation sub-model, a discrimination sub-model and a classification sub-model. Figure 4b is a schematic diagram of the implementation principle of another training process for the target detection model, which includes:

(1) First, the preset region-of-interest extraction model extracts target regions from the sample image data set to obtain N anchor boxes.

(2) In each round of model training, m anchor boxes are randomly sampled from the N anchor boxes as the first initial bounding boxes, and the real bounding box corresponding to each first initial bounding box is determined.

(3) For each first initial bounding box: the generation sub-model performs bounding box prediction based on the first initial bounding box to obtain a first predicted bounding box; the discrimination sub-model generates a discrimination result set based on the real bounding box corresponding to the first initial bounding box and the corresponding first predicted bounding box; the classification sub-model performs category prediction on the first predicted bounding box to obtain a category prediction result; and a category matching result is determined, according to the preset category matching constraint, from the true category of the real bounding box corresponding to the first initial bounding box and the category prediction result of the corresponding first predicted bounding box. If the category matching result indicates that the first predicted category and the true category do not satisfy the preset category matching constraint, the sub-regression loss value corresponding to the first initial bounding box is zero. If the category matching result indicates that they do satisfy the preset category matching constraint, a first regression loss component is determined from the first discrimination result in the discrimination result set of the first initial bounding box, a second regression loss component from the second discrimination result, and a third regression loss component from the third discrimination result; the sub-regression loss value corresponding to the first initial bounding box is then determined from the first, second and third regression loss components.

It should be noted that the category matching result may be determined either by a separate processing module or by the discrimination sub-model. In the latter case, when the first predicted category and the true category do not satisfy the preset category matching constraint, the corresponding discrimination result set can be set directly to empty or to preset information, without generating a discrimination result set from the real bounding box corresponding to the first initial bounding box and the corresponding first predicted bounding box, which further improves model training efficiency. Referring to Figure 4b, the true category corresponding to each real bounding box and the category prediction result corresponding to each first predicted bounding box are input into the discrimination sub-model, which determines the category matching result from the true category of the real bounding box corresponding to the first initial bounding box and the category prediction result of the corresponding first predicted bounding box. If the category matching result indicates that the first predicted category and the true category do not satisfy the preset category matching constraint, the corresponding discrimination result set is empty or preset information, so the sub-regression loss value determined from it is zero. If the category matching result indicates that they satisfy the preset category matching constraint, a discrimination result set is generated from the real bounding box corresponding to the first initial bounding box and the corresponding first predicted bounding box, and the sub-regression loss value determined from it is based on the first regression loss component corresponding to the first discrimination result, the second regression loss component corresponding to the second discrimination result, and the third regression loss component corresponding to the third discrimination result.

In other words, two orders are possible when determining whether the sub-regression loss value corresponding to the first initial bounding box is zero. In the first order, the discrimination result set is generated directly from the real bounding box corresponding to the first initial bounding box and the corresponding first predicted bounding box; the matching relationship between the first predicted category and the true category (that is, whether the category matching result indicates that they satisfy the preset category matching constraint) is then determined from the category prediction result. If the categories do not match, the corresponding sub-regression loss value is zero; if they match, the sub-regression loss value is determined from the multiple discrimination results in the discrimination result set. In the second order, the matching relationship between the first predicted category and the true category is determined first from the category prediction result. If the categories do not match, the corresponding discrimination result set is set to empty or preset information and the sub-regression loss value is zero; if they match, a discrimination result set is generated from the real bounding box corresponding to the first initial bounding box and the corresponding first predicted bounding box, and the corresponding sub-regression loss value is determined from the multiple discrimination results in that set.

(4) The regression loss value of the model to be trained is determined from the sub-regression loss values corresponding to the first initial bounding boxes; stochastic gradient descent is then used to adjust the model parameters of the generation sub-model and the discrimination sub-model based on the regression loss value, yielding the parameter-updated generation sub-model and discrimination sub-model.

(5) If the current model training result satisfies the preset model training end condition, the updated generation sub-model is determined as the trained target detection model; otherwise, the updated generation sub-model and discrimination sub-model are used as the model to be trained in the next round of model training, and training continues until the preset model training end condition is satisfied.
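Steps (2) to (5) can be sketched in Python under heavy simplification; every name in the sketch is hypothetical. `sub_regression_loss` shows the category-gated sub-regression loss of step (3): it calls the discrimination step only when the categories match and sums the loss components (the summation is an assumption, since the text does not say how the three components are combined). `train_toy` replaces the generation sub-model with a single learned offset added to every anchor box and replaces the discriminator-based regression loss with a mean squared error, purely to show the per-round sampling of m anchor boxes, the stochastic gradient descent update and the end-condition check.

```python
import random
from typing import Callable, List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2); assumed format


def sub_regression_loss(categories_match: bool,
                        discriminate: Callable[[], Sequence[float]],
                        components: Sequence[Callable[[float], float]]) -> float:
    """Step (3): zero on a category mismatch, otherwise combine the loss components.

    `discriminate` is only called when the categories match, mirroring the option of
    checking the category first and skipping the discrimination result set.
    Summing the components is an assumption.
    """
    if not categories_match:
        return 0.0
    results = discriminate()
    if len(results) != len(components):
        raise ValueError("expected one loss component per discrimination result")
    return sum(loss(r) for loss, r in zip(components, results))


def train_toy(anchors: Sequence[Box], real: Sequence[Box], m: int, lr: float = 0.1,
              max_rounds: int = 500, tol: float = 1e-6,
              seed: int = 0) -> Tuple[List[float], int]:
    """Steps (2), (4) and (5) with a toy generator: predicted box = anchor + theta."""
    rng = random.Random(seed)
    theta = [0.0] * 4
    for rnd in range(1, max_rounds + 1):
        batch = rng.sample(range(len(anchors)), m)  # step (2): m of the N anchor boxes
        loss, grad = 0.0, [0.0] * 4
        for i in batch:
            for k in range(4):
                err = anchors[i][k] + theta[k] - real[i][k]
                loss += err * err / m  # mean squared error stands in for the real loss
                grad[k] += 2.0 * err / m
        theta = [t - lr * g for t, g in zip(theta, grad)]  # step (4): SGD update
        if loss < tol:  # step (5): assumed end condition on this round's loss
            return theta, rnd
    return theta, max_rounds
```

When every anchor is offset from its real box by the same amount, `train_toy` converges to a `theta` close to the negated offset, so the predicted boxes land on the real boxes.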

In the model training stage, the target detection model training method in the embodiments of the present application uses the real bounding boxes and the first initial bounding boxes to drive the model to be trained to continuously learn the bounding box distribution, so that the first predicted bounding boxes move closer to the real bounding boxes. This improves both the accuracy with which the trained target detection model predicts bounding boxes at the locations of target objects in an image to be detected and the generalization of the trained model, which ensures detection accuracy on new images to be detected and improves the data transfer adaptability of the trained model. Moreover, the model to be trained includes a generation sub-model and a discrimination sub-model: the regression loss value of the model to be trained is determined from the discrimination result sets output by the discrimination sub-model, and the model parameters of both sub-models are then iteratively updated over multiple rounds based on that regression loss value until the current model training result satisfies the preset model training end condition. The bounding box distribution is thus learned through multiple rounds of adversarial training between generation and discrimination, in which the discrimination sub-model judges whether the first predicted bounding box produced by the generation sub-model is realistic enough.
When the two become hard to distinguish, adjusting the model parameters according to the discrimination results of the discrimination sub-model still pushes the first predicted bounding box closer to the real bounding box, further improving the parameter update efficiency of the generation sub-model and the accuracy of bounding box distribution learning. Finally, the discrimination result set output by the discrimination sub-model includes not only the first discrimination result, which represents the similarity of the bounding box distributions, but also the second discrimination result, which represents the degree of coincidence of the bounding box coordinates. The second discrimination result compensates for the effect on the bounding box regression loss of cases where the bounding box distributions are similar but the specific positions deviate, making the regression loss value obtained from the discrimination result set more accurate and, in turn, further improving the accuracy of the model parameters updated from that regression loss value.

Corresponding to the target detection model training method described in Figures 1 to 4b and based on the same technical concept, the embodiments of the present application also provide a target detection method. Figure 5 is a schematic flowchart of the target detection method provided by the embodiments of the present application. The method in Figure 5 can be executed by an electronic device provided with a target detection apparatus, which may be a terminal device or a designated server. The hardware device that performs target detection (that is, the electronic device provided with the target detection apparatus) and the hardware device that performs target detection model training (that is, the electronic device provided with the target detection model training apparatus) may be the same or different. As shown in Figure 5, the method includes at least the following steps:

S502: Obtain a third preset number of second initial bounding boxes, where the second initial bounding boxes are obtained by using the preset region-of-interest extraction model to extract target regions from the image to be detected.

For the process of obtaining the third preset number of second initial bounding boxes, refer to the process of obtaining the first preset number of first initial bounding boxes described above; it is not repeated here.

S504: Input the second initial bounding boxes into the target detection model for target detection to obtain the second predicted bounding box and the second predicted category corresponding to each second initial bounding box. The target detection model is trained with the target detection model training method described above; for the specific training process, refer to the above embodiments, which are not repeated here.

The target detection model includes a classification sub-model and a generation sub-model. For each second initial bounding box, during target detection, the generation sub-model performs bounding box prediction based on the second initial bounding box to obtain the corresponding second predicted bounding box, and the classification sub-model classifies the second initial bounding box or the second predicted bounding box to obtain the second predicted category corresponding to the second initial bounding box.

The classification sub-model performs category prediction on the second initial bounding box or the second predicted bounding box and may output a second category prediction result. The second category prediction result contains the predicted probability that the target object enclosed by the second initial bounding box or the second predicted bounding box belongs to each candidate category, and the candidate category with the largest predicted probability is the second predicted category; that is, the classification sub-model predicts the category of the target object in the image region within the second initial bounding box or the second predicted bounding box to be the second predicted category. In addition, in a specific implementation, the position of the second predicted bounding box does not deviate greatly from that of the second initial bounding box, so the image features within the two boxes do not differ greatly either, and recognition of the target object category in the image region within the bounding box is not affected. Therefore, when bounding box prediction and category prediction are performed sequentially, the second predicted bounding box can be input into the classification sub-model for category prediction to obtain the corresponding second category prediction result; that is, the second predicted bounding box is first predicted from the second initial bounding box, and category prediction is then performed on the second predicted bounding box.
When bounding box prediction and category prediction are performed simultaneously, the second initial bounding box can instead be input into the classification sub-model for category prediction to obtain the corresponding second category prediction result; that is, while the second predicted bounding box is predicted from the second initial bounding box, category prediction is performed on the second initial bounding box itself.
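Selecting the second predicted category from a second category prediction result amounts to an argmax over the candidate categories. A minimal Python sketch, assuming the prediction result is available as a mapping from candidate category name to predicted probability (the function name `second_predicted_category` is hypothetical):

```python
from typing import Mapping


def second_predicted_category(probs: Mapping[str, float]) -> str:
    """Return the candidate category with the largest predicted probability."""
    if not probs:
        raise ValueError("empty category prediction result")
    # max() returns the first key encountered when several probabilities tie
    return max(probs, key=lambda category: probs[category])


print(second_predicted_category({"cat": 0.7, "dog": 0.2, "pedestrian": 0.1}))  # -> cat
```

The same function applies whether the classification sub-model was given the second initial bounding box or the second predicted bounding box; only its input differs.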

S506: Generate a target detection result for the image to be detected based on the second predicted bounding box and the second predicted category corresponding to each second initial bounding box.

From the second predicted bounding box and the second predicted category corresponding to each second initial bounding box, the number of target objects contained in the image to be detected and the category of each target object can be determined; for example, the image to be detected contains a cat, a dog and a pedestrian.
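Counting target objects per category from the detection result can be sketched in Python as below. The sketch assumes that each (second predicted bounding box, second predicted category) pair corresponds to a distinct target object, that is, duplicate boxes for the same object have already been removed; the text does not describe that step, and the function name `summarize_detections` is hypothetical.

```python
from collections import Counter
from typing import Dict, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2); assumed format


def summarize_detections(detections: Sequence[Tuple[Box, str]]) -> Tuple[int, Dict[str, int]]:
    """Return the number of target objects and the count per predicted category."""
    # Assumes one (box, category) pair per distinct target object.
    counts = Counter(category for _, category in detections)
    return len(detections), dict(counts)


example = [((0, 0, 10, 10), "cat"), ((20, 5, 40, 30), "dog"), ((50, 0, 60, 40), "pedestrian")]
print(summarize_detections(example))  # -> (3, {'cat': 1, 'dog': 1, 'pedestrian': 1})
```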

The target detection model includes a generation sub-model and a classification sub-model. Figure 6 is a schematic diagram of the specific implementation principle of the target detection process, which includes: using the preset region-of-interest extraction model to extract target regions from the image to be detected to obtain P anchor boxes; randomly sampling n anchor boxes from the P anchor boxes as the second initial bounding boxes; for each second initial bounding box, having the generation sub-model perform bounding box prediction based on the second initial bounding box to obtain the second predicted bounding box, and having the classification sub-model perform category prediction on the second predicted bounding box to obtain the second predicted category; and

generating the target detection result of the image to be detected based on the second predicted bounding box and the second predicted category corresponding to each second initial bounding box.
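The inference flow of Figure 6 can be sketched in Python as follows. This is a hypothetical outline: `generate` and `classify` stand in for the trained generation sub-model and classification sub-model, which are passed in as plain callables, and the P anchor boxes are assumed to have been produced already by the region-of-interest extraction model.

```python
import random
from typing import Callable, Dict, List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2); assumed format


def detect(anchors: Sequence[Box], n: int,
           generate: Callable[[Box], Box],
           classify: Callable[[Box], Dict[str, float]],
           rng: random.Random) -> List[Tuple[Box, str]]:
    """Sample n of the P anchor boxes, refine each box, then classify the refined box."""
    initial_boxes = rng.sample(list(anchors), n)  # second initial bounding boxes
    results = []
    for box in initial_boxes:
        predicted = generate(box)  # second predicted bounding box
        probs = classify(predicted)  # second category prediction result
        category = max(probs, key=lambda c: probs[c])  # second predicted category
        results.append((predicted, category))
    return results
```

Classifying `predicted` corresponds to the sequential order described for step S504; passing `box` to `classify` instead would correspond to the simultaneous order.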

It should be noted that the target detection model trained with the above target detection model training method can be applied to any application scenario that requires target detection on an image to be detected. The image to be detected can be captured by an image acquisition device installed at a particular site. Correspondingly, the target detection apparatus can be part of the image acquisition device, specifically an image processing unit within it that receives the image to be detected from the device's image capture unit and performs target detection on it; alternatively, the target detection apparatus can be a separate device independent of the image acquisition device, which receives the image to be detected from the image acquisition device and performs target detection on it.

As a specific application scenario, the image to be detected can be captured by an image acquisition device installed at the entrance of a public place (such as a shopping mall entrance, a subway entrance, a scenic spot entrance, or a performance venue entrance). Correspondingly, the target objects to be detected in the image are the target users entering the public place. The target detection model performs target detection on the image to be detected to delineate the second predicted bounding boxes containing the target users entering the public place and to determine the second predicted category of each second predicted bounding box (that is, the category of the target user it contains, such as at least one of age group, gender, height and occupation), yielding the target detection result of the image. A user group identification result (such as the flow of people entering the public place, or the attributes of the user group entering it) is then determined from the target detection result, and corresponding business processing (such as automatically triggering an admission restriction prompt, or pushing information to target users) is triggered based on the user group identification result. The more accurate the model parameters of the target detection model, the more accurate its target detection results for the image to be detected, and therefore the more accurate the business processing triggered from those results.
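One way such a user group identification result could be derived is sketched below in Python. Everything in it is illustrative: the function name `user_group_result`, the `capacity` threshold and the rule that an admission restriction is triggered when the occupancy would exceed it are hypothetical, and the sketch assumes one second predicted bounding box per entering target user.

```python
from collections import Counter
from typing import Dict, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2); assumed format


def user_group_result(detections: Sequence[Tuple[Box, str]],
                      already_inside: int, capacity: int) -> Dict[str, object]:
    """Derive a simple user group identification result from one entrance image."""
    entering = len(detections)  # assumes one second predicted box per target user
    groups = Counter(category for _, category in detections)  # e.g. age group labels
    restrict = already_inside + entering > capacity  # hypothetical admission rule
    return {"entering": entering, "groups": dict(groups), "restrict_admission": restrict}
```

The returned flag is what would drive the business processing, for example the admission restriction prompt.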

As another example, the image to be detected can be captured by image acquisition devices installed at the monitoring points of a breeding base. Correspondingly, the target objects to be detected in the image are the target breeding objects at the breeding monitoring point. The target detection model performs target detection on the image to be detected to delineate the second predicted bounding boxes containing target breeding objects and to determine the second predicted category of each second predicted bounding box (that is, the category of the target breeding object it contains, such as at least one of living status and body size), yielding the target detection result of the image. A breeding object group identification result (such as the survival rate or the growth rate of the target breeding objects at the monitoring point) is then determined from the target detection result, and corresponding control operations are performed based on it (for example, automatically issuing an alarm when a drop in survival rate is detected, or automatically increasing the feeding amount or frequency when a slowdown in growth rate is detected). The more accurate the model parameters of the target detection model, the more accurate its target detection results for the image to be detected, and therefore the more accurate the control operations triggered from those results.
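The survival-rate alarm can be sketched in Python as follows. This is a hypothetical illustration: it assumes the second predicted category encodes living status with the label `"alive"`, and the `max_drop` threshold and the function names `survival_rate` and `survival_alarm` are not taken from the text.

```python
from typing import Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2); assumed format


def survival_rate(detections: Sequence[Tuple[Box, str]], alive_label: str = "alive") -> float:
    """Fraction of detected breeding objects whose predicted living status is alive."""
    if not detections:
        raise ValueError("no breeding objects detected")
    alive = sum(1 for _, status in detections if status == alive_label)
    return alive / len(detections)


def survival_alarm(previous_rate: float, current_rate: float, max_drop: float = 0.05) -> bool:
    """Hypothetical rule: raise an alarm when the survival rate falls by more than max_drop."""
    return previous_rate - current_rate > max_drop
```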

During target detection, the target detection method in the embodiments of the present application first uses the preset region-of-interest extraction model to extract multiple candidate bounding boxes and randomly samples a third preset number of them as the second initial bounding boxes. For each second initial bounding box, the generation sub-model performs bounding box prediction based on it to obtain the second predicted bounding box, and the classification sub-model performs category prediction on the second predicted bounding box to obtain the second predicted category; the target detection result of the image to be detected is then generated from the second predicted bounding box and the second predicted category corresponding to each second initial bounding box. During parameter training of the generation sub-model, the real bounding boxes and the first initial bounding boxes drive the model to be trained to continuously learn the bounding box distribution, bringing the first predicted bounding boxes closer to the real bounding boxes; this improves the generalization and data transferability of the target detection model and hence the accuracy of bounding box prediction at the locations of target objects in the image to be detected. In addition, the model to be trained includes a generation sub-model and a discrimination sub-model: the regression loss value is determined from the discrimination result sets output by the discrimination sub-model, and the model parameters are iteratively updated based on that regression loss value, which improves the parameter update efficiency of the generation sub-model. Because each discrimination result set contains discrimination results representing both the similarity of the bounding box distributions and the degree of coincidence of the bounding box coordinates, the regression loss value obtained from it is more accurate, and so are the model parameters updated from it. This ensures that the generation sub-model can accurately predict bounding boxes on new images to be detected, thereby improving the accuracy of target detection on images to be detected using the target detection model.

It should be noted that this embodiment and the previous embodiments of this application are based on the same inventive concept. For the specific implementation of this embodiment, refer to the implementation of the target detection model training method described above; repeated details are omitted.

Corresponding to the target detection model training method described in Figures 1 to 4b and based on the same technical concept, the embodiments of the present application also provide a target detection model training apparatus. Figure 7 is a schematic diagram of the module composition of the target detection model training apparatus provided by the embodiments of the present application. The apparatus is used to perform the target detection model training method described in Figures 1 to 4b. As shown in Figure 7, the apparatus includes:

The first bounding box acquisition module 702 is configured to acquire first initial bounding boxes and the real bounding box corresponding to each first initial bounding box, where the first initial bounding boxes are obtained by using the preset region-of-interest extraction model to extract target regions from the sample image data set.

The model training module 704 is configured to input the first initial bounding box and the real bounding box into the model to be trained for iterative model training until the current model training results meet the preset model training end conditions to obtain a target detection model.

The model to be trained includes a generation sub-model and a discrimination sub-model. Each round of model training in the iterative training includes the following. For each first initial bounding box, the generation sub-model performs bounding box prediction based on the first initial bounding box to obtain a first predicted bounding box, and the discrimination sub-model generates a discrimination result set based on the real bounding box corresponding to the first initial bounding box and the first predicted bounding box corresponding to the first initial bounding box. The discrimination result set includes a first discrimination result and a second discrimination result: the first discrimination result represents the degree of similarity between the bounding box distributions of the first predicted bounding box and the real bounding box, and the second discrimination result represents the degree of coincidence between their bounding box coordinates. The regression loss value of the model to be trained is determined from the first discrimination result and the second discrimination result corresponding to each first initial bounding box, and the parameters of the generation sub-model and the discrimination sub-model are updated based on the regression loss value.
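The text does not specify how the second discrimination result measures the degree of coincidence of bounding box coordinates. One common coordinate-overlap measure is intersection over union (IoU); the Python sketch below computes it for two axis-aligned boxes, purely to illustrate what such a measure could look like, not as the discrimination sub-model described here. The `(x1, y1, x2, y2)` box format and the function name `iou` are assumptions.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) with x1 < x2, y1 < y2; assumed


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes, in [0, 1]."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)  # zero when the boxes do not overlap
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


print(iou((0, 0, 2, 2), (1, 1, 3, 3)))  # overlap area 1, union area 7
```

Two boxes can have similar size and aspect statistics yet a low IoU when one is shifted, which is the kind of position deviation the second discrimination result is meant to capture.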

In the model training stage, the target detection model training apparatus in the embodiments of the present application uses the real bounding boxes and the first initial bounding boxes to drive the model to be trained to continuously learn the bounding box distribution, so that the first predicted bounding boxes move closer to the real bounding boxes. This improves the accuracy with which the trained target detection model predicts bounding boxes at the locations of target objects in an image to be detected and also improves its generalization, which ensures detection accuracy on new images to be detected and improves the data transfer adaptability of the trained model. Moreover, the model to be trained includes a generation sub-model and a discrimination sub-model: the regression loss value of the model to be trained is determined from the discrimination result sets output by the discrimination sub-model, and the model parameters of both sub-models are iteratively updated over multiple rounds based on that regression loss value until the current model training result satisfies the preset model training end condition. The bounding box distribution is thus learned through multiple rounds of adversarial training between generation and discrimination, in which the discrimination sub-model judges whether the first predicted bounding box produced by the generation sub-model is realistic enough.
When the generated bounding box (that is, the first predicted bounding box) becomes hard to distinguish from the real bounding box, adjusting the model parameters according to the discrimination results of the discrimination sub-model still pushes the first predicted bounding box closer to the real bounding box, further improving the parameter update efficiency of the generation sub-model and the accuracy of bounding box distribution learning. Finally, the discrimination result set output by the discrimination sub-model includes not only the first discrimination result, which represents the similarity of the bounding box distributions, but also the second discrimination result, which represents the degree of coincidence of the bounding box coordinates; the latter compensates for the effect on the bounding box regression loss of cases where the distributions are similar but the specific positions deviate. The regression loss value obtained from the discrimination result set is therefore more accurate, which further improves the accuracy of the model parameters updated from it.

It should be noted that the embodiment of the target detection model training apparatus and the embodiment of the target detection model training method in this application are based on the same inventive concept. For the specific implementation of this embodiment, refer to the implementation of the corresponding target detection model training method described above; repeated details are omitted.

Corresponding to the target detection method described in Figures 5 to 6 and based on the same technical concept, the embodiments of the present application also provide a target detection apparatus. Figure 8 is a schematic diagram of the module composition of the target detection apparatus provided by the embodiments of the present application. The apparatus is used to perform the target detection method described in Figures 5 to 6. As shown in Figure 8, the apparatus includes:

The second bounding box acquisition module 802 is configured to acquire a third preset number of second initial bounding boxes; the second initial bounding boxes are obtained by extracting the target area of the image to be detected using a preset region of interest extraction model.

The target detection module 804 is configured to input the second initial bounding box into the target detection model for target detection, and obtain the second predicted bounding box and the second predicted category corresponding to each of the second initial bounding boxes;

The detection result generation module 806 is configured to generate a target detection result of the image to be detected based on the second predicted bounding box and the second predicted category corresponding to each second initial bounding box.

During target detection, the target detection device in the embodiment of the present application first uses a preset region of interest extraction model to extract multiple candidate bounding boxes, and then randomly samples a third preset number of them as the second initial bounding boxes. For each second initial bounding box, the generating sub-model performs bounding box prediction based on that box to obtain a second predicted bounding box, and the classification sub-model performs category prediction on the second predicted bounding box to obtain a second predicted category. The target detection result of the image to be detected is then generated from the second predicted bounding box and second predicted category corresponding to each second initial bounding box. During the parameter training of the generating sub-model, the model to be trained is driven to continuously learn the bounding box distribution from the real bounding boxes and the first initial bounding boxes, so that the first predicted bounding boxes move closer to the real bounding boxes; this improves the generalization and data transferability of the target detection model and hence the accuracy of bounding box prediction for the location of the target object in the image to be detected. In addition, the model to be trained includes a generating sub-model and a discriminant sub-model: the regression loss value is determined from the discrimination result sets output by the discriminant sub-model, and the model parameters are iteratively updated based on that value, which improves the parameter update efficiency of the generating sub-model. Because each discrimination result set contains results representing both the similarity of the bounding box distributions and the degree of coincidence of the bounding box coordinates, the resulting regression loss value is more accurate, which further improves the accuracy of the model parameters updated from it. This ensures that the generating sub-model can accurately predict bounding boxes on new images to be detected, thereby improving the accuracy of target detection performed on such images with the target detection model.
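The detection flow described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation of this application: boxes are assumed to be `(x1, y1, x2, y2)` tuples, and `generator` and `classifier` are hypothetical stand-ins for the trained generating sub-model and classification sub-model.

```python
import random
from typing import Callable, List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # assumed (x1, y1, x2, y2) layout


def detect(
    candidate_boxes: Sequence[Box],
    third_preset_number: int,
    generator: Callable[[Box], Box],   # hypothetical generating sub-model
    classifier: Callable[[Box], str],  # hypothetical classification sub-model
    seed: int = 0,
) -> List[Tuple[Box, str]]:
    """Sample second initial boxes, refine each one, then classify it."""
    rng = random.Random(seed)
    # Randomly sample the second initial bounding boxes from the ROI candidates.
    k = min(third_preset_number, len(candidate_boxes))
    second_initial_boxes = rng.sample(list(candidate_boxes), k)

    detection_result = []
    for initial_box in second_initial_boxes:
        predicted_box = generator(initial_box)          # second predicted bounding box
        predicted_category = classifier(predicted_box)  # second predicted category
        detection_result.append((predicted_box, predicted_category))
    return detection_result
```

For example, with a toy `generator` that shifts each box by one pixel and a `classifier` that labels boxes by width, `detect(candidates, 2, generator, classifier)` returns two `(box, category)` pairs; in a real system both callables would wrap the trained sub-models.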

It should be noted that the embodiments of the target detection device in this application and the embodiments of the target detection method in this application are based on the same inventive concept. Therefore, for the specific implementation of this embodiment, please refer to the implementation of the corresponding target detection method described above; repeated details are not described again.

Further, corresponding to the methods shown in Figures 1 to 6 above and based on the same technical concept, embodiments of the present application also provide a computer device for executing the above target detection model training method or target detection method, as shown in Figure 9.

The computer device may vary considerably depending on configuration or performance, and may include one or more processors 901 and a memory 902, where the memory 902 may store one or more application programs or data. The memory 902 may be transient storage or persistent storage. An application program stored in the memory 902 may include one or more modules (not shown), and each module may include a series of computer-executable instructions for the computer device. Furthermore, the processor 901 may be configured to communicate with the memory 902 and execute, on the computer device, the series of computer-executable instructions in the memory 902. The computer device may also include one or more power supplies 903, one or more wired or wireless network interfaces 904, one or more input/output interfaces 905, one or more keyboards 906, and so on.

The computer device includes a memory and one or more programs, where the one or more programs are stored in the memory and may include one or more modules, and each module may include a series of computer-executable instructions for the computer device. The one or more programs are configured to be executed by one or more processors and include computer-executable instructions for: obtaining a first preset number of first initial bounding boxes and obtaining the real bounding box corresponding to each first initial bounding box, where the first initial bounding boxes are obtained by extracting target areas from a sample image data set using a preset region of interest extraction model; and inputting the first initial bounding boxes and the real bounding boxes into the model to be trained for iterative model training until the current model training result satisfies a preset model training end condition, thereby obtaining a target detection model.

The model to be trained includes a generating sub-model and a discriminating sub-model. Each round of training in the iterative model training includes, for each first initial bounding box: the generating sub-model performs bounding box prediction based on the first initial bounding box to obtain a first predicted bounding box; the discriminating sub-model generates a discrimination result set based on the real bounding box and the first predicted bounding box corresponding to the first initial bounding box, where the discrimination result set includes a first discrimination result, representing the similarity of the bounding box distributions of the first predicted bounding box and the real bounding box, and a second discrimination result, representing the degree of coincidence of their bounding box coordinates; the regression loss value of the model to be trained is determined based on the first and second discrimination results corresponding to the first initial bounding boxes; and the parameters of the generating sub-model and the discriminating sub-model are updated based on the regression loss value.
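The iterate-until-done structure of the training described above might be organized as in the following Python sketch. The preset model training end condition is not specified here, so the sketch assumes a maximum number of rounds together with a regression loss threshold; `train_step` is a hypothetical callable standing in for one round of generation, discrimination, regression loss computation, and parameter update of both sub-models.

```python
from typing import Callable, List


def iterative_training(
    train_step: Callable[[int], float],
    max_rounds: int = 100,
    loss_threshold: float = 1e-3,
) -> List[float]:
    """Run training rounds until an assumed end condition is met.

    `train_step(round_idx)` is hypothetical: it should perform one round of
    bounding box prediction, discrimination, regression loss computation and
    parameter update, and return that round's regression loss value.
    """
    loss_history: List[float] = []
    for round_idx in range(max_rounds):
        regression_loss = train_step(round_idx)
        loss_history.append(regression_loss)
        # Assumed end condition: stop early once the loss is small enough;
        # otherwise the loop ends when the round budget is exhausted.
        if regression_loss < loss_threshold:
            break
    return loss_history
```

Keeping the end condition outside `train_step` makes it easy to swap in other criteria (for example, a validation metric) without touching the per-round logic.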

The computer device includes a memory and one or more programs, where the one or more programs are stored in the memory and may include one or more modules, and each module may include a series of computer-executable instructions for the computer device. The one or more programs are configured to be executed by one or more processors and include computer-executable instructions for: obtaining a third preset number of second initial bounding boxes, where the second initial bounding boxes are obtained by extracting target areas from the image to be detected using a preset region of interest extraction model; inputting the second initial bounding boxes into the target detection model for target detection to obtain the second predicted bounding box and second predicted category corresponding to each second initial bounding box; and generating a target detection result of the image to be detected based on the second predicted bounding boxes and second predicted categories.

In the model training phase, the computer device in the embodiment of the present application drives the model to be trained to continuously learn the bounding box distribution from the real bounding boxes and the first initial bounding boxes, so that the first predicted bounding boxes move closer to the real bounding boxes. This not only improves the accuracy with which the trained target detection model predicts bounding boxes for the location of the target object in the image to be detected, but also improves the generalization of the trained model, ensuring detection accuracy on new images to be detected and improving the model's adaptability to data migration. The model to be trained includes a generating sub-model and a discriminant sub-model; the regression loss value of the model to be trained is determined from the discrimination result sets output by the discriminant sub-model, and the parameters of both sub-models are then updated over multiple iterations based on that value until the current model training result satisfies the preset model training end condition. In other words, the bounding box distribution is learned through repeated rounds of generation and discrimination, in which the discriminant sub-model judges whether the first predicted bounding box produced by the generating sub-model is realistic enough. When the generated bounding box (i.e., the first predicted bounding box) is indistinguishable from the real bounding box, adjusting the model parameters based on the discriminant sub-model's results can still push the first predicted bounding box closer to the real bounding box, further improving the parameter update efficiency of the generating sub-model and the accuracy of bounding box distribution learning. Moreover, each discrimination result set includes not only a first discrimination result representing the similarity of the bounding box distributions but also a second discrimination result representing the degree of coincidence of the bounding box coordinates, which compensates for the regression loss that arises when the distributions are similar but the specific positions deviate; the regression loss value obtained from the discrimination result set is therefore more accurate, further improving the accuracy of the model parameters updated based on it. Correspondingly, during target detection, the preset region of interest extraction model first extracts multiple candidate bounding boxes, from which candidate bounding boxes are randomly sampled as the second initial bounding boxes. For each second initial bounding box, the generating sub-model performs bounding box prediction to obtain a second predicted bounding box, and the classification sub-model performs category prediction on it to obtain a second predicted category. The target detection result of the image to be detected is then generated from the second predicted bounding boxes and second predicted categories, which ensures that the generating sub-model also predicts bounding boxes accurately on new images and thereby improves the accuracy of target detection performed with the target detection model.

It should be noted that the embodiment of the computer device in this application and the embodiment of the target detection model training method in this application are based on the same inventive concept. Therefore, for the specific implementation of this embodiment, please refer to the implementation of the corresponding target detection model training method described above; repeated details are not described again.

Further, corresponding to the methods shown in Figures 1 to 6 above and based on the same technical concept, embodiments of the present application also provide a storage medium for storing computer-executable instructions. In a specific embodiment, the storage medium may be a USB flash drive, an optical disc, a hard disk, or the like. When the computer-executable instructions stored in the storage medium are executed by a processor, the following process can be implemented: obtaining first initial bounding boxes and the real bounding box corresponding to each first initial bounding box, where the first initial bounding boxes are obtained by extracting target areas from a sample image data set using a preset region of interest extraction model; and inputting the first initial bounding boxes and the real bounding boxes into the model to be trained for iterative model training until the current model training result satisfies a preset model training end condition, thereby obtaining the target detection model.

In another specific embodiment, the storage medium may be a USB flash drive, an optical disc, a hard disk, or the like, and when the computer-executable instructions stored in the storage medium are executed by a processor, the following process can be implemented: obtaining second initial bounding boxes, where the second initial bounding boxes are obtained by extracting target areas from the image to be detected using a preset region of interest extraction model; inputting the second initial bounding boxes into the target detection model for target detection to obtain the second predicted bounding box and second predicted category corresponding to each second initial bounding box; and generating a target detection result of the image to be detected based on the second predicted bounding boxes and second predicted categories.

When the computer-executable instructions stored in the storage medium in the embodiment of the present application are executed by a processor, the model to be trained is driven, in the model training phase, to continuously learn the bounding box distribution from the real bounding boxes and the first initial bounding boxes, so that the first predicted bounding boxes move closer to the real bounding boxes. This not only improves the accuracy with which the trained target detection model predicts bounding boxes for the location of the target object in the image to be detected, but also improves the generalization of the trained model, ensuring detection accuracy on new images to be detected and improving the model's adaptability to data migration. The model to be trained includes a generating sub-model and a discriminant sub-model; the regression loss value of the model to be trained is determined from the discrimination result sets output by the discriminant sub-model, and the parameters of both sub-models are updated over multiple iterations based on that value until the current model training result satisfies the preset model training end condition. In other words, the bounding box distribution is learned through repeated rounds of adversarial generation and discrimination, in which the discriminant sub-model judges whether the first predicted bounding box produced by the generating sub-model is realistic enough. When the generated bounding box (i.e., the first predicted bounding box) is difficult to distinguish from the real bounding box, adjusting the model parameters based on the discriminant sub-model's results can still push the first predicted bounding box closer to the real bounding box, further improving the parameter update efficiency of the generating sub-model and the accuracy of bounding box distribution learning. Moreover, each discrimination result set includes not only a first discrimination result characterizing the similarity of the bounding box distributions but also a second discrimination result representing the degree of coincidence of the bounding box coordinates, which compensates for the regression loss that arises when the distributions are similar but the specific positions deviate; the regression loss value obtained from the discrimination result set is therefore more accurate, further improving the accuracy of the model parameters updated based on it. Correspondingly, during target detection, the preset region of interest extraction model first extracts multiple candidate bounding boxes, from which candidate bounding boxes are randomly sampled as the second initial bounding boxes. For each second initial bounding box, the generating sub-model performs bounding box prediction to obtain a second predicted bounding box, and the classification sub-model performs category prediction on it to obtain a second predicted category. The target detection result of the image to be detected is then generated from the second predicted bounding boxes and second predicted categories, which ensures that the generating sub-model can accurately predict bounding boxes on new images and thereby improves the accuracy of target detection performed with the target detection model.

It should be noted that the embodiment of the storage medium in this application and the embodiment of the target detection model training method in this application are based on the same inventive concept. Therefore, for the specific implementation of this embodiment, please refer to the implementation of the corresponding target detection model training method described above; repeated details are not described again.

The above has described specific embodiments of the present application. Other embodiments are within the scope of the appended claims. In some cases, the actions or steps recited in the claims can be performed in a different order than in the embodiments and still achieve desired results. Additionally, the processes depicted in the figures do not necessarily require the specific order shown, or sequential order, to achieve desirable results. Multitasking and parallel processing are also possible or may be advantageous in certain implementations. Those skilled in the art should understand that embodiments of the present application may be provided as methods, systems or computer program products. Therefore, embodiments of the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment that combines software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-readable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.

The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each process and/or block in the flowchart illustrations and/or block diagrams, and combinations of processes and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general-purpose computer, special-purpose computer, embedded processor, or other programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce means for implementing the functions specified in one or more processes of the flowchart and/or one or more blocks of the block diagram.

These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means that implement the functions specified in one or more processes of the flowchart and/or one or more blocks of the block diagram. These computer program instructions may also be loaded onto a computer or other programmable data processing device, causing a series of operational steps to be performed on the computer or other programmable device to produce computer-implemented processing, such that the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more processes of the flowchart and/or one or more blocks of the block diagram.

In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory. Memory may include non-permanent storage in computer-readable media, random access memory (RAM), and/or non-volatile memory such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of computer-readable media. Computer-readable media include permanent and non-permanent, removable and non-removable media, and information storage may be implemented by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassette tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory media, such as modulated data signals and carrier waves. It should also be noted that the terms "comprise", "include", or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or device that includes a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element defined by the statement "comprises a..." does not exclude the presence of additional identical elements in the process, method, article, or device that includes that element.

Embodiments of the present application may be described in the general context of computer-executable instructions executed by a computer, such as program modules. Generally, program modules include routines, programs, objects, components, data structures, and the like that perform specific tasks or implement specific abstract data types. One or more embodiments of the present application may also be practiced in distributed computing environments where tasks are performed by remote processing devices connected through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media, including storage devices. The embodiments in this application are described in a progressive manner; for the same or similar parts, the embodiments may be referred to one another, and each embodiment focuses on its differences from the others. In particular, the system embodiment is basically similar to the method embodiment and is therefore described relatively briefly; for relevant details, refer to the corresponding description of the method embodiment. The above are only examples of this document and are not intended to limit it. Various modifications and variations of this document may occur to those skilled in the art. Any modifications, equivalent replacements, improvements, and the like made within the spirit and principles of this document shall fall within the scope of the claims of this document.

Claims

1. A target detection model training method, comprising:

Obtain a first initial bounding box, and obtain a real bounding box corresponding to the first initial bounding box; the first initial bounding box is obtained by extracting a target area from the sample image data set using a preset region of interest extraction model;

Input the first initial bounding box and the real bounding box into the model to be trained for iterative model training until the current model training results meet the preset model training end conditions to obtain a target detection model;

Wherein, the model to be trained includes a generating sub-model and a discriminating sub-model; each model training in the model iterative training includes:

The generating sub-model performs bounding box prediction based on the first initial bounding box to obtain a first predicted bounding box; the discriminating sub-model generates a discrimination result set based on the real bounding box and the first predicted bounding box corresponding to the first initial bounding box; the discrimination result set includes a first discrimination result and a second discrimination result, the first discrimination result representing the similarity of the bounding box distributions of the first predicted bounding box and the real bounding box, and the second discrimination result representing the degree of coincidence of the bounding box coordinates of the first predicted bounding box and the real bounding box;

Based on the first discrimination result and the second discrimination result corresponding to the first initial bounding box, determine the regression loss value of the model to be trained;

Parameter updates are performed on the generating sub-model and the discriminating sub-model based on the regression loss value.
2. The method according to claim 1, wherein obtaining the first initial bounding boxes includes:

Input the sample image data set into the preset region of interest extraction model for region of interest extraction to obtain a second preset number of candidate bounding boxes; the second preset number is greater than the first preset number, and the first preset number is the number of first initial bounding boxes;

Select the first preset number of candidate bounding boxes from the second preset number of candidate bounding boxes as the first initial bounding box.
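A minimal Python sketch of claim 2 follows. The claim only says that the first preset number of boxes is selected from the candidates; random sampling without replacement is assumed here because the description uses random sampling at detection time. The function name and the `(x1, y1, x2, y2)` box layout are illustrative assumptions.

```python
import random
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # assumed (x1, y1, x2, y2) layout


def select_first_initial_boxes(
    candidate_boxes: Sequence[Box], first_preset_number: int, seed: int = 0
) -> List[Box]:
    """Pick `first_preset_number` boxes out of the ROI candidate boxes."""
    second_preset_number = len(candidate_boxes)
    # Claim 2 requires the second preset number to exceed the first.
    if second_preset_number <= first_preset_number:
        raise ValueError("need more candidate boxes than first initial boxes")
    rng = random.Random(seed)
    # Random sampling without replacement is an assumption of this sketch.
    return rng.sample(list(candidate_boxes), first_preset_number)
```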
3. The method according to claim 1, wherein generating the discrimination result set based on the real bounding box corresponding to the first initial bounding box and the first predicted bounding box corresponding to the first initial bounding box includes:

Perform a bounding box authenticity judgment on the real bounding box and the first predicted bounding box corresponding to the first initial bounding box to obtain the first discrimination result; calculate a bounding box intersection-over-union (IoU) loss based on the real bounding box and the first predicted bounding box corresponding to the first initial bounding box to obtain the second discrimination result; and, based on the real bounding box and the first predicted bounding box corresponding to the first initial bounding box, calculate a regression loss compensation value used to constrain the loss gradient of the regression loss function of the model to be trained, to obtain a third discrimination result.
4. The method according to claim 3, wherein determining the regression loss value of the model to be trained based on the first discrimination result and the second discrimination result corresponding to the first initial bounding box includes:

Determine the sub-regression loss value corresponding to the first initial bounding box, where the sub-regression loss value is determined based on target information, and the target information includes one or a combination of the following: the similarity of the bounding box distributions represented by the first discrimination result corresponding to the first initial bounding box, the degree of coincidence of the bounding box coordinates represented by the second discrimination result, and the regression loss compensation value represented by the third discrimination result;

Based on the sub-regression loss value corresponding to the first initial bounding box, the regression loss value of the model to be trained is determined.
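Claim 4 leaves open how the three discrimination results are combined. The Python sketch below assumes one simple choice: each result is expressed as a loss-like value (lower is better), each sub-regression loss is a weighted sum of the three results (a zero weight drops a term, which covers the "one or a combination" wording), and the regression loss of the model to be trained is the mean of the sub-regression losses. The class name, field names, and weights are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class DiscriminationResultSet:
    """Hypothetical container for the results of one first initial bounding box."""
    distribution_similarity: float  # first discrimination result
    coordinate_coincidence: float   # second discrimination result (e.g. an IoU loss)
    loss_compensation: float        # third discrimination result


def regression_loss(
    result_sets: Sequence[DiscriminationResultSet],
    w1: float = 1.0,
    w2: float = 1.0,
    w3: float = 1.0,
) -> float:
    """Mean over boxes of an assumed weighted sum of the three results."""
    if not result_sets:
        raise ValueError("at least one discrimination result set is required")
    sub_losses = [
        w1 * r.distribution_similarity
        + w2 * r.coordinate_coincidence
        + w3 * r.loss_compensation
        for r in result_sets
    ]
    return sum(sub_losses) / len(sub_losses)
```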
5. The method according to claim 3, wherein performing the bounding box authenticity judgment on the real bounding box and the first predicted bounding box corresponding to the first initial bounding box to obtain the first discrimination result includes:

Based on the real bounding box corresponding to the first initial bounding box, determine a first discrimination probability that the real bounding box is predicted to be true by the discriminating sub-model; based on the first predicted bounding box corresponding to the first initial bounding box, determine a second discrimination probability that the first predicted bounding box is predicted to be fake by the discriminating sub-model;

Based on the first discrimination probability and the second discrimination probability, a first discrimination result is generated.
6. The method according to claim 5, wherein generating the first discrimination result based on the first discrimination probability and the second discrimination probability includes:

Determine a first weighted probability based on the first discrimination probability and a first prior probability of the real bounding box corresponding to the first initial bounding box; determine a second weighted probability based on the second discrimination probability and a second prior probability of the first predicted bounding box corresponding to the first initial bounding box;

Based on the first weighted probability and the second weighted probability, a first discrimination result is generated.
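Claims 5 and 6 do not fix how the weighted probabilities are formed or combined. One plausible reading, assumed in the Python sketch below and not confirmed by the source, is a prior-weighted version of the usual GAN discriminator loss: each discrimination probability becomes a prior-weighted log-probability, and the first discrimination result is the negated sum of the two. The prior values of 0.5 are placeholders.

```python
import math


def first_discrimination_result(
    p_real_judged_true: float,  # first discrimination probability
    p_pred_judged_fake: float,  # second discrimination probability
    prior_real: float = 0.5,    # first prior probability (placeholder)
    prior_pred: float = 0.5,    # second prior probability (placeholder)
    eps: float = 1e-12,
) -> float:
    """Assumed form: prior-weighted binary cross-entropy of the discriminator."""
    # Clamp to avoid log(0) when the discriminator is fully confident and wrong.
    first_weighted = prior_real * math.log(max(p_real_judged_true, eps))
    second_weighted = prior_pred * math.log(max(p_pred_judged_fake, eps))
    # Lower values mean the discriminating sub-model separates real boxes from
    # predicted boxes more confidently.
    return -(first_weighted + second_weighted)
```

With equal priors of 0.5 this reduces to half the standard discriminator cross-entropy, so a discriminator that outputs 0.5 for everything scores `ln 2`.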
7. The method according to claim 3, wherein calculating the bounding box intersection-over-union (IoU) loss based on the real bounding box and the first predicted bounding box corresponding to the first initial bounding box to obtain the second discrimination result includes:

Perform a bounding box intersection-over-union (IoU) loss calculation on the real bounding box and the first predicted bounding box corresponding to the first initial bounding box to obtain a first IoU loss;

Based on the first IoU loss, determine the second discrimination result corresponding to the first initial bounding box.
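A minimal sketch of the IoU loss step, assuming axis-aligned boxes in `(x1, y1, x2, y2)` corner format and the common definition loss = 1 - IoU; the box format and function names are assumptions, not taken from the claims:

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), assumed corner format


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def iou_loss(real: Box, pred: Box) -> float:
    """First IoU loss: 0 for a perfect match, 1 for disjoint boxes."""
    return 1.0 - iou(real, pred)
```

The second discrimination result could then be taken directly from `iou_loss(real, pred)`, so a smaller value indicates a higher coordinate coincidence degree.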
The method according to claim 3, wherein calculating, based on the real bounding box and the first predicted bounding box corresponding to the first initial bounding box, a regression loss compensation value for constraining the loss gradient of the regression loss function of the model to be trained comprises:

Based on the real bounding box corresponding to the first initial bounding box and the first predicted bounding box, generate a synthetic bounding box corresponding to the first initial bounding box;

Based on the similarity of the bounding box distribution between the synthetic bounding box corresponding to the first initial bounding box and the real bounding box, the regression loss compensation value is determined.
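The claims do not define the compensation value. Because it constrains the loss gradient and is evaluated at a synthetic box lying between the real and predicted boxes, a WGAN-GP style gradient penalty is one technique that fits; the sketch below swaps that technique in as an assumption. The critic `D(x) = tanh(w . x + b)` is a hypothetical toy chosen so its gradient has a closed form:

```python
import math
from typing import Sequence


def gradient_penalty(synthetic_box: Sequence[float], weights: Sequence[float],
                     bias: float = 0.0, lam: float = 10.0) -> float:
    """WGAN-GP style penalty lam * (||grad_x D(x)|| - 1)^2 at a synthetic box.

    Hypothetical toy critic D(x) = tanh(w . x + b), whose gradient is
    grad_x D(x) = (1 - tanh(z)^2) * w with z = w . x + b.
    """
    if len(synthetic_box) != len(weights):
        raise ValueError("box and weight vectors must have the same length")
    z = sum(w * x for w, x in zip(weights, synthetic_box)) + bias
    scale = 1.0 - math.tanh(z) ** 2
    grad_norm = math.sqrt(sum((scale * w) ** 2 for w in weights))
    # Penalise deviation of the critic's gradient norm from 1
    return lam * (grad_norm - 1.0) ** 2
```

In a real network the gradient would come from automatic differentiation rather than a closed form; `lam = 10.0` is the value commonly used with WGAN-GP, not one stated in this document.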
The method of claim 8, wherein generating a synthetic bounding box corresponding to the first initial bounding box based on a real bounding box corresponding to the first initial bounding box and a first predicted bounding box includes:

Determine a first subset of coordinate information based on the first sampling ratio and the first set of coordinate information of the real bounding box corresponding to the first initial bounding box;

Determine a second subset of coordinate information based on the second sampling ratio and the second coordinate information set of the first predicted bounding box corresponding to the first initial bounding box; the sum of the first sampling ratio and the second sampling ratio is equal to 1;

Based on the first subset of coordinate information and the second subset of coordinate information, a synthetic bounding box corresponding to the first initial bounding box is generated.
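The three steps above amount to a convex interpolation between the real and predicted boxes. A minimal sketch, assuming each coordinate subset is the coordinate set scaled by its sampling ratio and that the ratio is drawn uniformly when not supplied (both assumptions):

```python
import random
from typing import Optional, Sequence, Tuple


def synthetic_box(real: Sequence[float], pred: Sequence[float],
                  alpha: Optional[float] = None,
                  rng: Optional[random.Random] = None) -> Tuple[float, ...]:
    """Blend a real box and a predicted box with sampling ratios alpha and 1 - alpha."""
    if alpha is None:
        alpha = (rng or random).random()  # hypothetical: uniform first sampling ratio
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("sampling ratio must lie in [0, 1]")
    beta = 1.0 - alpha  # second sampling ratio, so alpha + beta == 1
    first_subset = [alpha * c for c in real]    # from the real box coordinates
    second_subset = [beta * c for c in pred]    # from the predicted box coordinates
    return tuple(f + s for f, s in zip(first_subset, second_subset))
```

For example, `synthetic_box((0, 0, 2, 2), (2, 2, 4, 4), alpha=0.5)` gives the midpoint box `(1.0, 1.0, 3.0, 3.0)`.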
The method according to claim 4, wherein the model to be trained further includes a classification sub-model, and each model training further includes: the classification sub-model performs classification processing on the first initial bounding box or the first predicted bounding box to obtain a first predicted category;

The target information further includes a matching relationship between the first predicted category corresponding to the first initial bounding box and the true category of the first initial bounding box, wherein if the first predicted category does not match the true category, the sub-regression loss value corresponding to the first initial bounding box is zero; and if the first predicted category matches the true category, the sub-regression loss value corresponding to the first initial bounding box is determined based on at least one of the first regression loss component corresponding to the bounding-box distribution similarity, the second regression loss component corresponding to the bounding-box coordinate coincidence degree, and the regression loss compensation value.
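A minimal sketch of the category gate, assuming the components are combined as a weighted sum; the weights are hypothetical, and setting one to zero drops that component, which matches the "at least one of" wording:

```python
from typing import Hashable, Tuple


def sub_regression_loss(pred_category: Hashable, true_category: Hashable,
                        dist_component: float, coord_component: float,
                        compensation: float,
                        weights: Tuple[float, float, float] = (1.0, 1.0, 1.0)) -> float:
    """Per-box sub-regression loss gated by the category match."""
    if pred_category != true_category:
        return 0.0  # mismatched category: box contributes nothing to regression
    w_dist, w_coord, w_comp = weights  # hypothetical weighting scheme
    return w_dist * dist_component + w_coord * coord_component + w_comp * compensation
```

Gating in this way keeps boxes whose category is wrong from pulling the box regressor toward an object of a different class.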
A target detection method, the method includes:

Obtain a second initial bounding box; the second initial bounding box is obtained by extracting the target area of the image to be detected using a preset region of interest extraction model;

Input the second initial bounding box into the target detection model for target detection to obtain a second predicted bounding box and a second predicted category corresponding to the second initial bounding box;

Based on the second predicted bounding box and the second predicted category corresponding to the second initial bounding box, a target detection result of the image to be detected is generated.
The method according to claim 11, wherein the target detection model includes a classification sub-model and a generation sub-model;

In the target detection process, the generation sub-model performs bounding box prediction based on the second initial bounding box to obtain a second predicted bounding box corresponding to the second initial bounding box; and the classification sub-model performs classification processing on the second initial bounding box or the second predicted bounding box to obtain a second predicted category corresponding to the second initial bounding box.
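The inference flow of claims 11 and 12 can be sketched as below. The `generate` and `classify` callables stand in for the trained sub-models and are hypothetical, as is dropping boxes labelled `"background"` when assembling the detection result:

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Box = Tuple[float, float, float, float]


@dataclass
class Detection:
    box: Box
    category: str


def detect(initial_boxes: Sequence[Box],
           generate: Callable[[Box], Box],
           classify: Callable[[Box], str],
           background: str = "background") -> List[Detection]:
    """Refine each ROI box, classify it, and collect the detection result."""
    results: List[Detection] = []
    for init in initial_boxes:
        pred_box = generate(init)      # generation sub-model: second predicted box
        category = classify(pred_box)  # classifier on the predicted box (claim also allows the initial box)
        if category != background:     # hypothetical filtering rule
            results.append(Detection(pred_box, category))
    return results
```

Here the classifier receives the refined box; passing `init` instead is the other option the claim allows.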
A target detection model training device, the device includes:

The first bounding box acquisition module is configured to acquire the first initial bounding boxes and the real bounding boxes respectively corresponding to the first initial bounding boxes; the first initial bounding boxes are obtained by extracting target regions from a sample image data set using a preset region of interest extraction model;

A model training module configured to input the first initial bounding box and the real bounding box into the model to be trained for iterative model training until the current model training result meets a preset model training end condition, to obtain a target detection model;

Wherein, the model to be trained includes a generating sub-model and a discriminating sub-model; each model training in the model iterative training includes:

The generating sub-model performs bounding box prediction based on the first initial bounding box to obtain a first predicted bounding box; the discriminating sub-model generates a discrimination result set based on the real bounding box and the first predicted bounding box corresponding to the first initial bounding box, the discrimination result set including a first discrimination result and a second discrimination result, wherein the first discrimination result represents the bounding-box distribution similarity between the first predicted bounding box and the real bounding box, and the second discrimination result represents the bounding-box coordinate coincidence degree between the first predicted bounding box and the real bounding box; a regression loss value of the model to be trained is determined based on the first discrimination result and the second discrimination result corresponding to the first initial bounding box; and parameters of the generating sub-model and the discriminating sub-model are updated based on the regression loss value.
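One way the discrimination result sets could be reduced to a single regression loss value is sketched below, assuming a non-saturating adversarial term `-log D(pred)` for distribution similarity plus the IoU loss for coordinate coincidence, averaged over boxes. The weights and input layout are hypothetical; the parameter update itself (backpropagation) is not shown:

```python
import math
from typing import Iterable, Tuple


def regression_loss_value(result_sets: Iterable[Tuple[float, float]],
                          adv_weight: float = 1.0, iou_weight: float = 1.0,
                          eps: float = 1e-12) -> float:
    """Average per-box loss from (D(pred) judged real, IoU loss) pairs."""
    terms = []
    for p_pred_real, iou_loss in result_sets:
        adv = -math.log(max(p_pred_real, eps))  # distribution-similarity term
        terms.append(adv_weight * adv + iou_weight * iou_loss)
    if not terms:
        raise ValueError("need at least one discrimination result set")
    return sum(terms) / len(terms)
```

When the discriminator is fully fooled (`D(pred) = 1`), only the IoU term remains, so the coordinate term keeps driving the boxes toward the ground truth.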
A target detection device, the device includes:

The second bounding box acquisition module is configured to acquire a second initial bounding box; the second initial bounding box is obtained by extracting the target area of the image to be detected using a preset region of interest extraction model;

A target detection module configured to input the second initial bounding box into a target detection model for target detection, and obtain a second predicted bounding box and a second predicted category corresponding to the second initial bounding box;

A detection result generation module is configured to generate a target detection result of the image to be detected based on the second predicted bounding box corresponding to the second initial bounding box and the second predicted category.
A computer device, the device includes:

processor; and

A memory arranged to store computer-executable instructions configured to be executed by the processor, the executable instructions including instructions for performing the steps of the method according to any one of claims 1-10 or any one of claims 11-12.
A storage medium for storing computer-executable instructions, the executable instructions causing a computer to execute the method according to any one of claims 1-10 or 11-12.