WO2024012138A1 - Target detection model training method and apparatus, and target detection method and apparatus - Google Patents

Info

Publication number
WO2024012138A1
WO2024012138A1 PCT/CN2023/100274 CN2023100274W WO2024012138A1 WO 2024012138 A1 WO2024012138 A1 WO 2024012138A1 CN 2023100274 W CN2023100274 W CN 2023100274W WO 2024012138 A1 WO2024012138 A1 WO 2024012138A1
Authority
WO
WIPO (PCT)
Prior art keywords
bounding box
model
initial
predicted
sub
Prior art date
Application number
PCT/CN2023/100274
Other languages
French (fr)
Chinese (zh)
Inventor
吕永春
朱徽
王钰
周迅溢
曾定衡
蒋宁
Original Assignee
马上消费金融股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 马上消费金融股份有限公司
Publication of WO2024012138A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/25 Determination of region of interest [ROI] or a volume of interest [VOI]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74 Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/75 Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches, e.g. multi-scale approaches; using context analysis; Selection of dictionaries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/766 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using regression, e.g. by projecting features on hyperplanes

Definitions

  • The present application relates to the field of target detection, and in particular to a target detection model training method, a target detection method, and corresponding devices.
  • The purpose of this application is to provide a target detection model training method, a target detection method, and corresponding devices.
  • This application provides a method for training a target detection model.
  • The method includes: obtaining a first initial bounding box, and obtaining the real bounding box corresponding to the first initial bounding box, where the first initial bounding box is obtained by extracting a target area from the sample image data set using a preset region of interest extraction model; and inputting the first initial bounding box and the real bounding box into the model to be trained for iterative model training until the current model training results meet the preset model training end condition, to obtain the target detection model. The model to be trained includes a generating sub-model and a discriminating sub-model. Each round of model training in the iterative training includes: the generating sub-model performs bounding box prediction based on the first initial bounding box to obtain a first predicted bounding box; the discriminating sub-model generates a set of discrimination results based on the real bounding box corresponding to the first initial bounding box and the first predicted bounding box corresponding to the first initial bounding box.
  • This application provides a target detection method.
  • The method includes: obtaining a second initial bounding box, where the second initial bounding box is obtained by extracting a target area from the image to be detected using a preset region of interest extraction model; inputting the second initial bounding box into the target detection model for target detection, to obtain the second predicted bounding box and the second predicted category corresponding to the second initial bounding box; and generating a target detection result for the image to be detected based on the second predicted bounding box and the second predicted category corresponding to the second initial bounding box.
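The detection flow described above can be sketched as follows. This is a minimal illustration, not the application's implementation: `extract_rois` and `detection_model` are hypothetical callables standing in for the preset region of interest extraction model and the trained target detection model, and the `(x1, y1, x2, y2)` box layout is an assumption.

```python
from typing import Any, Callable, Dict, Iterable, List, Tuple

# Assumed box layout (x1, y1, x2, y2); the source does not fix a representation.
Box = Tuple[float, float, float, float]


def detect_targets(
    image: Any,
    extract_rois: Callable[[Any], Iterable[Box]],
    detection_model: Callable[[Box], Tuple[Box, str]],
) -> List[Dict[str, Any]]:
    """Run the three steps: get initial boxes, predict, assemble the result."""
    detections = []
    # Step 1: second initial bounding boxes from the (hypothetical) ROI extractor.
    for initial_box in extract_rois(image):
        # Step 2: second predicted bounding box and second predicted category.
        predicted_box, predicted_category = detection_model(initial_box)
        # Step 3: collect per-box predictions into the detection result.
        detections.append({"box": predicted_box, "category": predicted_category})
    return detections
```

Any post-processing of the collected predictions (for example merging overlapping boxes) is left out because the passage above does not describe it.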
  • This application provides a target detection model training device.
  • The device includes: a first bounding box acquisition module configured to acquire a first initial bounding box and the real bounding box corresponding to the first initial bounding box, where the first initial bounding box is obtained by extracting a target area from the sample image data set using a preset region of interest extraction model; and a model training module configured to input the first initial bounding box and the real bounding box into the model to be trained for iterative model training until the current model training results meet the preset model training end condition, to obtain the target detection model. The model to be trained includes a generating sub-model and a discriminating sub-model. Each round of model training in the iterative training includes, for each first initial bounding box: the generating sub-model performs bounding box prediction based on the first initial bounding box to obtain a first predicted bounding box; the discriminating sub-model generates a set of discrimination results based on the real bounding box corresponding to the first initial bounding box and the first predicted bounding box corresponding to the first initial bounding box.
  • The first discrimination result represents the degree of bounding box distribution similarity between the first predicted bounding box and the real bounding box.
  • The second discrimination result represents the degree of bounding box coordinate coincidence between the first predicted bounding box and the real bounding box.
  • This application provides a target detection device.
  • The device includes: a second bounding box acquisition module configured to acquire a second initial bounding box, where the second initial bounding box is obtained by extracting a target area from the image to be detected using a preset region of interest extraction model; a target detection module configured to input the second initial bounding box into the target detection model for target detection, to obtain the second predicted bounding box and the second predicted category corresponding to the second initial bounding box; and a detection result generation module configured to generate a target detection result for the image to be detected based on the second predicted bounding box and the second predicted category corresponding to the second initial bounding box.
  • The present application provides a computer device, the device comprising: a processor; and a memory arranged to store computer-executable instructions, the executable instructions being configured to be executed by the processor and including steps for performing the methods described above.
  • Embodiments of the present application provide a storage medium, wherein the storage medium is used to store computer-executable instructions, and the executable instructions cause a computer to perform the steps of the above methods.
  • Figure 1 is a schematic flow chart of a target detection model training method provided by an embodiment of the present application;
  • Figure 2 is a schematic flow chart of each model training process in the target detection model training method provided by the embodiment of the present application;
  • Figure 3 is a schematic diagram of the first implementation principle of the target detection model training method provided by the embodiment of the present application.
  • Figure 4a is a schematic diagram of the second implementation principle of the target detection model training method provided by the embodiment of the present application.
  • Figure 4b is a schematic diagram of the third implementation principle of the target detection model training method provided by the embodiment of the present application.
  • Figure 5 is a schematic flow chart of the target detection method provided by the embodiment of the present application.
  • Figure 6 is a schematic diagram of the implementation principle of the target detection method provided by the embodiment of the present application.
  • Figure 7 is a schematic diagram of the module composition of the target detection model training device provided by the embodiment of the present application.
  • Figure 8 is a schematic diagram of the module composition of the target detection device provided by the embodiment of the present application.
  • Figure 9 is a schematic structural diagram of a computer device provided by an embodiment of the present application.
  • the model is prompted to learn the image features in the bounding box, continuously learn the similarity between the predicted bounding box and the image features in the real bounding box, and adjust the model parameters, so that the trained target detection model is more dependent on
  • the target detection model has poor generalization and poor cross-data migration capabilities.
  • This application prompts the model to be trained to continuously learn the bounding box distribution based on the real bounding box and the first initial bounding box, so that the first predicted bounding box moves closer to the real bounding box. This improves the accuracy with which the trained target detection model predicts the bounding box of the target object in the image to be detected, and also improves the generalization of the trained model, thereby ensuring target detection accuracy on new images to be detected and improving the model's adaptability to data migration. The model to be trained includes a generating sub-model and a discriminating sub-model. The regression loss value of the model to be trained is determined from the set of discrimination results output by the discriminating sub-model, and the model parameters of the generating sub-model and the discriminating sub-model are then updated over multiple rounds of iteration based on that regression loss value until the current model training results meet the preset model training end condition; that is, the bounding box distribution is learned continuously through multiple rounds of generation-discrimination confrontation.
  • Adjusting the model parameters based on the discrimination results of the discriminating sub-model further pushes the first predicted bounding box predicted by the generating sub-model toward the real bounding box, which further improves the parameter update efficiency and the bounding box distribution learning accuracy of the generating sub-model. It is also considered that if the model regression loss is determined, and the model parameters adjusted, only from the coarse-grained comparison dimension of bounding box distribution similarity, precise learning of the bounding box position cannot be taken into account; whereas if this is done only from the fine-grained comparison dimension of bounding box coordinate coincidence, the edge ambiguity of the bounding box cannot be taken into account.
  • The model regression loss is therefore determined by combining the coarse-grained comparison dimension of bounding box distribution similarity with the fine-grained comparison dimension of bounding box coordinate coincidence. That is, the set of discrimination results output by the discriminating sub-model includes not only the first discrimination result, which characterizes bounding box distribution similarity, but also the second discrimination result, which characterizes bounding box coordinate coincidence.
  • This accounts both for the regression loss caused by bounding boxes whose distribution is similar but whose specific position deviates, and for the regression loss caused by first predicted bounding boxes corresponding to real bounding boxes with ambiguous edges. The regression loss value obtained from the set of discrimination results is therefore more accurate, which further improves the accuracy of the model parameters updated based on that regression loss value.
  • Figure 1 is a first schematic flow chart of a target detection model training method provided by one or more embodiments of the present application.
  • The method in Figure 1 can be executed by an electronic device equipped with a target detection model training device.
  • The electronic device can be a terminal device or a designated server, wherein the hardware device used for target detection model training (i.e., the electronic device provided with the target detection model training device) and the hardware device used for target detection (i.e., the electronic device provided with the target detection device) may be the same or different.
  • The target detection model trained with the target detection model training method provided by the embodiment of the present application can be applied to any specific application scenario that requires target detection on an image to be detected.
  • For example, specific application scenario 1 performs target detection on images to be detected that are collected by image acquisition equipment at the entrance of a public place (such as the entrance of a shopping mall, a subway entrance, the entrance to an attraction, or the entrance to a performance venue).
  • For another example, specific application scenario 2 performs target detection on images to be detected that are collected by image acquisition equipment at each monitoring point in a breeding base.
  • The sample image data sets used in the training process of the target detection model differ between application scenarios.
  • For specific application scenario 1, the sample image data set can be historical sample images collected at the entrance of a designated public place within a preset historical time period. Correspondingly, the target object enclosed by the first initial bounding box is a target user entering the designated public place in a historical sample image, and the real category and the first predicted category can be the category to which the target user belongs, such as at least one of age group, gender, height, and occupation.
  • For specific application scenario 2, the sample image data set can be historical sample images collected at each monitoring point in the designated breeding base within the preset historical time period. Correspondingly, the target object enclosed by the first initial bounding box is a target breeding object in a historical sample image, and the real category and the first predicted category can be the category of the target breeding object, such as at least one of living physical condition and body size.
  • the training process of the target detection model includes at least the following steps:
  • The process of determining the first preset number of first initial bounding boxes may include, for each round of model training in the iterative training, extracting target areas from the sample image data set with the preset region of interest extraction model to obtain a first preset number of first initial bounding boxes. Alternatively, the preset region of interest extraction model may be used in advance to extract target areas from the sample image data set, and then, for each round of model training in the iterative training, a first preset number of first initial bounding boxes are randomly sampled from the large number of pre-extracted candidate bounding boxes.
  • The sample image data set may contain multiple sample target objects, and each sample target object may correspond to multiple first initial bounding boxes; that is, the first preset number of first initial bounding boxes includes at least one first initial bounding box corresponding to each sample target object.
  • The method further includes: inputting the sample image data set into the preset region of interest extraction model to perform region of interest extraction, obtaining a second preset number of candidate bounding boxes, wherein the second preset number is equal to or greater than the first preset number, and the first preset number is the number of first initial bounding boxes.
  • Where the second preset number is equal to the first preset number, for each round of model training in the iterative training, the preset region of interest extraction model is used to extract regions of interest from the multiple sample images in the sample image data set, obtaining a first preset number of first initial bounding boxes. Where the second preset number is greater than the first preset number, for each round of model training in the iterative training, a first preset number of first initial bounding boxes are randomly sampled from the second preset number of candidate bounding boxes.
  • Using a preset region of interest extraction model to extract N anchor boxes in advance, and then, in each round of model training, randomly sampling m of the N anchor boxes as the first initial bounding boxes and inputting them into the model to be trained, both keeps the amount of data processed in each round in check and lets the model learn the bounding box distribution better; that is, it promotes bounding box distribution learning while taking the data processing volume of the training process into account.
  • On this basis, the second preset number is greater than the first preset number, and the first preset number is the number of first initial bounding boxes.
  • Correspondingly, obtaining the first initial bounding box specifically includes: randomly selecting a first preset number of candidate bounding boxes from the second preset number of candidate bounding boxes as the first initial bounding boxes. That is, the preset region of interest extraction model is used in advance to extract regions of interest from the multiple sample images in the sample image data set, obtaining a second preset number of candidate bounding boxes; then, for each round of model training, a first preset number of first initial bounding boxes are randomly sampled from the second preset number of candidate bounding boxes.
  • A preferred implementation is to pre-extract N anchor boxes (i.e., a second preset number of candidate bounding boxes), then, for each round of model training, randomly sample m anchor boxes from the N anchor boxes (i.e., a first preset number of first initial bounding boxes), and then continue with step S104 below.
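The per-round sampling of m first initial bounding boxes from N pre-extracted anchor boxes can be sketched as below. This is an illustrative sketch only: the function name, the `(x1, y1, x2, y2)` box layout, and the optional `seed` parameter are assumptions, and sampling without replacement is one plausible reading of "randomly sample".

```python
import random
from typing import List, Optional, Sequence, Tuple

# Assumed box layout (x1, y1, x2, y2); not specified by the source.
Box = Tuple[float, float, float, float]


def sample_initial_boxes(
    candidates: Sequence[Box], m: int, seed: Optional[int] = None
) -> List[Box]:
    """Draw m first initial bounding boxes from the N candidate boxes.

    N (the second preset number) must be at least m (the first preset number).
    """
    n = len(candidates)
    if m > n:
        raise ValueError(f"cannot sample {m} boxes from only {n} candidates")
    rng = random.Random(seed)  # local generator, so the draw is reproducible
    return rng.sample(list(candidates), m)  # sampling without replacement
```

Calling the function once per training round with a different seed (or none) gives each round a fresh subset while the expensive region of interest extraction runs only once.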
  • S104: Input the above-mentioned first initial bounding box and the real bounding box into the model to be trained for iterative model training until the current model training results meet the preset model training end condition, to obtain the target detection model.
  • The above preset model training end condition may include: the current number of model training rounds equals the total number of training rounds, the model loss function converges, or a balance is reached between the generating sub-model and the discriminating sub-model.
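A check for the three end conditions might look like the sketch below. The thresholds, the function name, and in particular the way "balance" is measured are assumptions: here balance is taken to mean that the discriminating sub-model's probability that a predicted box is generated sits near 0.5, which is the usual equilibrium notion in generative adversarial training rather than something the text specifies.

```python
from typing import Sequence


def training_should_stop(
    round_idx: int,
    total_rounds: int,
    loss_history: Sequence[float],
    disc_prob_on_predicted: float,
    loss_tol: float = 1e-4,     # assumed convergence tolerance
    balance_tol: float = 0.05,  # assumed tolerance around 0.5
) -> bool:
    """Return True if any of the three preset end conditions holds."""
    # Condition 1: the current round count has reached the total.
    if round_idx >= total_rounds:
        return True
    # Condition 2: the loss has converged (change below the tolerance).
    if len(loss_history) >= 2 and abs(loss_history[-1] - loss_history[-2]) < loss_tol:
        return True
    # Condition 3: the discriminator can no longer tell predicted boxes apart.
    return abs(disc_prob_on_predicted - 0.5) < balance_tol
```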
  • Each round of model training in the above iterative training may include the following steps S1042 to S1046:
  • S1042: For each first initial bounding box, the generating sub-model performs bounding box prediction based on the first initial bounding box to obtain the first predicted bounding box; the discriminating sub-model generates a set of discrimination results based on the real bounding box corresponding to the first initial bounding box and the first predicted bounding box corresponding to the first initial bounding box. The set of discrimination results includes a first discrimination result and a second discrimination result: the first discrimination result represents the degree of bounding box distribution similarity between the first predicted bounding box and the real bounding box, and the second discrimination result represents the degree of bounding box coordinate coincidence between the first predicted bounding box and the real bounding box.
  • the KL divergence (Kullback-Leibler divergence)
  • The discriminating sub-model can determine whether the first predicted bounding box predicted by the generating sub-model is realistic enough, up to the point where the generated bounding box (i.e., the first predicted bounding box) is indistinguishable from the real bounding box. Because the discriminating sub-model is present, adjusting the model parameters based on its discrimination results further pushes the first predicted bounding box predicted by the generating sub-model toward the real bounding box.
  • The discriminating sub-model can be used to output the discriminant probabilities that the real bounding box corresponding to the first initial bounding box and the corresponding first predicted bounding box come from real data or from generated data, respectively.
  • The discriminant probability can characterize the distribution similarity between the real bounding box and the corresponding first predicted bounding box, so the first regression loss component, which corresponds to the discrimination dimension of bounding box distribution similarity, can be determined from the discriminant probability, thereby prompting the model to perform bounding box regression learning. Specifically, for the real bounding box and the first predicted bounding box corresponding to a given first initial bounding box, the discriminating sub-model determines the discriminant probability that the real bounding box comes from real data and the discriminant probability that the first predicted bounding box comes from generated data.
  • Because the degree of distribution similarity is determined from the discriminating sub-model's discriminant probabilities of whether the real bounding box and the first predicted bounding box come from real data or generated data, the first discrimination result can be generated from those discriminant probabilities. The first discrimination result can thus represent the degree of bounding box distribution similarity, and the first regression loss component corresponding to that discrimination dimension can then be determined from the discriminant probabilities in the first discrimination result.
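One way to turn the two discriminant probabilities into a loss term is the standard binary cross-entropy objective of a GAN discriminator, sketched below. The passage does not state the formula, so this choice is an assumption, as are the function and parameter names and the `eps` clamp.

```python
import math


def first_regression_loss_component(
    p_real_is_real: float,
    p_pred_is_generated: float,
    eps: float = 1e-12,
) -> float:
    """Cross-entropy over the two discriminant probabilities (assumed form).

    p_real_is_real: probability assigned to "the real box comes from real data".
    p_pred_is_generated: probability assigned to
        "the first predicted box comes from generated data".
    """
    # Clamp to avoid log(0) when the discriminator is fully confident.
    p_r = min(max(p_real_is_real, eps), 1.0 - eps)
    p_g = min(max(p_pred_is_generated, eps), 1.0 - eps)
    return -(math.log(p_r) + math.log(p_g))
```

Under this form the value is small when the discriminating sub-model separates real and predicted boxes confidently, and equals 2·ln 2 when both probabilities are 0.5, i.e. when the two kinds of box are indistinguishable.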
  • The intersection-over-union (IoU) loss between a given real bounding box and its corresponding first predicted bounding box alone can be used to obtain the target IoU loss. Alternatively, the target IoU loss can be determined by jointly considering the IoU loss between a given real bounding box and its corresponding first predicted bounding box and the IoU loss between that real bounding box and the first predicted bounding boxes corresponding to other real bounding boxes. Since the size of the target IoU loss can represent the degree of coordinate coincidence between the real bounding box and the corresponding first predicted bounding box, the second regression loss component, which corresponds to the discrimination dimension of bounding box coordinate coincidence, can be determined from the target IoU loss between the real bounding box and the first predicted bounding box.
  • The larger the target IoU loss, the larger the second regression loss component corresponding to the discrimination dimension of bounding box coordinate coincidence. Because the degree of coordinate coincidence between the first predicted bounding box corresponding to a given first initial bounding box and the corresponding real bounding box is determined from the target IoU loss between them, the second discrimination result can be generated from the IoU loss. The second discrimination result can thus represent the degree of bounding box coordinate coincidence, and the second regression loss component corresponding to that discrimination dimension can then be determined from the IoU loss in the second discrimination result.
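The simpler of the two options, the IoU loss between one real bounding box and its own first predicted bounding box, can be sketched as follows. The `(x1, y1, x2, y2)` corner layout and the `1 - IoU` form of the loss are assumptions (the latter is the common definition); the variant that also involves predicted boxes of other real boxes is not shown.

```python
from typing import Tuple

# Assumed corner layout (x1, y1, x2, y2) with x1 <= x2 and y1 <= y2.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def iou_loss(real_box: Box, predicted_box: Box) -> float:
    """IoU loss: 0 for perfectly coincident boxes, 1 for disjoint boxes."""
    return 1.0 - iou(real_box, predicted_box)
```

For instance, two 2x2 boxes offset by one unit in each direction overlap in a 1x1 square, giving an IoU of 1/7 and an IoU loss of 6/7.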
  • S1044: Determine the regression loss value of the model to be trained based on the first discrimination result and the second discrimination result in the discrimination result set corresponding to each of the first initial bounding boxes.
  • the sub-regression loss value corresponding to each first initial bounding box can be obtained.
  • the sub-regression loss value at least includes: the third sub-regression loss value considered from the perspective of the similarity of the bounding box distribution.
  • a set of discrimination results corresponding to an initial bounding box includes a first discrimination result, and correspondingly, based on the first regression loss component corresponding to the first discrimination result, a sub-regression loss value corresponding to the first initial bounding box is determined.
  • The gradient descent method is used to adjust the parameters of the generating sub-model and the discriminating sub-model based on the above regression loss value. Because each sub-regression loss value reflects at least the first regression loss component, which corresponds to the regression loss discrimination dimension of bounding box distribution similarity, and the second regression loss component, which corresponds to the regression loss discrimination dimension of bounding box coordinate coincidence, the regression loss value used to adjust the model parameters also reflects the regression loss components of both dimensions. The final trained target detection model can therefore ensure both that the probability distribution of the first predicted bounding box is closer to that of the real bounding box and that the coordinates of the first predicted bounding box and the real bounding box coincide more closely.
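How the per-box components might be combined into a single regression loss value is sketched below. The weighted sum per box and the mean over boxes are assumptions, as are the weight names `w_dist` and `w_coord`; the text only states that the regression loss value reflects both components of every sub-regression loss value.

```python
from typing import Sequence, Tuple


def regression_loss(
    components: Sequence[Tuple[float, float]],
    w_dist: float = 1.0,   # assumed weight of the distribution-similarity component
    w_coord: float = 1.0,  # assumed weight of the coordinate-coincidence component
) -> float:
    """Aggregate (first, second) loss components over all first initial boxes."""
    if not components:
        raise ValueError("at least one first initial bounding box is required")
    # Sub-regression loss value for each first initial bounding box.
    sub_losses = [w_dist * first + w_coord * second for first, second in components]
    # Regression loss value of the model to be trained (mean is an assumption).
    return sum(sub_losses) / len(sub_losses)
```

The returned scalar is what a gradient descent step would minimize with respect to the parameters of both sub-models.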
  • The discriminating sub-model tries to distinguish whether the real bounding box corresponding to the first initial bounding box and the corresponding first predicted bounding box come from real data or from generated data, so as to minimize the regression loss of the model to be trained. Meanwhile, in order to maximize the discrimination error of the discriminating sub-model, the generating sub-model is forced to continuously learn the bounding box distribution. This drives multiple rounds of adversarial learning between the generating sub-model and the discriminating sub-model, yielding a more accurate generating sub-model to serve as the target detection model.
  • A schematic diagram of the specific implementation principle of the training process of a target detection model is given, which specifically includes: obtaining a first preset number of first initial bounding boxes, and obtaining the real bounding box corresponding to each first initial bounding box; for each first initial bounding box, the above generating sub-model performs bounding box prediction based on the first initial bounding box to obtain the first predicted bounding box, and the above discriminating sub-model generates a set of discrimination results based on the real bounding box corresponding to the first initial bounding box and the first predicted bounding box corresponding to the first initial bounding box.
  • The model to be trained is prompted to continuously learn the bounding box distribution, so that the first predicted bounding box moves closer to the real bounding box. This improves the accuracy with which the trained target detection model predicts the bounding box of the target object in the image to be detected, and also improves the generalization of the trained model, thereby ensuring target detection accuracy on new images to be detected and improving the model's adaptability to data migration.
  • The model to be trained includes a generating sub-model and a discriminating sub-model. The regression loss value of the model to be trained is determined from the set of discrimination results output by the discriminating sub-model, and the model parameters of the generating sub-model and the discriminating sub-model are then updated over multiple rounds of iteration based on that regression loss value until the current model training results meet the preset model training end condition. That is, the bounding box distribution is learned continuously through multiple rounds of generation-discrimination confrontation, in which the discriminating sub-model can determine whether the first predicted bounding box predicted by the generating sub-model is realistic enough.
  • The set of discrimination results output by the discriminating sub-model includes not only the first discrimination result, which characterizes bounding box distribution similarity, but also the second discrimination result, which characterizes bounding box coordinate coincidence. This accounts for the regression loss caused by bounding boxes whose distribution is similar but whose specific position deviates, so the regression loss value obtained from the set of discrimination results is more accurate, which further improves the accuracy of the model parameters updated based on that regression loss value.
  • the discrimination result set also includes a third discrimination result; correspondingly, in S1042, generating the discrimination result set based on the real bounding box corresponding to the first initial bounding box and the first predicted bounding box corresponding to the first initial bounding box specifically includes: judging the authenticity of the real bounding box and the first predicted bounding box corresponding to the first initial bounding box to obtain the first discrimination result; calculating the bounding box intersection-over-union (IoU) loss based on the real bounding box and the first predicted bounding box corresponding to the first initial bounding box to obtain the second discrimination result; and calculating, based on the real bounding box and the first predicted bounding box corresponding to the first initial bounding box, a regression loss compensation value used to constrain the loss gradient of the regression loss function of the model to be trained, to obtain the third discrimination result.
  • the discrimination result set corresponding to the first initial bounding box includes not only the first discrimination result, obtained from the perspective of bounding box distribution similarity, and the second discrimination result, obtained from the perspective of bounding box coordinate coincidence degree, but also the regression loss compensation value used to constrain the gradient of the regression loss corresponding to the first discrimination dimension. This not only improves the accuracy of the regression loss value, but also solves the problem that the gradient of the regression loss corresponding to the first discrimination dimension suddenly decreases or even becomes zero.
  • a schematic diagram of the specific implementation principle of the training process of another target detection model including: using the preset region of interest extraction model to extract the target area from the sample image data set in advance to obtain N anchor boxes;
  • the sample image data set includes multiple original sample images, each original sample image includes at least one target object;
  • the feature information corresponding to each anchor box can include position information (x, y, w, h) and category information c, that is, (x, y, w, h, c); during model training, the parameter dimensions can be set to be independent of each other, so the iterative training of the model parameters for each dimension is also independent.
  • each target object in the sample image data set can correspond to one real bounding box.
  • the real bounding boxes corresponding to multiple first initial bounding boxes that contain the same target object can be the same; that is, the real bounding boxes are expanded according to the target object enclosed by each first initial bounding box, to obtain m real bounding boxes.
  • for example, the target object contained in a certain original sample image is cat A, and cat A corresponds to real bounding box A. If four first initial bounding boxes contain cat A (for example, the first initial bounding boxes with serial numbers 6, 7, 8, and 9), then real bounding box A is expanded into four real bounding boxes A (that is, the real bounding boxes with serial numbers 6, 7, 8, and 9).
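The expansion of real bounding boxes described above can be pictured with a minimal Python sketch. The names `expand_real_boxes`, `object_ids`, and `real_by_object`, as well as the coordinate values, are illustrative assumptions and do not come from the application; the sketch only shows how one real bounding box per target object is repeated so that every first initial bounding box has its own copy.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x, y, w, h), as in the feature information above


def expand_real_boxes(object_ids: List[str], real_by_object: Dict[str, Box]) -> List[Box]:
    """Return one real bounding box per first initial bounding box.

    object_ids[j] names the target object enclosed by the j-th first initial
    bounding box (hypothetical bookkeeping, not specified in the source).
    """
    return [real_by_object[obj] for obj in object_ids]


# Four first initial bounding boxes (e.g. serial numbers 6, 7, 8, 9) all enclose cat A.
object_ids = ["cat_A", "cat_A", "cat_A", "cat_A"]
real_by_object = {"cat_A": (10.0, 20.0, 50.0, 40.0)}  # made-up coordinates
expanded = expand_real_boxes(object_ids, real_by_object)
```

After the call, `expanded` holds four identical copies of real bounding box A, one for each first initial bounding box that encloses cat A.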
  • for each first initial bounding box, the generative sub-model performs bounding box prediction based on the first initial bounding box to obtain the first predicted bounding box; the discriminative sub-model generates a discrimination result set based on the real bounding box corresponding to the first initial bounding box and the corresponding first predicted bounding box.
  • each first initial bounding box corresponds to one real bounding box and one first predicted bounding box, and the first predicted bounding box is predicted by the generative sub-model through continuous bounding box regression learning; the generative sub-model outputs m first predicted bounding boxes.
  • the target object enclosed by the first predicted bounding boxes numbered 6, 7, 8, and 9 among the m first predicted bounding boxes is cat A.
  • a first regression loss component is determined based on the first discrimination result in the discrimination result set of the first initial bounding box, a second regression loss component is determined based on the second discrimination result in that set, and a third regression loss component is determined based on the third discrimination result in that set.
  • based on the first regression loss component, the second regression loss component, and the third regression loss component corresponding to each first initial bounding box, the regression loss value of the model to be trained is determined; based on the regression loss value, the model parameters of the generative sub-model and the discriminative sub-model are adjusted using stochastic gradient descent to obtain the updated generative sub-model and discriminative sub-model.
  • the updated generative sub-model is determined as the trained target detection model.
  • the updated generative sub-model and discriminative sub-model are determined as the model to be trained for the next round of model training, until the preset model training end condition is met.
  • in each round of model training, the model parameters of the discriminative sub-model can be adjusted once based on the discrimination result set, and the model parameters of the generative sub-model can be adjusted once based on the discrimination result set. In specific implementation, however, to improve the training accuracy of the model parameters of the generative sub-model, in each round of model training the model parameters of the discriminative sub-model are adjusted t times based on the discrimination result set, and the model parameters of the generative sub-model are then adjusted once based on the discrimination result set; the parameter-adjusted discriminative sub-model and generative sub-model serve as the model to be trained in the next round.
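The update schedule in the preceding paragraph (t discriminator updates followed by one generator update per round) can be sketched as below. This is only a scheduling skeleton under stated assumptions: `update_discriminator`, `update_generator`, `should_stop`, and `max_rounds` are hypothetical callbacks and parameters standing in for the actual parameter updates and the preset model training end condition, which the source does not spell out here.

```python
from typing import Callable, List


def train(update_discriminator: Callable[[], None],
          update_generator: Callable[[], None],
          t: int,
          should_stop: Callable[[int], bool],
          max_rounds: int) -> int:
    """Run rounds of training; each round updates the discriminator t times,
    then the generator once. Returns the number of rounds performed."""
    rounds = 0
    while rounds < max_rounds:
        for _ in range(t):
            update_discriminator()   # adjust discriminative sub-model parameters
        update_generator()           # adjust generative sub-model parameters once
        rounds += 1
        if should_stop(rounds):      # stand-in for the preset end condition
            break
    return rounds


# Record the order of updates instead of touching real model parameters.
log: List[str] = []
rounds_done = train(lambda: log.append("D"), lambda: log.append("G"),
                    t=3, should_stop=lambda r: r >= 2, max_rounds=10)
```

With t = 3 and a stop after two rounds, `log` records three discriminator updates and one generator update per round.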
  • the regression loss value of the model to be trained is jointly determined based on the sub-regression loss values corresponding to multiple first initial bounding boxes, and the sub-regression loss value corresponding to each first initial bounding box is jointly determined based on the multiple regression loss components.
  • in S1044, determining the regression loss value of the model to be trained based on the first discrimination result and the second discrimination result corresponding to each first initial bounding box specifically includes: determining the sub-regression loss value corresponding to each first initial bounding box, where the sub-regression loss value corresponding to each first initial bounding box is determined based on target information, and the target information includes one or a combination of the following: the bounding box distribution similarity represented by the first discrimination result corresponding to the first initial bounding box, the bounding box coordinate coincidence degree represented by the second discrimination result, and the regression loss compensation value represented by the third discrimination result; and determining the regression loss value of the model to be trained based on the sub-regression loss value corresponding to each first initial bounding box.
  • the first regression loss component corresponding to the first discrimination result may be considered on its own; or the first regression loss component corresponding to the first discrimination result and the second regression loss component corresponding to the second discrimination result may be considered together; or the first regression loss component corresponding to the first discrimination result, the second regression loss component corresponding to the second discrimination result, and the regression loss compensation value corresponding to the third discrimination result may all be considered together.
  • $\lambda_1$ represents the first weight coefficient corresponding to the first regression loss component under the first discrimination dimension
  • $V_{i1}$ represents the first regression loss component under the first discrimination dimension (that is, the bounding box distribution similarity represented by the first discrimination result).
  • $\lambda_2$ represents the second weight coefficient corresponding to the second regression loss component under the second discrimination dimension
  • $V_{i2}$ represents the second regression loss component under the second discrimination dimension
  • the first discriminant dimension may be a regression loss discriminant dimension based on the similarity of bounding box distributions, and the second discriminant dimension may be a regression loss discriminant dimension based on the coincidence degree of bounding box coordinates.
  • the first weight coefficient and the second weight coefficient may remain unchanged.
  • the different regression loss discrimination dimensions (that is, the dimension based on bounding box distribution similarity and the dimension based on bounding box coordinate coincidence degree) focus on different aspects of the regression loss. For example, the dimension based on bounding box distribution similarity focuses on the regression loss of first initial bounding boxes whose real bounding boxes have blurred edges, while the dimension based on bounding box coordinate coincidence degree focuses on the regression loss of first initial bounding boxes whose bounding box distribution is similar but whose specific position deviates.
  • the relative size of the first regression loss component and the second regression loss component reflects which regression loss discrimination dimension more accurately characterizes the regression loss between the real bounding box and the first predicted bounding box. Based on this, for each first initial bounding box, the first weight coefficient and the second weight coefficient are adjusted according to the relative size of the first regression loss component and the second regression loss component corresponding to that first initial bounding box: if the absolute value of the difference between the first regression loss component and the second regression loss component is not greater than the preset loss threshold, the first weight coefficient and the second weight coefficient remain unchanged; if the absolute value of the difference is greater than the preset loss threshold and the first regression loss component is greater than the second regression loss component, the first weight coefficient is increased according to the first preset adjustment method; if the absolute value of the difference is greater than the preset loss threshold and the first regression loss component is smaller than the second regression loss component, the second weight coefficient is increased according to the second preset adjustment method.
  • the increase of the first weight coefficient under the first preset adjustment method and the increase of the second weight coefficient under the second preset adjustment method may be the same or different; the size of the increase can be set according to actual needs, and this application does not limit it.
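The threshold rule for adjusting the two weight coefficients can be illustrated with a short Python sketch. The additive increments `step1` and `step2` are assumptions standing in for the first and second preset adjustment methods, which the application leaves open; the function name `adjust_weights` is likewise hypothetical.

```python
from typing import Tuple


def adjust_weights(v1: float, v2: float, w1: float, w2: float,
                   threshold: float, step1: float, step2: float) -> Tuple[float, float]:
    """Adjust the weight coefficients of the two regression loss components.

    v1, v2: first and second regression loss components of one first initial bounding box.
    w1, w2: current first and second weight coefficients.
    step1, step2: assumed additive increases (the source does not fix the adjustment method).
    """
    if abs(v1 - v2) <= threshold:
        return w1, w2              # difference not greater than the threshold: unchanged
    if v1 > v2:
        return w1 + step1, w2      # first component dominates: raise the first weight
    return w1, w2 + step2          # second component dominates: raise the second weight


unchanged = adjust_weights(0.5, 0.45, 1.0, 1.0, threshold=0.1, step1=0.5, step2=0.25)
raised_first = adjust_weights(0.9, 0.2, 1.0, 1.0, threshold=0.1, step1=0.5, step2=0.25)
raised_second = adjust_weights(0.2, 0.9, 1.0, 1.0, threshold=0.1, step1=0.5, step2=0.25)
```

A multiplicative increase or a schedule tied to the training round would fit the same rule equally well; the additive form is chosen here only because it is the simplest to read.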
  • judging the authenticity of the real bounding box and the first predicted bounding box corresponding to the first initial bounding box to obtain the first discrimination result specifically includes:
  • Step A1: Based on the real bounding box corresponding to the first initial bounding box, determine the first discrimination probability that the real bounding box is predicted to be real by the discriminative sub-model; and based on the first predicted bounding box corresponding to the first initial bounding box, determine the second discrimination probability that the first predicted bounding box is predicted to be generated data by the discriminative sub-model.
  • Step A2: Generate the first discrimination result corresponding to the first initial bounding box based on the first discrimination probability and the second discrimination probability corresponding to the first initial bounding box.
  • for each first initial bounding box, the discriminative sub-model determines the probability that the real bounding box corresponding to the first initial bounding box comes from real data; that is, the discriminative sub-model judges the authenticity of the real bounding box and obtains the first discrimination probability that the real bounding box is predicted to be real data. Similarly, for each first initial bounding box, the discriminative sub-model determines the probability that the first predicted bounding box corresponding to the first initial bounding box comes from generated data (that is, 1 minus the probability that the discriminative sub-model judges the first predicted bounding box to come from real data); that is, the discriminative sub-model judges the authenticity of the first predicted bounding box and obtains the second discrimination probability that the first predicted bounding box is predicted to be generated data.
  • the discriminative sub-model compares the first probability distribution corresponding to the real bounding box with the second probability distribution corresponding to the first predicted bounding box from the perspective of bounding box distribution similarity, thereby judging the authenticity of the real bounding box and the first predicted bounding box and obtaining the corresponding discrimination probabilities.
  • these discrimination probabilities can represent the distribution similarity between the real bounding box and the corresponding first predicted bounding box.
  • the first discrimination result can thus be obtained, and it can represent the bounding box distribution similarity; further, based on the first discrimination result, the first regression loss component corresponding to the discrimination dimension of bounding box distribution similarity can be determined. The greater the first discrimination probability and the second discrimination probability, the lower the distribution similarity between the real bounding box corresponding to the first initial bounding box and the corresponding first predicted bounding box, and the greater the first regression loss component corresponding to that first initial bounding box. The model parameters of the generative sub-model are then updated based on the first regression loss component, so that the outputs of the generative sub-model, once judged by the discriminative sub-model, optimize the loss value of the model to be trained; this optimizes the generative sub-model and improves its bounding box prediction.
  • in step A2, generating the first discrimination result based on the first discrimination probability and the second discrimination probability corresponding to the first initial bounding box specifically includes:
  • Step A21: Determine the first weighted probability based on the first discrimination probability and the first prior probability of the real bounding box corresponding to the first initial bounding box; and determine the second weighted probability based on the second discrimination probability and the second prior probability of the first initial bounding box.
  • Step A22: Generate the first discrimination result based on the first weighted probability and the second weighted probability corresponding to the first initial bounding box.
  • the first prior probability of the real bounding box and the second prior probability of the first initial bounding box are taken into account: the first discrimination probability and the second discrimination probability, obtained when the discriminative sub-model judges the authenticity of the real bounding box and of the first predicted bounding box respectively, are weighted to determine the first discrimination result (that is, the first discrimination result may include the first weighted probability and the second weighted probability). The first regression loss component related to bounding box distribution similarity, obtained based on the first discrimination result, can therefore be expressed as:
  • the second prior probability is the prior probability that the i-th first initial bounding box appears; since the first predicted bounding box is predicted by the generative sub-model from the first initial bounding box, it can also be regarded as the prior probability that the i-th first predicted bounding box appears.
  • the first prior probability and the second prior probability can be obtained in the following way:
  • $\sigma_1$ represents the variance of the distribution probability of the first preset number of real bounding boxes
  • $\sigma_2$ represents the variance of the distribution probability of the first preset number of first initial bounding boxes
  • the above regression loss value is equal to the sum of the sub-regression loss values corresponding to the first preset number of first initial bounding boxes. Specifically, it can be expressed as:
  • $N_{reg}$ represents the first preset number
  • $i$ represents the serial number of the first initial bounding box
  • $i$ ranges from 1 to $N_{reg}$.
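The summation described above, in which the regression loss value is the sum of the sub-regression loss values over the first preset number of first initial bounding boxes, can be sketched as follows. The weighted-sum form of each sub-regression loss value and the third weight `w3` are assumptions made for illustration; the source describes weight coefficients only for the first two components, and the function names are hypothetical.

```python
from typing import Iterable, Tuple


def sub_regression_loss(v1: float, v2: float, v3: float,
                        w1: float = 1.0, w2: float = 1.0, w3: float = 1.0) -> float:
    """Sub-regression loss value of one first initial bounding box.

    v1, v2: first and second regression loss components.
    v3: regression loss compensation value (third component).
    w3 is an assumed weight; the source only describes weights for v1 and v2.
    """
    return w1 * v1 + w2 * v2 + w3 * v3


def total_regression_loss(components: Iterable[Tuple[float, float, float]]) -> float:
    """Sum the sub-regression loss values over all first initial bounding boxes."""
    return sum(sub_regression_loss(v1, v2, v3) for v1, v2, v3 in components)


# Two first initial bounding boxes with made-up component values.
loss = total_regression_loss([(0.5, 0.25, 0.125), (1.0, 0.5, 0.0)])
```

With the made-up values above, the two sub-regression loss values are 0.875 and 1.5, so `loss` is 2.375.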
  • obtaining the second discrimination result specifically includes:
  • Step B1: Calculate the bounding box intersection-over-union (IoU) loss between the real bounding box corresponding to the first initial bounding box and the first predicted bounding box corresponding to the first initial bounding box, to obtain the first IoU loss.
  • Step B2: Based on the first IoU loss, determine the second discrimination result corresponding to the first initial bounding box.
  • the second discrimination result can be obtained from the intersection-over-union (IoU) loss between the real bounding box and the first predicted bounding box; based on the second discrimination result, the second regression loss component corresponding to the discrimination dimension of bounding box coordinate coincidence degree is determined, thereby prompting the model to perform bounding box regression learning.
  • the first IoU loss between the real bounding box and its corresponding first predicted bounding box can be considered.
  • to improve the accuracy of the second regression loss component corresponding to the discrimination dimension of bounding box coordinate coincidence degree, and thereby the accuracy of the regression loss value used to adjust the model parameters, the second IoU loss between the real bounding box and the other first predicted bounding boxes is considered in addition to the first IoU loss between the real bounding box and its own first predicted bounding box. In this way, the real bounding box is compared, on the discrimination dimension of bounding box coordinate coincidence degree, with the positive sample (that is, the first predicted bounding box corresponding to that real bounding box, learned through bounding box regression) and with the negative samples, so as to learn the specific position representation of the real bounding box, which prompts the model to perform bounding box regression learning better.
  • in step B2, determining the second discrimination result corresponding to the first initial bounding box based on the first intersection-over-union (IoU) loss specifically includes:
  • Step B21: Determine a comparison bounding box set from the first predicted bounding boxes respectively corresponding to the first preset number of first initial bounding boxes.
  • the comparison bounding box set includes the first predicted bounding boxes other than the one corresponding to the first initial bounding box, or the other first predicted bounding boxes that do not contain the target object enclosed by the first initial bounding box.
  • the comparison bounding box set may include the first predicted bounding boxes other than the first predicted bounding box with serial number i (that is, the first predicted bounding boxes with serial number k).
  • alternatively, the comparison bounding box set may include those first predicted bounding boxes, other than the one with serial number i, that do not contain the target object enclosed by the first initial bounding box with serial number i; that is, only the other first predicted bounding boxes whose target object differs from that of the first initial bounding box with serial number i are used as negative samples for the real bounding box with serial number i.
  • taking the first initial bounding box with serial number i as an example, for each other first predicted bounding box in the comparison bounding box set, the intersection-over-union (IoU) loss between the real bounding box with serial number i and the first predicted bounding box with serial number k is calculated to obtain the second IoU loss corresponding to the first predicted bounding box with serial number k.
  • the first IoU loss is calculated from the real bounding box with serial number i and the first predicted bounding box with serial number i, and the second IoU loss is calculated from the real bounding box with serial number i and the first predicted bounding box with serial number k (k ≠ p); together they determine the second discrimination result (that is, the second discrimination result can include the first IoU loss and the second IoU loss). Based on the second discrimination result, the second regression loss component related to bounding box coordinate coincidence degree can then be determined.
  • the model parameters can be adjusted based on the second regression loss component so that the real bounding box with serial number i coincides more closely with the coordinates of the first predicted bounding box with serial number i and less closely with those of the other first predicted bounding boxes, which makes the bounding box regression learning more global and further improves its accuracy.
  • the second regression loss component is the logarithm of the target intersection-over-union (IoU) loss.
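The first and second IoU losses, and the choice of negative samples for a given real bounding box, can be sketched in Python as below. Several points are assumptions rather than statements from the application: boxes are taken as (x, y, w, h) with (x, y) the top-left corner, the IoU loss is taken as 1 minus IoU, and `object_ids` is hypothetical bookkeeping of which target object each first initial bounding box encloses. How the target IoU loss is formed from these values before the logarithm is applied is not specified in this section, so the sketch stops at the two losses.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x, y, w, h); (x, y) assumed to be the top-left corner


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def iou_loss(a: Box, b: Box) -> float:
    """IoU loss, assumed here to be 1 - IoU."""
    return 1.0 - iou(a, b)


def iou_losses_for(i: int, real: List[Box], pred: List[Box],
                   object_ids: List[str]) -> Tuple[float, List[float]]:
    """Return (first IoU loss, second IoU losses) for serial number i.

    Negative samples are the predicted boxes k != i whose target object
    differs from that of box i (the second variant of the comparison set).
    """
    first = iou_loss(real[i], pred[i])
    negatives = [k for k in range(len(pred))
                 if k != i and object_ids[k] != object_ids[i]]
    second = [iou_loss(real[i], pred[k]) for k in negatives]
    return first, second


real_boxes = [(0.0, 0.0, 2.0, 2.0), (0.0, 0.0, 2.0, 2.0), (10.0, 10.0, 2.0, 2.0)]
pred_boxes = [(0.0, 0.0, 2.0, 2.0), (1.0, 0.0, 2.0, 2.0), (10.0, 10.0, 2.0, 2.0)]
ids = ["a", "a", "b"]
first_loss, second_losses = iou_losses_for(0, real_boxes, pred_boxes, ids)
```

In this toy data, box 1 encloses the same object as box 0 and is therefore excluded from the negatives, so only box 2 contributes a second IoU loss.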
  • calculating, based on the real bounding box and the first predicted bounding box corresponding to the first initial bounding box, the regression loss compensation value used to constrain the loss gradient of the regression loss function of the model to be trained specifically includes:
  • Step C1: Generate a synthetic bounding box corresponding to the first initial bounding box based on the real bounding box corresponding to the first initial bounding box and the first predicted bounding box.
  • the sampling coordinate information set is determined from the coordinate information sets; based on the sampling coordinate information set, the synthetic bounding box with serial number i is determined.
  • Step C2: Determine the regression loss compensation value based on the bounding box distribution similarity between the synthetic bounding box corresponding to the first initial bounding box and the real bounding box.
  • in step C1, generating the synthetic bounding box corresponding to the first initial bounding box based on the real bounding box and the first predicted bounding box corresponding to the first initial bounding box specifically includes:
  • Step C11: Determine the first coordinate information subset based on the first sampling ratio and the first coordinate information set of the real bounding box corresponding to the first initial bounding box.
  • Step C12: Determine the second coordinate information subset based on the second sampling ratio and the second coordinate information set of the first predicted bounding box corresponding to the first initial bounding box; the first sampling ratio and the second sampling ratio may be preset according to the actual situation, and their sum is equal to 1.
  • Step C13: Generate the synthetic bounding box corresponding to the first initial bounding box based on the first coordinate information subset and the second coordinate information subset.
  • the bounding box drawn from the sampling coordinate information set is the synthetic bounding box with serial number i; the synthetic bounding box is obtained by randomly sampling and mixing the coordinate information of the real bounding box with serial number i (that is, real data) and the coordinate information of the first predicted bounding box with serial number i (that is, generated data).
  • the synthetic bounding box is determined by both real data and generated data and has a certain degree of randomness, so that when the gradient of the regression loss corresponding to the first discrimination dimension suddenly decreases or even becomes zero, the gradient of the regression loss value is compensated. This avoids a sudden decrease in the gradient of the regression loss value caused by the gradient of the regression loss corresponding to the first discrimination dimension suddenly decreasing or even becoming zero during model training, thereby further improving the training accuracy of the model parameters.
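One possible reading of steps C11 to C13 is sketched below in Python. It assumes that the "sampling ratio" is the fraction of coordinates taken from each box, so that a ratio of 0.5 takes two of the four coordinates from the real bounding box and the remaining two from the first predicted bounding box. This interpretation, the function name `synthetic_box`, and the use of a seeded `random.Random` are assumptions, not details confirmed by the application.

```python
import random
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x, y, w, h)


def synthetic_box(real: Box, pred: Box, real_ratio: float, rng: random.Random) -> Box:
    """Mix coordinates of a real box and a predicted box by random sampling.

    real_ratio is the first sampling ratio; the second sampling ratio is
    1 - real_ratio, so the two always sum to 1.
    """
    if not 0.0 <= real_ratio <= 1.0:
        raise ValueError("real_ratio must lie in [0, 1]")
    n = len(real)
    n_real = round(real_ratio * n)                 # how many coordinates come from the real box
    real_idx = set(rng.sample(range(n), n_real))   # which coordinates come from the real box
    return tuple(real[j] if j in real_idx else pred[j] for j in range(n))


rng = random.Random(0)  # seeded only to make the example repeatable
mixed = synthetic_box((0.0, 0.0, 0.0, 0.0), (1.0, 1.0, 1.0, 1.0), 0.5, rng)
all_real = synthetic_box((0.0, 0.0, 0.0, 0.0), (1.0, 1.0, 1.0, 1.0), 1.0, rng)
all_pred = synthetic_box((0.0, 0.0, 0.0, 0.0), (1.0, 1.0, 1.0, 1.0), 0.0, rng)
```

With a ratio of 0.5, exactly two coordinates of `mixed` come from each box, whichever positions the random draw selects.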
  • the target detection model needs to determine not only the location of the target object but also its specific category. Therefore, during training of the target detection model, category recognition may have low accuracy for some first initial bounding boxes.
  • for a first initial bounding box whose category prediction has low accuracy, the corresponding first predicted bounding box may not truly reflect the bounding box prediction accuracy of the generative sub-model; consequently, the discrimination results produced by the discriminative sub-model for the first predicted bounding box and the real bounding box corresponding to such a first initial bounding box cannot truly reflect the bounding box prediction accuracy of the generative sub-model either.
  • the first predicted category corresponding to the first predicted bounding box is therefore taken into account: the sub-regression loss value of a first initial bounding box is considered only if the real category corresponding to the first predicted bounding box matches the first predicted category; otherwise only its sub-category loss value is considered. That is, the sub-regression loss values of first initial bounding boxes whose category prediction results do not meet the preset requirement are excluded.
  • the model to be trained also includes a classification sub-model, and each round of model training further includes: the classification sub-model performs classification on the first initial bounding box or the first predicted bounding box to obtain the first predicted category. In specific implementation, the classification sub-model performs category prediction on the first initial bounding box or the first predicted bounding box, and the output can be a first category prediction result, which includes the predicted probability that the target object enclosed by the first initial bounding box or the first predicted bounding box belongs to each candidate category.
  • the candidate category with the maximum predicted probability is the first predicted category; that is, the classification sub-model predicts the category of the target object enclosed by the first initial bounding box or the first predicted bounding box (in other words, the target object in the image area within that box) to be the first predicted category;
  • since the position information of the first initial bounding box and that of the first predicted bounding box will not deviate greatly, the image features within the first initial bounding box will not deviate greatly from the image features within the first predicted bounding box; therefore, recognition of the target object category in the image area within the bounding box is not affected.
  • the first predicted bounding box can be input into the classification sub-model for category prediction to obtain the corresponding first category prediction result; that is, the first predicted bounding box is first predicted from the first initial bounding box, and category prediction is then performed on the first predicted bounding box. Where bounding box prediction and category prediction are executed simultaneously, the first initial bounding box can instead be input into the classification sub-model to obtain the corresponding first category prediction result; that is, the first predicted bounding box is predicted from the first initial bounding box while category prediction is performed on the first initial bounding box.
  • For the training of the model parameters of the above classification sub-model, reference may be made to existing classification model training processes, which are not described again here.
  • The above target information also includes the matching relationship between the first predicted category corresponding to the first initial bounding box and the true category of the first initial bounding box. In determining the sub-regression loss value corresponding to each first initial bounding box: if the first predicted category corresponding to the first initial bounding box does not match the true category, the sub-regression loss value corresponding to that first initial bounding box is zero; if the first predicted category matches the true category, the sub-regression loss value is determined based on at least one of the first regression loss component corresponding to the bounding box distribution similarity, the second regression loss component corresponding to the bounding box coordinate coincidence degree, and the regression loss compensation value.
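  • The gating described above can be sketched as follows. This is a minimal illustration, not the applicant's implementation: the function names are hypothetical, and summing the supplied components is an assumption, since the text only says the value is determined from at least one of them.

```python
from typing import Iterable, Sequence


def sub_regression_loss(category_matches: bool, components: Sequence[float]) -> float:
    """Sub-regression loss for one first initial bounding box.

    Returns zero when the first predicted category does not match the true
    category. Otherwise combines the supplied loss components (for example
    the first and second regression loss components and the compensation
    value); plain summation is an assumption made for this sketch.
    """
    if not category_matches:
        return 0.0
    return float(sum(components))


def regression_loss(matches: Iterable[bool],
                    per_box_components: Iterable[Sequence[float]]) -> float:
    """Accumulate the sub-regression loss values over all first initial bounding boxes."""
    return sum(sub_regression_loss(m, c) for m, c in zip(matches, per_box_components))
```

  • With this gating, a box whose category is not yet recognized contributes nothing to the regression loss in that round.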
  • The preset category matching constraints used to determine whether the first predicted category corresponding to the first initial bounding box matches the true category may be related to the first category prediction result, and may specifically include constraints of a single matching method or constraints of a changing matching method. For the constraints of a single matching method, the category matching constraints used in every round of model training remain unchanged (that is, they are independent of the current round of model training); for example, in every round of model training, if the real category is the same as the first predicted category, it is determined that the first predicted category corresponding to the first initial bounding box matches the real category. For the constraints of a changing matching method, the category matching constraints used in each round of model training are related to the current number of model training rounds.
  • The constraints of the changing matching method can be divided into category matching staged constraints and category matching gradient constraints.
  • The category matching staged constraint may be: when the current number of model training rounds is less than a first preset number of rounds, the real category and the first predicted category belong to the same category group; when the current number of model training rounds is greater than or equal to the first preset number of rounds, the real category is the same as the first predicted category. That is, a staged category matching constraint can be realized based on the category matching staged constraint and the category prediction result corresponding to the first initial bounding box.
  • The category matching gradient constraint may be: the sum of a first constraint term and a second constraint term is greater than a preset probability threshold, where the first constraint term is the first predicted probability corresponding to the true category in the category prediction probability subset, and the second constraint term is the product of a preset adjustment factor and the sum of the second predicted probabilities (the probabilities in the subset other than the first predicted probability).
  • The preset adjustment factor gradually decreases as the current number of training rounds increases; that is, a gradual category matching constraint can be realized based on the category matching gradient constraint and the category prediction result corresponding to the first initial bounding box.
  • The category prediction probability subset is determined based on the category prediction result corresponding to the first initial bounding box. It includes the first predicted probability that the target object enclosed by the first predicted bounding box belongs to the real category, and the second predicted probabilities that the target object belongs to the non-real categories in the target group; that is, the subset consists of probabilities output by the classification sub-model when it classifies the first initial bounding box or the first predicted bounding box.
  • The target group is the category group in which the real category is located. In specific implementation, multiple candidate categories associated with the target detection task are predetermined and, based on the semantic information of each candidate category, divided into multiple category groups.
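  • The staged constraint and the notion of category groups can be illustrated with a short sketch. The function name, the `group_of` mapping and the example grouping of categories are hypothetical and chosen only for illustration.

```python
from typing import Dict


def staged_category_match(real: str, predicted: str, group_of: Dict[str, str],
                          current_round: int, first_preset_round: int) -> bool:
    """Staged category matching.

    Before the first preset round, a prediction counts as a match when it
    falls into the same category group as the real category; from that round
    on, it must be identical to the real category.
    """
    if current_round < first_preset_round:
        return group_of[real] == group_of[predicted]
    return real == predicted


# Hypothetical semantic grouping of candidate categories.
GROUP_OF = {"cat": "animal", "dog": "animal", "pedestrian": "person"}
```

  • Early in training, predicting "dog" for a real "cat" still counts as a match under this grouping; once the first preset round is reached, it no longer does.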
  • Because the first initial bounding box is obtained by extracting a region of interest with a preset region of interest extraction model, it may not delineate the region of the target object accurately enough, so that in the early stage of model training the category recognition for the first predicted bounding box corresponding to such a first initial bounding box is inaccurate. For this reason, when the sub-regression loss value corresponding to the first initial bounding box is determined, the matching relationship between the first predicted category corresponding to the first initial bounding box and the real category of the first initial bounding box is taken into account. This matching relationship is determined based on the above preset category matching constraints, which are used to decide whether the first predicted category corresponding to the first initial bounding box matches the real category.
  • The classification sub-model can be pre-trained, or its model parameters can be trained synchronously while the model parameters of the generation sub-model are being trained; that is, a classification loss value is determined based on the first predicted category and the true category, and the model parameters of the classification sub-model are iteratively trained based on the classification loss value. In the case of synchronous training, it must also be considered that in the early stage of model training the accuracy of the model parameters of the classification sub-model may be low, resulting in inaccurate category identification for the first predicted bounding box corresponding to the first initial bounding box.
  • Accordingly, the above preset category matching constraints may include the constraints of the changing matching method described above (such as category matching staged constraints or category matching gradient constraints), so that the preset category matching constraints gradually change from requiring the first predicted category to fall into the target group to requiring the first predicted category to be the same as the real category.
  • In one case, the above preset category matching constraints include category matching gradient constraints.
  • In the case where the preset category matching constraint is a category matching gradient constraint, still taking the first initial bounding box with serial number i as an example, the category matching gradient constraint is expressed in terms of the following quantities:
  • groups represents the target group;
  • real_i represents the real category, in the target group groups, of the first initial bounding box with serial number i;
  • f ∈ groups \ real_i represents a non-real category in the target group;
  • the remaining quantities are the preset adjustment factor, the second predicted probability, the second constraint term (the preset adjustment factor multiplied by the sum of the second predicted probabilities), and the preset probability threshold.
  • The sum of the first constraint term and the second constraint term is compared with the preset probability threshold to determine whether the first predicted category matches the real category. After the current number of model training rounds reaches a certain number, the second constraint term becomes zero; from then on, when the first predicted probability is greater than the preset probability threshold, the classification sub-model is considered to have determined the true category as the first predicted category.
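  • The verbal definition of the gradient constraint (first constraint term plus second constraint term greater than the preset probability threshold) can be written compactly as below. The symbols p_i(·), β and μ are notation assumed here for illustration; they stand for the probability output by the classification sub-model for the bounding box with serial number i, the preset adjustment factor, and the preset probability threshold, respectively.

```latex
p_i(\mathrm{real}_i) \;+\; \beta \sum_{f \,\in\, \mathrm{groups} \setminus \mathrm{real}_i} p_i(f) \;>\; \mu
```

  • When β is zero the condition reduces to p_i(real_i) > μ, which is the case described for the later training rounds.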
  • The above preset adjustment factor decreases as the current number of model training rounds increases. If the current number of model training rounds is less than or equal to the target number of training rounds, the second constraint term is positively related to the preset adjustment factor, and the preset adjustment factor is negatively related to the current number of model training rounds; if the current number of model training rounds is greater than the target number of training rounds, the second constraint term is zero. The target number of training rounds is less than the total number of training rounds.
  • A linearly decreasing adjustment method can be used to gradually reduce the value of the preset adjustment factor. The preset adjustment factor used in the current round of model training is therefore determined as follows:
  • the first preset value can be set according to actual needs.
  • the above category matching gradient constraints can be:
  • the decreasing formula corresponding to the above factor decreasing adjustment method can be:
  • In the decreasing formula, the leading term 1 represents the first preset value (i.e., the preset adjustment factor used in the first round of training) and Z represents the target number of training rounds; the formula also depends on the current number of model training rounds. The target number of training rounds can be the total number of training rounds minus 1, or a specified number of training rounds that is less than the total number of training rounds, in which case the difference between the total number of training rounds and the specified number of training rounds is the preset number of rounds Q.
  • When the preset adjustment factor reaches 0, as in the last round of model training, the second constraint term vanishes and the judgment condition reduces to comparing the first constraint term with the preset probability threshold.
  • The decreasing formula above is only a relatively simple linear adjustment method. In actual application, the rate at which the preset adjustment factor decreases can be set according to actual needs; therefore, the above decreasing formula does not constitute a limitation on the scope of protection of this application.
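  • A linearly decreasing adjustment factor and the resulting gradient constraint check can be sketched as follows. The exact formula is not recoverable from the text here, so this sketch assumes zero-based round numbering and a factor of first_preset_value × (1 − round / Z), clamped to zero once the target number of rounds Z is reached; all names are hypothetical.

```python
from typing import Iterable, Mapping


def adjustment_factor(current_round: int, target_rounds: int,
                      first_preset_value: float = 1.0) -> float:
    """Linearly decreasing preset adjustment factor (assumed form).

    Rounds are counted from 0, so the first round uses first_preset_value and
    the factor is 0 from round `target_rounds` onward.
    """
    if target_rounds <= 0:
        raise ValueError("target_rounds must be positive")
    if current_round >= target_rounds:
        return 0.0
    return first_preset_value * (1.0 - current_round / target_rounds)


def gradient_category_match(probs: Mapping[str, float], real: str,
                            target_group: Iterable[str], factor: float,
                            threshold: float) -> bool:
    """Check: first constraint term + second constraint term > threshold."""
    first_term = probs[real]
    # Sum of the predicted probabilities of the non-real categories in the target group.
    second_term = factor * sum(probs[c] for c in target_group if c != real)
    return first_term + second_term > threshold
```

  • With a large factor, probability mass assigned to other categories of the same group still helps satisfy the constraint; as the factor shrinks, only the probability of the real category counts.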
  • The above-mentioned model to be trained includes a generation sub-model, a discriminant sub-model and a classification sub-model. Figure 4b provides a schematic diagram of the specific implementation principle of the training process of another target detection model, including:
  • the generation sub-model performs bounding box prediction based on the first initial bounding box to obtain the first predicted bounding box;
  • the discriminant sub-model generates a set of discrimination results based on the real bounding box corresponding to the first initial bounding box and the corresponding first predicted bounding box;
  • the classification sub-model performs category prediction on the first predicted bounding box to obtain the category prediction result; the category matching result is determined according to the preset category matching constraints, the true category of the real bounding box corresponding to the first initial bounding box, and the category prediction result of the first predicted bounding box corresponding to the first initial bounding box; if the category matching result indicates that the first predicted category and the true category do not satisfy the preset category matching constraints, the sub-regression loss value corresponding to the first initial bounding box is zero; if the category matching result indicates that the first predicted category and the true category satisfy the preset category matching constraints, the sub-regression loss value is determined based on the discrimination results in the set of discrimination results of the first initial bounding box.
  • The determination of the above category matching result may be performed by a separate processing module, or by the discriminant sub-model.
  • Having the discriminant sub-model generate the set of discrimination results from the real bounding box corresponding to the first initial bounding box and the corresponding first predicted bounding box only after the category matching result is known can further improve model training efficiency. With reference to Figure 4b, the real category corresponding to each real bounding box and the category prediction result corresponding to each first predicted bounding box are input into the discriminant sub-model. The discriminant sub-model determines the category matching result based on the true category of the real bounding box corresponding to the first initial bounding box and the category prediction result of the first predicted bounding box corresponding to the first initial bounding box.
  • If the category matching result indicates that the first predicted category and the true category do not satisfy the preset category matching constraints, the set of discrimination results is empty or preset information, so the sub-regression loss value determined based on the set of discrimination results is zero. If the category matching result indicates that the first predicted category and the true category satisfy the preset category matching constraints, a set of discrimination results is generated based on the real bounding box corresponding to the first initial bounding box and the corresponding first predicted bounding box; the sub-regression loss value is then determined from the first regression loss component corresponding to the first discrimination result, the second regression loss component corresponding to the second discrimination result, and the third regression loss component corresponding to the third discrimination result in the set.
  • In other words, two orders of execution are possible. The set of discrimination results can be generated directly based on the real bounding box corresponding to the first initial bounding box and the corresponding first predicted bounding box, and the category prediction result can then be used to determine the matching relationship between the first predicted category and the real category (that is, whether the category matching result indicates that the preset category matching constraints are satisfied); if the categories do not match, the corresponding sub-regression loss value is determined to be zero, and if they match, the corresponding sub-regression loss value is determined based on the multiple discrimination results in the set.
  • Alternatively, the matching relationship between the first predicted category and the real category can be determined first based on the category prediction result. If the categories do not match, the corresponding set of discrimination results is determined to be empty or preset information, and the corresponding sub-regression loss value is determined to be zero. If the categories match, a set of discrimination results is generated based on the real bounding box corresponding to the first initial bounding box and the corresponding first predicted bounding box, and the corresponding sub-regression loss value is determined based on the multiple discrimination results in the set.
  • If the current model training results meet the preset model training end conditions, the updated generation sub-model is determined as the trained target detection model; if the current model training results do not meet the preset model training end conditions, the updated generation sub-model and discriminant sub-model are determined as the model to be trained for the next round of model training, until the preset model training end conditions are met.
  • The target detection model training method in the embodiment of the present application uses the real bounding box and the first initial bounding box to prompt the model to be trained to continuously learn the bounding box distribution, so that the first predicted bounding box becomes closer to the real bounding box. This not only improves the accuracy with which the trained target detection model predicts the bounding box of the target object's location in the image to be detected, but also improves the generalization of the trained target detection model, thereby ensuring the target detection accuracy for new images to be detected and improving the data migration adaptability of the trained target detection model.
  • In addition, the model to be trained includes a generation sub-model and a discriminant sub-model, and the regression loss value is determined based on the set of discrimination results output by the discriminant sub-model. Adjusting the model parameters based on the discrimination results of the discriminant sub-model further pushes the first predicted bounding box predicted by the generation sub-model closer to the real bounding box, thereby further improving bounding box prediction accuracy.
  • The second discrimination result compensates for the bounding box regression loss that arises when the bounding box distributions are similar but the specific positions deviate. This makes the regression loss value obtained from the set of discrimination results more accurate, which further improves the accuracy of the model parameters updated based on the regression loss value.
  • FIG. 5 is a schematic flow chart of the target detection method provided by the embodiment of the present application.
  • The method in Figure 5 can be executed by an electronic device provided with a target detection device, which may be a terminal device or a designated server. The hardware device used for target detection (i.e., the electronic device provided with the target detection device) and the hardware device used for target detection model training (i.e., the electronic device provided with the target detection model training device) may be the same or different.
  • The method includes at least the following steps:
  • S502 Obtain a third preset number of second initial bounding boxes; wherein the second initial bounding boxes are obtained by extracting the target area of the image to be detected using a preset region of interest extraction model.
  • For the process of obtaining the third preset number of second initial bounding boxes, refer to the above process of obtaining the first preset number of first initial bounding boxes, which is not described again here.
  • The above target detection model includes a classification sub-model and a generation sub-model. During target detection, for each second initial bounding box: the generation sub-model performs bounding box prediction based on the second initial bounding box to obtain the second predicted bounding box corresponding to the second initial bounding box; the classification sub-model performs classification processing on the second initial bounding box or the second predicted bounding box to obtain a second predicted category corresponding to the second initial bounding box.
  • The classification sub-model performs category prediction on the second initial bounding box or the second predicted bounding box and outputs a second category prediction result, which includes the predicted probability that the target object enclosed by the second initial bounding box or the second predicted bounding box belongs to each candidate category. The candidate category with the maximum predicted probability is the second predicted category; that is, the classification sub-model predicts the category of the target object in the image area within the second initial bounding box or the second predicted bounding box to be the second predicted category.
  • In addition, in specific implementation, the position information of the second initial bounding box and the second predicted bounding box will not deviate greatly, so the image features within the second initial bounding box will not deviate greatly from those within the second predicted bounding box, and recognition of the target object category in the image area within the bounding box is not affected. Therefore, when bounding box prediction and category prediction are executed sequentially, the second predicted bounding box can be input into the classification sub-model for category prediction to obtain the corresponding second category prediction result; that is, the second predicted bounding box is first obtained by prediction based on the second initial bounding box, and category prediction is then performed on the second predicted bounding box. When bounding box prediction and category prediction are executed simultaneously, the second initial bounding box can instead be input into the classification sub-model for category prediction; that is, the second predicted bounding box is obtained by prediction based on the second initial bounding box while category prediction is performed on the second initial bounding box to obtain the second category prediction result.
  • S506 Generate a target detection result of the image to be detected based on the second predicted bounding box and the second predicted category corresponding to each second initial bounding box;
  • Based on the target detection result, the number of target objects contained in the image to be detected and the category to which each target object belongs can be determined.
  • For example, the image to be detected contains a cat, a dog and a pedestrian.
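  • Deriving the number of target objects and their categories from a detection result can be sketched as below. The representation of a detection result as a list of (bounding box, category) pairs is an assumption made for this example.

```python
from collections import Counter
from typing import Iterable, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), an assumed layout


def summarize_detections(detections: Iterable[Tuple[Box, str]]) -> Counter:
    """Count detected target objects per predicted category."""
    return Counter(category for _box, category in detections)
```

  • The total number of target objects is the sum of the per-category counts.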
  • The above target detection model includes a generation sub-model and a classification sub-model.
  • A schematic diagram of the specific implementation principle of the target detection process is given, which specifically includes: using the preset region of interest extraction model to extract target regions from the image to be detected, obtaining P anchor boxes; randomly sampling n anchor boxes from the P anchor boxes as the second initial bounding boxes; for each second initial bounding box, the generation sub-model performs bounding box prediction based on the second initial bounding box to obtain the second predicted bounding box, and the classification sub-model performs category prediction on the second predicted bounding box to obtain the second predicted category.
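  • The detection flow above (sample n of the P anchor boxes, predict a box, classify it) can be sketched as follows. The `generate` and `classify` callables stand in for the generation and classification sub-models; their signatures, the box layout and the function name are assumptions made for illustration.

```python
import random
from typing import Callable, List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), an assumed layout


def detect(anchor_boxes: Sequence[Box], n: int,
           generate: Callable[[Box], Box],
           classify: Callable[[Box], Sequence[float]],
           categories: Sequence[str],
           rng: random.Random) -> List[Tuple[Box, str]]:
    """Sample n second initial bounding boxes, then predict a box and a category for each."""
    sampled = rng.sample(list(anchor_boxes), n)  # random sampling without replacement
    results = []
    for initial_box in sampled:
        predicted_box = generate(initial_box)    # second predicted bounding box
        probs = classify(predicted_box)          # one probability per candidate category
        best = max(range(len(categories)), key=lambda k: probs[k])
        results.append((predicted_box, categories[best]))  # category with maximum probability
    return results
```

  • This follows the sequential variant, in which the classifier sees the predicted box; passing `initial_box` to `classify` instead would correspond to the simultaneous variant.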
  • The target detection model trained with the above target detection model training method can be applied to any specific application scenario that requires target detection on an image to be detected, where the image to be detected can be collected by an image acquisition device set at a certain on-site location.
  • The target detection device can belong to the image acquisition device, and specifically can be an image processing unit in the image acquisition device; the image processing unit receives the image to be detected transmitted by the image capture unit in the image acquisition device and performs target detection on it. The target detection device can also be a separate device independent of the image acquisition device; in that case, it receives the image to be detected from the image acquisition device and performs target detection on it.
  • the image to be detected can be collected by an image collection device installed at the entrance of a certain public place (such as a shopping mall entrance, a subway entrance, an entrance to a scenic spot, or an entrance to a performance site, etc.).
  • In this scenario, the target object to be detected in the image to be detected is a target user who enters the public place.
  • The above target detection model is used to perform target detection on the image to be detected, so as to delineate the second predicted bounding box of each target user entering the public place and determine the second predicted category corresponding to the second predicted bounding box (that is, the category of the target user enclosed by the second predicted bounding box, such as at least one of age group, gender, height and occupation).
  • A user group identification result (such as the flow of people entering the public place, or the attributes of the user group entering the public place) is determined based on the target detection result, and corresponding business processing (such as automatically triggering an admission restriction prompt, or pushing information to target users) is then performed based on the user group identification result. The higher the accuracy of the model parameters of the target detection model, the higher the accuracy of the target detection result it outputs for the image to be detected, and therefore the higher the accuracy of the business processing triggered based on the target detection result.
  • the image to be detected can be collected by image acquisition equipment installed at each monitoring point in a certain breeding base.
  • In this scenario, the target object to be detected in the image to be detected is a target breeding object at the breeding monitoring point.
  • The above target detection model is used to delineate the second predicted bounding box of each target breeding object and determine the second predicted category corresponding to the second predicted bounding box (that is, the category of the target breeding object enclosed by the second predicted bounding box, such as at least one of living status and body size).
  • A breeding object group identification result (such as the survival rate or the growth rate of the target breeding objects at the breeding monitoring point) is determined based on the target detection result, and corresponding control operations are then performed based on that identification result (for example, an alarm message is automatically issued if a decrease in survival rate is detected, or the feeding amount or frequency is automatically increased if a slowdown in growth rate is detected). The higher the accuracy of the model parameters of the target detection model, the higher the accuracy of the target detection result it outputs for the image to be detected, and therefore the higher the accuracy of the control operations triggered based on the target detection result.
  • In the target detection method of the embodiment of the present application, a preset region of interest extraction model is first used to extract multiple candidate bounding boxes, and a third preset number of candidate bounding boxes are then randomly sampled from them as the second initial bounding boxes. For each second initial bounding box, the generation sub-model performs bounding box prediction based on the second initial bounding box to obtain the second predicted bounding box, and the classification sub-model performs category prediction on the second predicted bounding box to obtain the second predicted category. The target detection result of the image to be detected is then generated based on the second predicted bounding box and the second predicted category corresponding to each second initial bounding box. Because the model parameters of the generation sub-model are trained by prompting the model to be trained, based on the real bounding box and the first initial bounding box, to continuously learn the bounding box distribution, the first predicted bounding box becomes closer to the real bounding box; this improves the generalization and data migration adaptability of the target detection model, thereby improving the accuracy of bounding box prediction.
  • This embodiment is based on the same inventive concept as the preceding embodiment of this application. Therefore, for the specific implementation of this embodiment, refer to the implementation of the aforementioned target detection model training method; repeated details are omitted.
  • FIG. 7 shows the target detection model training device provided by the embodiment of the present application.
  • The first bounding box acquisition module 702 is configured to acquire a first initial bounding box and acquire a real bounding box corresponding to the first initial bounding box; the first initial bounding box is obtained by extracting target regions from the sample image data set using a preset region of interest extraction model.
  • the model training module 704 is configured to input the first initial bounding box and the real bounding box into the model to be trained for iterative model training until the current model training results meet the preset model training end conditions to obtain a target detection model.
  • The model to be trained includes a generation sub-model and a discriminant sub-model. Each round of model training in the iterative training includes, for each first initial bounding box: the generation sub-model performs bounding box prediction based on the first initial bounding box to obtain the first predicted bounding box; the discriminant sub-model generates a set of discrimination results based on the real bounding box corresponding to the first initial bounding box and the first predicted bounding box corresponding to the first initial bounding box.
  • The set of discrimination results includes a first discrimination result and a second discrimination result. The first discrimination result represents the degree of similarity between the bounding box distributions of the first predicted bounding box and the real bounding box, and the second discrimination result represents the degree of overlap between the bounding box coordinates of the first predicted bounding box and the real bounding box.
  • The regression loss value of the model to be trained is determined based on the first discrimination result and the second discrimination result corresponding to each first initial bounding box, and the parameters of the generation sub-model and the discriminant sub-model are updated based on the regression loss value.
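  • How the degree of coordinate overlap is computed is not specified in this passage. Intersection over union (IoU) is one common measure for axis-aligned boxes and is used below purely as an assumption, with boxes given as (x1, y1, x2, y2).

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), with x1 <= x2 and y1 <= y2


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned bounding boxes."""
    # Width and height of the intersection rectangle, clamped at zero when disjoint.
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```

  • A value of 1 means the predicted and real boxes coincide, and 0 means they do not overlap at all.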
  • In the model training stage, the target detection model training device in the embodiment of the present application uses the real bounding box and the first initial bounding box to prompt the model to be trained to continuously learn the bounding box distribution, so that the first predicted bounding box becomes closer to the real bounding box. This not only improves the accuracy with which the trained target detection model predicts the bounding box of the target object's location in the image to be detected, but also improves the generalization of the trained target detection model, thereby ensuring the target detection accuracy for new images to be detected and improving the data migration adaptability of the trained target detection model.
  • In addition, the model to be trained includes a generation sub-model and a discriminant sub-model. The regression loss value of the model to be trained is determined based on the set of discrimination results output by the discriminant sub-model, and the model parameters of the generation sub-model and the discriminant sub-model are then updated iteratively over multiple rounds based on the regression loss value until the current model training results meet the preset model training end conditions. That is, the bounding box distribution is learned continuously through multiple rounds of adversarial generation and discrimination, in which the discriminant sub-model judges whether the first predicted bounding box predicted by the generation sub-model is realistic enough.
  • The set of discrimination results output by the discriminant sub-model includes not only the first discrimination result, which characterizes the similarity of the bounding box distributions, but also the second discrimination result, which characterizes the degree of coincidence of the bounding box coordinates and compensates for the bounding box regression loss that arises when the distributions are similar but the specific positions deviate. This makes the regression loss value obtained from the set of discrimination results more accurate, which further improves the accuracy of the model parameters updated based on the regression loss value.
  • The embodiment of the target detection model training device and the embodiment of the target detection model training method in this application are based on the same inventive concept. For the specific implementation of this embodiment, refer to the implementation of the corresponding target detection model training method described above; the details are not repeated here.
  • Figure 8 is a schematic diagram of the module composition of the target detection device provided by the embodiment of the present application. The device is used to perform the target detection method described in Figures 5 to 6. As shown in Figure 8, the device includes:
  • the second bounding box acquisition module 802 is configured to acquire a third preset number of second initial bounding boxes; the second initial bounding boxes are obtained by extracting the target area of the image to be detected using a preset region of interest extraction model.
  • the target detection module 804 is configured to input the second initial bounding box into the target detection model for target detection, and obtain the second predicted bounding box and the second predicted category corresponding to each of the second initial bounding boxes;
  • the detection result generation module 806 is configured to generate a target detection result of the image to be detected based on the second predicted bounding box and the second predicted category corresponding to each of the second initial bounding boxes.
  • During target detection, the target detection device in the embodiment of the present application first uses a preset region of interest extraction model to extract multiple candidate bounding boxes, and then randomly samples a third preset number of candidate bounding boxes from them as the second initial bounding boxes. For each second initial bounding box, the generating sub-model performs bounding box prediction based on the second initial bounding box to obtain a second predicted bounding box, and the classification sub-model performs category prediction on the second predicted bounding box to obtain a second predicted category. The target detection result of the image to be detected is then generated based on the second predicted bounding box and the second predicted category corresponding to each second initial bounding box. Because the model parameters of the generating sub-model are trained by prompting the model to be trained to continuously learn the bounding box distribution based on the real bounding box and the first initial bounding box, so that the first predicted bounding box is closer to the real bounding box, the model generalization and data transferability of the target detection model are improved, thereby improving the accuracy of the bound
  • Embodiments of the present application also provide a computer device, which is used to execute the above-mentioned target detection model training method or target detection method, as shown in Figure 9.
  • Computer devices may vary greatly depending on configuration or performance, and may include one or more processors 901 and a memory 902; the memory 902 may store one or more application programs or data.
  • The memory 902 may be short-term storage or persistent storage.
  • The application program stored in the memory 902 may include one or more modules (not shown), and each module may include a series of computer-executable instructions for the computer device.
  • The processor 901 may be configured to communicate with the memory 902 and execute the series of computer-executable instructions in the memory 902 on the computer device.
  • The computer device may also include one or more power supplies 903, one or more wired or wireless network interfaces 904, one or more input-output interfaces 905, one or more keyboards 906, etc.
  • The computer device includes a memory and one or more programs. The one or more programs are stored in the memory and may include one or more modules, and each module may include a series of computer-executable instructions for the computer device.
  • The one or more programs are configured to be executed by one or more processors and include computer-executable instructions for: obtaining a first preset number of first initial bounding boxes and the real bounding box corresponding to each first initial bounding box, where the first initial bounding boxes are obtained by extracting the target area of the sample image data set using a preset region of interest extraction model; and inputting the first initial bounding boxes and the real bounding boxes into the model to be trained for iterative model training until the current model training result satisfies the preset model training end condition, to obtain a target detection model.
  • The model to be trained includes a generating sub-model and a discriminating sub-model. Each round of model training in the iterative model training includes, for each first initial bounding box: the generating sub-model performs bounding box prediction based on the first initial bounding box to obtain a first predicted bounding box; the discriminating sub-model generates a discrimination result set based on the real bounding box corresponding to the first initial bounding box and the first predicted bounding box corresponding to the first initial bounding box.
  • The discrimination result set includes a first discrimination result and a second discrimination result.
  • The first discrimination result represents the degree of similarity between the bounding box distributions of the first predicted bounding box and the real bounding box.
  • The second discrimination result represents the degree of coincidence between the bounding box coordinates of the first predicted bounding box and the real bounding box.
  • The regression loss value of the model to be trained is determined based on the first discrimination result and the second discrimination result, and parameter updates are performed on the generating sub-model and the discriminating sub-model based on the regression loss value.
  • The computer device includes a memory and one or more programs. The one or more programs are stored in the memory and may include one or more modules, and each module may include a series of computer-executable instructions for the computer device.
  • The one or more programs are configured to be executed by one or more processors and include computer-executable instructions for: obtaining a third preset number of second initial bounding boxes, where the second initial bounding boxes are obtained by extracting the target area of the image to be detected using a preset region of interest extraction model; inputting the second initial bounding boxes into the target detection model for target detection to obtain the second predicted bounding box and the second predicted category corresponding to each second initial bounding box; and generating a target detection result of the image to be detected based on the second predicted bounding box and the second predicted category corresponding to each second initial bounding box.
  • The computer device in the embodiment of the present application prompts the model to be trained to continuously learn the bounding box distribution based on the real bounding box and the first initial bounding box, so that the first predicted bounding box is closer to the real bounding box.
  • This improves the accuracy with which the trained target detection model predicts the bounding box at the location of the target object in the image to be detected. It also improves the generalization of the trained target detection model, thereby ensuring the target detection accuracy for new images to be detected and improving the data migration adaptability of the trained model.
  • In addition, the model to be trained includes a generating sub-model and a discriminating sub-model. The regression loss value of the model to be trained is determined based on the discrimination result set output by the discriminating sub-model, and the model parameters of the generating sub-model and the discriminating sub-model are then updated over multiple rounds of iteration based on the regression loss value.
  • The discriminating sub-model can determine whether the first predicted bounding box predicted by the generating sub-model is realistic enough. For the generated bounding box (i.e., the first predicted bounding box), adjusting the model parameters based on the discrimination results of the discriminating sub-model further pushes the first predicted bounding box predicted by the generating sub-model closer to the real bounding box, thereby further improving the model parameter update efficiency of the generating sub-model and the accuracy of bounding box distribution learning.
  • The discrimination result set output by the discriminating sub-model includes not only the first discrimination result, which represents the similarity of the bounding box distributions, but also the second discrimination result, which represents the degree of coincidence of the bounding box coordinates. The second discrimination result compensates for the bounding box regression loss that arises when the bounding box distributions are similar but the specific positions deviate, so the regression loss value obtained from the discrimination result set is more accurate, which further improves the accuracy of the model parameters updated based on it.
  • Correspondingly, in the target detection process, a preset region of interest extraction model is first used to extract multiple candidate bounding boxes, and candidate bounding boxes are then randomly sampled from them as the second initial bounding boxes; for each second initial bounding box, the generating sub-model performs bounding box prediction based on the second initial bounding box to obtain the second predicted bounding box; classification
  • Embodiments of the present application also provide a storage medium for storing computer-executable instructions.
  • The storage medium can be a USB flash drive, an optical disk, a hard disk, etc. When the computer-executable instructions stored in the storage medium are executed, the following process can be implemented: obtain the first initial bounding box and the real bounding box corresponding to the first initial bounding box, where the first initial bounding box is obtained by extracting the target area of the sample image data set using a preset region of interest extraction model; input the first initial bounding box and the real bounding box into the model to be trained for iterative model training until the current model training result meets the preset model training end condition, and obtain the target detection model.
  • The model to be trained includes a generating sub-model and a discriminating sub-model. Each round of model training in the iterative model training includes, for each first initial bounding box: the generating sub-model performs bounding box prediction based on the first initial bounding box to obtain a first predicted bounding box; the discriminating sub-model generates a discrimination result set based on the real bounding box corresponding to the first initial bounding box and the first predicted bounding box corresponding to the first initial bounding box.
  • The discrimination result set includes a first discrimination result and a second discrimination result.
  • The first discrimination result represents the degree of similarity between the bounding box distributions of the first predicted bounding box and the real bounding box.
  • The second discrimination result represents the degree of coincidence between the bounding box coordinates of the first predicted bounding box and the real bounding box.
  • The regression loss value of the model to be trained is determined based on the first discrimination result and the second discrimination result, and parameter updates are performed on the generating sub-model and the discriminating sub-model based on the regression loss value.
  • The storage medium can be a USB flash drive, an optical disk, a hard disk, etc. When the computer-executable instructions stored in the storage medium are executed, the following process can be implemented: obtain the second initial bounding box, where the second initial bounding box is obtained by extracting the target area of the image to be detected using a preset region of interest extraction model; input the second initial bounding box into the target detection model for target detection to obtain the second predicted bounding box and the second predicted category corresponding to the second initial bounding box; and generate the target detection result of the image to be detected based on the second predicted bounding box and the second predicted category corresponding to the second initial bounding box.
  • The model to be trained is prompted to continuously learn the bounding box distribution based on the real bounding box and the first initial bounding box, so that the first predicted bounding box is closer to the real bounding box. This improves not only the accuracy with which the trained target detection model predicts the bounding box at the location of the target object in the image to be detected, but also the generalization of the trained target detection model.
  • In addition, the model to be trained includes a generating sub-model and a discriminating sub-model. The regression loss value of the model to be trained is determined based on the discrimination result set output by the discriminating sub-model, and the model parameters of the generating sub-model and the discriminating sub-model are then updated over multiple rounds of iteration based on the regression loss value until the current model training result meets the preset model training end condition. That is, the bounding box distribution is learned continuously through multiple rounds of adversarial generation and discrimination, in which the discriminating sub-model determines whether the first predicted bounding box predicted by the generating sub-model is realistic enough.
  • For the generated bounding box (i.e., the first predicted bounding box), adjusting the model parameters based on the discrimination results of the discriminating sub-model further pushes the first predicted bounding box predicted by the generating sub-model to be more accurate.
  • The discrimination result set output by the discriminating sub-model includes not only the first discrimination result, which characterizes the similarity of the bounding box distributions, but also the second discrimination result, which represents the degree of coincidence of the bounding box coordinates. The second discrimination result compensates for the bounding box regression loss that arises when the bounding box distributions are similar but the specific positions deviate, so the regression loss value obtained from the discrimination result set is more accurate, which further improves the accuracy of the model parameters updated based on it. Correspondingly, in the target detection process, a preset region of interest extraction model is first used to extract multiple candidate bounding boxes, and candidate bounding boxes are then randomly sampled from them as the second initial bounding boxes; for each second initial bounding box, the generating sub-model performs bounding box prediction based on the second initial bounding box to obtain the second predicted bounding box; the classification sub
  • These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture that includes instruction means for implementing the functions specified in one or more processes of the flowchart and/or one or more blocks of the block diagram.
  • These computer program instructions may also be loaded onto a computer or other programmable data processing device, causing a series of operating steps to be performed on the computer or other programmable device to produce computer-implemented processing, such that the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more processes of the flowchart and/or one or more blocks of the block diagram.
  • A computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
  • Memory may include non-permanent storage in computer-readable media, random access memory (RAM) and/or non-volatile memory in the form of read-only memory (ROM) or flash memory (flash RAM).
  • Computer-readable media include both persistent and non-persistent, removable and non-removable media that can store information by any method or technology.
  • Information may be computer-readable instructions, data structures, modules of programs, or other data.
  • Examples of computer storage media include, but are not limited to, phase change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, magnetic tape cassettes, disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory media, such as modulated data signals and carrier waves.
  • Embodiments of the present application may be described in the general context of computer-executable instructions executed by a computer, such as program modules.
  • Program modules include routines, programs, objects, components, data structures, etc. that perform specific tasks or implement specific abstract data types.
  • One or more embodiments of the present application may also be practiced in distributed computing environments where tasks are performed by remote processing devices connected through a communications network.
  • In such distributed computing environments, program modules may be located in both local and remote computer storage media, including storage devices.
  • Each embodiment in this application is described in a progressive manner. The same and similar parts between the various embodiments can be referred to each other. Each embodiment focuses on its differences from other embodiments. In particular, for the system embodiment, since it is basically similar to the method embodiment, the description is relatively simple.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)

Abstract

Embodiments of the present application provide a target detection model training method and apparatus, and a target detection method and apparatus. In a model training stage, on the basis of real bounding boxes and first initial bounding boxes, a model to be trained is caused to continuously learn bounding box distribution, so that first predicted bounding boxes are closer to the real bounding boxes; the model to be trained comprises a generation sub-model and a discrimination sub-model, a regression loss value is determined on the basis of discrimination result sets outputted by the discrimination sub-model, and then iterative updating is continuously performed on model parameters on the basis of the regression loss value; and each discrimination result set simultaneously comprises discrimination results representing a bounding box distribution similarity and a bounding box coordinate coincidence degree.

Description

Target detection model training method, target detection method and device
Cross reference
This application claims priority to the Chinese patent application filed with the China Patent Office on July 15, 2022, with application number 202210831208.2 and entitled "Target Detection Model Training Method, Target Detection Method and Device", the entire contents of which are incorporated herein by reference.
Technical field
The present application relates to the field of target detection, and in particular to a target detection model training method, a target detection method and a device.
Background
With the rapid development of artificial intelligence technology, there is growing demand for performing target detection on an image with a pre-trained target detection model, so as to predict the coordinate information and classification information of the bounding box of each target contained in the image. However, in the training process of existing target detection models, the model parameters are trained through image feature extraction. As a result, the model parameters of the trained target detection model are relatively accurate for the sample image data set but less accurate for images to be detected, which leads to relatively low target detection accuracy in the model application stage.
Summary of the invention
The purpose of this application is to provide a target detection model training method, a target detection method and a device.
In one aspect, the present application provides a target detection model training method. The method includes: obtaining a first initial bounding box and obtaining a real bounding box corresponding to the first initial bounding box, where the first initial bounding box is obtained by extracting the target area of a sample image data set using a preset region of interest extraction model; and inputting the first initial bounding box and the real bounding box into a model to be trained for iterative model training until the current model training result satisfies a preset model training end condition, to obtain a target detection model. The model to be trained includes a generating sub-model and a discriminating sub-model. Each round of model training in the iterative model training includes: the generating sub-model performs bounding box prediction based on the first initial bounding box to obtain a first predicted bounding box; the discriminating sub-model generates a discrimination result set based on the real bounding box corresponding to the first initial bounding box and the first predicted bounding box corresponding to the first initial bounding box, where the discrimination result set includes a first discrimination result and a second discrimination result, the first discrimination result represents the degree of similarity between the bounding box distributions of the first predicted bounding box and the real bounding box, and the second discrimination result represents the degree of coincidence between the bounding box coordinates of the first predicted bounding box and the real bounding box; a regression loss value of the model to be trained is determined based on the first discrimination result and the second discrimination result corresponding to the first initial bounding box; and parameter updates are performed on the generating sub-model and the discriminating sub-model based on the regression loss value.
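As a rough illustration of how the two discrimination results could feed a single regression loss value, the sketch below is a minimal example under stated assumptions, not the method of the application. This passage does not give the loss formula. The sketch therefore assumes that the first discrimination result is a similarity score in [0, 1] supplied by the discriminating sub-model, that the degree of coordinate coincidence is measured by intersection over union (IoU) of corner-format boxes, and that the two terms are combined with a hypothetical weight `overlap_weight`. The names `iou` and `regression_loss` are illustrative only.

```python
from typing import Sequence, Tuple

# Assumed box format: (x1, y1, x2, y2) with x1 < x2 and y1 < y2.
Box = Tuple[float, float, float, float]


def iou(pred: Box, real: Box) -> float:
    """Intersection over union of two axis-aligned boxes (assumed overlap measure)."""
    ix1, iy1 = max(pred[0], real[0]), max(pred[1], real[1])
    ix2, iy2 = min(pred[2], real[2]), min(pred[3], real[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_pred = (pred[2] - pred[0]) * (pred[3] - pred[1])
    area_real = (real[2] - real[0]) * (real[3] - real[1])
    union = area_pred + area_real - inter
    return inter / union if union > 0 else 0.0


def regression_loss(
    similarities: Sequence[float],
    pred_boxes: Sequence[Box],
    real_boxes: Sequence[Box],
    overlap_weight: float = 1.0,
) -> float:
    """Average a distribution term and a coordinate term over all boxes.

    similarities: hypothetical first discrimination results in [0, 1].
    The second discrimination result is computed here as IoU (an assumption).
    """
    if not pred_boxes or not (len(similarities) == len(pred_boxes) == len(real_boxes)):
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for sim, pred, real in zip(similarities, pred_boxes, real_boxes):
        distribution_term = 1.0 - sim        # low when the distributions look alike
        overlap_term = 1.0 - iou(pred, real)  # low when the coordinates coincide
        total += distribution_term + overlap_weight * overlap_term
    return total / len(pred_boxes)
```

With this form, a predicted box that the discriminating sub-model scores as similar but that sits away from the real box still incurs a penalty through the overlap term, which is the compensating role the text attributes to the second discrimination result.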
In one aspect, the present application provides a target detection method. The method includes: obtaining a second initial bounding box, where the second initial bounding box is obtained by extracting the target area of an image to be detected using a preset region of interest extraction model; inputting the second initial bounding box into a target detection model for target detection to obtain a second predicted bounding box and a second predicted category corresponding to the second initial bounding box; and generating a target detection result of the image to be detected based on the second predicted bounding box and the second predicted category corresponding to the second initial bounding box.
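The detection flow above can be sketched as a short pipeline. This is a minimal sketch under stated assumptions rather than the implementation of the application. The trained sub-models are represented by two hypothetical callables, `generator` and `classifier`, and the candidate boxes are assumed to have already been produced by the region of interest extraction model. The random sampling of a preset number of candidates follows the description of the detection device elsewhere in this document; the function name `detect`, the `seed` parameter and the result dictionary layout are illustrative only.

```python
import random
from typing import Callable, Dict, List, Sequence, Tuple, Union

Box = Tuple[float, float, float, float]  # assumed (x1, y1, x2, y2) format
Detection = Dict[str, Union[Box, str]]


def detect(
    candidate_boxes: Sequence[Box],
    num_boxes: int,
    generator: Callable[[Box], Box],
    classifier: Callable[[Box], str],
    seed: int = 0,
) -> List[Detection]:
    """Sample initial boxes, refine each one, classify it, and collect results."""
    rng = random.Random(seed)  # seeded only to keep the sketch reproducible
    k = min(num_boxes, len(candidate_boxes))
    initial_boxes = rng.sample(list(candidate_boxes), k)

    results: List[Detection] = []
    for box in initial_boxes:
        predicted_box = generator(box)        # second predicted bounding box
        category = classifier(predicted_box)  # second predicted category
        results.append({"box": predicted_box, "category": category})
    return results
```

In a real system the two callables would wrap the trained generating sub-model and classification sub-model; here they are left abstract so that the control flow is the only thing the sketch asserts.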
In one aspect, the present application provides a target detection model training device. The device includes: a first bounding box acquisition module configured to obtain first initial bounding boxes and the real bounding box corresponding to each first initial bounding box, where the first initial bounding boxes are obtained by extracting the target area of a sample image data set using a preset region of interest extraction model; and a model training module configured to input the first initial bounding boxes and the real bounding boxes into a model to be trained for iterative model training until the current model training result satisfies a preset model training end condition, to obtain a target detection model. The model to be trained includes a generating sub-model and a discriminating sub-model. Each round of model training in the iterative model training includes, for each first initial bounding box: the generating sub-model performs bounding box prediction based on the first initial bounding box to obtain a first predicted bounding box; the discriminating sub-model generates a discrimination result set based on the real bounding box corresponding to the first initial bounding box and the first predicted bounding box corresponding to the first initial bounding box, where the discrimination result set includes a first discrimination result and a second discrimination result, the first discrimination result represents the degree of similarity between the bounding box distributions of the first predicted bounding box and the real bounding box, and the second discrimination result represents the degree of coincidence between the bounding box coordinates of the first predicted bounding box and the real bounding box; a regression loss value of the model to be trained is determined based on the first discrimination result and the second discrimination result corresponding to the first initial bounding box; and parameter updates are performed on the generating sub-model and the discriminating sub-model based on the regression loss value.
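The outer loop of the model training module ("until the current model training result satisfies a preset model training end condition") can be sketched as follows. This passage does not say what the end condition is, so the sketch assumes two common choices: a maximum number of rounds and a loss threshold. Both parameters, and the name `train_until_done`, are hypothetical. Each round is abstracted as a callable that runs generation, discrimination and the parameter update, and returns the regression loss value for that round.

```python
from typing import Callable, List


def train_until_done(
    train_round: Callable[[], float],
    max_rounds: int,
    loss_threshold: float,
) -> List[float]:
    """Run training rounds until an assumed end condition is met.

    train_round: hypothetical callable that performs one round (predict boxes,
    discriminate, update both sub-models) and returns the regression loss.
    Stops when the loss is at or below loss_threshold or after max_rounds.
    """
    if max_rounds <= 0:
        raise ValueError("max_rounds must be positive")
    history: List[float] = []
    for _ in range(max_rounds):
        loss = train_round()
        history.append(loss)
        if loss <= loss_threshold:
            break
    return history
```

Returning the loss history rather than only the final value makes it easy to check which of the two assumed conditions ended the run.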
In one aspect, the present application provides a target detection device. The device includes: a second bounding box acquisition module configured to obtain a second initial bounding box, where the second initial bounding box is obtained by extracting the target area of an image to be detected using a preset region of interest extraction model; a target detection module configured to input the second initial bounding box into a target detection model for target detection to obtain a second predicted bounding box and a second predicted category corresponding to the second initial bounding box; and a detection result generation module configured to generate a target detection result of the image to be detected based on the second predicted bounding box and the second predicted category corresponding to the second initial bounding box.
In one aspect, the present application provides a computer device. The device includes: a processor; and a memory arranged to store computer-executable instructions, where the executable instructions are configured to be executed by the processor and include instructions for performing the steps of the above method.
In one aspect, embodiments of the present application provide a storage medium, where the storage medium is used to store computer-executable instructions, and the executable instructions cause a computer to perform the steps of the above method.
Description of drawings
To describe the technical solutions in the embodiments of the present application or in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some of the embodiments recorded in one or more embodiments of the present application, and those of ordinary skill in the art can obtain other drawings from them without creative effort.
Figure 1 is a schematic flow chart of the target detection model training method provided by an embodiment of the present application;
Figure 2 is a schematic flow chart of each model training process in the target detection model training method provided by an embodiment of the present application;
Figure 3 is a schematic diagram of the first implementation principle of the target detection model training method provided by an embodiment of the present application;
Figure 4a is a schematic diagram of the second implementation principle of the target detection model training method provided by an embodiment of the present application;
Figure 4b is a schematic diagram of the third implementation principle of the target detection model training method provided by an embodiment of the present application;
Figure 5 is a schematic flow chart of the target detection method provided by an embodiment of the present application;
Figure 6 is a schematic diagram of the implementation principle of the target detection method provided by an embodiment of the present application;
Figure 7 is a schematic diagram of the module composition of the target detection model training device provided by an embodiment of the present application;
Figure 8 is a schematic diagram of the module composition of the target detection device provided by an embodiment of the present application;
Figure 9 is a schematic structural diagram of a computer device provided by an embodiment of the present application.
Detailed Description
To help those skilled in the art better understand the technical solutions in one or more embodiments of the present application, the technical solutions in the embodiments are described clearly and completely below with reference to the drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art based on one or more embodiments of the present application without creative effort shall fall within the protection scope of the present application.
It should be noted that, where no conflict arises, one or more embodiments of the present application and the features in those embodiments may be combined with one another. The embodiments of the present application are described in detail below with reference to the drawings.
Consider an approach in which a deep network extracts features so that the model learns the image features inside bounding boxes, and the model parameters are adjusted by repeatedly learning how similar the image features in the predicted bounding box are to those in the real bounding box. A target detection model trained this way depends heavily on the sample data set used in the training stage, generalizes poorly, and transfers poorly across data: its detection accuracy is high on the sample data set but low on new image data to be detected. In view of this, in the model training stage the present application uses the real bounding boxes and the first initial bounding boxes to make the model to be trained keep learning the bounding box distribution, so that the first predicted bounding box obtained by prediction is closer to the real bounding box. This improves both the accuracy with which the trained target detection model predicts the bounding box locating a target object in an image to be detected and the generalization of the trained model, thereby ensuring detection accuracy on new images to be detected and improving the trained model's adaptability to data transfer.
In addition, the model to be trained includes a generative sub-model and a discriminative sub-model. The regression loss value of the model to be trained is determined based on the discrimination result set output by the discriminative sub-model, and the model parameters of the generative sub-model and the discriminative sub-model are then updated over multiple iterations based on the regression loss value until the current model training result satisfies a preset model training end condition. That is, the bounding box distribution is learned through multiple rounds of generator-discriminator adversarial training, in which the discriminative sub-model judges whether the first predicted bounding box predicted by the generative sub-model is sufficiently realistic. Even when the generated bounding box (that is, the first predicted bounding box) is hard to distinguish from the real bounding box, adjusting the model parameters based on the discrimination result of the discriminative sub-model pushes the first predicted bounding box still closer to the real bounding box, which further improves the parameter update efficiency of the generative sub-model and the accuracy of its bounding box distribution learning.
Furthermore, if the model regression loss were determined, and the model parameters adjusted, only along the coarse-grained comparison dimension of bounding box distribution similarity, learning of the precise position of the bounding box could not be taken into account; if they were determined only along the fine-grained comparison dimension of bounding box coordinate coincidence, the edge ambiguity of bounding boxes could not be taken into account. The present application therefore determines the model regression loss by combining the two dimensions: the discrimination result set output by the discriminative sub-model includes not only a first discrimination result representing the degree of bounding box distribution similarity but also a second discrimination result representing the degree of bounding box coordinate coincidence. This accounts simultaneously for the regression loss caused by bounding boxes whose distributions are similar but whose positions deviate, and for the regression loss caused by first predicted bounding boxes that correspond to real bounding boxes with ambiguous edges. The regression loss value obtained from the discrimination result set is therefore more accurate, which in turn improves the accuracy of the model parameters updated based on that regression loss value.
Figure 1 is a first schematic flowchart of the target detection model training method provided by one or more embodiments of the present application. The method in Figure 1 can be executed by an electronic device provided with a target detection model training apparatus; the electronic device may be a terminal device or a designated server. The hardware used for target detection model training (that is, the electronic device provided with the target detection model training apparatus) and the hardware used for target detection (that is, the electronic device provided with the target detection apparatus) may be the same or different. It should be noted that a target detection model trained by the training method provided in the embodiments of the present application can be applied to any specific application scenario that requires target detection on an image to be detected. For example, in specific application scenario 1, target detection is performed on images to be detected that are captured by an image acquisition device at the entrance of a public place (such as a shopping mall entrance, a subway entrance, a scenic spot entrance, or a performance venue entrance). As another example, in specific application scenario 2, target detection is performed on images to be detected that are captured by the image acquisition devices at the monitoring points of a breeding base.
The sample image data set used in training differs with the specific application scenario of the target detection model. For specific application scenario 1, the sample image data set may consist of historical sample images captured at the entrance of a designated public place within a preset historical time period; correspondingly, the target object enclosed by a first initial bounding box is a target user entering the designated public place in a historical sample image, and the real category and the first predicted category may be the category to which the target user belongs, such as at least one of age group, gender, height, and occupation. For specific application scenario 2, the sample image data set may consist of historical sample images captured at the monitoring points of a designated breeding base within a preset historical time period; correspondingly, the target object enclosed by a first initial bounding box is a target breeding object in a historical sample image, and the real category and the first predicted category may be the category to which the target breeding object belongs, such as at least one of living state and body size.
As shown in Figure 1, the training process of the target detection model includes at least the following steps:
S102: Acquire a first preset number of first initial bounding boxes, and acquire the real bounding box corresponding to each first initial bounding box. The first initial bounding boxes are obtained by performing target region extraction on a sample image data set using a preset region-of-interest extraction model.
The first preset number of first initial bounding boxes may be determined in either of two ways. In the first, the step of performing target region extraction on the sample image data set using the preset region-of-interest extraction model is executed once for each round of model training in the iterative training, yielding the first preset number of first initial bounding boxes. In the second, that extraction step is executed in advance, and for each round of model training the first preset number of first initial bounding boxes is obtained by random sampling from the large pool of pre-extracted candidate bounding boxes.
The sample image data set may contain multiple sample target objects, and each sample target object may correspond to multiple first initial bounding boxes; that is, the first preset number of first initial bounding boxes includes at least one first initial bounding box for each sample target object.
Before the first preset number of first initial bounding boxes is acquired, the method further includes: inputting the sample image data set into the preset region-of-interest extraction model for region-of-interest extraction, to obtain a second preset number of candidate bounding boxes, where the second preset number is equal to or greater than the first preset number, and the first preset number is the number of first initial bounding boxes. When the second preset number equals the first preset number, for each round of model training in the iterative training the preset region-of-interest extraction model is used to extract regions of interest from the multiple sample images in the sample image data set, yielding the first preset number of first initial bounding boxes. When the second preset number is greater than the first preset number, for each round of model training the first preset number of first initial bounding boxes is obtained by random sampling from the second preset number of candidate bounding boxes.
One purpose of model training is to learn the bounding box distribution through iterative training of the model parameters, thereby improving the generalization and data transferability of the model (that is, the model parameters do not depend on the sample data used in training and are better suited to the data to be recognized when the model is applied). For the model to be trained to learn the bounding box distribution well, the first initial bounding boxes extracted by the preset region-of-interest extraction model and input into the model to be trained need to follow a certain probability distribution (such as a Gaussian distribution or a Cauchy distribution). The larger the number N of anchor boxes extracted by the preset region-of-interest extraction model, the better the model to be trained can learn the bounding box distribution. However, if N anchor boxes were extracted in real time in every round using the preset region-of-interest extraction model (for example, a region-of-interest (ROI) extraction algorithm) and all input into the model to be trained as first initial bounding boxes, the amount of data processing would be large and the hardware requirements high.
In a specific implementation, preferably, N anchor boxes are extracted in advance using the preset region-of-interest extraction model, and then in each round of model training m anchor boxes are randomly sampled from the N anchor boxes as the first initial bounding boxes and input into the model to be trained. This keeps the data processing volume of each round under control while still allowing the model to learn the bounding box distribution well. In this case, the second preset number is greater than the first preset number, the first preset number being the number of first initial bounding boxes, and acquiring the first initial bounding boxes specifically includes: randomly selecting the first preset number of candidate bounding boxes from the second preset number of candidate bounding boxes as the first initial bounding boxes. That is, the preset region-of-interest extraction model is used in advance to extract regions of interest from the multiple sample images in the sample image data set, yielding the second preset number of candidate bounding boxes; then, for each round of model training, the first preset number of first initial bounding boxes is obtained by random sampling from the second preset number of candidate bounding boxes.
In other words, in the preferred implementation, N anchor boxes (that is, the second preset number of candidate bounding boxes) are extracted in advance; then, for each round of model training, m anchor boxes (that is, the first preset number of first initial bounding boxes) are randomly sampled from the N anchor boxes, after which step S104 below is performed.
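The preferred sampling scheme (extract N candidate boxes once, then draw m of them per training round) can be sketched as follows. This is a minimal illustration only: the function name `sample_initial_boxes`, the `(x1, y1, x2, y2)` box layout, the values N = 1000 and m = 64, and the synthetic candidate pool are all assumptions made for the example and are not taken from the application.

```python
import random


def sample_initial_boxes(candidate_boxes, m, rng=None):
    """Draw m distinct boxes from the pre-extracted pool for one training round."""
    if rng is None:
        rng = random.Random()
    if m > len(candidate_boxes):
        raise ValueError("m must not exceed the number of candidate boxes")
    # random.Random.sample draws without replacement.
    return rng.sample(candidate_boxes, m)


# Hypothetical pool of N = 1000 candidate boxes, standing in for the output
# of the region-of-interest extraction model (which is not modelled here).
pool_rng = random.Random(0)
candidates = []
for _ in range(1000):
    x1, y1 = pool_rng.uniform(0, 500), pool_rng.uniform(0, 500)
    w, h = pool_rng.uniform(10, 100), pool_rng.uniform(10, 100)
    candidates.append((x1, y1, x1 + w, y1 + h))

# One training round: m = 64 first initial bounding boxes.
round_boxes = sample_initial_boxes(candidates, m=64, rng=random.Random(1))
```

Because the expensive extraction runs once and each round only touches m boxes, the per-round cost is independent of N, which is the trade-off the paragraph above describes.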
S104: Input the first initial bounding boxes and the real bounding boxes into the model to be trained for iterative model training until the current model training result satisfies a preset model training end condition, to obtain the target detection model. The preset model training end condition may include: the current number of training rounds equals the total number of training rounds, the model loss function converges, or the generative sub-model and the discriminative sub-model reach equilibrium.
The specific implementation of the iterative model training in step S104 is described below. Because every round of model training in the iterative training is processed in the same way, any one round is taken as an example. Where the model to be trained includes a generative sub-model and a discriminative sub-model, as shown in Figure 2, each round of model training in the iterative training may include the following steps S1042 to S1046:
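The three end conditions named in step S104 could be checked along the following lines. The application does not say how convergence or equilibrium is measured, so the criteria below are assumptions chosen for illustration: convergence is taken as a small change between the last two loss values, and equilibrium as the discriminative sub-model's outputs for real and generated boxes both lying near 0.5. The function name `should_stop` and the tolerances are hypothetical.

```python
def should_stop(epoch, total_epochs, loss_history,
                d_real_prob, d_fake_prob,
                loss_tol=1e-4, balance_tol=0.05):
    """Return True if any of the three end conditions holds (illustrative criteria)."""
    # Condition 1: the training-round budget has been used up.
    if epoch >= total_epochs:
        return True
    # Condition 2 (assumed criterion): the loss has stopped changing.
    if len(loss_history) >= 2 and abs(loss_history[-1] - loss_history[-2]) < loss_tol:
        return True
    # Condition 3 (assumed criterion): the discriminator can no longer tell
    # real from generated boxes, so both of its outputs sit near 0.5.
    if abs(d_real_prob - 0.5) < balance_tol and abs(d_fake_prob - 0.5) < balance_tol:
        return True
    return False
```

In practice a moving average over several rounds would be a more robust convergence test than comparing two consecutive values; the two-value comparison is used here only to keep the sketch short.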
S1042: For each first initial bounding box, the generative sub-model performs bounding box prediction based on the first initial bounding box to obtain a first predicted bounding box, and the discriminative sub-model generates a discrimination result set based on the real bounding box corresponding to the first initial bounding box and the first predicted bounding box corresponding to the first initial bounding box. The discrimination result set includes a first discrimination result and a second discrimination result; the first discrimination result represents the degree of similarity between the bounding box distributions of the first predicted bounding box and the real bounding box, and the second discrimination result represents the degree of coincidence between the bounding box coordinates of the first predicted bounding box and the real bounding box.
To determine the first discrimination result, which represents the degree of bounding box distribution similarity, the KL divergence (Kullback-Leibler divergence) between the real bounding box and the corresponding first predicted bounding box can be computed directly. In a specific implementation, however, it is noted that the discriminative sub-model can judge whether the first predicted bounding box predicted by the generative sub-model is sufficiently realistic: even when the generated bounding box (that is, the first predicted bounding box) is hard to distinguish from the real bounding box, adjusting the model parameters based on the discrimination result of the discriminative sub-model pushes the first predicted bounding box still closer to the real bounding box. Therefore, to further improve the accuracy of the regression loss component corresponding to bounding box distribution similarity, and thus make the first predicted bounding box predicted by the target detection model more realistic, it is also possible, for each first initial bounding box, to use the discriminative sub-model to output discrimination probabilities as to whether the real bounding box and the corresponding first predicted bounding box come from real data or from generated data. Because the magnitude of the discrimination probabilities depends on how close the probability distributions of the two bounding boxes (that is, the real bounding box and the corresponding first predicted bounding box) are, the discrimination probabilities can represent the degree of distribution similarity between the real bounding box and the corresponding first predicted bounding box. Based on the discrimination probabilities, the first regression loss component, corresponding to the discrimination dimension of bounding box distribution similarity, can be determined, which drives the model's bounding box regression learning.
Specifically, for the real bounding box and the first predicted bounding box corresponding to a given first initial bounding box, the discriminative sub-model outputs the discrimination probability that the real bounding box comes from real data and the discrimination probability that the first predicted bounding box comes from generated data. The larger these two probabilities, the lower the degree of similarity between the probability distributions of the first predicted bounding box and the corresponding real bounding box, and the larger the first regression loss component corresponding to the discrimination dimension of bounding box distribution similarity. Since the degree of distribution similarity between the first predicted bounding box and the real bounding box corresponding to a given first initial bounding box is determined by these discrimination probabilities, the first discrimination result can be generated based on the discrimination probabilities of the discriminative sub-model. The first discrimination result thus represents the degree of bounding box distribution similarity, and the first regression loss component corresponding to that discrimination dimension can be determined from the discrimination probabilities in the first discrimination result.
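The monotonic relationship described above (the more confidently the discriminative sub-model separates the real box from the predicted box, the larger the first regression loss component) is satisfied by the standard GAN value function. The sketch below assumes that form, log D(real) + log(1 - D(predicted)); this passage does not give a formula, so both the formula and the function name `first_regression_component` are assumptions for illustration.

```python
import math


def first_regression_component(p_real_is_real, p_pred_is_generated, eps=1e-12):
    """Assumed GAN-style component: log P(real is real) + log P(predicted is generated).

    The value lies in (-inf, 0] and grows as either probability grows, so a
    generator that fools the discriminator drives it down.
    """
    # Clamp to avoid log(0); eps is an illustrative numerical guard.
    p_r = min(max(p_real_is_real, eps), 1.0)
    p_g = min(max(p_pred_is_generated, eps), 1.0)
    return math.log(p_r) + math.log(p_g)


# A confident discriminator yields a larger component than a confused one.
confident = first_regression_component(0.9, 0.9)
confused = first_regression_component(0.5, 0.5)
```

Under this assumed form, the component reaches its maximum of 0 when the discriminator is certain about both boxes, and equals 2·log(0.5) when it is reduced to guessing.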
To determine the second discrimination result, which represents the degree of bounding box coordinate coincidence, the target intersection-over-union (IoU) loss can be obtained by considering only the IoU loss between a given real bounding box and its corresponding first predicted bounding box. Alternatively, the target IoU loss can be determined by jointly considering the IoU loss between a given real bounding box and its corresponding first predicted bounding box and the IoU loss between that real bounding box and the first predicted bounding boxes corresponding to other real bounding boxes. Because the magnitude of the target IoU loss represents the degree of coordinate coincidence between the real bounding box and the corresponding first predicted bounding box, the second regression loss component, corresponding to the discrimination dimension of bounding box coordinate coincidence, can be determined based on the target IoU loss, which drives the model's bounding box regression learning.
Specifically, for the real bounding box and the first predicted bounding box corresponding to a given first initial bounding box, the target IoU loss between the real bounding box and the first predicted bounding box is determined. The larger the target IoU loss, the lower the degree of coordinate coincidence between the first predicted bounding box and the corresponding real bounding box, and the larger the second regression loss component corresponding to the discrimination dimension of bounding box coordinate coincidence. Since the degree of coordinate coincidence between the first predicted bounding box and the real bounding box corresponding to a given first initial bounding box is determined from the target IoU loss between them, the second discrimination result can be generated based on the target IoU loss. The second discrimination result thus represents the degree of bounding box coordinate coincidence, and the second regression loss component corresponding to that discrimination dimension can be determined from the target IoU loss in the second discrimination result.
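For the simpler of the two options above (IoU loss between a real bounding box and its own first predicted bounding box only), a minimal sketch follows. It assumes axis-aligned boxes given as `(x1, y1, x2, y2)` and the common definition IoU loss = 1 - IoU; the application does not fix either choice in this passage, and the second option, which also involves the predicted boxes of other real boxes, is not modelled because its combination rule is not given here.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned (x1, y1, x2, y2) boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap, clipped at zero for disjoint boxes.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def iou_loss(real_box, pred_box):
    """Assumed loss form: 0 for perfect overlap, 1 for no overlap."""
    return 1.0 - iou(real_box, pred_box)


# Partial overlap: intersection 1, union 7.
partial = iou_loss((0.0, 0.0, 2.0, 2.0), (1.0, 1.0, 3.0, 3.0))
```

A larger loss here corresponds to lower coordinate coincidence, matching the direction of the second regression loss component described above.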
S1044: Determine the regression loss value of the model to be trained based on the first discrimination result and the second discrimination result in the discrimination result set corresponding to each first initial bounding box.
After a discrimination result set has been obtained for each first initial bounding box, the sub-regression loss value corresponding to each first initial bounding box can be obtained. The sub-regression loss value includes at least: a first regression loss component corresponding to the first discrimination dimension, which considers bounding box distribution similarity, and a second regression loss component corresponding to the second discrimination dimension, which considers bounding box coordinate coincidence. The regression loss value used to adjust the model parameters can then be determined based on the sub-regression loss values corresponding to the first initial bounding boxes.
In determining the sub-regression loss value corresponding to a first initial bounding box, both bounding box distribution similarity and bounding box coordinate coincidence can be considered together, or only bounding box distribution similarity can be considered. In the latter case, the discrimination result set corresponding to the first initial bounding box includes the first discrimination result, and the sub-regression loss value corresponding to the first initial bounding box is determined based on the first regression loss component corresponding to the first discrimination result.
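How the two components combine into a sub-regression loss value, and how the sub-values combine into the overall regression loss value, is not specified in this passage. The sketch below assumes a weighted sum per box and a mean over boxes; the weights `w_dist` and `w_coord` and both function names are hypothetical. Passing no second component covers the variant that considers only bounding box distribution similarity.

```python
def sub_regression_loss(first_component, second_component=None,
                        w_dist=1.0, w_coord=1.0):
    """Assumed weighted sum of the per-box loss components."""
    loss = w_dist * first_component
    # The coordinate-coincidence term is optional, matching the variant
    # that uses only the first discrimination result.
    if second_component is not None:
        loss += w_coord * second_component
    return loss


def regression_loss(sub_losses):
    """Assumed aggregation: mean of the sub-regression loss values."""
    if not sub_losses:
        raise ValueError("at least one sub-regression loss value is required")
    return sum(sub_losses) / len(sub_losses)


# Two boxes, each with a distribution component and a coordinate component.
per_box = [sub_regression_loss(-0.25, 0.5), sub_regression_loss(0.25, 0.5)]
total = regression_loss(per_box)
```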
S1046,基于上述回归损失值对生成子模型和判别子模型进行参数更新。S1046: Update the parameters of the generating sub-model and the discriminating sub-model based on the above regression loss value.
在基于各第一初始边界框对应的子回归损失值确定出回归损失值之后,利用梯度下降方法,基于上述回归损失值对生成子模型和判别子模型进行参数调整;其中,由于子回归损失值至少反映了基于边界框分布相似程度的回归损失判别维度对应的第一回归损失分量、以及基于边界框坐标重合程度的回归损失判别维度对应的第二回归损失分量,因此,用于对模型参数进行调整的回归损失值也反映了这两个回归损失判别维度分别对应的回归损失分量,使得最终训练得到的目标检测模型不仅能够确保预测得到的第一预测边界框与真实边界框的概率分布更接近,也能够确保第一预测边界框与真实边界框的坐标重合程度更高。 After the regression loss value is determined based on the sub-regression loss value corresponding to each first initial bounding box, the gradient descent method is used to adjust the parameters of the generative sub-model and the discriminant sub-model based on the above-mentioned regression loss value; among them, due to the sub-regression loss value It at least reflects the first regression loss component corresponding to the regression loss discrimination dimension based on the similarity of the bounding box distribution, and the second regression loss component corresponding to the regression loss discrimination dimension based on the coincidence degree of the bounding box coordinates. Therefore, it is used to perform model parameters. The adjusted regression loss value also reflects the regression loss components corresponding to the two regression loss discriminant dimensions, so that the final trained target detection model can not only ensure that the predicted first predicted bounding box is closer to the probability distribution of the real bounding box , can also ensure that the coordinates of the first predicted bounding box and the real bounding box coincide more closely.
During model training, the discriminative sub-model tries to tell whether the real bounding box corresponding to a first initial bounding box, and the corresponding first predicted bounding box, come from real data or from generated data, thereby minimizing the regression loss of the model to be trained. To maximize the discrimination error of the discriminative sub-model, the generative sub-model is forced to keep learning the bounding box distribution. The two sub-models thus undergo multiple rounds of adversarial learning, yielding a generative sub-model with more accurate predictions, which serves as the target detection model.
For the process of iteratively training the model parameters based on the regression loss value of the model to be trained to obtain the target detection model, reference may be made to existing procedures for tuning model parameters by back-propagation with gradient descent, which are not repeated here.
Figure 3 is a schematic diagram of a specific implementation of the target detection model training process, which includes: obtaining a first preset number of first initial bounding boxes, and obtaining the real bounding box corresponding to each first initial bounding box; for each first initial bounding box: the generative sub-model performs bounding box prediction based on the first initial bounding box to obtain a first predicted bounding box; the discriminative sub-model generates a discrimination result set based on the real bounding box and the first predicted bounding box corresponding to the first initial bounding box;
determining the regression loss value of the model to be trained based on the first discrimination result and the second discrimination result corresponding to each first initial bounding box; and iteratively updating the model parameters of the model to be trained based on the regression loss value until the current model training result satisfies the preset model training end condition, to obtain the target detection model.
In the embodiments of this application, during the model training stage, the model to be trained is driven, based on the real bounding boxes and the first initial bounding boxes, to keep learning the bounding box distribution so that the first predicted bounding box comes closer to the real bounding box. This improves the accuracy with which the trained target detection model predicts the bounding box of the target object's location in an image to be detected, and also improves the generalization of the trained model, thereby ensuring target detection accuracy on new images to be detected and improving the trained model's adaptability to data transfer. Moreover, the model to be trained includes a generative sub-model and a discriminative sub-model: the regression loss value of the model to be trained is determined based on the discrimination result set output by the discriminative sub-model, and the model parameters of both sub-models are then updated over multiple rounds of iteration based on the regression loss value until the current model training result satisfies the preset model training end condition. That is, the bounding box distribution is learned continuously through multiple rounds of generator-discriminator adversarial training, in which the discriminative sub-model judges whether the first predicted bounding box produced by the generative sub-model is realistic enough. Even when the generated bounding box (the first predicted bounding box) is hard to distinguish from the real bounding box, adjusting the model parameters based on the results of the discriminative sub-model further pushes the first predicted bounding box toward the real bounding box, which further improves the parameter update efficiency of the generative sub-model and the accuracy with which it learns the bounding box distribution. In addition, the discrimination result set output by the discriminative sub-model includes not only the first discrimination result, which characterizes the similarity of the bounding box distributions, but also the second discrimination result, which characterizes the degree of coincidence of the bounding box coordinates. This compensates for the bounding box regression loss incurred when the distributions are similar but the specific positions deviate, so the regression loss value obtained from the discrimination result set is more accurate, which in turn improves the accuracy of the model parameters updated based on it.
Further, considering that during model training the gradient of the regression loss corresponding to the first discrimination dimension (which considers the similarity of the bounding box distributions) may suddenly drop or even become zero, a regression loss compensation value is introduced to further improve the training accuracy of the model parameters. On this basis, the discrimination result set further includes a third discrimination result; correspondingly, generating the discrimination result set in S1042 based on the real bounding box and the first predicted bounding box corresponding to the first initial bounding box specifically includes: performing bounding box authenticity discrimination on the real bounding box and the first predicted bounding box corresponding to the first initial bounding box to obtain the first discrimination result; calculating a bounding box intersection-over-union (IoU) loss based on the real bounding box and the first predicted bounding box corresponding to the first initial bounding box to obtain the second discrimination result; and calculating, based on the real bounding box and the first predicted bounding box corresponding to the first initial bounding box, a regression loss compensation value for constraining the loss gradient of the regression loss function of the model to be trained, to obtain the third discrimination result.
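The second discrimination result is described above only as a bounding box intersection-over-union loss. A minimal Python sketch of one common way to compute it follows. It assumes boxes are given as (x, y, w, h) with (x, y) the top-left corner, and it assumes the loss is taken as 1 minus the IoU; neither convention is fixed by the text, and the function names are hypothetical.

```python
def iou_xywh(box_a, box_b):
    """IoU of two boxes given as (x, y, w, h).

    Assumption: (x, y) is the top-left corner; the text does not say.
    """
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    # Width and height of the overlap rectangle (zero when the boxes are disjoint).
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def iou_loss(true_box, pred_box):
    """Assumed loss form: 1 - IoU (0 for a perfect match, 1 for no overlap)."""
    return 1.0 - iou_xywh(true_box, pred_box)
```

With this form, a predicted box that exactly matches the real box gives a loss of 0, and the loss grows as the overlap shrinks, which is the behaviour the coordinate-coincidence dimension is meant to capture.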
For each first initial bounding box, the corresponding discrimination result set includes not only the first discrimination result, obtained from the perspective of bounding box distribution similarity, and the second discrimination result, obtained from the perspective of bounding box coordinate coincidence, but also the regression loss compensation value used to constrain the gradient of the regression loss corresponding to the first discrimination dimension. This not only improves the accuracy of the regression loss value but also addresses the problem of the gradient of the regression loss corresponding to the first discrimination dimension suddenly dropping or even becoming zero.
Figure 4a is a schematic diagram of a specific implementation of another target detection model training process, which includes: extracting target regions from a sample image data set in advance using a preset region-of-interest extraction model to obtain N anchor boxes, where the sample image data set includes multiple original sample images, each containing at least one target object; the feature information of each anchor box may include position information (x, y, w, h) and category information c, that is, (x, y, w, h, c); during model training, the parameter dimensions may be set to be mutually independent, so the iterative training of the model parameters for each dimension is also independent.
For each round of model training, m anchor boxes are randomly sampled from the N anchor boxes as first initial bounding boxes, and the real bounding box corresponding to each first initial bounding box is determined. Each target object in the sample image data set may correspond to one real bounding box; for example, if the total number of target objects in the sample image data set is d, the number of real bounding boxes before expansion is d. So that the real bounding boxes correspond to the first predicted bounding boxes, multiple first initial bounding boxes containing the same target object may share the same real bounding box; that is, the real bounding boxes are expanded according to the target object enclosed by each first initial bounding box, yielding m real bounding boxes (m>d). For example, suppose an original sample image contains a cat A as the target object, and cat A corresponds to real bounding box A. If four first initial bounding boxes contain cat A (say, those numbered 6, 7, 8 and 9), real bounding box A is expanded into four copies of real bounding box A (the real bounding boxes numbered 6, 7, 8 and 9).
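The sampling and expansion step can be sketched in Python as follows. This is an illustration under stated assumptions, not the method of the application: the mapping from each anchor box to the target object it encloses (`anchor_to_object`) is assumed to be available in advance, and all names are hypothetical.

```python
import random


def sample_and_expand(anchors, anchor_to_object, true_boxes, m, seed=None):
    """Sample m anchors and pair each with the real box of the object it encloses.

    anchors:          list of N anchor boxes
    anchor_to_object: list of N object ids (hypothetical pre-computed assignment)
    true_boxes:       dict mapping object id -> real bounding box (d entries)
    """
    rng = random.Random(seed)
    picked = rng.sample(range(len(anchors)), m)  # m distinct anchor indices
    initial_boxes = [anchors[k] for k in picked]
    # Anchors that enclose the same object repeat the same real bounding box,
    # so the d real boxes are expanded to m entries aligned with the samples.
    expanded_true_boxes = [true_boxes[anchor_to_object[k]] for k in picked]
    return initial_boxes, expanded_true_boxes
```

In the cat A example, every sampled anchor whose object id is cat A receives the same real bounding box A, so the two returned lists have equal length and can be compared index by index.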
For each first initial bounding box, the generative sub-model performs bounding box prediction based on that first initial bounding box to obtain a first predicted bounding box, and the discriminative sub-model generates a discrimination result set based on the corresponding real bounding box and first predicted bounding box. Each first initial bounding box corresponds to one real bounding box and one first predicted bounding box, the latter being predicted by the generative sub-model as it continually learns bounding box regression. Among the m first predicted bounding boxes output by the generative sub-model, those numbered 6, 7, 8 and 9 enclose cat A as the target object.
For each first initial bounding box, a first regression loss component is determined based on the first discrimination result in its discrimination result set, a second regression loss component is determined based on the second discrimination result in its discrimination result set, and a third regression loss component is determined based on the third discrimination result in its discrimination result set.
The regression loss value of the model to be trained is determined based on the first, second and third regression loss components corresponding to each first initial bounding box; the model parameters of the generative sub-model and the discriminative sub-model are then adjusted by stochastic gradient descent based on that regression loss value, yielding the updated generative sub-model and discriminative sub-model.
If the current model training result satisfies the preset model training end condition, the updated generative sub-model is taken as the trained target detection model.
If the current model training result does not satisfy the preset model training end condition, the updated generative sub-model and discriminative sub-model are taken as the model to be trained in the next round of model training, until the preset model training end condition is satisfied.
During model training, in each round the model parameters of the discriminative sub-model and those of the generative sub-model may both be adjusted at the same time based on the discrimination result set. In a specific implementation, however, to improve the training accuracy of the model parameters of the generative sub-model, each round first adjusts the model parameters of the discriminative sub-model t times in a loop based on the discrimination result set and then adjusts the model parameters of the generative sub-model once; the resulting discriminative sub-model and generative sub-model serve as the model to be trained in the next round.
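The update schedule described above (t adjustments of the discriminative sub-model, then one adjustment of the generative sub-model, repeated until the end condition holds) can be sketched as a small Python driver. The callbacks `update_discriminator`, `update_generator` and `is_finished`, as well as the `max_rounds` safety cap, are hypothetical placeholders; the actual update rules and end condition are not specified here.

```python
def train_round(update_discriminator, update_generator, t):
    """One training round: t discriminator updates followed by one generator update."""
    for _ in range(t):
        update_discriminator()
    update_generator()


def train(update_discriminator, update_generator, is_finished, t, max_rounds=1000):
    """Repeat rounds until the (hypothetical) end condition is_finished() holds.

    max_rounds is an added safety cap, not part of the described method.
    Returns the number of rounds that were run.
    """
    rounds = 0
    while rounds < max_rounds:
        train_round(update_discriminator, update_generator, t)
        rounds += 1
        if is_finished():
            break
    return rounds
```

Updating the discriminator several times per generator step is a common choice in adversarial training, since the generator's update is only as informative as the discriminator's current judgment.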
The regression loss value of the model to be trained is jointly determined by the sub-regression loss values corresponding to the multiple first initial bounding boxes, and the sub-regression loss value corresponding to each first initial bounding box is jointly determined by multiple regression loss components. On this basis, S1044, determining the regression loss value of the model to be trained based on the first discrimination result and the second discrimination result corresponding to each first initial bounding box, specifically includes: determining the sub-regression loss value corresponding to each first initial bounding box, where the sub-regression loss value corresponding to each first initial bounding box is determined based on target information, and the target information includes one or a combination of the following: the similarity of the bounding box distributions characterized by the first discrimination result corresponding to the first initial bounding box, the degree of coincidence of the bounding box coordinates characterized by the second discrimination result, and the regression loss compensation value characterized by the third discrimination result; and determining the regression loss value of the model to be trained based on the sub-regression loss values corresponding to the first initial bounding boxes.
In a specific implementation, when determining the sub-regression loss value corresponding to a first initial bounding box, one may consider only the first regression loss component corresponding to the first discrimination result; or both the first regression loss component corresponding to the first discrimination result and the second regression loss component corresponding to the second discrimination result; or the first regression loss component, the second regression loss component, and the regression loss compensation component corresponding to the third discrimination result together. Taking the case that includes the loss compensation component as an example, for each first initial bounding box the corresponding sub-regression loss value equals the weighted sum of the three regression loss components, which can be expressed as:

$$V_i(D,G)=\lambda_1 V_{i1}+\lambda_2 V_{i2}+\lambda_3 V_{i3}$$
where $\lambda_1$ denotes the first weight coefficient, corresponding to the first regression loss component under the first discrimination dimension; $V_{i1}$ denotes the first regression loss component under the first discrimination dimension (the regression loss component corresponding to the similarity of the bounding box distributions characterized by the first discrimination result); $\lambda_2$ denotes the second weight coefficient, corresponding to the second regression loss component under the second discrimination dimension; $V_{i2}$ denotes the second regression loss component under the second discrimination dimension (the regression loss component corresponding to the degree of coincidence of the bounding box coordinates characterized by the second discrimination result); $\lambda_3$ denotes the third weight coefficient, corresponding to the regression loss compensation value; and $V_{i3}$ denotes the regression loss compensation value (the third regression loss component). The first discrimination dimension may be the regression loss discrimination dimension based on the similarity of the bounding box distributions, and the second discrimination dimension may be the regression loss discrimination dimension based on the degree of coincidence of the bounding box coordinates.
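The weighted sum can be written directly in Python. In the sketch below, `sub_regression_loss` follows the formula term by term; `regression_loss` additionally aggregates the per-box values by averaging, which is an assumption, since the text says only that the overall regression loss value is determined from the sub-regression loss values. The default weights of 1.0 are likewise illustrative.

```python
def sub_regression_loss(v1, v2, v3, lam1=1.0, lam2=1.0, lam3=1.0):
    """V_i = lam1 * V_i1 + lam2 * V_i2 + lam3 * V_i3 for one first initial bounding box."""
    return lam1 * v1 + lam2 * v2 + lam3 * v3


def regression_loss(components, lam1=1.0, lam2=1.0, lam3=1.0):
    """Aggregate per-box sub-regression losses.

    components: non-empty list of (V_i1, V_i2, V_i3) tuples, one per box.
    Assumption: the aggregate is the mean; the text does not fix the rule.
    """
    losses = [sub_regression_loss(v1, v2, v3, lam1, lam2, lam3)
              for v1, v2, v3 in components]
    return sum(losses) / len(losses)
```

Setting `lam2` or `lam3` to zero recovers the simpler variants mentioned earlier, in which only the first component, or only the first two components, are considered.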
In a specific implementation, the first weight coefficient and the second weight coefficient may be kept the same for all first initial bounding boxes. However, the first and second regression loss components correspond to different regression loss discrimination dimensions (one based on the similarity of the bounding box distributions, the other based on the degree of coincidence of the bounding box coordinates), and the two dimensions emphasize different aspects of the regression loss: the distribution-similarity dimension focuses on the regression loss of first initial bounding boxes whose real bounding boxes have blurred edges, while the coordinate-coincidence dimension focuses on the regression loss of first initial bounding boxes whose distributions are similar but whose specific positions deviate. The relative magnitude of the first and second regression loss components therefore indicates, to some extent, which dimension characterizes the regression loss between the real bounding box and the first predicted bounding box more accurately. On this basis, for each first initial bounding box, the first and second weight coefficients are adjusted according to the relative magnitude of the corresponding first and second regression loss components: if the absolute value of the difference between the first and second regression loss components is not greater than a preset loss threshold, both weight coefficients remain unchanged; if the absolute value of the difference is greater than the preset loss threshold and the first regression loss component is greater than the second, the first weight coefficient is increased according to a first preset adjustment mode; if the absolute value of the difference is greater than the preset loss threshold and the first regression loss component is smaller than the second, the second weight coefficient is increased according to a second preset adjustment mode. In this way, for each first initial bounding box, training relies mainly on the regression loss component of the discrimination dimension that better reflects the bounding box regression loss, which further improves the accuracy of model parameter optimization.
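The three-way rule for adjusting the weight coefficients can be sketched as follows. The text does not define the first and second preset adjustment modes, so the additive increments `step1` and `step2` below are assumptions chosen purely for illustration, as is the function name.

```python
def adjust_weights(v1, v2, lam1, lam2, threshold, step1=0.1, step2=0.1):
    """Return (lam1, lam2) after applying the threshold rule to one box.

    v1, v2:    first and second regression loss components of the box
    threshold: preset loss threshold
    step1/2:   hypothetical additive increments standing in for the
               first and second preset adjustment modes
    """
    diff = v1 - v2
    if abs(diff) <= threshold:
        return lam1, lam2            # components are close: keep both weights
    if diff > 0:
        return lam1 + step1, lam2    # first component dominates: raise lam1
    return lam1, lam2 + step2        # second component dominates: raise lam2
```

Because the rule is applied per box, two boxes in the same round may end up with different weight pairs, which matches the per-box emphasis described above.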
It should be noted that the increase in the first weight coefficient under the first preset adjustment mode and the increase in the second weight coefficient under the second preset adjustment mode may be the same or different. The size of the increase can be set according to actual needs, and this application does not limit it.
As to obtaining the first discrimination result from the discrimination dimension of bounding box distribution similarity, performing bounding box authenticity discrimination on the real bounding box and the first predicted bounding box corresponding to the first initial bounding box to obtain the first discrimination result specifically includes:
Step A1: based on the real bounding box corresponding to the first initial bounding box, determine a first discrimination probability that the real bounding box is predicted to be real by the discriminative sub-model; and based on the first predicted bounding box corresponding to the first initial bounding box, determine a second discrimination probability that the first predicted bounding box is predicted to be fake by the discriminative sub-model.
Step A2: generate the first discrimination result corresponding to the first initial bounding box based on the first discrimination probability and the second discrimination probability corresponding to that first initial bounding box.
For each first initial bounding box, the discriminative sub-model determines the probability that the corresponding real bounding box comes from real data; that is, the discriminative sub-model performs authenticity discrimination on the real bounding box and obtains the first discrimination probability that the real bounding box is predicted to be real data. Likewise, for each first initial bounding box, the discriminative sub-model determines the probability that the corresponding first predicted bounding box comes from generated data (that is, 1 minus the probability, as judged by the discriminative sub-model, that the first predicted bounding box comes from real data); in other words, the discriminative sub-model performs authenticity discrimination on the first predicted bounding box and obtains the second discrimination probability that the first predicted bounding box is predicted to be generated data.
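The relationship between the discriminator's outputs and the two discrimination probabilities can be made concrete with a short Python sketch. It assumes the discriminative sub-model returns, for any box, a single probability in [0, 1] that the box comes from real data; the function and argument names are hypothetical.

```python
def discrimination_probabilities(d_real, d_pred):
    """Map discriminator outputs to the first and second discrimination probabilities.

    d_real: probability, per the discriminator, that the real box is real data
    d_pred: probability, per the discriminator, that the predicted box is real data
    """
    for p in (d_real, d_pred):
        if not 0.0 <= p <= 1.0:
            raise ValueError("discriminator outputs must lie in [0, 1]")
    p_i1 = d_real          # real box judged to be real
    p_i2 = 1.0 - d_pred    # predicted box judged to be generated (fake)
    return p_i1, p_i2
```

For example, if the discriminator assigns 0.9 to the real box and 0.25 to the predicted box, the first discrimination probability is 0.9 and the second is 0.75.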
From the perspective of bounding box distribution similarity, the discriminative sub-model compares the first probability distribution, corresponding to the real bounding box, with the second probability distribution, corresponding to the first predicted bounding box, in order to perform authenticity discrimination on the real bounding box and the first predicted bounding box and obtain the corresponding discrimination probabilities. These probabilities characterize the distribution similarity between the real bounding box and the corresponding first predicted bounding box. Therefore, once the first and second discrimination probabilities are determined, the first discrimination result, which characterizes the similarity of the bounding box distributions, is obtained. Based on the first discrimination result, the first regression loss component corresponding to the discrimination dimension of bounding box distribution similarity can then be determined: the larger the first and second discrimination probabilities, the lower the distribution similarity between the real bounding box corresponding to the first initial bounding box and the corresponding first predicted bounding box, and hence the larger the first regression loss component corresponding to the first initial bounding box. The model parameters of the generative sub-model are then updated based on the first regression loss component, so that the outputs of the generative sub-model, after being judged by the discriminative sub-model, reduce the loss value of the model to be trained. This optimizes the generative sub-model and improves its bounding box predictions.
Further, in order to improve the accuracy of the first discrimination result corresponding to each first initial bounding box, and thus the accuracy of the first regression loss component corresponding to the discrimination dimension of bounding box distribution similarity when the sub-regression loss value is determined from the first discrimination result, step A2, generating the first discrimination result based on the first discrimination probability and the second discrimination probability corresponding to the first initial bounding box, specifically includes:
Step A21: determine a first weighted probability based on the first discrimination probability and a first prior probability of the real bounding box corresponding to the first initial bounding box; and determine a second weighted probability based on the second discrimination probability and a second prior probability of the first initial bounding box.
Step A22: generate the first discrimination result based on the first weighted probability and the second weighted probability corresponding to the first initial bounding box.
In determining the first discrimination result, which characterizes the similarity of the bounding box distributions, the first prior probability of the real bounding box and the second prior probability of the first initial bounding box are used to weight, respectively, the first discrimination probability and the second discrimination probability obtained when the discriminative sub-model performs authenticity discrimination on the real bounding box and the first predicted bounding box, so as to determine the first discrimination result (that is, the first discrimination result may include the first weighted probability and the second weighted probability). The first regression loss component related to bounding box distribution similarity, obtained from the first discrimination result, can therefore be expressed as:

[Equation for the first regression loss component not reproduced in the source text.]
The quantities in this expression are: the prior probability that the i-th real bounding box appears (the first prior probability); P_i1, the first discrimination probability that the discriminant sub-model predicts the i-th real bounding box to be real; the prior probability that the i-th first initial bounding box appears (the second prior probability); and P_i2, the second discrimination probability that the discriminant sub-model predicts the i-th first predicted bounding box to be fake.
It should be noted that, in a specific implementation, the second prior probability may be the prior probability that the i-th first original bounding box appears. Because the first predicted bounding box is obtained by the generation sub-model performing bounding box prediction on the first original bounding box, the second prior probability may also be the prior probability that the i-th first predicted bounding box appears.
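Steps A21 and A22 can be illustrated with a minimal Python sketch. It assumes that weighting means multiplying each discrimination probability by its prior, which the text does not state explicitly; the function name and argument names are hypothetical and chosen only for illustration.

```python
def first_discrimination_result(p_real_is_real, p_pred_is_fake,
                                prior_real, prior_initial):
    """Return (first weighted probability, second weighted probability).

    Assumption: each discrimination probability is weighted by multiplying
    it with its prior. All names here are hypothetical.
    """
    for name, p in (("p_real_is_real", p_real_is_real),
                    ("p_pred_is_fake", p_pred_is_fake)):
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"{name} must lie in [0, 1], got {p}")
    if prior_real < 0.0 or prior_initial < 0.0:
        raise ValueError("prior values must be non-negative")
    # Step A21: weight each discrimination probability by its prior.
    first_weighted = prior_real * p_real_is_real
    second_weighted = prior_initial * p_pred_is_fake
    # Step A22: the first discrimination result carries both weighted values.
    return first_weighted, second_weighted
```

For example, `first_discrimination_result(0.8, 0.6, 0.5, 0.25)` weights a first discrimination probability of 0.8 by a prior of 0.5 and a second discrimination probability of 0.6 by a prior of 0.25.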
Because the probabilities with which real bounding boxes and predicted bounding boxes appear each follow some probability distribution, such as a Gaussian distribution, the first prior probability and the second prior probability can be obtained as follows:
For the first prior probability, the quantities are: the real bounding box corresponding to the first initial bounding box with serial number i; σ_1, the variance of the probability distribution of the first preset number of real bounding boxes; and the mean of that probability distribution.
For the second prior probability, the quantities are: the first initial bounding box with serial number i; σ_2, the variance of the probability distribution of the first preset number of first initial bounding boxes; and the mean of that probability distribution.
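As a sketch of how such priors could be evaluated, the following Python code computes a one-dimensional Gaussian density. It assumes the standard Gaussian form and represents a bounding box by a single scalar feature (for example one coordinate); both are assumptions, as are the hypothetical function names, because the formulas themselves are not reproduced in this text.

```python
import math
import statistics


def fit_gaussian(values):
    """Estimate the mean and (population) variance of a batch of box features."""
    mean = statistics.fmean(values)
    variance = statistics.pvariance(values, mu=mean)
    return mean, variance


def gaussian_prior(x, mean, variance):
    """Standard Gaussian density at x; `variance` plays the role of sigma."""
    if variance <= 0.0:
        raise ValueError("variance must be positive")
    exponent = -((x - mean) ** 2) / (2.0 * variance)
    return math.exp(exponent) / math.sqrt(2.0 * math.pi * variance)
```

The mean and variance would be fitted once over the first preset number of real bounding boxes for the first prior probability, and once over the first initial bounding boxes for the second prior probability.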
The regression loss value equals the sum of the sub-regression loss values corresponding to the first preset number of first initial bounding boxes, and can be expressed as:
where N_reg denotes the first preset number and i denotes the serial number of a first initial bounding box, with i ranging from 1 to N_reg.
For the process of obtaining the second discrimination result from the discrimination dimension of bounding box coordinate coincidence, calculating the bounding box intersection-over-union (IoU) loss based on the real bounding box and the first predicted bounding box corresponding to the first initial bounding box, to obtain the second discrimination result, specifically includes:
Step B1: calculate the bounding box intersection-over-union (IoU) loss between the real bounding box corresponding to the first initial bounding box and the first predicted bounding box corresponding to the first initial bounding box, to obtain a first IoU loss.
Taking the first initial bounding box with serial number i as an example, the IoU loss between the real bounding box with serial number i and the first predicted bounding box with serial number i is calculated to obtain the first IoU loss corresponding to the first initial bounding box with serial number i.
Step B2: determine, based on the first IoU loss, the second discrimination result corresponding to the first initial bounding box.
Because the magnitude of the IoU loss between two bounding boxes characterizes how closely their coordinates coincide, the second discrimination result can be obtained from the IoU loss between the real bounding box and the first predicted bounding box. The second regression loss component, which corresponds to the discrimination dimension of bounding box coordinate coincidence, is then determined from the second discrimination result, which drives the model to perform bounding box regression learning.
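Step B1 can be sketched in Python as follows. The sketch assumes axis-aligned boxes given as (x1, y1, x2, y2) and defines the IoU loss as 1 minus the IoU, a common convention that the text does not confirm; the function names are hypothetical.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    intersection = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


def iou_loss(real_box, predicted_box):
    """Assumed IoU loss: 0 for identical boxes, 1 for disjoint boxes."""
    return 1.0 - iou(real_box, predicted_box)
```

Under this convention a smaller loss means that the coordinates of the two boxes coincide more closely.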
When determining the second discrimination result, it is possible to consider only the first IoU loss between a real bounding box and its own corresponding first predicted bounding box. However, to determine the second discrimination result more accurately, and thereby improve the accuracy of the second regression loss component (corresponding to the discrimination dimension of bounding box coordinate coincidence) and of the regression loss value used to adjust the model parameters, the method considers not only the first IoU loss between the real bounding box and its own first predicted bounding box but also the second IoU loss between the real bounding box and other first predicted bounding boxes. In this way the real bounding box is compared, on the discrimination dimension of bounding box coordinate coincidence, with a positive sample (the first predicted bounding box that bounding box regression learning produces for that real bounding box) and with negative samples (the first predicted bounding boxes that bounding box regression learning produces for the other real bounding boxes), so that the model learns the specific position representation of the real bounding box and performs bounding box regression learning better. On this basis, step B2 above, determining the second discrimination result corresponding to the first initial bounding box based on the first IoU loss, specifically includes:
B21: determine a comparison bounding box set among the first predicted bounding boxes respectively corresponding to the first preset number of first initial bounding boxes.
The comparison bounding box set includes the first predicted bounding boxes other than the one corresponding to the first initial bounding box, or the other first predicted bounding boxes that do not contain the target object enclosed by the first initial bounding box.
Still taking the first initial bounding box with serial number i as an example, the comparison bounding box set may include every first predicted bounding box other than the one with serial number i (that is, the first predicted bounding boxes with serial number k, where k≠p and p=i); in other words, all first predicted bounding boxes other than the one with serial number i serve as negative samples for the real bounding box with serial number i. To select negative samples more accurately, the comparison bounding box set may instead include only those other first predicted bounding boxes that do not contain the target object enclosed by the first initial bounding box with serial number i (that is, the first predicted bounding boxes with serial number k, where k≠p and p=i or p=j, the first predicted bounding box with serial number j enclosing the same target object as the first initial bounding box with serial number i); in other words, only the other first predicted bounding boxes that contain a different target object from the first initial bounding box with serial number i serve as negative samples for the real bounding box with serial number i.
B22: calculate the bounding box IoU loss between the real bounding box corresponding to the first initial bounding box and each of the other first predicted bounding boxes, to obtain second IoU losses.
Still taking the first initial bounding box with serial number i as an example, for each other first predicted bounding box in the comparison bounding box set, the IoU loss between the real bounding box with serial number i and the first predicted bounding box with serial number k is calculated to obtain the second IoU loss corresponding to the first predicted bounding box with serial number k.
B23: determine, based on the first IoU loss and the second IoU losses, the second discrimination result corresponding to the first initial bounding box.
In determining the second discrimination result, which characterizes bounding box coordinate coincidence, the first IoU loss is calculated from the real bounding box with serial number i and the first predicted bounding box with serial number i, and the second IoU losses are calculated from the real bounding box with serial number i and the first predicted bounding boxes with serial number k (k≠p), to determine the second discrimination result (that is, the second discrimination result may include the first IoU loss and the second IoU losses). The second regression loss component, which relates to bounding box coordinate coincidence, is then determined from the second discrimination result. Adjusting the model parameters based on this component makes the coordinates of the real bounding box with serial number i coincide more closely with those of the first predicted bounding box with serial number i and less closely with those of the other first predicted bounding boxes, which makes bounding box regression learning more global and further improves its accuracy.
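The two ways of building the comparison bounding box set in step B21 can be sketched in Python. The sketch uses 0-based indices rather than the 1-based serial numbers of the text, and it assumes that each first initial bounding box carries an identifier of the target object it encloses; `object_ids` and the function name are hypothetical.

```python
def comparison_set(i, num_boxes, object_ids=None):
    """Return the indices k of the predicted boxes used as negative samples.

    Without object_ids: every predicted box except box i.
    With object_ids (hypothetical per-box target identifiers): only boxes
    whose target object differs from that of box i.
    """
    if not 0 <= i < num_boxes:
        raise IndexError("i is out of range")
    if object_ids is None:
        return [k for k in range(num_boxes) if k != i]
    if len(object_ids) != num_boxes:
        raise ValueError("object_ids must have one entry per box")
    return [k for k in range(num_boxes) if object_ids[k] != object_ids[i]]
```

With four boxes enclosing the objects `["cat", "dog", "dog", "bird"]`, the first variant gives boxes 0, 2 and 3 as negatives for box 1, while the second variant drops box 2 because it encloses the same object as box 1.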
The second regression loss component is the logarithm of a target IoU loss, where the target IoU loss is the quotient of the exponential of the first IoU loss and the sum of the exponentials of the multiple second IoU losses. Taking p=i as an example, the second regression loss component can be expressed as:
The quantities in this expression are: the real bounding box corresponding to the first initial bounding box with serial number i; the first initial bounding box with serial number i; the first predicted bounding box corresponding to the first initial bounding box with serial number i; the first IoU loss; the first initial bounding box with serial number k; the first predicted bounding box corresponding to the first initial bounding box with serial number k; the second IoU loss; θ_g, the model parameters of the generation sub-model; and ω, a preset adjustment factor.
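The verbal description of the second regression loss component can be sketched in Python. Because the formula itself is not reproduced in this text, the sketch rests on two assumptions: the adjustment factor ω divides each IoU loss before exponentiation (like a temperature), and the denominator contains only the second IoU losses, as the wording suggests. The function name is hypothetical.

```python
import math


def second_regression_loss(first_iou_loss, second_iou_losses, omega=1.0):
    """log( exp(l1 / omega) / sum_k exp(l2_k / omega) ), computed stably.

    Assumptions: omega acts as a temperature, and the denominator sums
    only over the second IoU losses.
    """
    if omega <= 0.0:
        raise ValueError("omega must be positive")
    if not second_iou_losses:
        raise ValueError("at least one second IoU loss is required")
    scaled = [loss / omega for loss in second_iou_losses]
    peak = max(scaled)  # subtract the maximum to avoid overflow in exp
    log_denominator = peak + math.log(sum(math.exp(s - peak) for s in scaled))
    return first_iou_loss / omega - log_denominator
```

Under these assumptions the value decreases when the first IoU loss shrinks or when the second IoU losses grow, which matches the stated aim of pulling the real bounding box toward its own predicted box and away from the others.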
For the process of determining the regression loss compensation value, calculating, based on the real bounding box and the first predicted bounding box corresponding to the first initial bounding box, the regression loss compensation value used to constrain the loss gradient of the regression loss function of the model to be trained specifically includes:
Step C1: generate a synthetic bounding box corresponding to the first initial bounding box based on the real bounding box and the first predicted bounding box corresponding to the first initial bounding box.
Taking the first initial bounding box with serial number i as an example, a sampled coordinate information set is determined, according to a preset coordinate information sampling method, from the first coordinate information set of the real bounding box with serial number i and the second coordinate information set of the first predicted bounding box with serial number i; the synthetic bounding box with serial number i is then determined from the sampled coordinate information set.
Step C2: determine the regression loss compensation value based on the bounding box distribution similarity between the synthetic bounding box corresponding to the first initial bounding box and the real bounding box.
After the synthetic bounding box corresponding to the first initial bounding box with serial number i is determined, the bounding box distribution similarity between the synthetic bounding box with serial number i and the real bounding box with serial number i is calculated; the compensation gradient of this bounding box distribution similarity with respect to the synthetic bounding box is then calculated; and the regression loss compensation value corresponding to the first initial bounding box with serial number i is determined from the matrix 2-norm of the compensation gradient.
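Step C2 can be illustrated with a small Python sketch. It approximates the compensation gradient by central finite differences and returns its 2-norm. The text does not say how the norm is turned into the compensation value, so returning the norm itself is an assumption, and `toy_similarity` is a hypothetical stand-in for the similarity produced by the discriminant sub-model (in practice the gradient would normally come from automatic differentiation).

```python
import math


def numerical_gradient(func, point, eps=1e-6):
    """Central finite-difference gradient of func at point (a list of floats)."""
    grad = []
    for idx in range(len(point)):
        up, down = list(point), list(point)
        up[idx] += eps
        down[idx] -= eps
        grad.append((func(up) - func(down)) / (2.0 * eps))
    return grad


def regression_loss_compensation(similarity, synthetic_box, real_box):
    """2-norm of the gradient of the similarity w.r.t. the synthetic box."""
    grad = numerical_gradient(lambda box: similarity(box, real_box),
                              list(synthetic_box))
    return math.sqrt(sum(g * g for g in grad))


def toy_similarity(box, real_box):
    """Hypothetical similarity: negative squared distance between boxes."""
    return -sum((b - r) ** 2 for b, r in zip(box, real_box))
```

With `toy_similarity`, a synthetic box `[3, 0, 0, 0]` and a real box `[0, 0, 0, 4]`, the gradient is approximately `(-6, 0, 0, 8)` and its 2-norm is approximately 10.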
For the process of determining the synthetic bounding box corresponding to a given first initial bounding box, step C1 above, generating the synthetic bounding box corresponding to the first initial bounding box based on the real bounding box and the first predicted bounding box corresponding to the first initial bounding box, specifically includes:
C11: determine a first coordinate information subset based on a first sampling ratio and the first coordinate information set of the real bounding box corresponding to the first initial bounding box.
C12: determine a second coordinate information subset based on a second sampling ratio and the second coordinate information set of the first predicted bounding box corresponding to the first initial bounding box. It should be noted that the first sampling ratio and the second sampling ratio may be preset according to the actual situation, and that their sum equals 1.
C13: generate the synthetic bounding box corresponding to the first initial bounding box based on the first coordinate information subset and the second coordinate information subset.
Still taking the first initial bounding box with serial number i as an example, a first coordinate information subset is obtained by random sampling, at the first sampling ratio, from the first coordinate information set of the real bounding box with serial number i; and a second coordinate information subset is obtained by random sampling, at the second sampling ratio, from the second coordinate information set of the first predicted bounding box with serial number i. The combination of the two subsets is taken as the sampled coordinate information set, and the bounding box drawn from the sampled coordinate information set is the synthetic bounding box with serial number i. Because the synthetic bounding box is obtained by randomly sampling and mixing the coordinate information of the real bounding box with serial number i (real data) and that of the first predicted bounding box with serial number i (generated data), part of its coordinate information comes from real data and the rest from generated data; that is, the synthetic bounding box is jointly determined by real data and generated data and has a degree of randomness. As a result, when the gradient of the regression loss corresponding to the first discrimination dimension suddenly decreases or even becomes zero, the gradient of the regression loss value can be compensated. This avoids a sudden drop in the gradient of the regression loss value during model training caused by the gradient for the first discrimination dimension dropping or vanishing, and further improves the training accuracy of the model parameters.
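Steps C11 to C13 can be sketched in Python as follows. The sketch assumes that each box is a list of coordinates and that the two subsets cover complementary coordinate positions, so that the combination forms a complete box; the text only requires that the two sampling ratios sum to 1. The function name and the `rng` argument are hypothetical.

```python
import random


def synthetic_box(real_box, predicted_box, first_ratio=0.5, rng=None):
    """Mix coordinates of the real and predicted boxes at random.

    first_ratio is the first sampling ratio; the second sampling ratio is
    1 - first_ratio. Assumption: positions not drawn from the real box are
    taken from the predicted box.
    """
    if len(real_box) != len(predicted_box):
        raise ValueError("boxes must have the same number of coordinates")
    if not 0.0 <= first_ratio <= 1.0:
        raise ValueError("first_ratio must lie in [0, 1]")
    rng = rng or random.Random()
    n = len(real_box)
    num_from_real = round(first_ratio * n)
    # C11: positions whose coordinates come from the real box.
    real_positions = set(rng.sample(range(n), num_from_real))
    # C12 and C13: the remaining positions come from the predicted box.
    return [real_box[k] if k in real_positions else predicted_box[k]
            for k in range(n)]
```

With four coordinates and a first sampling ratio of 0.5, two coordinates of the synthetic box come from the real box and two from the predicted box, chosen at random on each call.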
In target detection, the target detection model must determine not only where the target object is located but also the specific category of the target object. During training of the target detection model, category recognition may therefore be inaccurate for some first initial bounding boxes. For a first initial bounding box whose category prediction is inaccurate, the corresponding first predicted bounding box may not truly reflect the bounding box prediction accuracy of the generation sub-model, and consequently the discriminant sub-model's discrimination result for the first predicted bounding box and the real bounding box corresponding to such a first original bounding box cannot truly reflect that accuracy either. Therefore, to further improve the accuracy of the regression loss value, the first predicted category corresponding to the first predicted bounding box is taken into account when determining the sub-regression loss value corresponding to the first predicted bounding box: the sub-regression loss value is considered only if the real category corresponding to the first predicted bounding box matches the first predicted category; otherwise only the corresponding sub-classification loss value is considered. That is, the sub-regression loss values of first initial bounding boxes whose category prediction results do not meet the preset requirement are excluded. On this basis, the model to be trained further includes a classification sub-model, and the specific implementation of each round of model training further includes: the classification sub-model classifies the first initial bounding box or the first predicted bounding box to obtain a first predicted category. In a specific implementation, the classification sub-model performs category prediction on the first initial bounding box or the first predicted bounding box, and its output may be a first category prediction result. The first category prediction result includes the predicted probabilities that the target object enclosed by the first initial bounding box or the first predicted bounding box belongs to each candidate category, and the candidate category with the highest predicted probability is the first predicted category; that is, the classification sub-model predicts the category of the target object enclosed by the first initial bounding box or the first predicted bounding box (equivalently, the target object category of the image region within that bounding box) to be the first predicted category. In addition, because the position information of the first initial bounding box and that of the first predicted bounding box do not differ greatly, the image features within the two bounding boxes do not differ greatly either, so recognition of the target object category of the image region within the bounding box is not affected. On this basis, when bounding box prediction and category prediction are executed sequentially, the first predicted bounding box may be input into the classification sub-model for category prediction to obtain the corresponding first category prediction result; that is, the first predicted bounding box is first predicted from the first initial bounding box, and category prediction is then performed on the first predicted bounding box. When bounding box prediction and category prediction are executed simultaneously, the first initial bounding box may instead be input into the classification sub-model for category prediction; that is, the first predicted bounding box is predicted from the first initial bounding box while category prediction is performed on the first initial bounding box to obtain the first category prediction result.
It should be noted that the iterative training of the model parameters of the classification sub-model can follow existing classification model training procedures and is not described again here.
The target information further includes the matching relationship between the first predicted category corresponding to the first initial bounding box and the real category of the first initial bounding box. In determining the sub-regression loss value corresponding to each first initial bounding box: if the first predicted category corresponding to the first initial bounding box does not match the real category, the sub-regression loss value corresponding to that first initial bounding box is zero; if the first predicted category matches the real category, the sub-regression loss value is determined from at least one of the first regression loss component corresponding to the bounding box distribution similarity, the second regression loss component corresponding to the bounding box coordinate coincidence, and the regression loss compensation value.
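The gating rule above can be sketched in Python. The text says only that the sub-regression loss value is determined from at least one of the three quantities, so adding together whichever quantities are supplied is an assumption; the function and argument names are hypothetical.

```python
def sub_regression_loss(category_matches, first_component=None,
                        second_component=None, compensation=None):
    """Sub-regression loss value of one first initial bounding box.

    Returns 0.0 when the predicted category does not match the real one.
    Otherwise sums the supplied terms (summation is an assumption).
    """
    if not category_matches:
        return 0.0
    terms = [t for t in (first_component, second_component, compensation)
             if t is not None]
    if not terms:
        raise ValueError("at least one loss term is required")
    return sum(terms)
```

The regression loss value is then the sum of these per-box values over the first preset number of first initial bounding boxes, so boxes with mismatched categories contribute nothing to it.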
The preset category matching constraint used to determine whether the first predicted category corresponding to the first initial bounding box matches the real category may be related to the first category prediction result, and may be a single-mode matching constraint or a varying-mode matching constraint. Under a single-mode matching constraint, the category matching constraint used in every round of model training stays the same (it is independent of the current training round); for example, in every round, the first predicted category corresponding to the first initial bounding box is determined to match the real category if the two are the same. Under a varying-mode matching constraint, the category matching constraint used in each round depends on the current training round; varying-mode constraints can be further divided into staged category matching constraints and gradual category matching constraints.
A staged category matching constraint may require that the real category and the first predicted category belong to the same category group when the current training round is less than a first preset round, and that the real category and the first predicted category be the same when the current training round is greater than or equal to the first preset round; a staged category matching constraint can thus be implemented from this condition and the category prediction result corresponding to the first initial bounding box. A gradual category matching constraint may require that the sum of a first constraint term and a second constraint term be greater than a preset probability threshold, where the first constraint term is the first predicted probability, i.e. the probability in the category prediction probability subset that corresponds to the real category, and the second constraint term is the product of a preset adjustment factor and the sum of the second predicted probabilities, i.e. the probabilities in the subset other than the first predicted probability; the preset adjustment factor gradually decreases as the current training round increases. A gradual category matching constraint can thus be implemented from this condition and the category prediction result corresponding to the first initial bounding box. The category prediction probability subset is determined from the category prediction result corresponding to the first initial bounding box. It includes the first predicted probability that the target object enclosed by the first predicted bounding box belongs to the real category and the second predicted probabilities that it belongs to the non-real categories in the target group; that is, it contains the probabilities, obtained by the classification sub-model performing category prediction on the first initial bounding box or the first predicted bounding box, for the real category in the target group (the first predicted probability) and for the non-real categories in the target group, i.e. the candidate categories in the target group other than the real category (the second predicted probabilities). The target group is the category group to which the real category belongs. In a specific implementation, multiple candidate categories associated with the target detection task are determined in advance and are divided into groups based on the semantic information of each candidate category to obtain multiple category groups.
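The staged and gradual constraints can be sketched in Python. The sketch assumes that category groups are given as a mapping from category to group name and that the adjustment factor decays linearly with the training round; the text only states that the factor decreases gradually. All names, including `group_of` and `linear_adjustment_factor`, are hypothetical.

```python
def staged_match(real_category, predicted_category, current_round,
                 first_preset_round, group_of):
    """Same group suffices early on; exact equality is required later."""
    if current_round < first_preset_round:
        return group_of[real_category] == group_of[predicted_category]
    return real_category == predicted_category


def linear_adjustment_factor(current_round, total_rounds):
    """Hypothetical schedule: decays linearly from 1 to 0 over training."""
    return max(0.0, 1.0 - current_round / total_rounds)


def gradual_match(probabilities, real_category, group_of,
                  adjustment_factor, threshold):
    """First term + factor * (sum of second probabilities) > threshold."""
    target_group = group_of[real_category]
    first_term = probabilities[real_category]
    second_sum = sum(p for category, p in probabilities.items()
                     if category != real_category
                     and group_of[category] == target_group)
    return first_term + adjustment_factor * second_sum > threshold
```

With groups `{"cat": "animal", "dog": "animal", "car": "vehicle"}`, predicting "dog" for a real "cat" passes the staged check before the first preset round and fails it afterwards; the gradual check behaves similarly as the adjustment factor shrinks toward zero.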
Considering that the first initial bounding box is obtained by extracting regions of interest with the preset region-of-interest extraction model, the region of the target object enclosed by the first initial bounding box may not be accurate enough, so that in the early stage of model training the category of the first predicted bounding box corresponding to such a first initial bounding box may be identified inaccurately. For this reason, in determining the sub-regression loss value corresponding to the first initial bounding box, reference is made to the matching relationship between the first predicted category corresponding to the first initial bounding box and the real category of the first initial bounding box; that is, based on the above preset category matching constraint, a matching relationship is determined that indicates whether the first predicted category corresponding to the first initial bounding box matches the real category.
Further, the classification sub-model may be pre-trained, or its model parameters may be trained synchronously while the model parameters of the generation sub-model are being trained; that is, a classification loss value is determined based on the first predicted category and the real category, and the model parameters of the classification sub-model are iteratively trained based on the classification loss value. Where the classification sub-model is trained synchronously, the low accuracy of its model parameters in the early stage of model training may also cause inaccurate category identification for the first predicted bounding box corresponding to the first initial bounding box. Therefore, in the early stage of model training the requirement on category accuracy is relaxed: the sub-regression loss value is taken into account whenever the real category corresponding to the first predicted bounding box and the first predicted category belong to the same category group. In the later stage of model training the requirement is tightened: the sub-regression loss value is taken into account only when the real category corresponding to the first predicted bounding box is the same as the first predicted category. On this basis, the above preset category matching constraint may include a constraint whose matching manner changes over training (such as the staged category matching constraint or the gradient category matching constraint).
Furthermore, to make the transition smoother between the two category matching branches under which the preset category matching constraint deems the first predicted category to match the real category (namely, the first predicted category belongs to the target group, and the first predicted category is the same as the real category), so that as the number of model training rounds increases the preset category matching constraint gradually shifts from requiring the first predicted category to fall into the target group to requiring it to be the same as the real category, the above preset category matching constraint preferably includes the gradient category matching constraint.
In specific implementation, where the above preset category matching constraint is the gradient category matching constraint, still taking the first initial bounding box with serial number i as an example, the gradient category matching constraint can be expressed as:

$$p_i^{\mathrm{real}_i} + \beta \sum_{f \in \mathrm{groups} \setminus \mathrm{real}_i} p_i^{f} > \mu$$

where groups denotes the target group; $\mathrm{real}_i$ denotes the real category, within the target group groups, of the first initial bounding box with serial number i; $f \in \mathrm{groups} \setminus \mathrm{real}_i$ denotes a non-real category in the target group; $\beta$ denotes the preset adjustment factor; $p_i^{\mathrm{real}_i}$ denotes the first predicted probability (the above first constraint term); $p_i^{f}$ denotes a second predicted probability; $\beta \sum_{f \in \mathrm{groups} \setminus \mathrm{real}_i} p_i^{f}$ denotes the above second constraint term; and $\mu$ denotes the above preset probability threshold. The larger the left-hand side, the closer the first predicted category is to the real category. Because the preset adjustment factor decreases as the current number of training rounds increases, the weight of the second constraint term gradually decreases, so that in the later stage of model training whether the first predicted category matches the real category is decided mainly by the first constraint term (the first predicted probability under the real category). Once the current number of model training rounds reaches a certain number, the second constraint term becomes zero; then, when $p_i^{\mathrm{real}_i}$ is greater than the preset probability threshold, the classification sub-model has determined the real category as the first predicted category.
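The gradient constraint (the first predicted probability, plus the preset adjustment factor times the sum of the remaining in-group probabilities, compared against the preset probability threshold) can be sketched as below. The function name, the dictionary representation of the category prediction result and the example numbers are illustrative assumptions, not part of the application.

```python
def gradient_match(probs, real_cls, group, beta, mu):
    """Gradient category matching for one first initial bounding box.

    probs    : dict mapping each candidate category to its predicted probability
    real_cls : the real category of the bounding box
    group    : iterable of categories in the target group (contains real_cls)
    beta     : preset adjustment factor for the current training round
    mu       : preset probability threshold
    """
    first_term = probs[real_cls]  # first constraint term
    # Second constraint term: beta times the in-group, non-real probabilities.
    second_term = beta * sum(probs[c] for c in group if c != real_cls)
    return first_term + second_term > mu
```

For example, with probabilities 0.3 for "cat" (real) and 0.5 for "dog" in the same group and a threshold of 0.6, the box matches while beta is 1 and no longer matches once beta has dropped to 0.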
The preset adjustment factor decreases as the current number of model training rounds increases. If the current number of model training rounds is less than or equal to a target number of training rounds, the second constraint term is positively correlated with the preset adjustment factor, and the preset adjustment factor is negatively correlated with the current number of model training rounds; if the current number of model training rounds is greater than the target number of training rounds, the second constraint term is zero. The target number of training rounds is less than the total number of training rounds.
In specific implementation, to keep the adjustment of the preset adjustment factor smooth, the value of the preset adjustment factor β may be reduced gradually in a linearly decreasing manner. The preset adjustment factor used in the current round of model training is therefore determined as follows:
(1) For the first round of model training, the first preset value is determined as the preset adjustment factor used in the current round of model training.
The first preset value can be set according to actual needs. To simplify the adjustment, the first preset value may be set to 1, that is, the preset adjustment factor β = 1. In the first round of model training, the above gradient category matching constraint then becomes (with $p_i^{f}$ the predicted probability of category f for the first initial bounding box with serial number i):

$$p_i^{\mathrm{real}_i} + \sum_{f \in \mathrm{groups} \setminus \mathrm{real}_i} p_i^{f} > \mu, \quad \text{that is,} \quad \sum_{f \in \mathrm{groups}} p_i^{f} > \mu$$
That is, for the first round of model training, whether the first predicted category corresponding to the first initial bounding box matches the real category is determined based on the sum of the first predicted probability and the second predicted probabilities corresponding to the target group.
(2) For each round of model training other than the first, the preset adjustment factor used in the current round is determined, in a decreasing manner, based on the current number of model training rounds, the target number of training rounds and the above first preset value.
If the preset adjustment factor for the first round of model training is β = 1, then in rounds other than the first the above gradient category matching constraint may be:

$$p_i^{\mathrm{real}_i} + \beta \sum_{f \in \mathrm{groups} \setminus \mathrm{real}_i} p_i^{f} > \mu, \quad 0 \le \beta < 1$$

That is, for rounds other than the first, the preset adjustment factor β in the gradient category matching constraint is smaller than 1, and as the number of model training rounds increases, the second constraint term contributes less and less.
For example, the decreasing formula corresponding to the above decreasing adjustment may take a linear form consistent with the definitions that follow:

$$\beta = \max\left(1 - \frac{\delta - 1}{Z},\ 0\right)$$

where $\max(\cdot,\ 0)$ takes the maximum of its first argument and 0; the leading 1 is the first preset value (the preset adjustment factor β used in the first round of training); δ is the current number of model training rounds; and Z is the target number of training rounds. The target number of training rounds may be the total number of training rounds minus 1, or a specified number of training rounds that is less than the total number of training rounds, where the difference between the total number of training rounds and the specified number is a preset number of rounds Q, with Q greater than 2. In the latter case, the preset adjustment factor β is already set to 0 from a certain round (not the last round) in the later stage of model training; that is, from round δ = Z + 1 to the last round, the judgment condition used is $p_i^{\mathrm{real}_i} > \mu$, where $p_i^{\mathrm{real}_i}$ is the first predicted probability.
It should be noted that, where the target number of training rounds Z is the total number of training rounds minus 1, the above decreasing formula applies with Z replaced by the total number of training rounds minus 1; that is, the preset adjustment factor is set to 0 in the last round of model training, so the judgment condition used in the last round is that the first predicted probability is greater than the preset probability threshold μ. In addition, the decreasing formula shown above is only a relatively simple linear decreasing adjustment; in practice, the rate at which the preset adjustment factor β decreases can be set according to actual needs. The above decreasing formula therefore does not limit the scope of protection of this application.
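A linear decreasing schedule for the preset adjustment factor can be sketched as below. The exact formula is not recoverable from this text, so the expression used here is an assumption chosen only to satisfy the stated properties: β equals the first preset value in the first round, decreases linearly, and is 0 from round Z + 1 onward. The function and parameter names are hypothetical.

```python
def preset_adjustment_factor(current_round, target_rounds, first_preset_value=1.0):
    """Assumed linear decay of the preset adjustment factor beta.

    current_round      : current model training round, starting at 1 (delta)
    target_rounds      : target number of training rounds (Z)
    first_preset_value : beta used in the first round
    """
    if current_round <= 1:
        return first_preset_value
    # Assumption: beta reaches 0 at round Z + 1 and is clamped at 0 afterwards.
    decayed = first_preset_value * (1 - (current_round - 1) / target_rounds)
    return max(decayed, 0.0)
```

With a target of 4 rounds, this gives 1.0, 0.75, 0.5, 0.25 for rounds 1 to 4 and 0.0 from round 5 onward; a different decay rate could be substituted without changing how the constraint is evaluated.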
In specific implementation, the above model to be trained includes a generation sub-model, a discrimination sub-model and a classification sub-model. Figure 4b is a schematic diagram of the specific implementation principle of yet another target detection model training process, which specifically includes:
(1) Target regions are extracted in advance from the sample image data set using the preset region-of-interest extraction model, yielding N anchor boxes.
(2) For each round of model training, m anchor boxes are randomly sampled from the N anchor boxes as the first initial bounding boxes, and the real bounding box corresponding to each first initial bounding box is determined.
(3) For each first initial bounding box: the generation sub-model performs bounding box prediction based on the first initial bounding box to obtain a first predicted bounding box; the discrimination sub-model generates a discrimination result set based on the real bounding box and the first predicted bounding box corresponding to the first initial bounding box; the classification sub-model performs category prediction on the first predicted bounding box to obtain a category prediction result; and a category matching result is determined according to the preset category matching constraint, the real category of the real bounding box corresponding to the first initial bounding box, and the category prediction result of the first predicted bounding box corresponding to the first initial bounding box. If the category matching result indicates that the first predicted category and the real category do not satisfy the preset category matching constraint, the sub-regression loss value corresponding to the first initial bounding box is zero. If the category matching result indicates that the first predicted category and the real category satisfy the preset category matching constraint, a first regression loss component is determined based on the first discrimination result in the discrimination result set of the first initial bounding box, a second regression loss component is determined based on the second discrimination result in that set, and a third regression loss component is determined based on the third discrimination result in that set; the sub-regression loss value corresponding to the first initial bounding box is then determined based on the first, second and third regression loss components.
It should be noted that the category matching result may be determined by a separate processing module or by the discrimination sub-model. When the discrimination sub-model does so, for a case in which the first predicted category and the real category do not satisfy the preset category matching constraint, the corresponding discrimination result set can be directly determined to be empty or to be preset information, without generating the discrimination result set from the real bounding box and the first predicted bounding box corresponding to the first initial bounding box, which further improves model training efficiency. As shown in Figure 4b, the real category corresponding to each real bounding box and the category prediction result corresponding to each first predicted bounding box are input to the discrimination sub-model. The discrimination sub-model determines the category matching result according to the real category of the real bounding box corresponding to the first initial bounding box and the category prediction result of the first predicted bounding box corresponding to the first initial bounding box. If the category matching result indicates that the first predicted category and the real category do not satisfy the preset category matching constraint, the corresponding discrimination result set is empty or preset information, so the sub-regression loss value determined from it is zero. If the category matching result indicates that the first predicted category and the real category satisfy the preset category matching constraint, the discrimination result set is generated based on the real bounding box and the first predicted bounding box corresponding to the first initial bounding box, so the sub-regression loss value determined from it is based on the first regression loss component corresponding to the first discrimination result, the second regression loss component corresponding to the second discrimination result and the third regression loss component corresponding to the third discrimination result in the discrimination result set.
That is, in determining whether the sub-regression loss value corresponding to the first initial bounding box is zero, there are two options. In the first, the discrimination result set is generated directly based on the real bounding box and the first predicted bounding box corresponding to the first initial bounding box, and the matching relationship between the first predicted category and the real category (that is, the category matching result, which indicates whether the preset category matching constraint is satisfied between them) is then determined from the category prediction result; if the categories do not match, the corresponding sub-regression loss value is determined to be zero, and if they match, the corresponding sub-regression loss value is determined based on the multiple discrimination results in the discrimination result set. In the second, the matching relationship between the first predicted category and the real category is determined first from the category prediction result; if the categories do not match, the corresponding discrimination result set is determined to be empty or preset information and the corresponding sub-regression loss value is determined to be zero, and if they match, the discrimination result set is generated based on the real bounding box and the first predicted bounding box corresponding to the first initial bounding box, and the corresponding sub-regression loss value is determined based on the multiple discrimination results in it.
(4) The regression loss value of the model to be trained is determined based on the sub-regression loss values corresponding to the first initial bounding boxes; using stochastic gradient descent, the model parameters of the generation sub-model and the discrimination sub-model are adjusted based on the regression loss value, yielding the generation sub-model and the discrimination sub-model with updated parameters.
(5) If the current model training result satisfies the preset model training end condition, the updated generation sub-model is determined as the trained target detection model; otherwise, the updated generation sub-model and discrimination sub-model are used as the model to be trained in the next round of model training, until the preset model training end condition is satisfied.
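The gating of the sub-regression loss in steps (3) and (4) can be sketched as below. The text does not specify how the three regression loss components are combined, nor how the sub-regression loss values are aggregated into the regression loss value; the plain sum and the average used here are assumptions, and both function names are hypothetical.

```python
def sub_regression_loss(matched, components):
    """Sub-regression loss for one first initial bounding box.

    matched    : category matching result (True if the constraint is satisfied)
    components : (first, second, third) regression loss components
    """
    if not matched:
        return 0.0  # a failed category match contributes no regression loss
    return float(sum(components))  # assumption: components are simply summed


def regression_loss(per_box):
    """Aggregate over the sampled boxes; per_box is a list of (matched, components)."""
    losses = [sub_regression_loss(m, c) for m, c in per_box]
    # Assumption: the model-level loss is the mean of the sub-regression losses.
    return sum(losses) / len(losses) if losses else 0.0
```

In the second option described above, the discrimination result set would not even be computed for unmatched boxes; here that corresponds to skipping the computation of `components` when `matched` is false.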
In the target detection model training method of the embodiments of this application, during model training the model to be trained is driven, based on the real bounding boxes and the first initial bounding boxes, to keep learning the bounding box distribution, so that the first predicted bounding boxes it produces come closer to the real bounding boxes. This improves both the accuracy with which the trained target detection model predicts the bounding box of a target object's position in an image to be detected and the generalization of the trained model, thereby ensuring target detection accuracy on new images to be detected and improving the trained model's adaptability to data migration. Moreover, the model to be trained includes a generation sub-model and a discrimination sub-model: the regression loss value of the model to be trained is determined based on the discrimination result set output by the discrimination sub-model, and the model parameters of the generation sub-model and the discrimination sub-model are iteratively updated over multiple rounds based on the regression loss value until the current model training result satisfies the preset model training end condition. That is, the bounding box distribution is learned through multiple rounds of generation-discrimination confrontation, in which the discrimination sub-model judges whether the first predicted bounding box predicted by the generation sub-model is realistic enough. Even when the generated bounding box (the first predicted bounding box) is hard to distinguish from the real bounding box, adjusting the model parameters based on the discrimination results of the discrimination sub-model can further drive the first predicted bounding box closer to the real bounding box, thereby further improving the efficiency of updating the generation sub-model's parameters and the accuracy of bounding box distribution learning. In addition, the discrimination result set output by the discrimination sub-model includes not only a first discrimination result characterizing the degree of similarity of the bounding box distributions but also a second discrimination result characterizing the degree of coincidence of the bounding box coordinates, which compensates for the bounding box regression loss caused by bounding boxes that are similarly distributed but deviate in specific position. The regression loss value obtained from the discrimination result set is therefore more accurate, which in turn improves the accuracy of the model parameters updated based on it.
Corresponding to the target detection model training method described with reference to Figures 1 to 4b, and based on the same technical concept, an embodiment of this application further provides a target detection method. Figure 5 is a schematic flowchart of the target detection method provided by the embodiment of this application. The method in Figure 5 can be executed by an electronic device provided with a target detection apparatus, which may be a terminal device or a designated server. The hardware used for target detection (the electronic device provided with the target detection apparatus) and the hardware used for target detection model training (the electronic device provided with the target detection model training apparatus) may be the same or different. As shown in Figure 5, the method includes at least the following steps:
S502: obtain a third preset number of second initial bounding boxes, where the second initial bounding boxes are obtained by extracting target regions from the image to be detected using the preset region-of-interest extraction model.
For the process of obtaining the third preset number of second initial bounding boxes, refer to the process of obtaining the first preset number of first initial bounding boxes described above; details are not repeated here.
S504: input the second initial bounding boxes into the target detection model for target detection, to obtain the second predicted bounding box and the second predicted category corresponding to each second initial bounding box, where the target detection model is trained by the above target detection model training method. For the specific training process of the target detection model, see the above embodiments; details are not repeated here.
The above target detection model includes a classification sub-model and a generation sub-model. For each second initial bounding box, during target detection the generation sub-model performs bounding box prediction based on the second initial bounding box to obtain the corresponding second predicted bounding box, and the classification sub-model classifies the second initial bounding box or the second predicted bounding box to obtain the second predicted category corresponding to the second initial bounding box.
The classification sub-model performs category prediction on the second initial bounding box or the second predicted bounding box, and its output may be a second category prediction result. The second category prediction result includes the predicted probabilities that the target object enclosed by the second initial bounding box or the second predicted bounding box belongs to each candidate category, and the candidate category with the largest predicted probability is the second predicted category; that is, the classification sub-model predicts the category of the target object in the image region within the second initial bounding box or the second predicted bounding box to be the second predicted category. In addition, in specific implementation, the position information of the second initial bounding box and that of the second predicted bounding box do not differ greatly, so the image features within the two boxes do not differ greatly either, and recognition of the target object category within the bounding box is not affected. On this basis, where bounding box prediction and category prediction are performed one after the other, the second predicted bounding box may be input into the classification sub-model for category prediction to obtain the corresponding second category prediction result; that is, the second predicted bounding box is first predicted from the second initial bounding box, and category prediction is then performed on the second predicted bounding box. Where bounding box prediction and category prediction are performed simultaneously, the second initial bounding box may instead be input into the classification sub-model; that is, the second predicted bounding box is predicted from the second initial bounding box while category prediction is performed on the second initial bounding box to obtain the second category prediction result.
S506: generate a target detection result for the image to be detected based on the second predicted bounding box and the second predicted category corresponding to each second initial bounding box.
Based on the second predicted bounding box and the second predicted category corresponding to each second initial bounding box, the number of target objects contained in the image to be detected and the category of each target object can be determined; for example, the image to be detected contains a cat, a dog and a pedestrian.
The above target detection model includes a generation sub-model and a classification sub-model. Figure 6 is a schematic diagram of a specific implementation of the target detection process, which includes: using a preset region-of-interest extraction model to extract target regions from the image to be detected, obtaining P anchor boxes; randomly sampling n anchor boxes from the P anchor boxes as the second initial bounding boxes; for each second initial bounding box, the generation sub-model performing bounding box prediction based on that second initial bounding box to obtain a second predicted bounding box, and the classification sub-model performing category prediction on the second predicted bounding box to obtain a second predicted category; and generating the target detection result of the image to be detected based on the second predicted bounding box and the second predicted category corresponding to each second initial bounding box.
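The flow of Figure 6 can be sketched as follows. This is a minimal Python sketch under stated assumptions, not the implementation from the application: `extract_rois`, `generator`, and `classifier` are hypothetical callables standing in for the preset region-of-interest extraction model, the generation sub-model, and the classification sub-model, and the `(x1, y1, x2, y2)` box layout is likewise an assumption.

```python
import random
from dataclasses import dataclass
from typing import Any, Callable, List, Optional, Sequence, Tuple

Box = Tuple[float, float, float, float]  # assumed layout: (x1, y1, x2, y2)


@dataclass
class Detection:
    box: Box        # second predicted bounding box
    category: str   # second predicted category


def detect(
    image: Any,
    extract_rois: Callable[[Any], Sequence[Box]],  # hypothetical ROI extraction model
    generator: Callable[[Any, Box], Box],          # hypothetical generation sub-model
    classifier: Callable[[Any, Box], str],         # hypothetical classification sub-model
    n: int,
    seed: Optional[int] = None,
) -> List[Detection]:
    # Step 1: extract P anchor boxes from the image to be detected.
    anchors = list(extract_rois(image))
    # Step 2: randomly sample n anchors as the second initial bounding boxes
    # (capped at P so sampling never fails on small inputs).
    rng = random.Random(seed)
    initial_boxes = rng.sample(anchors, min(n, len(anchors)))
    detections = []
    for init_box in initial_boxes:
        # Step 3: predict the second predicted bounding box from the initial box.
        pred_box = generator(image, init_box)
        # Step 4: classify the predicted box (the sequential variant).
        category = classifier(image, pred_box)
        detections.append(Detection(pred_box, category))
    # Step 5: the list of (box, category) pairs is the target detection result.
    return detections
```

The sketch follows the sequential variant, in which the classifier receives the predicted box; for the simultaneous variant the classifier would be called on `init_box` instead.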
It should be noted that the target detection model obtained with the above target detection model training method can be applied to any specific application scenario that requires target detection on an image to be detected. The image to be detected may be captured by an image acquisition device installed at a certain on-site location. Correspondingly, the target detection apparatus may belong to the image acquisition device; specifically, it may be an image processing apparatus in the image acquisition device, which receives the image to be detected transmitted by the image capturing apparatus in the image acquisition device and performs target detection on it. Alternatively, the target detection apparatus may be a separate target detection device independent of the image acquisition device, which receives the image to be detected from the image acquisition device and performs target detection on it.
As a specific application scenario of target detection, for example, the image to be detected may be captured by an image acquisition device installed at the entrance of a public place (such as a shopping mall entrance, a subway entrance, a scenic spot entrance, or a performance venue entrance). Correspondingly, the target object to be detected in the image is a target user entering the public place. The above target detection model is used to perform target detection on the image to be detected, so as to delineate in the image a second predicted bounding box containing a target user entering the public place and to determine the second predicted category corresponding to that second predicted bounding box (that is, the category of the target user contained in the second predicted bounding box, such as at least one of age group, gender, height, and occupation), thereby obtaining the target detection result of the image to be detected. A user group identification result (such as the flow of people entering the public place, or the attributes of the user group entering the public place) is then determined based on the target detection result, and corresponding business processing (such as automatically triggering an admission restriction prompt, or pushing information to the target user) is performed based on the user group identification result. The more accurate the model parameters of the target detection model, the more accurate the target detection result it outputs for the image to be detected, and therefore the more accurately the corresponding business processing is triggered based on that result.
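The step from target detection result to user group identification result can be illustrated with a small Python sketch. It assumes each detection contributes one predicted category string; the function name `identify_user_group` and the `max_visitors` threshold are hypothetical and are not named in the application.

```python
from collections import Counter
from typing import Dict, Iterable


def identify_user_group(
    predicted_categories: Iterable[str],
    max_visitors: int,  # hypothetical admission limit for the public place
) -> Dict[str, object]:
    # One predicted category per second predicted bounding box.
    counts = Counter(predicted_categories)
    # Flow of people: the number of detected target users.
    foot_traffic = sum(counts.values())
    return {
        "foot_traffic": foot_traffic,
        "attributes": dict(counts),  # user group attributes by category
        # Trigger for the admission restriction prompt.
        "restrict_admission": foot_traffic > max_visitors,
    }
```

For example, three detections with categories `"adult"`, `"adult"`, and `"child"` and a limit of two visitors would yield a foot traffic of three and set `restrict_admission`.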
As another example, the image to be detected may be captured by image acquisition devices installed at the monitoring points of a breeding base. Correspondingly, the target object to be detected in the image is a target breeding object within the breeding monitoring point. The above target detection model is used to perform target detection on the image to be detected, so as to delineate in the image a second predicted bounding box containing a target breeding object and to determine the second predicted category corresponding to that second predicted bounding box (that is, the category of the target breeding object contained in the second predicted bounding box, such as at least one of living status and body size), thereby obtaining the target detection result of the image to be detected. A breeding object group identification result (such as the survival rate or the growth rate of the target breeding objects within the breeding monitoring point) is then determined based on the target detection result, and a corresponding control operation is performed based on that identification result (for example, automatically issuing an alarm prompt if a decrease in survival rate is detected, or automatically increasing the feeding amount or feeding frequency if a slowdown in growth rate is detected). The more accurate the model parameters of the target detection model, the more accurate the target detection result it outputs for the image to be detected, and therefore the more accurately the corresponding control operation is triggered based on that result.
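The survival-rate check in the breeding scenario can be sketched in the same spirit. The sketch assumes the second predicted category is a living-status label; the label `"alive"`, the function names, and the `tolerance` parameter are hypothetical choices made for illustration.

```python
from typing import Optional, Sequence


def survival_rate(
    predicted_states: Sequence[str],
    alive_label: str = "alive",  # hypothetical living-status label
) -> Optional[float]:
    # No detections means the rate is undefined for this monitoring point.
    if not predicted_states:
        return None
    alive = sum(1 for state in predicted_states if state == alive_label)
    return alive / len(predicted_states)


def should_alert(
    previous_rate: Optional[float],
    current_rate: Optional[float],
    tolerance: float = 0.0,  # hypothetical allowed drop before alerting
) -> bool:
    # Without two valid measurements there is nothing to compare.
    if previous_rate is None or current_rate is None:
        return False
    # Alert when the survival rate fell by more than the tolerance.
    return previous_rate - current_rate > tolerance
```

A growth-rate check could follow the same pattern by comparing body-size categories across monitoring periods.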
In the target detection method of this embodiment of the present application, during target detection, a preset region-of-interest extraction model is first used to extract multiple candidate bounding boxes, and a third preset number of candidate bounding boxes are then randomly sampled from them as the second initial bounding boxes. For each second initial bounding box, the generation sub-model performs bounding box prediction based on that second initial bounding box to obtain a second predicted bounding box, and the classification sub-model performs category prediction on the second predicted bounding box to obtain a second predicted category. The target detection result of the image to be detected is then generated based on the second predicted bounding box and the second predicted category corresponding to the second initial bounding box. During training of the model parameters of the generation sub-model, the model to be trained is driven, based on the real bounding boxes and the first initial bounding boxes, to keep learning the bounding box distribution, so that the first predicted bounding box comes closer to the real bounding box; this improves the generalization and data transferability of the target detection model and, in turn, the accuracy of bounding box prediction for the location of the target object in the image to be detected. In addition, the model to be trained includes a generation sub-model and a discrimination sub-model; the regression loss value is determined based on the discrimination result set output by the discrimination sub-model, and the model parameters are iteratively updated based on the regression loss value, which improves the efficiency of updating the model parameters of the generation sub-model. Furthermore, the discrimination result set contains both a discrimination result characterizing the degree of similarity of the bounding box distributions and a discrimination result characterizing the degree of coincidence of the bounding box coordinates, so that the regression loss value obtained from the discrimination result set is more accurate and the model parameters updated based on it are more accurate. This ensures that the generation sub-model can also predict bounding boxes accurately on new images to be detected, which improves the accuracy of target detection performed on the image to be detected with the target detection model.
It should be noted that this embodiment of the present application is based on the same inventive concept as the preceding embodiment. For its specific implementation, reference may therefore be made to the implementation of the aforementioned target detection model training method, and repeated details are not described again.
Corresponding to the target detection model training method described in Figures 1 to 4b and based on the same technical concept, an embodiment of the present application further provides a target detection model training apparatus. Figure 7 is a schematic diagram of the module composition of the target detection model training apparatus provided by this embodiment. The apparatus is used to perform the target detection model training method described in Figures 1 to 4b. As shown in Figure 7, the apparatus includes:
A first bounding box acquisition module 702, configured to obtain a first initial bounding box and to obtain the real bounding box corresponding to the first initial bounding box, where the first initial bounding box is obtained by using a preset region-of-interest extraction model to extract target regions from a sample image data set.
A model training module 704, configured to input the first initial bounding box and the real bounding box into a model to be trained for iterative model training until the current model training result satisfies a preset model training end condition, obtaining the target detection model.
The model to be trained includes a generation sub-model and a discrimination sub-model. Each round of model training in the above iterative model training includes: for each first initial bounding box, the generation sub-model performing bounding box prediction based on the first initial bounding box to obtain a first predicted bounding box, and the discrimination sub-model generating a discrimination result set based on the real bounding box corresponding to the first initial bounding box and the first predicted bounding box corresponding to the first initial bounding box, where the discrimination result set includes a first discrimination result and a second discrimination result, the first discrimination result characterizing the degree of similarity between the bounding box distributions of the first predicted bounding box and the real bounding box, and the second discrimination result characterizing the degree of coincidence between the bounding box coordinates of the first predicted bounding box and the real bounding box; determining the regression loss value of the model to be trained based on the first discrimination result and the second discrimination result corresponding to each first initial bounding box; and updating the parameters of the generation sub-model and the discrimination sub-model based on the regression loss value.
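To make the role of the two discrimination results concrete, the following Python sketch combines them into a single regression loss value. This passage does not give the application's own loss formula, so the form used here (one minus each score, weighted and averaged) is an assumption for illustration only. The first discrimination result is taken as a similarity score in [0, 1] supplied by a hypothetical discrimination sub-model, and the second is computed as intersection over union (IoU), which is one common measure of coordinate overlap. The weights `w_dist` and `w_coord` are hypothetical.

```python
from typing import Sequence, Tuple

Box = Tuple[float, float, float, float]  # assumed layout: (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    # Width and height of the intersection rectangle (zero if disjoint).
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def regression_loss(
    similarities: Sequence[float],  # first discrimination results, assumed in [0, 1]
    pred_boxes: Sequence[Box],      # first predicted bounding boxes
    true_boxes: Sequence[Box],      # corresponding real bounding boxes
    w_dist: float = 1.0,            # hypothetical weight of the distribution term
    w_coord: float = 1.0,           # hypothetical weight of the coordinate term
) -> float:
    if not (len(similarities) == len(pred_boxes) == len(true_boxes)):
        raise ValueError("inputs must have the same length")
    if not similarities:
        return 0.0
    total = 0.0
    for sim, pred, true in zip(similarities, pred_boxes, true_boxes):
        # Second discrimination result: coordinate overlap of the two boxes.
        overlap = iou(pred, true)
        # Both terms shrink as the predicted box approaches the real box.
        total += w_dist * (1.0 - sim) + w_coord * (1.0 - overlap)
    return total / len(similarities)
```

With this form, a predicted box whose distribution score is high but whose position is off still incurs a loss through the IoU term, which is the compensating effect the second discrimination result is meant to provide.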
In the target detection model training apparatus of this embodiment of the present application, during the model training stage, the model to be trained is driven, based on the real bounding boxes and the first initial bounding boxes, to keep learning the bounding box distribution, so that the predicted first predicted bounding box comes closer to the real bounding box. This not only improves the accuracy with which the trained target detection model predicts the bounding box at the location of the target object in the image to be detected, but also improves the generalization of the trained target detection model, thereby ensuring target detection accuracy on new images to be detected and improving the data transfer adaptability of the trained target detection model. In addition, the model to be trained includes a generation sub-model and a discrimination sub-model. The regression loss value of the model to be trained is determined based on the discrimination result set output by the discrimination sub-model, and the model parameters of the generation sub-model and the discrimination sub-model are iteratively updated over multiple rounds based on the regression loss value until the current model training result satisfies the preset model training end condition; that is, the bounding box distribution is learned continuously through multiple rounds of adversarial generation and discrimination. The discrimination sub-model can judge whether the first predicted bounding box predicted by the generation sub-model is realistic enough. Where the generated bounding box (that is, the first predicted bounding box) is hard to distinguish from the real bounding box, the presence of the discrimination sub-model means that adjusting the model parameters based on its discrimination result further drives the first predicted bounding box predicted by the generation sub-model closer to the real bounding box, which further improves the efficiency of updating the model parameters of the generation sub-model and the accuracy of bounding box distribution learning. Moreover, the discrimination result set output by the discrimination sub-model includes not only the first discrimination result, which characterizes the degree of similarity of the bounding box distributions, but also the second discrimination result, which characterizes the degree of coincidence of the bounding box coordinates. This compensates for the bounding box regression loss caused by bounding boxes whose distributions are similar but whose specific positions deviate, so that the regression loss value obtained from the discrimination result set is more accurate, which in turn improves the accuracy of the model parameters updated based on that regression loss value.
It should be noted that the embodiment of the target detection model training apparatus in the present application and the embodiment of the target detection model training method in the present application are based on the same inventive concept. For the specific implementation of this embodiment, reference may therefore be made to the implementation of the corresponding target detection model training method described above, and repeated details are not described again.
Corresponding to the target detection method described in Figures 5 to 6 and based on the same technical concept, an embodiment of the present application further provides a target detection apparatus. Figure 8 is a schematic diagram of the module composition of the target detection apparatus provided by this embodiment. The apparatus is used to perform the target detection method described in Figures 5 to 6. As shown in Figure 8, the apparatus includes:
A second bounding box acquisition module 802, configured to obtain a third preset number of second initial bounding boxes, where the second initial bounding boxes are obtained by using a preset region-of-interest extraction model to extract target regions from the image to be detected.
A target detection module 804, configured to input the second initial bounding boxes into the target detection model for target detection, obtaining the second predicted bounding box and the second predicted category corresponding to each second initial bounding box.
A detection result generation module 806, configured to generate the target detection result of the image to be detected based on the second predicted bounding box and the second predicted category corresponding to each second initial bounding box.
In the target detection apparatus of this embodiment of the present application, during target detection, a preset region-of-interest extraction model is first used to extract multiple candidate bounding boxes, and a third preset number of candidate bounding boxes are then randomly sampled from them as the second initial bounding boxes. For each second initial bounding box, the generation sub-model performs bounding box prediction based on that second initial bounding box to obtain a second predicted bounding box, and the classification sub-model performs category prediction on the second predicted bounding box to obtain a second predicted category. The target detection result of the image to be detected is then generated based on the second predicted bounding box and the second predicted category corresponding to each second initial bounding box. During training of the model parameters of the generation sub-model, the model to be trained is driven, based on the real bounding boxes and the first initial bounding boxes, to keep learning the bounding box distribution, so that the first predicted bounding box comes closer to the real bounding box; this improves the generalization and data transferability of the target detection model and, in turn, the accuracy of bounding box prediction for the location of the target object in the image to be detected. In addition, the model to be trained includes a generation sub-model and a discrimination sub-model; the regression loss value is determined based on the discrimination result set output by the discrimination sub-model, and the model parameters are iteratively updated based on the regression loss value, which improves the efficiency of updating the model parameters of the generation sub-model. Furthermore, the discrimination result set contains both a discrimination result characterizing the degree of similarity of the bounding box distributions and a discrimination result characterizing the degree of coincidence of the bounding box coordinates, so that the regression loss value obtained from the discrimination result set is more accurate and the model parameters updated based on it are more accurate. This ensures that the generation sub-model can also predict bounding boxes accurately on new images to be detected, which improves the accuracy of target detection performed on the image to be detected with the target detection model.
It should be noted that the embodiment of the target detection apparatus in the present application and the embodiment of the target detection method in the present application are based on the same inventive concept. For the specific implementation of this embodiment, reference may therefore be made to the implementation of the corresponding target detection method described above, and repeated details are not described again.
Further, corresponding to the methods shown in Figures 1 to 6 and based on the same technical concept, an embodiment of the present application also provides a computer device, which is used to execute the above target detection model training method or target detection method, as shown in Figure 9.
Computer devices may differ considerably depending on configuration or performance, and may include one or more processors 901 and a memory 902; the memory 902 may store one or more application programs or data. The memory 902 may be transient storage or persistent storage. An application program stored in the memory 902 may include one or more modules (not shown in the figure), and each module may include a series of computer-executable instructions for the computer device. Further, the processor 901 may be configured to communicate with the memory 902 and to execute, on the computer device, the series of computer-executable instructions in the memory 902. The computer device may also include one or more power supplies 903, one or more wired or wireless network interfaces 904, one or more input/output interfaces 905, one or more keyboards 906, and the like.
The computer device includes a memory and one or more programs, where the one or more programs are stored in the memory, the one or more programs may include one or more modules, each module may include a series of computer-executable instructions for the computer device, and the one or more programs are configured to be executed by one or more processors and contain computer-executable instructions for: obtaining a first preset number of first initial bounding boxes, and obtaining the real bounding box corresponding to each first initial bounding box, where the first initial bounding boxes are obtained by using a preset region-of-interest extraction model to extract target regions from a sample image data set; and inputting the first initial bounding boxes and the real bounding boxes into a model to be trained for iterative model training until the current model training result satisfies a preset model training end condition, obtaining the target detection model.
The model to be trained includes a generation sub-model and a discrimination sub-model. Each round of model training in the above iterative model training includes: for each first initial bounding box, the generation sub-model performing bounding box prediction based on the first initial bounding box to obtain a first predicted bounding box, and the discrimination sub-model generating a discrimination result set based on the real bounding box corresponding to the first initial bounding box and the first predicted bounding box corresponding to the first initial bounding box, where the discrimination result set includes a first discrimination result and a second discrimination result, the first discrimination result characterizing the degree of similarity between the bounding box distributions of the first predicted bounding box and the real bounding box, and the second discrimination result characterizing the degree of coincidence between the bounding box coordinates of the first predicted bounding box and the real bounding box; determining the regression loss value of the model to be trained based on the first discrimination result and the second discrimination result corresponding to the first initial bounding box; and updating the parameters of the generation sub-model and the discrimination sub-model based on the regression loss value.
The computer device includes a memory and one or more programs, where the one or more programs are stored in the memory, the one or more programs may include one or more modules, each module may include a series of computer-executable instructions for the computer device, and the one or more programs are configured to be executed by one or more processors and contain computer-executable instructions for: obtaining a third preset number of second initial bounding boxes, where the second initial bounding boxes are obtained by using a preset region-of-interest extraction model to extract target regions from the image to be detected; inputting the second initial bounding boxes into the target detection model for target detection, obtaining the second predicted bounding box and the second predicted category corresponding to the second initial bounding box; and generating the target detection result of the image to be detected based on the second predicted bounding box and the second predicted category corresponding to the second initial bounding box.
In the computer device of this embodiment of the present application, during the model training stage, the model to be trained is driven, based on the real bounding boxes and the first initial bounding boxes, to keep learning the bounding box distribution, so that the predicted first predicted bounding box comes closer to the real bounding box. This not only improves the accuracy with which the trained target detection model predicts the bounding box at the location of the target object in the image to be detected, but also improves the generalization of the trained target detection model, thereby ensuring target detection accuracy on new images to be detected and improving the data transfer adaptability of the trained target detection model. In addition, the model to be trained includes a generation sub-model and a discrimination sub-model. The regression loss value of the model to be trained is determined based on the discrimination result set output by the discrimination sub-model, and the model parameters of the generation sub-model and the discrimination sub-model are iteratively updated over multiple rounds based on the regression loss value until the current model training result satisfies the preset model training end condition; that is, the bounding box distribution is learned continuously through multiple rounds of adversarial generation and discrimination. The discrimination sub-model can judge whether the first predicted bounding box predicted by the generation sub-model is realistic enough. Where the generated bounding box (that is, the first predicted bounding box) is hard to distinguish from the real bounding box, the presence of the discrimination sub-model means that adjusting the model parameters based on its discrimination result further drives the first predicted bounding box predicted by the generation sub-model closer to the real bounding box, which further improves the efficiency of updating the model parameters of the generation sub-model and the accuracy of bounding box distribution learning. Moreover, the discrimination result set output by the discrimination sub-model includes not only the first discrimination result, which characterizes the degree of similarity of the bounding box distributions, but also the second discrimination result, which characterizes the degree of coincidence of the bounding box coordinates. This compensates for the bounding box regression loss caused by bounding boxes whose distributions are similar but whose specific positions deviate, so that the regression loss value obtained from the discrimination result set is more accurate, which in turn improves the accuracy of the model parameters updated based on that regression loss value. Correspondingly, during target detection, a preset region-of-interest extraction model is first used to extract multiple candidate bounding boxes, and candidate bounding boxes are then randomly sampled from them as the second initial bounding boxes. For each second initial bounding box, the generation sub-model performs bounding box prediction based on that second initial bounding box to obtain a second predicted bounding box, and the classification sub-model performs category prediction on the second predicted bounding box to obtain a second predicted category. The target detection result of the image to be detected is then generated based on the second predicted bounding box and the second predicted category corresponding to the second initial bounding box. This ensures that the generation sub-model can also predict bounding boxes accurately on new images to be detected, which improves the accuracy of target detection performed on the image to be detected with the target detection model.
It should be noted that the embodiment of the computer device in this application and the embodiment of the target detection model training method in this application are based on the same inventive concept. For the specific implementation of this embodiment, reference may therefore be made to the implementation of the corresponding target detection model training method described above; repeated details are not described again.
Further, corresponding to the methods shown in FIG. 1 to FIG. 6 above and based on the same technical concept, an embodiment of this application also provides a storage medium for storing computer-executable instructions. In one specific embodiment, the storage medium may be a USB flash drive, an optical disc, a hard disk, or the like, and the computer-executable instructions stored in the storage medium, when executed by a processor, implement the following process: obtaining a first initial bounding box and obtaining the real bounding box corresponding to the first initial bounding box, where the first initial bounding box is obtained by performing target region extraction on a sample image data set using a preset region of interest extraction model; and inputting the first initial bounding box and the real bounding box into a model to be trained for iterative model training until the current model training result meets a preset model training end condition, to obtain a target detection model.
The model to be trained includes a generating sub-model and a discriminating sub-model. Each round of model training in the iterative model training includes, for each first initial bounding box: the generating sub-model performs bounding box prediction based on the first initial bounding box to obtain a first predicted bounding box; the discriminating sub-model generates a discrimination result set based on the real bounding box corresponding to the first initial bounding box and the first predicted bounding box corresponding to the first initial bounding box, where the discrimination result set includes a first discrimination result and a second discrimination result, the first discrimination result characterizes the degree of similarity between the bounding box distributions of the first predicted bounding box and the real bounding box, and the second discrimination result characterizes the degree of coincidence between the bounding box coordinates of the first predicted bounding box and the real bounding box; a regression loss value of the model to be trained is determined based on the first discrimination result and the second discrimination result corresponding to the first initial bounding box; and the parameters of the generating sub-model and the discriminating sub-model are updated based on the regression loss value.
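How the two discrimination results might be combined into a single regression loss value can be sketched as follows. This is a minimal illustration, not the method of the application: it assumes each discrimination result has already been reduced to a scalar loss term, that the per-box (sub-regression) loss is a weighted sum of the two terms, and that the overall loss is the mean over all first initial bounding boxes. The class `DiscriminationResults`, the weights `w_dist` and `w_coord`, and the averaging are all hypothetical choices.

```python
from dataclasses import dataclass


@dataclass
class DiscriminationResults:
    # Scalar term derived from the first discrimination result
    # (distribution similarity); its exact form is an assumption here.
    distribution_term: float
    # Scalar term derived from the second discrimination result
    # (coordinate coincidence), e.g. an IoU-style loss; also an assumption.
    coincidence_term: float


def sub_regression_loss(result, w_dist=1.0, w_coord=1.0):
    """Per-box loss as a weighted sum of the two terms (hypothetical weighting)."""
    return w_dist * result.distribution_term + w_coord * result.coincidence_term


def regression_loss(results, w_dist=1.0, w_coord=1.0):
    """Mean of the per-box losses over all first initial bounding boxes (assumed reduction)."""
    if not results:
        raise ValueError("at least one discrimination result set is required")
    total = sum(sub_regression_loss(r, w_dist, w_coord) for r in results)
    return total / len(results)
```

In a real training loop the resulting value would be backpropagated to update both the generating sub-model and the discriminating sub-model; that step depends on the chosen framework and is omitted here.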
In another specific embodiment, the storage medium may be a USB flash drive, an optical disc, a hard disk, or the like, and the computer-executable instructions stored in the storage medium, when executed by a processor, implement the following process: obtaining a second initial bounding box, where the second initial bounding box is obtained by performing target region extraction on an image to be detected using a preset region of interest extraction model; inputting the second initial bounding box into a target detection model for target detection to obtain a second predicted bounding box and a second predicted category corresponding to the second initial bounding box; and generating a target detection result of the image to be detected based on the second predicted bounding box and the second predicted category corresponding to the second initial bounding box.
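The detection flow above can be sketched in a few lines of Python. The sketch assumes the candidate bounding boxes have already been produced by the region of interest extraction model, and it treats the generating sub-model and the classification sub-model as plain callables. The function name `detect`, the parameters `num_initial` and `seed`, and the dictionary layout of the result are hypothetical; the random sampling of candidates follows the description of the detection process elsewhere in this document.

```python
import random


def detect(candidate_boxes, generator, classifier, num_initial, seed=None):
    """Sample initial boxes, refine each with the generator, then classify it.

    candidate_boxes: list of boxes from a region of interest extractor (assumed given).
    generator: callable mapping an initial box to a predicted box (stand-in for the
        generating sub-model).
    classifier: callable mapping a predicted box to a category (stand-in for the
        classification sub-model).
    """
    rng = random.Random(seed)
    # Randomly sample the second initial bounding boxes from the candidates.
    count = min(num_initial, len(candidate_boxes))
    initial_boxes = rng.sample(candidate_boxes, count)

    detections = []
    for initial_box in initial_boxes:
        predicted_box = generator(initial_box)
        predicted_category = classifier(predicted_box)
        detections.append(
            {
                "initial_box": initial_box,
                "predicted_box": predicted_box,
                "predicted_category": predicted_category,
            }
        )
    return detections
```

The classifier is applied to the predicted box here; the claims also allow it to be applied to the initial box instead, which would only change the argument passed to `classifier`.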
When the computer-executable instructions stored in the storage medium in the embodiments of this application are executed by a processor, in the model training stage the model to be trained is driven, based on the real bounding boxes and the first initial bounding boxes, to continuously learn the bounding box distribution, so that the first predicted bounding box obtained by prediction is closer to the real bounding box. This not only improves the accuracy with which the trained target detection model predicts the bounding box of the position of a target object in an image to be detected, but also improves the generalization of the trained target detection model, thereby ensuring the target detection accuracy of the target detection model on new images to be detected and improving the data migration adaptability of the trained target detection model. In addition, the model to be trained includes a generating sub-model and a discriminating sub-model: the regression loss value of the model to be trained is determined based on the discrimination result set output by the discriminating sub-model, and the model parameters of the generating sub-model and the discriminating sub-model are then iteratively updated over multiple rounds based on the regression loss value, until the current model training result meets the preset model training end condition. In other words, the bounding box distribution is learned continuously through multiple rounds of adversarial generation and discrimination, in which the discriminating sub-model judges whether the first predicted bounding box predicted by the generating sub-model is realistic enough. When the generated bounding box (i.e., the first predicted bounding box) is difficult to distinguish from the real bounding box, adjusting the model parameters based on the discrimination results of the discriminating sub-model further pushes the first predicted bounding box predicted by the generating sub-model closer to the real bounding box, thereby further improving the parameter update efficiency of the generating sub-model and the accuracy of bounding box distribution learning. Moreover, the discrimination result set output by the discriminating sub-model includes not only the first discrimination result, which characterizes the degree of similarity of the bounding box distributions, but also the second discrimination result, which characterizes the degree of coincidence of the bounding box coordinates. This compensates for the bounding box regression loss that arises when bounding boxes have similar distributions but deviate in specific position, so that the regression loss value obtained from the discrimination result set is more accurate, which in turn improves the accuracy of the model parameters updated based on that regression loss value. Correspondingly, in the target detection process, a preset region of interest extraction model is first used to extract multiple candidate bounding boxes, and candidate bounding boxes are then randomly sampled from them as second initial bounding boxes. For each second initial bounding box, the generating sub-model performs bounding box prediction based on that second initial bounding box to obtain a second predicted bounding box, and the classification sub-model performs category prediction on the second predicted bounding box to obtain a second predicted category. The target detection result of the image to be detected is then generated based on the second predicted bounding box and the second predicted category corresponding to each second initial bounding box, which ensures that the generating sub-model can also predict bounding boxes accurately on new images to be detected and thereby improves the accuracy of target detection performed on the image to be detected with the target detection model.
It should be noted that the embodiment of the storage medium in this application and the embodiment of the target detection model training method in this application are based on the same inventive concept. For the specific implementation of this embodiment, reference may therefore be made to the implementation of the corresponding target detection model training method described above; repeated details are not described again.
Specific embodiments of this application have been described above. Other embodiments are within the scope of the appended claims. In some cases, the actions or steps recited in the claims may be performed in an order different from that in the embodiments and still achieve the desired results. In addition, the processes depicted in the drawings do not necessarily require the particular order shown, or a sequential order, to achieve the desired results. In some implementations, multitasking and parallel processing are also possible or may be advantageous. Those skilled in the art should understand that the embodiments of this application may be provided as a method, a system, or a computer program product. Accordingly, the embodiments of this application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, this application may take the form of a computer program product implemented on one or more computer-readable storage media (including, but not limited to, magnetic disk storage, CD-ROM, and optical storage) containing computer-usable program code.
This application is described with reference to flowcharts and/or block diagrams of methods, devices (systems), and computer program products according to embodiments of this application. It should be understood that each process and/or block in the flowcharts and/or block diagrams, and combinations of processes and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce an apparatus for implementing the functions specified in one or more processes of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction apparatus that implements the functions specified in one or more processes of the flowcharts and/or one or more blocks of the block diagrams. These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operational steps are performed on the computer or other programmable device to produce computer-implemented processing, whereby the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more processes of the flowcharts and/or one or more blocks of the block diagrams.
In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory. The memory may include forms of computer-readable media such as non-permanent memory, random access memory (RAM), and/or non-volatile memory, for example read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium. Computer-readable media include permanent and non-permanent, removable and non-removable media, which can store information by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transitory media), such as modulated data signals and carrier waves. It should also be noted that the terms "include", "comprise", and any other variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or device that includes a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the presence of additional identical elements in the process, method, article, or device that includes the element.
The embodiments of this application may be described in the general context of computer-executable instructions executed by a computer, such as program modules. Generally, program modules include routines, programs, objects, components, data structures, and the like that perform particular tasks or implement particular abstract data types. One or more embodiments of this application may also be practiced in distributed computing environments, in which tasks are performed by remote processing devices connected through a communication network. In a distributed computing environment, program modules may be located in both local and remote computer storage media, including storage devices. The embodiments in this application are described in a progressive manner; for identical or similar parts, the embodiments may be referred to one another, and each embodiment focuses on its differences from the other embodiments. In particular, because the system embodiments are substantially similar to the method embodiments, they are described relatively briefly; for relevant details, refer to the corresponding description of the method embodiments. The above are merely embodiments of this document and are not intended to limit it. Those skilled in the art may make various modifications and changes to this document. Any modification, equivalent replacement, improvement, or the like made within the spirit and principles of this document shall fall within the scope of its claims.

Claims (16)

  1. A target detection model training method, the method comprising:
    obtaining a first initial bounding box, and obtaining a real bounding box corresponding to the first initial bounding box, wherein the first initial bounding box is obtained by performing target region extraction on a sample image data set using a preset region of interest extraction model;
    inputting the first initial bounding box and the real bounding box into a model to be trained for iterative model training until a current model training result meets a preset model training end condition, to obtain a target detection model;
    wherein the model to be trained comprises a generating sub-model and a discriminating sub-model, and each round of model training in the iterative model training comprises:
    performing, by the generating sub-model, bounding box prediction based on the first initial bounding box to obtain a first predicted bounding box; generating, by the discriminating sub-model, a discrimination result set based on the real bounding box corresponding to the first initial bounding box and the first predicted bounding box corresponding to the first initial bounding box, wherein the discrimination result set comprises a first discrimination result and a second discrimination result, the first discrimination result characterizes the degree of similarity between the bounding box distributions of the first predicted bounding box and the real bounding box, and the second discrimination result characterizes the degree of coincidence between the bounding box coordinates of the first predicted bounding box and the real bounding box;
    determining a regression loss value of the model to be trained based on the first discrimination result and the second discrimination result corresponding to the first initial bounding box; and
    updating parameters of the generating sub-model and the discriminating sub-model based on the regression loss value.
  2. The method according to claim 1, wherein obtaining the first initial bounding box further comprises:
    inputting the sample image data set into the preset region of interest extraction model for region of interest extraction to obtain a second preset number of candidate bounding boxes, wherein the second preset number is greater than a first preset number, and the first preset number is the number of first initial bounding boxes; and
    selecting the first preset number of candidate bounding boxes from the second preset number of candidate bounding boxes as the first initial bounding boxes.
  3. The method according to claim 1, wherein generating the discrimination result set based on the real bounding box corresponding to the first initial bounding box and the first predicted bounding box corresponding to the first initial bounding box comprises:
    performing bounding box authenticity discrimination on the real bounding box and the first predicted bounding box corresponding to the first initial bounding box to obtain the first discrimination result; calculating a bounding box intersection-over-union loss based on the real bounding box and the first predicted bounding box corresponding to the first initial bounding box to obtain the second discrimination result; and calculating, based on the real bounding box and the first predicted bounding box corresponding to the first initial bounding box, a regression loss compensation value used to constrain the loss gradient of the regression loss function of the model to be trained, to obtain a third discrimination result.
  4. The method according to claim 3, wherein determining the regression loss value of the model to be trained based on the first discrimination result and the second discrimination result corresponding to the first initial bounding box comprises:
    determining a sub-regression loss value corresponding to the first initial bounding box, wherein the sub-regression loss value corresponding to the first initial bounding box is determined based on target information, and the target information comprises one or a combination of the following: the degree of bounding box distribution similarity characterized by the first discrimination result corresponding to the first initial bounding box, the degree of bounding box coordinate coincidence characterized by the second discrimination result, and the regression loss compensation value characterized by the third discrimination result; and
    determining the regression loss value of the model to be trained based on the sub-regression loss value corresponding to the first initial bounding box.
  5. The method according to claim 3, wherein performing bounding box authenticity discrimination on the real bounding box and the first predicted bounding box corresponding to the first initial bounding box to obtain the first discrimination result comprises:
    determining, based on the real bounding box corresponding to the first initial bounding box, a first discrimination probability that the real bounding box is predicted to be real by the discriminating sub-model; determining, based on the first predicted bounding box corresponding to the first initial bounding box, a second discrimination probability that the first predicted bounding box is predicted to be fake by the discriminating sub-model; and
    generating the first discrimination result based on the first discrimination probability and the second discrimination probability.
  6. The method according to claim 5, wherein generating the first discrimination result based on the first discrimination probability and the second discrimination probability comprises:
    determining a first weighted probability based on the first discrimination probability and a first prior probability of the real bounding box corresponding to the first initial bounding box; determining a second weighted probability based on the second discrimination probability and a second prior probability of the first initial bounding box; and
    generating the first discrimination result based on the first weighted probability and the second weighted probability.
  7. The method according to claim 3, wherein calculating the bounding box intersection-over-union loss based on the real bounding box and the first predicted bounding box corresponding to the first initial bounding box to obtain the second discrimination result comprises:
    performing a bounding box intersection-over-union loss calculation on the real bounding box corresponding to the first initial bounding box and the first predicted bounding box corresponding to the first initial bounding box to obtain a first intersection-over-union loss; and
    determining the second discrimination result corresponding to the first initial bounding box based on the first intersection-over-union loss.
  8. The method according to claim 3, wherein calculating, based on the real bounding box and the first predicted bounding box corresponding to the first initial bounding box, the regression loss compensation value used to constrain the loss gradient of the regression loss function of the model to be trained comprises:
    generating a synthetic bounding box corresponding to the first initial bounding box based on the real bounding box and the first predicted bounding box corresponding to the first initial bounding box; and
    determining the regression loss compensation value based on the degree of similarity between the bounding box distributions of the synthetic bounding box and the real bounding box corresponding to the first initial bounding box.
  9. The method according to claim 8, wherein generating the synthetic bounding box corresponding to the first initial bounding box based on the real bounding box and the first predicted bounding box corresponding to the first initial bounding box comprises:
    determining a first coordinate information subset based on a first sampling ratio and a first coordinate information set of the real bounding box corresponding to the first initial bounding box;
    determining a second coordinate information subset based on a second sampling ratio and a second coordinate information set of the first predicted bounding box corresponding to the first initial bounding box, wherein the sum of the first sampling ratio and the second sampling ratio is equal to 1; and
    generating the synthetic bounding box corresponding to the first initial bounding box based on the first coordinate information subset and the second coordinate information subset.
  10. The method according to claim 4, wherein the model to be trained further comprises a classification sub-model, and each round of model training further comprises: performing, by the classification sub-model, classification processing on the first initial bounding box or the first predicted bounding box to obtain a first predicted category;
    the target information further comprises a matching relationship between the first predicted category corresponding to the first initial bounding box and a true category of the first initial bounding box, wherein if the first predicted category does not match the true category, the sub-regression loss value corresponding to the first initial bounding box is zero; and if the first predicted category matches the true category, the sub-regression loss value corresponding to the first initial bounding box is a sub-regression loss value determined based on at least one of a first regression loss component corresponding to the degree of bounding box distribution similarity, a second regression loss component corresponding to the degree of bounding box coordinate coincidence, and the regression loss compensation value.
  11. A target detection method, the method comprising:
    obtaining a second initial bounding box, wherein the second initial bounding box is obtained by performing target region extraction on an image to be detected using a preset region of interest extraction model;
    inputting the second initial bounding box into a target detection model for target detection to obtain a second predicted bounding box and a second predicted category corresponding to the second initial bounding box; and
    generating a target detection result of the image to be detected based on the second predicted bounding box and the second predicted category corresponding to the second initial bounding box.
  12. The method according to claim 11, wherein the target detection model comprises a classification sub-model and a generating sub-model;
    in the target detection process, the generating sub-model performs bounding box prediction based on the second initial bounding box to obtain the second predicted bounding box corresponding to the second initial bounding box; and the classification sub-model performs classification processing on the second initial bounding box or the second predicted bounding box to obtain the second predicted category corresponding to the second initial bounding box.
  13. A target detection model training apparatus, the apparatus comprising:
    a first bounding box acquisition module, configured to obtain first initial bounding boxes and the real bounding boxes respectively corresponding to the first initial bounding boxes, wherein the first initial bounding boxes are obtained by performing target region extraction on a sample image data set using a preset region-of-interest extraction model; and
    a model training module, configured to input the first initial bounding boxes and the real bounding boxes into a model to be trained for iterative model training until the current model training result satisfies a preset training end condition, to obtain a target detection model;
    wherein the model to be trained includes a generation sub-model and a discrimination sub-model, and each round of model training in the iterative model training includes:
    the generation sub-model performing bounding box prediction based on the first initial bounding box to obtain a first predicted bounding box; the discrimination sub-model generating a discrimination result set based on the real bounding box corresponding to the first initial bounding box and the first predicted bounding box corresponding to the first initial bounding box, wherein the discrimination result set includes a first discrimination result and a second discrimination result, the first discrimination result representing the bounding box distribution similarity between the first predicted bounding box and the real bounding box, and the second discrimination result representing the bounding box coordinate coincidence degree between the first predicted bounding box and the real bounding box; determining a regression loss value of the model to be trained based on the first discrimination result and the second discrimination result corresponding to the first initial bounding box; and updating parameters of the generation sub-model and the discrimination sub-model based on the regression loss value.
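One way to picture how the two discrimination results could feed a regression loss is sketched below. Everything specific here is an assumption made for illustration: intersection over union (IoU) is used as the coordinate coincidence degree, the distribution similarity is taken as a score in [0, 1] supplied by the caller, and the per-box terms are combined as `(1 - similarity) + (1 - IoU)` and then averaged. The claim itself does not specify any of these formulas.

```python
from typing import Sequence, Tuple

Box = Tuple[float, float, float, float]  # assumed layout: (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def regression_loss(
    similarities: Sequence[float],  # hypothetical first discrimination results in [0, 1]
    predicted: Sequence[Box],       # first predicted bounding boxes
    real: Sequence[Box],            # corresponding real bounding boxes
) -> float:
    """Average an assumed per-box loss built from both discrimination results."""
    if not real or not (len(similarities) == len(predicted) == len(real)):
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for s, p, r in zip(similarities, predicted, real):
        # IoU stands in for the second discrimination result (assumption).
        total += (1.0 - s) + (1.0 - iou(p, r))
    return total / len(real)
```

In this sketch the loss is zero only when the predicted box both looks like a real box to the discriminator and coincides exactly with its real bounding box, which is the intuition behind using two separate discrimination results.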
  14. A target detection apparatus, the apparatus comprising:
    a second bounding box acquisition module, configured to obtain a second initial bounding box, wherein the second initial bounding box is obtained by performing target region extraction on an image to be detected using a preset region-of-interest extraction model;
    a target detection module, configured to input the second initial bounding box into a target detection model for target detection, to obtain a second predicted bounding box and a second predicted category corresponding to the second initial bounding box; and
    a detection result generation module, configured to generate a target detection result for the image to be detected based on the second predicted bounding box and the second predicted category corresponding to the second initial bounding box.
  15. A computer device, the device comprising:
    a processor; and
    a memory arranged to store computer-executable instructions, the executable instructions being configured to be executed by the processor and including instructions for performing the steps of the method according to any one of claims 1-10 or any one of claims 11-12.
  16. A storage medium for storing computer-executable instructions, the executable instructions causing a computer to perform the method according to any one of claims 1-10 or any one of claims 11-12.
PCT/CN2023/100274 2022-07-15 2023-06-14 Target detection model training method and apparatus, and target detection method and apparatus WO2024012138A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210831208.2A CN117437395A (en) 2022-07-15 2022-07-15 Target detection model training method, target detection method and target detection device
CN202210831208.2 2022-07-15

Publications (1)

Publication Number Publication Date
WO2024012138A1 (en) 2024-01-18

Family

ID=89535471

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/100274 WO2024012138A1 (en) 2022-07-15 2023-06-14 Target detection model training method and apparatus, and target detection method and apparatus

Country Status (2)

Country Link
CN (1) CN117437395A (en)
WO (1) WO2024012138A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118050839A (en) * 2024-04-16 2024-05-17 上海频准激光科技有限公司 Target grating generation method
CN118050894A (en) * 2024-04-16 2024-05-17 上海频准激光科技有限公司 Control system of light reflection module

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110569901A (en) * 2019-09-05 2019-12-13 北京工业大学 Channel selection-based countermeasure elimination weak supervision target detection method
CN111767962A (en) * 2020-07-03 2020-10-13 中国科学院自动化研究所 One-stage target detection method, system and device based on generation countermeasure network
CN114565916A (en) * 2022-02-07 2022-05-31 苏州浪潮智能科技有限公司 Target detection model training method, target detection method and electronic equipment

Also Published As

Publication number Publication date
CN117437395A (en) 2024-01-23

Similar Documents

Publication Publication Date Title
US11594070B2 (en) Face detection training method and apparatus, and electronic device
CN110069709B (en) Intention recognition method, device, computer readable medium and electronic equipment
CN110414550B (en) Training method, device and system of face recognition model and computer readable medium
JPWO2014136316A1 (en) Information processing apparatus, information processing method, and program
JP2006155594A (en) Pattern recognition device, pattern recognition method
CN114549894A (en) Small sample image increment classification method and device based on embedded enhancement and self-adaptation
CN110569870A (en) deep acoustic scene classification method and system based on multi-granularity label fusion
CN116089648B (en) File management system and method based on artificial intelligence
CN109961103B (en) Training method of feature extraction model, and image feature extraction method and device
CN114997287A (en) Model training and data processing method, device, equipment and storage medium
CN113255604B (en) Pedestrian re-identification method, device, equipment and medium based on deep learning network
Lee et al. Reinforced adaboost learning for object detection with local pattern representations
CN113435531A (en) Zero sample image classification method and system, electronic equipment and storage medium
WO2024012138A1 (en) Target detection model training method and apparatus, and target detection method and apparatus
CN115618043B (en) Text operation graph mutual inspection method and model training method, device, equipment and medium
WO2024012179A1 (en) Model training method, target detection method and apparatuses
WO2024012217A1 (en) Model training method and device, and target detection method and device
CN114937454A (en) Method, device and storage medium for preventing voice synthesis attack by voiceprint recognition
CN112214626B (en) Image recognition method and device, readable storage medium and electronic equipment
CN112862758A (en) Training method for neural network for detecting paint application quality of wall top surface
CN117437396A (en) Target detection model training method, target detection method and target detection device
CN112507137B (en) Small sample relation extraction method based on granularity perception in open environment and application
CN116737974B (en) Method and device for determining threshold value for face image comparison and electronic equipment
CN112085041B (en) Training method and training device of neural network and electronic equipment
WO2024016945A1 (en) Training method for image classification model, image classification method, and related device

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23838632

Country of ref document: EP

Kind code of ref document: A1