CN117437395A - Target detection model training method, target detection method and target detection device

Target detection model training method, target detection method and target detection device

Info

Publication number
CN117437395A
CN117437395A (application CN202210831208.2A)
Authority
CN
China
Prior art keywords
model
boundary
prediction
initial
boundary frame
Prior art date
Legal status
Pending
Application number
CN202210831208.2A
Other languages
Chinese (zh)
Inventor
吕永春
朱徽
王钰
周迅溢
曾定衡
蒋宁
Current Assignee
Mashang Consumer Finance Co Ltd
Original Assignee
Mashang Consumer Finance Co Ltd
Priority date
Filing date
Publication date
Application filed by Mashang Consumer Finance Co Ltd filed Critical Mashang Consumer Finance Co Ltd
Priority to CN202210831208.2A priority Critical patent/CN117437395A/en
Priority to PCT/CN2023/100274 priority patent/WO2024012138A1/en
Publication of CN117437395A publication Critical patent/CN117437395A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
        • G06 COMPUTING; CALCULATING OR COUNTING
            • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
                • G06N 3/00 Computing arrangements based on biological models
                    • G06N 3/02 Neural networks
            • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
                • G06V 10/00 Arrangements for image or video recognition or understanding
                    • G06V 10/20 Image preprocessing
                        • G06V 10/25 Determination of region of interest [ROI] or a volume of interest [VOI]
                    • G06V 10/40 Extraction of image or video features
                        • G06V 10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
                    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
                        • G06V 10/74 Image or video pattern matching; Proximity measures in feature spaces
                            • G06V 10/75 Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches, e.g. multi-scale approaches; using context analysis; Selection of dictionaries
                        • G06V 10/766 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using regression, e.g. by projecting features on hyperplanes

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)

Abstract

In the model training stage, the model to be trained continuously learns the bounding box distribution from the real bounding boxes and the first initial bounding boxes, so that the first prediction bounding boxes become closer to the real bounding boxes; this improves the bounding box prediction accuracy, generalization, and data migration capability of the target detection model. The model to be trained includes a generation sub-model and a discrimination sub-model; a regression loss value is determined from the discrimination result set output by the discrimination sub-model, and the model parameters are repeatedly updated based on this regression loss value, which improves the parameter update efficiency of the generation sub-model. Because the discrimination result set contains discrimination results characterizing both the bounding box distribution similarity and the bounding box coordinate coincidence, the regression loss value derived from it is more accurate, which in turn improves the accuracy of the model parameters updated based on that regression loss value.

Description

Target detection model training method, target detection method and target detection device
Technical Field
The present disclosure relates to the field of target detection, and in particular, to a target detection model training method, a target detection method, and a target detection device.
Background
With the rapid development of artificial intelligence technology, there is a growing demand for performing target detection on an image with a pre-trained target detection model, i.e., predicting the coordinate information and classification information of the bounding box of each target contained in the image. However, in existing training processes for target detection models, the model parameters are trained by extracting image features, so the trained target detection model is accurate on the sample image dataset, but its accuracy drops on new images to be detected, and the target detection accuracy in the model application stage is therefore relatively low.
Disclosure of Invention
The embodiments of the present application aim to provide a target detection model training method, a target detection method, and a target detection device, which not only improve the bounding box prediction accuracy, generalization, and data migration capability of the target detection model, but also improve the accuracy of the determined regression loss value and, in turn, the accuracy of the model parameters updated based on that regression loss value.
To achieve the above objective, the technical solutions of the embodiments of the present application are implemented as follows:
In a first aspect, an embodiment of the present application provides a method for training a target detection model, where the method includes:
acquiring a first preset number of first initial bounding boxes, and acquiring a real bounding box corresponding to each first initial bounding box; the first initial bounding boxes are obtained by performing target region extraction on a sample image dataset with a preset region of interest extraction model;
inputting the first initial bounding boxes and the real bounding boxes into a model to be trained for iterative model training until the current model training result meets a preset model training ending condition, so as to obtain a target detection model;
the model to be trained includes a generation sub-model and a discrimination sub-model; each round of model training is implemented as follows:
for each first initial bounding box: the generation sub-model performs bounding box prediction based on the first initial bounding box to obtain a first prediction bounding box; the discrimination sub-model generates a discrimination result set based on the real bounding box corresponding to the first initial bounding box and the first prediction bounding box corresponding to the first initial bounding box; the discrimination result set includes a first discrimination result and a second discrimination result, where the first discrimination result characterizes the bounding box distribution similarity between the first prediction bounding box and the real bounding box, and the second discrimination result characterizes the bounding box coordinate coincidence between the first prediction bounding box and the real bounding box;
determining a regression loss value of the model to be trained based on the first discrimination result and the second discrimination result corresponding to each first initial bounding box;
and updating parameters of the generation sub-model and the discrimination sub-model based on the regression loss value.
In a second aspect, an embodiment of the present application provides a target detection method, where the method includes:
acquiring a third preset number of second initial bounding boxes; the second initial bounding boxes are obtained by performing target region extraction on an image to be detected with a preset region of interest extraction model;
inputting the second initial bounding boxes into a target detection model for target detection to obtain a second prediction bounding box and a second prediction category corresponding to each second initial bounding box;
and generating a target detection result of the image to be detected based on the second prediction bounding boxes and the second prediction categories corresponding to the second initial bounding boxes.
In a third aspect, an embodiment of the present application provides a target detection model training apparatus, where the apparatus includes:
a first bounding box acquisition module configured to acquire a first preset number of first initial bounding boxes and to acquire a real bounding box corresponding to each first initial bounding box; the first initial bounding boxes are obtained by performing target region extraction on a sample image dataset with a preset region of interest extraction model;
a model training module configured to input the first initial bounding boxes and the real bounding boxes into a model to be trained for iterative model training until the current model training result meets a preset model training ending condition, so as to obtain a target detection model;
the model to be trained includes a generation sub-model and a discrimination sub-model; each round of model training is implemented as follows:
for each first initial bounding box: the generation sub-model performs bounding box prediction based on the first initial bounding box to obtain a first prediction bounding box; the discrimination sub-model generates a discrimination result set based on the real bounding box corresponding to the first initial bounding box and the first prediction bounding box corresponding to the first initial bounding box; the discrimination result set includes a first discrimination result and a second discrimination result, where the first discrimination result characterizes the bounding box distribution similarity between the first prediction bounding box and the real bounding box, and the second discrimination result characterizes the bounding box coordinate coincidence between the first prediction bounding box and the real bounding box; determining a regression loss value of the model to be trained based on the first discrimination result and the second discrimination result corresponding to each first initial bounding box; and updating parameters of the generation sub-model and the discrimination sub-model based on the regression loss value.
In a fourth aspect, an embodiment of the present application provides a target detection apparatus, including:
a second bounding box acquisition module configured to acquire a third preset number of second initial bounding boxes; the second initial bounding boxes are obtained by performing target region extraction on an image to be detected with a preset region of interest extraction model;
a target detection module configured to input the second initial bounding boxes into a target detection model for target detection, so as to obtain a second prediction bounding box and a second prediction category corresponding to each second initial bounding box;
and a detection result generation module configured to generate a target detection result of the image to be detected based on the second prediction bounding box and the second prediction category corresponding to each second initial bounding box.
In a fifth aspect, an embodiment of the present application provides a computer device, where the device includes:
a processor; and a memory arranged to store computer-executable instructions that, when executed by the processor, cause the processor to perform the steps of the method described in the first or second aspect.
In a sixth aspect, an embodiment of the present application provides a storage medium storing computer-executable instructions that, when executed, cause a computer to perform the steps of the method described in the first or second aspect.
It can be seen that, in the embodiments of the present application, in the model training stage, the model to be trained is driven to continuously learn the bounding box distribution from the real bounding boxes and the first initial bounding boxes, so that the predicted first prediction bounding boxes become closer to the real bounding boxes. This not only improves the accuracy with which the trained target detection model predicts the bounding box of a target object's position in an image to be detected, but also improves the generalization of the trained model, thereby ensuring the target detection accuracy on new images to be detected and improving the data migration adaptability of the trained model. The model to be trained includes a generation sub-model and a discrimination sub-model; the regression loss value of the model to be trained is determined from the discrimination result set output by the discrimination sub-model, and the model parameters of both sub-models are repeatedly updated based on this regression loss value until the current model training result meets the preset model training ending condition. In other words, the bounding box distribution is continuously learned through multiple rounds of generation-discrimination adversarial training: the discrimination sub-model judges whether the first prediction bounding box produced by the generation sub-model is sufficiently realistic, and when the generated bounding box (i.e., the first prediction bounding box) is difficult to distinguish from the real bounding box, the model parameters are adjusted based on the discrimination result, so that the first prediction bounding box predicted by the generation sub-model moves ever closer to the real bounding box, which further improves the parameter update efficiency and the bounding box distribution learning accuracy of the generation sub-model. Moreover, the discrimination result set output by the discrimination sub-model contains not only a first discrimination result characterizing the bounding box distribution similarity but also a second discrimination result characterizing the bounding box coordinate coincidence, which compensates for the loss caused by bounding boxes whose distributions are similar but whose specific positions deviate; the regression loss value obtained from the discrimination result set is therefore more accurate, and the accuracy of the model parameters updated based on that regression loss value can be further improved.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings required in the description of the embodiments or the prior art are briefly introduced below. It is obvious that the drawings in the following description are only some of the embodiments described in the present application, and that other drawings can be obtained from these drawings by a person of ordinary skill in the art without inventive effort.
Fig. 1 is a schematic flow chart of a training method of a target detection model according to an embodiment of the present application;
FIG. 2 is a schematic flow chart of each model training process in the target detection model training method according to the embodiment of the present application;
FIG. 3 is a schematic diagram of a first implementation principle of the training method of the target detection model according to the embodiment of the present application;
FIG. 4a is a schematic diagram of a second implementation principle of the training method of the target detection model according to the embodiment of the present application;
fig. 4b is a schematic diagram of a third implementation principle of the training method of the target detection model according to the embodiment of the present application;
fig. 5 is a schematic flow chart of a target detection method according to an embodiment of the present application;
fig. 6 is a schematic diagram of an implementation principle of a target detection method according to an embodiment of the present application;
Fig. 7 is a schematic diagram of module composition of a training device for a target detection model according to an embodiment of the present application;
fig. 8 is a schematic block diagram of a target detection apparatus according to an embodiment of the present disclosure;
fig. 9 is a schematic structural diagram of a computer device according to an embodiment of the present application.
Detailed Description
In order to better understand the technical solutions in one or more embodiments of the present application, the technical solutions in the embodiments of the present application will be described clearly and completely below with reference to the accompanying drawings. It is obvious that the described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present application without inventive effort shall fall within the protection scope of the present application.
It should be noted that, without conflict, one or more embodiments of the present application and features of the embodiments may be combined with each other. Embodiments of the present application will be described in detail below with reference to the accompanying drawings in conjunction with the embodiments.
Consider that, if a deep network is used to extract features, the model is driven to learn the image features inside the bounding boxes and to continuously learn the degree of similarity between the image features in the prediction bounding box and in the real bounding box, adjusting the model parameters accordingly. As a result, the trained target detection model is tied to the sample dataset used in the training stage, its generalization is poor, and its cross-dataset migration capability is weak; that is, the model's target detection accuracy is high on the sample dataset but low on new image data to be detected. To address this, in the model training stage the model is driven to continuously learn the bounding box distribution from the real bounding boxes and the first initial bounding boxes, so that the predicted first prediction bounding box becomes closer to the real bounding box. This improves the accuracy with which the trained target detection model predicts the bounding box of a target object's position in an image to be detected, improves the model's generalization, ensures the target detection accuracy on new images to be detected, and improves the data migration capability of the trained model. The model to be trained includes a generation sub-model and a discrimination sub-model; the regression loss value of the model to be trained is determined from the discrimination result set output by the discrimination sub-model, and the model parameters of both sub-models are repeatedly updated based on this regression loss value until the current model training result meets the preset model training ending condition. In other words, the bounding box distribution is continuously learned through multiple rounds of generation-discrimination adversarial training: the discrimination sub-model judges whether the first prediction bounding box predicted by the generation sub-model is sufficiently realistic, and when the generated bounding box (i.e., the first prediction bounding box) is difficult to distinguish from the real bounding box, the model parameters are adjusted based on the discrimination result, so that the first prediction bounding box predicted by the generation sub-model moves closer to the real bounding box, further improving the parameter update efficiency and the bounding box distribution learning accuracy of the generation sub-model.
Furthermore, if the model regression loss were determined only from the coarse-grained comparison dimension of bounding box distribution similarity, accurate position learning of the bounding box could not be taken into account; if it were determined only from the fine-grained comparison dimension of bounding box coordinate coincidence, the problem of bounding box edge ambiguity could not be taken into account. The model regression loss is therefore determined by combining the coarse-grained comparison dimension of bounding box distribution similarity with the fine-grained comparison dimension of bounding box coordinate coincidence: the discrimination result set output by the discrimination sub-model contains not only a first discrimination result characterizing the bounding box distribution similarity but also a second discrimination result characterizing the bounding box coordinate coincidence. This accounts both for the regression loss caused by bounding boxes whose distributions are similar but whose specific positions deviate and for the regression loss of first prediction bounding boxes with ambiguous edges, so the regression loss value obtained from the discrimination result set is more accurate, and the accuracy of the model parameters updated based on that regression loss value is further improved.
Fig. 1 is a schematic flow chart of a target detection model training method according to one or more embodiments of the present application. The method in fig. 1 can be executed by an electronic device provided with a target detection model training apparatus, where the electronic device may be a terminal device or a designated server; the hardware device used for target detection model training (i.e., the electronic device provided with the target detection model training apparatus) and the hardware device used for target detection (i.e., the electronic device provided with the target detection apparatus) may be the same or different. It should be noted that the target detection model trained with the target detection model training method provided by the embodiments of the present application can be applied to any specific application scenario in which target detection needs to be performed on an image to be detected. For example, in specific application scenario 1, target detection is performed on an image to be detected captured by an image acquisition device at the entrance of a public place (such as a mall entrance, a subway entrance, a scenic spot entrance, or a performance venue entrance); in specific application scenario 2, target detection is performed on images to be detected captured by the image acquisition devices at the monitoring points of a breeding base.
The sample image dataset used in the target detection model training process differs with the specific application scenario of the target detection model. For specific application scenario 1, the sample image dataset may be historical sample images captured at the entrance of the designated public place within a preset historical time period; correspondingly, the target object outlined by a first initial bounding box is a target user entering the designated public place in the historical sample image, and the real category and the first prediction category may be categories to which the target user belongs, such as at least one of age, gender, height, and occupation. For specific application scenario 2, the sample image dataset may be historical sample images captured at the monitoring points of the designated breeding base within a preset historical time period; correspondingly, the target object outlined by a first initial bounding box is a target breeding object in the historical sample image, and the real category and the first prediction category may be at least one of the living state and the body size of the target breeding object.
Specifically, the training process for the target detection model, as shown in fig. 1, at least includes the following steps:
S102, acquiring a first preset number of first initial bounding boxes, and acquiring a real bounding box corresponding to each first initial bounding box; the first initial bounding boxes are obtained by performing target region extraction on a sample image dataset with a preset region of interest extraction model;
specifically, the determining process of the first initial bounding boxes with the first preset number may be that, for each round of model training, a step of performing target region extraction on the sample image dataset by using the preset region of interest extraction model once is performed, so as to obtain the first initial bounding boxes with the first preset number; the step of extracting the target region from the sample image dataset by using the preset region of interest extraction model may also be performed in advance, and then, for each round of model training, a first preset number of first initial bounding boxes may be randomly sampled from a large number of candidate bounding boxes extracted in advance.
Specifically, the sample image dataset may contain a plurality of sample target objects, and each sample target object may correspond to a plurality of first initial bounding boxes; that is, the first preset number of first initial bounding boxes include at least one first initial bounding box corresponding to each sample target object.
Specifically, before the acquiring of the first preset number of first initial bounding boxes, the method further includes: inputting the sample image dataset into the preset region of interest extraction model for region of interest extraction to obtain a second preset number of candidate bounding boxes, where the second preset number is greater than or equal to the first preset number. That is, when the second preset number equals the first preset number, for each round of model training the preset region of interest extraction model is used to perform region of interest extraction on the plurality of sample images in the sample image dataset to obtain the first preset number of first initial bounding boxes; when the second preset number is greater than the first preset number, for each round of model training the first preset number of first initial bounding boxes are obtained by random sampling from the second preset number of candidate bounding boxes.
One purpose of the model training process is to continuously learn the bounding box distribution through iterative training of the model parameters, so as to improve the generalization and data migration capability of the model (that is, the model parameters do not depend on the sample data used during training and are better suited to the data to be recognized during model application). To drive the model to be trained to learn the bounding box distribution well, the first initial bounding boxes extracted and fed into the model to be trained must follow a certain probability distribution (such as a Gaussian distribution or a Cauchy distribution), and the larger the number N of anchor boxes extracted with the preset region of interest extraction model, the better the bounding box distribution learning of the model to be trained. However, if N anchor boxes were extracted in real time with the preset region of interest extraction model (such as a region of interest extraction algorithm, ROI) each time and all fed into the model to be trained as first initial bounding boxes, the data processing load would inevitably be large and the requirements on the hardware would be relatively high.
In a specific implementation, it is therefore preferable to extract N anchor boxes in advance with the preset region of interest extraction model and, for each round of model training, to randomly sample m anchor boxes from the N anchor boxes as the first initial bounding boxes to be fed into the model to be trained. This keeps the data processing load of each round of model training manageable while still allowing the model to learn the bounding box distribution well; that is, both the data processing load during training and the bounding box distribution learning are taken into account. Accordingly, when the second preset number is greater than the first preset number, obtaining the corresponding first preset number of first initial bounding boxes specifically includes: randomly selecting the first preset number of candidate bounding boxes from the second preset number of candidate bounding boxes as the first initial bounding boxes. In other words, region of interest extraction is performed in advance on the plurality of sample images in the sample image dataset with the preset region of interest extraction model to obtain the second preset number of candidate bounding boxes; then, for each round of model training, the first preset number of first initial bounding boxes are obtained by random sampling from the second preset number of candidate bounding boxes.
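As an illustrative sketch only (not part of the claimed method), the pre-extraction plus per-round random sampling described above might look as follows in Python; the `roi_extractor` callable and the counts `n_candidates` / `m_per_round` are assumptions made for the example.

```python
import random

def extract_candidate_boxes(sample_images, roi_extractor, n_candidates):
    """Run the preset region-of-interest extraction model once, ahead of
    training, to obtain the second preset number N of candidate bounding boxes."""
    candidates = []
    for image in sample_images:
        # each candidate box is e.g. a tuple (x, y, w, h) in image coordinates
        candidates.extend(roi_extractor(image))
    return candidates[:n_candidates]

def sample_first_initial_boxes(candidates, m_per_round):
    """For one round of model training, randomly sample the first preset
    number m of first initial bounding boxes from the pre-extracted candidates."""
    return random.sample(candidates, m_per_round)
```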
That is, it is preferable to extract N anchor boxes (i.e., the second preset number of candidate bounding boxes) in advance, then, for each round of model training, randomly sample m anchor boxes (i.e., the first preset number of first initial bounding boxes) from the N anchor boxes, and then continue with the following step S104.
S104, inputting the first initial bounding boxes and the real bounding boxes into a model to be trained for iterative model training until the current model training result meets a preset model training ending condition, so as to obtain a target detection model; the preset model training ending condition may include: the current number of training rounds reaches the total number of training rounds, the model loss function converges, or the generation sub-model and the discrimination sub-model reach equilibrium;
The specific implementation of the iterative model training in step S104 is described below. Since each round of model training in the iterative training process follows the same procedure, an arbitrary round of model training is taken as an example for detailed description. Specifically, the model to be trained includes a generation sub-model and a discrimination sub-model; as shown in fig. 2, each round of model training may include the following steps S1042 to S1046:
S1042, for each first initial bounding box: the generation sub-model performs bounding box prediction based on the first initial bounding box to obtain a first prediction bounding box; the discrimination sub-model generates a discrimination result set based on the real bounding box corresponding to the first initial bounding box and the first prediction bounding box corresponding to the first initial bounding box; the discrimination result set includes a first discrimination result and a second discrimination result, where the first discrimination result characterizes the bounding box distribution similarity between the first prediction bounding box and the real bounding box, and the second discrimination result characterizes the bounding box coordinate coincidence between the first prediction bounding box and the real bounding box;
Specifically, for the determination of the first discrimination result characterizing the bounding box distribution similarity, the KL divergence between the real bounding box and the corresponding first prediction bounding box could be computed directly. In a specific implementation, however, it is preferable to rely on the discrimination sub-model: the discrimination sub-model judges whether the first prediction bounding box predicted by the generation sub-model is sufficiently realistic, and when the generated bounding box (i.e., the first prediction bounding box) is difficult to distinguish from the real bounding box, the model parameters are adjusted based on the discrimination result, which further drives the first prediction bounding box predicted by the generation sub-model to move closer to the real bounding box. To further improve the accuracy of the regression loss component corresponding to the bounding box distribution similarity and to ensure that the first prediction bounding box predicted by the target detection model is more realistic, the discrimination sub-model determines, for the real bounding box corresponding to a first initial bounding box and the corresponding first prediction bounding box, the discrimination probability that each of them comes from real data or from generated data, and the probability distribution similarity of the two bounding boxes (i.e., the real bounding box and the corresponding first prediction bounding box) is characterized on the basis of these discrimination probabilities. Specifically, for the real bounding box and the first prediction bounding box corresponding to a certain first initial bounding box, the discrimination sub-model outputs the discrimination probability that the real bounding box comes from real data and the discrimination probability that the first prediction bounding box comes from generated data; the larger these two discrimination probabilities, the lower the probability distribution similarity between the first prediction bounding box and the corresponding real bounding box, and the larger the first regression loss component corresponding to the discrimination dimension of bounding box distribution similarity. The distribution similarity between the first prediction bounding box corresponding to a certain first initial bounding box and the corresponding real bounding box is therefore determined based on the discrimination probabilities, from the discrimination sub-model, that the two boxes come from real data or from generated data respectively; the first discrimination result can thus be generated based on these discrimination probabilities, and the first regression loss component corresponding to the bounding box distribution similarity can be determined based on the probabilities in the first discrimination result.
Correspondingly, for the determination of the second discrimination result characterizing the bounding box coordinate coincidence, either only the intersection-over-union loss between a real bounding box and its corresponding first prediction bounding box may be considered to obtain the target intersection-over-union loss, or the intersection-over-union loss between a real bounding box and its corresponding first prediction bounding box may be considered together with the intersection-over-union losses between that real bounding box and the first prediction bounding boxes corresponding to other real bounding boxes, and the target intersection-over-union loss determined from both. The magnitude of the target intersection-over-union loss characterizes the coordinate coincidence between the real bounding box and the corresponding first prediction bounding box, so the second regression loss component corresponding to the discrimination dimension of bounding box coordinate coincidence can be determined based on the target intersection-over-union loss, which further drives the model to perform bounding box regression learning. Specifically, for the real bounding box and the first prediction bounding box corresponding to a certain first initial bounding box, the target intersection-over-union loss between them is determined; the larger the target intersection-over-union loss, the lower the coordinate coincidence between the first prediction bounding box and the corresponding real bounding box, and the larger the corresponding second regression loss component for the discrimination dimension of bounding box coordinate coincidence. The coordinate coincidence between the first prediction bounding box corresponding to a certain first initial bounding box and the corresponding real bounding box is therefore determined based on the target intersection-over-union loss between the real bounding box and the first prediction bounding box; the second discrimination result can be generated based on the target intersection-over-union loss, the second discrimination result characterizes the bounding box coordinate coincidence, and the second regression loss component corresponding to the discrimination dimension of bounding box coordinate coincidence can be determined based on the intersection-over-union loss in the second discrimination result.
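A minimal sketch of the two discrimination results, assuming axis-aligned boxes in (x1, y1, x2, y2) form and a `discriminator` callable that returns a probability in (0, 1); this illustrates the idea and is not the patent's reference implementation.

```python
def first_discrimination_result(discriminator, real_box, pred_box):
    """Discrimination probabilities that each box comes from real data;
    the further apart they are, the lower the bounding box distribution
    similarity and the larger the first regression loss component."""
    return discriminator(real_box), discriminator(pred_box)

def box_iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def second_discrimination_result(real_box, pred_box):
    """Target intersection-over-union loss: the lower the coordinate
    coincidence of the two boxes, the larger this loss."""
    return 1.0 - box_iou(real_box, pred_box)
```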
S1044, determining a regression loss value of the model to be trained based on the first discrimination result and the second discrimination result in the discrimination result set corresponding to each first initial bounding box;
Specifically, after the discrimination result set is obtained for each first initial bounding box, a sub-regression loss value corresponding to each first initial bounding box is obtained, where the sub-regression loss value includes at least a first regression loss component corresponding to the first discrimination dimension considered from the perspective of bounding box distribution similarity and a second regression loss component corresponding to the second discrimination dimension considered from the perspective of bounding box coordinate coincidence; then, the regression loss value used to adjust the model parameters can be determined based on the sub-regression loss values corresponding to the first initial bounding boxes.
In a specific implementation, in determining the sub-regression loss value corresponding to a first initial bounding box, both the bounding box distribution similarity and the bounding box coordinate coincidence may be considered, or only the bounding box distribution similarity may be considered; in the latter case, the discrimination result set corresponding to the first initial bounding box includes the first discrimination result, and correspondingly, the sub-regression loss value corresponding to the first initial bounding box is determined based on the first regression loss component corresponding to the first discrimination result.
S1046, updating parameters of the generation sub-model and the discrimination sub-model based on the regression loss value.
Specifically, after the regression loss value is determined based on the sub-regression loss values corresponding to the first initial bounding boxes, the parameters of the generation sub-model and the discrimination sub-model are adjusted based on the regression loss value by means of gradient descent. Since each sub-regression loss value reflects at least the first regression loss component corresponding to the regression loss discrimination dimension based on bounding box distribution similarity and the second regression loss component corresponding to the regression loss discrimination dimension based on bounding box coordinate coincidence, the regression loss value used to adjust the model parameters also reflects the regression loss components of both discrimination dimensions. The target detection model obtained from the final training can therefore ensure both that the probability distribution of the predicted first prediction bounding box is closer to that of the real bounding box and that the coordinate coincidence between the first prediction bounding box and the real bounding box is higher.
In the model training process, the discrimination sub-model tries as far as possible to distinguish whether the real bounding box corresponding to a first initial bounding box and the corresponding first prediction bounding box come from real data or from generated data, while the regression loss of the model to be trained is minimized; in order to maximize the discrimination error of the discrimination sub-model, the generation sub-model is forced to continuously learn the bounding box distribution, driving the generation sub-model and the discrimination sub-model into multiple rounds of adversarial learning, so that a generation sub-model with more accurate predictions is obtained as the target detection model.
It should be noted that, for the iterative training of the model parameters based on the regression loss value of the model to be trained, reference may be made to the existing procedure of adjusting and optimizing model parameters by back-propagation with gradient descent, which is not repeated here.
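For illustration only, a minimal PyTorch-style sketch of the parameter update in S1046, assuming the generation sub-model and discrimination sub-model are torch.nn.Module instances and that `discriminator_loss_fn` / `generator_loss_fn` assemble the regression loss from the discrimination results described above; it is not the patent's reference implementation.

```python
def update_sub_models(generator, discriminator, opt_g, opt_d,
                      initial_boxes, real_boxes,
                      discriminator_loss_fn, generator_loss_fn):
    """One gradient-descent update of each sub-model, driven by the
    regression loss built from the discrimination result set."""
    pred_boxes = generator(initial_boxes)

    # Discrimination sub-model: learn to tell real boxes from generated ones.
    loss_d = discriminator_loss_fn(discriminator, real_boxes, pred_boxes.detach())
    opt_d.zero_grad()
    loss_d.backward()
    opt_d.step()

    # Generation sub-model: make its predicted boxes indistinguishable from real ones.
    loss_g = generator_loss_fn(discriminator, real_boxes, generator(initial_boxes))
    opt_g.zero_grad()
    loss_g.backward()
    opt_g.step()
    return loss_d.item(), loss_g.item()
```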
As shown in fig. 3, a schematic diagram of a specific implementation principle of a training process of a target detection model is provided, which specifically includes:
acquiring a first preset number of first initial bounding boxes, and acquiring a real bounding box corresponding to each first initial bounding box;
for each first initial bounding box: the generation sub-model performs bounding box prediction based on the first initial bounding box to obtain a first prediction bounding box; the discrimination sub-model generates a discrimination result set based on the real bounding box corresponding to the first initial bounding box and the first prediction bounding box corresponding to the first initial bounding box;
determining a regression loss value of the model to be trained based on the first discrimination result and the second discrimination result corresponding to each first initial bounding box;
and iteratively updating the model parameters of the model to be trained based on the regression loss value until the current model training result meets the preset model training ending condition, thereby obtaining the target detection model.
In the embodiments of the present application, in the model training stage, the model to be trained is driven to continuously learn the bounding box distribution from the real bounding boxes and the first initial bounding boxes, so that the predicted first prediction bounding boxes become closer to the real bounding boxes. This improves the accuracy with which the trained target detection model predicts the bounding box of a target object's position in an image to be detected, improves the generalization of the trained model, ensures the target detection accuracy on new images to be detected, and improves the data migration adaptability of the trained model. The model to be trained includes a generation sub-model and a discrimination sub-model; the regression loss value of the model to be trained is determined from the discrimination result set output by the discrimination sub-model, and the model parameters of both sub-models are repeatedly updated based on this regression loss value until the current model training result meets the preset model training ending condition. In other words, the bounding box distribution is continuously learned through multiple rounds of generation-discrimination adversarial training: the discrimination sub-model judges whether the first prediction bounding box predicted by the generation sub-model is sufficiently realistic, and when the generated bounding box (i.e., the first prediction bounding box) is difficult to distinguish from the real bounding box, the model parameters are adjusted based on the discrimination result, so that the first prediction bounding box predicted by the generation sub-model moves closer to the real bounding box, further improving the parameter update efficiency and the bounding box distribution learning accuracy of the generation sub-model. Moreover, the discrimination result set output by the discrimination sub-model contains not only a first discrimination result characterizing the bounding box distribution similarity but also a second discrimination result characterizing the bounding box coordinate coincidence, which compensates for the bounding box regression loss caused by bounding boxes whose distributions are similar but whose specific positions deviate; the regression loss value obtained from the discrimination result set is therefore more accurate, and the accuracy of the model parameters updated based on that regression loss value can be further improved.
Further, considering that during model training the regression loss corresponding to the first discrimination dimension, considered from the perspective of bounding box distribution similarity, may decrease suddenly or even have a zero gradient, a regression loss compensation value is introduced to further improve the training accuracy of the model parameters; on this basis, the discrimination result set further includes a third discrimination result. Correspondingly, the generating of the discrimination result set in S1042 based on the real bounding box corresponding to the first initial bounding box and the first prediction bounding box corresponding to the first initial bounding box specifically includes:
performing bounding box authenticity discrimination on the real bounding box and the first prediction bounding box corresponding to the first initial bounding box to obtain the first discrimination result; calculating a bounding box intersection-over-union loss based on the real bounding box and the first prediction bounding box corresponding to the first initial bounding box to obtain the second discrimination result; and calculating, based on the real bounding box and the first prediction bounding box corresponding to the first initial bounding box, a regression loss compensation value for constraining the loss gradient of the regression loss function of the model to be trained, so as to obtain the third discrimination result.
Specifically, for each first initial bounding box, the corresponding discrimination result set includes not only the first discrimination result obtained from the perspective of bounding box distribution similarity and the second discrimination result obtained from the perspective of bounding box coordinate coincidence, but also the regression loss compensation value for constraining the gradient of the regression loss corresponding to the first discrimination dimension; this improves the accuracy of the regression loss value and alleviates the problem that the gradient of the regression loss corresponding to the first discrimination dimension decreases suddenly or even becomes zero.
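The application does not fix a concrete formula for the regression loss compensation value. Purely as an illustrative assumption, a smooth-L1 term on the box coordinates is one way to keep the regression loss gradient from vanishing when the distribution-similarity term saturates:

```python
def regression_loss_compensation(real_box, pred_box, beta=1.0):
    """Illustrative (assumed) compensation term: a smooth-L1 distance between
    the real and predicted box coordinates, quadratic near zero and linear for
    large deviations, so the gradient does not collapse to zero."""
    total = 0.0
    for r, p in zip(real_box, pred_box):
        diff = abs(p - r)
        total += 0.5 * diff * diff / beta if diff < beta else diff - 0.5 * beta
    return total
```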
In implementation, as shown in fig. 4a, a schematic diagram of another specific implementation principle of the training process of the target detection model is provided, which specifically includes:
performing target region extraction on the sample image dataset in advance with the preset region of interest extraction model to obtain N anchor boxes; the sample image dataset includes a plurality of original sample images, and each original sample image contains at least one target object; the feature information corresponding to each anchor box may include position information (x, y, w, h) and category information c, i.e., (x, y, w, h, c); specifically, during model training, the parameter dimensions can be set to be mutually independent, so that the iterative training of the model parameters for each dimension is also mutually independent;
for each round of model training, randomly sampling m anchor boxes from the N anchor boxes as the first initial bounding boxes, and determining the real bounding box corresponding to each first initial bounding box; each target object in the sample image dataset may correspond to one real bounding box, e.g. if the total number of target objects in the sample image dataset is d, the number of real bounding boxes before expansion is d; so that the real bounding boxes correspond one-to-one with the first prediction bounding boxes, the real bounding boxes corresponding to first initial bounding boxes containing the same target object may be identical, i.e. the real bounding boxes are expanded based on the target objects outlined by the first initial bounding boxes, resulting in m real bounding boxes (m > d); for example, if the target object contained in a certain original sample image is cat A, cat A corresponds to real bounding box a, and the number of first initial bounding boxes containing cat A is 4 (e.g., the first initial bounding boxes with serial numbers 6, 7, 8, and 9), then real bounding box a is expanded into 4 copies of real bounding box a (i.e., the real bounding boxes with serial numbers 6, 7, 8, and 9);
for each first initial bounding box, the generation sub-model performs bounding box prediction based on the first initial bounding box to obtain a first prediction bounding box, and the discrimination sub-model generates a discrimination result set based on the real bounding box corresponding to the first initial bounding box and the corresponding first prediction bounding box; each first initial bounding box corresponds to one real bounding box and one first prediction bounding box, and the first prediction bounding boxes are obtained by the generation sub-model through continuous bounding box regression learning; specifically, among the m first prediction bounding boxes output by the generation sub-model, the target object outlined by the first prediction bounding boxes with serial numbers 6, 7, 8, and 9 is cat A;
for each first initial bounding box, determining a first regression loss component based on the first discrimination result in the discrimination result set of the first initial bounding box, determining a second regression loss component based on the second discrimination result in the discrimination result set of the first initial bounding box, and determining a third regression loss component based on the third discrimination result in the discrimination result set of the first initial bounding box;
determining the regression loss value of the model to be trained based on the first regression loss component, the second regression loss component, and the third regression loss component corresponding to each first initial bounding box; adjusting the model parameters of the generation sub-model and the discrimination sub-model based on the regression loss value by means of stochastic gradient descent, so as to obtain the generation sub-model and the discrimination sub-model with updated parameters;
if the current model training result meets the preset model training ending condition, determining the updated generation sub-model as the trained target detection model;
if the current model training result does not meet the preset model training ending condition, taking the updated generation sub-model and the updated discrimination sub-model as the model to be trained for the next round of model training, until the preset model training ending condition is met.
Specifically, in the model training process, for each round of model training, the model parameters of the discrimination sub-model may be adjusted based on the discrimination result set, and the model parameters of the generation sub-model may also be adjusted based on the discrimination result set. In a specific implementation, however, in order to improve the training accuracy of the model parameters of the generation sub-model, for each round of model training the model parameters of the discrimination sub-model are first adjusted t times in a loop based on the discrimination result set, and then the model parameters of the generation sub-model are adjusted once based on the discrimination result set; the discrimination sub-model and the generation sub-model with adjusted parameters are then used as the model to be trained for the next round.
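The t-to-1 update schedule just described can be sketched as follows (illustrative only; `update_discriminator` and `update_generator` stand in for per-model update steps such as the one sketched after step S1046):

```python
def train_one_round(generator, discriminator, batch,
                    update_discriminator, update_generator, t_disc_updates):
    """One round of adversarial training: adjust the discrimination sub-model
    t times on the current batch, then adjust the generation sub-model once."""
    for _ in range(t_disc_updates):
        update_discriminator(generator, discriminator, batch)
    update_generator(generator, discriminator, batch)
```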
The regression loss value of the model to be trained is determined based on the sub-regression loss values corresponding to the plurality of first initial bounding boxes, and the sub-regression loss value corresponding to each first initial bounding box is determined based on a plurality of regression loss components. On this basis, the determining of the regression loss value of the model to be trained in step S1044 based on the first discrimination result and the second discrimination result corresponding to each first initial bounding box specifically includes:
determining the sub-regression loss value corresponding to each first initial bounding box; the sub-regression loss value corresponding to each first initial bounding box is determined based on target information, where the target information includes one or a combination of the following: the bounding box distribution similarity characterized by the first discrimination result corresponding to the first initial bounding box, the bounding box coordinate coincidence characterized by the second discrimination result, and the regression loss compensation value characterized by the third discrimination result;
and determining the regression loss value of the model to be trained based on the sub-regression loss value corresponding to each first initial bounding box.
Specifically, in determining the sub-regression loss value corresponding to a first initial bounding box, only the first regression loss component corresponding to the first discrimination result may be considered; the first regression loss component corresponding to the first discrimination result and the second regression loss component corresponding to the second discrimination result may be considered together; or the first regression loss component, the second regression loss component and the regression loss compensation component corresponding to the third discrimination result may all be considered together. Taking the case in which all three components are considered as an example, for each first initial bounding box the corresponding sub-regression loss value is equal to the weighted sum of the three regression loss components, which may be expressed as:
V_i(D, G) = λ_1·V_i1 + λ_2·V_i2 + λ_3·V_i3
where λ_1 denotes the first weight coefficient corresponding to the first regression loss component in the first discrimination dimension, V_i1 denotes the first regression loss component in the first discrimination dimension (i.e., the regression loss component corresponding to the degree of similarity of the bounding box distribution characterized by the first discrimination result), λ_2 denotes the second weight coefficient corresponding to the second regression loss component in the second discrimination dimension, V_i2 denotes the second regression loss component in the second discrimination dimension (i.e., the regression loss component corresponding to the degree of coincidence of the bounding box coordinates characterized by the second discrimination result), λ_3 denotes the third weight coefficient corresponding to the regression loss compensation value, and V_i3 denotes the regression loss compensation value (i.e., the third regression loss component); specifically, the first discrimination dimension may be a regression loss discrimination dimension based on the degree of similarity of the bounding box distribution, and the second discrimination dimension may be a regression loss discrimination dimension based on the degree of coincidence of the bounding box coordinates.
In a specific implementation, the first weight coefficient and the second weight coefficient may be kept unchanged for all of the first initial bounding boxes. However, the first regression loss component and the second regression loss component correspond to different regression loss discrimination dimensions (namely, the regression loss discrimination dimension based on the degree of similarity of the bounding box distribution and the regression loss discrimination dimension based on the degree of coincidence of the bounding box coordinates), and these dimensions emphasize different aspects of the regression loss: for example, for a real bounding box whose edges are blurred, the regression loss of the corresponding first initial bounding box is better captured by the dimension based on the degree of coincidence of the bounding box coordinates, whereas for a first prediction bounding box whose distribution is similar to that of the real bounding box but whose specific position deviates, the size relation between the first regression loss component and the second regression loss component reflects, to a certain degree, which discrimination dimension can more accurately represent the regression loss between the real bounding box and the first prediction bounding box. Based on this, for each first initial bounding box, the first weight coefficient and the second weight coefficient may be adaptively adjusted: if the absolute value of the difference between the first regression loss component and the second regression loss component is not greater than a preset loss threshold, the first weight coefficient and the second weight coefficient are kept unchanged; if the absolute value of the difference is greater than the preset loss threshold and the first regression loss component is greater than the second regression loss component, the first weight coefficient is increased according to a first preset adjustment mode; if the absolute value of the difference is greater than the preset loss threshold and the first regression loss component is smaller than the second regression loss component, the second weight coefficient is increased according to a second preset adjustment mode. In this way, for each first initial bounding box, the regression loss component corresponding to the discrimination dimension that better reflects its bounding box regression loss is given greater weight in the model training process, which further improves the accuracy of model parameter optimization.
It should be noted that, the first weight coefficient increasing amplitude corresponding to the first preset adjusting mode and the second weight coefficient increasing amplitude corresponding to the second preset adjusting mode may be the same or different, and the weight coefficient increasing amplitude may be set according to actual requirements, which is not limited in this application.
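The weighted combination of the three regression loss components and the adaptive adjustment of the first and second weight coefficients described above can be sketched as follows; the increment value delta and the function itself are illustrative assumptions rather than part of the application:

```python
def sub_regression_loss(v_i1, v_i2, v_i3, lam1, lam2, lam3,
                        loss_threshold, delta=0.1):
    """Weighted sum V_i = lam1*V_i1 + lam2*V_i2 + lam3*V_i3, with the first two
    weights adjusted according to which discrimination dimension dominates."""
    if abs(v_i1 - v_i2) > loss_threshold:
        if v_i1 > v_i2:
            lam1 += delta        # first preset adjustment mode: increase lambda_1
        else:
            lam2 += delta        # second preset adjustment mode: increase lambda_2
    # otherwise both weight coefficients are kept unchanged
    return lam1 * v_i1 + lam2 * v_i2 + lam3 * v_i3, (lam1, lam2, lam3)
```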
For the process of obtaining the first discrimination result from the discrimination dimension of the bounding box distribution similarity degree, performing true-false discrimination on the real bounding box and the first prediction bounding box corresponding to the first initial bounding box to obtain the first discrimination result specifically includes:
Step A1, determining, based on the real boundary frame corresponding to the first initial boundary frame, a first discrimination probability that the real boundary frame is predicted as true by the discrimination sub-model; and determining, based on the first prediction boundary frame corresponding to the first initial boundary frame, a second discrimination probability that the first prediction boundary frame is predicted as counterfeit by the discrimination sub-model;
and A2, generating a first judging result corresponding to the first initial boundary box based on the first judging probability and the second judging probability corresponding to the first initial boundary box.
Specifically, for each first initial boundary frame, the discrimination sub-model determines the probability that the real boundary frame corresponding to the first initial boundary frame comes from real data, that is, for the real boundary frame, the discrimination sub-model performs true-false discrimination on it to obtain a first discrimination probability that the real boundary frame is predicted to be real data; similarly, for each first initial bounding box, the discrimination sub-model determines the probability that the first prediction bounding box corresponding to the first initial bounding box comes from generated data (that is, the value 1 minus the probability, determined by the discrimination sub-model, that the first prediction bounding box comes from real data), that is, for the first prediction bounding box, the discrimination sub-model performs true-false discrimination on it to obtain a second discrimination probability that the first prediction bounding box is predicted to be generated data.
Specifically, the first probability distribution corresponding to the real boundary frame is compared with the second probability distribution corresponding to the first prediction boundary frame from the boundary frame distribution similarity degree angle by the judging sub-model, so that the real boundary frame and the first prediction boundary frame are subjected to true-false judgment to obtain corresponding judging probabilities, the judging probabilities can represent the distribution similarity degree between the real boundary frame and the corresponding first prediction boundary frame, and therefore, after the first judging probabilities and the second judging probabilities are determined, a first judging result can be obtained, wherein the first judging result can represent the boundary frame distribution similarity degree; further, based on the first discrimination result, a first regression loss component corresponding to a discrimination dimension representing the distribution similarity degree of the boundary frame can be determined, wherein the larger the first discrimination probability and the second discrimination probability are, the lower the distribution similarity degree of the real boundary frame corresponding to the first initial boundary frame and the corresponding first prediction boundary frame is represented, and therefore the larger the first regression loss component corresponding to the first initial boundary frame is; and updating model parameters of the generation sub model based on the first regression loss component, so that the generation result of the generation sub model can optimize the loss value of the model to be trained after being predicted by the discrimination sub model, the purpose of optimizing the generation sub model is achieved, and the boundary frame prediction effect of the generation sub model is improved.
Further, in order to improve the accuracy of the first discrimination result corresponding to each first initial bounding box, so that in the process of determining the sub-regression loss value based on the first discrimination result, the accuracy of the first regression loss component corresponding to the discrimination dimension of the bounding box distribution similarity degree can be improved, based on this, the step A2 generates the first discrimination result based on the first discrimination probability and the second discrimination probability corresponding to the first initial bounding box, specifically including:
step A21, determining a first weighted probability based on the first discrimination probability and a first prior probability of a real boundary box corresponding to the first initial boundary box; determining a second weighted probability based on the second discrimination probability and a second prior probability of the first initial bounding box;
step A22, generating a first discrimination result based on the first weighted probability and the second weighted probability corresponding to the first initial boundary box.
Specifically, in determining the first discrimination result characterizing the similarity of the boundary frame distribution, the discrimination sub-model performs the true and false discrimination on the true boundary frame and the first prediction boundary frame in consideration of the first prior probability of the true boundary frame and the second prior probability of the first initial boundary frame, and the obtained first discrimination probability and second discrimination probability are weighted to determine the first discrimination result (i.e., the first discrimination result may include the first weighted probability and the second weighted probability), so that the first regression loss component related to the similarity of the boundary frame distribution obtained based on the first discrimination result may be expressed as:
where p_i^r denotes the prior probability of occurrence of the i-th real bounding box (i.e., the first prior probability), P_i1 denotes the first discrimination probability that the i-th real bounding box is predicted as true by the discrimination sub-model, p_i^a denotes the prior probability of occurrence of the i-th first initial bounding box (i.e., the second prior probability), and P_i2 denotes the second discrimination probability that the i-th first prediction bounding box is predicted as counterfeit by the discrimination sub-model.
In a specific implementation, it should be noted that p_i^a may be given as the prior probability of occurrence of the i-th first initial bounding box; since the first prediction bounding box is derived by the generation sub-model from the first initial bounding box through bounding box regression, p_i^a may also serve as the prior probability of occurrence of the i-th first prediction bounding box.
Specifically, since the probability of occurrence of the real bounding box and the prediction bounding box both obey a certain probability distribution, such as a gaussian distribution, the first prior probability and the second prior probability can be obtained by:
p_i^r = (1 / √(2π·σ_1)) · exp( −(b_i^r − μ_1)² / (2·σ_1) )

where b_i^r denotes the real bounding box corresponding to the first initial bounding box with sequence number i, σ_1 denotes the variance of the distribution probability of the first preset number of real bounding boxes, and μ_1 denotes the mean value of the distribution probability of the first preset number of real bounding boxes.

p_i^a = (1 / √(2π·σ_2)) · exp( −(b_i^a − μ_2)² / (2·σ_2) )

where b_i^a denotes the first initial bounding box with sequence number i, σ_2 denotes the variance of the distribution probability of the first preset number of first initial bounding boxes, and μ_2 denotes the mean value of the distribution probability of the first preset number of first initial bounding boxes.
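Under the Gaussian assumption above, the prior probabilities might be estimated as in the following sketch; reducing each bounding box to a scalar summary of its coordinates is an assumption made only so the example stays one-dimensional:

```python
import numpy as np

def gaussian_prior(box, boxes):
    """Prior probability of a bounding box under a Gaussian fitted to the first
    preset number of boxes (real boxes for the first prior, initial boxes for
    the second prior); `box` and every entry of `boxes` are coordinate arrays."""
    values = np.array([np.mean(b) for b in boxes])   # scalar summary per box
    mu = values.mean()                               # mean of the distribution
    var = values.var() + 1e-8                        # variance of the distribution
    x = np.mean(box)
    return np.exp(-((x - mu) ** 2) / (2 * var)) / np.sqrt(2 * np.pi * var)
```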
Specifically, the regression loss value is equal to the sum of the sub-regression loss values corresponding to the first preset number of first initial bounding boxes, which may be specifically expressed as:

V(D, G) = Σ_{i=1}^{N_reg} V_i(D, G)

where N_reg denotes the first preset number, and i denotes the sequence number of a first initial bounding box, with i taking values from 1 to N_reg.
For the process of obtaining the second discrimination result from the discrimination dimension of the bounding box coordinate coincidence degree, calculating the bounding box cross-ratio (intersection-over-union) loss based on the real bounding box and the first prediction bounding box corresponding to the first initial bounding box to obtain the second discrimination result specifically includes:
step B1, calculating boundary frame cross-ratio loss of a real boundary frame corresponding to the first initial boundary frame and a first prediction boundary frame corresponding to the first initial boundary frame to obtain a first cross-ratio loss;
specifically, taking a first initial boundary box with the sequence number i as an example, calculating the cross-ratio loss between a real boundary box with the sequence number i and a first prediction boundary box with the sequence number i, and obtaining a first cross-ratio loss corresponding to the first initial boundary box with the sequence number i.
Step B2, determining a second discrimination result corresponding to the first initial boundary box based on the first cross ratio loss;
specifically, since the coordinate coincidence degree of the boundary frames can be represented by the magnitude of the cross-ratio loss between the two boundary frames, a second discrimination result can be obtained based on the cross-ratio loss between the real boundary frame and the first prediction boundary frame, so that a second regression loss component corresponding to the discrimination dimension considered from the angle of the coordinate coincidence degree of the boundary frames is determined based on the second discrimination result, and the model is further promoted to carry out the regression learning of the boundary frames.
Further, for the determination process of the second discrimination result, only the first cross-ratio loss between the real boundary frame and its corresponding first prediction boundary frame may be considered. However, in order to improve the determination accuracy of the second discrimination result, thereby improving the accuracy of the second regression loss component corresponding to the discrimination dimension considered from the perspective of the bounding box coordinate coincidence degree, and further improving the accuracy of the regression loss value used for adjusting the model parameters, not only the first cross-ratio loss between the real boundary frame and its corresponding first prediction boundary frame is considered, but also the second cross-ratio losses between the real boundary frame and the other first prediction boundary frames are considered. In this way, on the discrimination dimension of the bounding box coordinate coincidence degree, the real boundary frame is compared with a positive example sample (namely, the first prediction boundary frame obtained by bounding box regression learning for that real boundary frame) and with negative example samples (namely, the first prediction boundary frames corresponding to the other real boundary frames), so that the specific position representation of the real boundary frame is better learned and bounding box regression learning is carried out better. Based on this, the above step B2 of determining the second discrimination result corresponding to the first initial boundary frame based on the first cross-ratio loss specifically includes:
B21, determining a comparison boundary frame set in first prediction boundary frames corresponding to a first preset number of first initial boundary frames respectively;
the contrast boundary frame set comprises other first prediction boundary frames except for the first prediction boundary frame corresponding to the first initial boundary frame or other first prediction boundary frames which do not comprise the target object outlined by the first initial boundary frame;
specifically, taking the first initial bounding box with the sequence number i as an example, the above-mentioned comparison bounding box set may include other first prediction bounding boxes except the first prediction bounding box with the sequence number i (i.e. the first prediction bounding box with the sequence number k, k not equal to p, p=i), that is, all the other first prediction bounding boxes except the first prediction bounding box with the sequence number i are taken as negative examples of the real bounding box with the sequence number i; in order to further improve the selection accuracy of the negative example samples, the above-mentioned comparison bounding box set may include other first prediction bounding boxes except the first prediction bounding box with the sequence number i, and the other first prediction bounding boxes do not include the target object outlined by the first initial bounding box with the sequence number i (i.e. the first prediction bounding box with the sequence number k, k not equal p, p=i or p=j, where the first prediction bounding box with the sequence number j is the same as the target object outlined by the first initial bounding box with the sequence number i), that is, only the other first prediction bounding boxes including different target objects with the first initial bounding box with the sequence number i are taken as the negative example samples of the real bounding box with the sequence number i.
B22, respectively carrying out boundary frame cross-ratio loss calculation on the real boundary frame corresponding to the first initial boundary frame and the other first prediction boundary frames to obtain a second cross-ratio loss;
specifically, taking the first initial bounding box with the sequence number i as an example, calculating the cross-ratio loss between the real bounding box with the sequence number i and the first prediction bounding box with the sequence number k for each other first prediction bounding box in the comparison bounding box set, and obtaining the second cross-ratio loss corresponding to the first prediction bounding box with the sequence number k.
And B23, determining a second judging result corresponding to the first initial boundary box based on the first cross-ratio loss and the second cross-ratio loss.
Specifically, in the process of determining the second discrimination result representing the coordinate coincidence degree of the boundary frame, based on the real boundary frame with the sequence number i and the first prediction boundary frame with the sequence number i, the first cross-over loss is calculated, and based on the real boundary frame with the sequence number i and the first prediction boundary frame with the sequence number k, the second cross-over loss (k not equal to p) is calculated to determine the second discrimination result (namely, the second discrimination result can comprise the first cross-over loss and the second cross-over loss), then, based on the second discrimination result, a second regression loss component related to the coordinate coincidence degree of the boundary frame can be determined, so that the model parameter is adjusted based on the second regression loss component, the coordinate coincidence degree of the real boundary frame with the sequence number i and the first prediction boundary frame with the sequence number i is higher, the coordinate coincidence degree with other first prediction boundary frames is smaller, and the global performance of the boundary frame regression learning is enhanced, and the accuracy of the boundary frame regression learning is further improved.
In a specific implementation, the second regression loss component is the logarithm of a target cross-ratio term, where the target cross-ratio term is the quotient of the exponential of the first cross-ratio loss and the sum of the exponential of the first cross-ratio loss and the exponentials of the second cross-ratio losses; that is, taking p=i as an example, the second regression loss component may be expressed as:
V_i2 = log [ exp( L_IoU(b_i^r, G(b_i^a; θ_g)) / ω ) / ( exp( L_IoU(b_i^r, G(b_i^a; θ_g)) / ω ) + Σ_{k≠i} exp( L_IoU(b_i^r, G(b_k^a; θ_g)) / ω ) ) ]
where b_i^r denotes the real bounding box corresponding to the first initial bounding box with sequence number i, b_i^a denotes the first initial bounding box with sequence number i, G(b_i^a; θ_g) denotes the first prediction bounding box corresponding to the first initial bounding box with sequence number i, L_IoU(b_i^r, G(b_i^a; θ_g)) denotes the first cross-ratio loss, b_k^a denotes the first initial bounding box with sequence number k, G(b_k^a; θ_g) denotes the first prediction bounding box corresponding to the first initial bounding box with sequence number k, L_IoU(b_i^r, G(b_k^a; θ_g)) denotes the second cross-ratio loss, θ_g denotes the model parameters of the generation sub-model, and ω denotes a preset adjustment factor.
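A minimal sketch of this contrastive form follows; it assumes the preset adjustment factor ω acts as a temperature-style divisor inside the exponentials, uses the raw intersection-over-union as the cross-ratio term, and adopts a negative-logarithm sign convention so that a dominant positive pair lowers the loss. All of these choices are assumptions for illustration:

```python
import math

def iou(box_a, box_b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter + 1e-8)

def second_regression_loss(gt_box, pred_box, negative_boxes, omega=1.0):
    """Contrastive bounding box term: positive pair (gt_box, pred_box) against
    the negative examples taken from the contrast bounding box set."""
    pos = math.exp(iou(gt_box, pred_box) / omega)            # first cross-ratio term
    neg = sum(math.exp(iou(gt_box, nb) / omega) for nb in negative_boxes)
    return -math.log(pos / (pos + neg + 1e-8))
```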
For the determination process of the regression loss compensation value, calculating, based on the real bounding box and the first prediction bounding box corresponding to the first initial bounding box, a regression loss compensation value for constraining the loss gradient of the regression loss function of the model to be trained specifically includes:
step C1, generating a synthetic boundary frame corresponding to the first initial boundary frame based on the real boundary frame and the first prediction boundary frame corresponding to the first initial boundary frame;
Specifically, taking a first initial boundary box with a sequence number of i as an example, according to a preset coordinate information sampling mode, determining a sampling coordinate information set based on a first coordinate information set corresponding to a real boundary box with the sequence number of i and a second coordinate information set corresponding to a first prediction boundary box with the sequence number of i; based on the set of sample coordinate information, a synthetic bounding box with a sequence number i is determined.
And C2, determining a regression loss compensation value based on the boundary frame distribution similarity degree of the synthesized boundary frame corresponding to the first initial boundary frame and the real boundary frame.
Specifically, after the synthetic bounding box corresponding to the first initial bounding box with sequence number i (denoted b_i^syn) is determined, the bounding box distribution similarity degree between the synthetic bounding box with sequence number i and the real bounding box with sequence number i, denoted s_i, is calculated; the compensation gradient of s_i with respect to the synthetic bounding box b_i^syn is then calculated; and the regression loss compensation value corresponding to the first initial bounding box with sequence number i is determined based on the matrix two-norm of that compensation gradient.
Specifically, for the determination process of the synthetic bounding box corresponding to a certain first initial bounding box, the step C1 generates the synthetic bounding box corresponding to the first initial bounding box based on the actual bounding box corresponding to the first initial bounding box and the first prediction bounding box, and specifically includes:
C11, determining a first coordinate information subset based on a first sampling proportion and a first coordinate information set of a real boundary box corresponding to the first initial boundary box;
c12, determining a second coordinate information subset based on a second sampling proportion and a second coordinate information set of a first prediction boundary box corresponding to the first initial boundary box; the first sampling ratio and the second sampling ratio may be preset according to actual conditions, and a sum of the first sampling ratio and the second sampling ratio is equal to 1;
and C13, generating a synthetic boundary box corresponding to the first initial boundary box based on the first coordinate information subset and the second coordinate information subset.
Specifically, taking a first initial bounding box with a sequence number of i as an example, randomly sampling in a first coordinate information set of a real bounding box with the sequence number of i according to a first sampling proportion to obtain a first coordinate information subset; randomly sampling in a second coordinate information set of the first prediction boundary box with the sequence number of i according to a second sampling proportion to obtain a second coordinate information subset; determining the combination of the first coordinate information subset and the second coordinate information subset as a sampling coordinate information set, and drawing a boundary frame based on the sampling coordinate information set, namely a synthetic boundary frame with the sequence number of i; the synthetic bounding box is a bounding box obtained by randomly sampling and mixing the coordinate information (namely real data) of the real bounding box with the number i and the coordinate information (namely generated data) of the first prediction bounding box with the number i, so that part of the coordinate information of the synthetic bounding box comes from the real data, and the other part of the coordinate information comes from the generated data, namely the synthetic bounding box is commonly determined by the real data and the generated data and has certain randomness, and therefore, the gradient of the regression loss value can be compensated under the condition that the gradient of the regression loss corresponding to the first discrimination dimension suddenly decreases or even becomes zero, thereby avoiding the problem that the gradient of the regression loss value suddenly decreases due to the sudden decrease or even becomes zero of the regression loss corresponding to the first discrimination dimension in the model training process, and further improving the training accuracy of model parameters.
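The synthetic bounding box construction (steps C11 to C13) and the gradient-based regression loss compensation value (step C2) might be sketched as follows with PyTorch-style autograd; using the discriminator output as the distribution-similarity score and using the plain two-norm of the gradient as the compensation value are assumptions made for this sketch:

```python
import torch

def synthetic_box(gt_box, pred_box, first_ratio=0.5):
    """Mix the coordinate entries of the real and the predicted bounding box:
    roughly a fraction `first_ratio` of the entries is sampled from the real
    box and the rest from the prediction (the two sampling ratios sum to 1)."""
    mask = torch.rand_like(gt_box) < first_ratio     # random coordinate-wise sampling
    return torch.where(mask, gt_box, pred_box)

def regression_loss_compensation(discriminator, gt_box, pred_box):
    """Compensation value based on the gradient of the distribution-similarity
    score with respect to the synthetic bounding box."""
    synth = synthetic_box(gt_box, pred_box).detach().requires_grad_(True)
    score = discriminator(synth)                     # similarity to real boxes
    grad, = torch.autograd.grad(score.sum(), synth, create_graph=True)
    return grad.norm(2)                              # two-norm of the compensation gradient
```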
Further, considering that in the target detection process the target detection model needs to determine not only the position of the target object but also its specific category, there may be, in the training process of the target detection model, first initial bounding boxes whose category identification accuracy is low. For such first initial bounding boxes with low category prediction accuracy, the corresponding first prediction bounding boxes may not truly reflect the bounding box prediction accuracy of the generation sub-model, and consequently the discrimination results of the judging sub-model for such first prediction bounding boxes and their real bounding boxes also cannot truly reflect the bounding box prediction accuracy of the generation sub-model. Therefore, in order to further improve the accuracy of the regression loss value, in the process of determining the sub-regression loss value corresponding to a first initial bounding box, the corresponding sub-regression loss value is taken into account only when the real category corresponding to the first prediction bounding box matches the first prediction category; otherwise (that is, when the preset category matching constraint condition between the real category and the first prediction category is not met), the corresponding sub-regression loss value is excluded from the regression loss value, i.e., the sub-regression loss value is zero. The specific implementation of each round of model training therefore further includes: the classification sub-model classifies the first initial bounding box or the first prediction bounding box to obtain a first prediction category. In a specific implementation, the classification sub-model performs category prediction on the first initial bounding box or the first prediction bounding box, and the output result may be a first category prediction result; the first category prediction result includes the prediction probability that the target object outlined by the first initial bounding box or the first prediction bounding box belongs to each candidate category, and the candidate category corresponding to the maximum prediction probability is the first prediction category, that is, the classification sub-model predicts that the category of the target object outlined by the first initial bounding box or the first prediction bounding box (i.e., the category of the target object in the image area within that bounding box) is the first prediction category. In addition, in a specific implementation, considering that the position information of the first initial bounding box and that of the first prediction bounding box do not deviate greatly, the image features within the first initial bounding box and those within the first prediction bounding box do not deviate greatly either, so the identification of the target object category of the image area within the bounding box is not affected. Based on this, for the case where bounding box prediction and category prediction are performed successively, the first prediction bounding box may be input into the classification sub-model for category prediction to obtain a corresponding first category prediction result, that is, the first prediction bounding box is first obtained based on the first initial bounding box, and category prediction is then performed on the first prediction bounding box to obtain the first category prediction result; for the case where bounding box prediction and category prediction are performed synchronously, the first initial bounding box may be input into the classification sub-model for category prediction to obtain a corresponding first category prediction result, that is, the first prediction bounding box is obtained based on the first initial bounding box while category prediction is performed on the first initial bounding box to obtain the first category prediction result.
It should be noted that, the model parameter iterative training process of the classification sub-model may refer to the existing classification model training process, which is not described herein.
Specifically, the target information further includes a matching relationship between a first prediction category corresponding to the first initial bounding box and a true category of the first initial bounding box, wherein, in a determining process of the sub-regression loss value corresponding to each first initial bounding box, if the first prediction category corresponding to the first initial bounding box is not matched with the true category, the sub-regression loss value corresponding to the first initial bounding box is zero; if the first prediction category corresponding to the first initial bounding box is matched with the real category, the sub-regression loss value corresponding to the first initial bounding box is determined based on at least one of a first regression loss component corresponding to the similarity of the bounding box distribution, a second regression loss component corresponding to the coordinate coincidence degree of the bounding box and the regression loss compensation value.
Specifically, determining whether the first prediction category corresponding to the first initial bounding box matches the real category may be related to the first category prediction result, which may specifically include: a constraint condition of a single matching mode or a constraint condition of a variable matching mode, wherein for the constraint condition of the single matching mode, a category matching constraint condition used by each round of model training is kept unchanged (i.e. is irrelevant to the number of current model training rounds), for example, for each round of model training, if a real category is the same as a first prediction category, the first prediction category corresponding to the first initial bounding box is determined to be matched with the real category; for the constraint conditions of the change matching mode, the class matching constraint conditions used by each round of model training are related to the number of current model training rounds, and in particular, the constraint conditions of the change matching mode can be classified into class matching stage constraint conditions or class matching gradual change constraint conditions;
The class matching stage constraint condition may be that when the number of training rounds of the current model is less than a first preset number of rounds, the real class and the first prediction class belong to the same class group, and when the number of training rounds of the current model is greater than or equal to the first preset number of rounds, the real class is the same as the first prediction class, that is, based on the class matching stage constraint condition and a class prediction result corresponding to the first initial bounding box, stage class matching constraint can be realized; the class matching gradual change constraint condition may be that the sum of a first constraint term and a second constraint term is greater than a preset probability threshold, the first constraint term is a first prediction probability corresponding to a real class in a class prediction probability subset, the second constraint term is a product of the sum of second prediction probabilities except the first prediction probability in the class prediction probability subset and a preset adjustment factor, the preset adjustment factor gradually decreases along with the increase of the number of current training rounds, that is, based on the class matching gradual change constraint condition and a class prediction result corresponding to a first initial boundary frame, gradual change class matching constraint can be realized; specifically, a class prediction probability subset is determined based on a class prediction result corresponding to the first initial boundary box, the class prediction probability subset comprises a first prediction probability that a target object outlined by the first prediction boundary box belongs to a real class and a second prediction probability that the target object belongs to a non-real class in a target group, namely the class prediction probability subset comprises a first prediction probability under the real class in the target group and a second prediction probability under the non-real class in the target group (namely a candidate class in the target group except the real class) obtained by carrying out class prediction on the first initial boundary box or the first prediction boundary box by a classification sub-model, and the target group is a class group where the real class is located; in specific implementation, a plurality of candidate categories associated with the target detection task are predetermined, and based on semantic information of each candidate category, the plurality of candidate categories are subjected to group division to obtain a plurality of category groups.
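The category matching staged constraint condition described above can be sketched as follows; the mapping class_to_group from candidate categories to semantic class groups is an assumed input, and the function itself is illustrative:

```python
def staged_class_match(pred_class, real_class, class_to_group,
                       round_idx, first_preset_rounds):
    """Staged constraint: before `first_preset_rounds`, the predicted and real
    classes only need to share a class group; afterwards they must be identical."""
    if round_idx < first_preset_rounds:
        return class_to_group[pred_class] == class_to_group[real_class]
    return pred_class == real_class
```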
Specifically, because the first initial bounding box is obtained by extracting the region of interest by using the preset region of interest extraction model, there may be a situation that the classification identification of the first prediction bounding box corresponding to the first initial bounding box is inaccurate in the initial stage of model training due to the fact that the region where the target object outlined by the first initial bounding box is located is not accurate enough, based on this, in the process of determining the sub-regression loss value corresponding to the first initial bounding box, the matching relationship between the first prediction class corresponding to the first initial bounding box and the real class of the first initial bounding box is referred to, that is, the matching relationship used for representing whether the first prediction class corresponding to the first initial bounding box is matched with the real class is determined based on the preset class matching constraint condition;
Further, the classification sub-model may be pre-trained, or its model parameters may be trained synchronously while the model parameters of the generation sub-model are trained, that is, a classification loss value is determined based on the first prediction category and the real category, and the model parameters of the classification sub-model are iteratively trained based on the classification loss value. For the case where the model parameters of the classification sub-model are trained synchronously, if the sub-regression loss value were taken into account only when the real category corresponding to the first prediction bounding box is identical to the first prediction category, then, because the model parameters of the classification sub-model to be trained may still have low accuracy in the early stage of model training, the category identification of the first prediction bounding box corresponding to the first initial bounding box may be inaccurate. Therefore, in the early stage of model training the requirement on category accuracy is relaxed: as long as the real category corresponding to the first prediction bounding box and the first prediction category belong to the same category group, the sub-regression loss value corresponding to the first prediction bounding box is taken into account; in the later stage of model training the requirement on category accuracy is tightened. Based on this, the preset category matching constraint condition may include the above-mentioned constraint condition of the variable matching mode (such as the category matching staged constraint condition or the category matching gradual-change constraint condition);
Further, in order to ensure that the transition between two types of matching constraint branches defining that the first prediction category and the real category satisfy the matching relationship is smoother (i.e. the first prediction category belongs to the target group, and the first prediction category is the same as the real category), so that as the number of model training rounds increases, the preset category matching constraint condition gradually changes from defining that the first prediction category falls into the target group to defining that the first prediction category is the same as the real category, based on which, preferably, the preset category matching constraint condition includes: category matching gradient constraints.
In a specific implementation, for the case that the preset category matching constraint condition is a category matching gradual change constraint condition, taking a first initial bounding box with a sequence number i as an example, the category matching gradual change constraint condition may be expressed as:
P_i(real_i) + β · Σ_{f ∈ groups\real_i} P_i(f) > μ

where groups denotes the target group, real_i denotes the true class, within the target group groups, of the first initial bounding box with sequence number i, f ∈ groups\real_i denotes a non-real class in the target group, β denotes the preset adjustment factor, P_i(real_i) denotes the first prediction probability (i.e., the first constraint term mentioned above), P_i(f) denotes a second prediction probability, β·Σ_{f ∈ groups\real_i} P_i(f) denotes the second constraint term, and μ denotes the preset probability threshold; specifically, the larger P_i(real_i) is, the closer the first prediction category is to the true class. Since the preset adjustment factor decreases as the current number of training rounds increases, the reference weight of the second constraint term gradually decreases, so that in the later period of model training whether the first prediction category matches the real category is mainly determined by the first constraint term (i.e., the first prediction probability under the real category); after the current number of model training rounds reaches a certain value, the second constraint term becomes zero, that is, the first prediction category is determined to match the real category only when P_i(real_i) > μ, in which case the classification sub-model predicts the real category as the first prediction category.
Specifically, for the preset adjustment factor, the current model training wheel number is reduced along with the increase of the current model training wheel number, and if the current model training wheel number is smaller than or equal to the target training wheel number, the second constraint term is positively correlated with the preset adjustment factor, and the preset adjustment factor is negatively correlated with the current model training wheel number; and if the number of training wheels of the current model is larger than the number of target training wheels, the second constraint item is zero, wherein the number of target training wheels is smaller than the total number of training wheels.
In specific implementation, in order to ensure the adjustment smoothness of the preset adjustment factor, the value of the preset adjustment factor β may be gradually reduced by adopting a linearly decreasing adjustment manner, so that the determination process of the preset adjustment factor used for the current model training specifically includes:
(1) Aiming at first-round model training, determining a first preset value as a preset adjustment factor used for current model training;
specifically, the first preset value may be set according to actual requirements, in order to simplify the adjustment complexity, the first preset value may be set to 1, that is, the preset adjustment factor β=1, that is, in the case of first-round model training, the above-mentioned category matching gradual change constraint condition may be:
P_i(real_i) + Σ_{f ∈ groups\real_i} P_i(f) > μ, i.e., Σ_{f ∈ groups} P_i(f) > μ
That is, for first round model training, it is determined whether a first predicted class corresponding to a first initial bounding box matches a true class based on a sum of a first predicted probability and a second predicted probability corresponding to a target group.
(2) Aiming at the model training of the non-initial round, determining a preset adjusting factor used for the model training according to a factor decreasing adjusting mode based on the current model training round number, the target training round number and the first preset value.
Specifically, if the preset adjustment factor β=1 corresponding to the first-round model training, under the condition of non-first-round model training, the category matching gradual change constraint condition may be:
P_i(real_i) + β · Σ_{f ∈ groups\real_i} P_i(f) > μ, with β < 1 determined by the factor decreasing adjustment mode; that is, for non-first-round model training, the category matching gradual-change constraint condition keeps this form, and as the number of model training rounds increases, the participation degree of the second constraint term β·Σ_{f ∈ groups\real_i} P_i(f) gradually decreases.
For example, the decreasing formula corresponding to the factor decreasing adjustment manner may be:
β = max( 1 − (δ − 1)/Z , 0 )

where max(·, 0) denotes taking the maximum of the bracketed expression and 0, the first term 1 in the bracketed expression represents the first preset value (i.e., the preset adjustment factor β used for first-round training), δ denotes the current number of model training rounds, and Z denotes the target number of training rounds; the target number of training rounds may be the total number of training rounds minus 1, or a designated number of training rounds that is smaller than the total number of training rounds, where the difference between the total number of training rounds and the designated number of training rounds is a preset number of rounds Q, Q being greater than 2. That is, the preset adjustment factor β is set to 0 from a certain round (not the last round) in the later period of model training, so that the judgment condition used in model training from round δ = Z + 1 to the last round is P_i(real_i) > μ.

It should be noted that, for the case where the target number of training rounds Z is the total number of training rounds minus 1, the decreasing formula may be the same formula with Z taken as the total number of training rounds minus 1, that is, the preset adjustment factor is set to 0 only in the last round of model training, and the judgment condition used in the last round of model training is P_i(real_i) > μ. In addition, the decreasing formula indicated above is only a relatively simple linear decreasing adjustment manner; in the practical application process, the decreasing rate of the preset adjustment factor β may be set according to actual requirements, so the decreasing formula does not limit the protection scope of the present application.
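A sketch of the category matching gradual-change constraint with the linearly decreasing preset adjustment factor follows; the schedule max(1 − (δ − 1)/Z, 0) mirrors the reconstruction given above, and the dictionary-of-probabilities interface is an assumption:

```python
def beta_schedule(round_idx, target_rounds):
    """Linearly decreasing preset adjustment factor: 1 in the first round and 0
    from round target_rounds + 1 onwards."""
    return max(1.0 - (round_idx - 1) / target_rounds, 0.0)

def gradual_class_match(probs, real_class, group_classes, round_idx,
                        target_rounds, mu):
    """Gradual-change constraint: first prediction probability under the real
    class plus beta times the probabilities of the other classes in the same
    group must exceed the preset probability threshold mu."""
    beta = beta_schedule(round_idx, target_rounds)
    first_term = probs[real_class]
    second_term = beta * sum(probs[c] for c in group_classes if c != real_class)
    return first_term + second_term > mu
```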
In a specific implementation, the model to be trained includes a generating sub-model, a discriminating sub-model and a classifying sub-model, as shown in fig. 4b, which provides a schematic diagram of a specific implementation principle of a training process of another target detection model, and specifically includes:
(1) Carrying out target region extraction on a sample image dataset by utilizing a preset region of interest extraction model in advance to obtain N anchor frames;
(2) Randomly sampling m anchor frames from N anchor frames as first initial boundary frames aiming at each round of model training, and determining real boundary frames corresponding to each first initial boundary frame respectively;
(3) For each first initial boundary frame, generating a sub-model to conduct boundary frame prediction based on the first initial boundary frame to obtain a first prediction boundary frame; the judging sub-model generates a judging result set based on the real boundary frame corresponding to the first initial boundary frame and the corresponding first prediction boundary frame; the classification sub-model carries out classification prediction on the first prediction boundary box to obtain a classification prediction result; determining a category matching result according to a preset category matching constraint condition, the real category of the real boundary frame corresponding to the first initial boundary frame and the category prediction result of the first prediction boundary frame corresponding to the first initial boundary frame; if the category matching result indicates that the first prediction category and the real category do not meet the preset category matching constraint condition, the sub-regression loss value corresponding to the first initial boundary box is zero; if the category matching result represents that the first prediction category and the real category meet the preset category matching constraint condition, determining a first regression loss component based on a first judging result in a judging result set of the first initial boundary frame, determining a second regression loss component based on a second judging result in a judging result set of the first initial boundary frame, determining a third regression loss component based on a third judging result in the judging result set of the first initial boundary frame, and determining a sub regression loss value corresponding to the first initial boundary frame based on the first regression loss component, the second regression loss component and the third regression loss component;
It should be noted that, the above-mentioned determination process of the category matching result may be performed by an independent processing module, or may be performed by a discriminant sub-model, so that, for the case that the first prediction category and the real category do not meet the preset category matching constraint condition, it is only necessary to directly determine that the corresponding discriminant result set is null or preset information, and it is unnecessary to generate the discriminant result set based on the real bounding box corresponding to the first initial bounding box and the corresponding first prediction bounding box, so that the model training efficiency can be further improved; specifically, referring to the description shown in fig. 4b, the real class corresponding to each real bounding box and the class prediction result corresponding to each first prediction bounding box are input to the discriminant sub-model; determining a category matching result by the judging sub-model according to the real category of the real boundary frame corresponding to the first initial boundary frame and the category prediction result of the first prediction boundary frame corresponding to the first initial boundary frame; if the category matching result indicates that the first prediction category and the real category do not meet the preset category matching constraint condition, the corresponding judging result set is empty or preset information, and therefore the sub-regression loss value determined based on the judging result set is zero; if the category matching result represents that the first prediction category and the real category meet the preset category matching constraint condition, generating a judging result set based on the real boundary box corresponding to the first initial boundary box and the corresponding first prediction boundary box; therefore, the sub-regression loss value determined based on the discrimination result set is determined based on the first regression loss component corresponding to the first discrimination result, the second regression loss component corresponding to the second discrimination result, and the third regression loss component corresponding to the third discrimination result in the discrimination result set;
That is, in determining whether the sub-regression loss value corresponding to the first initial bounding box is zero, the set of discrimination results may be generated directly based on the actual bounding box corresponding to the first initial bounding box and the corresponding first prediction bounding box; further, determining a matching relation between the first prediction category and the real category based on the category prediction result (namely, a category matching result indicates whether a preset category matching constraint condition is met between the first prediction category and the real category); if the matching relation is category mismatch, determining that the corresponding sub-regression loss value is zero, and if the matching relation is category match, determining the corresponding sub-regression loss value based on a plurality of discrimination results in the discrimination result set; or determining a matching relation between the first prediction category and the real category based on the category prediction result, if the matching relation is category mismatch, determining that the corresponding discrimination result set is null or preset information, and determining that the corresponding sub-regression loss value is zero, if the matching relation is category match, generating the discrimination result set based on the real boundary frame corresponding to the first initial boundary frame and the corresponding first prediction boundary frame, and determining the corresponding sub-regression loss value based on a plurality of discrimination results in the discrimination result set;
(4) Determining a regression loss value of the model to be trained based on the sub regression loss values respectively corresponding to the first initial boundary boxes; adjusting model parameters of the generating sub-model and the judging sub-model based on the regression loss value by using a random gradient descent method to obtain the generating sub-model and the judging sub-model after parameter updating;
(5) If the current model training result meets the preset model training ending condition, determining the updated generation sub-model as a trained target detection model; if the current model training result does not meet the preset model training ending condition, determining the updated generation sub-model and the updated judgment sub-model as a model to be trained for the next round of model training until the preset model training ending condition is met.
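Steps (3) and (4) of the training round can be summarised in the following sketch, where the sub-model calls, the matcher and weighted_sub_loss are hypothetical interfaces standing in for the components described above rather than the claimed implementation:

```python
def round_regression_loss(initial_boxes, gt_boxes, gt_classes,
                          generator, discriminator, classifier, matcher):
    """Regression loss for one round: boxes whose predicted class fails the
    category matching constraint contribute zero; the others contribute the
    weighted sub-regression loss built from the three discrimination results."""
    total = 0.0
    for b_init, b_gt, c_gt in zip(initial_boxes, gt_boxes, gt_classes):
        b_pred = generator(b_init)                    # first prediction bounding box
        c_pred = classifier(b_pred)                   # first category prediction result
        if not matcher(c_pred, c_gt):
            continue                                  # sub-regression loss is zero
        v1, v2, v3 = discriminator.result_set(b_gt, b_pred)   # discrimination result set
        total += weighted_sub_loss(v1, v2, v3)        # hypothetical weighted combination
    return total
```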
According to the target detection model training method, in the model training stage, the to-be-trained model is caused to continuously learn the boundary frame distribution based on the real boundary frame and the first initial boundary frame, so that the predicted first prediction boundary frame is more similar to the real boundary frame, the accuracy of boundary frame prediction of the position of the target object in the to-be-detected image of the trained target detection model can be improved, the generalization of the trained target detection model can be improved, the target detection accuracy of the target detection model to a new to-be-detected image is ensured, and the data migration adaptability of the trained target detection model is improved; the model to be trained comprises a generating sub-model and a judging sub-model, a regression loss value of the model to be trained is determined based on a judging result set output by the judging sub-model, and model parameters of the generating sub-model and the judging sub-model are repeatedly updated based on the regression loss value until a current model training result meets a preset model training ending condition, namely, the boundary frame distribution is continuously learned based on a mode of generating judging multi-round countermeasures, wherein the judging sub-model can judge whether a first prediction boundary frame obtained by predicting the generating sub-model is sufficiently real or not, and under the condition that the generated boundary frame (namely, the first prediction boundary frame) is difficult to distinguish from a real boundary frame, the model parameters are adjusted based on the judging result of the judging sub-model, so that the first prediction boundary frame obtained by predicting the generating sub-model is more close to the real boundary frame, and the model parameter updating efficiency and the boundary frame distribution learning accuracy of the generating sub-model are further improved; the judging result set output by the judging sub-model not only comprises a first judging result representing the distribution similarity degree of the boundary frame, but also comprises a second judging result representing the coordinate superposition degree of the boundary frame, so that the effect of compensating the regression loss of the boundary frame caused by the similar distribution of the boundary frame but specific position deviation is achieved, the accuracy of the regression loss value obtained based on the judging result set is higher, and the accuracy of the model parameter updated based on the regression loss value can be further improved.
Corresponding to the target detection model training method described in fig. 1 to fig. 4b, and based on the same technical concept, an embodiment of the present application further provides a target detection method. Fig. 5 is a flowchart of the target detection method provided in the embodiment of the present application. The method in fig. 5 can be performed by an electronic device provided with a target detection apparatus, and the electronic device may be a terminal device or a designated server; the hardware device for target detection (i.e. the electronic device provided with the target detection apparatus) and the hardware device for training the target detection model (i.e. the electronic device provided with the target detection model training apparatus) may be the same or different. As shown in fig. 5, the method at least includes the following steps:
S502, acquiring a third preset number of second initial bounding boxes; the second initial bounding boxes are obtained by performing target region extraction on an image to be detected by using a preset region of interest extraction model;
specifically, the process of acquiring the third preset number of second initial bounding boxes may refer to the above process of acquiring the first preset number of first initial bounding boxes, and is not repeated here.
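A minimal sketch of this sampling step is shown below, assuming a preset region of interest extraction model exposed through an `extract` method that returns candidate boxes as coordinate lists; the interface and names are assumptions for illustration only.

```python
import random

def sample_second_initial_boxes(roi_model, image, num_boxes):
    """Extract candidate boxes from the image to be detected with the preset
    region-of-interest extraction model, then randomly sample `num_boxes`
    (the third preset number) of them as the second initial bounding boxes."""
    candidates = roi_model.extract(image)   # assumed interface: list of [x1, y1, x2, y2]
    if len(candidates) <= num_boxes:
        return candidates                   # fewer candidates than requested: use them all
    return random.sample(candidates, num_boxes)
```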
S504, inputting the second initial bounding boxes into a target detection model for target detection, to obtain the second prediction bounding boxes and second prediction categories corresponding to the second initial bounding boxes; the target detection model is trained based on the above target detection model training method, and for its specific training process reference may be made to the above embodiments, which is not repeated here.
Specifically, the target detection model comprises a classification sub-model and a generation sub-model. For each second initial bounding box: in the target detection process, the generation sub-model performs bounding box prediction based on the second initial bounding box to obtain the second prediction bounding box corresponding to the second initial bounding box; and the classification sub-model performs class prediction on the second initial bounding box or the second prediction bounding box to obtain the second prediction category corresponding to the second initial bounding box.
In a specific implementation, the classification sub-model performs class prediction on the second initial bounding box or the second prediction bounding box and outputs a second class prediction result. The second class prediction result comprises, for each candidate class, the predicted probability that the target object outlined by the second initial bounding box or the second prediction bounding box belongs to that class; the candidate class with the maximum predicted probability is taken as the second prediction category, i.e. the classification sub-model predicts that the target object in the image region within the second initial bounding box or the second prediction bounding box belongs to the second prediction category. In addition, considering that the position information of the second initial bounding box and that of the second prediction bounding box do not deviate greatly, the image features within the two boxes also do not deviate greatly, so the choice of box does not affect recognition of the category of the target object in the boxed image region. On this basis, when bounding box prediction and class prediction are performed sequentially, the second prediction bounding box may be input into the classification sub-model for class prediction to obtain the corresponding second class prediction result, i.e. the second prediction bounding box is first predicted from the second initial bounding box and class prediction is then performed on the second prediction bounding box; when bounding box prediction and class prediction are performed synchronously, the second initial bounding box may be input into the classification sub-model for class prediction to obtain the corresponding second class prediction result, i.e. the second prediction bounding box is predicted from the second initial bounding box while class prediction is performed on the second initial bounding box.
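A minimal sketch of taking the maximum-probability candidate class is given below, assuming the classification sub-model returns one probability per candidate class for the features of a single box (either the second initial bounding box or the second prediction bounding box); the function signature is an illustrative assumption.

```python
import torch

def second_prediction_category(classifier, box_features, candidate_classes):
    """Run the classification sub-model on one box and return the candidate class
    with the maximum predicted probability as the second prediction category."""
    with torch.no_grad():
        probs = classifier(box_features)   # assumed: one probability per candidate class
    idx = int(torch.argmax(probs))
    return candidate_classes[idx], float(probs[idx])
```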
S506, generating a target detection result of the image to be detected based on a second prediction boundary box and a second prediction category corresponding to each second initial boundary box;
specifically, based on the second prediction bounding boxes and the second prediction categories corresponding to the second initial bounding boxes, the number of target objects contained in the image to be detected and the category to which each target object belongs can be determined, for example, the image to be detected contains a cat, a dog and a pedestrian.
In a specific implementation, the target detection model includes a generation sub-model and a classification sub-model. Fig. 6 provides a schematic diagram of the implementation principle of the target detection process, which specifically includes:
extracting target regions from the image to be detected by using the preset region of interest extraction model to obtain P anchor boxes;
randomly sampling n anchor boxes from the P anchor boxes to serve as the second initial bounding boxes;
for each second initial bounding box, the generation sub-model performs bounding box prediction based on the second initial bounding box to obtain a second prediction bounding box, and the classification sub-model performs class prediction on the second prediction bounding box to obtain a second prediction category;
and generating a target detection result of the image to be detected based on the second prediction boundary boxes and the second prediction categories corresponding to the second initial boundary boxes.
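An end-to-end sketch of this fig. 6 flow is given below; the `extract`, `predict_box`, and `predict_class` interfaces are assumptions used only to make the flow concrete, and the returned structure of the target detection result is likewise illustrative.

```python
import random
from collections import Counter

def detect_objects(roi_model, generator, classifier, image, n):
    """Sketch of the fig. 6 flow: extract P anchor boxes, randomly sample n of them
    as second initial bounding boxes, predict a second prediction bounding box and a
    second prediction category for each, and assemble the target detection result."""
    anchors = roi_model.extract(image)                        # P anchor boxes (assumed interface)
    initial_boxes = random.sample(anchors, min(n, len(anchors)))

    detections = []
    for box in initial_boxes:
        pred_box = generator.predict_box(image, box)          # second prediction bounding box
        pred_cls = classifier.predict_class(image, pred_box)  # second prediction category
        detections.append((pred_box, pred_cls))

    # Target detection result: the boxes plus how many objects of each category
    # the image contains, e.g. {"cat": 1, "dog": 1, "pedestrian": 1}.
    return {"detections": detections, "counts": dict(Counter(c for _, c in detections))}
```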
It should be noted that the target detection model trained by the above target detection model training method can be applied to any specific application scenario that requires target detection on an image to be detected. The image to be detected may be acquired by an image acquisition device deployed at a certain location. The corresponding target detection apparatus may be part of the image acquisition device, specifically an image processing component in the image acquisition device, which receives the image to be detected transmitted by the image acquisition component of the image acquisition device and performs target detection on it; the target detection apparatus may also be a separate target detection device independent of the image acquisition device, which receives the image to be detected from the image acquisition device and performs target detection on it.
Specifically, for a specific application scenario of target detection, for example, an image to be detected may be acquired by an image acquisition device disposed at a certain public place entrance (such as a mall entrance, a subway entrance, a scenic spot entrance, or a performance site entrance, etc.), and the corresponding target object to be detected in the image to be detected is a target user entering the public place, and the target detection model is used to perform target detection on the image to be detected, so as to define a second prediction bounding box containing the target user entering the public place in the image to be detected, and determine a second prediction category corresponding to the second prediction bounding box (i.e., a category to which the target user contained in the second prediction bounding box belongs, such as at least one of age, gender, height, and occupation), so as to obtain a target detection result of the image to be detected; then, determining a user group identification result (such as the flow of people entering the public place or the attribute of the user group entering the public place) based on the target detection result, and further executing corresponding business processing (such as automatically triggering entry limit prompting operation or pushing information to the target user) based on the user group identification result; the higher the accuracy of the model parameters of the target detection model, the higher the accuracy of the target detection result of the image to be detected output by the target detection model, and therefore, the higher the accuracy of triggering execution of corresponding business processing based on the target detection result.
For another example, the image to be detected may be acquired by an image acquisition device disposed at each monitoring point in a certain cultivation base, and the corresponding object to be detected in the image to be detected is a target cultivation object in the monitoring point, and the object detection model is used to perform object detection on the image to be detected, so as to define a second prediction boundary box containing the target cultivation object in the image to be detected, and determine a second prediction category corresponding to the second prediction boundary box (i.e. a category to which the target cultivation object contained in the second prediction boundary box belongs, such as at least one of a living body state and a body size), so as to obtain a target detection result of the image to be detected; then, determining a breeding object group identification result (such as the survival rate of the target breeding objects in the breeding monitoring point or the growth rate of the target breeding objects in the breeding monitoring point) based on the target detection result, and further executing corresponding control operation (such as automatically sending out alarm prompt information if the survival rate is detected to be reduced or automatically controlling to increase the feeding amount or the feeding frequency if the growth rate is detected to be reduced) based on the breeding object group identification result; the higher the accuracy of the model parameters of the target detection model, the higher the accuracy of the target detection result of the image to be detected output by the target detection model, and therefore, the higher the accuracy of triggering execution of the corresponding control operation based on the target detection result.
According to the target detection method, in the target detection process, a plurality of candidate bounding boxes are first extracted by using the preset region of interest extraction model, and a third preset number of candidate bounding boxes are randomly sampled from them as the second initial bounding boxes; for each second initial bounding box, the generation sub-model performs bounding box prediction based on the second initial bounding box to obtain a second prediction bounding box, and the classification sub-model performs class prediction on the second prediction bounding box to obtain a second prediction category; the target detection result of the image to be detected is then generated based on the second prediction bounding boxes and second prediction categories corresponding to the second initial bounding boxes. In the training of the generation sub-model's parameters, the model to be trained is driven to continuously learn the bounding box distribution based on the real bounding boxes and the first initial bounding boxes, so that the first prediction bounding boxes become closer to the real bounding boxes, which improves the generalization and data migration adaptability of the target detection model and, in turn, the accuracy of bounding box prediction for target object positions in the image to be detected. The model to be trained comprises a generation sub-model and a discrimination sub-model; the regression loss value is determined based on the discrimination result set output by the discrimination sub-model, and the model parameters are repeatedly updated based on this regression loss value, which improves the parameter update efficiency of the generation sub-model. The discrimination result set contains discrimination results representing both the degree of bounding box distribution similarity and the degree of bounding box coordinate coincidence, so the regression loss value obtained from it is more accurate, which further improves the accuracy of the updated model parameters, ensures that the generation sub-model can accurately predict bounding boxes on new images to be detected, and thus improves the accuracy of target detection on the image to be detected using the target detection model.
It should be noted that this embodiment and the foregoing embodiments of the present application are based on the same inventive concept, so for the specific implementation of this embodiment, reference may be made to the implementation of the foregoing target detection model training method, and repeated details are not described again.
Corresponding to the above-mentioned target detection model training method described in fig. 1 to fig. 4b, based on the same technical concept, the embodiment of the present application further provides a target detection model training device, and fig. 7 is a schematic diagram of module composition of the target detection model training device provided in the embodiment of the present application, where the device is used to execute the target detection model training method described in fig. 1 to fig. 4b, and as shown in fig. 7, the device includes:
a first bounding box obtaining module 702 configured to obtain a first preset number of first initial bounding boxes, and obtain real bounding boxes corresponding to the first initial bounding boxes respectively; the first initial bounding box is obtained by extracting a target region from a sample image dataset by using a preset region of interest extraction model;
the model training module 704 is configured to input the first initial bounding box and the real bounding box into a model to be trained for model iterative training until a current model training result meets a preset model training ending condition, so as to obtain a target detection model;
The model to be trained comprises a generation sub-model and a judgment sub-model; the specific implementation mode of each model training is as follows:
for each of said first initial bounding boxes: the generating sub-model carries out boundary frame prediction based on the first initial boundary frame to obtain a first prediction boundary frame; the judging sub-model generates a judging result set based on a real boundary frame corresponding to the first initial boundary frame and a first prediction boundary frame corresponding to the first initial boundary frame; the judging result set comprises a first judging result and a second judging result, wherein the first judging result represents the boundary frame distribution similarity degree of the first prediction boundary frame and the real boundary frame, and the second judging result represents the boundary frame coordinate superposition degree of the first prediction boundary frame and the real boundary frame; determining a regression loss value of the model to be trained based on a first discrimination result and a second discrimination result corresponding to each first initial boundary box; and updating parameters of the generating sub-model and the judging sub-model based on the regression loss value.
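As one illustrative reading of how the two discrimination results could be turned into a per-box loss, the sketch below sums a standard GAN-style log-loss term (bounding box distribution similarity) with an IoU-based term (bounding box coordinate coincidence). This is only a plausible combination under stated assumptions; the application's actual formulation (including any weighting, prior probabilities, or the regression loss compensation value of the third discrimination result) may differ, and the discriminator interface is assumed.

```python
import torch

def box_iou(box_a, box_b):
    """Intersection over union of two [x1, y1, x2, y2] boxes given as tensors."""
    x1 = torch.maximum(box_a[0], box_b[0])
    y1 = torch.maximum(box_a[1], box_b[1])
    x2 = torch.minimum(box_a[2], box_b[2])
    y2 = torch.minimum(box_a[3], box_b[3])
    inter = (x2 - x1).clamp(min=0) * (y2 - y1).clamp(min=0)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter + 1e-7)

def sub_regression_loss(discriminator, real_box, pred_box):
    """Per-box loss built from the two discrimination results: an adversarial term
    for bounding box distribution similarity (first discrimination result) and an
    IoU term for bounding box coordinate coincidence (second discrimination result)."""
    d_real = discriminator(real_box)   # assumed: probability that the box is judged real
    d_fake = discriminator(pred_box)
    distribution_term = -(torch.log(d_real + 1e-7) + torch.log(1.0 - d_fake + 1e-7))
    coincidence_term = 1.0 - box_iou(real_box, pred_box)
    return distribution_term + coincidence_term
```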
According to the target detection model training device provided by the embodiment of the present application, in the model training stage, the model to be trained is driven to continuously learn the bounding box distribution based on the real bounding boxes and the first initial bounding boxes, so that the predicted first prediction bounding boxes become closer to the real bounding boxes; this improves the accuracy with which the trained target detection model predicts bounding boxes for target object positions in an image to be detected, improves the generalization of the trained model, ensures its detection accuracy on new images to be detected, and improves its data migration adaptability. The model to be trained comprises a generation sub-model and a discrimination sub-model: the regression loss value of the model to be trained is determined based on the discrimination result set output by the discrimination sub-model, and the model parameters of both sub-models are repeatedly updated based on this regression loss value until the current model training result meets the preset model training ending condition, i.e. the bounding box distribution is learned through multiple rounds of generation-discrimination adversarial training. The discrimination sub-model judges whether the first prediction bounding box produced by the generation sub-model is sufficiently realistic; when the generated bounding box (i.e. the first prediction bounding box) is hard to distinguish from the real bounding box, the model parameters are adjusted according to the discrimination result, so that the first prediction bounding box approaches the real bounding box more closely, which improves the parameter update efficiency and the bounding box distribution learning accuracy of the generation sub-model. The discrimination result set output by the discrimination sub-model contains not only a first discrimination result representing the degree of bounding box distribution similarity but also a second discrimination result representing the degree of bounding box coordinate coincidence, which compensates for the bounding box regression loss caused by boxes whose distributions are similar but whose specific positions deviate; the regression loss value obtained from the discrimination result set is therefore more accurate, which further improves the accuracy of the model parameters updated based on it.
It should be noted that, the embodiment of the object detection model training device in the present application and the embodiment of the object detection model training method in the present application are based on the same inventive concept, so that the specific implementation of the embodiment may refer to the implementation of the corresponding object detection model training method, and the repetition is omitted.
Corresponding to the above-mentioned target detection methods described in fig. 5 to 6, based on the same technical concept, an embodiment of the present application further provides a target detection apparatus, and fig. 8 is a schematic block diagram of the target detection apparatus provided in the embodiment of the present application, where the apparatus is configured to perform the target detection method described in fig. 5 to 6, and as shown in fig. 8, the apparatus includes:
a second bounding box acquisition module 802 configured to acquire a third preset number of second initial bounding boxes; the second initial bounding box is obtained by extracting a target region of an image to be detected by using a preset region of interest extraction model;
the target detection module 804 is configured to input the second initial bounding boxes into a target detection model to perform target detection, so as to obtain second prediction bounding boxes and second prediction categories corresponding to the second initial bounding boxes;
The detection result generating module 806 is configured to generate a target detection result of the image to be detected based on the second prediction bounding boxes and the second prediction categories corresponding to the second initial bounding boxes.
According to the target detection apparatus provided by the embodiment of the present application, in the target detection process, a plurality of candidate bounding boxes are first extracted by using the preset region of interest extraction model, and a third preset number of candidate bounding boxes are randomly sampled from them as the second initial bounding boxes; for each second initial bounding box, the generation sub-model performs bounding box prediction based on the second initial bounding box to obtain a second prediction bounding box, and the classification sub-model performs class prediction on the second prediction bounding box to obtain a second prediction category; the target detection result of the image to be detected is then generated based on the second prediction bounding boxes and second prediction categories corresponding to the second initial bounding boxes. In the training of the generation sub-model's parameters, the model to be trained is driven to continuously learn the bounding box distribution based on the real bounding boxes and the first initial bounding boxes, so that the first prediction bounding boxes become closer to the real bounding boxes, which improves the generalization and data migration adaptability of the target detection model and, in turn, the accuracy of bounding box prediction for target object positions in the image to be detected. The model to be trained comprises a generation sub-model and a discrimination sub-model; the regression loss value is determined based on the discrimination result set output by the discrimination sub-model, and the model parameters are repeatedly updated based on this regression loss value, which improves the parameter update efficiency of the generation sub-model. The discrimination result set contains discrimination results representing both the degree of bounding box distribution similarity and the degree of bounding box coordinate coincidence, so the regression loss value obtained from it is more accurate, which further improves the accuracy of the updated model parameters, ensures that the generation sub-model can accurately predict bounding boxes on new images to be detected, and thus improves the accuracy of target detection on the image to be detected using the target detection model.
It should be noted that, the embodiments of the object detection apparatus and the embodiments of the object detection method in the present application are based on the same inventive concept, so that the specific implementation of the embodiments may refer to the implementation of the corresponding object detection method, and the repetition is not repeated.
Further, corresponding to the methods shown in fig. 1 to 6, based on the same technical concept, the embodiments of the present application further provide a computer device, where the computer device is configured to perform the above-mentioned object detection model training method or the object detection method, as shown in fig. 9.
Computer devices may vary widely in configuration or performance, and may include one or more processors 901 and a memory 902, where the memory 902 may store one or more application programs or data. The memory 902 may be transient storage or persistent storage. An application program stored in the memory 902 may include one or more modules (not shown), and each module may include a series of computer-executable instructions for the computer device. Still further, the processor 901 may be configured to communicate with the memory 902 and execute the series of computer-executable instructions in the memory 902 on the computer device. The computer device may also include one or more power supplies 903, one or more wired or wireless network interfaces 904, one or more input/output interfaces 905, one or more keyboards 906, and the like.
In a particular embodiment, a computer device includes a memory, and one or more programs, wherein the one or more programs are stored in the memory, and the one or more programs may include one or more modules, and each module may include a series of computer-executable instructions for the computer device, and configured to be executed by one or more processors, the one or more programs comprising computer-executable instructions for:
acquiring a first preset number of first initial boundary frames and acquiring real boundary frames corresponding to the first initial boundary frames respectively; the first initial bounding box is obtained by extracting a target region from a sample image dataset by using a preset region of interest extraction model;
inputting the first initial boundary box and the real boundary box into a model to be trained for model iterative training until a current model training result meets a preset model training ending condition, so as to obtain a target detection model;
the model to be trained comprises a generation sub-model and a judgment sub-model; the specific implementation mode of each model training is as follows:
For each of said first initial bounding boxes: the generating sub-model carries out boundary frame prediction based on the first initial boundary frame to obtain a first prediction boundary frame; the judging sub-model generates a judging result set based on a real boundary frame corresponding to the first initial boundary frame and a first prediction boundary frame corresponding to the first initial boundary frame; the judging result set comprises a first judging result and a second judging result, wherein the first judging result represents the boundary frame distribution similarity degree of the first prediction boundary frame and the real boundary frame, and the second judging result represents the boundary frame coordinate superposition degree of the first prediction boundary frame and the real boundary frame;
determining a regression loss value of the model to be trained based on a first discrimination result and a second discrimination result corresponding to each first initial boundary box;
and updating parameters of the generating sub-model and the judging sub-model based on the regression loss value.
In another particular embodiment, a computer device includes a memory, and one or more programs, wherein the one or more programs are stored in the memory, and the one or more programs may include one or more modules, and each module may include a series of computer-executable instructions for the computer device, and configured to be executed by one or more processors, the one or more programs comprising computer-executable instructions for:
Acquiring a third preset number of second initial bounding boxes; the second initial bounding box is obtained by extracting a target region of an image to be detected by using a preset region of interest extraction model;
inputting the second initial boundary boxes into a target detection model to carry out target detection, and obtaining second prediction boundary boxes and second prediction categories corresponding to the second initial boundary boxes;
and generating a target detection result of the image to be detected based on the second prediction boundary boxes and the second prediction categories corresponding to the second initial boundary boxes.
In the model training stage, the computer device in the embodiment of the present application drives the model to be trained to continuously learn the bounding box distribution based on the real bounding boxes and the first initial bounding boxes, so that the predicted first prediction bounding boxes become closer to the real bounding boxes; this improves the accuracy with which the trained target detection model predicts bounding boxes for target object positions in an image to be detected, improves the generalization of the trained model, ensures its detection accuracy on new images to be detected, and improves its data migration adaptability. The model to be trained comprises a generation sub-model and a discrimination sub-model: the regression loss value of the model to be trained is determined based on the discrimination result set output by the discrimination sub-model, and the model parameters of both sub-models are repeatedly updated based on this regression loss value until the current model training result meets the preset model training ending condition, i.e. the bounding box distribution is learned through multiple rounds of generation-discrimination adversarial training. The discrimination sub-model judges whether the first prediction bounding box produced by the generation sub-model is sufficiently realistic; when the generated bounding box (i.e. the first prediction bounding box) is hard to distinguish from the real bounding box, the model parameters are adjusted according to the discrimination result, so that the first prediction bounding box approaches the real bounding box more closely, which improves the parameter update efficiency and the bounding box distribution learning accuracy of the generation sub-model. The discrimination result set output by the discrimination sub-model contains not only a first discrimination result representing the degree of bounding box distribution similarity but also a second discrimination result representing the degree of bounding box coordinate coincidence, which compensates for the bounding box regression loss caused by boxes whose distributions are similar but whose specific positions deviate; the regression loss value obtained from the discrimination result set is therefore more accurate, which further improves the accuracy of the model parameters updated based on it. Correspondingly, in the target detection process, a plurality of candidate bounding boxes are first extracted by using the preset region of interest extraction model, and a third preset number of candidate bounding boxes are randomly sampled from them as the second initial bounding boxes; for each second initial bounding box, the generation sub-model performs bounding box prediction based on the second initial bounding box to obtain a second prediction bounding box, and the classification sub-model performs class prediction on the second prediction bounding box to obtain a second prediction category; the target detection result of the image to be detected is then generated based on the second prediction bounding boxes and second prediction categories corresponding to the second initial bounding boxes, so that the generation sub-model can accurately predict bounding boxes on new images to be detected, which further improves the accuracy of target detection on the image to be detected using the target detection model.
It should be noted that, the embodiments related to the computer device and the embodiments related to the target detection model training method in the present application are based on the same inventive concept, so the specific implementation of the embodiments may refer to the implementation of the corresponding target detection model training method, and the repetition is omitted.
Further, corresponding to the methods shown in fig. 1 to fig. 6, and based on the same technical concept, an embodiment of the present application further provides a storage medium for storing computer-executable instructions. In a specific embodiment, the storage medium may be a USB flash disk, an optical disc, a hard disk, or the like, and the computer-executable instructions stored in the storage medium, when executed by a processor, can implement the following flow:
acquiring a first preset number of first initial boundary frames and acquiring real boundary frames corresponding to the first initial boundary frames respectively; the first initial bounding box is obtained by extracting a target region from a sample image dataset by using a preset region of interest extraction model;
inputting the first initial boundary box and the real boundary box into a model to be trained for model iterative training until a current model training result meets a preset model training ending condition, so as to obtain a target detection model;
The model to be trained comprises a generation sub-model and a judgment sub-model; the specific implementation mode of each model training is as follows:
for each of said first initial bounding boxes: the generating sub-model carries out boundary frame prediction based on the first initial boundary frame to obtain a first prediction boundary frame; the judging sub-model generates a judging result set based on a real boundary frame corresponding to the first initial boundary frame and a first prediction boundary frame corresponding to the first initial boundary frame; the judging result set comprises a first judging result and a second judging result, wherein the first judging result represents the boundary frame distribution similarity degree of the first prediction boundary frame and the real boundary frame, and the second judging result represents the boundary frame coordinate superposition degree of the first prediction boundary frame and the real boundary frame;
determining a regression loss value of the model to be trained based on a first discrimination result and a second discrimination result corresponding to each first initial boundary box;
and updating parameters of the generating sub-model and the judging sub-model based on the regression loss value.
In another specific embodiment, the storage medium may be a USB flash disk, an optical disc, a hard disk, or the like, and the computer-executable instructions stored in the storage medium, when executed by the processor, implement the following flow:
Acquiring a third preset number of second initial bounding boxes; the second initial bounding box is obtained by extracting a target region of an image to be detected by using a preset region of interest extraction model;
inputting the second initial boundary boxes into a target detection model to carry out target detection, and obtaining second prediction boundary boxes and second prediction categories corresponding to the second initial boundary boxes;
and generating a target detection result of the image to be detected based on the second prediction boundary boxes and the second prediction categories corresponding to the second initial boundary boxes.
When the computer-executable instructions stored in the storage medium in the embodiment of the present application are executed by the processor, in the model training stage, the model to be trained is driven to continuously learn the bounding box distribution based on the real bounding boxes and the first initial bounding boxes, so that the predicted first prediction bounding boxes become closer to the real bounding boxes; this improves the accuracy with which the trained target detection model predicts bounding boxes for target object positions in an image to be detected, improves the generalization of the trained model, ensures its detection accuracy on new images to be detected, and improves its data migration adaptability. The model to be trained comprises a generation sub-model and a discrimination sub-model: the regression loss value of the model to be trained is determined based on the discrimination result set output by the discrimination sub-model, and the model parameters of both sub-models are repeatedly updated based on this regression loss value until the current model training result meets the preset model training ending condition, i.e. the bounding box distribution is learned through multiple rounds of generation-discrimination adversarial training. The discrimination sub-model judges whether the first prediction bounding box produced by the generation sub-model is sufficiently realistic; when the generated bounding box (i.e. the first prediction bounding box) is hard to distinguish from the real bounding box, the model parameters are adjusted according to the discrimination result, so that the first prediction bounding box approaches the real bounding box more closely, which improves the parameter update efficiency and the bounding box distribution learning accuracy of the generation sub-model. The discrimination result set output by the discrimination sub-model contains not only a first discrimination result representing the degree of bounding box distribution similarity but also a second discrimination result representing the degree of bounding box coordinate coincidence, which compensates for the bounding box regression loss caused by boxes whose distributions are similar but whose specific positions deviate; the regression loss value obtained from the discrimination result set is therefore more accurate, which further improves the accuracy of the model parameters updated based on it. Correspondingly, in the target detection process, a plurality of candidate bounding boxes are first extracted by using the preset region of interest extraction model, and a third preset number of candidate bounding boxes are randomly sampled from them as the second initial bounding boxes; for each second initial bounding box, the generation sub-model performs bounding box prediction based on the second initial bounding box to obtain a second prediction bounding box, and the classification sub-model performs class prediction on the second prediction bounding box to obtain a second prediction category; the target detection result of the image to be detected is then generated based on the second prediction bounding boxes and second prediction categories corresponding to the second initial bounding boxes, so that the generation sub-model can accurately predict bounding boxes on new images to be detected, which further improves the accuracy of target detection on the image to be detected using the target detection model.
It should be noted that, the embodiments related to the storage medium and the embodiments related to the target detection model training method in the present application are based on the same inventive concept, so the specific implementation of the embodiments may refer to the implementation of the corresponding target detection model training method, and the repetition is not repeated.
The foregoing describes specific embodiments of the present application. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims can be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing are also possible or may be advantageous. It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present embodiments may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-readable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks. These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory. The memory may include volatile memory in a computer-readable medium, random access memory (RAM) and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.

Computer-readable media include permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transitory media), such as modulated data signals and carrier waves.

It should also be noted that the terms "comprises", "comprising", or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such a process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
Embodiments of the present application may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. One or more embodiments of the present application may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network; in a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices. The embodiments in the present application are described in a progressive manner, identical and similar parts of the embodiments may be referred to each other, and each embodiment focuses on its differences from the other embodiments. In particular, the system embodiments are described relatively simply because they are substantially similar to the method embodiments; for relevant parts, reference may be made to the description of the method embodiments. The foregoing description is only an example and is not intended to limit the present application. Various modifications and changes may occur to those skilled in the art. Any modifications, equivalent substitutions, improvements, etc. made within the spirit and principles of the present application are intended to be included within the scope of the claims of the present application.

Claims (16)

1. A method for training a target detection model, the method comprising:
acquiring a first preset number of first initial boundary frames and acquiring real boundary frames corresponding to the first initial boundary frames respectively; the first initial bounding box is obtained by extracting a target region from a sample image dataset by using a preset region of interest extraction model;
inputting the first initial boundary box and the real boundary box into a model to be trained for model iterative training until a current model training result meets a preset model training ending condition, so as to obtain a target detection model;
the model to be trained comprises a generation sub-model and a judgment sub-model; the specific implementation mode of each model training is as follows:
for each of said first initial bounding boxes: the generating sub-model carries out boundary frame prediction based on the first initial boundary frame to obtain a first prediction boundary frame; the judging sub-model generates a judging result set based on a real boundary frame corresponding to the first initial boundary frame and a first prediction boundary frame corresponding to the first initial boundary frame; the judging result set comprises a first judging result and a second judging result, wherein the first judging result represents the boundary frame distribution similarity degree of the first prediction boundary frame and the real boundary frame, and the second judging result represents the boundary frame coordinate superposition degree of the first prediction boundary frame and the real boundary frame;
Determining a regression loss value of the model to be trained based on a first discrimination result and a second discrimination result corresponding to each first initial boundary box;
and updating parameters of the generating sub-model and the judging sub-model based on the regression loss value.
2. The method according to claim 1, wherein the method further comprises:
inputting the sample image dataset into a preset region of interest extraction model to extract the region of interest, and obtaining a second preset number of candidate bounding boxes; the second preset number is greater than the first preset number;
the obtaining a first initial bounding box of a first preset number includes: randomly selecting the first preset number of candidate bounding boxes from the second preset number of candidate bounding boxes as a first initial bounding box.
3. The method of claim 1, wherein the set of discrimination results further comprises a third discrimination result; the generating a discrimination result set based on the real bounding box corresponding to the first initial bounding box and the first prediction bounding box corresponding to the first initial bounding box includes:
carrying out boundary frame authenticity judgment on the real boundary frame and the first prediction boundary frame corresponding to the first initial boundary frame to obtain a first judgment result; calculating boundary frame cross ratio loss based on a real boundary frame and a first prediction boundary frame corresponding to the first initial boundary frame to obtain a second judging result; and calculating a regression loss compensation value for restraining the loss gradient of the regression loss function of the model to be trained based on the real boundary box and the first prediction boundary box corresponding to the first initial boundary box, and obtaining a third judging result.
4. The method of claim 3, wherein determining the regression loss value for the model to be trained based on the first and second discrimination results for each of the first initial bounding boxes comprises:
determining a sub-regression loss value corresponding to each first initial boundary box; the sub-regression loss value corresponding to the first initial bounding box is determined based on target information including one or a combination of: the boundary box distribution similarity represented by the first discrimination result corresponding to the first initial boundary box, the boundary box coordinate coincidence degree represented by the second discrimination result and the regression loss compensation value represented by the third discrimination result;
and determining the regression loss value of the model to be trained based on the sub-regression loss value corresponding to each first initial boundary box.
5. The method of claim 3, wherein performing the authenticity determination on the real bounding box and the first prediction bounding box corresponding to the first initial bounding box to obtain a first determination result includes:
determining a first judging probability that the real boundary box is predicted to be true by the judging sub-model based on a real boundary box corresponding to the first initial boundary box; determining a second discrimination probability of the first prediction boundary box predicted as fake by the discrimination sub-model based on a first prediction boundary box corresponding to the first initial boundary box;
And generating a first discrimination result based on the first discrimination probability and the second discrimination probability.
6. The method of claim 5, wherein generating a first discrimination result based on the first discrimination probability and the second discrimination probability comprises:
determining a first weighted probability based on the first discrimination probability and a first prior probability of a real bounding box corresponding to the first initial bounding box; determining a second weighted probability based on the second discrimination probability and a second prior probability of the first initial bounding box;
and generating a first judging result based on the first weighted probability and the second weighted probability.
7. The method of claim 3, wherein calculating a bounding box overlap ratio loss based on the real bounding box and the first prediction bounding box corresponding to the first initial bounding box, to obtain the second discrimination result, comprises:
performing boundary frame cross-ratio loss calculation on a real boundary frame corresponding to the first initial boundary frame and a first prediction boundary frame corresponding to the first initial boundary frame to obtain a first cross-ratio loss;
and determining a second judging result corresponding to the first initial boundary box based on the first cross ratio loss.
8. A method according to claim 3, wherein the calculating a regression loss compensation value for constraining a loss gradient of a regression loss function of the model to be trained based on the real bounding box and the first prediction bounding box corresponding to the first initial bounding box comprises:
generating a synthetic boundary box corresponding to the first initial boundary box based on a real boundary box corresponding to the first initial boundary box and a first prediction boundary box;
and determining a regression loss compensation value based on the boundary frame distribution similarity degree of the synthesized boundary frame corresponding to the first initial boundary frame and the real boundary frame.
9. The method of claim 8, wherein the generating a composite bounding box corresponding to the first initial bounding box based on the true bounding box and the first prediction bounding box corresponding to the first initial bounding box comprises:
determining a first coordinate information subset based on a first sampling proportion and a first coordinate information set of a real boundary box corresponding to the first initial boundary box;
determining a second coordinate information subset based on a second sampling proportion and a second coordinate information set of a first prediction boundary box corresponding to the first initial boundary box; the sum of the first sampling ratio and the second sampling ratio is equal to 1;
And generating a synthetic boundary box corresponding to the first initial boundary box based on the first coordinate information subset and the second coordinate information subset.
10. The method of claim 4, wherein the model to be trained further comprises a classification sub-model; the specific implementation manner of each model training further comprises the following steps: the classification sub-model classifies the first initial boundary box or the first prediction boundary box to obtain a first prediction category;
the target information further comprises a matching relation between a first prediction category corresponding to the first initial boundary box and a real category of the first initial boundary box, wherein if the first prediction category is not matched with the real category, a sub-regression loss value corresponding to the first initial boundary box is zero; and if the first prediction category is matched with the real category, determining a sub-regression loss value corresponding to the first initial boundary box as a sub-regression loss value determined based on at least one of a first regression loss component corresponding to the boundary box distribution similarity degree, a second regression loss component corresponding to the boundary box coordinate coincidence degree and the regression loss compensation value.
11. A method of target detection, the method comprising:
acquiring a third preset number of second initial bounding boxes; the second initial bounding box is obtained by extracting a target region of an image to be detected by using a preset region of interest extraction model;
inputting the second initial boundary boxes into a target detection model to carry out target detection, and obtaining second prediction boundary boxes and second prediction categories corresponding to the second initial boundary boxes;
and generating a target detection result of the image to be detected based on the second prediction boundary boxes and the second prediction categories corresponding to the second initial boundary boxes.
12. The method of claim 11, wherein the object detection model comprises a classification sub-model and a generation sub-model;
for each of said second initial bounding boxes: in the target detection process, the generating sub-model carries out boundary frame prediction based on the second initial boundary frame to obtain a second prediction boundary frame corresponding to the second initial boundary frame; and the classification sub-model classifies the second initial boundary box or the second prediction boundary box to obtain a second prediction category corresponding to the second initial boundary box.
13. A target detection model training apparatus, the apparatus comprising:
a first bounding box acquisition module configured to acquire a first preset number of first initial bounding boxes and acquire the real bounding box corresponding to each first initial bounding box, wherein the first initial bounding boxes are obtained by performing target region extraction on a sample image dataset using a preset region-of-interest extraction model; and
a model training module configured to input the first initial bounding boxes and the real bounding boxes into a model to be trained for iterative model training until a current model training result satisfies a preset model training end condition, so as to obtain a target detection model;
wherein the model to be trained comprises a generation sub-model and a discrimination sub-model, and each model training iteration is implemented as follows:
for each of the first initial bounding boxes: the generation sub-model performs bounding box prediction based on the first initial bounding box to obtain a first prediction bounding box; the discrimination sub-model generates a discrimination result set based on the real bounding box corresponding to the first initial bounding box and the first prediction bounding box corresponding to the first initial bounding box, the discrimination result set comprising a first discrimination result characterizing the degree of bounding box distribution similarity between the first prediction bounding box and the real bounding box, and a second discrimination result characterizing the degree of bounding box coordinate overlap between the first prediction bounding box and the real bounding box; a regression loss value of the model to be trained is determined based on the first discrimination result and the second discrimination result corresponding to each first initial bounding box; and parameters of the generation sub-model and the discrimination sub-model are updated based on the regression loss value.
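One training iteration of claim 13 can be sketched as follows, assuming the discriminator returns the two per-box discrimination results and that a single optimizer object updates both sub-models; all names, including optimizer.update, are illustrative assumptions rather than the claimed implementation:

```python
def training_iteration(initial_boxes, real_boxes, generator, discriminator, optimizer):
    """One adversarial training step: predict a box per initial box, score it
    with two discrimination results, aggregate the regression loss, and update
    both sub-models."""
    per_box_losses = []
    for init_box, real_box in zip(initial_boxes, real_boxes):
        pred_box = generator(init_box)                        # first prediction bounding box
        dist_result, overlap_result = discriminator(real_box, pred_box)
        per_box_losses.append(dist_result + overlap_result)   # per-box regression term
    regression_loss = sum(per_box_losses) / len(per_box_losses)
    optimizer.update(regression_loss)                          # updates generator and discriminator
    return regression_loss
```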
14. A target detection apparatus, the apparatus comprising:
a second bounding box acquisition module configured to acquire a third preset number of second initial bounding boxes, wherein the second initial bounding boxes are obtained by performing target region extraction on an image to be detected using a preset region-of-interest extraction model;
a target detection module configured to input the second initial bounding boxes into a target detection model for target detection to obtain a second prediction bounding box and a second prediction category corresponding to each second initial bounding box; and
a detection result generation module configured to generate a target detection result of the image to be detected based on the second prediction bounding box and the second prediction category corresponding to each second initial bounding box.
15. A computer device, the device comprising:
a processor; and
a memory arranged to store computer-executable instructions configured to be executed by the processor, the executable instructions, when executed, causing the processor to perform the method of any one of claims 1-10 or claims 11-12.
16. A storage medium storing computer-executable instructions which, when executed, cause a computer to perform the method of any one of claims 1-10 or claims 11-12.
CN202210831208.2A 2022-07-15 2022-07-15 Target detection model training method, target detection method and target detection device Pending CN117437395A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202210831208.2A CN117437395A (en) 2022-07-15 2022-07-15 Target detection model training method, target detection method and target detection device
PCT/CN2023/100274 WO2024012138A1 (en) 2022-07-15 2023-06-14 Target detection model training method and apparatus, and target detection method and apparatus

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210831208.2A CN117437395A (en) 2022-07-15 2022-07-15 Target detection model training method, target detection method and target detection device

Publications (1)

Publication Number Publication Date
CN117437395A true CN117437395A (en) 2024-01-23

Family

ID=89535471

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210831208.2A Pending CN117437395A (en) 2022-07-15 2022-07-15 Target detection model training method, target detection method and target detection device

Country Status (2)

Country Link
CN (1) CN117437395A (en)
WO (1) WO2024012138A1 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118050839B (en) * 2024-04-16 2024-06-28 上海频准激光科技有限公司 Target grating generation method
CN118050894B (en) * 2024-04-16 2024-06-28 上海频准激光科技有限公司 Control system of light reflection module

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110569901B (en) * 2019-09-05 2022-11-29 北京工业大学 Channel selection-based countermeasure elimination weak supervision target detection method
CN111767962B (en) * 2020-07-03 2022-11-08 中国科学院自动化研究所 One-stage target detection method, system and device based on generation countermeasure network
CN114565916B (en) * 2022-02-07 2024-06-28 苏州浪潮智能科技有限公司 Target detection model training method, target detection method and electronic equipment

Also Published As

Publication number Publication date
WO2024012138A1 (en) 2024-01-18


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination