WO2023160666A1 - A target detection method, target detection model training method and device - Google Patents


Info

Publication number
WO2023160666A1
CN2023078250W
Authority
WO
WIPO (PCT)
Prior art keywords
target
detection model
detection
channel layer
training
Prior art date
Application number
PCT/CN2023/078250
Other languages
English (en)
French (fr)
Inventor
唐小军
郑瑞
石瑞姣
Original Assignee
京东方科技集团股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 京东方科技集团股份有限公司 (BOE Technology Group Co., Ltd.)
Priority to CN202380007919.0A (published as CN116964588A)
Publication of WO2023160666A1

Links

Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06N — COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 — Computing arrangements based on biological models
    • G06N3/02 — Neural networks

Definitions

  • the present disclosure relates to the field of image detection, and in particular to a target detection method, a target detection model training method and a device.
  • Multi-dataset fusion detection training refers to using multiple datasets labeled with different categories to train a single detector model that achieves full-category target detection. Compared with running multiple single-category detectors in parallel, multi-dataset fusion detection training enables a single detector to detect all categories of targets at the same time with a much smaller computation cost, so it has high practical value.
  • a target detection method comprising: acquiring an image to be detected; and processing the image to be detected with a target detection model to obtain a target detection result corresponding to the target to be detected in the image. The target detection model includes a feature extraction network and a target prediction network. The feature extraction network is used to extract features from the image to be detected to obtain image features related to various target objects; the target prediction network is used to process the image features to obtain the target detection result. The target prediction network includes a category channel layer, multiple target channel layers and multiple coordinate channel layers. Each target channel layer is used to output a detection prediction value indicating whether a target object is present and to detect at least one of the various target objects, and the multiple target channel layers are used to detect different categories of target objects. The category channel layer is used to output the category prediction values corresponding to the various target objects, and the coordinate channel layers are used to output the coordinate prediction values corresponding to the target objects. The target detection result is calculated from the detection prediction value, the category prediction value and the coordinate prediction value.
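The head described above can be sketched in a few lines of plain Python. This is an illustrative assumption of how the pieces fit together (the shapes, the category-to-layer mapping, and the multiplicative fusion are editorial choices, not taken verbatim from the patent): each target channel layer emits an objectness score per anchor, the category channel layer emits per-category scores, and the detection result fuses the two.

```python
import random

random.seed(0)

num_target_layers = 3   # each layer detects a group of categories
num_categories = 5      # total categories across all datasets
num_anchors = 4

# Objectness per (layer, anchor); category score per (category, anchor).
p_obj = [[random.random() for _ in range(num_anchors)]
         for _ in range(num_target_layers)]
p_cls = [[random.random() for _ in range(num_anchors)]
         for _ in range(num_categories)]

# Assumed mapping: which target channel layer owns each category.
layer_of_category = [0, 0, 1, 2, 2]

# Fused detection result per (category, anchor): objectness of the
# owning layer times the category score.
detection = [
    [p_obj[layer_of_category[c]][a] * p_cls[c][a]
     for a in range(num_anchors)]
    for c in range(num_categories)
]
print(len(detection), len(detection[0]))  # 5 4
```

Since both factors lie in [0, 1], each fused score also lies in [0, 1].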
  • the multiple coordinate channel layers correspond one-to-one to the multiple target channel layers, and each coordinate channel layer handles the same type of target object as its corresponding target channel layer; when the corresponding target channel layer detects a target object, the coordinate prediction value of that target object is obtained from the coordinate channel layer.
  • the target detection result includes a detection result and a coordinate result; the detection result is obtained by fusing the detection prediction value of a target channel layer with the corresponding category prediction value. The coordinate channel layer determining the coordinate prediction value of the target object when its corresponding target channel layer detects the target object includes: when the detection result calculated from the target channel layer corresponding to the coordinate channel layer is greater than or equal to a threshold, acquiring the coordinate prediction value of the coordinate channel layer; when the detection result calculated from the target channel layer corresponding to the coordinate channel layer is less than the threshold, not acquiring the coordinate prediction value of the coordinate channel layer.
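The gating rule above reduces to a simple comparison per layer. A minimal sketch, with illustrative scores and box values (not from the patent): a coordinate prediction is read only when the fused detection result of its paired target channel layer reaches the threshold.

```python
threshold = 0.5
detection_results = [0.9, 0.3, 0.5]          # one per target channel layer
coord_predictions = [                         # (x, y, w, h) per layer
    (10.0, 20.0, 30.0, 40.0),
    (5.0, 5.0, 10.0, 10.0),
    (50.0, 60.0, 20.0, 20.0),
]

kept = [box for score, box in zip(detection_results, coord_predictions)
        if score >= threshold]
print(len(kept))  # 2: the layers scoring 0.9 and 0.5 pass; 0.3 is skipped
```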
  • the structures of the category channel layer, the multiple target channel layers and the multiple coordinate channel layers are convolution structures; the size of the convolution kernel of the convolution structure is 1×1.
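A 1×1 convolution, as used by these channel layers, is simply a linear map over channels applied independently at every spatial position. The sketch below demonstrates this with illustrative shapes and weights (pure Python, no framework assumed):

```python
in_ch, out_ch, h, w = 8, 3, 4, 4
x = [[[1.0] * w for _ in range(h)] for _ in range(in_ch)]   # (in_ch, h, w)
kernel = [[0.5] * in_ch for _ in range(out_ch)]             # (out_ch, in_ch)

# For a 1x1 kernel, each output pixel is a weighted sum over input
# channels at the same spatial location — no spatial mixing.
y = [[[sum(kernel[o][i] * x[i][r][c] for i in range(in_ch))
       for c in range(w)]
      for r in range(h)]
     for o in range(out_ch)]

print(y[0][0][0])  # 8 channels * 1.0 * 0.5 = 4.0
```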
  • the feature extraction network includes a backbone network and a transition network
  • the backbone network is used to determine the image features of a general category according to the image to be detected
  • the transition network is used to determine the image features related to the various target objects according to the image features of the general category.
  • a method for training a target detection model, including: obtaining a training set; the training set includes a plurality of training data sets, each training data set includes labeled data of one or more categories of target objects, and at least two of the training data sets label different categories of target objects; and iteratively training the detection model according to the training set to obtain the target detection model. The target detection model includes a feature extraction network and a target prediction network. The feature extraction network is used to extract features from the image to be detected to obtain image features related to various target objects; the target detection network is used to process the image features to obtain the target detection result. The target detection network includes a category channel layer, multiple target channel layers and multiple coordinate channel layers. Each target channel layer is used to output a detection prediction value indicating whether a target object is present and to detect at least one of the various target objects, and the multiple target channel layers are used to detect different categories of target objects. The category channel layer is used to output the category prediction values corresponding to the various target objects.
  • the iterative training of the detection model according to the training set to obtain the target detection model includes: for each iteration, inputting the training set into the detection model to determine the detection results of various target objects;
  • calculating a first loss value according to the detection results of the various target objects and a first loss function, and adjusting the parameters of the detection model;
  • the first loss function includes a target loss function, a coordinate loss function, and a category loss function; the detection model at the point where the first loss function converges is determined as the target detection model.
  • the target loss function satisfies the following formula:
  • L_obj+ represents the target loss value of the positive samples in the training set
  • NP represents the total number of target channel layers
  • b represents the index of the target channel layer
  • Target(b) represents the Anchor set of the positive samples corresponding to the b-th target channel layer
  • BCELoss represents the BCE (binary cross-entropy) loss function
  • s represents the index of the positive sample
  • P_obj(s, b) represents the target prediction value corresponding to the Anchor of the b-th target channel layer and the s-th positive sample
  • GT_obj(s) represents the target truth value corresponding to the Anchor of the s-th positive sample
  • L_obj- represents the target loss value of the negative samples in the training set
  • L_obj(b) represents the category subset of the target objects corresponding to the b-th target channel layer
  • 1(...) is the indicator function: its value is 1 when the input is true and 0 otherwise
  • L_data represents the category set of the target objects labeled in the current training data
  • H represents the number of rows of the data matrix output by the target channel layer
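The formula referenced above survives only as an image in the source and is not reproduced in this text. From the symbol definitions, one plausible reconstruction is the following (an editorial assumption, not the patent's verbatim formula; the indicator condition and the normalization are in particular guesses):

```latex
L_{obj+} = \frac{1}{NP}\sum_{b=1}^{NP}\;\sum_{s \in Target(b)}
  \mathrm{BCELoss}\bigl(P_{obj}(s,b),\, GT_{obj}(s)\bigr)

L_{obj-} = \sum_{b=1}^{NP} \mathbf{1}\bigl(L_{obj}(b) \subseteq L_{data}\bigr)
  \sum_{s \notin Target(b)} \mathrm{BCELoss}\bigl(P_{obj}(s,b),\, 0\bigr)
```

Under this reading, positive anchors are pushed toward their target truth values, while the negative-sample term penalizes objectness on unassigned anchors only for channel layers whose category subset is actually labeled in the current training data, so unlabeled categories are not wrongly treated as background.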
  • the coordinate loss function satisfies the following formula:
  • L_box represents the coordinate loss value
  • NP represents the total number of target channel layers
  • b represents the index of the target channel layer
  • Target(b) represents the Anchor set of positive samples corresponding to the b-th target channel layer
  • IOU represents the intersection-over-union (IOU) calculation function
  • s represents the index of the positive sample
  • P_box(s, b) represents the coordinate prediction value of the s-th positive sample output by the b-th target channel layer
  • GT_box(s) represents the coordinate truth value of the s-th positive sample.
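As with the target loss, the coordinate loss formula is an image in the source. A plausible IOU-based reconstruction from the definitions above (an editorial assumption, not the patent's verbatim formula) is:

```latex
L_{box} = \frac{1}{NP}\sum_{b=1}^{NP}\;\sum_{s \in Target(b)}
  \Bigl(1 - \mathrm{IOU}\bigl(P_{box}(s,b),\, GT_{box}(s)\bigr)\Bigr)
```

That is, each positive sample contributes one minus the overlap between its predicted box and its ground-truth box, so perfectly overlapping boxes contribute zero loss.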
  • the class loss function satisfies the following formula:
  • L_cls represents the category loss value
  • Class represents the total number of categories of the target objects
  • 1[...] is the indicator function
  • b represents the index of the target channel layer
  • B_cls(b) represents the set of the second categories corresponding to the b-th target channel layer
  • Len(B_cls(b)) represents the number of categories in the subset corresponding to the b-th target channel layer
  • H represents the number of rows of the data matrix output by the target channel layer
  • W represents the number of columns of the data matrix output by the target channel layer
  • Anchor represents the set of all Anchors
  • Mask(p, a) indicates whether there is a label box at the position corresponding to pixel p and anchor a
  • BCELoss represents the BCE loss function
  • P_cls(p, a, c) represents the category prediction value
  • GT_cls(p, a, c) represents the category truth value.
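The category loss formula is likewise missing from the extracted text. One reconstruction consistent with the definitions above (an editorial assumption; the indicator's condition is not stated in the source and is guessed here to restrict the sum to layers whose categories are labeled in the current data) is:

```latex
L_{cls} = \sum_{b=1}^{NP}
  \frac{\mathbf{1}\bigl[L_{obj}(b) \subseteq L_{data}\bigr]}{\mathrm{Len}(B_{cls}(b))}
  \sum_{p \in H \times W}\;\sum_{a \in Anchor}\;\sum_{c \in B_{cls}(b)}
  \mathrm{Mask}(p,a)\,
  \mathrm{BCELoss}\bigl(P_{cls}(p,a,c),\, GT_{cls}(p,a,c)\bigr)
```

The Mask(p, a) factor means only positions covered by a label box contribute category loss, and the Len term normalizes each layer's contribution by how many categories it is responsible for.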
  • the method also includes: obtaining a verification set; the verification set includes a plurality of verification data sets corresponding to the plurality of training data sets, and each verification data set includes label data of one or more target objects. The verification data sets are respectively input into the target detection model to obtain the accuracy rate under each verification data set; the accuracy rates under the multiple verification data sets are summed as the total accuracy rate of the trained target detection model, or, alternatively, the accuracies under the multiple verification data sets are jointly used as the total accuracy of the trained target detection model.
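The two aggregation options above amount to either collapsing the per-dataset accuracies into one number or reporting them jointly. A trivial sketch with illustrative dataset names and values (not from the patent):

```python
per_dataset_accuracy = {"dataset_a": 0.41, "dataset_b": 0.62, "dataset_c": 0.55}

# Option 1: a single summed score for the trained model.
total_as_sum = sum(per_dataset_accuracy.values())

# Option 2: keep the per-dataset accuracies jointly as the total accuracy.
total_as_collection = dict(per_dataset_accuracy)

print(round(total_as_sum, 2))  # 1.58
```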
  • a method for training a target detection model, including: obtaining a training set; the training set includes a plurality of training data sets, each training data set includes label data of one or more categories of target objects, and at least two of the training data sets label different categories of target objects; determining an optimal detection model, where the optimal detection model is the detection model with the highest accuracy among the historically trained detection models, and the historically trained detection models include the detection model whose parameters were updated after each iteration of training; and iteratively training the detection model according to the training set, pseudo-labeling the training set according to the optimal detection model, and continuing to train the detection model to obtain the target detection model.
  • performing pseudo-labeling for the iterative training of the detection model according to the optimal detection model to obtain the target detection model includes: pseudo-labeling the missing target objects in each training data set of the training set according to the optimal detection model to obtain positive sample label data and negative sample label data, where a missing target object is a target object of a category not labeled in the training data set; determining a positive sample loss value according to the positive sample label data and a positive sample loss function; determining a negative sample loss value according to the negative sample label data and a negative sample loss function; adjusting the parameters of the detection model according to a total loss value, where the total loss value is determined according to the first loss value, the positive sample loss value and the negative sample loss value; and determining the detection model at the point where the total loss function converges as the target detection model, where the total loss function includes the first loss function, the positive sample loss function, and the negative sample loss function.
  • labeling the missing target objects in the training set to obtain positive sample label data and negative sample label data includes: inputting the training set into the optimal detection model, and determining the detection score of the optimal detection model for each missing target object; for each missing target object, if the detection score of the optimal detection model for the missing target object is greater than or equal to a positive sample score threshold, determining that the label data corresponding to the missing target object is positive sample label data; and, for each missing target object, if the detection score of the optimal detection model for the missing target object is less than or equal to a negative sample score threshold, determining that the label data corresponding to the missing target object is negative sample label data.
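The pseudo-labeling rule above can be sketched directly: a detection on an unlabeled ("missing") category becomes a positive pseudo-label when the best model scores it high enough, a negative pseudo-label when the score is low enough, and is ignored in between. The threshold values below are illustrative assumptions.

```python
POS_THRESHOLD = 0.8
NEG_THRESHOLD = 0.2

def pseudo_label(score, pos_thr=POS_THRESHOLD, neg_thr=NEG_THRESHOLD):
    """Assign a pseudo-label from the optimal model's detection score."""
    if score >= pos_thr:
        return "positive"
    if score <= neg_thr:
        return "negative"
    return "ignored"  # ambiguous region contributes no pseudo-label

labels = [pseudo_label(s) for s in (0.95, 0.5, 0.1)]
print(labels)  # ['positive', 'ignored', 'negative']
```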
  • the positive sample score threshold and the negative sample score threshold are determined according to the following steps: obtaining a verification set, where the verification set includes a plurality of verification data sets corresponding to the plurality of training data sets, each verification data set includes label data of one or more target objects, and the accuracy of the detection model is determined according to the verification set; determining the detection score of the optimal detection model for each target object in the verification set; determining the negative sample score threshold according to the detection score of each target object and a preset recall rate; and determining the positive sample score threshold according to the detection score of each target object and a preset precision.
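One assumed way to derive such a threshold from validation scores is shown below: the negative threshold is chosen so that a preset fraction (the recall) of true objects score above it. This is an illustrative procedure consistent with the step above, not necessarily the patent's exact algorithm; the positive threshold would be derived analogously from a preset precision.

```python
def neg_threshold(true_object_scores, recall=0.95):
    """Largest threshold that still keeps `recall` of true objects at or above it."""
    ranked = sorted(true_object_scores, reverse=True)
    keep = max(1, int(recall * len(ranked)))
    return ranked[keep - 1]

scores = [0.9, 0.7, 0.5, 0.3, 0.1]
thr = neg_threshold(scores, recall=0.8)
print(thr)  # 0.3: the top 4 of 5 scores (80%) stay at or above the threshold
```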
  • the method also includes: determining a first weight value, a second weight value and a third weight value; the total loss value is determined according to the product of the first weight value and the first loss value, the product of the second weight value and the positive sample loss value, and the product of the third weight value and the negative sample loss value.
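The weighted combination above is a plain linear sum. The weight and loss values below are illustrative assumptions, not from the patent:

```python
w1, w2, w3 = 1.0, 0.5, 0.25                 # first, second, third weight values
first_loss, pos_loss, neg_loss = 2.0, 1.0, 0.8

# Total loss: sum of each weight times its corresponding loss value.
total_loss = w1 * first_loss + w2 * pos_loss + w3 * neg_loss
print(total_loss)  # 2.0 + 0.5 + 0.2 = 2.7
```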
  • a target detection device including: an acquisition unit and a processing unit; the acquisition unit is configured to acquire an image to be detected; the processing unit is configured to process the image to be detected using a target detection model to obtain the target detection result corresponding to the target to be detected in the image. The target detection model includes a feature extraction network and a target prediction network; the feature extraction network is used to extract features from the image to be detected to obtain image features related to various target objects; the target detection network is used to process the image features to obtain the target detection result. The target detection network includes a category channel layer, multiple target channel layers and multiple coordinate channel layers; each target channel layer is used to output a detection prediction value indicating whether a target object is present and to detect at least one of the various target objects, and the multiple target channel layers are used to detect different categories of target objects; the category channel layer is used to output the category prediction values corresponding to the various target objects; the coordinate channel layers are used to output the coordinate prediction values corresponding to the target objects.
  • the processing unit is further configured to: acquire the coordinate prediction value of the coordinate channel layer when the detection result calculated by the target channel layer corresponding to the coordinate channel layer is greater than or equal to a threshold.
  • the processing unit is further configured to: if the detection result calculated by the target channel layer corresponding to the coordinate channel layer is less than a threshold, not acquire the coordinate prediction value of the coordinate channel layer.
  • a target detection model training device including: an acquisition unit and a processing unit.
  • the acquiring unit is configured to: acquire a training set.
  • the training set includes multiple training data sets, and each training data set includes labeled data of one or more types of target objects, and at least two data sets in the multiple training data sets have different types of labeled target objects.
  • the processing unit is configured to: iteratively train the detection model according to the training set to obtain the target detection model.
  • the target detection model includes a feature extraction network and a target prediction network; the feature extraction network is used to extract features from the image to be detected to obtain image features related to various target objects; the target detection network is used to process image features to obtain target detection results;
  • the target detection network includes a category channel layer, multiple target channel layers and multiple coordinate channel layers; the target channel layer is used to output the detection prediction value representing whether there is a target object, and each target channel layer is used to detect various target objects. At least one, multiple target channel layers are used to detect different categories of target objects; the category channel layer is used to output category prediction values corresponding to various target objects; the coordinate channel layer is used to output coordinate prediction values corresponding to target objects; the target Detection results are calculated based on detection predictions, category predictions, and coordinate predictions.
  • the processing unit is further configured to: for each iteration, input the training set into the detector model, and determine detection results of various target objects.
  • the processing unit is further configured to: calculate the first loss value according to the detection results of various target objects and the first loss function, and adjust parameters of the detector model.
  • the first loss function includes an objective loss function, a coordinate loss function, and a category loss function.
  • the processing unit is further configured to: determine the detector model when the first loss function converges as the trained target detector model.
  • the obtaining unit is further configured to: obtain a verification set.
  • the verification set includes a plurality of verification data sets corresponding to the plurality of training data sets, and each verification data set includes labeled data of one or more target objects.
  • the processing unit is further configured to: respectively input multiple verification data sets into the target detection model to obtain accuracy rates under the multiple verification data sets.
  • the processing unit is further configured to: calculate the sum of accuracy rates under multiple verification data sets as the total accuracy rate of the trained target detection model. Or, the accuracies of multiple verification data sets are collectively used as the total accuracy of the trained object detection model.
  • a target detection model training device including: an acquisition unit and a processing unit.
  • the acquiring unit is configured to: acquire a training set.
  • the training set includes multiple training data sets, each training data set includes labeled data of one or more categories of target objects, and at least two of the multiple datasets label the categories of the target objects differently.
  • the processing unit is configured to: determine an optimal detection model.
  • the optimal detection model is the detection model with the highest accuracy among the historical training detection models, and the historical training detection model includes a detection model whose parameters have been updated after each iteration of training.
  • the processing unit is further configured to: perform iterative training on the detection model according to the training set, perform pseudo-label labeling on the training set according to the optimal detection model, and continue training the detection model to obtain a target detection model.
  • the processing unit is further configured to: perform pseudo-label labeling on missing target objects in each training data set in the training set according to the optimal detection model to obtain positive sample label data and negative sample label data; wherein, The missing target objects are the target objects of unlabeled categories in the training dataset.
  • the processing unit is further configured to: determine the positive sample loss value according to the positive sample label data and the positive sample loss function.
  • the processing unit is further configured to: determine a negative sample loss value according to the negative sample label data and the negative sample loss function.
  • the processing unit is further configured to: adjust the parameters of the detection model according to the total loss value.
  • the total loss value is determined according to the first loss value, the positive sample loss value and the negative sample loss value.
  • the processing unit is further configured to: determine the detection model when the total loss function converges as the target detection model.
  • the total loss function includes a first loss function, a positive sample loss function, and a negative sample loss function.
  • the processing unit is further configured to: input the labeled training set into the detection model, and determine the detection score of the detection model for each missing target object.
  • the processing unit is further configured to: for each missing target object corresponding to the pseudo-label data, if the detection score of the detection model for the missing target object is greater than or equal to the positive sample score threshold, determine the missing target object The corresponding label data is positive sample label data.
  • the processing unit is further configured to: for each missing target object corresponding to the pseudo-label data, if the detection score of the detection model for the missing target object is less than or equal to the negative sample score threshold, determine the missing target object The corresponding label data is the negative sample label data.
  • the acquisition unit is further configured to: acquire a verification set; the verification set includes a plurality of verification data sets corresponding to the plurality of training data sets, each verification data set includes the label data of one or more target objects, and the accuracy of the detection model is determined according to the verification set.
  • the processing unit is further configured to: determine the detection score of the optimal detection model for each target object in the verification set.
  • the processing unit is further configured to: determine a negative sample score threshold according to the detection score and the preset recall rate of each target object.
  • the processing unit is further configured to: determine a positive sample score threshold according to the detection score and preset accuracy of each target object.
  • the processing unit is further configured to: determine the first weight, the second weight and the third weight.
  • the processing unit is further configured to: determine the total loss value according to the product of the first weight value and the first loss value, the product of the second weight value and the positive sample loss value, and the product of the third weight value and the negative sample loss value.
  • an object detection device including: a processor and a communication interface; the communication interface is coupled to the processor, and the processor is used to run computer programs or instructions to implement the target detection method described in any of the above embodiments.
  • a target detection model training device including: a processor and a communication interface; the communication interface is coupled to the processor, and the processor is used to run computer programs or instructions to implement the target detection model training method described in any of the above embodiments.
  • a non-transitory computer readable storage medium stores computer program instructions, and when the computer program instructions run on a computer (for example, an object detection device), the computer executes the object detection method as described in any one of the above embodiments.
  • a non-transitory computer readable storage medium stores computer program instructions, and when the computer program instructions run on a computer (for example, a target detection model training device), the computer executes the method described in any of the above-mentioned embodiments.
  • a computer program product includes computer program instructions which, when executed on a computer (for example, a detector training device), cause the computer to perform the target detection method and the target detection model training method described in any of the above embodiments.
  • a computer program is provided.
  • when the computer program is executed on a computer (for example, a detector training device), the computer program causes the computer to execute the target detection method and the target detection model training method described in any of the above embodiments.
  • in yet another aspect, a chip includes a processor and a communication interface; the communication interface is coupled to the processor, and the processor is used to run computer programs or instructions to implement the target detection method and the target detection model training method described in any of the above embodiments.
  • the chip provided in the present disclosure further includes a memory for storing computer programs or instructions.
  • all or part of the above computer instructions may be stored on a computer-readable storage medium.
  • the computer-readable storage medium may be packaged together with the processor of the device, or may be packaged separately with the processor of the device, which is not limited in the present disclosure.
  • a target detection system including: a target detection device and a target detection model training device, wherein the target detection device is used to execute the target detection method described in any of the above embodiments, and the target detection model training device is used to execute the target detection model training method described in any of the above embodiments.
  • the names of the above-mentioned target detection device and target detection model training device do not limit the equipment or functional modules themselves. In actual implementation, these devices or functional modules may appear with other names. As long as the functions of each device or functional module are similar to those of the present disclosure, they fall within the scope of the claims of the present disclosure and their equivalent technologies.
  • Fig. 1 is a flow chart of fusion detection of multiple data sets provided according to some embodiments.
  • Fig. 2 is an architecture diagram of a detector model provided according to some embodiments.
  • Fig. 3 is an architecture diagram of a detector model provided according to some embodiments.
  • Fig. 4 is an architecture diagram of a target detection system provided according to some embodiments.
  • Fig. 5 is a flowchart of a target detection method provided according to some embodiments.
  • Fig. 6 is a flowchart of a method for training a target detection model provided according to some embodiments.
  • Fig. 7 is a flowchart of another target detection model training method provided according to some embodiments.
  • Fig. 8 is a flowchart of another target detection model training method provided according to some embodiments.
  • Fig. 9 is a flowchart of another target detection model training method provided according to some embodiments.
  • Fig. 10 is a flow chart of another target detection model training method provided according to some embodiments.
  • Fig. 11 is a flowchart of another target detection model training method provided according to some embodiments.
  • Fig. 12 is a flowchart of another target detection model training method provided according to some embodiments.
  • Fig. 13 is a flowchart of another target detection model training method provided according to some embodiments.
  • Fig. 14 is a structural diagram of an object detection device provided according to some embodiments.
  • Fig. 15 is a structural diagram of a target detection model training device provided according to some embodiments.
  • Fig. 16 is a structural diagram of another object detection model training device provided according to some embodiments.
  • Fig. 17 is a structural diagram of another object detection device provided according to some embodiments.
  • Fig. 18 is a structural diagram of another object detection model training device provided according to some embodiments.
  • first and second are used for descriptive purposes only, and cannot be understood as indicating or implying relative importance or implicitly specifying the quantity of indicated technical features. Thus, a feature defined as “first” or “second” may explicitly or implicitly include one or more of these features. In the description of the embodiments of the present disclosure, unless otherwise specified, “plurality” means two or more.
  • the expressions “coupled” and “connected” and their derivatives may be used.
  • the term “connected” may be used in describing some embodiments to indicate that two or more elements are in direct physical or electrical contact with each other.
  • the term “coupled” may be used when describing some embodiments to indicate that two or more elements are in direct physical or electrical contact.
  • the terms “coupled” or “communicatively coupled” may also mean that two or more elements are not in direct contact with each other, but yet still co-operate or interact with each other.
  • the embodiments disclosed herein are not necessarily limited by the context herein.
  • “At least one of A, B and C” has the same meaning as “at least one of A, B or C”, and both include the following combinations of A, B and C: A only, B only, C only, a combination of A and B, a combination of A and C, a combination of B and C, and a combination of A, B and C.
  • a and/or B includes the following three combinations: A only, B only, and a combination of A and B.
  • the term “if” is optionally interpreted to mean “when” or “at” or “in response to determining” or “in response to detecting,” depending on the context.
  • the phrase “if it is determined that” or “if [the stated condition or event] is detected” is optionally construed to mean “when it is determined that”, “in response to determining that”, “when [the stated condition or event] is detected”, or “in response to the detection of [the stated condition or event]”.
  • Target detection refers to the detection of target objects of a set category in a given image, such as human faces, human bodies, vehicles, or building objects.
  • the detection results of target detection usually give the area detection frame, area coordinates and category of the target object.
  • the area detection frame is the circumscribed rectangular frame of the detected target in the detection result output by target detection.
  • Multi-data fusion detection refers to training a single detection model based on multiple datasets labeled with different categories to achieve full-category target detection.
  • the data set includes image data and annotation data
  • the image data is used to represent the image of the target object
  • the annotation data is data for annotating the target object existing in the image data.
  • As shown in the figure, multi-data fusion detection trains a detector on multiple data sets (three data sets are taken as an example in the figure): the multiple data sets are input into the detection model for training, and after training is completed, the validation set of each data set is used to calculate the mean average precision (mAP) of the detector.
  • Neural networks, also called artificial neural networks (ANNs), are mathematical model algorithms that imitate the behavioral characteristics of animal neural networks and perform distributed parallel information processing.
  • Neural networks include deep learning networks, such as convolutional neural networks (CNN) and long short-term memory (LSTM) networks.
  • the Yolov5 (you only look once Version 5) algorithm used during detector training is also a kind of neural network.
  • the loss function is a function that maps a random event or its related random variables to non-negative real numbers to represent the "risk” or "loss" of the random event.
  • Loss functions are often associated with optimization problems as learning criteria, i.e., models are solved and evaluated by minimizing the loss function. For example, loss functions are used for parameter estimation of models in statistics and machine learning.
  • the loss function is used to evaluate the detection accuracy of the detection model for the target object.
  • If the loss function of the detection result output by the detection model satisfies certain preset conditions, it is determined that the detection model has been trained, and the trained detection model is determined as the final detection model.
  • In the traditional single detection model, only one target channel is set; that is, one target channel is responsible for detecting all categories of target objects.
  • Because the traditional single detection model has only one target channel, the target channel will detect objects that are unlabeled in the data set, and will then be subjected to erroneous intervention from the labeled data of other categories of objects in the data set, which seriously affects the training accuracy of the detector.
  • If the missing objects in the existing data sets are manually relabeled, the labeling workload will be very large and the labor cost too high, making large-scale application difficult.
  • some embodiments of the present disclosure provide a target detection method and a target detection model training method.
  • Multiple target channel layers are set in the detection model to detect target objects by category. Therefore, even after a target object of a certain category is detected by the target channel layer of the corresponding category, if that category is not labeled in the current data set, the output of that target channel layer will not be substituted into the subsequent training process when the detection model is trained on the current data set.
  • In addition, the optimal detection model with the highest historical accuracy is determined, and the optimal detection model is used to add pseudo-labels during training, so that the pseudo-label annotation data and the annotation data of the real training set are fused to train the detection model.
  • This improves the detection recall rate of the final target detection model in cross-scene settings, and achieves a better training effect than traditional single-detection-model training.
  • the target detection model trained through the above training process can detect target objects by category in a specific target detection application, with high detection accuracy and better detection effect.
  • Fig. 2 is a schematic diagram of the architecture of a detection model 20 provided according to some embodiments, the detection model 20 is a single detection model, and uses the Yolov5 algorithm as the basic structure.
  • the detection model 20 includes: an input module 21 and a target detection module 22 .
  • the input module 21 is used to input the data set into the detection model 20 .
  • Data transmission can be performed between the input module 21 and the target detection module 22 .
  • the target detection module 22 is used to process the data set to obtain the training detection result of the target object. As shown in FIG. 2 , the target detection module 22 includes a backbone (Backbone) network 221 , a transition (Neck) network 222 and a detection (Detection) network 223 .
  • the Backbone network 221 is used to perform an extraction operation on the image data in the data set, so as to obtain common image features and transmit them to the Neck network 222 .
  • the Neck network 222 receives the general image features sent by the Backbone network 221 .
  • In the field of image detection, the general image features are the image features of general object categories acquired after the Backbone network 221 performs preliminary feature extraction on the original image data. It should be noted that how the Backbone network 221 acquires the general image features will not be described in detail here.
  • Exemplarily, the architecture of the Backbone network 221 may adopt CSPDarknet.
  • the Neck network 222 is used to extract image features that are strongly correlated with the category of the target object from common image features, and send the strongly correlated image features to the Detection network 223 .
  • the Detection network 223 receives the strongly correlated image features sent by the Neck network 222 .
  • the strongly correlated image features are image features of objects similar to the category of the target object obtained after the general image features are extracted by the Neck network 222 .
  • the category of the target object here is the detection category set by the detection model 20 . It should be noted that how the Neck network 222 acquires image features strongly related to the category of the target object will not be described in detail here. Exemplarily, the architecture of the Neck network 222 may adopt PANet.
  • the Detection network 223 is used to calculate the final target detection result according to the strongly correlated image features.
  • the target detection result includes the area detection frame, area coordinates and category of the target object.
  • three kinds of data output channel layers are set in the detection network 223 , which are object (Object) channel layer 31 , coordinate (Box) channel layer 32 , and category (Class) channel layer 33 . Wherein, there are multiple Object channel layers 31 and Box channel layers 32 , and one Class channel layer 33 .
  • the Object channel layer 31 is used to judge whether there is a target object in a corresponding position among strongly correlated image features. If the Object channel layer 31 determines that there is a target object, it will output the area detection frame of the target object at the corresponding position.
  • the Box channel layer 32 is used to calculate the specific coordinates of the target object when the Object channel layer 31 determines that the target object exists, so as to fine-tune the area detection frame of the target object, so that the position of the area detection frame is more accurate.
  • the Class channel layer 33 is used to identify the category of the target object.
  • The Object channel layer 31, the Box channel layer 32, and the Class channel layer 33 all have a convolution structure, and the convolution kernel size of the convolution structure is 1×1.
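As a hedged illustration of the channel-layer structure described above (not the patent's actual implementation, and with all shapes and weights made up for the example): a 1×1 convolution applies the same per-position linear map over the channel dimension at every spatial location, so the Object, Box, and Class channel layers can each be realized as independent 1×1 convolutions over a shared feature map.

```python
def conv1x1(features, weights):
    """Apply a 1x1 convolution: a per-position linear map over channels.

    features: list of C feature maps, each an H x W nested list.
    weights:  list of OUT rows, each holding C coefficients.
    Returns OUT feature maps of the same H x W size.
    """
    C, H, W = len(features), len(features[0]), len(features[0][0])
    return [[[sum(weights[o][c] * features[c][y][x] for c in range(C))
              for x in range(W)]
             for y in range(H)]
            for o in range(len(weights))]


# A toy 2-channel, 2x2 feature map shared by all three channel layers.
shared = [[[1.0, 0.0], [0.0, 1.0]],
          [[0.0, 2.0], [2.0, 0.0]]]

obj_map = conv1x1(shared, [[0.5, 0.25]])    # Object channel layer: 1 map
box_map = conv1x1(shared, [[1.0, 0.0],      # Box channel layer: 4 coord maps
                           [0.0, 1.0],
                           [1.0, 1.0],
                           [0.5, 0.5]])
cls_map = conv1x1(shared, [[0.2, 0.8]])     # Class channel layer: 1 class here
```

Because the kernel size is 1×1, each output value depends only on the channel vector at one spatial position, which keeps the three channel layers cheap and lets them share the same upstream features.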
  • FIG. 4 is a structure diagram of a target detection system 40 provided according to some embodiments.
  • the target detection system 40 includes: an image acquisition device 41 , a detection processing device 42 , and an interaction device 43 .
  • the image acquisition device 41 is used to acquire the image to be detected. And, sending the image to be detected to the detection processing device 42 .
  • the image acquisition device 41 may be implemented as a surveillance camera, a camera, or other equipment capable of image acquisition. It can be understood that the image acquisition device 41 can be arranged at the entrance and exit of the area to be inspected, or at a certain vertical height in the area to be inspected, so as to obtain images of the object to be inspected.
  • The detection processing device 42 is configured to, after receiving the image to be detected, use the target detection model to process the image to be detected to obtain a target detection result corresponding to the target to be detected in the image to be detected. It should be noted that the specific process by which the detection processing device 42 uses the target detection model to process the image to be detected to obtain the target detection result will not be repeated here.
  • the detection processing device 42 sends the target detection result to the interaction device 43 after obtaining the target detection result corresponding to the target to be detected in the image to be detected.
  • the interaction device 43 is used to realize the output of the target detection result and the human-computer interaction with the staff.
  • the interaction device 43 may include a display terminal and a human-computer interaction device.
  • the display terminal can be realized as a display or other devices with visual display function
  • The human-computer interaction device can be realized as a touch screen, a keyboard and mouse, or other devices with human-computer interaction functions.
  • In the target detection method provided in the present disclosure, the execution subject is the target detection system; in the target detection model training method provided in the present disclosure, the execution subject is the target detection model training device.
  • the target detection system and the target detection model training device can be servers respectively, including:
  • The processor can be a general-purpose central processing unit (CPU), a microprocessor, an application-specific integrated circuit (ASIC), or one or more integrated circuits used to control the execution of programs of the disclosed solution.
  • The transceiver can be any device that uses a transceiver mechanism for communicating with other devices or communication networks, such as Ethernet, a radio access network (RAN), or wireless local area networks (WLAN).
  • The memory can be a read-only memory (ROM) or other type of static storage device that can store static information and instructions, a random access memory (RAM) or other type of dynamic storage device that can store information and instructions, an electrically erasable programmable read-only memory (EEPROM), a compact disc read-only memory (CD-ROM) or other optical disc storage (including compact discs, laser discs, optical discs, digital versatile discs, Blu-ray discs, etc.), a magnetic disk storage medium or other magnetic storage device, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and can be accessed by a computer, but is not limited thereto.
  • the memory may exist independently and be connected to the processor through a communication line. Memory can also be integrated with the processor.
  • the object detection system and the object detection model training device in the embodiments of the present disclosure may also be a part of the system coupled to the server, for example, a chip system in the server.
  • Figure 5 shows a target detection method provided according to some embodiments; the method includes the following steps 501 to 503:
  • Step 501 the target detection system acquires an image to be detected.
  • the image to be detected is the image data that may include the target to be detected.
  • step 501 may specifically be performed by an image acquisition device included in the target detection system as described above, so that the target detection system acquires an image to be detected.
  • Step 502: the target detection system uses the target detection model to process the image to be detected, and obtains the target detection result corresponding to the target to be detected in the image to be detected.
  • the target detection model includes a feature extraction network and a target prediction network.
  • the following introduces the feature extraction network and target prediction network respectively:
  • Feature extraction network which is used to perform feature extraction on the image to be detected to obtain image features related to various target objects.
  • the feature extraction network here is constructed based on the backbone network 221 and the transition network 222 in the detection model 20 described above.
  • the feature extraction network includes a backbone network and a transition network.
  • the backbone network is used to determine the image features of the general category according to the image to be detected
  • the transition network is used to determine the image features related to various target objects according to the image features of the general category.
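The backbone-then-transition data flow described above can be sketched as follows. This is only an illustrative stand-in under stated assumptions: the real backbone (e.g. CSPDarknet) and transition network (e.g. PANet) are deep networks, and the two toy transforms here are invented solely to make the composition of the feature extraction network visible.

```python
def backbone(image):
    """Stand-in backbone: extract general-category features from the raw image.

    The real backbone is a deep convolutional network; here it is reduced to a
    simple per-pixel scaling so the pipeline stays runnable.
    """
    return {"general": [pixel * 0.1 for pixel in image]}


def neck(features):
    """Stand-in transition (Neck) network: refine general features into
    image features strongly correlated with the target object categories."""
    return {"category_relevant": [f + 1.0 for f in features["general"]]}


def feature_extraction_network(image):
    """The feature extraction network is the backbone followed by the Neck."""
    return neck(backbone(image))
```

A usage example: `feature_extraction_network([10.0, 20.0])` passes the raw "image" through both stages in order, which is the only structural point the sketch is meant to convey.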
  • the target detection network is used to process image features to obtain target detection results.
  • the target detection network here is constructed based on the detection network 223 in the detection model 20 described above.
  • the object detection network includes a category channel layer, multiple object channel layers and multiple coordinate channel layers.
  • The structure of the category channel layer, the multiple target channel layers and the multiple coordinate channel layers is a convolution structure, and the convolution kernel size of the convolution structure is 1×1.
  • the target channel layer is used to output the detection prediction value representing whether there is a target object
  • each target channel layer is used to detect at least one of a variety of target objects
  • multiple target channel layers are used to detect different types of target objects
  • the category channel layer is used to output category prediction values corresponding to various target objects
  • the coordinate channel layer is used to output coordinate prediction values corresponding to the detected target objects.
  • each coordinate channel layer and its corresponding target channel layer detect the same target object category, and one or more target objects can be detected.
  • the coordinate channel layer is used to obtain the coordinate prediction value of the target object at the same time.
  • Because each coordinate channel layer is consistent with the target object category detected by its corresponding target channel layer, the predictions for different target objects do not affect each other, and the detection of multiple types of targets is realized.
  • For a target object that exists, the coordinate prediction value of the corresponding coordinate channel layer is obtained; for a target object that does not exist, the coordinate prediction value of the corresponding coordinate channel layer is discarded. In this way, by setting a one-to-one correspondence between coordinate channel layers and target channel layers, the positions of different types of targets can be efficiently detected at the same time, which greatly saves computing power and speeds up detection.
  • Multiple target channel layers are also set in the detection model, and these target channel layers detect target objects by category. Therefore, even after a target object of a certain category is detected by the target channel layer of the corresponding category, if that category is not labeled in the current data set, the output of that target channel layer will not be substituted into the subsequent training process when the detection model is trained on the current data set. In this way, after the target channel layer detects objects that are unlabeled in the data set, erroneous intervention by the labeled data of other categories of objects in the data set is avoided, and the training accuracy of the detection model is improved.
  • Judging whether a target channel layer detects the existence of a target object can be achieved in the following manner: the detection prediction value output by each target channel layer is fused with the category prediction value, output by the category channel layer, of the category detected by that target channel layer to obtain a detection result, and the detection result is compared with a threshold to determine whether a target object of the category detected by that target channel layer exists. That is to say, by combining the detection prediction value output by the target channel layer with the category prediction value corresponding to the category detected by that target channel layer, the information of the two dimensions of position and category can be combined to predict whether a target object is detected, thereby making the prediction of the existence of the target object more accurate.
  • In addition, during the training of the target detection model, the present disclosure determines, in each iterative training process of the detection model, the optimal detection model with the highest historical accuracy, and uses the optimal detection model to add pseudo-labels to the training set. The annotation data obtained after pseudo-labeling the training set is combined with the annotation data of the real training set to perform fusion training on the detection model. Applying pseudo-label annotation to the training process improves the detection recall rate of the final target detection model in cross-scene settings and achieves a better training effect for the target detection model.
  • The above two possible training methods of the target detection model can be applied separately or used in combination to train the target detection model in this embodiment. That is to say, the target detection model in the target detection method provided in this embodiment can be trained separately through either of the above two possible target detection model training methods, or obtained through joint training that combines the two.
  • the target detection results are introduced below:
  • the target detection result is calculated based on the detection prediction value, category prediction value and coordinate prediction value.
  • the target detection result includes a detection result and a coordinate result.
  • The detection result is obtained by fusion calculation based on the detection prediction value of the target channel layer and the corresponding category prediction value. Specifically, this can be implemented in the following manner: the detection prediction value output by each target channel layer is multiplied by the corresponding category prediction value to obtain the corresponding detection result.
  • If the detection result is higher than the threshold, it is considered that the target channel layer detects the existence of a target object of the detected category.
  • In this case, the coordinate prediction value output by the corresponding coordinate channel layer is obtained as the coordinate result;
  • If the detection result is lower than the threshold, it is considered that the target channel layer does not detect a target of the corresponding category, and at this time the coordinate prediction value output by the coordinate channel layer corresponding to that target channel layer is directly discarded.
  • When the coordinate channel layer is used to determine the coordinate prediction value of the target object, the following rules are followed: when the detection result calculated for the target channel layer corresponding to the coordinate channel layer is greater than or equal to the threshold, the coordinate prediction value of the coordinate channel layer is obtained; when the detection result calculated for the target channel layer corresponding to the coordinate channel layer is less than the threshold, the coordinate prediction value of the coordinate channel layer is not obtained.
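The fusion and gating rules above can be sketched as follows. This is a hedged illustration, not the patent's exact computation: the use of a sigmoid to turn raw channel outputs into probabilities is an assumption borrowed from common detector designs, and the threshold of 0.5 is arbitrary.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def fuse_and_gate(obj_logit, cls_logit, box_pred, threshold=0.5):
    """Fuse the target-channel and category-channel predictions by
    multiplication, then gate the paired coordinate-channel output:
    keep the coordinates only when the fused detection result clears
    the threshold, otherwise discard them."""
    score = sigmoid(obj_logit) * sigmoid(cls_logit)
    if score >= threshold:
        return score, box_pred   # target present: keep the coordinate result
    return score, None           # target absent: coordinate prediction discarded
```

Discarding the low-score coordinates directly, rather than refining them, is what lets the paired channel layers skip wasted work for categories that are not present.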
  • Step 502 can be specifically performed by the detection processing device included in the target detection system as described above, so that the target detection system uses the target detection model to process the image to be detected to obtain the target detection result corresponding to the target to be detected.
  • Step 503 the target detection system outputs a target detection result.
  • the target detection system displays the target detection results to the staff in a visualized manner.
  • the target detection system displays the target detection results on the display screen in the form of the target object's area detection frame, area coordinates and category, so that the staff can know the target detection system's detection results of the detected target.
  • step 503 may specifically be performed by an interaction device included in the target detection system as described above, so that the target detection system outputs a target detection result.
  • In summary, the target detection system provided by the present disclosure can detect the target object in the image to be detected; and since multiple target channel layers are set in the target detection model in the target detection system, and these target channel layers detect target objects by category, the detection accuracy of the target detection model is relatively high. Therefore, the target detection system provided in the present disclosure can achieve a better detection effect for the target object.
  • Figure 6 shows a method for training a target detection model provided according to some embodiments; the method includes the following steps 601 to 602:
  • Step 601 the target detection model training device acquires a training set.
  • the training set includes labeled data of various target objects.
  • the training set includes multiple training data sets, and each training data set includes image data and data labeled with one or more types of target objects.
  • the training set includes three data sets, and the categories of the labeled target objects corresponding to the three data sets are people, motor vehicles and non-motor vehicles respectively.
  • Alternatively, the training set includes two data sets: the labeled category of the target object in one data set is people, and the labeled categories of the target object in the other data set are motor vehicles and non-motor vehicles.
  • the number of data sets and the number of categories of target objects may not be equal.
  • multiple datasets included in the training set include datasets labeled with the same target object category.
  • the purpose of obtaining multiple data sets is to expand the scope of sample data collection, so as to improve the accuracy of the final trained detection model.
  • For example, there are multiple data sets whose labeled target object category is people, and the differences between these data sets are that some are collected during the day and some at night; or that some are collected at intersections with dense crowds and some at intersections with sparse crowds.
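The structure of such a training set can be sketched as plain data. All dataset names and category labels below are hypothetical; the only points taken from the text are that each data set labels only some categories, several data sets may label the same category, and the full detection category set is the union over all data sets.

```python
# Hypothetical training set: three data sets, each labeling a subset of categories.
training_set = [
    {"name": "dataset_person_day",   "labeled_categories": {"person"}},
    {"name": "dataset_person_night", "labeled_categories": {"person"}},
    {"name": "dataset_vehicles",     "labeled_categories": {"motor_vehicle",
                                                            "non_motor_vehicle"}},
]

# The number of data sets need not equal the number of categories, and two
# data sets here label the same category (day vs. night collections of people).
all_categories = set().union(*(d["labeled_categories"] for d in training_set))
```

Collecting the union up front gives the category list that the target channel layers of the detection model must cover.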
  • Step 602 the target detection model training device performs iterative training on the detection model according to the training set to obtain the target detection model.
  • the target detection model is a detection model that can be used in practical applications after iterative training and meets preset requirements.
  • meeting the preset requirement may mean that the loss function of the detection result of the detection model reaches convergence.
  • the accuracy rate of the detection result of the detection model reaches a preset required percentage, and the accuracy rate here may use a mAP value.
  • the detection model is constructed based on the Yolov5 architecture detection model shown in FIG. 2 .
  • the target detection model obtained after iterative training is also constructed based on the Yolov5 architecture detection model shown in Figure 2.
  • the target detection model after iterative training based on the training set here is the target detection model in the aforementioned step 502 .
  • the category of the target object detected by the target channel layer may be the same as or different from the category of the target objects labeled by all the datasets included in the training set.
  • the categories of target objects marked in all data sets include people, motor vehicles, and non-motor vehicles
  • the categories of target objects detected by the target channel layers may include people, motor vehicles, and non-motor vehicles.
  • The categories of target objects detected by the target channel layers may also include subcategories of the categories of target objects labeled in the data sets; for example, the categories detected by the target channel layers include people, buses, cars, bicycles, and tricycles. Among them, buses and cars are subcategories of the category of motor vehicles, and bicycles and tricycles are subcategories of the category of non-motor vehicles.
  • The target detection model training device iteratively trains the detection model according to the training set, which includes: after the detection model determines the detection results of various target objects, calculating the first loss value according to the detection results of the various target objects and the first loss function, and adjusting the parameters of the detection model accordingly. It should be noted that, for the detailed process of the iterative training of the detection model by the target detection model training device according to the training set, refer to the following steps 701 to 704, which will not be repeated here.
  • For example, the target channel layer used to detect motor vehicles among the multiple target channel layers detects the existence of a motor vehicle in the image data of the current data set; however, since the objects labeled in the current data set are people, the target detection model training device will not substitute the detection result output by the target channel layer whose detection category is motor vehicles into the subsequent training process.
  • Similarly, the detection results output by target channel layers of categories other than people will not be substituted into the subsequent training process; only the detection results output by the target channel layer whose category is people will be substituted into the subsequent training process.
  • In other words, the labeled data of the data set whose labeled category is people will only affect the detection results output by the target channel layer whose detection category is people, and the same holds for other labeled categories. In this way, during the training process, a certain category of labeled data is prevented from having a negative impact on the detection results of target objects of other categories, thereby improving the accuracy of target detection model training.
  • To sum up, the present disclosure sets multiple target channel layers in the detection model, and these target channel layers detect target objects by category. Therefore, even after a target object of a certain category is detected by the target channel layer of the corresponding category, if that category is not labeled in the current data set, the output of that target channel layer will not be substituted into the subsequent training process when the detection model is trained on the current data set. In this way, after the target channel layer detects objects that are unlabeled in the data set, erroneous intervention by the labeled data of other categories of objects in the data set is avoided, and the training accuracy of the detection model is improved.
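The per-category masking described above can be sketched as a filter over per-channel losses. This is an illustrative simplification under stated assumptions: the real training computes full loss terms per channel, while here each channel's contribution is reduced to a single number, and the category names are hypothetical.

```python
def masked_loss(channel_losses, channel_categories, labeled_categories):
    """Sum per-channel loss contributions, but only over target channel
    layers whose detected category is actually labeled in the current
    data set; outputs of the other target channel layers are not
    substituted into the subsequent training process."""
    return sum(loss
               for loss, cat in zip(channel_losses, channel_categories)
               if cat in labeled_categories)


channels = ["person", "motor_vehicle", "non_motor_vehicle"]
losses = [1.0, 2.0, 4.0]

# Current data set labels only people: the motor-vehicle and
# non-motor-vehicle channel outputs contribute nothing to training.
person_only = masked_loss(losses, channels, {"person"})
```

With all three categories labeled, the same call would return the full sum, which is exactly the behavior of a conventional single detection model.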
  • Next, the process by which the detector training device trains the detection model according to the training set to obtain the target detection model will be specifically introduced.
  • step 602 specifically includes the following steps 701-704:
  • Step 701 the target detection model training device inputs the training set into the detection model, and determines the detection results of various target objects.
  • The target detection model training device can use the input module 21 in the detection model 20 to input the training set into the detection model.
  • The detection results of various target objects are likewise obtained from the detection prediction values, category prediction values, and coordinate prediction values; that is, the detection prediction value, category prediction value and coordinate prediction value are respectively determined by the target channel layer, category channel layer and coordinate channel layer of the detection model.
  • Step 702 the target detection model training device calculates a first loss value according to the detection results of various target objects and the first loss function.
  • The first loss function includes a target loss function, a coordinate loss function, and a category loss function.
  • the first loss function is obtained by adding an objective loss function, a coordinate loss function, and a class loss function.
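The additive composition of the first loss can be sketched as follows. The concrete forms below are assumptions for illustration only: binary cross-entropy stands in for the target and category losses and squared error stands in for the coordinate loss; the patent's actual formulas are given in steps 901 to 904 and are not reproduced here.

```python
import math


def bce(p, y):
    """Binary cross-entropy for one predicted probability p and label y."""
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))


def first_loss(obj_preds, box_preds, cls_preds):
    """First loss = target loss + coordinate loss + category loss.

    obj_preds / cls_preds: (probability, label) pairs.
    box_preds: (predicted coordinate, true coordinate) pairs.
    """
    target_loss = sum(bce(p, y) for p, y in obj_preds)
    coord_loss = sum((p - t) ** 2 for p, t in box_preds)     # stand-in box loss
    class_loss = sum(bce(p, y) for p, y in cls_preds)
    return target_loss + coord_loss + class_loss
```

Because the three terms are simply added, each channel layer's output can be supervised independently, which is what makes the per-category masking of the training process possible.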
  • For the specific process by which the target detection model training device calculates the first loss value based on the detection results of various target objects and the first loss function, as well as the formulas of the target loss function, coordinate loss function, and category loss function, refer to the following steps 901 to 904, which will not be repeated here.
  • Step 703 the object detection model training device adjusts the parameters of the detection model according to the first loss value.
• the target detection model training device judges whether the first loss function of this detection result converges.
• If the first loss function converges, the target detection model training device determines that training of the detection model is completed, and determines the current detection model as the target detection model.
• If the first loss function does not converge, the target detection model training device updates the parameters in the detection model and performs the next iteration. If the first loss function of the detection model converges in the next iteration, the target detection model training device determines the detection model at that time as the target detection model; if the first loss function still does not converge in the next iteration, the target detection model training device continues to update the parameters in the detection model until the first loss function of the detection model converges.
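The converge-or-update iteration described above can be sketched minimally in Python. Every name here (`model`, `compute_first_loss`, `update_parameters`) and the loss-difference convergence test are illustrative assumptions, not part of the disclosure:

```python
def train_until_convergence(model, training_set, compute_first_loss,
                            update_parameters, tol=1e-4, max_iters=1000):
    """Iterate: run the detection model on the training set, compute the
    first loss value, and update the parameters until the first loss stops
    changing (treated here as convergence of the first loss function)."""
    prev_loss = float("inf")
    for _ in range(max_iters):
        detections = model(training_set)
        loss = compute_first_loss(detections)
        if abs(prev_loss - loss) < tol:
            break  # converged: the current model is the target detection model
        update_parameters(model, loss)
        prev_loss = loss
    return model
```

The detection model returned when the loop exits plays the role of the target detection model of step 704.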
  • Step 704 the target detection model training device determines the detection model when the first loss function converges as the target detection model.
  • the target detection model is a detection model that can be used in practical applications.
• In this way, the present disclosure trains the detection model multiple times according to the first loss function, so that each output detection result is closer to the labeling of the target objects in the training set.
• When the first loss function converges, the detection model at that time is determined as the target detection model. In this way, the detection model can well complete the detection of target objects in subsequent practical applications.
• The process in step 701 of inputting the training set into the detection model by the target detection model training device to determine the detection results of various target objects will be described in detail below.
  • step 701 specifically includes the following steps 801 to 803:
  • Step 801 the target detection model training device determines image features of general categories according to the training set.
  • the target detection model training device determines image features of general categories through a backbone network.
  • the detection model may include a backbone network, and the backbone network may be a Backbone network as shown in FIG. 2 .
  • the target detection model training device can extract image features of common categories in the image data included in the training set.
• For the method of extracting image features of general categories from the image data included in the training set through the Backbone network, reference may be made to the related art; the present disclosure will not repeat it here.
  • the transition network in the subsequent step 802 can extract image features corresponding to various target objects accordingly.
  • Step 802 the target detection model training device determines image features related to various target objects according to image features of general categories.
  • the target detection model training device determines image features related to various target objects through a transition network.
  • the detection model may include a transition network, and the transition network may be a Neck network as shown in FIG. 2 .
  • the target detection model training device can extract image features of general categories to determine image features related to various target objects.
• For the method of extracting image features of a general category through the Neck network to determine image features related to various target objects, reference may be made to the related art; the present disclosure will not repeat it here.
  • Step 803 the target detection model training device determines the detection results of various target objects according to the image features related to the various target objects.
  • the target detection model training device determines the detection results of various target objects through a detection network.
  • the detection model may include a detection network, and the detection network may be a Detection network as shown in FIG. 2 .
  • the target detection model training device can determine the detection results of various target objects based on image features related to various target objects.
  • the detection network is provided with multiple object channel layers, multiple coordinate channel layers and multiple category channel layers.
  • a target channel layer is used to detect whether at least one target object among various target objects exists in the current detection area.
• the results output by the target channel layer are "exists" and "does not exist"; for example, they may be output in the form of a detection prediction value of yes or no.
• the target detection model training device presets a judgment threshold, and the target channel layer determines the probability value that a target object exists in the current detection area.
• If the probability determined by the target channel layer is greater than the judgment threshold, the target detection model training device determines that a target object exists in the current detection area, and the output result of the target channel layer is "exists"; similarly, if the probability determined by the target channel layer is less than the judgment threshold, the target detection model training device determines that no target object exists in the current detection area, and the output result of the target channel layer is "does not exist".
  • the target channel layer determines that the probability of the existence of this type of target object in the current detection area is 0.98. Assuming that the judgment threshold preset by the target detection model training device is 0.9, since 0.98 is greater than 0.9, the target detection model training device determines that there is a target object in the current detection area.
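The judgment in this example reduces to a single comparison. Note that the handling of the boundary case (probability exactly equal to the threshold) is an assumption, since the text only specifies strictly greater and strictly less:

```python
def judge_existence(probability, threshold=0.9):
    """Compare the target channel layer's probability for the current
    detection area against the preset judgment threshold.
    Treating probability == threshold as "exists" is an assumption."""
    return "exists" if probability >= threshold else "does not exist"
```

With the values from the example, `judge_existence(0.98, 0.9)` yields "exists" because 0.98 is greater than 0.9.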
  • the target channel layer may be the Object channel layer as shown in FIG. 3 .
  • the coordinate channel layer is used to determine the coordinates of the area where the target object exists and output, for example, output in the form of coordinate prediction values (X, Y).
  • the coordinate channel layer may be a Box channel layer as shown in FIG. 3 .
• For each detection of the target channel layer, the coordinate channel layer outputs the coordinates of the detection area. Therefore, in the present disclosure, since the number of target channel layers is changed from the original one to multiple (assuming that there are N target channel layers), the number of coordinate channel layers correspondingly becomes N times the original.
• the category channel layer is used to determine the category of the area where the target object exists and output it, for example, in the form of a category prediction value such as person or car.
  • the class channel layer may be the Class channel layer as shown in FIG. 3 .
  • the number of category channel layers is the same as the number of categories of target objects labeled in the training set.
  • the output results of the target channel layer, the coordinate channel layer and the category channel layer may be in the form of a mathematical matrix.
  • the current image region detected by the target channel layer may be a pixel in an image feature related to various target objects.
• the target detection model training device combines the output results of the multiple target channel layers, multiple coordinate channel layers and multiple category channel layers to determine the detection results of various target objects.
• In this way, the present disclosure can use the backbone network, transition network, and detection network set in the detection model, as well as the multiple target channel layers, coordinate channel layers, and category channel layers set in the detection network, to determine the detection results of various target objects according to the image data included in the training set, so as to facilitate the subsequent target detection model training process.
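The way the detection network merges the three kinds of channel-layer outputs might be sketched as follows; the data layout (parallel lists indexed by target channel layer) and the choice of the highest-scoring category are illustrative assumptions, not from the disclosure:

```python
def combine_channel_outputs(obj_probs, boxes, class_probs, threshold=0.9):
    """Merge the outputs of the target, coordinate and category channel
    layers into detection results. Index b pairs the bth target channel
    layer with its corresponding coordinate channel layer and the subset
    of category scores it is responsible for."""
    detections = []
    for b, (p_obj, box, cls_scores) in enumerate(zip(obj_probs, boxes, class_probs)):
        if p_obj >= threshold:  # the target channel layer reports "exists"
            # take the most probable category from the category channel layer
            best_c = max(range(len(cls_scores)), key=cls_scores.__getitem__)
            detections.append({"layer": b, "box": box,
                               "category": best_c, "score": p_obj})
    return detections
```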
• The process in step 702 of determining the first loss value by the target detection model training device according to the training set and the detection results of various target objects will be described in detail below.
  • step 702 specifically includes the following steps 901-904:
  • Step 901 the target detection model training device determines the target loss value according to the output results of multiple target channel layers, the labeled data of multiple target objects and the target loss function.
  • the target loss value includes the target loss value of positive samples and the target loss value of negative samples.
• the target (Object) loss function satisfies the following formula 1:
• L obj+ represents the target loss value of the positive samples in the training set;
• NP represents the total number of target channel layers;
• b represents the number of the target channel layer;
• Target(b) represents the Anchor set of positive samples corresponding to the bth target channel layer;
• BCELoss represents the BCE loss function;
• s represents the number of the positive sample;
• P obj (s, b) represents the target prediction value corresponding to the bth target channel layer and the Anchor of the sth positive sample;
• GT obj (s) represents the target true value of the sth positive sample;
• L obj- represents the target loss value of the negative samples in the training set;
• B cls (b) represents the second category subset corresponding to the bth target channel layer;
• 1(...) is the value function: when the input is True, the value is 1, otherwise the value is 0;
• L data represents the first category subset marked by the current training data;
• H represents the number of rows of the target channel layer data matrix.
• The above-mentioned positive sample means that, when the target channel layer detects target objects, for a pixel point, if the pixel point has corresponding label data, the pixel point is determined to be a positive sample; conversely, if the pixel point has no corresponding label data, it is determined to be a negative sample. It should be understood that if the pixel is a positive sample, it is substituted into the positive sample formula to calculate its L obj+ ; if the pixel is a negative sample, it is substituted into the negative sample formula to calculate its L obj- .
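Since formula 1 itself is not reproduced in this text, the following Python sketch only mirrors the variable definitions above: BCE loss pushed toward 1 for positive-sample anchors (contributing to L obj+) and toward 0 for all remaining anchors (contributing to L obj-). The dictionary layout mapping anchors to prediction values is an illustrative assumption:

```python
import math

def bce_loss(p, gt, eps=1e-12):
    """Binary cross-entropy for a single prediction."""
    return -(gt * math.log(p + eps) + (1 - gt) * math.log(1 - p + eps))

def objectness_loss(pred_layers, positive_anchors):
    """Sketch of the target (Object) loss: over the NP target channel
    layers, anchors in Target(b) are positive samples (GT_obj = 1) and
    accumulate into L_obj+; all other anchors are negative samples
    (GT_obj = 0) and accumulate into L_obj-.
    pred_layers[b] maps each anchor to its target prediction value P_obj."""
    l_pos, l_neg = 0.0, 0.0
    for b, preds in enumerate(pred_layers):
        for anchor, p in preds.items():
            if anchor in positive_anchors[b]:
                l_pos += bce_loss(p, 1.0)
            else:
                l_neg += bce_loss(p, 0.0)
    return l_pos, l_neg
```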
  • Step 902 the target detection model training device determines the coordinate loss value according to the output results of multiple coordinate channel layers, the labeled data of various target objects and the coordinate loss function.
  • the Box loss function satisfies the following formula 2:
  • L box represents the coordinate loss value
  • NP represents the total number of target channel layers
  • b represents the number of the target channel layer
  • Target(b) represents the Anchor set of positive samples corresponding to the bth target channel layer
• IOU represents the intersection over union (IOU) calculation function;
  • s represents the number of the positive sample
• P box (s, b) represents the Box coordinate prediction value of the sth positive sample output by the bth target channel layer;
• GT box (s) represents the Box coordinate true value of the sth positive sample. It should be understood that the aforementioned true value is determined according to the labeled data of various target objects included in the training set.
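Formula 2 is likewise not reproduced in this text. The `iou` function below follows the standard intersection-over-union definition named above, while the `1 - IOU` penalty in `box_loss` is an assumption about how the overlap enters the coordinate loss:

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def box_loss(pred_boxes, gt_boxes):
    """Coordinate loss sketch: penalise low overlap between each positive
    sample's predicted Box (P_box) and its true Box (GT_box)."""
    return sum(1.0 - iou(p, g) for p, g in zip(pred_boxes, gt_boxes))
```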
  • Step 903 the target detection model training device determines the category loss value according to the output results of multiple category channel layers, the labeled data of various target objects and the category loss function.
• the category (Class) loss function satisfies the following formula 3:
  • L cls represents the category loss value
  • Class represents the total number of categories of the target object
  • b represents the numbering of the target channel layer
• B cls (b) represents the second category set corresponding to the bth target channel layer;
• Len(B cls (b)) represents the total number of second categories corresponding to the bth target channel layer;
  • H represents the number of rows of the target channel layer data matrix
  • W represents the number of columns of the target channel layer data matrix
  • Anchor refers to all Anchor sets
  • Mask (p, a) indicates whether there is a label box at the current position corresponding to the training set data
  • BCELoss refers to the BCE loss function
• P cls (p, a, c) refers to the category prediction value;
• GT cls (p, a, c) refers to the category true value.
• The aforementioned true value is determined according to the multiple labeled data included in the training set. It should be noted that 1[...] is a value function (the value is 1 when the input is True, otherwise the value is 0).
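A sketch of the masked category loss consistent with the definitions above: BCE accumulated over positions p, anchors a and categories c, counted only where Mask(p, a) indicates a label box at the current position. The nested-list data layout is an illustrative assumption:

```python
import math

def bce(p, gt, eps=1e-12):
    """Binary cross-entropy for a single prediction."""
    return -(gt * math.log(p + eps) + (1 - gt) * math.log(1 - p + eps))

def class_loss(p_cls, gt_cls, mask):
    """Category (Class) loss sketch: p_cls[p][a][c] is the category
    prediction value, gt_cls[p][a][c] the category true value, and
    mask[p][a] plays the role of Mask(p, a)."""
    total = 0.0
    for p in range(len(mask)):
        for a in range(len(mask[p])):
            if mask[p][a]:  # a label box exists at this position/anchor
                for c in range(len(p_cls[p][a])):
                    total += bce(p_cls[p][a][c], gt_cls[p][a][c])
    return total
```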
• Step 904: the target detection model training device obtains the first loss value from the target loss value, the coordinate loss value, and the category loss value.
  • the target detection model training device adds the target loss function, the coordinate loss function, and the category loss function, and uses the added formula result as the first loss function.
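The addition described above is simply (argument names are illustrative):

```python
def first_loss_value(l_obj_pos, l_obj_neg, l_box, l_cls):
    """The first loss value: the target loss (its positive and negative
    sample terms), the coordinate loss and the category loss, added up."""
    return (l_obj_pos + l_obj_neg) + l_box + l_cls
```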
• In this way, the present disclosure uses the output results of the three channel layers in the detection model and the training set to determine the first loss function of the detection model for the detection results of various target objects.
• The first loss function can reflect the gap between the detection results of the detection model and the correct results in the labeled data, so that the parameters in the detection model can be adjusted in the subsequent process, making the detection results of the detection model gradually approach the correct results in the labeled data.
  • the process of verifying the accuracy of the target detection model by the target detection model training device is specifically introduced below.
• The method also includes the following steps 1001 to 1002:
  • Step 1001 the target detection model training device acquires a verification set.
  • the verification set includes labeled data of various target objects.
  • the verification set includes a plurality of verification data sets, and each verification data set includes image data and data labeled with one or more categories of target objects. It can be understood that the category of the target object marked in the verification set is the same as the category of the target object marked in the training set in step 401 .
• For example, if the categories of the target objects labeled in the training set are people, motor vehicles and non-motor vehicles, then in the verification set used to verify the target detection model, the categories of the labeled target objects are also people, motor vehicles and non-motor vehicles.
  • Step 1002 the target detection model training device inputs multiple verification data sets into the target detection model respectively, and obtains the accuracy rates under the multiple verification data sets.
• the target detection model training device determines the verification detection result of the target detection model according to the verification set. It can be understood that the manner in which the target detection model training device determines the verification detection result according to the verification set is the same as the manner in which it determines the detection results of various target objects according to the training set. For details, reference may be made to the descriptions in the aforementioned steps 801-803.
  • the target detection model training device determines the accuracy of the detection model according to the verification detection result of the detector.
  • the accuracy rate can be expressed in the form of mAP.
  • the present disclosure can further verify the accuracy of the detector according to the verification set after the detection model training is completed, so that the detection model can have a better detection effect when it is put into practical application.
  • the present disclosure also provides a target detection model training method, including the following steps 1101-1103:
  • Step 1101 the target detection model training device acquires a training set.
  • the training set includes multiple training data sets, each training data set includes labeled data of one or more types of target objects, and at least two of the multiple data sets have different types of labeled target objects.
• The training set here is the same as the training set described in step 601 above, and details are not repeated in this embodiment.
  • Step 1102 the target detection model training device determines the optimal detection model.
  • the optimal detection model is the detection model with the highest accuracy rate among the historical training detection models.
  • the historically trained detection model includes a detection model whose parameters have been updated after each iteration of training.
  • the accuracy here is evaluated by the mAP value, that is, among the detection models whose parameters have been updated after each iteration of training, the detection model with the highest mAP is the optimal detection model.
  • the structure of the detection model in this embodiment may adopt the same structure as in the foregoing embodiments, that is, the same as that described in step 502 above.
  • the structure of the detection model in the embodiment may also adopt other convolutional structure models.
  • the solution of this embodiment will be introduced below by taking the structure of the detection model in this embodiment as the same as that in the foregoing embodiments as an example.
  • Step 1103 the target detection model training device performs iterative training on the detection model according to the training set, and performs pseudo-label labeling on the training set according to the optimal detection model, and continues to train the detection model to obtain the target detection model.
• For the process of iteratively training the detection model by the target detection model training device according to the training set, refer to the above step 701 to step 704, which will not be repeated here.
• That the target detection model training device performs pseudo-label labeling on the training set according to the optimal detection model and continues to train the detection model to obtain the target detection model may include: performing pseudo-label labeling on the missing target objects in each training data set according to the optimal detection model to obtain positive sample label data and negative sample label data, where the missing target objects are target objects that are not labeled in the training data set; then, the target detection model training device determines the positive sample loss value according to the positive sample label data and the positive sample loss function, and determines the negative sample loss value according to the negative sample label data and the negative sample loss function; finally, the target detection model training device adjusts the parameters of the detection model according to the total loss value.
  • the target detection model training device performs pseudo-labeling on the training set according to the optimal detection model, and the specific process of continuing to train the detection model to obtain the target detection model can refer to the following steps 1201 to 1205, which will not be repeated here.
  • the target detection model is a detection model that can be used in practical applications after iterative training and meets preset requirements.
  • meeting the preset requirement may mean that the total loss function of the detection results of the detection model reaches convergence.
• Alternatively, meeting the preset requirement may mean that the accuracy rate of the detection results of the detection model reaches a preset percentage; the accuracy rate here may use the mAP value.
  • the total loss value may be determined from the first loss value, the positive sample loss value and the negative sample loss value.
  • the total loss function includes a first loss function, a positive sample loss function, and a negative sample loss function.
• In this way, the embodiment of the present disclosure determines the optimal detection model with the highest historical accuracy in each iterative training process of the detection model, and uses the optimal detection model to perform pseudo-label labeling on the training set.
• The labeled data obtained from the pseudo-label labeling of the training set is combined with the real labeled data of the training set to perform fusion training on the detection model, which improves the detection recall rate of the final target detection model in cross-scenario applications, so that a higher detection accuracy can be achieved.
• The process by which the target detection model training device performs pseudo-label labeling according to the optimal detection model during iterative training of the detection model is specifically introduced below:
  • step 1103 specifically includes the following steps 1201-1205:
  • Step 1201 the target detection model training device performs pseudo-label labeling on missing target objects in each training data set in the training set according to the optimal detection model, and obtains positive sample label data and negative sample label data.
  • the target detection model training device inputs the training set into the optimal detection model, and determines the detection score of the optimal detection model for each target object.
  • the detection score may be implemented as a confidence score of the optimal detection model for the target object.
• the method of judging the positive sample label data is: for each target object, if the detection score of the optimal detection model for the target object is greater than or equal to the positive sample score threshold, the labeled data corresponding to the target object is determined to be positive sample label data.
• the method of judging the negative sample label data is: for each target object, if the detection score of the optimal detection model for the target object is less than or equal to the negative sample score threshold, the labeled data corresponding to the target object is determined to be negative sample label data.
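The two judgments can be combined into one helper; the dictionary layout of a detection record is an illustrative assumption:

```python
def assign_pseudo_labels(detections, th_pos, th_neg):
    """Split the optimal detection model's detections of missing (unlabeled)
    target objects by detection score: scores >= TH_pos become positive
    sample label data, scores <= TH_neg become negative sample label data;
    scores in between are left unlabeled."""
    positives = [d for d in detections if d["score"] >= th_pos]
    negatives = [d for d in detections if d["score"] <= th_neg]
    return positives, negatives
```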
  • Step 1202 the target detection model training device determines the positive sample loss value according to the positive sample label data and the positive sample loss function.
  • the positive sample loss function satisfies the following formula 4:
  • Loss pos represents the loss value of the positive sample
  • score(s) represents the detection score of each missing target object
  • TH pos represents the score threshold of the positive sample
  • BCELoss represents the BCE loss function
• P pos (s) represents the predicted value corresponding to the Anchor of the sth positive sample label data.
  • Step 1203 the object detection model training device determines the negative sample loss value according to the negative sample label data and the negative sample loss function.
  • the negative sample loss function satisfies the following formula 5:
• Loss neg represents the negative sample loss value;
  • score(s) represents the detection score of each missing target object
  • TH neg represents the negative sample score threshold
  • BCELoss represents the BCE loss function
• P neg (s) represents the predicted value corresponding to the Anchor of the sth negative sample label data.
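Since formulas 4 and 5 are not reproduced in this text, the sketch below only follows the variable definitions above: anchors whose detection score clears TH pos are pushed toward 1 (Loss pos), and anchors at or below TH neg are pushed toward 0 (Loss neg). The parallel-list layout is an illustrative assumption:

```python
import math

def bce(p, gt, eps=1e-12):
    """Binary cross-entropy for a single prediction."""
    return -(gt * math.log(p + eps) + (1 - gt) * math.log(1 - p + eps))

def pseudo_label_losses(scores, preds, th_pos, th_neg):
    """scores[s] is the detection score of the sth missing target object;
    preds[s] is the predicted value P(s) for its anchor."""
    loss_pos = sum(bce(p, 1.0) for s, p in zip(scores, preds) if s >= th_pos)
    loss_neg = sum(bce(p, 0.0) for s, p in zip(scores, preds) if s <= th_neg)
    return loss_pos, loss_neg
```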
  • Step 1204 the target detection model training device adjusts the parameters of the detection model according to the total loss value.
  • the total loss value is determined according to the first loss value, the positive sample loss value and the negative sample loss value.
• Specifically, the total loss value in this embodiment is determined by a weighted sum of the first loss value, the positive sample loss value and the negative sample loss value.
  • the calculation method of the first loss value is referred to above.
• the target detection model training device predetermines the first weight, the second weight and the third weight, and then adds the product of the first weight and the first loss value, the product of the second weight and the positive sample loss value, and the product of the third weight and the negative sample loss value to calculate the total loss value.
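The weighted sum just described is then (argument names are illustrative):

```python
def total_loss(first_loss, pos_loss, neg_loss, w1, w2, w3):
    """Weighted sum of the first loss value, the positive sample loss value
    and the negative sample loss value, with predetermined weights."""
    return w1 * first_loss + w2 * pos_loss + w3 * neg_loss
```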
  • Step 1205 the target detection model training device determines the detection model when the total loss function converges as the target detection model.
  • the total loss function includes a first loss function, a positive sample loss function, and a negative sample loss function.
• The total loss function is obtained by adding the product of the first loss function and the first weight value, the product of the positive sample loss function and the second weight value, and the product of the negative sample loss function and the third weight value.
• In this way, the embodiment of the present disclosure can perform pseudo-label labeling on the training set through the optimal detection model during the iterative training process of the detection model, determine the positive sample label data and negative sample label data in the training set, obtain the corresponding loss values, and accordingly continuously update the parameters in the detection model, so that each output detection result is closer to the correct result reflected in the labeled data of the target objects in the training set.
  • the target detection model thus obtained can well complete the detection of the target object in subsequent practical applications.
• The determination process of the positive sample score threshold and the negative sample score threshold in step 1201 is described below:
  • step 1201 specifically includes the following steps 1301-1304:
  • Step 1301 the target detection model training device acquires a verification set.
  • the verification set includes a plurality of verification data sets corresponding to a plurality of training data sets one-to-one, each verification data set includes labeled data of one or more target objects, and the accuracy of the detection model is determined according to the verification set.
  • Step 1302 the target detection model training device determines the detection score of the optimal detection model for each target object in the verification set.
• The detection score is a quantified parameter of a detection result of the optimal detection model for a target object; the process of determining the detection score through the detection model will not be described in this embodiment.
  • Step 1303 the target detection model training device determines the negative sample score threshold according to the detection score and preset recall rate of each target object.
• For example, the target detection model training device sets the preset recall rate to 0.95. The target detection model training device sets an initial negative sample score threshold and continuously adjusts it until the recall rate of the optimal detection model over the detection scores of all target objects meets the preset recall rate of 0.95, and then outputs the negative sample score threshold at this time as the final negative sample score threshold.
  • Step 1304 the target detection model training device determines the positive sample score threshold according to the detection score and preset accuracy of each target object.
• For example, the target detection model training device sets the preset accuracy to 0.95. The target detection model training device sets an initial positive sample score threshold and continuously adjusts it until the accuracy of the optimal detection model over the detection scores of all target objects meets the preset accuracy of 0.95, and then outputs the positive sample score threshold at this time as the final positive sample score threshold.
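One simple way to realize the threshold adjustment described in steps 1303 and 1304 is to sweep candidate thresholds drawn from the observed detection scores. This sweep strategy is an assumption; the disclosure only says the threshold is "continuously adjusted":

```python
def threshold_for_recall(scores, is_true, target=0.95):
    """Negative sample score threshold: the highest candidate threshold at
    which recall over the verification detections still meets the preset
    recall rate. is_true[i] marks whether detection i matches a real
    target object."""
    total_true = sum(is_true)
    for th in sorted(set(scores), reverse=True):
        tp = sum(1 for s, t in zip(scores, is_true) if t and s >= th)
        if total_true and tp / total_true >= target:
            return th
    return 0.0

def threshold_for_precision(scores, is_true, target=0.95):
    """Positive sample score threshold: the lowest candidate threshold at
    which precision meets the preset accuracy."""
    for th in sorted(set(scores)):
        kept = [t for s, t in zip(scores, is_true) if s >= th]
        if kept and sum(kept) / len(kept) >= target:
            return th
    return 1.0
```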
• In this way, the embodiment of the present disclosure can determine, based on the verification set and the optimal detection model determined from the historical detection models, the positive sample score threshold and the negative sample score threshold used to determine the positive sample label data and the negative sample label data, so as to facilitate the smooth progress of the subsequent training process.
  • the embodiments of the present disclosure can divide the target detection system and the target detection model training device into functional modules or functional units according to the above method examples.
• Each functional module or functional unit can be divided corresponding to each function, or two or more functions can be integrated into one processing module.
  • the above-mentioned integrated modules can be implemented not only in the form of hardware, but also in the form of software function modules or functional units.
  • the division of modules or units in the embodiments of the present disclosure is schematic, and is only a logical function division, and there may be another division manner in actual implementation.
  • FIG. 14 it is a schematic structural diagram of an object detection device 1400 provided according to some embodiments, and the device includes: an acquisition unit 1401 and a processing unit 1402 .
  • the obtaining unit 1401 is configured to obtain the image to be detected.
  • the processing unit 1402 is configured to process the image to be detected by using a target detection model to obtain a target detection result corresponding to the target to be detected in the image to be detected.
• the processing unit 1402 is further configured to: when the detection result calculated by the target channel layer corresponding to the coordinate channel layer is greater than or equal to the threshold, obtain the coordinate prediction value of the coordinate channel layer.
  • the processing unit 1402 is further configured to: if the detection result calculated by the target channel layer corresponding to the coordinate channel layer is less than a threshold, not acquire the coordinate prediction value of the coordinate channel layer.
  • the object detection apparatus 1400 may further include a storage unit (shown by a dotted line box in FIG. 14 ), where programs or instructions are stored in the storage unit.
• When the processing unit 1402 executes the program or instruction, the object detection apparatus 1400 can execute the target detection method described in the above method embodiments.
  • FIG. 15 it is a schematic structural diagram of a target detection model training device 1500 provided according to some embodiments, and the device includes: an acquisition unit 1501 and a processing unit 1502 .
  • the acquiring unit 1501 is configured to: acquire a training set.
  • the training set includes multiple training data sets, and each training data set includes labeled data of one or more types of target objects, and at least two data sets in the multiple training data sets have different types of labeled target objects.
  • the processing unit 1502 is configured to iteratively train the detector model according to the training set to obtain a trained target detector model.
  • the processing unit 1502 is further configured to: for each iteration, input the training set into the detector model, and determine the detection results of various target objects.
  • the processing unit 1502 is further configured to: calculate the first loss value according to the detection results of various target objects and the first loss function, and adjust parameters of the detector model.
  • the first loss function includes an objective loss function, a coordinate loss function, and a class loss function.
  • the processing unit 1502 is further configured to: determine the detector model when the first loss function converges as the trained target detector model.
  • the obtaining unit 1501 is further configured to: obtain a verification set.
  • the verification set includes a plurality of verification data sets corresponding to the plurality of training data sets, and each verification data set includes labeled data of one or more target objects.
  • the processing unit 1502 is further configured to: respectively input multiple verification data sets into the target detection model to obtain accuracy rates under the multiple verification data sets.
  • the processing unit 1502 is further configured to: calculate the sum of accuracy rates under multiple verification data sets as the total accuracy rate of the trained target detection model. Or, the accuracies of multiple verification data sets are collectively used as the total accuracy of the trained object detection model.
  • the target detection model training device 1500 may also include a storage unit (represented by dashed line box), the storage unit stores programs or instructions.
  • When the processing unit 1502 executes the programs or instructions, the object detection model training apparatus 1500 can execute the detector training method described in the above method embodiments.
  • FIG. 16 is a schematic structural diagram of a target detection model training apparatus 1600 provided according to some embodiments; the apparatus includes an acquisition unit 1601 and a processing unit 1602.
  • the acquiring unit 1601 is configured to: acquire a training set.
  • the training set includes multiple training data sets, each training data set includes labeled data of one or more categories of target objects, and at least two of the multiple datasets label the categories of the target objects differently.
  • the processing unit 1602 is configured to: determine an optimal detection model.
  • the optimal detection model is the detection model with the highest accuracy among the historical training detection models, and the historical training detection model includes a detection model whose parameters have been updated after each iteration of training.
  • the processing unit 1602 is further configured to: perform iterative training on the detection model according to the training set, perform pseudo-label labeling on the training set according to the optimal detection model, and continue training the detection model to obtain a target detection model.
  • the processing unit 1602 is further configured to: determine pseudo-label data according to the optimal detection model.
  • the pseudo-label data includes labeled data of various missing target objects, and the category of the missing target object is different from the category of the target object corresponding to the labeled data included in the training set.
  • the processing unit 1602 is further configured to: mark the missing target objects in the training set according to the pseudo-label data to obtain positive sample label data and negative sample label data.
  • the processing unit 1602 is further configured to: determine a positive sample loss value according to the positive sample label data and the positive sample loss function.
  • the processing unit 1602 is further configured to: determine a negative sample loss value according to the negative sample label data and the negative sample loss function.
  • the processing unit 1602 is further configured to: adjust the parameters of the detection model according to the total loss value.
  • the total loss value is determined according to the first loss value, the positive sample loss value and the negative sample loss value.
  • the processing unit 1602 is further configured to: determine the detection model when the total loss function converges as the target detection model.
  • the total loss function includes a first loss function, a positive sample loss function, and a negative sample loss function.
  • the processing unit 1602 is further configured to: input the training set into the optimal detection model, and determine the detection score of the optimal detection model for each missing target object.
  • the processing unit 1602 is further configured to: for each target object, if the detection score of the optimal detection model for the missing target object is greater than or equal to the positive sample score threshold, determine the labeled data corresponding to the missing target object as positive sample label data.
  • the processing unit 1602 is further configured to: for each target object, if the detection score of the optimal detection model for the missing target object is less than or equal to the negative sample score threshold, determine the labeled data corresponding to the missing target object as negative sample label data.
  • the acquisition unit 1601 is further configured to: acquire a verification set; the verification set includes multiple verification data sets corresponding one-to-one with the multiple training data sets, each verification data set includes labeled data of one or more target objects, and the accuracy of the detection model is determined according to the verification set.
  • the processing unit 1602 is further configured to: determine the detection score of the optimal detection model for each target object in the verification set.
  • the processing unit 1602 is further configured to: determine a negative sample score threshold according to the detection score and the preset recall rate of each target object.
  • the processing unit 1602 is further configured to: determine a positive sample score threshold according to the detection score and preset accuracy of each target object.
  • the processing unit 1602 is further configured to: determine the first weight, the second weight, and the third weight.
  • the processing unit 1602 is further configured to: determine the total loss value according to the product of the first weight and the first loss value, the product of the second weight and the positive sample loss value, and the product of the third weight and the negative sample loss value.
  • the object detection model training apparatus 1600 may also include a storage unit (shown by a dashed box in FIG. 16 ), where programs or instructions are stored.
  • When the processing unit 1602 executes the programs or instructions, the object detection model training apparatus 1600 can execute the detector training method described in the above method embodiments.
  • FIG. 17 shows another possible schematic structural diagram of the target detection apparatus involved in the above embodiments.
  • the object detection device 1700 includes: a processor 1702 and a communication interface 1703 .
  • the processor 1702 is configured to control and manage the actions of the object detection apparatus 1700, for example, to execute the steps executed by the acquisition unit 1401 and the processing unit 1402, and/or configured to execute other processes of the techniques described herein.
  • the communication interface 1703 is configured to support communication between the object detection apparatus 1700 and other network entities.
  • the object detection apparatus 1700 may further include a memory 1701 and a bus 1704 , and the memory 1701 is configured to store program codes and data of the object detection apparatus 1700 .
  • the memory 1701 may be a memory in the target detection apparatus 1700, etc.; the memory may include a volatile memory, such as a random access memory; the memory may also include a non-volatile memory, such as a read-only memory, a flash memory, a hard disk, or a solid-state disk; and the memory may also include a combination of the above types of memory.
  • the aforementioned processor 1702 may realize or execute various exemplary logical blocks, modules and circuits described in connection with the disclosure of the present disclosure.
  • the processor may be a central processing unit, a general-purpose processor, a digital signal processor, an application-specific integrated circuit, a field-programmable gate array or other programmable logic device, a transistor logic device, a hardware component, or any combination thereof.
  • the processor may also be a combination of computing functions, for example, a combination of one or more microprocessors, a combination of DSP and a microprocessor, and the like.
  • the bus 1704 may be an Extended Industry Standard Architecture (EISA) bus or the like.
  • the bus 1704 can be divided into an address bus, a data bus, a control bus, and so on. For ease of representation, only one thick line is used in FIG. 17 , but this does not mean that there is only one bus or only one type of bus.
  • the object detection device 1700 in FIG. 17 can also be a chip.
  • the chip includes one or more than two (including two) processors 1702 and a communication interface 1703 .
  • the chip further includes a memory 1701 .
  • the memory 1701 may include a read-only memory and a random access memory, and provides operation instructions and data to the processor 1702 .
  • a part of the memory 1701 may also include a non-volatile random access memory (non-volatile random access memory, NVRAM).
  • the memory 1701 stores the following elements, execution modules or data structures, or their subsets, or their extended sets.
  • the corresponding operation is executed by calling the operation instruction stored in the memory 1701 (the operation instruction may be stored in the operating system).
  • the target detection model training device 1800 includes: a processor 1802 and a communication interface 1803 .
  • the processor 1802 is configured to control and manage the actions of the object detection model training apparatus 1800, for example, to execute the steps performed by the acquisition unit 1501, the processing unit 1502, the acquisition unit 1601, and the processing unit 1602, and/or to execute other processes of the techniques described herein.
  • the communication interface 1803 is configured to support communication between the object detection model training apparatus 1800 and other network entities.
  • the object detection model training apparatus 1800 may further include a memory 1801 and a bus 1804 , and the memory 1801 is configured to store program codes and data of the object detection model training apparatus 1800 .
  • the memory 1801 may be a memory in the target detection model training apparatus 1800, etc.; the memory may include a volatile memory, such as a random access memory; the memory may also include a non-volatile memory, such as a read-only memory, a flash memory, a hard disk, or a solid-state disk; and the memory may also include a combination of the above types of memory.
  • the above-mentioned processor 1802 may implement or execute various exemplary logical blocks, modules and circuits described in connection with the disclosure of the present disclosure.
  • the processor may be a central processing unit, a general-purpose processor, a digital signal processor, an application-specific integrated circuit, a field-programmable gate array or other programmable logic device, a transistor logic device, a hardware component, or any combination thereof.
  • the processor may also be a combination of computing functions, for example, a combination of one or more microprocessors, a combination of DSP and a microprocessor, and the like.
  • the bus 1804 may be an Extended Industry Standard Architecture (EISA) bus or the like.
  • the bus 1804 can be divided into an address bus, a data bus, a control bus, and so on. For ease of representation, only one thick line is used in FIG. 18 , but this does not mean that there is only one bus or only one type of bus.
  • the target detection model training device 1800 in FIG. 18 can also be a chip.
  • the chip includes one or more than two (including two) processors 1802 and a communication interface 1803 .
  • the chip further includes a memory 1801 , which may include a read-only memory and a random access memory, and provides operation instructions and data to the processor 1802 .
  • a part of the memory 1801 may also include a non-volatile random access memory (non-volatile random access memory, NVRAM).
  • the memory 1801 stores the following elements, execution modules or data structures, or their subsets, or their extended sets.
  • the corresponding operation is executed by calling the operation instruction stored in the memory 1801 (the operation instruction may be stored in the operating system).
  • Some embodiments of the present disclosure provide a computer-readable storage medium (for example, a non-transitory computer-readable storage medium), in which computer program instructions are stored in the computer-readable storage medium.
  • When the computer program instructions are run on a computer (for example, a detector training device), the computer is caused to execute the object detection method and the object detection model training method described in any of the above embodiments.
  • the above-mentioned computer-readable storage medium may include, but is not limited to: a magnetic storage device (for example, a hard disk, a floppy disk, or a magnetic tape), an optical disk (for example, a CD (compact disc) or a DVD (digital versatile disc)), a smart card, and a flash memory device (for example, an EPROM (erasable programmable read-only memory), a card, a stick, or a key drive).
  • Various computer-readable storage media described in this disclosure can represent one or more devices and/or other machine-readable storage media for storing information.
  • the term "machine-readable storage medium" may include, but is not limited to, wireless channels and various other media capable of storing, containing and/or carrying instructions and/or data.
  • Some embodiments of the present disclosure also provide a computer program product, for example, the computer program product is stored on a non-transitory computer-readable storage medium.
  • the computer program product includes computer program instructions.
  • When the computer program instructions are executed on a computer (for example, a detector training device), the computer program instructions cause the computer to execute the target detection method and the target detector model training method described in the above embodiments.
  • Some embodiments of the present disclosure also provide a computer program.
  • When the computer program is executed on a computer (for example, a detector training device), the computer program causes the computer to execute the object detection method and the object detector model training method described in the above embodiments.
  • the disclosed system, device and method may be implemented in other ways.
  • the device embodiments described above are only illustrative.
  • the division of the units is only a logical function division. In actual implementation, there may be other division methods.
  • for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented.
  • the mutual coupling or direct coupling or communication connection shown or discussed may be through some interfaces, and the indirect coupling or communication connection of devices or units may be in electrical, mechanical or other forms.
  • the units described as separate components may or may not be physically separated, and the components shown as units may or may not be physical units, that is, they may be located in one place, or may be distributed to multiple network units. Part or all of the units can be selected according to actual needs to achieve the purpose of the solution of this embodiment.
  • each functional unit in each embodiment of the present disclosure may be integrated into one processing unit, each unit may exist separately physically, or two or more units may be integrated into one unit.


Abstract

A target detection method, a target detection model training method, and an apparatus, comprising: acquiring an image to be detected; and processing the image to be detected using a target detection model to obtain a target detection result corresponding to a target to be detected in the image; wherein the target detection model includes a feature extraction network and a target prediction network; the feature extraction network is used to perform feature extraction on the image to be detected to obtain image features related to multiple types of target objects; the target detection network is used to process the image features to obtain the target detection result; and the target detection network includes a class channel layer, multiple target channel layers, and multiple coordinate channel layers.

Description

Target detection method, target detection model training method, and apparatus

This application claims priority to the patent application filed with the International Bureau on February 25, 2022, with international application number PCT/CN2022/078114 and entitled "Detector training method, apparatus, and storage medium", the entire contents of which are incorporated herein by reference.
Technical Field

The present disclosure relates to the field of image detection, and in particular to a target detection method, a target detection model training method, and an apparatus.
Background

Multi-dataset fusion detection training refers to training a single detector model on multiple datasets labeled with different categories, so as to achieve full-category target detection. Compared with running multiple single detectors in parallel, multi-dataset fusion detection training allows a single detector to detect targets of all categories simultaneously with a much smaller computational cost, and therefore has high practical value.
Summary

In one aspect, a target detection method is provided, the method comprising: acquiring an image to be detected; and processing the image to be detected using a target detection model to obtain a target detection result corresponding to a target to be detected in the image; wherein the target detection model includes a feature extraction network and a target prediction network; the feature extraction network is used to perform feature extraction on the image to be detected to obtain image features related to multiple types of target objects; the target detection network is used to process the image features to obtain the target detection result; the target detection network includes a class channel layer, multiple target channel layers, and multiple coordinate channel layers; a target channel layer is used to output a detection predicted value indicating whether a target object is present, each target channel layer is used to detect at least one of the multiple types of target objects, and different target channel layers detect target objects of different categories; the class channel layer is used to output class predicted values corresponding to the multiple types of target objects; a coordinate channel layer is used to output a coordinate predicted value corresponding to a target object; and the target detection result is calculated based on the detection predicted value, the class predicted value, and the coordinate predicted value.

In some embodiments, the multiple coordinate channel layers correspond one-to-one with the multiple target channel layers, and each coordinate channel layer detects target objects of the same categories as its corresponding target channel layer; a coordinate channel layer is used to acquire the coordinate predicted value of a target object when the corresponding target channel layer detects that target object.

In some embodiments, the target detection result includes a detection result and a coordinate result; the detection result is obtained by a fused calculation of the detection predicted value of a target channel layer and the corresponding class predicted value; and the coordinate channel layer being used to determine the coordinate predicted value of a target object when the corresponding target channel layer detects the target object includes: acquiring the coordinate predicted value of the coordinate channel layer when the detection result calculated by the target channel layer corresponding to the coordinate channel layer is greater than or equal to a threshold; and not acquiring the coordinate predicted value of the coordinate channel layer when the detection result calculated by the target channel layer corresponding to the coordinate channel layer is less than the threshold.
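As a minimal sketch of this gating logic (the function name and the fusion rule — here a simple product of the detection predicted value and the class predicted value — are illustrative assumptions, not details fixed by the disclosure):

```python
def gated_coordinates(det_pred, cls_pred, box_pred, threshold=0.5):
    """Fuse a target channel layer's detection predicted value with the
    corresponding class predicted value; read out the coordinate channel
    layer's prediction only when the fused detection result reaches the
    threshold, and discard it otherwise."""
    detection_result = det_pred * cls_pred  # illustrative fusion rule
    if detection_result >= threshold:
        return box_pred  # acquire the coordinate predicted value
    return None          # do not acquire the coordinate predicted value
```

Discarding the coordinate readout for absent targets is what saves computation downstream: only boxes for detected categories are refined.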
In some embodiments, the class channel layer, the multiple target channel layers, and the multiple coordinate channel layers have a convolutional structure, and the convolution kernel size of the convolutional structure is one by one (1×1).
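Since a 1×1 convolution is just an independent linear map applied at every spatial position, each channel layer can read its output directly off the shared feature map. A pure-Python sketch (shapes and names are illustrative assumptions):

```python
def conv1x1(feature_map, weights, bias):
    """Apply a 1x1 convolution: feature_map is [C_in][H][W],
    weights is [C_out][C_in], bias is [C_out].
    Each output pixel depends only on the input channels at the same
    position, which is why per-category target/coordinate channel
    layers can cheaply share one feature map."""
    c_in = len(feature_map)
    h, w = len(feature_map[0]), len(feature_map[0][0])
    out = []
    for co, row_w in enumerate(weights):
        plane = [[bias[co] + sum(row_w[ci] * feature_map[ci][y][x]
                                 for ci in range(c_in))
                  for x in range(w)]
                 for y in range(h)]
        out.append(plane)
    return out
```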
In some embodiments, the feature extraction network includes a backbone network and a neck (transition) network; the backbone network is used to determine image features of generic categories from the image to be detected, and the neck network is used to determine, from the generic-category image features, the image features related to the multiple types of target objects.

In another aspect, a target detection model training method is provided, comprising: acquiring a training set; the training set includes multiple training datasets, each training dataset includes labeled data of one or more categories of target objects, and at least two of the multiple training datasets label different categories of target objects; and iteratively training a detection model according to the training set to obtain a target detection model; wherein the target detection model includes a feature extraction network and a target prediction network; the feature extraction network is used to perform feature extraction on the image to be detected to obtain image features related to multiple types of target objects; the target detection network is used to process the image features to obtain a target detection result; the target detection network includes a class channel layer, multiple target channel layers, and multiple coordinate channel layers; a target channel layer is used to output a detection predicted value indicating whether a target object is present, each target channel layer is used to detect at least one of the multiple types of target objects, and different target channel layers detect target objects of different categories; the class channel layer is used to output class predicted values corresponding to the multiple types of target objects; a coordinate channel layer is used to output a coordinate predicted value corresponding to a target object; and the target detection result is calculated based on the detection predicted value, the class predicted value, and the coordinate predicted value.

In some embodiments, iteratively training the detection model according to the training set to obtain the target detection model includes: for each iteration, inputting the training set into the detection model and determining detection results of the multiple types of target objects; calculating a first loss value according to the detection results of the multiple types of target objects and a first loss function, and adjusting parameters of the detection model, the first loss function including an objective loss function, a coordinate loss function, and a class loss function; and determining the detection model at the time the first loss function converges as the target detection model.

In some embodiments, the objective loss function satisfies the following formula:

where Lobj+ denotes the objective loss value of positive samples in the training set, NP denotes the total number of target channel layers, b denotes the index of a target channel layer, Target(b) denotes the set of positive-sample anchors corresponding to the b-th target channel layer, BCELoss denotes the BCE loss function, s denotes the index of a positive sample, Pobj(s,b) denotes the objective predicted value of the b-th target channel layer corresponding to the anchor of the s-th positive sample, and GTobj(s) denotes the objective ground truth corresponding to the anchor of the s-th positive sample; Lobj- denotes the objective loss value of negative samples in the training set, Lobj(b) denotes the category subset of target objects corresponding to the b-th target channel layer, 1(……) is an indicator function that takes the value 1 when the input is True and 0 otherwise, Ldata denotes the set of target-object categories labeled in the current training data, H denotes the number of rows of the data matrix output by the target channel layer, W denotes the number of columns of the data matrix output by the target channel layer, p denotes the index of a pixel, Anchor denotes the set of all anchors, a denotes an anchor of pixel p, Mask(p,a) indicates whether there is a labeled box at the position corresponding to pixel p, Pobj(p,a,b) denotes the objective predicted value of the a-th anchor of pixel p output by the b-th target channel layer, and GTobj(p,a) denotes the objective ground truth of the a-th anchor of pixel p.
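The practical effect of the indicator 1(Lobj(b) ∈ Ldata) in the negative-sample term is that a target channel layer contributes a negative-sample loss only when the categories it detects are labeled in the current training dataset, so objects the dataset leaves unlabeled cannot wrongly supervise the other channel layers. A hedged pure-Python sketch of that gating (the data layout and names are illustrative assumptions; only the BCE form is standard):

```python
import math

def bce(p, gt, eps=1e-7):
    # Binary cross-entropy on a single prediction, clamped for stability.
    p = min(max(p, eps), 1 - eps)
    return -(gt * math.log(p) + (1 - gt) * math.log(1 - p))

def negative_objective_loss(channels, labeled_categories):
    """channels: one dict per target channel layer, with
       'categories' — the category subset Lobj(b) the layer detects, and
       'preds' — (Pobj, GTobj, mask) triples at each anchor position.
    A layer contributes only if its categories are labeled in the
    current dataset (Ldata), mirroring the indicator in the formula;
    positions covered by a labeled box (mask == 1) are skipped here."""
    loss = 0.0
    for ch in channels:
        if not set(ch["categories"]) <= set(labeled_categories):
            continue  # indicator 1(Lobj(b) in Ldata) is zero
        for p, gt, mask in ch["preds"]:
            if mask == 0:
                loss += bce(p, gt)
    return loss
```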
In some embodiments, the coordinate loss function satisfies the following formula:

where Lbox denotes the coordinate loss value, NP denotes the total number of target channel layers, b denotes the index of a target channel layer, Target(b) denotes the set of positive-sample anchors corresponding to the b-th target channel layer, IOU denotes the intersection-over-union (IOU) calculation function, s denotes the index of a positive sample, Pbox(s,b) denotes the coordinate predicted value of the s-th positive sample output by the b-th target channel layer, and GTbox(s) denotes the coordinate ground truth of the s-th positive sample.

In some embodiments, the class loss function satisfies the following formula:

where Lcls denotes the class loss value, Class denotes the total number of target-object categories, 1[……] is an indicator function that takes the value 1 when the input is True and 0 otherwise, b denotes the index of a target channel layer, Bcls(b) denotes the set of second categories corresponding to the b-th target channel layer, Len(Bcls(b)) denotes the size of the category subset of target objects corresponding to the b-th target channel layer, H denotes the number of rows of the data matrix output by the target channel layer, W denotes the number of columns of the data matrix output by the target channel layer, Anchor denotes the set of all anchors, Mask(p,a) indicates whether there is a labeled box at the position corresponding to pixel p, BCELoss denotes the BCE loss function, Pcls(p,a,c) denotes the class predicted value, and GTcls(p,a,c) denotes the class ground truth.

In some embodiments, the method further includes: acquiring a verification set, the verification set including multiple verification datasets corresponding one-to-one with the multiple training datasets, each verification dataset including labeled data of one or more target objects; inputting the multiple verification datasets into the target detection model respectively to obtain accuracy rates on the multiple verification datasets; and summing the accuracy rates on the multiple verification datasets as the total accuracy rate of the trained target detection model, or jointly using the accuracy rates of the multiple verification datasets as the total accuracy of the trained target detection model.
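A minimal sketch of the two aggregation options (names are illustrative; the per-dataset accuracy values would come from evaluating the model on each verification dataset):

```python
def aggregate_accuracy(per_dataset_acc, mode="sum"):
    """per_dataset_acc: accuracy measured on each verification dataset.
    mode="sum"   -> one total accuracy (the sum over datasets)
    mode="joint" -> keep the per-dataset accuracies together as the result
    """
    if mode == "sum":
        return sum(per_dataset_acc)
    return list(per_dataset_acc)
```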
In yet another aspect, a target detection model training method is provided, comprising: acquiring a training set, the training set including multiple training datasets, each training dataset including labeled data of one or more categories of target objects, at least two of the multiple datasets labeling different categories of target objects; determining an optimal detection model, the optimal detection model being the detection model with the highest accuracy among historically trained detection models, the historically trained detection models including the detection model whose parameters have been updated after each iteration of training; and iteratively training the detection model according to the training set, performing pseudo-label labeling on the training set according to the optimal detection model, and continuing to train the detection model to obtain a target detection model.

In some embodiments, performing pseudo-label labeling on the iteratively trained detection model according to the optimal detection model to obtain the target detection model includes: performing pseudo-label labeling on the missing target objects of each training dataset in the training set according to the optimal detection model to obtain positive sample label data and negative sample label data, wherein a missing target object is a target object of a category not labeled by the training dataset; determining a positive sample loss value according to the positive sample label data and a positive sample loss function; determining a negative sample loss value according to the negative sample label data and a negative sample loss function; adjusting parameters of the detection model according to a total loss value, the total loss value being determined according to the first loss value, the positive sample loss value, and the negative sample loss value; and determining the detection model at the time the total loss function converges as the target detection model, the total loss function including the first loss function, the positive sample loss function, and the negative sample loss function.

In some embodiments, labeling the missing target objects in the training set according to the optimal detection model to obtain the positive sample label data and the negative sample label data includes: inputting the training set into the optimal detection model and determining a detection score of the optimal detection model for each missing target object; for each missing target object, if the detection score of the optimal detection model for the missing target object is greater than or equal to a positive sample score threshold, determining the labeled data corresponding to the missing target object as positive sample label data; and for each missing target object, if the detection score of the optimal detection model for the missing target object is less than or equal to a negative sample score threshold, determining the labeled data corresponding to the missing target object as negative sample label data.
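The assignment rule can be sketched as follows (names and data layout are illustrative assumptions); detections whose scores fall between the two thresholds receive neither label and are simply left out of the pseudo-label supervision:

```python
def assign_pseudo_labels(detections, pos_threshold, neg_threshold):
    """detections: (missing_object_id, score) pairs produced by running
    the training set through the optimal detection model.
    Scores >= pos_threshold become positive sample label data,
    scores <= neg_threshold become negative sample label data."""
    positives, negatives = [], []
    for obj_id, score in detections:
        if score >= pos_threshold:
            positives.append(obj_id)
        elif score <= neg_threshold:
            negatives.append(obj_id)
    return positives, negatives
```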
In some embodiments, the positive sample score threshold and the negative sample score threshold are determined according to the following steps: acquiring a verification set, the verification set including multiple verification datasets corresponding one-to-one with the multiple training datasets, each verification dataset including labeled data of one or more target objects, the accuracy of the detection model being determined according to the verification set; determining the detection score of the optimal detection model for each target object in the verification set; determining the negative sample score threshold according to the detection score of each target object and a preset recall rate; and determining the positive sample score threshold according to the detection score of each target object and a preset precision.
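One plausible realization (an assumption — the disclosure does not fix the exact procedure) is to sweep candidate thresholds over the verification-set scores: the negative sample score threshold is the highest threshold that still meets the preset recall over ground-truth objects, and the positive sample score threshold is the lowest threshold at which precision reaches the preset value:

```python
def score_thresholds(scored, preset_recall, preset_precision):
    """scored: (score, is_true_object) pairs from running the optimal
    detection model on the verification set.
    Returns (neg_threshold, pos_threshold)."""
    scores = sorted({s for s, _ in scored})
    n_true = sum(1 for _, t in scored if t)
    neg_thr, pos_thr = scores[0], scores[-1]
    for thr in scores:
        kept_true = sum(1 for s, t in scored if t and s >= thr)
        kept_all = sum(1 for s, _ in scored if s >= thr)
        recall = kept_true / n_true if n_true else 0.0
        precision = kept_true / kept_all if kept_all else 0.0
        if recall >= preset_recall:
            neg_thr = max(neg_thr, thr)   # highest threshold meeting recall
        if precision >= preset_precision:
            pos_thr = min(pos_thr, thr)   # lowest threshold meeting precision
    return neg_thr, pos_thr
```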
In some embodiments, the method further includes: determining a first weight, a second weight, and a third weight; and determining the total loss value according to the product of the first weight and the first loss value, the product of the second weight and the positive sample loss value, and the product of the third weight and the negative sample loss value.
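Sketched directly (the weight values are illustrative hyperparameters, not values fixed by the disclosure):

```python
def total_loss(first_loss, pos_loss, neg_loss, w1=1.0, w2=1.0, w3=1.0):
    """Total loss = w1 * first (fully supervised) loss
                  + w2 * pseudo-label positive sample loss
                  + w3 * pseudo-label negative sample loss."""
    return w1 * first_loss + w2 * pos_loss + w3 * neg_loss
```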
In yet another aspect, a target detection apparatus is provided, including an acquisition unit and a processing unit; the acquisition unit is configured to acquire an image to be detected; the processing unit is configured to process the image to be detected using a target detection model to obtain a target detection result corresponding to a target to be detected in the image; wherein the target detection model includes a feature extraction network and a target prediction network; the feature extraction network is used to perform feature extraction on the image to be detected to obtain image features related to multiple types of target objects; the target detection network is used to process the image features to obtain the target detection result; the target detection network includes a class channel layer, multiple target channel layers, and multiple coordinate channel layers; a target channel layer is used to output a detection predicted value indicating whether a target object is present, each target channel layer is used to detect at least one of the multiple types of target objects, and different target channel layers detect target objects of different categories; the class channel layer is used to output class predicted values corresponding to the multiple types of target objects; a coordinate channel layer is used to output a coordinate predicted value corresponding to a target object; and the target detection result is calculated based on the detection predicted value, the class predicted value, and the coordinate predicted value.

In some embodiments, the processing unit is further configured to acquire the coordinate predicted value of the coordinate channel layer when the detection result calculated by the target channel layer corresponding to the coordinate channel layer is greater than or equal to a threshold.

In some embodiments, the processing unit is further configured not to acquire the coordinate predicted value of the coordinate channel layer when the detection result calculated by the target channel layer corresponding to the coordinate channel layer is less than the threshold.

In yet another aspect, a target detection model training apparatus is provided, including an acquisition unit and a processing unit. The acquisition unit is configured to acquire a training set. The training set includes multiple training datasets, each training dataset includes labeled data of one or more categories of target objects, and at least two of the multiple training datasets label different categories of target objects. The processing unit is configured to iteratively train a detector model according to the training set to obtain a target detector model. The target detection model includes a feature extraction network and a target prediction network; the feature extraction network is used to perform feature extraction on the image to be detected to obtain image features related to multiple types of target objects; the target detection network is used to process the image features to obtain a target detection result; the target detection network includes a class channel layer, multiple target channel layers, and multiple coordinate channel layers; a target channel layer is used to output a detection predicted value indicating whether a target object is present, each target channel layer is used to detect at least one of the multiple types of target objects, and different target channel layers detect target objects of different categories; the class channel layer is used to output class predicted values corresponding to the multiple types of target objects; a coordinate channel layer is used to output a coordinate predicted value corresponding to a target object; and the target detection result is calculated based on the detection predicted value, the class predicted value, and the coordinate predicted value.

In some embodiments, the processing unit is further configured to: for each iteration, input the training set into the detector model and determine detection results of the multiple types of target objects.

In some embodiments, the processing unit is further configured to calculate a first loss value according to the detection results of the multiple types of target objects and a first loss function, and adjust parameters of the detector model. The first loss function includes an objective loss function, a coordinate loss function, and a class loss function.

In some embodiments, the processing unit is further configured to determine the detector model at the time the first loss function converges as the trained target detector model.

In some embodiments, the acquisition unit is further configured to acquire a verification set. The verification set includes multiple verification datasets corresponding one-to-one with the multiple training datasets, and each verification dataset includes labeled data of one or more target objects.

In some embodiments, the processing unit is further configured to input the multiple verification datasets into the target detection model respectively to obtain accuracy rates on the multiple verification datasets.

In some embodiments, the processing unit is further configured to sum the accuracy rates on the multiple verification datasets as the total accuracy rate of the trained target detection model, or to jointly use the accuracy rates of the multiple verification datasets as the total accuracy of the trained target detection model.
In yet another aspect, a target detection model training apparatus is provided, including an acquisition unit and a processing unit.

The acquisition unit is configured to acquire a training set. The training set includes multiple training datasets, each training dataset includes labeled data of one or more categories of target objects, and at least two of the multiple datasets label different categories of target objects.

The processing unit is configured to determine an optimal detection model. The optimal detection model is the detection model with the highest accuracy among historically trained detection models, and the historically trained detection models include the detection model whose parameters have been updated after each iteration of training.

The processing unit is further configured to iteratively train the detection model according to the training set, perform pseudo-label labeling on the training set according to the optimal detection model, and continue training the detection model to obtain a target detection model.

In some embodiments, the processing unit is further configured to perform pseudo-label labeling on the missing target objects of each training dataset in the training set according to the optimal detection model to obtain positive sample label data and negative sample label data, wherein a missing target object is a target object of a category not labeled by the training dataset.

In some embodiments, the processing unit is further configured to determine a positive sample loss value according to the positive sample label data and a positive sample loss function.

In some embodiments, the processing unit is further configured to determine a negative sample loss value according to the negative sample label data and a negative sample loss function.

In some embodiments, the processing unit is further configured to adjust parameters of the detection model according to a total loss value. The total loss value is determined according to the first loss value, the positive sample loss value, and the negative sample loss value.

In some embodiments, the processing unit is further configured to determine the detection model at the time the total loss function converges as the target detection model. The total loss function includes the first loss function, the positive sample loss function, and the negative sample loss function.

In some embodiments, the processing unit is further configured to input the labeled training set into the detection model and determine the detection score of the detection model for each missing target object.

In some embodiments, the processing unit is further configured to: for each missing target object corresponding to the pseudo-label data, if the detection score of the detection model for the missing target object is greater than or equal to the positive sample score threshold, determine the labeled data corresponding to the missing target object as positive sample label data.

In some embodiments, the processing unit is further configured to: for each missing target object corresponding to the pseudo-label data, if the detection score of the detection model for the missing target object is less than or equal to the negative sample score threshold, determine the labeled data corresponding to the missing target object as negative sample label data.

In some embodiments, the acquisition unit is further configured to acquire a verification set; the verification set includes multiple verification datasets corresponding one-to-one with the multiple training datasets, each verification dataset includes labeled data of one or more target objects, and the accuracy of the detection model is determined according to the verification set.

In some embodiments, the processing unit is further configured to determine the detection score of the optimal detection model for each target object in the verification set.

In some embodiments, the processing unit is further configured to determine the negative sample score threshold according to the detection score of each target object and a preset recall rate.

In some embodiments, the processing unit is further configured to determine the positive sample score threshold according to the detection score of each target object and a preset precision.

In some embodiments, the processing unit is further configured to determine a first weight, a second weight, and a third weight.

In some embodiments, the processing unit is further configured to determine the total loss value according to the product of the first weight and the first loss value, the product of the second weight and the positive sample loss value, and the product of the third weight and the negative sample loss value.
In yet another aspect, a target detection apparatus is provided, including a processor and a communication interface; the communication interface is coupled to the processor, and the processor is configured to run a computer program or instructions to implement the target detection method according to any of the above embodiments.

In yet another aspect, a target detection model training apparatus is provided, including a processor and a communication interface; the communication interface is coupled to the processor, and the processor is configured to run a computer program or instructions to implement the target detection model training method according to any of the above embodiments.

In yet another aspect, a non-transitory computer-readable storage medium is provided. The non-transitory computer-readable storage medium stores computer program instructions which, when run on a computer (for example, a target detection apparatus), cause the computer to execute the target detection method according to any of the above embodiments.

In yet another aspect, a non-transitory computer-readable storage medium is provided. The non-transitory computer-readable storage medium stores computer program instructions which, when run on a computer (for example, a target detection model training apparatus), cause the computer to execute the target detection model training method according to any of the above embodiments.

In yet another aspect, a computer program product is provided. The computer program product includes computer program instructions which, when executed on a computer (for example, a detector training apparatus), cause the computer to execute the target detection method and the target detection model training method according to any of the above embodiments.

In yet another aspect, a computer program is provided. When the computer program is executed on a computer (for example, a detector training apparatus), the computer program causes the computer to execute the target detection method and the target detection model training method according to any of the above embodiments.

In yet another aspect, a chip is provided, the chip including a processor and a communication interface; the communication interface is coupled to the processor, and the processor is configured to run a computer program or instructions to implement the target detection method and the target detection model training method according to any of the above embodiments.

In some embodiments, the chip provided in the present disclosure further includes a memory for storing the computer program or instructions.

It should be noted that the above computer instructions may be stored in whole or in part on a computer-readable storage medium. The computer-readable storage medium may be packaged together with the processor of the apparatus, or packaged separately from the processor of the apparatus, which is not limited in the present disclosure.

In yet another aspect, a target detection system is provided, including a target detection apparatus and a target detection model training apparatus, wherein the target detection apparatus is configured to execute the target detection method according to any of the above embodiments, and the target detection model training apparatus is configured to execute the target detection model training method according to any of the above embodiments.

In the present disclosure, the names of the above target detection apparatus and target detection model training apparatus do not limit the devices or functional modules themselves; in actual implementations, these devices or functional modules may appear under other names. As long as the functions of each device or functional module are similar to those of the present disclosure, they fall within the scope of the claims of the present disclosure and their equivalents.
Brief Description of the Drawings

In order to describe the technical solutions in the present disclosure more clearly, the accompanying drawings needed in some embodiments of the present disclosure are briefly introduced below. Obviously, the drawings described below are only drawings of some embodiments of the present disclosure, and a person of ordinary skill in the art can obtain other drawings based on these drawings. In addition, the drawings in the following description may be regarded as schematic diagrams and are not limitations on the actual size of the products, the actual flow of the methods, the actual timing of the signals, etc. involved in the embodiments of the present disclosure.
FIG. 1 is a flowchart of multi-dataset fusion detection according to some embodiments;

FIG. 2 is an architecture diagram of a detector model according to some embodiments;

FIG. 3 is an architecture diagram of a detector model according to some embodiments;

FIG. 4 is an architecture diagram of a target detection system according to some embodiments;

FIG. 5 is a flowchart of a target detection method according to some embodiments;

FIG. 6 is a flowchart of a target detection model training method according to some embodiments;

FIG. 7 is a flowchart of another target detection model training method according to some embodiments;

FIG. 8 is a flowchart of another target detection model training method according to some embodiments;

FIG. 9 is a flowchart of another target detection model training method according to some embodiments;

FIG. 10 is a flowchart of another target detection model training method according to some embodiments;

FIG. 11 is a flowchart of another target detection model training method according to some embodiments;

FIG. 12 is a flowchart of another target detection model training method according to some embodiments;

FIG. 13 is a flowchart of another target detection model training method according to some embodiments;

FIG. 14 is a structural diagram of a target detection apparatus according to some embodiments;

FIG. 15 is a structural diagram of a target detection model training apparatus according to some embodiments;

FIG. 16 is a structural diagram of another target detection model training apparatus according to some embodiments;

FIG. 17 is a structural diagram of another target detection apparatus according to some embodiments;

FIG. 18 is a structural diagram of another target detection model training apparatus according to some embodiments.
Detailed Description

The technical solutions in some embodiments of the present disclosure will be described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some of the embodiments of the present disclosure, not all of them. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments provided in the present disclosure fall within the protection scope of the present disclosure.

Unless the context requires otherwise, throughout the specification and claims, the term "comprise" and other forms thereof, such as the third-person singular "comprises" and the present participle "comprising", are to be construed in an open, inclusive sense, that is, "including, but not limited to". In the description of the specification, the terms "one embodiment", "some embodiments", "exemplary embodiments", "example", "specific example", or "some examples" are intended to indicate that a particular feature, structure, material, or characteristic related to the embodiment or example is included in at least one embodiment or example of the present disclosure. The schematic representations of the above terms do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be included in any one or more embodiments or examples in any suitable manner.

Hereinafter, the terms "first" and "second" are used for descriptive purposes only and shall not be understood as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined by "first" or "second" may explicitly or implicitly include one or more of that feature. In the description of the embodiments of the present disclosure, unless otherwise stated, "multiple" means two or more.

In describing some embodiments, the expressions "coupled" and "connected" and their derivatives may be used. For example, the term "connected" may be used in describing some embodiments to indicate that two or more components are in direct physical or electrical contact with each other. As another example, the term "coupled" may be used in describing some embodiments to indicate that two or more components are in direct physical or electrical contact. However, the term "coupled" or "communicatively coupled" may also mean that two or more components are not in direct contact with each other but still cooperate or interact with each other. The embodiments disclosed herein are not necessarily limited to the contents herein.

"At least one of A, B, and C" has the same meaning as "at least one of A, B, or C", both including the following combinations of A, B, and C: only A, only B, only C, a combination of A and B, a combination of A and C, a combination of B and C, and a combination of A, B, and C.

"A and/or B" includes the following three combinations: only A, only B, and a combination of A and B.

As used herein, the term "if" is optionally construed to mean "when" or "upon" or "in response to determining" or "in response to detecting", depending on the context. Similarly, depending on the context, the phrase "if it is determined that..." or "if [a stated condition or event] is detected" is optionally construed to mean "upon determining that..." or "in response to determining that..." or "upon detecting [the stated condition or event]" or "in response to detecting [the stated condition or event]".

The use of "adapted to" or "configured to" herein means open and inclusive language that does not exclude devices adapted to or configured to perform additional tasks or steps.

In addition, the use of "based on" means openness and inclusiveness, in that a process, step, calculation, or other action "based on" one or more stated conditions or values may, in practice, be based on additional conditions or values beyond those stated.

As used herein, "about", "substantially", or "approximately" includes the stated value as well as an average value within an acceptable range of deviation from the particular value, where the acceptable range of deviation is determined by a person of ordinary skill in the art in view of the measurement in question and the errors associated with measurement of the particular quantity (i.e., the limitations of the measurement system).
The terms involved in the embodiments of the present disclosure are explained below for the convenience of readers.

(1) Target detection

Target detection refers to detecting target objects of set categories, such as faces, human bodies, vehicles, or buildings, in a given image. The result of target detection usually gives the region detection box, region coordinates, and category of the target object; the region detection box is the bounding rectangle of the detected target in the detection result output by target detection.

(2) Multi-dataset fusion detection

Multi-dataset fusion detection refers to training a single detection model on multiple datasets with different category annotations, so as to achieve full-category target detection. A dataset includes image data and labeled data; the image data represents images of target objects, and correspondingly, the labeled data is data that annotates the target objects present in the image data.

As shown in FIG. 1, multi-dataset fusion detection may train a detector on multiple datasets (the figure takes three datasets as an example): the multiple datasets are input into the detection model for training, and after training is completed, the verification set of each dataset is used to calculate the mean average precision (mAP) of the detector.

However, since the labeled data of one dataset only targets one category of target objects, and the categories of target objects labeled by different datasets are not the same, some objects present in a dataset will lack annotations. Such situations severely affect the training of a fusion detector.

(3) Neural networks

Neural networks (NNs), also known as artificial neural networks (ANNs), are mathematical model algorithms that imitate the behavioral characteristics of animal neural networks and perform distributed parallel information processing. Neural networks include deep learning networks, such as convolutional neural networks (CNNs) and long short-term memory (LSTM) networks.

In the present disclosure, the Yolov5 (you only look once version 5) algorithm used for detector training is also a type of neural network.

(4) Loss function

A loss function is a function that maps the values of a random event or its related random variables to non-negative real numbers to represent the "risk" or "loss" of the random event. In applications, the loss function is usually associated with an optimization problem as a learning criterion, that is, the model is solved and evaluated by minimizing the loss function; for example, it is used for parameter estimation of models in statistics and machine learning.

In the present disclosure, the loss function is used to evaluate the accuracy with which the detection model detects target objects. When the loss function of the detection results output by the detection model satisfies certain preset conditions, the detection model at that time is determined to be trained, and the trained detection model is determined as the final detection model.
Generally speaking, a traditional single detection model is provided with only one object channel, that is, one object channel is responsible for detecting target objects of all categories. Since current datasets generally label only one category, and different datasets label different categories, some objects present in a dataset will lack annotations. In such cases, because the traditional single detection model has only one object channel, after the object channel detects an object not labeled by the dataset, it will be incorrectly supervised by the labeled data of other categories of objects in the dataset, which severely affects the training accuracy of the detector. And if the missing objects in existing datasets were re-annotated manually, the annotation workload would be large and the labor cost too high for large-scale application.

In view of the above drawbacks of current solutions, some embodiments of the present disclosure provide a target detection method and a target detection model training method. In general, during model training, the present disclosure sets multiple target channel layers in the detection model to detect target objects by category. Thus, for a target object of a certain category, after the target channel layer of the corresponding category detects that type of target object, if the current dataset does not label that category of target object, the output of the target channel layer corresponding to that category will not be substituted into the subsequent training process when training the detection model on the current dataset. In this way, the problem in the above traditional single detection model, where the object channel layer detects an object not labeled by the dataset and is then incorrectly supervised by labeled data of other categories in the dataset, is avoided, thereby improving the training accuracy of the detection model.

In addition, during model training, the present disclosure may also determine, in each iteration of training the detection model, the optimal detection model with the highest historical accuracy, and have the optimal detection model perform pseudo-label annotation on the training process, thereby fusing the pseudo-label annotation data with the annotation data of the real training set to train the detection model. This improves the detection recall rate of the resulting target detection model across scenarios and achieves a better training effect than traditional single detection model training.

Therefore, the target detection model obtained through the above training process can detect target objects by category in specific target detection applications, with high detection accuracy, achieving a better detection effect.
The implementation of the embodiments of the present disclosure will be described in detail below with reference to the accompanying drawings.

FIG. 2 is a schematic architecture diagram of a detection model 20 according to some embodiments. The detection model 20 is a single detection model and uses the Yolov5 algorithm as its base architecture. As shown in FIG. 2, the detection model 20 includes an input module 21 and a target detection module 22. The input module 21 is used to input datasets into the detection model 20. Data can be transmitted between the input module 21 and the target detection module 22.

The target detection module 22 is used to process the datasets to obtain training detection results of target objects. As shown in FIG. 2, the target detection module 22 includes a backbone network 221, a neck network 222, and a detection network 223.

The backbone network 221 is used to perform extraction operations on the image data in the dataset to obtain generic image features and transmit them to the neck network 222. Correspondingly, the neck network 222 receives the generic image features sent by the backbone network 221. It can be understood that generic image features are, in the field of image detection, the image features of generic categories of objects obtained after the backbone network 221 performs preliminary extraction on the original image data. How the backbone network 221 obtains the generic image features is not described in detail here. For example, the architecture of the backbone network 221 may use CSPDarknet.

The neck network 222 is used to extract image features strongly correlated with the categories of target objects from the generic image features and send the strongly correlated image features to the detection network 223. Correspondingly, the detection network 223 receives the strongly correlated image features sent by the neck network 222. It can be understood that strongly correlated image features are the image features of objects close to the categories of the target objects, obtained after the neck network 222 performs extraction operations on the generic image features.

It should be understood that the categories of target objects here are the detection categories set for the detection model 20. How the neck network 222 obtains the image features strongly correlated with the categories of target objects is not described in detail here. For example, the architecture of the neck network 222 may use PANet.

The detection network 223 is used to calculate the final target detection results based on the strongly correlated image features. The target detection results include the region detection box, region coordinates, and category of the target object. Optionally, as shown in FIG. 3, the detection network 223 is provided with three types of data output channel layers: object channel layers 31, box channel layers 32, and a class channel layer 33. There are multiple object channel layers 31 and multiple box channel layers 32, and one class channel layer 33.

The object channel layer 31 is used to determine whether a target object exists at the corresponding position in the strongly correlated image features. If the object channel layer 31 determines that a target object exists, it outputs the region detection box of the target object at the corresponding position.

Correspondingly, the box channel layer 32 is used to calculate the specific coordinates of the target object when the object channel layer 31 determines that a target object exists, so as to fine-tune the region detection box of the target object and make its position more accurate.

Correspondingly, the class channel layer 33 is used to identify the category of the target object.

In one possible implementation, the object channel layer 31, the box channel layer 32, and the class channel layer 33 have a convolutional structure, and the convolution kernel size of the convolutional structure is one by one (1×1).

The architecture of a detection model according to some embodiments has been introduced above.
FIG. 4 is an architecture diagram of a target detection system 40 according to some embodiments. The target detection system 40 includes an image acquisition device 41, a detection processing device 42, and an interaction device 43.

The image acquisition device 41 is used to acquire an image to be detected and send the image to be detected to the detection processing device 42.

Optionally, the image acquisition device 41 may be implemented as a surveillance camera, a camera, or another device with an image acquisition function. It can be understood that the image acquisition device 41 may be arranged at the entrance or exit of the area to be detected, or at a certain vertical height within the area to be detected, so as to acquire the image to be detected of the detection target.

The detection processing device 42 is used to, after receiving the image to be detected, process the image to be detected using the target detection model to obtain the target detection result corresponding to the target to be detected in the image. For the specific process by which the detection processing device 42 processes the image to be detected using the target detection model to obtain the target detection result corresponding to the target to be detected in the image, refer to the description of steps 501 to 503 below, which will not be repeated here.

After obtaining the target detection result corresponding to the target to be detected in the image, the detection processing device 42 sends the target detection result to the interaction device 43.

The interaction device 43 is used to output the target detection result and to implement human-machine interaction with operators.

Optionally, the interaction device 43 may include a display terminal and a human-machine interaction device. The display terminal may be implemented as a monitor or another device with a visual display function, and the human-machine interaction device may be implemented as a touch screen, a keyboard and mouse, or another device with a human-machine interaction function.

It should be noted that in the target detection method provided in the present disclosure, the execution subject is the target detection system; in the target detection model training method provided in the present disclosure, the execution subject is the target detection model training apparatus. The target detection system and the target detection model training apparatus may each be a server, including:

a processor, which may be a general-purpose central processing unit (CPU), a microprocessor, an application-specific integrated circuit (ASIC), or one or more integrated circuits for controlling the execution of programs of the solutions of the present disclosure;

a transceiver, which may be any transceiver-like device for communicating with other devices or communication networks, such as Ethernet, a radio access network (RAN), or wireless local area networks (WLAN);

a memory, which may be a read-only memory (ROM) or another type of static storage device capable of storing static information and instructions, a random access memory (RAM) or another type of dynamic storage device capable of storing information and instructions, or an electrically erasable programmable read-only memory (EEPROM), a compact disc read-only memory (CD-ROM) or other optical disc storage, optical disc storage (including compact discs, laser discs, optical discs, digital versatile discs, Blu-ray discs, etc.), magnetic disk storage media or other magnetic storage devices, or any other medium capable of carrying or storing desired program code in the form of instructions or data structures and capable of being accessed by a computer, but is not limited thereto. The memory may exist independently and be connected to the processor through a communication line, or may be integrated with the processor.

The target detection system and the target detection model training apparatus in the embodiments of the present disclosure may also each be a system coupled as part of a server, such as a chip system in the server.

It should be noted that the embodiments of the present disclosure may draw on or refer to one another; for example, the same or similar steps may be referred to between method embodiments, system embodiments, and apparatus embodiments without limitation.
如图5所示,图5为根据一些实施例提供的一种目标检测方法,该方法包括以下步骤501-步骤503:
步骤501、目标检测系统获取待检测图像。
其中,待检测图像即为可能包括待检测目标的图像数据。
在一种可能的实现方式中,步骤501具体可由如前文所描述的目标检测系统中包括的图像获取装置来执行,以使得目标检测系统获取待检测图像。
步骤502、目标检测系统采用目标检测模型对待检测图像进行处理，得到待检测图像中待检测目标对应的目标检测结果。
其中，目标检测模型包括特征提取网络和目标检测网络。下面分别对特征提取网络和目标检测网络进行介绍：
(1)特征提取网络,用于对待检测图像进行特征提取得到多种目标对象相关的图像特征。
需要指出的是，此处的特征提取网络基于前文所述的检测模型20中的主干网络221、过渡网络222来构建。
也即,特征提取网络包括主干网络和过渡网络。其中,主干网络用于根据待检测图像确定通用类别的图像特征,过渡网络用于根据通用类别的图像特征确定与多种目标对象相关的图像特征。
(2)目标检测网络用于对图像特征进行处理得到目标检测结果。
需要指出的是,此处的目标检测网络基于前文所述的检测模型20中的检测网络223来构建。
也即，目标检测网络包括类别通道层、多个目标通道层和多个坐标通道层。类别通道层、目标通道层和坐标通道层的结构均为卷积结构，且卷积结构的卷积核大小为一乘一。
其中,目标通道层用于输出表征是否存在目标对象的检测预测值,每个目标通道层用于检测多种目标对象中的至少一种,多个目标通道层用于检测的目标对象的类别不同。以及,类别通道层用于输出多种目标对象对应的类别预测值,坐标通道层用于输出其所检测的目标对象对应的坐标预测值。
需要说明的是,多个坐标通道层和多个目标通道层,是一一对应的。也就是说,每个坐标通道层和与其对应的目标通道层检测的目标对象类别是相同的,可以检测一个或多个目标对象。当在对应的目标通道层检测到存在目标对象时,坐标通道层用于同时获取目标对象的坐标预测值。
应理解,由于本实施例设置多个一一对应的目标通道层和坐标通道层,每个坐标通道层和与其对应的目标通道层检测的目标对象类别一致,因此能够使不同目标对象之间的预测不相互影响,实现检测多种类别目标,对于目标通道层检测到存在的目标对象,获取对应坐标通道层的坐标预测值,对于不存在的目标对象,丢弃对应坐标通道层的坐标预测值。这样一来,通过设置一一对应的坐标通道层和目标通道层,能够同时高效检测出不同类别目标的位置,大幅节省了算力,检测速度快。
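上述"目标通道层与坐标通道层一一对应，按目标通道层的判断保留或丢弃坐标预测值"的逻辑，可用如下Python代码示意（其中的数据结构与阈值均为假设）：

```python
def select_boxes(obj_preds, box_preds, threshold=0.5):
    """obj_preds: 各目标通道层输出的检测预测值列表,
    与 box_preds(各坐标通道层输出的坐标预测值)一一对应。
    仅当目标通道层判定存在目标对象时, 才获取对应坐标通道层的坐标预测值;
    否则直接丢弃, 不参与后续计算。
    """
    results = []
    for obj, box in zip(obj_preds, box_preds):
        if obj >= threshold:          # 判定存在目标对象
            results.append(box)       # 获取坐标预测值
        else:
            results.append(None)      # 丢弃坐标预测值
    return results
```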
在一种可能的实现方式中，在目标检测模型的训练过程中，同样在检测模型中设置了多个目标通道层，且这些目标通道层会分类别地对目标对象进行检测。由此针对某一类别的目标对象，在对应类别的目标通道层检测出该类目标对象后，若当前数据集未标注该类别的目标对象，则在根据当前数据集对检测模型进行训练时，不会将该类别的目标对象对应的目标通道层的输出结果代入后续的训练过程中。这样一来，避免了目标通道层检测出数据集未标注的对象后，被数据集中针对其他类别对象的标注数据进行错误干预的情况发生，提高了检测模型的训练精度。
可选地，在目标检测模型的训练过程中，判断目标通道层是否检测到存在目标对象，可通过如下方式实现：将每一目标通道层输出的检测预测值和类别通道层中与该目标通道层所检测类别对应的类别预测值进行融合计算，获得检测结果，再将该检测结果与阈值进行比较，进而判断是否存在目标通道层所检测类别的目标对象。也就是说，通过将目标通道层输出的检测预测值，与类别通道层中对应该目标通道层所检测类别的类别预测值进行融合计算，能够结合位置和类别这两个维度的信息来预测是否检测到目标对象，进而使得对是否存在目标对象的预测结果更加准确。
在另一种可能的实现方式中,本公开在目标检测模型训练时,在每一次检测模型的迭代训练过程中,确定出历史准确率最高的最优检测模型,并由最优检测模型对训练集进行伪标签标注,由此结合对训练集进行伪标签标注后得到的标注数据和真实训练集的标注数据,对检测模型进行融合训练,能够将伪标签标注应用于目标检测模型的训练过程中,提高了最终得出的目标检测模型在跨场景下的检测召回率,实现了更好的目标检测模型训练效果。
可以理解的是，上述两种可能的目标检测模型的训练方法，在单独应用于本实施例中目标检测方法的同时，也能够结合应用共同对本实施例中的目标检测模型进行训练。也就是说，本实施例提供的目标检测方法中的目标检测模型，可以分别通过上述两种可能的目标检测模型训练方法，单独训练得到；亦或者，可以结合上述两种可能的目标检测模型训练方法，共同训练得到。
以上对特征提取网络和目标检测网络进行了说明。
下面对目标检测结果进行介绍:
可以理解的是,目标检测结果是基于检测预测值、类别预测值和坐标预测值计算得到的。
示例性地，目标检测结果包括检测结果和坐标结果。其中，检测结果为根据目标通道层的检测预测值和对应的类别预测值融合计算得到。具体地，可通过如下方式实施：将每一目标通道层输出的检测预测值与对应类别的类别预测值相乘，得到对应的检测结果。当检测结果高于阈值时，认为该目标通道层检测到存在所检测类别的目标对象，此时获取坐标通道层输出的坐标预测值作为坐标结果；当检测结果低于阈值时，认为该目标通道层没有检测到对应类别的目标，此时将与该目标通道层对应的坐标通道层输出的坐标预测值直接丢弃不用。
进一步地,坐标通道层用于确定目标对象的坐标预测值时,遵循以下规则:在与坐标通道层对应的目标通道层计算得到的检测结果大于或等于阈值的情况下,获取坐标通道层的坐标预测值;在与坐标通道层对应的目标通道层计算得到的检测结果小于阈值的情况下,不获取坐标通道层的坐标预测值。
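上述"先融合计算检测结果、再与阈值比较以决定是否获取坐标预测值"的规则，可示意如下（融合方式按前文的相乘计算，阈值取值为假设）：

```python
def fuse_and_decide(obj_pred, cls_pred, box_pred, threshold=0.5):
    """检测结果 = 目标通道层的检测预测值 × 对应类别的类别预测值。

    检测结果大于或等于阈值时, 认为检测到目标对象, 获取坐标预测值;
    否则认为未检测到, 不获取坐标预测值。
    """
    score = obj_pred * cls_pred        # 融合计算得到检测结果
    if score >= threshold:
        return score, box_pred          # 获取坐标通道层的坐标预测值
    return score, None                  # 不获取坐标预测值
```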
在一种可能的实现方式中,步骤502具体可由如前文所描述的目标检测系统中包括的检测处理装置来执行,以使得目标检测系统采用目标检测模型对待检测图像进行处理,得到待检测图像中待检测目标对应的目标检测结果。
步骤503、目标检测系统输出目标检测结果。
可选地,目标检测系统将目标检测结果以可视化的方式,向工作人员展示。例如,目标检测系统将目标检测结果以目标对象的区域检测框、区域坐标及所属类别的方式,展现在显示屏幕上,使得工作人员获知目标检测系统对检测目标的检测结果。
在一种可能的实现方式中,步骤503具体可由如前文所描述的目标检测系统中包括的交互装置来执行,以使得目标检测系统输出目标检测结果。
基于上述技术方案,本公开提供的目标检测系统能够对待检测图像中的目标对象进行检测,并且由于目标检测系统中的目标检测模型中设置了多个目标通道层,且这些目标通道层会分类别的对目标对象进行检测,目标检测模型的检测准确度较高,因此,本公开提供的目标检测系统针对目标对象,能够实现更好的检测效果。
如图6所示,图6为根据一些实施例提供的一种目标检测模型训练方法,该方法包括以下步骤601-步骤602:
步骤601、目标检测模型训练装置获取训练集。
其中，训练集包括多种目标对象的标注数据。示例性地，训练集包括多个训练数据集，每个训练数据集包括图像数据和对一种或多种类别的目标对象标注的数据。示例性地，训练集包括三个数据集，三个数据集对应标注的目标对象的类别分别是人、机动车和非机动车。再例如，训练集包括两个数据集，一个数据集对应标注的目标对象的类别是人，另一个数据集对应标注的目标对象的类别是机动车和非机动车。
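上述由多个不同标注类别的数据集组成的训练集，可用如下数据结构示意（图像文件名、类别名称与标注框内容均为假设的占位数据，仅用于说明组织方式）：

```python
# 训练集: 多个训练数据集, 每个数据集只标注一种或多种类别的目标对象
training_set = [
    {   # 数据集1: 仅标注"人"
        "images": ["img_0001.jpg", "img_0002.jpg"],
        "labeled_classes": ["person"],
        "annotations": {"img_0001.jpg": [("person", (34, 50, 80, 200))]},
    },
    {   # 数据集2: 标注"机动车"和"非机动车"
        "images": ["img_1001.jpg"],
        "labeled_classes": ["motor_vehicle", "non_motor_vehicle"],
        "annotations": {"img_1001.jpg": [("motor_vehicle", (10, 10, 120, 90))]},
    },
]

# 多个训练数据集中, 至少两个数据集标注的目标对象类别不同
assert training_set[0]["labeled_classes"] != training_set[1]["labeled_classes"]
```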
需要说明的是，在训练集中，数据集的数量和目标对象的类别的数量，可以不相等。例如，训练集包括的多个数据集中包括标注相同目标对象类别的数据集。此处针对同一类别的目标对象，获取多个数据集是为了扩大样本数据收集范围，以提高最终训练出的检测模型的准确率。示例性地，在训练集中，存在多个数据集对应标注的目标对象的类别都是人物，这些数据集的区别在于部分数据集是日间收集的数据，部分数据集是夜间收集的数据；或者，部分数据集是在人群密集的路口收集的数据，部分数据集是在人群稀疏的路口收集的数据。
步骤602、目标检测模型训练装置根据训练集对检测模型进行迭代训练,得到目标检测模型。
其中，目标检测模型即为经过迭代训练后，满足预设要求、能够用于实际应用的检测模型。示例性地，满足预设要求可以是检测模型的检测结果的损失函数达到收敛；或者，检测模型的检测结果的准确率达到预设的百分比要求，此处的准确率可采用mAP值。
可选地,检测模型是基于如图2所示的Yolov5架构检测模型来构建的。相应的,进行迭代训练后得到的目标检测模型也是基于如图2所示的Yolov5架构检测模型来构建的。
需要指出的是,此处根据训练集进行迭代训练后的目标检测模型,即为前述步骤502中的目标检测模型。具体对于目标检测模型的介绍参见前述步骤502,本实施例在此不再赘述。
需要说明的是,在迭代训练过程中,目标通道层所针对检测的目标对象的类别,与训练集中包括的所有数据集标注的目标对象的类别可以相同也可以不同。
示例性地,结合前述步骤601中的举例,所有数据集标注的目标对象的类别包括人、机动车、非机动车,则目标通道层检测的目标对象的类别可以包括人、机动车、非机动车。
亦或者，目标通道层检测的目标对象的类别也可以包括数据集标注目标对象的类别的子类别，例如目标通道层检测的目标对象的类别包括人、公交车、轿车、自行车、三轮车。其中的公交车和轿车即为机动车的子类别，自行车和三轮车即为非机动车的子类别。
在一种可能的实现方式中，目标检测模型训练装置根据训练集对检测模型进行迭代训练，包括：在检测模型确定出多种目标对象的检测结果后，根据多种目标对象的检测结果和第一损失函数，计算第一损失值，并据此调整检测模型的参数。需要说明的是，目标检测模型训练装置根据训练集对检测模型进行迭代训练的详细流程参见下述步骤701-步骤704，此处不再赘述。
示例性地，结合前述步骤601中的举例，假设当前用于训练的数据集标注的对象是人物。若针对类别为机动车的目标对象，多个目标通道层中用于检测机动车的目标通道层在数据集对应的图像数据中检测到了机动车的存在，但是由于当前的数据集标注的对象是人物，因此目标检测模型训练装置不会将检测类别为机动车的目标通道层输出的检测结果代入后续的训练过程中。同理，检测类别为人物以外其他类别的目标通道层输出的检测结果也不会代入后续的训练过程中，只有检测类别为人物的目标通道层输出的检测结果会代入后续的训练过程中。
由此,标注类别为人物的数据集,其标注数据只会对检测类别为人物的目标通道层输出的检测结果产生影响,其他标注类别的数据集也是如此。这样一来,在训练过程中,就避免了某一类别的标注数据对其他类别的目标对象的检测结果产生负面影响,从而提高了目标检测模型训练的精准度。
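上述"仅将当前数据集已标注类别对应的目标通道层输出代入训练"的做法，可用如下Python代码示意（类别名称与数据结构均为假设）：

```python
def filter_channel_outputs(channel_outputs, labeled_classes):
    """channel_outputs: {目标通道层所检测的类别: 该层输出的检测结果}。
    labeled_classes: 当前训练数据集已标注的目标对象类别集合。

    仅保留已标注类别对应通道层的输出, 其余输出不代入后续训练,
    从而避免未标注类别的检测结果被其他类别的标注数据错误干预。
    """
    return {cls: out for cls, out in channel_outputs.items()
            if cls in labeled_classes}
```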
基于上述技术方案，本公开在检测模型中设置了多个目标通道层，且这些目标通道层会分类别地对目标对象进行检测。由此针对某一类别的目标对象，在对应类别的目标通道层检测出该类目标对象后，若当前数据集未标注该类别的目标对象，则在根据当前数据集对检测模型进行训练时，不会将该类别的目标对象对应的目标通道层的输出结果代入后续的训练过程中。这样一来，避免了目标通道层检测出数据集未标注的对象后，被数据集中针对其他类别对象的标注数据进行错误干预的情况发生，提高了检测模型的训练精度。
以下,结合上述步骤602,对检测器训练装置确定根据训练集对检测模型进行训练,得到目标检测模型的过程进行具体介绍。
作为本公开的一种可能的实施例,结合图6,如图7所示,上述步骤602具体包括以下步骤701-步骤704:
步骤701、目标检测模型训练装置将训练集输入检测模型,确定多种目标对象的检测结果。
示例性地,结合前文步骤602中的内容,由于检测模型是基于如图2所示的Yolov5架构检测模型来构建的,因此目标检测模型训练装置可以通过检测模型20中的输入模块21,将训练集输入至检测模型中。
可选地，多种目标对象的检测结果也是由检测预测值、类别预测值和坐标预测值计算得到的。也即，检测预测值、类别预测值和坐标预测值分别由检测模型的目标通道层、类别通道层和坐标通道层确定得出。
步骤702、目标检测模型训练装置根据多种目标对象的检测结果和第一损失函数计算第一损失值。
可选地，第一损失函数包括目标损失函数、坐标损失函数、以及类别损失函数。
在一些实施例中,第一损失函数由目标损失函数、坐标损失函数、以及类别损失函数相加获得。
需要说明的是,具体目标检测模型训练装置根据多种目标对象的检测结果和第一损失函数计算第一损失值的流程,以及目标损失函数、坐标损失函数、类别损失函数的公式内容参见下述步骤901-步骤904,此处不再赘述。
步骤703、目标检测模型训练装置根据第一损失值调整检测模型的参数。
示例性地,在检测模型进行一次迭代检测后,目标检测模型训练装置判断此次检测结果的第一损失函数是否收敛。
若第一损失函数收敛,则目标检测模型训练装置确定检测模型训练完成,将此时的检测模型确定为目标检测模型。
若第一损失函数不收敛,则目标检测模型训练装置更新检测模型中的参数,进行下次迭代检测。若下次迭代中检测模型的第一损失函数收敛,则目标检测模型训练装置将此时的检测模型确定为目标检测模型;若下次迭代中检测模型的第一损失函数不收敛,则目标检测模型训练装置继续更新检测模型中的参数,直至检测模型的第一损失函数收敛。
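上述"计算损失、更新参数、直至第一损失函数收敛"的迭代流程，可抽象为如下训练循环（其中的损失计算函数与参数更新函数均为假设的占位实现，仅作流程示意）：

```python
def train_until_converged(params, compute_loss, update, tol=1e-6, max_iter=1000):
    """反复计算第一损失值并更新检测模型参数。

    当相邻两次迭代的损失变化小于 tol(视为第一损失函数收敛)时停止,
    返回此时的参数, 即目标检测模型的参数。
    """
    prev = float("inf")
    for _ in range(max_iter):
        loss = compute_loss(params)
        if abs(prev - loss) < tol:      # 第一损失函数收敛, 训练完成
            break
        params = update(params, loss)   # 更新检测模型中的参数, 进行下次迭代
        prev = loss
    return params
```

例如，对一个以 (x-3)^2 为损失的一维参数做简单的梯度式更新，循环会在损失不再明显减小时停止。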
步骤704、目标检测模型训练装置将第一损失函数收敛时的检测模型确定为目标检测模型。
可以理解的是,目标检测模型即为能够用于实际应用的检测模型。
基于上述技术方案,本公开根据第一损失函数对检测模型做多次训练,在训练过程中,通过不断更新检测模型中的参数,使得每次输出的检测结果更加接近训练集中对目标对象的标注数据所反映的正确结果。当第一损失函数的值逐步减小,直到第一损失函数的值不再减小,即损失函数收敛时,将此时的检测模型确定为目标检测模型。这样检测模型即可在后续实际应用中良好地完成对目标对象的检测。
以下结合步骤701,对目标检测模型训练装置将训练集输入检测模型,确定多种目标对象的检测结果的过程进行具体介绍。
作为本公开的一种可能的实施例,结合图5和图7,如图8所示,上述步骤701具体包括以下步骤801-步骤803:
步骤801、目标检测模型训练装置根据训练集确定通用类别的图像特征。
在一种可能的实现方式中,目标检测模型训练装置通过主干网络来确定通用类别的图像特征。
示例性地,检测模型中可包括主干网络,该主干网络可以是如图2所示的Backbone网络。通过该Backbone网络,目标检测模型训练装置可以对训练集包括的图像数据中通用类别的图像特征进行提取。在此说明,通过Backbone网络对训练集包括的图像数据中通用类别的图像特征进行提取的方法,本公开在此不再赘述。
可以理解的是，通过主干网络确定通用类别的图像特征后，由于通用类别即包括了目标对象的类别，因此后续步骤802中过渡网络能够据此提取出与多种目标对象相关的图像特征。
步骤802、目标检测模型训练装置根据通用类别的图像特征,确定与多种目标对象相关的图像特征。
在一种可能的实现方式中,目标检测模型训练装置通过过渡网络来确定与多种目标对象相关的图像特征。
示例性地,检测模型中可包括过渡网络,该过渡网络可以是如图2所示的Neck网络。通过该Neck网络,目标检测模型训练装置可以对通用类别的图像特征进行提取,以确定与多种目标对象相关的图像特征。在此说明,通过Neck网络对通用类别的图像特征进行提取,以确定与多种目标对象相关的图像特征的方法,本公开在此不再赘述。
步骤803、目标检测模型训练装置根据与多种目标对象相关的图像特征,确定多种目标对象的检测结果。
在一种可能的实现方式中,目标检测模型训练装置通过检测网络来确定多种目标对象的检测结果。
示例性地,检测模型中可包括检测网络,该检测网络可以是如图2所示的Detection网络。通过该Detection网络,目标检测模型训练装置可以基于与多种目标对象相关的图像特征,来确定出多种目标对象的检测结果。
可选地，检测网络设有多个目标通道层、多个坐标通道层和类别通道层。下面对这三种通道层的作用进行介绍：
(1)目标通道层。
一个目标通道层用于检测多种目标对象中的至少一种目标对象在当前检测区域内是否存在。示例性地,此处目标通道层输出的结果为“存在”和“不存在”,例如,可输出为检测预测值yes或者no的形式。
可选地,目标检测模型训练装置预先设定判断阈值,之后,目标通道层确定目标对象在当前检测区域内存在的概率值,当目标通道层确定的概率大于或等于该判断阈值时,则目标检测模型训练装置确定当前检测区域内存在目标对象,目标通道层输出的结果为“存在”;同理,若目标通道层确定的概率小于该判断阈值时,则目标检测模型训练装置确定当前检测区域内不存在目标对象,目标通道层输出的结果为“不存在”。
示例性地，若针对某一类目标对象，目标通道层在当前检测区域确定该类目标对象存在的概率为0.98。假设目标检测模型训练装置预先设定的判断阈值为0.9，由于0.98大于0.9，因此目标检测模型训练装置确定当前检测区域内存在目标对象。
在一种可能的实现方式中,目标通道层可以是如图3所示的Object通道层。
(2)坐标通道层。
在前述目标通道层确定当前检测区域内存在目标对象时,坐标通道层用于确定该目标对象存在的区域的坐标并输出,例如,可输出为坐标预测值(X,Y)的形式。
在一种可能的实现方式中,坐标通道层可以是如图3所示的Box通道层。
需要说明的是,对于每一个检测区域,当目标检测模型训练装置根据目标通道层确定该检测区域内存在目标对象,坐标通道层都会输出此检测区域的坐标。因此,在本公开中,由于目标通道层的数量由原来的一个变为多个(假设目标通道层有N个),相应的坐标通道层的数量也会变为原来的多倍(即坐标通道层的数量会变为原来的N倍)。
(3)类别通道层。
在前述目标通道层判断确定当前检测区域内存在目标对象时，类别通道层用于确定该目标对象存在的区域的类别并输出，例如，可输出为类别预测值person或car的形式。
在一种可能的实现方式中,类别通道层可以是如图3所示的Class通道层。
需要说明的是，类别通道层的输出通道数与训练集中标注的目标对象的类别数量相同。
示例性地,目标通道层、坐标通道层以及类别通道层的输出结果,其形式可以是数学矩阵。目标通道层检测的当前图像区域,可以是与多种目标对象相关的图像特征中的一个像素点。
在一种可能的实现方式中，目标检测模型训练装置将多个目标通道层、多个坐标通道层和类别通道层输出的结果合并，确定为多种目标对象的检测结果。
基于上述技术方案,本公开通过检测模型中设置的主干网络、过渡网络、检测网络,以及检测网络中设置的多个目标通道层、坐标通道层和类别通道层,能够根据训练集中包括的图像数据,来确定出多种目标对象的检测结果,以便于后续目标检测模型训练过程的进行。
以下结合步骤702,对目标检测模型训练装置根据训练集和多种目标对象的检测结果,确定第一损失函数的过程进行具体介绍。
作为本公开的一种可能的实施例,结合图7,如图9所示,上述步骤702具体包括以下步骤901-步骤904:
步骤901、目标检测模型训练装置根据多个目标通道层输出的结果、多种目标对象的标注数据和目标损失函数,确定目标损失值。
其中,目标损失值包括正样本的目标损失值和负样本的目标损失值。
在一种可能的实现方式中，目标（Object）损失函数满足以下公式1：

L_{obj+} = \sum_{b=1}^{N_P} \sum_{s \in Target(b)} BCELoss\big(P_{obj}(s,b),\, GT_{obj}(s)\big)

L_{obj-} = \sum_{b=1}^{N_P} 1\big(L_{obj}(b) \subseteq L_{data}\big) \sum_{p=1}^{H \times W} \sum_{a \in Anchor} \big(1 - Mask(p,a)\big) \cdot BCELoss\big(P_{obj}(p,a,b),\, GT_{obj}(p,a)\big)

其中，Lobj+表示训练集中正样本的目标损失值，NP表示目标通道层的总数量，b表示目标通道层的编号，Target(b)表示第b个目标通道层对应的正样本的Anchor集合，BCELoss表示BCE损失函数，s表示正样本的编号，Pobj(s,b)表示第b个目标通道层与第s个正样本的Anchor对应的目标预测值，GTobj(s)表示第s个正样本的Anchor对应的目标真值；Lobj-表示训练集中负样本的目标损失值，Lobj(b)表示第b个目标通道层所检测的目标对象的类别子集，1(……)为取值函数，当输入为True时取值为1，否则取值为0，Ldata表示当前训练数据所标注的目标对象的类别集合，H表示目标通道层数据矩阵的行数，W表示目标通道层数据矩阵的列数，p表示像素点的编号，Anchor表示全部的Anchor集合，a表示像素点p的Anchor，Mask(p,a)表示训练集数据对应的当前位置是否有标注框（根据是否有标注框，取值对应为0或1），Pobj(p,a,b)表示第b个目标通道层输出的像素点p的第a个Anchor的目标预测值，GTobj(p,a)表示像素点p的第a个Anchor的目标真值。应理解，前述目标真值根据训练集包括的多种目标对象的标注数据确定。
需要说明的是,上述正样本是指在目标通道层进行目标对象的检测时,针对于一个像素点,若该像素点有对应的标注数据,则确定该像素点为正样本;反之同理,若一个像素点没有对应的标注数据,则确定该像素点为负样本。应理解,若像素点为正样本,则代入正样本公式计算其Lobj+,若像素点为负样本,则代入负样本公式计算其Lobj-
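其中的BCE损失、以及按像素点是否有标注数据划分正负样本并分别累加损失的做法，可用如下Python代码示意（为简化说明，此处将预测值摊平为一维列表，正样本目标真值取1、负样本取0，均为示意性假设）：

```python
import math

def bce_loss(pred, target, eps=1e-12):
    """二元交叉熵(BCE)损失: -[y*log(p) + (1-y)*log(1-p)]。"""
    pred = min(max(pred, eps), 1.0 - eps)   # 裁剪预测值, 保证数值稳定
    return -(target * math.log(pred) + (1 - target) * math.log(1 - pred))

def split_obj_loss(preds, masks):
    """preds: 各位置的目标预测值; masks: 对应位置是否有标注框(0或1)。

    有标注框的位置视为正样本(目标真值为1), 无标注框的位置视为负样本
    (目标真值为0), 分别累加得到正样本与负样本的目标损失值。
    """
    loss_pos = sum(bce_loss(p, 1.0) for p, m in zip(preds, masks) if m == 1)
    loss_neg = sum(bce_loss(p, 0.0) for p, m in zip(preds, masks) if m == 0)
    return loss_pos, loss_neg
```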
步骤902、目标检测模型训练装置根据多个坐标通道层输出的结果、多种目标对象的标注数据和坐标损失函数,确定坐标损失值。
在一种可能的实现方式中，坐标（Box）损失函数满足以下公式2：

L_{box} = \sum_{b=1}^{N_P} \sum_{s \in Target(b)} \big(1 - IOU\big(P_{box}(s,b),\, GT_{box}(s)\big)\big)

其中，Lbox表示坐标损失值，NP表示目标通道层的总数量，b表示目标通道层的编号，Target(b)表示第b个目标通道层对应的正样本的Anchor集合，IOU表示重叠度（intersection over union，IOU）计算函数，s表示正样本的编号，Pbox(s,b)表示第b个目标通道层输出的第s个正样本的Box坐标预测值，GTbox(s)表示第s个正样本的Box坐标真值。应理解，前述坐标真值根据训练集包括的多种目标对象的标注数据确定。
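其中的IOU（重叠度）计算函数，可按常见的"交集面积除以并集面积"方式实现，示意如下（框采用 (x1, y1, x2, y2) 的表示方式，为本示例的假设）：

```python
def iou(box_a, box_b):
    """计算两个矩形框的重叠度 IOU = 交集面积 / 并集面积。

    框格式为 (x1, y1, x2, y2), 要求 x1 < x2 且 y1 < y2。
    """
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)   # 交集面积
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter                      # 并集面积
    return inter / union if union > 0 else 0.0
```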
步骤903、目标检测模型训练装置根据多个类别通道层输出的结果、多种目标对象的标注数据和类别损失函数,确定类别损失值。
在一种可能的实现方式中，类别（Class）损失函数满足以下公式3：

L_{cls} = \sum_{b=1}^{N_P} \frac{1}{Len(B_{cls}(b))} \sum_{p=1}^{H \times W} \sum_{a \in Anchor} Mask(p,a) \sum_{c=1}^{Class} 1[c \in B_{cls}(b)] \cdot BCELoss\big(P_{cls}(p,a,c),\, GT_{cls}(p,a,c)\big)

其中，Lcls表示类别损失值，Class表示目标对象的类别总数，b表示目标通道层的编号，Bcls(b)表示第b个目标通道层所检测的目标对象的类别子集，Len(Bcls(b))表示第b个目标通道层所检测类别的总数量，H表示目标通道层数据矩阵的行数，W表示目标通道层数据矩阵的列数，Anchor是指全部的Anchor集合，Mask(p,a)表示训练集数据对应的当前位置是否有标注框，BCELoss是指BCE损失函数，Pcls(p,a,c)是指类别预测值，GTcls(p,a,c)是指类别真值，1[……]为取值函数（当输入为True时取值为1，否则取值为0）。应理解，前述类别真值根据训练集包括的多种目标对象的标注数据确定。
步骤904、目标检测模型训练装置将目标损失值、坐标损失值、以及类别损失值相加，得到第一损失值。
相对应的,在一种可能的实现方式中,目标检测模型训练装置将目标损失函数、坐标损失函数、以及类别损失函数相加,将相加后的公式结果作为第一损失函数。
基于上述技术方案，本公开通过检测模型中三种通道层输出的结果以及训练集，来确定出检测模型针对多种目标对象的检测结果的第一损失函数，该第一损失函数能够反映检测模型的检测结果与标注数据中的正确结果之间的差距，以便于后续流程中对检测模型中的参数进行调整，使得检测模型的检测结果逐步接近标注数据中的正确结果。
以下对目标检测模型训练装置对目标检测模型进行准确率的验证的过程进行具体介绍。
作为本公开的一种可能的实施例,结合图6,如图10所示,在步骤602之后还包括步骤1001-步骤1002:
步骤1001、目标检测模型训练装置获取验证集。
其中，验证集包括多种目标对象的标注数据。示例性地，验证集包括多个验证数据集，每个验证数据集包括图像数据和对一种或多种类别的目标对象标注的数据。可以理解的是，验证集所标注目标对象的类别，与步骤601中训练集所标注目标对象的类别是相同的。
示例性地,用于训练检测模型的训练集,对应标注的目标对象的类别分别是人、机动车和非机动车。则用于验证目标检测模型的验证集,对应标注的目标对象的类别也分别是人、机动车和非机动车。
步骤1002、目标检测模型训练装置将多个验证数据集分别输入目标检测模型,得到多个验证数据集下的准确率。
可选地，目标检测模型训练装置根据验证集确定目标检测模型的验证检测结果。可以理解的是，目标检测模型训练装置根据验证集确定验证检测结果的方式，与目标检测模型训练装置根据训练集确定多种目标对象的检测结果的方式是相同的。具体可参考前述步骤801-步骤803中的叙述。
可选地，目标检测模型训练装置根据目标检测模型的验证检测结果，确定目标检测模型的准确率。此处准确率的表现形式可以为mAP。
基于上述技术方案，本公开还能在检测模型训练完成后，根据验证集进一步地验证目标检测模型的准确率，使得检测模型在投入实际应用时能够有更好的检测效果。
作为本公开的一种可能的实施例,结合图6,如图11所示,本公开还提供的一种目标检测模型训练方法,包括以下步骤1101-步骤1103:
步骤1101、目标检测模型训练装置获取训练集。
其中,训练集包括多个训练数据集,每个训练数据集包括一种或多种类别的目标对象的标注数据,多个数据集中的至少两个数据集标注目标对象的类别不同。
应理解,此处的训练集与前文步骤601所描述的训练集相同,本实施例不再赘述。
步骤1102、目标检测模型训练装置确定最优检测模型。
其中，最优检测模型为历史训练检测模型中准确率最高的检测模型，历史训练检测模型包括每一次迭代训练后更新过参数的检测模型。
可选地，此处的准确率采用mAP值来评估，也即每一次迭代训练后更新过参数的检测模型中，mAP值最高的检测模型为最优检测模型。
需要说明的是，本实施例中的检测模型的结构可采用与前述实施例中相同的结构，也即与前文步骤502中所描述的相同。或者，本实施例中的检测模型的结构也可采用其他卷积结构的模型。为了便于说明，下文以本实施例中的检测模型的结构采用与前述实施例中相同的结构为例，对本实施例的方案进行介绍。
步骤1103、目标检测模型训练装置根据训练集,对检测模型进行迭代训练,并根据最优检测模型对训练集进行伪标签标注,继续训练检测模型得到目标检测模型。
其中,目标检测模型训练装置根据训练集,对检测模型进行迭代训练的过程参见前文步骤701-步骤704,此处不再赘述。
在一种可能的实现方式中,目标检测模型训练装置根据最优检测模型对训练集进行伪标签标注,继续训练检测模型得到目标检测模型,可包括:根据最优检测模型,对训练集中每个训练数据集的缺失目标对象进行伪标签标注,得到正样本标签数据和负样本标签数据;其中,缺失目标对象为训练数据集未标注类别的目标对象;进而,目标检测模型训练装置根据所述正样本标签数据和正样本损失函数确定正样本损失值,根据所述负样本标签数据和负样本损失函数确定负样本损失值;最终,目标检测模型训练装置根据总损失值,调整所述检测模型的参数。
需要说明的是,目标检测模型训练装置根据最优检测模型对训练集进行伪标签标注,继续训练检测模型得到目标检测模型的具体流程可参见下述步骤1201-步骤1205,此处不再赘述。
其中,目标检测模型即为经过迭代训练后,满足预设要求能够用于实际应用的检测模型。示例性地,满足预设要求可以是检测模型的检测结果的总损失函数达到收敛。或者,检测模型的检测结果的准确率达到预设要求百分比,此处的准确率可采用mAP值。
应理解,总损失值可由第一损失值、正样本损失值和负样本损失值确定。对应的,所述总损失函数包括第一损失函数、正样本损失函数、负样本损失函数。
基于上述技术方案，本公开实施例在每一次检测模型的迭代训练过程中，确定出历史准确率最高的最优检测模型，并由最优检测模型对训练集进行伪标签标注，由此结合对训练集进行伪标签标注后得到的标注数据和真实训练集的标注数据，对检测模型进行融合训练，提高了最终得出的目标检测模型在跨场景下的检测召回率。并且，在目标检测模型的实际应用过程中，能够实现更高的检测准确率。
以下，结合上述步骤1103，对目标检测模型训练装置根据最优检测模型对训练集进行伪标签标注、继续训练检测模型的过程进行具体介绍：
作为本公开的一种可能的实施例,结合图11,如图12所示,上述步骤1103具体包括以下步骤1201-步骤1205:
步骤1201、目标检测模型训练装置根据最优检测模型,对训练集中每个训练数据集的缺失目标对象进行伪标签标注,得到正样本标签数据和负样本标签数据。
可选地,目标检测模型训练装置将训练集输入最优检测模型,确定最优检测模型对于每个目标对象的检测得分。
可选地,检测得分可实现为最优检测模型对于目标对象的置信度得分。
进一步地,正样本标签数据的判断方法为:对于每个目标对象,若最优检测模型对于目标对象的检测得分大于或等于正样本得分阈值,则确定该目标对象对应的标注数据为正样本标签数据。
以及,负样本标签数据的判断方法为:对于每个目标对象,若最优检测模型对于目标对象的检测得分小于或等于负样本得分阈值,则确定该目标对象对应的标注数据为负样本标签数据。
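上述按检测得分与正、负样本得分阈值的比较结果划分伪标签的规则，可示意如下（阈值取值为假设；得分介于两个阈值之间的检测结果既不作为正样本也不作为负样本）：

```python
def assign_pseudo_labels(detections, th_pos=0.8, th_neg=0.2):
    """detections: {目标编号: 最优检测模型给出的检测得分}。

    得分大于或等于正样本得分阈值 → 正样本标签数据;
    得分小于或等于负样本得分阈值 → 负样本标签数据;
    其余检测结果不参与伪标签训练。
    """
    pos, neg = [], []
    for obj_id, score in detections.items():
        if score >= th_pos:
            pos.append(obj_id)
        elif score <= th_neg:
            neg.append(obj_id)
    return pos, neg
```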
需要说明的是,正样本得分阈值和负样本得分阈值的确定过程参见下述步骤1301-步骤1303,此处不再赘述。
步骤1202、目标检测模型训练装置根据正样本标签数据和正样本损失函数,确定正样本损失值。
在一种可能的实现方式中，正样本损失函数满足以下公式4：

Loss_{pos} = \sum_{s} 1\big(score(s) \geq TH_{pos}\big) \cdot BCELoss\big(P_{pos}(s),\, 1\big)

其中，Losspos表示所述正样本损失值，score(s)表示所述每个缺失目标对象的检测得分，THpos表示所述正样本得分阈值，BCELoss表示BCE损失函数，Ppos(s)表示第s个所述正样本标签数据的Anchor对应的预测值。
步骤1203、目标检测模型训练装置根据负样本标签数据和负样本损失函数,确定负样本损失值。
在一种可能的实现方式中，负样本损失函数满足以下公式5：

Loss_{neg} = \sum_{s} 1\big(score(s) \leq TH_{neg}\big) \cdot BCELoss\big(P_{neg}(s),\, 0\big)

其中，Lossneg表示所述负样本损失值，score(s)表示所述每个缺失目标对象的检测得分，THneg表示所述负样本得分阈值，BCELoss表示BCE损失函数，Pneg(s)表示第s个所述负样本标签数据的Anchor对应的预测值。
步骤1204、目标检测模型训练装置根据总损失值,调整检测模型的参数。
其中,总损失值根据第一损失值、正样本损失值和负样本损失值确定。
可以理解的是，若本实施例中的检测模型的结构与前述步骤502中所描述的相同，则本实施例中的总损失值由第一损失值、正样本损失值和负样本损失值进行加权求和确定。其中，第一损失值的计算方式参见前文。
示例性地,目标检测模型训练装置预先确定第一权值、第二权值和第三权值,进而将第一权值与第一损失值的乘积、第二权值和正样本损失值的乘积、以及第三权值和负样本损失值的乘积,相加计算得出总损失值。
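上述加权求和计算总损失值的方式，可示意如下（三个权值的取值均为假设）：

```python
def total_loss(l_first, l_pos, l_neg, w1=1.0, w2=0.5, w3=0.5):
    """总损失值 = 第一权值×第一损失值 + 第二权值×正样本损失值
               + 第三权值×负样本损失值。"""
    return w1 * l_first + w2 * l_pos + w3 * l_neg
```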
步骤1205、目标检测模型训练装置将总损失函数收敛时的检测模型确定为目标检测模型。
其中，总损失函数包括第一损失函数、正样本损失函数、负样本损失函数。对应前述步骤1204的说明，总损失函数即为第一损失函数与第一权值相乘、正样本损失函数与第二权值相乘、负样本损失函数与第三权值相乘后，三者的乘积相加得出。
基于上述技术方案,本公开实施例能够在检测模型的迭代训练过程中,通过最优检测模型对训练集进行伪标签标注,确定训练集中的正样本标签数据和负样本标签数据,进而求得相应的损失值,并据此不断更新检测模型中的参数,使得每次输出的检测结果更加接近训练集中对目标对象的标注数据所反映的正确结果。由此得出的目标检测模型即可在后续实际应用中良好地完成对目标对象的检测。
以下,结合步骤1201,对正样本得分阈值和负样本得分阈值的确定流程进行说明:
作为本公开的一种可能的实施例,结合图12,如图13所示,上述步骤1201具体包括以下步骤1301-步骤1304:
步骤1301、目标检测模型训练装置获取验证集。
其中,验证集包括与多个训练数据集一一对应的多个验证数据集,每个验证数据集包括一种或多种目标对象的标注数据,检测模型的准确率根据验证集确定。
应理解,此处的验证集与前文描述的验证集相同,此处不再做过多说明。
步骤1302、目标检测模型训练装置确定最优检测模型对于验证集中每个目标对象的检测得分。
需要说明的是,检测得分是对最优检测模型一次检测结果的量化参数。具体对于一个目标对象,通过检测模型确定检测得分的过程,本实施例在此不再叙述。
步骤1303、目标检测模型训练装置根据每个目标对象的检测得分和预设召回率,确定负样本得分阈值。
示例性地，目标检测模型训练装置将预设召回率设置为0.95。此时，目标检测模型训练装置设置初始的负样本得分阈值并不断调整，直至最优检测模型对于全部目标对象的检测得分的召回率满足预设召回率0.95，则将此时的负样本得分阈值输出，作为最终的负样本得分阈值。
步骤1304、目标检测模型训练装置根据每个目标对象的检测得分和预设精度,确定正样本得分阈值。
示例性地，目标检测模型训练装置将预设精度设置为0.95。此时，目标检测模型训练装置设置初始的正样本得分阈值并不断调整，直至最优检测模型对于全部目标对象的检测得分的精度满足预设精度0.95，则将此时的正样本得分阈值输出，作为最终的正样本得分阈值。
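上述"不断调整得分阈值，直至满足预设召回率或预设精度"的过程，可用如下阈值搜索示意（候选阈值取自检测得分本身，得分与真值均为假设的占位数据，仅作流程示意）：

```python
def calibrate_thresholds(scores, labels, recall_target=0.95, precision_target=0.95):
    """scores: 最优检测模型对每个目标对象的检测得分;
    labels: 对应的真值(1表示确实存在目标对象, 0表示不存在, 至少含一个1)。

    负样本得分阈值: 取能使召回率(正例中得分不低于阈值的比例)仍满足
    recall_target 的最大候选阈值; 正样本得分阈值: 取能使精度(得分不低于
    阈值的样本中正例比例)满足 precision_target 的最小候选阈值。
    """
    pos_total = sum(labels)
    th_neg = th_pos = None
    for th in sorted(set(scores)):                 # 候选阈值从低到高搜索
        kept = [(s, y) for s, y in zip(scores, labels) if s >= th]
        recall = sum(y for _, y in kept) / pos_total
        precision = sum(y for _, y in kept) / len(kept)
        if recall >= recall_target:
            th_neg = th                            # 记录仍满足召回率的最大阈值
        if precision >= precision_target and th_pos is None:
            th_pos = th                            # 记录满足精度的最小阈值
    return th_pos, th_neg
```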
基于上述技术方案,本公开实施例能够基于验证集和从历史检测模型中确定出的最优检测模型,确定出用于确定正样本标签数据、负样本标签数据的正样本得分阈值和负样本得分阈值,以便于后续训练过程的顺利进行。
本公开实施例可以根据上述方法示例对目标检测系统、目标检测模型训练装置进行功能模块或者功能单元的划分,例如,可以对应各个功能划分各个功能模块或者功能单元,也可以将两个或两个以上的功能集成在一个处理模块中。上述集成的模块既可以采用硬件的形式实现,也可以采用软件功能模块或者功能单元的形式实现。其中,本公开实施例中对模块或者单元的划分是示意性的,仅仅为一种逻辑功能划分,实际实现时可以有另外的划分方式。
如图14所示,为根据一些实施例提供的一种目标检测装置1400的结构示意图,该装置包括:获取单元1401和处理单元1402。
其中,获取单元1401,被配置为获取待检测图像。
所述处理单元1402,被配置为采用目标检测模型对所述待检测图像进行处理,得到所述待检测图像中待检测目标对应的目标检测结果。
在一些实施例中，处理单元1402，还被配置为：在与坐标通道层对应的目标通道层计算得到的检测结果大于或等于阈值的情况下，获取坐标通道层的坐标预测值。
在一些实施例中,处理单元1402,还被配置为:在与坐标通道层对应的目标通道层计算得到的检测结果小于阈值的情况下,不获取坐标通道层的坐标预测值。
可选地，目标检测装置1400还可以包括存储单元（图14中以虚线框示出），该存储单元存储有程序或指令。当处理单元1402执行该程序或指令时，使得目标检测装置1400可以执行上述方法实施例所述的目标检测方法。
此外,图14所述的目标检测装置1400的技术效果可以参考上述实施例所述的目标检测方法的技术效果,此处不再赘述。
如图15所示,为根据一些实施例提供的一种目标检测模型训练装置1500的结构示意图,该装置包括:获取单元1501和处理单元1502。
其中,获取单元1501,被配置为:获取训练集。训练集包括多个训练数据集,每个训练数据集包括一种或多种类别的目标对象的标注数据,多个训练数据集中的至少两个数据集标注目标对象的类别不同。
处理单元1502，被配置为：根据训练集对检测模型进行迭代训练，得到训练后的目标检测模型。
在一些实施例中，处理单元1502，还被配置为：针对每一次迭代，将训练集输入检测模型，确定多种目标对象的检测结果。
在一些实施例中，处理单元1502，还被配置为：根据多种目标对象的检测结果和第一损失函数计算第一损失值，并调整检测模型的参数。第一损失函数包括目标损失函数、坐标损失函数、以及类别损失函数。
在一些实施例中，处理单元1502，还被配置为：将第一损失函数收敛时的检测模型确定为训练后的目标检测模型。
在一些实施例中,获取单元1501,还被配置为:获取验证集。验证集包括与多个训练数据集一一对应的多个验证数据集,每个验证数据集包括一种或多种目标对象的标注数据。
在一些实施例中,处理单元1502,还被配置为:将多个验证数据集分别输入目标检测模型,得到多个验证数据集下的准确率。
在一些实施例中,处理单元1502,还被配置为:将多个验证数据集下的准确率进行加和计算,作为训练后的目标检测模型的总准确率。或,将多个验证数据集的准确率,共同作为训练后的目标检测模型的总准确率。
可选地，目标检测模型训练装置1500还可以包括存储单元（图15中以虚线框示出），该存储单元存储有程序或指令。当处理单元1502执行该程序或指令时，使得目标检测模型训练装置1500可以执行上述方法实施例所述的目标检测模型训练方法。
此外，图15所述的目标检测模型训练装置1500的技术效果可以参考上述实施例所述的目标检测模型训练方法的技术效果，此处不再赘述。
如图16所示,为根据一些实施例提供的一种目标检测模型训练装置1600的结构示意图,该装置包括:获取单元1601和处理单元1602。
其中，获取单元1601，被配置为：获取训练集。训练集包括多个训练数据集，每个训练数据集包括一种或多种类别的目标对象的标注数据，多个数据集中的至少两个数据集标注目标对象的类别不同。
处理单元1602,被配置为:确定最优检测模型。最优检测模型为历史训练检测模型中准确率最高的检测模型,历史训练检测模型包括每一次迭代训练后更新过参数的检测模型。
处理单元1602,还被配置为:根据训练集,对检测模型进行迭代训练,并根据最优检测模型对训练集进行伪标签标注,继续训练检测模型得到目标检测模型。
在一些实施例中,处理单元1602,还被配置为:根据最优检测模型,确定伪标签数据。其中,伪标签数据包括多种缺失目标对象的标注数据,并且缺失目标对象的类别,与训练集包括的标注数据对应的目标对象的类别不同。
在一些实施例中,处理单元1602,还被配置为:根据伪标签数据,对训练集中缺失的目标对象进行标注,得到正样本标签数据和负样本标签数据。
在一些实施例中,处理单元1602,还被配置为:根据正样本标签数据和正样本损失函数,确定正样本损失值。
在一些实施例中,处理单元1602,还被配置为:根据负样本标签数据和负样本损失函数,确定负样本损失值。
在一些实施例中,处理单元1602,还被配置为:根据总损失值,调整检测模型的参数。总损失值根据第一损失值、正样本损失值和负样本损失值确定。
在一些实施例中,处理单元1602,还被配置为:将总损失函数收敛时的检测模型确定为目标检测模型。总损失函数包括第一损失函数、正样本损失函数、负样本损失函数。
在一些实施例中,处理单元1602,还被配置为:将训练集输入最优检测模型,确定最优检测模型对于每个缺失目标对象的检测得分。
在一些实施例中,处理单元1602,还被配置为:对于每个目标对象,若最优检测模型对于缺失目标对象的检测得分大于或等于正样本得分阈值,则确定缺失目标对象对应的标注数据为正样本标签数据。
在一些实施例中,处理单元1602,还被配置为:对于每个目标对象,若最优检测模型对于缺失目标对象的检测得分小于或等于负样本得分阈值,则确定缺失目标对象对应的标注数据为负样本标签数据。
在一些实施例中,获取单元1601,还被配置为:获取验证集;验证集包括与多个训练数据集一一对应的多个验证数据集,每个验证数据集包括一种或多种目标对象的标注数据,检测模型的准确率根据验证集确定。
在一些实施例中,处理单元1602,还被配置为:确定最优检测模型对于验证集中每个目标对象的检测得分。
在一些实施例中,处理单元1602,还被配置为:根据每个目标对象的检测得分和预设召回率,确定负样本得分阈值。
在一些实施例中,处理单元1602,还被配置为:根据每个目标对象的检测得分和预设精度,确定正样本得分阈值。
在一些实施例中,处理单元1602,还被配置为:确定第一权值、第二权值和第三权值。
在一些实施例中,处理单元1602,还被配置为:根据第一权值与第一损失值的乘积、第二权值和正样本损失值的乘积、以及第三权值和负样本损失值的乘积,确定总损失值。
可选地，目标检测模型训练装置1600还可以包括存储单元（图16中以虚线框示出），该存储单元存储有程序或指令。当处理单元1602执行该程序或指令时，使得目标检测模型训练装置1600可以执行上述方法实施例所述的目标检测模型训练方法。
此外，图16所述的目标检测模型训练装置1600的技术效果可以参考上述实施例所述的目标检测模型训练方法的技术效果，此处不再赘述。
图17示出了上述实施例中所涉及的目标检测装置的又一种可能的结构示意图。该目标检测装置1700包括:处理器1702和通信接口1703。处理器1702被配置为对目标检测装置1700的动作进行控制管理,例如,执行上述获取单元1401、处理单元1402执行的步骤,和/或被配置为执行本文所描述的技术的其它过程。通信接口1703被配置为支持目标检测装置1700与其他网络实体的通信。目标检测装置1700还可以包括存储器1701和总线1704,存储器1701被配置为存储目标检测装置1700的程序代码和数据。
其中,存储器1701可以是目标检测装置1700中的存储器等,该存储器可以包括易失性存储器,例如随机存取存储器;该存储器也可以包括非易失性存储器,例如只读存储器,快闪存储器,硬盘或固态硬盘;该存储器还可以包括上述种类的存储器的组合。
上述处理器1702可以实现或执行结合本公开的公开内容所描述的各种示例性的逻辑方框、模块和电路。该处理器可以是中央处理器、通用处理器、数字信号处理器、专用集成电路、现场可编程门阵列或者其他可编程逻辑器件、晶体管逻辑器件、硬件部件或者其任意组合。该处理器也可以是实现计算功能的组合，例如包含一个或多个微处理器的组合、DSP和微处理器的组合等。
总线1704可以是扩展工业标准结构(Extended Industry Standard Architecture,EISA)总线等。总线1704可以分为地址总线、数据总线、控制总线等。为便于表示,图17中仅用一条粗线表示,但并不表示仅有一根总线或一种类型的总线。
图17中的目标检测装置1700还可以为芯片。该芯片包括一个或两个以上(包括两个)处理器1702和通信接口1703。
可选地,该芯片还包括存储器1701,存储器1701可以包括只读存储器和随机存取存储器,并向处理器1702提供操作指令和数据。存储器1701的一部分还可以包括非易失性随机存取存储器(non-volatile random access memory,NVRAM)。
在一些实施方式中,存储器1701存储了如下的元素,执行模块或者数据结构,或者他们的子集,或者他们的扩展集。
在本公开实施例中,通过调用存储器1701存储的操作指令(该操作指令可存储在操作系统中),执行相应的操作。
图18示出了上述实施例中所涉及的目标检测模型训练装置的又一种可能的结构示意图。该目标检测模型训练装置1800包括:处理器1802和通信接口1803。处理器1802被配置为对目标检测模型训练装置1800的动作进行控制管理,例如,执行上述获取单元1501、处理单元1502、获取单元1601、处理单元1602执行的步骤,和/或被配置为执行本文所描述的技术的其它过程。通信接口1803被配置为支持目标检测模型训练装置1800与其他网络实体的通信。目标检测模型训练装置1800还可以包括存储器1801和总线1804,存储器1801被配置为存储目标检测模型训练装置1800的程序代码和数据。
其中,存储器1801可以是目标检测模型训练装置1800中的存储器等,该存储器可以包括易失性存储器,例如随机存取存储器;该存储器也可以包括非易失性存储器,例如只读存储器,快闪存储器,硬盘或固态硬盘;该存储器还可以包括上述种类的存储器的组合。
上述处理器1802可以实现或执行结合本公开的公开内容所描述的各种示例性的逻辑方框、模块和电路。该处理器可以是中央处理器、通用处理器、数字信号处理器、专用集成电路、现场可编程门阵列或者其他可编程逻辑器件、晶体管逻辑器件、硬件部件或者其任意组合。该处理器也可以是实现计算功能的组合，例如包含一个或多个微处理器的组合、DSP和微处理器的组合等。
总线1804可以是扩展工业标准结构(Extended Industry Standard Architecture,EISA)总线等。总线1804可以分为地址总线、数据总线、控制总线等。为便于表示,图18中仅用一条粗线表示,但并不表示仅有一根总线或一种类型的总线。
图18中的目标检测模型训练装置1800还可以为芯片。该芯片包括一个或两个以上(包括两个)处理器1802和通信接口1803。
可选地,该芯片还包括存储器1801,存储器1801可以包括只读存储器和随机存取存储器,并向处理器1802提供操作指令和数据。存储器1801的一部分还可以包括非易失性随机存取存储器(non-volatile random access memory,NVRAM)。
在一些实施方式中,存储器1801存储了如下的元素,执行模块或者数据结构,或者他们的子集,或者他们的扩展集。
在本公开实施例中,通过调用存储器1801存储的操作指令(该操作指令可存储在操作系统中),执行相应的操作。
通过以上的实施方式的描述,所属领域的技术人员可以清楚地了解到,为描述的方便和简洁,仅以上述各功能模块的划分进行举例说明,实际应用中,可以根据需要而将上述功能分配由不同的功能模块完成,即将装置的内部结构划分成不同的功能模块,以完成以上描述的全部或者部分功能。上述描述的系统,装置和单元的具体工作过程,可以参考前述方法实施例中的对应过程,在此不再赘述。
本公开的一些实施例提供了一种计算机可读存储介质（例如，非暂态计算机可读存储介质），该计算机可读存储介质中存储有计算机程序指令，计算机程序指令在计算机（例如，检测器训练装置）上运行时，使得计算机执行如上述实施例中任一实施例所述的目标检测方法及目标检测器模型训练方法。
示例性地,上述计算机可读存储介质可以包括,但不限于:磁存储器件(例如,硬盘、软盘或磁带等),光盘(例如,CD(Compact Disk,压缩盘)、DVD(Digital Versatile Disk,数字通用盘)等),智能卡和闪存器件(例如,EPROM(Erasable Programmable Read-Only Memory,可擦写可编程只读存储器)、卡、棒或钥匙驱动器等)。本公开描述的各种计算机可读存储介质可代表用于存储信息的一个或多个设备和/或其它机器可读存储介质。术语“机器可读存储介质”可包括但不限于,无线信道和能够存储、包含和/或承载指令和/或数据的各种其它介质。
本公开的一些实施例还提供了一种计算机程序产品,例如该计算机程序产品存储在非瞬时性的计算机可读存储介质上。该计算机程序产品包括计算机程序指令,在计算机(例如,检测器训练装置)上执行该计算机程序指令时,该计算机程序指令使计算机执行如上述实施例所述的目标检测方法及目标检测器模型训练方法。
本公开的一些实施例还提供了一种计算机程序。当该计算机程序在计算机(例如,检测器训练装置)上执行时,该计算机程序使计算机执行如上述实施例所述的目标检测方法及目标检测器模型训练方法。
上述计算机可读存储介质、计算机程序产品及计算机程序的有益效果和上述一些实施例所述的目标检测方法及目标检测器模型训练方法的有益效果相同,此处不再赘述。
在本公开所提供的几个实施例中,应该理解到,所揭露的系统、设备和方法,可以通过其它的方式实现。例如,以上所描述的设备实施例仅仅是示意性的,例如,所述单元的划分,仅仅为一种逻辑功能划分,实际实现时可以有另外的划分方式,例如多个单元或组件可以结合或者可以集成到另一个系统,或一些特征可以忽略,或不执行。另一点,所显示或讨论的相互之间的耦合或直接耦合或通信连接可以是通过一些接口,设备或单元的间接耦合或通信连接,可以是电性,机械或其它的形式。
所述作为分离部件说明的单元可以是或者也可以不是物理上分开的,作为单元显示的部件可以是或者也可以不是物理单元,即可以位于一个地方,或者也可以分布到多个网络单元上。可以根据实际的需要选择其中的部分或者全部单元来实现本实施例方案的目的。
另外,在本公开各个实施例中的各功能单元可以集成在一个处理单元中,也可以是各个单元单独物理存在,也可以两个或两个以上单元集成在一个单元中。
以上所述,仅为本公开的具体实施方式,但本公开的保护范围并不局限于此,任何熟悉本技术领域的技术人员在本公开揭露的技术范围内,想到变化或替换,都应涵盖在本公开的保护范围之内。因此,本公开的保护范围应以所述权利要求的保护范围为准。

Claims (21)

  1. 一种目标检测方法,包括:
    获取待检测图像;
    采用目标检测模型对所述待检测图像进行处理,得到所述待检测图像中待检测目标对应的目标检测结果;
    其中,所述目标检测模型包括特征提取网络和目标预测网络;
    所述特征提取网络用于对所述待检测图像进行特征提取得到多种目标对象相关的图像特征;
    所述目标检测网络用于对所述图像特征进行处理得到所述目标检测结果;
    所述目标检测网络包括类别通道层、多个目标通道层和多个坐标通道层;所述目标通道层用于输出表征是否存在目标对象的检测预测值,每个所述目标通道层用于检测所述多种目标对象中的至少一种,多个所述目标通道层用于检测的目标对象的类别不同;所述类别通道层用于输出所述多种目标对象对应的类别预测值;所述坐标通道层用于输出目标对象对应的坐标预测值;所述目标检测结果是基于所述检测预测值、所述类别预测值和所述坐标预测值计算得到的。
  2. 根据权利要求1所述的方法,其中,所述多个所述坐标通道层和多个所述目标通道层一一对应,每个坐标通道层和与其对应的目标通道层检测的目标对象的类别相同;所述坐标通道层用于在对应的目标通道层检测到所述目标对象时获取所述目标对象的坐标预测值。
  3. 根据权利要求2所述的方法,其中,所述目标检测结果包括检测结果和坐标结果;所述检测结果为根据所述目标通道层的检测预测值和对应的类别预测值融合计算得到;
    所述坐标通道层用于在对应的所述目标通道层检测到所述目标对象时确定所述目标对象的坐标预测值,包括:在与所述坐标通道层对应的目标通道层计算得到的所述检测结果大于或等于阈值的情况下,获取所述坐标通道层的坐标预测值;在与所述坐标通道层对应的目标通道层计算得到的所述检测结果小于阈值的情况下,不获取所述坐标通道层的坐标预测值。
  4. 根据权利要求3所述的方法，其中，所述类别通道层、所述多个目标通道层和所述多个坐标通道层的结构为卷积结构；所述卷积结构的卷积核大小为一乘一。
  5. 根据权利要求4所述的方法，其中，所述特征提取网络包括主干网络和过渡网络，所述主干网络用于根据所述待检测图像确定通用类别的图像特征，所述过渡网络用于根据所述通用类别的图像特征确定与所述多种目标对象相关的图像特征。
  6. 一种目标检测模型训练方法,包括:
    获取训练集;所述训练集包括多个训练数据集,每个所述训练数据集包括一种或多种类别的目标对象的标注数据,多个所述训练数据集中的至少两个数据集标注目标对象的类别不同;
    根据所述训练集对检测模型进行迭代训练,得到目标检测模型;
    其中,所述目标检测模型包括特征提取网络和目标预测网络;
    所述特征提取网络用于对所述待检测图像进行特征提取得到多种目标对象相关的图像特征;
    所述目标检测网络用于对所述图像特征进行处理得到所述目标检测结果;
    所述目标检测网络包括类别通道层、多个目标通道层和多个坐标通道层;所述目标通道层用于输出表征是否存在目标对象的检测预测值,每个所述目标通道层用于检测所述多种目标对象中的至少一种,多个所述目标通道层用于检测的目标对象的类别不同;所述类别通道层用于输出所述多种目标对象对应的类别预测值;所述坐标通道层用于输出目标对象对应的坐标预测值;所述目标检测结果是基于所述检测预测值、所述类别预测值和所述坐标预测值计算得到的。
  7. 根据权利要求6所述的方法,其中,所述根据所述训练集对检测模型进行迭代训练,得到目标检测模型,包括:
    针对每一次迭代,将所述训练集输入所述检测模型,确定所述多种目标对象的检测结果;
    根据所述多种目标对象的检测结果和第一损失函数计算第一损失值,并调整所述检测模型的参数;所述第一损失函数包括目标损失函数、坐标损失函数、以及类别损失函数;
    将所述第一损失函数收敛时的检测模型确定为所述目标检测模型。
  8. 根据权利要求7所述的方法,其中,所述目标损失函数满足以下公式:
    其中，Lobj+表示所述训练集中正样本的目标损失值，NP表示所述目标通道层的总数量，b表示所述目标通道层的编号，Target(b)表示第b个所述目标通道层对应的所述正样本的Anchor集合，BCELoss表示BCE损失函数，s表示所述正样本的编号，Pobj(s,b)表示第b个所述目标通道层与第s个所述正样本的Anchor对应的目标预测值，GTobj(s)表示第s个正样本的Anchor对应的目标真值；Lobj-表示所述训练集中负样本的目标损失值，1(……)为取值函数，当输入为True时取值为1，否则取值为0，Lobj(b)表示第b个所述目标通道层对应的所述目标对象的类别子集，Ldata表示当前训练数据所标注的目标对象的类别集合，H表示目标通道层输出的数据矩阵的行数，W表示目标通道层输出的数据矩阵的列数，p表示像素点的编号，Anchor表示全部的Anchor集合，a表示像素点p的Anchor，Mask(p,a)表示所述像素点p对应的位置是否有标注框，Pobj(p,a,b)表示第b个所述目标通道层输出的所述像素点p的第a个Anchor的目标预测值，GTobj(p,a)表示所述像素点p的第a个Anchor的目标真值。
  9. 根据权利要求7所述的方法,其中,所述坐标损失函数满足以下公式:
    其中,Lbox表示坐标损失值,NP表示所述目标通道层的总数量,b表示所述目标通道层的编号,Target(b)表示第b个所述目标通道层对应的正样本的Anchor集合,IOU表示重叠度(intersection over union,IOU)计算函数,s表示所述正样本的编号,Pbox(s,b)表示第b个所述目标通道层输出的第s个所述正样本的坐标预测值,GTbox(s)表示第s个所述正样本的坐标真值。
  10. 根据权利要求7所述的方法,其中,所述类别损失函数满足以下公式:
    其中,Lcls表示类别损失值,Class表示所述目标对象的类别总数,1[……]为取值函数,当输入为True时取值为1,否则取值为0,b表示所述目标通道层的编号,Bcls(b)表示第b个所述目标通道层对应的第二类别的集合,Len(Bcls(b))表示第b个目标通道层对应的所述目标对象的类别子集,H表示所述目标通道层输出的数据矩阵的行数,W表示所述目标通道层输出的数据矩阵的列数,Anchor表示全部的Anchor集合,Mask(p,a)表示像素点p对应的位置是否有标注框,BCELoss表示BCE损失函数,Pcls(p,a,c)表示类别预测值,GTcls(p,a,c)表示类别真值。
  11. 根据权利要求6-10任一项所述的方法,其中,还包括:
    获取验证集;所述验证集包括与所述多个训练数据集一一对应的多个验证数据集,每个所述验证数据集包括一种或多种目标对象的标注数据;
    将多个所述验证数据集分别输入所述目标检测模型,得到多个所述验证数据集下的准确率;
    将多个所述验证数据集下的准确率进行加和计算,作为所述训练后的目标检测模型的总准确率;或,将多个所述验证数据集的准确率,共同作为训练后的目标检测模型的总准确率。
  12. 一种目标检测模型训练方法,包括:
    获取训练集;所述训练集包括多个训练数据集,每个所述训练数据集包括一种或多种类别的目标对象的标注数据,多个所述数据集中的至少两个数据集标注目标对象的类别不同;
    确定最优检测模型;所述最优检测模型为历史训练检测模型中准确率最高的检测模型,所述历史训练检测模型包括每一次迭代训练后更新过参数的所述检测模型;
    根据所述训练集,对所述检测模型进行迭代训练,并根据所述最优检测模型对所述训练集进行伪标签标注,继续训练所述检测模型得到所述目标检测模型。
  13. 根据权利要求12所述的方法,其中,所述根据所述最优检测模型对所述训练集进行伪标签标注,得到所述目标检测模型,包括:
    根据所述最优检测模型,对所述训练集中每个所述训练数据集的缺失目标对象进行伪标签标注,得到正样本标签数据和负样本标签数据;其中,所述缺失目标对象为所述训练数据集未标注类别的目标对象;
    根据所述正样本标签数据和正样本损失函数,确定正样本损失值;
    根据所述负样本标签数据和负样本损失函数,确定负样本损失值;
    根据总损失值,调整所述检测模型的参数;所述总损失值根据第一损失值、所述正样本损失值和所述负样本损失值确定;
    将总损失函数收敛时的检测模型确定为所述目标检测模型;所述总损失函数包括第一损失函数、正样本损失函数、负样本损失函数。
  14. 根据权利要求13所述的方法,其中,所述根据所述最优检测模型,对所述训练集中的缺失目标对象进行标注,得到正样本标签数据和负样本标签数据,包括:
    将所述训练集输入所述最优检测模型,确定所述最优检测模型对于每个缺失目标对象的检测得分;
    对于每个缺失目标对象,若所述最优检测模型对于所述缺失目标对象的检测得分大于或等于正样本得分阈值,则确定所述缺失目标对象对应的标注数据为所述正样本标签数据;
    对于每个缺失目标对象，若所述最优检测模型对于所述缺失目标对象的检测得分小于或等于负样本得分阈值，则确定所述缺失目标对象对应的标注数据为所述负样本标签数据。
  15. 根据权利要求14所述的方法,其中,所述正样本得分阈值和所述负样本得分阈值根据以下步骤确定:
    获取验证集;所述验证集包括与所述多个训练数据集一一对应的多个验证数据集,每个所述验证数据集包括一种或多种目标对象的标注数据,所述检测模型的准确率根据所述验证集确定;
    确定所述最优检测模型对于所述验证集中每个目标对象的检测得分;
    根据所述每个目标对象的检测得分和预设召回率,确定负样本得分阈值;
    根据所述每个目标对象的检测得分和预设精度,确定正样本得分阈值。
  16. 根据权利要求12-15中任一项所述的方法,其中,还包括:
    确定第一权值、第二权值和第三权值;
    根据所述第一权值与所述第一损失值的乘积、所述第二权值和所述正样本损失值的乘积、以及所述第三权值和所述负样本损失值的乘积,确定所述总损失值。
  17. 一种目标检测装置,包括:获取单元和处理单元;
    所述获取单元,被配置为获取待检测图像;
    所述处理单元,被配置为采用目标检测模型对所述待检测图像进行处理,得到所述待检测图像中待检测目标对应的目标检测结果;
    其中,所述目标检测模型包括特征提取网络和目标预测网络;
    所述特征提取网络用于对所述待检测图像进行特征提取得到多种目标对象相关的图像特征;
    所述目标检测网络用于对所述图像特征进行处理得到所述目标检测结果;
    所述目标检测网络包括类别通道层、多个目标通道层和多个坐标通道层;所述目标通道层用于输出表征是否存在目标对象的检测预测值,每个所述目标通道层用于检测所述多种目标对象中的至少一种,多个所述目标通道层用于检测的目标对象的类别不同;所述类别通道层用于输出所述多种目标对象对应的类别预测值;所述坐标通道层用于输出目标对象对应的坐标预测值;所述目标检测结果是基于所述检测预测值、类别预测值和坐标预测值计算得到的。
  18. 一种目标检测模型训练装置,包括:获取单元和处理单元;
    所述获取单元,被配置为获取训练集;所述训练集包括多个训练数据集,每个所述训练数据集包括一种或多种类别的目标对象的标注数据,多个所述训练数据集中的至少两个数据集标注目标对象的类别不同;
    所述处理单元,被配置为根据所述训练集对检测模型进行迭代训练,得到目标检测模型;
    其中,所述目标检测模型包括特征提取网络和目标预测网络;
    所述特征提取网络用于对所述待检测图像进行特征提取得到多种目标对象相关的图像特征;
    所述目标检测网络用于对所述图像特征进行处理得到所述目标检测结果;
    所述目标检测网络包括类别通道层、多个目标通道层和多个坐标通道层;所述目标通道层用于输出表征是否存在目标对象的检测预测值,每个所述目标通道层用于检测所述多种目标对象中的至少一种,多个所述目标通道层用于检测的目标对象的类别不同;所述类别通道层用于输出所述多种目标对象对应的类别预测值;所述坐标通道层用于输出目标对象对应的坐标预测值;所述目标检测结果是基于所述检测预测值、类别预测值和坐标预测值计算得到的。
  19. 一种目标检测模型训练装置,包括:获取单元和处理单元;
    获取单元,被配置为获取训练集;所述训练集包括多个训练数据集,每个所述训练数据集包括一种或多种类别的目标对象的标注数据,多个所述数据集中的至少两个数据集标注目标对象的类别不同;
    所述处理单元,被配置为确定最优检测模型;所述最优检测模型为历史训练检测模型中准确率最高的检测模型,所述历史训练检测模型包括每一次迭代训练后更新过参数的所述检测模型;
    所述处理单元,还被配置为根据所述训练集,对所述检测模型进行迭代训练,并根据所述最优检测模型对所述训练集进行伪标签标注,继续训练所述检测模型得到所述目标检测模型。
  20. 一种非暂态计算机可读存储介质,其中,所述非暂态计算机可读存储介质中存储有指令,当计算机执行所述指令时,所述计算机执行上述权利要求1-5中任一项所述的目标检测方法,和/或执行上述权利要求6-11或如权利要求12-16中任一项所述的目标检测模型训练方法。
  21. 一种电子设备，所述电子设备包括存储器和处理器，所述存储器存储计算机程序指令，所述处理器执行所述计算机程序指令时，实现上述权利要求1-5中任一项所述的目标检测方法，和/或执行上述权利要求6-11或如权利要求12-16中任一项所述的目标检测模型训练方法。
PCT/CN2023/078250 2022-02-25 2023-02-24 一种目标检测方法、目标检测模型训练方法及装置 WO2023160666A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202380007919.0A CN116964588A (zh) 2022-02-25 2023-02-24 一种目标检测方法、目标检测模型训练方法及装置

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CNPCT/CN2022/078114 2022-02-25
PCT/CN2022/078114 WO2023159527A1 (zh) 2022-02-25 2022-02-25 检测器训练方法、装置及存储介质

Publications (1)

Publication Number Publication Date
WO2023160666A1 true WO2023160666A1 (zh) 2023-08-31

Family

ID=87764327

Family Applications (2)

Application Number Title Priority Date Filing Date
PCT/CN2022/078114 WO2023159527A1 (zh) 2022-02-25 2022-02-25 检测器训练方法、装置及存储介质
PCT/CN2023/078250 WO2023160666A1 (zh) 2022-02-25 2023-02-24 一种目标检测方法、目标检测模型训练方法及装置

Family Applications Before (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/078114 WO2023159527A1 (zh) 2022-02-25 2022-02-25 检测器训练方法、装置及存储介质

Country Status (2)

Country Link
CN (2) CN117083621A (zh)
WO (2) WO2023159527A1 (zh)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117541782A (zh) * 2024-01-09 2024-02-09 北京闪马智建科技有限公司 对象的识别方法、装置、存储介质及电子装置
CN118038190A (zh) * 2024-04-09 2024-05-14 深圳精智达技术股份有限公司 一种深度原型网络的训练方法、装置及存储介质

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109325538A (zh) * 2018-09-29 2019-02-12 北京京东尚科信息技术有限公司 目标检测方法、装置和计算机可读存储介质
CN111915020A (zh) * 2020-08-12 2020-11-10 杭州海康威视数字技术股份有限公司 检测模型的更新方法、装置及存储介质
US20210042580A1 (en) * 2018-10-10 2021-02-11 Tencent Technology (Shenzhen) Company Limited Model training method and apparatus for image recognition, network device, and storage medium
CN112560999A (zh) * 2021-02-18 2021-03-26 成都睿沿科技有限公司 一种目标检测模型训练方法、装置、电子设备及存储介质
CN113239982A (zh) * 2021-04-23 2021-08-10 北京旷视科技有限公司 检测模型的训练方法、目标检测方法、装置和电子系统

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11853884B2 (en) * 2017-02-10 2023-12-26 Synaptics Incorporated Many or one detection classification systems and methods
CN109871730A (zh) * 2017-12-05 2019-06-11 杭州海康威视数字技术股份有限公司 一种目标识别方法、装置及监控设备
CN110348303A (zh) * 2019-06-06 2019-10-18 武汉理工大学 一种可搭载于无人艇的辅助水面巡逻系统以及水面监测方法
US11238314B2 (en) * 2019-11-15 2022-02-01 Salesforce.Com, Inc. Image augmentation and object detection
CN111860510B (zh) * 2020-07-29 2021-06-18 浙江大华技术股份有限公司 一种x光图像目标检测方法及装置
CN112418278A (zh) * 2020-11-05 2021-02-26 中保车服科技服务股份有限公司 一种多类物体检测方法、终端设备及存储介质
CN112488098A (zh) * 2020-11-16 2021-03-12 浙江新再灵科技股份有限公司 一种目标检测模型的训练方法
CN113095434B (zh) * 2021-04-27 2024-06-11 深圳市商汤科技有限公司 目标检测方法及装置、电子设备、存储介质


Also Published As

Publication number Publication date
CN117083621A (zh) 2023-11-17
WO2023159527A1 (zh) 2023-08-31
CN116964588A (zh) 2023-10-27


Legal Events

Date Code Title Description
WWE Wipo information: entry into national phase

Ref document number: 202380007919.0

Country of ref document: CN

121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23759288

Country of ref document: EP

Kind code of ref document: A1