WO2022000855A1 - Target detection method and device - Google Patents
- Publication number
- WO2022000855A1 (PCT/CN2020/121337)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- sample
- target
- image
- frame
- information
Classifications
- G06F18/00 — Pattern recognition (PHYSICS; COMPUTING; ELECTRIC DIGITAL DATA PROCESSING)
- G06F18/24 — Pattern recognition; Analysing; Classification techniques
- G06N3/02 — Computing arrangements based on biological models; Neural networks
- G06N3/04 — Neural networks; Architecture, e.g. interconnection topology
- G06N3/045 — Neural networks; Combinations of networks
- G06N3/08 — Neural networks; Learning methods
Definitions
- the present invention relates to the technical field of target detection, and in particular, to a target detection method and device.
- in the related art, a target detection model is used to detect the image to be detected, and the obtained detection result generally includes the position information of the target detection frame corresponding to the detection target and the corresponding category probability information.
- during target detection, the model selects the target detection frame position information corresponding to the final output target from the position information of the multiple candidate detection frames predicted for the image to be detected, and uses the category probability information corresponding to the position information of each candidate detection frame for screening.
- the target detection frame position information corresponding to the final output target is thus obtained, wherein the category probability information is a confidence level indicating that the corresponding target belongs to a certain category.
- the invention provides a target detection method and device, so as to determine the accuracy of the detection frame corresponding to the target in the image, and then obtain a detection frame corresponding to the target with better accuracy.
- the specific technical solutions are as follows:
- an embodiment of the present invention provides a target detection method, the method comprising:
- a target detection result corresponding to the to-be-detected image is determined by using a pre-established target detection model and the to-be-detected image, wherein the target detection result includes: target detection frame position information corresponding to the detected target in the to-be-detected image and target frame quality information corresponding to the target detection frame position information; the pre-established target detection model is: a model obtained by training based on the sample image and its corresponding calibration information and the corresponding sample frame quality information; and the sample frame quality information corresponding to the sample image is: information determined based on the calibration frame position information in the calibration information corresponding to the sample image and the predicted frame position information corresponding to the sample image detected based on the initial target detection model corresponding to the pre-established target detection model.
- the sample frame quality information corresponding to the sample image is: the ratio information of the intersection area and the union area between the calibration frame position information in the calibration information corresponding to the sample image and the predicted frame position information corresponding to the sample image detected based on the initial target detection model corresponding to the pre-established target detection model.
- the target detection result further includes: detection category information corresponding to the detection target in the to-be-detected image.
- the method further includes:
- the initial target detection model includes a feature extraction layer, a feature classification layer and a feature regression layer;
- the calibration information includes: calibration frame position information and calibration category information corresponding to the sample targets contained in the corresponding sample images;
- for each sample image, input the sample image into the feature extraction layer, and extract the sample image feature corresponding to the sample image;
- the sample image feature corresponding to the sample image and the prediction frame position information corresponding to the sample target in the sample image are input into the feature classification layer, and the prediction category information and prediction frame quality information corresponding to the sample target in the sample image are determined;
- the expression of the preset positioning quality focusing loss function is:
- LFL(i) = -|p_i - q_i|^γ · ((1 - p_i)·log(1 - q_i) + p_i·log(q_i))
- wherein LFL(i) denotes the loss value between the predicted frame quality information and the real frame quality information corresponding to the i-th sample target in the sample image;
- p_i represents the real frame quality information corresponding to the i-th sample target in the sample image;
- q_i represents the predicted frame quality information corresponding to the i-th sample target in the sample image;
- γ represents the preset parameter.
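As a sketch only, the positioning quality focusing loss for one sample target could be computed as below; the modulating factor |p - q|**gamma is an assumption consistent with the preset parameter γ defined above and with common quality focal loss formulations, and the function name and epsilon guard are illustrative:

```python
import math

def positioning_quality_focal_loss(p, q, gamma=2.0, eps=1e-12):
    """Loss between real frame quality p and predicted frame quality q
    for one sample target (both assumed to lie in [0, 1]).

    The abs(p - q) ** gamma modulating factor is an assumption based on
    standard quality focal loss formulations; gamma is the preset
    parameter from the text.
    """
    # Cross-entropy between the soft quality label p and prediction q.
    ce = -((1.0 - p) * math.log(1.0 - q + eps) + p * math.log(q + eps))
    # Down-weight sample targets whose prediction already matches p.
    return abs(p - q) ** gamma * ce
```

When the predicted quality equals the real quality, the modulating factor drives the loss to zero, which focuses training on sample targets whose frame quality is still badly predicted.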
- the sample frame quality information and sample category information corresponding to the sample image exist in the form of preset soft one-hot encoding, and the position of the sample frame quality information corresponding to the sample image in the preset soft one-hot encoding indicates the sample category information corresponding to the sample image.
- the step of using a pre-established target detection model and the to-be-detected image to determine the target detection result corresponding to the to-be-detected image includes:
- for each detection target in the to-be-detected image, based on a preset suppression algorithm and the target frame quality information corresponding to the position information of each candidate frame corresponding to the detection target, the position information of the candidate frame that satisfies a preset screening condition is determined from the position information of all candidate frames corresponding to the detection target and used as the target detection frame position information corresponding to the detection target, so as to obtain the target detection result corresponding to the image to be detected, wherein the preset screening condition is: the condition that the corresponding target frame quality information among the position information of the candidate frames corresponding to the detection target is the largest.
- an embodiment of the present invention provides a target detection device, the device comprising:
- an obtaining module configured to obtain an image to be detected
- a determination module configured to use a pre-established target detection model and the to-be-detected image to determine a target detection result corresponding to the to-be-detected image, wherein the target detection result includes: the target detection frame position information corresponding to the detected target in the to-be-detected image and the target frame quality information corresponding to the target detection frame position information; the pre-established target detection model is: a model obtained by training based on the sample image and its corresponding calibration information and the corresponding sample frame quality information; and the sample frame quality information corresponding to the sample image is: information determined based on the calibration frame position information in the calibration information corresponding to the sample image and the predicted frame position information corresponding to the sample image detected based on the initial target detection model corresponding to the pre-established target detection model.
- the sample frame quality information corresponding to the sample image is: the ratio information of the intersection area and the union area between the calibration frame position information in the calibration information corresponding to the sample image and the predicted frame position information corresponding to the sample image detected based on the initial target detection model corresponding to the pre-established target detection model.
- the target detection result further includes: detection category information corresponding to the detection target in the to-be-detected image.
- the device further includes:
- a model training module configured to train and obtain the pre-established target detection model used to detect, from the to-be-detected image, the target detection frame position information and the target frame quality information corresponding to the target to be detected, wherein the model training module is specifically configured to obtain the initial target detection model, the initial target detection model including a feature extraction layer, a feature classification layer and a feature regression layer;
- the calibration information includes: calibration frame position information and calibration category information corresponding to the sample targets contained in the corresponding sample images;
- for each sample image, input the sample image into the feature extraction layer, and extract the sample image feature corresponding to the sample image;
- the sample image feature corresponding to the sample image and the prediction frame position information corresponding to the sample target in the sample image are input into the feature classification layer, and the prediction category information and prediction frame quality information corresponding to the sample target in the sample image are determined;
- the expression of the preset positioning quality focusing loss function is:
- LFL(i) = -|p_i - q_i|^γ · ((1 - p_i)·log(1 - q_i) + p_i·log(q_i))
- wherein LFL(i) denotes the loss value between the predicted frame quality information and the real frame quality information corresponding to the i-th sample target in the sample image;
- p_i represents the real frame quality information corresponding to the i-th sample target in the sample image;
- q_i represents the predicted frame quality information corresponding to the i-th sample target in the sample image;
- γ represents the preset parameter.
- the sample frame quality information and sample category information corresponding to the sample image exist in the form of preset soft one-hot encoding, and the position of the sample frame quality information corresponding to the sample image in the preset soft one-hot encoding indicates the sample category information corresponding to the sample image.
- the determining module is specifically configured to input the to-be-detected image into a feature extraction layer of a pre-established target detection model, and extract the to-be-detected image feature corresponding to the to-be-detected image;
- for each detection target in the to-be-detected image, based on a preset suppression algorithm and the target frame quality information corresponding to the position information of each candidate frame corresponding to the detection target, the position information of the candidate frame that satisfies a preset screening condition is determined from the position information of all candidate frames corresponding to the detection target and used as the target detection frame position information corresponding to the detection target, so as to obtain the target detection result corresponding to the image to be detected, wherein the preset screening condition is: the condition that the corresponding target frame quality information among the position information of the candidate frames corresponding to the detection target is the largest.
- a target detection method and device provided in the embodiments of the present invention obtain an image to be detected, and determine a target detection result corresponding to the to-be-detected image by using a pre-established target detection model and the to-be-detected image, wherein the target detection result includes: the target detection frame position information corresponding to the detection target in the image to be detected and the target frame quality information corresponding to the target detection frame position information.
- the pre-established target detection model is: a model obtained by training based on the sample image and its corresponding calibration information and the corresponding sample frame quality information; the sample frame quality information corresponding to the sample image is: information determined based on the calibration frame position information in the calibration information corresponding to the sample image and the predicted frame position information corresponding to the sample image detected based on the initial target detection model corresponding to the pre-established target detection model.
- the pre-established target detection model obtained by training based on the sample image and its corresponding calibration information and the corresponding sample frame quality information has the function of predicting the quality corresponding to the target detection frame corresponding to the detection target in the image, and the sample frame quality information is the information determined based on the calibration frame position information in the calibration information corresponding to the sample image and the predicted frame position information corresponding to the sample image detected based on the initial target detection model corresponding to the pre-established target detection model.
- through the pre-established target detection model, the frame quality information corresponding to the frame position information predicted for the target in the image can be used to screen out the frame position information with better frame quality information corresponding to the detection target as the target detection frame position information, so that the accuracy of the detection frame corresponding to the target in the image is determined, and a detection frame corresponding to the target with better accuracy is obtained.
- the ratio information of the intersection area and the union area is used as the quality information of the sample frame corresponding to the sample image, so that the pre-established target detection model can learn a prediction function for the frame quality information corresponding to frame position information that is more in line with the actual frame quality, and a basis is provided for screening frame position information based on frame quality information.
- the sample frame quality information and sample category information corresponding to the sample image exist in the form of preset soft one-hot encoding, and the position of the sample frame quality information corresponding to the sample image in the preset soft one-hot encoding indicates the sample category information corresponding to the sample image, which realizes the joint representation of category information and frame quality information.
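A minimal sketch of such a joint label, assuming a plain list representation (the function name and argument order are illustrative): the index of the single non-zero entry indicates the sample category, and the value stored at that index is the sample frame quality.

```python
def soft_one_hot_label(category_index, frame_quality, num_classes):
    """Joint representation of category and frame quality: the position
    of the non-zero entry encodes the sample category information, and
    its value (in [0, 1]) encodes the sample frame quality information."""
    label = [0.0] * num_classes
    label[category_index] = frame_quality
    return label
```

Unlike a hard one-hot label, the "hot" entry is the continuous frame quality rather than 1.0, so a single vector supervises both the classification output and the quality prediction.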
- specifically, the position information of the candidate frames corresponding to the image to be detected is determined through the feature extraction layer and the feature regression layer of the pre-established target detection model; then, combined with the feature classification layer of the pre-established target detection model, the detection category information and target frame quality information corresponding to each candidate frame position information corresponding to each detection target in the image to be detected are determined;
- based on the target frame quality information corresponding to the position information of each candidate frame, the position information of the candidate frame that satisfies the preset screening conditions is determined from the position information of all the candidate frames corresponding to the detection target, as the target detection frame position information corresponding to the detection target, in order to obtain the target detection result corresponding to the to-be-detected image.
- the selection and determination of the corresponding candidate frame position information is completed, so as to obtain frame position information with better detection position accuracy.
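The screening described above can be sketched as a greedy suppression pass, assuming the preset suppression algorithm is NMS-like and frames are given as [x1, y1, x2, y2] lists; the IoU helper, threshold value and function names are illustrative:

```python
def _iou(a, b):
    """Intersection-over-union of two [x1, y1, x2, y2] frames."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def quality_guided_suppression(boxes, qualities, iou_threshold=0.5):
    """Keep, among overlapping candidate frames, the one whose target
    frame quality information is largest (the preset screening
    condition); return the indices of the kept frames."""
    order = sorted(range(len(boxes)), key=lambda i: qualities[i], reverse=True)
    kept = []
    while order:
        best = order.pop(0)
        kept.append(best)
        # Suppress remaining candidates that overlap the kept frame.
        order = [i for i in order if _iou(boxes[best], boxes[i]) <= iou_threshold]
    return kept
```

The design point is that candidates are ranked by predicted frame quality rather than by class probability, which is what lets the screening prefer the better-localized frame.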
- FIG. 1 is a schematic flowchart of a target detection method provided by an embodiment of the present invention.
- FIG. 2 is a schematic flowchart of the process of training to obtain a pre-established target detection model;
- FIG. 3 is a schematic diagram of category information and frame quality information jointly represented
- FIG. 4 is a schematic structural diagram of a target detection apparatus provided by an embodiment of the present invention.
- the invention provides a target detection method and device, so as to determine the accuracy of the detection frame corresponding to the target in the image, and then obtain a detection frame corresponding to the target with better accuracy.
- the embodiments of the present invention will be described in detail below.
- FIG. 1 is a schematic flowchart of a target detection method provided by an embodiment of the present invention. The method may include the following steps:
- the target detection method provided by the embodiment of the present invention can be applied to any electronic device with computing capability, and the electronic device can be a terminal or a server.
- the electronic device may be an in-vehicle device installed on a vehicle; the vehicle may also be provided with an image acquisition device that collects images of the environment in which the vehicle is located, and the electronic device is connected to the image acquisition device and can obtain the image collected by the image acquisition device as the image to be detected.
- the electronic device may be a non-vehicle device, and the electronic device may be connected to an image acquisition device that captures the target scene to obtain an image captured by the image capture device for the target scene as an image to be detected.
- the target scene can be a road scene or a square scene or an indoor scene, which is all possible.
- the to-be-detected image may be an RGB (Red Green Blue) image or an infrared image, which are all possible.
- the embodiment of the present invention does not limit the type of the image to be detected.
- S102 Determine a target detection result corresponding to the to-be-detected image by using the pre-established target detection model and the to-be-detected image.
- the target detection result includes: target detection frame position information corresponding to the detection target in the to-be-detected image and target frame quality information corresponding to the target detection frame position information; the pre-established target detection model is: a model obtained by training based on the sample image and its corresponding calibration information and the corresponding sample frame quality information; and the sample frame quality information corresponding to the sample image is: information determined based on the calibration frame position information in the calibration information corresponding to the sample image and the predicted frame position information corresponding to the sample image detected based on the initial target detection model corresponding to the pre-established target detection model.
- the electronic device or its connected storage device locally stores a pre-established target detection model trained based on the sample image and its corresponding calibration information and the corresponding sample frame quality information, wherein, in the training process, the pre-established target detection model uses the preset positioning quality focusing loss function to adjust the corresponding model parameters.
- the sample frame quality information corresponding to the sample image is: information determined based on the calibration frame position information in the calibration information corresponding to the sample image and the predicted frame position information corresponding to the sample image detected based on the initial target detection model corresponding to the pre-established target detection model.
- the pre-established target detection model trained based on the sample images and their corresponding calibration information and the corresponding sample frame quality information has the ability to predict the quality corresponding to each detection frame, that is, the frame quality information corresponding to the position information of each detection frame.
- the frame quality information can represent the accuracy of the corresponding detection frame position information obtained by detection.
- the frame quality information corresponding to the detection frame position information can be represented by a numerical value; the larger the numerical value, the more consistent the location area characterized by the detection frame position information is with the location area where the target is located. The training process of the pre-established target detection model will be described later.
- in detail, the electronic device inputs the image to be detected into the pre-established target detection model, uses the pre-established target detection model to extract the image features of the image to be detected, and obtains the to-be-detected image features; the pre-established target detection model regresses the to-be-detected image features, so that multiple candidate detection frames and their position information are obtained from the image to be detected.
- the pre-established target detection model then predicts, from the position information of the multiple candidate detection frames and the to-be-detected image features, the frame quality information corresponding to the position information of each candidate detection frame, and uses the frame quality information to screen the position information of the multiple candidate detection frames, screening out, for each detection target, the candidate detection frame position information whose frame quality information represents the better candidate detection frame.
- the target detection result is thereby obtained, including the target detection frame position information corresponding to the detection target in the to-be-detected image and the target frame quality information corresponding to the target detection frame position information.
- since the pre-established target detection model obtained by training based on the sample image and its corresponding calibration information and the corresponding sample frame quality information has the function of predicting the quality corresponding to the target detection frame corresponding to the detection target in the image, and the sample frame quality information is the information determined based on the calibration frame position information in the calibration information corresponding to the sample image and the predicted frame position information corresponding to the sample image detected based on the initial target detection model corresponding to the pre-established target detection model, the frame position information with better frame quality information corresponding to the detection target can be screened out as the target detection frame position information, so that the accuracy of the detection frame corresponding to the target in the image is determined, and a detection frame corresponding to the target with better accuracy is obtained.
- the sample frame quality information corresponding to the sample image is: the ratio information of the intersection area and the union area between the calibration frame position information in the calibration information corresponding to the sample image and the predicted frame position information corresponding to the sample image detected based on the initial target detection model corresponding to the pre-established target detection model.
- that is, the ratio information of the intersection area between the predicted frame position information corresponding to the sample image, detected based on the initial target detection model corresponding to the pre-established target detection model, and the calibration frame position information in the calibration information corresponding to the sample image, to the union area of the above two, is determined as the quality information of the sample frame corresponding to the sample image.
- the quality information of the sample frame corresponding to the sample image can be represented by a numerical value, and the value range of the numerical value can be [0, 1].
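That ratio is the standard intersection-over-union; a minimal sketch, assuming frames are given as [x1, y1, x2, y2] lists (the format and function name are assumptions):

```python
def sample_frame_quality(calibration_box, predicted_box):
    """IoU between the calibration frame and the predicted frame,
    used as the sample frame quality information; the result always
    lies in the [0, 1] range mentioned in the text."""
    ix1 = max(calibration_box[0], predicted_box[0])
    iy1 = max(calibration_box[1], predicted_box[1])
    ix2 = min(calibration_box[2], predicted_box[2])
    iy2 = min(calibration_box[3], predicted_box[3])
    # Intersection area (zero when the frames do not overlap).
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_c = (calibration_box[2] - calibration_box[0]) * (calibration_box[3] - calibration_box[1])
    area_p = (predicted_box[2] - predicted_box[0]) * (predicted_box[3] - predicted_box[1])
    # Union area = sum of the two areas minus the intersection.
    union = area_c + area_p - inter
    return inter / union if union > 0 else 0.0
```

Identical frames give a quality of 1.0 and disjoint frames give 0.0, matching the value range stated above.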
- the target detection result may further include: detection category information corresponding to the detection target in the image to be detected.
- the calibration information corresponding to the sample image may also include calibration category information, so that the pre-established target detection model obtained by training has the ability to predict the category of the target in the image.
- the method may further include:
- the initial target detection model includes a feature extraction layer, a feature classification layer and a feature regression layer;
- the calibration information includes: calibration frame position information and calibration category information corresponding to the sample target contained in the corresponding sample image.
- S206 For each sample image, input the sample image feature corresponding to the sample image and the prediction frame position information corresponding to the sample target in the sample image into the feature classification layer, and determine the prediction category information and prediction frame quality information corresponding to the sample target in the sample image.
- S207 For each sample image, determine the current loss value based on the preset positioning quality focusing loss function and the predicted frame quality information and real frame quality information corresponding to the sample targets in the sample image, as well as the preset category loss function and the prediction category information and calibration category information corresponding to the sample targets in the sample image.
- the electronic device may further include a process of training to obtain a pre-established target detection model.
- the electronic device obtains a plurality of sample images and their corresponding calibration information
- the sample images may include sample objects
- the calibration information corresponding to the sample images may include calibration frame position information corresponding to the sample objects in the sample image.
- the electronic device obtains an initial target detection model including a feature extraction layer, a feature regression layer and a feature classification layer; for each sample image, the sample image is input into the feature extraction layer, and the sample image features corresponding to the sample image are extracted; the sample image features corresponding to the sample image are input into the feature regression layer to obtain the prediction frame position information corresponding to the sample targets in the sample image; then, for each sample target in each sample image, the ratio information of the intersection area and the union area between the calibration frame position information corresponding to the sample target and the corresponding prediction frame position information is calculated and determined as the real frame quality information corresponding to the sample target; and for each sample image, the sample image features corresponding to the sample image and the prediction frame position information corresponding to the sample targets in the sample image are input into the feature classification layer to determine the prediction category information and prediction frame quality information corresponding to the sample targets in the sample image, with the real frame quality information corresponding to the sample targets serving as a kind of calibration information.
- for each sample image, the current first loss value is determined based on the preset positioning quality focusing loss function and the predicted frame quality information and real frame quality information corresponding to the sample targets in the sample image; the current second loss value is determined based on the preset category loss function and the prediction category information and calibration category information corresponding to the sample targets in the sample image; and the current loss value is then determined based on the current first loss value and the current second loss value.
- the preset optimization algorithm is used to adjust the model parameters of the feature extraction layer, feature regression layer and feature classification layer, and return to execute the For each sample image, the sample image is input into the feature extraction layer, and the sample image features corresponding to the sample image are extracted and obtained. If it is judged that the current loss value does not exceed the preset loss value threshold, it is determined that the initial target detection model has reached a convergence state, and an accurate image of the location area, category information and location information representing the location area of the detected target in the image can be detected. A pre-built object detection model with characteristic box quality information.
- the frame quality loss value corresponding to each sample target may be determined based on the preset localization quality focal loss function and the predicted frame quality information and real frame quality information corresponding to each sample target in the sample image, and the sum or average of the frame quality loss values corresponding to all sample targets in the sample image is determined as the current first loss value; the category loss value corresponding to each sample target is determined based on the preset category loss function and the predicted category information and calibration category information corresponding to each sample target in the sample image, and the sum or average of the category loss values corresponding to all sample targets in the sample image is determined as the current second loss value; the current loss value is then determined as the sum of the product of the current first loss value and its corresponding weight value and the product of the current second loss value and its corresponding weight value.
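The weighted aggregation described above can be sketched in a few lines of Python. This is a non-authoritative illustration: the function name, the equal default weights, and the `reduce` parameter are assumptions for the sketch, not terms from the patent.

```python
def combine_losses(frame_quality_losses, category_losses,
                   w_quality=1.0, w_category=1.0, reduce="mean"):
    """Aggregate per-target losses for one sample image into the current loss.

    The current first loss is the sum or average of the frame quality loss
    values; the current second loss is the sum or average of the category
    loss values; the current loss is their weighted sum.
    """
    agg = (lambda xs: sum(xs) / len(xs)) if reduce == "mean" else sum
    first_loss = agg(frame_quality_losses)    # frame quality branch
    second_loss = agg(category_losses)        # category branch
    return w_quality * first_loss + w_category * second_loss
```

For example, with per-target frame quality losses `[1.0, 3.0]` and category losses `[2.0, 4.0]`, averaging gives a first loss of 2.0 and a second loss of 3.0, so the current loss with unit weights is 5.0.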
- the preset optimization algorithm may include, but is not limited to, the gradient descent method.
- the sample targets may be vehicles, pedestrians, and traffic signs.
- the initial target detection model may be a deep learning-based neural network model.
- the preset category loss function may be any type of loss function in the related art that can calculate a loss value between category information, which is not limited in the embodiment of the present invention.
- the above-mentioned current loss value may also be determined in combination with the preset position loss function, the position information of the prediction frame corresponding to the sample target in the sample image, and the position information of the calibration frame.
- the preset position loss function may be any type of loss function in the related art that can calculate a loss value between frame position information, which is not limited in the embodiment of the present invention.
- the expression of the preset localization quality focal loss function (LFL, Localization Focal Loss) may be:
- LFL(i) = -((1 - p_i)log(1 - q_i) + p_i·log(q_i)) · |p_i - q_i|^γ
- where LFL(i) represents the first loss value between the predicted frame quality information and the real frame quality information corresponding to the i-th sample target in the sample image, p_i represents the real frame quality information corresponding to the i-th sample target in the sample image, q_i represents the predicted frame quality information corresponding to the i-th sample target in the sample image, and γ represents a preset parameter.
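The loss expression in claim 5 can be written out directly as a scalar function; the sketch below is an illustration (the function name `lfl` is mine), using the same symbols as the formula:

```python
import math

def lfl(p, q, gamma=2.0):
    """Localization quality focal loss for one sample target.

    p: real frame quality information (IoU of calibration and prediction
       frames), q: predicted frame quality information, both in (0, 1).
    Computes -((1-p)log(1-q) + p*log(q)) * |p - q|**gamma: a binary
    cross-entropy against the soft target p, scaled by a focusing factor
    that down-weights targets whose prediction is already close to p.
    """
    bce = -((1.0 - p) * math.log(1.0 - q) + p * math.log(q))
    return bce * abs(p - q) ** gamma
```

Note that when the predicted quality equals the real quality (q = p) the focusing factor is zero, so the loss vanishes; the further q drifts from p, the more the target contributes to training.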
- the electronic device may also use a batch of sample images to calculate the current loss value; that is, the current loss value may be determined using the preset localization quality focal loss function together with the predicted frame quality information and real frame quality information corresponding to the sample targets in multiple sample images, and the preset category loss function together with the predicted category information and calibration category information corresponding to the sample targets in those sample images.
- the category information and the frame quality information can be jointly represented.
- the sample frame quality information and the sample category information corresponding to the sample image exist in the form of a preset soft one-hot encoding, and the position of the sample frame quality information corresponding to the sample image within the preset soft one-hot encoding represents the sample category information corresponding to the sample image.
- FIG. 3 is an example diagram of the joint representation of category information and frame quality information, in which the frame quality information is represented by a numerical value in the range [0, 1]. As shown in FIG. 3, the value 0.9 represents the frame quality information corresponding to the detection frame position information, and its placement in the second slot of the encoding indicates that the preset target detection model predicts that the target corresponding to that detection frame position information belongs to the second category.
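The soft one-hot joint representation can be sketched as an encode/decode pair; the helper names below are illustrative, not part of the patent:

```python
def encode_soft_one_hot(category_index, frame_quality, num_classes):
    """Joint representation: a zero vector whose single non-zero entry is
    the frame quality value, placed at the category's position."""
    vec = [0.0] * num_classes
    vec[category_index] = frame_quality
    return vec

def decode_soft_one_hot(vec):
    """Recover (category_index, frame_quality): the position of the
    maximum entry gives the category, its value gives the quality."""
    quality = max(vec)
    return vec.index(quality), quality
```

For the FIG. 3 example, a quality of 0.9 in the second slot of a four-class encoding is `[0.0, 0.9, 0.0, 0.0]`, which decodes back to category index 1 with quality 0.9.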
- the S102 may include the following steps:
- the to-be-detected image features are input into the feature regression layer of the pre-established target detection model to determine the candidate frame position information corresponding to the to-be-detected image;
- for each detection target in the to-be-detected image, based on a preset suppression algorithm and the target frame quality information corresponding to each candidate frame position information of that detection target, the candidate frame position information satisfying a preset screening condition is determined from all candidate frame position information corresponding to the detection target and used as the target detection frame position information corresponding to that detection target, so as to obtain the target detection result corresponding to the to-be-detected image; the preset screening condition is the condition that, among the candidate frame position information corresponding to the detection target, the corresponding target frame quality information is the largest.
- the preset suppression algorithm may be NMS (Non Maximum Suppression, non-maximum suppression algorithm).
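A minimal sketch of NMS driven by the target frame quality information follows. This is an assumed implementation for illustration (function names and the 0.5 threshold are mine); the patent only specifies that overlapping candidates are screened so the frame with the largest quality information survives:

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0

def nms_by_quality(boxes, qualities, iou_threshold=0.5):
    """Greedy non-maximum suppression ranked by frame quality: visit
    candidates from highest to lowest quality, keeping a candidate only
    if it does not heavily overlap an already-kept, higher-quality box."""
    order = sorted(range(len(boxes)), key=lambda i: qualities[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) < iou_threshold for j in keep):
            keep.append(i)
    return keep
```

Ranking by predicted frame quality rather than by classification confidence is the point of the scheme: of two candidates covering the same target, the one whose predicted localization quality is higher is retained.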
- the electronic device determines the candidate frame position information corresponding to the to-be-detected image through the feature extraction layer and the feature regression layer of the pre-established target detection model, and then uses the feature classification layer of the pre-established target detection model, the
- to-be-detected image features and the candidate frame position information to determine the detection category information and target frame quality information corresponding to each candidate frame position information of each detection target in the to-be-detected image;
- based on the target frame quality information corresponding to each candidate frame position information, the candidate frame position information satisfying the preset screening conditions is determined from all candidate frame position information corresponding to the detection target and used as the target detection frame position information corresponding to that detection target, so as to obtain the target detection result corresponding to the to-be-detected image.
- in this way, the selection of the corresponding candidate frame position information is completed, so as to obtain frame position information with better positional accuracy.
- an embodiment of the present invention provides a target detection apparatus.
- the apparatus may include:
- an obtaining module 410 configured to obtain an image to be detected
- the determination module 420 is configured to use a pre-established target detection model and the to-be-detected image to determine a target detection result corresponding to the to-be-detected image, wherein the target detection result includes: a detected target in the to-be-detected image
- the pre-established target detection model is: a model obtained by training based on the sample image and its corresponding calibration information and the corresponding sample frame quality information
- the sample frame quality information corresponding to the sample image is: information determined based on the calibration frame position information in the calibration information corresponding to the sample image and the prediction frame position information corresponding to the sample image detected by the initial target detection model corresponding to the pre-established target detection model.
- the pre-established target detection model, obtained by training based on the sample images, their corresponding calibration information and the corresponding sample frame quality information, has the function of predicting the quality corresponding to the target detection frame of a detection target in an image.
- since the sample frame quality information is determined based on the calibration frame position information in the calibration information corresponding to the sample image and the prediction frame position information corresponding to the sample image detected by the initial target detection model corresponding to the pre-established target detection model,
- the frame position information with better frame quality information corresponding to each detection target can be screened out as the target detection frame position information, thereby improving
- the accuracy with which the detection frame corresponding to a target in the image is determined, and obtaining a detection frame with better accuracy.
- the sample frame quality information corresponding to the sample image is: the ratio of the intersection area to the union area between the calibration frame position information in the calibration information corresponding to the sample image and the prediction frame position information corresponding to the sample image detected by the initial target detection model corresponding to the pre-established target detection model.
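The intersection-over-union ratio used as sample frame quality can be computed directly from the two boxes. A minimal sketch, assuming `(x1, y1, x2, y2)` corner coordinates (the patent does not fix a box format, and the function name is mine):

```python
def frame_quality(calib_box, pred_box):
    """Sample frame quality information: intersection area divided by
    union area between the calibration frame and the prediction frame,
    both given as (x1, y1, x2, y2)."""
    ix = max(0.0, min(calib_box[2], pred_box[2]) - max(calib_box[0], pred_box[0]))
    iy = max(0.0, min(calib_box[3], pred_box[3]) - max(calib_box[1], pred_box[1]))
    inter = ix * iy
    area = lambda b: (b[2] - b[0]) * (b[3] - b[1])
    union = area(calib_box) + area(pred_box) - inter
    return inter / union if union > 0 else 0.0
```

The ratio is 1.0 when the prediction frame coincides with the calibration frame and 0.0 when the two frames do not overlap, so it serves directly as a quality label in [0, 1].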
- the target detection result further includes: detection category information corresponding to the detection target in the to-be-detected image.
- the device further includes:
- the model training module (not shown in the figure) is configured to train and obtain the pre-established target detection model before the pre-established target detection model and the to-be-detected image are used to detect, from the to-be-detected image, the target detection frame position information corresponding to the to-be-detected target and the target frame quality information corresponding to that target detection frame position information, wherein the model training module is specifically configured to obtain the initial target detection model, wherein,
- the initial target detection model includes a feature extraction layer, a feature classification layer and a feature regression layer;
- the calibration information includes: calibration frame position information and calibration category information corresponding to the sample targets contained in the corresponding sample images;
- for each sample image, input the sample image into the feature extraction layer, and extract the sample image features corresponding to the sample image;
- the sample image feature corresponding to the sample image and the prediction frame position information corresponding to the sample target in the sample image are input into the feature classification layer, and the prediction category information and prediction frame corresponding to the sample target in the sample image are determined. quality information;
- the expression of the preset localization quality focal loss function is:
- LFL(i) = -((1 - p_i)log(1 - q_i) + p_i·log(q_i)) · |p_i - q_i|^γ
- where LFL(i) represents the first loss value between the predicted frame quality information and the real frame quality information corresponding to the i-th sample target in the sample image, p_i represents the real frame quality information corresponding to the i-th sample target in the sample image, q_i represents the predicted frame quality information corresponding to the i-th sample target in the sample image, and γ represents a preset parameter.
- the sample frame quality information and sample category information corresponding to the sample image exist in the form of a preset soft one-hot encoding, and the position of the sample frame quality information corresponding to the sample image within the preset soft one-hot encoding represents the sample category information corresponding to the sample image.
- the determination module 420 is specifically configured to input the to-be-detected image into the feature extraction layer of the pre-established target detection model, and extract the to-be-detected image features corresponding to the to-be-detected image;
- for each detection target in the to-be-detected image, based on the preset suppression algorithm and the target frame quality information corresponding to each candidate frame position information of that detection target, determine, from all candidate frame position information corresponding to the detection target, the candidate frame position information that satisfies the preset screening condition as the target detection frame position information corresponding to that detection target, so as to obtain the target detection result corresponding to the to-be-detected image, wherein the preset screening condition is the condition that, among the candidate frame position information corresponding to the detection target, the corresponding target frame quality information is the largest.
- the modules in the apparatus in the embodiment may be distributed in the apparatus in the embodiment according to the description of the embodiment, and may also be located in one or more apparatuses different from this embodiment with corresponding changes.
- the modules in the foregoing embodiments may be combined into one module, or may be further split into multiple sub-modules.
Abstract
Description
Claims (10)
- A target detection method, characterized in that the method comprises: obtaining an image to be detected; and determining a target detection result corresponding to the to-be-detected image by using a pre-established target detection model and the to-be-detected image, wherein the target detection result includes: target detection frame position information corresponding to a detection target in the to-be-detected image and target frame quality information corresponding to the target detection frame position information; the pre-established target detection model is a model obtained by training based on sample images, their corresponding calibration information and the corresponding sample frame quality information; and the sample frame quality information corresponding to a sample image is information determined based on the calibration frame position information in the calibration information corresponding to that sample image and the prediction frame position information corresponding to that sample image detected by the initial target detection model corresponding to the pre-established target detection model.
- The method according to claim 1, wherein the sample frame quality information corresponding to the sample image is: the ratio of the intersection area to the union area between the calibration frame position information in the calibration information corresponding to the sample image and the prediction frame position information corresponding to the sample image detected by the initial target detection model corresponding to the pre-established target detection model.
- The method according to claim 1 or 2, wherein the target detection result further comprises: detection category information corresponding to the detection target in the to-be-detected image.
- The method according to any one of claims 1-3, wherein before the step of detecting, from the to-be-detected image by using the pre-established target detection model and the to-be-detected image, the target detection frame position information corresponding to the to-be-detected target and the target frame quality information corresponding to the target detection frame position information, the method further comprises a process of training to obtain the pre-established target detection model, wherein the process includes: obtaining the initial target detection model, which includes a feature extraction layer, a feature classification layer and a feature regression layer; obtaining a plurality of sample images and the calibration information corresponding to the sample images, the calibration information including the calibration frame position information and calibration category information corresponding to the sample targets contained in the corresponding sample images; for each sample image, inputting the sample image into the feature extraction layer to extract the sample image features corresponding to the sample image; for each sample image, inputting the sample image features corresponding to the sample image into the feature regression layer to obtain the prediction frame position information corresponding to the sample targets in the sample image; for each sample target in each sample image, calculating the ratio of the intersection area to the union area between the calibration frame position information corresponding to the sample target and the corresponding prediction frame position information, and determining it as the real frame quality information corresponding to the sample target; for each sample image, inputting the sample image features corresponding to the sample image and the prediction frame position information corresponding to the sample targets in the sample image into the feature classification layer to determine the prediction category information and prediction frame quality information corresponding to the sample targets in the sample image; for each sample image, determining the current loss value based on the preset localization quality focal loss function, the predicted frame quality information and real frame quality information corresponding to the sample targets in the sample image, the preset category loss function, and the predicted category information and calibration category information corresponding to the sample targets in the sample image; judging whether the current loss value exceeds a preset loss value threshold; if so, adjusting the model parameters of the feature extraction layer, the feature regression layer and the feature classification layer, and returning to the step of inputting each sample image into the feature extraction layer to extract the sample image features corresponding to the sample image; if not, determining that the initial target detection model has reached a convergence state, and obtaining the pre-established target detection model.
- The method according to claim 4, wherein the expression of the preset localization quality focal loss function is: LFL(i) = -((1 - p_i)log(1 - q_i) + p_i·log(q_i)) · |p_i - q_i|^γ, where LFL(i) represents the first loss value between the predicted frame quality information and the real frame quality information corresponding to the i-th sample target in the sample image, p_i represents the real frame quality information corresponding to the i-th sample target in the sample image, q_i represents the predicted frame quality information corresponding to the i-th sample target in the sample image, and γ represents a preset parameter.
- The method according to claim 4, wherein the sample frame quality information and sample category information corresponding to the sample image exist in the form of a preset soft one-hot encoding, and the position of the sample frame quality information corresponding to the sample image within the preset soft one-hot encoding represents the sample category information corresponding to the sample image.
- The method according to any one of claims 1-3, wherein the step of determining the target detection result corresponding to the to-be-detected image by using the pre-established target detection model and the to-be-detected image comprises: inputting the to-be-detected image into the feature extraction layer of the pre-established target detection model to extract the to-be-detected image features corresponding to the to-be-detected image; inputting the to-be-detected image features into the feature regression layer of the pre-established target detection model to determine the candidate frame position information corresponding to the to-be-detected image; inputting the to-be-detected image features and the candidate frame position information into the feature classification layer of the pre-established target detection model to determine the detection category information and target frame quality information corresponding to each candidate frame position information of each detection target in the to-be-detected image; and, for each detection target in the to-be-detected image, based on a preset suppression algorithm and the target frame quality information corresponding to each candidate frame position information of that detection target, determining, from all candidate frame position information corresponding to the detection target, the candidate frame position information satisfying a preset screening condition as the target detection frame position information corresponding to that detection target, so as to obtain the target detection result corresponding to the to-be-detected image, wherein the preset screening condition is the condition that, among the candidate frame position information corresponding to the detection target, the corresponding target frame quality information is the largest.
- A target detection apparatus, characterized in that the apparatus comprises: an obtaining module configured to obtain an image to be detected; and a determination module configured to determine a target detection result corresponding to the to-be-detected image by using a pre-established target detection model and the to-be-detected image, wherein the target detection result includes: target detection frame position information corresponding to a detection target in the to-be-detected image and target frame quality information corresponding to the target detection frame position information; the pre-established target detection model is a model obtained by training based on sample images, their corresponding calibration information and the corresponding sample frame quality information; and the sample frame quality information corresponding to a sample image is information determined based on the calibration frame position information in the calibration information corresponding to that sample image and the prediction frame position information corresponding to that sample image detected by the initial target detection model corresponding to the pre-established target detection model.
- The apparatus according to claim 8, wherein the sample frame quality information corresponding to the sample image is: the ratio of the intersection area to the union area between the calibration frame position information in the calibration information corresponding to the sample image and the prediction frame position information corresponding to the sample image detected by the initial target detection model corresponding to the pre-established target detection model.
- The apparatus according to claim 8 or 9, wherein the target detection result further comprises: detection category information corresponding to the detection target in the to-be-detected image.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010601392.2A CN113935386A (en) | 2020-06-29 | 2020-06-29 | Target detection method and device |
CN202010601392.2 | 2020-06-29 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2022000855A1 true WO2022000855A1 (en) | 2022-01-06 |
Family
ID=79272632
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2020/121337 WO2022000855A1 (en) | 2020-06-29 | 2020-10-16 | Target detection method and device |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN113935386A (en) |
WO (1) | WO2022000855A1 (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117173568A (en) * | 2023-09-05 | 2023-12-05 | 北京观微科技有限公司 | Target detection model training method and target detection method |
CN117636266A (en) * | 2024-01-25 | 2024-03-01 | 华东交通大学 | Method and system for detecting safety behaviors of workers, storage medium and electronic equipment |
CN117636266B (en) * | 2024-01-25 | 2024-05-14 | 华东交通大学 | Method and system for detecting safety behaviors of workers, storage medium and electronic equipment |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106295678A (en) * | 2016-07-27 | 2017-01-04 | 北京旷视科技有限公司 | Neural metwork training and construction method and device and object detection method and device |
CN108268869A (en) * | 2018-02-13 | 2018-07-10 | 北京旷视科技有限公司 | Object detection method, apparatus and system |
WO2019028725A1 (en) * | 2017-08-10 | 2019-02-14 | Intel Corporation | Convolutional neural network framework using reverse connections and objectness priors for object detection |
CN109727275A (en) * | 2018-12-29 | 2019-05-07 | 北京沃东天骏信息技术有限公司 | Object detection method, device, system and computer readable storage medium |
CN111062413A (en) * | 2019-11-08 | 2020-04-24 | 深兰科技(上海)有限公司 | Road target detection method and device, electronic equipment and storage medium |
CN111241947A (en) * | 2019-12-31 | 2020-06-05 | 深圳奇迹智慧网络有限公司 | Training method and device of target detection model, storage medium and computer equipment |
- 2020-06-29 CN CN202010601392.2A patent/CN113935386A/en active Pending
- 2020-10-16 WO PCT/CN2020/121337 patent/WO2022000855A1/en active Application Filing
Also Published As
Publication number | Publication date |
---|---|
CN113935386A (en) | 2022-01-14 |
Similar Documents
Publication | Title |
---|---|
CN109977812B (en) | Vehicle-mounted video target detection method based on deep learning |
CN111353413B (en) | Low-missing-report-rate defect identification method for power transmission equipment |
KR101926561B1 (en) | Road crack detection apparatus of patch unit and method thereof, and computer program for executing the same |
CN109447169B (en) | Image processing method, training method and device of model thereof and electronic system |
WO2020078229A1 (en) | Target object identification method and apparatus, storage medium and electronic apparatus |
CN110263706B (en) | Method for detecting and identifying dynamic target of vehicle-mounted video in haze weather |
CN110084165B (en) | Intelligent identification and early warning method for abnormal events in open scene of power field based on edge calculation |
Bello-Salau et al. | Image processing techniques for automated road defect detection: A survey |
CN105788269A (en) | Unmanned aerial vehicle-based abnormal traffic identification method |
CN111985365A (en) | Straw burning monitoring method and system based on target detection technology |
CN112883921A (en) | Garbage can overflow detection model training method and garbage can overflow detection method |
CN110956104A (en) | Method, device and system for detecting overflow of garbage can |
CN113221804B (en) | Disordered material detection method and device based on monitoring video and application |
CN113642474A (en) | Hazardous area personnel monitoring method based on YOLOv5 |
CN111597901A (en) | Illegal billboard monitoring method |
KR102391853B1 (en) | System and Method for Processing Image Information |
CN110852164A (en) | YOLOv3-based method and system for automatically detecting illegal building |
WO2022000855A1 (en) | Target detection method and device |
CN112862150A (en) | Forest fire early warning method based on image and video multi-model |
CN114267082A (en) | Bridge side falling behavior identification method based on deep understanding |
CN113989626B (en) | Multi-class garbage scene distinguishing method based on target detection model |
Lam et al. | Real-time traffic status detection from on-line images using generic object detection system with deep learning |
CN114926791A (en) | Method and device for detecting abnormal lane change of vehicles at intersection, storage medium and electronic equipment |
CN116071711B (en) | Traffic jam condition detection method and device |
CN113971666A (en) | Power transmission line machine inspection image self-adaptive identification method based on depth target detection |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | EP: The EPO has been informed by WIPO that EP was designated in this application |
Ref document number: 20943586; Country of ref document: EP; Kind code of ref document: A1 |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
122 | EP: PCT application non-entry in European phase |
Ref document number: 20943586; Country of ref document: EP; Kind code of ref document: A1 |
|
32PN | EP: Public notification in the EP bulletin as the address of the addressee cannot be established |
Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 27.06.2023) |
|