WO2022000855A1 - Target detection method and device - Google Patents

Target detection method and device

Info

Publication number
WO2022000855A1
Authority
WO
WIPO (PCT)
Prior art keywords
sample
target
image
frame
information
Application number
PCT/CN2020/121337
Other languages
French (fr)
Chinese (zh)
Inventor
李翔
杨志雄
李亚
王文海
李俊
Original Assignee
魔门塔(苏州)科技有限公司
Application filed by 魔门塔(苏州)科技有限公司
Publication of WO2022000855A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G06N3/08 Learning methods

Definitions

  • the present invention relates to the technical field of target detection, and in particular, to a target detection method and device.
  • a target detection model is used to detect an image to be detected, and the obtained detection results generally include the position information of the target detection frame corresponding to each detected target and the corresponding category probability information.
  • during target detection, the model selects the target detection frame position information corresponding to the final output target from the position information of the multiple candidate detection frames predicted for the image to be detected, using the category probability information corresponding to the position information of each candidate detection frame for screening.
  • in this way, the target detection frame position information corresponding to the final output target is obtained, wherein the category probability information is a confidence level indicating that the corresponding target belongs to a certain category.
  • the invention provides a target detection method and device, so as to determine the accuracy of the detection frame corresponding to a target in an image, and thereby obtain a detection frame with better accuracy for the target.
  • the specific technical solutions are as follows:
  • an embodiment of the present invention provides a target detection method, the method comprising:
  • a target detection result corresponding to the to-be-detected image is determined by using a pre-established target detection model and the to-be-detected image, wherein the target detection result includes: target detection frame position information corresponding to the detected target in the to-be-detected image and target frame quality information corresponding to the target detection frame position information; the pre-established target detection model is a model obtained by training based on sample images, their corresponding calibration information, and the corresponding sample frame quality information; and the sample frame quality information corresponding to a sample image is information determined based on the calibration frame position information in the calibration information corresponding to the sample image and the prediction frame position information corresponding to the sample image detected by the initial target detection model corresponding to the pre-established target detection model.
  • the sample frame quality information corresponding to the sample image is: the ratio information of the intersection area and the union area between the calibration frame position information in the calibration information corresponding to the sample image and the prediction frame position information corresponding to the sample image detected by the initial target detection model corresponding to the pre-established target detection model.
  • the target detection result further includes: detection category information corresponding to the detection target in the to-be-detected image.
  • the method further includes:
  • the initial target detection model includes a feature extraction layer, a feature classification layer and a feature regression layer;
  • the calibration information includes: calibration frame position information and calibration category information corresponding to the sample targets contained in the corresponding sample images;
  • for each sample image, input the sample image into the feature extraction layer, and extract the sample image feature corresponding to the sample image;
  • the sample image feature corresponding to the sample image and the prediction frame position information corresponding to the sample target in the sample image are input into the feature classification layer, and the prediction category information and prediction frame quality information corresponding to the sample target in the sample image are determined;
  • the expression of the preset positioning quality focusing loss function is:
  • LFL(i) = -|p_i - q_i|^γ × ((1 - p_i) log(1 - q_i) + p_i log(q_i))
  • where LFL(i) denotes the loss value between the predicted frame quality information and the real frame quality information corresponding to the i-th sample target in the sample image;
  • p_i represents the real frame quality information corresponding to the i-th sample target in the sample image;
  • q_i represents the predicted frame quality information corresponding to the i-th sample target in the sample image;
  • γ represents the preset parameter.
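As a sketch, the positioning quality focusing loss can be written in plain Python. The |p - q|**gamma modulating factor and the gamma=2.0 default are assumptions in the spirit of focal-style losses (gamma standing in for the "preset parameter"); the function name is illustrative and not from the patent.

```python
import math

def quality_focal_loss(p, q, gamma=2.0, eps=1e-12):
    """Loss between real frame quality p and predicted frame quality q.

    p and q lie in [0, 1]. The cross-entropy term follows the expression
    above; the |p - q|**gamma factor is an assumed focal-style modulator
    that down-weights samples whose prediction is already close to the
    real quality.
    """
    cross_entropy = -((1.0 - p) * math.log(1.0 - q + eps)
                      + p * math.log(q + eps))
    return abs(p - q) ** gamma * cross_entropy
```

With this factor, a sample whose predicted quality matches the real quality contributes no loss, so training concentrates on poorly predicted frame qualities.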
  • the sample frame quality information and sample category information corresponding to the sample image exist in the form of a preset soft one-hot encoding, in which the position of the sample frame quality information corresponding to the sample image within the preset soft one-hot encoding indicates the sample category information corresponding to the sample image.
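A minimal sketch of such a joint label, assuming a vector whose length equals the number of categories: a plain one-hot label would place 1.0 at the category index, while here the IoU-based sample frame quality is stored there instead, so the position encodes the category and the value encodes the frame quality. The names are illustrative.

```python
def soft_one_hot(category_index, frame_quality, num_categories):
    """Joint label: the non-zero position indicates the sample category,
    and the stored value is the sample frame quality in [0, 1]."""
    label = [0.0] * num_categories
    label[category_index] = frame_quality
    return label
```

For example, a target of category 2 whose prediction frame overlaps its calibration frame with IoU 0.83 would be labeled `[0.0, 0.0, 0.83, 0.0]` in a four-category setup.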
  • the step of using a pre-established target detection model and the to-be-detected image to determine the target detection result corresponding to the to-be-detected image includes:
  • for each detection target in the to-be-detected image, based on a preset suppression algorithm and the target frame quality information corresponding to the position information of each candidate frame corresponding to the detection target, determine, from the position information of all candidate frames corresponding to the detection target, the position information of the candidate frame that satisfies a preset screening condition as the target detection frame position information corresponding to the detection target, so as to obtain the target detection result corresponding to the image to be detected, wherein the preset screening condition is: a condition limiting that the corresponding target frame quality information among the position information of the candidate frames corresponding to the detection target is the largest.
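The screening step can be sketched as a greedy suppression pass that ranks candidate frames by their target frame quality information rather than by category probability. The IoU threshold, box representation, and helper names are assumptions, not details fixed by the text.

```python
def box_iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2) corner coordinates."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

def quality_nms(boxes, qualities, iou_thresh=0.5):
    """Greedy suppression: within each cluster of overlapping candidates,
    keep the box whose frame quality information is the largest."""
    order = sorted(range(len(boxes)), key=lambda i: qualities[i], reverse=True)
    keep = []
    while order:
        best, order = order[0], order[1:]
        keep.append(best)
        order = [i for i in order
                 if box_iou(boxes[best], boxes[i]) < iou_thresh]
    return keep
```

Because candidates are sorted by frame quality, the surviving box for each detection target is exactly the one satisfying the "largest quality information" screening condition.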
  • an embodiment of the present invention provides a target detection device, the device comprising:
  • an obtaining module configured to obtain an image to be detected
  • a determination module configured to use a pre-established target detection model and the to-be-detected image to determine a target detection result corresponding to the to-be-detected image, wherein the target detection result includes: target detection frame position information corresponding to the detected target in the to-be-detected image and target frame quality information corresponding to the target detection frame position information; the pre-established target detection model is a model obtained by training based on sample images, their corresponding calibration information, and the corresponding sample frame quality information; and the sample frame quality information corresponding to a sample image is information determined based on the calibration frame position information in the calibration information corresponding to the sample image and the prediction frame position information corresponding to the sample image detected by the initial target detection model corresponding to the pre-established target detection model.
  • the sample frame quality information corresponding to the sample image is: the ratio information of the intersection area and the union area between the calibration frame position information in the calibration information corresponding to the sample image and the prediction frame position information corresponding to the sample image detected by the initial target detection model corresponding to the pre-established target detection model.
  • the target detection result further includes: detection category information corresponding to the detection target in the to-be-detected image.
  • the device further includes:
  • the model training module is configured to train and obtain the pre-established target detection model used to detect, from the to-be-detected image, the target detection frame position information and the target frame quality information corresponding to the target to be detected, wherein the model training module is specifically configured to obtain an initial target detection model, the initial target detection model including a feature extraction layer, a feature classification layer and a feature regression layer;
  • the calibration information includes: calibration frame position information and calibration category information corresponding to the sample targets contained in the corresponding sample images;
  • for each sample image, input the sample image into the feature extraction layer, and extract the sample image feature corresponding to the sample image;
  • the sample image feature corresponding to the sample image and the prediction frame position information corresponding to the sample target in the sample image are input into the feature classification layer, and the prediction category information and prediction frame quality information corresponding to the sample target in the sample image are determined;
  • the expression of the preset positioning quality focusing loss function is:
  • LFL(i) = -|p_i - q_i|^γ × ((1 - p_i) log(1 - q_i) + p_i log(q_i))
  • where LFL(i) denotes the loss value between the predicted frame quality information and the real frame quality information corresponding to the i-th sample target in the sample image;
  • p_i represents the real frame quality information corresponding to the i-th sample target in the sample image;
  • q_i represents the predicted frame quality information corresponding to the i-th sample target in the sample image;
  • γ represents the preset parameter.
  • the sample frame quality information and sample category information corresponding to the sample image exist in the form of a preset soft one-hot encoding, in which the position of the sample frame quality information corresponding to the sample image within the preset soft one-hot encoding indicates the sample category information corresponding to the sample image.
  • the determining module is specifically configured to input the to-be-detected image into a feature extraction layer of a pre-established target detection model, and extract the to-be-detected image feature corresponding to the to-be-detected image;
  • for each detection target in the to-be-detected image, based on a preset suppression algorithm and the target frame quality information corresponding to the position information of each candidate frame corresponding to the detection target, determine, from the position information of all candidate frames corresponding to the detection target, the position information of the candidate frame that satisfies a preset screening condition as the target detection frame position information corresponding to the detection target, so as to obtain the target detection result corresponding to the image to be detected, wherein the preset screening condition is: a condition limiting that the corresponding target frame quality information among the position information of the candidate frames corresponding to the detection target is the largest.
  • a target detection method and device provided in the embodiments of the present invention obtain an image to be detected, and determine a target detection result corresponding to the to-be-detected image by using a pre-established target detection model and the to-be-detected image.
  • the target detection result includes: the target detection frame position information corresponding to the detection target in the image to be detected and the target frame quality information corresponding to the target detection frame position information.
  • the pre-established target detection model is a model obtained by training based on the sample image and its corresponding calibration information and the corresponding sample frame quality information, and the sample frame quality information corresponding to the sample image is information determined based on the calibration frame position information in the calibration information corresponding to the sample image and the prediction frame position information corresponding to the sample image detected by the initial target detection model corresponding to the pre-established target detection model.
  • the pre-established target detection model obtained by training based on the sample image and its corresponding calibration information and the corresponding sample frame quality information has the function of predicting the quality corresponding to the target detection frame corresponding to the detection target in the image.
  • the sample frame quality information is the information determined based on the position information of the calibration frame in the calibration information corresponding to the sample image and the position information of the prediction frame corresponding to the sample image detected by the initial target detection model corresponding to the pre-established target detection model.
  • through the pre-established target detection model, the frame quality information predicted for each frame position information corresponding to the target in the image can be used to screen out, as the target detection frame position information, the frame position information with better frame quality corresponding to the detection target, so that the accuracy of the detection frame corresponding to the target in the image is determined, and a detection frame with better accuracy for the target is then obtained.
  • the ratio information of the intersection area and the union area is used as the quality information of the sample frame corresponding to the sample image, so that the pre-established target detection model can learn a prediction function for the frame quality information corresponding to the frame position information that is more in line with the actual frame quality, which provides a basis for the subsequent screening of frame position information based on frame quality information.
  • the sample frame quality information and sample category information corresponding to the sample image exist in the form of a preset soft one-hot encoding, and the position of the sample frame quality information corresponding to the sample image in the preset soft one-hot encoding indicates the sample category information corresponding to the sample image, realizing the joint representation of category information and frame quality information.
  • the position information of the candidate frames corresponding to the image to be detected is determined through the feature extraction layer and the feature regression layer of the pre-established target detection model, and then, combined with the feature classification layer of the pre-established target detection model, the detection category information and target frame quality information corresponding to each candidate frame position information corresponding to each detection target in the image to be detected are determined;
  • based on the target frame quality information corresponding to the position information of each candidate frame, the position information of the candidate frame that satisfies the preset screening conditions is determined, from the position information of all the candidate frames corresponding to the detection target, as the target detection frame position information corresponding to the detection target, in order to obtain the target detection result corresponding to the to-be-detected image.
  • the selection and determination of the corresponding candidate frame position information is completed, so as to obtain frame position information with better detection position accuracy.
  • FIG. 1 is a schematic flowchart of a target detection method provided by an embodiment of the present invention.
  • FIG. 2 is a schematic flowchart of the process of training to obtain a pre-established target detection model
  • FIG. 3 is a schematic diagram of category information and frame quality information jointly represented
  • FIG. 4 is a schematic structural diagram of a target detection apparatus provided by an embodiment of the present invention.
  • the invention provides a target detection method and device, so as to determine the accuracy of the detection frame corresponding to a target in an image, and thereby obtain a detection frame with better accuracy for the target.
  • the embodiments of the present invention will be described in detail below.
  • FIG. 1 is a schematic flowchart of a target detection method provided by an embodiment of the present invention. The method may include the following steps:
  • the target detection method provided by the embodiment of the present invention can be applied to any electronic device with computing capability, and the electronic device can be a terminal or a server.
  • the electronic device may be an in-vehicle device installed on a vehicle, and the vehicle may also be provided with an image acquisition device that collects images of the environment in which the vehicle is located; the electronic device is connected to the image acquisition device and can obtain the image collected by the image acquisition device as the image to be detected.
  • the electronic device may be a non-vehicle device, and the electronic device may be connected to an image acquisition device that captures the target scene to obtain an image captured by the image capture device for the target scene as an image to be detected.
  • the target scene can be a road scene or a square scene or an indoor scene, which is all possible.
  • the to-be-detected image may be an RGB (Red Green Blue, red, green, blue) image or an infrared image, which are all possible.
  • the embodiment of the present invention does not limit the type of the image to be detected.
  • S102 Determine a target detection result corresponding to the to-be-detected image by using the pre-established target detection model and the to-be-detected image.
  • the target detection result includes: target detection frame position information corresponding to the detection target in the to-be-detected image and target frame quality information corresponding to the target detection frame position information; the pre-established target detection model is a model obtained by training based on the sample image and its corresponding calibration information and the corresponding sample frame quality information; and the sample frame quality information corresponding to the sample image is information determined based on the calibration frame position information in the calibration information corresponding to the sample image and the prediction frame position information corresponding to the sample image detected by the initial target detection model corresponding to the pre-established target detection model.
  • the electronic device or its connected storage device locally stores a pre-established target detection model trained based on the sample image and its corresponding calibration information and the corresponding sample frame quality information, wherein, in the training process of the pre-established target detection model, the preset positioning quality focusing loss function is used to adjust the corresponding model parameters.
  • the sample frame quality information corresponding to the sample image is: information determined based on the calibration frame position information in the calibration information corresponding to the sample image and the predicted frame position information corresponding to the sample image detected by the initial target detection model corresponding to the pre-established target detection model.
  • the pre-established target detection model trained based on the sample images and their corresponding calibration information and the corresponding sample frame quality information has the ability to predict the quality corresponding to each detection frame, that is, the frame quality information corresponding to the position information of each detection frame.
  • the frame quality information can represent the accuracy of the corresponding detection frame position information obtained by detection.
  • the frame quality information corresponding to the detection frame position information can be represented by a numerical value: the larger the value, the more consistent the location area characterized by the detection frame position information is with the location area where the target is located. For clarity of presentation, the training process of the pre-established target detection model will be described later.
  • the electronic device inputs the image to be detected into the pre-established target detection model; uses the pre-established target detection model to extract the image features of the image to be detected, obtaining the to-be-detected image features; uses the pre-established target detection model to regress multiple candidate detection frames from the to-be-detected image features and obtain their position information; uses the pre-established target detection model, the position information of the multiple candidate detection frames, and the to-be-detected image features to predict the frame quality information corresponding to the position information of each candidate detection frame; and then uses the frame quality information to filter the position information of the multiple candidate detection frames, screening out, for each detection target, the candidate detection frame position information whose corresponding frame quality information represents the best quality.
  • in this way, the target detection result including the target detection frame position information corresponding to the detection target in the to-be-detected image and the target frame quality information corresponding to the target detection frame position information is obtained.
  • the pre-established target detection model obtained by training based on the sample image and its corresponding calibration information and the corresponding sample frame quality information has the function of predicting the quality corresponding to the target detection frame corresponding to the detection target in the image.
  • the sample frame quality information is the information determined based on the position information of the calibration frame in the calibration information corresponding to the sample image and the position information of the prediction frame corresponding to the sample image detected by the initial target detection model corresponding to the pre-established target detection model.
  • accordingly, the frame position information with better frame quality information corresponding to the detection target can be screened out as the target detection frame position information, so that the accuracy of the detection frame corresponding to the target in the image is determined, and a detection frame with better accuracy for the target is then obtained.
  • the sample frame quality information corresponding to the sample image is: the ratio information of the intersection area and the union area between the calibration frame position information in the calibration information corresponding to the sample image and the prediction frame position information corresponding to the sample image detected by the initial target detection model corresponding to the pre-established target detection model.
  • that is, the ratio of the intersection area between the prediction frame position information corresponding to the sample image and the calibration frame position information in the calibration information corresponding to the sample image to the union area of the two is determined as the quality information of the sample frame corresponding to the sample image.
  • the quality information of the sample frame corresponding to the sample image can be represented by a numerical value, and the value range of the numerical value can be [0, 1].
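Under the convention that frames are given as (x1, y1, x2, y2) corner coordinates (an assumption; the text does not fix a representation), the sample frame quality can be computed as:

```python
def sample_frame_quality(calib_box, pred_box):
    """Ratio of the intersection area to the union area between the
    calibration frame and the prediction frame; the result lies in [0, 1]."""
    ix1, iy1 = max(calib_box[0], pred_box[0]), max(calib_box[1], pred_box[1])
    ix2, iy2 = min(calib_box[2], pred_box[2]), min(calib_box[3], pred_box[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    calib_area = (calib_box[2] - calib_box[0]) * (calib_box[3] - calib_box[1])
    pred_area = (pred_box[2] - pred_box[0]) * (pred_box[3] - pred_box[1])
    union = calib_area + pred_area - inter
    return inter / union if union > 0 else 0.0
```

A prediction frame that exactly matches the calibration frame yields 1, a disjoint one yields 0, and partial overlaps fall in between.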
  • the target detection result may further include: detection category information corresponding to the detection target in the image to be detected.
  • the calibration information corresponding to the sample image may also include calibration category information, so that the pre-established target detection model obtained by training has the ability to predict the category of the target in the image.
  • the method may further include:
  • the initial target detection model includes a feature extraction layer, a feature classification layer and a feature regression layer;
  • the calibration information includes: calibration frame position information and calibration category information corresponding to the sample target contained in the corresponding sample image.
  • S206 For each sample image, input the sample image feature corresponding to the sample image and the prediction frame position information corresponding to the sample target in the sample image into the feature classification layer, and determine the prediction category information and prediction frame quality information corresponding to the sample target in the sample image.
  • S207 For each sample image, determine the current loss value based on the preset positioning quality focusing loss function, the predicted frame quality information and the real frame quality information corresponding to the sample target in the sample image, and the preset category loss function, the prediction category information and calibration category information corresponding to the sample target in the sample image.
  • the electronic device may further include a process of training to obtain a pre-established target detection model.
  • the electronic device obtains a plurality of sample images and their corresponding calibration information
  • the sample images may include sample objects
  • the calibration information corresponding to the sample images may include calibration frame position information corresponding to the sample objects in the sample image.
  • the electronic device obtains an initial target detection model including a feature extraction layer, a feature regression layer and a feature classification layer; for each sample image, the sample image is input into the feature extraction layer, and the sample image features corresponding to the sample image are extracted; the sample image features corresponding to the sample image are input into the feature regression layer, and the prediction frame position information corresponding to the sample target in the sample image is obtained; then, for each sample target in each sample image, the ratio information of the intersection area and the union area between the calibration frame position information corresponding to the sample target and the corresponding prediction frame position information is calculated and determined as the real frame quality information corresponding to the sample target; and for each sample image, the sample image features corresponding to the sample image and the prediction frame position information corresponding to the sample target in the sample image are input into the feature classification layer, and the prediction category information and prediction frame quality information corresponding to the sample target in the sample image are determined; the real frame quality information corresponding to the sample target in the sample image serves as a kind of calibration information.
  • for each sample image, the current first loss value is determined based on the preset positioning quality focusing loss function and the predicted frame quality information and real frame quality information corresponding to the sample target in the sample image; the current second loss value is determined based on the preset category loss function and the prediction category information and calibration category information corresponding to the sample target in the sample image; and then the current loss value is determined based on the current first loss value and the current second loss value.
  • if it is judged that the current loss value exceeds the preset loss value threshold, the preset optimization algorithm is used to adjust the model parameters of the feature extraction layer, feature regression layer and feature classification layer, and the process returns to the step of inputting, for each sample image, the sample image into the feature extraction layer and extracting the sample image features corresponding to the sample image. If it is judged that the current loss value does not exceed the preset loss value threshold, it is determined that the initial target detection model has reached a convergence state, and the pre-established target detection model is obtained, which can accurately detect the location area of the detected target in the image, together with the category information and the frame quality information characterizing the accuracy of the location information of the location area.
  • the frame quality loss value corresponding to each sample target may be determined based on the preset localization quality focusing loss function and the predicted frame quality information and real frame quality information corresponding to each sample target in the sample image, and the sum or average of the frame quality loss values corresponding to all sample targets in the sample image may then be determined as the current first loss value; based on the preset category loss function and the prediction category information and calibration category information corresponding to each sample target in the sample image, the category loss value corresponding to each sample target may be determined, and the sum or average of the category loss values corresponding to all sample targets in the sample image may then be determined as the current second loss value; the current loss value is then determined as the sum of the product of the current first loss value and its corresponding weight value and the product of the current second loss value and its corresponding weight value.
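As an illustration only, the weighted combination of the two loss terms described above can be sketched as follows in Python; the function name and the weight values `w1` and `w2` are hypothetical, since the text does not fix particular weights:

```python
# Hypothetical sketch: the current loss value as the weighted sum of the
# first loss value (frame quality) and the second loss value (category).
# The weight values w1 and w2 are illustrative and not fixed by the text.
def current_loss(frame_quality_losses, category_losses, w1=1.0, w2=1.0):
    first = sum(frame_quality_losses)    # could equally be the average value
    second = sum(category_losses)        # could equally be the average value
    return w1 * first + w2 * second
```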
  • the preset optimization algorithm may include, but is not limited to, the gradient descent method.
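As one example of such an optimization algorithm, a single gradient descent update step might look like the following sketch; the function name and learning rate are illustrative, not part of the embodiment:

```python
# Illustrative single update step of the gradient descent method;
# the learning rate lr = 0.1 is a hypothetical choice.
def gd_step(params, grads, lr=0.1):
    # new parameter = old parameter - learning rate * gradient
    return [p - lr * g for p, g in zip(params, grads)]
```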
  • the sample targets may be vehicles, pedestrians, and traffic signs.
  • the initial target detection model may be a deep learning-based neural network model.
  • the preset category loss function may be any type of loss function in the related art that can calculate a loss value between category information, which is not limited in the embodiment of the present invention.
  • the above-mentioned current loss value may also be determined in combination with the preset position loss function, the position information of the prediction frame corresponding to the sample target in the sample image, and the position information of the calibration frame.
  • the preset position loss function may be any type of loss function in the related art that can calculate a loss value between frame position information, which is not limited in the embodiment of the present invention.
  • the expression of the preset localization quality focus loss function (LFL, Localization Focal Loss) may be:
  • LFL(i) = -((1-p_i)·log(1-q_i) + p_i·log(q_i))·|p_i - q_i|^γ;
  • wherein LFL(i) represents the first loss value between the predicted frame quality information and the real frame quality information corresponding to the i-th sample target in the sample image, p_i represents the real frame quality information corresponding to the i-th sample target in the sample image, q_i represents the predicted frame quality information corresponding to the i-th sample target in the sample image, and γ represents a preset parameter.
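For illustration, the preset localization quality focusing loss (LFL) expression above, including the focusing term |p_i - q_i|^γ given elsewhere in the description, could be sketched as follows; the function name, the epsilon guard, and the default gamma value are assumptions:

```python
import math

# Sketch of the preset localization quality focusing loss (LFL) for one
# sample target. The epsilon guard and default gamma are assumptions.
def localization_focal_loss(p, q, gamma=2.0):
    # p: real frame quality information (IoU of calibration and prediction frame)
    # q: predicted frame quality information; both lie in [0, 1]
    eps = 1e-12  # guards log(0) at the interval endpoints
    ce = -((1 - p) * math.log(1 - q + eps) + p * math.log(q + eps))
    # The focusing term down-weights targets whose prediction is already close
    return ce * abs(p - q) ** gamma
```

Note that the loss vanishes when the predicted quality equals the real quality, and grows as the two diverge.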
  • the electronic device may also use a batch of sample images to calculate the current loss value, that is, determine the current loss value using the preset localization quality focusing loss function together with the predicted frame quality information and real frame quality information corresponding to the sample targets in the multiple sample images, and the preset category loss function together with the prediction category information and calibration category information corresponding to the sample targets in the multiple sample images.
  • the category information and the frame quality information can be jointly represented.
  • the sample frame quality information and the sample category information corresponding to the sample image exist in the form of a preset soft one-hot encoding, and the position of the sample frame quality information corresponding to the sample image in the preset soft one-hot encoding represents the sample category information corresponding to the sample image.
  • FIG. 3 is an example diagram of the joint representation of category information and frame quality information, in which the frame quality information is represented by a numerical value with a value range of [0, 1]. As shown in FIG. 3, 0.9 represents the frame quality information corresponding to the position information of the corresponding detection frame, and the fact that 0.9 appears in the second position indicates that the pre-established target detection model predicts that the target corresponding to the detection frame position information belongs to the second category.
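The joint representation described above can be illustrated with a short sketch; the number of categories and the concrete values are hypothetical:

```python
# Hypothetical soft one-hot encoding for a detector with five categories:
# the position of the non-zero entry identifies the category, and its
# value carries the frame quality information in [0, 1].
num_categories = 5          # illustrative category count
category_index = 1          # "second category" as in the example of FIG. 3
frame_quality = 0.9         # frame quality information of the detection frame
encoding = [0.0] * num_categories
encoding[category_index] = frame_quality
# encoding is now [0.0, 0.9, 0.0, 0.0, 0.0]
```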
  • the S102 may include the following steps:
  • the feature of the image to be detected is input into the feature regression layer of the pre-established target detection model, and the position information of the candidate frame corresponding to the image to be detected is determined;
  • for each detection target in the image to be detected, based on a preset suppression algorithm and the target frame quality information corresponding to each candidate frame position information corresponding to the detection target, determine, from all candidate frame position information corresponding to the detection target, the candidate frame position information that satisfies a preset screening condition as the target detection frame position information corresponding to the detection target, so as to obtain the target detection result corresponding to the image to be detected, wherein the preset screening condition is: a condition limiting that the corresponding target frame quality information in the candidate frame position information corresponding to the detection target is the largest.
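A minimal sketch of this quality-guided screening, assuming an NMS-style suppression algorithm and axis-aligned (x1, y1, x2, y2) frames; the function names and the IoU threshold are illustrative:

```python
def quality_nms(boxes, qualities, iou_threshold=0.5):
    """Keep, among overlapping candidate frames, the one whose target frame
    quality information is the largest (the preset screening condition).
    boxes: list of [x1, y1, x2, y2]; qualities: floats in [0, 1]."""
    def iou(a, b):
        # intersection area over union area of two axis-aligned frames
        ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
        ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
        inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
        area_a = (a[2] - a[0]) * (a[3] - a[1])
        area_b = (b[2] - b[0]) * (b[3] - b[1])
        return inter / (area_a + area_b - inter) if inter else 0.0

    # visit candidates in descending order of frame quality information
    order = sorted(range(len(boxes)), key=lambda i: qualities[i], reverse=True)
    kept = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) < iou_threshold for j in kept):
            kept.append(i)
    return kept  # indices of the retained target detection frames
```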
  • the preset suppression algorithm may be NMS (Non Maximum Suppression, non-maximum suppression algorithm).
  • the electronic device determines the candidate frame position information corresponding to the image to be detected through the feature extraction layer and the feature regression layer of the pre-established target detection model, and then uses the feature classification layer of the pre-established target detection model, the to-be-detected image features and the candidate frame position information to determine the detection category information and target frame quality information corresponding to each candidate frame position information corresponding to each detection target in the image to be detected;
  • for each detection target, based on the target frame quality information corresponding to each candidate frame position information, it then determines, from all candidate frame position information corresponding to the detection target, the candidate frame position information that satisfies the preset screening condition as the target detection frame position information corresponding to the detection target, so as to obtain the target detection result corresponding to the image to be detected. In this way, the selection of the corresponding candidate frame position information is completed with reference to frame quality, so as to obtain frame position information with better detection position accuracy.
  • an embodiment of the present invention provides a target detection apparatus.
  • the apparatus may include:
  • an obtaining module 410 configured to obtain an image to be detected
  • the determination module 420 is configured to use a pre-established target detection model and the to-be-detected image to determine a target detection result corresponding to the to-be-detected image, wherein the target detection result includes: the target detection frame position information corresponding to the detected target in the to-be-detected image and the target frame quality information corresponding to the target detection frame position information;
  • the pre-established target detection model is: a model obtained by training based on the sample image and its corresponding calibration information and the corresponding sample frame quality information;
  • the sample frame quality information corresponding to the sample image is: information determined based on the calibration frame position information in the calibration information corresponding to the sample image and the prediction frame position information corresponding to the sample image detected by the initial target detection model corresponding to the pre-established target detection model.
  • the pre-established target detection model obtained by training based on the sample image and its corresponding calibration information and the corresponding sample frame quality information has the function of predicting the quality corresponding to the target detection frame corresponding to the detection target in the image
  • the sample frame quality information is information determined based on the calibration frame position information in the calibration information corresponding to the sample image and the prediction frame position information corresponding to the sample image detected by the initial target detection model corresponding to the pre-established target detection model,
  • so that the frame position information with better frame quality information corresponding to the detection target can be screened out as the target detection frame position information;
  • this realizes the determination of the accuracy of the detection frame corresponding to the target in the image, and thereby obtains a detection frame corresponding to the target with better accuracy.
  • the sample frame quality information corresponding to the sample image is: the ratio information of the intersection area and the union area between the calibration frame position information in the calibration information corresponding to the sample image and the prediction frame position information corresponding to the sample image detected by the initial target detection model corresponding to the pre-established target detection model.
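For illustration, the intersection-over-union ratio described above can be computed as follows; the function name and the (x1, y1, x2, y2) frame convention are assumptions:

```python
def sample_frame_quality(calib, pred):
    """Ratio of intersection area to union area between the calibration frame
    position information and the prediction frame position information.
    Both frames are (x1, y1, x2, y2) tuples; the names are illustrative."""
    ix1, iy1 = max(calib[0], pred[0]), max(calib[1], pred[1])
    ix2, iy2 = min(calib[2], pred[2]), min(calib[3], pred[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((calib[2] - calib[0]) * (calib[3] - calib[1])
             + (pred[2] - pred[0]) * (pred[3] - pred[1]) - inter)
    return inter / union if union else 0.0
```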
  • the target detection result further includes: detection category information corresponding to the detection target in the to-be-detected image.
  • the device further includes:
  • the model training module (not shown in the figure) is configured to train to obtain the pre-established target detection model before the target detection frame position information corresponding to the to-be-detected target and the target frame quality information corresponding to the target detection frame position information are detected from the to-be-detected image by using the pre-established target detection model, wherein the model training module is specifically configured to obtain the initial target detection model, wherein,
  • the initial target detection model includes a feature extraction layer, a feature classification layer and a feature regression layer;
  • the calibration information includes: calibration frame position information and calibration category information corresponding to the sample targets contained in the corresponding sample images;
  • for each sample image, input the sample image into the feature extraction layer, and extract the sample image feature corresponding to the sample image;
  • the sample image feature corresponding to the sample image and the prediction frame position information corresponding to the sample target in the sample image are input into the feature classification layer, and the prediction category information and prediction frame corresponding to the sample target in the sample image are determined. quality information;
  • the expression of the preset positioning quality focus loss function is:
  • LFL(i) = -((1-p_i)·log(1-q_i) + p_i·log(q_i))·|p_i - q_i|^γ;
  • wherein LFL(i) represents the first loss value between the predicted frame quality information and the real frame quality information corresponding to the i-th sample target in the sample image, p_i represents the real frame quality information corresponding to the i-th sample target in the sample image, q_i represents the predicted frame quality information corresponding to the i-th sample target in the sample image, and γ represents a preset parameter.
  • the sample frame quality information and sample category information corresponding to the sample image exist in the form of a preset soft one-hot encoding, and the position of the sample frame quality information corresponding to the sample image in the preset soft one-hot encoding represents the sample category information corresponding to the sample image.
  • the determination module 420 is specifically configured to input the to-be-detected image into the feature extraction layer of the pre-established target detection model, and extract the to-be-detected image feature corresponding to the to-be-detected image;
  • for each detection target in the to-be-detected image, based on a preset suppression algorithm and the target frame quality information corresponding to each candidate frame position information corresponding to the detection target, determine, from all candidate frame position information corresponding to the detection target, the candidate frame position information that satisfies a preset screening condition as the target detection frame position information corresponding to the detection target, so as to obtain the target detection result corresponding to the image to be detected, wherein the preset screening condition is: a condition limiting that the corresponding target frame quality information in the candidate frame position information corresponding to the detection target is the largest.
  • the modules in the apparatus of the embodiment may be distributed in the apparatus of the embodiment according to the description of the embodiment, or may be located, with corresponding changes, in one or more apparatuses different from the one in this embodiment.
  • the modules in the foregoing embodiments may be combined into one module, or may be further split into multiple sub-modules.

Abstract

Disclosed in an embodiment of the present invention are a target detection method and device, the method comprising: acquiring an image to be detected; and using a pre-established target detection model and the image to be detected to determine a target detection result corresponding to the image to be detected. The target detection result comprises the position information of a target detection box in the image to be detected corresponding to a detection target, and the quality information of the target box corresponding thereto. The pre-established target detection model is a model trained based on sample images, calibration information corresponding thereto and the quality information of a corresponding sample box. The quality information of the sample box corresponding to the sample image is information determined based on the calibration box position information in the calibration information corresponding to the sample image and based on the prediction box position information corresponding to the sample image detected by an initial target detection model corresponding to the pre-established target detection model. The aim is to determine the accuracy of the detection box in the image corresponding to the target, and obtain a more accurate detection box corresponding to the target.

Description

A target detection method and device

Technical Field

The present invention relates to the technical field of target detection, and in particular to a target detection method and device.

Background Art

In current target detection technology, a target detection model is used to detect an image to be detected, and the obtained detection result generally includes the target detection frame position information corresponding to a detection target and the corresponding category probability information. When the target detection model screens out, from the position information of the multiple candidate detection frames predicted from the image to be detected, the target detection frame position information corresponding to the finally output target, the category probability information corresponding to each candidate detection frame position information is used for the screening, where the category probability information is a confidence level indicating that the corresponding target belongs to a certain category.

However, most current scenes in which target detection technology is applied, such as vehicle detection scenes and pedestrian detection scenes, require target detection frame position information with a more accurate position to be detected from the image to be detected based on the target detection model; that is, target detection frame position information whose corresponding frame quality information is higher needs to be obtained. Current target detection technology, however, cannot determine the quality information corresponding to detection frame position information.

Therefore, how to provide a method for determining the quality information of the detection frame corresponding to a target has become an urgent problem to be solved.

Summary of the Invention

The present invention provides a target detection method and device, so as to determine the accuracy of the detection frame corresponding to a target in an image and thereby obtain a more accurate detection frame corresponding to the target. The specific technical solutions are as follows:
In a first aspect, an embodiment of the present invention provides a target detection method, the method comprising:

obtaining an image to be detected;

determining, by using a pre-established target detection model and the image to be detected, a target detection result corresponding to the image to be detected, wherein the target detection result includes: the target detection frame position information corresponding to the detection target in the image to be detected and the target frame quality information corresponding to the target detection frame position information; the pre-established target detection model is: a model obtained by training based on sample images, their corresponding calibration information and the corresponding sample frame quality information; and the sample frame quality information corresponding to a sample image is: information determined based on the calibration frame position information in the calibration information corresponding to the sample image and the prediction frame position information corresponding to the sample image detected by the initial target detection model corresponding to the pre-established target detection model.

Optionally, the sample frame quality information corresponding to the sample image is: the ratio information of the intersection area and the union area between the calibration frame position information in the calibration information corresponding to the sample image and the prediction frame position information corresponding to the sample image detected by the initial target detection model corresponding to the pre-established target detection model.

Optionally, the target detection result further includes: detection category information corresponding to the detection target in the image to be detected.

Optionally, before the step of detecting, from the image to be detected by using the pre-established target detection model and the image to be detected, the target detection frame position information corresponding to the target to be detected and the target frame quality information corresponding to the target detection frame position information, the method further includes:

a process of training to obtain the pre-established target detection model, wherein the process includes:

obtaining the initial target detection model, wherein the initial target detection model includes a feature extraction layer, a feature classification layer and a feature regression layer;

obtaining a plurality of sample images and calibration information corresponding to the sample images, wherein the calibration information includes: calibration frame position information and calibration category information corresponding to the sample targets contained in the corresponding sample images;

for each sample image, inputting the sample image into the feature extraction layer and extracting the sample image feature corresponding to the sample image;

for each sample image, inputting the sample image feature corresponding to the sample image into the feature regression layer to obtain the prediction frame position information corresponding to the sample target in the sample image;

for each sample target in each sample image, calculating the ratio information of the intersection area and the union area between the calibration frame position information corresponding to the sample target and the corresponding prediction frame position information, and determining it as the real frame quality information corresponding to the sample target;

for each sample image, inputting the sample image feature corresponding to the sample image and the prediction frame position information corresponding to the sample target in the sample image into the feature classification layer, and determining the prediction category information and prediction frame quality information corresponding to the sample target in the sample image;

for each sample image, determining the current loss value based on the preset localization quality focusing loss function, the prediction frame quality information and the real frame quality information corresponding to the sample target in the sample image, as well as the preset category loss function, the prediction category information and the calibration category information corresponding to the sample target in the sample image;

judging whether the current loss value exceeds a preset loss value threshold;

if the judgment result is yes, adjusting the model parameters of the feature extraction layer, the feature regression layer and the feature classification layer, and returning to the step of, for each sample image, inputting the sample image into the feature extraction layer and extracting the sample image feature corresponding to the sample image;

if the judgment result is no, determining that the initial target detection model has reached a convergence state, and obtaining the pre-established target detection model.
Optionally, the expression of the preset localization quality focusing loss function is:

LFL(i) = -((1-p_i)·log(1-q_i) + p_i·log(q_i))·|p_i - q_i|^γ;

wherein LFL(i) represents the first loss value between the prediction frame quality information and the real frame quality information corresponding to the i-th sample target in the sample image, p_i represents the real frame quality information corresponding to the i-th sample target in the sample image, q_i represents the prediction frame quality information corresponding to the i-th sample target in the sample image, and γ represents a preset parameter.
Optionally, the sample frame quality information and sample category information corresponding to the sample image exist in the form of a preset soft one-hot encoding, and the position of the sample frame quality information corresponding to the sample image in the preset soft one-hot encoding represents the sample category information corresponding to the sample image.

Optionally, the step of determining, by using the pre-established target detection model and the image to be detected, the target detection result corresponding to the image to be detected includes:

inputting the image to be detected into the feature extraction layer of the pre-established target detection model, and extracting the to-be-detected image feature corresponding to the image to be detected;

inputting the to-be-detected image feature into the feature regression layer of the pre-established target detection model, and determining the candidate frame position information corresponding to the image to be detected;

inputting the to-be-detected image feature and the candidate frame position information into the feature classification layer of the pre-established target detection model, and determining the detection category information and the target frame quality information corresponding to each candidate frame position information corresponding to each detection target in the image to be detected;

for each detection target in the image to be detected, based on a preset suppression algorithm and the target frame quality information corresponding to each candidate frame position information corresponding to the detection target, determining, from all candidate frame position information corresponding to the detection target, the candidate frame position information that satisfies a preset screening condition as the target detection frame position information corresponding to the detection target, so as to obtain the target detection result corresponding to the image to be detected, wherein the preset screening condition is: a condition limiting that the corresponding target frame quality information in the candidate frame position information corresponding to the detection target is the largest.
第二方面,本发明实施例提供了一种目标检测装置,所述装置包括:In a second aspect, an embodiment of the present invention provides a target detection device, the device comprising:
获得模块,被配置为获得待检测图像;an obtaining module, configured to obtain an image to be detected;
确定模块,被配置为利用预先建立的目标检测模型以及所述待检测图像,确定所述待检测图像对应的目标检测结果,其中,所述目标检测结果包括:所述待检测图像中检测目标对应的目标检测框位置信息及目标检测框位置信息对应的目标框质量信息,所述预先建立的目标检测模型为:基于样本图像及其对应的标定信息以及所对应样本框质量信息训练所得的模型,所述样本图像对应的样本框质量信息为:基于该样本图像对应的标定信息中标定框位置信息以及基于预先建立的目标检测模型所对应初始目标检测模型检测出的该样本图像对应的预测框位置信息,确定的信息。A determination module, configured to use a pre-established target detection model and the to-be-detected image to determine a target detection result corresponding to the to-be-detected image, wherein the target detection result includes: the detected target in the to-be-detected image corresponds to The target detection frame position information and the target frame quality information corresponding to the target detection frame position information, the pre-established target detection model is: a model obtained by training based on the sample image and its corresponding calibration information and the corresponding sample frame quality information, The sample frame quality information corresponding to the sample image is: based on the calibration frame position information in the calibration information corresponding to the sample image and the predicted frame position corresponding to the sample image detected based on the initial target detection model corresponding to the pre-established target detection model information, definite information.
可选的,所述样本图像对应的样本框质量信息为:基于该样本图像对应的标定信息中的标定框位置信息以及基于预先建立的目标检测模型所对应初始目标检测模型检测出的该样本图像对应的预测框位置信息之间的交集面积与并集面积的比值信息。Optionally, the sample frame quality information corresponding to the sample image is: the sample image detected based on the calibration frame position information in the calibration information corresponding to the sample image and the initial target detection model corresponding to the pre-established target detection model. The ratio information of the intersection area and the union area between the corresponding prediction frame position information.
可选的,所述目标检测结果还包括:所述待检测图像中检测目标对应的检测类别信息。Optionally, the target detection result further includes: detection category information corresponding to the detection target in the to-be-detected image.
可选的,所述装置还包括:Optionally, the device further includes:
模型训练模块,被配置为在所述利用预先建立的目标检测模型以及所述待检测图像,从所述待检测图像中,检测出其中的待检测目标对应的目标检测框位置信息及目标检测框位置信息对应的目标框质量信息之前,训练得到所述预先建立的目标检测模型,其中,所述模型训练模块,被具体配置为获得所述初始目标检测模型,其中,所述初始目标检测模型包括特征提取层、特征分类层以及特征回归层;The model training module is configured to use the pre-established target detection model and the to-be-detected image to detect, from the to-be-detected image, the target detection frame position information and target detection frame corresponding to the target to be detected. Before obtaining the target frame quality information corresponding to the position information, the pre-established target detection model is obtained by training, wherein the model training module is specifically configured to obtain the initial target detection model, wherein the initial target detection model includes Feature extraction layer, feature classification layer and feature regression layer;
获得多个样本图像以及样本图像对应的标定信息,其中,所述标定信息包括:所对应样本图像中包含的样本目标对应的标定框位置信息以及标定类别信息;Obtaining a plurality of sample images and calibration information corresponding to the sample images, wherein the calibration information includes: calibration frame position information and calibration category information corresponding to the sample targets contained in the corresponding sample images;
针对每一样本图像,将该样本图像输入所述特征提取层,提取得到该样本图像对应的样本图像特征;For each sample image, input the sample image into the feature extraction layer, and extract the sample image feature corresponding to the sample image;
针对每一样本图像,将该样本图像对应的样本图像特征输入所述特征回归层,得到该样本图像中样本目标对应的预测框位置信息;For each sample image, input the sample image feature corresponding to the sample image into the feature regression layer, and obtain the prediction frame position information corresponding to the sample target in the sample image;
针对每一样本图像中每一样本目标，计算该样本目标对应的标定框位置信息以及对应的预测框位置信息之间的交集面积与并集面积的比值信息，确定为该样本目标对应的真实框质量信息；For each sample target in each sample image, calculate the ratio of the intersection area to the union area between the calibration frame position information corresponding to the sample target and the corresponding prediction frame position information, and determine it as the true frame quality information corresponding to the sample target;
针对每一样本图像,将该样本图像对应的样本图像特征以及该样本图像中样本目标对应的预测框位置信息输入所述特征分类层,确定该样本图像中样本目标对应的预测类别信息以及预测框质量信息;For each sample image, the sample image feature corresponding to the sample image and the prediction frame position information corresponding to the sample target in the sample image are input into the feature classification layer, and the prediction category information and prediction frame corresponding to the sample target in the sample image are determined. quality information;
针对每一样本图像，基于预设定位质量聚焦损失函数、该样本图像中样本目标对应的预测框质量信息和真实框质量信息，以及预设类别损失函数、该样本图像中样本目标对应的预测类别信息和标定类别信息，确定当前的损失值；For each sample image, determine the current loss value based on the preset localization-quality focal loss function, the prediction frame quality information and true frame quality information corresponding to the sample target in the sample image, as well as the preset category loss function, the prediction category information and calibration category information corresponding to the sample target in the sample image;
判断当前的损失值是否超过预设损失值阈值;Determine whether the current loss value exceeds the preset loss value threshold;
若判断结果为是，则调整所述特征提取层、所述特征回归层以及所述特征分类层的模型参数，返回执行针对每一样本图像，将该样本图像输入所述特征提取层，提取得到该样本图像对应的样本图像特征的步骤；If the judgment result is yes, adjust the model parameters of the feature extraction layer, the feature regression layer, and the feature classification layer, and return to the step of, for each sample image, inputting the sample image into the feature extraction layer to extract the sample image feature corresponding to the sample image;
若判断结果为否,则确定所述初始目标检测模型达到收敛状态,得到预先建立的目标检测模型。If the judgment result is no, it is determined that the initial target detection model has reached a convergence state, and a pre-established target detection model is obtained.
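The true frame quality computed in the training steps above is the intersection-over-union (IoU) of a calibration frame and a prediction frame. A minimal sketch, assuming each frame is given as (x1, y1, x2, y2) corner coordinates (the `iou` helper and the example boxes are illustrative, not taken from the specification):

```python
def iou(box_a, box_b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

# True frame quality for one sample target:
# IoU between the calibration frame and the predicted frame.
calib_box = (0.0, 0.0, 10.0, 10.0)  # calibration (ground-truth) frame
pred_box = (5.0, 0.0, 15.0, 10.0)   # frame from the feature regression layer
quality = iou(calib_box, pred_box)  # intersection 50, union 150 -> 1/3
```

The resulting value in [0, 1] is what the text calls the true frame quality information for that sample target.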
可选的，所述预设定位质量聚焦损失函数的表达式为：Optionally, the expression of the preset localization-quality focal loss function is:
LFL(i) = -((1-p_i)·log(1-q_i) + p_i·log(q_i))·|p_i-q_i|^γ；
其中，所述LFL(i)表示该样本图像中第i个样本目标对应的预测框质量信息和真实框质量信息之间的第一损失值，p_i表示该样本图像中第i个样本目标对应的真实框质量信息，q_i表示该样本图像中第i个样本目标对应的预测框质量信息，γ表示预设参数。Here, LFL(i) denotes the first loss value between the prediction frame quality information and the true frame quality information corresponding to the i-th sample target in the sample image, p_i denotes the true frame quality information corresponding to the i-th sample target in the sample image, q_i denotes the prediction frame quality information corresponding to the i-th sample target in the sample image, and γ denotes a preset parameter.
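The localization-quality focal loss above can be sketched in scalar form as follows; `gamma=2.0` is an illustrative default and the small `eps` guards the logarithms near q=0 and q=1 (both are assumptions, not values fixed by the specification):

```python
import math

def quality_focal_loss(p, q, gamma=2.0, eps=1e-12):
    """LFL(i) = -((1-p)*log(1-q) + p*log(q)) * |p - q|**gamma

    p: true frame quality (IoU-based, in [0, 1])
    q: predicted frame quality (model output, in [0, 1])
    gamma: preset focusing parameter (2.0 is an assumed default)
    eps: numerical guard for log(0); not part of the formula itself
    """
    ce = -((1.0 - p) * math.log(1.0 - q + eps) + p * math.log(q + eps))
    return ce * abs(p - q) ** gamma
```

When the prediction matches the true quality (q = p) the modulating factor |p-q|^γ vanishes, so well-predicted targets contribute almost no loss; the farther q drifts from p, the more the loss is amplified.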
可选的，所述样本图像对应的样本框质量信息以及样本类别信息以预设软独热编码的形式存在，所述样本图像对应的样本框质量信息在预设软独热编码中的位置表示样本图像对应的样本类别信息。Optionally, the sample frame quality information and sample category information corresponding to the sample image exist in the form of a preset soft one-hot encoding, and the position of the sample frame quality information in the preset soft one-hot encoding indicates the sample category information corresponding to the sample image.
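The joint soft one-hot representation can be sketched as a label vector whose only non-zero entry sits at the target's class index and carries the IoU-based frame quality (function name and vector layout are illustrative assumptions):

```python
def soft_one_hot(class_index, quality, num_classes):
    """Joint label for one sample target: a zero vector of length
    num_classes whose entry at the target's class index holds the
    sample frame quality (e.g. an IoU-based value in [0, 1])."""
    label = [0.0] * num_classes
    label[class_index] = quality
    return label

# e.g. 4 classes; the target belongs to class 2 with frame quality 0.83,
# so both the category and the frame quality are read off one vector.
label = soft_one_hot(2, 0.83, 4)  # [0.0, 0.0, 0.83, 0.0]
```

The position of the non-zero entry encodes the category; its magnitude encodes the frame quality, which is the joint representation described above.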
可选的,所述确定模块,被具体配置为将所述待检测图像输入预先建立的目标检测模型的特征提取层,提取得到所述待检测图像对应的待检测图像特征;Optionally, the determining module is specifically configured to input the to-be-detected image into a feature extraction layer of a pre-established target detection model, and extract the to-be-detected image feature corresponding to the to-be-detected image;
将所述待检测图像特征输入所述预先建立的目标检测模型的特征回归层,确定所述待检测图像对应的候选框位置信息;Inputting the feature of the image to be detected into the feature regression layer of the pre-established target detection model, and determining the position information of the candidate frame corresponding to the image to be detected;
将所述待检测图像特征以及所述候选框位置信息输入所述预先建立的目标检测模型的特征分类层，确定出所述待检测图像中每一检测目标对应的每一候选框位置信息对应的检测类别信息以及目标框质量信息；Input the to-be-detected image feature and the candidate frame position information into the feature classification layer of the pre-established target detection model, and determine the detection category information and target frame quality information corresponding to each candidate frame position information corresponding to each detection target in the to-be-detected image;
针对所述待检测图像中每一检测目标，基于预设抑制算法、该检测目标对应的每一候选框位置信息对应的目标框质量信息，从该检测目标对应的所有候选框位置信息中，确定出满足预设筛选条件的候选框位置信息，作为该检测目标对应的目标检测框位置信息，以得到所述待检测图像对应的目标检测结果，其中，所述预设筛选条件为：限定该检测目标对应的候选框位置信息中所对应目标框质量信息最大的条件。For each detection target in the to-be-detected image, based on a preset suppression algorithm and the target frame quality information corresponding to each candidate frame position information corresponding to the detection target, determine, from all candidate frame position information corresponding to the detection target, the candidate frame position information satisfying a preset screening condition as the target detection frame position information corresponding to the detection target, so as to obtain the target detection result corresponding to the to-be-detected image, where the preset screening condition is: the condition that the corresponding target frame quality information is the largest among the candidate frame position information corresponding to the detection target.
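The preset screening condition, keeping the candidate frame whose target frame quality is largest, can be sketched as follows (the `(frame, quality)` pair format is an assumption for illustration, and this covers only the quality-based selection, not the full suppression algorithm):

```python
def select_best_frame(candidates):
    """candidates: list of (frame, quality) pairs for one detection target.
    Returns the pair whose target frame quality information is largest,
    implementing the preset screening condition."""
    return max(candidates, key=lambda c: c[1])

# Three candidate frames for one detection target; the second one has
# the best frame quality, so it becomes the target detection frame.
candidates = [((0, 0, 10, 10), 0.62), ((1, 0, 11, 10), 0.88), ((2, 1, 12, 11), 0.45)]
best_frame, best_quality = select_best_frame(candidates)
```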
由上述内容可知，本发明实施例提供的一种目标检测方法及装置，获得待检测图像；利用预先建立的目标检测模型以及待检测图像，确定待检测图像对应的目标检测结果，其中，目标检测结果包括：待检测图像中检测目标对应的目标检测框位置信息及目标检测框位置信息对应的目标框质量信息，预先建立的目标检测模型为：基于样本图像及其对应的标定信息以及所对应样本框质量信息训练所得的模型，样本图像对应的样本框质量信息为：基于该样本图像对应的标定信息中标定框位置信息以及基于预先建立的目标检测模型所对应初始目标检测模型检测出的该样本图像对应的预测框位置信息，确定的信息。It can be seen from the above that the target detection method and device provided by the embodiments of the present invention obtain an image to be detected and determine the target detection result corresponding to the to-be-detected image by using a pre-established target detection model and the to-be-detected image, where the target detection result includes: the target detection frame position information corresponding to the detection target in the to-be-detected image and the target frame quality information corresponding to the target detection frame position information. The pre-established target detection model is a model trained based on sample images, their corresponding calibration information, and the corresponding sample frame quality information; the sample frame quality information corresponding to a sample image is information determined based on the calibration frame position information in the calibration information corresponding to the sample image and the prediction frame position information corresponding to the sample image detected by the initial target detection model corresponding to the pre-established target detection model.
应用本发明实施例，基于样本图像及其对应的标定信息以及所对应样本框质量信息训练所得的预先建立的目标检测模型，具有预测图像中检测目标对应的目标检测框对应的质量的功能，且该样本框质量信息为基于该样本图像对应的标定信息中标定框位置信息以及基于预先建立的目标检测模型所对应初始目标检测模型检测出的该样本图像对应的预测框位置信息，确定的信息，通过预先建立的目标检测模型的预测图像中目标对应的框位置信息对应的框质量信息，可以实现筛选出检测目标所对应框质量信息更好的框位置信息作为目标检测框位置信息，以实现对图像中目标对应的检测框的准确性的确定，进而得到准确性更好的目标对应的检测框。当然，实施本发明的任一产品或方法并不一定需要同时达到以上所述的所有优点。By applying the embodiments of the present invention, the pre-established target detection model trained based on sample images, their corresponding calibration information, and the corresponding sample frame quality information has the function of predicting the quality of the target detection frame corresponding to a detection target in an image. Since the sample frame quality information is determined based on the calibration frame position information in the calibration information corresponding to the sample image and the prediction frame position information corresponding to the sample image detected by the initial target detection model corresponding to the pre-established target detection model, the frame quality information that the pre-established target detection model predicts for the frame position information corresponding to a target in an image makes it possible to screen out the frame position information with better frame quality information as the target detection frame position information. This determines the accuracy of the detection frame corresponding to a target in the image and thus yields a detection frame with better accuracy. Of course, implementing any product or method of the present invention does not necessarily require achieving all of the advantages described above at the same time.
本发明实施例的创新点包括:The innovative points of the embodiments of the present invention include:
1、基于样本图像及其对应的标定信息以及所对应样本框质量信息训练所得的预先建立的目标检测模型，具有预测图像中检测目标对应的目标检测框对应的质量的功能，且该样本框质量信息为基于该样本图像对应的标定信息中标定框位置信息以及基于预先建立的目标检测模型所对应初始目标检测模型检测出的该样本图像对应的预测框位置信息，确定的信息，通过预先建立的目标检测模型的预测图像中目标对应的框位置信息对应的框质量信息，可以实现筛选出检测目标所对应框质量信息更好的框位置信息作为目标检测框位置信息，以实现对图像中目标对应的检测框的准确性的确定，进而得到准确性更好的目标对应的检测框。1. The pre-established target detection model trained based on sample images, their corresponding calibration information, and the corresponding sample frame quality information has the function of predicting the quality of the target detection frame corresponding to a detection target in an image. Since the sample frame quality information is determined based on the calibration frame position information in the calibration information corresponding to the sample image and the prediction frame position information corresponding to the sample image detected by the initial target detection model corresponding to the pre-established target detection model, the frame quality information predicted by the pre-established target detection model for the frame position information corresponding to a target in an image makes it possible to screen out the frame position information with better frame quality information as the target detection frame position information, thereby determining the accuracy of the detection frame corresponding to a target in the image and obtaining a more accurate detection frame.
2、将基于该样本图像对应的标定信息中的标定框位置信息以及基于预先建立的目标检测模型所对应初始目标检测模型检测出的该样本图像对应的预测框位置信息之间的交集面积与并集面积的比值信息，作为样本图像对应的样本框质量信息，使得预先建立的目标检测模型学习到更符合实际的框质量的预测功能，为后续的框位置信息对应的框质量信息的预测，并基于框质量信息对框位置信息的筛选提供基础。2. Taking the ratio of the intersection area to the union area between the calibration frame position information in the calibration information corresponding to the sample image and the prediction frame position information corresponding to the sample image detected by the initial target detection model corresponding to the pre-established target detection model as the sample frame quality information corresponding to the sample image enables the pre-established target detection model to learn a frame-quality prediction function that better matches the actual frame quality, providing a basis for the subsequent prediction of frame quality information corresponding to frame position information and for the screening of frame position information based on frame quality information.
3、训练得到预先建立的目标检测模型的过程，通过初始目标检测模型的特征提取层、特征回归层以及样本图像，得到样本图像中样本目标对应的预测框位置信息，进而针对每一样本图像中每一样本目标，计算该样本目标对应的标定框位置信息以及对应的预测框位置信息之间的交集面积与并集面积的比值信息，确定为该样本目标对应的真实框质量信息，进而，通过初始目标检测模型的特征分类层、该样本图像对应的样本图像特征以及该样本图像中样本目标对应的预测框位置信息，确定该样本图像中样本目标对应的预测类别信息以及预测框质量信息；利用预设定位质量聚焦损失函数、该样本图像中样本目标对应的预测框质量信息和真实框质量信息，以及预设类别损失函数、该样本图像中样本目标对应的预测类别信息和标定类别信息，确定当前的损失值，若当前的损失值超过预设损失值阈值，调整特征提取层、特征回归层以及特征分类层的模型参数，并返回针对每一样本图像，将该样本图像输入所述特征提取层，提取得到该样本图像对应的样本图像特征；若当前的损失值未超过预设损失值阈值，则确定得到预先建立的目标检测模型，实现模型的训练，使得预先建立的目标检测模型具有预测图像中检测目标对应的目标检测框对应的质量的能力，为后续的检测框位置信息的确定提供基础。3. In the process of training the pre-established target detection model, the prediction frame position information corresponding to the sample targets in the sample images is obtained through the feature extraction layer and feature regression layer of the initial target detection model and the sample images. Then, for each sample target in each sample image, the ratio of the intersection area to the union area between the calibration frame position information corresponding to the sample target and the corresponding prediction frame position information is calculated and determined as the true frame quality information corresponding to the sample target. Further, through the feature classification layer of the initial target detection model, the sample image feature corresponding to the sample image, and the prediction frame position information corresponding to the sample target in the sample image, the prediction category information and prediction frame quality information corresponding to the sample target in the sample image are determined. The current loss value is determined using the preset localization-quality focal loss function, the prediction frame quality information and true frame quality information corresponding to the sample target in the sample image, as well as the preset category loss function, the prediction category information and calibration category information corresponding to the sample target in the sample image. If the current loss value exceeds the preset loss value threshold, the model parameters of the feature extraction layer, feature regression layer, and feature classification layer are adjusted, and the process returns to inputting each sample image into the feature extraction layer to extract the corresponding sample image feature; if the current loss value does not exceed the preset loss value threshold, the pre-established target detection model is obtained. This realizes the training of the model, so that the pre-established target detection model has the ability to predict the quality of the target detection frame corresponding to a detection target in an image, providing a basis for the subsequent determination of detection frame position information.
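The loss-threshold stopping rule described in point 3 can be sketched as the following loop; `compute_loss` and `update_params` are hypothetical stand-ins for the forward pass and the parameter adjustment, and `max_iters` is a safety bound not present in the specification:

```python
def train_until_converged(compute_loss, update_params, loss_threshold, max_iters=1000):
    """Repeat: compute the current loss; if it exceeds the preset
    loss-value threshold, adjust the model parameters and try again;
    otherwise treat the model as converged (pre-established model)."""
    loss = compute_loss()
    for _ in range(max_iters):
        if loss <= loss_threshold:
            break  # convergence state reached
        update_params()
        loss = compute_loss()
    return loss

# Toy run: each "parameter update" halves the loss, so the loop stops
# once the loss no longer exceeds the threshold of 1.0.
state = {"loss": 4.0}
final = train_until_converged(lambda: state["loss"],
                              lambda: state.update(loss=state["loss"] / 2.0),
                              loss_threshold=1.0)
```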
4、设置可以支持框质量信息预测训练的预设定位质量聚焦损失函数，以支持预先建立的目标检测模型对图像中检测目标对应的目标检测框对应的质量的预测能力的训练。4. A preset localization-quality focal loss function that supports frame-quality prediction training is set, so as to support training the pre-established target detection model's ability to predict the quality of the target detection frame corresponding to a detection target in an image.
5、样本图像对应的样本框质量信息以及样本类别信息以预设软独热编码的形式存在，并且样本图像对应的样本框质量信息在预设软独热编码中的位置表示样本图像对应的样本类别信息，实现类别信息与框质量信息的联合表示。5. The sample frame quality information and sample category information corresponding to the sample image exist in the form of a preset soft one-hot encoding, and the position of the sample frame quality information in the preset soft one-hot encoding indicates the sample category information corresponding to the sample image, realizing a joint representation of category information and frame quality information.
6、通过预先建立的目标检测模型对待检测图像进行检测过程中，通过预先建立的目标检测模型的特征提取层以及特征回归层，确定出待检测图像对应的候选框位置信息，进而，结合预先建立的目标检测模型的特征分类层，确定出待检测图像中每一检测目标对应的每一候选框位置信息对应的检测类别信息以及目标框质量信息；通过预设抑制算法、该检测目标对应的每一候选框位置信息对应的目标框质量信息，从该检测目标对应的所有候选框位置信息中，确定出满足预设筛选条件的候选框位置信息，作为该检测目标对应的目标检测框位置信息，以得到所述待检测图像对应的目标检测结果。实现通过比较框质量信息所表征的所对应候选框位置信息的准确性的高低，来完成对所对应候选框位置信息的筛选和确定，以得到检测位置准确性更好的框位置信息。6. In the process of detecting the to-be-detected image with the pre-established target detection model, the candidate frame position information corresponding to the to-be-detected image is determined through the feature extraction layer and feature regression layer of the pre-established target detection model; then, combined with the feature classification layer of the pre-established target detection model, the detection category information and target frame quality information corresponding to each candidate frame position information corresponding to each detection target in the to-be-detected image are determined. Through the preset suppression algorithm and the target frame quality information corresponding to each candidate frame position information corresponding to the detection target, the candidate frame position information satisfying the preset screening condition is determined from all candidate frame position information corresponding to the detection target as the target detection frame position information corresponding to the detection target, so as to obtain the target detection result corresponding to the to-be-detected image. By comparing the accuracy of the corresponding candidate frame position information represented by the frame quality information, the screening and determination of candidate frame position information is completed, so as to obtain frame position information with better detection position accuracy.
附图说明Description of drawings
为了更清楚地说明本发明实施例或现有技术中的技术方案,下面将对实施例或现有技术描述中所需要使用的附图作简单介绍。显而易见地,下面描述中的附图仅仅是本发明的一些实施例。对于本领域普通技术人员来讲,在不付出创造性劳动的前提下,还可以根据这些附图获得其他的附图。In order to illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the following briefly introduces the accompanying drawings that are required in the description of the embodiments or the prior art. Obviously, the drawings in the following description are only some embodiments of the invention. For those of ordinary skill in the art, other drawings can also be obtained from these drawings without creative effort.
图1为本发明实施例提供的目标检测方法的一种流程示意图;1 is a schematic flowchart of a target detection method provided by an embodiment of the present invention;
图2为训练得到预先建立的目标检测模型的过程一种流程示意图;Fig. 2 is a kind of schematic flowchart of the process of training to obtain a pre-established target detection model;
图3为联合表示的类别信息与框质量信息的一种示意图;FIG. 3 is a schematic diagram of category information and frame quality information jointly represented;
图4为本发明实施例提供的目标检测装置的一种结构示意图。FIG. 4 is a schematic structural diagram of a target detection apparatus provided by an embodiment of the present invention.
具体实施方式Detailed Description
下面将结合本发明实施例中的附图,对本发明实施例中的技术方案进行清楚、完整的描述。显然,所描述的实施例仅仅是本发明的一部分实施例,而不是全部的实施例。基于本发明中的实施例,本领域普通技术人员在没有付出创造性劳动前提下所获得的所有其他实施例,都属于本发明保护的范围。The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention. Obviously, the described embodiments are only some, but not all, embodiments of the present invention. Based on the embodiments of the present invention, all other embodiments obtained by those of ordinary skill in the art without creative efforts shall fall within the protection scope of the present invention.
需要说明的是，本发明实施例及附图中的术语"包括"和"具有"以及它们的任何变形，意图在于覆盖不排他的包含。例如包含的一系列步骤或单元的过程、方法、系统、产品或设备没有限定于已列出的步骤或单元，而是可选地还包括没有列出的步骤或单元，或可选地还包括对于这些过程、方法、产品或设备固有的其他步骤或单元。It should be noted that the terms "comprising" and "having" in the embodiments of the present invention and the accompanying drawings, as well as any variations thereof, are intended to cover non-exclusive inclusion. For example, a process, method, system, product, or device that includes a series of steps or units is not limited to the listed steps or units, but optionally also includes steps or units that are not listed, or optionally also includes other steps or units inherent to these processes, methods, products, or devices.
本发明提供了一种目标检测方法及装置,以实现对图像中目标对应的检测框的准确性的确定,进而得到准确性更好的目标对应的检测框。下面对本发明实施例进行详细说明。The invention provides a target detection method and device, so as to realize the determination of the accuracy of the detection frame corresponding to the target in the image, and then obtain the detection frame corresponding to the target with better accuracy. The embodiments of the present invention will be described in detail below.
图1为本发明实施例提供的目标检测方法的一种流程示意图。该方法可以包括如下步骤:FIG. 1 is a schematic flowchart of a target detection method provided by an embodiment of the present invention. The method may include the following steps:
S101:获得待检测图像。S101: Obtain an image to be detected.
本发明实施例所提供的目标检测方法，可以应用于任一具有计算能力的电子设备，该电子设备可以为终端或者服务器。在一种实现方式中，该电子设备可以为车载设备，设置于车辆上，该车辆还可以设置有图像采集设备，图像采集设备可以针对车辆所处环境采集图像，电子设备与图像采集设备连接，可以获得图像采集设备所采集的图像，作为待检测图像。另一种实现方式中，该电子设备可以为非车载设备，电子设备可以与针对目标场景进行拍摄的图像采集设备连接，获得图像采集设备针对目标场景采集的图像，作为待检测图像，一种情况中，该目标场景可以为公路场景或广场场景或室内场景，这都是可以的。The target detection method provided by the embodiments of the present invention can be applied to any electronic device with computing capability, and the electronic device can be a terminal or a server. In one implementation, the electronic device may be an in-vehicle device installed on a vehicle; the vehicle may also be provided with an image acquisition device that collects images of the environment in which the vehicle is located, and the electronic device, connected to the image acquisition device, can obtain the images collected by the image acquisition device as the images to be detected. In another implementation, the electronic device may be a non-vehicle device connected to an image acquisition device that captures a target scene, obtaining the images collected by the image acquisition device for the target scene as the images to be detected. In one case, the target scene may be a road scene, a square scene, or an indoor scene; all of these are possible.
该待检测图像可以为RGB(Red Green Blue，红绿蓝)图像，也可以为红外图像，这都是可以的。本发明实施例并不对待检测图像的类型进行限定。The to-be-detected image may be an RGB (Red Green Blue) image or an infrared image; both are possible. The embodiment of the present invention does not limit the type of the to-be-detected image.
S102:利用预先建立的目标检测模型以及待检测图像,确定待检测图像对应的目标检测结果。S102: Determine a target detection result corresponding to the to-be-detected image by using the pre-established target detection model and the to-be-detected image.
其中，目标检测结果包括：待检测图像中检测目标对应的目标检测框位置信息及目标检测框位置信息对应的目标框质量信息，预先建立的目标检测模型为：基于样本图像及其对应的标定信息以及所对应样本框质量信息训练所得的模型，样本图像对应的样本框质量信息为：基于该样本图像对应的标定信息中标定框位置信息以及基于预先建立的目标检测模型所对应初始目标检测模型检测出的该样本图像对应的预测框位置信息，确定的信息。The target detection result includes: the target detection frame position information corresponding to the detection target in the to-be-detected image and the target frame quality information corresponding to the target detection frame position information. The pre-established target detection model is a model trained based on sample images, their corresponding calibration information, and the corresponding sample frame quality information; the sample frame quality information corresponding to a sample image is information determined based on the calibration frame position information in the calibration information corresponding to the sample image and the prediction frame position information corresponding to the sample image detected by the initial target detection model corresponding to the pre-established target detection model.
电子设备本地或所连接的存储设备本地存储有基于样本图像及其对应的标定信息以及所对应样本框质量信息训练所得的预先建立的目标检测模型，其中，该预先建立的目标检测模型在训练过程中利用预设定位质量聚焦损失函数来调整相应的模型参数。样本图像对应的样本框质量信息为：基于该样本图像对应的标定信息中标定框位置信息以及基于预先建立的目标检测模型所对应初始目标检测模型检测出的该样本图像对应的预测框位置信息，确定的信息。该基于样本图像及其对应的标定信息以及所对应样本框质量信息训练所得的预先建立的目标检测模型，具有预测各检测框对应的质量的能力，即各检测框位置信息对应的框质量信息的能力，该框质量信息可以表征所对应的检测所得的检测框位置信息的准确性。一种情况，检测框位置信息对应的框质量信息可以通过数值表征，该框质量信息的数值越大，表征其所对应的检测框位置信息的准确性越高，即表征该检测框位置信息所表征位置区域与目标所在位置区域越相符。其中，为了布局情况，预先建立的目标检测模型的训练过程后续进行说明。The electronic device, or a storage device connected to it, locally stores the pre-established target detection model trained based on sample images, their corresponding calibration information, and the corresponding sample frame quality information, where the pre-established target detection model uses the preset localization-quality focal loss function to adjust the corresponding model parameters during training. The sample frame quality information corresponding to a sample image is information determined based on the calibration frame position information in the calibration information corresponding to the sample image and the prediction frame position information corresponding to the sample image detected by the initial target detection model corresponding to the pre-established target detection model. The pre-established target detection model trained in this way has the ability to predict the quality corresponding to each detection frame, that is, the frame quality information corresponding to each detection frame position information; this frame quality information can represent the accuracy of the corresponding detected detection frame position information. In one case, the frame quality information corresponding to detection frame position information can be represented by a numerical value: the larger the value, the higher the accuracy of the corresponding detection frame position information, that is, the more consistent the location area represented by the detection frame position information is with the location area where the target is located. For layout reasons, the training process of the pre-established target detection model is described later.
电子设备将待检测图像输入该预先建立的目标检测模型，利用预先建立的目标检测模型对待检测图像进行图像特征提取，得到待检测图像特征；并利用预先建立的目标检测模型对待检测图像特征进行回归，从待检测图像中回归出多个作为候选的候选检测框，得到其位置信息；利用预先建立的目标检测模型、多个候选检测框位置信息以及待检测图像特征，预测各候选检测框位置信息对应的框质量信息，进而利用框质量信息对多个候选检测框位置信息进行筛选，筛选出各检测目标对应的框质量信息表征所对应候选检测框位置信息较准确的候选检测框位置信息，以得到包括待检测图像中检测目标对应的目标检测框位置信息及目标检测框位置信息对应的目标框质量信息。The electronic device inputs the to-be-detected image into the pre-established target detection model, performs image feature extraction on the to-be-detected image with the pre-established target detection model to obtain the to-be-detected image feature, and performs regression on the to-be-detected image feature with the pre-established target detection model to regress multiple candidate detection frames from the to-be-detected image and obtain their position information. Using the pre-established target detection model, the multiple candidate detection frame position information, and the to-be-detected image feature, the frame quality information corresponding to each candidate detection frame position information is predicted; the frame quality information is then used to screen the multiple candidate detection frame position information, screening out, for each detection target, the candidate detection frame position information whose frame quality information indicates more accurate position information, so as to obtain the target detection frame position information corresponding to the detection target in the to-be-detected image and the target frame quality information corresponding to the target detection frame position information.
应用本发明实施例，基于样本图像及其对应的标定信息以及所对应样本框质量信息训练所得的预先建立的目标检测模型，具有预测图像中检测目标对应的目标检测框对应的质量的功能，且该样本框质量信息为基于该样本图像对应的标定信息中标定框位置信息以及基于预先建立的目标检测模型所对应初始目标检测模型检测出的该样本图像对应的预测框位置信息，确定的信息，通过预先建立的目标检测模型的预测图像中目标对应的框位置信息对应的框质量信息，可以实现筛选出检测目标所对应框质量信息更好的框位置信息作为目标检测框位置信息，以实现对图像中目标对应的检测框的准确性的确定，进而得到准确性更好的目标对应的检测框。By applying the embodiments of the present invention, the pre-established target detection model trained based on sample images, their corresponding calibration information, and the corresponding sample frame quality information has the function of predicting the quality of the target detection frame corresponding to a detection target in an image. Since the sample frame quality information is determined based on the calibration frame position information in the calibration information corresponding to the sample image and the prediction frame position information corresponding to the sample image detected by the initial target detection model corresponding to the pre-established target detection model, the frame quality information that the pre-established target detection model predicts for the frame position information corresponding to a target in an image makes it possible to screen out the frame position information with better frame quality information as the target detection frame position information, thereby determining the accuracy of the detection frame corresponding to a target in the image and obtaining a more accurate detection frame.
在本发明的另一实施例中，该样本图像对应的样本框质量信息为：基于该样本图像对应的标定信息中的标定框位置信息以及基于预先建立的目标检测模型所对应初始目标检测模型检测出的该样本图像对应的预测框位置信息之间的交集面积与并集面积的比值信息。即将基于预先建立的目标检测模型所对应初始目标检测模型检测出的该样本图像对应的预测框位置信息与该样本图像对应的标定信息中的标定框位置信息的交集面积，与上述两者的并集面积的比值信息，确定为该样本图像对应的样本框质量信息。可以理解的是，预先建立的目标检测模型所对应初始目标检测模型检测出的该样本图像对应的预测框位置信息与标定框位置信息之间的交集面积越大，并集面积越小，相应的，样本图像对应的样本框质量信息表征：基于预先建立的目标检测模型所对应初始目标检测模型检测出的该样本图像对应的预测框位置信息与该样本图像对应的标定信息中的标定框位置信息之间越相近，即基于预先建立的目标检测模型所对应初始目标检测模型检测出的该样本图像对应的预测框位置信息的位置信息准确性越高。In another embodiment of the present invention, the sample frame quality information corresponding to the sample image is: the ratio of the intersection area to the union area between the calibration frame position information in the calibration information corresponding to the sample image and the prediction frame position information corresponding to the sample image detected by the initial target detection model corresponding to the pre-established target detection model. That is, the ratio of the intersection area between the prediction frame position information corresponding to the sample image detected by the initial target detection model and the calibration frame position information in the calibration information corresponding to the sample image, to the union area of the two, is determined as the sample frame quality information corresponding to the sample image. It can be understood that the larger this intersection area and the smaller the union area, the more the sample frame quality information corresponding to the sample image indicates that the prediction frame position information detected by the initial target detection model is close to the calibration frame position information in the calibration information corresponding to the sample image; that is, the higher the accuracy of the prediction frame position information corresponding to the sample image detected by the initial target detection model corresponding to the pre-established target detection model.
在一种情况中,样本图像对应的样本框质量信息可以通过数值来表示,该数值的取值范围可以为[0,1]。In one case, the quality information of the sample frame corresponding to the sample image can be represented by a numerical value, and the value range of the numerical value can be [0, 1].
在本发明的另一实施例中,目标检测结果还可以包括:待检测图像中检测目标对应的检测类别信息。相应的,样本图像对应的标定信息中还可以包括标定类别信息,以使得训练所得的预先建立的目标检测模型具有预测图像中目标的类别的能力。In another embodiment of the present invention, the target detection result may further include: detection category information corresponding to the detection target in the image to be detected. Correspondingly, the calibration information corresponding to the sample image may also include calibration category information, so that the pre-established target detection model obtained by training has the ability to predict the category of the target in the image.
在本发明的另一实施例中,在所述S102之前,所述方法还可以包括:In another embodiment of the present invention, before the S102, the method may further include:
训练得到预先建立的目标检测模型的过程,其中,如图2所示,该过程包括如下步骤:The process of training to obtain a pre-established target detection model, wherein, as shown in Figure 2, the process includes the following steps:
S201:获得初始目标检测模型。S201: Obtain an initial target detection model.
其中,初始目标检测模型包括特征提取层、特征分类层以及特征回归层;The initial target detection model includes a feature extraction layer, a feature classification layer and a feature regression layer;
S202:获得多个样本图像以及样本图像对应的标定信息。S202: Obtain multiple sample images and calibration information corresponding to the sample images.
其中,标定信息包括:所对应样本图像中包含的样本目标对应的标定框位置信息以及标定类别信息。The calibration information includes: calibration frame position information and calibration category information corresponding to the sample target contained in the corresponding sample image.
S203:针对每一样本图像,将该样本图像输入特征提取层,提取得到该样本图像对应的样本图像特征。S203: For each sample image, input the sample image into the feature extraction layer, and extract the sample image feature corresponding to the sample image.
S204:针对每一样本图像,将该样本图像对应的样本图像特征输入特征回归层,得到该样本图像中样本目标对应的预测框位置信息。S204: For each sample image, input the sample image feature corresponding to the sample image into the feature regression layer, and obtain the prediction frame position information corresponding to the sample target in the sample image.
S205：针对每一样本图像中每一样本目标，计算该样本目标对应的标定框位置信息以及对应的预测框位置信息之间的交集面积与并集面积的比值信息，确定为该样本目标对应的真实框质量信息。S205: For each sample target in each sample image, calculate the ratio of the intersection area to the union area between the calibration frame position information corresponding to the sample target and the corresponding prediction frame position information, and determine it as the real frame quality information corresponding to the sample target.
S206：针对每一样本图像，将该样本图像对应的样本图像特征以及该样本图像中样本目标对应的预测框位置信息输入特征分类层，确定该样本图像中样本目标对应的预测类别信息以及预测框质量信息。S206: For each sample image, input the sample image features corresponding to the sample image and the prediction frame position information corresponding to the sample targets in the sample image into the feature classification layer, and determine the prediction category information and prediction frame quality information corresponding to the sample targets in the sample image.
S207：针对每一样本图像，基于预设定位质量聚焦损失函数、该样本图像中样本目标对应的预测框质量信息和真实框质量信息，以及预设类别损失函数、该样本图像中样本目标对应的预测类别信息和标定类别信息，确定当前的损失值。S207: For each sample image, determine the current loss value based on the preset localization quality focal loss function together with the prediction frame quality information and real frame quality information corresponding to the sample targets in the sample image, and the preset category loss function together with the prediction category information and calibration category information corresponding to the sample targets in the sample image.
S208:判断当前的损失值是否超过预设损失值阈值。S208: Determine whether the current loss value exceeds a preset loss value threshold.
S209:若判断结果为是,则调整特征提取层、特征回归层以及特征分类层的模型参数,返回执行S203。S209: If the judgment result is yes, adjust the model parameters of the feature extraction layer, the feature regression layer, and the feature classification layer, and return to S203.
S210:若判断结果为否,则确定初始目标检测模型达到收敛状态,得到预先建立的目标检测模型。S210: If the judgment result is no, it is determined that the initial target detection model has reached a convergence state, and a pre-established target detection model is obtained.
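Steps S203–S210 above form a training loop. The following Python sketch is purely illustrative: the `model` object and its `extract`, `regress`, `iou`, `classify`, `loss`, and `update` methods are hypothetical stand-ins for the feature extraction layer, the feature regression layer, the IoU computation of S205, the feature classification layer, the combined loss of S207, and the parameter adjustment of S209; none of these names come from the patent.

```python
def train_detector(model, samples, loss_threshold=0.05, max_iters=100):
    """Run the S203-S210 loop until the current loss value no longer
    exceeds the preset loss threshold (S208), i.e. until convergence.

    samples: list of (image, calibration_box, calibration_class) tuples.
    """
    for _ in range(max_iters):
        total_loss = 0.0
        for image, gt_box, gt_class in samples:
            features = model.extract(image)                # S203: extract features
            pred_box = model.regress(features)             # S204: predict box
            true_quality = model.iou(pred_box, gt_box)     # S205: real frame quality
            pred_class, pred_quality = model.classify(features, pred_box)  # S206
            total_loss += model.loss(pred_quality, true_quality,
                                     pred_class, gt_class)  # S207: current loss
        if total_loss <= loss_threshold:                   # S208/S210: converged
            return model
        model.update()                                     # S209: adjust parameters
    return model
```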
本实现方式中，电子设备在确定待检测图像对应的目标检测结果之前，还可以包括训练得到预先建立的目标检测模型的过程。相应的，电子设备获得多个样本图像及其对应的标定信息，该样本图像可以包括样本目标，样本图像对应的标定信息可以包括样本图像中样本目标对应的标定框位置信息。获得包括特征提取层、特征回归层以及特征分类层的初始目标检测模型；针对每一样本图像，将该样本图像输入特征提取层，提取得到该样本图像对应的样本图像特征；将该样本图像对应的样本图像特征输入特征回归层，得到该样本图像中样本目标对应的预测框位置信息；进而，针对每一样本图像中每一样本目标，计算该样本目标对应的标定框位置信息以及对应的预测框位置信息之间的交集面积与并集面积的比值信息，确定为该样本目标对应的真实框质量信息；并针对每一样本图像，将该样本图像对应的样本图像特征以及该样本图像中样本目标对应的预测框位置信息输入特征分类层，确定该样本图像中样本目标对应的预测类别信息以及预测框质量信息；将样本图像中样本目标对应的真实框质量信息作为一种标定信息，针对每一样本图像，基于预设定位质量聚焦损失函数、该样本图像中样本目标对应的预测框质量信息和真实框质量信息，确定当前的第一损失值；并基于预设类别损失函数、该样本图像中样本目标对应的预测类别信息和标定类别信息，确定当前的第二损失值；进而基于当前的第一损失值和当前的第二损失值，确定当前的损失值。In this implementation manner, before the electronic device determines the target detection result corresponding to the image to be detected, the method may further include a process of training to obtain the pre-established target detection model. Correspondingly, the electronic device obtains a plurality of sample images and their corresponding calibration information; each sample image may contain sample targets, and the calibration information corresponding to a sample image may include the calibration frame position information corresponding to the sample targets in that sample image. The electronic device obtains an initial target detection model including a feature extraction layer, a feature regression layer, and a feature classification layer. For each sample image, the sample image is input into the feature extraction layer to extract the sample image features corresponding to the sample image, and these sample image features are input into the feature regression layer to obtain the prediction frame position information corresponding to the sample targets in the sample image. Then, for each sample target in each sample image, the ratio of the intersection area to the union area between the calibration frame position information corresponding to the sample target and the corresponding prediction frame position information is calculated and determined as the real frame quality information corresponding to the sample target. For each sample image, the sample image features corresponding to the sample image and the prediction frame position information corresponding to the sample targets in the sample image are input into the feature classification layer to determine the prediction category information and prediction frame quality information corresponding to the sample targets in the sample image. Taking the real frame quality information corresponding to the sample targets in the sample image as a kind of calibration information, for each sample image, the current first loss value is determined based on the preset localization quality focal loss function and the prediction frame quality information and real frame quality information corresponding to the sample targets in the sample image; the current second loss value is determined based on the preset category loss function and the prediction category information and calibration category information corresponding to the sample targets in the sample image; and the current loss value is then determined based on the current first loss value and the current second loss value.
若判断当前的损失值超过预设损失值阈值，则确定初始目标检测模型未达到收敛状态，利用预设优化算法，调整特征提取层、特征回归层以及特征分类层的模型参数，并返回执行针对每一样本图像，将该样本图像输入特征提取层，提取得到该样本图像对应的样本图像特征的步骤。若判断当前的损失值未超过预设损失值阈值，确定初始目标检测模型达到收敛状态，得到可以检测图像中目标所在位置区域、所属类别信息以及表征所检测的目标所在位置区域的位置信息的准确性的框质量信息的预先建立的目标检测模型。If it is determined that the current loss value exceeds the preset loss value threshold, it is determined that the initial target detection model has not reached a convergence state; a preset optimization algorithm is used to adjust the model parameters of the feature extraction layer, the feature regression layer, and the feature classification layer, and the process returns to the step of, for each sample image, inputting the sample image into the feature extraction layer to extract the sample image features corresponding to the sample image. If it is determined that the current loss value does not exceed the preset loss value threshold, it is determined that the initial target detection model has reached a convergence state, and a pre-established target detection model is obtained that can detect the location area of a target in an image and its category information, as well as frame quality information representing the accuracy of the position information of the detected target's location area.
一种情况中，可以基于预设定位质量聚焦损失函数、样本图像中每一样本目标对应的预测框质量信息和真实框质量信息，确定每一样本目标对应的框质量损失值，进而将该样本图像中所有样本目标对应的框质量损失值的和或者平均值，确定为当前的第一损失值；并基于预设类别损失函数、样本图像中每一样本目标对应的预测类别信息和标定类别信息，确定每一样本目标对应的类别损失值；进而将该样本图像中所有样本目标对应的类别损失值的和或者平均值，确定当前的第二损失值；进而将当前的第一损失值及其对应的权重值的乘积，与当前的第二损失值及其对应的权重值的乘积的和，确定当前的损失值。In one case, the frame quality loss value corresponding to each sample target may be determined based on the preset localization quality focal loss function and the prediction frame quality information and real frame quality information corresponding to each sample target in the sample image, and the sum or average of the frame quality loss values corresponding to all sample targets in the sample image is then determined as the current first loss value. The category loss value corresponding to each sample target is determined based on the preset category loss function and the prediction category information and calibration category information corresponding to each sample target in the sample image, and the sum or average of the category loss values corresponding to all sample targets in the sample image is then determined as the current second loss value. The current loss value is then determined as the sum of the product of the current first loss value and its corresponding weight value and the product of the current second loss value and its corresponding weight value.
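The combination just described can be sketched numerically as follows; the default weight values and the choice between sum and mean reduction are illustrative assumptions, not values from the text.

```python
def combined_loss(quality_losses, class_losses,
                  w_quality=1.0, w_class=1.0, reduce="mean"):
    """Reduce the per-target frame-quality losses and category losses
    by sum or mean, then combine the two reduced values as a weighted
    sum to obtain the current loss value."""
    agg = (lambda xs: sum(xs) / len(xs)) if reduce == "mean" else sum
    return w_quality * agg(quality_losses) + w_class * agg(class_losses)
```

For example, per-target quality losses [1.0, 3.0] and class losses [2.0, 4.0] with mean reduction and unit weights give a current loss value of 2.0 + 3.0 = 5.0.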
其中，预设优化算法可以包括但不限于梯度下降法。一种情况中，该样本目标可以为车辆、行人以及交通指示线等。该初始目标检测模型可以为基于深度学习的神经网络模型。该预设类别损失函数可以为相关技术中任一类型的可以计算类别信息之间的损失值的损失函数，本发明实施例并不做限定。The preset optimization algorithm may include, but is not limited to, gradient descent. In one case, the sample targets may be vehicles, pedestrians, traffic indicator lines, and the like. The initial target detection model may be a neural network model based on deep learning. The preset category loss function may be any type of loss function in the related art that can calculate a loss value between category information, which is not limited in the embodiments of the present invention.
一种情况中,还可以结合预设位置损失函数、该样本图像中样本目标对应的预测框位置信息和标定框位置信息,确定上述当前的损失值。其中,该预设位置损失函数可以为相关技术中任一类型的可以计算框位置信息之间的损失值的损失函数,本发明实施例并不做限定。In one case, the above-mentioned current loss value may also be determined in combination with the preset position loss function, the position information of the prediction frame corresponding to the sample target in the sample image, and the position information of the calibration frame. The preset position loss function may be any type of loss function in the related art that can calculate a loss value between frame position information, which is not limited in the embodiment of the present invention.
在本发明的另一实施例中，该预设定位质量聚焦损失函数(LFL，Localization Focal Loss)的表达式可以为：In another embodiment of the present invention, the expression of the preset localization quality focal loss function (LFL, Localization Focal Loss) may be:
LFL(i) = -((1-p_i)·log(1-q_i) + p_i·log(q_i))·|p_i - q_i|^γ；
其中，该LFL(i)表示该样本图像中第i个样本目标对应的预测框质量信息和真实框质量信息之间的第一损失值，p_i表示该样本图像中第i个样本目标对应的真实框质量信息，q_i表示该样本图像中第i个样本目标对应的预测框质量信息，γ表示预设参数。Here, LFL(i) represents the first loss value between the prediction frame quality information and the real frame quality information corresponding to the i-th sample target in the sample image, p_i represents the real frame quality information corresponding to the i-th sample target in the sample image, q_i represents the prediction frame quality information corresponding to the i-th sample target in the sample image, and γ represents a preset parameter.
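A direct transcription of the LFL expression into code might look as follows. The small epsilon guarding log(0) is an implementation assumption, not part of the formula; the default gamma is likewise illustrative.

```python
import math

def lfl(p_i, q_i, gamma=2.0, eps=1e-12):
    """Localization Focal Loss for the i-th sample target.

    p_i:   real frame quality information (IoU with the calibration box), in [0, 1]
    q_i:   predicted frame quality information, in [0, 1]
    gamma: the preset focusing parameter
    """
    # Cross-entropy term -((1-p)log(1-q) + p*log(q)), scaled by |p - q|^gamma
    ce = -((1.0 - p_i) * math.log(1.0 - q_i + eps) + p_i * math.log(q_i + eps))
    return ce * abs(p_i - q_i) ** gamma
```

The |p_i - q_i|^γ factor makes the loss vanish when the predicted quality matches the real quality and grow as the gap widens, focusing training on poorly calibrated predictions.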
在一种实现方式中，电子设备还可以利用批量样本图像计算得到当前的损失值，即利用预设定位质量聚焦损失函数、多个样本图像中样本目标对应的预测框质量信息和真实框质量信息，以及预设类别损失函数、该多个样本图像中样本目标对应的预测类别信息和标定类别信息，确定当前的损失值，这也是可以的。In one implementation manner, the electronic device may also calculate the current loss value using a batch of sample images; that is, the current loss value may be determined using the preset localization quality focal loss function together with the prediction frame quality information and real frame quality information corresponding to the sample targets in multiple sample images, and the preset category loss function together with the prediction category information and calibration category information corresponding to the sample targets in those multiple sample images. This is also possible.
在本发明的另一实施例中，可以联合表示类别信息与框质量信息，相应的，样本图像对应的样本框质量信息以及样本类别信息以预设软独热编码的形式存在，样本图像对应的样本框质量信息在预设软独热编码中的位置表示样本图像对应的样本类别信息。如图3所示，为联合表示的类别信息与框质量信息一种示例图，其中，框质量信息通过数值表示，取值范围为[0,1]，如图3所示，可以表示该预设建立的目标检测模型对应的可检测目标的类别信息数量为5个，0.9表示所对应检测框位置信息对应的框质量信息，而0.9在第二个框中，可以表示预设建立的目标检测模型预测检测框位置信息对应的目标属于第二种类别。In another embodiment of the present invention, the category information and the frame quality information may be represented jointly. Correspondingly, the sample frame quality information and sample category information corresponding to a sample image exist in the form of a preset soft one-hot encoding, and the position of the sample frame quality information within the preset soft one-hot encoding represents the sample category information corresponding to the sample image. Figure 3 shows an example of jointly represented category information and frame quality information, in which the frame quality information is represented by a numerical value in the range [0, 1]. As shown in Figure 3, the pre-established target detection model may detect five categories of targets; the value 0.9 represents the frame quality information corresponding to the corresponding detection frame position information, and its placement in the second slot indicates that the pre-established target detection model predicts that the target corresponding to the detection frame position information belongs to the second category.
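The joint representation of Figure 3 can be illustrated with a small helper. The function name and the five-category default follow the example in the text; the code itself is an illustrative sketch rather than the patent's implementation.

```python
def soft_one_hot(class_index, box_quality, num_classes=5):
    """Joint class/quality label: a vector whose single non-zero entry
    is the frame quality value (in [0, 1]) placed at the index of the
    target's category - the 'soft' one-hot encoding."""
    assert 0 <= class_index < num_classes
    assert 0.0 <= box_quality <= 1.0
    label = [0.0] * num_classes
    label[class_index] = box_quality
    return label
```

For the Figure 3 example, a quality of 0.9 placed in the second slot encodes both "second category" and "frame quality 0.9" in a single vector.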
在本发明的另一实施例中,所述S102,可以包括如下步骤:In another embodiment of the present invention, the S102 may include the following steps:
将待检测图像输入预先建立的目标检测模型的特征提取层,提取得到待检测图像对应的待检测图像特征;Input the to-be-detected image into the feature extraction layer of the pre-established target detection model, and extract the to-be-detected image features corresponding to the to-be-detected image;
将待检测图像特征输入预先建立的目标检测模型的特征回归层，确定待检测图像对应的候选框位置信息；Input the features of the image to be detected into the feature regression layer of the pre-established target detection model, and determine the candidate frame position information corresponding to the image to be detected;
将待检测图像特征以及候选框位置信息输入所述预先建立的目标检测模型的特征分类层，确定出待检测图像中每一检测目标对应的每一候选框位置信息对应的检测类别信息以及目标框质量信息；Input the features of the image to be detected and the candidate frame position information into the feature classification layer of the pre-established target detection model, and determine the detection category information and target frame quality information corresponding to each candidate frame position information corresponding to each detection target in the image to be detected;
针对待检测图像中每一检测目标，基于预设抑制算法、该检测目标对应的每一候选框位置信息对应的目标框质量信息，从该检测目标对应的所有候选框位置信息中，确定出满足预设筛选条件的候选框位置信息，作为该检测目标对应的目标检测框位置信息，以得到待检测图像对应的目标检测结果，其中，预设筛选条件为：限定该检测目标对应的候选框位置信息中所对应目标框质量信息最大的条件。For each detection target in the image to be detected, based on a preset suppression algorithm and the target frame quality information corresponding to each candidate frame position information corresponding to the detection target, determine, from all the candidate frame position information corresponding to the detection target, the candidate frame position information that satisfies a preset screening condition as the target detection frame position information corresponding to the detection target, so as to obtain the target detection result corresponding to the image to be detected, where the preset screening condition is: the condition that, among the candidate frame position information corresponding to the detection target, the corresponding target frame quality information is the largest.
其中,预设抑制算法可以为NMS(Non Maximum Suppression,非极大抑制算法)。The preset suppression algorithm may be NMS (Non Maximum Suppression, non-maximum suppression algorithm).
本实现方式中，电子设备通过预先建立的目标检测模型的特征提取层以及特征回归层，确定出待检测图像对应的候选框位置信息，进而，利用预先建立的目标检测模型的特征分类层、待检测图像特征以及候选框位置信息，确定出待检测图像中每一检测目标对应的每一候选框位置信息对应的检测类别信息以及目标框质量信息；通过预设抑制算法、该检测目标对应的每一候选框位置信息对应的目标框质量信息，从该检测目标对应的所有候选框位置信息中，确定出满足预设筛选条件的候选框位置信息，作为该检测目标对应的目标检测框位置信息，以得到待检测图像对应的目标检测结果。实现通过比较框质量信息所表征的所对应候选框位置信息的准确性的高低，来完成对所对应候选框位置信息的筛选和确定，以得到检测位置准确性更好的框位置信息。In this implementation manner, the electronic device determines the candidate frame position information corresponding to the image to be detected through the feature extraction layer and the feature regression layer of the pre-established target detection model; then, using the feature classification layer of the pre-established target detection model, the features of the image to be detected, and the candidate frame position information, it determines the detection category information and target frame quality information corresponding to each candidate frame position information corresponding to each detection target in the image to be detected. Through the preset suppression algorithm and the target frame quality information corresponding to each candidate frame position information corresponding to a detection target, the candidate frame position information satisfying the preset screening condition is determined from all the candidate frame position information corresponding to that detection target and used as the target detection frame position information corresponding to the detection target, so as to obtain the target detection result corresponding to the image to be detected. In this way, the screening and determination of candidate frame position information is completed by comparing the accuracy of the corresponding candidate frame position information as represented by the frame quality information, so as to obtain frame position information with better detection position accuracy.
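The screening step described above can be sketched as a quality-guided non-maximum suppression: candidates are visited in descending order of target frame quality, and a candidate is dropped if it heavily overlaps an already-kept box, so each detected target retains the candidate frame whose quality information is largest. The IoU threshold value below is an illustrative assumption, not one given in the text.

```python
def select_by_quality(candidates, iou_threshold=0.5):
    """Quality-guided NMS sketch. `candidates` is a list of (box, quality)
    pairs for one image; boxes are (x1, y1, x2, y2). Returns the kept
    (box, quality) pairs, one per detected target."""
    def iou(a, b):
        ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
        ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
        inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
        area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
        union = area(a) + area(b) - inter
        return inter / union if union > 0 else 0.0

    kept = []
    # Highest-quality candidates first, so each target keeps its best frame
    for box, quality in sorted(candidates, key=lambda c: c[1], reverse=True):
        if all(iou(box, kept_box) < iou_threshold for kept_box, _ in kept):
            kept.append((box, quality))
    return kept
```

Two heavily overlapping candidates for the same target collapse to the one with the larger quality value, while candidates for distinct, non-overlapping targets are all retained.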
相应于上述方法实施例,本发明实施例提供了一种目标检测装置,如图4所示,所述装置可以包括:Corresponding to the foregoing method embodiments, an embodiment of the present invention provides a target detection apparatus. As shown in FIG. 4 , the apparatus may include:
获得模块410,被配置为获得待检测图像;an obtaining module 410, configured to obtain an image to be detected;
确定模块420，被配置为利用预先建立的目标检测模型以及所述待检测图像，确定所述待检测图像对应的目标检测结果，其中，所述目标检测结果包括：所述待检测图像中检测目标对应的目标检测框位置信息及目标检测框位置信息对应的目标框质量信息，所述预先建立的目标检测模型为：基于样本图像及其对应的标定信息以及所对应样本框质量信息训练所得的模型，所述样本图像对应的样本框质量信息为：基于该样本图像对应的标定信息中标定框位置信息以及基于预先建立的目标检测模型所对应初始目标检测模型检测出的该样本图像对应的预测框位置信息，确定的信息。The determination module 420 is configured to use a pre-established target detection model and the image to be detected to determine a target detection result corresponding to the image to be detected, where the target detection result includes: target detection frame position information corresponding to a detection target in the image to be detected and target frame quality information corresponding to the target detection frame position information; the pre-established target detection model is a model trained based on sample images, their corresponding calibration information, and the corresponding sample frame quality information; and the sample frame quality information corresponding to a sample image is information determined based on the calibration frame position information in the calibration information corresponding to the sample image and the prediction frame position information corresponding to the sample image detected by the initial target detection model corresponding to the pre-established target detection model.
应用本发明实施例，基于样本图像及其对应的标定信息以及所对应样本框质量信息训练所得的预先建立的目标检测模型，具有预测图像中检测目标对应的目标检测框对应的质量的功能，且该样本框质量信息为基于该样本图像对应的标定信息中标定框位置信息以及基于预先建立的目标检测模型所对应初始目标检测模型检测出的该样本图像对应的预测框位置信息，确定的信息，通过预先建立的目标检测模型的预测图像中目标对应的框位置信息对应的框质量信息，可以实现筛选出检测目标所对应框质量信息更好的框位置信息作为目标检测框位置信息，以实现对图像中目标对应的检测框的准确性的确定，进而得到准确性更好的目标对应的检测框。By applying the embodiment of the present invention, the pre-established target detection model, trained based on sample images, their corresponding calibration information, and the corresponding sample frame quality information, has the function of predicting the quality of the target detection frame corresponding to a detection target in an image. Since the sample frame quality information is information determined based on the calibration frame position information in the calibration information corresponding to the sample image and the prediction frame position information corresponding to the sample image detected by the initial target detection model corresponding to the pre-established target detection model, the frame quality information that the pre-established target detection model predicts for the frame position information of a target in an image makes it possible to screen out, as the target detection frame position information, the frame position information with better frame quality information for the detection target. This enables the accuracy of the detection frame corresponding to a target in the image to be determined, and thus a more accurate detection frame for the target to be obtained.
在本发明的另一实施例中,所述样本图像对应的样本框质量信息为:基于该样本图像对应的标定信息中的标定框位置信息以及基于预先建立的目标检测模型所对应初始目标检测模型检测出的该样本图像对应的预测框位置信息之间的交集面积与并集面积的比值信息。In another embodiment of the present invention, the sample frame quality information corresponding to the sample image is: based on the calibration frame position information in the calibration information corresponding to the sample image and the initial target detection model corresponding to the pre-established target detection model The ratio information of the intersection area and the union area between the detected prediction frame position information corresponding to the sample image.
在本发明的另一实施例中,所述目标检测结果还包括:所述待检测图像中检测目标对应的检测类别信息。In another embodiment of the present invention, the target detection result further includes: detection category information corresponding to the detection target in the to-be-detected image.
在本发明的另一实施例中,所述装置还包括:In another embodiment of the present invention, the device further includes:
模型训练模块(图中未示出)，被配置为在所述利用预先建立的目标检测模型以及所述待检测图像，从所述待检测图像中，检测出其中的待检测目标对应的目标检测框位置信息及目标检测框位置信息对应的目标框质量信息之前，训练得到所述预先建立的目标检测模型，其中，所述模型训练模块，被具体配置为获得所述初始目标检测模型，其中，所述初始目标检测模型包括特征提取层、特征分类层以及特征回归层；The model training module (not shown in the figure) is configured to train and obtain the pre-established target detection model before the target detection frame position information corresponding to the target to be detected, and the target frame quality information corresponding to the target detection frame position information, are detected from the image to be detected by using the pre-established target detection model and the image to be detected. The model training module is specifically configured to obtain the initial target detection model, where the initial target detection model includes a feature extraction layer, a feature classification layer, and a feature regression layer;
获得多个样本图像以及样本图像对应的标定信息,其中,所述标定信息包括:所对应样本图像中包含的样本目标对应的标定框位置信息以及标定类别信息;Obtaining a plurality of sample images and calibration information corresponding to the sample images, wherein the calibration information includes: calibration frame position information and calibration category information corresponding to the sample targets contained in the corresponding sample images;
针对每一样本图像,将该样本图像输入所述特征提取层,提取得到该样本图像对应的样本图像特征;For each sample image, input the sample image into the feature extraction layer, and extract the sample image feature corresponding to the sample image;
针对每一样本图像,将该样本图像对应的样本图像特征输入所述特征回归层,得到该样本图像中样本目标对应的预测框位置信息;For each sample image, input the sample image feature corresponding to the sample image into the feature regression layer, and obtain the prediction frame position information corresponding to the sample target in the sample image;
针对每一样本图像中每一样本目标，计算该样本目标对应的标定框位置信息以及对应的预测框位置信息之间的交集面积与并集面积的比值信息，确定为该样本目标对应的真实框质量信息；For each sample target in each sample image, calculate the ratio of the intersection area to the union area between the calibration frame position information corresponding to the sample target and the corresponding prediction frame position information, and determine it as the real frame quality information corresponding to the sample target;
针对每一样本图像，将该样本图像对应的样本图像特征以及该样本图像中样本目标对应的预测框位置信息输入所述特征分类层，确定该样本图像中样本目标对应的预测类别信息以及预测框质量信息；For each sample image, input the sample image features corresponding to the sample image and the prediction frame position information corresponding to the sample targets in the sample image into the feature classification layer, and determine the prediction category information and prediction frame quality information corresponding to the sample targets in the sample image;
针对每一样本图像，基于预设定位质量聚焦损失函数、该样本图像中样本目标对应的预测框质量信息和真实框质量信息，以及预设类别损失函数、该样本图像中样本目标对应的预测类别信息和标定类别信息，确定当前的损失值；For each sample image, determine the current loss value based on the preset localization quality focal loss function together with the prediction frame quality information and real frame quality information corresponding to the sample targets in the sample image, and the preset category loss function together with the prediction category information and calibration category information corresponding to the sample targets in the sample image;
判断当前的损失值是否超过预设损失值阈值;Determine whether the current loss value exceeds the preset loss value threshold;
若判断结果为是，则调整所述特征提取层、所述特征回归层以及所述特征分类层的模型参数，返回执行针对每一样本图像，将该样本图像输入所述特征提取层，提取得到该样本图像对应的样本图像特征的步骤；If the judgment result is yes, adjust the model parameters of the feature extraction layer, the feature regression layer, and the feature classification layer, and return to the step of, for each sample image, inputting the sample image into the feature extraction layer to extract the sample image features corresponding to the sample image;
若判断结果为否,则确定所述初始目标检测模型达到收敛状态,得到预先建立的目标检测模型。If the judgment result is no, it is determined that the initial target detection model has reached a convergence state, and a pre-established target detection model is obtained.
在本发明的另一实施例中,所述预设定位质量聚焦损失函数的表达式为:In another embodiment of the present invention, the expression of the preset positioning quality focus loss function is:
LFL(i)=-((1-p i)log(1-q i)+p ilog(q i))|p i-q i| γLFL (i) = - (( 1-p i) log (1-q i) + p i log (q i)) | p i -q i | γ;
其中，所述LFL(i)表示该样本图像中第i个样本目标对应的预测框质量信息和真实框质量信息之间的第一损失值，p_i表示该样本图像中第i个样本目标对应的真实框质量信息，q_i表示该样本图像中第i个样本目标对应的预测框质量信息，γ表示预设参数。Here, LFL(i) represents the first loss value between the prediction frame quality information and the real frame quality information corresponding to the i-th sample target in the sample image, p_i represents the real frame quality information corresponding to the i-th sample target in the sample image, q_i represents the prediction frame quality information corresponding to the i-th sample target in the sample image, and γ represents a preset parameter.
在本发明的另一实施例中，所述样本图像对应的样本框质量信息以及样本类别信息以预设软独热编码的形式存在，所述样本图像对应的样本框质量信息在预设软独热编码中的位置表示样本图像对应的样本类别信息。In another embodiment of the present invention, the sample frame quality information and sample category information corresponding to the sample image exist in the form of a preset soft one-hot encoding, and the position of the sample frame quality information corresponding to the sample image within the preset soft one-hot encoding represents the sample category information corresponding to the sample image.
在本发明的另一实施例中，所述确定模块420，被具体配置为将所述待检测图像输入预先建立的目标检测模型的特征提取层，提取得到所述待检测图像对应的待检测图像特征；In another embodiment of the present invention, the determination module 420 is specifically configured to input the image to be detected into the feature extraction layer of the pre-established target detection model, and extract the features of the image to be detected corresponding to the image to be detected;
将所述待检测图像特征输入所述预先建立的目标检测模型的特征回归层,确定所述待检测图像对应的候选框位置信息;Inputting the feature of the image to be detected into the feature regression layer of the pre-established target detection model, and determining the position information of the candidate frame corresponding to the image to be detected;
将所述待检测图像特征以及所述候选框位置信息输入所述预先建立的目标检测模型的特征分类层，确定出所述待检测图像中每一检测目标对应的每一候选框位置信息对应的检测类别信息以及目标框质量信息；Input the features of the image to be detected and the candidate frame position information into the feature classification layer of the pre-established target detection model, and determine the detection category information and target frame quality information corresponding to each candidate frame position information corresponding to each detection target in the image to be detected;
针对所述待检测图像中每一检测目标，基于预设抑制算法、该检测目标对应的每一候选框位置信息对应的目标框质量信息，从该检测目标对应的所有候选框位置信息中，确定出满足预设筛选条件的候选框位置信息，作为该检测目标对应的目标检测框位置信息，以得到所述待检测图像对应的目标检测结果，其中，所述预设筛选条件为：限定该检测目标对应的候选框位置信息中所对应目标框质量信息最大的条件。For each detection target in the image to be detected, based on the preset suppression algorithm and the target frame quality information corresponding to each candidate frame position information corresponding to the detection target, determine, from all the candidate frame position information corresponding to the detection target, the candidate frame position information that satisfies the preset screening condition as the target detection frame position information corresponding to the detection target, so as to obtain the target detection result corresponding to the image to be detected, where the preset screening condition is: the condition that, among the candidate frame position information corresponding to the detection target, the corresponding target frame quality information is the largest.
上述系统、装置实施例与系统实施例相对应,与该方法实施例具有同样的技术效果,具体说明参见方法实施例。装置实施例是基于方法实施例得到的,具体的说明可以参见方法实施例部分,此处不再赘述。本领域普通技术人员可以理解:附图只是一个实施例的示意图,附图中的模块或流程并不一定是实施本发明所必须的。The foregoing system and device embodiments correspond to the system embodiments, and have the same technical effects as the method embodiments. For specific descriptions, refer to the method embodiments. The apparatus embodiment is obtained based on the method embodiment, and the specific description can refer to the method embodiment section, which will not be repeated here. Those of ordinary skill in the art can understand that the accompanying drawing is only a schematic diagram of an embodiment, and the modules or processes in the accompanying drawing are not necessarily necessary to implement the present invention.
本领域普通技术人员可以理解:实施例中的装置中的模块可以按照实施例描述分布于实施例的装置中,也可以进行相应变化位于不同于本实施例的一个或多个装置中。上述实施例的模块可以合并为一个模块,也可以进一步拆分成多个子模块。Those skilled in the art may understand that: the modules in the apparatus in the embodiment may be distributed in the apparatus in the embodiment according to the description of the embodiment, and may also be located in one or more apparatuses different from this embodiment with corresponding changes. The modules in the foregoing embodiments may be combined into one module, or may be further split into multiple sub-modules.
最后应说明的是:以上实施例仅用以说明本发明的技术方案,而非对其限制;尽管参照前述实施例对本发明进行了详细的说明,本领域的普通技术人员应当理解:其依然可以对前述实施例所记载的技术方案进行修改,或者对其中部分技术特征进行等同替换;而这些修改或者替换,并不使相应技术方案的本质脱离本发明实施例技术方案的精神和范围。Finally, it should be noted that the above embodiments are only used to illustrate the technical solutions of the present invention, but not to limit them; although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that it can still be The technical solutions described in the foregoing embodiments are modified, or some technical features thereof are equivalently replaced; and these modifications or replacements do not make the essence of the corresponding technical solutions deviate from the spirit and scope of the technical solutions in the embodiments of the present invention.

Claims (10)

  1. 一种目标检测方法,其特征在于,所述方法包括:A target detection method, characterized in that the method comprises:
    获得待检测图像;Obtain the image to be detected;
    利用预先建立的目标检测模型以及所述待检测图像，确定所述待检测图像对应的目标检测结果，其中，所述目标检测结果包括：所述待检测图像中检测目标对应的目标检测框位置信息及目标检测框位置信息对应的目标框质量信息，所述预先建立的目标检测模型为：基于样本图像及其对应的标定信息以及所对应样本框质量信息训练所得的模型，所述样本图像对应的样本框质量信息为：基于该样本图像对应的标定信息中标定框位置信息以及基于预先建立的目标检测模型所对应初始目标检测模型检测出的该样本图像对应的预测框位置信息，确定的信息。Using a pre-established target detection model and the image to be detected, determining a target detection result corresponding to the image to be detected, wherein the target detection result includes: target detection frame position information corresponding to a detection target in the image to be detected and target frame quality information corresponding to the target detection frame position information; the pre-established target detection model is a model trained based on sample images, their corresponding calibration information, and the corresponding sample frame quality information; and the sample frame quality information corresponding to a sample image is information determined based on the calibration frame position information in the calibration information corresponding to the sample image and the prediction frame position information corresponding to the sample image detected by the initial target detection model corresponding to the pre-established target detection model.
  2. The method according to claim 1, wherein the sample frame quality information corresponding to the sample image is: ratio information of the intersection area to the union area between the calibration frame position information in the calibration information corresponding to the sample image and the predicted frame position information corresponding to the sample image detected by the initial target detection model corresponding to the pre-established target detection model.
  3. The method according to claim 1 or 2, wherein the target detection result further comprises: detection category information corresponding to the detected target in the image to be detected.
  4. The method according to any one of claims 1-3, wherein before the step of detecting, from the image to be detected by using the pre-established target detection model and the image to be detected, the target detection frame position information corresponding to the target to be detected and the target frame quality information corresponding to the target detection frame position information, the method further comprises:
    a process of training to obtain the pre-established target detection model, wherein the process comprises:
    obtaining the initial target detection model, wherein the initial target detection model comprises a feature extraction layer, a feature classification layer, and a feature regression layer;
    obtaining a plurality of sample images and calibration information corresponding to the sample images, wherein the calibration information comprises: calibration frame position information and calibration category information corresponding to sample targets contained in the corresponding sample images;
    for each sample image, inputting the sample image into the feature extraction layer to extract sample image features corresponding to the sample image;
    for each sample image, inputting the sample image features corresponding to the sample image into the feature regression layer to obtain predicted frame position information corresponding to the sample targets in the sample image;
    for each sample target in each sample image, calculating ratio information of the intersection area to the union area between the calibration frame position information corresponding to the sample target and the corresponding predicted frame position information, and determining it as real frame quality information corresponding to the sample target;
    for each sample image, inputting the sample image features corresponding to the sample image and the predicted frame position information corresponding to the sample targets in the sample image into the feature classification layer, and determining predicted category information and predicted frame quality information corresponding to the sample targets in the sample image;
    for each sample image, determining a current loss value based on a preset localization quality focal loss function, the predicted frame quality information and the real frame quality information corresponding to the sample targets in the sample image, as well as a preset category loss function, the predicted category information and the calibration category information corresponding to the sample targets in the sample image;
    judging whether the current loss value exceeds a preset loss value threshold;
    if the judgment result is yes, adjusting model parameters of the feature extraction layer, the feature regression layer, and the feature classification layer, and returning to the step of, for each sample image, inputting the sample image into the feature extraction layer to extract the sample image features corresponding to the sample image;
    if the judgment result is no, determining that the initial target detection model has reached a converged state, thereby obtaining the pre-established target detection model.
  5. The method according to claim 4, wherein the expression of the preset localization quality focal loss function is:
    LFL(i) = -((1 - p_i)·log(1 - q_i) + p_i·log(q_i))·|p_i - q_i|^γ;
    wherein LFL(i) represents a first loss value between the predicted frame quality information and the real frame quality information corresponding to the i-th sample target in the sample image, p_i represents the real frame quality information corresponding to the i-th sample target in the sample image, q_i represents the predicted frame quality information corresponding to the i-th sample target in the sample image, and γ represents a preset parameter.
  6. The method according to claim 4, wherein the sample frame quality information and sample category information corresponding to the sample image exist in the form of a preset soft one-hot encoding, and the position of the sample frame quality information corresponding to the sample image within the preset soft one-hot encoding represents the sample category information corresponding to the sample image.
  7. The method according to any one of claims 1-3, wherein the step of determining, by using the pre-established target detection model and the image to be detected, the target detection result corresponding to the image to be detected comprises:
    inputting the image to be detected into a feature extraction layer of the pre-established target detection model to extract features of the image to be detected;
    inputting the features of the image to be detected into a feature regression layer of the pre-established target detection model to determine candidate frame position information corresponding to the image to be detected;
    inputting the features of the image to be detected and the candidate frame position information into a feature classification layer of the pre-established target detection model to determine detection category information and target frame quality information corresponding to each piece of candidate frame position information for each detected target in the image to be detected;
    for each detected target in the image to be detected, determining, based on a preset suppression algorithm and the target frame quality information corresponding to each piece of candidate frame position information for the detected target, from all candidate frame position information corresponding to the detected target, candidate frame position information satisfying a preset screening condition as the target detection frame position information corresponding to the detected target, so as to obtain the target detection result corresponding to the image to be detected, wherein the preset screening condition is: a condition limiting selection to the candidate frame position information, among that corresponding to the detected target, whose corresponding target frame quality information is the largest.
  8. A target detection device, wherein the device comprises:
    an obtaining module, configured to obtain an image to be detected;
    a determination module, configured to determine a target detection result corresponding to the image to be detected by using a pre-established target detection model and the image to be detected, wherein the target detection result comprises: target detection frame position information corresponding to a detected target in the image to be detected, and target frame quality information corresponding to the target detection frame position information; the pre-established target detection model is a model trained based on sample images, calibration information corresponding to the sample images, and corresponding sample frame quality information; and the sample frame quality information corresponding to a sample image is information determined based on calibration frame position information in the calibration information corresponding to the sample image, and predicted frame position information corresponding to the sample image detected by an initial target detection model corresponding to the pre-established target detection model.
  9. The device according to claim 8, wherein the sample frame quality information corresponding to the sample image is: ratio information of the intersection area to the union area between the calibration frame position information in the calibration information corresponding to the sample image and the predicted frame position information corresponding to the sample image detected by the initial target detection model corresponding to the pre-established target detection model.
  10. The device according to claim 8 or 9, wherein the target detection result further comprises: detection category information corresponding to the detected target in the image to be detected.
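The sample frame quality of claims 2, 4, and 9 is the intersection-over-union (IoU) between a target's calibration (ground-truth) box and the box predicted by the initial model. A minimal Python sketch, assuming boxes are given as (x1, y1, x2, y2) corner coordinates (the claims do not fix a box format):

```python
def box_iou(box_a, box_b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    # Overlap is zero when the boxes do not intersect.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

# The "real frame quality" of a sample target is the IoU between its
# calibration box and the predicted box: here inter = 50, union = 150.
quality = box_iou((0, 0, 10, 10), (5, 0, 15, 10))
```

This ratio lies in [0, 1] and is 1 only when the predicted box matches the calibration box exactly, which is what lets it serve directly as a quality score.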
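The localization quality focal loss of claim 5 can be sketched as follows, with p the real (IoU-based) quality and q the predicted quality; γ = 2.0 is a hypothetical default, since the claim only calls γ a "preset parameter", and the clamping constant `eps` is an implementation detail added here to keep the logarithms finite:

```python
import math

def quality_focal_loss(p, q, gamma=2.0, eps=1e-12):
    """LFL(i) = -((1-p)*log(1-q) + p*log(q)) * |p - q|**gamma (claim 5).

    p: real frame quality (IoU with the calibration box), in [0, 1]
    q: predicted frame quality, in [0, 1]
    gamma: focusing parameter that down-weights well-predicted samples
    """
    q = min(max(q, eps), 1.0 - eps)  # avoid log(0)
    # Cross-entropy between the soft label p and the prediction q ...
    ce = -((1.0 - p) * math.log(1.0 - q) + p * math.log(q))
    # ... scaled by |p - q|**gamma, so accurate predictions contribute ~0.
    return ce * abs(p - q) ** gamma
```

The |p - q|^γ factor vanishes when the predicted quality equals the real quality, so the loss focuses training on targets whose quality estimate is still poor.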
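Claim 6 merges the category label and the box quality into one vector: instead of a hard 1 at the class position of a one-hot vector, the IoU-based quality is stored there. A minimal sketch (function name and list representation are illustrative, not from the patent):

```python
def soft_one_hot(class_index, num_classes, box_quality):
    """Soft one-hot target of claim 6.

    The position of the nonzero entry encodes the sample category;
    its value encodes the sample frame quality (an IoU in [0, 1]).
    """
    target = [0.0] * num_classes
    target[class_index] = box_quality
    return target
```

A classification head trained against such targets learns to predict class and localization quality jointly, which is what allows claim 7 to read a quality score off the classification branch at inference time.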
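The "preset suppression algorithm" of claim 7 is not specified; one common realization consistent with the screening condition is greedy non-maximum suppression ranked by the predicted box quality rather than a classification confidence. A hedged sketch under that assumption (the 0.5 IoU threshold is illustrative):

```python
def select_best_boxes(candidates, iou_threshold=0.5):
    """Quality-guided suppression: repeatedly keep the remaining candidate
    with the highest box quality and drop candidates overlapping it by
    more than iou_threshold, yielding one box per detected target.

    candidates: list of (box, quality) with box = (x1, y1, x2, y2)
    """
    def iou(a, b):
        ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
        ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
        inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
        union = ((a[2] - a[0]) * (a[3] - a[1])
                 + (b[2] - b[0]) * (b[3] - b[1]) - inter)
        return inter / union if union > 0 else 0.0

    remaining = sorted(candidates, key=lambda c: c[1], reverse=True)
    kept = []
    while remaining:
        best = remaining.pop(0)  # highest-quality candidate wins
        kept.append(best)
        remaining = [c for c in remaining
                     if iou(best[0], c[0]) <= iou_threshold]
    return kept
```

Because the ranking key is the quality score, the surviving box for each target is exactly the candidate with the largest target frame quality information, as the preset screening condition requires.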
PCT/CN2020/121337 2020-06-29 2020-10-16 Target detection method and device WO2022000855A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010601392.2A CN113935386A (en) 2020-06-29 2020-06-29 Target detection method and device
CN202010601392.2 2020-06-29

Publications (1)

Publication Number Publication Date
WO2022000855A1 true WO2022000855A1 (en) 2022-01-06

Family

ID=79272632

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/121337 WO2022000855A1 (en) 2020-06-29 2020-10-16 Target detection method and device

Country Status (2)

Country Link
CN (1) CN113935386A (en)
WO (1) WO2022000855A1 (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117173568A (en) * 2023-09-05 2023-12-05 北京观微科技有限公司 Target detection model training method and target detection method
CN117636266A (en) * 2024-01-25 2024-03-01 华东交通大学 Method and system for detecting safety behaviors of workers, storage medium and electronic equipment
CN117636266B (en) * 2024-01-25 2024-05-14 华东交通大学 Method and system for detecting safety behaviors of workers, storage medium and electronic equipment

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106295678A (en) * 2016-07-27 2017-01-04 北京旷视科技有限公司 Neural metwork training and construction method and device and object detection method and device
CN108268869A (en) * 2018-02-13 2018-07-10 北京旷视科技有限公司 Object detection method, apparatus and system
WO2019028725A1 (en) * 2017-08-10 2019-02-14 Intel Corporation Convolutional neural network framework using reverse connections and objectness priors for object detection
CN109727275A (en) * 2018-12-29 2019-05-07 北京沃东天骏信息技术有限公司 Object detection method, device, system and computer readable storage medium
CN111062413A (en) * 2019-11-08 2020-04-24 深兰科技(上海)有限公司 Road target detection method and device, electronic equipment and storage medium
CN111241947A (en) * 2019-12-31 2020-06-05 深圳奇迹智慧网络有限公司 Training method and device of target detection model, storage medium and computer equipment



Also Published As

Publication number Publication date
CN113935386A (en) 2022-01-14

Similar Documents

Publication Publication Date Title
CN109977812B (en) Vehicle-mounted video target detection method based on deep learning
CN111353413B (en) Low-missing-report-rate defect identification method for power transmission equipment
KR101926561B1 (en) Road crack detection apparatus of patch unit and method thereof, and computer program for executing the same
CN109447169B (en) Image processing method, training method and device of model thereof and electronic system
WO2020078229A1 (en) Target object identification method and apparatus, storage medium and electronic apparatus
CN110263706B (en) Method for detecting and identifying dynamic target of vehicle-mounted video in haze weather
CN110084165B (en) Intelligent identification and early warning method for abnormal events in open scene of power field based on edge calculation
Bello-Salau et al. Image processing techniques for automated road defect detection: A survey
CN105788269A (en) Unmanned aerial vehicle-based abnormal traffic identification method
CN111985365A (en) Straw burning monitoring method and system based on target detection technology
CN112883921A (en) Garbage can overflow detection model training method and garbage can overflow detection method
CN110956104A (en) Method, device and system for detecting overflow of garbage can
CN113221804B (en) Disordered material detection method and device based on monitoring video and application
CN113642474A (en) Hazardous area personnel monitoring method based on YOLOV5
CN111597901A (en) Illegal billboard monitoring method
KR102391853B1 (en) System and Method for Processing Image Informaion
CN110852164A (en) YOLOv 3-based method and system for automatically detecting illegal building
WO2022000855A1 (en) Target detection method and device
CN112862150A (en) Forest fire early warning method based on image and video multi-model
CN114267082A (en) Bridge side falling behavior identification method based on deep understanding
CN113989626B (en) Multi-class garbage scene distinguishing method based on target detection model
Lam et al. Real-time traffic status detection from on-line images using generic object detection system with deep learning
CN114926791A (en) Method and device for detecting abnormal lane change of vehicles at intersection, storage medium and electronic equipment
CN116071711B (en) Traffic jam condition detection method and device
CN113971666A (en) Power transmission line machine inspection image self-adaptive identification method based on depth target detection

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20943586

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20943586

Country of ref document: EP

Kind code of ref document: A1

32PN Ep: public notification in the ep bulletin as address of the addressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 27.06.2023)
