WO2021254205A1 - Target detection method and apparatus - Google Patents

Target detection method and apparatus

Info

Publication number
WO2021254205A1
Authority
WO
WIPO (PCT)
Prior art keywords
target
detection
image
mask
frame
Prior art date
Application number
PCT/CN2021/098734
Other languages
English (en)
Chinese (zh)
Inventor
冀怀远
汪明明
唐诗尧
刘澍
Original Assignee
苏宁易购集团股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 苏宁易购集团股份有限公司 filed Critical 苏宁易购集团股份有限公司
Publication of WO2021254205A1


Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/21 - Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 - Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/21 - Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/213 - Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00 - Image enhancement or restoration
    • G06T5/70 - Denoising; Smoothing
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 - Image analysis
    • G06T7/10 - Segmentation; Edge detection
    • G06T7/136 - Segmentation; Edge detection involving thresholding
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 - Image analysis
    • G06T7/10 - Segmentation; Edge detection
    • G06T7/187 - Segmentation; Edge detection involving region growing; involving region merging; involving connected component labelling
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 - Image analysis
    • G06T7/10 - Segmentation; Edge detection
    • G06T7/194 - Segmentation; Edge detection involving foreground-background segmentation
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/20 - Image preprocessing
    • G06V10/25 - Determination of region of interest [ROI] or a volume of interest [VOI]
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T2207/10 - Image acquisition modality
    • G06T2207/10016 - Video; Image sequence
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T2207/20 - Special algorithmic details
    • G06T2207/20081 - Training; Learning
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T2207/20 - Special algorithmic details
    • G06T2207/20084 - Artificial neural networks [ANN]
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00 - Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07 - Target detection

Definitions

  • the invention relates to the technical field of image detection and recognition, in particular to a target detection method and device.
  • The current mainstream target detection methods are based on deep learning, which generally obtains good detection results.
  • However, detection quality degrades when the scene changes.
  • A single deep-learning target detection algorithm cannot solve the problem of misdetection in complex scenes, which leads to incorrect tracking in the corresponding monitoring scene and wrong analysis of shopping behavior, ultimately affecting the final product settlement.
  • the present invention provides a target detection method and device to solve the problem of environmental dependence of target detection algorithms in existing corresponding monitoring scenes and the problem of misdetection of complex scenes.
  • a target detection method includes:
  • The mask image is input into a pre-trained target detection model for detection, and a detection result of each target object is obtained, where each detection result includes the positions, categories, and confidences of multiple candidate frames;
  • denoising processing is performed on the candidate frames whose confidence is higher than a confidence threshold in each detection result, to obtain a final detection result of each target object.
  • a target detection method includes:
  • the target mask is used to verify the valid candidate frame of each target object, and the final detection result of each target object is obtained.
  • a target detection device in a third aspect, includes:
  • the first acquisition module is configured to acquire a target mask corresponding to at least one target object in the image to be detected
  • the second acquisition module is configured to use the target mask to mask the image to be detected to obtain a mask image with the background removed;
  • The target detection module is used to input the mask image into a pre-trained target detection model for detection and obtain a detection result of each target object, where each detection result includes the position, category, and confidence of multiple candidate frames;
  • the denoising processing module is configured to perform denoising processing on a plurality of candidate frames whose confidence is higher than the confidence threshold in each of the detection results to obtain the final detection result of each of the target objects.
  • a target detection device in a fourth aspect, includes:
  • An acquiring module configured to acquire a target mask corresponding to at least one target object in the image to be detected
  • The target detection module is used to input the image to be detected into a pre-trained target detection model for detection and obtain a detection result of each target object, where each detection result includes the position, category, and confidence of multiple candidate frames;
  • a denoising processing module configured to perform denoising processing on a plurality of candidate frames with a confidence level higher than a confidence threshold in each of the detection results, to obtain a valid candidate frame for each of the target objects;
  • the verification module is configured to use the target mask to verify the valid candidate frame of each target object, and obtain the final detection result of each target object.
  • A computer device includes a memory, a processor, and a computer program stored in the memory and capable of running on the processor; when the processor executes the computer program, it implements the target detection method of any one of the first aspect or the second aspect.
  • A computer-readable storage medium stores a computer program, and when the computer program is executed by a processor, it implements the target detection method described in any one of the first aspect or the second aspect.
  • The embodiments of the present invention provide a target detection method and device, which use a pre-trained target detection model to perform target detection on a mask image with the background removed. Compared with the prior art, this largely removes the deep learning algorithm's dependence on the environment, improves the robustness of the target detection algorithm, reduces target misdetection in complex scenes, and improves the accuracy of target detection.
  • FIG. 1 is a flowchart of a target detection method provided by Embodiment 1 of the present invention.
  • FIG. 2 is a training flowchart of the target detection model in step 103 shown in FIG. 1;
  • FIG. 3 is a flowchart of a target detection method provided by Embodiment 2 of the present invention.
  • FIG. 4 is a training flowchart of the target detection model in step 302 shown in FIG. 3;
  • FIG. 5 is a structural diagram of a target detection device provided by Embodiment 3 of the present invention.
  • FIG. 6 is a structural diagram of a target detection device provided by Embodiment 4 of the present invention.
  • The current mainstream target detection methods are based on deep learning. Because deep-learning target detection depends on the scene, it produces poor results when the scene changes. In addition, because the corresponding monitoring scenes are complicated, a single deep-learning target detection algorithm cannot adequately solve the problem of false detection in complex scenes.
  • The embodiments of the present invention provide a target detection method that can be applied to monitoring scenarios such as human tracking and human-goods interaction. By obtaining the target mask corresponding to the target object in the picture and combining the target mask with a deep-learning target detection algorithm, the method can be used in more complex monitoring scenes, greatly reduces false detection of targets in complex scenes, obtains better target detection results, and remains robust under environmental changes.
  • the embodiment of the present invention provides a target detection method.
  • the method is applied to a target detection device as an example.
  • the device can be configured in any computer device so that the computer device can execute the target detection method.
  • the method may include the steps:
  • the image to be detected can be extracted from the surveillance scene video shot by the surveillance camera, and the surveillance scene video is the video captured by the full-scene camera in the surveillance scene.
  • the image to be detected can be extracted from the surveillance scene video every preset time, or it can be triggered to extract the image to be detected from the surveillance scene video when a moving target is detected in the surveillance scene.
  • the image to be detected includes at least one target object and a background.
  • the target object in the image to be detected is specifically a human object.
  • step 101 may include the steps:
  • the background image may be an image obtained by shooting a surveillance scene that does not contain the target object, and the background image and the image to be detected may both be RGB images or both RGB-D images.
  • Formula (1) can be used to compare the pixel values at corresponding positions of the image to be detected R(x, y) and the background image G(x, y).
  • the candidate area of the target object can be obtained by using the region growing algorithm on the grayscale image of the image to be detected after the difference processing.
  • The connected domains in the initial mask whose area is lower than an area threshold are then filtered out to obtain the target mask; the area threshold can be set according to actual needs.
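  • As an illustration only (the text reproduces neither formula (1) nor the concrete region-growing steps), the mask-extraction pipeline of steps 1011 through 1013 might be sketched in Python with OpenCV roughly as follows. The variable names, the absolute-difference threshold standing in for formula (1), the connected-component analysis standing in for the unspecified region-growing details, and the RGB input assumption are all assumptions of this sketch:

```python
import cv2
import numpy as np

def extract_target_mask(frame_bgr, background_bgr, diff_thresh=30, area_thresh=500):
    """Illustrative sketch of steps 1011-1013: difference, initial mask, area filtering."""
    # Per-pixel difference between the image to be detected R(x, y) and the
    # background image G(x, y); formula (1) is assumed to be an absolute difference.
    diff = cv2.absdiff(frame_bgr, background_bgr)
    gray = cv2.cvtColor(diff, cv2.COLOR_BGR2GRAY)

    # Binarize the grayscale difference image to get an initial foreground mask;
    # connected-component analysis stands in for the region-growing step.
    _, initial_mask = cv2.threshold(gray, diff_thresh, 255, cv2.THRESH_BINARY)

    # Filter out connected domains whose area is below the area threshold.
    n, labels, stats, _ = cv2.connectedComponentsWithStats(initial_mask, connectivity=8)
    target_mask = np.zeros_like(initial_mask)
    for i in range(1, n):  # label 0 is the background
        if stats[i, cv2.CC_STAT_AREA] >= area_thresh:
            target_mask[labels == i] = 255
    return target_mask
```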
  • The target mask is used to mask the image to be detected to obtain a mask image with the background removed.
  • the target mask and the image to be detected are subjected to a bitwise AND operation between the pixel values of the corresponding positions to obtain a mask image with the background removed.
  • the mask image with the background removed is obtained, that is, the instance mask is generated, and the instance segmentation is realized.
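  • A minimal sketch of this masking step, assuming 8-bit images and the target_mask produced by the sketch above:

```python
import cv2

# Bitwise AND between the pixel values of the target mask and the image to be
# detected at corresponding positions removes the background (illustrative only).
mask_image = cv2.bitwise_and(frame_bgr, frame_bgr, mask=target_mask)
```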
  • the pre-trained target detection model is obtained by pre-training multiple sample mask images.
  • the sample mask image may be obtained by photographing a monitoring scene where the target object exists to obtain a scene image, and performing mask processing on the scene image.
  • the confidence of a certain candidate box is used to indicate the probability that the candidate box belongs to a certain category.
  • The preprocessing of the mask image with the background removed includes mean-subtraction normalization and scaling to a preset size (for example, 512*320); the preprocessed mask image is used as the input image of the target detection model.
  • The target detection model generates a feature map of each target object in the input image, outputs multiple detection frames on each anchor point in each feature map, and performs forward inference on each detection frame to obtain the position, category, and confidence of each detection frame, which form the detection result of each target object.
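  • The preprocessing could look roughly like the following sketch; the channel means are placeholders, since the text only names mean subtraction and the 512*320 target size:

```python
import cv2
import numpy as np

def preprocess(mask_image, size=(512, 320), mean=(104.0, 117.0, 123.0)):
    """Mean-subtraction normalization and scaling to a preset size (a sketch)."""
    resized = cv2.resize(mask_image, size).astype(np.float32)  # size is (width, height)
    return resized - np.array(mean, dtype=np.float32)  # assumed per-channel means
```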
  • The following operations are performed for each detection result: according to the non-maximum suppression (NMS) algorithm, the candidate frames in the detection result whose confidence is lower than the preset confidence threshold are filtered out; the remaining candidate frames are sorted in descending order of confidence; the candidate frame with the highest confidence in the sorted result is selected, and the remaining candidate frames in the sorted result are traversed. If the ratio between the intersection and the union of the two candidate frames is greater than the preset threshold, the currently traversed candidate frame is deleted, and the retained candidate frames are used as the target frames of the target object.
  • In the NMS formulation (5), conf denotes the confidence of a candidate frame, and ovr denotes the threshold on the intersection-over-union (IoU) between the currently traversed candidate frame and the candidate frame with the highest confidence.
  • the improved NMS algorithm can be soft-NMS or Softer-NMS.
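  • A plain NumPy sketch of the NMS procedure just described (formula (5) itself is not reproduced in the text); conf_thresh and ovr correspond to the confidence threshold and the IoU threshold named above, and the default values are placeholders:

```python
import numpy as np

def nms(boxes, scores, conf_thresh=0.5, ovr=0.45):
    """Filter low-confidence candidates, sort by confidence, and suppress boxes
    whose IoU with an already-kept box exceeds ovr. Boxes are (x1, y1, x2, y2)."""
    keep_mask = scores >= conf_thresh
    boxes, scores = boxes[keep_mask], scores[keep_mask]
    x1, y1, x2, y2 = boxes[:, 0], boxes[:, 1], boxes[:, 2], boxes[:, 3]
    areas = (x2 - x1) * (y2 - y1)
    order = scores.argsort()[::-1]          # descending confidence
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)                      # highest-confidence remaining box
        xx1 = np.maximum(x1[i], x1[order[1:]])
        yy1 = np.maximum(y1[i], y1[order[1:]])
        xx2 = np.minimum(x2[i], x2[order[1:]])
        yy2 = np.minimum(y2[i], y2[order[1:]])
        inter = np.maximum(0.0, xx2 - xx1) * np.maximum(0.0, yy2 - yy1)
        iou = inter / (areas[i] + areas[order[1:]] - inter)
        order = order[1:][iou <= ovr]       # drop boxes overlapping too much
    return boxes[keep], scores[keep]
```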
  • the method may further include: using a target mask to verify the target frame in the final detection result of each target object.
  • the process may include:
  • All connected domains in the target mask are labeled, and the target frame of each target object is mapped onto the labeled connected domains in the target mask.
  • the number of target frames mapped by a connected domain can be one or more, and the number of connected domains mapped by a target frame can be one or more.
  • If a target frame maps to only one connected domain and the area of that connected domain is greater than the preset connected-domain area threshold, the target frame is bound to the label of that connected domain; otherwise, the target frame is filtered out.
  • If a target frame maps to multiple connected domains, the target frame is bound to the label of the connected domain with the largest mapped area; otherwise, the target frame is filtered out.
  • the following steps can be used to determine the area of the connected domain in the target mask, including:
  • the area of the final target frame is calculated, the position of the circumscribed rectangle of the connected domain mapped by the final target frame is determined, and the area of the circumscribed rectangle of the connected domain is calculated.
  • the area of the final target frame is compared with the area of the circumscribed rectangle of the connected domain to which it is mapped.
  • the following steps can be used to determine the position of the circumscribed rectangle of the connected domain, including:
  • The position of the final target frame is adjusted as follows: if the comparison result indicates that the area of the final target frame is greater than the area of the circumscribed rectangle of its mapped connected domain, the position of the final target frame is adjusted according to the intersection between the final target frame and that circumscribed rectangle;
  • if the comparison result indicates that the area of the final target frame is smaller than the area of the circumscribed rectangle of its mapped connected domain, the position of the final target frame is adjusted according to the union between the final target frame and that circumscribed rectangle.
  • The area of the final target frame may be the same as the area of the circumscribed rectangle of the mapped connected domain.
  • If the area of the final target frame is larger than the area of the circumscribed rectangle of its mapped connected domain, the target object is partially occluded by the environment and the detected target frame (that is, the final target frame) is too large. In this case, the intersection of the target frame and the circumscribed rectangle of its mapped connected domain can be taken to adjust the position of the target frame.
  • If the area of the final target frame is smaller than the area of the circumscribed rectangle of its mapped connected domain, part of the target object (such as a human hand) may be truncated in the detection result, so the detected target frame (that is, the final target frame) is too small. In this case, the union of the target frame and the circumscribed rectangle of its mapped connected domain can be taken to adjust the position of the target frame. It should be understood that, in addition to using the intersection or the union, other methods may also be used to adjust the position of the target frame, which is not specifically limited in the present invention.
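  • The verification and position-adjustment logic could be sketched as follows, assuming integer pixel boxes in (x1, y1, x2, y2) form and the target_mask produced earlier; the area threshold and the tie-breaking details are assumptions of this sketch:

```python
import cv2
import numpy as np

def verify_boxes(target_mask, boxes, area_thresh=500):
    """Label connected domains, bind each final target frame to a domain, then
    adjust the frame against the domain's circumscribed rectangle (intersection
    if the frame is larger, union if it is smaller). Illustrative only."""
    n, labels, stats, _ = cv2.connectedComponentsWithStats(target_mask, connectivity=8)
    verified = []
    for (x1, y1, x2, y2) in boxes:
        region = labels[y1:y2, x1:x2]
        mapped = [i for i in np.unique(region) if i != 0]
        if len(mapped) == 1 and stats[mapped[0], cv2.CC_STAT_AREA] > area_thresh:
            i = mapped[0]
        elif len(mapped) > 1:
            # bind to the connected domain with the largest mapped area
            i = max(mapped, key=lambda k: np.count_nonzero(region == k))
        else:
            continue  # filter out this target frame
        # circumscribed rectangle of the bound connected domain
        rx, ry = stats[i, cv2.CC_STAT_LEFT], stats[i, cv2.CC_STAT_TOP]
        rw, rh = stats[i, cv2.CC_STAT_WIDTH], stats[i, cv2.CC_STAT_HEIGHT]
        if (x2 - x1) * (y2 - y1) > rw * rh:
            # frame too large (target partially occluded): take the intersection
            x1, y1 = max(x1, rx), max(y1, ry)
            x2, y2 = min(x2, rx + rw), min(y2, ry + rh)
        elif (x2 - x1) * (y2 - y1) < rw * rh:
            # frame too small (target partially truncated): take the union
            x1, y1 = min(x1, rx), min(y1, ry)
            x2, y2 = max(x2, rx + rw), max(y2, ry + rh)
        verified.append((x1, y1, x2, y2))
    return verified
```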
  • The embodiment of the present invention provides a target detection method, which obtains a target mask corresponding to at least one target object in an image to be detected; uses the target mask to mask the image to be detected to obtain a mask image with the background removed; inputs the mask image into a pre-trained target detection model for detection to obtain the detection result of each target object; and performs denoising processing on the candidate frames whose confidence is higher than the confidence threshold in each detection result to obtain the final detection result of each target object.
  • The present invention uses a pre-trained target detection model to perform target detection on the mask image with the background removed, which largely removes the deep learning algorithm's dependence on the environment, improves the robustness of the target detection algorithm, reduces target misdetection in complex scenes, and improves the accuracy of target detection.
  • the target detection model in step 103 can be trained in the following manner, including steps:
  • the following operations are performed for each frame of image in the sample video: obtain a target mask corresponding to at least one target object in the image, use the target mask to mask the image to be detected, and obtain a sample mask image with the background removed .
  • the process of acquiring the sample mask image can refer to step 101 to step 102, which will not be repeated here.
  • step 202 Perform preprocessing on each sample mask image to obtain a training sample set, where the training samples in the training sample set include sample images and label information of the sample images.
  • the implementation process of step 202 may include the following steps:
  • normalization processing is performed on each sample mask image, and sample enhancement is performed on each sample mask image after the normalization processing to obtain multiple sample images.
  • the normalization processing of subtracting the mean value can be performed on each sample mask image according to formula (8).
  • the sample enhancement may include scaling the sample mask image into images of multiple different sizes, and may also include flipping, mirroring, and rotating images of various sizes.
  • Normalization makes the sample images more natural and the target features more obvious, which facilitates model training; in addition, sample enhancement of the sample mask images removes the need to separately collect a large number of training data sets, generates a variety of data samples, and improves the generalization ability and robustness of model recognition.
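  • A sketch of the normalization and sample-enhancement step; the concrete sizes and transforms are illustrative, and formula (8) is not reproduced in the text, so a global mean subtraction is assumed:

```python
import cv2
import numpy as np

def enhance_samples(sample_mask_image, sizes=((512, 320), (416, 416))):
    """Scale the normalized sample mask image to several sizes, then flip,
    mirror, and rotate each size (illustrative transforms and sizes)."""
    img = sample_mask_image.astype(np.float32)
    img -= img.mean()                          # assumed form of mean-subtraction normalization
    samples = []
    for size in sizes:
        scaled = cv2.resize(img, size)
        samples.append(scaled)
        samples.append(cv2.flip(scaled, 0))    # vertical flip
        samples.append(cv2.flip(scaled, 1))    # horizontal mirror
        samples.append(cv2.rotate(scaled, cv2.ROTATE_90_CLOCKWISE))
    return samples
```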
  • The labeling information of each sample image may specifically include the following parameters: the sample image id, the spatial starting abscissa x of the target object in the sample image, the spatial starting ordinate y of the target object in the sample image, and the width and height of the target object region in the sample image.
  • the training sample set can be divided into a training set and a test set according to a preset ratio (for example, 3:1).
  • the training set is used to train the initial network model and determine the parameters in the initial network model
  • the test set is used to test the model capabilities of the target detection model obtained by training.
  • the initial network model includes the basic convolutional neural network after weight initialization and the target detection network.
  • the network weights trained on the COCO data set can be used to initialize the weights of the basic convolutional neural network and the target detection network to obtain the initial network model.
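  • A hedged sketch of this weight initialization in PyTorch; build_yolov3_mobilenet and the checkpoint file name are hypothetical stand-ins, since the text names neither:

```python
import torch

model = build_yolov3_mobilenet()  # hypothetical constructor for the network described below
state = torch.load("coco_pretrained.pth", map_location="cpu")  # placeholder COCO checkpoint
model.load_state_dict(state, strict=False)  # strict=False tolerates detection-head differences
```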
  • step 203 the training set is input into the pre-built initial network model for training to obtain the target detection model.
  • the process may include the steps:
  • the basic convolutional neural network can use the Mobilenetv1 network framework.
  • MobileNetV1 replaces the traditional convolutions of a basic neural network with depthwise separable convolutions. A depthwise separable convolution uses a different convolution kernel for each input channel to extract feature maps; that is, one convolution kernel convolves exactly one channel, so M channels require M convolution kernels in total, with one kernel per channel.
  • Depthwise separable convolution reduces the amount of convolution computation.
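  • For illustration, a MobileNetV1-style depthwise separable block in PyTorch (one 3x3 kernel per input channel via groups=in_ch, then a 1x1 pointwise convolution); the BatchNorm/ReLU arrangement follows common MobileNetV1 practice rather than anything stated in the text:

```python
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    """Depthwise 3x3 convolution (one kernel per input channel) followed by a
    1x1 pointwise convolution that mixes channels."""
    def __init__(self, in_ch, out_ch, stride=1):
        super().__init__()
        self.depthwise = nn.Conv2d(in_ch, in_ch, kernel_size=3, stride=stride,
                                   padding=1, groups=in_ch, bias=False)
        self.pointwise = nn.Conv2d(in_ch, out_ch, kernel_size=1, bias=False)
        self.bn1, self.bn2 = nn.BatchNorm2d(in_ch), nn.BatchNorm2d(out_ch)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        x = self.relu(self.bn1(self.depthwise(x)))
        return self.relu(self.bn2(self.pointwise(x)))
```

  • For a 3x3 kernel, this factorization cuts the multiply count roughly by a factor of eight to nine relative to a standard convolution, which is the computation saving referred to above.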
  • the target detection network after weight initialization outputs multiple detection frames on each anchor point in the feature map, and forward inference is performed on each detection frame to obtain the position, category, and confidence of each detection frame.
  • The target detection network adopts networks such as YOLO, Fast-RCNN, Faster-RCNN, or Mask-RCNN.
  • The detection network in this embodiment adopts the YOLOv3 network; the feature map is down-sampled 16 times and 32 times through the pooling layers in the YOLOv3 detection network. Three candidate frames with different aspect ratios are selected on each anchor point in the down-sampled feature maps, and the confidence, position, and category of each candidate frame are obtained through forward inference.
  • The position of a detection frame in the image to be detected includes the spatial starting abscissa x, the spatial starting ordinate y, the region width w, the region height h, and the specific category.
  • According to the position of each detection frame and the position of the sample target, the position offset corresponding to each detection frame is obtained, and the squared-difference position loss loss(x, y, w, h) of each detection frame is calculated from that offset. According to the category of each detection frame and the category of the sample target, the binary cross-entropy category loss loss(p) of each detection frame is obtained.
  • Loss(object) = loss(x, y, w, h) + loss(C) + loss(p) (9)
  • where loss(x, y, w, h) is the squared-difference loss of the detection frame position, and loss(C) and loss(p) are the binary cross-entropy losses of the detection frame confidence and category, respectively.
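  • A sketch of formula (9) in PyTorch; the matching of detection frames to sample targets, the tensor layout, and the exact reductions are assumptions, since the text only names the three loss terms:

```python
import torch.nn.functional as F

def detection_loss(pred_box, tgt_box, pred_conf, tgt_conf, pred_cls, tgt_cls):
    """Loss(object) = loss(x,y,w,h) + loss(C) + loss(p), per formula (9)."""
    loss_xywh = F.mse_loss(pred_box, tgt_box, reduction="sum")        # squared-difference position loss
    loss_c = F.binary_cross_entropy_with_logits(pred_conf, tgt_conf)  # confidence BCE, loss(C)
    loss_p = F.binary_cross_entropy_with_logits(pred_cls, tgt_cls)    # category BCE, loss(p)
    return loss_xywh + loss_c + loss_p
```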
  • the initial network model is optimized according to the model loss value, and the weights in the initial network model are updated through backpropagation to obtain a target detection model through training.
  • the parameters of the initial network model are optimized according to the model loss value, and step 2031 to step 2035 are re-entered, through repeated optimization iterations, until the loss function converges and the training ends, that is, a trained target detection model is obtained.
  • Stochastic gradient descent (SGD) can be used to optimize the model parameters in the initial network model to minimize the difference between the predicted results and the actual results.
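  • A minimal SGD training-step sketch; the learning rate, momentum, the train_loader, and the prediction/target dictionary layout are placeholders, and detection_loss reuses the sketch above:

```python
import torch

optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)  # placeholder hyperparameters
for images, targets in train_loader:  # train_loader is assumed to exist
    optimizer.zero_grad()
    preds = model(images)
    loss = detection_loss(preds["box"], targets["box"],
                          preds["conf"], targets["conf"],
                          preds["cls"], targets["cls"])
    loss.backward()   # backpropagation computes the weight gradients
    optimizer.step()  # SGD update of the initial network model's weights
```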
  • The test set is input into the target detection model for testing to obtain a test value. If the test value is less than the preset threshold, the training set is used to continue training the target detection model; if the test value is greater than the preset threshold, the target detection model has completed training.
  • the embodiment of the present invention provides a target detection method.
  • the method is applied to a target detection device as an example.
  • the device can be applied to any computer device so that the computer device can execute the target detection method.
  • the method may include the steps:
  • the image to be detected can be extracted from the surveillance scene video captured by the surveillance camera, and the surveillance scene video is the video captured by the full-scene camera in the surveillance scene.
  • the image to be detected can be extracted from the surveillance scene video every preset time, or it can be triggered to extract the image to be detected from the surveillance scene video when a moving target is detected in the surveillance scene.
  • the image to be detected includes at least one target object and a background.
  • the target object in the image to be detected is specifically a human object.
  • step 301 may include the steps:
  • Step 3011: perform difference processing between the image to be detected and the background image.
  • step 3011 can refer to step 1011, which will not be repeated here.
  • step 3012 can refer to step 1012, which will not be repeated here.
  • For the implementation process of step 3013, refer to step 1013, which will not be repeated here.
  • the pre-trained target detection model is obtained by pre-training multiple sample images.
  • the sample image is a scene image obtained by shooting a surveillance scene where the target object exists.
  • the confidence of a certain candidate box is used to indicate the probability that the candidate box belongs to a certain category.
  • Preprocessing the image to be detected includes mean-subtraction normalization and scaling to a preset size (for example, 512*320); the preprocessed image to be detected is used as the input image of the target detection model.
  • The target detection model generates a feature map of each target object in the input image, outputs multiple detection frames on each anchor point in each feature map, and performs forward inference on each detection frame to obtain the position, category, and confidence of each detection frame, which form the detection result of each target object.
  • The following operations are performed for each detection result: according to the non-maximum suppression (NMS) algorithm, the candidate frames in the detection result whose confidence is lower than the preset confidence threshold are filtered out; the remaining candidate frames are sorted in descending order of confidence; the candidate frame with the highest confidence in the sorted result is selected, and the remaining candidate frames in the sorted result are traversed. If the overlap (IoU) between the currently traversed candidate frame and the candidate frame with the highest confidence is greater than a threshold, the currently traversed candidate frame is deleted, yielding the valid candidate frames of each target object.
  • For the non-maximum suppression (NMS) step, an improved NMS algorithm such as soft-NMS or Softer-NMS can also be used.
  • Step 301 can be performed before step 302 or step 303, after step 302 or step 303, or at the same time as step 302 or step 303; this embodiment does not specifically limit this.
  • step 304 may include the following steps:
  • All connected domains in the target mask are labeled, and the valid candidate frame of each target object is mapped onto the labeled connected domains in the target mask.
  • the number of valid candidate frames mapped by a connected domain can be one or more, and the number of connected domains mapped by a valid candidate frame can be one or more.
  • If a valid candidate frame maps to only one connected domain and the area of that connected domain is greater than the preset connected-domain area threshold, the valid candidate frame is bound to the label of that connected domain; otherwise, the valid candidate frame is filtered out.
  • If a valid candidate frame maps to multiple connected domains, the valid candidate frame is bound to the label of the connected domain with the largest mapped area; otherwise, the valid candidate frame is filtered out.
  • The final valid candidate frames are thereby determined.
  • the following steps can be used to determine the area of the connected domain in the target mask, including:
  • The area of the final valid candidate frame is compared with the area of the circumscribed rectangle of the connected domain to which it is mapped, and the position of the final valid candidate frame is adjusted according to the comparison result.
  • each final valid candidate frame it is determined whether there is only one connected domain mapped by the final valid candidate frame, and if so, the position of the final valid candidate frame is adjusted.
  • the connected domains mapped by the multiple final valid candidate frames are the same, there is no need to adjust the positions of the multiple final valid candidate frames.
  • The area of the final valid candidate frame is calculated, the position of the circumscribed rectangle of the connected domain mapped by the final valid candidate frame is determined, and the area of that circumscribed rectangle is calculated; the area of the final valid candidate frame is then compared with the area of the circumscribed rectangle of the connected domain to which it is mapped.
  • the following steps can be used to determine the position of the circumscribed rectangle of the connected domain, including:
  • the position of the final effective candidate frame is adjusted, and the process includes:
  • If the comparison result indicates that the area of the final valid candidate frame is greater than the area of the circumscribed rectangle of the connected domain to which it is mapped, the position of the final valid candidate frame is adjusted according to the intersection between the final valid candidate frame and that circumscribed rectangle;
  • if the comparison result indicates that the area of the final valid candidate frame is smaller than the area of the circumscribed rectangle of the connected domain to which it is mapped, the position of the final valid candidate frame is adjusted according to the union between the final valid candidate frame and that circumscribed rectangle.
  • the area of the final valid candidate frame may be the same as the area of the circumscribed rectangle of the mapped connected domain.
  • If the area of the final valid candidate frame is greater than the area of the circumscribed rectangle of the connected domain to which it is mapped, the target object is partially occluded by the environment and the detected target frame (that is, the final valid candidate frame) is too large. In this case, the intersection of the target frame and the circumscribed rectangle of its mapped connected domain can be taken to adjust the position of the target frame.
  • If the area of the final valid candidate frame is smaller than the area of the circumscribed rectangle of the connected domain to which it is mapped, part of the target object (for example, a human hand) may be truncated in the detection result, so the detected target frame (that is, the final valid candidate frame) is too small. In this case, the union of the target frame and the circumscribed rectangle of its mapped connected domain can be taken to adjust the position of the target frame. It should be understood that, in addition to using the intersection or the union, other methods may also be used to adjust the position of the target frame, which is not specifically limited in the present invention.
  • The embodiment of the present invention provides a target detection method that obtains a target mask corresponding to at least one target object in an image to be detected; inputs the image to be detected into a pre-trained target detection model for detection to obtain the detection result of each target object, where each detection result includes the position, category, and confidence of multiple candidate frames; performs denoising processing on the candidate frames whose confidence is higher than the confidence threshold in each detection result to obtain the valid candidate frames of each target object; and uses the target mask to verify the valid candidate frames of each target object to obtain the final detection result of each target object.
  • The present invention uses a pre-trained target detection model to perform target detection on the image to be detected and verifies the detection result with the background-removed mask, which largely removes the deep-learning detection algorithm's dependence on the environment, improves the robustness of the target detection algorithm, reduces target misdetection in complex scenes, improves the accuracy of target detection, and also facilitates the scalability of the detection algorithm.
  • the target detection model in step 302 can be obtained by training in the following manner, including the steps:
  • step 401 For the implementation process of step 401, refer to step 202, which will not be repeated here.
  • step 402 can refer to step 203, which will not be repeated here.
  • step 403 can refer to step 204, which will not be repeated here.
  • an embodiment of the present invention provides a target detection device, which can be configured in any computer device, so that the computer device can execute the target detection method provided in the first embodiment .
  • The computer equipment can be configured as various terminals, such as a server, which can be implemented by an independent server or a server cluster.
  • the device may include:
  • the first acquiring module 51 is configured to acquire a target mask corresponding to at least one target object in the image to be detected
  • the second acquisition module 52 is configured to use the target mask to mask the image to be detected to obtain the mask image with the background removed;
  • The target detection module 53 is used to input the mask image into the pre-trained target detection model for detection and obtain the detection result of each target object, where each detection result includes the position, category, and confidence of multiple candidate frames;
  • the denoising processing module 54 is configured to perform denoising processing on multiple candidate frames with confidence higher than the confidence threshold in each detection result to obtain the final detection result of each target object.
  • The first acquisition module 51 is specifically configured to: perform difference processing between the image to be detected and the background image to obtain a difference image with the background removed; use a region-growing algorithm on the grayscale image of the difference image to generate an initial mask with the background removed; and filter out the areas where the connected-domain area in the initial mask is lower than the area threshold to obtain the target mask.
  • the device further includes a training module, and the training module includes:
  • the acquisition sub-module is used to mask each frame of the sample video to obtain multiple sample mask images with the background removed;
  • the preprocessing sub-module is used to preprocess each sample mask image to obtain a training sample set, where the training samples in the training sample set include sample images and label information of the sample images;
  • the dividing sub-module is used to divide the training sample set into a training set and a test set;
  • the training sub-module is used to input the training set into the pre-built initial network model for training to obtain the target detection model;
  • The test sub-module is used to input the test set into the target detection model for testing to obtain a test value; when the test value meets the preset requirements, the target detection model completes training.
  • the preprocessing sub-module is specifically used to: normalize each sample mask image, and perform sample enhancement on each sample mask image after the normalization processing to obtain multiple sample images; Obtain the label information of each sample image, where the label information includes the position and category of the sample target object in the sample image; generate a training sample set according to each sample image and corresponding label information.
  • The sample enhancement includes performing at least one of size scaling, flipping, mirroring, and rotation on the sample mask image.
  • the initial network model includes a basic convolutional neural network and a target detection network after weight initialization, and the training sub-module is specifically used for:
  • the target detection network after weight initialization outputs multiple detection frames on each anchor point in the feature map, and performs forward inference on each detection frame to obtain the position, category and confidence of each detection frame;
  • the initial network model is optimized according to the model loss value, and the weights in the initial network model are updated through backpropagation to train the target detection model.
  • In the target detection device provided in this embodiment, the division into the above-mentioned functional modules is only used as an example; in practical applications, the above functions can be allocated to different functional modules as required, that is, the internal structure of the device can be divided into different functional modules to complete all or part of the functions described above.
  • the specific implementation process and beneficial effects of the target detection device in this embodiment are detailed in the target detection method in the first embodiment, which will not be repeated here.
  • an embodiment of the present invention provides a target detection device, which can be configured in any computer device, so that the computer device can execute the target detection method provided in the second embodiment .
  • The computer equipment can be configured as various terminals, such as a server, which can be implemented by an independent server or a server cluster.
  • the device may include:
  • the obtaining module 61 is configured to obtain a target mask corresponding to at least one target object in the image to be detected;
  • The target detection module 62 is configured to input the image to be detected into a pre-trained target detection model for detection and obtain a detection result of each target object, where each detection result includes the position, category, and confidence of multiple candidate frames;
  • the denoising processing module 63 is configured to perform denoising processing on multiple candidate frames with confidence higher than the confidence threshold in each detection result, to obtain an effective candidate frame for each target object;
  • the verification module 64 is used for verifying the effective candidate frame of each target object using the target mask, and obtaining the final detection result of each target object.
  • The obtaining module 61 is specifically configured to filter out the areas where the connected-domain area in the initial mask is lower than the first threshold to obtain the target mask.
  • the device further includes a training module, and the training module includes:
  • the preprocessing sub-module is used to preprocess each frame image of the sample video to obtain a training sample set, where the training samples in the training sample set include sample images and label information of the sample images;
  • the dividing sub-module is used to divide the training sample set into a training set and a test set;
  • The training sub-module is used to input the training set into the pre-built initial network model for training to obtain the target detection model; and the test sub-module is used to input the test set into the target detection model for testing to obtain a test value; when the test value meets the preset requirements, the target detection model completes training.
  • The preprocessing sub-module is specifically used to: normalize each image, and perform sample enhancement on each image after normalization to obtain multiple sample images; obtain the labeling information of each sample image, where the labeling information includes the position and category of the sample target object in the sample image; and generate a training sample set according to each sample image and its corresponding labeling information.
  • The sample enhancement includes performing at least one of size scaling, flipping, mirroring, and rotation on the sample mask image.
  • The initial network model includes a basic convolutional neural network after weight initialization and a target detection network.
  • The training sub-module is specifically used to: generate a feature map of the input sample image through the basic convolutional neural network after weight initialization; output multiple detection frames on each anchor point in the feature map through the target detection network after weight initialization, and perform forward inference on each detection frame to obtain the position, category, and confidence of each detection frame; calculate the error between the position and category of each detection frame and the position and category of the sample target in the labeling information of the sample image to obtain the position loss value and category loss value of each detection frame; calculate the model loss value according to the position loss value, category loss value, and confidence of each detection frame; and optimize the initial network model according to the model loss value, updating the weights in the initial network model through backpropagation to train the target detection model.
  • The verification module 64 includes: a mapping sub-module for mapping the valid candidate frame of each target object onto the connected domains of the target mask; a filtering sub-module for filtering each valid candidate frame according to the area of its mapped connected domains to determine the final valid candidate frames; a comparison sub-module for comparing, for each final valid candidate frame that maps to only one connected domain, the area of the final valid candidate frame with the area of the circumscribed rectangle of the connected domain to which it is mapped; and an adjustment sub-module for adjusting the position of the final valid candidate frame according to the comparison result.
  • The adjustment sub-module is specifically configured to: if the comparison result indicates that the area of the final valid candidate frame is greater than the area of the circumscribed rectangle of the connected domain to which it is mapped, adjust the position of the final valid candidate frame according to the intersection between the final valid candidate frame and that circumscribed rectangle; if the comparison result indicates that the area of the final valid candidate frame is smaller than the area of the circumscribed rectangle of the connected domain to which it is mapped, adjust the position of the final valid candidate frame according to the union between the final valid candidate frame and that circumscribed rectangle.
  • In the target detection device provided in this embodiment, the division into the above-mentioned functional modules is only used as an example; in practical applications, the above functions can be allocated to different functional modules as required, that is, the internal structure of the device can be divided into different functional modules to complete all or part of the functions described above.
  • the specific implementation process and beneficial effects of the target detection device in this embodiment are detailed in the target detection method in the second embodiment, which will not be repeated here.
  • A computer device includes a memory, a processor, and a computer program stored in the memory and capable of running on the processor; when the processor executes the computer program, the above method steps of the present invention are implemented.
  • a computer-readable storage medium is also provided, on which a computer program is stored, and the computer program is executed by a processor to realize the above method steps of the present invention.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

A target detection method and apparatus, belonging to the technical fields of image detection and recognition. The method comprises the steps of: obtaining a target mask corresponding to at least one target object in an image to be detected (101); using the target mask to mask the image to be detected, obtaining a mask image from which the background is removed (102); inputting the mask image into a pre-trained target detection model, performing detection, and obtaining a detection result for each target object, each detection result comprising the positions, categories, and confidence scores of a plurality of candidate frames (103); and performing denoising on the candidate frames in each detection result whose confidence scores are higher than a confidence threshold, obtaining a final detection result for each target object (104). The method can solve the current problem of environmental dependence of a target detection algorithm in a corresponding monitored scenario, as well as the problem of misdetection in complex scenarios.
PCT/CN2021/098734 2020-06-17 2021-06-07 Target detection method and apparatus WO2021254205A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010553786.5A CN111723860B (zh) 2020-06-17 2020-06-17 一种目标检测方法及装置
CN202010553786.5 2020-06-17

Publications (1)

Publication Number Publication Date
WO2021254205A1 true WO2021254205A1 (fr) 2021-12-23

Family

ID=72567122

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/098734 WO2021254205A1 (fr) Target detection method and apparatus

Country Status (2)

Country Link
CN (1) CN111723860B (fr)
WO (1) WO2021254205A1 (fr)

Families Citing this family (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111723860B (zh) * 2020-06-17 2022-11-18 苏宁云计算有限公司 一种目标检测方法及装置
CN112258504B (zh) * 2020-11-13 2023-12-08 腾讯科技(深圳)有限公司 一种图像检测方法、设备及计算机可读存储介质
CN112396116B (zh) * 2020-11-24 2021-12-07 武汉三江中电科技有限责任公司 一种雷电检测方法、装置、计算机设备及可读介质
CN112529851B (zh) * 2020-11-27 2023-07-18 中冶赛迪信息技术(重庆)有限公司 一种液压管状态确定方法、系统、终端及介质
CN112766046B (zh) * 2020-12-28 2024-05-10 深圳市捷顺科技实业股份有限公司 一种目标检测方法及相关装置
CN112613570B (zh) * 2020-12-29 2024-06-11 深圳云天励飞技术股份有限公司 一种图像检测方法、图像检测装置、设备及存储介质
CN112990211B (zh) * 2021-01-29 2023-07-11 华为技术有限公司 一种神经网络的训练方法、图像处理方法以及装置
CN112507983B (zh) * 2021-02-03 2021-11-16 北京世纪好未来教育科技有限公司 目标检测方法、装置、电子设备及存储介质
CN112989995B (zh) * 2021-03-10 2024-02-20 北京百度网讯科技有限公司 文本检测方法、装置及电子设备
CN113298122A (zh) * 2021-04-30 2021-08-24 北京迈格威科技有限公司 目标检测方法、装置和电子设备
CN113408361B (zh) * 2021-05-25 2023-09-19 中国矿业大学 一种基于深度学习的矿用输送带大块物料检测方法及系统
CN113361576A (zh) * 2021-05-31 2021-09-07 展讯通信(天津)有限公司 图片标注方法和设备
CN113331160B (zh) * 2021-06-02 2022-09-27 河南省农业科学院烟草研究所 一种烟草专用精准喷药系统
CN113449606B (zh) * 2021-06-04 2022-12-16 南京苏宁软件技术有限公司 一种目标对象识别方法、装置、计算机设备及存储介质
CN113808200B (zh) * 2021-08-03 2023-04-07 嘉洋智慧安全科技(北京)股份有限公司 一种检测目标对象移动速度的方法、装置及电子设备
CN113591765A (zh) * 2021-08-09 2021-11-02 精英数智科技股份有限公司 一种基于实例分割算法的异物检测方法及系统
CN113808117B (zh) * 2021-09-24 2024-05-21 北京市商汤科技开发有限公司 灯具检测方法、装置、设备及存储介质
CN113989626B (zh) * 2021-12-27 2022-04-05 北京文安智能技术股份有限公司 一种基于目标检测模型的多类别垃圾场景区分方法
CN114782412A (zh) * 2022-05-26 2022-07-22 马上消费金融股份有限公司 图像检测方法、目标检测模型的训练方法及装置
CN115100492B (zh) * 2022-08-26 2023-04-07 摩尔线程智能科技(北京)有限责任公司 Yolov3网络训练、pcb表面缺陷检测方法及装置

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9858496B2 (en) * 2016-01-20 2018-01-02 Microsoft Technology Licensing, Llc Object detection and classification in images
CN109147254B (zh) * 2018-07-18 2021-05-18 武汉大学 一种基于卷积神经网络的视频野外火灾烟雾实时检测方法
CN110096960B (zh) * 2019-04-03 2021-06-08 罗克佳华科技集团股份有限公司 目标检测方法及装置

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090245571A1 (en) * 2008-03-31 2009-10-01 National Taiwan University Digital video target moving object segmentation method and system
CN108268869A (zh) * 2018-02-13 2018-07-10 北京旷视科技有限公司 目标检测方法、装置及系统
CN108647588A (zh) * 2018-04-24 2018-10-12 广州绿怡信息科技有限公司 物品类别识别方法、装置、计算机设备和存储介质
CN108876810A (zh) * 2018-06-11 2018-11-23 江苏东大金智信息系统有限公司 视频摘要中利用图割算法进行运动目标检测的方法
CN111160065A (zh) * 2018-11-07 2020-05-15 中电科海洋信息技术研究院有限公司 遥感图像舰船检测方法、装置、设备及其存储介质
CN110490073A (zh) * 2019-07-15 2019-11-22 浙江省北大信息技术高等研究院 目标检测方法、装置、设备及存储介质
CN111723860A (zh) * 2020-06-17 2020-09-29 苏宁云计算有限公司 一种目标检测方法及装置

Cited By (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114445622A (zh) * 2022-01-14 2022-05-06 支付宝(杭州)信息技术有限公司 一种目标检测方法、装置、设备及处理器
CN114612769A (zh) * 2022-03-14 2022-06-10 电子科技大学 一种融入局部结构信息的集成感知红外成像舰船检测方法
CN114998609A (zh) * 2022-05-18 2022-09-02 安徽理工大学 一种基于密集特征提取与轻量级网络的多类商品目标检测方法
CN114998705A (zh) * 2022-06-17 2022-09-02 集美大学 目标检测方法、系统及存内计算芯片
CN115294478A (zh) * 2022-07-28 2022-11-04 北京航空航天大学 一种应用于现代光电平台的空中无人机目标检测方法
CN115294478B (zh) * 2022-07-28 2024-04-05 北京航空航天大学 一种应用于现代光电平台的空中无人机目标检测方法
CN115063578A (zh) * 2022-08-18 2022-09-16 杭州长川科技股份有限公司 芯片图像中目标对象检测与定位方法、装置及存储介质
CN116030272A (zh) * 2023-03-30 2023-04-28 之江实验室 一种基于信息抽取的目标检测方法、系统和装置
CN116777843A (zh) * 2023-05-26 2023-09-19 湖南大学 一种基于动态非极大值抑制的厨余垃圾检测方法及系统
CN116777843B (zh) * 2023-05-26 2024-02-27 湖南大学 一种基于动态非极大值抑制的厨余垃圾检测方法及系统
CN116824258A (zh) * 2023-06-30 2023-09-29 哈尔滨工业大学 一种基于反向投影的施工场地烟尘检测方法
CN116824258B (zh) * 2023-06-30 2024-05-14 哈尔滨工业大学 一种基于反向投影的施工场地烟尘检测方法
CN116630832A (zh) * 2023-07-21 2023-08-22 江西现代职业技术学院 一种无人机目标识别方法、系统、计算机及可读存储介质
CN116630832B (zh) * 2023-07-21 2023-09-29 江西现代职业技术学院 一种无人机目标识别方法、系统、计算机及可读存储介质
CN116664604A (zh) * 2023-07-31 2023-08-29 苏州浪潮智能科技有限公司 图像的处理方法及装置、存储介质及电子设备
CN116664604B (zh) * 2023-07-31 2023-11-03 苏州浪潮智能科技有限公司 图像的处理方法及装置、存储介质及电子设备
CN117218515A (zh) * 2023-09-19 2023-12-12 人民网股份有限公司 一种目标检测方法、装置、计算设备和存储介质
CN117218515B (zh) * 2023-09-19 2024-05-03 人民网股份有限公司 一种目标检测方法、装置、计算设备和存储介质
CN117541782A (zh) * 2024-01-09 2024-02-09 北京闪马智建科技有限公司 对象的识别方法、装置、存储介质及电子装置

Also Published As

Publication number Publication date
CN111723860B (zh) 2022-11-18
CN111723860A (zh) 2020-09-29

Similar Documents

Publication Publication Date Title
WO2021254205A1 (fr) Target detection method and apparatus
CN111046880B (zh) 一种红外目标图像分割方法、系统、电子设备及存储介质
WO2021208275A1 (fr) Procédé et système de modélisation d'arrière-plan de vidéo de trafic
CN108304798B (zh) 基于深度学习及运动一致性的街面秩序事件视频检测方法
CN102214298B (zh) 基于选择性视觉注意机制的遥感图像机场目标检测与识别方法
CN109284738B (zh) 不规则人脸矫正方法和系统
CN111460968B (zh) 基于视频的无人机识别与跟踪方法及装置
CN112150493B (zh) 一种基于语义指导的自然场景下屏幕区域检测方法
CN109255375A (zh) 基于深度学习的全景图像对象检测方法
CN113076871A (zh) 一种基于目标遮挡补偿的鱼群自动检测方法
US9122960B2 (en) Patch size adaptation for image enhancement
CN112926410A (zh) 目标跟踪方法、装置、存储介质及智能视频系统
CN110909724B (zh) 一种多目标图像的缩略图生成方法
WO2021249351A1 (fr) Procédé de détection de cible, appareil et dispositif informatique basés sur une image rgbd
CN109685045A (zh) 一种运动目标视频跟踪方法及系统
CN111199245A (zh) 油菜害虫识别方法
CN110222572A (zh) 跟踪方法、装置、电子设备及存储介质
WO2022152009A1 (fr) Procédé et appareil de détection de cible, dispositif et support d'enregistrement
CN117649610B (zh) 一种基于YOLOv5的害虫检测方法及系统
Shuai et al. An improved YOLOv5-based method for multi-species tea shoot detection and picking point location in complex backgrounds
CN113570540A (zh) 一种基于检测-分割架构的图像篡改盲取证方法
CN111832508B (zh) 基于die_ga的低照度目标检测方法
CN111881803B (zh) 一种基于改进YOLOv3的畜脸识别方法
Dubey et al. A review of image segmentation using clustering methods
CN113610178A (zh) 一种基于视频监控图像的内河船舶目标检测方法和装置

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21826111

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21826111

Country of ref document: EP

Kind code of ref document: A1