CN109784145B - Target detection method based on depth map and storage medium - Google Patents

Target detection method based on depth map and storage medium

Info

Publication number
CN109784145B
CN109784145B
Authority
CN
China
Prior art keywords
candidate frame
depth
candidate
frame
size
Prior art date
Legal status
Active
Application number
CN201811480757.XA
Other languages
Chinese (zh)
Other versions
CN109784145A (en
Inventor
彭博文
王行
李骊
周晓军
盛赞
李朔
杨淼
Current Assignee
Beijing HJIMI Technology Co Ltd
Original Assignee
Beijing HJIMI Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing HJIMI Technology Co Ltd filed Critical Beijing HJIMI Technology Co Ltd
Priority to CN201811480757.XA priority Critical patent/CN109784145B/en
Publication of CN109784145A publication Critical patent/CN109784145A/en
Application granted granted Critical
Publication of CN109784145B publication Critical patent/CN109784145B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Image Analysis (AREA)

Abstract

A specific target detection method based on a depth image, and a storage medium. The method defines the size of a real candidate frame from the size of the target object to be detected and calculates the window traversal interval of the candidate frame; traverses the depth image to obtain the center-point pixel coordinates of the candidate frames; obtains the depth value of the center point of each candidate frame and screens out the effective candidate frames; calculates the actual required frame length of the remaining effective candidate frames; and sets a filtering threshold to filter out points whose depth differs too much from the center-point depth, before further deep-learning preprocessing and deep learning. The method can increase the step length of image traversal as much as possible, filter out some invalid candidate frames, and calculate the side length of each candidate frame from the center-point depth value and the real size of the target object, thereby avoiding generating multi-scale candidate frames at the same position, saving a large amount of computation, and providing good convenience for rapid target detection.

Description

Target detection method based on depth map and storage medium
Technical Field
The invention relates to the field of image detection, in particular to a target detection method based on a depth map, which can greatly reduce the initial number of candidate frames on the depth map, thereby greatly reducing the amount of model computation and improving detection efficiency.
Background
Image classification, detection and segmentation are the three major tasks in the field of computer vision. At present, the main images used for target detection are RGB images; with the development of structured-light and TOF technologies, depth images are gradually becoming a new data source.
In recent years, with the rapid development of deep learning technology, the speed and accuracy of detecting specific targets in images have greatly improved, although real-time detection on video images is still far from being achieved. Compared with traditional target detection schemes, deep-learning-based target detection has higher precision and better adaptability. Commonly used image target detection algorithms adopt the following schemes:
At present, mainstream target detection algorithms based on deep learning models can be divided into two categories: (1) two-stage detection algorithms, which divide the detection problem into two stages, first generating candidate regions (region proposals) and then classifying the candidate regions (generally with position refinement as well); typical representatives are the region-proposal-based R-CNN algorithms, such as R-CNN, Fast R-CNN and the like; (2) one-stage detection algorithms, which do not require a region-proposal stage and directly generate the class probability and position coordinates of an object; typical algorithms include YOLO and SSD.
These methods rely entirely on random candidate-frame extraction and have the following defects:
(1) the candidate frames have high randomness and are not selected in a targeted manner;
(2) the size of the candidate frame needs to be selected in a multi-scale mode, and the number of the candidate frames is greatly increased;
(3) there may be a large amount of overlap between candidate frames, resulting in an increased number of candidate frames;
(4) the initial convolution image is too large, and the operation amount is large.
Therefore, how to reduce the number of candidate frames in depth-image recognition and improve the overlap ratio between each candidate frame and the target within it, so that the amount of computation in the subsequent convolution algorithm is reduced and objects can be detected quickly, has become a technical problem to be solved in existing image recognition technology.
Disclosure of Invention
The invention aims to provide a depth map-based specific target detection method, which utilizes the real size information of a target object and can greatly reduce the initial number of candidate frames on a depth image, thereby greatly reducing the number of model calculation and improving the detection efficiency.
In order to achieve the purpose, the invention adopts the following technical scheme:
a specific target detection method based on a depth image comprises the following steps:
Step S110 of calculating the traversal step: defining the size of a real candidate frame according to the size L of the target object to be detected, and calculating the candidate-frame window traversal interval Stride_G (unit: pixels) with formula (1):
Stride_G = 0.5 * L * f_xy / D    formula (1)
wherein L is the size of the target object to be measured, f_xy is the principal distance of the depth sensor (unit: pixels), and D is the farthest distance of the target object to be detected;
Image traversal step S120: according to the size of the real candidate frame and the candidate-frame window traversal interval Stride_G, traversing the depth image and acquiring the pixel coordinates of the center points of all candidate frames;
Valid candidate frame screening step S130: obtaining the depth value of the center point of each candidate frame and comparing it with the farthest distance D of the target object to be detected; a candidate frame whose center-point depth is smaller than the farthest distance D is an effective candidate frame, otherwise it is an invalid candidate frame;
Actual required frame length calculation step S140 for the valid candidate frames: calculating the actual required frame length L_pixel of the remaining effective candidate frames with formula (3), using the actual depth value d of the center point of each candidate frame:
L_pixel = L * f_xy / d    formula (3);
Filtering step S150: according to the calculated actual required frame length L_pixel of the effective candidate frame, setting a filtering threshold and filtering out points in the effective candidate frame whose depth differs too much from the center-point depth. Points whose depth difference from the center point exceeds the filtering threshold are thereby filtered out, so the foreground and background points in the candidate frame can be removed.
Optionally, in step S110, the size of the real candidate frame is 1.25 to 1.75 times the real size of the target object to be measured.
Optionally, in step S110, the size of the real candidate frame is 1.5 times the real size of the target object to be measured.
Optionally, the filtering threshold is set at 1/3 to 2/3 of the actual required frame length L_pixel.
Optionally, the filtering threshold is half of the actual required frame length L_pixel.
Optionally, the method further includes a deep learning preprocessing step S160: sampling the depth image to a fixed resolution and then normalizing the pixel values.
Optionally, the deep learning after the deep learning preprocessing step S160 trains a multi-output CNN model that takes a single-channel image as input and outputs a new center-point coordinate and a binary classification result.
The present invention further discloses a storage medium capable of being used to store computer-executable instructions, characterized by: the computer-executable instructions, when executed by a processor, perform the depth map-based object detection method described above.
According to the method, the target scale information of the depth map is introduced into the step-length setting for candidate-frame extraction, so the image-traversal step can be made as large as possible; meanwhile, screening on the center-point depth value filters out some invalid candidate frames; and the side length of each candidate frame is calculated from the center-point depth value and the real size of the target object, which avoids generating multi-scale candidate frames at the same position, saves a large amount of computation, and provides good convenience for rapid target detection.
Drawings
FIG. 1 is a flow diagram of a depth map based specific target detector in accordance with a specific embodiment of the present invention;
fig. 2 is a schematic diagram of the center points of the candidate boxes after traversing the depth map in the embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting of the invention. It should be further noted that, for the convenience of description, only some of the structures related to the present invention are shown in the drawings, not all of the structures.
The invention relates to a specific target detection method based on a depth image, where the processed object is a depth image. First, the maximum window size of the target is calculated, according to the size of the target object and the defined farthest distance of the target to be detected in the depth image, and used as the window interval, i.e., the traversal step length. Second, the depth image is traversed at this window interval to generate candidate frames, and whether each candidate frame is effective is judged from the depth of its center point. Then the actual required frame length of the effective candidate frames is calculated and used to filter out part of the foreground and background points, so as to reduce the number of candidate frames passed to the next deep-learning stage and reduce interference during learning.
Further, referring to fig. 1, a flow chart of a specific target detector based on a depth map is shown, which includes the following steps:
Step S110 of calculating the traversal step: defining the size of a real candidate frame according to the size L of the target object to be detected, and calculating the candidate-frame window traversal interval Stride_G (unit: pixels) with formula (1):
Stride_G = 0.5 * L * f_xy / D    formula (1)
wherein L is the size of the target object to be measured, f_xy is the principal distance of the depth sensor (unit: pixels), and D is the farthest distance of the target object to be detected.
Optionally, the size of the real candidate frame may be 1.25 to 1.75 times the real size of the target object to be measured, and further preferably, the size of the real candidate frame may be 1.5 times the real size of the target object to be measured.
For example, assuming that the size of the target object to be measured is 170mm, the space size of the candidate frame may be defined as 255 mm.
In step S110, the target scale information of the depth map is introduced into the step-length setting for candidate-frame extraction, so the step size of image traversal can be increased as much as possible.
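As a concrete illustration of formula (1), the traversal step can be computed as below. The principal distance (570 px) and the farthest distance (2000 mm) are assumed example values for a typical depth sensor, not figures from the patent.

```python
# Sketch of the traversal-step computation in step S110 (formula (1)).
# Stride_G = 0.5 * L * f_xy / D, i.e. half the smallest pixel width the
# object can occupy, which it has at the farthest working distance D.

def traversal_stride(obj_size_mm: float, focal_px: float, max_dist_mm: float) -> int:
    """Candidate-frame window traversal interval in whole pixels."""
    return int(0.5 * obj_size_mm * focal_px / max_dist_mm)

# Example: a 170 mm object, a 570 px principal distance, 2000 mm max range.
stride = traversal_stride(170.0, 570.0, 2000.0)
print(stride)  # 24
```

Because the stride grows with the object's real size and shrinks with the working range, a larger target or a shorter range directly yields a sparser traversal grid.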
Image traversal step S120: according to the size of the real candidate frame and the candidate-frame window traversal interval Stride_G, traversing the depth image and acquiring the pixel coordinates of the center points of all candidate frames.
Referring to fig. 2, a schematic diagram of the center point of each candidate box after traversing the depth map is shown.
Valid candidate frame screening step S130: obtaining the depth value of the center point of each candidate frame and comparing it with the farthest distance D of the target object to be detected; a candidate frame whose center-point depth is smaller than the farthest distance D is an effective candidate frame, otherwise it is an invalid candidate frame.
That is, the validity of a candidate frame is determined by formula (2):
P = 1 if 0 < d < D, P = 0 otherwise    formula (2)
where P represents whether the candidate frame is valid and d represents the actual depth value of the center point of the candidate frame; when d is larger than 0 and smaller than the farthest distance D, the candidate frame is valid, otherwise it is invalid.
Therefore, some invalid candidate frames can be filtered out through step S130.
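Steps S120 and S130 can be sketched in a few lines of plain Python. The toy depth map, the distances, and the half-stride grid offset are illustrative assumptions; the patent does not specify where the first window center is placed.

```python
# Sketch of steps S120-S130: traverse the depth map on a stride-spaced
# grid, then keep only candidate centers whose depth satisfies formula (2).

def candidate_centers(depth, stride):
    """(row, col) center coordinates of a stride-spaced grid (step S120)."""
    h, w = len(depth), len(depth[0])
    return [(r, c) for r in range(stride // 2, h, stride)
                   for c in range(stride // 2, w, stride)]

def valid_centers(depth, stride, max_dist):
    """Screen with formula (2): a center is valid iff 0 < d < D (step S130)."""
    return [(r, c) for r, c in candidate_centers(depth, stride)
            if 0 < depth[r][c] < max_dist]

# 8x8 toy depth map (mm); the top half has no valid depth readings (d = 0).
depth = [[0.0] * 8 for _ in range(4)] + [[1500.0] * 8 for _ in range(4)]
centers = valid_centers(depth, 4, 2000.0)
print(centers)  # [(6, 2), (6, 6)] -- only the lower-half centers survive
```

Centers with a zero depth reading (sensor holes) or a depth beyond D are discarded before any model is run, which is where the initial candidate-frame reduction comes from.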
Actual required frame length calculation step S140 for the valid candidate frames: calculating the actual required frame length L_pixel of the remaining effective candidate frames with formula (3), using the actual depth value d of the center point of each candidate frame:
L_pixel = L * f_xy / d    formula (3)
In this step, the side length of the candidate frame is calculated according to the depth value of the central point and the real size of the target object, so that multi-scale candidate frames need not be generated at the same position, saving a large amount of computation.
Filtering step S150: according to the calculated actual required frame length L_pixel of the effective candidate frame, setting a filtering threshold and filtering out points in the effective candidate frame whose depth differs too much from the center-point depth. Points whose depth difference from the center point exceeds the filtering threshold are thereby filtered out, so the foreground and background points in the candidate frame can be removed.
In an alternative embodiment, the filtering threshold is set at 1/3 to 2/3 of the actual required frame length L_pixel.
Further optionally, the filtering threshold is half of the actual required frame length L_pixel.
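Steps S140 and S150 can be sketched as follows. The focal length and depth values are illustrative, and applying the "half of L_pixel" threshold directly to depth differences is an assumption on our part: the patent defines the threshold as a fraction of L_pixel but does not spell out the unit conversion between pixel lengths and depth units.

```python
# Sketch of steps S140-S150: per-frame side length via formula (3), then
# foreground/background filtering against the center-point depth.
# All numeric values are illustrative, not taken from the patent.

def frame_length_px(obj_size_mm: float, focal_px: float, center_depth_mm: float) -> float:
    """L_pixel = L * f_xy / d (formula (3))."""
    return obj_size_mm * focal_px / center_depth_mm

def filter_patch(patch, center_depth, threshold):
    """Zero out points whose depth differs from the center-point depth by
    more than the threshold (step S150)."""
    return [[v if abs(v - center_depth) <= threshold else 0.0 for v in row]
            for row in patch]

lp = frame_length_px(170.0, 570.0, 1500.0)    # 170 mm object seen at 1.5 m -> 64.6 px
threshold = 0.5 * lp                          # preferred "half of L_pixel" threshold
patch = [[1500.0, 1520.0], [1470.0, 1200.0]]  # toy 2x2 depth patch, center depth 1500 mm
cleaned = filter_patch(patch, 1500.0, threshold)
```

Note how the side length adapts per frame: the same 170 mm object measured at 3 m would get half the pixel side length, so no multi-scale frames are needed at one position.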
Therefore, the extraction of target detection candidate frames using the real size on the depth map, and the processing of part of the foreground and background points, are completed through steps S110 to S150, so that the target can be found quickly in the subsequent deep learning and related interference is reduced.
Deep learning preprocessing step S160: the depth image is sampled to a fixed resolution and then the pixel values are normalized.
In an alternative embodiment, a bilinear interpolation or nearest-neighbor interpolation algorithm may be used to sample to a fixed resolution, e.g., 64 x 64 or 128 x 128, and the pixel values are then normalized to 0-1 based on the cube side length.
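A minimal sketch of the preprocessing in step S160, using the nearest-neighbor variant. The interpretation of the normalization divisor (the "cube side length" in the text) as the candidate frame's depth extent is our assumption; the patch values are illustrative.

```python
# Sketch of step S160: nearest-neighbor resampling of a depth patch to a
# fixed resolution, then normalization of the values to [0, 1].

def resize_nearest(patch, out_h, out_w):
    """Resample a 2D patch to (out_h, out_w) by nearest-neighbor lookup."""
    h, w = len(patch), len(patch[0])
    return [[patch[r * h // out_h][c * w // out_w] for c in range(out_w)]
            for r in range(out_h)]

def normalize(patch, divisor):
    """Scale values by the divisor and clamp into [0, 1]."""
    return [[min(max(v / divisor, 0.0), 1.0) for v in row] for row in patch]

patch = [[0.0, 255.0], [255.0, 0.0]]   # toy 2x2 depth patch
resized = resize_nearest(patch, 4, 4)  # 2x2 -> 4x4, cf. the 64x64 / 128x128 targets
normed = normalize(resized, 255.0)     # divisor stands in for the cube side length
```

In practice a library resizer (e.g. an OpenCV bilinear resize) would replace `resize_nearest`; the sketch only shows that every candidate frame reaches the CNN at one fixed resolution with values in [0, 1].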
The subsequent deep learning may be to train a multi-output CNN model that takes a single-channel image as input and outputs a new center-point coordinate (regressor) and a binary classification result (classifier).
The present invention further discloses a storage medium capable of being used to store computer-executable instructions, characterized by:
the computer-executable instructions, when executed by a processor, perform the depth map-based object detection method described above.
Therefore, the invention introduces the target scale information of the depth map into the step-length setting for candidate-frame extraction, so the step length of image traversal can be increased as much as possible; meanwhile, screening on the center-point depth value filters out some invalid candidate frames; and the side length of each candidate frame is calculated from the center-point depth value and the real size of the target object, which avoids generating multi-scale candidate frames at the same position and saves a large amount of computation. Because the number of candidate frames is small and the interval between them is half the target size, the target may not lie in the central area of a candidate frame; the CNN regressor can locate the central position of the target object, and the classifier judges whether it is the target to be detected, thereby providing good convenience for rapid target detection.
While the invention has been described in further detail with reference to specific preferred embodiments thereof, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (8)

1. A specific target detection method based on a depth image comprises the following steps:
Step S110 of calculating the traversal step: defining the size of a real candidate frame according to the size L of the target object to be detected, and calculating the candidate-frame window traversal interval Stride_G (unit: pixels) with formula (1):
Stride_G = 0.5 * L * f_xy / D    formula (1)
wherein L is the size of the target object to be measured, f_xy is the principal distance of the depth sensor (unit: pixels), and D is the farthest distance of the target object to be detected;
Image traversal step S120: according to the size of the real candidate frame and the candidate-frame window traversal interval Stride_G, traversing the depth image and acquiring the pixel coordinates of the center points of all candidate frames;
Valid candidate frame screening step S130: obtaining the depth value of the center point of each candidate frame and comparing it with the farthest distance D of the target object to be detected; a candidate frame whose center-point depth is smaller than the farthest distance D is an effective candidate frame, otherwise it is an invalid candidate frame;
Actual required frame length calculation step S140 for the valid candidate frames: calculating the actual required frame length L_pixel of the remaining effective candidate frames with formula (3), using the actual depth value d of the center point of each candidate frame:
L_pixel = L * f_xy / d    formula (3);
Filtering step S150: according to the calculated actual required frame length L_pixel of the valid candidate frame, setting a filtering threshold and filtering out points in the effective candidate frame whose depth differs too much from the center-point depth, so that points whose depth difference from the center point exceeds the filtering threshold are filtered out, and the foreground and background points in the candidate frame can be removed.
2. The specific object detection method according to claim 1, characterized in that:
in step S110, the size of the real candidate frame is 1.25 to 1.75 times the real size of the target object to be measured.
3. The specific object detection method according to claim 2, characterized in that:
in step S110, the size of the real candidate frame is 1.5 times the real size of the target object to be measured.
4. The specific object detection method according to claim 1, characterized in that:
the filtering threshold is set at the actual required frame length Lpixel1/3-2/3.
5. The specific object detection method according to claim 4, characterized in that:
the filtering threshold is the actual required frame length LpixelHalf of that.
6. The specific object detection method according to claim 1, characterized in that:
further comprising, a deep learning preprocessing step S160: the depth image is sampled to a fixed resolution and then the pixel values are normalized.
7. The specific object detection method according to claim 6, characterized in that:
the deep learning after the deep learning preprocessing step S160 is to train a multi-output CNN model, input a single-channel image, and output a new centroid coordinate and a binary result.
8. A storage medium capable of being used to store computer-executable instructions, characterized by:
the computer-executable instructions, when executed by a processor, perform the depth map based object detection method of any one of claims 1-7.
CN201811480757.XA 2018-12-05 2018-12-05 Target detection method based on depth map and storage medium Active CN109784145B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811480757.XA CN109784145B (en) 2018-12-05 2018-12-05 Target detection method based on depth map and storage medium


Publications (2)

Publication Number Publication Date
CN109784145A CN109784145A (en) 2019-05-21
CN109784145B true CN109784145B (en) 2021-03-16

Family

ID=66496635

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811480757.XA Active CN109784145B (en) 2018-12-05 2018-12-05 Target detection method based on depth map and storage medium

Country Status (1)

Country Link
CN (1) CN109784145B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111127514B (en) * 2019-12-13 2024-03-22 华南智能机器人创新研究院 Method and device for tracking target by robot
CN111738995B (en) * 2020-06-10 2023-04-14 苏宁云计算有限公司 RGBD image-based target detection method and device and computer equipment
CN112070759B (en) * 2020-09-16 2023-10-24 浙江光珀智能科技有限公司 Fork truck tray detection and positioning method and system

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR100707206B1 (en) * 2005-04-11 2007-04-13 삼성전자주식회사 Depth Image-based Representation method for 3D objects, Modeling method and apparatus using it, and Rendering method and apparatus using the same
CN105930793B (en) * 2016-04-19 2019-04-16 中山大学 A kind of human body detecting method based on the study of SAE feature visualization
CN106778835B (en) * 2016-11-29 2020-03-24 武汉大学 Remote sensing image airport target identification method fusing scene information and depth features
US10460180B2 (en) * 2017-04-20 2019-10-29 GM Global Technology Operations LLC Systems and methods for visual classification with region proposals
CN107301377B (en) * 2017-05-26 2020-08-18 浙江大学 Face and pedestrian sensing system based on depth camera
CN107742113B (en) * 2017-11-08 2019-11-19 电子科技大学 One kind being based on the posterior SAR image complex target detection method of destination number
CN108597003A (en) * 2018-04-20 2018-09-28 腾讯科技(深圳)有限公司 A kind of article cover generation method, device, processing server and storage medium
CN108682039B (en) * 2018-04-28 2022-03-25 国网山西省电力公司电力科学研究院 Binocular stereo vision measuring method

Also Published As

Publication number Publication date
CN109784145A (en) 2019-05-21


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant