CN109784145A - Object detection method and storage medium based on depth map

Info

Publication number
CN109784145A
CN109784145A (application CN201811480757.XA)
Authority
CN
China
Prior art keywords
candidate frame
depth
frame
size
pixel
Prior art date
Legal status
Granted
Application number
CN201811480757.XA
Other languages
Chinese (zh)
Other versions
CN109784145B (en)
Inventor
彭博文
王行
李骊
周晓军
盛赞
李朔
杨淼
Current Assignee
Beijing HJIMI Technology Co Ltd
Original Assignee
Beijing HJIMI Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing HJIMI Technology Co Ltd filed Critical Beijing HJIMI Technology Co Ltd
Priority to CN201811480757.XA priority Critical patent/CN109784145B/en
Publication of CN109784145A publication Critical patent/CN109784145A/en
Application granted granted Critical
Publication of CN109784145B publication Critical patent/CN109784145B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Abstract

A specific target detection method based on a depth image, and a storage medium. The method defines the size of the true candidate frame from the size of the object to be measured and computes the traversal interval of the candidate-frame window; traverses the depth image to obtain the center-point pixel coordinates of the candidate frames; obtains the depth value of the center point of each candidate frame and screens out the valid candidate frames; computes the actual required frame length of the remaining valid candidate frames; and sets a filtering threshold to filter out the points in a valid candidate frame whose depth differs greatly from the depth of the center point, after which deep-learning preprocessing and deep learning can be carried out. The present invention can increase the step length of the image traversal as much as possible, filter out some invalid candidate frames, and compute the side length of each candidate frame from the real size of the object and the depth of its center point, which avoids generating multi-scale candidate frames at the same position, saves a large amount of computation, and provides great convenience for fast target detection.

Description

Object detection method and storage medium based on depth map
Technical field
The present invention relates to the field of image detection, and in particular to an object detection method based on a depth map, which can greatly reduce the initial number of candidate frames on a depth image, thereby greatly reducing the number of model computations and improving detection efficiency.
Background technique
Image classification, detection and segmentation are the three major tasks of the computer vision field. As the most basic component, current mainstream target detection mostly takes RGB images as the image object; with the development of structured-light and TOF technology, depth images are gradually becoming a new data source.
With the rapid development of deep learning in recent years, the speed and precision of specific target detection in images have improved significantly, but are still far from real-time detection of video images. Compared with conventional target detection schemes, target detection based on deep learning has higher precision and better adaptability. Common image target detection algorithms use the following schemes:
The current mainstream target detection algorithms based on deep learning models can be divided into two major classes: (1) two-stage detection algorithms, which divide the detection problem into two stages: first generating candidate regions (region proposals) and then classifying the candidate regions (generally also refining their positions); typical representatives of this class are the region-proposal-based R-CNN family of algorithms, such as R-CNN, Fast R-CNN and Faster R-CNN; (2) one-stage detection algorithms, which do not need the region proposal stage and directly generate the class probability and position coordinates of the object; typical algorithms include YOLO and SSD.
The above methods rely entirely on randomly extracted candidate frames and have the following deficiencies:
(1) the candidate frames are highly random and do not select the target object in a targeted way;
(2) the size of the candidate frames requires multi-scale selection, which considerably increases the number of candidate frames;
(3) there may be a large amount of overlap between candidate frames, which further increases their number;
(4) the initial convolved image is too large and the amount of computation is large.
Therefore, how to reduce the number of candidate frames in depth-image recognition, and how to improve the overlap between the candidate frame and the target inside it so that the object can be detected quickly and the amount of computation in subsequent convolution operations is reduced, have become technical problems that existing image recognition technology urgently needs to solve.
Summary of the invention
The object of the present invention is to propose a specific target detection method based on a depth map which uses the real size information of the target object to greatly reduce the initial number of candidate frames on the depth image, thereby greatly reducing the number of model computations and improving detection efficiency.
To achieve this purpose, the present invention adopts the following technical scheme:
A specific target detection method based on a depth image includes the following steps:
Traversal step size computation step S110: defining the size of the true candidate frame from the size L of the object to be measured, and computing the candidate-frame window traversal interval Stride_G, in pixels, by formula (1):
Stride_G = 0.5 * L * f_xy / D    formula (1)
where L is the size of the object to be measured, f_xy is the principal distance (focal length) of the depth sensor, in pixels, and D is the maximum distance of the target object that needs to be detected;
Image traversal step S120: traversing the depth image according to the size of the true candidate frame and the candidate-frame window traversal interval Stride_G, and obtaining the center-point pixel coordinates of all candidate frames;
Effective candidate frame screening step S130: obtaining the depth value of the center point of each candidate frame and comparing it with the maximum distance D of the target object that needs to be detected; a candidate frame whose center depth is less than the maximum distance D is a valid candidate frame, otherwise it is an invalid candidate frame;
Actual required frame length computation step S140 for valid candidate frames: using the actual depth value d of the center point of each candidate frame, computing the actual required frame length L_pixel of the remaining valid candidate frames by formula (3):
L_pixel = L * f_xy / d    formula (3);
Filtration step S150: setting a filtering threshold according to the computed actual required frame length L_pixel of the valid candidate frame, and filtering out the points in the valid candidate frame whose depth differs greatly from the depth of the center point. In this way, the points in the valid candidate frame whose depth differs from the depth of the center point by more than the filtering threshold are filtered out, which removes foreground and background points in the candidate frame.
Optionally, in step S110, the size of the true candidate frame is 1.25 to 1.75 times the real size of the object to be measured.
Optionally, in step S110, the size of the true candidate frame is 1.5 times the real size of the object to be measured.
Optionally, the filtering threshold is set between 1/3 and 2/3 of the actual required frame length L_pixel.
Optionally, the filtering threshold is half of the actual required frame length L_pixel.
Optionally, the method further includes a deep learning pre-processing step S160: downsampling the depth image to a fixed resolution, and then normalizing the pixel values.
Optionally, the deep learning after the deep learning pre-processing step S160 is the training of a multi-output CNN model whose input is a single-channel image and whose outputs are new center-point coordinates and a binary classification result.
The present invention further discloses a storage medium which can be used to store computer-executable instructions, characterized in that: when executed by a processor, the computer-executable instructions perform the above target detection method based on a depth map.
The present invention introduces the target scale information of the depth map into the step-size setting of candidate-frame extraction for target detection, so that the step length of the image traversal can be increased as much as possible; at the same time, filtering on the depth of the center point can filter out some invalid candidate frames; and computing the side length of each candidate frame from the real size of the object and the depth of its center point avoids having to generate multi-scale candidate frames at the same position, which saves a large amount of computation and provides great convenience for fast target detection.
Brief description of the drawings
Fig. 1 is a flowchart of the specific target detection method based on a depth map according to a specific embodiment of the present invention;
Fig. 2 is a schematic diagram of the center points of the candidate frames after traversing the depth map in a specific embodiment of the present invention.
Specific embodiment
The present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are only used to explain the present invention and not to limit it. It should also be noted that, for convenience of description, only the parts related to the present invention are shown in the drawings, rather than the entire structure.
The object on which the specific target detection method based on a depth image of the present invention operates is a depth image. For the depth image, the target maximum window size is first computed from the size of the target object and the defined maximum distance at which the target needs to be detected, and is used as the window interval, i.e. the traversal step length; the image is then traversed at this interval to generate candidate frames, and whether a candidate frame is valid is judged from the depth of its center point; the actual required frame length of the valid candidate frames is then computed and used to filter out part of the foreground and background points, thereby reducing the number of candidate frames and reducing interference for the deep learning of the next step.
Further, referring to Fig. 1, a flowchart of the specific target detection method based on a depth map is shown, which includes the following steps:
Traversal step size computation step S110: defining the size of the true candidate frame from the size L of the object to be measured, and computing the candidate-frame window traversal interval Stride_G, in pixels, by formula (1):
Stride_G = 0.5 * L * f_xy / D    formula (1)
where L is the size of the object to be measured, f_xy is the principal distance (focal length) of the depth sensor, in pixels, and D is the maximum distance of the target object that needs to be detected.
Optionally, the size of the true candidate frame may be 1.25 to 1.75 times the real size of the object to be measured; further preferably, the size of the true candidate frame may be 1.5 times the real size of the object to be measured.
For example, if the size of the object to be measured is 170 mm, the spatial size of the candidate frame can be defined as 255 mm.
In step S110, the target scale information of the depth map is introduced into the step-size setting of candidate-frame extraction for target detection, so that the step length of the image traversal can be increased as much as possible.
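By way of illustration only, the following Python sketch computes the traversal interval of formula (1). The function name and the example values (a principal distance of 525 pixels and a maximum detection distance of 2000 mm) are assumptions for the sketch; only the 170 mm object size comes from the example above.

```python
# Illustrative sketch of step S110 (formula (1)): Stride_G = 0.5 * L * f_xy / D.
def compute_stride(L_mm: float, f_xy_px: float, D_mm: float) -> int:
    """Candidate-frame window traversal interval, in pixels."""
    return max(1, int(0.5 * L_mm * f_xy_px / D_mm))

# Assumed values: 170 mm object, 525 px principal distance, 2000 mm max distance.
stride = compute_stride(L_mm=170.0, f_xy_px=525.0, D_mm=2000.0)
print(stride)  # roughly 22 px
```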
Image traversal step S120: traversing the depth image according to the size of the true candidate frame and the candidate-frame window traversal interval Stride_G, and obtaining the center-point pixel coordinates of all candidate frames.
Referring to Fig. 2, a schematic diagram of the center points of the candidate frames after traversing the depth map is shown.
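A minimal sketch of step S120 under the same assumptions: the candidate-frame centers are generated by sliding over the depth image at the interval Stride_G. The random depth map is only a stand-in for a real sensor frame.

```python
# Illustrative sketch of step S120: center-point pixel coordinates of all
# candidate frames, obtained by traversing the depth image with stride Stride_G.
import numpy as np

def candidate_centers(depth: np.ndarray, stride: int):
    h, w = depth.shape
    return [(y, x)
            for y in range(stride // 2, h, stride)
            for x in range(stride // 2, w, stride)]

depth = np.random.randint(500, 4000, size=(480, 640)).astype(np.float32)  # fake depth map, mm
centers = candidate_centers(depth, stride=22)
```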
Effective candidate frame screening step S130: obtaining the depth value of the center point of each candidate frame and comparing it with the maximum distance D of the target object that needs to be detected; a candidate frame whose center depth is less than the maximum distance D is a valid candidate frame, otherwise it is an invalid candidate frame.
That is, the judgement of a valid candidate frame is carried out using formula (2):
P = 1 if 0 < d < D, otherwise P = 0    formula (2)
where P indicates whether the candidate frame is valid and d is the actual depth value of the center point of the candidate frame; the candidate frame is valid when the depth value is greater than 0 and less than D, and invalid in all other cases.
Therefore, some invalid candidate frames can be filtered out by step S130.
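Continuing the same toy variables, a sketch of the screening rule of formula (2); the function name is an illustrative assumption.

```python
# Illustrative sketch of step S130 (formula (2)): a candidate frame is valid
# when the depth d of its center point satisfies 0 < d < D.
import numpy as np

def valid_candidates(depth: np.ndarray, centers, D_mm: float):
    return [(y, x) for (y, x) in centers if 0.0 < float(depth[y, x]) < D_mm]

# valid = valid_candidates(depth, centers, D_mm=2000.0)  # continuing the S120 sketch
```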
Actual required frame length computation step S140 for valid candidate frames: using the actual depth value d of the center point of each candidate frame, computing the actual required frame length L_pixel of the remaining valid candidate frames by formula (3):
L_pixel = L * f_xy / d    formula (3)
In this step, the side length of the candidate frame is computed from the real size of the object and the depth of its center point, which avoids having to generate multi-scale candidate frames at the same position and thus saves a large amount of computation.
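A sketch of formula (3) under the same assumptions; the 170 mm object size and the 525-pixel principal distance are the illustrative values used above.

```python
# Illustrative sketch of step S140 (formula (3)): required side length of a
# valid candidate frame, in pixels, from the real object size L and center depth d.
def frame_length_px(L_mm: float, f_xy_px: float, d_mm: float) -> int:
    return max(1, int(round(L_mm * f_xy_px / d_mm)))

# lengths = [frame_length_px(170.0, 525.0, float(depth[y, x])) for (y, x) in valid]
```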
Filtration step S150: setting a filtering threshold according to the computed actual required frame length L_pixel of the valid candidate frame, and filtering out the points in the valid candidate frame whose depth differs greatly from the depth of the center point. In this way, the points in the valid candidate frame whose depth differs from the depth of the center point by more than the filtering threshold are filtered out, which removes foreground and background points in the candidate frame.
In an optional embodiment, the filtering threshold can be set between 1/3 and 2/3 of the actual required frame length L_pixel.
Further optionally, the filtering threshold is half of the actual required frame length L_pixel.
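A sketch of one possible reading of step S150, using half of L_pixel as the filtering threshold (one of the options stated above); zeroing the filtered points, rather than some other form of suppression, is an assumption of the sketch.

```python
# Illustrative sketch of step S150: within a valid candidate frame, drop the
# points whose depth differs from the center depth by more than the threshold.
import numpy as np

def filter_patch(depth: np.ndarray, center, side_px: int) -> np.ndarray:
    y, x = center
    half = side_px // 2
    patch = depth[max(0, y - half):y + half + 1,
                  max(0, x - half):x + half + 1].astype(np.float32).copy()
    center_depth = float(depth[y, x])
    threshold = 0.5 * side_px  # half of L_pixel, as one option in the text
    patch[np.abs(patch - center_depth) > threshold] = 0.0  # foreground/background points
    return patch

# patches = [filter_patch(depth, c, s) for c, s in zip(valid, lengths)]
```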
Therefore, steps S110 to S150 complete the extraction of target detection candidate frames using the real size information in the depth map and the processing of part of the foreground and background points, so that the target can be found quickly in the deep learning of the next step and related interference is reduced.
Deep learning pre-processing step S160: downsampling the depth image to a fixed resolution, and then normalizing the pixel values.
In an optional embodiment, bilinear interpolation or nearest-neighbor interpolation can be used to sample the image to a fixed resolution, such as 64*64 or 128*128, and the pixel values are then normalized to 0-1 based on the cube side length.
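A sketch of step S160 under stated assumptions: OpenCV bilinear resizing to 64*64, and a normalization that maps a depth window of one cube side length around the center depth to 0-1, which is only one plausible reading of "normalized based on the cube side length".

```python
# Illustrative sketch of step S160: downsample a filtered patch to a fixed
# resolution and normalize its depth values to the range 0-1.
import cv2
import numpy as np

def preprocess(patch: np.ndarray, center_depth_mm: float,
               out_size: int = 64, cube_mm: float = 255.0) -> np.ndarray:
    resized = cv2.resize(patch, (out_size, out_size), interpolation=cv2.INTER_LINEAR)
    # Assumed normalization: depths within one cube side length around the
    # center depth are mapped to [0, 1]; everything else is clipped.
    norm = (resized - (center_depth_mm - 0.5 * cube_mm)) / cube_mm
    return np.clip(norm, 0.0, 1.0).astype(np.float32)

# inputs = np.stack([preprocess(p, float(depth[y, x]))
#                    for p, (y, x) in zip(patches, valid)])[:, None, :, :]  # N x 1 x 64 x 64
```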
The subsequent deep learning can be the training of a multi-output CNN model whose input is a single-channel image and whose outputs are new center-point coordinates (a regressor) and a binary classification result (a classifier).
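The patent does not specify an architecture or framework; the following PyTorch sketch is only meant to show the shape of such a multi-output model, with a regression head for the refined center point and a classification head for the target / non-target decision. Layer sizes are assumptions.

```python
# Illustrative sketch of a multi-output CNN: single-channel input, one
# regression head (new center-point coordinates) and one classification head.
import torch
import torch.nn as nn

class MultiOutputCNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.center_head = nn.Linear(64, 2)  # regressor: refined (x, y) center
        self.class_head = nn.Linear(64, 2)   # classifier: target / non-target

    def forward(self, x):
        feat = self.backbone(x)
        return self.center_head(feat), self.class_head(feat)

model = MultiOutputCNN()
centers_pred, logits = model(torch.randn(8, 1, 64, 64))  # toy batch of patches
```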
The present invention further discloses a storage medium which can be used to store computer-executable instructions, characterized in that:
when executed by a processor, the computer-executable instructions perform the above target detection method based on a depth map.
Therefore, the present invention introduces the target scale information of the depth map into the step-size setting of candidate-frame extraction for target detection, so that the step length of the image traversal can be increased as much as possible; at the same time, filtering on the depth of the center point can filter out some invalid candidate frames; and computing the side length of each candidate frame from the real size of the object and the depth of its center point avoids having to generate multi-scale candidate frames at the same position, which saves a large amount of computation. Since the number of candidate frames is small and the distance between candidate frames is half of the target size, the target may not lie in the central area of a candidate frame; the regressor of the CNN can then locate the center of the object, and the classifier can judge whether the target is the one that needs to be detected, which provides great convenience for fast target detection.
The above content is a further detailed description of the present invention in conjunction with specific preferred embodiments, and it cannot be concluded that the specific embodiments of the present invention are limited to these descriptions. For those of ordinary skill in the art to which the present invention belongs, several simple deductions or substitutions can also be made without departing from the concept of the present invention, all of which shall be regarded as falling within the protection scope determined by the claims submitted for the present invention.

Claims (8)

1. A specific target detection method based on a depth image, comprising the following steps:
a traversal step size computation step S110: defining the size of the true candidate frame from the size L of the object to be measured, and computing the candidate-frame window traversal interval Stride_G, in pixels, by formula (1):
Stride_G = 0.5 * L * f_xy / D    formula (1)
wherein L is the size of the object to be measured, f_xy is the principal distance of the depth sensor, in pixels, and D is the maximum distance of the target object that needs to be detected;
an image traversal step S120: traversing the depth image according to the size of the true candidate frame and the candidate-frame window traversal interval Stride_G, and obtaining the center-point pixel coordinates of all candidate frames;
an effective candidate frame screening step S130: obtaining the depth value of the center point of each candidate frame and comparing it with the maximum distance D of the target object that needs to be detected, wherein a candidate frame whose center depth is less than the maximum distance D is a valid candidate frame and otherwise it is an invalid candidate frame;
an actual required frame length computation step S140 for valid candidate frames: using the actual depth value d of the center point of each candidate frame, computing the actual required frame length L_pixel of the remaining valid candidate frames by formula (3):
L_pixel = L * f_xy / d    formula (3);
a filtration step S150: setting a filtering threshold according to the computed actual required frame length L_pixel of the valid candidate frame, and filtering out the points in the valid candidate frame whose depth differs from the depth of the center point by more than the filtering threshold, thereby removing foreground and background points in the candidate frame.
2. The specific target detection method according to claim 1, characterized in that:
in step S110, the size of the true candidate frame is 1.25 to 1.75 times the real size of the object to be measured.
3. The specific target detection method according to claim 2, characterized in that:
in step S110, the size of the true candidate frame is 1.5 times the real size of the object to be measured.
4. The specific target detection method according to claim 1, characterized in that:
the filtering threshold is set between 1/3 and 2/3 of the actual required frame length L_pixel.
5. The specific target detection method according to claim 4, characterized in that:
the filtering threshold is half of the actual required frame length L_pixel.
6. The specific target detection method according to claim 1, characterized in that:
the method further comprises a deep learning pre-processing step S160: downsampling the depth image to a fixed resolution, and then normalizing the pixel values.
7. The specific target detection method according to claim 6, characterized in that:
the deep learning after the deep learning pre-processing step S160 is the training of a multi-output CNN model whose input is a single-channel image and whose outputs are new center-point coordinates and a binary classification result.
8. A storage medium for storing computer-executable instructions, characterized in that:
when executed by a processor, the computer-executable instructions perform the target detection method based on a depth map according to any one of claims 1-7.
CN201811480757.XA 2018-12-05 2018-12-05 Target detection method based on depth map and storage medium Active CN109784145B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811480757.XA CN109784145B (en) 2018-12-05 2018-12-05 Target detection method based on depth map and storage medium

Publications (2)

Publication Number Publication Date
CN109784145A true CN109784145A (en) 2019-05-21
CN109784145B CN109784145B (en) 2021-03-16

Family

ID=66496635

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811480757.XA Active CN109784145B (en) 2018-12-05 2018-12-05 Target detection method based on depth map and storage medium

Country Status (1)

Country Link
CN (1) CN109784145B (en)

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8111254B2 (en) * 2005-04-11 2012-02-07 Samsung Electronics Co., Ltd. Depth image-based representation method for 3D object, modeling method and apparatus, and rendering method and apparatus using the same
CN105930793A (en) * 2016-04-19 2016-09-07 中山大学 Human body detection method based on SAE characteristic visual learning
CN106778835A (en) * 2016-11-29 2017-05-31 武汉大学 The airport target by using remote sensing image recognition methods of fusion scene information and depth characteristic
US20170220876A1 (en) * 2017-04-20 2017-08-03 GM Global Technology Operations LLC Systems and methods for visual classification with region proposals
CN107301377A (en) * 2017-05-26 2017-10-27 浙江大学 A kind of face based on depth camera and pedestrian's sensory perceptual system
CN107742113A (en) * 2017-11-08 2018-02-27 电子科技大学 One kind is based on the posterior SAR image complex target detection method of destination number
CN108597003A (en) * 2018-04-20 2018-09-28 腾讯科技(深圳)有限公司 A kind of article cover generation method, device, processing server and storage medium
CN108682039A (en) * 2018-04-28 2018-10-19 国网山西省电力公司电力科学研究院 A kind of binocular stereo vision measurement method

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
RAMANPREET SINGH PAHWA et al.: "Locating 3D Object Proposals: A Depth-Based Online Approach", IEEE Transactions on Circuits and Systems for Video Technology *
王美华: "Specific Target Detection and Recognition Based on Neural Networks" (基于神经网络的特定目标检测与识别), China Master's Theses Full-text Database, Information Science and Technology *
邓寒冰 et al.: "Body Part Recognition of Moving Beef Cattle Based on DRGB" (基于DRGB的运动中肉牛形体部位识别), Transactions of the Chinese Society of Agricultural Engineering (农业工程学报) *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111127514A (en) * 2019-12-13 2020-05-08 华南智能机器人创新研究院 Target tracking method and device by robot
CN111127514B (en) * 2019-12-13 2024-03-22 华南智能机器人创新研究院 Method and device for tracking target by robot
CN111738995A (en) * 2020-06-10 2020-10-02 苏宁云计算有限公司 RGBD image-based target detection method and device and computer equipment
WO2021249351A1 (en) * 2020-06-10 2021-12-16 苏宁易购集团股份有限公司 Target detection method, apparatus and computer device based on rgbd image
CN112070759A (en) * 2020-09-16 2020-12-11 浙江光珀智能科技有限公司 Forklift pallet detection and positioning method and system
CN112070759B (en) * 2020-09-16 2023-10-24 浙江光珀智能科技有限公司 Fork truck tray detection and positioning method and system

Also Published As

Publication number Publication date
CN109784145B (en) 2021-03-16

Similar Documents

Publication Publication Date Title
CN110533084B (en) Multi-scale target detection method based on self-attention mechanism
CN103810503B (en) Depth study based method for detecting salient regions in natural image
CN108647694B (en) Context-aware and adaptive response-based related filtering target tracking method
WO2018103608A1 (en) Text detection method, device and storage medium
CN109784145A (en) Object detection method and storage medium based on depth map
CN109741318A (en) The real-time detection method of single phase multiple dimensioned specific objective based on effective receptive field
CN103714345B (en) A kind of method and system of binocular stereo vision detection finger fingertip locus
CN104463892B (en) Based on level set and the pinpoint Colony hybridization dividing methods of GVF Snake
CN113240691A (en) Medical image segmentation method based on U-shaped network
CN110245600B (en) Unmanned aerial vehicle road detection method for self-adaptive initial quick stroke width
CN105493141A (en) Unstructured road boundary detection
CN109087337B (en) Long-time target tracking method and system based on hierarchical convolution characteristics
CN110222712B (en) Multi-special-item target detection algorithm based on deep learning
CN108876818A (en) A kind of method for tracking target based on like physical property and correlation filtering
CN106600965B (en) Traffic flow morning and evening peak period automatic identifying method based on sharpness
CN107610177A (en) A kind of method and apparatus that characteristic point is determined in synchronous superposition
CN106371614A (en) Gesture recognition optimizing method and device
CN103324361B (en) The method and system of location, touch point
CN105930793A (en) Human body detection method based on SAE characteristic visual learning
CN106023184A (en) Depth significance detection method based on anisotropy center-surround difference
CN106683125A (en) RGB-D image registration method based on 2D/3D mode switching
CN110930393A (en) Chip material pipe counting method, device and system based on machine vision
CN116129327A (en) Infrared vehicle detection method based on improved YOLOv7 algorithm
Yuming et al. Traffic signal light detection and recognition based on canny operator
Chen et al. Bi-deformation-UNet: recombination of differential channels for printed surface defect detection

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant