CN111652136B - Pedestrian detection method and device based on depth image

Pedestrian detection method and device based on depth image

Info

Publication number
CN111652136B
Authority
CN
China
Prior art keywords
marker
ground
human body
image
depth
Prior art date
Legal status
Active
Application number
CN202010493600.1A
Other languages
Chinese (zh)
Other versions
CN111652136A (en)
Inventor
尹延涛
梁贵钘
李永翔
Current Assignee
Suning Cloud Computing Co Ltd
Original Assignee
Suning Cloud Computing Co Ltd
Priority date
Filing date
Publication date
Application filed by Suning Cloud Computing Co Ltd
Priority to CN202010493600.1A
Publication of CN111652136A
PCT application filed: PCT/CN2021/095972 (published as WO2021244364A1)
Application granted
Publication of CN111652136B
Status: Active

Classifications

    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06T7/136 Segmentation; Edge detection involving thresholding
    • G06T7/194 Segmentation; Edge detection involving foreground-background segmentation
    • G06T7/80 Analysis of captured images to determine intrinsic or extrinsic camera parameters, i.e. camera calibration
    • G06T2207/10028 Range image; Depth image; 3D point clouds
    • G06T2207/20221 Image fusion; Image merging
    • G06T2207/30196 Human being; Person

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Image Processing (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a pedestrian detection method and device based on a depth image, in which a depth camera shooting downward from overhead acquires pedestrian data in a scene, improving the accuracy of the pedestrian detection data. The method comprises the following steps: constructing a ground fitting formula based on the ground region framed in a first depth image, and constructing a corresponding marker fitting formula based on at least one marker region; fusing the ground mask established by the ground fitting formula and the marker masks established by the marker fitting formulas to obtain a background mask of the current scene; updating the background mask according to the pixel points in multiple consecutive second depth images and the pixel points in the background mask; comparing the pixel points of a third depth image acquired in real time with the updated background mask, and locking the foreground region containing human body pixels in the third depth image; and merging and/or segmenting the human body regions in the foreground region in a region growing mode to obtain human body detection data.

Description

Pedestrian detection method and device based on depth image
Technical Field
The invention relates to the technical field of image recognition, in particular to a pedestrian detection method and device based on a depth image.
Background
With the vigorous development of artificial intelligence, new formats such as unmanned supermarkets and unmanned stores are springing up one after another. As retail becomes intelligent, combining offline retail with artificial intelligence to provide a shopping experience as smooth as shopping online has become a new research direction. Full-coverage cameras capture the movement track of every customer entering a closed scene, and services such as product recommendation and settlement are provided in real time, truly realizing a grab-and-go shopping experience without perceptible checkout.
At present, most pedestrian detection schemes adopt oblique shooting. Its advantage is a large projected shooting area, which makes it convenient to obtain more feature information; however, occlusion causes part of that information to be lost. For example, when two people walk side by side, one person may block part of the other's body features. In a complex scene such as an unmanned store, this occlusion problem can make settlement impossible and degrade the user's shopping experience.
Disclosure of Invention
The invention aims to provide a pedestrian detection method and device based on a depth image, in which a depth camera shooting downward from overhead acquires pedestrian data in a scene, effectively solving the information loss caused by occlusion under oblique shooting with a single camera and improving the accuracy of the pedestrian detection data.
In order to achieve the above object, a first aspect of the present invention provides a pedestrian detection method based on a depth image, the depth image being obtained by a depth camera shooting downward from overhead, the method including:
constructing a ground fitting formula based on the ground region framed in the first depth image, and constructing, based on at least one marker region, marker fitting formulas in one-to-one correspondence with the marker regions;
fusing the ground mask established by the ground fitting formula and the marker masks established by the marker fitting formulas to obtain a background mask of the current scene;
updating the background of the background mask according to the pixel points in the multi-frame continuous second depth image and the pixel points in the background mask;
comparing pixel points of a third depth image obtained in real time with the updated background mask, and locking a foreground area containing human body pixels in the third depth image;
and merging and/or segmenting the human body area in the foreground area by adopting an area growing mode to obtain human body detection data.
Preferably, the method for constructing the ground fitting formula based on the framed ground region in the first depth image includes:
s11, counting a data set corresponding to a ground area, wherein the data set comprises a plurality of image points;
s12, randomly selecting n image points from the ground area to form a ground initial data set, wherein n is more than or equal to 3 and is an integer;
s13, constructing an initial ground fitting formula based on the currently selected n image points, traversing the unselected image points in the initial data set, and sequentially substituting the unselected image points into the initial ground fitting formula to calculate the ground fitting value of the corresponding image point;
s14, screening the ground fitting values smaller than the first threshold value to generate an effective ground fitting value set of the ith wheel, wherein the initial value of i is 1;
s15, when the ratio of the number of image points corresponding to the effective ground fitting value set of the ith round to the total number of the image points in the ground area is greater than a second threshold value, accumulating all the ground fitting values in the effective ground fitting value set of the ith round;
s16, when the accumulated result of all the ground fitting values in the ith round is smaller than a third threshold value, defining the initial ground fitting formula corresponding to the ith round as the ground fitting formula, when the accumulated result of all the ground fitting values corresponding to the ith round is larger than the third threshold value, enabling i = i +1, and returning to the step S12 when i does not reach the threshold value round number, otherwise, executing the step S17;
and S17, defining the initial ground fitting formula corresponding to the minimum value of the accumulation results of all the ground fitting values over all the rounds as the ground fitting formula.
Preferably, the method for constructing the corresponding marker fitting formula based on the marker region includes:
s21, counting a data set corresponding to the marker area one by one, wherein the data set comprises a plurality of image points;
s22, randomly selecting n image points from the marker region to form a marker initial data set, wherein n is more than or equal to 3 and is an integer;
s23, constructing an initial marker fitting formula based on the currently selected n image points, traversing the unselected image points in the initial data set, and sequentially substituting the unselected image points into the initial marker fitting formula to calculate the marker fitting value of the corresponding image point;
s24, screening out the fitting values of the markers smaller than the first threshold value to generate an effective marker fitting value set of the ith round, wherein the initial value of i is 1;
s25, when the ratio of the number of image points corresponding to the valid marker fitting value set of the ith round to the total number of image points in the marker region is greater than a second threshold value, accumulating all the marker fitting values in the valid marker fitting value set of the ith round;
s26, when the accumulation result of all the fitting values of the markers in the ith round is smaller than a third threshold value, defining the initial marker fitting formula corresponding to the ith round as the marker fitting formula, when the accumulation result of all the fitting values of the markers corresponding to the ith round is larger than the third threshold value, enabling i = i +1, and returning to the step S22 when i does not reach the threshold number of rounds, otherwise, executing the step S27;
and S27, defining an initial marker fitting formula corresponding to the minimum value of the accumulated result of all the marker fitting values in all the rounds as a marker fitting formula.
Further, the method for obtaining the background mask of the current scene by fusing the ground mask established by the ground fitting formula and the marker masks established by the marker fitting formulas comprises the following steps:
constructing a ground equation based on the ground fitting formula and constructing a marker equation based on the marker fitting formula;
traversing the image points in the first depth image, and substituting them into the ground equation and the marker equations respectively to obtain the ground distance and the marker distance of each image point;
screening out image points whose ground distance is smaller than a ground threshold value and filling them as a ground mask, and screening out image points whose marker distance is smaller than a marker threshold value and filling them as a marker mask;
and fusing the ground mask and all the marker masks to obtain a background mask of the current scene.
Preferably, the method for updating the background mask according to the pixel points in the multiple continuous second depth images and the pixel points in the background mask includes:
comparing the depth values of the pixel points at the corresponding positions in the mth frame of second depth image and the (m + 1) th frame of second depth image in sequence, wherein the initial value of m is 1;
identifying the pixel points whose depth values changed, updating the depth values of the pixel points at the corresponding positions in the (m+1)-th frame second depth image to the larger value in the comparison result, letting m = m + 1, and again comparing the depth values of the pixel points at corresponding positions in the m-th frame and (m+1)-th frame second depth images, until the pixel points at each position in the last frame second depth image and their corresponding depth values are obtained;
comparing the pixel points at the positions in the second depth image of the last frame and the corresponding depth values thereof with the pixel points at the positions in the background mask and the corresponding depth values thereof;
and identifying the pixel points with the changed depth values, and updating the depth values of the pixel points at the corresponding positions in the background mask to be small values in the comparison result.
Preferably, the method for comparing the pixel points of the third depth image acquired in real time with the updated background mask and locking the foreground region containing the human body pixels in the third depth image comprises the following steps:
comparing each position pixel point in the third depth image obtained in real time and the corresponding depth value thereof with each position pixel point in the updated background mask and the corresponding depth value thereof;
and identifying pixel points with reduced depth values in the third depth image, and summarizing to obtain a foreground region containing human body pixels.
Further, the method for merging and/or segmenting the human body region in the foreground region by adopting a region growing mode comprises the following steps:
identifying a human body pixel point set in a foreground region by adopting a connected domain marking algorithm according to a set growth threshold value;
identifying the number of human body pixel point sets in the foreground area, and respectively calculating the central point of each human body pixel point set when the human body pixel point sets are a plurality of human body pixel point sets;
connecting the obtained center points pairwise, calculating the length of each connecting line, and orthographically projecting each connecting line onto the ground region to obtain the included angle theta between each connecting line and the ground plane;
obtaining the human body distance corresponding to the two human body pixel point sets based on the connecting line distance and the corresponding included angle theta;
and when the human body distance is greater than the distance threshold value, the two human body pixel point sets correspond to human body regions generated by two different human bodies; otherwise, they are regarded as human body regions generated by the same human body.
Preferably, the method of obtaining human body detection data includes:
based on the set distance interval, searching local highest pixel points of one human body area or a plurality of human body areas in a down-sampling mode;
locking the head region of the one or more human body regions in a region growing mode, and calculating the human body detection data corresponding to the one or more human body regions by using the ground equation, wherein the human body detection data comprise the height of the human body and the pixel coordinates of the head.
Preferably, the marker region is a shelf region.
Compared with the prior art, the pedestrian detection method based on the depth image provided by the invention has the following beneficial effects:
the pedestrian detection method based on the depth image, provided by the invention, can be divided into an algorithm preparation stage, an algorithm initialization stage and an algorithm detection application stage in practical application, wherein the algorithm preparation stage, namely a background mask generation stage, comprises the following specific processes: firstly, a depth image of a current detection scene is obtained through a depth camera, a ground area and at least one marker area are selected from a first depth image, a ground fitting formula and a corresponding marker fitting formula are constructed, then a ground mask established by the ground fitting formula and marker masks established by the marker fitting formulas are fused, and a background mask of the current scene is obtained. The algorithm initialization phase, namely the background mask updating phase, comprises the following specific processes: and updating the background of the background mask according to the pixel parameter values in the multiple frames of continuous second depth images and the pixel parameter values in the background mask. The algorithm detection application stage can be divided into a foreground region identification stage and a human body region detection stage, and the corresponding specific processes are as follows: and comparing pixel points of the third depth image acquired in real time with the updated background mask, locking a foreground region containing human body pixels in the third depth image, and merging and/or segmenting human body regions in the foreground region by adopting a region growing mode to obtain human body detection data of one human body region or a plurality of human body regions.
Therefore, acquiring the depth image from an overhead (downward-shooting) view and establishing a background mask solves the information loss caused by occlusion under oblique shooting and broadens the scenes to which pedestrian detection applies; in addition, compared with an ordinary camera, the depth camera adds an information dimension to the image, so that data including the body height and the three-dimensional coordinates of the head can be obtained, improving the accuracy of the pedestrian detection data.
A second aspect of the present invention provides a pedestrian detection device based on a depth image, which is applied to the pedestrian detection method based on a depth image in the above technical solution, and the device includes:
the fitting formula construction unit is used for constructing a ground fitting formula based on the ground region framed in the first depth image, and constructing, based on at least one marker region, marker fitting formulas in one-to-one correspondence with the marker regions;
the mask generating unit is used for fusing the ground mask established by the ground fitting formula and the marker masks established by the marker fitting formulas to obtain a background mask of the current scene;
the mask updating unit is used for updating the background of the background mask according to the pixel points in the multi-frame continuous second depth image and the pixel points in the background mask;
the foreground region identification unit is used for comparing pixel points of the third depth image acquired in real time with the updated background mask and locking a foreground region containing human body pixels in the third depth image;
and the human body detection unit is used for merging and/or dividing the human body areas in the foreground area by adopting an area growing mode to obtain human body detection data.
Compared with the prior art, the beneficial effects of the pedestrian detection device based on the depth image provided by the invention are the same as the beneficial effects of the pedestrian detection method based on the depth image provided by the technical scheme, and the details are not repeated herein.
A third aspect of the present invention provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the above-described depth-image-based pedestrian detection method.
Compared with the prior art, the beneficial effects of the computer-readable storage medium provided by the invention are the same as those of the pedestrian detection method based on the depth image provided by the technical scheme, and the detailed description is omitted here.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the invention and do not limit the invention. In the drawings:
fig. 1 is a flowchart illustrating a pedestrian detection method based on depth images according to an embodiment of the present invention.
Detailed Description
In order to make the aforementioned objects, features and advantages of the present invention comprehensible, embodiments accompanied with figures are described in detail below. It should be apparent that the described embodiments are only some embodiments of the present invention, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Example one
Referring to fig. 1, the present embodiment provides a pedestrian detection method based on a depth image, where the depth image is obtained by a depth camera shooting downward from overhead, and the method includes:
constructing a ground fitting formula based on the ground region framed in the first depth image, and constructing, based on at least one marker region, marker fitting formulas in one-to-one correspondence with the marker regions;
fusing the ground mask established by the ground fitting formula and the marker masks established by the marker fitting formulas to obtain a background mask of the current scene;
updating the background of the background mask according to the pixel points in the multi-frame continuous second depth image and the pixel points in the background mask;
comparing pixel points of a third depth image obtained in real time with the updated background mask, and locking a foreground area containing human body pixels in the third depth image;
and merging and/or segmenting the human body area in the foreground area by adopting an area growing mode to obtain human body detection data.
The pedestrian detection method based on the depth image provided by the embodiment can be divided into an algorithm preparation stage, an algorithm initialization stage and an algorithm detection application stage in practical application, wherein the algorithm preparation stage, namely a background mask generation stage, and the specific process comprises the following steps: the method comprises the steps of firstly obtaining a depth image of a current detection scene through a depth camera, selecting a ground area and at least one marker area in a first depth image, constructing a ground fitting formula and a corresponding marker fitting formula, and then fusing a ground mask established by the ground fitting formula and marker masks established by the marker fitting formulas to obtain a background mask of the current scene. The algorithm initialization phase, namely the background mask updating phase, comprises the following specific processes: and updating the background of the background mask according to the pixel parameter values in the multiple frames of continuous second depth images and the pixel parameter values in the background mask. The algorithm detection application stage can be divided into a foreground region identification stage and a human body region detection stage, and the corresponding specific process is as follows: and comparing pixel points of the third depth image acquired in real time with the updated background mask, locking a foreground region containing human body pixels in the third depth image, and merging and/or segmenting human body regions in the foreground region by adopting a region growing mode to obtain human body detection data of one human body region or a plurality of human body regions.
It can be seen that this embodiment acquires the depth image from an overhead (downward-shooting) view and establishes a background mask, which solves the information loss caused by occlusion under oblique shooting and broadens the scenes to which pedestrian detection applies; in addition, compared with an ordinary camera, the depth camera adds an information dimension to the image, so that data including the body height and the three-dimensional coordinates of the head can be acquired, improving the accuracy of the pedestrian detection data.
It should be noted that the first depth image, the second depth images and the third depth image in the above embodiment differ only in their purpose: the first depth image is used to construct the ground fitting formula and the marker fitting formulas, the second depth images are used to update the background mask, and the third depth image is the real-time detection image from which human body detection data are acquired. For example, the 1st frame obtained by the depth camera shooting the monitored area from above serves as the first depth image, frames 2 to 100 serve as the second depth images, and after the background mask has been updated, the real-time images obtained by the depth camera serve as third depth images.
In the above embodiment, the method for constructing the ground fitting formula based on the ground region framed in the first depth image includes:
s11, counting a data set corresponding to the ground area, wherein the data set comprises a plurality of image points;
s12, randomly selecting n image points from the ground area to form a ground initial data set, wherein n is more than or equal to 3 and is an integer;
s13, constructing an initial ground fitting formula based on the currently selected n image points, traversing the unselected image points in the initial data set, and sequentially substituting the unselected image points into the initial ground fitting formula to calculate the ground fitting value of the corresponding image point;
s14, screening the ground fitting values smaller than the first threshold value to generate an effective ground fitting value set of the ith wheel, wherein the initial value of i is 1;
s15, when the ratio of the number of the image points corresponding to the effective ground fitting value set of the ith round to the total number of the image points in the ground region is larger than a second threshold value, accumulating all the ground fitting values in the effective ground fitting value set of the ith round;
s16, when the accumulated result of all the ground fitting values in the ith round is smaller than a third threshold value, defining the initial ground fitting formula corresponding to the ith round as the ground fitting formula, when the accumulated result of all the ground fitting values corresponding to the ith round is larger than the third threshold value, enabling i = i +1, and returning to the step S12 when i does not reach the threshold value round number, otherwise, executing the step S17;
and S17, defining the initial ground fitting formula corresponding to the minimum value of the accumulation results of all the ground fitting values over all the rounds as the ground fitting formula.
In the above embodiment, the method for constructing a corresponding marker fitting formula based on a marker region includes:
s21, counting a data set corresponding to the marker area one by one, wherein the data set comprises a plurality of image points;
s22, randomly selecting n image points from the marker region to form a marker initial data set, wherein n is more than or equal to 3 and is an integer;
s23, constructing an initial marker fitting formula based on the currently selected n image points, traversing the unselected image points in the initial data set, and sequentially substituting the unselected image points into the initial marker fitting formula to calculate the marker fitting value of the corresponding image point;
s24, screening the marker fitting values smaller than the first threshold value to generate an effective marker fitting value set of the ith round, wherein the initial value of i is 1;
s25, when the ratio of the number of image points corresponding to the valid marker fitting value set of the ith round to the total number of image points in the marker region is greater than a second threshold value, accumulating all the marker fitting values in the valid marker fitting value set of the ith round;
s26, when the accumulation result of all the fitting values of the markers in the ith round is smaller than a third threshold value, defining the initial marker fitting formula corresponding to the ith round as the marker fitting formula, when the accumulation result of all the fitting values of the markers corresponding to the ith round is larger than the third threshold value, enabling i = i +1, and returning to the step S22 when i does not reach the threshold number of rounds, otherwise, executing the step S27;
and S27, defining an initial marker fitting formula corresponding to the minimum value of the accumulation results of all the marker fitting values in all the rounds as a marker fitting formula.
In specific implementation, the following description takes the ground fitting formula as an example:

Firstly, the ground region is framed through the interactive selection provided by the program, and a data set containing only ground image points is screened out. Then 3 image points are randomly selected to form the ground initial data set, and an initial ground fitting formula is fitted with the plane equation a_i·x + b_i·y + c_i·z + d_i = 0, where i denotes the number of the depth camera: if the whole scene uses only 1 depth camera, i takes the value 1, i.e. a ground fitting formula is constructed only for the first depth image captured by that camera; if the scene uses w depth cameras, i traverses 1 to w, i.e. corresponding ground fitting formulas are constructed one by one for the first depth images captured by the w cameras.

After the initial ground fitting formula is constructed, the unselected image points in the initial data set (all points except the 3 selected ones) are traversed, and the visual coordinate values (x, y, z) of each point are substituted into |a_i·x + b_i·y + c_i·z + d_i| to calculate the ground fitting value error_current of the traversed image point. The ground fitting values smaller than a first threshold e are screened to form the effective ground fitting value set of the current round's initial ground fitting formula, and when the ratio of the number of image points in the current round's effective set to the total number of image points in the ground region is greater than a second threshold d, all the ground fitting values in the effective set are accumulated to obtain error_sum.

When error_sum of the current round is smaller than error_best, a third threshold value, the ground fitting formula is constructed from the values of a, b, c and d in the current round's initial ground fitting formula; when error_sum of the current round is greater than or equal to error_best, the above steps are repeated for the next round, i.e. 3 image points are reselected to form a ground initial data set, an initial ground fitting formula is constructed, and the accumulated fitting values of that round are obtained, until the initial ground fitting formula whose accumulated fitting value is the minimum over all rounds is defined as the ground fitting formula.
Through this process, the interference of outlier points is effectively avoided, and the obtained ground fitting formula fits the ground more closely. In addition, because the values of a, b, c and d in the ground fitting formula are obtained with a random sample consensus (RANSAC) algorithm, the obtained formula can serve as an optimal model of the ground region in the first depth image, effectively filtering the influence of outliers and preventing the established ground equation from deviating from the ground.
Similarly, the construction of a marker fitting formula follows the same logic as that of the ground fitting formula and is not repeated here; it should be emphasized, however, that because there is usually more than one marker region, marker fitting formulas in one-to-one correspondence with the multiple marker regions need to be constructed.
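To make the fitting procedure concrete, here is a minimal Python sketch of the RANSAC-style plane fit described in steps S11 to S17. The function and parameter names, the default thresholds, and the NumPy array layout are illustrative assumptions, not values from the patent; the same routine would serve for both the ground region and each marker region:

```python
import numpy as np

def fit_plane_ransac(points, n_rounds=100, first_thresh=0.02,
                     second_thresh=0.5, third_thresh=1.0):
    """Sketch of steps S11-S17: fit a*x + b*y + c*z + d = 0 to a framed region.

    points: (N, 3) array of camera-space (x, y, z) image points.
    The three thresholds mirror the first/second/third thresholds of the
    patent text, but their default values here are assumptions.
    """
    n_total = len(points)
    best_plane, best_error = None, float("inf")
    for _ in range(n_rounds):                        # threshold number of rounds
        idx = np.random.choice(n_total, 3, replace=False)
        p0, p1, p2 = points[idx]
        normal = np.cross(p1 - p0, p2 - p0)          # plane normal from 3 points
        norm = np.linalg.norm(normal)
        if norm < 1e-9:
            continue                                 # degenerate (collinear) sample
        a, b, c = normal / norm
        d = -np.dot([a, b, c], p0)
        rest = np.delete(points, idx, axis=0)        # unselected image points
        fit_vals = np.abs(rest @ np.array([a, b, c]) + d)  # |ax + by + cz + d|
        valid = fit_vals[fit_vals < first_thresh]    # effective fitting value set
        if len(valid) / n_total <= second_thresh:    # too few inliers this round
            continue
        error_sum = valid.sum()                      # accumulated fitting values
        if error_sum < best_error:                   # track minimum over rounds (S17)
            best_error, best_plane = error_sum, (a, b, c, d)
        if error_sum < third_thresh:                 # small enough: accept now (S16)
            break
    return best_plane
```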
In the above embodiment, the method for obtaining the background mask of the current scene by fusing the ground mask established by the ground fitting formula and the marker masks established by the marker fitting formulas includes:
constructing a ground equation based on a ground fitting formula and constructing a marker equation based on a marker fitting formula;
traversing image points in the first depth image, and respectively substituting a ground equation and a marker equation to obtain the ground distance and the marker distance of the image points;
screening out image points whose ground distance is smaller than the ground threshold value and filling them as the ground mask, and screening out image points whose marker distance is smaller than the marker threshold value and filling them as the marker mask;
and fusing the ground mask and all the marker masks to obtain a background mask of the current scene.
In specific implementation, the general point-to-plane distance equation

D = |a·x + b·y + c·z + d| / sqrt(a^2 + b^2 + c^2)

is used to calculate both the ground equation and the marker equations: when the numerator |a·x + b·y + c·z + d| is the ground fitting formula and the denominator coefficients a, b and c take their values from the ground fitting formula, the equation represents the ground equation; when the numerator is a marker fitting formula and a, b and c take their values from that marker fitting formula, the equation represents the corresponding marker equation. After the ground equation and the marker equations are constructed, all the image points in the first depth image are traversed and substituted into them respectively to obtain each image point's ground distance and marker distance; the image points whose ground distance is smaller than the ground threshold are screened out and filled as the ground mask, and the image points whose marker distance is smaller than the marker threshold are screened out and filled as a marker mask.
Exemplarily, the ground threshold and the marker threshold are both set to 10cm, that is, an area within 10cm of the ground is defined as a ground mask, an area within 10cm of the marker is defined as a marker mask, and finally the ground mask and all the marker mask areas are defined as a background mask of the current scene. Through the establishment of the scene background mask, the noise on the marker area and the ground area is effectively filtered, and the problem that the performance of the algorithm is reduced due to the noise generated when the depth camera shoots the areas is solved.
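As an illustration of this fusion step, the following sketch assumes an (H, W, 3) array of camera-space coordinates for the first depth image and plane coefficients from the fitting step; the 0.10 m defaults reflect the 10 cm thresholds above, everything else is an assumption:

```python
import numpy as np

def build_background_mask(points_xyz, ground_plane, marker_planes,
                          ground_thresh=0.10, marker_thresh=0.10):
    """Fuse the ground mask and all marker masks into one background mask.

    points_xyz: (H, W, 3) camera-space coordinates of the first depth image.
    ground_plane, marker_planes: (a, b, c, d) plane coefficient tuples.
    """
    def plane_distance(plane):
        a, b, c, d = plane
        # Point-to-plane distance: |ax + by + cz + d| / sqrt(a^2 + b^2 + c^2)
        num = np.abs(points_xyz @ np.array([a, b, c]) + d)
        return num / np.sqrt(a * a + b * b + c * c)

    background = plane_distance(ground_plane) < ground_thresh    # ground mask
    for plane in marker_planes:
        background |= plane_distance(plane) < marker_thresh      # fuse marker masks
    return background            # boolean (H, W) background mask of the scene
```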
In the above embodiment, the method for updating the background mask according to the pixel points in the multiple continuous second depth images and the pixel points in the background mask includes:
comparing the depth values of the pixel points at the corresponding positions in the mth frame of second depth image and the (m + 1) th frame of second depth image in sequence, wherein the initial value of m is 1;
identifying the pixel points whose depth values changed, updating the depth values of the pixel points at the corresponding positions in the (m+1)-th frame second depth image to the larger value in the comparison result, letting m = m + 1, and again comparing the depth values of the pixel points at corresponding positions in the m-th frame and (m+1)-th frame second depth images, until the pixel points at each position in the last frame second depth image and their corresponding depth values are obtained;
comparing the pixel points at the positions in the second depth image of the last frame and the corresponding depth values thereof with the pixel points at the positions in the background mask and the corresponding depth values thereof;
and identifying the pixel points with the changed depth values, and updating the depth values of the pixel points at the corresponding positions in the background mask to be small values in the comparison result.
In specific implementation, the intrinsic and extrinsic parameters of the depth camera are calibrated first, so that the two-dimensional image coordinates can be converted into three-dimensional coordinates and the subsequent calculations have an actual physical meaning. Each depth camera then continuously captures 100 frames of second depth images, and the background mask is updated according to the 100 frames captured by each camera. The update process is as follows: by comparing the depth values of the pixel points at each same position (row, col) across the 100 frames of second depth images, the maximum depth value at each position is selected, so that the depth value of every position pixel point (row, col) in the output 100th frame is the maximum over the 100 frames. The purpose of this is that, because the depth camera adopts a downward-shooting scheme, a passing object (e.g. a pedestrian walking through) makes the depth values of the affected pixel points smaller; taking the maximum of the depth values at each same position over the 100 frames effectively avoids the influence of objects that happen to pass through, preventing pixels of passing objects from appearing in the background mask. The pixel points at each position of the 100th frame and their corresponding depth values are then compared with the pixel points at each position of the background mask and their corresponding depth values, the pixel points whose depth values changed are identified, and the depth values of the pixel points at the corresponding positions of the background mask are updated to the smaller value in the comparison result, ensuring the accuracy of the updated background mask.
It can be understood that the pixel parameter values are represented by the coordinate parameters of the pixels in the pixel coordinate system, and the image point parameter values are represented by the coordinate parameters of the image points in the visual coordinate system.
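A minimal sketch of this two-stage update, assuming the background mask and the second depth images are given as (H, W) depth arrays (names and shapes are assumptions):

```python
import numpy as np

def update_background(background_depth, second_frames):
    """Update the background mask depth from consecutive second depth images.

    background_depth: (H, W) depth values of the background mask.
    second_frames: non-empty list of (H, W) second depth images, e.g. 100 frames.
    """
    # Pixel-wise maximum over the frames: under an overhead camera a passing
    # object makes the depth smaller, so the maximum suppresses transients.
    running_max = second_frames[0]
    for frame in second_frames[1:]:
        running_max = np.maximum(running_max, frame)
    # Pixel-wise minimum against the existing background keeps, at each
    # position, the smaller (nearer) static depth, per the description.
    return np.minimum(background_depth, running_max)
```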
In the above embodiment, the method for comparing the pixel point of the third depth image obtained in real time with the updated background mask and locking the foreground region including the human body pixel in the third depth image includes:
comparing each position pixel point in the third depth image obtained in real time and the corresponding depth value thereof with each position pixel point in the updated background mask and the corresponding depth value thereof;
and identifying pixel points with reduced depth values in the third depth image, and summarizing to obtain a foreground region containing human body pixels.
Through this frame-difference-like comparison, noise in the third depth image acquired in real time can be effectively filtered, improving the accuracy of foreground region identification.
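A one-function sketch of this comparison under the same array assumptions; the noise margin min_drop is an assumed parameter, not a value from the patent:

```python
import numpy as np

def lock_foreground(third_frame, background_depth, min_drop=0.05):
    """A pixel whose depth decreased relative to the updated background is
    closer to the overhead camera than the static scene, i.e. a candidate
    human body pixel; collecting such pixels yields the foreground region."""
    return (background_depth - third_frame) > min_drop   # boolean foreground mask
```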
In the above embodiment, the method for merging and/or segmenting the human body region in the foreground region by using the region growing method includes:
according to a set growth threshold value, a connected domain marking algorithm is adopted to identify a human body pixel point set in a foreground region;
identifying the number of human body pixel point sets in the foreground area, and respectively calculating the central point of each human body pixel point set when the human body pixel point sets are a plurality of human body pixel point sets;
connecting the obtained center points pairwise, calculating the length of each connecting line, and orthographically projecting each connecting line onto the ground region to obtain the included angle theta between each connecting line and the ground plane;
obtaining the human body distance corresponding to the two human body pixel point sets based on the connecting line distance and the corresponding included angle theta;
and when the human body distance is greater than the distance threshold value, the two human body pixel point sets correspond to human body regions generated by two different human bodies; otherwise, they are regarded as human body regions generated by the same human body.
In specific implementation, the human body pixel point sets in the foreground region are identified through a connected-component (connected domain) labeling algorithm. A growth threshold th_grow is set to limit the growth range and the stopping condition, and the growth mode is set to eight-connected. The pixel points of the foreground region are traversed from top to bottom: a pixel point not yet traversed is marked (row, col) = 0, and a traversed one is marked (row, col) = 1. From the current pixel point, the growth condition of the next pixel point is calculated; if next pixel point (row +/- 1, col +/- 1) - current pixel point (row, col) < th_grow is satisfied, the growth difference between the next pixel point and the current pixel point is smaller than the threshold th_grow and the next pixel point becomes the current pixel point; otherwise growth in that direction is cut off and resumes from another direction, until all pixel points are marked (row, col) = 1, i.e. all pixel points have been traversed and one or more human body pixel point sets are obtained. Compared with an area-filtering scheme, the growth threshold can control the growth stopping condition and prevent dense crowds from being merged. Then the center point of each human body pixel point set and the projection distance between every two center points on the ground region are calculated: for center points A and B, the straight segment between them is AB, the included angle between AB and the ground plane is θ, and the human body distance corresponding to the two pixel point sets is distance = AB × cos(θ). If the human body distance is greater than the distance threshold, the two human body pixel point sets correspond to body regions generated by two different human bodies; otherwise they are regarded as body regions generated by the same human body.
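The sketch below illustrates the eight-connected growing and the ground-projected merge rule just described; th_grow, dist_thresh and the array layout are assumptions, and the ground normal is taken from the coefficients (a, b, c) of the fitted ground plane:

```python
import numpy as np
from collections import deque

def grow_body_sets(depth, foreground, th_grow=0.05):
    """Eight-connected region growing over foreground pixels; th_grow limits
    the allowed depth step between neighbouring pixels."""
    H, W = depth.shape
    labels = np.zeros((H, W), dtype=np.int32)        # 0 = not yet traversed
    n_sets = 0
    for r in range(H):
        for c in range(W):
            if foreground[r, c] and labels[r, c] == 0:
                n_sets += 1
                labels[r, c] = n_sets
                queue = deque([(r, c)])
                while queue:
                    pr, pc = queue.popleft()
                    for dr in (-1, 0, 1):            # eight-connected neighbours
                        for dc in (-1, 0, 1):
                            nr, nc = pr + dr, pc + dc
                            if (0 <= nr < H and 0 <= nc < W
                                    and foreground[nr, nc] and labels[nr, nc] == 0
                                    and abs(depth[nr, nc] - depth[pr, pc]) < th_grow):
                                labels[nr, nc] = n_sets
                                queue.append((nr, nc))
    return labels, n_sets

def same_body(center_a, center_b, ground_normal, dist_thresh=0.4):
    """Merge rule: project segment AB onto the ground plane, so that
    distance = |AB| * cos(theta), theta being the angle between AB and the plane."""
    ab = np.asarray(center_b, float) - np.asarray(center_a, float)
    ab_len = np.linalg.norm(ab)
    if ab_len == 0.0:
        return True                                  # coincident centers
    n = np.asarray(ground_normal, float)
    sin_theta = abs(np.dot(ab, n)) / (ab_len * np.linalg.norm(n))
    cos_theta = np.sqrt(max(0.0, 1.0 - sin_theta ** 2))
    return ab_len * cos_theta <= dist_thresh         # True: same human body
```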
In the above embodiment, the method for obtaining human body detection data includes:
based on the set distance interval, searching local highest pixel points of one human body area or a plurality of human body areas in a down-sampling mode;
locking the head region of the one or more human body regions in a region growing mode, and calculating the human body detection data corresponding to the one or more human body regions by using the ground equation, wherein the human body detection data comprise the height of the human body and the pixel coordinates of the head.
In specific implementation, the local highest pixel points of the one or more human body regions are searched in a down-sampling manner based on the set distance interval, and then the head region of each human body region is obtained by region growing within a small range. This step constrains the region growing: growth towards higher points is allowed, while a threshold is added for growth towards lower points, which prevents the head region from growing over the shoulders. The average of the pixel points in the head region is then calculated to obtain the three-dimensional coordinates (x, y, z) of the head center point, and the height of the head from the ground is calculated with the point-to-plane distance formula

height = |a·x + b·y + c·z + d| / sqrt(a^2 + b^2 + c^2),

where a, b, c and d are taken from the ground equation. In summary, the human body detection data include the head region, the body height, and the two- and three-dimensional coordinates of the head center point.
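A sketch of the head search and height computation, assuming an (H, W, 3) array of camera-space coordinates whose depth channel increases away from the overhead camera; the down-sampling stride is an assumed parameter, and for brevity the sketch uses the local highest pixel directly as the head point instead of growing and averaging the full head region as the description does:

```python
import numpy as np

def head_and_height(points_xyz, body_mask, ground_plane, stride=8):
    """Find the local highest pixel of one body region by down-sampling,
    then compute its height above the fitted ground plane."""
    a, b, c, d = ground_plane
    depth = points_xyz[..., 2]                       # smaller depth = higher point
    sampled = np.where(body_mask[::stride, ::stride],
                       depth[::stride, ::stride], np.inf)
    row, col = np.unravel_index(np.argmin(sampled), sampled.shape)
    head_xyz = points_xyz[row * stride, col * stride]      # local highest pixel
    # Height = point-to-plane distance from the head point to the ground:
    height = abs(np.dot([a, b, c], head_xyz) + d) / np.sqrt(a*a + b*b + c*c)
    return head_xyz, height
```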
Considering that multiple depth cameras are used in one scene to handle several people being active at the same time, the fields of view of adjacent depth cameras are allowed to overlap slightly at installation so that the coverage of the cameras is used to the greatest extent, and cross-camera pedestrian tracking is realized with the cross-lens tracking technology of a re-identification (ReID) module. Detection runs separately on each depth camera, and the results are finally fused through mutual verification, so the system can be expanded rapidly as cameras are added with the growth of the site, giving good algorithm robustness, multi-scene reusability and extensibility to new versions.
Example two
The present embodiment provides a pedestrian detection apparatus based on a depth image, including:
the fitting formula construction unit is used for constructing a ground fitting formula based on the ground region framed in the first depth image, and constructing, based on at least one marker region, marker fitting formulas in one-to-one correspondence with the marker regions;
the mask generation unit is used for fusing the ground mask established by the ground fitting formula and the marker masks established by the marker fitting formulas to obtain a background mask of the current scene;
the mask updating unit is used for updating the background of the background mask according to the pixel points in the multi-frame continuous second depth image and the pixel points in the background mask;
the foreground region identification unit is used for comparing pixel points of the third depth image acquired in real time with the updated background mask and locking a foreground region containing human body pixels in the third depth image;
and the human body detection unit is used for merging and/or dividing the human body areas in the foreground area by adopting an area growing mode to obtain human body detection data.
Compared with the prior art, the beneficial effects of the pedestrian detection device based on the depth image provided by the embodiment of the invention are the same as those of the pedestrian detection method based on the depth image provided by the first embodiment, and are not repeated herein.
EXAMPLE III
The present embodiment provides a computer-readable storage medium, on which a computer program is stored, which, when executed by a processor, performs the steps of the above-described depth-image-based pedestrian detection method.
Compared with the prior art, the beneficial effects of the computer-readable storage medium provided by the embodiment are the same as those of the pedestrian detection method based on the depth image provided by the above technical scheme, and are not repeated herein.
It will be understood by those skilled in the art that all or part of the steps of the methods of the above embodiments may be implemented by relevant hardware instructed by a program; the program may be stored in a computer-readable storage medium and, when executed, includes the steps of the method of the embodiment. The storage medium may be: a ROM/RAM, a magnetic disk, an optical disc, a memory card, and the like.
The above description is only for the specific embodiments of the present invention, but the scope of the present invention is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present invention, and the changes or substitutions should be covered within the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the appended claims.

Claims (9)

1. A pedestrian detection method based on a depth image, wherein the depth image is obtained by a depth camera shooting downward, and the method is characterized by comprising the following steps:
constructing a ground fitting formula based on the ground region framed in the first depth image, and constructing, based on at least one marker region, marker fitting formulas in one-to-one correspondence with the marker regions;
fusing the ground mask established by the ground fitting formula and the marker masks established by the marker fitting formulas to obtain a background mask of the current scene;
updating the background of the background mask according to the pixel points in the multi-frame continuous second depth image and the pixel points in the background mask;
comparing pixel points of a third depth image obtained in real time with the updated background mask, and locking a foreground area containing human body pixels in the third depth image;
merging and/or segmenting the human body area in the foreground area by adopting an area growing mode to obtain human body detection data;
the method for obtaining the background mask of the current scene by fusing the ground mask established by the ground fitting formula and the marker masks established by the marker fitting formulas comprises the following steps:
constructing a ground equation based on a ground fitting formula and constructing a marker equation based on a marker fitting formula;
traversing image points in the first depth image, and respectively substituting the image points into a ground equation and a marker equation to obtain the ground distance and the marker distance of the image points;
screening out image points whose ground distance is smaller than the ground threshold value and filling them as the ground mask, and screening out image points whose marker distance is smaller than the marker threshold value and filling them as the marker mask;
and fusing the ground mask and all the marker masks to obtain a background mask of the current scene.
2. The method of claim 1, wherein the method for constructing the ground fitting formula based on the framed ground area in the first depth image comprises:
s11, counting a data set corresponding to the ground area, wherein the data set comprises a plurality of image points;
s12, randomly selecting n image points from the ground area to form a ground initial data set, wherein n is more than or equal to 3 and is an integer;
s13, constructing an initial ground fitting formula based on the currently selected n image points, traversing the unselected image points in the initial data set, and sequentially substituting the unselected image points into the initial ground fitting formula to calculate the ground fitting value of the corresponding image point;
s14, screening out the ground fitting values smaller than the first threshold value to generate an effective ground fitting value set of the ith wheel, wherein the initial value of i is 1;
s15, when the ratio of the number of image points corresponding to the effective ground fitting value set of the ith round to the total number of the image points in the ground area is greater than a second threshold value, accumulating all the ground fitting values in the effective ground fitting value set of the ith round;
s16, when the accumulation result of all the ground fitting values in the ith round is smaller than a third threshold value, defining the initial ground fitting formula corresponding to the ith round as the ground fitting formula, when the accumulation result of all the ground fitting values corresponding to the ith round is larger than the third threshold value, enabling i = i +1, and returning to the step S12 when i does not reach the threshold value round number, otherwise, executing the step S17;
and S17, defining the initial ground fitting formula corresponding to the minimum value of the accumulation results of all the ground fitting values over all the rounds as the ground fitting formula.
3. The method of claim 1, wherein constructing a marker fitting formula corresponding one-to-one to a marker area based on at least one marker area comprises:
S21, collecting the data set corresponding to each marker area, wherein the data set comprises a plurality of image points;
S22, randomly selecting n image points from the marker area to form an initial marker data set, wherein n is an integer and n is greater than or equal to 3;
S23, constructing an initial marker fitting formula based on the currently selected n image points, traversing the unselected image points in the initial data set, and substituting them in turn into the initial marker fitting formula to calculate the marker fitting value of each image point;
S24, retaining the marker fitting values smaller than the first threshold to generate the valid marker fitting value set of the i-th round, wherein the initial value of i is 1;
S25, when the ratio of the number of image points corresponding to the valid marker fitting value set of the i-th round to the total number of image points in the marker area is greater than the second threshold, accumulating all marker fitting values in the valid marker fitting value set of the i-th round;
S26, when the accumulated result of all marker fitting values in the i-th round is smaller than the third threshold, defining the initial marker fitting formula of the i-th round as the marker fitting formula; when the accumulated result is larger than the third threshold, setting i = i + 1 and returning to step S22 if i has not reached the threshold number of rounds, otherwise executing step S27;
and S27, defining the initial marker fitting formula corresponding to the minimum accumulated result of the marker fitting values over all rounds as the marker fitting formula.
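Steps S21–S27 are the same procedure applied independently to each framed marker region, so the sketch above can be reused as-is; `ground_points` and `marker_point_sets` below are assumed (N, 3) point arrays extracted from the framed regions:

```python
ground_plane = fit_plane_ransac(ground_points)                        # claim 2
marker_planes = [fit_plane_ransac(pts) for pts in marker_point_sets]  # claim 3
```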
4. The method of claim 1, wherein updating the background mask according to the pixel points in multiple consecutive frames of the second depth image and the pixel points in the background mask comprises:
comparing, in sequence, the depth values of the pixel points at corresponding positions in the m-th frame and the (m+1)-th frame of the second depth image, wherein the initial value of m is 1;
identifying the pixel points whose depth values have changed, updating the depth value of the pixel point at the corresponding position in the (m+1)-th frame to the larger value in the comparison result, setting m = m + 1, and repeating the comparison between the m-th frame and the (m+1)-th frame until the pixel points and their corresponding depth values in the last frame of the second depth image are obtained;
comparing the pixel points and their corresponding depth values in the last frame of the second depth image with the pixel points and their corresponding depth values in the background mask;
and identifying the pixel points whose depth values have changed, and updating the depth values of the pixel points at the corresponding positions in the background mask to the smaller value in the comparison result.
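Read as array operations, claim 4 folds the consecutive frames with an element-wise maximum (keeping the larger, i.e. farther, depth wherever a value changed, which suppresses transient closer objects) and then takes an element-wise minimum against the stored mask. A sketch, assuming depth frames are 2-D NumPy arrays in which larger values lie farther from the overhead camera:

```python
import numpy as np

def update_background_mask(background, frames):
    """Claim 4: fold consecutive second-depth-image frames into the mask.

    background: (H, W) depth array, the current background mask.
    frames: list of (H, W) depth arrays, multiple consecutive frames.
    """
    folded = frames[0].copy()
    for frame in frames[1:]:
        # where a depth value changed, keep the larger of the two readings
        folded = np.maximum(folded, frame)
    # compare the folded last frame with the mask and keep the smaller value
    return np.minimum(background, folded)
```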
5. The method of claim 4, wherein comparing pixel points of a third depth image obtained in real time with an updated background mask, and locking a foreground region containing human body pixels in the third depth image comprises:
comparing the pixel points and their corresponding depth values in the third depth image acquired in real time with the pixel points and their corresponding depth values in the updated background mask;
and identifying the pixel points in the third depth image whose depth values have decreased, and aggregating them to obtain the foreground region containing human body pixels.
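Claim 5 then marks as foreground any pixel whose real-time depth is smaller than the background's, i.e. something now sits between the background surface and the overhead camera. A minimal sketch under the same array conventions; the noise margin is an assumed parameter, not from the patent:

```python
import numpy as np

def lock_foreground(third_frame, background, min_drop=0.05):
    """Claim 5: pixels whose depth decreased versus the background mask.

    min_drop is an assumed noise margin in depth units.
    Returns a boolean (H, W) mask of candidate human body pixels.
    """
    return (background - third_frame) > min_drop
```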
6. The method according to claim 5, wherein merging and/or segmenting the human body regions in the foreground region by region growing comprises:
identifying sets of human body pixel points in the foreground region by a connected-domain marking algorithm according to a set growth threshold;
counting the sets of human body pixel points in the foreground region, and, when there are a plurality of sets, calculating the center point of each set;
connecting the obtained center points pairwise, calculating the length of each connecting line, and orthographically projecting each connecting line onto the ground area to obtain the included angle θ between each connecting line and the ground plane given by the ground equation;
obtaining the human body distance corresponding to each pair of sets of human body pixel points based on the connecting line length and the corresponding included angle θ;
and when the human body distance is greater than a distance threshold, treating the two sets as human body regions generated by two different human bodies, otherwise treating them as human body regions generated by the same human body.
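Claim 6's merge/segment test can be sketched with a connected-component pass plus a little plane geometry. Here SciPy's labelling stands in for the patent's region-growing step, blob centres are lifted to 3-D crudely from pixel coordinates and mean depth (a real implementation would back-project through the camera intrinsics), and the "human body distance" is read as the connecting line's projection onto the ground plane, i.e. length × cos θ; all of these are assumptions for illustration.

```python
import numpy as np
from scipy import ndimage

def merge_or_split(foreground, depth, ground_plane, dist_threshold=0.45):
    """Claim 6: decide for each pair of blobs whether they are two people.

    foreground: boolean (H, W) mask from claim 5.
    depth: (H, W) depth frame used to lift pixels to 3-D.
    ground_plane: (a, b, c, d) coefficients of the ground equation.
    Returns a list of (i, j, separate) tuples over blob pairs.
    """
    labels, count = ndimage.label(foreground)      # stand-in for region growing
    centers = []
    for k in range(1, count + 1):
        ys, xs = np.nonzero(labels == k)
        centers.append(np.array([xs.mean(), ys.mean(), depth[ys, xs].mean()]))
    normal = np.asarray(ground_plane[:3], dtype=float)
    normal /= np.linalg.norm(normal)
    pairs = []
    for i in range(len(centers)):
        for j in range(i + 1, len(centers)):
            v = centers[j] - centers[i]            # connecting line between centres
            length = np.linalg.norm(v)
            # theta: included angle between the connecting line and the ground plane
            sin_theta = abs(normal @ v) / max(length, 1e-9)
            theta = np.arcsin(np.clip(sin_theta, 0.0, 1.0))
            ground_dist = length * np.cos(theta)   # projection onto the ground
            pairs.append((i, j, ground_dist > dist_threshold))
    return pairs
```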
7. The method of claim 6, wherein obtaining human body detection data comprises:
searching for the local highest pixel points of the one or more human body regions by down-sampling at a set distance interval;
and locking the head region of each human body region by region growing, and calculating, by using the ground equation, the human body detection data corresponding to each human body region, wherein the human body detection data comprise the height of the human body and the pixel coordinates of the head.
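For claim 7, two pieces can be sketched: a down-sampled search for the locally highest (closest-to-camera) pixel, and the body height computed as the distance from the 3-D head point to the fitted ground plane. The sampling stride over the flattened coordinates is a simplification of the patent's set distance interval:

```python
import numpy as np

def local_highest(depth, region_mask, step=8):
    """Down-sampled search for the highest (smallest-depth) pixel of a region."""
    ys, xs = np.nonzero(region_mask)
    ys, xs = ys[::step], xs[::step]          # coarse sampling of the region
    k = np.argmin(depth[ys, xs])             # smallest depth = nearest the camera
    return int(ys[k]), int(xs[k])            # pixel coordinates of the head

def body_height(head_point, ground_plane):
    """Height as the distance from the 3-D head point to the ground plane."""
    a, b, c, d = ground_plane
    x, y, z = head_point
    return abs(a * x + b * y + c * z + d) / np.linalg.norm([a, b, c])
```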
8. The method of any one of claims 1-7, wherein the marker region is a shelf region.
9. A pedestrian detection device based on a depth image, the depth image being obtained by a depth camera through overhead shooting, the device comprising:
a fitting formula construction unit, configured to construct a ground fitting formula based on the ground area framed in the first depth image, and to construct, based on at least one marker area, marker fitting formulas in one-to-one correspondence with the marker areas;
a mask generation unit, configured to fuse the ground mask established by the ground fitting formula with the marker masks established by the marker fitting formulas to obtain the background mask of the current scene;
a mask updating unit, configured to update the background mask according to the pixel points in multiple consecutive frames of the second depth image and the pixel points in the background mask;
a foreground region identification unit, configured to compare the pixel points of a third depth image acquired in real time with the updated background mask, and to lock the foreground region containing human body pixels in the third depth image;
a human body detection unit, configured to merge and/or segment the human body regions in the foreground region by region growing to obtain human body detection data;
wherein the mask generation unit is further configured to:
construct a ground equation based on the ground fitting formula and marker equations based on the marker fitting formulas;
traverse the image points in the first depth image, substituting each image point into the ground equation and the marker equations to obtain its ground distance and marker distances;
select the image points whose ground distance is smaller than a ground threshold and fill them as the ground mask, and select the image points whose marker distance is smaller than a marker threshold and fill them as the marker masks;
and fuse the ground mask and all marker masks to obtain the background mask of the current scene.
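The mask generation steps amount to a per-pixel point-to-plane test against the ground equation and each marker equation, followed by a union of the resulting masks. A sketch reusing the plane representation above; the background mask is shown here as a boolean membership mask, whereas the patent's mask also carries depth values for the comparisons in claim 4, and both thresholds are illustrative:

```python
import numpy as np

def build_background_mask(points, shape, ground_plane, marker_planes,
                          t_ground=0.03, t_marker=0.03):
    """Fuse the ground mask and all marker masks into a background mask.

    points: (H*W, 3) array of 3-D image points, row-major over the image.
    shape: (H, W) of the first depth image.
    Returns a boolean (H, W) mask, True where a pixel belongs to the background.
    """
    def plane_distance(plane):
        a, b, c, d = plane
        n = np.array([a, b, c], dtype=float)
        return np.abs(points @ n + d) / np.linalg.norm(n)

    mask = plane_distance(ground_plane) < t_ground   # ground mask
    for plane in marker_planes:                      # one mask per marker area
        mask |= plane_distance(plane) < t_marker
    return mask.reshape(shape)
```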
CN202010493600.1A 2020-06-03 Pedestrian detection method and device based on depth image Active CN111652136B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202010493600.1A CN111652136B (en) 2020-06-03 2020-06-03 Pedestrian detection method and device based on depth image
PCT/CN2021/095972 WO2021244364A1 (en) 2020-06-03 2021-05-26 Pedestrian detection method and device based on depth images

Publications (2)

Publication Number Publication Date
CN111652136A (en) 2020-09-11
CN111652136B (en) 2022-11-22

Also Published As

Publication number Publication date
WO2021244364A1 (en) 2021-12-09
CN111652136A (en) 2020-09-11

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant