CN111144213B - Object detection method and related equipment - Google Patents

Object detection method and related equipment

Info

Publication number
CN111144213B
Authority
CN
China
Prior art keywords
column
target
point cloud
depth image
value
Prior art date
Legal status
Active
Application number
CN201911175243.8A
Other languages
Chinese (zh)
Other versions
CN111144213A
Inventor
孟令康
李骊
Current Assignee
Beijing HJIMI Technology Co Ltd
Original Assignee
Beijing HJIMI Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing HJIMI Technology Co Ltd
Priority to CN201911175243.8A
Publication of CN111144213A
Application granted
Publication of CN111144213B


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/60 Type of objects
    • G06V20/64 Three-dimensional objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques
    • G06F18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/26 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V10/267 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Probability & Statistics with Applications (AREA)
  • Human Computer Interaction (AREA)
  • Image Analysis (AREA)

Abstract

The embodiment of the application discloses an object detection method and related equipment. After a depth image to be detected is acquired, a foreground region can be segmented from the depth image according to the depth values of the pixel points in the depth image. Since the target object is more likely to be located in the foreground region of the depth image, a corresponding point cloud can be generated for the foreground region only, and whether the target object is included in the depth image is then detected according to the generated point cloud. In this method, point cloud generation and further detection are performed only for the foreground region, which is more likely to include the target object, while the background region, which is unlikely to include the target object, is discarded, that is, no point cloud is generated for it; this reduces the amount of computation and improves object detection efficiency and real-time performance.

Description

Object detection method and related equipment
Technical Field
The present application relates to the field of computer vision, and in particular, to an object detection method and related apparatus.
Background
Object detection, for example human body detection, is widely applied in modern life: object detection algorithms can provide data for many applications such as security, entertainment, and precision services in places such as shopping malls, stations, and homes. In scenes where the light is too dark, object detection algorithms based on the color map fail, while object detection algorithms based on the depth map perform well and compensate for this shortcoming of the color map.
At present, a method for detecting an object based on a depth map mainly comprises a detection algorithm of point clouds, wherein the method comprises the steps of preprocessing a depth image and converting the depth image into the point clouds, and then determining the ground point clouds according to prior information, so that non-ground point clouds are extracted, and the extracted point clouds are further detected. The point cloud is a vector formed by a plurality of space points, and each space point comprises corresponding space coordinate information, color information and the like.
In this way, point clouds need to be generated and processed even for parts of the depth image, such as the ground, that do not include the target object, so object detection efficiency is low and real-time performance is poor.
Disclosure of Invention
In order to solve the technical problems, the application provides an object detection method and related equipment, which reduces the calculated amount and improves the object detection efficiency and the detection instantaneity.
The embodiment of the application discloses the following technical scheme:
in a first aspect, an embodiment of the present application provides an object detection method, where the method includes:
acquiring a depth image to be detected;
dividing a foreground region from the depth image according to the depth value of the pixel point in the depth image;
Generating a point cloud corresponding to the foreground region;
and detecting whether the depth image comprises a target object according to the point cloud.
In a second aspect, an embodiment of the present application provides an object detection apparatus, including:
an acquisition unit for acquiring a depth image to be detected;
the segmentation unit is used for segmenting a foreground region from the depth image according to the depth value of the pixel point in the depth image;
the generating unit is used for generating a point cloud corresponding to the foreground region;
and the detection unit is used for detecting whether the depth image comprises a target object or not according to the point cloud.
In a third aspect, embodiments of the present application provide an apparatus for object detection, the apparatus including a processor and a memory, the memory being configured to store program code and to transmit the program code to the processor;
the processor is configured to perform the object detection method according to the first aspect according to instructions in the program code.
In a fourth aspect, an embodiment of the present application provides a computer readable storage medium storing program code for executing the object detection method according to the first aspect.
According to the technical scheme, after the depth image to be detected is acquired, the foreground region can be segmented from the depth image according to the depth values of the pixel points in the depth image. Since the target object is more likely to be located in the foreground region of the depth image, a corresponding point cloud can be generated for the foreground region only, and whether the target object is included in the depth image is then detected according to the generated point cloud. In this method, point cloud generation and further detection are performed only for the foreground region, which is more likely to include the target object, while the background region, which is unlikely to include the target object, is discarded, that is, no point cloud is generated for it; this reduces the amount of computation and improves object detection efficiency and real-time performance.
Drawings
In order to more clearly illustrate the embodiments of the application or the technical solutions of the prior art, the drawings which are used in the description of the embodiments or the prior art will be briefly described, it being obvious that the drawings in the description below are only some embodiments of the application, and that other drawings can be obtained according to these drawings without inventive faculty for a person skilled in the art.
FIG. 1 is a flowchart of an object detection method according to an embodiment of the present application;
FIG. 2 is a flowchart of a method for segmenting a foreground region from a depth image according to an embodiment of the present application;
FIG. 3 is a flowchart of a method for detecting whether a depth image includes a target object according to a point cloud according to an embodiment of the present application;
FIG. 4 is a flowchart of a method for determining whether a sub-point cloud corresponds to a target object according to an embodiment of the present application;
FIG. 5 is a flowchart of a method for determining whether a forward projection corresponds to a target object according to an embodiment of the present application;
FIG. 6 is a flowchart of a human body detection method according to an embodiment of the present application;
fig. 7 is a block diagram of an object detection apparatus according to an embodiment of the present application.
Detailed Description
Embodiments of the present application are described below with reference to the accompanying drawings.
At present, a method for detecting an object based on a depth map mainly comprises a detection algorithm of point clouds, and because the method needs to generate point clouds from scenes, such as the ground, and the like, which do not comprise a target object in the depth image, and calculate the point clouds, the method has low object detection efficiency and poor real-time performance.
Therefore, the embodiment of the application provides an object detection method, which reduces the calculated amount and improves the object detection efficiency and the detection instantaneity by generating the point cloud in the foreground region which is more likely to include the target object in the depth image and further detecting the point cloud.
First, an execution body of an embodiment of the present application will be described. The object detection method provided by the application can be applied to image processing equipment, such as terminal equipment and servers. The terminal device may be a user terminal, and the terminal device may be, for example, an intelligent terminal, a computer, a personal digital assistant (Personal Digital Assistant, abbreviated as PDA), a tablet computer, or the like.
The object detection method can also be applied to a server, and the server can acquire the depth image to be detected, which is sent by the terminal equipment, detect the object of the depth image, and send the detection result to the terminal equipment. The server may be a stand-alone server or a server in a cluster.
In order to facilitate understanding of the technical scheme of the present application, the following describes the object detection method provided by the embodiment of the present application with a server as an execution body and with reference to an actual application scenario.
In the embodiment of the application, the server can acquire the depth image to be detected, and the pixel points in the depth image correspond to the depth values. The depth value of the pixel point can be used for representing the distance degree between the real scene corresponding to the pixel point and the camera lens, and if the depth value of the pixel point is larger, the real scene corresponding to the pixel point is farther from the camera; and if the depth value of the pixel point is smaller, the real scene corresponding to the pixel point is closer to the camera.
The foreground region and the background region may be segmented from the depth image, where the pixel depth values in the foreground region are lower than the pixel depth values in the background region. As an illustration of the foreground region and the background region in a depth image, assume that a depth image captured by a camera includes a human body close to the camera (with lower depth values) and a wall far from the camera (with higher depth values); the image region corresponding to the human body in the depth image may be taken as the foreground region, and the image region corresponding to the wall may be taken as the background region.
In an actual scene, when detecting an object in an image, the object to be detected is more likely to be close to a camera lens, so that a clear image aiming at the object can be obtained, and the accuracy of object detection is improved. As such, the target object to be detected is more likely to be distributed in the foreground region in the depth image than in the background region in the depth image. For example, based on the above example, it is assumed that the target object to be detected is a human body, and an image area corresponding to the human body in the depth image is a foreground area.
Based on the above, the server may segment the foreground region from the depth image according to the depth value of the pixel point in the depth image, and generate a corresponding point cloud for the foreground region. Thus, according to the generated point cloud, whether the target object is included in the depth image is detected.
By executing the object detection method, point cloud generation and further detection are performed only for the foreground region of the depth image, which is more likely to include the target object, while the background region, which is unlikely to include the target object, is removed, that is, no point cloud is generated for those regions, so the amount of computation is reduced and object detection efficiency and real-time performance are improved.
Next, an object detection method provided by the embodiment of the present application will be described with a server as an execution subject.
Referring to fig. 1, a flowchart of an object detection method provided by an embodiment of the present application is shown, where the method may include:
s101: and acquiring a depth image to be detected.
It should be noted that, the embodiment of the present application is not limited to the manner in which the server obtains the depth image to be detected, and may select a suitable manner to obtain the depth image to be detected according to the actual situation. For example: the depth image may be acquired by the server after being photographed by a camera having a depth image photographing function, or the depth image may be acquired by the server after being processed into a depth image according to a color image.
S102: and dividing a foreground region from the depth image according to the depth value of the pixel point in the depth image.
In the embodiment of the application, the server can divide the foreground region from the depth image according to the depth value of the pixel point in the depth image.
For example: a depth threshold value can be preset, and a region formed by pixel points with the depth value smaller than the depth threshold value in the depth image is taken as a foreground region and is segmented.
In addition, in one possible implementation, a pixel point in the depth image may correspond to a background probability condition. Wherein, the background probability condition corresponding to each pixel point in the depth image may be different. When the depth value of the pixel point meets the background probability condition corresponding to the pixel point, the pixel point can be determined to be a background pixel point. Then, referring to fig. 2, which is a flowchart illustrating a method for segmenting a foreground region from a depth image according to an embodiment of the present application, as shown in fig. 2, the method for segmenting a foreground region from a depth image according to a depth value of a pixel point in the depth image in S102 may include:
s201: and determining whether the depth value of the target pixel point meets the corresponding background probability condition, and if not, executing S202.
The target pixel point may be any one pixel point in the depth image.
S202: and determining the target pixel point as a foreground pixel point belonging to the foreground region.
That is, if the depth value of the target pixel does not satisfy the corresponding background probability condition, the target pixel may be regarded as a foreground pixel, and the region composed of the foreground pixels may be segmented from the depth image as a foreground region.
The specific implementation procedure of S201-S202 is described below. Assume that the depth image has a length W and a width H, and that the set of pixel points is S = {(i, j) | 0 ≤ i < W, 0 ≤ j < H}, where (i, j) identifies a position in the depth image and each pixel point (i, j) ∈ S. Each pixel point is recorded as having a background probability distribution, which follows a normal distribution with mean μ_{i,j} and standard deviation σ_{i,j}.
Based on the depth value I_{i,j} of the target pixel point in the depth image, the background probability condition corresponding to the target pixel point may be, for example, |I_{i,j} − μ_{i,j}| ≤ α·σ_{i,j}. If the depth value I_{i,j} satisfies the background probability condition, the pixel point can be determined to be a background pixel point belonging to the background region; if it does not satisfy the background probability condition, the pixel point can be determined to be a foreground pixel point belonging to the foreground region. Here α can be a fixed parameter, and its value range can be 1.0 to 3.0.
In a specific implementation, a probability model may be generated based on the background probability condition; the depth image is input into the model, and a 0-1 binary image corresponding to the depth image is output, where a value of 0 at a pixel position indicates a background pixel point belonging to the background region, and a value of 1 indicates a foreground pixel point belonging to the foreground region.
After the output image is obtained, the pixel points with the value of 1 can be clustered according to their depth values and adjacency relations, so that the foreground regions are obtained and segmented from the depth image. The set of segmented foreground regions may be S_F = {s_1, s_2, …, s_n}, where each element s_k (k = 1, 2, …, n) is a foreground region, i.e., a set of coordinates of the foreground region in the coordinate system of the output image. The coordinate system of the output image may take the position of the lower left corner of the output image as its origin.
By the method, the foreground region can be accurately segmented from the depth image.
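As a minimal sketch of this per-pixel background-model segmentation (the symmetric condition |I − μ| ≤ α·σ and the use of connected-component labeling in place of the depth-and-adjacency clustering are illustrative assumptions):

```python
import numpy as np
from scipy import ndimage

def segment_foreground(depth, mu, sigma, alpha=2.0):
    """Split a depth image into foreground regions using a per-pixel
    Gaussian background model (mean mu, std sigma); a sketch of S201-S202.

    depth, mu, sigma: float arrays of shape (H, W);
    alpha: fixed parameter, typically in [1.0, 3.0].
    Returns the 0-1 mask and a list of foreground regions as coordinate sets.
    """
    # Assumed background probability condition: a pixel is background
    # if |I - mu| <= alpha * sigma.
    is_background = np.abs(depth - mu) <= alpha * sigma
    mask = (~is_background).astype(np.uint8)   # 1 = foreground, 0 = background

    # Cluster foreground pixels by adjacency (connected components stand in
    # for the depth-and-adjacency clustering described in the text).
    labels, n = ndimage.label(mask)
    regions = [np.argwhere(labels == k) for k in range(1, n + 1)]  # S_F = {s_1, ..., s_n}
    return mask, regions
```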
S103: and generating a point cloud corresponding to the foreground region.
The point cloud is a vector formed by a plurality of space points, and each space point comprises corresponding space coordinate information, color information and the like.
In an actual scene, when generating a point cloud for each scene in a depth image, the point cloud needs to be generated according to the shape (outline) of each scene, and if the quality of the depth image is poor, the quality of the generated point cloud is low.
In the embodiment of the present application, by performing the segmentation (segmenting the foreground region) in advance, since the segmented foreground region already has the segmented contour feature, even if the depth image is poor in quality, the point cloud generated according to the segmented contour of the foreground region in S103 can still have higher quality. Compared with the prior art, the method has the advantages that the method is rapid and efficient in segmentation on the point cloud.
In a specific implementation, using the optical center (c_x, c_y) and focal length (f_x, f_y) of the camera, the coordinates (i, j) of the pixel points in each foreground region are converted into spatial coordinates P_{i,j} = (x_{i,j}, y_{i,j}, z_{i,j})^T in the camera coordinate system according to the pinhole camera model, where z_{i,j} = I_{i,j}.
Thus, based on the foreground region set S_F = {s_1, s_2, …, s_n}, the set of point clouds V = {v_1, v_2, …, v_n} may be generated, where each point cloud v_k corresponds to a foreground region s_k and the point cloud elements in v_k are the spatial points P_{i,j} generated from the pixels of s_k.
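A sketch of generating the point cloud of one foreground region, assuming the standard pinhole back-projection x = (i − c_x)·z/f_x and y = (j − c_y)·z/f_y (these x and y formulas are an assumption; only z_{i,j} = I_{i,j} is stated above):

```python
import numpy as np

def region_to_point_cloud(depth, region, fx, fy, cx, cy):
    """Back-project the pixels of one foreground region s_k into a point
    cloud v_k in the camera coordinate system (sketch of S103).

    depth: (H, W) depth image I; region: (N, 2) array of (j, i) pixel
    coordinates as returned by np.argwhere; fx, fy, cx, cy: intrinsics.
    """
    j, i = region[:, 0], region[:, 1]
    z = depth[j, i].astype(np.float64)   # z_{i,j} = I_{i,j}
    x = (i - cx) * z / fx                # assumed pinhole back-projection
    y = (j - cy) * z / fy
    # Each point cloud element keeps its spatial coordinates and pixel indices.
    return np.stack([x, y, z, i, j], axis=1)

# point_clouds = [region_to_point_cloud(depth, s_k, fx, fy, cx, cy) for s_k in regions]
```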
S104: based on the point cloud, it is detected whether a target object is included in the depth image.
Thus, the server can detect whether the depth image includes the target object according to the point cloud of the foreground region.
According to the technical scheme, after the depth image to be detected is acquired, the foreground region can be segmented from the depth image according to the depth value of the pixel point in the depth image. Since the target object has a foreground region that is more likely to be in the depth image, a corresponding point cloud can thus be generated for only the foreground region. Thus, according to the generated point cloud, whether the target object is included in the depth image is detected. According to the method, the foreground region which is more likely to comprise the target object in the depth image is subjected to point cloud generation and further detection, and the background region which is unlikely to comprise the target object is removed, namely, the point cloud is not generated for the regions, so that the calculated amount is reduced, and the object detection efficiency and the detection instantaneity are improved.
In one possible implementation manner, in order to detect a moving target object in the depth image, the object detection method may further include:
s301: after the detection of the continuous T1 frame depth images is completed, for a first set including pixel points at the target positions in the continuous T1 frame depth images, if the number of foreground pixel points in the first set is not less than a first number threshold, the server may update the background probability condition corresponding to the pixel points at the target positions to the foreground probability condition.
The target position may be any position in the depth image, for example, the target position may be a position where a pixel point in the upper left corner of the depth image is located, and the target position may include one pixel point or may include a plurality of pixel points.
The first quantity threshold value can be used for measuring whether the quantity of the pixels belonging to the foreground pixels in the first set is proper, and if the quantity of the pixels belonging to the foreground pixels in the first set is not less than the first quantity threshold value, it can be determined that the quantity of the pixels belonging to the foreground pixels in the first set is more.
In the actual scene, when it is determined that the pixel points at the target position in the T1 frame depth image belong to more foreground pixel points, it can be stated that the target object, such as a human body, may appear in the actual scene corresponding to the target position of the depth image. Meanwhile, if the number of the pixel points belonging to the foreground pixel points at the target position in the T1 frame depth image is large, the target object in the real scene corresponding to the target position can be represented to be continuously at the position, namely still.
In this way, in order to detect a moving target object, the server may update the background probability condition corresponding to the pixel point on the target position to the foreground probability condition. The foreground probability condition may be determined based on depth values of pixels at a target location in the consecutive T1 frame depth images, and the foreground probability condition may be used to determine whether a pixel at the target location in a depth image subsequent to the consecutive T1 frame depth images is a foreground pixel. The depth image after the consecutive T1 frame depth image may refer to a depth image acquired after the consecutive T1 frame depth image according to the timing of the video described above.
The foreground probability condition for a pixel at the target location may determine a pixel having a smaller depth value as a foreground pixel than the background probability condition described above. In addition, since the foreground probability condition may be determined according to the depth values of the pixels at the target positions in the consecutive T1 frame depth images, if the depth values of the pixels at the target positions in the depth images subsequent to the consecutive T1 frame depth images are unchanged, that is, the target object in the real scene corresponding to the target position is not moved, the depth values of the pixels at the target positions (corresponding to the target object) cannot be determined as foreground pixels based on the foreground probability condition.
Next, the method of S301 is described based on the specific implementation procedure of S201-S202. A background counter c^B may be preset for each pixel point and initialized to 0. After each of the successive frames of depth images is detected in turn, if the pixel point at the target position is determined to be a background pixel point, c^B is incremented by 1. After the detection of the consecutive T1 frame depth images is completed, if c^B ≤ C_B (where C_B = T1 minus the first number threshold), the background probability condition of the pixel point at the target position may be updated to a foreground probability condition, i.e., the foreground probability distribution follows a normal distribution whose mean and standard deviation are determined from the depth values of the pixel point at the target position in the consecutive T1 frame depth images.
It should be noted that, the embodiment of the present application is not limited to the distribution manner of the background probability condition and the foreground probability condition, and may also use a distribution such as gaussian distribution, uniform distribution, and non-parametric distribution.
In addition, after determining whether the pixel is a foreground pixel based on the background probability condition or the foreground probability condition of each pixel in the depth image, the foreground probability condition or the background probability condition of the pixel may be updated. The updating method may include:
if the pixel point is determined to be a background pixel point, the background probability condition is updated based on the depth value I_{i,j} of the pixel point: the mean and standard deviation of the background probability distribution are updated as a weighted combination of their previous values and the new observation, where t is a weight parameter; in addition, the standard deviation is limited by an upper bound σ_m. The weight parameter t may vary over time or with other conditions.
The self-adaptive updating of the foreground probability condition and the background probability condition is carried out for the pixel points in the depth image, so that a moving target object in the depth image can be detected, and the speed of the whole calculation process is improved on the premise of ensuring stable results.
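One plausible form of this adaptive update is sketched below as an exponential moving average with weight t and a capped standard deviation; the exact update formula is an assumption, not quoted from the disclosure:

```python
import numpy as np

def update_background_model(mu, sigma, depth, is_background,
                            t=0.05, sigma_max=200.0):
    """Adaptively update the per-pixel background distribution (mean mu,
    std sigma) after classifying a frame; illustrative only.

    is_background: boolean mask of pixels classified as background this frame.
    t: weight parameter (may itself vary over time); sigma_max: upper limit sigma_m.
    """
    # Exponential moving average toward the observed depth for background pixels.
    new_mu = np.where(is_background, (1.0 - t) * mu + t * depth, mu)
    dev = np.abs(depth - new_mu)
    new_sigma = np.where(is_background, (1.0 - t) * sigma + t * dev, sigma)
    # Cap the standard deviation at its upper limit sigma_m.
    new_sigma = np.minimum(new_sigma, sigma_max)
    return new_mu, new_sigma
```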
In one possible implementation, if the pixel point at the target position corresponds to a foreground probability condition, the method may further include:
S401: after the detection of the continuous T2 frame depth images in the video is completed, aiming at a second set comprising the pixel points at the target positions in the continuous T2 frame depth images, if the number of the pixel points belonging to the background in the second set is not less than a second number threshold, updating the foreground probability condition corresponding to the pixel points at the target positions into the background probability condition.
The second number threshold may be used to measure whether the number of pixels belonging to the background in the second set is appropriate, and if the number of pixels belonging to the background in the second set is not less than the second number threshold, it may be determined that the number of pixels belonging to the background in the second set is greater.
In the actual scene, when it is determined that more pixels at the target position in the T2 frame depth image belong to background pixels, it can be explained that the target object originally existing in the actual scene corresponding to the target position of the depth image has been moved away.
In this way, to ensure that the target object is detected, the server may update the foreground probability condition corresponding to the pixel point at the target position to the background probability condition. The background probability condition may be determined based on depth values of pixels at a target location in the consecutive T2 frame depth images, and the background probability condition may be used to determine whether a pixel at the target location in a depth image subsequent to the consecutive T2 frame depth images is a foreground pixel.
By the method, the accuracy of target object detection can be improved.
In an actual scene, due to reasons of acquisition equipment, environment and the like, a depth image with poor quality is easy to obtain, for example, pixels corresponding to a distant or corner scene in the depth image are zero-depth-value pixels, namely, pixels with depth values of 0, and the zero-depth-value pixels are usually noise points. To this end, in one possible implementation, the method may further comprise:
s501: after the detection of the continuous T3 frame depth images in the video is completed, aiming at a third set comprising the pixel points at the target position in the continuous T3 frame depth images, if the number of the pixel points belonging to the zero depth value in the third set is not less than a third number threshold, the pixel points at the target position in the depth image are not included in the target pixel points.
The target position may be any position in the depth image, for example, the target position may be a position where a pixel point in the upper left corner of the depth image is located, and the target position may include one pixel point or may include a plurality of pixel points. The target position in this embodiment may be the same as or different from that in the previous embodiment.
The third number threshold may be used to measure whether the number of pixels belonging to the zero depth value in the third set is appropriate, and if the number of pixels belonging to the zero depth value in the third set is not less than the third number threshold, it may be determined that the number of pixels belonging to the zero depth value in the third set is greater.
When the server determines that the number of pixels belonging to the zero depth value in the third set is excessive, that is, the pixel points at the target position in the T3 frame depth image are in the continuous zero depth value pixel points or intermittent zero depth value pixel points, the target pixel point in S201 may not include the part of the pixel points at the target position.
The method of S501 is described below based on the specific implementation procedure of S201-S202. A zero-value counter c^Z and a distribution failure flag f may be preset for each pixel point and initialized to 0. After each of the continuous frames of depth images is detected in turn, if the pixel point at the target position is determined to be a zero-depth-value pixel point, c^Z is incremented by 1. After the detection of the consecutive T3 frame depth images is completed, if c^Z ≥ C_Z, where C_Z may be the third number threshold, the distribution failure flag f is set to 1, which indicates that the pixel point at the target position is invalid, that is, the target pixel point in S201 may not include the pixel point at the target position. If c^Z ≥ C_Z does not hold, f is set to 0, which indicates that the pixel point at the target position is not invalid, and the target pixel point in S201 may include the pixel point at the target position.
By executing the method, the invalid zero-depth value pixel point serving as the target pixel point can be prevented from being segmented into the foreground region, and the influence of noise points on object detection is reduced.
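A minimal sketch of this zero-depth bookkeeping over T3 frames (representing the counter and flag simply as arrays; the names are illustrative):

```python
import numpy as np

def update_zero_value_flags(depth_frames, c_z_threshold):
    """Over T3 consecutive depth frames, mark pixels that are zero-depth
    (noise) too often as invalid so they are excluded from target pixels.

    depth_frames: array of shape (T3, H, W); c_z_threshold: C_Z.
    Returns a boolean 'invalid' mask (the distribution failure flag).
    """
    zero_counts = np.sum(depth_frames == 0, axis=0)   # zero-value counter per pixel
    invalid = zero_counts >= c_z_threshold            # flag = 1 where the counter reaches C_Z
    return invalid
```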
In an actual scene, there may be a case where two objects or more objects are in contact in a scene close to the camera (segmented into foreground in S102), and thus one foreground region segmented in S102 may be a connected region corresponding to two or more objects. In this case, in order to improve accuracy of object detection, referring to fig. 3, which shows a flowchart of a method for detecting whether a depth image includes a target object according to a point cloud according to an embodiment of the present application, as shown in fig. 3, a method for detecting whether a depth image includes a target object according to a point cloud in S104 may include:
s601: world coordinates of the point Yun Zhongdian cloud element are determined.
In the case where the generated point cloud is a point cloud in a camera coordinate system, world coordinates of point cloud elements in the point cloud may also be determined.
In a specific implementation, based on S103, a world coordinate system may be established by taking the projection of the camera on the ground as the origin, the direction from the origin to the camera as the z-axis, and the projection of the camera view axis on the ground as the y-axis. A transformation matrix M from the camera coordinate system to the world coordinate system is obtained through calibration, so that the point cloud set V in the camera coordinate system is transformed into the point cloud set W in the world coordinate system. A point cloud element of point cloud v_i is transformed from the camera coordinate system to the world coordinate system by (x_w, y_w, z_w, i, j)^T = M (x_v, y_v, z_v, i, j)^T, where (x_v, y_v, z_v, i, j)^T are the coordinates of the point cloud element in the camera coordinate system and (x_w, y_w, z_w, i, j)^T are its coordinates in the world coordinate system.
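A sketch of applying the calibrated transform, assuming M is a 4x4 homogeneous camera-to-world matrix acting on (x, y, z, 1) with the pixel indices (i, j) carried along unchanged (the homogeneous form is an assumption of this sketch):

```python
import numpy as np

def camera_to_world(point_cloud, M):
    """Transform a point cloud from camera to world coordinates (sketch of S601).

    point_cloud: (N, 5) array with columns (x_v, y_v, z_v, i, j) from S103.
    M: 4x4 homogeneous camera-to-world matrix obtained by calibration
       (the homogeneous form is an assumption made for this sketch).
    """
    xyz = point_cloud[:, :3]
    ones = np.ones((xyz.shape[0], 1))
    xyz_w = (np.hstack([xyz, ones]) @ M.T)[:, :3]   # rotate + translate
    # Keep the originating pixel indices (i, j) attached to each element.
    return np.hstack([xyz_w, point_cloud[:, 3:5]])
```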
S602: an altitude map and a density map are generated from world coordinates.
The pixel points in the height map and the density map correspond to ground positions in a ground coordinate system, the pixel points in the height map identify maximum height values of the corresponding ground positions, and the maximum height values are maximum values in heights corresponding to point cloud elements projected to the ground positions; the pixel points in the density map identify the number of point cloud elements projected to the corresponding ground location.
In the embodiment of the application, grid processing is performed according to the world coordinates of the point cloud elements in the point cloud, and the height map and the density map are obtained.
The following describes the generation of the height map and the density map based on the specific implementation process in S601. The point cloud elements of each point cloud in the point cloud set W are scaled down by a scaling factor β and projected onto the ground plane coordinate system, and the height map H and the density map D are obtained. Each pixel point (k, l) in the height map identifies the maximum of the height values of the point cloud elements projected to the ground position corresponding to that pixel point, and each pixel point (k, l) in the density map identifies the number of point cloud elements projected to the ground position corresponding to that pixel point, where 0 ≤ k < w_M and 0 ≤ l < h_M, w_M is the number of pixels of the height map and density map in the horizontal direction, and h_M is the number of pixels in the vertical direction.
The maximum height value H_{k,l} of each pixel point in the height map and the number of point cloud elements D_{k,l} of each pixel point in the density map can be calculated as H_{k,l} = max of z_w over the point cloud elements (x_w, y_w, z_w) whose scaled ground projection falls at pixel (k, l), and D_{k,l} = the number of such point cloud elements.
Scaling down the point cloud elements of each point cloud in the point cloud set W by the scaling factor β reduces the amount of computation.
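An illustrative rasterization of the height map H and density map D (binning by the floor of the scaled ground coordinates is an assumption about how the projection is discretized):

```python
import numpy as np

def build_height_density_maps(points_w, beta, w_m, h_m):
    """Rasterize world-coordinate point cloud elements into a height map H
    and a density map D (sketch of S602).

    points_w: (N, >=3) array with columns (x_w, y_w, z_w, ...);
    beta: scaling factor; (w_m, h_m): map size in pixels.
    """
    k = np.clip((beta * points_w[:, 0]).astype(int), 0, w_m - 1)
    l = np.clip((beta * points_w[:, 1]).astype(int), 0, h_m - 1)
    z = points_w[:, 2]

    H = np.zeros((h_m, w_m))          # maximum height per ground cell
    D = np.zeros((h_m, w_m), int)     # number of projected elements per cell
    np.maximum.at(H, (l, k), z)
    np.add.at(D, (l, k), 1)
    return H, D
```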
S603: and determining the predicted ground position of the target object according to the height map and the density map.
In one possible implementation, the predicted ground location of the target object may be determined from the elevation map and the density map and by a first discriminant function. The first discriminant function may be determined according to a relationship between a target height and a target density of a point cloud corresponding to the target object projected to the ground.
Based on the specific implementation process in S602, the specific implementation manner of S603 is described below. A first discriminant function g(h, d) may be preset, where h and d are respectively the maximum height value and the number of point cloud elements of the pixel points corresponding to the same ground position in the height map and the density map, and u and v may be tuning parameters of the function.
Then, for the pixel points (k, l) corresponding to the same ground position in the height map and the density map, it can be determined whether g(H_{k,l}, D_{k,l}) = max{ g(H_{k+a,l+b}, D_{k+a,l+b}) | (a, b) ∈ B(r) } is satisfied, where B(r) may be a pixel neighborhood with radius r in the height map or density map. If so, the pixel point (k, l) may be added to the set of locations Q.
Finally, non-maximum suppression can be applied to the set Q based on the radius of the target object, so that the ground positions corresponding to the pixel points in the resulting set Q' are not too close to each other.
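A sketch of this local-maximum search and non-maximum suppression; the linear form g(h, d) = u·h + v·d used below is only a placeholder for the preset first discriminant function:

```python
import numpy as np

def predict_ground_positions(H, D, r, r_nms, u=1.0, v=1.0):
    """Find candidate ground positions as local maxima of g(H, D) and
    thin them with non-maximum suppression (sketch of S603).

    r: neighborhood radius for the local-maximum test;
    r_nms: minimum spacing between kept positions (target-object radius).
    """
    g = u * H + v * D                   # placeholder discriminant g(h, d)
    h_m, w_m = g.shape
    candidates = []
    for l in range(h_m):
        for k in range(w_m):
            window = g[max(0, l - r):l + r + 1, max(0, k - r):k + r + 1]
            if g[l, k] == window.max():
                candidates.append((k, l, g[l, k]))

    # Non-maximum suppression: keep the strongest, drop neighbors closer than r_nms.
    candidates.sort(key=lambda c: -c[2])
    kept = []
    for k, l, score in candidates:
        if all((k - k2) ** 2 + (l - l2) ** 2 >= r_nms ** 2 for k2, l2, _ in kept):
            kept.append((k, l, score))
    return [(k, l) for k, l, _ in kept]
```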
It should be noted that, when the point cloud is subdivided (i.e., the sub-point cloud is divided), the point cloud may be divided into a neighborhood form having other shapes, such as a rectangle, an ellipse, or the like, in addition to the circular region.
S604: and dividing the point cloud into sub point clouds according to the predicted ground position.
Then, the server can subdivide the point cloud according to the predicted ground positions to obtain sub-point clouds. The subdivision may be based on the shape of the target object as well as on the predicted ground positions.
Based on the implementation process in S603, the implementation manner of S604 is described below. For the ground position corresponding to each pixel point (k, l) in the set Q', the point cloud is repartitioned according to a specified radius r_b (corresponding to the size of the target object): the point cloud elements whose distance to that ground position is smaller than the radius r_b form a sub-point cloud. The resulting sub-point clouds are then saved in U, i.e., U = {u_1, u_2, …}, where each u_q is a sub-point cloud.
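A sketch of the radius-based repartitioning; the conversion from a map pixel (k, l) back to ground coordinates via the scaling factor β is an assumption:

```python
import numpy as np

def split_sub_point_clouds(points_w, ground_positions, beta, r_b):
    """Cut the world-coordinate point cloud into sub point clouds around each
    predicted ground position (sketch of S604).

    points_w: (N, >=2) array with ground coordinates (x_w, y_w) in the first
    two columns; ground_positions: list of map pixels (k, l); r_b: radius.
    """
    sub_clouds = []
    for k, l in ground_positions:
        cx, cy = k / beta, l / beta      # assumed pixel-to-ground conversion
        d2 = (points_w[:, 0] - cx) ** 2 + (points_w[:, 1] - cy) ** 2
        sub_clouds.append(points_w[d2 < r_b ** 2])
    return sub_clouds                    # U = {u_1, u_2, ...}
```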
S605: it is determined whether the sub-point cloud corresponds to a target object.
In the related art, data processing on point clouds generally performs voxelization, computes patch information, and the like to analyze the three-dimensional shape of the object; however, the detection range here is large, the distances are long, and the target objects may occlude one another, so the three-dimensional shape information determined from the point cloud may be incomplete or misleading. The method provided by the embodiment of the application obtains the possible predicted ground positions of the target object (such as a pedestrian) by rasterizing the point cloud and applying non-maximum suppression. After the predicted ground positions are obtained, the detection algorithm is not run directly; instead, the point cloud is segmented again, which ensures that each resulting sub-point cloud does not belong to a connected region containing two or more objects and excludes interference from surrounding connected regions, providing a precondition for accurately detecting the object.
In an actual scenario, because the camera (used for obtaining the depth image) is generally mounted at an angle that captures a top view, the sub-point cloud is a point cloud under a top-down view. Therefore, in one possible implementation, in order to improve the accuracy of object detection, referring to fig. 4, which shows a flowchart of a method for determining whether a sub-point cloud corresponds to a target object according to an embodiment of the present application, the method for determining whether the sub-point cloud corresponds to the target object in S605 may include:
s701: and determining forward projection of the sub-point cloud along the direction parallel to the ground according to the shooting angle of the camera.
In the embodiment of the application, the sub-point cloud can be corrected to be a head-up view angle, and forward projection of the sub-point cloud along the direction parallel to the ground is determined.
Based on the implementation process in S604, the implementation manner of S701 is described below. For each sub-point cloud u_q ∈ U, a reference abscissa is first calculated from the abscissa values x_o of the point cloud elements in u_q. Then, with a projection window of size (w_p, h_p), the point cloud elements of u_q are projected along the direction parallel to the ground onto a vertical plane to obtain the forward projection J. In order to reduce the amount of computation, a scaling parameter s_p may also be applied when obtaining the forward projection of the sub-point cloud, so that the forward projection J is computed at the scaled resolution.
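An illustrative sketch of such a forward projection, using the mean abscissa as the reference value, a window centered on it, and binary occupancy; these specific choices are assumptions for the sketch rather than the exact formulas of the disclosure:

```python
import numpy as np

def forward_projection(u_q, w_p, h_p, s_p=1.0, cell=0.01):
    """Project one sub point cloud onto a vertical plane to get a
    head-up-view image J (sketch of S701).

    u_q: (N, >=3) array with world coordinates (x_w, y_w, z_w);
    (w_p, h_p): projection window size; s_p: scaling parameter;
    cell: assumed metric size of one projection pixel.
    """
    x, z = u_q[:, 0], u_q[:, 2]
    x_ref = x.mean()                               # assumed reference abscissa
    # Map x to columns centered on x_ref, z (height) to rows, with scaling s_p.
    cols = ((x - x_ref) / cell * s_p + w_p / 2).astype(int)
    rows = (h_p - 1 - z / cell * s_p).astype(int)  # higher points near the top
    J = np.zeros((int(h_p), int(w_p)), np.uint8)
    valid = (cols >= 0) & (cols < w_p) & (rows >= 0) & (rows < h_p)
    J[rows[valid], cols[valid]] = 1                # non-zero marks projected elements
    return J
```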
s702: it is determined whether the forward projection corresponds to a target object.
By executing the method, the sub-point cloud is corrected to be the point cloud under the head-up view angle, so that the accuracy of object detection is improved.
It should be noted that the embodiment of the present application is not limited to the target object to be detected, and in one possible implementation manner, the detected target object may be a human body.
In a possible implementation manner, if a forward projection of the sub-point cloud along a direction parallel to the ground is determined, referring to fig. 5, which shows a flowchart of a method for determining whether the forward projection corresponds to the target object according to an embodiment of the present application, as shown in fig. 5, a method for determining whether the forward projection corresponds to the target object in S702 may include:
s801: and taking the maximum height value of each column of pixel points in the integrated projection as the height value of each column of pixel points.
Wherein the integrated projection may comprise a forward projection of all sub-point clouds. Each column of pixels in the integrated projection may have a corresponding height value (ordinate value), and the server may use the maximum height value of each column of pixels as the height value of the column of pixels, where the maximum height value of each column of pixels may be the maximum ordinate value of the column of pixels.
In a specific implementation, the integrated projection may be located in a projection image, where the pixel points in the area of the integrated projection in the projection image are identified as non-zero values, and the pixel points in the projection image other than the integrated projection are identified as zero values.
To facilitate subsequent calculation, the height values of the columns of pixel points of the integrated projection can be formed into a vector, denoted H_J. In addition, a detection mark vector M_J can be set for the integrated projection; its dimension w_J is the number of columns included in the integrated projection, each component identifies whether the corresponding column has been removed from the integrated projection, and its initial values are 0. When a component of the detection mark vector is 1, the corresponding column of pixel points should be removed from the integrated projection.
S802: and determining the number of the pixel points in each column in the integrated projection as the length value of the pixel points in each column.
In the embodiment of the application, the number of the pixel points in each column in the integrated projection can be determined and used as the length value of the pixel points in each column.
In a specific implementation, the number of pixel points in each column of the integrated projection in the projection image that are marked with a non-zero value can be determined as the length value of that column of pixel points; to facilitate subsequent calculation, the length values of the columns of pixel points can be formed into a vector, denoted C_J.
S803: and determining a maximum column corresponding to the maximum value obtained through the second discriminant function according to the height value and the length value of each column of pixel points.
The second discriminant function may be obtained from the relationship between the target height value and the target length value of a forward projection when the forward projection corresponds to a human body.
In a specific implementation, a second discriminant function g(h, d) may be used, where h and d are respectively the height value and the length value of a column of pixel points, and u and v may be tuning parameters of the function. The subscript of the maximum column may then be determined as i_max = argmax{ g(H_J(i), C_J(i)) | 0 ≤ i < w_J }.
S804: determining whether the largest column is a local height peak column within a first search area centered on the largest column; if yes, S805 is executed.
In the embodiment of the present application, after determining the maximum column, a first search area may be determined in the integrated projection with the maximum column as a center. For example, a first search area is determined in the integrated projection with the maximum column as the center and the search radius s as the radius, and the first search area is an area between the left side s column of the maximum column and the right side s column of the maximum column in the integrated projection.
Then, it may be determined whether the height of the maximum column is a local height peak of the first search area, i.e., whether the maximum column is a local height peak column within the first search area, and if so, S805 may be performed.
In the actual scene, if the maximum column is a local height peak column in the first search area, it is indicated that the pixel point with the maximum height of the maximum column corresponds to the vertex of the head of the human body.
In a specific implementation, the subscript of the column corresponding to the local height peak in the first search area may be determined as i_loc = argmax{ H_J(i) | i_max − s ≤ i ≤ i_max + s }. If i_max = i_loc, the maximum column is determined to be a local height peak column within the first search area.
S805: and determining the first search area as a human body, updating the integrated projection in a first mode, and executing S803 on the updated integrated projection, namely determining a maximum column corresponding to the maximum value through a second discriminant function according to the height value and the length value of each column of pixel points.
The first way may be to reject the first search area from the integrated projection.
It will be appreciated that if it is determined that the maximum column is not a local height peak column in the first search area, in one possible implementation, the method of determining in S803 whether the maximum column is a local height peak column in the first search area centered on the maximum column may further include:
if not, updating the integrated projection by the second mode, and executing S803 on the updated integrated projection, namely determining a maximum column corresponding to the maximum value by the second discriminant function according to the height value and the length value of each column of pixel points.
The second way may be to reject the second search area from the integrated projection. The second search area is an area having a smaller number of columns centered on the largest column than the first search area.
To illustrate the determination of the second search area based on the aforementioned first search area: a second search area is determined in the integrated projection with the maximum column as the center and α_m·s as the radius, i.e., the area from the α_m·s-th column to the left of the maximum column to the α_m·s-th column to the right of the maximum column, where α_m is a fixed parameter and α_m < 1.
Therefore, the projection part which is detected in the comprehensive projection can be removed, repeated calculation is avoided, the calculated amount is reduced, and the detection efficiency is improved.
In an embodiment of the present application, in order to improve accuracy of human body detection, in a possible implementation manner, the method may further include:
s901: and determining the maximum length value according to the length value of each column of pixel points in the integrated projection.
In the embodiment of the application, the maximum length value of the integrated projection can be determined.
In a specific implementation, the maximum length value of the integrated projection may be determined by the formula c_max = max{ C_J(i) | 0 ≤ i < w_J }.
Then, the method for determining whether the maximum column is a local height peak column in the first search area centered on the maximum column in S804 may include:
s902: it is determined whether the maximum column is a local height peak column and the maximum length value meets the confidence condition within a first search region centered on the maximum column.
The reliability condition is determined according to the relationship between the target maximum length value and the length value of the target maximum column when the forward projection corresponds to a human body.
In a specific implementation, the reliability condition may be that the length value of the maximum column is not less than a first parameter times the maximum length value, i.e., C_J(i_max) ≥ α_h·c_max, where α_h is the first parameter. Alternatively, the reliability condition may be that the length value of the maximum column is not less than a second parameter times the height value of the maximum column, i.e., C_J(i_max) ≥ β_h·H_J(i_max), where β_h is the second parameter.
In the embodiment of the present application, it is required to determine whether the maximum length value of the integrated projection meets the reliability condition, in addition to determining whether the maximum column is a local height peak column in the first search area centered on the maximum column in S804.
Next, the above human body detection method is described with reference to fig. 6, which shows a flowchart of a human body detection method according to an embodiment of the present application. As shown in fig. 6, first, the maximum column of the integrated projection may be determined. It may then be determined whether the height value of the maximum column is a local peak within the first search area. If so, it may be determined whether the maximum length value of the integrated projection meets the reliability condition. If the reliability condition is met, the components of the detection flag vector corresponding to the columns within the first search area (column numbers in [i_max − s, i_max + s]) are set to 1, so that the first search area is eliminated from the integrated projection, that is, the integrated projection is updated in the first manner, and the first search area is determined as a human body and stored in the human body detection result. If the height value of the maximum column is not a local peak within the first search area, or the maximum length value of the integrated projection does not meet the reliability condition, the components of the detection flag vector corresponding to the columns within the second search area (column numbers in [i_max − α_m·s, i_max + α_m·s]) are set to 1, so that the second search area is removed from the integrated projection, that is, the integrated projection is updated in the second manner. If all components of M_J are 1, the determination may be ended; otherwise, the maximum column of the integrated projection is determined again.
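A compact sketch of this column-scan loop over the integrated projection; the linear discriminant u·h + v·d again stands in for the preset second discriminant function, the first form of the reliability condition is used, and all parameter values are placeholders:

```python
import numpy as np

def detect_humans(H_J, C_J, s=10, alpha_m=0.5, alpha_h=0.5, u=1.0, v=1.0):
    """Scan the integrated projection's per-column height vector H_J and
    length vector C_J for human candidates (sketch of fig. 6).

    Returns a list of detected column intervals [i_max - s, i_max + s].
    """
    w_j = len(H_J)
    M_J = np.zeros(w_j, bool)           # detection flag vector, True = column removed
    c_max = C_J.max()                   # maximum length value of the integrated projection
    detections = []

    while not M_J.all():
        g = u * H_J + v * C_J           # placeholder second discriminant g(h, d)
        g = np.where(M_J, -np.inf, g)   # ignore removed columns
        i_max = int(np.argmax(g))

        lo, hi = max(0, i_max - s), min(w_j, i_max + s + 1)
        i_loc = lo + int(np.argmax(H_J[lo:hi]))     # local height peak column
        credible = C_J[i_max] >= alpha_h * c_max    # reliability condition

        if i_max == i_loc and credible:
            detections.append((lo, hi - 1))         # first search area = a human body
            M_J[lo:hi] = True                       # update in the first manner
        else:
            r = int(alpha_m * s)                    # smaller second search area
            M_J[max(0, i_max - r):min(w_j, i_max + r + 1)] = True
    return detections
```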
In addition, depending on the quality of the depth image, the local peak algorithm can also be replaced by other human body detectors, and the column where the detection result is located is taken as the peak i_max.
Considering that in the related art the lack of detail caused by depth map quality may prevent local texture features from describing the human body, the method applies two statistical features, the height value and the density value, so that it is robust to changes in human posture and to human bodies approaching one another, improving the accuracy of human body detection.
Based on the above object detection method, the embodiment of the present application further provides an object detection device, as shown in fig. 7, where the diagram shows a structure diagram of the object detection device provided by the embodiment of the present application, as shown in fig. 7, and the device includes:
an acquiring unit 701, configured to acquire a depth image to be detected;
a segmentation unit 702, configured to segment a foreground region from the depth image according to a depth value of a pixel point in the depth image;
a generating unit 703, configured to generate a point cloud corresponding to the foreground area;
and a detection unit 704, configured to detect whether a target object is included in the depth image according to the point cloud.
In one possible implementation manner, the dividing unit 702 is specifically configured to:
the method comprises the steps that a background probability condition corresponds to a pixel point in a depth image, and whether a depth value of the target pixel point meets the corresponding background probability condition is determined according to the target pixel point in the depth image, wherein the target pixel point is any pixel point in the depth image;
if not, determining the target pixel point as a foreground pixel point belonging to a foreground region.
In a possible implementation manner, the apparatus further includes an updating unit, configured to:
After the detection of the continuous T1 frame depth images in the video is completed, for a first set comprising pixel points at a target position in the continuous T1 frame depth images, if the number of foreground pixel points in the first set is not less than a first number threshold, updating a background probability condition corresponding to the pixel points at the target position into a foreground probability condition, wherein the foreground probability condition is used for determining whether the pixel points at the target position in the depth images after the continuous T1 frame depth images are foreground pixel points belonging to a foreground region, and the foreground probability condition is determined according to the depth value of the pixel points at the target position in the continuous T1 frame depth images.
In a possible implementation, the updating unit is further configured to:
after the detection of the continuous T2 frame depth images in the video is completed, for a second set comprising pixel points at the target position in the continuous T2 frame depth images, if the number of pixel points belonging to the background in the second set is not less than a second number threshold, updating the foreground probability condition corresponding to the pixel points at the target position to a new background probability condition, wherein the new background probability condition is used for determining whether the pixel points at the target position in the depth images after the continuous T2 frame depth images are foreground pixel points belonging to a foreground region, and the new background probability condition is determined according to the depth values of the pixel points at the target position in the continuous T2 frame depth images.
In a possible implementation, the updating unit is further configured to:
after the detection of the continuous T3 frame depth images in the video is completed, aiming at a third set comprising pixel points at the target position in the continuous T3 frame depth images, if the number of the pixel points belonging to the zero depth value in the third set is not less than a third number threshold, the pixel points at the target position in the depth image are not included in the target pixel points.
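The three update rules above all amount to counting per-pixel labels over a window of consecutive frames and switching the pixel's condition when a count reaches its threshold. The sketch below illustrates that bookkeeping; the label encoding, the single shared threshold argument, and the use of the window mean depth as the basis of the new condition are assumptions made for brevity.

    import numpy as np

    def update_pixel_conditions(labels, depths, n_threshold):
        """labels: (T, H, W) per-pixel labels over T consecutive frames
                   (1 = foreground, 0 = background, -1 = zero depth value)
           depths: (T, H, W) corresponding depth images"""
        fg_count   = (labels == 1).sum(axis=0)
        bg_count   = (labels == 0).sum(axis=0)
        zero_count = (labels == -1).sum(axis=0)
        switch_to_foreground = fg_count >= n_threshold    # background -> foreground condition
        switch_to_background = bg_count >= n_threshold    # foreground -> background condition
        exclude_from_targets = zero_count >= n_threshold  # pixel no longer a target pixel point
        window_mean_depth = depths.mean(axis=0)           # basis of the new condition (assumption)
        return switch_to_foreground, switch_to_background, exclude_from_targets, window_mean_depth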
In one possible implementation manner, the detecting unit 704 is specifically configured to:
determining world coordinates of the point cloud elements in the point cloud;
generating a height map and a density map according to the world coordinates; wherein the pixel points in the height map and the density map correspond to ground positions in a ground coordinate system, a pixel point in the height map identifies the maximum height value of the corresponding ground position, the maximum height value being the maximum among the heights corresponding to the point cloud elements projected to that ground position, and a pixel point in the density map identifies the number of point cloud elements projected to the corresponding ground position;
determining a predicted ground position of the target object according to the height map and the density map;
Dividing the point cloud into sub point clouds according to the predicted ground position;
it is determined whether the sub-point cloud corresponds to a target object.
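The height map and density map can be built by binning the point cloud onto a ground grid, and the sub point clouds by gathering the points around each predicted ground position. The sketch below assumes a y-up world coordinate system, a 5 cm grid cell and a fixed search radius; all of these are illustrative choices, not values from this application.

    import numpy as np

    def height_and_density_maps(points, cell=0.05, x_range=(-5.0, 5.0), z_range=(0.0, 10.0)):
        """points: (N, 3) world coordinates with x, z on the ground plane and
        y the (non-negative) height above the ground."""
        nx = int((x_range[1] - x_range[0]) / cell)
        nz = int((z_range[1] - z_range[0]) / cell)
        ix = np.clip(((points[:, 0] - x_range[0]) / cell).astype(int), 0, nx - 1)
        iz = np.clip(((points[:, 2] - z_range[0]) / cell).astype(int), 0, nz - 1)
        height_map = np.zeros((nz, nx))
        density_map = np.zeros((nz, nx), dtype=np.int32)
        np.maximum.at(height_map, (iz, ix), points[:, 1])   # maximum height per ground cell
        np.add.at(density_map, (iz, ix), 1)                  # number of projected points per cell
        return height_map, density_map, ix, iz

    def split_sub_point_clouds(points, ix, iz, predicted_cells, radius=10):
        """Gather the points whose ground cell lies within `radius` cells of a
        predicted ground position into one sub point cloud (a simple stand-in
        for the segmentation rule of the embodiment)."""
        subs = []
        for cz, cx in predicted_cells:                       # (row, column) cell indices
            mask = (np.abs(iz - cz) <= radius) & (np.abs(ix - cx) <= radius)
            subs.append(points[mask])
        return subs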
In a possible implementation manner, the detecting unit 704 is further specifically configured to:
determining the predicted ground position of the target object through a first discriminant function according to the height map and the density map; the first discriminant function is determined according to the relation between the target height and the target density of the point cloud corresponding to the target object projected to the ground.
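The application does not spell out the form of the first discriminant function, so the version below is purely an illustrative assumption: ground cells whose maximum height and point density are both close to what a standing person would project score highly, and cells above a threshold are returned as predicted ground positions.

    import numpy as np

    def first_discriminant(height_map, density_map, h_ref=1.6, d_ref=50, thresh=0.5):
        """Hypothetical first discriminant function; h_ref (metres), d_ref
        (points per cell) and thresh are assumed constants."""
        h_score = np.clip(height_map / h_ref, 0.0, 1.0)
        d_score = np.clip(density_map / d_ref, 0.0, 1.0)
        score = h_score * d_score
        return np.argwhere(score >= thresh)        # (row, column) predicted ground cells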
In a possible implementation manner, the detecting unit 704 is further specifically configured to determine, according to a shooting angle of the camera, a forward projection of the sub-point cloud along a direction parallel to the ground;
it is determined whether the forward projection corresponds to a target object.
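One way to realize the forward projection is to rotate the sub point cloud by the camera pitch angle so that the viewing direction becomes parallel to the ground, then bin the rotated points onto a vertical image plane. The rotation axis, the binning cell size and the use of a simple occupancy count are assumptions of this sketch.

    import numpy as np

    def forward_projection(sub_points, pitch_rad, cell=0.05):
        """sub_points: (N, 3) sub point cloud; pitch_rad: camera pitch angle."""
        c, s = np.cos(pitch_rad), np.sin(pitch_rad)
        rot = np.array([[1.0, 0.0, 0.0],
                        [0.0,   c,  -s],
                        [0.0,   s,   c]])            # rotation about the x-axis
        p = sub_points @ rot.T
        cols = (p[:, 0] / cell).astype(int)          # horizontal position -> column index
        rows = (p[:, 1] / cell).astype(int)          # height -> row index
        cols -= cols.min()
        rows -= rows.min()
        proj = np.zeros((rows.max() + 1, cols.max() + 1), dtype=np.int32)
        np.add.at(proj, (rows, cols), 1)             # non-zero pixels form the projection
        return proj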
In one possible implementation, the target object is a human body.
In a possible implementation manner, the detecting unit 704 is further specifically configured to:
if the forward projection of the sub-point cloud along the direction parallel to the ground is determined, taking the maximum height value of each column of pixel points in the comprehensive projection as the height value of each column of pixel points, wherein the comprehensive projection comprises the forward projection of the sub-point cloud;
Determining the number of pixel points in each column in the comprehensive projection, and taking the number as a length value of each column of pixel points;
determining a maximum column corresponding to the maximum value obtained through a second discriminant function according to the height value and the length value of each column of pixel points, wherein the second discriminant function is obtained by the relation between a target height value and a target length value when the forward projection corresponds to a human body;
determining whether the maximum column is a local height peak column within a first search area centered on the maximum column;
if so, determining the first search area as a human body, updating the comprehensive projection in a first mode, and executing the step of determining, according to the height value and the length value of each column of pixel points on the updated comprehensive projection, the maximum column corresponding to the maximum value obtained through the second discriminant function; the first mode is to reject the first search area from the comprehensive projection.
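Building on the projection above, the height value and length value of each column, and a simple stand-in for the second discriminant function, could be computed as follows; the weighted sum and its weights are assumptions, since the application only states that the function is built from the relation between target height values and target length values.

    import numpy as np

    def column_features(projection, cell=0.05):
        """projection: 2-D comprehensive (integrated) projection whose rows are
        height bins and whose columns are horizontal positions."""
        mask = projection > 0
        row_idx = np.arange(mask.shape[0])[:, None]
        col_height = np.where(mask, row_idx, -1).max(axis=0).clip(min=0) * cell  # height value
        col_length = mask.sum(axis=0)                 # length value (non-zero pixel count)
        return col_height, col_length

    def second_discriminant(col_height, col_length, w_h=1.0, w_l=0.02):
        """Hypothetical second discriminant function (weighted sum)."""
        return w_h * col_height + w_l * col_length

The maximum column is then the argmax of this score over the columns that have not yet been eliminated, e.g. i_max = int(np.argmax(second_discriminant(col_height, col_length))).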
In a possible implementation manner, the detecting unit 704 is further specifically configured to:
if not, updating the comprehensive projection in a second mode, and executing the step of determining a maximum column corresponding to the maximum value through a second discriminant function according to the height value and the length value of each column of pixel points on the updated comprehensive projection; the second mode is to reject a second search area from the comprehensive projection; the second search area is an area smaller than the first search area centered on the maximum column.
In a possible implementation manner, the detecting unit 704 is further specifically configured to:
determining a maximum length value according to the length value of each column of pixel points in the comprehensive projection;
then, the determining whether the maximum column is a local height peak column within a first search area centered on the maximum column includes:
determining whether the maximum column is a local height peak column within a first search area centered on the maximum column and whether the maximum length value meets a reliability condition, the reliability condition being determined according to a relationship between a target maximum length value and a length of the target maximum column when the forward projection corresponds to a human body.
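A minimal sketch of the reliability condition, under the assumption that it compares the length value of the maximum column against the maximum length value of the whole projection; the ratio is an invented constant.

    def reliability_condition(col_length, i_max, ratio=0.6):
        """True when the maximum column is long enough relative to the maximum
        length value of the comprehensive projection (assumed form)."""
        return col_length[i_max] >= ratio * col_length.max()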
According to the technical scheme, after the depth image to be detected is acquired, the foreground region can be segmented from the depth image according to the depth values of the pixel points in the depth image. Since the target object is more likely to be located in the foreground region of the depth image, a point cloud can be generated only for the foreground region, and whether the target object is included in the depth image is then detected according to the generated point cloud. In this method, only the foreground region that is more likely to include the target object is converted into a point cloud and further detected, while the background regions that are unlikely to include the target object are excluded, i.e. no point cloud is generated for them, which reduces the amount of calculation and improves the efficiency and real-time performance of object detection.
The embodiment of the application also provides equipment for detecting the object, which comprises a processor and a memory:
the memory is used for storing program codes and transmitting the program codes to the processor;
the processor is configured to execute the above-described object detection method according to instructions in the program code.
The embodiment of the application also provides a computer readable storage medium for storing program codes for executing the object detection method.
From the above description of embodiments, it will be apparent to those skilled in the art that all or part of the steps of the above described example methods may be implemented in software plus necessary general purpose hardware platforms. Based on such understanding, the technical solution of the present application may be embodied essentially or in a part contributing to the prior art in the form of a software product, which may be stored in a storage medium, such as ROM/RAM, a magnetic disk, an optical disk, etc., including several instructions for causing a computer device (which may be a personal computer, a server, or a network communication device such as a media gateway, etc.) to execute the method described in the embodiments or some parts of the embodiments of the present application.
It should be noted that, in the present description, the embodiments are described in a progressive manner, each embodiment focuses on its differences from the other embodiments, and for identical or similar parts between the embodiments reference may be made to one another. Since the device disclosed in the embodiments corresponds to the method disclosed in the embodiments, its description is relatively brief, and reference may be made to the description of the method for relevant details.
It is further noted that relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the application. Thus, the present application is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (12)

1. An object detection method, the method comprising:
acquiring a depth image to be detected;
dividing a foreground region from the depth image according to the depth value of the pixel point in the depth image;
generating a point cloud corresponding to the foreground region;
detecting whether the depth image comprises a target object or not according to the point cloud;
the detecting whether the depth image includes a target object according to the point cloud includes:
determining world coordinates of the point cloud elements in the point cloud;
generating a height map and a density map according to the world coordinates; wherein the pixel points in the height map and the density map correspond to ground positions in a ground coordinate system, a pixel point in the height map identifies the maximum height value of the corresponding ground position, the maximum height value being the maximum among the heights corresponding to the point cloud elements projected to that ground position, and a pixel point in the density map identifies the number of point cloud elements projected to the corresponding ground position;
Determining a predicted ground position of the target object according to the height map and the density map;
dividing the point cloud into sub point clouds according to the predicted ground position;
it is determined whether the sub-point cloud corresponds to a target object.
2. The method of claim 1, wherein a pixel point in the depth image corresponds to a background probability condition, and wherein the segmenting a foreground region from the depth image according to the depth value of the pixel point in the depth image comprises:
for a target pixel point in the depth image, determining whether the depth value of the target pixel point meets the corresponding background probability condition, wherein the target pixel point is any pixel point in the depth image;
if not, determining the target pixel point as a foreground pixel point belonging to a foreground region.
3. The method according to claim 2, wherein the method further comprises:
after the detection of the continuous T1 frame depth images in the video is completed, for a first set comprising pixel points at a target position in the continuous T1 frame depth images, if the number of foreground pixel points in the first set is not less than a first number threshold, updating a background probability condition corresponding to the pixel points at the target position into a foreground probability condition, wherein the foreground probability condition is used for determining whether the pixel points at the target position in the depth images after the continuous T1 frame depth images are foreground pixel points belonging to a foreground region, and the foreground probability condition is determined according to the depth value of the pixel points at the target position in the continuous T1 frame depth images.
4. A method according to claim 3, wherein if the pixel point at the target location corresponds to a foreground probability condition, the method further comprises:
after the detection of the continuous T2 frame depth images in the video is completed, for a second set including pixels at a target position in the continuous T2 frame depth images, if the number of pixels belonging to the background in the second set is not less than a second number threshold, updating a foreground probability condition corresponding to the pixels at the target position to a background probability condition, where the background probability condition is used to determine whether the pixels at the target position in the depth images after the continuous T2 frame depth images are foreground pixels belonging to a foreground region, and the background probability condition is determined according to a depth value of the pixels at the target position in the continuous T2 frame depth images.
5. The method according to claim 2, wherein the method further comprises:
after the detection of the continuous T3 frame depth images in the video is completed, aiming at a third set comprising pixel points at the target position in the continuous T3 frame depth images, if the number of the pixel points belonging to the zero depth value in the third set is not less than a third number threshold, the pixel points at the target position in the depth image are not included in the target pixel points.
6. The method of claim 1, wherein the determining a predicted ground position of the target object according to the height map and the density map comprises:
determining the predicted ground position of the target object through a first discriminant function according to the height map and the density map; the first discriminant function is determined according to the relation between the target height and the target density of the point cloud corresponding to the target object projected to the ground.
7. The method of claim 1, wherein the determining whether the sub-point cloud corresponds to a target object comprises:
determining forward projection of the sub-point cloud along the direction parallel to the ground according to the shooting angle of the camera;
it is determined whether the forward projection corresponds to a target object.
8. The method of claim 1, wherein the target object is a human body, and wherein if a forward projection of the sub-point cloud along a direction parallel to the ground is determined, determining whether the forward projection corresponds to the target object comprises:
taking the maximum height value of each column of pixel points in the comprehensive projection as the height value of each column of pixel points, wherein the comprehensive projection comprises the forward projection of the sub-point cloud, and the comprehensive projection is the region in which the pixel point identification values in the depth image are non-zero;
Determining the number of pixel points in each column in the comprehensive projection, and taking the number as a length value of each column of pixel points;
determining a maximum column corresponding to the maximum value obtained through a second discriminant function according to the height value and the length value of each column of pixel points, wherein the second discriminant function is obtained by the relation between a target height value and a target length value when the forward projection corresponds to a human body;
determining whether the maximum column is a local height peak column within a first search area centered on the maximum column;
if so, determining the first search area as a human body, updating the comprehensive projection in a first mode, and executing the step of determining, according to the height value and the length value of each column of pixel points on the updated comprehensive projection, the maximum column corresponding to the maximum value obtained through the second discriminant function; the first mode is to reject the first search area from the comprehensive projection.
9. The method of claim 8, wherein after the determining whether the maximum column is a local height peak column within a first search area centered on the maximum column, the method further comprises:
if not, updating the comprehensive projection in a second mode, and executing the step of determining a maximum column corresponding to the maximum value through a second discriminant function according to the height value and the length value of each column of pixel points on the updated comprehensive projection; the second mode is to reject a second search area from the comprehensive projection; the second search area is an area smaller than the first search area centered on the maximum column.
10. The method according to claim 8 or 9, characterized in that the method further comprises:
determining a maximum length value according to the length value of each column of pixel points in the comprehensive projection;
then, the determining whether the maximum column is a local height peak column within a first search area centered on the maximum column includes:
determining whether the maximum column is a local height peak column within a first search area centered on the maximum column and whether the maximum length value meets a reliability condition, the reliability condition being determined from a relationship between a target maximum length value and a length of the target maximum column when the forward projection corresponds to a human body.
11. An object detection apparatus, the apparatus comprising:
an acquisition unit for acquiring a depth image to be detected;
the segmentation unit is used for segmenting a foreground region from the depth image according to the depth value of the pixel point in the depth image;
the generating unit is used for generating a point cloud corresponding to the foreground region;
the detection unit is used for detecting whether the depth image comprises a target object or not according to the point cloud;
the detecting whether the depth image includes a target object according to the point cloud includes:
Determining world coordinates of the point cloud elements in the point cloud;
generating a height map and a density map according to the world coordinates; wherein the pixel points in the height map and the density map correspond to ground positions in a ground coordinate system, a pixel point in the height map identifies the maximum height value of the corresponding ground position, the maximum height value being the maximum among the heights corresponding to the point cloud elements projected to that ground position, and a pixel point in the density map identifies the number of point cloud elements projected to the corresponding ground position;
determining a predicted ground position of the target object according to the height map and the density map;
dividing the point cloud into sub point clouds according to the predicted ground position;
it is determined whether the sub-point cloud corresponds to a target object.
12. An apparatus for object detection, the apparatus comprising a processor and a memory:
the memory is used for storing program codes and transmitting the program codes to the processor;
the processor is configured to perform the object detection method of any of claims 1-10 according to instructions in the program code.
CN201911175243.8A 2019-11-26 2019-11-26 Object detection method and related equipment Active CN111144213B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911175243.8A CN111144213B (en) 2019-11-26 2019-11-26 Object detection method and related equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911175243.8A CN111144213B (en) 2019-11-26 2019-11-26 Object detection method and related equipment

Publications (2)

Publication Number Publication Date
CN111144213A CN111144213A (en) 2020-05-12
CN111144213B true CN111144213B (en) 2023-08-18

Family

ID=70516667

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911175243.8A Active CN111144213B (en) 2019-11-26 2019-11-26 Object detection method and related equipment

Country Status (1)

Country Link
CN (1) CN111144213B (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111652136B (en) * 2020-06-03 2022-11-22 苏宁云计算有限公司 Pedestrian detection method and device based on depth image
CN112001298B (en) * 2020-08-20 2021-09-21 佳都科技集团股份有限公司 Pedestrian detection method, device, electronic equipment and storage medium
CN114639159A (en) * 2020-11-30 2022-06-17 深圳市普渡科技有限公司 Moving pedestrian detection method, electronic device and robot
CN113657303B (en) * 2021-08-20 2024-04-23 北京千丁互联科技有限公司 Room structure identification method, device, terminal equipment and readable storage medium
CN113965695B (en) * 2021-09-07 2024-06-21 福建库克智能科技有限公司 Image display method, system, device, display unit and medium
CN113989276B (en) * 2021-12-23 2022-03-29 珠海视熙科技有限公司 Detection method and detection device based on depth image and camera equipment
CN115623318B (en) * 2022-12-20 2024-04-19 荣耀终端有限公司 Focusing method and related device

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104156937A (en) * 2013-05-15 2014-11-19 株式会社理光 Shadow detection method and device
CN105096300A (en) * 2014-05-08 2015-11-25 株式会社理光 Object detecting method and device
CN104361577A (en) * 2014-10-20 2015-02-18 湖南戍融智能科技有限公司 Foreground detection method based on fusion of depth image and visible image
CN104751146A (en) * 2015-04-13 2015-07-01 中国科学技术大学 Indoor human body detection method based on 3D (three-dimensional) point cloud image
CN106526610A (en) * 2016-11-04 2017-03-22 广东电网有限责任公司电力科学研究院 Power tower automatic positioning method and apparatus based on unmanned aerial vehicle laser point cloud
CN109658441A (en) * 2018-12-14 2019-04-19 四川长虹电器股份有限公司 Foreground detection method and device based on depth information
CN110111414A (en) * 2019-04-10 2019-08-09 北京建筑大学 A kind of orthography generation method based on three-dimensional laser point cloud
CN110321826A (en) * 2019-06-26 2019-10-11 贵州省交通规划勘察设计研究院股份有限公司 A kind of unmanned plane side slope vegetation classification method based on plant height

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Ma, L., et al.; "Mobile Laser Scanned Point-Clouds for Road Object Detection and Extraction: A Review"; Remote Sensing; 2018-09-24; Vol. 10, No. 10; pp. 1-33 *

Also Published As

Publication number Publication date
CN111144213A (en) 2020-05-12

Similar Documents

Publication Publication Date Title
CN111144213B (en) Object detection method and related equipment
CN111462200B (en) Cross-video pedestrian positioning and tracking method, system and equipment
CN110807809B (en) Light-weight monocular vision positioning method based on point-line characteristics and depth filter
JP5487298B2 (en) 3D image generation
Kang et al. Automatic targetless camera–lidar calibration by aligning edge with gaussian mixture model
CN107369159B (en) Threshold segmentation method based on multi-factor two-dimensional gray level histogram
CN109523595B (en) Visual measurement method for linear angular spacing of building engineering
CN109472820B (en) Monocular RGB-D camera real-time face reconstruction method and device
CN108470356B (en) Target object rapid ranging method based on binocular vision
CN105069804B (en) Threedimensional model scan rebuilding method based on smart mobile phone
CN111369495B (en) Panoramic image change detection method based on video
JP7272024B2 (en) Object tracking device, monitoring system and object tracking method
CN109087323A (en) A kind of image three-dimensional vehicle Attitude estimation method based on fine CAD model
JP2020149641A (en) Object tracking device and object tracking method
CN116245937A (en) Method and device for predicting stacking height of goods stack, equipment and storage medium
CN106709432B (en) Human head detection counting method based on binocular stereo vision
CN115222884A (en) Space object analysis and modeling optimization method based on artificial intelligence
CN113947724A (en) Automatic line icing thickness measuring method based on binocular vision
CN114140527A (en) Dynamic environment binocular vision SLAM method based on semantic segmentation
CN111023994B (en) Grating three-dimensional scanning method and system based on multiple measurement
Deshmukh et al. Moving object detection from images distorted by atmospheric turbulence
CN116883897A (en) Low-resolution target identification method
CN114608522B (en) Obstacle recognition and distance measurement method based on vision
CN115880643A (en) Social distance monitoring method and device based on target detection algorithm
CN111091078B (en) Object tracking method and related equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant