CN111144213A - Object detection method and related equipment
- Publication number: CN111144213A (application CN201911175243.8A)
- Authority: CN (China)
- Prior art keywords: pixel points, depth image, target, point cloud, foreground
- Legal status: Granted (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/60—Type of objects
- G06V20/64—Three-dimensional objects
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
- G06F18/232—Non-hierarchical techniques
- G06F18/2321—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/26—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
- G06V10/267—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Multimedia (AREA)
- Data Mining & Analysis (AREA)
- Life Sciences & Earth Sciences (AREA)
- Bioinformatics & Computational Biology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Evolutionary Biology (AREA)
- Evolutionary Computation (AREA)
- General Engineering & Computer Science (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Artificial Intelligence (AREA)
- Probability & Statistics with Applications (AREA)
- Human Computer Interaction (AREA)
- Image Analysis (AREA)
Abstract
The embodiment of the application discloses an object detection method and related equipment. After a depth image to be detected is obtained, a foreground region can be segmented from the depth image according to the depth values of the pixel points in the depth image. Since the target object is more likely to be located in the foreground region of the depth image, a corresponding point cloud may be generated only for the foreground region, and whether the target object is included in the depth image is then detected according to the generated point cloud. In this method, a point cloud is generated and further detected only for the foreground region, which is more likely to include the target object, while background regions that are unlikely to include the target object are discarded, i.e. no point cloud is generated for them, so that the amount of computation is reduced and the efficiency and real-time performance of object detection are improved.
Description
Technical Field
The present application relates to the field of computer vision, and in particular, to an object detection method and related apparatus.
Background
Object detection is widely used in modern life: detecting objects such as human bodies in places such as shopping malls, stations, and homes can provide data for many applications such as security, entertainment, and precise services. In scenes with very poor lighting, object detection algorithms based on color images fail, whereas object detection algorithms based on depth images perform well and thus compensate for the shortcomings of color images.
At present, methods for object detection based on a depth map mainly involve point cloud detection algorithms: the depth image is preprocessed and converted into a point cloud, the ground point cloud is determined according to prior information so that the non-ground point cloud can be extracted, and the extracted point cloud is further detected. A point cloud is a vector composed of a plurality of spatial points, each of which contains corresponding spatial coordinate information, color information, and so on.
Because this approach needs to generate and process point clouds even for parts of the depth image that do not contain the target object, such as the ground, its object detection efficiency is low and its real-time performance is poor.
Disclosure of Invention
In order to solve the above technical problem, the present application provides an object detection method and related equipment, which reduce the amount of computation and improve object detection efficiency and real-time performance.
The embodiment of the application discloses the following technical scheme:
in a first aspect, an embodiment of the present application provides an object detection method, where the method includes:
acquiring a depth image to be detected;
segmenting a foreground region from the depth image according to the depth value of a pixel point in the depth image;
generating a point cloud corresponding to the foreground area;
and detecting whether a target object is included in the depth image or not according to the point cloud.
In a second aspect, an embodiment of the present application provides an object detection apparatus, including:
an acquisition unit, configured to acquire a depth image to be detected;
the segmentation unit is used for segmenting a foreground region from the depth image according to the depth value of a pixel point in the depth image;
the generating unit is used for generating a point cloud corresponding to the foreground area;
and the detection unit is used for detecting whether the target object is included in the depth image or not according to the point cloud.
In a third aspect, an embodiment of the present application provides an apparatus for object detection, the apparatus including a processor and a memory, where the memory is configured to store program code and transmit the program code to the processor;
the processor is configured to perform the object detection method according to the first aspect according to instructions in the program code.
In a fourth aspect, an embodiment of the present application provides a computer-readable storage medium for storing program codes, where the program codes are used to execute the object detection method according to the first aspect.
According to the technical solution above, after the depth image to be detected is obtained, a foreground region can be segmented from the depth image according to the depth values of the pixel points in the depth image. Since the target object is more likely to be located in the foreground region of the depth image, a corresponding point cloud may be generated only for the foreground region, and whether the target object is included in the depth image is then detected according to the generated point cloud. In this method, a point cloud is generated and further detected only for the foreground region, which is more likely to include the target object, while background regions that are unlikely to include the target object are discarded, i.e. no point cloud is generated for them, so that the amount of computation is reduced and the efficiency and real-time performance of object detection are improved.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. It is apparent that the drawings in the following description are only some embodiments of the present application, and that those skilled in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flowchart of an object detection method according to an embodiment of the present disclosure;
fig. 2 is a flowchart of a method for segmenting a foreground region from a depth image according to an embodiment of the present disclosure;
fig. 3 is a flowchart of a method for detecting whether a target object is included in a depth image according to a point cloud according to an embodiment of the present application;
FIG. 4 is a flowchart of a method for determining whether a sub-point cloud corresponds to a target object according to an embodiment of the present disclosure;
FIG. 5 is a flowchart of a method for determining whether a forward projection corresponds to a target object according to an embodiment of the present disclosure;
fig. 6 is a flowchart of a human body detection method according to an embodiment of the present application;
fig. 7 is a structural diagram of an object detection apparatus according to an embodiment of the present application.
Detailed Description
Embodiments of the present application are described below with reference to the accompanying drawings.
At present, methods for detecting an object based on a depth map mainly involve point cloud detection algorithms. Since such a method needs to generate and process point clouds even for parts of the depth image that do not contain the target object, such as the ground, its object detection efficiency is low and its real-time performance is poor.
Therefore, an embodiment of the present application provides an object detection method in which a point cloud is generated and further detected only for the foreground region of the depth image, which is more likely to include the target object, so that the amount of computation is reduced and object detection efficiency and real-time performance are improved.
First, an execution body of the embodiment of the present application will be described. The object detection method provided by the application can be applied to image processing equipment, such as terminal equipment and a server. The terminal device may be a user terminal, and the terminal device may be, for example, an intelligent terminal, a computer, a Personal Digital Assistant (PDA), a tablet computer, and the like.
The object detection method can also be applied to a server, and the server can acquire the depth image to be detected sent by the terminal equipment, detect the object of the depth image and send the detection result to the terminal equipment. The server may be a stand-alone server or a server in a cluster.
In order to facilitate understanding of the technical solution of the present application, a server is taken as an execution subject, and an object detection method provided by the embodiment of the present application is introduced in combination with an actual application scenario.
In this embodiment, the server may obtain a depth image to be detected, where each pixel point in the depth image corresponds to a depth value. The depth value of a pixel point can be used to represent how far the real scene corresponding to that pixel point is from the camera lens: the larger the depth value, the farther the corresponding real scene is from the camera; the smaller the depth value, the closer the corresponding real scene is to the camera.
A foreground region and a background region may be segmented from the depth image, where the depth values of pixels in the foreground region are lower than the depth values of pixels in the background region. For example, if a depth image captured by the camera includes a human body closer to the camera (with lower depth values) and a wall farther from the camera (with higher depth values), the image area corresponding to the human body may be used as the foreground region and the image area corresponding to the wall as the background region.
In an actual scene, when detecting an object based on an image, the object to be detected is likely to be closer to a camera lens, so that a clear image of the object can be obtained, and the accuracy of object detection is improved. As such, the target object that needs to be detected is more likely to be distributed in the foreground region in the depth image than in the background region in the depth image. For example, based on the above example, it is assumed that the target object to be detected is a human body, and an image area corresponding to the human body in the depth image is a foreground area.
Based on the method, the server can segment the foreground area from the depth image according to the depth value of the pixel point in the depth image, and generate corresponding point cloud for the foreground area. Thus, whether the target object is included in the depth image is detected according to the generated point cloud.
By executing this object detection method, point clouds are generated and further detected only for the foreground regions of the depth image that are more likely to include the target object, while background regions that are unlikely to include the target object are discarded, i.e. no point clouds are generated for them, so that the amount of computation is reduced and object detection efficiency and real-time performance are improved.
Next, the object detection method provided by the embodiment of the present application will be described with a server as an execution subject.
Referring to fig. 1, the figure shows a flowchart of an object detection method provided in an embodiment of the present application, where the method may include:
s101: and acquiring a depth image to be detected.
It should be noted that, in the embodiment of the present application, a manner of obtaining the depth image to be detected by the server is not limited, and a suitable manner may be selected to obtain the depth image to be detected according to an actual situation. For example: the depth image may be acquired by the server after being photographed by a camera with a depth image photographing function, or the depth image may be acquired by the server after being processed into a depth image according to a color image.
S102: and segmenting a foreground region from the depth image according to the depth value of the pixel point in the depth image.
In this embodiment, the server may segment the foreground region from the depth image according to the depth value of the pixel point in the depth image.
For example: a depth threshold value can be preset, and an area formed by pixel points of which the depth values are smaller than the depth threshold value in the depth image is used as a foreground area and is segmented.
In addition, in a possible implementation manner, each pixel point in the depth image may correspond to a background probability condition, and the background probability conditions corresponding to different pixel points in the depth image may be different. When the depth value of a pixel point satisfies the background probability condition corresponding to that pixel point, the pixel point can be determined to be a background pixel point. In this case, for a target pixel point in the depth image, referring to fig. 2, which shows a flowchart of a method for segmenting a foreground region from a depth image according to an embodiment of the present application, segmenting a foreground region from the depth image according to the depth values of the pixel points in the depth image in S102 may include:
s201: and determining whether the depth value of the target pixel point meets the corresponding background probability condition, and if not, executing S202.
The target pixel point can be any pixel point in the depth image.
S202: and determining the target pixel point as a foreground pixel point belonging to the foreground area.
That is to say, if the depth value of the target pixel does not satisfy the corresponding background probability condition, the target pixel may be considered as a foreground pixel, and an area formed by the foreground pixels is segmented from the depth image and used as a foreground area.
The following describes a specific implementation of S201-S202. Assuming that the length of the depth image is W and its width is H, the set of pixels in the depth image is S = { (i, j) | 0 ≤ i < W, 0 ≤ j < H }, where (i, j) identifies a position in the depth image and each pixel point (i, j) belongs to S. Each pixel point has a background probability distribution, which is assumed to be a normal distribution with mean μ_{i,j} and standard deviation σ_{i,j}.
For the depth value I_{i,j} of a target pixel point in the depth image, the background probability condition corresponding to that pixel point may be, for example, |I_{i,j} − μ_{i,j}| ≤ α·σ_{i,j}. If the depth value I_{i,j} satisfies the background probability condition, the pixel point can be determined to be a background pixel point belonging to the background region; if it does not, the pixel point can be determined to be a foreground pixel point belonging to the foreground region. Here α can be a fixed parameter whose value can range from 1.0 to 3.0.
In a specific implementation, a probability model may be generated based on the background probability conditions, so that a depth image is input into the model and a binary (0-1) image corresponding to the depth image is output, where a value of 0 at a pixel point indicates that it is a background pixel point belonging to the background region, and a value of 1 indicates that it is a foreground pixel point belonging to the foreground region.
After the output image is obtained, the pixel points whose value is 1 can be clustered according to their depth values and proximity relations to obtain foreground regions, and the foreground regions are segmented from the depth image. The set of segmented foreground regions may be S_F = {s_1, s_2, …, s_n}, where each element s_k (k = 1, 2, …, n) is a foreground region and each foreground region s_k is a set of coordinates of that region in the coordinate system of the output image. The coordinate system of the output image may take the position of the lower left corner of the output image as its origin.
By the method, the foreground region can be accurately segmented from the depth image.
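As an illustration of this segmentation step, the following sketch applies the per-pixel normal background model and then groups foreground pixels into regions. It is only a minimal sketch under assumptions: the background condition is taken as |I − μ| ≤ α·σ, the clustering step is approximated by simple connected-component labelling (the depth-based part of the clustering is omitted), and the array names (depth, mu, sigma) are hypothetical.

```python
import numpy as np
from scipy import ndimage

def segment_foreground(depth, mu, sigma, alpha=2.0):
    """Per-pixel background test: a pixel whose depth value violates the
    assumed background condition |I - mu| <= alpha * sigma is marked foreground (1)."""
    foreground = (np.abs(depth - mu) > alpha * sigma).astype(np.uint8)
    foreground[depth == 0] = 0                 # zero-depth pixels are treated as noise, not foreground
    # group foreground pixels by proximity into candidate foreground regions s_1..s_n
    labels, n = ndimage.label(foreground)
    regions = [np.argwhere(labels == k + 1) for k in range(n)]
    return foreground, regions
```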
S103: and generating a point cloud corresponding to the foreground area.
The point cloud is a vector composed of a plurality of space points, and each space point comprises corresponding space coordinate information, color information and the like.
In an actual scene, when point clouds are generated for all scenes in a depth image, the point clouds need to be generated according to the shape (contour) of each scene, and if the quality of the depth image is poor, the quality of the point clouds generated by the method is low.
In the embodiment of the present application, segmentation is performed in advance (the foreground region is segmented first). Since the segmented foreground region already carries the contour features of the segmentation, even if the quality of the depth image is poor, the point cloud generated in S103 according to the contour of the segmented foreground region can still be of high quality. Compared with segmenting directly on the point cloud as in the related art, this approach is also fast and efficient.
In a particular implementation, based on the camera's optical center (c_x, c_y) and focal length (f_x, f_y), the coordinates (i, j) of the pixel points in each foreground region can be converted into spatial coordinates in the camera coordinate system, P_{i,j} = (x_{i,j}, y_{i,j}, z_{i,j})^T, with z_{i,j} = I_{i,j}.
Thus, based on the foreground region set S_F = {s_1, s_2, …, s_n}, a set of point clouds V = {v_1, v_2, …, v_n} can be generated, where each point cloud v_k corresponds to a foreground region s_k and the point cloud elements of v_k are the spatial coordinates P_{i,j} of the pixel points of that region.
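A sketch of the point cloud generation for one foreground region follows. It assumes the standard pinhole back-projection for the x and y components (implied by the optical center and focal length but not written out above), and the function and variable names are illustrative.

```python
import numpy as np

def region_to_point_cloud(depth, region_pixels, fx, fy, cx, cy):
    """Back-project the pixels of one foreground region into camera coordinates,
    assuming the usual pinhole model; z is the depth value of the pixel."""
    pts = []
    for (j, i) in region_pixels:        # argwhere-style (row, col) = (j, i)
        z = float(depth[j, i])
        if z <= 0:
            continue                    # skip zero-depth (invalid) pixels
        x = (i - cx) * z / fx
        y = (j - cy) * z / fy
        pts.append((x, y, z, i, j))     # keep pixel indices alongside the coordinates
    return np.array(pts)
```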
S104: and detecting whether the target object is included in the depth image or not according to the point cloud.
Therefore, the server can detect whether the target object is included in the depth image or not according to the point cloud of the foreground area.
According to the technical solution above, after the depth image to be detected is obtained, a foreground region can be segmented from the depth image according to the depth values of the pixel points in the depth image. Since the target object is more likely to be located in the foreground region of the depth image, a corresponding point cloud may be generated only for the foreground region, and whether the target object is included in the depth image is then detected according to the generated point cloud. In this method, a point cloud is generated and further detected only for the foreground region, which is more likely to include the target object, while background regions that are unlikely to include the target object are discarded, i.e. no point cloud is generated for them, so that the amount of computation is reduced and the efficiency and real-time performance of object detection are improved.
In one possible implementation, in order to detect a target object moving in the depth image, the object detection method may further include:
s301: after the detection of the depth images of the continuous T1 frames is completed, for a first set including pixel points at a target position in the depth images of the continuous T1 frames, if the number of foreground pixel points in the first set is not less than a first number threshold, the server may update a background probability condition corresponding to the pixel points at the target position to a foreground probability condition.
The target position may be any position in the depth image, for example, the target position may be a position of a pixel point at an upper left corner of the depth image, and the target position may include one pixel point or a plurality of pixel points.
The first number threshold may be used to measure whether the number of foreground pixel points in the first set is large enough; if the number of foreground pixel points in the first set is not less than the first number threshold, it may be determined that the first set contains many foreground pixel points.
In an actual scene, when it is determined that many of the pixel points at the target position in the T1 frames of depth images are foreground pixel points, this indicates that a target object such as a human body may appear in the real scene corresponding to the target position of the depth image. At the same time, it may also indicate that the target object in the real scene corresponding to the target position has remained stationary at that position.
In this way, in order to detect a moving target object, the server may update the background probability condition corresponding to the pixel point at the target position to the foreground probability condition. The foreground probability condition may be determined according to depth values of pixels at target positions in the consecutive T1 frame depth images, and the foreground probability condition may be used to determine whether a pixel at a target position in a depth image after the consecutive T1 frame depth images is a foreground pixel. The depth image after the consecutive T1 frame depth images may refer to a depth image acquired after the consecutive T1 frame depth images according to the timing of the video.
Compared with the background probability condition, the foreground probability condition of the pixel point at the target position can determine the pixel point with a smaller depth value as the foreground pixel point. In addition, since the foreground probability condition may be determined according to the depth values of the pixels at the target positions in the consecutive T1 frame depth images, if the depth values of the pixels at the target positions in the depth images after the consecutive T1 frame depth images are not changed, that is, the target object in the real scene corresponding to the target positions is not moved, the depth values of the pixels at the target positions (corresponding to the target object) cannot be determined as foreground pixels based on the foreground probability condition.
Next, the method of S301 is described based on the specific implementation of S201-S202. A background counter C_B can be preset for each pixel position. After each consecutive frame of depth image is detected in sequence, if the pixel point at the target position is determined to be a background pixel point, C_B can be incremented by 1. After completing the detection for the T1 consecutive frames of depth images, if C_B ≤ T1 − the first number threshold, the background probability condition of the pixel point at that target position may be updated to a foreground probability condition, i.e. the foreground probability distribution obeys a normal distribution whose mean and standard deviation are determined from the depth values of the pixel point at the target position in the T1 consecutive frames of depth images.
It should be noted that the embodiments of the present application do not limit the distribution manner of the background probability condition and the foreground probability condition, and for example, a gaussian distribution, a uniform distribution, a non-parametric distribution, and the like may also be used.
In addition, after determining whether a pixel is a foreground pixel based on the background probability condition or the foreground probability condition of each pixel in the depth image, the foreground probability condition or the background probability condition of the pixel can be updated. The updating method can comprise the following steps:
if the pixel point is determined to be a background pixel point, the background probability condition is updated based on the depth value I_{i,j} of the pixel point, for example as a running average of the form μ_{i,j} ← (1 − t)·μ_{i,j} + t·I_{i,j} and σ_{i,j} ← (1 − t)·σ_{i,j} + t·|I_{i,j} − μ_{i,j}|, where t is a weight parameter; in other cases the distribution parameters may be kept, with the standard deviation bounded above by σ_m, the upper standard deviation limit. The weighting parameter t may vary over time or with other conditions.
By carrying out self-adaptive updating on the foreground probability condition and the background probability condition for the pixel points in the depth image, the moving target object in the depth image can be detected, and the speed of the whole calculation process is improved on the premise of ensuring the stable result.
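A minimal sketch of the adaptive update is shown below. The exact update formulas are not given above, so an exponential moving average with weight t and an upper bound sigma_max on the standard deviation are assumed; the names are illustrative.

```python
import numpy as np

def update_background_model(depth, mu, sigma, is_background, t=0.05, sigma_max=200.0):
    """Running update of the per-pixel background distribution.
    Only pixels currently classified as background are updated; an exponential
    moving average with weight t is assumed, and sigma is capped at sigma_max."""
    d = depth[is_background]
    mu[is_background] = (1 - t) * mu[is_background] + t * d
    dev = np.abs(d - mu[is_background])                       # deviation against the updated mean
    sigma[is_background] = np.minimum((1 - t) * sigma[is_background] + t * dev, sigma_max)
    return mu, sigma
```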
In a possible implementation manner, if the pixel point on the target location corresponds to the foreground probability condition, the method may further include:
s401: after the detection of the continuous T2 frame depth images in the video is completed, aiming at a second set comprising pixel points at the target position in the continuous T2 frame depth images, if the number of the pixel points belonging to the background in the second set is not less than a second number threshold, updating the foreground probability condition corresponding to the pixel points at the target position to be a background probability condition.
The second number threshold may be used to measure whether the number of background pixel points in the second set is large enough; if the number of background pixel points in the second set is not less than the second number threshold, it may be determined that the second set contains many background pixel points.
In an actual scene, when it is determined that many of the pixel points at the target position in the T2 frames of depth images are background pixel points, this indicates that a target object that previously existed in the real scene corresponding to the target position of the depth image has left.
In this way, in order to ensure that the target object is detected, the server may update the foreground probability condition corresponding to the pixel point at the target position to the background probability condition. The background probability condition may be determined according to depth values of pixels at target positions in the consecutive T2 frame depth images, and the background probability condition may be used to determine whether a pixel at a target position in a depth image after the consecutive T2 frame depth images is a foreground pixel.
By the method, the accuracy of target object detection can be improved.
In actual scenes, depth images of poor quality are easily obtained due to factors such as the acquisition device and the environment. For example, pixels in the depth image corresponding to distant scenes or corners may have a depth value of 0; such zero-depth-value pixels are usually noise points. To this end, in one possible implementation, the method may further include:
s501: after the detection of the continuous T3 frame depth images in the video is completed, for a third set including pixel points at the target position in the continuous T3 frame depth images, if the number of pixel points belonging to zero depth values in the third set is not less than a third number threshold, the target pixel points do not include the pixel points at the target position in the depth images.
The target position may be any position in the depth image, for example, the target position may be a position of a pixel point at an upper left corner of the depth image, and the target position may include one pixel point or a plurality of pixel points. The target position in the present embodiment may be the same as or different from the target position in the foregoing embodiment.
The third number threshold may be used to measure whether the number of the pixels belonging to the zero depth value in the third set is proper, and if the number of the pixels belonging to the zero depth value in the third set is not less than the third number threshold, it may be determined that the number of the pixels belonging to the zero depth value in the third set is greater.
When the server determines that there are too many zero-depth-value pixel points in the third set, this indicates that the pixel point at the target position has been a zero-depth-value pixel point, either continuously or intermittently, across the T3 frames of depth images, so the target pixel points in S201 may not include the pixel points at that target position.
The method of S501 is described below based on the specific implementation of S201-S202. A zero-value counter C_0 and a distribution failure flag F can be preset for each pixel position, with F initialized to 0. After each consecutive frame of depth image is detected in sequence, if the pixel point at the target position is determined to be a zero-depth-value pixel point, C_0 can be incremented by 1. After detecting the T3 consecutive frames of depth images, if C_0 ≥ C_Z, then F is set to 1, where C_Z may be the third number threshold. When the distribution failure flag F is 1, the pixel point at the target position is considered invalid, that is, the target pixel points in S201 may not include the pixel point at the target position. If C_0 < C_Z, then F is set to 0, indicating that the pixel point at the target position has not failed, and the target pixel points in S201 may include the pixel point at the target position.
By executing the method, the invalid zero-depth-value pixel point can be prevented from being taken as a target pixel point and being segmented into the foreground area, and the influence of the noise point on object detection is reduced.
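A compact sketch of the zero-depth failure check over a window of T3 frames follows; the counter and flag are kept implicitly as arrays, and the names (depth_frames, c_z) are assumptions.

```python
import numpy as np

def zero_depth_failure_mask(depth_frames, c_z):
    """depth_frames: T3 consecutive depth images stacked as shape (T3, H, W).
    A pixel position is flagged as failed (excluded from the target pixel points)
    when it has a zero depth value in at least c_z of these frames."""
    zero_count = (np.asarray(depth_frames) == 0).sum(axis=0)   # zero-value counter per position
    return zero_count >= c_z                                   # distribution failure flag
```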
In an actual scene, a situation may occur in which two or more objects are in contact in a scene that is closer to the camera (divided into the foreground in S102), and thus, one foreground region divided in S102 may be a connected region corresponding to the two or more objects. In this case, in order to improve the accuracy of object detection, in a possible implementation manner, referring to fig. 3, this figure shows a flowchart of a method for detecting whether a target object is included in a depth image according to a point cloud provided in an embodiment of the present application, and as shown in fig. 3, the method for detecting whether a target object is included in a depth image according to a point cloud in S104 above may include:
s601: determining world coordinates of point cloud elements in the point cloud.
In the case where the generated point cloud is a point cloud under the camera coordinate system, the world coordinates of the point cloud elements in the point cloud may also be determined.
In a specific implementation, based on the above S103, a world coordinate system can be established by taking the projection of the camera onto the ground as the origin, the direction from the origin to the camera as the z-axis, and the projection of the camera's visual axis onto the ground as the y-axis. A transformation matrix M from the camera coordinate system to the world coordinate system is obtained through calibration, and the point cloud set V in the camera coordinate system is thereby transformed into a point cloud set W in the world coordinate system. A point cloud element coordinate of point cloud v_i in the camera coordinate system is transformed into the world coordinate system as (x_{w,i,j}, y_{w,i,j}, z_{w,i,j})^T = M·(x_{v,i,j}, y_{v,i,j}, z_{v,i,j})^T, where (x_{v,i,j}, y_{v,i,j}, z_{v,i,j})^T is a point cloud element coordinate of point cloud v_i in the camera coordinate system and (x_{w,i,j}, y_{w,i,j}, z_{w,i,j})^T is the corresponding coordinate in the world coordinate system.
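The camera-to-world transform can be sketched as follows, assuming the calibrated matrix M is a 3x4 rigid transform applied to homogeneous coordinates; the pixel indices (i, j) are simply carried along, as in the notation above.

```python
import numpy as np

def camera_to_world(points_cam, M):
    """Transform camera-frame point cloud elements (x_v, y_v, z_v, i, j) into the
    world frame using a calibrated transform M (assumed shape 3x4)."""
    xyz = points_cam[:, :3]
    xyz_h = np.hstack([xyz, np.ones((len(xyz), 1))])   # homogeneous coordinates
    xyz_w = xyz_h @ M.T                                # (x_w, y_w, z_w)
    return np.hstack([xyz_w, points_cam[:, 3:5]])      # keep the pixel indices (i, j)
```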
S602: and generating a height map and a density map according to world coordinates.
The pixel points in the height map and the density map correspond to ground positions for projecting the point cloud to a ground coordinate system, the pixel points in the height map identify the maximum height value of the ground positions corresponding to the pixel points in the height map, and the maximum height value is the maximum value in the heights corresponding to the point cloud elements projected to the ground positions; the pixel points in the density map identify the number of point cloud elements projected to the corresponding ground location.
In the embodiment of the application, the grid processing is performed according to the world coordinates of the point cloud elements in the point cloud, and a height map and a density map are obtained.
Based on the specific implementation in S601, the generation of the height map and the density map is introduced as follows. The point cloud elements of each point cloud in the point cloud set W are scaled down by a proportionality coefficient β and projected onto the ground plane coordinate system, yielding a height map H and a density map D, where each pixel point (k, l) in the height map identifies the maximum height value among the height values of the point cloud elements projected onto the ground position corresponding to that pixel point, and each pixel point (k, l) in the density map identifies the number of point cloud elements projected onto that ground position, with 0 ≤ k < w_M and 0 ≤ l < h_M, where w_M is the number of pixels of the height map and the density map in the transverse direction and h_M is the number of pixels in the longitudinal direction.
The maximum height value H_{k,l} of each pixel point in the height map and the number D_{k,l} of point cloud elements of each pixel point in the density map can be calculated as H_{k,l} = max{ z_w | the point cloud element (x_w, y_w, z_w) in W is projected to the ground position (k, l) } and D_{k,l} = |{ (x_w, y_w, z_w) in W | the element is projected to the ground position (k, l) }|.
The calculation amount can be reduced by reducing the point cloud element of each point cloud in the point cloud set W by the scale factor β.
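A sketch of this rasterization follows; the mapping from ground coordinates to grid cells via floor(β·x), floor(β·y) is an assumption consistent with the scale factor β described above, and the names are illustrative.

```python
import numpy as np

def height_and_density_maps(points_w, beta, w_m, h_m):
    """Rasterize world-frame point cloud elements onto the ground plane.
    points_w: array of (x_w, y_w, z_w) rows; beta is the assumed scale factor
    mapping ground coordinates to grid cells."""
    H = np.zeros((h_m, w_m), dtype=np.float32)   # maximum height per ground cell
    D = np.zeros((h_m, w_m), dtype=np.int32)     # number of elements per ground cell
    k = np.floor(beta * points_w[:, 0]).astype(int)   # transverse cell index
    l = np.floor(beta * points_w[:, 1]).astype(int)   # longitudinal cell index
    valid = (k >= 0) & (k < w_m) & (l >= 0) & (l < h_m)
    for kk, ll, z in zip(k[valid], l[valid], points_w[valid, 2]):
        H[ll, kk] = max(H[ll, kk], z)
        D[ll, kk] += 1
    return H, D
```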
S603: and determining the predicted ground position of the target object according to the height map and the density map.
In one possible implementation, the predicted ground location of the target object may be determined from the height map and the density map and by a first discriminant function. The first discriminant function may be determined according to a relationship between a target height and a target density, where the point cloud corresponding to the target object is projected onto the ground.
Based on the specific implementation in S602, a specific implementation of S603 is described below. A first discriminant function g(h, d) may be preset, where h and d are respectively the maximum height value and the number of point cloud elements of the pixel points corresponding to the same ground position in the height map and the density map, and u, v may be tuning parameters of the function.
Then, for the pixel points (k, l) corresponding to the same ground position in the height map and the density map, it can be judged whether g(H_{k,l}, D_{k,l}) = max{ g(H_{k+a,l+b}, D_{k+a,l+b}) | (a, b) ∈ B(r) } is satisfied, where B(r) may be a pixel neighborhood of radius r in the height map or the density map. If it is satisfied, the pixel point (k, l) can be added to the position set Q.
Finally, non-maximum suppression can be performed on the set Q based on the radius of the target object, so that the ground positions corresponding to the pixel points in the set are not too close to each other, yielding a set Q'.
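The local-maximum search and non-maximum suppression over the height and density maps can be sketched as below. The first discriminant function g is not spelled out above, so a simple weighted sum u·H + v·D is assumed purely for illustration, as is the minimum separation used for the suppression.

```python
import numpy as np

def candidate_ground_positions(H, D, r=3, u=1.0, v=1.0, min_sep=5):
    """Keep ground cells that are local maxima of an assumed score g = u*H + v*D
    within a (2r+1)x(2r+1) neighbourhood, then thin them so that no two kept
    cells are closer than min_sep cells (a simple non-maximum suppression)."""
    g = u * H + v * D
    h_m, w_m = g.shape
    Q = []
    for l in range(h_m):
        for k in range(w_m):
            window = g[max(0, l - r):l + r + 1, max(0, k - r):k + r + 1]
            if g[l, k] > 0 and g[l, k] >= window.max():   # skip empty cells, keep local maxima
                Q.append((k, l))
    Q.sort(key=lambda kl: -g[kl[1], kl[0]])               # strongest candidates first
    Qp = []
    for k, l in Q:
        if all((k - k2) ** 2 + (l - l2) ** 2 >= min_sep ** 2 for k2, l2 in Qp):
            Qp.append((k, l))
    return Qp
```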
It should be noted that, when performing the subdivision of the point cloud (i.e., dividing the sub-point cloud), the sub-point cloud may be in the form of a neighborhood, such as a rectangle, an ellipse, or the like, in addition to a circle region.
S604: and dividing the point cloud into sub-point clouds according to the predicted ground position.
Next, the server may re-segment the point cloud according to the predicted ground positions to obtain sub-point clouds. The point cloud may be re-segmented around each predicted ground position based on the shape (size) of the target object.
Based on the specific implementation in S603, a specific implementation of S604 is described below. For the ground position corresponding to each pixel point (k, l) in the set Q', the point cloud is divided again according to a specified radius r_b (corresponding to the size of the target object): the point cloud elements whose distance from that ground position is smaller than the radius r_b form a sub-point cloud. The resulting sub-point clouds are stored in a set U, i.e. U = {u_1, u_2, …}, where each u_q is a sub-point cloud.
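Re-segmenting the point cloud around each predicted ground position can be sketched as follows; converting a grid cell back to ground coordinates by dividing by β mirrors the assumed rasterization above, and r_b is the radius matched to the target object's size.

```python
import numpy as np

def split_sub_point_clouds(points_w, ground_positions, beta, r_b):
    """For each predicted ground position (grid cell), gather the world-frame
    point cloud elements whose ground-plane distance to that position is below r_b."""
    subs = []
    for k, l in ground_positions:
        gx, gy = k / beta, l / beta          # back from grid cell to ground coordinates
        d2 = (points_w[:, 0] - gx) ** 2 + (points_w[:, 1] - gy) ** 2
        subs.append(points_w[d2 < r_b ** 2])
    return subs
```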
S605: it is determined whether the sub-point cloud corresponds to the target object.
Point cloud data processing in the related art usually analyzes the three-dimensional shape of an object through voxelization, patch (surface) computation, and so on. Because the detection range is large, the distances are long, and target objects may occlude one another, the three-dimensional shape information determined from the point cloud may be incomplete or misleading. In the method provided by the embodiment of the present application, the predicted ground positions where a pedestrian is likely to appear are obtained by rasterizing the point cloud and applying non-maximum suppression. After the predicted ground positions are obtained, the detection algorithm is not run directly; instead, the point cloud is segmented again so that each resulting sub-point cloud does not belong to a connected region containing two or more objects, that is, the sub-point cloud is free from interference by surrounding connected regions, which provides a precondition for detecting the object accurately.
In an actual scene, the camera used to obtain the depth image is usually installed at an angle that captures a top view, so the sub-point cloud is a point cloud under a top-view angle. Therefore, to improve the accuracy of object detection, in one possible implementation, referring to fig. 4, which shows a flowchart of a method for determining whether a sub-point cloud corresponds to a target object provided by an embodiment of the present application, the method for determining whether the sub-point cloud corresponds to the target object in S605 may include:
s701: and determining the forward projection of the sub-point cloud along the direction parallel to the ground according to the shooting angle of the camera.
In the embodiment of the application, the sub-point cloud can be corrected to a head-up view angle, and a forward projection of the sub-point cloud along a direction parallel to the ground is determined.
Based on the specific implementation in S604, a specific implementation of S701 is described below. For each sub-point cloud u_q ∈ U, a reference abscissa can first be computed from the abscissa values x_o of the point cloud elements in u_q (for example, their mean).
Then, with a preset projection window size (w_p, h_p), the point cloud elements of u_q are projected along the direction parallel to the ground onto a vertical plane, and the resulting image is the forward projection J.
It should be noted that, in order to reduce the amount of calculation, a scaling parameter s_p may also be applied to scale the points when obtaining the forward projection of the sub-point cloud.
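A sketch of the forward projection is given below. The exact formulas are omitted in the text, so this version centres the sub-point cloud on the mean abscissa of its elements, scales by s_p, and bins the remaining two coordinates into a w_p × h_p binary image; all of these specifics are assumptions.

```python
import numpy as np

def forward_projection(sub_points, w_p, h_p, s_p=1.0):
    """Project a sub-point cloud onto a vertical plane (head-up view).
    sub_points: world-frame rows (x_w, y_w, z_w); the centring and binning
    choices here are illustrative, not the patent's exact formulas."""
    x = sub_points[:, 0]
    z = sub_points[:, 2]
    x_ref = x.mean()                                        # assumed reference abscissa
    cols = np.round(s_p * (x - x_ref) + w_p / 2).astype(int)
    rows = (h_p - 1 - np.round(s_p * z)).astype(int)        # height above ground mapped upward
    J = np.zeros((h_p, w_p), dtype=np.uint8)
    valid = (cols >= 0) & (cols < w_p) & (rows >= 0) & (rows < h_p)
    J[rows[valid], cols[valid]] = 1
    return J
```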
s702: it is determined whether the forward projection corresponds to the target object.
By executing the method, the sub-point cloud is corrected to the point cloud under the head-up view angle, thereby improving the accuracy of object detection.
It should be noted that the target object to be detected is not limited in the embodiments of the present application, and in a possible implementation manner, the target object to be detected may be a human body.
In a possible implementation manner, if a forward projection of the sub-point cloud along a direction parallel to the ground is determined, referring to fig. 5, which shows a flowchart of a method for determining whether the forward projection corresponds to the target object according to an embodiment of the present application, as shown in fig. 5, the method for determining whether the forward projection corresponds to the target object in S702 may include:
s801: and taking the maximum height value of each column of pixel points in the comprehensive projection as the height value of each column of pixel points.
Wherein the synthetic projection may comprise a forward projection of the entire sub-point cloud. Each column of pixel points in the comprehensive projection can have a corresponding height value (longitudinal coordinate value), and the server can take the maximum height value of each column of pixel points as the height value of the column of pixel points, wherein the maximum height value of each column of pixel points can be the maximum longitudinal coordinate value of the column of pixel points.
In a specific implementation, the synthetic projection may be located in one projection image, the pixel point identification value in the region of the synthetic projection in the projection image is a non-zero value, and the pixel point identification value in the projection image except for the synthetic projection is a zero value.
For convenience of subsequent calculation, the height values of the columns of the synthetic projection can be combined into a vector, denoted H_J. In addition, a detection mark vector M_J can be set for the synthetic projection. The dimension of the detection mark vector, w_J, is the number of columns included in the synthetic projection; the value of each dimension of the detection mark vector identifies whether the corresponding column has been removed from the synthetic projection, and the initial values in the detection mark vector are 0. When the value of a dimension in the detection mark vector is 1, the column of pixel points of the synthetic projection corresponding to that dimension has been removed from the synthetic projection.
S802: and determining the number of each column of pixel points in the comprehensive projection as the length value of each column of pixel points.
In the embodiment of the application, the number of each column of pixel points in the synthetic projection can be determined and used as the length value of each column of pixel points.
In a specific implementation, the number of pixel points with non-zero identification values in each column of the synthetic projection in the projection image can be determined and used as the length value of that column; for convenience of subsequent calculation, the length values of the columns can be combined into a vector, denoted C_J.
S803: and determining the maximum column corresponding to the maximum value obtained by the second discriminant function according to the height value and the length value of each column of pixel points.
The second discrimination function may be obtained by a relationship between a target height value and a target length value when the forward projection corresponds to the human body.
In a particular implementation, the second discriminant function may be a function g(h, d), where h and d are respectively the height value and the length value of a column of pixel points, and u, v may be tuning parameters. The index of the maximum column can then be determined as i_max = argmax{ g(H_J(i), C_J(i)) | 0 ≤ i < w_J }.
s804: determining whether the maximum column is a local height peak column within a first search area centered on the maximum column; if yes, go to step S805.
In an embodiment of the present application, after the maximum column is determined, a first search area may be determined in the synthetic projection centered on the maximum column. For example, a first search area is determined in the synthetic projection by taking the maximum column as the center and the search radius s as the radius, and the first search area is an area from the left side s column of the maximum column to the right side s column of the maximum column in the synthetic projection.
Then, it may be determined whether the height of the maximum column is the local height peak of the first search area, that is, whether the maximum column is the local height peak column within the first search area, if so, S805 may be performed.
It should be noted that, in an actual scene, if the maximum column is a local height peak column in the first search area, it indicates that the pixel point with the maximum height of the maximum column corresponds to the vertex of the human head.
In particular implementations, the index i_loc of the column in the first search area corresponding to the local height peak may be determined as i_loc = argmax{ H_J(i) | i_max − s ≤ i ≤ i_max + s }. If i_max = i_loc, the maximum column is determined to be the local height peak column in the first search area.
S805: and determining the first search area as a human body, updating the comprehensive projection in a first mode, and executing S803 on the updated comprehensive projection, namely determining the maximum column corresponding to the maximum value obtained by the second discriminant function according to the height value and the length value of each column of pixel points.
The first mode may be to remove the first search area from the synthetic projection.
It is to be understood that, if it is determined that the maximum column is not the local height peak column in the first search area, in a possible implementation manner, the method in S804 for determining whether the maximum column is the local height peak column in the first search area centered on the maximum column may further include:
if not, updating the comprehensive projection through a second mode, and executing S803 on the updated comprehensive projection, namely determining the maximum column corresponding to the maximum value obtained through the second discriminant function according to the height value and the length value of each column of pixel points.
The second mode may be to remove the second search area from the synthetic projection, where the second search area is an area centered on the maximum column that spans fewer columns than the first search area.
The determination of the second search area is illustrated based on the aforementioned first search area: centered on the maximum column in the synthetic projection, a second search area is determined with search radius α_m·s, i.e. the area from the column α_m·s columns to the left of the maximum column to the column α_m·s columns to the right of the maximum column, where α_m is a fixed parameter and α_m < 1.
Therefore, the projection part which is detected in the comprehensive projection can be removed, repeated calculation is avoided, the calculation amount is reduced, and the detection efficiency is improved.
In this embodiment of the application, in order to improve accuracy of human body detection, in a possible implementation manner, the method may further include:
s901: and determining the maximum length value according to the length value of each column of pixel points in the comprehensive projection.
In the embodiment of the present application, a maximum length value of the synthetic projection may be determined.
In a specific implementation, the maximum length value of the synthetic projection can be determined by the following formula: c_max = max{ C_J(i) | 0 ≤ i < w_J }.
Then, the method for determining whether the maximum column is the local height peak column in the first search area centered on the maximum column in S804 may include:
s902: it is determined whether the maximum column is a local height peak column and the maximum length value meets a trustworthiness condition within a first search area centered on the maximum column.
Wherein the credibility condition is determined according to a relation between a target maximum length value when the forward projection corresponds to the human body and the length of the target maximum column.
In a particular implementation, the credibility condition may be that the length value of the maximum column is not less than a first parameter times the maximum length value, i.e. C_J(i_max) ≥ α_h·c_max, where α_h is the first parameter. Alternatively, the credibility condition may be that the length value of the maximum column is not less than a second parameter times the height value of the maximum column, i.e. C_J(i_max) ≥ β_h·H_J(i_max), where β_h is the second parameter.
In the embodiment of the present application, it is determined whether the maximum length value of the synthetic projection meets the credibility condition, in addition to determining whether the maximum column is the local height peak column in the first search area centered on the maximum column in step S804.
Next, the human body detection method is described. Referring to fig. 6, which shows a flowchart of a human body detection method provided in an embodiment of the present application, first, the maximum column of the synthetic projection may be determined. It may then be determined whether the height value of the maximum column is a local peak within the first search area. If so, it may be determined whether the maximum length value of the synthetic projection meets the credibility condition. If the credibility condition is met, the values in the detection mark vector corresponding to the columns in the first search area (the columns with indices in [i_max − s, i_max + s]) are set to 1, so that the first search area is removed from the synthetic projection, i.e. the synthetic projection is updated in the first mode; the first search area is determined to be a human body and the human body detection result is stored. If it is determined that the height value of the maximum column is not a local peak in the first search area, or if it is determined that the maximum length value of the synthetic projection does not meet the credibility condition, the values in the detection mark vector corresponding to the columns in the second search area (the columns with indices in [i_max − α_m·s, i_max + α_m·s]) are set to 1, so that the second search area is removed from the synthetic projection, i.e. the synthetic projection is updated in the second mode. If all elements in M_J are 1, the process can end; otherwise, the maximum column in the synthetic projection is determined again.
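The iterative search over the synthetic projection shown in fig. 6 can be sketched as follows. The per-column statistics follow the description above, but the scoring function (here H_J + C_J), the credibility test, and the thresholds are placeholders rather than the patent's exact choices.

```python
import numpy as np

def detect_humans_in_projection(J, s=8, alpha_m=0.5, alpha_h=0.5):
    """Iterative column-peak search over a synthetic forward projection J (binary image).
    H_J holds per-column maximum height, C_J per-column pixel counts, M_J the
    removed-column marks; the score and thresholds are illustrative placeholders."""
    h_p, w_j = J.shape
    rows = np.arange(h_p)[:, None]
    H_J = (J * (h_p - rows)).max(axis=0)          # column height measured from the bottom
    C_J = J.sum(axis=0)                           # column length (pixel count)
    M_J = np.zeros(w_j, dtype=bool)               # detection mark vector (True = removed)
    c_max = C_J.max()
    detections = []
    while not M_J.all():
        score = np.where(M_J, -np.inf, H_J + C_J)  # assumed second discriminant function
        i_max = int(score.argmax())
        lo, hi = max(0, i_max - s), min(w_j, i_max + s + 1)
        i_loc = lo + int(H_J[lo:hi].argmax())      # local height peak in the first search area
        if i_loc == i_max and C_J[i_max] >= alpha_h * c_max:
            detections.append(i_max)               # head-top column of one detected person
            M_J[lo:hi] = True                      # remove the first search area
        else:
            m = int(alpha_m * s)
            M_J[max(0, i_max - m):min(w_j, i_max + m + 1)] = True  # remove the second search area
    return detections
```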
In addition, depending on the quality of the depth image, the local peak algorithm can be replaced with another human body detector, with the column of its detection result taken as the peak i_max.
Considering that, in the related art, the loss of detail caused by depth map quality means that local texture features may fail to describe the human body, applying the two statistical features of height value and density value provides robustness against changes in human body posture and against human bodies approaching one another during human body detection, and improves the accuracy of human body detection.
Based on the object detection method, an embodiment of the present application further provides an object detection apparatus, as shown in fig. 7, which shows a structural diagram of the object detection apparatus provided in the embodiment of the present application, and as shown in fig. 7, the apparatus includes:
an obtaining unit 701, configured to obtain a depth image to be detected;
a segmentation unit 702, configured to segment a foreground region from the depth image according to depth values of pixel points in the depth image;
a generating unit 703, configured to generate a point cloud corresponding to the foreground region;
a detecting unit 704, configured to detect whether a target object is included in the depth image according to the point cloud.
In a possible implementation manner, the segmentation unit 702 is specifically configured to:
the pixel point in the depth image corresponds to a background probability condition, and whether the depth value of a target pixel point meets the corresponding background probability condition or not is determined for the target pixel point in the depth image, wherein the target pixel point is any one pixel point in the depth image;
if not, determining the target pixel point as a foreground pixel point belonging to the foreground area.
In a possible implementation manner, the apparatus further includes an updating unit, configured to:
after detection of a continuous T1 frame depth image in a video is completed, for a first set including pixel points at a target position in the continuous T1 frame depth image, if the number of foreground pixel points in the first set is not less than a first number threshold, updating a background probability condition corresponding to the pixel points at the target position to a foreground probability condition, where the foreground probability condition is used to determine whether the pixel points at the target position in the depth image after the continuous T1 frame depth image are foreground pixel points belonging to a foreground region, and the foreground probability condition is determined according to depth values of the pixel points at the target position in the continuous T1 frame depth image.
In a possible implementation manner, the updating unit is further configured to:
if the pixel point at the target position corresponds to a foreground probability condition, after detection of a continuous T2 frame depth image in a video is completed, updating a foreground probability condition corresponding to the pixel point at the target position to a background probability condition for a second set including the pixel point at the target position in the continuous T2 frame depth image, if the number of the background pixel points in the second set is not less than a second number threshold, where the background probability condition is used to determine whether the pixel point at the target position in the depth image after the continuous T2 frame depth image is a foreground pixel point belonging to a foreground region, and the new background probability condition is determined according to the depth value of the pixel point at the target position in the continuous T2 frame depth image.
In a possible implementation manner, the updating unit is further configured to:
after detection of T3 consecutive frames of depth images in the video is completed, for a third set including the pixel points at a target position in the T3 consecutive frames of depth images, if the number of pixel points with zero depth values in the third set is not less than a third number threshold, exclude the pixel point at the target position from the target pixel points in subsequent depth images.
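As an informal sketch of the three update rules above, the snippet below processes one pixel position after a window of consecutive frames has been detected. The record layout (`history`), the way hits are counted against the thresholds `t1_thresh`, `t2_thresh` and `t3_thresh`, and the (mean, tolerance) form of the newly built condition are assumptions; the patent only fixes the counting criteria and the fact that the new condition is derived from the window's depth values.

```python
import numpy as np

def update_pixel_condition(history, state, t1_thresh, t2_thresh, t3_thresh):
    """Per-position condition update, run once a window of consecutive frames is done.

    `history` is a hypothetical per-position record holding the depth values of the
    window and boolean flags saying whether the position was classified as foreground
    or background in each frame. `state` is "background" or "foreground", naming the
    kind of condition currently attached to the position.
    """
    depths = np.asarray(history["depths"], dtype=float)

    # T3 rule: too many zero-depth observations -> stop treating this position
    # as a target pixel point at all.
    if np.count_nonzero(depths == 0) >= t3_thresh:
        return "excluded", None

    nonzero = depths[depths > 0]
    if nonzero.size == 0:
        return state, None                                   # nothing usable in this window
    new_condition = (nonzero.mean(), 3.0 * nonzero.std() + 1e-6)  # assumed (centre, tolerance) form

    if state == "background":
        # T1 rule: enough foreground hits -> switch to a foreground probability
        # condition built from the window's depth values.
        if np.count_nonzero(history["was_foreground"]) >= t1_thresh:
            return "foreground", new_condition
    else:
        # T2 rule: enough background hits -> switch back to a new background
        # probability condition built from the window's depth values.
        if np.count_nonzero(history["was_background"]) >= t2_thresh:
            return "background", new_condition

    return state, None                                       # keep the current condition
```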
In a possible implementation manner, the detecting unit 704 is specifically configured to:
determining world coordinates of point cloud elements in the point cloud;
generating a height map and a density map according to the world coordinates, wherein pixel points in the height map and the density map correspond to ground positions obtained by projecting the point cloud into a ground coordinate system; a pixel point in the height map identifies the maximum height value of the corresponding ground position, the maximum height value being the maximum of the heights of the point cloud elements projected to that ground position; and a pixel point in the density map identifies the number of point cloud elements projected to the corresponding ground position;
determining a predicted ground position of the target object according to the height map and the density map;
partitioning the point cloud into sub-point clouds according to the predicted ground location;
determining whether the sub-point cloud corresponds to a target object.
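The detection pipeline above can be illustrated with the following sketch, which builds the height map and the density map from the world coordinates of the foreground point cloud. The grid resolution and extent are placeholder values, not parameters taken from the embodiments.

```python
import numpy as np

def height_and_density_maps(points, cell=0.05, grid=(200, 200)):
    """Project an (N, 3) point cloud (z = height above ground) onto a ground grid.

    `cell` (metres per grid cell) and `grid` are illustrative parameters.
    """
    ix = np.clip(((points[:, 0] - points[:, 0].min()) / cell).astype(int), 0, grid[0] - 1)
    iy = np.clip(((points[:, 1] - points[:, 1].min()) / cell).astype(int), 0, grid[1] - 1)

    height_map = np.zeros(grid)     # maximum height of the points projected to each ground cell
    density_map = np.zeros(grid)    # number of point cloud elements projected to each ground cell

    np.maximum.at(height_map, (ix, iy), points[:, 2])
    np.add.at(density_map, (ix, iy), 1)
    return height_map, density_map
```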
In a possible implementation manner, the detecting unit 704 is further specifically configured to:
determining the predicted ground position of the target object through a first discriminant function according to the height map and the density map, wherein the first discriminant function is determined according to a relation between a target height and a target density obtained by projecting the point cloud corresponding to a target object onto the ground.
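Since the concrete form of the first discriminant function is not given, the sketch below simply combines the height map and the density map with placeholder weights and picks local maxima of the resulting score as predicted ground positions. It is an assumed instantiation of the height/density relation, not the claimed function; the weights, threshold and suppression radius are illustrative.

```python
import numpy as np

def predict_ground_positions(height_map, density_map,
                             w_h=1.0, w_d=0.02, score_thresh=1.5, nms_radius=4):
    """Illustrative first discriminant: weighted height/density score + greedy peak picking."""
    score = w_h * height_map + w_d * density_map
    positions, work = [], score.copy()
    while True:
        idx = np.unravel_index(np.argmax(work), work.shape)
        if work[idx] <= score_thresh:          # nothing above the (assumed) threshold remains
            break
        positions.append(idx)                  # predicted ground position (grid cell)
        r0, c0 = idx
        work[max(0, r0 - nms_radius):r0 + nms_radius + 1,
             max(0, c0 - nms_radius):c0 + nms_radius + 1] = 0.0   # suppress the neighbourhood
    return positions
```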
In a possible implementation manner, the detecting unit 704 is further specifically configured to determine a forward projection of the sub-point cloud along a direction parallel to the ground according to a shooting angle of a camera;
determining whether the forward projection corresponds to a target object.
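Purely as an illustration of the forward-projection step, the snippet below projects a sub-point cloud onto a vertical plane along a viewing direction parallel to the ground. The way the direction is derived from the shooting angle (here it is passed in directly as its ground-plane component `view_dir_xy`) and the pixel size are assumptions, not details given by the patent.

```python
import numpy as np

def forward_projection(sub_points, view_dir_xy, cell=0.05):
    """Front-view projection of an (N, 3) sub-point cloud along a ground-parallel direction."""
    d = np.asarray(view_dir_xy, dtype=float)
    d /= np.linalg.norm(d)                      # unit projection direction on the ground
    lateral = np.array([-d[1], d[0]])           # ground direction perpendicular to the view

    u = sub_points[:, :2] @ lateral             # horizontal coordinate in the projection plane
    v = sub_points[:, 2]                        # vertical coordinate = height above ground

    cols = ((u - u.min()) / cell).astype(int)
    rows = ((v.max() - v) / cell).astype(int)
    image = np.zeros((rows.max() + 1, cols.max() + 1), dtype=np.uint16)
    np.add.at(image, (rows, cols), 1)           # count of points falling into each projection pixel
    return image
```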
In one possible implementation, the target object is a human body.
In a possible implementation manner, the detecting unit 704 is further specifically configured to:
if the forward projection of the sub-point cloud along the direction parallel to the ground has been determined, taking the maximum height value of each column of pixel points in a comprehensive projection as the height value of that column, wherein the comprehensive projection comprises the forward projection of the sub-point cloud;
determining the number of pixel points in each column of the comprehensive projection as the length value of that column;
determining, according to the height value and the length value of each column of pixel points, the maximum column corresponding to the maximum value obtained through a second discriminant function, wherein the second discriminant function is obtained from the relation between the target height value and the target length value when the forward projection corresponds to a human body;
determining whether the maximum column is a local height peak column within a first search area centered on the maximum column;
if so, determining the first search area as a human body, updating the comprehensive projection in a first mode, and executing, on the updated comprehensive projection, the step of obtaining the maximum column corresponding to the maximum value through the second discriminant function according to the height value and the length value of each column of pixel points; the first mode is to eliminate the first search area from the comprehensive projection.
In a possible implementation manner, the detecting unit 704 is further specifically configured to:
if not, updating the comprehensive projection in a second mode, and executing, on the updated comprehensive projection, the step of obtaining the maximum column corresponding to the maximum value through the second discriminant function according to the height value and the length value of each column of pixel points; the second mode is to eliminate a second search area from the comprehensive projection, the second search area being an area smaller than the first search area and centered on the maximum column.
In a possible implementation manner, the detecting unit 704 is further specifically configured to:
determining a maximum length value according to the length value of each column of pixel points in the comprehensive projection;
then, the determining whether the maximum column is a local height peak column within a first search area centered on the maximum column comprises:
determining whether the maximum column is a local height peak column within a first search area centered on the maximum column and whether the maximum length value meets a credibility condition, wherein the credibility condition is determined according to a relation between the target maximum length value and the length value of the target maximum column when the forward projection corresponds to a human body.
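The column-wise search described in the last few implementation manners can be sketched as follows. The concrete second discriminant function, the sizes of the first and second search areas, and the credibility test are placeholder choices; only the overall flow (pick the maximum column, test the local height peak and the credibility condition, then eliminate either the first or the smaller second search area and repeat) follows the description above.

```python
import numpy as np

def detect_humans_in_projection(col_height, col_length,
                                score_fn=lambda h, l: h + 0.1 * l,
                                first_radius=20, second_radius=5,
                                min_score=1.0, max_length_ratio=4.0):
    """Iterative column search over the comprehensive projection (illustrative parameters)."""
    height = col_height.astype(float).copy()     # maximum height value per column
    length = col_length.astype(float).copy()     # number of pixel points per column
    detections = []

    while True:
        score = score_fn(height, length)                 # assumed second discriminant function
        c = int(np.argmax(score))                        # maximum column
        if score[c] <= min_score:                        # nothing human-like remains
            break

        lo, hi = max(0, c - first_radius), min(len(height), c + first_radius + 1)
        is_peak = height[c] >= height[lo:hi].max()       # local height peak in the first search area
        credible = length.max() <= max_length_ratio * max(length[c], 1.0)   # assumed credibility test

        if is_peak and credible:
            detections.append((lo, hi))                  # the first search area is taken as one human body
            height[lo:hi] = 0.0                          # first mode: eliminate the first search area
            length[lo:hi] = 0.0
        else:
            lo2, hi2 = max(0, c - second_radius), min(len(height), c + second_radius + 1)
            height[lo2:hi2] = 0.0                        # second mode: eliminate the smaller second search area
            length[lo2:hi2] = 0.0
    return detections
```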
According to the above technical solution, after the depth image to be detected is obtained, the foreground region can be segmented from the depth image according to the depth values of the pixel points in the depth image. Since the target object is more likely to be located in the foreground region of the depth image, the corresponding point cloud may be generated only for the foreground region, and whether the target object is included in the depth image is then detected according to the generated point cloud. In this method, a point cloud is generated and further detected only for the foreground region that is more likely to include the target object, while the background regions that are less likely to include the target object are removed, that is, no point cloud is generated for those regions, so that the amount of calculation is reduced and the efficiency and real-time performance of object detection are improved.
An embodiment of the present application further provides an apparatus for object detection, where the apparatus includes a processor and a memory:
the memory is used for storing program codes and transmitting the program codes to the processor;
the processor is configured to execute the above object detection method according to instructions in the program code.
An embodiment of the present application further provides a computer-readable storage medium, where the computer-readable storage medium is used for storing a program code, and the program code is used for executing the object detection method.
As can be seen from the above description of the embodiments, those skilled in the art can clearly understand that all or part of the steps in the above embodiment methods can be implemented by software plus a necessary general hardware platform. Based on such understanding, the technical solution of the present application may be essentially or partially implemented in the form of a software product, which may be stored in a storage medium, such as a ROM/RAM, a magnetic disk, an optical disk, etc., and includes several instructions for enabling a computer device (which may be a personal computer, a server, or a network communication device such as a media gateway, etc.) to execute the method according to the embodiments or some parts of the embodiments of the present application.
It should be noted that, in the present specification, the embodiments are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments may be referred to each other. The device disclosed by the embodiment corresponds to the method disclosed by the embodiment, so that the description is simple, and the relevant points can be referred to the method part for description.
It is further noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the application. Thus, the present application is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
Claims (13)
1. An object detection method, characterized in that the method comprises:
acquiring a depth image to be detected;
segmenting a foreground region from the depth image according to the depth value of a pixel point in the depth image;
generating a point cloud corresponding to the foreground area;
and detecting whether a target object is included in the depth image or not according to the point cloud.
2. The method of claim 1, wherein the pixel points in the depth image have a background probability condition, and for a target pixel point in the depth image, the segmenting the foreground region from the depth image according to the depth values of the pixel points in the depth image comprises:
determining whether the depth value of the target pixel point meets a corresponding background probability condition, wherein the target pixel point is any one pixel point in the depth image;
if not, determining the target pixel point as a foreground pixel point belonging to the foreground area.
3. The method of claim 2, further comprising:
after detection of T1 consecutive frames of depth images in a video is completed, for a first set including the pixel points at a target position in the T1 consecutive frames of depth images, if the number of foreground pixel points in the first set is not less than a first number threshold, updating the background probability condition corresponding to the pixel point at the target position to a foreground probability condition, where the foreground probability condition is used to determine whether the pixel point at the target position in depth images subsequent to the T1 consecutive frames is a foreground pixel point belonging to the foreground region, and the foreground probability condition is determined according to the depth values of the pixel points at the target position in the T1 consecutive frames of depth images.
4. The method of claim 3, wherein if the pixel point at the target location corresponds to a foreground probability condition, the method further comprises:
after detection of T2 consecutive frames of depth images in the video is completed, for a second set including the pixel points at the target position in the T2 consecutive frames of depth images, if the number of background pixel points in the second set is not less than a second number threshold, updating the foreground probability condition corresponding to the pixel point at the target position to a new background probability condition, where the new background probability condition is used to determine whether the pixel point at the target position in depth images subsequent to the T2 consecutive frames is a foreground pixel point belonging to the foreground region, and the new background probability condition is determined according to the depth values of the pixel points at the target position in the T2 consecutive frames of depth images.
5. The method of claim 2, further comprising:
after detection of T3 consecutive frames of depth images in the video is completed, for a third set including the pixel points at a target position in the T3 consecutive frames of depth images, if the number of pixel points with zero depth values in the third set is not less than a third number threshold, the pixel points at the target position in subsequent depth images are not included in the target pixel points.
6. The method of claim 1, wherein the detecting whether a target object is included in the depth image from the point cloud comprises:
determining world coordinates of point cloud elements in the point cloud;
generating a height map and a density map according to the world coordinates, wherein pixel points in the height map and the density map correspond to ground positions obtained by projecting the point cloud into a ground coordinate system; a pixel point in the height map identifies the maximum height value of the corresponding ground position, the maximum height value being the maximum of the heights of the point cloud elements projected to that ground position; and a pixel point in the density map identifies the number of point cloud elements projected to the corresponding ground position;
determining a predicted ground position of the target object according to the height map and the density map;
partitioning the point cloud into sub-point clouds according to the predicted ground location;
determining whether the sub-point cloud corresponds to a target object.
7. The method of claim 6, wherein determining a predicted ground location of a target object from the height map and the density map comprises:
determining the predicted ground position of the target object through a first discriminant function according to the height map and the density map, wherein the first discriminant function is determined according to a relation between a target height and a target density obtained by projecting the point cloud corresponding to a target object onto the ground.
8. The method of claim 6, wherein the determining whether the sub-point cloud corresponds to a target object comprises:
determining the forward projection of the sub-point cloud along the direction parallel to the ground according to the shooting angle of a camera;
determining whether the forward projection corresponds to a target object.
9. The method of claim 8, wherein the target object is a human body, and if the forward projection of the sub-point cloud in the direction parallel to the ground has been determined, the determining whether the forward projection corresponds to the target object comprises:
taking the maximum height value of each column of pixel points in a comprehensive projection as the height value of that column, wherein the comprehensive projection comprises the forward projection of the sub-point cloud;
determining the number of pixel points in each column of the comprehensive projection as the length value of that column;
determining, according to the height value and the length value of each column of pixel points, the maximum column corresponding to the maximum value obtained through a second discriminant function, wherein the second discriminant function is obtained from the relation between the target height value and the target length value when the forward projection corresponds to a human body;
determining whether the maximum column is a local height peak column within a first search area centered on the maximum column; and
if so, determining the first search area as a human body, updating the comprehensive projection in a first mode, and executing, on the updated comprehensive projection, the step of obtaining the maximum column corresponding to the maximum value through the second discriminant function according to the height value and the length value of each column of pixel points, wherein the first mode is to eliminate the first search area from the comprehensive projection.
10. The method of claim 9, wherein after the determining whether the maximum column is a local height peak column within a first search area centered on the maximum column, the method further comprises:
if not, updating the comprehensive projection in a second mode, and executing, on the updated comprehensive projection, the step of obtaining the maximum column corresponding to the maximum value through the second discriminant function according to the height value and the length value of each column of pixel points, wherein the second mode is to eliminate a second search area from the comprehensive projection, the second search area being an area smaller than the first search area and centered on the maximum column.
11. The method according to claim 9 or 10, characterized in that the method further comprises:
determining a maximum length value according to the length value of each column of pixel points in the comprehensive projection;
then, the determining whether the maximum column is a local height peak column within a first search area centered on the maximum column comprises:
determining whether the maximum column is a local height peak column within a first search area centered on the maximum column and whether the maximum length value meets a credibility condition, wherein the credibility condition is determined according to a relation between the target maximum length value and the length value of the target maximum column when the forward projection corresponds to a human body.
12. An object detection apparatus, characterized in that the apparatus comprises:
an obtaining unit, configured to obtain a depth image to be detected;
a segmentation unit, configured to segment a foreground region from the depth image according to depth values of pixel points in the depth image;
a generating unit, configured to generate a point cloud corresponding to the foreground region; and
a detecting unit, configured to detect whether a target object is included in the depth image according to the point cloud.
13. An apparatus for object detection, the apparatus comprising a processor and a memory:
the memory is used for storing program codes and transmitting the program codes to the processor;
the processor is configured to perform the object detection method according to any one of claims 1 to 11 according to instructions in the program code.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911175243.8A CN111144213B (en) | 2019-11-26 | 2019-11-26 | Object detection method and related equipment |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111144213A true CN111144213A (en) | 2020-05-12 |
CN111144213B CN111144213B (en) | 2023-08-18 |
Family
ID=70516667
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201911175243.8A Active CN111144213B (en) | 2019-11-26 | 2019-11-26 | Object detection method and related equipment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111144213B (en) |
Patent Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104156937A (en) * | 2013-05-15 | 2014-11-19 | 株式会社理光 | Shadow detection method and device |
CN105096300A (en) * | 2014-05-08 | 2015-11-25 | 株式会社理光 | Object detecting method and device |
CN104361577A (en) * | 2014-10-20 | 2015-02-18 | 湖南戍融智能科技有限公司 | Foreground detection method based on fusion of depth image and visible image |
CN104751146A (en) * | 2015-04-13 | 2015-07-01 | 中国科学技术大学 | Indoor human body detection method based on 3D (three-dimensional) point cloud image |
CN106526610A (en) * | 2016-11-04 | 2017-03-22 | 广东电网有限责任公司电力科学研究院 | Power tower automatic positioning method and apparatus based on unmanned aerial vehicle laser point cloud |
CN109658441A (en) * | 2018-12-14 | 2019-04-19 | 四川长虹电器股份有限公司 | Foreground detection method and device based on depth information |
CN110111414A (en) * | 2019-04-10 | 2019-08-09 | 北京建筑大学 | A kind of orthography generation method based on three-dimensional laser point cloud |
CN110321826A (en) * | 2019-06-26 | 2019-10-11 | 贵州省交通规划勘察设计研究院股份有限公司 | A kind of unmanned plane side slope vegetation classification method based on plant height |
Non-Patent Citations (6)
Title |
---|
CHEN X等: "3d object proposals using stereo imagery for accurate object class detection", 《IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE》 * |
CHEN X等: "Multi-view 3d object detection network for autonomous driving", 《PROCEEDINGS OF THE IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION》 * |
MA, L等: "Mobile Laser Scanned Point-Clouds for Road Object Detection and Extraction: A Review", 《REMOTE SENS》 * |
PULITI S等: "Assessing 3D point clouds from aerial photographs for species-specific forest inventories", 《SCANDINAVIAN JOURNAL OF FOREST RESEARCH》 * |
张银等: "三维激光雷达在无人车环境感知中的应用研究", 《激光与光电子学进展》 * |
李永强等: "车载激光扫描数据中杆状地物提取", 《测绘科学》 * |
Cited By (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2021244364A1 (en) * | 2020-06-03 | 2021-12-09 | 苏宁易购集团股份有限公司 | Pedestrian detection method and device based on depth images |
CN112001298B (en) * | 2020-08-20 | 2021-09-21 | 佳都科技集团股份有限公司 | Pedestrian detection method, device, electronic equipment and storage medium |
CN112001298A (en) * | 2020-08-20 | 2020-11-27 | 佳都新太科技股份有限公司 | Pedestrian detection method, device, electronic equipment and storage medium |
WO2022111682A1 (en) * | 2020-11-30 | 2022-06-02 | 深圳市普渡科技有限公司 | Moving pedestrian detection method, electronic device and robot |
CN113657303A (en) * | 2021-08-20 | 2021-11-16 | 北京千丁互联科技有限公司 | Room structure identification method and device, terminal device and readable storage medium |
CN113657303B (en) * | 2021-08-20 | 2024-04-23 | 北京千丁互联科技有限公司 | Room structure identification method, device, terminal equipment and readable storage medium |
CN113965695A (en) * | 2021-09-07 | 2022-01-21 | 福建库克智能科技有限公司 | Method, system, device, display unit and medium for image display |
CN113989276B (en) * | 2021-12-23 | 2022-03-29 | 珠海视熙科技有限公司 | Detection method and detection device based on depth image and camera equipment |
CN113989276A (en) * | 2021-12-23 | 2022-01-28 | 珠海视熙科技有限公司 | Detection method and detection device based on depth image and camera equipment |
CN115115655A (en) * | 2022-06-17 | 2022-09-27 | 重庆长安汽车股份有限公司 | Object segmentation method, device, electronic device, storage medium and program product |
CN115623318A (en) * | 2022-12-20 | 2023-01-17 | 荣耀终端有限公司 | Focusing method and related device |
CN115623318B (en) * | 2022-12-20 | 2024-04-19 | 荣耀终端有限公司 | Focusing method and related device |
CN118314358A (en) * | 2024-03-08 | 2024-07-09 | 钛玛科(北京)工业科技有限公司 | Visual measurement system for rubber curling |
Also Published As
Publication number | Publication date |
---|---|
CN111144213B (en) | 2023-08-18 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111144213B (en) | Object detection method and related equipment | |
CN110675418B (en) | Target track optimization method based on DS evidence theory | |
CN110807809B (en) | Light-weight monocular vision positioning method based on point-line characteristics and depth filter | |
CN106940704B (en) | Positioning method and device based on grid map | |
JP5487298B2 (en) | 3D image generation | |
CN112258600A (en) | Simultaneous positioning and map construction method based on vision and laser radar | |
CN110689562A (en) | Trajectory loop detection optimization method based on generation of countermeasure network | |
CN109472820B (en) | Monocular RGB-D camera real-time face reconstruction method and device | |
CN105069804B (en) | Threedimensional model scan rebuilding method based on smart mobile phone | |
CN107274483A (en) | A kind of object dimensional model building method | |
CN111160291B (en) | Human eye detection method based on depth information and CNN | |
CN114140527B (en) | Dynamic environment binocular vision SLAM method based on semantic segmentation | |
CN111369495A (en) | Video-based panoramic image change detection method | |
CN115376109B (en) | Obstacle detection method, obstacle detection device, and storage medium | |
CN117593650B (en) | Moving point filtering vision SLAM method based on 4D millimeter wave radar and SAM image segmentation | |
CN114782628A (en) | Indoor real-time three-dimensional reconstruction method based on depth camera | |
CN116468786B (en) | Semantic SLAM method based on point-line combination and oriented to dynamic environment | |
CN111178193A (en) | Lane line detection method, lane line detection device and computer-readable storage medium | |
CN114608522B (en) | Obstacle recognition and distance measurement method based on vision | |
CN110428461B (en) | Monocular SLAM method and device combined with deep learning | |
CN106709432B (en) | Human head detection counting method based on binocular stereo vision | |
CN103077536B (en) | Space-time mutative scale moving target detecting method | |
CN112446355B (en) | Pedestrian recognition method and people stream statistics system in public place | |
CN117726747A (en) | Three-dimensional reconstruction method, device, storage medium and equipment for complementing weak texture scene | |
CN107274477B (en) | Background modeling method based on three-dimensional space surface layer |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |