WO2021244364A1 - Pedestrian detection method and device based on depth images - Google Patents

Pedestrian detection method and device based on depth images

Info

Publication number
WO2021244364A1
Authority
WO
WIPO (PCT)
Prior art keywords
ground
marker
human body
depth
fitting formula
Prior art date
Application number
PCT/CN2021/095972
Other languages
English (en)
French (fr)
Inventor
荆伟
尹延涛
梁贵钘
李永翔
Original Assignee
苏宁易购集团股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 苏宁易购集团股份有限公司
Publication of WO2021244364A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/10 Segmentation; Edge detection
    • G06T7/136 Segmentation; Edge detection involving thresholding
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/10 Segmentation; Edge detection
    • G06T7/194 Segmentation; Edge detection involving foreground-background segmentation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/80 Analysis of captured images to determine intrinsic or extrinsic camera parameters, i.e. camera calibration
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10028 Range image; Depth image; 3D point clouds
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20212 Image combination
    • G06T2207/20221 Image fusion; Image merging
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/30 Subject of image; Context of image processing
    • G06T2207/30196 Human being; Person

Definitions

  • the present invention relates to the technical field of image recognition, in particular to a method and device for detecting pedestrians based on depth images.
  • the few existing pedestrian detection systems mostly adopt an oblique-shooting scheme.
  • the advantage is that the shooting area is large, which makes it easy to obtain more characteristic information.
  • however, occlusion causes some characteristic information to be lost: for example, when two people walk side by side, part of one person's body features will be occluded by the other person. In a complex scene such as an unmanned store, occlusion may make it impossible to settle the bill when leaving the store, which harms the user's shopping experience.
  • the purpose of the present invention is to provide a method and device for pedestrian detection based on depth images.
  • pedestrian data in a scene is collected through overhead shooting with a depth camera, which effectively solves the problem of missing occlusion information caused by oblique shooting with a single camera and improves the accuracy of the pedestrian detection data.
  • the first aspect of the present invention provides a pedestrian detection method based on a depth image, where the depth image is acquired by a depth camera overhead, and the method includes:
  • the region growth method is adopted to merge and/or segment the human body regions in the foreground region to obtain human body detection data.
  • the method for constructing the ground fitting formula based on the ground area frame-selected in the first depth image includes:
  • the method for constructing the corresponding marker fitting formula based on a marker area includes:
  • S21: count a data set in one-to-one correspondence with the marker area, the data set including a plurality of image points;
  • S23: construct an initial marker fitting formula based on the currently selected n image points, traverse the unselected image points in the initial data set, and substitute them in turn into the initial marker fitting formula to calculate the marker fitting value of each corresponding image point;
  • the method of fusing the ground mask established by the ground fitting formula and the marker mask established by each marker fitting formula to obtain the background mask of the current scene includes:
  • the method for updating the background of the background mask according to the pixels in the second depth image of multiple consecutive frames and the pixels in the background mask includes:
  • the method of comparing, pixel by pixel, the third depth image obtained in real time with the updated background mask and locking the foreground area containing human-body pixels in the third depth image includes:
  • the method of merging and/or segmenting the human body regions in the foreground region by using the region growth method includes:
  • the connected-domain labeling algorithm is used to identify the sets of human-body pixel points in the foreground area;
  • each connecting line is orthographically projected onto the ground area to obtain the angle between each connecting line and the ground equation;
  • if the human-body distance is greater than the distance threshold, the two human-body pixel point sets correspond to human-body regions generated by two different human bodies; otherwise, the two sets are regarded as human-body regions generated by the same human body.
  • the method for obtaining human body detection data includes:
  • down-sampling is used to find the local highest pixel of a human body area or multiple human body areas;
  • the head area of one or more human-body regions is locked by the region-growing method, and the human-body detection data corresponding to the one or more human-body regions is calculated using the ground equation, the human-body detection data including the body height and the pixel-point coordinates of the head.
  • the marker area is a shelf area
  • the second aspect of the present invention provides a depth image-based pedestrian detection device, which is applied to the depth image-based pedestrian detection method described in the above technical solution, and the device includes:
  • a fitting formula construction unit, which constructs a ground fitting formula based on the ground area frame-selected in the first depth image, and constructs marker fitting formulas in one-to-one correspondence with at least one marker area;
  • a mask generation unit, which fuses the ground mask established by the ground fitting formula with the marker masks established by the marker fitting formulas to obtain the background mask of the current scene;
  • a mask update unit, which updates the background of the background mask according to the pixels in multiple consecutive frames of second depth images and the pixels in the background mask;
  • a foreground region recognition unit, which compares, pixel by pixel, the third depth image acquired in real time with the updated background mask and locks the foreground region containing human-body pixels in the third depth image;
  • a human-body detection unit, which merges and/or segments the human-body regions in the foreground region by region growing to obtain human-body detection data.
  • a third aspect of the present invention provides a computer-readable storage medium on which a computer program is stored, and when the computer program is run by a processor, the steps of the above-mentioned depth image-based pedestrian detection method are executed.
  • the depth image-based pedestrian detection method provided by the present invention can be divided into an algorithm preparation phase, an algorithm initialization phase, and an algorithm detection application phase in actual application.
  • the algorithm preparation phase is also the background mask generation phase.
  • the specific process is: first obtain a depth image of the current detection scene through the depth camera, frame-select the ground area and at least one marker area in the first depth image, and construct the ground fitting formula and the corresponding marker fitting formulas; then fuse the ground mask established by the ground fitting formula with the marker masks established by the marker fitting formulas to obtain the background mask of the current scene.
  • the algorithm initialization stage is also the background mask update stage.
  • the specific process is: the background of the background mask is updated according to the pixel parameter values in the acquired multiple frames of continuous second depth images and the pixel parameter values in the background mask.
  • the algorithm detection application stage can be divided into the foreground region recognition stage and the human body region detection stage.
  • the corresponding specific process is: compare, pixel by pixel, the third depth image acquired in real time with the updated background mask to lock the foreground region containing human-body pixels in the third depth image;
  • the human-body regions in the foreground region are then merged and/or segmented using the region-growing method to obtain the human-body detection data of one or more human-body regions.
  • the present invention obtains the depth image in overhead-shooting mode and establishes the background mask, which solves the problem of missing information caused by oblique shooting and broadens the scenes to which pedestrian detection applies.
  • in addition, compared with an ordinary camera, a depth camera adds an information dimension to the image, so data including the height of the human body and the three-dimensional space coordinates of the head can be obtained, which improves the accuracy of the pedestrian detection data.
  • FIG. 1 is a schematic flowchart of a method for detecting pedestrians based on depth images in Embodiment 1 of the present invention.
  • this embodiment provides a method for detecting pedestrians based on depth images.
  • the depth images are acquired by a depth camera overhead.
  • the method includes:
  • the region growth method is adopted to merge and/or segment the human body regions in the foreground region to obtain human body detection data.
  • the depth image-based pedestrian detection method provided in this embodiment can be divided into an algorithm preparation phase, an algorithm initialization phase, and an algorithm detection application phase in actual application.
  • the algorithm preparation phase is also the background mask generation phase.
  • the specific process is: first obtain a depth image of the current detection scene through the depth camera, frame-select the ground area and at least one marker area in the first depth image to construct the ground fitting formula and the corresponding marker fitting formulas; then fuse the ground mask established by the ground fitting formula with the marker masks established by the marker fitting formulas to obtain the background mask of the current scene.
  • the algorithm initialization stage is also the background mask update stage.
  • the specific process is: the background of the background mask is updated according to the pixel parameter values in the acquired multiple frames of continuous second depth images and the pixel parameter values in the background mask.
  • the algorithm detection application stage can be divided into the foreground region recognition stage and the human body region detection stage.
  • the corresponding specific process is: compare, pixel by pixel, the third depth image acquired in real time with the updated background mask to lock the foreground region containing human-body pixels in the third depth image;
  • the human-body regions in the foreground region are then merged and/or segmented using the region-growing method to obtain the human-body detection data of one or more human-body regions.
  • this embodiment obtains the depth image by overhead shooting and establishes the background mask, which solves the problem of missing information caused by oblique shooting and broadens the scenes to which pedestrian detection applies.
  • in addition, compared with an ordinary camera, a depth camera adds an information dimension to the image, so data including the height of the human body and the three-dimensional space coordinates of the head can be obtained, which improves the accuracy of the pedestrian detection data.
  • the difference between the first depth image, the second depth image, and the third depth image in the above-mentioned embodiments is only the difference in use.
  • the first depth image is used for constructing the ground fitting formula and the marker fitting formulas;
  • the second depth image is used for updating the background mask
  • the third depth image is a real-time detection image used to obtain human body detection data.
  • for example, the first frame of depth image obtained by the depth camera shooting the monitored area from overhead is used as the first depth image, and the depth images from the second frame to the 100th frame are used as the second depth images.
  • after the background mask update is completed, the real-time images obtained by the depth camera shooting the monitored area from overhead are used as the third depth images.
  • the method of constructing the ground fitting formula based on the ground area frame-selected in the first depth image includes:
  • the method for constructing the corresponding marker fitting formula based on a marker area includes:
  • S21: count a data set in one-to-one correspondence with the marker area, the data set including a plurality of image points;
  • S23: construct an initial marker fitting formula based on the currently selected n image points, traverse the unselected image points in the initial data set, and substitute them in turn into the initial marker fitting formula to calculate the marker fitting value of each corresponding image point;
  • i represents the number of the depth camera; if only one depth camera is used in the whole scene, the value of i is 1, that is, the ground fitting formula is constructed only for the first depth image taken by that depth camera; if w depth cameras are used in the whole scene, i traverses 1 to w, that is, a corresponding ground fitting formula must be constructed for the first depth image taken by each of the w depth cameras.
  • each traversed image point is substituted into the initial ground fitting formula to calculate its ground fitting value error_current, and the ground fitting values smaller than the first threshold e are filtered out to form the effective ground-fitting-value set corresponding to this round's initial ground fitting formula.
  • the accumulated result is obtained by summing all ground fitting values in this round's effective ground-fitting-value set.
  • if the accumulated result of this round passes the check, the ground fitting formula is constructed from the values of a, b, c, and d in this round's initial ground fitting formula; otherwise the above steps are repeated for the next round, that is, 3 image points are re-selected to form the ground initial data set, an initial ground fitting formula is constructed, and the accumulated result of all ground fitting values in that round is obtained, until all rounds are finished.
  • the initial ground fitting formula corresponding to the minimum accumulated result of all ground fitting values over all rounds is then defined as the ground fitting formula.
  • in this way, the interference of abnormal points can be effectively avoided, and the obtained ground fitting formula fits the ground better.
  • the values of a, b, c, and d in the ground fitting formula are obtained with a random sample consensus (RANSAC) algorithm, so the obtained ground fitting formula can serve as the optimal model of the ground area in the first depth image, effectively filtering out the influence of abnormal points and preventing the established ground equation from deviating from the ground.
  • the construction process of the marker fitting formula is logically consistent with the construction process of the ground fitting formula.
  • this embodiment will not repeat the details here, but it should be emphasized that, since there is usually more than one marker area, marker fitting formulas must be constructed in one-to-one correspondence with the multiple marker areas.
  • the method of fusing the ground mask established by the ground fitting formula and the marker mask established by each marker fitting formula to obtain the background mask of the current scene includes:
  • the distance from an image point (x, y, z) to a plane is calculated as |ax+by+cz+d| / √(a²+b²+c²); when a, b, c, and d take the values in the ground fitting formula the equation represents the ground equation, and when they take the values in a marker fitting formula the equation represents the corresponding marker equation.
  • the ground distance and marker distances of each image point are obtained by traversing all image points in the first depth image and substituting them into the ground equation and the marker equations respectively; the image points whose ground distance is smaller than the ground threshold are filled in as the ground mask, and the image points whose marker distance is smaller than the marker threshold are filled in as the marker masks.
  • for example, the ground threshold and the marker threshold are both set to 10 cm, that is, the area within 10 cm of the ground is defined as the ground mask and the area within 10 cm of a marker is defined as that marker's mask; finally, the fusion of the ground mask and all marker masks is defined as the background mask of the current scene.
  • the method for updating the background of the background mask according to the pixels in multiple consecutive frames of second depth images and the pixels in the background mask includes:
  • the internal and external parameters of the depth camera are first calibrated so that the image can be converted from two-dimensional coordinates to three-dimensional coordinates and the relevant calculations can be performed with actual physical meaning. Then each depth camera continuously captures 100 frames of second depth images, and the background mask is updated with the 100 frames of second depth images captured by each depth camera.
  • the update process is as follows: by comparing the depth values of the pixels (row, col) at the same position across the 100 frames of second depth images, the maximum depth value for each position (row, col) is filtered out, so that the depth value of each pixel (row, col) in the output 100th-frame second depth image is the maximum over the 100 frames of second depth images.
  • the purpose of this setting is that, because the depth camera adopts the overhead-shooting scheme, when a passing object (such as a pedestrian walking by) appears in a second depth image, the depth value of the pixels at the corresponding position becomes smaller; taking the maximum therefore removes such transient objects.
  • then each position pixel and its corresponding depth value in the 100th-frame second depth image are compared with the corresponding position pixel and its depth value in the background mask to identify the pixels whose depth value has changed, and the depth value of the pixel at the corresponding position in the background mask is updated to the smaller value of the comparison, ensuring the accuracy of the updated background mask.
  • the pixel point parameter value is represented by the coordinate parameter of the pixel point in the pixel coordinate system
  • the image point parameter value is represented by the coordinate parameter of the image point in the visual coordinate system.
  • the method of comparing the pixels of the third depth image acquired in real time with the updated background mask, and locking the foreground area containing the human body pixels in the third depth image includes:
  • the noise in the third depth image acquired in real time can be effectively filtered out, and the accuracy of foreground region recognition can be improved.
  • the method of merging and/or segmenting the human body regions in the foreground region by using the region growth method includes:
  • the connected-domain labeling algorithm is used to identify the sets of human-body pixel points in the foreground area;
  • each connecting line is orthographically projected onto the ground area to obtain the angle between each connecting line and the ground equation;
  • if the human-body distance is greater than the distance threshold, the two human-body pixel point sets correspond to human-body regions generated by two different human bodies; otherwise, the two sets are regarded as human-body regions generated by the same human body.
  • the connected domain labeling algorithm is used to identify the set of human pixel points in the foreground area.
  • the growth threshold th_grow is set to limit the growth range and define the cut-off condition, and the growth mode is set to the eight-connected growth mode; the pixels in the foreground area are traversed from the top left to the bottom right to identify the human-body pixel point sets: if a pixel has not been traversed, the growth condition to the next pixel is calculated; if the depth difference between the next pixel and the current pixel is smaller than the threshold th_grow, the next pixel becomes the current pixel and growth continues; otherwise the growth of pixels in this direction ends and growth restarts in another direction, until all pixels in the foreground area have been traversed, that is, all human-body pixel point sets have been identified.
  • this scheme controls the growth restriction through the growth threshold to prevent regions from sticking together in dense crowds. Then the center point of each human-body pixel point set is calculated; for each pair of center points, such as center point A and center point B, the straight-line segment between the two points is AB and the angle between AB and the ground equation is θ, and the human-body distance corresponding to the two human-body pixel point sets is calculated as |AB|·cosθ, that is, the projected distance of the two center points on the ground area. If the human-body distance is greater than the distance threshold, the two human-body pixel point sets correspond to human-body regions generated by two different human bodies; otherwise, the two sets are regarded as human-body regions generated by the same human body.
  • the method for obtaining human body detection data includes:
  • down-sampling is used to find the local highest pixel of a human body area or multiple human body areas;
  • the head area of one or more human-body regions is locked by the region-growing method, and the human-body detection data corresponding to the one or more human-body regions is calculated using the ground equation, the human-body detection data including the body height and the pixel-point coordinates of the head.
  • specifically, the down-sampling method is used to find the local highest pixel points of one or more human-body regions based on a set distance interval, and then the head area of each human-body region is obtained through small-area growth.
  • this step is a variant of restricted region growing, allowing growth toward higher points; a threshold is added when growing toward lower points to prevent the head region from growing onto the shoulders. The average of the pixels in the head area is then calculated to obtain the head center point.
  • the human-body detection data includes the head area and body height, as well as the two-dimensional and three-dimensional coordinates of the head center point.
  • in addition, the viewing angles of the depth cameras can be overlapped to maximize the use of each camera's coverage.
  • the cross-camera tracking technology of the REID module is used to realize the pedestrian tracking and detection function.
  • the detection of each depth camera is carried out separately, and finally the data are merged through mutual verification.
  • This embodiment provides a pedestrian detection device based on depth images, including:
  • a fitting formula construction unit, which constructs a ground fitting formula based on the ground area frame-selected in the first depth image, and constructs marker fitting formulas in one-to-one correspondence with at least one marker area;
  • a mask generation unit, which fuses the ground mask established by the ground fitting formula with the marker masks established by the marker fitting formulas to obtain the background mask of the current scene;
  • a mask update unit, which updates the background of the background mask according to the pixels in multiple consecutive frames of second depth images and the pixels in the background mask;
  • a foreground region recognition unit, which compares, pixel by pixel, the third depth image acquired in real time with the updated background mask and locks the foreground region containing human-body pixels in the third depth image;
  • a human-body detection unit, which merges and/or segments the human-body regions in the foreground region by region growing to obtain human-body detection data.
  • the beneficial effects of the depth image-based pedestrian detection device provided by the embodiment of the present invention are the same as the beneficial effects of the depth image-based pedestrian detection method provided in the first embodiment, and will not be repeated here.
  • This embodiment provides a computer-readable storage medium on which a computer program is stored.
  • when the computer program is run by a processor, the steps of the above-mentioned depth-image-based pedestrian detection method are executed.
  • the above-mentioned inventive method can be implemented by a program instructing relevant hardware.
  • the above-mentioned program can be stored in a computer-readable storage medium; when the program is executed, it includes each step of the method in the foregoing embodiments. The storage medium may be a ROM/RAM, a magnetic disk, an optical disk, a memory card, etc.
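The eight-connected, threshold-bounded region growing described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the patent's implementation: the depth image is an (H, W) NumPy array, the foreground is a boolean mask of the same shape, and the th_grow value is an arbitrary stand-in:

```python
import numpy as np
from collections import deque

def grow_regions(depth, fg_mask, th_grow=0.1):
    """Label eight-connected human pixel-point sets inside the foreground mask.

    Growth from a seed continues into a neighbouring pixel only while the
    depth difference to the current pixel stays below th_grow; when no
    direction can extend further, growth for that set ends.
    """
    h, w = depth.shape
    labels = np.zeros((h, w), dtype=int)
    current = 0
    for r in range(h):                      # traverse top-left to bottom-right
        for c in range(w):
            if not fg_mask[r, c] or labels[r, c]:
                continue
            current += 1                    # seed a new human pixel-point set
            queue = deque([(r, c)])
            labels[r, c] = current
            while queue:
                cr, cc = queue.popleft()
                for dr in (-1, 0, 1):       # eight-connected neighbourhood
                    for dc in (-1, 0, 1):
                        nr, nc = cr + dr, cc + dc
                        if (0 <= nr < h and 0 <= nc < w and fg_mask[nr, nc]
                                and not labels[nr, nc]
                                and abs(depth[nr, nc] - depth[cr, cc]) < th_grow):
                            labels[nr, nc] = current
                            queue.append((nr, nc))
    return labels
```

Each returned label corresponds to one human-body pixel point set; the merge/segment test by projected center distance described above is applied afterwards.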

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Image Processing (AREA)
  • Image Analysis (AREA)

Abstract

The present invention discloses a pedestrian detection method and device based on depth images, which collect pedestrian data in a scene through overhead shooting with a depth camera, improving the accuracy of pedestrian detection data. The method comprises: constructing a ground fitting formula based on a ground area frame-selected in a first depth image, and constructing corresponding marker fitting formulas based on at least one marker area; fusing the ground mask established by the ground fitting formula with the marker masks established by the marker fitting formulas to obtain a background mask of the current scene; updating the background of the background mask according to the pixels in multiple frames of continuous second depth images and the pixels in the background mask; comparing, pixel by pixel, a third depth image acquired in real time with the updated background mask, and locking a foreground region containing human-body pixels in the third depth image; and merging and/or segmenting the human-body regions in the foreground region by means of region growing to obtain human-body detection data.

Description

Pedestrian detection method and device based on depth images
Technical field
The present invention relates to the technical field of image recognition, and in particular to a pedestrian detection method and device based on depth images.
Background art
In the era of booming artificial intelligence, new things keep springing up: unmanned supermarkets, unmanned stores and other novelties are emerging one after another. Following the trend of intelligent retail, combining offline retail with artificial intelligence to provide a brand-new shopping experience as smooth as online shopping has become a new research direction. By using full-coverage cameras in a closed scene to capture the behavior trajectory of every customer entering the scene, services such as product recommendation and settlement can be provided in real time, truly achieving a grab-and-go, friction-free shopping experience.
Technical problem
At present, the few existing pedestrian detection systems mostly adopt an oblique-shooting scheme. Its advantage is that the projected shooting area is large, which makes it easy to obtain more feature information, but it brings an occlusion problem that causes some feature information to be lost. For example, when two people walk side by side, part of one person's body features will be occluded by the other person. In a complex scene such as an unmanned store, occlusion may make it impossible to settle the bill when leaving the store, harming the user's shopping experience.
Technical solution
The purpose of the present invention is to provide a pedestrian detection method and device based on depth images, which collect pedestrian data in a scene through overhead shooting with a depth camera, effectively solving the problem of missing occlusion information caused by oblique shooting with a single camera and improving the accuracy of pedestrian detection data.
To achieve the above purpose, a first aspect of the present invention provides a pedestrian detection method based on depth images, the depth images being acquired by overhead shooting with a depth camera, the method comprising:
constructing a ground fitting formula based on a ground area frame-selected in a first depth image, and constructing marker fitting formulas in one-to-one correspondence with at least one marker area;
fusing the ground mask established by the ground fitting formula with the marker masks established by the marker fitting formulas to obtain a background mask of the current scene;
updating the background of the background mask according to the pixels in multiple frames of continuous second depth images and the pixels in the background mask;
comparing, pixel by pixel, a third depth image acquired in real time with the updated background mask, and locking a foreground region containing human-body pixels in the third depth image;
merging and/or segmenting the human-body regions in the foreground region by means of region growing to obtain human-body detection data.
Preferably, the method for constructing the ground fitting formula based on the ground area frame-selected in the first depth image comprises:
S11: counting a data set corresponding to the ground area, the data set comprising a plurality of image points;
S12: randomly selecting n image points from the ground area to form a ground initial data set, where n ≥ 3 and n is an integer;
S13: constructing an initial ground fitting formula based on the currently selected n image points, traversing the unselected image points in the initial data set, and substituting them in turn into the initial ground fitting formula to calculate the ground fitting value of each corresponding image point;
S14: filtering out the ground fitting values smaller than a first threshold to generate the effective ground-fitting-value set of round i, the initial value of i being 1;
S15: when the ratio of the number of image points corresponding to the effective ground-fitting-value set of round i to the total number of image points in the ground area is greater than a second threshold, accumulating all ground fitting values in the effective ground-fitting-value set of round i;
S16: when the accumulated result of all ground fitting values in round i is smaller than a third threshold, defining the initial ground fitting formula of round i as the ground fitting formula; when the accumulated result of all ground fitting values of round i is greater than the third threshold, setting i = i + 1 and returning to step S12 if i has not reached the threshold number of rounds, otherwise executing step S17;
S17: defining the initial ground fitting formula corresponding to the minimum accumulated result of all ground fitting values over all rounds as the ground fitting formula.
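Steps S11 to S17 above amount to a RANSAC-style plane fit. The following is a minimal sketch under stated assumptions: the ground area arrives as an (N, 3) array of 3-D image points; the inlier_thresh, ratio_thresh and rounds values are illustrative stand-ins for the patent's first, second and third thresholds; and, for simplicity, the sketch always runs all rounds and keeps the minimum accumulated result (the S17 path) rather than terminating early:

```python
import numpy as np

def fit_ground_plane(points, rounds=50, inlier_thresh=0.05, ratio_thresh=0.6, rng=None):
    """RANSAC-style plane fit over an (N, 3) array of ground image points.

    Returns (a, b, c, d) with a*x + b*y + c*z + d = 0, or None if no round
    passes the inlier-ratio check. Threshold values are illustrative.
    """
    rng = np.random.default_rng(rng)
    best_coeffs, best_score = None, np.inf
    n = len(points)
    for _ in range(rounds):                       # S12: start a new round
        p0, p1, p2 = points[rng.choice(n, 3, replace=False)]
        normal = np.cross(p1 - p0, p2 - p0)       # S13: plane through 3 points
        norm = np.linalg.norm(normal)
        if norm < 1e-9:                           # degenerate (collinear) sample
            continue
        normal /= norm
        d = -normal @ p0
        errors = np.abs(points @ normal + d)      # fitting value per image point
        inliers = errors[errors < inlier_thresh]  # S14: effective set
        if len(inliers) / n <= ratio_thresh:      # S15: ratio check
            continue
        score = inliers.sum()                     # S15: accumulate fitting values
        if score < best_score:                    # S16/S17: keep the minimum
            best_coeffs, best_score = (*normal, d), score
    return best_coeffs
```

The returned coefficients play the role of the a, b, c, d values from which the ground equation is built in the following steps.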
Preferably, the method for constructing the corresponding marker fitting formula based on a marker area comprises:
S21: counting a data set in one-to-one correspondence with the marker area, the data set comprising a plurality of image points;
S22: randomly selecting n image points from the marker area to form a marker initial data set, where n ≥ 3 and n is an integer;
S23: constructing an initial marker fitting formula based on the currently selected n image points, traversing the unselected image points in the initial data set, and substituting them in turn into the initial marker fitting formula to calculate the marker fitting value of each corresponding image point;
S24: filtering out the marker fitting values smaller than the first threshold to generate the effective marker-fitting-value set of round i, the initial value of i being 1;
S25: when the ratio of the number of image points corresponding to the effective marker-fitting-value set of round i to the total number of image points in the marker area is greater than the second threshold, accumulating all marker fitting values in the effective marker-fitting-value set of round i;
S26: when the accumulated result of all marker fitting values in round i is smaller than the third threshold, defining the initial marker fitting formula of round i as the marker fitting formula; when the accumulated result of all marker fitting values of round i is greater than the third threshold, setting i = i + 1 and returning to step S22 if i has not reached the threshold number of rounds, otherwise executing step S27;
S27: defining the initial marker fitting formula corresponding to the minimum accumulated result of all marker fitting values over all rounds as the marker fitting formula.
Further, the method for fusing the ground mask established by the ground fitting formula with the marker masks established by the marker fitting formulas to obtain the background mask of the current scene comprises:
constructing a ground equation based on the ground fitting formula, and constructing marker equations based on the marker fitting formulas;
traversing the image points in the first depth image and substituting them into the ground equation and the marker equations respectively to obtain the ground distance and marker distances of each image point;
filtering out the image points whose ground distance is smaller than a ground threshold and filling them in as the ground mask, and filtering out the image points whose marker distance is smaller than a marker threshold and filling them in as the marker masks;
fusing the ground mask with all marker masks to obtain the background mask of the current scene.
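The fusion step can be sketched as follows, with each plane represented by its coefficients (a, b, c, d) and the point-to-plane distance computed as |ax + by + cz + d| / sqrt(a² + b² + c²). The 10 cm threshold follows the example given elsewhere in this description; the flat (N, 3) point layout is an assumption of the sketch:

```python
import numpy as np

def plane_distance(points, coeffs):
    """Perpendicular distance |ax+by+cz+d| / sqrt(a^2+b^2+c^2) of (N, 3) points."""
    a, b, c, d = coeffs
    return np.abs(points @ np.array([a, b, c]) + d) / np.sqrt(a * a + b * b + c * c)

def build_background_mask(points, ground, markers, thresh=0.10):
    """Union of the ground mask and all marker masks as a boolean (N,) mask.

    A point belongs to the background when it lies within `thresh` (10 cm
    in the example) of the ground plane or of any marker plane.
    """
    mask = plane_distance(points, ground) < thresh    # ground mask
    for m in markers:                                 # one mask per marker plane
        mask |= plane_distance(points, m) < thresh    # fuse by union
    return mask
```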
Preferably, the method for updating the background of the background mask according to the pixels in the multiple frames of continuous second depth images and the pixels in the background mask comprises:
comparing, in turn, the depth values of the pixels at corresponding positions in the m-th frame second depth image and the (m+1)-th frame second depth image, the initial value of m being 1;
identifying the pixels whose depth value has changed, updating the depth value of the pixel at the corresponding position in the (m+1)-th frame second depth image to the larger value of the comparison, setting m = m + 1, and comparing the depth values of the pixels at corresponding positions in the m-th and (m+1)-th frame second depth images again, until the pixels at each position and their corresponding depth values in the last frame second depth image are obtained;
comparing the pixels at each position and their corresponding depth values in the last frame second depth image with the pixels at each position and their corresponding depth values in the background mask;
identifying the pixels whose depth value has changed, and updating the depth value of the pixel at the corresponding position in the background mask to the smaller value of the comparison.
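A compact sketch of this two-stage update, assuming the frames are stacked in a (T, H, W) array; the iterative pairwise maximum over consecutive frames is equivalent to a per-pixel maximum over the whole stack:

```python
import numpy as np

def update_background(mask_depth, frames):
    """Update an (H, W) background depth map from a (T, H, W) frame stack.

    The per-pixel maximum over the frames suppresses passing objects, which
    reduce the depth value under an overhead camera; the mask then keeps the
    smaller of its current value and that maximum.
    """
    frame_max = frames.max(axis=0)        # stage 1: pairwise max over frames
    return np.minimum(mask_depth, frame_max)  # stage 2: min against the mask
```

Because pedestrians under an overhead camera lower the depth value, the maximum removes them from the frames; the final minimum against the mask keeps the nearer static structure at each pixel.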
Preferably, the method for comparing, pixel by pixel, the third depth image acquired in real time with the updated background mask and locking the foreground region containing human-body pixels in the third depth image comprises:
comparing the pixels at each position and their corresponding depth values in the third depth image acquired in real time with the pixels at each position and their corresponding depth values in the updated background mask;
identifying the pixels in the third depth image whose depth value has become smaller, and aggregating them to obtain the foreground region containing human-body pixels.
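A sketch of the foreground lock, assuming (H, W) depth arrays. The noise margin is an illustrative addition (the description only requires the depth value to have become smaller), reflecting the noise filtering mentioned elsewhere in this text:

```python
import numpy as np

def foreground_mask(depth, background, margin=0.05):
    """Pixels whose depth became smaller than the background (minus a small
    noise margin) are candidate human-body pixels in the overhead view."""
    return depth < (background - margin)
```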
Further, the method for merging and/or segmenting the human-body regions in the foreground region by means of region growing comprises:
identifying the human-body pixel point sets in the foreground region with a connected-domain labeling algorithm according to a set growth threshold;
identifying the number of human-body pixel point sets in the foreground region, and, when there are multiple sets, calculating the center point of each set;
connecting the obtained center points pairwise and calculating the length of each connecting line, while orthographically projecting each connecting line onto the ground area to obtain the angle θ between each connecting line and the ground equation; and obtaining, based on the connecting-line length and the corresponding angle θ, the human-body distance corresponding to the two human-body pixel point sets;
when the human-body distance is greater than a distance threshold, treating the two human-body pixel point sets as human-body regions generated by two different human bodies; otherwise, regarding the two sets as human-body regions generated by the same human body.
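A sketch of the merge/segment decision. It reads the step above as taking the human-body distance to be the ground-plane projection |AB|·cosθ of the segment between the two set centers, computed here by removing the component of AB along the ground normal; the distance threshold value is illustrative:

```python
import numpy as np

def same_body(center_a, center_b, ground_normal, dist_thresh=0.4):
    """Decide whether two human pixel-point sets belong to one body.

    The center-to-center segment AB is projected orthographically onto the
    ground plane; if the projected length |AB|*cos(theta) exceeds the
    threshold, the sets are treated as two different bodies.
    """
    ab = np.asarray(center_b, float) - np.asarray(center_a, float)
    n = np.asarray(ground_normal, float)
    n = n / np.linalg.norm(n)
    projected = ab - (ab @ n) * n        # component of AB within the ground plane
    return np.linalg.norm(projected) <= dist_thresh
```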
Preferably, the method for obtaining the human-body detection data comprises:
finding the local highest pixel points of one or more human-body regions by down-sampling based on a set distance interval;
locking the head area of the one or more human-body regions by region growing, and calculating the human-body detection data corresponding to the one or more human-body regions using the ground equation, the human-body detection data including the body height and the pixel-point coordinates of the head.
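Once the head region is locked, body height follows from the distance of the head point to the ground plane, a direct application of the ground equation. A minimal sketch with the plane given as (a, b, c, d):

```python
import numpy as np

def body_height(head_point, ground):
    """Body height as the perpendicular distance from the head's 3-D point
    (x, y, z) to the ground plane a*x + b*y + c*z + d = 0."""
    a, b, c, d = ground
    x, y, z = head_point
    return abs(a * x + b * y + c * z + d) / np.sqrt(a * a + b * b + c * c)
```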
Preferably, the marker area is a shelf area.
A second aspect of the present invention provides a pedestrian detection device based on depth images, applied to the pedestrian detection method based on depth images described in the above technical solution, the device comprising:
a fitting formula construction unit, which constructs a ground fitting formula based on the ground area frame-selected in the first depth image, and constructs marker fitting formulas in one-to-one correspondence with at least one marker area;
a mask generation unit, which fuses the ground mask established by the ground fitting formula with the marker masks established by the marker fitting formulas to obtain the background mask of the current scene;
a mask update unit, which updates the background of the background mask according to the pixels in multiple frames of continuous second depth images and the pixels in the background mask;
a foreground region recognition unit, which compares, pixel by pixel, the third depth image acquired in real time with the updated background mask and locks the foreground region containing human-body pixels in the third depth image;
a human-body detection unit, which merges and/or segments the human-body regions in the foreground region by means of region growing to obtain the human-body detection data.
A third aspect of the present invention provides a computer-readable storage medium on which a computer program is stored; when the computer program is run by a processor, the steps of the above pedestrian detection method based on depth images are executed.
Beneficial effects
In practical application, the pedestrian detection method based on depth images provided by the present invention can be divided into an algorithm preparation phase, an algorithm initialization phase and an algorithm detection application phase. The algorithm preparation phase is the background mask generation phase, and its specific process is: first obtain a depth image of the current detection scene through overhead shooting with the depth camera, frame-select the ground area and at least one marker area in the first depth image, and construct the ground fitting formula and the corresponding marker fitting formulas; then fuse the ground mask established by the ground fitting formula with the marker masks established by the marker fitting formulas to obtain the background mask of the current scene. The algorithm initialization phase is the background mask update phase, and its specific process is: update the background of the background mask according to the pixel parameter values in the acquired multiple frames of continuous second depth images and the pixel parameter values in the background mask. The algorithm detection application phase can be divided into a foreground region recognition stage and a human-body region detection stage, and the corresponding specific process is: compare, pixel by pixel, the third depth image acquired in real time with the updated background mask, lock the foreground region containing human-body pixels in the third depth image, and merge and/or segment the human-body regions in the foreground region by means of region growing to obtain the human-body detection data of one or more human-body regions.
It can be seen that the present invention obtains the depth image by overhead shooting and establishes the background mask, which solves the problem of missing information caused by occlusion under oblique shooting and broadens the scenes to which pedestrian detection applies. In addition, compared with an ordinary camera, a depth camera adds an information dimension to the image, so that data including the height of the human body and the three-dimensional space coordinates of the head can be obtained, improving the accuracy of the pedestrian detection data.
Brief description of the drawings
The drawings described here are provided for a further understanding of the present invention and constitute a part of the present invention. The illustrative embodiments of the present invention and their descriptions are used to explain the present invention and do not constitute an improper limitation of the present invention. In the drawings:
Fig. 1 is a schematic flowchart of the pedestrian detection method based on depth images in Embodiment 1 of the present invention.
Embodiments of the present invention
To make the above purposes, features and advantages of the present invention more obvious and understandable, the technical solutions in the embodiments of the present invention will be described clearly and completely below with reference to the drawings of the embodiments. Obviously, the described embodiments are only a part of the embodiments of the present invention, not all of them. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
实施例一
请参阅图1,本实施例提供一种基于深度图像的行人检测方法,深度图像由深度摄像头俯拍获取,该方法包括:
基于第一深度图像中框选的地面区域构建地面拟合公式,以及基于至少一个标记物区域构建与标记物区域一一对应的标记物拟合公式;
将由地面拟合公式建立的地面蒙版以及各标记物拟合公式建立的标记物蒙版融合,得到当前场景的背景蒙版;
根据多帧连续的第二深度图像中的像素点以及背景蒙版中的像素点,对背景蒙版进行背景更新;
将实时获取的第三深度图像与更新后的背景蒙版进行像素点比对,锁定第三深度图像中包含人体像素的前景区域;
采用区域生长方式对前景区域中的人体区域进行合并和/或分割处理,得到人体检测数据。
本实施例提供的基于深度图像的行人检测方法,实际应用时可将其划分为算法准备阶段、算法初始化阶段和算法检测应用阶段,其中,算法准备阶段也即背景蒙版生成阶段,其具体过程为:首先获取通过深度摄像头俯拍当前检测场景的深度图像,并在第一深度图像中框选出地面区域和至少一个标记物区域,构建出地面拟合公式及对应的标记物拟合公式,然后将由地面拟合公式建立的地面蒙版以及各标记物拟合公式建立的标记物蒙版融合,得到当前场景的背景蒙版。算法初始化阶段也即背景蒙版更新阶段,其具体过程为:根据获取的多帧连续第二深度图像中的像素点参数值以及背景蒙版中的像素点参数值,对背景蒙版进行背景更新。算法检测应用阶段可分为前景区域识别阶段和人体区域检测阶段,其对应的具体过程为:将实时获取的第三深度图像与更新后的背景蒙版进行像素点比对,锁定第三深度图像中包含人体像素的前景区域,采用区域生长方式对前景区域中的人体区域进行合并和/或分割处理,得到一个人体区域或者多个人体区域的人体检测数据。
可见，本实施例使用俯拍方式获取深度图像并据此建立背景蒙版，解决了斜拍带来遮挡导致信息缺失的问题，拓展了行人检测的适用场景；另外，使用深度相机相比普通相机增加了图像的信息维度，可获取到包括人体身高和头部三维空间坐标在内的数据，提高了行人检测数据的准确性。
需要说明的是，上述实施例中的第一深度图像、第二深度图像和第三深度图像的区别仅在于用途不同，其中，第一深度图像是为构建地面拟合公式和标记物拟合公式所使用，第二深度图像是为更新背景蒙版所使用，第三深度图像是用于获取人体检测数据的实时检测图像。例如，将通过深度摄像头俯拍监控区域得到的第1帧深度图像作为第一深度图像，将第2帧至第100帧深度图像作为第二深度图像，在背景蒙版更新完成后，将深度摄像头俯拍监控区域得到的实时图像作为第三深度图像。
上述实施例中,基于第一深度图像中框选的地面区域构建地面拟合公式的方法包括:
S11,统计与地面区域对应的数据集合,所述数据集合包括多个图像点;
S12,从地面区域中随机选择n个图像点组建地面初始数据集,n≥3且n为整数;
S13,基于当前选择的n个图像点构建初始地面拟合公式,遍历初始数据集中未被选择的图像点,将其依次代入初始地面拟合公式计算对应图像点的地面拟合值;
S14,将小于第一阈值的地面拟合值筛选出来,生成第i轮的有效地面拟合值集合,i的初始值为1;
S15,当第i轮的有效地面拟合值集合对应的图像点数量与地面区域中图像点总数量的比值大于第二阈值,则将第i轮有效地面拟合值集合中的全部地面拟合值累加;
S16,当第i轮中全部地面拟合值的累加结果小于第三阈值,则将第i轮对应的初始地面拟合公式定义为地面拟合公式,当第i轮对应的全部地面拟合值累加结果大于第三阈值,令i=i+1,并在i未达到阈值轮数时返回步骤S12,否则执行步骤S17;
S17,将所有轮中全部地面拟合值累加结果最小值对应的初始地面拟合公式定义为地面拟合公式。
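上述S11至S17的迭代筛选流程与经典的RANSAC（随机抽样一致性）平面拟合思路一致。下面给出一个纯Python的示意实现，仅作为按上述步骤拟合平面的草图：其中函数名fit_plane_ransac及参数e、d、e_sum、max_rounds均为示意性命名（分别对应文中的第一阈值、第二阈值、第三阈值和阈值轮数），并非本文原始符号：

```python
import random
import math

def fit_plane_ransac(points, e=0.05, d=0.6, e_sum=1.0, max_rounds=50):
    """对框选区域内的三维图像点集拟合平面 ax+by+cz+d0=0 的示意实现.
    e: 第一阈值(单点拟合误差上限); d: 第二阈值(有效点占比下限);
    e_sum: 第三阈值(误差累加上限); max_rounds: 阈值轮数."""
    best_plane, best_err = None, float("inf")
    for _ in range(max_rounds):
        # S12: 随机选择3个图像点(n≥3的最小取值)确定一个候选平面
        p1, p2, p3 = random.sample(points, 3)
        v1 = [p2[i] - p1[i] for i in range(3)]
        v2 = [p3[i] - p1[i] for i in range(3)]
        a = v1[1] * v2[2] - v1[2] * v2[1]
        b = v1[2] * v2[0] - v1[0] * v2[2]
        c = v1[0] * v2[1] - v1[1] * v2[0]
        norm = math.sqrt(a * a + b * b + c * c)
        if norm < 1e-9:
            continue  # 三点近似共线, 重新采样
        d0 = -(a * p1[0] + b * p1[1] + c * p1[2])
        # S13: 遍历其余图像点, 以点到平面距离作为地面拟合值
        errors = [abs(a * x + b * y + c * z + d0) / norm
                  for (x, y, z) in points if (x, y, z) not in (p1, p2, p3)]
        # S14: 筛选小于第一阈值的有效拟合值
        valid = [err for err in errors if err < e]
        # S15: 有效点占比需大于第二阈值才进入累加
        if len(valid) / len(points) <= d:
            continue
        err_sum = sum(valid)
        # S16: 累加结果小于第三阈值则直接采纳本轮公式
        if err_sum < e_sum:
            return (a, b, c, d0)
        # S17: 否则记录所有轮中累加结果最小的候选
        if err_sum < best_err:
            best_plane, best_err = (a, b, c, d0), err_sum
    return best_plane
```

实际工程中也可直接使用现成点云库（如Open3D的segment_plane）完成同类平面拟合，此处仅为演示上述步骤的逻辑。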
上述实施例中,基于标记物区域构建对应的标记物拟合公式的方法包括:
S21,统计与标记物区域一一对应的数据集合,所述数据集合中包括多个图像点;
S22,从标记物区域中随机选择n个图像点组建标记物初始数据集,n≥3且n为整数;
S23,基于当前选择的n个图像点构建初始标记物拟合公式,遍历初始数据集中未被选择的图像点,将其依次代入初始标记物拟合公式计算对应图像点的标记物拟合值;
S24,将小于第一阈值的标记物拟合值筛选出来,生成第i轮的有效标记物拟合值集合,i的初始值为1;
S25,当第i轮的有效标记物拟合值集合对应的图像点数量与标记物区域中图像点总数量的比值大于第二阈值,则将第i轮有效标记物拟合值集合中的全部标记物拟合值累加;
S26,当第i轮中全部标记物拟合值的累加结果小于第三阈值,则将第i轮对应的初始标记物拟合公式定义为标记物拟合公式,当第i轮对应的全部标记物拟合值累加结果大于第三阈值,令i=i+1,并在i未达到阈值轮数时返回步骤S22,否则执行步骤S27;
S27,将所有轮中全部标记物拟合值累加结果最小值对应的初始标记物拟合公式定义为标记物拟合公式。
具体实施时，下文以地面拟合公式为例进行说明：
首先通过程序设定的交互模式框选出地面区域，筛选出仅包含地面图像点的数据集合，然后随机选择3个图像点组建地面初始数据集，采用平面公式拟合初始地面拟合公式 a_i·x + b_i·y + c_i·z + d_i = 0，其中，i表示深度摄像头的编号：若全场景仅使用1台深度摄像头，则i的取值为1，也即仅针对这一台深度摄像头拍摄的第一深度图像构建地面拟合公式；若全场景使用了w台深度摄像头，则i的取值分别遍历1至w，也即需要针对这w台深度摄像头拍摄的第一深度图像一一构建出对应的地面拟合公式。
在初始地面拟合公式构建完成后，遍历初始数据集中未被选择的图像点（除已选择的3个图像点外），将每个图像点对应的视觉坐标值(x, y, z)依次代入初始地面拟合公式 a·x + b·y + c·z + d = 0，计算出遍历的图像点对应的地面拟合值error_current，将小于第一阈值e的地面拟合值筛选出来，组成与本轮初始地面拟合公式对应的有效地面拟合值集合。在本轮有效地面拟合值集合中对应的图像点数量与地面区域中图像点总数量的比值大于第二阈值d时，将本轮有效地面拟合值集合中的全部地面拟合值累加得到结果error_sum；若本轮中error_sum小于第三阈值error_th，则基于本轮初始地面拟合公式中a、b、c、d的值构建出地面拟合公式；若本轮中error_sum不小于第三阈值error_th，则需重复上述步骤进入下一轮，也即重新选择3个图像点组建地面初始数据集，构建出初始地面拟合公式并得到本轮中的全部地面拟合值累加结果，直至将所有轮中全部地面拟合值累加结果最小值对应的初始地面拟合公式定义为地面拟合公式。
通过上述过程,可有效避免一些异常点的干扰,求得的地面拟合公式更加贴合地面,另外,由于地面拟合公式中 a b c d的值是采用随机一致性算法求得的,因此得到的地面拟合公式可作为第一深度图像中地面区域的最优模型,有效的滤除了异常点的影响,防止建立的地面方程偏离地面。
同理，标记物拟合公式的构建过程与地面拟合公式的构建过程逻辑一致，本实施例在此不做赘述，但需要强调的是，由于标记物区域通常不止一个，故需针对多个标记物区域一一构建对应的标记物拟合公式。
上述实施例中将由地面拟合公式建立的地面蒙版以及各标记物拟合公式建立的标记物蒙版融合,得到当前场景的背景蒙版的方法包括:
基于地面拟合公式构建地面方程,以及基于标记物拟合公式构建标记物方程;
遍历第一深度图像中的图像点,分别代入地面方程和标记物方程得到该图像点的地面距离和标记物距离;
筛选出地面距离小于地面阈值的图像点填充为地面蒙版,以及筛选出标记物距离小于标记物阈值的图像点填充为标记物蒙版;
将地面蒙版和全部的标记物蒙版融合,得到当前场景的背景蒙版。
具体实施时，利用通用的点到平面距离方程 dist = |a·x + b·y + c·z + d| / √(a² + b² + c²) 分别计算地面方程和标记物方程：当分子中的 a·x + b·y + c·z + d 取自地面拟合公式、且分母中的a、b、c为地面拟合公式中的值时，该方程表示地面方程；当分子中的 a·x + b·y + c·z + d 取自标记物拟合公式、且分母中的a、b、c为标记物拟合公式中的值时，该方程表示标记物方程。在地面方程和标记物方程构建完成后，遍历第一深度图像中的全部图像点，并分别代入地面方程和标记物方程得到该图像点的地面距离和标记物距离，筛选出地面距离小于地面阈值的图像点填充为地面蒙版，以及筛选出标记物距离小于标记物阈值的图像点填充为标记物蒙版。
示例性地，地面阈值和标记物阈值均设置为10cm，也即将距地面10cm以内的区域定义为地面蒙版，将距标记物10cm以内的区域定义为标记物蒙版，最终将地面蒙版和全部的标记物蒙版区域定义为当前场景的背景蒙版。通过场景背景蒙版的建立，有效地滤除了标记物区域和地面区域上的噪声，并且解决了深度摄像头拍摄这些区域产生的噪声导致算法性能下降的问题。
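上述“代入方程求距离、再按阈值筛选填充蒙版”的融合过程可用如下纯Python代码示意（点到平面距离为通用公式；阈值0.1对应文中10cm的示例，函数名均为示意性命名）：

```python
import math

def point_plane_distance(point, plane):
    """通用点到平面距离: |a*x + b*y + c*z + d| / sqrt(a^2 + b^2 + c^2)."""
    x, y, z = point
    a, b, c, d = plane
    return abs(a * x + b * y + c * z + d) / math.sqrt(a * a + b * b + c * c)

def build_background_mask(points, ground_plane, marker_planes,
                          ground_th=0.1, marker_th=0.1):
    """遍历第一深度图像的图像点: 距地面方程小于地面阈值的点填充为地面蒙版,
    距任一标记物方程小于标记物阈值的点填充为标记物蒙版,
    两者的并集即当前场景的背景蒙版(以点集合表示)."""
    mask = set()
    for p in points:
        if point_plane_distance(p, ground_plane) < ground_th:
            mask.add(p)  # 地面蒙版
            continue
        for mp in marker_planes:
            if point_plane_distance(p, mp) < marker_th:
                mask.add(p)  # 标记物蒙版
                break
    return mask
```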
上述实施例中,根据多帧连续的第二深度图像中的像素点以及背景蒙版中的像素点,对背景蒙版进行背景更新的方法包括:
依次将第m帧第二深度图像与第m+1帧第二深度图像中各对应位置像素点的深度值进行大小值比对,m的初始值为1;
识别深度值发生变化的像素点,将第m+1帧第二深度图像中对应位置像素点的深度值更新为比对结果中的大值,令m=m+1,重新对第m帧第二深度图像与第m+1帧第二深度图像中各对应位置像素点的深度值进行比对,直至得到最后一帧第二深度图像中各位置像素点及其对应的深度值;
将最后一帧第二深度图像中各位置像素点及其对应的深度值与背景蒙版中各位置像素点及其对应的深度值进行大小值比对;
识别深度值发生变化的像素点,将背景蒙版中对应位置像素点的深度值更新为比对结果中的小值。
具体实施时,首先对深度摄像头的内参和外参进行标定,用来对图像进行二维坐标到三维坐标的转换,以便通过实际的物理意义进行相关计算。然后利用每个深度摄像头连续拍摄100帧第二深度图像,针对每个深度摄像头拍摄的100帧第二深度图像对背景蒙版进行背景更新。更新过程为:通过对100帧第二深度图像中各相同位置像素点(row,col)的深度值进行比较,从100帧第二深度图像中筛选出每个相同位置像素点(row,col) 对应深度值的最大值,使得输出的第100帧第二深度图像中各位置像素点(row,col) 对应的深度值均为上述100帧第二深度图像中的最大值,这样设置的目的在于:由于深度摄像头采用的是俯拍方案,因此当第二深度图像中出现过往物体(如行人穿过)时,相应位置像素点的深度值会变小,通过取100帧第二深度图像中相同位置像素点对应深度值的最大值,可以有效避免第二深度图像偶然出现过往物体造成的影响,避免了背景蒙版中出现过往物体的像素点。然后使用第100帧第二深度图像中各位置像素点及其对应的深度值与背景蒙版中各位置像素点及其对应的深度值进行大小值比对,识别深度值发生变化的像素点,将背景蒙版中对应位置像素点的深度值更新为比对结果中的小值,以确保更新后背景蒙版的准确性。
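上述“先逐像素取多帧最大深度滤除过往行人、再与背景蒙版逐像素取最小值”的更新过程，可用如下纯Python草图示意（以二维列表表示深度图，帧数不限于100，函数名为示意性命名）：

```python
def update_background(frames, background):
    """frames: 多帧第二深度图像(每帧为 rows x cols 的深度值二维列表);
    background: 背景蒙版深度图.
    俯拍场景下行人经过会使对应像素深度值变小, 故先逐像素取全部帧的
    最大深度值滤除过往物体, 再与背景蒙版逐像素取较小值完成背景更新."""
    rows, cols = len(background), len(background[0])
    # 逐像素取多帧中的最大深度值, 等效于文中逐帧两两比对保留大值
    merged = [[max(f[r][c] for f in frames) for c in range(cols)]
              for r in range(rows)]
    # 与背景蒙版逐像素比对, 将蒙版更新为较小值
    return [[min(merged[r][c], background[r][c]) for c in range(cols)]
            for r in range(rows)]
```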
可以理解的是,像素点参数值通过像素点在像素坐标系中的坐标参数表示,图像点参数值通过图像点在视觉坐标系中的坐标参数表示。
上述实施例中,将实时获取的第三深度图像与更新后的背景蒙版进行像素点比对,锁定第三深度图像中包含人体像素的前景区域的方法包括:
将实时获取的第三深度图像中的各位置像素点及其对应的深度值与更新后的背景蒙版中各位置像素点及其对应的深度值进行大小值比对;
识别所述第三深度图像中深度值变小的像素点,汇总得到包含人体像素的前景区域。
通过类似帧差法,可有效滤除实时获取的第三深度图像中的噪声,提升前景区域识别的准确性。
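该类似帧差法的前景提取可示意如下（纯Python草图；其中min_diff为示意性引入的最小差值参数，用于抑制传感器噪声，原文并未给出具体取值）：

```python
def extract_foreground(frame, background, min_diff=0.1):
    """返回实时第三深度图像中深度值明显小于更新后背景蒙版的像素坐标集合,
    这些深度变小的像素汇总即为包含人体像素的前景区域.
    min_diff 为示意性的最小差值阈值."""
    fg = set()
    for r, row in enumerate(frame):
        for c, depth in enumerate(row):
            if background[r][c] - depth > min_diff:
                fg.add((r, c))  # 深度变小, 该位置可能被行人遮挡
    return fg
```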
上述实施例中,采用区域生长方式对前景区域中的人体区域进行合并和/或分割处理的方法包括:
根据设定的生长阈值,采用连通域标记算法识别出前景区域中的人体像素点集;
识别前景区域中人体像素点集的数量,当为多个人体像素点集时分别计算每个人体像素点集的中心点;
将得到的中心点两两连线并计算连线距离,同时将各条连线向地面区域正投影,分别获取到每条连线与地面方程的夹角
Figure 649801dest_path_image001
基于连线距离及对应的夹角
Figure 956279dest_path_image001
,得到两个人体像素点集对应的人体距离;
当人体距离大于距离阈值时,将两个人体像素点集对应为两个不同人体产生的人体区域,反之将两个人体像素点集视为同一个人体产生的人体区域。
具体实施时，通过连通域标记算法识别出前景区域中的人体像素点集：首先设定生长阈值th_grow以限制生长的范围及截止条件，然后将生长方式设置为八连通生长方式，在前景区域中从左上向右下遍历像素点识别出人体像素点集。若像素点未被遍历，则以该像素点的深度值depth作为当前生长起点；若像素点已被遍历，则计算下一个像素点的生长条件：若满足 |depth_next − depth_current| < th_grow，即下一像素点的深度值与当前像素点的深度值之差小于阈值th_grow，则置下一像素点为当前像素点继续生长，否则这个方向的像素点生长截止，转由另一方向重新生长，直至全部像素点的深度值depth均被遍历为止，得到一个或多个人体像素点集。此方案相比较于面积过滤方案，可通过生长阈值控制限制生长的条件，防止密集人群中出现人影粘连的情况。然后，计算各个人体像素点集的中心点及两两中心点在地面区域上的投影距离：如中心点A与中心点B，两点直线线段为AB，AB与地面方程的夹角为θ，则两个人体像素点集对应的人体距离的计算公式为 dist = |AB|·cosθ。若人体距离大于距离阈值，则将两个人体像素点集对应为两个不同人体产生的人体区域，反之将两个人体像素点集视为同一个人体产生的人体区域。
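中心点连线在地面上的投影距离（连线长度结合与地面夹角θ）这一合并判据，可用如下纯Python草图示意（假设地面方程以系数(a, b, c, d)的形式给出；距离阈值dist_th为示意值，并非原文参数）：

```python
import math

def human_distance(center_a, center_b, ground_plane):
    """两个人体像素点集中心点连线在地面上的投影距离:
    先算连线长度 |AB|, 再由地面方程求连线与地面的夹角 theta,
    人体距离取 |AB| * cos(theta)."""
    ax, ay, az = center_a
    bx, by, bz = center_b
    vx, vy, vz = bx - ax, by - ay, bz - az
    ab = math.sqrt(vx * vx + vy * vy + vz * vz)
    if ab == 0:
        return 0.0
    a, b, c, _ = ground_plane
    n = math.sqrt(a * a + b * b + c * c)
    # 直线与平面的夹角满足 sin(theta) = |v·n| / (|v||n|)
    sin_theta = abs(vx * a + vy * b + vz * c) / (ab * n)
    cos_theta = math.sqrt(max(0.0, 1.0 - sin_theta * sin_theta))
    return ab * cos_theta

def same_person(center_a, center_b, ground_plane, dist_th=0.4):
    """人体距离不大于距离阈值时, 将两个人体像素点集视为同一人体."""
    return human_distance(center_a, center_b, ground_plane) <= dist_th
```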
上述实施例中,得到人体检测数据的方法包括:
基于设定的距离间隔采用降采样方式寻找一个人体区域或者多个人体区域的局部最高像素点;
通过区域生长方式锁定一个人体区域或者多个人体区域的头部区域,同时利用地面方程计算一个人体区域或多个人体区域对应的人体检测数据,所述人体检测数据包括人体身高和头部的像素点坐标。
具体实施时，基于设定的距离间隔采用降采样方式寻找一个人体区域或者多个人体区域的局部最高像素点，然后通过小范围的区域生长得到一个人体区域或者多个人体区域的头部区域。此步骤为限定区域生长的变体：允许向高处生长，而在向低处生长时增加一个阈值，可防止人头过分生长到肩部。然后计算头部区域中像素点的平均值，得到人头中心点的三维坐标(x, y, z)，通过公式 h = |a·x + b·y + c·z + d| / √(a² + b² + c²) 计算出人头距离地面的高度，即人体身高。综上，人体检测数据包括头部区域和人体身高、以及头部中心点的二维、三维坐标。
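利用地面方程计算人体身高即求人头中心点到地面平面的距离，可用如下纯Python草图示意（假设人头中心点坐标与地面方程系数均已求得）：

```python
import math

def body_height(head_center, ground_plane):
    """人头中心点(x, y, z)到地面方程 ax+by+cz+d=0 的距离即人体身高:
    h = |a*x + b*y + c*z + d| / sqrt(a^2 + b^2 + c^2)."""
    x, y, z = head_center
    a, b, c, d = ground_plane
    return abs(a * x + b * y + c * z + d) / math.sqrt(a * a + b * b + c * c)
```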
考虑到一个拍摄场景中会同时使用多台深度摄像头处理多人同时进入场景内的活动情况，在安装深度摄像头时可使每台深度摄像头拍摄的视角小部分重叠，最大化利用摄像头的视角覆盖面，同时使用REID模块的跨镜追踪技术来实现行人跟踪检测功能。各个深度摄像头的检测分开进行，最后再通过相互校验融合数据；随着场地的扩大，增加深度摄像头即可迅速扩展，具有良好的算法鲁棒性、多场景复用性和新版本扩展性。
实施例二
本实施例提供一种基于深度图像的行人检测装置,包括:
拟合公式构建单元,基于第一深度图像中框选的地面区域构建地面拟合公式,以及基于至少一个标记物区域构建与标记物区域一一对应的标记物拟合公式;
蒙版生成单元,将由地面拟合公式建立的地面蒙版以及各标记物拟合公式建立的标记物蒙版融合,得到当前场景的背景蒙版;
蒙版更新单元,根据多帧连续的第二深度图像中的像素点以及背景蒙版中的像素点,对背景蒙版进行背景更新;
前景区域识别单元,将实时获取的第三深度图像与更新后的背景蒙版进行像素点比对,锁定第三深度图像中包含人体像素的前景区域;
人体检测单元,采用区域生长方式对前景区域中的人体区域进行合并和/或分割处理,得到人体检测数据。
与现有技术相比,本发明实施例提供的基于深度图像的行人检测装置的有益效果与上述实施例一提供的基于深度图像的行人检测方法的有益效果相同,在此不做赘述。
实施例三
本实施例提供一种计算机可读存储介质,计算机可读存储介质上存储有计算机程序,计算机程序被处理器运行时执行上述基于深度图像的行人检测方法的步骤。
与现有技术相比,本实施例提供的计算机可读存储介质的有益效果与上述技术方案提供的基于深度图像的行人检测方法的有益效果相同,在此不做赘述。
本领域普通技术人员可以理解，实现上述发明方法中的全部或部分步骤，可以通过程序指令相关的硬件来完成，上述程序可以存储于计算机可读存储介质中，该程序在执行时，包括上述实施例方法的各步骤，而所述存储介质可以是：ROM/RAM、磁碟、光盘、存储卡等。
以上,仅为本发明的具体实施方式,但本发明的保护范围并不局限于此,任何熟悉本技术领域的技术人员在本发明揭露的技术范围内,可轻易想到变化或替换,都应涵盖在本发明的保护范围之内。因此,本发明的保护范围应以所述权利要求的保护范围为准。

Claims (10)

  1. 一种基于深度图像的行人检测方法,所述深度图像由深度摄像头俯拍获取,其特征在于,所述方法包括:
    基于第一深度图像中框选的地面区域构建地面拟合公式,以及基于至少一个标记物区域构建与标记物区域一一对应的标记物拟合公式;
    将由地面拟合公式建立的地面蒙版以及各标记物拟合公式建立的标记物蒙版融合,得到当前场景的背景蒙版;
    根据多帧连续的第二深度图像中的像素点以及背景蒙版中的像素点,对背景蒙版进行背景更新;
    将实时获取的第三深度图像与更新后的背景蒙版进行像素点比对,锁定第三深度图像中包含人体像素的前景区域;
    采用区域生长方式对前景区域中的人体区域进行合并和/或分割处理,得到人体检测数据。
  2. 根据权利要求1所述的方法,其特征在于,基于第一深度图像中框选的地面区域构建地面拟合公式的方法包括:
    S11,统计与地面区域对应的数据集合,所述数据集合包括多个图像点;
    S12,从地面区域中随机选择n个图像点组建地面初始数据集,n≥3且n为整数;
    S13,基于当前选择的n个图像点构建初始地面拟合公式,遍历初始数据集中未被选择的图像点,将其依次代入初始地面拟合公式计算对应图像点的地面拟合值;
    S14,将小于第一阈值的地面拟合值筛选出来,生成第i轮的有效地面拟合值集合,i的初始值为1;
    S15,当第i轮的有效地面拟合值集合对应的图像点数量与地面区域中图像点总数量的比值大于第二阈值,则将第i轮有效地面拟合值集合中的全部地面拟合值累加;
    S16,当第i轮中全部地面拟合值的累加结果小于第三阈值,则将第i轮对应的初始地面拟合公式定义为地面拟合公式,当第i轮对应的全部地面拟合值累加结果大于第三阈值,令i=i+1,并在i未达到阈值轮数时返回步骤S12,否则执行步骤S17;
    S17,将所有轮中全部地面拟合值累加结果最小值对应的初始地面拟合公式定义为地面拟合公式。
  3. 根据权利要求1所述的方法,其特征在于,基于至少一个标记物区域构建与标记物区域一一对应的标记物拟合公式的方法包括:
    S21,统计与标记物区域一一对应的数据集合,所述数据集合中包括多个图像点;
    S22,从标记物区域中随机选择n个图像点组建标记物初始数据集,n≥3且n为整数;
    S23,基于当前选择的n个图像点构建初始标记物拟合公式,遍历初始数据集中未被选择的图像点,将其依次代入初始标记物拟合公式计算对应图像点的标记物拟合值;
    S24,将小于第一阈值的标记物拟合值筛选出来,生成第i轮的有效标记物拟合值集合,i的初始值为1;
    S25,当第i轮的有效标记物拟合值集合对应的图像点数量与标记物区域中图像点总数量的比值大于第二阈值,则将第i轮有效标记物拟合值集合中的全部标记物拟合值累加;
    S26,当第i轮中全部标记物拟合值的累加结果小于第三阈值,则将第i轮对应的初始标记物拟合公式定义为标记物拟合公式,当第i轮对应的全部标记物拟合值累加结果大于第三阈值,令i=i+1,并在i未达到阈值轮数时返回步骤S22,否则执行步骤S27;
    S27,将所有轮中全部标记物拟合值累加结果最小值对应的初始标记物拟合公式定义为标记物拟合公式。
  4. 根据权利要求1-3任一项所述的方法,其特征在于,将由地面拟合公式建立的地面蒙版以及各标记物拟合公式建立的标记物蒙版融合,得到当前场景的背景蒙版的方法包括:
    基于地面拟合公式构建地面方程,以及基于标记物拟合公式构建标记物方程;
    遍历第一深度图像中的图像点,分别代入地面方程和标记物方程得到该图像点的地面距离和标记物距离;
    筛选出地面距离小于地面阈值的图像点填充为地面蒙版,以及筛选出标记物距离小于标记物阈值的图像点填充为标记物蒙版;
    将地面蒙版和全部的标记物蒙版融合,得到当前场景的背景蒙版。
  5. 根据权利要求4所述的方法,其特征在于,根据多帧连续的第二深度图像中的像素点以及背景蒙版中的像素点,对背景蒙版进行背景更新的方法包括:
    依次将第m帧第二深度图像与第m+1帧第二深度图像中各对应位置像素点的深度值进行大小值比对,m的初始值为1;
    识别深度值发生变化的像素点,将第m+1帧第二深度图像中对应位置像素点的深度值更新为比对结果中的大值,令m=m+1,重新对第m帧第二深度图像与第m+1帧第二深度图像中各对应位置像素点的深度值进行比对,直至得到最后一帧第二深度图像中各位置像素点及其对应的深度值;
    将最后一帧第二深度图像中各位置像素点及其对应的深度值与背景蒙版中各位置像素点及其对应的深度值进行大小值比对;
    识别深度值发生变化的像素点,将背景蒙版中对应位置像素点的深度值更新为比对结果中的小值。
  6. 根据权利要求5所述的方法,其特征在于,将实时获取的第三深度图像与更新后的背景蒙版进行像素点比对,锁定第三深度图像中包含人体像素的前景区域的方法包括:
    将实时获取的第三深度图像中的各位置像素点及其对应的深度值与更新后的背景蒙版中各位置像素点及其对应的深度值进行大小值比对;
    识别所述第三深度图像中深度值变小的像素点,汇总得到包含人体像素的前景区域。
  7. 根据权利要求6所述的方法,其特征在于,采用区域生长方式对前景区域中的人体区域进行合并和/或分割处理的方法包括:
    根据设定的生长阈值,采用连通域标记算法识别出前景区域中的人体像素点集;
    识别前景区域中人体像素点集的数量,当为多个人体像素点集时分别计算每个人体像素点集的中心点;
    将得到的中心点两两连线并计算连线距离，同时将各条连线向地面区域正投影，分别获取到每条连线与地面方程的夹角θ；
    基于连线距离及对应的夹角θ，得到两个人体像素点集对应的人体距离；
    当人体距离大于距离阈值时,将两个人体像素点集对应为两个不同人体产生的人体区域,反之将两个人体像素点集视为同一个人体产生的人体区域。
  8. 根据权利要求7所述的方法,其特征在于,得到人体检测数据的方法包括:
    基于设定的距离间隔采用降采样方式寻找一个人体区域或者多个人体区域的局部最高像素点;
    通过区域生长方式锁定一个人体区域或者多个人体区域的头部区域,同时利用地面方程计算一个人体区域或多个人体区域对应的人体检测数据,所述人体检测数据包括人体身高和头部的像素点坐标。
  9. 根据权利要求1-3、5-8任一项所述的方法,其特征在于,所述标记物区域为货架区域。
  10. 一种基于深度图像的行人检测装置,所述深度图像由深度摄像头俯拍获取,其特征在于,所述装置包括:
    拟合公式构建单元,基于第一深度图像中框选的地面区域构建地面拟合公式,以及基于至少一个标记物区域构建与标记物区域一一对应的标记物拟合公式;
    蒙版生成单元,将由地面拟合公式建立的地面蒙版以及各标记物拟合公式建立的标记物蒙版融合,得到当前场景的背景蒙版;
    蒙版更新单元,根据多帧连续的第二深度图像中的像素点以及背景蒙版中的像素点,对背景蒙版进行背景更新;
    前景区域识别单元,将实时获取的第三深度图像与更新后的背景蒙版进行像素点比对,锁定第三深度图像中包含人体像素的前景区域;
    人体检测单元,采用区域生长方式对前景区域中的人体区域进行合并和/或分割处理,得到人体检测数据。
PCT/CN2021/095972 2020-06-03 2021-05-26 基于深度图像的行人检测方法及装置 WO2021244364A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010493600.1A CN111652136B (zh) 2020-06-03 2020-06-03 基于深度图像的行人检测方法及装置
CN202010493600.1 2020-06-03

Publications (1)

Publication Number Publication Date
WO2021244364A1 true WO2021244364A1 (zh) 2021-12-09

Family

ID=72349808

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/095972 WO2021244364A1 (zh) 2020-06-03 2021-05-26 基于深度图像的行人检测方法及装置

Country Status (2)

Country Link
CN (1) CN111652136B (zh)
WO (1) WO2021244364A1 (zh)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117876968A (zh) * 2024-03-11 2024-04-12 盛视科技股份有限公司 联合多目标的密集行人检测方法
CN117876968B (zh) * 2024-03-11 2024-05-28 盛视科技股份有限公司 联合多目标的密集行人检测方法

Families Citing this family (5)

Publication number Priority date Publication date Assignee Title
CN111652136B (zh) * 2020-06-03 2022-11-22 苏宁云计算有限公司 基于深度图像的行人检测方法及装置
CN112070053B (zh) * 2020-09-16 2023-02-28 青岛维感科技有限公司 背景图像的自更新方法、装置、设备及存储介质
CN112788818A (zh) * 2020-12-29 2021-05-11 欧普照明股份有限公司 控制方法、控制装置及电子设备
CN113065397B (zh) * 2021-03-02 2022-12-23 南京苏宁软件技术有限公司 行人检测方法及装置
CN117094965A (zh) * 2023-08-21 2023-11-21 深圳市宝安信息管道管理有限公司 一种基于图像识别算法的镜头画面质量分析方法及系统

Citations (9)

Publication number Priority date Publication date Assignee Title
CN101877131A (zh) * 2009-04-28 2010-11-03 青岛海信数字多媒体技术国家重点实验室有限公司 一种目标识别方法、装置及目标识别系统
US20140056471A1 (en) * 2012-08-23 2014-02-27 Qualcomm Incorporated Object tracking using background and foreground models
CN104123529A (zh) * 2013-04-25 2014-10-29 株式会社理光 人手检测方法及系统
CN105225217A (zh) * 2014-06-23 2016-01-06 株式会社理光 基于深度的背景模型更新方法和系统
CN109271944A (zh) * 2018-09-27 2019-01-25 百度在线网络技术(北京)有限公司 障碍物检测方法、装置、电子设备、车辆及存储介质
CN109344702A (zh) * 2018-08-23 2019-02-15 北京华捷艾米科技有限公司 基于深度图像和彩色图像的行人检测方法及装置
CN110930411A (zh) * 2019-11-20 2020-03-27 杭州光珀智能科技有限公司 一种基于深度相机的人体分割方法及系统
CN111144213A (zh) * 2019-11-26 2020-05-12 北京华捷艾米科技有限公司 一种对象检测方法和相关设备
CN111652136A (zh) * 2020-06-03 2020-09-11 苏宁云计算有限公司 基于深度图像的行人检测方法及装置

Family Cites Families (2)

Publication number Priority date Publication date Assignee Title
US9311550B2 (en) * 2013-03-06 2016-04-12 Samsung Electronics Co., Ltd. Device and method for image processing
CN103389042A (zh) * 2013-07-11 2013-11-13 夏东 基于深度图像的地面自动检测以及场景高度计算的方法

Patent Citations (9)

Publication number Priority date Publication date Assignee Title
CN101877131A (zh) * 2009-04-28 2010-11-03 青岛海信数字多媒体技术国家重点实验室有限公司 一种目标识别方法、装置及目标识别系统
US20140056471A1 (en) * 2012-08-23 2014-02-27 Qualcomm Incorporated Object tracking using background and foreground models
CN104123529A (zh) * 2013-04-25 2014-10-29 株式会社理光 人手检测方法及系统
CN105225217A (zh) * 2014-06-23 2016-01-06 株式会社理光 基于深度的背景模型更新方法和系统
CN109344702A (zh) * 2018-08-23 2019-02-15 北京华捷艾米科技有限公司 基于深度图像和彩色图像的行人检测方法及装置
CN109271944A (zh) * 2018-09-27 2019-01-25 百度在线网络技术(北京)有限公司 障碍物检测方法、装置、电子设备、车辆及存储介质
CN110930411A (zh) * 2019-11-20 2020-03-27 杭州光珀智能科技有限公司 一种基于深度相机的人体分割方法及系统
CN111144213A (zh) * 2019-11-26 2020-05-12 北京华捷艾米科技有限公司 一种对象检测方法和相关设备
CN111652136A (zh) * 2020-06-03 2020-09-11 苏宁云计算有限公司 基于深度图像的行人检测方法及装置


Also Published As

Publication number Publication date
CN111652136B (zh) 2022-11-22
CN111652136A (zh) 2020-09-11

Similar Documents

Publication Publication Date Title
WO2021244364A1 (zh) 基于深度图像的行人检测方法及装置
Lipton Local application of optic flow to analyse rigid versus non-rigid motion
Cohen et al. Detecting and tracking moving objects for video surveillance
JP5487298B2 (ja) 3次元画像生成
Cannons A review of visual tracking
CN102456225B (zh) 一种运动目标检测与跟踪方法和系统
US7620207B2 (en) Linking tracked objects that undergo temporary occlusion
CN112258600A (zh) 一种基于视觉与激光雷达的同时定位与地图构建方法
Tang et al. ESTHER: Joint camera self-calibration and automatic radial distortion correction from tracking of walking humans
TWI639136B (zh) 即時視訊畫面拼接方法
CN113065397B (zh) 行人检测方法及装置
US20110123067A1 (en) Method And System for Tracking a Target
Pareek et al. Re-projected SURF features based mean-shift algorithm for visual tracking
CN114943773A (zh) 相机标定方法、装置、设备和存储介质
Tian et al. Absolute head pose estimation from overhead wide-angle cameras
Wang et al. Video stabilization: A comprehensive survey
Foggia et al. Real-time tracking of single people and groups simultaneously by contextual graph-based reasoning dealing complex occlusions
Gruenwedel et al. Decentralized tracking of humans using a camera network
Xu et al. Moving target detection and tracking in FLIR image sequences based on thermal target modeling
Nithin et al. Multi-camera tracklet association and fusion using ensemble of visual and geometric cues
Rabe Detection of moving objects by spatio-temporal motion analysis
Dijk et al. Image processing in aerial surveillance and reconnaissance: from pixels to understanding
Vandewiele et al. Occlusion management strategies for pedestrians tracking across fisheye camera networks
Topçu et al. Occlusion-aware 3D multiple object tracker with two cameras for visual surveillance
Shere et al. 3D Multi Person Tracking With Dual 360° Cameras

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21817431

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21817431

Country of ref document: EP

Kind code of ref document: A1

122 Ep: pct application non-entry in european phase

Ref document number: 21817431

Country of ref document: EP

Kind code of ref document: A1