CN115546829A - Pedestrian spatial information sensing method and device based on ZED stereo camera - Google Patents

Pedestrian spatial information sensing method and device based on ZED stereo camera

Info

Publication number
CN115546829A
Authority
CN
China
Prior art keywords
pedestrian
dimensional
frame
camera
key point
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211187402.8A
Other languages
Chinese (zh)
Inventor
寄珊珊
李特
朱世强
孟启炜
王文
宛敏红
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang Lab
Original Assignee
Zhejiang Lab
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang Lab filed Critical Zhejiang Lab
Priority to CN202211187402.8A
Publication of CN115546829A
Legal status: Pending

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/103Static body considered as a whole, e.g. static pedestrian or occupant recognition
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/62Extraction of image or video features relating to a temporal dimension, e.g. time-based feature extraction; Pattern tracking

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Human Computer Interaction (AREA)
  • Image Processing (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a pedestrian spatial information sensing method and device based on a ZED stereo camera, mainly used by a navigation robot in public scenes such as exhibition halls to intelligently sense the spatial position and moving speed of pedestrians. Real-time data in the scene are collected by a ZED binocular vision camera and uploaded to a cloud server; the preprocessed RGB data are input into a deployed human keypoint detection network to obtain two-dimensional human keypoint information, and a pedestrian bounding box is generated from the two-dimensional keypoints of the pedestrian's upper-body torso region; multiple pedestrian targets are tracked continuously across consecutive frames; the three-dimensional space coordinates of the human keypoints in the corresponding region are acquired by combining point cloud data, and the spatial position and moving speed of each pedestrian are calculated; finally, the navigation robot controls its body movement according to the acquired pedestrian spatial information and completes intelligent navigation tasks such as autonomous following and obstacle avoidance, which increases the flexibility of the navigation robot and improves the interaction experience of visitors.

Description

Pedestrian spatial information sensing method and device based on ZED stereo camera
Technical Field
The invention relates to the field of machine vision, in particular to a pedestrian spatial information perception method and device based on a ZED stereo camera.
Background
Using intelligent navigation robots in place of human docents in public places such as exhibition halls and museums can effectively save manpower. The navigation robot needs to sense its environment intelligently, and pedestrians, as dynamic targets in the scene, behave with uncertainty; intelligently sensing pedestrian spatial information such as spatial position and moving speed is therefore of great significance.
Vision is an important way for robots to obtain external information. A common approach first obtains a pedestrian detection frame from the 2D image and then derives the pedestrian's spatial position from the coordinate-system transformation together with depth information or point cloud information. This approach depends on the accuracy of the pedestrian detection frame; pedestrians are articulated and their body posture changes constantly, so a traditional pedestrian bounding box contains a large amount of background noise when the body pose changes. The subsequent box-based 2D-to-3D spatial conversion therefore also carries a large error, which in turn degrades the estimation of the pedestrian's spatial position and moving speed.
Disclosure of Invention
In order to solve the defects of the prior art and achieve the purpose of improving the accuracy of identifying the spatial position and the moving speed of the pedestrian, the invention adopts the following technical scheme:
a pedestrian spatial information perception method based on a stereo camera comprises the following steps:
step S1: acquiring real-time image data of a stereo camera, wherein the real-time image data comprises RGB image data and point cloud data;
step S2: detecting key points of a human body through RGB images to obtain two-dimensional key point information of a pedestrian, determining an upper half body trunk region of the pedestrian according to the dynamic characteristic of the pedestrian, generating a pedestrian surrounding frame by combining the two-dimensional key point information of the upper half body trunk region of the pedestrian, and taking the pedestrian surrounding frame as a pedestrian detection frame;
The generated pedestrian bounding box is expanded proportionally, and the expanded box is taken as the pedestrian detection frame. Because the generated box is only the minimal bounding box of the pedestrian skeleton, it needs to be enlarged; the area ratio of the expanded bounding box to the minimal bounding box is 1.2;
and step S3: carrying out multi-target tracking on pedestrians under continuous multi-frame images according to the similarity characteristics of the two-dimensional key points and the pedestrian detection frame;
and step S4: and for the continuously tracked pedestrians, acquiring three-dimensional key point information of the pedestrians according to the two-dimensional key point information of the pedestrians in combination with the point cloud data, calculating the spatial position coordinates of the pedestrians relative to a three-dimensional camera coordinate system under the current frame, calculating the moving speed of the pedestrians in combination with the frame interval, and generating real-time spatial information of the pedestrians relative to the three-dimensional camera.
Further, in step S2, RGB image data are obtained and forward inference is performed with a human keypoint detection network, which outputs keypoint heatmaps and part affinity fields. Two-dimensional keypoints are extracted from the heatmaps and part affinity fields and grouped; the keypoints belonging to the same pedestrian are matched to that pedestrian, yielding the two-dimensional keypoint coordinates of each pedestrian in the current image.
Further, the step S3 includes the steps of:
step S3.1: acquiring pedestrian motion characteristics according to the pedestrian detection frame; acquiring the appearance characteristics of the pedestrian according to the similarity characteristics of the two-dimensional key points;
step S3.2: acquiring actually measured state information of the pedestrian at the current time t according to the motion characteristics of the pedestrian and the appearance characteristics of the pedestrian;
step S3.3: performing data association between the historical tracks and the measured state information of the pedestrians at time t to obtain the ID of each pedestrian at time t; the purpose of data association is to match the detection results at the current moment with the historical tracks through features such as appearance and geometry, so as to determine the ID of each pedestrian detected in the current frame;
step S3.4: and updating the historical track through the ID of each pedestrian at the time t, so that the pedestrians are continuously tracked.
Further, in the two-dimensional key point similarity feature in the step S3.1, the target key point similarity evaluation index OKS is used for similarity calculation, and whether the two-dimensional key points are related or not is judged through a preset threshold.
Further, the data association in step S3.3 computes, for each frame, both a motion-feature association and an appearance-feature association, linearly weights them into a final association matrix, and obtains the inter-frame pedestrian matching result from the association matrix with the Hungarian matching algorithm.
Further, the step S4 includes the steps of:
step S4.1: screening by combining the confidence coefficient of the two-dimensional key points according to the trunk region of the upper half of the pedestrian, and acquiring point cloud data according to the screened two-dimensional key points to obtain a three-dimensional coordinate set of the pedestrian key points;
step S4.2: for each pedestrian target, calculating the mean of the three-dimensional coordinate set of its keypoints as the pedestrian's spatial position coordinates; calculating the pedestrian's actual distance to the stereo camera as the Euclidean distance, calculating the distance the pedestrian moved relative to the stereo camera between the previous moment and the current moment from the spatial position coordinates at those two moments, and dividing by the elapsed time to obtain the pedestrian's moving speed at the current moment.
Further, the pedestrian moving speed formula in step S4.2 is as follows:
$$v_i^t = \frac{\sqrt{\left(X_i^t - X_i^{t-m}\right)^2 + \left(Y_i^t - Y_i^{t-m}\right)^2 + \left(Z_i^t - Z_i^{t-m}\right)^2}}{\Delta t}, \qquad \Delta t = \frac{m}{f}$$

wherein X, Y and Z are respectively the three-dimensional spatial position coordinates of the pedestrian, i represents the ID of the currently tracked pedestrian, t represents the time of the current frame, $(X_i^t, Y_i^t, Z_i^t)$ represent the spatial position coordinates of the pedestrian relative to the stereo camera in the current frame, $(X_i^{t-m}, Y_i^{t-m}, Z_i^{t-m})$ represent the spatial position coordinates relative to the stereo camera at time $t-m$, $\Delta t$ is the elapsed time, m represents the number of frame intervals, and f represents the stereo camera frame rate.
A pedestrian spatial information sensing device based on a stereo camera comprises a real-time image data acquisition module, a pedestrian detection frame acquisition module, a multi-target tracking module and a real-time spatial information generation module;
the real-time image data acquisition module acquires RGB image data and point cloud data through a stereo camera;
the pedestrian detection frame acquisition module is used for detecting key points of a human body through RGB images to obtain two-dimensional key point information of a pedestrian, determining a pedestrian upper half body trunk area according to the dynamic characteristics of the pedestrian, generating a pedestrian enclosure frame by combining the two-dimensional key point information of the pedestrian upper half body trunk area, and taking the pedestrian enclosure frame as a pedestrian detection frame;
the multi-target tracking module carries out multi-target tracking on pedestrians under continuous multi-frame images according to the similarity characteristics of the two-dimensional key points and the pedestrian detection frame;
the real-time spatial information generation module acquires three-dimensional key point information of the pedestrian according to the two-dimensional key point information of the continuously tracked pedestrian and the point cloud data, calculates spatial position coordinates of the pedestrian relative to a coordinate system of the stereo camera under the current frame, calculates the moving speed of the pedestrian according to frame intervals, and generates real-time spatial information of the pedestrian relative to the stereo camera.
A pedestrian spatial information perception method based on a ZED camera: real-time image data are collected by a ZED binocular vision camera arranged on the navigation robot and transmitted to a cloud server for pedestrian detection frame acquisition, multi-target tracking and real-time spatial information generation. The RGB images are preprocessed, including image resizing and encoding, to improve subsequent data transmission efficiency, and the preprocessed data are transmitted to the cloud server through message middleware. The field of view of the ZED binocular vision camera is matched with the pedestrian's upper-body torso region, which is determined according to the pedestrian's dynamic characteristics and the camera's field of view on the navigation robot. The cloud server transmits the pedestrian's real-time spatial information relative to the ZED binocular vision camera to the navigation robot, and the navigation robot controls its body movement according to this real-time spatial information to complete the navigation task.
A pedestrian spatial information sensing device based on a ZED camera comprises a cloud server and the ZED binocular vision camera arranged on a navigation robot, wherein the cloud server comprises a pedestrian detection frame acquisition module, a multi-target tracking module and a real-time spatial information generation module;
the ZED binocular vision camera acquires RGB image data and point cloud data in real time and transmits the RGB image data and the point cloud data to the cloud server;
the pedestrian detection frame acquisition module detects human body key points through RGB images to obtain two-dimensional key point information of pedestrians, determines a pedestrian upper half body trunk region according to dynamic characteristics of the pedestrians, generates a pedestrian enclosure frame by combining the two-dimensional key point information of the pedestrian upper half body trunk region, and takes the pedestrian enclosure frame as a pedestrian detection frame; the vision range of the ZED binocular vision camera is matched with the trunk area of the upper half of the pedestrian, and the trunk area of the upper half of the pedestrian is determined according to the dynamic characteristic of the pedestrian and the vision range of the ZED binocular vision camera on the navigation robot;
the multi-target tracking module carries out multi-target tracking on pedestrians under continuous multi-frame images according to the two-dimensional key point similarity characteristics and the pedestrian detection frame;
the real-time spatial information generation module is used for acquiring three-dimensional key point information of the pedestrian according to the two-dimensional key point information of the continuously tracked pedestrian by combining point cloud data, calculating the spatial position coordinate of the pedestrian relative to a three-dimensional camera coordinate system under the current frame, and calculating the moving speed of the pedestrian by combining frame intervals to generate real-time spatial information of the pedestrian relative to the three-dimensional camera;
and the navigation robot acquires real-time space information of the pedestrian relative to the ZED binocular vision camera from the cloud server, performs mobile control on the body and completes navigation tasks.
The invention has the advantages and beneficial effects that:
the invention adopts a ZED binocular vision camera which is installed at the head of the navigation robot in a non-contact way to intelligently sense pedestrians in a scene; the light-weight human body key point detection network deployed by the cloud is adopted to obtain two-dimensional key point information, and powerful computing resources and storage resources of a cloud server can be fully utilized, so that the problem of insufficient computing power of a robot body is effectively solved; the method comprises the steps of considering human motion characteristics and the visual field range of a navigation robot, determining key points of a trunk region of the upper body of a human body as a target region, wherein the region does not contain human body key points with large changes such as arms, a pedestrian detection frame generated according to the region is more stable, and the accuracy of a pedestrian space position and a pedestrian moving speed calculated according to the three-dimensional coordinates of the pedestrian key points of the region is higher.
Drawings
Fig. 1 is a flow chart of a pedestrian spatial information perception method based on stereo camera vision according to the invention.
Fig. 2 is a schematic diagram of a pedestrian spatial information perception scenarized application facing a navigation robot based on the ZED vision in the embodiment of the present invention.
FIG. 3a is a diagram of human body key point parameters in an embodiment of the present invention.
Fig. 3b is a schematic diagram of generating a pedestrian enclosure frame based on the two-dimensional key points of the upper half body of the pedestrian in the embodiment of the present invention.
FIG. 3c is a schematic diagram of three-dimensional key points in an embodiment of the invention.
FIG. 4 is a flow chart of visualization of pedestrian spatial information perception based on ZED vision in an actual measurement scene in the embodiment of the invention.
FIG. 5 is a comparison graph of pedestrian distance measurement errors based on human body key points and a traditional human body detection frame in the embodiment of the invention.
FIG. 6 is a flow chart of a pedestrian spatial information perception method based on the ZED camera.
Fig. 7 is a schematic structural diagram of a pedestrian spatial information perception device based on a stereo camera in the embodiment of the invention.
Detailed Description
The following detailed description of embodiments of the invention refers to the accompanying drawings. It should be understood that the detailed description and specific examples, while indicating the preferred embodiment of the invention, are given by way of illustration and explanation only, not limitation.
As shown in fig. 1, the method for sensing spatial information of a pedestrian based on stereo camera vision includes the following steps:
step S1: acquiring real-time image data of a stereo camera, wherein the real-time image data comprises RGB image data and point cloud data;
in the embodiment of the invention, as shown in fig. 2, the ZED binocular vision camera is installed on the head of the wheeled robot, and is about 1.2 meters away from the ground level. RGB data and point cloud data of a ZED camera (left eye) are acquired, the frame rate is 30Hz, and the resolution of an RGB color image is 1280 multiplied by 720. Preprocessing the RGB image, including resize and encoding to improve the subsequent data transmission efficiency, wherein the size of the resize is 456 x 256 and is consistent with the input of a subsequent human body key point detection network, and the preprocessed data are transmitted to a cloud server through message middleware.
Step S2: detecting key points of a human body through RGB images to obtain two-dimensional key point information of a pedestrian, determining an upper half body trunk region of the pedestrian according to the dynamic characteristic of the pedestrian, generating a pedestrian surrounding frame by combining the two-dimensional key point information of the upper half body trunk region of the pedestrian, and taking the pedestrian surrounding frame as a pedestrian detection frame;
the method comprises the steps of obtaining RGB image data, adopting a human body key point detection network to conduct forward reasoning, outputting key point thermodynamic diagrams and partial association domains, extracting two-dimensional key points according to the key point thermodynamic diagrams and the partial association domains, grouping, matching the two-dimensional key points belonging to the same pedestrian to the current pedestrian, and obtaining the two-dimensional key point coordinates of each pedestrian in the current image.
In the embodiment of the invention, the cloud server receives the real-time data acquired from the ZED camera on the robot and inputs them into the deployed human keypoint detection network for forward inference. The network adopts Lightweight OpenPose with an improved MobileNet backbone, which meets the real-time requirement, and the two-dimensional keypoint information of pedestrians in the image is obtained through this framework. The network input is the decoded RGB data of size [1, 3, 256, 456]; the forward inference network outputs keypoint heatmaps (Heatmaps) and part affinity fields (PAFs) of sizes [1, 19, 32, 57] and [1, 38, 32, 57] respectively. All keypoints are extracted from the Heatmaps and PAFs and grouped, the keypoints belonging to the same pedestrian are matched to that pedestrian, and the keypoint coordinates of each pedestrian in the current image are obtained. Assuming N pedestrians are detected in the current image, the output is N x [18, 3], where 18 is the number of keypoints per pedestrian and 3 corresponds to the horizontal axis x, the vertical axis y and the confidence of each keypoint in the image coordinate system, with confidence in the range 0-1;
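For illustration, a minimal sketch of the candidate-extraction step on a single heatmap channel follows; the full Lightweight OpenPose decoder additionally groups candidates across keypoint types using the PAF channels, which is omitted here, and the threshold value is an assumption.

```python
import numpy as np

def extract_peaks(heatmap: np.ndarray, thresh: float = 0.1):
    """Find local maxima in one keypoint heatmap of shape (H, W).

    Sketch of candidate extraction only; grouping via PAFs is omitted.
    """
    h, w = heatmap.shape
    peaks = []
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            v = heatmap[y, x]
            # keep points that exceed the threshold and dominate their 3x3 neighborhood
            if v > thresh and v == heatmap[y - 1:y + 2, x - 1:x + 2].max():
                peaks.append((x, y, float(v)))  # (x, y, confidence)
    return peaks
```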
the pedestrian has the characteristic of posture change, particularly, the four limbs of the human body are greatly changed along with the motion of the human body, and when the human body is close to the robot, the vision field of the robot can only see the upper half part of the human body, the influence of the two factors is comprehensively considered, and the trunk area of the upper half body of the pedestrian is determined to be the human body target area according to the dynamic characteristic of the pedestrian and the vision field range of the navigation robot; the key points of the human body contained in the region are as follows: {0, neck, 2; generating a smallest bounding box by adopting a bounngrake function of OpenCV according to the two-dimensional key points of the area, as shown by P1 in a 2D key point schematic diagram in fig. 3b, and using a dotted line table; considering that the bounding box is only the smallest bounding box of the pedestrian framework and needs to be expanded appropriately, the expanded bounding box is shown as P2 and is represented by a solid line, the area ratio of P2 to P1 is 1.2, and the bounding box is used as a pedestrian detection box in the subsequent pedestrian tracking algorithm.
And step S3: according to the similarity characteristics of the two-dimensional key points and the pedestrian detection frame, the method carries out multi-target tracking on pedestrians under continuous multi-frame images and comprises the following steps:
step S3.1: acquiring the motion features of the pedestrians according to the pedestrian detection frame, and acquiring the appearance features of the pedestrians according to the two-dimensional keypoint similarity features; the similarity is computed with the Object Keypoint Similarity evaluation index (OKS), and whether two sets of two-dimensional keypoints are associated is judged through a preset threshold;
step S3.2: acquiring actual measurement state information of the pedestrian at the current time t according to the motion characteristics of the pedestrian and the appearance characteristics of the pedestrian;
step S3.3: performing data association on the historical track and actual measurement state information of the pedestrians at the time t to obtain the ID of each pedestrian at the time t; the data association is to perform two associations of motion characteristics and appearance characteristics on each frame, perform linear weighting to obtain a final association matrix, and obtain pedestrian matching results among the frames by adopting a Hungarian matching algorithm according to the association matrix;
step S3.4: and updating the historical track by the ID of each pedestrian at the time t.
In the embodiment of the invention, pedestrian targets in each RGB frame transmitted to the cloud are tracked continuously across frames by assigning each a unique identity (ID) with a multi-target tracking method based on the DeepSORT algorithm. Pedestrian motion features are obtained from the pedestrian detection frame generated in step S2, and pedestrian appearance features from the two-dimensional keypoint similarity, computed with the Object Keypoint Similarity (OKS) index and judged against a set threshold to decide whether an association succeeds; the two kinds of feature information are fused to obtain the measured state information of each pedestrian at the current time t. Data association is then performed between the historical tracks and the time-t states: the motion-feature and appearance-feature associations are linearly weighted into a final association matrix, the Hungarian matching algorithm applied to this matrix yields the matching result, i.e. the ID of each pedestrian target at time t, and the historical tracks are finally updated with the time-t result to maintain each pedestrian's ID. A sketch of the OKS computation follows.
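A sketch of OKS under stated assumptions: the patent names OKS and a preset threshold but not the normalization details, so the per-keypoint falloff constants kappa and the scale term follow the common definition and are assumptions here.

```python
import numpy as np

def oks(kp_a: np.ndarray, kp_b: np.ndarray, area: float,
        kappa: np.ndarray, conf_thresh: float = 0.0) -> float:
    """Object Keypoint Similarity between two (K, 3) keypoint sets.

    kp_*: rows of (x, y, confidence); area: object scale, e.g. the
    pedestrian detection-box area; kappa: per-keypoint falloff constants.
    Keypoints with low confidence in either set are ignored.
    """
    valid = (kp_a[:, 2] > conf_thresh) & (kp_b[:, 2] > conf_thresh)
    if not valid.any():
        return 0.0
    d2 = np.sum((kp_a[valid, :2] - kp_b[valid, :2]) ** 2, axis=1)
    e = d2 / (2.0 * area * kappa[valid] ** 2 + 1e-9)  # normalized distances
    return float(np.mean(np.exp(-e)))  # 1.0 = identical keypoint layouts
```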
The purpose of data association is to match the detection results at the current moment with the historical tracks through features such as appearance and geometry, so as to determine the ID of each pedestrian detected in the current frame; the motion-feature and appearance-feature associations are linearly weighted into the final association matrix, from which the Hungarian matching algorithm produces the matching result, as sketched below.
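A sketch of the fused association step; the weighting coefficient and gating threshold are illustrative assumptions, and SciPy's linear_sum_assignment stands in for the Hungarian matching algorithm named above.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def associate(motion_cost: np.ndarray, appearance_cost: np.ndarray,
              lam: float = 0.5, max_cost: float = 0.7):
    """Fuse motion and appearance costs and solve the assignment.

    Both inputs are (num_tracks, num_detections) cost matrices, lower
    is better; lam and max_cost are illustrative values.
    """
    cost = lam * motion_cost + (1.0 - lam) * appearance_cost  # linear weighting
    rows, cols = linear_sum_assignment(cost)                  # Hungarian algorithm
    # gate out pairs whose fused cost is too high to be a plausible match
    return [(r, c) for r, c in zip(rows, cols) if cost[r, c] <= max_cost]
```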
And step S4: for the continuously tracked pedestrians, acquiring three-dimensional key point information of the pedestrians according to the two-dimensional key point information of the pedestrians by combining point cloud data, calculating the spatial position coordinates of the pedestrians relative to a three-dimensional camera coordinate system under the current frame, calculating the moving speed of the pedestrians by combining frame intervals, and generating real-time spatial information of the pedestrians relative to the three-dimensional camera, wherein the method comprises the following steps:
step S4.1: screening by combining the confidence coefficient of the two-dimensional key points according to the trunk region of the upper half of the pedestrian, and acquiring point cloud data according to the screened two-dimensional key points to obtain a three-dimensional coordinate set of the pedestrian key points;
in the embodiment of the invention, according to the trunk region of the upper half body of the pedestrian determined in the step 2, screening is carried out by combining the confidence coefficient of the key points, the key points with the confidence coefficient larger than 0.6 participate in the following calculation, and then, the corresponding three-dimensional key points are searched according to the acquired ZED point cloud data, and the three-dimensional coordinate set of the corresponding pedestrian key points is acquired;
the human body key point detection network outputs a 2D key point coordinate set of the pedestrian under the image coordinate system as
Figure DEST_PATH_IMAGE020
) Wherein k is the number of key points and has a value of k =0,1,2,5,8,11,14,15,16,17,
Figure DEST_PATH_IMAGE022
for corresponding confidence, as shown by the "2D keypoint diagram" solid origin in fig. 3 b. Searching by combining point cloud information acquired by a ZED camera, acquiring corresponding three-dimensional coordinates of pedestrians according to a pedestrian trunk region and a two-dimensional key point confidence coefficient, wherein the corresponding three-dimensional coordinates of all two-dimensional key points of the pedestrians are shown as a '3D key point schematic diagram' in figure 3c, and the three-dimensional coordinate set of the human body key points comprising the trunk region is
Figure DEST_PATH_IMAGE024
) As shown by a triangle icon in fig. 3c, the COORDINATE system is set by a camera parameter COORDINATE _ system, left _ HANDED _ Y _ UP, with reference to the left eye camera of the ZED camera, and the COORDINATE system and the directions of the XYZ axes are shown in fig. 2;
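A sketch of the 2D-to-3D lookup, assuming the ZED point cloud is an organized H x W x 4 array (XYZ plus color) aligned with the left-eye image; the rescaling from the 456 x 256 network resolution to the cloud resolution is an assumption about the implementation.

```python
import numpy as np

def keypoints_to_3d(kp2d: np.ndarray, cloud: np.ndarray,
                    net_size=(456, 256), conf_thresh: float = 0.6):
    """Look up 3D coordinates for the screened 2D keypoints.

    kp2d: (K, 3) keypoints in the 456x256 network image; cloud:
    organized point cloud of shape (H, W, 4) with XYZ in the first
    three channels, aligned with the left-eye image.
    """
    H, W = cloud.shape[:2]
    sx, sy = W / net_size[0], H / net_size[1]
    pts3d = []
    for x, y, c in kp2d:
        if c <= conf_thresh:
            continue  # keep only confident keypoints (threshold 0.6 above)
        u, v = min(int(x * sx), W - 1), min(int(y * sy), H - 1)
        X, Y, Z = cloud[v, u, :3]
        if np.isfinite([X, Y, Z]).all():  # ZED marks invalid depth as NaN/inf
            pts3d.append((X, Y, Z))
    return np.asarray(pts3d)
```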
step S4.2: for each pedestrian target, calculating a three-dimensional coordinate mean value as a space position coordinate of the pedestrian target according to the three-dimensional coordinate set of the key point of the pedestrian target; calculating the actual distance of the pedestrian relative to the stereo camera according to the Euclidean distance, calculating the moving distance of the pedestrian relative to the stereo camera under the time interval of the current moment and the previous moment according to the spatial position coordinate of the pedestrian at the current moment and the spatial position coordinate of the pedestrian at the previous moment, and combining the used time to obtain the moving speed of the pedestrian at the current moment;
in the embodiment of the invention, according to the acquired three-dimensional coordinate set of the key points of the human body of the pedestrian, the average value of the three-dimensional coordinate set is calculated to be used as the space three-dimensional coordinate position of the target pedestrian, namely
Figure DEST_PATH_IMAGE026
Where N =10, represents the number of keypoints for each pedestrian upper body region; i represents the ID of the currently tracked pedestrian, and if the time of the current frame is t, the spatial position of the pedestrian under the current frame relative to the robot is
Figure DEST_PATH_IMAGE028
In a
Figure 118756DEST_PATH_IMAGE016
The spatial position of the moment relative to the robot is
Figure DEST_PATH_IMAGE030
The moving speed of the pedestrian is
Figure DEST_PATH_IMAGE032
Wherein, in the process,
Figure DEST_PATH_IMAGE034
f is a ZED camera frame rate, m represents the number of frame intervals, the total time consumption of the algorithm is considered comprehensively, and the value of m is
Figure DEST_PATH_IMAGE036
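A sketch of the position and speed computation under these definitions; the concrete frame interval m = 3 is an illustrative assumption, since the text only states that m is chosen from the algorithm's total time consumption, while f = 30 Hz matches the ZED frame rate given above.

```python
import numpy as np

def pedestrian_state(pts3d_t: np.ndarray, pos_prev: np.ndarray,
                     m: int = 3, f: float = 30.0):
    """Mean 3D position, camera distance, and moving speed of one pedestrian.

    pts3d_t: (N, 3) torso keypoint coordinates at frame t; pos_prev:
    position m frames earlier; m = 3 is illustrative, f is the frame rate.
    """
    pos = pts3d_t.mean(axis=0)                      # spatial position (X, Y, Z)
    dist = float(np.linalg.norm(pos))               # Euclidean distance to camera
    speed = float(np.linalg.norm(pos - pos_prev) / (m / f))  # v = d / (m / f)
    return pos, dist, speed
```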
As shown in fig. 4, the ZED camera acquires RGB data and point cloud data; the two-dimensional coordinates of the human keypoints are then obtained with the keypoint detection algorithm, and finally the three-dimensional keypoint coordinates are obtained by combining the two-dimensional keypoint information with the corresponding point cloud information. To verify the effectiveness of the method, about 1000 frames of images were collected, and the pedestrian-to-camera distance was computed both with a YOLOv3 human detection box combined with point cloud information and with the torso-region keypoints of this method; comparison tests and statistics were carried out for each. The pedestrian-to-camera distance is computed with the Euclidean distance formula:

$$d = \sqrt{X^2 + Y^2 + Z^2}$$

where (X, Y, Z) are the pedestrian's three-dimensional spatial coordinates. As shown in fig. 5, the dark line is the proposed method and the light line the comparison method; during pedestrian movement, the detection-box-based method introduces larger noise as body posture changes, while the proposed method is more robust to such interference and obtains more accurate pedestrian localization information.
A pedestrian spatial information perception device based on a stereo camera is used for realizing the pedestrian spatial information perception method based on the stereo camera and comprises a real-time image data acquisition module, a pedestrian detection frame acquisition module, a multi-target tracking module and a real-time spatial information generation module;
the real-time image data acquisition module acquires RGB image data and point cloud data through a stereo camera;
the pedestrian detection frame acquisition module detects human body key points through RGB images to obtain two-dimensional key point information of pedestrians, determines a pedestrian upper half body trunk region according to dynamic characteristics of the pedestrians, generates a pedestrian enclosure frame by combining the two-dimensional key point information of the pedestrian upper half body trunk region, and takes the pedestrian enclosure frame as a pedestrian detection frame;
the multi-target tracking module carries out multi-target tracking on pedestrians under continuous multi-frame images according to the two-dimensional key point similarity characteristics and the pedestrian detection frame;
the real-time spatial information generation module acquires three-dimensional key point information of the pedestrian according to the two-dimensional key point information of the continuously tracked pedestrian and the point cloud data, calculates spatial position coordinates of the pedestrian relative to a coordinate system of the stereo camera under the current frame, calculates the moving speed of the pedestrian according to frame intervals, and generates real-time spatial information of the pedestrian relative to the stereo camera.
The implementation of this part is similar to that of the above method embodiment, and is not described here again.
As shown in fig. 6, a pedestrian spatial information perception method based on a ZED camera follows the stereo-camera-based method above: real-time image data are collected by the ZED binocular vision camera arranged on the navigation robot and transmitted to a cloud server for pedestrian detection frame acquisition, multi-target tracking and real-time spatial information generation. The field of view of the ZED binocular vision camera is matched with the pedestrian's upper-body torso region, which is determined according to the pedestrian's dynamic characteristics and the camera's field of view on the navigation robot. The cloud server transmits the pedestrian's real-time spatial information relative to the ZED binocular vision camera to the navigation robot, and the navigation robot controls its body movement according to this real-time spatial information to complete navigation tasks such as autonomous following and obstacle avoidance.
Specifically, the method comprises the following steps:
step S101: acquiring real-time image data of a ZED binocular vision camera on the navigation robot, wherein the real-time image data comprises RGB image data and point cloud data, and transmitting the real-time image data to a cloud server;
step S102: the cloud server detects human keypoints in the RGB images to obtain the pedestrians' two-dimensional keypoint information, determines the pedestrian's upper-body torso region according to the pedestrian's dynamic characteristics, generates a pedestrian bounding box from the two-dimensional keypoint information of that region, and takes the bounding box as the pedestrian detection frame; the field of view of the ZED binocular vision camera is matched with the pedestrian's upper-body torso region, which is determined according to the pedestrian's dynamic characteristics and the field of view of the ZED binocular vision camera on the navigation robot;
step S103: the cloud server performs multi-target tracking on pedestrians under continuous multi-frame images according to the two-dimensional key point similarity characteristics and the pedestrian detection frame;
step S104: the cloud server acquires three-dimensional key point information of the pedestrian according to the two-dimensional key point information of the continuously tracked pedestrian and the point cloud data, calculates the space position coordinate of the pedestrian relative to a coordinate system of the ZED binocular vision camera under the current frame, calculates the moving speed of the pedestrian according to the frame interval, and generates real-time space information of the pedestrian relative to the ZED binocular vision camera;
step S105: the cloud server transmits real-time space information of the pedestrians relative to the ZED binocular vision camera to the navigation robot, and the navigation robot performs mobile control over the body according to the real-time space information to complete navigation tasks.
In the embodiment of the invention, the cloud stores the pedestrian ID, spatial position and moving speed at the current moment in a database on the cloud server, organized as a message queue with the basic format data = {'key1': value1, 'key2': value2, 'key3': value3}, where key1, key2 and key3 are 'p_ID', 'p_Pos3D' and 'p_Speed', representing the pedestrian ID, the pedestrian's three-dimensional spatial coordinates and the pedestrian's moving speed, and the corresponding values are the computed ID, three-dimensional spatial coordinates and moving speed of the specific pedestrian i. On request from the robot, the cloud sends the data to the robot in real time through RocketMQ message middleware. The robot controls its body movement according to preset instructions: it adjusts its own moving speed according to the pedestrian's real-time position, stops moving to avoid collision when the pedestrian-to-robot distance falls below a safe distance, and adjusts its speed according to the pedestrian's real-time moving speed, thereby achieving intelligent navigation tasks such as autonomous following and obstacle avoidance. A hedged sketch of the message construction follows.
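This sketch follows the key names given above ('p_ID', 'p_Pos3D', 'p_Speed'); the JSON serialization and rounding are assumptions, and the RocketMQ producer call itself is omitted because its client API is not described here.

```python
import json

def make_message(pid: int, pos3d, speed: float) -> str:
    """Serialize one pedestrian's state for the cloud message queue.

    Key names follow the format described above; JSON is an assumed
    wire encoding for the RocketMQ payload.
    """
    data = {"p_ID": pid,
            "p_Pos3D": [round(float(v), 3) for v in pos3d],
            "p_Speed": round(float(speed), 3)}
    return json.dumps(data)
```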
A pedestrian spatial information perception device based on a ZED camera is used for realizing a pedestrian spatial information perception method based on the ZED camera and comprises a cloud server and a ZED binocular vision camera arranged on a navigation robot, wherein the cloud server comprises a pedestrian detection frame acquisition module, a multi-target tracking module and a real-time spatial information generation module;
the ZED binocular vision camera acquires RGB image data and point cloud data in real time and transmits the RGB image data and the point cloud data to the cloud server;
the pedestrian detection frame acquisition module detects human body key points through RGB images to obtain two-dimensional key point information of pedestrians, determines a pedestrian upper half body trunk region according to dynamic characteristics of the pedestrians, generates a pedestrian enclosure frame by combining the two-dimensional key point information of the pedestrian upper half body trunk region, and takes the pedestrian enclosure frame as a pedestrian detection frame; the vision range of the ZED binocular vision camera is matched with the trunk area of the upper half body of the pedestrian, and the trunk area of the upper half body of the pedestrian is determined according to the dynamic characteristic of the pedestrian and the vision range of the ZED binocular vision camera on the navigation robot;
the multi-target tracking module carries out multi-target tracking on pedestrians under continuous multi-frame images according to the similarity characteristics of the two-dimensional key points and the pedestrian detection frame;
the real-time spatial information generation module is used for acquiring three-dimensional key point information of the pedestrian according to the two-dimensional key point information of the continuously tracked pedestrian by combining point cloud data, calculating the spatial position coordinate of the pedestrian relative to a three-dimensional camera coordinate system under the current frame, and calculating the moving speed of the pedestrian by combining frame intervals to generate real-time spatial information of the pedestrian relative to the three-dimensional camera;
and the navigation robot acquires real-time space information of the pedestrian relative to the ZED binocular vision camera from the cloud server, performs mobile control on the body and completes navigation tasks.
The implementation of this part is similar to that of the above method embodiment, and is not described here again.
Corresponding to the embodiment of the pedestrian spatial information perception method based on the stereo camera vision, the invention also provides an embodiment of the pedestrian spatial information perception device based on the stereo camera vision.
Referring to fig. 7, the apparatus for sensing pedestrian spatial information based on stereoscopic camera vision according to the embodiment of the present invention includes a memory and one or more processors, where the memory stores executable codes, and when the one or more processors execute the executable codes, the one or more processors are configured to implement the method for sensing pedestrian spatial information based on stereoscopic camera vision in the foregoing embodiment.
The embodiment of the pedestrian spatial information perception device based on the stereo camera vision of the invention can be applied to any device with data processing capability, such as a computer or other devices or apparatuses. The apparatus embodiments may be implemented by software, or by hardware, or by a combination of hardware and software. The software implementation is taken as an example, and as a logical device, the device is formed by reading corresponding computer program instructions in the nonvolatile memory into the memory for running through the processor of any device with data processing capability. From a hardware aspect, as shown in fig. 7, the present invention is a hardware structure diagram of any device with data processing capability in which a pedestrian spatial information sensing device based on stereo camera vision is located, except for the processor, the memory, the network interface, and the nonvolatile memory shown in fig. 7, any device with data processing capability in which an apparatus in the embodiment is located may also include other hardware according to an actual function of the any device with data processing capability, which is not described again.
The implementation process of the functions and actions of each unit in the above device is specifically described in the implementation process of the corresponding step in the above method, and is not described herein again.
For the device embodiments, since they substantially correspond to the method embodiments, reference may be made to the partial description of the method embodiments for relevant points. The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules can be selected according to actual needs to achieve the purpose of the scheme of the invention. One of ordinary skill in the art can understand and implement without inventive effort.
An embodiment of the present invention further provides a computer-readable storage medium, on which a program is stored, where the program, when executed by a processor, implements the method for sensing spatial information of a pedestrian based on stereo camera vision in the foregoing embodiments.
The computer readable storage medium may be an internal storage unit, such as a hard disk or a memory, of any data processing capability device described in any of the foregoing embodiments. The computer readable storage medium may also be any external storage device of a device with data processing capabilities, such as a plug-in hard disk, a Smart Media Card (SMC), an SD Card, a Flash memory Card (Flash Card), etc. provided on the device. Further, the computer readable storage medium may include both internal storage units and external storage devices of any data processing capable device. The computer-readable storage medium is used for storing the computer program and other programs and data required by the arbitrary data processing-capable device, and may also be used for temporarily storing data that has been output or is to be output.
The above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and these modifications or substitutions do not depart from the scope of the embodiments of the present invention in nature.

Claims (10)

1. A pedestrian spatial information perception method based on a stereo camera is characterized by comprising the following steps:
step S1: acquiring real-time image data of a stereo camera, wherein the real-time image data comprises RGB image data and point cloud data;
step S2: detecting key points of a human body through RGB images to obtain two-dimensional key point information of a pedestrian, determining an upper half body trunk region of the pedestrian according to the dynamic characteristic of the pedestrian, generating a pedestrian surrounding frame by combining the two-dimensional key point information of the upper half body trunk region of the pedestrian, and taking the pedestrian surrounding frame as a pedestrian detection frame;
and step S3: carrying out multi-target tracking on pedestrians under continuous multi-frame images according to the similarity characteristics of the two-dimensional key points and the pedestrian detection frame;
and step S4: and for the continuously tracked pedestrians, acquiring three-dimensional key point information of the pedestrians according to the two-dimensional key point information of the pedestrians in combination with the point cloud data, calculating the spatial position coordinates of the pedestrians relative to a three-dimensional camera coordinate system under the current frame, calculating the moving speed of the pedestrians in combination with the frame interval, and generating real-time spatial information of the pedestrians relative to the three-dimensional camera.
2. The stereo camera-based pedestrian spatial information perception method according to claim 1, wherein: in the step S2, RGB image data are obtained, a human body key point detection network is adopted for forward reasoning, a key point thermodynamic diagram and a partial association domain are output, two-dimensional key points are extracted according to the key point thermodynamic diagram and the partial association domain and grouped, the two-dimensional key points belonging to the same pedestrian are matched to the current pedestrian, and the two-dimensional key point coordinates of each pedestrian in the current image are obtained.
3. The pedestrian spatial information perception method based on the stereo camera according to claim 1, wherein: the step S3 includes the steps of:
step S3.1: acquiring the motion characteristics of the pedestrians according to the pedestrian detection frame; acquiring the appearance characteristics of the pedestrian according to the similarity characteristics of the two-dimensional key points;
step S3.2: acquiring actual measurement state information of the pedestrian at the current time t according to the motion characteristics of the pedestrian and the appearance characteristics of the pedestrian;
step S3.3: performing data association on the historical track and actual measurement state information of the pedestrians at the time t to obtain the ID of each pedestrian at the time t;
step S3.4: and updating the historical track by the ID of each pedestrian at the time t.
4. The pedestrian spatial information perception method based on the stereo camera according to claim 3, wherein: and the similarity calculation is carried out on the two-dimensional key point similarity characteristics in the step S3.1 by adopting a target key point similarity evaluation index OKS, and whether the two-dimensional key points are related or not is judged through a preset threshold value.
5. The pedestrian spatial information perception method based on the stereo camera according to claim 3, wherein: and the data association in the step S3.3 is to perform two associations of motion characteristics and appearance characteristics on each frame, perform linear weighting to obtain a final association matrix, and obtain a pedestrian matching result between frames by adopting a Hungarian matching algorithm according to the association matrix.
6. The pedestrian spatial information perception method based on the stereo camera according to claim 1, wherein: the step S4 includes the steps of:
step S4.1: screening by combining the confidence coefficient of the two-dimensional key points according to the trunk region of the upper half of the pedestrian, and acquiring point cloud data according to the screened two-dimensional key points to obtain a three-dimensional coordinate set of the pedestrian key points;
step S4.2: for each pedestrian target, calculating a three-dimensional coordinate mean value as a space position coordinate of the pedestrian target according to the three-dimensional coordinate set of the key point of the pedestrian target; and calculating the actual distance of the pedestrian relative to the stereo camera according to the Euclidean distance, calculating the moving distance of the pedestrian relative to the stereo camera under the time interval between the current moment and the previous moment according to the spatial position coordinate of the pedestrian at the current moment and the spatial position coordinate of the pedestrian at the previous moment, and combining the used time to obtain the moving speed of the pedestrian at the current moment.
7. The pedestrian spatial information perception method based on the stereo camera according to claim 6, wherein: the pedestrian moving speed formula in the step S4.2 is as follows:
$$v_i^t = \frac{\sqrt{\left(X_i^t - X_i^{t-m}\right)^2 + \left(Y_i^t - Y_i^{t-m}\right)^2 + \left(Z_i^t - Z_i^{t-m}\right)^2}}{\Delta t}, \qquad \Delta t = \frac{m}{f}$$

wherein X, Y and Z are respectively the three-dimensional spatial position coordinates of the pedestrian, i represents the ID of the currently tracked pedestrian, t represents the time of the current frame, $(X_i^t, Y_i^t, Z_i^t)$ represent the spatial position coordinates of the pedestrian relative to the stereo camera in the current frame, $(X_i^{t-m}, Y_i^{t-m}, Z_i^{t-m})$ represent the spatial position coordinates relative to the stereo camera at time $t-m$, $\Delta t$ is the elapsed time, m represents the number of frame intervals, and f represents the stereo camera frame rate.
8. A pedestrian spatial information perception device based on a stereo camera is used for realizing the pedestrian spatial information perception method based on the stereo camera and described in any one of claims 1 to 7, and comprises a real-time image data acquisition module, a pedestrian detection frame acquisition module, a multi-target tracking module and a real-time spatial information generation module, and is characterized in that:
the real-time image data acquisition module acquires RGB image data and point cloud data through a stereo camera;
the pedestrian detection frame acquisition module detects human body key points through RGB images to obtain two-dimensional key point information of pedestrians, determines a pedestrian upper half body trunk region according to dynamic characteristics of the pedestrians, generates a pedestrian enclosure frame by combining the two-dimensional key point information of the pedestrian upper half body trunk region, and takes the pedestrian enclosure frame as a pedestrian detection frame;
the multi-target tracking module carries out multi-target tracking on pedestrians under continuous multi-frame images according to the similarity characteristics of the two-dimensional key points and the pedestrian detection frame;
the real-time spatial information generation module is used for acquiring three-dimensional key point information of the pedestrian according to the two-dimensional key point information of the continuously tracked pedestrian and the point cloud data, calculating the spatial position coordinate of the pedestrian relative to a three-dimensional camera coordinate system under the current frame, calculating the moving speed of the pedestrian according to the frame interval, and generating the real-time spatial information of the pedestrian relative to the three-dimensional camera.
9. A pedestrian spatial information perception method based on a ZED camera is characterized in that real-time image data are collected by the ZED binocular vision camera arranged on a navigation robot and transmitted to a cloud server to acquire a pedestrian detection frame, track multiple targets and generate real-time spatial information, wherein the vision range of the ZED binocular vision camera is matched with the upper half trunk area of a pedestrian, the upper half trunk area of the pedestrian is determined according to the dynamic characteristics of the pedestrian and the vision range of the ZED binocular vision camera on the navigation robot, the cloud server transmits the real-time spatial information of the pedestrian relative to the ZED binocular vision camera to the navigation robot, and the navigation robot performs movement control of a body according to the real-time spatial information to complete a navigation task.
10. A pedestrian spatial information perception device based on a ZED camera for realizing the pedestrian spatial information perception method based on the ZED camera of claim 9, comprising a cloud server and ZED binocular vision cameras disposed on a navigation robot, characterized in that: the cloud server comprises a pedestrian detection frame acquisition module, a multi-target tracking module and a real-time spatial information generation module;
the ZED binocular vision camera acquires RGB image data and point cloud data in real time and transmits the RGB image data and the point cloud data to the cloud server;
the pedestrian detection frame acquisition module detects human body key points through RGB images to obtain two-dimensional key point information of pedestrians, determines a pedestrian upper half body trunk region according to dynamic characteristics of the pedestrians, generates a pedestrian enclosure frame by combining the two-dimensional key point information of the pedestrian upper half body trunk region, and takes the pedestrian enclosure frame as a pedestrian detection frame; the vision range of the ZED binocular vision camera is matched with the trunk area of the upper half of the pedestrian, and the trunk area of the upper half of the pedestrian is determined according to the dynamic characteristic of the pedestrian and the vision range of the ZED binocular vision camera on the navigation robot;
the multi-target tracking module carries out multi-target tracking on pedestrians under continuous multi-frame images according to the similarity characteristics of the two-dimensional key points and the pedestrian detection frame;
the real-time spatial information generation module is used for acquiring three-dimensional key point information of the pedestrian according to the two-dimensional key point information of the continuously tracked pedestrian by combining point cloud data, calculating the spatial position coordinate of the pedestrian relative to a three-dimensional camera coordinate system under the current frame, and calculating the moving speed of the pedestrian by combining frame intervals to generate real-time spatial information of the pedestrian relative to the three-dimensional camera;
and the navigation robot acquires real-time space information of the pedestrian relative to the ZED binocular vision camera from the cloud server, performs mobile control on the body and completes navigation tasks.
CN202211187402.8A 2022-09-28 2022-09-28 Pedestrian spatial information sensing method and device based on ZED stereo camera Pending CN115546829A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211187402.8A CN115546829A (en) 2022-09-28 2022-09-28 Pedestrian spatial information sensing method and device based on ZED stereo camera

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211187402.8A CN115546829A (en) 2022-09-28 2022-09-28 Pedestrian spatial information sensing method and device based on ZED stereo camera

Publications (1)

Publication Number Publication Date
CN115546829A true CN115546829A (en) 2022-12-30

Family

ID=84729506

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211187402.8A Pending CN115546829A (en) 2022-09-28 2022-09-28 Pedestrian spatial information sensing method and device based on ZED (zero-energy-dimension) stereo camera

Country Status (1)

Country Link
CN (1) CN115546829A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116150298A (en) * 2023-04-19 2023-05-23 山东盛途互联网科技有限公司 Data acquisition method and system based on Internet of things and readable storage medium


Similar Documents

Publication Publication Date Title
CN110070615B (en) Multi-camera cooperation-based panoramic vision SLAM method
CN110349250B (en) RGBD camera-based three-dimensional reconstruction method for indoor dynamic scene
CN108406731B (en) Positioning device, method and robot based on depth vision
US10977818B2 (en) Machine learning based model localization system
CN110097553B (en) Semantic mapping system based on instant positioning mapping and three-dimensional semantic segmentation
US20190188533A1 (en) Pose estimation
CN107341442B (en) Motion control method, motion control device, computer equipment and service robot
US10068344B2 (en) Method and system for 3D capture based on structure from motion with simplified pose detection
CN102609942B (en) Depth map is used to carry out mobile camera location
CN102622762B (en) Real-time camera tracking using depth maps
CN113674416B (en) Three-dimensional map construction method and device, electronic equipment and storage medium
CN108051002A (en) Transport vehicle space-location method and system based on inertia measurement auxiliary vision
US20220051425A1 (en) Scale-aware monocular localization and mapping
CN108628306B (en) Robot walking obstacle detection method and device, computer equipment and storage medium
EP3695381B1 (en) Floor detection in virtual and augmented reality devices using stereo images
JP7379065B2 (en) Information processing device, information processing method, and program
CN110260866A (en) A kind of robot localization and barrier-avoiding method of view-based access control model sensor
CN208323361U (en) A kind of positioning device and robot based on deep vision
KR20210058686A (en) Device and method of implementing simultaneous localization and mapping
CN115900710A (en) Dynamic environment navigation method based on visual information
CN111998862A (en) Dense binocular SLAM method based on BNN
WO2022021661A1 (en) Gaussian process-based visual positioning method, system, and storage medium
JP2018120283A (en) Information processing device, information processing method and program
CN112541938A (en) Pedestrian speed measuring method, system, medium and computing device
CN116128966A (en) Semantic positioning method based on environmental object

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination