CN115035162A - Monitoring video personnel positioning and tracking method and system based on visual slam

Info

Publication number: CN115035162A
Application number: CN202210669897.1A
Authority: CN (China)
Prior art keywords: point cloud, camera, monitoring, dimensional, cloud map
Legal status: Pending
Other languages: Chinese (zh)
Inventors: 闫丹凤, 蔡院强, 赵岳, 刘子豪, 郭熙东, 陈梦实
Current Assignee: Beijing University of Posts and Telecommunications
Original Assignee: Beijing University of Posts and Telecommunications
Application filed by Beijing University of Posts and Telecommunications; priority to CN202210669897.1A

Classifications

    • G06T 7/246 Image analysis; Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G06T 7/73 Image analysis; Determining position or orientation of objects or cameras using feature-based methods
    • G06T 7/80 Image analysis; Analysis of captured images to determine intrinsic or extrinsic camera parameters, i.e. camera calibration
    • G06V 10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G06V 20/40 Scenes; Scene-specific elements in video content
    • G06V 40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V 2201/07 Indexing scheme relating to image or video recognition or understanding; Target detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Closed-Circuit Television Systems (AREA)

Abstract

The invention discloses a method and system for positioning and tracking personnel in surveillance video based on visual SLAM. A depth camera performs three-dimensional reconstruction of the environment to obtain a point cloud map of the environment. External-parameter calibration is integrated into the scene of each monitoring camera: the monitoring camera is calibrated with a calibration board to obtain its position and attitude information. Personnel are then tracked through the monitoring camera: people are identified in the monitored images with a deep neural network, their three-dimensional positions are calculated from the ground prior of pedestrians appearing in the monitoring view using the principle of inverse perspective transformation based on the previously calibrated position and attitude, and their trajectories are drawn and presented in the constructed point cloud map.

Description

Monitoring video personnel positioning and tracking method and system based on visual slam
Technical Field
The invention relates to the technical field of machine vision, in particular to a monitoring video personnel positioning and tracking method and system based on visual slam.
Background
In recent years, the domestic and international video surveillance market has grown explosively, and surveillance continues to develop toward higher definition and greater intelligence. However, traditional two-dimensional monitoring systems require frequent switching between camera views and suffer from various other drawbacks. With the development of artificial intelligence, more and more work can be completed automatically by machines, freeing people from tedious tasks and improving work efficiency.
In the prior art, the position of a person in a scene is generally determined by Wi-Fi or similar means. Wi-Fi positioning usually relies on a nearest-neighbor method, i.e. the person is assumed to be at the position of the closest hotspot or base station; if several signal sources are nearby, accuracy can be improved by cross positioning (triangulation). When a user enables Wi-Fi and the mobile cellular network on a smartphone, the phone becomes a data source. The signal strength at a huge number of known positions must be recorded in advance, and the position of a newly added device is determined by comparing its signal strength against this large database.
Wi-Fi positioning also requires the monitored person to actively carry a handheld device that receives wireless signals. Existing visual calibration methods, on the other hand, require the camera to be moved; however, most monitoring cameras in real environments are fixed in place, so the moving-camera calibration methods commonly used in visual SLAM cannot be applied to them, and a concrete computer representation of the monitored person's position in the environment cannot be obtained accurately. To address these problems, the invention adopts a visual positioning method that relies on the monitoring system itself: external-parameter calibration is integrated into the scene of the monitoring camera and can be performed without moving the camera, the person being located does not need to carry any device, and the video images captured by the camera are processed automatically to locate the person accurately.
Disclosure of Invention
Aiming at the problems in the prior art that two-dimensional monitoring is cumbersome to operate, unintuitive and inefficient, and that personnel positioning and tracking depend on a handheld device, the invention provides a method and a system for positioning and tracking personnel in surveillance video based on visual SLAM.
In order to achieve the above purpose, the invention provides the following technical scheme:
In one aspect, the invention provides a visual SLAM-based surveillance video personnel positioning and tracking method: a depth camera performs three-dimensional reconstruction of the environment to obtain a point cloud map of the environment; external-parameter calibration is integrated into the scene of the monitoring camera, and the monitoring camera is calibrated through a calibration board to obtain its position and attitude information; personnel are tracked through the monitoring camera, people are identified in the monitored images with a deep neural network, their three-dimensional positions are calculated from the ground prior of pedestrians appearing in the monitoring view using the principle of inverse perspective transformation based on the previously calibrated position and attitude, and their trajectories are drawn and presented in the constructed point cloud map.
Further, the monitoring video personnel positioning and tracking method based on the visual slam is characterized by comprising the following steps:
s1, three-dimensional reconstruction based on visual slam: constructing a three-dimensional point cloud map of a scene and recording data required by external reference calibration, constructing the three-dimensional point cloud map of the scene according to a visual odometer and storing the three-dimensional point cloud map of the scene into a file by using RGBD picture stream and inertial sensor data provided by a depth camera; in the three-dimensional reconstruction process, shooting a checkerboard calibration board and recording the position and the posture of a camera at the moment;
s2, calibration: the monitoring camera shoots a monitoring picture of the chessboard pattern calibration plate, and the position and the posture of the monitoring camera in the three-dimensional point cloud map are calculated by combining the calibration data provided in the step S1 during three-dimensional reconstruction;
s3, tracking and calculating the position of the pedestrian in the monitoring camera: tracking the person, identifying the position of the person in the image and providing the position of the person in the three-dimensional point cloud map according to the position and posture data obtained in the step S2 and the monitoring video stream provided by the monitoring camera;
the pedestrian position tracking and calculating process in the monitoring camera comprises the following steps:
s31, selecting whether to enter an attitude correction process according to the type of the monitoring camera and whether the monitoring camera moves, if so, entering a step S32, otherwise, entering a step S33;
s32, if entering the posture correction process, extracting a vanishing point in the image, comparing the vanishing point with the position of the vanishing point, judging whether the monitoring picture rotates according to whether the vanishing point moves, and updating the rotation;
s33, entering a positioning process, and firstly taking a frame of monitoring video image;
s34, carrying out target detection on the pedestrian in the image to obtain a target frame coordinate;
s35, carrying out target tracking on the detected target frame and giving the corresponding personnel position coordinates;
s36, calculating the space position coordinates of the personnel according to the calibration parameters of the monitoring camera;
s37, if the positioning is not terminated, acquiring the next frame image and returning to the step S33, otherwise, ending.
Further, a point cloud map processing thread is added in the construction process of the three-dimensional point cloud map of the scene in the step S1, the point cloud map processing thread is used for receiving the position and orientation information and RGBD image frames of each frame of camera, and outputting an accurate point cloud map, and the specific flow is as follows:
s141, screening the position and pose information of each frame of depth camera and an RGBD image frame, and selecting a current frame when the camera angle change between the current frame and a last selected frame is more than 10 degrees and the displacement change is more than 2 meters, and performing subsequent point cloud map generation operation;
s142, calculating a point cloud block of the current frame, and rotating the point cloud block to a uniform world coordinate system;
s143, splicing and merging the point cloud blocks generated by all the frames to obtain an integral point cloud map, and performing filtering and outlier removing treatment on the point cloud map to compress the data volume of the point cloud map and optimize the visual impression of the map;
and S144, when a loop closure occurs during map construction, the ORB-SLAM3 re-optimizes the poses of the selected frames, re-stitches the point clouds, and re-performs the point cloud processing operation according to step S143.
Further, the external reference calibration method in step S2 includes:
s21, shooting a standard chessboard grid calibration plate by a monitoring camera: selecting an origin position of a world coordinate system, slowly moving a monitoring camera to the direction of a checkerboard calibration board from the origin position, estimating the pose of the monitoring camera in real time by using ORB _ SLAM3 in the process, closing a program when the monitoring camera moves to the front of the checkerboard calibration board, and storing the current photo shot by the monitoring camera and the pose of the camera;
s22, calibrating internal reference of the monitoring camera: placing a chessboard grid calibration board in the range of a monitoring camera, moving the chessboard grid calibration board at multiple angles, recording a section of video, extracting frames in the video, identifying a chessboard grid, and calibrating internal parameters and distortion of the monitoring camera by using a Zhang Zhengyou calibration method;
s23, calibrating external parameters of the monitoring camera: and solving a camera coordinate system and a target coordinate system by adopting a direct linear transformation method according to the actual three-dimensional position information of the target feature point and the two-dimensional position of the target feature point in the image, so as to realize the calculation of the relative position relation between the monitoring camera and the three-dimensional point cloud map.
Further, the method for calculating the spatial position coordinates of the person in step S36 includes:
solving the camera model optical center P_ow = (X_ow, Y_ow, Z_ow) of the monitoring camera according to the pose transformation matrix of the monitoring camera:

P_ow = -R^T t    (21)

taking the midpoint M(u, v) of the lower edge of the target frame of the person to be positioned, calculating the spatial position of the point M on the normalization plane according to the projection equation with the depth d set to 1 m, and solving the spatial coordinate P_m = (X_m, Y_m, Z_m) of the point M on the normalization plane according to formula (22):

P_m = R^T (d K^{-1} (u, v, 1)^T - t),  d = 1    (22)

giving the equation of the ground plane according to the height h of the ground in the world coordinate system, formula (23), and converting the ground plane into a representation by a point on the plane and a normal vector, formula (24):

z = h    (23)

p_0 = (0, 0, h),  n = (0, 0, 1)    (24)

writing the ray from the optical center P_ow through the point M as the parametric equation (25), where v = P_m - P_ow is the direction vector of the ray and t is a parameter, t ∈ [0, ∞):

P(t) = P_ow + t v    (25)

if the intersection point of the ray and the ground plane is P_g, then:

(P_ow + t v - p_0) · n = 0    (26)

rearranging formula (26) gives:

(P_ow - p_0) · n + t (v · n) = 0    (27)

applying the distributive law of the vector dot product gives:

t = ((p_0 - P_ow) · n) / (v · n)    (28)

thereby solving the intersection point P_g = P_ow + t v; the coordinates of the intersection point P_g are the three-dimensional coordinates of the pedestrian in the world coordinate system.
Further, the method for positioning and tracking the monitoring video personnel based on the visual slam further comprises a step S4 of visually displaying: and displaying the three-dimensional point cloud map of the scene in the step S1 and the position of the character in the step S3 in the three-dimensional point cloud map, providing a GUI (graphical user interface) for user interaction and supervision, wherein specific displayed contents comprise a plurality of three-dimensional point clouds and corresponding cameras and tracks thereof, and when a certain point cloud is selected to be displayed, other point clouds and the corresponding cameras and tracks are hidden.
On the other hand, the invention also provides a monitoring video personnel positioning and tracking system based on the visual slam, which comprises the following modules to realize the steps of the method:
the three-dimensional reconstruction module is used for constructing a three-dimensional point cloud map of a scene and recording data required by external reference calibration, the whole using period is only operated once, and the RGBD picture stream and the inertial sensor data provided by the depth camera are used for constructing the three-dimensional point cloud map of the scene according to the visual odometer and are stored in a file for being displayed by the visual display module; in the three-dimensional reconstruction process, shooting a checkerboard calibration board and recording the position and the posture of a camera at the moment for a calibration module to use;
the calibration module is used for measuring the position and the posture of each monitoring camera in the three-dimensional point cloud map, the whole using period only runs once, the monitoring cameras shoot a monitoring picture photo of a chessboard grid calibration plate, and the position and the posture of the monitoring cameras in the three-dimensional point cloud map are calculated by combining calibration data provided during three-dimensional reconstruction and are used by the position calculation module;
the position calculation module is used for identifying the position of a person in the image and providing the position of the person in the three-dimensional point cloud map for the visual display module to display according to the position and posture data obtained by the calibration module and the monitoring video stream provided by the monitoring camera;
and the visual display module is used for displaying the three-dimensional point cloud map of the scene and the positions of the characters in the three-dimensional point cloud map, and providing a GUI (graphical user interface) for user interaction and supervision.
Compared with the prior art, the invention has the following beneficial effects:
the method for positioning and tracking the monitoring video personnel based on the visual slam has three modules of drawing construction, calibration and monitoring, and the capacity expansion from 2D monitoring to 3D monitoring is realized. Meanwhile, a display system suitable for the monitoring scene is designed, and optimization is performed on the aspects of system point cloud display and interface video and track output.
1. The invention provides a three-dimensional map construction method based on a visual SLAM and an inertial odometer, which optimizes a reconstructed three-dimensional point cloud map through three main processes of tracking, local map construction and loop optimization so as to achieve terrain reconstruction with better effect on scenes in a building and enable monitoring personnel to quickly understand and accurately position the map.
2. Most monitoring cameras in real environments are fixed in place, so the moving-camera calibration methods commonly used in visual SLAM cannot be applied to them, and the concrete computer representation of their position in the environment cannot be obtained accurately. To solve this problem, the invention provides a general method for calibrating the external parameters between a monitoring camera and the environmental three-dimensional point cloud.
3. Through the depth camera parameter calibration method, the position information associated with the depth camera is obtained, and a person's trajectory can be accurately calculated by processing only the video captured by an ordinary camera. Each person's trace is tracked and drawn in the generated point cloud map, the corresponding trajectory information is stored and displayed on the interface, and target tracking trajectories can be automatically removed after a timeout.
4. By calculating the positions of persons and optimizing their trajectories, the invention provides a person position calculation and tracking algorithm based on three-dimensional vision.
5. Based on the three-dimensional point cloud map and the graphical interface, the invention also provides an extensible function, can identify the flame or smoke appearing in the video image, is suitable for the in-building monitoring requirement of a real scene, and ensures that the system can automatically find the danger and give an alarm in time when abnormal conditions occur.
6. The trajectory drawing method in the three-dimensional point cloud map is completed by a designed data transmission and display method: back-end point cloud files are transmitted to the system in a formatted manner, and a 3D display interface based on the three.js graphics library is established that can display the three-dimensional point cloud, the camera models and the trajectories, so that monitoring personnel can understand the output results simply and clearly and take corresponding measures conveniently. The designed interface has good generality and supports display on a variety of devices.
7. Combining the advantages of visual SLAM in map reconstruction, the invention designs and develops a 3D intelligent monitoring system that comprehensively addresses the pain points of traditional two-dimensional monitoring. It is intuitive, efficient, low-cost and easy to deploy; it can display a 3D point cloud map and monitor the three-dimensional layout of an entire building; it realizes one-click target tracking, automatically calculating and drawing in the point cloud map the trajectories of people entering the building, with clear and accurate three-dimensional display of the real-time trajectories and query of related information in a trajectory list; and monitoring views of each floor can also be switched with one click. Functions such as flame warning, smoke recognition and behavior recognition can also be added.
8. The system designed by the invention simultaneously supports the calculation and display of the relative pose of the fixed monitoring camera, is conveniently applied to the existing monitoring scene, and is convenient for the integration of new-generation information technologies such as internet and big data under a big background, cloud calculation and the like with the monitoring system in depth.
Drawings
In order to more clearly illustrate the embodiments of the present application or technical solutions in the prior art, the drawings needed to be used in the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments described in the present invention, and other drawings can be obtained by those skilled in the art according to the drawings.
FIG. 1 is a monitoring video personnel positioning and tracking system based on visual slam according to an embodiment of the present invention;
FIG. 2 is a flow chart of a three-dimensional reconstruction module according to an embodiment of the present invention;
FIG. 3 is a flow chart of a point cloud map calibration provided by an embodiment of the present invention;
fig. 4 is a corresponding relationship between the calibration board images shot by the RGBD camera and the monitoring camera provided by the embodiment of the present invention and the extracted feature points;
FIG. 5 is a flowchart of target detection, tracking, and position calculation according to an embodiment of the present invention;
fig. 6 is a vanishing point extracting diagram provided by the embodiment of the invention;
FIG. 7 is a diagram of a typed array according to an embodiment of the present invention;
fig. 8 is an illustration of a sample loading of a three-dimensional point cloud according to an embodiment of the present invention;
FIG. 9 is an illustration of a camera model provided by an embodiment of the present invention;
FIG. 10 is a flowchart for track update rendering according to an embodiment of the present invention;
FIG. 11 is a diagram of system data transmission according to an embodiment of the present invention;
fig. 12 is a track related function display of a visual video surveillance system based on three-dimensional vision according to an embodiment of the present invention;
fig. 13 is an overall interface diagram of a visual video monitoring system based on three-dimensional vision according to an embodiment of the present invention, and a display of a switch function and a floor switching function;
fig. 14 is a display of the flame and smoke alarm functions of a visual video monitoring system based on three-dimensional vision according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention are clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The embodiment of the invention provides a monitoring video personnel positioning and tracking system based on visual slam, wherein hardware equipment adopts a Kinect for Azure depth camera, a checkerboard calibration board, a monitoring camera, a personal computer or a notebook computer, and the system comprises the following modules (as shown in figure 1):
the three-dimensional reconstruction module is used for constructing a three-dimensional point cloud map of a scene and recording data required by external reference calibration, the whole using period is only operated once, and the RGBD picture stream and the inertial sensor data provided by the depth camera are used for constructing the three-dimensional point cloud map of the scene according to the visual odometer and are stored in a file for being displayed by the visual display module; in the three-dimensional reconstruction process, shooting a checkerboard calibration board and recording the position and the posture of a camera at the moment for a calibration module to use;
the calibration module is used for measuring the position and the posture of each monitoring camera in the three-dimensional point cloud map, the whole using period only runs once, the monitoring cameras shoot a monitoring picture photo of a chessboard grid calibration plate, and the position and the posture of the monitoring cameras in the three-dimensional point cloud map are calculated by combining calibration data provided during three-dimensional reconstruction and are used by the position calculation module;
the position calculation module is used for identifying the position of a person in the image and providing the position of the person in the three-dimensional point cloud map for the visual display module to display according to the position and posture data obtained by the calibration module and the monitoring video stream provided by the monitoring camera;
and the visual display module is used for displaying the three-dimensional point cloud map of the scene and the positions of the characters in the three-dimensional point cloud map and providing a GUI (graphical user interface) for user interaction and supervision.
The system consists of the four modules, wherein the three-dimensional reconstruction module, the calibration module and the position calculation module are innovated on the technical level, and the visual display module is innovated on the product level.
Based on the system, the embodiment of the invention provides a monitoring video personnel positioning and tracking method based on visual slam, which comprises the following specific steps:
s1, three-dimensional reconstruction based on visual slam: constructing a three-dimensional point cloud map of a scene and recording data required by external reference calibration, constructing the three-dimensional point cloud map of the scene according to a visual odometer and storing the three-dimensional point cloud map of the scene into a file by using RGBD picture stream and inertial sensor data provided by a depth camera; in the three-dimensional reconstruction process, a chessboard pattern calibration plate is shot and the position and the posture of the camera at the moment are recorded.
In the three-dimensional reconstruction process, device parameter calibration uses the imu_utils tool, the Kalibr toolbox and a two-dimensional code calibration board to measure the parameters of the depth camera, including: the IMU parameters of the depth camera, the time offset between the IMU and the depth camera, and the pose transformation between the IMU coordinate system and the depth camera coordinate system.
The three-dimensional reconstruction module adapts the Kinect for Azure depth camera based on an ORB _ SLAM3 open source framework, so that the Kinect for Azure depth camera can be used normally under actual working conditions. The construction process of the three-dimensional point cloud map of the scene in step S1 is shown in fig. 2, and the modification content specifically includes:
s11, performing down-sampling on the RGBD picture stream provided by the depth camera, and reducing the resolution from 1280 × 720 to 960 × 540;
s12, aligning the time stamp to the incoming data stream according to the calibration parameters stored in the file;
s13, changing triangulation into depth map reading for Z-axis direction distance estimation of the feature points;
and S14, increasing a point cloud map processing thread to reconstruct dense point clouds of the scene terrain.
Visual odometry provides only depth camera position and pose, thus requiring the reconstruction of dense point clouds from the scene terrain. The method is divided into two steps, the point clouds are spliced out according to the postures of the depth cameras and the depth camera model, then the point clouds are processed, and repeated points and redundant points are removed. In the system, the functions are realized through a point cloud map processing thread, and a flow chart is shown in fig. 2. The point cloud map processing thread is used for receiving the position and pose information of each frame of depth camera and RGBD image frames (including color images and depth images) and outputting an accurate point cloud map, and the step S14 is used for the point cloud map processing thread and comprises the following steps:
s141, screening the position and pose information of each frame of depth camera and an RGBD image frame, and selecting a current frame when the camera angle change between the current frame and a last selected frame is more than 10 degrees and the displacement change is more than 2 meters, and performing subsequent point cloud map generation operation;
s142, calculating a point cloud block of the current frame according to a depth camera model formula, and rotating the point cloud block to a uniform world coordinate system;
s143, splicing and merging the point cloud blocks generated by all the frames to obtain an integral point cloud map, and performing filtering and outlier removing treatment on the point cloud map to compress the data volume of the point cloud map and optimize the visual impression of the map;
and S144, when a loop closure occurs during map construction, the ORB-SLAM3 re-optimizes the poses of the selected frames, re-stitches the point clouds, and re-performs the point cloud processing operation according to step S143.
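For illustration, the keyframe screening of step S141 and the merging and filtering of step S143 could be sketched as follows in Python. This is a minimal sketch rather than the implementation of the invention: the Open3D library is used here in place of PCL, and everything except the 10-degree / 2-meter gate is an assumption.

# Sketch of the keyframe gating (S141) and point cloud merging/filtering (S143).
# Open3D stands in for PCL; filter parameters are illustrative only.
import numpy as np
import open3d as o3d

def pose_changed_enough(T_prev, T_curr, angle_deg=10.0, dist_m=2.0):
    """Return True when the camera moved enough since the last selected frame (S141)."""
    R_rel = T_prev[:3, :3].T @ T_curr[:3, :3]
    angle = np.degrees(np.arccos(np.clip((np.trace(R_rel) - 1.0) / 2.0, -1.0, 1.0)))
    dist = np.linalg.norm(T_curr[:3, 3] - T_prev[:3, 3])
    return angle > angle_deg and dist > dist_m

def merge_and_filter(blocks, voxel=0.05):
    """Merge per-keyframe point cloud blocks (already in world coordinates),
    then downsample and remove outliers to compress the map (S143)."""
    merged = o3d.geometry.PointCloud()
    for block in blocks:            # each block: o3d.geometry.PointCloud
        merged += block
    merged = merged.voxel_down_sample(voxel_size=voxel)
    merged, _ = merged.remove_statistical_outlier(nb_neighbors=20, std_ratio=2.0)
    return merged

In practice the voxel size, neighbor count and standard-deviation ratio would be tuned to the scene being reconstructed.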
The detailed process of converting the RGBD image frames obtained by the depth camera into point cloud blocks and stitching the point cloud blocks to the world coordinate system is as follows:
For a pixel point p in the image with coordinates (u, v), let the three-dimensional point P obtained by projecting p into the point cloud have coordinates (X_c, Y_c, Z_c). The point in the point cloud is generated by the following formulas:

X_c = (u - c_x) · d / f_x
Y_c = (v - c_y) · d / f_y
Z_c = d

where f_x, f_y, c_x, c_y are the camera intrinsic parameters obtained by the device parameter calibration method given above, and d is the depth of point p read from the depth map.

Through ORB-SLAM3, the rotation matrix R_cw and translation vector t_cw of the camera in the world coordinate system are obtained and combined into the transformation matrix:

T_cw = [ R_cw  t_cw ]
       [ 0^T   1    ]

The coordinates X_w, Y_w, Z_w of the point P in the world coordinate system are then:

(X_w, Y_w, Z_w, 1)^T = T_cw (X_c, Y_c, Z_c, 1)^T
based on the method, all pixel points in the image can be converted into three-dimensional points in a world coordinate system, and the method has the advantage that after the key frame is corrected by the loop detection thread of the ORB-SLAM, the corrected result can act on a dense map. Compared with the method of directly splicing point clouds, the method can eliminate accumulated errors.
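A minimal numpy sketch of this pixel-to-world conversion, under the camera-to-world reading of R_cw, t_cw used above (all function and variable names are illustrative):

# Back-project every valid pixel with the pinhole model, then apply the
# camera-to-world transform assembled from R_cw and t_cw.
import numpy as np

def rgbd_to_world_points(depth, K, R_cw, t_cw, depth_scale=1.0):
    """depth: HxW depth map (meters after depth_scale); K: 3x3 intrinsic matrix."""
    fx, fy, cx, cy = K[0, 0], K[1, 1], K[0, 2], K[1, 2]
    v, u = np.indices(depth.shape)
    d = depth.astype(np.float64) * depth_scale
    valid = d > 0
    # Camera-frame coordinates (X_c, Y_c, Z_c) for every valid pixel
    X_c = (u[valid] - cx) * d[valid] / fx
    Y_c = (v[valid] - cy) * d[valid] / fy
    Z_c = d[valid]
    P_c = np.stack([X_c, Y_c, Z_c], axis=1)
    # World coordinates: P_w = R_cw * P_c + t_cw (camera pose given in the world frame)
    return P_c @ R_cw.T + t_cw.reshape(1, 3)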
In the ORB _ SLAM3 framework, since the world coordinate system uses the position of the camera at the time of initialization as a reference, the generated point cloud map inevitably has a skew condition, which may negatively affect the subsequent visualization display of the map and the subsequent external reference calibration, and therefore the coordinate system calibration needs to be performed on the point cloud map. As shown in fig. 3, the calibration process of the three-dimensional point cloud map is as follows:
1) calculating a plane equation of the point cloud ground:
The plane equation of the ground, ax + by + cz + d = 0, is computed using the RANSAC-based plane detection method in the PCL library, where a, b, c and d are the four parameters of the plane equation; the normal vector of the ground, (a, b, c), is obtained from this plane equation.
2) Calculating a rotation matrix of the point cloud ground and a horizontal plane of a coordinate system:
and calculating a rotation matrix of the point cloud ground and the horizontal plane of the coordinate system, and realizing rotation transformation of the point cloud map through the matrix so as to calibrate the point cloud map. The calculation method is as follows:
(1) Let the normal vector of the point cloud ground be v1 = (a, b, c) and the normal vector of the horizontal plane of the coordinate system be v2 = (0, 0, 1). The rotation axis n and rotation angle θ of the rotation transformation between the two vectors are calculated as follows:

n = (v1 × v2) / |v1 × v2|

θ = arccos( (v1 · v2) / (|v1| |v2|) )

(2) The rotation matrix R is obtained from the rotation axis n and the rotation angle θ, and the calculation formula is as follows:

R = cosθ · I + (1 - cosθ) · n n^T + sinθ · n^

where the ^ symbol denotes the conversion of a vector to an antisymmetric matrix; for a vector a = (a1, a2, a3), the conversion is:

a^ = [  0   -a3   a2 ]
     [  a3   0   -a1 ]
     [ -a2   a1   0  ]

3) calibrating the point cloud map using the rotation matrix:
Let a point in the point cloud map to be calibrated be p_0 = (x_0, y_0, z_0) and the corresponding calibrated point be p_1 = (x_1, y_1, z_1). The conversion formula is as follows:

p_1 = R p_0
the formula is applied to all points in the point cloud map to be calibrated, so that the whole point cloud map can be calibrated, and finally, the ground is aligned with the horizontal plane of the coordinate system.
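For illustration, the whole coordinate-system calibration could be sketched as follows. Open3D is assumed here as a stand-in for the PCL RANSAC plane detector; the thresholds are placeholders.

# Detect the ground plane with RANSAC, then rotate the whole map so the ground
# normal aligns with (0, 0, 1), using the Rodrigues formula given above.
import numpy as np
import open3d as o3d

def align_ground_to_horizontal(pcd):
    (a, b, c, d), _ = pcd.segment_plane(distance_threshold=0.02,
                                        ransac_n=3, num_iterations=1000)
    v1 = np.array([a, b, c]) / np.linalg.norm([a, b, c])    # ground normal
    v2 = np.array([0.0, 0.0, 1.0])                          # target normal
    axis = np.cross(v1, v2)
    if np.linalg.norm(axis) < 1e-8:                         # already aligned
        return pcd
    n = axis / np.linalg.norm(axis)
    theta = np.arccos(np.clip(np.dot(v1, v2), -1.0, 1.0))
    n_hat = np.array([[0, -n[2], n[1]],
                      [n[2], 0, -n[0]],
                      [-n[1], n[0], 0]])
    # Rodrigues formula: R = cos(theta) I + (1 - cos(theta)) n n^T + sin(theta) n^
    R = np.cos(theta) * np.eye(3) + (1 - np.cos(theta)) * np.outer(n, n) + np.sin(theta) * n_hat
    pcd.rotate(R, center=(0, 0, 0))                         # p1 = R p0 for every point
    return pcd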
S2, calibration: and measuring the position and the posture of each monitoring camera in the three-dimensional point cloud map, taking a monitoring picture of the checkerboard calibration plate by the monitoring cameras, and calculating the position and the posture of the monitoring cameras in the three-dimensional point cloud map by combining the calibration data provided in the step S1 during three-dimensional reconstruction.
This step mainly solves the following problems: for the monitoring cameras installed in the building, how to obtain the coordinates of the cameras in the world coordinate system, the calibration process in step S2 is as follows:
s21, shooting a standard chessboard grid calibration plate by a monitoring camera:
after the RGBD camera is used for carrying out three-dimensional reconstruction on the environment, a point cloud map of the environment is obtained. In order to complete personnel positioning work by using the monitoring camera, the relative position relation between the monitoring camera and the point cloud map, namely an external parameter, needs to be measured. The extrinsic parameters were calculated using the calibration plate as a marker.
The calibration board provides the camera with a set of three-dimensional points whose relative positions are fixed and known, from which the required parameters are calculated. Commonly used calibration boards include the checkerboard calibration board and the two-dimensional code calibration board. Compared with the checkerboard board, the two-dimensional code board has the advantage that it can still be distinguished when turned upside down, but its more complex pattern places higher requirements on illumination and camera resolution. Under the usage conditions of this system, the monitoring cameras all view the scene front-on, so the upside-down case does not occur; at the same time, to make the calibration method applicable to lower-cost monitoring cameras and to deployments in relatively dim indoor lighting, the checkerboard calibration board is better suited to the usage environment of this system.
And modifying ORB _ SLAM3 frame codes to save photos taken by the current camera and the camera pose at a certain running time. In actual operation, selecting the origin position of a world coordinate system, slowly moving the monitoring camera from the origin position to the direction of the checkerboard calibration plate, estimating the pose of the monitoring camera in real time by using ORB _ SLAM3 in the process, closing a program when the monitoring camera moves to the front of the checkerboard calibration plate, and storing the current photo shot by the monitoring camera and the pose of the camera;
s22, calibrating internal parameters of the monitoring camera:
The checkerboard calibration board is placed within the field of view of the monitoring camera and moved through multiple angles while a video is recorded; frames are extracted from the video, the checkerboard is identified, and the intrinsic parameters and distortion of the monitoring camera are calibrated with the Zhang Zhengyou calibration method.
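A hedged sketch of this intrinsic calibration step using OpenCV; the board dimensions, square size and frame-sampling step below are placeholder values, not those of the system.

# Sample frames from a video of a moving checkerboard, detect corners, and run
# Zhang's method via cv2.calibrateCamera to obtain K and the distortion coefficients.
import cv2
import numpy as np

def calibrate_intrinsics(video_path, board=(9, 6), square=0.025, step=15):
    objp = np.zeros((board[0] * board[1], 3), np.float32)
    objp[:, :2] = np.mgrid[0:board[0], 0:board[1]].T.reshape(-1, 2) * square
    obj_pts, img_pts, size = [], [], None
    cap = cv2.VideoCapture(video_path)
    i = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if i % step == 0:                              # sample every `step` frames
            gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
            size = gray.shape[::-1]
            found, corners = cv2.findChessboardCorners(gray, board)
            if found:
                obj_pts.append(objp)
                img_pts.append(corners)
        i += 1
    cap.release()
    # Zhang's method: returns the intrinsic matrix K and the distortion coefficients
    rms, K, dist, _, _ = cv2.calibrateCamera(obj_pts, img_pts, size, None, None)
    return K, dist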
S23, calibrating external parameters of the monitoring camera:
the pose calculation is a method for solving a camera coordinate system and a target coordinate system according to the actual three-dimensional position information of the target feature points and the two-dimensional positions of the target feature points in the image. If the three-dimensional coordinates of the feature points on the calibration plate are known, an augmentation matrix [ R | T ] containing 12 unknowns is constructed by using a direct linear transformation method to represent the transformation between the camera coordinate system and the target coordinate system. And selecting at least 6 pairs of corresponding points of the known three-dimensional space point coordinates and the two-dimensional pixel point coordinates to solve the unknown number in the augmentation matrix, thereby realizing the calculation of the camera pose.
Assume a feature point P_1 on the calibration board in space, with homogeneous coordinates P_1 = (X, Y, Z, 1)^T, and let the corresponding two-dimensional point in the image of the monitoring camera be x_1 = (u_1, v_1, 1)^T. According to the direct linear transformation method, the 3 × 4 augmented matrix [R | t] is expanded into the form:

s (u_1, v_1, 1)^T = [ t1  t2   t3   t4  ] (X, Y, Z, 1)^T    (1)
                    [ t5  t6   t7   t8  ]
                    [ t9  t10  t11  t12 ]

where u_1, v_1 are the pixel coordinates of the two-dimensional point in the image of the monitoring camera, X, Y, Z are the three-dimensional coordinates of the corresponding point, and s is a scale factor.

Using the last row of the matrix to eliminate the scale coefficient s yields:

u_1 = (t1 X + t2 Y + t3 Z + t4) / (t9 X + t10 Y + t11 Z + t12)    (2)

v_1 = (t5 X + t6 Y + t7 Z + t8) / (t9 X + t10 Y + t11 Z + t12)    (3)

To simplify the representation, each row of the augmented matrix of formula (1) is written as a vector t_1, t_2, t_3:

t_1 = (t1, t2, t3, t4)^T,  t_2 = (t5, t6, t7, t8)^T,  t_3 = (t9, t10, t11, t12)^T    (4)

The equations in vector form are then:

t_1^T P - t_3^T P u_1 = 0    (5)

t_2^T P - t_3^T P v_1 = 0    (6)

In equations (5) and (6), t is the vector to be solved, and each feature point contributes constraint equations on two unknowns. If there are N pairs of corresponding three-dimensional and two-dimensional coordinates, the equations can be stacked as shown in the following formula:

[ P_1^T   0       -u_1 P_1^T ]   ( t_1 )
[ 0       P_1^T   -v_1 P_1^T ]   ( t_2 )  =  0    (7)
[ ...                        ] · ( t_3 )
[ P_N^T   0       -u_N P_N^T ]
[ 0       P_N^T   -v_N P_N^T ]

Equation (7) contains 12 unknowns, so at least six pairs of corresponding points are needed to solve it. The checkerboard calibration board used by the system provides 42 corner coordinate pairs, so the system becomes over-determined and a least-squares solution is obtained with the SVD method.

For the 42 three-dimensional points P on the checkerboard calibration board and their projections p on the normalized plane, the camera pose R, t computed above by the direct linear transformation method is expressed in Lie algebra form and denoted ξ. Suppose the spatial coordinate of a calibration board corner point is P_i = (X_i, Y_i, Z_i)^T and its projected pixel coordinate is u_i = (u_i, v_i)^T. The relation between the pixel coordinates and the position of the spatial point, written in matrix form, is:

s_i u_i = K exp(ξ^) P_i

Because the camera pose is not exactly known and the observations are noisy, this equation has an error. Summing the errors over all points constructs a least-squares problem whose solution is the camera pose that minimizes the total error:

ξ* = argmin_ξ (1/2) Σ_i || u_i - (1/s_i) K exp(ξ^) P_i ||²
and solving through a Gauss-Newton algorithm to obtain a camera pose transformation matrix when the error term is minimum.
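For illustration, the direct linear transformation of equations (1)-(7) and a practical refinement step could be sketched as follows. The refinement shown uses OpenCV's iterative solvePnP rather than the hand-written Gauss-Newton of the text; the point sets and scaling conventions are assumptions.

# DLT: stack two constraint rows per 3D-2D correspondence and take the
# least-squares solution from the SVD; then refine with reprojection-error
# minimization (Levenberg-Marquardt inside cv2.solvePnP).
import numpy as np
import cv2

def dlt_pose(pts3d, pts2d):
    """pts3d: Nx3 board-frame points; pts2d: Nx2 points in normalized image
    coordinates (pixels premultiplied by K^{-1}), N >= 6, so T approximates [R|t]."""
    A = []
    for (X, Y, Z), (u, v) in zip(pts3d, pts2d):
        P = [X, Y, Z, 1.0]
        A.append(P + [0, 0, 0, 0] + [-u * p for p in P])
        A.append([0, 0, 0, 0] + P + [-v * p for p in P])
    _, _, Vt = np.linalg.svd(np.asarray(A))
    T = Vt[-1].reshape(3, 4)                 # [R|t] up to sign and scale
    return T / np.linalg.norm(T[2, :3])      # fix scale so the rotation rows are unit-ish

def refine_pose(pts3d, pts2d, K, dist):
    """Refine by minimizing reprojection error, given intrinsics K and distortion dist."""
    ok, rvec, tvec = cv2.solvePnP(pts3d.astype(np.float32), pts2d.astype(np.float32),
                                  K, dist, flags=cv2.SOLVEPNP_ITERATIVE)
    R, _ = cv2.Rodrigues(rvec)
    return R, tvec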
The camera pose can be solved on the premise that the coordinates of the characteristic points of the checkerboard calibration plate are known, and due to the scale uncertainty of the monocular camera, the relative position relationship between the characteristic points cannot be determined only by the size and the style of the calibration plate observed in the image, so according to the actual situation of the system, as shown in fig. 4, the scale information can be obtained from two aspects:
(1) and acquiring the Z-axis absolute distance from the RGBD camera optical center to the characteristic point from the depth map, thereby acquiring the absolute position information of each characteristic point.
(2) And measuring the size of the checkerboard on the calibration plate to obtain the relative position relation between the characteristic points on the checkerboard.
The first feature corner at the upper left corner of the checkerboard calibration board is taken as the origin of coordinates, the transverse direction as the x axis, the longitudinal direction as the y axis and the vertical direction as the z axis, which defines the coordinate system of the calibration board. Using the method above for solving the camera pose from the known three-dimensional coordinates of the feature points on the checkerboard calibration board, the pose transformation matrix of a camera in the calibration board coordinate system can be solved. The transformation matrix into the calibration board coordinate system obtained when the monitoring camera photographs the checkerboard calibration board is denoted T_mb; the transformation matrix into the calibration board coordinate system obtained when the RGBD camera photographs the checkerboard calibration board during operation of the visual odometer is denoted T_cb; and the transformation matrix of the RGBD camera relative to the world coordinate system given by the visual odometer at that moment is denoted T_cw.

Let the coordinate of a feature point on the checkerboard calibration board in the calibration board coordinate system be P_b = (X, Y, 0, 1)^T, its coordinate in the camera coordinate system of the monitoring camera be P_m, its coordinate in the RGBD camera coordinate system be P_c, and its coordinate in the world coordinate system of the visual odometer be P_w. Then:

P_c = T_cb P_b    (10)

P_m = T_mb P_b    (11)

Multiplying both sides of formula (10) on the left by T_cb^{-1} gives:

P_b = T_cb^{-1} P_c    (12)

Substituting (12) into formula (11) gives:

P_m = T_mb T_cb^{-1} P_c    (13)

By the definition of a Euclidean transformation, in (13) the product T_mb T_cb^{-1} represents the transformation from the RGBD camera coordinate system to the monitoring camera coordinate system, so T_mb T_cb^{-1} is denoted T_mc.

The transformation matrix T_mw from the world coordinate system to the coordinate system of the monitoring camera can then be solved from equation (14):

T_mw = T_mc T_cw    (14)
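A minimal numpy sketch of this transformation chain; the inputs are the 4 × 4 homogeneous transforms defined above:

# Both cameras observe the same calibration board, giving T_mb and T_cb; combined
# with the visual-odometry pose T_cw this yields the monitoring camera extrinsics.
import numpy as np

def monitoring_camera_extrinsics(T_mb, T_cb, T_cw):
    T_mc = T_mb @ np.linalg.inv(T_cb)   # RGBD camera frame -> monitoring camera frame
    T_mw = T_mc @ T_cw                  # world frame -> monitoring camera frame (eq. 14)
    return T_mw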
S3, position calculation: and identifying the position of the person in the image and giving the position of the person in the three-dimensional point cloud map according to the position and posture data obtained in the step S2 and the monitoring video stream provided by the monitoring camera.
Personnel location and trajectory tracking based on surveillance video have the advantage of high accuracy and good stability. The pedestrian detection and positioning algorithm based on surveillance video will be described with emphasis below.
As shown in fig. 5, the process of tracking the pedestrian position in the monitoring camera and calculating is as follows:
s31, selecting whether to enter an attitude correction process according to the type of the monitoring camera and whether the monitoring camera moves, if so, entering a step S32, otherwise, entering a step S33;
s32, if entering the attitude correction process, extracting a vanishing point in the image and comparing the vanishing point position, judging whether the monitoring picture rotates according to whether the vanishing point moves, and updating the rotation;
Specifically, regarding the attitude correction of the monitoring camera: some monitoring cameras have a manual or automatic rotation function, which can change the yaw and pitch angles of the camera's attitude. For the positioning system described here, the calibration workload on the monitoring cameras before use is already large, and it is impractical to re-calibrate the external parameters every time the attitude of a monitoring camera changes; a workflow that automatically updates and corrects the attitude is therefore needed.
1) Vanishing point extraction
According to the projective geometry principle, in the case of perspective deformation, a group of parallel straight lines in the real world intersects at an infinite point, and the projection of the intersection point on the imaging plane is called a vanishing point. When the parallel line in the real world is parallel to the imaging plane, the vanishing point is located at infinity from the imaging plane. However, when there is a non-parallel relationship between the group of parallel lines and the imaging plane, the vanishing point will be located within a limited distance of the imaging plane, even within the imaging area.
Vanishing points have some important properties:
a. straight lines that are parallel to each other in the real world all point to the same vanishing point;
b. the vanishing point corresponding to a straight line lies in the direction of that line's projection onto the image plane;
c. the location of the vanishing point is independent of the roll angle and depends only on the pitch and yaw angles.
The vanishing point is an important feature formed on an image plane after perspective projection, and can provide a large amount of structural information and direction information for scene analysis or be used for measuring parameters of a camera. Therefore, the vanishing point has wide application in rectangular structure estimation and matching, three-dimensional reconstruction, camera calibration and azimuth angle estimation.
In view of the use of the system inside buildings, most of the walls and floors of the buildings are flat, so that fixed vanishing points can be extracted. Judging whether the monitoring picture rotates according to whether the vanishing point moves or not, firstly extracting the vanishing point, and the process is as follows:
(1) taking internal parameters and distortion of a monitoring camera, and carrying out distortion removal on an original picture to obtain a distortion-removed picture;
(2) extracting line segments on the undistorted picture by using an LSD line segment extractor;
(3) selecting effective line segments with more than 60 pixels according to the length screening line segments;
(4) calculating the angle of each line segment by using Hough transform;
(5) clustering the line segments according to angles, and dividing the line segments into three classes;
(6) solving the closest point of each type of straight line by using a least square method;
(7) selecting the reference point with the smallest sum of the coordinates in the three vanishing points;
as shown in fig. 6, red, green and blue are three types of line segments, and the center of the red circle is a specific position of a reference point.
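For illustration, the vanishing point extraction steps (1)-(7) above could be sketched as follows in Python with OpenCV. HoughLinesP is used here as a stand-in for the LSD line segment extractor, and the angle clustering is a simple k-means; apart from the 60-pixel length filter, all thresholds are assumptions.

# (1) undistort, (2)-(3) detect and length-filter segments, (4)-(5) cluster by angle,
# (6) least-squares closest point per cluster, (7) pick the reference point.
import cv2
import numpy as np

def vanishing_reference_point(img, K, dist, min_len=60):
    und = cv2.undistort(img, K, dist)
    gray = cv2.cvtColor(und, cv2.COLOR_BGR2GRAY)
    edges = cv2.Canny(gray, 50, 150)
    segs = cv2.HoughLinesP(edges, 1, np.pi / 180, threshold=60,
                           minLineLength=min_len, maxLineGap=5)
    if segs is None:
        return None
    segs = segs.reshape(-1, 4).astype(np.float64)
    angles = np.arctan2(segs[:, 3] - segs[:, 1], segs[:, 2] - segs[:, 0]) % np.pi
    labels = cv2.kmeans(np.float32(angles.reshape(-1, 1)), 3, None,
                        (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 50, 1e-3),
                        5, cv2.KMEANS_PP_CENTERS)[1].ravel()
    vps = []
    for k in range(3):
        cls = segs[labels == k]
        if len(cls) < 2:
            continue
        d = cls[:, 2:] - cls[:, :2]
        d /= np.linalg.norm(d, axis=1, keepdims=True)
        A = np.zeros((2, 2)); b = np.zeros(2)
        for (x1, y1, _, _), di in zip(cls, d):
            M = np.eye(2) - np.outer(di, di)     # projector orthogonal to the segment
            A += M; b += M @ np.array([x1, y1])
        vps.append(np.linalg.lstsq(A, b, rcond=None)[0])
    # reference point: vanishing point with the smallest coordinate sum
    return min(vps, key=lambda p: p[0] + p[1]) if vps else None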
2) Attitude correction
As shown in FIG. 6, let the coordinates of the vanishing point before rotation be (x_0, y_0) and the coordinates of the vanishing point after rotation be (x_1, y_1). The change δyaw in yaw angle and the change δpitch in pitch angle are then calculated as shown in equations (15) and (16):

δyaw = arctan(x_1 - x_0)    (15)

δpitch = arctan(y_1 - y_0)    (16)

Because the pose of the monitoring camera is expressed by the transformation matrix T_mw, the change in angle is converted into matrices representing the same rotations: the rotation in the yaw direction is denoted R_x and the rotation in the pitch direction is denoted R_y, as shown in formulas (17) and (18):

R_x = [  cos δyaw   0   sin δyaw ]
      [  0          1   0        ]    (17)
      [ -sin δyaw   0   cos δyaw ]

R_y = [ 1   0            0           ]
      [ 0   cos δpitch  -sin δpitch  ]    (18)
      [ 0   sin δpitch   cos δpitch  ]

Since the camera can only rotate but cannot move, the translation of the optical center is taken to be approximately zero; the product of the rotation matrices in the two directions together with the translation vector t = (0, 0, 0)^T forms a transformation matrix T_01 that represents the conversion from the camera coordinates before rotation to the camera coordinates after rotation. Denoting the new monitoring camera transformation matrix required for positioning by T_mw', we have:

T_mw' = T_01 T_mw    (19)
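A minimal numpy sketch of this attitude correction; the exact axis conventions behind R_x and R_y depend on the camera model and are assumed here:

# Vanishing-point shift -> yaw/pitch increments -> small rotations composed with
# the previous extrinsic matrix T_mw (equations (15)-(19)).
import numpy as np

def correct_extrinsics(T_mw, vp_before, vp_after):
    d_yaw = np.arctan(vp_after[0] - vp_before[0])        # eq. (15)
    d_pitch = np.arctan(vp_after[1] - vp_before[1])      # eq. (16)
    Rx = np.array([[np.cos(d_yaw), 0, np.sin(d_yaw)],
                   [0, 1, 0],
                   [-np.sin(d_yaw), 0, np.cos(d_yaw)]])  # eq. (17), assumed axis
    Ry = np.array([[1, 0, 0],
                   [0, np.cos(d_pitch), -np.sin(d_pitch)],
                   [0, np.sin(d_pitch), np.cos(d_pitch)]])  # eq. (18), assumed axis
    T01 = np.eye(4)
    T01[:3, :3] = Ry @ Rx                                # rotation only, zero translation
    return T01 @ T_mw                                    # eq. (19)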
s33, entering a positioning process, and firstly taking a frame of monitoring video image;
s34, carrying out target detection on the pedestrian in the image to obtain a target frame coordinate;
s35, carrying out target tracking on the detected target frame and giving the corresponding personnel position coordinates;
s36, calculating the space position coordinates of the personnel according to the calibration parameters of the monitoring camera;
specifically, the position calculation flow is as follows:
1) target recognition tracking
Multiple targets in the video stream are detected and tracked based on the SORT (Simple Online and Realtime Tracking) algorithm, and the id of each target is displayed. The algorithm uses a strong CNN detector, YOLOv3, to detect targets, and then uses a Kalman filter and the Hungarian algorithm to track the detected targets. The algorithm can achieve accurate multi-person tracking while meeting real-time requirements.
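For illustration, the association step of such a SORT-style tracker could be sketched as follows: an IoU cost matrix between Kalman-predicted track boxes and new detections, solved with the Hungarian algorithm via SciPy. The Kalman prediction and YOLOv3 detection are assumed to be provided elsewhere; this is not the original SORT implementation.

import numpy as np
from scipy.optimize import linear_sum_assignment

def iou(a, b):
    """Boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    union = (a[2]-a[0])*(a[3]-a[1]) + (b[2]-b[0])*(b[3]-b[1]) - inter
    return inter / union if union > 0 else 0.0

def associate(track_boxes, det_boxes, iou_thresh=0.3):
    """Return a list of (track_idx, det_idx) matches above the IoU threshold."""
    if not track_boxes or not det_boxes:
        return []
    cost = np.array([[1.0 - iou(t, d) for d in det_boxes] for t in track_boxes])
    rows, cols = linear_sum_assignment(cost)        # Hungarian algorithm
    return [(r, c) for r, c in zip(rows, cols) if 1.0 - cost[r, c] >= iou_thresh]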
2) Pedestrian position calculation
The world coordinate system is the coordinate system of the point cloud map obtained by three-dimensional reconstruction, i.e. the optical center position of the first frame is taken as the origin and the Z axis points opposite to the direction of gravity. The transformation matrix T_mw from the world coordinate system to the monitoring camera coordinate system consists of a 3 × 3 rotation matrix R and a 3 × 1 translation vector t. The height h of the ground in the world coordinate system is obtained by actual measurement, and the intrinsic matrix K of the camera is calibrated with the calibration board.

According to the projection equation (20), the coordinates X_w, Y_w of the pedestrian on the ground can be solved:

s (u, v, 1)^T = K (R (X_w, Y_w, h)^T + t)    (20)
the specific solving process is as follows:
obtaining camera model optical center P of monitoring camera according to pose transformation matrix of monitoring camera ow =(X ow ,Y ow ,Z ow ):
P ow =-R T t#(21)
Taking the middle point M (u, v) of the lower bottom edge of the target frame of the person to be positioned, calculating the spatial position of the point M on the normalization plane according to a projection equation, setting the depth d to be 1M, and solving the spatial coordinate P of the point M on the normalization plane according to a formula (22) m =(X m ,Y m ,Z m );
Figure BDA0003692844450000182
An equation of the plane of the ground is written according to the height h of the ground in the world coordinate system, as shown in formula (24), the plane of the ground is converted into a point on the plane and normal vector representation:
z=h#(23)
p 0 =(0,0,h),
Figure BDA0003692844450000183
will ray
Figure BDA0003692844450000184
The formulation parameter equation is shown in (25), wherein
Figure BDA0003692844450000185
Is the direction vector of the ray and,
Figure BDA0003692844450000186
t is a parameter t ∈ [0, ∞).
Figure BDA0003692844450000187
Let the intersection of the ray $\overrightarrow{P_{ow}P_m}$ with the ground plane be $P_g$; then:
$(P_g - p_0) \cdot \vec{n} = 0$ #(26)
Substituting the parametric form of the ray into equation (26) gives:
$(P_{ow} + t\,\vec{d} - p_0) \cdot \vec{n} = 0$ #(27)
Applying the distributive law of the vector dot product yields:
$t = \dfrac{(p_0 - P_{ow}) \cdot \vec{n}}{\vec{d} \cdot \vec{n}}$ #(28)
from which the intersection point can be solved as $P_g = P_{ow} + t\,\vec{d}$. The intersection point $P_g$ is the three-dimensional coordinate of the pedestrian in the world coordinate system.
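Putting equations (20)–(28) together, a compact sketch of the position calculation might look as follows (Python with NumPy; the function and variable names are assumptions, while K, R, t and h are the calibrated quantities defined above):

```python
import numpy as np

def pedestrian_world_position(u, v, K, R, t, h):
    """Back-project the bottom-centre pixel (u, v) of a person's bounding box
    onto the ground plane z = h and return the 3D point in world coordinates.

    K    : 3x3 camera intrinsic matrix
    R, t : rotation (3x3) and translation (3,) of the world-to-camera transform T_mw
    h    : measured ground height in the point cloud map (world) coordinate system
    """
    # Optical centre of the camera in world coordinates, equation (21): P_ow = -R^T t
    p_ow = -R.T @ t
    # Point M on the normalised plane (depth d = 1 m) in world coordinates, equation (22)
    d = 1.0
    p_m = R.T @ (d * np.linalg.inv(K) @ np.array([u, v, 1.0]) - t)
    # Ground plane as a point and a normal, equations (23)-(24)
    p0 = np.array([0.0, 0.0, h])
    n = np.array([0.0, 0.0, 1.0])
    # Ray P(s) = P_ow + s * direction, equation (25)
    direction = p_m - p_ow
    denom = direction @ n
    if abs(denom) < 1e-9:
        raise ValueError("ray is parallel to the ground plane")
    s = (p0 - p_ow) @ n / denom                 # equation (28)
    if s < 0:
        raise ValueError("ground intersection lies behind the camera")
    return p_ow + s * direction                 # intersection point P_g
```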
S37, if the positioning is not terminated, acquiring the next frame image and returning to the step S33, otherwise, ending.
S4, visual display: display the three-dimensional point cloud map of the scene from step S1 and the positions of the persons obtained in step S3 within that map, and provide a GUI for user interaction and supervision.
The system designed and implemented by the invention includes a three-dimensional (3D) display interface based on the three.js graphics library that can display three-dimensional point clouds, camera models and trajectories; its functional logic is as follows.
1) Three-dimensional point cloud loading
The invention can load several three-dimensional point cloud PCD files located on the server and display them correctly in the 3D display interface; the functional logic is as follows:
(1) search all PCD files under the specified directory to obtain the file names;
(2) read each PCD file by file name using the built-in FileLoader to obtain the three-dimensional point cloud data, such as the number of points, the set of point coordinates and the set of colours;
(3) remap the colour values of the point cloud into the colour range supported by the three.js graphics library;
(4) fill the three-dimensional point cloud data into the graphics class objects provided by three.js, combining JavaScript typed arrays with the data classes of three.js; the typed array is shown in FIG. 7;
(5) name each created graphics class object after its file name and add it to the display interface;
(6) according to the system settings, display only the point cloud checked by default and keep the other point clouds on standby.
This completes the functional logic for loading three-dimensional point clouds in the system; FIG. 8 shows a point cloud loading example.
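The parsing described above happens in the browser via three.js. Purely to illustrate what a PCD file contains (point count, coordinate set, colour set), a server-side Python sketch using the Open3D library is given below; Open3D is an assumption for illustration only and is not part of the described system:

```python
import numpy as np
import open3d as o3d

def inspect_pcd(path):
    """Read a PCD file and report the data the viewer needs: point count, coordinates, colours."""
    cloud = o3d.io.read_point_cloud(path)
    points = np.asarray(cloud.points)     # N x 3 array of XYZ coordinates
    colors = np.asarray(cloud.colors)     # N x 3 array of RGB values, normalised to [0, 1]
    print(f"{path}: {len(points)} points, colours present: {len(colors) > 0}")
    return points, colors
```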
2) Camera model loading
According to the camera calibration parameters stored on the server, the invention can add a camera model with the correct orientation at the correct position in the 3D display interface, so that the overall layout is clear at a glance. The functional logic is as follows:
(1) read the specified parameter XML file to obtain the number of cameras, their names and their transformation matrices;
(2) read the OBJ file of the camera model using the built-in FileLoader and create the corresponding three.js graphics class objects;
(3) adjust the world coordinates and orientation of each graphics object according to its transformation matrix, and name it;
(4) add each adjusted graphics class object to the display interface.
This completes the functional logic for loading camera models in the system; FIG. 9 shows an example of loaded camera models, with the point cloud of FIG. 8 as the background.
The specific content displayed comprises a plurality of three-dimensional point clouds, cameras and tracks corresponding to the three-dimensional point clouds, and when a certain point cloud is selected to be displayed, other point clouds, the corresponding cameras and the tracks are hidden.
3) Trajectory information transmission and rendering optimization
The transmission and drawing of trajectory information are implemented on the basis of Socket messages received from the server, and the current limitation that line thickness cannot be adjusted in the Web graphics library is worked around. The functional flow is shown in FIG. 10, and the functional logic is as follows:
(1) receiving Socket information, and obtaining track related information in real time, wherein the track related information comprises a track ID, an operation type and track data;
(2) if the operation type is trajectory update: first search for the corresponding ID in the existing trajectory library; if it is not found, create a three.js line class object, fill in the corresponding trajectory data and add it to the display interface; if it is found, update the existing three.js line class object with the newly received trajectory data and redraw it in the display interface;
(3) if the operation type is track deletion, searching a corresponding ID in an existing track library, and deleting a three.js line class object corresponding to the ID in a display interface;
Flicker problem during trajectory update: three.js makes the trajectory flicker continuously while it is being drawn, because on every update the old line is deleted and redrawn. When newly received trajectory information is processed, the old trajectory is therefore kept and is only removed from the display interface after the new trajectory has been drawn completely over it. This preserves the display quality of the trajectory and also reduces the memory footprint of the system.
Trajectory thickness problem: the line width of three.js line objects cannot be adjusted, which makes the trajectory hard to identify in the display interface. A method of drawing partially overlapping copies of a single trajectory is therefore provided: after the coordinate data of the three.js line object have been updated, the object is copied several times when it is drawn in the display interface and each copy is translated slightly outwards, which gives the trajectory a thicker appearance; the approach passed the performance test.
4) Object binding and view operations
As described above, the 3D display interface of the invention can display several point clouds together with their corresponding cameras and trajectories, and ensures that when one point cloud is selected for display, the other point clouds and their cameras and trajectories are hidden. The object binding operation provided by the three.js graphics library is used to associate each camera and trajectory with the designated point cloud by ID, thereby implementing point cloud scene switching.
In addition, the 3D display interface implements translation, rotation and zooming of the viewpoint based on the event listener mechanism of the Web and JavaScript, and is capable of multi-terminal adaptation.
Regarding the GUI system interface, the overall system architecture is shown in fig. 11.
The system adopts an architecture with separated front end and back end. The front end is built on the Vue development framework as a visual display platform and provides visualization functions such as the 3D point cloud, pedestrian trajectories and real-time monitoring.
The back end is implemented on the Springboot development framework and provides the computation and storage functions used by the front-end visualization, such as monitoring management, pedestrian counting, pedestrian trajectory management and log management; the front end and back end communicate interactively via HTTP and WebSocket. The functions are described as follows:
Monitoring management: manages the surveillance cameras connected to the system, mainly including monitoring configuration management and state management.
Pedestrian trajectory: obtains the pedestrian trajectory coordinates in real time using the pedestrian positioning and tracking algorithm, and implements trajectory visualization and persistent storage of historical trajectories.
Pedestrian counting: calculates the number of pedestrians from the number of trajectories in the current scene and displays it on the visual interface in real time.
Log management: records all actions generated by the system and provides a query function, improving system security.
The back end of the system uses a MySQL database for data persistence, providing efficient data reads and writes. The pedestrian positioning and tracking algorithm is integrated into the system through a Kafka message queue in the following steps (a sketch of the producer side follows this list):
(1) the pedestrian positioning and tracking algorithm analyses the video frames and sends the pedestrians' coordinates in the three-dimensional coordinate system to a Kafka topic named slam, with the number of partitions set to 1 to guarantee message ordering;
(2) the system back end listens to Kafka in real time, reads the pedestrian trajectory coordinates computed by the positioning and tracking algorithm, persists them to the MySQL database and pushes them to the front end via WebSocket;
(3) each time the front end receives pedestrian trajectory coordinates pushed by the back end, it draws trajectory points in the 3D point cloud view at those coordinates; once enough trajectory points have accumulated, a clearly visible pedestrian trajectory is formed.
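For illustration only, a minimal sketch of the producer side of this flow using the kafka-python client is given below. The topic name slam follows the description above, while the broker address, message fields and JSON encoding are assumptions; the consuming back end described here is implemented with Springboot and is not shown.

```python
import json
from kafka import KafkaProducer

# Assumed broker address; the JSON message layout is an illustration, not the
# format mandated by the system described above.
producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda m: json.dumps(m).encode("utf-8"),
)

def publish_position(track_id, x, y, z):
    """Send one pedestrian position (world coordinates) to the single-partition 'slam' topic."""
    producer.send("slam", {"trackId": track_id, "x": x, "y": y, "z": z})
    producer.flush()
```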
Integrating the algorithm through Kafka improves the scalability of the system, allows the algorithm's functions to be extended flexibly, and provides pluggable feature integration.
The front-end interface of the visual video monitoring system of the invention is shown in FIG. 13 and includes the 3D display interface, video monitoring, switches for abnormal-item monitoring, a trajectory list, an abnormal event list and so on. Floors can be switched in the upper right corner; after switching, the point cloud, cameras and trajectories on the left and the video monitoring view on the right are updated synchronously. In the lower left corner, individual functions such as target tracking and flame alarm can be selectively enabled.
When pedestrians are detected in the video monitoring area on the right and their trajectories are calculated, the trajectories are synchronously updated and drawn on the left, and the corresponding entries are updated in the trajectory list area below, as shown in FIG. 12; the trajectories are clearly visible, and the position and orientation of each camera are shown with a camera model.
When flame or smoke is detected in the video monitoring, an alarm prompt is popped up so that the user can notice the abnormal situation in time, and the corresponding entry is updated in the abnormal event list in the lower right corner, as shown in FIG. 14. The whole system runs smoothly and meets the performance requirements.
It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
All the embodiments in the present specification are described in a related manner, and the same and similar parts among the embodiments may be referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, apparatus embodiments, electronic device embodiments, computer-readable storage medium embodiments, and computer program product embodiments are described with relative simplicity as they are substantially similar to method embodiments, where relevant only as described in portions of the method embodiments.
The above-mentioned embodiments are only specific embodiments of the present application, used to illustrate the technical solutions of the present application rather than to limit them, and the protection scope of the present application is not limited thereto. Although the present application has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that any person skilled in the art can still modify the technical solutions described in the foregoing embodiments, easily conceive of changes, or make equivalent substitutions for some of the technical features within the technical scope disclosed in the present application; such modifications, changes or substitutions do not depart from the spirit and scope of the technical solutions of the embodiments and are intended to be covered by the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (7)

1. A monitoring video personnel positioning and tracking method based on visual slam is characterized in that a depth camera is used for carrying out three-dimensional reconstruction on an environment to obtain a point cloud map of the environment; combining external reference calibration with a scene of the monitoring camera, and performing external reference calibration on the monitoring camera through a calibration plate to obtain position and posture information of the monitoring camera; the method comprises the steps of tracking personnel through a monitoring camera, identifying people in a monitored image by utilizing a deep neural network, calculating the three-dimensional positions of the people according to the ground priors of the pedestrians appearing in the monitoring by adopting the principle of inverse perspective transformation based on the previously calibrated positions and postures, drawing the tracks of the personnel in a constructed point cloud map, and enabling the tracks to be presented in the point cloud map.
2. The visual slam-based surveillance video personnel location tracking method of claim 1, comprising the steps of:
s1, three-dimensional reconstruction based on visual slam: constructing a three-dimensional point cloud map of a scene, recording data required by external parameter calibration, and constructing the three-dimensional point cloud map of the scene according to the visual odometer and storing the three-dimensional point cloud map of the scene into a file by using RGBD picture stream and inertial sensor data provided by the depth camera; shooting a chessboard pattern calibration plate and recording the position and the posture of a camera at the moment in the three-dimensional reconstruction process;
s2, calibration: the monitoring camera shoots a monitoring picture of the chessboard pattern calibration plate, and the position and the posture of the monitoring camera in the three-dimensional point cloud map are calculated by combining the calibration data provided in the step S1 during three-dimensional reconstruction;
s3, tracking and calculating the position of the pedestrian in the monitoring camera: tracking people, identifying the positions of people in the images and providing the positions of the people in the three-dimensional point cloud map according to the position and posture data obtained in the step S2 and a monitoring video stream provided by a monitoring camera;
the pedestrian position tracking and calculating process in the monitoring camera comprises the following steps:
s31, selecting whether to enter the attitude correction process according to the type of the monitoring camera and whether the monitoring camera moves, if so, entering the step S32, otherwise, entering the step S33;
s32, if the posture correction process is entered, extracting the vanishing point in the image, comparing it with the previously recorded vanishing point position, judging from the movement of the vanishing point whether the monitoring picture has rotated, and updating the rotation;
s33, entering a positioning process, and firstly taking a frame of monitoring video image;
s34, carrying out target detection on the pedestrian in the image to obtain a target frame coordinate;
s35, carrying out target tracking on the detected target frame, and giving the corresponding personnel position coordinates;
s36, calculating the space position coordinates of the personnel according to the calibration parameters of the monitoring camera;
s37, if the positioning is not terminated, acquiring the next frame image and returning to the step S33, otherwise, ending.
3. The method for positioning and tracking people in surveillance videos based on visual slam according to claim 2, wherein a point cloud map processing thread is added in the process of constructing the three-dimensional point cloud map of the scene in step S1, and is used for receiving the position and orientation information of each frame of camera and the RGBD image frame and outputting an accurate point cloud map, and the specific process is as follows:
s141, screening the position and pose information of each frame of depth camera and an RGBD image frame, and selecting a current frame when the camera angle change between the current frame and a last selected frame is more than 10 degrees and the displacement change is more than 2 meters, and performing subsequent point cloud map generation operation;
s142, calculating a point cloud block of the current frame, and rotating the point cloud block to a uniform world coordinate system;
s143, splicing and merging the point cloud blocks generated by all the frames to obtain an integral point cloud map, and performing filtering and outlier removing treatment on the point cloud map to compress the data volume of the point cloud map and optimize the visual impression of the map;
and S144, when a loop occurs in the process of drawing construction, the ORB-SLAM3 re-optimizes the pose of the selected frame, re-splices the point clouds, and re-performs point cloud processing operation according to the step S143.
4. The visual slam-based surveillance video personnel positioning and tracking method according to claim 2, wherein the step S2 external reference calibration method is:
s21, shooting a standard chessboard grid calibration plate by a monitoring camera: selecting an origin position of a world coordinate system, slowly moving a monitoring camera to the direction of a checkerboard calibration board from the origin position, estimating the pose of the monitoring camera in real time by using ORB _ SLAM3 in the process, closing a program when the monitoring camera moves to the front of the checkerboard calibration board, and storing the current photo shot by the monitoring camera and the pose of the camera;
s22, calibrating the intrinsic parameters of the monitoring camera: placing the checkerboard calibration board within the field of view of the monitoring camera, moving it at multiple angles, recording a video, extracting frames from the video, identifying the checkerboard pattern, and calibrating the intrinsic parameters and distortion of the monitoring camera using the Zhang Zhengyou calibration method;
s23, calibrating external parameters of the monitoring camera: and solving a camera coordinate system and a target coordinate system by adopting a direct linear transformation method according to the actual three-dimensional position information of the target feature point and the two-dimensional position of the target feature point in the image, so as to realize the calculation of the relative position relation between the monitoring camera and the three-dimensional point cloud map.
5. The method for locating and tracking people in surveillance video based on visual slam according to claim 2, wherein the calculation method of the spatial position coordinates of the people in step S36 is:
obtaining the optical centre of the monitoring camera in world coordinates, $P_{ow} = (X_{ow}, Y_{ow}, Z_{ow})$, from the pose transformation matrix of the monitoring camera:
$P_{ow} = -R^{T} t$ #(21)
taking the midpoint M(u, v) of the lower edge of the target box of the person to be positioned, computing the spatial position of M on the normalised plane from the projection equation with the depth d set to 1 m, and solving the spatial coordinate $P_m = (X_m, Y_m, Z_m)$ of M according to equation (22):
$P_m = R^{T}\,(d\,K^{-1}(u, v, 1)^{T} - t)$, with d = 1 #(22)
writing the equation of the ground plane from the height h of the ground in the world coordinate system as in (23), and converting the plane into a point-and-normal representation (24):
$z = h$ #(23)
$p_0 = (0, 0, h), \quad \vec{n} = (0, 0, 1)$ #(24)
writing the ray $\overrightarrow{P_{ow}P_m}$ in the parametric form shown in (25), where $\vec{d} = P_m - P_{ow}$ is the direction vector of the ray and t is a parameter, t ∈ [0, ∞):
$P(t) = P_{ow} + t\,\vec{d}$ #(25)
letting the intersection of the ray $\overrightarrow{P_{ow}P_m}$ with the ground plane be $P_g$, then:
$(P_g - p_0) \cdot \vec{n} = 0$ #(26)
substituting the parametric form of the ray into equation (26) gives:
$(P_{ow} + t\,\vec{d} - p_0) \cdot \vec{n} = 0$ #(27)
applying the distributive law of the vector dot product yields:
$t = \dfrac{(p_0 - P_{ow}) \cdot \vec{n}}{\vec{d} \cdot \vec{n}}$ #(28)
thereby solving for the intersection point $P_g = P_{ow} + t\,\vec{d}$; the intersection point $P_g$ is the three-dimensional coordinate of the pedestrian in the world coordinate system.
6. The visual slam-based surveillance video personnel location tracking method of claim 2, further comprising a step S4 of visually presenting: and displaying the three-dimensional point cloud map of the scene in the step S1 and the position of the character in the step S3 in the three-dimensional point cloud map, providing a GUI (graphical user interface) for user interaction and supervision, wherein specific displayed contents comprise a plurality of three-dimensional point clouds and corresponding cameras and tracks thereof, and when a certain point cloud is selected to be displayed, other point clouds and the corresponding cameras and tracks are hidden.
7. A visual slam-based surveillance video personnel location tracking system comprising the following modules to implement the method of any of claims 1-6:
the three-dimensional reconstruction module is used for constructing a three-dimensional point cloud map of a scene and recording data required by external parameter calibration, the whole using period is only operated once, and the RGBD picture stream and the inertial sensor data provided by the depth camera are used for constructing the three-dimensional point cloud map of the scene according to the visual odometer and are stored in a file for being displayed and used by the visual display module; in the three-dimensional reconstruction process, shooting a checkerboard calibration board and recording the position and the posture of a camera at the moment for a calibration module to use;
the calibration module is used for measuring the position and the posture of each monitoring camera in the three-dimensional point cloud map, the whole using period only runs once, the monitoring cameras shoot a monitoring picture photo of a chessboard grid calibration plate, and the position and the posture of the monitoring cameras in the three-dimensional point cloud map are calculated by combining calibration data provided during three-dimensional reconstruction and are used by the position calculation module;
the position calculation module is used for identifying the position of a person in the image and providing the position of the person in the three-dimensional point cloud map for the visual display module to display according to the position and posture data obtained by the calibration module and the monitoring video stream provided by the monitoring camera;
and the visual display module is used for displaying the three-dimensional point cloud map of the scene and the positions of the characters in the three-dimensional point cloud map, and providing a GUI (graphical user interface) for user interaction and supervision.
CN202210669897.1A 2022-06-14 2022-06-14 Monitoring video personnel positioning and tracking method and system based on visual slam Pending CN115035162A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210669897.1A CN115035162A (en) 2022-06-14 2022-06-14 Monitoring video personnel positioning and tracking method and system based on visual slam

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210669897.1A CN115035162A (en) 2022-06-14 2022-06-14 Monitoring video personnel positioning and tracking method and system based on visual slam

Publications (1)

Publication Number Publication Date
CN115035162A true CN115035162A (en) 2022-09-09

Family

ID=83125739

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210669897.1A Pending CN115035162A (en) 2022-06-14 2022-06-14 Monitoring video personnel positioning and tracking method and system based on visual slam

Country Status (1)

Country Link
CN (1) CN115035162A (en)


Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115375890A (en) * 2022-10-25 2022-11-22 苏州千里雪智能科技有限公司 Based on four mesh stereovision cameras governing system of 5G
CN115861427A (en) * 2023-02-06 2023-03-28 成都智元汇信息技术股份有限公司 Indoor personnel dynamic positioning method and device based on image recognition and medium
CN116931524A (en) * 2023-07-25 2023-10-24 江苏猎人安防科技有限公司 Intelligent monitoring system and process for building
CN116931524B (en) * 2023-07-25 2024-04-26 铯镨科技有限公司 Intelligent monitoring system and process for building
CN117058331B (en) * 2023-10-13 2023-12-19 山东建筑大学 Indoor personnel three-dimensional track reconstruction method and system based on single monitoring camera

Similar Documents

Publication Publication Date Title
Golparvar-Fard et al. Evaluation of image-based modeling and laser scanning accuracy for emerging automated performance monitoring techniques
CN115035162A (en) Monitoring video personnel positioning and tracking method and system based on visual slam
US9697607B2 (en) Method of estimating imaging device parameters
US8139111B2 (en) Height measurement in a perspective image
US8107722B2 (en) System and method for automatic stereo measurement of a point of interest in a scene
WO2023093217A1 (en) Data labeling method and apparatus, and computer device, storage medium and program
KR20130138247A (en) Rapid 3d modeling
Elibol et al. A new global alignment approach for underwater optical mapping
CN112991534B (en) Indoor semantic map construction method and system based on multi-granularity object model
JP2010256253A (en) Image capturing device for three-dimensional measurement and method therefor
CN106530407A (en) Three-dimensional panoramic splicing method, device and system for virtual reality
US20230106339A1 (en) 2d and 3d floor plan generation
Afzal et al. Rgb-d multi-view system calibration for full 3d scene reconstruction
Özdemir et al. A multi-purpose benchmark for photogrammetric urban 3D reconstruction in a controlled environment
Wientapper et al. Composing the feature map retrieval process for robust and ready-to-use monocular tracking
Junejo et al. Autoconfiguration of a dynamic nonoverlapping camera network
CN117711130A (en) Factory safety production supervision method and system based on 3D modeling and electronic equipment
JP3221384B2 (en) 3D coordinate measuring device
CN116152471A (en) Factory safety production supervision method and system based on video stream and electronic equipment
Biström Comparative analysis of properties of LiDAR-based point clouds versus camera-based point clouds for 3D reconstruction using SLAM algorithms
Zhang et al. ARCargo: Multi-Device Integrated Cargo Loading Management System with Augmented Reality
Ahmadabadian Photogrammetric multi-view stereo and imaging network design
Xu et al. Robust object detection with real-time fusion of multiview foreground silhouettes
US20230334688A1 (en) Multi-view height estimation from satellite images
Vezeteu Stereo-Camera–LiDAR Calibration for Autonomous Driving

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination