CN115035162A - Monitoring video personnel positioning and tracking method and system based on visual slam - Google Patents
- Publication number
- CN115035162A (application CN202210669897.1A)
- Authority
- CN
- China
- Prior art keywords
- point cloud
- camera
- monitoring
- dimensional
- cloud map
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/20—Analysis of motion
- G06T7/246—Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/70—Determining position or orientation of objects or cameras
- G06T7/73—Determining position or orientation of objects or cameras using feature-based methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/80—Analysis of captured images to determine intrinsic or extrinsic camera parameters, i.e. camera calibration
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V2201/00—Indexing scheme relating to image or video recognition or understanding
- G06V2201/07—Target detection
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Physics & Mathematics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Multimedia (AREA)
- Evolutionary Computation (AREA)
- Computing Systems (AREA)
- Databases & Information Systems (AREA)
- General Health & Medical Sciences (AREA)
- Medical Informatics (AREA)
- Software Systems (AREA)
- Artificial Intelligence (AREA)
- Health & Medical Sciences (AREA)
- Human Computer Interaction (AREA)
- Closed-Circuit Television Systems (AREA)
Abstract
The invention discloses a method and a system for positioning and tracking personnel in surveillance video based on visual slam. A depth camera is used to carry out three-dimensional reconstruction of the environment and obtain a point cloud map of the environment. External-parameter calibration is combined with the scene of the monitoring camera: the monitoring camera is calibrated through a calibration board to obtain its position and posture information. Personnel are then tracked through the monitoring camera: people in the monitored image are identified with a deep neural network; based on the previously calibrated position and posture, their three-dimensional positions are calculated from the ground prior of the pedestrians appearing in the monitoring picture by the principle of inverse perspective transformation; and the trajectories of the personnel are drawn in the constructed point cloud map so that the trajectories are presented in the point cloud map.
Description
Technical Field
The invention relates to the technical field of machine vision, in particular to a monitoring video personnel positioning and tracking method and system based on visual slam.
Background
In recent years, the domestic and foreign video monitoring market has grown explosively, and monitoring continues to develop toward high definition and intelligence. However, the traditional two-dimensional monitoring system requires frequent switching of monitoring pictures and has various disadvantages. With the development of artificial intelligence, more and more work can be completed automatically by machines, so that people are freed from tedious work and work efficiency is improved.
In the prior art, the position of a person in a scene is generally determined by Wi-Fi or similar means. Wi-Fi positioning generally adopts the "nearest neighbour" method, i.e., the person is considered to be at the position of the closest hotspot or base station; if several signal sources are nearby, the positioning accuracy can be improved through cross positioning (triangulation). When a user opens Wi-Fi and the mobile cellular network on a smartphone, the phone becomes a data source; the signal strengths of a huge number of known position points must be recorded in advance, and the position is determined by comparing the signal strength of newly added equipment against this massive database.
In a Wi-Fi positioning method, the monitored personnel must actively carry a handheld device to receive wireless signals. Moreover, existing calibration methods require the camera to be moved, whereas most monitoring cameras in real environments are fixed in place, so the moving-camera calibration methods commonly used in visual SLAM cannot be applied to them; in other words, a concrete computer representation of the position of monitored personnel in the environment cannot be obtained accurately. To address these problems, the invention adopts a visual positioning method that relies on the monitoring system itself: external-parameter calibration is combined with the scene of the monitoring camera and can be performed without moving the camera; meanwhile, the positioned person does not need to carry any equipment, and only the video image information shot by the camera is processed automatically to accurately locate the person.
Disclosure of Invention
Aiming at the problems in the prior art that two-dimensional monitoring is cumbersome to operate, unintuitive and inefficient, and that personnel positioning and tracking depend on a handheld device, the invention provides a method and a system for positioning and tracking personnel in surveillance video based on visual slam.
In order to achieve the above purpose, the invention provides the following technical scheme:
In one aspect, the monitoring video personnel positioning and tracking method based on visual slam provided by the invention performs three-dimensional reconstruction of the environment with a depth camera to obtain a point cloud map of the environment; combines external-parameter calibration with the scene of the monitoring camera, calibrating the monitoring camera through a calibration board to obtain its position and posture information; and tracks personnel through the monitoring camera, identifying people in the monitoring image with a deep neural network, calculating their three-dimensional positions from the ground prior of the pedestrians appearing in the monitoring picture by the principle of inverse perspective transformation based on the previously calibrated position and posture, and drawing the trajectories of the personnel in the constructed point cloud map so that the trajectories are presented in the point cloud map.
Further, the monitoring video personnel positioning and tracking method based on the visual slam is characterized by comprising the following steps:
s1, three-dimensional reconstruction based on visual slam: constructing a three-dimensional point cloud map of a scene and recording data required by external reference calibration, constructing the three-dimensional point cloud map of the scene according to a visual odometer and storing the three-dimensional point cloud map of the scene into a file by using RGBD picture stream and inertial sensor data provided by a depth camera; in the three-dimensional reconstruction process, shooting a checkerboard calibration board and recording the position and the posture of a camera at the moment;
s2, calibration: the monitoring camera shoots a monitoring picture of the chessboard pattern calibration plate, and the position and the posture of the monitoring camera in the three-dimensional point cloud map are calculated by combining the calibration data provided in the step S1 during three-dimensional reconstruction;
s3, tracking and calculating the position of the pedestrian in the monitoring camera: tracking the person, identifying the position of the person in the image and providing the position of the person in the three-dimensional point cloud map according to the position and posture data obtained in the step S2 and the monitoring video stream provided by the monitoring camera;
the pedestrian position tracking and calculating process in the monitoring camera comprises the following steps:
s31, selecting whether to enter an attitude correction process according to the type of the monitoring camera and whether the monitoring camera moves, if so, entering a step S32, otherwise, entering a step S33;
s32, if entering the posture correction process, extracting the vanishing point in the image and comparing it with the previously recorded vanishing point position, judging whether the monitoring picture rotates according to whether the vanishing point moves, and updating the rotation;
s33, entering a positioning process, and firstly taking a frame of monitoring video image;
s34, carrying out target detection on the pedestrian in the image to obtain a target frame coordinate;
s35, carrying out target tracking on the detected target frame and giving the corresponding personnel position coordinates;
s36, calculating the space position coordinates of the personnel according to the calibration parameters of the monitoring camera;
s37, if the positioning is not terminated, acquiring the next frame image and returning to the step S33, otherwise, ending.
Further, a point cloud map processing thread is added in the construction process of the three-dimensional point cloud map of the scene in the step S1, the point cloud map processing thread is used for receiving the position and orientation information and RGBD image frames of each frame of camera, and outputting an accurate point cloud map, and the specific flow is as follows:
s141, screening the position and pose information of each frame of depth camera and an RGBD image frame, and selecting a current frame when the camera angle change between the current frame and a last selected frame is more than 10 degrees and the displacement change is more than 2 meters, and performing subsequent point cloud map generation operation;
s142, calculating a point cloud block of the current frame, and rotating the point cloud block to a uniform world coordinate system;
s143, splicing and merging the point cloud blocks generated by all the frames to obtain an integral point cloud map, and performing filtering and outlier removing treatment on the point cloud map to compress the data volume of the point cloud map and optimize the visual impression of the map;
and S144, when a loop closure occurs during mapping, the ORB-SLAM3 re-optimizes the poses of the selected frames, the point clouds are re-spliced, and the point cloud processing operation of step S143 is performed again.
Further, the external reference calibration method in step S2 includes:
s21, shooting a standard chessboard grid calibration plate by a monitoring camera: selecting an origin position of a world coordinate system, slowly moving a monitoring camera to the direction of a checkerboard calibration board from the origin position, estimating the pose of the monitoring camera in real time by using ORB _ SLAM3 in the process, closing a program when the monitoring camera moves to the front of the checkerboard calibration board, and storing the current photo shot by the monitoring camera and the pose of the camera;
s22, calibrating internal reference of the monitoring camera: placing a chessboard grid calibration board in the range of a monitoring camera, moving the chessboard grid calibration board at multiple angles, recording a section of video, extracting frames in the video, identifying a chessboard grid, and calibrating internal parameters and distortion of the monitoring camera by using a Zhang Zhengyou calibration method;
s23, calibrating external parameters of the monitoring camera: and solving a camera coordinate system and a target coordinate system by adopting a direct linear transformation method according to the actual three-dimensional position information of the target feature point and the two-dimensional position of the target feature point in the image, so as to realize the calculation of the relative position relation between the monitoring camera and the three-dimensional point cloud map.
Further, the method for calculating the spatial position coordinates of the person in step S36 includes:
according to the pose transformation matrix of the monitoring camera, solving the camera model optical centre of the monitoring camera, $P_{ow}=(X_{ow},Y_{ow},Z_{ow})$:

$$P_{ow}=-R^{T}t \tag{21}$$

taking the midpoint $M(u,v)$ of the lower edge of the target frame of the person to be positioned, calculating the spatial position of point $M$ on the normalisation plane according to the projection equation with the depth $d$ set to 1 m, and solving the spatial coordinate $P_{m}=(X_{m},Y_{m},Z_{m})$ of point $M$ on the normalisation plane according to formula (22):

$$P_{m}=R^{T}\left(d\,K^{-1}(u,v,1)^{T}-t\right),\qquad d=1 \tag{22}$$

according to the height $h$ of the ground in the world coordinate system, writing the equation of the ground plane as formula (23) and converting it into a point-on-plane and normal-vector representation (24):

$$z=h \tag{23}$$

$$P_{0}=(0,0,h),\qquad \vec{n}=(0,0,1) \tag{24}$$

writing the ray from the optical centre through $P_{m}$ as the parametric equation (25), where $\vec{d}=P_{m}-P_{ow}$ is the direction vector of the ray and $s$ is a parameter, $s\in[0,\infty)$:

$$P(s)=P_{ow}+s\,\vec{d} \tag{25}$$

substituting the ray into the plane equation and rearranging to obtain formula (26):

$$\left(P_{ow}+s\,\vec{d}-P_{0}\right)\cdot\vec{n}=0 \tag{26}$$

applying the distributive law of the vector dot product to obtain:

$$s=\frac{(P_{0}-P_{ow})\cdot\vec{n}}{\vec{d}\cdot\vec{n}}$$

thereby solving the intersection point $P_{g}=P_{ow}+s\,\vec{d}$; the coordinates of the intersection point $P_{g}$ are the three-dimensional coordinates of the pedestrian in the world coordinate system.
Further, the method for positioning and tracking the monitoring video personnel based on the visual slam further comprises a step S4 of visually displaying: and displaying the three-dimensional point cloud map of the scene in the step S1 and the position of the character in the step S3 in the three-dimensional point cloud map, providing a GUI (graphical user interface) for user interaction and supervision, wherein specific displayed contents comprise a plurality of three-dimensional point clouds and corresponding cameras and tracks thereof, and when a certain point cloud is selected to be displayed, other point clouds and the corresponding cameras and tracks are hidden.
On the other hand, the invention also provides a monitoring video personnel positioning and tracking system based on the visual slam, which comprises the following modules to realize the steps of the method:
the three-dimensional reconstruction module is used for constructing a three-dimensional point cloud map of a scene and recording data required by external reference calibration, the whole using period is only operated once, and the RGBD picture stream and the inertial sensor data provided by the depth camera are used for constructing the three-dimensional point cloud map of the scene according to the visual odometer and are stored in a file for being displayed by the visual display module; in the three-dimensional reconstruction process, shooting a checkerboard calibration board and recording the position and the posture of a camera at the moment for a calibration module to use;
the calibration module is used for measuring the position and the posture of each monitoring camera in the three-dimensional point cloud map, the whole using period only runs once, the monitoring cameras shoot a monitoring picture photo of a chessboard grid calibration plate, and the position and the posture of the monitoring cameras in the three-dimensional point cloud map are calculated by combining calibration data provided during three-dimensional reconstruction and are used by the position calculation module;
the position calculation module is used for identifying the position of a person in the image and providing the position of the person in the three-dimensional point cloud map for the visual display module to display according to the position and posture data obtained by the calibration module and the monitoring video stream provided by the monitoring camera;
and the visual display module is used for displaying the three-dimensional point cloud map of the scene and the positions of the characters in the three-dimensional point cloud map, and providing a GUI (graphical user interface) for user interaction and supervision.
Compared with the prior art, the invention has the following beneficial effects:
the method for positioning and tracking the monitoring video personnel based on the visual slam has three modules of drawing construction, calibration and monitoring, and the capacity expansion from 2D monitoring to 3D monitoring is realized. Meanwhile, a display system suitable for the monitoring scene is designed, and optimization is performed on the aspects of system point cloud display and interface video and track output.
1. The invention provides a three-dimensional map construction method based on a visual SLAM and an inertial odometer, which optimizes a reconstructed three-dimensional point cloud map through three main processes of tracking, local map construction and loop optimization so as to achieve terrain reconstruction with better effect on scenes in a building and enable monitoring personnel to quickly understand and accurately position the map.
2. In order to solve the problem that most monitoring cameras in the actual environment are fixed at a certain position, the calibration method of the mobile camera commonly used in the visual SLAM cannot be adopted to calibrate the monitoring cameras, namely, the specific representation of the position of the monitoring cameras in the environment in a computer cannot be accurately obtained. The invention provides a method for calibrating external parameters of a universal monitoring camera and an environmental three-dimensional point cloud.
3. By the depth camera parameter calibration method, the position information related to the depth camera is obtained, and the person track can be accurately calculated only by processing the video shot by the common general camera. Meanwhile, tracking and drawing are carried out according to the fact that each person traces in the generated point cloud picture, corresponding information of the track is stored and displayed on an interface, and overtime automatic elimination of the target tracking track can be achieved.
4. The invention provides a figure position calculation tracking algorithm based on a three-dimensional visual algorithm by calculating the position of a person and optimizing a track.
5. Based on the three-dimensional point cloud map and the graphical interface, the invention also provides an extensible function, can identify the flame or smoke appearing in the video image, is suitable for the in-building monitoring requirement of a real scene, and ensures that the system can automatically find the danger and give an alarm in time when abnormal conditions occur.
6. The track drawing method in the three-dimensional point cloud map is perfected by designing a data transmission and display method, background point cloud files are transmitted to the system in a formatted mode, and a three-dimensional (JS) graphic library-based 3D display interface capable of displaying the three-dimensional point cloud, the camera model and the track is established, so that monitoring personnel can understand output results simply, easily and clearly, and corresponding measures are taken conveniently. The designed interface has good universality and can support display on various devices.
7. The invention designs and develops a 3D intelligent monitoring system by combining the advantages of a visual SLAM in the aspect of map reconstruction, can comprehensively solve the problem of traditional two-dimensional monitoring of pain points, has the advantages of intuition, high efficiency, low cost and easy deployment, can display a 3-dimensional point cloud map, can monitor the three-dimensional layout of a whole building, realizes the function of target tracking by one key, automatically calculates and draws the track of personnel entering the building in the point cloud map, realizes clear and accurate three-dimensional display of the real-time track and inquiry of relevant information in a track list, and can also switch and view monitoring pictures of each floor by one key. And functions of flame early warning, smoke identification, behavior identification and the like can also be added.
8. The system designed by the invention simultaneously supports the calculation and display of the relative pose of the fixed monitoring camera, is conveniently applied to the existing monitoring scene, and is convenient for the integration of new-generation information technologies such as internet and big data under a big background, cloud calculation and the like with the monitoring system in depth.
Drawings
In order to more clearly illustrate the embodiments of the present application or technical solutions in the prior art, the drawings needed to be used in the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments described in the present invention, and other drawings can be obtained by those skilled in the art according to the drawings.
FIG. 1 is a monitoring video personnel positioning and tracking system based on visual slam according to an embodiment of the present invention;
FIG. 2 is a flow chart of a three-dimensional reconstruction module according to an embodiment of the present invention;
FIG. 3 is a flow chart of a point cloud map calibration provided by an embodiment of the present invention;
fig. 4 is a corresponding relationship between the calibration board images shot by the RGBD camera and the monitoring camera provided by the embodiment of the present invention and the extracted feature points;
FIG. 5 is a flowchart of target detection, tracking, and position calculation according to an embodiment of the present invention;
fig. 6 is a vanishing point extracting diagram provided by the embodiment of the invention;
FIG. 7 is a diagram of a typed array according to an embodiment of the present invention;
fig. 8 is an illustration of a sample loading of a three-dimensional point cloud according to an embodiment of the present invention;
FIG. 9 is an illustration of a camera model provided by an embodiment of the present invention;
FIG. 10 is a flowchart for track update rendering according to an embodiment of the present invention;
FIG. 11 is a diagram of system data transmission according to an embodiment of the present invention;
fig. 12 is a track related function display of a visual video surveillance system based on three-dimensional vision according to an embodiment of the present invention;
fig. 13 is an overall interface diagram of a visual video monitoring system based on three-dimensional vision according to an embodiment of the present invention, and a display of a switch function and a floor switching function;
fig. 14 is a display of the flame and smoke alarm functions of a visual video monitoring system based on three-dimensional vision according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention are clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The embodiment of the invention provides a monitoring video personnel positioning and tracking system based on visual slam, wherein hardware equipment adopts a Kinect for Azure depth camera, a checkerboard calibration board, a monitoring camera, a personal computer or a notebook computer, and the system comprises the following modules (as shown in figure 1):
the three-dimensional reconstruction module is used for constructing a three-dimensional point cloud map of a scene and recording data required by external reference calibration, the whole using period is only operated once, and the RGBD picture stream and the inertial sensor data provided by the depth camera are used for constructing the three-dimensional point cloud map of the scene according to the visual odometer and are stored in a file for being displayed by the visual display module; in the three-dimensional reconstruction process, shooting a checkerboard calibration board and recording the position and the posture of a camera at the moment for a calibration module to use;
the calibration module is used for measuring the position and the posture of each monitoring camera in the three-dimensional point cloud map, the whole using period only runs once, the monitoring cameras shoot a monitoring picture photo of a chessboard grid calibration plate, and the position and the posture of the monitoring cameras in the three-dimensional point cloud map are calculated by combining calibration data provided during three-dimensional reconstruction and are used by the position calculation module;
the position calculation module is used for identifying the position of a person in the image and providing the position of the person in the three-dimensional point cloud map for the visual display module to display according to the position and posture data obtained by the calibration module and the monitoring video stream provided by the monitoring camera;
and the visual display module is used for displaying the three-dimensional point cloud map of the scene and the positions of the characters in the three-dimensional point cloud map and providing a GUI (graphical user interface) for user interaction and supervision.
The system consists of the four modules, wherein the three-dimensional reconstruction module, the calibration module and the position calculation module are innovated on the technical level, and the visual display module is innovated on the product level.
Based on the system, the embodiment of the invention provides a monitoring video personnel positioning and tracking method based on visual slam, which comprises the following specific steps:
s1, three-dimensional reconstruction based on visual slam: constructing a three-dimensional point cloud map of a scene and recording data required by external reference calibration, constructing the three-dimensional point cloud map of the scene according to a visual odometer and storing the three-dimensional point cloud map of the scene into a file by using RGBD picture stream and inertial sensor data provided by a depth camera; in the three-dimensional reconstruction process, a chessboard pattern calibration plate is shot and the position and the posture of the camera at the moment are recorded.
In the three-dimensional reconstruction process, the equipment parameter calibration uses an IMU _ utils tool, a Kalibr toolbox and a two-dimensional code calibration plate, and the parameter of the depth camera is measured, which comprises the following steps: the method comprises the steps of obtaining an IMU parameter of the depth camera, obtaining a time offset between the IMU and the depth camera, and obtaining a pose transformation relation between an IMU coordinate system and a depth camera coordinate system.
The three-dimensional reconstruction module adapts the Kinect for Azure depth camera based on an ORB _ SLAM3 open source framework, so that the Kinect for Azure depth camera can be used normally under actual working conditions. The construction process of the three-dimensional point cloud map of the scene in step S1 is shown in fig. 2, and the modification content specifically includes:
s11, performing down-sampling on the RGBD picture stream provided by the depth camera, and reducing the resolution from 1280 × 720 to 960 × 540;
s12, aligning the time stamp to the incoming data stream according to the calibration parameters stored in the file;
s13, changing triangulation into depth map reading for Z-axis direction distance estimation of the feature points;
and S14, increasing a point cloud map processing thread to reconstruct dense point clouds of the scene terrain.
Visual odometry provides only the depth camera position and pose, so a dense point cloud of the scene terrain still has to be reconstructed. The method has two steps: the point clouds are first stitched together according to the depth camera poses and the depth camera model, and then the point clouds are processed to remove repeated and redundant points. In the system these functions are realized by a point cloud map processing thread, whose flow chart is shown in fig. 2. The point cloud map processing thread receives the position and pose information of the depth camera for each frame together with the RGBD image frame (colour image and depth image) and outputs an accurate point cloud map; step S14 corresponds to this thread and comprises the following steps:
s141, screening the position and pose information of each frame of depth camera and an RGBD image frame, and selecting a current frame when the camera angle change between the current frame and a last selected frame is more than 10 degrees and the displacement change is more than 2 meters, and performing subsequent point cloud map generation operation;
s142, calculating a point cloud block of the current frame according to a depth camera model formula, and rotating the point cloud block to a uniform world coordinate system;
s143, splicing and merging the point cloud blocks generated by all the frames to obtain an integral point cloud map, and performing filtering and outlier removing treatment on the point cloud map to compress the data volume of the point cloud map and optimize the visual impression of the map;
and S144, when a loop closure occurs during mapping, the ORB-SLAM3 re-optimizes the poses of the selected frames, the point clouds are re-spliced, and the point cloud processing operation of step S143 is performed again.
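The frame-screening criterion of step S141 can be illustrated with a minimal Python/numpy sketch; the 4x4 camera-to-world pose convention and the function name are assumptions made for illustration only, not the patented implementation:

```python
import numpy as np

def should_select_frame(T_prev, T_curr, angle_thresh_deg=10.0, trans_thresh_m=2.0):
    """Check whether the current pose differs enough from the last selected
    frame (10-degree / 2-metre criterion of step S141) to generate a new
    point cloud block. T_prev, T_curr: 4x4 camera-to-world pose matrices."""
    # Relative rotation between the two poses
    R_rel = T_prev[:3, :3].T @ T_curr[:3, :3]
    # Rotation angle recovered from the trace of the relative rotation matrix
    cos_angle = np.clip((np.trace(R_rel) - 1.0) / 2.0, -1.0, 1.0)
    angle_deg = np.degrees(np.arccos(cos_angle))
    # Distance between the two camera centres
    trans_m = np.linalg.norm(T_curr[:3, 3] - T_prev[:3, 3])
    return angle_deg > angle_thresh_deg and trans_m > trans_thresh_m
```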
The detailed process of converting the RGBD image frames obtained by the depth camera into point cloud blocks and stitching the point cloud blocks to the world coordinate system is as follows:
For a pixel point $p$ in the image with coordinates $(u, v)$, let the three-dimensional point $P$ obtained by projecting $p$ into the point cloud have coordinates $(X_c, Y_c, Z_c)$ in the camera frame. The point of the point cloud is generated by the formulas:

$$X_c = \frac{(u - c_x)\,d}{f_x},\qquad Y_c = \frac{(v - c_y)\,d}{f_y},\qquad Z_c = d$$

where $f_x, f_y, c_x, c_y$ are the camera intrinsic parameters, obtained by the device parameter calibration method given above, and $d$ is the depth of point $p$ read from the depth map.

Through ORB-SLAM3, the rotation and translation of the camera with respect to the world coordinate system, $R_{cw}$ and $t_{cw}$, are obtained and combined into a single transform:

$$T_{cw} = \begin{bmatrix} R_{cw} & t_{cw} \\ 0^{T} & 1 \end{bmatrix}$$

The coordinates $(X_w, Y_w, Z_w)$ of point $P$ in the world coordinate system are then:

$$\begin{bmatrix} X_w \\ Y_w \\ Z_w \\ 1 \end{bmatrix} = T_{cw}^{-1} \begin{bmatrix} X_c \\ Y_c \\ Z_c \\ 1 \end{bmatrix}$$
based on the method, all pixel points in the image can be converted into three-dimensional points in a world coordinate system, and the method has the advantage that after the key frame is corrected by the loop detection thread of the ORB-SLAM, the corrected result can act on a dense map. Compared with the method of directly splicing point clouds, the method can eliminate accumulated errors.
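A rough Python/numpy sketch of this back-projection and stitching step is given below; it assumes that T_cw maps world coordinates to camera coordinates (the ORB-SLAM convention used above) and that the depth map is in metres. The function name is illustrative, not part of the patent:

```python
import numpy as np

def rgbd_to_world_points(depth, rgb, K, T_cw):
    """Back-project one RGBD frame into a coloured point cloud block and
    transform it into the world coordinate system.

    depth: HxW depth map in metres; rgb: HxWx3 colour image
    K    : 3x3 intrinsic matrix; T_cw: 4x4 world-to-camera transform
    """
    fx, fy, cx, cy = K[0, 0], K[1, 1], K[0, 2], K[1, 2]
    v, u = np.indices(depth.shape)
    d = depth.reshape(-1)
    valid = d > 0                                  # skip missing depth readings
    # Pinhole model: camera-frame coordinates of every valid pixel
    x = (u.reshape(-1)[valid] - cx) * d[valid] / fx
    y = (v.reshape(-1)[valid] - cy) * d[valid] / fy
    z = d[valid]
    pts_c = np.stack([x, y, z, np.ones_like(z)], axis=0)   # 4xN homogeneous
    # World coordinates: apply the inverse of the world-to-camera transform
    pts_w = (np.linalg.inv(T_cw) @ pts_c)[:3].T
    colors = rgb.reshape(-1, 3)[valid]
    return pts_w, colors
```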
In the ORB _ SLAM3 framework, since the world coordinate system uses the position of the camera at the time of initialization as a reference, the generated point cloud map inevitably has a skew condition, which may negatively affect the subsequent visualization display of the map and the subsequent external reference calibration, and therefore the coordinate system calibration needs to be performed on the point cloud map. As shown in fig. 3, the calibration process of the three-dimensional point cloud map is as follows:
1) calculating a plane equation of the point cloud ground:
and (3) calculating a plane equation ax + by + cz + d of the ground to be 0 by using a plane detection method based on RANSAC in a PCL library, wherein a, b, c and d are four parameters of the plane equation, and a normal vector of the ground to be (a, b and c) can be obtained through the plane equation.
2) Calculating a rotation matrix of the point cloud ground and a horizontal plane of a coordinate system:
and calculating a rotation matrix of the point cloud ground and the horizontal plane of the coordinate system, and realizing rotation transformation of the point cloud map through the matrix so as to calibrate the point cloud map. The calculation method is as follows:
(1) Let the normal vector of the point cloud ground be $v_1 = (a, b, c)$ and the normal vector of the horizontal plane of the coordinate system be $v_2 = (0, 0, 1)$; the rotation axis $n$ and the rotation angle $\theta$ of the rotation transformation between the two vectors are calculated as follows:

$$n = \frac{v_1 \times v_2}{\lVert v_1 \times v_2 \rVert},\qquad \theta = \arccos\frac{v_1 \cdot v_2}{\lVert v_1 \rVert\,\lVert v_2 \rVert}$$
(2) The rotation matrix $R$ is obtained from the rotation axis $n$ and the rotation angle $\theta$; the calculation formula is:

$$R = \cos\theta\, I + (1 - \cos\theta)\, n n^{T} + \sin\theta\, n^{\wedge}$$

where the $^{\wedge}$ symbol converts a vector into an antisymmetric matrix; letting the vector be $a = (a_1, a_2, a_3)$, the specific conversion formula is:

$$a^{\wedge} = \begin{bmatrix} 0 & -a_3 & a_2 \\ a_3 & 0 & -a_1 \\ -a_2 & a_1 & 0 \end{bmatrix}$$
3) calibrating the point cloud map using the rotation matrix:
Suppose a point of the point cloud map to be calibrated is $p_0 = (x_0, y_0, z_0)$ and the point coordinate after calibration is $p_1 = (x_1, y_1, z_1)$; the conversion formula is:

$$p_1 = R\, p_0$$
the formula is applied to all points in the point cloud map to be calibrated, so that the whole point cloud map can be calibrated, and finally, the ground is aligned with the horizontal plane of the coordinate system.
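A short numpy sketch of this levelling step, assuming the ground normal (a, b, c) has already been obtained from the RANSAC plane fit; names are illustrative only:

```python
import numpy as np

def level_point_cloud(points, ground_normal):
    """Rotate an Nx3 point cloud so that the detected ground plane becomes
    parallel to the z = 0 plane of the coordinate system."""
    v1 = np.asarray(ground_normal, dtype=float)
    v1 /= np.linalg.norm(v1)
    v2 = np.array([0.0, 0.0, 1.0])               # target horizontal-plane normal
    axis = np.cross(v1, v2)
    s = np.linalg.norm(axis)
    if s < 1e-8:                                  # already aligned
        return points.copy()
    n = axis / s
    theta = np.arccos(np.clip(np.dot(v1, v2), -1.0, 1.0))
    # Skew-symmetric matrix of the rotation axis (the ^ operator in the text)
    n_hat = np.array([[0.0, -n[2], n[1]],
                      [n[2], 0.0, -n[0]],
                      [-n[1], n[0], 0.0]])
    # Rodrigues formula: R = cos(theta) I + (1 - cos(theta)) n n^T + sin(theta) n^
    R = (np.cos(theta) * np.eye(3)
         + (1.0 - np.cos(theta)) * np.outer(n, n)
         + np.sin(theta) * n_hat)
    return points @ R.T                           # p1 = R p0 applied to every point
```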
S2, calibration: and measuring the position and the posture of each monitoring camera in the three-dimensional point cloud map, taking a monitoring picture of the checkerboard calibration plate by the monitoring cameras, and calculating the position and the posture of the monitoring cameras in the three-dimensional point cloud map by combining the calibration data provided in the step S1 during three-dimensional reconstruction.
This step mainly solves the following problems: for the monitoring cameras installed in the building, how to obtain the coordinates of the cameras in the world coordinate system, the calibration process in step S2 is as follows:
s21, shooting a standard chessboard grid calibration plate by a monitoring camera:
after the RGBD camera is used for carrying out three-dimensional reconstruction on the environment, a point cloud map of the environment is obtained. In order to complete personnel positioning work by using the monitoring camera, the relative position relation between the monitoring camera and the point cloud map, namely an external parameter, needs to be measured. The extrinsic parameters were calculated using the calibration plate as a marker.
The calibration board provides the camera with a number of three-dimensional points whose relative positions are fixed, from which the required parameters are calculated. The boards commonly used for calibration are the checkerboard calibration board and the two-dimensional-code calibration board. Compared with the checkerboard, the two-dimensional-code board has the advantage that it can still be distinguished after being turned upside down, but its more complex pattern places higher requirements on illumination and camera resolution. Under the working conditions of this system the monitoring cameras all view the monitored scene frontally, so the upside-down case does not occur; at the same time, so that the calibration method remains applicable to lower-cost monitoring cameras and to deployments in dim indoor lighting, the checkerboard calibration board is better suited to the usage environment of this system.
The ORB_SLAM3 framework code is modified so that the photo taken by the current camera and the camera pose at a given moment of operation can be saved. In actual operation, the origin position of the world coordinate system is selected, and the RGBD camera is moved slowly from the origin position toward the checkerboard calibration board while ORB_SLAM3 estimates its pose in real time; when the camera reaches the front of the checkerboard calibration board, the program is closed and the current photo taken by the camera and the camera pose are saved;
s22, calibrating internal parameters of the monitoring camera:
The checkerboard calibration board is placed within the field of view of the monitoring camera and moved at multiple angles while a section of video is recorded; frames are extracted from the video, the checkerboard is identified, and the internal parameters and distortion of the monitoring camera are calibrated with Zhengyou Zhang's calibration method.
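For illustration only, this intrinsic calibration step maps to the standard OpenCV routines for Zhang's method; the board size and square size below are placeholder values, not figures from the patent:

```python
import cv2
import numpy as np

def calibrate_intrinsics(frames, board_size=(7, 6), square_size=0.03):
    """Estimate the intrinsic matrix K and distortion coefficients of the
    surveillance camera from video frames showing the checkerboard."""
    # 3D corner coordinates in the calibration-board coordinate system (z = 0)
    objp = np.zeros((board_size[0] * board_size[1], 3), np.float32)
    objp[:, :2] = np.mgrid[0:board_size[0], 0:board_size[1]].T.reshape(-1, 2)
    objp *= square_size

    obj_points, img_points, img_size = [], [], None
    for frame in frames:
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        img_size = gray.shape[::-1]
        found, corners = cv2.findChessboardCorners(gray, board_size)
        if found:
            corners = cv2.cornerSubPix(
                gray, corners, (11, 11), (-1, -1),
                (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 30, 1e-3))
            obj_points.append(objp)
            img_points.append(corners)

    _, K, dist, _, _ = cv2.calibrateCamera(obj_points, img_points, img_size, None, None)
    return K, dist
```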
S23, calibrating external parameters of the monitoring camera:
the pose calculation is a method for solving a camera coordinate system and a target coordinate system according to the actual three-dimensional position information of the target feature points and the two-dimensional positions of the target feature points in the image. If the three-dimensional coordinates of the feature points on the calibration plate are known, an augmentation matrix [ R | T ] containing 12 unknowns is constructed by using a direct linear transformation method to represent the transformation between the camera coordinate system and the target coordinate system. And selecting at least 6 pairs of corresponding points of the known three-dimensional space point coordinates and the two-dimensional pixel point coordinates to solve the unknown number in the augmentation matrix, thereby realizing the calculation of the camera pose.
Assume a feature point $P_1$ on the calibration board in space with homogeneous coordinates $P_1 = (X, Y, Z, 1)^T$, and let the corresponding two-dimensional point in the image of the monitoring camera be $x_1 = (u_1, v_1, 1)^T$. Following the direct linear transformation method, the $3 \times 4$ augmented matrix $[R\,|\,T]$ is expanded into the form:

$$s \begin{bmatrix} u_1 \\ v_1 \\ 1 \end{bmatrix} = \begin{bmatrix} t_1 & t_2 & t_3 & t_4 \\ t_5 & t_6 & t_7 & t_8 \\ t_9 & t_{10} & t_{11} & t_{12} \end{bmatrix} \begin{bmatrix} X \\ Y \\ Z \\ 1 \end{bmatrix} \tag{1}$$

In the above formula, $u_1, v_1$ are the pixel coordinates of the two-dimensional point in the image of the monitoring camera, $X, Y, Z$ are the three-dimensional coordinates of the corresponding point, and $s$ is a scale factor.
Using the last row of the linear transformation to eliminate the scale coefficient $s$ yields:

$$u_1 = \frac{t_1 X + t_2 Y + t_3 Z + t_4}{t_9 X + t_{10} Y + t_{11} Z + t_{12}},\qquad v_1 = \frac{t_5 X + t_6 Y + t_7 Z + t_8}{t_9 X + t_{10} Y + t_{11} Z + t_{12}}$$
To simplify the representation, each row of the augmented matrix of formula (1) is written as a vector:

$$\mathbf{t}_1 = (t_1, t_2, t_3, t_4)^T,\qquad \mathbf{t}_2 = (t_5, t_6, t_7, t_8)^T,\qquad \mathbf{t}_3 = (t_9, t_{10}, t_{11}, t_{12})^T$$

The equations in vector form are then:

$$\mathbf{t}_1^{T} P - \mathbf{t}_3^{T} P\, u_1 = 0 \tag{5}$$

$$\mathbf{t}_2^{T} P - \mathbf{t}_3^{T} P\, v_1 = 0 \tag{6}$$

In equations (5) and (6), $\mathbf{t}$ is the vector to be solved, and each feature point contributes two constraint equations in the unknowns. With $N$ pairs of corresponding three-dimensional and two-dimensional coordinates, the equations can be stacked as:

$$\begin{bmatrix} P_1^{T} & 0 & -u_1 P_1^{T} \\ 0 & P_1^{T} & -v_1 P_1^{T} \\ \vdots & \vdots & \vdots \\ P_N^{T} & 0 & -u_N P_N^{T} \\ 0 & P_N^{T} & -v_N P_N^{T} \end{bmatrix} \begin{bmatrix} \mathbf{t}_1 \\ \mathbf{t}_2 \\ \mathbf{t}_3 \end{bmatrix} = 0 \tag{7}$$
According to equation (7), 12 unknowns are contained, so at least six pairs of corresponding points are needed to solve the equation. The checkerboard calibration board used by the system provides 42 corner coordinate pairs, so the system of equations becomes over-determined, and it is solved in the least-squares sense by the SVD method.
For the 42 three-dimensional points $P$ on the checkerboard calibration board and their projections on the normalised plane, the camera pose $R, t$ has been calculated above with the direct linear transformation method; its Lie-algebra representation is denoted $\xi$. Suppose the spatial coordinates of a calibration-board corner point are $P_i = [X_i, Y_i, Z_i]^T$ and its projected pixel coordinates are $u_i = [u_i, v_i]^T$. The relation between the pixel coordinates and the position of the spatial point is:

$$s_i \begin{bmatrix} u_i \\ v_i \\ 1 \end{bmatrix} = K \exp(\xi^{\wedge}) \begin{bmatrix} X_i \\ Y_i \\ Z_i \\ 1 \end{bmatrix}$$

or, written compactly in matrix form:

$$s_i\, u_i = K \exp(\xi^{\wedge}) P_i$$

Because the camera pose is unknown and the observation points are noisy, this equation has an error. Summing the errors over all points gives a least-squares problem whose minimiser is the best camera pose; the sum of the error terms is:

$$\xi^{*} = \arg\min_{\xi} \frac{1}{2} \sum_{i=1}^{n} \left\lVert u_i - \frac{1}{s_i} K \exp(\xi^{\wedge}) P_i \right\rVert_2^2$$
and solving through a Gauss-Newton algorithm to obtain a camera pose transformation matrix when the error term is minimum.
The camera pose can be solved on the premise that the coordinates of the characteristic points of the checkerboard calibration plate are known, and due to the scale uncertainty of the monocular camera, the relative position relationship between the characteristic points cannot be determined only by the size and the style of the calibration plate observed in the image, so according to the actual situation of the system, as shown in fig. 4, the scale information can be obtained from two aspects:
(1) and acquiring the Z-axis absolute distance from the RGBD camera optical center to the characteristic point from the depth map, thereby acquiring the absolute position information of each characteristic point.
(2) And measuring the size of the checkerboard on the calibration plate to obtain the relative position relation between the characteristic points on the checkerboard.
The first characteristic corner point at the upper-left of the checkerboard calibration board is taken as the coordinate origin, the transverse direction as the x axis, the longitudinal direction as the y axis and the vertical direction as the z axis, defining the calibration-board coordinate system. With the method described above for solving the camera pose from the three-dimensional coordinates of the feature points on the checkerboard, the pose transformation matrix of a camera in the calibration-board coordinate system can be solved. The transformation to the calibration-board coordinate system obtained when the monitoring camera photographs the checkerboard is denoted $T_{mb}$; the transformation obtained when the RGBD camera photographs the checkerboard while the visual odometry is running is denoted $T_{cb}$; and the transformation of the RGBD camera relative to the world coordinate system given by the visual odometry at that moment is denoted $T_{cw}$.

Let a feature point on the checkerboard calibration board have coordinates $P_b = (X, Y, 0, 1)^T$ in the calibration-board coordinate system, coordinates $P_m$ in the coordinate system of the monitoring camera, coordinates $P_c$ in the RGBD camera coordinate system, and coordinates $P_w$ in the world coordinate system of the visual odometry. Then:

$$P_c = T_{cb} P_b \tag{10}$$

$$P_m = T_{mb} P_b \tag{11}$$

Multiplying both sides of (10) by $T_{cb}^{-1}$ gives:

$$P_b = T_{cb}^{-1} P_c \tag{12}$$

Substituting (12) into (11) gives:

$$P_m = T_{mb} T_{cb}^{-1} P_c \tag{13}$$

By the definition of a Euclidean transformation, $T_{mb} T_{cb}^{-1}$ in (13) represents the transformation from the RGBD camera coordinate system to the monitoring camera coordinate system, so it is written as $T_{mc} = T_{mb} T_{cb}^{-1}$.

From equation (14), the transformation matrix $T_{mw}$ from the world coordinate system to the coordinate system of the monitoring camera can then be solved:

$$T_{mw} = T_{mc} T_{cw} \tag{14}$$
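The composition of the three transforms maps directly onto a few lines of numpy; the matrix names follow equations (10)-(14) and the function name is illustrative:

```python
import numpy as np

def surveillance_camera_extrinsics(T_mb, T_cb, T_cw):
    """Compose the world -> surveillance-camera transform T_mw.

    T_mb: board -> surveillance-camera transform
    T_cb: board -> RGBD-camera transform
    T_cw: world -> RGBD-camera transform from the visual odometry
    """
    T_mc = T_mb @ np.linalg.inv(T_cb)   # RGBD camera -> surveillance camera
    T_mw = T_mc @ T_cw                  # world -> surveillance camera, eq. (14)
    return T_mw
```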
S3, position calculation: and identifying the position of the person in the image and giving the position of the person in the three-dimensional point cloud map according to the position and posture data obtained in the step S2 and the monitoring video stream provided by the monitoring camera.
Personnel location and trajectory tracking based on surveillance video have the advantage of high accuracy and good stability. The pedestrian detection and positioning algorithm based on surveillance video will be described with emphasis below.
As shown in fig. 5, the process of tracking the pedestrian position in the monitoring camera and calculating is as follows:
s31, selecting whether to enter an attitude correction process according to the type of the monitoring camera and whether the monitoring camera moves, if so, entering a step S32, otherwise, entering a step S33;
s32, if entering the attitude correction process, extracting the vanishing point in the image and comparing it with the previously recorded vanishing point position, judging whether the monitoring picture rotates according to whether the vanishing point moves, and updating the rotation;
Specifically, regarding the attitude correction of the monitoring camera: some monitoring cameras have a manual or automatic rotation function, which changes the yaw and pitch angles of the camera. For the positioning system described here, the calibration workload on the monitoring cameras before use is already large, and re-calibrating the external parameters every time the camera attitude changes is not practical, so a workflow that automatically updates and corrects the attitude is needed.
1) Vanishing point extraction
According to the projective geometry principle, in the case of perspective deformation, a group of parallel straight lines in the real world intersects at an infinite point, and the projection of the intersection point on the imaging plane is called a vanishing point. When the parallel line in the real world is parallel to the imaging plane, the vanishing point is located at infinity from the imaging plane. However, when there is a non-parallel relationship between the group of parallel lines and the imaging plane, the vanishing point will be located within a limited distance of the imaging plane, even within the imaging area.
Vanishing points have some important properties:
a. lines that are parallel to each other in the real world all point to the same vanishing point;
b. the vanishing point corresponding to a straight line must lie in the direction of the projection of that line on the image plane;
c. the location of the vanishing point is independent of the roll angle and depends only on the pitch and yaw angles.
The vanishing point is an important feature formed on an image plane after perspective projection, and can provide a large amount of structural information and direction information for scene analysis or be used for measuring parameters of a camera. Therefore, the vanishing point has wide application in rectangular structure estimation and matching, three-dimensional reconstruction, camera calibration and azimuth angle estimation.
Since the system is used inside buildings, where most walls and floors are flat, fixed vanishing points can be extracted. Whether the monitoring picture has rotated is judged according to whether the vanishing point moves; the vanishing point is first extracted, with the following process:
(1) taking internal parameters and distortion of a monitoring camera, and carrying out distortion removal on an original picture to obtain a distortion-removed picture;
(2) extracting line segments on the undistorted picture by using an LSD line segment extractor;
(3) screening the line segments by length and keeping the valid segments longer than 60 pixels;
(4) calculating the angle of each line segment by using Hough transform;
(5) clustering the line segments according to angles, and dividing the line segments into three classes;
(6) solving the closest point of each type of straight line by using a least square method;
(7) selecting the reference point with the smallest sum of the coordinates in the three vanishing points;
as shown in fig. 6, red, green and blue are three types of line segments, and the center of the red circle is a specific position of a reference point.
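A simplified Python/OpenCV sketch of this extraction flow is shown below. The angle clustering is reduced to fixed orientation bins, and cv2.createLineSegmentDetector may be unavailable in some OpenCV builds, so this is only an approximation of the steps above:

```python
import cv2
import numpy as np

def extract_vanishing_point(frame, K, dist, min_len=60):
    """Undistort, detect line segments, keep long ones, cluster by angle and
    intersect each cluster in a least-squares sense; return the reference
    point with the smallest coordinate sum."""
    gray = cv2.cvtColor(cv2.undistort(frame, K, dist), cv2.COLOR_BGR2GRAY)
    lsd = cv2.createLineSegmentDetector()
    lines = lsd.detect(gray)[0]
    if lines is None:
        return None
    lines = lines.reshape(-1, 4)
    # Keep segments longer than min_len pixels
    lengths = np.hypot(lines[:, 2] - lines[:, 0], lines[:, 3] - lines[:, 1])
    lines = lines[lengths > min_len]
    # Cluster segments into three groups by orientation angle (simplified)
    angles = np.arctan2(lines[:, 3] - lines[:, 1], lines[:, 2] - lines[:, 0]) % np.pi
    labels = np.digitize(angles, np.linspace(0, np.pi, 4)[1:-1])
    candidates = []
    for c in range(3):
        group = lines[labels == c]
        if len(group) < 2:
            continue
        # Least-squares point closest to all lines of the group (n . p = n . p0)
        d = group[:, 2:4] - group[:, 0:2]
        n = np.stack([-d[:, 1], d[:, 0]], axis=1)
        n /= np.linalg.norm(n, axis=1, keepdims=True)
        rhs = np.sum(n * group[:, 0:2], axis=1)
        vp, *_ = np.linalg.lstsq(n, rhs, rcond=None)
        candidates.append(vp)
    # Reference point: candidate with the smallest sum of coordinates
    return min(candidates, key=lambda p: p[0] + p[1]) if candidates else None
```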
2) Attitude correction
As shown in FIG. 6, let the coordinates of the vanishing point before rotation be $(x_0, y_0)$ and after rotation be $(x_1, y_1)$. The change in yaw angle, $\delta yaw$, and the change in pitch angle, $\delta pitch$, are then calculated as shown in formulas (15) and (16):

$$\delta yaw = \arctan(x_1 - x_0) \tag{15}$$

$$\delta pitch = \arctan(y_1 - y_0) \tag{16}$$
Because the pose of the monitoring camera is expressed by the transformation matrix $T_{mw}$, the angle changes are converted into matrices representing the same rotations; the rotation in the yaw direction is denoted $R_x$ and the rotation in the pitch direction $R_y$, as shown in formulas (17) and (18).

Since the camera can only rotate and not move, the translation of the optical centre is taken as approximately zero; the product of the two rotation matrices together with the translation vector $t = (0, 0, 0)^T$ forms a transformation matrix $T_{01}$ representing the conversion from the camera coordinates before rotation to the camera coordinates after rotation. Denoting the new transformation matrix of the monitoring camera required for positioning as $T_{mw}'$, we have:

$$T_{mw}' = T_{01} T_{mw} \tag{19}$$
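A numpy sketch of this update is given below. Because the bodies of formulas (17) and (18) are not reproduced in the text, the axis assignment of R_x and R_y here is one plausible reading and would have to be adapted to the actual camera convention:

```python
import numpy as np

def update_extrinsics_from_vanishing_point(T_mw, vp_before, vp_after):
    """Update the surveillance-camera extrinsics after a pan/tilt from the
    shift of the vanishing point (formulas (15), (16) and (19))."""
    d_yaw = np.arctan(vp_after[0] - vp_before[0])    # formula (15)
    d_pitch = np.arctan(vp_after[1] - vp_before[1])  # formula (16)
    # Assumed rotation about the vertical (yaw) axis
    R_x = np.array([[np.cos(d_yaw), 0.0, np.sin(d_yaw)],
                    [0.0, 1.0, 0.0],
                    [-np.sin(d_yaw), 0.0, np.cos(d_yaw)]])
    # Assumed rotation about the horizontal (pitch) axis
    R_y = np.array([[1.0, 0.0, 0.0],
                    [0.0, np.cos(d_pitch), -np.sin(d_pitch)],
                    [0.0, np.sin(d_pitch), np.cos(d_pitch)]])
    T_01 = np.eye(4)                                 # zero translation: the camera only rotates
    T_01[:3, :3] = R_x @ R_y
    return T_01 @ T_mw                               # formula (19): T_mw' = T_01 T_mw
```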
s33, entering a positioning process, and firstly taking a frame of monitoring video image;
s34, carrying out target detection on the pedestrian in the image to obtain a target frame coordinate;
s35, carrying out target tracking on the detected target frame and giving the corresponding personnel position coordinates;
s36, calculating the space position coordinates of the personnel according to the calibration parameters of the monitoring camera;
specifically, the position calculation flow is as follows:
1) target recognition tracking
Multiple targets in the video stream are detected and tracked based on the SORT (Simple Online and Realtime Tracking) algorithm, and the ID of each target is displayed. The algorithm uses a powerful CNN detector, YOLOv3, to detect the targets, and then tracks the detections with a Kalman filter and the Hungarian algorithm. It achieves accurate multi-person tracking while meeting the real-time requirement.
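For illustration, the data-association core of SORT can be sketched as IoU matching solved with the Hungarian algorithm (scipy); the Kalman-filter motion model and the YOLOv3 detector are omitted here, so this is a deliberate simplification of the algorithm named above:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def associate(tracks, detections, iou_thresh=0.3):
    """Hungarian assignment of detections to existing track boxes by IoU."""
    if not tracks or not detections:
        return [], list(range(len(tracks))), list(range(len(detections)))
    cost = np.array([[1.0 - iou(t, d) for d in detections] for t in tracks])
    rows, cols = linear_sum_assignment(cost)
    matches = [(r, c) for r, c in zip(rows, cols) if 1.0 - cost[r, c] >= iou_thresh]
    matched_t = {r for r, _ in matches}
    matched_d = {c for _, c in matches}
    unmatched_t = [i for i in range(len(tracks)) if i not in matched_t]
    unmatched_d = [j for j in range(len(detections)) if j not in matched_d]
    return matches, unmatched_t, unmatched_d
```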
2) Pedestrian position calculation
The world coordinate system is the coordinate system of the point cloud map obtained by three-dimensional reconstruction, i.e., the optical centre position of the first frame is taken as the origin and the Z axis points opposite to the direction of gravity. The transformation matrix $T_{mw}$ from the world coordinate system to the coordinate system of the monitoring camera consists of a $3 \times 3$ rotation matrix $R$ and a $3 \times 1$ translation vector $t$. The height $h$ of the ground in the world coordinate system is obtained by actual measurement, and the intrinsic matrix $K$ of the camera is calibrated with the calibration board.
According to the projection equation (20), the coordinates $X_w$, $Y_w$ of the pedestrian on the ground can be solved:

$$s \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = K\left( R \begin{bmatrix} X_w \\ Y_w \\ h \end{bmatrix} + t \right) \tag{20}$$
the specific solving process is as follows:
According to the pose transformation matrix of the monitoring camera, the camera model optical centre of the monitoring camera, $P_{ow}=(X_{ow},Y_{ow},Z_{ow})$, is obtained:

$$P_{ow}=-R^{T}t \tag{21}$$

The midpoint $M(u,v)$ of the lower edge of the target frame of the person to be positioned is taken, the spatial position of point $M$ on the normalisation plane is calculated according to the projection equation with the depth $d$ set to 1 m, and the spatial coordinate $P_{m}=(X_{m},Y_{m},Z_{m})$ of point $M$ on the normalisation plane is solved according to formula (22):

$$P_{m}=R^{T}\left(d\,K^{-1}(u,v,1)^{T}-t\right),\qquad d=1 \tag{22}$$

According to the height $h$ of the ground in the world coordinate system, the equation of the ground plane is written as formula (23) and converted into a point-on-plane and normal-vector representation (24):

$$z=h \tag{23}$$

$$P_{0}=(0,0,h),\qquad \vec{n}=(0,0,1) \tag{24}$$

The ray from the optical centre through $P_{m}$ is written as the parametric equation (25), where $\vec{d}=P_{m}-P_{ow}$ is the direction vector of the ray and $s$ is a parameter, $s\in[0,\infty)$:

$$P(s)=P_{ow}+s\,\vec{d} \tag{25}$$

Substituting the ray into the plane equation and rearranging gives formula (26):

$$\left(P_{ow}+s\,\vec{d}-P_{0}\right)\cdot\vec{n}=0 \tag{26}$$

Applying the distributive law of the vector dot product yields:

$$s=\frac{(P_{0}-P_{ow})\cdot\vec{n}}{\vec{d}\cdot\vec{n}}$$

from which the intersection point $P_{g}=P_{ow}+s\,\vec{d}$ can be solved; the coordinates of the intersection point $P_{g}$ are the three-dimensional coordinates of the pedestrian in the world coordinate system.
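The whole solving process of formulas (21)-(26) maps to a few lines of numpy. The sketch below assumes a (x1, y1, x2, y2) target box, a world-to-camera transform T_mw and a z-up world frame; the names are illustrative:

```python
import numpy as np

def pedestrian_world_position(box, K, T_mw, ground_height):
    """3D position of a pedestrian from the midpoint of the bottom edge of the
    detection box, by intersecting the viewing ray with the ground plane."""
    R, t = T_mw[:3, :3], T_mw[:3, 3]
    # Optical centre of the camera in world coordinates: P_ow = -R^T t  (21)
    P_ow = -R.T @ t
    # Midpoint of the bottom edge of the target box
    u, v = (box[0] + box[2]) / 2.0, box[3]
    # Point on the normalisation plane (depth 1 m) expressed in the world frame  (22)
    P_m = R.T @ (np.linalg.inv(K) @ np.array([u, v, 1.0]) - t)
    ray_dir = P_m - P_ow
    # Ground plane as point + normal: P_0 = (0, 0, h), n = (0, 0, 1)  (23)-(24)
    n = np.array([0.0, 0.0, 1.0])
    P_0 = np.array([0.0, 0.0, ground_height])
    # Ray/plane intersection parameter  (25)-(26)
    s = np.dot(P_0 - P_ow, n) / np.dot(ray_dir, n)
    return P_ow + s * ray_dir        # intersection point P_g on the ground
```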
S37, if the positioning is not terminated, acquiring the next frame image and returning to the step S33, otherwise, ending.
S4 visual display: and displaying the three-dimensional point cloud map of the scene in the step S1 and the positions of the characters in the three-dimensional point cloud map in the step S3, and providing a GUI interface for user interaction and supervision.
The system designed and realized by the invention comprises a 3D display interface based on the Three.js graphics library that can display the three-dimensional point cloud, the camera models and the trajectories; its functional logic is as follows.
1) Three-dimensional point cloud loading
The invention can load a plurality of three-dimensional point cloud PCD files positioned at a server and correctly display the files in a 3D display interface, and the functional logic of the invention is as follows:
(1) searching all the PCD files under the specified directory to obtain the PCD file names;
(2) reading each PCD file with the FileLoader provided by JavaScript according to the file name to obtain the three-dimensional point cloud data, such as the number of points, the point coordinate set and the colour set;
(3) remapping the colour range of the point cloud colour set to the colour range supported by the Three.js graphics library;
(4) correspondingly filling the three-dimensional point cloud data into a graphic class object provided by three.Js by combining a JavaScript typed array and the data class of three.Js; the typed array is shown in FIG. 7.
(5) Each created graphic class object is named by a file name and is added to a display interface;
(6) and according to system setting, only displaying the default checked point cloud, and reserving other point clouds for standby.
So far the functional logic of loading the three-dimensional point cloud in the system has been fully sorted out; fig. 8 is a three-dimensional point cloud loading sample.
2) Camera model loading
The invention can add camera models with the correct orientation at the correct positions in the 3D display interface according to the camera calibration parameters stored on the server, so that the overall layout is clear at a glance; the functional logic is as follows:
(1) reading the specified parameter xml file to obtain the number of cameras, their names and their transformation matrices;
(2) reading the obj file of the camera model with a JavaScript FileLoader and creating a Three.js graphics class object for each camera;
(3) adjusting the world coordinates and orientation of each graphics object according to its transformation matrix, and naming it;
(4) adding each adjusted graphics class object to the display interface.
This completes the functional logic of camera model loading in the system; fig. 9 shows a loaded camera model example, with the point cloud of fig. 8 as the background.
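A possible sketch of steps (1)–(4), again in TypeScript with Three.js, is shown below; the xml parsing is omitted, and the obj path, the row-major 4 × 4 pose layout and the function names are assumptions made for illustration.

```typescript
import * as THREE from 'three';
import { OBJLoader } from 'three/examples/jsm/loaders/OBJLoader.js';

// Sketch: place one camera model per calibrated camera using its 4x4 pose matrix.
// 'poses' would come from the parameter xml file on the server (parsing omitted here).
async function loadCameraModels(
  scene: THREE.Scene,
  modelUrl: string,                               // e.g. '/models/camera.obj' (illustrative)
  poses: { name: string; matrix: number[] }[]     // assumed row-major 4x4 world pose per camera
): Promise<void> {
  const template = await new OBJLoader().loadAsync(modelUrl);
  for (const { name, matrix } of poses) {
    const cam = template.clone(true);                             // one graphics object per camera
    const m = new THREE.Matrix4().fromArray(matrix).transpose();  // fromArray expects column-major
    cam.applyMatrix4(m);                                          // step (3): world position + orientation
    cam.name = name;                                              // step (3): naming
    scene.add(cam);                                               // step (4): add to the display interface
  }
}
```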
The displayed content comprises a plurality of three-dimensional point clouds together with their corresponding cameras and trajectories; when one point cloud is selected for display, the other point clouds and their corresponding cameras and trajectories are hidden.
3) Trajectory information transmission and rendering optimization
Socket messages can be received from the server, and the transmission and drawing of trajectory information are implemented on this basis; the problem that the line thickness of the Web graphics library cannot currently be adjusted is also solved. The functional flow is shown in FIG. 10, and the functional logic is as follows:
(1) receiving Socket messages and obtaining trajectory-related information in real time, including the trajectory ID, the operation type and the trajectory data;
(2) if the operation type is trajectory update: first search for the corresponding ID in the existing trajectory library; if it is not found, build a Three.js line class object, fill it with the corresponding trajectory data and add it to the display interface; if it is found, update the existing Three.js line class object with the newly obtained trajectory data and redraw it in the display interface;
(3) if the operation type is trajectory deletion, search for the corresponding ID in the existing trajectory library and delete the Three.js line class object corresponding to that ID from the display interface.
Flicker problem during trajectory update: Three.js causes the trajectory to flicker constantly while it is being drawn, because the old line is deleted and redrawn on every update. Instead, the old trajectory is kept while the newly obtained trajectory information is processed, and is only deleted from the display interface after the new trajectory has been drawn completely over it. This both preserves the display quality of the trajectory and reduces the memory footprint of the system.
Thickness problem of the trajectory: the line thickness cannot be adjusted in Three.js, which makes the trajectory hard to identify in the display interface. A method of partially overlapping copies of a single trajectory is therefore used: after the coordinate data of the Three.js line object has been updated, the object is copied several times when it is drawn in the display interface and gradually translated in different directions, which achieves the effect of a thickened trajectory and passes the performance test.
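The update-then-delete strategy and the copy-and-offset thickening described above could be sketched as follows (TypeScript with Three.js; the number of copies and the offset size are illustrative choices, not values from the text).

```typescript
import * as THREE from 'three';

const trackLib = new Map<string, THREE.Group>();   // existing trajectory library, keyed by track ID

// Draw (or redraw) one trajectory as several slightly offset copies of the same line,
// which emulates a thicker line since LineBasicMaterial line width is ignored by WebGL.
function upsertTrack(scene: THREE.Scene, id: string, coords: [number, number, number][]): void {
  const points = coords.map(([x, y, z]) => new THREE.Vector3(x, y, z));
  const geometry = new THREE.BufferGeometry().setFromPoints(points);
  const material = new THREE.LineBasicMaterial({ color: 0xff3333 });

  const group = new THREE.Group();
  group.name = id;
  const offsets: [number, number, number][] = [
    [0, 0, 0], [0.01, 0, 0], [-0.01, 0, 0], [0, 0.01, 0], [0, -0.01, 0],
  ];
  for (const [dx, dy, dz] of offsets) {
    const line = new THREE.Line(geometry, material);
    line.position.set(dx, dy, dz);                  // gradual translation of each copy
    group.add(line);
  }

  scene.add(group);                                 // draw the new trajectory first...
  const old = trackLib.get(id);
  if (old) scene.remove(old);                       // ...then remove the old one, avoiding flicker
  trackLib.set(id, group);
}

// Trajectory deletion: look up the ID and remove its line objects from the display interface.
function deleteTrack(scene: THREE.Scene, id: string): void {
  const old = trackLib.get(id);
  if (old) { scene.remove(old); trackLib.delete(id); }
}
```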
4) Object binding and view operations
As described above, the 3D display interface of the present invention can display a plurality of point clouds with their corresponding cameras and trajectories, and ensures that when one point cloud is selected for display, the other point clouds and their corresponding cameras and trajectories are hidden. The Three.js graphics library provides an object binding operation that associates each camera and trajectory with its designated point cloud by ID, thereby realizing point cloud scene switching.
In addition, the 3D display interface of the invention implements translation, rotation and scaling of the viewpoint based on the event listener mechanism of the Web and JavaScript, and adapts to multiple terminal types.
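One way to realise this binding and these view operations is sketched below with Three.js groups and OrbitControls; the grouping by scene ID and all names are illustrative assumptions rather than the patent's concrete implementation.

```typescript
import * as THREE from 'three';
import { OrbitControls } from 'three/examples/jsm/controls/OrbitControls.js';

// Bind a point cloud, its cameras and its trajectories into one group per scene ID,
// so that switching the point cloud scene is a single visibility toggle.
const scenes = new Map<string, THREE.Group>();

function bindScene(root: THREE.Scene, id: string, objects: THREE.Object3D[]): void {
  const group = new THREE.Group();
  group.name = id;
  objects.forEach(o => group.add(o));
  root.add(group);
  scenes.set(id, group);
}

function showScene(id: string): void {
  for (const [key, group] of scenes) group.visible = key === id;   // hide all other scenes
}

// View operations: translation (pan), rotation and scaling (zoom) of the viewpoint come
// from pointer/wheel event listeners, here wrapped by OrbitControls.
function attachViewControls(camera: THREE.PerspectiveCamera, canvas: HTMLCanvasElement): OrbitControls {
  const controls = new OrbitControls(camera, canvas);
  controls.enablePan = true;
  controls.enableRotate = true;
  controls.enableZoom = true;
  return controls;
}
```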
Regarding the GUI system interface, the overall system architecture is shown in fig. 11.
The system adopts a front-end/back-end separation architecture. The front end uses the Vue development framework to build the visual display platform, providing visualization functions such as the 3D point cloud, pedestrian trajectories and real-time monitoring.
The back end is implemented on the Spring Boot development framework and provides computation and storage functions for the front-end visualization, such as monitoring management, pedestrian counting, pedestrian trajectory management and log management; the front end and back end communicate interactively via HTTP and WebSocket. The functions are described as follows:
Monitoring management: manages the surveillance cameras connected to the system, mainly including monitoring configuration management and state management.
Pedestrian trajectory: obtains pedestrian trajectory coordinates in real time with the pedestrian positioning and tracking algorithm, and realizes trajectory visualization and persistent storage of historical trajectories.
Pedestrian counting: computes the number of pedestrians from the number of trajectories in the current scene and displays it on the visual interface in real time.
Log management: records all behaviors generated by the system and provides a query function, improving system security.
The back end of the system uses a MySQL database for data persistence, providing efficient data reading and writing, and integrates the pedestrian positioning and tracking algorithm into the system through a Kafka message queue; the specific steps are as follows:
(1) the pedestrian positioning and tracking algorithm analyzes the video frames and sends the pedestrian coordinates in the three-dimensional coordinate system to a Kafka topic named slam; the number of partitions is set to 1 to guarantee message ordering;
(2) the system back end listens to Kafka in real time, reads the pedestrian trajectory coordinates computed by the pedestrian positioning and tracking algorithm, persists them to the MySQL database on the one hand, and pushes them to the front end through WebSocket on the other;
(3) each time the front end receives a pedestrian trajectory coordinate pushed by the back end, it draws a trajectory point in the 3D point cloud view at that coordinate; once a certain number of trajectory points have accumulated, a clearly visible pedestrian trajectory is formed.
Integrating the algorithm through Kafka improves the extensibility of the system: algorithm functions can be extended flexibly, and pluggable function integration is provided.
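For illustration only, the data flow of steps (1)–(3) is sketched below in TypeScript using the kafkajs and ws packages; the back end described here is actually built with Spring Boot, so this sketch is merely a stand-in, and the broker address, WebSocket port and message format are assumptions.

```typescript
import { Kafka } from 'kafkajs';
import { WebSocketServer, WebSocket } from 'ws';

// Sketch of the back-end bridge: consume pedestrian coordinates from the 'slam' topic
// (single partition, so ordering is preserved) and push them to the front end over WebSocket.
// Persistence to MySQL is indicated only by a placeholder comment.
async function runBridge(): Promise<void> {
  const kafka = new Kafka({ clientId: 'track-bridge', brokers: ['localhost:9092'] });
  const consumer = kafka.consumer({ groupId: 'track-consumers' });
  const wss = new WebSocketServer({ port: 8081 });          // front end connects here (assumed port)

  await consumer.connect();
  await consumer.subscribe({ topic: 'slam', fromBeginning: false });

  await consumer.run({
    eachMessage: async ({ message }) => {
      const coord = message.value?.toString();               // e.g. '{"id":3,"x":1.2,"y":0.4,"z":0.0}'
      if (!coord) return;
      // 1) persist the trajectory coordinate (MySQL write omitted in this sketch)
      // 2) push it to every connected front-end client
      for (const client of wss.clients) {
        if (client.readyState === WebSocket.OPEN) client.send(coord);
      }
    },
  });
}

runBridge().catch(console.error);
```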
The front-end interface of the visual video monitoring system of the invention is shown in fig. 13 and comprises the 3D display interface, video monitoring, an abnormal-item monitoring switch, a trajectory list, an abnormal-event list and so on. Floors can be switched in the upper right corner; after switching, the point cloud, cameras and trajectories on the left and the video monitoring view on the right are updated synchronously. Functions such as target tracking and flame alarm can be selectively enabled in the lower left corner.
When the video monitoring area on the right detects pedestrians and computes their trajectories, the trajectories are updated and drawn synchronously on the left, and the corresponding entries are updated in the trajectory list area below, as shown in fig. 12; the trajectories are clearly visible, and the position and orientation of each camera are shown with a camera model.
When flame or smoke is detected in the video monitoring, an alarm prompt pops up so that the user can notice the abnormal condition in time, and a corresponding entry is updated in the abnormal-event list in the lower right corner, as shown in fig. 14. The whole system runs smoothly and meets the performance requirements.
It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
All the embodiments in the present specification are described in a related manner, and the same and similar parts among the embodiments may be referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, apparatus embodiments, electronic device embodiments, computer-readable storage medium embodiments, and computer program product embodiments are described with relative simplicity as they are substantially similar to method embodiments, where relevant only as described in portions of the method embodiments.
The above-mentioned embodiments are only specific embodiments of the present application, used to illustrate the technical solutions of the present application rather than to limit them, and the protection scope of the present application is not limited thereto. Although the present application has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that modifications or easily conceived changes to the technical solutions described in the foregoing embodiments, or equivalent replacements of some of their technical features, may still be made within the technical scope disclosed herein; such modifications, changes or substitutions do not depart from the spirit and scope of the technical solutions of the embodiments and are intended to be covered by the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.
Claims (7)
1. A monitoring video personnel positioning and tracking method based on visual slam is characterized in that a depth camera is used for carrying out three-dimensional reconstruction on an environment to obtain a point cloud map of the environment; combining external reference calibration with a scene of the monitoring camera, and performing external reference calibration on the monitoring camera through a calibration plate to obtain position and posture information of the monitoring camera; the method comprises the steps of tracking personnel through a monitoring camera, identifying people in a monitored image by utilizing a deep neural network, calculating the three-dimensional positions of the people according to the ground priors of the pedestrians appearing in the monitoring by adopting the principle of inverse perspective transformation based on the previously calibrated positions and postures, drawing the tracks of the personnel in a constructed point cloud map, and enabling the tracks to be presented in the point cloud map.
2. The visual slam-based surveillance video personnel location tracking method of claim 1, comprising the steps of:
s1, three-dimensional reconstruction based on visual slam: constructing a three-dimensional point cloud map of a scene, recording data required by external parameter calibration, and constructing the three-dimensional point cloud map of the scene according to the visual odometer and storing the three-dimensional point cloud map of the scene into a file by using RGBD picture stream and inertial sensor data provided by the depth camera; shooting a chessboard pattern calibration plate and recording the position and the posture of a camera at the moment in the three-dimensional reconstruction process;
s2, calibration: the monitoring camera shoots a monitoring picture of the chessboard pattern calibration plate, and the position and the posture of the monitoring camera in the three-dimensional point cloud map are calculated by combining the calibration data provided in the step S1 during three-dimensional reconstruction;
s3, tracking and calculating the position of the pedestrian in the monitoring camera: tracking people, identifying the positions of people in the images and providing the positions of the people in the three-dimensional point cloud map according to the position and posture data obtained in the step S2 and a monitoring video stream provided by a monitoring camera;
the pedestrian position tracking and calculating process in the monitoring camera comprises the following steps:
s31, selecting whether to enter the attitude correction process according to the type of the monitoring camera and whether the monitoring camera moves, if so, entering the step S32, otherwise, entering the step S33;
s32, if entering the posture correction process, extracting a vanishing point in the image, comparing it with the previously recorded vanishing point position, judging whether the monitoring picture has rotated according to whether the vanishing point has moved, and updating the rotation;
s33, entering a positioning process, and firstly taking a frame of monitoring video image;
s34, carrying out target detection on the pedestrian in the image to obtain a target frame coordinate;
s35, carrying out target tracking on the detected target frame, and giving the corresponding personnel position coordinates;
s36, calculating the space position coordinates of the personnel according to the calibration parameters of the monitoring camera;
s37, if the positioning is not terminated, acquiring the next frame image and returning to the step S33, otherwise, ending.
3. The method for positioning and tracking people in surveillance videos based on visual slam according to claim 2, wherein a point cloud map processing thread is added in the process of constructing the three-dimensional point cloud map of the scene in step S1, and is used for receiving the position and orientation information of each frame of camera and the RGBD image frame and outputting an accurate point cloud map, and the specific process is as follows:
s141, screening the position and pose information of each frame of depth camera and an RGBD image frame, and selecting a current frame when the camera angle change between the current frame and a last selected frame is more than 10 degrees and the displacement change is more than 2 meters, and performing subsequent point cloud map generation operation;
s142, calculating a point cloud block of the current frame, and rotating the point cloud block to a uniform world coordinate system;
s143, splicing and merging the point cloud blocks generated by all the frames to obtain an integral point cloud map, and performing filtering and outlier removing treatment on the point cloud map to compress the data volume of the point cloud map and optimize the visual impression of the map;
and S144, when a loop occurs in the process of drawing construction, the ORB-SLAM3 re-optimizes the pose of the selected frame, re-splices the point clouds, and re-performs point cloud processing operation according to the step S143.
4. The visual slam-based surveillance video personnel positioning and tracking method according to claim 2, wherein the step S2 external reference calibration method is:
s21, shooting a standard chessboard grid calibration plate by a monitoring camera: selecting an origin position of a world coordinate system, slowly moving a monitoring camera to the direction of a checkerboard calibration board from the origin position, estimating the pose of the monitoring camera in real time by using ORB _ SLAM3 in the process, closing a program when the monitoring camera moves to the front of the checkerboard calibration board, and storing the current photo shot by the monitoring camera and the pose of the camera;
s22, calibrating internal reference of the monitoring camera: placing the chessboard pattern calibration board within the range of the monitoring camera, moving the chessboard pattern calibration board at multiple angles, recording a section of video, extracting frames from the video, identifying the chessboard pattern, and calibrating the internal parameters and distortion of the monitoring camera by using the Zhang Zhengyou calibration method;
s23, calibrating external parameters of the monitoring camera: and solving a camera coordinate system and a target coordinate system by adopting a direct linear transformation method according to the actual three-dimensional position information of the target feature point and the two-dimensional position of the target feature point in the image, so as to realize the calculation of the relative position relation between the monitoring camera and the three-dimensional point cloud map.
5. The method for locating and tracking people in surveillance video based on visual slam according to claim 2, wherein the calculation method of the spatial position coordinates of the people in step S36 is:
obtaining the camera optical center P_ow = (X_ow, Y_ow, Z_ow) of the monitoring camera according to the pose transformation matrix (R, t) of the monitoring camera:
P_ow = -R^T t    (21)
taking the midpoint M(u, v) of the lower edge of the target frame of the person to be positioned, calculating the spatial position of point M on the normalization plane according to the projection equation with the depth set to d = 1 m, and solving the spatial coordinate P_m = (X_m, Y_m, Z_m) of point M according to formula (22), where K is the intrinsic matrix of the monitoring camera:
P_m = R^T (d · K^{-1} [u, v, 1]^T - t)    (22)
writing the equation of the ground plane according to the height h of the ground in the world coordinate system as formula (23), and converting the ground plane into the point-and-normal-vector representation of formula (24):
z = h    (23)
n · (P - P_0) = 0,  with n = (0, 0, 1)^T and P_0 = (0, 0, h)^T    (24)
writing the ray from the optical center through M in the parametric form of formula (25), where d_r = P_m - P_ow is the direction vector of the ray and s ∈ [0, ∞) is the ray parameter:
P(s) = P_ow + s · d_r    (25)
substituting the ray into the plane equation and rearranging to obtain formula (26):
n · (P_ow + s · d_r - P_0) = 0    (26)
and applying the distributive law of the vector dot product to obtain s = n · (P_0 - P_ow) / (n · d_r), from which the intersection point P_g = P_ow + s · d_r is solved; the coordinates of the intersection point P_g are the three-dimensional coordinates of the pedestrian in the world coordinate system.
6. The visual slam-based surveillance video personnel location tracking method of claim 2, further comprising a step S4 of visually presenting: and displaying the three-dimensional point cloud map of the scene in the step S1 and the position of the character in the step S3 in the three-dimensional point cloud map, providing a GUI (graphical user interface) for user interaction and supervision, wherein specific displayed contents comprise a plurality of three-dimensional point clouds and corresponding cameras and tracks thereof, and when a certain point cloud is selected to be displayed, other point clouds and the corresponding cameras and tracks are hidden.
7. A visual slam-based surveillance video personnel location tracking system comprising the following modules to implement the method of any of claims 1-6:
the three-dimensional reconstruction module is used for constructing a three-dimensional point cloud map of a scene and recording data required by external parameter calibration, the whole using period is only operated once, and the RGBD picture stream and the inertial sensor data provided by the depth camera are used for constructing the three-dimensional point cloud map of the scene according to the visual odometer and are stored in a file for being displayed and used by the visual display module; in the three-dimensional reconstruction process, shooting a checkerboard calibration board and recording the position and the posture of a camera at the moment for a calibration module to use;
the calibration module is used for measuring the position and the posture of each monitoring camera in the three-dimensional point cloud map, the whole using period only runs once, the monitoring cameras shoot a monitoring picture photo of a chessboard grid calibration plate, and the position and the posture of the monitoring cameras in the three-dimensional point cloud map are calculated by combining calibration data provided during three-dimensional reconstruction and are used by the position calculation module;
the position calculation module is used for identifying the position of a person in the image and providing the position of the person in the three-dimensional point cloud map for the visual display module to display according to the position and posture data obtained by the calibration module and the monitoring video stream provided by the monitoring camera;
and the visual display module is used for displaying the three-dimensional point cloud map of the scene and the positions of the characters in the three-dimensional point cloud map, and providing a GUI (graphical user interface) for user interaction and supervision.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210669897.1A CN115035162A (en) | 2022-06-14 | 2022-06-14 | Monitoring video personnel positioning and tracking method and system based on visual slam |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210669897.1A CN115035162A (en) | 2022-06-14 | 2022-06-14 | Monitoring video personnel positioning and tracking method and system based on visual slam |
Publications (1)
Publication Number | Publication Date |
---|---|
CN115035162A true CN115035162A (en) | 2022-09-09 |
Family
ID=83125739
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210669897.1A Pending CN115035162A (en) | 2022-06-14 | 2022-06-14 | Monitoring video personnel positioning and tracking method and system based on visual slam |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115035162A (en) |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115375890A (en) * | 2022-10-25 | 2022-11-22 | 苏州千里雪智能科技有限公司 | Based on four mesh stereovision cameras governing system of 5G |
CN115861427A (en) * | 2023-02-06 | 2023-03-28 | 成都智元汇信息技术股份有限公司 | Indoor personnel dynamic positioning method and device based on image recognition and medium |
CN116931524A (en) * | 2023-07-25 | 2023-10-24 | 江苏猎人安防科技有限公司 | Intelligent monitoring system and process for building |
CN116931524B (en) * | 2023-07-25 | 2024-04-26 | 铯镨科技有限公司 | Intelligent monitoring system and process for building |
CN117058331B (en) * | 2023-10-13 | 2023-12-19 | 山东建筑大学 | Indoor personnel three-dimensional track reconstruction method and system based on single monitoring camera |
Legal Events
Date | Code | Title | Description |
---|---|---|---
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |