CN115546829A - Pedestrian spatial information sensing method and device based on ZED stereo camera - Google Patents

Pedestrian spatial information sensing method and device based on ZED stereo camera Download PDF

Info

Publication number
CN115546829A
CN115546829A (application CN202211187402.8A)
Authority
CN
China
Prior art keywords
pedestrian
dimensional
frame
camera
key point
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211187402.8A
Other languages
Chinese (zh)
Inventor
寄珊珊
李特
朱世强
孟启炜
王文
宛敏红
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang Lab
Original Assignee
Zhejiang Lab
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang Lab filed Critical Zhejiang Lab
Priority to CN202211187402.8A priority Critical patent/CN115546829A/en
Publication of CN115546829A publication Critical patent/CN115546829A/en
Pending legal-status Critical Current

Links

Images

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V 40/103 Static body considered as a whole, e.g. static pedestrian or occupant recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/40 Extraction of image or video features
    • G06V 10/62 Extraction of image or video features relating to a temporal dimension, e.g. time-based feature extraction; Pattern tracking

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Human Computer Interaction (AREA)
  • Image Processing (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a pedestrian spatial information sensing method and device based on a ZED stereo camera, mainly used by a guide robot in public scenes such as exhibition halls to intelligently sense the spatial position and moving speed of pedestrians. Real-time data in the scene are collected by a ZED binocular vision camera and uploaded to a cloud server. The preprocessed RGB data are input into a deployed human keypoint detection network to obtain two-dimensional human keypoint information, and a pedestrian bounding box is generated from the two-dimensional keypoints of the pedestrian's upper-body torso region. Multi-target pedestrians are tracked continuously over consecutive frames. The three-dimensional spatial coordinates of the body keypoints in the corresponding region are obtained by combining the point cloud data, and the spatial position and moving speed of each pedestrian are calculated. Finally, the navigation robot controls its body movement according to the acquired pedestrian spatial information and completes intelligent guide tasks such as autonomous following and obstacle avoidance, which increases the flexibility of the navigation robot and improves the interaction experience of visitors.

Description

Pedestrian spatial information sensing method and device based on ZED (zero-energy-dimension) stereo camera
Technical Field
The invention relates to the field of machine vision, in particular to a pedestrian spatial information perception method and device based on a ZED stereo camera.
Background
Employing intelligent guide robots in place of human interpreters in public places such as exhibition halls and museums effectively saves manpower. A navigation robot needs to sense its environment intelligently, and pedestrians, as dynamic targets in the scene, move with uncertainty, so intelligently sensing spatial information such as pedestrian position and moving speed is of great significance.
Vision is an important way for robots to obtain external information. A common approach first obtains a pedestrian detection box from the 2D image and then derives the pedestrian's spatial position from the coordinate-system transformation together with depth or point cloud information. This approach depends on the accuracy of the detection box. Pedestrians are articulated and their body posture changes constantly, so a traditional pedestrian bounding box contains a large amount of background noise when the body pose changes; the subsequent box-based 2D-to-3D conversion therefore carries a large error, which in turn degrades the estimation of pedestrian position and moving speed.
Disclosure of Invention
In order to solve the defects of the prior art and achieve the purpose of improving the accuracy of identifying the spatial position and the moving speed of the pedestrian, the invention adopts the following technical scheme:
a pedestrian spatial information perception method based on a stereo camera comprises the following steps:
step S1: acquiring real-time image data of a stereo camera, wherein the real-time image data comprises RGB image data and point cloud data;
step S2: detecting key points of a human body through RGB images to obtain two-dimensional key point information of a pedestrian, determining an upper half body trunk region of the pedestrian according to the dynamic characteristic of the pedestrian, generating a pedestrian surrounding frame by combining the two-dimensional key point information of the upper half body trunk region of the pedestrian, and taking the pedestrian surrounding frame as a pedestrian detection frame;
The generated pedestrian bounding box is expanded proportionally, and the expanded box is used as the pedestrian detection frame. Because the generated box is only the minimal bounding box of the pedestrian skeleton, it needs to be enlarged; the area ratio of the expanded box to the minimal box is 1.2.
and step S3: carrying out multi-target tracking on pedestrians under continuous multi-frame images according to the similarity characteristics of the two-dimensional key points and the pedestrian detection frame;
and step S4: and for the continuously tracked pedestrians, acquiring three-dimensional key point information of the pedestrians according to the two-dimensional key point information of the pedestrians in combination with the point cloud data, calculating the spatial position coordinates of the pedestrians relative to a three-dimensional camera coordinate system under the current frame, calculating the moving speed of the pedestrians in combination with the frame interval, and generating real-time spatial information of the pedestrians relative to the three-dimensional camera.
Further, in step S2, RGB image data are obtained and a human keypoint detection network performs forward inference, outputting keypoint heatmaps and part affinity fields. Two-dimensional keypoints are extracted from the heatmaps and part affinity fields and grouped, the keypoints belonging to the same pedestrian are matched to that pedestrian, and the two-dimensional keypoint coordinates of each pedestrian in the current image are obtained.
Further, the step S3 includes the steps of:
step S3.1: acquiring pedestrian motion characteristics according to the pedestrian detection frame; acquiring the appearance characteristics of the pedestrian according to the similarity characteristics of the two-dimensional key points;
step S3.2: acquiring actually measured state information of the pedestrian at the current time t according to the motion characteristics of the pedestrian and the appearance characteristics of the pedestrian;
step S3.3: performing data association between the historical tracks and the measured state information of pedestrians at time t to obtain the ID of each pedestrian at time t; the purpose of data association is to match the detections at the current moment with the historical tracks through appearance and geometric features, thereby determining the ID of each person detected in the current frame;
step S3.4: and updating the historical track through the ID of each pedestrian at the time t, so that the pedestrians are continuously tracked.
Further, the two-dimensional keypoint similarity feature in step S3.1 is computed with the object keypoint similarity (OKS) metric, and whether two sets of two-dimensional keypoints are associated is judged against a preset threshold.
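For illustration, a minimal OKS sketch follows; the Gaussian falloff and scale normalization follow the standard COCO definition, and the per-keypoint constants kappa and the visibility handling are assumptions rather than details given in the patent:

```python
import numpy as np

def oks(kpts_a, kpts_b, area, kappa, conf_thresh=0.0):
    """kpts_a, kpts_b: (K, 3) arrays of (x, y, confidence); area: box area."""
    visible = (kpts_a[:, 2] > conf_thresh) & (kpts_b[:, 2] > conf_thresh)
    if not visible.any():
        return 0.0
    d2 = np.sum((kpts_a[:, :2] - kpts_b[:, :2]) ** 2, axis=1)
    # Gaussian falloff normalized by object scale and per-keypoint constant.
    sim = np.exp(-d2 / (2.0 * area * kappa ** 2 + 1e-9))
    return float(sim[visible].mean())
```

Two keypoint sets would then count as associated when this score exceeds the preset threshold.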
Further, the data association in step S3.3 computes, for each frame, both a motion-feature association and an appearance-feature association, linearly weights them into a final association matrix, and applies the Hungarian matching algorithm to the matrix to obtain the inter-frame pedestrian matching result.
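A hedged sketch of that association step, assuming the motion and appearance cost matrices have already been computed; the weight lam and the gating threshold are illustrative values:

```python
from scipy.optimize import linear_sum_assignment

def associate(motion_cost, appearance_cost, lam=0.5, gate=0.7):
    """Costs: (num_tracks, num_detections) arrays; returns matched index pairs."""
    cost = lam * motion_cost + (1.0 - lam) * appearance_cost
    rows, cols = linear_sum_assignment(cost)  # Hungarian matching
    # Discard pairs whose combined cost exceeds the gate.
    return [(r, c) for r, c in zip(rows, cols) if cost[r, c] <= gate]
```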
Further, the step S4 includes the steps of:
step S4.1: screening by combining the confidence coefficient of the two-dimensional key points according to the trunk region of the upper half of the pedestrian, and acquiring point cloud data according to the screened two-dimensional key points to obtain a three-dimensional coordinate set of the pedestrian key points;
step S4.2: for each pedestrian target, calculating a three-dimensional coordinate mean value as a space position coordinate of the pedestrian target according to the three-dimensional coordinate set of the key point of the pedestrian target; and calculating the actual distance of the pedestrian relative to the stereo camera according to the Euclidean distance, calculating the moving distance of the pedestrian relative to the stereo camera under the time interval of the current moment and the previous moment according to the spatial position coordinate of the pedestrian at the current moment and the spatial position coordinate of the pedestrian at the previous moment, and combining the used time to obtain the moving speed of the pedestrian at the current moment.
Further, the pedestrian moving speed formula in step S4.2 is as follows:

$v_t^i = \dfrac{\sqrt{(X_t^i - X_{t-m}^i)^2 + (Y_t^i - Y_{t-m}^i)^2 + (Z_t^i - Z_{t-m}^i)^2}}{\Delta t}, \qquad \Delta t = \dfrac{m}{f}$

where X, Y and Z are the three-dimensional spatial position coordinates of the pedestrian, i is the ID of the currently tracked pedestrian, t is the time of the current frame, $(X_t^i, Y_t^i, Z_t^i)$ are the spatial position coordinates of the pedestrian relative to the stereo camera under the current frame, $(X_{t-m}^i, Y_{t-m}^i, Z_{t-m}^i)$ are the corresponding coordinates at time t-m, m is the number of frame intervals, and f is the stereo camera frame rate.
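Read literally, the formula reduces to a few lines of arithmetic. The sketch below is a minimal Python rendering of the position and speed computation of steps S4.1-S4.2; the 30 Hz default frame rate is taken from the embodiment, and the function names are illustrative:

```python
import numpy as np

def pedestrian_position(kpts_3d):
    """Mean of the upper-body 3D keypoints, (N, 3) -> (3,), camera frame."""
    return kpts_3d.mean(axis=0)

def camera_distance(pos):
    """Euclidean distance of the pedestrian from the stereo camera."""
    return float(np.linalg.norm(pos))

def pedestrian_speed(pos_t, pos_t_minus_m, m, f=30.0):
    """Displacement between frames t-m and t over the elapsed time m/f."""
    dt = m / f  # seconds spanned by m frame intervals
    return float(np.linalg.norm(pos_t - pos_t_minus_m)) / dt
```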
A pedestrian spatial information sensing device based on a stereo camera comprises a real-time image data acquisition module, a pedestrian detection frame acquisition module, a multi-target tracking module and a real-time spatial information generation module;
the real-time image data acquisition module acquires RGB image data and point cloud data through a stereo camera;
the pedestrian detection frame acquisition module is used for detecting key points of a human body through RGB images to obtain two-dimensional key point information of a pedestrian, determining a pedestrian upper half body trunk area according to the dynamic characteristics of the pedestrian, generating a pedestrian enclosure frame by combining the two-dimensional key point information of the pedestrian upper half body trunk area, and taking the pedestrian enclosure frame as a pedestrian detection frame;
the multi-target tracking module carries out multi-target tracking on pedestrians under continuous multi-frame images according to the similarity characteristics of the two-dimensional key points and the pedestrian detection frame;
the real-time spatial information generation module acquires three-dimensional key point information of the pedestrian according to the two-dimensional key point information of the continuously tracked pedestrian and the point cloud data, calculates spatial position coordinates of the pedestrian relative to a coordinate system of the stereo camera under the current frame, calculates the moving speed of the pedestrian according to frame intervals, and generates real-time spatial information of the pedestrian relative to the stereo camera.
A pedestrian spatial information perception method based on a ZED camera: real-time image data are collected by a ZED binocular vision camera mounted on a navigation robot and transmitted to a cloud server, which obtains the pedestrian detection frame, tracks multiple targets and generates real-time spatial information. The RGB images are preprocessed (resized and encoded) to improve subsequent transmission efficiency, and the preprocessed data are transmitted to the cloud server through message middleware. The field of view of the ZED binocular vision camera is matched with the pedestrian's upper-body torso region, which is determined according to the dynamic characteristics of pedestrians and the field of view of the ZED camera on the navigation robot. The cloud server transmits the real-time spatial information of pedestrians relative to the ZED camera back to the navigation robot, and the robot controls its body movement according to this information to complete the guide task.
A pedestrian spatial information sensing device based on a ZED camera comprises a cloud server and the ZED binocular vision camera arranged on a navigation robot, wherein the cloud server comprises a pedestrian detection frame acquisition module, a multi-target tracking module and a real-time spatial information generation module;
the ZED binocular vision camera acquires RGB image data and point cloud data in real time and transmits the RGB image data and the point cloud data to the cloud server;
the pedestrian detection frame acquisition module detects human body key points through RGB images to obtain two-dimensional key point information of pedestrians, determines a pedestrian upper half body trunk region according to dynamic characteristics of the pedestrians, generates a pedestrian enclosure frame by combining the two-dimensional key point information of the pedestrian upper half body trunk region, and takes the pedestrian enclosure frame as a pedestrian detection frame; the vision range of the ZED binocular vision camera is matched with the trunk area of the upper half of the pedestrian, and the trunk area of the upper half of the pedestrian is determined according to the dynamic characteristic of the pedestrian and the vision range of the ZED binocular vision camera on the navigation robot;
the multi-target tracking module carries out multi-target tracking on pedestrians under continuous multi-frame images according to the two-dimensional key point similarity characteristics and the pedestrian detection frame;
the real-time spatial information generation module is used for acquiring three-dimensional key point information of the pedestrian according to the two-dimensional key point information of the continuously tracked pedestrian by combining point cloud data, calculating the spatial position coordinate of the pedestrian relative to a three-dimensional camera coordinate system under the current frame, and calculating the moving speed of the pedestrian by combining frame intervals to generate real-time spatial information of the pedestrian relative to the three-dimensional camera;
and the navigation robot acquires real-time space information of the pedestrian relative to the ZED binocular vision camera from the cloud server, performs mobile control on the body and completes navigation tasks.
The invention has the advantages and beneficial effects that:
the invention adopts a ZED binocular vision camera which is installed at the head of the navigation robot in a non-contact way to intelligently sense pedestrians in a scene; the light-weight human body key point detection network deployed by the cloud is adopted to obtain two-dimensional key point information, and powerful computing resources and storage resources of a cloud server can be fully utilized, so that the problem of insufficient computing power of a robot body is effectively solved; the method comprises the steps of considering human motion characteristics and the visual field range of a navigation robot, determining key points of a trunk region of the upper body of a human body as a target region, wherein the region does not contain human body key points with large changes such as arms, a pedestrian detection frame generated according to the region is more stable, and the accuracy of a pedestrian space position and a pedestrian moving speed calculated according to the three-dimensional coordinates of the pedestrian key points of the region is higher.
Drawings
Fig. 1 is a flow chart of a pedestrian spatial information perception method based on stereo camera vision according to the invention.
Fig. 2 is a schematic diagram of a pedestrian spatial information perception scenarized application facing a navigation robot based on the ZED vision in the embodiment of the present invention.
FIG. 3a is a diagram of human body key point parameters in an embodiment of the present invention.
Fig. 3b is a schematic diagram of generating a pedestrian enclosure frame based on the two-dimensional key points of the upper half body of the pedestrian in the embodiment of the present invention.
FIG. 3c is a schematic diagram of three-dimensional key points in an embodiment of the invention.
FIG. 4 is a flow chart of visualization of pedestrian spatial information perception based on ZED vision in an actual measurement scene in the embodiment of the invention.
FIG. 5 is a comparison graph of pedestrian distance measurement errors based on human body key points and a traditional human body detection frame in the embodiment of the invention.
FIG. 6 is a flow chart of a pedestrian spatial information perception method based on the ZED camera.
Fig. 7 is a schematic structural diagram of a pedestrian spatial information perception device based on a stereo camera in the embodiment of the invention.
Detailed Description
The following detailed description of embodiments of the invention refers to the accompanying drawings. It should be understood that the detailed description and specific examples, while indicating the preferred embodiment of the invention, are given by way of illustration and explanation only, not limitation.
As shown in fig. 1, the method for sensing spatial information of a pedestrian based on stereo camera vision includes the following steps:
step S1: acquiring real-time image data of a stereo camera, wherein the real-time image data comprises RGB image data and point cloud data;
in the embodiment of the invention, as shown in fig. 2, the ZED binocular vision camera is installed on the head of the wheeled robot, and is about 1.2 meters away from the ground level. RGB data and point cloud data of a ZED camera (left eye) are acquired, the frame rate is 30Hz, and the resolution of an RGB color image is 1280 multiplied by 720. Preprocessing the RGB image, including resize and encoding to improve the subsequent data transmission efficiency, wherein the size of the resize is 456 x 256 and is consistent with the input of a subsequent human body key point detection network, and the preprocessed data are transmitted to a cloud server through message middleware.
Step S2: detecting key points of a human body through RGB images to obtain two-dimensional key point information of a pedestrian, determining an upper half body trunk region of the pedestrian according to the dynamic characteristic of the pedestrian, generating a pedestrian surrounding frame by combining the two-dimensional key point information of the upper half body trunk region of the pedestrian, and taking the pedestrian surrounding frame as a pedestrian detection frame;
the method comprises the steps of obtaining RGB image data, adopting a human body key point detection network to conduct forward reasoning, outputting key point thermodynamic diagrams and partial association domains, extracting two-dimensional key points according to the key point thermodynamic diagrams and the partial association domains, grouping, matching the two-dimensional key points belonging to the same pedestrian to the current pedestrian, and obtaining the two-dimensional key point coordinates of each pedestrian in the current image.
In the embodiment of the invention, the cloud server receives the real-time data acquired from the ZED camera at the robot end and feeds it into the deployed human keypoint detection network for forward inference. The network is Lightweight OpenPose with an improved MobileNet backbone, which meets the real-time requirement; this framework yields the two-dimensional keypoint information of pedestrians in the image. The network input is the decoded RGB data with size [1, 3, 256, 456]; the forward pass outputs keypoint heatmaps (Heatmaps) and part affinity fields (PAFs) with sizes [1, 19, 32, 57] and [1, 38, 32, 57]. All keypoints are extracted from the Heatmaps and PAFs and grouped; the keypoints belonging to the same pedestrian are matched to that pedestrian, giving the keypoint coordinates of each pedestrian in the current image. Assuming the current image contains N detected pedestrians, the output is N×[18, 3], where 18 is the number of keypoints per pedestrian and 3 holds the horizontal axis x, the vertical axis y and the keypoint confidence in the image coordinate system, with the confidence ranging from 0 to 1.
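To make the decoding concrete, here is a hedged sketch of extracting candidate keypoints from a single heatmap channel; the 0.1 peak threshold is an assumption, and the stride of 8 is inferred from the quoted shapes (456/57 = 256/32 = 8). Grouping candidates into persons via the PAFs is omitted:

```python
def heatmap_peaks(hmap, thresh=0.1, stride=8):
    """hmap: (H, W) single-keypoint heatmap; returns [(x_px, y_px, score)]."""
    peaks = []
    for y in range(1, hmap.shape[0] - 1):
        for x in range(1, hmap.shape[1] - 1):
            v = hmap[y, x]
            # Local maximum of the 3x3 neighborhood above the threshold.
            if v > thresh and v == hmap[y-1:y+2, x-1:x+2].max():
                peaks.append((x * stride, y * stride, float(v)))
    return peaks
```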
the pedestrian has the characteristic of posture change, particularly, the four limbs of the human body are greatly changed along with the motion of the human body, and when the human body is close to the robot, the vision field of the robot can only see the upper half part of the human body, the influence of the two factors is comprehensively considered, and the trunk area of the upper half body of the pedestrian is determined to be the human body target area according to the dynamic characteristic of the pedestrian and the vision field range of the navigation robot; the key points of the human body contained in the region are as follows: {0, neck, 2; generating a smallest bounding box by adopting a bounngrake function of OpenCV according to the two-dimensional key points of the area, as shown by P1 in a 2D key point schematic diagram in fig. 3b, and using a dotted line table; considering that the bounding box is only the smallest bounding box of the pedestrian framework and needs to be expanded appropriately, the expanded bounding box is shown as P2 and is represented by a solid line, the area ratio of P2 to P1 is 1.2, and the bounding box is used as a pedestrian detection box in the subsequent pedestrian tracking algorithm.
And step S3: according to the similarity characteristics of the two-dimensional key points and the pedestrian detection frame, the method carries out multi-target tracking on pedestrians under continuous multi-frame images and comprises the following steps:
step S3.1: acquiring the motion features of pedestrians from the pedestrian detection frame, and acquiring the appearance features of pedestrians from the two-dimensional keypoint similarity features; the similarity is computed with the object keypoint similarity (OKS) metric, and a preset threshold decides whether two sets of two-dimensional keypoints are associated;
step S3.2: acquiring actual measurement state information of the pedestrian at the current time t according to the motion characteristics of the pedestrian and the appearance characteristics of the pedestrian;
step S3.3: performing data association between the historical tracks and the measured state information of pedestrians at time t to obtain the ID of each pedestrian at time t; the data association computes, for each frame, both a motion-feature association and an appearance-feature association, linearly weights them into a final association matrix, and applies the Hungarian matching algorithm to the matrix to obtain the inter-frame pedestrian matching result;
step S3.4: and updating the historical track by the ID of each pedestrian at the time t.
In the embodiment of the invention, the pedestrian targets in each frame of RGB image transmitted to the cloud are tracked continuously across frames by assigning each target a unique identity (ID) through a multi-target tracking method based on the DeepSORT algorithm. Pedestrian motion features are obtained from the pedestrian detection frame generated in step S2, and appearance features are obtained from the two-dimensional keypoint similarity features, computed with the object keypoint similarity (OKS); a set threshold decides whether an association succeeds. The two kinds of feature information are fused to obtain the measured state of each pedestrian at the current time t. The historical tracks are then associated with the state results at time t: the motion-feature and appearance-feature associations are linearly weighted into a final association matrix, and the Hungarian matching algorithm applied to this matrix gives the matching result, i.e., the ID of each pedestrian target at time t. Finally, the historical tracks are updated with the results at time t, yielding the pedestrian IDs.
The purpose of data association is to match the detections at the current moment with the historical tracks through appearance and geometric features, thereby determining the ID of each person detected in the current frame.
And step S4: for the continuously tracked pedestrians, acquiring three-dimensional key point information of the pedestrians according to the two-dimensional key point information of the pedestrians by combining point cloud data, calculating the spatial position coordinates of the pedestrians relative to a three-dimensional camera coordinate system under the current frame, calculating the moving speed of the pedestrians by combining frame intervals, and generating real-time spatial information of the pedestrians relative to the three-dimensional camera, wherein the method comprises the following steps:
step S4.1: screening by combining the confidence coefficient of the two-dimensional key points according to the trunk region of the upper half of the pedestrian, and acquiring point cloud data according to the screened two-dimensional key points to obtain a three-dimensional coordinate set of the pedestrian key points;
in the embodiment of the invention, according to the trunk region of the upper half body of the pedestrian determined in the step 2, screening is carried out by combining the confidence coefficient of the key points, the key points with the confidence coefficient larger than 0.6 participate in the following calculation, and then, the corresponding three-dimensional key points are searched according to the acquired ZED point cloud data, and the three-dimensional coordinate set of the corresponding pedestrian key points is acquired;
the human body key point detection network outputs a 2D key point coordinate set of the pedestrian under the image coordinate system as
Figure DEST_PATH_IMAGE020
) Wherein k is the number of key points and has a value of k =0,1,2,5,8,11,14,15,16,17,
Figure DEST_PATH_IMAGE022
for corresponding confidence, as shown by the "2D keypoint diagram" solid origin in fig. 3 b. Searching by combining point cloud information acquired by a ZED camera, acquiring corresponding three-dimensional coordinates of pedestrians according to a pedestrian trunk region and a two-dimensional key point confidence coefficient, wherein the corresponding three-dimensional coordinates of all two-dimensional key points of the pedestrians are shown as a '3D key point schematic diagram' in figure 3c, and the three-dimensional coordinate set of the human body key points comprising the trunk region is
Figure DEST_PATH_IMAGE024
) As shown by a triangle icon in fig. 3c, the COORDINATE system is set by a camera parameter COORDINATE _ system, left _ HANDED _ Y _ UP, with reference to the left eye camera of the ZED camera, and the COORDINATE system and the directions of the XYZ axes are shown in fig. 2;
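A sketch of this 2D-to-3D lookup under the 0.6 confidence threshold from step S4.1; it assumes the 2D keypoints have already been rescaled from the 456×256 network input back to the 1280×720 point-cloud resolution:

```python
import numpy as np

def lift_to_3d(kpts_2d, cloud_xyz, conf_thresh=0.6):
    """kpts_2d: (N, 3) of (x, y, conf); cloud_xyz: (H, W, >=3) XYZ array."""
    pts = []
    for x, y, c in kpts_2d:
        if c <= conf_thresh:
            continue                       # drop low-confidence keypoints
        p = cloud_xyz[int(round(y)), int(round(x)), :3]
        if np.isfinite(p).all():           # ZED marks missing depth as NaN
            pts.append(p)
    return np.asarray(pts)                 # (M, 3) 3D keypoint set
```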
step S4.2: for each pedestrian target, calculating a three-dimensional coordinate mean value as a space position coordinate of the pedestrian target according to the three-dimensional coordinate set of the key point of the pedestrian target; calculating the actual distance of the pedestrian relative to the stereo camera according to the Euclidean distance, calculating the moving distance of the pedestrian relative to the stereo camera under the time interval of the current moment and the previous moment according to the spatial position coordinate of the pedestrian at the current moment and the spatial position coordinate of the pedestrian at the previous moment, and combining the used time to obtain the moving speed of the pedestrian at the current moment;
in the embodiment of the invention, according to the acquired three-dimensional coordinate set of the key points of the human body of the pedestrian, the average value of the three-dimensional coordinate set is calculated to be used as the space three-dimensional coordinate position of the target pedestrian, namely
Figure DEST_PATH_IMAGE026
Where N =10, represents the number of keypoints for each pedestrian upper body region; i represents the ID of the currently tracked pedestrian, and if the time of the current frame is t, the spatial position of the pedestrian under the current frame relative to the robot is
Figure DEST_PATH_IMAGE028
In a
Figure 118756DEST_PATH_IMAGE016
The spatial position of the moment relative to the robot is
Figure DEST_PATH_IMAGE030
The moving speed of the pedestrian is
Figure DEST_PATH_IMAGE032
Wherein, in the process,
Figure DEST_PATH_IMAGE034
f is a ZED camera frame rate, m represents the number of frame intervals, the total time consumption of the algorithm is considered comprehensively, and the value of m is
Figure DEST_PATH_IMAGE036
As shown in fig. 4, the ZED camera acquires RGB data and point cloud data, two-dimensional coordinates of the human keypoints are then obtained with the human keypoint detection algorithm, and finally the three-dimensional keypoint coordinates are obtained by combining the two-dimensional keypoint information with the corresponding point cloud information. To verify the effectiveness of the method, about 1000 frames of images were collected, and comparison tests and statistics were carried out for two approaches: calculating the pedestrian-to-camera distance from a YOLOv3 human detection box combined with point cloud information, and calculating it from the keypoints of the human torso region. The distance from the pedestrian to the camera is computed with the Euclidean distance formula

$d = \sqrt{X^2 + Y^2 + Z^2}$

where (X, Y, Z) are the pedestrian's three-dimensional spatial coordinates. As shown in fig. 5, the dark line is the proposed method and the light line the comparison method; during pedestrian movement, the detection-box-based method introduces larger noise as the body posture changes, while the proposed method is more robust to interference and obtains more accurate pedestrian positioning information.
A pedestrian spatial information perception device based on a stereo camera is used for realizing the pedestrian spatial information perception method based on the stereo camera and comprises a real-time image data acquisition module, a pedestrian detection frame acquisition module, a multi-target tracking module and a real-time spatial information generation module;
the real-time image data acquisition module acquires RGB image data and point cloud data through a stereo camera;
the pedestrian detection frame acquisition module detects human body key points through RGB images to obtain two-dimensional key point information of pedestrians, determines a pedestrian upper half body trunk region according to dynamic characteristics of the pedestrians, generates a pedestrian enclosure frame by combining the two-dimensional key point information of the pedestrian upper half body trunk region, and takes the pedestrian enclosure frame as a pedestrian detection frame;
the multi-target tracking module carries out multi-target tracking on pedestrians under continuous multi-frame images according to the two-dimensional key point similarity characteristics and the pedestrian detection frame;
the real-time spatial information generation module acquires three-dimensional key point information of the pedestrian according to the two-dimensional key point information of the continuously tracked pedestrian and the point cloud data, calculates spatial position coordinates of the pedestrian relative to a coordinate system of the stereo camera under the current frame, calculates the moving speed of the pedestrian according to frame intervals, and generates real-time spatial information of the pedestrian relative to the stereo camera.
The implementation of this part is similar to that of the above method embodiment, and is not described here again.
As shown in fig. 6, in a pedestrian spatial information perception method based on a ZED camera, following the stereo-camera-based method above, real-time image data are collected by the ZED binocular vision camera mounted on the navigation robot and transmitted to a cloud server, which obtains the pedestrian detection frame, tracks multiple targets and generates real-time spatial information. The field of view of the ZED binocular vision camera is matched with the pedestrian's upper-body torso region, which is determined from the pedestrian's dynamic characteristics and the camera's field of view on the navigation robot. The cloud server transmits the real-time spatial information of pedestrians relative to the ZED camera to the navigation robot, and the robot controls its body movement according to this information to complete guide tasks such as autonomous following and obstacle avoidance.
Specifically, the method comprises the following steps:
step S101: acquiring real-time image data of a ZED binocular vision camera on the navigation robot, wherein the real-time image data comprises RGB image data and point cloud data, and transmitting the real-time image data to a cloud server;
step S102: the cloud server detects human keypoints in the RGB images to obtain the two-dimensional keypoint information of pedestrians, determines the pedestrian's upper-body torso region according to the pedestrian's dynamic characteristics, generates a pedestrian bounding box from the two-dimensional keypoints of that region, and uses it as the pedestrian detection frame; the field of view of the ZED binocular vision camera is matched with the pedestrian's upper-body torso region, which is determined according to the pedestrian's dynamic characteristics and the field of view of the ZED camera on the navigation robot;
step S103: the cloud server performs multi-target tracking on pedestrians under continuous multi-frame images according to the two-dimensional key point similarity characteristics and the pedestrian detection frame;
step S104: the cloud server acquires three-dimensional key point information of the pedestrian according to the two-dimensional key point information of the continuously tracked pedestrian and the point cloud data, calculates the space position coordinate of the pedestrian relative to a coordinate system of the ZED binocular vision camera under the current frame, calculates the moving speed of the pedestrian according to the frame interval, and generates real-time space information of the pedestrian relative to the ZED binocular vision camera;
step S105: the cloud server transmits real-time space information of the pedestrians relative to the ZED binocular vision camera to the navigation robot, and the navigation robot performs mobile control over the body according to the real-time space information to complete navigation tasks.
In the embodiment of the invention, the cloud stores the pedestrian ID, spatial position and moving speed at the current moment in a database on the cloud server, organized as a message queue. The basic format is data = {'key1': value1, 'key2': value2, 'key3': value3}, where key1, key2 and key3 are 'p_ID', 'p_Pos3D' and 'p_Speed', denoting the pedestrian ID, the pedestrian's three-dimensional spatial coordinates and the pedestrian's moving speed; the corresponding values are the computed identity ID, three-dimensional coordinates and moving speed of the specific pedestrian i. Upon a request from the robot end, the cloud sends the data to the robot in real time through RocketMQ message middleware. The robot controls its body movement according to preset instructions and adjusts its own moving speed according to the real-time position information of pedestrians: when the pedestrian-to-robot distance is smaller than a safe distance, the robot stops moving to avoid collision, and it adapts its speed to the real-time moving speed of pedestrians, thereby realizing intelligent guide tasks such as autonomous following and obstacle avoidance.
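A hedged sketch of assembling that payload; the field names 'p_ID', 'p_Pos3D' and 'p_Speed' come from the text, while the JSON serialization and the publish callable (standing in for the actual RocketMQ producer API) are assumptions:

```python
import json

def push_pedestrian_state(publish, ped_id, pos3d, speed_mps):
    # Field names 'p_ID', 'p_Pos3D', 'p_Speed' follow the patent text.
    data = {
        "p_ID": int(ped_id),
        "p_Pos3D": [round(float(v), 3) for v in pos3d],  # (X, Y, Z)
        "p_Speed": round(float(speed_mps), 3),
    }
    # `publish` is a hypothetical middleware send function, e.g. a
    # RocketMQ producer wrapped as publish(topic, message_bytes).
    publish("pedestrian_state", json.dumps(data).encode("utf-8"))
```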
A pedestrian spatial information perception device based on a ZED camera is used for realizing a pedestrian spatial information perception method based on the ZED camera and comprises a cloud server and a ZED binocular vision camera arranged on a navigation robot, wherein the cloud server comprises a pedestrian detection frame acquisition module, a multi-target tracking module and a real-time spatial information generation module;
the ZED binocular vision camera acquires RGB image data and point cloud data in real time and transmits the RGB image data and the point cloud data to the cloud server;
the pedestrian detection frame acquisition module detects human body key points through RGB images to obtain two-dimensional key point information of pedestrians, determines a pedestrian upper half body trunk region according to dynamic characteristics of the pedestrians, generates a pedestrian enclosure frame by combining the two-dimensional key point information of the pedestrian upper half body trunk region, and takes the pedestrian enclosure frame as a pedestrian detection frame; the vision range of the ZED binocular vision camera is matched with the trunk area of the upper half body of the pedestrian, and the trunk area of the upper half body of the pedestrian is determined according to the dynamic characteristic of the pedestrian and the vision range of the ZED binocular vision camera on the navigation robot;
the multi-target tracking module carries out multi-target tracking on pedestrians under continuous multi-frame images according to the similarity characteristics of the two-dimensional key points and the pedestrian detection frame;
the real-time spatial information generation module is used for acquiring three-dimensional key point information of the pedestrian according to the two-dimensional key point information of the continuously tracked pedestrian by combining point cloud data, calculating the spatial position coordinate of the pedestrian relative to a three-dimensional camera coordinate system under the current frame, and calculating the moving speed of the pedestrian by combining frame intervals to generate real-time spatial information of the pedestrian relative to the three-dimensional camera;
and the navigation robot acquires real-time space information of the pedestrian relative to the ZED binocular vision camera from the cloud server, performs mobile control on the body and completes navigation tasks.
The implementation of this part is similar to that of the above method embodiment, and is not described here again.
Corresponding to the embodiment of the pedestrian spatial information perception method based on the stereo camera vision, the invention also provides an embodiment of the pedestrian spatial information perception device based on the stereo camera vision.
Referring to fig. 7, the apparatus for sensing pedestrian spatial information based on stereoscopic camera vision according to the embodiment of the present invention includes a memory and one or more processors, where the memory stores executable codes, and when the one or more processors execute the executable codes, the one or more processors are configured to implement the method for sensing pedestrian spatial information based on stereoscopic camera vision in the foregoing embodiment.
The embodiment of the stereo-camera-vision-based pedestrian spatial information perception device of the invention can be applied to any device with data processing capability, such as a computer or another device or apparatus. The device embodiments may be implemented by software, by hardware, or by a combination of hardware and software. Taking a software implementation as an example, as a logical device it is formed by the processor of the data-processing device reading the corresponding computer program instructions from nonvolatile memory into memory and running them. In hardware terms, fig. 7 shows a hardware structure diagram of the data-processing device on which the stereo-camera-vision-based pedestrian spatial information perception device is located; besides the processor, memory, network interface and nonvolatile memory shown in fig. 7, the device may also include other hardware according to its actual function, which is not described again.
The implementation process of the functions and actions of each unit in the above device is specifically described in the implementation process of the corresponding step in the above method, and is not described herein again.
For the device embodiments, since they substantially correspond to the method embodiments, reference may be made to the partial description of the method embodiments for relevant points. The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules can be selected according to actual needs to achieve the purpose of the scheme of the invention. One of ordinary skill in the art can understand and implement without inventive effort.
An embodiment of the present invention further provides a computer-readable storage medium, on which a program is stored, where the program, when executed by a processor, implements the method for sensing spatial information of a pedestrian based on stereo camera vision in the foregoing embodiments.
The computer readable storage medium may be an internal storage unit, such as a hard disk or a memory, of any data processing capability device described in any of the foregoing embodiments. The computer readable storage medium may also be any external storage device of a device with data processing capabilities, such as a plug-in hard disk, a Smart Media Card (SMC), an SD Card, a Flash memory Card (Flash Card), etc. provided on the device. Further, the computer readable storage medium may include both internal storage units and external storage devices of any data processing capable device. The computer-readable storage medium is used for storing the computer program and other programs and data required by the arbitrary data processing-capable device, and may also be used for temporarily storing data that has been output or is to be output.
The above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and these modifications or substitutions do not depart from the scope of the embodiments of the present invention in nature.

Claims (10)

1. A pedestrian spatial information perception method based on a stereo camera is characterized by comprising the following steps:
step S1: acquiring real-time image data of a stereo camera, wherein the real-time image data comprises RGB image data and point cloud data;
step S2: detecting key points of a human body through RGB images to obtain two-dimensional key point information of a pedestrian, determining an upper half body trunk region of the pedestrian according to the dynamic characteristic of the pedestrian, generating a pedestrian surrounding frame by combining the two-dimensional key point information of the upper half body trunk region of the pedestrian, and taking the pedestrian surrounding frame as a pedestrian detection frame;
and step S3: carrying out multi-target tracking on pedestrians under continuous multi-frame images according to the similarity characteristics of the two-dimensional key points and the pedestrian detection frame;
and step S4: and for the continuously tracked pedestrians, acquiring three-dimensional key point information of the pedestrians according to the two-dimensional key point information of the pedestrians in combination with the point cloud data, calculating the spatial position coordinates of the pedestrians relative to a three-dimensional camera coordinate system under the current frame, calculating the moving speed of the pedestrians in combination with the frame interval, and generating real-time spatial information of the pedestrians relative to the three-dimensional camera.
2. The stereo-camera-based pedestrian spatial information perception method according to claim 1, wherein: in step S2, RGB image data are obtained and a human keypoint detection network performs forward inference, outputting keypoint heatmaps and part affinity fields; two-dimensional keypoints are extracted from the heatmaps and part affinity fields and grouped, the keypoints belonging to the same pedestrian are matched to that pedestrian, and the two-dimensional keypoint coordinates of each pedestrian in the current image are obtained.
3. The pedestrian spatial information perception method based on the stereo camera according to claim 1, wherein: the step S3 includes the steps of:
step S3.1: acquiring the motion characteristics of the pedestrians according to the pedestrian detection frame; acquiring the appearance characteristics of the pedestrian according to the similarity characteristics of the two-dimensional key points;
step S3.2: acquiring actual measurement state information of the pedestrian at the current time t according to the motion characteristics of the pedestrian and the appearance characteristics of the pedestrian;
step S3.3: performing data association on the historical track and actual measurement state information of the pedestrians at the time t to obtain the ID of each pedestrian at the time t;
step S3.4: and updating the historical track by the ID of each pedestrian at the time t.
4. The stereo-camera-based pedestrian spatial information perception method according to claim 3, wherein: the two-dimensional keypoint similarity features in step S3.1 are computed with the object keypoint similarity (OKS) metric, and whether two-dimensional keypoints are associated is judged against a preset threshold.
5. The stereo-camera-based pedestrian spatial information perception method according to claim 3, wherein: the data association in step S3.3 computes, for each frame, both a motion-feature association and an appearance-feature association, linearly weights them into a final association matrix, and applies the Hungarian matching algorithm to the matrix to obtain the inter-frame pedestrian matching result.
6. The pedestrian spatial information perception method based on the stereo camera according to claim 1, wherein: the step S4 includes the steps of:
step S4.1: screening by combining the confidence coefficient of the two-dimensional key points according to the trunk region of the upper half of the pedestrian, and acquiring point cloud data according to the screened two-dimensional key points to obtain a three-dimensional coordinate set of the pedestrian key points;
step S4.2: for each pedestrian target, calculating a three-dimensional coordinate mean value as a space position coordinate of the pedestrian target according to the three-dimensional coordinate set of the key point of the pedestrian target; and calculating the actual distance of the pedestrian relative to the stereo camera according to the Euclidean distance, calculating the moving distance of the pedestrian relative to the stereo camera under the time interval between the current moment and the previous moment according to the spatial position coordinate of the pedestrian at the current moment and the spatial position coordinate of the pedestrian at the previous moment, and combining the used time to obtain the moving speed of the pedestrian at the current moment.
7. The pedestrian spatial information perception method based on the stereo camera according to claim 6, wherein: the pedestrian moving speed formula in the step S4.2 is as follows:
$v_t^i = \dfrac{\sqrt{(X_t^i - X_{t-m}^i)^2 + (Y_t^i - Y_{t-m}^i)^2 + (Z_t^i - Z_{t-m}^i)^2}}{\Delta t}, \qquad \Delta t = \dfrac{m}{f}$

where X, Y and Z are the three-dimensional spatial position coordinates of the pedestrian, i is the ID of the currently tracked pedestrian, t is the time of the current frame, $(X_t^i, Y_t^i, Z_t^i)$ are the spatial position coordinates of the pedestrian relative to the stereo camera under the current frame, $(X_{t-m}^i, Y_{t-m}^i, Z_{t-m}^i)$ are the corresponding coordinates at time t-m, m is the number of frame intervals, and f is the stereo camera frame rate.
8. A pedestrian spatial information perception device based on a stereo camera is used for realizing the pedestrian spatial information perception method based on the stereo camera and described in any one of claims 1 to 7, and comprises a real-time image data acquisition module, a pedestrian detection frame acquisition module, a multi-target tracking module and a real-time spatial information generation module, and is characterized in that:
the real-time image data acquisition module acquires RGB image data and point cloud data through a stereo camera;
the pedestrian detection frame acquisition module detects human body key points through RGB images to obtain two-dimensional key point information of pedestrians, determines a pedestrian upper half body trunk region according to dynamic characteristics of the pedestrians, generates a pedestrian enclosure frame by combining the two-dimensional key point information of the pedestrian upper half body trunk region, and takes the pedestrian enclosure frame as a pedestrian detection frame;
the multi-target tracking module carries out multi-target tracking on pedestrians under continuous multi-frame images according to the similarity characteristics of the two-dimensional key points and the pedestrian detection frame;
the real-time spatial information generation module is used for acquiring three-dimensional key point information of the pedestrian according to the two-dimensional key point information of the continuously tracked pedestrian and the point cloud data, calculating the spatial position coordinate of the pedestrian relative to a three-dimensional camera coordinate system under the current frame, calculating the moving speed of the pedestrian according to the frame interval, and generating the real-time spatial information of the pedestrian relative to the three-dimensional camera.
9. A pedestrian spatial information perception method based on a ZED camera is characterized in that real-time image data are collected by the ZED binocular vision camera arranged on a navigation robot and transmitted to a cloud server to acquire a pedestrian detection frame, track multiple targets and generate real-time spatial information, wherein the vision range of the ZED binocular vision camera is matched with the upper half trunk area of a pedestrian, the upper half trunk area of the pedestrian is determined according to the dynamic characteristics of the pedestrian and the vision range of the ZED binocular vision camera on the navigation robot, the cloud server transmits the real-time spatial information of the pedestrian relative to the ZED binocular vision camera to the navigation robot, and the navigation robot performs movement control of a body according to the real-time spatial information to complete a navigation task.
10. A pedestrian spatial information perception device based on a ZED camera for realizing the pedestrian spatial information perception method based on the ZED camera of claim 9, comprising a cloud server and ZED binocular vision cameras disposed on a navigation robot, characterized in that: the cloud server comprises a pedestrian detection frame acquisition module, a multi-target tracking module and a real-time spatial information generation module;
the ZED binocular vision camera acquires RGB image data and point cloud data in real time and transmits the RGB image data and the point cloud data to the cloud server;
the pedestrian detection frame acquisition module detects human body key points through RGB images to obtain two-dimensional key point information of pedestrians, determines a pedestrian upper half body trunk region according to dynamic characteristics of the pedestrians, generates a pedestrian enclosure frame by combining the two-dimensional key point information of the pedestrian upper half body trunk region, and takes the pedestrian enclosure frame as a pedestrian detection frame; the vision range of the ZED binocular vision camera is matched with the trunk area of the upper half of the pedestrian, and the trunk area of the upper half of the pedestrian is determined according to the dynamic characteristic of the pedestrian and the vision range of the ZED binocular vision camera on the navigation robot;
the multi-target tracking module carries out multi-target tracking on pedestrians under continuous multi-frame images according to the similarity characteristics of the two-dimensional key points and the pedestrian detection frame;
the real-time spatial information generation module is used for acquiring three-dimensional key point information of the pedestrian according to the two-dimensional key point information of the continuously tracked pedestrian by combining point cloud data, calculating the spatial position coordinate of the pedestrian relative to a three-dimensional camera coordinate system under the current frame, and calculating the moving speed of the pedestrian by combining frame intervals to generate real-time spatial information of the pedestrian relative to the three-dimensional camera;
and the navigation robot acquires real-time space information of the pedestrian relative to the ZED binocular vision camera from the cloud server, performs mobile control on the body and completes navigation tasks.
CN202211187402.8A 2022-09-28 2022-09-28 Pedestrian spatial information sensing method and device based on ZED stereo camera Pending CN115546829A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211187402.8A CN115546829A (en) 2022-09-28 2022-09-28 Pedestrian spatial information sensing method and device based on ZED stereo camera

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211187402.8A CN115546829A (en) 2022-09-28 2022-09-28 Pedestrian spatial information sensing method and device based on ZED stereo camera

Publications (1)

Publication Number Publication Date
CN115546829A true CN115546829A (en) 2022-12-30

Family

ID=84729506

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211187402.8A Pending CN115546829A (en) 2022-09-28 2022-09-28 Pedestrian spatial information sensing method and device based on ZED (zero-energy-dimension) stereo camera

Country Status (1)

Country Link
CN (1) CN115546829A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116150298A (en) * 2023-04-19 2023-05-23 山东盛途互联网科技有限公司 Data acquisition method and system based on Internet of things and readable storage medium


Similar Documents

Publication Publication Date Title
CN110070615B (en) Multi-camera cooperation-based panoramic vision SLAM method
CN110349250B (en) RGBD camera-based three-dimensional reconstruction method for indoor dynamic scene
CN108406731B (en) Positioning device, method and robot based on depth vision
US10977818B2 (en) Machine learning based model localization system
CN110097553B (en) Semantic mapping system based on instant positioning mapping and three-dimensional semantic segmentation
US20190188533A1 (en) Pose estimation
CN107341442B (en) Motion control method, motion control device, computer equipment and service robot
US10068344B2 (en) Method and system for 3D capture based on structure from motion with simplified pose detection
CN102609942B (en) Depth map is used to carry out mobile camera location
CN102622762B (en) Real-time camera tracking using depth maps
CN112184757B (en) Method and device for determining motion trail, storage medium and electronic device
CN113674416B (en) Three-dimensional map construction method and device, electronic equipment and storage medium
CN108051002A (en) Transport vehicle space-location method and system based on inertia measurement auxiliary vision
US20220051425A1 (en) Scale-aware monocular localization and mapping
CN108628306B (en) Robot walking obstacle detection method and device, computer equipment and storage medium
JP7379065B2 (en) Information processing device, information processing method, and program
CN110260866A (en) A kind of robot localization and barrier-avoiding method of view-based access control model sensor
KR20210058686A (en) Device and method of implementing simultaneous localization and mapping
CN208323361U (en) A kind of positioning device and robot based on deep vision
CN111998862A (en) Dense binocular SLAM method based on BNN
WO2022021661A1 (en) Gaussian process-based visual positioning method, system, and storage medium
JP2018120283A (en) Information processing device, information processing method and program
CN116128966A (en) Semantic positioning method based on environmental object
CN117593650A (en) Moving point filtering vision SLAM method based on 4D millimeter wave radar and SAM image segmentation
CN115546829A (en) Pedestrian spatial information sensing method and device based on ZED stereo camera

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination