CN113256796A - Three-dimensional point cloud environment real-time reconstruction method based on Kinect V2 sensor - Google Patents

Three-dimensional point cloud environment real-time reconstruction method based on Kinect V2 sensor Download PDF

Info

Publication number
CN113256796A
Authority
CN
China
Prior art keywords
data
feature
point cloud
kinect
sensor
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110607248.4A
Other languages
Chinese (zh)
Inventor
姚寿文
栗丽辉
孔若思
王瑀
常富祥
丁佳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Institute of Technology BIT
Original Assignee
Beijing Institute of Technology BIT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Institute of Technology BIT filed Critical Beijing Institute of Technology BIT
Priority to CN202110607248.4A priority Critical patent/CN113256796A/en
Publication of CN113256796A publication Critical patent/CN113256796A/en
Pending legal-status Critical Current

Links

Images

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T17/00 Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques
    • G06F18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/088 Non-supervised learning, e.g. competitive learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Graphics (AREA)
  • Geometry (AREA)
  • Probability & Statistics with Applications (AREA)
  • Evolutionary Biology (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a real-time three-dimensional point cloud environment reconstruction method based on a Kinect V2 sensor, belonging to the technical field of virtual reality. Kinect V2 is used to collect image data and odometry data, which are then preprocessed and synchronized. Feature detection and matching are performed on the image data collected by Kinect V2 and the position of the camera is estimated, realizing a front-end visual odometer. Loop detection mainly judges the relation between the current position of the camera and its previous positions, thereby optimizing the accuracy of the modeled environment. Back-end optimization combines the front-end visual odometer with the loop detection data and realizes environment modeling of the three-dimensional point cloud by means of graph optimization.

Description

Three-dimensional point cloud environment real-time reconstruction method based on Kinect V2 sensor
Technical Field
The invention relates to the technical field of virtual reality, in particular to a three-dimensional point cloud environment real-time reconstruction method based on a Kinect V2 sensor.
Background
Technological progress and the development of the times have made the environment around people increasingly complex, and in dangerous environments it is necessary to carry out exploration or rescue with a remotely operated unmanned vehicle. Unlike on-site operation, during remote operation the operator's perception of the environment is decoupled from the actual environment, so the operator cannot judge the environment effectively and operation accuracy is reduced.
In remote operation, when the operator can only perceive and observe remotely through environmental signals returned by sensors, the organs that process environmental perception are decoupled from the environment being detected, and when the perception system interprets visual environment information from a remote location, it must overcome the problem of relating it to the perception obtained by direct observation in a natural environment. Furthermore, the operator's perception of the environment is an active process: the operator changes the visual environment information by performing relevant actions in the environment. During remote operation the operator cannot interact with the environment directly, so the environment cannot be judged effectively and accurate remote control is difficult to complete; how to present the remote environment to the operator therefore becomes one of the problems in realizing accurate remote control of an unmanned vehicle.
In traditional remote operation technology, the operator judges the current environment of the vehicle from the video images returned by a camera sensor and makes corresponding decisions to adjust the attitude of the vehicle, thereby controlling it. Two-dimensional video image information cannot comprehensively reflect the environment around the vehicle and cannot present information such as the size of and distance to surrounding obstacles; the operator's sense of presence when driving the vehicle is low, which affects the precision of vehicle operation.
With the development of sensor technology, sensors based on laser radar, depth cameras and the like have been widely applied in fields such as automatic driving, remote operation and virtual reality. Because three-dimensional point cloud data capture the depth information of the environment, an environment presentation based on three-dimensional point cloud data greatly helps the operator to understand the environment around the vehicle. Reconstructing the three-dimensional environment from point clouds can improve the operator's perception of the environment, but the huge number of points makes real-time data transmission and environment reconstruction difficult, and in a point cloud environment the operator may find it hard to distinguish objects.
Virtual reality technology has the characteristics of immersion, interactivity and imagination. Research on three-dimensional reconstruction of the vehicle's surroundings using virtual reality technology is therefore urgently needed; it can provide the operator with rich information about the environment around the vehicle and is of great significance for improving the operator's perception of the environment and the operator's judgment of on-site conditions.
Disclosure of Invention
The invention aims to provide a three-dimensional point cloud environment real-time reconstruction method based on a Kinect V2 sensor, which comprises the following steps:
acquiring image data based on a Kinect V2 sensor, and performing pose estimation on the Kinect V2 sensor to obtain camera pose data of the Kinect V2 sensor;
constructing a feature dictionary tree based on the image data according to a K-means unsupervised machine learning clustering method, wherein the feature dictionary tree is used for performing loop detection on the image data to obtain loop information, and the loop information is used for improving the data accuracy of camera pose data;
based on the camera pose data, establishing a three-dimensional point cloud environment by setting an error function and iterating based on a Gauss-Newton method;
and simulating a three-dimensional point cloud environment scene based on the feature dictionary tree and the three-dimensional point cloud environment.
Preferably, before the pose estimation process of the Kinect V2 sensor, feature data of the image data are extracted, and the camera pose data are obtained from the feature data and the image data.
Preferably, feature data are extracted according to the NNDR algorithm based on FAST keypoint detection and BRIEF feature descriptor algorithm.
Preferably, in the process of estimating the pose of the Kinect V2 sensor, extracting the space coordinates of the feature data and the pixel coordinates corresponding to the space coordinates to construct a pose matrix;
based on the pose matrix, camera pose data are obtained by calculating a reprojection error and iterating according to a Gauss-Newton algorithm.
Preferably, in the process of constructing the feature dictionary tree, extracting a feature point set of the image data based on OpenCV;
and clustering the feature point set according to a K-means unsupervised machine learning clustering method to obtain a feature dictionary tree.
Preferably, in the process of clustering the feature point set, based on the feature point set, a root node and a first-layer clustering set are obtained;
based on the first layer of cluster set, obtaining a next layer of cluster set according to a K-means unsupervised machine learning clustering method;
obtaining feature point words based on the first layer of clustering set;
and constructing a feature dictionary tree according to the root node, the first-layer clustering set, the next-layer clustering set and the feature point words.
Preferably, after the process of constructing the feature dictionary tree, the feature points of the image data are extracted and compared with the cluster center point of the feature dictionary tree to determine the data accuracy of the camera pose data, wherein the cluster center point is used for representing each center node established when the feature dictionary tree is created.
Preferably, after the process of constructing the feature dictionary tree, extracting first image data and second image data of the image data to obtain first feature data of the first image data and second feature data of the second image data;
and obtaining the similarity of the first image data and the second image data according to the first feature data, the second feature data and the feature dictionary tree.
Preferably, in the process of constructing the three-dimensional point cloud environment, extracting an error function of camera pose data;
constructing a pose data optimization matrix model according to a Gauss-Newton method based on the camera pose data and an error function, wherein the pose data optimization matrix model is used for optimizing the camera pose data;
and splicing the optimized camera pose data to construct a three-dimensional point cloud environment.
Preferably, in the process of simulating the three-dimensional point cloud environment scene, an information transmission channel is constructed based on a TCP/IP protocol of the ROS, and data interaction is realized through WebSocket, wherein the information transmission channel is used for capturing environment information in real time by a Kinect V2 sensor, and the environment information is generated into input data of the three-dimensional point cloud environment scene through the information transmission channel.
The invention discloses the following technical effects:
the invention provides a three-dimensional point cloud environment real-time reconstruction method based on a Kinect V2 sensor, and the real-time three-dimensional reconstruction of a point cloud environment is realized.
(1) Point cloud data are collected and processed with a SLAM method: a visual odometer is realized through feature detection and processing, loop detection of the point cloud data is realized through a visual bag-of-words model, and the point cloud data are optimized by adopting the PnP algorithm combined with a BA (bundle adjustment) back-end algorithm.
(2) A data communication network is established based on the ROS communication protocol and Socket, so that the point cloud data and the pose information of the vehicle can be transmitted to the virtual environment; the data are updated in real time through WebSocket transmission, realizing bidirectional data communication between the ROS system and the virtual environment Unity.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings needed in the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings without creative efforts.
FIG. 1 is a schematic flow diagram of a process according to the present invention;
FIG. 2 is a flowchart of the overall algorithm of the present invention;
FIG. 3 is a flow chart of a visual odometer according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of a reprojection error according to an embodiment of the present invention;
FIG. 5 is a diagram illustrating a basic structure of a feature dictionary tree according to an embodiment of the present invention;
FIG. 6 is a schematic diagram of data synchronization of a ROS environment in accordance with an embodiment of the present invention;
FIG. 7 is a diagram illustrating data transmission between ROS-Unity according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
As shown in fig. 1 to 7, the invention provides a three-dimensional point cloud environment real-time reconstruction method based on a Kinect V2 sensor, comprising the following steps:
acquiring image data based on a Kinect V2 sensor, and performing pose estimation on the Kinect V2 sensor to obtain camera pose data of the Kinect V2 sensor;
constructing a feature dictionary tree based on the image data according to a K-means unsupervised machine learning clustering method, wherein the feature dictionary tree is used for performing loop detection on the image data to obtain loop information, and the loop information is used for improving the data accuracy of camera pose data;
based on the camera pose data, establishing a three-dimensional point cloud environment by setting an error function and iterating based on a Gauss-Newton method;
and simulating a three-dimensional point cloud environment scene based on the feature dictionary tree and the three-dimensional point cloud environment.
Before the pose estimation process of the Kinect V2 sensor, feature data of image data are extracted, and camera pose data are obtained according to the feature data and the image data.
And extracting feature data according to an NNDR algorithm based on FAST key point detection and BRIEF feature descriptor algorithm.
In the process of estimating the pose of the Kinect V2 sensor, extracting the space coordinates of the characteristic data and the pixel coordinates corresponding to the space coordinates to construct a pose matrix; based on the pose matrix, camera pose data are obtained by calculating a reprojection error and iterating according to a Gauss-Newton algorithm.
Extracting a feature point set of image data based on OpenCV in the process of constructing a feature dictionary tree; and clustering the feature point set according to a K-means unsupervised machine learning clustering method to obtain a feature dictionary tree.
In the process of clustering the feature point set, acquiring a root node and a first-layer clustering set based on the feature point set; based on the first layer of cluster set, obtaining a next layer of cluster set according to a K-means unsupervised machine learning clustering method; obtaining feature point words based on the first layer of clustering set; and constructing a feature dictionary tree according to the root node, the first-layer clustering set, the next-layer clustering set and the feature point words.
After the process of constructing the feature dictionary tree, extracting feature points of the image data, and comparing the feature points with a clustering center point of the feature dictionary tree to determine the data accuracy of the camera pose data, wherein the clustering center point is used for representing each center node established when the feature dictionary tree is created.
After the process of constructing the feature dictionary tree, extracting first image data and second image data of the image data to obtain first feature data of the first image data and second feature data of the second image data; and obtaining the similarity of the first image data and the second image data according to the first feature data, the second feature data and the feature dictionary tree.
Extracting an error function of camera pose data in the process of constructing a three-dimensional point cloud environment; constructing a pose data optimization matrix model according to a Gauss-Newton method based on the camera pose data and an error function, wherein the pose data optimization matrix model is used for optimizing the camera pose data; and splicing the optimized camera pose data to construct a three-dimensional point cloud environment.
In the process of simulating the three-dimensional point cloud environment scene, an information transmission channel is constructed based on a TCP/IP protocol of ROS, data interaction is achieved through WebSocket, the information transmission channel is used for a Kinect V2 sensor to capture environment information in real time, and the environment information is generated into input data of the three-dimensional point cloud environment scene through the information transmission channel.
Example 1: three-dimensional environment reconstruction is a key technology for remote operation research. The three-dimensional environment reconstruction method based on the classical SLAM method mainly comprises the following steps: the method comprises the following parts of front-end visual odometer, loop detection, rear-end optimization, three-dimensional environment construction and the like.
Kinect V2 is adopted to collect image data and odometry data, which are then preprocessed and synchronized. Feature detection and matching are then carried out on the image data collected by Kinect V2 and the pose of the camera is estimated, realizing the front-end visual odometer. Loop detection mainly judges the relation between the current position of the camera and its previous positions, thereby optimizing the accuracy of the modeled environment. Back-end optimization combines the front-end visual odometer with the loop detection data and realizes the environment modeling of the three-dimensional point cloud by means of graph optimization.
Meanwhile, in order to present the acquired three-dimensional point cloud in the virtual environment, the point cloud environment in the three-dimensional environment is realized through a data communication mode between the ROS and the virtual environment engine Unity.
The real-time three-dimensional point cloud environment reconstruction algorithm mainly realizes the presentation of the three-dimensional point cloud environment in the virtual environment through the related SLAM processes. The algorithm mainly comprises the following steps: first, feature extraction and detection are performed on the collected RGB-D image data; second, the camera pose is estimated to obtain the camera key frames; third, a visual dictionary tree is built from the detected data and closed-loop judgment is carried out; fourth, back-end optimization reduces the error of the generated three-dimensional point cloud environment. After the modeling of the point cloud environment has been completed in the ROS environment, a data transmission channel from ROS to the virtual engine Unity also needs to be established, so as to realize the construction of the three-dimensional point cloud in the virtual environment.
(1) Feature extraction and matching
Features are a digital representation of the image information. Image features should be significant and stable, easy to detect and extract in image processing, and invariant to illumination, object material and so on. Corners, edges and blobs in the image can serve as feature points, and salient image features can also serve as landmarks in the three-dimensional environment reconstruction.
A feature point consists of a key point (Key-point) and a descriptor (Descriptor): the key point gives the position coordinates of the feature point in the image, and the descriptor, generally in vector form, describes the pixel information around the key point. The method detects and extracts feature points from the image data based on FAST (Features from Accelerated Segment Test) key point detection and BRIEF (Binary Robust Independent Elementary Features) descriptor computation, and performs feature matching with the NNDR (nearest neighbor distance ratio) method.
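As an illustration of this step, the following is a minimal sketch of FAST keypoint detection, BRIEF description and NNDR matching using OpenCV (the BRIEF extractor lives in the opencv-contrib package); the detector threshold and ratio value are illustrative assumptions, not values fixed by the invention.

import cv2

def extract_features(gray):
    """Detect FAST keypoints and compute BRIEF descriptors for one grayscale image."""
    fast = cv2.FastFeatureDetector_create(threshold=20)
    brief = cv2.xfeatures2d.BriefDescriptorExtractor_create()
    keypoints = fast.detect(gray, None)
    keypoints, descriptors = brief.compute(gray, keypoints)
    return keypoints, descriptors

def match_nndr(desc1, desc2, ratio=0.7):
    """NNDR matching: keep a match only if its best distance is clearly smaller
    than the second-best distance (Hamming distance for binary descriptors)."""
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING)
    knn = matcher.knnMatch(desc1, desc2, k=2)
    return [p[0] for p in knn if len(p) == 2 and p[0].distance < ratio * p[1].distance]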
(2) Camera pose estimation and keyframe computation
Camera pose estimation mainly matches consecutive frames and estimates the pose of the camera from them. The main approach is to estimate the camera pose from the coordinates of n three-dimensional space points and their corresponding projection positions. When the current frame captured by the Kinect V2 sensor is compared with the previous frame, the corresponding two-dimensional feature points are extracted from each frame and converted into three-dimensional space points, and the pose change data of the camera are then obtained by minimizing the reprojection error.
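As a small illustration of the conversion from matched two-dimensional feature points to three-dimensional space points, the following back-projection sketch uses the pinhole model with the depth value from the Kinect V2 depth image; the intrinsic values are placeholders, not the calibrated parameters of a specific device.

import numpy as np

# Illustrative Kinect V2 colour-camera intrinsics; real values come from calibration.
FX, FY, CX, CY = 1060.0, 1060.0, 960.0, 540.0

def back_project(u, v, depth_m):
    """Convert a pixel (u, v) with its depth in metres into a 3D point in the
    camera coordinate frame."""
    z = depth_m
    x = (u - CX) * z / FX
    y = (v - CY) * z / FY
    return np.array([x, y, z])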
Suppose a space point has coordinates P_i = [X_i, Y_i, Z_i]^T and corresponds to the pixel coordinates m_i = [u_i, v_i]^T. The relationship between the pixel coordinates and the position coordinates of the space point is

s_i \begin{bmatrix} u_i \\ v_i \\ 1 \end{bmatrix} = K (R P_i + t)    (3.1)

where K is the intrinsic parameter matrix of the camera, ξ denotes the Lie-algebra form of the camera pose matrix (R, t), and s_i is the depth of the pixel. With ξ expressed in matrix form,

\exp(ξ^\wedge) = \begin{bmatrix} R & t \\ 0^T & 1 \end{bmatrix}    (3.2)

s_i m_i = K \exp(ξ^\wedge) P_i    (3.3)

Because the pose data of the camera are unknown and the observation points contain noise due to the influence of equipment, environment and so on, equation (3.3) contains an error. To minimize this error, the sum of reprojection errors is computed by equation (3.4), and the best camera pose data are found by constructing a least-squares problem:

ξ^* = \arg\min_{ξ} \frac{1}{2} \sum_{i=1}^{n} \left\| m_i - \frac{1}{s_i} K \exp(ξ^\wedge) P_i \right\|_2^2    (3.4)
The reprojection error compares the pixel coordinates of the observed projection position with the coordinates obtained by projecting the spatial position estimated from the current pose. As shown in FIG. 4, p_1 and p_2 are a pair of matched feature points, both projections of the same space point P, and \hat{p}_2 is the coordinate obtained by projecting the spatial position of the point estimated from p_1. At the initial position, since the pose of the camera has not yet been computed, the projected position \hat{p}_2 of P differs from the actual projection position p_2 by some distance; this distance is gradually reduced by continuously adjusting the pose of the camera.

To minimize equation (3.4), the derivative of the error term with respect to the optimization variable is first derived. In linearized (first-order) form,

e(x + Δx) ≈ e(x) + J Δx    (3.5)

where e is the error of a pixel coordinate point, x is the pose of the camera, J is the Jacobian matrix, and Δx is the increment of the camera pose. The calculation proceeds as follows:
Let P' be the coordinates of P in the camera coordinate system, P' = (\exp(ξ^\wedge) P)_{1:3} = [X', Y', Z']^T. For P' the camera projection model gives s u = K P', i.e.

s \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = \begin{bmatrix} f_x & 0 & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} X' \\ Y' \\ Z' \end{bmatrix}    (3.6)

Considering the derivative of e with respect to the perturbation, left-multiply \exp(ξ^\wedge) by a small perturbation δξ; then

\frac{\partial e}{\partial δξ} = \lim_{δξ \to 0} \frac{e(δξ \oplus ξ) - e(ξ)}{δξ} = \frac{\partial e}{\partial P'} \frac{\partial P'}{\partial δξ}    (3.7)

where \oplus denotes the left-multiplied perturbation under the Lie algebra. And since

u = f_x X'/Z' + c_x,  v = f_y Y'/Z' + c_y

we have

\frac{\partial e}{\partial P'} = -\begin{bmatrix} f_x/Z' & 0 & -f_x X'/Z'^2 \\ 0 & f_y/Z' & -f_y Y'/Z'^2 \end{bmatrix}    (3.8)

\frac{\partial P'}{\partial δξ} = \begin{bmatrix} I & -P'^{\wedge} \end{bmatrix}    (3.9)

Multiplying formula (3.8) by formula (3.9) gives

\frac{\partial e}{\partial δξ} = -\begin{bmatrix} f_x/Z' & 0 & -f_x X'/Z'^2 & -f_x X'Y'/Z'^2 & f_x + f_x X'^2/Z'^2 & -f_x Y'/Z' \\ 0 & f_y/Z' & -f_y Y'/Z'^2 & -f_y - f_y Y'^2/Z'^2 & f_y X'Y'/Z'^2 & f_y X'/Z' \end{bmatrix}    (3.10)
Equation (3.5) is then solved iteratively with the Gauss-Newton algorithm to obtain the optimal camera pose transformation matrix, after which frame-to-frame matching of the point cloud data can be carried out.
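The following is a minimal Gauss-Newton sketch following equations (3.5)-(3.10). It is a simplified illustration written for clarity (first-order pose update with SVD re-orthonormalization, no robust kernel, no outlier rejection), not the exact solver of the invention; in practice the same minimization can also be delegated to OpenCV's iterative PnP solver.

import numpy as np

def pnp_gauss_newton(points_3d, points_2d, K, iters=10):
    """points_3d: (n, 3) space points P_i; points_2d: (n, 2) pixel observations m_i."""
    fx, fy, cx, cy = K[0, 0], K[1, 1], K[0, 2], K[1, 2]
    R, t = np.eye(3), np.zeros(3)
    for _ in range(iters):
        H = np.zeros((6, 6))
        g = np.zeros(6)
        for P, m in zip(points_3d, points_2d):
            X, Y, Z = R @ P + t                       # P' in the camera frame
            e = m - np.array([fx * X / Z + cx, fy * Y / Z + cy])   # reprojection error
            # 2x6 Jacobian of the error w.r.t. [translation, rotation], eq (3.10)
            J = -np.array([
                [fx / Z, 0, -fx * X / Z**2, -fx * X * Y / Z**2, fx + fx * X**2 / Z**2, -fx * Y / Z],
                [0, fy / Z, -fy * Y / Z**2, -fy - fy * Y**2 / Z**2, fy * X * Y / Z**2, fy * X / Z],
            ])
            H += J.T @ J                              # Gauss-Newton normal equations
            g += -J.T @ e
        dx = np.linalg.solve(H, g)                    # solve H dx = -sum(J^T e)
        w = dx[3:]                                    # small-angle rotation increment
        dR = np.eye(3) + np.array([[0, -w[2], w[1]], [w[2], 0, -w[0]], [-w[1], w[0], 0]])
        R, t = dR @ R, dR @ t + dx[:3]                # left-multiplied perturbation update
        U, _, Vt = np.linalg.svd(R)
        R = U @ Vt                                    # keep R a proper rotation
        if np.linalg.norm(dx) < 1e-8:
            break
    return R, t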
(3) Loop detection
Loop detection, also called closed-loop detection, mainly compares the current position of the vehicle with previously visited positions; if a loop is detected, the loop information is passed to the back end for further processing. Loop detection mainly reduces the accumulated drift of the camera pose estimate over time, so that the system can work stably over long periods, and it is of great significance for achieving global consistency of the system.
The invention trains a dictionary by a point feature method, and the basic method is as follows:
(1) The existing image data are first preprocessed so that the images have a unified format and specification, which facilitates subsequent processing. The feature points of each image are then extracted; N feature points are extracted from the images based on OpenCV, providing data support for building the subsequent bag-of-words model.
(2) The feature points identified from the collected image data are clustered with the K-means unsupervised machine learning method. First, the features are clustered into K classes at the root node by the K-means method to obtain the first-layer clusters; the operation is then repeated layer by layer: according to the nodes obtained at the previous layer, the data belonging to each node are clustered again to obtain the next-layer clusters. Finally the feature point words are obtained and the construction of the feature dictionary tree is completed, as shown in FIG. 5.
(3) After the feature dictionary has been built, when an image needs to be processed its feature points are extracted and compared with the cluster center of each center node to look up the words corresponding to the image, which guarantees comparison accuracy while improving lookup efficiency.
(4) With the bag-of-words model, an image can be described by a vector. For an image A, suppose the total number of features is n, a word w_i occurs n_i times, and m distinct words occur in the image; taking the weight η_i of w_i as its influence on image similarity, image A can be represented by a vector v_A composed of the words corresponding to its feature points, as shown in equation (3.11):

v_A = \{ (w_1, η_1), (w_2, η_2), \ldots, (w_m, η_m) \}    (3.11)

Thus, when two images are compared, the problem of determining their similarity is converted into comparing the distance between two vectors. Given the vectors v_1 and v_2 of the two images, their similarity can be computed in an L_1-norm form, as shown in equation (3.12):

s(v_1, v_2) = 2 \sum_{i} \left( |v_{1i}| + |v_{2i}| - |v_{1i} - v_{2i}| \right)    (3.12)

The smaller the value of s(v_1, v_2), the greater the difference between the two images, the lower their similarity, and the smaller the probability that a closed loop has occurred; the larger the value, the opposite. When the value of s(v_1, v_2) exceeds a fixed similarity threshold, the camera is judged to have returned to a previously visited position, i.e. a closed loop. By setting a spatial consistency condition, namely requiring several consecutive closed-loop detections before a complete closed-loop state is accepted, the stability of the algorithm is improved.
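A compact sketch of steps (1)-(4): hierarchical K-means vocabulary construction, word lookup, the bag-of-words vector of (3.11) and the L1 score of (3.12). It is an illustrative assumption that descriptors are handled as float vectors with scikit-learn's KMeans and that all words are weighted equally; branching factor and depth are placeholder values.

import numpy as np
from sklearn.cluster import KMeans

def build_vocab_tree(descriptors, k=10, depth=3):
    """Hierarchical K-means: cluster descriptors into k classes at the root, then
    recursively re-cluster the data of each node; the leaves act as visual words."""
    def split(data, level):
        if level == depth or len(data) < k:
            return None                                   # leaf: a word
        km = KMeans(n_clusters=k, n_init=10).fit(data)
        return {"centers": km.cluster_centers_,
                "children": [split(data[km.labels_ == c], level + 1) for c in range(k)]}
    return split(np.asarray(descriptors, dtype=np.float32), 0)

def lookup_word(tree, desc, path=()):
    """Descend the tree by nearest cluster center; the leaf path identifies the word."""
    if tree is None:
        return path
    c = int(np.argmin(np.linalg.norm(tree["centers"] - desc, axis=1)))
    return lookup_word(tree["children"][c], desc, path + (c,))

def bow_vector(word_ids, vocab):
    """Bag-of-words vector v_A of equation (3.11), L1-normalized; vocab maps each
    word id (leaf path) to a vector index, and equal word weights are assumed."""
    v = np.zeros(len(vocab))
    for w in word_ids:
        v[vocab[w]] += 1.0
    return v / v.sum() if v.sum() > 0 else v

def similarity(v1, v2):
    """L1-norm similarity of equation (3.12): larger when the two images share words."""
    return 2.0 * np.sum(np.abs(v1) + np.abs(v2) - np.abs(v1 - v2))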
(4) Back-end optimization algorithm
The back-end optimization is mainly used for processing the noise problem in the system, the noise source is mainly the measurement error of the sensor, some of the noise sources can be influenced by the magnetic field, the temperature and the like, the back-end optimization is used for perfecting, supplementing and calibrating the camera pose data and loop information at different moments, and estimating the uncertainty of the surrounding environment, so that the globally consistent driving track and the three-dimensional environment are obtained.
The visual odometer contains certain errors, and as these errors gradually accumulate the three-dimensional environment model becomes insufficiently accurate; back-end optimization is the solution proposed for this problem. Let x = (x_1, x_2, \ldots, x_t)^T be the camera pose values at t moments; the error function is

e_{ij}(x_i, x_j) = z_{ij} - \hat{z}_{ij}(x_i, x_j)    (3.13)

where z_{ij} is the state transition matrix observed by the camera from time i to time j and \hat{z}_{ij} is the true transformation matrix. The overall cost (error) function F(x) is

F(x) = \sum_{\langle i,j \rangle \in C} e_{ij}^{T} Ω_{ij} e_{ij}    (3.14)

where C denotes the set of all pose pairs and Ω_{ij} is the information matrix of the error introduced by noise and the like during image matching.
The goal of the optimization is to obtain the value of x that minimizes the cost function F(x), as shown in formula (3.15):

x^{*} = \arg\min_{x} F(x)    (3.15)

Expanding F(x) with the first-order Taylor formula gives

F_{ij}(x + Δx) ≈ c_{ij} + 2 b_{ij}^{T} Δx + Δx^{T} H_{ij} Δx    (3.16)

and therefore

F(x + Δx) ≈ c + 2 b^{T} Δx + Δx^{T} H Δx    (3.17)

where c = Σ c_{ij}, b = Σ b_{ij}, H = Σ H_{ij}, and H_{ij} is the Hessian matrix of the cost function F(x). Setting the derivative of equation (3.17) to zero gives the minimum, i.e.

H Δx = -b    (3.18)
In equation (3.18) the matrix H is sparse, and this sparsity is related to the Jacobian matrix J. Let

A_{ij} = \frac{\partial e_{ij}}{\partial x_i}, \qquad B_{ij} = \frac{\partial e_{ij}}{\partial x_j}

Since e_{ij} depends only on x_i and x_j, the Jacobian has the form

J_{ij} = \begin{bmatrix} 0 & \cdots & 0 & A_{ij} & 0 & \cdots & 0 & B_{ij} & 0 & \cdots & 0 \end{bmatrix}    (3.19)

Accordingly, the contributions to H and b are

H_{ij} = J_{ij}^{T} Ω_{ij} J_{ij}, \qquad b_{ij} = J_{ij}^{T} Ω_{ij} e_{ij}    (3.20)

so that H_{ij} has non-zero blocks only at the positions (i,i), (i,j), (j,i) and (j,j). From equations (3.18) and (3.20) the value of Δx can be obtained; iterating continuously with the Gauss-Newton method optimizes the pose of the camera, and the point cloud data corresponding to the optimized pose data are then stitched together to obtain the three-dimensional point cloud environment.
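To make the sparse assembly of H and b in (3.18)-(3.20) concrete, here is a minimal sketch on a deliberately simplified linear pose model (poses as plain vectors with relative measurements z_ij ≈ x_j - x_i); a real SLAM back end such as g2o works on SE(3) with analytic Jacobians, so this is only an illustration of the bookkeeping.

import numpy as np

def gauss_newton_step(poses, edges, dim=6):
    """poses: (t, dim) array; edges: list of (i, j, z_ij, Omega) constraints."""
    t = len(poses)
    H = np.zeros((t * dim, t * dim))
    b = np.zeros(t * dim)
    for i, j, z_ij, Omega in edges:
        e = z_ij - (poses[j] - poses[i])       # e_ij, eq (3.13), linear measurement model
        A = np.eye(dim)                        # A_ij = de_ij / dx_i
        B = -np.eye(dim)                       # B_ij = de_ij / dx_j
        si, sj = slice(i * dim, (i + 1) * dim), slice(j * dim, (j + 1) * dim)
        H[si, si] += A.T @ Omega @ A           # sparse blocks of eq (3.20)
        H[si, sj] += A.T @ Omega @ B
        H[sj, si] += B.T @ Omega @ A
        H[sj, sj] += B.T @ Omega @ B
        b[si] += A.T @ Omega @ e
        b[sj] += B.T @ Omega @ e
    H[:dim, :dim] += np.eye(dim) * 1e6         # prior fixing the first pose (gauge freedom)
    dx = np.linalg.solve(H, -b)                # eq (3.18): H dx = -b
    return poses + dx.reshape(t, dim)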
(5) Three-dimensional environment reconstruction and rendering
5.1 acquisition of Point cloud data
After the capture and reconstruction of the three-dimensional point cloud environment scene are completed, the remote three-dimensional environment needs to be presented and the real-time pose of the vehicle needs to be observed. To connect the Kinect V2 sensor, the real-time three-dimensional point cloud environment and the pose of the vehicle, ROS is used as the central platform; the ROS-based TCP/IP protocol is adopted for mutual transmission of data between different terminals, and the various control processes of the remote vehicle and the three-dimensional point cloud scene data are handled at the same time.
The ROS-based TCP/IP protocol is the transport layer for ROS messages between the nodes and the server side; it can transmit data types and routing information through standard TCP/IP socket functions. A socket is an endpoint in a network whose function is to connect two-way communication between two running programs. The basic process is as follows: the server program first binds a socket to a specific network port and, through this socket, waits for and listens to connection requests from clients; using the host name and port number of the server program, the client program sends a corresponding connection request, thereby establishing the network communication connection between the two programs.
In FIG. 6, Socket denotes creating a socket descriptor, Bind denotes binding the socket to an address and determining the local address and local port, and Listen denotes listening for socket requests from clients. Programs performing different functions are connected as nodes, and the nodes communicate with each other through topic streams. When the socket at the server receives a connection request from a client, the server node publishes the computed vehicle pose data and sends them to the client through a topic stream; the client receives the data by subscribing to the topic with its own node. The client in turn publishes the current vehicle pose data through its node, and the server obtains them by subscribing to that topic, so that the virtual vehicle pose controlled by the computer stays in communication with the pose of the actual vehicle and the two remain consistent.
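The node/topic exchange described above can be sketched with ROS 1 (rospy) as follows; the node and topic names ("pose_bridge", "/vehicle/pose", "/vehicle/pose_remote") are illustrative assumptions, not names fixed by the invention.

import rospy
from geometry_msgs.msg import PoseStamped

def on_remote_pose(msg):
    # The subscribing side receives the other end's published vehicle pose here.
    rospy.loginfo("received pose: %.2f %.2f %.2f",
                  msg.pose.position.x, msg.pose.position.y, msg.pose.position.z)

if __name__ == "__main__":
    rospy.init_node("pose_bridge")
    pub = rospy.Publisher("/vehicle/pose", PoseStamped, queue_size=10)
    rospy.Subscriber("/vehicle/pose_remote", PoseStamped, on_remote_pose)
    rate = rospy.Rate(10)                      # publish the local pose at 10 Hz
    while not rospy.is_shutdown():
        msg = PoseStamped()
        msg.header.stamp = rospy.Time.now()
        msg.header.frame_id = "map"
        pub.publish(msg)                       # pose fields left at defaults in this sketch
        rate.sleep()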
The vehicle surroundings information is captured by the Kinect V2 sensor and is input data for the entire scene. In order to ensure the normal running of the vehicle in the ROS environment, the vehicle model in the virtual environment is in a URDF format, and the vehicle is described through an XML syntax framework so as to be convenient for subsequent simulation and analysis in the ROS environment. All state data are published to the ROS platform and used for constructing a virtual three-dimensional scene in Unity later.
5.2 transfer of environmental data between ROS-Unity
In order to realize synchronous data transmission between the ROS and the Unity, a network architecture as shown in fig. 7 is designed, and once three-dimensional point cloud scene and vehicle state data are acquired in the ROS or a vehicle control command is issued in the Unity, the system immediately transmits updated data through a Web Socket to ensure that a large amount of data can be stably transmitted in real time regardless of network fluctuation.
Communication over the HTTP protocol can only be initiated by the client; with Web Socket, the server can actively push messages to the client and the client can also actively send messages to the server. As shown in fig. 7, the ROS server obtains data through Socket and transmits them through Web Socket in JSON format, and the Unity end obtains the data through Socket, connecting to the ROS server by IP address and port number. With the Socket method, Web Socket can establish a bidirectional link between the server and the client which is real-time and remains valid for a long time (unless it is closed). When the server sends data to the client, as long as the client holds a Socket that is open and still connected to the server, the server's data can be pushed to that Socket without re-establishing the link, so the data reach the client.
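On the wire, this exchange can be sketched with the rosbridge protocol, a common WebSocket + JSON transport for ROS; its use here, as well as the host, port and topic names, is an assumption for illustration, and a Unity client typically uses a counterpart library such as ROS# for the same protocol.

import json
import websocket   # pip install websocket-client

ws = websocket.create_connection("ws://192.168.1.10:9090")

# Subscribe to the point cloud topic published in ROS; rosbridge wraps ROS
# messages as JSON objects carrying an "op" field.
ws.send(json.dumps({"op": "subscribe", "topic": "/camera/depth/points"}))

# Advertise and publish a vehicle control command back from the client side.
ws.send(json.dumps({"op": "advertise", "topic": "/cmd_vel", "type": "geometry_msgs/Twist"}))
ws.send(json.dumps({
    "op": "publish",
    "topic": "/cmd_vel",
    "msg": {"linear": {"x": 0.2, "y": 0.0, "z": 0.0},
            "angular": {"x": 0.0, "y": 0.0, "z": 0.0}},
}))

print(json.loads(ws.recv())["topic"])   # first message pushed by the server
ws.close()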
The invention provides a point cloud-based three-dimensional environment reconstruction method based on a classical SLAM algorithm, and realizes the construction of a three-dimensional point cloud environment. Meanwhile, a data transmission network is developed aiming at the data transmission problem of a virtual reality engine and an ROS operating system, and the reconstruction of a three-dimensional point cloud environment in a virtual environment is realized.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus once an item is defined in one figure, it need not be further defined and explained in subsequent figures, and moreover, the terms "first", "second", "third", etc. are used merely to distinguish one description from another and are not to be construed as indicating or implying relative importance.
Finally, it should be noted that: the above-mentioned embodiments are only specific embodiments of the present invention, which are used for illustrating the technical solutions of the present invention and not for limiting them, and the protection scope of the present invention is not limited thereto. Although the present invention is described in detail with reference to the foregoing embodiments, those skilled in the art should understand that any person skilled in the art can, within the technical scope of the present disclosure, modify or easily conceive changes to the technical solutions described in the foregoing embodiments, or make equivalent substitutions for some technical features; such modifications, changes or substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention and are intended to be covered by the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (10)

1. The three-dimensional point cloud environment real-time reconstruction method based on the Kinect V2 sensor is characterized by comprising the following steps:
acquiring image data based on a Kinect V2 sensor, and performing pose estimation on the Kinect V2 sensor to obtain camera pose data of the Kinect V2 sensor;
constructing a feature dictionary tree based on the image data according to a K-means unsupervised machine learning clustering method, wherein the feature dictionary tree is used for performing loop detection on the image data to obtain loop information, and the loop information is used for improving the data accuracy of the camera pose data;
establishing a three-dimensional point cloud environment by setting an error function and iterating based on a Gauss-Newton method based on the camera pose data;
and simulating a three-dimensional point cloud environment scene based on the feature dictionary tree and the three-dimensional point cloud environment.
2. The method for reconstructing the three-dimensional point cloud environment based on the Kinect V2 sensor in real time as claimed in claim 1,
before the pose estimation process of the Kinect V2 sensor, extracting feature data of the image data, and obtaining the camera pose data according to the feature data and the image data.
3. The method for reconstructing the three-dimensional point cloud environment based on the Kinect V2 sensor in real time as claimed in claim 2,
and extracting the characteristic data according to an NNDR algorithm based on FAST key point detection and BRIEF feature descriptor algorithm.
4. The method for reconstructing the three-dimensional point cloud environment based on the Kinect V2 sensor in real time as claimed in claim 3,
extracting the space coordinates of the characteristic data and the pixel coordinates corresponding to the space coordinates to construct a pose matrix in the process of estimating the pose of the Kinect V2 sensor;
and based on the pose matrix, obtaining the camera pose data by calculating a reprojection error and iterating according to a Gauss-Newton algorithm.
5. The method for reconstructing the three-dimensional point cloud environment based on the Kinect V2 sensor in real time as claimed in claim 4,
extracting a feature point set of the image data based on OpenCV in the process of constructing the feature dictionary tree;
and clustering the feature point set according to the K-means unsupervised machine learning clustering method to obtain the feature dictionary tree.
6. The method for reconstructing the three-dimensional point cloud environment based on the Kinect V2 sensor in real time as claimed in claim 5,
in the process of clustering the feature point set, acquiring a root node and a first-layer clustering set based on the feature point set;
based on the first layer of cluster set, obtaining a next layer of cluster set according to the K-means unsupervised machine learning clustering method;
obtaining feature point words based on the first layer clustering set;
and constructing the feature dictionary tree according to the root node, the first-layer clustering set, the next-layer clustering set and the feature point words.
7. The method for reconstructing the three-dimensional point cloud environment based on the Kinect V2 sensor in real time as claimed in claim 6,
after the process of constructing the feature dictionary tree, extracting feature points of the image data, and comparing the feature points with a cluster center point of the feature dictionary tree to determine the data accuracy of the camera pose data, wherein the cluster center point is used for representing each center node established when the feature dictionary tree is created.
8. The method for reconstructing the three-dimensional point cloud environment based on the Kinect V2 sensor in real time as claimed in claim 6,
after the process of constructing the feature dictionary tree, extracting first image data and second image data of the image data to obtain first feature data of the first image data and second feature data of the second image data;
and obtaining the similarity of the first image data and the second image data according to the first feature data, the second feature data and the feature dictionary tree.
9. The method for reconstructing the three-dimensional point cloud environment based on the Kinect V2 sensor in real time as claimed in claim 8,
extracting the error function of the camera pose data in the process of constructing the three-dimensional point cloud environment;
constructing a pose data optimization matrix model according to the Gauss-Newton method based on the camera pose data and the error function, wherein the pose data optimization matrix model is used for optimizing the camera pose data;
and splicing the optimized camera pose data to construct the three-dimensional point cloud environment.
10. The method for reconstructing the three-dimensional point cloud environment based on the Kinect V2 sensor in real time as claimed in claim 9,
in the process of simulating the three-dimensional point cloud environment scene, an information transmission channel is constructed based on a TCP/IP protocol of ROS, and data interaction is achieved through WebSocket, wherein the information transmission channel is used for the Kinect V2 sensor to capture environment information in real time, and the environment information is generated into input data of the three-dimensional point cloud environment scene through the information transmission channel.
CN202110607248.4A 2021-06-01 2021-06-01 Three-dimensional point cloud environment real-time reconstruction method based on Kinect V2 sensor Pending CN113256796A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110607248.4A CN113256796A (en) 2021-06-01 2021-06-01 Three-dimensional point cloud environment real-time reconstruction method based on Kinect V2 sensor

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110607248.4A CN113256796A (en) 2021-06-01 2021-06-01 Three-dimensional point cloud environment real-time reconstruction method based on Kinect V2 sensor

Publications (1)

Publication Number Publication Date
CN113256796A true CN113256796A (en) 2021-08-13

Family

ID=77185766

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110607248.4A Pending CN113256796A (en) 2021-06-01 2021-06-01 Three-dimensional point cloud environment real-time reconstruction method based on Kinect V2 sensor

Country Status (1)

Country Link
CN (1) CN113256796A (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104850615A (en) * 2015-05-14 2015-08-19 西安电子科技大学 G2o-based SLAM rear end optimization algorithm method
US20170019653A1 (en) * 2014-04-08 2017-01-19 Sun Yat-Sen University Non-feature extraction-based dense sfm three-dimensional reconstruction method
CN108648240A (en) * 2018-05-11 2018-10-12 东南大学 Based on a non-overlapping visual field camera posture scaling method for cloud characteristics map registration

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170019653A1 (en) * 2014-04-08 2017-01-19 Sun Yat-Sen University Non-feature extraction-based dense sfm three-dimensional reconstruction method
CN104850615A (en) * 2015-05-14 2015-08-19 西安电子科技大学 G2o-based SLAM rear end optimization algorithm method
CN108648240A (en) * 2018-05-11 2018-10-12 东南大学 Based on a non-overlapping visual field camera posture scaling method for cloud characteristics map registration

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
张国良 et al.: "SLAM and VSLAM Methods for Mobile Robots", 31 October 2018, Xi'an Jiaotong University Press *
王龙辉 et al.: "Three-dimensional visual simultaneous localization and mapping based on Kinect 2.0", Chinese Journal of Stereology and Image Analysis *

Similar Documents

Publication Publication Date Title
CN107967457A (en) A kind of place identification for adapting to visual signature change and relative positioning method and system
US20130335528A1 (en) Imaging device capable of producing three dimensional representations and methods of use
CN110675483B (en) Dense vision SLAM-based rapid reconstruction method for three-dimensional map of unmanned aerial vehicle
CN110070578B (en) Loop detection method
CN113284144B (en) Tunnel detection method and device based on unmanned aerial vehicle
CN112016612A (en) Monocular depth estimation-based multi-sensor fusion SLAM method
WO2024007485A1 (en) Aerial-ground multi-vehicle map fusion method based on visual feature
Shalaby et al. Algorithms and applications of structure from motion (SFM): A survey
CN115355901A (en) Multi-machine combined graph building method fusing dynamic target perception
CN116385660A (en) Indoor single view scene semantic reconstruction method and system
CN114758068A (en) Training method and device of space geometric information estimation model
CN112396831B (en) Three-dimensional information generation method and device for traffic identification
CN113570716A (en) Cloud three-dimensional map construction method, system and equipment
CN113256796A (en) Three-dimensional point cloud environment real-time reconstruction method based on Kinect V2 sensor
CN115131434B (en) Multi-mobile robot collaborative mapping method and system based on visual sensor
WO2023178951A1 (en) Image analysis method and apparatus, model training method and apparatus, and device, medium and program
Ma et al. Multi-robot collaborative SLAM and scene reconstruction based on RGB-D camera
CN115727854A (en) VSLAM positioning method based on BIM structure information
Qiao et al. Objects matter: learning object relation graph for robust camera relocalization
CN115420276A (en) Outdoor scene-oriented multi-robot cooperative positioning and mapping method
WO2023091131A1 (en) Methods and systems for retrieving images based on semantic plane features
CN117916773A (en) Method and system for simultaneous pose reconstruction and parameterization of 3D mannequins in mobile devices
Geng et al. A markerless AR guidance method for large-scale wire and cable laying of electromechanical products
CN112270357A (en) VIO vision system and method
Lin et al. Research on slam drift reduction mechanism based on point cloud segmentation semantic information

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication
RJ01 Rejection of invention patent application after publication

Application publication date: 20210813