WO2022077296A1 - Three-dimensional reconstruction method, gimbal load, removable platform and computer-readable storage medium - Google Patents

Three-dimensional reconstruction method, gimbal load, removable platform and computer-readable storage medium

Info

Publication number
WO2022077296A1
Authority
WO
WIPO (PCT)
Prior art keywords
pose
dimensional point
image
acquisition device
point clouds
Prior art date
Application number
PCT/CN2020/120978
Other languages
French (fr)
Chinese (zh)
Inventor
徐骥飞
杜劼熹
上官政和
Original Assignee
深圳市大疆创新科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳市大疆创新科技有限公司
Priority to PCT/CN2020/120978
Publication of WO2022077296A1


Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T17/00 - Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G06T17/05 - Geographic models
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 - Image analysis
    • G06T7/30 - Determination of transform parameters for the alignment of images, i.e. image registration
    • G06T7/38 - Registration of image sequences

Definitions

  • the present application relates to the technical field of three-dimensional reconstruction, and in particular, to a three-dimensional reconstruction method, a gimbal load, a movable platform, and a computer-readable storage medium.
  • Simultaneous Localization and Mapping (SLAM) is a technology for robots to estimate their own motion and build a map of the surrounding environment in an unknown environment. It has a wide range of applications in drones, autonomous driving, mobile robot navigation, virtual reality, and augmented reality.
  • the positioning under the global map can be achieved through high-precision GPS signals and prior maps.
  • the SLAM method is usually implemented based on a camera. This method uses images for 3D reconstruction and uses the 3D reconstructed map to perceive its own motion and the surrounding environment, but the information obtained from a single image is limited; therefore, this method has the problems of low positioning accuracy and low quality of 3D model construction.
  • one of the objectives of the present application is to provide a three-dimensional reconstruction method, a pan-tilt load, a movable platform, and a computer-readable storage medium.
  • an embodiment of the present application provides a three-dimensional reconstruction method, which is applied to a gimbal load, where the gimbal load is provided with a first image acquisition device and a second image acquisition device, and the method includes:
  • acquiring a three-dimensional point cloud of a target scene, an image sequence, and motion data of the gimbal load, wherein the motion data includes the pose information of the gimbal load during the acquisition of the three-dimensional point cloud and the image sequence of the target scene;
  • registering the three-dimensional point cloud according to the motion data to obtain the relative poses between the three-dimensional point clouds, and registering the image sequence to obtain the relative poses between the images;
  • obtaining, according to the motion data, the relative poses between the three-dimensional point clouds, and the relative poses between the images, the pose when the first image acquisition device acquires the three-dimensional point cloud and/or the pose when the second image acquisition device acquires the image;
  • generating a three-dimensional model of the target scene according to the three-dimensional point cloud, the image sequence, and the pose when the first image acquisition device acquires the three-dimensional point cloud and/or the pose when the second image acquisition device acquires the image.
  • an embodiment of the present application provides a pan-tilt load, including a first image acquisition device, a second image acquisition device, an inertial measurement unit, a memory for storing executable instructions, and a processor;
  • the first image acquisition device is used to acquire a three-dimensional point cloud of the target scene
  • the second image acquisition device is used for acquiring the image sequence of the target scene
  • the inertial measurement unit is used to obtain motion data of the gimbal load;
  • the motion data includes the pose information of the gimbal load;
  • when the processor executes the executable instructions, it is configured to:
  • register the three-dimensional point cloud according to the motion data to obtain the relative poses between the three-dimensional point clouds, and register the image sequence to obtain the relative poses between the images;
  • obtain, according to the motion data, the relative poses between the three-dimensional point clouds, and the relative poses between the images, the pose when the first image acquisition device acquires the three-dimensional point cloud and/or the pose when the second image acquisition device acquires the image;
  • generate a three-dimensional model of the target scene according to the three-dimensional point cloud, the image sequence, and the pose when the first image acquisition device acquires the three-dimensional point cloud and/or the pose when the second image acquisition device acquires the image.
  • an embodiment of the present application provides a movable platform, including the gimbal load described in the second aspect.
  • an embodiment of the present application provides a computer-readable storage medium, where the computer-readable storage medium stores computer-executable instructions, and when the computer-executable instructions are executed by a processor, the method described in the first aspect is implemented.
  • the three-dimensional reconstruction method, pan-tilt load, movable platform, and computer-readable storage medium provided by the embodiments of the present application acquire a three-dimensional point cloud and an image sequence of a target scene in a multi-sensor manner;
  • the motion data of the gimbal load is also obtained, and the motion data corresponds to the pose information of the gimbal load;
  • the three-dimensional point cloud is registered according to the motion data to obtain the relative poses between the three-dimensional point clouds, and the image sequence is registered to obtain the relative poses between the images; then, according to the motion data, the relative poses between the three-dimensional point clouds, and the relative poses between the images, the pose when the first image acquisition device acquires the three-dimensional point cloud and/or the pose when the second image acquisition device acquires the image is obtained; finally, a three-dimensional model of the target scene is generated according to the three-dimensional point cloud, the image sequence, and the obtained poses.
  • the three-dimensional point cloud can indicate the three-dimensional information of the target scene.
  • the motion data includes the pose information of the gimbal load, and the first image acquisition device and the second image acquisition device are arranged on the gimbal load, so the motion data can indirectly indicate the pose information of the first image acquisition device and the second image acquisition device; high-precision pose information is thereby obtained, so that the three-dimensional model generated based on the high-precision pose information has higher positioning accuracy and robustness.
  • FIG. 1 is a schematic structural diagram of a pan-tilt load provided by an embodiment of the present application.
  • FIG. 2 is an application scenario diagram provided by an embodiment of the present application
  • FIG. 3 is a schematic flowchart of a three-dimensional reconstruction method provided by an embodiment of the present application.
  • FIGS. 4 , 5 , 6 and 7 are schematic diagrams of different structures of a pose graph model provided by an embodiment of the present application.
  • FIG. 8 is a schematic structural diagram of another pan-tilt load provided by an embodiment of the present application.
  • the embodiments of the present application provide a three-dimensional reconstruction method and a gimbal load.
  • the gimbal load integrates a plurality of different sensors and acquires a three-dimensional point cloud and an image sequence of a target scene in a multi-sensor manner; during the acquisition of the three-dimensional point cloud and the image sequence, the motion data of the gimbal load is acquired, where the motion data includes the pose information of the gimbal load;
  • the three-dimensional point cloud is registered according to the motion data to obtain the relative poses between the three-dimensional point clouds, and the image sequence is registered to obtain the relative poses between the images; then, according to the motion data, the relative poses between the three-dimensional point clouds, and the relative poses between the images, the pose when the first image acquisition device acquires the three-dimensional point cloud and/or the pose when the second image acquisition device acquires the image is obtained; finally,
  • a 3D model of the target scene is generated according to the 3D point cloud, the image sequence, and the pose when the first image acquisition device acquires the 3D point cloud and/or the pose when the second image acquisition device acquires the image.
  • the three-dimensional reconstruction method can be applied to a pan-tilt load, where a plurality of different sensors are installed.
  • please refer to FIG. 1, which provides a schematic structural diagram of the pan-tilt load.
  • the first image acquisition device 21 is used to acquire the three-dimensional point cloud of the target scene in real time;
  • the second image acquisition device 22 is used to acquire the image sequence of the target scene in real time;
  • the inertial measurement unit 23 is used to acquire the motion data of the gimbal load in real time;
  • the gimbal load can then generate a three-dimensional model in real time based on the three-dimensional point cloud, the image sequence, and the motion data acquired in real time, so that the real-time requirements can be met while the accuracy requirements are met, and subsequent positioning or measurement can be performed directly based on the three-dimensional model generated in real time.
  • the inertial measurement unit is used to measure the three-axis attitude angle (or angular rate) and acceleration of the gimbal load.
  • an inertial measurement unit includes three single-axis accelerometers and three single-axis gyroscopes.
  • the accelerometers detect the acceleration signals of the gimbal load;
  • the gyroscopes detect the angular velocity signals of the gimbal load;
  • the angular velocity and acceleration of the gimbal load in three-dimensional space are thereby measured, and the attitude of the gimbal load is calculated from them.
  • the first image acquisition device may include at least one or more of the following: lidar, binocular vision sensor, and structured light depth camera.
  • the lidar is used to transmit a laser pulse sequence to the target scene, then receive the laser pulse sequence reflected from the target, and generate a three-dimensional point cloud according to the reflected laser pulse sequence.
  • the lidar can determine the reception time of the reflected laser pulse sequence, e.g., by detecting the rising-edge time and/or the falling-edge time of the electrical signal pulse.
  • the lidar can calculate the TOF (Time of Flight) by using the reception time information and the transmission time of the laser pulse sequence, so as to determine the distance from the detected object to the lidar.
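  • As a minimal, hedged sketch of the TOF-to-distance conversion described above (illustrative only; the function name and example timestamps are assumptions, not part of the patent):

```python
# Illustrative only: distance from the round-trip time of flight of a pulse.
SPEED_OF_LIGHT = 299_792_458.0  # m/s

def lidar_range(t_transmit_s, t_receive_s):
    """Distance to the reflecting object from transmit/receive timestamps."""
    tof = t_receive_s - t_transmit_s      # round-trip time of flight
    return SPEED_OF_LIGHT * tof / 2.0     # halved: the pulse travels out and back

# e.g. a pulse received 333.6 ns after transmission corresponds to ~50 m
print(lidar_range(0.0, 333.6e-9))
```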
  • the lidar is a self-illuminating sensor, does not depend on light source illumination, is less disturbed by ambient light, and can work normally even in a closed environment without light, so that a high-precision three-dimensional model can be generated later, and it has a wide range of applicability.
  • the binocular vision sensor obtains two images of the target scene from different positions based on the principle of parallax, and obtains three-dimensional geometric information by calculating the positional deviation between corresponding points of the two images, thereby generating a three-dimensional point cloud.
  • the binocular vision sensor has low hardware requirements and can correspondingly reduce the cost; it only needs ordinary CMOS (Complementary Metal Oxide Semiconductor) cameras, and as long as the lighting is suitable, it can be used in both indoor and outdoor environments, so it also has certain applicability.
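  • As a hedged sketch of the parallax principle mentioned above (the focal length, baseline, and disparity values below are assumed for illustration and are not from the patent):

```python
# Illustrative only: depth from the positional deviation (disparity) between
# corresponding points in the two images of a binocular vision sensor.
def stereo_depth(disparity_px, focal_px, baseline_m):
    """Depth of a point from its disparity, the focal length in pixels,
    and the baseline between the two cameras."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a finite depth")
    return focal_px * baseline_m / disparity_px

# e.g. focal length 700 px, baseline 0.12 m, disparity 10.5 px -> depth of 8 m
print(stereo_depth(10.5, focal_px=700.0, baseline_m=0.12))
```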
  • the structured light depth camera projects light with certain structural characteristics into the target scene and then collects it; this structured light produces different image phase information depending on the depth of the regions of the subject, which is then converted into depth information to obtain a 3D point cloud.
  • the structured light depth camera is also a self-illuminating sensor: it does not depend on light source illumination, is less disturbed by ambient light, and can work normally even in a closed environment without light, so that a high-precision 3D model can be generated later, which gives it a wide range of applicability.
  • the second image acquisition device can acquire color images, grayscale images, infrared images, and the like.
  • the second image acquisition device includes at least one or more of the following: a visible light camera, a grayscale camera, and an infrared camera.
  • the second image acquisition device may capture a sequence of images at a specified frame rate.
  • the sequence of images may be captured at a standard frame rate such as about 24p, 25p, 30p, 48p, 50p, 60p, 72p, 90p, 100p, 120p, 300p, 50i or 60i.
  • alternatively, image sequences may be captured at lower rates, such as one image every second, every 2 seconds, every 5 seconds, or every 10 seconds.
  • the second image capture device may have adjustable capture parameters. Under different capture parameters, the second image capture device may capture different images despite being subjected to exactly the same external conditions (eg location, lighting). Capture parameters may include exposure (eg, exposure time, shutter speed, aperture, film speed), gain, gamma, region of interest, binning/subsampling, pixel clock, offset, trigger, ISO, and the like. Exposure-related parameters may control the amount of light reaching the image sensor in the second image capture device. For example, shutter speed can control the amount of time light reaches the image sensor and aperture can control the amount of light that reaches the image sensor in a given time. A gain-related parameter can control the amplification of the signal from the optical sensor. ISO controls the level of sensitivity of the camera to the available light.
  • a lidar, a visible light camera and an inertial measurement unit are installed on the pan/tilt load.
  • the lidar, the visible light camera, and the inertial measurement unit may operate at the same frame rate.
  • the lidar, the visible light camera, and the inertial measurement unit may also work at different frame rates from one another.
  • the pan-tilt payload can be mounted on a movable platform, and the pan-tilt payload can be carried by the movable platform to move in a target scene, so as to perform three-dimensional reconstruction of the target scene, which is convenient for subsequent use.
  • the three-dimensional model obtained by the three-dimensional reconstruction can be used for positioning tasks, mapping tasks, or navigation tasks, and the like.
  • the movable platform includes, but is not limited to, an unmanned aerial vehicle, an unmanned vehicle, or a mobile robot.
  • the gimbal load can be communicatively connected to a remote control terminal, and the user can control the gimbal load through the remote control terminal to collect data in real time, perform 3D reconstruction in real time, and transmit the 3D model obtained by the 3D reconstruction to the remote control terminal in real time.
  • the 3D model is displayed on the remote control terminal, so that the user can know the progress of the three-dimensional reconstruction in real time, which is convenient for the user; further, the user can also input feedback information for the three-dimensional model displayed on the remote control terminal.
  • the feedback information is used to indicate a position that the first image acquisition device missed during acquisition; the remote control terminal transmits the feedback information to the gimbal load, so that the gimbal load can control the first image acquisition device to collect the 3D point cloud at the position indicated by the feedback information in the target scene, realizing an online monitoring process, ensuring accurate and effective acquisition of the 3D point cloud data, and improving the efficiency of the 3D reconstruction.
  • the gimbal load 20 can be mounted on the unmanned aerial vehicle 10, and the gimbal load 20 and the unmanned aerial vehicle 10 are communicatively connected to the remote control terminal 30, so the user can control the UAV 10 and the gimbal load 20 through the remote control terminal 30.
  • the remote control terminal 30 can respond to the user's trigger operation and send a three-dimensional reconstruction instruction to the gimbal load 20.
  • the gimbal load 20 is installed with a first image acquisition device, a second image acquisition device, and an inertial measurement unit; the first image acquisition device acquires the three-dimensional point cloud of the target scene in real time in response to the three-dimensional reconstruction instruction;
  • the second image acquisition device acquires the image sequence of the target scene in real time in response to the three-dimensional reconstruction instruction;
  • the inertial measurement unit acquires the motion data of the gimbal load 20 in real time in response to the three-dimensional reconstruction instruction;
  • the motion data includes the pose information of the gimbal load during the acquisition of the three-dimensional point cloud and the image sequence of the target scene; the gimbal load 20 can then generate a three-dimensional model in real time based on the three-dimensional point cloud, the image sequence, and the motion data acquired in real time, so as to meet the real-time requirements.
  • the three-dimensional model can be transmitted to the remote control terminal 30 and displayed there, so that the user can know the progress of the three-dimensional reconstruction in real time, which is convenient for the user; the user can also control the flight of the unmanned aerial vehicle 10 through the remote control terminal 30, so that the unmanned aerial vehicle 10 carries the gimbal load 20 to different positions of the target scene, the gimbal load 20 can perform three-dimensional reconstruction at different positions of the target scene, and a complete three-dimensional map of the target scene is obtained, based on which the unmanned aerial vehicle can perform positioning tasks, measurement tasks, cruise tasks, and the like.
  • FIG. 3 is a schematic flowchart of a three-dimensional reconstruction method provided by an embodiment of the present application.
  • the method is applied to a gimbal load, and the method includes:
  • In step S101, a 3D point cloud of a target scene, an image sequence, and motion data of the gimbal load are acquired; wherein the 3D point cloud is acquired by the first image acquisition device, and the image sequence is acquired by the second image acquisition device.
  • the motion data includes the pose information of the gimbal load during the acquisition of the three-dimensional point cloud and the image sequence of the target scene.
  • In step S102, the three-dimensional point cloud is registered according to the motion data to obtain the relative poses between the three-dimensional point clouds; and the image sequence is registered to obtain the relative poses between the images.
  • In step S103, according to the motion data, the relative poses between the three-dimensional point clouds, and the relative poses between the images, the pose when the first image acquisition device acquires the three-dimensional point cloud and/or the pose when the second image acquisition device acquires the image is obtained.
  • In step S104, a 3D model of the target scene is generated according to the three-dimensional point cloud, the image sequence, and the pose when the first image acquisition device acquires the three-dimensional point cloud and/or the pose when the second image acquisition device acquires the image.
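  • As an informal sketch of how steps S101 to S104 fit together (the function names below are hypothetical placeholders, not the patent's implementation):

```python
# Hypothetical orchestration sketch: each step of S101-S104 is passed in as a
# callable so that the skeleton stays self-contained.
def reconstruct(acquire_data, register_clouds, register_images,
                optimize_poses, build_model):
    # S101: 3D point cloud, image sequence and gimbal-load motion data
    point_clouds, images, motion = acquire_data()
    # S102: register the point clouds (aided by the motion data) and the images
    cloud_rel = register_clouds(point_clouds, motion)
    image_rel = register_images(images)
    # S103: fuse motion data and relative poses into per-frame sensor poses
    poses = optimize_poses(motion, cloud_rel, image_rel)
    # S104: generate the 3D model of the target scene
    return build_model(point_clouds, images, poses)
```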
  • the target scene may be a scene where the satellite positioning signal is lower than a predetermined intensity.
  • for example, the target scene may be an indoor scene or an outdoor non-open scene (such as underwater, a mine, etc.); in this kind of scene, the three-dimensional reconstruction method of this embodiment can be used to generate a high-precision three-dimensional model.
  • the satellite positioning signals include, but are not limited to, GPS signals, signals from the Galileo Satellite Navigation System (GALILEO), or signals from the Beidou Satellite Navigation System (BDS).
  • the strength of the satellite positioning signal may be determined by at least one of the following parameters: the signal-to-noise ratio of the satellite positioning signal, or the cold start time or warm start time of the satellite positioning signal. In an exemplary embodiment, if the signal-to-noise ratio of the satellite positioning signal is lower than a preset threshold, it indicates that the strength of the satellite positioning signal is lower than the preset strength; in another exemplary embodiment, if the signal-to-noise ratio of the satellite positioning signal is lower than the preset threshold and the cold start time of the satellite positioning signal does not meet a preset time condition, it indicates that the strength of the satellite positioning signal is lower than the preset strength.
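  • A hedged sketch of such a signal-strength check (the threshold values are assumptions for illustration; the patent does not specify them):

```python
# Illustrative only: decide whether the satellite positioning signal is weaker
# than a preset strength, using the SNR and (optionally) the cold-start time.
def signal_below_preset_strength(snr_db, cold_start_s=None,
                                 snr_threshold_db=30.0, max_cold_start_s=60.0):
    low_snr = snr_db < snr_threshold_db
    if cold_start_s is None:
        return low_snr                    # first exemplary embodiment: SNR alone
    # second exemplary embodiment: low SNR AND cold start exceeds the time condition
    return low_snr and cold_start_s > max_cold_start_s

print(signal_below_preset_strength(25.0))          # True: SNR below threshold
print(signal_below_preset_strength(25.0, 20.0))    # False: cold start still OK
```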
  • the pan-tilt load can use a depth camera to acquire a 3D point cloud of the target scene in real time, and use a second image acquisition device to acquire an image sequence of the target scene in real time; and during acquisition of the 3D point cloud and the image sequence, Using an inertial measurement unit to collect motion data of the gimbal load in real time, the motion data corresponds to the pose information of the gimbal load.
  • This embodiment uses the three-dimensional point cloud, the image sequence, and the motion data acquired in real time to perform real-time three-dimensional reconstruction.
  • the motion data can indicate the pose information of the gimbal load, and the first image acquisition device and the second image acquisition device are arranged on the gimbal load, so the motion data can indirectly indicate the pose information of the first image acquisition device and the second image acquisition device; high-precision pose information is thereby obtained, so that the three-dimensional model generated based on the high-precision pose information has higher positioning accuracy and robustness. In addition, the process of real-time acquisition and real-time three-dimensional reconstruction is sufficient to meet the needs of scenes with real-time requirements, so that subsequent positioning or measurement can be performed directly based on the three-dimensional model generated in real time.
  • the gimbal load can be communicatively connected to the remote control terminal, and the gimbal load can generate a collection trajectory and an acquisition time according to the accuracy requirements and/or density requirements of the three-dimensional model input by the user on the remote control terminal, and then control the first image acquisition device, the second image acquisition device, and the inertial measurement unit to acquire data according to the collection trajectory and the acquisition time.
  • this embodiment implements automatic planning of the acquisition scheme based on the accuracy requirements and/or density requirements of the three-dimensional model, and controls the first image acquisition device, the second image acquisition device, and the inertial measurement unit to acquire data accordingly; the user does not need to participate in the process, which can effectively prevent human errors from being introduced in the acquisition process and improve the accuracy of the generated 3D model.
  • after acquiring the 3D point cloud, the image sequence, and the motion data, the gimbal load registers the 3D point cloud according to the motion data and obtains the relative poses between the 3D point clouds; the relative poses between the 3D point clouds are used to calibrate the 3D point clouds, thereby improving the accuracy of the subsequent 3D reconstruction results. The gimbal load also registers the image sequence to obtain the relative poses between the images; the relative poses between the images are used to determine the pose of the second image acquisition device, thereby improving the accuracy of the subsequent 3D reconstruction results.
  • each point in the 3D point cloud has an independent coordinate system, so as to ensure the accuracy of the 3D coordinates of the point in its corresponding coordinate system.
  • when the 3D point cloud is subsequently used for 3D reconstruction, the points in the 3D point cloud need to be in the same coordinate system, so the 3D point cloud needs to be reprojected: the motion of the lidar during the acquisition process is calculated, and this amount of motion is compensated on the corresponding laser points.
  • the three-dimensional point cloud is transformed by means of the motion data obtained by the inertial measurement unit installed on the same gimbal load as the first image acquisition device; the motion data obtained by the inertial measurement unit corresponds to the pose information of the gimbal load.
  • since the inertial measurement unit and the first image acquisition device are installed on the same gimbal load, the motion data acquired by the inertial measurement unit also indicates the motion process of the first image acquisition device, so the motion data measured by the inertial measurement unit can be used to accurately determine the transformation relationship between points in different coordinate systems.
  • the transformation relationship between the points can be obtained by using the motion data and the extrinsic parameter transformation relationship between the inertial measurement unit and the first image acquisition device.
  • the gimbal load obtains the transformation relationship between points according to the motion data corresponding to each point in the group of 3D point clouds, and then reprojects the points in the group of 3D point clouds to the same coordinate system according to the transformation relationship between the points.
  • a group of three-dimensional point clouds collected in a continuous time series are transformed into the same coordinate system, so as to realize the integration of three-dimensional point cloud data, which is beneficial to improve the efficiency of three-dimensional reconstruction.
  • the gimbal load acquires the motion data corresponding to each point according to the acquisition time of the point;
  • when the first image acquisition device is a lidar, the acquisition time of a point is determined by the reception time of the reflected laser pulse sequence; the acquisition time of the point is taken as the acquisition time of the corresponding motion data, that is, the gimbal load acquires from the inertial measurement unit the motion data measured at the same moment at which the point was acquired.
  • the first image acquisition device acquires each point at a preset time interval; when obtaining the transformation relationship between points, the motion data corresponding to each point in the group of point clouds can be used to determine the relative pose of the inertial measurement unit within the preset time interval, where the preset time interval is the time interval for acquiring adjacent points; the gimbal load then obtains the transformation relationship between the points based on the relative pose of the inertial measurement unit within the preset time interval and the extrinsic parameter transformation relationship between the first image acquisition device and the inertial measurement unit.
  • since the inertial measurement unit and the first image acquisition device are installed on the same gimbal load, the motion data acquired by the inertial measurement unit also indicates the motion process of the first image acquisition device; the motion data measured by the inertial measurement unit can therefore be used to accurately determine the transformation relationship between points in different coordinate systems, so that a group of three-dimensional point clouds collected in a continuous time series can be transformed into the same coordinate system based on the transformation relationship between the points, realizing the integration of the three-dimensional point cloud data and improving the efficiency of the three-dimensional reconstruction.
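  • A simplified, hedged sketch of this per-point motion compensation (the interfaces, per-point timestamps, and 4x4 matrix representation below are assumptions for illustration, not the patent's procedure):

```python
import numpy as np

def deskew_scan(points_xyz, point_times, imu_pose_at, T_imu_lidar):
    """Reproject every point of one lidar scan into the coordinate system of
    the scan's first point, compensating the motion measured by the IMU.

    points_xyz  : (N, 3) array; each point is in the lidar frame at its own capture time
    point_times : (N,) capture timestamps
    imu_pose_at : callable t -> 4x4 world-from-IMU pose at time t
    T_imu_lidar : 4x4 extrinsic transform from the lidar frame to the IMU frame
    """
    T_world_lidar0 = imu_pose_at(point_times[0]) @ T_imu_lidar
    T_lidar0_world = np.linalg.inv(T_world_lidar0)
    out = np.empty_like(points_xyz)
    for i, (p, t) in enumerate(zip(points_xyz, point_times)):
        T_world_lidar_t = imu_pose_at(t) @ T_imu_lidar   # lidar pose when this point was captured
        T = T_lidar0_world @ T_world_lidar_t             # lidar(t) frame -> lidar(t0) frame
        out[i] = (T @ np.append(p, 1.0))[:3]
    return out
```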
  • the gimbal load can obtain the relative poses between two adjacent groups of 3D point clouds; the relative poses between the two adjacent groups of 3D point clouds are used to reproject the two groups of 3D point clouds to the same coordinate system, so as to realize the matching and integration of the 3D point cloud data and facilitate the subsequent 3D reconstruction process.
  • the two adjacent groups of three-dimensional point clouds both include points collected for the same object, so as to ensure smooth matching of the two adjacent groups of three-dimensional point clouds.
  • the gimbal load acquires the surface features and/or corner features of each group of 3D point clouds, and then obtains the relative poses between the two adjacent groups of three-dimensional point clouds according to the surface features and/or corner features of the two adjacent groups of three-dimensional point clouds; the relative poses between the two adjacent groups of three-dimensional point clouds are used to reproject the two adjacent groups of 3D point clouds to the same coordinate system.
  • this embodiment uses the surface features and/or corner features of the 3D point clouds in space to perform high-speed matching, which simplifies the matching process, helps improve the efficiency of acquiring the relative poses between two adjacent groups of 3D point clouds, and shortens the matching time.
  • ICP: Iterative Closest Point/Plane.
  • the surface features and/or corner features of each group of three-dimensional point clouds may be determined according to the curvature information of the points in that group of three-dimensional point clouds.
  • the surface features of each group of 3D point clouds are determined according to curvature information whose curvature is greater than a preset threshold, and the corner features of each group of 3D point clouds are determined according to curvature information whose curvature is less than or equal to the preset threshold.
  • the matching process between points is thus converted into a matching process between the surface features and/or corner features of the three-dimensional point clouds in space, which simplifies the matching process and realizes high-speed matching of the surface features and/or corner features of the two adjacent groups of three-dimensional point clouds; this is further conducive to improving the efficiency of obtaining the relative poses between two adjacent groups of 3D point clouds, shortening the matching time, and meeting the needs of real-time generation of 3D models.
  • the surface feature includes at least one three-dimensional coordinate information indicating a plane and a normal vector of the plane; and/or the corner feature includes at least one three-dimensional coordinate information indicating an edge and a vector indicating the edge.
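  • A hedged sketch of the curvature-based split described above (the curvature proxy and threshold value are assumptions for illustration; which side of the threshold is taken as surface features and which as corner features follows the embodiment's description):

```python
import numpy as np

def local_curvature(points, k=5):
    """Curvature proxy per point of a scan line: norm of the sum of difference
    vectors to the k neighbours on each side, normalised by the point's range."""
    n = len(points)
    curv = np.zeros(n)
    for i in range(k, n - k):
        diffs = points[i - k:i + k + 1] - points[i]
        curv[i] = np.linalg.norm(diffs.sum(axis=0)) / (np.linalg.norm(points[i]) + 1e-9)
    return curv

def split_features(points, threshold=0.1, k=5):
    """Split one group of points into two feature sets by a preset curvature
    threshold (greater than the threshold vs. less than or equal to it)."""
    points = np.asarray(points, dtype=float)
    curv = local_curvature(points, k)
    return points[curv > threshold], points[curv <= threshold]
```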
  • the gimbal load may obtain the relative poses between the two adjacent groups of three-dimensional point clouds based on the similarity between the surface features and/or the similarity between the corner features of the two adjacent groups of three-dimensional point clouds.
  • the more similar the surface features of the two adjacent groups of 3D point clouds are, and/or the more similar their corner features are, the more likely it is that the points indicated by the surface features and/or corner features in the two adjacent groups of 3D point clouds match each other.
  • the relative poses between the two adjacent groups of three-dimensional point clouds are therefore obtained through the similarity between the surface features and/or the similarity between the corner features, so as to ensure the accuracy of the obtained results.
  • the similarity between the surface features of the two adjacent groups of three-dimensional point clouds is determined based on the distance between the surface features of the two adjacent groups of three-dimensional point clouds; and/or, the similarity between the corner features of the two adjacent groups of three-dimensional point clouds is determined based on the distance between the corner features of the two adjacent groups of three-dimensional point clouds. Here, "and/or" means both or either of the two.
  • an optimization of the form T* = argmin_T ( Σ ||d_surface(T)||² + Σ ||d_corner(T)||² ) can be solved, where T denotes the relative pose between the two adjacent groups of three-dimensional point clouds, d_surface represents the distance difference between the surface features of the two adjacent groups of three-dimensional point clouds, and d_corner represents the distance difference between the corner features of the two adjacent groups of three-dimensional point clouds.
  • the relative poses between two adjacent groups of 3D point clouds can be obtained through the transformation and expression of the Lie algebra.
  • this embodiment uses the surface features and the corner features to perform high-speed matching, which is beneficial to improving the matching efficiency and meeting the requirement of generating a three-dimensional model in real time.
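  • A hedged sketch of the distance differences such an optimization minimises (an assumed formulation consistent with the surface and corner features described above, not necessarily the patent's exact cost):

```python
import numpy as np

def point_to_plane_distance(p, plane_point, plane_normal):
    """Distance from point p to the plane given by a surface feature
    (3D coordinates of the plane plus its normal vector)."""
    n = plane_normal / np.linalg.norm(plane_normal)
    return abs(np.dot(p - plane_point, n))

def point_to_edge_distance(p, edge_point, edge_direction):
    """Distance from point p to the line given by a corner feature
    (3D coordinates of the edge plus a vector indicating the edge)."""
    d = edge_direction / np.linalg.norm(edge_direction)
    v = p - edge_point
    return np.linalg.norm(v - np.dot(v, d) * d)

# The relative pose between the two groups is the transform that minimises the
# sum of these (squared) distances over all matched surface and corner features.
```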
  • the image sequence includes a plurality of images acquired in a continuous time sequence.
  • the gimbal load can extract feature points from each of the images and then perform registration based on the feature points of the images to obtain the relative poses between the images.
  • the gimbal load may use a Bundle Adjustment (BA) algorithm to register the feature points of the images and obtain the relative poses between the images.
  • the depth map obtained by back-projection of the 3D point cloud can be used to assist the image registration process to improve the image registration accuracy.
  • the gimbal load can reproject the 3D point cloud by using the motion data, and generate a depth map corresponding to the image according to the reprojected 3D point cloud and the motion data. As mentioned above, each point in the 3D point cloud has independent coordinates; when the 3D point cloud is used for 3D reconstruction, each point needs to be in the same coordinate system, so the gimbal load uses the motion data to reproject the 3D point cloud so that each point in the reprojected 3D point cloud is in the same coordinate system. Further, the motion data obtained by the inertial measurement unit corresponds to the pose information of the gimbal load; since the inertial measurement unit, the second image acquisition device, and the first image acquisition device are all installed on the gimbal load, the motion processes of the first image acquisition device and the second image acquisition device are the same as or similar to the motion process of the gimbal load, so the motion data acquired by the inertial measurement unit can also indicate the motion processes of the first image acquisition device and the second image acquisition device. A depth map corresponding to the image can therefore be generated using the reprojected 3D point cloud and the motion data.
  • a polyhedron may be generated based on the reprojected 3D point cloud and then projected onto the camera plane according to the polyhedron and the motion data, so as to obtain the depth map corresponding to the image.
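  • A simplified, hedged sketch of rendering such a depth map (a pinhole camera model and a known camera pose are assumed; this is not the patent's exact projection procedure):

```python
import numpy as np

def render_depth_map(points_world, K, T_cam_world, height, width):
    """Project points that are already in a common coordinate system onto the
    camera plane and keep the nearest depth per pixel.

    points_world: (N, 3) points, K: 3x3 intrinsics, T_cam_world: 4x4 world -> camera.
    """
    depth = np.full((height, width), np.inf)
    pts_h = np.hstack([points_world, np.ones((len(points_world), 1))])
    pts_cam = (T_cam_world @ pts_h.T).T[:, :3]           # into the camera frame
    for x, y, z in pts_cam:
        if z <= 0:
            continue                                      # behind the camera
        u, v, _ = K @ np.array([x / z, y / z, 1.0])       # pinhole projection
        u, v = int(round(u)), int(round(v))
        if 0 <= v < height and 0 <= u < width:
            depth[v, u] = min(depth[v, u], z)             # keep the nearest point
    return depth
```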
  • the gimbal load also extracts the feature points in the images and obtains the depth information of the feature points from the depth map corresponding to each image; registration is then performed according to the feature points of the images and the depth information to obtain the relative poses between the images.
  • the depth map obtained by the back-projection of the three-dimensional point cloud is used to assist the registration process of the image, which is beneficial to improve the accuracy of image registration, thereby improving the accuracy of the subsequently generated three-dimensional model.
  • in the embodiment of the present application, the gimbal load uses the motion data to reproject the three-dimensional point cloud so that each point in the reprojected three-dimensional point cloud is in the same coordinate system; considering that the inertial measurement unit, the second image acquisition device, and the first image acquisition device are all installed on the gimbal load,
  • the motion data obtained by the inertial measurement unit can also indicate the motion processes of the first image acquisition device and the second image acquisition device, and the depth maps corresponding to the images in the image sequence can then be generated according to the reprojected three-dimensional point cloud and the motion data;
  • finally, registration is performed according to the feature points extracted from the depth maps to obtain the relative poses between the images.
  • the depth map obtained by the back-projection of the three-dimensional point cloud is used to perform registration instead of the image, so as to ensure that an accurate image registration result can be obtained even in the absence of light, thereby improving the accuracy of the subsequently generated three-dimensional model.
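  • A hedged sketch of depth-assisted image registration (it assumes OpenCV and a pinhole intrinsic matrix K; the feature type and PnP solver are illustrative choices, not necessarily the patent's):

```python
import cv2
import numpy as np

def relative_pose(img_a, depth_a, img_b, K):
    """Lift image A's feature points to 3D with its depth map, match them to
    image B, and estimate the relative pose from the 3D-2D correspondences."""
    orb = cv2.ORB_create()
    kp_a, des_a = orb.detectAndCompute(img_a, None)
    kp_b, des_b = orb.detectAndCompute(img_b, None)
    matches = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True).match(des_a, des_b)

    fx, fy, cx, cy = K[0, 0], K[1, 1], K[0, 2], K[1, 2]
    pts3d, pts2d = [], []
    for m in matches:
        u, v = kp_a[m.queryIdx].pt
        z = depth_a[int(v), int(u)]          # depth from the rendered depth map
        if not np.isfinite(z) or z <= 0:
            continue                          # skip feature points without depth
        pts3d.append([(u - cx) * z / fx, (v - cy) * z / fy, z])
        pts2d.append(kp_b[m.trainIdx].pt)

    ok, rvec, tvec, _ = cv2.solvePnPRansac(np.float32(pts3d), np.float32(pts2d), K, None)
    return ok, rvec, tvec                     # relative rotation / translation
```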
  • in order to meet the requirement of generating a 3D model online in real time, a distributed accelerator can be used to obtain the relative poses between the 3D point clouds and the relative poses between the images, so as to speed up the matching process and improve the efficiency of data acquisition.
  • the gimbal load can generate a three-dimensional model of the target scene in real time based on the three-dimensional point cloud, the image sequence, the motion data, the relative poses between the three-dimensional point clouds, and the relative poses between the images, so as to meet the requirement of generating a three-dimensional model online in real time.
  • after generating the 3D model, the gimbal load can transmit the 3D model to the remote control terminal in real time, so that the 3D model is displayed on the remote control terminal in real time and the user can know the progress of the 3D reconstruction in real time, which is convenient for the user.
  • if the user finds a missed location after seeing the 3D model, the user can also input feedback information for the 3D model displayed on the remote control terminal, where the feedback information is used to indicate the position that the first image acquisition device missed during acquisition; the remote control terminal transmits the feedback information to the gimbal load, so that the gimbal load can control the first image acquisition device to collect the 3D point cloud at the position indicated by the feedback information in the target scene, realizing the online monitoring process, ensuring accurate and effective acquisition of the 3D point cloud data, and helping to improve the efficiency of the 3D reconstruction.
  • the gimbal load obtains, according to the motion data, the relative poses between the 3D point clouds, and the relative poses between the images, the pose when the first image acquisition device acquires the three-dimensional point cloud and/or the pose when the second image acquisition device acquires the image, and generates a three-dimensional model in real time based on these poses.
  • This embodiment realizes the generation of a high-precision three-dimensional model based on the fusion of various data.
  • the gimbal load first establishes a pose graph model; the pose graph model includes, but is not limited to, a sliding window pose graph model. Then, the pose graph model is optimized by using the motion data, the relative poses between the three-dimensional point clouds, and the relative poses between the images, so as to obtain the pose when the first image acquisition device acquires the three-dimensional point cloud and/or the pose when the second image acquisition device acquires the image.
  • the motion data obtained by the inertial measurement unit corresponds to the pose information of the gimbal load; since the inertial measurement unit, the second image acquisition device, and the first image acquisition device are all installed on the gimbal load, the motion processes of the first image acquisition device and the second image acquisition device are the same as or similar to the motion process of the gimbal load, so the motion data acquired by the inertial measurement unit can also indicate the motion processes of the first image acquisition device and the second image acquisition device; therefore, the motion data can be used as one of the optimization factors for optimizing the pose graph model.
  • motion data includes acceleration, which is integrated to obtain velocity, and velocity is integrated to obtain relative displacement. The relative displacement can constrain the displacement between the two positions of the gimbal load.
  • the motion data includes the angular velocity, and the angle is obtained after the angular velocity is integrated, and then the relative attitude information is obtained.
  • the relative attitude can constrain the rotation between the two attitudes of the gimbal load.
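  • A minimal, hedged integration sketch of these IMU constraints (one-dimensional, with small time steps, and ignoring gravity compensation and sensor bias; illustrative only):

```python
def integrate_imu(samples, dt):
    """samples: list of (acceleration_mps2, angular_rate_radps) pairs.
    Returns the relative displacement and relative rotation angle, which act
    as constraints between two poses of the gimbal load."""
    velocity = displacement = angle = 0.0
    for accel, omega in samples:
        velocity += accel * dt          # acceleration -> velocity
        displacement += velocity * dt   # velocity -> relative displacement
        angle += omega * dt             # angular rate -> relative rotation
    return displacement, angle

# e.g. 100 samples at 100 Hz of 0.5 m/s^2 and 0.1 rad/s
print(integrate_imu([(0.5, 0.1)] * 100, 0.01))  # ~ (0.2525 m, 0.1 rad)
```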
  • each vertex is configured to indicate the pose when the first image acquisition device acquires the three-dimensional point cloud and/or the pose when the second image acquisition device acquires an image; and the edges between the vertices are configured to indicate the relative poses between the three-dimensional point clouds, the relative poses between the images, or the motion data. Each edge constitutes a constraint on its vertices; the relative pose between the three-dimensional point clouds, the relative pose between the images, or the motion data indicated by each edge is used to optimize, at each vertex, the pose when the first image acquisition device acquires the three-dimensional point cloud and/or the pose when the second image acquisition device acquires the image.
  • the pose graph model includes a sliding window pose graph model; in each iterative optimization within the window, the pose indicated by the i-th vertex (i is an integer greater than 1) and the pose indicated by at least one reference vertex (for example, the reference vertex can be the (i-j)-th vertex, where j is an integer greater than 0) are selected as the poses to be iteratively optimized in the window, and the relative pose between the three-dimensional point clouds, the relative pose between the images, or the motion data indicated by the edges between the vertices is used for the pose optimization.
  • the window is slid to select the pose indicated by the next vertex for optimization.
  • for example, referring to the pose graph structures shown in FIGS. 4 to 7, each vertex may indicate the pose when the first image acquisition device acquires the n-th three-dimensional point cloud (n is an integer greater than 0), and each edge indicates the relative pose between the three-dimensional point clouds, the relative pose between the images, or the motion data;
  • in another example, each vertex may indicate the pose when the second image acquisition device acquires the n-th image (n is an integer greater than 0),
  • and each edge likewise indicates the relative pose between the three-dimensional point clouds, the relative pose between the images, or the motion data.
  • the gimbal load may also be equipped with a locator, which is used to obtain positioning information such as GPS information and/or RTK information; the locator collects the positioning information during the process in which the first image acquisition device collects the three-dimensional point cloud and the second image acquisition device acquires the image sequence.
  • the edges between the vertices can also be configured to indicate GPS information and/or RTK information, so as to obtain high-precision pose information and improve the positioning accuracy of the generated three-dimensional model.
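  • A simplified, hedged pose-graph sketch (positions only, no rotations; SciPy is assumed, and the constraint values are made up for illustration): vertices are the poses to be estimated, and every edge contributes a relative-displacement constraint that may come from point cloud registration, image registration, IMU integration, or GPS/RTK information.

```python
import numpy as np
from scipy.optimize import least_squares

def optimize_positions(n_vertices, edges, anchor=np.zeros(3)):
    """edges: list of (i, j, measured_displacement_ij) with vertex indices i, j."""
    def residuals(x):
        p = x.reshape(n_vertices, 3)
        res = [p[0] - anchor]                    # fix the first vertex
        for i, j, meas in edges:
            res.append((p[j] - p[i]) - meas)     # violation of each edge constraint
        return np.concatenate(res)
    sol = least_squares(residuals, np.zeros(n_vertices * 3))
    return sol.x.reshape(n_vertices, 3)

# Three vertices with constraints from two sensors plus a loop-like edge.
edges = [(0, 1, np.array([1.0, 0.0, 0.0])),
         (1, 2, np.array([1.0, 0.1, 0.0])),
         (0, 2, np.array([2.0, 0.0, 0.0]))]
print(optimize_positions(3, edges))
```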
  • the gimbal load can use the pose when the first image acquisition device acquires the three-dimensional point cloud to reproject the three-dimensional point cloud to a target coordinate system (for example, the target coordinate system can be the world coordinate system), generate an initial three-dimensional model according to the reprojected three-dimensional point cloud, and then perform texture mapping on the initial three-dimensional model according to the pose when the second image acquisition device acquires the image and the multiple images in the image sequence, so as to obtain the final three-dimensional model.
  • the gimbal load may add a tag index to the 3D model and then store the 3D model in the storage location pointed to by the tag index, so that the 3D model can subsequently be found through the tag index; and/or, the 3D model is transmitted to the remote control terminal in real time, so that the 3D model is displayed on the remote control terminal in real time, the progress of the 3D reconstruction can be shown to the user in real time, and the user's use is facilitated.
  • the gimbal load can also perform loop closure detection according to the 3D point cloud or the image sequence, and optimize the pose graph model according to the result of the loop closure detection, thereby optimizing the 3D model and ensuring the accuracy and global consistency of the optimized 3D model.
  • a bag-of-words model may be constructed according to the image sequence or the depth maps obtained by reprojection of the three-dimensional point cloud, so that the bag-of-words model can measure the similarity between the multiple images in the image sequence or the similarity between the multiple depth maps.
  • the gimbal load can perform loop closure detection according to the similarity between the multiple images in the image sequence and determine the images forming a loop; or, it can perform loop closure detection according to the similarity between the multiple depth maps obtained by reprojecting the three-dimensional point cloud, determine the depth maps forming a loop, and then determine the images corresponding to those depth maps.
  • the pose graph model is then globally optimized, that is, the relative poses between the images forming the loop are used to globally optimize the pose graph model, and the optimized pose information is obtained, that is, the optimized pose when the first image acquisition device collects the three-dimensional point cloud and/or the optimized pose when the second image acquisition device acquires the image; the three-dimensional model is then updated according to the globally optimized pose graph model, that is, according to the optimized pose information, which ensures the accuracy and global consistency of the optimized three-dimensional model.
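  • A hedged sketch of such similarity-based loop closure detection (it assumes that bag-of-words histograms have already been computed for each image or depth map; the threshold and gap values are illustrative):

```python
import numpy as np

def cosine_similarity(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def detect_loop(histograms, new_hist, threshold=0.8, min_gap=30):
    """Compare a new frame's histogram against earlier frames (skipping the
    most recent min_gap frames) and return the index forming a loop, or None."""
    best_idx, best_sim = None, threshold
    for idx, hist in enumerate(histograms[: max(0, len(histograms) - min_gap)]):
        sim = cosine_similarity(hist, new_hist)
        if sim > best_sim:
            best_idx, best_sim = idx, sim
    return best_idx
```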
  • the down-sampled global model and the three-dimensional point cloud acquired by the first image acquisition device may also be used to perform loop closure detection, so as to determine the relative poses between the three-dimensional point clouds forming a loop.
  • the gimbal load can then globally optimize the pose graph model by using the result of the loop closure detection, that is, the relative poses between the three-dimensional point clouds forming the loop are used to globally optimize the pose graph model, and the optimized pose information is obtained, that is, the optimized pose when the first image acquisition device collects the 3D point cloud and/or the optimized pose when the second image acquisition device acquires the image; the three-dimensional model is then updated according to the globally optimized pose graph model, that is, according to the optimized pose information, so as to ensure the accuracy and global consistency of the optimized three-dimensional model.
  • in this case, the edges in the pose graph model are also configured to indicate the relative poses between the images forming the loop or the relative poses between the three-dimensional point clouds forming the loop; the pose graph model is thereby optimized, and the three-dimensional model is updated according to the globally optimized pose graph model.
  • the gimbal load can store the updated 3D model in the storage location pointed to by the corresponding tag index; and/or transmit the updated 3D model to the remote control terminal in real time, so as to display the 3D model on the remote control terminal in real time, allowing the user to know the progress of the 3D reconstruction in real time, which is convenient for the user; if the user finds a missed location after seeing the 3D model, the user can also input feedback information for the 3D model displayed on the remote control terminal.
  • the feedback information is used to indicate the position that the first image acquisition device missed during acquisition, and the remote control terminal transmits the feedback information to the gimbal load, so that the gimbal load can control the first image acquisition device to collect the three-dimensional point cloud at the position indicated by the feedback information in the target scene, realizing the online monitoring process, ensuring accurate and effective acquisition of the three-dimensional point cloud data, and helping to improve the efficiency of the three-dimensional reconstruction.
  • an embodiment of the present application further provides a pan-tilt load 20, including a first image acquisition device 21, a second image acquisition device 22, an inertial measurement unit 23, a memory 24 for storing executable instructions, and a processor 25.
  • the first image acquisition device 21 is used to acquire a three-dimensional point cloud of the target scene.
  • the second image acquisition device 22 is configured to acquire an image sequence of the target scene.
  • the inertial measurement unit 23 is configured to acquire motion data of the gimbal load; the motion data includes the pose information of the gimbal load.
  • when the processor 25 executes the executable instructions, it is configured to:
  • register the three-dimensional point cloud according to the motion data to obtain the relative poses between the three-dimensional point clouds, and register the image sequence to obtain the relative poses between the images;
  • obtain, according to the motion data, the relative poses between the three-dimensional point clouds, and the relative poses between the images, the pose when the first image acquisition device acquires the three-dimensional point cloud and/or the pose when the second image acquisition device acquires the image;
  • generate a 3D model of the target scene according to the 3D point cloud, the image sequence, and the pose when the first image acquisition device acquires the 3D point cloud and/or the pose when the second image acquisition device acquires the image.
  • the processor 25 executes the executable instructions stored in the memory 24; the processor 25 may be a Central Processing Unit (CPU), or another general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like.
  • a general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
  • the memory 24 stores executable instructions for the three-dimensional reconstruction method, and the memory 24 may include at least one type of storage medium, including flash memory, hard disk, multimedia card, card-type memory (e.g., SD or DX memory, etc.), Random Access Memory (RAM), Static Random Access Memory (SRAM), Read Only Memory (ROM), Electrically Erasable Programmable Read Only Memory (EEPROM), Programmable Read Only Memory (PROM), magnetic memory, magnetic disk, optical disc, and so on. Also, the pan-tilt load 20 may cooperate with a network storage device that performs the storage function of the memory through a network connection.
  • the memory 24 may be an internal storage unit of the gimbal load 20 , such as a hard disk or a memory of the gimbal load 20 .
  • the memory 24 can also be an external storage device of the gimbal load 20, such as a pluggable hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash memory card (Flash Card) equipped on the gimbal load 20. Further, the memory 24 may also include both an internal storage unit of the gimbal load 20 and an external storage device. The memory 24 is used to store the executable instructions and other programs and data required by the device, and may also be used to temporarily store data that has been or will be output.
  • the various embodiments described herein can be implemented using computer readable media such as computer software, hardware, or any combination thereof.
  • the embodiments described herein can be implemented using application specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field programmable gate arrays (FPGAs), processors, controllers, microcontrollers, microprocessors, or electronic units designed to perform the functions described herein.
  • embodiments such as procedures or functions may be implemented with separate software modules that allow the performance of at least one function or operation.
  • the software codes may be implemented by a software application (or program) written in any suitable programming language, which may be stored in the memory and executed by the processor.
  • FIG. 8 is only an example of the pan-tilt load 20 and does not constitute a limitation on the pan-tilt load 20, which may include more or fewer components than shown, or combine certain components, or use different components; for example, the pan-tilt load may also include input and output devices, network access devices, buses, and the like.
  • the first image acquisition device 21 is specifically configured to acquire the 3D point cloud of the target scene in real time; the second image acquisition device 22 is specifically configured to acquire the image sequence of the target scene in real time; and the inertial measurement unit 23 is specifically configured to acquire motion data of the pan-tilt load in real time.
  • the target scene includes an indoor scene or an outdoor non-open scene, and/or a scene where the satellite positioning signal is lower than a predetermined intensity.
  • the first image acquisition device 21 includes at least one or more of the following: lidar, binocular vision sensor, and structured light depth camera;
  • the second image acquisition device 22 includes at least one or more of the following: a visible light camera, a grayscale camera, and an infrared camera.
  • the processor 25 is specifically configured to: generate a three-dimensional model of the target scene in real time according to the three-dimensional point cloud, the image sequence, the pose of the first image acquisition device 21 when acquiring the three-dimensional point cloud and/or the pose of the second image acquisition device 22 when acquiring the image.
  • when acquiring the relative pose between the three-dimensional point clouds, the processor 25 is further configured to: for a group of three-dimensional point clouds collected in a continuous time series, acquire the transformation relationship between the points according to the motion data corresponding to each point in the group; and reproject the points in the group of three-dimensional point clouds into the same coordinate system according to the transformation relationship between the points.
  • the processor 25 is further configured to: determine the relative pose of the inertial measurement unit 23 within a preset time interval according to the motion data corresponding to each point in the group of three-dimensional point clouds, where the preset time interval is the time interval for acquiring adjacent points; the motion data corresponding to each point is determined according to the acquisition time of that point.
  • when acquiring the relative pose between the three-dimensional point clouds, the processor 25 is further configured to: for two adjacent groups of three-dimensional point clouds, acquire the surface features and/or corner features of each group of three-dimensional point clouds, where both of the two adjacent groups include points collected from the same object; and acquire the relative pose between the two adjacent groups of three-dimensional point clouds according to those features, the relative pose being used to reproject the two adjacent groups of three-dimensional point clouds into the same coordinate system.
  • the surface features and/or corner features of each group of three-dimensional point clouds are determined according to curvature information between three-dimensional points in the group of three-dimensional point clouds.
  • the surface features of each group of three-dimensional point clouds are determined according to curvature information whose curvature is greater than a preset threshold;
  • the corner features of each group of three-dimensional point clouds are determined according to curvature information whose curvature is less than or equal to a preset threshold.
  • the processor 25 is further configured to: acquire the relative pose between the three-dimensional point clouds according to the similarity between the surface features of the two adjacent groups of three-dimensional point clouds and/or the similarity between the corner features of the two adjacent groups of three-dimensional point clouds.
  • the similarity between the surface features of the adjacent two groups of three-dimensional point clouds is determined based on the distance between the surface features of the two adjacent groups of three-dimensional point clouds; and/or,
  • the similarity between the corner features of the adjacent two groups of three-dimensional point clouds is determined based on the distance between the corner features of the two adjacent groups of three-dimensional point clouds.
  • the surface feature includes at least one three-dimensional coordinate information indicating a plane and a normal vector of the plane; and/or,
  • the corner feature includes at least one three-dimensional coordinate information indicating an edge and a vector indicating the edge.
  • the sequence of images includes a plurality of images acquired in a continuous time sequence; the processor 25 is further configured to extract feature points in the images, and perform registration based on the feature points of the images to obtain the relative poses between the images.
  • the sequence of images includes a plurality of images acquired in a continuous time sequence; the processor 25 is further configured to: perform registration according to the feature points of the images and the depth information of the feature points, and acquire the relative poses between the images.
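  • By way of illustration only, the following Python sketch shows one way such a registration step could be realised once feature points of two images have been matched and back-projected to 3D using their depth information. It is not the claimed method; the function name and the closed-form Kabsch/SVD alignment used here are assumptions introduced for the example.

```python
import numpy as np

def relative_pose_from_matches(pts_a, pts_b):
    """Estimate the relative pose (R, t) mapping the 3D feature points of image A
    onto the matched 3D feature points of image B, using a closed-form Kabsch/SVD
    alignment. pts_a, pts_b: (N, 3) arrays of matched points obtained by
    back-projecting matched 2D feature points with their depth."""
    ca, cb = pts_a.mean(axis=0), pts_b.mean(axis=0)
    H = (pts_a - ca).T @ (pts_b - cb)            # 3x3 cross-covariance of centred points
    U, _, Vt = np.linalg.svd(H)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ D @ U.T                           # rotation, with any reflection removed
    t = cb - R @ ca                              # translation
    return R, t
```

  In practice such a closed-form estimate would typically be wrapped in an outlier-rejection scheme (e.g. RANSAC over the matches); that refinement is omitted here for brevity.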
  • when acquiring the relative pose between the images, the processor 25 is further configured to: in the absence of light, reproject the three-dimensional point cloud using the motion data, generate depth maps corresponding to the images in the image sequence according to the reprojected three-dimensional point cloud and the motion data, and perform registration according to feature points extracted from the depth maps to obtain the relative poses between the images.
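  • As an illustrative sketch only (not the claimed method), a depth map can be rendered from a reprojected point cloud by projecting the points through a pinhole camera model; the intrinsic matrix `K` and all names below are assumptions introduced for the example.

```python
import numpy as np

def render_depth_map(points_cam, K, width, height):
    """Render a depth map from a point cloud expressed in the camera frame.

    points_cam: (N, 3) points in the camera coordinate system (z pointing forward)
    K:          (3, 3) pinhole intrinsic matrix
    Returns a (height, width) array holding the nearest depth per pixel, 0 where empty.
    """
    depth = np.zeros((height, width), dtype=np.float32)
    z = points_cam[:, 2]
    p = points_cam[z > 1e-3]                          # keep points in front of the camera
    uv = (K @ (p / p[:, 2:3]).T).T[:, :2]             # perspective projection to pixel coordinates
    u = np.round(uv[:, 0]).astype(int)
    v = np.round(uv[:, 1]).astype(int)
    inside = (u >= 0) & (u < width) & (v >= 0) & (v < height)
    for ui, vi, zi in zip(u[inside], v[inside], p[inside][:, 2]):
        if depth[vi, ui] == 0 or zi < depth[vi, ui]:  # keep the closest point per pixel
            depth[vi, ui] = zi
    return depth
```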
  • the processor 25 is further configured to: optimize a pose graph model using the motion data, the relative poses between the three-dimensional point clouds, and the relative poses between the images, and obtain the pose of the first image acquisition device 21 when acquiring the three-dimensional point cloud and/or the pose of the second image acquisition device 22 when acquiring the image.
  • each vertex is configured to indicate the pose of the first image acquisition device 21 when acquiring a three-dimensional point cloud and/or the pose of the second image acquisition device 22 when acquiring an image.
  • edges between the vertices are configured to indicate relative poses between the three-dimensional point clouds, relative poses between the images, or the motion data.
  • the edges between the various vertices are further configured to indicate GPS information and/or RTK information.
  • the pose graph model includes a sliding window pose graph model.
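  • Purely as an illustration of the structure described above, the following Python sketch models a sliding-window pose graph whose vertices hold acquisition-device poses and whose edges hold relative-pose constraints derived from point cloud registration, image registration, the motion data, or GPS/RTK information. The class and field names and the simple residual shown here are assumptions for the example, not the applicant's implementation.

```python
from dataclasses import dataclass, field
from collections import deque
import numpy as np

@dataclass
class Vertex:
    stamp: float
    pose: np.ndarray                 # 4x4 pose of the acquisition device at this time

@dataclass
class Edge:
    i: int                           # index of the first vertex
    j: int                           # index of the second vertex
    relative_pose: np.ndarray        # 4x4 measured transform from vertex i to vertex j
    source: str                      # "point_cloud", "image", "imu", "gps_rtk", ...
    information: np.ndarray = field(default_factory=lambda: np.eye(6))

class SlidingWindowPoseGraph:
    """Minimal container for a pose graph: only the vertices inside the sliding
    window take part in local optimization; older vertices are kept but fixed."""

    def __init__(self, window_size=20):
        self.vertices = []
        self.edges = []
        self.window = deque(maxlen=window_size)

    def add_vertex(self, stamp, pose):
        self.vertices.append(Vertex(stamp, pose))
        idx = len(self.vertices) - 1
        self.window.append(idx)
        return idx

    def add_edge(self, i, j, relative_pose, source, information=None):
        info = np.eye(6) if information is None else information
        self.edges.append(Edge(i, j, relative_pose, source, info))

    def residual(self, edge):
        """Error of one constraint: how far the measured relative pose is from the
        relative pose implied by the current vertex estimates."""
        Ti, Tj = self.vertices[edge.i].pose, self.vertices[edge.j].pose
        err = np.linalg.inv(edge.relative_pose) @ np.linalg.inv(Ti) @ Tj
        return err[:3, 3], err[:3, :3]   # translation error and residual rotation matrix
```

  A nonlinear least-squares solver would iterate over the edges whose endpoints lie in the window and minimise the stacked residuals; that solver is not shown here.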
  • the processor 25 is further configured to: after adding a label index to the 3D model, store the 3D model in the storage location pointed to by the label index; and/or transmit the 3D model to the remote control terminal in real time, so that the 3D model is displayed on the remote control terminal in real time.
  • the processor 25 is further configured to:
  • Loop closure detection is performed according to the similarity between multiple images in the image sequence; or, loop closure detection is performed according to the similarity between multiple depth maps obtained by re-projection of the 3D point cloud;
  • the pose graph model is globally optimized using the result of loop closure detection; and the three-dimensional model is updated according to the globally optimized pose graph model.
  • the processor 25 is further configured to: perform global optimization on the pose graph model by using the relative pose between the images forming the loop or the relative pose between the points forming the loop.
  • a bag-of-words model is used to measure the similarity between multiple images in the image sequence or the similarity between the multiple depth maps.
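  • For illustration only, a bag-of-words similarity measure of the kind mentioned above can be sketched as follows, assuming local feature descriptors and a pre-trained visual vocabulary (cluster centres) are already available; the thresholds and function names are assumptions introduced for the example.

```python
import numpy as np

def bow_histogram(descriptors, vocabulary):
    """Quantise local feature descriptors (e.g. from the images or depth maps)
    against a visual vocabulary (cluster centres) and return an L2-normalised
    word histogram."""
    # squared distances between every descriptor and every visual word
    d2 = ((descriptors[:, None, :] - vocabulary[None, :, :]) ** 2).sum(axis=2)
    words = d2.argmin(axis=1)
    hist = np.bincount(words, minlength=len(vocabulary)).astype(np.float64)
    norm = np.linalg.norm(hist)
    return hist / norm if norm > 0 else hist

def loop_closure_candidates(histograms, current, threshold=0.8, min_gap=30):
    """Return indices of earlier frames whose bag-of-words histogram is similar
    enough to the current frame to be considered loop closure candidates."""
    past = histograms[:max(current - min_gap, 0)]   # skip the most recent frames
    sims = past @ histograms[current]               # cosine similarity (rows are normalised)
    return np.nonzero(sims > threshold)[0]
```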
  • the processor 25 is further configured to: generate an acquisition trajectory and an acquisition time according to an accuracy requirement and/or a density requirement of the three-dimensional model input by the user on the remote control terminal; and control the first image acquisition device 21, the second image acquisition device 22 and the inertial measurement unit 23 to acquire data according to the acquisition trajectory and the acquisition time.
  • the processor 25 is further configured to:
  • receive feedback information transmitted by the remote control terminal, where the feedback information is used to indicate a position missed by the first image acquisition device 21 and is input by the user with respect to the three-dimensional model displayed on the remote control terminal; and control the first image acquisition device 21 to acquire the three-dimensional point cloud at the position indicated by the feedback information in the target scene.
  • an embodiment of the present application also provides a movable platform, including the above-mentioned PTZ load.
  • the movable platform includes, but is not limited to, an unmanned aerial vehicle, an unmanned vehicle or a mobile robot.
  • a non-transitory computer-readable storage medium is also provided, such as a memory including instructions executable by a processor of an apparatus to perform the above-described method.
  • the non-transitory computer-readable storage medium may be ROM, random access memory (RAM), CD-ROM, magnetic tape, floppy disk, optical data storage device, and the like.
  • a non-transitory computer-readable storage medium, wherein, when the instructions in the storage medium are executed by the processor of the terminal, the terminal is enabled to execute the above method.


Abstract

A three-dimensional reconstruction method, a gimbal load (20), a removable platform and a computer-readable storage medium. The method comprises: acquiring three-dimensional point clouds of a target scene, an image sequence, and movement data of a gimbal load (20), wherein the three-dimensional point clouds are acquired by a first image collection apparatus (21), the image sequence is acquired by a second image collection apparatus (22), and the movement data comprises pose information of the gimbal load (20) during the acquisition of the three-dimensional point clouds of the target scene and the image sequence; registering the three-dimensional point clouds according to the movement data, so as to acquire a relative pose between the three-dimensional point clouds, and registering the image sequence, so as to acquire a relative pose between images; and generating a three-dimensional model of the target scene according to the three-dimensional point clouds, the image sequence, the movement data, the relative pose between the three-dimensional point clouds and the relative pose between the images. A high-precision three-dimensional model is acquired.

Description

三维重建方法、云台负载、可移动平台以及计算机可读存储介质Three-dimensional reconstruction method, pan-tilt load, movable platform, and computer-readable storage medium 技术领域technical field
本申请涉及三维重建技术领域,具体而言,涉及一种三维重建方法、云台负载、可移动平台以及计算机可读存储介质。The present application relates to the technical field of three-dimensional reconstruction, and in particular, to a three-dimensional reconstruction method, a PTZ load, a movable platform, and a computer-readable storage medium.
背景技术Background technique
同步定位与建图(Simultaneously Localization and Mapping,SLAM)是一种机器人在未知环境中估计自身运动并且建立周围环境地图的技术。它在无人机、自动驾驶、移动机器人导航、虚拟现实和增强现实等领域有着广泛的应用。Simultaneous Localization and Mapping (SLAM) is a technology for robots to estimate their own motion and build a map of the surrounding environment in an unknown environment. It has a wide range of applications in drones, autonomous driving, mobile robot navigation, virtual reality and augmented reality.
当机器人处于室外环境时,可通过高精度的GPS信号和先验地图实现全局地图下的定位。但当机器人处于GPS信号不可达的环境时,通常基于摄像头来实现SLAM方法,该方式利用图像进行三维重建,并利用三维重建后的地图来感知自身的运动和周围的环境,但从单一的图像中获取的信息有限,因此,该种方式存在定位精度低,三维模型构建质量不高的问题。When the robot is in an outdoor environment, the positioning under the global map can be achieved through high-precision GPS signals and prior maps. However, when the robot is in an environment where GPS signals cannot be reached, the SLAM method is usually implemented based on the camera. This method uses images for 3D reconstruction, and uses the 3D reconstructed map to perceive its own motion and the surrounding environment, but from a single image Therefore, this method has the problems of low positioning accuracy and low quality of 3D model construction.
发明内容SUMMARY OF THE INVENTION
有鉴于此,本申请的目的之一是提供一种三维重建方法、云台负载、可移动平台以及计算机可读存储介质。In view of this, one of the objectives of the present application is to provide a three-dimensional reconstruction method, a pan-tilt load, a movable platform, and a computer-readable storage medium.
第一方面,本申请实施例提供了一种三维重建方法,应用于云台负载上,所述云台负载设有第一图像采集装置、第二图像采集装置,所述方法包括:In a first aspect, an embodiment of the present application provides a three-dimensional reconstruction method, which is applied to a pan-tilt load, and the pan-tilt payload is provided with a first image acquisition device and a second image acquisition device, and the method includes:
获取目标场景的三维点云、图像序列以及所述云台负载的运动数据;其中,所述三维点云由所述第一图像采集装置获取,所述图像序列由所述第二图像采集装置获取;所述运动数据包括:在获取目标场景的三维点云、图像序列的期间,所述云台负载的位姿信息;;Acquiring a 3D point cloud, an image sequence, and motion data of the pan/tilt load of the target scene; wherein, the 3D point cloud is acquired by the first image acquisition device, and the image sequence is acquired by the second image acquisition device ; The motion data includes: during the acquisition of the three-dimensional point cloud and image sequence of the target scene, the pose information of the load on the PTZ;
根据所述运动数据对所述三维点云进行配准,获取三维点云之间的相对位姿;以及,对所述图像序列进行配准,获取图像之间的相对位姿;The three-dimensional point cloud is registered according to the motion data to obtain the relative pose between the three-dimensional point clouds; and the image sequence is registered to obtain the relative pose between the images;
根据所述运动数据、所述三维点云之间的相对位姿以及所述图像之间的相对位姿,获取所述第一图像采集装置获取三维点云时的位姿和/或所述第二图像采集装置获取图像时的位姿;According to the motion data, the relative pose between the three-dimensional point clouds, and the relative pose between the images, obtain the pose and/or the first image acquisition device when acquiring the three-dimensional point cloud. 2. the pose when the image acquisition device acquires the image;
根据所述三维点云、所述图像序列以及所述第一图像采集装置获取三维点云时的位姿和/或所述第二图像采集装置获取图像时的位姿,生成目标场景的三维模型。A 3D model of the target scene is generated according to the 3D point cloud, the sequence of images, and the pose when the first image acquisition device acquires the 3D point cloud and/or the pose when the second image acquisition device acquires the image .
第二方面,本申请实施例提供了一种云台负载,包括第一图像采集装置、第二图像采集装置、惯性测量单元、用于存储可执行指令的存储器以及处理器;In a second aspect, an embodiment of the present application provides a pan-tilt load, including a first image acquisition device, a second image acquisition device, an inertial measurement unit, a memory for storing executable instructions, and a processor;
所述第一图像采集装置用于获取目标场景的三维点云;The first image acquisition device is used to acquire a three-dimensional point cloud of the target scene;
所述第二图像采集装置用于获取所述目标场景的图像序列;The second image acquisition device is used for acquiring the image sequence of the target scene;
在采集所述三维点云和所述图像序列期间,所述惯性测量单元用于获取所述云台负载的运动数据;所述运动数据包括所述云台负载的位姿信息;During the collection of the three-dimensional point cloud and the image sequence, the inertial measurement unit is used to obtain motion data of the gimbal load; the motion data includes the pose information of the gimbal load;
当所述处理器执行所述可执行指令时,被配置为:When the processor executes the executable instructions, it is configured to:
根据所述运动数据对所述三维点云进行配准,获取三维点云之间的相对位姿;以及,对所述图像序列进行配准,获取图像之间的相对位姿;The three-dimensional point cloud is registered according to the motion data to obtain the relative pose between the three-dimensional point clouds; and the image sequence is registered to obtain the relative pose between the images;
根据所述运动数据、所述三维点云之间的相对位姿以及所述图像之间的相对位姿,获取所述第一图像采集装置获取三维点云时的位姿和/或所述第二图像采集装置获取图像时的位姿;According to the motion data, the relative pose between the three-dimensional point clouds, and the relative pose between the images, obtain the pose and/or the first image acquisition device when acquiring the three-dimensional point cloud. 2. the pose when the image acquisition device acquires the image;
根据所述三维点云、所述图像序列以及所述第一图像采集装置获取三维点云时的位姿和/或所述第二图像采集装置获取图像时的位姿,生成目标场景的三维模型。A 3D model of the target scene is generated according to the 3D point cloud, the sequence of images, and the pose when the first image acquisition device acquires the 3D point cloud and/or the pose when the second image acquisition device acquires the image .
第三方面,本申请实施例提供了一种可移动平台,包括第二方面所述的云台负载。In a third aspect, an embodiment of the present application provides a movable platform, including the PTZ load described in the second aspect.
第四方面,本申请实施例提供了一种计算机可读存储介质,所述计算机可读存储介质存储有计算机可执行指令,所述计算机可执行指令被处理器执行时实现如第一方面所述的方法。In a fourth aspect, an embodiment of the present application provides a computer-readable storage medium, where the computer-readable storage medium stores computer-executable instructions, and when the computer-executable instructions are executed by a processor, the implementation is as described in the first aspect Methods.
A three-dimensional reconstruction method, a pan-tilt load, a movable platform, and a computer-readable storage medium are provided by the embodiments of the present application. A multi-sensor approach is used to acquire a three-dimensional point cloud and an image sequence of a target scene, and, while the three-dimensional point cloud and the image sequence are being acquired, motion data of the pan-tilt load is acquired, the motion data corresponding to the pose information of the pan-tilt load. The three-dimensional point cloud is then registered according to the motion data to obtain the relative pose between the three-dimensional point clouds, and the image sequence is registered to obtain the relative pose between the images. Next, according to the motion data, the relative pose between the three-dimensional point clouds and the relative pose between the images, the pose of the first image acquisition device when acquiring the three-dimensional point cloud and/or the pose of the second image acquisition device when acquiring the image is obtained. Finally, a three-dimensional model of the target scene is generated according to the three-dimensional point cloud, the image sequence, and the pose of the first image acquisition device when acquiring the three-dimensional point cloud and/or the pose of the second image acquisition device when acquiring the image. In this embodiment, multiple kinds of data are acquired by means of multiple sensors, and the three-dimensional point cloud, the image sequence and the motion data are used for three-dimensional reconstruction. Since the three-dimensional point cloud can indicate the three-dimensional information of the target scene, the motion data includes the pose information of the pan-tilt load, and the first image acquisition device and the second image acquisition device are arranged on the pan-tilt load, the motion data can indirectly indicate the pose information of the first image acquisition device and the second image acquisition device. High-precision pose information is thereby obtained, so that the three-dimensional model generated based on the high-precision pose information has higher positioning accuracy and robustness.
附图说明Description of drawings
为了更清楚地说明本申请实施例中的技术方案,下面将对实施例描述中所需要使用的附图作简单地介绍,显而易见地,下面描述中的附图仅仅是本申请的一些实施例,对于本领域普通技术人员来讲,在不付出创造性劳动性的前提下,还可以根据这些附图获得其他的附图。In order to illustrate the technical solutions in the embodiments of the present application more clearly, the following briefly introduces the drawings that are used in the description of the embodiments. Obviously, the drawings in the following description are only some embodiments of the present application. For those of ordinary skill in the art, other drawings can also be obtained from these drawings without creative labor.
图1是本申请一个实施例提供的一种云台负载的结构示意图;1 is a schematic structural diagram of a pan-tilt load provided by an embodiment of the present application;
图2是本申请一个实施例提供的一种应用场景图;FIG. 2 is an application scenario diagram provided by an embodiment of the present application;
图3是本申请一个实施例提供的一种三维重建方法的流程示意图;3 is a schematic flowchart of a three-dimensional reconstruction method provided by an embodiment of the present application;
图4、图5、图6以及图7是本申请一个实施例提供的位姿图模型的不同结构示意图;4 , 5 , 6 and 7 are schematic diagrams of different structures of a pose graph model provided by an embodiment of the present application;
图8是本申请一个实施例提供的另一种云台负载的结构示意图。FIG. 8 is a schematic structural diagram of another pan-tilt load provided by an embodiment of the present application.
具体实施方式Detailed ways
下面将结合本申请实施例中的附图,对本申请实施例中的技术方案进行清楚、完整地描述,显然,所描述的实施例仅仅是本申请一部分实施例,而不是全部的实施例。基于本申请中的实施例,本领域普通技术人员在没有做出创造性劳动前提下所获得的所有其他实施例,都属于本申请保护的范围。The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application. Obviously, the described embodiments are only a part of the embodiments of the present application, but not all of the embodiments. Based on the embodiments in the present application, all other embodiments obtained by those of ordinary skill in the art without creative efforts shall fall within the protection scope of the present application.
同步定位与建图(Simultaneously Localization and Mapping,SLAM)是一种机器人在未知环境中估计自身运动并且建立周围环境地图的技术。它在无人机、自动驾驶、移动机器人导航、虚拟现实和增强现实等领域有着广泛的应用。Simultaneous Localization and Mapping (SLAM) is a technology for robots to estimate their own motion and build a map of the surrounding environment in an unknown environment. It has a wide range of applications in drones, autonomous driving, mobile robot navigation, virtual reality and augmented reality.
当机器人处于室外环境时,可通过高精度的GPS信号和先验地图实现全局地图下的定位。但当机器人处于GPS信号不可达的环境时,通常基于摄像头来实现SLAM方法,该方式利用图像进行三维重建,并利用三维重建后的地图来感知自身的运动和周围的环境,但从单一的图像获取的信息有限,因此,该种方式存在定位精度低,三维模型构建质量不高的问题。When the robot is in an outdoor environment, the positioning under the global map can be achieved through high-precision GPS signals and prior maps. However, when the robot is in an environment where GPS signals cannot be reached, the SLAM method is usually implemented based on the camera. This method uses images for 3D reconstruction, and uses the 3D reconstructed map to perceive its own motion and the surrounding environment, but from a single image The information obtained is limited, therefore, this method has the problems of low positioning accuracy and low quality of 3D model construction.
基于此,本申请实施例提供了一种三维重建方法及云台负载,所述云台负载上集成多个不同的传感器,通过采用多传感器方式,获取目标场景的三维点云、图像序列;以及,在获取所述三维点云和所述图像序列期间,获取所述云台负载的运动数据,所述运动数据包括所述云台负载的位姿信息;然后根据所述运动数据对所述三维点云进行配准,获取三维点云之间的相对位姿;以及,对所述图像序列进行配准,获取图像之间的相对位姿;接着根据所述运动数据、所述三维点云之间的相对位姿以及所述图像之间的相对位姿,获取所述第一图像采集装置获取三维点云时的位姿和/或所述第二图像采集装置获取图像时的位姿;最后根据所述三维点云、所述图像序列以及所述第一图像采集装置获取三维点云时的位姿和/或所述第二图像采集装置获取图像时的位姿,生成目标场景的三维模型。本实施例实现通过多传感器方式获取多种数据,基于多种数据的组合来获取高精度的位姿信息,使得基于高精度的位姿信息生成的三维模型具备更高的定位精度和鲁棒性。Based on this, the embodiments of the present application provide a three-dimensional reconstruction method and a gimbal load. The gimbal load integrates a plurality of different sensors, and obtains a three-dimensional point cloud and an image sequence of a target scene by adopting a multi-sensor method; and , during the acquisition of the three-dimensional point cloud and the image sequence, acquire motion data of the gimbal load, where the motion data includes the pose information of the gimbal load; The point clouds are registered to obtain the relative pose between the three-dimensional point clouds; and the image sequence is registered to obtain the relative pose between the images; then according to the motion data, the three-dimensional point cloud and the relative pose between the images, obtain the pose when the first image acquisition device acquires the three-dimensional point cloud and/or the second image acquisition device when the image is acquired; finally A 3D model of the target scene is generated according to the 3D point cloud, the sequence of images, and the pose when the first image acquisition device acquires the 3D point cloud and/or the pose when the second image acquisition device acquires the image . In this embodiment, a variety of data are acquired through multi-sensors, and high-precision pose information is acquired based on the combination of various data, so that the three-dimensional model generated based on the high-precision pose information has higher positioning accuracy and robustness .
所述三维重建方法可应用于云台负载上,所述云台负载安装有多个不同的传感器,比如,请参阅图1,提供了一种云台负载的结构示意图,所述云台负载安装有第一图像采集装置21,第二图像采集装置22以及惯性测量单元23;所述第一图像采集装置21用于实时获取所述目标场景的三维点云;所述第二图像采集装置22用于实时获取所述目标场景的图像序列;所述惯性测量单元23用于实时获取所述云台负载的运动数据;然后所述云台负载可以基于实时获取的所述三维点云、所述图像序列以及所述运动数据实时生成三维模型,实现在满足精度要求的同时也能够满足实时性的需求,以便后续可以直接基于实时生成的三维模型进行定位或者测量等。The three-dimensional reconstruction method can be applied to a pan-tilt load, where a plurality of different sensors are installed. For example, please refer to FIG. 1 , which provides a schematic structural diagram of a pan-tilt load. There are a first image acquisition device 21, a second image acquisition device 22 and an inertial measurement unit 23; the first image acquisition device 21 is used to acquire the three-dimensional point cloud of the target scene in real time; the second image acquisition device 22 is used for to acquire the image sequence of the target scene in real time; the inertial measurement unit 23 is used to acquire the motion data of the PTZ load in real time; then the PTZ load can be based on the real-time acquisition of the three-dimensional point cloud, the image The sequence and the motion data generate a three-dimensional model in real time, so that the real-time requirements can be met while the accuracy requirements are met, so that subsequent positioning or measurement can be performed directly based on the real-time generated three-dimensional model.
所述惯性测量单元用于测量所述云台负载的三轴姿态角(或角速率)以及加速度。一般的,一个惯性测量单元包含了三个单轴的加速度计和三个单轴的陀螺,加速度计检测所述云台负载的加速度信号,而陀螺检测所述云台负载的角速度信号,测量所述云台负载在三维空间中的角速度和加速度,并以此解算出所述云台负载的姿态。The inertial measurement unit is used to measure the three-axis attitude angle (or angular rate) and acceleration of the gimbal load. Generally, an inertial measurement unit includes three single-axis accelerometers and three single-axis gyroscopes. The accelerometer detects the acceleration signal of the gimbal load, and the gyro detects the angular velocity signal of the gimbal load, and measures the angular velocity signal of the gimbal load. The angular velocity and acceleration of the gimbal load in the three-dimensional space are calculated, and the attitude of the gimbal load is calculated.
所述第一图像采集装置可以包括以下至少一种或多种:激光雷达、双目视觉传感器以及结构光深度相机。The first image acquisition device may include at least one or more of the following: lidar, binocular vision sensor, and structured light depth camera.
所述激光雷达用于向目标场景发射激光脉冲序列,然后接收从目标反射回来的激光脉冲序列,并根据反射回来的激光脉冲序列生成三维点云。在一个例子中,所述激光雷达可以确定反射回来的激光脉冲序列的接收时间,例如,通过探测电信号脉冲的上升沿时间和/或下降沿时间确定激光脉冲序列的接收时间。如此,所述激光雷达可以利用激光脉冲序列的接收时间信息和发射时间计算TOF(Time of flight,飞行时间),从而确定探测物到所述激光雷达的距离。所述激光雷达属于自主发光的传感器,不依赖于光源光照,受环境光干扰比较小,即使在无光封闭环境内也可以正常工作,以便后续生成高精度的三维模型,具有广泛的适用性。The lidar is used to transmit a laser pulse sequence to the target scene, then receive the laser pulse sequence reflected from the target, and generate a three-dimensional point cloud according to the reflected laser pulse sequence. In one example, the lidar can determine the reception time of the reflected laser pulse sequence, eg, by detecting the rising edge time and/or the falling edge time of the electrical signal pulse to determine the reception time of the laser pulse sequence. In this way, the laser radar can calculate TOF (Time of flight, time of flight) by using the receiving time information and the transmitting time of the laser pulse sequence, so as to determine the distance from the detected object to the laser radar. The lidar is a self-illuminating sensor, does not depend on light source illumination, is less disturbed by ambient light, and can work normally even in a closed environment without light, so that a high-precision three-dimensional model can be generated later, and it has a wide range of applicability.
所述双目视觉传感器是基于视差原理从不同的位置获取目标场景的两幅图像,通过计算两幅图像对应点间的位置偏差,来获取三维几何信息,以此生成三维点云。双目视觉传感器对于硬件要求低,相应的,也可以降低成本,只需是普通的CMOS(Complementary Metal Oxide Semiconductor,互补金属氧化物半导体)相机即可,只要光线合适,室内环境和室外环境均可使用,因此也具有一定的适用性。The binocular vision sensor obtains two images of the target scene from different positions based on the principle of parallax, and obtains three-dimensional geometric information by calculating the positional deviation between corresponding points of the two images, thereby generating a three-dimensional point cloud. The binocular vision sensor has low hardware requirements, and correspondingly, it can also reduce the cost. It only needs to be an ordinary CMOS (Complementary Metal Oxide Semiconductor) camera. As long as the light is suitable, both indoor and outdoor environments can be used. use, so it also has certain applicability.
所述结构光深度相机是将具有一定结构特征的光线投射到目标场景中再进行采集,这种具备一定结构的光线,会因被摄物体的不同深度区域而采集不同的图像相位信息,然后将其换算成深度信息,以此来获得三维点云。结构光深度相机也是属于自主发光的传感器,不依赖于光源光照,受环境光干扰比较小,即使在无光封闭环境内也可以正常工作,以便后续生成高精度的三维模型,具有广泛的适用性。The structured light depth camera projects light with certain structural characteristics into the target scene and then collects it. This kind of light with certain structure will collect different image phase information due to different depth regions of the subject, and then It is converted into depth information to obtain a 3D point cloud. Structured light depth camera is also a self-illuminating sensor, does not depend on light source illumination, and is less disturbed by ambient light, and can work normally even in a closed environment without light, so as to generate high-precision 3D models later, which has a wide range of applicability .
所述第二图像采集装置可以获取彩色图像、灰度图像、红外图像等等。所述第二图像采集装置包括以下至少一种或多种:可见光相机、灰度相机以及红外相机。The second image acquisition device can acquire color images, grayscale images, infrared images, and the like. The second image acquisition device includes at least one or more of the following: a visible light camera, a grayscale camera, and an infrared camera.
所述第二图像采集装置可以以指定帧率捕捉图像序列。在一些实施方式中,可以以如约24p、25p、30p、48p、50p、60p、72p、90p、100p、120p、300p、50i或60i的标准帧率拍摄图像序列。在一些实施方式中,可以以小于或等于约每0.0001秒、0.0002秒、0.0005秒、0.001秒、0.002秒、0.005秒、0.01秒、0.02秒、0.05秒、0.1秒、0.2秒、0.5秒、1秒、2秒、5秒或10秒一张图像的帧率拍摄图像序列。The second image acquisition device may capture a sequence of images at a specified frame rate. In some embodiments, the sequence of images may be captured at a standard frame rate such as about 24p, 25p, 30p, 48p, 50p, 60p, 72p, 90p, 100p, 120p, 300p, 50i or 60i. In some embodiments, less than or equal to about every 0.0001 second, 0.0002 second, 0.0005 second, 0.001 second, 0.002 second, 0.005 second, 0.01 second, 0.02 second, 0.05 second, 0.1 second, 0.2 second, 0.5 second, 1 Capture image sequences at frame rates of one image per second, 2 seconds, 5 seconds, or 10 seconds.
所述第二图像采集装置可以具有可调的拍摄参数。在不同拍摄参数下,尽管经受完全相同的外部条件(例如,位置、光照),所述第二图像采集装置可以拍摄不同的图像。拍摄参数可以包括曝光(例如,曝光时间、快门速度、孔径、胶片速度)、增益、伽玛、兴趣区、像素合并(binning)/子采样、像素时钟、偏移、触发、ISO等。与曝光相关的参数可以控制到达所述第二图像采集装置中的图像传感器的光量。例如,快门速度可以控制光到达图像传感器的时间量而孔径可以控制在给定时间内到达图像传感 器的光量。与增益相关的参数可以控制对来自光学传感器的信号的放大。ISO可以控制相机对可用光的灵敏度水平。The second image capture device may have adjustable capture parameters. Under different capture parameters, the second image capture device may capture different images despite being subjected to exactly the same external conditions (eg location, lighting). Capture parameters may include exposure (eg, exposure time, shutter speed, aperture, film speed), gain, gamma, region of interest, binning/subsampling, pixel clock, offset, trigger, ISO, and the like. Exposure-related parameters may control the amount of light reaching the image sensor in the second image capture device. For example, shutter speed can control the amount of time light reaches the image sensor and aperture can control the amount of light that reaches the image sensor in a given time. A gain-related parameter can control the amplification of the signal from the optical sensor. ISO controls the level of sensitivity of the camera to the available light.
在一个示例性的实施例中,所述云台负载上安装有激光雷达、可见光相机以及惯性测量单元。在一个例子中,所述激光雷达、所述可见光相机和所述惯性测量单元可以以相同的帧率进行工作。在另一个例子中,所述激光雷达、所述可见光相机和所述惯性测量单元也可以以不同的帧率进行工作,所述激光雷达和、所述可见光相机和所述惯性测量单元的帧率满足在预设的时间周期内能够获取到三维点云、图像序列以及所述云台负载的运动数据。In an exemplary embodiment, a lidar, a visible light camera and an inertial measurement unit are installed on the pan/tilt load. In one example, the lidar, the visible light camera, and the inertial measurement unit may operate at the same frame rate. In another example, the lidar, the visible light camera and the inertial measurement unit may also work at different frame rates, the frame rates of the lidar and the visible light camera and the inertial measurement unit The three-dimensional point cloud, the image sequence and the motion data of the PTZ load can be acquired within a preset time period.
在一实施例中,所述云台负载可以挂载在可移动平台上,由所述可移动平台携带所述云台负载在目标场景中移动,以针对于所述目标场景进行三维重建,便于后续可以利用三维重建得到的三维模型进行定位任务、测绘任务或者导航任务等等。其中,所述可移动平台包括但不限于无人飞行器、无人驾驶车辆或者移动机器人等。In one embodiment, the pan-tilt payload can be mounted on a movable platform, and the pan-tilt payload can be carried by the movable platform to move in a target scene, so as to perform three-dimensional reconstruction for the target scene, which is convenient for follow-up. The three-dimensional model obtained by the three-dimensional reconstruction can be used for positioning tasks, mapping tasks, or navigation tasks, and the like. Wherein, the movable platform includes, but is not limited to, an unmanned aerial vehicle, an unmanned vehicle, or a mobile robot.
在一实施例中,所述云台负载可以与遥控终端通信连接,用户可以通过遥控终端控制所述云台负载实时采集数据并实时进行三维重建,并将三维重建得到的三维模型实时传输给所述遥控终端,以在所述遥控终端上显示,从而用户可以实时了解三维重建进度,方便用户使用;进一步地,用户也可以针对于显示在所述遥控终端上的三维模型输入反馈信息,所述反馈信息用于指示所述第一图像采集装置漏采的位置,由所述遥控终端将所述反馈信息传输给所述云台负载,使得所述云台负载可以控制所述第一图像采集装置采集所述目标场景中所述反馈信息指示的位置处的三维点云,实现在线监控过程,保证三维点云数据准确且有效地获取,有利于提高三维重建效率。In one embodiment, the PTZ load can be connected to the remote control terminal in communication, and the user can control the PTZ load through the remote control terminal to collect data in real time and perform 3D reconstruction in real time, and transmit the 3D model obtained by the 3D reconstruction to the remote control terminal in real time. The remote control terminal is displayed on the remote control terminal, so that the user can know the progress of the three-dimensional reconstruction in real time, which is convenient for the user to use; further, the user can also input feedback information for the three-dimensional model displayed on the remote control terminal. The feedback information is used to indicate the position where the first image acquisition device missed the acquisition, and the remote control terminal transmits the feedback information to the PTZ load, so that the PTZ load can control the first image acquisition device The 3D point cloud at the position indicated by the feedback information in the target scene is collected to realize the online monitoring process, to ensure accurate and effective acquisition of 3D point cloud data, and to improve the efficiency of 3D reconstruction.
在一个示例性的应用场景中,请参阅图2,所述云台负载20可以挂载在无人飞行器10上,所述云台负载20和所述无人飞行器10与遥控终端30通信连接,用户可以通过遥控终端30对所述无人飞行器10以及所述云台负载20进行控制,比如所述遥控终端30可以响应于用户的触发操作,向所述云台负载20发送三维重建指令,所述云台负载20安装有第一图像采集装置,第二图像采集装置以及惯性测量单元;所述第一图像采集装置响应于所述三维重建指令,实时获取所述目标场景的三维点云;所述第二图像采集装置响应于所述三维重建指令,实时获取所述目标场景的图像序列;所述惯性测量单元响应于所述三维重建指令,实时获取所述云台负载20的运动数据,所述运动数据包括:在获取目标场景的三维点云、图像序列的期间,所述云台负载的位姿信息;然后所述云台负载20可以基于实时获取的所述三维点云、所述图像序列以及所述运动数据实时生成三维模型,实现在满足精度要求的同时也能够满足实时性的需求, 并将实时生成的三维模型发送给所述遥控终端30,使得所述遥控终端30可以显示所述三维模型,以便用户可以实时了解三维重建进度,方便用户使用;用户还可以通过所述遥控终端30控制所述无人飞行器10飞行,使得无人飞行器10携带所述云台负载20飞行至所述目标场景的不同位置,从而所述云台负载20可以对所述目标场景的不同位置进行三维重建,获取所述目标场景内的完整的三维地图,以便后续所述无人飞行器可以基于所述三维地图进行定位任务、测量任务或者巡航任务等。In an exemplary application scenario, please refer to FIG. 2 , the gimbal load 20 can be mounted on the unmanned aerial vehicle 10, and the gimbal load 20 and the unmanned aerial vehicle 10 are connected in communication with the remote control terminal 30, The user can control the UAV 10 and the gimbal load 20 through the remote control terminal 30. For example, the remote control terminal 30 can respond to the user's trigger operation and send a three-dimensional reconstruction instruction to the gimbal load 20, so the The pan-tilt load 20 is installed with a first image acquisition device, a second image acquisition device and an inertial measurement unit; the first image acquisition device acquires the three-dimensional point cloud of the target scene in real time in response to the three-dimensional reconstruction instruction; The second image acquisition device acquires the image sequence of the target scene in real time in response to the three-dimensional reconstruction instruction; the inertial measurement unit acquires the motion data of the pan-tilt load 20 in real time in response to the three-dimensional reconstruction instruction, so the The motion data includes: during the acquisition of the three-dimensional point cloud and image sequence of the target scene, the pose information of the gimbal load; then the gimbal load 20 can be based on the real-time acquisition of the three-dimensional point cloud, the image The sequence and the motion data generate a three-dimensional model in real time, so as to meet the real-time requirements while meeting the accuracy requirements, and send the three-dimensional model generated in real time to the remote control terminal 30, so that the remote control terminal 30 can display all the data. The three-dimensional model is described so that the user can know the progress of the three-dimensional reconstruction in real time, which is convenient for the user to use; the user can also control the flight of the unmanned aerial vehicle 10 through the remote control terminal 30, so that the unmanned aerial vehicle 10 can carry the gimbal load 20 and fly to the desired location. different positions of the target scene, so that the pan-tilt load 20 can perform three-dimensional reconstruction on different positions of the target scene, and obtain a complete three-dimensional map in the target scene, so that the unmanned aerial vehicle can be based on the 3D map for positioning tasks, measurement tasks or cruise tasks, etc.
接下来对三维重建过程进行说明,请参阅图3,为本申请一个实施例提供的三维重建方法的流程示意图,所述方法应用于云台负载上,所述方法包括:Next, the three-dimensional reconstruction process will be described. Please refer to FIG. 3 , which is a schematic flowchart of a three-dimensional reconstruction method provided by an embodiment of the present application. The method is applied to a PTZ load, and the method includes:
在步骤S101中,获取目标场景的三维点云、图像序列以及所述云台负载的运动数据;其中,所述三维点云由所述第一图像采集装置获取,所述图像序列由所述第二图像采集装置获取;所述运动数据包括:在获取目标场景的三维点云、图像序列的期间,所述云台负载的位姿信息。In step S101, a 3D point cloud of a target scene, an image sequence, and motion data loaded by the gimbal are acquired; wherein, the 3D point cloud is acquired by the first image acquisition device, and the image sequence is acquired by the first image acquisition device. 2. Acquired by an image acquisition device; the motion data includes: during the acquisition of a three-dimensional point cloud and an image sequence of a target scene, the pose information of the PTZ load.
在步骤S102中,根据所述运动数据对所述三维点云进行配准,获取三维点云之间的相对位姿;以及,对所述图像序列进行配准,获取图像之间的相对位姿。In step S102, the three-dimensional point cloud is registered according to the motion data to obtain the relative pose between the three-dimensional point clouds; and the image sequence is registered to obtain the relative pose between the images .
在步骤S103中,根据所述运动数据、所述三维点云之间的相对位姿以及所述图像之间的相对位姿,获取所述第一图像采集装置获取三维点云时的位姿和/或所述第二图像采集装置获取图像时的位姿。In step S103, according to the motion data, the relative pose between the three-dimensional point clouds, and the relative pose between the images, the pose and the pose when the first image acquisition device acquires the three-dimensional point cloud is acquired /or the pose when the second image acquisition device acquires an image.
在步骤S104中,根据所述三维点云、所述图像序列以及所述第一图像采集装置获取三维点云时的位姿和/或所述第二图像采集装置获取图像时的位姿,生成目标场景的三维模型。In step S104, according to the three-dimensional point cloud, the image sequence, and the pose when the first image acquisition device acquires the three-dimensional point cloud and/or the pose when the second image acquisition device acquires the image, generate 3D model of the target scene.
其中,所述目标场景可以是卫星定位信号低于预定强度的场景,比如所述目标场景可以是室内场景或者室外非空旷场景(比如水下、矿洞等),在该种场景下,可以使用本实施例的三维重建方法生成高精度三维模型。Wherein, the target scene may be a scene where the satellite positioning signal is lower than a predetermined intensity, for example, the target scene may be an indoor scene or an outdoor non-open scene (such as underwater, a mine, etc.), in this kind of scene, you can use The three-dimensional reconstruction method of this embodiment generates a high-precision three-dimensional model.
所述卫星定位信号(即GNSS信号)包括但不限于GPS信号、来自伽利略卫星导航系统(GALILEO)的信号或者来自北斗卫星导航系统(BDS)的信号等。The satellite positioning signals (ie, GNSS signals) include, but are not limited to, GPS signals, signals from the Galileo Satellite Navigation System (GALILEO), or signals from the Beidou Satellite Navigation System (BDS).
在一个例子中,所述卫星定位信号的强度可以通过以下至少一种参数来确定:卫星定位信号的信噪比、卫星定位信号的冷启动时间或者热启动时间;在一个示例性的实施例中,若所述卫星定位信号的信噪比低于预设阈值,表明所述卫星定位信号的强度低于预设强度;在另一示例性的实施例中,若所述卫星定位信号的信噪比低于预设阈值且所述卫星定位信号的冷启动时间不符合预设时间条件,表明所述卫星定位信号的强度低于预设强度。In one example, the strength of the satellite positioning signal may be determined by at least one of the following parameters: the signal-to-noise ratio of the satellite positioning signal, the cold start time or the warm start time of the satellite positioning signal; in an exemplary embodiment , if the signal-to-noise ratio of the satellite positioning signal is lower than the preset threshold, it indicates that the strength of the satellite positioning signal is lower than the preset strength; in another exemplary embodiment, if the signal-to-noise ratio of the satellite positioning signal If the ratio is lower than the preset threshold and the cold start time of the satellite positioning signal does not meet the preset time condition, it indicates that the strength of the satellite positioning signal is lower than the preset strength.
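By way of illustration only, the signal-strength criterion described above might be expressed as follows; the threshold values and the function name are arbitrary placeholders introduced for the example, not values from the application.

```python
def gnss_signal_weak(snr_db, cold_start_s=None,
                     snr_threshold_db=30.0, max_cold_start_s=60.0):
    """Decide whether the satellite positioning signal is below the predetermined
    strength, following the two example criteria above: a low signal-to-noise
    ratio alone, or a low SNR combined with an abnormal cold-start time.
    All threshold values here are illustrative placeholders."""
    low_snr = snr_db < snr_threshold_db
    if cold_start_s is None:
        return low_snr
    return low_snr and cold_start_s > max_cold_start_s
```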
所述云台负载可以使用深处相机实时获取目标场景的三维点云,使用第二图像采集装置实时获取所述目标场景的图像序列;以及在获取所述三维点云和所述图像序列期间,使用惯性测量单元实时采集所述云台负载的运动数据,所述运动数据对应所述云台负载的位姿信息。本实施例使用实时获取的所述三维点云、所述图像序列以及所述运动数据进行实时的三维重建,由于所述三维点云可以指示所述目标场景的三维信息,所述运动数据可以指示所述云台负载的位姿信息,所述第一图像采集装置、第二图像采集装置设置于所述云台负载上,因此所述运动数据可以间接指示所述第一图像采集装置和所述第二图像采集装置的位姿信息,从而获得具备高精度的位姿信息,进而使得基于高精度位姿信息生成的三维模型具备更高的定位精度和鲁棒性;而且实时获取以及实时三维重建的过程也够满足具有实时性要求的场景的需求,以便后续可以直接基于实时生成的三维模型进行定位或者测量等。The pan-tilt load can use a depth camera to acquire a 3D point cloud of the target scene in real time, and use a second image acquisition device to acquire an image sequence of the target scene in real time; and during acquisition of the 3D point cloud and the image sequence, Using an inertial measurement unit to collect motion data of the gimbal load in real time, the motion data corresponds to the pose information of the gimbal load. This embodiment uses the three-dimensional point cloud, the image sequence, and the motion data acquired in real time to perform real-time three-dimensional reconstruction. Since the three-dimensional point cloud can indicate the three-dimensional information of the target scene, the motion data can indicate The pose information of the pan-tilt load, the first image acquisition device and the second image acquisition device are set on the pan-tilt payload, so the motion data can indirectly indicate the first image acquisition device and the The pose information of the second image acquisition device is obtained to obtain high-precision pose information, so that the three-dimensional model generated based on the high-precision pose information has higher positioning accuracy and robustness; and real-time acquisition and real-time three-dimensional reconstruction The process is also enough to meet the needs of the scene with real-time requirements, so that the subsequent positioning or measurement can be directly based on the real-time generated 3D model.
在一实施例中,所述云台负载可以与所述遥控终端通信连接,所述云台负载可以通过用户在遥控终端上输入的所述三维模型的精度要求和/或密度要求,生成采集轨迹和采集时间,然后根据所述采集轨迹和采集时间控制所述第一图像采集装置、所述第二图像采集装置以及所述惯性测量单元采集数据。本实施例实现基于所述三维模型的精度要求和/或密度要求自动规划采集方案并控制所述第一图像采集装置、所述第二图像采集装置以及所述惯性测量单元采集数据,用户在采集过程中无需参与,可以有效防止在采集过程中引入的人为误差,提高生成的三维模型的准确性。In an embodiment, the pan-tilt load can be connected to the remote control terminal in communication, and the pan-tilt load can generate a collection trajectory according to the accuracy requirements and/or density requirements of the three-dimensional model input by the user on the remote control terminal. and acquisition time, and then control the first image acquisition device, the second image acquisition device and the inertial measurement unit to acquire data according to the acquisition track and the acquisition time. This embodiment implements automatic planning of the acquisition scheme based on the accuracy requirements and/or density requirements of the three-dimensional model, and controls the first image acquisition device, the second image acquisition device, and the inertial measurement unit to acquire data. There is no need to participate in the process, which can effectively prevent human errors introduced in the acquisition process and improve the accuracy of the generated 3D model.
在获取所述三维点云、所述图像序列以及所述运动数据之后,所述云台负载根据所述运动数据对所述三维点云进行配准,获取三维点云之间的相对位姿,所述三维点云之间的相对位姿用于对三维点云进行校准,从而提高后续三维重建结果的准确性;以及,所述云台负载对所述图像序列进行配准,获取图像之间的相对位姿,所述图像之间的相对位姿用于确定所述第二图像采集装置的位姿,从而提高后续三维重建结果的准确性。After acquiring the 3D point cloud, the image sequence and the motion data, the PTZ load registers the 3D point cloud according to the motion data, and obtains the relative pose between the 3D point clouds, The relative poses between the three-dimensional point clouds are used to calibrate the three-dimensional point clouds, thereby improving the accuracy of subsequent three-dimensional reconstruction results; The relative pose between the images is used to determine the pose of the second image acquisition device, thereby improving the accuracy of subsequent three-dimensional reconstruction results.
这里对获取三维点云之间的相对位姿的过程进行说明:The process of obtaining the relative pose between 3D point clouds is described here:
考虑到如激光雷达、结构光深度相机等,在同一时刻只能采集到三维点云中的一个点的数据,这个点表示在三维坐标系下的一个三维坐标点,在采集三维点云的过程中会产生类似于果冻效应(rolling shutter effect)的问题,一帧三维点云中的点不是在同一时刻采集的,在采集过程中,激光雷达随着载体在运动,但是激光雷达点测量的是物体和雷达之间的距离,所以不同激光点的坐标系就不一样了。为了解决这个问题,三维点云中的每个点均对应有独立的坐标系,从而保证点在对应的坐标系下的三维坐 标的准确性。但在后续利用三维点云进行三维重建时,需要三维点云中的点在同一坐标系下,因此需要对所述三维点云进行重投影,把采集过程中激光雷达的运动计算出来,并在对应的激光点上补偿这个运动量。本实施例利用与所述第一图像采集装置安装于同一云台负载上的惯性测量单元获得的运动数据来对所述三维点云进行变换,所述惯性测量单元获取的运动数据对应所述云台负载的位姿信息,由于惯性测量单元以及所述第一图像采集装置均安装于云台负载上,所述第一图像采集装置的运动过程与所述云台负载的运动过程是相同或相似的,因此所述惯性测量单元获取的运动数据也指示着所述第一图像采集装置的运动过程,因此可以利用惯性测量单元测得的运动数据可以准确确定处于不同坐标系的点之间的变换关系,比如可以利用所述运动数据以及所述惯性测量单元与所述第一图像采集装置之间的外参转换关系,获取所述点之间的变换关系。Considering that such as lidar, structured light depth camera, etc., only one point in the 3D point cloud can be collected at the same time. This point represents a 3D coordinate point in the 3D coordinate system. In the process of collecting the 3D point cloud There will be a problem similar to the rolling shutter effect. The points in a frame of 3D point cloud are not collected at the same time. During the collection process, the lidar moves with the carrier, but the lidar points measure the The distance between the object and the radar, so the coordinate system of different laser points is different. In order to solve this problem, each point in the 3D point cloud has an independent coordinate system, so as to ensure the accuracy of the 3D coordinates of the point in the corresponding coordinate system. However, when the 3D point cloud is used for 3D reconstruction in the future, the points in the 3D point cloud need to be in the same coordinate system, so the 3D point cloud needs to be reprojected, the motion of the lidar during the acquisition process is calculated, and the This amount of movement is compensated on the corresponding laser point. In this embodiment, the three-dimensional point cloud is transformed by the motion data obtained by the inertial measurement unit installed on the same pan/tilt load as the first image acquisition device, and the motion data obtained by the inertial measurement unit corresponds to the cloud The pose information of the platform load, since the inertial measurement unit and the first image acquisition device are both installed on the pan-tilt load, the motion process of the first image acquisition device and the motion process of the pan-tilt load are the same or similar Therefore, the motion data acquired by the inertial measurement unit also indicates the motion process of the first image acquisition device, so the motion data measured by the inertial measurement unit can be used to accurately determine the transformation between points in different coordinate systems For example, the transformation relationship between the points can be obtained by using the motion data and the external parameter transformation relationship between the inertial measurement unit and the first image acquisition device.
对于在连续时间序列上采集到的一组三维点云,所述云台负载根据该组三维点云中每个点对应的运动数据获取点之间的变换关系,然后根据所述点之间的变换关系将该组三维点云中的点重投影至同一坐标系。本实施例将在连续时间序列上采集到的一组三维点云变换至同一坐标系下,实现三维点云数据的整合,有利于提高三维重建效率。For a group of 3D point clouds collected in a continuous time series, the PTZ load obtains the transformation relationship between points according to the motion data corresponding to each point in the group of 3D point clouds, and then according to the relationship between the points The transformation relationship reprojects the points in the set of 3D point clouds to the same coordinate system. In this embodiment, a group of three-dimensional point clouds collected in a continuous time series are transformed into the same coordinate system, so as to realize the integration of three-dimensional point cloud data, which is beneficial to improve the efficiency of three-dimensional reconstruction.
其中,所述云台负载根据每个点的获取时间获取所述点对应的运动数据;在一个例子,所述第一图像采集装置为激光雷达,所述点的获取时间根据反射回的激光脉冲序列的接收时间所确定,所述点的获取时间即为对应的运动数据的获取时间,则所述云台负载在获取所述点的相同时刻从所述惯性测量单元获取运动数据。Wherein, the PTZ load acquires the motion data corresponding to the point according to the acquisition time of each point; in an example, the first image acquisition device is a laser radar, and the acquisition time of the point is based on the reflected laser pulses Determined by the receiving time of the sequence, the acquisition time of the point is the acquisition time of the corresponding motion data, and the gimbal load acquires the motion data from the inertial measurement unit at the same moment when the point is acquired.
在连续的时间序列上,所述第一图像采集装置会以预设的时间间隔来获取每个点,在获取点之间的变换关系时,可以根据该组点云中每个点对应的运动数据确定所述惯性测量单元在所述预设时间间隔内的相对位姿,其中,所述预设时间间隔为获取相邻点的时间间隔;然后所述云台负载根据所述惯性测量单元在预设时间间隔内的相对位姿、以及所述第一图像采集装置与所述惯性测量单元之间的外参转换关系,获取所述点之间的变换关系。本实施例中,由于惯性测量单元以及所述第一图像采集装置均安装于云台负载上,所述惯性测量单元获取的运动数据也指示着所述第一图像采集装置的运动过程,因此可以利用惯性测量单元测得的运动数据可以准确确定处于不同坐标系的点之间的变换关系,从而可以基于所述点之间的变换关系将在连续时间序列上采集到的一组三维点云变换至同一坐标系下,实现三维点云数据的整合,有利于提高三维重建效率。In a continuous time series, the first image acquisition device will acquire each point at a preset time interval, and when acquiring the transformation relationship between points, the corresponding motion of each point in the set of point clouds can be obtained. The data determines the relative pose of the inertial measurement unit within the preset time interval, wherein the preset time interval is the time interval for acquiring adjacent points; then the gimbal load is based on the inertial measurement unit at the time interval. The relative pose within a preset time interval, and the extrinsic parameter transformation relationship between the first image acquisition device and the inertial measurement unit, obtain the transformation relationship between the points. In this embodiment, since the inertial measurement unit and the first image acquisition device are both installed on the pan-tilt load, the motion data acquired by the inertial measurement unit also indicates the movement process of the first image acquisition device, so it can be Using the motion data measured by the inertial measurement unit, the transformation relationship between points in different coordinate systems can be accurately determined, so that a set of three-dimensional point clouds collected in a continuous time series can be transformed based on the transformation relationship between the points. Under the same coordinate system, the integration of 3D point cloud data is realized, which is beneficial to improve the efficiency of 3D reconstruction.
In one example, for each group of three-dimensional point clouds accumulated over a time interval {Δt, Δt = t_{i+1} − t_i}, the group may be written as P_i = {p_i^1, p_i^2, …, p_i^n}, where n is an integer greater than 1. For each point p_i^j (j being an integer less than or equal to n), the corresponding transformation matrix M_i^j is obtained by integrating, filtering and non-linearly interpolating the corresponding motion data. All points can then be reprojected into the coordinate system defined at time T_i, that is, p̂_i^j = M_i^j · p_i^j, so that the three-dimensional point cloud P̂_i = {p̂_i^1, p̂_i^2, …, p̂_i^n}, expressed in the same coordinate system, is obtained.
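The reprojection just described can be illustrated with the following Python sketch, which assumes the motion data has already been integrated into a sequence of timestamped IMU poses and that the lidar-to-IMU extrinsic transform is known. All function and variable names (e.g. `T_imu_lidar`) are assumptions introduced for the example, not part of the application.

```python
import numpy as np
from scipy.spatial.transform import Rotation, Slerp

def interpolate_imu_poses(imu_times, imu_rotations, imu_positions, point_times):
    """Interpolate the IMU pose at each laser point's acquisition time.

    imu_times:      (M,) timestamps of the IMU samples
    imu_rotations:  scipy Rotation object holding M orientations
    imu_positions:  (M, 3) IMU positions obtained by integrating the motion data
    point_times:    (N,) acquisition time of each point in the group
    """
    slerp = Slerp(imu_times, imu_rotations)              # spherical interpolation of orientation
    rots = slerp(point_times)
    pos = np.stack([np.interp(point_times, imu_times, imu_positions[:, k])
                    for k in range(3)], axis=1)          # linear interpolation of position
    return rots, pos

def compensate_motion(points, point_times, imu_times, imu_rotations, imu_positions,
                      T_imu_lidar):
    """Reproject every point of one group into the coordinate system of the first
    point's acquisition time, compensating the motion of the gimbal load.

    points:      (N, 3) raw points, each expressed in the frame of its own timestamp
    T_imu_lidar: (4, 4) extrinsic transform from the lidar frame to the IMU frame
    """
    rots, pos = interpolate_imu_poses(imu_times, imu_rotations, imu_positions, point_times)

    # pose of the lidar at the reference time (time of the first point)
    T_w_imu0 = np.eye(4)
    T_w_imu0[:3, :3] = rots[0].as_matrix()
    T_w_imu0[:3, 3] = pos[0]
    T_l0_w = np.linalg.inv(T_w_imu0 @ T_imu_lidar)

    out = np.empty((len(points), 3), dtype=float)
    for j in range(len(points)):
        T_w_imuj = np.eye(4)
        T_w_imuj[:3, :3] = rots[j].as_matrix()
        T_w_imuj[:3, 3] = pos[j]
        T_w_lj = T_w_imuj @ T_imu_lidar                  # lidar pose when point j was measured
        p_h = np.append(points[j], 1.0)
        out[j] = (T_l0_w @ T_w_lj @ p_h)[:3]             # express point j in the reference frame
    return out
```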
After a group of 3D point clouds collected over a continuous time series has been transformed into the same coordinate system, the 3D point cloud data needs to be further integrated, because the coordinate systems of different groups of 3D point clouds may also differ; doing so facilitates the subsequent 3D reconstruction process and further improves reconstruction efficiency. For two adjacent groups of 3D point clouds, the gimbal load can obtain the relative pose between the two adjacent groups, and this relative pose is used to reproject the two adjacent groups of 3D point clouds into the same coordinate system, so that the 3D point cloud data are matched and integrated, which facilitates the subsequent 3D reconstruction process. Both of the two adjacent groups of 3D point clouds contain points collected from the same object, which ensures that the two adjacent groups can be matched successfully.
Further, the Iterative Closest Point/Plane (ICP) algorithm commonly used in the related art to match 3D point clouds is relatively time-consuming and may not meet the requirement of generating a 3D model in real time. Therefore, when real-time generation of the 3D model is required, the gimbal load obtains the surface features and/or corner features of each group of 3D point clouds when acquiring the relative pose between two adjacent groups of 3D point clouds, and then obtains the relative pose between the two adjacent groups according to their surface features and/or corner features; this relative pose is used to reproject the two adjacent groups of 3D point clouds into the same coordinate system. In this embodiment, the surface features and/or corner features of the space captured by the 3D point clouds are used for high-speed matching, which simplifies the matching process, helps improve the efficiency of acquiring the relative pose between two adjacent groups of 3D point clouds, and shortens the matching time.
It can be understood that, when there is no real-time requirement, the Iterative Closest Point/Plane (ICP) algorithm of the related art can also be used to match the 3D point clouds and obtain the relative poses between them; this embodiment imposes no limitation on this.
In one implementation, for two adjacent groups of 3D point clouds, the surface features and/or corner features of each group can be determined according to the curvature information between the points in that group. As an example, the surface features of each group of 3D point clouds are determined from curvature information whose curvature is greater than a preset threshold, and the corner features of each group are determined from curvature information whose curvature is less than or equal to the preset threshold. In this embodiment, the point-to-point matching process is converted into a matching process between the surface features and/or corner features of the 3D point clouds in space, which simplifies the matching process and enables high-speed matching of the surface features and/or corner features of two adjacent groups of 3D point clouds. This further helps improve the efficiency of acquiring the relative pose between the two adjacent groups, shortens the matching time, and meets the requirement of generating a 3D model in real time.
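For illustration, a minimal Python sketch of this split is given below. The per-point curvature values are assumed to have been computed beforehand from the curvature information between neighbouring points (the exact measure is not fixed here), the variable names are hypothetical, and the comparison against the preset threshold follows the convention stated above: values above the threshold yield surface features, the remaining points yield corner (edge) features.

```python
import numpy as np

def classify_features(points, curvature, threshold):
    """Split one group of 3D points into surface and corner feature candidates.

    points    : (n, 3) array of one group of 3D points
    curvature : (n,) array of per-point curvature values derived from the
                curvature information between neighbouring points
    threshold : preset curvature threshold
    """
    curvature = np.asarray(curvature)
    surface_idx = np.flatnonzero(curvature > threshold)   # surface features
    corner_idx = np.flatnonzero(curvature <= threshold)   # corner (edge) features
    return points[surface_idx], points[corner_idx]
```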
The surface feature includes at least one piece of 3D coordinate information indicating a plane and the normal vector of that plane; and/or, the corner feature includes at least one piece of 3D coordinate information indicating an edge and a vector indicating that edge.
Specifically, when acquiring the relative pose between two adjacent groups of 3D point clouds according to their surface features and/or corner features, the gimbal load can obtain the relative pose according to the similarity between the surface features of the two adjacent groups and/or the similarity between their corner features. The more similar the surface features and/or corner features of the two adjacent groups of 3D point clouds are, the more likely it is that the points indicated by those features in the two adjacent groups match each other. In this embodiment, the relative pose between two adjacent groups of 3D point clouds is obtained from the similarity between the surface features and/or the similarity between the corner features, which ensures the accuracy of the result.
In one implementation, the similarity between the surface features of the two adjacent groups of 3D point clouds is determined based on the distance between those surface features; and/or, the similarity between the corner features of the two adjacent groups of 3D point clouds is determined based on the distance between those corner features. Here, "and/or" means both or either of the two.
In one example, from the 3D point clouds P'_{t_i} and P'_{t_{i+1}} that are in the same coordinate system, N_S surface features {S_k} and N_E corner (edge) features {E_k} are generated. Based on this information, an optimization algorithm can be used to solve

T* = argmin over T of ( Σ_k d_S(S_k, T) + Σ_k d_E(E_k, T) ),

where d_S denotes the distance difference between the surface features of the two adjacent groups of 3D point clouds and d_E denotes the distance difference between the corner features of the two adjacent groups of 3D point clouds. The optimal transformation T* can then be converted through a Lie algebra representation to obtain the relative pose between the two adjacent groups of 3D point clouds. This embodiment uses surface features and corner features for high-speed matching, which helps improve matching efficiency and meets the requirement of generating a 3D model in real time.
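The optimization above can be sketched, for illustration only, as a small nonlinear least-squares problem. The following Python code assumes that matched surface features (a reference point plus a unit normal) and corner features (a reference point plus a unit edge direction) have already been associated between the two adjacent groups; the function and variable names are hypothetical, and the axis-angle parameterization and SciPy solver are implementation choices not prescribed by this description.

```python
import numpy as np
from scipy.optimize import least_squares

def rotation_from_axis_angle(w):
    """Rodrigues formula: 3-vector in the Lie algebra so(3) -> rotation matrix."""
    theta = np.linalg.norm(w)
    if theta < 1e-12:
        return np.eye(3)
    k = w / theta
    K = np.array([[0, -k[2], k[1]],
                  [k[2], 0, -k[0]],
                  [-k[1], k[0], 0]])
    return np.eye(3) + np.sin(theta) * K + (1 - np.cos(theta)) * (K @ K)

def residuals(x, surf_pts, surf_refs, surf_normals, edge_pts, edge_refs, edge_dirs):
    """Stack point-to-plane and point-to-edge distance differences for pose x.

    x = [wx, wy, wz, tx, ty, tz]: rotation (axis-angle) and translation mapping
    the second group of features into the frame of the first group.
    """
    R, t = rotation_from_axis_angle(x[:3]), x[3:]
    # distance of each transformed surface point to the matched plane
    r_surf = np.einsum('ij,ij->i', (surf_pts @ R.T + t) - surf_refs, surf_normals)
    # perpendicular distance of each transformed edge point to the matched edge line
    diff = (edge_pts @ R.T + t) - edge_refs
    r_edge = np.linalg.norm(np.cross(diff, edge_dirs), axis=1)
    return np.concatenate([r_surf, r_edge])

# Solving for the relative pose between the two adjacent groups
# (hypothetical, pre-associated feature arrays):
# sol = least_squares(residuals, x0=np.zeros(6),
#                     args=(surf_pts, surf_refs, surf_normals,
#                           edge_pts, edge_refs, edge_dirs))
```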
The process of acquiring the relative poses between images is described here. The image sequence includes a plurality of images acquired over a continuous time sequence.
In one implementation, the gimbal load can extract feature points from each image and then perform registration based on those feature points to obtain the relative poses between the images. In one example, the gimbal load can use a Bundle Adjustment (BA) algorithm to register the feature points of the images and obtain the relative poses between the images.
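As an illustration of feature-point-based registration (not the bundle adjustment itself, which jointly refines the poses of all images in the sequence), the following sketch uses OpenCV to match ORB feature points between two images and recover their relative pose from the essential matrix; the function name and parameters are assumptions made for the example.

```python
import cv2
import numpy as np

def relative_pose_from_images(img1, img2, K):
    """Estimate the relative pose between two images from matched feature points.

    K is the 3x3 camera intrinsic matrix. Returns the relative rotation R and
    the (up-to-scale) translation direction t between the two frames.
    """
    orb = cv2.ORB_create(2000)
    kp1, des1 = orb.detectAndCompute(img1, None)
    kp2, des2 = orb.detectAndCompute(img2, None)

    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = matcher.match(des1, des2)

    pts1 = np.float32([kp1[m.queryIdx].pt for m in matches])
    pts2 = np.float32([kp2[m.trainIdx].pt for m in matches])

    E, mask = cv2.findEssentialMat(pts1, pts2, K, method=cv2.RANSAC)
    _, R, t, _ = cv2.recoverPose(E, pts1, pts2, K, mask=mask)
    return R, t
```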
In a second implementation, since the embodiments of the present application acquire the 3D point cloud at the same time as the image sequence, the depth map obtained by reprojecting the 3D point cloud can be used to assist the image registration process and improve registration accuracy. Specifically, the gimbal load can reproject the 3D point cloud using the motion data and generate a depth map corresponding to the image from the reprojected 3D point cloud and the motion data. As mentioned above, each point in the 3D point cloud has its own independent coordinates; to ensure the accuracy of the generated depth map, every point in the 3D point cloud needs to be in the same coordinate system, so the gimbal load reprojects the 3D point cloud using the motion data such that every point of the reprojected 3D point cloud is in the same coordinate system. Further, the motion data acquired by the inertial measurement unit corresponds to the pose information of the gimbal load; since the inertial measurement unit, the second image acquisition device and the first image acquisition device are all mounted on the gimbal load, the motion of the first image acquisition device and the second image acquisition device is the same as or similar to the motion of the gimbal load, so the motion data acquired by the inertial measurement unit also indicates the motion of the first image acquisition device and the second image acquisition device. The reprojected 3D point cloud and the motion data can therefore be used to generate the depth map corresponding to the image.
In one example, considering that the density of the 3D point cloud is limited, a polyhedral mesh can first be generated from the reprojected 3D point cloud, and the mesh is then projected onto the camera plane according to the motion data to obtain the depth map corresponding to the image.
Next, the gimbal load also extracts the feature points in the image and obtains the depth information of those feature points from the depth map corresponding to the image; registration is then performed according to the feature points of the images and their depth information to obtain the relative poses between the images. In this embodiment, the depth map obtained by reprojecting the 3D point cloud is used to assist the image registration process, which helps improve the registration accuracy and, in turn, the accuracy of the subsequently generated 3D model.
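A minimal sketch of how such a depth map could be produced and queried is given below; it projects each reprojected 3D point directly onto the image plane with an assumed intrinsic matrix K, whereas the example above would first build a polyhedral mesh to obtain a denser depth map. All names are illustrative.

```python
import numpy as np

def depth_map_from_points(points_cam, K, height, width):
    """Project a point cloud (already expressed in the camera frame via the
    motion data) onto the image plane and keep the nearest depth per pixel."""
    depth = np.full((height, width), np.inf)
    for X, Y, Z in points_cam:
        if Z <= 0:
            continue
        u = int(round(K[0, 0] * X / Z + K[0, 2]))
        v = int(round(K[1, 1] * Y / Z + K[1, 2]))
        if 0 <= u < width and 0 <= v < height:
            depth[v, u] = min(depth[v, u], Z)
    depth[np.isinf(depth)] = 0.0   # pixels with no projected point remain empty
    return depth

def feature_depths(features_uv, depth):
    """Look up the depth of extracted feature points (u, v) in the depth map."""
    return np.array([depth[int(v), int(u)] for u, v in features_uv])
```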
In a third implementation, if the second image acquisition device is a visible light camera, it depends on illumination, and in the absence of light the image sequence acquired by the visible light camera may be unusable. On this basis, to ensure the accuracy of image registration, the gimbal load in the embodiments of the present application reprojects the 3D point cloud using the motion data in order to unify the points of the 3D point cloud that lie in different coordinate systems, so that every point of the reprojected 3D point cloud is in the same coordinate system. Moreover, since the inertial measurement unit, the second image acquisition device and the first image acquisition device are all mounted on the gimbal load, the motion data acquired by the inertial measurement unit also indicates the motion of the first image acquisition device and the second image acquisition device, so depth maps corresponding to the images in the image sequence can be generated from the reprojected 3D point cloud and the motion data; finally, registration is performed according to the feature points extracted from the depth maps to obtain the relative poses between the images. In this embodiment, the depth maps obtained by reprojecting the 3D point cloud are used for registration instead of the images, which ensures that accurate registration results can still be obtained in the absence of light and thereby improves the accuracy of the subsequently generated 3D model.
In one embodiment, to meet the requirement of generating the 3D model online in real time, a distributed accelerator can be used to obtain the relative poses between the 3D point clouds and the relative poses between the images, which speeds up the matching process and improves data acquisition efficiency.
After obtaining the relative poses between the 3D point clouds and the relative poses between the images, the gimbal load can generate the 3D model of the target scene in real time from the 3D point clouds, the image sequence, the motion data, the relative poses between the 3D point clouds and the relative poses between the images, thereby meeting the requirement of generating the 3D model online in real time.
Further, after generating the 3D model, the gimbal load can transmit the 3D model to a remote control terminal in real time so that the 3D model is displayed on the remote control terminal in real time; the user can thus follow the progress of the 3D reconstruction in real time, which is convenient to use. If the user notices, after seeing the 3D model, that some positions were missed during acquisition, the user can also input feedback information for the 3D model displayed on the remote control terminal, the feedback information indicating the positions missed by the first image acquisition device. The remote control terminal transmits the feedback information to the gimbal load, so that the gimbal load can control the first image acquisition device to collect the 3D point cloud at the positions in the target scene indicated by the feedback information. This realizes an online monitoring process, ensures that the 3D point cloud data are acquired accurately and effectively, and helps improve the efficiency of the 3D reconstruction.
In one embodiment, when generating the 3D model of the target scene in real time, the gimbal load obtains, according to the motion data, the relative poses between the 3D point clouds and the relative poses between the images, the pose of the first image acquisition device when acquiring the 3D point clouds and/or the pose of the second image acquisition device when acquiring the images; it then generates the 3D model in real time according to the 3D point clouds, the image sequence, and the pose of the first image acquisition device when acquiring the 3D point clouds and/or the pose of the second image acquisition device when acquiring the images. This embodiment generates a high-precision 3D model based on the fusion of multiple kinds of data.
In one implementation, the gimbal load first establishes a pose graph model, which includes but is not limited to a sliding-window pose graph model. The pose graph model is then optimized using the motion data, the relative poses between the 3D point clouds and the relative poses between the images, so as to obtain the pose of the first image acquisition device when acquiring the 3D point clouds and/or the pose of the second image acquisition device when acquiring the images. The motion data acquired by the inertial measurement unit corresponds to the pose information of the gimbal load; since the inertial measurement unit, the second image acquisition device and the first image acquisition device are all mounted on the gimbal load, the motion of the first image acquisition device and the second image acquisition device is the same as or similar to the motion of the gimbal load, so the motion data acquired by the inertial measurement unit also indicates the motion of the first image acquisition device and the second image acquisition device, and the motion data can therefore serve as one of the optimization factors for the pose graph model. For example, the motion data contains acceleration; integrating the acceleration yields velocity, and integrating the velocity yields a relative displacement, which can constrain the displacement between two positions of the gimbal load. As another example, the motion data contains angular velocity; integrating the angular velocity yields an angle and hence relative attitude information, which can constrain the rotation between two attitudes of the gimbal load. This embodiment performs optimization based on the pose graph model to obtain high-precision pose data, so that the 3D model generated from the high-precision pose data has higher positioning accuracy and robustness.
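For illustration, the following sketch integrates inertial measurement unit samples into a relative displacement and a relative rotation angle that can serve as constraints between two poses of the gimbal load; gravity compensation and bias estimation are omitted for brevity, and the names are hypothetical.

```python
import numpy as np

def imu_relative_motion(accels, gyros, dt):
    """Integrate IMU samples into a relative-motion constraint between two poses.

    accels, gyros : (n, 3) arrays of acceleration and angular velocity samples
    dt            : sampling interval in seconds
    Acceleration is integrated once to velocity and again to a relative
    displacement; angular velocity is integrated to a relative rotation angle.
    """
    velocity = np.zeros(3)
    displacement = np.zeros(3)
    angle = np.zeros(3)
    for a, w in zip(accels, gyros):
        velocity += a * dt
        displacement += velocity * dt
        angle += w * dt
    return displacement, angle
```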
In the pose graph model, each vertex is configured to indicate the pose of the first image acquisition device when acquiring a 3D point cloud and/or the pose of the second image acquisition device when acquiring an image; and the edges between the vertices are configured to indicate the relative poses between the 3D point clouds, the relative poses between the images, or the motion data. The edges thus form the constraint relationships between the vertices, and the relative poses between the 3D point clouds, the relative poses between the images, or the motion data indicated by the edges are used to optimize, at each vertex, the pose of the first image acquisition device when acquiring the 3D point cloud and/or the pose of the second image acquisition device when acquiring the image.
In an exemplary embodiment, the pose graph model includes a sliding-window pose graph model. In each iterative optimization of the window, the pose indicated by the i-th vertex (i being an integer greater than 1) and the pose indicated by at least one reference vertex (for example, the (i-j)-th vertex, j being an integer greater than 0) are selected as the poses to be iteratively optimized within the window, and the relative poses between the 3D point clouds, the relative poses between the images, or the motion data indicated by the edges between the vertices are used to optimize these poses. After the optimization is completed, the window slides to select the pose indicated by the next vertex for optimization.
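A minimal sketch of such a pose graph with a sliding window is shown below; it only illustrates how vertices (poses), edges (constraints) and the window selection could be organized, and leaves the actual nonlinear optimization of the selected poses to a solver. The class and method names are assumptions, not taken from the present description.

```python
class PoseGraph:
    """Minimal pose-graph container: vertices hold sensor poses, edges hold
    the relative-pose or motion-data constraints between them."""

    def __init__(self):
        self.vertices = []   # pose of the acquisition device per frame
        self.edges = []      # (i, j, constraint) tuples between vertices

    def add_vertex(self, pose):
        self.vertices.append(pose)
        return len(self.vertices) - 1

    def add_edge(self, i, j, constraint):
        self.edges.append((i, j, constraint))

    def sliding_window(self, i, window_size):
        """Indices optimized in one iteration: vertex i plus its reference
        vertices i-1 ... i-window_size (older poses stay fixed)."""
        lo = max(0, i - window_size)
        return list(range(lo, i + 1))

# In each iteration the poses in pose_graph.sliding_window(i, w) would be
# refined using the constraints stored on the edges, then the window slides on.
```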
In one example, referring to FIG. 4, assuming that the first image acquisition device and the second image acquisition device acquire data at the same frame rate, each vertex indicates the pose of the first image acquisition device when acquiring a 3D point cloud and the pose of the second image acquisition device when acquiring an image, n is an integer greater than 0, and each edge indicates the relative poses between the 3D point clouds, the relative poses between the images, or the motion data. In another example, referring to FIG. 5, assuming that the first image acquisition device and the second image acquisition device acquire data at different frame rates, each vertex indicates the pose of the first image acquisition device when acquiring a 3D point cloud and/or the pose of the second image acquisition device when acquiring an image, n is an integer greater than 0, and each edge indicates the relative poses between the 3D point clouds, the relative poses between the images, or the motion data.
In some scenarios where the positioning signal is above a predetermined strength, such as outdoor scenarios, a locator may also be mounted on the gimbal load to further improve the accuracy of the 3D model. The locator is used to obtain positioning information such as GPS information and/or RTK information, and it collects the positioning information while the first image acquisition device acquires the 3D point clouds and the second image acquisition device acquires the image sequence. Referring to FIG. 6, the edges between the vertices can then also be configured to indicate the GPS information and/or the RTK information, so as to obtain high-precision pose information and improve the positioning accuracy of the generated 3D model.
In one embodiment, after obtaining the pose of the first image acquisition device when acquiring the 3D point clouds and/or the pose of the second image acquisition device when acquiring the images, the gimbal load can use the poses associated with the 3D point clouds to reproject the 3D point clouds into a target coordinate system (for example, the world coordinate system), generate an initial 3D model from the reprojected 3D point clouds, and then perform texture mapping on the initial 3D model according to the pose of the second image acquisition device when acquiring the images and the multiple images in the image sequence, so as to obtain the final 3D model.
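By way of illustration, the sketch below reprojects one group of 3D points into the target coordinate system using the pose at acquisition time, which is the first step towards the initial 3D model; the names are hypothetical and the texture-mapping step is not shown.

```python
import numpy as np

def points_to_target_frame(points, pose):
    """Reproject one group of 3D points into the target (e.g. world) coordinate
    system using the pose of the first image acquisition device at capture time.

    points : (n, 3) array of points in the sensor frame
    pose   : (4, 4) homogeneous matrix mapping the sensor frame to the target frame
    """
    homogeneous = np.hstack([points, np.ones((len(points), 1))])
    return (homogeneous @ pose.T)[:, :3]

# Accumulating every group in the target frame yields the initial model, which
# is then texture-mapped with the images and their corresponding camera poses.
```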
After obtaining the 3D model, the gimbal load can add a label index to the 3D model and store the 3D model at the storage location pointed to by the label index, so that the 3D model can later be found via the label index; and/or transmit the 3D model to the remote control terminal in real time so that the 3D model is displayed on the remote control terminal in real time, thereby showing the progress of the 3D reconstruction to the user in real time and making it convenient to use.
To further improve the accuracy of the 3D model, the gimbal load can also perform loop closure detection based on the 3D point clouds or the image sequence, and optimize the pose graph model according to the result of the loop closure detection, thereby optimizing the 3D model and ensuring the accuracy and global consistency of the optimized 3D model.
It can be understood that no limitation is placed on the specific implementation of the loop closure detection in the embodiments of the present application; it can be configured according to the actual application scenario.
In one example, a bag-of-words model can be constructed from the image sequence or from the depth maps obtained by reprojecting the 3D point clouds, so that the bag-of-words model measures the similarity between the multiple images in the image sequence or the similarity between the multiple depth maps. The gimbal load can perform loop closure detection according to the similarity between the multiple images in the image sequence and determine the images that form a loop; or it can perform loop closure detection according to the similarity between the multiple depth maps obtained by reprojecting the 3D point clouds, determine the depth maps that form a loop, and then determine the images corresponding to those depth maps. The gimbal load can then use the result of the loop closure detection, that is, the relative poses between the images forming the loop, to globally optimize the pose graph model and obtain optimized pose information, namely the optimized pose of the first image acquisition device when acquiring the 3D point clouds and/or the optimized pose of the second image acquisition device when acquiring the images. The 3D model is then updated according to the globally optimized pose graph model, that is, according to the optimized pose information, which ensures the accuracy and global consistency of the optimized 3D model.
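For illustration, a simple way to score loop-closure candidates with a bag-of-words model is sketched below; it assumes that a visual-word histogram has already been computed for every image (or depth map) against a pre-trained vocabulary, and the threshold and minimum frame gap are illustrative parameters.

```python
import numpy as np

def bow_similarity(hist_a, hist_b):
    """Cosine similarity between two bag-of-words histograms; values close to 1
    mark the pair as a loop-closure candidate."""
    a, b = np.asarray(hist_a, float), np.asarray(hist_b, float)
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0

def detect_loops(histograms, threshold, min_gap=30):
    """Return (i, j) index pairs of frames whose similarity exceeds the threshold."""
    loops = []
    for i in range(len(histograms)):
        for j in range(i + min_gap, len(histograms)):
            if bow_similarity(histograms[i], histograms[j]) > threshold:
                loops.append((i, j))
    return loops
```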
In another example, the down-sampled global model stored in the first image acquisition device and the 3D model can be used to perform loop closure detection and determine the relative poses between the 3D point clouds that form a loop. The gimbal load can then use the result of the loop closure detection, that is, the relative poses between the 3D point clouds forming the loop, to globally optimize the pose graph model and obtain optimized pose information, namely the optimized pose of the first image acquisition device when acquiring the 3D point clouds and/or the optimized pose of the second image acquisition device when acquiring the images. The 3D model is then updated according to the globally optimized pose graph model, that is, according to the optimized pose information, which ensures the accuracy and global consistency of the optimized 3D model.
During the global optimization, referring to FIG. 7, the edges in the pose graph model are further configured to indicate the relative poses between the images forming a loop or the relative poses between the 3D point clouds forming a loop, so that the pose graph model is optimized and the 3D model is updated according to the globally optimized pose graph model.
After obtaining the updated 3D model, the gimbal load can store the updated 3D model at the storage location pointed to by the corresponding label index; and/or transmit the updated 3D model to the remote control terminal in real time so that the 3D model is displayed on the remote control terminal in real time, allowing the user to follow the progress of the 3D reconstruction in real time, which is convenient to use. If the user notices, after seeing the 3D model, that some positions were missed during acquisition, the user can also input feedback information for the 3D model displayed on the remote control terminal, the feedback information indicating the positions missed by the first image acquisition device. The remote control terminal transmits the feedback information to the gimbal load, so that the gimbal load can control the first image acquisition device to collect the 3D point cloud at the positions in the target scene indicated by the feedback information. This realizes an online monitoring process, ensures that the 3D point cloud data are acquired accurately and effectively, and helps improve the efficiency of the 3D reconstruction.
Correspondingly, referring to FIG. 8, an embodiment of the present application further provides a gimbal load 20, including a first image acquisition device 21, a second image acquisition device 22, an inertial measurement unit 23, a memory 24 for storing executable instructions, and a processor 25.
The first image acquisition device 21 is configured to acquire a 3D point cloud of a target scene.
The second image acquisition device 22 is configured to acquire an image sequence of the target scene.
During the acquisition of the 3D point cloud and the image sequence, the inertial measurement unit 23 is configured to acquire motion data of the gimbal load; the motion data includes the pose information of the gimbal load.
When the processor 25 executes the executable instructions, it is configured to:
register the 3D point clouds according to the motion data to obtain relative poses between the 3D point clouds, and register the image sequence to obtain relative poses between the images;
obtain, according to the motion data, the relative poses between the 3D point clouds and the relative poses between the images, the pose of the first image acquisition device when acquiring the 3D point clouds and/or the pose of the second image acquisition device when acquiring the images;
generate a 3D model of the target scene according to the 3D point clouds, the image sequence, and the pose of the first image acquisition device when acquiring the 3D point clouds and/or the pose of the second image acquisition device when acquiring the images.
The processor 25 executes the executable instructions contained in the memory 24. The processor 25 may be a Central Processing Unit (CPU), or another general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor.
The memory 24 stores the executable instructions of the 3D reconstruction method. The memory 24 may include at least one type of storage medium, including flash memory, hard disk, multimedia card, card-type memory (for example, SD or DX memory), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, magnetic disk, optical disc, and so on. Moreover, the gimbal load 20 may cooperate with a network storage device that performs the storage function of the memory through a network connection. The memory 24 may be an internal storage unit of the gimbal load 20, such as a hard disk or an internal memory of the gimbal load 20. The memory 24 may also be an external storage device of the gimbal load 20, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card or a flash card equipped on the gimbal load 20. Further, the memory 24 may include both an internal storage unit of the gimbal load 20 and an external storage device. The memory 24 is used to store the executable instructions and other programs and data required by the device, and may also be used to temporarily store data that has been output or is to be output.
The various embodiments described herein can be implemented using a computer-readable medium such as computer software, hardware, or any combination thereof. For hardware implementation, the embodiments described herein can be implemented using at least one of an Application Specific Integrated Circuit (ASIC), a Digital Signal Processor (DSP), a Digital Signal Processing Device (DSPD), a Programmable Logic Device (PLD), a Field-Programmable Gate Array (FPGA), a processor, a controller, a microcontroller, a microprocessor, or an electronic unit designed to perform the functions described herein. For software implementation, an embodiment such as a procedure or a function may be implemented with a separate software module that allows at least one function or operation to be performed. The software code may be implemented by a software application (or program) written in any suitable programming language, and the software code may be stored in the memory and executed by a controller.
Those skilled in the art will understand that FIG. 8 is merely an example of the gimbal load 20 and does not constitute a limitation on the gimbal load 20; the gimbal load 20 may include more or fewer components than shown, or combine certain components, or use different components. For example, the device may also include input and output devices, network access devices, buses, and the like.
In one embodiment, the first image acquisition device 21 is specifically configured to acquire the 3D point cloud of the target scene in real time; the second image acquisition device 22 is specifically configured to acquire the image sequence of the target scene in real time; and the inertial measurement unit 23 is specifically configured to acquire the motion data of the gimbal load in real time.
In one embodiment, the target scene includes an indoor scene or an outdoor non-open scene, and/or a scene in which the satellite positioning signal is below a predetermined strength.
In one embodiment, the first image acquisition device 21 includes at least one or more of the following: a lidar, a binocular vision sensor, and a structured light depth camera;
the second image acquisition device 22 includes at least one or more of the following: a visible light camera, a grayscale camera, and an infrared camera.
In one embodiment, the processor 25 is specifically configured to generate the 3D model of the target scene in real time according to the 3D point clouds, the image sequence, and the pose of the first image acquisition device when acquiring the 3D point clouds and/or the pose of the second image acquisition device when acquiring the images.
In one embodiment, when acquiring the relative poses between the 3D point clouds, the processor 25 is further configured to: for a group of 3D point clouds collected over a continuous time series, obtain the transformation relationships between the points according to the motion data corresponding to each point in the group;
and reproject the points in the group of 3D point clouds into the same coordinate system according to the transformation relationships between the points.
In one embodiment, the processor 25 is further configured to: determine the relative pose of the inertial measurement unit 23 within a preset time interval according to the motion data corresponding to each point in the group of 3D point clouds, the preset time interval being the time interval at which adjacent points are acquired;
and obtain the transformation relationships between the points according to the relative pose of the inertial measurement unit 23 within the preset time interval and the extrinsic conversion relationship between the first image acquisition device 21 and the inertial measurement unit 23.
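For illustration, the composition of the inertial measurement unit's relative pose with the extrinsic conversion relationship can be sketched as follows; the matrix conventions and names are assumptions made for the example and are not prescribed by the present description.

```python
import numpy as np

def sensor_relative_pose(T_imu_rel, T_extrinsic):
    """Convert the IMU relative pose over the preset time interval into the
    relative pose of the first image acquisition device.

    T_imu_rel   : (4, 4) relative pose of the inertial measurement unit
    T_extrinsic : (4, 4) extrinsic transform from the first image acquisition
                  device frame to the IMU frame (calibration parameter)
    """
    return np.linalg.inv(T_extrinsic) @ T_imu_rel @ T_extrinsic
```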
In one embodiment, the motion data corresponding to each point is determined according to the acquisition time of that point.
In one embodiment, when acquiring the relative poses between the 3D point clouds, the processor 25 is further configured to:
for two adjacent groups of 3D point clouds, obtain the surface features and/or corner features of each group, wherein both adjacent groups include points collected from the same object;
and obtain the relative pose between the two adjacent groups of 3D point clouds according to their surface features and/or corner features, the relative pose being used to reproject the two adjacent groups of 3D point clouds into the same coordinate system.
In one embodiment, the surface features and/or corner features of each group of 3D point clouds are determined according to the curvature information between the 3D points in that group.
In one embodiment, the surface features of each group of 3D point clouds are determined from curvature information whose curvature is greater than a preset threshold;
the corner features of each group of 3D point clouds are determined from curvature information whose curvature is less than or equal to the preset threshold.
In one embodiment, the processor 25 is further configured to obtain the relative pose between the 3D point clouds according to the similarity between the surface features of the two adjacent groups of 3D point clouds and/or the similarity between the corner features of the two adjacent groups of 3D point clouds.
In one embodiment, the similarity between the surface features of the two adjacent groups of 3D point clouds is determined based on the distance between those surface features; and/or,
the similarity between the corner features of the two adjacent groups of 3D point clouds is determined based on the distance between those corner features.
In one embodiment, the surface feature includes at least one piece of 3D coordinate information indicating a plane and the normal vector of that plane; and/or,
the corner feature includes at least one piece of 3D coordinate information indicating an edge and a vector indicating that edge.
In one embodiment, the image sequence includes a plurality of images acquired over a continuous time sequence;
when acquiring the relative poses between the images, the processor 25 is further configured to extract the feature points in the images and perform registration based on those feature points to obtain the relative poses between the images.
In one embodiment, the image sequence includes a plurality of images acquired over a continuous time sequence;
when acquiring the relative poses between the images, the processor 25 is further configured to:
reproject the 3D point cloud using the motion data, and generate a depth map corresponding to the image according to the reprojected 3D point cloud and the motion data;
extract the feature points in the image, and obtain the depth information of those feature points from the depth map corresponding to the image;
and perform registration according to the feature points of the images and the depth information of the feature points to obtain the relative poses between the images.
In one embodiment, when acquiring the relative poses between the images, the processor 25 is further configured to: in the absence of light, reproject the 3D point cloud using the motion data, generate depth maps corresponding to the images in the image sequence according to the reprojected 3D point cloud and the motion data, and perform registration according to the feature points extracted from the depth maps to obtain the relative poses between the images.
In one embodiment, the processor 25 is further configured to:
establish a pose graph model;
and optimize the pose graph model using the motion data, the relative poses between the 3D point clouds and the relative poses between the images, so as to obtain the pose of the first image acquisition device 21 when acquiring the 3D point clouds and/or the pose of the second image acquisition device 22 when acquiring the images.
In one embodiment, in the pose graph model, each vertex is configured to indicate the pose of the first image acquisition device 21 when acquiring a 3D point cloud and/or the pose of the second image acquisition device 22 when acquiring an image;
and the edges between the vertices are configured to indicate the relative poses between the 3D point clouds, the relative poses between the images, or the motion data.
In one embodiment, the edges between the vertices are further configured to indicate GPS information and/or RTK information.
In one embodiment, the pose graph model includes a sliding-window pose graph model.
In one embodiment, the processor 25 is further configured to: after adding a label index to the 3D model, store the 3D model at the storage location pointed to by the label index; and/or transmit the 3D model to a remote control terminal in real time so that the 3D model is displayed on the remote control terminal in real time.
In one embodiment, the processor 25 is further configured to:
perform loop closure detection according to the similarity between the multiple images in the image sequence; or perform loop closure detection according to the similarity between the multiple depth maps obtained by reprojecting the 3D point clouds; or perform loop closure detection between a down-sampled global model and the 3D model;
globally optimize the pose graph model using the result of the loop closure detection; and update the 3D model according to the globally optimized pose graph model.
In one embodiment, the processor 25 is further configured to globally optimize the pose graph model using the relative poses between the images forming a loop or the relative poses between the points forming a loop.
In one embodiment, a bag-of-words model is used to measure the similarity between the multiple images in the image sequence or the similarity between the multiple depth maps.
In one embodiment, the processor 25 is further configured to:
generate an acquisition trajectory and acquisition times according to the accuracy requirements and/or density requirements of the 3D model;
and control the first image acquisition device 21, the second image acquisition device 22 and the inertial measurement unit 23 to acquire data according to the acquisition trajectory and acquisition times.
In one embodiment, the processor 25 is further configured to:
receive feedback information transmitted by the remote control terminal, the feedback information indicating the positions missed by the first image acquisition device 21 and being input by the user for the 3D model displayed on the remote control terminal;
and control the first image acquisition device 21 to collect the 3D point cloud at the positions in the target scene indicated by the feedback information.
For the specific implementation of the functions and roles of the units in the above device, refer to the implementation of the corresponding steps in the above method, which is not repeated here.
Correspondingly, an embodiment of the present application further provides a movable platform including the above gimbal load.
The movable platform includes at least an unmanned aerial vehicle, an unmanned vehicle, or a mobile robot.
In an exemplary embodiment, a non-transitory computer-readable storage medium including instructions is further provided, for example a memory including instructions, and the instructions can be executed by a processor of an apparatus to complete the above method. For example, the non-transitory computer-readable storage medium may be a ROM, a random access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, or the like.
A non-transitory computer-readable storage medium, wherein, when the instructions in the storage medium are executed by a processor of a terminal, the terminal is enabled to execute the above method.
It should be noted that, herein, relational terms such as first and second are only used to distinguish one entity or operation from another entity or operation, and do not necessarily require or imply any such actual relationship or order between these entities or operations. The terms "comprising", "including" or any other variation thereof are intended to cover a non-exclusive inclusion, so that a process, method, article or device comprising a series of elements includes not only those elements, but also other elements not expressly listed, or elements inherent to such a process, method, article or device. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article or device that includes the element.
The method and apparatus provided by the embodiments of the present application have been described in detail above. Specific examples are used herein to explain the principles and implementations of the present application, and the description of the above embodiments is only intended to help understand the method of the present application and its core idea. Meanwhile, for those of ordinary skill in the art, there will be changes in the specific implementation and scope of application in accordance with the idea of the present application. In summary, the content of this specification should not be construed as a limitation on the present application.

Claims (58)

  1. A three-dimensional reconstruction method, characterized in that it is applied to a gimbal load, the gimbal load being provided with a first image acquisition device and a second image acquisition device, and the method comprising:
    acquiring a 3D point cloud of a target scene, an image sequence, and motion data of the gimbal load, wherein the 3D point cloud is acquired by the first image acquisition device, the image sequence is acquired by the second image acquisition device, and the motion data comprises the pose information of the gimbal load during the acquisition of the 3D point cloud and the image sequence of the target scene;
    registering the 3D point clouds according to the motion data to obtain relative poses between the 3D point clouds, and registering the image sequence to obtain relative poses between the images;
    obtaining, according to the motion data, the relative poses between the 3D point clouds and the relative poses between the images, a pose of the first image acquisition device when acquiring the 3D point clouds and/or a pose of the second image acquisition device when acquiring the images;
    and generating a 3D model of the target scene according to the 3D point clouds, the image sequence, and the pose of the first image acquisition device when acquiring the 3D point clouds and/or the pose of the second image acquisition device when acquiring the images.
  2. The method according to claim 1, characterized in that the acquiring of the 3D point cloud of the target scene, the image sequence and the motion data of the gimbal load comprises: acquiring the 3D point cloud of the target scene, the image sequence and the motion data of the gimbal load in real time.
  3. The method according to claim 1, characterized in that the target scene includes an indoor scene or an outdoor non-open scene; and/or a scene in which the satellite positioning signal is below a predetermined strength.
  4. The method according to claim 1 or 2, characterized in that an inertial measurement unit is further mounted on the gimbal load;
    the first image acquisition device is configured to acquire the 3D point cloud of the target scene;
    the second image acquisition device is configured to acquire the image sequence of the target scene;
    and the inertial measurement unit is configured to acquire the motion data of the gimbal load.
  5. The method according to claim 4, characterized in that the first image acquisition device includes at least one or more of the following: a lidar, a binocular vision sensor, and a structured light depth camera; and the second image acquisition device includes at least one or more of the following: a visible light camera, a grayscale camera, and an infrared camera.
  6. 根据权利要求1所述的方法,其特征在于,所述根据所述三维点云、所述图像序列以及所述第一图像采集装置获取三维点云时的位姿和/或所述第二图像采集装置获取图像时的位姿,生成目标场景的三维模型,包括:The method according to claim 1, wherein the pose and/or the second image when acquiring the three-dimensional point cloud according to the three-dimensional point cloud, the image sequence and the first image acquisition device The acquisition device acquires the pose of the image, and generates a three-dimensional model of the target scene, including:
    根据所述三维点云、所述图像序列以及所述第一图像采集装置获取三维点云时的 位姿和/或所述第二图像采集装置获取图像时的位姿,实时生成目标场景的三维模型。According to the 3D point cloud, the image sequence, and the pose when the first image acquisition device acquires the 3D point cloud and/or the pose when the second image acquisition device acquires the image, a 3D image of the target scene is generated in real time Model.
  7. The method according to claim 4, wherein the registering the three-dimensional point cloud according to the motion data to obtain relative poses between the three-dimensional point clouds comprises:
    for a group of three-dimensional point clouds collected over a continuous time series, obtaining a transformation relationship between points according to the motion data corresponding to each point in the group of three-dimensional point clouds; and
    reprojecting the points in the group of three-dimensional point clouds into a same coordinate system according to the transformation relationship between the points.
  8. The method according to claim 7, wherein the obtaining a transformation relationship between the points according to the motion data corresponding to each point in the group of three-dimensional point clouds comprises:
    determining a relative pose of the inertial measurement unit within a preset time interval according to the motion data corresponding to each point in the group of three-dimensional point clouds, the preset time interval being the time interval at which adjacent points are acquired; and
    obtaining the transformation relationship between the points according to the relative pose of the inertial measurement unit within the preset time interval and an extrinsic transformation relationship between the first image acquisition device and the inertial measurement unit.
  9. The method according to claim 7, wherein the motion data corresponding to each point is determined according to the acquisition time of that point.
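Claims 7-9 describe motion-compensated reprojection of the points inside one scan, using the pose of the device at each point's acquisition time and the sensor-IMU extrinsic. A minimal numerical sketch of that idea follows; the pose-lookup callback, the extrinsic matrix name and the choice of the scan's first point as reference frame are illustrative assumptions, not details taken from the patent.

```python
import numpy as np

def deskew_scan(points, timestamps, imu_pose_at, T_imu_from_lidar):
    """Reproject every point of one scan into the frame of the scan's first point.
    `points` is Nx3, `timestamps` holds per-point acquisition times, `imu_pose_at(t)`
    returns the 4x4 world-from-IMU pose at time t (e.g. from integrated IMU data),
    and `T_imu_from_lidar` maps LiDAR-frame points into the IMU frame."""
    T_world_ref = imu_pose_at(timestamps[0]) @ T_imu_from_lidar   # world-from-LiDAR at the reference time
    T_ref_world = np.linalg.inv(T_world_ref)
    out = np.empty_like(points, dtype=float)
    for i, (p, t) in enumerate(zip(points, timestamps)):
        T_world_lidar = imu_pose_at(t) @ T_imu_from_lidar          # world-from-LiDAR when this point was taken
        p_h = np.append(p, 1.0)                                    # homogeneous coordinates
        out[i] = (T_ref_world @ T_world_lidar @ p_h)[:3]           # express the point in the reference frame
    return out

# Toy usage: a sensor translating 1 m/s along x while three identical range returns arrive.
imu_pose = lambda t: np.block([[np.eye(3), np.array([[t], [0.0], [0.0]])],
                               [np.zeros((1, 3)), np.ones((1, 1))]])
pts = np.array([[2.0, 0.0, 0.0]] * 3)
print(deskew_scan(pts, [0.0, 0.05, 0.1], imu_pose, np.eye(4)))
```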
  10. The method according to claim 7, wherein the registering the three-dimensional point cloud according to the motion data to obtain relative poses between the three-dimensional point clouds further comprises:
    for two adjacent groups of three-dimensional point clouds, obtaining surface features and/or corner features of each group of three-dimensional point clouds, wherein both of the two adjacent groups of three-dimensional point clouds include points collected from a same object; and
    obtaining a relative pose between the two adjacent groups of three-dimensional point clouds according to the surface features and/or corner features of the two adjacent groups, the relative pose between the two adjacent groups being used to reproject the two adjacent groups of three-dimensional point clouds into a same coordinate system.
  11. The method according to claim 10, wherein the surface features and/or corner features of each group of three-dimensional point clouds are determined according to curvature information between points in that group of three-dimensional point clouds.
  12. The method according to claim 11, wherein the surface features of each group of three-dimensional point clouds are determined according to curvature information whose curvature is greater than a preset threshold; and
    the corner features of each group of three-dimensional point clouds are determined according to curvature information whose curvature is less than or equal to the preset threshold.
  13. The method according to claim 10, wherein the obtaining a relative pose between the three-dimensional point clouds according to the surface features and/or corner features of the two adjacent groups of three-dimensional point clouds comprises:
    obtaining the relative pose between the three-dimensional point clouds according to a similarity between the surface features of the two adjacent groups of three-dimensional point clouds and/or a similarity between the corner features of the two adjacent groups of three-dimensional point clouds.
  14. The method according to claim 13, wherein the similarity between the surface features of the two adjacent groups of three-dimensional point clouds is determined based on a distance between the surface features of the two adjacent groups; and/or
    the similarity between the corner features of the two adjacent groups of three-dimensional point clouds is determined based on a distance between the corner features of the two adjacent groups.
  15. The method according to claim 10, wherein a surface feature comprises at least one piece of three-dimensional coordinate information indicating a plane and a normal vector of that plane; and/or
    a corner feature comprises at least one piece of three-dimensional coordinate information indicating an edge and a vector indicating that edge.
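Claims 10-15 register adjacent scans through plane and corner features derived from per-point curvature. The sketch below shows one way such a split could be computed; the LOAM-style smoothness formula, the neighbourhood size and the threshold value are assumptions, and the split direction (high curvature as surface, low as corner) simply follows the wording of claims 11-12.

```python
import numpy as np

def split_surface_corner(scan, k=5, threshold=0.1):
    """Classify points of one scan line into surface and corner candidates using a
    local curvature measure: sum the offsets to the k neighbours on either side of
    each point and take the normalized magnitude (a LOAM-style smoothness value)."""
    n = len(scan)
    curvature = np.full(n, -1.0)                       # -1 marks border points without a full neighbourhood
    for i in range(k, n - k):
        diff = scan[i - k:i + k + 1].sum(axis=0) - (2 * k + 1) * scan[i]
        curvature[i] = np.linalg.norm(diff) / max(np.linalg.norm(scan[i]), 1e-9)
    valid = curvature >= 0.0
    surface = scan[valid & (curvature > threshold)]    # claims 11-12: above the preset threshold -> surface
    corner = scan[valid & (curvature <= threshold)]    # at or below the threshold -> corner
    return surface, corner
```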
  16. The method according to claim 1, wherein the image sequence comprises a plurality of images acquired over a continuous time series; and
    the registering the image sequence to obtain relative poses between the images comprises:
    extracting feature points from the images, and performing registration based on the feature points of the images to obtain the relative poses between the images.
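Claim 16 covers image-only registration from matched feature points. One possible two-view sketch using OpenCV is shown below; the ORB detector, the essential-matrix pipeline and the camera intrinsics `K` are illustrative choices, not taken from the patent, and the recovered translation is only defined up to scale.

```python
import cv2
import numpy as np

def relative_pose_two_view(img_a, img_b, K):
    """Detect and match ORB feature points between two grayscale frames, fit an
    essential matrix with RANSAC, and recover the relative rotation R and the
    unit-scale translation t between the two camera poses."""
    orb = cv2.ORB_create(2000)
    kp_a, des_a = orb.detectAndCompute(img_a, None)
    kp_b, des_b = orb.detectAndCompute(img_b, None)
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = matcher.match(des_a, des_b)
    pts_a = np.float32([kp_a[m.queryIdx].pt for m in matches])
    pts_b = np.float32([kp_b[m.trainIdx].pt for m in matches])
    E, inliers = cv2.findEssentialMat(pts_a, pts_b, K, method=cv2.RANSAC, threshold=1.0)
    _, R, t, _ = cv2.recoverPose(E, pts_a, pts_b, K, mask=inliers)
    return R, t
```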
  17. The method according to claim 1, wherein the image sequence comprises a plurality of images acquired over a continuous time series; and
    the registering the image sequence to obtain relative poses between the images comprises:
    reprojecting the three-dimensional point cloud using the motion data, and generating depth maps corresponding to the images according to the reprojected three-dimensional point cloud and the motion data;
    extracting feature points from the images, and obtaining depth information of the feature points from the depth maps corresponding to the images; and
    performing registration according to the feature points of the images and the depth information of the feature points to obtain the relative poses between the images.
  18. The method according to claim 1, wherein the registering the image sequence to obtain relative poses between the images further comprises:
    in a lightless condition, reprojecting the three-dimensional point cloud using the motion data, and generating depth maps corresponding to the images in the image sequence according to the reprojected three-dimensional point cloud and the motion data; and
    performing registration according to feature points extracted from the depth maps to obtain the relative poses between the images.
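Claims 17-18 give feature points metric depth by reading a depth map rendered from the reprojected point cloud, which lets the relative pose be solved as a rigid transform rather than up to scale. The following sketch is a hedged illustration under assumed choices: ORB matching, a pinhole back-projection and a Kabsch/SVD fit; the patent only requires that the feature depth come from the point-cloud-derived depth map.

```python
import cv2
import numpy as np

def relative_pose_with_depth(img_a, img_b, depth_a, depth_b, K):
    """Match ORB features between two frames, back-project each match to 3D using
    the corresponding depth map and the pinhole intrinsics K, then fit the rigid
    transform (R, t) mapping frame-a points onto frame-b points with Kabsch/SVD."""
    orb = cv2.ORB_create(1500)
    kp_a, des_a = orb.detectAndCompute(img_a, None)
    kp_b, des_b = orb.detectAndCompute(img_b, None)
    matches = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True).match(des_a, des_b)

    def backproject(kp, depth):
        u, v = int(round(kp.pt[0])), int(round(kp.pt[1]))
        z = float(depth[v, u])
        if z <= 0.0:                                   # no projected LiDAR depth at this pixel
            return None
        return np.array([(u - K[0, 2]) * z / K[0, 0],
                         (v - K[1, 2]) * z / K[1, 1],
                         z])

    pairs = [(backproject(kp_a[m.queryIdx], depth_a),
              backproject(kp_b[m.trainIdx], depth_b)) for m in matches]
    A = np.array([a for a, b in pairs if a is not None and b is not None])
    B = np.array([b for a, b in pairs if a is not None and b is not None])

    ca, cb = A.mean(axis=0), B.mean(axis=0)
    U, _, Vt = np.linalg.svd((A - ca).T @ (B - cb))    # Kabsch on centred point sets
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:                           # guard against a reflection solution
        Vt[-1] *= -1
        R = Vt.T @ U.T
    t = cb - R @ ca
    return R, t
```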
  19. The method according to claim 1, wherein the obtaining, according to the motion data, the relative poses between the three-dimensional point clouds and the relative poses between the images, the pose of the first image acquisition device when acquiring the three-dimensional point cloud and/or the pose of the second image acquisition device when acquiring the images comprises:
    building a pose graph model; and
    optimizing the pose graph model using the motion data, the relative poses between the three-dimensional point clouds and the relative poses between the images, to obtain the pose of the first image acquisition device when acquiring the three-dimensional point cloud and/or the pose of the second image acquisition device when acquiring the images.
  20. The method according to claim 19, wherein, in the pose graph model, each vertex is configured to indicate a pose of the first image acquisition device when collecting a three-dimensional point cloud and/or a pose of the second image acquisition device when acquiring an image; and
    the edges between the vertices are configured to indicate the relative poses between the three-dimensional point clouds, the relative poses between the images, or the motion data.
  21. The method according to claim 20, wherein the edges between the vertices are further configured to indicate positioning information, the positioning information comprising at least GPS information and/or RTK information.
  22. The method according to claim 19, wherein the pose graph model comprises a sliding-window pose graph model.
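Claims 19-22 fuse the IMU, scan-to-scan and image-to-image constraints in a pose graph whose vertices are device poses and whose edges are relative-pose measurements. Below is a minimal sketch using the GTSAM library; the solver choice, the noise values and the toy constraints are all assumptions made for illustration, since the patent does not name a particular optimizer.

```python
import numpy as np
import gtsam

# Vertices: device poses at each scan / frame.  Edges: relative-pose measurements
# from IMU integration, point-cloud registration and image registration.  The toy
# constraints below stand in for real measurements.
relative_constraints = [
    (0, 1, gtsam.Pose3(gtsam.Rot3.Yaw(0.1), gtsam.Point3(1.0, 0.0, 0.0))),
    (1, 2, gtsam.Pose3(gtsam.Rot3.Yaw(0.1), gtsam.Point3(1.0, 0.0, 0.0))),
]

graph = gtsam.NonlinearFactorGraph()
initial = gtsam.Values()

prior_noise = gtsam.noiseModel.Diagonal.Sigmas(np.full(6, 1e-6))
edge_noise = gtsam.noiseModel.Diagonal.Sigmas(
    np.array([0.02, 0.02, 0.02, 0.05, 0.05, 0.05]))   # rotation sigmas, then translation sigmas

graph.add(gtsam.PriorFactorPose3(0, gtsam.Pose3(), prior_noise))   # anchor the first pose
initial.insert(0, gtsam.Pose3())

for i, j, rel in relative_constraints:
    graph.add(gtsam.BetweenFactorPose3(i, j, rel, edge_noise))
    if not initial.exists(j):
        initial.insert(j, initial.atPose3(i).compose(rel))          # chain an initial guess

result = gtsam.LevenbergMarquardtOptimizer(graph, initial).optimize()
print(result.atPose3(2))
```

A sliding-window variant (claim 22) would keep only the most recent vertices in the graph and marginalize or drop the older ones.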
  23. The method according to claim 19, further comprising:
    after adding a label index to the three-dimensional model, storing the three-dimensional model at a storage location pointed to by the label index; and/or transmitting the three-dimensional model to a remote control terminal in real time, so that the three-dimensional model is displayed on the remote control terminal in real time.
  24. The method according to claim 1, wherein, after the generating a three-dimensional model of the target scene according to the three-dimensional point cloud, the image sequence, and the pose of the first image acquisition device when acquiring the three-dimensional point cloud and/or the pose of the second image acquisition device when acquiring the images, the method further comprises:
    performing loop closure detection according to similarities between multiple images in the image sequence; or performing loop closure detection according to similarities between multiple depth maps obtained by reprojecting the three-dimensional point cloud; or performing loop closure detection between a downsampled global model and the three-dimensional model;
    globally optimizing the pose graph model using a result of the loop closure detection; and
    updating the three-dimensional model according to the globally optimized pose graph model.
  25. The method according to claim 24, wherein the globally optimizing the pose graph model using a result of the loop closure detection comprises:
    globally optimizing the pose graph model using relative poses between the images forming a loop closure or relative poses between the three-dimensional point clouds forming a loop closure.
  26. The method according to claim 24, wherein a bag-of-words model is used to measure the similarities between the multiple images in the image sequence or the similarities between the multiple depth maps.
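Claim 26 scores frame pairs for loop-closure candidacy with a bag-of-words model. A toy sketch follows; the binary-descriptor vocabulary (which in practice would be trained offline, e.g. by clustering ORB descriptors as DBoW-style systems do) and the cosine scoring are assumptions for illustration.

```python
import numpy as np

def bow_similarity(desc_a, desc_b, vocabulary):
    """Assign each binary (ORB-style) descriptor to its nearest visual word by
    Hamming distance, build a normalized word histogram per frame, and compare the
    two histograms with cosine similarity.  `desc_*` are uint8 arrays of shape
    (N, 32); `vocabulary` is a (K, 32) uint8 array of visual words."""
    def histogram(desc):
        hamming = np.unpackbits(desc[:, None, :] ^ vocabulary[None, :, :], axis=-1).sum(axis=-1)
        words = hamming.argmin(axis=1)
        hist = np.bincount(words, minlength=len(vocabulary)).astype(float)
        return hist / (np.linalg.norm(hist) + 1e-12)
    return float(histogram(desc_a) @ histogram(desc_b))
```

A pair whose score exceeds a chosen threshold would then be registered against each other, and the resulting relative pose added to the pose graph as a loop-closure edge (claim 25).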
  27. The method according to claim 4, further comprising:
    generating an acquisition trajectory and acquisition times according to an accuracy requirement and/or a density requirement of the three-dimensional model; and
    controlling the first image acquisition device, the second image acquisition device and the inertial measurement unit to collect data according to the acquisition trajectory and the acquisition times.
  28. The method according to claim 23, further comprising:
    receiving feedback information transmitted by the remote control terminal, the feedback information being used to indicate a position missed by the first image acquisition device and being input by a user with respect to the three-dimensional model displayed on the remote control terminal; and
    controlling the first image acquisition device to collect a three-dimensional point cloud at the position in the target scene indicated by the feedback information.
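Claims 27-28 derive the acquisition trajectory from the accuracy/density target and re-task the sensor over positions the user marks as missed. As a rough illustration only, a back-and-forth coverage path might be generated as below, with the line spacing chosen from the density requirement; the rectangular area, the path shape and the spacing rule are assumptions, not details from the patent.

```python
def lawnmower_waypoints(x_min, x_max, y_min, y_max, line_spacing, altitude):
    """Generate a simple back-and-forth coverage path over a rectangular area.
    A denser target point cloud implies a smaller line_spacing (and/or slower
    passes), which is how a density/accuracy requirement would feed in."""
    waypoints, y, forward = [], y_min, True
    while y <= y_max:
        x_start, x_end = (x_min, x_max) if forward else (x_max, x_min)
        waypoints.append((x_start, y, altitude))
        waypoints.append((x_end, y, altitude))
        y += line_spacing
        forward = not forward
    return waypoints

print(lawnmower_waypoints(0.0, 20.0, 0.0, 10.0, line_spacing=2.5, altitude=3.0))
```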
  29. A gimbal payload, comprising a first image acquisition device, a second image acquisition device, an inertial measurement unit, a memory for storing executable instructions, and a processor;
    the first image acquisition device being configured to acquire a three-dimensional point cloud of a target scene;
    the second image acquisition device being configured to acquire an image sequence of the target scene;
    the inertial measurement unit being configured to acquire motion data of the gimbal payload during collection of the three-dimensional point cloud and the image sequence, the motion data comprising pose information of the gimbal payload;
    wherein the processor, when executing the executable instructions, is configured to:
    register the three-dimensional point cloud according to the motion data to obtain relative poses between the three-dimensional point clouds; and register the image sequence to obtain relative poses between the images;
    obtain, according to the motion data, the relative poses between the three-dimensional point clouds and the relative poses between the images, a pose of the first image acquisition device when acquiring the three-dimensional point cloud and/or a pose of the second image acquisition device when acquiring the images; and
    generate a three-dimensional model of the target scene according to the three-dimensional point cloud, the image sequence, and the pose of the first image acquisition device when acquiring the three-dimensional point cloud and/or the pose of the second image acquisition device when acquiring the images.
  30. The gimbal payload according to claim 29, wherein
    the first image acquisition device is specifically configured to acquire the three-dimensional point cloud of the target scene in real time;
    the second image acquisition device is specifically configured to acquire the image sequence of the target scene in real time; and
    the inertial measurement unit is specifically configured to acquire the motion data of the gimbal payload in real time.
  31. The gimbal payload according to claim 29, wherein the target scene comprises an indoor scene or a non-open outdoor scene, and/or a scene in which the satellite positioning signal is below a predetermined strength.
  32. The gimbal payload according to claim 29, wherein the first image acquisition device comprises at least one or more of: a lidar, a binocular vision sensor and a structured-light depth camera; and
    the second image acquisition device comprises at least one or more of: a visible-light camera, a grayscale camera and an infrared camera.
  33. The gimbal payload according to claim 29, wherein the processor is specifically configured to generate the three-dimensional model of the target scene in real time according to the three-dimensional point cloud, the image sequence, and the pose of the first image acquisition device when acquiring the three-dimensional point cloud and/or the pose of the second image acquisition device when acquiring the images.
  34. The gimbal payload according to claim 29, wherein, when obtaining the relative poses between the three-dimensional point clouds, the processor is further configured to:
    for a group of three-dimensional point clouds collected over a continuous time series, obtain a transformation relationship between points according to the motion data corresponding to each point in the group of three-dimensional point clouds; and
    reproject the points in the group of three-dimensional point clouds into a same coordinate system according to the transformation relationship between the points.
  35. The gimbal payload according to claim 34, wherein the processor is further configured to:
    determine a relative pose of the inertial measurement unit within a preset time interval according to the motion data corresponding to each point in the group of three-dimensional point clouds, the preset time interval being the time interval at which adjacent points are acquired; and
    obtain the transformation relationship between the points according to the relative pose of the inertial measurement unit within the preset time interval and an extrinsic transformation relationship between the first image acquisition device and the inertial measurement unit.
  36. The gimbal payload according to claim 34, wherein the motion data corresponding to each point is determined according to the acquisition time of that point.
  37. The gimbal payload according to claim 34, wherein, when obtaining the relative poses between the three-dimensional point clouds, the processor is further configured to:
    for two adjacent groups of three-dimensional point clouds, obtain surface features and/or corner features of each group of three-dimensional point clouds, wherein both of the two adjacent groups of three-dimensional point clouds include points collected from a same object; and
    obtain a relative pose between the two adjacent groups of three-dimensional point clouds according to the surface features and/or corner features of the two adjacent groups, the relative pose between the two adjacent groups being used to reproject the two adjacent groups of three-dimensional point clouds into a same coordinate system.
  38. The gimbal payload according to claim 37, wherein the surface features and/or corner features of each group of three-dimensional point clouds are determined according to curvature information between points in that group of three-dimensional point clouds.
  39. The gimbal payload according to claim 38, wherein the surface features of each group of three-dimensional point clouds are determined according to curvature information whose curvature is greater than a preset threshold; and
    the corner features of each group of three-dimensional point clouds are determined according to curvature information whose curvature is less than or equal to the preset threshold.
  40. The gimbal payload according to claim 37, wherein the processor is further configured to obtain the relative pose between the three-dimensional point clouds according to a similarity between the surface features of the two adjacent groups of three-dimensional point clouds and/or a similarity between the corner features of the two adjacent groups of three-dimensional point clouds.
  41. The gimbal payload according to claim 40, wherein the similarity between the surface features of the two adjacent groups of three-dimensional point clouds is determined based on a distance between the surface features of the two adjacent groups; and/or
    the similarity between the corner features of the two adjacent groups of three-dimensional point clouds is determined based on a distance between the corner features of the two adjacent groups.
  42. The gimbal payload according to claim 37, wherein a surface feature comprises at least one piece of three-dimensional coordinate information indicating a plane and a normal vector of that plane; and/or
    a corner feature comprises at least one piece of three-dimensional coordinate information indicating an edge and a vector indicating that edge.
  43. The gimbal payload according to claim 29, wherein the image sequence comprises a plurality of images acquired over a continuous time series; and
    when obtaining the relative poses between the images, the processor is further configured to extract feature points from the images and perform registration based on the feature points of the images to obtain the relative poses between the images.
  44. The gimbal payload according to claim 29, wherein the image sequence comprises a plurality of images acquired over a continuous time series; and
    when obtaining the relative poses between the images, the processor is further configured to:
    reproject the three-dimensional point cloud using the motion data, and generate depth maps corresponding to the images according to the reprojected three-dimensional point cloud and the motion data;
    extract feature points from the images, and obtain depth information of the feature points from the depth maps corresponding to the images; and
    perform registration according to the feature points of the images and the depth information of the feature points to obtain the relative poses between the images.
  45. The gimbal payload according to claim 29, wherein, when obtaining the relative poses between the images, the processor is further configured to: in a lightless condition, reproject the three-dimensional point cloud using the motion data, generate depth maps corresponding to the images in the image sequence according to the reprojected three-dimensional point cloud and the motion data, and perform registration according to feature points extracted from the depth maps to obtain the relative poses between the images.
  46. The gimbal payload according to claim 29, wherein the processor is further configured to:
    build a pose graph model; and
    optimize the pose graph model using the motion data, the relative poses between the three-dimensional point clouds and the relative poses between the images, to obtain the pose of the first image acquisition device when acquiring the three-dimensional point cloud and/or the pose of the second image acquisition device when acquiring the images.
  47. The gimbal payload according to claim 46, wherein, in the pose graph model, each vertex is configured to indicate a pose of the first image acquisition device when collecting a three-dimensional point cloud and/or a pose of the second image acquisition device when acquiring an image; and
    the edges between the vertices are configured to indicate the relative poses between the three-dimensional point clouds, the relative poses between the images, or the motion data.
  48. The gimbal payload according to claim 47, wherein the edges between the vertices are further configured to indicate positioning information, the positioning information comprising at least GPS information and/or RTK information.
  49. The gimbal payload according to claim 46, wherein the pose graph model comprises a sliding-window pose graph model.
  50. The gimbal payload according to claim 46, wherein the processor is further configured to: after adding a label index to the three-dimensional model, store the three-dimensional model at a storage location pointed to by the label index; and/or transmit the three-dimensional model to a remote control terminal in real time, so that the three-dimensional model is displayed on the remote control terminal in real time.
  51. The gimbal payload according to claim 29, wherein the processor is further configured to:
    perform loop closure detection according to similarities between multiple images in the image sequence; or perform loop closure detection according to similarities between multiple depth maps obtained by reprojecting the three-dimensional point cloud; or perform loop closure detection between a downsampled global model and the three-dimensional model;
    globally optimize the pose graph model using a result of the loop closure detection; and update the three-dimensional model according to the globally optimized pose graph model.
  52. The gimbal payload according to claim 51, wherein the processor is further configured to globally optimize the pose graph model using relative poses between the images forming a loop closure or relative poses between the three-dimensional points forming a loop closure.
  53. The gimbal payload according to claim 51, wherein a bag-of-words model is used to measure the similarities between the multiple images in the image sequence or the similarities between the multiple depth maps.
  54. The gimbal payload according to claim 29, wherein the processor is further configured to:
    generate an acquisition trajectory and acquisition times according to an accuracy requirement and/or a density requirement of the three-dimensional model; and
    control the first image acquisition device, the second image acquisition device and the inertial measurement unit to collect data according to the acquisition trajectory and the acquisition times.
  55. The gimbal payload according to claim 50, wherein the processor is further configured to:
    receive feedback information transmitted by the remote control terminal, the feedback information being used to indicate a position missed by the first image acquisition device and being input by a user with respect to the three-dimensional model displayed on the remote control terminal; and
    control the first image acquisition device to collect a three-dimensional point cloud at the position in the target scene indicated by the feedback information.
  56. A movable platform, comprising the gimbal payload according to any one of claims 29 to 55.
  57. The movable platform according to claim 56, wherein the movable platform comprises at least an unmanned aerial vehicle, an unmanned vehicle or a mobile robot.
  58. A computer-readable storage medium storing computer-executable instructions which, when executed by a processor, implement the method according to any one of claims 1 to 28.
PCT/CN2020/120978 2020-10-14 2020-10-14 Three-dimensional reconstruction method, gimbal load, removable platform and computer-readable storage medium WO2022077296A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/CN2020/120978 WO2022077296A1 (en) 2020-10-14 2020-10-14 Three-dimensional reconstruction method, gimbal load, removable platform and computer-readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2020/120978 WO2022077296A1 (en) 2020-10-14 2020-10-14 Three-dimensional reconstruction method, gimbal load, removable platform and computer-readable storage medium

Publications (1)

Publication Number Publication Date
WO2022077296A1 true WO2022077296A1 (en) 2022-04-21

Family

ID=81207575

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/120978 WO2022077296A1 (en) 2020-10-14 2020-10-14 Three-dimensional reconstruction method, gimbal load, removable platform and computer-readable storage medium

Country Status (1)

Country Link
WO (1) WO2022077296A1 (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104236548A (en) * 2014-09-12 2014-12-24 清华大学 Indoor autonomous navigation method for micro unmanned aerial vehicle
CN106780729A (en) * 2016-11-10 2017-05-31 中国人民解放军理工大学 A kind of unmanned plane sequential images batch processing three-dimensional rebuilding method
US20190325089A1 (en) * 2018-04-18 2019-10-24 Reconstruct Inc. Computation of point clouds and joint display of point clouds and building information models with project schedules for monitoring construction progress, productivity, and risk for delays
CN110060332A (en) * 2019-04-09 2019-07-26 上海科技大学 High-precision three-dimensional based on airborne acquisition equipment builds figure and modeling

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114529652A (en) * 2022-04-24 2022-05-24 深圳思谋信息科技有限公司 Point cloud compensation method, device, equipment, storage medium and computer program product
CN114529652B (en) * 2022-04-24 2022-07-19 深圳思谋信息科技有限公司 Point cloud compensation method, device, equipment and storage medium
WO2023237065A1 (en) * 2022-06-08 2023-12-14 先临三维科技股份有限公司 Loop closure detection method and apparatus, and electronic device and medium
CN116844074A (en) * 2023-07-25 2023-10-03 北京爱科农科技有限公司 Panoramic display linkage method for three-dimensional scene and key area of orchard
CN117168441A (en) * 2023-11-02 2023-12-05 西安因诺航空科技有限公司 Multi-sensor fusion SLAM positioning and reconstructing method and system
CN117168441B (en) * 2023-11-02 2024-02-20 西安因诺航空科技有限公司 Multi-sensor fusion SLAM positioning and reconstructing method and system

Similar Documents

Publication Publication Date Title
WO2022077296A1 (en) Three-dimensional reconstruction method, gimbal load, removable platform and computer-readable storage medium
US11263761B2 (en) Systems and methods for visual target tracking
CN112567201B (en) Distance measuring method and device
CN111156998B (en) Mobile robot positioning method based on RGB-D camera and IMU information fusion
CN109887057B (en) Method and device for generating high-precision map
US20210141378A1 (en) Imaging method and device, and unmanned aerial vehicle
KR102191445B1 (en) Need-sensitive image and location capture system and method
WO2022036980A1 (en) Pose determination method and apparatus, electronic device, storage medium, and program
US11906983B2 (en) System and method for tracking targets
CN108810473B (en) Method and system for realizing GPS mapping camera picture coordinate on mobile platform
CN111936821A (en) System and method for positioning
JP2015535980A (en) Image processing method used for vision-based positioning, particularly for apparatus
WO2018193574A1 (en) Flight path generation method, information processing device, flight path generation system, program and recording medium
CN113587934A (en) Robot, indoor positioning method and device and readable storage medium
US11460302B2 (en) Terrestrial observation device having location determination functionality
US20210229810A1 (en) Information processing device, flight control method, and flight control system
WO2020019175A1 (en) Image processing method and apparatus, and photographing device and unmanned aerial vehicle
CA3069813A1 (en) Capturing, connecting and using building interior data from mobile devices
US20210233271A1 (en) Information processing apparatus, server, movable object device, information processing method, and program
KR102402949B1 (en) Acquisition method of image information with improved precision
CN116027351A (en) Hand-held/knapsack type SLAM device and positioning method
US20210256732A1 (en) Image processing method and unmanned aerial vehicle
CN111581322B (en) Method, device and equipment for displaying region of interest in video in map window
TWI738315B (en) Automatic tracking photographic system based on light label
Zhao et al. 2D monocular visual odometry using mobile-phone sensors

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20957076

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20957076

Country of ref document: EP

Kind code of ref document: A1