WO2020113423A1 - Three-dimensional reconstruction method and system for a target scene, and unmanned aerial vehicle

Three-dimensional reconstruction method and system for a target scene, and unmanned aerial vehicle

Info

Publication number
WO2020113423A1
Authority
WO
WIPO (PCT)
Prior art keywords: image, information, key frame, frame, image sequence
Application number
PCT/CN2018/119190
Other languages
English (en)
French (fr)
Inventor
朱晏辰
马东东
石进桥
薛唐立
Original Assignee
深圳市大疆创新科技有限公司
Application filed by 深圳市大疆创新科技有限公司
Priority to CN201880072188.7A (published as CN111433818A)
Priority to PCT/CN2018/119190 (published as WO2020113423A1)
Publication of WO2020113423A1

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T17/00 - Three dimensional [3D] modelling, e.g. data description of 3D objects

Definitions

  • Embodiments of the present invention relate to the technical field of drones, and in particular, to a three-dimensional reconstruction method and system of a target scene and a drone.
  • Simultaneous Localization and Mapping (SLAM) describes the process of starting from an unknown position in an unknown environment, repeatedly observing the environment during movement, estimating the platform's position and attitude from the environmental features sensed by its sensors, and then incrementally building a map according to its position.
  • Due to the particularities of drone aerial photography, existing three-dimensional reconstruction methods produce large reconstruction errors in drone aerial photography scenarios. In summary, there is an urgent need for a three-dimensional reconstruction method for target scenes that can meet the requirements of drone aerial photography.
  • Embodiments of the present invention provide a three-dimensional reconstruction method and system for a target scene, and a drone, to solve the problem that existing methods cannot meet the needs of three-dimensional reconstruction of a target scene in a drone aerial photography scenario.
  • an embodiment of the present invention provides a three-dimensional reconstruction method for a target scene, including:
  • the three-dimensional point cloud of the key frame is fused to obtain a three-dimensional model of the target scene.
  • an embodiment of the present invention provides a three-dimensional reconstruction method for a target scene, including:
  • M is a natural number greater than or equal to 1;
  • an embodiment of the present invention provides a three-dimensional reconstruction system for a target scene, including: a processor and a memory;
  • the memory is used to store program codes
  • the processor calls the program code, and when the program code is executed, it is used to perform the following operations:
  • the three-dimensional point cloud of the key frame is fused to obtain a three-dimensional model of the target scene.
  • an embodiment of the present invention provides a three-dimensional reconstruction system for a target scene, including: a processor and a memory;
  • the memory is used to store program codes
  • the processor calls the program code, and when the program code is executed, it is used to perform the following operations:
  • M is a natural number greater than or equal to 1;
  • an embodiment of the present invention provides a drone, including: a processor;
  • the drone is equipped with a shooting device, and the shooting device is used to shoot a target scene;
  • the processor is used for,
  • the three-dimensional point cloud of the key frame is fused to obtain a three-dimensional model of the target scene.
  • an embodiment of the present invention provides a drone, including: a processor;
  • the drone is equipped with a shooting device, and the shooting device is used to shoot a target scene;
  • the processor is used for,
  • M is a natural number greater than or equal to 1;
  • an embodiment of the present invention provides a three-dimensional reconstruction device (eg, chip, integrated circuit, etc.) of a target scene, including: a memory and a processor.
  • the memory is used to store code for performing a three-dimensional reconstruction method of the target scene.
  • the processor is configured to call the code stored in the memory and execute the three-dimensional reconstruction method of the target scene according to the first aspect or the second aspect of the embodiment of the present invention.
  • an embodiment of the present invention provides a computer-readable storage medium that stores a computer program, where the computer program includes at least one piece of code that can be executed by a computer to control the computer to execute the three-dimensional reconstruction method of the target scene according to the first aspect or the second aspect of the embodiments of the present invention.
  • an embodiment of the present invention provides a computer program, which, when executed by a computer, is used to implement the three-dimensional reconstruction method of the target scene according to the first aspect or the second aspect of the embodiment of the present invention.
  • The three-dimensional reconstruction method and system for a target scene and the drone provided by the embodiments of the present invention acquire an image sequence of the target scene, where the image sequence includes a plurality of image frames that are consecutive in time; obtain key frames according to the image sequence and obtain a three-dimensional point cloud of each key frame based on the image sequence; and fuse the three-dimensional point clouds of the key frames to obtain a three-dimensional model of the target scene.
  • the three-dimensional reconstruction of the target scene under the drone aerial scene is realized.
  • The three-dimensional reconstruction method of the target scene provided by this embodiment does not need to rely on an expensive binocular vision system, nor is it limited by the range of a depth sensor, and can meet the three-dimensional reconstruction requirements of the target scene in drone aerial photography scenarios.
  • FIG. 1 is a schematic structural diagram of an unmanned aerial system provided by an embodiment of the present invention.
  • FIG. 2 is a flowchart of an embodiment of a three-dimensional reconstruction method for a target scene provided by the present invention
  • FIG. 3 is a schematic diagram of reference frame selection in an embodiment of a three-dimensional reconstruction method for a target scene provided by the present invention
  • FIG. 4 is a schematic block diagram of an embodiment of a three-dimensional reconstruction method for a target scene provided by the present invention.
  • FIG. 5 is a schematic structural diagram of an embodiment of a three-dimensional reconstruction system for a target scene provided by the present invention.
  • FIG. 6 is a schematic structural diagram of an embodiment of a drone provided by the present invention.
  • When a component is said to be "fixed" to another component, it can be directly on the other component, or an intervening component may also be present. When a component is considered to be "connected" to another component, it can be directly connected to the other component, or intervening components may be present at the same time.
  • the embodiments of the present invention provide a three-dimensional reconstruction method and system of a target scene and a drone.
  • The drone may be, for example, a rotorcraft, such as a multi-rotor aircraft propelled through the air by a plurality of propulsion devices; the embodiments of the present invention are not limited thereto.
  • FIG. 1 is a schematic architectural diagram of an unmanned aerial system provided by an embodiment of the present invention.
  • a rotary-wing UAV is taken as an example for description.
  • the unmanned aerial system 100 may include a drone 110, a display device 130, and a control terminal 140.
  • the UAV 110 may include a power system 150, a flight control system 160, a rack, and a gimbal 120 carried on the rack.
  • the drone 110 can communicate wirelessly with the control terminal 140 and the display device 130.
  • the rack may include a fuselage and a tripod (also called landing gear).
  • the fuselage may include a center frame and one or more arms connected to the center frame, the one or more arms extending radially from the center frame.
  • the tripod is connected to the fuselage and is used to support the UAV 110 when it lands.
  • The power system 150 may include one or more electronic speed controllers (ESCs) 151, one or more propellers 153, and one or more motors 152 corresponding to the one or more propellers 153. The motor 152 is connected between the electronic speed controller 151 and the propeller 153, and the motor 152 and the propeller 153 are disposed on an arm of the drone 110. The electronic speed controller 151 is used to receive a driving signal generated by the flight control system 160 and provide a driving current to the motor 152 according to the driving signal, so as to control the rotation speed of the motor 152. The motor 152 is used to drive the propeller to rotate, thereby providing power for the flight of the drone 110 and enabling the drone 110 to achieve movement in one or more degrees of freedom.
  • the drone 110 may rotate about one or more rotation axes.
  • the rotation axis may include a roll axis (Roll), a yaw axis (Yaw), and a pitch axis (Pitch).
  • the motor 152 may be a DC motor or an AC motor.
  • the motor 152 may be a brushless motor or a brush motor.
  • the flight control system 160 may include a flight controller 161 and a sensing system 162.
  • the sensing system 162 is used to measure the attitude information of the drone, that is, the position information and status information of the drone 110 in space, for example, three-dimensional position, three-dimensional angle, three-dimensional velocity, three-dimensional acceleration, and three-dimensional angular velocity.
  • the sensing system 162 may include, for example, at least one of a gyroscope, an ultrasonic sensor, an electronic compass, an inertial measurement unit (Inertial Measurement Unit, IMU), a visual sensor, a global navigation satellite system, and a barometer.
  • the global navigation satellite system may be a global positioning system (Global Positioning System, GPS).
  • the flight controller 161 is used to control the flight of the drone 110.
  • the flight of the drone 110 can be controlled according to the attitude information measured by the sensor system 162. It should be understood that the flight controller 161 may control the drone 110 according to pre-programmed program instructions, or may control the drone 110 by responding to one or more control instructions from the control terminal 140.
  • the gimbal 120 may include a motor 122.
  • the gimbal is used to carry the shooting device 123.
  • the flight controller 161 can control the movement of the gimbal 120 through the motor 122.
  • the gimbal 120 may further include a controller for controlling the movement of the gimbal 120 by controlling the motor 122.
  • the gimbal 120 may be independent of the drone 110, or may be a part of the drone 110.
  • the motor 122 may be a DC motor or an AC motor.
  • the motor 122 may be a brushless motor or a brush motor.
  • the gimbal can be located at the top of the drone or at the bottom of the drone.
  • the shooting device 123 may be, for example, a device for capturing images such as a camera or a video camera.
  • the shooting device 123 may communicate with the flight controller and perform shooting under the control of the flight controller.
  • The photographing device 123 of this embodiment includes at least a photosensitive element, for example a complementary metal-oxide-semiconductor (CMOS) sensor or a charge-coupled device (CCD) sensor. It can be understood that the shooting device 123 can also be directly fixed on the drone 110, so that the gimbal 120 can be omitted.
  • the display device 130 is located on the ground side of the unmanned aerial system 100, can communicate with the unmanned aircraft 110 in a wireless manner, and can be used to display the attitude information of the drone 110.
  • the image captured by the imaging device may also be displayed on the display device 130. It should be understood that the display device 130 may be an independent device or may be integrated in the control terminal 140.
  • the control terminal 140 is located at the ground end of the unmanned aerial system 100, and can communicate with the drone 110 in a wireless manner for remote manipulation of the drone 110.
  • the drone 110 may also be equipped with a speaker (not shown in the figure).
  • the speaker is used to play audio files.
  • the speaker may be directly fixed on the drone 110 or may be mounted on the gimbal 120.
  • the shooting device 123 in this embodiment may be, for example, a monocular camera, which is used to shoot a target scene to obtain an image sequence of the target scene.
  • the three-dimensional reconstruction method of the target scene provided by the following embodiment may be executed by, for example, the flight controller 161, and the flight controller 161 acquires the image sequence of the target through the shooting device 123 to realize the three-dimensional reconstruction of the target scene, which may be used for drone flight.
  • The three-dimensional reconstruction method of the target scene can also be performed by the control terminal 140 located on the ground side: for example, the drone transmits the image sequence of the target scene acquired by the shooting device 123 to the control terminal 140 through image transmission technology, and the control terminal 140 completes the three-dimensional reconstruction of the target scene. The method can also be executed by a cloud server (not shown in the figure): the drone transmits the image sequence of the target scene acquired by the shooting device 123 to the cloud server through image transmission technology, and the cloud server completes the three-dimensional reconstruction of the target scene.
  • FIG. 2 is a flowchart of an embodiment of a three-dimensional reconstruction method for a target scene provided by the present invention. As shown in FIG. 2, the method provided in this embodiment may include:
  • a drone equipped with a monocular shooting device may be used to shoot a target scene to obtain an image sequence of the target scene.
  • the target scene is an object that requires three-dimensional reconstruction.
  • A flight path can be planned for the drone, and the flight speed and shooting frame rate can be set, to obtain the image sequence of the target scene; alternatively, shooting locations can be set so that the drone shoots when it reaches each preset shooting location during flight.
  • the image sequence of the target scene acquired in this embodiment includes a plurality of image frames continuous in time sequence.
  • A key frame is an image frame for which depth needs to be recovered in order to realize the three-dimensional reconstruction.
  • the key frame in this embodiment may include one frame among a plurality of image frames consecutive in time series.
  • For example, the first image frame may be used as a key frame, and subsequent key frames may then be determined by filtering with a threshold on the number of matched feature points.
  • the three-dimensional point cloud of the key frame can be determined by performing feature extraction, feature point matching, pose estimation, etc. on the acquired multiple image frames that are consecutive in time sequence.
  • For example, features with rotation invariance may be used, such as the Scale-Invariant Feature Transform (SIFT) and Speeded-Up Robust Features (SURF).
  • the posture estimation of each image frame during shooting can be obtained by sensors mounted on the drone, such as an odometer, a gyroscope, an IMU, and the like.
  • the target scene is three-dimensionally reconstructed according to the three-dimensional point cloud.
  • The three-dimensional reconstruction method of a target scene provided in this embodiment obtains an image sequence of the target scene, where the image sequence includes a plurality of image frames that are consecutive in time; obtains key frames according to the image sequence and obtains a three-dimensional point cloud of each key frame based on the image sequence; and fuses the three-dimensional point clouds of the key frames to obtain a three-dimensional model of the target scene.
  • the three-dimensional reconstruction of the target scene under the drone aerial scene is realized.
  • The three-dimensional reconstruction method of the target scene provided by this embodiment does not need to rely on an expensive binocular vision system, nor is it limited by the range of a depth sensor, and can meet the three-dimensional reconstruction requirements of the target scene in drone aerial photography scenarios.
  • The method may further include: initializing the three-dimensional information of the image sequence.
  • the three-dimensional information of the image sequence can be initialized according to the position information and posture information provided by the sensor.
  • For example, the initialization can be based on real-time kinematic (RTK) information, global positioning system (GPS) information, and gimbal angle information.
  • An implementation of initializing the three-dimensional information of the image sequence may be: acquiring an initial rotation transformation matrix from the visual coordinate system to the world coordinate system according to the gimbal angle information corresponding to the first image frame; correcting the initial rotation transformation matrix according to the real-time kinematic (RTK) information and camera center information corresponding to the following N image frames, to obtain the rotation matrix, translation matrix and scale information from the visual coordinate system to the world coordinate system; and initializing the three-dimensional information of the image sequence according to the rotation matrix, translation matrix and scale information.
  • the initial rotation transformation matrix of the visual coordinate system to the world coordinate system is determined according to the pan/tilt angle information provided by the UAV airborne pan/tilt when shooting the first image frame. According to the initial rotation transformation matrix of the visual coordinate system to the world coordinate system, the absolute positioning information in the real world coordinate system can be obtained.
  • the following N image frames are used to correct the initial rotation transformation matrix.
  • N is a natural number greater than or equal to 1, and the specific value can be set according to actual needs, which is not limited in this embodiment.
  • The initial rotation transformation matrix is corrected to obtain the rotation matrix, translation matrix and scale information.
  • the three-dimensional information of the image sequence is initialized according to the rotation matrix, translation matrix and scale information. Absolute positioning information in the world coordinate system can be obtained.
  • The three-dimensional reconstruction method of the target scene provided in this embodiment, on the basis of the above embodiment, adds a conversion from the visual coordinate system to the world coordinate system during initialization, so that through the rotation matrix, translation matrix and scale information, the solved poses become pose information directly usable in the world coordinate system.
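  • As an illustration of the coordinate-system conversion described above, the following sketch (a minimal example; the Z-Y-X axis convention, the function names and all numeric values are assumptions, not taken from the patent) builds a rotation matrix from gimbal roll/pitch/yaw angles and applies a similarity transform (rotation R, translation t, scale s) to express a point from the visual coordinate system in the world coordinate system.

```python
import numpy as np

def rotation_from_gimbal_angles(roll_deg, pitch_deg, yaw_deg):
    """Build a rotation matrix from gimbal roll/pitch/yaw angles (Z-Y-X order).
    The axis order and sign conventions are assumptions for illustration."""
    r, p, y = np.radians([roll_deg, pitch_deg, yaw_deg])
    Rx = np.array([[1, 0, 0],
                   [0, np.cos(r), -np.sin(r)],
                   [0, np.sin(r),  np.cos(r)]])
    Ry = np.array([[ np.cos(p), 0, np.sin(p)],
                   [0, 1, 0],
                   [-np.sin(p), 0, np.cos(p)]])
    Rz = np.array([[np.cos(y), -np.sin(y), 0],
                   [np.sin(y),  np.cos(y), 0],
                   [0, 0, 1]])
    return Rz @ Ry @ Rx

def visual_to_world(p_visual, R, t, s):
    """Map a point (or camera centre) from the visual frame to the world frame
    using the rotation R, translation t and scale s found during initialization."""
    return s * (R @ p_visual) + t

# Example: initial rotation from the gimbal angles of the first key frame,
# then a visual-frame camera centre expressed in world coordinates.
R0 = rotation_from_gimbal_angles(roll_deg=0.0, pitch_deg=-90.0, yaw_deg=30.0)
p_world = visual_to_world(np.array([1.0, 2.0, 0.5]), R0,
                          t=np.array([10.0, 5.0, 100.0]), s=2.5)
print(p_world)
```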
  • An implementation of obtaining the three-dimensional point cloud of a key frame based on the image sequence may be: acquiring feature information of the image sequence; tracking feature points according to the feature information; and determining the three-dimensional point cloud of the key frame according to the tracking results of the feature points.
  • The feature information in this embodiment may be, for example, image features obtained by performing feature extraction on the image sequence, such as Scale-Invariant Feature Transform (SIFT) features or Speeded-Up Robust Features (SURF).
  • the SIFT feature matching point pair may be determined according to the matching relationship of the SIFT feature points between the current image frame and the current key frame, and the three-dimensional point cloud of the key frame may be determined according to the feature matching point pair.
  • an implementation manner of tracking feature points may be:
  • the first pose information includes: first real-time dynamic RTK information and first gimbal angle information;
  • the estimated second pose information of the second image frame in the world coordinate system includes: second RTK information and second gimbal angle information;
  • the first image frame and the second image frame are two adjacent frames in the image sequence.
  • The pose of the second image frame is estimated according to the RTK information of the first image frame and the gimbal angle information provided by the sensors. Since accurate RTK information and gimbal angle information provided by the sensors are used, the accuracy of the estimated pose information of the second image frame is greatly improved, and the accurate pose information in turn improves the accuracy and speed of feature matching.
  • Performing feature matching between the feature information of the first image frame and the feature information of the second image frame according to the first pose information and the second pose information may specifically include: for the features of the first image frame, determining a corresponding search range in the second image frame according to the first pose information and the second pose information, and performing feature matching within that range. Since accurate pose information is available, the search range can be determined accurately and greatly reduced, so both the accuracy and the speed of feature matching are improved.
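  • The pose-guided search described above can be sketched as follows (an assumed, simplified illustration: the intrinsic matrix K, the predicted pose, the window radius and the helper names are ours, not the patent's). A point triangulated from the first frame is projected into the second frame with the sensor-predicted pose, and only keypoints inside a small window around that projection are considered as match candidates.

```python
import numpy as np

def project(K, R_cw, t_cw, p_world):
    """Project a 3D world point into an image using pose (R_cw, t_cw): x_cam = R_cw @ p + t_cw."""
    p_cam = R_cw @ p_world + t_cw
    uvw = K @ p_cam
    return uvw[:2] / uvw[2]

def candidates_in_window(predicted_uv, keypoints_uv, radius=20.0):
    """Keep only the keypoints of the second frame that fall inside the search window."""
    d = np.linalg.norm(keypoints_uv - predicted_uv, axis=1)
    return np.where(d < radius)[0]

K = np.array([[800.0, 0, 320.0], [0, 800.0, 240.0], [0, 0, 1.0]])
# Pose of the second frame predicted from RTK + gimbal angles (illustrative values).
R2 = np.eye(3)
t2 = np.zeros(3)
p_world = np.array([5.0, -2.0, 80.0])          # point triangulated from the first frame
kps2 = np.random.rand(500, 2) * [640, 480]     # detected keypoints in the second frame
uv_pred = project(K, R2, t2, p_world)
idx = candidates_in_window(uv_pred, kps2, radius=15.0)
print(uv_pred, len(idx), "candidates to match instead of", len(kps2))
```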
  • the overlap rate between two adjacent frames of images is low, resulting in poor tracking of feature points.
  • A judgment is made as to whether the previous frame is a key frame; if it is, the original frame information of the previous frame is replaced with the key frame feature information. Since a key frame undergoes an additional 3D point cloud generation step, the available 3D point cloud generated from the overlapping image area can be exploited to the maximum within a limited time, so that the number of effective feature points that can be tracked is increased.
  • The RTK information and the gimbal angle information provided by the sensors are added to the pose calculation, so that the pose calculation is more accurate and less easily disturbed by mismatches. This solves the problem in the prior art that, in a purely vision-based scheme, mismatches reduce the accuracy of the pose calculation or even cause it to fail.
  • Before fusing the three-dimensional point clouds of the key frames, the three-dimensional reconstruction method of the target scene provided in this embodiment may further include: optimizing the pose information of the key frames and the positions of the three-dimensional point cloud in a non-linear optimization manner according to the RTK information and the gimbal angle information corresponding to the key frames.
  • This embodiment does not limit the specific algorithm used for the nonlinear optimization; for example, the Gauss-Newton method or the Levenberg-Marquardt method may be used.
  • The optimization is performed based on the RTK information and the gimbal angle information. This can include:
  • the local map can be composed of the current frame, the common-view key frames of the current frame, and the point clouds they can observe.
  • the RTK information and the pan/tilt angle information corresponding to each key frame participating in the optimization are added, so that the pose calculation of the key frame and the position of the three-dimensional point cloud are more accurate.
  • The optimized cost function considers not only the reprojection error but also the gap between the currently estimated pose and the pose provided by the sensors, and the optimization is performed with this cost function. This solves the problem in the prior art of poor stability caused by considering only the visual reprojection error.
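  • The patent does not spell out the cost function; a plausible form consistent with this description (our notation, stated as an assumption) adds, to the robust reprojection error, penalty terms on the deviation of each key-frame pose from the RTK position and from the gimbal-derived rotation:

```latex
E(\{R_i, t_i\}, \{\mathbf{X}_j\}) =
  \sum_{i,j} \rho\!\left( \left\| \mathbf{x}_{ij} - \pi(K, R_i, t_i, \mathbf{X}_j) \right\|^2 \right)
  + \lambda_{\mathrm{RTK}} \sum_i \left\| t_i - t_i^{\mathrm{RTK}} \right\|^2
  + \lambda_{\mathrm{gimbal}} \sum_i \left\| \mathrm{Log}\!\left( R_i^{\top} R_i^{\mathrm{gimbal}} \right) \right\|^2
```

  • Here x_ij is the observation of 3D point X_j in key frame i, π is the camera projection, ρ is a robust kernel, and the λ weights balance the visual reprojection term against the RTK and gimbal priors; the sum is minimized over the key-frame poses and the 3D points with, for example, Gauss-Newton or Levenberg-Marquardt.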
  • This embodiment also performs global optimization on all retained key frames and three-dimensional point clouds. It can be understood that adding the RTK information and gimbal angle information to the global optimization makes the final output result more accurate.
  • A reference frame is selected for the key frame in the image sequence, then a depth map of the key frame is determined according to the selected reference frame, and a three-dimensional point cloud of the key frame is obtained according to the depth map of the key frame.
  • the reference frame may include at least a first image frame and a second image frame. Wherein, the first image frame is located before the key frame in time sequence, and the second image frame is located after the key frame in time sequence.
  • The reference frames in this embodiment include both a first image frame that precedes the key frame in time sequence and a second image frame that follows the key frame in time sequence, which improves the overlap rate between the key frame and the reference frames, reduces the area in which the parallax has no solution, and thus improves the accuracy of the depth map of the key frame obtained based on the reference frames.
  • In this embodiment the reference frames include frames both before and after the key frame. For example, if the overlap rate between two adjacent frames is 70% and the reference frame only includes the image frame before the key frame, the parallax has no solution for at least 30% of the key frame.
  • The reference frame selection strategy provided in this embodiment enables every area in the key frame to find a matching area in the reference frames, which avoids unsolvable parallax and improves the accuracy of the depth map of the key frame.
  • the first image frame may include a preset number of image frames before the Nth frame
  • the second image frame may include a preset number of image frames after the Nth frame.
  • Alternatively, the first image frame may be one of a preset number of image frames before the Nth frame, and the second image frame may be one of a preset number of image frames after the Nth frame.
  • The reference frame may also include at least a third image frame.
  • the epipolar directions of the third image frame and the key frame are not parallel.
  • The epipolar line in this embodiment is the epipolar line in epipolar geometry, that is, the intersection line between the epipolar plane and the image.
  • That the epipolar directions of the third image frame and the key frame are not parallel means that the first intersection line of the epipolar plane with the third image frame is not parallel to the second intersection line of the epipolar plane with the key frame.
  • the third image frame may include an image frame that has overlapping pixels with the key frame in the adjacent flight zone of the key frame.
  • the third image frame may be an image frame with the highest overlap rate with the key frame in the adjacent flight zone of the key frame.
  • FIG. 3 is a schematic diagram of reference frame selection in an embodiment of a three-dimensional reconstruction method for a target scene provided by the present invention.
  • the solid line is used to represent the flight path of the drone
  • the route covers the target scene
  • the arrow indicates the flight direction of the drone
  • The black circles and black squares on the flight path indicate positions where the shooting device of the drone captures an image; that is, each black circle or black square corresponds to one image frame of the target scene.
  • the image sequence of the target scene can be obtained through the shooting device mounted on the drone, such as a monocular camera, which includes multiple consecutive image frames in time series.
  • M-1, M, M+1, N-1, N, N+1 in FIG. 3 represent the frame number of the image frame
  • N and M are natural numbers, and the specific values of N and M are not limited in this embodiment .
  • the reference frame may include the N-1th frame and the N+1th frame shown in the figure.
  • the reference frame may include the Mth frame shown in the figure.
  • The reference frame may include the Mth frame, the N-1th frame, and the N+1th frame shown in the figure, that is, the image frames included in the dotted circle in FIG. 3.
  • the reference frame may further include more image frames, for example, the M-1th frame, the M+1th frame, the N-2th frame, and the like.
  • the overlap rate of the key frame and the reference frame and the calculation speed can be comprehensively considered and selected.
  • one implementation manner of obtaining the depth map of the key frame based on the reference frame may be: obtaining the depth map of the key frame according to the disparity between the key frame and the reference frame.
  • The depth map of the key frame can be obtained according to the disparity of the same object between the key frame and the reference frame.
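  • For the standard rectified two-view case this relation is simply depth = focal_length * baseline / disparity; the sketch below (illustrative values for the focal length and baseline, which are not given in the patent) converts a few disparities to depths and marks zero-disparity pixels as invalid.

```python
import numpy as np

def depth_from_disparity(disparity_px, focal_px, baseline_m, min_disp=1e-6):
    """Standard rectified two-view relation: depth = f * B / d.
    Pixels with (near-)zero disparity get an invalid depth (NaN)."""
    d = np.asarray(disparity_px, dtype=float)
    depth = np.full_like(d, np.nan)
    valid = d > min_disp
    depth[valid] = focal_px * baseline_m / d[valid]
    return depth

# Illustrative values: 800 px focal length, 1 m baseline between key and reference frame.
print(depth_from_disparity([8.0, 4.0, 0.0], focal_px=800.0, baseline_m=1.0))
# -> [100. 200.  nan]  (metres)
```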
  • An implementation of obtaining the three-dimensional point cloud of the key frame based on the image sequence may be: obtaining a depth map of the key frame according to the image sequence, and obtaining the three-dimensional point cloud of the key frame according to the depth map of the key frame.
  • An implementation of obtaining the depth map of the key frame according to the image sequence may be: determining a matching cost corresponding to the key frame according to the image sequence, and determining the depth map of the key frame according to the matching cost corresponding to the key frame.
  • the matching cost of the key frame can be determined by matching the image sequence with the pixels in the key frame. After the matching cost corresponding to the key frame is determined, matching cost aggregation can be performed, and then the parallax is determined, and the depth map of the key frame is determined according to the correspondence between the parallax and the depth.
  • parallax optimization can also be performed to enhance the parallax. According to the parallax after optimization and enhancement, the depth map of the key frame is determined.
  • The flying height of the drone is usually about 100 meters, and the drone usually shoots vertically downwards. Because of ground undulation, the reflection of sunlight differs across the scene, so the images taken by the drone contain non-negligible illumination changes, and these illumination changes reduce the accuracy of the three-dimensional reconstruction of the target scene.
  • Determining the matching cost corresponding to the key frame according to the image sequence may include: determining, according to the image sequence, a first-type matching cost and a second-type matching cost corresponding to the key frame; and determining that the matching cost corresponding to the key frame is equal to the weighted sum of the first-type matching cost and the second-type matching cost.
  • Compared with using only a single type of matching cost, the robustness of the matching cost to illumination is improved, thereby reducing the influence of illumination changes on the 3D reconstruction and improving the accuracy of the 3D reconstruction.
  • the weighting coefficients of the first-type matching cost and the second-type matching cost in this embodiment can be set according to specific needs, and this embodiment does not limit this.
  • The first-type matching cost may be determined based on zero-mean normalized cross-correlation (ZNCC). Based on ZNCC, the similarity between the key frame and the reference frame can be measured accurately.
  • the matching cost of the second type may be determined based on the invariant feature of illumination.
  • The illumination-invariant features in the image frames collected by the drone can be extracted, such as local binary patterns (LBP) or census sequences, and the second-type matching cost can then be determined based on these illumination-invariant features.
  • The census sequence in this embodiment can be determined as follows: select any point in the image frame, draw a rectangle, for example 3×3, centered on that point, and compare every point in the rectangle except the center point with the center point; a gray value smaller than that of the center point is recorded as 1, and a gray value greater than that of the center point is recorded as 0. The resulting sequence of length 8, consisting only of 0s and 1s, is used as the census sequence of the center point, that is, the gray value of the center pixel is replaced by its census sequence.
  • the Hamming distance can be used to determine the second type matching cost of the key frame.
  • the matching cost corresponding to the key frame may be equal to the weighted sum of the two matching costs of ZNCC and census.
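  • The sketch below illustrates this weighted combination on a single pair of patches (a simplified, assumed formulation: the 3x3 window, the equal weights and the normalisation to [0, 1] are ours; a real matcher evaluates such a cost over many depth/disparity hypotheses per pixel).

```python
import numpy as np

def zncc(a, b, eps=1e-8):
    """Zero-mean normalized cross-correlation between two equally sized patches."""
    a = a - a.mean()
    b = b - b.mean()
    return float((a * b).sum() / (np.sqrt((a * a).sum() * (b * b).sum()) + eps))

def census(patch):
    """3x3 census: compare the 8 neighbours with the centre pixel (1 if darker than the centre)."""
    c = patch[1, 1]
    bits = (patch < c).astype(np.uint8)
    return np.delete(bits.flatten(), 4)  # drop the centre itself -> 8-bit signature

def matching_cost(p_key, p_ref, w_zncc=0.5, w_census=0.5):
    """Weighted sum of a ZNCC-based cost and a census/Hamming-based cost (both mapped to [0, 1])."""
    cost_zncc = (1.0 - zncc(p_key, p_ref)) / 2.0                          # 0 = identical patches
    cost_census = np.count_nonzero(census(p_key) != census(p_ref)) / 8.0  # Hamming distance
    return w_zncc * cost_zncc + w_census * cost_census

key_patch = np.random.rand(3, 3)
ref_patch = key_patch * 1.3 + 0.1   # same structure under an illumination change
print(matching_cost(key_patch, ref_patch))  # small cost despite the brightness/contrast change
```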
  • An implementation of determining the depth map of the key frame may be: dividing the key frame into multiple image blocks; determining, according to the image sequence, the matching cost corresponding to each image block; and determining the matching cost corresponding to the key frame according to the matching cost corresponding to each image block.
  • one or more of the following methods may be used to divide the key frame into multiple image blocks:
  • the key frame may be divided into multiple image blocks according to the color information and/or texture information of the key frame in a clustering manner.
  • the key frame is evenly divided into multiple image blocks.
  • the number of image blocks may be set in advance, and then the key frames may be divided according to the number of image blocks set in advance.
  • the matching cost corresponding to each image block may be determined in parallel according to the image sequence.
  • the matching cost corresponding to each image block may be determined in parallel by using software and/or hardware.
  • multiple threads may be used to determine the matching cost corresponding to each image block in parallel, and/or a graphics processor (Graphics Processing Unit, GPU) may be used to determine the matching cost corresponding to each image block in parallel.
  • The three-dimensional reconstruction method of the target scene provided in this embodiment, on the basis of the above embodiment, divides the key frame into multiple image blocks, determines the matching cost corresponding to each image block in parallel according to the image sequence, and then determines the matching cost corresponding to the key frame according to the matching cost of each image block, which increases the calculation speed of the matching cost and further improves the real-time performance of the three-dimensional reconstruction of the target scene.
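  • A sketch of block-parallel cost computation with a Python thread pool is shown below (purely illustrative: the block grid, the dummy per-block cost and the use of ThreadPoolExecutor are assumptions; the patent only states that multiple threads and/or a GPU may be used).

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def split_into_blocks(frame, block_h, block_w):
    """Evenly split a key frame into (row, col, block) tuples."""
    h, w = frame.shape
    return [(r, c, frame[r:r + block_h, c:c + block_w])
            for r in range(0, h, block_h) for c in range(0, w, block_w)]

def block_matching_cost(block):
    """Placeholder per-block cost volume; a real implementation would compare
    the block against the reference frames over its depth samples."""
    return np.random.rand(*block.shape, 50)  # 50 depth samples, illustrative

key_frame = np.random.rand(480, 640)
blocks = split_into_blocks(key_frame, 160, 320)
with ThreadPoolExecutor() as pool:
    costs = list(pool.map(lambda b: block_matching_cost(b[2]), blocks))
print(len(blocks), "blocks processed in parallel;", costs[0].shape)
```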
  • the number of depth samples can be determined according to the depth range and accuracy.
  • the number of depth samples is positively related to the depth range and negatively related to the accuracy. For example, if the depth range is 50 meters and the accuracy requirement is 0.1 meters, the number of depth samples can be 500.
  • Determining the matching cost corresponding to each image block according to the image sequence may include: determining the number of depth samples of each image block according to the sparse points in each image block; and determining the matching cost corresponding to each image block according to the image sequence and the number of depth samples of each image block.
  • A key frame can contain a variety of subjects, such as pedestrians, cars, trees and tall buildings, so the depth range of the entire key frame is relatively large, and for a preset accuracy requirement the number of depth samples is correspondingly large.
  • The depth range corresponding to each image block in the key frame is relatively small. For example, when an image block includes only pedestrians, the depth range corresponding to that image block is much smaller than the depth range of the entire key frame, and under the same accuracy requirement the number of depth samples can be greatly reduced. That is to say, under the same accuracy requirement, the number of depth samples of an image block in the key frame is less than or equal to the number of depth samples of the entire key frame.
  • the depth range of each image block is fully considered, and the number of depth samples is set according to the depth range of each image block.
  • the calculation complexity is reduced and the speed is increased.
  • This embodiment may use SLAM to recover some sparse three-dimensional points in each image block, determine the depth range of the image block according to those sparse three-dimensional points, and determine the number of depth samples of the image block according to its depth range and the accuracy requirement; the matching cost corresponding to each image block is then computed with the determined number of depth samples.
  • For example, if the key frame is an image frame with a size of 640*480 pixels and the number of depth samples determined from the depth range of the whole key frame is 500, 640*480*500 matching cost evaluations need to be calculated. If the key frame is evenly divided into image blocks of size 320*160 and the numbers of depth samples of the 6 image blocks determined from their respective depth ranges are 100, 200, 150, 100, 50 and 300, only 320*160*(100+200+150+100+50+300) matching cost evaluations are needed; the amount of calculation is only about one-third of the original.
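  • The arithmetic above can be reproduced with a short sketch (illustrative only: the sparse depths, the one-metre padding and the ceiling rule are assumptions) that derives each block's number of depth samples from the sparse SLAM points falling inside it.

```python
import math

def depth_samples(sparse_depths, accuracy=0.1, margin=1.0):
    """Number of depth samples for one block: its padded depth range divided by the accuracy."""
    d_min, d_max = min(sparse_depths) - margin, max(sparse_depths) + margin
    return math.ceil((d_max - d_min) / accuracy)

# Sparse depths (metres) recovered by SLAM inside each of the 6 blocks (illustrative values).
blocks_sparse_depths = [[96, 104], [82, 100], [91, 104], [96, 104], [100, 103], [70, 98]]
samples_per_block = [depth_samples(d) for d in blocks_sparse_depths]

whole_frame = 640 * 480 * 500                     # cost evaluations with a single depth range
per_block = 320 * 160 * sum(samples_per_block)    # cost evaluations with per-block depth ranges
print(samples_per_block)                          # -> [100, 200, 150, 100, 50, 300]
print(per_block / whole_frame)                    # -> 0.3, i.e. about one third of the work
```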
  • the depth map of the key frame may be determined according to a semi-global matching algorithm (Semi Global Matching, SGM).
  • The 3D reconstruction method of the target scene provided in this embodiment may further include: filtering the depth map of the key frame. By filtering the depth map of the key frame, the noise in the depth map can be filtered out, which improves the accuracy of the 3D reconstruction.
  • An implementation of filtering the depth map of the key frame may be: performing trilateral filtering on the depth map of the key frame.
  • the trilateral filtering in this embodiment means that the weighting coefficients in the filtering process can be comprehensively determined according to the three factors of pixel distance, depth difference and color difference.
  • the size of the filtering template is 5*5, that is to say, the depth value of the target pixel after the filtering process can be determined by the depth values of the pixel and the surrounding 24 pixels.
  • The weight of each pixel's contribution to the depth value of the target pixel is determined according to the Euclidean distance between that pixel and the target pixel, the difference between the depth value of that pixel and the depth value of the target pixel, and the difference between the RGB value of that pixel and the RGB value of the target pixel.
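  • A compact sketch of such a trilateral filter is given below (the Gaussian weighting and all parameter values are assumptions; the patent only specifies that the three factors above drive the weights and that a 5*5 template may be used).

```python
import numpy as np

def trilateral_filter(depth, rgb, radius=2, sigma_s=2.0, sigma_d=0.5, sigma_c=20.0):
    """Filter a depth map with weights from (1) pixel distance, (2) depth difference,
    (3) colour difference, using a (2*radius+1)^2 template (5x5 for radius=2)."""
    h, w = depth.shape
    out = depth.copy()
    ys, xs = np.mgrid[-radius:radius + 1, -radius:radius + 1]
    w_spatial = np.exp(-(xs**2 + ys**2) / (2 * sigma_s**2))
    for y in range(radius, h - radius):
        for x in range(radius, w - radius):
            d_win = depth[y - radius:y + radius + 1, x - radius:x + radius + 1]
            c_win = rgb[y - radius:y + radius + 1, x - radius:x + radius + 1]
            w_depth = np.exp(-((d_win - depth[y, x])**2) / (2 * sigma_d**2))
            c_diff = np.linalg.norm(c_win - rgb[y, x], axis=2)
            w_color = np.exp(-(c_diff**2) / (2 * sigma_c**2))
            wgt = w_spatial * w_depth * w_color
            out[y, x] = (wgt * d_win).sum() / wgt.sum()
    return out

depth = 100.0 + np.random.randn(40, 60) * 0.2      # noisy, roughly planar depth (metres)
rgb = np.random.randint(0, 255, (40, 60, 3)).astype(float)
print(trilateral_filter(depth, rgb).shape)
```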
  • The three-dimensional reconstruction method of the target scene provided in this embodiment further performs trilateral filtering on the depth map of the key frame. Guided by the sharp and fine edge information in the key frame, the accuracy of the depth map at edges is improved while the edges are preserved, and noise is removed more robustly, making the depth map of the key frame more accurate; the 3D reconstruction based on this depth map is therefore also more accurate.
  • a three-dimensional point cloud of key frames is fused to obtain a three-dimensional model of the target scene.
  • An implementation may be: fusing the three-dimensional point cloud corresponding to the key frame into the voxels corresponding to the target scene, to obtain the three-dimensional model of the target scene.
  • a voxel-based three-dimensional point cloud fusion method is used.
  • The route is planned before the drone takes off and the drone shoots vertically downwards, so the coverage of the planned route can be represented with voxels of a preset size.
  • the three-dimensional reconstruction method of the target scene provided by this embodiment has high real-time performance and high scalability.
  • The computational complexity of fusing a 3D point cloud into voxels is O(1), so the fusion has high real-time performance. For a planned task, owing to the particularity of route planning, the target area can be divided into multiple sub-blocks, so that the point cloud is also well partitioned, which facilitates loading of the point cloud and its subsequent display at multiple levels of detail (LOD), and is convenient for real-time 3D reconstruction of large scenes.
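  • A minimal sketch of voxel-based fusion with O(1) insertion per point is shown below (the hash-grid representation, the voxel size and the per-voxel averaging are assumptions made for illustration).

```python
import numpy as np
from collections import defaultdict

class VoxelGrid:
    """Hash-based voxel grid: inserting a point is an O(1) dictionary update."""
    def __init__(self, voxel_size=0.5):
        self.voxel_size = voxel_size
        self.sum = defaultdict(lambda: np.zeros(3))
        self.count = defaultdict(int)

    def fuse(self, points):
        """Fuse a key frame's 3D point cloud (N x 3, world coordinates) into the grid."""
        for p in points:
            key = tuple(np.floor(p / self.voxel_size).astype(int))
            self.sum[key] += p
            self.count[key] += 1

    def model_points(self):
        """One representative point per occupied voxel (the mean of the fused points)."""
        return np.array([self.sum[k] / self.count[k] for k in self.count])

grid = VoxelGrid(voxel_size=0.5)
for _ in range(3):                                   # three key frames, illustrative
    grid.fuse(np.random.rand(1000, 3) * [100, 100, 30])
print(grid.model_points().shape)
```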
  • An embodiment of the present invention also provides a three-dimensional reconstruction method of a target scene, which may include:
  • the M-th image frame in this embodiment may be any frame in the image sequence of the target scene captured by the drone. It can be understood that, in order to complete the conversion of the coordinate system as soon as possible and meet the real-time requirements of the system, the first image frame may be used.
  • the pan/tilt angle information corresponding to the M-th image frame can be obtained by an unmanned aerial vehicle sensor, such as a gyroscope, electronic compass, IMU, odometer, etc.
  • the gimbal angle information in this embodiment may include at least one of the following information: a roll axis (Roll) angle, a yaw axis (Yaw) angle, and a pitch axis (pitch) angle.
  • the positive direction of each angle can be determined using the right-handed spiral rule in the visual coordinate system.
  • the initial rotation transformation matrix of the visual coordinate system to the world coordinate system is determined according to the gimbal angle information provided by the UAV onboard gimbal when the M-th image frame is taken. According to the initial rotation transformation matrix of the visual coordinate system to the world coordinate system, the absolute positioning information in the real world coordinate system can be obtained.
  • P subsequent image frames are used to correct the initial rotation transformation matrix.
  • P is a natural number greater than or equal to 1, and the specific value can be set according to actual needs, which is not limited in this embodiment.
  • The P image frames may be P consecutive image frames immediately following the M-th image frame in time sequence, or may be P frames selected from the image frames that follow the M-th image frame in time sequence.
  • The initial rotation transformation matrix is corrected to obtain the rotation matrix, translation matrix and scale information.
  • the displacement in the visual coordinate system can be converted into the corresponding real distance in the world coordinate system.
  • the information is complementary and absolute positioning information can be obtained.
  • the corresponding pose information of the image sequence in the world coordinate system can be obtained according to the rotation matrix, translation matrix, and scale information.
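  • The correction step can be illustrated with a standard similarity (Umeyama-style) alignment between the camera centres estimated in the visual coordinate system and the corresponding RTK positions; the sketch below is an assumed illustration of that idea, not the patent's algorithm.

```python
import numpy as np

def align_similarity(src, dst):
    """Find scale s, rotation R and translation t such that dst ~= s * R @ src + t
    (least-squares, Umeyama/Procrustes). src and dst are N x 3 corresponding points."""
    mu_s, mu_d = src.mean(0), dst.mean(0)
    src_c, dst_c = src - mu_s, dst - mu_d
    U, S, Vt = np.linalg.svd(dst_c.T @ src_c)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(U @ Vt))])   # guard against reflections
    R = U @ D @ Vt
    s = (S * np.diag(D)).sum() / (src_c ** 2).sum()
    t = mu_d - s * R @ mu_s
    return s, R, t

# Camera centres of P frames in the visual frame vs. their RTK positions (illustrative).
vis = np.random.rand(8, 3)
s_true, R_true, t_true = 3.0, np.eye(3), np.array([400.0, 300.0, 95.0])
rtk = s_true * vis @ R_true.T + t_true
s, R, t = align_similarity(vis, rtk)
print(round(s, 3), np.allclose(R, R_true), np.allclose(t, t_true))
```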
  • The three-dimensional reconstruction method of the target scene provided in this embodiment obtains the initial rotation transformation matrix from the visual coordinate system to the world coordinate system according to the gimbal angle information corresponding to the M-th image frame, corrects the initial rotation transformation matrix according to the real-time kinematic (RTK) information and camera center information corresponding to the P image frames to obtain the rotation matrix, translation matrix and scale information from the visual coordinate system to the world coordinate system, then obtains, according to the rotation matrix, translation matrix and scale information and the image sequence of the target scene, the corresponding pose information of the image sequence in the world coordinate system, tracks the feature points, and obtains the three-dimensional model of the target scene according to the tracking results of the feature points.
  • the pose information obtained in this embodiment is the pose information available in the world coordinate system, and a three-dimensional model of the target scene in the world coordinate system can be obtained.
  • an implementation manner of tracking feature points may be:
  • the first pose information includes: first real-time dynamic RTK information and first pan/tilt angle information;
  • the second pose information includes: second RTK information and second pan/tilt angle information;
  • the first image frame and the second image frame are two adjacent frames in the image sequence.
  • an implementation manner of obtaining the three-dimensional model of the target scene may be:
  • the position and attitude information of the key frame and the position of the three-dimensional point cloud are optimized by nonlinear optimization
  • the three-dimensional reconstruction method of the target scene provided by this embodiment may be implemented by two threads, namely a tracking thread and a graph building thread.
  • the tracking thread includes the steps of initialization, tracking feature points of the previous frame, tracking feature points of the local map, and inter-frame pose calculation.
  • the image information can be acquired through the shooting device mounted on the drone.
  • the initialization, tracking of the feature points of the previous frame and the inter-frame pose calculation are performed.
  • the map-building thread includes steps such as local map generation and local map optimization.
  • the local map can be optimized according to the RTK information provided by the sensors of the drone platform and the pan/tilt angle.
  • The local map in this embodiment may be composed of the current frame, the co-visible key frames of the current frame, and the point cloud that they can observe. Due to the introduction of the more accurate RTK information and gimbal angle information provided by the sensors, the entire system no longer depends only on visual measurement, so its robustness is improved, and this embodiment can still handle cases of poor visual information well. Due to the introduction of the more accurate RTK information and gimbal angle information, the inter-frame pose calculation in this embodiment has higher pose determination accuracy, so that accurate pose information can still be obtained even when many visual feature points are mismatched.
  • In scenes with a lower overlap rate between adjacent image frames, a better feature tracking effect is achieved, making feature tracking less likely to be lost.
  • the tracking thread and the mapping thread in this embodiment can be executed in parallel to increase the speed of three-dimensional reconstruction of the target scene and improve real-time performance.
  • FIG. 5 is a schematic structural diagram of an embodiment of a three-dimensional reconstruction system for a target scene provided by the present invention.
  • the target scene three-dimensional reconstruction system 500 provided in this embodiment may include: a processor 501 and a memory 502.
  • the processor 501 and the memory 502 are communicatively connected via a bus.
  • The processor 501 may be a central processing unit (CPU), another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), an off-the-shelf field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like.
  • the general-purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
  • The above-mentioned memory 502 may be, but is not limited to, random access memory (RAM), read-only memory (ROM), programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), and the like.
  • the memory 502 is used to store the program code; the processor 501 calls the program code, and when the program code is executed, it is used to perform the following operations:
  • the three-dimensional point cloud of the key frame is fused to obtain a three-dimensional model of the target scene.
  • processor 501 is also used for:
  • the three-dimensional information of the image sequence is initialized.
  • the processor 501 is used to initialize the three-dimensional information of the image sequence, which may specifically include:
  • the three-dimensional information of the image sequence is initialized according to the rotation matrix, translation matrix and scale information.
  • the processor 501 is used to obtain a three-dimensional point cloud of the key frame based on the image sequence, which may specifically include:
  • the three-dimensional point cloud of the key frame is determined.
  • the processor 501 is used for tracking feature points according to the feature information, which may specifically include:
  • the first pose information includes: first real-time dynamic RTK information and first gimbal angle information;
  • the estimated second pose information of the second image frame in the world coordinate system includes: second RTK information and second gimbal angle information;
  • the first image frame and the second image frame are two adjacent frames in the image sequence.
  • The processor 501 is further configured to, before fusing the three-dimensional point cloud of the key frame, optimize the pose information of the key frame and the position of the three-dimensional point cloud in a non-linear optimization manner according to the RTK information and the gimbal angle information corresponding to the key frame.
  • the processor 501 is used to obtain a three-dimensional point cloud of the key frame based on the image sequence, which may specifically include:
  • a three-dimensional point cloud of the key frame is obtained.
  • the processor 501 is configured to obtain a depth map of the key frame according to the image sequence, which may specifically include:
  • the depth map of the key frame is determined according to the matching cost corresponding to the key frame.
  • the processor 501 is used to determine the matching cost corresponding to the key frame according to the image sequence, which may specifically include:
  • the matching cost corresponding to the key frame is equal to the weighted sum of the first type matching cost and the second type matching cost.
  • the matching cost of the first type is determined based on a zero-mean normalized cross-correlation.
  • the matching cost of the second type is determined based on the invariant feature of illumination.
  • the processor 501 is configured to determine the depth map of the key frame according to the matching cost corresponding to the key frame, which may specifically include:
  • the matching cost corresponding to the key frame is determined according to the matching cost corresponding to each image block.
  • the processor 501 is used to divide the key frame into multiple image blocks, which may specifically include:
  • the key frame is divided into multiple image blocks.
  • the processor 501 is used to divide the key frame into multiple image blocks, which may specifically include:
  • the key frame is evenly divided into multiple image blocks.
  • the processor 501 is used to determine the matching cost corresponding to each image block according to the image sequence, which may specifically include:
  • the matching cost corresponding to each image block is determined in parallel.
  • the processor 501 is used to determine the matching cost corresponding to each image block according to the image sequence, which may specifically include:
  • the matching cost corresponding to each image block is determined according to the image sequence and the number of depth sampling times of each image block.
  • the processor 501 is further configured to filter the depth map of the key frame after obtaining the depth map of the key frame according to the image sequence.
  • the processor 501 is used to filter the depth map of the key frame, which specifically includes:
  • the processor 501 is used to fuse the three-dimensional point cloud of the key frame to obtain a three-dimensional model of the target scene, which may specifically include:
  • a three-dimensional model of the target scene is obtained.
  • An embodiment of the present invention also provides a three-dimensional reconstruction system for a target scene, including: a processor and a memory.
  • For a specific implementation of the processor, reference may be made to the schematic structural diagram of the three-dimensional reconstruction system for the target scene shown in FIG. 5.
  • the memory is used to store the program code; the processor, calling the program code, when the program code is executed, is used to perform the following operations:
  • M is a natural number greater than or equal to 1;
  • the initial rotation transformation matrix is corrected to obtain the rotation matrix, translation matrix and scale information from the visual coordinate system to the world coordinate system.
  • the P image frames are located after the M-th image frame in time sequence, and P is a natural number greater than or equal to 1;
  • the processor is used for tracking feature points according to the corresponding pose information of the image sequence in the world coordinate system, which may specifically include:
  • the first pose information includes: first real-time dynamic RTK information and first pan/tilt angle information;
  • the second pose information includes: second RTK information and second pan/tilt angle information;
  • the first image frame and the second image frame are two adjacent frames in the image sequence.
  • the processor is used to obtain a three-dimensional model of the target scene according to the tracking results of the feature points, which may specifically include:
  • the position and attitude information of the key frame and the position of the three-dimensional point cloud are optimized by nonlinear optimization
  • the drone 600 provided in this embodiment may include a processor 601.
  • the processor 601 may be a central processing unit (Central Processing Unit, CPU), or other general-purpose processors, digital signal processors (Digital Signal Processor, DSP), application specific integrated circuits (Application Specific Integrated Circuit, ASIC), off-the-shelf Field-Programmable Gate Array (FPGA) or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, etc.
  • the general-purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
  • the drone 600 is equipped with a shooting device 602, and the shooting device 602 is used to shoot a target scene.
  • the processor 601 is used to obtain an image sequence of the target scene, and the image sequence includes a plurality of image frames continuous in time sequence;
  • the key frame is obtained according to the image sequence, and the three-dimensional point cloud of the key frame is obtained based on the image sequence; the three-dimensional point cloud of the key frame is fused to obtain a three-dimensional model of the target scene.
  • the processor 601 is further configured to initialize the three-dimensional information of the image sequence before obtaining the key frame according to the image sequence.
  • the processor 601 is used to initialize the three-dimensional information of the image sequence, which may specifically include:
  • an initial rotation transformation matrix from the visual coordinate system to the world coordinate system is obtained according to the gimbal angle information corresponding to the first image frame; the initial rotation transformation matrix is corrected according to the real-time kinematic (RTK) information and camera center information corresponding to N image frames, to obtain the rotation matrix, translation matrix and scale information from the visual coordinate system to the world coordinate system; and the three-dimensional information of the image sequence is initialized according to the rotation matrix, translation matrix and scale information.
  • the processor 601 is used to obtain a three-dimensional point cloud of the key frame based on the image sequence, which may specifically include:
  • feature information of the image sequence is obtained; feature points are tracked according to the feature information; and the three-dimensional point cloud of the key frame is determined according to the tracking results of the feature points.
  • the processor 601 is used for tracking feature points according to the feature information, which may specifically include:
  • first pose information of the first image frame in the world coordinate system is obtained, where the first pose information includes first real-time kinematic (RTK) information and first gimbal angle information;
  • second pose information of the second image frame in the world coordinate system is estimated according to the first pose information, where the second pose information includes second RTK information and second gimbal angle information;
  • feature matching is performed between the feature information of the first image frame and the feature information of the second image frame according to the first pose information and the second pose information, and feature points are tracked according to the feature matching results;
  • the first image frame and the second image frame are two adjacent frames in the image sequence.
  • the processor 601 is further configured to, before fusing the three-dimensional point cloud of the key frame, optimize the pose information of the key frame and the position of the three-dimensional point cloud in a nonlinear optimization manner according to the RTK information and gimbal angle information corresponding to the key frame.
  • the processor 601 is used to obtain a three-dimensional point cloud of the key frame based on the image sequence, which may specifically include:
  • the depth map of the key frame is obtained according to the image sequence, and the three-dimensional point cloud of the key frame is obtained according to the depth map of the key frame.
  • the processor 601 is used to obtain a depth map of the key frame according to the image sequence, which may specifically include:
  • the matching cost corresponding to the key frame is determined according to the image sequence, and the depth map of the key frame is determined according to the matching cost corresponding to the key frame.
  • the processor 601 is configured to determine the matching cost corresponding to the key frame according to the image sequence, which may specifically include:
  • a first type matching cost and a second type matching cost corresponding to the key frame are determined according to the image sequence, and the matching cost corresponding to the key frame is equal to the weighted sum of the first type matching cost and the second type matching cost.
  • the matching cost of the first type is determined based on a zero-mean normalized cross-correlation.
  • the matching cost of the second type is determined based on illumination-invariant features.
  • the processor 601 is configured to determine the depth map of the key frame according to the matching cost corresponding to the key frame, which may specifically include:
  • the key frame is divided into multiple image blocks; the matching cost corresponding to each image block is determined according to the image sequence; and the matching cost corresponding to the key frame is determined according to the matching cost corresponding to each image block.
  • the processor 601 is used to divide the key frame into multiple image blocks, which may specifically include:
  • using clustering, the key frame is divided into multiple image blocks.
  • the processor 601 is used to divide the key frame into multiple image blocks, which may specifically include:
  • the key frame is evenly divided into multiple image blocks.
  • the processor 601 is used to determine the matching cost corresponding to each image block according to the image sequence, which may specifically include:
  • the matching cost corresponding to each image block is determined in parallel according to the image sequence.
  • the processor 601 is used to determine the matching cost corresponding to each image block according to the image sequence, which may specifically include:
  • the number of depth sampling times of each image block is determined according to the sparse points in that image block; the matching cost corresponding to each image block is then determined according to the image sequence and the number of depth sampling times of each image block.
  • the processor 601 is further configured to filter the depth map of the key frame after obtaining the depth map of the key frame according to the image sequence.
  • the processor 601 is used to filter the depth map of the key frame, which may specifically include:
  • trilateral filtering is performed on the depth map of the key frame.
  • the processor 601 is used to fuse the three-dimensional point cloud of the key frame to obtain a three-dimensional model of the target scene, which may specifically include:
  • the three-dimensional point cloud corresponding to the key frame is fused into the voxels corresponding to the target scene, and the three-dimensional model of the target scene is obtained according to the voxels corresponding to the target scene.
  • An embodiment of the present invention also provides a drone; for a specific implementation, reference may be made to the structural diagram of the drone shown in FIG. 6. The drone may include a processor, and a shooting device is mounted on the drone for shooting the target scene; the processor is used to perform the following operations:
  • an initial rotation transformation matrix from the visual coordinate system to the world coordinate system is obtained according to the gimbal angle information corresponding to the M-th image frame, where M is a natural number greater than or equal to 1;
  • the initial rotation transformation matrix is corrected according to the real-time kinematic (RTK) information and camera center information corresponding to P image frames, to obtain the rotation matrix, translation matrix and scale information from the visual coordinate system to the world coordinate system, where the P image frames are located after the M-th image frame in time sequence and P is a natural number greater than or equal to 1;
  • the pose information of the image sequence in the world coordinate system is obtained according to the rotation matrix, translation matrix, scale information and the image sequence of the target scene;
  • feature points are tracked according to the pose information of the image sequence in the world coordinate system;
  • a three-dimensional model of the target scene is obtained according to the tracking results of the feature points.
  • the processor is used for tracking feature points according to the corresponding pose information of the image sequence in the world coordinate system, which may specifically include:
  • first pose information of the first image frame in the world coordinate system is obtained, where the first pose information includes first real-time kinematic (RTK) information and first gimbal angle information;
  • second pose information of the second image frame in the world coordinate system is estimated according to the first pose information, where the second pose information includes second RTK information and second gimbal angle information;
  • feature matching is performed between the feature information of the first image frame and the feature information of the second image frame according to the first pose information and the second pose information, and feature points are tracked according to the feature matching results;
  • the first image frame and the second image frame are two adjacent frames in the image sequence.
  • the processor is used to obtain a three-dimensional model of the target scene according to the tracking results of the feature points, which may specifically include:
  • the three-dimensional point cloud of the key frame is determined according to the tracking results of the feature points; the pose information of the key frame and the position of the three-dimensional point cloud are optimized in a nonlinear optimization manner according to the RTK information and gimbal angle information corresponding to the key frame; and the three-dimensional model of the target scene is obtained according to the optimized pose information of the key frame and the position of the three-dimensional point cloud.
  • An embodiment of the present invention further provides a three-dimensional reconstruction device (such as a chip, an integrated circuit, etc.) of a target scene, including: a memory and a processor.
  • the memory is used to store code for performing a three-dimensional reconstruction method of the target scene.
  • the processor is configured to call the code stored in the memory and execute the three-dimensional reconstruction method of the target scene described in any of the foregoing method embodiments.
  • An embodiment of the present invention also provides a computer-readable storage medium that stores a computer program, the computer program including at least one piece of code, and the at least one piece of code can be executed by a computer to control the computer to execute the three-dimensional reconstruction method of the target scene according to any one of the foregoing method embodiments.
  • An embodiment of the present invention provides a computer program which, when executed by a computer, is used to implement the three-dimensional reconstruction method of the target scene described in any of the foregoing method embodiments.
  • Those of ordinary skill in the art may understand that all or part of the steps of the foregoing method embodiments can be implemented by hardware related to program instructions; the foregoing program may be stored in a computer-readable storage medium, and when the program is executed, the steps of the above method embodiments are performed; and the foregoing storage media include various media that can store program code, such as read-only memory (Read-Only Memory, ROM), random access memory (Random Access Memory, RAM), magnetic disks or optical disks.

Abstract

一种目标场景三维重建方法、系统及无人机,通过获取目标场景的图像序列,所述图像序列包含在时序上连续的多个图像帧(S201);根据所述图像序列获得关键帧,并基于所述图像序列获得所述关键帧的三维点云(S202);融合所述关键帧的三维点云,获得所述目标场景的三维模型(S203)。该方法既无需依赖价格高昂的双目视觉系统,也不受深度传感器的深度限制,能够满足无人机航拍场景下,对目标场景的三维重建需求。

Description

目标场景三维重建方法、系统及无人机 技术领域
本发明实施例涉及无人机技术领域,尤其涉及一种目标场景三维重建方法、系统及无人机。
背景技术
随着图像处理技术的不断发展,利用图像序列进行场景的三维重建已经成为计算机视觉领域和摄影测量学领域的热点问题。同步定位与地图构建(Simultaneous Localization and Mapping,SLAM)描述了从未知环境的未知位置出发,在运动过程中重复观测环境,根据传感器感知的环境特征定位自身位置和姿态,再根据自身位置增量式的构建地图。
由于无人机航拍的特殊性,现有三维重建方法在无人机航拍场景下,三维重建误差大。综上所述,亟需一种能够满足无人机航拍场景需求的目标场景三维重建方法。
发明内容
本发明实施例提供一种目标场景三维重建方法、系统及无人机,用以解决现有方法无法满足无人机航拍场景下目标场景三维重建的需求。
第一方面,本发明实施例提供一种目标场景三维重建方法,包括:
获取目标场景的图像序列,所述图像序列包含在时序上连续的多个图像帧;
根据所述图像序列获得关键帧,并基于所述图像序列获得所述关键帧的三维点云;
融合所述关键帧的三维点云,获得所述目标场景的三维模型。
第二方面,本发明实施例提供一种目标场景三维重建方法,包括:
根据第M帧图像帧对应的云台角信息,获取视觉坐标系到世界坐标系的初始旋转变换矩阵,M为大于等于1的自然数;
根据P个图像帧对应的实时动态RTK信息和相机中心信息,对所述初始 旋转变换矩阵进行校正,获取视觉坐标系到世界坐标系的旋转矩阵、平移矩阵和尺度信息,所述P个图像帧在时序上位于所述第M帧图像帧之后,P为大于等于1的自然数;
根据所述旋转矩阵、平移矩阵和尺度信息以及目标场景的图像序列,获取所述图像序列在世界坐标系下对应的位姿信息;
根据所述图像序列在世界坐标系下对应的位姿信息,进行特征点的跟踪;
根据所述特征点的跟踪结果,获得所述目标场景的三维模型。
第三方面,本发明实施例提供一种目标场景三维重建系统,包括:处理器和存储器;
所述存储器,用于存储程序代码;
所述处理器,调用所述程序代码,当程序代码被执行时,用于执行以下操作:
获取目标场景的图像序列,所述图像序列包含在时序上连续的多个图像帧;
根据所述图像序列获得关键帧,并基于所述图像序列获得所述关键帧的三维点云;
融合所述关键帧的三维点云,获得所述目标场景的三维模型。
第四方面,本发明实施例提供一种目标场景三维重建系统,包括:处理器和存储器;
所述存储器,用于存储程序代码;
所述处理器,调用所述程序代码,当程序代码被执行时,用于执行以下操作:
根据第M帧图像帧对应的云台角信息,获取视觉坐标系到世界坐标系的初始旋转变换矩阵,M为大于等于1的自然数;
根据P个图像帧对应的实时动态RTK信息和相机中心信息,对所述初始旋转变换矩阵进行校正,获取视觉坐标系到世界坐标系的旋转矩阵、平移矩阵和尺度信息,所述P个图像帧在时序上位于所述第M帧图像帧之后,P为大于等于1的自然数;
根据所述旋转矩阵、平移矩阵和尺度信息以及目标场景的图像序列,获取所述图像序列在世界坐标系下对应的位姿信息;
根据所述图像序列在世界坐标系下对应的位姿信息,进行特征点的跟踪;
根据所述特征点的跟踪结果,获得所述目标场景的三维模型。
第五方面,本发明实施例提供一种无人机,包括:处理器;
所述无人机上搭载有拍摄装置,所述拍摄装置用于对目标场景进行拍摄;
所述处理器用于,
获取目标场景的图像序列,所述图像序列包含在时序上连续的多个图像帧;
根据所述图像序列获得关键帧,并基于所述图像序列获得所述关键帧的三维点云;
融合所述关键帧的三维点云,获得所述目标场景的三维模型。
第六方面,本发明实施例提供一种无人机,包括:处理器;
所述无人机上搭载有拍摄装置,所述拍摄装置用于对目标场景进行拍摄;
所述处理器用于,
根据第M帧图像帧对应的云台角信息,获取视觉坐标系到世界坐标系的初始旋转变换矩阵,M为大于等于1的自然数;
根据P个图像帧对应的实时动态RTK信息和相机中心信息,对所述初始旋转变换矩阵进行校正,获取视觉坐标系到世界坐标系的旋转矩阵、平移矩阵和尺度信息,所述P个图像帧在时序上位于所述第M帧图像帧之后,P为大于等于1的自然数;
根据所述旋转矩阵、平移矩阵和尺度信息以及目标场景的图像序列,获取所述图像序列在世界坐标系下对应的位姿信息;
根据所述图像序列在世界坐标系下对应的位姿信息,进行特征点的跟踪;
根据所述特征点的跟踪结果,获得所述目标场景的三维模型。
第七方面,本发明实施例提供一种目标场景三维重建装置(例如芯片、集成电路等),包括:存储器和处理器。所述存储器,用于存储执行目标场景三维重建方法的代码。所述处理器,用于调用所述存储器中存储的所述代码,执行如第一方面或者如第二方面本发明实施例所述的目标场景三维重建方法。
第八方面,本发明实施例提供一种计算机可读存储介质,所述计算机可读存储介质存储有计算机程序,所述计算机程序包含至少一段代码,所述至 少一段代码可由计算机执行,以控制所述计算机执行第一方面或者如第二方面本发明实施例所述的目标场景三维重建方法。
第九方面,本发明实施例提供一种计算机程序,当所述计算机程序被计算机执行时,用于实现第一方面或者第二方面本发明实施例所述的目标场景三维重建方法。
本发明实施例提供的目标场景三维重建方法、系统及无人机,通过获取目标场景的图像序列,所述图像序列包含在时序上连续的多个图像帧;根据所述图像序列获得关键帧,并基于所述图像序列获得所述关键帧的三维点云;融合所述关键帧的三维点云,获得所述目标场景的三维模型。实现了无人机航拍场景下,目标场景的三维重建。本实施例提供的目标场景三维重建方法,既无需依赖价格高昂的双目视觉系统,也不受深度传感器的深度限制,能够满足无人机航拍场景下,对目标场景的三维重建需求。
附图说明
为了更清楚地说明本发明实施例或现有技术中的技术方案,下面将对实施例或现有技术描述中所需要使用的附图作一简单地介绍,显而易见地,下面描述中的附图是本发明的一些实施例,对于本领域普通技术人员来讲,在不付出创造性劳动的前提下,还可以根据这些附图获得其他的附图。
图1是本发明实施例提供的无人飞行系统的示意性架构图;
图2为本发明提供的目标场景三维重建方法一实施例的流程图;
图3为本发明提供的目标场景三维重建方法一实施例中参考帧选取的示意图;
图4为本发明提供的目标场景三维重建方法一实施例的示意性框图;
图5为本发明提供的目标场景三维重建系统一实施例的结构示意图;
图6为本发明提供的无人机一实施例的结构示意图。
具体实施方式
为使本发明实施例的目的、技术方案和优点更加清楚,下面将结合本发明实施例中的附图,对本发明实施例中的技术方案进行清楚、完整地描述,显然,所描述的实施例是本发明一部分实施例,而不是全部的实施例。基于 本发明中的实施例,本领域普通技术人员在没有作出创造性劳动前提下所获得的所有其他实施例,都属于本发明保护的范围。
需要说明的是,当组件被称为“固定于”另一个组件,它可以直接在另一个组件上或者也可以存在居中的组件。当一个组件被认为是“连接”另一个组件,它可以是直接连接到另一个组件或者可能同时存在居中组件。
除非另有定义,本文所使用的所有的技术和科学术语与属于本发明的技术领域的技术人员通常理解的含义相同。本文中在本发明的说明书中所使用的术语只是为了描述具体的实施例的目的,不是旨在于限制本发明。本文所使用的术语“及/或”包括一个或多个相关的所列项目的任意的和所有的组合。
下面结合附图,对本发明的一些实施方式作详细说明。在不冲突的情况下,下述的实施例及实施例中的特征可以相互组合。
本发明的实施例提供了目标场景三维重建方法、系统及无人机。其中无人机例如可以是旋翼飞行器(rotorcraft),例如,由多个推动装置通过空气推动的多旋翼飞行器,本发明的实施例并不限于此。
图1是本发明实施例提供的无人飞行系统的示意性架构图。本实施例以旋翼无人机为例进行说明。
无人飞行系统100可以包括无人机110、显示设备130和控制终端140。其中,无人机110可以包括动力系统150、飞行控制系统160、机架和承载在机架上的云台120。无人机110可以与控制终端140和显示设备130进行无线通信。
机架可以包括机身和脚架(也称为起落架)。机身可以包括中心架以及与中心架连接的一个或多个机臂,一个或多个机臂呈辐射状从中心架延伸出。脚架与机身连接,用于在无人机110着陆时起支撑作用。
动力系统150可以包括一个或多个电子调速器(简称为电调)151、一个或多个螺旋桨153以及与一个或多个螺旋桨153相对应的一个或多个电机152,其中电机152连接在电子调速器151与螺旋桨153之间,电机152和螺旋桨153设置在无人机110的机臂上;电子调速器151用于接收飞行控制系统160产生的驱动信号,并根据驱动信号提供驱动电流给电机152,以控制电机152的转速。电机152用于驱动螺旋桨旋转,从而为无人机110的飞行提供动力, 该动力使得无人机110能够实现一个或多个自由度的运动。在某些实施例中,无人机110可以围绕一个或多个旋转轴旋转。例如,上述旋转轴可以包括横滚轴(Roll)、偏航轴(Yaw)和俯仰轴(pitch)。应理解,电机152可以是直流电机,也可以交流电机。另外,电机152可以是无刷电机,也可以是有刷电机。
飞行控制系统160可以包括飞行控制器161和传感系统162。传感系统162用于测量无人机的姿态信息,即无人机110在空间的位置信息和状态信息,例如,三维位置、三维角度、三维速度、三维加速度和三维角速度等。传感系统162例如可以包括陀螺仪、超声传感器、电子罗盘、惯性测量单元(Inertial Measurement Unit,IMU)、视觉传感器、全球导航卫星系统和气压计等传感器中的至少一种。例如,全球导航卫星系统可以是全球定位系统(Global Positioning System,GPS)。飞行控制器161用于控制无人机110的飞行,例如,可以根据传感系统162测量的姿态信息控制无人机110的飞行。应理解,飞行控制器161可以按照预先编好的程序指令对无人机110进行控制,也可以通过响应来自控制终端140的一个或多个控制指令对无人机110进行控制。
云台120可以包括电机122。云台用于携带拍摄装置123。飞行控制器161可以通过电机122控制云台120的运动。可选地,作为另一实施例,云台120还可以包括控制器,用于通过控制电机122来控制云台120的运动。应理解,云台120可以独立于无人机110,也可以为无人机110的一部分。应理解,电机122可以是直流电机,也可以是交流电机。另外,电机122可以是无刷电机,也可以是有刷电机。还应理解,云台可以位于无人机的顶部,也可以位于无人机的底部。
拍摄装置123例如可以是照相机或摄像机等用于捕获图像的设备,拍摄装置123可以与飞行控制器通信,并在飞行控制器的控制下进行拍摄。本实施例的拍摄装置123至少包括感光元件,该感光元件例如为互补金属氧化物半导体(Complementary Metal Oxide Semiconductor,CMOS)传感器或电荷耦合元件(Charge-coupled Device,CCD)传感器。可以理解,拍摄装置123也可直接固定于无人机110上,从而云台120可以省略。
显示设备130位于无人飞行系统100的地面端,可以通过无线方式与无 人机110进行通信,并且可以用于显示无人机110的姿态信息。另外,还可以在显示设备130上显示成像装置拍摄的图像。应理解,显示设备130可以是独立的设备,也可以集成在控制终端140中。
控制终端140位于无人飞行系统100的地面端,可以通过无线方式与无人机110进行通信,用于对无人机110进行远程操纵。
另外,无人机110还可以机载有扬声器(图中未示出),该扬声器用于播放音频文件,扬声器可直接固定于无人机110上,也可搭载在云台120上。
本实施例中的拍摄装置123例如可以是单目相机,用于对目标场景进行拍摄,以获取目标场景的图像序列。下面实施例提供的目标场景三维重建方法例如可以由飞行控制器161执行,飞行控制器161通过拍摄装置123获取目标场景的图像序列,实现对目标场景的三维重建,可以用于无人机飞行避障;目标场景三维重建方法例如还可以由位于地面端的控制终端140执行,无人机通过图传技术将拍摄装置123获取的目标场景的图像序列传输至控制终端140,由控制终端140完成对目标场景的三维重建;目标场景三维重建方法例如还可以由位于云端的云服务器(图中未示出)执行,无人机通过图传技术将拍摄装置123获取的目标场景的图像序列传输至云服务器,由云服务器完成对目标场景的三维重建。
应理解,上述对于无人飞行系统各组成部分的命名仅是出于标识的目的,并不应理解为对本发明的实施例的限制。
图2为本发明提供的目标场景三维重建方法一实施例的流程图。如图2所示,本实施例提供的方法可以包括:
S201、获取目标场景的图像序列,所述图像序列包含在时序上连续的多个图像帧。
本实施例中例如可以采用搭载有单目拍摄装置的无人机,对目标场景进行拍摄,以获取目标场景的图像序列。
其中,目标场景为需要进行三维重建的对象。本实施例中在目标场景确定后,可以为无人机规划飞行航线,设置飞行速度和拍摄帧率,以获取目标场景的图像序列,或者,也可以对拍摄地点进行设置,当无人机飞行至预设拍摄地点时,进行拍摄。
本实施例中获取到的目标场景的图像序列,包含在时序上连续的多个图 像帧。
S202、根据所述图像序列获得关键帧,并基于所述图像序列获得所述关键帧的三维点云。
本实施例中在获取到目标场景的图像序列之后,为了实现对目标场景的三维重建,则需要根据所获取到的图像序列确定关键帧。其中,关键帧是为了实现三维重建需要进行深度恢复的图像帧。
可选的,本实施例中的关键帧可以包含在时序上连续的多个图像帧中的一帧。
可选的,例如可以将第一帧图像帧作为关键帧,后续根据匹配的特征点的个数,通过阈值进行过滤确定关键帧。
本实施例中例如可以通过对获取到的时序上连续的多个图像帧进行特征提取、特征点匹配、位姿估计等,以确定关键帧的三维点云。为了提高准确性,通常可以选用具有旋转不变性的特征,例如尺度不变特征变换特征(Scale-Invariant Feature Transform,SIFT)、加速稳健特征(Speed Up Robust Features,SURF)等。本实施例中各图像帧在拍摄时的位姿估计可以通过无人机上搭载的传感器,例如里程计、陀螺仪、IMU等获得。
S203、融合所述关键帧的三维点云,获得所述目标场景的三维模型。
本实施例中在获得了关键帧的三维点云之后,则根据三维点云对目标场景进行三维重建。
本实施例提供的目标场景三维重建方法,通过获取目标场景的图像序列,所述图像序列包含在时序上连续的多个图像帧;根据所述图像序列获得关键帧,并基于所述图像序列获得所述关键帧的三维点云;融合所述关键帧的三维点云,获得所述目标场景的三维模型。实现了无人机航拍场景下,目标场景的三维重建。本实施例提供的目标场景三维重建方法,既无需依赖价格高昂的双目视觉系统,也不受深度传感器的深度限制,能够满足无人机航拍场景下,对目标场景的三维重建需求。
在上述实施例的基础上,为了获得真实世界坐标系下的绝对定位信息,本实施例提供的目标场景三维重建方法中,在根据所述图像序列获得关键帧之前,还可以包括:对所述图像序列的三维信息进行初始化。
可选的,可以根据传感器提供的位置信息和姿态信息,对图像序列的三 维信息进行初始化。例如,可以根据实时动态(Real-Time Kinematic,RTK)信息、全球定位系统(Global Positioning System,GPS)信息、云台角信息等,进行初始化。
在一些实施例中,对图像序列的三维信息进行初始化的一种实现方式可以是:根据第一帧图像帧对应的云台角信息,获取视觉坐标系到世界坐标系的初始旋转变换矩阵;根据N个图像帧对应的实时动态RTK信息和相机中心信息,对所述初始旋转变换矩阵进行校正,获取视觉坐标系到世界坐标系的旋转矩阵、平移矩阵和尺度信息;根据所述旋转矩阵、平移矩阵和尺度信息对所述图像序列的三维信息进行初始化。
根据图像帧的视觉信息,仅能够完成相对定位,无法获得真实世界坐标系下绝对定位信息。因此,本实施例中根据拍摄第一帧图像帧时,无人机机载云台提供的云台角信息,确定视觉坐标系到世界坐标系的初始旋转变换矩阵。根据视觉坐标系到世界坐标系的初始旋转变换矩阵,便可以获取真实世界坐标系下的绝对定位信息。
为了获得更加准确的定位信息,本实施例在确定初始旋转变换矩阵之后,采用其后的N个图像帧对该初始旋转变换矩阵进行校正。N为大于等于1的自然数,具体取值可以根据实际需要进行设置,本实施例对此不做限制。根据第一帧图像帧之后的N个图像帧对应的RTK信息,以及SLAM系统计算出的相机中心,对初始旋转变化矩阵进行校正,以获取视觉坐标系到世界坐标系的旋转矩阵、平移矩阵和尺度信息。根据所述旋转矩阵、平移矩阵和尺度信息对所述图像序列的三维信息进行初始化。可以获取世界坐标系下的绝对定位信息。
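For illustration only, the correction described above can be pictured as a closed-form similarity alignment between the camera centers estimated by SLAM (visual coordinate system) and the corresponding RTK positions (world coordinate system). The sketch below is an Umeyama-style alignment written with NumPy; the function and variable names are assumptions and this is not the exact correction procedure of the embodiment.

```python
import numpy as np

def align_visual_to_world(camera_centers_visual, rtk_positions_world):
    """Estimate scale s, rotation R and translation t such that
    world ~= s * R @ visual + t, from N corresponding 3D points.
    Illustrative closed-form (Umeyama-style) alignment, not the patented procedure."""
    X = np.asarray(camera_centers_visual, dtype=float)   # N x 3, SLAM camera centers
    Y = np.asarray(rtk_positions_world, dtype=float)     # N x 3, RTK positions

    mu_x, mu_y = X.mean(axis=0), Y.mean(axis=0)
    Xc, Yc = X - mu_x, Y - mu_y

    # Cross-covariance between target (world) and source (visual), then SVD
    H = Yc.T @ Xc / X.shape[0]
    U, S, Vt = np.linalg.svd(H)
    D = np.eye(3)
    if np.linalg.det(U @ Vt) < 0:          # keep a right-handed rotation
        D[2, 2] = -1.0
    R = U @ D @ Vt

    var_x = (Xc ** 2).sum() / X.shape[0]
    s = np.trace(np.diag(S) @ D) / var_x   # scale from visual units to metres
    t = mu_y - s * R @ mu_x
    return s, R, t
```

In such a sketch, only the handful of image frames following the first frame would be needed to solve for the rotation matrix, translation matrix and scale information, after which the gimbal-derived initial rotation can be refined.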
本实施例提供的目标场景三维重建方法,在上述实施例的基础上,在初始化过程中加入视觉坐标系至世界坐标系的转换,通过旋转矩阵、平移矩阵和尺度信息,使得解算的位姿为世界坐标系下可用的位姿信息。
在一些实施例中,基于图像序列获得关键帧的三维点云的一种实现方式可以是:获取所述图像序列的特征信息;根据所述特征信息,进行特征点的跟踪;根据所述特征点的跟踪结果,确定所述关键帧的三维点云。
本实施例中的特征信息例如可以是对图像序列进行特征提取,获取的图像特征。例如尺度不变特征变换特征(Scale-Invariant Feature Transform,SIFT)、 加速稳健特征(Speed Up Robust Features,SURF)等。
例如可以根据当前图像帧与当前关键帧之间的SIFT特征点的匹配关系,确定SIFT特征匹配点对,根据特征匹配点对确定关键帧的三维点云。
在一些实施例中,根据特征信息,进行特征点的跟踪的一种实现方式可以是:
获取第一图像帧在世界坐标系中的第一位姿信息,所述第一位姿信息包括:第一实时动态RTK信息和第一云台角信息;
根据所述第一位姿信息,估计第二图像帧在世界坐标系中的第二位姿信息,所述第二位姿信息包括:第二RTK信息和第二云台角信息;
根据所述第一位姿信息和所述第二位姿信息对所述第一图像帧的特征信息和所述第二图像帧的特征信息进行特征匹配;
根据特征匹配结果,进行特征点的跟踪;
其中,所述第一图像帧和所述第二图像帧为所述图像序列中相邻的两帧。
现有基于视觉的方案中通常采用匀速运动模型对相机下一帧的位姿进行估计,由于无人机机动灵敏,其运行通常不符合匀速运动模型,因此基于匀速运动模型估计的位姿将极不准确,进而导致特征点的跟踪数量和精度降低。
为了获得准确的位姿估计,本实施例中根据传感器提供的第一图像帧的RTK信息和云台角信息,对第二图像帧的位姿进行估计。由于采用了传感器提供的准确的RTK信息和云台角信息,因此估计出的第二图像帧的位姿信息的准确度将大幅提升,准确的位姿信息提高了特征匹配的准确度和速度。
本实施例中根据第一位姿信息和第二位姿信息对第一图像帧的特征信息和第二图像帧的特征信息进行特征匹配,具体可以包括:获取第一图像帧和第二图像帧的特征,针对第一图像帧的特征,根据第一位姿信息和第二位姿信息,在第二图像帧中确定相应的搜索范围,进行特征匹配。由于获取了准确的位姿信息,不仅可以确定准确的搜索范围,而且可以大大缩小搜索范围,因此不仅提高了特征匹配的准确率而且提高了特征匹配的速度。
由于无人机飞行速度较快,因此相邻两帧图像之间的重叠率较低,导致特征点跟踪效果差。本实施例中在特征跟踪时,加入对上一帧是否为关键帧的判断,若为关键帧,则用关键帧的特征信息替换上一帧原始的特征信息。由于关键帧有额外的三维点云生成操作,可以在限定的时间内最大限度的利 用重叠区域图像生成的可用三维点云,使得跟踪的有效特征点数量得到提升。
本实施例中在完成特征跟踪之后,需要利用所有的特征点匹配对进行位姿解算。本实施例在位姿解算中加入传感器提供的RTK信息和云台角信息,使得位姿解算精度更高且不易受到误匹配的干扰。解决了现有技术中,基于视觉的方案中,当存在误匹配时,导致位姿解算精度降低甚至出现错误的问题。
在上述实施例的基础上,为了进一步提高目标场景三维重建的准确性,本实施例提供的目标场景三维重建方法,在融合关键帧的三维点云之前,还可以包括:根据所述关键帧对应的RTK信息和云台角信息,采用非线性优化的方式对所述关键帧的位姿信息及三维点云的位置进行优化。
本实施例对于非线性优化所采用的具体算法不做限制,例如可以采用高斯牛顿法、列文伯格-马夸尔特(Levenberg-Marquardt)方法等。
本实施例中在根据关键帧及其三维点云构建全局一致性的地图之前,根据RTK信息和云台角信息进行优化处理。具体可以包括:
首先维护一个局部地图,该局部地图可以由当前帧、当前帧的共视关键帧及它们所能观测到的点云组成。本实施例在利用非线性优化调整局部地图时,加入每一个参与优化的关键帧对应的RTK信息与云台角信息,使得关键帧的位姿解算及三维点云的位置更加精确。
本实施例通过在非线性优化过程中,引入更加精确的传感器信息,即RTK信息与云台角信息,优化后的代价函数不仅考虑了重投影误差,而且考虑了当前估计的位姿与传感器提供的位姿之间的差距,采用优化后的代价函数可以得到最优的位姿估计。解决了现有技术中仅考虑视觉重投影误差所带来的稳定性差的问题。
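The optimized cost function described here can be pictured, in simplified form, as reprojection residuals plus a prior term penalizing deviation from the sensor-provided pose. The weights and the small-angle rotation discrepancy used below are assumptions for illustration, not the exact objective of this embodiment.

```python
import numpy as np

def reprojection_residuals(K, R_cw, t_cw, points_3d, observations):
    """Pixel residuals between observed feature points and projected map points."""
    res = []
    for X_w, uv in zip(points_3d, observations):
        X_c = R_cw @ X_w + t_cw
        proj = (K @ (X_c / X_c[2]))[:2]
        res.append(proj - uv)
    return np.concatenate(res)

def pose_prior_residuals(R_cw, t_cw, R_prior, t_prior, w_rot=1.0, w_trans=1.0):
    """Penalize deviation from the pose implied by RTK position and gimbal angles."""
    rot_err = R_prior.T @ R_cw - np.eye(3)          # small rotation discrepancy
    return np.concatenate([w_rot * rot_err.ravel(),
                           w_trans * (t_cw - t_prior)])

def total_cost(K, R_cw, t_cw, points_3d, observations, R_prior, t_prior):
    r1 = reprojection_residuals(K, R_cw, t_cw, points_3d, observations)
    r2 = pose_prior_residuals(R_cw, t_cw, R_prior, t_prior)
    return 0.5 * (r1 @ r1 + r2 @ r2)   # to be minimized with Gauss-Newton / LM
```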
可选的,在实时测量结束后,本实施例还会对所有保留下的关键帧和三维点云进行全局的优化。可以理解的是,在该全局优化中加入RTK信息与云台角信息,使得最终输出的结果更加精确。
在上一实施例的基础上,为了获得更加精准的关键帧的三维点云关键帧,以提高目标场景三维重建的准确度,本实施例提供的目标场景三维重建方法中,可以在所述图像序列中为所述关键帧选取参考帧,然后根据所选取的参考帧,确定所述关键帧的深度图,根据关键帧的深度图获取关键帧的三维点 云。参考帧至少可以包括第一图像帧和第二图像帧。其中,第一图像帧在时序上位于所述关键帧之前,第二图像帧在时序上位于所述关键帧之后。
无人机航拍时,可以沿着规划的航线飞行。当无人机沿着一条航线飞行时,当前图像帧中存在相当大的一部分区域不存在于之前拍摄的图像帧中。也就是说,若参考帧中仅包括当前图像帧之前拍摄的图像帧,根据参考帧确定当前图像帧的深度图时,会存在相当大的一部分区域的视差无解,深度图中必然会存在大片的无效区域。
因此,为了避免关键帧中的区域在参考帧中无相应的匹配区域,而导致该区域对应的深度图无效,本实施例中的参考帧既包括在时序上位于参考帧之前的第一图像帧,也包括在时序上位于参考帧之后的第二图像帧,提高了关键帧与参考帧之间的重叠率,减小了视差无解的区域,进而提高了基于参考帧获得的关键帧的深度图的准确性。
可选的,若关键帧为第N帧,则第一图像帧为第N-1帧,第二图像帧为第N+1帧,即参考帧包括与关键帧相邻的前后两帧。举例来说,若无人机在航拍时,相邻两帧之间的重叠率为70%,若参考帧仅包括关键帧之前的图像帧,则关键帧中至少有30%区域的视差无解。而本实施例提供的参考帧的选取策略,使得关键帧中的全部区域均可以在参考帧中找到与之相匹配的区域,避免了视差无解现象的产生,提高了关键帧的深度图的准确性。
可选的,若关键帧为第N帧,则第一图像帧可以包括第N帧之前预设数量的图像帧,第二图像帧可以包括第N帧之后预设数量的图像帧。
可选的,若关键帧为第N帧,则第一图像帧可以为第N帧之前预设数量的图像帧中的一帧,第二图像帧可以为第N帧之后预设数量的图像帧中的一帧。
在上述任一实施例的基础上,为了提高关键帧的深度图的可靠性,以提高目标场景三维重建的可靠性,本实施例提供的目标场景三维重建方法中,参考帧至少可以包括第三图像帧。其中,第三图像帧与关键帧的极线方向不平行。
本实施例中的极线为对极几何中的极线,即极平面与图像之间的交线。第三图像帧与关键帧的极线方向不平行,也就是说,极平面与第三图像帧的第一交线,与该极平面与关键帧的第二交线,不平行。
当关键帧中存在重复纹理时,若关键帧与参考帧的极线方向平行,则会出现沿着平行极线分布的重复纹理,将会降低该区域对应的深度图的可靠性。因此,本实施例通过选取与关键帧的极线方向不平行的第三图像帧作为参考帧,避免了出现重复纹理沿着平行极线分布的现象,提高了深度图的可靠性。
可选的,第三图像帧可以包括关键帧相邻航带中与关键帧存在重叠像素的图像帧。
可选的,第三图像帧可以为关键帧相邻航带中与关键帧的重叠率最高的图像帧。
下面通过一个具体的示例来说明本发明实施例提供的参考帧的选取方法。图3为本发明提供的目标场景三维重建方法一实施例中参考帧选取的示意图。如图3所示,其中的实线用于表示无人机的飞行航线,航线覆盖了目标场景,箭头表示无人机的飞行方向,飞行航线上的黑色圆圈和黑色正方形表示无人机的拍摄装置在该位置进行拍摄,即黑色圆圈和黑色正方形对应目标场景的一个图像帧。当无人机沿着飞行航线飞行时,通过无人机上搭载的拍摄装置,如单目相机,便可以获取到目标场景的图像序列,包含了在时序上连续的多个图像帧。图3中的M-1、M、M+1、N-1、N、N+1表示图像帧的帧号,N和M为自然数,本实施例对N和M的具体取值不做限制。
若黑色正方形表示的第N帧为关键帧,在一种可能的实现方式中,参考帧可以包括图中所示的第N-1帧和第N+1帧。
若黑色正方形表示的第N帧为关键帧,在又一种可能的实现方式中,参考帧可以包括图中所示的第M帧。
若黑色正方形表示的第N帧为关键帧,在另一种可能的实现方式中,参考帧可以包括图中所示的第M帧、第N-1帧和第N+1帧,即图3中虚线圆圈中包括的图像帧。
可以理解的是,参考帧还可以包括更多的图像帧,例如还可以包括第M-1帧、第M+1帧、第N-2帧等。在具体实现时,可以综合考虑关键帧与参考帧的重叠率以及计算速度,进行选取。
在一些实施例中,基于参考帧获得关键帧的深度图的一种实现方式可以是:根据所述关键帧和所述参考帧之间的像差,获得所述关键帧的深度图。
本实施例中可以根据同一对象在关键帧和参考帧中的像差,获得关键帧的深度图。
在一些实施例中,基于所述图像序列获得所述关键帧的三维点云的一种实现方式可以是:根据所述图像序列,获得所述关键帧的深度图;根据所述关键帧的深度图,获得所述关键帧的三维点云。
在一些实施例中,根据所述图像序列,获得所述关键帧的深度图的一种实现方式可以是:根据所述图像序列,确定所述关键帧对应的匹配代价;根据所述关键帧对应的匹配代价,确定所述关键帧的深度图。
本实施例中可以通过对图像序列与关键帧中的像素点进行匹配,以确定关键帧对应的匹配代价。在确定了关键帧对应的匹配代价之后,可以进行匹配代价聚合,然后确定视差,根据视差与深度之间的对应关系,确定关键帧的深度图。可选的,在确定视差之后,还可以进行视差优化,视差加强。根据优化以及加强之后的视差,确定关键帧的深度图。
无人机的飞行高度通常在100米左右,且无人机通常都是垂直朝下进行拍摄的,由于地面高低起伏,对阳光的反射具有差异性,无人机拍摄的图像具有不可忽视的光照变化,光照变化将降低目标场景三维重建的准确性。
在上述任一实施例的基础上,为了提高目标场景三维重建对于光照的鲁棒性,本实施例提供的目标场景三维重建方法中,根据图像序列,确定关键帧对应的匹配代价,可以包括:根据图像序列,确定关键帧对应的第一类型匹配代价和第二类型匹配代价;确定关键帧对应的匹配代价等于第一类型匹配代价和第二类型匹配代价的加权和。
本实施例中在计算匹配代价时,通过将第一类型匹配代价与第二类型匹配代价进行融合,相较于仅采用单一类型匹配代价,提高了匹配代价对于光照的鲁棒性,进而减少了光照变化对于三维重建的影响,提高了三维重建的准确性。本实施例中第一类型匹配代价和第二类型匹配代价的加权系数可以根据具体需要进行设置,本实施例对此不做限制。
可选的,第一类型匹配代价可以基于零均值归一化互相关(Zero-based Normalized Cross Correlation,ZNCC)确定。基于ZNCC可以精确的度量关键帧与参考帧之间的相似性。
可选的,第二类型匹配代价可以基于光照不变特征确定。本实施例中, 可以提取无人机所采集的图像帧中的光照不变特征,例如局部二值模式(Local Binary Patterns,LBP),census序列等,然后可以基于光照不变特征确定第二类型匹配代价。
本实施例中的census序列可以通过如下方式确定:在图像帧中选取任一点,以该点为中心划出一个例如3×3的矩形,矩形中除中心点之外的每一点都与中心点进行比较,灰度值小于中心点即记为1,灰度值大于中心点的则记为0,以所得长度为8的只有0和1的序列作为该中心点的census序列,即中心像素的灰度值被census序列替换。
经过census变换后,可以采用汉明距离确定关键帧的第二类型匹配代价。
例如,关键帧对应的匹配代价可以等于ZNCC和census两种匹配代价的加权和。
在一些实施例中,根据关键帧对应的匹配代价,确定关键帧的深度图的一种实现方式可以是:将关键帧划分成多个图像块;根据图像序列,确定每一个图像块对应的匹配代价;根据每一个所述图像块对应的匹配代价,确定关键帧对应的匹配代价。
本实施例中可以采用如下方式中的一种或者多种将关键帧划分为多个图像块:
(1)采用聚类的方式,将关键帧划分成多个图像块。本实施例中例如可以根据关键帧的色彩信息和/或纹理信息,采用聚类的方式,将关键帧划分成多个图像块。
(2)将关键帧均匀划分成多个图像块。本实施例中例如可以预先设置图像块的数量,然后根据预先设置的图像块的数量,对关键帧进行划分。
(3)将关键帧划分成预设大小的多个图像块。例如可以预先设置图像块的大小,然后根据预先设置的图像块的大小,对关键帧进行划分。
可选的,在将关键帧划分成多个图像块之后,可以根据图像序列,并行确定每一个图像块对应的匹配代价。本实施例中例如可以采用软件和/或硬件的方式并行确定每一个图像块对应的匹配代价。具体的,例如可以采用多线程并行确定每一个图像块对应的匹配代价,和/或,可以采用图形处理器(Graphics Processing Unit,GPU)并行确定每一个图像块对应的匹配代价。
本实施例提供的目标场景三维重建方法,在上述实施例的基础上,通过 将关键帧划分成多个图像块,根据图像序列,并行确定每一个图像块对应的匹配代价,然后根据每一个图像块对应的匹配代价,确定关键帧对应的匹配代价,提高了匹配代价的计算速度,进而提高了目标场景三维重建的实时性。
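A sketch of the block-parallel computation: the key frame is split into tiles and the matching cost of each tile is computed by its own worker. The tile size, the thread pool, and the placeholder per-block routine are assumptions; a GPU implementation would follow the same partitioning.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def split_into_blocks(image, block_h, block_w):
    """Evenly split an image into ((row, col), tile) pairs."""
    H, W = image.shape[:2]
    blocks = []
    for r in range(0, H, block_h):
        for c in range(0, W, block_w):
            blocks.append(((r, c), image[r:r + block_h, c:c + block_w]))
    return blocks

def compute_block_cost(block, reference_images, depth_samples):
    """Placeholder per-block matching cost; a real implementation would evaluate
    depth_samples depth hypotheses against the reference images."""
    (r, c), tile = block
    return (r, c), np.zeros(tile.shape[:2] + (depth_samples,), dtype=np.float32)

def parallel_block_costs(keyframe, reference_images, depth_samples=64,
                         block_h=160, block_w=320, workers=8):
    blocks = split_into_blocks(keyframe, block_h, block_w)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(compute_block_cost, b, reference_images, depth_samples)
                   for b in blocks]
        return dict(f.result() for f in futures)
```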
深度采样次数可以根据深度范围和精度确定,深度采样次数与深度范围正相关,与精度负相关。举例来说,若深度范围为50米,精度要求为0.1米,则深度采样次数可以为500。
在确定关键帧的匹配代价时,可以采用预设深度采样次数,也可以采用即时定位与地图构建(Simultaneous Localization and Mapping,SLAM)恢复出关键帧中一些稀疏的三维点,然后根据这些稀疏的三维点确定整个关键帧的深度范围,然后根据整个关键帧的深度范围以及精度要求,确定深度采样次数。若深度采样次数为N,则需要针对关键帧中每一个像素点计算N次匹配代价。对于640*480像素大小的关键帧,需要计算640*480*N次匹配代价。
在上述任一实施例的基础上,为了进一步提高处理速度,提高目标场景三维重建的实时性,本实施例提供的目标场景三维重建方法中,根据图像序列,确定每一个图像块对应的匹配代价,可以包括:根据每一个图像块中的稀疏点确定该图像块的深度采样次数;根据图像序列以及每一个图像块的深度采样次数,确定每一个图像块对应的匹配代价。
需要说明的是,当无人机垂直朝下进行拍摄时,关键帧中可以包含多种拍摄对象,例如行人、汽车、树木、高楼等,因此整个关键帧的深度范围比较大,在预设精度要求下,深度采样次数较大。然而关键帧中各个图像块对应的深度范围是比较小的,比如当一个图像块中仅包括行人时,该图像块对应的深度范围将远远小于整个关键帧的深度范围,在相同精度要求下,可以大幅减小深度采样次数。也就是说,在相同精度要求下,关键帧中图像块的深度采样次数必定小于等于关键帧整体的深度采样次数。
本实施例充分考虑了各个图像块的深度范围,根据各个图像块的深度范围设置深度采样次数,在保证精度的前提下,降低了计算复杂度,提高了速度。
本实施例可以针对每一个图像块,采用SLAM恢复出该图像块中一些稀疏的三维点,根据这些稀疏的三维点确定该图像块的深度范围,根据该图像块的深度范围以及精度要求,确定该图像块的深度采样次数。以确定的深度 采样次数确定每一个图像块对应的匹配代价。
下面通过具体的数值分析来说明本实施例提供的方法如何降低计算复杂度,提高处理速度:
若关键帧为640*480像素大小的图像帧,根据关键帧的深度范围确定深度采样次数为500,则需要计算640*480*500次匹配代价。若将关键帧均匀划分为320*160大小的图像块,根据各个图像块的深度范围确定的6个图像块的深度采样次数分别为100、200、150、100、150和300,则仅需要计算320*160*(100+200+150+100+150+300)次匹配代价。计算量仅为原来的三分之一。
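The per-block depth sampling count can be derived from the sparse SLAM points falling inside each block, as in the sketch below; the margin added around the sparse depth range and the precision value are assumptions chosen only to reproduce the kind of reduction illustrated above.

```python
import numpy as np

def depth_samples_for_block(sparse_depths_in_block, precision=0.1, margin=1.0,
                            min_samples=8):
    """Number of depth hypotheses for one image block, derived from the depth
    range of the sparse SLAM points observed inside that block."""
    if len(sparse_depths_in_block) == 0:
        return None                      # fall back to the whole-frame depth range
    d = np.asarray(sparse_depths_in_block, dtype=float)
    depth_range = (d.max() - d.min()) + 2.0 * margin
    return max(min_samples, int(np.ceil(depth_range / precision)))

# Whole frame: a 50 m range at 0.1 m precision needs 500 samples per pixel.
# A block containing only low objects needs far fewer:
print(depth_samples_for_block([32.0, 35.5, 40.2]))   # ~102 samples instead of 500
```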
可选的,在确定了关键帧对应的匹配代价之后,可以根据半全局匹配算法(Semi Global Matching,SGM)确定关键帧的深度图。
可以理解的是,由于目标场景中的法向量存在较大偏差且深度采样的本身的离散性,再加上弱纹理以及重复纹理等因素,关键帧的深度图不可避免的会存在大量的随机分布的噪声。
在上述任一实施例的基础上,为了避免深度图中的噪声降低三维重建的准确性,本实施例提供的目标场景三维重建方法中,在获得关键帧的深度图之后,还可以包括:对关键帧的深度图进行滤波处理。通过对关键帧的深度图进行滤波处理,可以滤除深度图中的噪声,提高三维重建的准确性。
可选的,对关键帧的深度图进行滤波处理的一种实现方式可以是:对关键帧的深度图进行三边滤波处理。本实施例中的三边滤波是指滤波过程中的加权系数可以根据像素距离、深度差值和颜色差值三个因素综合确定。
举例来说,在一种滤波处理过程中,滤波模板的大小为5*5,也就是说滤波处理后目标像素点的深度值可以由该像素点以及周围的24个像素点的深度值确定。每一个像素点对于目标像素点的深度值影响的权重值,根据该像素点距离目标像素点的欧式距离、该像素点的深度值与目标像素点的深度值的差值,以及该像素点的RGB值与目标像素点的RGB值的差值确定。
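A minimal sketch of such a trilateral filter is given below: each depth value is replaced by a weighted average over a 5x5 window, with weights combining pixel distance, depth difference and color difference. The Gaussian form of the weights and the sigma values are assumptions, not the exact weighting of this embodiment.

```python
import numpy as np

def trilateral_filter(depth, color, radius=2,
                      sigma_space=2.0, sigma_depth=0.5, sigma_color=10.0):
    """Filter a depth map (H x W) with weights from pixel distance, depth difference
    and color difference over a (2*radius+1)^2 window; color is an H x W x 3 float array."""
    H, W = depth.shape
    out = depth.copy()
    for y in range(radius, H - radius):
        for x in range(radius, W - radius):
            if depth[y, x] <= 0:
                continue
            w_sum, d_sum = 0.0, 0.0
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    yy, xx = y + dy, x + dx
                    if depth[yy, xx] <= 0:
                        continue
                    w = (np.exp(-(dx * dx + dy * dy) / (2 * sigma_space ** 2)) *
                         np.exp(-((depth[yy, xx] - depth[y, x]) ** 2) / (2 * sigma_depth ** 2)) *
                         np.exp(-(np.linalg.norm(color[yy, xx] - color[y, x]) ** 2) / (2 * sigma_color ** 2)))
                    w_sum += w
                    d_sum += w * depth[yy, xx]
            if w_sum > 0:
                out[y, x] = d_sum / w_sum
    return out
```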
本实施例提供的目标场景三维重建方法,在上述实施例的基础上,进一步的通过对关键帧的深度图进行三边滤波处理,通过关键帧中锐利、精细的边缘信息提高了关键帧的深度图边缘的精确性,在保存边缘的前提下,更鲁棒的去除了噪声,使得关键帧的深度图更加精确,基于该深度图的三维重建 也将更加准确。
在一些实施例中,融合关键帧的三维点云,获得目标场景的三维模型的一种实现方式可以是:将关键帧对应的三维点云融合至目标场景对应的体素中;根据目标场景对应的体素,获得目标场景的三维模型。
本实施例中采用了基于体素的三维点云融合方法。由于无人机在作业时,航线在无人机起飞前已经规划好,且无人机都是垂直朝下进行拍摄,因此可以将规划航线的覆盖范围用预设大小的体素表示,可以在将每一帧深度图转化成点云之后,根据点云的三维坐标定位到相应的体素中,将点云的法向量融合成体素的法向量,点云的坐标融合成体素的坐标,用体素保存可见性信息。
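The voxel-based fusion can be sketched as hashing each point into a voxel of preset size and keeping running averages of the point coordinates and normals in that voxel; the hashing scheme and the running-average update below are illustrative assumptions.

```python
import numpy as np

class VoxelMap:
    """Fuse point clouds into a sparse voxel grid covering the planned flight area."""

    def __init__(self, voxel_size=0.2):
        self.voxel_size = voxel_size
        self.voxels = {}            # (i, j, k) -> accumulated point / normal sums

    def _key(self, p):
        return tuple(np.floor(p / self.voxel_size).astype(int))

    def fuse(self, points, normals):
        """O(1) per point: locate the voxel and update its running averages."""
        for p, n in zip(points, normals):
            k = self._key(p)
            v = self.voxels.setdefault(k, {"sum_p": np.zeros(3),
                                           "sum_n": np.zeros(3), "count": 0})
            v["sum_p"] += p
            v["sum_n"] += n
            v["count"] += 1

    def to_point_cloud(self):
        pts, nrm = [], []
        for v in self.voxels.values():
            pts.append(v["sum_p"] / v["count"])
            n = v["sum_n"] / v["count"]
            nrm.append(n / (np.linalg.norm(n) + 1e-12))
        return np.array(pts), np.array(nrm)
```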
本实施例提供的目标场景三维重建方法实时性高、可扩展性高。三维点云融合成体素的计算复杂度是o(1),融合的实时性非常高;对于一次规划任务而言,根据航线规划的特殊性,可以将目标区域分块为多个子块,这样使得点云也具有良好的分块性,有利于点云的加载和后续的多细节层次(Levels of Detail,LOD)显示,便于进行大场景的实时三维重建。
本发明实施例还提供一种目标场景三维重建方法,可以包括:
S301、根据第M帧图像帧对应的云台角信息,获取视觉坐标系到世界坐标系的初始旋转变换矩阵,M为大于等于1的自然数。
本实施例中的第M帧图像帧可以是无人机拍摄目标场景的图像序列中的任意一帧。可以理解的是,为了尽快完成坐标系的转换,满足系统的实时性要求,可以采用第一帧图像帧。
本实施例中第M帧图像帧对应的云台角信息可以通过无人机机载传感器获取,例如陀螺仪、电子罗盘、IMU、里程计等。本实施例中的云台角信息可以包括以下至少一种信息:横滚轴(Roll)角度、偏航轴(Yaw)角度和俯仰轴(pitch)角度。各角度的正方向可以在视觉坐标系中采用右手螺旋定则确定。
若只根据图像帧的视觉信息,仅能够完成相对定位,无法获得真实世界坐标系下绝对定位信息。因此,本实施例中根据拍摄第M帧图像帧时,无人机机载云台提供的云台角信息,确定视觉坐标系到世界坐标系的初始旋转变换矩阵。根据视觉坐标系到世界坐标系的初始旋转变换矩阵,便可以获取真 实世界坐标系下的绝对定位信息。
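For illustration, the initial rotation can be assembled from the gimbal angles by composing yaw, pitch and roll rotations; the rotation order and axis conventions in the sketch below are assumptions that would have to match the actual gimbal and world-frame definitions.

```python
import numpy as np

def rotation_from_gimbal(yaw_deg, pitch_deg, roll_deg):
    """Initial visual-to-world rotation from gimbal angles (Z-Y-X, i.e. yaw-pitch-roll,
    order assumed; the true convention depends on the gimbal and world frames)."""
    y, p, r = np.radians([yaw_deg, pitch_deg, roll_deg])
    Rz = np.array([[np.cos(y), -np.sin(y), 0],
                   [np.sin(y),  np.cos(y), 0],
                   [0,          0,         1]])
    Ry = np.array([[ np.cos(p), 0, np.sin(p)],
                   [ 0,         1, 0        ],
                   [-np.sin(p), 0, np.cos(p)]])
    Rx = np.array([[1, 0,          0         ],
                   [0, np.cos(r), -np.sin(r)],
                   [0, np.sin(r),  np.cos(r)]])
    return Rz @ Ry @ Rx
```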
S302、根据P个图像帧对应的实时动态RTK信息和相机中心信息,对初始旋转变换矩阵进行校正,获取视觉坐标系到世界坐标系的旋转矩阵、平移矩阵和尺度信息,P个图像帧在时序上位于第M帧图像帧之后,P为大于等于1的自然数。
为了获得更加准确的定位信息,本实施例在确定初始旋转变换矩阵之后,采用其后的P个图像帧对该初始旋转变换矩阵进行校正。P为大于等于1的自然数,具体取值可以根据实际需要进行设置,本实施例对此不做限制。
可选的,P个图像帧可以是在时序上位于第M帧图像帧之后连续的P帧图像帧,也可以是从时序上位于第M帧图像帧之后连续的N帧图像帧中筛选出的符合条件的P帧图像帧,其中,N为大于等于P的自然数。
根据第M帧图像帧之后的P个图像帧对应的RTK信息,以及SLAM系统计算出的相机中心,对初始旋转变化矩阵进行校正,以获取视觉坐标系到世界坐标系的旋转矩阵、平移矩阵和尺度信息。通过所获取的尺度信息,可以将视觉坐标系中的位移转化为世界坐标系中对应的真实距离。通过将视觉信息与RTK信息进行融合,信息互补,可以获得绝对定位信息。
S303、根据旋转矩阵、平移矩阵和尺度信息以及目标场景的图像序列,获取图像序列在世界坐标系下对应的位姿信息。
本实施例中在确定旋转矩阵、平移矩阵和尺度信息之后,便可以根据旋转矩阵、平移矩阵和尺度信息获取图像序列在世界坐标系下对应的位姿信息。
S304、根据图像序列在世界坐标系下对应的位姿信息,进行特征点的跟踪。
S305、根据特征点的跟踪结果,获得目标场景的三维模型。
本实施例提供的目标场景三维重建方法,根据第M帧图像帧对应的云台角信息,获取视觉坐标系到世界坐标系的初始旋转变换矩阵,并根据位于第M帧图像帧之后的P个图像帧对应的实时动态RTK信息和相机中心信息,对初始旋转变换矩阵进行校正,获取视觉坐标系到世界坐标系的旋转矩阵、平移矩阵和尺度信息,然后根据旋转矩阵、平移矩阵和尺度信息以及目标场景的图像序列,获取图像序列在世界坐标系下对应的位姿信息,进行特征点的跟踪,并根据特征点的跟踪结果,获得目标场景的三维模型。本实施例中获 得的位姿信息为世界坐标系下可用的位姿信息,可以获得目标场景在世界坐标系中的三维模型。
在一些实施例中,根据图像序列在世界坐标系下对应的位姿信息,进行特征点的跟踪的一种实现方式可以是:
获取第一图像帧在世界坐标系中的第一位姿信息,第一位姿信息包括:第一实时动态RTK信息和第一云台角信息;
根据第一位姿信息,估计第二图像帧在世界坐标系中的第二位姿信息,第二位姿信息包括:第二RTK信息和第二云台角信息;
根据第一位姿信息和第二位姿信息对第一图像帧的特征信息和第二图像帧的特征信息进行特征匹配;
根据特征匹配结果,进行特征点的跟踪;
其中,第一图像帧和第二图像帧为图像序列中相邻的两帧。
在一些实施例中,根据特征点的跟踪结果,获得目标场景的三维模型的一种实现方式可以是:
根据特征点的跟踪结果,确定关键帧的三维点云;
根据关键帧对应的RTK信息和云台角信息,采用非线性优化的方式对关键帧的位姿信息及三维点云的位置进行优化;
根据优化后的关键帧的位姿信息及三维点云的位置,获得目标场景的三维模型。
图4为本发明提供的目标场景三维重建方法一实施例的示意性框图。如图4所示,本实施例提供的目标场景三维重建方法可以通过两个线程实现,即跟踪线程和建图线程。其中,跟踪线程包括了初始化、跟踪上一帧特征点、跟踪局部地图特征点及帧间位姿解算等步骤。例如,可以通过无人机所搭载的拍摄装置获取图像信息。然后根据无人机平台的传感器提供的RTK信息和云台角,进行初始化、跟踪上一帧特征点及帧间位姿解算等操作。建图线程包括了局部地图生成以及局部地图优化等步骤。本实施例中可以根据无人机平台的传感器提供的RTK信息和云台角,进行局部地图优化。本实施例中的局部地图,例如可以是由当前帧、当前帧的共视关键帧及它们所能观测到的点云组成的。由于引入了传感器提供的较为精确的RTK信息和云台角信息,整个系统不再仅仅依赖于视觉测量,因此 整个系统的鲁棒性得到提升,针对视觉信息较差的情况本实施例仍能够较好的处理。由于较精确的RTK信息及云台角信息的引入,本实施例中的帧间位姿解算具有更高的位姿确定精度,使得即使在视觉特征点误匹配较多的情况下,依然能够获得精确的位姿信息。本实施例中由于特征追踪策略的优化,对于相邻图像帧重叠率较低场景,具有更好的特征跟踪效果,使得特征跟踪不易丢失。本实施例中的跟踪线程和建图线程可以并行执行,以提高目标场景三维重建的速度,提高实时性。
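The tracking/mapping split shown in FIG. 4 can be pictured as two cooperating threads sharing a key-frame queue, as in the simplified skeleton below; the queue-based hand-off and the stub functions are assumptions for illustration only.

```python
import queue
import threading

def track_frame(frame):
    """Placeholder: inter-frame pose solving with RTK + gimbal angle priors."""
    return {"frame": frame, "pose": None}

def is_keyframe(tracked):
    """Placeholder key-frame test, e.g. based on the number of matched features."""
    return True

def update_and_optimize_local_map(tracked):
    """Placeholder: local map generation and RTK/gimbal-aided optimization."""
    pass

def run_pipeline(frames):
    kf_queue = queue.Queue()
    done = threading.Event()

    def tracking():          # initialization, feature tracking, pose solving
        for f in frames:
            tracked = track_frame(f)
            if is_keyframe(tracked):
                kf_queue.put(tracked)
        done.set()

    def mapping():           # local map generation and optimization
        while not (done.is_set() and kf_queue.empty()):
            try:
                tracked = kf_queue.get(timeout=0.1)
            except queue.Empty:
                continue
            update_and_optimize_local_map(tracked)

    t1, t2 = threading.Thread(target=tracking), threading.Thread(target=mapping)
    t1.start(); t2.start()
    t1.join(); t2.join()

run_pipeline(frames=[1, 2, 3])   # toy frames just to exercise the skeleton
```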
图5为本发明提供的目标场景三维重建系统一实施例的结构示意图。如图5所示,本实施例提供的目标场景三维重建系统500可以包括:处理器501和存储器502。处理器501与存储器502通过总线通信连接。上述处理器501可以是中央处理单元(Central Processing Unit,CPU),还可以是其他通用处理器、数字信号处理器(Digital Signal Processor,DSP)、专用集成电路(Application Specific Integrated Circuit,ASIC)、现成可编程门阵列(Field-Programmable Gate Array,FPGA)或者其他可编程逻辑器件、分立门或者晶体管逻辑器件、分立硬件组件等。通用处理器可以是微处理器或者该处理器也可以是任何常规的处理器等。上述存储器502可以是,但不限于,随机存取存储器(Random Access Memory,简称:RAM),只读存储器(Read Only Memory,简称:ROM),可编程只读存储器(Programmable Read-Only Memory,简称:PROM),可擦除只读存储器(Erasable Programmable Read-Only Memory,简称:EPROM),电可擦除只读存储器(Electric Erasable Programmable Read-Only Memory,简称:EEPROM)等。
存储器502,用于存储程序代码;处理器501调用程序代码,当程序代码被执行时,用于执行以下操作:
获取目标场景的图像序列,所述图像序列包含在时序上连续的多个图像帧;
根据所述图像序列获得关键帧,并基于所述图像序列获得所述关键帧的三维点云;
融合所述关键帧的三维点云,获得所述目标场景的三维模型。
可选的,处理器501还用于:
在所述根据所述图像序列获得关键帧之前,对所述图像序列的三维信息 进行初始化。
可选的,处理器501用于对所述图像序列的三维信息进行初始化,具体可以包括:
根据第一帧图像帧对应的云台角信息,获取视觉坐标系到世界坐标系的初始旋转变换矩阵;
根据N个图像帧对应的实时动态RTK信息和相机中心信息,对所述初始旋转变换矩阵进行校正,获取视觉坐标系到世界坐标系的旋转矩阵、平移矩阵和尺度信息;
根据所述旋转矩阵、平移矩阵和尺度信息对所述图像序列的三维信息进行初始化。
可选的,处理器501用于基于所述图像序列获得所述关键帧的三维点云,具体可以包括:
获取所述图像序列的特征信息;
根据所述特征信息,进行特征点的跟踪;
根据所述特征点的跟踪结果,确定所述关键帧的三维点云。
可选的,处理器501用于根据所述特征信息,进行特征点的跟踪,具体可以包括:
获取第一图像帧在世界坐标系中的第一位姿信息,所述第一位姿信息包括:第一实时动态RTK信息和第一云台角信息;
根据所述第一位姿信息,估计第二图像帧在世界坐标系中的第二位姿信息,所述第二位姿信息包括:第二RTK信息和第二云台角信息;
根据所述第一位姿信息和所述第二位姿信息对所述第一图像帧的特征信息和所述第二图像帧的特征信息进行特征匹配;
根据特征匹配结果,进行特征点的跟踪;
其中,所述第一图像帧和所述第二图像帧为所述图像序列中相邻的两帧。
可选的,处理器501还用于在所述融合所述关键帧的三维点云之前,根据所述关键帧对应的RTK信息和云台角信息,采用非线性优化的方式对所述关键帧的位姿信息及三维点云的位置进行优化。
可选的,处理器501用于基于所述图像序列获得所述关键帧的三维点云,具体可以包括:
根据所述图像序列,获得所述关键帧的深度图;
根据所述关键帧的深度图,获得所述关键帧的三维点云。
可选的,处理器501用于根据所述图像序列,获得所述关键帧的深度图,具体可以包括:
根据所述图像序列,确定所述关键帧对应的匹配代价;
根据所述关键帧对应的匹配代价,确定所述关键帧的深度图。
可选的,处理器501用于根据所述图像序列,确定所述关键帧对应的匹配代价,具体可以包括:
根据所述图像序列,确定所述关键帧对应的第一类型匹配代价和第二类型匹配代价;
确定所述关键帧对应的匹配代价等于所述第一类型匹配代价和第二类型匹配代价的加权和。
可选的,所述第一类型匹配代价基于零均值归一化互相关确定。
可选的,所述第二类型匹配代价基于光照不变特征确定。
可选的,处理器501用于根据所述关键帧对应的匹配代价,确定所述关键帧的深度图,具体可以包括:
将所述关键帧划分成多个图像块;
根据所述图像序列,确定每一个图像块对应的匹配代价;
根据每一个所述图像块对应的匹配代价,确定所述关键帧对应的匹配代价。
可选的,处理器501用于将所述关键帧划分成多个图像块,具体可以包括:
采用聚类的方式,将所述关键帧划分成多个图像块。
可选的,处理器501用于将所述关键帧划分成多个图像块,具体可以包括:
将所述关键帧均匀划分成多个图像块。
可选的,处理器501用于根据所述图像序列,确定每一个图像块对应的匹配代价,具体可以包括:
根据所述图像序列,并行确定每一个图像块对应的匹配代价。
可选的,处理器501用于根据所述图像序列,确定每一个图像块对应的 匹配代价,具体可以包括:
根据每一个所述图像块中的稀疏点确定该图像块的深度采样次数;
根据所述图像序列以及每一个所述图像块的深度采样次数,确定每一个所述图像块对应的匹配代价。
可选的,处理器501还用于在所述根据所述图像序列,获得所述关键帧的深度图之后,对所述关键帧的深度图进行滤波处理。
可选的,处理器501用于对所述关键帧的深度图进行滤波处理,具体包括:
对所述关键帧的深度图进行三边滤波处理。
可选的,处理器501用于融合所述关键帧的三维点云,获得所述目标场景的三维模型,具体可以包括:
将所述关键帧对应的三维点云融合至所述目标场景对应的体素中;
根据所述目标场景对应的体素,获得所述目标场景的三维模型。
本发明实施例还提供一种目标场景三维重建系统,包括:处理器和存储器,其具体实现可以参考图5所示的目标场景三维重建系统的结构性示意图。其中,存储器,用于存储程序代码;处理器,调用程序代码,当程序代码被执行时,用于执行以下操作:
根据第M帧图像帧对应的云台角信息,获取视觉坐标系到世界坐标系的初始旋转变换矩阵,M为大于等于1的自然数;
根据P个图像帧对应的实时动态RTK信息和相机中心信息,对初始旋转变换矩阵进行校正,获取视觉坐标系到世界坐标系的旋转矩阵、平移矩阵和尺度信息,P个图像帧在时序上位于第M帧图像帧之后,P为大于等于1的自然数;
根据旋转矩阵、平移矩阵和尺度信息以及目标场景的图像序列,获取图像序列在世界坐标系下对应的位姿信息;
根据图像序列在世界坐标系下对应的位姿信息,进行特征点的跟踪;
根据特征点的跟踪结果,获得目标场景的三维模型。
可选的,处理器用于,根据图像序列在世界坐标系下对应的位姿信息,进行特征点的跟踪,具体可以包括:
获取第一图像帧在世界坐标系中的第一位姿信息,第一位姿信息包括: 第一实时动态RTK信息和第一云台角信息;
根据第一位姿信息,估计第二图像帧在世界坐标系中的第二位姿信息,第二位姿信息包括:第二RTK信息和第二云台角信息;
根据第一位姿信息和第二位姿信息对第一图像帧的特征信息和第二图像帧的特征信息进行特征匹配;
根据特征匹配结果,进行特征点的跟踪;
其中,第一图像帧和第二图像帧为图像序列中相邻的两帧。
可选的,处理器,用于根据特征点的跟踪结果,获得目标场景的三维模型,具体可以包括:
根据特征点的跟踪结果,确定关键帧的三维点云;
根据关键帧对应的RTK信息和云台角信息,采用非线性优化的方式对关键帧的位姿信息及三维点云的位置进行优化;
根据优化后的关键帧的位姿信息及三维点云的位置,获得目标场景的三维模型。
图6为本发明提供的无人机一实施例的结构示意图。如图6所示,本实施例提供的无人机600可以包括处理器601。该处理器601可以是中央处理单元(Central Processing Unit,CPU),还可以是其他通用处理器、数字信号处理器(Digital Signal Processor,DSP)、专用集成电路(Application Specific Integrated Circuit,ASIC)、现成可编程门阵列(Field-Programmable Gate Array,FPGA)或者其他可编程逻辑器件、分立门或者晶体管逻辑器件、分立硬件组件等。通用处理器可以是微处理器或者该处理器也可以是任何常规的处理器等。
无人机600上搭载有拍摄装置602,拍摄装置602用于对目标场景进行拍摄。
处理器601用于,获取目标场景的图像序列,图像序列包含在时序上连续的多个图像帧;
根据所述图像序列获得关键帧,并基于所述图像序列获得所述关键帧的三维点云;
融合所述关键帧的三维点云,获得所述目标场景的三维模型。
可选的,所述处理器601还用于在所述根据所述图像序列获得关键帧之 前,对所述图像序列的三维信息进行初始化。
可选的,所述处理器601用于对所述图像序列的三维信息进行初始化,具体可以包括:
根据第一帧图像帧对应的云台角信息,获取视觉坐标系到世界坐标系的初始旋转变换矩阵;
根据N个图像帧对应的实时动态RTK信息和相机中心信息,对所述初始旋转变换矩阵进行校正,获取视觉坐标系到世界坐标系的旋转矩阵、平移矩阵和尺度信息;
根据所述旋转矩阵、平移矩阵和尺度信息对所述图像序列的三维信息进行初始化。
可选的,所述处理器601用于基于所述图像序列获得所述关键帧的三维点云,具体可以包括:
获取所述图像序列的特征信息;
根据所述特征信息,进行特征点的跟踪;
根据所述特征点的跟踪结果,确定所述关键帧的三维点云。
可选的,所述处理器601用于根据所述特征信息,进行特征点的跟踪,具体可以包括:
获取第一图像帧在世界坐标系中的第一位姿信息,所述第一位姿信息包括:第一实时动态RTK信息和第一云台角信息;
根据所述第一位姿信息,估计第二图像帧在世界坐标系中的第二位姿信息,所述第二位姿信息包括:第二RTK信息和第二云台角信息;
根据所述第一位姿信息和所述第二位姿信息对所述第一图像帧的特征信息和所述第二图像帧的特征信息进行特征匹配;
根据特征匹配结果,进行特征点的跟踪;
其中,所述第一图像帧和所述第二图像帧为所述图像序列中相邻的两帧。
可选的,所述处理器601还用于在所述融合所述关键帧的三维点云之前,根据所述关键帧对应的RTK信息和云台角信息,采用非线性优化的方式对所述关键帧的位姿信息及三维点云的位置进行优化。
可选的,所述处理器601用于基于所述图像序列获得所述关键帧的三维点云,具体可以包括:
根据所述图像序列,获得所述关键帧的深度图;
根据所述关键帧的深度图,获得所述关键帧的三维点云。
可选的,所述处理器601用于根据所述图像序列,获得所述关键帧的深度图,具体可以包括:
根据所述图像序列,确定所述关键帧对应的匹配代价;
根据所述关键帧对应的匹配代价,确定所述关键帧的深度图。
可选的,所述处理器601用于根据所述图像序列,确定所述关键帧对应的匹配代价,具体可以包括:
根据所述图像序列,确定所述关键帧对应的第一类型匹配代价和第二类型匹配代价;
确定所述关键帧对应的匹配代价等于所述第一类型匹配代价和第二类型匹配代价的加权和。
可选的,所述第一类型匹配代价基于零均值归一化互相关确定。
可选的,所述第二类型匹配代价基于光照不变特征确定。
可选的,所述处理器601用于根据所述关键帧对应的匹配代价,确定所述关键帧的深度图,具体可以包括:
将所述关键帧划分成多个图像块;
根据所述图像序列,确定每一个图像块对应的匹配代价;
根据每一个所述图像块对应的匹配代价,确定所述关键帧对应的匹配代价。
可选的,所述处理器601用于将所述关键帧划分成多个图像块,具体可以包括:
采用聚类的方式,将所述关键帧划分成多个图像块。
可选的,所述处理器601用于将所述关键帧划分成多个图像块,具体可以包括:
将所述关键帧均匀划分成多个图像块。
可选的,所述处理器601用于根据所述图像序列,确定每一个图像块对应的匹配代价,具体可以包括:
根据所述图像序列,并行确定每一个图像块对应的匹配代价。
可选的,所述处理器601用于根据所述图像序列,确定每一个图像块对 应的匹配代价,具体可以包括:
根据每一个所述图像块中的稀疏点确定该图像块的深度采样次数;
根据所述图像序列以及每一个所述图像块的深度采样次数,确定每一个所述图像块对应的匹配代价。
可选的,所述处理器601还用于在所述根据所述图像序列,获得所述关键帧的深度图之后,对所述关键帧的深度图进行滤波处理。
可选的,所述处理器601用于对所述关键帧的深度图进行滤波处理,具体可以包括:
对所述关键帧的深度图进行三边滤波处理。
可选的,所述处理器601用于融合所述关键帧的三维点云,获得所述目标场景的三维模型,具体可以包括:
将所述关键帧对应的三维点云融合至所述目标场景对应的体素中;
根据所述目标场景对应的体素,获得所述目标场景的三维模型。
本发明实施例还提供一种无人机,其具体实现可以参考图6所示的无人机的结构示意图,可以包括:处理器;无人机上搭载有拍摄装置,拍摄装置用于对目标场景进行拍摄;处理器用于,
根据第M帧图像帧对应的云台角信息,获取视觉坐标系到世界坐标系的初始旋转变换矩阵,M为大于等于1的自然数;
根据P个图像帧对应的实时动态RTK信息和相机中心信息,对初始旋转变换矩阵进行校正,获取视觉坐标系到世界坐标系的旋转矩阵、平移矩阵和尺度信息,P个图像帧在时序上位于第M帧图像帧之后,P为大于等于1的自然数;
根据旋转矩阵、平移矩阵和尺度信息以及目标场景的图像序列,获取图像序列在世界坐标系下对应的位姿信息;
根据图像序列在世界坐标系下对应的位姿信息,进行特征点的跟踪;
根据特征点的跟踪结果,获得目标场景的三维模型。
可选的,处理器,用于根据图像序列在世界坐标系下对应的位姿信息,进行特征点的跟踪,具体可以包括:
获取第一图像帧在世界坐标系中的第一位姿信息,第一位姿信息包括:第一实时动态RTK信息和第一云台角信息;
根据第一位姿信息,估计第二图像帧在世界坐标系中的第二位姿信息,第二位姿信息包括:第二RTK信息和第二云台角信息;
根据第一位姿信息和第二位姿信息对第一图像帧的特征信息和第二图像帧的特征信息进行特征匹配;
根据特征匹配结果,进行特征点的跟踪;
其中,第一图像帧和第二图像帧为图像序列中相邻的两帧。
可选的,处理器,用于根据特征点的跟踪结果,获得目标场景的三维模型,具体可以包括:
根据特征点的跟踪结果,确定关键帧的三维点云;
根据关键帧对应的RTK信息和云台角信息,采用非线性优化的方式对关键帧的位姿信息及三维点云的位置进行优化;
根据优化后的关键帧的位姿信息及三维点云的位置,获得目标场景的三维模型。
本发明实施例还提供一种目标场景三维重建装置(例如芯片、集成电路等),包括:存储器和处理器。所述存储器,用于存储执行目标场景三维重建方法的代码。所述处理器,用于调用所述存储器中存储的所述代码,执行上述任一方法实施例所述的目标场景三维重建方法。
本发明实施例还提供一种计算机可读存储介质,所述计算机可读存储介质存储有计算机程序,所述计算机程序包含至少一段代码,所述至少一段代码可由计算机执行,以控制所述计算机执行上述任一方法实施例所述的目标场景三维重建方法。
本发明实施例提供一种计算机程序,当所述计算机程序被计算机执行时,用于实现上述任一方法实施例所述的目标场景三维重建方法。
本领域普通技术人员可以理解:实现上述方法实施例的全部或部分步骤可以通过程序指令相关的硬件来完成,前述的程序可以存储于一计算机可读取存储介质中,该程序在执行时,执行包括上述方法实施例的步骤;而前述的存储介质包括:只读内存(Read-Only Memory,ROM)、随机存取存储器(Random Access Memory,RAM)、磁碟或者光盘等各种可以存储程序代码的介质。
最后应说明的是:以上各实施例仅用以说明本发明的技术方案,而非对 其限制;尽管参照前述各实施例对本发明进行了详细的说明,本领域的普通技术人员应当理解:其依然可以对前述各实施例所记载的技术方案进行修改,或者对其中部分或者全部技术特征进行等同替换;而这些修改或者替换,并不使相应技术方案的本质脱离本发明各实施例技术方案的范围。

Claims (57)

  1. 一种目标场景三维重建方法,其特征在于,包括:
    获取目标场景的图像序列,所述图像序列包含在时序上连续的多个图像帧;
    根据所述图像序列获得关键帧,并基于所述图像序列获得所述关键帧的三维点云;
    融合所述关键帧的三维点云,获得所述目标场景的三维模型。
  2. 根据权利要求1所述的方法,其特征在于,所述根据所述图像序列获得关键帧之前,所述方法还包括:
    对所述图像序列的三维信息进行初始化。
  3. 根据权利要求2所述的方法,其特征在于,所述对所述图像序列的三维信息进行初始化,包括:
    根据第一帧图像帧对应的云台角信息,获取视觉坐标系到世界坐标系的初始旋转变换矩阵;
    根据N个图像帧对应的实时动态RTK信息和相机中心信息,对所述初始旋转变换矩阵进行校正,获取视觉坐标系到世界坐标系的旋转矩阵、平移矩阵和尺度信息;
    根据所述旋转矩阵、平移矩阵和尺度信息对所述图像序列的三维信息进行初始化。
  4. 根据权利要求1所述的方法,其特征在于,所述基于所述图像序列获得所述关键帧的三维点云,包括:
    获取所述图像序列的特征信息;
    根据所述特征信息,进行特征点的跟踪;
    根据所述特征点的跟踪结果,确定所述关键帧的三维点云。
  5. 根据权利要求4所述的方法,其特征在于,所述根据所述特征信息,进行特征点的跟踪,包括:
    获取第一图像帧在世界坐标系中的第一位姿信息,所述第一位姿信息包括:第一实时动态RTK信息和第一云台角信息;
    根据所述第一位姿信息,估计第二图像帧在世界坐标系中的第二位姿信息,所述第二位姿信息包括:第二RTK信息和第二云台角信息;
    根据所述第一位姿信息和所述第二位姿信息对所述第一图像帧的特征信息和所述第二图像帧的特征信息进行特征匹配;
    根据特征匹配结果,进行特征点的跟踪;
    其中,所述第一图像帧和所述第二图像帧为所述图像序列中相邻的两帧。
  6. 根据权利要求1所述的方法,其特征在于,所述融合所述关键帧的三维点云之前,所述方法还包括:
    根据所述关键帧对应的RTK信息和云台角信息,采用非线性优化的方式对所述关键帧的位姿信息及三维点云的位置进行优化。
  7. 根据权利要求1-6任一项所述的方法,其特征在于,所述基于所述图像序列获得所述关键帧的三维点云,包括:
    根据所述图像序列,获得所述关键帧的深度图;
    根据所述关键帧的深度图,获得所述关键帧的三维点云。
  8. 根据权利要求7所述的方法,其特征在于,所述根据所述图像序列,获得所述关键帧的深度图,包括:
    根据所述图像序列,确定所述关键帧对应的匹配代价;
    根据所述关键帧对应的匹配代价,确定所述关键帧的深度图。
  9. 根据权利要求8所述的方法,其特征在于,所述根据所述图像序列,确定所述关键帧对应的匹配代价,包括:
    根据所述图像序列,确定所述关键帧对应的第一类型匹配代价和第二类型匹配代价;
    确定所述关键帧对应的匹配代价等于所述第一类型匹配代价和第二类型匹配代价的加权和;
    其中,所述第一类型匹配代价基于零均值归一化互相关确定,所述第二类型匹配代价基于光照不变特征确定。
  10. 根据权利要求8所述的方法,其特征在于,所述根据所述关键帧对应的匹配代价,确定所述关键帧的深度图,包括:
    将所述关键帧划分成多个图像块;
    根据所述图像序列,确定每一个图像块对应的匹配代价;
    根据每一个所述图像块对应的匹配代价,确定所述关键帧对应的匹配代价。
  11. 根据权利要求10所述的方法,其特征在于,所述将所述关键帧划分成多个图像块,包括:
    采用聚类的方式,将所述关键帧划分成多个图像块;
    或者,
    将所述关键帧均匀划分成多个图像块。
  12. 根据权利要求10所述的方法,其特征在于,所述根据所述图像序列,确定每一个图像块对应的匹配代价,包括:
    根据所述图像序列,并行确定每一个图像块对应的匹配代价。
  13. 根据权利要求10所述的方法,其特征在于,所述根据所述图像序列,确定每一个图像块对应的匹配代价,包括:
    根据每一个所述图像块中的稀疏点确定该图像块的深度采样次数;
    根据所述图像序列以及每一个所述图像块的深度采样次数,确定每一个所述图像块对应的匹配代价。
  14. 根据权利要求7所述的方法,其特征在于,所述根据所述图像序列,获得所述关键帧的深度图之后,还包括:
    对所述关键帧的深度图进行滤波处理。
  15. 根据权利要求14所述的方法,其特征在于,所述对所述关键帧的深度图进行滤波处理,包括:
    对所述关键帧的深度图进行三边滤波处理。
  16. 根据权利要求1所述的方法,其特征在于,所述融合所述关键帧的三维点云,获得所述目标场景的三维模型,包括:
    将所述关键帧对应的三维点云融合至所述目标场景对应的体素中;
    根据所述目标场景对应的体素,获得所述目标场景的三维模型。
  17. 一种目标场景三维重建方法,其特征在于,包括:
    根据第M帧图像帧对应的云台角信息,获取视觉坐标系到世界坐标系的初始旋转变换矩阵,M为大于等于1的自然数;
    根据P个图像帧对应的实时动态RTK信息和相机中心信息,对所述初始旋转变换矩阵进行校正,获取视觉坐标系到世界坐标系的旋转矩阵、平移矩阵和尺度信息,所述P个图像帧在时序上位于所述第M帧图像帧之后,P为大于等于1的自然数;
    根据所述旋转矩阵、平移矩阵和尺度信息以及目标场景的图像序列,获取所述图像序列在世界坐标系下对应的位姿信息;
    根据所述图像序列在世界坐标系下对应的位姿信息,进行特征点的跟踪;
    根据所述特征点的跟踪结果,获得所述目标场景的三维模型。
  18. 根据权利要求17所述的方法,其特征在于,所述根据所述图像序列在世界坐标系下对应的位姿信息,进行特征点的跟踪,包括:
    获取第一图像帧在世界坐标系中的第一位姿信息,所述第一位姿信息包括:第一实时动态RTK信息和第一云台角信息;
    根据所述第一位姿信息,估计第二图像帧在世界坐标系中的第二位姿信息,所述第二位姿信息包括:第二RTK信息和第二云台角信息;
    根据所述第一位姿信息和所述第二位姿信息对所述第一图像帧的特征信息和所述第二图像帧的特征信息进行特征匹配;
    根据特征匹配结果,进行特征点的跟踪;
    其中,所述第一图像帧和所述第二图像帧为所述图像序列中相邻的两帧。
  19. 根据权利要求17所述的方法,其特征在于,所述根据所述特征点的跟踪结果,获得所述目标场景的三维模型,包括:
    根据所述特征点的跟踪结果,确定关键帧的三维点云;
    根据所述关键帧对应的RTK信息和云台角信息,采用非线性优化的方式对所述关键帧的位姿信息及三维点云的位置进行优化;
    根据优化后的关键帧的位姿信息及三维点云的位置,获得所述目标场景的三维模型。
  20. 一种目标场景三维重建系统,其特征在于,包括:处理器和存储器;
    所述存储器,用于存储程序代码;
    所述处理器,调用所述程序代码,当程序代码被执行时,用于执行以下操作:
    获取目标场景的图像序列,所述图像序列包含在时序上连续的多个图像帧;
    根据所述图像序列获得关键帧,并基于所述图像序列获得所述关键帧的三维点云;
    融合所述关键帧的三维点云,获得所述目标场景的三维模型。
  21. 根据权利要求20所述的系统,其特征在于,所述处理器还用于:
    在所述根据所述图像序列获得关键帧之前,对所述图像序列的三维信息进行初始化。
  22. 根据权利要求21所述的系统,其特征在于,所述处理器,用于对所述图像序列的三维信息进行初始化,具体包括:
    根据第一帧图像帧对应的云台角信息,获取视觉坐标系到世界坐标系的初始旋转变换矩阵;
    根据N个图像帧对应的实时动态RTK信息和相机中心信息,对所述初始旋转变换矩阵进行校正,获取视觉坐标系到世界坐标系的旋转矩阵、平移矩阵和尺度信息;
    根据所述旋转矩阵、平移矩阵和尺度信息对所述图像序列的三维信息进行初始化。
  23. 根据权利要求20所述的系统,其特征在于,所述处理器,用于基于所述图像序列获得所述关键帧的三维点云,具体包括:
    获取所述图像序列的特征信息;
    根据所述特征信息,进行特征点的跟踪;
    根据所述特征点的跟踪结果,确定所述关键帧的三维点云。
  24. 根据权利要求23所述的系统,其特征在于,所述处理器,用于根据所述特征信息,进行特征点的跟踪,具体包括:
    获取第一图像帧在世界坐标系中的第一位姿信息,所述第一位姿信息包括:第一实时动态RTK信息和第一云台角信息;
    根据所述第一位姿信息,估计第二图像帧在世界坐标系中的第二位姿信息,所述第二位姿信息包括:第二RTK信息和第二云台角信息;
    根据所述第一位姿信息和所述第二位姿信息对所述第一图像帧的特征信息和所述第二图像帧的特征信息进行特征匹配;
    根据特征匹配结果,进行特征点的跟踪;
    其中,所述第一图像帧和所述第二图像帧为所述图像序列中相邻的两帧。
  25. 根据权利要求20所述的系统,其特征在于,所述处理器,还用于在所述融合所述关键帧的三维点云之前,根据所述关键帧对应的RTK信息和云台角信息,采用非线性优化的方式对所述关键帧的位姿信息及三维点云的位 置进行优化。
  26. 根据权利要求20-25任一项所述的系统,其特征在于,所述处理器,用于基于所述图像序列获得所述关键帧的三维点云,具体包括:
    根据所述图像序列,获得所述关键帧的深度图;
    根据所述关键帧的深度图,获得所述关键帧的三维点云。
  27. 根据权利要求26所述的系统,其特征在于,所述处理器,用于根据所述图像序列,获得所述关键帧的深度图,具体包括:
    根据所述图像序列,确定所述关键帧对应的匹配代价;
    根据所述关键帧对应的匹配代价,确定所述关键帧的深度图。
  28. 根据权利要求27所述的系统,其特征在于,所述处理器,用于根据所述图像序列,确定所述关键帧对应的匹配代价,具体包括:
    根据所述图像序列,确定所述关键帧对应的第一类型匹配代价和第二类型匹配代价;
    确定所述关键帧对应的匹配代价等于所述第一类型匹配代价和第二类型匹配代价的加权和;
    其中,所述第一类型匹配代价基于零均值归一化互相关确定,所述第二类型匹配代价基于光照不变特征确定。
  29. 根据权利要求27所述的系统,其特征在于,所述处理器,用于根据所述关键帧对应的匹配代价,确定所述关键帧的深度图,具体包括:
    将所述关键帧划分成多个图像块;
    根据所述图像序列,确定每一个图像块对应的匹配代价;
    根据每一个所述图像块对应的匹配代价,确定所述关键帧对应的匹配代价。
  30. 根据权利要求29所述的系统,其特征在于,所述处理器,用于将所述关键帧划分成多个图像块,具体包括:
    采用聚类的方式,将所述关键帧划分成多个图像块;
    或者,
    将所述关键帧均匀划分成多个图像块。
  31. 根据权利要求29所述的系统,其特征在于,所述处理器,用于根据所述图像序列,确定每一个图像块对应的匹配代价,具体包括:
    根据所述图像序列,并行确定每一个图像块对应的匹配代价。
  32. 根据权利要求29所述的系统,其特征在于,所述处理器,用于根据所述图像序列,确定每一个图像块对应的匹配代价,具体包括:
    根据每一个所述图像块中的稀疏点确定该图像块的深度采样次数;
    根据所述图像序列以及每一个所述图像块的深度采样次数,确定每一个所述图像块对应的匹配代价。
  33. 根据权利要求26所述的系统,其特征在于,所述处理器,还用于在所述根据所述图像序列,获得所述关键帧的深度图之后,对所述关键帧的深度图进行滤波处理。
  34. 根据权利要求33所述的系统,其特征在于,所述处理器,用于对所述关键帧的深度图进行滤波处理,具体包括:
    对所述关键帧的深度图进行三边滤波处理。
  35. 根据权利要求20所述的系统,其特征在于,所述处理器,用于融合所述关键帧的三维点云,获得所述目标场景的三维模型,具体包括:
    将所述关键帧对应的三维点云融合至所述目标场景对应的体素中;
    根据所述目标场景对应的体素,获得所述目标场景的三维模型。
  36. 一种目标场景三维重建系统,其特征在于,包括:处理器和存储器;
    所述存储器,用于存储程序代码;
    所述处理器,调用所述程序代码,当程序代码被执行时,用于执行以下操作:
    根据第M帧图像帧对应的云台角信息,获取视觉坐标系到世界坐标系的初始旋转变换矩阵,M为大于等于1的自然数;
    根据P个图像帧对应的实时动态RTK信息和相机中心信息,对所述初始旋转变换矩阵进行校正,获取视觉坐标系到世界坐标系的旋转矩阵、平移矩阵和尺度信息,所述P个图像帧在时序上位于所述第M帧图像帧之后,P为大于等于1的自然数;
    根据所述旋转矩阵、平移矩阵和尺度信息以及目标场景的图像序列,获取所述图像序列在世界坐标系下对应的位姿信息;
    根据所述图像序列在世界坐标系下对应的位姿信息,进行特征点的跟踪;
    根据所述特征点的跟踪结果,获得所述目标场景的三维模型。
  37. 根据权利要求36所述的系统,其特征在于,所述处理器用于,根据所述图像序列在世界坐标系下对应的位姿信息,进行特征点的跟踪,具体包括:
    获取第一图像帧在世界坐标系中的第一位姿信息,所述第一位姿信息包括:第一实时动态RTK信息和第一云台角信息;
    根据所述第一位姿信息,估计第二图像帧在世界坐标系中的第二位姿信息,所述第二位姿信息包括:第二RTK信息和第二云台角信息;
    根据所述第一位姿信息和所述第二位姿信息对所述第一图像帧的特征信息和所述第二图像帧的特征信息进行特征匹配;
    根据特征匹配结果,进行特征点的跟踪;
    其中,所述第一图像帧和所述第二图像帧为所述图像序列中相邻的两帧。
  38. 根据权利要求36所述的系统,其特征在于,所述处理器,用于根据所述特征点的跟踪结果,获得所述目标场景的三维模型,具体包括:
    根据所述特征点的跟踪结果,确定关键帧的三维点云;
    根据所述关键帧对应的RTK信息和云台角信息,采用非线性优化的方式对所述关键帧的位姿信息及三维点云的位置进行优化;
    根据优化后的关键帧的位姿信息及三维点云的位置,获得所述目标场景的三维模型。
  39. 一种无人机,其特征在于,包括:处理器;
    所述无人机上搭载有拍摄装置,所述拍摄装置用于对目标场景进行拍摄;
    所述处理器用于,
    获取目标场景的图像序列,所述图像序列包含在时序上连续的多个图像帧;
    根据所述图像序列获得关键帧,并基于所述图像序列获得所述关键帧的三维点云;
    融合所述关键帧的三维点云,获得所述目标场景的三维模型。
  40. 根据权利要求39所述的无人机,其特征在于,所述处理器,还用于在所述根据所述图像序列获得关键帧之前,对所述图像序列的三维信息进行初始化。
  41. 根据权利要求40所述的无人机,其特征在于,所述处理器,用于对 所述图像序列的三维信息进行初始化,具体包括:
    根据第一帧图像帧对应的云台角信息,获取视觉坐标系到世界坐标系的初始旋转变换矩阵;
    根据N个图像帧对应的实时动态RTK信息和相机中心信息,对所述初始旋转变换矩阵进行校正,获取视觉坐标系到世界坐标系的旋转矩阵、平移矩阵和尺度信息;
    根据所述旋转矩阵、平移矩阵和尺度信息对所述图像序列的三维信息进行初始化。
  42. 根据权利要求39所述的无人机,其特征在于,所述处理器,用于基于所述图像序列获得所述关键帧的三维点云,具体包括:
    获取所述图像序列的特征信息;
    根据所述特征信息,进行特征点的跟踪;
    根据所述特征点的跟踪结果,确定所述关键帧的三维点云。
  43. 根据权利要求42所述的无人机,其特征在于,所述处理器,用于根据所述特征信息,进行特征点的跟踪,具体包括:
    获取第一图像帧在世界坐标系中的第一位姿信息,所述第一位姿信息包括:第一实时动态RTK信息和第一云台角信息;
    根据所述第一位姿信息,估计第二图像帧在世界坐标系中的第二位姿信息,所述第二位姿信息包括:第二RTK信息和第二云台角信息;
    根据所述第一位姿信息和所述第二位姿信息对所述第一图像帧的特征信息和所述第二图像帧的特征信息进行特征匹配;
    根据特征匹配结果,进行特征点的跟踪;
    其中,所述第一图像帧和所述第二图像帧为所述图像序列中相邻的两帧。
  44. 根据权利要求39所述的无人机,其特征在于,所述处理器,还用于在所述融合所述关键帧的三维点云之前,根据所述关键帧对应的RTK信息和云台角信息,采用非线性优化的方式对所述关键帧的位姿信息及三维点云的位置进行优化。
  45. 根据权利要求39-44任一项所述的无人机,其特征在于,所述处理器,用于基于所述图像序列获得所述关键帧的三维点云,具体包括:
    根据所述图像序列,获得所述关键帧的深度图;
    根据所述关键帧的深度图,获得所述关键帧的三维点云。
  46. 根据权利要求45所述的无人机,其特征在于,所述处理器,用于根据所述图像序列,获得所述关键帧的深度图,具体包括:
    根据所述图像序列,确定所述关键帧对应的匹配代价;
    根据所述关键帧对应的匹配代价,确定所述关键帧的深度图。
  47. 根据权利要求46所述的无人机,其特征在于,所述处理器,用于根据所述图像序列,确定所述关键帧对应的匹配代价,具体包括:
    根据所述图像序列,确定所述关键帧对应的第一类型匹配代价和第二类型匹配代价;
    确定所述关键帧对应的匹配代价等于所述第一类型匹配代价和第二类型匹配代价的加权和;
    其中,所述第一类型匹配代价基于零均值归一化互相关确定,所述第二类型匹配代价基于光照不变特征确定。
  48. 根据权利要求46所述的无人机,其特征在于,所述处理器,用于根据所述关键帧对应的匹配代价,确定所述关键帧的深度图,具体包括:
    将所述关键帧划分成多个图像块;
    根据所述图像序列,确定每一个图像块对应的匹配代价;
    根据每一个所述图像块对应的匹配代价,确定所述关键帧对应的匹配代价。
  49. 根据权利要求48所述的无人机,其特征在于,所述处理器,用于将所述关键帧划分成多个图像块,具体包括:
    采用聚类的方式,将所述关键帧划分成多个图像块;
    或者,
    将所述关键帧均匀划分成多个图像块。
  50. 根据权利要求48所述的无人机,其特征在于,所述处理器,用于根据所述图像序列,确定每一个图像块对应的匹配代价,具体包括:
    根据所述图像序列,并行确定每一个图像块对应的匹配代价。
  51. 根据权利要求48所述的无人机,其特征在于,所述处理器,用于根据所述图像序列,确定每一个图像块对应的匹配代价,具体包括:
    根据每一个所述图像块中的稀疏点确定该图像块的深度采样次数;
    根据所述图像序列以及每一个所述图像块的深度采样次数,确定每一个所述图像块对应的匹配代价。
  52. 根据权利要求45所述的无人机,其特征在于,所述处理器,还用于在所述根据所述图像序列,获得所述关键帧的深度图之后,对所述关键帧的深度图进行滤波处理。
  53. 根据权利要求52所述的无人机,其特征在于,所述处理器,用于对所述关键帧的深度图进行滤波处理,具体包括:
    对所述关键帧的深度图进行三边滤波处理。
  54. 根据权利要求39所述的无人机,其特征在于,所述处理器,用于融合所述关键帧的三维点云,获得所述目标场景的三维模型,具体包括:
    将所述关键帧对应的三维点云融合至所述目标场景对应的体素中;
    根据所述目标场景对应的体素,获得所述目标场景的三维模型。
  55. 一种无人机,其特征在于,包括:处理器;
    所述无人机上搭载有拍摄装置,所述拍摄装置用于对目标场景进行拍摄;
    所述处理器用于,
    根据第M帧图像帧对应的云台角信息,获取视觉坐标系到世界坐标系的初始旋转变换矩阵,M为大于等于1的自然数;
    根据P个图像帧对应的实时动态RTK信息和相机中心信息,对所述初始旋转变换矩阵进行校正,获取视觉坐标系到世界坐标系的旋转矩阵、平移矩阵和尺度信息,所述P个图像帧在时序上位于所述第M帧图像帧之后,P为大于等于1的自然数;
    根据所述旋转矩阵、平移矩阵和尺度信息以及目标场景的图像序列,获取所述图像序列在世界坐标系下对应的位姿信息;
    根据所述图像序列在世界坐标系下对应的位姿信息,进行特征点的跟踪;
    根据所述特征点的跟踪结果,获得所述目标场景的三维模型。
  56. 根据权利要求55所述的无人机,其特征在于,所述处理器,用于根据所述图像序列在世界坐标系下对应的位姿信息,进行特征点的跟踪,具体包括:
    获取第一图像帧在世界坐标系中的第一位姿信息,所述第一位姿信息包括:第一实时动态RTK信息和第一云台角信息;
    根据所述第一位姿信息,估计第二图像帧在世界坐标系中的第二位姿信息,所述第二位姿信息包括:第二RTK信息和第二云台角信息;
    根据所述第一位姿信息和所述第二位姿信息对所述第一图像帧的特征信息和所述第二图像帧的特征信息进行特征匹配;
    根据特征匹配结果,进行特征点的跟踪;
    其中,所述第一图像帧和所述第二图像帧为所述图像序列中相邻的两帧。
  57. 根据权利要求55所述的无人机,其特征在于,所述处理器,用于根据所述特征点的跟踪结果,获得所述目标场景的三维模型,具体包括:
    根据所述特征点的跟踪结果,确定关键帧的三维点云;
    根据所述关键帧对应的RTK信息和云台角信息,采用非线性优化的方式对所述关键帧的位姿信息及三维点云的位置进行优化;
    根据优化后的关键帧的位姿信息及三维点云的位置,获得所述目标场景的三维模型。
PCT/CN2018/119190 2018-12-04 2018-12-04 目标场景三维重建方法、系统及无人机 WO2020113423A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201880072188.7A CN111433818A (zh) 2018-12-04 2018-12-04 目标场景三维重建方法、系统及无人机
PCT/CN2018/119190 WO2020113423A1 (zh) 2018-12-04 2018-12-04 目标场景三维重建方法、系统及无人机

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2018/119190 WO2020113423A1 (zh) 2018-12-04 2018-12-04 目标场景三维重建方法、系统及无人机

Publications (1)

Publication Number Publication Date
WO2020113423A1 true WO2020113423A1 (zh) 2020-06-11

Family

ID=70974832

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/119190 WO2020113423A1 (zh) 2018-12-04 2018-12-04 目标场景三维重建方法、系统及无人机

Country Status (2)

Country Link
CN (1) CN111433818A (zh)
WO (1) WO2020113423A1 (zh)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112634370A (zh) * 2020-12-31 2021-04-09 广州极飞科技有限公司 一种无人机打点方法、装置、设备及存储介质
CN112927271A (zh) * 2021-03-31 2021-06-08 Oppo广东移动通信有限公司 图像处理方法、图像处理装置、存储介质与电子设备
CN113240615A (zh) * 2021-05-20 2021-08-10 北京城市网邻信息技术有限公司 图像处理方法、装置、电子设备和计算机可读存储介质
CN117475358A (zh) * 2023-12-27 2024-01-30 广东南方电信规划咨询设计院有限公司 一种基于无人机视觉的碰撞预测方法及装置

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112288817B (zh) * 2020-11-18 2024-05-07 Oppo广东移动通信有限公司 基于图像的三维重建处理方法及装置
CN112767534B (zh) * 2020-12-31 2024-02-09 北京达佳互联信息技术有限公司 视频图像处理方法、装置、电子设备及存储介质
CN113190515B (zh) * 2021-05-14 2022-11-29 重庆市勘测院 基于异构并行计算的城市级海量点云坐标转换方法
CN113884025B (zh) * 2021-09-16 2024-05-03 河南垂天智能制造有限公司 增材制造结构光回环检测方法、装置、电子设备和存储介质
CN113985436A (zh) * 2021-11-04 2022-01-28 广州中科云图智能科技有限公司 基于slam的无人机三维地图构建与定位方法及装置
CN114170146A (zh) * 2021-11-12 2022-03-11 苏州瑞派宁科技有限公司 图像处理方法、装置、电子设备以及计算机可读存储介质
CN114429495B (zh) * 2022-03-14 2022-08-30 荣耀终端有限公司 一种三维场景的重建方法和电子设备
CN115311424B (zh) * 2022-08-02 2023-04-07 深圳市华赛睿飞智能科技有限公司 一种目标场景的三维重建方法、装置、无人机及存储介质
CN116452776B (zh) * 2023-06-19 2023-10-20 国网浙江省电力有限公司湖州供电公司 基于视觉同步定位与建图系统的低碳变电站场景重建方法

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102568026A (zh) * 2011-12-12 2012-07-11 浙江大学 一种多视点自由立体显示的三维增强现实方法
CN103679674A (zh) * 2013-11-29 2014-03-26 航天恒星科技有限公司 一种无人飞行器实时图像拼接方法及系统
CN104537709A (zh) * 2014-12-15 2015-04-22 西北工业大学 一种基于位姿变化的实时三维重建关键帧确定方法
CN107945220A (zh) * 2017-11-30 2018-04-20 华中科技大学 一种基于双目视觉的重建方法
CN108335353A (zh) * 2018-02-23 2018-07-27 清华-伯克利深圳学院筹备办公室 动态场景的三维重建方法、装置和系统、服务器、介质
CN108846857A (zh) * 2018-06-28 2018-11-20 清华大学深圳研究生院 视觉里程计的测量方法及视觉里程计

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102750687A (zh) * 2011-09-23 2012-10-24 新奥特(北京)视频技术有限公司 一种摄像机参数标定和三维点云生成方法和装置
CN106570507B (zh) * 2016-10-26 2019-12-27 北京航空航天大学 单目视频场景三维结构的多视角一致的平面检测解析方法
CN107481288A (zh) * 2017-03-31 2017-12-15 触景无限科技(北京)有限公司 双目摄像头的内外参确定方法和装置

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102568026A (zh) * 2011-12-12 2012-07-11 浙江大学 一种多视点自由立体显示的三维增强现实方法
CN103679674A (zh) * 2013-11-29 2014-03-26 航天恒星科技有限公司 一种无人飞行器实时图像拼接方法及系统
CN104537709A (zh) * 2014-12-15 2015-04-22 西北工业大学 一种基于位姿变化的实时三维重建关键帧确定方法
CN107945220A (zh) * 2017-11-30 2018-04-20 华中科技大学 一种基于双目视觉的重建方法
CN108335353A (zh) * 2018-02-23 2018-07-27 清华-伯克利深圳学院筹备办公室 动态场景的三维重建方法、装置和系统、服务器、介质
CN108846857A (zh) * 2018-06-28 2018-11-20 清华大学深圳研究生院 视觉里程计的测量方法及视觉里程计

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112634370A (zh) * 2020-12-31 2021-04-09 广州极飞科技有限公司 一种无人机打点方法、装置、设备及存储介质
CN112927271A (zh) * 2021-03-31 2021-06-08 Oppo广东移动通信有限公司 图像处理方法、图像处理装置、存储介质与电子设备
CN112927271B (zh) * 2021-03-31 2024-04-05 Oppo广东移动通信有限公司 图像处理方法、图像处理装置、存储介质与电子设备
CN113240615A (zh) * 2021-05-20 2021-08-10 北京城市网邻信息技术有限公司 图像处理方法、装置、电子设备和计算机可读存储介质
CN113240615B (zh) * 2021-05-20 2022-06-07 北京城市网邻信息技术有限公司 图像处理方法、装置、电子设备和计算机可读存储介质
CN117475358A (zh) * 2023-12-27 2024-01-30 广东南方电信规划咨询设计院有限公司 一种基于无人机视觉的碰撞预测方法及装置
CN117475358B (zh) * 2023-12-27 2024-04-23 广东南方电信规划咨询设计院有限公司 一种基于无人机视觉的碰撞预测方法及装置

Also Published As

Publication number Publication date
CN111433818A (zh) 2020-07-17

Similar Documents

Publication Publication Date Title
WO2020113423A1 (zh) 目标场景三维重建方法、系统及无人机
US11915502B2 (en) Systems and methods for depth map sampling
JP7252943B2 (ja) 航空機のための対象物検出及び回避
US20210141378A1 (en) Imaging method and device, and unmanned aerial vehicle
Won et al. Sweepnet: Wide-baseline omnidirectional depth estimation
JP6496323B2 (ja) 可動物体を検出し、追跡するシステム及び方法
US9420265B2 (en) Tracking poses of 3D camera using points and planes
WO2020172875A1 (zh) 道路结构信息的提取方法、无人机及自动驾驶系统
US11064178B2 (en) Deep virtual stereo odometry
WO2019119328A1 (zh) 一种基于视觉的定位方法及飞行器
CN106873619B (zh) 一种无人机飞行路径的处理方法
CN112567201A (zh) 距离测量方法以及设备
CN111527463A (zh) 用于多目标跟踪的方法和系统
CN108171715B (zh) 一种图像分割方法及装置
WO2019104571A1 (zh) 图像处理方法和设备
WO2019126930A1 (zh) 测距方法、装置以及无人机
Eynard et al. Real time UAV altitude, attitude and motion estimation from hybrid stereovision
CN113228103A (zh) 目标跟踪方法、装置、无人机、系统及可读存储介质
WO2021081774A1 (zh) 一种参数优化方法、装置及控制设备、飞行器
CN116359873A (zh) 结合鱼眼相机实现车端4d毫米波雷达slam处理的方法、装置、处理器及其存储介质
Bazin et al. UAV attitude estimation by vanishing points in catadioptric images
WO2020113417A1 (zh) 目标场景三维重建方法、系统及无人机
WO2021217450A1 (zh) 目标跟踪方法、设备及存储介质
WO2014203743A1 (en) Method for registering data using set of primitives
US20210256732A1 (en) Image processing method and unmanned aerial vehicle

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18942339

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 18942339

Country of ref document: EP

Kind code of ref document: A1