WO2020113417A1 - 目标场景三维重建方法、系统及无人机 - Google Patents

目标场景三维重建方法、系统及无人机 Download PDF

Info

Publication number
WO2020113417A1
WO2020113417A1 PCT/CN2018/119153 CN2018119153W WO2020113417A1 WO 2020113417 A1 WO2020113417 A1 WO 2020113417A1 CN 2018119153 W CN2018119153 W CN 2018119153W WO 2020113417 A1 WO2020113417 A1 WO 2020113417A1
Authority
WO
WIPO (PCT)
Prior art keywords
frame
target
target frame
image
matching cost
Prior art date
Application number
PCT/CN2018/119153
Other languages
English (en)
French (fr)
Inventor
杨志华
Original Assignee
深圳市大疆创新科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳市大疆创新科技有限公司 filed Critical 深圳市大疆创新科技有限公司
Priority to PCT/CN2018/119153 priority Critical patent/WO2020113417A1/zh
Priority to CN201880073770.5A priority patent/CN111433819A/zh
Publication of WO2020113417A1 publication Critical patent/WO2020113417A1/zh

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T17/00Three dimensional [3D] modelling, e.g. data description of 3D objects

Definitions

  • Embodiments of the present invention relate to the technical field of drones, and in particular, to a three-dimensional reconstruction method and system of a target scene and a drone.
  • the three-dimensional reconstruction based on the image sequence may generally include: three-dimensional reconstruction based on color image and depth image (Red-Green-Blue-Depth Map, RGB-D) data, three-dimensional reconstruction based on binocular and three-dimensional reconstruction based on monocular.
  • Three-dimensional reconstruction based on RGB-D data is limited by the depth of the depth sensor, and can usually only be used in scenes with relatively limited indoors.
  • Three-dimensional reconstruction based on binocular relies on binocular vision system, and the hardware cost is high. Therefore, 3D reconstruction based on a single object is of great significance for the reconstruction of the 3D model of the shooting scene.
  • Single-purpose three-dimensional reconstruction refers to the use of a single camera, through the movement of the camera, according to the image movement of objects on different images to determine the depth map, and then fusion depth map to achieve three-dimensional reconstruction. Due to the particularity of UAV aerial photography, the existing single-purpose three-dimensional reconstruction method in the UAV aerial photography scene, the depth map calculation results are poor, and the three-dimensional reconstruction error is large. In summary, there is an urgent need for a three-dimensional reconstruction method of target scenes that can meet the requirements of drone aerial photography scenes.
  • Embodiments of the present invention provide a three-dimensional reconstruction method and system for a target scene and a drone, to solve the existing method cannot meet the needs of the three-dimensional reconstruction of the target scene in the aerial photography scene of the drone.
  • an embodiment of the present invention provides a three-dimensional reconstruction method for a target scene, including:
  • an embodiment of the present invention provides a three-dimensional reconstruction system for a target scene, including: a processor and a memory;
  • the memory is used to store program codes
  • the processor calls the program code, and when the program code is executed, it is used to perform the following operations:
  • an embodiment of the present invention provides a drone, including: a processor;
  • the drone is equipped with a shooting device, and the shooting device is used to shoot a target scene;
  • the processor is used for,
  • an embodiment of the present invention provides a three-dimensional reconstruction device (eg, chip, integrated circuit, etc.) of a target scene, including: a memory and a processor.
  • the memory is used to store code for performing a three-dimensional reconstruction method of the target scene.
  • the processor is configured to call the code stored in the memory and execute the three-dimensional reconstruction method of the target scene according to the embodiment of the present invention in the first aspect.
  • an embodiment of the present invention provides a computer-readable storage medium that stores a computer program.
  • the computer program includes at least one piece of code.
  • the at least one piece of code can be executed by a computer to control the computer.
  • the computer executes the three-dimensional reconstruction method of the target scene according to the first aspect of the present invention.
  • an embodiment of the present invention provides a computer program that, when executed by a computer, is used to implement the three-dimensional reconstruction method of a target scene according to the first aspect of the present invention.
  • the method and system for three-dimensional reconstruction of a target scene provided by an embodiment of the present invention, and an unmanned aerial vehicle, by acquiring an image sequence of a target scene, the image sequence includes a plurality of image frames continuous in time series, and a plurality of image frames continuous in time series A target frame and a reference frame are obtained, and a depth map of the target frame is obtained based on the reference frame, and the depth map of the target frame is fused to obtain a three-dimensional model of the target scene.
  • Realize the three-dimensional reconstruction of the target scene based on monocular vision in the UAV aerial photography scene.
  • the three-dimensional reconstruction method of the target scene provided by this embodiment does not need to rely on the expensive binocular vision system, nor is it limited by the depth of the depth sensor, and can meet the three-dimensional reconstruction requirements of the target scene in the aerial photography scene of the drone.
  • FIG. 1 is a schematic structural diagram of an unmanned aerial system provided by an embodiment of the present invention.
  • FIG. 2 is a flowchart of an embodiment of a three-dimensional reconstruction method for a target scene provided by the present invention
  • FIG. 3 is a schematic diagram of reference frame selection in an embodiment of a three-dimensional reconstruction method for a target scene provided by the present invention
  • FIG. 4 is a schematic block diagram of an embodiment of a three-dimensional reconstruction method for a target scene provided by the present invention.
  • FIG. 5 is a schematic structural diagram of an embodiment of a three-dimensional reconstruction system for a target scene provided by the present invention.
  • FIG. 6 is a schematic structural diagram of an embodiment of a drone provided by the present invention.
  • a component when a component is said to be “fixed” to another component, it can be directly on another component or it can also exist in a centered component. When a component is considered to be “connected” to another component, it can be directly connected to another component or there can be centered components at the same time.
  • the embodiments of the present invention provide a three-dimensional reconstruction method and system of a target scene and a drone.
  • the drone may be, for example, a rotorcraft (rotorcraft), for example, a multirotor aircraft propelled by a plurality of propulsion devices through air, and the embodiments of the present invention are not limited thereto.
  • FIG. 1 is a schematic architectural diagram of an unmanned aerial system provided by an embodiment of the present invention.
  • a rotary-wing UAV is taken as an example for description.
  • the unmanned aerial system 100 may include a drone 110, a display device 130, and a control terminal 140.
  • the UAV 110 may include a power system 150, a flight control system 160, a rack, and a gimbal 120 carried on the rack.
  • the drone 110 can communicate wirelessly with the control terminal 140 and the display device 130.
  • the rack may include a fuselage and a tripod (also called landing gear).
  • the fuselage may include a center frame and one or more arms connected to the center frame, the one or more arms extending radially from the center frame.
  • the tripod is connected to the fuselage and is used to support the UAV 110 when it lands.
  • the power system 150 may include one or more electronic governors (abbreviated as electric governors) 151, one or more propellers 153, and one or more motors 152 corresponding to the one or more propellers 153, wherein the motor 152 is connected to Between the electronic governor 151 and the propeller 153, the motor 152 and the propeller 153 are disposed on the arm of the drone 110; the electronic governor 151 is used to receive the driving signal generated by the flight control system 160 and provide driving according to the driving signal The current is given to the motor 152 to control the rotation speed of the motor 152. The motor 152 is used to drive the propeller to rotate, thereby providing power for the flight of the drone 110, which enables the drone 110 to achieve one or more degrees of freedom of movement.
  • electric governors abbreviated as electric governors
  • the drone 110 may rotate about one or more rotation axes.
  • the rotation axis may include a roll axis (Roll), a yaw axis (Yaw), and a pitch axis (Pitch).
  • the motor 152 may be a DC motor or an AC motor.
  • the motor 152 may be a brushless motor or a brush motor.
  • the flight control system 160 may include a flight controller 161 and a sensing system 162.
  • the sensor system 162 is used to measure the attitude information of the drone, that is, the position information and status information of the drone 110 in space, for example, three-dimensional position, three-dimensional angle, three-dimensional velocity, three-dimensional acceleration, and three-dimensional angular velocity.
  • the sensing system 162 may include, for example, at least one of a gyroscope, an ultrasonic sensor, an electronic compass, an inertial measurement unit (Inertial Measurement Unit, IMU), a visual sensor, a global navigation satellite system, and a barometer.
  • the global navigation satellite system may be a global positioning system (Global Positioning System, GPS).
  • the flight controller 161 is used to control the flight of the drone 110.
  • the flight of the drone 110 can be controlled according to the attitude information measured by the sensor system 162. It should be understood that the flight controller 161 may control the drone 110 according to pre-programmed program instructions, or may control the drone 110 by responding to one or more control instructions from the control terminal 140.
  • the gimbal 120 may include a motor 122.
  • the gimbal is used to carry the shooting device 123.
  • the flight controller 161 can control the movement of the gimbal 120 through the motor 122.
  • the gimbal 120 may further include a controller for controlling the movement of the gimbal 120 by controlling the motor 122.
  • the gimbal 120 may be independent of the drone 110, or may be a part of the drone 110.
  • the motor 122 may be a DC motor or an AC motor.
  • the motor 122 may be a brushless motor or a brush motor.
  • the gimbal can be located at the top of the drone or at the bottom of the drone.
  • the shooting device 123 may be, for example, a device for capturing images such as a camera or a video camera.
  • the shooting device 123 may communicate with the flight controller and perform shooting under the control of the flight controller.
  • the photographing device 123 of this embodiment includes at least a photosensitive element, for example, a complementary metal oxide semiconductor (Complementary Metal Oxide Semiconductor (CMOS) sensor or a charge-coupled device (Charge-coupled Device, CCD) sensor. It can be understood that the shooting device 123 can also be directly fixed on the drone 110, so that the gimbal 120 can be omitted.
  • CMOS Complementary Metal Oxide Semiconductor
  • CCD charge-coupled Device
  • the display device 130 is located on the ground end of the unmanned aerial system 100, can communicate with the drone 110 in a wireless manner, and can be used to display the attitude information of the drone 110.
  • the image captured by the imaging device may also be displayed on the display device 130. It should be understood that the display device 130 may be an independent device or may be integrated in the control terminal 140.
  • the control terminal 140 is located at the ground end of the unmanned aerial system 100, and can communicate with the drone 110 in a wireless manner for remote manipulation of the drone 110.
  • the drone 110 may also be equipped with a speaker (not shown in the figure), which is used to play audio files.
  • the speaker may be directly fixed on the drone 110 or may be mounted on the gimbal 120.
  • the shooting device 123 in this embodiment may be, for example, a monocular camera, which is used to shoot a target scene to obtain an image sequence of the target scene.
  • the three-dimensional reconstruction method of the target scene provided by the following embodiment may be executed by, for example, the flight controller 161, and the flight controller 161 acquires the image sequence of the target through the shooting device 123 to realize the three-dimensional reconstruction of the target scene, which may be used for drone flight.
  • the three-dimensional reconstruction method of the target scene can also be performed by the control terminal 140 located on the ground side, for example, the drone transmits the image sequence of the target scene acquired by the shooting device 123 to the control terminal 140 through image transmission technology, and the control terminal 140 completes the The three-dimensional reconstruction of the target scene; for example, the three-dimensional reconstruction method of the target scene can also be executed by a cloud server (not shown in the figure) located in the cloud, and the drone transmits the image sequence of the target scene acquired by the shooting device 123 to the cloud through image transmission technology Server, the cloud server completes the three-dimensional reconstruction of the target scene.
  • a cloud server not shown in the figure
  • FIG. 2 is a flowchart of an embodiment of a three-dimensional reconstruction method for a target scene provided by the present invention. As shown in FIG. 2, the method provided in this embodiment may include:
  • a drone equipped with a monocular shooting device may be used to shoot a target scene to obtain an image sequence of the target scene.
  • the target scene is an object that requires three-dimensional reconstruction.
  • the flight path can be planned for the drone, the flight speed and shooting frame rate can be set to obtain the image sequence of the target scene, or the shooting location can also be set when the drone is flying When you reach the preset shooting location, shoot.
  • the image sequence of the target scene acquired in this embodiment includes a plurality of image frames continuous in time sequence.
  • the target frame and the reference frame need to be determined according to a plurality of image frames that are consecutive in time series.
  • the target frame is an image frame that needs depth recovery in order to achieve three-dimensional reconstruction
  • the reference frame is an image acquisition frame that provides data such as depth of field for the target frame, and has time-domain correlation and pixel correlation with the target frame.
  • the target frame in this embodiment may include one frame among a plurality of image frames consecutive in time series.
  • the reference frame in this embodiment may include a frame that has overlapping pixels with the target frame.
  • the target frame and the reference frame can be determined by performing feature extraction, feature point matching, pose estimation, etc. on the acquired multiple image frames that are consecutive in time sequence.
  • features with rotation invariance such as Scale-Invariant Feature Transform (SIFT), Accelerated Robust Features (Speed Up Robust Features, SURF), and so on.
  • SIFT Scale-Invariant Feature Transform
  • SURF Accelerated Robust Features
  • the posture estimation of each image frame during shooting can be obtained by sensors mounted on the drone, such as an odometer, a gyroscope, an IMU, and the like.
  • the depth map corresponding to the target frame can be obtained based on the reference frame image data based on the feature point matching between the target frame and the reference frame and the knowledge of epipolar geometry.
  • the depth map of the target frame is obtained, it is converted into a corresponding three-dimensional point cloud according to the depth value and position information corresponding to each pixel in the target frame. 3D reconstruction of the target scene according to the 3D point cloud.
  • the method for three-dimensional reconstruction of a target scene obtains a target sequence and a reference frame according to a plurality of image frames that are continuous in time sequence by acquiring an image sequence of the target scene that is continuous in time sequence, Based on the reference frame, a depth map of the target frame is obtained, and the depth map of the target frame is fused to obtain a three-dimensional model of the target scene. Realize the three-dimensional reconstruction of the target scene based on monocular vision in the UAV aerial photography scene.
  • the three-dimensional reconstruction method of the target scene provided by this embodiment does not need to rely on the expensive binocular vision system, nor is it limited by the depth of the depth sensor, and can meet the three-dimensional reconstruction requirements of the target scene in the aerial photography scene of the drone.
  • the embodiment of this specification uses a monocular camera device, which does not mean that the method of this specification is not applicable to the binocular camera device.
  • the binocular camera device or the multi-eye camera device is also applicable to the solutions described in this specification.
  • the reference frame may include at least the first image Frame and second image frame.
  • the first image frame is located before the target frame in time sequence
  • the second image frame is located behind the target frame in time sequence.
  • the reference frame in this embodiment includes both the first image frame that is located before the reference frame in timing , Also includes a second image frame that is located after the reference frame in time sequence, improves the overlap rate between the target frame and the reference frame, reduces the parallax-free area, and thus improves the depth of the target frame obtained based on the reference frame The accuracy of the graph.
  • the target frame is the Nth frame
  • the first image frame is the N-1th frame
  • the second image frame is the N+1th frame
  • the reference frame includes two frames before and after the target frame.
  • the overlap rate between two adjacent frames is 70%
  • the reference frame includes only the image frame before the target frame, at least 30% of the parallax in the target frame has no solution .
  • the reference frame selection strategy provided in this embodiment enables all regions in the target frame to find matching regions in the reference frame, avoiding the occurrence of parallax insolubility and improving the depth map of the target frame. accuracy.
  • the first image frame may include a preset number of image frames before the Nth frame
  • the second image frame may include a preset number of image frames after the Nth frame.
  • the first image frame may be one of a preset number of image frames before the Nth frame
  • the second image frame may be a preset number of image frames after the Nth frame In a frame.
  • the reference frame may include at least a third Image frame.
  • the epipolar direction of the third image frame and the target frame are not parallel.
  • the epipolar line in this embodiment is the epipolar line in the epipolar geometry, that is, the intersection between the polar plane and the image.
  • the epipolar direction of the third image frame and the target frame are not parallel, that is, the first intersection line of the polar plane and the third image frame and the second intersection line of the polar plane and the target frame are not parallel.
  • the third image frame may include an image frame that has overlapping pixels with the target frame in the adjacent flight zone of the target frame.
  • the third image frame may be an image frame with the highest overlap rate with the target frame in the adjacent flight zone of the target frame.
  • FIG. 3 is a schematic diagram of reference frame selection in an embodiment of a three-dimensional reconstruction method for a target scene provided by the present invention.
  • the solid line is used to represent the flight path of the drone
  • the route covers the target scene
  • the arrow indicates the flight direction of the drone
  • the black circles and black squares on the flight path indicate the shooting of the drone
  • the device shoots at this position, that is, a black circle and a black square correspond to an image frame of the target scene.
  • the image sequence of the target scene can be obtained through the shooting device mounted on the drone, such as a monocular camera, which includes multiple consecutive image frames in time series.
  • M-1, M, M+1, N-1, N, N+1 in FIG. 3 represent the frame number of the image frame
  • N and M are natural numbers, and the specific values of N and M are not limited in this embodiment .
  • the reference frame may include the N-1th frame and the N+1th frame shown in the figure.
  • the reference frame may include the Mth frame shown in the figure.
  • the reference frame may include the Mth frame, the N-1th frame, and the N+1th frame shown in the figure, that is, FIG. 3 The image frame included in the dotted circle.
  • the reference frame may further include more image frames, for example, the M-1th frame, the M+1th frame, the N-2th frame, and the like.
  • the overlapping rate of the target frame and the reference frame and the calculation speed can be comprehensively considered and selected.
  • one implementation manner of obtaining the depth map of the target frame based on the reference frame may be: obtaining the depth map of the target frame according to the disparity between the target frame and the reference frame.
  • the depth map of the target frame can be obtained according to the aberrations of the same object in the target frame and the reference frame.
  • an implementation manner of obtaining the depth map of the target frame based on the reference frame may be: determining the matching cost corresponding to the target frame according to the reference frame; and determining the depth map of the target frame according to the matching cost corresponding to the target frame.
  • pixel points in the reference frame and the target frame can be matched to determine the matching cost corresponding to the target frame.
  • matching cost aggregation can be performed, and then the parallax is determined, and the depth map of the target frame is determined according to the correspondence between the parallax and the depth.
  • parallax optimization can also be performed to enhance the parallax. According to the parallax after optimization and enhancement, the depth map of the target frame is determined.
  • the flying height of the drone is usually about 100 meters, and the drone is usually shot vertically downwards. Due to the fluctuation of the ground, the reflection of the sun is different, and the images taken by the drone have non-negligible illumination Changes, lighting changes will reduce the accuracy of the three-dimensional reconstruction of the target scene.
  • determining the matching cost corresponding to the target frame according to the reference frame may include: According to the target frame and the reference frame, the first type matching cost and the second type matching cost corresponding to the target frame are determined; the matching cost corresponding to the target frame is determined to be equal to the weighted sum of the first type matching cost and the second type matching cost.
  • the robustness of the matching cost to illumination is improved compared to using only a single type of matching cost, thereby reducing the The influence of illumination changes on 3D reconstruction improves the accuracy of 3D reconstruction.
  • the weighting coefficients of the first-type matching cost and the second-type matching cost in this embodiment can be set according to specific needs, and this embodiment does not limit this.
  • the first-type matching cost may be determined based on zero-normalized normalized cross-correlation (Zero-based Normalized Cross Correlation, ZNCC). Based on ZNCC, the similarity between the target frame and the reference frame can be accurately measured.
  • ZNCC Zero-normalized normalized cross-correlation
  • the matching cost of the second type may be determined based on the invariant feature of illumination.
  • the illumination-invariant features in the image frames collected by the drone can be extracted, such as Local Binary Patterns (LBP), census sequence, etc., and then the second type can be determined based on the illumination-invariant features Match the cost.
  • LBP Local Binary Patterns
  • the census sequence in this embodiment can be determined as follows: select any point in the image frame, draw a rectangle such as 3 ⁇ 3 with the point as the center, and every point except the center point in the rectangle is the same as the center point For comparison, the gray value is less than the center point is recorded as 1, the gray value is greater than the center point is recorded as 0, and the resulting sequence of length 8 is only 0 and 1 as the census sequence of the center point, that is, the center pixel The gray value is replaced by the census sequence.
  • the Hamming distance can be used to determine the second type matching cost between the target frame and the reference frame.
  • the matching cost corresponding to the target frame may be equal to the weighted sum of the two matching costs of ZNCC and census.
  • an implementation of determining the matching cost corresponding to the target frame according to the reference frame may be: dividing the target frame into multiple image blocks; according to the reference frame, determining the matching cost corresponding to each image block; The matching cost corresponding to each image block determines the matching cost corresponding to the target frame.
  • one or more of the following methods may be used to divide the target frame into multiple image blocks:
  • the target frame is divided into multiple image blocks.
  • the target frame may be divided into multiple image blocks by clustering according to the color information and/or texture information of the target frame.
  • the target frame is evenly divided into multiple image blocks.
  • the number of image blocks may be set in advance, and then the target frame may be divided according to the number of image blocks set in advance.
  • the target frame into multiple image blocks of preset size.
  • the size of the image block may be set in advance, and then the target frame may be divided according to the size of the image block set in advance.
  • the matching cost corresponding to each image block may be determined in parallel according to the reference frame.
  • the matching cost corresponding to each image block may be determined in parallel by using software and/or hardware.
  • multiple threads may be used to determine the matching cost corresponding to each image block in parallel, and/or a graphics processor (Graphics Processing Unit, GPU) may be used to determine the matching cost corresponding to each image block in parallel.
  • a graphics processor Graphics Processing Unit, GPU
  • the three-dimensional reconstruction method of the target scene provided in this embodiment, on the basis of the foregoing embodiment, divides the target frame into a plurality of image blocks, determines the matching cost corresponding to each image block in parallel according to the reference frame, and then according to each image
  • the matching cost corresponding to the block determines the matching cost corresponding to the target frame, which improves the calculation speed of the matching cost and further improves the real-time nature of the three-dimensional reconstruction of the target scene.
  • the number of depth samples can be determined according to the depth range and accuracy.
  • the number of depth samples is positively related to the depth range and negatively related to the accuracy. For example, if the depth range is 50 meters and the accuracy requirement is 0.1 meters, the number of depth samples can be 500.
  • the matching cost of the target frame you can use the preset depth sampling times, or you can use real-time positioning and map construction (Simultaneous Localization and Mapping, SLAM) to recover some sparse 3D points in the target frame, and then based on these sparse 3D Point to determine the depth range of the entire target frame, and then determine the number of depth samples according to the depth range and accuracy requirements of the entire target frame. If the number of depth samples is N, the matching cost of N times needs to be calculated for each pixel in the target frame. For a target frame with a size of 640*480 pixels, the matching cost of 640*480*N times needs to be calculated.
  • SLAM Simultaneous Localization and Mapping
  • the matching cost corresponding to each image block is determined according to the reference frame , May include: determining the depth sampling times of each image block according to the sparse points in each image block; determining the matching cost corresponding to each image block according to the reference frame and the depth sampling times of each image block.
  • the target frame can contain a variety of shooting objects, such as pedestrians, cars, trees, tall buildings, etc., so the depth range of the entire target frame is relatively large, and the preset accuracy When required, the depth sampling times are larger.
  • the depth range corresponding to each image block in the target frame is relatively small.
  • the depth range corresponding to the image block will be much smaller than the depth range of the entire target frame, under the same accuracy requirements , Can greatly reduce the number of depth sampling. That is to say, under the same precision requirement, the depth sampling times of image blocks in the target frame must be less than or equal to the depth sampling times of the entire target frame.
  • the depth range of each image block is fully considered, and the number of depth samples is set according to the depth range of each image block.
  • the calculation complexity is reduced and the speed is increased.
  • This embodiment may use SLAM to recover some sparse three-dimensional points in the image block for each image block, determine the depth range of the image block according to the sparse three-dimensional points, and determine the depth range and accuracy requirements of the image block The number of depth samples for this image block. The determined depth sampling times determine the matching cost corresponding to each image block.
  • the target frame is an image frame with a size of 640*480 pixels, and the depth sampling times are determined to be 500 according to the depth range of the target frame, the matching cost of 640*480*500 times needs to be calculated. If the target frame is evenly divided into 320*160 size image blocks, and the depth sampling times of the 6 image blocks determined according to the depth range of each image block are 100, 200, 150, 100, 50, and 300, only the calculation is required 320*160*(100+200+150+100+150+300) matching cost. The amount of calculation is only one-third of the original.
  • the depth map of the target frame may be determined according to a semi-global matching algorithm (Semi Global Matching, SGM).
  • SGM Semi Global Matching
  • the 3D reconstruction method of the target scene provided in this embodiment, after obtaining the depth map of the target frame based on the reference frame, Including: filtering the depth map of the target frame. By filtering the depth map of the target frame, the noise in the depth map can be filtered out, and the accuracy of 3D reconstruction is improved.
  • an implementation manner of filtering the depth map of the target frame may be: performing a three-sided filtering process on the depth map of the target frame.
  • the trilateral filtering in this embodiment means that the weighting coefficients in the filtering process can be comprehensively determined according to the three factors of pixel distance, depth difference and color difference.
  • the size of the filtering template is 5*5, that is to say, the depth value of the target pixel after the filtering process can be determined by the depth values of the pixel and the surrounding 24 pixels.
  • the weight value of each pixel's influence on the depth value of the target pixel, according to the Euclidean distance of the pixel from the target pixel, the difference between the depth value of the pixel and the depth value of the target pixel, and the value of the pixel The difference between the RGB value and the RGB value of the target pixel is determined.
  • the three-dimensional reconstruction method of the target scene provided in this embodiment further performs a trilateral filtering process on the depth map of the target frame, and improves the depth of the target frame by the sharp and fine edge information in the target frame.
  • the accuracy of the edge of the map on the premise of saving the edge, removes the noise more robustly, making the depth map of the target frame more accurate, and the 3D reconstruction based on the depth map will also be more accurate.
  • one method of fusing the depth map of the target frame to obtain a three-dimensional model of the target scene may be: determining the point cloud corresponding to the target frame according to the depth map of the target frame; fusing the point cloud corresponding to the target frame To the voxel corresponding to the target scene; according to the voxel corresponding to the target scene, a three-dimensional model of the target scene is obtained.
  • the point cloud corresponding to the target frame is determined according to the depth map of the target frame, and the corresponding relationship between the depth map and the three-dimensional point cloud in the prior art can be used for conversion. This embodiment does not limit this.
  • a voxel-based point cloud fusion method is used.
  • the route is planned before the drone takes off, and the drone is shot vertically downwards, so the coverage of the planned route can be expressed with voxels of preset size.
  • the coverage of the planned route can be expressed with voxels of preset size.
  • the three-dimensional reconstruction method of the target scene provided by this embodiment has high real-time performance and high scalability.
  • the fusion of depth maps is almost pixel-by-pixel parallel, and from the depth value to the point cloud, the computational complexity of the fusion of point clouds into voxels is o(1), and the real-time fusion is very high; for a planning task, according to the route
  • the particularity of the planning can divide the target area into multiple sub-blocks, which makes the point cloud also have good blockiness, which is conducive to the loading of the point cloud and the subsequent display of multiple levels of detail (LOD), which is convenient Perform real-time 3D reconstruction of large scenes.
  • LOD levels of detail
  • the three-dimensional reconstruction method of the target scene provided by this embodiment may be implemented by two threads, namely a dense thread and a fusion thread.
  • the dense thread includes the steps of initialization, frame selection, depth map calculation and filter processing. For example, a new key frame and its position and orientation can be acquired through a camera mounted on the drone, and the initialization process is completed based on the acquired new key frame and its orientation. After the initialization is completed, frame selection is performed.
  • the frame selection in this embodiment includes determining the target frame and the reference frame.
  • the fusion thread in this embodiment completes the fusion of the depth map according to the RGB map, the position orientation, and the depth map queue, and creates a three-dimensional point cloud according to the fused depth map.
  • the dense thread and the fusion thread in this embodiment can be executed in parallel to increase the speed of three-dimensional reconstruction of the target scene and improve real-time performance.
  • FIG. 5 is a schematic structural diagram of an embodiment of a three-dimensional reconstruction system for a target scene provided by the present invention.
  • the target scene three-dimensional reconstruction system 500 provided in this embodiment may include: a processor 501 and a memory 502.
  • the processor 501 and the memory 502 are communicatively connected via a bus.
  • the processor 501 may be a central processing unit (Central Processing Unit, CPU), or other general-purpose processors, digital signal processors (Digital Signal Processor, DSP), application-specific integrated circuits (Application Specific Integrated Circuit, ASIC), or off-the-shelf.
  • CPU Central Processing Unit
  • DSP Digital Signal Processor
  • ASIC Application Specific Integrated Circuit
  • the general-purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
  • the above-mentioned memory 502 may be, but not limited to, random access memory (Random Access Memory, RAM for short), read-only memory (Read Memory Only, ROM for short), programmable read-only memory (Programmable Read-Only Memory, short for : PROM), Erasable Programmable Read-Only Memory (EPROM), Electrically Erasable Programmable Read-Only Memory (EEPROM), etc.
  • the memory 502 is used to store the program code; the processor 501 calls the program code, and when the program code is executed, it is used to perform the following operations:
  • the image sequence contains multiple image frames that are continuous in time sequence
  • the depth map of the target frame is fused to obtain a three-dimensional model of the target scene.
  • the target frame includes one of a plurality of image frames that are consecutive in time series.
  • the reference frame includes a frame with overlapping pixels with the target frame.
  • the reference frame includes at least a first image frame and a second image frame; the first image frame is located before the target frame in time sequence; the second image frame is located after the target frame in time sequence.
  • the target frame is the Nth frame
  • the first image frame is the N-1th frame
  • the second image frame is the N+1th frame.
  • the reference frame includes at least a third image frame; the epipolar direction of the third image frame and the target frame are not parallel.
  • the processor 501 is used to obtain a depth map of the target frame based on the reference frame, which may specifically include:
  • a depth map of the target frame is obtained.
  • the processor 501 is used to obtain a depth map of the target frame based on the reference frame, which may specifically include:
  • the depth map of the target frame is determined.
  • the processor 501 is used to determine the matching cost corresponding to the target frame according to the reference frame, which may specifically include:
  • the target frame and the reference frame determine the first type matching cost and the second type matching cost corresponding to the target frame
  • the matching cost corresponding to the target frame is equal to the weighted sum of the first type matching cost and the second type matching cost.
  • the first-type matching cost is determined based on the zero-mean normalized cross-correlation.
  • the matching cost of the second type is determined based on the invariant feature of illumination.
  • the processor 501 is used to determine the matching cost corresponding to the target frame according to the reference frame, which may specifically include:
  • the matching cost corresponding to each image block is determined.
  • the processor 501 is used to divide the target frame into multiple image blocks, which may specifically include:
  • the target frame is divided into multiple image blocks.
  • the processor 501 is used to divide the target frame into multiple image blocks, which may specifically include:
  • the target frame is evenly divided into multiple image blocks.
  • the processor 501 is used to determine the matching cost corresponding to each image block according to the reference frame, which may specifically include:
  • the matching cost corresponding to each image block is determined in parallel.
  • the processor 501 is used to determine the matching cost corresponding to each image block according to the reference frame, which may specifically include:
  • the matching cost corresponding to each image block is determined.
  • the processor 501 is further configured to filter the depth map of the target frame after obtaining the depth map of the target frame based on the reference frame.
  • the processor 501 is used to filter the depth map of the target frame, which may specifically include:
  • Three-sided filtering is performed on the depth map of the target frame.
  • the processor 501 is used to fuse the depth map of the target frame to obtain a three-dimensional model of the target scene, which may specifically include:
  • a three-dimensional model of the target scene is obtained.
  • the drone 600 provided in this embodiment may include a processor 601.
  • the processor 601 may be a central processing unit (Central Processing Unit, CPU), or other general-purpose processors, digital signal processors (Digital Signal Processor, DSP), application specific integrated circuits (Application Specific Integrated Circuit, ASIC), off-the-shelf Field-Programmable Gate Array (FPGA) or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, etc.
  • the general-purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
  • the drone 600 is equipped with a shooting device 602, and the shooting device 602 is used to shoot a target scene.
  • the processor 601 is used to obtain an image sequence of the target scene, and the image sequence includes a plurality of image frames continuous in time sequence;
  • the depth map of the target frame is fused to obtain a three-dimensional model of the target scene.
  • the target frame includes one of a plurality of image frames that are consecutive in time series.
  • the reference frame includes a frame with overlapping pixels with the target frame.
  • the reference frame includes at least a first image frame and a second image frame; the first image frame is located before the target frame in time sequence; the second image frame is located after the target frame in time sequence.
  • the target frame is the Nth frame
  • the first image frame is the N-1th frame
  • the second image frame is the N+1th frame.
  • the reference frame includes at least a third image frame; the epipolar direction of the third image frame and the target frame are not parallel.
  • the processor 601 is used to obtain the depth map of the target frame based on the reference frame, which may specifically include:
  • a depth map of the target frame is obtained.
  • the processor 601 is used to obtain the depth map of the target frame based on the reference frame, which may specifically include:
  • the depth map of the target frame is determined.
  • the processor 601 is used to determine the matching cost corresponding to the target frame according to the reference frame, which may specifically include:
  • the target frame and the reference frame determine the first type matching cost and the second type matching cost corresponding to the target frame
  • the matching cost corresponding to the target frame is equal to the weighted sum of the first type matching cost and the second type matching cost.
  • the first-type matching cost is determined based on the zero-mean normalized cross-correlation.
  • the matching cost of the second type is determined based on the invariant feature of illumination.
  • the processor 601 is used to determine the matching cost corresponding to the target frame according to the reference frame, which may specifically include:
  • the matching cost corresponding to each image block is determined.
  • the processor 601 is used to divide the target frame into multiple image blocks, which may specifically include:
  • the target frame is divided into multiple image blocks.
  • the processor 601 is used to divide the target frame into multiple image blocks, which may specifically include:
  • the target frame is evenly divided into multiple image blocks.
  • the processor 601 is used to determine the matching cost corresponding to each image block according to the reference frame, which may specifically include:
  • the matching cost corresponding to each image block is determined in parallel.
  • the processor 601 is used to determine the matching cost corresponding to each image block according to the reference frame, which may specifically include:
  • the matching cost corresponding to each image block is determined.
  • the processor 601 is further configured to filter the depth map of the target frame after obtaining the depth map of the target frame based on the reference frame.
  • the processor 601 is used to filter the depth map of the target frame, which may specifically include:
  • Three-sided filtering is performed on the depth map of the target frame.
  • the processor 601 is used to fuse the depth map of the target frame to obtain a three-dimensional model of the target scene, which may specifically include:
  • a three-dimensional model of the target scene is obtained.
  • An embodiment of the present invention further provides a three-dimensional reconstruction device (such as a chip, an integrated circuit, etc.) of a target scene, including: a memory and a processor.
  • the memory is used to store code for performing a three-dimensional reconstruction method of the target scene.
  • the processor is configured to call the code stored in the memory and execute the three-dimensional reconstruction method of the target scene described in any of the foregoing method embodiments.
  • An embodiment of the present invention also provides a computer-readable storage medium that stores a computer program, and the computer program includes at least one piece of code, and the at least one piece of code can be executed by a computer to control the computer to execute The three-dimensional reconstruction method of the target scene according to any one of the foregoing method embodiments.
  • An embodiment of the present invention provides a computer program which, when executed by a computer, is used to implement the three-dimensional reconstruction method of the target scene described in any of the foregoing method embodiments.
  • the foregoing program may be stored in a computer-readable storage medium, and when the program is executed, It includes the steps of the above method embodiments; and the foregoing storage media include: read-only memory (Read-Only Memory, ROM), random access memory (Random Access Memory, RAM), magnetic disks or optical disks, etc., which can store program codes Medium.

Landscapes

  • Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Computer Graphics (AREA)
  • Geometry (AREA)
  • Software Systems (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

一种目标场景三维重建方法、系统及无人机,所述方法包括:通过获取目标场景的图像序列,所述图像序列包含在时序上连续的多个图像帧(S201),根据时序上连续的多个图像帧获得目标帧和参考帧,并基于参考帧获得目标帧的深度图(S202),融合所述目标帧的深度图,获得目标场景的三维模型(S203)。所述方法实现了无人机航拍场景下,基于单目视觉的目标场景的三维重建。该方法既无需依赖价格高昂的双目视觉系统,也不受深度传感器的深度限制,能够满足无人机航拍场景下,对目标场景的三维重建需求。

Description

目标场景三维重建方法、系统及无人机 技术领域
本发明实施例涉及无人机技术领域,尤其涉及一种目标场景三维重建方法、系统及无人机。
背景技术
随着图像处理技术的不断发展,利用图像序列进行拍摄场景的三维重建已经成为计算机视觉领域和摄影测量学领域的热点问题。基于图像序列的三维重建通常可以包括:基于彩色图像和深度图像(Red Green Blue-Depth Map,RGB-D)数据的三维重建、基于双目的三维重建和基于单目的三维重建。基于RGB-D数据的三维重建,受深度传感器的深度限制,通常只能用于室内比较局限的场景。基于双目的三维重建依赖于双目视觉系统,硬件成本高。因此,基于单目的三维重建对于拍摄场景三维模型的重建,具有重要意义。
基于单目的三维重建,是指采用单个摄像头,通过移动摄像头,根据物体在不同图像上的像移,确定深度图,然后融合深度图实现三维重建。由于无人机航拍的特殊性,现有基于单目的三维重建方法在无人机航拍场景下,深度图计算结果差,三维重建误差大。综上所述,亟需一种能够满足无人机航拍场景需求的目标场景三维重建方法。
发明内容
本发明实施例提供一种目标场景三维重建方法、系统及无人机,用以解决现有方法无法满足无人机航拍场景下目标场景三维重建的需求。
第一方面,本发明实施例提供一种目标场景三维重建方法,包括:
获取目标场景的图像序列,所述图像序列包含在时序上连续的多个图像帧;
根据所述时序上连续的多个图像帧获得目标帧和参考帧,并基于所述参考帧获得所述目标帧的深度图;
融合所述目标帧的深度图,获得目标场景的三维模型。
第二方面,本发明实施例提供一种目标场景三维重建系统,包括:处理器和存储器;
所述存储器,用于存储程序代码;
所述处理器,调用所述程序代码,当程序代码被执行时,用于执行以下操作:
获取目标场景的图像序列,所述图像序列包含在时序上连续的多个图像帧;
根据所述时序上连续的多个图像帧获得目标帧和参考帧,并基于所述参考帧获得所述目标帧的深度图;
融合所述目标帧的深度图,获得目标场景的三维模型。
第三方面,本发明实施例提供一种无人机,包括:处理器;
所述无人机上搭载有拍摄装置,所述拍摄装置用于对目标场景进行拍摄;
所述处理器用于,
获取目标场景的图像序列,所述图像序列包含在时序上连续的多个图像帧;
根据所述时序上连续的多个图像帧获得目标帧和参考帧,并基于所述参考帧获得所述目标帧的深度图;
融合所述目标帧的深度图,获得目标场景的三维模型。
第四方面,本发明实施例提供一种目标场景三维重建装置(例如芯片、集成电路等),包括:存储器和处理器。所述存储器,用于存储执行目标场景三维重建方法的代码。所述处理器,用于调用所述存储器中存储的所述代码,执行如第一方面本发明实施例所述的目标场景三维重建方法。
第五方面,本发明实施例提供一种计算机可读存储介质,所述计算机可读存储介质存储有计算机程序,所述计算机程序包含至少一段代码,所述至少一段代码可由计算机执行,以控制所述计算机执行第一方面本发明实施例所述的目标场景三维重建方法。
第六方面,本发明实施例提供一种计算机程序,当所述计算机程序被计算机执行时,用于实现第一方面本发明实施例所述的目标场景三维重建方法。
本发明实施例提供的目标场景三维重建方法、系统及无人机,通过获取目标场景的图像序列,所述图像序列包含在时序上连续的多个图像帧,根据 时序上连续的多个图像帧获得目标帧和参考帧,并基于参考帧获得目标帧的深度图,融合所述目标帧的深度图,获得目标场景的三维模型。实现了无人机航拍场景下,基于单目视觉的目标场景的三维重建。本实施例提供的目标场景三维重建方法,既无需依赖价格高昂的双目视觉系统,也不受深度传感器的深度限制,能够满足无人机航拍场景下,对目标场景的三维重建需求。
附图说明
为了更清楚地说明本发明实施例或现有技术中的技术方案,下面将对实施例或现有技术描述中所需要使用的附图作一简单地介绍,显而易见地,下面描述中的附图是本发明的一些实施例,对于本领域普通技术人员来讲,在不付出创造性劳动的前提下,还可以根据这些附图获得其他的附图。
图1是本发明实施例提供的无人飞行系统的示意性架构图;
图2为本发明提供的目标场景三维重建方法一实施例的流程图;
图3为本发明提供的目标场景三维重建方法一实施例中参考帧选取的示意图;
图4为本发明提供的目标场景三维重建方法一实施例的示意性框图;
图5为本发明提供的目标场景三维重建系统一实施例的结构示意图;
图6为本发明提供的无人机一实施例的结构示意图。
具体实施方式
为使本发明实施例的目的、技术方案和优点更加清楚,下面将结合本发明实施例中的附图,对本发明实施例中的技术方案进行清楚、完整地描述,显然,所描述的实施例是本发明一部分实施例,而不是全部的实施例。基于本发明中的实施例,本领域普通技术人员在没有作出创造性劳动前提下所获得的所有其他实施例,都属于本发明保护的范围。
需要说明的是,当组件被称为“固定于”另一个组件,它可以直接在另一个组件上或者也可以存在居中的组件。当一个组件被认为是“连接”另一个组件,它可以是直接连接到另一个组件或者可能同时存在居中组件。
除非另有定义,本文所使用的所有的技术和科学术语与属于本发明的技 术领域的技术人员通常理解的含义相同。本文中在本发明的说明书中所使用的术语只是为了描述具体的实施例的目的,不是旨在于限制本发明。本文所使用的术语“及/或”包括一个或多个相关的所列项目的任意的和所有的组合。
下面结合附图,对本发明的一些实施方式作详细说明。在不冲突的情况下,下述的实施例及实施例中的特征可以相互组合。
本发明的实施例提供了目标场景三维重建方法、系统及无人机。其中无人机例如可以是旋翼飞行器(rotorcraft),例如,由多个推动装置通过空气推动的多旋翼飞行器,本发明的实施例并不限于此。
图1是本发明实施例提供的无人飞行系统的示意性架构图。本实施例以旋翼无人机为例进行说明。
无人飞行系统100可以包括无人机110、显示设备130和控制终端140。其中,无人机110可以包括动力系统150、飞行控制系统160、机架和承载在机架上的云台120。无人机110可以与控制终端140和显示设备130进行无线通信。
机架可以包括机身和脚架(也称为起落架)。机身可以包括中心架以及与中心架连接的一个或多个机臂,一个或多个机臂呈辐射状从中心架延伸出。脚架与机身连接,用于在无人机110着陆时起支撑作用。
动力系统150可以包括一个或多个电子调速器(简称为电调)151、一个或多个螺旋桨153以及与一个或多个螺旋桨153相对应的一个或多个电机152,其中电机152连接在电子调速器151与螺旋桨153之间,电机152和螺旋桨153设置在无人机110的机臂上;电子调速器151用于接收飞行控制系统160产生的驱动信号,并根据驱动信号提供驱动电流给电机152,以控制电机152的转速。电机152用于驱动螺旋桨旋转,从而为无人机110的飞行提供动力,该动力使得无人机110能够实现一个或多个自由度的运动。在某些实施例中,无人机110可以围绕一个或多个旋转轴旋转。例如,上述旋转轴可以包括横滚轴(Roll)、偏航轴(Yaw)和俯仰轴(pitch)。应理解,电机152可以是直流电机,也可以交流电机。另外,电机152可以是无刷电机,也可以是有刷电机。
飞行控制系统160可以包括飞行控制器161和传感系统162。传感系统 162用于测量无人机的姿态信息,即无人机110在空间的位置信息和状态信息,例如,三维位置、三维角度、三维速度、三维加速度和三维角速度等。传感系统162例如可以包括陀螺仪、超声传感器、电子罗盘、惯性测量单元(Inertial Measurement Unit,IMU)、视觉传感器、全球导航卫星系统和气压计等传感器中的至少一种。例如,全球导航卫星系统可以是全球定位系统(Global Positioning System,GPS)。飞行控制器161用于控制无人机110的飞行,例如,可以根据传感系统162测量的姿态信息控制无人机110的飞行。应理解,飞行控制器161可以按照预先编好的程序指令对无人机110进行控制,也可以通过响应来自控制终端140的一个或多个控制指令对无人机110进行控制。
云台120可以包括电机122。云台用于携带拍摄装置123。飞行控制器161可以通过电机122控制云台120的运动。可选地,作为另一实施例,云台120还可以包括控制器,用于通过控制电机122来控制云台120的运动。应理解,云台120可以独立于无人机110,也可以为无人机110的一部分。应理解,电机122可以是直流电机,也可以是交流电机。另外,电机122可以是无刷电机,也可以是有刷电机。还应理解,云台可以位于无人机的顶部,也可以位于无人机的底部。
拍摄装置123例如可以是照相机或摄像机等用于捕获图像的设备,拍摄装置123可以与飞行控制器通信,并在飞行控制器的控制下进行拍摄。本实施例的拍摄装置123至少包括感光元件,该感光元件例如为互补金属氧化物半导体(Complementary Metal Oxide Semiconductor,CMOS)传感器或电荷耦合元件(Charge-coupled Device,CCD)传感器。可以理解,拍摄装置123也可直接固定于无人机110上,从而云台120可以省略。
显示设备130位于无人飞行系统100的地面端,可以通过无线方式与无人机110进行通信,并且可以用于显示无人机110的姿态信息。另外,还可以在显示设备130上显示成像装置拍摄的图像。应理解,显示设备130可以是独立的设备,也可以集成在控制终端140中。
控制终端140位于无人飞行系统100的地面端,可以通过无线方式与无人机110进行通信,用于对无人机110进行远程操纵。
另外,无人机110还可以机载有扬声器(图中未示出),该扬声器用于 播放音频文件,扬声器可直接固定于无人机110上,也可搭载在云台120上。
本实施例中的拍摄装置123例如可以是单目相机,用于对目标场景进行拍摄,以获取目标场景的图像序列。下面实施例提供的目标场景三维重建方法例如可以由飞行控制器161执行,飞行控制器161通过拍摄装置123获取目标成精的图像序列,实现对目标场景的三维重建,可以用于无人机飞行避障;目标场景三维重建方法例如还可以由位于地面端的控制终端140执行,无人机通过图传技术将拍摄装置123获取的目标场景的图像序列传输至控制终端140,由控制终端140完成对目标场景的三维重建;目标场景三维重建方法例如还可以由位于云端的云服务器(图中未示出)执行,无人机通过图传技术将拍摄装置123获取的目标场景的图像序列传输至云服务器,由云服务器完成对目标场景的三维重建。
应理解,上述对于无人飞行系统各组成部分的命名仅是出于标识的目的,并不应理解为对本发明的实施例的限制。
图2为本发明提供的目标场景三维重建方法一实施例的流程图。如图2所示,本实施例提供的方法可以包括:
S201、获取目标场景的图像序列,所述图像序列包含在时序上连续的多个图像帧。
本实施例中例如可以采用搭载有单目拍摄装置的无人机,对目标场景进行拍摄,以获取目标场景的图像序列。
其中,目标场景为需要进行三维重建的对象。本实施例中在目标场景确定后,可以为无人机规划飞行航线,设置飞行速度和拍摄帧率,以获取目标场景的图像序列,或者,也可以对拍摄地点进行设置,当无人机飞行至预设拍摄地点时,进行拍摄。
本实施例中获取到的目标场景的图像序列,包含在时序上连续的多个图像帧。
S202、根据时序上连续的多个图像帧获得目标帧和参考帧,并基于参考帧获得目标帧的深度图。
本实施例中在获取到目标场景的图像序列之后,为了实现对目标场景的三维重建,则需要根据时序上连续的多个图像帧确定目标帧和参考帧。其中,目标帧是为了实现三维重建需要进行深度恢复的图像帧,参考帧是为目标帧 提供景深等数据、与目标帧存在时域关联和像素关联的图像采集帧。
可选的,本实施例中的目标帧可以包含在时序上连续的多个图像帧中的一帧。
可选的,本实施例中的参考帧可以包含与所述目标帧存在重叠像素的帧。
本实施例中例如可以通过对获取到的时序上连续的多个图像帧进行特征提取、特征点匹配、位姿估计等,以确定目标帧和参考帧。为了提高准确性,通常可以选用具有旋转不变性的特征,例如尺度不变特征变换特征(Scale-Invariant Feature Transform,SIFT)、加速稳健特征(Speed Up Robust Features,SURF)等。本实施例中各图像帧在拍摄时的位姿估计可以通过无人机上搭载的传感器,例如里程计、陀螺仪、IMU等获得。
本实施例中在确定了目标帧和参考帧之后,则可以根据目标帧与参考帧之间的特征点匹配以及对极几何的知识,基于参考帧图像数据,获得目标帧对应的深度图。
S203、融合所述目标帧的深度图,获得目标场景的三维模型。
本实施例中在获得了目标帧的深度图之后,则根据目标帧中各个像素点对应的深度值以及位置信息,将其转换为对应的三维点云。根据三维点云对目标场景进行三维重建。
本实施例提供的目标场景三维重建方法,通过获取目标场景的图像序列,所述图像序列包含在时序上连续的多个图像帧,根据时序上连续的多个图像帧获得目标帧和参考帧,并基于参考帧获得目标帧的深度图,融合所述目标帧的深度图,获得目标场景的三维模型。实现了无人机航拍场景下,基于单目视觉的目标场景的三维重建。本实施例提供的目标场景三维重建方法,既无需依赖价格高昂的双目视觉系统,也不受深度传感器的深度限制,能够满足无人机航拍场景下,对目标场景的三维重建需求。当然,本说明书实施例使用单目摄像装置,并不意味着本说明书方法不适用双目摄像装置,实际上双目摄像装置或者多目摄像装置同样适用于本说明书所记载的方案。
在上一实施例的基础上,为了获得更加精准的目标帧的深度图,以提高目标场景三维重建的准确度,本实施例提供的目标场景三维重建方法中,参考帧至少可以包括第一图像帧和第二图像帧。其中,第一图像帧在时序上位 于所述目标帧之前,第二图像帧在时序上位于所述目标帧之后。
无人机航拍时,可以沿着规划的航线飞行。当无人机沿着一条航线飞行时,当前图像帧中存在相当大的一部分区域不存在于之前拍摄的图像帧中。也就是说,若参考帧中仅包括当前图像帧之前拍摄的图像帧,根据参考帧确定当前图像帧的深度图时,会存在相当大的一部分区域的视差无解,深度图中必然会存在大片的无效区域。
因此,为了避免目标帧中的区域在参考帧中无相应的匹配区域,而导致该区域对应的深度图无效,本实施例中的参考帧既包括在时序上位于参考帧之前的第一图像帧,也包括在时序上位于参考帧之后的第二图像帧,提高了目标帧与参考帧之间的重叠率,减小了视差无解的区域,进而提高了基于参考帧获得的目标帧的深度图的准确性。
可选的,若目标帧为第N帧,则第一图像帧为第N-1帧,第二图像帧为第N+1帧,即参考帧包括与目标帧相邻的前后两帧。举例来说,若无人机在航拍时,相邻两帧之间的重叠率为70%,若参考帧仅包括目标帧之前的图像帧,则目标帧中至少有30%区域的视差无解。而本实施例提供的参考帧的选取策略,使得目标帧中的全部区域均可以在参考帧中找到与之相匹配的区域,避免了视差无解现象的产生,提高了目标帧的深度图的准确性。
可选的,若目标帧为第N帧,则第一图像帧可以包括第N帧之前预设数量的图像帧,第二图像帧可以包括第N帧之后预设数量的图像帧。
可选的,若目标帧为第N帧,则第一图像帧可以为第N帧之前预设数量的图像帧中的一帧,第二图像帧可以为第N帧之后预设数量的图像帧中的一帧。
在上述任一实施例的基础上,为了提高目标帧的深度图的可靠性,以提高目标场景三维重建的可靠性,本实施例提供的目标场景三维重建方法中,参考帧至少可以包括第三图像帧。其中,第三图像帧与目标帧的极线方向不平行。
本实施例中的极线为对极几何中的极线,即极平面与图像之间的交线。第三图像帧与目标帧的极线方向不平行,也就是说,极平面与第三图像帧的第一交线,与该极平面与目标帧的第二交线,不平行。
当目标帧中存在重复纹理时,若目标帧与参考帧的极线方向平行,则会 出现沿着平行极线分布的重复纹理,将会降低该区域对应的深度图的可靠性。因此,本实施例通过选取与目标帧的极线方向不平行的第三图像帧作为参考帧,避免了出现重复纹理沿着平行极线分布的现象,提高了深度图的可靠性。
可选的,第三图像帧可以包括目标帧相邻航带中与目标帧存在重叠像素的图像帧。
可选的,第三图像帧可以为目标帧相邻航带中与目标帧的重叠率最高的图像帧。
下面通过一个具体的示例来说明本发明实施例提供的参考帧的选取方法。图3为本发明提供的目标场景三维重建方法一实施例中参考帧选取的示意图。如图3所示,其中的实线用于表示无人机的飞行航线,航线覆盖了目标场景,箭头表示无人机的飞行方向,飞行航线上的黑色圆圈和黑色正方形表示无人机的拍摄装置在该位置进行拍摄,即黑色圆圈和黑色正方形对应目标场景的一个图像帧。当无人机沿着飞行航线飞行时,通过无人机上搭载的拍摄装置,如单目相机,便可以获取到目标场景的图像序列,包含了在时序上连续的多个图像帧。图3中的M-1、M、M+1、N-1、N、N+1表示图像帧的帧号,N和M为自然数,本实施例对N和M的具体取值不做限制。
若黑色正方形表示的第N帧为目标帧,在一种可能的实现方式中,参考帧可以包括图中所示的第N-1帧和第N+1帧。
若黑色正方形表示的第N帧为目标帧,在又一种可能的实现方式中,参考帧可以包括图中所示的第M帧。
若黑色正方形表示的第N帧为目标帧,在另一种可能的实现方式中,参考帧可以包括图中所示的第M帧、第N-1帧和第N+1帧,即图3中虚线圆圈中包括的图像帧。
可以理解的是,参考帧还可以包括更多的图像帧,例如还可以包括第M-1帧、第M+1帧、第N-2帧等。在具体实现时,可以综合考虑目标帧与参考帧的重叠率以及计算速度,进行选取。
在一些实施例中,基于参考帧获得目标帧的深度图的一种实现方式可以是:根据所述目标帧和所述参考帧之间的像差,获得所述目标帧的深度图。
本实施例中可以根据同一对象在目标帧和参考帧中的像差,获得目标帧 的深度图。
在一些实施例中,基于参考帧获得目标帧的深度图的一种实现方式可以是:根据参考帧,确定目标帧对应的匹配代价;根据目标帧对应的匹配代价,确定目标帧的深度图。
本实施例中可以通过对参考帧与目标帧中的像素点进行匹配,以确定目标帧对应的匹配代价。在确定了目标帧对应的匹配代价之后,可以进行匹配代价聚合,然后确定视差,根据视差与深度之间的对应关系,确定目标帧的深度图。可选的,在确定视差之后,还可以进行视差优化,视差加强。根据优化以及加强之后的视差,确定目标帧的深度图。
The flight altitude of a UAV is usually around 100 meters, and the UAV usually shoots vertically downward. Because the ground terrain undulates and reflects sunlight differently, the images captured by the UAV exhibit illumination changes that cannot be ignored, and such illumination changes reduce the accuracy of the three-dimensional reconstruction of the target scene.
On the basis of any of the above embodiments, in order to improve the robustness of the three-dimensional reconstruction of the target scene against illumination, in the method provided by this embodiment, determining the matching cost corresponding to the target frame according to the reference frame may include: determining, according to the target frame and the reference frame, a first-type matching cost and a second-type matching cost corresponding to the target frame; and determining that the matching cost corresponding to the target frame is equal to a weighted sum of the first-type matching cost and the second-type matching cost.
In this embodiment, when the matching cost is computed, the first-type matching cost and the second-type matching cost are fused. Compared with using a single type of matching cost, this improves the robustness of the matching cost against illumination, thereby reducing the influence of illumination changes on the three-dimensional reconstruction and improving its accuracy. The weighting coefficients of the first-type matching cost and the second-type matching cost in this embodiment can be set according to specific needs, which is not limited by this embodiment.
Optionally, the first-type matching cost may be determined based on Zero-mean Normalized Cross-Correlation (ZNCC). ZNCC can accurately measure the similarity between the target frame and the reference frame.
Optionally, the second-type matching cost may be determined based on illumination-invariant features. In this embodiment, illumination-invariant features, such as Local Binary Patterns (LBP) and census sequences, may be extracted from the image frames collected by the UAV, and the second-type matching cost may then be determined based on the illumination-invariant features.
The census sequence in this embodiment may be determined as follows: select any point in the image frame and draw a rectangle, for example 3×3, centered on that point; compare each point in the rectangle other than the center point with the center point, recording 1 if its gray value is less than that of the center point and 0 if its gray value is greater than that of the center point; the resulting sequence of 0s and 1s of length 8 is taken as the census sequence of the center point, i.e., the gray value of the center pixel is replaced by the census sequence.
After the census transform, the Hamming distance may be used to determine the second-type matching cost between the target frame and the reference frame.
For example, the matching cost corresponding to the target frame may be equal to a weighted sum of the ZNCC matching cost and the census matching cost.
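A minimal sketch of such a fused matching cost for one pair of image patches is given below; the 3×3 census window, the patch-wise ZNCC, and the weight alpha are assumptions chosen for the example rather than values prescribed by this embodiment.

import numpy as np

def census(patch):
    # 8-bit census sequence of a 3x3 patch: 1 where a neighbour is darker than the center.
    bits = (patch < patch[1, 1]).astype(np.uint8).flatten()
    return np.delete(bits, 4)  # drop the center position itself

def zncc(a, b, eps=1e-6):
    a = a - a.mean()
    b = b - b.mean()
    return float((a * b).sum() / (np.sqrt((a * a).sum() * (b * b).sum()) + eps))

def fused_matching_cost(patch_t, patch_r, alpha=0.5):
    # First-type cost from ZNCC (1 - similarity); second-type cost from the
    # Hamming distance between census sequences; the result is their weighted sum.
    cost_zncc = 1.0 - zncc(patch_t.astype(np.float32), patch_r.astype(np.float32))
    cost_census = int(np.count_nonzero(census(patch_t) != census(patch_r)))
    return alpha * cost_zncc + (1.0 - alpha) * cost_census / 8.0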
In some embodiments, one implementation of determining the matching cost corresponding to the target frame according to the reference frame may be: dividing the target frame into a plurality of image blocks; determining, according to the reference frame, the matching cost corresponding to each image block; and determining the matching cost corresponding to the target frame according to the matching cost corresponding to each image block.
In this embodiment, one or more of the following methods may be used to divide the target frame into a plurality of image blocks:
(1) Dividing the target frame into a plurality of image blocks by clustering. In this embodiment, the target frame may be divided into a plurality of image blocks by clustering, for example according to the color information and/or texture information of the target frame.
(2) Dividing the target frame evenly into a plurality of image blocks. In this embodiment, for example, the number of image blocks may be preset, and the target frame is then divided according to the preset number of image blocks.
(3) Dividing the target frame into a plurality of image blocks of a preset size. For example, the size of the image blocks may be preset, and the target frame is then divided according to the preset size.
Optionally, after the target frame is divided into a plurality of image blocks, the matching cost corresponding to each image block may be determined in parallel according to the reference frame. In this embodiment, the matching cost corresponding to each image block may be determined in parallel by software and/or hardware. Specifically, for example, multiple threads may be used to determine the matching cost corresponding to each image block in parallel, and/or a Graphics Processing Unit (GPU) may be used to determine the matching cost corresponding to each image block in parallel.
On the basis of the above embodiments, the three-dimensional reconstruction method for a target scene provided by this embodiment divides the target frame into a plurality of image blocks, determines the matching cost corresponding to each image block in parallel according to the reference frame, and then determines the matching cost corresponding to the target frame according to the matching cost corresponding to each image block. This increases the computation speed of the matching cost and thereby the real-time performance of the three-dimensional reconstruction of the target scene.
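As a sketch of the multi-threaded variant only, the following farms the per-block cost computation out to a thread pool; block_cost stands for any per-block matching-cost routine (for example the fused cost sketched above) and is an assumed placeholder.

from concurrent.futures import ThreadPoolExecutor

def parallel_block_costs(blocks, reference, block_cost, workers=4):
    # Compute the matching cost of every image block in parallel; the matching cost of
    # the target frame is then assembled from the per-block results.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(block_cost, block, reference) for block in blocks]
        return [f.result() for f in futures]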
The number of depth samples may be determined according to the depth range and the precision: it is positively correlated with the depth range and negatively correlated with the precision. For example, if the depth range is 50 meters and the required precision is 0.1 meters, the number of depth samples may be 500.
When determining the matching cost of the target frame, a preset number of depth samples may be used; alternatively, Simultaneous Localization and Mapping (SLAM) may be used to recover some sparse three-dimensional points in the target frame, the depth range of the entire target frame is determined according to these sparse three-dimensional points, and the number of depth samples is then determined according to the depth range of the entire target frame and the precision requirement. If the number of depth samples is N, the matching cost needs to be computed N times for every pixel of the target frame; for a target frame of 640*480 pixels, 640*480*N matching cost computations are required.
On the basis of any of the above embodiments, in order to further increase the processing speed and improve the real-time performance of the three-dimensional reconstruction of the target scene, in the method provided by this embodiment, determining the matching cost corresponding to each image block according to the reference frame may include: determining the number of depth samples of each image block according to the sparse points in that image block; and determining the matching cost corresponding to each image block according to the reference frame and the number of depth samples of each image block.
It should be noted that when the UAV shoots vertically downward, the target frame may contain various kinds of objects, such as pedestrians, cars, trees, and tall buildings, so the depth range of the entire target frame is relatively large and, under a preset precision requirement, the number of depth samples is large. However, the depth range corresponding to each individual image block of the target frame is relatively small; for example, when an image block contains only pedestrians, its depth range is far smaller than that of the entire target frame, and under the same precision requirement the number of depth samples can be greatly reduced. In other words, under the same precision requirement, the number of depth samples of an image block of the target frame is necessarily less than or equal to that of the target frame as a whole.
This embodiment fully considers the depth range of each image block and sets the number of depth samples according to the depth range of each image block, which reduces the computational complexity and increases the speed while guaranteeing the precision.
In this embodiment, for each image block, SLAM may be used to recover some sparse three-dimensional points in the image block, the depth range of the image block is determined according to these sparse three-dimensional points, and the number of depth samples of the image block is determined according to its depth range and the precision requirement. The matching cost corresponding to each image block is then determined with the determined number of depth samples.
A specific numerical analysis is given below to illustrate how the method provided by this embodiment reduces the computational complexity and increases the processing speed:
If the target frame is an image frame of 640*480 pixels and the number of depth samples determined from the depth range of the whole target frame is 500, then 640*480*500 matching cost computations are required. If the target frame is evenly divided into image blocks of 320*160 pixels and the numbers of depth samples of the 6 image blocks determined from their respective depth ranges are 100, 200, 150, 100, 150 and 300, then only 320*160*(100+200+150+100+150+300) matching cost computations are required, i.e., only one third of the original amount of computation.
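A sketch of this per-block sampling strategy is given below; the relative margin applied around the sparse depths and the helper name block_depth_samples are assumptions introduced for the illustration.

import math

def block_depth_samples(sparse_depths, precision, margin=0.1):
    # Depth range of one image block, taken from the sparse SLAM points it contains
    # and slightly enlarged by a relative margin to stay on the safe side.
    d_min, d_max = min(sparse_depths), max(sparse_depths)
    span = (d_max - d_min) * (1.0 + margin)
    # The number of depth samples grows with the range and shrinks with coarser precision.
    return max(1, math.ceil(span / precision))

# Example: a block whose sparse depths span roughly 95 m to 105 m at 0.1 m precision
# needs on the order of 110 samples instead of the 500 needed for the whole frame.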
Optionally, after the matching cost corresponding to the target frame is determined, the depth map of the target frame may be determined according to the Semi-Global Matching (SGM) algorithm.
It can be understood that, because the normal vectors in the target scene deviate considerably and depth sampling is itself discrete, and further owing to factors such as weak textures and repeated textures, the depth map of the target frame inevitably contains a large amount of randomly distributed noise.
On the basis of any of the above embodiments, in order to prevent the noise in the depth map from reducing the accuracy of the three-dimensional reconstruction, the method provided by this embodiment may further include, after the depth map of the target frame is obtained based on the reference frame: filtering the depth map of the target frame. By filtering the depth map of the target frame, the noise in the depth map can be removed and the accuracy of the three-dimensional reconstruction improved.
Optionally, one implementation of filtering the depth map of the target frame may be: performing trilateral filtering on the depth map of the target frame. Trilateral filtering in this embodiment means that the weighting coefficients in the filtering process may be comprehensively determined according to three factors: pixel distance, depth difference, and color difference.
For example, in one filtering process the size of the filter template is 5*5, that is, the depth value of the target pixel after filtering may be determined by the depth values of that pixel and the surrounding 24 pixels. The weight of the influence of each pixel on the depth value of the target pixel is determined according to the Euclidean distance between that pixel and the target pixel, the difference between the depth value of that pixel and that of the target pixel, and the difference between the RGB value of that pixel and that of the target pixel.
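A minimal sketch of such a trilateral filter is given below; the Gaussian form of the three weighting terms and the sigma values are assumptions made for the example and are not parameters specified by this embodiment.

import numpy as np

def trilateral_filter(depth, rgb, radius=2, sigma_s=2.0, sigma_d=0.5, sigma_c=10.0):
    # Weighted average over a (2*radius+1)^2 window; the weight of each neighbour is the
    # product of a pixel-distance term, a depth-difference term and a color-difference term.
    h, w = depth.shape
    out = depth.copy()
    for y in range(radius, h - radius):
        for x in range(radius, w - radius):
            d0, c0 = depth[y, x], rgb[y, x].astype(np.float32)
            wsum, vsum = 0.0, 0.0
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    d, c = depth[y + dy, x + dx], rgb[y + dy, x + dx].astype(np.float32)
                    w_s = np.exp(-(dx * dx + dy * dy) / (2 * sigma_s ** 2))
                    w_d = np.exp(-((d - d0) ** 2) / (2 * sigma_d ** 2))
                    w_c = np.exp(-np.sum((c - c0) ** 2) / (2 * sigma_c ** 2))
                    weight = w_s * w_d * w_c
                    wsum += weight
                    vsum += weight * d
            out[y, x] = vsum / wsum
    return out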
On the basis of the above embodiments, the three-dimensional reconstruction method for a target scene provided by this embodiment further performs trilateral filtering on the depth map of the target frame. The sharp, fine edge information of the target frame improves the precision of the edges of the depth map, and noise is removed more robustly while the edges are preserved, so that the depth map of the target frame is more accurate and the three-dimensional reconstruction based on this depth map is also more accurate.
In some embodiments, one implementation of fusing the depth map of the target frame to obtain a three-dimensional model of the target scene may be: determining, according to the depth map of the target frame, the point cloud corresponding to the target frame; fusing the point cloud corresponding to the target frame into the voxels corresponding to the target scene; and obtaining the three-dimensional model of the target scene according to the voxels corresponding to the target scene.
In this embodiment, the point cloud corresponding to the target frame may be determined from its depth map by using the correspondence between depth maps and three-dimensional point clouds in the prior art, which is not limited by this embodiment.
This embodiment adopts a voxel-based point cloud fusion method. Because the flight route is planned before the UAV takes off and the UAV shoots vertically downward, the coverage of the planned route can be represented by voxels of a preset size. After each frame's depth map is converted into a point cloud, the points can be located to the corresponding voxels according to their three-dimensional coordinates; the normal vectors of the points are fused into the normal vector of the voxel, the coordinates of the points are fused into the coordinates of the voxel, and the visibility information is stored in the voxel.
The three-dimensional reconstruction method for a target scene provided by this embodiment has high real-time performance and high scalability. The fusion of depth maps is almost pixel-wise parallel, and the computational complexity from depth values to point cloud and from point cloud to voxels is O(1), so the fusion is highly real-time. For a single planning task, owing to the particularity of route planning, the target region can be partitioned into multiple sub-blocks, so that the point cloud also has good partitioning, which facilitates point cloud loading and subsequent Levels of Detail (LOD) display and makes real-time three-dimensional reconstruction of large scenes convenient.
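The following sketch illustrates such a voxel accumulation; the dictionary-based voxel grid and the running-average fusion of coordinates and normals are assumptions chosen to keep the example short, not the exact data structure of this embodiment.

import numpy as np
from collections import defaultdict

class VoxelGrid:
    def __init__(self, voxel_size):
        self.voxel_size = voxel_size
        # Per-voxel accumulators: summed position, summed normal, point count.
        self.acc = defaultdict(lambda: [np.zeros(3), np.zeros(3), 0])

    def fuse(self, points, normals):
        # Locate each point in its voxel in O(1) and accumulate it there.
        for p, n in zip(points, normals):
            key = tuple(np.floor(p / self.voxel_size).astype(int))
            entry = self.acc[key]
            entry[0] += p
            entry[1] += n
            entry[2] += 1

    def export(self):
        # One fused point (averaged coordinate, normalized fused normal) per occupied voxel.
        return [(s / c, n / max(np.linalg.norm(n), 1e-9)) for s, n, c in self.acc.values()]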
FIG. 4 is a schematic block diagram of an embodiment of the three-dimensional reconstruction method for a target scene provided by the present invention. As shown in FIG. 4, the method provided by this embodiment may be implemented by two threads, namely a densification thread and a fusion thread. The densification thread includes steps such as initialization, frame selection, depth map computation, and filtering. For example, a new key frame and its position and orientation may be obtained through the photographing device carried on the UAV, and the initialization process is completed based on the obtained new key frame and its position and orientation. After initialization is completed, frame selection is performed; frame selection in this embodiment includes determining the target frame and the reference frame, and the specific implementation may refer to the method for determining the target frame and the reference frame in any of the above embodiments, which is not repeated here. The depth map computation in this embodiment may, for example, adopt a method that combines the plane sweeping algorithm (Plane Sweeping) with semi-global optimization (Semi-Global Matching, SGM), or it may refer to the implementation of determining the depth map according to the matching cost in any of the above embodiments, which is not repeated here. The filtering in this embodiment may, for example, adopt trilateral filtering, and the specific implementation may refer to the above embodiments, which is not repeated here. After the processing of the densification thread is completed, the fusion thread in this embodiment completes the fusion of depth maps according to the RGB images, the positions and orientations, and the depth map queue, and creates a three-dimensional point cloud according to the fused depth maps. The densification thread and the fusion thread in this embodiment may be executed in parallel to increase the speed of the three-dimensional reconstruction of the target scene and improve its real-time performance.
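A sketch of how the two threads might be decoupled through a queue is given below; densify and fuse stand for the densification and fusion steps described above and are assumed placeholders rather than functions defined in this disclosure.

import queue
import threading

def run_pipeline(keyframes, densify, fuse):
    depth_queue = queue.Queue()

    def densification_thread():
        # Frame selection, depth map computation and filtering for each key frame.
        for kf in keyframes:
            depth_queue.put(densify(kf))
        depth_queue.put(None)  # sentinel: no more depth maps

    def fusion_thread():
        # Consume depth maps and fuse them into the voxel-based point cloud.
        while True:
            item = depth_queue.get()
            if item is None:
                break
            fuse(item)

    t1 = threading.Thread(target=densification_thread)
    t2 = threading.Thread(target=fusion_thread)
    t1.start()
    t2.start()
    t1.join()
    t2.join()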
FIG. 5 is a schematic structural diagram of an embodiment of the three-dimensional reconstruction system for a target scene provided by the present invention. As shown in FIG. 5, the three-dimensional reconstruction system 500 for a target scene provided by this embodiment may include a processor 501 and a memory 502, which are communicatively connected through a bus. The processor 501 may be a Central Processing Unit (CPU), or another general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc. The general-purpose processor may be a microprocessor, or the processor may be any conventional processor. The memory 502 may be, but is not limited to, a Random Access Memory (RAM), a Read Only Memory (ROM), a Programmable Read-Only Memory (PROM), an Erasable Programmable Read-Only Memory (EPROM), an Electrically Erasable Programmable Read-Only Memory (EEPROM), etc.
The memory 502 is configured to store program code; the processor 501 calls the program code and, when the program code is executed, performs the following operations:
obtaining an image sequence of a target scene, the image sequence including a plurality of image frames that are consecutive in time sequence;
obtaining a target frame and a reference frame according to the plurality of image frames that are consecutive in time sequence, and obtaining a depth map of the target frame based on the reference frame;
fusing the depth map of the target frame to obtain a three-dimensional model of the target scene.
Optionally, the target frame includes one of the plurality of image frames that are consecutive in time sequence.
Optionally, the reference frame includes a frame that has overlapping pixels with the target frame.
Optionally, the reference frame includes at least a first image frame and a second image frame; the first image frame precedes the target frame in time sequence; the second image frame follows the target frame in time sequence.
Optionally, if the target frame is the N-th frame, the first image frame is the (N-1)-th frame and the second image frame is the (N+1)-th frame.
Optionally, the reference frame includes at least a third image frame; the epipolar direction of the third image frame is not parallel to that of the target frame.
Optionally, the processor 501 being configured to obtain the depth map of the target frame based on the reference frame may specifically include:
obtaining the depth map of the target frame according to the disparity between the target frame and the reference frame.
Optionally, the processor 501 being configured to obtain the depth map of the target frame based on the reference frame may specifically include:
determining, according to the reference frame, the matching cost corresponding to the target frame;
determining the depth map of the target frame according to the matching cost corresponding to the target frame.
Optionally, the processor 501 being configured to determine, according to the reference frame, the matching cost corresponding to the target frame may specifically include:
determining, according to the target frame and the reference frame, a first-type matching cost and a second-type matching cost corresponding to the target frame;
determining that the matching cost corresponding to the target frame is equal to a weighted sum of the first-type matching cost and the second-type matching cost.
Optionally, the first-type matching cost is determined based on zero-mean normalized cross-correlation.
Optionally, the second-type matching cost is determined based on illumination-invariant features.
Optionally, the processor 501 being configured to determine, according to the reference frame, the matching cost corresponding to the target frame may specifically include:
dividing the target frame into a plurality of image blocks;
determining, according to the reference frame, the matching cost corresponding to each image block;
determining the matching cost corresponding to the target frame according to the matching cost corresponding to each image block.
Optionally, the processor 501 being configured to divide the target frame into a plurality of image blocks may specifically include:
dividing the target frame into a plurality of image blocks by clustering.
Optionally, the processor 501 being configured to divide the target frame into a plurality of image blocks may specifically include:
dividing the target frame evenly into a plurality of image blocks.
Optionally, the processor 501 being configured to determine, according to the reference frame, the matching cost corresponding to each image block may specifically include:
determining, according to the reference frame, the matching cost corresponding to each image block in parallel.
Optionally, the processor 501 being configured to determine, according to the reference frame, the matching cost corresponding to each image block may specifically include:
determining the number of depth samples of each image block according to the sparse points in that image block;
determining the matching cost corresponding to each image block according to the reference frame and the number of depth samples of each image block.
Optionally, the processor 501 is further configured to filter the depth map of the target frame after the depth map of the target frame is obtained based on the reference frame.
Optionally, the processor 501 being configured to filter the depth map of the target frame may specifically include:
performing trilateral filtering on the depth map of the target frame.
Optionally, the processor 501 being configured to fuse the depth map of the target frame to obtain the three-dimensional model of the target scene may specifically include:
determining, according to the depth map of the target frame, the point cloud corresponding to the target frame;
fusing the point cloud corresponding to the target frame into the voxels corresponding to the target scene;
obtaining the three-dimensional model of the target scene according to the voxels corresponding to the target scene.
FIG. 6 is a schematic structural diagram of an embodiment of the UAV provided by the present invention. As shown in FIG. 6, the UAV 600 provided by this embodiment may include a processor 601. The processor 601 may be a Central Processing Unit (CPU), or another general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc. The general-purpose processor may be a microprocessor, or the processor may be any conventional processor.
The UAV 600 carries a photographing device 602 configured to photograph the target scene.
The processor 601 is configured to obtain an image sequence of the target scene, the image sequence including a plurality of image frames that are consecutive in time sequence;
obtain a target frame and a reference frame according to the plurality of image frames that are consecutive in time sequence, and obtain a depth map of the target frame based on the reference frame;
fuse the depth map of the target frame to obtain a three-dimensional model of the target scene.
Optionally, the target frame includes one of the plurality of image frames that are consecutive in time sequence.
Optionally, the reference frame includes a frame that has overlapping pixels with the target frame.
Optionally, the reference frame includes at least a first image frame and a second image frame; the first image frame precedes the target frame in time sequence; the second image frame follows the target frame in time sequence.
Optionally, if the target frame is the N-th frame, the first image frame is the (N-1)-th frame and the second image frame is the (N+1)-th frame.
Optionally, the reference frame includes at least a third image frame; the epipolar direction of the third image frame is not parallel to that of the target frame.
Optionally, the processor 601 being configured to obtain the depth map of the target frame based on the reference frame may specifically include:
obtaining the depth map of the target frame according to the disparity between the target frame and the reference frame.
Optionally, the processor 601 being configured to obtain the depth map of the target frame based on the reference frame may specifically include:
determining, according to the reference frame, the matching cost corresponding to the target frame;
determining the depth map of the target frame according to the matching cost corresponding to the target frame.
Optionally, the processor 601 being configured to determine, according to the reference frame, the matching cost corresponding to the target frame may specifically include:
determining, according to the target frame and the reference frame, a first-type matching cost and a second-type matching cost corresponding to the target frame;
determining that the matching cost corresponding to the target frame is equal to a weighted sum of the first-type matching cost and the second-type matching cost.
Optionally, the first-type matching cost is determined based on zero-mean normalized cross-correlation.
Optionally, the second-type matching cost is determined based on illumination-invariant features.
Optionally, the processor 601 being configured to determine, according to the reference frame, the matching cost corresponding to the target frame may specifically include:
dividing the target frame into a plurality of image blocks;
determining, according to the reference frame, the matching cost corresponding to each image block;
determining the matching cost corresponding to the target frame according to the matching cost corresponding to each image block.
Optionally, the processor 601 being configured to divide the target frame into a plurality of image blocks may specifically include:
dividing the target frame into a plurality of image blocks by clustering.
Optionally, the processor 601 being configured to divide the target frame into a plurality of image blocks may specifically include:
dividing the target frame evenly into a plurality of image blocks.
Optionally, the processor 601 being configured to determine, according to the reference frame, the matching cost corresponding to each image block may specifically include:
determining, according to the reference frame, the matching cost corresponding to each image block in parallel.
Optionally, the processor 601 being configured to determine, according to the reference frame, the matching cost corresponding to each image block may specifically include:
determining the number of depth samples of each image block according to the sparse points in that image block;
determining the matching cost corresponding to each image block according to the reference frame and the number of depth samples of each image block.
Optionally, the processor 601 is further configured to filter the depth map of the target frame after the depth map of the target frame is obtained based on the reference frame.
Optionally, the processor 601 being configured to filter the depth map of the target frame may specifically include:
performing trilateral filtering on the depth map of the target frame.
Optionally, the processor 601 being configured to fuse the depth map of the target frame to obtain the three-dimensional model of the target scene may specifically include:
determining, according to the depth map of the target frame, the point cloud corresponding to the target frame;
fusing the point cloud corresponding to the target frame into the voxels corresponding to the target scene;
obtaining the three-dimensional model of the target scene according to the voxels corresponding to the target scene.
An embodiment of the present invention further provides a three-dimensional reconstruction apparatus for a target scene (for example, a chip, an integrated circuit, etc.), including a memory and a processor. The memory is configured to store code for executing the three-dimensional reconstruction method for a target scene. The processor is configured to call the code stored in the memory to execute the three-dimensional reconstruction method for a target scene described in any of the above method embodiments.
An embodiment of the present invention further provides a computer-readable storage medium storing a computer program, the computer program including at least one piece of code executable by a computer to control the computer to execute the three-dimensional reconstruction method for a target scene described in any of the above method embodiments.
An embodiment of the present invention provides a computer program which, when executed by a computer, is used to implement the three-dimensional reconstruction method for a target scene described in any of the above method embodiments.
Persons of ordinary skill in the art may understand that all or part of the steps of the above method embodiments may be implemented by hardware related to program instructions. The aforementioned program may be stored in a computer-readable storage medium, and when executed, performs the steps of the above method embodiments; the aforementioned storage medium includes various media that can store program code, such as a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disc.
Finally, it should be noted that the above embodiments are only used to illustrate the technical solutions of the present invention and not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, persons of ordinary skill in the art should understand that they may still modify the technical solutions described in the foregoing embodiments or make equivalent replacements to some or all of the technical features therein, and such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the present invention.

Claims (57)

  1. A three-dimensional reconstruction method for a target scene, comprising:
    obtaining an image sequence of the target scene, the image sequence comprising a plurality of image frames that are consecutive in time sequence;
    obtaining a target frame and a reference frame according to the plurality of image frames that are consecutive in time sequence, and obtaining a depth map of the target frame based on the reference frame; and
    fusing the depth map of the target frame to obtain a three-dimensional model of the target scene.
  2. The method according to claim 1, wherein the target frame comprises one of the plurality of image frames that are consecutive in time sequence.
  3. The method according to claim 2, wherein the reference frame comprises a frame that has overlapping pixels with the target frame.
  4. The method according to claim 1, wherein the reference frame comprises at least a first image frame and a second image frame;
    the first image frame precedes the target frame in time sequence; and
    the second image frame follows the target frame in time sequence.
  5. The method according to claim 4, wherein if the target frame is the N-th frame, the first image frame is the (N-1)-th frame and the second image frame is the (N+1)-th frame.
  6. The method according to claim 1, wherein the reference frame comprises at least a third image frame; and
    an epipolar direction of the third image frame is not parallel to an epipolar direction of the target frame.
  7. The method according to any one of claims 1-6, wherein the obtaining the depth map of the target frame based on the reference frame comprises:
    obtaining the depth map of the target frame according to a disparity between the target frame and the reference frame.
  8. The method according to claim 1, wherein the obtaining the depth map of the target frame based on the reference frame comprises:
    determining, according to the reference frame, a matching cost corresponding to the target frame; and
    determining the depth map of the target frame according to the matching cost corresponding to the target frame.
  9. The method according to claim 8, wherein the determining, according to the reference frame, the matching cost corresponding to the target frame comprises:
    determining, according to the target frame and the reference frame, a first-type matching cost and a second-type matching cost corresponding to the target frame; and
    determining that the matching cost corresponding to the target frame is equal to a weighted sum of the first-type matching cost and the second-type matching cost.
  10. The method according to claim 9, wherein the first-type matching cost is determined based on zero-mean normalized cross-correlation.
  11. The method according to claim 9, wherein the second-type matching cost is determined based on illumination-invariant features.
  12. The method according to claim 8, wherein the determining, according to the reference frame, the matching cost corresponding to the target frame comprises:
    dividing the target frame into a plurality of image blocks;
    determining, according to the reference frame, a matching cost corresponding to each image block; and
    determining the matching cost corresponding to the target frame according to the matching cost corresponding to each image block.
  13. The method according to claim 12, wherein the dividing the target frame into a plurality of image blocks comprises:
    dividing the target frame into the plurality of image blocks by clustering.
  14. The method according to claim 12, wherein the dividing the target frame into a plurality of image blocks comprises:
    dividing the target frame evenly into the plurality of image blocks.
  15. The method according to claim 12, wherein the determining, according to the reference frame, the matching cost corresponding to each image block comprises:
    determining, according to the reference frame, the matching cost corresponding to each image block in parallel.
  16. The method according to claim 12, wherein the determining, according to the reference frame, the matching cost corresponding to each image block comprises:
    determining a number of depth samples of each image block according to sparse points in the image block; and
    determining the matching cost corresponding to each image block according to the reference frame and the number of depth samples of each image block.
  17. The method according to claim 1, further comprising, after the obtaining the depth map of the target frame based on the reference frame:
    filtering the depth map of the target frame.
  18. The method according to claim 17, wherein the filtering the depth map of the target frame comprises:
    performing trilateral filtering on the depth map of the target frame.
  19. The method according to claim 1, wherein the fusing the depth map of the target frame to obtain the three-dimensional model of the target scene comprises:
    determining, according to the depth map of the target frame, a point cloud corresponding to the target frame;
    fusing the point cloud corresponding to the target frame into voxels corresponding to the target scene; and
    obtaining the three-dimensional model of the target scene according to the voxels corresponding to the target scene.
  20. A three-dimensional reconstruction system for a target scene, comprising: a processor and a memory;
    wherein the memory is configured to store program code; and
    the processor calls the program code and, when the program code is executed, is configured to perform the following operations:
    obtaining an image sequence of the target scene, the image sequence comprising a plurality of image frames that are consecutive in time sequence;
    obtaining a target frame and a reference frame according to the plurality of image frames that are consecutive in time sequence, and obtaining a depth map of the target frame based on the reference frame; and
    fusing the depth map of the target frame to obtain a three-dimensional model of the target scene.
  21. The system according to claim 20, wherein the target frame comprises one of the plurality of image frames that are consecutive in time sequence.
  22. The system according to claim 21, wherein the reference frame comprises a frame that has overlapping pixels with the target frame.
  23. The system according to claim 20, wherein the reference frame comprises at least a first image frame and a second image frame;
    the first image frame precedes the target frame in time sequence; and
    the second image frame follows the target frame in time sequence.
  24. The system according to claim 23, wherein if the target frame is the N-th frame, the first image frame is the (N-1)-th frame and the second image frame is the (N+1)-th frame.
  25. The system according to claim 20, wherein the reference frame comprises at least a third image frame; and
    an epipolar direction of the third image frame is not parallel to an epipolar direction of the target frame.
  26. The system according to any one of claims 20-25, wherein the processor being configured to obtain the depth map of the target frame based on the reference frame specifically comprises:
    obtaining the depth map of the target frame according to a disparity between the target frame and the reference frame.
  27. The system according to claim 20, wherein the processor being configured to obtain the depth map of the target frame based on the reference frame specifically comprises:
    determining, according to the reference frame, a matching cost corresponding to the target frame; and
    determining the depth map of the target frame according to the matching cost corresponding to the target frame.
  28. The system according to claim 27, wherein the processor being configured to determine, according to the reference frame, the matching cost corresponding to the target frame specifically comprises:
    determining, according to the target frame and the reference frame, a first-type matching cost and a second-type matching cost corresponding to the target frame; and
    determining that the matching cost corresponding to the target frame is equal to a weighted sum of the first-type matching cost and the second-type matching cost.
  29. The system according to claim 28, wherein the first-type matching cost is determined based on zero-mean normalized cross-correlation.
  30. The system according to claim 28, wherein the second-type matching cost is determined based on illumination-invariant features.
  31. The system according to claim 27, wherein the processor being configured to determine, according to the reference frame, the matching cost corresponding to the target frame specifically comprises:
    dividing the target frame into a plurality of image blocks;
    determining, according to the reference frame, a matching cost corresponding to each image block; and
    determining the matching cost corresponding to the target frame according to the matching cost corresponding to each image block.
  32. The system according to claim 31, wherein the processor being configured to divide the target frame into a plurality of image blocks specifically comprises:
    dividing the target frame into the plurality of image blocks by clustering.
  33. The system according to claim 31, wherein the processor being configured to divide the target frame into a plurality of image blocks specifically comprises:
    dividing the target frame evenly into the plurality of image blocks.
  34. The system according to claim 31, wherein the processor being configured to determine, according to the reference frame, the matching cost corresponding to each image block specifically comprises:
    determining, according to the reference frame, the matching cost corresponding to each image block in parallel.
  35. The system according to claim 31, wherein the processor being configured to determine, according to the reference frame, the matching cost corresponding to each image block specifically comprises:
    determining a number of depth samples of each image block according to sparse points in the image block; and
    determining the matching cost corresponding to each image block according to the reference frame and the number of depth samples of each image block.
  36. The system according to claim 20, wherein the processor is further configured to filter the depth map of the target frame after the depth map of the target frame is obtained based on the reference frame.
  37. The system according to claim 36, wherein the processor being configured to filter the depth map of the target frame specifically comprises:
    performing trilateral filtering on the depth map of the target frame.
  38. The system according to claim 20, wherein the processor being configured to fuse the depth map of the target frame to obtain the three-dimensional model of the target scene specifically comprises:
    determining, according to the depth map of the target frame, a point cloud corresponding to the target frame;
    fusing the point cloud corresponding to the target frame into voxels corresponding to the target scene; and
    obtaining the three-dimensional model of the target scene according to the voxels corresponding to the target scene.
  39. An unmanned aerial vehicle, comprising: a processor;
    wherein the unmanned aerial vehicle carries a photographing device configured to photograph a target scene; and
    the processor is configured to:
    obtain an image sequence of the target scene, the image sequence comprising a plurality of image frames that are consecutive in time sequence;
    obtain a target frame and a reference frame according to the plurality of image frames that are consecutive in time sequence, and obtain a depth map of the target frame based on the reference frame; and
    fuse the depth map of the target frame to obtain a three-dimensional model of the target scene.
  40. The unmanned aerial vehicle according to claim 39, wherein the target frame comprises one of the plurality of image frames that are consecutive in time sequence.
  41. The unmanned aerial vehicle according to claim 40, wherein the reference frame comprises a frame that has overlapping pixels with the target frame.
  42. The unmanned aerial vehicle according to claim 39, wherein the reference frame comprises at least a first image frame and a second image frame;
    the first image frame precedes the target frame in time sequence; and
    the second image frame follows the target frame in time sequence.
  43. The unmanned aerial vehicle according to claim 42, wherein if the target frame is the N-th frame, the first image frame is the (N-1)-th frame and the second image frame is the (N+1)-th frame.
  44. The unmanned aerial vehicle according to claim 39, wherein the reference frame comprises at least a third image frame; and
    an epipolar direction of the third image frame is not parallel to an epipolar direction of the target frame.
  45. The unmanned aerial vehicle according to any one of claims 39-44, wherein the processor being configured to obtain the depth map of the target frame based on the reference frame specifically comprises:
    obtaining the depth map of the target frame according to a disparity between the target frame and the reference frame.
  46. The unmanned aerial vehicle according to claim 39, wherein the processor being configured to obtain the depth map of the target frame based on the reference frame specifically comprises:
    determining, according to the reference frame, a matching cost corresponding to the target frame; and
    determining the depth map of the target frame according to the matching cost corresponding to the target frame.
  47. The unmanned aerial vehicle according to claim 46, wherein the processor being configured to determine, according to the reference frame, the matching cost corresponding to the target frame specifically comprises:
    determining, according to the target frame and the reference frame, a first-type matching cost and a second-type matching cost corresponding to the target frame; and
    determining that the matching cost corresponding to the target frame is equal to a weighted sum of the first-type matching cost and the second-type matching cost.
  48. The unmanned aerial vehicle according to claim 47, wherein the first-type matching cost is determined based on zero-mean normalized cross-correlation.
  49. The unmanned aerial vehicle according to claim 47, wherein the second-type matching cost is determined based on illumination-invariant features.
  50. The unmanned aerial vehicle according to claim 46, wherein the processor being configured to determine, according to the reference frame, the matching cost corresponding to the target frame specifically comprises:
    dividing the target frame into a plurality of image blocks;
    determining, according to the reference frame, a matching cost corresponding to each image block; and
    determining the matching cost corresponding to the target frame according to the matching cost corresponding to each image block.
  51. The unmanned aerial vehicle according to claim 50, wherein the processor being configured to divide the target frame into a plurality of image blocks specifically comprises:
    dividing the target frame into the plurality of image blocks by clustering.
  52. The unmanned aerial vehicle according to claim 50, wherein the processor being configured to divide the target frame into a plurality of image blocks specifically comprises:
    dividing the target frame evenly into the plurality of image blocks.
  53. The unmanned aerial vehicle according to claim 50, wherein the processor being configured to determine, according to the reference frame, the matching cost corresponding to each image block specifically comprises:
    determining, according to the reference frame, the matching cost corresponding to each image block in parallel.
  54. The unmanned aerial vehicle according to claim 50, wherein the processor being configured to determine, according to the reference frame, the matching cost corresponding to each image block specifically comprises:
    determining a number of depth samples of each image block according to sparse points in the image block; and
    determining the matching cost corresponding to each image block according to the reference frame and the number of depth samples of each image block.
  55. The unmanned aerial vehicle according to claim 39, wherein the processor is further configured to filter the depth map of the target frame after the depth map of the target frame is obtained based on the reference frame.
  56. The unmanned aerial vehicle according to claim 55, wherein the processor being configured to filter the depth map of the target frame specifically comprises:
    performing trilateral filtering on the depth map of the target frame.
  57. The unmanned aerial vehicle according to claim 39, wherein the processor being configured to fuse the depth map of the target frame to obtain the three-dimensional model of the target scene specifically comprises:
    determining, according to the depth map of the target frame, a point cloud corresponding to the target frame;
    fusing the point cloud corresponding to the target frame into voxels corresponding to the target scene; and
    obtaining the three-dimensional model of the target scene according to the voxels corresponding to the target scene.
PCT/CN2018/119153 2018-12-04 2018-12-04 Method and system for three-dimensional reconstruction of target scene, and unmanned aerial vehicle WO2020113417A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
PCT/CN2018/119153 WO2020113417A1 (zh) 2018-12-04 2018-12-04 Method and system for three-dimensional reconstruction of target scene, and unmanned aerial vehicle
CN201880073770.5A CN111433819A (zh) 2018-12-04 2018-12-04 Method and system for three-dimensional reconstruction of target scene, and unmanned aerial vehicle

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2018/119153 WO2020113417A1 (zh) 2018-12-04 2018-12-04 Method and system for three-dimensional reconstruction of target scene, and unmanned aerial vehicle

Publications (1)

Publication Number Publication Date
WO2020113417A1 true WO2020113417A1 (zh) 2020-06-11

Family

ID=70974848

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/119153 WO2020113417A1 (zh) 2018-12-04 2018-12-04 Method and system for three-dimensional reconstruction of target scene, and unmanned aerial vehicle

Country Status (2)

Country Link
CN (1) CN111433819A (zh)
WO (1) WO2020113417A1 (zh)


Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102665086B (zh) * 2012-04-26 2014-03-05 Graduate School at Shenzhen, Tsinghua University Method for obtaining disparity by region-based local stereo matching
CN103260032B (zh) * 2013-04-18 2016-07-06 Graduate School at Shenzhen, Tsinghua University Frame rate up-conversion method for stereoscopic video depth map sequences
CN108537876B (zh) * 2018-03-05 2020-10-16 Tsinghua-Berkeley Shenzhen Institute Preparatory Office Three-dimensional reconstruction method, apparatus, device and storage medium

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180077400A1 (en) * 2016-09-12 2018-03-15 Dassault Systemes 3D Reconstruction Of A Real Object From A Depth Map
CN106814744A (zh) * 2017-03-14 2017-06-09 Jilin Institute of Chemical Technology UAV flight control system and method
CN108521788A (zh) * 2017-11-07 2018-09-11 SZ DJI Technology Co., Ltd. Method for generating simulated flight route, method for simulated flight, device and storage medium
CN108701373A (zh) * 2017-11-07 2018-10-23 SZ DJI Technology Co., Ltd. Three-dimensional reconstruction method, system and apparatus based on UAV aerial photography

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
TIAN, HU: "Depth Estimation from a Monocular Image", CHINESE DOCTORAL DISSERTATIONS FULL-TEXT DATABASE, INFORMATION & TECHNOLOGY, no. 03, 15 March 2016 (2016-03-15) *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113253293A (zh) * 2021-06-03 2021-08-13 National University of Defense Technology Method for eliminating laser point cloud distortion and computer-readable storage medium
WO2023115561A1 (zh) * 2021-12-24 2023-06-29 SZ DJI Technology Co., Ltd. Movement control method and apparatus for movable platform, and movable platform

Also Published As

Publication number Publication date
CN111433819A (zh) 2020-07-17


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18942135

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 18942135

Country of ref document: EP

Kind code of ref document: A1