CN112509124A - Depth map obtaining method and system, unmanned aerial vehicle orthographic map generating method and medium - Google Patents

Depth map obtaining method and system, unmanned aerial vehicle orthographic map generating method and medium Download PDF

Info

Publication number
CN112509124A
CN112509124A
Authority
CN
China
Prior art keywords
matching
pictures
depth
ncc
point
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202011462830.8A
Other languages
Chinese (zh)
Other versions
CN112509124B (en
Inventor
Inventor not disclosed
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chengdu Shuzhilian Technology Co Ltd
Original Assignee
Chengdu Shuzhilian Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chengdu Shuzhilian Technology Co Ltd filed Critical Chengdu Shuzhilian Technology Co Ltd
Priority to CN202011462830.8A priority Critical patent/CN112509124B/en
Publication of CN112509124A publication Critical patent/CN112509124A/en
Application granted granted Critical
Publication of CN112509124B publication Critical patent/CN112509124B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T17/00Three dimensional [3D] modelling, e.g. data description of 3D objects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/22Matching criteria, e.g. proximity measures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/50Depth or shape recovery
    • G06T7/55Depth or shape recovery from multiple images
    • G06T7/593Depth or shape recovery from multiple images from stereo images
    • G06T7/596Depth or shape recovery from multiple images from stereo images from three or more stereo images
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Abstract

The invention discloses a depth map obtaining method and system, an unmanned aerial vehicle orthographic map generating method and a medium, relating to the field of unmanned aerial vehicle remote sensing image processing and comprising the following steps: shooting the same scene at a plurality of positions with an unmanned aerial vehicle to obtain a plurality of pictures; obtaining, through feature point matching, a plurality of matching pairs among different pictures and whether the pictures overlap; performing motion recovery calculation based on the matching result to obtain the camera extrinsic parameters and sparse space point clouds of all the pictures; performing multi-view stereo geometric calculation on all the pictures to obtain the dense depth maps corresponding to all the pictures. The method matches pixels using block matching, and the block matching adds a spatial constraint to the depth measurement of pixel points through regularization processing. The spatial regularization constraint reduces the depth measurement error and makes local depths tend to be uniform, which suits the situation where the depth variation in a high-altitude unmanned aerial vehicle three-dimensional reconstruction scene is small; at the same time, the depths of some points where block matching fails can be reasonably recovered.

Description

Depth map obtaining method and system, unmanned aerial vehicle orthographic map generating method and medium
Technical Field
The invention relates to the field of unmanned aerial vehicle remote sensing image processing, in particular to a depth map obtaining method and system, an unmanned aerial vehicle orthographic map generating method and medium.
Background
At present, three-dimensional reconstruction technology is maturing day by day, and its fusion with deep learning is also developing vigorously. The core techniques in three-dimensional reconstruction are matching point pairs, SfM (structure from motion) and MVS (multi-view stereo). The main function of matching point pairs is to match pixels with the same name (homonymous pixels); the main methods are the feature point method and the optical flow method. SfM estimates camera parameters from the matching point pairs and optimizes them globally so that the overall error is small. MVS uses the estimated camera parameters to measure the depth of pixel points in the images and obtain dense or semi-dense point clouds. Here depth refers to the distance between the measuring tool and the surface of the measured target, where the measuring tool is the camera and the measured target is the object corresponding to each pixel point in the image.
At present, in three-dimensional reconstruction technology, there is no method specifically adapted to the high-altitude unmanned aerial vehicle scenario, in which the measurement distance is long and the depth variation during reconstruction is small (about 5% of the flight height), i.e. the height variation of the photographed objects is small. In conventional MVS methods, the farther the target, the less accurate the depth measurement. These methods must also adapt to both drastic and gentle depth changes, which increases the chance of measurement errors in unmanned aerial vehicle three-dimensional reconstruction. In addition, common block matching algorithms are prone to mismatching.
Disclosure of Invention
In order to solve the problems in the background art, the invention provides a depth map obtaining method and system and an unmanned aerial vehicle orthographic map generating method. The depth measurement error can be reduced through space regularization constraint, the local depth tends to be uniform, the method is more suitable for the condition that the depth change amplitude in a high-altitude unmanned aerial vehicle three-dimensional reconstruction scene is not large, and meanwhile, the depth of some block matching failure points can be reasonably recovered.
To achieve the above object, the present invention provides a depth map obtaining method, including:
shooting the same scene at a plurality of positions by using an unmanned aerial vehicle to obtain a plurality of pictures;
obtaining a plurality of matching pairs among different pictures and whether the pictures are overlapped or not through feature point matching;
based on the matching result, motion recovery calculation is carried out to obtain camera external parameters and sparse space point clouds of all the pictures;
performing multi-view stereo geometric calculation on all the pictures to obtain dense depth maps corresponding to all the pictures;
in the multi-view stereo geometric calculation process, matching among pixels is carried out by using block matching, and space constraint is added for pixel point depth measurement through regularization processing by the block matching.
The method spatially regularizes the depth to adapt to the unmanned aerial vehicle aerial shooting scene, which improves the accuracy of depth measurement and at the same time gives a reasonable depth to some points that otherwise cannot be measured. The method is used in MVS (multi-view stereo); after the matching and SfM steps, the camera parameters of each camera, the space coordinates of a plurality of space points, and the association between the space points and the pixel points in the images have been obtained.
Preferably, the method constructs a spatial regularization term during the block matching process, and performs the regularization process by using a linear function with respect to the depth distance.
Preferably, the method performs block matching by using a normalized cross-correlation matching method.
Preferably, the method uses a linear function in equation (1) to perform the regularization process:
ncc_norm = (k1·Δd + bias)·ncc  (1)

where Δd = |d − dm|, k1 is the slope of the linear function, bias is the bias of the linear function, ncc is the value obtained by the de-meaned NCC calculation, d is the depth measured with the current pixel as the matching point, and dm is the predicted depth.
Preferably, the method uses a linear function in equation (2) to perform the regularization process:
ncc_norm = (k1·l + bias)·ncc  (2)

where k1 is the slope of the linear function, bias is the bias of the linear function, ncc is the value obtained by the de-meaned NCC calculation, and l is the pixel distance between the matching point and the expected matching point.
The method can represent the distance of the depth by simply calculating the pixel distance, and avoids the need of calculating the depth for each matching.
Preferably, the height change of the object shot in the scene shot by the unmanned aerial vehicle is within a preset range. I.e. the variation in height of the ground-based object is relatively small with respect to the flying height.
The present invention also provides a depth map obtaining system, the system comprising:
the picture obtaining unit is used for shooting the same scene at a plurality of positions by using the unmanned aerial vehicle to obtain a plurality of pictures;
the characteristic point matching unit is used for obtaining a plurality of matching pairs among different pictures and whether the pictures are overlapped or not through characteristic point matching;
the motion recovery calculation unit is used for carrying out motion recovery calculation to obtain camera external parameters and sparse space point clouds of all the pictures based on the matching result;
the multi-view solid geometry calculation unit is used for performing multi-view solid geometry calculation on all the pictures to obtain dense depth maps corresponding to all the pictures;
in the multi-view stereo geometric calculation process, pixel matching is carried out by using block matching, and space constraint is added for pixel point depth measurement by regularization processing of the block matching.
The system uses a linear function in formula (1) to perform regularization processing:
ncc_norm = (k1·Δd + bias)·ncc  (1)

where Δd = |d − dm|, k1 is the slope of the linear function, bias is the bias of the linear function, ncc is the value obtained by the de-meaned NCC calculation, d is the depth measured with the current pixel as the matching point, and dm is the predicted depth.
The system uses a linear function in an equation (2) to perform regularization processing:
ncc_norm = (k1·l + bias)·ncc  (2)

where k1 is the slope of the linear function, bias is the bias of the linear function, ncc is the value obtained by the de-meaned NCC calculation, and l is the pixel distance between the matching point and the expected matching point.
The invention also provides an unmanned aerial vehicle orthograph generation method, which comprises the following steps:
shooting a scene from a plurality of angles by using an unmanned aerial vehicle to obtain a plurality of pictures;
preprocessing all pictures;
extracting feature points from the preprocessed pictures, matching any two pictures, determining a picture pair with an overlapping region, and obtaining a feature point matching result;
performing motion recovery calculation based on the feature point matching result to obtain camera parameters of each picture and coordinates of space points corresponding to the matching feature points;
based on the unmanned aerial vehicle camera parameters and the coordinates of the space points corresponding to the matched feature points, performing multi-view stereo geometric calculation to obtain dense depth maps of all the pictures; performing pixel matching by using epipolar line search and block matching in the multi-view solid geometry calculation process;
carrying out depth unified processing on the dense depth maps of all the pictures;
carrying out image fusion processing on the dense depth map subjected to the depth unification processing to generate an unmanned aerial vehicle orthographic map;
performing motion recovery calculation on all pictures to obtain camera parameters and sparse space point clouds of all the pictures;
performing multi-view stereo geometric calculation on all the pictures based on the camera parameters and the sparse space point clouds of all the pictures to obtain dense depth maps corresponding to all the pictures;
in the multi-view stereo geometric calculation process, a block matching method is used for matching pixels, and space constraint is added for pixel point depth measurement through regularization processing in the block matching.
The invention also provides a depth map obtaining device, which comprises a memory, a processor and a computer program stored in the memory and capable of running on the processor, wherein the processor realizes the steps of the depth map obtaining method when executing the computer program.
The invention also provides a computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the steps of the depth map obtaining method.
One or more technical schemes provided by the invention at least have the following technical effects or advantages:
the traditional ncc without regularization often has the phenomena of matching error and nonuniform local depth, after the regularization processing in the invention, the matching accuracy in the repeated texture is improved, a reasonable depth is calculated in the area with changed brightness, so that the local depth is unified, and the finally generated depth map is obviously denser than the depth map without regularization. The depth measurement error can be reduced through space regularization constraint, the local depth tends to be uniform, the method is more suitable for the condition that the depth change amplitude in a high-altitude unmanned aerial vehicle three-dimensional reconstruction scene is not large, and meanwhile, the depth of some block matching failure points can be reasonably recovered.
Drawings
The accompanying drawings, which are included to provide a further understanding of the embodiments of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the principles of the invention;
FIG. 1 is a schematic flow chart of a depth map acquisition method;
FIG. 2 is a geometric schematic of epipolar lines in an imaging model;
FIG. 3 is a geometric schematic of the epipolar line after the coordinate system is established;
fig. 4 is a schematic composition diagram of a depth map acquisition system.
Detailed Description
In order that the above objects, features and advantages of the present invention can be more clearly understood, a more particular description of the invention will be rendered by reference to the appended drawings. It should be noted that the embodiments of the present invention and features of the embodiments may be combined with each other without conflicting with each other.
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention, however, the present invention may be practiced in other ways than those specifically described and thus the scope of the present invention is not limited by the specific embodiments disclosed below.
Example one
The invention provides a depth map obtaining method, and FIG. 1 is a flow diagram of the depth map obtaining method, and the method includes:
shooting the same scene at a plurality of positions by using an unmanned aerial vehicle to obtain a plurality of pictures;
obtaining a plurality of matching pairs among different pictures and whether the pictures are overlapped or not through feature point matching;
based on the matching result, motion recovery calculation is carried out to obtain camera external parameters and sparse space point clouds of all the pictures;
performing multi-view stereo geometric calculation on all the pictures to obtain dense depth maps corresponding to all the pictures;
in the multi-view stereo geometric calculation process, matching among pixels is carried out by using block matching, and space constraint is added for pixel point depth measurement through regularization processing by the block matching.
The depth is spatially regularized by the method, so that the method is suitable for an unmanned aerial vehicle aerial shooting scene, the accuracy of depth measurement is improved, and meanwhile, a reasonable depth can be given to some points which cannot be measured. The method is used in MVS (multi-view stereoscopic vision), and after the steps of matching, SfM and the like, the camera parameters of each camera, the space coordinates of a plurality of space points and the association of the space points and pixel points in an image are obtained.
The camera parameters mainly include intrinsic and extrinsic parameters. The intrinsic parameters are usually represented by a 3x3 matrix, written K, and convert the camera coordinate system to the image coordinate system; pictures taken by the same camera share the same intrinsic parameters. The extrinsic parameters consist of a rotation matrix R and a translation vector t, which convert the world coordinate system to the camera coordinate system. Assume a certain spatial point P in the world coordinate system corresponds to pixel p (the pixel coordinate here is a homogeneous coordinate, i.e. it has 3 dimensions); then the conversion from the world coordinate system to the pixel coordinate can be written as:
z·(u, v, 1)^T = K·(R·(x, y, z)^T + t)

where the z on the left is the depth, a scalar, and is the quantity to be measured by MVS; (u, v) is the coordinate of pixel p, (u, v, 1) is its homogeneous representation, and (x, y, z) on the right is the coordinate of the spatial point P.
The steps of measuring the depth of a single pixel in the MVS are as follows:
1. Epipolar line search:
(1) Finding the epipolar line
Suppose pictures A and B both originate from the same camera, so they have the same camera intrinsic parameters, denoted K. Suppose the camera extrinsic parameters of picture A are R1, t1, and the camera extrinsic parameters of picture B are R2, t2.
For two cameras whose parameters have been obtained, the depth of a pixel point can be computed once the corresponding point of that pixel in the other camera is found. For a pixel with unknown depth, the corresponding space point must lie somewhere on the ray formed by the camera optical center and the pixel, and this ray is imaged as a straight line in the other camera. If the camera extrinsic parameters obtained by SfM are accurate, the point matching this pixel must also lie on that line, which is called the epipolar line.
The optical center of the pinhole camera refers to the pinhole, and the optical center of the digital camera is generally the optical center of the convex lens at the forefront of the lens.
If the depth of a pixel in picture A is known, the position of its corresponding point in picture B can be calculated by remapping with the camera parameters. Suppose a pixel point in picture A is p1, the point of picture A after depth normalization in the camera coordinate system is P1 with depth z1, the corresponding pixel point in picture B is p2, its depth-normalized point is P2, and its depth is z2. Then the following equations can be obtained:

P1 = K⁻¹·p1  (4)

p2 = K·P2  (5)

z2·P2 = R2·R1⁻¹·(z1·P1 − t1) + t2  (6)

Let:

R21 = R2·R1⁻¹,  t21 = t2 − R2·R1⁻¹·t1

The following can be obtained:

z2·P2 = z1·R21·P1 + t21  (7)
For z1, its theoretical range is (0, +∞), i.e. there is no upper limit on the epipolar line length. However, in a specific picture the depths of all pixel points fall within a range, and the space points obtained by SfM can be used to estimate the mean and standard deviation of the pixel depths of the whole picture. Assume the mean is e1 and the standard deviation is d1. Then it can be assumed that z1 ∈ [e1 − λ·d1, e1 + λ·d1] = [zmin, zmax], where λ is a parameter to be set, generally 2 or 3.
Then the line segment into which the epipolar line is imaged in picture B can be obtained; its endpoints are pmin and pmax respectively:

a1·pmin = K·(zmin·R21·K⁻¹·p1 + t21)  (8)

a2·pmax = K·(zmax·R21·K⁻¹·p1 + t21)  (9)

where a1 and a2 are the scale factors of the homogeneous pixel coordinates.
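A minimal sketch of this epipolar search range, assuming NumPy and the symbols of equations (4), (8) and (9); R21, t21 and the sparse depths from SfM are taken as given, and λ = 3 is simply the typical value mentioned above.

import numpy as np

def epipolar_segment(K, R21, t21, p1, sparse_depths, lam=3.0):
    # depth prior of the whole picture from the sparse SfM points
    e1, d1 = float(np.mean(sparse_depths)), float(np.std(sparse_depths))
    z_min, z_max = e1 - lam * d1, e1 + lam * d1
    P1 = np.linalg.inv(K) @ np.array([p1[0], p1[1], 1.0])   # equation (4)
    endpoints = []
    for z in (z_min, z_max):
        q = K @ (z * (R21 @ P1) + t21)      # equations (8)/(9), up to the scale factor
        endpoints.append(q[:2] / q[2])      # pmin, pmax in pixel coordinates of picture B
    return endpoints[0], endpoints[1]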
(2) Block matching
Matching the homonym points by a single pixel is not accurate enough, and block matching is often used, that is, the gray values of surrounding pixels are also considered. The simpler block matching is the de-averaged NCC:
NCC(A,B)_P = Σ_{i,j} (A(i,j) − E(A_P))·(B(i,j) − E(B_P)) / sqrt( Σ_{i,j} (A(i,j) − E(A_P))² · Σ_{i,j} (B(i,j) − E(B_P))² )
wherein NCC(A,B)_P is the NCC of the two pixel blocks obtained by remapping the point P into pictures A and B, A(i,j) is the gray value at location (i,j) in A, E(A_P) is the mean gray value of the pixel obtained by remapping P into A together with its neighbourhood, and A_P is the block formed by the pixel obtained by remapping P into A and its neighbourhood.
The range of NCC values is [ -1,1], with closer to 1 indicating more similarity. A threshold is set for NCC, and when NCC is greater than the threshold, two points are considered similar, being homonymous points. Typically this threshold is taken to be 0.85.
NCC here refers to the normalized cross-correlation matching method (normalized cross correlation), a matching method based on image gray-scale information that normalizes the degree of correlation between the objects to be matched. The normalized cross-correlation matching algorithm is a classical algorithm among image matching algorithms.
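The de-meaned NCC above and the 0.85 threshold can be sketched as follows (an illustration only; patches are assumed to be equally sized gray-scale arrays).

import numpy as np

def demeaned_ncc(patch_a, patch_b):
    a = patch_a.astype(np.float64) - patch_a.mean()
    b = patch_b.astype(np.float64) - patch_b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return 0.0 if denom == 0 else float((a * b).sum() / denom)

def is_homonymous(patch_a, patch_b, threshold=0.85):
    # two blocks are accepted as the same-name point when NCC exceeds the threshold
    return demeaned_ncc(patch_a, patch_b) > threshold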
(3) Spatial regularization
Because NCC is relatively simple, points that pass the NCC match may still be mismatched, or matching may fail altogether. Mismatching occurs because the epipolar search range is too large: in regions with repeated textures, wrong matches are very likely. Matching failure occurs because illumination affects imaging differently at different shooting angles; although de-meaned NCC weakens the influence of illumination compared with plain NCC, the effect is limited, so a large number of points cannot be matched and some depth maps may be too sparse.
Considering the application scenario of unmanned aerial vehicle aerial photography, the depth variation within a single image is not very large, so the depth change between adjacent points can be assumed to be very small, and the depth of neighbouring pixels can be used as the predicted depth of the pixel to be measured. Under this assumption, a spatial regularization term can be constructed for NCC; here the regularization may simply use a linear function of the depth distance:
ncc_norm = (k1·Δd + bias)·ncc  (1)

where Δd = |d − dm|, k1 is the slope of the linear function, bias is the bias of the linear function, and ncc is the value obtained by the de-meaned NCC calculation. Here d is the depth measured using the current pixel as the matching point, and dm is the predicted depth, which changes dynamically; it can be taken as the average of the depths of neighbouring pixels.
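A minimal sketch of equation (1), assuming the predicted depth is the mean of already-measured neighbouring depths as suggested above; the slope and bias values are placeholders, not values prescribed by the invention.

def regularized_ncc_by_depth(ncc, d, neighbour_depths, k1=-0.02, bias=1.4):
    # predicted depth d_m: average of the neighbouring pixels already measured
    d_m = sum(neighbour_depths) / len(neighbour_depths)
    delta_d = abs(d - d_m)              # Δd = |d - d_m|
    return (k1 * delta_d + bias) * ncc  # equation (1)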
The original formula needs to calculate the depth of the block matching point at each matching, and the calculation amount is very large, so that the relation between the pixel distance and the depth needs to be discussed next to avoid calculating the depth at each epipolar line search.
Referring to FIG. 2, FIG. 2 is a schematic view of epipolar lines in an imaging model;
Consider a new pixel point whose depth has not yet been measured. In FIG. 2, P is a space point, depth is the predicted depth, A and B are cameras A and B respectively, line segment AB is the displacement between the two cameras, E is a pixel point on the imaging plane, and F is the optimal pixel point on the epipolar line matched by NCC. The coordinates of A, B, P and E in FIG. 2 are therefore known; what needs to be discussed is the relationship between the length of EF and the projection of DP in the z direction (i.e. the depth). For this, triangle BDP must be solved. Take triangle BDP out separately and establish a coordinate system with P as the origin, PB as the y-axis and one pixel as the unit length.
FIG. 3 is a geometric schematic of the epipolar line after the coordinate system is established. Let the coordinates of B be (0, b) and the coordinates of E be (0, e). Let (m, n) be the unit vector of the epipolar direction, let the equation of the straight line AP be x = qy, and let the length of EF be l; then the coordinates of F are (lm, e + ln), where only l is an unknown quantity. The problem then translates into finding the length of DP.
a. The equation of the straight line BF is:

y = ((e + ln − b) / (lm))·x + b
b. Find the intersection point D of the straight lines AP and BF. Since the straight line AP is known, the length of DP can be represented by either one of the two coordinates of D; the vertical coordinate of D is:

y_D = b·l·m / (l·(m − nq) + q·(b − e))
q is the reciprocal of the slope of the straight line AP. If q were large, camera A would be very close to the ground, which contradicts the high-altitude unmanned aerial vehicle assumption; so in the case of high-altitude aerial photography q is small, and therefore (m − nq)·l is small. On the other hand, q cannot be arbitrarily small: the closer q is to 0, the smaller the distance between cameras A and B, until the cameras coincide, whereas triangulating the depth requires a certain baseline, otherwise the measured depth is unreliable. Furthermore, (b − e) equals the focal length of the camera, which is typically thousands of pixels, so q·(b − e) ≫ (m − nq)·l. The above expression can therefore be approximated as:

y_D ≈ b·l·m / (q·(b − e))
The analysis here concerns the depth z and the x direction of the image; the case of the y direction is similar, and combining the results of the two dimensions gives that z is proportional to l.
Therefore, the depth of a nearby pixel point can be used as the predicted depth of the pixel point to be measured, and the predicted depth is then used to obtain the predicted matching point. From the above derivation, the pixel distance Δdis between the matching point obtained in the NCC matching process and the predicted matching point is proportional to the difference Δdepth between the depth computed from that matching point and the predicted depth.
The pixel distance between the matching point and the expected matching point is exactly l, so equation (1) can be rewritten as:
ncc_norm = (k1·l + bias)·ncc  (2)
the distance of the depth can then be represented by simply calculating the pixel distance, avoiding the need to calculate the depth for each match.
The parameters k1 and bias are chosen according to the application scenario. For plains, the absolute values of bias and k1 can be slightly larger: for example, bias can take 1.4, which means the ncc of pixels whose depth is close to the predicted depth is amplified 1.4 times, and k1 can take −0.1, which means that when the matching point is 5 pixels away from the predicted matching point, the ncc is scaled to 0.9 times.
Other more complex regularization functions may also be selected to achieve better regularization.
Using formula (2), the optimal ncc_norm can be found quickly; the point that attains the optimal ncc_norm is taken as the matching point, and the depth of each pixel point is computed by triangulation. Depth unification and image fusion can then be performed.
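Putting the pieces together, a sketch of selecting the best match along the epipolar segment with equation (2); k1 = -0.1 and bias = 1.4 are the example values given above, and the NCC scores of the candidates are assumed to have been computed already.

import math

def best_epipolar_match(candidates, predicted_px, k1=-0.1, bias=1.4):
    # candidates: iterable of (u, v, ncc) for pixels sampled on the epipolar segment of picture B
    # predicted_px: matching point expected from the predicted depth of neighbouring pixels
    best_score, best_point = float("-inf"), None
    for u, v, ncc in candidates:
        l = math.hypot(u - predicted_px[0], v - predicted_px[1])  # pixel distance to prediction
        score = (k1 * l + bias) * ncc                             # equation (2)
        if score > best_score:
            best_score, best_point = score, (u, v)
    return best_point, best_score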
Example two
An embodiment of the present invention provides a depth map obtaining system, and fig. 4 is a schematic composition diagram of the depth map obtaining system, where the depth map obtaining system includes:
the picture obtaining unit is used for shooting the same scene at a plurality of positions by using the unmanned aerial vehicle to obtain a plurality of pictures;
the characteristic point matching unit is used for obtaining a plurality of matching pairs among different pictures and whether the pictures are overlapped or not through characteristic point matching;
the motion recovery calculation unit is used for carrying out motion recovery calculation to obtain camera external parameters and sparse space point clouds of all the pictures based on the matching result;
the multi-view solid geometry calculation unit is used for performing multi-view solid geometry calculation on all the pictures to obtain dense depth maps corresponding to all the pictures;
in the multi-view stereo geometric calculation process, pixel matching is carried out by using block matching, and space constraint is added for pixel point depth measurement by regularization processing of the block matching.
The system uses a linear function in formula (1) to perform regularization processing:
ncc_norm = (k1·Δd + bias)·ncc  (1)

where Δd = |d − dm|, k1 is the slope of the linear function, bias is the bias of the linear function, ncc is the value obtained by the de-meaned NCC calculation, d is the depth measured with the current pixel as the matching point, and dm is the predicted depth.
The system uses a linear function in an equation (2) to perform regularization processing:
ncc_norm = (k1·l + bias)·ncc  (2)

where k1 is the slope of the linear function, bias is the bias of the linear function, ncc is the value obtained by the de-meaned NCC calculation, and l is the pixel distance between the matching point and the expected matching point.
In this embodiment, the regularization processing method is the same as the regularization processing method in the depth map obtaining method.
EXAMPLE III
The third embodiment of the invention provides an unmanned aerial vehicle orthographic view generating method which is characterized by comprising the following steps:
shooting a scene from a plurality of angles by using an unmanned aerial vehicle to obtain a plurality of pictures;
preprocessing all pictures;
extracting feature points from the preprocessed pictures, matching any two pictures, determining a picture pair with an overlapping region, and obtaining a feature point matching result;
performing motion recovery calculation based on the feature point matching result to obtain camera parameters of each picture and coordinates of space points corresponding to the matching feature points;
based on the unmanned aerial vehicle camera parameters and the coordinates of the space points corresponding to the matched feature points, performing multi-view stereo geometric calculation to obtain dense depth maps of all the pictures; performing pixel matching by using epipolar line search and block matching in the multi-view solid geometry calculation process;
carrying out depth unified processing on the dense depth maps of all the pictures;
carrying out image fusion processing on the dense depth map subjected to the depth unification processing to generate an unmanned aerial vehicle orthographic map;
performing motion recovery calculation on all pictures to obtain camera parameters and sparse space point clouds of all the pictures;
performing multi-view stereo geometric calculation on all the pictures based on the camera parameters and the sparse space point clouds of all the pictures to obtain dense depth maps corresponding to all the pictures;
in the multi-view stereo geometric calculation process, a block matching method is used for matching pixels, and space constraint is added for pixel point depth measurement through regularization processing in the block matching.
In the process of generating the unmanned aerial vehicle orthographic map, the steps passed through are: preprocessing, point matching, SfM, MVS, depth unification and image fusion.
The preprocessing mainly comprises the steps of image distortion removal, image denoising and the like.
Point matching is divided into two kinds of methods: sparse point matching and dense point matching. The representative method of sparse point matching is feature point matching, which, depending on how the feature points are extracted, is divided into SIFT-based matching, SURF-based matching and so on. Dense point matching mainly refers to optical flow. The effect of sparse point matching is determined by the quality of the feature point matching; a good feature point extraction algorithm usually has good brightness, rotation and scale invariance. Dense point matching does not need to compute feature point descriptors, so the matching is fast, the sparsity of the result is controllable and the dependence on image texture is small; it works well on two images with small changes and is suitable for inter-frame matching of videos.
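For illustration, sparse feature point matching with an overlap test might look like the following; OpenCV's SIFT implementation and the ratio/threshold values are assumptions of this sketch, not requirements of the invention.

import cv2

def match_pictures(img_a, img_b, ratio=0.75, min_matches=30):
    sift = cv2.SIFT_create()
    kp_a, des_a = sift.detectAndCompute(img_a, None)
    kp_b, des_b = sift.detectAndCompute(img_b, None)
    matcher = cv2.BFMatcher(cv2.NORM_L2)
    knn = matcher.knnMatch(des_a, des_b, k=2)
    good = [m for m, n in knn if m.distance < ratio * n.distance]  # Lowe's ratio test
    overlaps = len(good) >= min_matches     # enough good matches -> the pictures overlap
    return good, overlaps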
Sfm (structure from motion) requires the use of matched pairs of points to estimate camera parameters and the coordinates of the spatial points corresponding to the matched points. Methods for SfM are divided into incremental SfM, global SfM and hybrid SfM.
The coordinates of a space point are determined from a pair of matching points (the imaging positions of the space point in the two cameras are the positions of the matching points in the respective images). From the coordinates of the space point and the gray values in the images, one point of the point cloud can be generated; the space points corresponding to all matching points form the sparse point cloud.
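A hedged sketch of recovering one space point from a matching pair, given the camera parameters estimated by SfM; the use of OpenCV's triangulatePoints is an implementation choice of this example, not specified by the patent.

import cv2
import numpy as np

def triangulate_pair(K, R1, t1, R2, t2, p1, p2):
    # 3x4 projection matrices of the two cameras
    P1 = K @ np.hstack([R1, t1.reshape(3, 1)])
    P2 = K @ np.hstack([R2, t2.reshape(3, 1)])
    X_h = cv2.triangulatePoints(P1, P2,
                                np.asarray(p1, dtype=np.float64).reshape(2, 1),
                                np.asarray(p2, dtype=np.float64).reshape(2, 1))
    return (X_h[:3] / X_h[3]).ravel()   # homogeneous -> 3D world coordinates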
MVS (Multi-view Stereo) requires reconstruction of a dense depth map for each picture using SfM estimated camera parameters and sparse spatial point clouds. This is also the stage to which the invention is primarily directed.
The step of generating the dense depth map is MVS, and in the process of MVS:
1. For a single picture A: find, one by one, the pictures that have an overlapping area with A. Assume that one such picture is B.
2. For a pair of pictures A and B, all pixels in A are traversed; when a pixel p is processed, the pixel in B matching p is searched for by epipolar line search + block matching.
Depth unification processes the dense depth map of each picture obtained by MVS so that the depths of points in the image overlapping areas are consistent.
Image fusion fuses the images using the RGB image of each picture under the guidance of the dense depth maps.
Example four
The fourth embodiment of the present invention provides a depth map obtaining apparatus, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor implements the steps of the depth map obtaining method when executing the computer program.
The processor may be a central processing unit (CPU), another general-purpose processor, a digital signal processor (DSP), an application specific integrated circuit (ASIC), a field programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor.
The memory can be used for storing the computer program and/or the module, and the processor can realize various functions of the depth map obtaining device in the invention by operating or executing the data stored in the memory. The memory may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function (such as a sound playing function, an image playing function, etc.), and the like. Further, the memory may include high speed random access memory, and may also include non-volatile memory, such as a hard disk, a memory, a plug-in hard disk, a smart memory card, a secure digital card, a flash memory card, at least one magnetic disk storage device, a flash memory device, or other volatile solid state storage device.
EXAMPLE five
An embodiment five of the present invention provides a computer-readable storage medium, where a computer program is stored, and when the computer program is executed by a processor, the steps of the depth map obtaining method are implemented.
The depth map obtaining means, if implemented in the form of a software functional unit and sold or used as an independent product, may be stored in a computer-readable storage medium. Based on such understanding, all or part of the flow of the methods of the embodiments of the present invention may also be implemented by a computer program stored in a computer-readable storage medium; when the computer program is executed by a processor, the steps of the above method embodiments can be implemented. The computer program code may be in source code form, object code form, an executable file, some intermediate form, or the like. The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash disk, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a read-only memory, a random access memory, an electrical carrier signal, a telecommunications signal, a software distribution medium, and the like. It should be noted that the content contained in the computer-readable medium may be increased or decreased as appropriate according to the requirements of legislation and patent practice in the jurisdiction.
While preferred embodiments of the present invention have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all such alterations and modifications as fall within the scope of the invention.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.

Claims (10)

1. A depth map obtaining method, characterized in that the method comprises:
shooting the same scene at a plurality of positions by using an unmanned aerial vehicle to obtain a plurality of pictures;
obtaining a plurality of matching pairs among different pictures and whether the pictures are overlapped or not through feature point matching;
based on the matching result, motion recovery calculation is carried out to obtain camera external parameters and sparse space point clouds of all the pictures;
performing multi-view stereo geometric calculation on all the pictures to obtain dense depth maps corresponding to all the pictures;
in the multi-view stereo geometric calculation process, matching among pixels is carried out by using block matching, and space constraint is added for pixel point depth measurement through regularization processing by the block matching.
2. The method according to claim 1, wherein a spatial regularization term is constructed during the block matching process, and a linear function with respect to depth distance is used for regularization.
3. The depth map obtaining method according to claim 2, wherein the method performs block matching using a normalized cross-correlation matching method.
4. The depth map obtaining method according to claim 3, wherein the method uses a linear function in equation (1) for regularization:
ncc_norm = (k1·Δd + bias)·ncc  (1)

where Δd = |d − dm|, k1 is the slope of the linear function, bias is the bias of the linear function, ncc is the value obtained by the de-meaned NCC calculation, d is the depth measured with the current pixel as the matching point, and dm is the predicted depth.
5. The depth map obtaining method according to claim 3, wherein the method uses a linear function in equation (2) for regularization:
ncc_norm = (k1·l + bias)·ncc  (2)

where k1 is the slope of the linear function, bias is the bias of the linear function, ncc is the value obtained by the de-meaned NCC calculation, and l is the pixel distance between the matching point and the expected matching point.
6. A depth map acquisition system, characterized in that the system comprises:
the picture obtaining unit is used for shooting the same scene at a plurality of positions by using the unmanned aerial vehicle to obtain a plurality of pictures;
the characteristic point matching unit is used for obtaining a plurality of matching pairs among different pictures and whether the pictures are overlapped or not through characteristic point matching;
the motion recovery calculation unit is used for carrying out motion recovery calculation to obtain camera external parameters and sparse space point clouds of all the pictures based on the matching result;
the multi-view solid geometry calculation unit is used for performing multi-view solid geometry calculation on all the pictures to obtain dense depth maps corresponding to all the pictures;
in the multi-view stereo geometric calculation process, pixel matching is carried out by using block matching, and space constraint is added for pixel point depth measurement by regularization processing of the block matching.
7. The depth map acquisition system of claim 6, wherein the system uses a linear function in equation (1) for regularization:
ncc_norm = (k1·Δd + bias)·ncc  (1)

where Δd = |d − dm|, k1 is the slope of the linear function, bias is the bias of the linear function, ncc is the value obtained by the de-meaned NCC calculation, d is the depth measured with the current pixel as the matching point, and dm is the predicted depth.
8. The depth map acquisition system of claim 6, wherein the system uses a linear function in equation (2) for regularization:
ncc_norm = (k1·l + bias)·ncc  (2)

where k1 is the slope of the linear function, bias is the bias of the linear function, ncc is the value obtained by the de-meaned NCC calculation, and l is the pixel distance between the matching point and the expected matching point.
9. An unmanned aerial vehicle orthographic mapping generation method is characterized by comprising the following steps:
shooting a scene from a plurality of angles by using an unmanned aerial vehicle to obtain a plurality of pictures;
preprocessing all pictures;
extracting feature points from the preprocessed pictures, matching any two pictures, determining a picture pair with an overlapping region, and obtaining a feature point matching result;
performing motion recovery calculation based on the feature point matching result to obtain camera parameters of each picture and coordinates of space points corresponding to the matching feature points;
based on the unmanned aerial vehicle camera parameters and the coordinates of the space points corresponding to the matched feature points, performing multi-view stereo geometric calculation to obtain dense depth maps of all the pictures; performing pixel matching by using epipolar line search and block matching in the multi-view solid geometry calculation process;
carrying out depth unified processing on the dense depth maps of all the pictures;
carrying out image fusion processing on the dense depth map subjected to the depth unification processing to generate an unmanned aerial vehicle orthographic map;
performing motion recovery calculation on all pictures to obtain camera parameters and sparse space point clouds of all the pictures;
performing multi-view stereo geometric calculation on all the pictures based on the camera parameters and the sparse space point clouds of all the pictures to obtain dense depth maps corresponding to all the pictures;
in the multi-view solid geometry calculation process, block matching adds space constraint for pixel point depth measurement through regularization processing.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program which, when being executed by a processor, carries out the steps of the depth map obtaining method according to any one of claims 1 to 5.
CN202011462830.8A 2020-12-14 2020-12-14 Depth map obtaining method and system, unmanned aerial vehicle orthogram generating method and medium Active CN112509124B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011462830.8A CN112509124B (en) 2020-12-14 2020-12-14 Depth map obtaining method and system, unmanned aerial vehicle orthogram generating method and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011462830.8A CN112509124B (en) 2020-12-14 2020-12-14 Depth map obtaining method and system, unmanned aerial vehicle orthogram generating method and medium

Publications (2)

Publication Number Publication Date
CN112509124A true CN112509124A (en) 2021-03-16
CN112509124B CN112509124B (en) 2023-09-22

Family

ID=74972557

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011462830.8A Active CN112509124B (en) 2020-12-14 2020-12-14 Depth map obtaining method and system, unmanned aerial vehicle orthogram generating method and medium

Country Status (1)

Country Link
CN (1) CN112509124B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113723373A (en) * 2021-11-02 2021-11-30 深圳市勘察研究院有限公司 Unmanned aerial vehicle panoramic image-based illegal construction detection method

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150254868A1 (en) * 2014-03-07 2015-09-10 Pelican Imaging Corporation System and methods for depth regularization and semiautomatic interactive matting using rgb-d images
CN109005398A (en) * 2018-07-27 2018-12-14 杭州电子科技大学 A kind of stereo image parallax matching process based on convolutional neural networks
CN110176032A (en) * 2019-04-28 2019-08-27 暗物智能科技(广州)有限公司 A kind of three-dimensional rebuilding method and device
CN110675317A (en) * 2019-09-10 2020-01-10 中国人民解放军国防科技大学 Super-resolution reconstruction method based on learning and adaptive trilateral filtering regularization
CN111357034A (en) * 2019-03-28 2020-06-30 深圳市大疆创新科技有限公司 Point cloud generation method, system and computer storage medium
CN111462329A (en) * 2020-03-24 2020-07-28 南京航空航天大学 Three-dimensional reconstruction method of unmanned aerial vehicle aerial image based on deep learning

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150254868A1 (en) * 2014-03-07 2015-09-10 Pelican Imaging Corporation System and methods for depth regularization and semiautomatic interactive matting using rgb-d images
CN109005398A (en) * 2018-07-27 2018-12-14 杭州电子科技大学 A kind of stereo image parallax matching process based on convolutional neural networks
CN111357034A (en) * 2019-03-28 2020-06-30 深圳市大疆创新科技有限公司 Point cloud generation method, system and computer storage medium
CN110176032A (en) * 2019-04-28 2019-08-27 暗物智能科技(广州)有限公司 A kind of three-dimensional rebuilding method and device
CN110675317A (en) * 2019-09-10 2020-01-10 中国人民解放军国防科技大学 Super-resolution reconstruction method based on learning and adaptive trilateral filtering regularization
CN111462329A (en) * 2020-03-24 2020-07-28 南京航空航天大学 Three-dimensional reconstruction method of unmanned aerial vehicle aerial image based on deep learning

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Zhang Yang: "Research on Multi-View Stereo Vision Methods Based on Block Matching and Deep Learning", China Master's Theses Full-text Database (Information Science and Technology), no. 04, pages 138-1064 *

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113723373A (en) * 2021-11-02 2021-11-30 深圳市勘察研究院有限公司 Unmanned aerial vehicle panoramic image-based illegal construction detection method

Also Published As

Publication number Publication date
CN112509124B (en) 2023-09-22

Similar Documents

Publication Publication Date Title
Sun et al. Disp r-cnn: Stereo 3d object detection via shape prior guided instance disparity estimation
CN108961327B (en) Monocular depth estimation method and device, equipment and storage medium thereof
US11954813B2 (en) Three-dimensional scene constructing method, apparatus and system, and storage medium
Remondino et al. Dense image matching: Comparisons and analyses
CN111563921B (en) Underwater point cloud acquisition method based on binocular camera
CN110853075A (en) Visual tracking positioning method based on dense point cloud and synthetic view
CN111340922A (en) Positioning and mapping method and electronic equipment
Kuschk Large scale urban reconstruction from remote sensing imagery
US20160232705A1 (en) Method for 3D Scene Reconstruction with Cross-Constrained Line Matching
Yuan et al. 3D reconstruction of background and objects moving on ground plane viewed from a moving camera
CN110120012B (en) Video stitching method for synchronous key frame extraction based on binocular camera
CN112509124B (en) Depth map obtaining method and system, unmanned aerial vehicle orthogram generating method and medium
Rothermel et al. Fast and robust generation of semantic urban terrain models from UAV video streams
Tanner et al. DENSER cities: A system for dense efficient reconstructions of cities
KR102587298B1 (en) Real-time omnidirectional stereo matching method using multi-view fisheye lenses and system therefore
CN112146647B (en) Binocular vision positioning method and chip for ground texture
CN116189140A (en) Binocular vision-based vehicle three-dimensional target detection algorithm
CN112258635B (en) Three-dimensional reconstruction method and device based on improved binocular matching SAD algorithm
Fan et al. Collaborative three-dimensional completion of color and depth in a specified area with superpixels
Wong et al. 3D object model reconstruction from image sequence based on photometric consistency in volume space
CN110021041B (en) Unmanned scene incremental gridding structure reconstruction method based on binocular camera
US10430971B2 (en) Parallax calculating apparatus
Mustaniemi et al. Parallax correction via disparity estimation in a multi-aperture camera
Ye et al. Precise disparity estimation for narrow baseline stereo based on multiscale superpixels and phase correlation
Chang et al. Pixel based cost computation using weighted distance information for cross-scale stereo matching

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 610042 No. 270, floor 2, No. 8, Jinxiu street, Wuhou District, Chengdu, Sichuan

Applicant after: Chengdu shuzhilian Technology Co.,Ltd.

Address before: No.2, floor 4, building 1, Jule road crossing, Section 1, West 1st ring road, Wuhou District, Chengdu City, Sichuan Province 610041

Applicant before: CHENGDU SHUZHILIAN TECHNOLOGY Co.,Ltd.

CB02 Change of applicant information
GR01 Patent grant
GR01 Patent grant